Re: Make safe_iterator inline friends

2018-08-28 Thread François Dumont

On 28/08/2018 21:04, Jonathan Wakely wrote:

On 23/08/18 22:59 +0200, François Dumont wrote:

On 22/08/2018 23:45, Jonathan Wakely wrote:

On 22/08/18 23:08 +0200, François Dumont wrote:
Only operator== and != remain outside _Safe_iterator because all 
my attempts to make them inline friends failed. I understand that 
an inline friend within a base class is not a very clean design.


Compiler error was:

/home/fdt/dev/gcc/build/x86_64-pc-linux-gnu/libstdc++-v3/include/debug/safe_iterator.h:459: 
error: redefinition of 'bool __gnu_debug::operator==(const _Self&, 
const _OtherSelf&)'
/home/fdt/dev/gcc/build/x86_64-pc-linux-gnu/libstdc++-v3/include/debug/safe_iterator.h:452: 
note: 'bool __gnu_debug::operator==(const _Self&, const _Self&)' 
previously declared here
/home/fdt/dev/gcc/build/x86_64-pc-linux-gnu/libstdc++-v3/include/debug/safe_iterator.h:473: 
error: redefinition of 'bool __gnu_debug::operator!=(const _Self&, 
const _OtherSelf&)'
/home/fdt/dev/gcc/build/x86_64-pc-linux-gnu/libstdc++-v3/include/debug/safe_iterator.h:466: 
note: 'bool __gnu_debug::operator!=(const _Self&, const _Self&)' 
previously declared here


I don't know if it is a compiler issue


I don't think so. The error seems clear: when _Self and _OtherSelf are
the same type the friend declarations are the same function.


_Self and _OtherSelf are like the types defined in 
_Safe_iterator<_It, _Sq, random_access_iterator_tag> in this patch. 
They depend on __conditional_type, so they are definitely different types.


What about containers like std::set where iterator and const_iterator
are the same type?


Good idea, but no, it is not that.

It really looks like g++ considers the inline friend definition in the 
context of the instantiation of _Safe_iterator<_It, _Sq, 
std::forward_iterator_tag> but also in the context of _Safe_iterator<_It, 
_Sq, std::bidirectional_iterator_tag>, and the same for the random access 
iterator specialization.


I don't know if this behavior is correct or not, but for sure it is not a 
clean design, so I would prefer to avoid it by keeping those operators in 
the __gnu_debug namespace. Attached is my attempt to inline those, if you 
want to have a closer look and maybe file a compiler bug report.


However, I took this good remark into account to further constrain the 
conversion constructor iterator -> const_iterator; the new patch is 
attached.




There's no changelog in the email with the patch.


    * include/debug/safe_iterator.h
    (_Safe_iterator<_It, _Seq, std::random_access_iterator_tag>::_Self):
    New.
    (_Safe_iterator<_It, _Seq, std::random_access_iterator_tag>
    ::_OtherSelf): New.
    (_GLIBCXX_DEBUG_VERIFY_OPERANDS, _GLIBCXX_DEBUG_VERIFY_CMP_OPERANDS)
    (_GLIBCXX_DEBUG_VERIFY_ORDER_OPERANDS)
    (_GLIBCXX_DEBUG_VERIFY_DIST_OPERANDS): Define macros.
    (_Safe_iterator<_It, _Seq, std::random_access_iterator_tag>
    ::operator+(difference_type)): Use the latter macros, inline as friend.
    (_Safe_iterator<_It, _Seq, std::random_access_iterator_tag>
    ::operator-(difference_type)): Likewise.
    (operator<(const _Safe_iterator<>&, const _Safe_iterator<>&)):
    Likewise.
    (operator<=(const _Safe_iterator<>&, const _Safe_iterator<>&)):
    Likewise.
    (operator>(const _Safe_iterator<>&, const _Safe_iterator<>&)):
    Likewise.
    (operator>=(const _Safe_iterator<>&, const _Safe_iterator<>&)):
    Likewise.
    (operator-(const _Safe_iterator<>&, const _Safe_iterator<>&)):
    Likewise.
    (operator+(difference_type, const _Safe_iterator<>&)): Likewise.
    (operator-(const _Safe_iterator<>&, difference_type)): Likewise.
    (operator==(const _Safe_iterator<>&, const _Safe_iterator<>&)):
    Use _GLIBCXX_DEBUG_VERIFY_CMP_OPERANDS.
    (operator!=(const _Safe_iterator<>&, const _Safe_iterator<>&)):
    Likewise.
    * include/debug/safe_iterator.tcc
    (_Safe_iterator<>::_M_can_advance(difference_type)): Take parameter by
    copy.
    * include/debug/safe_local_iterator.h
    (_Safe_local_iterator<_It, _Seq>::_Self): New.
    (_Safe_local_iterator<_It, _Seq>::_OtherSelf): New.
    (_GLIBCXX_DEBUG_VERIFY_OPERANDS): Define macro.
    (operator==(const _Safe_local_iterator<>&,
    const _Safe_local_iterator<>&)): Use the latter, inline as friend.
    (operator!=(const _Safe_local_iterator<>&,
    const _Safe_local_iterator<>&)): Likewise.
    * testsuite/util/testsuite_containers.h
    (struct forward_members_unordered<_Tp, bool>): Remove 2nd template
    parameter.
    (forward_members_unordered<>::forward_members_unordered(value_type&)):
    Add using namespace std::rel_ops.
    Add iterator_concept_checks on local_iterator and const_local_iterator.
    Add asserts on comparison between const_local_iterator and
    local_iterator.
    (struct forward_members_unordered<_Tp, false>): Remove partial
    specialization.
    * testsuite/23_containers/forward_list/types/1.cc: New.
    * testsuite/23_containers/list/types/1.cc: New.

OK to commit?

François
diff --git a/libstdc++-v3/include/debug/safe_iterator.h b/libstdc++-v3/include/debug/safe_iterator.h
index af1b3a0ad3f..4298302a47e 100644
--- 

Re: [PATCH] C++: underline param in print_conversion_rejection (more PR c++/85110)

2018-08-28 Thread Jason Merrill
OK.

On Thu, Aug 23, 2018 at 11:01 AM, David Malcolm  wrote:
> Consider this bogus code (from g++.dg/diagnostic/param-type-mismatch-2.C):
>
> struct s4 { static int member_1 (int one, const char **two, float three); };
>
> int test_4 (int first, const char *second, float third)
> {
>   return s4::member_1 (first, second, third);
> }
>
> Before this patch, g++ emits:
>
> demo.cc: In function 'int test_4(int, const char*, float)':
> demo.cc:5:44: error: no matching function for call to 's4::member_1(int&, 
> const char*&, float&)'
> 5 |   return s4::member_1 (first, second, third);
>   |^
> demo.cc:1:24: note: candidate: 'static int s4::member_1(int, const char**, 
> float)'
> 1 | struct s4 { static int member_1 (int one, const char **two, float three); 
> };
>   |^~~~
> demo.cc:1:24: note:   no known conversion for argument 2 from 'const char*' 
> to 'const char**'
>
> With this patch, it highlights the pertinent parameter in the
> "no known conversion" note:
>
> demo.cc: In function 'int test_4(int, const char*, float)':
> demo.cc:5:44: error: no matching function for call to 's4::member_1(int&, 
> const char*&, float&)'
> 5 |   return s4::member_1 (first, second, third);
>   |^
> demo.cc:1:24: note: candidate: 'static int s4::member_1(int, const char**, 
> float)'
> 1 | struct s4 { static int member_1 (int one, const char **two, float three); 
> };
>   |^~~~
> demo.cc:1:56: note:   no known conversion for argument 2 from 'const char*' 
> to 'const char**'
> 1 | struct s4 { static int member_1 (int one, const char **two, float three); 
> };
>   |   ~^~~
>
> Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu; adds
> 15 PASS results to g++.sum.
>
> OK for trunk?
>
> I'm working on a followup to highlight the pertinent argument
> in the initial error diagnostic.
>
> gcc/cp/ChangeLog:
> PR c++/85110
> * call.c (print_conversion_rejection): Add "fn" param and use it
> for "no known conversion" messages to underline the pertinent
> param.
> (print_z_candidate): Supply "fn" to the new param above.
>
> gcc/testsuite/ChangeLog:
> PR c++/85110
> * g++.dg/diagnostic/param-type-mismatch-2.C: Update expected
> output to reflect underlining of pertinent parameter in decl
> for "no known conversion" messages.
> ---
>  gcc/cp/call.c  | 17 +--
>  .../g++.dg/diagnostic/param-type-mismatch-2.C  | 25 +-
>  2 files changed, 31 insertions(+), 11 deletions(-)
>
> diff --git a/gcc/cp/call.c b/gcc/cp/call.c
> index 626830c..ef445e0 100644
> --- a/gcc/cp/call.c
> +++ b/gcc/cp/call.c
> @@ -3432,10 +3432,11 @@ equal_functions (tree fn1, tree fn2)
>return fn1 == fn2;
>  }
>
> -/* Print information about a candidate being rejected due to INFO.  */
> +/* Print information about a candidate FN being rejected due to INFO.  */
>
>  static void
> -print_conversion_rejection (location_t loc, struct conversion_info *info)
> +print_conversion_rejection (location_t loc, struct conversion_info *info,
> +   tree fn)
>  {
>tree from = info->from;
>if (!TYPE_P (from))
> @@ -3466,8 +3467,12 @@ print_conversion_rejection (location_t loc, struct 
> conversion_info *info)
>  inform (loc, "  no known conversion from %qH to %qI",
> from, info->to_type);
>else
> -inform (loc, "  no known conversion for argument %d from %qH to %qI",
> -   info->n_arg + 1, from, info->to_type);
> +{
> +  if (TREE_CODE (fn) == FUNCTION_DECL)
> +   loc = get_fndecl_argument_location (fn, info->n_arg);
> +  inform (loc, "  no known conversion for argument %d from %qH to %qI",
> + info->n_arg + 1, from, info->to_type);
> +}
>  }
>
>  /* Print information about a candidate with WANT parameters and we found
> @@ -3542,10 +3547,10 @@ print_z_candidate (location_t loc, const char *msgstr,
>r->u.arity.expected);
>   break;
> case rr_arg_conversion:
> - print_conversion_rejection (cloc, &r->u.conversion);
> + print_conversion_rejection (cloc, &r->u.conversion, fn);
>   break;
> case rr_bad_arg_conversion:
> - print_conversion_rejection (cloc, &r->u.bad_conversion);
> + print_conversion_rejection (cloc, &r->u.bad_conversion, fn);
>   break;
> case rr_explicit_conversion:
>   inform (cloc, "  return type %qT of explicit conversion function "
> diff --git a/gcc/testsuite/g++.dg/diagnostic/param-type-mismatch-2.C 
> b/gcc/testsuite/g++.dg/diagnostic/param-type-mismatch-2.C
> index 8cf2dab..4957f61 100644
> --- a/gcc/testsuite/g++.dg/diagnostic/param-type-mismatch-2.C
> +++ b/gcc/testsuite/g++.dg/diagnostic/param-type-mismatch-2.C
> 

[PATCH] Rewrite pic.md to improve medany and pic code size.

2018-08-28 Thread Jim Wilson
The pic.md file has patterns used only for the medany code model and for pic
code.  They match an unsplit 2-instruction address load pattern followed by
a load or store instruction, and emit an assembler macro that expands to two
instructions.  This replaces 3 instructions with 2.  Unfortunately, there
are a lot of broken patterns in the file.  It has things like
   (sign_extend:ANYI (mem:ANYI ...))
which can't work, as the sign_extend needs to be a larger mode than the mem.
There are also a lot of missing patterns.  It has patterns for sign and zero
extending loads, but not regular loads for instance.  Fixing these problems
required a major rewrite of the file.  While doing this, I noticed a case
that results in worse code, and had to change address costs to fix that.
I also ended up adding a number of new mode attributes and mode iterators
to support the new pic.md file.

This was tested with cross rv{32,64}-{elf,linux} builds and checks using the
medany code model.  There were no regressions.  I'm seeing about a 0.3% to 0.4%
code size reduction for newlib/glibc libc.a/libc.so.  This was also tested
with a cross kernel build and boot on qemu to verify that I didn't break kernel
builds.

Committed.

Jim

gcc/
* config/riscv/pic.md: Rewrite.
* config/riscv/riscv.c (riscv_address_insns): Return cost of 3 for
invalid address.
* config/riscv/riscv.md (ZERO_EXTEND_LOAD): Delete.
(SOFTF, default_load, softload, softstore): New.
---
 gcc/config/riscv/pic.md   | 113 --
 gcc/config/riscv/riscv.c  |   8 ++-
 gcc/config/riscv/riscv.md |  16 +-
 3 files changed, 91 insertions(+), 46 deletions(-)

diff --git a/gcc/config/riscv/pic.md b/gcc/config/riscv/pic.md
index a4a9732656c..942502058e0 100644
--- a/gcc/config/riscv/pic.md
+++ b/gcc/config/riscv/pic.md
@@ -22,71 +22,100 @@
 ;; Simplify PIC loads to static variables.
 ;; These should go away once we figure out how to emit auipc discretely.
 
-(define_insn "*local_pic_load_s"
+(define_insn "*local_pic_load"
   [(set (match_operand:ANYI 0 "register_operand" "=r")
-   (sign_extend:ANYI (mem:ANYI (match_operand 1 "absolute_symbolic_operand" ""))))]
+   (mem:ANYI (match_operand 1 "absolute_symbolic_operand" "")))]
+  "USE_LOAD_ADDRESS_MACRO (operands[1])"
+  "\t%0,%1"
+  [(set (attr "length") (const_int 8))])
+
+(define_insn "*local_pic_load_s"
+  [(set (match_operand:SUPERQI 0 "register_operand" "=r")
+   (sign_extend:SUPERQI (mem:SUBX (match_operand 1 "absolute_symbolic_operand" ""))))]
   "USE_LOAD_ADDRESS_MACRO (operands[1])"
-  "\t%0,%1"
+  "\t%0,%1"
   [(set (attr "length") (const_int 8))])
 
 (define_insn "*local_pic_load_u"
-  [(set (match_operand:ZERO_EXTEND_LOAD 0 "register_operand" "=r")
-   (zero_extend:ZERO_EXTEND_LOAD (mem:ZERO_EXTEND_LOAD (match_operand 1 "absolute_symbolic_operand" ""))))]
+  [(set (match_operand:SUPERQI 0 "register_operand" "=r")
+   (zero_extend:SUPERQI (mem:SUBX (match_operand 1 "absolute_symbolic_operand" ""))))]
   "USE_LOAD_ADDRESS_MACRO (operands[1])"
-  "u\t%0,%1"
+  "u\t%0,%1"
   [(set (attr "length") (const_int 8))])
 
-(define_insn "*local_pic_load"
-  [(set (match_operand:ANYF 0 "register_operand" "=f")
+;; We can support ANYF loads into X register if there is no double support
+;; or if the target is 64-bit.
+
+(define_insn "*local_pic_load"
+  [(set (match_operand:ANYF 0 "register_operand" "=f,*r")
(mem:ANYF (match_operand 1 "absolute_symbolic_operand" "")))
-   (clobber (match_scratch:DI 2 "=r"))]
-  "TARGET_HARD_FLOAT && TARGET_64BIT && USE_LOAD_ADDRESS_MACRO (operands[1])"
-  "\t%0,%1,%2"
+   (clobber (match_scratch:P 2 "=r,X"))]
+  "TARGET_HARD_FLOAT && USE_LOAD_ADDRESS_MACRO (operands[1])
+   && (!TARGET_DOUBLE_FLOAT || TARGET_64BIT)"
+  "@
+   \t%0,%1,%2
+   \t%0,%1"
   [(set (attr "length") (const_int 8))])
 
-(define_insn "*local_pic_load"
+;; ??? For a 32-bit target with double float, a DF load into a X reg isn't
+;; supported.  ld is not valid in that case.  Punt for now.  Maybe add a split
+;; for this later.
+
+(define_insn "*local_pic_load_32d"
   [(set (match_operand:ANYF 0 "register_operand" "=f")
(mem:ANYF (match_operand 1 "absolute_symbolic_operand" "")))
-   (clobber (match_scratch:SI 2 "=r"))]
-  "TARGET_HARD_FLOAT && !TARGET_64BIT && USE_LOAD_ADDRESS_MACRO (operands[1])"
-  "\t%0,%1,%2"
+   (clobber (match_scratch:P 2 "=r"))]
+  "TARGET_HARD_FLOAT && USE_LOAD_ADDRESS_MACRO (operands[1])
+   && (TARGET_DOUBLE_FLOAT && !TARGET_64BIT)"
+  "\t%0,%1,%2"
   [(set (attr "length") (const_int 8))])
 
-(define_insn "*local_pic_loadu"
-  [(set (match_operand:SUPERQI 0 "register_operand" "=r")
-   (zero_extend:SUPERQI (mem:SUBX (match_operand 1 "absolute_symbolic_operand" ""))))]
-  "USE_LOAD_ADDRESS_MACRO (operands[1])"
-  "u\t%0,%1"
+(define_insn "*local_pic_load_sf"
+  [(set (match_operand:SOFTF 0 "register_operand" "=r")
+   (mem:SOFTF (match_operand 1 

Go patch committed: Remove hmap field from map types

2018-08-28 Thread Ian Lance Taylor
In https://golang.org/cl/91796 in the master Go repository the hmap
field was removed from map types.  While the Go frontend doesn't have
to be compatible, it may as well be.  This implements part of that
change for the Go frontend, just the compiler change and the required
libgo change.  This is in preparation for updating libgo to 1.11.
Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 263840)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-8deaafd14414bb5cbbdf3e2673f61b6d836d7d2a
+da249ffd264154cc992e76ff03f91f700d3bf53e
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/types.cc
===
--- gcc/go/gofrontend/types.cc  (revision 263749)
+++ gcc/go/gofrontend/types.cc  (working copy)
@@ -7975,12 +7975,11 @@ Map_type::make_map_type_descriptor_type(
   Type* bool_type = Type::lookup_bool_type();
 
   Struct_type* sf =
-   Type::make_builtin_struct_type(12,
+   Type::make_builtin_struct_type(11,
   "", tdt,
   "key", ptdt,
   "elem", ptdt,
   "bucket", ptdt,
-  "hmap", ptdt,
   "keysize", uint8_type,
   "indirectkey", bool_type,
   "valuesize", uint8_type,
@@ -8065,11 +8064,6 @@ Map_type::do_type_descriptor(Gogo* gogo,
   vals->push_back(Expression::make_type_descriptor(bucket_type, bloc));
 
   ++p;
-  go_assert(p->is_field_name("hmap"));
-  Type* hmap_type = this->hmap_type(bucket_type);
-  vals->push_back(Expression::make_type_descriptor(hmap_type, bloc));
-
-  ++p;
   go_assert(p->is_field_name("keysize"));
   if (keysize > Map_type::max_key_size)
 vals->push_back(Expression::make_integer_int64(ptrsize, uint8_type, bloc));
Index: libgo/go/reflect/type.go
===
--- libgo/go/reflect/type.go(revision 263749)
+++ libgo/go/reflect/type.go(working copy)
@@ -351,7 +351,6 @@ type mapType struct {
key   *rtype // map key type
elem  *rtype // map element (value) type
bucket*rtype // internal bucket structure
-   hmap  *rtype // internal map header
keysize   uint8  // size of key slot
indirectkey   uint8  // store ptr to key instead of key itself
valuesize uint8  // size of value slot
Index: libgo/go/runtime/hashmap.go
===
--- libgo/go/runtime/hashmap.go (revision 263749)
+++ libgo/go/runtime/hashmap.go (working copy)
@@ -311,20 +311,13 @@ func makemap_small() *hmap {
 // If h != nil, the map can be created directly in h.
 // If h.buckets != nil, bucket pointed to can be used as the first bucket.
 func makemap(t *maptype, hint int, h *hmap) *hmap {
-   // The size of hmap should be 48 bytes on 64 bit
-   // and 28 bytes on 32 bit platforms.
-   if sz := unsafe.Sizeof(hmap{}); sz != 8+5*sys.PtrSize {
-   println("runtime: sizeof(hmap) =", sz, ", t.hmap.size =", t.hmap.size)
-   throw("bad hmap size")
-   }
-
if hint < 0 || hint > int(maxSliceCap(t.bucket.size)) {
hint = 0
}
 
// initialize Hmap
if h == nil {
-   h = (*hmap)(newobject(t.hmap))
+   h = new(hmap)
}
h.hash0 = fastrand()
 
@@ -1210,11 +1203,6 @@ func ismapkey(t *_type) bool {
 
 //go:linkname reflect_makemap reflect.makemap
 func reflect_makemap(t *maptype, cap int) *hmap {
-   // Check invariants and reflects math.
-   if sz := unsafe.Sizeof(hmap{}); sz != t.hmap.size {
-   println("runtime: sizeof(hmap) =", sz, ", t.hmap.size =", t.hmap.size)
-   throw("bad hmap size")
-   }
if !ismapkey(t.key) {
throw("runtime.reflect_makemap: unsupported map key type")
}
Index: libgo/go/runtime/type.go
===
--- libgo/go/runtime/type.go(revision 263749)
+++ libgo/go/runtime/type.go(working copy)
@@ -72,7 +72,6 @@ type maptype struct {
key   *_type
elem  *_type
bucket*_type // internal type representing a hash bucket
-   hmap  *_type // internal type representing a hmap
keysize   uint8  // size of key slot
indirectkey   bool   // store ptr to key instead of key itself
valuesize uint8  // size of value slot


Re: [PATCH] avoid warning on constant strncpy until next statement is reachable (PR 87028)

2018-08-28 Thread Martin Sebor

Sadly, dstbase is the PARM_DECL for d.  That's where things are going
"wrong".  Not sure why you're getting the PARM_DECL in that case.  I'd
debug get_addr_base_and_unit_offset to understand what's going on.
Essentially you're getting different results of
get_addr_base_and_unit_offset in a case where they arguably should be
the same.


Probably get_attr_nonstring_decl has the same "mistake" and returns
the PARM_DECL instead of the SSA name pointer.  So we're comparing
apples and oranges here.


Returning the SSA_NAME_VAR from get_attr_nonstring_decl() is
intentional but the function need not (perhaps should not)
also set *REF to it.



Yeah:

/* If EXPR refers to a character array or pointer declared attribute
   nonstring return a decl for that array or pointer and set *REF to
   the referenced enclosing object or pointer.  Otherwise returns
   null.  */

tree
get_attr_nonstring_decl (tree expr, tree *ref)
{
  tree decl = expr;
  if (TREE_CODE (decl) == SSA_NAME)
{
  gimple *def = SSA_NAME_DEF_STMT (decl);

  if (is_gimple_assign (def))
{
  tree_code code = gimple_assign_rhs_code (def);
  if (code == ADDR_EXPR
  || code == COMPONENT_REF
  || code == VAR_DECL)
decl = gimple_assign_rhs1 (def);
}
  else if (tree var = SSA_NAME_VAR (decl))
decl = var;
}

  if (TREE_CODE (decl) == ADDR_EXPR)
decl = TREE_OPERAND (decl, 0);

  if (ref)
*ref = decl;

I see a lot of "magic" here again in the attempt to "propagate"
a nonstring attribute.


That's the function's purpose: to look for the attribute.  Is
there a better way to do this?


Note

foo (char *p __attribute__(("nonstring")))
{
  p = "bar";
  strlen (p); // or whatever is necessary to call get_attr_nonstring_decl
}

is perfectly valid and p as passed to strlen is _not_ nonstring(?).


I don't know if you're saying that it should get a warning or
shouldn't.  Right now it doesn't because the strlen() call is
folded before we check for nonstring.

I could see an argument for diagnosing it but I suspect you
wouldn't like it because it would mean more warning from
the folder.  I could also see an argument against it because,
as you said, it's safe.

If you take the assignment to p away then a warning is issued,
and that's because p is declared with attribute nonstring.
That's also why get_attr_nonstring_decl looks at SSA_NAME_VAR.


I think in your code comparing bases you want to look at the _original_
argument to the string function rather than what get_attr_nonstring_decl
returned as ref.


I've adjusted get_attr_nonstring_decl() to avoid setting *REF
to SSA_NAME_VAR.  That let me remove the GIMPLE_NOP code from
the patch.  I've also updated the comment above SSA_NAME_VAR
to clarify its purpose per Jeff's comments.

Attached is an updated revision with these changes.

Martin
PR tree-optimization/87028 - false positive -Wstringop-truncation strncpy with global variable source string
gcc/ChangeLog:

	PR tree-optimization/87028
	* calls.c (get_attr_nonstring_decl): Avoid setting *REF to
	SSA_NAME_VAR.
	* gimple-fold.c (gimple_fold_builtin_strncpy): Avoid folding
	when statement doesn't belong to a basic block.
	* tree.h (SSA_NAME_VAR): Update comment.
	* tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Simplify.

gcc/testsuite/ChangeLog:

	PR tree-optimization/87028
	* c-c++-common/Wstringop-truncation.c: Remove xfails.
	* gcc.dg/Wstringop-truncation-5.c: New test.

Index: gcc/tree.h
===
--- gcc/tree.h	(revision 263925)
+++ gcc/tree.h	(working copy)
@@ -1697,7 +1697,10 @@ extern tree maybe_wrap_with_location (tree, locati
: NULL_TREE)
 
 /* Returns the variable being referenced.  This can be NULL_TREE for
-   temporaries not associated with any user variable.
+   temporaries not associated with any user variable.  The result
+   is mainly useful for debugging, diagnostics, or as the target
+   declaration referenced by an SSA_NAME.  Otherwise, because
+   it has no dataflow information, it should not be used.
Once released, this is the only field that can be relied upon.  */
 #define SSA_NAME_VAR(NODE)	\
   (SSA_NAME_CHECK (NODE)->ssa_name.var == NULL_TREE		\
Index: gcc/calls.c
===
--- gcc/calls.c	(revision 263928)
+++ gcc/calls.c	(working copy)
@@ -1503,6 +1503,7 @@ tree
 get_attr_nonstring_decl (tree expr, tree *ref)
 {
   tree decl = expr;
+  tree var = NULL_TREE;
   if (TREE_CODE (decl) == SSA_NAME)
 {
   gimple *def = SSA_NAME_DEF_STMT (decl);
@@ -1515,17 +1516,25 @@ get_attr_nonstring_decl (tree expr, tree *ref)
 	  || code == VAR_DECL)
 	decl = gimple_assign_rhs1 (def);
 	}
-  else if (tree var = SSA_NAME_VAR (decl))
-	decl = var;
+  else
+	var = SSA_NAME_VAR (decl);
 }
 
   if (TREE_CODE (decl) == ADDR_EXPR)
 decl = TREE_OPERAND (decl, 0);
 
+  /* To simplify calling code, store the 

Re: [5/6] Insert pattern statements into vec_basic_blocks

2018-08-28 Thread Jeff Law
On 08/28/2018 05:24 AM, Richard Sandiford wrote:
> The point of this patch is to put pattern statements in the same
> vec_basic_block as the statements they replace, with the pattern
> statements for S coming between S and S's original predecessor.
> This removes the need to handle them specially in various places.
> 
> 
> 2018-08-28  Richard Sandiford  
> 
> gcc/
>   * tree-vectorizer.h (vec_basic_block): Expand comment.
>   (_stmt_vec_info::pattern_def_seq): Delete.
>   (STMT_VINFO_PATTERN_DEF_SEQ): Likewise.
>   (is_main_pattern_stmt_p): New function.
>   * tree-vect-loop.c (vect_determine_vf_for_stmt_1): Rename to...
>   (vect_determine_vf_for_stmt): ...this, deleting the original
>   function with this name.  Remove vectype_maybe_set_p argument
>   and test is_pattern_stmt_p instead.  Retain the "examining..."
>   message from the previous vect_determine_vf_for_stmt.
>   (vect_compute_single_scalar_iteration_cost, vect_update_vf_for_slp)
>   (vect_analyze_loop_2): Don't treat pattern statements specially.
>   (vect_transform_loop): Likewise.  Use vect_orig_stmt to find the
>   insertion point.
>   * tree-vect-slp.c (vect_detect_hybrid_slp): Expect pattern statements
>   to be in the statement list, without needing to follow
>   STMT_VINFO_RELATED_STMT.  Remove PATTERN_DEF_SEQ handling.
>   * tree-vect-stmts.c (vect_analyze_stmt): Don't handle pattern
>   statements specially.
>   (vect_remove_dead_scalar_stmts): Ignore pattern statements.
>   * tree-vect-patterns.c (vect_set_pattern_stmt): Insert the pattern
>   statement into the vec_basic_block immediately before the statement
>   it replaces.
>   (append_pattern_def_seq): Likewise.  If the original statement is
>   itself a pattern statement, associate the new one with the original
>   statement.
>   (vect_split_statement): Use append_pattern_def_seq to insert the
>   first pattern statement.
>   (vect_recog_vector_vector_shift_pattern): Remove mention of
>   STMT_VINFO_PATTERN_DEF_SEQ.
>   (adjust_bool_stmts): Get the last pattern statement from the
>   stmt_vec_info chain.
>   (vect_mark_pattern_stmts): Rename to...
>   (vect_replace_stmt_with_pattern): ...this.  Remove the
>   PATTERN_DEF_SEQ handling and process only the pattern statement given.
>   Use append_pattern_def_seq when replacing a pattern statement with
>   another pattern statement, and use vec_basic_block::remove instead
>   of gsi_remove to remove the old one.
>   (vect_pattern_recog_1): Update accordingly.  Remove PATTERN_DEF_SEQ
>   handling.  On failure, remove any half-formed pattern sequence from
>   the vec_basic_block.  Install the vector type in pattern statements
>   that don't yet have one.
>   (vect_pattern_recog): Iterate over statements that are added
>   by previous recognizers, but skipping those that have already
>   been replaced, or the main pattern statement in such a replacement.
Nice cleanup.  OK.
jeff


Re: [PATCH] convert MIN_EXPR operands to the same type (PR 87059)

2018-08-28 Thread Jeff Law
On 08/27/2018 02:32 AM, Richard Biener wrote:
> On Sat, Aug 25, 2018 at 9:14 PM Jeff Law  wrote:
>>
>> On 08/24/2018 01:06 PM, Martin Sebor wrote:
>>> PR 87059 points out an ICE in the recently enhanced VRP code
>>> that was traced back to a MIN_EXPR built out of operands of
>>> types with different sign by expand_builtin_strncmp().
>>>
>>> The attached patch adjusts the function to make sure both
>>> operands have the same type, and to make these mismatches
>>> easier to detect, also adds an assertion to fold_binary_loc()
>>> for these expressions.
>>>
>>> Bootstrapped on x86_64-linux.
>>>
>>> Martin
>>>
>>> PS Aldy, I have not tested this on powerpc64le.
>>>
>>> gcc-87059.diff
>>>
>>>
>>> PR tree-optimization/87059 - internal compiler error: in set_value_range
>>>
>>> gcc/ChangeLog:
>>>
>>>   PR tree-optimization/87059
>>>   * builtins.c (expand_builtin_strncmp): Convert MIN_EXPR operand
>>>   to the same type as the other.
>>>   * fold-const.c (fold_binary_loc): Assert expectation.
>> I bootstrapped (but did not regression test) this on ppc64le and also
>> built the linux kernel (which is where my tester tripped over this problem).
>>
>> Approved and installed on the trunk.
> 
> Please remove the assertion in fold_binary_loc again, we do not do this kind
> of assertions there.
Done after a bootstrap and regression test on x86.

jeff


Re: [4/6] Make the vectoriser do its own DCE

2018-08-28 Thread Jeff Law
On 08/28/2018 05:23 AM, Richard Sandiford wrote:
> The vectoriser normally leaves a later DCE pass to remove the scalar
> code, but we've accumulated various special cases for things that
> DCE can't handle, such as removing the scalar stores that have been
> replaced by vector stores, and the scalar calls to internal functions.
> (The latter must be removed for correctness, since no underlying scalar
> optabs exist for those calls.)
> 
> Now that vec_basic_block gives us an easy way of iterating over the
> original scalar code (ignoring any new code inserted by the vectoriser),
> it seems easier to do the DCE directly.  This involves marking the few
> cases in which the vector code needs part of the original scalar code
> to be kept around.
> 
> 
> 2018-08-28  Richard Sandiford  
> 
> gcc/
>   * tree-vectorizer.h (_stmt_vec_info::used_by_vector_code_p): New
>   member variable.
>   (vect_mark_used_by_vector_code): Declare.
>   (vect_remove_dead_scalar_stmts): Likewise.
>   (vect_transform_stmt): Return void.
>   (vect_remove_stores): Delete.
>   * tree-vectorizer.c (vec_info::remove_stmt): Handle phis.
>   * tree-vect-stmts.c (vect_mark_used_by_vector_code): New function.
>   (vectorizable_call, vectorizable_simd_clone_call): Don't remove
>   scalar calls here.
>   (vectorizable_shift): Mark shift amounts in a vector-scalar shift
>   as being used by the vector code.
>   (vectorizable_load): Mark unhoisted scalar loads that feed a
>   load-and-broadcast operation as being needed by the vector code.
>   (vect_transform_stmt): Return void.
>   (vect_remove_stores): Delete.
>   (vect_maybe_remove_scalar_stmt): New function.
>   (vect_remove_dead_scalar_stmts): Likewise.
>   * tree-vect-slp.c (vect_slp_bb): Call vect_remove_dead_scalar_stmts.
>   (vect_remove_slp_scalar_calls): Delete.
>   (vect_schedule_slp): Don't call it.  Don't remove scalar stores here.
>   * tree-vect-loop.c (vectorizable_reduction): Mark scalar phis that
>   are retained by the vector code.
>   (vectorizable_live_operation): Mark scalar live-out statements that
>   are retained by the vector code.
>   (vect_transform_loop_stmt): Remove seen_store argument.  Mark gconds
>   in nested loops as being needed by the vector code.  Replace the
>   outer loop's gcond with a dummy condition.
>   (vect_transform_loop): Update calls accordingly.  Don't remove
>   scalar stores or calls here; call vect_remove_dead_scalar_stmts
>   instead.
>   * tree-vect-loop-manip.c (vect_update_ivs_after_vectorizer): Don't
>   remove PHIs that were classified as reductions but never actually
>   vectorized.
Presumably you'll look at killing the actual DCE pass we're running as a
follow-up.

I'm hoping this will help the tendency of the vectorizer to leak
SSA_NAMEs.  Though it's late enough in the pipeline that leaking isn't
that bad.





> Index: gcc/tree-vectorizer.c
> ===
> --- gcc/tree-vectorizer.c 2018-08-28 12:05:11.466982927 +0100
> +++ gcc/tree-vectorizer.c 2018-08-28 12:05:14.014961439 +0100
> @@ -654,8 +654,13 @@ vec_info::remove_stmt (stmt_vec_info stm
>set_vinfo_for_stmt (stmt_info->stmt, NULL);
>gimple_stmt_iterator si = gsi_for_stmt (stmt_info->stmt);
>unlink_stmt_vdef (stmt_info->stmt);
> -  gsi_remove (&si, true);
> -  release_defs (stmt_info->stmt);
> +  if (is_a <gphi *> (stmt_info->stmt))
> +    remove_phi_node (&si, true);
> +  else
> +    {
> +      gsi_remove (&si, true);
> +      release_defs (stmt_info->stmt);
> +    }
More than once I've wondered if we should factor this into its own
little function.  I'm pretty sure I've seen this style of code
elsewhere.  Your call whether or not to clean that up.
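Such a helper might look like this (a sketch against GCC's internal gimple API with a hypothetical name, not a tested patch):

```c
/* Remove STMT from its containing basic block.  PHIs need
   remove_phi_node; other statements are unlinked with gsi_remove and
   their SSA definitions released.  */
static void
remove_stmt_and_release_defs (gimple *stmt)
{
  gimple_stmt_iterator si = gsi_for_stmt (stmt);
  if (is_a <gphi *> (stmt))
    remove_phi_node (&si, true);
  else
    {
      gsi_remove (&si, true);
      release_defs (stmt);
    }
}
```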

OK with or without creating a little helper for the code above.
jeff


Re: [PATCH, rs6000] inline expansion of str[n]cmp using vec/vsx instructions

2018-08-28 Thread Segher Boessenkool
Hi Aaron,

On Wed, Aug 22, 2018 at 12:31:51PM -0500, Aaron Sawdey wrote:
> This patch teaches rs6000 inline expansion of strcmp/strncmp how to
> generate vector/vsx code for power8/power9 processors. Compares of 16
> bytes and longer are generated using the vector code, which is
> considerably faster than the gpr-based code. 

:-)

>   (expand_strn_compare): Support for vector strncmp.

"Add support"?

> -(define_insn "*altivec_vcmpequ_p"
> +(define_insn "altivec_vcmpequ_p"
>[(set (reg:CC CR6_REGNO)
>   (unspec:CC [(eq:CC (match_operand:VI2 1 "register_operand" "v")
>  (match_operand:VI2 2 "register_operand" "v"))]

You missed this one in the changelog.

> --- gcc/config/rs6000/rs6000-string.c (revision 263753)
> +++ gcc/config/rs6000/rs6000-string.c (working copy)
> @@ -157,6 +157,29 @@
>  {
>switch (GET_MODE (reg))
>  {
> +case E_V16QImode:
> +  switch (mode) {
> +  case E_V16QImode:
> + if (!BYTES_BIG_ENDIAN) 

This has a trailing space.

> +   if (TARGET_P9_VECTOR)
> + emit_insn (gen_vsx_ld_elemrev_v16qi_internal (reg, mem));
> +   else
> + {
> +   rtx reg_v2di = simplify_gen_subreg (V2DImode, reg, V16QImode, 0);
> +   gcc_assert (MEM_P (mem));
> +   rtx addr = XEXP (mem, 0);
> +   rtx mem_v2di = gen_rtx_MEM (V2DImode, addr);
> +   MEM_COPY_ATTRIBUTES (mem_v2di, mem);
> +   set_mem_size (mem, GET_MODE_SIZE (V2DImode));
> +   emit_insn (gen_vsx_ld_elemrev_v2di (reg_v2di, mem_v2di));
> + }
> + else

If you have to outdent more than one, you usually should put the inner if
in a block of its own.  Another option is to make a helper function, if you
can come up with something for which you can make a sane and short name.

> +   emit_insn (gen_vsx_movv2di_64bit (reg, mem));
> + break;
> +  default:
> + gcc_unreachable ();
> +  }
> +  break;

This cannot be correct (same indent for break and preceding } ).

> +  switch (mode)
> + {
> + case E_QImode:
> +   emit_move_insn (reg, mem);
> +   break;
> + default:
> +   debug_rtx(reg);
> +   gcc_unreachable ();
> +   break;
> + }

Unless you have a reason you want to keep debug_rtx here, why not just do

gcc_assert (mode == E_QImode);
emit_move_insn (reg, mem);

> +static void
> +expand_strncmp_vec_sequence(unsigned HOST_WIDE_INT bytes_to_compare,
> + rtx orig_src1, rtx orig_src2,
> + rtx s1addr, rtx s2addr, rtx off_reg,
> + rtx s1data, rtx s2data,
> + rtx vec_result, bool equality_compare_rest,
> + rtx &cleanup_label, rtx final_move_label)

Space before (.  Wow what a monster :-)

Why use a reference instead of a pointer for cleanup_label?  Will that work
even, it seems to allow NULL?

> +  unsigned int i;
> +  rtx zr[16];
> +  for (i = 0; i < 16; i++)
> +zr[i] = GEN_INT (0);
> +  rtvec zv = gen_rtvec_v (16, zr);
> +  rtx zero_reg = gen_reg_rtx (V16QImode);
> +  rs6000_expand_vector_init (zero_reg, gen_rtx_PARALLEL (V16QImode, zv));

Is there no helper for this already?  Should there be?

> +   if (extra_bytes < offset)
> + {
> +   offset -= extra_bytes;
> +   cmp_bytes = load_mode_size;
> +   bytes_to_compare = cmp_bytes;
> + }
> +   else
> + /* We do not call this with less than 16 bytes.  */
> + gcc_unreachable ();

  gcc_assert (offset > extra_bytes);
  offset -= extra_bytes;
  cmp_bytes = load_mode_size;
  bytes_to_compare = cmp_bytes;

> +  /* Is it OK to use vec/vsx for this. TARGET_VSX means we have at
> + least POWER7 but we use TARGET_EFFICIENT_UNALIGNED_VSX which is
> + at least POWER8. That way we can rely on overlapping compares to
> + do the final comparison of less than 16 bytes. Also I do not want
> + to deal with making this work for 32 bits.  */

Two spaces after full stop.

> +  if (use_vec)
> +{
> +  s1addr = gen_reg_rtx (Pmode);
> +  s2addr = gen_reg_rtx (Pmode);
> +  off_reg = gen_reg_rtx (Pmode);
> +  vec_result = gen_reg_rtx (load_mode);
> +  emit_move_insn (result_reg, GEN_INT (0));
> +  expand_strncmp_vec_sequence(compare_length,

Space before (.

Please fix these nits, and then it is okay for trunk.  Thanks!


Segher


Re: [GCC][PATCH][Aarch64] Added pattern to match zero extended bfxil

2018-08-28 Thread James Greenhalgh
On Tue, Jul 31, 2018 at 04:38:43AM -0500, Sam Tebbs wrote:
> Hi all,
> 
> This patch captures the case where an unnecessary uxtw instruction is 
> generated
> after a bfxil instruction when in SI mode, and stops it from being 
> generated.
> Note that this depends on my previous patch submission
> (https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01148.html).
> 
> For example:
> 
> unsigned long
> combine_balanced_int (unsigned int a, unsigned int b)
> {
>    return (a & 0xffff0000ll) | (b & 0x0000ffffll);
> }
> 
> Would generate:
> 
> combine_balanced_int:
>      bfxil   w0, w1, 0, 16
>      uxtw    x0, w0
>      ret
> 
> But with this patch generates:
> 
> combine_balanced_int:
>      bfxil   w0, w1, 0, 16
>      ret
> 
> Bootstrapped on aarch64-none-linux-gnu and regtested on aarch64-none-elf 
> with
> no regressions.

OK once the other one is approved, with the obvious rebase over the renamed
function.

James

> gcc/
> 2018-07-31  Sam Tebbs  
> 
>      PR target/85628
>      * config/aarch64/aarch64.md (*aarch64_bfxilsi_uxtw): Define.
> 
> gcc/testsuite
> 2018-07-31  Sam Tebbs  
> 
>      PR target/85628
>      * gcc.target/aarch64/combine_bfxil.c 
> (combine_zero_extended_int, foo6):
>      New functions.



Re: [GCC][PATCH v2][Aarch64] Exploiting BFXIL when OR-ing two AND-operations with appropriate bitmasks

2018-08-28 Thread James Greenhalgh
On Wed, Aug 01, 2018 at 10:07:23AM -0500, Sam Tebbs wrote:
> Hi all,
> 
> This patch adds an optimisation that exploits the AArch64 BFXIL
> instruction when or-ing the result of two bitwise and operations
> with non-overlapping bitmasks
> (e.g. (a & 0xffffffff00000000ll) | (b & 0x00000000ffffffffll)).
> 
> Example:
> 
> unsigned long long combine(unsigned long long a, unsigned long
> long b) {
>    return (a & 0xffffffff00000000ll) | (b & 0x00000000ffffffffll);
> }
> 
> void read(unsigned long long a, unsigned long long b, unsigned
> long long *c) {
>    *c = combine(a, b);
> }
> 
> When compiled with -O2, read would result in:
> 
> read:
>    and   x5, x1, #0x
>    and   x4, x0, #0x
>    orr   x4, x4, x5
>    str   x4, [x2]
>    ret
> 
> But with this patch results in:
> 
> read:
>    mov    x4, x0
>    bfxil    x4, x1, 0, 32
>    str    x4, [x2]
>    ret
> 
> Bootstrapped and regtested on aarch64-none-linux-gnu and
> aarch64-none-elf with no regressions.
> 
> 
> gcc/
> 2018-08-01  Sam Tebbs
> 
>      PR target/85628
>      * config/aarch64/aarch64.md (*aarch64_bfxil):
>      Define.
>      * config/aarch64/constraints.md (Ulc): Define
>      * config/aarch64/aarch64-protos.h
>  (aarch64_is_left_consecutive): Define.

Hm, I'm not very sure about the naming here; "left consecutive" isn't a
common phrase to denote the mask you're looking for (exact_log2 (-i) != -1
if I'm reading right), and is misleading 0x is 'left consecutive'
too, just with zeroes rather than ones.

>      * config/aarch64/aarch64.c (aarch64_is_left_consecutive):
>  New function.
> 
> gcc/testsuite
> 2018-08-01  Sam Tebbs
> 
>      PR target/85628
>      * gcc.target/aarch64/combine_bfxil.c: New file.
>      * gcc.target/aarch64/combine_bfxil_2.c: New file.
> 

> diff --git a/gcc/config/aarch64/aarch64-protos.h 
> b/gcc/config/aarch64/aarch64-protos.h
> index 
> af5db9c595385f7586692258f750b6aceb3ed9c8..01d9e1bd634572fcfa60208ba4dc541805af5ccd
>  100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -574,4 +574,6 @@ rtl_opt_pass *make_pass_fma_steering (gcc::context *ctxt);
>  
>  poly_uint64 aarch64_regmode_natural_size (machine_mode);
>  
> +bool aarch64_is_left_consecutive (HOST_WIDE_INT);
> +
>  #endif /* GCC_AARCH64_PROTOS_H */
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 
> fa01475aa9ee579b6a3b2526295b622157120660..3cfa51b15af3e241672f1383cf881c12a44494a5
>  100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -1454,6 +1454,14 @@ aarch64_hard_regno_caller_save_mode (unsigned regno, 
> unsigned,
>  return SImode;
>  }
>  
> +/* Implement IS_LEFT_CONSECUTIVE.  Check if I's bits are consecutive

What is IS_LEFT_CONSECUTIVE - I don't see it elsewhere in the GCC code, so
what does the comment refer to implementing?

> +   ones from the MSB.  */
> +bool
> +aarch64_is_left_consecutive (HOST_WIDE_INT i)
> +{
> +  return (i | (i - 1)) == HOST_WIDE_INT_M1;

exact_log2(-i) != HOST_WIDE_INT_M1?

I don't have issues with the rest of the patch; but please try to find a more
descriptive name for this.

Thanks,
James



[C++ PATCH] Fix -fsanitize=vptr -fno-sanitize-recover=vptr (PR c++/87095)

2018-08-28 Thread Jakub Jelinek
Hi!

As mentioned in the PR, if some class has nearly empty virtual base as
primary base, it shares the vptr with that primary base; in that case,
we can't clear the vptr in the not in charge destructor of that class,
as the dtor for the virtual base will be invoked later on and needs to have
the virtual pointer non-NULL.

This version differs from the originally pasted into the PR by the
convert_to_void, so that with -O0 we don't emit an unnecessary load from the
vptr after the store (+ testcase).

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk and after a while release branches?

2018-08-28  Jakub Jelinek  

PR c++/87095
* decl.c (begin_destructor_body): If current_class_type has
virtual bases and the primary base is nearly empty virtual base,
voidify clearing of vptr and make it conditional on in-charge
argument.

* g++.dg/ubsan/vptr-13.C: New test.

--- gcc/cp/decl.c.jj2018-08-27 17:50:43.781489594 +0200
+++ gcc/cp/decl.c   2018-08-28 14:41:01.354365914 +0200
@@ -15696,6 +15696,18 @@ begin_destructor_body (void)
tree stmt = cp_build_modify_expr (input_location, vtbl_ptr,
  NOP_EXPR, vtbl,
  tf_warning_or_error);
+   /* If the vptr is shared with some virtual nearly empty base,
+  don't clear it if not in charge, the dtor of the virtual
+  nearly empty base will do that later.  */
+   if (CLASSTYPE_VBASECLASSES (current_class_type)
+   && CLASSTYPE_PRIMARY_BINFO (current_class_type)
+   && BINFO_VIRTUAL_P
+ (CLASSTYPE_PRIMARY_BINFO (current_class_type)))
+ {
+   stmt = convert_to_void (stmt, ICV_STATEMENT,
+   tf_warning_or_error);
+   stmt = build_if_in_charge (stmt);
+ }
finish_decl_cleanup (NULL_TREE, stmt);
  }
else
--- gcc/testsuite/g++.dg/ubsan/vptr-13.C.jj 2018-08-28 14:50:18.712112488 
+0200
+++ gcc/testsuite/g++.dg/ubsan/vptr-13.C2018-08-28 14:49:17.895122080 
+0200
@@ -0,0 +1,19 @@
+// PR c++/87095
+// { dg-do run }
+// { dg-options "-fsanitize=vptr -fno-sanitize-recover=vptr" }
+
+struct A
+{
+  virtual ~A () {}
+};
+
+struct B : virtual A {};
+
+struct C : B {};
+
+int
+main ()
+{
+  C c;
+  return 0;
+}

Jakub


Re: [3/6] Add a vec_basic_block structure

2018-08-28 Thread Jeff Law
On 08/28/2018 05:22 AM, Richard Sandiford wrote:
> This patch adds a vec_basic_block that records the scalar phis and
> scalar statements that we need to vectorise.  This is a slight
> simplification in its own right, since it avoids unnecessary statement
> lookups and shaves >50 LOC.  But the main reason for doing it is
> to allow the rest of the series to treat pattern statements less
> specially.
> 
> Putting phis (which are logically parallel) and normal statements
> (which are logically serial) into a single list might seem dangerous,
> but I think in practice it should be fine.  Very little vectoriser
> code needs to handle the parallel nature of phis specially, and code
> that does can still do so.  Having a single list simplifies code that
> wants to look at every scalar phi or stmt in isolation.
> 
> The new test is for cases in which we try to hoist the same expression
> twice, which I broke with an earlier version of the patch.
> 
> 
> 2018-08-28  Richard Sandiford  
> 
> gcc/
>   * tree-vectorizer.h (vec_basic_block): New structure.
>   (vec_info::blocks, _stmt_vec_info::block, _stmt_vec_info::prev)
>   (_stmt_vec_info::next): New member variables.
>   (FOR_EACH_VEC_BB_STMT, FOR_EACH_VEC_BB_STMT_REVERSE): New macros.
>   (vec_basic_block::vec_basic_block): New function.
>   * tree-vectorizer.c (vec_basic_block::add_to_end): Likewise.
>   (vec_basic_block::add_before): Likewise.
>   (vec_basic_block::remove): Likewise.
>   (vec_info::~vec_info): Free the vec_basic_blocks.
>   (vec_info::remove_stmt): Remove the statement from the containing
>   vec_basic_block.
>   * tree-vect-patterns.c (vect_determine_precisions)
>   (vect_pattern_recog): Iterate over vec_basic_blocks.
>   * tree-vect-loop.c (vect_determine_vectorization_factor)
>   (vect_compute_single_scalar_iteration_cost, vect_update_vf_for_slp)
>   (vect_analyze_loop_operations, vect_transform_loop): Likewise.
>   (_loop_vec_info::_loop_vec_info): Construct vec_basic_blocks.
>   * tree-vect-slp.c (_bb_vec_info::_bb_vec_info): Likewise.
>   (vect_detect_hybrid_slp): Iterate over vec_basic_blocks.
>   * tree-vect-stmts.c (vect_mark_stmts_to_be_vectorized): Likewise.
>   (vect_finish_replace_stmt, vectorizable_condition): Remove the original
>   statement from the containing block.
>   (hoist_defs_of_uses): Likewise the statement that we're hoisting.
> 
> gcc/testsuite/
>   * gcc.dg/vect/vect-hoist-1.c: New test.
> 
I'm generally not a fan of references to "this" like you do when you add
STMT_INFOs to a VEC_BASIC_BLOCK.  I'm going to trust you're not leaving
dangling pointers in the STMT_INFOs and that storing a pointer to the
containing VEC_BASIC_BLOCK within the STMT_INFO is the best way to get
at the data you want.

Firstly standard doubly linked list routines, lots of fairly mechanical
chunks get simplified a bit.   Overall it looks quite reasonable.

OK for the trunk.

jeff



[C++ PATCH] Fix structured binding range for in generic lambda inside of template (PR c++/87122)

2018-08-28 Thread Jakub Jelinek
Hi!

The following testcase ICEs in tsubst_decomp_names, because the earlier
tsubst_expr still with processing_template_decl called just
tsubst_decomp_names, but didn't arrange for DECL_VALUE_EXPR to be set on the
decomp identifier decls (which cp_finish_decomp does).

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk and 8.x?

2018-08-28  Jakub Jelinek  

PR c++/87122
* pt.c (tsubst_expr) : If
processing_template_decl and decl is structured binding decl, call
cp_finish_decomp.

* g++.dg/cpp1z/decomp47.C: New test.

--- gcc/cp/pt.c.jj  2018-08-27 17:50:43.786489511 +0200
+++ gcc/cp/pt.c 2018-08-28 11:41:52.020677695 +0200
@@ -16832,6 +16832,8 @@ tsubst_expr (tree t, tree args, tsubst_f
RANGE_FOR_IVDEP (stmt) = RANGE_FOR_IVDEP (t);
RANGE_FOR_UNROLL (stmt) = RANGE_FOR_UNROLL (t);
finish_range_for_decl (stmt, decl, expr);
+   if (decomp_first && decl != error_mark_node)
+ cp_finish_decomp (decl, decomp_first, decomp_cnt);
  }
else
  {
--- gcc/testsuite/g++.dg/cpp1z/decomp47.C.jj2018-08-28 11:56:18.587275225 
+0200
+++ gcc/testsuite/g++.dg/cpp1z/decomp47.C   2018-08-28 11:59:50.858747080 
+0200
@@ -0,0 +1,32 @@
+// PR c++/87122
+// { dg-do run { target c++14 } }
+// { dg-options "" }
+
+extern "C" void abort ();
+struct S { int a, b; };
+int c;
+
+template <int N>
+void
+foo ()
+{
+  S x[4] = { { N, 2 }, { 3, 4 }, { 5, 6 }, { 7, 8 } };
+  auto f = [](auto & y) {
+for (auto & [ u, v ] : y)  // { dg-warning "structured bindings only 
available with" "" { target c++14_down } }
+  {
+   if ((u & 1) != 1 || v != u + 1 || u < N || u > 7 || (c & (1 << u))
+       || &u != &y[v / 2 - 1].a || &v != &y[v / 2 - 1].b)
+ abort ();
+   c |= 1 << u;
+  }
+  };
+  f (x);
+}
+
+int
+main ()
+{
+  foo<1> ();
+  if (c != 0xaa)
+abort ();
+}

Jakub


Re: [PATCH] avoid warning on constant strncpy until next statement is reachable (PR 87028)

2018-08-28 Thread Jeff Law
On 08/28/2018 02:43 PM, Martin Sebor wrote:
> On 08/27/2018 10:27 PM, Jeff Law wrote:
>> On 08/27/2018 10:27 AM, Martin Sebor wrote:
>>> On 08/27/2018 02:29 AM, Richard Biener wrote:
 On Sun, Aug 26, 2018 at 7:26 AM Jeff Law  wrote:
>
> On 08/24/2018 09:58 AM, Martin Sebor wrote:
>> The warning suppression for -Wstringop-truncation looks for
>> the next statement after a truncating strncpy to see if it
>> adds a terminating nul.  This only works when the next
>> statement can be reached using the Gimple statement iterator
>> which isn't until after gimplification.  As a result, strncpy
>> calls that truncate their constant argument that are being
>> folded to memcpy this early get diagnosed even if they are
>> followed by the nul assignment:
>>
>>   const char s[] = "12345";
>>   char d[3];
>>
>>   void f (void)
>>   {
>>     strncpy (d, s, sizeof d - 1);   // -Wstringop-truncation
>>     d[sizeof d - 1] = 0;
>>   }
>>
>> To avoid the warning I propose to defer folding strncpy to
>> memcpy until the pointer to the basic block the strnpy call
>> is in can be used to try to reach the next statement (this
>> happens as early as ccp1).  I'm aware of the preference to
>> fold things early but in the case of strncpy (a relatively
>> rarely used function that is often misused), getting
>> the warning right while folding a bit later but still fairly
>> early on seems like a reasonable compromise.  I fear that
>> otherwise, the false positives will drive users to adopt
>> other unsafe solutions (like memcpy) where these kinds of
>> bugs cannot be as readily detected.
>>
>> Tested on x86_64-linux.
>>
>> Martin
>>
>> PS There still are outstanding cases where the warning can
>> be avoided.  I xfailed them in the test for now but will
>> still try to get them to work for GCC 9.
>>
>> gcc-87028.diff
>>
>>
>> PR tree-optimization/87028 - false positive -Wstringop-truncation
>> strncpy with global variable source string
>> gcc/ChangeLog:
>>
>>   PR tree-optimization/87028
>>   * gimple-fold.c (gimple_fold_builtin_strncpy): Avoid folding
>> when
>>   statement doesn't belong to a basic block.
>>   * tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Handle
>> MEM_REF on
>>   the left hand side of assignment.
>>
>> gcc/testsuite/ChangeLog:
>>
>>   PR tree-optimization/87028
>>   * c-c++-common/Wstringop-truncation.c: Remove xfails.
>>   * gcc.dg/Wstringop-truncation-5.c: New test.
>>
>> diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
>> index 07341eb..284c2fb 100644
>> --- a/gcc/gimple-fold.c
>> +++ b/gcc/gimple-fold.c
>> @@ -1702,6 +1702,11 @@ gimple_fold_builtin_strncpy
>> (gimple_stmt_iterator *gsi,
>>    if (tree_int_cst_lt (ssize, len))
>>  return false;
>>
>> +  /* Defer warning (and folding) until the next statement in the
>> basic
>> + block is reachable.  */
>> +  if (!gimple_bb (stmt))
>> +    return false;
> I think you want cfun->cfg as the test here.  They should be
> equivalent
> in practice.

 Please do not add 'cfun' references.  Note that the next stmt is also
 accessible
 when there is no CFG.  I guess the issue is that we fold this during
 gimplification
 where the next stmt is not yet "there" (but still in GENERIC)?

 We generally do not want to have unfolded stmts in the IL when we can
 avoid that
 which is why we fold most stmts during gimplification.  We also do
 that because
 we now do less folding on GENERIC.

 There may be the possibility to refactor gimplification time folding
 to what we
 do during inlining - queue stmts we want to fold and perform all
 folding delayed.
 This of course means bigger compile-time due to cache effects.

>
>> diff --git a/gcc/tree-ssa-strlen.c b/gcc/tree-ssa-strlen.c
>> index d0792aa..f1988f6 100644
>> --- a/gcc/tree-ssa-strlen.c
>> +++ b/gcc/tree-ssa-strlen.c
>> @@ -1981,6 +1981,23 @@ maybe_diag_stxncpy_trunc
>> (gimple_stmt_iterator gsi, tree src, tree cnt)
>>     && known_eq (dstoff, lhsoff)
>>     && operand_equal_p (dstbase, lhsbase, 0))
>>   return false;
>> +
>> +  if (code == MEM_REF
>> +   && TREE_CODE (lhsbase) == SSA_NAME
>> +   && known_eq (dstoff, lhsoff))
>> + {
>> +   /* Extract the referenced variable from something like
>> +    MEM[(char *)d_3(D) + 3B] = 0;  */
>> +   gimple *def = SSA_NAME_DEF_STMT (lhsbase);
>> +   if (gimple_nop_p (def))
>> + {
>> +   lhsbase = SSA_NAME_VAR (lhsbase);
>> +   if (lhsbase
>> +   && dstbase
>> +   && 

Re: [PATCH][GCC][AARCH64] Use stdint integers in vect_su_add_sub.c

2018-08-28 Thread James Greenhalgh
On Fri, Aug 03, 2018 at 11:28:08AM -0500, Matthew Malcomson wrote:
> On 02/08/18 20:18, James Greenhalgh wrote:
> > On Tue, Jul 31, 2018 at 04:53:19AM -0500, Matthew Malcomson wrote:
> >> Fixing the ilp32 issue that Christophe found.
> >>
> >> The existing testcase uses `long` to represent a 64 bit integer.
> >> This breaks when compiled using the `-mabi=ilp32` flag.
> >> We switch the use of int/long for int32_t/int64_t to avoid this problem
> >> and show the requirement of a widening operation more clearly.
> > Normally we try to avoid pulling more headers than we need in the testsuite.
> >
> > Have you seen this construct:
> >
> >typedef unsigned __attribute__((mode(DI))) uint64_t;
> >
> > It can be a good way to ensure you have the right type for the patterns you
> > are adding.
> >
> > Thanks,
> > James
> Thanks! No I hadn't seen that construct before.
> Have attached a patch using that construct.

OK.

Thanks,
James



Re: [PATCH, OpenACC] (2/2) Fix implicit mapping for array slices on lexically-enclosing data constructs (PR70828)

2018-08-28 Thread Cesar Philippidis
On 08/28/2018 02:32 PM, Julian Brown wrote:
> On Tue, 28 Aug 2018 12:23:22 -0700
> Cesar Philippidis  wrote:

>> This is specific to OpenACC, and needs to be guarded as such.
> 
> Are you sure that condition can be true for OpenMP? I'd assumed not...

My bad, you're correct. OMP doesn't use those GOMP_MAP_FORCE map types
anymore.

Cesar



Re: [PATCH] [AArch64, Falkor] Switch to using Falkor-specific vector costs

2018-08-28 Thread James Greenhalgh
On Mon, Aug 27, 2018 at 10:05:17AM -0500, Luis Machado wrote:
> Hi,
> 
> On 08/08/2018 04:54 AM, Siddhesh Poyarekar wrote:
> > On 08/01/2018 04:23 AM, James Greenhalgh wrote:
> >> On Wed, Jul 25, 2018 at 01:10:34PM -0500, Luis Machado wrote:
> >>> The adjusted vector costs give Falkor a reasonable boost in 
> >>> performance for FP
> >>> benchmarks (both CPU2017 and CPU2006) and doesn't change INT 
> >>> benchmarks that
> >>> much. About 0.7% for CPU2017 FP and 1.54% for CPU2006 FP.
> >>>
> >>> OK for trunk?
> >>
> >> OK if this is what works best for your subtarget.
> >>
> > 
> > I have pushed this for Luis since he is on holiday.
> > 
> > Thanks,
> > Siddhesh
> 
> Given this has a significant performance impact on Falkor and the 
> changes are very localized and specific to Falkor, it would be great to 
> have them on GCC 8. Would it be ok to push it?

I've not got a problem with this if your opinion is that the risk is
worthwhile for Falkor.

Thanks,
James



Re: [PATCH] [AArch64, Falkor] Adjust Falkor's sign extend reg+reg address cost

2018-08-28 Thread James Greenhalgh
On Mon, Aug 27, 2018 at 10:05:21AM -0500, Luis Machado wrote:
> Hi,
> 
> On 08/08/2018 04:46 AM, Siddhesh Poyarekar wrote:
> > On 08/01/2018 04:24 AM, James Greenhalgh wrote:
> >> OK if this is what is best for your subtarget.
> >>
> > 
> > I have pushed this on behalf of Luis since he is on holiday.
> > 
> > Thanks,
> > Siddhesh
> 
> Similarly to the vector cost changes, we've also noticed a non-trivial 
> performance impact from this change, on top of fixing a testsuite failure.
> 
> This is also Falkor specific. Would it be ok to push it to GCC 8?

Likewise for this one. If you're happy taking the risk of disruption for the
subtarget, I'm happy with the backport.

James



Re: [PATCH][AArch64] Implement new intrinsics vabsd_s64 and vnegd_s64

2018-08-28 Thread James Greenhalgh
On Tue, Aug 28, 2018 at 03:59:25AM -0500, Vlad Lazar wrote:
> Gentle ping.
> 
> On 08/08/18 17:38, Vlad Lazar wrote:
> > On 01/08/18 18:35, James Greenhalgh wrote:
> >> On Wed, Aug 01, 2018 at 07:13:53AM -0500, Vlad Lazar wrote:
> >>> On 31/07/18 22:48, James Greenhalgh wrote:
>  On Fri, Jul 20, 2018 at 04:37:34AM -0500, Vlad Lazar wrote:
> > Hi,
> >
> > The patch adds implementations for the NEON intrinsics vabsd_s64 and 
> > vnegd_s64.
> > (https://developer.arm.com/products/architecture/cpu-architecture/a-profile/docs/ihi0073/latest/arm-neon-intrinsics-reference-architecture-specification)
> >
> > Bootstrapped and regtested on aarch64-none-linux-gnu and there are no 
> > regressions.
> >
> > OK for trunk?
> >
> > +__extension__ extern __inline int64_t
> > +__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> > +vnegd_s64 (int64_t __a)
> > +{
> > +  return -__a;
> > +}
> 
>  Does this give the correct behaviour for the minimum value of int64_t? 
>  That
>  would be undefined behaviour in C, but well-defined under ACLE.
> 
>  Thanks,
>  James
> 
> >>>
> >>> Hi. Thanks for the review.
> >>>
> >>> For the minimum value of int64_t it behaves as the ACLE specifies:
> >>> "The negative of the minimum (signed) value is itself."
> >>
> >> What should happen in this testcase? The spoiler is below, but try to work 
> >> out
> >> what should happen and what goes wrong with your implementation.
> >>
> >>int foo (int64_t x)
> >>{
> >>  if (x < (int64_t) 0)
> >>return vnegd_s64(x) < (int64_t) 0;
> >>  else
> >>return 0;
> >>}
> >>int bar (void)
> >>{
> >>  return foo (INT64_MIN);
> >>}
> >> Thanks,
> >> James
> >>
> >>
> >> -
> >>
> >> 
> >>
> >>
> >>
> >>
> >> INT64_MIN < 0 should be true, so we should return vnegd_s64(INT64_MIN) < 0.
> >> vnegd_s64(INT64_MIN) is identity, so the return value should be
> >> INT64_MIN < 0; i.e. True.
> >>
> >> This isn't what the compiler thinks... The compiler makes use of the fact
> >> that -INT64_MIN is undefined behaviour in C, and doesn't need to be 
> >> considered
> >> as a special case. The if statement gives you a range reduction to [-INF, 
> >> -1],
> >> negating that gives you a range [1, INF], and [1, INF] is never less than 
> >> 0,
> >> so the compiler folds the function to return false. We have a mismatch in
> >> semantics
> >>
> > I see your point now. I have updated the vnegd_s64 intrinsic to convert to
> > unsigned before negating. This means that if the predicted range of x is
> > [INT64_MIN, y], then the predicted range of vnegd_s64 (x) will be
> > ~[INT64_MIN + 1, y] which seems to resolve the issue. I've also added 
> > testcases
> > which reflect the issue you've pointed out. Note that I've change the 
> > vabsd_s64
> > intrinsic in order to avoid moves between integer and vector registers.

I think from my reading of the standard that this is OK, but I may be rusty
and missing a corner case.

OK for trunk.

Thanks,
James



Re: RFA: Avoid warning for write_predicate_subfunction generated function

2018-08-28 Thread Richard Sandiford
Joern Wolfgang Rennecke  writes:
> 2018-08-28  Joern Rennecke  
>
>   * genpreds.c (write_predicate_subfunction): Also add ATTRIBUTE_UNUSED
>   to OP parameter of generated function.

OK, thanks.

Richard

> Index: genpreds.c
> ===================================================================
> --- genpreds.c(revision 5271)
> +++ genpreds.c(working copy)
> @@ -152,7 +152,7 @@ write_predicate_subfunction (struct pred
>p->exp = and_exp;
>  
>printf ("static inline int\n"
> -   "%s_1 (rtx op, machine_mode mode ATTRIBUTE_UNUSED)\n",
> +   "%s_1 (rtx op ATTRIBUTE_UNUSED, machine_mode mode 
> ATTRIBUTE_UNUSED)\n",
> p->name);
>rtx_reader_ptr->print_md_ptr_loc (p->c_block);
>if (p->c_block[0] == '{')


Re: [PATCH, OpenACC] (2/2) Fix implicit mapping for array slices on lexically-enclosing data constructs (PR70828)

2018-08-28 Thread Julian Brown
On Tue, 28 Aug 2018 12:23:22 -0700
Cesar Philippidis  wrote:

> On 08/28/2018 12:19 PM, Julian Brown wrote:
> 
> > diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c
> > index f038f4c..86be407 100644
> > --- a/gcc/fortran/trans-openmp.c
> > +++ b/gcc/fortran/trans-openmp.c
> > @@ -1045,9 +1045,13 @@ gfc_omp_finish_clause (tree c, gimple_seq
> > *pre_p) 
> >tree decl = OMP_CLAUSE_DECL (c);
> >  
> > -  /* Assumed-size arrays can't be mapped implicitly, they have to
> > be
> > - mapped explicitly using array sections.  */
> > -  if (TREE_CODE (decl) == PARM_DECL
> > +  /* Assumed-size arrays can't be mapped implicitly, they have to
> > be mapped
> > + explicitly using array sections.  An exception is if the
> > array is
> > + mapped explicitly in an enclosing data construct for OpenACC,
> > in which
> > + case we see GOMP_MAP_FORCE_PRESENT here and do not need to
> > raise an
> > + error.  */
> > +  if (OMP_CLAUSE_MAP_KIND (c) != GOMP_MAP_FORCE_PRESENT
> > +  && TREE_CODE (decl) == PARM_DECL
> >&& GFC_ARRAY_TYPE_P (TREE_TYPE (decl))
> >&& GFC_TYPE_ARRAY_AKIND (TREE_TYPE (decl)) ==
> > GFC_ARRAY_UNKNOWN && GFC_TYPE_ARRAY_UBOUND (TREE_TYPE (decl),  
> 
> This is specific to OpenACC, and needs to be guarded as such.

Are you sure that condition can be true for OpenMP? I'd assumed not...

Julian


RFA: Avoid warning for write_predicate_subfunction generated function

2018-08-28 Thread Joern Wolfgang Rennecke


In predicates.md, we have a predicate like this:

(define_special_predicate "esirisc_simd_shift_reg_operand"
  (match_operand 0 "d_register_operand")
{
  /* Earlier revs shifted both halves by the same amount, which is not usable.  
*/
  return esirisc_rev_option > 10;
})

genpreds generates for that:

static inline int
esirisc_simd_shift_reg_operand_1 (rtx op, machine_mode mode ATTRIBUTE_UNUSED)
#line 300 "../../gcc/trunk/gcc/gcc/config/esirisc/predicates.md"
{
  /* Earlier revs shifted both halves by the same amount, which is not usable.  
*/
  return esirisc_rev_option > 10;
}

which means that insn-preds.c fails to build when gcc is configured
with --enable-werror-always.

Fixed with the following patch.  Bootstrapped in r263928 on x86_64-pc-linux-gnu.
OK to apply?

2018-08-28  Joern Rennecke  

* genpreds.c (write_predicate_subfunction): Also add ATTRIBUTE_UNUSED
to OP parameter of generated function.

Index: genpreds.c
===================================================================
--- genpreds.c  (revision 5271)
+++ genpreds.c  (working copy)
@@ -152,7 +152,7 @@ write_predicate_subfunction (struct pred
   p->exp = and_exp;
 
   printf ("static inline int\n"
- "%s_1 (rtx op, machine_mode mode ATTRIBUTE_UNUSED)\n",
+ "%s_1 (rtx op ATTRIBUTE_UNUSED, machine_mode mode 
ATTRIBUTE_UNUSED)\n",
  p->name);
   rtx_reader_ptr->print_md_ptr_loc (p->c_block);
   if (p->c_block[0] == '{')


Re: [PATCH] doc: document -ftree-scev-cprop

2018-08-28 Thread Richard Sandiford
Alexander Monakov  writes:
> @@ -9120,6 +9121,15 @@ determining number of iterations requires complicated 
> analysis.  Later
>  optimizations then may determine the number easily.  Useful especially
>  in connection with unrolling.
>  
> +@item -ftree-scev-cprop
> +@opindex ftree-scev-cprop
> +Perform final value replacement.  If a variable is modified in a loop
> +in a way that its value when exiting the loop can be determined using

"in such a way"

> +only its initial value and number of loop iterations, replace uses of

"and the number"

> +the final value by such computation, provided it is sufficiently cheap.

"by such a computation".

OK otherwise, thanks.

Richard

> +This reduces data dependencies and may allow further simplifications.
> +Enabled by default at @option{-O} and higher.



Re: [PATCH][GCC][AArch64] Updated stack-clash implementation supporting 64k probes. [patch (1/7)]

2018-08-28 Thread Richard Sandiford
Tamar Christina  writes:
> +  HOST_WIDE_INT guard_used_by_caller = STACK_CLASH_CALLER_GUARD;
> +  /* When doing the final adjustment for the outgoing argument size we can't
> + assume that LR was saved at position 0.  So subtract its offset from
> the
> + ABI safe buffer so that we don't accidentally allow an adjustment that
> + would result in an allocation larger than the ABI buffer without
> + probing.  */
> +  HOST_WIDE_INT min_probe_threshold
> += final_adjustment_p
> +  ? guard_used_by_caller - cfun->machine->frame.reg_offset[LR_REGNUM]
> +  : guard_size - guard_used_by_caller;
[...]
> +  if (residual)
> +{
> +  aarch64_sub_sp (temp1, temp2, residual, frame_related_p);
> +  if (residual >= min_probe_threshold)
> + {
> +   if (dump_file)
> + fprintf (dump_file,
> +  "Stack clash AArch64 prologue residuals: "
> +  HOST_WIDE_INT_PRINT_DEC " bytes, probing will be required."
> +  "\n", residual);
> +   emit_stack_probe (plus_constant (Pmode, stack_pointer_rtx,
> +STACK_CLASH_CALLER_GUARD));

reg_offsets are nonnegative, so if LR_REGNUM isn't saved at position 0,
min_probe_threshold will be less than STACK_CLASH_CALLER_GUARD.  It looks
like the probe would then write above the region.

Using >= rather than > means that the same thing could happen when
LR_REGNUM is at position 0, if the residual is exactly
STACK_CLASH_CALLER_GUARD.

Thanks,
Richard


Re: [PATCH] avoid warning on constant strncpy until next statement is reachable (PR 87028)

2018-08-28 Thread Martin Sebor

On 08/27/2018 10:27 PM, Jeff Law wrote:

On 08/27/2018 10:27 AM, Martin Sebor wrote:

On 08/27/2018 02:29 AM, Richard Biener wrote:

On Sun, Aug 26, 2018 at 7:26 AM Jeff Law  wrote:


On 08/24/2018 09:58 AM, Martin Sebor wrote:

The warning suppression for -Wstringop-truncation looks for
the next statement after a truncating strncpy to see if it
adds a terminating nul.  This only works when the next
statement can be reached using the Gimple statement iterator
which isn't until after gimplification.  As a result, strncpy
calls that truncate their constant argument that are being
folded to memcpy this early get diagnosed even if they are
followed by the nul assignment:

  const char s[] = "12345";
  char d[3];

  void f (void)
  {
strncpy (d, s, sizeof d - 1);   // -Wstringop-truncation
d[sizeof d - 1] = 0;
  }

To avoid the warning I propose to defer folding strncpy to
memcpy until the pointer to the basic block the strnpy call
is in can be used to try to reach the next statement (this
happens as early as ccp1).  I'm aware of the preference to
fold things early but in the case of strncpy (a relatively
rarely used function that is often misused), getting
the warning right while folding a bit later but still fairly
early on seems like a reasonable compromise.  I fear that
otherwise, the false positives will drive users to adopt
other unsafe solutions (like memcpy) where these kinds of
bugs cannot be as readily detected.

Tested on x86_64-linux.

Martin

PS There still are outstanding cases where the warning can
be avoided.  I xfailed them in the test for now but will
still try to get them to work for GCC 9.

gcc-87028.diff


PR tree-optimization/87028 - false positive -Wstringop-truncation
strncpy with global variable source string
gcc/ChangeLog:

  PR tree-optimization/87028
  * gimple-fold.c (gimple_fold_builtin_strncpy): Avoid folding when
  statement doesn't belong to a basic block.
  * tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Handle MEM_REF on
  the left hand side of assignment.

gcc/testsuite/ChangeLog:

  PR tree-optimization/87028
  * c-c++-common/Wstringop-truncation.c: Remove xfails.
  * gcc.dg/Wstringop-truncation-5.c: New test.

diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
index 07341eb..284c2fb 100644
--- a/gcc/gimple-fold.c
+++ b/gcc/gimple-fold.c
@@ -1702,6 +1702,11 @@ gimple_fold_builtin_strncpy
(gimple_stmt_iterator *gsi,
   if (tree_int_cst_lt (ssize, len))
 return false;

+  /* Defer warning (and folding) until the next statement in the basic
+ block is reachable.  */
+  if (!gimple_bb (stmt))
+return false;

I think you want cfun->cfg as the test here.  They should be equivalent
in practice.


Please do not add 'cfun' references.  Note that the next stmt is also
accessible
when there is no CFG.  I guess the issue is that we fold this during
gimplification
where the next stmt is not yet "there" (but still in GENERIC)?

We generally do not want to have unfolded stmts in the IL when we can
avoid that
which is why we fold most stmts during gimplification.  We also do
that because
we now do less folding on GENERIC.

There may be the possibility to refactor gimplification time folding
to what we
do during inlining - queue stmts we want to fold and perform all
folding delayed.
This of course means bigger compile-time due to cache effects.




diff --git a/gcc/tree-ssa-strlen.c b/gcc/tree-ssa-strlen.c
index d0792aa..f1988f6 100644
--- a/gcc/tree-ssa-strlen.c
+++ b/gcc/tree-ssa-strlen.c
@@ -1981,6 +1981,23 @@ maybe_diag_stxncpy_trunc
(gimple_stmt_iterator gsi, tree src, tree cnt)
&& known_eq (dstoff, lhsoff)
&& operand_equal_p (dstbase, lhsbase, 0))
  return false;
+
+  if (code == MEM_REF
+   && TREE_CODE (lhsbase) == SSA_NAME
+   && known_eq (dstoff, lhsoff))
+ {
+   /* Extract the referenced variable from something like
+MEM[(char *)d_3(D) + 3B] = 0;  */
+   gimple *def = SSA_NAME_DEF_STMT (lhsbase);
+   if (gimple_nop_p (def))
+ {
+   lhsbase = SSA_NAME_VAR (lhsbase);
+   if (lhsbase
+   && dstbase
+   && operand_equal_p (dstbase, lhsbase, 0))
+ return false;
+ }
+ }

If you find yourself looking at SSA_NAME_VAR, you're usually barking up
the wrong tree.  It'd be easier to suggest something here if I could see
the gimple (with virtual operands).  But at some level what you really
want to do is make sure the base of the MEM_REF is the same as what got
passed as the destination of the strncpy.  You'd want to be testing
SSA_NAMEs in that case.


Yes.  Why not simply compare the SSA names?  Why would it be
not OK to do that when !lhsbase?


The added code handles this case:

  void f (char *d)
  {
__builtin_strncpy (d, "12345", 4);
d[3] = 0;
  }

where during forwprop we see:

  __builtin_strncpy (d_3(D), "12345", 4);
  MEM[(char *)d_3(D) + 3B] = 0;

The next statement after the strncpy is the 

Re: [PATCH][GCC][AArch64] Add support for SVE stack clash probing [patch (2/7)]

2018-08-28 Thread Richard Sandiford
I'll leave the AArch64 maintainers to review, but here are some comments.

Tamar Christina  writes:
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 
> 06451f38b11822ea77323438fe8c7e373eb9e614..e7efde79bb111e820f4df44a276f6f73070ecd17
>  100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -3970,6 +3970,90 @@ aarch64_output_probe_stack_range (rtx reg1, rtx reg2)
>return "";
>  }
 
> +/* Emit the probe loop for doing stack clash probes and stack adjustments for
> +   SVE.  This emits probes from BASE to BASE + ADJUSTMENT based on a guard 
> size
> +   of GUARD_SIZE and emits a probe when at least LIMIT bytes are allocated.  
> By
> +   the end of this function BASE = BASE + ADJUSTMENT.  */
> +
> +const char *
> +aarch64_output_probe_sve_stack_clash (rtx base, rtx adjustment, rtx limit,
> +   rtx guard_size)
> +{
> +  /* This function is not allowed to use any instruction generation function
> + like gen_ and friends.  If you do you'll likely ICE during CFG 
> validation,
> + so instead emit the code you want using output_asm_insn.  */
> +  gcc_assert (flag_stack_clash_protection);
> +  gcc_assert (CONST_INT_P (limit) && CONST_INT_P (guard_size));
> +  gcc_assert (aarch64_uimm12_shift (INTVAL (limit)));
> +  gcc_assert (aarch64_uimm12_shift (INTVAL (guard_size)));
> +
> +  static int labelno = 0;
> +  char loop_start_lab[32];
> +  char loop_res_lab[32];
> +  char loop_end_lab[32];
> +  rtx xops[2];
> +
> +  ASM_GENERATE_INTERNAL_LABEL (loop_start_lab, "SVLPSRL", labelno);
> +  ASM_GENERATE_INTERNAL_LABEL (loop_res_lab, "BRRCHK", labelno);
> +  ASM_GENERATE_INTERNAL_LABEL (loop_end_lab, "BERCHK", labelno++);
> +
> +  /* Emit loop start label.  */
> +  ASM_OUTPUT_INTERNAL_LABEL (asm_out_file, loop_start_lab);
> +
> +  /* Test if ADJUSTMENT < GUARD_SIZE.  */
> +  xops[0] = adjustment;
> +  xops[1] = guard_size;
> +  output_asm_insn ("cmp\t%0, %1", xops);
> +
> +  /* Branch to residual loop if it is.  */
> +  fputs ("\tb.lt\t", asm_out_file);
> +  assemble_name_raw (asm_out_file, loop_res_lab);
> +  fputc ('\n', asm_out_file);
> +
> +  /* BASE = BASE - GUARD_SIZE.  */
> +  xops[0] = base;
> +  xops[1] = guard_size;
> +  output_asm_insn ("sub\t%0, %0, %1", xops);
> +
> +  /* Probe at BASE + LIMIT.  */
> +  xops[1] = limit;
> +  output_asm_insn ("str\txzr, [%0, %1]", xops);
> +
> +  /* ADJUSTMENT = ADJUSTMENT - GUARD_SIZE.  */
> +  xops[0] = adjustment;
> +  xops[1] = guard_size;
> +  output_asm_insn ("sub\t%0, %0, %1", xops);
> +
> +  /* Branch to loop start.  */
> +  fputs ("\tb\t", asm_out_file);
> +  assemble_name_raw (asm_out_file, loop_start_lab);
> +  fputc ('\n', asm_out_file);
> +
> +  /* Emit residual check label.  */
> +  ASM_OUTPUT_INTERNAL_LABEL (asm_out_file, loop_res_lab);
> +
> +  /* BASE = BASE - ADJUSTMENT.  */
> +  xops[0] = base;
> +  xops[1] = adjustment;
> +  output_asm_insn ("sub\t%0, %0, %1", xops);
> +
> +  /* Test if BASE < LIMIT.  */
> +  xops[1] = limit;
> +  output_asm_insn ("cmp\t%0, %1", xops);

Think this should be ADJUSTMENT < LIMIT.

> +  /* Branch to end.  */
> +  fputs ("\tb.lt\t", asm_out_file);
> +  assemble_name_raw (asm_out_file, loop_end_lab);
> +  fputc ('\n', asm_out_file);
> +
> +  /* Probe at BASE + LIMIT.  */
> +  output_asm_insn ("str\txzr, [%0, %1]", xops);

It looks like this would probe at LIMIT when ADJUSTMENT is exactly LIMIT,
which could clobber the caller's frame.

> +
> +  /* No probe leave.  */
> +  ASM_OUTPUT_INTERNAL_LABEL (asm_out_file, loop_end_lab);
> +  return "";

With the CFA stuff and constant load, I think this works out as:

-
# 12 insns
mov     r15, base
mov     adjustment, N
1:
cmp     adjustment, guard_size
b.lt    2f
sub     base, base, guard_size
str     xzr, [base, limit]
sub     adjustment, adjustment, guard_size
b       1b
2:
sub     base, base, adjustment
cmp     adjustment, limit
b.le    3f
str     xzr, [base, limit]
3:
-

What do you think about something like:

-
# 10 insns
mov     adjustment, N
sub     r15, base, adjustment
subs    adjustment, adjustment, min_probe_threshold
b.lo    2f
1:
add     base, x15, adjustment
str     xzr, [base, 0]
subs    adjustment, adjustment, 16
and     adjustment, adjustment, ~(guard_size-1)
b.hs    1b
2:
mov     base, r15
-

or (with different trade-offs):

-
# 11 insns
mov     adjustment, N
sub     r15, base, adjustment
subs    adjustment, adjustment, min_probe_threshold
b.lo    2f
# Might be 0, leading to a double probe
and     r14, adjustment, guard_size-1
1:
  

Re: [PATCH] Optimize more boolean functions

2018-08-28 Thread Jeff Law
On 08/28/2018 09:34 AM, MCC CS wrote:
> Hi again,
> 
> I guess I haven't been clear enough. I don't
> have push access, is it possible for you to
> push my patch to trunk?
> 
> The same patch as:
> https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01688.html
> but updated dates.
> 
> 2018-08-28 MCC CS 
> 
>   gcc/
>   PR tree-optimization/87009
>   * match.pd: Add boolean optimizations.
> 
> 2018-08-28 MCC CS 
> 
>   gcc/testsuite/
>   PR tree-optimization/87009
>   * gcc.dg/pr87009.c: New test.
I didn't see a note WRT bootstrap/comparison test of the final version.
So I took care of that for you.

Installed on the trunk,
Jeff



Re: [PATCH 4/4] bb-reorder: convert to gcc_stablesort

2018-08-28 Thread Michael Matz
Hi,

On Tue, 28 Aug 2018, Alexander Monakov wrote:

> > I think your proposed one
> > warrants some comments.  Maybe trade speed for some clearer code?
> 
> I think it's not too complicated, but how about adding this comment:
> 
>   profile_count m = c1.max (c2);
>   /* Return 0 if counts are equal, -1 if E1 has the larger count.  */
>   return (m == c2) - (m == c1);
> 
> Or, alternatively, employ more branchy checks:
> 
>   if (c1 == c2)
> return 0;
>   if (c1 == c1.max (c2))
> return -1;
>   return 1;

You want to at least comment on why a traditional </== check isn't 
working, namely because of the uninitialized counts.  (Though I also think 
that comparing uninitialized counts is just non-sense and hence should 
actually be ruled out from the whole comparison business, at which point a 
normal <= check can be done again).

> > The comments on the operator<> implementations are odd as well:
> 
> (an aside, but: it also has strange redundant code like
> 
>   if (*this  == profile_count::zero ())
> return false;
>   if (other == profile_count::zero ())
> return !(*this == profile_count::zero ());
> 
> where the second return could have been 'return true;')

Oh my.


Ciao,
Michael.


Re: [PATCH, OpenACC] (2/2) Fix implicit mapping for array slices on lexically-enclosing data constructs (PR70828)

2018-08-28 Thread Cesar Philippidis
On 08/28/2018 12:19 PM, Julian Brown wrote:

> diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c
> index f038f4c..86be407 100644
> --- a/gcc/fortran/trans-openmp.c
> +++ b/gcc/fortran/trans-openmp.c
> @@ -1045,9 +1045,13 @@ gfc_omp_finish_clause (tree c, gimple_seq *pre_p)
>  
>tree decl = OMP_CLAUSE_DECL (c);
>  
> -  /* Assumed-size arrays can't be mapped implicitly, they have to be
> - mapped explicitly using array sections.  */
> -  if (TREE_CODE (decl) == PARM_DECL
> +  /* Assumed-size arrays can't be mapped implicitly, they have to be mapped
> + explicitly using array sections.  An exception is if the array is
> + mapped explicitly in an enclosing data construct for OpenACC, in which
> + case we see GOMP_MAP_FORCE_PRESENT here and do not need to raise an
> + error.  */
> +  if (OMP_CLAUSE_MAP_KIND (c) != GOMP_MAP_FORCE_PRESENT
> +  && TREE_CODE (decl) == PARM_DECL
>&& GFC_ARRAY_TYPE_P (TREE_TYPE (decl))
>&& GFC_TYPE_ARRAY_AKIND (TREE_TYPE (decl)) == GFC_ARRAY_UNKNOWN
>&& GFC_TYPE_ARRAY_UBOUND (TREE_TYPE (decl),

This is specific to OpenACC, and needs to be guarded as such.

Cesar


Re: Make safe_iterator inline friends

2018-08-28 Thread François Dumont

On 28/08/2018 21:04, Jonathan Wakely wrote:

On 23/08/18 22:59 +0200, François Dumont wrote:

On 22/08/2018 23:45, Jonathan Wakely wrote:

On 22/08/18 23:08 +0200, François Dumont wrote:
Only operator== and != remain outside _Safe_iterator because all 
my attempts to make them inline friends failed. I understand that 
an inline friend within a base class is not a very clean design.


Compiler error was:

/home/fdt/dev/gcc/build/x86_64-pc-linux-gnu/libstdc++-v3/include/debug/safe_iterator.h:459: 
error: redefinition of 'bool __gnu_debug::operator==(const _Self&, 
const _OtherSelf&)'
/home/fdt/dev/gcc/build/x86_64-pc-linux-gnu/libstdc++-v3/include/debug/safe_iterator.h:452: 
note: 'bool __gnu_debug::operator==(const _Self&, const _Self&)' 
previously declared here
/home/fdt/dev/gcc/build/x86_64-pc-linux-gnu/libstdc++-v3/include/debug/safe_iterator.h:473: 
error: redefinition of 'bool __gnu_debug::operator!=(const _Self&, 
const _OtherSelf&)'
/home/fdt/dev/gcc/build/x86_64-pc-linux-gnu/libstdc++-v3/include/debug/safe_iterator.h:466: 
note: 'bool __gnu_debug::operator!=(const _Self&, const _Self&)' 
previously declared here


I don't know if it is a compiler issue


I don't think so. The error seems clear: when _Self and _OtherSelf are
the same type the friend declarations are the same function.


_Self and _OtherSelf are like the types defined in 
_Safe_iterator<_It, _Sq, random_access_iterator_tag> in this patch. 
They depend on __conditional_type, so they are definitely different.


What about containers like std::set where iterator and const_iterator
are the same type?


Good point, thanks, I'll consider it.


There's no changelog in the email with the patch.


And I'll also provide ChangeLog next time.

François


[PATCH, OpenACC] (2/2) Fix implicit mapping for array slices on lexically-enclosing data constructs (PR70828)

2018-08-28 Thread Julian Brown
This follow-up patch enables the "inheritance" of mappings for OpenACC
data constructs to work also for Fortran assumed-size arrays.
Otherwise, such arrays are (arguably, prematurely) bailed out on in the
Fortran front-end.

Tested alongside the previous patch with offloading to nvptx.

OK to apply?

Thanks,

Julian

2018-08-28  Julian Brown  

gcc/fortran/
* trans-openmp.c (gfc_omp_finish_clause): Don't raise error for
assumed-size array if present in a lexically-enclosing data construct.

libgomp/
* testsuite/libgomp.oacc-fortran/pr70828-4.f90: New test.
From 9214ffc6bb2ac7cf023f4e62ca324b1a47123ffc Mon Sep 17 00:00:00 2001
From: Julian Brown 
Date: Tue, 28 Aug 2018 09:01:15 -0700
Subject: [PATCH 2/2] Assumed-size array fix

2018-08-28  Julian Brown  

	gcc/fortran/
	* trans-openmp.c (gfc_omp_finish_clause): Don't raise error for
	assumed-size array if present in a lexically-enclosing data construct.

	libgomp/
	* testsuite/libgomp.oacc-fortran/pr70828-4.f90: New test.
---
 gcc/fortran/trans-openmp.c | 10 ---
 .../testsuite/libgomp.oacc-fortran/pr70828-4.f90   | 31 ++
 2 files changed, 38 insertions(+), 3 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/pr70828-4.f90

diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c
index f038f4c..86be407 100644
--- a/gcc/fortran/trans-openmp.c
+++ b/gcc/fortran/trans-openmp.c
@@ -1045,9 +1045,13 @@ gfc_omp_finish_clause (tree c, gimple_seq *pre_p)
 
   tree decl = OMP_CLAUSE_DECL (c);
 
-  /* Assumed-size arrays can't be mapped implicitly, they have to be
- mapped explicitly using array sections.  */
-  if (TREE_CODE (decl) == PARM_DECL
+  /* Assumed-size arrays can't be mapped implicitly, they have to be mapped
+ explicitly using array sections.  An exception is if the array is
+ mapped explicitly in an enclosing data construct for OpenACC, in which
+ case we see GOMP_MAP_FORCE_PRESENT here and do not need to raise an
+ error.  */
+  if (OMP_CLAUSE_MAP_KIND (c) != GOMP_MAP_FORCE_PRESENT
+  && TREE_CODE (decl) == PARM_DECL
   && GFC_ARRAY_TYPE_P (TREE_TYPE (decl))
   && GFC_TYPE_ARRAY_AKIND (TREE_TYPE (decl)) == GFC_ARRAY_UNKNOWN
   && GFC_TYPE_ARRAY_UBOUND (TREE_TYPE (decl),
diff --git a/libgomp/testsuite/libgomp.oacc-fortran/pr70828-4.f90 b/libgomp/testsuite/libgomp.oacc-fortran/pr70828-4.f90
new file mode 100644
index 000..01da999
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-fortran/pr70828-4.f90
@@ -0,0 +1,31 @@
+! Subarrays declared on data construct: assumed-size array.
+
+subroutine s1(n, arr)
+  integer :: n
+  integer :: arr(*)
+
+  !$acc data copy(arr(5:n-10))
+  !$acc parallel loop
+  do i = 10, n - 10
+ arr(i) = i
+  end do
+  !$acc end parallel loop
+  !$acc end data
+end subroutine s1
+
+program test
+  integer, parameter :: n = 100
+  integer i, data(n)
+
+  data(:) = 0
+
+  call s1(n, data)
+
+  do i = 1, n
+ if ((i < 10 .or. i > n-10)) then
+if ((data(i) .ne. 0)) call abort
+ else if (data(i) .ne. i) then
+call abort
+ end if
+  end do
+end program test
-- 
1.8.1.1



[PATCH, OpenACC] (1/2) Fix implicit mapping for array slices on lexically-enclosing data constructs (PR70828)

2018-08-28 Thread Julian Brown
This patch implements support for array slices (with a non-zero base
element) declared on OpenACC data constructs. Any lexically-enclosed
parallel or kernels regions should "inherit" such mappings, e.g. if we
have:

#pragma acc data copy(arr[10:20])
{
#pragma acc parallel loop
  for (...) { ...arr[X]... }
}

the mapping for "arr" on the data construct takes precedence over the
default mapping behaviour for the parallel construct, which is to map
the whole array. (OpenACC 2.5, "2.5.1. Parallel Construct" and
elsewhere).

Tested with offloading to nvptx. (This patch differs in implementation
somewhat from the version on the gomp4, etc. branches.)

OK to apply?

Thanks,

Julian

2018-08-28  Julian Brown  
Cesar Philippidis  

PR middle-end/70828

gcc/
* gimplify.c (gimplify_omp_ctx): Add decl_data_clause hash map.
(new_omp_context): Initialise above.
(delete_omp_context): Delete above.
(gimplify_scan_omp_clauses): Scan for array mappings on data constructs,
and record in above map.
(gomp_needs_data_present): New function.
(gimplify_adjust_omp_clauses_1): Handle data mappings (e.g. array
slices) declared in lexically-enclosing data constructs.
* omp-low.c (lower_omp_target): Allow decl for bias not to be present
in omp context.

gcc/testsuite/
* c-c++-common/goacc/acc-data-chain.c: New test.
* gfortran.dg/goacc/pr70828.f90: New test.
* gfortran.dg/goacc/pr70828-2.f90: New test.

libgomp/
* testsuite/libgomp.oacc-c-c++-common/pr70828.c: New test.
* testsuite/libgomp.oacc-fortran/implicit_copy.f90: New test.
* testsuite/libgomp.oacc-fortran/pr70828.f90: New test.
* testsuite/libgomp.oacc-fortran/pr70828-2.f90: New test.
* testsuite/libgomp.oacc-fortran/pr70828-3.f90: New test.
* testsuite/libgomp.oacc-fortran/pr70828-5.f90: New test.
From 9123c4ddd701c40c3e85a0c6cd327066542b9e7a Mon Sep 17 00:00:00 2001
From: Julian Brown 
Date: Thu, 16 Aug 2018 20:02:10 -0700
Subject: [PATCH 1/2] Inheritance of array sections on data constructs.

2018-08-28  Julian Brown  
	Cesar Philippidis  

	gcc/
	* gimplify.c (gimplify_omp_ctx): Add decl_data_clause hash map.
	(new_omp_context): Initialise above.
	(delete_omp_context): Delete above.
	(gimplify_scan_omp_clauses): Scan for array mappings on data constructs,
	and record in above map.
	(gomp_needs_data_present): New function.
	(gimplify_adjust_omp_clauses_1): Handle data mappings (e.g. array
	slices) declared in lexically-enclosing data constructs.
	* omp-low.c (lower_omp_target): Allow decl for bias not to be present
	in omp context.

	gcc/testsuite/
	* c-c++-common/goacc/acc-data-chain.c: New test.
	* gfortran.dg/goacc/pr70828.f90: New test.
	* gfortran.dg/goacc/pr70828-2.f90: New test.

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/pr70828.c: New test.
	* testsuite/libgomp.oacc-fortran/implicit_copy.f90: New test.
	* testsuite/libgomp.oacc-fortran/pr70828.f90: New test.
	* testsuite/libgomp.oacc-fortran/pr70828-2.f90: New test.
	* testsuite/libgomp.oacc-fortran/pr70828-3.f90: New test.
	* testsuite/libgomp.oacc-fortran/pr70828-5.f90: New test.
---
 gcc/gimplify.c | 97 +-
 gcc/omp-low.c  |  7 +-
 gcc/testsuite/c-c++-common/goacc/acc-data-chain.c  | 24 ++
 gcc/testsuite/gfortran.dg/goacc/pr70828.f90| 22 +
 .../libgomp.oacc-c-c++-common/pr70828-2.c  | 34 
 .../testsuite/libgomp.oacc-c-c++-common/pr70828.c  | 27 ++
 .../libgomp.oacc-fortran/implicit_copy.f90 | 30 +++
 .../testsuite/libgomp.oacc-fortran/pr70828-2.f90   | 31 +++
 .../testsuite/libgomp.oacc-fortran/pr70828-3.f90   | 34 
 .../testsuite/libgomp.oacc-fortran/pr70828-5.f90   | 29 +++
 libgomp/testsuite/libgomp.oacc-fortran/pr70828.f90 | 24 ++
 11 files changed, 354 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/acc-data-chain.c
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/pr70828.f90
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/pr70828-2.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/pr70828.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/implicit_copy.f90
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/pr70828-2.f90
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/pr70828-3.f90
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/pr70828-5.f90
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/pr70828.f90

diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index dbd0f0e..d704aef 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -191,6 +191,7 @@ struct gimplify_omp_ctx
   bool target_map_scalars_firstprivate;
   bool target_map_pointers_as_0len_arrays;
   bool target_firstprivatize_array_bases;
+  hash_map > *decl_data_clause;
 

Re: [PATCH] Use more DECL_BUILT_IN_P macro.

2018-08-28 Thread H.J. Lu
>
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>
> Ready to be installed?

This caused:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87130


H.J.


Re: Make safe_iterator inline friends

2018-08-28 Thread Jonathan Wakely

On 23/08/18 22:59 +0200, François Dumont wrote:

On 22/08/2018 23:45, Jonathan Wakely wrote:

On 22/08/18 23:08 +0200, François Dumont wrote:
Only operator== and != remain outside _Safe_iterator because all 
my attempts to make them inline friends failed. I understand that 
an inline friend within a base class is not a very clean design.


Compiler error was:

/home/fdt/dev/gcc/build/x86_64-pc-linux-gnu/libstdc++-v3/include/debug/safe_iterator.h:459: 
error: redefinition of 'bool __gnu_debug::operator==(const _Self&, 
const _OtherSelf&)'
/home/fdt/dev/gcc/build/x86_64-pc-linux-gnu/libstdc++-v3/include/debug/safe_iterator.h:452: 
note: 'bool __gnu_debug::operator==(const _Self&, const _Self&)' 
previously declared here
/home/fdt/dev/gcc/build/x86_64-pc-linux-gnu/libstdc++-v3/include/debug/safe_iterator.h:473: 
error: redefinition of 'bool __gnu_debug::operator!=(const _Self&, 
const _OtherSelf&)'
/home/fdt/dev/gcc/build/x86_64-pc-linux-gnu/libstdc++-v3/include/debug/safe_iterator.h:466: 
note: 'bool __gnu_debug::operator!=(const _Self&, const _Self&)' 
previously declared here


I don't know if it is a compiler issue


I don't think so. The error seems clear: when _Self and _OtherSelf are
the same type the friend declarations are the same function.


_Self and _OtherSelf are like the types defined in _Safe_iterator<_It, 
_Sq, random_access_iterator_tag> in this patch. They depend on 
__conditional_type, so they are definitely different.


What about containers like std::set where iterator and const_iterator
are the same type?

There's no changelog in the email with the patch.






Re: Keep std::deque algos specializations in Debug mode

2018-08-28 Thread Jonathan Wakely

On 25/08/18 22:44 +0200, François Dumont wrote:
The last optimizations that get disabled when Debug mode is enabled are 
the algo specializations for std::deque iterators.


This patch moves those algos into namespace std, as they should be even 
when Debug mode is enabled, so that they are considered even when calls 
are made with namespace qualification. It also adds Debug 
implementations of all the algos, which forward to the normal 
implementations to benefit from their optimizations.


Please add a ChangeLog to the patch.




Re: Keep std::deque algos specializations in Debug mode

2018-08-28 Thread Jonathan Wakely

On 25/08/18 22:44 +0200, François Dumont wrote:
Note that I tried to use typename deque<>::iterator or typename 
deque<>::const_iterator to define the Debug algos, but it didn't work; 
gcc was just not considering those overloads. I wonder why?


Because you can't deduce the template arguments from those types:
https://en.cppreference.com/w/cpp/language/template_argument_deduction#Non-deduced_contexts



Re: [2/6] Make vec_info::lookup_single_use take a stmt_vec_info

2018-08-28 Thread Jeff Law
On 08/28/2018 05:21 AM, Richard Sandiford wrote:
> All callers to vec_info::lookup_single_use are asking about the lhs of a
> statement that they're already examining.  It makes more sense to pass
> that statement instead of the SSA name, since it is then easier to
> handle statements that have been replaced by pattern statements.
> 
> A later patch adds support for pattern statements.  This one just
> changes the interface to make that possible.
> 
> 
> 2018-08-28  Richard Sandiford  
> 
> gcc/
>   * tree-vectorizer.h (vec_info::lookup_single_use): Take a
>   stmt_vec_info instead of an SSA name.
>   * tree-vectorizer.c (vec_info::lookup_single_use): Likewise.
>   * tree-vect-loop.c (vectorizable_reduction): Update call
>   accordingly.
>   * tree-vect-stmts.c (supportable_widening_operation): Likewise.
OK
jeff


Re: [1/6] Handle gphis in gimple_get_lhs

2018-08-28 Thread Jeff Law
On 08/28/2018 05:20 AM, Richard Sandiford wrote:
> Several callers of gimple_get_lhs deal with statements that might
> be phis.  This patch makes gimple_get_lhs handle that case too,
> so that the callers don't have to.
> 
> 
> 2018-08-28  Richard Sandiford  
> 
> gcc/
>   * gimple.c (gimple_get_lhs): Handle gphis.
>   * tree-ssa-phionlycprop.c (get_lhs_or_phi_result): Delete and...
>   (propagate_rhs_into_lhs, eliminate_const_or_copy): ...use
>   gimple_get_lhs instead.
>   * tree-ssa-dom.c (eliminate_redundant_computations): Don't handle
>   phis specially before calling gimple_get_lhs.
>   * tree-ssa-scopedtables.c (avail_exprs_stack::lookup_avail_expr):
>   Likewise.
>   * tree-vect-loop.c (vectorizable_live_operation): Likewise.
>   * tree-vect-slp.c (vect_get_slp_vect_defs): Likewise.
>   (vect_get_slp_defs): Likewise.
>   * tree-vect-stmts.c (vect_get_vec_def_for_operand_1): Likewise.
>   (vect_get_vec_def_for_stmt_copy, vect_transform_stmt): Likewise.
I pondered doing this repeatedly through the years.  So I'm all for it.
Changes look fine to me.

jeff


Re: [PATCH] DWARF: add DW_AT_count to zero-length arrays

2018-08-28 Thread Omar Sandoval
On Fri, Aug 17, 2018 at 12:16:07AM -0700, Omar Sandoval wrote:
> On Thu, Aug 16, 2018 at 11:54:53PM -0700, Omar Sandoval wrote:
> > On Thu, Aug 16, 2018 at 10:27:48PM -0700, Andrew Pinski wrote:
> > > On Thu, Aug 16, 2018 at 9:29 PM Omar Sandoval  wrote:
> > > >
> > > > Hi,
> > > >
> > > > This fixes the issue that it is impossible to distinguish a zero-length 
> > > > array
> > > > type from a flexible array type given the DWARF produced by GCC (which I
> > > > reported here [1]). We do so by adding a DW_AT_count attribute with a 
> > > > value of
> > > > zero only for zero-length arrays (this is what clang does in this case, 
> > > > too).
> > > >
> > > > 1: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86985
> > > >
> > > > The reproducer from the PR now produces the expected output:
> > > >
> > > > $ ~/gcc-build/gcc/xgcc -B ~/gcc-build/gcc -g -c zero_length.c
> > > > $ ~/gcc-build/gcc/xgcc -B ~/gcc-build/gcc -g -c flexible.c
> > > > $ gdb -batch -ex 'ptype baz' zero_length.o
> > > > type = struct {
> > > > int foo;
> > > > int bar[0];
> > > > }
> > > > $ gdb -batch -ex 'ptype baz' flexible.o
> > > > type = struct {
> > > > int foo;
> > > > int bar[];
> > > > }
> > > >
> > > > This was bootstrapped and tested on x86_64-pc-linux-gnu.
> > > >
> > > > I don't have commit rights (first time contributor), so if this change 
> > > > is okay
> > > > could it please be applied?
> 
> [snip]
> 
> Here's the patch with the is_c () helper instead.
> 
> 2018-08-17  Omar Sandoval  
> 
>   * dwarf2out.c (is_c): New.
>   (add_subscript_info): Add DW_AT_count of 0 for C zero-length arrays.
> 
> diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
> index 5a74131d332..189f9bb381f 100644
> --- a/gcc/dwarf2out.c
> +++ b/gcc/dwarf2out.c
> @@ -3671,6 +3671,7 @@ static const char *get_AT_string (dw_die_ref, enum 
> dwarf_attribute);
>  static int get_AT_flag (dw_die_ref, enum dwarf_attribute);
>  static unsigned get_AT_unsigned (dw_die_ref, enum dwarf_attribute);
>  static inline dw_die_ref get_AT_ref (dw_die_ref, enum dwarf_attribute);
> +static bool is_c (void);
>  static bool is_cxx (void);
>  static bool is_cxx (const_tree);
>  static bool is_fortran (void);
> @@ -5434,6 +5435,19 @@ get_AT_file (dw_die_ref die, enum dwarf_attribute 
> attr_kind)
>return a ? AT_file (a) : NULL;
>  }
>  
> +/* Return TRUE if the language is C.  */
> +
> +static inline bool
> +is_c (void)
> +{
> +  unsigned int lang = get_AT_unsigned (comp_unit_die (), DW_AT_language);
> +
> +  return (lang == DW_LANG_C || lang == DW_LANG_C89 || lang == DW_LANG_C99
> +   || lang == DW_LANG_C11 || lang == DW_LANG_ObjC);
> +
> +
> +}
> +
>  /* Return TRUE if the language is C++.  */
>  
>  static inline bool
> @@ -20918,12 +20932,24 @@ add_subscript_info (dw_die_ref type_die, tree type, 
> bool collapse_p)
>  dimension arr(N:*)
>Since the debugger is definitely going to need to know N
>to produce useful results, go ahead and output the lower
> -  bound solo, and hope the debugger can cope.  */
> +  bound solo, and hope the debugger can cope.
> +
> +  For C and C++, if upper is NULL, this may be a zero-length array
> +  or a flexible array; we'd like to be able to distinguish between
> +  the two.  Set DW_AT_count to 0 for the former.  TYPE_SIZE is NULL
> +  for the latter.  */
>  
> if (!get_AT (subrange_die, DW_AT_lower_bound))
>   add_bound_info (subrange_die, DW_AT_lower_bound, lower, NULL);
> -   if (upper && !get_AT (subrange_die, DW_AT_upper_bound))
> - add_bound_info (subrange_die, DW_AT_upper_bound, upper, NULL);
> +   if (!get_AT (subrange_die, DW_AT_upper_bound)
> +   && !get_AT (subrange_die, DW_AT_count))
> + {
> +   if (upper)
> + add_bound_info (subrange_die, DW_AT_upper_bound, upper, NULL);
> +   else if ((is_c () || is_cxx ()) && TYPE_SIZE (type))
> + add_bound_info (subrange_die, DW_AT_count,
> + build_int_cst (TREE_TYPE (lower), 0), NULL);
> + }
>   }
>  
>/* Otherwise we have an array type with an unspecified length.  The

Ping. Does this version look okay for trunk?


Re: [PATCH][GCC][AArch64] Cleanup the AArch64 testsuite when stack-clash is on [Patch (7/7)]

2018-08-28 Thread Jeff Law
On 08/28/2018 06:18 AM, Tamar Christina wrote:
> Hi All,
> 
> Since this patch series now contains SVE support I am removing the changes
> to the SVE tests in this patch series.  I assume the OK still stands as the
> only change here is undoing updates to three files.
> 
> Thanks,
> Tamar
> 
> gcc/testsuite/
> 2018-08-28  Tamar Christina  
> 
>   PR target/86486
>   * gcc.dg/pr82788.c: Skip for AArch64.
>   * gcc.dg/guality/vla-1.c: Turn off stack-clash.
>   * gcc.target/aarch64/subsp.c: Likewise.
>   * gcc.dg/params/blocksort-part.c: Skip stack-clash checks
>   on AArch64.
>   * gcc.dg/stack-check-10.c: Add AArch64 specific checks.
>   * gcc.dg/stack-check-5.c: Add AArch64 specific checks.
>   * gcc.dg/stack-check-6a.c: Skip on AArch64, we don't support this.
>   * testsuite/lib/target-supports.exp
>   (check_effective_target_frame_pointer_for_non_leaf): AArch64 does not
>   require frame pointer for non-leaf functions.
Yea, the OK still applies.
jeff


[PATCH] i386: Add pass_remove_partial_avx_dependency

2018-08-28 Thread H.J. Lu
With -mavx, for

[hjl@gnu-cfl-1 skx-2]$ cat foo.i
extern float f;
extern double d;
extern int i;

void
foo (void)
{
  d = f;
  f = i;
}

we need to generate

vxorp[ds]   %xmmN, %xmmN, %xmmN
...
vcvtss2sd   f(%rip), %xmmN, %xmmX
...
vcvtsi2ss   i(%rip), %xmmN, %xmmY

to avoid partial XMM register stall.  This patch adds a pass to generate
a single

vxorps  %xmmN, %xmmN, %xmmN

at function entry, which is shared by all SF and DF conversions, instead
of generating one

vxorp[ds]   %xmmN, %xmmN, %xmmN

for each SF/DF conversion.

Performance impacts on SPEC CPU 2017 rate with 1 copy using

-Ofast -march=native -mfpmath=sse -fno-associative-math -funroll-loops

are

1. On Broadwell server:

500.perlbench_r (-0.82%)
502.gcc_r (0.73%)
505.mcf_r (-0.24%)
520.omnetpp_r (-2.22%)
523.xalancbmk_r (-1.47%)
525.x264_r (0.31%)
531.deepsjeng_r (0.27%)
541.leela_r (0.85%)
548.exchange2_r (-0.11%)
557.xz_r (-0.34%)
Geomean: (-0.23%)

503.bwaves_r (0.00%)
507.cactuBSSN_r (-1.88%)
508.namd_r (0.00%)
510.parest_r (-0.56%)
511.povray_r (0.49%)
519.lbm_r (-1.28%)
521.wrf_r (-0.28%)
526.blender_r (0.55%)
527.cam4_r (-0.20%)
538.imagick_r (2.52%)
544.nab_r (-0.18%)
549.fotonik3d_r (-0.51%)
554.roms_r (-0.22%)
Geomean: (0.00%)

2. On Skylake client:

500.perlbench_r (-0.29%)
502.gcc_r (-0.36%)
505.mcf_r (1.77%)
520.omnetpp_r (-0.26%)
523.xalancbmk_r (-3.69%)
525.x264_r (-0.32%)
531.deepsjeng_r (0.00%)
541.leela_r (-0.46%)
548.exchange2_r (0.00%)
557.xz_r (0.00%)
Geomean: (-0.34%)

503.bwaves_r (0.00%)
507.cactuBSSN_r (-0.56%)
508.namd_r (0.87%)
510.parest_r (0.00%)
511.povray_r (-0.73%)
519.lbm_r (0.84%)
521.wrf_r (0.00%)
526.blender_r (-0.81%)
527.cam4_r (-0.43%)
538.imagick_r (2.55%)
544.nab_r (0.28%)
549.fotonik3d_r (0.00%)
554.roms_r (0.32%)
Geomean: (0.12%)

3. On Skylake server:

500.perlbench_r (-0.55%)
502.gcc_r (0.69%)
505.mcf_r (0.00%)
520.omnetpp_r (-0.33%)
523.xalancbmk_r (-0.21%)
525.x264_r (-0.27%)
531.deepsjeng_r (0.00%)
541.leela_r (0.00%)
548.exchange2_r (-0.11%)
557.xz_r (0.00%)
Geomean: (0.00%)

503.bwaves_r (0.58%)
507.cactuBSSN_r (0.00%)
508.namd_r (0.00%)
510.parest_r (0.18%)
511.povray_r (-0.58%)
519.lbm_r (0.25%)
521.wrf_r (0.40%)
526.blender_r (0.34%)
527.cam4_r (0.19%)
538.imagick_r (5.87%)
544.nab_r (0.17%)
549.fotonik3d_r (0.00%)
554.roms_r (0.00%)
Geomean: (0.62%)

On Skylake client, impacts on 538.imagick_r are

size before:

   text    data     bss      dec     hex  filename
2555577   10876    5576  2572029  273efd  imagick_r.exe

size after:

   text    data     bss      dec     hex  filename
2511825   10876    5576  2528277  269415  imagick_r.exe

number of vxorp[ds]:

before   after   difference
14570    4515    -69%

OK for trunk?

Thanks.


H.J.
---
gcc/

2018-08-28  H.J. Lu  
Sunil K Pandey  

PR target/87007
* config/i386/i386-passes.def: Add
pass_remove_partial_avx_dependency.
* config/i386/i386-protos.h
(make_pass_remove_partial_avx_dependency): New.
* config/i386/i386.c (make_pass_remove_partial_avx_dependency):
New function.
(pass_data_remove_partial_avx_dependency): New.
(pass_remove_partial_avx_dependency): Likewise.
(make_pass_remove_partial_avx_dependency): Likewise.
* config/i386/i386.md (SF/DF conversion splitters): Disabled
for TARGET_AVX.

gcc/testsuite/

2018-08-28  H.J. Lu  
Sunil K Pandey  

PR target/87007
* gcc.target/i386/pr87007.c: New file.
---
 gcc/config/i386/i386-passes.def |   2 +
 gcc/config/i386/i386-protos.h   |   2 +
 gcc/config/i386/i386.c  | 168 
 gcc/config/i386/i386.md |   9 +-
 gcc/testsuite/gcc.target/i386/pr87007.c |  15 +++
 5 files changed, 193 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr87007.c

diff --git a/gcc/config/i386/i386-passes.def b/gcc/config/i386/i386-passes.def
index 4a6a95cc2d9..b439f3a9028 100644
--- a/gcc/config/i386/i386-passes.def
+++ b/gcc/config/i386/i386-passes.def
@@ -31,3 +31,5 @@ along with GCC; see the file COPYING3.  If not see
   INSERT_PASS_BEFORE (pass_cse2, 1, pass_stv, true /* timode_p */);
 
   INSERT_PASS_BEFORE (pass_shorten_branches, 1, pass_insert_endbranch);
+
+  INSERT_PASS_AFTER (pass_combine, 1, pass_remove_partial_avx_dependency);
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index d1d59633dc0..1626c9d0d70 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -358,3 +358,5 @@ class rtl_opt_pass;
 extern rtl_opt_pass *make_pass_insert_vzeroupper (gcc::context *);
 extern rtl_opt_pass *make_pass_stv (gcc::context *);
 extern rtl_opt_pass *make_pass_insert_endbranch (gcc::context *);
+extern rtl_opt_pass *make_pass_remove_partial_avx_dependency
+  (gcc::context *);
diff --git a/gcc/config/i386/i386.c 

Re: [ARM/FDPIC v2 00/21] FDPIC ABI for ARM

2018-08-28 Thread Christophe Lyon
Ping?

The original thread started at:
https://gcc.gnu.org/ml/gcc-patches/2018-07/msg00707.html
On Fri, 17 Aug 2018 at 00:19, Christophe Lyon
 wrote:
>
> Ping?
>
> On Wed, 1 Aug 2018 at 10:03, Christophe Lyon  wrote:
>>
>> Ping?
>>
>>
>> On 13/07/2018 18:10, christophe.l...@st.com wrote:
>> > From: Christophe Lyon 
>> >
>> > Hello,
>> >
>> > This patch series implements the GCC contribution of the FDPIC ABI for
>> > ARM targets.
>> >
>> > This ABI enables to run Linux on ARM MMU-less cores and supports
>> > shared libraries to reduce the memory footprint.
>> >
>> > Without an MMU, the relative distance between the text and data segments
>> > differs from one process to another, hence the need for a dedicated FDPIC
>> > register holding the start address of the data segment. One of the
>> > side effects is that function pointers require two words to be
>> > represented: the address of the code, and the data segment start
>> > address. These two words are designated as "Function Descriptor",
>> > hence the "FD PIC" name.
>> >
>> > On ARM, the FDPIC register is r9 [1], and the target name is
>> > arm-uclinuxfdpiceabi. Note that arm-uclinux exists, but uses another
>> > ABI and the BFLAT file format; it does not support code sharing.
>> > The -mfdpic option is enabled by default, and -mno-fdpic should be
>> > used to build the Linux kernel.
>> >
>> > This work was developed some time ago by STMicroelectronics, and was
>> > presented during Linaro Connect SFO15 (September 2015). You can watch
>> > the discussion and read the slides [2].
>> > This presentation was related to the toolchain published on github [3],
>> > which is based on binutils-2.22, gcc-4.7, uclibc-0.9.33.2, gdb-7.5.1
>> > and qemu-2.3.0, and for which pre-built binaries are available [3].
>> >
>> > The ABI itself is described in details in [1].
>> >
>> > Our Linux kernel patches have been updated and committed by Nicolas
>> > Pitre (Linaro) in July 2017. They are required so that the loader is
>> > able to handle this new file type. Indeed, the ELF files are tagged
>> > with ELFOSABI_ARM_FDPIC. This new tag has been allocated by ARM, as
>> > well as the new relocations involved.
>> >
>> > The binutils and QEMU patch series have been merged recently. [4][5]
>> >
>> > To build such a toolchain, you'd also need to use my uClibc branch[6].
>> > I have posted uclibc-ng patches for review [7]
>> >
>> > I am currently working on updating the patches for the remaining
>> > toolchain components: uclibc and gdb.
>> >
>> > This series provides support for ARM v7 architecture and has been
>> > tested on arm-linux-gnueabi without regression, as well as
>> > arm-uclinuxfdpiceabi, using QEMU. arm-uclinuxfdpiceabi has more
>> > failures than arm-linux-gnueabi, but is quite functional.
>> >
>> > Are the GCC patches OK for inclusion in master?
>> >
>> > Changes between v1 and v2:
>> > - fix GNU coding style
>> > - exit with an error for pre-Armv7
>> > - use ACLE __ARM_ARCH and remove dead code for pre-Armv4
>> > - remove unsupported attempts of pre-Armv7/thumb1 support
>> > - add instructions in comments next to opcodes
>> > - merge patches 11 and 13
>> > - fixed protected visibility handling in patch 8
>> > - merged legitimize_tls_address_fdpic and
>> >legitimize_tls_address_not_fdpic as requested
>> >
>> > Thanks,
>> >
>> > Christophe.
>> >
>> >
>> > [1] https://github.com/mickael-guene/fdpic_doc/blob/master/abi.txt
>> > [2] 
>> > http://connect.linaro.org/resource/sfo15/sfo15-406-arm-fdpic-toolset-kernel-libraries-for-cortex-m-cortex-r-mmuless-cores/
>> > [3] https://github.com/mickael-guene/fdpic_manifest
>> > [4] 
>> > https://sourceware.org/git/gitweb.cgi?p=binutils-gdb.git;a=commit;h=f1ac0afe481e83c9a33f247b81fa7de789edc4d9
>> > [5] 
>> > https://git.qemu.org/?p=qemu.git;a=commit;h=e8fa72957419c11984608062c7dcb204a6003a06
>> > [6] 
>> > https://git.linaro.org/people/christophe.lyon/uclibc.git/log/?h=uClibc-0.9.33.2-fdpic-upstream
>> > [7] https://mailman.uclibc-ng.org/pipermail/devel/2018-July/001705.html
>> >
>> > Christophe Lyon (21):
>> >[ARM] FDPIC: Add -mfdpic option support
>> >[ARM] FDPIC: Handle arm*-*-uclinuxfdpiceabi in configure scripts
>> >[ARM] FDPIC: Force FDPIC related options unless -mno-fdpic is provided
>> >[ARM] FDPIC: Add support for FDPIC for arm architecture
>> >[ARM] FDPIC: Fix __do_global_dtors_aux and frame_dummy generation
>> >[ARM] FDPIC: Add support for c++ exceptions
>> >[ARM] FDPIC: Avoid saving/restoring r9 on stack since it is RO
>> >[ARM] FDPIC: Ensure local/global binding for function descriptors
>> >[ARM] FDPIC: Add support for taking address of nested function
>> >[ARM] FDPIC: Implement TLS support.
>> >[ARM] FDPIC: Add support to unwind FDPIC signal frame
>> >[ARM] FDPIC: Restore r9 after we call __aeabi_read_tp
>> >[ARM] FDPIC: Force LSB bit for PC in Cortex-M architecture
>> >[ARM][testsuite] FDPIC: Skip unsupported tests
>> >[ARM][testsuite] FDPIC: Adjust 

Re: [PATCH 4/4] bb-reorder: convert to gcc_stablesort

2018-08-28 Thread Alexander Monakov
On Tue, 28 Aug 2018, Richard Biener wrote:

> On Tue, Aug 28, 2018 at 11:22 AM Alexander Monakov  wrote:
> >
> > This converts the use in bb-reorder.  I had to get a little bit creative 
> > with
> > the comparator as profile_count::operator> does not implement a strict weak
> > order.
> 
> So the previously used comparator was invalid?

Yes, when one of the profile_count values is !initialized_p () (which happens
in the testsuite).

> I think your proposed one
> warrants some comments.  Maybe trade speed for some clearer code?

I think it's not too complicated, but how about adding this comment:

  profile_count m = c1.max (c2);
  /* Return 0 if counts are equal, -1 if E1 has the larger count.  */
  return (m == c2) - (m == c1);

Or, alternatively, employ more branchy checks:

  if (c1 == c2)
return 0;
  if (c1 == c1.max (c2))
return -1;
  return 1;

> The comments on the operator<> implementations are odd as well:

(an aside, but: it also has strange redundant code like

  if (*this  == profile_count::zero ())
return false;
  if (other == profile_count::zero ())
return !(*this == profile_count::zero ());

where the second return could have been 'return true;')

>   /* Comparisons are three-state and conservative.  False is returned if
>  the inequality can not be decided.  */
>   bool operator< (const profile_count &other) const
> {
> 
> but bool can only represent a single state?  Are you supposed to do
> 
>   bool gt = count1 > count2;
>   bool le = count1 <= count2
>   if (!gt && !le)
> /* unordered */;
>   else if (gt)
> ...
>   else if (le)
> ...
> 
> ? 

*shudder*  I assume your question is directed to Honza, but IMO profile_count
comparison APIs could be easier to understand if there was a single function
implementing the meat of comparison and returning one of {unordered, equal,
less, greater}, and remaining operations would simply wrap it.

> And what do we want to do for the unordered case in bb-reorder?
> Tie with current order, thus return zero?

No, that wouldn't be valid for sorting.  My patch restores the previous
treatment (uninit/0-frequency edges are worse than any other).

Alexander


Re: [PATCH] Optimize more boolean functions

2018-08-28 Thread MCC CS
Hi again,

I guess I haven't been clear enough: I don't
have push access, so could you push my patch
to trunk for me?

It is the same patch as
https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01688.html
but with updated dates.

2018-08-28 MCC CS 

gcc/
PR tree-optimization/87009
* match.pd: Add boolean optimizations.

2018-08-28 MCC CS 

gcc/testsuite/
PR tree-optimization/87009
* gcc.dg/pr87009.c: New test.

Index: gcc/match.pd
===
--- gcc/match.pd (revision 263646)
+++ gcc/match.pd (working copy)
@@ -776,6 +776,11 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
(bit_not (bit_and:cs (bit_not @0) @1))
(bit_ior @0 (bit_not @1)))

+/* ~(~a | b) --> a & ~b */
+(simplify
+ (bit_not (bit_ior:cs (bit_not @0) @1))
+ (bit_and @0 (bit_not @1)))
+
/* Simplify (~X & Y) to X ^ Y if we know that (X & ~Y) is 0. */
#if GIMPLE
(simplify
@@ -981,6 +986,16 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
(bit_and:c (bit_ior:c @0 @1) (bit_xor:c @1 (bit_not @0)))
(bit_and @0 @1))

+/* (~x | y) & (x | ~y) -> ~(x ^ y) */
+(simplify
+ (bit_and (bit_ior:cs (bit_not @0) @1) (bit_ior:cs @0 (bit_not @1)))
+ (bit_not (bit_xor @0 @1)))
+
+/* (~x | y) ^ (x | ~y) -> x ^ y */
+(simplify
+ (bit_xor (bit_ior:c (bit_not @0) @1) (bit_ior:c @0 (bit_not @1)))
+ (bit_xor @0 @1))
+
/* ~x & ~y -> ~(x | y)
~x | ~y -> ~(x & y) */
(for op (bit_and bit_ior)
Index: gcc/testsuite/gcc.dg/pr87009.c
===
--- gcc/testsuite/gcc.dg/pr87009.c (nonexistent)
+++ gcc/testsuite/gcc.dg/pr87009.c (working copy)
@@ -0,0 +1,23 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fdump-tree-original" } */
+/* { dg-final { scan-tree-dump-times "return s \\^ x;" 4 "original" } } */
+
+int f1 (int x, int s)
+{
+ return ~(~(x|s)|x)|~(~(x|s)|s);
+}
+
+int f2 (int x, int s)
+{
+ return ~(~(~x)&~(x&~s));
+}
+
+int f3 (int x, int s)
+{
+ return ~((x|~s)&(~x|s));
+}
+
+int f4 (int x, int s)
+{
+ return (x|~s)^(~x|s);
+}


[PATCH] PR libstdc++/87116 fix path::lexically_normal() handling of dot-dot

2018-08-28 Thread Jonathan Wakely

Previously the logic that turned "a/b/c/../.." into "a/" failed to
preserve an empty path at the end of the iteration sequence, as required
by the trailing slash. That meant the result didn't meet the class
invariants, and that "a/b/c/d/../../.." would remove four components
instead of the three that "../../.." should remove.

PR libstdc++/87116
* src/filesystem/std-path.cc (path::lexically_normal): When handling
a dot-dot filename, preserve an empty final component in the iteration
sequence.
[_GLIBCXX_FILESYSTEM_IS_WINDOWS]: Use preferred-separator for
root-directory.
* testsuite/27_io/filesystem/path/generation/normal.cc: Add new tests
for more than two adjacent dot-dot filenames.
[_GLIBCXX_FILESYSTEM_IS_WINDOWS]: Replace slashes with
preferred-separator in expected normalized strings.

Tested x86_64-linux and x86_64-w64-mingw32.


commit e98e02af564a5836c885eec607f1a288d69e16c8
Author: Jonathan Wakely 
Date:   Tue Aug 28 13:55:09 2018 +0100

PR libstdc++/87116 fix path::lexically_normal() handling of dot-dot

Previously the logic that turned "a/b/c/../.." into "a/" failed to
preserve an empty path at the end of the iteration sequence, as required
by the trailing slash. That meant the result didn't meet the class
invariants, and that "a/b/c/d/../../.." would remove four components
instead of the three that "../../.." should remove.

PR libstdc++/87116
* src/filesystem/std-path.cc (path::lexically_normal): When handling
a dot-dot filename, preserve an empty final component in the 
iteration
sequence.
[_GLIBCXX_FILESYSTEM_IS_WINDOWS]: Use preferred-separator for
root-directory.
* testsuite/27_io/filesystem/path/generation/normal.cc: Add new 
tests
for more than two adjacent dot-dot filenames.
[_GLIBCXX_FILESYSTEM_IS_WINDOWS]: Replace slashes with
preferred-separator in expected normalized strings.

diff --git a/libstdc++-v3/src/filesystem/std-path.cc 
b/libstdc++-v3/src/filesystem/std-path.cc
index f6c0b8bb0f6..f382eb3759a 100644
--- a/libstdc++-v3/src/filesystem/std-path.cc
+++ b/libstdc++-v3/src/filesystem/std-path.cc
@@ -438,7 +438,7 @@ path::lexically_normal() const
 {
 #ifdef _GLIBCXX_FILESYSTEM_IS_WINDOWS
   // Replace each slash character in the root-name
-  if (p._M_type == _Type::_Root_name)
+  if (p._M_type == _Type::_Root_name || p._M_type == _Type::_Root_dir)
{
  string_type s = p.native();
  std::replace(s.begin(), s.end(), L'/', L'\\');
@@ -458,7 +458,8 @@ path::lexically_normal() const
}
  else if (!ret.has_relative_path())
{
- if (!ret.is_absolute())
+ // remove a dot-dot filename immediately after root-directory
+ if (!ret.has_root_directory())
ret /= p;
}
  else
@@ -471,8 +472,18 @@ path::lexically_normal() const
{
  // Remove the filename before the trailing slash
  // (equiv. to ret = ret.parent_path().remove_filename())
- ret._M_pathname.erase(elem._M_cur->_M_pos);
- ret._M_cmpts.erase(elem._M_cur, ret._M_cmpts.end());
+
+ if (elem == ret.begin())
+   ret.clear();
+ else
+   {
+ ret._M_pathname.erase(elem._M_cur->_M_pos);
+ // Do we still have a trailing slash?
+ if (std::prev(elem)->_M_type == _Type::_Filename)
+   ret._M_cmpts.erase(elem._M_cur);
+ else
+   ret._M_cmpts.erase(elem._M_cur, ret._M_cmpts.end());
+   }
}
  else // ???
ret /= p;
diff --git a/libstdc++-v3/testsuite/27_io/filesystem/path/generation/normal.cc 
b/libstdc++-v3/testsuite/27_io/filesystem/path/generation/normal.cc
index 6d0e2007a75..3b8311f81ad 100644
--- a/libstdc++-v3/testsuite/27_io/filesystem/path/generation/normal.cc
+++ b/libstdc++-v3/testsuite/27_io/filesystem/path/generation/normal.cc
@@ -24,7 +24,17 @@
 #include 
 
 using std::filesystem::path;
-using __gnu_test::compare_paths;
+
+void
+compare_paths(path p, std::string expected)
+{
+#if defined(_WIN32) && !defined(__CYGWIN__)
+  for (auto& c : expected)
+if (c == '/')
+  c = '\\';
+#endif
+  __gnu_test::compare_paths(p, expected);
+}
 
 void
 test01()
@@ -69,8 +79,11 @@ test03()
 {"/foo", "/foo" },
 {"/foo/"   , "/foo/" },
 {"/foo/."  , "/foo/" },
-{"/foo/bar/.." , "/foo/" },
 {"/foo/.." , "/" },
+{"/foo/../.."  , "/" },
+{"/foo/bar/.." , "/foo/" },
+{"/foo/bar/../.." , "/" },
+{"/foo/bar/baz/../../.." , "/" }, // PR libstdc++/87116
 
 {"/."  , "/" },
 {"/./" , "/" },
@@ -88,10 

[PATCH] Fix PR87126

2018-08-28 Thread Richard Biener


I am testing the following.

Richard.

2018-08-28  Richard Biener  

PR tree-optimization/87126
* tree-ssa-sccvn.c (vn_reference_insert): Remove assert.

* gcc.dg/tree-ssa/pr87126.c: New testcase.

Index: gcc/tree-ssa-sccvn.c
===
--- gcc/tree-ssa-sccvn.c(revision 263917)
+++ gcc/tree-ssa-sccvn.c(working copy)
@@ -2696,7 +2696,17 @@ vn_reference_insert (tree op, tree resul
  but save a lookup if we deal with already inserted refs here.  */
   if (*slot)
 {
-  gcc_assert (operand_equal_p ((*slot)->result, vr1->result, 0));
+  /* We cannot assert that we have the same value either because
+ when disentangling an irreducible region we may end up visiting
+a use before the corresponding def.  That's a missed optimization
+only though.  See gcc.dg/tree-ssa/pr87126.c for example.  */
+  if (dump_file && (dump_flags & TDF_DETAILS)
+ && !operand_equal_p ((*slot)->result, vr1->result, 0))
+   {
+ fprintf (dump_file, "Keeping old value ");
+ print_generic_expr (dump_file, (*slot)->result);
+ fprintf (dump_file, " because of collision\n");
+   }
   free_reference (vr1);
    obstack_free (&vn_tables_obstack, vr1);
   return;
Index: gcc/testsuite/gcc.dg/tree-ssa/pr87126.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/pr87126.c (nonexistent)
+++ gcc/testsuite/gcc.dg/tree-ssa/pr87126.c (working copy)
@@ -0,0 +1,25 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-fre1" } */
+
+int a, *b;
+
+void f ()
+{ 
+  int d = 0, e = d;
+  while (a++)
+;
+  if (e)
+goto L2;
+L1:
+  d = e;
+  b = &d;
+L2:
+  if (d)
+goto L1;
+}
+
+/* The load of d could be eliminated if we'd value-number the
+   irreducible region in RPO of the reducible result.  Likewise
+   a redundant store could be removed.  */
+/* { dg-final { scan-tree-dump-times "d = 0;" 1 "fre1" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-not " = d;" "fre1" { xfail *-*-* } } } */


[PATCH] doc: document -ftree-scev-cprop

2018-08-28 Thread Alexander Monakov
Hi,

PR 86726 is pointing out that -ftree-scev-cprop is not documented.

Here's documentation for the final-value-replacement aspect of the
option. It technically can also replace in-loop references by 
constants, but that doesn't seem very useful.

I'm not aware of any required order for option paragraphs; placing
new text between -floop-* and -fivopts.

PR other/86726
* invoke.texi (Optimization Options): List -ftree-scev-cprop.
(-O): Ditto.
(-ftree-scev-cprop): Document.

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index e37233d6ed4..de11ae27c95 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -464,7 +464,7 @@ Objective-C and Objective-C++ Dialects}.
 -ftree-loop-ivcanon  -ftree-loop-linear  -ftree-loop-optimize @gol
 -ftree-loop-vectorize @gol
 -ftree-parallelize-loops=@var{n}  -ftree-pre  -ftree-partial-pre  -ftree-pta @gol
--ftree-reassoc  -ftree-sink  -ftree-slsr  -ftree-sra @gol
+-ftree-reassoc  -ftree-scev-cprop  -ftree-sink  -ftree-slsr  -ftree-sra @gol
 -ftree-switch-conversion  -ftree-tail-merge @gol
 -ftree-ter  -ftree-vectorize  -ftree-vrp  -funconstrained-commons @gol
 -funit-at-a-time  -funroll-all-loops  -funroll-loops @gol
@@ -7813,6 +7813,7 @@ compilation time.
 -ftree-forwprop @gol
 -ftree-fre @gol
 -ftree-phiprop @gol
+-ftree-scev-cprop @gol
 -ftree-sink @gol
 -ftree-slsr @gol
 -ftree-sra @gol
@@ -9120,6 +9121,15 @@ determining number of iterations requires complicated analysis.  Later
 optimizations then may determine the number easily.  Useful especially
 in connection with unrolling.
 
+@item -ftree-scev-cprop
+@opindex ftree-scev-cprop
+Perform final value replacement.  If a variable is modified in a loop
+in a way that its value when exiting the loop can be determined using
+only its initial value and number of loop iterations, replace uses of
+the final value by such computation, provided it is sufficiently cheap.
+This reduces data dependencies and may allow further simplifications.
+Enabled by default at @option{-O} and higher.
+
 @item -fivopts
 @opindex fivopts
 Perform induction variable optimizations (strength reduction, induction


Re: [PATCH] Fix version check for ATTRIBUTE_GCC_DUMP_PRINTF

2018-08-28 Thread David Malcolm
On Tue, 2018-08-28 at 14:26 +0200, Jakub Jelinek wrote:
> On Tue, Aug 28, 2018 at 08:43:59AM +0200, Jakub Jelinek wrote:
> > On Mon, Aug 27, 2018 at 08:32:11PM -0400, David Malcolm wrote:
> > > gcc/ChangeLog:
> > >   * dumpfile.h (ATTRIBUTE_GCC_DUMP_PRINTF): Change version check
> > > on
> > >   GCC_VERSION for usage of "__gcc_dump_printf__" format from
> > >   >= 3005 to >= 9000.
> > 
> > Ok, thanks.
> 
> Another option would be to use __gcc_tdiag__ for GCC_VERSION >= 3005
> and <
> 9000, that differs from __gcc_dump_printf__ only in %E (tree vs.
> gimple *)
> and %D/%F/%V/%K added to __gcc_tdiag__, so worst case one would get a
> couple
> of rare warnings when %E is used somewhere; certainly better than
> having a
> warning on every TU that includes dumpfile.h.

Maybe, but I think that trying to decipher warnings based on something
that's close to the rules but not the same as them could get confusing
(especially if we add some new codes to __gcc_dump_printf__).

I've gone ahead with the simpler fix, as posted above, to stop the
flood of warnings (r263920).

Sorry again.

Dave



Re: [PATCH 3/3] or1k: gcc: initial support for openrisc

2018-08-28 Thread Jeff Law
On 08/26/2018 03:18 PM, Stafford Horne wrote:
> -mm-dd  Stafford Horne  
>   Richard Henderson  
> 
> gcc/ChangeLog:
> 
>   * common/config/or1k/or1k-common.c: New file.
>   * config/or1k/*: New.
>   * config.gcc (or1k*-*-*): New.
>   * configure.ac (or1k*-*-*): New test for openrisc tls.
>   * configure: Regenerated.
So I still want to do another looksie, but a few comments beyond what
Joseph has already pointed out.

Many functions are documented with "WORKER for xyz." kinds of comments.
Policy is that each function should have a comment which describes what the
function does at a high level, as well as its inputs and outputs.

Your port defines instruction scheduling, so please check that you're
emitting the proper barriers, particularly in your epilogue code.  In
particular most ports need a barrier to prevent movement of register
restores past the stack adjustment when the frame pointer is in use.

Please consider using function descriptors rather than trampolines.
This allows you to make the stack non-executable at all times which is
good from a security standpoint.  The downside is the indirect calling
mechanism has to change slightly to distinguish between a simple
indirect call and one through a function descriptor (usually by having a
low bit on to indicate the latter).  Given this is an ABI change, now is
probably the last opportunity to make this change.

I didn't see any provision for long vs short branches.  How do branches
to targets within the same function, but out of the range that can be
encoded in a single instruction, work?  Similarly, how is this
handled for function calls?  Note handling this gets even more complex
with delay slots :(

I don't know if openrisc has any speculation capabilities.  Please
define the SPECULATION_SAFE_VALUE hook appropriately for your
architecture.  If you don't have speculation, define it to
speculation_safe_value_not_needed.
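
For a port without speculation, the hook boils down to a one-line definition in the target's .c file (a sketch following the usual target-macro pattern; placement in the or1k port is an assumption):

```c
/* In config/or1k/or1k.c (hypothetical placement): the cores covered by
   the port do not speculate, so the generic "not needed" implementation
   of the hook is sufficient.  */
#undef TARGET_SPECULATION_SAFE_VALUE
#define TARGET_SPECULATION_SAFE_VALUE speculation_safe_value_not_needed
```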


I don't see anything glaringly wrong, but would like to take another
looksie once the issues raised by Joseph and the function doc issue are
addressed.

Jeff

ps.  1/3 and 2/3 look fine.



Re: [PATCH] Add a character size parameter to c_strlen/get_range_strlen

2018-08-28 Thread Bernd Edlinger
On 08/28/18 11:25, Bernd Edlinger wrote:
> On 08/28/18 04:55, Jeff Law wrote:
>> On 08/23/2018 08:48 AM, Bernd Edlinger wrote:
>>> On 08/23/18 16:24, Jeff Law wrote:
>
> Yes, and which one was the earlier, more controversial patch from me?

 https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01800.html


 Which is the issue I'm working through right now :-)

>>>
>>> Okay, please note that a re-based patch is here:
>>>
>>> https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01005.html
>>>
>>> and if you want, you can split that patch in two parts:
>>>
>>> first:
>>> 86711 fix:
>>> 2018-08-17  Bernd Edlinger  
>>>
>>>   PR middle-end/86711
>>>   * expr.c (string_constant): Don't return truncated string 
>>> literals.
>>>
>>>   * gcc.c-torture/execute/pr86711.c: New test.
>> Note that while this fixes your pr86711.c, it doesn't fix the larger set
>> of tests Martin has created for the issues in pr86711.
>>
>> Sigh.
>> jeff
>>
> 
> 
> Hmm.  which test do you mean?
> 
> I just looked at the tests from part1/6.
> 
> there I found test cases memchr-1.c, pr86714.c, strlenopt-56.c
> 
> test them against trunk from yesterday (only build stage1):
> $ svn info
> Path: .
> Working Copy Root Path: /home/ed/gnu/gcc-trunk
> URL: svn+ssh://edlin...@gcc.gnu.org/svn/gcc/trunk
> Relative URL: ^/trunk
> Repository Root: svn+ssh://edlin...@gcc.gnu.org/svn/gcc
> Repository UUID: 138bc75d-0d04-0410-961f-82ee72b054a4
> Revision: 263893
> Node Kind: directory
> Schedule: normal
> Last Changed Author: jakub
> Last Changed Rev: 263891
> Last Changed Date: 2018-08-27 20:36:23 +0200 (Mon, 27 Aug 2018)
> 
> memchr-1.c fail
> pr86714.c fail
> strlenopt-56.c pass
> 
> I don't know what this test case is trying to test, sorry.
> What is obvious about it:
> 
> static const wchar_t wsx[] = L"\x12345678";
> static const wchar_t ws4[] = L"\x00123456\x12005678\x12340078\x12345600";
> 
> This will be one more failed test case on AIX, given it uses -Wall.
> 
> Now I apply my pr86711 patch (only expr.c changed):
> 
> result
> memchr-1.c pass
> pr86714.c pass
> 

Wow, wait a moment.

When I wrote this: https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01089.html
with the original test results of the pr86711 patch I wrote:

"This time I attach the test results for your reference:
gcc-trunk-263644-test.txt: unchanged r263644
gcc-trunk-263644-0-test.txt: r263644 + this patch
gcc-trunk-263644-1-test.txt: r263644 + this patch without string_constant"

"this patch" means both parts,
"this patch without string_constant" means only fold-const.c changed.
string_constant is in expr.c.

I have mis-remembered that, deeply sorry about it.


So I try it the other way round:

apply pr86711 patch (only fold-const.c/h changed):

memchr-1.c pass
pr86714.c fail
strlenopt-56.c pass (which probably disqualifies that test case)

So this would have been the correct changelog:

2018-08-17  Bernd Edlinger  

 PR middle-end/86711
 * fold-const.c (c_getstr): Fix function comment.  Remove unused third
 argument.  Fix range checks.
 * fold-const.h (c_getstr): Adjust prototype.

 * gcc.c-torture/execute/pr86711.c: New test.



Bernd.


[PATCH][5/4] Fix PR87117

2018-08-28 Thread Richard Biener


Another one.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2018-08-28  Richard Biener  

PR tree-optimization/87117
* tree-ssa-sccvn.c (eliminate_dom_walker::eliminate_stmt): Only
re-value-number released SSA VDEFs.

* gfortran.dg/pr87117.f90: New testcase.

Index: gcc/tree-ssa-sccvn.c
===
--- gcc/tree-ssa-sccvn.c(revision 263917)
+++ gcc/tree-ssa-sccvn.c(working copy)
@@ -4979,8 +4979,14 @@ eliminate_dom_walker::eliminate_stmt (ba
  propagate_tree_value_into_stmt (gsi, sprime);
  stmt = gsi_stmt (*gsi);
  update_stmt (stmt);
+ /* In case the VDEF on the original stmt was released, value-number
+it to the VUSE.  This is to make vuse_ssa_val able to skip
+released virtual operands.  */
  if (vdef != gimple_vdef (stmt))
-   VN_INFO (vdef)->valnum = vuse;
+   {
+ gcc_assert (SSA_NAME_IN_FREE_LIST (vdef));
+ VN_INFO (vdef)->valnum = vuse;
+   }
 
  /* If we removed EH side-effects from the statement, clean
 its EH information.  */
@@ -5258,7 +5264,10 @@ eliminate_dom_walker::eliminate_stmt (ba
fprintf (dump_file, "  Removed AB side-effects.\n");
}
   update_stmt (stmt);
-  if (vdef != gimple_vdef (stmt))
+  /* In case the VDEF on the original stmt was released, value-number
+ it to the VUSE.  This is to make vuse_ssa_val able to skip
+released virtual operands.  */
+  if (vdef && SSA_NAME_IN_FREE_LIST (vdef))
VN_INFO (vdef)->valnum = vuse;
 }
 
Index: gcc/testsuite/gfortran.dg/pr87117.f90
===
--- gcc/testsuite/gfortran.dg/pr87117.f90   (nonexistent)
+++ gcc/testsuite/gfortran.dg/pr87117.f90   (working copy)
@@ -0,0 +1,14 @@
+! { dg-do compile }
+! { dg-options "-O" }
+program p
+   real(4) :: a, b
+   integer(4) :: n, m
+   equivalence (a, n)
+   a = 1024.0
+   m = 8
+   a = 1024.0
+   b = set_exponent(a, m)
+   n = 8
+   a = f(a, n)
+   b = set_exponent(a, m)
+end


[PATCH] Fix PR87124

2018-08-28 Thread Richard Biener


Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2018-08-28  Richard Biener  

PR tree-optimization/87124
* tree-ssa-sccvn.c (vn_lookup_simplify_result): Guard against
constants before looking up avail.

* g++.dg/torture/pr87124.C: New testcase.

Index: gcc/tree-ssa-sccvn.c
===
--- gcc/tree-ssa-sccvn.c(revision 263917)
+++ gcc/tree-ssa-sccvn.c(working copy)
@@ -5667,7 +5667,7 @@ vn_lookup_simplify_result (gimple_match_
   res_op->type, ops, );
   /* If this is used from expression simplification make sure to
  return an available expression.  */
-  if (res && mprts_hook && rpo_avail)
+  if (res && TREE_CODE (res) == SSA_NAME && mprts_hook && rpo_avail)
 res = rpo_avail->eliminate_avail (vn_context_bb, res);
   return res;
 }
Index: gcc/testsuite/g++.dg/torture/pr87124.C
===
--- gcc/testsuite/g++.dg/torture/pr87124.C  (nonexistent)
+++ gcc/testsuite/g++.dg/torture/pr87124.C  (working copy)
@@ -0,0 +1,12 @@
+// { dg-do compile }
+
+class A {
+void m_fn1();
+};
+
+void A::m_fn1()
+{
+  A *a = this;
+  for (int i; i && a;)
+a = 0;
+}


Re: [PATCH 4/4] bb-reorder: convert to gcc_stablesort

2018-08-28 Thread Richard Biener
On Tue, Aug 28, 2018 at 11:22 AM Alexander Monakov  wrote:
>
> This converts the use in bb-reorder.  I had to get a little bit creative with
> the comparator as profile_count::operator> does not implement a strict weak
> order.

So the previously used comparator was invalid?  I think your proposed one
warrants some comments.  Maybe trade speed for some clearer code?
The comments on the operator<> implementations are odd as well:

  /* Comparsions are three-state and conservative.  False is returned if
 the inequality can not be decided.  */
  bool operator< (const profile_count ) const
{

but bool can only represent a single state?  Are you supposed to do

  bool gt = count1 > count2;
  bool le = count1 <= count2
  if (!gt && !le)
/* unordered */;
  else if (gt)
...
  else if (le)
...

?  And what do we want to do for the unordered case in bb-reorder?
Tie with current order, thus return zero?

Richard.

> * gcc/bb-reorder.c (edge_order): Convert to C-qsort-style
> tri-state comparator.
> (reorder_basic_blocks_simple): Change std::stable_sort to 
> gcc_stablesort.
>
> diff --git a/gcc/bb-reorder.c b/gcc/bb-reorder.c
> index 57edde62c62..7c80609fbf4 100644
> --- a/gcc/bb-reorder.c
> +++ b/gcc/bb-reorder.c
> @@ -91,7 +91,6 @@
>  */
>
>  #include "config.h"
> -#define INCLUDE_ALGORITHM /* stable_sort */
>  #include "system.h"
>  #include "coretypes.h"
>  #include "backend.h"
> @@ -2351,13 +2350,17 @@ reorder_basic_blocks_software_trace_cache (void)
>FREE (bbd);
>  }
>
> -/* Return true if edge E1 is more desirable as a fallthrough edge than
> -   edge E2 is.  */
> +/* Order edges by execution frequency, higher first.  */
>
> -static bool
> -edge_order (edge e1, edge e2)
> +static int
> +edge_order (const void *ve1, const void *ve2)
>  {
> -  return e1->count () > e2->count ();
> +  edge e1 = *(const edge *) ve1;
> +  edge e2 = *(const edge *) ve2;
> +  profile_count c1 = e1->count ();
> +  profile_count c2 = e2->count ();
> +  profile_count m = c1.max (c2);
> +  return (m == c2) - (m == c1);
>  }
>
>  /* Reorder basic blocks using the "simple" algorithm.  This tries to
> @@ -2410,7 +2413,7 @@ reorder_basic_blocks_simple (void)
>   all edges are equally desirable.  */
>
>if (optimize_function_for_speed_p (cfun))
> -std::stable_sort (edges, edges + n, edge_order);
> +gcc_stablesort (edges, n, sizeof *edges, edge_order);
>
>/* Now decide which of those edges to make fallthrough edges.  We set
>   BB_VISITED if a block already has a fallthrough successor assigned
> --
> 2.13.3
>


Re: [C++ Patch] PR 86546 (" ICE on invalid: tree_class_check_failed()")

2018-08-28 Thread Nathan Sidwell

On 08/28/2018 08:43 AM, Paolo Carlini wrote:

Hi,

this low priority ICE on invalid happens only in c++98 mode but I think 
we can as well avoid it and have consistent error recovery across std 
modes. Essentially, the VAR_DECL for 'b' with erroneous TREE_TYPE (the 
type is incomplete) reaches cxx_eval_constant_expression and we ICE when 
we use COMPLETE_TYPE_P on the TREE_TYPE itself. I think we can - 
consistently across std modes - catch the erroneous type quite a bit 
earlier, midway in finish_case_label, thus before the case_conversion 
call which eventually leads to the ICE. Tested x86_64-linux.


ok


--
Nathan Sidwell


Re: [PATCH 3/4] tree-loop-distribution: convert to gcc_stablesort

2018-08-28 Thread Richard Biener
On Tue, Aug 28, 2018 at 11:17 AM Alexander Monakov  wrote:
>
> This converts std::stable_sort use in tree-loop-distribution to 
> gcc_stablesort.
>
> * gcc/tree-loop-distribution.c (offset_cmp): Convert to C-qsort-style
> tri-state comparator.
> (fuse_memset_builtins): Change std::stable_sort to gcc_stablesort.

OK.

Richard.

> diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c
> index 120661447f0..d8db03b545b 100644
> --- a/gcc/tree-loop-distribution.c
> +++ b/gcc/tree-loop-distribution.c
> @@ -90,7 +90,6 @@ along with GCC; see the file COPYING3.  If not see
> data reuse.  */
>
>  #include "config.h"
> -#define INCLUDE_ALGORITHM /* stable_sort */
>  #include "system.h"
>  #include "coretypes.h"
>  #include "backend.h"
> @@ -2561,12 +2560,14 @@ version_for_distribution_p (vec 
> *partitions,
>
>  /* Compare base offset of builtin mem* partitions P1 and P2.  */
>
> -static bool
> -offset_cmp (struct partition *p1, struct partition *p2)
> +static int
> +offset_cmp (const void *vp1, const void *vp2)
>  {
> -  gcc_assert (p1 != NULL && p1->builtin != NULL);
> -  gcc_assert (p2 != NULL && p2->builtin != NULL);
> -  return p1->builtin->dst_base_offset < p2->builtin->dst_base_offset;
> +  struct partition *p1 = *(struct partition *const *) vp1;
> +  struct partition *p2 = *(struct partition *const *) vp2;
> +  unsigned HOST_WIDE_INT o1 = p1->builtin->dst_base_offset;
> +  unsigned HOST_WIDE_INT o2 = p2->builtin->dst_base_offset;
> +  return (o2 < o1) - (o1 < o2);
>  }
>
>  /* Fuse adjacent memset builtin PARTITIONS if possible.  This is a special
> @@ -2618,8 +2619,8 @@ fuse_memset_builtins (vec 
> *partitions)
> }
>
>/* Stable sort is required in order to avoid breaking dependence.  */
> -  std::stable_sort (&(*partitions)[i],
> -   &(*partitions)[i] + j - i, offset_cmp);
> +  gcc_stablesort (&(*partitions)[i], j - i, sizeof (*partitions)[i],
> + offset_cmp);
>/* Continue with next partition.  */
>i = j;
>  }
> --
> 2.13.3
>


Re: [PATCH 2/4] introduce gcc_stablesort

2018-08-28 Thread Richard Biener
On Tue, Aug 28, 2018 at 11:14 AM Alexander Monakov  wrote:
>
> This adds a stable sort to sort.cc: mergesort implementation is stable, so
> we just need network sort limited to sizes 2-3 to get the complete sort 
> stable.
>
> As I don't want to duplicate code for this, I've chosen to have gcc_qsort
> accept bit-inverted 'size' parameter to request stable sorting.

OK.

> * gcc/sort.cc (struct sort_ctx): New field 'nlim'. Use it...
> (mergesort): ... here as maximum count for using netsort.
> (gcc_qsort): Set nlim to 3 if stable sort is requested.
> (gcc_stablesort): New.
> * gcc/system.h (gcc_stablesort): Declare.
>
> diff --git a/gcc/sort.cc b/gcc/sort.cc
> index 9f8ee12e13b..b3be1eac72b 100644
> --- a/gcc/sort.cc
> +++ b/gcc/sort.cc
> @@ -55,6 +55,7 @@ struct sort_ctx
>char   *out; // output buffer
>size_t n;// number of elements
>size_t size; // element size
> +  size_t nlim; // limit for network sort
>  };
>
>  /* Helper for netsort. Permute, possibly in-place, 2 or 3 elements,
> @@ -178,7 +179,7 @@ do {  \
>  static void
>  mergesort (char *in, sort_ctx *c, size_t n, char *out, char *tmp)
>  {
> -  if (likely (n <= 5))
> +  if (likely (n <= c->nlim))
>  {
>c->out = out;
>c->n = n;
> @@ -221,8 +222,12 @@ gcc_qsort (void *vbase, size_t n, size_t size, cmp_fn 
> *cmp)
>  {
>if (n < 2)
>  return;
> +  size_t nlim = 5;
> +  bool stable = (ssize_t) size < 0;
> +  if (stable)
> +nlim = 3, size = ~size;
>char *base = (char *)vbase;
> -  sort_ctx c = {cmp, base, n, size};
> +  sort_ctx c = {cmp, base, n, size, nlim};
>long long scratch[32];
>size_t bufsz = (n / 2) * size;
>void *buf = bufsz <= sizeof scratch ? scratch : xmalloc (bufsz);
> @@ -233,3 +238,9 @@ gcc_qsort (void *vbase, size_t n, size_t size, cmp_fn 
> *cmp)
>qsort_chk (vbase, n, size, cmp);
>  #endif
>  }
> +
> +void
> +gcc_stablesort (void *vbase, size_t n, size_t size, cmp_fn *cmp)
> +{
> +  gcc_qsort (vbase, n, ~size, cmp);
> +}
> diff --git a/gcc/system.h b/gcc/system.h
> index 203c6a4f0cf..100feb567c9 100644
> --- a/gcc/system.h
> +++ b/gcc/system.h
> @@ -1202,6 +1202,8 @@ helper_const_non_const_cast (const char *p)
> corresponding to vec::qsort (cmp): they use C qsort internally anyway.  */
>  void qsort_chk (void *, size_t, size_t, int (*)(const void *, const void *));
>  void gcc_qsort (void *, size_t, size_t, int (*)(const void *, const void *));
> +void gcc_stablesort (void *, size_t, size_t,
> +int (*)(const void *, const void *));
>  #define PP_5th(a1, a2, a3, a4, a5, ...) a5
>  #undef qsort
>  #define qsort(...) PP_5th (__VA_ARGS__, gcc_qsort, 3, 2, qsort, 0) 
> (__VA_ARGS__)
> --
> 2.13.3
>


Re: [PATCH 3/3] or1k: gcc: initial support for openrisc

2018-08-28 Thread Stafford Horne
Hi Joseph,

On Mon, Aug 27, 2018 at 05:24:51PM +, Joseph Myers wrote:
> On Mon, 27 Aug 2018, Stafford Horne wrote:
> 
> >  gcc/config/or1k/elf.opt  |   33 +
> 
> >  gcc/config/or1k/or1k.opt |   41 +
> 
> Command-line options need documenting in invoke.texi.  This patch is 
> missing documentation updates.  Please see sourcebuild.texi, section "Back 
> End", for a list of the various places that need updating for a new target 
> architecture, including the various places in the documentation that need 
> updating if the architecture has any instances of the feature documented 
> there.

Sure I will document these.  I was thinking about what needed to be documented,
thanks for pointing this out.

> > +#define TARGET_FLOAT_FORMAT IEEE_FLOAT_FORMAT
> 
> This target macro was obsoleted in 2008.  You shouldn't be defining it 
> here.
> 
> Whereas TARGET_FLOAT_FORMAT is missing a poisoning in system.h, such as 
> should be done for obsolete target macros to make it easy to spot when an 
> out-of-tree port is defining them, the obsolete NO_IMPLICIT_EXTERN_C, 
> which you're defining in or1k/linux.h, is properly poisoned in system.h, 
> so I'd have expected the build for that target to fail.

I will fix the TARGET_FLOAT_FORMAT.  However, I am not sure why
NO_IMPLICIT_EXTERN_C is not causing a compile failure.  This definitely has been
working and tested with musl libc.

I will let you know what I find.

> 
> You're using /lib/ld.so.1 as GLIBC_DYNAMIC_LINKER.  Preferred glibc 
> practice for new ports is to have an architecture-specific, ABI-specific 
> dynamic linker name, not shared with any other ABI or architecture, to 
> support distributions using multi-arch directory arrangements.

Sure, I will do that.  We have an existing glibc port which I haven't touched
before, I will see what it expects and set it up that way.

> Why the __BIG_ENDIAN__ built-in macro?  Since we have the 
> architecture-independent __BYTE_ORDER__ macro, new architectures shouldn't 
> generally define their own macros related to byte-order (especially for an 
> architecture that only supports one endianness!) unless it's trying to 
> implement some compiler-independent API for that architecture that says 
> such macros are defined.

I think this just comes from me copying an example from m32r.  I don't think it's
needed and I will remove it.

Thank you for the review, I would not have spotted these.  If anything is
missing in the gccint docs which might make it easier for future porters to not
make these mistakes I will try to add it too.

-Stafford



[C++ Patch] PR 86546 (" ICE on invalid: tree_class_check_failed()")

2018-08-28 Thread Paolo Carlini

Hi,

this low priority ICE on invalid happens only in c++98 mode but I think 
we can as well avoid it and have consistent error recovery across std 
modes. Essentially, the VAR_DECL for 'b' with erroneous TREE_TYPE (the 
type is incomplete) reaches cxx_eval_constant_expression and we ICE when 
we use COMPLETE_TYPE_P on the TREE_TYPE itself. I think we can - 
consistently across std modes - catch the erroneous type quite a bit 
earlier, midway in finish_case_label, thus before the case_conversion 
call which eventually leads to the ICE. Tested x86_64-linux.


Thanks, Paolo.

/

/cp
2018-08-28  Paolo Carlini  

PR c++/86546
* decl.c (finish_case_label): If the type is erroneous early
return error_mark_node.

/testsuite
2018-08-28  Paolo Carlini  

PR c++/86546
* g++.dg/other/switch4.C: New.
Index: cp/decl.c
===
--- cp/decl.c   (revision 263914)
+++ cp/decl.c   (working copy)
@@ -3662,6 +3662,8 @@ finish_case_label (location_t loc, tree low_value,
 return error_mark_node;
 
   type = SWITCH_STMT_TYPE (switch_stack->switch_stmt);
+  if (type == error_mark_node)
+return error_mark_node;
 
   low_value = case_conversion (type, low_value);
   high_value = case_conversion (type, high_value);
Index: testsuite/g++.dg/other/switch4.C
===
--- testsuite/g++.dg/other/switch4.C(nonexistent)
+++ testsuite/g++.dg/other/switch4.C(working copy)
@@ -0,0 +1,6 @@
+// PR c++/86546
+
+class a b;  // { dg-error "aggregate" }
+void c() {
+  switch ()  // { dg-error "expected" }
+  case b  // { dg-error "expected" }


Re: [Patch, fortran] PRs 80477 and 86481 - memory leaks following function calls.

2018-08-28 Thread Paul Richard Thomas
Committed as r263916.

Thanks for taking a look over it, Thomas.

Paul

On Mon, 27 Aug 2018 at 20:26, Thomas Koenig  wrote:
>
> Hi Paul,
>
> > Bootstrapped and regtested on FC28/x86_64 - OK for trunk?
>
> OK, and thanks for the patch!
>
> Regards
>
> Thomas



-- 
"If you can't explain it simply, you don't understand it well enough"
- Albert Einstein


Re: [GCC][PATCH v2][Aarch64] Exploiting BFXIL when OR-ing two AND-operations with appropriate bitmasks

2018-08-28 Thread Sam Tebbs

ping

https://gcc.gnu.org/ml/gcc-patches/2018-08/msg00108.html


On 08/01/2018 04:07 PM, Sam Tebbs wrote:

Hi all,

This patch adds an optimisation that exploits the AArch64 BFXIL
instruction when or-ing the result of two bitwise and operations
with non-overlapping bitmasks
(e.g. (a & 0x) | (b & 0x)).

Example:

unsigned long long combine(unsigned long long a, unsigned long
long b) {
  return (a & 0xll) | (b & 0xll);
}

void read(unsigned long long a, unsigned long long b, unsigned
long long *c) {
  *c = combine(a, b);
}

When compiled with -O2, read would result in:

read:
  and   x5, x1, #0x
  and   x4, x0, #0x
  orr   x4, x4, x5
  str   x4, [x2]
  ret

But with this patch results in:

read:
  mov    x4, x0
  bfxil    x4, x1, 0, 32
  str    x4, [x2]
  ret

Bootstrapped and regtested on aarch64-none-linux-gnu and
aarch64-none-elf with no regressions.


gcc/
2018-08-01  Sam Tebbs

    PR target/85628
    * config/aarch64/aarch64.md (*aarch64_bfxil):
    Define.
    * config/aarch64/constraints.md (Ulc): Define
    * config/aarch64/aarch64-protos.h
    (aarch64_is_left_consecutive): Define.
    * config/aarch64/aarch64.c (aarch64_is_left_consecutive):
    New function.

gcc/testsuite
2018-08-01  Sam Tebbs

    PR target/85628
    * gcc.target/aarch64/combine_bfxil.c: New file.
    * gcc.target/aarch64/combine_bfxil_2.c: New file.





Re: [PATCH] Fix version check for ATTRIBUTE_GCC_DUMP_PRINTF

2018-08-28 Thread Jakub Jelinek
On Tue, Aug 28, 2018 at 08:43:59AM +0200, Jakub Jelinek wrote:
> On Mon, Aug 27, 2018 at 08:32:11PM -0400, David Malcolm wrote:
> > gcc/ChangeLog:
> > * dumpfile.h (ATTRIBUTE_GCC_DUMP_PRINTF): Change version check on
> > GCC_VERSION for usage of "__gcc_dump_printf__" format from
> > >= 3005 to >= 9000.
> 
> Ok, thanks.

Another option would be to use __gcc_tdiag__ for GCC_VERSION >= 3005 and <
9000, that differs from __gcc_dump_printf__ only in %E (tree vs. gimple *)
and %D/%F/%V/%K added to __gcc_tdiag__, so worst case one would get a couple
of rare warnings when %E is used somewhere; certainly better than having a
warning on every TU that includes dumpfile.h.

Jakub


[PATCH][GCC][AArch64] Add support for SVE stack clash probing [patch (2/7)]

2018-08-28 Thread Tamar Christina
Hi all,

This patch adds basic support for SVE stack clash protection.
It is a first implementation and will use a loop to do the
probing and stack adjustments.

An example sequence is:

        .cfi_startproc
        mov     x15, sp
        cntb    x16, all, mul #11
        add     x16, x16, 304
        .cfi_def_cfa_register 15
.SVLPSRL0:
        cmp     x16, 4096
        b.lt    .BRRCHK0
        sub     sp, sp, 4096
        str     xzr, [sp, 1024]
        sub     x16, x16, 4096
        b       .SVLPSRL0
.BRRCHK0:
        sub     sp, sp, x16
        cmp     sp, 1024
        b.lt    .BERCHK0
        str     xzr, [sp, 1024]
.BERCHK0:
        .cfi_escape 0xf,0xc,0x8f,0,0x92,0x2e,0,0x8,0x58,0x1e,0x23,0xb0,0x2,0x22

This has about the same semantics as alloca.


Bootstrapped Regtested on aarch64-none-linux-gnu and no issues in sve testsuite.
Target was tested with stack clash on and off by default.

Ok for trunk?

Thanks,
Tamar

gcc/
2018-08-28  Tamar Christina  

PR target/86486
* config/aarch64/aarch64-protos.h 
(aarch64_output_probe_sve_stack_clash): New.
* config/aarch64/aarch64.c (aarch64_output_probe_sve_stack_clash): New.
(aarch64_allocate_and_probe_stack_space): Add SVE specific section.
* config/aarch64/aarch64.md (probe_sve_stack_clash): New.

gcc/testsuite/
2018-08-28  Tamar Christina  

PR target/86486
* gcc.target/aarch64/stack-check-prologue-12.c: New test
* gcc.target/aarch64/stack-check-cfa-3.c: New test.

-- 
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index ef95fc829b83886e2ff00e4664e31af916e99b0c..e2d8734a8d5e513588e3b0318e9c67fdaebdf0d4 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -453,6 +453,7 @@ void aarch64_asm_output_labelref (FILE *, const char *);
 void aarch64_cpu_cpp_builtins (cpp_reader *);
 const char * aarch64_gen_far_branch (rtx *, int, const char *, const char *);
 const char * aarch64_output_probe_stack_range (rtx, rtx);
+const char * aarch64_output_probe_sve_stack_clash (rtx, rtx, rtx, rtx);
 void aarch64_err_no_fpadvsimd (machine_mode);
 void aarch64_expand_epilogue (bool);
 void aarch64_expand_mov_immediate (rtx, rtx, rtx (*) (rtx, rtx) = 0);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 06451f38b11822ea77323438fe8c7e373eb9e614..e7efde79bb111e820f4df44a276f6f73070ecd17 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -3970,6 +3970,90 @@ aarch64_output_probe_stack_range (rtx reg1, rtx reg2)
   return "";
 }
 
+/* Emit the probe loop for doing stack clash probes and stack adjustments for
+   SVE.  This emits probes from BASE to BASE + ADJUSTMENT based on a guard size
+   of GUARD_SIZE and emits a probe when at least LIMIT bytes are allocated.  By
+   the end of this function BASE = BASE + ADJUSTMENT.  */
+
+const char *
+aarch64_output_probe_sve_stack_clash (rtx base, rtx adjustment, rtx limit,
+  rtx guard_size)
+{
+  /* This function is not allowed to use any instruction generation function
+ like gen_ and friends.  If you do you'll likely ICE during CFG validation,
+ so instead emit the code you want using output_asm_insn.  */
+  gcc_assert (flag_stack_clash_protection);
+  gcc_assert (CONST_INT_P (limit) && CONST_INT_P (guard_size));
+  gcc_assert (aarch64_uimm12_shift (INTVAL (limit)));
+  gcc_assert (aarch64_uimm12_shift (INTVAL (guard_size)));
+
+  static int labelno = 0;
+  char loop_start_lab[32];
+  char loop_res_lab[32];
+  char loop_end_lab[32];
+  rtx xops[2];
+
+  ASM_GENERATE_INTERNAL_LABEL (loop_start_lab, "SVLPSRL", labelno);
+  ASM_GENERATE_INTERNAL_LABEL (loop_res_lab, "BRRCHK", labelno);
+  ASM_GENERATE_INTERNAL_LABEL (loop_end_lab, "BERCHK", labelno++);
+
+  /* Emit loop start label.  */
+  ASM_OUTPUT_INTERNAL_LABEL (asm_out_file, loop_start_lab);
+
+  /* Test if ADJUSTMENT < GUARD_SIZE.  */
+  xops[0] = adjustment;
+  xops[1] = guard_size;
+  output_asm_insn ("cmp\t%0, %1", xops);
+
+  /* Branch to residual loop if it is.  */
+  fputs ("\tb.lt\t", asm_out_file);
+  assemble_name_raw (asm_out_file, loop_res_lab);
+  fputc ('\n', asm_out_file);
+
+  /* BASE = BASE - GUARD_SIZE.  */
+  xops[0] = base;
+  xops[1] = guard_size;
+  output_asm_insn ("sub\t%0, %0, %1", xops);
+
+  /* Probe at BASE + LIMIT.  */
+  xops[1] = limit;
+  output_asm_insn ("str\txzr, [%0, %1]", xops);
+
+  /* ADJUSTMENT = ADJUSTMENT - GUARD_SIZE.  */
+  xops[0] = adjustment;
+  xops[1] = guard_size;
+  output_asm_insn ("sub\t%0, %0, %1", xops);
+
+  /* Branch to loop start.  */
+  fputs ("\tb\t", asm_out_file);
+  assemble_name_raw (asm_out_file, loop_start_lab);
+  fputc ('\n', asm_out_file);
+
+  /* Emit residual check label.  */
+  ASM_OUTPUT_INTERNAL_LABEL (asm_out_file, loop_res_lab);
+
+  /* BASE = BASE - ADJUSTMENT.  */
+  xops[0] = base;
+  xops[1] = adjustment;
+  output_asm_insn ("sub\t%0, %0, %1", xops);
+
+  /* Test if BASE < LIMIT.  */
+  

RE: [PATCH][GCC][AArch64] Updated stack-clash implementation supporting 64k probes. [patch (1/7)]

2018-08-28 Thread Tamar Christina
Hi All,

As requested this patch series now contains basic SVE support, following that
I am updating this patch to remove the error/warnings generated when SVE is 
used.

The series now consists of 7 patches but I will only send updates for those 
that changed.


Ok for trunk?

Thanks,
Tamar

gcc/
2018-08-28  Jeff Law  
Richard Sandiford 
Tamar Christina  

PR target/86486
* config/aarch64/aarch64.md
(probe_stack_range): Add k (SP) constraint.
* config/aarch64/aarch64.h (STACK_CLASH_CALLER_GUARD,
STACK_CLASH_MAX_UNROLL_PAGES): New.
* config/aarch64/aarch64.c (aarch64_output_probe_stack_range): Emit
stack probes for stack clash.
(aarch64_allocate_and_probe_stack_space): New.
(aarch64_expand_prologue): Use it.
(aarch64_expand_epilogue): Likewise and update IP regs re-use criteria.
(aarch64_sub_sp): Add emit_move_imm optional param.

gcc/testsuite/
2018-08-28  Jeff Law  
Richard Sandiford 
Tamar Christina  

PR target/86486
* gcc.target/aarch64/stack-check-12.c: New.
* gcc.target/aarch64/stack-check-13.c: New.
* gcc.target/aarch64/stack-check-cfa-1.c: New.
* gcc.target/aarch64/stack-check-cfa-2.c: New.
* gcc.target/aarch64/stack-check-prologue-1.c: New.
* gcc.target/aarch64/stack-check-prologue-10.c: New.
* gcc.target/aarch64/stack-check-prologue-11.c: New.
* gcc.target/aarch64/stack-check-prologue-2.c: New.
* gcc.target/aarch64/stack-check-prologue-3.c: New.
* gcc.target/aarch64/stack-check-prologue-4.c: New.
* gcc.target/aarch64/stack-check-prologue-5.c: New.
* gcc.target/aarch64/stack-check-prologue-6.c: New.
* gcc.target/aarch64/stack-check-prologue-7.c: New.
* gcc.target/aarch64/stack-check-prologue-8.c: New.
* gcc.target/aarch64/stack-check-prologue-9.c: New.
* gcc.target/aarch64/stack-check-prologue.h: New.
* lib/target-supports.exp
(check_effective_target_supports_stack_clash_protection): Add AArch64.


> -Original Message-
> From: gcc-patches-ow...@gcc.gnu.org 
> On Behalf Of Tamar Christina
> Sent: Tuesday, August 7, 2018 11:10
> To: Jeff Law 
> Cc: gcc-patches@gcc.gnu.org; nd ; James Greenhalgh
> ; Richard Earnshaw
> ; Marcus Shawcroft
> 
> Subject: RE: [PATCH][GCC][AArch64] Updated stack-clash implementation
> supporting 64k probes. [patch (1/6)]
> 
> Hi All,
> 
> This is  a re-spin of the patch to address review comments.
> It mostly just adds more comments and corrects typos.
> 
> 
> Ok for trunk?
> 
> Thanks,
> Tamar
> 
> gcc/
> 2018-08-07  Jeff Law  
>   Richard Sandiford 
>   Tamar Christina  
> 
>   PR target/86486
>   * config/aarch64/aarch64.md (cmp,
>   probe_stack_range): Add k (SP) constraint.
>   * config/aarch64/aarch64.h (STACK_CLASH_CALLER_GUARD,
>   STACK_CLASH_MAX_UNROLL_PAGES): New.
>   * config/aarch64/aarch64.c (aarch64_output_probe_stack_range):
> Emit
>   stack probes for stack clash.
>   (aarch64_allocate_and_probe_stack_space): New.
>   (aarch64_expand_prologue): Use it.
>   (aarch64_expand_epilogue): Likewise and update IP regs re-use
> criteria.
>   (aarch64_sub_sp): Add emit_move_imm optional param.
> 
> gcc/testsuite/
> 2018-08-07  Jeff Law  
>   Richard Sandiford 
>   Tamar Christina  
> 
>   PR target/86486
>   * gcc.target/aarch64/stack-check-12.c: New.
>   * gcc.target/aarch64/stack-check-13.c: New.
>   * gcc.target/aarch64/stack-check-cfa-1.c: New.
>   * gcc.target/aarch64/stack-check-cfa-2.c: New.
>   * gcc.target/aarch64/stack-check-prologue-1.c: New.
>   * gcc.target/aarch64/stack-check-prologue-10.c: New.
>   * gcc.target/aarch64/stack-check-prologue-11.c: New.
>   * gcc.target/aarch64/stack-check-prologue-2.c: New.
>   * gcc.target/aarch64/stack-check-prologue-3.c: New.
>   * gcc.target/aarch64/stack-check-prologue-4.c: New.
>   * gcc.target/aarch64/stack-check-prologue-5.c: New.
>   * gcc.target/aarch64/stack-check-prologue-6.c: New.
>   * gcc.target/aarch64/stack-check-prologue-7.c: New.
>   * gcc.target/aarch64/stack-check-prologue-8.c: New.
>   * gcc.target/aarch64/stack-check-prologue-9.c: New.
>   * gcc.target/aarch64/stack-check-prologue.h: New.
>   * lib/target-supports.exp
>   (check_effective_target_supports_stack_clash_protection): Add
> AArch64.
> 
> > -Original Message-
> > From: Jeff Law 
> > Sent: Friday, August 3, 2018 19:05
> > To: Tamar Christina 
> > Cc: gcc-patches@gcc.gnu.org; nd ; James Greenhalgh
> > ; Richard Earnshaw
> > ; Marcus Shawcroft
> > 
> > Subject: Re: [PATCH][GCC][AArch64] Updated stack-clash implementation
> > supporting 64k probes. [patch (1/6)]
> >
> > On 07/25/2018 05:09 AM, Tamar Christina wrote:
> > > Hi All,
> > >
> > > Attached is an updated patch that clarifies some of 

RE: [PATCH][GCC][AArch64] Cleanup the AArch64 testsuite when stack-clash is on [Patch (7/7)]

2018-08-28 Thread Tamar Christina
Hi All,

Since this patch series now contains SVE support I am removing the changes
to the SVE tests in this patch series.  I assume the OK still stands as the
only change here is undoing updates to three files.

Thanks,
Tamar

gcc/testsuite/
2018-08-28  Tamar Christina  

PR target/86486
* gcc.dg/pr82788.c: Skip for AArch64.
* gcc.dg/guality/vla-1.c: Turn off stack-clash.
* gcc.target/aarch64/subsp.c: Likewise.
* gcc.dg/params/blocksort-part.c: Skip stack-clash checks
on AArch64.
* gcc.dg/stack-check-10.c: Add AArch64 specific checks.
* gcc.dg/stack-check-5.c: Add AArch64 specific checks.
* gcc.dg/stack-check-6a.c: Skip on AArch64, we don't support this.
* testsuite/lib/target-supports.exp
(check_effective_target_frame_pointer_for_non_leaf): AArch64 does not
require frame pointer for non-leaf functions.


> -Original Message-
> From: Jeff Law 
> Sent: Friday, August 3, 2018 19:30
> To: Tamar Christina ; gcc-patches@gcc.gnu.org
> Cc: nd ; James Greenhalgh ;
> Richard Earnshaw ; Marcus Shawcroft
> 
> Subject: Re: [PATCH][GCC][AArch64] Cleanup the AArch64 testsuite when
> stack-clash is on [Patch (6/6)]
> 
> On 07/24/2018 04:28 AM, tamar.christ...@arm.com wrote:
> > Hi All,
> >
> > This patch cleans up the testsuite when a run is done with stack clash
> > protection turned on.
> >
> > Concretely this switches off -fstack-clash-protection for a couple of tests:
> >
> > * sve: We don't yet support stack-clash-protection and sve, so for now turn
> these off.
> > * assembler scan: some tests are quite fragile in that they check for exact
> >assembly output, e.g. check for exact amount of sub etc.  These won't
> >match now.
> > * vla: Some of the ubsan tests negative array indices. Because the arrays
> weren't
> >used before the incorrect $sp wouldn't have been used. The correct
> value is
> >restored on ret.  Now however we probe the $sp which causes a
> segfault.
> > * params: When testing the parameters we have to skip these on AArch64
> because of our
> >   custom constraints on them.  We already test them separately so 
> > this
> isn't a
> >   loss.
> >
> > Note that the testsuite is not entire clean due to gdb failure caused
> > by alloca with stack clash. On AArch64 we output an incorrect .loc
> > directive, but this is already the case with the current implementation in
> GCC and is a bug unrelated to this patch series.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
> and no issues.
> > Both targets were tested with stack clash on and off by default.
> >
> > Ok for trunk?
> >
> > Thanks,
> > Tamar
> >
> > gcc/testsuite/
> > 2018-07-24  Tamar Christina  
> >
> > PR target/86486
> > * gcc.dg/pr82788.c: Skip for AArch64.
> > * gcc.dg/guality/vla-1.c: Turn off stack-clash.
> > * gcc.target/aarch64/subsp.c: Likewise.
> > * gcc.target/aarch64/sve/mask_struct_load_3.c: Likewise.
> > * gcc.target/aarch64/sve/mask_struct_store_3.c: Likewise.
> > * gcc.target/aarch64/sve/mask_struct_store_4.c: Likewise.
> > * gcc.dg/params/blocksort-part.c: Skip stack-clash checks
> > on AArch64.
> > * gcc.dg/stack-check-10.c: Add AArch64 specific checks.
> > * gcc.dg/stack-check-5.c: Add AArch64 specific checks.
> > * gcc.dg/stack-check-6a.c: Skip on AArch64, we don't support this.
> > * testsuite/lib/target-supports.exp
> > (check_effective_target_frame_pointer_for_non_leaf): AArch64
> does not
> > require frame pointer for non-leaf functions.
> >
> >> -Original Message-
> >> From: Tamar Christina 
> >> Sent: Wednesday, July 11, 2018 12:23
> >> To: gcc-patches@gcc.gnu.org
> >> Cc: nd ; James Greenhalgh
> ;
> >> Richard Earnshaw ; Marcus Shawcroft
> >> 
> >> Subject: [PATCH][GCC][AArch64] Cleanup the AArch64 testsuite when
> >> stack- clash is on [Patch (6/6)]
> >>
> >> Hi All,
> >>
> >> This patch cleans up the testsuite when a run is done with stack
> >> clash protection turned on.
> >>
> >> Concretely this switches off -fstack-clash-protection for a couple of 
> >> tests:
> >>
> >> * sve: We don't yet support stack-clash-protection and sve, so for
> >> now turn these off.
> >> * assembler scan: some tests are quite fragile in that they check for exact
> >>assembly output, e.g. check for exact amount of sub etc.  These 
> >> won't
> >>match now.
> >> * vla: Some of the ubsan tests negative array indices. Because the
> >> arrays weren't
> >>used before the incorrect $sp wouldn't have been used. The
> >> correct value is
> >>restored on ret.  Now however we probe the $sp which causes a
> segfault.
> >> * params: When testing the parameters we have to skip these on
> >> AArch64 because of our
> >>   custom constraints on them.  We already test them
> >> separately so this isn't a
> >>   loss.
> >>
> >> Note that the testsuite is not 

PR85787: Extend malloc_candidate_p to handle multiple phis.

2018-08-28 Thread Prathamesh Kulkarni
Hi,
The attached patch extends malloc_candidate_p to handle multiple phis.
There's a lot of noise in the patch because I moved most of
malloc_candidate_p into
new function malloc_candidate_p_1. The only real change is following hunk:

+   gimple *arg_def = SSA_NAME_DEF_STMT (arg);
+   if (is_a (arg_def))
+ {
+   if (!malloc_candidate_p_1 (fun, arg, phi, ipa))
+   DUMP_AND_RETURN ("nested phi fail")
+   continue;
+ }
+

This hunk recursively checks that each phi argument is used only within
comparisons against 0 and within the phi itself.

Bootstrapped+tested on x86_64-unknown-linux-gnu.
OK to commit ?

Thanks,
Prathamesh
2018-08-28  Prathamesh Kulkarni  

PR tree-optimization/85787
* ipa-pure-const.c (malloc_candidate_p_1): Move most of 
malloc_candidate_p
into this function and add support for detecting multiple phis.
(DUMP_AND_RETURN): Move from malloc_candidate_p into top-level macro.

diff --git a/gcc/ipa-pure-const.c b/gcc/ipa-pure-const.c
index a9a8863d907..66c81be23ec 100644
--- a/gcc/ipa-pure-const.c
+++ b/gcc/ipa-pure-const.c
@@ -869,14 +869,6 @@ check_retval_uses (tree retval, gimple *stmt)
   and return_stmt (and likewise a phi arg has immediate use only within 
comparison
   or the phi stmt).  */
 
-static bool
-malloc_candidate_p (function *fun, bool ipa)
-{
-  basic_block exit_block = EXIT_BLOCK_PTR_FOR_FN (fun);
-  edge e;
-  edge_iterator ei;
-  cgraph_node *node = cgraph_node::get_create (fun->decl);
-
 #define DUMP_AND_RETURN(reason)  \
 {  \
   if (dump_file && (dump_flags & TDF_DETAILS))  \
@@ -885,6 +877,96 @@ malloc_candidate_p (function *fun, bool ipa)
   return false;  \
 }
 
+static bool
+malloc_candidate_p_1 (function *fun, tree retval, gimple *ret_stmt, bool ipa)
+{
+  cgraph_node *node = cgraph_node::get_create (fun->decl);
+
+  if (!check_retval_uses (retval, ret_stmt))
+DUMP_AND_RETURN("Return value has uses outside return stmt"
+   " and comparisons against 0.")
+
+  gimple *def = SSA_NAME_DEF_STMT (retval);
+
+  if (gcall *call_stmt = dyn_cast <gcall *> (def))
+{
+  tree callee_decl = gimple_call_fndecl (call_stmt);
+  if (!callee_decl)
+   return false;
+
+  if (!ipa && !DECL_IS_MALLOC (callee_decl))
+   DUMP_AND_RETURN("callee_decl does not have malloc attribute for"
+   " non-ipa mode.")
+
+  cgraph_edge *cs = node->get_edge (call_stmt);
+  if (cs)
+   {
+ ipa_call_summary *es = ipa_call_summaries->get_create (cs);
+ es->is_return_callee_uncaptured = true;
+   }
+}
+
+else if (gphi *phi = dyn_cast <gphi *> (def))
+  {
+   bool all_args_zero = true;
+   for (unsigned i = 0; i < gimple_phi_num_args (phi); ++i)
+ {
+   tree arg = gimple_phi_arg_def (phi, i);
+   if (integer_zerop (arg))
+ continue;
+
+   all_args_zero = false;
+   if (TREE_CODE (arg) != SSA_NAME)
+ DUMP_AND_RETURN ("phi arg is not SSA_NAME.");
+   if (!check_retval_uses (arg, phi))
+ DUMP_AND_RETURN ("phi arg has uses outside phi"
+" and comparisons against 0.")
+
+   gimple *arg_def = SSA_NAME_DEF_STMT (arg);
+   if (is_a <gphi *> (arg_def))
+ {
+   if (!malloc_candidate_p_1 (fun, arg, phi, ipa))
+   DUMP_AND_RETURN ("nested phi fail")
+   continue;
+ }
+
+   gcall *call_stmt = dyn_cast <gcall *> (arg_def);
+   if (!call_stmt)
+ DUMP_AND_RETURN ("phi arg is a not a call_stmt.")
+
+   tree callee_decl = gimple_call_fndecl (call_stmt);
+   if (!callee_decl)
+ return false;
+   if (!ipa && !DECL_IS_MALLOC (callee_decl))
+ DUMP_AND_RETURN("callee_decl does not have malloc attribute"
+ " for non-ipa mode.")
+
+   cgraph_edge *cs = node->get_edge (call_stmt);
+   if (cs)
+ {
+   ipa_call_summary *es = ipa_call_summaries->get_create (cs);
+   es->is_return_callee_uncaptured = true;
+ }
+ }
+
+   if (all_args_zero)
+ DUMP_AND_RETURN ("Return value is a phi with all args equal to 0.")
+  }
+
+else
+  DUMP_AND_RETURN("def_stmt of return value is not a call or phi-stmt.")
+
+  return true;
+}
+
+static bool
+malloc_candidate_p (function *fun, bool ipa)
+{
+  basic_block exit_block = EXIT_BLOCK_PTR_FOR_FN (fun);
+  edge e;
+  edge_iterator ei;
+  cgraph_node *node = cgraph_node::get_create (fun->decl);
+
   if (EDGE_COUNT (exit_block->preds) == 0
   || !flag_delete_null_pointer_checks)
 return false;
@@ -905,80 +987,17 @@ malloc_candidate_p (function *fun, bool ipa)
  || TREE_CODE (TREE_TYPE (retval)) != POINTER_TYPE)
DUMP_AND_RETURN("Return value is not SSA_NAME or not a pointer type.")
 
-  if (!check_retval_uses (retval, 

[6/6] Link imm uses for pattern stmts

2018-08-28 Thread Richard Sandiford
One of the warts of the vectoriser IR is that it doesn't link SSA name
uses for pattern statements, leading to complicated special cases in
vect_mark_stmts_to_be_vectorized and (especially) vect_detect_hybrid_slp.
It also makes it harder to check how an SSA name is used after pattern
replacement (something I need for a later patch).

This patch adds a mode in which tree-ssa-operands.c can update
statements in the same non-invasive way as for debug statements.
It then uses this mode to update pattern statements when adding
them to a vec_basic_block, so that pattern statements become
even more like statements that existed from the outset.


2018-08-28  Richard Sandiford  

gcc/
* tree-ssa-operands.h (update_stmt_operands): Add a transparent_p
argument.
* tree-ssa-operands.c (opf_transparent, opf_sticky): New macros.
(get_mem_ref_operands, get_tmr_operands): Preserve opf_sticky
rather than just opf_no_vops.
(get_expr_operands): Preserve opf_sticky bits in the use flags.
Assert that opf_no_vops and opf_transparent are already set
for the debug statements.  Use opf_transparent rather than
is_gimple_debug when deciding whether to mark something as
having its address taken.
(parse_ssa_operands): Add a transparent_p argument.  Set the
opf_no_vops and opf_transparent flags when the argument is true,
or when dealing with debug statements.  Check opf_no_vops before
adding vuses and vdefs.
(build_ssa_operands): Add a transparent_p argument and pass it to
parse_ssa_operands.
(verify_ssa_operands): Update call to parse_ssa_operands.
(update_stmt_operands): Add a transparent_p argument and pass it to
build_ssa_operands.
* gimple-ssa.h (update_stmt, update_stmt_if_modified)
(update_stmt_fn): Add an optional transparent_p parameter and
update call to update_stmt_operands.
* tree-vect-slp.c (vect_detect_hybrid_slp_1): Delete.
(vect_detect_hybrid_slp_2): Likewise.
(vect_detect_hybrid_slp): Don't treat pattern statements specially.
* tree-vect-stmts.c (vect_mark_stmts_to_be_vectorized): Likewise.
(vect_remove_dead_scalar_stmts): Remove pattern statements from
the containing vec_info.
* tree-vectorizer.h (vec_info::add_pattern_stmt_to_block): Declare.
* tree-vectorizer.c (vec_basic_block::add_to_end)
(vec_basic_block::add_before): Call add_pattern_stmt_to_block.
(vec_basic_block::remove, vec_info::remove_stmt): Call
remove_pattern_stmt_from_block.
(vec_basic_block::add_pattern_stmt_to_block): New function.
(remove_pattern_stmt_from_block): Likewise.
(vec_info::free_stmt_vec_info): Handle pattern statements.
(vec_info::lookup_single_use): Accept pattern statements
as well as original statements.  Ignore uses in statements
that have been replaced by a pattern statement.
* tree-vect-patterns.c (vect_init_pattern_stmt): Don't call
gimple_set_bb.
(vect_look_through_possible_promotion): Use vinfo->lookup_single_use
instead of has_single_use.  Track single uses for pattern statements
too.

Index: gcc/tree-ssa-operands.h
===
--- gcc/tree-ssa-operands.h 2018-05-02 08:37:32.405761509 +0100
+++ gcc/tree-ssa-operands.h 2018-08-28 12:05:19.262917177 +0100
@@ -94,7 +94,7 @@ extern void init_ssa_operands (struct fu
 extern void fini_ssa_operands (struct function *);
 extern bool verify_ssa_operands (struct function *, gimple *stmt);
 extern void free_stmt_operands (struct function *, gimple *);
-extern void update_stmt_operands (struct function *, gimple *);
+extern void update_stmt_operands (struct function *, gimple *, bool);
 extern void swap_ssa_operands (gimple *, tree *, tree *);
 extern bool verify_imm_links (FILE *f, tree var);
 
Index: gcc/tree-ssa-operands.c
===
--- gcc/tree-ssa-operands.c 2018-08-28 11:25:46.242879876 +0100
+++ gcc/tree-ssa-operands.c 2018-08-28 12:05:19.262917177 +0100
@@ -99,6 +99,14 @@ #define opf_not_non_addressable (1 << 4)
 /* Operand is having its address taken.  */
 #define opf_address_taken (1 << 5)
 
+/* Operand must have no effect on code generation.  This is used for
+   debug statements, and also for statements that a pass has no intention
+   of adding to the block in their current form.  */
+#define opf_transparent (1 << 6)
+
+/* Flags that must never be dropped.  */
+#define opf_sticky (opf_no_vops | opf_transparent)
+
 /* Array for building all the use operands.  */
 static vec<tree *> build_uses;
 
@@ -590,7 +598,7 @@ get_mem_ref_operands (struct function *f
   /* If requested, add a USE operand for the base pointer.  */
   get_expr_operands (fn, stmt, pptr,
 opf_non_addressable | opf_use
-

[5/6] Insert pattern statements into vec_basic_blocks

2018-08-28 Thread Richard Sandiford
The point of this patch is to put pattern statements in the same
vec_basic_block as the statements they replace, with the pattern
statements for S coming between S and S's original predecessor.
This removes the need to handle them specially in various places.


2018-08-28  Richard Sandiford  

gcc/
* tree-vectorizer.h (vec_basic_block): Expand comment.
(_stmt_vec_info::pattern_def_seq): Delete.
(STMT_VINFO_PATTERN_DEF_SEQ): Likewise.
(is_main_pattern_stmt_p): New function.
* tree-vect-loop.c (vect_determine_vf_for_stmt_1): Rename to...
(vect_determine_vf_for_stmt): ...this, deleting the original
function with this name.  Remove vectype_maybe_set_p argument
and test is_pattern_stmt_p instead.  Retain the "examining..."
message from the previous vect_determine_vf_for_stmt.
(vect_compute_single_scalar_iteration_cost, vect_update_vf_for_slp)
(vect_analyze_loop_2): Don't treat pattern statements specially.
(vect_transform_loop): Likewise.  Use vect_orig_stmt to find the
insertion point.
* tree-vect-slp.c (vect_detect_hybrid_slp): Expect pattern statements
to be in the statement list, without needing to follow
STMT_VINFO_RELATED_STMT.  Remove PATTERN_DEF_SEQ handling.
* tree-vect-stmts.c (vect_analyze_stmt): Don't handle pattern
statements specially.
(vect_remove_dead_scalar_stmts): Ignore pattern statements.
* tree-vect-patterns.c (vect_set_pattern_stmt): Insert the pattern
statement into the vec_basic_block immediately before the statement
it replaces.
(append_pattern_def_seq): Likewise.  If the original statement is
itself a pattern statement, associate the new one with the original
statement.
(vect_split_statement): Use append_pattern_def_seq to insert the
first pattern statement.
(vect_recog_vector_vector_shift_pattern): Remove mention of
STMT_VINFO_PATTERN_DEF_SEQ.
(adjust_bool_stmts): Get the last pattern statement from the
stmt_vec_info chain.
(vect_mark_pattern_stmts): Rename to...
(vect_replace_stmt_with_pattern): ...this.  Remove the
PATTERN_DEF_SEQ handling and process only the pattern statement given.
Use append_pattern_def_seq when replacing a pattern statement with
another pattern statement, and use vec_basic_block::remove instead
of gsi_remove to remove the old one.
(vect_pattern_recog_1): Update accordingly.  Remove PATTERN_DEF_SEQ
handling.  On failure, remove any half-formed pattern sequence from
the vec_basic_block.  Install the vector type in pattern statements
that don't yet have one.
(vect_pattern_recog): Iterate over statements that are added
by previous recognizers, but skipping those that have already
been replaced, or the main pattern statement in such a replacement.

Index: gcc/tree-vectorizer.h
===
--- gcc/tree-vectorizer.h   2018-08-28 12:05:14.014961439 +0100
+++ gcc/tree-vectorizer.h   2018-08-28 12:05:16.522940287 +0100
@@ -172,7 +172,13 @@ #define SLP_TREE_TWO_OPERATORS(S)   (S)-
 #define SLP_TREE_DEF_TYPE(S)(S)->def_type
 
 /* Information about the phis and statements in a block that we're trying
-   to vectorize, in their original order.  */
+   to vectorize.  This includes the phis and statements that were in the
+   original scalar code, in their original order.  It also includes any
+   pattern statements that the vectorizer has created to replace some
+   of the scalar ones.  Such pattern statements come immediately before
+   the statement that they replace; that is, all pattern statements P for
+   which vect_orig_stmt (P) == S form a sequence that comes immediately
+   before S.  */
 class vec_basic_block
 {
 public:
@@ -870,11 +876,6 @@ struct _stmt_vec_info {
 pattern).  */
   stmt_vec_info related_stmt;
 
-  /* Used to keep a sequence of def stmts of a pattern stmt if such exists.
- The sequence is attached to the original statement rather than the
- pattern statement.  */
-  gimple_seq pattern_def_seq;
-
   /* List of datarefs that are known to have the same alignment as the dataref
  of this stmt.  */
   vec same_align_refs;
@@ -1048,7 +1049,6 @@ #define STMT_VINFO_DR_INFO(S) \
 
 #define STMT_VINFO_IN_PATTERN_P(S) (S)->in_pattern_p
 #define STMT_VINFO_RELATED_STMT(S) (S)->related_stmt
-#define STMT_VINFO_PATTERN_DEF_SEQ(S)  (S)->pattern_def_seq
 #define STMT_VINFO_SAME_ALIGN_REFS(S)  (S)->same_align_refs
 #define STMT_VINFO_SIMD_CLONE_INFO(S) (S)->simd_clone_info
 #define STMT_VINFO_DEF_TYPE(S) (S)->def_type
@@ -1176,6 +1176,17 @@ is_pattern_stmt_p (stmt_vec_info stmt_in
   return stmt_info->pattern_stmt_p;
 }
 
+/* Return TRUE if a statement represented by 

[4/6] Make the vectoriser do its own DCE

2018-08-28 Thread Richard Sandiford
The vectoriser normally leaves a later DCE pass to remove the scalar
code, but we've accumulated various special cases for things that
DCE can't handle, such as removing the scalar stores that have been
replaced by vector stores, and the scalar calls to internal functions.
(The latter must be removed for correctness, since no underlying scalar
optabs exist for those calls.)

Now that vec_basic_block gives us an easy way of iterating over the
original scalar code (ignoring any new code inserted by the vectoriser),
it seems easier to do the DCE directly.  This involves marking the few
cases in which the vector code needs part of the original scalar code
to be kept around.


2018-08-28  Richard Sandiford  

gcc/
* tree-vectorizer.h (_stmt_vec_info::used_by_vector_code_p): New
member variable.
(vect_mark_used_by_vector_code): Declare.
(vect_remove_dead_scalar_stmts): Likewise.
(vect_transform_stmt): Return void.
(vect_remove_stores): Delete.
* tree-vectorizer.c (vec_info::remove_stmt): Handle phis.
* tree-vect-stmts.c (vect_mark_used_by_vector_code): New function.
(vectorizable_call, vectorizable_simd_clone_call): Don't remove
scalar calls here.
(vectorizable_shift): Mark shift amounts in a vector-scalar shift
as being used by the vector code.
(vectorizable_load): Mark unhoisted scalar loads that feed a
load-and-broadcast operation as being needed by the vector code.
(vect_transform_stmt): Return void.
(vect_remove_stores): Delete.
(vect_maybe_remove_scalar_stmt): New function.
(vect_remove_dead_scalar_stmts): Likewise.
* tree-vect-slp.c (vect_slp_bb): Call vect_remove_dead_scalar_stmts.
(vect_remove_slp_scalar_calls): Delete.
(vect_schedule_slp): Don't call it.  Don't remove scalar stores here.
* tree-vect-loop.c (vectorizable_reduction): Mark scalar phis that
are retained by the vector code.
(vectorizable_live_operation): Mark scalar live-out statements that
are retained by the vector code.
(vect_transform_loop_stmt): Remove seen_store argument.  Mark gconds
in nested loops as being needed by the vector code.  Replace the
outer loop's gcond with a dummy condition.
(vect_transform_loop): Update calls accordingly.  Don't remove
scalar stores or calls here; call vect_remove_dead_scalar_stmts
instead.
* tree-vect-loop-manip.c (vect_update_ivs_after_vectorizer): Don't
remove PHIs that were classified as reductions but never actually
vectorized.

Index: gcc/tree-vectorizer.h
===
--- gcc/tree-vectorizer.h   2018-08-28 12:05:11.466982927 +0100
+++ gcc/tree-vectorizer.h   2018-08-28 12:05:14.014961439 +0100
@@ -925,6 +925,10 @@ struct _stmt_vec_info {
   /* For both loads and stores.  */
   bool simd_lane_access_p;
 
+  /* True if the vectorized code keeps this statement in its current form.
+ Only meaningful for statements that were in the original scalar code.  */
+  bool used_by_vector_code_p;
+
   /* Classifies how the load or store is going to be implemented
  for loop vectorization.  */
   vect_memory_access_type memory_access_type;
@@ -1522,6 +1526,7 @@ extern stmt_vec_info vect_finish_replace
 extern stmt_vec_info vect_finish_stmt_generation (stmt_vec_info, gimple *,
  gimple_stmt_iterator *);
 extern bool vect_mark_stmts_to_be_vectorized (loop_vec_info);
+extern void vect_mark_used_by_vector_code (stmt_vec_info);
 extern tree vect_get_store_rhs (stmt_vec_info);
 extern tree vect_get_vec_def_for_operand_1 (stmt_vec_info, enum vect_def_type);
 extern tree vect_get_vec_def_for_operand (tree, stmt_vec_info, tree = NULL);
@@ -1532,9 +1537,8 @@ extern void vect_get_vec_defs_for_stmt_c
 extern tree vect_init_vector (stmt_vec_info, tree, tree,
   gimple_stmt_iterator *);
 extern tree vect_get_vec_def_for_stmt_copy (vec_info *, tree);
-extern bool vect_transform_stmt (stmt_vec_info, gimple_stmt_iterator *,
+extern void vect_transform_stmt (stmt_vec_info, gimple_stmt_iterator *,
 slp_tree, slp_instance);
-extern void vect_remove_stores (stmt_vec_info);
 extern bool vect_analyze_stmt (stmt_vec_info, bool *, slp_tree, slp_instance,
   stmt_vector_for_cost *);
 extern bool vectorizable_condition (stmt_vec_info, gimple_stmt_iterator *,
@@ -1554,6 +1558,7 @@ extern gcall *vect_gen_while (tree, tree
 extern tree vect_gen_while_not (gimple_seq *, tree, tree, tree);
 extern bool vect_get_vector_types_for_stmt (stmt_vec_info, tree *, tree *);
 extern tree vect_get_mask_type_for_stmt (stmt_vec_info);
+extern void vect_remove_dead_scalar_stmts (vec_info *);
 
 /* In tree-vect-data-refs.c.  */
 extern bool vect_can_force_dr_alignment_p (const_tree, 

[3/6] Add a vec_basic_block structure

2018-08-28 Thread Richard Sandiford
This patch adds a vec_basic_block that records the scalar phis and
scalar statements that we need to vectorise.  This is a slight
simplification in its own right, since it avoids unnecesary statement
lookups and shaves >50 LOC.  But the main reason for doing it is
to allow the rest of the series to treat pattern statements less
specially.

Putting phis (which are logically parallel) and normal statements
(which are logically serial) into a single list might seem dangerous,
but I think in practice it should be fine.  Very little vectoriser
code needs to handle the parallel nature of phis specially, and code
that does can still do so.  Having a single list simplifies code that
wants to look at every scalar phi or stmt in isolation.

The new test is for cases in which we try to hoist the same expression
twice, which I broke with an earlier version of the patch.


2018-08-28  Richard Sandiford  

gcc/
* tree-vectorizer.h (vec_basic_block): New structure.
(vec_info::blocks, _stmt_vec_info::block, _stmt_vec_info::prev)
(_stmt_vec_info::next): New member variables.
(FOR_EACH_VEC_BB_STMT, FOR_EACH_VEC_BB_STMT_REVERSE): New macros.
(vec_basic_block::vec_basic_block): New function.
* tree-vectorizer.c (vec_basic_block::add_to_end): Likewise.
(vec_basic_block::add_before): Likewise.
(vec_basic_block::remove): Likewise.
(vec_info::~vec_info): Free the vec_basic_blocks.
(vec_info::remove_stmt): Remove the statement from the containing
vec_basic_block.
* tree-vect-patterns.c (vect_determine_precisions)
(vect_pattern_recog): Iterate over vec_basic_blocks.
* tree-vect-loop.c (vect_determine_vectorization_factor)
(vect_compute_single_scalar_iteration_cost, vect_update_vf_for_slp)
(vect_analyze_loop_operations, vect_transform_loop): Likewise.
(_loop_vec_info::_loop_vec_info): Construct vec_basic_blocks.
* tree-vect-slp.c (_bb_vec_info::_bb_vec_info): Likewise.
(vect_detect_hybrid_slp): Iterate over vec_basic_blocks.
* tree-vect-stmts.c (vect_mark_stmts_to_be_vectorized): Likewise.
(vect_finish_replace_stmt, vectorizable_condition): Remove the original
statement from the containing block.
(hoist_defs_of_uses): Likewise the statement that we're hoisting.

gcc/testsuite/
* gcc.dg/vect/vect-hoist-1.c: New test.

Index: gcc/tree-vectorizer.h
===
--- gcc/tree-vectorizer.h   2018-08-28 12:05:08.743005901 +0100
+++ gcc/tree-vectorizer.h   2018-08-28 12:05:11.466982927 +0100
@@ -171,7 +171,30 @@ #define SLP_TREE_LOAD_PERMUTATION(S)
 #define SLP_TREE_TWO_OPERATORS(S)   (S)->two_operators
 #define SLP_TREE_DEF_TYPE(S)(S)->def_type
 
+/* Information about the phis and statements in a block that we're trying
+   to vectorize, in their original order.  */
+class vec_basic_block
+{
+public:
+  vec_basic_block (basic_block);
+
+  void add_to_end (stmt_vec_info);
+  void add_before (stmt_vec_info, stmt_vec_info);
+  void remove (stmt_vec_info);
+
+  basic_block bb () const { return m_bb; }
+  stmt_vec_info first () const { return m_first; }
+  stmt_vec_info last () const { return m_last; }
+
+private:
+  /* The block itself.  */
+  basic_block m_bb;
 
+  /* The first and last statements in the block, forming a doubly-linked list.
+ The list includes both phis and true statements.  */
+  stmt_vec_info m_first;
+  stmt_vec_info m_last;
+};
 
 /* Describes two objects whose addresses must be unequal for the vectorized
loop to be valid.  */
@@ -249,6 +272,9 @@ struct vec_info {
   /* Cost data used by the target cost model.  */
   void *target_cost_data;
 
+  /* The basic blocks in the vectorization region.  */
+  auto_vec<vec_basic_block *> blocks;
+
 private:
   stmt_vec_info new_stmt_vec_info (gimple *stmt);
   void set_vinfo_for_stmt (gimple *, stmt_vec_info);
@@ -776,6 +802,11 @@ struct dr_vec_info {
 typedef struct data_reference *dr_p;
 
 struct _stmt_vec_info {
+  /* The block to which the statement belongs, or null if none.  */
+  vec_basic_block *block;
+
+  /* Link chains for the previous and next statements in BLOCK.  */
+  stmt_vec_info prev, next;
 
   enum stmt_vec_info_type type;
 
@@ -1072,6 +1103,27 @@ #define VECT_SCALAR_BOOLEAN_TYPE_P(TYPE)
&& TYPE_PRECISION (TYPE) == 1   \
&& TYPE_UNSIGNED (TYPE)))
 
+/* Make STMT_INFO iterate over each statement in vec_basic_block VEC_BB
+   in forward order.  */
+
+#define FOR_EACH_VEC_BB_STMT(VEC_BB, STMT_INFO) \
+  for (stmt_vec_info STMT_INFO = (VEC_BB)->first (); STMT_INFO; \
+   STMT_INFO = STMT_INFO->next)
+
+/* Make STMT_INFO iterate over each statement in vec_basic_block VEC_BB
+   in backward order.  */
+
+#define FOR_EACH_VEC_BB_STMT_REVERSE(VEC_BB, STMT_INFO) \
+  for (stmt_vec_info STMT_INFO = (VEC_BB)->last (); STMT_INFO; \
+   STMT_INFO = STMT_INFO->prev)
+
+/* 

[2/6] Make vec_info::lookup_single_use take a stmt_vec_info

2018-08-28 Thread Richard Sandiford
All callers to vec_info::lookup_single_use are asking about the lhs of a
statement that they're already examining.  It makes more sense to pass
that statement instead of the SSA name, since it is then easier to
handle statements that have been replaced by pattern statements.

A later patch adds support for pattern statements.  This one just
changes the interface to make that possible.


2018-08-28  Richard Sandiford  

gcc/
* tree-vectorizer.h (vec_info::lookup_single_use): Take a
stmt_vec_info instead of an SSA name.
* tree-vectorizer.c (vec_info::lookup_single_use): Likewise.
* tree-vect-loop.c (vectorizable_reduction): Update call
accordingly.
* tree-vect-stmts.c (supportable_widening_operation): Likewise.

Index: gcc/tree-vectorizer.h
===
--- gcc/tree-vectorizer.h   2018-08-01 16:14:45.107097179 +0100
+++ gcc/tree-vectorizer.h   2018-08-28 12:05:08.743005901 +0100
@@ -220,7 +220,7 @@ struct vec_info {
   stmt_vec_info add_stmt (gimple *);
   stmt_vec_info lookup_stmt (gimple *);
   stmt_vec_info lookup_def (tree);
-  stmt_vec_info lookup_single_use (tree);
+  stmt_vec_info lookup_single_use (stmt_vec_info);
   struct dr_vec_info *lookup_dr (data_reference *);
   void move_dr (stmt_vec_info, stmt_vec_info);
   void remove_stmt (stmt_vec_info);
Index: gcc/tree-vectorizer.c
===
--- gcc/tree-vectorizer.c   2018-08-21 14:47:07.795167843 +0100
+++ gcc/tree-vectorizer.c   2018-08-28 12:05:08.743005901 +0100
@@ -544,13 +544,14 @@ vec_info::lookup_def (tree name)
   return NULL;
 }
 
-/* See whether there is a single non-debug statement that uses LHS and
-   whether that statement has an associated stmt_vec_info.  Return the
-   stmt_vec_info if so, otherwise return null.  */
+/* See whether there is a single non-debug use of the lhs of STMT_INFO
+   and whether the containing statement has an associated stmt_vec_info.
+   Return the stmt_vec_info if so, otherwise return null.  */
 
 stmt_vec_info
-vec_info::lookup_single_use (tree lhs)
+vec_info::lookup_single_use (stmt_vec_info stmt_info)
 {
+  tree lhs = gimple_get_lhs (stmt_info->stmt);
   use_operand_p dummy;
   gimple *use_stmt;
   if (single_imm_use (lhs, , _stmt))
Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c2018-08-28 12:05:06.207027289 +0100
+++ gcc/tree-vect-loop.c2018-08-28 12:05:08.743005901 +0100
@@ -6180,7 +6180,7 @@ vectorizable_reduction (stmt_vec_info st
   stmt_vec_info use_stmt_info;
   if (ncopies > 1
  && STMT_VINFO_RELEVANT (reduc_stmt_info) <= vect_used_only_live
- && (use_stmt_info = loop_vinfo->lookup_single_use (phi_result))
+ && (use_stmt_info = loop_vinfo->lookup_single_use (stmt_info))
  && vect_stmt_to_vectorize (use_stmt_info) == reduc_stmt_info)
single_defuse_cycle = true;
 
@@ -6947,10 +6947,9 @@ vectorizable_reduction (stmt_vec_info st
in vectorizable_reduction and there are no intermediate stmts
participating.  */
   stmt_vec_info use_stmt_info;
-  tree reduc_phi_result = gimple_phi_result (reduc_def_phi);
   if (ncopies > 1
   && (STMT_VINFO_RELEVANT (stmt_info) <= vect_used_only_live)
-  && (use_stmt_info = loop_vinfo->lookup_single_use (reduc_phi_result))
+  && (use_stmt_info = loop_vinfo->lookup_single_use (reduc_def_info))
   && vect_stmt_to_vectorize (use_stmt_info) == stmt_info)
 {
   single_defuse_cycle = true;
Index: gcc/tree-vect-stmts.c
===
--- gcc/tree-vect-stmts.c   2018-08-28 12:05:06.211027256 +0100
+++ gcc/tree-vect-stmts.c   2018-08-28 12:05:08.743005901 +0100
@@ -10237,8 +10237,8 @@ supportable_widening_operation (enum tre
  same operation.  One such an example is s += a * b, where elements
  in a and b cannot be reordered.  Here we check if the vector 
defined
  by STMT is only directly used in the reduction statement.  */
- tree lhs = gimple_assign_lhs (stmt_info->stmt);
- stmt_vec_info use_stmt_info = loop_info->lookup_single_use (lhs);
+ stmt_vec_info use_stmt_info
+   = loop_info->lookup_single_use (stmt_info);
  if (use_stmt_info
  && STMT_VINFO_DEF_TYPE (use_stmt_info) == vect_reduction_def)
return true;


[1/6] Handle gphis in gimple_get_lhs

2018-08-28 Thread Richard Sandiford
Several callers of gimple_get_lhs deal with statements that might
be phis.  This patch makes gimple_get_lhs handle that case too,
so that the callers don't have to.


2018-08-28  Richard Sandiford  

gcc/
* gimple.c (gimple_get_lhs): Handle gphis.
* tree-ssa-phionlycprop.c (get_lhs_or_phi_result): Delete and...
(propagate_rhs_into_lhs, eliminate_const_or_copy): ...use
gimple_get_lhs instead.
* tree-ssa-dom.c (eliminate_redundant_computations): Don't handle
phis specially before calling gimple_get_lhs.
* tree-ssa-scopedtables.c (avail_exprs_stack::lookup_avail_expr):
Likewise.
* tree-vect-loop.c (vectorizable_live_operation): Likewise.
* tree-vect-slp.c (vect_get_slp_vect_defs): Likewise.
(vect_get_slp_defs): Likewise.
* tree-vect-stmts.c (vect_get_vec_def_for_operand_1): Likewise.
(vect_get_vec_def_for_stmt_copy, vect_transform_stmt): Likewise.

Index: gcc/gimple.c
===
--- gcc/gimple.c2018-08-28 11:25:46.162880564 +0100
+++ gcc/gimple.c2018-08-28 12:05:06.203027323 +0100
@@ -1757,12 +1757,12 @@ gimple_assign_set_rhs_with_ops (gimple_s
 tree
 gimple_get_lhs (const gimple *stmt)
 {
-  enum gimple_code code = gimple_code (stmt);
-
-  if (code == GIMPLE_ASSIGN)
-return gimple_assign_lhs (stmt);
-  else if (code == GIMPLE_CALL)
-return gimple_call_lhs (stmt);
+  if (const gphi *phi = dyn_cast <const gphi *> (stmt))
+    return gimple_phi_result (phi);
+  else if (const gassign *assign = dyn_cast <const gassign *> (stmt))
+    return gimple_assign_lhs (assign);
+  else if (const gcall *call = dyn_cast <const gcall *> (stmt))
+    return gimple_call_lhs (call);
   else
 return NULL_TREE;
 }
Index: gcc/tree-ssa-phionlycprop.c
===
--- gcc/tree-ssa-phionlycprop.c 2018-05-02 08:38:14.413364283 +0100
+++ gcc/tree-ssa-phionlycprop.c 2018-08-28 12:05:06.203027323 +0100
@@ -72,20 +72,6 @@ get_rhs_or_phi_arg (gimple *stmt)
 }
 
 
-/* Given a statement STMT, which is either a PHI node or an assignment,
-   return the "lhs" of the node.  */
-
-static tree
-get_lhs_or_phi_result (gimple *stmt)
-{
-  if (gimple_code (stmt) == GIMPLE_PHI)
-return gimple_phi_result (stmt);
-  else if (is_gimple_assign (stmt))
-return gimple_assign_lhs (stmt);
-  else
-gcc_unreachable ();
-}
-
 /* Propagate RHS into all uses of LHS (when possible).
 
RHS and LHS are derived from STMT, which is passed in solely so
@@ -186,7 +172,7 @@ propagate_rhs_into_lhs (gimple *stmt, tr
  print_gimple_stmt (dump_file, use_stmt, 0, dump_flags);
}
 
- result = get_lhs_or_phi_result (use_stmt);
+ result = gimple_get_lhs (use_stmt);
  bitmap_set_bit (interesting_names, SSA_NAME_VERSION (result));
  continue;
}
@@ -240,7 +226,7 @@ propagate_rhs_into_lhs (gimple *stmt, tr
   && (TREE_CODE (gimple_assign_rhs1 (use_stmt)) == SSA_NAME
   || is_gimple_min_invariant (gimple_assign_rhs1 (use_stmt
 {
- tree result = get_lhs_or_phi_result (use_stmt);
+ tree result = gimple_get_lhs (use_stmt);
  bitmap_set_bit (interesting_names, SSA_NAME_VERSION (result));
}
 
@@ -345,7 +331,7 @@ propagate_rhs_into_lhs (gimple *stmt, tr
 eliminate_const_or_copy (gimple *stmt, bitmap interesting_names,
 bitmap need_eh_cleanup)
 {
-  tree lhs = get_lhs_or_phi_result (stmt);
+  tree lhs = gimple_get_lhs (stmt);
   tree rhs;
   int version = SSA_NAME_VERSION (lhs);
   bool cfg_altered = false;
Index: gcc/tree-ssa-dom.c
===
--- gcc/tree-ssa-dom.c  2018-08-28 11:25:45.842883317 +0100
+++ gcc/tree-ssa-dom.c  2018-08-28 12:05:06.203027323 +0100
@@ -1491,10 +1491,7 @@ eliminate_redundant_computations (gimple
 
   gimple *stmt = gsi_stmt (*gsi);
 
-  if (gimple_code (stmt) == GIMPLE_PHI)
-def = gimple_phi_result (stmt);
-  else
-def = gimple_get_lhs (stmt);
+  def = gimple_get_lhs (stmt);
 
   /* Certain expressions on the RHS can be optimized away, but can not
  themselves be entered into the hash tables.  */
Index: gcc/tree-ssa-scopedtables.c
===
--- gcc/tree-ssa-scopedtables.c 2018-06-14 12:27:39.536036781 +0100
+++ gcc/tree-ssa-scopedtables.c 2018-08-28 12:05:06.203027323 +0100
@@ -236,11 +236,7 @@ avail_exprs_stack::lookup_avail_expr (gi
   expr_hash_elt **slot;
   tree lhs;
 
-  /* Get LHS of phi, assignment, or call; else NULL_TREE.  */
-  if (gimple_code (stmt) == GIMPLE_PHI)
-lhs = gimple_phi_result (stmt);
-  else
-lhs = gimple_get_lhs (stmt);
+  lhs = gimple_get_lhs (stmt);
 
   class expr_hash_elt element (stmt, lhs);
 
Index: gcc/tree-vect-loop.c

[0/6] Make vector pattern statements less special

2018-08-28 Thread Richard Sandiford
The goal of this series is to make vector pattern statements
less special compared to normal bb statements, and thus make
them easier to work with.  It picks up the tail end of:

  https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01821.html

Patch 08/11 from that series turned out to be wrong, for the
reason shown by a test in 3/6.  The new series also contains a
couple of other bug fixes and three new patches (1/6, 2/6 and 6/6).

A side benefit is that it becomes as easy for the vectoriser to
do its own DCE as it is to do the half-DCE it currently does (4/6).
This might make:

  PUSH_INSERT_PASSES_WITHIN (pass_vectorize)
  NEXT_PASS (pass_dce);
  POP_INSERT_PASSES ()

redundant -- I can experiment with that as a follow-on if the
series is OK.

The series is a prerequisite to supporting extending loads and
truncating stores.

Tested on aarch64-linux-gnu (with and without SVE), aarch64_be-elf
and x86_64-linux-gnu.  Also tested on SPEC2k6 and SPEC2017 for
SVE and x86_64-linux-gnu.

Richard


Re: [PATCH] Frame pointer for arm with THUMB2 mode

2018-08-28 Thread Denis Khalikov

Hi,
thanks for the answer.

> Switching on the frame pointer typically costs 1-2% performance, so
> it's a bad idea to use it. However changing the frame pointer like in
> the proposed patch will have a much larger cost - both in performance
> and codesize. You'd be lucky if it is less than 10%. This is due to
> placing the frame pointer at the top rather than the bottom of the
> frame, and that is very inefficient in Thumb-2.

>
> You would need to unwind ~100k times a second before you might see a
> performance gain. However you pay the performance cost all the time, even
> when no unwinding is required. So I don't see this as being a good idea.
>
> If unwind performance is an issue, it would make far more sense to
> solve that. Profiling typically hits the same functions again and
> again. Callgraph profiling to a fixed depth hits the same functions
> all the time. So caching is the obvious solution.

We are working on applying Address/LeakSanitizer to the full Tizen OS
distribution. It's about ~1000 packages; the ASan/LSan runtime is
installed via ld.so.preload. As we know, ASan/LSan has interceptors for
allocators/deallocators such as malloc/realloc/calloc/free and so on.

On every allocation from a user-space program, ASan calls
GET_STACK_TRACE_MALLOC, which unwinds the stack and by default uses the
frame-based stack unwinder.  That is why everything must be built with
"-fno-omit-frame-pointer"; switching to the default (unwind-table based)
unwinder really hurts performance in our case.
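For reference, the frame-based unwinder relied on here is essentially a walk of the fp chain. The sketch below is illustrative only: the `struct frame` layout and bounds checks are assumptions about an AAPCS-like {fp, lr} frame record, not the actual sanitizer code.

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative frame-pointer-based unwinder.  Assumes each frame
   stores the caller's frame pointer and return address back to back,
   with fp pointing at the saved-fp slot.  */
struct frame
{
  struct frame *next_fp;   /* caller's saved frame pointer */
  void *return_addr;       /* caller's saved return address (lr) */
};

static size_t
fp_unwind (void **pcs, size_t max, struct frame *fp,
           uintptr_t stack_lo, uintptr_t stack_hi)
{
  size_t n = 0;
  while (n < max
         && (uintptr_t) fp >= stack_lo
         && (uintptr_t) fp < stack_hi)
    {
      pcs[n++] = fp->return_addr;
      struct frame *next = fp->next_fp;
      /* Frames must strictly grow toward stack_hi; anything else
         terminates the walk (this also catches cycles and garbage).  */
      if (next <= fp)
        break;
      fp = next;
    }
  return n;
}
```

The appeal of this scheme is that it touches only a couple of words per frame, which is why it can be an order of magnitude faster than table-based unwinding.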


> Doing real unwinding is also far more accurate than frame pointer based
> unwinding (the latter doesn't handle leaf functions correctly, entry/exit in
> non-leaf functions and shrinkwrapped functions - and this breaks callgraph
> profiling).

I agree, but in our case, all interceptors for allocators are
leaf functions, so the frame based stack unwinder works well for us.

> So my question is, is there any point in making code run significantly slower
> all the time and in return get inaccurate unwinding?

By default we build packages with "-marm -fno-omit-frame-pointer",
because we need the frame-based stack unwinder for every allocation, as I
said before.  As we know, GCC points fp at the saved lr on the stack with
"-fno-omit-frame-pointer -marm", and I don't really know why.  But the
binary size is bigger than for Thumb, so we cannot use the default Thumb
frame pointer and want to reduce binary size for the fully sanitized
image.


Incidentally, clang works the same way as proposed in this patch.  It
had the same issue, which was fixed at the end of 2017:
https://bugs.llvm.org/show_bug.cgi?id=18505 (the topic starts with a
discussion about APCS, but that is not the main point).

There is also an unresolved issue related to this:
https://github.com/google/sanitizers/issues/640

Thanks.


On 08/28/2018 03:48 AM, Wilco Dijkstra wrote:

Hi,


But we still have an issue with performance when we are using the default
unwinder, which uses unwind tables.  It can be up to 10 times faster to
use the frame-based stack unwinder instead of the default unwinder.


Switching on the frame pointer typically costs 1-2% performance, so it's a bad
idea to use it. However changing the frame pointer like in the proposed patch
will have a much larger cost - both in performance and codesize. You'd be
lucky if it is less than 10%. This is due to placing the frame pointer at the top
rather than the bottom of the frame, and that is very inefficient in Thumb-2.

You would need to unwind ~100k times a second before you might see a
performance gain. However you pay the performance cost all the time, even
when no unwinding is required. So I don't see this as being a good idea.

If unwind performance is an issue, it would make far more sense to solve that.
Profiling typically hits the same functions again and again. Callgraph profiling to
a fixed depth hits the same functions all the time. So caching is the obvious
solution.
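As a rough illustration of the caching suggested here, unwind results could be memoized by the allocation call site. Keying by a single pc is a deliberate simplification (different call paths reaching the same site would share an entry, which is exactly the accuracy trade-off under discussion); all names below are hypothetical, not actual sanitizer code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CACHE_SIZE 1024
#define MAX_DEPTH 16

struct trace_entry
{
  uintptr_t key;               /* caller pc; 0 means empty slot */
  size_t depth;
  uintptr_t pcs[MAX_DEPTH];
};

static struct trace_entry cache[CACHE_SIZE];

/* slow_unwind stands in for whatever full (table-based) unwinder the
   platform provides.  */
static size_t
lookup_or_unwind (uintptr_t caller_pc,
                  size_t (*slow_unwind) (uintptr_t *, size_t),
                  uintptr_t *out)
{
  struct trace_entry *e = &cache[caller_pc % CACHE_SIZE];
  if (e->key != caller_pc)
    {
      /* Miss: do one expensive table-based unwind and remember it.  */
      e->key = caller_pc;
      e->depth = slow_unwind (e->pcs, MAX_DEPTH);
    }
  memcpy (out, e->pcs, e->depth * sizeof *out);
  return e->depth;
}
```

Since profiling and allocation hot paths hit the same few sites repeatedly, most lookups would be cache hits and the table-based unwind cost would only be paid once per site.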

Doing real unwinding is also far more accurate than frame pointer based
unwinding (the latter doesn't handle leaf functions correctly, entry/exit in
non-leaf functions and shrinkwrapped functions - and this breaks callgraph
profiling).

So my question is, is there any point in making code run significantly slower
all the time and in return get inaccurate unwinding?

Cheers,
Wilco



Re: Add target selectors to slp-37.c (PR87078)

2018-08-28 Thread Richard Biener
On Tue, Aug 28, 2018 at 12:31 PM Richard Sandiford
 wrote:
>
> This test was failing for Power 7 due to the lack of hw support
> for unaligned accesses.
>
> Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
> OK to install?

OK.

> Richard
>
>
> 2018-08-28  Richard Sandiford  
>
> gcc/testsuite/
> PR testsuite/87078
> * gcc.dg/vect/slp-37.c: Restrict scan tests to vect_hw_misalign.
>
> Index: gcc/testsuite/gcc.dg/vect/slp-37.c
> ===
> --- gcc/testsuite/gcc.dg/vect/slp-37.c  2018-08-22 13:58:57.492110344 +0100
> +++ gcc/testsuite/gcc.dg/vect/slp-37.c  2018-08-28 11:26:10.178673982 +0100
> @@ -58,5 +58,5 @@ int main (void)
>return 0;
>  }
>
> -/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  } } */
> -/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect"  
> } } */
> +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target 
> vect_hw_misalign } } } */
> +/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" 
> { target vect_hw_misalign } } } */


Re: Fix unguarded use of tree_to_shwi in tree-ssa-sccvn.c

2018-08-28 Thread Richard Biener
On Tue, Aug 28, 2018 at 12:39 PM Richard Sandiford
 wrote:
>
> Fixes many testsuite failures for SVE.
>
> Tested on aarch64-linux-gnu (with and without SVE), aarch64_be-elf
> and x86_64-linux-gnu.  OK to install?

OK.

Richard.

> Richard
>
>
> 2018-08-28  Richard Sandiford  
>
> gcc/
> * tree-ssa-sccvn.c (fully_constant_vn_reference_p): Fix unguarded
> use of tree_to_shwi.  Remove duplicated test for the size being
> a whole number of bytes.
>
> Index: gcc/tree-ssa-sccvn.c
> ===
> --- gcc/tree-ssa-sccvn.c2018-08-28 11:38:22.0 +0100
> +++ gcc/tree-ssa-sccvn.c2018-08-28 11:38:22.676512354 +0100
> @@ -1409,16 +1409,16 @@ fully_constant_vn_reference_p (vn_refere
>/* Simplify reads from constants or constant initializers.  */
>else if (BITS_PER_UNIT == 8
>&& COMPLETE_TYPE_P (ref->type)
> -  && is_gimple_reg_type (ref->type)
> -  && (!INTEGRAL_TYPE_P (ref->type)
> -  || TYPE_PRECISION (ref->type) % BITS_PER_UNIT == 0))
> +  && is_gimple_reg_type (ref->type))
>  {
>poly_int64 off = 0;
>HOST_WIDE_INT size;
>if (INTEGRAL_TYPE_P (ref->type))
> size = TYPE_PRECISION (ref->type);
> -  else
> +  else if (tree_fits_shwi_p (TYPE_SIZE (ref->type)))
> size = tree_to_shwi (TYPE_SIZE (ref->type));
> +  else
> +   return NULL_TREE;
>if (size % BITS_PER_UNIT != 0
>   || size > MAX_BITSIZE_MODE_ANY_MODE)
> return NULL_TREE;


Fix unguarded use of tree_to_shwi in tree-ssa-sccvn.c

2018-08-28 Thread Richard Sandiford
Fixes many testsuite failures for SVE.

Tested on aarch64-linux-gnu (with and without SVE), aarch64_be-elf
and x86_64-linux-gnu.  OK to install?

Richard


2018-08-28  Richard Sandiford  

gcc/
* tree-ssa-sccvn.c (fully_constant_vn_reference_p): Fix unguarded
use of tree_to_shwi.  Remove duplicated test for the size being
a whole number of bytes.

Index: gcc/tree-ssa-sccvn.c
===
--- gcc/tree-ssa-sccvn.c2018-08-28 11:38:22.0 +0100
+++ gcc/tree-ssa-sccvn.c2018-08-28 11:38:22.676512354 +0100
@@ -1409,16 +1409,16 @@ fully_constant_vn_reference_p (vn_refere
   /* Simplify reads from constants or constant initializers.  */
   else if (BITS_PER_UNIT == 8
   && COMPLETE_TYPE_P (ref->type)
-  && is_gimple_reg_type (ref->type)
-  && (!INTEGRAL_TYPE_P (ref->type)
-  || TYPE_PRECISION (ref->type) % BITS_PER_UNIT == 0))
+  && is_gimple_reg_type (ref->type))
 {
   poly_int64 off = 0;
   HOST_WIDE_INT size;
   if (INTEGRAL_TYPE_P (ref->type))
size = TYPE_PRECISION (ref->type);
-  else
+  else if (tree_fits_shwi_p (TYPE_SIZE (ref->type)))
size = tree_to_shwi (TYPE_SIZE (ref->type));
+  else
+   return NULL_TREE;
   if (size % BITS_PER_UNIT != 0
  || size > MAX_BITSIZE_MODE_ANY_MODE)
return NULL_TREE;


Add target selectors to slp-37.c (PR87078)

2018-08-28 Thread Richard Sandiford
This test was failing for Power 7 due to the lack of hw support
for unaligned accesses.

Tested on aarch64-linux-gnu, x86_64-linux-gnu and powerpc64le-linux-gnu.
OK to install?

Richard


2018-08-28  Richard Sandiford  

gcc/testsuite/
PR testsuite/87078
* gcc.dg/vect/slp-37.c: Restrict scan tests to vect_hw_misalign.

Index: gcc/testsuite/gcc.dg/vect/slp-37.c
===
--- gcc/testsuite/gcc.dg/vect/slp-37.c  2018-08-22 13:58:57.492110344 +0100
+++ gcc/testsuite/gcc.dg/vect/slp-37.c  2018-08-28 11:26:10.178673982 +0100
@@ -58,5 +58,5 @@ int main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect"  } } */
-/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect"  } 
} */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target 
vect_hw_misalign } } } */
+/* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { 
target vect_hw_misalign } } } */


Re: [PATCH] Strengthen assumption about gswitch statements.

2018-08-28 Thread Richard Biener
On Tue, Aug 28, 2018 at 9:11 AM Martin Liška  wrote:
>
> On 08/27/2018 05:21 PM, Richard Biener wrote:
> > On Mon, Aug 27, 2018 at 4:05 PM Martin Liška  wrote:
> >>
> >> Hi.
> >>
> >> Now we should not meet a degenerated gswitch statements.
> >>
> >> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> >>
> >> Ready to be installed?
> >
> > Hum.  This relies on us doing CFG cleanup.  Do we really want
> > find_taken_edge to ICE when called between CFG construction
> > and the first such call?  That is, I think this kind of "verification"
> > belongs in the CFG verifier.  Note that there's GIMPLE switches
> > before we have a CFG and those are not sanitized (at least
> > not that I know of).  So we cannot really declare single-case
> > switch stmts invalid?
>
> Thanks for explanation. In case of switch_conversion::expand, may I apply
> patch for that? It should not see a degenerated switch, or do I miss
> something?

Hopefully not.

So that part is OK.

Richard.

> Martin
>
> >
> > Thanks,
> > Richard.
> >
> >> Martin
> >>
> >> gcc/ChangeLog:
> >>
> >> 2018-08-27  Martin Liska  
> >>
> >> * tree-cfg.c (find_taken_edge_switch_expr): Replace not possible
> >> condition with assert.
> >> * tree-switch-conversion.c (switch_conversion::expand):
> >> Likewise.
> >> ---
> >>  gcc/tree-cfg.c   | 17 +++--
> >>  gcc/tree-switch-conversion.c |  9 +
> >>  2 files changed, 8 insertions(+), 18 deletions(-)
> >>
> >>
>


Re: [PATCH RFC] add generic expansion for MULT_HIGHPART_EXP

2018-08-28 Thread Richard Biener
On Tue, Aug 28, 2018 at 7:59 AM Alexander Monakov  wrote:
>
> > > > So - how difficult is it to fix BRIG to not use MULT_HIGHPART_EXPR if
> > > > not supported?
>
> Richard, how should we proceed from here?  Do you like the solution in the
> initial mail, or would you prefer something else?  FWIW I agree with Pekka,
> no need to burden BRIG FE with expanding mul-highpart.

I think that if we want to support MULT_HIGHPART generally then we need
to support it for all modes - what's your plan for 32bit targets here?  This
means providing libgcc fallback implementations for modes we cannot directly
expand to.
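A generic fallback of the kind mentioned here — computing the high half of a widening multiply using only arithmetic the narrower target has — could look roughly like the following sketch. This is illustrative, not an actual libgcc routine; it shows the classic four-partial-product decomposition for the unsigned case.

```c
#include <stdint.h>

/* High 64 bits of a 64x64 -> 128 unsigned multiply, built from 32-bit
   partial products so no 128-bit type is required.  */
static uint64_t
mulh_u64 (uint64_t a, uint64_t b)
{
  uint64_t a_lo = (uint32_t) a, a_hi = a >> 32;
  uint64_t b_lo = (uint32_t) b, b_hi = b >> 32;
  uint64_t p0 = a_lo * b_lo;   /* contributes to bits 0..63 */
  uint64_t p1 = a_lo * b_hi;   /* contributes to bits 32..95 */
  uint64_t p2 = a_hi * b_lo;   /* contributes to bits 32..95 */
  uint64_t p3 = a_hi * b_hi;   /* contributes to bits 64..127 */
  /* Carry out of bit 63 when summing the middle partial products.  */
  uint64_t cy = ((p0 >> 32) + (uint32_t) p1 + (uint32_t) p2) >> 32;
  return p3 + (p1 >> 32) + (p2 >> 32) + cy;
}
```

A signed variant and the other modes would follow the same pattern, which is why routing this through expand_binop with a libgcc fallback seems plausible.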

I'd also like to see better support for MULT_HIGHPART then, for example
verify_gimple_assign_binary does nothing (TODO).

As for the expansion code I wonder if handling it in expand_binop would
be better?  Do we want to expose highpart multiply to users somehow?

Thanks,
Richard.

> > > Pekka, can you comment? I think you have fallback paths for vector types
> > > only at the moment?
> >
> > I think it involves pretty much moving the code of your patch to the
> > BRIG frontend.
> > To me it'd look a bit wrong in terms of "division of responsibilities"
> > as this is not really
> > frontend-specific as far as I understand (even if BRIG/HSAIL happened
> > to be the only language
> > supporting them currently).
> >
> > > Does BRIG have mult-highpart for 64-bit integers? On 32-bit targets we
> > > won't be able to easily expand them (but on 64-bit targets it is fine).
> >
> > Yes it does. 32b targets are not high priority for BRIG FE at this
> > point, so I wouldn't
> > worry about this as we assume HSA's "large" model is used, so this is 
> > likely not
> > the only problem when trying to generate for 32b machines.
> >
> > > For scalar types I think we should prefer to implement a generic expansion
> > > rather than have the frontend query the backend. For vector types I am not
> > > sure.
> >
> > In my relatively gcc-uneducated humble opinion these both belong more
> > naturally to a
> > target-specific expansion or "legalization" pass, which tries to
> > convert tree nodes to what the target
> > can support.
> >
> > BR,
> > Pekka
> >


Re: [PATCH] avoid warning on constant strncpy until next statement is reachable (PR 87028)

2018-08-28 Thread Richard Biener
On Tue, Aug 28, 2018 at 11:55 AM Richard Biener
 wrote:
>
> On Tue, Aug 28, 2018 at 6:27 AM Jeff Law  wrote:
> >
> > On 08/27/2018 10:27 AM, Martin Sebor wrote:
> > > On 08/27/2018 02:29 AM, Richard Biener wrote:
> > >> On Sun, Aug 26, 2018 at 7:26 AM Jeff Law  wrote:
> > >>>
> > >>> On 08/24/2018 09:58 AM, Martin Sebor wrote:
> >  The warning suppression for -Wstringop-truncation looks for
> >  the next statement after a truncating strncpy to see if it
> >  adds a terminating nul.  This only works when the next
> >  statement can be reached using the Gimple statement iterator
> >  which isn't until after gimplification.  As a result, strncpy
> >  calls that truncate their constant argument that are being
> >  folded to memcpy this early get diagnosed even if they are
> >  followed by the nul assignment:
> > 
> >    const char s[] = "12345";
> >    char d[3];
> > 
> >    void f (void)
> >    {
> >  strncpy (d, s, sizeof d - 1);   // -Wstringop-truncation
> >  d[sizeof d - 1] = 0;
> >    }
> > 
> >  To avoid the warning I propose to defer folding strncpy to
> >  memcpy until the pointer to the basic block the strnpy call
> >  is in can be used to try to reach the next statement (this
> >  happens as early as ccp1).  I'm aware of the preference to
> >  fold things early but in the case of strncpy (a relatively
> >  rarely used function that is often misused), getting
> >  the warning right while folding a bit later but still fairly
> >  early on seems like a reasonable compromise.  I fear that
> >  otherwise, the false positives will drive users to adopt
> >  other unsafe solutions (like memcpy) where these kinds of
> >  bugs cannot be as readily detected.
> > 
> >  Tested on x86_64-linux.
> > 
> >  Martin
> > 
> >  PS There still are outstanding cases where the warning can
> >  be avoided.  I xfailed them in the test for now but will
> >  still try to get them to work for GCC 9.
> > 
> >  gcc-87028.diff
> > 
> > 
> >  PR tree-optimization/87028 - false positive -Wstringop-truncation
> >  strncpy with global variable source string
> >  gcc/ChangeLog:
> > 
> >    PR tree-optimization/87028
> >    * gimple-fold.c (gimple_fold_builtin_strncpy): Avoid folding when
> >    statement doesn't belong to a basic block.
> >    * tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Handle MEM_REF on
> >    the left hand side of assignment.
> > 
> >  gcc/testsuite/ChangeLog:
> > 
> >    PR tree-optimization/87028
> >    * c-c++-common/Wstringop-truncation.c: Remove xfails.
> >    * gcc.dg/Wstringop-truncation-5.c: New test.
> > 
> >  diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
> >  index 07341eb..284c2fb 100644
> >  --- a/gcc/gimple-fold.c
> >  +++ b/gcc/gimple-fold.c
> >  @@ -1702,6 +1702,11 @@ gimple_fold_builtin_strncpy
> >  (gimple_stmt_iterator *gsi,
> > if (tree_int_cst_lt (ssize, len))
> >   return false;
> > 
> >  +  /* Defer warning (and folding) until the next statement in the basic
> >  + block is reachable.  */
> >  +  if (!gimple_bb (stmt))
> >  +return false;
> > >>> I think you want cfun->cfg as the test here.  They should be equivalent
> > >>> in practice.
> > >>
> > >> Please do not add 'cfun' references.  Note that the next stmt is also
> > >> accessible
> > >> when there is no CFG.  I guess the issue is that we fold this during
> > >> gimplification
> > >> where the next stmt is not yet "there" (but still in GENERIC)?
> > >>
> > >> We generally do not want to have unfolded stmts in the IL when we can
> > >> avoid that
> > >> which is why we fold most stmts during gimplification.  We also do
> > >> that because
> > >> we now do less folding on GENERIC.
> > >>
> > >> There may be the possibility to refactor gimplification time folding
> > >> to what we
> > >> do during inlining - queue stmts we want to fold and perform all
> > >> folding delayed.
> > >> This of course means bigger compile-time due to cache effects.
> > >>
> > >>>
> >  diff --git a/gcc/tree-ssa-strlen.c b/gcc/tree-ssa-strlen.c
> >  index d0792aa..f1988f6 100644
> >  --- a/gcc/tree-ssa-strlen.c
> >  +++ b/gcc/tree-ssa-strlen.c
> >  @@ -1981,6 +1981,23 @@ maybe_diag_stxncpy_trunc
> >  (gimple_stmt_iterator gsi, tree src, tree cnt)
> >  && known_eq (dstoff, lhsoff)
> >  && operand_equal_p (dstbase, lhsbase, 0))
> >    return false;
> >  +
> >  +  if (code == MEM_REF
> >  +   && TREE_CODE (lhsbase) == SSA_NAME
> >  +   && known_eq (dstoff, lhsoff))
> >  + {
> >  +   /* Extract the referenced variable from something like
> >  +MEM[(char *)d_3(D) + 3B] = 0;  */
> >  +   gimple *def = 

Re: [PATCH] avoid warning on constant strncpy until next statement is reachable (PR 87028)

2018-08-28 Thread Richard Biener
On Tue, Aug 28, 2018 at 6:27 AM Jeff Law  wrote:
>
> On 08/27/2018 10:27 AM, Martin Sebor wrote:
> > On 08/27/2018 02:29 AM, Richard Biener wrote:
> >> On Sun, Aug 26, 2018 at 7:26 AM Jeff Law  wrote:
> >>>
> >>> On 08/24/2018 09:58 AM, Martin Sebor wrote:
>  The warning suppression for -Wstringop-truncation looks for
>  the next statement after a truncating strncpy to see if it
>  adds a terminating nul.  This only works when the next
>  statement can be reached using the Gimple statement iterator
>  which isn't until after gimplification.  As a result, strncpy
>  calls that truncate their constant argument that are being
>  folded to memcpy this early get diagnosed even if they are
>  followed by the nul assignment:
> 
>    const char s[] = "12345";
>    char d[3];
> 
>    void f (void)
>    {
>  strncpy (d, s, sizeof d - 1);   // -Wstringop-truncation
>  d[sizeof d - 1] = 0;
>    }
> 
>  To avoid the warning I propose to defer folding strncpy to
>  memcpy until the pointer to the basic block the strnpy call
>  is in can be used to try to reach the next statement (this
>  happens as early as ccp1).  I'm aware of the preference to
>  fold things early but in the case of strncpy (a relatively
>  rarely used function that is often misused), getting
>  the warning right while folding a bit later but still fairly
>  early on seems like a reasonable compromise.  I fear that
>  otherwise, the false positives will drive users to adopt
>  other unsafe solutions (like memcpy) where these kinds of
>  bugs cannot be as readily detected.
> 
>  Tested on x86_64-linux.
> 
>  Martin
> 
>  PS There still are outstanding cases where the warning can
>  be avoided.  I xfailed them in the test for now but will
>  still try to get them to work for GCC 9.
> 
>  gcc-87028.diff
> 
> 
>  PR tree-optimization/87028 - false positive -Wstringop-truncation
>  strncpy with global variable source string
>  gcc/ChangeLog:
> 
>    PR tree-optimization/87028
>    * gimple-fold.c (gimple_fold_builtin_strncpy): Avoid folding when
>    statement doesn't belong to a basic block.
>    * tree-ssa-strlen.c (maybe_diag_stxncpy_trunc): Handle MEM_REF on
>    the left hand side of assignment.
> 
>  gcc/testsuite/ChangeLog:
> 
>    PR tree-optimization/87028
>    * c-c++-common/Wstringop-truncation.c: Remove xfails.
>    * gcc.dg/Wstringop-truncation-5.c: New test.
> 
>  diff --git a/gcc/gimple-fold.c b/gcc/gimple-fold.c
>  index 07341eb..284c2fb 100644
>  --- a/gcc/gimple-fold.c
>  +++ b/gcc/gimple-fold.c
>  @@ -1702,6 +1702,11 @@ gimple_fold_builtin_strncpy
>  (gimple_stmt_iterator *gsi,
> if (tree_int_cst_lt (ssize, len))
>   return false;
> 
>  +  /* Defer warning (and folding) until the next statement in the basic
>  + block is reachable.  */
>  +  if (!gimple_bb (stmt))
>  +return false;
> >>> I think you want cfun->cfg as the test here.  They should be equivalent
> >>> in practice.
> >>
> >> Please do not add 'cfun' references.  Note that the next stmt is also
> >> accessible
> >> when there is no CFG.  I guess the issue is that we fold this during
> >> gimplification
> >> where the next stmt is not yet "there" (but still in GENERIC)?
> >>
> >> We generally do not want to have unfolded stmts in the IL when we can
> >> avoid that
> >> which is why we fold most stmts during gimplification.  We also do
> >> that because
> >> we now do less folding on GENERIC.
> >>
> >> There may be the possibility to refactor gimplification time folding
> >> to what we
> >> do during inlining - queue stmts we want to fold and perform all
> >> folding delayed.
> >> This of course means bigger compile-time due to cache effects.
> >>
> >>>
>  diff --git a/gcc/tree-ssa-strlen.c b/gcc/tree-ssa-strlen.c
>  index d0792aa..f1988f6 100644
>  --- a/gcc/tree-ssa-strlen.c
>  +++ b/gcc/tree-ssa-strlen.c
>  @@ -1981,6 +1981,23 @@ maybe_diag_stxncpy_trunc
>  (gimple_stmt_iterator gsi, tree src, tree cnt)
>  && known_eq (dstoff, lhsoff)
>  && operand_equal_p (dstbase, lhsbase, 0))
>    return false;
>  +
>  +  if (code == MEM_REF
>  +   && TREE_CODE (lhsbase) == SSA_NAME
>  +   && known_eq (dstoff, lhsoff))
>  + {
>  +   /* Extract the referenced variable from something like
>  +MEM[(char *)d_3(D) + 3B] = 0;  */
>  +   gimple *def = SSA_NAME_DEF_STMT (lhsbase);
>  +   if (gimple_nop_p (def))
>  + {
>  +   lhsbase = SSA_NAME_VAR (lhsbase);
>  +   if (lhsbase
>  +   && dstbase
>  +   && operand_equal_p (dstbase, lhsbase, 0))
>  + return 

Re: [PATCH] treat -Wxxx-larger-than=HWI_MAX special (PR 86631)

2018-08-28 Thread Richard Biener
On Tue, Aug 28, 2018 at 4:37 AM Martin Sebor  wrote:
>
> Richard, please let me know if the patch is acceptable as is
> (with the RejectNegative property added).  As I said, I realize
> it's not ideal, but neither is any of the alternatives we have
> discussed.  They all involve trade- offs, and I think they would
> all make the behavior of the options less regular/useful, and
> the ideal solution would likely be too intrusive and laborious
> to implement.
>
> I leave for Cauldron this Friday so I'm hoping we can get this
> wrapped up before then.

OK, I'm going out of the way.

OK as-is.

Richard.

> Thanks
> Martin
>
> On 08/24/2018 09:13 AM, Martin Sebor wrote:
> > On 08/24/2018 03:36 AM, Richard Biener wrote:
> >> On Fri, Aug 24, 2018 at 12:35 AM Martin Sebor  wrote:
> >>>
> >>> On 08/23/2018 07:18 AM, Richard Biener wrote:
>  On Thu, Aug 23, 2018 at 12:20 AM Martin Sebor  wrote:
> >
> > On 08/20/2018 06:14 AM, Richard Biener wrote:
> >> On Thu, Jul 26, 2018 at 10:52 PM Martin Sebor 
> >> wrote:
> >>>
> >>> On 07/26/2018 08:58 AM, Martin Sebor wrote:
>  On 07/26/2018 02:38 AM, Richard Biener wrote:
> > On Wed, Jul 25, 2018 at 5:54 PM Martin Sebor 
> > wrote:
> >>
> >> On 07/25/2018 08:57 AM, Jakub Jelinek wrote:
> >>> On Wed, Jul 25, 2018 at 08:54:13AM -0600, Martin Sebor wrote:
>  I don't mean for the special value to be used except internally
>  for the defaults.  Otherwise, users wanting to override the
>  default
>  will choose a value other than it.  I'm happy to document it in
>  the .opt file for internal users though.
> 
>  -1 has the documented effect of disabling the warnings
>  altogether
>  (-1 is SIZE_MAX) so while I agree that -1 looks better it
>  doesn't
>  work.  (It would need more significant changes.)
> >>>
> >>> The variable is signed, so -1 is not SIZE_MAX.  Even if -1
> >>> disables
> >>> it, you
> >>> could use e.g. -2 or other negative value for the other
> >>> special case.
> >>
> >> The -Wxxx-larger-than=N distinguish three ranges of argument
> >> values (treated as unsigned):
> >>
> >>1.  [0, HOST_WIDE_INT_MAX)
> >>2.  HOST_WIDE_INT_MAX
> >>3.  [HOST_WIDE_INT_MAX + 1, Infinity)
> >
> > But it doesn't make sense for those to be host dependent.
> 
>  It isn't when the values are handled by each warning.  That's
>  also the point of this patch: to remove this (unintended)
>  dependency.
> 
> > I think numerical user input should be limited to [0, ptrdiff_max]
> > and cases (1) and (2) should be simply merged, I see no value
> > in distinguishing them.  -Wxxx-larger-than should be aliased
> > to [0, ptrdiff_max], case (3) is achieved by -Wno-xxx-larger-than.
> >>>
> >>> To be clear: this is also close to what this patch does.
> >>>
> >>> The only wrinkle is that we don't know the value of PTRDIFF_MAX
> >>> either at the time the option initial value is set in the .opt
> >>> file or when the option is processed when it's specified either
> >>> on the command line or as an alias in the .opt file (as all
> >>> -Wno-xxx-larger-than options are).
> >>
> >> But then why not make that special value accessible and handle
> >> it as PTRDIFF_MAX when that is available (at users of the params)?
> >>
> >> That is,
> >>
> >> Index: gcc/calls.c
> >> ===
> >> --- gcc/calls.c (revision 262951)
> >> +++ gcc/calls.c (working copy)
> >> @@ -1222,9 +1222,12 @@ alloc_max_size (void)
> >>if (alloc_object_size_limit)
> >>  return alloc_object_size_limit;
> >>
> >> -  alloc_object_size_limit
> >> -= build_int_cst (size_type_node, warn_alloc_size_limit);
> >> +  HOST_WIDE_INT limit = warn_alloc_size_limit;
> >> +  if (limit == HOST_WIDE_INT_MAX)
> >> +limit = tree_to_shwi (TYPE_MAX_VALUE (ptrdiff_type_node));
> >>
> >> +  alloc_object_size_limit = build_int_cst (size_type_node, limit);
> >> +
> >>return alloc_object_size_limit;
> >>  }
> >>
> >> use sth like
> >>
> >>  if (warn_alloc_size_limit == -1)
> >>alloc_object_size_limit = fold_convert (size_type_node,
> >> TYPE_MAX_VALUE (ptrdiff_type_node));
> >>  else
> >>alloc_object_size_limit = size_int (warn_alloc_size_limit);
> >>
> >> ?  Also removing the need to have > int params values.
> >
> > Not sure I understand this last part.  Remove the enhancement?
> > (We do need to handle option arguments in excess of INT_MAX.)
> 
>  I see.
> 
> >>
> >> It's that 

Re: [PATCH] Fix some bugs in maybe_warn_nonstring_arg (PR middle-end/87099)

2018-08-28 Thread Richard Biener
On Tue, 28 Aug 2018, Jakub Jelinek wrote:

> Hi!
> 
> The following patch fixes some bugs in maybe_warn_nonstring_arg.
> The testcase ICEs, because lenrng[1] is a PLUS_EXPR, but the code assumes
> without checking that it must be INTEGER_CST and feeds it into
> tree_int_cst_lt.  If the upper bound is NULL or is some expression
> other than INTEGER_CST, we can't do anything useful with it, so should
> just treat it as unknown bound (but e.g. if the first argument doesn't
> have a bound or has a complex expression as bound, but the second one
> has INTEGER_CST upper bound, we can use that).
> Additionally, the code doesn't ever care about lenrng[0] (lower bound),
> only lenrng[1] (upper bound), so I've changed the code to check that that
> bound is non-NULL and INTEGER_CST.
> Because nothing uses lenrng[0], it is wasted work to add 1 to it using
> const_binop and the whole function is about -Wstringop-overflow{,=} warning,
> so if !warn_stringop_overflow, there is no reason to do all of it and we can
> bail out early.
> I've also noticed that it can call get_range_strlen on the length argument
> of strncmp/strncasecmp, that doesn't make sense either.
> And last, just a formatting fix to put constants on rhs of comparison.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Thanks,
Richard.

> 2018-08-27  Jakub Jelinek  
> 
>   PR middle-end/87099
>   * calls.c (maybe_warn_nonstring_arg): Punt early if
>   warn_stringop_overflow is zero.  Don't call get_range_strlen
>   on 3rd argument, keep iterating until lenrng[1] is INTEGER_CST.
>   Swap comparison operands to have constants on rhs.  Only use
>   lenrng[1] if non-NULL and INTEGER_CST.  Don't uselessly
>   increment lenrng[0].
> 
>   * gcc.dg/pr87099.c: New test.
> 
> --- gcc/calls.c.jj2018-08-26 22:42:22.525779600 +0200
> +++ gcc/calls.c   2018-08-27 14:34:07.959235490 +0200
> @@ -1545,7 +1545,7 @@ maybe_warn_nonstring_arg (tree fndecl, t
>if (!fndecl || DECL_BUILT_IN_CLASS (fndecl) != BUILT_IN_NORMAL)
>  return;
>  
> -  if (TREE_NO_WARNING (exp))
> +  if (TREE_NO_WARNING (exp) || !warn_stringop_overflow)
>  return;
>  
>unsigned nargs = call_expr_nargs (exp);
> @@ -1573,7 +1573,9 @@ maybe_warn_nonstring_arg (tree fndecl, t
>  the range of their known or possible lengths and use it
>  conservatively as the bound for the unbounded function,
>  and to adjust the range of the bound of the bounded ones.  */
> - for (unsigned argno = 0; argno < nargs && !*lenrng; argno ++)
> + for (unsigned argno = 0;
> +  argno < MIN (nargs, 2)
> +  && !(lenrng[1] && TREE_CODE (lenrng[1]) == INTEGER_CST); argno++)
> {
>   tree arg = CALL_EXPR_ARG (exp, argno);
>   if (!get_attr_nonstring_decl (arg))
> @@ -1585,12 +1587,12 @@ maybe_warn_nonstring_arg (tree fndecl, t
>  case BUILT_IN_STRNCAT:
>  case BUILT_IN_STPNCPY:
>  case BUILT_IN_STRNCPY:
> -  if (2 < nargs)
> +  if (nargs > 2)
>   bound = CALL_EXPR_ARG (exp, 2);
>break;
>  
>  case BUILT_IN_STRNDUP:
> -  if (1 < nargs)
> +  if (nargs > 1)
>   bound = CALL_EXPR_ARG (exp, 1);
>break;
>  
> @@ -1600,7 +1602,7 @@ maybe_warn_nonstring_arg (tree fndecl, t
>   if (!get_attr_nonstring_decl (arg))
> get_range_strlen (arg, lenrng);
>  
> - if (1 < nargs)
> + if (nargs > 1)
> bound = CALL_EXPR_ARG (exp, 1);
>   break;
>}
> @@ -1640,11 +1642,9 @@ maybe_warn_nonstring_arg (tree fndecl, t
>   }
>  }
>  
> -  if (*lenrng)
> +  if (lenrng[1] && TREE_CODE (lenrng[1]) == INTEGER_CST)
>  {
>/* Add one for the nul.  */
> -  lenrng[0] = const_binop (PLUS_EXPR, TREE_TYPE (lenrng[0]),
> -lenrng[0], size_one_node);
>lenrng[1] = const_binop (PLUS_EXPR, TREE_TYPE (lenrng[1]),
>  lenrng[1], size_one_node);
>  
> --- gcc/testsuite/gcc.dg/pr87099.c.jj 2018-08-27 14:36:30.220856712 +0200
> +++ gcc/testsuite/gcc.dg/pr87099.c2018-08-27 14:36:06.475253767 +0200
> @@ -0,0 +1,21 @@
> +/* PR middle-end/87099 */
> +/* { dg-do compile } */
> +/* { dg-options "-Wstringop-overflow" } */
> +
> +void bar (char *);
> +
> +int
> +foo (int n)
> +{
> +  char v[n];
> +  bar (v);
+  return __builtin_strncmp (&v[1], "aaa", 3);
> +}
> +
> +int
> +baz (int n, char *s)
> +{
> +  char v[n];
> +  bar (v);
+  return __builtin_strncmp (&v[1], s, 3);
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [committed] Fix minor bug in recently added sanity test in tree-ssa-dse.c

2018-08-28 Thread Richard Biener
On Mon, Aug 27, 2018 at 10:31 PM Jeff Law  wrote:
>
>
> We recently changes tree-ssa-dse.c to not trim stores outside the bounds
> of the referenced object.  This is generally a good thing.
>
> However, there are cases where the object doesn't have a usable size.
> We see this during kernel builds, at least on the microblaze target.
>
> We've got...
>
> _1 = p_47(D)->stack;
> childregs_48 = [(void *)_1 + 8040B];
> [ ... ]
> memset (childregs_48, 0, 152);
> [ ... ]
> MEM[(struct pt_regs *)_1 + 8040B].pt_mode = 1;
>
>
> We want to trim the memset call as the assignment to pt_mode is the last
> word written by the memset call, thus making the store of that word via
> memset dead.
>
> Our ref->base is:
>
> (gdb) p debug_tree ($66)
>   type  align:8 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
> 0x7fffe9e210a8
> pointer_to_this >
>
> arg:0  type  void>
> public unsigned SI
> size 
> unit-size 
> align:32 warn_if_not_align:0 symtab:0 alias-set 8
> canonical-type 0x7fffe9e21150 context  0x7fffe8f72438 /tmp/process.i>
> pointer_to_this 
> reference_to_this >
> visited
> def_stmt _1 = p_47(D)->stack;
> version:1
> ptr-info 0x7fffe8f65fa8>
> arg:1 
> constant 8040>>
>
>
> Note the void type.  Those do not have a suitable TYPE_SIZE_UNIT, thus
> causing an ICE when we try to reference it within compute_trims:
> /* But don't trim away out of bounds accesses, as this defeats
>  proper warnings.  */
>   if (*trim_tail
>   && compare_tree_int (TYPE_SIZE_UNIT (TREE_TYPE (ref->base)),
>last_orig) <= 0)
>
>
> The fix is obvious enough.  Don't do the check when the underlying type
> has no usable TYPE_SIZE_UNIT.
>
> I pondered moving the tests into the code that determines whether or not
> we do byte tracking DSE, but decided the current location was fine.
>
> Bootstrapped and regression tested on x86.  Also verified the original
> testfile will build with a microblaze cross compiler.
>
> Installing on the trunk momentarily.

Hmm, you seem to get ref->base from an address but you know types
on addresses do not have any meaning, so ...  how does

  /* But don't trim away out of bounds accesses, as this defeats
 proper warnings.

 We could have a type with no TYPE_SIZE_UNIT or we could have a VLA
 where TYPE_SIZE_UNIT is not a constant.  */
  if (*trim_tail
  && TYPE_SIZE_UNIT (TREE_TYPE (ref->base))
  && TREE_CODE (TYPE_SIZE_UNIT (TREE_TYPE (ref->base))) == INTEGER_CST
  && compare_tree_int (TYPE_SIZE_UNIT (TREE_TYPE (ref->base)),
   last_orig) <= 0)
*trim_tail = 0;

make any sense in the above case where ref->base is even based on a
pointer?  I'd say the
above should be at least

if (*trim_tail
&& DECL_P (ref->base)
&& 

?  Not sure if we ever have decls with incomplete types so eventually
the new check
isn't needed.
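Spelled out, combining the existing guard with the suggested DECL_P restriction would read roughly (a sketch, not a committed change):

```
  /* But don't trim away out of bounds accesses, as this defeats
     proper warnings.  Restrict the check to decl bases, whose type
     size is meaningful, and skip incomplete types and VLAs, whose
     TYPE_SIZE_UNIT is absent or non-constant.  */
  if (*trim_tail
      && DECL_P (ref->base)
      && TYPE_SIZE_UNIT (TREE_TYPE (ref->base))
      && TREE_CODE (TYPE_SIZE_UNIT (TREE_TYPE (ref->base))) == INTEGER_CST
      && compare_tree_int (TYPE_SIZE_UNIT (TREE_TYPE (ref->base)),
                           last_orig) <= 0)
    *trim_tail = 0;
```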

Richard.

> jeff


Re: VRP: abstract out wide int CONVERT_EXPR_P code

2018-08-28 Thread Aldy Hernandez




On 08/28/2018 05:27 AM, Richard Biener wrote:

On Mon, Aug 27, 2018 at 2:24 PM Aldy Hernandez  wrote:


Howdy!

Phew, I think this is the last abstraction.  This handles the unary
CONVERT_EXPR_P code.

It's the usual story-- normalize the symbolics to [-MIN,+MAX] and handle
everything generically.

Normalizing the symbolics brought about some nice surprises.  We now
handle a few things we were punting on before, which I've documented in
the patch, but can remove if so desired.  I wrote them mainly for myself:

/* NOTES: Previously we were returning VARYING for all symbolics, but
 we can do better by treating them as [-MIN, +MAX].  For
 example, converting [SYM, SYM] from INT to LONG UNSIGNED,
 we can return: ~[0x800, 0x7fff].

 We were also failing to convert ~[0,0] from char* to unsigned,
 instead choosing to return VR_VARYING.  Now we return ~[0,0].  */
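A small C illustration of the wrap-around reasoning (hypothetical helper, not the GCC API, for the same-width case with a range that straddles zero):

```c
#include <assert.h>
#include <limits.h>

/* For a same-width conversion of a signed range [lo, hi] that straddles
   zero, the negative part wraps to the top of the unsigned domain, so
   the unsigned result is the anti-range ~[hi + 1, (unsigned) lo - 1].  */
static void
signed_range_to_unsigned_antirange (int lo, int hi,
                                    unsigned *anti_min, unsigned *anti_max)
{
  assert (lo < 0 && hi >= 0);          /* only the straddling case */
  *anti_min = (unsigned) hi + 1u;
  *anti_max = (unsigned) lo - 1u;
}
```

With 32-bit int, the normalized symbolic range [INT_MIN, INT_MAX] converts to ~[0x80000000, 0x7fffffff], matching the shape of the NOTES above (the email's example widens to a 64-bit type, which shifts the bounds but not the idea).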

Tested on x86-64 by the usual bootstrap and regtest gymnastics,
including --enable-languages=all, because my past sins are still
haunting me.

OK?


The new wide_int_range_convert_tree looks odd given it returns
tree's.  I'd have expected an API that does the conversion resulting
in a wide_int range and the VRP code adapting to that by converting
the result to trees.


Hmmm, yeah.  I agree.  I'll think about this some more and see what I 
can come up with.


Thanks.
Aldy


Re: VRP: abstract out wide int CONVERT_EXPR_P code

2018-08-28 Thread Richard Biener
On Mon, Aug 27, 2018 at 2:24 PM Aldy Hernandez  wrote:
>
> Howdy!
>
> Phew, I think this is the last abstraction.  This handles the unary
> CONVERT_EXPR_P code.
>
> It's the usual story-- normalize the symbolics to [-MIN,+MAX] and handle
> everything generically.
>
> Normalizing the symbolics brought about some nice surprises.  We now
> handle a few things we were punting on before, which I've documented in
> the patch, but can remove if so desired.  I wrote them mainly for myself:
>
> /* NOTES: Previously we were returning VARYING for all symbolics, but
> we can do better by treating them as [-MIN, +MAX].  For
> example, converting [SYM, SYM] from INT to LONG UNSIGNED,
> we can return: ~[0x800, 0x7fff].
>
> We were also failing to convert ~[0,0] from char* to unsigned,
> instead choosing to return VR_VARYING.  Now we return ~[0,0].  */
>
> Tested on x86-64 by the usual bootstrap and regtest gymnastics,
> including --enable-languages=all, because my past sins are still
> haunting me.
>
> OK?

The new wide_int_range_convert_tree looks odd given it returns
tree's.  I'd have expected an API that does the conversion resulting
in a wide_int range and the VRP code adapting to that by converting
the result to trees.

Richard.


Re: [PATCH] Add a character size parameter to c_strlen/get_range_strlen

2018-08-28 Thread Bernd Edlinger
On 08/28/18 04:55, Jeff Law wrote:
> On 08/23/2018 08:48 AM, Bernd Edlinger wrote:
>> On 08/23/18 16:24, Jeff Law wrote:

 Yes, and which one was the earlier, more controversial patch from me?
>>>
>>> https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01800.html
>>>
>>>
>>> Which is the issue I'm working through right now :-)
>>>
>>
>> Okay, please note that a re-based patch is here:
>>
>> https://gcc.gnu.org/ml/gcc-patches/2018-08/msg01005.html
>>
>> and if you want, you can split that patch in two parts:
>>
>> first:
>> 86711 fix:
>> 2018-08-17  Bernd Edlinger  
>>
>>   PR middle-end/86711
>>   * expr.c (string_constant): Don't return truncated string literals.
>>
>>   * gcc.c-torture/execute/pr86711.c: New test.
> Note that while this fixes your pr86711.c, it doesn't fix the larger set
> of tests Martin has created for the issues in pr86711.
> 
> Sigh.
> jeff
> 


Hmm.  Which test do you mean?

I just looked at the tests from part1/6.

There I found the test cases memchr-1.c, pr86714.c and strlenopt-56.c

and tested them against trunk from yesterday (stage1 build only):
$ svn info
Path: .
Working Copy Root Path: /home/ed/gnu/gcc-trunk
URL: svn+ssh://edlin...@gcc.gnu.org/svn/gcc/trunk
Relative URL: ^/trunk
Repository Root: svn+ssh://edlin...@gcc.gnu.org/svn/gcc
Repository UUID: 138bc75d-0d04-0410-961f-82ee72b054a4
Revision: 263893
Node Kind: directory
Schedule: normal
Last Changed Author: jakub
Last Changed Rev: 263891
Last Changed Date: 2018-08-27 20:36:23 +0200 (Mon, 27 Aug 2018)

memchr-1.c fail
pr86714.c fail
strlenopt-56.c pass

I don't know what this test case is trying to test, sorry.
What is obvious about it:

static const wchar_t wsx[] = L"\x12345678";
static const wchar_t ws4[] = L"\x00123456\x12005678\x12340078\x12345600";

This will be one more failed test case on AIX, given it uses -Wall.

Now I apply my pr86711 patch (only expr.c changed):

result
memchr-1.c pass
pr86714.c pass

When I tested previously, pr86714.c failed when only
the first part was applied.  Now it passes; everything depends
on not-really-well-defined semantics, you know.

pr86714.c tests specifically what happens in tree-ssa-forwprop.c
when TREE_STRING_LENGTH is trusted, but when I debug it again, it
appears to be fixed because this check prevents the wrong folding:

   else if (compare_tree_int (TYPE_SIZE_UNIT (TREE_TYPE (init)),
 TREE_STRING_LENGTH (init)) < 0)
return NULL_TREE;

So my guess is, maybe it failed for a different reason, last week.



Bernd.


Re: [PATCH 4/4] bb-reorder: convert to gcc_stablesort

2018-08-28 Thread Alexander Monakov
This converts the use in bb-reorder.  I had to get a little bit creative with
the comparator as profile_count::operator> does not implement a strict weak
order.

* gcc/bb-reorder.c (edge_order): Convert to C-qsort-style
tri-state comparator.
(reorder_basic_blocks_simple): Change std::stable_sort to 
gcc_stablesort.

diff --git a/gcc/bb-reorder.c b/gcc/bb-reorder.c
index 57edde62c62..7c80609fbf4 100644
--- a/gcc/bb-reorder.c
+++ b/gcc/bb-reorder.c
@@ -91,7 +91,6 @@
 */
 
 #include "config.h"
-#define INCLUDE_ALGORITHM /* stable_sort */
 #include "system.h"
 #include "coretypes.h"
 #include "backend.h"
@@ -2351,13 +2350,17 @@ reorder_basic_blocks_software_trace_cache (void)
   FREE (bbd);
 }
 
-/* Return true if edge E1 is more desirable as a fallthrough edge than
-   edge E2 is.  */
+/* Order edges by execution frequency, higher first.  */
 
-static bool
-edge_order (edge e1, edge e2)
+static int
+edge_order (const void *ve1, const void *ve2)
 {
-  return e1->count () > e2->count ();
+  edge e1 = *(const edge *) ve1;
+  edge e2 = *(const edge *) ve2;
+  profile_count c1 = e1->count ();
+  profile_count c2 = e2->count ();
+  profile_count m = c1.max (c2);
+  return (m == c2) - (m == c1);
 }
 
 /* Reorder basic blocks using the "simple" algorithm.  This tries to
@@ -2410,7 +2413,7 @@ reorder_basic_blocks_simple (void)
  all edges are equally desirable.  */
 
   if (optimize_function_for_speed_p (cfun))
-std::stable_sort (edges, edges + n, edge_order);
+gcc_stablesort (edges, n, sizeof *edges, edge_order);
 
   /* Now decide which of those edges to make fallthrough edges.  We set
  BB_VISITED if a block already has a fallthrough successor assigned
-- 
2.13.3
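The `(m == c2) - (m == c1)` trick in edge_order above can be exercised in isolation; a sketch using plain ints in place of profile_count (the assumption mirrored here is that only max() and equality are reliable on the type):

```c
#include <assert.h>
#include <stdlib.h>

static int
max_of (int a, int b)
{
  return a > b ? a : b;
}

/* Descending tri-state comparator built only from max() and equality:
   -1 when the first element alone is the maximum, +1 when the second
   alone is, 0 when the two compare equal to each other.  */
static int
desc_cmp (const void *va, const void *vb)
{
  int c1 = *(const int *) va;
  int c2 = *(const int *) vb;
  int m = max_of (c1, c2);
  return (m == c2) - (m == c1);
}
```

Unlike the old bool comparator, this never claims a < b and b < a simultaneously, so it is safe for qsort-style interfaces.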



Re: [PATCH 1/4] qsort_chk: call from gcc_qsort instead of wrapping it

2018-08-28 Thread Richard Biener
On Tue, Aug 28, 2018 at 11:03 AM Alexander Monakov  wrote:
>
> Hi,
>
> This is an independently useful patch that makes it easier to introduce
> gcc_stablesort.
>
> Swap responsibilities of gcc_qsort and qsort_chk: call the checking function
> from the sorting function instead of wrapping gcc_qsort with qsort_chk.

OK.

> * gcc/sort.cc (gcc_qsort) [CHECKING_P]: Call qsort_chk.
> * gcc/system.h (qsort): Always redirect to gcc_qsort.  Update comment.
> * gcc/vec.c (qsort_chk): Do not call gcc_qsort.  Update comment.
>
> diff --git a/gcc/sort.cc b/gcc/sort.cc
> index 293e2058f89..9f8ee12e13b 100644
> --- a/gcc/sort.cc
> +++ b/gcc/sort.cc
> @@ -229,4 +229,7 @@ gcc_qsort (void *vbase, size_t n, size_t size, cmp_fn 
> *cmp)
>    mergesort (base, &c, n, base, (char *)buf);
>if (buf != scratch)
>  free (buf);
> +#if CHECKING_P
> +  qsort_chk (vbase, n, size, cmp);
> +#endif
>  }
> diff --git a/gcc/system.h b/gcc/system.h
> index f87fbaae69e..203c6a4f0cf 100644
> --- a/gcc/system.h
> +++ b/gcc/system.h
> @@ -1197,17 +1197,13 @@ helper_const_non_const_cast (const char *p)
>  /* Get definitions of HOST_WIDE_INT.  */
>  #include "hwint.h"
>
> -/* qsort comparator consistency checking: except in release-checking 
> compilers,
> -   redirect 4-argument qsort calls to qsort_chk; keep 1-argument invocations
> +/* GCC qsort API-compatible functions: except in release-checking compilers,
> +   redirect 4-argument qsort calls to gcc_qsort; keep 1-argument invocations
> corresponding to vec::qsort (cmp): they use C qsort internally anyway.  */
>  void qsort_chk (void *, size_t, size_t, int (*)(const void *, const void *));
>  void gcc_qsort (void *, size_t, size_t, int (*)(const void *, const void *));
>  #define PP_5th(a1, a2, a3, a4, a5, ...) a5
>  #undef qsort
> -#if CHECKING_P
> -#define qsort(...) PP_5th (__VA_ARGS__, qsort_chk, 3, 2, qsort, 0) 
> (__VA_ARGS__)
> -#else
>  #define qsort(...) PP_5th (__VA_ARGS__, gcc_qsort, 3, 2, qsort, 0) 
> (__VA_ARGS__)
> -#endif
>
>  #endif /* ! GCC_SYSTEM_H */
> diff --git a/gcc/vec.c b/gcc/vec.c
> index beb857fd838..ac3226b5fcb 100644
> --- a/gcc/vec.c
> +++ b/gcc/vec.c
> @@ -201,21 +201,12 @@ qsort_chk_error (const void *p1, const void *p2, const 
> void *p3,
>internal_error ("qsort checking failed");
>  }
>
> -/* Wrapper around qsort with checking that CMP is consistent on given input.
> -
> -   Strictly speaking, passing invalid (non-transitive, non-anti-commutative)
> -   comparators to libc qsort can result in undefined behavior.  Therefore we
> -   should ideally perform consistency checks prior to invoking qsort, but in
> -   order to do that optimally we'd need to sort the array ourselves 
> beforehand
> -   with a sorting routine known to be "safe".  Instead, we expect that most
> -   implementations in practice will still produce some permutation of input
> -   array even for invalid comparators, which enables us to perform checks on
> -   the output array.  */
> +/* Verify anti-symmetry and transitivity for comparator CMP on sorted array
> +   of N SIZE-sized elements pointed to by BASE.  */
>  void
>  qsort_chk (void *base, size_t n, size_t size,
>int (*cmp)(const void *, const void *))
>  {
> -  gcc_qsort (base, n, size, cmp);
>  #if 0
>  #define LIM(n) (n)
>  #else
> --
> 2.13.3
>
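The PP_5th macro in system.h implements arity-based dispatch: the call's arguments are pasted in front of the candidate results, so four qsort arguments shift the first candidate into fifth place while one argument shifts a later one there. A standalone sketch (result strings are illustrative; the real macro maps four arguments to gcc_qsort and one to plain qsort):

```c
#include <assert.h>
#include <string.h>

/* Select the 5th argument of the expanded list.  */
#define PP_5th(a1, a2, a3, a4, a5, ...) a5
/* Four call arguments land "four" in 5th place; one lands "one" there.  */
#define WHICH(...) PP_5th (__VA_ARGS__, "four", "three", "two", "one", "zero")
```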


Re: [PATCH 3/4] tree-loop-distribution: convert to gcc_stablesort

2018-08-28 Thread Alexander Monakov
This converts std::stable_sort use in tree-loop-distribution to gcc_stablesort.

* gcc/tree-loop-distribution.c (offset_cmp): Convert to C-qsort-style
tri-state comparator.
(fuse_memset_builtins): Change std::stable_sort to gcc_stablesort.

diff --git a/gcc/tree-loop-distribution.c b/gcc/tree-loop-distribution.c
index 120661447f0..d8db03b545b 100644
--- a/gcc/tree-loop-distribution.c
+++ b/gcc/tree-loop-distribution.c
@@ -90,7 +90,6 @@ along with GCC; see the file COPYING3.  If not see
data reuse.  */
 
 #include "config.h"
-#define INCLUDE_ALGORITHM /* stable_sort */
 #include "system.h"
 #include "coretypes.h"
 #include "backend.h"
@@ -2561,12 +2560,14 @@ version_for_distribution_p (vec<struct partition *> *partitions,
 
 /* Compare base offset of builtin mem* partitions P1 and P2.  */
 
-static bool
-offset_cmp (struct partition *p1, struct partition *p2)
+static int
+offset_cmp (const void *vp1, const void *vp2)
 {
-  gcc_assert (p1 != NULL && p1->builtin != NULL);
-  gcc_assert (p2 != NULL && p2->builtin != NULL);
-  return p1->builtin->dst_base_offset < p2->builtin->dst_base_offset;
+  struct partition *p1 = *(struct partition *const *) vp1;
+  struct partition *p2 = *(struct partition *const *) vp2;
+  unsigned HOST_WIDE_INT o1 = p1->builtin->dst_base_offset;
+  unsigned HOST_WIDE_INT o2 = p2->builtin->dst_base_offset;
+  return (o2 < o1) - (o1 < o2);
 }
 
 /* Fuse adjacent memset builtin PARTITIONS if possible.  This is a special
@@ -2618,8 +2619,8 @@ fuse_memset_builtins (vec<struct partition *> *partitions)
}
 
   /* Stable sort is required in order to avoid breaking dependence.  */
-  std::stable_sort (&(*partitions)[i],
-   &(*partitions)[i] + j - i, offset_cmp);
+  gcc_stablesort (&(*partitions)[i], j - i, sizeof (*partitions)[i],
+ offset_cmp);
   /* Continue with next partition.  */
   i = j;
 }
-- 
2.13.3



Re: [PATCH 2/4] introduce gcc_stablesort

2018-08-28 Thread Alexander Monakov
This adds a stable sort to sort.cc: mergesort implementation is stable, so
we just need network sort limited to sizes 2-3 to get the complete sort stable.

As I don't want to duplicate code for this, I've chosen to have gcc_qsort
accept bit-inverted 'size' parameter to request stable sorting.

* gcc/sort.cc (struct sort_ctx): New field 'nlim'. Use it...
(mergesort): ... here as maximum count for using netsort.
(gcc_qsort): Set nlim to 3 if stable sort is requested.
(gcc_stablesort): New.
* gcc/system.h (gcc_stablesort): Declare.

diff --git a/gcc/sort.cc b/gcc/sort.cc
index 9f8ee12e13b..b3be1eac72b 100644
--- a/gcc/sort.cc
+++ b/gcc/sort.cc
@@ -55,6 +55,7 @@ struct sort_ctx
   char   *out; // output buffer
   size_t n;// number of elements
   size_t size; // element size
+  size_t nlim; // limit for network sort
 };
 
 /* Helper for netsort. Permute, possibly in-place, 2 or 3 elements,
@@ -178,7 +179,7 @@ do {  \
 static void
 mergesort (char *in, sort_ctx *c, size_t n, char *out, char *tmp)
 {
-  if (likely (n <= 5))
+  if (likely (n <= c->nlim))
 {
   c->out = out;
   c->n = n;
@@ -221,8 +222,12 @@ gcc_qsort (void *vbase, size_t n, size_t size, cmp_fn *cmp)
 {
   if (n < 2)
 return;
+  size_t nlim = 5;
+  bool stable = (ssize_t) size < 0;
+  if (stable)
+nlim = 3, size = ~size;
   char *base = (char *)vbase;
-  sort_ctx c = {cmp, base, n, size};
+  sort_ctx c = {cmp, base, n, size, nlim};
   long long scratch[32];
   size_t bufsz = (n / 2) * size;
   void *buf = bufsz <= sizeof scratch ? scratch : xmalloc (bufsz);
@@ -233,3 +238,9 @@ gcc_qsort (void *vbase, size_t n, size_t size, cmp_fn *cmp)
   qsort_chk (vbase, n, size, cmp);
 #endif
 }
+
+void
+gcc_stablesort (void *vbase, size_t n, size_t size, cmp_fn *cmp)
+{
+  gcc_qsort (vbase, n, ~size, cmp);
+}
diff --git a/gcc/system.h b/gcc/system.h
index 203c6a4f0cf..100feb567c9 100644
--- a/gcc/system.h
+++ b/gcc/system.h
@@ -1202,6 +1202,8 @@ helper_const_non_const_cast (const char *p)
corresponding to vec::qsort (cmp): they use C qsort internally anyway.  */
 void qsort_chk (void *, size_t, size_t, int (*)(const void *, const void *));
 void gcc_qsort (void *, size_t, size_t, int (*)(const void *, const void *));
+void gcc_stablesort (void *, size_t, size_t,
+int (*)(const void *, const void *));
 #define PP_5th(a1, a2, a3, a4, a5, ...) a5
 #undef qsort
 #define qsort(...) PP_5th (__VA_ARGS__, gcc_qsort, 3, 2, qsort, 0) 
(__VA_ARGS__)
-- 
2.13.3
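The bit-inverted size encoding can be demonstrated standalone (hypothetical names; it assumes, like the patch, that a real element size never has its top bit set):

```c
#include <assert.h>
#include <sys/types.h>   /* ssize_t, as used in GCC's sort.cc */

static size_t seen_size;
static int seen_stable;

/* Decode the request the way gcc_qsort does: a bit-inverted size has its
   top bit set, so it reads as negative through ssize_t, and ~size
   recovers the real element size.  */
static void
sort_core (size_t n, size_t size)
{
  seen_stable = (ssize_t) size < 0;
  if (seen_stable)
    size = ~size;
  seen_size = size;
  (void) n;
}

/* Request a stable sort by handing sort_core the inverted size.  */
static void
my_stablesort (size_t n, size_t size)
{
  sort_core (n, ~size);
}
```

This keeps one code path for both entry points at the cost of a single decode branch.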



[PATCH 1/4] qsort_chk: call from gcc_qsort instead of wrapping it

2018-08-28 Thread Alexander Monakov
Hi,

This is an independently useful patch that makes it easier to introduce
gcc_stablesort.

Swap responsibilities of gcc_qsort and qsort_chk: call the checking function
from the sorting function instead of wrapping gcc_qsort with qsort_chk.

* gcc/sort.cc (gcc_qsort) [CHECKING_P]: Call qsort_chk.
* gcc/system.h (qsort): Always redirect to gcc_qsort.  Update comment.
* gcc/vec.c (qsort_chk): Do not call gcc_qsort.  Update comment.

diff --git a/gcc/sort.cc b/gcc/sort.cc
index 293e2058f89..9f8ee12e13b 100644
--- a/gcc/sort.cc
+++ b/gcc/sort.cc
@@ -229,4 +229,7 @@ gcc_qsort (void *vbase, size_t n, size_t size, cmp_fn *cmp)
   mergesort (base, &c, n, base, (char *)buf);
   if (buf != scratch)
 free (buf);
+#if CHECKING_P
+  qsort_chk (vbase, n, size, cmp);
+#endif
 }
diff --git a/gcc/system.h b/gcc/system.h
index f87fbaae69e..203c6a4f0cf 100644
--- a/gcc/system.h
+++ b/gcc/system.h
@@ -1197,17 +1197,13 @@ helper_const_non_const_cast (const char *p)
 /* Get definitions of HOST_WIDE_INT.  */
 #include "hwint.h"
 
-/* qsort comparator consistency checking: except in release-checking compilers,
-   redirect 4-argument qsort calls to qsort_chk; keep 1-argument invocations
+/* GCC qsort API-compatible functions: except in release-checking compilers,
+   redirect 4-argument qsort calls to gcc_qsort; keep 1-argument invocations
corresponding to vec::qsort (cmp): they use C qsort internally anyway.  */
 void qsort_chk (void *, size_t, size_t, int (*)(const void *, const void *));
 void gcc_qsort (void *, size_t, size_t, int (*)(const void *, const void *));
 #define PP_5th(a1, a2, a3, a4, a5, ...) a5
 #undef qsort
-#if CHECKING_P
-#define qsort(...) PP_5th (__VA_ARGS__, qsort_chk, 3, 2, qsort, 0) 
(__VA_ARGS__)
-#else
 #define qsort(...) PP_5th (__VA_ARGS__, gcc_qsort, 3, 2, qsort, 0) 
(__VA_ARGS__)
-#endif
 
 #endif /* ! GCC_SYSTEM_H */
diff --git a/gcc/vec.c b/gcc/vec.c
index beb857fd838..ac3226b5fcb 100644
--- a/gcc/vec.c
+++ b/gcc/vec.c
@@ -201,21 +201,12 @@ qsort_chk_error (const void *p1, const void *p2, const 
void *p3,
   internal_error ("qsort checking failed");
 }
 
-/* Wrapper around qsort with checking that CMP is consistent on given input.
-
-   Strictly speaking, passing invalid (non-transitive, non-anti-commutative)
-   comparators to libc qsort can result in undefined behavior.  Therefore we
-   should ideally perform consistency checks prior to invoking qsort, but in
-   order to do that optimally we'd need to sort the array ourselves beforehand
-   with a sorting routine known to be "safe".  Instead, we expect that most
-   implementations in practice will still produce some permutation of input
-   array even for invalid comparators, which enables us to perform checks on
-   the output array.  */
+/* Verify anti-symmetry and transitivity for comparator CMP on sorted array
+   of N SIZE-sized elements pointed to by BASE.  */
 void
 qsort_chk (void *base, size_t n, size_t size,
   int (*cmp)(const void *, const void *))
 {
-  gcc_qsort (base, n, size, cmp);
 #if 0
 #define LIM(n) (n)
 #else
-- 
2.13.3



Re: [PATCH][AAarch64][v3] Add support for TARGET_COMPUTE_FRAME_LAYOUT

2018-08-28 Thread Vlad Lazar

Gentle ping.

On 06/08/18 17:14, Vlad Lazar wrote:

Hi,

The patch adds support for the TARGET_COMPUTE_FRAME_LAYOUT hook on AArch64
and removes unneeded frame layout recalculation.

The removed aarch64_layout_frame calls are unnecessary because the functions in
which they appear will be called during or after the reload pass, in which the
TARGET_COMPUTE_FRAME_LAYOUT hook is called.  The if statement in
aarch64_layout_frame had the purpose of avoiding the extra work from the calls
which have been removed and is now redundant.

Bootstrapped and regtested on aarch64-none-linux-gnu and there are no
regressions.

Thanks,
Vlad

gcc/
2018-08-06  Vlad Lazar  

 * config/aarch64/aarch64.h (TARGET_COMPUTE_FRAME_LAYOUT): Define.
 * config/aarch64/aarch64.c (aarch64_expand_prologue): Remove 
aarch64_layout_frame call.
 (aarch64_expand_epilogue): Likewise.
 (aarch64_initial_elimination_offset): Likewise.
 (aarch64_get_separate_components): Likewise.
 (aarch64_use_return_insn_p): Likewise.
 (aarch64_layout_frame): Remove unneeded check.

---

diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index 
976f9afae54c1c98c22a4d5002d8d94e33b3190a..02aaa333fb57eff918049681173403f004a8a8e3
 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -478,6 +478,9 @@ extern unsigned aarch64_architecture_version;
  #undef DONT_USE_BUILTIN_SETJMP
  #define DONT_USE_BUILTIN_SETJMP 1

+#undef TARGET_COMPUTE_FRAME_LAYOUT
+#define TARGET_COMPUTE_FRAME_LAYOUT aarch64_layout_frame
+
  /* Register in which the structure value is to be returned.  */
  #define AARCH64_STRUCT_VALUE_REGNUM R8_REGNUM

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 
b88e7cac27ab76e01b9769563ec9077d2a81bd7b..6a52eecf94011f5a7ee787c1295ca24732af2ff4
 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -4021,9 +4021,6 @@ aarch64_layout_frame (void)
HOST_WIDE_INT offset = 0;
int regno, last_fp_reg = INVALID_REGNUM;

-  if (reload_completed && cfun->machine->frame.laid_out)
-return;
-
cfun->machine->frame.emit_frame_chain = aarch64_needs_frame_chain ();

  #define SLOT_NOT_REQUIRED (-2)
@@ -4567,8 +4564,6 @@ offset_12bit_unsigned_scaled_p (machine_mode mode, 
poly_int64 offset)
  static sbitmap
  aarch64_get_separate_components (void)
  {
-  aarch64_layout_frame ();
-
sbitmap components = sbitmap_alloc (LAST_SAVED_REGNUM + 1);
bitmap_clear (components);

@@ -4592,7 +4587,7 @@ aarch64_get_separate_components (void)

unsigned reg1 = cfun->machine->frame.wb_candidate1;
unsigned reg2 = cfun->machine->frame.wb_candidate2;
-  /* If aarch64_layout_frame has chosen registers to store/restore with
+  /* If registers have been chosen to be stored/restored with
   writeback don't interfere with them to avoid having to output explicit
   stack adjustment instructions.  */
if (reg2 != INVALID_REGNUM)
@@ -4850,8 +4845,6 @@ aarch64_add_cfa_expression (rtx_insn *insn, unsigned int 
reg,
  void
  aarch64_expand_prologue (void)
  {
-  aarch64_layout_frame ();
-
poly_int64 frame_size = cfun->machine->frame.frame_size;
poly_int64 initial_adjust = cfun->machine->frame.initial_adjust;
HOST_WIDE_INT callee_adjust = cfun->machine->frame.callee_adjust;
@@ -4964,8 +4957,6 @@ aarch64_use_return_insn_p (void)
if (crtl->profile)
  return false;

-  aarch64_layout_frame ();
-
return known_eq (cfun->machine->frame.frame_size, 0);
  }

@@ -4977,8 +4968,6 @@ aarch64_use_return_insn_p (void)
  void
  aarch64_expand_epilogue (bool for_sibcall)
  {
-  aarch64_layout_frame ();
-
poly_int64 initial_adjust = cfun->machine->frame.initial_adjust;
HOST_WIDE_INT callee_adjust = cfun->machine->frame.callee_adjust;
poly_int64 final_adjust = cfun->machine->frame.final_adjust;
@@ -7525,8 +7514,6 @@ aarch64_can_eliminate (const int from ATTRIBUTE_UNUSED, 
const int to)
  poly_int64
  aarch64_initial_elimination_offset (unsigned from, unsigned to)
  {
-  aarch64_layout_frame ();
-
if (to == HARD_FRAME_POINTER_REGNUM)
  {
if (from == ARG_POINTER_REGNUM)




Re: [PATCH][AArch64] Implement new intrinsics vabsd_s64 and vnegd_s64

2018-08-28 Thread Vlad Lazar

Gentle ping.

On 08/08/18 17:38, Vlad Lazar wrote:

On 01/08/18 18:35, James Greenhalgh wrote:

On Wed, Aug 01, 2018 at 07:13:53AM -0500, Vlad Lazar wrote:

On 31/07/18 22:48, James Greenhalgh wrote:

On Fri, Jul 20, 2018 at 04:37:34AM -0500, Vlad Lazar wrote:

Hi,

The patch adds implementations for the NEON intrinsics vabsd_s64 and vnegd_s64.
(https://developer.arm.com/products/architecture/cpu-architecture/a-profile/docs/ihi0073/latest/arm-neon-intrinsics-reference-architecture-specification)

Bootstrapped and regtested on aarch64-none-linux-gnu and there are no 
regressions.

OK for trunk?

+__extension__ extern __inline int64_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vnegd_s64 (int64_t __a)
+{
+  return -__a;
+}


Does this give the correct behaviour for the minimum value of int64_t? That
would be undefined behaviour in C, but well-defined under ACLE.

Thanks,
James



Hi. Thanks for the review.

For the minimum value of int64_t it behaves as the ACLE specifies:
"The negative of the minimum (signed) value is itself."


What should happen in this testcase? The spoiler is below, but try to work out
what should happen and what goes wrong with your implementation.

   int foo (int64_t x)
   {
 if (x < (int64_t) 0)
   return vnegd_s64(x) < (int64_t) 0;
 else
   return 0;
   }
   int bar (void)
   {
 return foo (INT64_MIN);
   }
Thanks,
James


-






INT64_MIN < 0 should be true, so we should return vnegd_s64(INT64_MIN) < 0.
vnegd_s64(INT64_MIN) is identity, so the return value should be
INT64_MIN < 0; i.e. True.

This isn't what the compiler thinks... The compiler makes use of the fact
that -INT64_MIN is undefined behaviour in C, and doesn't need to be considered
as a special case. The if statement gives you a range reduction to [-INF, -1],
negating that gives you a range [1, INF], and [1, INF] is never less than 0,
so the compiler folds the function to return false.  We have a mismatch in
semantics.


I see your point now. I have updated the vnegd_s64 intrinsic to convert to
unsigned before negating. This means that if the predicted range of x is
[INT64_MIN, y], then the predicted range of vnegd_s64 (x) will be
~[INT64_MIN + 1, y] which seems to resolve the issue. I've also added testcases
which reflect the issue you've pointed out. Note that I've changed the vabsd_s64
intrinsic in order to avoid moves between integer and vector registers.

See the updated patch below. Ok for trunk?
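As a plain-C illustration of why the unsigned detour gives the ACLE behaviour (hypothetical my_* stand-ins for the intrinsics; converting the out-of-range unsigned result back to int64_t is implementation-defined, but wraps on GCC):

```c
#include <assert.h>
#include <stdint.h>

/* Negating through uint64_t wraps instead of overflowing, so there is
   no signed-overflow UB and -INT64_MIN comes back as INT64_MIN, as the
   ACLE requires.  */
static int64_t
my_vnegd_s64 (int64_t a)
{
  return -(uint64_t) a;
}

/* Same trick for the absolute value.  */
static int64_t
my_vabsd_s64 (int64_t a)
{
  return a < 0 ? -(uint64_t) a : a;
}
```

Since the negation is done in unsigned arithmetic, the value range propagator can no longer assume the result is nonnegative, which is exactly what resolves the folding mismatch above.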

---

diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
index 
2d18400040f031dfcdaf60269ad484647804e1be..fc734e1aa9e93c171c0670164e5a3a54209905d3
 100644
--- a/gcc/config/aarch64/arm_neon.h
+++ b/gcc/config/aarch64/arm_neon.h
@@ -11822,6 +11822,18 @@ vabsq_s64 (int64x2_t __a)
return __builtin_aarch64_absv2di (__a);
  }

+/* Try to avoid moving between integer and vector registers.
+   For why the cast to unsigned is needed check the vnegd_s64 intrinsic.
+   There is a testcase related to this issue:
+   gcc.target/aarch64/vabsd_s64.c.  */
+
+__extension__ extern __inline int64_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vabsd_s64 (int64_t __a)
+{
+  return __a < 0 ? - (uint64_t) __a : __a;
+}
+
  /* vadd */

  __extension__ extern __inline int64_t
@@ -22907,6 +22919,25 @@ vneg_s64 (int64x1_t __a)
return -__a;
  }

+/* According to the ACLE, the negative of the minimum (signed)
+   value is itself.  This leads to a semantics mismatch, as this is
+   undefined behaviour in C.  The value range predictor is not
+   aware that the negation of a negative number can still be negative
+   and it may try to fold the expression.  See the test in
+   gcc.target/aarch64/vnegd_s64.c for an example.
+
+   The cast below tricks the value range predictor to include
+   INT64_MIN in the range it computes.  So for x in the range
+   [INT64_MIN, y] the range prediction after vnegd_s64 (x) will
+   be ~[INT64_MIN + 1, y].  */
+
+__extension__ extern __inline int64_t
+__attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
+vnegd_s64 (int64_t __a)
+{
+  return - (uint64_t) __a;
+}
+
  __extension__ extern __inline float32x4_t
  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
  vnegq_f32 (float32x4_t __a)
diff --git a/gcc/testsuite/gcc.target/aarch64/scalar_intrinsics.c 
b/gcc/testsuite/gcc.target/aarch64/scalar_intrinsics.c
index 
ea29066e369b967d0781d31c8a5208bda9e4f685..d943989768dd8c9aa87d9dcb899e199029ef3f8b
 100644
--- a/gcc/testsuite/gcc.target/aarch64/scalar_intrinsics.c
+++ b/gcc/testsuite/gcc.target/aarch64/scalar_intrinsics.c
@@ -627,6 +627,14 @@ test_vqabss_s32 (int32_t a)
return vqabss_s32 (a);
  }

+/* { dg-final { scan-assembler-times "\\tneg\\tx\[0-9\]+" 1 } } */
+
+int64_t
+test_vnegd_s64 (int64_t a)
+{
+  return vnegd_s64 (a);
+}
+
  /* { dg-final { scan-assembler-times "\\tsqneg\\tb\[0-9\]+" 1 } } */

  int8_t
diff --git a/gcc/testsuite/gcc.target/aarch64/vabs_intrinsic_3.c 

[PATCH][4/4] Fix PR87117

2018-08-28 Thread Richard Biener


Bootstrap and regtest running on x86_64-unknown-linux-gnu.

2018-08-28  Richard Biener  

PR tree-optimization/87117
* tree-ssa-sccvn.c (eliminate_dom_walker::eliminate_cleanup):
Handle removed stmt without LHS (GIMPLE_NOP).

Index: gcc/tree-ssa-sccvn.c
===
--- gcc/tree-ssa-sccvn.c(revision 263906)
+++ gcc/tree-ssa-sccvn.c(working copy)
@@ -5424,31 +5425,28 @@ eliminate_dom_walker::eliminate_cleanup
  do_release_defs = false;
}
}
- else
-   {
- tree lhs = gimple_get_lhs (stmt);
- if (TREE_CODE (lhs) == SSA_NAME
- && !has_zero_uses (lhs))
-   {
- if (dump_file && (dump_flags & TDF_DETAILS))
-   fprintf (dump_file, "Keeping eliminated stmt live "
-"as copy because of out-of-region uses\n");
- tree sprime = eliminate_avail (gimple_bb (stmt), lhs);
- gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
- if (is_gimple_assign (stmt))
-   {
- gimple_assign_set_rhs_from_tree (&gsi, sprime);
- update_stmt (gsi_stmt (gsi));
- continue;
-   }
- else
-   {
- gimple *copy = gimple_build_assign (lhs, sprime);
- gsi_insert_before (&gsi, copy, GSI_SAME_STMT);
- do_release_defs = false;
-   }
-   }
-   }
+ else if (tree lhs = gimple_get_lhs (stmt))
+   if (TREE_CODE (lhs) == SSA_NAME
+   && !has_zero_uses (lhs))
+ {
+   if (dump_file && (dump_flags & TDF_DETAILS))
+ fprintf (dump_file, "Keeping eliminated stmt live "
+  "as copy because of out-of-region uses\n");
+   tree sprime = eliminate_avail (gimple_bb (stmt), lhs);
+   gimple_stmt_iterator gsi = gsi_for_stmt (stmt);
+   if (is_gimple_assign (stmt))
+ {
+   gimple_assign_set_rhs_from_tree (&gsi, sprime);
+   update_stmt (gsi_stmt (gsi));
+   continue;
+ }
+   else
+ {
+   gimple *copy = gimple_build_assign (lhs, sprime);
+   gsi_insert_before (&gsi, copy, GSI_SAME_STMT);
+   do_release_defs = false;
+ }
+ }
}
 
   if (dump_file && (dump_flags & TDF_DETAILS))


[PATCH][3/4] Fix PR87118

2018-08-28 Thread Richard Biener


For now give up on predicated values when doing PRE (there's a
similar hunk in PHI-translation already).  I need to sit down
and decide whether it's worth handling them or whether we'd
better prune them from the hash tables when assigning value-ids.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

2018-08-28  Richard Biener  

PR tree-optimization/87117
* tree-ssa-pre.c (compute_avail): Do not make expressions
with predicated values available.
(get_expr_value_id): Assert we do not run into predicated value
expressions.

* gcc.dg/pr87117-2.c: New testcase.

Index: gcc/tree-ssa-pre.c
===
--- gcc/tree-ssa-pre.c  (revision 263906)
+++ gcc/tree-ssa-pre.c  (working copy)
@@ -663,6 +663,7 @@ get_expr_value_id (pre_expr expr)
   id = VN_INFO (PRE_EXPR_NAME (expr))->value_id;
   break;
 case NARY:
+  gcc_assert (!PRE_EXPR_NARY (expr)->predicated_values);
   id = PRE_EXPR_NARY (expr)->value_id;
   break;
 case REFERENCE:
@@ -3902,7 +3903,7 @@ compute_avail (void)
continue;
 
 vn_nary_op_lookup_stmt (stmt, &nary);
- if (!nary)
+ if (!nary || nary->predicated_values)
continue;
 
  /* If the NARY traps and there was a preceding
Index: gcc/testsuite/gcc.dg/pr87117-2.c
===
--- gcc/testsuite/gcc.dg/pr87117-2.c(nonexistent)
+++ gcc/testsuite/gcc.dg/pr87117-2.c(working copy)
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fcode-hoisting" } */
+
+void e();
+
+void a(int c, char **d)
+{
+  char b;
+  if (1 < c)
+b = (char)(__INTPTR_TYPE__)d[0];
+  if (1 < c && b)
+e();
+  while (1 < c)
+;
+}


[PATCH][2/4] Fix PR87117

2018-08-28 Thread Richard Biener


Err, see comment of [1/4] ;)

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

2018-08-28  Richard Biener  

PR tree-optimization/87117
* tree-ssa-operands.c (add_stmt_operand): STRING_CST may
get virtual operands.
(get_expr_operands): Handle STRING_CST like other decls.

* gcc.dg/lvalue-5.c: New testcase.

Index: gcc/tree-ssa-operands.c
===
--- gcc/tree-ssa-operands.c (revision 263906)
+++ gcc/tree-ssa-operands.c (working copy)
@@ -515,7 +515,7 @@ add_stmt_operand (struct function *fn, t
 {
   tree var = *var_p;
 
-  gcc_assert (SSA_VAR_P (*var_p));
+  gcc_assert (SSA_VAR_P (*var_p) || TREE_CODE (*var_p) == STRING_CST);
 
   if (is_gimple_reg (var))
 {
@@ -740,6 +740,7 @@ get_expr_operands (struct function *fn,
 case VAR_DECL:
 case PARM_DECL:
 case RESULT_DECL:
+case STRING_CST:
   if (!(flags & opf_address_taken))
add_stmt_operand (fn, expr_p, stmt, flags);
   return;
Index: gcc/testsuite/gcc.dg/lvalue-5.c
===
--- gcc/testsuite/gcc.dg/lvalue-5.c (revision 263906)
+++ gcc/testsuite/gcc.dg/lvalue-5.c (working copy)
@@ -1,7 +1,7 @@
 /* Test assignment to elements of a string literal is a warning, not
an error.  PR 27676.  */
 /* { dg-do compile } */
-/* { dg-options "-pedantic-errors" } */
+/* { dg-options "-O -pedantic-errors" } */
 
 void
 f (void)


[PATCH][1/4] Fix PR87117

2018-08-28 Thread Richard Biener


I believe Micha stumbled over this as well.  For stores to
string literals we miss VDEFs and loads from STRING_CSTs miss
VUSEs.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

2018-08-28  Richard Biener  

PR tree-optimization/87117
* tree-ssa-sccvn.c (fully_constant_vn_reference_p): Exclude
void, which is is_gimple_reg_type, by checking for COMPLETE_TYPE_P.

* gcc.dg/pr87117-1.c: New testcase.

Index: gcc/tree-ssa-sccvn.c
===
--- gcc/tree-ssa-sccvn.c(revision 263906)
+++ gcc/tree-ssa-sccvn.c(working copy)
@@ -1408,6 +1408,7 @@ fully_constant_vn_reference_p (vn_refere
 
   /* Simplify reads from constants or constant initializers.  */
   else if (BITS_PER_UNIT == 8
+  && COMPLETE_TYPE_P (ref->type)
   && is_gimple_reg_type (ref->type)
   && (!INTEGRAL_TYPE_P (ref->type)
   || TYPE_PRECISION (ref->type) % BITS_PER_UNIT == 0))
Index: gcc/testsuite/gcc.dg/pr87117-1.c
===
--- gcc/testsuite/gcc.dg/pr87117-1.c(nonexistent)
+++ gcc/testsuite/gcc.dg/pr87117-1.c(working copy)
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fno-inline -fno-tree-dce" } */
+
+int a, b, c;
+long *d;
+void fn1()
+{
+  for (; 0 < a;)
+a++;
+}
+void fn3()
+{
+  for (; c; c++)
+d[c] = 0;
+}
+void fn2()
+{
+  if (b)
+fn3();
+  fn1();
+}


[PATCH] Fix some bugs in maybe_warn_nonstring_arg (PR middle-end/87099)

2018-08-28 Thread Jakub Jelinek
Hi!

The following patch fixes some bugs in maybe_warn_nonstring_arg.
The testcase ICEs, because lenrng[1] is a PLUS_EXPR, but the code assumes
without checking that it must be INTEGER_CST and feeds it into
tree_int_cst_lt.  If the upper bound is NULL or is some expression
other than INTEGER_CST, we can't do anything useful with it, so we should
just treat it as an unknown bound (but e.g. if the first argument doesn't
have a bound or has a complex expression as bound, but the second one
has INTEGER_CST upper bound, we can use that).
Additionally, the code doesn't ever care about lenrng[0] (lower bound),
only lenrng[1] (upper bound), so I've changed the code to check that that
bound is non-NULL and INTEGER_CST.
Because nothing uses lenrng[0], it is wasted work to add 1 to it using
const_binop and the whole function is about -Wstringop-overflow{,=} warning,
so if !warn_stringop_overflow, there is no reason to do all of it and we can
bail out early.
I've also noticed that it can call get_range_strlen on the length argument
of strncmp/strncasecmp, which doesn't make sense either.
And last, just a formatting fix to put constants on rhs of comparison.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2018-08-27  Jakub Jelinek  

PR middle-end/87099
* calls.c (maybe_warn_nonstring_arg): Punt early if
warn_stringop_overflow is zero.  Don't call get_range_strlen
on 3rd argument, keep iterating until lenrng[1] is INTEGER_CST.
Swap comparison operands to have constants on rhs.  Only use
lenrng[1] if non-NULL and INTEGER_CST.  Don't uselessly
increment lenrng[0].

* gcc.dg/pr87099.c: New test.

--- gcc/calls.c.jj  2018-08-26 22:42:22.525779600 +0200
+++ gcc/calls.c 2018-08-27 14:34:07.959235490 +0200
@@ -1545,7 +1545,7 @@ maybe_warn_nonstring_arg (tree fndecl, t
   if (!fndecl || DECL_BUILT_IN_CLASS (fndecl) != BUILT_IN_NORMAL)
 return;
 
-  if (TREE_NO_WARNING (exp))
+  if (TREE_NO_WARNING (exp) || !warn_stringop_overflow)
 return;
 
   unsigned nargs = call_expr_nargs (exp);
@@ -1573,7 +1573,9 @@ maybe_warn_nonstring_arg (tree fndecl, t
   the range of their known or possible lengths and use it
   conservatively as the bound for the unbounded function,
   and to adjust the range of the bound of the bounded ones.  */
-   for (unsigned argno = 0; argno < nargs && !*lenrng; argno ++)
+   for (unsigned argno = 0;
+argno < MIN (nargs, 2)
+&& !(lenrng[1] && TREE_CODE (lenrng[1]) == INTEGER_CST); argno++)
  {
tree arg = CALL_EXPR_ARG (exp, argno);
if (!get_attr_nonstring_decl (arg))
@@ -1585,12 +1587,12 @@ maybe_warn_nonstring_arg (tree fndecl, t
 case BUILT_IN_STRNCAT:
 case BUILT_IN_STPNCPY:
 case BUILT_IN_STRNCPY:
-  if (2 < nargs)
+  if (nargs > 2)
bound = CALL_EXPR_ARG (exp, 2);
   break;
 
 case BUILT_IN_STRNDUP:
-  if (1 < nargs)
+  if (nargs > 1)
bound = CALL_EXPR_ARG (exp, 1);
   break;
 
@@ -1600,7 +1602,7 @@ maybe_warn_nonstring_arg (tree fndecl, t
if (!get_attr_nonstring_decl (arg))
  get_range_strlen (arg, lenrng);
 
-   if (1 < nargs)
+   if (nargs > 1)
  bound = CALL_EXPR_ARG (exp, 1);
break;
   }
@@ -1640,11 +1642,9 @@ maybe_warn_nonstring_arg (tree fndecl, t
}
 }
 
-  if (*lenrng)
+  if (lenrng[1] && TREE_CODE (lenrng[1]) == INTEGER_CST)
 {
   /* Add one for the nul.  */
-  lenrng[0] = const_binop (PLUS_EXPR, TREE_TYPE (lenrng[0]),
-  lenrng[0], size_one_node);
   lenrng[1] = const_binop (PLUS_EXPR, TREE_TYPE (lenrng[1]),
   lenrng[1], size_one_node);
 
--- gcc/testsuite/gcc.dg/pr87099.c.jj   2018-08-27 14:36:30.220856712 +0200
+++ gcc/testsuite/gcc.dg/pr87099.c  2018-08-27 14:36:06.475253767 +0200
@@ -0,0 +1,21 @@
+/* PR middle-end/87099 */
+/* { dg-do compile } */
+/* { dg-options "-Wstringop-overflow" } */
+
+void bar (char *);
+
+int
+foo (int n)
+{
+  char v[n];
+  bar (v);
+  return __builtin_strncmp (&v[1], "aaa", 3);
+}
+
+int
+baz (int n, char *s)
+{
+  char v[n];
+  bar (v);
+  return __builtin_strncmp (&v[1], s, 3);
+}

Jakub

