Re: Limit Debug mode impact: overload __niter_base

2018-07-02 Thread François Dumont

Here is the updated patch.

    * include/bits/stl_algobase.h (__niter_wrap): New.
    (__copy_move_a2(_II, _II, _OI)): Use latter.
    (__copy_move_backward_a2(_BI1, _BI1, _BI2)): Likewise.
    (fill_n(_OI, _Size, const _Tp&)): Likewise.
    (equal(_II1, _II1, _II2)): Use __glibcxx_requires_can_increment.
    * include/debug/stl_iterator.h
    (std::__niter_base(const __gnu_cxx::_Safe_iterator<
    __gnu_cxx::__normal_iterator<>, _Sequence>&)): New declaration.
    * include/debug/vector (__niter_base(const __gnu_cxx::_Safe_iterator<
    __gnu_cxx::__normal_iterator<>, _Sequence>&)): New.

Ok to commit ?


On 02/07/2018 13:57, Jonathan Wakely wrote:

On 01/07/18 21:20 +0200, François Dumont wrote:

    Here is a new proposal between yours and mine.

    It is still adding a function to wrap what __niter_base unwrap, I 
called it __nwrap_iter for this reason. But it takes advantage of 


Since "niter" refers to __normal_iterator I think a name based on
"niter" would be better than nsomething_iter.

__niter_wrap
__niter_rewrap
__niter_lift (misuse of functional programming term?)
__niter_raise (misuse of linear algebra term?)
__make_niter
__remake_niter


knowing that __niter_base will only unwrap random access iterator to 
use an expression to that will do the right thing, no matter the 
original iterator type.


OK, since __niter_base only transforms types based on 
__normal_iterator that seems safe to assume (in theory we could use

__normal_iterator with non-random access iterators, but we don't).

Could you please add a comment to the __nwrap_iter saying something
like:

 // Reverse the __niter_base transformation to get a
 // __normal_iterator back again (this assumes that __normal_iterator
 // is only used to wrap random access iterators, like pointers).


    I eventually found no issue in the testsuite, despite the 
std::equal change. I might have had a local invalid test.


Yes, I *did* test it already :-)




diff --git a/libstdc++-v3/include/bits/stl_algobase.h 
b/libstdc++-v3/include/bits/stl_algobase.h

index d429943..003ae8d 100644
--- a/libstdc++-v3/include/bits/stl_algobase.h
+++ b/libstdc++-v3/include/bits/stl_algobase.h
@@ -277,6 +277,18 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
    __niter_base(_Iterator __it)
    { return __it; }

+  // Convert an iterator of type _From to an iterator of type _To.
+  // e.g. from int* to __normal_iterator.
+  template
+    inline _Iterator
+    __nwrap_iter(_Iterator, _Iterator, _Iterator __res)
+    { return __res; }
+
+  template
+    inline _From
+    __nwrap_iter(_From __from, _To __to, _To __res)
+    { return __from + (__res - __to); }


Every time you call this function you pass it:

 __nwrap_iter(x, __niter_base(x), y)

So can the __niter_base(x) call happen inside __nwrap_iter?

i.e.

 template
   inline _From
   __nwrap_iter(_From __from, _To __res)
   { return __from + (__res - __niter_base(__from)); }





diff --git a/libstdc++-v3/include/bits/stl_algobase.h b/libstdc++-v3/include/bits/stl_algobase.h
index d429943..e5e7d15 100644
--- a/libstdc++-v3/include/bits/stl_algobase.h
+++ b/libstdc++-v3/include/bits/stl_algobase.h
@@ -277,6 +277,19 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 __niter_base(_Iterator __it)
 { return __it; }
 
+  // Reverse the __niter_base transformation to get a
+  // __normal_iterator back again (this assumes that __normal_iterator
+  // is only used to wrap random access iterators, like pointers).
+  template
+inline _Iterator
+__niter_wrap(_Iterator, _Iterator __res)
+{ return __res; }
+
+  template
+inline _From
+__niter_wrap(_From __from, _To __res)
+{ return __from + (__res - __niter_base(__from)); }
+
   // All of these auxiliary structs serve two purposes.  (1) Replace
   // calls to copy with memmove whenever possible.  (Memmove, not memcpy,
   // because the input and output ranges are permitted to overlap.)
@@ -419,9 +432,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 inline _OI
 __copy_move_a2(_II __first, _II __last, _OI __result)
 {
-  return _OI(std::__copy_move_a<_IsMove>(std::__niter_base(__first),
-	 std::__niter_base(__last),
-	 std::__niter_base(__result)));
+  return std::__niter_wrap(__result,
+		std::__copy_move_a<_IsMove>(std::__niter_base(__first),
+	std::__niter_base(__last),
+	std::__niter_base(__result)));
 }
 
   /**
@@ -593,7 +607,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 inline _BI2
 __copy_move_backward_a2(_BI1 __first, _BI1 __last, _BI2 __result)
 {
-  return _BI2(std::__copy_move_backward_a<_IsMove>
+  return std::__niter_wrap(__result,
+		std::__copy_move_backward_a<_IsMove>
 		  (std::__niter_base(__first), std::__niter_base(__last),
 		   std::__niter_base(__result)));
 }
@@ -804,7 +819,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   __glibcxx_function_requires(_OutputIteratorConcept<_OI, _Tp>)
   __glibcxx_requires_can_increment(__first, __n);
 
-  return 

[committed] Consolidate some H8 patterns

2018-07-02 Thread Jeff Law


The h8 port has 3 movqi and movhi patterns.  One for the H8, another for
the H8/H and H8/S and another for the H8/SX.  The basic H8 pattern can
be trivially merged with the H8/H and H8/S patterns.  In addition to
just simplifing the port, the movqi H8/H and H8/S pattern is better WRT
condition code handling and length computation.  The latter in
particular can result in better code as we're more likely to use short
branches without resorting to linker relaxation.

We can also combine the pushqi and pushhi patterns into a single pattern
using a mode iterator.

I've verified that the only changes in libgcc are the conversion of long
branches to short branches.

These aren't huge cleanups, but every little bit helps this rather
convoluted machine description. I expect to find more cleanups of a
similar nature.

Jeff

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 15becdc2466..a1f2d351983 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,5 +1,11 @@
 2018-07-02  Jeff Law  
 
+   * config/h8300/h8300.md (movqi_h8300, movqi_h8300hs): Consolidate
+   the H8/300, H8/300H and H8/S variants into a single pattern.
+   (movhi_h8300, movqi_h8300hs): Similarly.
+   (pushqi_h8300hs, pushhi_h8300hs): Consolidate into a single pattern.
+   (QHI mode iterator): New.
+
* config/h8300/h8300.md: Remove trailing whitespace.
 
 2018-07-02  Jim Wilson  
diff --git a/gcc/config/h8300/h8300.md b/gcc/config/h8300/h8300.md
index 846fd735de0..74b22338c21 100644
--- a/gcc/config/h8300/h8300.md
+++ b/gcc/config/h8300/h8300.md
@@ -184,6 +184,7 @@
 
 (define_mode_iterator P [(HI "Pmode == HImode") (SI "Pmode == SImode")])
 
+(define_mode_iterator QHI [QI HI])
 
 ;; --
 ;; MOVE INSTRUCTIONS
@@ -191,25 +192,10 @@
 
 ;; movqi
 
-(define_insn "*movqi_h8300"
+(define_insn "*movqi_h8nosx"
   [(set (match_operand:QI 0 "general_operand_dst" "=r,r ,<,r,r,m")
(match_operand:QI 1 "general_operand_src" " I,r>,r,n,m,r"))]
-  "TARGET_H8300
-   && h8300_move_ok (operands[0], operands[1])"
-  "@
-   sub.b   %X0,%X0
-   mov.b   %R1,%X0
-   mov.b   %X1,%R0
-   mov.b   %R1,%X0
-   mov.b   %R1,%X0
-   mov.b   %X1,%R0"
-  [(set_attr "length" "2,2,2,2,4,4")
-   (set_attr "cc" "set_zn,set_znv,set_znv,set_znv,set_znv,set_znv")])
-
-(define_insn "*movqi_h8300hs"
-  [(set (match_operand:QI 0 "general_operand_dst" "=r,r ,<,r,r,m")
-   (match_operand:QI 1 "general_operand_src" " I,r>,r,n,m,r"))]
-  "(TARGET_H8300H || TARGET_H8300S) && !TARGET_H8300SX
+  "(TARGET_H8300 || TARGET_H8300H || TARGET_H8300S) && !TARGET_H8300SX
 && h8300_move_ok (operands[0], operands[1])"
   "@
sub.b   %X0,%X0
@@ -220,7 +206,7 @@
mov.b   %X1,%R0"
   [(set (attr "length")
(symbol_ref "compute_mov_length (operands)"))
-   (set_attr "cc" "set_zn,set_znv,set_znv,clobber,set_znv,set_znv")])
+   (set_attr "cc" "set_zn,set_znv,set_znv,set_znv,set_znv,set_znv")])
 
 (define_insn "*movqi_h8sx"
   [(set (match_operand:QI 0 "general_operand_dst" "=Z,rQ")
@@ -255,26 +241,10 @@
 
 ;; movhi
 
-(define_insn "*movhi_h8300"
+(define_insn "*movhi_h8nosx"
   [(set (match_operand:HI 0 "general_operand_dst" "=r,r,<,r,r,m")
(match_operand:HI 1 "general_operand_src" "I,r>,r,i,m,r"))]
-  "TARGET_H8300
-   && h8300_move_ok (operands[0], operands[1])"
-  "@
-   sub.w   %T0,%T0
-   mov.w   %T1,%T0
-   mov.w   %T1,%T0
-   mov.w   %T1,%T0
-   mov.w   %T1,%T0
-   mov.w   %T1,%T0"
-  [(set (attr "length")
-   (symbol_ref "compute_mov_length (operands)"))
-   (set_attr "cc" "set_zn,set_znv,set_znv,set_znv,set_znv,set_znv")])
-
-(define_insn "*movhi_h8300hs"
-  [(set (match_operand:HI 0 "general_operand_dst" "=r,r,<,r,r,m")
-   (match_operand:HI 1 "general_operand_src" "I,r>,r,i,m,r"))]
-  "(TARGET_H8300H || TARGET_H8300S) && !TARGET_H8300SX
+  "(TARGET_H8300 || TARGET_H8300H || TARGET_H8300S) && !TARGET_H8300SX
 && h8300_move_ok (operands[0], operands[1])"
   "@
sub.w   %T0,%T0
@@ -855,25 +825,16 @@
   "mov.w\\t%T0,@-r7"
   [(set_attr "length" "2")])
 
-(define_insn "*pushqi1_h8300hs_"
-  [(set (mem:QI
+(define_insn "*push1_h8300hs_"
+  [(set (mem:QHI
(pre_modify:P
  (reg:P SP_REG)
  (plus:P (reg:P SP_REG) (const_int -4
-   (match_operand:QI 0 "register_no_sp_elim_operand" "r"))]
+   (match_operand:QHI 0 "register_no_sp_elim_operand" "r"))]
   "TARGET_H8300H || TARGET_H8300S"
   "mov.l\\t%S0,@-er7"
   [(set_attr "length" "4")])
 
-(define_insn "*pushhi1_h8300hs_"
-  [(set (mem:HI
-   (pre_modify:P
- (reg:P SP_REG)
- (plus:P (reg:P SP_REG) (const_int -4
-   (match_operand:HI 0 "register_no_sp_elim_operand" "r"))]
-  "TARGET_H8300H || TARGET_H8300S"
-  "mov.l\\t%S0,@-er7"
-  [(set_attr "length" "4")])
 
 ;; --
 ;; TEST INSTRUCTIONS


[committed] Fix whitespace issues in h8300.md

2018-07-02 Thread Jeff Law


So the H8 port is the next planned conversion away from cc0.  Given that
nearly every insn (including simple register moves) hits cc0,
essentially every pattern is going to need twiddling.  Sigh.

This seems like a good time to fix some nits since the whole file is
about to change anyway.

This commit just fixes trailing whitespace problems.

Jeff
diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index a8c2629cdac..15becdc2466 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,7 @@
+2018-07-02  Jeff Law  
+
+   * config/h8300/h8300.md: Remove trailing whitespace.
+
 2018-07-02  Jim Wilson  
 
* config/riscv/riscv.c (riscv_expand_epilogue): Use emit_jump_insn
diff --git a/gcc/config/h8300/h8300.md b/gcc/config/h8300/h8300.md
index 8464565a8a4..846fd735de0 100644
--- a/gcc/config/h8300/h8300.md
+++ b/gcc/config/h8300/h8300.md
@@ -880,7 +880,7 @@
 ;; --
 
 (define_insn ""
-  [(set (cc0) 
+  [(set (cc0)
(compare (zero_extract:QI (match_operand:QI 0 "bit_memory_operand" 
"r,U")
  (const_int 1)
  (match_operand 1 "const_int_operand" "n,n"))
@@ -902,7 +902,7 @@
(set_attr "cc" "set_zn")])
 
 (define_insn_and_split "*tst_extzv_1_n"
-  [(set (cc0) 
+  [(set (cc0)
(compare (zero_extract:SI (match_operand:QI 0 "general_operand_src" 
"r,U,mn>")
  (const_int 1)
  (match_operand 1 "const_int_operand" "n,n,n"))
@@ -927,7 +927,7 @@
(set_attr "cc" "set_zn,set_zn,set_zn")])
 
 (define_insn ""
-  [(set (cc0) 
+  [(set (cc0)
(compare (zero_extract:SI (match_operand:SI 0 "register_operand" "r")
  (const_int 1)
  (match_operand 1 "const_int_operand" "n"))
@@ -939,7 +939,7 @@
(set_attr "cc" "set_zn")])
 
 (define_insn_and_split "*tstsi_upper_bit"
-  [(set (cc0) 
+  [(set (cc0)
(compare (zero_extract:SI (match_operand:SI 0 "register_operand" "r")
  (const_int 1)
  (match_operand 1 "const_int_operand" "n"))
@@ -1004,7 +1004,7 @@
(set_attr "cc" "set_zn,set_zn,set_zn")])
 
 (define_insn "*tstqi"
-  [(set (cc0) 
+  [(set (cc0)
(compare (match_operand:QI 0 "register_operand" "r")
 (const_int 0)))]
   ""
@@ -1169,7 +1169,7 @@
   "TARGET_H8300 && epilogue_completed"
   [(const_int 0)]
   {
-split_adds_subs (HImode, operands); 
+split_adds_subs (HImode, operands);
 DONE;
   })
 
@@ -1233,7 +1233,7 @@
   ""
   [(const_int 0)]
   {
-split_adds_subs (HImode, operands); 
+split_adds_subs (HImode, operands);
 DONE;
   })
 
@@ -1262,7 +1262,7 @@
(plus:SI (match_operand:SI 1 "h8300_dst_operand" "%0,0")
 (match_operand:SI 2 "h8300_src_operand" "i,rQ")))]
   "(TARGET_H8300H || TARGET_H8300S) && h8300_operands_match_p (operands)"
-{  
+{
   return output_plussi (operands);
 }
   [(set (attr "length")
@@ -1289,7 +1289,7 @@
   "TARGET_H8300H || TARGET_H8300S"
   [(const_int 0)]
   {
-split_adds_subs (SImode, operands); 
+split_adds_subs (SImode, operands);
 DONE;
   })
 
@@ -1585,7 +1585,7 @@
   "TARGET_H8300SX"
   "divu.w\\t%T2,%T0"
   [(set_attr "length" "2")])
-  
+
 (define_insn "divhi3"
   [(set (match_operand:HI 0 "register_operand" "=r")
(div:HI (match_operand:HI 1 "register_operand" "0")
@@ -1593,7 +1593,7 @@
   "TARGET_H8300SX"
   "divs.w\\t%T2,%T0"
   [(set_attr "length" "2")])
-  
+
 (define_insn "udivsi3"
   [(set (match_operand:SI 0 "register_operand" "=r")
(udiv:SI (match_operand:SI 1 "register_operand" "0")
@@ -1601,7 +1601,7 @@
   "TARGET_H8300SX"
   "divu.l\\t%S2,%S0"
   [(set_attr "length" "2")])
-  
+
 (define_insn "divsi3"
   [(set (match_operand:SI 0 "register_operand" "=r")
(div:SI (match_operand:SI 1 "register_operand" "0")
@@ -1609,7 +1609,7 @@
   "TARGET_H8300SX"
   "divs.l\\t%S2,%S0"
   [(set_attr "length" "2")])
-  
+
 (define_insn "udivmodqi4"
   [(set (match_operand:QI 0 "register_operand" "=r")
(truncate:QI
@@ -2263,7 +2263,7 @@
 if ((GET_CODE (operands[2]) != REG && operands[2] != const0_rtx)
&& TARGET_H8300)
   operands[2] = force_reg (HImode, operands[2]);
-h8300_expand_branch (operands); 
+h8300_expand_branch (operands);
 DONE;
   })
 
@@ -2648,7 +2648,7 @@
   [(const_int 0)]
   ""
   {
-h8300_expand_prologue (); 
+h8300_expand_prologue ();
 DONE;
   })
 
@@ -2656,7 +2656,7 @@
   [(return)]
   ""
   {
-h8300_expand_epilogue (); 
+h8300_expand_epilogue ();
 DONE;
   })
 
@@ -2671,7 +2671,7 @@
   else if (TARGET_H8300H)
 return 
"mov.l\\ter0,@-er7\;stc\\tccr,r0l\;mov.b\\tr0l,@(4,er7)\;mov.l\\t@er7+,er0\;orc\\t#128,ccr";
   else if (TARGET_H8300S && TARGET_NEXR )
-return 
"mov.l\\ter0,@-er7\;stc\tccr,r0l\;mov.b\tr0l,@(4,er7)\;mov.l\\t@er7+,er0\;orc\t#128,ccr";
 
+

Re: [PATCH] RISC-V: Fix interrupt support for -g.

2018-07-02 Thread Kito Cheng
Hi Jim:

It's no problem with current approach, I just think it can simplify
the .md file.

Thanks :)
On Tue, Jul 3, 2018 at 11:22 AM Jim Wilson  wrote:
>
> On Mon, Jul 2, 2018 at 8:04 PM, Kito Cheng  wrote:
> > Does it possible just combine those pattern into simple_return
> > pattern, and then check the function type and output correct return
> > instruction in riscv_output_return?
>
> There might be problems with optimizations thinking this is a regular
> return when it isn't.  But it might work if we have enough checks for
> the interrupt attribute in the right places.
>
> Having different patterns for different types of return instructions
> makes the RTL dumps self contained.  You can tell what kind of
> instruction it is without having to look at the source tree to see how
> the function was declared.  There is some minor benefit to that.
>
> Is there a problem with the current approach that needs fixing?
>
> Jim


Re: [PATCH] RISC-V: Fix interrupt support for -g.

2018-07-02 Thread Jim Wilson
On Mon, Jul 2, 2018 at 8:04 PM, Kito Cheng  wrote:
> Does it possible just combine those pattern into simple_return
> pattern, and then check the function type and output correct return
> instruction in riscv_output_return?

There might be problems with optimizations thinking this is a regular
return when it isn't.  But it might work if we have enough checks for
the interrupt attribute in the right places.

Having different patterns for different types of return instructions
makes the RTL dumps self contained.  You can tell what kind of
instruction it is without having to look at the source tree to see how
the function was declared.  There is some minor benefit to that.

Is there a problem with the current approach that needs fixing?

Jim


Re: [PATCH] RISC-V: Fix interrupt support for -g.

2018-07-02 Thread Kito Cheng
Hi Jim:

Does it possible just combine those pattern into simple_return
pattern, and then check the function type and output correct return
instruction in riscv_output_return?
On Tue, Jul 3, 2018 at 8:22 AM Jim Wilson  wrote:
>
> This fixes a problem found by someone trying to use the new RISC-V interrupt
> attribute support.  With a slightly non-trivial example, and the -g option, we
> get an abort in dwarf2cfi for an inconsistent CFI state.  This is my fault
> for not making the new interrupt return patterns look enough like regular
> return patterns, which is simple to fix.
>
> Tested with cross riscv32-elf and riscv64-linux builds and tests.  There were
> no regressions.  The new testcase fails without the patch and works with the
> patch.
>
> Committed.
>
> Jim
>
> gcc/
> * config/riscv/riscv.c (riscv_expand_epilogue): Use emit_jump_insn
> instead of emit_insn for interrupt returns.
> * config/riscv/riscv.md (riscv_met): Add (return) to rtl.
> (riscv_sret, riscv_uret): Likewise.
>
> gcc/testsuite/
> * gcc.target/riscv/interrupt-debug.c: New.
> ---
>  gcc/config/riscv/riscv.c |  6 +++---
>  gcc/config/riscv/riscv.md|  9 ++---
>  gcc/testsuite/gcc.target/riscv/interrupt-debug.c | 15 +++
>  3 files changed, 24 insertions(+), 6 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/interrupt-debug.c
>
> diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
> index 2709ebdd797..d87836f53f8 100644
> --- a/gcc/config/riscv/riscv.c
> +++ b/gcc/config/riscv/riscv.c
> @@ -3985,11 +3985,11 @@ riscv_expand_epilogue (int style)
>enum riscv_privilege_levels mode = cfun->machine->interrupt_mode;
>
>if (mode == MACHINE_MODE)
> -   emit_insn (gen_riscv_mret ());
> +   emit_jump_insn (gen_riscv_mret ());
>else if (mode == SUPERVISOR_MODE)
> -   emit_insn (gen_riscv_sret ());
> +   emit_jump_insn (gen_riscv_sret ());
>else
> -   emit_insn (gen_riscv_uret ());
> +   emit_jump_insn (gen_riscv_uret ());
>  }
>else if (style != SIBCALL_RETURN)
>  emit_jump_insn (gen_simple_return_internal (ra));
> diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
> index 7b411f0538e..613af9d79e4 100644
> --- a/gcc/config/riscv/riscv.md
> +++ b/gcc/config/riscv/riscv.md
> @@ -2328,17 +2328,20 @@
>"fsflags\t%0")
>
>  (define_insn "riscv_mret"
> -  [(unspec_volatile [(const_int 0)] UNSPECV_MRET)]
> +  [(return)
> +   (unspec_volatile [(const_int 0)] UNSPECV_MRET)]
>""
>"mret")
>
>  (define_insn "riscv_sret"
> -  [(unspec_volatile [(const_int 0)] UNSPECV_SRET)]
> +  [(return)
> +   (unspec_volatile [(const_int 0)] UNSPECV_SRET)]
>""
>"sret")
>
>  (define_insn "riscv_uret"
> -  [(unspec_volatile [(const_int 0)] UNSPECV_URET)]
> +  [(return)
> +   (unspec_volatile [(const_int 0)] UNSPECV_URET)]
>""
>"uret")
>
> diff --git a/gcc/testsuite/gcc.target/riscv/interrupt-debug.c 
> b/gcc/testsuite/gcc.target/riscv/interrupt-debug.c
> new file mode 100644
> index 000..a1b6dac8fbb
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/interrupt-debug.c
> @@ -0,0 +1,15 @@
> +/* Verify that we can compile with debug info.  */
> +/* { dg-do compile } */
> +/* { dg-options "-Og -g" } */
> +extern int var1;
> +extern int var2;
> +extern void sub2 (void);
> +
> +void __attribute__ ((interrupt))
> +sub (void)
> +{
> +  if (var1)
> +var2 = 0;
> +  else
> +sub2 ();
> +}
> --
> 2.17.1
>


Test (please ignore)

2018-07-02 Thread Qing Zhao




[PATCH] RISC-V: Fix interrupt support for -g.

2018-07-02 Thread Jim Wilson
This fixes a problem found by someone trying to use the new RISC-V interrupt
attribute support.  With a slightly non-trivial example, and the -g option, we
get an abort in dwarf2cfi for an inconsistent CFI state.  This is my fault
for not making the new interrupt return patterns look enough like regular
return patterns, which is simple to fix.

Tested with cross riscv32-elf and riscv64-linux builds and tests.  There were
no regressions.  The new testcase fails without the patch and works with the
patch.

Committed.

Jim

gcc/
* config/riscv/riscv.c (riscv_expand_epilogue): Use emit_jump_insn
instead of emit_insn for interrupt returns.
* config/riscv/riscv.md (riscv_met): Add (return) to rtl.
(riscv_sret, riscv_uret): Likewise.

gcc/testsuite/
* gcc.target/riscv/interrupt-debug.c: New.
---
 gcc/config/riscv/riscv.c |  6 +++---
 gcc/config/riscv/riscv.md|  9 ++---
 gcc/testsuite/gcc.target/riscv/interrupt-debug.c | 15 +++
 3 files changed, 24 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/interrupt-debug.c

diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
index 2709ebdd797..d87836f53f8 100644
--- a/gcc/config/riscv/riscv.c
+++ b/gcc/config/riscv/riscv.c
@@ -3985,11 +3985,11 @@ riscv_expand_epilogue (int style)
   enum riscv_privilege_levels mode = cfun->machine->interrupt_mode;
 
   if (mode == MACHINE_MODE)
-   emit_insn (gen_riscv_mret ());
+   emit_jump_insn (gen_riscv_mret ());
   else if (mode == SUPERVISOR_MODE)
-   emit_insn (gen_riscv_sret ());
+   emit_jump_insn (gen_riscv_sret ());
   else
-   emit_insn (gen_riscv_uret ());
+   emit_jump_insn (gen_riscv_uret ());
 }
   else if (style != SIBCALL_RETURN)
 emit_jump_insn (gen_simple_return_internal (ra));
diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
index 7b411f0538e..613af9d79e4 100644
--- a/gcc/config/riscv/riscv.md
+++ b/gcc/config/riscv/riscv.md
@@ -2328,17 +2328,20 @@
   "fsflags\t%0")
 
 (define_insn "riscv_mret"
-  [(unspec_volatile [(const_int 0)] UNSPECV_MRET)]
+  [(return)
+   (unspec_volatile [(const_int 0)] UNSPECV_MRET)]
   ""
   "mret")
 
 (define_insn "riscv_sret"
-  [(unspec_volatile [(const_int 0)] UNSPECV_SRET)]
+  [(return)
+   (unspec_volatile [(const_int 0)] UNSPECV_SRET)]
   ""
   "sret")
 
 (define_insn "riscv_uret"
-  [(unspec_volatile [(const_int 0)] UNSPECV_URET)]
+  [(return)
+   (unspec_volatile [(const_int 0)] UNSPECV_URET)]
   ""
   "uret")
 
diff --git a/gcc/testsuite/gcc.target/riscv/interrupt-debug.c 
b/gcc/testsuite/gcc.target/riscv/interrupt-debug.c
new file mode 100644
index 000..a1b6dac8fbb
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/interrupt-debug.c
@@ -0,0 +1,15 @@
+/* Verify that we can compile with debug info.  */
+/* { dg-do compile } */
+/* { dg-options "-Og -g" } */
+extern int var1;
+extern int var2;
+extern void sub2 (void);
+
+void __attribute__ ((interrupt))
+sub (void)
+{
+  if (var1)
+var2 = 0;
+  else
+sub2 ();
+}
-- 
2.17.1



[patch, fortran] Asynchronous I/O, take 3

2018-07-02 Thread Thomas König

Hello world,

the attached patch is the third take on Nicolas' and my patch
for implementing asynchronous I/O.  Some parts have been reworked, and 
several bugs which caused either incorrect I/O or hangs have been

fixed in the process.

I have to say that getting out these bugs has been much harder
than Nicolas and I originally thought, and that this has cost more
working hours than any other patch I have been involved in.

This has been regression-tested on x86_64-pc-linux-gnu. The new test
cases have also been tested in a tight loop with

n=1; while ./a.out; do echo -n $n " " ; n=$((n+1)); done

or (for async_io_3.f90, which is supposed to fail)

while true ; do ./a.out > /dev/null 2>&1 ;  echo -n $n " " ; n=$((n+1)); 
done


and the test cases also come up clean with valgrind --tool=drd
(which is a _very_ strict tool which, after this experience, I
wholeheartedly recommend for doing pthreads debugging).

The interface remains as before - link in pthread to get asynchronous
I/O, which matches what ifort does.

So, OK for trunk?

Regards

Thomas

2018-07-02  Nicolas Koenig  
Thomas Koenig 

PR fortran/25829
* gfortran.texi: Add description of asynchronous I/O.
* trans-decl.c (gfc_finish_var_decl): Treat asynchronous variables
as volatile.
* trans-io.c (gfc_build_io_library_fndecls): Rename st_wait to
st_wait_async and change argument spec from ".X" to ".w".
(gfc_trans_wait): Pass ID argument via reference.

2018-07-02  Nicolas Koenig  
Thomas Koenig 

PR fortran/25829
* gfortran.dg/f2003_inquire_1.f03: Add write statement.
* gfortran.dg/f2003_io_1.f03: Add wait statement.

2018-01-02  Nicolas Koenig  
Thomas Koenig 

PR fortran/25829
* Makefile.am: Add async.c to gfor_io_src.
Add async.h to gfor_io_headers.
* Makefile.in: Regenerated.
* gfortran.map: Add _gfortran_st_wait_async.
* io/async.c: New file.
* io/async.h: New file.
* io/close.c: Include async.h.
(st_close): Call async_wait for an asynchronous unit.
* io/file_pos.c (st_backspace): Likewise.
(st_endfile): Likewise.
(st_rewind): Likewise.
(st_flush): Likewise.
* io/inquire.c: Add handling for asynchronous PENDING
and ID arguments.
* io/io.h (st_parameter_dt): Add async bit.
(st_parameter_wait): Correct.
(gfc_unit): Add au pointer.
(st_wait_async): Add prototype.
(transfer_array_inner): Likewise.
(st_write_done_worker): Likewise.
* io/open.c: Include async.h.
(new_unit): Initialize asynchronous unit.
* io/transfer.c (async_opt): New struct.
(wrap_scalar_transfer): New function.
(transfer_integer): Call wrap_scalar_transfer to do the work.
(transfer_real): Likewise.
(transfer_real_write): Likewise.
(transfer_character): Likewise.
(transfer_character_wide): Likewise.
(transfer_complex): Likewise.
(transfer_array_inner): New function.
(transfer_array): Call transfer_array_inner.
(transfer_derived): Call wrap_scalar_transfer.
(data_transfer_init): Check for asynchronous I/O.
Perform a wait operation on any pending asynchronous I/O
if the data transfer is synchronous. Copy PDT and enqueue
thread for data transfer.
(st_read_done_worker): New function.
(st_read_done): Enqueue transfer or call st_read_done_worker.
(st_write_done_worker): New function.
(st_write_done): Enqueue transfer or call st_read_done_worker.
(st_wait): Document as no-op for compatibility reasons.
(st_wait_async): New function.
* io/unit.c (insert_unit): Use macros LOCK, UNLOCK and TRYLOCK;
add NOTE where necessary.
(get_gfc_unit): Likewise.
(init_units): Likewise.
(close_unit_1): Likewise. Call async_close if asynchronous.
(close_unit): Use macros LOCK and UNLOCK.
(finish_last_advance_record): Likewise.
(newunit_alloc): Likewise.
* io/unix.c (find_file): Likewise.
(flush_all_units_1): Likewise.
(flush_all_units): Likewise.
* libgfortran.h (generate_error_common): Add prototype.
* runtime/error.c: Include io.h and async.h.
(generate_error_common): New function.

2018-07-02  Nicolas Koenig  
Thomas Koenig 

PR fortran/25829
* testsuite/libgfomp.fortran/async_io_1.f90: New test.
* testsuite/libgfomp.fortran/async_io_2.f90: New test.
* testsuite/libgfomp.fortran/async_io_3.f90: New test.

Index: gcc/fortran/gfortran.texi
===
--- gcc/fortran/gfortran.texi	(Revision 259739)
+++ gcc/fortran/gfortran.texi	(Arbeitskopie)
@@ -882,8 +882,7 @@ than @code{(/.../)}.  Type-specification for array
 @item Extensions to the 

[PATCH] P0758R1 Implicit conversion traits

2018-07-02 Thread Jonathan Wakely

Extend __is_convertible_helper to also detect whether the conversion is
non-throwing, for std::is_nothrow_convertible in C++2a,

* include/std/type_traits [__cplusplus > 201703]
(__is_convertible_helper::__is_nothrow_type): Define new member.
(__is_convertible_helper<_From, _To, false>::__test_aux1): Add
noexcept.
(__is_convertible_helper<_From, _To, false>::__test_nothrow)
(__is_convertible_helper<_From, _To, false>::__is_nothrow_type): Add
new members.
(is_nothrow_convertible, is_nothrow_convertible_v): Define for C++2a.
* testsuite/20_util/is_nothrow_convertible/value.cc: New.
* testsuite/20_util/is_nothrow_convertible/requirements/
explicit_instantiation.cc: New.
* testsuite/20_util/is_nothrow_convertible/requirements/typedefs.cc:
New.

Tested powerpc64le-linux, committed to trunk.


commit 3cc486f1802c45d8705d8d5239170a0e1c0151c7
Author: Jonathan Wakely 
Date:   Mon Jul 2 21:33:14 2018 +0100

P0758R1 Implicit conversion traits

Extend __is_convertible_helper to also detect whether the conversion is
non-throwing, for std::is_nothrow_convertible in C++2a,

* include/std/type_traits [__cplusplus > 201703]
(__is_convertible_helper::__is_nothrow_type): Define new member.
(__is_convertible_helper<_From, _To, false>::__test_aux1): Add
noexcept.
(__is_convertible_helper<_From, _To, false>::__test_nothrow)
(__is_convertible_helper<_From, _To, false>::__is_nothrow_type): Add
new members.
(is_nothrow_convertible, is_nothrow_convertible_v): Define for 
C++2a.
* testsuite/20_util/is_nothrow_convertible/value.cc: New.
* testsuite/20_util/is_nothrow_convertible/requirements/
explicit_instantiation.cc: New.
* testsuite/20_util/is_nothrow_convertible/requirements/typedefs.cc:
New.

diff --git a/libstdc++-v3/include/std/type_traits 
b/libstdc++-v3/include/std/type_traits
index b2d3380f024..accea6df648 100644
--- a/libstdc++-v3/include/std/type_traits
+++ b/libstdc++-v3/include/std/type_traits
@@ -1341,13 +1341,18 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
bool = __or_, is_function<_To>,
 is_array<_To>>::value>
 struct __is_convertible_helper
-{ typedef typename is_void<_To>::type type; };
+{
+  typedef typename is_void<_To>::type type;
+#if __cplusplus > 201703L
+  typedef type __is_nothrow_type;
+#endif
+};
 
   template
 class __is_convertible_helper<_From, _To, false>
 {
-   template
-   static void __test_aux(_To1);
+  template
+   static void __test_aux(_To1) noexcept;
 
   template(std::declval<_From1>()))>
@@ -1358,8 +1363,23 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
static false_type
__test(...);
 
+#if __cplusplus > 201703L
+  template(std::declval<_From1>()))>
+   static __bool_constant<_NoEx>
+   __test_nothrow(int);
+
+  template
+   static false_type
+   __test_nothrow(...);
+#endif
+
 public:
   typedef decltype(__test<_From, _To>(0)) type;
+
+#if __cplusplus > 201703L
+  typedef decltype(__test_nothrow<_From, _To>(0)) __is_nothrow_type;
+#endif
 };
 
 
@@ -1369,6 +1389,18 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 : public __is_convertible_helper<_From, _To>::type
 { };
 
+#if __cplusplus > 201703L
+  /// is_nothrow_convertible
+  template
+struct is_nothrow_convertible
+: public __is_convertible_helper<_From, _To>::__is_nothrow_type
+{ };
+
+  /// is_nothrow_convertible_v
+  template
+inline constexpr bool is_nothrow_convertible_v
+  = is_nothrow_convertible<_From, _To>::value;
+#endif // C++2a
 
   // Const-volatile modifications.
 
diff --git 
a/libstdc++-v3/testsuite/20_util/is_nothrow_convertible/requirements/explicit_instantiation.cc
 
b/libstdc++-v3/testsuite/20_util/is_nothrow_convertible/requirements/explicit_instantiation.cc
new file mode 100644
index 000..68f1adecb38
--- /dev/null
+++ 
b/libstdc++-v3/testsuite/20_util/is_nothrow_convertible/requirements/explicit_instantiation.cc
@@ -0,0 +1,29 @@
+// Copyright (C) 2018 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+

[PING][PATCH, rs6000, C/C++] Fix PR target/86324: divkc3-1.c FAILs when compiling with -mabi=ieeelongdouble

2018-07-02 Thread Peter Bergner
I'd like to PING:

  https://gcc.gnu.org/ml/gcc-patches/2018-06/msg01713.html

I've included the entire patch below, since I missed the test cases in
the original submission and Segher asked for some updated text for the
hook documentation which I've included below.

Peter


gcc/
PR target/86324
* target.def (translate_mode_attribute): New hook.
* targhooks.h (default_translate_mode_attribute): Declare.
* targhooks.c (default_translate_mode_attribute): New function.
* doc/tm.texi.in (TARGET_TRANSLATE_MODE_ATTRIBUTE): New hook.
* doc/tm.texi: Regenerate.
* config/rs6000/rs6000.c (TARGET_TRANSLATE_MODE_ATTRIBUTE): Define.
(rs6000_translate_mode_attribute): New function.

gcc/c-family/
PR target/86324
* c-attribs.c (handle_mode_attribute): Call new translate_mode_attribute
target hook.

gcc/testsuite/
PR target/86324
gcc.target/powerpc/pr86324-1.c: New test.
gcc.target/powerpc/pr86324-2.c: Likewise.

Index: gcc/target.def
===
--- gcc/target.def  (revision 262159)
+++ gcc/target.def  (working copy)
@@ -3310,6 +3310,16 @@ constants can be done inline.  The funct
  HOST_WIDE_INT, (const_tree constant, HOST_WIDE_INT basic_align),
  default_constant_alignment)
 
+DEFHOOK
+(translate_mode_attribute,
+ "Define this hook if during mode attribute processing, the port should\n\
+translate machine_mode @var{mode} to another mode.  For example, rs6000's\n\
+@code{KFmode}, when it is the same as @code{TFmode}.\n\
+\n\
+The default version of the hook returns that mode that was passed in.",
+ machine_mode, (machine_mode mode),
+ default_translate_mode_attribute)
+
 /* True if MODE is valid for the target.  By "valid", we mean able to
be manipulated in non-trivial ways.  In particular, this means all
the arithmetic is supported.  */
Index: gcc/targhooks.c
===
--- gcc/targhooks.c (revision 262159)
+++ gcc/targhooks.c (working copy)
@@ -393,6 +393,14 @@ default_mangle_assembler_name (const cha
   return get_identifier (stripped);
 }
 
+/* The default implementation of TARGET_TRANSLATE_MODE_ATTRIBUTE.  */
+
+machine_mode
+default_translate_mode_attribute (machine_mode mode)
+{
+  return mode;
+}
+
 /* True if MODE is valid for the target.  By "valid", we mean able to
be manipulated in non-trivial ways.  In particular, this means all
the arithmetic is supported.
Index: gcc/targhooks.h
===
--- gcc/targhooks.h (revision 262159)
+++ gcc/targhooks.h (working copy)
@@ -72,6 +72,7 @@ extern void default_print_operand_addres
 extern bool default_print_operand_punct_valid_p (unsigned char);
 extern tree default_mangle_assembler_name (const char *);
 
+extern machine_mode default_translate_mode_attribute (machine_mode);
 extern bool default_scalar_mode_supported_p (scalar_mode);
 extern bool default_libgcc_floating_mode_supported_p (scalar_float_mode);
 extern opt_scalar_float_mode default_floatn_mode (int, bool);
Index: gcc/doc/tm.texi.in
===
--- gcc/doc/tm.texi.in  (revision 262159)
+++ gcc/doc/tm.texi.in  (working copy)
@@ -3336,6 +3336,8 @@ stack.
 
 @hook TARGET_REF_MAY_ALIAS_ERRNO
 
+@hook TARGET_TRANSLATE_MODE_ATTRIBUTE
+
 @hook TARGET_SCALAR_MODE_SUPPORTED_P
 
 @hook TARGET_VECTOR_MODE_SUPPORTED_P
Index: gcc/doc/tm.texi
===
--- gcc/doc/tm.texi (revision 262159)
+++ gcc/doc/tm.texi (working copy)
@@ -4255,6 +4255,14 @@ hook returns true for both @code{ptr_mod
 Define this to return nonzero if the memory reference @var{ref}  may alias 
with the system C library errno location.  The default  version of this hook 
assumes the system C library errno location  is either a declaration of type 
int or accessed by dereferencing  a pointer to int.
 @end deftypefn
 
+@deftypefn {Target Hook} machine_mode TARGET_TRANSLATE_MODE_ATTRIBUTE 
(machine_mode @var{mode})
+Define this hook if during mode attribute processing, the port should
+translate machine_mode @var{mode} to another mode.  For example, rs6000's
+@code{KFmode}, when it is the same as @code{TFmode}.
+
+The default version of the hook returns that mode that was passed in.
+@end deftypefn
+
 @deftypefn {Target Hook} bool TARGET_SCALAR_MODE_SUPPORTED_P (scalar_mode 
@var{mode})
 Define this to return nonzero if the port is prepared to handle
 insns involving scalar mode @var{mode}.  For a scalar mode to be
Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 262159)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -1802,6 +1802,9 @@ static const struct attribute_spec rs600
 #undef TARGET_EH_RETURN_FILTER_MODE
 #define 

[PATCH] rs6000: Set up ieee128_float_type_node correctly (PR86285)

2018-07-02 Thread Segher Boessenkool
[ I wonder if I ever sent this patch to the list.  I now backported it
  to 8 branch as well.  Apologies. ]

We shouldn't init __ieee128 to be the same as long double if the
latter is not even a 128-bit type.

This also reorders the nearby __ibm128 code so both types use similar
logic.

Committing to trunk.  This should also go to 8.


Segher


2018-06-26  Segher Boessenkool  

PR target/86285
* config/rs6000/rs6000.c (rs6000_init_builtins): Do not set
ieee128_float_type_node to long_double_type_node unless
TARGET_LONG_DOUBLE_128 is set.

---
 gcc/config/rs6000/rs6000.c | 13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index cb228ad..5b70521 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -16414,21 +16414,24 @@ rs6000_init_builtins (void)
  __ieee128.  */
   if (TARGET_FLOAT128_TYPE)
 {
-  if (TARGET_IEEEQUAD || !TARGET_LONG_DOUBLE_128)
+  if (!TARGET_IEEEQUAD && TARGET_LONG_DOUBLE_128)
+   ibm128_float_type_node = long_double_type_node;
+  else
{
  ibm128_float_type_node = make_node (REAL_TYPE);
  TYPE_PRECISION (ibm128_float_type_node) = 128;
  SET_TYPE_MODE (ibm128_float_type_node, IFmode);
  layout_type (ibm128_float_type_node);
}
-  else
-   ibm128_float_type_node = long_double_type_node;
 
   lang_hooks.types.register_builtin_type (ibm128_float_type_node,
  "__ibm128");
 
-  ieee128_float_type_node
-   = TARGET_IEEEQUAD ? long_double_type_node : float128_type_node;
+  if (TARGET_IEEEQUAD && TARGET_LONG_DOUBLE_128)
+   ieee128_float_type_node = long_double_type_node;
+  else
+   ieee128_float_type_node = float128_type_node;
+
   lang_hooks.types.register_builtin_type (ieee128_float_type_node,
  "__ieee128");
 }
-- 
1.8.3.1



Re: [PATCH][Middle-end]3rd patch of PR78809

2018-07-02 Thread Qing Zhao
Hi, Jeff,

thanks a lot for your review and comments.

I have addressed your comments,updated the patch, retested on both
aarch64 and x86.

The major changes in this version compared to the previous version are:

1. in routine expand_builtin_memcmp:
* move the inlining transformation AFTER the warning is issues for
-Wstringop-overflow;
* only apply inlining when there is No warning is issued.
2. in the testsuite, add a new testcase strcmpopt_6.c for this case.
3. update comments to:
* capitalize the first word.
* capitalize all the arguments.

NOTE, the routine expand_builtin_strcmp and expand_builtin_strncmp are not 
changed.
the reason is:  there is NO overflow checking for these two routines currently.
if we need overflow checking for these two routines, I think that a separate 
patch is needed.
if this is needed, let me know, I can work on this separate patch for issuing 
warning for strcmp/strncmp when
-Wstringop-overflow is specified.

The new patch is as following, please take a look at it.

thanks.

Qing

gcc/ChangeLog

+2018-07-02  Qing Zhao  
+
+   PR middle-end/78809
+   * builtins.c (expand_builtin_memcmp): Inline the calls first
+   when result_eq is false.
+   (expand_builtin_strcmp): Inline the calls first.
+   (expand_builtin_strncmp): Likewise.
+   (inline_string_cmp): New routine. Expand a string compare
+   call by using a sequence of char comparison.
+   (inline_expand_builtin_string_cmp): New routine. Inline expansion
+   a call to str(n)cmp/memcmp.
+   * doc/invoke.texi (--param builtin-string-cmp-inline-length): New 
option.
+   * params.def (BUILTIN_STRING_CMP_INLINE_LENGTH): New.
+

gcc/testsuite/ChangeLog

+2018-07-02  Qing Zhao  
+
+   PR middle-end/78809
+   * gcc.dg/strcmpopt_5.c: New test.
+   * gcc.dg/strcmpopt_6.c: New test.
+



0001-3nd-Patch-for-PR78009.patch
Description: Binary data


> But leave them as lower case in code fragments like this.
> 
> 
> So this generally looks pretty good.  THe biggest technical concern is
> making sure we're doing the right thing WRT issuing warnings.  You can
> tackle that problem by deferring inlining to a later point after
> warnings have been issued or by verifying that your routines do not
> inline in cases where warnings will be issued.  It may be worth adding
> testcases for these issues.
> 
> There's a large number of comments that need capitalization fixes.
> 
> Given there was no measured runtime performance impact, but slight
> improvements on codesize for values <= 3, let's go ahead with that as
> the default.
> 
> Can you address the issues above and repost for final review?
> 
> Thanks,
> jeff



[PATCH] P0887R1 The identity metafunction

2018-07-02 Thread Jonathan Wakely

A recently-approved C++2a feature, trivial to implement.

* include/std/type_traits (type_identity, type_identity_t): Define
   for C++2a.
* testsuite/20_util/type_identity/requirements/alias_decl.cc: New.
* testsuite/20_util/type_identity/requirements/
explicit_instantiation.cc:New.
* testsuite/20_util/type_identity/requirements/typedefs.cc: New.

Tested powerpc64le-linux, committed to trunk.

commit 54f6b78dc937cbaee50522bf9cd18ac4e0855d39
Author: Jonathan Wakely 
Date:   Mon Jul 2 21:32:24 2018 +0100

P0887R1 The identity metafunction

* include/std/type_traits (type_identity, type_identity_t): Define
for C++2a.
* testsuite/20_util/type_identity/requirements/alias_decl.cc: New.
* testsuite/20_util/type_identity/requirements/
explicit_instantiation.cc:New.
* testsuite/20_util/type_identity/requirements/typedefs.cc: New.

diff --git a/libstdc++-v3/include/std/type_traits 
b/libstdc++-v3/include/std/type_traits
index 0c7e97286ca..b2d3380f024 100644
--- a/libstdc++-v3/include/std/type_traits
+++ b/libstdc++-v3/include/std/type_traits
@@ -2996,6 +2996,13 @@ template 
   template
 using remove_cvref_t = __remove_cvref_t<_Tp>;
 
+  /// Identity metafunction.
+  template
+struct type_identity { using type = _Tp; };
+
+  template
+using type_identity_t = typename type_identity<_Tp>::type;
+
 #endif // C++2a
 
 _GLIBCXX_END_NAMESPACE_VERSION
diff --git 
a/libstdc++-v3/testsuite/20_util/type_identity/requirements/alias_decl.cc 
b/libstdc++-v3/testsuite/20_util/type_identity/requirements/alias_decl.cc
new file mode 100644
index 000..6729d118273
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/type_identity/requirements/alias_decl.cc
@@ -0,0 +1,35 @@
+// Copyright (C) 2018 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+//
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+//
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// { dg-options "-std=gnu++2a" }
+// { dg-do compile { target c++2a } }
+//
+#include 
+
+using namespace std;
+
+template>
+  struct test; // undefined
+
+template
+  struct test : std::true_type { };
+
+static_assert( test{}, "type_identity_t" );
+static_assert( test{}, "type_identity_t" );
+static_assert( test{}, "type_identity_t" );
+static_assert( test{}, "type_identity_t" );
+static_assert( test{}, "type_identity_t" );
diff --git 
a/libstdc++-v3/testsuite/20_util/type_identity/requirements/explicit_instantiation.cc
 
b/libstdc++-v3/testsuite/20_util/type_identity/requirements/explicit_instantiation.cc
new file mode 100644
index 000..07387e719ce
--- /dev/null
+++ 
b/libstdc++-v3/testsuite/20_util/type_identity/requirements/explicit_instantiation.cc
@@ -0,0 +1,29 @@
+// Copyright (C) 2018 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// { dg-options "-std=gnu++2a" }
+// { dg-do compile { target c++2a } }
+
+// NB: This file is for testing type_traits with NO OTHER INCLUDES.
+
+#include 
+
+namespace std
+{
+  typedef short test_type;
+  template struct type_identity;
+}
diff --git 
a/libstdc++-v3/testsuite/20_util/type_identity/requirements/typedefs.cc 
b/libstdc++-v3/testsuite/20_util/type_identity/requirements/typedefs.cc
new file mode 100644
index 000..77f0cba9b4c
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/type_identity/requirements/typedefs.cc
@@ -0,0 +1,94 @@
+// Copyright (C) 2018 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software 

[PATCH 2/2] optinfo: add diagnostic remarks

2018-07-02 Thread David Malcolm
This patch adds the first destination for optinfo instances to be
emitted to: as "remarks" through the diagnostics subsystem.

Examples can be seen at
  https://dmalcolm.fedorapeople.org/gcc/2018-06-18/test.cc.remarks.html

Remarks look a lot like the output of -fopt-info, with the following
differences:

* announcing "In function 'blah':" etc when the function changes.

* printing the corresponding source lines (unless
  "-fno-diagnostics-show-caret"), as per other diagnostics, and

* colorizing the various parts of the message if stderr is at a tty.

* showing extra metadata:
  * the pass that emitted the remark,
  * the execution count of the code in question
  * which file/line/function of GCC emitted the remark

* possibly allowing for remarks to be used in DejaGnu tests to better
  associate testing of an optimization with the source line in
  question, rather than relying on a scan of the dumpfile (which is
  per-source file and thus rather "coarse-grained").*

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.

(see the notes in the cover letter about the state of this patch;
posting for motivation of the optinfo stuff).

gcc/ChangeLog:
* Makefile.in (OBJS): Add optinfo-emit-diagnostics.o.
* common.opt (fremarks): New option.
(fdiagnostics-show-remark-hotness): New option.
(fdiagnostics-show-remark-origin): New option.
(fdiagnostics-show-remark-pass): New option.
* diagnostic-color.c (color_dict): Add "remark", as bold green.
* diagnostic-core.h (remark): New decl.
* diagnostic.c (diagnostic_action_after_output): Handle DK_REMARK.
(remark): New function.
* diagnostic.def (DK_REMARK): New diagnostic kind.
* doc/invoke.texi (Remarks): New section.
(-fremarks): New option.
(-fno-diagnostics-show-remark-hotness): New option.
(-fno-diagnostics-show-remark-origin): New option.
(-fno-diagnostics-show-remark-pass): New option.
* dumpfile.c (dump_context::get_scope_depth): Update comment.
(dump_context::end_any_optinfo): Likewise.
* dumpfile.h: Update comment.
* opt-functions.awk (function): Handle "Remark" by adding
CL_REMARK.
* optinfo-emit-diagnostics.cc: New file.
* optinfo-emit-diagnostics.h: New file.
* optinfo.cc: Include "optinfo-emit-diagnostics.h".
(optinfo::emit): Call emit_optinfo_as_diagnostic_remark.
(optinfo_enabled_p): Use flag_remarks.
* optinfo.h: Update comment.
* opts.c (print_specific_help): Handle CL_REMARK.
(common_handle_option): Likewise.
* opts.h (CL_REMARK): New macro.
(CL_MAX_OPTION_CLASS): Update for CL_REMARK.
(CL_JOINED, CL_SEPARATE, CL_UNDOCUMENTED, CL_NO_DWARF_RECORD)
(CL_PCH_IGNORE): Likewise.
* profile-count.c (profile_quality_as_string): New function.
* profile-count.h (profile_quality_as_string): New decl.
(profile_count::quality): New accessor.
* selftest-run-tests.c (selftest::run_tests): Call
selftest::optinfo_emit_diagnostics_cc_tests.
* selftest.h (selftest::optinfo_emit_diagnostics_cc_tests): New
decl.

gcc/fortran/ChangeLog:
* gfc-diagnostic.def (DK_REMARK): New diagnostic kind.

gcc/testsuite/ChangeLog:
* gcc.dg/plugin/plugin.exp (plugin_test_list): Add
remarks_plugin.c.
* gcc.dg/plugin/remarks-1.c: New test.
* gcc.dg/plugin/remarks_plugin.c: New test plugin.
* lib/gcc-dg.exp (dg-remark): New function.
---
 gcc/Makefile.in  |   1 +
 gcc/common.opt   |  17 ++
 gcc/diagnostic-color.c   |   2 +
 gcc/diagnostic-core.h|   2 +
 gcc/diagnostic.c |  17 ++
 gcc/diagnostic.def   |   1 +
 gcc/doc/invoke.texi  |  67 ++
 gcc/dumpfile.c   |   5 +-
 gcc/dumpfile.h   |   2 +
 gcc/fortran/gfc-diagnostic.def   |   1 +
 gcc/opt-functions.awk|   1 +
 gcc/optinfo-emit-diagnostics.cc  | 317 +++
 gcc/optinfo-emit-diagnostics.h   |  26 +++
 gcc/optinfo.cc   |   9 +-
 gcc/optinfo.h|   4 +-
 gcc/opts.c   |   4 +
 gcc/opts.h   |  13 +-
 gcc/profile-count.c  |  28 +++
 gcc/profile-count.h  |   5 +
 gcc/selftest-run-tests.c |   1 +
 gcc/selftest.h   |   1 +
 gcc/testsuite/gcc.dg/plugin/plugin.exp   |   2 +
 gcc/testsuite/gcc.dg/plugin/remarks-1.c  |  30 +++
 gcc/testsuite/gcc.dg/plugin/remarks_plugin.c | 152 +
 gcc/testsuite/lib/gcc-dg.exp |   9 +
 25 files 

[PATCH 1/2] Add "optinfo" framework

2018-07-02 Thread David Malcolm
This patch implements a way to consolidate dump_* calls into
optinfo objects, as enabling work towards being able to write out
optimization records to a file, or emit them as diagnostic "remarks".

The patch adds the support for building optinfo instances from dump_*
calls, but leaves implementing any *users* of them to followup patches.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.

OK for trunk?

gcc/ChangeLog:
* Makefile.in (OBJS): Add optinfo.o.
* coretypes.h (class symtab_node): New forward decl.
(struct cgraph_node): New forward decl.
(class varpool_node): New forward decl.
* dump-context.h: New file.
* dumpfile.c: Include "optinfo.h", "dump-context.h", "cgraph.h",
"tree-pass.h", "optinfo-internal.h".
(refresh_dumps_are_enabled): Use optinfo_enabled_p.
(set_dump_file): Call dumpfile_ensure_any_optinfo_are_flushed.
(set_alt_dump_file): Likewise.
(dump_context::~dump_context): New dtor.
(dump_gimple_stmt): Move implementation to...
(dump_context::dump_gimple_stmt): ...this new member function.
Add the stmt to any pending optinfo, creating one if need be.
(dump_gimple_stmt_loc): Move implementation to...
(dump_context::dump_gimple_stmt_loc): ...this new member function.
Convert param "loc" from location_t to const dump_location_t &.
Start a new optinfo and add the stmt to it.
(dump_generic_expr): Move implementation to...
(dump_context::dump_generic_expr): ...this new member function.
Add the tree to any pending optinfo, creating one if need be.
(dump_generic_expr_loc): Move implementation to...
(dump_context::dump_generic_expr_loc): ...this new member
function.  Add the tree to any pending optinfo, creating one if
need be.
(dump_printf): Move implementation to...
(dump_context::dump_printf_va): ...this new member function.  Add
the text to any pending optinfo, creating one if need be.
(dump_printf_loc): Move implementation to...
(dump_context::dump_printf_loc_va): ...this new member function.
Convert param "loc" from location_t to const dump_location_t &.
Start a new optinfo and add the stmt to it.
(dump_dec): Move implementation to...
(dump_context::dump_dec): ...this new member function.  Add the
value to any pending optinfo, creating one if need be.
(dump_context::dump_symtab_node): New member function.
(dump_context::get_scope_depth): New member function.
(dump_context::begin_scope): New member function.
(dump_context::end_scope): New member function.
(dump_context::ensure_pending_optinfo): New member function.
(dump_context::begin_next_optinfo): New member function.
(dump_context::end_any_optinfo): New member function.
(dump_context::s_current): New global.
(dump_context::s_default): New global.
(dump_scope_depth): Delete global.
(dumpfile_ensure_any_optinfo_are_flushed): New function.
(dump_symtab_node): New function.
(get_dump_scope_depth): Reimplement in terms of dump_context.
(dump_begin_scope): Likewise.
(dump_end_scope): Likewise.
(selftest::temp_dump_context::temp_dump_context): New ctor.
(selftest::temp_dump_context::~temp_dump_context): New dtor.
(selftest::assert_is_text): New support function.
(selftest::assert_is_tree): New support function.
(selftest::assert_is_gimple): New support function.
(selftest::test_capture_of_dump_calls): New test.
(selftest::dumpfile_c_tests): Call it.
* dumpfile.h (dump_printf, dump_printf_loc, dump_basic_block,
dump_generic_expr_loc, dump_generic_expr, dump_gimple_stmt_loc,
dump_gimple_stmt, dump_dec): Gather these related decls and add a
descriptive comment.
(dump_function, print_combine_total_stats, enable_rtl_dump_file,
dump_node, dump_bb): Move these unrelated decls.
(class dump_manager): Add leading comment.
* ggc-page.c (ggc_collect): Call
dumpfile_ensure_any_optinfo_are_flushed.
* optinfo-internal.h: New file.
* optinfo.cc: New file.
* optinfo.h: New file.
* selftest-run-tests.c (selftest::run_tests): Call
selftest::optinfo_cc_tests.
* selftest.h (selftest::optinfo_cc_tests): New decl.
---
 gcc/Makefile.in  |   1 +
 gcc/coretypes.h  |   7 +
 gcc/dump-context.h   | 128 
 gcc/dumpfile.c   | 498 +++
 gcc/dumpfile.h   |  82 +---
 gcc/ggc-page.c   |   2 +
 gcc/optinfo-internal.h   | 148 ++
 gcc/optinfo.cc   | 251 
 gcc/optinfo.h| 175 +
 gcc/selftest-run-tests.c |   1 +
 gcc/selftest.h   |   

[PATCH 0/2] v4: optinfo framework and remarks

2018-07-02 Thread David Malcolm
On Fri, 2018-06-29 at 09:09 +0200, Richard Biener wrote:
> On Thu, Jun 28, 2018 at 4:29 PM David Malcolm 
> wrote:
> > 
> > On Thu, 2018-06-28 at 13:29 +0200, Richard Biener wrote:
> > > On Tue, Jun 26, 2018 at 3:54 PM David Malcolm  > > m>
> > > wrote:
> > > > 
> > > > On Mon, 2018-06-25 at 15:34 +0200, Richard Biener wrote:
> > > > > On Wed, Jun 20, 2018 at 6:34 PM David Malcolm  > > > > t.co
> > > > > m>
> > > > > wrote:
> > > > > > 
> > > > > > Here's v3 of the patch (one big patch this time, rather
> > > > > > than a
> > > > > > kit).
> > > > > > 
> > > > > > Like the v2 patch kit, this patch reuses the existing dump
> > > > > > API,
> > > > > > rather than inventing its own.
> > > > > > 
> > > > > > Specifically, it uses the dump_* functions in dumpfile.h
> > > > > > that
> > > > > > don't
> > > > > > take a FILE *, the ones that implicitly write to dump_file
> > > > > > and/or
> > > > > > alt_dump_file.  I needed a name for them, so I've taken to
> > > > > > calling
> > > > > > them the "structured dump API" (better name ideas welcome).
> > > > > > 
> > > > > > v3 eliminates v2's optinfo_guard class, instead using
> > > > > > "dump_*_loc"
> > > > > > calls as delimiters when consolidating "dump_*"
> > > > > > calls.  There's
> > > > > > a
> > > > > > new dump_context class which has responsibility for
> > > > > > consolidating
> > > > > > them into optimization records.
> > > > > > 
> > > > > > The dump_*_loc calls now capture more than just a
> > > > > > location_t:
> > > > > > they
> > > > > > capture the profile_count and the location in GCC's own
> > > > > > sources
> > > > > > where
> > > > > > the dump is being emitted from.
> > > > > > 
> > > > > > This works by introducing a new "dump_location_t" class as
> > > > > > the
> > > > > > argument of those dump_*_loc calls.  The dump_location_t
> > > > > > can
> > > > > > be constructed from a gimple * or from an rtx_insn *, so
> > > > > > that
> > > > > > rather than writing:
> > > > > > 
> > > > > >   dump_printf_loc (MSG_NOTE, gimple_location (stmt),
> > > > > >"some message: %i", 42);
> > > > > > 
> > > > > > you can write:
> > > > > > 
> > > > > >   dump_printf_loc (MSG_NOTE, stmt,
> > > > > >"some message: %i", 42);
> > > > > > 
> > > > > > and the dump_location_t constructor will grab the
> > > > > > location_t
> > > > > > and
> > > > > > profile_count of stmt, and the location of the
> > > > > > "dump_printf_loc"
> > > > > > callsite (and gracefully handle "stmt" being NULL).
> > > > > > 
> > > > > > Earlier versions of the patch captured the location of the
> > > > > > dump_*_loc call via preprocessor hacks, or didn't work
> > > > > > properly;
> > > > > > this version of the patch works more cleanly: internally,
> > > > > > dump_location_t is split into two new classes:
> > > > > >   * dump_user_location_t: the location_t and profile_count
> > > > > > within
> > > > > > the *user's code*, and
> > > > > >   * dump_impl_location_t: the __builtin_FILE/LINE/FUNCTION
> > > > > > within
> > > > > > the *implementation* code (i.e. GCC or a plugin),
> > > > > > captured
> > > > > > "automagically" via default params
> > > > > > 
> > > > > > These classes are sometimes used elsewhere in the
> > > > > > code.  For
> > > > > > example, "vect_location" becomes a dump_user_location_t
> > > > > > (location_t and profile_count), so that in e.g:
> > > > > > 
> > > > > >   vect_location = find_loop_location (loop);
> > > > > > 
> > > > > > it's capturing the location_t and profile_count, and then
> > > > > > when
> > > > > > it's used here:
> > > > > > 
> > > > > >   dump_printf_loc (MSG_NOTE, vect_location, "foo");
> > > > > > 
> > > > > > the dump_location_t is constructed from the vect_location
> > > > > > plus the dump_impl_location_t at that callsite.
> > > > > > 
> > > > > > In contrast, loop-unroll.c's report_unroll's "locus" param
> > > > > > becomes a dump_location_t: we're interested in where it was
> > > > > > called from, not in the locations of the various dump_*_loc
> > > > > > calls
> > > > > > within it.
> > > > > > 
> > > > > > Previous versions of the patch captured a gimple *, and
> > > > > > needed
> > > > > > GTY markers; in this patch, the dump_user_location_t is now
> > > > > > just a
> > > > > > location_t and a profile_count.
> > > > > > 
> > > > > > The v2 patch added an overload for dump_printf_loc so that
> > > > > > you
> > > > > > could pass in either a location_t, or the new type; this
> > > > > > version
> > > > > > of the patch eliminates that: they all now take
> > > > > > dump_location_t.
> > > > > > 
> > > > > > Doing so required adding support for rtx_insn *, so that
> > > > > > one
> > > > > > can
> > > > > > write this kind of thing in RTL passes:
> > > > > > 
> > > > > >   dump_printf_loc (MSG_NOTE, insn, "foo");
> > > > > > 
> > > > > > One knock-on effect is that get_loop_location now returns a
> > > > > > dump_user_location_t rather than a location_t, so that 

[version 2] Re: [PATCH][Middle-end]3rd patch of PR78809

2018-07-02 Thread Qing Zhao
Hi, Jeff,

thanks a lot for your review and comments.

I have addressed your comments,updated the patch, retested on both
aarch64 and x86.

The major changes in this version compared to the previous version are:

1. in routine “expand_builtin_memcmp”:
* move the inlining transformation AFTER the warning is issues for
-Wstringop-overflow;
* only apply inlining when there is No warning is issued.
2. in the testsuite, add a new testcase strcmpopt_6.c for this case.
3. update comments to:
* capitalize the first word.
* capitalize all the arguments.

NOTE, the routine ”expand_builtin_strcmp” and “expand_builtin_strncmp" are not 
changed.
the reason is:  there is NO overflow checking for these two routines currently.
if we need overflow checking for these two routines, I think that a separate 
patch is needed.
if this is needed, let me know, I can work on this separate patch for issuing 
warning for strcmp/strncmp when
-Wstringop-overflow is specified.

The new patch is as following, please take a look at it.

thanks.

Qing

gcc/ChangeLog

+2018-07-02  Qing Zhao  
+
+   PR middle-end/78809
+   * builtins.c (expand_builtin_memcmp): Inline the calls first
+   when result_eq is false.
+   (expand_builtin_strcmp): Inline the calls first.
+   (expand_builtin_strncmp): Likewise.
+   (inline_string_cmp): New routine. Expand a string compare
+   call by using a sequence of char comparison.
+   (inline_expand_builtin_string_cmp): New routine. Inline expansion
+   a call to str(n)cmp/memcmp.
+   * doc/invoke.texi (--param builtin-string-cmp-inline-length): New 
option.
+   * params.def (BUILTIN_STRING_CMP_INLINE_LENGTH): New.
+

gcc/testsuite/ChangeLog

+2018-07-02  Qing Zhao  
+
+   PR middle-end/78809
+   * gcc.dg/strcmpopt_5.c: New test.
+   * gcc.dg/strcmpopt_6.c: New test.
+



0001-3nd-Patch-for-PR78009.patch
Description: Binary data


> On Jun 28, 2018, at 12:10 AM, Jeff Law  wrote:
> So I still need to dig into this patch.  But I wanted to raise an
> potential issue and get yours and Martin's thoughts on it.
> 
> Martin (and others) have been working hard to improve GCC's ability to
> give good diagnostics for various problems with calls to mem* and str*
> functions (buffer overflows, restrict issues, etc).
> 
> One of the problems Martin has identified is early conversion of these
> calls into inlined direct operations.  If those conversions happen prior
> to the analysis for warnings we obviously can't issue any relevant warnings.
> 
> Please capitalize the first word in sentences like this.  This nit
> appears in most of your comments.
> 
> 
> So I believe you do inline expansion here prior to the checks for
> Wstringop_overflow.  I think you can safely move this code to after the
> warn_stringop_overflow checks.  Though you may want to make this code
> conditional on both calls to check_access returning true and avoiding
> your transformation if either or both calls return false.
> 
> Alternately you'd need to verify that inline_expand_builtin_string_cmp
> always returns false for cases which are going to generate a warning.
> But that seems a bit tougher to maintain over time if we were to add
> more warnings to this code.
> 
> When referring to arguments in comments, please capitalize them.  ie
> VAR_STR, CONST_STR, etc.
> 
> 
> But leave them as lower case in code fragments like this.
> 
> 
> So this generally looks pretty good.  THe biggest technical concern is
> making sure we're doing the right thing WRT issuing warnings.  You can
> tackle that problem by deferring inlining to a later point after
> warnings have been issued or by verifying that your routines do not
> inline in cases where warnings will be issued.  It may be worth adding
> testcases for these issues.
> 
> There's a large number of comments that need capitalization fixes.
> 
> Given there was no measured runtime performance impact, but slight
> improvements on codesize for values <= 3, let's go ahead with that as
> the default.
> 
> Can you address the issues above and repost for final review?
> 
> Thanks,
> jeff



[PATCH] Optimize std::sub_match comparisons using string_view-like type

2018-07-02 Thread Jonathan Wakely

Avoid creation of unnecessary basic_string objects by using a simplified
string_view type and performing comparisons on that type instead. A
temporary basic_string object is still used when the sub_match's
iterators are not contiguous, in order to get an object that the
__string_view can reference.

* include/bits/regex.h (sub_match::operator string_type): Call str().
(sub_match::compare): Use _M_str() instead of str().
(sub_match::_M_compare): New public function.
(sub_match::__string_view): New helper type.
(sub_match::_M_str): New overloaded functions to avoid creating a
string_type object when not needed.
(operator==, operator!=, operator<, operator>, operator<=, operator>=):
Use sub_match::_M_compare instead of creating string_type objects.
Fix Doxygen comments.
* include/bits/regex_compiler.h (__has_contiguous_iter): Remove.
(__is_contiguous_normal_iter): Rename to __is_contiguous_iter and
simplify.
(__enable_if_contiguous_iter, __disable_if_contiguous_iter): Use
__enable_if_t.
* include/std/type_traits (__enable_if_t): Define for C++11.
* testsuite/28_regex/sub_match/compare.cc: New.
* testsuite/util/testsuite_iterators.h (remove_cv): Add transformation
trait.
(input_iterator_wrapper): Use remove_cv for value_type argument of
std::iterator base class.

We could avoid temporary string_type objects even for non-contiguous
iterators by writing a manual "compare" function that uses
char_traits::lt on every element in the iterator ranges. It's not
obvious that would actually be an optimization (and I haven't measured
it). For sub-expression matches that fit in an SSO buffer creating a
string doesn't allocate, and once we have a std::string or
std::wstring we benefit from highly optimized strncmp or wcsncmp
implementations in glibc. Maybe we could use an adaptive approach
where we create a string when value_type is char or wchar_t, and
_BiIter is random-access, and std:distance(first, second) <= SSO size.
Otherwise we'd use a manual loop using char_traits::lt. I don't plan
to work on that, the common cases are all likely to be using pointers
or basic_string::const_iterator and so already optimized by this
patch.

Tested powerpc64le-linux, committed to trunk.


commit b6d43fcbdd4987c3bcb48960dc05a1145deca747
Author: Jonathan Wakely 
Date:   Mon Jul 2 18:34:09 2018 +0100

Optimize std::sub_match comparisons using string_view-like type

Avoid creation of unnecessary basic_string objects by using a simplified
string_view type and performing comparisons on that type instead. A
temporary basic_string object is still used when the sub_match's
iterators are not contiguous, in order to get an object that the
__string_view can reference.

* include/bits/regex.h (sub_match::operator string_type): Call 
str().
(sub_match::compare): Use _M_str() instead of str().
(sub_match::_M_compare): New public function.
(sub_match::__string_view): New helper type.
(sub_match::_M_str): New overloaded functions to avoid creating a
string_type object when not needed.
(operator==, operator!=, operator<, operator>, operator<=, 
operator>=):
Use sub_match::_M_compare instead of creating string_type objects.
Fix Doxygen comments.
* include/bits/regex_compiler.h (__has_contiguous_iter): Remove.
(__is_contiguous_normal_iter): Rename to __is_contiguous_iter and
simplify.
(__enable_if_contiguous_iter, __disable_if_contiguous_iter): Use
__enable_if_t.
* include/std/type_traits (__enable_if_t): Define for C++11.
* testsuite/28_regex/sub_match/compare.cc: New.
* testsuite/util/testsuite_iterators.h (remove_cv): Add 
transformation
trait.
(input_iterator_wrapper): Use remove_cv for value_type argument of
std::iterator base class.

diff --git a/libstdc++-v3/include/bits/regex.h 
b/libstdc++-v3/include/bits/regex.h
index 6b6501e98ae..af6fe3f0d79 100644
--- a/libstdc++-v3/include/bits/regex.h
+++ b/libstdc++-v3/include/bits/regex.h
@@ -42,8 +42,7 @@ _GLIBCXX_END_NAMESPACE_CXX11
 
 namespace __detail
 {
-  enum class _RegexExecutorPolicy : int
-{ _S_auto, _S_alternate };
+  enum class _RegexExecutorPolicy : int { _S_auto, _S_alternate };
 
   templatematched ? std::distance(this->first, this->second) : 0; }
@@ -893,11 +890,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
* from the unwary.
*/
   operator string_type() const
-  {
-   return this->matched
- ? string_type(this->first, this->second)
- : string_type();
-  }
+  { return str(); }
 
   /**
* @brief Gets the matching sequence as a string.
@@ -923,9 +916,10 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
*/
   int

[committed] selftest: introduce class auto_fix_quotes

2018-07-02 Thread David Malcolm
This patch moves a workaround for locale differences from a
selftest in pretty-print.c to selftest.h/c to make it reusable; I need
this for a selftest in a followup patch.

Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.

Committed to trunk as r262317 (with my "diagnostic messages" maintainer
hat on)

gcc/ChangeLog:
* pretty-print.c (selftest::test_pp_format): Move save and restore
of quotes to class auto_fix_quotes, and add an instance.
* selftest.c: Include "intl.h".
(selftest::auto_fix_quotes::auto_fix_quotes): New ctor.
(selftest::auto_fix_quotes::~auto_fix_quotes): New dtor.
* selftest.h (selftest::auto_fix_quotes): New class.
---
 gcc/pretty-print.c |  9 +
 gcc/selftest.c | 20 
 gcc/selftest.h | 20 
 3 files changed, 41 insertions(+), 8 deletions(-)

diff --git a/gcc/pretty-print.c b/gcc/pretty-print.c
index 8babbffd..df3ee81 100644
--- a/gcc/pretty-print.c
+++ b/gcc/pretty-print.c
@@ -2102,10 +2102,7 @@ test_pp_format ()
 {
   /* Avoid introducing locale-specific differences in the results
  by hardcoding open_quote and close_quote.  */
-  const char *old_open_quote = open_quote;
-  const char *old_close_quote = close_quote;
-  open_quote = "`";
-  close_quote = "'";
+  auto_fix_quotes fix_quotes;
 
   /* Verify that plain text is passed through unchanged.  */
   assert_pp_format (SELFTEST_LOCATION, "unformatted", "unformatted");
@@ -2187,10 +2184,6 @@ test_pp_format ()
   assert_pp_format (SELFTEST_LOCATION, "item 3 of 7", "item %i of %i", 3, 7);
   assert_pp_format (SELFTEST_LOCATION, "problem with `bar' at line 10",
"problem with %qs at line %i", "bar", 10);
-
-  /* Restore old values of open_quote and close_quote.  */
-  open_quote = old_open_quote;
-  close_quote = old_close_quote;
 }
 
 /* Run all of the selftests within this file.  */
diff --git a/gcc/selftest.c b/gcc/selftest.c
index 27de9a4..dc90557 100644
--- a/gcc/selftest.c
+++ b/gcc/selftest.c
@@ -21,6 +21,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "system.h"
 #include "coretypes.h"
 #include "selftest.h"
+#include "intl.h"
 
 #if CHECKING_P
 
@@ -192,6 +193,25 @@ temp_source_file::temp_source_file (const location ,
   fclose (out);
 }
 
+/* Avoid introducing locale-specific differences in the results
+   by hardcoding open_quote and close_quote.  */
+
+auto_fix_quotes::auto_fix_quotes ()
+{
+  m_saved_open_quote = open_quote;
+  m_saved_close_quote = close_quote;
+  open_quote = "`";
+  close_quote = "'";
+}
+
+/* Restore old values of open_quote and close_quote.  */
+
+auto_fix_quotes::~auto_fix_quotes ()
+{
+  open_quote = m_saved_open_quote;
+  close_quote = m_saved_close_quote;
+}
+
 /* Read the contents of PATH into memory, returning a 0-terminated buffer
that must be freed by the caller.
Fail (and abort) if there are any problems, with LOC as the reported
diff --git a/gcc/selftest.h b/gcc/selftest.h
index d66fb93..54fc488 100644
--- a/gcc/selftest.h
+++ b/gcc/selftest.h
@@ -113,6 +113,26 @@ class temp_source_file : public named_temp_file
const char *content);
 };
 
+/* RAII-style class for avoiding introducing locale-specific differences
+   in strings containing localized quote marks, by temporarily overriding
+   the "open_quote" and "close_quote" globals to something hardcoded.
+
+   Specifically, the C locale's values are used:
+   - open_quote becomes "`"
+   - close_quote becomes "'"
+   for the lifetime of the object.  */
+
+class auto_fix_quotes
+{
+ public:
+  auto_fix_quotes ();
+  ~auto_fix_quotes ();
+
+ private:
+  const char *m_saved_open_quote;
+  const char *m_saved_close_quote;
+};
+
 /* Various selftests involving location-handling require constructing a
line table and one or more line maps within it.
 
-- 
1.8.5.3



Re: [C++ Patch] Adjust one more error message to use rich_location::add_range

2018-07-02 Thread David Malcolm
On Mon, 2018-07-02 at 12:58 +0200, Paolo Carlini wrote:
> Hi,
> 
> I was double checking my pending patch and going through the errors
> we 
> emit in decl.c and elsewhere about thread_local and __thread and
> noticed 
> another place, in parser.c, where using rich_location::add_range
> seems 
> natural. Note, we could in principle swap location and 
> decl_specs->locations[ds_thread] in the error basing on the gnu bool
> and 
> ensure that the caret always points to __thread. All in all, I don't 
> think it's worth it...
> 
> Thanks, Paolo.

The patch looks good to me (with my "diagnostic messages" maintainer
hat on)

Thanks
Dave


Re: [C++ Patch] Adjust one more error message to use rich_location::add_range

2018-07-02 Thread Jason Merrill
OK.

On Mon, Jul 2, 2018 at 6:58 AM, Paolo Carlini  wrote:
> Hi,
>
> I was double checking my pending patch and going through the errors we emit
> in decl.c and elsewhere about thread_local and __thread and noticed another
> place, in parser.c, where using rich_location::add_range seems natural.
> Note, we could in principle swap location and
> decl_specs->locations[ds_thread] in the error basing on the gnu bool and
> ensure that the caret always points to __thread. All in all, I don't think
> it's worth it...
>
> Thanks, Paolo.
>
> ///
>


Re: [PATCH] -fopt-info: add indentation via DUMP_VECT_SCOPE

2018-07-02 Thread Christophe Lyon
On Mon, 2 Jul 2018 at 19:00, David Malcolm  wrote:
>
> On Mon, 2018-07-02 at 14:23 +0200, Christophe Lyon wrote:
> > On Fri, 29 Jun 2018 at 10:09, Richard Biener  > com> wrote:
> > >
> > > On Tue, Jun 26, 2018 at 5:43 PM David Malcolm 
> > > wrote:
> > > >
> > > > This patch adds a concept of nested "scopes" to dumpfile.c's
> > > > dump_*_loc
> > > > calls, and wires it up to the DUMP_VECT_SCOPE macro in tree-
> > > > vectorizer.h,
> > > > so that the nested structure is shown in -fopt-info by
> > > > indentation.
> > > >
> > > > For example, this converts -fopt-info-all e.g. from:
> > > >
> > > > test.c:8:3: note: === analyzing loop ===
> > > > test.c:8:3: note: === analyze_loop_nest ===
> > > > test.c:8:3: note: === vect_analyze_loop_form ===
> > > > test.c:8:3: note: === get_loop_niters ===
> > > > test.c:8:3: note: symbolic number of iterations is (unsigned int)
> > > > n_9(D)
> > > > test.c:8:3: note: not vectorized: loop contains function calls or
> > > > data references that cannot be analyzed
> > > > test.c:8:3: note: vectorized 0 loops in function
> > > >
> > > > to:
> > > >
> > > > test.c:8:3: note: === analyzing loop ===
> > > > test.c:8:3: note:  === analyze_loop_nest ===
> > > > test.c:8:3: note:   === vect_analyze_loop_form ===
> > > > test.c:8:3: note:=== get_loop_niters ===
> > > > test.c:8:3: note:   symbolic number of iterations is (unsigned
> > > > int) n_9(D)
> > > > test.c:8:3: note:   not vectorized: loop contains function calls
> > > > or data references that cannot be analyzed
> > > > test.c:8:3: note: vectorized 0 loops in function
> > > >
> > > > showing that the "symbolic number of iterations" message is
> > > > within
> > > > the "=== analyze_loop_nest ===" (and not within the
> > > > "=== vect_analyze_loop_form ===").
> > > >
> > > > This is also enabling work for followups involving optimization
> > > > records
> > > > (allowing the records to directly capture the nested structure of
> > > > the
> > > > dump messages).
> > > >
> > > > Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
> > > >
> > > > OK for trunk?
> >
> > Hi,
> >
> > I've noticed that this patch (r262246) caused regressions on aarch64:
> > gcc.dg/vect/slp-perm-1.c -flto -ffat-lto-objects  scan-tree-dump
> > vect "note: Built SLP cancelled: can use load/store-lanes"
> > gcc.dg/vect/slp-perm-1.c scan-tree-dump vect "note: Built SLP
> > cancelled: can use load/store-lanes"
> > gcc.dg/vect/slp-perm-2.c -flto -ffat-lto-objects  scan-tree-dump
> > vect "note: Built SLP cancelled: can use load/store-lanes"
> > gcc.dg/vect/slp-perm-2.c scan-tree-dump vect "note: Built SLP
> > cancelled: can use load/store-lanes"
> > gcc.dg/vect/slp-perm-3.c -flto -ffat-lto-objects  scan-tree-dump
> > vect "note: Built SLP cancelled: can use load/store-lanes"
> > gcc.dg/vect/slp-perm-3.c scan-tree-dump vect "note: Built SLP
> > cancelled: can use load/store-lanes"
> > gcc.dg/vect/slp-perm-5.c -flto -ffat-lto-objects  scan-tree-dump
> > vect "note: Built SLP cancelled: can use load/store-lanes"
> > gcc.dg/vect/slp-perm-5.c scan-tree-dump vect "note: Built SLP
> > cancelled: can use load/store-lanes"
> > gcc.dg/vect/slp-perm-6.c -flto -ffat-lto-objects  scan-tree-dump
> > vect "note: Built SLP cancelled: can use load/store-lanes"
> > gcc.dg/vect/slp-perm-6.c scan-tree-dump vect "note: Built SLP
> > cancelled: can use load/store-lanes"
> > gcc.dg/vect/slp-perm-7.c -flto -ffat-lto-objects  scan-tree-dump
> > vect "note: Built SLP cancelled: can use load/store-lanes"
> > gcc.dg/vect/slp-perm-7.c scan-tree-dump vect "note: Built SLP
> > cancelled: can use load/store-lanes"
> > gcc.dg/vect/slp-perm-8.c -flto -ffat-lto-objects  scan-tree-dump
> > vect "note: Built SLP cancelled: can use load/store-lanes"
> > gcc.dg/vect/slp-perm-8.c scan-tree-dump vect "note: Built SLP
> > cancelled: can use load/store-lanes"
> >
> > The problem is that now there are more spaces between "note:" and
> > "Built", the attached small patch does that for slp-perm-1.c.
>
> Sorry about the breakage.
>
> > Is it the right way of fixing it or do we want to accept any amount
> > of
> > spaces for instance?
>
> I don't think we want to hardcode the amount of space in the dumpfile.
> The idea of my patch was to make the dump more human-readable (I hope)
> by visualizing the nesting structure of the dump messages, but I think
> we shouldn't "bake" that into the expected strings, as someone might
> want to add an intermediate nesting level.
>
> Do we really need to look for the "note:" in the scan-tree-dump?
> Should that directive be rewritten to:
>
> -/* { dg-final { scan-tree-dump "note: Built SLP cancelled: can use 
> load/store-lanes" "vect" { target { vect_perm3_int && vect_load_lanes } } } } 
> */
> +/* { dg-final { scan-tree-dump "Built SLP cancelled: can use 
> load/store-lanes" "vect" { target { vect_perm3_int && vect_load_lanes } } } } 
> */
>
> which I believe would match any amount of 

Re: [PATCH] Remove -Wabi from libstdc++ build options

2018-07-02 Thread Jonathan Wakely

On 30/06/18 20:48 +, Bernd Edlinger wrote:

Hi,

the -Wabi option prints a warning as follows:

cc1plus: warning: -Wabi won't warn about anything [-Wabi]
cc1plus: note: -Wabi warns about differences from the most up-to-date
ABI, which is also used by default
cc1plus: note: use e.g. -Wabi=11 to warn about changes from GCC 7

This happens many times while building libstdc++, and as the warning
explains, it is good for nothing, so this patch removes it.


Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
Is it OK for trunk?


No, I don't think we want to simply remove it.

Maybe https://gcc.gnu.org/ml/gcc/2018-06/msg00276.html instead?




Re: [PATCH] -fopt-info: add indentation via DUMP_VECT_SCOPE

2018-07-02 Thread David Malcolm
On Mon, 2018-07-02 at 14:23 +0200, Christophe Lyon wrote:
> On Fri, 29 Jun 2018 at 10:09, Richard Biener  com> wrote:
> > 
> > On Tue, Jun 26, 2018 at 5:43 PM David Malcolm 
> > wrote:
> > > 
> > > This patch adds a concept of nested "scopes" to dumpfile.c's
> > > dump_*_loc
> > > calls, and wires it up to the DUMP_VECT_SCOPE macro in tree-
> > > vectorizer.h,
> > > so that the nested structure is shown in -fopt-info by
> > > indentation.
> > > 
> > > For example, this converts -fopt-info-all e.g. from:
> > > 
> > > test.c:8:3: note: === analyzing loop ===
> > > test.c:8:3: note: === analyze_loop_nest ===
> > > test.c:8:3: note: === vect_analyze_loop_form ===
> > > test.c:8:3: note: === get_loop_niters ===
> > > test.c:8:3: note: symbolic number of iterations is (unsigned int)
> > > n_9(D)
> > > test.c:8:3: note: not vectorized: loop contains function calls or
> > > data references that cannot be analyzed
> > > test.c:8:3: note: vectorized 0 loops in function
> > > 
> > > to:
> > > 
> > > test.c:8:3: note: === analyzing loop ===
> > > test.c:8:3: note:  === analyze_loop_nest ===
> > > test.c:8:3: note:   === vect_analyze_loop_form ===
> > > test.c:8:3: note:=== get_loop_niters ===
> > > test.c:8:3: note:   symbolic number of iterations is (unsigned
> > > int) n_9(D)
> > > test.c:8:3: note:   not vectorized: loop contains function calls
> > > or data references that cannot be analyzed
> > > test.c:8:3: note: vectorized 0 loops in function
> > > 
> > > showing that the "symbolic number of iterations" message is
> > > within
> > > the "=== analyze_loop_nest ===" (and not within the
> > > "=== vect_analyze_loop_form ===").
> > > 
> > > This is also enabling work for followups involving optimization
> > > records
> > > (allowing the records to directly capture the nested structure of
> > > the
> > > dump messages).
> > > 
> > > Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
> > > 
> > > OK for trunk?
> 
> Hi,
> 
> I've noticed that this patch (r262246) caused regressions on aarch64:
> gcc.dg/vect/slp-perm-1.c -flto -ffat-lto-objects  scan-tree-dump
> vect "note: Built SLP cancelled: can use load/store-lanes"
> gcc.dg/vect/slp-perm-1.c scan-tree-dump vect "note: Built SLP
> cancelled: can use load/store-lanes"
> gcc.dg/vect/slp-perm-2.c -flto -ffat-lto-objects  scan-tree-dump
> vect "note: Built SLP cancelled: can use load/store-lanes"
> gcc.dg/vect/slp-perm-2.c scan-tree-dump vect "note: Built SLP
> cancelled: can use load/store-lanes"
> gcc.dg/vect/slp-perm-3.c -flto -ffat-lto-objects  scan-tree-dump
> vect "note: Built SLP cancelled: can use load/store-lanes"
> gcc.dg/vect/slp-perm-3.c scan-tree-dump vect "note: Built SLP
> cancelled: can use load/store-lanes"
> gcc.dg/vect/slp-perm-5.c -flto -ffat-lto-objects  scan-tree-dump
> vect "note: Built SLP cancelled: can use load/store-lanes"
> gcc.dg/vect/slp-perm-5.c scan-tree-dump vect "note: Built SLP
> cancelled: can use load/store-lanes"
> gcc.dg/vect/slp-perm-6.c -flto -ffat-lto-objects  scan-tree-dump
> vect "note: Built SLP cancelled: can use load/store-lanes"
> gcc.dg/vect/slp-perm-6.c scan-tree-dump vect "note: Built SLP
> cancelled: can use load/store-lanes"
> gcc.dg/vect/slp-perm-7.c -flto -ffat-lto-objects  scan-tree-dump
> vect "note: Built SLP cancelled: can use load/store-lanes"
> gcc.dg/vect/slp-perm-7.c scan-tree-dump vect "note: Built SLP
> cancelled: can use load/store-lanes"
> gcc.dg/vect/slp-perm-8.c -flto -ffat-lto-objects  scan-tree-dump
> vect "note: Built SLP cancelled: can use load/store-lanes"
> gcc.dg/vect/slp-perm-8.c scan-tree-dump vect "note: Built SLP
> cancelled: can use load/store-lanes"
> 
> The problem is that now there are more spaces between "note:" and
> "Built", the attached small patch does that for slp-perm-1.c.

Sorry about the breakage.

> Is it the right way of fixing it or do we want to accept any amount
> of
> spaces for instance?

I don't think we want to hardcode the amount of space in the dumpfile. 
The idea of my patch was to make the dump more human-readable (I hope)
by visualizing the nesting structure of the dump messages, but I think
we shouldn't "bake" that into the expected strings, as someone might
want to add an intermediate nesting level.

Do we really need to look for the "note:" in the scan-tree-dump? 
Should that directive be rewritten to:

-/* { dg-final { scan-tree-dump "note: Built SLP cancelled: can use 
load/store-lanes" "vect" { target { vect_perm3_int && vect_load_lanes } } } } */
+/* { dg-final { scan-tree-dump "Built SLP cancelled: can use load/store-lanes" 
"vect" { target { vect_perm3_int && vect_load_lanes } } } } */

which I believe would match any amount of spaces.

Alternatively a regex accepting any amount of space ought to work, if
we care that the message begins with "note: ".

The "note: " comes from dumpfile.c's dump_loc, and is emitted
regardless of whether it's a MSG_NOTE vs a MSG_MISSED_OPTIMIZATION or

Re: [PATCH,rs6000] Fix implementation of vec_unpackh, vec_unpackl builtins

2018-07-02 Thread Segher Boessenkool
Hi!

On Fri, Jun 29, 2018 at 07:38:39AM -0700, Carl Love wrote:
> +;; Unpack high elements of float vector to vector of doubles
> +(define_expand "altivec_unpackh_v4sf"
> +  [(set (match_operand:V2DF 0 "register_operand" "=v")
> +(match_operand:V4SF 1 "register_operand" "v"))]
> +  "TARGET_VSX"
> +{
> +  emit_insn (gen_doublehv4sf2 (operands[0], operands[1]));
> +  DONE;
> +}
> +  [(set_attr "type" "veccomplex")])

I wondered if these mactually work for all VSX registers, not just the VMX
registers (i.e. "wa" or similar instead of "v").  But constraints in
define_expand are meaningless anyway; just leave them out please.

Does it help to define these altivec_unpackh_v4sf, when all it does is
expand as doublehv4sf2?  Can't you more easily put the latter in the tables?

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c
> @@ -0,0 +1,257 @@
> +/* { dg-do compile { target powerpc*-*-* } } */
> +/* { dg-require-effective-target powerpc_altivec_ok } */
> +/* { dg-options "-mpower8-vector -maltivec" } */

This needs p8vector_ok then.  Is that correct?  What requires p8?
Is VSX (p7) enough for everything here?

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c
> @@ -0,0 +1,94 @@
> +/* { dg-do compile { target powerpc*-*-* } } */
> +/* { dg-require-effective-target powerpc_altivec_ok } */
> +/* { dg-options "-mpower8-vector -mvsx" } */

Same here: required target does not match options.

Rest looks fine.


Segher


Re: [PATCH] [RFC] Higher-level reporting of vectorization problems

2018-07-02 Thread Richard Sandiford
Richard Biener  writes:
> On Fri, 22 Jun 2018, David Malcolm wrote:
>
>> NightStrike and I were chatting on IRC last week about
>> issues with trying to vectorize the following code:
>> 
>> #include 
>> std::size_t f(std::vector> const & v) {
>>  std::size_t ret = 0;
>>  for (auto const & w: v)
>>  ret += w.size();
>>  return ret;
>> }
>> 
>> icc could vectorize it, but gcc couldn't, but neither of us could
>> immediately figure out what the problem was.
>> 
>> Using -fopt-info leads to a wall of text.
>> 
>> I tried using my patch here:
>> 
>>  "[PATCH] v3 of optinfo, remarks and optimization records"
>>   https://gcc.gnu.org/ml/gcc-patches/2018-06/msg01267.html
>> 
>> It improved things somewhat, by showing:
>> (a) the nesting structure via indentation, and
>> (b) the GCC line at which each message is emitted (by using the
>> "remark" output)
>> 
>> but it's still a wall of text:
>> 
>>   https://dmalcolm.fedorapeople.org/gcc/2018-06-18/test.cc.remarks.html
>>   
>> https://dmalcolm.fedorapeople.org/gcc/2018-06-18/test.cc.d/..%7C..%7Csrc%7Ctest.cc.html#line-4
>> 
>> It doesn't yet provide a simple high-level message to a
>> tech-savvy user on what they need to do to get GCC to
>> vectorize their loop.
>
> Yeah, in particular the vectorizer is way too noisy in its low-level
> functions.  IIRC -fopt-info-vec-missed is "somewhat" better:
>
> t.C:4:26: note: step unknown.
> t.C:4:26: note: vector alignment may not be reachable
> t.C:4:26: note: not ssa-name.
> t.C:4:26: note: use not simple.
> t.C:4:26: note: not ssa-name.
> t.C:4:26: note: use not simple.
> t.C:4:26: note: no array mode for V2DI[3]
> t.C:4:26: note: Data access with gaps requires scalar epilogue loop
> t.C:4:26: note: can't use a fully-masked loop because the target doesn't 
> have the appropriate masked load or store.
> t.C:4:26: note: not ssa-name.
> t.C:4:26: note: use not simple.
> t.C:4:26: note: not ssa-name.
> t.C:4:26: note: use not simple.
> t.C:4:26: note: no array mode for V2DI[3]
> t.C:4:26: note: Data access with gaps requires scalar epilogue loop
> t.C:4:26: note: op not supported by target.
> t.C:4:26: note: not vectorized: relevant stmt not supported: _15 = _14 
> /[ex] 4;
> t.C:4:26: note: bad operation or unsupported loop bound.
> t.C:4:26: note: not vectorized: no grouped stores in basic block.
> t.C:4:26: note: not vectorized: no grouped stores in basic block.
> t.C:6:12: note: not vectorized: not enough data-refs in basic block.
>
>
>> The pertinent dump messages are:
>> 
>> test.cc:4:23: remark: === try_vectorize_loop_1 === 
>> [../../src/gcc/tree-vectorizer.c:674:try_vectorize_loop_1]
>> cc1plus: remark:
>> Analyzing loop at test.cc:4 
>> [../../src/gcc/dumpfile.c:735:ensure_pending_optinfo]
>> test.cc:4:23: remark:  === analyze_loop_nest === 
>> [../../src/gcc/tree-vect-loop.c:2299:vect_analyze_loop]
>> [...snip...]
>> test.cc:4:23: remark:   === vect_analyze_loop_operations === 
>> [../../src/gcc/tree-vect-loop.c:1520:vect_analyze_loop_operations]
>> [...snip...]
>> test.cc:4:23: remark:==> examining statement: ‘_15 = _14 /[ex] 4;’ 
>> [../../src/gcc/tree-vect-stmts.c:9382:vect_analyze_stmt]
>> test.cc:4:23: remark:vect_is_simple_use: operand ‘_14’ 
>> [../../src/gcc/tree-vect-stmts.c:10064:vect_is_simple_use]
>> test.cc:4:23: remark:def_stmt: ‘_14 = _8 - _7;’ 
>> [../../src/gcc/tree-vect-stmts.c:10098:vect_is_simple_use]
>> test.cc:4:23: remark:type of def: internal 
>> [../../src/gcc/tree-vect-stmts.c:10112:vect_is_simple_use]
>> test.cc:4:23: remark:vect_is_simple_use: operand ‘4’ 
>> [../../src/gcc/tree-vect-stmts.c:10064:vect_is_simple_use]
>> test.cc:4:23: remark:op not supported by target. 
>> [../../src/gcc/tree-vect-stmts.c:5932:vectorizable_operation]
>> test.cc:4:23: remark:not vectorized: relevant stmt not supported: ‘_15 = 
>> _14 /[ex] 4;’ [../../src/gcc/tree-vect-stmts.c:9565:vect_analyze_stmt]
>> test.cc:4:23: remark:   bad operation or unsupported loop bound. 
>> [../../src/gcc/tree-vect-loop.c:2043:vect_analyze_loop_2]
>> cc1plus: remark: vectorized 0 loops in function. 
>> [../../src/gcc/tree-vectorizer.c:904:vectorize_loops]
>> 
>> In particular, that complaint from
>>   [../../src/gcc/tree-vect-stmts.c:9565:vect_analyze_stmt]
>> is coming from:
>> 
>>   if (!ok)
>> {
>>   if (dump_enabled_p ())
>> {
>>   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
>>"not vectorized: relevant stmt not ");
>>   dump_printf (MSG_MISSED_OPTIMIZATION, "supported: ");
>>   dump_gimple_stmt (MSG_MISSED_OPTIMIZATION, TDF_SLIM, stmt, 0);
>> }
>> 
>>   return false;
>> }
>> 
>> This got me thinking: the user presumably wants to know several
>> things:
>> 
>> * the location of the loop that can't be vectorized (vect_location
>>   captures this)
>> * location of the problematic statement
>> * why it's problematic
>> * the problematic statement itself.
>> 
>> The following 

libgo patch committed: Check return value as well as error from waitid

2018-07-02 Thread Ian Lance Taylor
This libgo patch checks the return value as well as the error from
waitid.  https://gcc.gnu.org/PR86331 indicates that if a signal
handler runs it is possible for syscall.Syscall6 to return a non-zero
errno value even if no error occurs. That is a problem in general, but
this fix will let us work around the general problem for the specific
case of calling waitid.  Bootstrapped and ran Go testsuite on
x86_64-pc-linux-gnu.  Committed to trunk, GCC 8 branch, and GCC 7
branch.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 262312)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-e1fcce0aec27b1f50ac0e736f39f4c806c2a5baa
+94738979a3422e845acf358a766aabf8b9275d43
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/go/os/wait_waitid.go
===
--- libgo/go/os/wait_waitid.go  (revision 262312)
+++ libgo/go/os/wait_waitid.go  (working copy)
@@ -28,9 +28,12 @@ func (p *Process) blockUntilWaitable() (
// We don't care about the values it returns.
var siginfo [16]uint64
psig := [0]
-   _, _, e := syscall.Syscall6(syscall.SYS_WAITID, _P_PID, uintptr(p.Pid), 
uintptr(unsafe.Pointer(psig)), syscall.WEXITED|syscall.WNOWAIT, 0, 0)
+   r, _, e := syscall.Syscall6(syscall.SYS_WAITID, _P_PID, uintptr(p.Pid), 
uintptr(unsafe.Pointer(psig)), syscall.WEXITED|syscall.WNOWAIT, 0, 0)
runtime.KeepAlive(p)
-   if e != 0 {
+   // Check r as well as e because syscall.Syscall6 currently
+   // just returns errno, and the SIGCHLD signal handler may
+   // change errno. See https://gcc.gnu.org/PR86331.
+   if r != 0 && e != 0 {
// waitid has been available since Linux 2.6.9, but
// reportedly is not available in Ubuntu on Windows.
// See issue 16610.


Re: [PATCH] Make sure rs6000-modes.h is installed in plugin/include/config/rs6000/ subdir

2018-07-02 Thread Segher Boessenkool
Hi Jakub,

On Fri, Jun 29, 2018 at 12:52:59AM +0200, Jakub Jelinek wrote:
> The newly added rs6000-modes.h is now included from rs6000.h, so it is
> needed when building plugins that include tm.h, but it wasn't listed in the
> Makefile fragments and therefore included among PLUGIN_HEADERS.
> 
> Fixed thusly, tested on powerpc64le-linux, ok for trunk and 8.2?

Yes please.  Thanks!


Segher


> 2018-06-28  Jakub Jelinek  
> 
>   * config/rs6000/t-rs6000: Append rs6000-modes.h to TM_H.


Re: [PATCH, aarch64 3/4] aarch64: Add movprfx alternatives for predicate patterns

2018-07-02 Thread Richard Henderson
On 07/02/2018 04:55 AM, Richard Sandiford wrote:
>> +;; Predicated floating-point operations with select matching output.
>> +(define_insn "*cond__0"
>> +  [(set (match_operand:SVE_F 0 "register_operand" "+w, w, ?")
>>  (unspec:SVE_F
>> -  [(match_operand: 1 "register_operand" "Upl")
>> +  [(match_operand: 1 "register_operand" "Upl, Upl, Upl")
>> (unspec:SVE_F
>> - [(match_operand:SVE_F 2 "register_operand" "0")
>> -  (match_operand:SVE_F 3 "register_operand" "w")]
>> + [(match_dup 1)
>> +  (match_operand:SVE_F 2 "register_operand" "0, w, w")
>> +  (match_operand:SVE_F 3 "register_operand" "w, 0, w")]
>> + SVE_COND_FP_BINARY)
>> +   (match_dup 0)]
>> +  UNSPEC_SEL))]
>> +  "TARGET_SVE"
>> +  "@
>> +   \t%0., %1/m, %0., %3.
>> +   \t%0., %1/m, %0., %2.
>> +   movprfx\t%0, %1/m, %2\;\t%0., %1/m, %0., 
>> %3."
>> +  [(set_attr "movprfx" "*,*,yes")]
>> +)
> 
> Reintroduces a (match_dup 1) into the SVE_COND_FP_BINARY.
> 
> OK otherwise, thanks.

Feh, and fixed again in patch 4.
I've squashed all 4 patches for final commit,
so the intermediate breakage is gone.

Thanks for the review.


r~


Re: [RFC, testsuite/guality] Use relative line numbers in gdb-test

2018-07-02 Thread Mike Stump
On Jun 28, 2018, at 12:39 PM, Tom de Vries  wrote:
> This patch adds a dg-final override that passes it's first argument to the
> gdb-test action.  This allows us to use relative line numbers in gdb-test.
> 
> Tested pr45882.c.
> 
> Any comments?

> 2018-06-28  Tom de Vries  
> 
>   * gcc.dg/guality/pr45882.c (foo): Use relative line numbers.
>   * lib/gcc-dg.exp (dg-final): New proc.
>   * lib/gcc-gdb-test.exp (gdb-test): Add and handle additional line number
>   argument.

I like it.  :-)  I'd approve it.  Anyone not like it?



[PATCH] Fix GCOV scan pattern (PR testsuite/86366).

2018-07-02 Thread Martin Liška
Hi.

This fixes broken tests gcc.dg/profile-dir*. I'm going to install the patch.

Martin

gcc/testsuite/ChangeLog:

2018-07-02  Martin Liska  

PR testsuite/86366
* gcc.dg/profile-dir-1.c: Fix scanned pattern.
* gcc.dg/profile-dir-2.c: Likewise.
* gcc.dg/profile-dir-3.c: Likewise.
---
 gcc/testsuite/gcc.dg/profile-dir-1.c | 2 +-
 gcc/testsuite/gcc.dg/profile-dir-2.c | 2 +-
 gcc/testsuite/gcc.dg/profile-dir-3.c | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)


diff --git a/gcc/testsuite/gcc.dg/profile-dir-1.c b/gcc/testsuite/gcc.dg/profile-dir-1.c
index df49e2361f4..01269f99902 100644
--- a/gcc/testsuite/gcc.dg/profile-dir-1.c
+++ b/gcc/testsuite/gcc.dg/profile-dir-1.c
@@ -1,7 +1,7 @@
 /* { dg-do compile } */
 /* { dg-require-profiling "-fprofile-generate" } */
 /* { dg-options "-O -fprofile-generate=. -fdump-ipa-cgraph" } */
-/* { dg-final { scan-ipa-dump " ./profile-dir-1.gcda" "cgraph" } } */
+/* { dg-final { scan-ipa-dump "Using data file \.\/.*#profile-dir-1.gcda" "cgraph" } } */
 
 int
 main(void)
diff --git a/gcc/testsuite/gcc.dg/profile-dir-2.c b/gcc/testsuite/gcc.dg/profile-dir-2.c
index 004ba0b0905..13081ce253f 100644
--- a/gcc/testsuite/gcc.dg/profile-dir-2.c
+++ b/gcc/testsuite/gcc.dg/profile-dir-2.c
@@ -1,7 +1,7 @@
 /* { dg-do compile } */
 /* { dg-require-profiling "-fprofile-generate" } */
 /* { dg-options "-O -fprofile-generate -fdump-ipa-cgraph" } */
-/* { dg-final { scan-ipa-dump "/profile-dir-2.gcda" "cgraph" } } */
+/* { dg-final { scan-ipa-dump "Using data file .*\/profile-dir-2.gcda" "cgraph" } } */
 
 int
 main(void)
diff --git a/gcc/testsuite/gcc.dg/profile-dir-3.c b/gcc/testsuite/gcc.dg/profile-dir-3.c
index 6b47807d850..21f975a93e9 100644
--- a/gcc/testsuite/gcc.dg/profile-dir-3.c
+++ b/gcc/testsuite/gcc.dg/profile-dir-3.c
@@ -1,7 +1,7 @@
 /* { dg-do compile } */
 /* { dg-require-profiling "-fprofile-generate" } */
 /* { dg-options "-O -fprofile-generate -fprofile-dir=. -fdump-ipa-cgraph" } */
-/* { dg-final { scan-ipa-dump " ./profile-dir-3.gcda" "cgraph" } } */
+/* { dg-final { scan-ipa-dump "Using data file \.\/.*#profile-dir-3.gcda" "cgraph" } } */
 
 int
 main(void)



Re: [patch] adjust default nvptx launch geometry for OpenACC offloaded regions

2018-07-02 Thread Cesar Philippidis
On 07/02/2018 07:14 AM, Tom de Vries wrote:
> On 06/21/2018 03:58 PM, Cesar Philippidis wrote:
>> On 06/20/2018 03:15 PM, Tom de Vries wrote:
>>> On 06/20/2018 11:59 PM, Cesar Philippidis wrote:
 Now it follows the formula contained in
 the "CUDA Occupancy Calculator" spreadsheet that's distributed with CUDA.
>>>
>>> Any reason we're not using the cuda runtime functions to get the
>>> occupancy (see PR85590 - [nvptx, libgomp, openacc] Use cuda runtime fns
>>> to determine launch configuration in nvptx ) ?
>>
>> There are two reasons:
>>
>>   1) cuda_occupancy.h depends on the CUDA runtime to extract the device
>>  properties instead of the CUDA driver API. However, we can always
>>  teach libgomp how to populate the cudaDeviceProp struct using the
>>  driver API.
>>
>>   2) CUDA is not always present on the build host, and that's why
>>  libgomp maintains its own cuda.h. So at the very least, this
>>  functionality would be good to have in libgomp as a fallback
>>  implementation;
> 
> Libgomp maintains its own cuda.h to "allow building GCC with PTX
> offloading even without CUDA being installed" (
> https://gcc.gnu.org/ml/gcc-patches/2017-01/msg00980.html ).
> 
> The libgomp nvptx plugin however uses the cuda driver API to launch
> kernels etc, so we can assume that's always available at launch time.
> And according to the "CUDA Pro Tip: Occupancy API Simplifies Launch
> Configuration", the occupancy API is also available in the driver API.

Thanks for the info. I was not aware that the CUDA driver API had a
thread occupancy calculator (it' described in section 4.18).

> What we cannot assume to be available is the occupancy API pre cuda-6.5.
> So it's fine to have a fallback for that (properly isolated in utility
> functions), but for cuda 6.5 and up we want to use the occupancy API.

That seems reasonable. I'll run some experiments with that. In the
meantime, would it be OK to make this fallback the default, then add
support for the driver occupancy calculator as a follow up?

>>  its not good to have program fail due to
>>  insufficient hardware resources errors when it is avoidable.
>>
> 
> Right, in fact there are two separate things you're trying to address
> here: launch failure and occupancy heuristic, so split the patch.

ACK. I'll split those changes into separate patches.

By the way, do you have any preferences on how to break up the nvptx
vector length changes for trunk submission? I was planning on breaking
it down into four components - generic ME changes, tests, nvptx
reductions and the rest. Those two nvptx compoinents are large, so I'll
probably break them down to smaller patches, but I'm not sure if it's
worthwhile to make them independent from one another with the use of a
lot of stub functions.

Cesar


Quarter 3 2018

2018-07-02 Thread Elena Murphy
Hi,



Hope you having a great day!



I just wanted to be aware if you looking to acquire database for your marketing 
efforts? We provide database for all industry and technology.



Kindly review and let me be aware of your interest so that I can get back to 
you with the exact counts, sample and more info regarding the same.



Do let me be aware if you have any questions for me.



Regards,

Elena Murphy

Database Executive

If you do not wish to receive these emails. Please respond Exit.


Re: [patch] adjust default nvptx launch geometry for OpenACC offloaded regions

2018-07-02 Thread Tom de Vries
On 06/21/2018 03:58 PM, Cesar Philippidis wrote:
> On 06/20/2018 03:15 PM, Tom de Vries wrote:
>> On 06/20/2018 11:59 PM, Cesar Philippidis wrote:
>>> Now it follows the formula contained in
>>> the "CUDA Occupancy Calculator" spreadsheet that's distributed with CUDA.
>>
>> Any reason we're not using the cuda runtime functions to get the
>> occupancy (see PR85590 - [nvptx, libgomp, openacc] Use cuda runtime fns
>> to determine launch configuration in nvptx ) ?
> 
> There are two reasons:
> 
>   1) cuda_occupancy.h depends on the CUDA runtime to extract the device
>  properties instead of the CUDA driver API. However, we can always
>  teach libgomp how to populate the cudaDeviceProp struct using the
>  driver API.
> 
>   2) CUDA is not always present on the build host, and that's why
>  libgomp maintains its own cuda.h. So at the very least, this
>  functionality would be good to have in libgomp as a fallback
>  implementation;

Libgomp maintains its own cuda.h to "allow building GCC with PTX
offloading even without CUDA being installed" (
https://gcc.gnu.org/ml/gcc-patches/2017-01/msg00980.html ).

The libgomp nvptx plugin however uses the cuda driver API to launch
kernels etc, so we can assume that's always available at launch time.
And according to the "CUDA Pro Tip: Occupancy API Simplifies Launch
Configuration", the occupancy API is also available in the driver API.

What we cannot assume to be available is the occupancy API pre cuda-6.5.
So it's fine to have a fallback for that (properly isolated in utility
functions), but for cuda 6.5 and up we want to use the occupancy API.

>  its not good to have program fail due to
>  insufficient hardware resources errors when it is avoidable.
>

Right, in fact there are two separate things you're trying to address
here: launch failure and occupancy heuristic, so split the patch.

Thanks,
- Tom


Re: [PATCH][arm] Avoid STRD with odd register for TARGET_ARM in output_move_double

2018-07-02 Thread Kyrill Tkachov

Hi Christophe,

On 02/07/18 13:17, Christophe Lyon wrote:

On Fri, 29 Jun 2018 at 15:32, Kyrill Tkachov
 wrote:

Hi all,

In this testcase the user forces an odd register as the starting reg for a 
DFmode value.
The output_move_double function tries to store that using an STRD instruction.
But for TARGET_ARM the starting register of an STRD must be an even one.
This is always the case with compiler-allocated registers for DFmode values, 
but the
inline assembly forced our hand here.

This patch  restricts the STRD-emitting logic in output_move_double to not avoid
odd-numbered source registers in STRD.
I'm not a fan of the whole function, we should be exposing a lot of the logic 
in there
to RTL rather than at the final output stage, but that would need to be fixed 
separately.

This patch is much safer for backporting purposes.

Bootstrapped and tested on arm-none-linux-gnueabihf.


Hi Kyrill,

I think you want to skip this test if one overrides -mfloat-abi, like
the small attached patch does.
OK?


Ok.
Thanks,
Kyrill



Thanks,

Christophe


Committing to trunk.
Thanks,
Kyrill

2018-06-29  Kyrylo Tkachov  

  * config/arm/arm.c (output_move_double): Don't allow STRD instructions
  if starting source register is not even.

2018-06-29  Kyrylo Tkachov  

  * gcc.target/arm/arm-soft-strd-even.c: New test.




Re: [ABSU_EXPR] Add some of the missing patterns in match,pd

2018-07-02 Thread Richard Biener
On Fri, Jun 29, 2018 at 4:38 AM Kugan Vivekanandarajah
 wrote:
>
> Hi Marc,
>
> Thanks for the review.
>
> On 28 June 2018 at 14:11, Marc Glisse  wrote:
> > (why is there no mention of ABSU_EXPR in doc/* ?)
>
> I will fix this in a separate patch.
> >
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -571,10 +571,12 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > (copysigns (op @0) @1)
> > (copysigns @0 @1
> >
> > -/* abs(x)*abs(x) -> x*x.  Should be valid for all types.  */
> > -(simplify
> > - (mult (abs@1 @0) @1)
> > - (mult @0 @0))
> > +/* abs(x)*abs(x) -> x*x.  Should be valid for all types.
> > +   also for absu(x)*absu(x) -> x*x.  */
> > +(for op (abs absu)
> > + (simplify
> > +  (mult (op@1 @0) @1)
> > +  (mult @0 @0)))
> >
> > 1) the types are wrong, it should be (convert (mult @0 @0)), or maybe
> > view_convert if vectors also use absu.
> > 2) this is only valid with -fwrapv (TYPE_OVERFLOW_WRAPS(TREE_TYPE(@0))),
> > otherwise you are possibly replacing a multiplication on unsigned with a
> > multiplication on signed that may overflow. So maybe it is actually supposed
> > to be
> > (mult (convert @0) (convert @0))
> Indeed, I missed it. I have changed it in the attached patch.
> >
> >  /* cos(copysign(x, y)) -> cos(x).  Similarly for cosh.  */
> >  (for coss (COS COSH)
> > @@ -1013,15 +1015,24 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >&& tree_nop_conversion_p (type, TREE_TYPE (@2)))
> >(bit_xor (convert @1) (convert @2
> >
> > -(simplify
> > - (abs (abs@1 @0))
> > - @1)
> > -(simplify
> > - (abs (negate @0))
> > - (abs @0))
> > -(simplify
> > - (abs tree_expr_nonnegative_p@0)
> > - @0)
> > +/* Convert abs (abs (X)) into abs (X).
> > +   also absu (absu (X)) into absu (X).  */
> > +(for op (abs absu)
> > + (simplify
> > +  (op (op@1 @0))
> > +  @1))
> >
> > You cannot have (absu (absu @0)) since absu takes a signed integer and
> > returns an unsigned one. (absu (abs @0)) may indeed be equivalent to
> > (convert (abs @0)). Possibly (op (convert (absu @0))) could also be
> > simplified if convert is a NOP.
> >
> > +/* Convert abs[u] (-X) -> abs[u] (X).  */
> > +(for op (abs absu)
> > + (simplify
> > +  (op (negate @0))
> > +  (op @0)))
> > +
> > +/* Convert abs[u] (X)  where X is nonnegative -> (X).  */
> > +(for op (abs absu)
> > + (simplify
> > +  (op tree_expr_nonnegative_p@0)
> > +  @0))
> >
> > Missing convert again.
> >
> > Where are the testcases?
> I have fixed the above and added test-cases.
>
> >
> >> Bootstrap and regression testing on x86_64-linux-gnu. Is this OK if no
> >> regressions.
> >
> >
> > Does it mean you have run the tests or intend to run them in the future? It
> > is confusing.
> Sorry for not being clear.
>
> I have bootstrapped and regression tested with no new regression.
>
> Is this OK?

+/* Convert absu(x)*absu(x) -> x*x.  */
+(simplify
+ (mult (absu@1 @0) @1)
+ (mult (convert @0) (convert @0)))

(mult (convert@2 @0) @2)

+/* Convert abs (abs (X)) into abs (X).
+   also absu (absu (X)) into absu (X).  */
 (simplify
  (abs (abs@1 @0))
  @1)
+
+(simplify
+ (absu (nop_convert (absu@1 @0)))
+ @1)

please use

  (absu (convert@2 (absu@1 @0))
  (if (tree_nop_conversion_p (TREE_TYPE (@2), TREE_TYPE (@1)))
   @1))

as Marc said we can have (absu (abs @0))
which should be equivalent to (convert (abs @0))

Otherwise OK.

Richard.

> Thanks,
> Kugan
>
> >
> > --
> > Marc Glisse


Re: [14/n] PR85694: Rework overwidening detection

2018-07-02 Thread Christophe Lyon
On Mon, 2 Jul 2018 at 15:37, Richard Sandiford
 wrote:
>
> Christophe Lyon  writes:
> > On Fri, 29 Jun 2018 at 13:36, Richard Sandiford
> >  wrote:
> >>
> >> Richard Sandiford  writes:
> >> > This patch is the main part of PR85694.  The aim is to recognise at 
> >> > least:
> >> >
> >> >   signed char *a, *b, *c;
> >> >   ...
> >> >   for (int i = 0; i < 2048; i++)
> >> > c[i] = (a[i] + b[i]) >> 1;
> >> >
> >> > as an over-widening pattern, since the addition and shift can be done
> >> > on shorts rather than ints.  However, it ended up being a lot more
> >> > general than that.
> >> >
> >> > The current over-widening pattern detection is limited to a few simple
> >> > cases: logical ops with immediate second operands, and shifts by a
> >> > constant.  These cases are enough for common pixel-format conversion
> >> > and can be detected in a peephole way.
> >> >
> >> > The loop above requires two generalisations of the current code: support
> >> > for addition as well as logical ops, and support for non-constant second
> >> > operands.  These are harder to detect in the same peephole way, so the
> >> > patch tries to take a more global approach.
> >> >
> >> > The idea is to get information about the minimum operation width
> >> > in two ways:
> >> >
> >> > (1) by using the range information attached to the SSA_NAMEs
> >> > (effectively a forward walk, since the range info is
> >> > context-independent).
> >> >
> >> > (2) by back-propagating the number of output bits required by
> >> > users of the result.
> >> >
> >> > As explained in the comments, there's a balance to be struck between
> >> > narrowing an individual operation and fitting in with the surrounding
> >> > code.  The approach is pretty conservative: if we could narrow an
> >> > operation to N bits without changing its semantics, it's OK to do that 
> >> > if:
> >> >
> >> > - no operations later in the chain require more than N bits; or
> >> >
> >> > - all internally-defined inputs are extended from N bits or fewer,
> >> >   and at least one of them is single-use.
> >> >
> >> > See the comments for the rationale.
> >> >
> >> > I didn't bother adding STMT_VINFO_* wrappers for the new fields
> >> > since the code seemed more readable without.
> >> >
> >> > Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
> >>
> >> Here's a version rebased on top of current trunk.  Changes from last time:
> >>
> >> - reintroduce dump_generic_expr_loc, with the obvious change to the
> >>   prototype
> >>
> >> - fix a typo in a comment
> >>
> >> - use vect_element_precision from the new version of 12/n.
> >>
> >> Tested as before.  OK to install?
> >>
> >
> > Hi Richard,
> >
> > This patch introduces regressions on arm-none-linux-gnueabihf:
> > gcc.dg/vect/vect-over-widen-1-big-array.c -flto -ffat-lto-objects
> > scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 2
> > gcc.dg/vect/vect-over-widen-1-big-array.c scan-tree-dump-times
> > vect "vect_recog_widen_shift_pattern: detected" 2
> > gcc.dg/vect/vect-over-widen-1.c -flto -ffat-lto-objects
> > scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 2
> > gcc.dg/vect/vect-over-widen-1.c scan-tree-dump-times vect
> > "vect_recog_widen_shift_pattern: detected" 2
> > gcc.dg/vect/vect-over-widen-4-big-array.c -flto -ffat-lto-objects
> > scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 2
> > gcc.dg/vect/vect-over-widen-4-big-array.c scan-tree-dump-times
> > vect "vect_recog_widen_shift_pattern: detected" 2
> > gcc.dg/vect/vect-over-widen-4.c -flto -ffat-lto-objects
> > scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 2
> > gcc.dg/vect/vect-over-widen-4.c scan-tree-dump-times vect
> > "vect_recog_widen_shift_pattern: detected" 2
> > gcc.dg/vect/vect-widen-shift-s16.c -flto -ffat-lto-objects
> > scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 8
> > gcc.dg/vect/vect-widen-shift-s16.c scan-tree-dump-times vect
> > "vect_recog_widen_shift_pattern: detected" 8
> > gcc.dg/vect/vect-widen-shift-s8.c -flto -ffat-lto-objects
> > scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 1
> > gcc.dg/vect/vect-widen-shift-s8.c scan-tree-dump-times vect
> > "vect_recog_widen_shift_pattern: detected" 1
> > gcc.dg/vect/vect-widen-shift-u16.c -flto -ffat-lto-objects
> > scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 1
> > gcc.dg/vect/vect-widen-shift-u16.c scan-tree-dump-times vect
> > "vect_recog_widen_shift_pattern: detected" 1
> > gcc.dg/vect/vect-widen-shift-u8.c -flto -ffat-lto-objects
> > scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 2
> > gcc.dg/vect/vect-widen-shift-u8.c scan-tree-dump-times vect
> > "vect_recog_widen_shift_pattern: detected" 2
>
> Sorry about that, was caused by a stupid typo.  I've applied the
> below as obvious.
>
> (For the record, it was actually 12/n that caused 

Re: [14/n] PR85694: Rework overwidening detection

2018-07-02 Thread Richard Sandiford
Christophe Lyon  writes:
> On Fri, 29 Jun 2018 at 13:36, Richard Sandiford
>  wrote:
>>
>> Richard Sandiford  writes:
>> > This patch is the main part of PR85694.  The aim is to recognise at least:
>> >
>> >   signed char *a, *b, *c;
>> >   ...
>> >   for (int i = 0; i < 2048; i++)
>> > c[i] = (a[i] + b[i]) >> 1;
>> >
>> > as an over-widening pattern, since the addition and shift can be done
>> > on shorts rather than ints.  However, it ended up being a lot more
>> > general than that.
>> >
>> > The current over-widening pattern detection is limited to a few simple
>> > cases: logical ops with immediate second operands, and shifts by a
>> > constant.  These cases are enough for common pixel-format conversion
>> > and can be detected in a peephole way.
>> >
>> > The loop above requires two generalisations of the current code: support
>> > for addition as well as logical ops, and support for non-constant second
>> > operands.  These are harder to detect in the same peephole way, so the
>> > patch tries to take a more global approach.
>> >
>> > The idea is to get information about the minimum operation width
>> > in two ways:
>> >
>> > (1) by using the range information attached to the SSA_NAMEs
>> > (effectively a forward walk, since the range info is
>> > context-independent).
>> >
>> > (2) by back-propagating the number of output bits required by
>> > users of the result.
>> >
>> > As explained in the comments, there's a balance to be struck between
>> > narrowing an individual operation and fitting in with the surrounding
>> > code.  The approach is pretty conservative: if we could narrow an
>> > operation to N bits without changing its semantics, it's OK to do that if:
>> >
>> > - no operations later in the chain require more than N bits; or
>> >
>> > - all internally-defined inputs are extended from N bits or fewer,
>> >   and at least one of them is single-use.
>> >
>> > See the comments for the rationale.
>> >
>> > I didn't bother adding STMT_VINFO_* wrappers for the new fields
>> > since the code seemed more readable without.
>> >
>> > Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
>>
>> Here's a version rebased on top of current trunk.  Changes from last time:
>>
>> - reintroduce dump_generic_expr_loc, with the obvious change to the
>>   prototype
>>
>> - fix a typo in a comment
>>
>> - use vect_element_precision from the new version of 12/n.
>>
>> Tested as before.  OK to install?
>>
>
> Hi Richard,
>
> This patch introduces regressions on arm-none-linux-gnueabihf:
> gcc.dg/vect/vect-over-widen-1-big-array.c -flto -ffat-lto-objects
> scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 2
> gcc.dg/vect/vect-over-widen-1-big-array.c scan-tree-dump-times
> vect "vect_recog_widen_shift_pattern: detected" 2
> gcc.dg/vect/vect-over-widen-1.c -flto -ffat-lto-objects
> scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 2
> gcc.dg/vect/vect-over-widen-1.c scan-tree-dump-times vect
> "vect_recog_widen_shift_pattern: detected" 2
> gcc.dg/vect/vect-over-widen-4-big-array.c -flto -ffat-lto-objects
> scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 2
> gcc.dg/vect/vect-over-widen-4-big-array.c scan-tree-dump-times
> vect "vect_recog_widen_shift_pattern: detected" 2
> gcc.dg/vect/vect-over-widen-4.c -flto -ffat-lto-objects
> scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 2
> gcc.dg/vect/vect-over-widen-4.c scan-tree-dump-times vect
> "vect_recog_widen_shift_pattern: detected" 2
> gcc.dg/vect/vect-widen-shift-s16.c -flto -ffat-lto-objects
> scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 8
> gcc.dg/vect/vect-widen-shift-s16.c scan-tree-dump-times vect
> "vect_recog_widen_shift_pattern: detected" 8
> gcc.dg/vect/vect-widen-shift-s8.c -flto -ffat-lto-objects
> scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 1
> gcc.dg/vect/vect-widen-shift-s8.c scan-tree-dump-times vect
> "vect_recog_widen_shift_pattern: detected" 1
> gcc.dg/vect/vect-widen-shift-u16.c -flto -ffat-lto-objects
> scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 1
> gcc.dg/vect/vect-widen-shift-u16.c scan-tree-dump-times vect
> "vect_recog_widen_shift_pattern: detected" 1
> gcc.dg/vect/vect-widen-shift-u8.c -flto -ffat-lto-objects
> scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 2
> gcc.dg/vect/vect-widen-shift-u8.c scan-tree-dump-times vect
> "vect_recog_widen_shift_pattern: detected" 2

Sorry about that, was caused by a stupid typo.  I've applied the
below as obvious.

(For the record, it was actually 12/n that caused this.  14/n hasn't
been applied yet.)

Thanks,
Richard


2018-07-02  Richard Sandiford  

gcc/
* tree-vect-patterns.c (vect_recog_widen_shift_pattern): Fix typo
in dump string.

Index: gcc/tree-vect-patterns.c

[PATCH] Fix PR86363

2018-07-02 Thread Richard Biener


The following fixes an oversight in a previous patch to enhance VN
of memset calls.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

>From b35c162ee7126057134d3b478e80160880402f1a Mon Sep 17 00:00:00 2001
From: Richard Guenther 
Date: Mon, 2 Jul 2018 13:11:39 +0200
Subject: [PATCH] fix-pr86363

2018-07-02  Richard Biener  

PR tree-optimization/86363
* tree-ssa-sccvn.c (vn_reference_lookup_3): Check the
memset argument refers to a non-variable address.

* gcc.dg/torture/pr86363.c: New testcase.

diff --git a/gcc/testsuite/gcc.dg/torture/pr86363.c 
b/gcc/testsuite/gcc.dg/torture/pr86363.c
new file mode 100644
index 000..154f9386e05
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr86363.c
@@ -0,0 +1,25 @@
+/* { dg-do run } */
+/* { dg-additional-options "-w -Wno-psabi" } */
+
+typedef char U __attribute__ ((vector_size (16)));
+typedef unsigned V __attribute__ ((vector_size (16)));
+
+V g;
+
+V
+f (V v, U u)
+{
+  __builtin_memset ([v[0]], 0, 1);
+  g ^= u[0];
+  return g;
+}
+
+int
+main (void)
+{
+  V x = f ((V) { 5 }, (U) { 1 });
+
+  if (x[0] != 1 || x[1] != 1 || x[2] != 1 || x[3] != 1)
+__builtin_abort ();
+  return 0;
+}
diff --git a/gcc/tree-ssa-sccvn.c b/gcc/tree-ssa-sccvn.c
index e5eddf902b8..1e16e13cfa1 100644
--- a/gcc/tree-ssa-sccvn.c
+++ b/gcc/tree-ssa-sccvn.c
@@ -1988,6 +1988,7 @@ vn_reference_lookup_3 (ao_ref *ref, tree vuse, void *vr_,
  base2 = get_ref_base_and_extent (ref2, , , ,
   );
  if (!known_size_p (maxsize2)
+ || !known_eq (maxsize2, size2)
  || !operand_equal_p (base, base2, OEP_ADDRESS_OF))
return (void *)-1;
}


Re: [16/n] PR85694: Add detection of averaging operations

2018-07-02 Thread Richard Biener
On Fri, Jun 29, 2018 at 11:21 AM Richard Sandiford
 wrote:
>
> This patch adds detection of average instructions:
>
>a = (((wide) b + (wide) c) >> 1);
>--> a = (wide) .AVG_FLOOR (b, c);
>
>a = (((wide) b + (wide) c + 1) >> 1);
>--> a = (wide) .AVG_CEIL (b, c);
>
> in cases where users of "a" need only the low half of the result,
> making the cast to (wide) redundant.  The heavy lifting was done by
> earlier patches.
>
> This showed up another problem in vectorizable_call: if the call is a
> pattern definition statement rather than the main pattern statement,
> the type of vectorised call might be different from the type of the
> original statement.
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

OK.

Thanks,
Richard.

> Richard
>
>
> 2018-06-29  Richard Sandiford  
>
> gcc/
> PR tree-optimization/85694
> * doc/md.texi (avgM3_floor, uavgM3_floor, avgM3_ceil)
> (uavgM3_ceil): Document new optabs.
> * doc/sourcebuild.texi (vect_avg_qi): Document new target selector.
> * internal-fn.def (IFN_AVG_FLOOR, IFN_AVG_CEIL): New internal
> functions.
> * optabs.def (savg_floor_optab, uavg_floor_optab, savg_ceil_optab)
> (savg_ceil_optab): New optabs.
> * tree-vect-patterns.c (vect_recog_average_pattern): New function.
> (vect_vect_recog_func_ptrs): Add it.
> * tree-vect-stmts.c (vectorizable_call): Get the type of the zero
> constant directly from the associated lhs.
>
> gcc/testsuite/
> PR tree-optimization/85694
> * lib/target-supports.exp (check_effective_target_vect_avg_qi): New
> proc.
> * gcc.dg/vect/vect-avg-1.c: New test.
> * gcc.dg/vect/vect-avg-2.c: Likewise.
> * gcc.dg/vect/vect-avg-3.c: Likewise.
> * gcc.dg/vect/vect-avg-4.c: Likewise.
> * gcc.dg/vect/vect-avg-5.c: Likewise.
> * gcc.dg/vect/vect-avg-6.c: Likewise.
> * gcc.dg/vect/vect-avg-7.c: Likewise.
> * gcc.dg/vect/vect-avg-8.c: Likewise.
> * gcc.dg/vect/vect-avg-9.c: Likewise.
> * gcc.dg/vect/vect-avg-10.c: Likewise.
> * gcc.dg/vect/vect-avg-11.c: Likewise.
> * gcc.dg/vect/vect-avg-12.c: Likewise.
> * gcc.dg/vect/vect-avg-13.c: Likewise.
> * gcc.dg/vect/vect-avg-14.c: Likewise.
>
> Index: gcc/doc/md.texi
> ===
> --- gcc/doc/md.texi 2018-06-29 10:14:49.425353913 +0100
> +++ gcc/doc/md.texi 2018-06-29 10:16:31.936416331 +0100
> @@ -5599,6 +5599,34 @@ Other shift and rotate instructions, ana
>  Vector shift and rotate instructions that take vectors as operand 2
>  instead of a scalar type.
>
> +@cindex @code{avg@var{m}3_floor} instruction pattern
> +@cindex @code{uavg@var{m}3_floor} instruction pattern
> +@item @samp{avg@var{m}3_floor}
> +@itemx @samp{uavg@var{m}3_floor}
> +Signed and unsigned average instructions.  These instructions add
> +operands 1 and 2 without truncation, divide the result by 2,
> +round towards -Inf, and store the result in operand 0.  This is
> +equivalent to the C code:
> +@smallexample
> +narrow op0, op1, op2;
> +@dots{}
> +op0 = (narrow) (((wide) op1 + (wide) op2) >> 1);
> +@end smallexample
> +where the sign of @samp{narrow} determines whether this is a signed
> +or unsigned operation.
> +
> +@cindex @code{avg@var{m}3_ceil} instruction pattern
> +@cindex @code{uavg@var{m}3_ceil} instruction pattern
> +@item @samp{avg@var{m}3_ceil}
> +@itemx @samp{uavg@var{m}3_ceil}
> +Like @samp{avg@var{m}3_floor} and @samp{uavg@var{m}3_floor}, but round
> +towards +Inf.  This is equivalent to the C code:
> +@smallexample
> +narrow op0, op1, op2;
> +@dots{}
> +op0 = (narrow) (((wide) op1 + (wide) op2 + 1) >> 1);
> +@end smallexample
> +
>  @cindex @code{bswap@var{m}2} instruction pattern
>  @item @samp{bswap@var{m}2}
>  Reverse the order of bytes of operand 1 and store the result in operand 0.
> Index: gcc/doc/sourcebuild.texi
> ===
> --- gcc/doc/sourcebuild.texi2018-06-14 12:27:24.156171818 +0100
> +++ gcc/doc/sourcebuild.texi2018-06-29 10:16:31.936416331 +0100
> @@ -1417,6 +1417,10 @@ Target supports Fortran @code{real} kind
>  The target's ABI allows stack variables to be aligned to the preferred
>  vector alignment.
>
> +@item vect_avg_qi
> +Target supports both signed and unsigned averaging operations on vectors
> +of bytes.
> +
>  @item vect_condition
>  Target supports vector conditional operations.
>
> Index: gcc/internal-fn.def
> ===
> --- gcc/internal-fn.def 2018-06-14 12:27:34.108084438 +0100
> +++ gcc/internal-fn.def 2018-06-29 10:16:31.936416331 +0100
> @@ -143,6 +143,11 @@ DEF_INTERNAL_OPTAB_FN (FMS, ECF_CONST, f
>  DEF_INTERNAL_OPTAB_FN (FNMA, ECF_CONST, fnma, ternary)
>  DEF_INTERNAL_OPTAB_FN (FNMS, ECF_CONST, fnms, ternary)
>
> 

Re: [15/n] PR85694: Try to split existing casts in widened patterns

2018-07-02 Thread Richard Biener
On Wed, Jun 20, 2018 at 12:39 PM Richard Sandiford
 wrote:
>
> The main over-widening patch can introduce quite a few extra casts,
> and in many cases those casts simply "tap into" an intermediate
> point in an existing extension.  E.g. if we have:
>
> unsigned char a;
> int ax = (int) a;
>
> and a later operation using ax is shortened to "unsigned short",
> we would need:
>
> unsigned short ax' = (unsigned short) a;
>
> The a->ax extension requires one set of unpacks to get to unsigned
> short and another set of unpacks to get to int.  The first set are
> then duplicated for ax'.  If both ax and ax' are needed, the a->ax'
> extension would end up counting twice during cost calculations.
>
> This patch rewrites the original:
>
> int ax = (int) a;
>
> into a pattern:
>
> unsigned short ax' = (unsigned short) a;
> int ax = (int) ax';
>
> so that each extension only counts once.
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

OK.

Richard.

> Richard
>
>
> 2018-06-20  Richard Sandiford  
>
> gcc/
> * tree-vect-patterns.c (vect_split_statement): New function.
> (vect_convert_input): Use it to try to split an existing cast.
>
> gcc/testsuite/
> * gcc.dg/vect/vect-over-widen-5.c: Test that the extensions
> get split into two for use by the over-widening pattern.
> * gcc.dg/vect/vect-over-widen-6.c: Likewise.
> * gcc.dg/vect/vect-over-widen-7.c: Likewise.
> * gcc.dg/vect/vect-over-widen-8.c: Likewise.
> * gcc.dg/vect/vect-over-widen-9.c: Likewise.
> * gcc.dg/vect/vect-over-widen-10.c: Likewise.
> * gcc.dg/vect/vect-over-widen-11.c: Likewise.
> * gcc.dg/vect/vect-over-widen-12.c: Likewise.
> * gcc.dg/vect/vect-over-widen-13.c: Likewise.
> * gcc.dg/vect/vect-over-widen-14.c: Likewise.
> * gcc.dg/vect/vect-over-widen-15.c: Likewise.
> * gcc.dg/vect/vect-over-widen-16.c: Likewise.
> * gcc.dg/vect/vect-over-widen-22.c: New test.
>
> Index: gcc/tree-vect-patterns.c
> ===
> --- gcc/tree-vect-patterns.c2018-06-20 11:26:19.557193074 +0100
> +++ gcc/tree-vect-patterns.c2018-06-20 11:26:23.637157077 +0100
> @@ -565,6 +565,97 @@ vect_recog_temp_ssa_var (tree type, gimp
>return make_temp_ssa_name (type, stmt, "patt");
>  }
>
> +/* STMT2_INFO describes a type conversion that could be split into STMT1
> +   followed by a version of STMT2_INFO that takes NEW_RHS as its first
> +   input.  Try to do this using pattern statements, returning true on
> +   success.  */
> +
> +static bool
> +vect_split_statement (stmt_vec_info stmt2_info, tree new_rhs,
> + gimple *stmt1, tree vectype)
> +{
> +  if (is_pattern_stmt_p (stmt2_info))
> +{
> +  /* STMT2_INFO is part of a pattern.  Get the statement to which
> +the pattern is attached.  */
> +  stmt_vec_info orig_stmt2_info
> +   = vinfo_for_stmt (STMT_VINFO_RELATED_STMT (stmt2_info));
> +  vect_init_pattern_stmt (stmt1, orig_stmt2_info, vectype);
> +
> +  if (dump_enabled_p ())
> +   {
> + dump_printf_loc (MSG_NOTE, vect_location,
> +  "Splitting pattern statement: ");
> + dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt2_info->stmt, 0);
> +   }
> +
> +  /* Since STMT2_INFO is a pattern statement, we can change it
> +in-situ without worrying about changing the code for the
> +containing block.  */
> +  gimple_assign_set_rhs1 (stmt2_info->stmt, new_rhs);
> +
> +  if (dump_enabled_p ())
> +   {
> + dump_printf_loc (MSG_NOTE, vect_location, "into: ");
> + dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt1, 0);
> + dump_printf_loc (MSG_NOTE, vect_location, "and: ");
> + dump_gimple_stmt (MSG_NOTE, TDF_SLIM, stmt2_info->stmt, 0);
> +   }
> +
> +  gimple_seq *def_seq = _VINFO_PATTERN_DEF_SEQ (orig_stmt2_info);
> +  if (STMT_VINFO_RELATED_STMT (orig_stmt2_info) == stmt2_info->stmt)
> +   /* STMT2_INFO is the actual pattern statement.  Add STMT1
> +  to the end of the definition sequence.  */
> +   gimple_seq_add_stmt_without_update (def_seq, stmt1);
> +  else
> +   {
> + /* STMT2_INFO belongs to the definition sequence.  Insert STMT1
> +before it.  */
> + gimple_stmt_iterator gsi = gsi_for_stmt (stmt2_info->stmt, def_seq);
> + gsi_insert_before_without_update (, stmt1, GSI_SAME_STMT);
> +   }
> +  return true;
> +}
> +  else
> +{
> +  /* STMT2_INFO doesn't yet have a pattern.  Try to create a
> +two-statement pattern now.  */
> +  gcc_assert (!STMT_VINFO_RELATED_STMT (stmt2_info));
> +  tree lhs_type = TREE_TYPE (gimple_get_lhs (stmt2_info->stmt));
> +  tree lhs_vectype = get_vectype_for_scalar_type (lhs_type);
> +  if (!lhs_vectype)
> +   return false;
> +
> +  if 

Re: [14/n] PR85694: Rework overwidening detection

2018-07-02 Thread Richard Biener
On Fri, Jun 29, 2018 at 1:36 PM Richard Sandiford
 wrote:
>
> Richard Sandiford  writes:
> > This patch is the main part of PR85694.  The aim is to recognise at least:
> >
> >   signed char *a, *b, *c;
> >   ...
> >   for (int i = 0; i < 2048; i++)
> > c[i] = (a[i] + b[i]) >> 1;
> >
> > as an over-widening pattern, since the addition and shift can be done
> > on shorts rather than ints.  However, it ended up being a lot more
> > general than that.
> >
> > The current over-widening pattern detection is limited to a few simple
> > cases: logical ops with immediate second operands, and shifts by a
> > constant.  These cases are enough for common pixel-format conversion
> > and can be detected in a peephole way.
> >
> > The loop above requires two generalisations of the current code: support
> > for addition as well as logical ops, and support for non-constant second
> > operands.  These are harder to detect in the same peephole way, so the
> > patch tries to take a more global approach.
> >
> > The idea is to get information about the minimum operation width
> > in two ways:
> >
> > (1) by using the range information attached to the SSA_NAMEs
> > (effectively a forward walk, since the range info is
> > context-independent).
> >
> > (2) by back-propagating the number of output bits required by
> > users of the result.
> >
> > As explained in the comments, there's a balance to be struck between
> > narrowing an individual operation and fitting in with the surrounding
> > code.  The approach is pretty conservative: if we could narrow an
> > operation to N bits without changing its semantics, it's OK to do that if:
> >
> > - no operations later in the chain require more than N bits; or
> >
> > - all internally-defined inputs are extended from N bits or fewer,
> >   and at least one of them is single-use.
> >
> > See the comments for the rationale.
> >
> > I didn't bother adding STMT_VINFO_* wrappers for the new fields
> > since the code seemed more readable without.
> >
> > Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
>
> Here's a version rebased on top of current trunk.  Changes from last time:
>
> - reintroduce dump_generic_expr_loc, with the obvious change to the
>   prototype
>
> - fix a typo in a comment
>
> - use vect_element_precision from the new version of 12/n.
>
> Tested as before.  OK to install?

OK.

Richard.

> Richard
>
>
> 2018-06-29  Richard Sandiford  
>
> gcc/
> * poly-int.h (print_hex): New function.
> * dumpfile.h (dump_generic_expr_loc, dump_dec, dump_hex): Declare.
> * dumpfile.c (dump_generic_expr): Fix formatting.
> (dump_generic_expr_loc): New function.
> (dump_dec, dump_hex): New poly_wide_int functions.
> * tree-vectorizer.h (_stmt_vec_info): Add min_output_precision,
> min_input_precision, operation_precision and operation_sign.
> * tree-vect-patterns.c (vect_get_range_info): New function.
> (vect_same_loop_or_bb_p, vect_single_imm_use)
> (vect_operation_fits_smaller_type): Delete.
> (vect_look_through_possible_promotion): Add an optional
> single_use_p parameter.
> (vect_recog_over_widening_pattern): Rewrite to use new
> stmt_vec_info infomration.  Handle one operation at a time.
> (vect_recog_cast_forwprop_pattern, vect_narrowable_type_p)
> (vect_truncatable_operation_p, vect_set_operation_type)
> (vect_set_min_input_precision): New functions.
> (vect_determine_min_output_precision_1): Likewise.
> (vect_determine_min_output_precision): Likewise.
> (vect_determine_precisions_from_range): Likewise.
> (vect_determine_precisions_from_users): Likewise.
> (vect_determine_stmt_precisions, vect_determine_precisions): Likewise.
> (vect_vect_recog_func_ptrs): Put over_widening first.
> Add cast_forwprop.
> (vect_pattern_recog): Call vect_determine_precisions.
>
> gcc/testsuite/
> * gcc.dg/vect/vect-over-widen-1.c: Update the scan tests for new
> over-widening messages.
> * gcc.dg/vect/vect-over-widen-1-big-array.c: Likewise.
> * gcc.dg/vect/vect-over-widen-2.c: Likewise.
> * gcc.dg/vect/vect-over-widen-2-big-array.c: Likewise.
> * gcc.dg/vect/vect-over-widen-3.c: Likewise.
> * gcc.dg/vect/vect-over-widen-3-big-array.c: Likewise.
> * gcc.dg/vect/vect-over-widen-4.c: Likewise.
> * gcc.dg/vect/vect-over-widen-4-big-array.c: Likewise.
> * gcc.dg/vect/bb-slp-over-widen-1.c: New test.
> * gcc.dg/vect/bb-slp-over-widen-2.c: Likewise.
> * gcc.dg/vect/vect-over-widen-5.c: Likewise.
> * gcc.dg/vect/vect-over-widen-6.c: Likewise.
> * gcc.dg/vect/vect-over-widen-7.c: Likewise.
> * gcc.dg/vect/vect-over-widen-8.c: Likewise.
> * gcc.dg/vect/vect-over-widen-9.c: Likewise.
> * gcc.dg/vect/vect-over-widen-10.c: Likewise.
> * 

Re: [PATCH] -fopt-info: add indentation via DUMP_VECT_SCOPE

2018-07-02 Thread Christophe Lyon
On Fri, 29 Jun 2018 at 10:09, Richard Biener  wrote:
>
> On Tue, Jun 26, 2018 at 5:43 PM David Malcolm  wrote:
> >
> > This patch adds a concept of nested "scopes" to dumpfile.c's dump_*_loc
> > calls, and wires it up to the DUMP_VECT_SCOPE macro in tree-vectorizer.h,
> > so that the nested structure is shown in -fopt-info by indentation.
> >
> > For example, this converts -fopt-info-all e.g. from:
> >
> > test.c:8:3: note: === analyzing loop ===
> > test.c:8:3: note: === analyze_loop_nest ===
> > test.c:8:3: note: === vect_analyze_loop_form ===
> > test.c:8:3: note: === get_loop_niters ===
> > test.c:8:3: note: symbolic number of iterations is (unsigned int) n_9(D)
> > test.c:8:3: note: not vectorized: loop contains function calls or data 
> > references that cannot be analyzed
> > test.c:8:3: note: vectorized 0 loops in function
> >
> > to:
> >
> > test.c:8:3: note: === analyzing loop ===
> > test.c:8:3: note:  === analyze_loop_nest ===
> > test.c:8:3: note:   === vect_analyze_loop_form ===
> > test.c:8:3: note:=== get_loop_niters ===
> > test.c:8:3: note:   symbolic number of iterations is (unsigned int) n_9(D)
> > test.c:8:3: note:   not vectorized: loop contains function calls or data 
> > references that cannot be analyzed
> > test.c:8:3: note: vectorized 0 loops in function
> >
> > showing that the "symbolic number of iterations" message is within
> > the "=== analyze_loop_nest ===" (and not within the
> > "=== vect_analyze_loop_form ===").
> >
> > This is also enabling work for followups involving optimization records
> > (allowing the records to directly capture the nested structure of the
> > dump messages).
> >
> > Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
> >
> > OK for trunk?
>

Hi,

I've noticed that this patch (r262246) caused regressions on aarch64:
gcc.dg/vect/slp-perm-1.c -flto -ffat-lto-objects  scan-tree-dump
vect "note: Built SLP cancelled: can use load/store-lanes"
gcc.dg/vect/slp-perm-1.c scan-tree-dump vect "note: Built SLP
cancelled: can use load/store-lanes"
gcc.dg/vect/slp-perm-2.c -flto -ffat-lto-objects  scan-tree-dump
vect "note: Built SLP cancelled: can use load/store-lanes"
gcc.dg/vect/slp-perm-2.c scan-tree-dump vect "note: Built SLP
cancelled: can use load/store-lanes"
gcc.dg/vect/slp-perm-3.c -flto -ffat-lto-objects  scan-tree-dump
vect "note: Built SLP cancelled: can use load/store-lanes"
gcc.dg/vect/slp-perm-3.c scan-tree-dump vect "note: Built SLP
cancelled: can use load/store-lanes"
gcc.dg/vect/slp-perm-5.c -flto -ffat-lto-objects  scan-tree-dump
vect "note: Built SLP cancelled: can use load/store-lanes"
gcc.dg/vect/slp-perm-5.c scan-tree-dump vect "note: Built SLP
cancelled: can use load/store-lanes"
gcc.dg/vect/slp-perm-6.c -flto -ffat-lto-objects  scan-tree-dump
vect "note: Built SLP cancelled: can use load/store-lanes"
gcc.dg/vect/slp-perm-6.c scan-tree-dump vect "note: Built SLP
cancelled: can use load/store-lanes"
gcc.dg/vect/slp-perm-7.c -flto -ffat-lto-objects  scan-tree-dump
vect "note: Built SLP cancelled: can use load/store-lanes"
gcc.dg/vect/slp-perm-7.c scan-tree-dump vect "note: Built SLP
cancelled: can use load/store-lanes"
gcc.dg/vect/slp-perm-8.c -flto -ffat-lto-objects  scan-tree-dump
vect "note: Built SLP cancelled: can use load/store-lanes"
gcc.dg/vect/slp-perm-8.c scan-tree-dump vect "note: Built SLP
cancelled: can use load/store-lanes"

The problem is that now there are more spaces between "note:" and
"Built", the attached small patch does that for slp-perm-1.c.
Is it the right way of fixing it or do we want to accept any amount of
spaces for instance?

I'm surprised there is such little impact on the testsuite though.

If OK, I'll update the patch to take the other slp-perm-[235678].c
tests into account.

Thanks,

Christophe

> OK and sorry for the delay.
> Richard.
>
> > gcc/ChangeLog:
> > * dumpfile.c (dump_loc): Add indentation based on scope depth.
> > (dump_scope_depth): New variable.
> > (get_dump_scope_depth): New function.
> > (dump_begin_scope): New function.
> > (dump_end_scope): New function.
> > * dumpfile.h (get_dump_scope_depth): New declaration.
> > (dump_begin_scope): New declaration.
> > (dump_end_scope): New declaration.
> > (class auto_dump_scope): New class.
> > (AUTO_DUMP_SCOPE): New macro.
> > * tree-vectorizer.h (DUMP_VECT_SCOPE): Reimplement in terms of
> > AUTO_DUMP_SCOPE.
> > ---
> >  gcc/dumpfile.c| 35 +++
> >  gcc/dumpfile.h| 39 +++
> >  gcc/tree-vectorizer.h | 15 ---
> >  3 files changed, 82 insertions(+), 7 deletions(-)
> >
> > diff --git a/gcc/dumpfile.c b/gcc/dumpfile.c
> > index 122e420..190b52d 100644
> > --- a/gcc/dumpfile.c
> > +++ b/gcc/dumpfile.c
> > @@ -419,6 +419,8 @@ dump_loc (dump_flags_t dump_kind, FILE *dfile, 
> > source_location 

Re: [PATCH][arm] Avoid STRD with odd register for TARGET_ARM in output_move_double

2018-07-02 Thread Christophe Lyon
On Fri, 29 Jun 2018 at 15:32, Kyrill Tkachov
 wrote:
>
> Hi all,
>
> In this testcase the user forces an odd register as the starting reg for a 
> DFmode value.
> The output_move_double function tries to store that using an STRD instruction.
> But for TARGET_ARM the starting register of an STRD must be an even one.
> This is always the case with compiler-allocated registers for DFmode values, 
> but the
> inline assembly forced our hand here.
>
> This patch  restricts the STRD-emitting logic in output_move_double to not 
> avoid
> odd-numbered source registers in STRD.
> I'm not a fan of the whole function, we should be exposing a lot of the logic 
> in there
> to RTL rather than at the final output stage, but that would need to be fixed 
> separately.
>
> This patch is much safer for backporting purposes.
>
> Bootstrapped and tested on arm-none-linux-gnueabihf.
>

Hi Kyrill,

I think you want to skip this test if one overrides -mfloat-abi, like
the small attached patch does.

OK?

Thanks,

Christophe

> Committing to trunk.
> Thanks,
> Kyrill
>
> 2018-06-29  Kyrylo Tkachov  
>
>  * config/arm/arm.c (output_move_double): Don't allow STRD instructions
>  if starting source register is not even.
>
> 2018-06-29  Kyrylo Tkachov  
>
>  * gcc.target/arm/arm-soft-strd-even.c: New test.
gcc/testsuite/ChangeLog:

2018-07-02  Christophe Lyon  

* gcc.target/arm/arm-soft-strd-even.c: Skip if -mfloat-abi is
overriden.

diff --git a/gcc/testsuite/gcc.target/arm/arm-soft-strd-even.c 
b/gcc/testsuite/gcc.target/arm/arm-soft-strd-even.c
index fb7317c..4ef3dd8 100644
--- a/gcc/testsuite/gcc.target/arm/arm-soft-strd-even.c
+++ b/gcc/testsuite/gcc.target/arm/arm-soft-strd-even.c
@@ -1,5 +1,6 @@
 /* { dg-do assemble } */
 /* { dg-require-effective-target arm_arm_ok } */
+/* { dg-skip-if "do not override -mfloat-abi" { *-*-* } { "-mfloat-abi=*" } 
{"-mfloat-abi=soft" } } */
 /* { dg-options "-O2 -marm -mfloat-abi=soft" } */
 
 /* Check that we don't try to emit STRD in ARM state with


Re: Limit Debug mode impact: overload __niter_base

2018-07-02 Thread Jonathan Wakely

On 01/07/18 21:20 +0200, François Dumont wrote:

    Here is a new proposal between yours and mine.

    It is still adding a function to wrap what __niter_base unwrap, I 
called it __nwrap_iter for this reason. But it takes advantage of 


Since "niter" refers to __normal_iterator I think a name based on
"niter" would be better than nsomething_iter.

__niter_wrap
__niter_rewrap
__niter_lift (misuse of functional programming term?)
__niter_raise (misuse of linear algebra term?)
__make_niter
__remake_niter


knowing that __niter_base will only unwrap random access iterator to 
use an expression to that will do the right thing, no matter the 
original iterator type.


OK, since __niter_base only transforms types based on 
__normal_iterator that seems safe to assume (in theory we could use

__normal_iterator with non-random access iterators, but we don't).

Could you please add a comment to the __nwrap_iter saying something
like:

 // Reverse the __niter_base transformation to get a
 // __normal_iterator back again (this assumes that __normal_iterator
 // is only used to wrap random access iterators, like pointers).


    I eventually found no issue in the testsuite, despite the 
std::equal change. I might have had a local invalid test.


Yes, I *did* test it already :-)





diff --git a/libstdc++-v3/include/bits/stl_algobase.h 
b/libstdc++-v3/include/bits/stl_algobase.h
index d429943..003ae8d 100644
--- a/libstdc++-v3/include/bits/stl_algobase.h
+++ b/libstdc++-v3/include/bits/stl_algobase.h
@@ -277,6 +277,18 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
__niter_base(_Iterator __it)
{ return __it; }

+  // Convert an iterator of type _From to an iterator of type _To.
+  // e.g. from int* to __normal_iterator.
+  template
+inline _Iterator
+__nwrap_iter(_Iterator, _Iterator, _Iterator __res)
+{ return __res; }
+
+  template
+inline _From
+__nwrap_iter(_From __from, _To __to, _To __res)
+{ return __from + (__res - __to); }


Every time you call this function you pass it:

 __nwrap_iter(x, __niter_base(x), y)

So can the __niter_base(x) call happen inside __nwrap_iter?

i.e.

 template
   inline _From
   __nwrap_iter(_From __from, _To __res)
   { return __from + (__res - __niter_base(__from)); }




Re: [PATCH, aarch64 4/4] aarch64: Add movprfx patterns for zero and unmatched select

2018-07-02 Thread Richard Sandiford
Richard Henderson  writes:
>   * config/aarch64/aarch64-protos.h, config/aarch64/aarch64.c
>   (aarch64_sve_prepare_conditional_op): Remove.
>   * config/aarch64/aarch64-sve.md (cond_):
>   Allow aarch64_simd_reg_or_zero as select operand; remove
>   the aarch64_sve_prepare_conditional_op call.
>   (cond_): Likewise.
>   (cond_): Likewise.
>   (*cond__z): New pattern.
>   (*cond__z): New pattern.
>   (*cond__z): New pattern.
>   (*cond__any): New pattern.
>   (*cond__any): New pattern.
>   (*cond__any): New pattern
>   and a splitters to match all of the *_any patterns.
>   * config/aarch64/predicates.md (aarch64_sve_any_binary_operator): New.
> ---
>  gcc/config/aarch64/aarch64-protos.h |   1 -
>  gcc/config/aarch64/aarch64.c|  54 --
>  gcc/config/aarch64/aarch64-sve.md   | 154 
>  gcc/config/aarch64/predicates.md|   3 +
>  4 files changed, 136 insertions(+), 76 deletions(-)

OK, thanks.

Richard

>
> diff --git a/gcc/config/aarch64/aarch64-protos.h 
> b/gcc/config/aarch64/aarch64-protos.h
> index 87c6ae20278..514ddc457ca 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -513,7 +513,6 @@ bool aarch64_gen_adjusted_ldpstp (rtx *, bool, 
> scalar_mode, RTX_CODE);
>  void aarch64_expand_sve_vec_cmp_int (rtx, rtx_code, rtx, rtx);
>  bool aarch64_expand_sve_vec_cmp_float (rtx, rtx_code, rtx, rtx, bool);
>  void aarch64_expand_sve_vcond (machine_mode, machine_mode, rtx *);
> -void aarch64_sve_prepare_conditional_op (rtx *, unsigned int, bool);
>  #endif /* RTX_CODE */
>  
>  void aarch64_init_builtins (void);
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 3af7e98e166..d75d45f4b8b 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -16058,60 +16058,6 @@ aarch64_expand_sve_vcond (machine_mode data_mode, 
> machine_mode cmp_mode,
>emit_set_insn (ops[0], gen_rtx_UNSPEC (data_mode, vec, UNSPEC_SEL));
>  }
>  
> -/* Prepare a cond_ operation that has the operands
> -   given by OPERANDS, where:
> -
> -   - operand 0 is the destination
> -   - operand 1 is a predicate
> -   - operands 2 to NOPS - 2 are the operands to an operation that is
> - performed for active lanes
> -   - operand NOPS - 1 specifies the values to use for inactive lanes.
> -
> -   COMMUTATIVE_P is true if operands 2 and 3 are commutative.  In that case,
> -   no pattern is provided for a tie between operands 3 and NOPS - 1.  */
> -
> -void
> -aarch64_sve_prepare_conditional_op (rtx *operands, unsigned int nops,
> - bool commutative_p)
> -{
> -  /* We can do the operation directly if the "else" value matches one
> - of the other inputs.  */
> -  for (unsigned int i = 2; i < nops - 1; ++i)
> -if (rtx_equal_p (operands[i], operands[nops - 1]))
> -  {
> - if (i == 3 && commutative_p)
> -   std::swap (operands[2], operands[3]);
> - return;
> -  }
> -
> -  /* If the "else" value is different from the other operands, we have
> - the choice of doing a SEL on the output or a SEL on an input.
> - Neither choice is better in all cases, but one advantage of
> - selecting the input is that it can avoid a move when the output
> - needs to be distinct from the inputs.  E.g. if operand N maps to
> - register N, selecting the output would give:
> -
> - MOVPRFX Z0.S, Z2.S
> - ADD Z0.S, P1/M, Z0.S, Z3.S
> - SEL Z0.S, P1, Z0.S, Z4.S
> -
> - whereas selecting the input avoids the MOVPRFX:
> -
> - SEL Z0.S, P1, Z2.S, Z4.S
> - ADD Z0.S, P1/M, Z0.S, Z3.S.
> -
> - ??? Matching the other input can produce
> -
> - MOVPRFX Z4.S, P1/M, Z2.S
> - ADD Z4.S, P1/M, Z4.S, Z3.S
> -   */
> -  machine_mode mode = GET_MODE (operands[0]);
> -  rtx temp = gen_reg_rtx (mode);
> -  rtvec vec = gen_rtvec (3, operands[1], operands[2], operands[nops - 1]);
> -  emit_set_insn (temp, gen_rtx_UNSPEC (mode, vec, UNSPEC_SEL));
> -  operands[2] = operands[nops - 1] = temp;
> -}
> -
>  /* Implement TARGET_MODES_TIEABLE_P.  In principle we should always return
> true.  However due to issues with register allocation it is preferable
> to avoid tieing integer scalar and FP scalar modes.  Executing integer
> diff --git a/gcc/config/aarch64/aarch64-sve.md 
> b/gcc/config/aarch64/aarch64-sve.md
> index db16affc093..b16d0455159 100644
> --- a/gcc/config/aarch64/aarch64-sve.md
> +++ b/gcc/config/aarch64/aarch64-sve.md
> @@ -1817,13 +1817,10 @@
>  (SVE_INT_BINARY:SVE_I
>(match_operand:SVE_I 2 "register_operand")
>(match_operand:SVE_I 3 "register_operand"))
> -(match_operand:SVE_I 4 "register_operand")]
> +(match_operand:SVE_I 4 "aarch64_simd_reg_or_zero")]
> UNSPEC_SEL))]
>"TARGET_SVE"
> -{
> -  bool commutative_p = (GET_RTX_CLASS () == RTX_COMM_ARITH);
> -  aarch64_sve_prepare_conditional_op 

Re: [PATCH, aarch64 2/4] aarch64: Remove predicate from inside SVE_COND_FP_BINARY

2018-07-02 Thread Richard Sandiford
Richard Henderson  writes:
> The predicate is present within the containing UNSPEC_SEL;
> there is no need to duplicate it.
>
>   * config/aarch64/aarch64-sve.md (cond_):
>   Remove match_dup 1 from the inner unspec.
>   (*cond_): Likewise.

OK, thanks.

Richard

> ---
>  gcc/config/aarch64/aarch64-sve.md | 9 +++--
>  1 file changed, 3 insertions(+), 6 deletions(-)
>
> diff --git a/gcc/config/aarch64/aarch64-sve.md 
> b/gcc/config/aarch64/aarch64-sve.md
> index 3dee6a4376d..2aceef65c80 100644
> --- a/gcc/config/aarch64/aarch64-sve.md
> +++ b/gcc/config/aarch64/aarch64-sve.md
> @@ -2677,8 +2677,7 @@
>   (unspec:SVE_F
> [(match_operand: 1 "register_operand")
>  (unspec:SVE_F
> -  [(match_dup 1)
> -   (match_operand:SVE_F 2 "register_operand")
> +  [(match_operand:SVE_F 2 "register_operand")
> (match_operand:SVE_F 3 "register_operand")]
>SVE_COND_FP_BINARY)
>  (match_operand:SVE_F 4 "register_operand")]
> @@ -2694,8 +2693,7 @@
>   (unspec:SVE_F
> [(match_operand: 1 "register_operand" "Upl")
>  (unspec:SVE_F
> -  [(match_dup 1)
> -   (match_operand:SVE_F 2 "register_operand" "0")
> +  [(match_operand:SVE_F 2 "register_operand" "0")
> (match_operand:SVE_F 3 "register_operand" "w")]
>SVE_COND_FP_BINARY)
>  (match_dup 2)]
> @@ -2710,8 +2708,7 @@
>   (unspec:SVE_F
> [(match_operand: 1 "register_operand" "Upl")
>  (unspec:SVE_F
> -  [(match_dup 1)
> -   (match_operand:SVE_F 2 "register_operand" "w")
> +  [(match_operand:SVE_F 2 "register_operand" "w")
> (match_operand:SVE_F 3 "register_operand" "0")]
>SVE_COND_FP_BINARY)
>  (match_dup 3)]


Re: [PATCH, aarch64 1/4] aarch64: Add movprfx alternatives for unpredicated patterns

2018-07-02 Thread Richard Sandiford
Richard Henderson  writes:
>   * config/aarch64/aarch64.md (movprfx): New attr.
>   (length): Default movprfx to 8.
>   * config/aarch64/aarch64-sve.md (*mul3): Add movprfx alt.
>   (*madd, *msub   (*mul3_highpart): Likewise.
>   (*3): Likewise.
>   (*v3): Likewise.
>   (*3): Likewise.
>   (*3): Likewise.
>   (*fma4, *fnma4): Likewise.
>   (*fms4, *fnms4): Likewise.
>   (*div4): Likewise.

OK, thanks.

Richard

> ---
>  gcc/config/aarch64/aarch64-sve.md | 184 ++
>  gcc/config/aarch64/aarch64.md |  11 +-
>  2 files changed, 116 insertions(+), 79 deletions(-)
>
> diff --git a/gcc/config/aarch64/aarch64-sve.md 
> b/gcc/config/aarch64/aarch64-sve.md
> index 8e2433385a8..3dee6a4376d 100644
> --- a/gcc/config/aarch64/aarch64-sve.md
> +++ b/gcc/config/aarch64/aarch64-sve.md
> @@ -937,47 +937,53 @@
>  ;; to gain much and would make the instruction seem less uniform to the
>  ;; register allocator.
>  (define_insn "*mul3"
> -  [(set (match_operand:SVE_I 0 "register_operand" "=w, w")
> +  [(set (match_operand:SVE_I 0 "register_operand" "=w, w, ?")
>   (unspec:SVE_I
> -   [(match_operand: 1 "register_operand" "Upl, Upl")
> +   [(match_operand: 1 "register_operand" "Upl, Upl, Upl")
>  (mult:SVE_I
> -  (match_operand:SVE_I 2 "register_operand" "%0, 0")
> -  (match_operand:SVE_I 3 "aarch64_sve_mul_operand" "vsm, w"))]
> +  (match_operand:SVE_I 2 "register_operand" "%0, 0, w")
> +  (match_operand:SVE_I 3 "aarch64_sve_mul_operand" "vsm, w, w"))]
> UNSPEC_MERGE_PTRUE))]
>"TARGET_SVE"
>"@
> mul\t%0., %0., #%3
> -   mul\t%0., %1/m, %0., %3."
> +   mul\t%0., %1/m, %0., %3.
> +   movprfx\t%0, %2\;mul\t%0., %1/m, %0., %3."
> +  [(set_attr "movprfx" "*,*,yes")]
>  )
>  
>  (define_insn "*madd"
> -  [(set (match_operand:SVE_I 0 "register_operand" "=w, w")
> +  [(set (match_operand:SVE_I 0 "register_operand" "=w, w, ?")
>   (plus:SVE_I
> (unspec:SVE_I
> - [(match_operand: 1 "register_operand" "Upl, Upl")
> -  (mult:SVE_I (match_operand:SVE_I 2 "register_operand" "%0, w")
> -  (match_operand:SVE_I 3 "register_operand" "w, w"))]
> + [(match_operand: 1 "register_operand" "Upl, Upl, Upl")
> +  (mult:SVE_I (match_operand:SVE_I 2 "register_operand" "%0, w, w")
> +  (match_operand:SVE_I 3 "register_operand" "w, w, w"))]
>   UNSPEC_MERGE_PTRUE)
> -   (match_operand:SVE_I 4 "register_operand" "w, 0")))]
> +   (match_operand:SVE_I 4 "register_operand" "w, 0, w")))]
>"TARGET_SVE"
>"@
> mad\t%0., %1/m, %3., %4.
> -   mla\t%0., %1/m, %2., %3."
> +   mla\t%0., %1/m, %2., %3.
> +   movprfx\t%0, %4\;mla\t%0., %1/m, %2., %3."
> +  [(set_attr "movprfx" "*,*,yes")]
>  )
>  
>  (define_insn "*msub3"
> -  [(set (match_operand:SVE_I 0 "register_operand" "=w, w")
> +  [(set (match_operand:SVE_I 0 "register_operand" "=w, w, ?")
>   (minus:SVE_I
> -   (match_operand:SVE_I 4 "register_operand" "w, 0")
> +   (match_operand:SVE_I 4 "register_operand" "w, 0, w")
> (unspec:SVE_I
> - [(match_operand: 1 "register_operand" "Upl, Upl")
> -  (mult:SVE_I (match_operand:SVE_I 2 "register_operand" "%0, w")
> -  (match_operand:SVE_I 3 "register_operand" "w, w"))]
> + [(match_operand: 1 "register_operand" "Upl, Upl, Upl")
> +  (mult:SVE_I (match_operand:SVE_I 2 "register_operand" "%0, w, w")
> +  (match_operand:SVE_I 3 "register_operand" "w, w, w"))]
>   UNSPEC_MERGE_PTRUE)))]
>"TARGET_SVE"
>"@
> msb\t%0., %1/m, %3., %4.
> -   mls\t%0., %1/m, %2., %3."
> +   mls\t%0., %1/m, %2., %3.
> +   movprfx\t%0, %4\;mls\t%0., %1/m, %2., %3."
> +  [(set_attr "movprfx" "*,*,yes")]
>  )
>  
>  ;; Unpredicated highpart multiplication.
> @@ -997,15 +1003,18 @@
>  
>  ;; Predicated highpart multiplication.
>  (define_insn "*mul3_highpart"
> -  [(set (match_operand:SVE_I 0 "register_operand" "=w")
> +  [(set (match_operand:SVE_I 0 "register_operand" "=w, ?")
>   (unspec:SVE_I
> -   [(match_operand: 1 "register_operand" "Upl")
> -(unspec:SVE_I [(match_operand:SVE_I 2 "register_operand" "%0")
> -   (match_operand:SVE_I 3 "register_operand" "w")]
> +   [(match_operand: 1 "register_operand" "Upl, Upl")
> +(unspec:SVE_I [(match_operand:SVE_I 2 "register_operand" "%0, w")
> +   (match_operand:SVE_I 3 "register_operand" "w, w")]
>MUL_HIGHPART)]
> UNSPEC_MERGE_PTRUE))]
>"TARGET_SVE"
> -  "mulh\t%0., %1/m, %0., %3."
> +  "@
> +   mulh\t%0., %1/m, %0., %3.
> +   movprfx\t%0, %2\;mulh\t%0., %1/m, %0., %3."
> +  [(set_attr "movprfx" "*,yes")]
>  )
>  
>  ;; Unpredicated division.
> @@ -1025,17 +1034,19 @@
>  
>  ;; Division predicated with a PTRUE.
>  (define_insn "*3"
> -  [(set (match_operand:SVE_SDI 0 "register_operand" "=w, 

Re: [PATCH, aarch64 3/4] aarch64: Add movprfx alternatives for predicate patterns

2018-07-02 Thread Richard Sandiford
Richard Henderson  writes:
> @@ -2687,34 +2738,60 @@
>aarch64_sve_prepare_conditional_op (operands, 5, );
>  })
>  
> -;; Predicated floating-point operations.
> -(define_insn "*cond_"
> -  [(set (match_operand:SVE_F 0 "register_operand" "=w")
> +;; Predicated floating-point operations with select matching output.
> +(define_insn "*cond__0"
> +  [(set (match_operand:SVE_F 0 "register_operand" "+w, w, ?")
>   (unspec:SVE_F
> -   [(match_operand: 1 "register_operand" "Upl")
> +   [(match_operand: 1 "register_operand" "Upl, Upl, Upl")
>  (unspec:SVE_F
> -  [(match_operand:SVE_F 2 "register_operand" "0")
> -   (match_operand:SVE_F 3 "register_operand" "w")]
> +  [(match_dup 1)
> +   (match_operand:SVE_F 2 "register_operand" "0, w, w")
> +   (match_operand:SVE_F 3 "register_operand" "w, 0, w")]
> +  SVE_COND_FP_BINARY)
> +(match_dup 0)]
> +   UNSPEC_SEL))]
> +  "TARGET_SVE"
> +  "@
> +   \t%0., %1/m, %0., %3.
> +   \t%0., %1/m, %0., %2.
> +   movprfx\t%0, %1/m, %2\;\t%0., %1/m, %0., 
> %3."
> +  [(set_attr "movprfx" "*,*,yes")]
> +)

Reintroduces a (match_dup 1) into the SVE_COND_FP_BINARY.

OK otherwise, thanks.

The original reason for using SVE_COND_FP_BINARY rather than rtx codes
was to emphasise that nothing happens for inactive lanes: this is really
a predicated operation that returns "don't care" values for inactive lanes
fused with a select that "happens" to use (but in fact always uses) the same
predicate.  So from that point of view it seemed natural for both unspecs
to have the predicate.

OTOH, since SVE_COND_FP_BINARY is never used independently, and since it's
an unspec, I guess it doesn't matter much either way.

Richard


Re: [PATCH][aarch64] Avoid tag collisions for loads on falkor

2018-07-02 Thread Siddhesh Poyarekar

On 07/02/2018 03:29 PM, Kyrill Tkachov wrote:

Nice! What were the regressions though? Would be nice to adjust the tests
to make them more robust so that we have as clean a testsuite as possible.


Sure, they're gcc.dg/guality/pr36728-2.c and gcc.target/aarch64/extend.c.

The addressing mode costs for falkor lead to generation of an sbfiz + 
ldr for extend.c instead of the ldr with sxtw.  Luis is looking at 
whether that is the best output for falkor or if it needs to be 
improved.  I suspect this may result in a cost adjustment.


pr36728-2.c reorders code and seems to throw off gdb but the codegen 
seems correct.  This patch is not responsible for this regression though 
(nor extend.c) so I didn't look too far beyond verifying that the 
codegen wasn't incorrect.



More comments inline, but a general observation:
in the function comment for the new functions can you please include a 
description
of the function arguments and the meaning of the return value (for 
example, some functions return -1 ; what does that mean?).
It really does make it much easier to maintain the code after some time 
has passed.


OK.

+   rudimentarny attempt to ensure that related loads with the same 
tags don't

+   get moved out unnecessarily.


s/rudimentarny/rudimentary/


OK.


+  tag_insn_info (rtx_insn *insn, rtx dest, rtx base, rtx offset,
+    bool writeback, bool ldp)
+    {
+  this->insn = insn;
+  this->dest = dest;
+  this->base = base;
+  this->offset = offset;
+  this->writeback = writeback;
+  this->ldp = ldp;
+    }
+


Since this is C++ you can write it as the more idiomatic constructor 
initialiser list (I think that's what it's called):
tag_insn_info (rtx_insn *i, rtx b, rtx d, rtx o, bool wr, bool l) : insn 
(i), base (b), dest (d) etc.


OK.


+  /* Compute the tag based on BASE, DEST and OFFSET of the load.  */
+  unsigned tag ()
+    {
+  unsigned int_offset = 0;
+  rtx offset = this->offset;
+  unsigned dest = REGNO (this->dest);
+  unsigned base = REGNO (this->base);
+  machine_mode dest_mode = GET_MODE (this->dest);
+  unsigned dest_mode_size = GET_MODE_SIZE (dest_mode).to_constant 
();

+


I appreciate this pass is unlikely to be used with SVE code but it would 
be nice if we could make it
variable-with-mode-proof. Current practice is to add a comment to 
.to_constant () calls explaining why
we guarantee that the size is constant, or otherwise check is_constant 
() and have appropriate fallbacks.
Check other uses of to_constant () and is_constant () in aarch64.c for 
examples. This applies to all uses

of to_constant () in this file.


OK.


+ recog_memoized (insn);


Did you mean to continue here if recog_memoized (insn) < 0 ?


I didn't, thanks for catching that.

+ /* Don't bother with very long strides because the 
prefetcher

+    is unable to train on them anyway.  */
+ if (INTVAL (stride) < 2048)
+   return true;


I appreciate this is a core-specific but can you please at least make it 
a #define constant with

a meaningful name and use that?


OK.

+  /* The largest width we want to bother with is a load of a pair of 
qud-words.  */


"quad-words"


OK.

Thanks,
Siddhesh


Re: [patch] jump threading multiple paths that start from the same BB

2018-07-02 Thread Christophe Lyon
On Sun, 1 Jul 2018 at 12:56, Aldy Hernandez  wrote:
>
>
>
> On 06/29/2018 02:50 PM, Jeff Law wrote:
> > [ Returning to another old patch... ]
> >
> > On 11/07/2017 10:33 AM, Aldy Hernandez wrote:
> >> [One more time, but without rejected HTML mail, because apparently this
> >> is my first post to gcc-patches *ever* ;-)].
> >>
> >> Howdy!
> >>
> >> While poking around in the backwards threader I noticed that we bail if
> >> we have already seen a starting BB.
> >>
> >>/* Do not jump-thread twice from the same block.  */
> >>if (bitmap_bit_p (threaded_blocks, entry->src->index)
> >>
> >> This limitation discards paths that are sub-paths of paths that have
> >> already been threaded.
> >>
> >> The following patch scans the remaining to-be-threaded paths to identify
> >> if any of them start from the same point, and are thus sub-paths of the
> >> just-threaded path.  By removing the common prefix of blocks in upcoming
> >> threadable paths, and then rewiring first non-common block
> >> appropriately, we expose new threading opportunities, since we are no
> >> longer starting from the same BB.  We also simplify the would-be
> >> threaded paths, because we don't duplicate already duplicated paths.
> >>
> >> This sounds confusing, but the documentation for the entry point to my
> >> patch (adjust_paths_after_duplication) shows an actual example:
> >>
> >> +/* After an FSM path has been jump threaded, adjust the remaining FSM
> >> +   paths that are subsets of this path, so these paths can be safely
> >> +   threaded within the context of the new threaded path.
> >> +
> >> +   For example, suppose we have just threaded:
> >> +
> >> +   5 -> 6 -> 7 -> 8 -> 12  =>  5 -> 6' -> 7' -> 8' -> 12'
> >> +
> >> +   And we have an upcoming threading candidate:
> >> +   5 -> 6 -> 7 -> 8 -> 15 -> 20
> >> +
> >> +   This function adjusts the upcoming path into:
> >> +   8' -> 15 -> 20
> >>
> >> Notice that we will no longer have two paths that start from the same
> >> BB.  One will start with bb5, while the adjusted path will start with
> >> bb8'.  With this we kill two birds-- we are able to thread more paths,
> >> and these paths will avoid duplicating a whole mess of things that have
> >> already been threaded.
> >>
> >> The attached patch is a subset of some upcoming work that can live on
> >> its own.  It bootstraps and regtests.  Also, by running it on a handful
> >> of .ii files, I can see that we are able to thread sub-paths that we
> >> previously dropped on the floor.  More is better, right? :)
> >>
> >> To test this, I stole Jeff's method of using cachegrind to benchmark
> >> instruction counts and conditional branches
> >> (https://gcc.gnu.org/ml/gcc-patches/2013-11/msg02434.html).
> >>
> >> Basically, I bootstrapped two compilers, with and without improvements,
> >> and used each to build a stage1 trunk.  Each of these stage1-trunks were
> >> used on 246 .ii GCC files I have lying around from a bootstrap some
> >> random point last year.  I used no special flags on builds apart from
> >> --enable-languages=c,c++.
> >>
> >> Although I would've wished a larger improvement, this works comes for
> >> free, as it's just a subset of other work I'm hacking on.
> >>
> >> Without further ado, here are my monumental, earth shattering improvements:
> >>
> >> Conditional branches
> >> Without patch: 411846839709
> >> Withpatch: 411831330316
> >>  %changed: -0.0037660%
> >>
> >> Number of instructions
> >> Without patch: 2271844650634
> >> Withpatch: 2271761153578
> >>  %changed: -0.0036754%
> >>
> >>
> >> OK for trunk?
> >> Aldy
> >>
> >> p.s. There is a lot of debugging/dumping code in my patch, which I can
> >> gladly remove if/when approved.  It helps keep my head straight while
> >> looking at this spaghetti :).
> >>
> >> curr.patch
> >>
> >>
> >> gcc/
> >>
> >>  * tree-ssa-threadupdate.c (mark_threaded_blocks): Avoid
> >>  dereferencing path[] beyond its length.
> >>  (debug_path): New.
> >>  (debug_all_paths): New.
> >>  (rewire_first_differing_edge): New.
> >>  (adjust_paths_after_duplication): New.
> >>  (duplicate_thread_path): Call adjust_paths_after_duplication.
> >>  Add new argument.
> >>  (thread_through_all_blocks): Add new argument to
> >>  duplicate_thread_path.
> > This is fine for the trunk.  I'd keep the dumping code as-is.  It'll be
> > useful in the future :-)
>
> Retested against current trunk and committed.
>

Hi,

I've noticed a regression on aarch64:
FAIL: gcc.dg/tree-ssa/ssa-dom-thread-7.c scan-tree-dump thread3 "Jumps
threaded: 3"
very likely caused by this patch (appeared between 262282 and 262294)

Christophe

> Thanks.
>
> Aldy


Re: [14/n] PR85694: Rework overwidening detection

2018-07-02 Thread Christophe Lyon
On Fri, 29 Jun 2018 at 13:36, Richard Sandiford
 wrote:
>
> Richard Sandiford  writes:
> > This patch is the main part of PR85694.  The aim is to recognise at least:
> >
> >   signed char *a, *b, *c;
> >   ...
> >   for (int i = 0; i < 2048; i++)
> > c[i] = (a[i] + b[i]) >> 1;
> >
> > as an over-widening pattern, since the addition and shift can be done
> > on shorts rather than ints.  However, it ended up being a lot more
> > general than that.
> >
> > The current over-widening pattern detection is limited to a few simple
> > cases: logical ops with immediate second operands, and shifts by a
> > constant.  These cases are enough for common pixel-format conversion
> > and can be detected in a peephole way.
> >
> > The loop above requires two generalisations of the current code: support
> > for addition as well as logical ops, and support for non-constant second
> > operands.  These are harder to detect in the same peephole way, so the
> > patch tries to take a more global approach.
> >
> > The idea is to get information about the minimum operation width
> > in two ways:
> >
> > (1) by using the range information attached to the SSA_NAMEs
> > (effectively a forward walk, since the range info is
> > context-independent).
> >
> > (2) by back-propagating the number of output bits required by
> > users of the result.
> >
> > As explained in the comments, there's a balance to be struck between
> > narrowing an individual operation and fitting in with the surrounding
> > code.  The approach is pretty conservative: if we could narrow an
> > operation to N bits without changing its semantics, it's OK to do that if:
> >
> > - no operations later in the chain require more than N bits; or
> >
> > - all internally-defined inputs are extended from N bits or fewer,
> >   and at least one of them is single-use.
> >
> > See the comments for the rationale.
> >
> > I didn't bother adding STMT_VINFO_* wrappers for the new fields
> > since the code seemed more readable without.
> >
> > Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
>
> Here's a version rebased on top of current trunk.  Changes from last time:
>
> - reintroduce dump_generic_expr_loc, with the obvious change to the
>   prototype
>
> - fix a typo in a comment
>
> - use vect_element_precision from the new version of 12/n.
>
> Tested as before.  OK to install?
>

Hi Richard,

This patch introduces regressions on arm-none-linux-gnueabihf:
gcc.dg/vect/vect-over-widen-1-big-array.c -flto -ffat-lto-objects
scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 2
gcc.dg/vect/vect-over-widen-1-big-array.c scan-tree-dump-times
vect "vect_recog_widen_shift_pattern: detected" 2
gcc.dg/vect/vect-over-widen-1.c -flto -ffat-lto-objects
scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 2
gcc.dg/vect/vect-over-widen-1.c scan-tree-dump-times vect
"vect_recog_widen_shift_pattern: detected" 2
gcc.dg/vect/vect-over-widen-4-big-array.c -flto -ffat-lto-objects
scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 2
gcc.dg/vect/vect-over-widen-4-big-array.c scan-tree-dump-times
vect "vect_recog_widen_shift_pattern: detected" 2
gcc.dg/vect/vect-over-widen-4.c -flto -ffat-lto-objects
scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 2
gcc.dg/vect/vect-over-widen-4.c scan-tree-dump-times vect
"vect_recog_widen_shift_pattern: detected" 2
gcc.dg/vect/vect-widen-shift-s16.c -flto -ffat-lto-objects
scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 8
gcc.dg/vect/vect-widen-shift-s16.c scan-tree-dump-times vect
"vect_recog_widen_shift_pattern: detected" 8
gcc.dg/vect/vect-widen-shift-s8.c -flto -ffat-lto-objects
scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 1
gcc.dg/vect/vect-widen-shift-s8.c scan-tree-dump-times vect
"vect_recog_widen_shift_pattern: detected" 1
gcc.dg/vect/vect-widen-shift-u16.c -flto -ffat-lto-objects
scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 1
gcc.dg/vect/vect-widen-shift-u16.c scan-tree-dump-times vect
"vect_recog_widen_shift_pattern: detected" 1
gcc.dg/vect/vect-widen-shift-u8.c -flto -ffat-lto-objects
scan-tree-dump-times vect "vect_recog_widen_shift_pattern: detected" 2
gcc.dg/vect/vect-widen-shift-u8.c scan-tree-dump-times vect
"vect_recog_widen_shift_pattern: detected" 2

Christophe

> Richard
>
>
> 2018-06-29  Richard Sandiford  
>
> gcc/
> * poly-int.h (print_hex): New function.
> * dumpfile.h (dump_generic_expr_loc, dump_dec, dump_hex): Declare.
> * dumpfile.c (dump_generic_expr): Fix formatting.
> (dump_generic_expr_loc): New function.
> (dump_dec, dump_hex): New poly_wide_int functions.
> * tree-vectorizer.h (_stmt_vec_info): Add min_output_precision,
> min_input_precision, operation_precision and operation_sign.
> * tree-vect-patterns.c (vect_get_range_info): New 

[C++ Patch] Adjust one more error message to use rich_location::add_range

2018-07-02 Thread Paolo Carlini

Hi,

I was double checking my pending patch and going through the errors we 
emit in decl.c and elsewhere about thread_local and __thread and noticed 
another place, in parser.c, where using rich_location::add_range seems 
natural. Note, we could in principle swap location and 
decl_specs->locations[ds_thread] in the error basing on the gnu bool and 
ensure that the caret always points to __thread. All in all, I don't 
think it's worth it...


Thanks, Paolo.

///

/cp
2018-07-02  Paolo Carlini  

* parser.c (set_and_check_decl_spec_loc): Use rich_location::add_range
in error message about __thread and thread_local at the same time.

/testsuite
2018-07-02  Paolo Carlini  

* g++.dg/diagnostic/thread-thread_local.C: New.
Index: cp/parser.c
===
--- cp/parser.c (revision 262300)
+++ cp/parser.c (working copy)
@@ -28377,12 +28377,15 @@ set_and_check_decl_spec_loc (cp_decl_specifier_seq
   else if (ds == ds_thread)
{
  bool gnu = token_is__thread (token);
+ gcc_rich_location richloc (location);
  if (gnu != decl_specs->gnu_thread_keyword_p)
-   error_at (location,
- "both %<__thread%> and % specified");
+   {
+ richloc.add_range (decl_specs->locations[ds_thread], false);
+ error_at (,
+   "both %<__thread%> and % specified");
+   }
  else
{
- gcc_rich_location richloc (location);
  richloc.add_fixit_remove ();
  error_at (, "duplicate %qD", token->u.value);
}
Index: testsuite/g++.dg/diagnostic/thread-thread_local.C
===
--- testsuite/g++.dg/diagnostic/thread-thread_local.C   (nonexistent)
+++ testsuite/g++.dg/diagnostic/thread-thread_local.C   (working copy)
@@ -0,0 +1,13 @@
+// { dg-options "-fdiagnostics-show-caret" }
+// { dg-do compile { target c++11 } }
+
+thread_local __thread int a;  // { dg-error "14:both .__thread. and 
.thread_local. specified" }
+/* { dg-begin-multiline-output "" }
+ thread_local __thread int a;
+  ^~~~
+   { dg-end-multiline-output "" } */
+__thread thread_local int b;  // { dg-error "10:both .__thread. and 
.thread_local. specified" }
+/* { dg-begin-multiline-output "" }
+ __thread thread_local int b;
+  ^~~~
+   { dg-end-multiline-output "" } */


Re: [PATCH][wwwdocs] Mention Cortex-A76 support in GCC 9 changes.html

2018-07-02 Thread Richard Earnshaw (lists)
On 29/06/18 14:29, Kyrill Tkachov wrote:
> Hi all,
> 
> This patch adds support for the Arm Cortex-A76 processor in changes.html
> for GCC 9.
> It enables the AArch64 section of the page and adds the news blob there.
> It also adds an entry to the already-existing arm entry.
> 
> Ok to commit to CVS (for the aarch64 parts)?
> 

OK.

R.

> Thanks,
> Kyrill
> 
> wwwdocs-a76.patch
> 
> 
> Index: htdocs/gcc-9/changes.html
> ===
> RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-9/changes.html,v
> retrieving revision 1.9
> diff -U 3 -r1.9 changes.html
> --- htdocs/gcc-9/changes.html 20 Jun 2018 16:15:35 -  1.9
> +++ htdocs/gcc-9/changes.html 27 Jun 2018 10:25:19 -
> @@ -77,13 +77,41 @@
>  
>  New Targets and Target Specific Improvements
>  
> -
> +AArch64
> +
> +  
> +Support has been added for the following processors
> +(GCC identifiers in parentheses):
> +
> + Arm Cortex-A76 (cortex-a76).
> + Arm Cortex-A55/Cortex-A76 DynamIQ big.LITTLE 
> (cortex-a76.cortex-a55).
> +
> +The GCC identifiers can be used
> +as arguments to the -mcpu or -mtune options,
> +for example: -mcpu=cortex-a76 or
> +-mtune=cortex-a76.cortex-a55 or as arguments to the 
> equivalent target
> +attributes and pragmas.
> +  
> +
>  
>  
>  
>  ARM
>  
>
> +Support has been added for the following processors
> +(GCC identifiers in parentheses):
> +
> + Arm Cortex-A76 (cortex-a76).
> + Arm Cortex-A55/Cortex-A76 DynamIQ big.LITTLE 
> (cortex-a76.cortex-a55).
> +
> +The GCC identifiers can be used
> +as arguments to the -mcpu or -mtune options,
> +for example: -mcpu=cortex-a76 or
> +-mtune=cortex-a76.cortex-a55 or as arguments to the 
> equivalent target
> +attributes and pragmas.
> +  
> +  
>  Support for the deprecated Armv2 and Armv3 architectures and their
>  variants has been removed.  Their corresponding -march
>  values and the -mcpu options that used these architectures
> 



Re: [PATCH 3/3][POPCOUNT] Remove unnecessary if condition in phiopt

2018-07-02 Thread Richard Biener
On Mon, Jul 2, 2018 at 3:15 AM Kugan Vivekanandarajah
 wrote:
>
> Hi Richard,
>
> On 29 June 2018 at 18:45, Richard Biener  wrote:
> > On Wed, Jun 27, 2018 at 7:09 AM Kugan Vivekanandarajah
> >  wrote:
> >>
> >> Hi Richard,
> >>
> >> Thanks for the review,
> >>
> >> On 25 June 2018 at 20:20, Richard Biener  
> >> wrote:
> >> > On Fri, Jun 22, 2018 at 11:16 AM Kugan Vivekanandarajah
> >> >  wrote:
> >> >>
> >> >> gcc/ChangeLog:
> >> >
> >> > @@ -1516,6 +1521,114 @@ minmax_replacement (basic_block cond_bb,
> >> > basic_block middle_bb,
> >> >
> >> >return true;
> >> >  }
> >> > +/* Convert
> >> > +
> >> > +   
> >> > +   if (b_4(D) != 0)
> >> > +   goto 
> >> >
> >> > vertical space before the comment.
> >> Done.
> >>
> >> >
> >> > + edge e0 ATTRIBUTE_UNUSED, edge e1
> >> > ATTRIBUTE_UNUSED,
> >> >
> >> > why pass them if they are unused?
> >> Removed.
> >>
> >> >
> >> > +  if (stmt_count != 2)
> >> > +return false;
> >> >
> >> > so what about the case when there is no conversion?
> >> Done.
> >>
> >> >
> >> > +  /* Check that we have a popcount builtin.  */
> >> > +  if (!is_gimple_call (popcount)
> >> > +  || !gimple_call_builtin_p (popcount, BUILT_IN_NORMAL))
> >> > +return false;
> >> > +  tree fndecl = gimple_call_fndecl (popcount);
> >> > +  if ((DECL_FUNCTION_CODE (fndecl) != BUILT_IN_POPCOUNT)
> >> > +  && (DECL_FUNCTION_CODE (fndecl) != BUILT_IN_POPCOUNTL)
> >> > +  && (DECL_FUNCTION_CODE (fndecl) != BUILT_IN_POPCOUNTLL))
> >> > +return false;
> >> >
> >> > look at popcount handling in tree-vrp.c how to properly also handle
> >> > IFN_POPCOUNT.
> >> > (CASE_CFN_POPCOUNT)
> >> Done.
> >> >
> >> > +  /* Cond_bb has a check for b_4 != 0 before calling the popcount
> >> > + builtin.  */
> >> > +  if (gimple_code (cond) != GIMPLE_COND
> >> > +  || gimple_cond_code (cond) != NE_EXPR
> >> > +  || TREE_CODE (gimple_cond_lhs (cond)) != SSA_NAME
> >> > +  || rhs != gimple_cond_lhs (cond))
> >> > +return false;
> >> >
> >> > The check for SSA_NAME is redundant.
> >> > You fail to check that gimple_cond_rhs is zero.
> >> Done.
> >>
> >> >
> >> > +  /* Remove the popcount builtin and cast stmt.  */
> >> > +  gsi = gsi_for_stmt (popcount);
> >> > +  gsi_remove (, true);
> >> > +  gsi = gsi_for_stmt (cast);
> >> > +  gsi_remove (, true);
> >> > +
> >> > +  /* And insert the popcount builtin and cast stmt before the cond_bb.  
> >> > */
> >> > +  gsi = gsi_last_bb (cond_bb);
> >> > +  gsi_insert_before (, popcount, GSI_NEW_STMT);
> >> > +  gsi_insert_before (, cast, GSI_NEW_STMT);
> >> >
> >> > use gsi_move_before ().  You need to reset flow sensitive info on the
> >> > LHS of the popcount call as well as on the LHS of the cast.
> >> Done.
> >>
> >> >
> >> > You fail to check the PHI operand on the false edge.  Consider
> >> >
> >> >  if (b != 0)
> >> >res = __builtin_popcount (b);
> >> >  else
> >> >res = 1;
> >> >
> >> > You fail to check the PHI operand on the true edge.  Consider
> >> >
> >> >  res = 0;
> >> >  if (b != 0)
> >> >{
> >> >   __builtin_popcount (b);
> >> >   res = 2;
> >> >}
> >> >
> >> > and using -fno-tree-dce and whatever you need to keep the
> >> > popcount call in the IL.  A gimple testcase for phiopt will do.
> >> >
> >> > Your testcase relies on popcount detection.  Please write it
> >> > using __builtin_popcount instead.  Write one with a cast and
> >> > one without.
> >> Added the testcases.
> >>
> >> Is this OK now.
> >
> > +  for (gsi = gsi_start_bb (middle_bb); !gsi_end_p (gsi); gsi_next ())
> > +{
> >
> > use gsi_after_labels (middle_bb)
> >
> > +  popcount = last_stmt (middle_bb);
> > +  if (popcount == NULL)
> > +return false;
> >
> > after the counting this test is always false, remove it.
> >
> > +  /* We have a cast stmt feeding popcount builtin.  */
> > +  cast = first_stmt (middle_bb);
> >
> > looking at the implementation of first_stmt this will
> > give you a label in case the BB has one.  I think it's better
> > to merge this and the above with the "counting" like
> >
> > gsi = gsi_start_nondebug_after_labels_bb (middle_bb);
> > if (gsi_end_p (gsi))
> >   return false;
> > cast = gsi_stmt (gsi);
> > gsi_next_nondebug ();
> > if (!gsi_end_p (gsi))
> >   {
> > popcount = gsi_stmt (gsi);
> > gsi_next_nondebug ();
> > if (!gsi_end_p (gsi))
> >return false;
> >   }
> > else
> >   {
> > popcount = cast;
> > cast = NULL;
> >   }
>
> Done.
> >
> > +  /* Check that we have a cast prior to that.  */
> > +  if (gimple_code (cast) != GIMPLE_ASSIGN
> > + || gimple_assign_rhs_code (cast) != NOP_EXPR)
> > +   return false;
> >
> > CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (cast))
> >
> Done.
>
> > +  /* Check PHI arguments.  */
> > +  if (lhs != arg0 || !integer_zerop (arg1))
> > +return false;
> >
> > that is not sufficient, you do not know whether arg0 is the true
> > value or the false value.  The edge flags will 

Re: [i386] Do not omit the frame pointer at -O0

2018-07-02 Thread Eric Botcazou
> LGTM, but please note that the patch was already approved by Jeff on
> 22th of June [1].

Sorry, I missed that...  Thanks for pointing it out.

-- 
Eric Botcazou


Re: [PATCH][aarch64] Avoid tag collisions for loads on falkor

2018-07-02 Thread Kyrill Tkachov

Hi Siddhesh,

On 02/07/18 10:15, Siddhesh Poyarekar wrote:

Hi,

This is a rewrite of the tag collision avoidance patch that Kugan had
written as a machine reorg pass back in February[1].

The falkor hardware prefetching system uses a combination of the
source, destination and offset to decide which prefetcher unit to
train with the load.  This is great when loads in a loop are
sequential but sub-optimal if there are unrelated loads in a loop that
tag to the same prefetcher unit.

This pass attempts to rename the desination register of such colliding
loads using routines available in regrename.c so that their tags do
not collide.  This shows some performance gains with mcf and xalancbmk
(~5% each) and will be tweaked further.  The pass is placed near the
fag end of the pass list so that subsequent passes don't inadvertantly
end up undoing the renames.

A full gcc bootstrap and testsuite ran successfully on aarch64, i.e. it
did not introduce any new regressions.  I also did a make-check with
-mcpu=falkor to ensure that there were no regressions.  The couple of
regressions I found were target-specific and were related to scheduling
and cost differences and are not correctness issues.



Nice! What were the regressions though? Would be nice to adjust the tests
to make them more robust so that we have as clean a testsuite as possible.


[1] https://patchwork.ozlabs.org/patch/872532/

2018-07-02  Siddhesh Poyarekar 
Kugan Vivekanandarajah 

* config/aarch64/falkor-tag-collision-avoidance.c: New file.
* config.gcc (extra_objs): Build it.
* config/aarch64/t-aarch64 (falkor-tag-collision-avoidance.o):
Likewise.
* config/aarch64/aarch64-passes.def
(pass_tag_collision_avoidance): New pass.
* config/aarch64/aarch64.c (qdf24xx_tunings): Add
AARCH64_EXTRA_TUNE_RENAME_LOAD_REGS to tuning_flags.
(aarch64_classify_address): Remove static qualifier.
(aarch64_address_info, aarch64_address_type): Move to...
* config/aarch64/aarch64-protos.h: ... here.
(make_pass_tag_collision_avoidance): New function.
* config/aarch64/aarch64-tuning-flags.def (rename_load_regs):
New tuning flag.



More comments inline, but a general observation:
in the function comment for the new functions can you please include a 
description
of the function arguments and the meaning of the return value (for example, 
some functions return -1 ; what does that mean?).
It really does make it much easier to maintain the code after some time has 
passed.

Thanks,
Kyrill


---
 gcc/config.gcc|   2 +-
 gcc/config/aarch64/aarch64-passes.def |   1 +
 gcc/config/aarch64/aarch64-protos.h   |  49 ++
 gcc/config/aarch64/aarch64-tuning-flags.def   |   2 +
 gcc/config/aarch64/aarch64.c  |  48 +-
 .../aarch64/falkor-tag-collision-avoidance.c  | 821 ++
 gcc/config/aarch64/t-aarch64  |   9 +
 8 files changed, 891 insertions(+), 46 deletions(-)
 create mode 100644 gcc/config/aarch64/falkor-tag-collision-avoidance.c

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 4d9f9c6ea29..b78a30f5d69 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -304,7 +304,7 @@ aarch64*-*-*)
 extra_headers="arm_fp16.h arm_neon.h arm_acle.h"
 c_target_objs="aarch64-c.o"
 cxx_target_objs="aarch64-c.o"
-   extra_objs="aarch64-builtins.o aarch-common.o cortex-a57-fma-steering.o"
+   extra_objs="aarch64-builtins.o aarch-common.o cortex-a57-fma-steering.o 
falkor-tag-collision-avoidance.o"
target_gtfiles="\$(srcdir)/config/aarch64/aarch64-builtins.c"
 target_has_targetm_common=yes
 ;;
diff --git a/gcc/config/aarch64/aarch64-passes.def 
b/gcc/config/aarch64/aarch64-passes.def
index 87747b420b0..f61a8870aa1 100644
--- a/gcc/config/aarch64/aarch64-passes.def
+++ b/gcc/config/aarch64/aarch64-passes.def
@@ -19,3 +19,4 @@
. */

 INSERT_PASS_AFTER (pass_regrename, 1, pass_fma_steering);
+INSERT_PASS_AFTER (pass_machine_reorg, 1, pass_tag_collision_avoidance);
diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index 4ea50acaa59..175a3faf057 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -283,6 +283,49 @@ struct tune_params
   const struct cpu_prefetch_tune *prefetch;
 };

+/* Classifies an address.
+
+   ADDRESS_REG_IMM
+   A simple base register plus immediate offset.
+
+   ADDRESS_REG_WB
+   A base register indexed by immediate offset with writeback.
+
+   ADDRESS_REG_REG
+   A base register indexed by (optionally scaled) register.
+
+   ADDRESS_REG_UXTW
+   A base register indexed by (optionally scaled) zero-extended register.
+
+   ADDRESS_REG_SXTW
+   A base register indexed by (optionally scaled) sign-extended register.
+
+   ADDRESS_LO_SUM
+   A LO_SUM rtx with a base register and "LO12" symbol 

[PATCH][aarch64] Avoid tag collisions for loads on falkor

2018-07-02 Thread Siddhesh Poyarekar
Hi,

This is a rewrite of the tag collision avoidance patch that Kugan had
written as a machine reorg pass back in February[1].

The falkor hardware prefetching system uses a combination of the
source, destination and offset to decide which prefetcher unit to
train with the load.  This is great when loads in a loop are
sequential but sub-optimal if there are unrelated loads in a loop that
tag to the same prefetcher unit.

This pass attempts to rename the desination register of such colliding
loads using routines available in regrename.c so that their tags do
not collide.  This shows some performance gains with mcf and xalancbmk
(~5% each) and will be tweaked further.  The pass is placed near the
fag end of the pass list so that subsequent passes don't inadvertantly
end up undoing the renames.

A full gcc bootstrap and testsuite ran successfully on aarch64, i.e. it
did not introduce any new regressions.  I also did a make-check with
-mcpu=falkor to ensure that there were no regressions.  The couple of
regressions I found were target-specific and were related to scheduling
and cost differences and are not correctness issues.

[1] https://patchwork.ozlabs.org/patch/872532/

2018-07-02  Siddhesh Poyarekar  
Kugan Vivekanandarajah  

* config/aarch64/falkor-tag-collision-avoidance.c: New file.
* config.gcc (extra_objs): Build it.
* config/aarch64/t-aarch64 (falkor-tag-collision-avoidance.o):
Likewise.
* config/aarch64/aarch64-passes.def
(pass_tag_collision_avoidance): New pass.
* config/aarch64/aarch64.c (qdf24xx_tunings): Add
AARCH64_EXTRA_TUNE_RENAME_LOAD_REGS to tuning_flags.
(aarch64_classify_address): Remove static qualifier.
(aarch64_address_info, aarch64_address_type): Move to...
* config/aarch64/aarch64-protos.h: ... here.
(make_pass_tag_collision_avoidance): New function.
* config/aarch64/aarch64-tuning-flags.def (rename_load_regs):
New tuning flag.

---
 gcc/config.gcc|   2 +-
 gcc/config/aarch64/aarch64-passes.def |   1 +
 gcc/config/aarch64/aarch64-protos.h   |  49 ++
 gcc/config/aarch64/aarch64-tuning-flags.def   |   2 +
 gcc/config/aarch64/aarch64.c  |  48 +-
 .../aarch64/falkor-tag-collision-avoidance.c  | 821 ++
 gcc/config/aarch64/t-aarch64  |   9 +
 8 files changed, 891 insertions(+), 46 deletions(-)
 create mode 100644 gcc/config/aarch64/falkor-tag-collision-avoidance.c

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 4d9f9c6ea29..b78a30f5d69 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -304,7 +304,7 @@ aarch64*-*-*)
extra_headers="arm_fp16.h arm_neon.h arm_acle.h"
c_target_objs="aarch64-c.o"
cxx_target_objs="aarch64-c.o"
-   extra_objs="aarch64-builtins.o aarch-common.o cortex-a57-fma-steering.o"
+   extra_objs="aarch64-builtins.o aarch-common.o cortex-a57-fma-steering.o 
falkor-tag-collision-avoidance.o"
target_gtfiles="\$(srcdir)/config/aarch64/aarch64-builtins.c"
target_has_targetm_common=yes
;;
diff --git a/gcc/config/aarch64/aarch64-passes.def 
b/gcc/config/aarch64/aarch64-passes.def
index 87747b420b0..f61a8870aa1 100644
--- a/gcc/config/aarch64/aarch64-passes.def
+++ b/gcc/config/aarch64/aarch64-passes.def
@@ -19,3 +19,4 @@
.  */
 
 INSERT_PASS_AFTER (pass_regrename, 1, pass_fma_steering);
+INSERT_PASS_AFTER (pass_machine_reorg, 1, pass_tag_collision_avoidance);
diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index 4ea50acaa59..175a3faf057 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -283,6 +283,49 @@ struct tune_params
   const struct cpu_prefetch_tune *prefetch;
 };
 
+/* Classifies an address.
+
+   ADDRESS_REG_IMM
+   A simple base register plus immediate offset.
+
+   ADDRESS_REG_WB
+   A base register indexed by immediate offset with writeback.
+
+   ADDRESS_REG_REG
+   A base register indexed by (optionally scaled) register.
+
+   ADDRESS_REG_UXTW
+   A base register indexed by (optionally scaled) zero-extended register.
+
+   ADDRESS_REG_SXTW
+   A base register indexed by (optionally scaled) sign-extended register.
+
+   ADDRESS_LO_SUM
+   A LO_SUM rtx with a base register and "LO12" symbol relocation.
+
+   ADDRESS_SYMBOLIC:
+   A constant symbolic address, in pc-relative literal pool.  */
+
+enum aarch64_address_type {
+  ADDRESS_REG_IMM,
+  ADDRESS_REG_WB,
+  ADDRESS_REG_REG,
+  ADDRESS_REG_UXTW,
+  ADDRESS_REG_SXTW,
+  ADDRESS_LO_SUM,
+  ADDRESS_SYMBOLIC
+};
+
+/* Address information.  */
+struct aarch64_address_info {
+  enum aarch64_address_type type;
+  rtx base;
+  rtx offset;
+  poly_int64 const_offset;
+  int shift;
+  enum aarch64_symbol_type symbol_type;
+};
+
 #define AARCH64_FUSION_PAIR(x, name) \
   AARCH64_FUSE_##name##_index, 
 /* 

Re: [PATCH] When using -fprofile-generate=/some/path mangle absolute path of file (PR lto/85759).

2018-07-02 Thread Rainer Orth
Hi Martin,

> On 06/22/2018 10:35 PM, Jeff Law wrote:
>> On 05/16/2018 05:53 AM, Martin Liška wrote:
>>> On 12/21/2017 10:13 AM, Martin Liška wrote:
 On 12/20/2017 06:45 PM, Jakub Jelinek wrote:
> Another thing is that the "/" in there is wrong, so
>   const char dir_separator_str[] = { DIR_SEPARATOR, '\0' };
>   char *b = concat (profile_data_prefix, dir_separator_str, pwd, NULL);
> needs to be used instead.
 This looks much nicer, I forgot about DIR_SEPARATOR.

> Does profile_data_prefix have any dir separators stripped from the end?
 That's easy to achieve..

> Is pwd guaranteed to be relative in this case?
 .. however this is absolute path, which would be problematic on a DOC
 based FS.
 Maybe we should do the same path mangling as we do for purpose of gcov:

 https://github.com/gcc-mirror/gcc/blob/master/gcc/gcov.c#L2424
>>> Hi.
>>>
>>> I decided to implement that. Which means for:
>>>
>>> $ gcc -fprofile-generate=/tmp/myfolder empty.c -O2 && ./a.out 
>>>
>>> we get following file:
>>> /tmp/myfolder/#home#marxin#Programming#testcases#tmp#empty.gcda
>>>
>>> That guarantees we have a unique file path. As seen in the PR it
>>> can produce a funny ICE.
>>>
>>> I've been testing the patch.
>>> Ready after it finishes tests?
>>>
>>> Martin
>>>
 What do you think about it?
 Regarding the string manipulation: I'm not an expert, but work with
 string in C
 is for me always a pain :)

 Martin

>>>
>>> 0001-When-using-fprofile-generate-some-path-mangle-absolu.patch
>>>
>>>
>>> From 386a4561a4d1501e8959871791289e95f6a89af5 Mon Sep 17 00:00:00 2001
>>> From: marxin 
>>> Date: Wed, 16 Aug 2017 10:22:57 +0200
>>> Subject: [PATCH] When using -fprofile-generate=/some/path mangle absolute 
>>> path
>>>  of file (PR lto/85759).
>>>
>>> gcc/ChangeLog:
>>>
>>> 2018-05-16  Martin Liska  
>>>
>>> PR lto/85759
>>> * coverage.c (coverage_init): Mangle full path name.
>>> * doc/invoke.texi: Document the change.
>>> * gcov-io.c (mangle_path): New.
>>> * gcov-io.h (mangle_path): Likewise.
>>> * gcov.c (mangle_name): Use mangle_path for path mangling.
>> ISTM you can self-approve this now if you want it to move forward :-)
>> 
>> jeff
>> 
>
> Sure, let me install the patch then.

your patch caused 3 testcases to regress everywhere:

+UNRESOLVED: gcc.dg/pr47793.c scan-file .
+FAIL: gcc.dg/profile-dir-1.c scan-ipa-dump cgraph " ./profile-dir-1.gcda"
+FAIL: gcc.dg/profile-dir-3.c scan-ipa-dump cgraph " ./profile-dir-3.gcda"

Please fix.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [i386] Do not omit the frame pointer at -O0

2018-07-02 Thread Uros Bizjak
On Mon, Jul 2, 2018 at 10:14 AM, Eric Botcazou  wrote:
> Ping for https://gcc.gnu.org/ml/gcc-patches/2018-06/msg01228.html
>
> Thanks in advance.

LGTM, but please note that the patch was already approved by Jeff on
22th of June [1].

[1] https://gcc.gnu.org/ml/gcc-patches/2018-06/msg01466.html

Uros.


Re: [testsuite/guality, committed] Prevent optimization of local in vla-1.c

2018-07-02 Thread Jakub Jelinek
On Mon, Jul 02, 2018 at 09:44:04AM +0200, Richard Biener wrote:
> Given the array has size i + 1 it's upper bound should be 'i' and 'i'
> should be available via DW_OP_[GNU_]entry_value.
> 
> I see it is
> 
> <175>   DW_AT_upper_bound : 10 byte block: 75 1 8 20 24 8 20 26 31
> 1c   (DW_OP_breg5 (rdi): 1; DW_OP_const1u: 32; DW_OP_shl;
> DW_OP_const1u: 32; DW_OP_shra; DW_OP_lit1; DW_OP_minus)
> 
> and %rdi is 1.  Not sure why gdb fails to print it's length.  Yes, the
> storage itself doesn't have a location but the
> type specifies the size.
> 
> (gdb) ptype a
> type = char [variable length]
> (gdb) p sizeof(a)
> $3 = 0
> 
> this looks like a gdb bug to me?
> 
> Btw, the location expression looks odd, if I deciper it correctly
> we end up with ((%rdi << 32) >> 32) - 1 which computes to 4
> but the upper bound should be 5.  The GIMPLE debug stmts compute
> the upper bound as (sizetype)((long)(i_1(D) + 1) - 1)

The << 32 >> 32 is sign extension.  And yes, for f1 I don't see why
DW_OP_GNU_entry_value shouldn't work, i in main is needed for the call to
f2, so needs to live in some register or memory in that function until the
second call.  For f2 i is needed after the bar call for the a[i + 4] read,
worst case in form of precomputed i + 4, but that is reversible op.

Jakub


[i386] Do not omit the frame pointer at -O0

2018-07-02 Thread Eric Botcazou
Ping for https://gcc.gnu.org/ml/gcc-patches/2018-06/msg01228.html

Thanks in advance.

-- 
Eric Botcazou


Re: [testsuite/guality, committed] Prevent optimization of local in vla-1.c

2018-07-02 Thread Richard Biener
On Sun, Jul 1, 2018 at 9:25 PM Tom de Vries  wrote:
>
> On 07/01/2018 09:11 PM, Jakub Jelinek wrote:
> > On Sun, Jul 01, 2018 at 06:19:20PM +0200, Tom de Vries wrote:
> >> So, the local vla a is optimized away.
> >>
> >> This patch adds VOLATILE to 'a', which prevents it from being optimized 
> >> away,
> >> and fixes the non-lto failures.
> >>
> >> Committed as obvious.
> >
> > That isn't obvious, it is just wrong.
> > The intent of the testcase is to test how debugging of optimized code with
> > VLAs works.  With your change we don't test that anymore.  Please revert.
> >
>
> Hi,
>
> Sure, but ... if this is wrong, then for my understanding can you
> explain to me how the fail should be addressed?

Given the array has size i + 1 it's upper bound should be 'i' and 'i'
should be available via DW_OP_[GNU_]entry_value.

I see it is

<175>   DW_AT_upper_bound : 10 byte block: 75 1 8 20 24 8 20 26 31
1c   (DW_OP_breg5 (rdi): 1; DW_OP_const1u: 32; DW_OP_shl;
DW_OP_const1u: 32; DW_OP_shra; DW_OP_lit1; DW_OP_minus)

and %rdi is 1.  Not sure why gdb fails to print it's length.  Yes, the
storage itself doesn't have a location but the
type specifies the size.

(gdb) ptype a
type = char [variable length]
(gdb) p sizeof(a)
$3 = 0

this looks like a gdb bug to me?

Btw, the location expression looks odd, if I deciper it correctly
we end up with ((%rdi << 32) >> 32) - 1 which computes to 4
but the upper bound should be 5.  The GIMPLE debug stmts compute
the upper bound as (sizetype)((long)(i_1(D) + 1) - 1)

vartrack correctly prints

(debug_insn 7 26 8 2 (var_location:SI D#3 (plus:SI (entry_value:SI
(reg:SI 5 di [ i ]))
(const_int 1 [0x1])))
"/space/rguenther/src/gcc-slpcost/gcc/testsuite/gcc.dg/guality/vla-1.c":15
-1
 (nil))
(debug_insn 8 7 9 2 (var_location:DI D#2 (sign_extend:DI
(debug_expr:SI D#3)))
"/space/rguenther/src/gcc-slpcost/gcc/testsuite/gcc.dg/guality/vla-1.c":15
-1
 (nil))
(debug_insn 9 8 10 2 (var_location:DI D#1 (plus:DI (debug_expr:DI D#2)
(const_int -1 [0x])))
"/space/rguenther/src/gcc-slpcost/gcc/testsuite/gcc.dg/guality/vla-1.c":15
-1
 (nil))
(debug_insn 10 9 11 2 (var_location:DI D.1914 (debug_expr:DI D#1))
"/space/rguenther/src/gcc-slpcost/gcc/testsuite/gcc.dg/guality/vla-1.c":15
-1
 (nil))

Richard.

> [ FWIW, I considered this obvious, given the ok for "[testsuite] Fix
> guality/pr45882.c for flto"  (
> https://gcc.gnu.org/ml/gcc-patches/2018-06/msg01304.html ) which seemed
> similar to me. ]
>
> Thanks,
> - Tom
>
> >> [testsuite/guality] Prevent optimization of local in vla-1.c
> >>
> >> 2018-07-01  Tom de Vries  
> >>
> >>  * gcc.dg/guality/prevent-optimization.h (VOLATILE): Define.
> >>  * gcc.dg/guality/vla-1.c (f1): Mark local vla a as VOLATILE.
> >
> >   Jakub
> >


Re: abstract ABS_EXPR code for ranges into separate function

2018-07-02 Thread Richard Biener
On Sun, Jul 1, 2018 at 10:05 AM Aldy Hernandez  wrote:
>
> Boy those extract_range_from_*_expr functions are huge.
>
> OK to move the ABS_EXPR code into its own function?

OK.

Richard.

> Tested on x86-64 Linux.
>
> Aldy


Re: extract_range_from_binary* cleanups for VRP

2018-07-02 Thread Richard Biener
On Fri, Jun 29, 2018 at 7:55 PM Aldy Hernandez  wrote:
>
> Howdy!
>
> Attached are some cleanups to the VRP code dealing with PLUS/MINUS_EXPR
> on ranges.  This will make it easier to share code with any other range
> implementation in the future, but is completely independent from any
> other work.
>
> Currently there is a lot of code duplication in the PLUS/MINUS_EXPR
> code, which we can easily abstract out and make everything easier to
> read.  I have tried to keep functionality changes to a minimum to help
> in reviewing.
>
> A few minor things that are different:
>
> 1. As mentioned in a previous thread with Richard
> (https://gcc.gnu.org/ml/gcc/2018-06/msg00100.html), I would like to use
> the first variant here, as they seem to ultimately do the same thing:
>
> - /* Get the lower and upper bounds of the type.  */
> - if (TYPE_OVERFLOW_WRAPS (expr_type))
> -   {
> - type_min = wi::min_value (prec, sgn);
> - type_max = wi::max_value (prec, sgn);
> -   }
> - else
> -   {
> - type_min = wi::to_wide (vrp_val_min (expr_type));
> - type_max = wi::to_wide (vrp_val_max (expr_type));
> -   }
>
> 2. I've removed the code below, as it seems to be a remnant from when
> the comparisons were being done with double_int's.  The overflow checks
> were/are being done prior anyhow.  For that matter, I put in some
> gcc_unreachables in the code below, and never triggered it in a
> bootstrap + regtest.
>
> - /* Check for type overflow.  */
> - if (min_ovf == 0)
> -   {
> - if (wi::cmp (wmin, type_min, sgn) == -1)
> -   min_ovf = -1;
> - else if (wi::cmp (wmin, type_max, sgn) == 1)
> -   min_ovf = 1;
> -   }
> - if (max_ovf == 0)
> -   {
> - if (wi::cmp (wmax, type_min, sgn) == -1)
> -   max_ovf = -1;
> - else if (wi::cmp (wmax, type_max, sgn) == 1)
> -   max_ovf = 1;
> -   }
>
> Everything else is exactly as it was, just abstracted and moved around.
>
> To test this, I compared the results of every binary op before and after
> this patch, to make sure that we were getting the same exact ranges.
> There were no differences in a bootstrap plus regtest.
>
> OK for trunk?

OK.

Thanks,
Richard.