Re: [PATCH] Improve AVX512 sse movcc (PR target/88547)

2018-12-19 Thread Jakub Jelinek
On Thu, Dec 20, 2018 at 08:42:05AM +0100, Uros Bizjak wrote:
> > If one vcond argument is an all-ones (non-bool) vector and the other one is
> > all zeros, we can use the vpmovm2? insns for AVX512{DQ,BW} (sometimes + VL).
> > When op_true is all ones and op_false is all zeros, we emit large code that
> > the combiner often optimizes to that vpmovm2?; but if the arguments are
> > swapped, we emit vpxor + vpternlog + a masked move (blend), even though we
> > could just invert the mask with knot* and use vpmovm2?.
> >
> > Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
> > trunk?  The patch is large, but it is mostly reindentation; the attachment
> > contains a diff -ubpd variant of the i386.c changes to make it more
> > readable.
> >
> > 2018-12-19  Jakub Jelinek  
> >
> > PR target/88547
> > * config/i386/i386.c (ix86_expand_sse_movcc): For maskcmp, try to
> > emit vpmovm2? instruction perhaps after knot?.  Reorganize code
> > so that it doesn't have to test !maskcmp in almost every 
> > conditional.
> >
> > * gcc.target/i386/pr88547-1.c: New test.
> 
> LGTM, under assumption that interunit moves from mask reg to xmm regs are 
> fast.

In a simple benchmark (calling these functions in a tight loop on i9-7960X)
the performance is the same, just shorter sequences.
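
For reference, a minimal sketch of the kind of function involved (an
illustration only, with made-up names, not the actual
gcc.target/i386/pr88547-1.c testcase): the comparison produces a vector that
is all ones where the condition holds and all zeros elsewhere, which with
AVX512DQ can be emitted as vpcmpd + vpmovm2d (plus a knot* when the arms end
up swapped) instead of the longer vpxor/vpternlog/blend sequence.

typedef int v16si __attribute__ ((vector_size (64)));

v16si
cmp_lt (v16si x, v16si y)
{
  /* All ones where x < y, all zeros elsewhere.  */
  return x < y;
}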

Jakub


Re: [PATCH] Improve AVX512 sse movcc (PR target/88547)

2018-12-19 Thread Uros Bizjak
On Thu, Dec 20, 2018 at 12:20 AM Jakub Jelinek  wrote:
>
> Hi!
>
> If one vcond argument is an all-ones (non-bool) vector and the other one is
> all zeros, we can use the vpmovm2? insns for AVX512{DQ,BW} (sometimes + VL).
> When op_true is all ones and op_false is all zeros, we emit large code that
> the combiner often optimizes to that vpmovm2?; but if the arguments are
> swapped, we emit vpxor + vpternlog + a masked move (blend), even though we
> could just invert the mask with knot* and use vpmovm2?.
>
> Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
> trunk?  The patch is large, but it is mostly reindentation; the attachment
> contains a diff -ubpd variant of the i386.c changes to make it more
> readable.
>
> 2018-12-19  Jakub Jelinek  
>
> PR target/88547
> * config/i386/i386.c (ix86_expand_sse_movcc): For maskcmp, try to
> emit vpmovm2? instruction perhaps after knot?.  Reorganize code
> so that it doesn't have to test !maskcmp in almost every conditional.
>
> * gcc.target/i386/pr88547-1.c: New test.

LGTM, under assumption that interunit moves from mask reg to xmm regs are fast.

Thanks,
Uros.

> --- gcc/config/i386/i386.c.jj   2018-12-18 19:40:27.698946295 +0100
> +++ gcc/config/i386/i386.c  2018-12-19 17:14:24.948218640 +0100
> @@ -23593,33 +23593,117 @@ ix86_expand_sse_movcc (rtx dest, rtx cmp
>cmp = gen_rtx_SUBREG (mode, cmp, 0);
>  }
>
> -  if (vector_all_ones_operand (op_true, mode)
> -  && rtx_equal_p (op_false, CONST0_RTX (mode))
> -  && !maskcmp)
> +  if (maskcmp)
> +{
> +  rtx (*gen) (rtx, rtx) = NULL;
> +  if ((op_true == CONST0_RTX (mode)
> +  && vector_all_ones_operand (op_false, mode))
> + || (op_false == CONST0_RTX (mode)
> + && vector_all_ones_operand (op_true, mode)))
> +   switch (mode)
> + {
> + case E_V64QImode:
> +   if (TARGET_AVX512BW)
> + gen = gen_avx512bw_cvtmask2bv64qi;
> +   break;
> + case E_V32QImode:
> +   if (TARGET_AVX512VL && TARGET_AVX512BW)
> + gen = gen_avx512vl_cvtmask2bv32qi;
> +   break;
> + case E_V16QImode:
> +   if (TARGET_AVX512VL && TARGET_AVX512BW)
> + gen = gen_avx512vl_cvtmask2bv16qi;
> +   break;
> + case E_V32HImode:
> +   if (TARGET_AVX512BW)
> + gen = gen_avx512bw_cvtmask2wv32hi;
> +   break;
> + case E_V16HImode:
> +   if (TARGET_AVX512VL && TARGET_AVX512BW)
> + gen = gen_avx512vl_cvtmask2wv16hi;
> +   break;
> + case E_V8HImode:
> +   if (TARGET_AVX512VL && TARGET_AVX512BW)
> + gen = gen_avx512vl_cvtmask2wv8hi;
> +   break;
> + case E_V16SImode:
> +   if (TARGET_AVX512DQ)
> + gen = gen_avx512f_cvtmask2dv16si;
> +   break;
> + case E_V8SImode:
> +   if (TARGET_AVX512VL && TARGET_AVX512DQ)
> + gen = gen_avx512vl_cvtmask2dv8si;
> +   break;
> + case E_V4SImode:
> +   if (TARGET_AVX512VL && TARGET_AVX512DQ)
> + gen = gen_avx512vl_cvtmask2dv4si;
> +   break;
> + case E_V8DImode:
> +   if (TARGET_AVX512DQ)
> + gen = gen_avx512f_cvtmask2qv8di;
> +   break;
> + case E_V4DImode:
> +   if (TARGET_AVX512VL && TARGET_AVX512DQ)
> + gen = gen_avx512vl_cvtmask2qv4di;
> +   break;
> + case E_V2DImode:
> +   if (TARGET_AVX512VL && TARGET_AVX512DQ)
> + gen = gen_avx512vl_cvtmask2qv2di;
> +   break;
> + default:
> +   break;
> + }
> +  if (gen && SCALAR_INT_MODE_P (cmpmode))
> +   {
> + cmp = force_reg (cmpmode, cmp);
> + if (op_true == CONST0_RTX (mode))
> +   {
> + rtx (*gen_not) (rtx, rtx);
> + switch (cmpmode)
> +   {
> +   case E_QImode: gen_not = gen_knotqi; break;
> +   case E_HImode: gen_not = gen_knothi; break;
> +   case E_SImode: gen_not = gen_knotsi; break;
> +   case E_DImode: gen_not = gen_knotdi; break;
> +   default: gcc_unreachable ();
> +   }
> + rtx n = gen_reg_rtx (cmpmode);
> + emit_insn (gen_not (n, cmp));
> + cmp = n;
> +   }
> + emit_insn (gen (dest, cmp));
> + return;
> +   }
> +}
> +  else if (vector_all_ones_operand (op_true, mode)
> +  && op_false == CONST0_RTX (mode))
>  {
>emit_insn (gen_rtx_SET (dest, cmp));
> +  return;
>  }
> -  else if (op_false == CONST0_RTX (mode) && !maskcmp)
> +  else if (op_false == CONST0_RTX (mode))
>  {
>op_true = force_reg (mode, op_true);
>x = gen_rtx_AND (mode, cmp, op_true);
>emit_insn (gen_rtx_SET (dest, x));
> +  return;
>  }
> -  else if (op_true == CONST0_RTX (mod

Relax std::move_if_noexcept for std::pair

2018-12-19 Thread François Dumont

Hi

    I eventually found out what the problem was with
std::move_if_noexcept within associative containers.


    The std::pair defaulted move constructor might not move both the
first and the second member. If either one is not movable it will just
copy it, and then the noexcept qualification of that copy constructor
participates in the noexcept qualification of the std::pair move
constructor. So std::move_if_noexcept can end up deciding not to move
because a _copy_ constructor is not noexcept qualified.


    This is why I am partially specializing __move_if_noexcept_cond. As
there doesn't seem to be any Standard metafunction to find out whether a
move will actually take place, I resort to using std::is_const, since in
that case the compiler surely won't call the move constructor.
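
To make the failure mode concrete, here is a small standalone sketch (my
own illustration with made-up type names Key and Node, not code from the
patch or the testsuite): a map-style pair whose const key has a
potentially-throwing copy constructor is not nothrow move constructible,
so std::move_if_noexcept hands back a const lvalue and the whole pair is
copied, even though the second member could have been moved safely.

#include <utility>
#include <type_traits>

struct Key
{
  Key() = default;
  Key(const Key&) { }                  // copying may throw
  Key(Key&&) noexcept = default;
};

using Node = std::pair<const Key, int>;

// Moving the pair has to copy the const Key, and that copy is not
// noexcept, so the pair's defaulted move constructor is not noexcept.
static_assert(!std::is_nothrow_move_constructible<Node>::value, "");

// Hence move_if_noexcept selects the copying path for the whole pair.
static_assert(std::is_same<
		decltype(std::move_if_noexcept(std::declval<Node&>())),
		const Node&>::value, "");

int main() { }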


    Note that I find __move_if_noexcept_cond very counter-intuitive,
because it says whether move semantics should _not_ be used.


    I am submitting this now because it might be considered a bug, even
if the end result is just a missed opportunity to use move semantics
rather than copy.


    * include/bits/stl_pair.h (__move_if_noexcept_cond<pair<_T1, _T2>>):
    New partial specialization.
    * testsuite/20_util/move_if_noexcept/1.cc (test02): New.
    * testsuite/23_containers/unordered_map/allocator/move_assign.cc
    (test03): New.

    Tested under Linux x86_64 normal mode.

François

diff --git a/libstdc++-v3/include/bits/stl_pair.h b/libstdc++-v3/include/bits/stl_pair.h
index 48af2b02ef9..85aad838860 100644
--- a/libstdc++-v3/include/bits/stl_pair.h
+++ b/libstdc++-v3/include/bits/stl_pair.h
@@ -528,6 +528,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   typedef pair<__ds_type1, __ds_type2> 	  __pair_type;
   return __pair_type(std::forward<_T1>(__x), std::forward<_T2>(__y));
 }
+
+  template<typename _T1, typename _T2>
+    struct __move_if_noexcept_cond<pair<_T1, _T2>>
+    : public __and_<is_copy_constructible<pair<_T1, _T2>>,
+		__not_<__and_<
+	   __or_<is_const<_T1>, is_nothrow_move_constructible<_T1>>,
+	   __or_<is_const<_T2>, is_nothrow_move_constructible<_T2>>>>>::type
+  { };
 #else
   template<class _T1, class _T2>
 inline pair<_T1, _T2>
diff --git a/libstdc++-v3/testsuite/20_util/move_if_noexcept/1.cc b/libstdc++-v3/testsuite/20_util/move_if_noexcept/1.cc
index 078ccb83d36..b6f01097e40 100644
--- a/libstdc++-v3/testsuite/20_util/move_if_noexcept/1.cc
+++ b/libstdc++-v3/testsuite/20_util/move_if_noexcept/1.cc
@@ -33,7 +33,7 @@ struct noexcept_move_copy
 
   noexcept_move_copy(const noexcept_move_copy&) = default;
 
-  operator bool() { return status; }
+  operator bool() const { return status; }
 
 private:
   bool status;
@@ -50,7 +50,7 @@ struct noexcept_move_no_copy
 
   noexcept_move_no_copy(const noexcept_move_no_copy&) = delete;
 
-  operator bool() { return status; }
+  operator bool() const { return status; }
 
 private:
   bool status;
@@ -67,7 +67,7 @@ struct except_move_copy
 
   except_move_copy(const except_move_copy&) = default;
 
-  operator bool() { return status; }
+  operator bool() const { return status; }
 
 private:
   bool status;
@@ -84,7 +84,7 @@ struct except_move_no_copy
 
   except_move_no_copy(const except_move_no_copy&) = delete;
 
-  operator bool() { return status; }
+  operator bool() const { return status; }
 
 private:
   bool status;
@@ -110,8 +110,38 @@ test01()
   VERIFY( emnc1 == false );
 }
 
+void
+test02()
+{
+  std::pair<noexcept_move_copy, noexcept_move_copy> nemc1;
+  auto nemc2 __attribute__((unused)) = std::move_if_noexcept(nemc1);
+  VERIFY( nemc1.first == false );
+  VERIFY( nemc1.second == false );
+
+  std::pair<except_move_copy, except_move_copy> emc1;
+  auto emc2 __attribute__((unused)) = std::move_if_noexcept(emc1);
+  VERIFY( emc1.first == true );
+  VERIFY( emc1.second == true );
+
+  std::pair<except_move_no_copy, except_move_no_copy> emnc1;
+  auto emnc2 __attribute__((unused)) = std::move_if_noexcept(emnc1);
+  VERIFY( emnc1.first == false );
+  VERIFY( emnc1.second == false );
+
+  std::pair<const except_move_copy, noexcept_move_copy> cemc1;
+  auto cemc2 __attribute__((unused)) = std::move_if_noexcept(cemc1);
+  VERIFY( cemc1.first == true );
+  VERIFY( cemc1.second == false );
+
+  std::pair<noexcept_move_no_copy, noexcept_move_no_copy> nemnc1;
+  auto nemnc2 __attribute__((unused)) = std::move_if_noexcept(nemnc1);
+  VERIFY( nemnc1.first == false );
+  VERIFY( nemnc1.second == false );
+}
+
 int main()
 {
   test01();
+  test02();
   return 0;
 }
diff --git a/libstdc++-v3/testsuite/23_containers/unordered_map/allocator/move_assign.cc b/libstdc++-v3/testsuite/23_containers/unordered_map/allocator/move_assign.cc
index b27269e607a..d1be3adaae5 100644
--- a/libstdc++-v3/testsuite/23_containers/unordered_map/allocator/move_assign.cc
+++ b/libstdc++-v3/testsuite/23_containers/unordered_map/allocator/move_assign.cc
@@ -21,6 +21,7 @@
 #include 
 #include 
 #include 
+#include 
 
 using __gnu_test::propagating_allocator;
 using __gnu_test::counter_type;
@@ -49,8 +50,10 @@ void test01()
   VERIFY( 1 == v1.get_allocator().get_personality() );
   VERIFY( 2 == v2.get_allocator().get_personality() );
 
-  // No move because key is const.
-  VERIFY( counter_type::move_assign_count == 0  );
+  // Key copied, value moved.
+  VERIFY( counter_type::copy_count == 1  );
+  VERIFY( counter_

Re: add tsv110 pipeline scheduling

2018-12-19 Thread wuyuan (E)

Hi Ramana,
 Please ignore the patch in the previous email attachment (the ChangeLog
has been deleted in this patch).  I have already communicated with Shao
Kun; he has fixed the problem in the previous patch, so I have resubmitted
the tsv110 pipeline patch.  Please review.
 The patch is as follows:



2018-12-20   wuyuan  

* config/aarch64/aarch64-cores.def: New CPU.
* config/aarch64/aarch64.md: Include "tsv110.md".
* config/aarch64/tsv110.md: New file.






diff --git a/gcc/config/aarch64/aarch64-cores.def 
b/gcc/config/aarch64/aarch64-cores.def
old mode 100644
new mode 100755
index 20f4924..ea9b7c5
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -97,7 +97,7 @@ AARCH64_CORE("cortex-a76",  cortexa76, cortexa57, 8_2A,  
AARCH64_FL_FOR_ARCH8_2
 AARCH64_CORE("ares",  ares, cortexa57, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | 
AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD | AARCH64_FL_PROFILE, 
cortexa72, 0x41, 0xd0c, -1)
 
 /* HiSilicon ('H') cores. */
-AARCH64_CORE("tsv110",  tsv110, cortexa57, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | 
AARCH64_FL_CRYPTO | AARCH64_FL_F16 | AARCH64_FL_AES | AARCH64_FL_SHA2, tsv110,  
 0x48, 0xd01, -1)
+AARCH64_CORE("tsv110",  tsv110, tsv110, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | 
AARCH64_FL_CRYPTO | AARCH64_FL_F16 | AARCH64_FL_AES | AARCH64_FL_SHA2, tsv110,  
 0x48, 0xd01, -1)
 
 /* ARMv8.4-A Architecture Processors.  */
 
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
old mode 100644
new mode 100755
index cf2732e..7f7673a
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -349,6 +349,7 @@
 (include "thunderx.md")
 (include "../arm/xgene1.md")
 (include "thunderx2t99.md")
+(include "tsv110.md")
 
 ;; ---
 ;; Jumps and other miscellaneous insns
diff --git a/gcc/config/aarch64/tsv110.md b/gcc/config/aarch64/tsv110.md
new file mode 100644
index 000..758ab95
--- /dev/null
+++ b/gcc/config/aarch64/tsv110.md
@@ -0,0 +1,708 @@
+;; tsv110 pipeline description
+;; Copyright (C) 2018 Free Software Foundation, Inc.
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify it
+;; under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+;;
+;; GCC is distributed in the hope that it will be useful, but
+;; WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+;; General Public License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; .
+
+(define_automaton "tsv110")
+
+(define_attr "tsv110_neon_type"
+  "neon_arith_acc, neon_arith_acc_q,
+   neon_arith_basic, neon_arith_complex,
+   neon_reduc_add_acc, neon_multiply, neon_multiply_q,
+   neon_multiply_long, neon_mla, neon_mla_q, neon_mla_long,
+   neon_sat_mla_long, neon_shift_acc, neon_shift_imm_basic,
+   neon_shift_imm_complex,
+   neon_shift_reg_basic, neon_shift_reg_basic_q, neon_shift_reg_complex,
+   neon_shift_reg_complex_q, neon_fp_negabs, neon_fp_arith,
+   neon_fp_arith_q, neon_fp_reductions_q, neon_fp_cvt_int,
+   neon_fp_cvt_int_q, neon_fp_cvt16, neon_fp_minmax, neon_fp_mul,
+   neon_fp_mul_q, neon_fp_mla, neon_fp_mla_q, neon_fp_recpe_rsqrte,
+   neon_fp_recpe_rsqrte_q, neon_fp_recps_rsqrts, neon_fp_recps_rsqrts_q,
+   neon_bitops, neon_bitops_q, neon_from_gp,
+   neon_from_gp_q, neon_move, neon_tbl3_tbl4, neon_zip_q, neon_to_gp,
+   neon_load_a, neon_load_b, neon_load_c, neon_load_d, neon_load_e,
+   neon_load_f, neon_store_a, neon_store_b, neon_store_complex,
+   unknown"
+  (cond [
+ (eq_attr "type" "neon_arith_acc, neon_reduc_add_acc,\
+  neon_reduc_add_acc_q")
+   (const_string "neon_arith_acc")
+ (eq_attr "type" "neon_arith_acc_q")
+   (const_string "neon_arith_acc_q")
+ (eq_attr "type" "neon_abs,neon_abs_q,neon_add, neon_add_q, 
neon_add_long,\
+  neon_add_widen, neon_neg, neon_neg_q,\
+  neon_reduc_add, neon_reduc_add_q,\
+  neon_reduc_add_long, neon_sub, neon_sub_q,\
+  neon_sub_long, neon_sub_widen, neon_logic,\
+  neon_logic_q, neon_tst, neon_tst_q,\
+  neon_compare, neon_compare_q,\
+  neon_compare_zero, neon_compare_zero_q,\
+  neon_minmax, neon_minmax_q, neon_reduc_minmax,\
+  neon_reduc_minmax_q")
+   (const_string "neon_arith_basic")
+ (eq_attr "type" "neon_add_halve_narrow_q,\
+  neon_add_halve, neon_add_halve_q,\
+  neon_sub_halve, neon

[PATCH] Use proper print formatter in main function in fixincl.c

2018-12-19 Thread Nicholas Krause
This fixes bug 71176 by using the proper printf format specifier,
%lu for size_t rather than %d, which is considered best practice
for print statements.

Signed-off-by: Nicholas Krause 
---
 fixincludes/fixincl.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fixincludes/fixincl.c b/fixincludes/fixincl.c
index 6dba2f6e830..4e3010df0a6 100644
--- a/fixincludes/fixincl.c
+++ b/fixincludes/fixincl.c
@@ -159,10 +159,10 @@ main (int argc, char** argv)
 tSCC zFmt[] =
   "\
 Processed %5d files containing %d bytes\n\
-Applying  %5d fixes to %d files\n\
+Applying  %5lu fixes to %d files\n\
 Altering  %5d of them\n";
 
-fprintf (stderr, zFmt, process_ct, ttl_data_size, apply_ct,
+fprintf (stderr, zFmt, process_ct, (unsigned long int) ttl_data_size, 
apply_ct,
  fixed_ct, altered_ct);
   }
 #endif /* DO_STATS */
-- 
2.17.1



Re: add tsv110 pipeline scheduling

2018-12-19 Thread wuyuan (E)
Hi Ramana,
  I have already communicated with Shao Kun; he has fixed the problem in
the previous patch, so I have resubmitted the tsv110 pipeline patch.
Please review.
 The patch is as follows:

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
old mode 100644
new mode 100755
index b1eed3b..5611dd0
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,9 @@
+2018-12-20  wuyuan
+
+   * config/aarch64/aarch64-cores.def: New CPU.
+   * config/aarch64/aarch64.md : Add "tsv110.md"
+   * config/aarch64/tsv110.md : tsv110.md   new file
+
 2018-12-20  Alan Modra  
 
* config/rs6000/sysv4.h (GNU_USER_DYNAMIC_LINKER): Define.
diff --git a/gcc/config/aarch64/aarch64-cores.def 
b/gcc/config/aarch64/aarch64-cores.def
old mode 100644
new mode 100755
index 20f4924..ea9b7c5
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -97,7 +97,7 @@ AARCH64_CORE("cortex-a76",  cortexa76, cortexa57, 8_2A,  
AARCH64_FL_FOR_ARCH8_2
 AARCH64_CORE("ares",  ares, cortexa57, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | 
AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD | AARCH64_FL_PROFILE, 
cortexa72, 0x41, 0xd0c, -1)
 
 /* HiSilicon ('H') cores. */
-AARCH64_CORE("tsv110",  tsv110, cortexa57, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | 
AARCH64_FL_CRYPTO | AARCH64_FL_F16 | AARCH64_FL_AES | AARCH64_FL_SHA2, tsv110,  
 0x48, 0xd01, -1)
+AARCH64_CORE("tsv110",  tsv110, tsv110, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | 
AARCH64_FL_CRYPTO | AARCH64_FL_F16 | AARCH64_FL_AES | AARCH64_FL_SHA2, tsv110,  
 0x48, 0xd01, -1)
 
 /* ARMv8.4-A Architecture Processors.  */
 
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
old mode 100644
new mode 100755
index cf2732e..7f7673a
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -349,6 +349,7 @@
 (include "thunderx.md")
 (include "../arm/xgene1.md")
 (include "thunderx2t99.md")
+(include "tsv110.md")
 
 ;; ---
 ;; Jumps and other miscellaneous insns
diff --git a/gcc/config/aarch64/tsv110.md b/gcc/config/aarch64/tsv110.md
new file mode 100644
index 000..758ab95
--- /dev/null
+++ b/gcc/config/aarch64/tsv110.md
@@ -0,0 +1,708 @@
+;; tsv110 pipeline description
+;; Copyright (C) 2018 Free Software Foundation, Inc.
+;;
+;; This file is part of GCC.
+;;
+;; GCC is free software; you can redistribute it and/or modify it
+;; under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+;;
+;; GCC is distributed in the hope that it will be useful, but
+;; WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+;; General Public License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GCC; see the file COPYING3.  If not see
+;; .
+
+(define_automaton "tsv110")
+
+(define_attr "tsv110_neon_type"
+  "neon_arith_acc, neon_arith_acc_q,
+   neon_arith_basic, neon_arith_complex,
+   neon_reduc_add_acc, neon_multiply, neon_multiply_q,
+   neon_multiply_long, neon_mla, neon_mla_q, neon_mla_long,
+   neon_sat_mla_long, neon_shift_acc, neon_shift_imm_basic,
+   neon_shift_imm_complex,
+   neon_shift_reg_basic, neon_shift_reg_basic_q, neon_shift_reg_complex,
+   neon_shift_reg_complex_q, neon_fp_negabs, neon_fp_arith,
+   neon_fp_arith_q, neon_fp_reductions_q, neon_fp_cvt_int,
+   neon_fp_cvt_int_q, neon_fp_cvt16, neon_fp_minmax, neon_fp_mul,
+   neon_fp_mul_q, neon_fp_mla, neon_fp_mla_q, neon_fp_recpe_rsqrte,
+   neon_fp_recpe_rsqrte_q, neon_fp_recps_rsqrts, neon_fp_recps_rsqrts_q,
+   neon_bitops, neon_bitops_q, neon_from_gp,
+   neon_from_gp_q, neon_move, neon_tbl3_tbl4, neon_zip_q, neon_to_gp,
+   neon_load_a, neon_load_b, neon_load_c, neon_load_d, neon_load_e,
+   neon_load_f, neon_store_a, neon_store_b, neon_store_complex,
+   unknown"
+  (cond [
+ (eq_attr "type" "neon_arith_acc, neon_reduc_add_acc,\
+  neon_reduc_add_acc_q")
+   (const_string "neon_arith_acc")
+ (eq_attr "type" "neon_arith_acc_q")
+   (const_string "neon_arith_acc_q")
+ (eq_attr "type" "neon_abs,neon_abs_q,neon_add, neon_add_q, 
neon_add_long,\
+  neon_add_widen, neon_neg, neon_neg_q,\
+  neon_reduc_add, neon_reduc_add_q,\
+  neon_reduc_add_long, neon_sub, neon_sub_q,\
+  neon_sub_long, neon_sub_widen, neon_logic,\
+  neon_logic_q, neon_tst, neon_tst_q,\
+  neon_compare, neon_compare_q,\
+  neon_compare_zero, neon_compare_zero_q,\
+  neon_minmax, neon_minmax_q, neon_reduc_minmax,\
+  neon_reduc_minmax_q")
+   (const_string "neon_arith_basic")
+ (eq_attr "type" 

Re: [PATCH 1/2] C++: more location wrapper nodes (PR c++/43064, PR c++/43486)

2018-12-19 Thread David Malcolm
On Wed, 2018-12-19 at 20:00 +0100, Thomas Schwinge wrote:
> Hi David!
> 
> I will admit that I haven't researched ;-/ what this is actually all
> about, and how it's implemented, but...
> 
> On Mon,  5 Nov 2018 15:31:08 -0500, David Malcolm wrote:
> > The C++ frontend gained various location wrapper nodes in r256448
> > (GCC 8).
> > That patch:
> >   https://gcc.gnu.org/ml/gcc-patches/2018-01/msg00799.html
> > added wrapper nodes around all nodes with !CAN_HAVE_LOCATION_P for:
> > 
> > * arguments at callsites, and for
> > 
> > * typeid, alignof, sizeof, and offsetof.
> > 
> > This is a followup to that patch, adding many more location
> > wrappers
> > to the C++ frontend.  It adds location wrappers for nodes with
> > !CAN_HAVE_LOCATION_P to:
> > 
> > * all literal nodes (in cp_parser_primary_expression)
> > 
> > * all id-expression nodes (in finish_id_expression), except within
> > a
> >   decltype.
> > 
> > * all mem-initializer nodes within a mem-initializer-list
> >   (in cp_parser_mem_initializer)
> > 
> > However, the patch also adds some suppressions: regions in the
> > parser
> > for which wrapper nodes will not be created:
> > 
> > * within a template-parameter-list or template-argument-list (in
> >   cp_parser_template_parameter_list and
> > cp_parser_template_argument_list
> >   respectively), to avoid encoding the spelling location of the
> > nodes
> >   in types.  For example, "array<10>" and "array<10>" are the same
> > type,
> >   despite the fact that the two different "10" tokens are spelled
> > in
> >   different locations in the source.
> > 
> > * within a gnu-style attribute (none of our handlers are set up to
> > cope
> >   with location wrappers yet)
> > 
> > * within various OpenMP clauses

I suppressed the addition of wrapper nodes within OpenMP as a way to
reduce the scope of the patch.

> ... I did wonder why things applicable to OpenMP wouldn't likewise
> apply
> to OpenACC, too?  That is:

It might or might not be.  Maybe there's a gap in my test coverage? 
How should I be running the OpenACC tests?

> > (cp_parser_omp_all_clauses): Don't create wrapper nodes within
> > OpenMP clauses.
> > (cp_parser_omp_for_loop): Likewise.
> > (cp_parser_omp_declare_reduction_exprs): Likewise.
> > @@ -33939,6 +33968,9 @@ cp_parser_omp_all_clauses (cp_parser
> > *parser, omp_clause_mask mask,
> >bool first = true;
> >cp_token *token = NULL;
> >  
> > +  /* Don't create location wrapper nodes within OpenMP
> > clauses.  */
> > +  auto_suppress_location_wrappers sentinel;
> > +
> >while (cp_lexer_next_token_is_not (parser->lexer,
> > CPP_PRAGMA_EOL))
> >  {
> >pragma_omp_clause c_kind;
> > @@ -35223,6 +35255,10 @@ cp_parser_omp_for_loop (cp_parser *parser,
> > enum tree_code code, tree clauses,
> > }
> >loc = cp_lexer_consume_token (parser->lexer)->location;
> >  
> > +  /* Don't create location wrapper nodes within an OpenMP
> > "for"
> > +statement.  */
> > +  auto_suppress_location_wrappers sentinel;
> > +
> >matching_parens parens;
> >if (!parens.require_open (parser))
> > return NULL;
> > @@ -37592,6 +37628,8 @@ cp_parser_omp_declare_reduction_exprs (tree
> > fndecl, cp_parser *parser)
> >else
> > {
> >   cp_parser_parse_tentatively (parser);
> > + /* Don't create location wrapper nodes here.  */
> > + auto_suppress_location_wrappers sentinel;
> >   tree fn_name = cp_parser_id_expression (parser,
> > /*template_p=*/false,
> >   /*check_dependen
> > cy_p=*/true,
> >   /*template_p=*/N
> > ULL,
> 
> Shouldn't "cp_parser_oacc_all_clauses" (and "some" other functions?)
> be
> adjusted in the same way?  How would I test that?  (I don't see any
> OpenMP test cases added -- I have not yet tried whether any problems
> would become apparent when temporarily removing the OpenMP changes
> cited
> above.)

Lots of pre-existing OpenMP test cases started failing when I added the
wrapper nodes to the C++ parser (e.g. for id-expressions and
constants); suppressing them in the given places was an easy way to get
them to pass again.

Dave


[PATCH] -Wtautological-compare: fix comparison of macro expansions

2018-12-19 Thread David Malcolm
On Wed, 2018-12-19 at 17:27 -0600, Aaron Sawdey wrote:
> Assuming you applied this as svn 267273, it causes bootstrap failure
> on powerpc64le-unknown-linux-gnu. Stage 2 fails with multiple
> instances
> of this error:
> 
> ../../trunk-base/gcc/c-family/c-pragma.c: In function ‘void
> handle_pragma_scalar_storage_order(cpp_reader*)’:
> ../../trunk-base/gcc/c-family/c-pragma.c:417:24: error: self-
> comparison always evaluates to false [-Werror=tautological-compare]
>   417 |   if (BYTES_BIG_ENDIAN != WORDS_BIG_ENDIAN)
>   |^~
> ../../trunk-base/gcc/c-family/c-attribs.c: In function ‘tree_node*
> handle_scalar_storage_order_attribute(tree_node**, tree, tree, int,
> bool*)’:
> ../../trunk-base/gcc/c-family/c-attribs.c:1401:24: error: self-
> comparison always evaluates to false [-Werror=tautological-compare]
>  1401 |   if (BYTES_BIG_ENDIAN != WORDS_BIG_ENDIAN)
>   |^~
> ../../trunk-base/gcc/builtins.c: In function ‘rtx_def*
> c_readstr(const char*, scalar_int_mode)’:
> ../../trunk-base/gcc/builtins.c:830:28: error: self-comparison always
> evaluates to false [-Werror=tautological-compare]
>   830 |   if (BYTES_BIG_ENDIAN != WORDS_BIG_ENDIAN
>   |^~
> ../../trunk-base/gcc/combine.c: In function ‘int
> rtx_equal_for_field_assignment_p(rtx, rtx, bool)’:
> ../../trunk-base/gcc/combine.c:9668:28: error: self-comparison always
> evaluates to false [-Werror=tautological-compare]
>  9668 |   if (BYTES_BIG_ENDIAN != WORDS_BIG_ENDIAN)
>   |^~
> 
> Aaron

Sorry about that.

Does the following patch help?  (testing in progress here)

gcc/c-family/ChangeLog:
* c-warn.c (get_outermost_macro_expansion): New function.
(spelled_the_same_p): Use it to unwind the macro expansions, and
compare the outermost macro in each nested expansion, rather than
the innermost.

gcc/testsuite/ChangeLog:
* c-c++-common/Wtautological-compare-8.c: New test.
---
 gcc/c-family/c-warn.c  | 26 +
 .../c-c++-common/Wtautological-compare-8.c | 33 ++
 2 files changed, 54 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/Wtautological-compare-8.c

diff --git a/gcc/c-family/c-warn.c b/gcc/c-family/c-warn.c
index b0f6da0..6013202 100644
--- a/gcc/c-family/c-warn.c
+++ b/gcc/c-family/c-warn.c
@@ -399,6 +399,25 @@ warn_tautological_bitwise_comparison (const op_location_t 
&loc, tree_code code,
"bitwise comparison always evaluates to true");
 }
 
+/* Given LOC from a macro expansion, return the map for the outermost
+   macro in the nest of expansions.  */
+
+static const line_map_macro *
+get_outermost_macro_expansion (location_t loc)
+{
+  gcc_assert (from_macro_expansion_at (loc));
+
+  const line_map *map = linemap_lookup (line_table, loc);
+  const line_map_macro *macro_map;
+  do
+{
+  macro_map = linemap_check_macro (map);
+  loc = linemap_unwind_toward_expansion (line_table, loc, &map);
+} while (linemap_macro_expansion_map_p (map));
+
+  return macro_map;
+}
+
 /* Given LOC_A and LOC_B from macro expansions, return true if
they are "spelled the same" i.e. if they are both directly from
expansion of the same non-function-like macro.  */
@@ -409,11 +428,8 @@ spelled_the_same_p (location_t loc_a, location_t loc_b)
   gcc_assert (from_macro_expansion_at (loc_a));
   gcc_assert (from_macro_expansion_at (loc_b));
 
-  const line_map_macro *map_a
-= linemap_check_macro (linemap_lookup (line_table, loc_a));
-
-  const line_map_macro *map_b
-= linemap_check_macro (linemap_lookup (line_table, loc_b));
+  const line_map_macro *map_a = get_outermost_macro_expansion (loc_a);
+  const line_map_macro *map_b = get_outermost_macro_expansion (loc_b);
 
   if (map_a->macro == map_b->macro)
 if (!cpp_fun_like_macro_p (map_a->macro))
diff --git a/gcc/testsuite/c-c++-common/Wtautological-compare-8.c 
b/gcc/testsuite/c-c++-common/Wtautological-compare-8.c
new file mode 100644
index 000..1adedad
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/Wtautological-compare-8.c
@@ -0,0 +1,33 @@
+/* { dg-options "-Wtautological-compare" } */
+
+int foo;
+#define INCOMING_FRAME_SP_OFFSET foo
+#define DEFAULT_INCOMING_FRAME_SP_OFFSET INCOMING_FRAME_SP_OFFSET
+
+int test (void)
+{
+  if (DEFAULT_INCOMING_FRAME_SP_OFFSET != INCOMING_FRAME_SP_OFFSET) /* { 
dg-warning "self-comparison" "" { target c } } */
+return 1;
+  else
+return 0;
+}
+
+#define BYTES_BIG_ENDIAN foo
+#define WORDS_BIG_ENDIAN foo
+
+int test_2 (void)
+{
+  if (BYTES_BIG_ENDIAN != WORDS_BIG_ENDIAN) /* { dg-warning "self-comparison" 
"" { target c } } */
+return 1;
+  else
+return 0;
+}
+
+#define COND DEFAULT_INCOMING_FRAME_SP_OFFSET != INCOMING_FRAME_SP_OFFSET
+int test_3 (void)
+{
+  if (COND)
+return 1;
+  else
+return 0;
+}
-- 
1.8.5.3



Re: [PATCH] [aarch64] Revert support for ARMv8.2 in tsv110

2018-12-19 Thread Zhangshaokun
Hi Richard,

On 2018/12/19 18:12, Richard Earnshaw (lists) wrote:
> On 19/12/2018 03:11, Shaokun Zhang wrote:
>> For HiSilicon's tsv110 cpu core, it supports some v8_4A features, but
>> some mandatory features are not implemented. Revert to ARMv8.2 that
>> all mandatory features are supported.
>>
> 
> Thanks, I've put this in.
> 

Thanks.

> I've modified the ChangeLog entry slightly - we normally use 'revert' in
> the specific sense of completely removing an existing patch.
> 

I have checked the modified ChangeLog; it is precise. Thanks for the
further explanation about 'revert', got it.

> Also, when sending patches, please do not send ChangeLog entries as part
> of the patch file.  Because the file is always updated at the head, the
> patch hunk is rarely going to apply cleanly.  Instead, include the
> ChangeLog text as part of your email description; that way we can then

Sure, I will follow it. At the beginning I was puzzled about how a patch
could be applied directly if everyone updates the ChangeLog when
upstreaming and the ChangeLog file conflicts. I understood it once you
gave the detailed description.

Thanks,
Shaokun

> paste it directly into the ChangeLog file itself and simply correct the
> date.
> 
> R.
> 
>> ---
>>  gcc/ChangeLog| 5 +
>>  gcc/config/aarch64/aarch64-cores.def | 6 +++---
>>  2 files changed, 8 insertions(+), 3 deletions(-)
>>
>> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
>> index e9f5baa6557c..842876b0ae90 100644
>> --- a/gcc/ChangeLog
>> +++ b/gcc/ChangeLog
>> @@ -1,3 +1,8 @@
>> +2018-12-19 Shaokun Zhang  
>> +
>> +* config/aarch64/aarch64-cores.def (tsv110) : Revert support for ARMv8.2
>> +in tsv110.
>> +
>>  2018-12-18  Vladimir Makarov  
>>  
>>  PR rtl-optimization/87759
>> diff --git a/gcc/config/aarch64/aarch64-cores.def 
>> b/gcc/config/aarch64/aarch64-cores.def
>> index 74be5dbf2595..20f4924e084d 100644
>> --- a/gcc/config/aarch64/aarch64-cores.def
>> +++ b/gcc/config/aarch64/aarch64-cores.def
>> @@ -96,10 +96,10 @@ AARCH64_CORE("cortex-a75",  cortexa75, cortexa57, 8_2A,  
>> AARCH64_FL_FOR_ARCH8_2
>>  AARCH64_CORE("cortex-a76",  cortexa76, cortexa57, 8_2A,  
>> AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | 
>> AARCH64_FL_DOTPROD, cortexa72, 0x41, 0xd0b, -1)
>>  AARCH64_CORE("ares",  ares, cortexa57, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | 
>> AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD | AARCH64_FL_PROFILE, 
>> cortexa72, 0x41, 0xd0c, -1)
>>  
>> -/* ARMv8.4-A Architecture Processors.  */
>> -
>>  /* HiSilicon ('H') cores. */
>> -AARCH64_CORE("tsv110", tsv110,cortexa57,8_4A, 
>> AARCH64_FL_FOR_ARCH8_4 | AARCH64_FL_CRYPTO | AARCH64_FL_F16 | AARCH64_FL_AES 
>> | AARCH64_FL_SHA2, tsv110,   0x48, 0xd01, -1)
>> +AARCH64_CORE("tsv110",  tsv110, cortexa57, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | 
>> AARCH64_FL_CRYPTO | AARCH64_FL_F16 | AARCH64_FL_AES | AARCH64_FL_SHA2, 
>> tsv110,   0x48, 0xd01, -1)
>> +
>> +/* ARMv8.4-A Architecture Processors.  */
>>  
>>  /* Qualcomm ('Q') cores. */
>>  AARCH64_CORE("saphira", saphira,saphira,8_4A,  
>> AARCH64_FL_FOR_ARCH8_4 | AARCH64_FL_CRYPTO | AARCH64_FL_RCPC, saphira,   
>> 0x51, 0xC01, -1)
>>
> 
> 
> .
> 



Re: [PATCH] Use proper print formatter in main function in fixincl.c

2018-12-19 Thread Joseph Myers
This patch is wrong for multiple reasons (the %d you're changing is for an 
int argument, so is correct as-is, and %lu is not portable for size_t, so 
since we may not be able to assume C99 %zu on the host you'd need to cast 
the ttl_data_size argument explicitly to unsigned long int to use %lu for 
it).
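
For what it is worth, the portable idiom described above looks roughly
like this (an illustrative sketch only, not the actual fixincl.c code;
the variable name merely stands in for the real size_t counter):

#include <stdio.h>
#include <stddef.h>

int
main (void)
{
  size_t ttl_data_size = 123456;  /* stand-in for fixincl's counter */

  /* C99's %zu cannot be assumed on every host, so cast to a known wide
     type and use the matching specifier.  */
  fprintf (stderr, "Processed files containing %lu bytes\n",
	   (unsigned long int) ttl_data_size);
  return 0;
}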

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [C++ Patch] [PR c++/88146] do not crash synthesizing inherited ctor(...)

2018-12-19 Thread Alexandre Oliva
Christophe,

Thanks again for the report.  This was quite an adventure to figure
out ;-)  See below.


[PR88146] avoid diagnostics diffs if cdtor_returns_this

Diagnostics for testsuite/g++.dg/cpp0x/inh-ctor32.C varied across
platforms.  Specifically, on ARM, the diagnostics within the subtest
derived_ctor::inherited_derived_ctor::constexpr_noninherited_ctor did
not match those displayed on other platforms, and the test failed.

The difference seemed to have to do with locations assigned to ctors,
but it was more subtle: on ARM, the instantiation of bor's template
ctor was nested within the instantiation of bar's template ctor
inherited from bor.  The reason turned out to be related with the
internal return type of ctors: arm_cxx_cdtor_returns_this is enabled
for because of AAPCS, while cxx.cdtor_returns_this is disabled on most
other platforms.  While convert_to_void returns early with a VOID
expr, the non-VOID return type of the base ctor CALL_EXPR causes
convert_to_void to inspect the called decl for nodiscard attributes:
maybe_warn_nodiscard -> cp_get_fndecl_from_callee ->
maybe_constant_init -> cxx_eval_outermost_constant_expr ->
instantiate_constexpr_fns -> nested instantiation.

The internal return type assigned to a cdtor should not affect
instantiation (constexpr or template) decisions, IMHO.  We know it
affects diagnostics, but I have a hunch this might bring deeper issues
with it, so I've arranged for the CALL_EXPR handler in convert_to_void
to disregard cdtors, regardless of the ABI.


The patch is awkward on purpose: it's meant to illustrate both
portions of the affected code, to draw attention to a potential
problem, and to get bootstrap-testing coverage for the path that will
be taken on ARM.  I envision removing the first hunk, and the else
from the second hunk, once testing is done.

The first hunk is there to highlight where convert_to_void returns
early on x86, instead of handling the CALL_EXPR.

BTW (here's the potential problem), shouldn't we go into the CALL_EXPR
case for the volatile void mentioned in comments next to the case, or
won't that match VOID_TYPE_P?

Finally, I shall mention the possibility of taking the opposite
direction, and actually looking for nodiscard in cdtor calls so as to
trigger the constexpr side effects that we've inadvertently triggered
and observed with the inh-ctor32.C testcase.  It doesn't feel right to
me, but I've been wrong many times before ;-)

Would a rearranged version of the patch, dropping the redundant tests
and retaining only the addition of the test for cdtor identifiers, be
ok to install, provided that it passes regression testing?


Note this patch does NOT carry a ChangeLog entry.  That's also on
purpose, to indicate it's not meant to be included as is.
---
 gcc/cp/cvt.c |   21 -
 1 file changed, 20 insertions(+), 1 deletion(-)

diff --git a/gcc/cp/cvt.c b/gcc/cp/cvt.c
index eb1687377c3e..1a15af8a6e99 100644
--- a/gcc/cp/cvt.c
+++ b/gcc/cp/cvt.c
@@ -1112,7 +1112,8 @@ convert_to_void (tree expr, impl_conv_void implicit, 
tsubst_flags_t complain)
 error_at (loc, "pseudo-destructor is not called");
   return error_mark_node;
 }
-  if (VOID_TYPE_P (TREE_TYPE (expr)))
+  if (VOID_TYPE_P (TREE_TYPE (expr))
+  && TREE_CODE (expr) != CALL_EXPR)
 return expr;
   switch (TREE_CODE (expr))
 {
@@ -1169,6 +1170,24 @@ convert_to_void (tree expr, impl_conv_void implicit, 
tsubst_flags_t complain)
   break;
 
 case CALL_EXPR:   /* We have a special meaning for volatile void fn().  */
+  /* cdtors may return this or void, depending on
+targetm.cxx.cdtor_returns_this, but this shouldn't affect our
+decisions here: nodiscard cdtors are nonsensical, and we
+don't want to call maybe_warn_nodiscard because it may
+trigger constexpr or template instantiation in a way that
+changes their instantiation nesting.  This changes the way
+contexts are printed in diagnostics, with bad consequences
+for the testsuite, but there may be other undesirable
+consequences of visiting referenced ctors too soon.  */
+  if (DECL_P (TREE_OPERAND (expr, 0))
+ && IDENTIFIER_CDTOR_P (DECL_NAME (TREE_OPERAND (expr, 0))))
+   return expr;
+  /* FIXME: Move this test before the one above, after a round of
+testing as it is, to get coverage of the behavior we'd get on
+ARM.  */
+  else if (VOID_TYPE_P (TREE_TYPE (expr)))
+   return expr;
+
   maybe_warn_nodiscard (expr, implicit);
   break;
 


-- 
Alexandre Oliva, freedom fighter   https://FSFLA.org/blogs/lxo
Be the change, be Free! FSF Latin America board member
GNU Toolchain EngineerFree Software Evangelist
Hay que enGNUrecerse, pero sin perder la terGNUra jamás-GNUChe


[PATCH] Use proper print formatter in main function in fixincl.c

2018-12-19 Thread Nicholas Krause
This fixes bug 71176 by using the proper printf format specifier,
%lu for size_t rather than %d, which is considered best practice
for print statements.

Signed-off-by: Nicholas Krause 
---
 fixincludes/fixincl.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fixincludes/fixincl.c b/fixincludes/fixincl.c
index 6dba2f6e830..8e16a2f7792 100644
--- a/fixincludes/fixincl.c
+++ b/fixincludes/fixincl.c
@@ -159,7 +159,7 @@ main (int argc, char** argv)
 tSCC zFmt[] =
   "\
 Processed %5d files containing %d bytes\n\
-Applying  %5d fixes to %d files\n\
+Applying  %5d fixes to %lu files\n\
 Altering  %5d of them\n";
 
 fprintf (stderr, zFmt, process_ct, ttl_data_size, apply_ct,
-- 
2.17.1



Re: [PATCH 2/2] v2: C++: improvements to binary operator diagnostics (PR c++/87504)

2018-12-19 Thread Aaron Sawdey
Assuming you applied this as svn 267273, it causes bootstrap failure
on powerpc64le-unknown-linux-gnu. Stage 2 fails with multiple instances
of this error:

../../trunk-base/gcc/c-family/c-pragma.c: In function ‘void 
handle_pragma_scalar_storage_order(cpp_reader*)’:
../../trunk-base/gcc/c-family/c-pragma.c:417:24: error: self-comparison always 
evaluates to false [-Werror=tautological-compare]
  417 |   if (BYTES_BIG_ENDIAN != WORDS_BIG_ENDIAN)
  |^~
../../trunk-base/gcc/c-family/c-attribs.c: In function ‘tree_node* 
handle_scalar_storage_order_attribute(tree_node**, tree, tree, int, bool*)’:
../../trunk-base/gcc/c-family/c-attribs.c:1401:24: error: self-comparison 
always evaluates to false [-Werror=tautological-compare]
 1401 |   if (BYTES_BIG_ENDIAN != WORDS_BIG_ENDIAN)
  |^~
../../trunk-base/gcc/builtins.c: In function ‘rtx_def* c_readstr(const char*, 
scalar_int_mode)’:
../../trunk-base/gcc/builtins.c:830:28: error: self-comparison always evaluates 
to false [-Werror=tautological-compare]
  830 |   if (BYTES_BIG_ENDIAN != WORDS_BIG_ENDIAN
  |^~
../../trunk-base/gcc/combine.c: In function ‘int 
rtx_equal_for_field_assignment_p(rtx, rtx, bool)’:
../../trunk-base/gcc/combine.c:9668:28: error: self-comparison always evaluates 
to false [-Werror=tautological-compare]
 9668 |   if (BYTES_BIG_ENDIAN != WORDS_BIG_ENDIAN)
  |^~

Aaron

On 12/12/18 2:42 PM, Jason Merrill wrote:
> On 12/4/18 5:35 PM, David Malcolm wrote:
>> The v1 patch:
>>    https://gcc.gnu.org/ml/gcc-patches/2018-11/msg00303.html
>> has bitrotten somewhat, so here's v2 of the patch, updated relative
>> to r266740.
>>
>> Blurb from v1 patch follows:
>>
>> The C frontend is able (where expression locations are available) to print
>> problems with binary operators in 3-location form, labelling the types of
>> the expressions:
>>
>>    arg_0 op arg_1
>>    ~ ^~ ~
>>  |    |
>>  |    arg1 type
>>  arg0 type
>>
>> The C++ frontend currently just shows the combined location:
>>
>>    arg_0 op arg_1
>>    ~~^~~~
>>
>> and fails to highlight where the subexpressions are, or their types.
>>
>> This patch introduces a op_location_t struct for handling the above
>> operator-location vs combined-location split, and a new
>> class binary_op_rich_location for displaying the above, so that the
>> C++ frontend is able to use the more detailed 3-location form for
>> type mismatches in binary operators, and for -Wtautological-compare
>> (where types are not displayed).  Both forms can be seen in this
>> example:
>>
>> bad-binary-ops.C:69:20: error: no match for 'operator&&' (operand types are
>>    's' and 't')
>>     69 |   return ns_4::foo && ns_4::inner::bar;
>>    |  ~ ^~ 
>>    |    |   |
>>    |    s   t
>> bad-binary-ops.C:69:20: note: candidate: 'operator&&(bool, bool)' 
>>     69 |   return ns_4::foo && ns_4::inner::bar;
>>    |  ~~^~~
>>
>> The patch also allows from some uses of macros in
>> -Wtautological-compare, where both sides of the comparison have
>> been spelled the same way, e.g.:
>>
>> Wtautological-compare-ranges.c:23:11: warning: self-comparison always
>>     evaluates to true [-Wtautological-compare]
>>     23 |   if (FOO == FOO);
>>    |   ^~
>>
>> Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu, in
>> conjunction with the previous patch.
>>
>> OK for trunk?
>> Dave
>>
>> gcc/c-family/ChangeLog:
>> PR c++/87504
>> * c-common.h (warn_tautological_cmp): Convert 1st param from
>> location_t to const op_location_t &.
>> * c-warn.c (find_array_ref_with_const_idx_r): Strip location
>> wrapper when testing for INTEGER_CST.
>> (warn_tautological_bitwise_comparison): Convert 1st param from
>> location_t to const op_location_t &; use it to build a
>> binary_op_rich_location, and use this.
>> (spelled_the_same_p): New function.
>> (warn_tautological_cmp): Convert 1st param from location_t to
>> const op_location_t &.  Warn for macro expansions if
>> spelled_the_same_p.  Use binary_op_rich_location.
>>
>> gcc/c/ChangeLog:
>> PR c++/87504
>> * c-typeck.c (class maybe_range_label_for_tree_type_mismatch):
>> Move from here to gcc-rich-location.h and gcc-rich-location.c.
>> (build_binary_op): Use struct op_location_t and
>> class binary_op_rich_location.
>>
>> gcc/cp/ChangeLog:
>> PR c++/87504
>> * call.c (op_error): Convert 1st param from location_t to
>> const op_location_t &.  Use binary_op_rich_location for binary
>> ops.
>> (build_conditional_expr_1): Convert 1st param from location_t to
>> const op_location_t &.
>> (build_conditional_expr): Likewise.
>> (build_new_op_1): Likewise.
>> (build_new_op): Likewise.
>> * cp-tr

[PATCH] Improve AVX512 sse movcc (PR target/88547)

2018-12-19 Thread Jakub Jelinek
Hi!

If one vcond argument is an all-ones (non-bool) vector and the other one is
all zeros, we can use the vpmovm2? insns for AVX512{DQ,BW} (sometimes + VL).
When op_true is all ones and op_false is all zeros, we emit large code that
the combiner often optimizes to that vpmovm2?; but if the arguments are
swapped, we emit vpxor + vpternlog + a masked move (blend), even though we
could just invert the mask with knot* and use vpmovm2?.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?  The patch is large, but it is mostly reindentation; the attachment
contains a diff -ubpd variant of the i386.c changes to make it more
readable.

2018-12-19  Jakub Jelinek  

PR target/88547
* config/i386/i386.c (ix86_expand_sse_movcc): For maskcmp, try to
emit vpmovm2? instruction perhaps after knot?.  Reorganize code
so that it doesn't have to test !maskcmp in almost every conditional.

* gcc.target/i386/pr88547-1.c: New test.

--- gcc/config/i386/i386.c.jj   2018-12-18 19:40:27.698946295 +0100
+++ gcc/config/i386/i386.c  2018-12-19 17:14:24.948218640 +0100
@@ -23593,33 +23593,117 @@ ix86_expand_sse_movcc (rtx dest, rtx cmp
   cmp = gen_rtx_SUBREG (mode, cmp, 0);
 }
 
-  if (vector_all_ones_operand (op_true, mode)
-  && rtx_equal_p (op_false, CONST0_RTX (mode))
-  && !maskcmp)
+  if (maskcmp)
+{
+  rtx (*gen) (rtx, rtx) = NULL;
+  if ((op_true == CONST0_RTX (mode)
+  && vector_all_ones_operand (op_false, mode))
+ || (op_false == CONST0_RTX (mode)
+ && vector_all_ones_operand (op_true, mode)))
+   switch (mode)
+ {
+ case E_V64QImode:
+   if (TARGET_AVX512BW)
+ gen = gen_avx512bw_cvtmask2bv64qi;
+   break;
+ case E_V32QImode:
+   if (TARGET_AVX512VL && TARGET_AVX512BW)
+ gen = gen_avx512vl_cvtmask2bv32qi;
+   break;
+ case E_V16QImode:
+   if (TARGET_AVX512VL && TARGET_AVX512BW)
+ gen = gen_avx512vl_cvtmask2bv16qi;
+   break;
+ case E_V32HImode:
+   if (TARGET_AVX512BW)
+ gen = gen_avx512bw_cvtmask2wv32hi;
+   break;
+ case E_V16HImode:
+   if (TARGET_AVX512VL && TARGET_AVX512BW)
+ gen = gen_avx512vl_cvtmask2wv16hi;
+   break;
+ case E_V8HImode:
+   if (TARGET_AVX512VL && TARGET_AVX512BW)
+ gen = gen_avx512vl_cvtmask2wv8hi;
+   break;
+ case E_V16SImode:
+   if (TARGET_AVX512DQ)
+ gen = gen_avx512f_cvtmask2dv16si;
+   break;
+ case E_V8SImode:
+   if (TARGET_AVX512VL && TARGET_AVX512DQ)
+ gen = gen_avx512vl_cvtmask2dv8si;
+   break;
+ case E_V4SImode:
+   if (TARGET_AVX512VL && TARGET_AVX512DQ)
+ gen = gen_avx512vl_cvtmask2dv4si;
+   break;
+ case E_V8DImode:
+   if (TARGET_AVX512DQ)
+ gen = gen_avx512f_cvtmask2qv8di;
+   break;
+ case E_V4DImode:
+   if (TARGET_AVX512VL && TARGET_AVX512DQ)
+ gen = gen_avx512vl_cvtmask2qv4di;
+   break;
+ case E_V2DImode:
+   if (TARGET_AVX512VL && TARGET_AVX512DQ)
+ gen = gen_avx512vl_cvtmask2qv2di;
+   break;
+ default:
+   break;
+ }
+  if (gen && SCALAR_INT_MODE_P (cmpmode))
+   {
+ cmp = force_reg (cmpmode, cmp);
+ if (op_true == CONST0_RTX (mode))
+   {
+ rtx (*gen_not) (rtx, rtx);
+ switch (cmpmode)
+   {
+   case E_QImode: gen_not = gen_knotqi; break;
+   case E_HImode: gen_not = gen_knothi; break;
+   case E_SImode: gen_not = gen_knotsi; break;
+   case E_DImode: gen_not = gen_knotdi; break;
+   default: gcc_unreachable ();
+   }
+ rtx n = gen_reg_rtx (cmpmode);
+ emit_insn (gen_not (n, cmp));
+ cmp = n;
+   }
+ emit_insn (gen (dest, cmp));
+ return;
+   }
+}
+  else if (vector_all_ones_operand (op_true, mode)
+  && op_false == CONST0_RTX (mode))
 {
   emit_insn (gen_rtx_SET (dest, cmp));
+  return;
 }
-  else if (op_false == CONST0_RTX (mode) && !maskcmp)
+  else if (op_false == CONST0_RTX (mode))
 {
   op_true = force_reg (mode, op_true);
   x = gen_rtx_AND (mode, cmp, op_true);
   emit_insn (gen_rtx_SET (dest, x));
+  return;
 }
-  else if (op_true == CONST0_RTX (mode) && !maskcmp)
+  else if (op_true == CONST0_RTX (mode))
 {
   op_false = force_reg (mode, op_false);
   x = gen_rtx_NOT (mode, cmp);
   x = gen_rtx_AND (mode, x, op_false);
   emit_insn (gen_rtx_SET (dest, x));
+  return;
 }
-  else if (INTEGRAL_MODE_P (mode) && op_true == CONSTM1_RTX (mode)
-  && !maskcmp)
+  else if (INTEGRAL_MODE_P (mode) && op_true == CONSTM

Re: [C++ PATCH] Constexpr fold even some TREE_CONSTANT ctors (PR c++/87934)

2018-12-19 Thread Jakub Jelinek
On Tue, Dec 18, 2018 at 10:27:56PM -0500, Jason Merrill wrote:
> On 12/18/18 6:19 PM, Jakub Jelinek wrote:
> > On Tue, Dec 18, 2018 at 05:40:03PM -0500, Jason Merrill wrote:
> > > On 12/18/18 3:45 PM, Jakub Jelinek wrote:
> > > > The following testcase FAILs, because parsing creates a TREE_CONSTANT
> > > > CONSTRUCTOR that contains CONST_DECL elts.  cp_fold_r can handle that,
> > > > but constexpr evaluation doesn't touch those CONSTRUCTORs.
> > > > 
> > > > Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok 
> > > > for
> > > > trunk?
> > > 
> > > OK.  I also wonder if store_init_value should use cp_fold_r rather than 
> > > just
> > > cp_fully_fold.
> > 
> > I've been thinking about that already when working on the PR88410 bug.
> > 
> > Do you mean something like following completely untested patch?
> > Perhaps I could add a helper inline so that there is no code repetition
> > between cp_fully_fold and this new function.
> 
> Something like that, yes.

The following does the job too (even the PR88410 ICE is gone with the
cp-gimplify.c change from that patch reverted) and is shorter.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2018-12-19  Jakub Jelinek  

* cp-tree.h (cp_fully_fold_init): Declare.
* cp-gimplify.c (cp_fully_fold_init): New function.
* typeck2.c (split_nonconstant_init, store_init_value): Use it
instead of cp_fully_fold.

--- gcc/cp/cp-tree.h.jj 2018-12-19 09:09:28.251543416 +0100
+++ gcc/cp/cp-tree.h2018-12-19 14:57:54.719812330 +0100
 extern bool cxx_omp_privatize_by_reference (const_tree);
 extern bool cxx_omp_disregard_value_expr   (tree, bool);
 extern void cp_fold_function   (tree);
 extern tree cp_fully_fold  (tree);
+extern tree cp_fully_fold_init (tree);
 extern void clear_fold_cache   (void);
 extern tree lookup_hotness_attribute   (tree);
 extern tree process_stmt_hotness_attribute (tree);
--- gcc/cp/cp-gimplify.c.jj 2018-12-19 09:09:28.335542037 +0100
+++ gcc/cp/cp-gimplify.c2018-12-19 15:00:28.214293053 +0100
@@ -2171,6 +2171,20 @@ cp_fully_fold (tree x)
   return cp_fold_rvalue (x);
 }
 
+/* Likewise, but also fold recursively, which cp_fully_fold doesn't perform
+   in some cases.  */
+
+tree
+cp_fully_fold_init (tree x)
+{
+  if (processing_template_decl)
+return x;
+  x = cp_fully_fold (x);
+  hash_set<tree> pset;
+  cp_walk_tree (&x, cp_fold_r, &pset, NULL);
+  return x;
+}
+
 /* c-common interface to cp_fold.  If IN_INIT, this is in a static initializer
and certain changes are made to the folding done.  Or should be (FIXME).  We
never touch maybe_const, as it is only used for the C front-end
--- gcc/cp/typeck2.c.jj 2018-12-19 09:09:28.401540956 +0100
+++ gcc/cp/typeck2.c2018-12-19 14:57:54.736812061 +0100
@@ -750,7 +750,7 @@ split_nonconstant_init (tree dest, tree
 init = TARGET_EXPR_INITIAL (init);
   if (TREE_CODE (init) == CONSTRUCTOR)
 {
-  init = cp_fully_fold (init);
+  init = cp_fully_fold_init (init);
   code = push_stmt_list ();
   if (split_nonconstant_init_1 (dest, init))
init = NULL_TREE;
@@ -858,7 +858,7 @@ store_init_value (tree decl, tree init,
   if (!const_init)
value = oldval;
 }
-  value = cp_fully_fold (value);
+  value = cp_fully_fold_init (value);
 
   /* Handle aggregate NSDMI in non-constant initializers, too.  */
   value = replace_placeholders (value, decl);


Jakub


[C++ PATCH] Fix up cp_parser_class_specifier_1 error recovery (PR c++/88180, take 2)

2018-12-19 Thread Jakub Jelinek
Hi!

On Tue, Dec 18, 2018 at 05:29:41PM -0500, Jason Merrill wrote:
> So, we end up calling ggc_collect because we're processing a member function
> in a context where defining a type is not allowed.  One solution would be to
> not do late parsing of members in such a context.
> 
> We don't have this problem with lambdas because cp_parser_lambda_body
> already increments function_depth to avoid GC in the middle of an
> expression.

So like this?  We already have similar treatment for error-recovery
if template arguments are erroneous.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2018-12-19  Jakub Jelinek  

PR c++/88180
* parser.c (cp_parser_class_specifier_1): If
cp_parser_check_type_definition fails, skip default arguments, NSDMIs,
etc. like for erroneous template args.

* g++.dg/parse/pr88180.C: New test.
* g++.dg/pr85039-1.C: Don't expect diagnostics inside of the type
definition's NSDMIs.

--- gcc/cp/parser.c.jj  2018-12-18 22:44:59.229131699 +0100
+++ gcc/cp/parser.c 2018-12-19 11:35:17.250161052 +0100
@@ -23106,7 +23106,7 @@ cp_parser_class_specifier_1 (cp_parser*
   cp_ensure_no_oacc_routine (parser);
 
   /* Issue an error message if type-definitions are forbidden here.  */
-  cp_parser_check_type_definition (parser);
+  bool type_definition_ok_p = cp_parser_check_type_definition (parser);
   /* Remember that we are defining one more class.  */
   ++parser->num_classes_being_defined;
   /* Inside the class, surrounding template-parameter-lists do not
@@ -23301,7 +23301,7 @@ cp_parser_class_specifier_1 (cp_parser*
   cp_default_arg_entry *e;
   tree save_ccp, save_ccr;
 
-  if (any_erroneous_template_args_p (type))
+  if (!type_definition_ok_p || any_erroneous_template_args_p (type))
{
  /* Skip default arguments, NSDMIs, etc, in order to improve
 error recovery (c++/71169, c++/71832).  */
--- gcc/testsuite/g++.dg/parse/pr88180.C.jj 2018-12-19 11:25:39.565627093 
+0100
+++ gcc/testsuite/g++.dg/parse/pr88180.C2018-12-19 11:25:39.565627093 
+0100
@@ -0,0 +1,12 @@
+// PR c++/88180
+// { dg-do compile }
+// { dg-options "--param ggc-min-heapsize=1024" }
+
+struct d {
+  static d *b;
+} * d::b(__builtin_offsetof(struct { // { dg-error "types may not be defined" }
+  int i;
+  struct a { // { dg-error "types may not be defined" }
+int c() { return .1f; }
+  };
+}, i));
--- gcc/testsuite/g++.dg/pr85039-1.C.jj 2018-04-17 09:01:04.023044471 +0200
+++ gcc/testsuite/g++.dg/pr85039-1.C2018-12-20 00:09:32.348914862 +0100
@@ -5,9 +5,9 @@ constexpr int a() {
   __builtin_offsetof(struct { // { dg-error "types may not be defined" }
 int i;
 short b {
-  __builtin_offsetof(struct { // { dg-error "types may not be defined" }
+  __builtin_offsetof(struct {
int j;
-struct c { // { dg-error "types may not be defined" }
+struct c {
   void d() {
   }
 };


Jakub


Re: [EXT] Re: [Patch 2/4][Aarch64] v2: Implement Aarch64 SIMD ABI

2018-12-19 Thread Steve Ellcey
On Wed, 2018-12-19 at 23:57 +0100, Jakub Jelinek wrote:
> On Wed, Dec 19, 2018 at 10:10:19PM +, Steve Ellcey wrote:
> > @@ -199,6 +201,7 @@ int B::f25<7> (int a, int *b, int c)
> >  // { dg-final { scan-assembler-times
> > "_ZGVdN8vuva32u__ZN1BIiE3f25ILi7EEEiiPii:" 1 { target { i?86-*-*
> > x86_64-*-* } } } }
> >  // { dg-final { scan-assembler-times
> > "_ZGVeM8vuva32u__ZN1BIiE3f25ILi7EEEiiPii:" 1 { target { i?86-*-*
> > x86_64-*-* } } } }
> >  // { dg-final { scan-assembler-times
> > "_ZGVeN8vuva32u__ZN1BIiE3f25ILi7EEEiiPii:" 1 { target { i?86-*-*
> > x86_64-*-* } } } }
> > +// { dg-warning "unsupported argument type 'B' for simd" "" {
> > target aarch64-*-* } 191 }
> 
> Can you use relative line number instead, like .-10 or so?

That sounds like a good idea.

> 
> > @@ -62,7 +65,7 @@ int f3 (const int a, const int b, const int c,
> > const int &d, const int &e, const
> >  // { dg-final { scan-assembler-times
> > "_ZGVdM8vulLUR4__Z2f3iiiRKiS0_S0_:" 1 { target { i?86-*-* x86_64-*-
> > * } } } }
> >  // { dg-final { scan-assembler-times
> > "_ZGVdN8vulLUR4__Z2f3iiiRKiS0_S0_:" 1 { target { i?86-*-* x86_64-*-
> > * } } } }
> >  // { dg-final { scan-assembler-times
> > "_ZGVeM16vulLUR4__Z2f3iiiRKiS0_S0_:" 1 { target { i?86-*-* x86_64-
> > *-* } } } }
> > -// { dg-final { scan-assembler-times
> > "_ZGVeN16vulLUR4__Z2f3iiiRKiS0_S0_:" 1 { target { i?86-*-* x86_64-
> > *-* } } } }
> > +// { dg-final { scan-assembler-times
> > "_ZGVeN4vulLUR4__Z2f3iiiRKiS0_S0_:" 1 { target { i?86-*-* x86_64-*-
> > * } } } }
> 
> Can you explain this change?  Are you changing the x86 ABI?

No, that is a mistake that snuck in.  None of the x86 lines should
change.  Same for the other x86 changes.  I was changing the aarch64
manglings and obviously messed up some of the x86 ones.  Unfortunately
I did those changes after I did my x86 testing to verify the x86
code change I made so I didn't notice them.  I will fix those so that
no x86 lines are different.

Steve Ellcey
sell...@marvell.com


Re: [EXT] Re: [Patch 2/4][Aarch64] v2: Implement Aarch64 SIMD ABI

2018-12-19 Thread Jakub Jelinek
On Wed, Dec 19, 2018 at 10:10:19PM +, Steve Ellcey wrote:
> @@ -199,6 +201,7 @@ int B::f25<7> (int a, int *b, int c)
>  // { dg-final { scan-assembler-times 
> "_ZGVdN8vuva32u__ZN1BIiE3f25ILi7EEEiiPii:" 1 { target { i?86-*-* x86_64-*-* } 
> } } }
>  // { dg-final { scan-assembler-times 
> "_ZGVeM8vuva32u__ZN1BIiE3f25ILi7EEEiiPii:" 1 { target { i?86-*-* x86_64-*-* } 
> } } }
>  // { dg-final { scan-assembler-times 
> "_ZGVeN8vuva32u__ZN1BIiE3f25ILi7EEEiiPii:" 1 { target { i?86-*-* x86_64-*-* } 
> } } }
> +// { dg-warning "unsupported argument type 'B' for simd" "" { target 
> aarch64-*-* } 191 }

Can you use relative line number instead, like .-10 or so?

> @@ -62,7 +65,7 @@ int f3 (const int a, const int b, const int c, const int 
> &d, const int &e, const
>  // { dg-final { scan-assembler-times "_ZGVdM8vulLUR4__Z2f3iiiRKiS0_S0_:" 1 { 
> target { i?86-*-* x86_64-*-* } } } }
>  // { dg-final { scan-assembler-times "_ZGVdN8vulLUR4__Z2f3iiiRKiS0_S0_:" 1 { 
> target { i?86-*-* x86_64-*-* } } } }
>  // { dg-final { scan-assembler-times "_ZGVeM16vulLUR4__Z2f3iiiRKiS0_S0_:" 1 
> { target { i?86-*-* x86_64-*-* } } } }
> -// { dg-final { scan-assembler-times "_ZGVeN16vulLUR4__Z2f3iiiRKiS0_S0_:" 1 
> { target { i?86-*-* x86_64-*-* } } } }
> +// { dg-final { scan-assembler-times "_ZGVeN4vulLUR4__Z2f3iiiRKiS0_S0_:" 1 { 
> target { i?86-*-* x86_64-*-* } } } }

Can you explain this change?  Are you changing the x86 ABI?

>  #pragma omp declare simd uniform(b) linear(c, d) linear(uval(e)) 
> linear(ref(f))
>  int f4 (const int a, const int b, const int c, const int &d, const int &e, 
> const int &f)
> @@ -83,4 +86,4 @@ int f4 (const int a, const int b, const int c, const int 
> &d, const int &e, const
>  // { dg-final { scan-assembler-times "_ZGVdM8vulLUR4__Z2f4iiiRKiS0_S0_:" 1 { 
> target { i?86-*-* x86_64-*-* } } } }
>  // { dg-final { scan-assembler-times "_ZGVdN8vulLUR4__Z2f4iiiRKiS0_S0_:" 1 { 
> target { i?86-*-* x86_64-*-* } } } }
>  // { dg-final { scan-assembler-times "_ZGVeM16vulLUR4__Z2f4iiiRKiS0_S0_:" 1 
> { target { i?86-*-* x86_64-*-* } } } }
> -// { dg-final { scan-assembler-times "_ZGVeN16vulLUR4__Z2f4iiiRKiS0_S0_:" 1 
> { target { i?86-*-* x86_64-*-* } } } }
> +// { dg-final { scan-assembler-times "_ZGVeN4vulLUR4__Z2f4iiiRKiS0_S0_:" 1 { 
> target { i?86-*-* x86_64-*-* } } } }

Likewise.

> --- a/gcc/testsuite/g++.dg/gomp/declare-simd-4.C
> +++ b/gcc/testsuite/g++.dg/gomp/declare-simd-4.C
> @@ -13,6 +13,8 @@ f1 (int *p, int *q, short *s)
>  // { dg-final { scan-assembler-times "_ZGVdN8l4ln4ln6__Z2f1PiS_Ps:" 1 { 
> target { i?86-*-* x86_64-*-* } } } }
>  // { dg-final { scan-assembler-times "_ZGVeM16l4ln4ln6__Z2f1PiS_Ps:" 1 { 
> target { i?86-*-* x86_64-*-* } } } }
>  // { dg-final { scan-assembler-times "_ZGVeN16l4ln4ln6__Z2f1PiS_Ps:" 1 { 
> target { i?86-*-* x86_64-*-* } } } }
> +// { dg-final { scan-assembler-times "_ZGVnM4l4ln4ln6__Z2f1PiS_Ps:" 1 { 
> target { i?86-*-* x86_64-*-* } } } }
> +// { dg-final { scan-assembler-times "_ZGVnN4l4ln4ln6__Z2f1PiS_Ps:" 1 { 
> target { i?86-*-* x86_64-*-* } } } }

This will also surely fail on x86.

> @@ -21,6 +21,7 @@ int f2 (int a, int *b, int c)
>  /* { dg-final { scan-assembler-times "_ZGVdN8uva32l4_f2:" 1 { target { 
> i?86-*-* x86_64-*-* } } } } */
>  /* { dg-final { scan-assembler-times "_ZGVeM8uva32l4_f2:" 1 { target { 
> i?86-*-* x86_64-*-* } } } } */
>  /* { dg-final { scan-assembler-times "_ZGVeN8uva32l4_f2:" 1 { target { 
> i?86-*-* x86_64-*-* } } } } */
> +/* { dg-warning "GCC does not currently support simdlen 8 for type 'int'" "" 
> { target aarch64-*-* } 11 } */

.-x here too.

Jakub


Re: [PATCH] PR fortran/87992 -- trivially stupid patch, but ...

2018-12-19 Thread Steve Kargl
On Sun, Dec 16, 2018 at 09:42:25AM -0800, Steve Kargl wrote:
> The following patch removes the ICE reported in PR fortran/87992,
> and restores the behavior observed with gfortran 7 and 8 (ie,
> code compiles).
> 
> The PR marks the code with ice-on-invalid-code.  I don't use
> CLASS in any of code and have never read the standard nor a
> Fortran book about CLASS.  If the code is invalid, is gfortran
> required by a constraint to reject the code.  If yes, someone
> with CLASS will need to address this PR; otherwise, I will
> commit the patch and close it as FIXED.
> 
> PS: the patch simply checks for a non-NULL pointer.
> 
> Index: gcc/fortran/resolve.c
> ===
> --- gcc/fortran/resolve.c (revision 267190)
> +++ gcc/fortran/resolve.c (working copy)
> @@ -12313,7 +12313,11 @@ resolve_fl_variable (gfc_symbol *sym, int mp_flag)
>  {
>/* Make sure that character string variables with assumed length are
>dummy arguments.  */
> -  e = sym->ts.u.cl->length;
> +  if (sym->ts.u.cl)
> + e = sym->ts.u.cl->length;
> +  else
> + return false;
> +
>if (e == NULL && !sym->attr.dummy && !sym->attr.result
> && !sym->ts.deferred && !sym->attr.select_type_temporary
> && !sym->attr.omp_udr_artificial_var)
> Index: gcc/testsuite/gfortran.dg/pr87992.f90
> ===
> --- gcc/testsuite/gfortran.dg/pr87992.f90 (nonexistent)
> +++ gcc/testsuite/gfortran.dg/pr87992.f90 (working copy)
> @@ -0,0 +1,5 @@
> +! { dg-do compile }
> +subroutine s(x)
> +   class(*), allocatable :: x
> +   x = ''
> +end
> 

Patched committed on trunk after verification from
Gerhard that the code is valid Fortran.

-- 
Steve


Re: [EXT] Re: [Patch 2/4][Aarch64] v2: Implement Aarch64 SIMD ABI

2018-12-19 Thread Steve Ellcey
Here is an updated version of the GCC patch to enable SIMD functions on
Aarch64.  There are a number of changes from the last patch.

I reduced the shared code changes; there is still one change in shared code
(omp-simd-clone.c) to call targetm.simd_clone.adjust from expand_simd_clones,
but it now uses the same argument as the existing call.  This new call allows
Aarch64 to add the aarch64_vector_pcs attribute to SIMD clone definitions
which in turn ensures they use the correct ABI.  Previously this target
function was only called on declarations, not definitions.  This change affects
the x86 target so I modified ix86_simd_clone_adjust to return and do nothing
when called with a definition.  This means there is no change in behaviour
on x86.  I did a build and GCC testsuite run on x86 to verify this.
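
As an illustration (the function below is hypothetical, not part of the patch
or its testsuite changes), this is the kind of definition affected: built with
-fopenmp-simd on aarch64, the SIMD clones generated for its definition now
receive the aarch64_vector_pcs attribute and so follow the vector PCS.

#pragma omp declare simd
int
f (int x)
{
  return x + 1;
}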

Most of the changes from the previous patch are in the
aarch64_simd_clone_compute_vecsize_and_simdlen function.

The previous version was heavily based on the x86 function; this one has
changes to address the issues that were raised in the earlier patch,
so it no longer looks like the x86 version.  I use types instead of modes
to check for what we can/cannot vectorize and I (try to) differentiate
between vectors that we are not currently handling (but could later) and
those that won't ever be handled.

I have also added a testsuite patch to fix regressions in the gcc.dg/gomp
and g++.dg/gomp tests.  There are no regressions with this patch applied.

Steve Ellcey
sell...@marvell.com


2018-12-19  Steve Ellcey  

* config/aarch64/aarch64.c (cgraph.h): New include.
(supported_simd_type): New function.
(currently_supported_simd_type): Ditto.
(aarch64_simd_clone_compute_vecsize_and_simdlen): Ditto.
(aarch64_simd_clone_adjust): Ditto.
(aarch64_simd_clone_usable): Ditto.
(TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN): New macro.
(TARGET_SIMD_CLONE_ADJUST): Ditto.
(TARGET_SIMD_CLONE_USABLE): Ditto.
* config/i386/i386.c (ix86_simd_clone_adjust): Add definition check.
* omp-simd-clone.c (expand_simd_clones): Add targetm.simd_clone.adjust
call.

2018-12-19  Steve Ellcey  

* g++.dg/gomp/declare-simd-1.C: Add aarch64 specific
warning checks and assembler scans.
* g++.dg/gomp/declare-simd-3.C: Ditto.
* g++.dg/gomp/declare-simd-4.C: Ditto.
* g++.dg/gomp/declare-simd-7.C: Ditto.
* gcc.dg/gomp/declare-simd-1.c: Ditto.
* gcc.dg/gomp/declare-simd-3.c: Ditto.


 
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 6038494..e61f6e1 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -40,6 +40,7 @@
 #include "regs.h"
 #include "emit-rtl.h"
 #include "recog.h"
+#include "cgraph.h"
 #include "diagnostic.h"
 #include "insn-attr.h"
 #include "alias.h"
@@ -71,6 +72,7 @@
 #include "selftest.h"
 #include "selftest-rtl.h"
 #include "rtx-vector-builder.h"
+#include "intl.h"
 
 /* This file should be included last.  */
 #include "target-def.h"
@@ -18064,6 +18066,138 @@ aarch64_estimated_poly_value (poly_int64 val)
   return val.coeffs[0] + val.coeffs[1] * over_128 / 128;
 }
 
+
+/* Return true for types that could be supported as SIMD return or
+   argument types.  */
+
+static bool supported_simd_type (tree t)
+{
+  return (FLOAT_TYPE_P (t) || INTEGRAL_TYPE_P (t));
+}
+
+/* Return true for types that currently are supported as SIMD return
+   or argument types.  */
+
+static bool currently_supported_simd_type (tree t)
+{
+  if (COMPLEX_FLOAT_TYPE_P (t))
+return false;
+
+  return supported_simd_type (t);
+}
+
+/* Implement TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN.  */
+
+static int
+aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
+	struct cgraph_simd_clone *clonei,
+	tree base_type,
+	int num ATTRIBUTE_UNUSED)
+{
+  const char *wmsg;
+  int vsize;
+  tree t, ret_type, arg_type;
+
+  if (!TARGET_SIMD)
+return 0;
+
+  if (clonei->simdlen
+  && (clonei->simdlen < 2
+	  || clonei->simdlen > 1024
+	  || (clonei->simdlen & (clonei->simdlen - 1)) != 0))
+{
+  warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
+		  "unsupported simdlen %d", clonei->simdlen);
+  return 0;
+}
+
+  ret_type = TREE_TYPE (TREE_TYPE (node->decl));
+  if (TREE_CODE (ret_type) != VOID_TYPE
+  && !currently_supported_simd_type (ret_type))
+{
+  if (supported_simd_type (ret_type))
+	wmsg = G_("GCC does not currently support return type %qT for simd");
+  else
+	wmsg = G_("unsupported return type %qT for simd");
+  warning_at (DECL_SOURCE_LOCATION (node->decl), 0, wmsg, ret_type);
+  return 0;
+}
+
+  for (t = DECL_ARGUMENTS (node->decl); t; t = DECL_CHAIN (t))
+{
+  arg_type = TREE_TYPE (t);
+  if (POINTER_TYPE_P (arg_type))
+	arg_type = TREE_TYPE (arg_type);
+  if (!currently_supported_simd_type (arg_type))
+	{
+	  if (supported_simd_type (arg

For libgomp OpenACC entry points, redefine the "device" argument to "flags"

2018-12-19 Thread Thomas Schwinge
Hi Jakub!

On Wed, 19 Dec 2018 15:18:12 +0100, Jakub Jelinek  wrote:
> On Wed, Dec 19, 2018 at 03:03:42PM +0100, Jakub Jelinek wrote:
> > On Wed, Dec 19, 2018 at 02:59:54PM +0100, Thomas Schwinge wrote:
> > > Right.  For OpenACC, there's no "device" clause, so we only ever passed
> > > in "GOMP_DEVICE_ICV" (default), or "GOMP_DEVICE_HOST_FALLBACK" ("if
> > > (false)" clause).  Therefore, the libgomp "resolve_legacy_flags" function
> > > added to make sure that these two values (as used by old executables)
> > > continue to work as before (with new libgomp).  (And, we have to make
> > > sure that no (new) "GOACC_FLAG_*" combination ever results in these
> > > values; will document that.)

> > LGTM then in principle.
> 
> Or keep it int and use inverted bitmask, thus when bit is 1, it represents
> the default state and when bit is 0, it is something different from it.

Ha, I too had that idea after thinking some more about the "-1" and "-2"
values/representations...  :-)

> If you passed before just -1 and -2 and because we are only supporting two's
> complement, the host fallback test would be (flags & 1) == 0.

I structured that a bit more conveniently.  That's especially useful once
additional flags added, where you want to just do "flags |= [flag]", etc.

> Then you don't need to at runtime transform from legacy to non-legacy.

Right.

Is the attached OK for trunk?  If approving this patch, please respond
with "Reviewed-by: NAME " so that your effort will be recorded in
the commit log, see .

For your review convenience, here's the "gcc/omp-expand.c" changes with
"--ignore-space-change" (as I slightly restructured OpenACC vs. OpenMP
code paths):

@@ -7536,49 +7536,62 @@ expand_omp_target (struct omp_region *region)
 
   clauses = gimple_omp_target_clauses (entry_stmt);
 
-  /* By default, the value of DEVICE is GOMP_DEVICE_ICV (let runtime
- library choose) and there is no conditional.  */
-  cond = NULL_TREE;
-  device = build_int_cst (integer_type_node, GOMP_DEVICE_ICV);
-
-  c = omp_find_clause (clauses, OMP_CLAUSE_IF);
-  if (c)
-cond = OMP_CLAUSE_IF_EXPR (c);
-
+  device = NULL_TREE;
+  tree goacc_flags = NULL_TREE;
+  if (is_gimple_omp_oacc (entry_stmt))
+{
+  /* By default, no GOACC_FLAGs are set.  */
+  goacc_flags = integer_zero_node;
+}
+  else
+{
   c = omp_find_clause (clauses, OMP_CLAUSE_DEVICE);
   if (c)
{
-  /* Even if we pass it to all library function calls, it is currently 
only
-defined/used for the OpenMP target ones.  */
-  gcc_checking_assert (start_ix == BUILT_IN_GOMP_TARGET
-  || start_ix == BUILT_IN_GOMP_TARGET_DATA
-  || start_ix == BUILT_IN_GOMP_TARGET_UPDATE
-  || start_ix == BUILT_IN_GOMP_TARGET_ENTER_EXIT_DATA);
-
  device = OMP_CLAUSE_DEVICE_ID (c);
  clause_loc = OMP_CLAUSE_LOCATION (c);
}
   else
+   {
+ /* By default, the value of DEVICE is GOMP_DEVICE_ICV (let runtime
+library choose).  */
+ device = build_int_cst (integer_type_node, GOMP_DEVICE_ICV);
  clause_loc = gimple_location (entry_stmt);
+   }
 
   c = omp_find_clause (clauses, OMP_CLAUSE_NOWAIT);
   if (c)
flags_i |= GOMP_TARGET_FLAG_NOWAIT;
+}
 
+  /* By default, there is no conditional.  */
+  cond = NULL_TREE;
+  c = omp_find_clause (clauses, OMP_CLAUSE_IF);
+  if (c)
+cond = OMP_CLAUSE_IF_EXPR (c);
+  /* If we found the clause 'if (cond)', build:
+ OpenACC: goacc_flags = (cond ? goacc_flags : flags | 
GOACC_FLAG_HOST_FALLBACK)
+ OpenMP: device = (cond ? device : GOMP_DEVICE_HOST_FALLBACK) */
+  if (cond)
+{
+  tree *tp;
+  if (is_gimple_omp_oacc (entry_stmt))
+   tp = &goacc_flags;
+  else
+   {
  /* Ensure 'device' is of the correct type.  */
  device = fold_convert_loc (clause_loc, integer_type_node, device);
 
-  /* If we found the clause 'if (cond)', build
- (cond ? device : GOMP_DEVICE_HOST_FALLBACK).  */
-  if (cond)
-{
+ tp = &device;
+   }
+
   cond = gimple_boolify (cond);
 
   basic_block cond_bb, then_bb, else_bb;
   edge e;
   tree tmp_var;
 
-  tmp_var = create_tmp_var (TREE_TYPE (device));
+  tmp_var = create_tmp_var (TREE_TYPE (*tp));
   if (offloaded)
e = split_block_after_labels (new_bb);
   else
@@ -7601,10 +7614,17 @@ expand_omp_target (struct omp_region *region)
   gsi_insert_after (&gsi, stmt, GSI_CONTINUE_LINKING);
 
   gsi = gsi_start_bb (then_bb);
-  stmt = gimple_build_assign (tmp_var, device);
+  stmt = gimple_build_assign (tmp_var

Re: [PATCH] LWG 2936: update path::compare logic and optimize string comparisons

2018-12-19 Thread Christophe Lyon
On Tue, 18 Dec 2018 at 16:51, Jonathan Wakely  wrote:
>
> The resolution for LWG 2936 defines the comparison more precisely, which
> this patch implements. The patch also defines comparisons with strings
> to work without constructing a temporary path object (so avoids any
> memory allocations).
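
A minimal usage sketch of what that means (hypothetical code, not taken from
the testsuite): both calls below are intended to give the same ordering as
comparing against path("/a/c"), but the string and character-pointer overloads
now compare through a string view without constructing a temporary path.

#include <filesystem>
#include <string>

int
cmp ()
{
  std::filesystem::path p{"/a/b"};
  std::string s{"/a/c"};
  return p.compare (s) + p.compare ("/a/c");
}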
>
> * include/bits/fs_path.h (path::compare(const string_type&))
> (path::compare(const value_type*)): Add noexcept and construct a
> string view to compare to instead of a path.
> (path::compare(basic_string_view)): Add noexcept. Remove
> inline definition.
> * src/filesystem/std-path.cc (path::_Parser): Track last type read
> from input.
> (path::_Parser::next()): Return a final empty component when the
> input ends in a non-root directory separator.
> (path::_M_append(basic_string_view)): Remove special cases
> for trailing non-root directory separator.
> (path::_M_concat(basic_string_view)): Likewise.
> (path::compare(const path&)): Implement LWG 2936.
> (path::compare(basic_string_view)): Define in terms of
> components returned by parser, consistent with LWG 2936.
> * testsuite/27_io/filesystem/path/compare/lwg2936.cc: New.
> * testsuite/27_io/filesystem/path/compare/path.cc: Test more cases.
> * testsuite/27_io/filesystem/path/compare/strings.cc: Likewise.
>
> Tested x86_64-linux, committed to trunk.
>

Hi,

The updated test fails on aarch64-linux-gnu:
FAIL: 27_io/filesystem/path/compare/strings.cc execution test

In the logs I can see:
/libstdc++-v3/testsuite/27_io/filesystem/path/compare/strings.cc:40:
void test01(): Assertion 'p.compare(p0) == p.compare(s0)' failed.

Christophe


[PATCH, og8] Add OpenACC 2.6 `no_create' clause support

2018-12-19 Thread Maciej W. Rozycki
The clause makes any device code use the local memory address for each 
of the variables specified unless the given variable is already present 
on the current device.
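
A usage sketch, assuming -fopenacc (the code below is hypothetical, not one of
the new tests): 'a' is present on the device because of the enclosing data
region, so no_create maps the existing device copy; 'b' is not present, so no
device copy is created and device code would see its host address.

float a[1024], b[1024];

void
f (void)
{
#pragma acc data copy(a)
  {
#pragma acc parallel loop no_create(a, b)
    for (int i = 0; i < 1024; i++)
      a[i] += 1.0f;
  }
}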

2018-12-19  Julian Brown  
Maciej W. Rozycki  

gcc/
* omp-low.c (lower_omp_target): Support GOMP_MAP_NO_ALLOC.
* tree-pretty-print.c (dump_omp_clause): Likewise.

gcc/c-family/
* c-pragma.h (pragma_omp_clause): Add
PRAGMA_OACC_CLAUSE_NO_CREATE.

gcc/c/
* c-parser.c (c_parser_omp_clause_name): Support no_create.
(c_parser_oacc_data_clause): Likewise.
(c_parser_oacc_all_clauses): Likewise.
(OACC_DATA_CLAUSE_MASK, OACC_KERNELS_CLAUSE_MASK)
(OACC_PARALLEL_CLAUSE_MASK, OACC_SERIAL_CLAUSE_MASK): Add
PRAGMA_OACC_CLAUSE_NO_CREATE.
* c-typeck.c (handle_omp_array_sections): Support
GOMP_MAP_NO_ALLOC.

gcc/cp/
* parser.c (cp_parser_omp_clause_name): Support no_create.
(cp_parser_oacc_data_clause): Likewise.
(cp_parser_oacc_all_clauses): Likewise.
(OACC_DATA_CLAUSE_MASK, OACC_KERNELS_CLAUSE_MASK)
(OACC_PARALLEL_CLAUSE_MASK, OACC_SERIAL_CLAUSE_MASK): Add
PRAGMA_OACC_CLAUSE_NO_CREATE.
* semantics.c (handle_omp_array_sections): Support no_create.

gcc/fortran/
* gfortran.h (gfc_omp_map_op): Add OMP_MAP_NO_ALLOC.
* openmp.c (omp_mask2): Add OMP_CLAUSE_NO_CREATE.
(gfc_match_omp_clauses): Support no_create.
(OACC_PARALLEL_CLAUSES, OACC_KERNELS_CLAUSES)
(OACC_SERIAL_CLAUSES, OACC_DATA_CLAUSES): Add
OMP_CLAUSE_NO_CREATE.
* trans-openmp.c (gfc_trans_omp_clauses_1): Support
OMP_MAP_NO_ALLOC.

include/
* gomp-constants.h (gomp_map_kind): Support GOMP_MAP_NO_ALLOC.

libgomp/
* target.c (gomp_map_vars_async): Support GOMP_MAP_NO_ALLOC.
* testsuite/libgomp.oacc-c-c++-common/nocreate-1.c: New test.
* testsuite/libgomp.oacc-c-c++-common/nocreate-2.c: New test.
* testsuite/libgomp.oacc-c-c++-common/nocreate-3.c: New test.
* testsuite/libgomp.oacc-c-c++-common/nocreate-4.c: New test.
* testsuite/libgomp.oacc-fortran/nocreate-1.f90: New test.
* testsuite/libgomp.oacc-fortran/nocreate-2.f90: New test.
---
Hi,

 This has passed regression-testing with the `x86_64-linux-gnu' target and
the `nvptx-none' offload target, across the `gcc', `g++', `gfortran' and
`libgomp' test suites.  I will appreciate feedback and if none has been
given shortly, then I will commit this change to the og8 branch.

  Maciej
---
 gcc/c-family/c-pragma.h  |1 
 gcc/c/c-parser.c |   20 
 gcc/c/c-typeck.c |1 
 gcc/cp/parser.c  |   20 
 gcc/cp/semantics.c   |1 
 gcc/fortran/gfortran.h   |1 
 gcc/fortran/openmp.c |   15 ++-
 gcc/fortran/trans-openmp.c   |3 
 gcc/omp-low.c|2 
 gcc/tree-pretty-print.c  |3 
 include/gomp-constants.h |2 
 libgomp/target.c |   53 +
 libgomp/testsuite/libgomp.oacc-c-c++-common/nocreate-1.c |   40 +
 libgomp/testsuite/libgomp.oacc-c-c++-common/nocreate-2.c |   28 ++
 libgomp/testsuite/libgomp.oacc-c-c++-common/nocreate-3.c |   38 +
 libgomp/testsuite/libgomp.oacc-c-c++-common/nocreate-4.c |   42 ++
 libgomp/testsuite/libgomp.oacc-fortran/nocreate-1.f90|   29 +++
 libgomp/testsuite/libgomp.oacc-fortran/nocreate-2.f90|   61 +++
 18 files changed, 352 insertions(+), 8 deletions(-)

gcc-openacc-no-create.diff
Index: gcc-openacc-gcc-8-branch/gcc/c-family/c-pragma.h
===
--- gcc-openacc-gcc-8-branch.orig/gcc/c-family/c-pragma.h
+++ gcc-openacc-gcc-8-branch/gcc/c-family/c-pragma.h
@@ -147,6 +147,7 @@ enum pragma_omp_clause {
   PRAGMA_OACC_CLAUSE_GANG,
   PRAGMA_OACC_CLAUSE_HOST,
   PRAGMA_OACC_CLAUSE_INDEPENDENT,
+  PRAGMA_OACC_CLAUSE_NO_CREATE,
   PRAGMA_OACC_CLAUSE_NOHOST,
   PRAGMA_OACC_CLAUSE_NUM_GANGS,
   PRAGMA_OACC_CLAUSE_NUM_WORKERS,
Index: gcc-openacc-gcc-8-branch/gcc/c/c-parser.c
===
--- gcc-openacc-gcc-8-branch.orig/gcc/c/c-parser.c
+++ gcc-openacc-gcc-8-branch/gcc/c/c-parser.c
@@ -11315,7 +11315,9 @@ c_parser_omp_clause_name (c_parser *pars
result = PRAGMA_OMP_CLAUSE_MERGEABLE;
  break;
case 'n':
- if (!strcmp ("nogroup", p))
+ if (!strcmp ("no_create", p))
+   result = PRAGMA_OACC_CLAUSE_NO_CREATE;
+ else if (!strcmp (

Re: [PATCH v4][C][ADA] use function descriptors instead of trampolines in C

2018-12-19 Thread Wilco Dijkstra
Hi,

Jakub Jelinek wrote:
> On Wed, Dec 19, 2018 at 07:53:48PM +, Uecker, Martin wrote:
>> What do you think about making the trampoline a single call
>> instruction and have a large memory region which is the same
>> page mapped many times?

This sounds like a good idea, but given a function descriptor is 8-16 bytes
it doesn't need to be 1 instruction. You can even go for larger sizes since
all it affects is minimum alignment of function descriptors.
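
For reference, a rough sketch of the descriptor representation being discussed
(layout and names hypothetical, not taken from the patch): a nested-function
value becomes a small data object rather than executable trampoline code, so
only its size and minimum alignment matter.

struct func_descriptor
{
  void (*code) (void);   /* address of the nested function's code */
  void *static_chain;    /* pointer to the enclosing frame */
};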

>> The trampoline handler would pop the instruction pointer and use
>> this as an index into the real stack to read the static chain and
>> function pointer.
>
> While you save a few bytes per trampoline that way, it is heavily call-ret
> stack unfriendly, so it will not be very fast.

A repeated page adjacent to the stack is a good idea since it avoids adding
runtime support to push/pop nested function addresses. That would be
inefficient and likely very tricky for setjmp and exception handling
(or leak memory).

Since it can use several instructions we could load the static chain register
with the PC for example. On ISAs that don't support PC-relative addressing
you could do a call/ret sequence to get the PC and then tailcall the helper
to keep the return stack intact.

If computing the difference between the stack and trampoline region takes
just a few instructions (eg. thread local storage) then it could even be 
inlined.

Wilco 


[PATCH, i386]: Use kortest instead of ktest in *cmp_ccz_1

2018-12-19 Thread Uros Bizjak
Hello!

Attached patch uses equivalent instruction, where HImode variant is
also enabled for plain AVX512F isa.
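
A sketch of the kind of source that can exercise the pattern, assuming
-O2 -mavx512vl -mavx512dq as in the avx512dq-pr82855.c test (this example is
illustrative, not part of the patch): the mask-against-zero comparison may now
be emitted as kortestb rather than ktestb.

#include <immintrin.h>

int
any_equal (__m256i a, __m256i b)
{
  __mmask8 m = _mm256_cmpeq_epi32_mask (a, b);
  return m != 0;
}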

2018-12-19  Uros Bizjak  

* config/i386/i386.md (SWI1248_AVX512BWDQ_64): Rename from
SWI1248_AVX512BWDQ2_64.  Unconditionally enable HImode.
(*cmp_ccz_1): Emit kortest instead of ktest insn.
Use SWI1248_AVX512BWDQ_64 mode iterator and enable only for
TARGET_AVX512F.

testsuite/ChangeLog:

2018-12-19  Uros Bizjak  

* gcc.target/i386/avx512dq-pr82855.c: Update scan-assembler pattern.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to mailine SVN.

Uros.
Index: config/i386/i386.md
===
--- config/i386/i386.md (revision 267276)
+++ config/i386/i386.md (working copy)
@@ -1244,20 +1244,20 @@
(compare:CC (match_operand:SWI48 0 "nonimmediate_operand")
(match_operand:SWI48 1 "")))])
 
-(define_mode_iterator SWI1248_AVX512BWDQ2_64
-  [(QI "TARGET_AVX512DQ") (HI "TARGET_AVX512DQ")
+(define_mode_iterator SWI1248_AVX512BWDQ_64
+  [(QI "TARGET_AVX512DQ") HI
(SI "TARGET_AVX512BW") (DI "TARGET_AVX512BW && TARGET_64BIT")])
 
 (define_insn "*cmp_ccz_1"
   [(set (reg FLAGS_REG)
-   (compare (match_operand:SWI1248_AVX512BWDQ2_64 0
+   (compare (match_operand:SWI1248_AVX512BWDQ_64 0
"nonimmediate_operand" ",?m,$k")
-(match_operand:SWI1248_AVX512BWDQ2_64 1 "const0_operand")))]
-  "ix86_match_ccmode (insn, CCZmode)"
+(match_operand:SWI1248_AVX512BWDQ_64 1 "const0_operand")))]
+  "TARGET_AVX512F && ix86_match_ccmode (insn, CCZmode)"
   "@
test{}\t%0, %0
cmp{}\t{%1, %0|%0, %1}
-   ktest\t%0, %0"
+   kortest\t%0, %0"
   [(set_attr "type" "test,icmp,msklog")
(set_attr "length_immediate" "0,1,*")
(set_attr "prefix" "*,*,vex")
Index: testsuite/gcc.target/i386/avx512dq-pr82855.c
===
--- testsuite/gcc.target/i386/avx512dq-pr82855.c	(revision 267276)
+++ testsuite/gcc.target/i386/avx512dq-pr82855.c	(working copy)
@@ -1,7 +1,7 @@
 /* PR target/82855 */
 /* { dg-do compile } */
 /* { dg-options "-O2 -mavx512vl -mavx512dq" } */
-/* { dg-final { scan-assembler {\mktestb\M} } } */
+/* { dg-final { scan-assembler {\mkortestb\M} } } */
 
 #include 
 


C++ PATCH to implement deferred parsing of noexcept-specifiers (c++/86476, c++/52869)

2018-12-19 Thread Marek Polacek
Prompted by Jon's observation in 52869, I noticed that we don't treat
a noexcept-specifier as a complete-class context of a class ([class.mem]/6).
As with member function bodies, default arguments, and NSDMIs, names used in
a noexcept-specifier of a member-function can be declared later in the class
body, so we need to wait and parse them at the end of the class.
For that, I've made use of DEFAULT_ARG (now best to be renamed to UNPARSED_ARG).
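
A minimal illustration of the complete-class-context rule (hypothetical code,
not one of the new tests): with deferred parsing, B is found even though it is
declared after the noexcept-specifier that uses it, just as it would be in a
default argument or NSDMI.

struct S
{
  void f () noexcept (B);       // parsed at the end of the class
  static constexpr bool B = true;
};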

This wasn't as easy as I'd anticipated, because I needed to make sure to
* handle well accessing function parameters in the noexcept-specifier,
  hence the maybe_{begin,end}_member_function_processing business,
* not regress diagnostics.  See e.g. noexcept38.C for detecting "looser
  throw specifier", or noexcept39.C, friend decls and redeclaration.
  This is handled by functions like noexcept_override_late_checks and
  check_redeclaration_exception_specification.  I hope that's it.

Compiling libstdc++ was a fairly good stress test, and I've added a bunch
of reduced testcases I've collected along the way.

I also noticed we're not properly detecting using 'this' in static member
functions; tracked in 88548.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2018-12-19  Marek Polacek  

PR c++/86476 - noexcept-specifier is a complete-class context.
PR c++/52869
* cp-tree.def (DEFAULT_ARG): Update commentary.
* cp-tree.h (UNPARSED_NOEXCEPT_SPEC_P): New macro.
(check_redeclaration_exception_specification): Declare.
(maybe_check_throw_specifier): Declare.
* decl.c (check_redeclaration_exception_specification): No longer
static.  Handle UNPARSED_NOEXCEPT_SPEC_P.
* except.c (nothrow_spec_p): Accept DEFAULT_ARG in assert.
* parser.c (cp_parser_noexcept_specification_opt,
cp_parser_late_noexcept_specifier, noexcept_override_late_checks):
Forward-declare.
(unparsed_noexcepts): New macro.
(push_unparsed_function_queues): Update initializer.
(cp_parser_init_declarator): Maybe save the noexcept-specifier to
post process.
(maybe_begin_member_function_processing): New.
(maybe_end_member_function_processing): New.
(cp_parser_class_specifier_1): Implement delayed parsing of
noexcept-specifiers.
(cp_parser_member_declaration): Maybe save the noexcept-specifier to
post process.
(cp_parser_save_noexcept): New.
(cp_parser_late_noexcept_specifier): New.
(noexcept_override_late_checks): New.
(cp_parser_noexcept_specification_opt): Call cp_parser_save_noexcept
instead of the normal processing if needed.
(cp_parser_save_member_function_body): Maybe save the
noexcept-specifier to post process.
* parser.h (cp_unparsed_functions_entry): Add new field to carry
a noexcept-specifier.
* pt.c (dependent_type_p_r): Handle unparsed noexcept expression.
* search.c (maybe_check_throw_specifier): New function, broken out
of...
(check_final_overrider): ...here.  Call maybe_check_throw_specifier.
* tree.c (canonical_eh_spec): Handle UNPARSED_NOEXCEPT_SPEC_P.
(cp_tree_equal): Handle DEFAULT_ARG.
* typeck2.c (merge_exception_specifiers): If an unparsed noexcept
expression has been passed, return it instead of merging it.

* g++.dg/cpp0x/noexcept34.C: New test.
* g++.dg/cpp0x/noexcept35.C: New test.
* g++.dg/cpp0x/noexcept36.C: New test.
* g++.dg/cpp0x/noexcept37.C: New test.
* g++.dg/cpp0x/noexcept38.C: New test.
* g++.dg/cpp0x/noexcept39.C: New test.

diff --git gcc/cp/cp-tree.def gcc/cp/cp-tree.def
index 43d90eb1efb..aa8b752d8f4 100644
--- gcc/cp/cp-tree.def
+++ gcc/cp/cp-tree.def
@@ -209,7 +209,9 @@ DEFTREECODE (USING_STMT, "using_stmt", tcc_statement, 1)
 
 /* An un-parsed default argument.  Holds a vector of input tokens and
a vector of places where the argument was instantiated before
-   parsing had occurred.  */
+   parsing had occurred.  This is also used for delayed NSDMIs and
+   noexcept-specifier parsing.  For a noexcept-specifier, the vector
+   holds a function declaration used for late checking.  */
 DEFTREECODE (DEFAULT_ARG, "default_arg", tcc_exceptional, 0)
 
 /* An uninstantiated/unevaluated noexcept-specification.  For the
diff --git gcc/cp/cp-tree.h gcc/cp/cp-tree.h
index 1d806b782bd..bd3cd200fcb 100644
--- gcc/cp/cp-tree.h
+++ gcc/cp/cp-tree.h
@@ -1193,6 +1193,9 @@ struct GTY (()) tree_default_arg {
 #define UNEVALUATED_NOEXCEPT_SPEC_P(NODE)  \
   (DEFERRED_NOEXCEPT_SPEC_P (NODE) \
&& DEFERRED_NOEXCEPT_PATTERN (TREE_PURPOSE (NODE)) == NULL_TREE)
+#define UNPARSED_NOEXCEPT_SPEC_P(NODE) \
+  ((NODE) && (TREE_PURPOSE (NODE)) \
+   && (TREE_CODE (TREE_PURPOSE (NODE)) == DEFAULT_ARG))
 
 struct GTY (()) tree_deferred_noexcept {
   struct tree_base base;
@@ -643

[PATCH] Fix grammar in libstdc++ ABI history documentation

2018-12-19 Thread Jonathan Wakely

* doc/xml/manual/abi.xml: Add missing word.

Committed to trunk.


commit c8af51b0a2caa1e8a65d5aea28e82cde306f487e
Author: Jonathan Wakely 
Date:   Wed Dec 19 20:15:59 2018 +

Fix grammar in libstdc++ ABI history documentation

* doc/xml/manual/abi.xml: Add missing word.

diff --git a/libstdc++-v3/doc/xml/manual/abi.xml 
b/libstdc++-v3/doc/xml/manual/abi.xml
index 8859e965000..d1e6b989a71 100644
--- a/libstdc++-v3/doc/xml/manual/abi.xml
+++ b/libstdc++-v3/doc/xml/manual/abi.xml
@@ -425,7 +425,7 @@ compatible.
 20160603 which is greater than the
 20160427 value of the macro in the 6.1.0 release,
 but there are features supported in the 6.1.0 release that are not
-supported in 5.4.0 release.
+supported in the 5.4.0 release.
 You also can't test for the exact values listed below to try and
 identify a release, because a snapshot taken from the gcc-5-branch on
 2016-04-27 would have the same value for the macro as the 6.1.0 release


Re: [PATCH v4][C][ADA] use function descriptors instead of trampolines in C

2018-12-19 Thread Jakub Jelinek
On Wed, Dec 19, 2018 at 07:53:48PM +, Uecker, Martin wrote:
> What do you think about making the trampoline a single call
> instruction and have a large memory region which is the same
> page mapped many times?
> 
> 
> call trampoline_handler
> call trampoline_handler
> call trampoline_handler
> ...
> ...
> many identical read-only pages
> ...
> ...
> 
> 
> The trampoline handler would pop the instruction pointer and use
> this as an index into the real stack to read the static chain and
> function pointer.

While you save a few bytes per trampoline that way, it is heavily call-ret
stack unfriendly, so it will not be very fast.

Jakub


Re: [PATCH v4][C][ADA] use function descriptors instead of trampolines in C

2018-12-19 Thread Uecker, Martin
On Tue, Dec 18, 2018 at 17:42 +0100, Jakub Jelinek wrote:
> On Tue, Dec 18, 2018 at 04:33:48PM +, Uecker, Martin wrote:
> > > Yes, something like this. If the trampolines are pre-allocated, this could
> > > even avoid the need to clear the cache on archs where this is needed.
> > 
> > And if we can make the trampolines be all the same (and it somehow derived
> > from the IP where it has to look for the static chain), we could map the
> > same page of pre-allocated trampolines and not use memory on platforms
> > with virtual memory.
> 
> Yeah, if it is e.g. a pair of executable page and data page right after it,
> say for x86_64 page of:
> pushq $0
> jmp .L1
> pushq $1
> jmp .L1
> ...
> push $NNN
> jmp .L1
> # Almost at the end of page
> .L1:
> decode the above pushed number
> read + decrypt the data (both where to jump to and static chain)
> set static chain reg to the static chain data
> jmp *function pointer
> it could just mmap both pages at once PROT_NONE, and then mmap one from the
> file and fill in data in the other page.  Or perhaps one executable and two
> data pages, depending on the exact sizes of needed data vs. code.

What do you think about making the trampoline a single call
instruction and have a large memory region which is the same
page mapped many times?


call trampoline_handler
call trampoline_handler
call trampoline_handler
...
...
many identical read-only pages
...
...


The trampoline handler would pop the instruction pointer and use
this as an index into the real stack to read the static chain and
function pointer.


Creation of a trampoline would consist of storing
static chain and function on the stack (with
right alignment) and simply return the
corresponding address in the shadow stack.
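
A rough sketch of the lookup such a handler would do (names and layout are
hypothetical, not from any posted patch): the return address pushed by the
call slot indexes a parallel data region holding the descriptors.

#include <stddef.h>

/* Per-trampoline data kept in the shadow region.  */
struct tramp_data
{
  void (*fn) (void);     /* the nested function's code address */
  void *static_chain;    /* pointer to the enclosing frame */
};

static struct tramp_data *
lookup_tramp_data (void *ret_addr, void *code_base, size_t slot_size,
                   struct tramp_data *data_base)
{
  size_t idx = ((char *) ret_addr - (char *) code_base) / slot_size;
  return &data_base[idx];
  /* A real handler would load ->static_chain into the static-chain
     register and tail-call ->fn instead of returning to its caller.  */
}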


Best,
Martin




[PATCH][rs6000] avoid using unaligned vsx or lxvd2x/stxvd2x for memcpy/memmove inline expansion

2018-12-19 Thread Aaron Sawdey
Because of POWER9 dd2.1 issues with certain unaligned vsx instructions
to cache inhibited memory, here is a patch that keeps memmove (and memcpy)
inline expansion from using unaligned vsx accesses or any vector load/store
other than lvx/stvx. More description of the issue is here:

https://patchwork.ozlabs.org/patch/814059/

OK for trunk if bootstrap/regtest ok?

Thanks!
   Aaron

2018-12-19  Aaron Sawdey  

* config/rs6000/rs6000-string.c (expand_block_move): Don't use
unaligned vsx and avoid lxvd2x/stxvd2x.
(gen_lvx_v4si_move): New function.


Index: gcc/config/rs6000/rs6000-string.c
===
--- gcc/config/rs6000/rs6000-string.c   (revision 267055)
+++ gcc/config/rs6000/rs6000-string.c   (working copy)
@@ -2669,6 +2669,35 @@
   return true;
 }

+/* Generate loads and stores for a move of v4si mode using lvx/stvx.
+   This uses altivec_{l,st}vx__internal which use unspecs to
+   keep combine from changing what instruction gets used.
+
+   DEST is the destination for the data.
+   SRC is the source of the data for the move.  */
+
+static rtx
+gen_lvx_v4si_move (rtx dest, rtx src)
+{
+  rtx rv = NULL;
+  if (MEM_P (dest))
+{
+  gcc_assert (!MEM_P (src));
+  gcc_assert (GET_MODE (src) == V4SImode);
+  rv = gen_altivec_stvx_v4si_internal (dest, src);
+}
+  else if (MEM_P (src))
+{
+  gcc_assert (!MEM_P (dest));
+  gcc_assert (GET_MODE (dest) == V4SImode);
+  rv = gen_altivec_lvx_v4si_internal (dest, src);
+}
+  else
+gcc_unreachable ();
+
+  return rv;
+}
+
 /* Expand a block move operation, and return 1 if successful.  Return 0
if we should let the compiler generate normal code.

@@ -2721,11 +2750,11 @@

   /* Altivec first, since it will be faster than a string move
 when it applies, and usually not significantly larger.  */
-  if (TARGET_ALTIVEC && bytes >= 16 && (TARGET_EFFICIENT_UNALIGNED_VSX || 
align >= 128))
+  if (TARGET_ALTIVEC && bytes >= 16 && align >= 128)
{
  move_bytes = 16;
  mode = V4SImode;
- gen_func.mov = gen_movv4si;
+ gen_func.mov = gen_lvx_v4si_move;
}
   else if (bytes >= 8 && TARGET_POWERPC64
   && (align >= 64 || !STRICT_ALIGNMENT))



-- 
Aaron Sawdey, Ph.D.  acsaw...@linux.vnet.ibm.com
050-2/C113  (507) 253-7520 home: 507/263-0782
IBM Linux Technology Center - PPC Toolchain



Re: [PATCH v4][C][ADA] use function descriptors instead of trampolines in C

2018-12-19 Thread Uecker, Martin
On Sun, Dec 16, 2018 at 09:13 -0700, Jeff Law wrote:

> It's also important to remember that not every target which uses
> function descriptors uses the LSB.  On some targets the LSB may switch
> between modes (arm vs thumb for example).  So on those targets the use
> of descriptors may imply an even larger minimum alignment.

There is a similar mechanism for pointer-to-member-functions
used by C++. Is this correct on aarch64?

/* By default, the C++ compiler will use the lowest bit of the pointer
   to function to indicate a pointer-to-member-function points to a
   virtual member function.  However, if FUNCTION_BOUNDARY indicates
   function addresses aren't always even, the lowest bit of the delta
   field will be used.  */
#ifndef TARGET_PTRMEMFUNC_VBIT_LOCATION
#define TARGET_PTRMEMFUNC_VBIT_LOCATION \
  (FUNCTION_BOUNDARY >= 2 * BITS_PER_UNIT \
   ? ptrmemfunc_vbit_in_pfn : ptrmemfunc_vbit_in_delta)
#endif


Best,
Martin

Re: [PATCH 1/2] C++: more location wrapper nodes (PR c++/43064, PR c++/43486)

2018-12-19 Thread Thomas Schwinge
Hi David!

I will admit that I don't have researched ;-/ what this is actually all
about, and how it's implemented, but...

On Mon,  5 Nov 2018 15:31:08 -0500, David Malcolm  wrote:
> The C++ frontend gained various location wrapper nodes in r256448 (GCC 8).
> That patch:
>   https://gcc.gnu.org/ml/gcc-patches/2018-01/msg00799.html
> added wrapper nodes around all nodes with !CAN_HAVE_LOCATION_P for:
> 
> * arguments at callsites, and for
> 
> * typeid, alignof, sizeof, and offsetof.
> 
> This is a followup to that patch, adding many more location wrappers
> to the C++ frontend.  It adds location wrappers for nodes with
> !CAN_HAVE_LOCATION_P to:
> 
> * all literal nodes (in cp_parser_primary_expression)
> 
> * all id-expression nodes (in finish_id_expression), except within a
>   decltype.
> 
> * all mem-initializer nodes within a mem-initializer-list
>   (in cp_parser_mem_initializer)
> 
> However, the patch also adds some suppressions: regions in the parser
> for which wrapper nodes will not be created:
> 
> * within a template-parameter-list or template-argument-list (in
>   cp_parser_template_parameter_list and cp_parser_template_argument_list
>   respectively), to avoid encoding the spelling location of the nodes
>   in types.  For example, "array<10>" and "array<10>" are the same type,
>   despite the fact that the two different "10" tokens are spelled in
>   different locations in the source.
> 
> * within a gnu-style attribute (none of our handlers are set up to cope
>   with location wrappers yet)
> 
> * within various OpenMP clauses

... I did wonder why things applicable to OpenMP wouldn't likewise apply
to OpenACC, too?  That is:

>   (cp_parser_omp_all_clauses): Don't create wrapper nodes within
>   OpenMP clauses.
>   (cp_parser_omp_for_loop): Likewise.
>   (cp_parser_omp_declare_reduction_exprs): Likewise.

> @@ -33939,6 +33968,9 @@ cp_parser_omp_all_clauses (cp_parser *parser, 
> omp_clause_mask mask,
>bool first = true;
>cp_token *token = NULL;
>  
> +  /* Don't create location wrapper nodes within OpenMP clauses.  */
> +  auto_suppress_location_wrappers sentinel;
> +
>while (cp_lexer_next_token_is_not (parser->lexer, CPP_PRAGMA_EOL))
>  {
>pragma_omp_clause c_kind;
> @@ -35223,6 +35255,10 @@ cp_parser_omp_for_loop (cp_parser *parser, enum 
> tree_code code, tree clauses,
>   }
>loc = cp_lexer_consume_token (parser->lexer)->location;
>  
> +  /* Don't create location wrapper nodes within an OpenMP "for"
> +  statement.  */
> +  auto_suppress_location_wrappers sentinel;
> +
>matching_parens parens;
>if (!parens.require_open (parser))
>   return NULL;
> @@ -37592,6 +37628,8 @@ cp_parser_omp_declare_reduction_exprs (tree fndecl, 
> cp_parser *parser)
>else
>   {
> cp_parser_parse_tentatively (parser);
> +   /* Don't create location wrapper nodes here.  */
> +   auto_suppress_location_wrappers sentinel;
> tree fn_name = cp_parser_id_expression (parser, /*template_p=*/false,
> /*check_dependency_p=*/true,
> /*template_p=*/NULL,

Shouldn't "cp_parser_oacc_all_clauses" (and "some" other functions?) be
adjusted in the same way?  How would I test that?  (I don't see any
OpenMP test cases added -- I have not yet tried whether any problems
would become apparent when temporarily removing the OpenMP changes cited
above.)


Regards
 Thomas


Re: [C++ Patch] [PR c++/88146] do not crash synthesizing inherited ctor(...)

2018-12-19 Thread Jakub Jelinek
On Wed, Dec 19, 2018 at 04:47:51PM -0200, Alexandre Oliva wrote:
> On Dec 19, 2018, Christophe Lyon  wrote:
> 
> > The new test inh-ctor32.C fails on arm:
> > FAIL: g++.dg/cpp0x/inh-ctor32.C  -std=c++14  (test for warnings, line 208)
> > FAIL: g++.dg/cpp0x/inh-ctor32.C  -std=c++17  (test for warnings, line 208)
> 
> Thanks, sorry about the breakage, I'm looking into it.
> 
> I'm very surprised and puzzled that the messages actually differ across
> targets, but I managed to get the same messages you got, with a cross
> compiler targeting arm-unknown-linux-gnueabi, that are slightly
> different from those I get with a native x86_64-linux-gnu compiler built
> out of the same sources.

ARM returns this from ctors, compared to most other targets that return
void.  Maybe something related to that?

Jakub


Re: [C++ Patch] [PR c++/88146] do not crash synthesizing inherited ctor(...)

2018-12-19 Thread Alexandre Oliva
On Dec 19, 2018, Christophe Lyon  wrote:

> The new test inh-ctor32.C fails on arm:
> FAIL: g++.dg/cpp0x/inh-ctor32.C  -std=c++14  (test for warnings, line 208)
> FAIL: g++.dg/cpp0x/inh-ctor32.C  -std=c++17  (test for warnings, line 208)

Thanks, sorry about the breakage, I'm looking into it.

I'm very surprised and puzzled that the messages actually differ across
targets, but I managed to get the same messages you got, with a cross
compiler targeting arm-unknown-linux-gnueabi, that are slightly
different from those I get with a native x86_64-linux-gnu compiler built
out of the same sources.

-- 
Alexandre Oliva, freedom fighter   https://FSFLA.org/blogs/lxo
Be the change, be Free! FSF Latin America board member
GNU Toolchain EngineerFree Software Evangelist
Hay que enGNUrecerse, pero sin perder la terGNUra jamás-GNUChe


Re: [PATCH AutoFDO]Restoring indirect call value profile transformation

2018-12-19 Thread Andi Kleen
On Wed, Dec 19, 2018 at 06:28:29PM +0100, Richard Biener wrote:
> On Wed, Dec 19, 2018 at 4:41 PM Andi Kleen  wrote:
> >
> > > > We can combine the two together, increasing iteration count and
> > > > decreasing perf count at the same time.  What count would you suggest
> > > > from your experience?
> > >
> > > Can we instead for the tests where we want to test profile use/merge
> > > elide the profiling step and supply the "raw" data in an testsuite 
> > > alternate
> > > file instead?
> >
> > That would be possible, but a drawback is that we wouldn't have an
> > "end2end" test anymore that also tests the interaction with perf
> > and autofdo. Would be good to test these cases too, there were regressions
> > in this before.
> 
> Sure.
> 
> > But perhaps splitting that into two separate tests is reasonable,
> > with the majority of tests running with fake data.
> >
> > This would have the advantage that gcc developers who don't
> > have an autofdo setup (e.g. missing tools or running in virtualization
> > with PMU disabled) would still do most of the regression tests.
> 
> Yes, I think the pros outweigh the cons here.  Well, at least if
> generating such data that works on multiple archs is even possible?

The gcov data that comes out of autofdo is architecture independent
as far as I know. It's mainly counts per line.

In fact even the perf input data should be fairly architecture
independent (except perhaps for endian)

I think it would need a way to write gcov data using text input
(unless you want to put a lot of binaries into the repository)

Also it would need to be adjusted every time a line number
changes in the test cases. I guess best would be if dejagnu
could somehow generate it from test case comments, but I don't
know how complicated that would be.

Doing such updates would be likely difficult with binaries.

In the future if we ever re-add discriminator support
again it would also need some way to specify the correct
discriminator.

I guess for simple test cases it could be ensured it is
always 0.

-Andi


V9 [PATCH] C/C++: Add -Waddress-of-packed-member

2018-12-19 Thread H.J. Lu
On Wed, Dec 19, 2018 at 6:51 AM H.J. Lu  wrote:
>
> On Tue, Dec 18, 2018 at 2:14 PM Jason Merrill  wrote:
> >
> > On 12/18/18 4:12 PM, H.J. Lu wrote:
> > > On Tue, Dec 18, 2018 at 12:36 PM Jason Merrill  wrote:
> > >>
> > >> On 12/18/18 9:10 AM, H.J. Lu wrote:
> > >>> +  switch (TREE_CODE (rhs))
> > >>> +{
> > >>> +case ADDR_EXPR:
> > >>> +  base = TREE_OPERAND (rhs, 0);
> > >>> +  while (handled_component_p (base))
> > >>> + {
> > >>> +   if (TREE_CODE (base) == COMPONENT_REF)
> > >>> + break;
> > >>> +   base = TREE_OPERAND (base, 0);
> > >>> + }
> > >>> +  if (TREE_CODE (base) != COMPONENT_REF)
> > >>> + return NULL_TREE;
> > >>> +  object = TREE_OPERAND (base, 0);
> > >>> +  field = TREE_OPERAND (base, 1);
> > >>> +  break;
> > >>> +case COMPONENT_REF:
> > >>> +  object = TREE_OPERAND (rhs, 0);
> > >>> +  field = TREE_OPERAND (rhs, 1);
> > >>> +  break;
> > >>> +default:
> > >>> +  return NULL_TREE;
> > >>> +}
> > >>> +
> > >>> +  tree context = check_alignment_of_packed_member (type, field);
> > >>> +  if (context)
> > >>> +return context;
> > >>> +
> > >>> +  /* Check alignment of the object.  */
> > >>> +  while (TREE_CODE (object) == COMPONENT_REF)
> > >>> +{
> > >>> +  field = TREE_OPERAND (object, 1);
> > >>> +  context = check_alignment_of_packed_member (type, field);
> > >>> +  if (context)
> > >>> + return context;
> > >>> +  object = TREE_OPERAND (object, 0);
> > >>> +}
> > >>> +
> > >>
> > >> You can see interleaved COMPONENT_REF and ARRAY_REF that this still
> > >> doesn't look like it will handle, something like
> > >>
> > >> struct A
> > >> {
> > >> int i;
> > >> };
> > >>
> > >> struct B
> > >> {
> > >> char c;
> > >> __attribute ((packed)) A ar[4];
> > >> };
> > >>
> > >> B b;
> > >>
> > >> int *p = &b.ar[1].i;
> > >>
> > >> Rather than have a loop in the ADDR_EXPR case of the switch, you can
> > >> handle everything in the lower loop.  And not have a switch at all, just
> > >> strip any ADDR_EXPR before the loop.
> > >
> > > I changed it to
> > >
> > >   if (TREE_CODE (rhs) == ADDR_EXPR)
> > >  rhs = TREE_OPERAND (rhs, 0);
> > >while (handled_component_p (rhs))
> > >  {
> > >if (TREE_CODE (rhs) == COMPONENT_REF)
> > >  break;
> > >rhs = TREE_OPERAND (rhs, 0);
> > >  }
> > >
> > >if (TREE_CODE (rhs) != COMPONENT_REF)
> > >  return NULL_TREE;
> > >
> > >object = TREE_OPERAND (rhs, 0);
> > >field = TREE_OPERAND (rhs, 1);
> >
> > That still doesn't warn about my testcase above.
> >
> > > [hjl@gnu-cfl-1 pr51628-6]$ cat a.i
> > > struct A
> > > {
> > > int i;
> > > } __attribute ((packed));
> > >
> > > struct B
> > > {
> > > char c;
> > > struct A ar[4];
> > > };
> > >
> > > struct B b;
> > >
> > > int *p = &b.ar[1].i;
> >
> > This testcase is importantly different because 'i' is packed, whereas in
> > my testcase only the ar member of B is packed.
> >
> > My suggestion was that this loop:
> >
> > > +  /* Check alignment of the object.  */
> > > +  while (TREE_CODE (object) == COMPONENT_REF)
> > > +{
> > > +  field = TREE_OPERAND (object, 1);
> > > +  context = check_alignment_of_packed_member (type, field);
> > > +  if (context)
> > > + return context;
> > > +  object = TREE_OPERAND (object, 0);
> > > +}
> >
> > could loop over all handled_component_p, but only call
> > check_alignment_of_packed_member for COMPONENT_REF.
>
> Thanks for the hint.  I changed it to
>
>   /* Check alignment of the object.  */
>   while (handled_component_p (object))
> {
>   if (TREE_CODE (object) == COMPONENT_REF)
> {
>   do
> {
>   field = TREE_OPERAND (object, 1);
>   context = check_alignment_of_packed_member (type, field);
>   if (context)
> return context;
>   object = TREE_OPERAND (object, 0);
> }
>   while (TREE_CODE (object) == COMPONENT_REF);
> }
>   else
> object = TREE_OPERAND (object, 0);
> }

I got
[hjl@gnu-cfl-1 pr51628-6]$ cat a.i
struct A
{
   int i;
};

struct B
{
   char c;
   __attribute ((packed)) struct A ar[4];
};

struct B b;

int *p = &b.ar[1].i;
[hjl@gnu-cfl-1 pr51628-6]$ make a.s
/export/build/gnu/tools-build/gcc-debug/build-x86_64-linux/gcc/xgcc
-B/export/build/gnu/tools-build/gcc-debug/build-x86_64-linux/gcc/ -O2
-S a.i
a.i:14:10: warning: taking address of packed member of ‘struct B’ may
result in an unaligned pointer value [-Waddress-of-packed-member]
   14 | int *p = &b.ar[1].i;
  |  ^~
[hjl@gnu-cfl-1 pr51628-6]$

> > > +  if (TREE_CODE (rhs) != COND_EXPR)
> > > +{
> > > +  while (TREE_CODE (rhs) == COMPOUND_EXPR)
> > > + rhs = TREE_OPERAND (rhs, 1);
> >
> > What if you have a COND_EXPR inside a COMPOUND_EXPR?
> >
>
> It works for me:
>
> [hjl@gnu-cfl-1 pr51628-5]$ cat c.i
> struct A 

Re: [PATCH AutoFDO]Restoring indirect call value profile transformation

2018-12-19 Thread Richard Biener
On Wed, Dec 19, 2018 at 4:41 PM Andi Kleen  wrote:
>
> > > We can combine the two together, increasing iteration count and
> > > decreasing perf count at the same time.  What count would you suggest
> > > from your experience?
> >
> > Can we instead for the tests where we want to test profile use/merge
> > elide the profiling step and supply the "raw" data in an testsuite alternate
> > file instead?
>
> That would be possible, but a drawback is that we wouldn't have an
> "end2end" test anymore that also tests the interaction with perf
> and autofdo. Would be good to test these cases too, there were regressions
> in this before.

Sure.

> But perhaps splitting that into two separate tests is reasonable,
> with the majority of tests running with fake data.
>
> This would have the advantage that gcc developers who don't
> have an autofdo setup (e.g. missing tools or running in virtualization
> with PMU disabled) would still do most of the regression tests.

Yes, I think the pros outweigh the cons here.  Well, at least if
generating such data that works on multiple archs is even possible?

Richard.

> -Andi


[nvptx, committed] Add PTX_CTA_SIZE

2018-12-19 Thread Tom de Vries
[ was: Re: [nvptx] vector length patch series ]
On 14-12-18 20:58, Tom de Vries wrote:
> 0005-nvptx-update-openacc-dim-macros.patch

Factored out this patch.

Committed.

Thanks,
- Tom
[nvptx] Add PTX_CTA_SIZE

2018-12-17  Tom de Vries  

	* config/nvptx/nvptx.c (PTX_CTA_SIZE): Define.

---
 gcc/config/nvptx/nvptx.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 2a2d638e6d7..f4095ff5f55 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -86,6 +86,11 @@
 #define PTX_WORKER_LENGTH 32
 #define PTX_DEFAULT_RUNTIME_DIM 0 /* Defer to runtime.  */
 
+/* The PTX concept CTA (Concurrent Thread Array) maps on the CUDA concept thread
+   block, which has had a maximum number of threads of 1024 since CUDA version
+   2.x.  */
+#define PTX_CTA_SIZE 1024
+
 /* The various PTX memory areas an object might reside in.  */
 enum nvptx_data_area
 {


Re: [PATCH AutoFDO]Restoring indirect call value profile transformation

2018-12-19 Thread Andi Kleen
On Wed, Dec 19, 2018 at 12:08:35PM +0800, Bin.Cheng wrote:
> On Wed, Dec 19, 2018 at 12:00 PM Andi Kleen  wrote:
> >
> > On Wed, Dec 19, 2018 at 10:01:15AM +0800, Bin.Cheng wrote:
> > > On Tue, Dec 18, 2018 at 7:15 PM Bin.Cheng  wrote:
> > > >
> > > > On Sun, Dec 16, 2018 at 9:11 AM Andi Kleen  wrote:
> > > > >
> > > > > "bin.cheng"  writes:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > Due to ICE and mal-functional bugs, indirect call value profile 
> > > > > > transformation
> > > > > > is disabled on GCC-7/8/trunk.  This patch restores the 
> > > > > > transformation.  The
> > > > > > main issue is AutoFDO should store cgraph_node's profile_id of 
> > > > > > callee func in
> > > > > > the first histogram value's counter, rather than pointer to 
> > > > > > callee's name string
> > > > > > as it is now.
> > > > > > With the patch, some "Indirect call -> direct call" tests pass with 
> > > > > > autofdo, while
> > > > > > others are unstable.  I think the instability is caused by poor 
> > > > > > perf data collected
> > > > > > during regtest runs, and can confirm these tests pass if good perf 
> > > > > > data could be
> > > > > > collected in manual experiments.
> > > > >
> > > > > Would be good to make the tests stable, otherwise we'll just have
> > > > > regressions in the future again.
> > > > >
> > > > > The problem is that the tests don't run long enough and don't get 
> > > > > enough samples?
> > > > Yes, take g++.dg/tree-prof/morefunc.C as an example:
> > > > -  int i;
> > > > -  for (i = 0; i < 1000; i++)
> > > > +  int i, j;
> > > > +  for (i = 0; i < 100; i++)
> > > > +for (j = 0; j < 50; j++)
> > > >   g += tc->foo();
> > > > if (g<100) g++;
> > > >  }
> > > > @@ -27,8 +28,9 @@ void test1 (A *tc)
> > > >  static __attribute__((always_inline))
> > > >  void test2 (B *tc)
> > > >  {
> > > > -  int i;
> > > > +  int i, j;
> > > >for (i = 0; i < 100; i++)
> > > > +for (j = 0; j < 50; j++)
> > > >
> > > > I have to increase loop count like this to get stable pass on my
> > > > machine.  The original count (1000) is too small to be sampled.
> > > >
> > > > >
> > > > > Could add some loop?
> > > > > Or possibly increase the sampling frequency in perf (-F or -c)?
> > > > Maybe, I will have a try.
> > > It turned out all "Indirect call" tests can be resolved by adding -c 100
> > > to the perf command line:
> > > diff --git a/gcc/config/i386/gcc-auto-profile 
> > > b/gcc/config/i386/gcc-auto-profile
> > > ...
> > > -exec perf record -e $E -b "$@"
> > > +exec perf record -e $E -c 100 -b "$@"
> > >
> > > Is 100 too small here?  Or is it fine for all scenarios?
> >
> > -c 100 is risky because it can cause perf throttling, which
> > makes it lose data.
> Right, it looks suspicious to me too.
> 
> >
> > perf has a limiter that if the PMU handler uses too much CPU
> > time it stops measuring for some time. A PMI is 10k+ cycles,
> > so doing one every 100 branches is a lot of CPU time.
> >
> > I wouldn't go down that low. It is better to increase the
> > iteration count.
> We can combine the two together, increasing iteration count and
> decreasing perf count at the same time.  What count would you suggest
> from your experience?

Normally nothing less than 50k for a common event like branches.

But for such a limited test 10k might still work, as long
as the runtime is fairly controlled.

We would probably need to ensure at least 10+ samples, so
that would be 100k iterations.
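
A rough sanity check of those numbers, assuming on the order of one counted
branch event per loop iteration (an assumption, not a measurement):

    samples ~= iterations / sampling_period ~= 100000 / 10000 = 10

which is just about the 10+ samples needed.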

iirc that was what we used originally, until people
complained about the simulator run times.

-Andi


[PATCH][GCC][Aarch64] Change expected bfxil count in gcc.target/aarch64/combine_bfxil.c to 18 (PR/87763)

2018-12-19 Thread Sam Tebbs
Hi all,

Since r265398 (combine: Do not combine moves from hard registers), the bfxil
scan in gcc.target/aarch64/combine_bfxil.c has been failing.

FAIL: gcc.target/aarch64/combine_bfxil.c scan-assembler-times bfxil\\t 13

This is because bfi was generated for the combine_* functions in the 
above test,
but as of r265398, bfxil is preferred over bfi and so the bfxil count has
increased. This patch increases the scan count to 18 to account for this so
that the test passes.

Before r265398

combine_zero_extended_int:
     bfxil   x0, x1, 0, 16
     ret

combine_balanced:
     bfi x0, x1, 0, 32
     ret

combine_minimal:
     bfi x0, x1, 0, 1
     ret

combine_unbalanced:
     bfi x0, x1, 0, 24
     ret

combine_balanced_int:
     bfi w0, w1, 0, 16
     ret

combine_unbalanced_int:
     bfi w0, w1, 0, 8
     ret

With r265398

combine_zero_extended_int:
     bfxil   x0, x1, 0, 16
     ret

combine_balanced:
     bfxil   x0, x1, 0, 32
     ret

combine_minimal:
     bfxil   x0, x1, 0, 1
     ret

combine_unbalanced:
     bfxil   x0, x1, 0, 24
     ret

combine_balanced_int:
     bfxil   w0, w1, 0, 16
     ret

combine_unbalanced_int:
     bfxil   w0, w1, 0, 8
     ret

These bfxil and bfi invocations are equivalent, so this patch won't hide any
incorrect code-gen.

Bootstrapped on aarch64-none-linux-gnu and regression tested on
aarch64-none-elf with no regressions.

OK for trunk?

gcc/testsuite/Changelog:

2018-12-19  Sam Tebbs  

     * gcc.target/aarch64/combine_bfxil.c: Change 
scan-assembler-times bfxil count to 18.

diff --git a/gcc/testsuite/gcc.target/aarch64/combine_bfxil.c 
b/gcc/testsuite/gcc.target/aarch64/combine_bfxil.c
index 
84e5377ce9a10953f50b7c13ed06563bef014a55..109f989a2f0b68ce65509a38a82e8fd819f45a19
 100644
--- a/gcc/testsuite/gcc.target/aarch64/combine_bfxil.c
+++ b/gcc/testsuite/gcc.target/aarch64/combine_bfxil.c
@@ -114,4 +114,4 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-assembler-times "bfxil\\t" 13 } } */
+/* { dg-final { scan-assembler-times "bfxil\\t" 18 } } */


Re: [PATCH, GCC, AARCH64, 5/6] Enable BTI : Add new pass for BTI.

2018-12-19 Thread James Greenhalgh
On Fri, Dec 14, 2018 at 10:09:03AM -0600, Sudakshina Das wrote:



> I have updated the patch according to our discussions offline.
> The md pattern is now split into 4 patterns and i have added a new
> test for the setjmp case along with some comments where missing.

This is OK for trunk.

Thanks,
James

> *** gcc/ChangeLog ***
> 
> 2018-xx-xx  Sudakshina Das  
>   Ramana Radhakrishnan  
> 
>   * config.gcc (aarch64*-*-*): Add aarch64-bti-insert.o.
>   * gcc/config/aarch64/aarch64.h: Update comment for
>   TRAMPOLINE_SIZE.
>   * config/aarch64/aarch64.c (aarch64_asm_trampoline_template):
>   Update if bti is enabled.
>   * config/aarch64/aarch64-bti-insert.c: New file.
>   * config/aarch64/aarch64-passes.def (INSERT_PASS_BEFORE): Insert
>   bti pass.
>   * config/aarch64/aarch64-protos.h (make_pass_insert_bti):
>   Declare the new bti pass.
>   * config/aarch64/aarch64.md (unspecv): Add UNSPECV_BTI_NOARG,
>   UNSPECV_BTI_C, UNSPECV_BTI_J and UNSPECV_BTI_JC.
>   (bti_noarg, bti_j, bti_c, bti_jc): New define_insns.
>   * config/aarch64/t-aarch64: Add rule for aarch64-bti-insert.o.
> 
> *** gcc/testsuite/ChangeLog ***
> 
> 2018-xx-xx  Sudakshina Das  
> 
>   * gcc.target/aarch64/bti-1.c: New test.
>   * gcc.target/aarch64/bti-2.c: New test.
>   * gcc.target/aarch64/bti-3.c: New test.
>   * lib/target-supports.exp
>   (check_effective_target_aarch64_bti_hw): Add new check for
>   BTI hw.
> 
> Thanks
> Sudi


Re: [PATCH AutoFDO]Restoring indirect call value profile transformation

2018-12-19 Thread Andi Kleen
> > We can combine the two together, increasing iteration count and
> > decreasing perf count at the same time.  What count would you suggest
> > from your experience?
> 
> Can we instead for the tests where we want to test profile use/merge
> elide the profiling step and supply the "raw" data in an alternate
> testsuite file instead?

That would be possible, but a drawback is that we wouldn't have an
"end2end" test anymore that also tests the interaction with perf
and autofdo. It would be good to test these cases too; there were
regressions in this area before.

But perhaps splitting that into two separate tests is reasonable,
with the majority of tests running with fake data.

This would have the advantage that gcc developers who don't
have an autofdo setup (e.g. missing tools or running in virtualization
with the PMU disabled) could still run most of the regression tests.

-Andi


Re: [PATCH] v6: C++: more location wrapper nodes (PR c++/43064, PR c++/43486)

2018-12-19 Thread David Malcolm
On Tue, 2018-12-18 at 15:40 -0500, Jason Merrill wrote:
> On 12/18/18 4:22 PM, David Malcolm wrote:
> > On Mon, 2018-12-17 at 18:30 -0500, David Malcolm wrote:
> > > On Mon, 2018-12-17 at 14:33 -0500, Jason Merrill wrote:
> > > > On 12/14/18 7:17 PM, David Malcolm wrote:
> > > > > +  /* Since default args are effectively part of the function type,
> > > > > +  strip location wrappers here, since otherwise the location of
> > > > > +  one function's default arguments is arbitrarily chosen for
> > > > > +  all functions with similar signature (due to canonicalization
> > > > > +  of function types).  */
> > > > 
> > > > Hmm, looking at this again, why would this happen?  I see that
> > > > type_list_equal uses == to compare default arguments, so two
> > > > function
> > > > types with the same default argument but different location
> > > > wrappers
> > > > shouldn't be combined.
> > > > 
> > > > Jason
> > > 
> > > Thanks.
> > > 
> > > I did some digging into this.  I added this strip to fix
> > >g++.dg/template/defarg6.C
> > > but it looks like I was overzealous (the comment is correct, but
> > > it's
> > > papering over a problem).
> > > 
> > > It turns out that type_list_equal is doing more than just pointer
> > > equality; it's hitting the simple_cst_equal part of the && at
> > > line
> > > 7071:
> > > 
> > > 7063  bool
> > > 7064  type_list_equal (const_tree l1, const_tree l2)
> > > 7065  {
> > > 7066    const_tree t1, t2;
> > > 7067
> > > 7068    for (t1 = l1, t2 = l2; t1 && t2; t1 = TREE_CHAIN (t1), t2 = TREE_CHAIN (t2))
> > > 7069      if (TREE_VALUE (t1) != TREE_VALUE (t2)
> > > 7070          || (TREE_PURPOSE (t1) != TREE_PURPOSE (t2)
> > > 7071              && ! (1 == simple_cst_equal (TREE_PURPOSE (t1), TREE_PURPOSE (t2))
> > > 7072                    && (TREE_TYPE (TREE_PURPOSE (t1))
> > > 7073                        == TREE_TYPE (TREE_PURPOSE (t2))))))
> > > 7074        return false;
> > > 7075
> > > 7076    return t1 == t2;
> > > 7077  }
> > > 
> > > What's happening is that there are two different functions with
> > > identical types apart from the locations of their (equal) default
> > > arguments: both of the TREE_PURPOSEs are NON_LVALUE_EXPR wrappers
> > > around a CONST_DECL enum value (at different source locations).
> > > 
> > > simple_cst_equal is stripping the location wrappers here:
> > > 
> > > 7311    if (CONVERT_EXPR_CODE_P (code1) || code1 == NON_LVALUE_EXPR)
> > > 7312      {
> > > 7313        if (CONVERT_EXPR_CODE_P (code2)
> > > 7314            || code2 == NON_LVALUE_EXPR)
> > > 7315          return simple_cst_equal (TREE_OPERAND (t1, 0), TREE_OPERAND (t2, 0));
> > > 7316        else
> > > 7317          return simple_cst_equal (TREE_OPERAND (t1, 0), t2);
> > > 7318      }
> > > 
> > > and thus finds them to be equal; the iteration in type_list_equal
> > > continues, and runs out of parameters with t1 == t2 == NULL, and
> > > thus
> > > returns true, and thus the two function types hash to the same
> > > slot,
> > > and the two function types get treated as being the same.
> > > 
> > > It's not clear to me yet what the best solution to this is:
> > > - should simple_cst_equal regard different source locations as
> > > being
> > > different?
> > > - should function-type hashing use a custom version of
> > > type_list_equal
> > > when comparing params, and make different source locations of
> > > default
> > > args be different?
> > > - something else?
> > > 
> > > Dave
> > 
> > I tried both of the above approaches, and both work.
> > 
> > Here's v6 of the patch:
> > 
> > I removed the strip of wrappers in
> > cp_parser_late_parsing_default_args
> > from earlier versions of the patch, in favor of fixing
> > simple_cst_equal
> > so that it treats location wrappers with unequal source locations
> > as
> > being unequal.  This ensures that function-types with default
> > arguments
> > don't get merged when the default argument constants have different
> > spelling locations.  [I have an alternative patch which instead
> > introduces a different comparator for FUNCTION_TYPE's
> > TYPE_ARG_TYPES
> > within type_cache_hasher::equal, almost identical to
> > type_list_equal,
> > but adding the requirement that location wrappers around default
> > arguments have equal source location for the params to be
> > considered equal; both patches pass bootstrap & regression testing]
> > 
> > Doing so leads to the reported location for the bad default
> > argument
> > within a template in g++.dg/template/defarg6.C moving to the
> > argument
> > location.  Previously, the callsite of the instantiation was
> > identified
> > due to the use of input_location in convert_like_real here:
> > 
> > 6816      location_t loc = cp_expr_loc_or_loc (expr, input_location);
>
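
For reference, a minimal sketch of the default-argument situation being
discussed (names are illustrative; the actual defarg6.C test uses templates):

// Two unrelated declarations whose FUNCTION_TYPEs are identical except
// for the source location of the (equal) default argument, which is a
// CONST_DECL enum value behind a NON_LVALUE_EXPR location wrapper.
enum E { A };
void f (E e = A);   // wrapper around A points at this line
void g (E e = A);   // ... and a different wrapper points at this one
// If the wrappers compare equal, both function types hash to the same
// slot and get canonicalized to one type, so one set of default-argument
// locations ends up being used for both declarations.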

[openacc] Make oacc_fn_attrib_level external

2018-12-19 Thread Tom de Vries
[ was: Fwd: [openacc, committed] Add oacc_get_default_dim ]

On 19-12-18 16:27, Tom de Vries wrote:
> [ Adding gcc-patches ]
> 
>  Forwarded Message 
> Subject: [openacc, committed] Add oacc_get_default_dim
> Date: Wed, 19 Dec 2018 16:24:25 +0100
> From: Tom de Vries 
> To: Thomas Schwinge 
> 
> [ was: Re: [nvptx] vector length patch series -- openacc parts ]
> 
> On 19-12-18 11:40, Thomas Schwinge wrote:
>> Hi Tom!
>>
>> Thanks for picking up this series!
>>
>>
>> And just to note:
>>
>> On Tue, 18 Dec 2018 00:52:30 +0100, Tom de Vries  wrote:
>>> On 14-12-18 20:58, Tom de Vries wrote:
>>>
 0003-openacc-Add-target-hook-TARGET_GOACC_ADJUST_PARALLEL.patch
>>>
 0017-nvptx-Enable-large-vectors.patch
>>>
 0023-nvptx-Force-vl32-if-calling-vector-partitionable-rou.patch
>>>
>>> Thomas,
>>>
>>> these patches are openacc (0003) or have openacc components (0017, 0023).
>>>
>>> Can you review and possibly approve the openacc parts?
>>
>> I've seen this (and your earlier questions), and will get to it
>> eventually, thanks.
>>
>>
> 
> In that case, let's make the review for the IMO trivial bits post-commit.
> 
> Committed the openacc component of 0017 ...
> 

... and of 0023.

Thanks,
- Tom
[openacc] Make oacc_fn_attrib_level external

Expose oacc_fn_attrib_level to be used in backends.
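
For illustration, a hypothetical backend use of the newly exported function
(the surrounding helper is made up, not part of this patch):

/* Sketch only: treat a function as an OpenACC routine (rather than an
   offloaded region) if oacc_fn_attrib_level does not return -1.  */
static bool
is_oacc_routine_p (tree attr)
{
  return oacc_fn_attrib_level (attr) != -1;
}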

2018-12-17  Tom de Vries  

	* omp-offload.c (oacc_fn_attrib_level): Remove static.
	* omp-offload.h (oacc_fn_attrib_level): Declare.

---
 gcc/omp-offload.c | 2 +-
 gcc/omp-offload.h | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/omp-offload.c b/gcc/omp-offload.c
index 9c7bd7328d1..a220b4b9982 100644
--- a/gcc/omp-offload.c
+++ b/gcc/omp-offload.c
@@ -88,7 +88,7 @@ vec *offload_funcs, *offload_vars;
 /* Return level at which oacc routine may spawn a partitioned loop, or
-1 if it is not a routine (i.e. is an offload fn).  */
 
-static int
+int
 oacc_fn_attrib_level (tree attr)
 {
   tree pos = TREE_VALUE (attr);
diff --git a/gcc/omp-offload.h b/gcc/omp-offload.h
index 14edcad8a7d..176c4da7e88 100644
--- a/gcc/omp-offload.h
+++ b/gcc/omp-offload.h
@@ -23,6 +23,7 @@ along with GCC; see the file COPYING3.  If not see
 #define GCC_OMP_DEVICE_H
 
 extern int oacc_get_default_dim (int dim);
+extern int oacc_fn_attrib_level (tree attr);
 
 extern GTY(()) vec *offload_funcs;
 extern GTY(()) vec *offload_vars;


Fwd: [openacc, committed] Add oacc_get_default_dim

2018-12-19 Thread Tom de Vries
[ Adding gcc-patches ]

 Forwarded Message 
Subject: [openacc, committed] Add oacc_get_default_dim
Date: Wed, 19 Dec 2018 16:24:25 +0100
From: Tom de Vries 
To: Thomas Schwinge 

[ was: Re: [nvptx] vector length patch series -- openacc parts ]

On 19-12-18 11:40, Thomas Schwinge wrote:
> Hi Tom!
> 
> Thanks for picking up this series!
> 
> 
> And just to note:
> 
> On Tue, 18 Dec 2018 00:52:30 +0100, Tom de Vries  wrote:
>> On 14-12-18 20:58, Tom de Vries wrote:
>>
>>> 0003-openacc-Add-target-hook-TARGET_GOACC_ADJUST_PARALLEL.patch
>>
>>> 0017-nvptx-Enable-large-vectors.patch
>>
>>> 0023-nvptx-Force-vl32-if-calling-vector-partitionable-rou.patch
>>
>> Thomas,
>>
>> these patches are openacc (0003) or have openacc components (0017, 0023).
>>
>> Can you review and possibly approve the openacc parts?
> 
> I've seen this (and your earlier questions), and will get to it
> eventually, thanks.
> 
> 

In that case, let's make the review for the IMO trivial bits post-commit.

Committed the openacc component of 0017 ...

Thanks,
- Tom

[openacc] Add oacc_get_default_dim

Expose oacc_default_dims to backends.

2018-12-17  Tom de Vries  

	* omp-offload.c (oacc_get_default_dim): New function.
	* omp-offload.h (oacc_get_default_dim): Declare.

---
 gcc/omp-offload.c | 7 +++
 gcc/omp-offload.h | 2 ++
 2 files changed, 9 insertions(+)

diff --git a/gcc/omp-offload.c b/gcc/omp-offload.c
index 4457e1a3079..9c7bd7328d1 100644
--- a/gcc/omp-offload.c
+++ b/gcc/omp-offload.c
@@ -573,6 +573,13 @@ oacc_xform_tile (gcall *call)
 static int oacc_default_dims[GOMP_DIM_MAX];
 static int oacc_min_dims[GOMP_DIM_MAX];
 
+int
+oacc_get_default_dim (int dim)
+{
+  gcc_assert (0 <= dim && dim < GOMP_DIM_MAX);
+  return oacc_default_dims[dim];
+}
+
 /* Parse the default dimension parameter.  This is a set of
:-separated optional compute dimensions.  Each specified dimension
is a positive integer.  When device type support is added, it is
diff --git a/gcc/omp-offload.h b/gcc/omp-offload.h
index 6186f03649e..14edcad8a7d 100644
--- a/gcc/omp-offload.h
+++ b/gcc/omp-offload.h
@@ -22,6 +22,8 @@ along with GCC; see the file COPYING3.  If not see
 #ifndef GCC_OMP_DEVICE_H
 #define GCC_OMP_DEVICE_H
 
+extern int oacc_get_default_dim (int dim);
+
 extern GTY(()) vec *offload_funcs;
 extern GTY(()) vec *offload_vars;
 



Re: [PATCH, libgcc/ARM & testsuite] Optimize executable size when using softfloat fmul/dmul

2018-12-19 Thread Richard Earnshaw (lists)
On 14/12/2018 21:09, Thomas Preudhomme wrote:
> Hi Richard,
> 
> None, is there any? All the ones I could find in the big switch
> selecting tm_files and tmake_files in gcc/config.gcc are including
> arm/elf.h. I tried to build for arm-wince-pe but got: "Configuration
> arm-wince-pe not supported". However, note that to guarantee correct
> results the only requirement is for a global symbol to override a weak
> symbol correctly, and I see .weak usage in many other libgcc backends
> (e.g. i386). The "take the first definition resolving an undefined
> reference and ignore the ones in following objects of a static
> library" rule only matters for the size optimization.
> 

I'd forgotten that the last vestiges of all non-elf variants for Arm had
been removed some time back.  Never mind, then; clearly that can't be a
concern...

On the rest of the patch, this is OK.  I'm not entirely sure that what
you've done will reliably work in terms of guaranteeing to find the weak
definition first within the same library, but at least it should be safe.
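
For reference, a minimal sketch of the archive-ordering scheme being relied
on here (file and symbol names are illustrative, not the actual lib1funcs.S
ones):

/* mul_only.c -- earlier archive member: multiplication plus the shared
   helper code, with the entry point defined weak.  */
float __attribute__ ((weak))
my_mulsf3 (float a, float b)
{
  return a * b;
}

/* muldiv.c -- later archive member: strong definition of the same symbol
   next to the division code.  If division is referenced, this member is
   pulled in and its strong my_mulsf3 overrides the weak one; if only
   multiplication is referenced, this member is never pulled in and the
   division code stays out of the executable.  */
float
my_mulsf3 (float a, float b)
{
  return a * b;
}

float
my_divsf3 (float a, float b)
{
  return a / b;
}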

R.

> Best regards,
> 
> Thomas
> On Fri, 7 Dec 2018 at 14:14, Richard Earnshaw (lists)
>  wrote:
>>
>> On 19/11/2018 09:57, Thomas Preudhomme wrote:
>>> Softfloat single precision and double precision floating-point
>>> multiplication routines in libgcc share some code with the
>>> floating-point division of their corresponding precision. As the code
>>> is structured now, this leads to *all* division code being pulled into an
>>> executable in softfloat mode even if only multiplication is
>>> performed.
>>>
>>> This patch creates some new LIB1ASMFUNCS macros to also build files with
>>> just the multiplication and shared code as weak symbols. By putting
>>> these earlier in the static library, they can then be picked up when
>>> only multiplication is used and they are overridden by the global
>>> definition in the existing file containing both multiplication and
>>> division code when division is needed.
>>>
>>> The patch also removes changes made to the FUNC_START and ARM_FUNC_START
>>> macros in r218124 since the intent was to put multiplication and
>>> division code into their own section in a later patch to achieve the
>>> same size optimization. That approach relied on specific section layout
>>> to ensure multiplication and division were not too far from the shared
>>> bit of code in order for the branches to be within range. Due to the lack
>>> of guarantees regarding section layout, in particular with all the
>>> possibilities of linker scripts, this approach was chosen instead. This
>>> patch keeps the two testcases that were posted by Tony Wang (an Arm
>>> employee at the time) on the mailing list to implement this approach
>>> and adds a new one, hence the attribution.
>>>
>>> ChangeLog entries are as follows:
>>>
>>> *** gcc/ChangeLog ***
>>>
>>> 2018-11-14  Thomas Preud'homme  
>>>
>>> * config/arm/elf.h: Update comment about condition that need to
>>> match with libgcc/config/arm/lib1funcs.S to also include
>>> libgcc/config/arm/t-arm.
>>> * doc/sourcebuild.texi (output-exists, output-exists-not): Rename
>>> subsubsection these directives are in to "Check for output files".
>>> Move scan-symbol to that section and add to it new scan-symbol-not
>>> directive.
>>>
>>> *** gcc/testsuite/ChangeLog ***
>>>
>>> 2018-11-16  Tony Wang  
>>> Thomas Preud'homme  
>>>
>>> * lib/lto.exp (lto-execute): Define output_file and testname_with_flags
>>> to same value as execname.
>>> (scan-symbol): Move and rename to ...
>>> * lib/gcc-dg.exp (scan-symbol-common): This.  Adapt into a
>>> helper function returning true or false if a symbol is present.
>>> (scan-symbol): New procedure.
>>> (scan-symbol-not): Likewise.
>>> * gcc.target/arm/size-optimization-ieee-1.c: New testcase.
>>> * gcc.target/arm/size-optimization-ieee-2.c: Likewise.
>>> * gcc.target/arm/size-optimization-ieee-3.c: Likewise.
>>>
>>> *** libgcc/ChangeLog ***
>>>
>>> 2018-11-16  Thomas Preud'homme  
>>>
>>> * /config/arm/lib1funcs.S (FUNC_START): Remove unused sp_section
>>> parameter and corresponding code.
>>> (ARM_FUNC_START): Likewise in both definitions.
>>> Also update footer comment about condition that need to match with
>>> gcc/config/arm/elf.h to also include libgcc/config/arm/t-arm.
>>> * config/arm/ieee754-df.S (muldf3): Also build it if L_arm_muldf3 is
>>> defined.  Weakly define it in this case.
>>> * config/arm/ieee754-sf.S (mulsf3): Likewise with L_arm_mulsf3.
>>> * config/arm/t-elf (LIB1ASMFUNCS): Build _arm_muldf3.o and
>>> _arm_mulsf3.o before muldiv versions if targeting Thumb-1 only. Add
>>> comment to keep condition in sync with the one in
>>> libgcc/config/arm/lib1funcs.S and gcc/config/arm/elf.h.
>>>
>>> Testing: Bootstrapped on arm-linux-gnueabihf (Arm & Thumb-2) and
>>> testsuite shows no
>>> regression. Also built an arm-none-eabi cross compiler targeting
>>> soft-float whi

Re: [PATCH, ARM] Do softfloat when -mfpu set, -mfloat-abi=softfp and targeting Thumb-1

2018-12-19 Thread Thomas Preudhomme
Good catch.

Committed patch in attachment. Best regards,

Thomas
On Wed, 19 Dec 2018 at 14:13, Richard Earnshaw (lists)
 wrote:
>
> On 14/12/2018 21:15, Thomas Preudhomme wrote:
> > Hi Richard,
> >
> > Thanks for catching the problem with this approach. Hopefully this
> > version should solve the real problem:
> >
> >
> > FP instructions are only enabled for TARGET_32BIT and TARGET_HARD_FLOAT
> > but GCC only gives an error when TARGET_HARD_FLOAT is true and -mfpu is
> > not set. Among other things, it makes some of the cmse tests (eg.
> > gcc.target/arm/cmse/baseline/softfp.c) fail when targeting
> > -march=armv8-m.base -mcmse -mfpu= -mfloat-abi=softfp. This
> > patch adds an extra check for TARGET_32BIT to TARGET_HARD_FLOAT such
> > that it is false on TARGET_THUMB1 targets even when an FPU is specified.
> >
> > ChangeLog entries are as follows:
> >
> > *** gcc/ChangeLog ***
> >
> > 2018-12-14  thomas Preud'homme  
> >
> > * config/arm/arm.h (TARGET_HARD_FLOAT): Restrict to TARGET_32BIT
> > targets.
>
> Yes, this is better.  And with this change, I think this line:
>
>   if (TARGET_HARD_FLOAT && !TARGET_THUMB1)
>
> in output_return_instruction() can be collapsed into simply
>
>
> if (TARGET_HARD_FLOAT)
>
> OK with that change.
>
> R.
>
> >
> > *** gcc/testsuite/ChangeLog ***
> >
> > 2018-12-14  thomas Preud'homme  
> >
> > * gcc.target/arm/cmse/baseline/softfp.c: Force an FPU.
> >
> > Testing: No testsuite regression when targeting arm-none-eabi Armv6S-M
> > with -mfloat-abi=softfp
> >
> > Is this ok for stage3?
> >
> > Best regards,
> >
> > Thomas
> >
> > On Thu, 29 Nov 2018 at 14:52, Richard Earnshaw (lists)
> >  wrote:
> >>
> >> On 29/11/2018 10:51, Thomas Preudhomme wrote:
> >>> Hi,
> >>>
> >>> FP instructions are only enabled for TARGET_32BIT and TARGET_HARD_FLOAT
> >>> but GCC only gives an error when TARGET_HARD_FLOAT is true and -mfpu is
> >>> not set. Among other things, it makes some of the cmse tests (eg.
> >>> gcc.target/arm/cmse/baseline/softfp.c) fail when targeting
> >>> -march=armv8-m.base -mfpu= -mfloat-abi=softfp. This patch
> >>> errors out when a Thumb-1 -like target is selected and a FPU is
> >>> specified, thus making such tests being skipped.
> >>>
> >>> ChangeLog entries are as follows:
> >>>
> >>> *** gcc/ChangeLog ***
> >>>
> >>> 2018-11-28  thomas Preud'homme  
> >>>
> >>> * config/arm/arm.c (arm_options_perform_arch_sanity_checks): Error out
> >>> if targeting Thumb-1 with an FPU specified.
> >>>
> >>> *** gcc/testsuite/ChangeLog ***
> >>>
> >>> 2018-11-28  thomas Preud'homme  
> >>>
> >>> * gcc.target/arm/thumb1_mfpu-1.c: New testcase.
> >>> * gcc.target/arm/thumb1_mfpu-2.c: Likewise.
> >>>
> >>> Testing: No testsuite regression when targeting arm-none-eabi Armv6S-M.
> >>> Fails as expected when targeting Armv6-M with an -mfpu or a default FPU.
> >>> Succeeds without.
> >>>
> >>> Is this ok for stage3?
> >>>
> >>
> >> This doesn't sound right.  Specifically this bit...
> >>
> >> +  else if (TARGET_THUMB1
> >> +  && bitmap_bit_p (arm_active_target.isa, isa_bit_vfpv2))
> >> +   error ("Thumb-1 does not allow FP instructions");
> >>
> >> If I use
> >>
> >> -mcpu=arm1176jzf-s -mfpu=auto -mfloat-abi=softfp -mthumb
> >>
> >> then that shouldn't error, since softfp and thumb is, in reality, just
> >> float-abi=soft (as there are no fp instructions in thumb).  We also want
> >> it to work this way so that I can add the thumb/arm attribute to
> >> specific functions and have the compiler use HW float instructions when
> >> they are suitable.
> >>
> >>
> >> R.
> >>
> >>> Best regards,
> >>>
> >>> Thomas
> >>>
> >>>
> >>> thumb1_mfpu_error.patch
> >>>
> >>> From 051e38552d7c596873e0303f6ec4272b26d50900 Mon Sep 17 00:00:00 2001
> >>> From: Thomas Preud'homme 
> >>> Date: Tue, 27 Nov 2018 15:52:38 +
> >>> Subject: [PATCH] [PATCH, ARM] Error out when -mfpu set and targeting 
> >>> Thumb-1
> >>>
> >>> Hi,
> >>>
> >>> FP instructions are only enabled for TARGET_32BIT and TARGET_HARD_FLOAT
> >>> but GCC only gives an error when TARGET_HARD_FLOAT is true and -mfpu is
> >>> not set. Among other things, it makes some of the cmse tests (eg.
> >>> gcc.target/arm/cmse/baseline/softfp.c) fail when targeting
> >>> -march=armv8-m.base -mfpu= -mfloat-abi=softfp. This patch
> >>> errors out when a Thumb-1 -like target is selected and a FPU is
> >>> specified, thus making such tests being skipped.
> >>>
> >>> ChangeLog entries are as follows:
> >>>
> >>> *** gcc/ChangeLog ***
> >>>
> >>> 2018-11-28  thomas Preud'homme  
> >>>
> >>>   * config/arm/arm.c (arm_options_perform_arch_sanity_checks): Error 
> >>> out
> >>>   if targeting Thumb-1 with an FPU specified.
> >>>
> >>> *** gcc/testsuite/ChangeLog ***
> >>>
> >>> 2018-11-28  thomas Preud'homme  
> >>>
> >>>   * gcc.target/arm/thumb1_mfpu-1.c: New testcase.
> >>>   * gcc.target/arm/thumb1_mfpu-2.c: Likewise.
> >>>
> >>> Testing: No testsuite regressi

Re: V8 [PATCH] C/C++: Add -Waddress-of-packed-member

2018-12-19 Thread H.J. Lu
On Tue, Dec 18, 2018 at 7:19 PM Sandra Loosemore
 wrote:
>
> On 12/18/18 2:12 PM, H.J. Lu wrote:
> >
> > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > index ac2ee59d92c..47f2fc3f518 100644
> > --- a/gcc/doc/invoke.texi
> > +++ b/gcc/doc/invoke.texi
> > @@ -358,6 +358,7 @@ Objective-C and Objective-C++ Dialects}.
> >  -Wuseless-cast  -Wvariadic-macros  -Wvector-operation-performance @gol
> >  -Wvla  -Wvla-larger-than=@var{byte-size}  -Wvolatile-register-var @gol
> >  -Wwrite-strings @gol
> > +-Waddress-of-packed-member @gol
> >  -Wzero-as-null-pointer-constant  -Whsa}
> >
> >  @item C and Objective-C-only Warning Options
>
> Minor documentation nit:  it looks like some effort has been made to
> alphabetize that list.  Can you please put -Waddress-of-packed-member in
> the right place, and also fix the misplaced -Whsa at the end?
>

I am applying

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 47f2fc3f518..14365fba501 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -281,7 +281,8 @@ Objective-C and Objective-C++ Dialects}.
 @xref{Warning Options,,Options to Request or Suppress Warnings}.
 @gccoptlist{-fsyntax-only  -fmax-errors=@var{n}  -Wpedantic @gol
 -pedantic-errors @gol
--w  -Wextra  -Wall  -Waddress  -Waggregate-return  -Waligned-new @gol
+-w  -Wextra  -Wall  -Waddress  -Waddress-of-packed-member @gol
+-Waggregate-return  -Waligned-new @gol
 -Walloc-zero  -Walloc-size-larger-than=@var{byte-size} @gol
 -Walloca  -Walloca-larger-than=@var{byte-size} @gol
 -Wno-aggressive-loop-optimizations  -Warray-bounds  -Warray-bounds=@var{n} @gol
@@ -310,7 +311,7 @@ Objective-C and Objective-C++ Dialects}.
 -Wformat-y2k  -Wframe-address @gol
 -Wframe-larger-than=@var{byte-size}  -Wno-free-nonheap-object @gol
 -Wjump-misses-init @gol
--Wif-not-aligned @gol
+-Whsa  -Wif-not-aligned @gol
 -Wignored-qualifiers  -Wignored-attributes  -Wincompatible-pointer-types @gol
 -Wimplicit  -Wimplicit-fallthrough  -Wimplicit-fallthrough=@var{n} @gol
 -Wimplicit-function-declaration  -Wimplicit-int @gol
@@ -358,8 +359,7 @@ Objective-C and Objective-C++ Dialects}.
 -Wuseless-cast  -Wvariadic-macros  -Wvector-operation-performance @gol
 -Wvla  -Wvla-larger-than=@var{byte-size}  -Wvolatile-register-var @gol
 -Wwrite-strings @gol
--Waddress-of-packed-member @gol
--Wzero-as-null-pointer-constant  -Whsa}
+-Wzero-as-null-pointer-constant}

 @item C and Objective-C-only Warning Options
 @gccoptlist{-Wbad-function-cast  -Wmissing-declarations @gol

Thanks.

-- 
H.J.


Re: V8 [PATCH] C/C++: Add -Waddress-of-packed-member

2018-12-19 Thread H.J. Lu
On Tue, Dec 18, 2018 at 2:14 PM Jason Merrill  wrote:
>
> On 12/18/18 4:12 PM, H.J. Lu wrote:
> > On Tue, Dec 18, 2018 at 12:36 PM Jason Merrill  wrote:
> >>
> >> On 12/18/18 9:10 AM, H.J. Lu wrote:
> >>> +  switch (TREE_CODE (rhs))
> >>> +{
> >>> +case ADDR_EXPR:
> >>> +  base = TREE_OPERAND (rhs, 0);
> >>> +  while (handled_component_p (base))
> >>> + {
> >>> +   if (TREE_CODE (base) == COMPONENT_REF)
> >>> + break;
> >>> +   base = TREE_OPERAND (base, 0);
> >>> + }
> >>> +  if (TREE_CODE (base) != COMPONENT_REF)
> >>> + return NULL_TREE;
> >>> +  object = TREE_OPERAND (base, 0);
> >>> +  field = TREE_OPERAND (base, 1);
> >>> +  break;
> >>> +case COMPONENT_REF:
> >>> +  object = TREE_OPERAND (rhs, 0);
> >>> +  field = TREE_OPERAND (rhs, 1);
> >>> +  break;
> >>> +default:
> >>> +  return NULL_TREE;
> >>> +}
> >>> +
> >>> +  tree context = check_alignment_of_packed_member (type, field);
> >>> +  if (context)
> >>> +return context;
> >>> +
> >>> +  /* Check alignment of the object.  */
> >>> +  while (TREE_CODE (object) == COMPONENT_REF)
> >>> +{
> >>> +  field = TREE_OPERAND (object, 1);
> >>> +  context = check_alignment_of_packed_member (type, field);
> >>> +  if (context)
> >>> + return context;
> >>> +  object = TREE_OPERAND (object, 0);
> >>> +}
> >>> +
> >>
> >> You can see interleaved COMPONENT_REF and ARRAY_REF that this still
> >> doesn't look like it will handle, something like
> >>
> >> struct A
> >> {
> >> int i;
> >> };
> >>
> >> struct B
> >> {
> >> char c;
> >> __attribute ((packed)) A ar[4];
> >> };
> >>
> >> B b;
> >>
> >> int *p = &b.ar[1].i;
> >>
> >> Rather than have a loop in the ADDR_EXPR case of the switch, you can
> >> handle everything in the lower loop.  And not have a switch at all, just
> >> strip any ADDR_EXPR before the loop.
> >
> > I changed it to
> >
> >   if (TREE_CODE (rhs) == ADDR_EXPR)
> >  rhs = TREE_OPERAND (rhs, 0);
> >while (handled_component_p (rhs))
> >  {
> >if (TREE_CODE (rhs) == COMPONENT_REF)
> >  break;
> >rhs = TREE_OPERAND (rhs, 0);
> >  }
> >
> >if (TREE_CODE (rhs) != COMPONENT_REF)
> >  return NULL_TREE;
> >
> >object = TREE_OPERAND (rhs, 0);
> >field = TREE_OPERAND (rhs, 1);
>
> That still doesn't warn about my testcase above.
>
> > [hjl@gnu-cfl-1 pr51628-6]$ cat a.i
> > struct A
> > {
> > int i;
> > } __attribute ((packed));
> >
> > struct B
> > {
> > char c;
> > struct A ar[4];
> > };
> >
> > struct B b;
> >
> > int *p = &b.ar[1].i;
>
> This testcase is importantly different because 'i' is packed, whereas in
> my testcase only the ar member of B is packed.
>
> My suggestion was that this loop:
>
> > +  /* Check alignment of the object.  */
> > +  while (TREE_CODE (object) == COMPONENT_REF)
> > +{
> > +  field = TREE_OPERAND (object, 1);
> > +  context = check_alignment_of_packed_member (type, field);
> > +  if (context)
> > + return context;
> > +  object = TREE_OPERAND (object, 0);
> > +}
>
> could loop over all handled_component_p, but only call
> check_alignment_of_packed_member for COMPONENT_REF.

Thanks for the hint.  I changed it to

  /* Check alignment of the object.  */
  while (handled_component_p (object))
{
  if (TREE_CODE (object) == COMPONENT_REF)
{
  do
{
  field = TREE_OPERAND (object, 1);
  context = check_alignment_of_packed_member (type, field);
  if (context)
return context;
  object = TREE_OPERAND (object, 0);
}
  while (TREE_CODE (object) == COMPONENT_REF);
}
  else
object = TREE_OPERAND (object, 0);
}

> > +  if (TREE_CODE (rhs) != COND_EXPR)
> > +{
> > +  while (TREE_CODE (rhs) == COMPOUND_EXPR)
> > + rhs = TREE_OPERAND (rhs, 1);
>
> What if you have a COND_EXPR inside a COMPOUND_EXPR?
>

It works for me:

[hjl@gnu-cfl-1 pr51628-5]$ cat c.i
struct A {
  int i;
} __attribute__ ((packed));

int*
foo3 (struct A *p1, int **q1, int *q2, int *q3, struct A *p2)
{
  return q1 ? (*q1 = 1, &p1->i) : (q2 ? (*q1 = &p1->i, *q2 = 2, &p2->i): q2);
}
[hjl@gnu-cfl-1 pr51628-5]$
/export/build/gnu/tools-build/gcc-debug/build-x86_64-linux/gcc/xgcc
-B/export/build/gnu/tools-build/gcc-debug/build-x86_64-linux/gcc/ -O2
-S c.i
c.i: In function ‘foo3’:
c.i:8:20: warning: assignment to ‘int *’ from
‘int’ makes pointer from integer without a cast
[-Wint-conversion]
8 |   return q1 ? (*q1 = 1, &p1->i) : (q2 ? (*q1 = &p1->i, *q2 =
2, &p2->i): q2);
  |^
c.i:8:48: warning: taking address of packed member of ‘struct
A’ may result in an unaligned pointer value
[-Waddress-of-packed-member]
8 |   return q1 ? (*q1 = 1, &p1->i) : (q2 ? (*q1 = &p1->i, *q2 =
2, &p2->i): q2);
  |

Re: [C++ Patch] [PR c++/88146] do not crash synthesizing inherited ctor(...)

2018-12-19 Thread Christophe Lyon
On Sat, 15 Dec 2018 at 23:11, Jason Merrill  wrote:
>
> On Fri, Dec 14, 2018 at 6:05 PM Alexandre Oliva  wrote:
> >
> > On Dec 14, 2018, Jason Merrill  wrote:
> >
> > > Let's move the initialization of "fields" inside the 'then' block here
> > > with the initialization of "cvquals", rather than clear it in the
> > > 'else'.
> >
> > We'd still have to NULL-initialize it somewhere, so I'd rather just move
> > the entire loop into the conditional, and narrow the scope of variables
> > only used within the loop, like this.  The full patch below is very hard
> > to read because of the reindentation, so here's a diff -b.
> >
> > diff --git a/gcc/cp/method.c b/gcc/cp/method.c
> > index fd023e200538..17404a65b0fd 100644
> > --- a/gcc/cp/method.c
> > +++ b/gcc/cp/method.c
> > @@ -675,12 +675,9 @@ do_build_copy_constructor (tree fndecl)
> >  }
> >else
> >  {
> > -  tree fields = TYPE_FIELDS (current_class_type);
> >tree member_init_list = NULL_TREE;
> > -  int cvquals = cp_type_quals (TREE_TYPE (parm));
> >int i;
> >tree binfo, base_binfo;
> > -  tree init;
> >vec *vbases;
> >
> >/* Initialize all the base-classes with the parameter converted
> > @@ -704,15 +701,18 @@ do_build_copy_constructor (tree fndecl)
> > inh, member_init_list);
> > }
> >
> > -  for (; fields; fields = DECL_CHAIN (fields))
> > +  if (!inh)
> > +   {
> > + int cvquals = cp_type_quals (TREE_TYPE (parm));
> > +
> > + for (tree fields = TYPE_FIELDS (current_class_type);
> > +  fields; fields = DECL_CHAIN (fields))
> > {
> >   tree field = fields;
> >   tree expr_type;
> >
> >   if (TREE_CODE (field) != FIELD_DECL)
> > continue;
> > - if (inh)
> > -   continue;
> >
> >   expr_type = TREE_TYPE (field);
> >   if (DECL_NAME (field))
> > @@ -742,7 +742,7 @@ do_build_copy_constructor (tree fndecl)
> >   expr_type = cp_build_qualified_type (expr_type, quals);
> > }
> >
> > - init = build3 (COMPONENT_REF, expr_type, parm, field, NULL_TREE);
> > + tree init = build3 (COMPONENT_REF, expr_type, parm, field, 
> > NULL_TREE);
> >   if (move_p && !TYPE_REF_P (expr_type)
> >   /* 'move' breaks bit-fields, and has no effect for 
> > scalars.  */
> >   && !scalarish_type_p (expr_type))
> > @@ -751,6 +751,8 @@ do_build_copy_constructor (tree fndecl)
> >
> >   member_init_list = tree_cons (field, init, member_init_list);
> > }
> > +   }
> > +
> >finish_mem_initializers (member_init_list);
> >  }
> >  }
> > @@ -891,6 +893,7 @@ synthesize_method (tree fndecl)
> >
> >/* Reset the source location, we might have been previously
> >   deferred, and thus have saved where we were first needed.  */
> > +  if (!DECL_INHERITED_CTOR (fndecl))
> >  DECL_SOURCE_LOCATION (fndecl)
> >= DECL_SOURCE_LOCATION (TYPE_NAME (DECL_CONTEXT (fndecl)));
> >
> >
> > Is this OK too?  (pending regstrapping)
>
> Yes, thanks.
>

Hi,

The new test inh-ctor32.C fails on arm:
FAIL:g++.dg/cpp0x/inh-ctor32.C  -std=c++14  (test for warnings, line 208)
FAIL:g++.dg/cpp0x/inh-ctor32.C  -std=c++17  (test for warnings, line 208)

The log has:

/gcc/testsuite/g++.dg/cpp0x/inh-ctor32.C: In instantiation of
'constexpr 
derived_ctor::inherited_derived_ctor::constexpr_noninherited_ctor::bor::bor(T
...) [with T = {int, int}]':
/gcc/testsuite/g++.dg/cpp0x/inh-ctor32.C:206:13:   required from
'constexpr 
derived_ctor::inherited_derived_ctor::constexpr_noninherited_ctor::bar::bar(T
...) [with T = {int, int}][inherited from
derived_ctor::inherited_derived_ctor::constexpr_noninherited_ctor::bor]'
/gcc/testsuite/g++.dg/cpp0x/inh-ctor32.C:208:42:   required from here

Christophe


> Jason


[committed][nvptx] Commit passing pr85381-*.c test-cases

2018-12-19 Thread Tom de Vries
Hi,

Add pr85381*.c test-cases that are already passing without the fix for PR85381.

Build and reg-tested on x86_64 with nvptx accelerator.

Committed to trunk.

Thanks,
- Tom

[nvptx] Commit passing pr85381-*.c test-cases

2018-12-19  Tom de Vries  

* testsuite/libgomp.oacc-c-c++-common/pr85381-2.c: New test.
* testsuite/libgomp.oacc-c-c++-common/pr85381-3.c: New test.
* testsuite/libgomp.oacc-c-c++-common/pr85381-4.c: New test.

---
 .../libgomp.oacc-c-c++-common/pr85381-2.c  | 36 ++
 .../libgomp.oacc-c-c++-common/pr85381-3.c  | 35 +
 .../libgomp.oacc-c-c++-common/pr85381-4.c  | 27 
 3 files changed, 98 insertions(+)

diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85381-2.c 
b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85381-2.c
new file mode 100644
index 000..6570c64afff
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85381-2.c
@@ -0,0 +1,36 @@
+/* { dg-additional-options "-save-temps" } */
+/* { dg-do run { target openacc_nvidia_accel_selected } }
+   { dg-skip-if "" { *-*-* } { "*" } { "-O2" } } */
+
+int
+main (void)
+{
+  int v1;
+
+  #pragma acc parallel
+  #pragma acc loop worker
+  for (v1 = 0; v1 < 20; v1 += 2)
+;
+
+  return 0;
+}
+
+/* Todo: Both bar.syncs can be removed.
+   Atm we generate this dead code in between forked and joining:
+
+ mov.u32 %r28, %ntid.y;
+ mov.u32 %r29, %tid.y;
+ add.u32 %r30, %r29, %r29;
+ setp.gt.s32 %r31, %r30, 19;
+ @%r31   bra $L2;
+ add.u32 %r25, %r28, %r28;
+ mov.u32 %r24, %r30;
+ $L3:
+ add.u32 %r24, %r24, %r25;
+ setp.le.s32 %r33, %r24, 19;
+ @%r33   bra $L3;
+ $L2:
+
+   so the loop is not recognized as an empty loop (which we detect by seeing if
+   joining immediately follows forked).  */
+/* { dg-final { scan-assembler-times "bar.sync" 2 } } */
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85381-3.c 
b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85381-3.c
new file mode 100644
index 000..c5d1c5add68
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85381-3.c
@@ -0,0 +1,35 @@
+/* { dg-additional-options "-save-temps -w" } */
+/* { dg-do run { target openacc_nvidia_accel_selected } }
+   { dg-skip-if "" { *-*-* } { "*" } { "-O2" } } */
+
+int a;
+#pragma acc declare create(a)
+
+#pragma acc routine vector
+void __attribute__((noinline, noclone))
+foo_v (void)
+{
+  a = 1;
+}
+
+#pragma acc routine worker
+void __attribute__((noinline, noclone))
+foo_w (void)
+{
+  a = 2;
+}
+
+int
+main (void)
+{
+
+  #pragma acc parallel
+  foo_v ();
+
+  #pragma acc parallel
+  foo_w ();
+
+  return 0;
+}
+
+/* { dg-final { scan-assembler-not "bar.sync" } } */
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85381-4.c 
b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85381-4.c
new file mode 100644
index 000..d955d79718d
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/pr85381-4.c
@@ -0,0 +1,27 @@
+/* { dg-additional-options "-save-temps -w" } */
+/* { dg-do run { target openacc_nvidia_accel_selected } }
+   { dg-skip-if "" { *-*-* } { "*" } { "-O2" } } */
+
+#define n 1024
+
+int
+main (void)
+{
+  #pragma acc parallel
+  {
+#pragma acc loop worker
+for (int i = 0; i < n; i++)
+  ;
+
+#pragma acc loop worker
+for (int i = 0; i < n; i++)
+  ;
+  }
+
+  return 0;
+}
+
+/* Atm, %ntid.y is broadcast from one loop to the next, so there are 2 
bar.syncs
+   for that (the other two are there for the same reason as in pr85381-2.c).
+   Todo: Recompute %ntid.y instead of broadcasting it. */
+/* { dg-final { scan-assembler-times "bar.sync" 4 } } */


[committed][nvptx, libgomp] Move rtl-dump test-cases to libgomp

2018-12-19 Thread Tom de Vries
Hi,

The goacc.exp test-cases nvptx-merged-loop.c and nvptx-sese-1.c are failing
during linking due to missing libgomp.spec.

Move them to the libgomp testsuite.

Build and reg-tested on x86_64 with nvptx accelerator.

Committed to trunk.

Thanks,
- Tom

[nvptx, libgomp] Move rtl-dump test-cases to libgomp

2018-12-19  Tom de Vries  

* gcc.dg/goacc/nvptx-merged-loop.c: Move to
libgomp/testsuite/libgomp.oacc-c-c++-common.
* gcc.dg/goacc/nvptx-sese-1.c: Same.

* testsuite/lib/libgomp.exp: Add load_lib of scanoffloadrtl.exp.
* testsuite/libgomp.oacc-c-c++-common/nvptx-merged-loop.c: Move from
gcc/testsuite/gcc.dg/goacc.
* testsuite/libgomp.oacc-c-c++-common/nvptx-sese-1.c: Same.

---
 libgomp/testsuite/lib/libgomp.exp | 1 +
 .../testsuite/libgomp.oacc-c-c++-common}/nvptx-merged-loop.c  | 8 
 .../testsuite/libgomp.oacc-c-c++-common}/nvptx-sese-1.c   | 8 
 3 files changed, 9 insertions(+), 8 deletions(-)

diff --git a/libgomp/testsuite/lib/libgomp.exp 
b/libgomp/testsuite/lib/libgomp.exp
index c41b3e6dc18..04738a9ce82 100644
--- a/libgomp/testsuite/lib/libgomp.exp
+++ b/libgomp/testsuite/lib/libgomp.exp
@@ -31,6 +31,7 @@ load_gcc_lib scanrtl.exp
 load_gcc_lib scantree.exp
 load_gcc_lib scanltranstree.exp
 load_gcc_lib scanoffloadtree.exp
+load_gcc_lib scanoffloadrtl.exp
 load_gcc_lib scanipa.exp
 load_gcc_lib scanwpaipa.exp
 load_gcc_lib timeout-dg.exp
diff --git a/gcc/testsuite/gcc.dg/goacc/nvptx-merged-loop.c 
b/libgomp/testsuite/libgomp.oacc-c-c++-common/nvptx-merged-loop.c
similarity index 59%
rename from gcc/testsuite/gcc.dg/goacc/nvptx-merged-loop.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/nvptx-merged-loop.c
index 3ff537c1d97..8a2117e1624 100644
--- a/gcc/testsuite/gcc.dg/goacc/nvptx-merged-loop.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/nvptx-merged-loop.c
@@ -1,6 +1,6 @@
-/* { dg-do link } */
-/* { dg-require-effective-target offload_nvptx } */
-/* { dg-options "-fopenacc -O2 -foffload=-fdump-rtl-mach\\ -dumpbase\\ 
nvptx-merged-loop.c\\ -Wa,--no-verify" } */
+/* { dg-do run { target openacc_nvidia_accel_selected } } */
+/* { dg-options "-foffload=-fdump-rtl-mach" } */
+/* { dg-skip-if "" { *-*-* } { "*" } { "-O2" } } */
 
 #define N (32*32*32+17)
 void __attribute__ ((noinline)) Foo (int *ary)
@@ -27,4 +27,4 @@ int main ()
   return 0;
 }   
 
-/* { dg-final { scan-rtl-dump "Merging loop .* into " "mach" } } */
+/* { dg-final { scan-offload-rtl-dump "Merging loop .* into " "mach" } } */
diff --git a/gcc/testsuite/gcc.dg/goacc/nvptx-sese-1.c 
b/libgomp/testsuite/libgomp.oacc-c-c++-common/nvptx-sese-1.c
similarity index 63%
rename from gcc/testsuite/gcc.dg/goacc/nvptx-sese-1.c
rename to libgomp/testsuite/libgomp.oacc-c-c++-common/nvptx-sese-1.c
index 7e67fe78f06..9583265c775 100644
--- a/gcc/testsuite/gcc.dg/goacc/nvptx-sese-1.c
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/nvptx-sese-1.c
@@ -1,6 +1,6 @@
-/* { dg-do link } */
-/* { dg-require-effective-target offload_nvptx } */
-/* { dg-options "-fopenacc -O2 -foffload=-fdump-rtl-mach\\ -dumpbase\\ 
nvptx-sese-1.c\\ -Wa,--no-verify" } */
+/* { dg-do run { target openacc_nvidia_accel_selected } } */
+/* { dg-options "-foffload=-fdump-rtl-mach" } */
+/* { dg-skip-if "" { *-*-* } { "*" } { "-O2" } } */
 
 #pragma acc routine  seq
 int __attribute__((noinline)) foo (int x)
@@ -32,4 +32,4 @@ int main ()
 }
 
 /* Match {N->N(.N)+} */
-/* { dg-final { scan-rtl-dump "SESE regions:.* 
\[0-9\]+{\[0-9\]+->\[0-9\]+(\\.\[0-9\]+)+}" "mach" } } */
+/* { dg-final { scan-offload-rtl-dump "SESE regions:.* 
\[0-9\]+{\[0-9\]+->\[0-9\]+(\\.\[0-9\]+)+}" "mach" } } */


[committed][testsuite] Add scan-offload-rtl-dump

2018-12-19 Thread Tom de Vries
Hi,

This patch adds scan-offload-rtl-dump, similar to scan-offload-tree-dump.

Build and reg-tested on x86_64 with nvptx accelerator.

Pre-approved here ( https://gcc.gnu.org/ml/gcc-patches/2018-05/msg01089.html ).

Committed to trunk.

Thanks,
- Tom

[testsuite] Add scan-offload-rtl-dump

2018-03-28  Tom de Vries  

* lib/scanoffloadrtl.exp: New file.
* gcc.dg-selftests/dg-final.exp (dg_final_directive_check_num_args): Add
offload-rtl.

* doc/sourcebuild.texi (Commands for use in dg-final, Scan optimization
dump files): Add offload-rtl.

---
 gcc/doc/sourcebuild.texi|   3 +-
 gcc/testsuite/gcc.dg-selftests/dg-final.exp |   4 +-
 gcc/testsuite/lib/scanoffloadrtl.exp| 147 
 3 files changed, 152 insertions(+), 2 deletions(-)

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
index 224ab89921a..46ef388e109 100644
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -2644,7 +2644,8 @@ assembly output.
 @subsubsection Scan optimization dump files
 
 These commands are available for @var{kind} of @code{tree}, @code{ltrans-tree},
-@code{offload-tree}, @code{rtl}, @code{ipa}, and @code{wpa-ipa}.
+@code{offload-tree}, @code{rtl}, @code{offload-rtl}, @code{ipa}, and
+@code{wpa-ipa}.
 
 @table @code
 @item scan-@var{kind}-dump @var{regex} @var{suffix} [@{ target/xfail 
@var{selector} @}]
diff --git a/gcc/testsuite/gcc.dg-selftests/dg-final.exp 
b/gcc/testsuite/gcc.dg-selftests/dg-final.exp
index 1d98666e137..90a6e894abd 100644
--- a/gcc/testsuite/gcc.dg-selftests/dg-final.exp
+++ b/gcc/testsuite/gcc.dg-selftests/dg-final.exp
@@ -25,6 +25,7 @@ load_lib "scanasm.exp"
 load_lib "scanwpaipa.exp"
 load_lib "scanltranstree.exp"
 load_lib "scanoffloadtree.exp"
+load_lib "scanoffloadrtl.exp"
 load_lib "gcc-dg.exp"
 
 proc verify_call_1 { args } {
@@ -82,7 +83,8 @@ proc dg_final_directive_check_num_args {} {
verify_call $proc_name $too_few "too few arguments"
 }
 
-foreach kind [list "tree" "rtl" "ipa" "ltrans-tree" "wpa-ipa" 
"offload-tree"] {
+foreach kind [list "tree" "rtl" "ipa" "ltrans-tree" "wpa-ipa" \
+ "offload-tree" "offload-rtl"] {
verify_args scan-$kind-dump 2 3
verify_args scan-$kind-dump-times 3 4
verify_args scan-$kind-dump-not 2 3
diff --git a/gcc/testsuite/lib/scanoffloadrtl.exp 
b/gcc/testsuite/lib/scanoffloadrtl.exp
new file mode 100644
index 000..e836f6d27bb
--- /dev/null
+++ b/gcc/testsuite/lib/scanoffloadrtl.exp
@@ -0,0 +1,147 @@
+#   Copyright (C) 2018 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+# 
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+# 
+# You should have received a copy of the GNU General Public License
+# along with GCC; see the file COPYING3.  If not see
+# .
+
+# Various utilities for scanning offloading rtl dump output, used by
+# libgomp.exp.
+
+load_lib scandump.exp
+
+# Utility for scanning compiler result, invoked via dg-final.
+# Call pass if pattern is present, otherwise fail.
+#
+# Argument 0 is the regexp to match
+# Argument 1 is the name of the dumped rtl pass
+# Argument 2 handles expected failures and the like
+proc scan-offload-rtl-dump { args } {
+
+if { [llength $args] < 2 } {
+   error "scan-offload-rtl-dump: too few arguments"
+   return
+}
+if { [llength $args] > 3 } {
+   error "scan-offload-rtl-dump: too many arguments"
+   return
+}
+if { [llength $args] >= 3 } {
+   scan-dump "offload-rtl" [lindex $args 0] \
+ "\[0-9\]\[0-9\]\[0-9]r.[lindex $args 1]" ".o" \
+ [lindex $args 2]
+} else {
+   scan-dump "offload-rtl" [lindex $args 0] \
+ "\[0-9\]\[0-9\]\[0-9]r.[lindex $args 1]" ".o"
+}
+}
+
+# Call pass if pattern is present given number of times, otherwise fail.
+# Argument 0 is the regexp to match
+# Argument 1 is number of times the regexp must be found
+# Argument 2 is the name of the dumped rtl pass
+# Argument 3 handles expected failures and the like
+proc scan-offload-rtl-dump-times { args } {
+
+if { [llength $args] < 3 } {
+   error "scan-offload-rtl-dump-times: too few arguments"
+   return
+}
+if { [llength $args] > 4 } {
+   error "scan-offload-rtl-dump-times: too many arguments"
+   return
+}
+if { [llength $args] >= 4 } {
+   scan-dump-times "offload-rtl" [lindex $args 0] [lindex $args 1] \
+   "\[0-9\]\[0-9\]\[0-9]r.[lindex $args 2]" ".o" \
+  

Re: [PATCH, ARM] Do softfloat when -mfpu set, -mfloat-abi=softfp and targeting Thumb-1

2018-12-19 Thread Richard Earnshaw (lists)
On 14/12/2018 21:15, Thomas Preudhomme wrote:
> Hi Richard,
> 
> Thanks for catching the problem with this approach. Hopefully this
> version should solve the real problem:
> 
> 
> FP instructions are only enabled for TARGET_32BIT and TARGET_HARD_FLOAT
> but GCC only gives an error when TARGET_HARD_FLOAT is true and -mfpu is
> not set. Among other things, it makes some of the cmse tests (eg.
> gcc.target/arm/cmse/baseline/softfp.c) fail when targeting
> -march=armv8-m.base -mcmse -mfpu= -mfloat-abi=softfp. This
> patch adds an extra check for TARGET_32BIT to TARGET_HARD_FLOAT such
> that it is false on TARGET_THUMB1 targets even when an FPU is specified.
> 
> ChangeLog entries are as follows:
> 
> *** gcc/ChangeLog ***
> 
> 2018-12-14  thomas Preud'homme  
> 
> * config/arm/arm.h (TARGET_HARD_FLOAT): Restrict to TARGET_32BIT
> targets.

Yes, this is better.  And with this change, I think this line:

  if (TARGET_HARD_FLOAT && !TARGET_THUMB1)

in output_return_instruction() can be collapsed into simply


if (TARGET_HARD_FLOAT)

OK with that change.

R.

> 
> *** gcc/testsuite/ChangeLog ***
> 
> 2018-12-14  thomas Preud'homme  
> 
> * gcc.target/arm/cmse/baseline/softfp.c: Force an FPU.
> 
> Testing: No testsuite regression when targeting arm-none-eabi Armv6S-M
> with -mfloat-abi=softfp
> 
> Is this ok for stage3?
> 
> Best regards,
> 
> Thomas
> 
> On Thu, 29 Nov 2018 at 14:52, Richard Earnshaw (lists)
>  wrote:
>>
>> On 29/11/2018 10:51, Thomas Preudhomme wrote:
>>> Hi,
>>>
>>> FP instructions are only enabled for TARGET_32BIT and TARGET_HARD_FLOAT
>>> but GCC only gives an error when TARGET_HARD_FLOAT is true and -mfpu is
>>> not set. Among other things, it makes some of the cmse tests (eg.
>>> gcc.target/arm/cmse/baseline/softfp.c) fail when targeting
>>> -march=armv8-m.base -mfpu= -mfloat-abi=softfp. This patch
>>> errors out when a Thumb-1 -like target is selected and a FPU is
>>> specified, thus making such tests being skipped.
>>>
>>> ChangeLog entries are as follows:
>>>
>>> *** gcc/ChangeLog ***
>>>
>>> 2018-11-28  thomas Preud'homme  
>>>
>>> * config/arm/arm.c (arm_options_perform_arch_sanity_checks): Error out
>>> if targeting Thumb-1 with an FPU specified.
>>>
>>> *** gcc/testsuite/ChangeLog ***
>>>
>>> 2018-11-28  thomas Preud'homme  
>>>
>>> * gcc.target/arm/thumb1_mfpu-1.c: New testcase.
>>> * gcc.target/arm/thumb1_mfpu-2.c: Likewise.
>>>
>>> Testing: No testsuite regression when targeting arm-none-eabi Armv6S-M.
>>> Fails as expected when targeting Armv6-M with an -mfpu or a default FPU.
>>> Succeeds without.
>>>
>>> Is this ok for stage3?
>>>
>>
>> This doesn't sound right.  Specifically this bit...
>>
>> +  else if (TARGET_THUMB1
>> +  && bitmap_bit_p (arm_active_target.isa, isa_bit_vfpv2))
>> +   error ("Thumb-1 does not allow FP instructions");
>>
>> If I use
>>
>> -mcpu=arm1176jzf-s -mfpu=auto -mfloat-abi=softfp -mthumb
>>
>> then that shouldn't error, since softfp and thumb is, in reality, just
>> float-abi=soft (as there are no fp instructions in thumb).  We also want
>> it to work this way so that I can add the thumb/arm attribute to
>> specific functions and have the compiler use HW float instructions when
>> they are suitable.
>>
>>
>> R.
>>
>>> Best regards,
>>>
>>> Thomas
>>>
>>>
>>> thumb1_mfpu_error.patch
>>>
>>> From 051e38552d7c596873e0303f6ec4272b26d50900 Mon Sep 17 00:00:00 2001
>>> From: Thomas Preud'homme 
>>> Date: Tue, 27 Nov 2018 15:52:38 +
>>> Subject: [PATCH] [PATCH, ARM] Error out when -mfpu set and targeting Thumb-1
>>>
>>> Hi,
>>>
>>> FP instructions are only enabled for TARGET_32BIT and TARGET_HARD_FLOAT
>>> but GCC only gives an error when TARGET_HARD_FLOAT is true and -mfpu is
>>> not set. Among other things, it makes some of the cmse tests (eg.
>>> gcc.target/arm/cmse/baseline/softfp.c) fail when targeting
>>> -march=armv8-m.base -mfpu= -mfloat-abi=softfp. This patch
>>> errors out when a Thumb-1 -like target is selected and a FPU is
>>> specified, thus making such tests being skipped.
>>>
>>> ChangeLog entries are as follows:
>>>
>>> *** gcc/ChangeLog ***
>>>
>>> 2018-11-28  thomas Preud'homme  
>>>
>>>   * config/arm/arm.c (arm_options_perform_arch_sanity_checks): Error out
>>>   if targeting Thumb-1 with an FPU specified.
>>>
>>> *** gcc/testsuite/ChangeLog ***
>>>
>>> 2018-11-28  thomas Preud'homme  
>>>
>>>   * gcc.target/arm/thumb1_mfpu-1.c: New testcase.
>>>   * gcc.target/arm/thumb1_mfpu-2.c: Likewise.
>>>
>>> Testing: No testsuite regression when targeting arm-none-eabi Armv6S-M.
>>> Fails as expected when targeting Armv6-M with an -mfpu or a default FPU.
>>> Succeeds without.
>>>
>>> Is this ok for stage3?
>>>
>>> Best regards,
>>>
>>> Thomas
>>> ---
>>>  gcc/config/arm/arm.c | 3 +++
>>>  gcc/testsuite/gcc.target/arm/thumb1_mfpu-1.c | 7 +++
>>>  gcc/testsuite/gcc.target/arm/thumb1_mfpu-2.c | 8 
>>>  3 files chang

Re: [PATCH] Restrict a VSX extract to TARGET_POWERPC64 (PR88213)

2018-12-19 Thread Segher Boessenkool
On Wed, Dec 19, 2018 at 01:49:41PM +, Segher Boessenkool wrote:
> This pattern optimises a scalar extract from a vector loaded from
> memory to be just a scalar load from memory.  But to do a 64-bit
> integer load you need 64-bit integer registers, which needs
> TARGET_POWERPC64.
> 
> This fixes it.  Tested on powerpc64-linux {-m32,-m64}; committing to trunk.

I've also backported it to 8 and 7.


Segher


> 2018-12-19  Segher Boessenkool  
> 
>   PR target/88213
>   * config/rs6000/vsx.md (*vsx_extract___load):
>   Require TARGET_POWERPC64.
> 
> ---
>  gcc/config/rs6000/vsx.md | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index e30f89d..2c00b40 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -3257,7 +3257,7 @@ (define_insn_and_split 
> "*vsx_extract___load"
>(match_operand:VSX_D 1 "memory_operand" "m,m")
>(parallel [(match_operand:QI 2 "const_0_to_1_operand" "n,n")])))
> (clobber (match_scratch:P 3 "=&b,&b"))]
> -  "VECTOR_MEM_VSX_P (mode)"
> +  "TARGET_POWERPC64 && VECTOR_MEM_VSX_P (mode)"
>"#"
>"&& reload_completed"
>[(set (match_dup 0) (match_dup 4))]
> -- 
> 1.8.3.1


[PATCH] Restrict a VSX extract to TARGET_POWERPC64 (PR88213)

2018-12-19 Thread Segher Boessenkool
This pattern optimises a scalar extract from a vector loaded from
memory to be just a scalar load from memory.  But to do a 64-bit
integer load you need 64-bit integer registers, which needs
TARGET_POWERPC64.

This fixes it.  Tested on powerpc64-linux {-m32,-m64}; committing to trunk.
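
For reference, a sketch of the kind of access the splitter targets (the types
and code are illustrative, not the PR88213 testcase):

/* Extract one 64-bit element from a vector loaded from memory; the
   pattern turns this into a single 64-bit scalar load, which needs
   64-bit GPRs and hence TARGET_POWERPC64.  */
typedef long long v2di __attribute__ ((vector_size (16)));

long long
extract_lo (v2di *p)
{
  return (*p)[0];
}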


Segher


2018-12-19  Segher Boessenkool  

PR target/88213
* config/rs6000/vsx.md (*vsx_extract___load):
Require TARGET_POWERPC64.

---
 gcc/config/rs6000/vsx.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index e30f89d..2c00b40 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -3257,7 +3257,7 @@ (define_insn_and_split 
"*vsx_extract___load"
 (match_operand:VSX_D 1 "memory_operand" "m,m")
 (parallel [(match_operand:QI 2 "const_0_to_1_operand" "n,n")])))
(clobber (match_scratch:P 3 "=&b,&b"))]
-  "VECTOR_MEM_VSX_P (mode)"
+  "TARGET_POWERPC64 && VECTOR_MEM_VSX_P (mode)"
   "#"
   "&& reload_completed"
   [(set (match_dup 0) (match_dup 4))]
-- 
1.8.3.1



RE: [Patch, Vectorizer, SVE] fmin/fmax builtin reduction support

2018-12-19 Thread Alejandro Martinez Vicente
Richard,

I'm happy to change the name of the helper to code_helper_for_stmt; the new
patch and changelog are included. Regarding the reductions being fold_left,
the FMINNM/FMINNMV instructions are defined in such a way that this is not
necessary (it wouldn't work with FMIN/FMINV).

Alejandro
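
For reference, a sketch of what such a helper might look like (illustrative
only, not necessarily the patch's exact implementation):

/* Return a code_helper describing STMT: the rhs tree_code for an
   assignment, the combined_fn for a call, ERROR_MARK otherwise.  */
static code_helper
code_helper_for_stmt_sketch (gimple *stmt)
{
  if (gassign *assign = dyn_cast <gassign *> (stmt))
    return code_helper (gimple_assign_rhs_code (assign));
  if (gcall *call = dyn_cast <gcall *> (stmt))
    return code_helper (gimple_call_combined_fn (call));
  return code_helper (ERROR_MARK);
}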

 
gcc/Changelog:
 
2018-12-18  Alejandro Martinez  

* gimple-match.h (code_helper_for_stmt): New function to get a
code_helper from a statement.
* internal-fn.def: New reduc_fmax_scal and reduc_fmin_scal optabs for
ieee fp max/min reductions
* optabs.def: Likewise.
* tree-vect-loop.c (reduction_fn_for_scalar_code): Changed function
signature to accept code_helper instead of tree_code. Handle the
fmax/fmin builtins.
(needs_fold_left_reduction_p): Likewise.
(check_reduction_path): Likewise.
(vect_is_simple_reduction): Use code_helper instead of tree_code. Check
for supported call-based reductions. Extend support for both
assignment-based and call-based reductions.
(vect_model_reduction_cost): Extend cost-model support to call-based
reductions (just use MAX expression).
(get_initial_def_for_reduction): Use code_helper instead of tree_code.
Extend support for both assignment-based and call-based reductions.
(vect_create_epilog_for_reduction): Likewise.
(vectorizable_reduction): Likewise.
* tree-vectorizer.h: include gimple-match.h for code_helper. Use
code_helper in check_reduction_path signature.
* config/aarch64/aarch64-sve.md: Added define_expand to capture new
reduc_fmax_scal and reduc_fmin_scal optabs.
* config/aarch64/iterators.md: New FMAXMINNMV and fmaxmin_uns iterators
to support the new define_expand.
 
gcc/testsuite/Changelog:
 
2018-12-18  Alejandro Martinez  

* gcc.target/aarch64/sve/reduc_9.c: New test to check
SVE-vectorized reductions without -ffast-math.
* gcc.target/aarch64/sve/reduc_10.c: New test to check
SVE-vectorized builtin reductions without -ffast-math.

-Original Message-
From: Richard Biener  
Sent: 19 December 2018 12:35
To: Alejandro Martinez Vicente 
Cc: GCC Patches ; Richard Sandiford 
; nd 
Subject: Re: [Patch, Vectorizer, SVE] fmin/fmax builtin reduction support

On Wed, Dec 19, 2018 at 10:33 AM Alejandro Martinez Vicente 
 wrote:
>
> Hi all,
>
> Loops that use the fmin/fmax builtins can be vectorized even without 
> -ffast-math using SVE's FMINNM/FMAXNM instructions. This is an example:
>
> double
> f (double *x, int n)
> {
>   double res = 100.0;
>   for (int i = 0; i < n; ++i)
> res = __builtin_fmin (res, x[i]);
>   return res;
> }
>
> Before this patch, the compiler would generate this code 
> (-march=armv8.2-a+sve
> -O2 -ftree-vectorize):
>
>  :
>0:   713fcmp w1, #0x0
>4:   5400018db.le34 
>8:   51000422sub w2, w1, #0x1
>c:   91002003add x3, x0, #0x8
>   10:   d2e80b21mov x1, #0x4059
>   14:   9e670020fmovd0, x1
>   18:   8b224c62add x2, x3, w2, uxtw #3
>   1c:   d503201fnop
>   20:   fc408401ldr d1, [x0],#8
>   24:   1e617800fminnm  d0, d0, d1
>   28:   eb02001fcmp x0, x2
>   2c:   54a1b.ne20 
>   30:   d65f03c0ret
>   34:   d2e80b20mov x0, #0x4059
>   38:   9e67fmovd0, x0
>   3c:   d65f03c0ret
>
> After this patch, this is the code that gets generated:
>
>  :
>0:   713fcmp w1, #0x0
>4:   5400020db.le44 
>8:   d282mov x2, #0x0
>c:   25d8e3e0ptrue   p0.d
>   10:   93407c21sxtwx1, w1
>   14:   9003adrpx3, 0 
>   18:   25804001mov p1.b, p0.b
>   1c:   9163add x3, x3, #0x0
>   20:   85c0e060ld1rd   {z0.d}, p0/z, [x3]
>   24:   25e11fe0whilelo p0.d, xzr, x1
>   28:   a5e24001ld1d{z1.d}, p0/z, [x0, x2, lsl #3]
>   2c:   04f0e3e2incdx2
>   30:   65c58020fminnm  z0.d, p0/m, z0.d, z1.d
>   34:   25e11c40whilelo p0.d, x2, x1
>   38:   5481b.ne28   // b.any
>   3c:   65c52400fminnmv d0, p1, z0.d
>   40:   d65f03c0ret
>   44:   d2e80b20mov x0, #0x4059
>   48:   9e67fmovd0, x0
>   4c:   d65f03c0ret
>
> This patch extends the support for reductions to include calls to 
> internal functions, in addition to assign statements. For this 
> purpose, in most places where a tree_code would be used, a code_helper 
> is used instead. The code_helper can hold either a tree_code or a
> combined_fn.
>
> This patch implements these tasks:
>
> - Detect a reduction candidate based on a call to an internal function
>   (currently only fmin or fmax).
> 

Re: [Patch, Vectorizer, SVE] fmin/fmax builtin reduction support

2018-12-19 Thread Richard Biener
On Wed, Dec 19, 2018 at 10:33 AM Alejandro Martinez Vicente
 wrote:
>
> Hi all,
>
> Loops that use the fmin/fmax builtins can be vectorized even without
> -ffast-math using SVE's FMINNM/FMAXNM instructions. This is an example:
>
> double
> f (double *x, int n)
> {
>   double res = 100.0;
>   for (int i = 0; i < n; ++i)
> res = __builtin_fmin (res, x[i]);
>   return res;
> }
>
> Before this patch, the compiler would generate this code (-march=armv8.2-a+sve
> -O2 -ftree-vectorize):
>
>  :
>0:   713fcmp w1, #0x0
>4:   5400018db.le34 
>8:   51000422sub w2, w1, #0x1
>c:   91002003add x3, x0, #0x8
>   10:   d2e80b21mov x1, #0x4059
>   14:   9e670020fmovd0, x1
>   18:   8b224c62add x2, x3, w2, uxtw #3
>   1c:   d503201fnop
>   20:   fc408401ldr d1, [x0],#8
>   24:   1e617800fminnm  d0, d0, d1
>   28:   eb02001fcmp x0, x2
>   2c:   54a1b.ne20 
>   30:   d65f03c0ret
>   34:   d2e80b20mov x0, #0x4059
>   38:   9e67fmovd0, x0
>   3c:   d65f03c0ret
>
> After this patch, this is the code that gets generated:
>
>  :
>0:   713fcmp w1, #0x0
>4:   5400020db.le44 
>8:   d282mov x2, #0x0
>c:   25d8e3e0ptrue   p0.d
>   10:   93407c21sxtwx1, w1
>   14:   9003adrpx3, 0 
>   18:   25804001mov p1.b, p0.b
>   1c:   9163add x3, x3, #0x0
>   20:   85c0e060ld1rd   {z0.d}, p0/z, [x3]
>   24:   25e11fe0whilelo p0.d, xzr, x1
>   28:   a5e24001ld1d{z1.d}, p0/z, [x0, x2, lsl #3]
>   2c:   04f0e3e2incdx2
>   30:   65c58020fminnm  z0.d, p0/m, z0.d, z1.d
>   34:   25e11c40whilelo p0.d, x2, x1
>   38:   5481b.ne28   // b.any
>   3c:   65c52400fminnmv d0, p1, z0.d
>   40:   d65f03c0ret
>   44:   d2e80b20mov x0, #0x4059
>   48:   9e67fmovd0, x0
>   4c:   d65f03c0ret
>
> This patch extends the support for reductions to include calls to internal
> functions, in addition to assign statements. For this purpose, in most places
> where a tree_code would be used, a code_helper is used instead. The 
> code_helper
> allows to hold either a tree_code or combined_fn.
>
> This patch implements these tasks:
>
> - Detect a reduction candidate based on a call to an internal function
>   (currently only fmin or fmax).
> - Process the reduction using code_helper. This means that at several places
>   we have to check whether this is as assign-based reduction or a call-based
>   reduction.
> - Add new internal functions for the fmin/fmax reductions and for conditional
>   fmin/fmax. In architectures where ieee fmin/fmax reductions are available, 
> it
>   is still possible to vectorize the loop using unconditional instructions.
> - Update SVE's md to support these new reductions.
> - Add new SVE tests to check that the optimal code is being generated.
>
> I tested this patch in an aarch64 machine bootstrapping the compiler and
> running the checks.

Just some quick comments based on the above and the changelog.
Using code_helper is reasonable I guess.

> Alejandro
>
> gcc/Changelog:
>
> 2018-12-18  Alejandro Martinez  
>
> * gimple-match.h (code_helper_for_stmnt): New function to get a

code_helper_for_stmt I hope.

> code_helper from an statement.
> * internal-fn.def: New reduc_fmax_scal and reduc_fmin_scal optabs for
> ieee fp max/min reductions

Aren't they necessarily fold_left reductions then?  Thus, should the optabs
be named accordingly fold_left_fmax_optab?

> * optabs.def: Likewise.
> * tree-vect-loop.c (reduction_fn_for_scalar_code): Changed function
> signature to accept code_helper instead of tree_code. Handle the
> fmax/fmin builtins.
> (needs_fold_left_reduction_p): Likewise.
> (check_reduction_path): Likewise.
> (vect_is_simple_reduction): Use code_helper instead of tree_code. 
> Check
> for supported call-based reductions. Extend support for both
> assignment-based and call-based reductions.
> (vect_model_reduction_cost): Extend cost-model support to call-based
> reductions (just use MAX expression).
> (get_initial_def_for_reduction): Use code_helper instead of tree_code.
> Extend support for both assignment-based and call-based reductions.
> (vect_create_epilog_for_reduction): Likewise.
> (vectorizable_reduction): Likewise.
> * tree-vectorizer.h: include gimple-match.h for code_helper. Use
> code_helper in check_reduction_path signature.
> * config/aarch64/aarch64-sve.md: Added define_expand to capture new
> reduc_fmax_scal and reduc_fmin_scal optabs.

[PATCH] Fix PR88533

2018-12-19 Thread Richard Biener


With the patch for PR85275 I throttled loop-header copying too much.
The following reverts that patch and instead adds heuristics to
should_duplicate_loop_header_p as to _not_ copy exit tests that
are based on non-IV/invariant tests.  Since CH runs before any
LIM we have to keep track of what is invariant ourselves.  Then
it's also easy enough to see what tests are based on IVs without
resorting to SCEV which would be limited in some cases as well.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2018-12-19  Richard Biener  

PR tree-optimization/88533
Revert
2018-04-30  Richard Biener  

PR tree-optimization/28364
PR tree-optimization/85275
* tree-ssa-loop-ch.c (ch_base::copy_headers): Stop after
copying first exit test.

* gcc.dg/tree-ssa/ssa-dom-thread-7.c: Adjust.

* tree-ssa-loop-ch.c: Include tree-phinodes.h and
ssa-iterators.h.
(should_duplicate_loop_header_p): Track whether stmt compute
loop invariants or values based on IVs.  Apart from the
original loop header only duplicate blocks with exit tests
that are based on IVs or invariants.

* gcc.dg/tree-ssa/copy-headers-6.c: New testcase.
* gcc.dg/tree-ssa/copy-headers-7.c: Likewise.
* gcc.dg/tree-ssa/ivopt_mult_1.c: Un-XFAIL.
* gcc.dg/tree-ssa/ivopt_mult_2.c: Likewise.

Index: gcc/testsuite/gcc.dg/tree-ssa/copy-headers-6.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/copy-headers-6.c  (nonexistent)
+++ gcc/testsuite/gcc.dg/tree-ssa/copy-headers-6.c  (working copy)
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-ch2-details" } */
+
+int is_sorted(int *a, int n)
+{
+  for (int i = 0; i < n - 1; i++)
+if (a[i] > 0)
+  return 0;
+  return 1;
+}
+
+/* Verify we apply loop header copying but only copy the IV test and
+   not the alternate exit test.  */
+
+/* { dg-final { scan-tree-dump "is now do-while loop" "ch2" } } */
+/* { dg-final { scan-tree-dump-times "  if " 3 "ch2" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/copy-headers-7.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/copy-headers-7.c  (nonexistent)
+++ gcc/testsuite/gcc.dg/tree-ssa/copy-headers-7.c  (working copy)
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-ch2-details --param 
logical-op-non-short-circuit=0" } */
+
+int is_sorted(int *a, int n, int m, int k)
+{
+  for (int i = 0; i < n - 1 && m && k > i; i++)
+if (a[i] > a[i + 1])
+  return 0;
+  return 1;
+}
+
+/* Verify we apply loop header copying but only copy the IV tests and
+   the invariant test, not the alternate exit test.  */
+
+/* { dg-final { scan-tree-dump "is now do-while loop" "ch2" } } */
+/* { dg-final { scan-tree-dump-times "Will duplicate bb" 3 "ch2" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_1.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_1.c(revision 267232)
+++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_1.c(working copy)
@@ -20,4 +20,4 @@ long foo(long* p, long* p2, int N1, int
   return s;
 }
 
-/* { dg-final { scan-tree-dump-times "Replacing" 1 "ivopts" { xfail *-*-* } } 
} */
+/* { dg-final { scan-tree-dump-times "Replacing" 1 "ivopts" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_2.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_2.c(revision 267232)
+++ gcc/testsuite/gcc.dg/tree-ssa/ivopt_mult_2.c(working copy)
@@ -21,4 +21,4 @@ long foo(long* p, long* p2, int N1, int
   return s;
 }
 
-/* { dg-final { scan-tree-dump-times "Replacing" 2 "ivopts" { xfail *-*-* } } 
} */
+/* { dg-final { scan-tree-dump-times "Replacing" 2 "ivopts" } } */
Index: gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c(revision 267232)
+++ gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c(working copy)
@@ -2,7 +2,7 @@
 /* { dg-options "-O2 -fdump-tree-thread1-stats -fdump-tree-thread2-stats 
-fdump-tree-dom2-stats -fdump-tree-thread3-stats -fdump-tree-dom3-stats 
-fdump-tree-vrp2-stats -fno-guess-branch-probability" } */
 /* { dg-final { scan-tree-dump "Jumps threaded: 16"  "thread1" } } */
 /* { dg-final { scan-tree-dump "Jumps threaded: 9" "thread2" } } */
-/* { dg-final { scan-tree-dump "Jumps threaded: 1"  "dom2" } } */
+/* { dg-final { scan-tree-dump-not "Jumps threaded"  "dom2" } } */
 /* aarch64 has the highest CASE_VALUES_THRESHOLD in GCC.  It's high enough
to change decisions in switch expansion which in turn can expose new
jump threading opportunities.  Skip the later tests on aarch64.  */
Index: gcc/tree-ssa-loop-ch.c
=

Re: [PR86153] simplify more overflow tests in VRP

2018-12-19 Thread Alexandre Oliva
On Dec 18, 2018, Jeff Law  wrote:

>> Although such overflow tests could be uniformly simplified to compares
>> against a constant, the original code would only perform such
>> simplifications when the test could be resolved to an equality test
>> against zero.  I've thus avoided introducing compares against other
>> constants, and instead added code that will only simplify overflow
>> tests that weren't simplified before when the condition can be
>> evaluated at compile time.

> That limitation was precisely what my (unsubmitted) patch was trying
> to address :-)

This patch is what I was getting at in my earlier email.

These transformations are already performed elsewhere, e.g. when
forwprop is enabled, but given sufficiently complex code to begin with,
as in the pr83239 testcases, forwprop presumably runs too early to be
able to simplify the tests and then non-early vrp comes to the rescue.

Presumably with more convoluted tests than the ones I'm introducing,
forwprop would be unable to infer the ranges, and then we'd really
depend on vrp, but I didn't dig deep enough to try and create a testcase
that wouldn't be optimized by forwprop, only by vrp.  That's why the
tests disable forwprop.


[PR86153] simplify vrp overflow simplifications

It turns out there was apparently no reason to avoid simplifying every
overflow comparison to a compare with a constant, it was not
profitable because earlier VRP couldn't deal with that as well as it
does now.

So, make the transformation unconditionally, even in cases we'd have
transformed differently before my previous patch, and let the
now-better optimizations resolve them to boolean constants or to
equality tests when possible.

The only significant difference is that where we'd turn A>B after
B=A+1 into B!=0, we'll now turn it into A!=-1u.  That might seem
worse, but considering that test canonicalization will have moved the
(probably) earliest SSA version to the first operand, that form is
more likely to allow the later SSA definition, presumably in terms of
the earlier one, to be completely removed, which would have otherwise
required propagation of the assignment to B into the compare, which is
possible in equality tests, but not in other kinds of overflow tests.
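
A minimal sketch of the case discussed above (illustration only, not part
of the patch, and the function name is made up):

  /* After b = a + 1, the test "a > b" is an overflow check: it is true
     only when a + 1 wrapped around.  Rewriting it as a compare of A
     against a constant means the assignment to B need not be propagated
     into the compare, so B can be removed once it has no other uses.  */
  static int
  overflow_check (unsigned a)
  {
    unsigned b = a + 1;
    return a > b;
  }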

Regstrapped on x86_64- and i686-linux-gnu.  Ok to install?


for  gcc/ChangeLog

PR testsuite/86153
PR middle-end/83239
* vr-values.c
(vr_values::vrp_evaluate_conditional_warnv_with_ops): Simplify
the handling of overflow comparisons.

for  gcc/testsuite/ChangeLog

PR testsuite/86153
PR middle-end/83239
* gcc.dg/vrp-overflow-2.c: New.
---
 gcc/testsuite/gcc.dg/vrp-overflow-2.c |   35 ++
 gcc/vr-values.c   |   66 +++--
 2 files changed, 40 insertions(+), 61 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vrp-overflow-2.c

diff --git a/gcc/testsuite/gcc.dg/vrp-overflow-2.c 
b/gcc/testsuite/gcc.dg/vrp-overflow-2.c
new file mode 100644
index ..a905471bcaa1
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vrp-overflow-2.c
@@ -0,0 +1,35 @@
+/* { dg-do run } */
+/* { dg-options "-O2 -fno-tree-forwprop" } */
+
+void __attribute__((noreturn)) undefined ();
+
+int tij (unsigned i)
+{
+  unsigned j = i + 1;
+
+  if (j == 0)
+return 0;
+
+  if (i > j)
+undefined ();
+
+  return 1;
+}
+
+int tji (unsigned i)
+{
+  unsigned j = i - 1;
+
+  if (i == 0)
+return 0;
+
+  if (j > i)
+undefined ();
+
+  return 1;
+}
+
+int main (int argc, char *argv[]) {
+  tij (argc);
+  tji (argc);
+}
diff --git a/gcc/vr-values.c b/gcc/vr-values.c
index d71a703ab550..49c5da9cb515 100644
--- a/gcc/vr-values.c
+++ b/gcc/vr-values.c
@@ -2305,70 +2305,14 @@ vr_values::vrp_evaluate_conditional_warnv_with_ops 
(enum tree_code code,
   && !POINTER_TYPE_P (TREE_TYPE (op0)))
 return NULL_TREE;
 
-  /* If OP0 CODE OP1 is an overflow comparison, if it can be expressed
- as a simple equality test, then prefer that over its current form
- for evaluation.
-
- An overflow test which collapses to an equality test can always be
- expressed as a comparison of one argument against zero.  Overflow
- occurs when the chosen argument is zero and does not occur if the
- chosen argument is not zero.  */
+  /* If OP0 CODE OP1 is an overflow comparison, it can be expressed as
+ a test involving only one of the operands and a constant, so
+ prefer that over its current form for evaluation.  */
   tree x;
   if (overflow_comparison_p (code, op0, op1, use_equiv_p, &x))
 {
-  wide_int max = wi::max_value (TYPE_PRECISION (TREE_TYPE (op0)), 
UNSIGNED);
-  /* B = A - 1; if (A < B) -> B = A - 1; if (A == 0)
- B = A - 1; if (A > B) -> B = A - 1; if (A != 0)
- B = A + 1; if (B < A) -> B = A + 1; if (B == 0)
- B = A + 1; if (B > A) -> B = A + 1; if (B != 0) */
-  if (integer_zerop (x))
-   {
- op1 = x;
- code = (code == L

[nvptx, committed] Use MAX, MIN, ROUND_UP macros

2018-12-19 Thread Tom de Vries
[ was: Re: [nvptx] vector length patch series ]

On 14-12-18 20:58, Tom de Vries wrote:
> 0014-nvptx-Use-MAX-MIN-ROUND_UP-macros.patch

Committed.

Thanks,
- Tom
[nvptx] Use MAX, MIN, ROUND_UP macros

Use MAX, MIN, and ROUND_UP macros to simplify code.

Build and reg-tested on x86_64 with nvptx accelerator.

2018-12-17  Tom de Vries  

	* config/nvptx/nvptx.c (nvptx_gen_shared_bcast, shared_prop_gen)
	(nvptx_goacc_expand_accel_var): Use MAX and ROUND_UP.
	(nvptx_assemble_value, nvptx_output_skip): Use MIN.
	(nvptx_shared_propagate, nvptx_single, nvptx_expand_shared_addr): Use
	MAX.

---
 gcc/config/nvptx/nvptx.c | 28 ++--
 1 file changed, 10 insertions(+), 18 deletions(-)

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 163f2268e5f..2a2d638e6d7 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -1810,9 +1810,8 @@ nvptx_gen_shared_bcast (rtx reg, propagate_mask pm, unsigned rep,
 	  {
 	unsigned align = GET_MODE_ALIGNMENT (mode) / BITS_PER_UNIT;
 
-	if (align > oacc_bcast_align)
-	  oacc_bcast_align = align;
-	data->offset = (data->offset + align - 1) & ~(align - 1);
+	oacc_bcast_align = MAX (oacc_bcast_align, align);
+	data->offset = ROUND_UP (data->offset, align);
 	addr = data->base;
 	gcc_assert (data->base != NULL);
 	if (data->offset)
@@ -1934,8 +1933,7 @@ nvptx_assemble_value (unsigned HOST_WIDE_INT val, unsigned size)
 {
   val >>= part * BITS_PER_UNIT;
   part = init_frag.size - init_frag.offset;
-  if (part > size)
-	part = size;
+  part = MIN (part, size);
 
   unsigned HOST_WIDE_INT partial
 	= val << (init_frag.offset * BITS_PER_UNIT);
@@ -1998,8 +1996,7 @@ nvptx_output_skip (FILE *, unsigned HOST_WIDE_INT size)
   if (init_frag.offset)
 {
   unsigned part = init_frag.size - init_frag.offset;
-  if (part > size)
-	part = (unsigned) size;
+  part = MIN (part, (unsigned)size);
   size -= part;
   nvptx_assemble_value (0, part);
 }
@@ -3927,9 +3924,8 @@ shared_prop_gen (rtx reg, propagate_mask pm, unsigned rep, void *data_,
   /* Starting a loop, initialize pointer.*/
   unsigned align = GET_MODE_ALIGNMENT (GET_MODE (reg)) / BITS_PER_UNIT;
 
-  if (align > oacc_bcast_align)
-	oacc_bcast_align = align;
-  data->offset = (data->offset + align - 1) & ~(align - 1);
+  oacc_bcast_align = MAX (oacc_bcast_align, align);
+  data->offset = ROUND_UP (data->offset, align);
 
   data->ptr = gen_reg_rtx (Pmode);
 
@@ -3970,8 +3966,7 @@ nvptx_shared_propagate (bool pre_p, bool is_call, basic_block block,
   rtx init = gen_rtx_SET (data.base, oacc_bcast_sym);
   emit_insn_after (init, insn);
 
-  if (oacc_bcast_size < data.offset)
-	oacc_bcast_size = data.offset;
+  oacc_bcast_size = MAX (oacc_bcast_size, data.offset);
 }
   return empty;
 }
@@ -4346,8 +4341,7 @@ nvptx_single (unsigned mask, basic_block from, basic_block to)
 	  data.base = oacc_bcast_sym;
 	  data.ptr = 0;
 
-	  if (oacc_bcast_size < GET_MODE_SIZE (SImode))
-	oacc_bcast_size = GET_MODE_SIZE (SImode);
+	  oacc_bcast_size = MAX (oacc_bcast_size, GET_MODE_SIZE (SImode));
 
 	  data.offset = 0;
 	  emit_insn_before (nvptx_gen_shared_bcast (pvar, PM_read, 0, &data,
@@ -5044,13 +5038,11 @@ nvptx_expand_shared_addr (tree exp, rtx target,
 return target;
 
   unsigned align = TREE_INT_CST_LOW (CALL_EXPR_ARG (exp, 2));
-  if (align > worker_red_align)
-worker_red_align = align;
+  worker_red_align = MAX (worker_red_align, align);
 
   unsigned offset = TREE_INT_CST_LOW (CALL_EXPR_ARG (exp, 0));
   unsigned size = TREE_INT_CST_LOW (CALL_EXPR_ARG (exp, 1));
-  if (size + offset > worker_red_size)
-worker_red_size = size + offset;
+  worker_red_size = MAX (worker_red_size, size + offset);
 
   rtx addr = worker_red_sym;
   if (offset)


[nvptx, committed] Make nvptx state propagation function names more generic

2018-12-19 Thread Tom de Vries
[ was: Re: [nvptx] vector length patch series ]

On 14-12-18 20:58, Tom de Vries wrote:
> 0008-nvptx-make-nvptx-state-propagation-function-names-mo.patch

Committed.

Thanks,
- Tom
[nvptx] Make nvptx state propagation function names more generic

Rename state propagation functions to avoid worker/vector terminology.

Build and reg-tested on x86_64 with nvptx accelerator.

2018-12-17  Tom de Vries  

	* config/nvptx/nvptx.c (nvptx_gen_wcast): Rename as
	nvptx_gen_warp_bcast.
	(nvptx_gen_wcast): Rename to nvptx_gen_shared_bcast, add bool
	vector argument, and update call to nvptx_gen_shared_bcast.
	(propagator_fn): Add bool argument.
	(nvptx_propagate): New bool argument, pass bool argument to fn.
	(vprop_gen): Rename to warp_prop_gen, update call to
	nvptx_gen_warp_bcast.
	(nvptx_vpropagate): Rename to nvptx_warp_propagate, update call to
	nvptx_propagate.
	(wprop_gen): Rename to shared_prop_gen, update call to
	nvptx_gen_shared_bcast.
	(nvptx_wpropagate): Rename to nvptx_shared_propagate, update call
	to nvptx_propagate.
	(nvptx_wsync): Rename to nvptx_cta_sync.
	(nvptx_single): Update calls to nvptx_gen_warp_bcast,
	nvptx_gen_shared_bcast and nvptx_cta_sync.
	(nvptx_process_pars): Likewise.
	(write_worker_buffer): Rename as write_shared_buffer.
	(nvptx_file_end): Update calls to write_shared_buffer.
	(nvptx_expand_worker_addr): Rename as nvptx_expand_shared_addr.
	(nvptx_expand_builtin): Update call to nvptx_expand_shared_addr.
	(nvptx_get_worker_red_addr): Rename as nvptx_get_shared_red_addr.
	(nvptx_goacc_reduction_setup): Update call to
	nvptx_get_shared_red_addr.
	(nvptx_goacc_reduction_fini): Likewise.
	(nvptx_goacc_reduction_teardown): Likewise.

---
 gcc/config/nvptx/nvptx.c | 96 +++-
 1 file changed, 54 insertions(+), 42 deletions(-)

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 9625ac86aa1..163f2268e5f 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -1748,7 +1748,7 @@ nvptx_gen_shuffle (rtx dst, rtx src, rtx idx, nvptx_shuffle_kind kind)
across the vectors of a single warp.  */
 
 static rtx
-nvptx_gen_vcast (rtx reg)
+nvptx_gen_warp_bcast (rtx reg)
 {
   return nvptx_gen_shuffle (reg, reg, const0_rtx, SHUFFLE_IDX);
 }
@@ -1779,7 +1779,8 @@ enum propagate_mask
how many loop iterations will be executed (0 for not a loop).  */

 static rtx
-nvptx_gen_wcast (rtx reg, propagate_mask pm, unsigned rep, broadcast_data_t *data)
+nvptx_gen_shared_bcast (rtx reg, propagate_mask pm, unsigned rep,
+			broadcast_data_t *data, bool vector)
 {
   rtx  res;
   machine_mode mode = GET_MODE (reg);
@@ -1793,7 +1794,7 @@ nvptx_gen_wcast (rtx reg, propagate_mask pm, unsigned rep, broadcast_data_t *dat
 	start_sequence ();
 	if (pm & PM_read)
 	  emit_insn (gen_sel_truesi (tmp, reg, GEN_INT (1), const0_rtx));
-	emit_insn (nvptx_gen_wcast (tmp, pm, rep, data));
+	emit_insn (nvptx_gen_shared_bcast (tmp, pm, rep, data, vector));
 	if (pm & PM_write)
 	  emit_insn (gen_rtx_SET (reg, gen_rtx_NE (BImode, tmp, const0_rtx)));
 	res = get_insns ();
@@ -1813,6 +1814,7 @@ nvptx_gen_wcast (rtx reg, propagate_mask pm, unsigned rep, broadcast_data_t *dat
 	  oacc_bcast_align = align;
 	data->offset = (data->offset + align - 1) & ~(align - 1);
 	addr = data->base;
+	gcc_assert (data->base != NULL);
 	if (data->offset)
 	  addr = gen_rtx_PLUS (Pmode, addr, GEN_INT (data->offset));
 	  }
@@ -3803,11 +3805,11 @@ nvptx_find_sese (auto_vec &blocks, bb_pair_vec_t ®ions)
regions and (b) only propagating stack entries that are used.  The
latter might be quite hard to determine.  */
 
-typedef rtx (*propagator_fn) (rtx, propagate_mask, unsigned, void *);
+typedef rtx (*propagator_fn) (rtx, propagate_mask, unsigned, void *, bool);
 
 static bool
 nvptx_propagate (bool is_call, basic_block block, rtx_insn *insn,
-		 propagate_mask rw, propagator_fn fn, void *data)
+		 propagate_mask rw, propagator_fn fn, void *data, bool vector)
 {
   bitmap live = DF_LIVE_IN (block);
   bitmap_iterator iterator;
@@ -3842,7 +3844,7 @@ nvptx_propagate (bool is_call, basic_block block, rtx_insn *insn,
 	  
 	  emit_insn (gen_rtx_SET (idx, GEN_INT (fs)));
 	  /* Allow worker function to initialize anything needed.  */
-	  rtx init = fn (tmp, PM_loop_begin, fs, data);
+	  rtx init = fn (tmp, PM_loop_begin, fs, data, vector);
 	  if (init)
 	emit_insn (init);
 	  emit_label (label);
@@ -3851,7 +3853,7 @@ nvptx_propagate (bool is_call, basic_block block, rtx_insn *insn,
 	}
   if (rw & PM_read)
 	emit_insn (gen_rtx_SET (tmp, gen_rtx_MEM (DImode, ptr)));
-  emit_insn (fn (tmp, rw, fs, data));
+  emit_insn (fn (tmp, rw, fs, data, vector));
   if (rw & PM_write)
 	emit_insn (gen_rtx_SET (gen_rtx_MEM (DImode, ptr), tmp));
   if (fs)
@@ -3859,7 +3861,7 @@ nvptx_propagate (bool is_call, basic_block block, rtx_insn *insn,
 	  emit_insn (gen_rtx_SET (pred, gen_rtx_NE (BImode, idx, const0_rtx)));
 	  emit_insn (gen_

[nvptx, committed] Rename worker_bcast variables to oacc_bcast

2018-12-19 Thread Tom de Vries
[ was: Re: [nvptx] vector length patch series ]

On 14-12-18 20:58, Tom de Vries wrote:
> 0006-nvptx-Rename-worker_bcast-variables-oacc_bcast.patch

Committed.

Thanks,
- Tom
[nvptx] Rename worker_bcast variables to oacc_bcast

Rename worker_bcast variables to oacc_bcast, avoiding worker terminology.

Build and reg-tested on x86_64 with nvptx accelerator.

2018-12-17  Tom de Vries  

	* config/nvptx/nvptx.c (worker_bcast_size): Rename as
	oacc_bcast_size.
	(worker_bcast_align): Rename as oacc_bcast_align.
	(worker_bcast_sym): Rename as oacc_bcast_sym.
	(nvptx_option_override): Update usage of oacc_bcast_*.
	(struct wcast_data_t): Rename as broadcast_data_t.
	(nvptx_gen_wcast): Update type of data argument and usage of
	oacc_bcast_align.
	(wprop_gen): Update type of data_ and usage of oacc_bcast_align.
	(nvptx_wpropagate): Update type of data and usage of
	oacc_bcast_{sym,size}.
	(nvptx_single): Update type of data and usage of oacc_bcast_size.
	(nvptx_file_end): Update usage of oacc_bcast_{sym,align,size}.

---
 gcc/config/nvptx/nvptx.c | 59 
 1 file changed, 30 insertions(+), 29 deletions(-)

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 1ad3ba92caa..9625ac86aa1 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -127,14 +127,15 @@ struct tree_hasher : ggc_cache_ptr_hash
 static GTY((cache)) hash_table *declared_fndecls_htab;
 static GTY((cache)) hash_table *needed_fndecls_htab;
 
-/* Buffer needed to broadcast across workers.  This is used for both
-   worker-neutering and worker broadcasting.  It is shared by all
-   functions emitted.  The buffer is placed in shared memory.  It'd be
-   nice if PTX supported common blocks, because then this could be
-   shared across TUs (taking the largest size).  */
-static unsigned worker_bcast_size;
-static unsigned worker_bcast_align;
-static GTY(()) rtx worker_bcast_sym;
+/* Buffer needed to broadcast across workers and vectors.  This is
+   used for both worker-neutering and worker broadcasting, and
+   vector-neutering and broadcasting when vector_length > 32.  It is
+   shared by all functions emitted.  The buffer is placed in shared
+   memory.  It'd be nice if PTX supported common blocks, because then
+   this could be shared across TUs (taking the largest size).  */
+static unsigned oacc_bcast_size;
+static unsigned oacc_bcast_align;
+static GTY(()) rtx oacc_bcast_sym;
 
 /* Buffer needed for worker reductions.  This has to be distinct from
the worker broadcast array, as both may be live concurrently.  */
@@ -207,9 +208,9 @@ nvptx_option_override (void)
   declared_libfuncs_htab
 = hash_table::create_ggc (17);
 
-  worker_bcast_sym = gen_rtx_SYMBOL_REF (Pmode, "__worker_bcast");
-  SET_SYMBOL_DATA_AREA (worker_bcast_sym, DATA_AREA_SHARED);
-  worker_bcast_align = GET_MODE_ALIGNMENT (SImode) / BITS_PER_UNIT;
+  oacc_bcast_sym = gen_rtx_SYMBOL_REF (Pmode, "__oacc_bcast");
+  SET_SYMBOL_DATA_AREA (oacc_bcast_sym, DATA_AREA_SHARED);
+  oacc_bcast_align = GET_MODE_ALIGNMENT (SImode) / BITS_PER_UNIT;
 
   worker_red_sym = gen_rtx_SYMBOL_REF (Pmode, "__worker_red");
   SET_SYMBOL_DATA_AREA (worker_red_sym, DATA_AREA_SHARED);
@@ -1754,7 +1755,7 @@ nvptx_gen_vcast (rtx reg)
 
 /* Structure used when generating a worker-level spill or fill.  */
 
-struct wcast_data_t
+struct broadcast_data_t
 {
   rtx base;  /* Register holding base addr of buffer.  */
   rtx ptr;  /* Iteration var,  if needed.  */
@@ -1778,7 +1779,7 @@ enum propagate_mask
how many loop iterations will be executed (0 for not a loop).  */

 static rtx
-nvptx_gen_wcast (rtx reg, propagate_mask pm, unsigned rep, wcast_data_t *data)
+nvptx_gen_wcast (rtx reg, propagate_mask pm, unsigned rep, broadcast_data_t *data)
 {
   rtx  res;
   machine_mode mode = GET_MODE (reg);
@@ -1808,8 +1809,8 @@ nvptx_gen_wcast (rtx reg, propagate_mask pm, unsigned rep, wcast_data_t *data)
 	  {
 	unsigned align = GET_MODE_ALIGNMENT (mode) / BITS_PER_UNIT;
 
-	if (align > worker_bcast_align)
-	  worker_bcast_align = align;
+	if (align > oacc_bcast_align)
+	  oacc_bcast_align = align;
 	data->offset = (data->offset + align - 1) & ~(align - 1);
 	addr = data->base;
 	if (data->offset)
@@ -3914,15 +3915,15 @@ nvptx_vpropagate (bool is_call, basic_block block, rtx_insn *insn)
 static rtx
 wprop_gen (rtx reg, propagate_mask pm, unsigned rep, void *data_)
 {
-  wcast_data_t *data = (wcast_data_t *)data_;
+  broadcast_data_t *data = (broadcast_data_t *)data_;
 
   if (pm & PM_loop_begin)
 {
   /* Starting a loop, initialize pointer.*/
   unsigned align = GET_MODE_ALIGNMENT (GET_MODE (reg)) / BITS_PER_UNIT;
 
-  if (align > worker_bcast_align)
-	worker_bcast_align = align;
+  if (align > oacc_bcast_align)
+	oacc_bcast_align = align;
   data->offset = (data->offset + align - 1) & ~(align - 1);
 
   data->ptr = gen_reg_rtx (Pmode);
@@ -3947,7 +3948,7 @@ wprop_gen (r

[nvptx, committed] Generalize bar.sync instruction

2018-12-19 Thread Tom de Vries
[ was: Re: [nvptx] vector length patch series ]

On 14-12-18 20:58, Tom de Vries wrote:
> 0011-nvptx-Add-thread-count-parm-to-bar.sync.patch

Factored out this patch, committed.

Thanks,
- Tom
[nvptx] Generalize bar.sync instruction

Allow the logical barrier operand of nvptx_barsync to be a register, and add a
thread count operand.

Build and reg-tested on x86_64 with nvptx accelerator.

2018-12-17  Tom de Vries  

	* config/nvptx/nvptx.md (nvptx_barsync): Add and handle operand.
	* config/nvptx/nvptx.c (nvptx_wsync): Update call to gen_nvptx_barsync.

---
 gcc/config/nvptx/nvptx.c  |  2 +-
 gcc/config/nvptx/nvptx.md | 10 --
 2 files changed, 9 insertions(+), 3 deletions(-)

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index a354811194c..1ad3ba92caa 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -3974,7 +3974,7 @@ nvptx_wpropagate (bool pre_p, bool is_call, basic_block block, rtx_insn *insn)
 static rtx
 nvptx_wsync (bool after)
 {
-  return gen_nvptx_barsync (GEN_INT (after));
+  return gen_nvptx_barsync (GEN_INT (after), GEN_INT (0));
 }
 
 #if WORKAROUND_PTXJIT_BUG
diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index ca00b1d8073..f1f6fe0c404 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -1454,10 +1454,16 @@
   [(set_attr "atomic" "true")])
 
 (define_insn "nvptx_barsync"
-  [(unspec_volatile [(match_operand:SI 0 "const_int_operand" "")]
+  [(unspec_volatile [(match_operand:SI 0 "nvptx_nonmemory_operand" "Ri")
+		 (match_operand:SI 1 "const_int_operand")]
 		UNSPECV_BARSYNC)]
   ""
-  "\\tbar.sync\\t%0;"
+  {
+if (INTVAL (operands[1]) == 0)
+  return "\\tbar.sync\\t%0;";
+else
+  return "\\tbar.sync\\t%0, %1;";
+  }
   [(set_attr "predicable" "false")])
 
 (define_expand "memory_barrier"


[nvptx, committed] Only use one logical barrier resource

2018-12-19 Thread Tom de Vries
[ was: Re: [nvptx] vector length patch series ]
On 14-12-18 20:58, Tom de Vries wrote:
> 0010-nvptx-only-use-one-bar.sync-barriers-in-OpenACC-offl.patch

Committed.

Thanks,
- Tom

[nvptx] Only use one logical barrier resource

For openacc loops, we generate this style of code:
...
@%r41   bra.uni $L5;
@%r40   bra $L6;
mov.u64 %r32, %ar0;
cvta.shared.u64 %r39, __worker_bcast;
st.u64  [%r39], %r32;
$L6:
$L5:
bar.sync 0;
@%r40   bra $L4;
cvta.shared.u64 %r38, __worker_bcast;
ld.u64  %r32, [%r38];
...
$L4:
bar.sync 1;
...

The first barrier is there to ensure that no thread reads the broadcast buffer
before it's written.  The second barrier is there to ensure that no thread
overwrites the broadcast buffer before all threads have read it (as well as
implementing the obligatory synchronization after a worker loop).

We've been using the logical barrier resources '0' and '1' for these two
barriers, but there's no reason why we can't use the same one.

Use logical barrier resource '0' for both barriers, making the openacc
implementation claim less resources.
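
As a rough host-side analogue, the pattern the two barriers protect looks
like this (a pthreads sketch for illustration only, not the nvptx-generated
code; the names bcast_buf, worker and NTHREADS are made up).  Note that both
synchronizations go through the same barrier object, mirroring the reuse of
logical barrier resource '0' for both bar.sync instructions:

  /* Host-side sketch of the broadcast-buffer protocol described above.  */
  #include <pthread.h>
  #include <stdio.h>

  #define NTHREADS 4

  static pthread_barrier_t bar;
  static int bcast_buf;            /* plays the role of __oacc_bcast */

  static void *
  worker (void *arg)
  {
    int id = *(int *) arg;
    if (id == 0)
      bcast_buf = 42;              /* thread 0 fills the shared buffer */
    pthread_barrier_wait (&bar);   /* 1st bar.sync: no read before the write */
    int v = bcast_buf;             /* every thread reads the same value */
    pthread_barrier_wait (&bar);   /* 2nd bar.sync: the buffer may only be
                                      reused once every thread has read it */
    printf ("thread %d got %d\n", id, v);
    return NULL;
  }

  int
  main (void)
  {
    pthread_t t[NTHREADS];
    int ids[NTHREADS];
    pthread_barrier_init (&bar, NULL, NTHREADS);
    for (int i = 0; i < NTHREADS; i++)
      {
        ids[i] = i;
        pthread_create (&t[i], NULL, worker, &ids[i]);
      }
    for (int i = 0; i < NTHREADS; i++)
      pthread_join (t[i], NULL);
    pthread_barrier_destroy (&bar);
    return 0;
  }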

Build and reg-tested on x86_64 with nvptx accelerator.

2018-12-17  Tom de Vries  

	* config/nvptx/nvptx.c (nvptx_single): Always pass false to
	nvptx_wsync.
	(nvptx_process_pars): Likewise.

---
 gcc/config/nvptx/nvptx.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 9f834d35200..a354811194c 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -4351,7 +4351,7 @@ nvptx_single (unsigned mask, basic_block from, basic_block to)
 	  /* This barrier is needed to avoid worker zero clobbering
 	 the broadcast buffer before all the other workers have
 	 had a chance to read this instance of it.  */
-	  emit_insn_before (nvptx_wsync (true), tail);
+	  emit_insn_before (nvptx_wsync (false), tail);
 	}
 
   extract_insn (tail);
@@ -4476,7 +4476,7 @@ nvptx_process_pars (parallel *par)
 	{
 	  /* Insert begin and end synchronizations.  */
 	  emit_insn_before (nvptx_wsync (false), par->forked_insn);
-	  emit_insn_before (nvptx_wsync (true), par->join_insn);
+	  emit_insn_before (nvptx_wsync (false), par->join_insn);
 	}
 }
   else if (par->mask & GOMP_DIM_MASK (GOMP_DIM_VECTOR))


[nvptx, committed] Use TARGET_SET_CURRENT_FUNCTION

2018-12-19 Thread Tom de Vries
[ was: Re: [nvptx] vector length patch series ]

On 14-12-18 20:58, Tom de Vries wrote:
> 0013-nvptx-Use-TARGET_SET_CURRENT_FUNCTION.patch

Committed.

Thanks,
- Tom
[nvptx] Use TARGET_SET_CURRENT_FUNCTION

Implement TARGET_SET_CURRENT_FUNCTION for nvptx.  This gives us a place to
add initialization or reset actions that need to be executed on a per-function
basis.

Build and reg-tested on x86_64 with nvptx accelerator.

2018-12-17  Tom de Vries  

	* config/nvptx/nvptx.c (nvptx_previous_fndecl): Declare.
	(nvptx_set_current_function): New function.
	(TARGET_SET_CURRENT_FUNCTION): Define.

---
 gcc/config/nvptx/nvptx.c | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 74ca0f585aa..9f834d35200 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -5980,6 +5980,17 @@ nvptx_can_change_mode_class (machine_mode, machine_mode, reg_class_t)
   return false;
 }
 
+static GTY(()) tree nvptx_previous_fndecl;
+
+static void
+nvptx_set_current_function (tree fndecl)
+{
+  if (!fndecl || fndecl == nvptx_previous_fndecl)
+return;
+
+  nvptx_previous_fndecl = fndecl;
+}
+
 #undef TARGET_OPTION_OVERRIDE
 #define TARGET_OPTION_OVERRIDE nvptx_option_override
 
@@ -6116,6 +6127,9 @@ nvptx_can_change_mode_class (machine_mode, machine_mode, reg_class_t)
 #undef TARGET_HAVE_SPECULATION_SAFE_VALUE
 #define TARGET_HAVE_SPECULATION_SAFE_VALUE speculation_safe_value_not_needed
 
+#undef TARGET_SET_CURRENT_FUNCTION
+#define TARGET_SET_CURRENT_FUNCTION nvptx_set_current_function
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-nvptx.h"


Patch ping (Re: [C++ PATCH] Fix __builtin_{is_constant_evaluated,constant_p} handling in static_assert (PR c++/86524, PR c++/88446))

2018-12-19 Thread Jakub Jelinek
Hi!

On Wed, Dec 12, 2018 at 11:30:37PM +0100, Jakub Jelinek wrote:
> 2018-12-12  Jakub Jelinek  
> 
>   PR c++/86524
>   PR c++/88446
>   * cp-tree.h (fold_non_dependent_expr): Add manifestly_const_eval
>   argument.
>   * constexpr.c (cxx_eval_builtin_function_call): Evaluate
>   __builtin_constant_p if ctx->manifestly_const_eval even in constexpr
>   functions.  For arguments to builtins, if ctx->manifestly_const_eval
>   try to first evaluate arguments with it, but if that doesn't result
>   in a constant expression, retry without it.  Fix comment typo.
>   (fold_non_dependent_expr): Add manifestly_const_eval argument, pass
>   it through to cxx_eval_outermost_constant_expr and
>   maybe_constant_value.
>   * semantics.c (finish_static_assert): Call fold_non_dependent_expr
>   with true as manifestly_const_eval.
> 
>   * g++.dg/cpp1y/constexpr-86524.C: New test.
>   * g++.dg/cpp2a/is-constant-evaluated4.C: New test.
>   * g++.dg/cpp2a/is-constant-evaluated5.C: New test.
>   * g++.dg/cpp2a/is-constant-evaluated6.C: New test.

I'd like to ping this patch.

Thanks.

Jakub


Re: [PATCH] [aarch64] Revert support for ARMv8.2 in tsv110

2018-12-19 Thread Richard Earnshaw (lists)
On 19/12/2018 03:11, Shaokun Zhang wrote:
> For HiSilicon's tsv110 cpu core, it supports some v8_4A features, but
> some mandatory features are not implemented. Revert to ARMv8.2 that
> all mandatory features are supported.
> 

Thanks, I've put this in.

I've modified the ChangeLog entry slightly - we normally use 'revert' in
the specific sense of completely removing an existing patch.

Also, when sending patches, please do not send ChangeLog entries as part
of the patch file.  Because the file is always updated at the head, the
patch hunk is rarely going to apply cleanly.  Instead, include the
ChangeLog text as part of your email description; that way we can then
paste it directly into the ChangeLog file itself and simply correct the
date.

R.

> ---
>  gcc/ChangeLog| 5 +
>  gcc/config/aarch64/aarch64-cores.def | 6 +++---
>  2 files changed, 8 insertions(+), 3 deletions(-)
> 
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index e9f5baa6557c..842876b0ae90 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,3 +1,8 @@
> +2018-12-19 Shaokun Zhang  
> +
> +* config/aarch64/aarch64-cores.def (tsv110) : Revert support for ARMv8.2
> + in tsv110.
> +
>  2018-12-18  Vladimir Makarov  
>  
>   PR rtl-optimization/87759
> diff --git a/gcc/config/aarch64/aarch64-cores.def 
> b/gcc/config/aarch64/aarch64-cores.def
> index 74be5dbf2595..20f4924e084d 100644
> --- a/gcc/config/aarch64/aarch64-cores.def
> +++ b/gcc/config/aarch64/aarch64-cores.def
> @@ -96,10 +96,10 @@ AARCH64_CORE("cortex-a75",  cortexa75, cortexa57, 8_2A,  
> AARCH64_FL_FOR_ARCH8_2
>  AARCH64_CORE("cortex-a76",  cortexa76, cortexa57, 8_2A,  
> AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_F16 | AARCH64_FL_RCPC | 
> AARCH64_FL_DOTPROD, cortexa72, 0x41, 0xd0b, -1)
>  AARCH64_CORE("ares",  ares, cortexa57, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | 
> AARCH64_FL_F16 | AARCH64_FL_RCPC | AARCH64_FL_DOTPROD | AARCH64_FL_PROFILE, 
> cortexa72, 0x41, 0xd0c, -1)
>  
> -/* ARMv8.4-A Architecture Processors.  */
> -
>  /* HiSilicon ('H') cores. */
> -AARCH64_CORE("tsv110", tsv110,cortexa57,8_4A, 
> AARCH64_FL_FOR_ARCH8_4 | AARCH64_FL_CRYPTO | AARCH64_FL_F16 | AARCH64_FL_AES 
> | AARCH64_FL_SHA2, tsv110,   0x48, 0xd01, -1)
> +AARCH64_CORE("tsv110",  tsv110, cortexa57, 8_2A,  AARCH64_FL_FOR_ARCH8_2 | 
> AARCH64_FL_CRYPTO | AARCH64_FL_F16 | AARCH64_FL_AES | AARCH64_FL_SHA2, 
> tsv110,   0x48, 0xd01, -1)
> +
> +/* ARMv8.4-A Architecture Processors.  */
>  
>  /* Qualcomm ('Q') cores. */
>  AARCH64_CORE("saphira", saphira,saphira,8_4A,  
> AARCH64_FL_FOR_ARCH8_4 | AARCH64_FL_CRYPTO | AARCH64_FL_RCPC, saphira,   
> 0x51, 0xC01, -1)
> 



[SVE ACLE] Various fixes and cleanups

2018-12-19 Thread Richard Sandiford
I've applied the following three patches to aarch64/sve-acle-branch.
The first just fixes some bugs I noticed while testing the current branch.
The other two try to tidy up the instruction generation code so that we
aren't passing so many values around, and so that it's easier to separate
"number of operands" from "how to get an icode".

Thanks,
Richard

[SVE ACLE] Some fixes

- Fix the SEL assembly syntax (it doesn't take a predication suffix)
- Fix the operand numbering in mul3
- Avoid using general_operand for things that don't accept memory


diff --git a/gcc/config/aarch64/aarch64-sve.md b/gcc/config/aarch64/aarch64-sve.md
index 2176be8cf9a..65eddc261d8 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -1589,7 +1589,7 @@
 	  UNSPEC_SEL))]
   "TARGET_SVE"
   "@
-   sel\t%0., %3/m, %1., %2.
+   sel\t%0., %3, %1., %2.
mov\t%0., %3/m, #%1
movprfx\t%0., %3/z, %0.\;mov\t%0., %3/m, %1.
mov\t%0., %3/z, #%1
@@ -1601,12 +1601,12 @@
   [(set (match_operand:SVE_F 0 "register_operand" "=w, w, ?&w, ?&w")
 	(unspec:SVE_F
 	  [(match_operand: 3 "register_operand" "Upa, Upl, Upl, Upl")
-	   (match_operand:SVE_F 1 "general_operand" "w, Dn, w, Dn")
+	   (match_operand:SVE_F 1 "aarch64_nonmemory_operand" "w, Dn, w, Dn")
 	   (match_operand:SVE_F 2 "aarch64_simd_reg_or_zero" "w, 0, Dz, Dz")]
 	  UNSPEC_SEL))]
   "TARGET_SVE"
   "@
-   sel\t%0., %3/m, %1., %2.
+   sel\t%0., %3, %1., %2.
* return aarch64_output_sve_mov_immediate (operands[1], 3, true);
movprfx\t%0., %3/z, %0.\;mov\t%0., %3/m, %1
* return aarch64_output_sve_mov_immediate (operands[1], 3, false);"
@@ -2553,8 +2553,8 @@
 	(unspec:SVE_F
 	  [(match_dup 3)
 	   (const_int SVE_ALLOW_NEW_FAULTS)
-	   (match_operand:SVE_F 2 "register_operand")
-	   (match_operand:SVE_F 3 "aarch64_sve_float_mul_operand")]
+	   (match_operand:SVE_F 1 "register_operand")
+	   (match_operand:SVE_F 2 "aarch64_sve_float_mul_operand")]
 	  UNSPEC_COND_MUL))]
   "TARGET_SVE"
   {
diff --git a/gcc/testsuite/gcc.target/aarch64/sve-acle/asm/dup_s16.c b/gcc/testsuite/gcc.target/aarch64/sve-acle/asm/dup_s16.c
index 350af45b4ee..6a5af81ed3a 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve-acle/asm/dup_s16.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve-acle/asm/dup_s16.c
@@ -339,7 +339,7 @@ TEST_UNIFORM_Z (dup_127_s16_m, svint16_t,
 /*
 ** dup_128_s16_m:
 **	mov	(z[0-9]+\.h), #128
-**	sel	z0\.h, p0/m, \1, z0\.h
+**	sel	z0\.h, p0, \1, z0\.h
 **	ret
 */
 TEST_UNIFORM_Z (dup_128_s16_m, svint16_t,
@@ -359,7 +359,7 @@ TEST_UNIFORM_Z (dup_253_s16_m, svint16_t,
 /*
 ** dup_254_s16_m:
 **	mov	(z[0-9]+\.h), #254
-**	sel	z0\.h, p0/m, \1, z0\.h
+**	sel	z0\.h, p0, \1, z0\.h
 **	ret
 */
 TEST_UNIFORM_Z (dup_254_s16_m, svint16_t,
@@ -369,7 +369,7 @@ TEST_UNIFORM_Z (dup_254_s16_m, svint16_t,
 /*
 ** dup_255_s16_m:
 **	mov	(z[0-9]+\.h), #255
-**	sel	z0\.h, p0/m, \1, z0\.h
+**	sel	z0\.h, p0, \1, z0\.h
 **	ret
 */
 TEST_UNIFORM_Z (dup_255_s16_m, svint16_t,
@@ -388,7 +388,7 @@ TEST_UNIFORM_Z (dup_256_s16_m, svint16_t,
 /*
 ** dup_257_s16_m:
 **	mov	(z[0-9]+)\.b, #1
-**	sel	z0\.h, p0/m, \1\.h, z0\.h
+**	sel	z0\.h, p0, \1\.h, z0\.h
 **	ret
 */
 TEST_UNIFORM_Z (dup_257_s16_m, svint16_t,
@@ -426,7 +426,7 @@ TEST_UNIFORM_Z (dup_7ffd_s16_m, svint16_t,
 /*
 ** dup_7ffe_s16_m:
 **	mov	(z[0-9]+\.h), #32766
-**	sel	z0\.h, p0/m, \1, z0\.h
+**	sel	z0\.h, p0, \1, z0\.h
 **	ret
 */
 TEST_UNIFORM_Z (dup_7ffe_s16_m, svint16_t,
@@ -436,7 +436,7 @@ TEST_UNIFORM_Z (dup_7ffe_s16_m, svint16_t,
 /*
 ** dup_7fff_s16_m:
 **	mov	(z[0-9]+\.h), #32767
-**	sel	z0\.h, p0/m, \1, z0\.h
+**	sel	z0\.h, p0, \1, z0\.h
 **	ret
 */
 TEST_UNIFORM_Z (dup_7fff_s16_m, svint16_t,
@@ -464,7 +464,7 @@ TEST_UNIFORM_Z (dup_m128_s16_m, svint16_t,
 /*
 ** dup_m129_s16_m:
 **	mov	(z[0-9]+\.h), #-129
-**	sel	z0\.h, p0/m, \1, z0\.h
+**	sel	z0\.h, p0, \1, z0\.h
 **	ret
 */
 TEST_UNIFORM_Z (dup_m129_s16_m, svint16_t,
@@ -484,7 +484,7 @@ TEST_UNIFORM_Z (dup_m254_s16_m, svint16_t,
 /*
 ** dup_m255_s16_m:
 **	mov	(z[0-9]+\.h), #-255
-**	sel	z0\.h, p0/m, \1, z0\.h
+**	sel	z0\.h, p0, \1, z0\.h
 **	ret
 */
 TEST_UNIFORM_Z (dup_m255_s16_m, svint16_t,
@@ -503,7 +503,7 @@ TEST_UNIFORM_Z (dup_m256_s16_m, svint16_t,
 /*
 ** dup_m257_s16_m:
 **	mov	(z[0-9]+\.h), #-257
-**	sel	z0\.h, p0/m, \1, z0\.h
+**	sel	z0\.h, p0, \1, z0\.h
 **	ret
 */
 TEST_UNIFORM_Z (dup_m257_s16_m, svint16_t,
@@ -513,7 +513,7 @@ TEST_UNIFORM_Z (dup_m257_s16_m, svint16_t,
 /*
 ** dup_m258_s16_m:
 **	mov	(z[0-9]+)\.b, #-2
-**	sel	z0\.h, p0/m, \1\.h, z0\.h
+**	sel	z0\.h, p0, \1\.h, z0\.h
 **	ret
 */
 TEST_UNIFORM_Z (dup_m258_s16_m, svint16_t,
@@ -546,7 +546,7 @@ TEST_UNIFORM_Z (dup_m7f00_s16_m, svint16_t,
 /*
 ** dup_m7f01_s16_m:
 **	mov	(z[0-9]+\.h), #-32513
-**	sel	z0\.h, p0/m, \1, z0\.h
+**	sel	z0\.h, p0, \1, z0\.h
 **	ret
 */
 TEST_UNIFORM_Z (dup_m7f01_s16_m, svint16_t,
@@ -566,7 +566,7 @@ TEST_UNIFORM_Z (dup_m7ffe_s16_m, svint16_t,
 /*
 ** dup_m7fff_s16_m:
 **	mov	(z[0-9]+\.h), #-32767
-**	sel	z0\.h, p0/m, \1, z0\.h
+**	sel	z0\.h, p0, \1, z0\

Re: [PATCH AutoFDO]Restoring indirect call value profile transformation

2018-12-19 Thread Richard Biener
On Wed, Dec 19, 2018 at 5:08 AM Bin.Cheng  wrote:
>
> On Wed, Dec 19, 2018 at 12:00 PM Andi Kleen  wrote:
> >
> > On Wed, Dec 19, 2018 at 10:01:15AM +0800, Bin.Cheng wrote:
> > > On Tue, Dec 18, 2018 at 7:15 PM Bin.Cheng  wrote:
> > > >
> > > > On Sun, Dec 16, 2018 at 9:11 AM Andi Kleen  wrote:
> > > > >
> > > > > "bin.cheng"  writes:
> > > > >
> > > > > > Hi,
> > > > > >
> > > > > > Due to ICE and mal-functional bugs, indirect call value profile 
> > > > > > transformation
> > > > > > is disabled on GCC-7/8/trunk.  This patch restores the 
> > > > > > transformation.  The
> > > > > > main issue is AutoFDO should store cgraph_node's profile_id of 
> > > > > > callee func in
> > > > > > the first histogram value's counter, rather than pointer to 
> > > > > > callee's name string
> > > > > > as it is now.
> > > > > > With the patch, some "Indirect call -> direct call" tests pass with 
> > > > > > autofdo, while
> > > > > > others are unstable.  I think the instability is caused by poor 
> > > > > > perf data collected
> > > > > > during regrets run, and can confirm these tests pass if good perf 
> > > > > > data could be
> > > > > > collected in manual experiments.
> > > > >
> > > > > Would be good to make the tests stable, otherwise we'll just have
> > > > > regressions in the future again.
> > > > >
> > > > > The problem is that the tests don't run long enough and don't get 
> > > > > enough samples?
> > > > Yes, take g++.dg/tree-prof/morefunc.C as an example:
> > > > -  int i;
> > > > -  for (i = 0; i < 1000; i++)
> > > > +  int i, j;
> > > > +  for (i = 0; i < 100; i++)
> > > > +for (j = 0; j < 50; j++)
> > > >   g += tc->foo();
> > > > if (g<100) g++;
> > > >  }
> > > > @@ -27,8 +28,9 @@ void test1 (A *tc)
> > > >  static __attribute__((always_inline))
> > > >  void test2 (B *tc)
> > > >  {
> > > > -  int i;
> > > > +  int i, j;
> > > >for (i = 0; i < 100; i++)
> > > > +for (j = 0; j < 50; j++)
> > > >
> > > > I have to increase loop count like this to get stable pass on my
> > > > machine.  The original count (1000) is too small to be sampled.
> > > >
> > > > >
> > > > > Could add some loop?
> > > > > Or possibly increase the sampling frequency in perf (-F or -c)?
> > > > Maybe, I will have a try.
> > > Turned out all "Indirect call" test can be resolved by adding -c 100
> > > to perf command line:
> > > diff --git a/gcc/config/i386/gcc-auto-profile 
> > > b/gcc/config/i386/gcc-auto-profile
> > > ...
> > > -exec perf record -e $E -b "$@"
> > > +exec perf record -e $E -c 100 -b "$@"
> > >
> > > Is 100 too small here?  Or is it fine for all scenarios?
> >
> > -c 100 is risky because it can cause perf throttling, which
> > makes it lose data.
> Right, it looks suspicious to me too.
>
> >
> > perf has a limiter that if the PMU handler uses too much CPU
> > time it stops measuring for some time. A PMI is 10k+ cycles,
> > so doing one every 100 branches is a lot of CPU time.
> >
> > I wouldn't go down that low. It is better to increase the
> > iteration count.
> We can combine the two together, increasing iteration count and
> decreasing perf count at the same time.  What count would you suggest
> from your experience?

Can we instead for the tests where we want to test profile use/merge
elide the profiling step and supply the "raw" data in an testsuite alternate
file instead?

Richard.

> Thanks,
> bin
> >
> > -Andi


[Patch, Vectorizer, SVE] fmin/fmax builtin reduction support

2018-12-19 Thread Alejandro Martinez Vicente
Hi all,
 
Loops that use the fmin/fmax builtins can be vectorized even without
-ffast-math using SVE's FMINNM/FMAXNM instructions. This is an example:
 
double
f (double *x, int n)
{
  double res = 100.0;
  for (int i = 0; i < n; ++i)
res = __builtin_fmin (res, x[i]);
  return res;
}

Before this patch, the compiler would generate this code (-march=armv8.2-a+sve
-O2 -ftree-vectorize):

 :
   0:   713f        cmp     w1, #0x0
   4:   5400018d    b.le    34
   8:   51000422    sub     w2, w1, #0x1
   c:   91002003    add     x3, x0, #0x8
  10:   d2e80b21    mov     x1, #0x4059
  14:   9e670020    fmov    d0, x1
  18:   8b224c62    add     x2, x3, w2, uxtw #3
  1c:   d503201f    nop
  20:   fc408401    ldr     d1, [x0], #8
  24:   1e617800    fminnm  d0, d0, d1
  28:   eb02001f    cmp     x0, x2
  2c:   54a1        b.ne    20
  30:   d65f03c0    ret
  34:   d2e80b20    mov     x0, #0x4059
  38:   9e67        fmov    d0, x0
  3c:   d65f03c0    ret

After this patch, this is the code that gets generated:

 :
   0:   713f        cmp      w1, #0x0
   4:   5400020d    b.le     44
   8:   d282        mov      x2, #0x0
   c:   25d8e3e0    ptrue    p0.d
  10:   93407c21    sxtw     x1, w1
  14:   9003        adrp     x3, 0
  18:   25804001    mov      p1.b, p0.b
  1c:   9163        add      x3, x3, #0x0
  20:   85c0e060    ld1rd    {z0.d}, p0/z, [x3]
  24:   25e11fe0    whilelo  p0.d, xzr, x1
  28:   a5e24001    ld1d     {z1.d}, p0/z, [x0, x2, lsl #3]
  2c:   04f0e3e2    incd     x2
  30:   65c58020    fminnm   z0.d, p0/m, z0.d, z1.d
  34:   25e11c40    whilelo  p0.d, x2, x1
  38:   5481        b.ne     28   // b.any
  3c:   65c52400    fminnmv  d0, p1, z0.d
  40:   d65f03c0    ret
  44:   d2e80b20    mov      x0, #0x4059
  48:   9e67        fmov     d0, x0
  4c:   d65f03c0    ret

This patch extends the support for reductions to include calls to internal
functions, in addition to assign statements. For this purpose, in most places
where a tree_code would be used, a code_helper is used instead. The code_helper
allows to hold either a tree_code or combined_fn.

This patch implements these tasks:

- Detect a reduction candidate based on a call to an internal function
  (currently only fmin or fmax).
- Process the reduction using code_helper. This means that at several places
  we have to check whether this is as assign-based reduction or a call-based
  reduction.
- Add new internal functions for the fmin/fmax reductions and for conditional
  fmin/fmax. In architectures where ieee fmin/fmax reductions are available, it
  is still possible to vectorize the loop using unconditional instructions.
- Update SVE's md to support these new reductions.
- Add new SVE tests to check that the optimal code is being generated.

I tested this patch in an aarch64 machine bootstrapping the compiler and
running the checks.
 
Alejandro
 
gcc/Changelog:
 
2018-12-18  Alejandro Martinez  

* gimple-match.h (code_helper_for_stmnt): New function to get a
code_helper from an statement.
* internal-fn.def: New reduc_fmax_scal and reduc_fmin_scal optabs for
ieee fp max/min reductions
* optabs.def: Likewise.
* tree-vect-loop.c (reduction_fn_for_scalar_code): Changed function
signature to accept code_helper instead of tree_code. Handle the
fmax/fmin builtins.
(needs_fold_left_reduction_p): Likewise.
(check_reduction_path): Likewise.
(vect_is_simple_reduction): Use code_helper instead of tree_code. Check
for supported call-based reductions. Extend support for both
assignment-based and call-based reductions.
(vect_model_reduction_cost): Extend cost-model support to call-based
reductions (just use MAX expression).
(get_initial_def_for_reduction): Use code_helper instead of tree_code.
Extend support for both assignment-based and call-based reductions.
(vect_create_epilog_for_reduction): Likewise.
(vectorizable_reduction): Likewise.
* tree-vectorizer.h: include gimple-match.h for code_helper. Use
code_helper in check_reduction_path signature.
* config/aarch64/aarch64-sve.md: Added define_expand to capture new
reduc_fmax_scal and reduc_fmin_scal optabs.
* config/aarch64/iterators.md: New FMAXMINNMV and fmaxmin_uns iterators
to support the new define_expand.
 
gcc/testsuite/Changelog:
 
2018-12-18  Alejandro Martinez  

* gcc.target/aarch64/sve/reduc_9.c: New test to check
SVE-vectorized reductions without -ffast-math.
* gcc.target/aarch64/sve/reduc_10.c: New test to check
SVE-vectorized builtin reductions without -ffast-math.


final.patch
Description: final.patch


Re: [PATCH] [RFC] PR target/52813 and target/11807

2018-12-19 Thread Segher Boessenkool
On Wed, Dec 19, 2018 at 08:40:13AM +0200, Dimitar Dimitrov wrote:
> On Mon, Dec 17 2018 20:15:02 EET Bernd Edlinger wrote:
> > out of curiosity I looked at the clobber statement in
> > gdb/nat/linux-ptrace.c:
> > 
> >asm volatile ("pushq %0;"
> >  ".globl linux_ptrace_test_ret_to_nx_instr;"
> >  "linux_ptrace_test_ret_to_nx_instr:"
> >  "ret"
> >  : : "r" ((uint64_t) (uintptr_t) return_address)
> >  : "%rsp", "memory");
> > 
> > it turns out to be a far jump, instruction.
> 
> GDB functionality should not be affected if SP clobber is removed, even if 
> the 
> generated code is slightly different. Please see this comment:
> http://sourceware.org/ml/gdb-patches/2018-12/msg00204.html
> 
> As I understand it, this particular code is never meant to return. It should 
> either stop due to the NX mapping of return_address/%0, or hit the breakpoint 
> placed at return_address/%0.

If it doesn't return it is undefined behaviour, so anything might happen
and that is perfectly alright.

Defining labels is an asm is undefined, too.

Maybe real assembler code is wanted here?  I.e. a .s file.


Segher


Re: [rs6000] Fix x86 SSSE3 compatibility implementations and testcases

2018-12-19 Thread Segher Boessenkool
Hi!

On Tue, Dec 18, 2018 at 10:23:05PM -0600, Paul Clarke wrote:
> This patch is the analog to r266868-r266870, but for SSSE3.
> The SSSE3 tests had been inadvertently made to PASS without actually running
> the test code. Actually running the code turned up some previously undetected
> issues.
> 
> This patch fixes some issues in the implementations, fixes up the tests
> to use a union for the test data, which avoids strict aliasing issues,
> and enables the tests to actually run (by removing a dependency on
> __BUILTIN_CPU_SUPPORTS).
> 
> Also, there's a fairly insignificant change in the testcases that walk
> through the data as pairs of vectors from:
>   [0] and [1]
>   [2] and [3]
>   ...
>   [n-4] and [n-3]
>   [n-2] and [n-1]
> 
> to:
>   [0] and [1]
>   [1] and [2]
>   ...
>   [n-3] and [n-2]
>   [n-2] and [n-1]


> -  for (i = 0; i < 256; i += 4)
> +  for (i = 0; i < ARRAY_SIZE (vals); i ++)

Please write "i++", not "i ++", throughout.

I wonder if the extra overlap will not hide problems?  OTOH it is extra
testing of course.

Okay for trunk.  Thanks!


Segher


Re: [patch] Fix bootstrap powerpc*-*-freebsd* targets

2018-12-19 Thread Segher Boessenkool
On Tue, Dec 18, 2018 at 10:39:27AM +1030, Alan Modra wrote:
> On Mon, Dec 17, 2018 at 11:05:57AM -0600, Segher Boessenkool wrote:
> > Hi!
> > 
> > On Mon, Dec 17, 2018 at 10:40:01AM +1030, Alan Modra wrote:
> > > Since I broke powerpc*-freebsd and the other non-linux powerpc
> > > targets, I guess I ought to fix them.  The following is a variation on
> > > your first patch, that results in -mcall-linux for powerpc-freebsd*
> > > providing the 32-bit powerpc-linux dynamic linker.
> > 
> > That, like the first patch, abuses that header file.  Please do it
> > somewhere sane instead, not in a random subtarget file?
> 
> Is there is a better place, currently?  sysv4.h contains a mess of OS
> related defines already, to support various -mcall options.  If those
> stay in sysv4.h I can't see a better place for the fall-back
> GNU_USER_DYNAMIC_LINKER define.
> 
> Here's the problem:
> powerpc*-*-linux* uses tm_file="rs6000/rs6000.h dbxelf.h elfos.h
> gnu-user.h linux.h freebsd-spec.h rs6000/sysv4.h" plus a few more.
> linux.h contains the proper GNU_USER_DYNAMIC_LINKER define for linux.
> Fairly obviously we can't put a fallback define in rs6000/rs6000.h
> for those targets that don't include linux.h (and including linux.h
> for non-linux targets is probably not a good idea).
> 
> Besides rs6000/sysv4.h, you could put the fallback in rs6000/freebsd.h
> to fix powerpc*-freebsd*, but then you'd need to put it in
> rs6000/netbsd.h, rs6000/eabi.h, rs6000/rtems.h, rs6000/vxworks.h,
> rs6000/lynx.h to fix those targets.  That would be horrible.  And it
> would leave powerpc-elf broken.
> 
> > 
> > >   * config/rs6000/sysv4.h (GNU_USER_DYNAMIC_LINKER): Define.

The patch is okay for now, btw.  Thanks!


Segher


Re: [PATCH] Allow _mm256_clmulepi64_epi128 even for just -mvcplmulqdq -mavx (PR target/88541)

2018-12-19 Thread Uros Bizjak
On Tue, Dec 18, 2018 at 9:53 PM Jakub Jelinek  wrote:
>
> Hi!
>
> As mentioned in the PR, there is a VEX encoded vpclmulqdq instruction
> with ymm arguments that needs VPCLMULQDQ ISA, and then EVEX encoded
> vpclmulqdq with zmm arguments that needs VPCLMULQDQ + AVX512F ISAs and
> vpclmulqdq with xmm or ymm arguments that needs VPCLMULQDQ + AVX512VL ISAs.
>
> So, _mm256_clmulepi64_epi128 can be done just with AVX (so that VEX encoded
> instructions are handled) + VPCLMULQDQ ISAs.
> The corresponding builtin matches this:
> BDESC (OPTION_MASK_ISA_VPCLMULQDQ | OPTION_MASK_ISA_AVX, 
> CODE_FOR_vpclmulqdq_v4di, "__builtin_ia32_vpclmulqdq_v4di", 
> IX86_BUILTIN_VPCLMULQDQ4, UNKNOWN, (int) V4DI_FTYPE_V4DI_V4DI_INT)
>
> Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
> trunk?
>
> 2018-12-18  Jakub Jelinek  
>
> PR target/88541
> * config/i386/vpclmulqdqintrin.h (_mm256_clmulepi64_epi128): Enable
> for -mavx -mvpclmulqdq rather than just for -mavx512vl -mvpclmulqdq.
>
> * gcc.target/i386/avx-vpclmulqdq-1.c: New test.

OK.

Thanks,
Uros.

> --- gcc/config/i386/vpclmulqdqintrin.h.jj   2018-06-13 10:05:54.775128332 
> +0200
> +++ gcc/config/i386/vpclmulqdqintrin.h  2018-12-18 20:09:37.693666571 +0100
> @@ -53,9 +53,9 @@ _mm512_clmulepi64_epi128 (__m512i __A, _
>  #pragma GCC pop_options
>  #endif /* __DISABLE_VPCLMULQDQF__ */
>
> -#if !defined(__VPCLMULQDQ__) || !defined(__AVX512VL__)
> +#if !defined(__VPCLMULQDQ__) || !defined(__AVX__)
>  #pragma GCC push_options
> -#pragma GCC target("vpclmulqdq,avx512vl")
> +#pragma GCC target("vpclmulqdq,avx")
>  #define __DISABLE_VPCLMULQDQ__
>  #endif /* __VPCLMULQDQ__ */
>
> @@ -78,6 +78,4 @@ _mm256_clmulepi64_epi128 (__m256i __A, _
>  #pragma GCC pop_options
>  #endif /* __DISABLE_VPCLMULQDQ__ */
>
> -
>  #endif /* _VPCLMULQDQINTRIN_H_INCLUDED */
> -
> --- gcc/testsuite/gcc.target/i386/avx-vpclmulqdq-1.c.jj 2018-12-18 
> 20:13:28.683960294 +0100
> +++ gcc/testsuite/gcc.target/i386/avx-vpclmulqdq-1.c2018-12-18 
> 20:12:41.140723131 +0100
> @@ -0,0 +1,10 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mavx -mvpclmulqdq" } */
> +
> +#include 
> +
> +__m256i
> +foo (__m256i x, __m256i y)
> +{
> +  return _mm256_clmulepi64_epi128 (x, y, 0);
> +}
>
> Jakub


Re: [PATCH] [PR87012] canonicalize ref type for tmpl arg

2018-12-19 Thread Alexandre Oliva
On Dec 14, 2018, Jason Merrill  wrote:

> Yes, like that, thanks.  It might be a bit of an optimization to skip
> this when t == TREE_TYPE (parm).  OK either way.

Thanks, I've put the suggested optimization in.

Here's what I'm about to install.


[PR87012] canonicalize ref type for tmpl arg

When binding an object to a template parameter of reference type, we
take the address of the object and dereference that address.  The type
of the address may still carry (template) typedefs, but
verify_unstripped_args_1 rejects such typedefs other than in the top
level of template arguments.

Canonicalizing the type we want to convert to right after any
substitutions or deductions avoids that issue.


for  gcc/cp/ChangeLog

PR c++/87012
* pt.c (convert_template_argument): Canonicalize type after
tsubst/deduce.

for  gcc/testsuite/ChangeLog

PR c++/87012
* g++.dg/cpp0x/pr87012.C: New.
---
 gcc/cp/pt.c  |3 +++
 gcc/testsuite/g++.dg/cpp0x/pr87012.C |   11 +++
 2 files changed, 14 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/pr87012.C

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 79eef12112fb..e99de71ea9e2 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -8019,6 +8019,9 @@ convert_template_argument (tree parm,
   if (invalid_nontype_parm_type_p (t, complain))
return error_mark_node;
 
+  if (t != TREE_TYPE (parm))
+   t = canonicalize_type_argument (t, complain);
+
   if (!type_dependent_expression_p (orig_arg)
  && !uses_template_parms (t))
/* We used to call digest_init here.  However, digest_init
diff --git a/gcc/testsuite/g++.dg/cpp0x/pr87012.C 
b/gcc/testsuite/g++.dg/cpp0x/pr87012.C
new file mode 100644
index ..fd3eea47c390
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/pr87012.C
@@ -0,0 +1,11 @@
+// { dg-do compile { target c++11 } }
+
+template
+using ref = T&;
+
+int x;
+
+template class T, T>
+struct X { };
+
+struct Y : X { };


-- 
Alexandre Oliva, freedom fighter   https://FSFLA.org/blogs/lxo
Be the change, be Free! FSF Latin America board member
GNU Toolchain EngineerFree Software Evangelist
Hay que enGNUrecerse, pero sin perder la terGNUra jamás-GNUChe