Re: [PATCH] AVX512FP16: Support vector shuffle builtins

2021-10-14 Thread Hongtao Liu via Gcc-patches
On Fri, Oct 15, 2021 at 1:37 PM Hongyu Wang  wrote:
>
> > This part seems not related to vector shuffle.
> Yes, I have separated this part into another patch and checked it in.
>
> Updated patch. Ok for this one?
>
> Hongtao Liu via Gcc-patches wrote on Thu, Oct 14, 2021 at 2:33 PM:
> >
> > On Thu, Oct 14, 2021 at 10:39 AM Hongyu Wang via Gcc-patches
> >  wrote:
> > >
> > > Hi,
> > >
> > > This patch supports HFmode vector shuffle by creating HImode subreg when
> > > expanding permutation expr.
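> > > For illustration, a shuffle like the following (a hypothetical example,
> > > not the actual testcase) now expands via the HImode path:
> > >
> > > typedef _Float16 v8hf __attribute__ ((vector_size (16)));
> > > typedef short v8hi __attribute__ ((vector_size (16)));
> > >
> > > v8hf
> > > reverse (v8hf a)
> > > {
> > >   /* The permutation only moves bits around, so punning the operands
> > >      to 16-bit integer elements gives the same result.  */
> > >   return __builtin_shuffle (a, (v8hi) { 7, 6, 5, 4, 3, 2, 1, 0 });
> > > }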
> > >
> > > Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,} and sde{-m32,}
> > > OK for master?
> > >
> > > gcc/ChangeLog:
> > >
> > > * config/i386/i386-expand.c (ix86_expand_vec_perm): Convert
> > > HFmode input operand to HImode.
> > > (ix86_vectorize_vec_perm_const): Likewise.
> > > (ix86_expand_vector_init): Allow HFmode for one_operand_shuffle.
> > > * config/i386/sse.md (*avx512bw_permvar_truncv16siv16hi_1_hf):
> > > New define_insn.
> > > (*avx512f_permvar_truncv8siv8hi_1_hf):
> > > Likewise.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > * gcc.target/i386/avx512fp16-builtin_shuffle-1.c: New test.
> > > * gcc.target/i386/avx512fp16-pr101846.c: Ditto.
> > > * gcc.target/i386/avx512fp16-pr94680.c: Ditto.
> > > ---
> > >  gcc/config/i386/i386-expand.c | 29 ++-
> > >  gcc/config/i386/sse.md| 54 +++-
> > >  .../i386/avx512fp16-builtin_shuffle-1.c   | 86 +++
> > >  .../gcc.target/i386/avx512fp16-pr101846.c | 56 
> > >  .../gcc.target/i386/avx512fp16-pr94680.c  | 61 +
> > >  5 files changed, 284 insertions(+), 2 deletions(-)
> > >  create mode 100644 
> > > gcc/testsuite/gcc.target/i386/avx512fp16-builtin_shuffle-1.c
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-pr101846.c
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-pr94680.c
> > >
> > > diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
> > > index c0924a59efb..0f50ed3b9f8 100644
> > > --- a/gcc/config/i386/i386-expand.c
> > > +++ b/gcc/config/i386/i386-expand.c
> > > @@ -4836,6 +4836,18 @@ ix86_expand_vec_perm (rtx operands[])
> > >e = GET_MODE_UNIT_SIZE (mode);
> > >gcc_assert (w <= 64);
> > >
> > > +  if (GET_MODE_INNER (mode) == HFmode)
> > > +{
> > > +  machine_mode orig_mode = mode;
> > > +  mode = mode_for_vector (HImode, w).require ();
> > > +  if (target)
> > > +   target = lowpart_subreg (mode, target, orig_mode);
> > > +  if (op0)
> > > +   op0 = lowpart_subreg (mode, op0, orig_mode);
> > > +  if (op1)
> > > +   op1 = lowpart_subreg (mode, op1, orig_mode);
> > > +}
> > > +
ix86_expand_vec_perm is only called from (define_expand "vec_perm"),
which means target, op0 and op1 must exist, so you can drop the
if (target/op0/op1) checks.
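I.e. something like this (an untested sketch of the hunk with the checks
dropped):

  if (GET_MODE_INNER (mode) == HFmode)
    {
      machine_mode orig_mode = mode;
      mode = mode_for_vector (HImode, w).require ();
      target = lowpart_subreg (mode, target, orig_mode);
      op0 = lowpart_subreg (mode, op0, orig_mode);
      op1 = lowpart_subreg (mode, op1, orig_mode);
    }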
> > >if (TARGET_AVX512F && one_operand_shuffle)
> > >  {
> > >rtx (*gen) (rtx, rtx, rtx) = NULL;
> > > @@ -15092,7 +15104,8 @@ ix86_expand_vector_init (bool mmx_ok, rtx target, 
> > > rtx vals)
> > >   rtx ops[2] = { XVECEXP (vals, 0, 0), XVECEXP (vals, 0, 1) };
> > >   if (inner_mode == QImode
> > >   || inner_mode == HImode
> > > - || inner_mode == TImode)
> > > + || inner_mode == TImode
> > > + || inner_mode == HFmode)
> > This part seems not related to vector shuffle.
> > > {
> > >   unsigned int n_bits = n_elts * GET_MODE_SIZE (inner_mode);
> > >   scalar_mode elt_mode = inner_mode == TImode ? DImode : 
> > > SImode;
> > > @@ -21099,6 +21112,20 @@ ix86_vectorize_vec_perm_const (machine_mode 
> > > vmode, rtx target, rtx op0,
> > >unsigned int i, nelt, which;
> > >bool two_args;
> > >
> > > +  /* For HF mode vector, convert it to HI using subreg.  */
> > > +  if (GET_MODE_INNER (vmode) == HFmode)
> > > +{
> > > +  machine_mode orig_mode = vmode;
> > > +  vmode = mode_for_vector (HImode,
> > > +  GET_MODE_NUNITS (vmode)).require ();
> > > +  if (target)
> > > +   target = lowpart_subreg (vmode, target, orig_mode);
> > > +  if (op0)
> > > +   op0 = lowpart_subreg (vmode, op0, orig_mode);
> > > +  if (op1)
> > > +   op1 = lowpart_subreg (vmode, op1, orig_mode);
> > > +}
> > > +
Those checks for NULL seem reasonable according to the documentation;
op0, op1 and target may be NULL.
@deftypefn {Target Hook} bool TARGET_VECTORIZE_VEC_PERM_CONST
(machine_mode @var{mode}, rtx @var{output}, rtx @var{in0}, rtx
@var{in1}, const vec_perm_indices @var{})
This hook is used to test whether the target can permute up to two
vectors of mode @var{mode} using the permutation vector @code{sel}, and
also to emit such a permutation.  In the former case @var{in0}, @var{in1}
and @var{out} are all null.  In the latter case @var{in0} and @var{in1} are
the source vectors and @var{out} is the 

Re: [PATCH] Fix loop split incorrect count and probability

2021-10-14 Thread Xionghu Luo via Gcc-patches


On 2021/9/23 20:17, Richard Biener wrote:
> On Wed, 22 Sep 2021, Xionghu Luo wrote:
> 
>>
>>
>> On 2021/8/11 17:16, Richard Biener wrote:
>>> On Wed, 11 Aug 2021, Xionghu Luo wrote:
>>>


 On 2021/8/10 22:47, Richard Biener wrote:
> On Mon, 9 Aug 2021, Xionghu Luo wrote:
>
>> Thanks,
>>
>> On 2021/8/6 19:46, Richard Biener wrote:
>>> On Tue, 3 Aug 2021, Xionghu Luo wrote:
>>>
 The loop split condition is moved between loop1 and loop2, so the split bb's
 count and probability should also be duplicated instead of (100% vs INV);
 secondly, the original loop1 and loop2 counts need to be proportional to the
 original loop.


 diff base/loop-cond-split-1.c.151t.lsplit
 patched/loop-cond-split-1.c.151t.lsplit:
 ...
   int prephitmp_16;
   int prephitmp_25;

[local count: 118111600]:
   if (n_7(D) > 0)
 goto ; [89.00%]
   else
 goto ; [11.00%]

[local count: 118111600]:
   return;

[local count: 105119324]:
   pretmp_3 = ga;

 -   [local count: 955630225]:
 +   [local count: 315357973]:
   # i_13 = PHI 
   # prephitmp_12 = PHI 
   if (prephitmp_12 != 0)
 goto ; [33.00%]
   else
 goto ; [67.00%]

 -   [local count: 315357972]:
 +   [local count: 104068130]:
   _2 = do_something ();
   ga = _2;

 -   [local count: 955630225]:
 +   [local count: 315357973]:
   # prephitmp_5 = PHI 
   i_10 = inc (i_13);
   if (n_7(D) > i_10)
 goto ; [89.00%]
   else
 goto ; [11.00%]

[local count: 105119324]:
   goto ; [100.00%]

 -   [local count: 850510901]:
 +   [local count: 280668596]:
   if (prephitmp_12 != 0)
 -goto ; [100.00%]
 +goto ; [33.00%]
   else
 -goto ; [INV]
 +goto ; [67.00%]

 -   [local count: 850510901]:
 +   [local count: 280668596]:
   goto ; [100.00%]

 -   [count: 0]:
 +   [local count: 70429947]:
   # i_23 = PHI 
   # prephitmp_25 = PHI 

 -   [local count: 955630225]:
 +   [local count: 640272252]:
   # i_15 = PHI 
   # prephitmp_16 = PHI 
   i_22 = inc (i_15);
   if (n_7(D) > i_22)
 goto ; [89.00%]
   else
 goto ; [11.00%]

 -   [local count: 850510901]:
 +   [local count: 569842305]:
   goto ; [100.00%]

 }

 gcc/ChangeLog:

  * tree-ssa-loop-split.c (split_loop): Fix incorrect probability.
  (do_split_loop_on_cond): Likewise.
 ---
 gcc/tree-ssa-loop-split.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

 diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
 index 3a09bbc39e5..8e5a7ded0f7 100644
 --- a/gcc/tree-ssa-loop-split.c
 +++ b/gcc/tree-ssa-loop-split.c
 @@ -583,10 +583,10 @@ split_loop (class loop *loop1)
  basic_block cond_bb;
>>
>>  if (!initial_true)
>> -  cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
>> +  cond = fold_build1 (TRUTH_NOT_EXPR, boolean_type_node, cond);
>> +
>> +edge true_edge = EDGE_SUCC (bbs[i], 0)->flags & EDGE_TRUE_VALUE
>> +   ? EDGE_SUCC (bbs[i], 0)
>> +   : EDGE_SUCC (bbs[i], 1);
>>
 
class loop *loop2 = loop_version (loop1, cond, &cond_bb,
 - profile_probability::always (),
 - profile_probability::always (),
 - profile_probability::always (),
 - profile_probability::always (),
 + true_edge->probability,
 + true_edge->probability.invert (),
 + true_edge->probability,
 + true_edge->probability.invert (),
  true);
>>>
>>> there is no 'true_edge' variable at this point.
>>
>> Sorry, I missed the above hunk when splitting the patch.
>>
>>>
  gcc_assert (loop2);
 @@ -1486,10 +1486,10 @@ do_split_loop_on_cond (struct loop 

Re: [PATCH] AVX512FP16: Support vector shuffle builtins

2021-10-14 Thread Hongyu Wang via Gcc-patches
> This part seems not related to vector shuffle.
Yes, I have separated this part into another patch and checked it in.

Updated patch. Ok for this one?

Hongtao Liu via Gcc-patches wrote on Thu, Oct 14, 2021 at 2:33 PM:
>
> On Thu, Oct 14, 2021 at 10:39 AM Hongyu Wang via Gcc-patches
>  wrote:
> >
> > Hi,
> >
> > This patch supports HFmode vector shuffle by creating HImode subreg when
> > expanding permutation expr.
> >
> > Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,} and sde{-m32,}
> > OK for master?
> >
> > gcc/ChangeLog:
> >
> > * config/i386/i386-expand.c (ix86_expand_vec_perm): Convert
> > HFmode input operand to HImode.
> > (ix86_vectorize_vec_perm_const): Likewise.
> > (ix86_expand_vector_init): Allow HFmode for one_operand_shuffle.
> > * config/i386/sse.md (*avx512bw_permvar_truncv16siv16hi_1_hf):
> > New define_insn.
> > (*avx512f_permvar_truncv8siv8hi_1_hf):
> > Likewise.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/i386/avx512fp16-builtin_shuffle-1.c: New test.
> > * gcc.target/i386/avx512fp16-pr101846.c: Ditto.
> > * gcc.target/i386/avx512fp16-pr94680.c: Ditto.
> > ---
> >  gcc/config/i386/i386-expand.c | 29 ++-
> >  gcc/config/i386/sse.md| 54 +++-
> >  .../i386/avx512fp16-builtin_shuffle-1.c   | 86 +++
> >  .../gcc.target/i386/avx512fp16-pr101846.c | 56 
> >  .../gcc.target/i386/avx512fp16-pr94680.c  | 61 +
> >  5 files changed, 284 insertions(+), 2 deletions(-)
> >  create mode 100644 
> > gcc/testsuite/gcc.target/i386/avx512fp16-builtin_shuffle-1.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-pr101846.c
> >  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-pr94680.c
> >
> > diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
> > index c0924a59efb..0f50ed3b9f8 100644
> > --- a/gcc/config/i386/i386-expand.c
> > +++ b/gcc/config/i386/i386-expand.c
> > @@ -4836,6 +4836,18 @@ ix86_expand_vec_perm (rtx operands[])
> >e = GET_MODE_UNIT_SIZE (mode);
> >gcc_assert (w <= 64);
> >
> > +  if (GET_MODE_INNER (mode) == HFmode)
> > +{
> > +  machine_mode orig_mode = mode;
> > +  mode = mode_for_vector (HImode, w).require ();
> > +  if (target)
> > +   target = lowpart_subreg (mode, target, orig_mode);
> > +  if (op0)
> > +   op0 = lowpart_subreg (mode, op0, orig_mode);
> > +  if (op1)
> > +   op1 = lowpart_subreg (mode, op1, orig_mode);
> > +}
> > +
> >if (TARGET_AVX512F && one_operand_shuffle)
> >  {
> >rtx (*gen) (rtx, rtx, rtx) = NULL;
> > @@ -15092,7 +15104,8 @@ ix86_expand_vector_init (bool mmx_ok, rtx target, 
> > rtx vals)
> >   rtx ops[2] = { XVECEXP (vals, 0, 0), XVECEXP (vals, 0, 1) };
> >   if (inner_mode == QImode
> >   || inner_mode == HImode
> > - || inner_mode == TImode)
> > + || inner_mode == TImode
> > + || inner_mode == HFmode)
> This part seems not related to vector shuffle.
> > {
> >   unsigned int n_bits = n_elts * GET_MODE_SIZE (inner_mode);
> >   scalar_mode elt_mode = inner_mode == TImode ? DImode : SImode;
> > @@ -21099,6 +21112,20 @@ ix86_vectorize_vec_perm_const (machine_mode vmode, 
> > rtx target, rtx op0,
> >unsigned int i, nelt, which;
> >bool two_args;
> >
> > +  /* For HF mode vector, convert it to HI using subreg.  */
> > +  if (GET_MODE_INNER (vmode) == HFmode)
> > +{
> > +  machine_mode orig_mode = vmode;
> > +  vmode = mode_for_vector (HImode,
> > +  GET_MODE_NUNITS (vmode)).require ();
> > +  if (target)
> > +   target = lowpart_subreg (vmode, target, orig_mode);
> > +  if (op0)
> > +   op0 = lowpart_subreg (vmode, op0, orig_mode);
> > +  if (op1)
> > +   op1 = lowpart_subreg (vmode, op1, orig_mode);
> > +}
> > +
> >d.target = target;
> >d.op0 = op0;
> >d.op1 = op1;
> > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> > index a3c4a3f1e62..d023d8a1c2e 100644
> > --- a/gcc/config/i386/sse.md
> > +++ b/gcc/config/i386/sse.md
> > @@ -12573,6 +12573,33 @@
> > (truncate:V16HI (match_dup 1)))]
> >"operands[1] = lowpart_subreg (V16SImode, operands[1], V32HImode);")
> >
> > +(define_insn_and_split "*avx512bw_permvar_truncv16siv16hi_1_hf"
> > +  [(set (match_operand:V16HF 0 "nonimmediate_operand")
> > +   (vec_select:V16HF
> > + (subreg:V32HF
> > +   (unspec:V32HI
> > + [(match_operand:V32HI 1 "register_operand")
> > +  (match_operand:V32HI 2 "permvar_truncate_operand")]
> > +UNSPEC_VPERMVAR) 0)
> > + (parallel [(const_int 0) (const_int 1)
> > +(const_int 2) (const_int 3)
> > +(const_int 4) (const_int 5)
> > +(const_int 6) (const_int 7)
> > +

Re: [PATCH] AVX512FP16: Fix ICE for 2 v4hf vector concat

2021-10-14 Thread Hongtao Liu via Gcc-patches
On Fri, Oct 15, 2021 at 1:07 PM Hongyu Wang via Gcc-patches
 wrote:
>
> Hi,
>
> For V4HFmode, doing vector concat like
>
> __builtin_shufflevector (a, b, {0, 1, 2, 3, 4, 5, 6, 7})
>
> could trigger an ICE since it is not handled in ix86_expand_vector_init ().
>
> Handle HFmode like HImode to avoid such ICE.
>
> Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,} and sde{-m32,}
>
> OK for master?
>
Ok.
> gcc/ChangeLog:
>
> * config/i386/i386-expand.c (ix86_expand_vector_init):
> For half_vector concat for HFmode, handle them like HImode.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/i386/avx512fp16-v4hf-concat.c: New test.
> ---
>  gcc/config/i386/i386-expand.c|  3 ++-
>  .../gcc.target/i386/avx512fp16-v4hf-concat.c | 16 
>  2 files changed, 18 insertions(+), 1 deletion(-)
>  create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-v4hf-concat.c
>
> diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
> index 95274201f4f..1b011047251 100644
> --- a/gcc/config/i386/i386-expand.c
> +++ b/gcc/config/i386/i386-expand.c
> @@ -15122,7 +15122,8 @@ ix86_expand_vector_init (bool mmx_ok, rtx target, rtx 
> vals)
>   rtx ops[2] = { XVECEXP (vals, 0, 0), XVECEXP (vals, 0, 1) };
>   if (inner_mode == QImode
>   || inner_mode == HImode
> - || inner_mode == TImode)
> + || inner_mode == TImode
> + || inner_mode == HFmode)
> {
>   unsigned int n_bits = n_elts * GET_MODE_SIZE (inner_mode);
>   scalar_mode elt_mode = inner_mode == TImode ? DImode : SImode;
> diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-v4hf-concat.c 
> b/gcc/testsuite/gcc.target/i386/avx512fp16-v4hf-concat.c
> new file mode 100644
> index 000..3b8a7f39b85
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/avx512fp16-v4hf-concat.c
> @@ -0,0 +1,16 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mavx512fp16 -O2" } */
> +/* { dg-final { scan-assembler-times "vpunpcklqdq" 1 } } */
> +
> +typedef _Float16 v8hf __attribute__((vector_size (16)));
> +typedef _Float16 v4hf __attribute__((vector_size (8)));
> +
> +v8hf foov (v4hf a, v4hf b)
> +{
> +return __builtin_shufflevector (a, b, 0, 1, 2, 3, 4, 5, 6, 7);
> +}
> +
> +v8hf foov2 (v4hf a)
> +{
> +return __builtin_shufflevector (a, (v4hf){0}, 0, 1, 2, 3, 4, 5, 6, 7);
> +}
> --
> 2.27.1
>


-- 
BR,
Hongtao


Re: [PATCH] libstdc++: Check [ptr,end) and [ptr,ptr+n) ranges with _GLIBCXX_ASSERTIONS

2021-10-14 Thread François Dumont via Gcc-patches

On 14/10/21 7:43 pm, Jonathan Wakely wrote:

On Thu, 14 Oct 2021 at 18:11, François Dumont  wrote:

Hi

  On a related subject I am waiting for some feedback on:

https://gcc.gnu.org/pipermail/libstdc++/2021-August/053005.html

I'm concerned that this adds too much overhead for the
_GLIBCXX_ASSERTIONS case. It adds function calls which are not
necessarily inlined, and which perform arithmetic and comparisons on
the arguments. That has a runtime cost which is non-zero.


I thought that limiting the checks to __valid_range would be fine for 
_GLIBCXX_ASSERTIONS. If you do not want any overhead you just don't 
define it.




The patches I sent in this thread have zero runtime cost, because they
use the compiler built-in which compiles away to nothing if the sizes
aren't known.
I'll try to find out if it can help for the test case on std::copy which 
I was adding with my proposal.



On 11/10/21 6:49 pm, Jonathan Wakely wrote:

This enables lightweight checks for the __glibcxx_requires_valid_range
and __glibcxx_requires_string_len macros  when _GLIBCXX_ASSERTIONS is
defined.  By using __builtin_object_size we can check whether the end of
the range is part of the same object as the start of the range, and
detect problems like in PR 89927.

libstdc++-v3/ChangeLog:

   * include/debug/debug.h (__valid_range_p, __valid_range_n): New
   inline functions using __builtin_object_size to check ranges
   delimited by pointers.
   [_GLIBCXX_ASSERTIONS] (__glibcxx_requires_valid_range): Use
   __valid_range_p.
   [_GLIBCXX_ASSERTIONS] (__glibcxx_requires_string_len): Use
   __valid_range_n.


The first patch allows us to detect bugs like string("foo", "bar"),
like in PR 89927. Debug mode cannot currently detect this. The new
check uses the compiler built-in to detect when the two arguments are
not part of the same object. This assumes we're optimizing and the
compiler knows the values of the pointers. If it doesn't, then the
function just returns true and should inline to nothing.
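For example (illustrative only, this is the PR 89927 pattern rather than a
test from the patch):

  #include <string>

  // "bar" is not the past-the-end pointer of "foo", so [first,last) is not
  // a valid range.
  std::string s ("foo", "bar");

With optimization and _GLIBCXX_ASSERTIONS, __builtin_object_size reports the
object containing "foo" as only 4 bytes, so the new check can see that the
supposed range does not fit in that object and the assertion fires.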

I see, it does not detect that the input pointers are unrelated as such, but
when they are, the computed size ends up >= __sz.

Isn't it UB to compare unrelated pointers?

Yes, and my patch doesn't compare any pointers, does it?


+  __UINTPTR_TYPE__ __f = (__UINTPTR_TYPE__)__first;
+  __UINTPTR_TYPE__ __l = (__UINTPTR_TYPE__)__last;
+  if (const std::size_t __sz = __builtin_object_size(__first, 3))
+    return __f <= __l && (__l - __f) <= __sz;

Isn't it a comparison?

But maybe this is what the previous cast is for, I never understood it.

Note that those casts could be moved within the if branch, though I guess
the compiler does that anyway.




[PATCH] AVX512FP16: Fix ICE for 2 v4hf vector concat

2021-10-14 Thread Hongyu Wang via Gcc-patches
Hi,

For V4HFmode, doing vector concat like

__builtin_shufflevector (a, b, {0, 1, 2, 3, 4, 5, 6, 7})

could trigger an ICE since it is not handled in ix86_expand_vector_init ().

Handle HFmode like HImode to avoid such ICE.

Bootstrapped/regtested on x86_64-pc-linux-gnu{-m32,} and sde{-m32,}

OK for master?

gcc/ChangeLog:

* config/i386/i386-expand.c (ix86_expand_vector_init):
For half_vector concat for HFmode, handle them like HImode.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512fp16-v4hf-concat.c: New test.
---
 gcc/config/i386/i386-expand.c|  3 ++-
 .../gcc.target/i386/avx512fp16-v4hf-concat.c | 16 
 2 files changed, 18 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/avx512fp16-v4hf-concat.c

diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index 95274201f4f..1b011047251 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -15122,7 +15122,8 @@ ix86_expand_vector_init (bool mmx_ok, rtx target, rtx 
vals)
  rtx ops[2] = { XVECEXP (vals, 0, 0), XVECEXP (vals, 0, 1) };
  if (inner_mode == QImode
  || inner_mode == HImode
- || inner_mode == TImode)
+ || inner_mode == TImode
+ || inner_mode == HFmode)
{
  unsigned int n_bits = n_elts * GET_MODE_SIZE (inner_mode);
  scalar_mode elt_mode = inner_mode == TImode ? DImode : SImode;
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-v4hf-concat.c 
b/gcc/testsuite/gcc.target/i386/avx512fp16-v4hf-concat.c
new file mode 100644
index 000..3b8a7f39b85
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-v4hf-concat.c
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-final { scan-assembler-times "vpunpcklqdq" 1 } } */
+
+typedef _Float16 v8hf __attribute__((vector_size (16)));
+typedef _Float16 v4hf __attribute__((vector_size (8)));
+
+v8hf foov (v4hf a, v4hf b)
+{
+return __builtin_shufflevector (a, b, 0, 1, 2, 3, 4, 5, 6, 7);
+}
+
+v8hf foov2 (v4hf a)
+{
+return __builtin_shufflevector (a, (v4hf){0}, 0, 1, 2, 3, 4, 5, 6, 7);
+}
-- 
2.27.1



Re: [PATCH] include/longlong.h: Remove incorrect lvalue to rvalue conversion from asm output constraints

2021-10-14 Thread Fāng-ruì Sòng via Gcc-patches

On 2021-10-12, Jakub Jelinek wrote:

On Tue, Oct 12, 2021 at 09:21:21AM -0700, Fāng-ruì Sòng wrote:

> > An output constraint takes an lvalue. While GCC happily strips the
> > incorrect lvalue to rvalue conversion, Clang rejects the code by default:
> >
> > error: invalid use of a cast in a inline asm context requiring an 
lvalue: remove the cast or build with -fheinous-gnu-extensions
> >
> > The file appears to share the same origin with gmplib longlong.h but
> > they differ much now (gmplib version is much longer).
> >
> > I don't have write access to the git repo.
> > ---
> >  include/longlong.h | 186 ++---
> >  1 file changed, 93 insertions(+), 93 deletions(-)
> >
> > diff --git a/include/longlong.h b/include/longlong.h
> > index c3e92e54ecc..0a21a441d2d 100644
> > --- a/include/longlong.h
> > +++ b/include/longlong.h
> > @@ -194,8 +194,8 @@ extern UDItype __udiv_qrnnd (UDItype *, UDItype, 
UDItype, UDItype);
> >  #if defined (__arc__) && W_TYPE_SIZE == 32
> >  #define add_ssaaaa(sh, sl, ah, al, bh, bl) \
> >__asm__ ("add.f%1, %4, %5\n\tadc   %0, %2, %3" \
> > -: "=r" ((USItype) (sh)), \
> > -  "=&r" ((USItype) (sl)) \
> > +: "=r" (sh), \
> > +  "=&r" (sl) \
> >  : "%r" ((USItype) (ah)), \
> >"rICal" ((USItype) (bh)),  \
> >"%r" ((USItype) (al)), \
>
> This seems to alter the meaning of existing programs if sh and sl do
> not have the expected type.
>
> I think you need to add a compound expression and temporaries of type
> USItype if you want to avoid the cast.
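> For instance, such a rewrite might look like this (a hypothetical sketch,
> not part of the posted patch):
>
> #define add_ssaaaa(sh, sl, ah, al, bh, bl) \
>   do { \
>     USItype __add_sh, __add_sl; \
>     __asm__ ("add.f %1, %4, %5\n\tadc %0, %2, %3" \
>              : "=r" (__add_sh), "=&r" (__add_sl) \
>              : "%r" ((USItype) (ah)), "rICal" ((USItype) (bh)), \
>                "%r" ((USItype) (al)), "rICal" ((USItype) (bl))); \
>     (sh) = __add_sh; \
>     (sl) = __add_sl; \
>   } while (0)
>
> The outputs then go through lvalue temporaries of the expected width,
> while the casts on the inputs remain plain rvalue conversions.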

Adding folks who may comment on the output constraint behavior when an
lvalue-to-rvalue conversion like (`(USItype)`) is added.


Allowing the casts in there is intentional, the comment about this
e.g. in GCC's C FE says:
Really, this should not be here.  Users should be using a
proper lvalue, dammit.  But there's a long history of using casts
in the output operands.  In cases like longlong.h, this becomes a
primitive form of typechecking -- if the cast can be removed, then
the output operand had a type of the proper width; otherwise we'll
get an error.

If you try e.g.:

void
foo (void)
{
 int i;
 long l;
 __asm ("" : "=r" ((unsigned) i));
 __asm ("" : "=r" ((long) l));
 __asm ("" : "=r" ((long long) l));
 __asm ("" : "=r" ((int) l));
 __asm ("" : "=r" ((long) i));
}

then on e.g. x86-64 the first 3 asms are accepted by GCC, the last two
rejected, because the modes are different there.

So the above change throws away important typechecking.  As it is
used in a macro, something different should verify that if the casts are
removed.

Jakub



Thanks for the detailed explanation. I see the type checking effect now.
In many cases the assembler can provide some checks, e.g.

  long foo(long a) {
long r;
__asm__ ("movq %1, %0" : "=r"(r) : "r"(a));
return r;
  }

I came here from trying to build glibc with Clang where I came across errors 
like

In file included from strtof_l.c:44:
./strtod_l.c:1500:26: error: invalid use of a cast in a inline asm context 
requiring an lvalue: remove the cast or build with -fheinous-gnu-extensions
  udiv_qrnnd (quot, n, n, 0, d);
  ~~^~~
./longlong.h:518:24: note: expanded from macro 'udiv_qrnnd'
 "=d" ((UDItype) (r))   \
   ~~~^~
In file included from strtof_l.c:44:
./strtod_l.c:1606:21: error: invalid use of a cast in a inline asm context 
requiring an lvalue: remove the cast or build with -fheinous-gnu-extensions
  add_ssaaaa (n1, n0, r - d0, 0, 0, d0);
  ^

(Seems that building GCC with Clang doesn't run into the issues?)

If the desire to keep the checking is still strong, perhaps I will just stick 
with using the -f option for glibc.


PING^2: [PATCH/RFC 2/2] WPD: Enable whole program devirtualization at LTRANS

2021-10-14 Thread Feng Xue OS via Gcc-patches
Thanks,
Feng


From: Feng Xue OS
Sent: Thursday, September 16, 2021 5:26 PM
To: Jan Hubicka; mjam...@suse.cz; Richard Biener; gcc-patches@gcc.gnu.org
Cc: JiangNing OS
Subject: [PATCH/RFC 2/2] WPD: Enable whole program devirtualization at LTRANS

This patch extends the applicability of full devirtualization to the LTRANS stage.
Normally, the whole-program assumption would not hold when WPA splits the
whole compilation into more than one LTRANS partition. To avoid losing information
for WPD at LTRANS, we record all vtable nodes and related member
function references into each partition.

Bootstrapped/regtested on x86_64-linux and aarch64-linux.

Thanks,
Feng


2021-09-07  Feng Xue  

gcc/
* tree.h (TYPE_CXX_LOCAL): New macro for type using
base.nothrow_flag.
* tree-core.h (tree_base): Update comment on using
base.nothrow_flag to represent TYPE_CXX_LOCAL.
* ipa-devirt.c (odr_type_d::whole_program_local): Removed.
(odr_type_d::whole_program_local_p): Check TYPE_CXX_LOCAL flag
on type, and enable WPD at LTRANS when flag_devirtualize_fully
is true.
(get_odr_type): Remove setting whole_program_local flag on type.
(identify_whole_program_local_types): Replace whole_program_local
in odr_type_d by TYPE_CXX_LOCAL on type.
(maybe_record_node): Enable WPD at LTRANS when
flag_devirtualize_fully is true.
* ipa.c (can_remove_vtable_if_no_refs_p): Retain vtables at LTRANS
stage under full devirtualization.
* lto-cgraph.c (compute_ltrans_boundary): Add all defined vtables
to boundary of each LTRANS partition.
* lto-streamer-out.c (get_symbol_initial_value): Stream out
initial value of vtable even if its class is optimized away.
* lto-lang.c (lto_post_options): Disable full devirtualization
if flag_ltrans_devirtualize is false.
* tree-streamer-in.c (unpack_ts_base_value_fields): Unpack value
of TYPE_CXX_LOCAL for a type from streaming data.
* tree-streamer-out.c (pack_ts_base_value_fields): Pack value
of TYPE_CXX_LOCAL for a type into streaming data.
---
From 624aef44d72799ae488a431b4dce730f4b0fc28e Mon Sep 17 00:00:00 2001
From: Feng Xue 
Date: Mon, 6 Sep 2021 20:34:50 +0800
Subject: [PATCH 2/2] WPD: Enable whole program devirtualization at LTRANS

The whole-program assumption would not hold when WPA splits the whole compilation
into more than one LTRANS partition. To avoid losing information for WPD
at LTRANS, we record all vtable nodes and related member function
references into each partition.

2021-09-07  Feng Xue  

gcc/
	* tree.h (TYPE_CXX_LOCAL): New macro for type using
	base.nothrow_flag.
   	* tree-core.h (tree_base): Update comment on using
	base.nothrow_flag to represent TYPE_CXX_LOCAL.
	* ipa-devirt.c (odr_type_d::whole_program_local): Removed.
(odr_type_d::whole_program_local_p): Check TYPE_CXX_LOCAL flag
	on type, and enable WPD at LTRANS when flag_devirtualize_fully
	is true.
(get_odr_type): Remove setting whole_program_local flag on type.
(identify_whole_program_local_types): Replace whole_program_local
	in odr_type_d by TYPE_CXX_LOCAL on type.
(maybe_record_node): Enable WPD at LTRANS when
	flag_devirtualize_fully	is true.
* ipa.c (can_remove_vtable_if_no_refs_p): Retain vtables at LTRANS
	stage under full devirtualization.
* lto-cgraph.c (compute_ltrans_boundary): Add all defined vtables
	to boundary of each LTRANS partition.
	* lto-streamer-out.c (get_symbol_initial_value): Stream out
	initial value of vtable even if its class is optimized away.
	* lto-streamer-in.c (lto_input_tree): There might be more than
	one decl in dref_queue; register debuginfo for all of them.
	* lto-lang.c (lto_post_options): Disable full devirtualization
	if flag_ltrans_devirtualize is false.
	* tree-streamer-in.c (unpack_ts_base_value_fields): Unpack value
	of TYPE_CXX_LOCAL for a type from streaming data.
	* tree-streamer-out.c (pack_ts_base_value_fields): Pack value
	of TYPE_CXX_LOCAL for a type into streaming data.
---
 gcc/ipa-devirt.c| 33 ++---
 gcc/ipa.c   |  7 ++-
 gcc/lto-cgraph.c| 19 +++
 gcc/lto-streamer-in.c   |  3 +--
 gcc/lto-streamer-out.c  | 12 +++-
 gcc/lto/lto-lang.c  |  6 ++
 gcc/tree-core.h |  3 +++
 gcc/tree-streamer-in.c  | 11 ---
 gcc/tree-streamer-out.c | 16 +---
 gcc/tree.h  |  5 +
 10 files changed, 90 insertions(+), 25 deletions(-)

diff --git a/gcc/ipa-devirt.c b/gcc/ipa-devirt.c
index 284c449c6c1..bb929f016f8 100644
--- a/gcc/ipa-devirt.c
+++ b/gcc/ipa-devirt.c
@@ -216,8 +216,6 @@ struct GTY(()) odr_type_d
   int id;
   /* Is it in anonymous namespace? */
   bool anonymous_namespace;
-  /* Set when type is not used outside of program.  */
-  bool whole_program_local;
   /* Did we 

PING^2: [PATCH/RFC 1/2] WPD: Enable whole program devirtualization

2021-10-14 Thread Feng Xue OS via Gcc-patches
Hi, Honza & Martin,

  Would you please take some time to review the proposal and patches for whole
program devirtualization? We have to say this feature is not 100% safe, but it
provides a way to deploy correct WPD on a C++ program if we carefully
prepare the linked libraries to ensure rtti symbols are contained, which is always
the case for libstdc++ and for well-composed third-party C++ libraries built with
default gcc options. If not, we can get the expected result by rebuilding with
suitable options, and this does not require invasive modification of the source
code, which is an advantage over LLVM's visibility-based scheme.

The gcc-12 development branch is now at a late stage, since we are about to enter
November.  We are not sure whether this is acceptable or not, but if it is, getting
it in before the code freeze would be good timing.

I have also made some minor changes to the patches, and am posting the RFC link here
for your convenience.  (https://gcc.gnu.org/pipermail/gcc/2021-August/237132.html)

Thanks,
Feng


From: Feng Xue OS 
Sent: Saturday, September 18, 2021 5:38 PM
To: Jason Merrill; Jan Hubicka; mjam...@suse.cz; Richard Biener; 
gcc-patches@gcc.gnu.org
Subject: Re: [PATCH/RFC 1/2] WPD: Enable whole program devirtualization

>On 9/16/21 22:29, Feng Xue OS wrote:
>>> On 9/16/21 05:25, Feng Xue OS via Gcc-patches wrote:
 This and the following patches are composed to enable full devirtualization
 under the whole-program assumption (so also called whole-program
 devirtualization, WPD for short), which is an enhancement to the current
 speculative devirtualization. The basis of the optimization is how to
 identify class types that are local in terms of whole-program scope; at
 least those class types in libstdc++ must be excluded in some way.
 Our approach is to use the typeinfo symbol as an identity marker of a class,
 since it is unique and always generated once the class or a derived type
 is instantiated somewhere, and to rely on symbol resolution by the
 lto-linker-plugin to detect whether a typeinfo is referenced by a regular
 object/library, which indirectly tells whether class types have escaped or not.
 The RFC at https://gcc.gnu.org/pipermail/gcc/2021-August/237132.html
 gives more details on that.
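 As a rough illustration (a hypothetical example, not taken from the patches):
 if the typeinfo symbol of B below is never referenced from any regular
 (non-LTO) object or library, WPD may treat B as whole-program local, assume
 every possible derived type is visible to LTO, and, when no override of f is
 seen anywhere, turn the indirect call into a direct call.

   struct B { virtual int f () { return 1; } };

   int call (B *p) { return p->f (); }  /* may become a direct call to B::f */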

 Bootstrapped/regtested on x86_64-linux and aarch64-linux.

 Thanks,
 Feng

 
 2021-09-07  Feng Xue  

 gcc/
* common.opt (-fdevirtualize-fully): New option.
* class.c (build_rtti_vtbl_entries): Force generation of typeinfo
even -fno-rtti is specificied under full devirtualization.
>>>
>>> This makes -fno-rtti useless; rather than this, you should warn about
>>> the combination of flags and force flag_rtti on.  It also sounds like
>>> you depend on the library not being built with -fno-rtti.
>>
>> Although rtti is generated by the front end, we will remove it after the lto symtab
>> merge, which is meant to keep the same behavior as -fno-rtti.
>
> Ah, the cp/ change is OK, then, with a comment about that.
>
>> Yes, a regular library to be linked with should contain rtti data; otherwise
>> WPD cannot deduce class type usage safely. By default, we can assume
>> that this works for libstdc++, but it probably becomes a problem for a
>> user library, which might be avoided if we properly document this
>> requirement and suggest users do that when using WPD.
>
> Yes, I would expect that external libraries would be built with RTTI on
> to allow users to use RTTI features even if they aren't used within the
> library.  But it's good to document it as a requirement.
>
>> +   /* If a class with virtual base is only instantiated as
>> +  subobjects of derived classes, and has no complete object in
>> +  compilation unit, merely construction vtables will be 
>> involved,
>> +  its primary vtable is really not needed, and subject to being
>> +  removed.  So once a vtable node is encountered, for all
>> +  polymorphic base classes of the vtable's context class, always
>> +  force generation of primary vtable nodes when full
>> +  devirtualization is enabled.  */
>
> Why do you need the primary vtable if you're relying on RTTI info?
> Construction vtables will point to the same RTTI node.

At the middle end, the easiest way to get the vtable of a type is via TYPE_BINFO(type),
which is the primary one. And WPD relies on the existence of the varpool_node of the
vtable decl to determine whether the type has been removed (when it is never
instantiated), so we force generation of the vtable node at a very early stage.
Additionally, a construction vtable (C-in-D) belongs to the class (D) of the complete
object, not the class (C) of the subobject actually being constructed, so it is not
easy to correlate a construction vtable with the subobject class (C) after the front
end.

>
>> +   /* Public class w/o key member function (or local class in a public
>> +  inline function) requires COMDAT-like 

[PATCH] AVX512FP16: Fix testcase for complex intrinsic

2021-10-14 Thread Hongyu Wang via Gcc-patches
Hi,

-march=cascadelake, which enables -mavx512vl, produces an unmatched scan
for the vf[c]maddcsh tests, so add -mno-avx512vl to vf[c]maddcsh-1a.c.

Also add scan for vblendmps to vf[c]maddcph tests to check correctness.

Tested on unix{-m32,} with -march=cascadelake.

Pushed to trunk as obvious fix.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512fp16-vfcmaddcph-1a.c: Add scan for
vblendmps.
* gcc.target/i386/avx512fp16-vfmaddcph-1a.c: Likewise.
* gcc.target/i386/avx512fp16vl-vfcmaddcph-1a.c: Likewise.
* gcc.target/i386/avx512fp16vl-vfmaddcph-1a.c: Likewise.
* gcc.target/i386/avx512fp16-vfmaddcsh-1a.c: Add -mno-avx512vl.
* gcc.target/i386/avx512fp16-vfcmaddcsh-1a.c: Likewise.
---
 gcc/testsuite/gcc.target/i386/avx512fp16-vfcmaddcph-1a.c   | 1 +
 gcc/testsuite/gcc.target/i386/avx512fp16-vfcmaddcsh-1a.c   | 2 +-
 gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddcph-1a.c| 1 +
 gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddcsh-1a.c| 2 +-
 gcc/testsuite/gcc.target/i386/avx512fp16vl-vfcmaddcph-1a.c | 2 ++
 gcc/testsuite/gcc.target/i386/avx512fp16vl-vfmaddcph-1a.c  | 2 ++
 6 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfcmaddcph-1a.c 
b/gcc/testsuite/gcc.target/i386/avx512fp16-vfcmaddcph-1a.c
index 6c2c34c1731..cd39b7f99ff 100644
--- a/gcc/testsuite/gcc.target/i386/avx512fp16-vfcmaddcph-1a.c
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfcmaddcph-1a.c
@@ -6,6 +6,7 @@
 /* { dg-final { scan-assembler-times "vfcmaddcph\[ 
\\t\]+\{rn-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[
 \\t\]+#)" 1 } } */
 /* { dg-final { scan-assembler-times "vfcmaddcph\[ 
\\t\]+\{rn-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[
 \\t\]+#)" 2 } } */
 /* { dg-final { scan-assembler-times "vfcmaddcph\[ 
\\t\]+\{rz-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[
 \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vblendmps\[ 
\\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[
 \\t\]+#)" 2 } } */
 
 #include 
 
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfcmaddcsh-1a.c 
b/gcc/testsuite/gcc.target/i386/avx512fp16-vfcmaddcsh-1a.c
index 8ff2092c325..eb96588df39 100644
--- a/gcc/testsuite/gcc.target/i386/avx512fp16-vfcmaddcsh-1a.c
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfcmaddcsh-1a.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-options "-mavx512fp16 -mno-avx512vl -O2" } */
 /* { dg-final { scan-assembler-times "vfcmaddcsh\[ 
\\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ 
\\t\]+#)" 1 } } */
 /* { dg-final { scan-assembler-times "vfcmaddcsh\[ 
\\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[
 \\t\]+#)" 2 } } */
 /* { dg-final { scan-assembler-times "vfcmaddcsh\[ 
\\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[
 \\t\]+#)" 1 } } */
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddcph-1a.c 
b/gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddcph-1a.c
index 4dae5f02dc6..859b215ab17 100644
--- a/gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddcph-1a.c
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddcph-1a.c
@@ -6,6 +6,7 @@
 /* { dg-final { scan-assembler-times "vfmaddcph\[ 
\\t\]+\{rn-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+(?:\n|\[
 \\t\]+#)" 1 } } */
 /* { dg-final { scan-assembler-times "vfmaddcph\[ 
\\t\]+\{rn-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\[^\n\r]*(?:\n|\[
 \\t\]+#)" 2 } } */
 /* { dg-final { scan-assembler-times "vfmaddcph\[ 
\\t\]+\{rz-sae\}\[^\{\n\]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}\{z\}\[^\n\r]*(?:\n|\[
 \\t\]+#)" 1 } } */
+/* { dg-final { scan-assembler-times "vblendmps\[ 
\\t\]+%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\[^\n\r]*%zmm\[0-9\]+\{%k\[0-9\]\}(?:\n|\[
 \\t\]+#)" 2 } } */
 
 #include 
 
diff --git a/gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddcsh-1a.c 
b/gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddcsh-1a.c
index 2ebe1f8ddd7..288d1c12a10 100644
--- a/gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddcsh-1a.c
+++ b/gcc/testsuite/gcc.target/i386/avx512fp16-vfmaddcsh-1a.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-mavx512fp16 -O2" } */
+/* { dg-options "-mavx512fp16 -mno-avx512vl -O2" } */
 /* { dg-final { scan-assembler-times "vfmaddcsh\[ 
\\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ 
\\t\]+#)" 1 } } */
 /* { dg-final { scan-assembler-times "vfmaddcsh\[ 
\\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+\{%k\[0-9\]\}\[^\{\n\r]*(?:\n|\[
 \\t\]+#)" 2 } } */
 /* { dg-final { scan-assembler-times "vfmaddcsh\[ 

[pushed] c++: instantiate less for constant folding

2021-10-14 Thread Jason Merrill via Gcc-patches
I've been experimenting with a change to make all inline functions
implicitly constexpr; this revealed that we are instantiating too
aggressively for speculative constant evaluation, leading to ordering
difficulties with e.g. is_a_helper::test.  This patch tries to
avoid such instantiation until we actually need the function definition to
determine whether a call is constant, by limiting the initial instantiation
of all used functions to manifestly-constant-evaluated expressions, and
checking whether the function arguments are constant before instantiating
the function.

This change resulted in a change in the diagnostics for a few library tests
due to instantiating the function with the static_assert later (during
constant evaluation) than we did before (during instantiation of the
intermediate function).

Tested x86_64-pc-linux-gnu, applying to trunk.

gcc/cp/ChangeLog:

* constexpr.c (cxx_bind_parameters_in_call): Replace
new_call parameter with fun.
(cxx_eval_call_expression): Call it before instantiation.
(cxx_eval_outermost_constant_expr): Only instantiate fns
when manifestly_const_eval.
* typeck2.c (check_narrowing): This context is manifestly
constant-evaluated.

libstdc++-v3/ChangeLog:

* testsuite/20_util/integer_comparisons/greater_equal_neg.cc:
* testsuite/20_util/integer_comparisons/greater_neg.cc:
* testsuite/20_util/integer_comparisons/less_equal_neg.cc:
Adjust expected message.
* testsuite/lib/prune.exp: Prune 'in constexpr expansion'.

gcc/testsuite/ChangeLog:

* g++.dg/ext/vla22.C: Don't expect a narrowing error.
* g++.dg/cpp0x/constexpr-inst1.C: New test.
---
 gcc/cp/constexpr.c| 58 ++-
 gcc/cp/typeck2.c  |  2 +-
 gcc/testsuite/g++.dg/cpp0x/constexpr-inst1.C  | 17 ++
 gcc/testsuite/g++.dg/ext/vla22.C  |  2 +-
 .../integer_comparisons/greater_equal_neg.cc  | 24 
 .../integer_comparisons/greater_neg.cc| 24 
 .../integer_comparisons/less_equal_neg.cc | 24 
 libstdc++-v3/testsuite/lib/prune.exp  |  1 +
 8 files changed, 87 insertions(+), 65 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp0x/constexpr-inst1.C

diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index 08f8514d08d..c5f01b95470 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -1609,24 +1609,24 @@ addr_of_non_const_var (tree *tp, int *walk_subtrees, 
void *data)
all arguments and bind their values to correspondings
parameters, making up the NEW_CALL context.  */
 
-static void
-cxx_bind_parameters_in_call (const constexpr_ctx *ctx, tree t,
- constexpr_call *new_call,
+static tree
+cxx_bind_parameters_in_call (const constexpr_ctx *ctx, tree t, tree fun,
 bool *non_constant_p, bool *overflow_p,
 bool *non_constant_args)
 {
   const int nargs = call_expr_nargs (t);
-  tree fun = new_call->fundef->decl;
-  tree parms = new_call->fundef->parms;
+  tree parms = DECL_ARGUMENTS (fun);
   int i;
   /* We don't record ellipsis args below.  */
   int nparms = list_length (parms);
   int nbinds = nargs < nparms ? nargs : nparms;
-  tree binds = new_call->bindings = make_tree_vec (nbinds);
+  tree binds = make_tree_vec (nbinds);
   for (i = 0; i < nargs; ++i)
 {
   tree x, arg;
   tree type = parms ? TREE_TYPE (parms) : void_type_node;
+  if (parms && DECL_BY_REFERENCE (parms))
+   type = TREE_TYPE (type);
   x = get_nth_callarg (t, i);
   /* For member function, the first argument is a pointer to the implied
  object.  For a constructor, it might still be a dummy object, in
@@ -1647,7 +1647,7 @@ cxx_bind_parameters_in_call (const constexpr_ctx *ctx, 
tree t,
  non_constant_p, overflow_p);
   /* Don't VERIFY_CONSTANT here.  */
   if (*non_constant_p && ctx->quiet)
-   return;
+   break;
   /* Just discard ellipsis args after checking their constantitude.  */
   if (!parms)
continue;
@@ -1698,6 +1698,8 @@ cxx_bind_parameters_in_call (const constexpr_ctx *ctx, 
tree t,
}
   parms = TREE_CHAIN (parms);
 }
+
+  return binds;
 }
 
 /* Variables and functions to manage constexpr call expansion context.
@@ -2564,6 +2566,26 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree 
t,
}
 }
 
+  bool non_constant_args = false;
+  new_call.bindings
+= cxx_bind_parameters_in_call (ctx, t, fun, non_constant_p,
+  overflow_p, &non_constant_args);
+
+  /* We build up the bindings list before we know whether we already have this
+ call cached.  If we don't end up saving these bindings, ggc_free them when
+ this function exits.  */
+  class free_bindings
+  {
+tree *bindings;
+  public:
free_bindings (tree &bindings): bindings (&bindings) { 

Re: [PATCH] Convert strlen pass from evrp to ranger.

2021-10-14 Thread Andrew MacLeod via Gcc-patches

On 10/14/21 6:07 PM, Martin Sebor via Gcc-patches wrote:

On 10/9/21 12:47 PM, Aldy Hernandez via Gcc-patches wrote:

We seem to be passing a lot of context around in the strlen code.  I
certainly don't want to contribute to more.

Most of the handle_* functions are passing the gsi as well as either
ptr_qry or rvals.  That looks a bit messy.  May I suggest putting all
of that in the strlen pass object (well, the dom walker object, but we
can rename it to be less dom centric)?

Something like the attached (untested) patch could be the basis for
further cleanups.

Jakub, would this line of work interest you?


You didn't ask me but since no one spoke up against it let me add
some encouragement: this is exactly what I was envisioning and in
line with other such modernization we have been doing elsewhere.
Could you please submit it for review?

Martin


I'm willing to bet he didn't submit it for review because he doesn't 
have time this release to polish and track it...  (I think the threader 
has been quite consuming).  Rather, it was offered as a starting point 
for someone else who might be interested in continuing to pursue this 
work...  *everyone* is interested in cleanup work others do :-)


Of course, I could be wrong...

Andrew




Re: [r12-4413 Regression] FAIL: gcc.dg/pr102738.c (test for excess errors) on Linux/x86_64

2021-10-14 Thread Andrew MacLeod via Gcc-patches

On 10/14/21 8:19 PM, sunil.k.pandey wrote:

On Linux/x86_64,

f0b7d4cc49ddb1c2c7474cc3f61e260aa93a96c0 is the first bad commit
commit f0b7d4cc49ddb1c2c7474cc3f61e260aa93a96c0
Author: Andrew MacLeod 
Date:   Thu Oct 14 10:43:58 2021 -0400

 Simplification for right shift.

caused

FAIL: gcc.dg/pr102738.c (test for excess errors)

with GCC configured with

../../gcc/configure 
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r12-4413/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gcc.dg/pr102738.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gcc.dg/pr102738.c 
--target_board='unix{-m32\ -march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at skpgkp2 at gmail dot com)

Is this due to lack of 128-bit support?  That fix has just been checked
in and I think it resolves it...


Sorry

Andrew



Re: [COMMITTED] tree-optimization/102738 - Simplification for right shift.

2021-10-14 Thread Andrew MacLeod via Gcc-patches

On 10/14/21 7:42 PM, H.J. Lu wrote:

On Thu, Oct 14, 2021 at 11:06 AM Andrew MacLeod via Gcc-patches
 wrote:

As the PR observes, if the first operand of a right shift is 0 or -1,
operand 2 doesn't matter and the result will be the same as op1, so it
can be turned into a copy.
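For instance (an illustrative example, not the committed testcase):

  int
  f (int x, int n)
  {
    int s = x < 0 ? -1 : 0;   /* EVRP knows s is either 0 or -1.  */
    return s >> n;            /* ... so this can fold to "return s;".  */
  }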

This patch checks for that condition and performs the operation in EVRP.

Bootstrapped on x86_64-pc-linux-gnu with no regressions.  Pushed.

Andrew


/* PR tree-optimization/102738 */
+/* { dg-options "-O2 -fdump-tree-evrp" } */
+
+/* Remove arithmetic shift right when the LHS is known to be 0 or -1.  */
+
+int a1(__int128 f, int g)

Missing

/* { dg-do compile { target int128 } } */



Thanks..   Done.    Feel free to bypass me :-)

Andrew



[r12-4413 Regression] FAIL: gcc.dg/pr102738.c (test for excess errors) on Linux/x86_64

2021-10-14 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

f0b7d4cc49ddb1c2c7474cc3f61e260aa93a96c0 is the first bad commit
commit f0b7d4cc49ddb1c2c7474cc3f61e260aa93a96c0
Author: Andrew MacLeod 
Date:   Thu Oct 14 10:43:58 2021 -0400

Simplification for right shift.

caused

FAIL: gcc.dg/pr102738.c (test for excess errors)

with GCC configured with

../../gcc/configure 
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r12-4413/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gcc.dg/pr102738.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gcc.dg/pr102738.c 
--target_board='unix{-m32\ -march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at skpgkp2 at gmail dot com)


Re: [PATCH] middle-end/102682 - avoid invalid subreg on the LHS

2021-10-14 Thread Jeff Law via Gcc-patches




On 10/11/2021 7:39 AM, Richard Biener wrote:

The following avoids generating

(insn 6 5 7 2 (set (subreg:OI (concatn/v:TI [
 (reg:DI 92 [ buffer ])
 (reg:DI 93 [ buffer+8 ])
 ]) 0)
 (subreg:OI (reg/v:V8SI 85 [ __x ]) 0)) "t.ii":76:21 74 
{*movoi_internal_avx}
  (nil))

via store_bit_field_1 when we try to store excess data into
a register allocated temporary.  The case was supposed to

   /* Use the subreg machinery either to narrow OP0 to the required
  words...

but the check only ensured a register-aligned piece, not a large
enough one.  The following adds the missing check, which ends up
decomposing the set to

(insn 6 5 7 (set (subreg:DI (reg/v:TI 84 [ buffer ]) 0)
 (subreg:DI (reg/v:V8SI 85 [ __x ]) 0)) "t.ii":76:21 -1
  (nil))

(insn 7 6 0 (set (subreg:DI (reg/v:TI 84 [ buffer ]) 8)
 (subreg:DI (reg/v:V8SI 85 [ __x ]) 8)) "t.ii":76:21 -1
  (nil))


Bootstrapped and tested on x86_64-unknown-linux-gnu, OK for trunk?

Thanks,
Richard.

2021-10-11  Richard Biener  

PR middle-end/102682
* expmed.c (store_bit_field_1): Ensure a LHS subreg would
not create a paradoxical subreg.

OK.

Jeff



Re: PING Re: [Patch][doc][PR101843]clarification on building gcc and binutils together

2021-10-14 Thread Jeff Law via Gcc-patches




On 10/14/2021 9:52 AM, John Henning wrote:

Hi Jeff, not sure what you mean by "all", please can you clarify?

Anything related to single tree builds needs to be removed.




On 9/23/21, 7:08 AM, "Gcc-patches on behalf of John Henning via Gcc-patches" 
 wrote:

 Hello Jeff,

 >I would strongly recommend removing all the documentation related to
 >single tree builds.

 Two questions:

 (1) When you say "all", are you suggesting that in-the-gcc-tree builds of 
gmp, mpfr, mpc, and isl should no longer be documented?  Or only in-tree builds of 
binutils?
gmp, mpfr, mpc and isl are requirements for building gcc and are not 
germane to this discussion.


binutils is not a requirement for building gcc and any documentation 
related to single tree builds using binutils, gdb & friends that are not 
strictly needed to build gcc should be removed.




 (2) Is there any truth to the suggestion (found in some Google search results) 
that when building a cross-compiler, it is easier to build binutils in the same 
tree?   For example

 https://gcc.gnu.org/wiki/Building_Cross_Toolchains_with_gcc
 https://www.gnu.org/software/gcc/simtest-howto.html
 https://stackoverflow.com/a/6228588
It used to be easier when Cygnus kept those bits working.  There's 
enough divergence across the projects these days that it's easier to 
build them separately and independently.  It's been that way for, I'd 
guess 15-20 years now.




 It is out of respect for existing user habit that I proposed merely demoting it to an 
"alternative" method (while "recommending" the separate build).

That existing user habit needs to be broken :-)

jeff


Re: [PATCH] Convert strlen pass from evrp to ranger.

2021-10-14 Thread Jeff Law via Gcc-patches




On 10/14/2021 4:07 PM, Martin Sebor via Gcc-patches wrote:

On 10/9/21 12:47 PM, Aldy Hernandez via Gcc-patches wrote:

We seem to be passing a lot of context around in the strlen code.  I
certainly don't want to contribute to more.

Most of the handle_* functions are passing the gsi as well as either
ptr_qry or rvals.  That looks a bit messy.  May I suggest putting all
of that in the strlen pass object (well, the dom walker object, but we
can rename it to be less dom centric)?

Something like the attached (untested) patch could be the basis for
further cleanups.

Jakub, would this line of work interest you?


You didn't ask me but since no one spoke up against it let me add
some encouragement: this is exactly what I was envisioning and in
line with other such modernization we have been doing elsewhere.
Could you please submit it for review?

Agreed.
jeff



Re: [COMMITTED] tree-optimization/102738 - Simplification for right shift.

2021-10-14 Thread H.J. Lu via Gcc-patches
On Thu, Oct 14, 2021 at 11:06 AM Andrew MacLeod via Gcc-patches
 wrote:
>
> As the PR observes, if the first operand of a right shift is 0 or -1,
> operand 2 doesn't matter and the result will be the same as op1, so it
> can be turned into a copy.
>
> This patch checks for that condition and performs the operation in EVRP.
>
> Bootstrapped on x86_64-pc-linux-gnu with no regressions.  Pushed.
>
> Andrew
>

/* PR tree-optimization/102738 */
+/* { dg-options "-O2 -fdump-tree-evrp" } */
+
+/* Remove arithmetic shift right when the LHS is known to be 0 or -1.  */
+
+int a1(__int128 f, int g)

Missing

/* { dg-do compile { target int128 } } */


-- 
H.J.


Re: [RFC][patch][PR102281]Clear padding for variables that are in registers

2021-10-14 Thread Qing Zhao via Gcc-patches


> On Oct 14, 2021, at 12:02 PM, Qing Zhao via Gcc-patches 
>  wrote:
> 
>> 
>> For !is_gimple_reg vars, yes, something like clear_padding_type_has_padding_p
>> could be useful to avoid unnecessary IL growth, but it should be implemented
>> more efficiently,

Another thought on this:  

The current “clear_padding_type_may_have_padding_p” might be enough, no need 
for a new
“clear_padding_type_has_padding_p”. 

The reason is: the computation to decide whether the type has padding is 
unavoidable.  
If we just use “clear_padding_type_may_have_padding_p”, then the computation to
decide whether the type has padding is done by “gimple_fold_builtin_clear_padding”;
the new “clear_padding_type_has_padding_p” would just move this computation from
“gimple_fold_builtin_clear_padding” to this new routine.  No actual compilation
time can be saved.

Qing

> 
> You mean the following is not efficient enough:
> 
> /* Return true if TYPE contains any padding bits.  */
> 
> bool
> clear_padding_type_has_padding_p (tree type)
> {
> bool has_padding = false;
> if (BITS_PER_UNIT == 8
> && CHAR_BIT == 8
> && clear_padding_type_may_have_padding_p (type))
>   {
> HOST_WIDE_INT sz = int_size_in_bytes (type), i;
> gcc_assert (sz > 0);
> unsigned char *buf = XALLOCAVEC (unsigned char, sz);
> memset (buf, ~0, sz);
> clear_type_padding_in_mask (type, buf);
> for (i = 0; i < sz; i++)
> if (buf[i] != (unsigned char) ~0)
>   {
> has_padding = true;
> break;
>   }
>   }
> return has_padding;
> }
> 
> 



Re: [PATCH] Convert strlen pass from evrp to ranger.

2021-10-14 Thread Martin Sebor via Gcc-patches

On 10/9/21 12:47 PM, Aldy Hernandez via Gcc-patches wrote:

We seem to be passing a lot of context around in the strlen code.  I
certainly don't want to contribute to more.

Most of the handle_* functions are passing the gsi as well as either
ptr_qry or rvals.  That looks a bit messy.  May I suggest putting all
of that in the strlen pass object (well, the dom walker object, but we
can rename it to be less dom centric)?

Something like the attached (untested) patch could be the basis for
further cleanups.

Jakub, would this line of work interest you?


You didn't ask me but since no one spoke up against it let me add
some encouragement: this is exactly what I was envisioning and in
line with other such modernization we have been doing elsewhere.
Could you please submit it for review?

Martin



Aldy

On Fri, Oct 8, 2021 at 5:12 PM Aldy Hernandez  wrote:


The following patch converts the strlen pass from evrp to ranger,
leaving DOM as the last remaining user.

No additional cleanups have been done.  For example, the strlen pass
still has uses of VR_ANTI_RANGE, and the sprintf still passes around
pairs of integers instead of using a proper range.  Fixing this
could further improve these passes.

As a further enhancement, if the relevant maintainers deem useful,
the domwalk could be removed from strlen.  That is, unless the pass
needs it for something else.

With ranger we are now able to remove the range calculation from
before_dom_children entirely.  Just working with the ranger on-demand
catches all the strlen and sprintf testcases with the exception of
builtin-sprintf-warn-22.c which is due to a limitation of the sprintf
code.  I have XFAILed the test and documented what the problem is.

It looks like the same problem in the sprintf test triggers a false
positive in gimple-ssa-warn-access.cc so I have added
-Wno-format-overflow until it can be fixed.

I can expand on the false positive if necessary, but the gist is that
this:

 _17 = strlen (_132);
 _18 = strlen (_136);
 _19 = _18 + _17;
 if (_19 > 75)
   goto ; [0.00%]
 else
   goto ; [100.00%]

...dominates the sprintf in BB61.  This means that ranger can figure
out that the _17 and _18 are [0, 75].  On the other hand, evrp
returned a range of [0, 9223372036854775805] which presumably the
sprintf code was ignoring as a false positive here:

   char sizstr[80];
   ...
   ...
   char *s1 = print_generic_expr_to_str (sizrng[1]);
   gcc_checking_assert (strlen (s0) + strlen (s1)
< sizeof sizstr - 4);
   sprintf (sizstr, "[%s, %s]", s0, s1);

The warning triggers with:

gimple-ssa-warn-access.cc: In member function ‘void 
{anonymous}::pass_waccess::maybe_check_access_sizes(rdwr_map*, tree, tree, 
gimple*)’:
gimple-ssa-warn-access.cc:2916:32: warning: ‘%s’ directive writing up to 75 
bytes into a region of size between 2 and 77 [-Wformat-overflow=]
  2916 |   sprintf (sizstr, "[%s, %s]", s0, s1);
   |^~
gimple-ssa-warn-access.cc:2916:23: note: ‘sprintf’ output between 5 and 155 
bytes into a destination of size 80
  2916 |   sprintf (sizstr, "[%s, %s]", s0, s1);
   |   ^~~~

On a positive note, these changes found two possible sprintf overflow
bugs in the C++ and Fortran front-ends which I have fixed below.

Bootstrapped and regtested on x86-64 Linux.  I also ran it through our
callgrind harness and there was no overall change in compilation time.

OK?

gcc/ChangeLog:

 * Makefile.in: Disable -Wformat-overflow for
 gimple-ssa-warn-access.o.
 * tree-ssa-strlen.c (compare_nonzero_chars): Pass statement
 context to ranger.
 (get_addr_stridx): Same.
 (get_stridx): Same.
 (get_range_strlen_dynamic): Same.
 (handle_builtin_strlen): Same.
 (handle_builtin_strchr): Same.
 (handle_builtin_strcpy): Same.
 (maybe_diag_stxncpy_trunc): Same.
 (handle_builtin_stxncpy_strncat):
 (handle_builtin_memcpy): Same.
 (handle_builtin_strcat): Same.
 (handle_alloc_call): Same.
 (handle_builtin_memset): Same.
 (handle_builtin_string_cmp): Same.
 (handle_pointer_plus): Same.
 (count_nonzero_bytes_addr): Same.
 (count_nonzero_bytes): Same.
 (handle_store): Same.
 (fold_strstr_to_strncmp): Same.
 (handle_integral_assign): Same.
 (check_and_optimize_stmt): Same.
 (class strlen_dom_walker): Replace evrp with ranger.
 (strlen_dom_walker::before_dom_children): Remove evrp.
 (strlen_dom_walker::after_dom_children): Remove evrp.

gcc/cp/ChangeLog:

 * ptree.c (cxx_print_xnode): Add more space to pfx array.

gcc/fortran/ChangeLog:

 * misc.c (gfc_dummy_typename): Make sure ts->kind is
 non-negative.


[PATCH] PR fortran/102685 - ICE in output_constructor_regular_field, at varasm.c:5514

2021-10-14 Thread Harald Anlauf via Gcc-patches
Dear Fortranners,

the attached patch adds a check for the shape of arrays in derived type
constructors.  This brings it in line with other major brands.

Example:

  type t
 integer :: a(1)
  end type
  type(t), parameter :: y(2) = t([1,2])
end

This was silently accepted before, but now gives:

pr102685-z1.f90:4:33:

4 |   type(t), parameter :: y(2) = t([1,2])
  | 1
Error: The shape of component 'a' in the structure constructor at (1) differs 
from the shape of the declared component for dimension 1 (2/1)

During regtesting several previously invalid testcases surfaced
which would be rejected by the present change.  They have been
adjusted along the discussion with Tobias and Paul, see the thread

https://gcc.gnu.org/pipermail/fortran/2021-October/056707.html

In developing the patch I encountered a difficulty with testcase
dec_structure_6.f90, which uses a DEC extension, namely "old-style
CLIST initializers in STRUCTURE".  I could not figure out how to
determine the shape of the initializer; it always seemed to be zero.
I've added code to accept this, but only under -fdec-structure, and
added a TODO in a comment.  If somebody reading this could give me
a hint to solve this, I would adjust the patch accordingly.

Regtested on x86_64-pc-linux-gnu.  OK?  Or further comments?

Thanks,
Harald

Fortran: validate shape of arrays in constructors against declarations

gcc/fortran/ChangeLog:

	PR fortran/102685
	* resolve.c (resolve_structure_cons): In a structure constructor,
	compare shapes of array components against declared shape.

gcc/testsuite/ChangeLog:

	PR fortran/102685
	* gfortran.dg/derived_constructor_char_1.f90: Fix invalid code.
	* gfortran.dg/pr70931.f90: Likewise.
	* gfortran.dg/transfer_simplify_2.f90: Likewise.
	* gfortran.dg/pr102685.f90: New test.

diff --git a/gcc/fortran/resolve.c b/gcc/fortran/resolve.c
index 0d0af39d23f..d035b30ad7d 100644
--- a/gcc/fortran/resolve.c
+++ b/gcc/fortran/resolve.c
@@ -29,6 +29,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "data.h"
 #include "target-memory.h" /* for gfc_simplify_transfer */
 #include "constructor.h"
+#include "parse.h"

 /* Types used in equivalence statements.  */

@@ -1454,6 +1455,42 @@ resolve_structure_cons (gfc_expr *expr, int init)
 	}
 	}

+  /* Validate shape.  We silently accept some cases where the apparent
+	 shape of the constructor is zero, and we cannot check dynamic or PDT
+	 arrays.  */
+  if (cons->expr->expr_type == EXPR_ARRAY && rank == cons->expr->rank
+	  && comp->as && !comp->attr.allocatable && !comp->attr.pointer
+	  && !comp->attr.pdt_array)
+	{
+	  mpz_t len;
+	  mpz_init (len);
+	  for (int n = 0; n < rank; n++)
+	{
+	  gcc_assert (comp->as->upper[n]->expr_type == EXPR_CONSTANT
+			  && comp->as->lower[n]->expr_type == EXPR_CONSTANT);
+	  mpz_set_ui (len, 1);
+	  mpz_add (len, len, comp->as->upper[n]->value.integer);
+	  mpz_sub (len, len, comp->as->lower[n]->value.integer);
+	  /* Shape agrees for this dimension.  */
+	  if (mpz_cmp (cons->expr->shape[n], len) == 0)
+		continue;
+	  /* Accept DEC old-style initializers in STRUCTURE, where shape
+		 is currently not correctly set (it is zero.  Why?).
+		 TODO: Fix this or find a better solution.  */
+	  if (flag_dec_structure
+		  && mpz_cmp_si (cons->expr->shape[n], 0) == 0)
+		continue;
+	  gfc_error ("The shape of component %qs in the structure "
+			 "constructor at %L differs from the shape of the "
+			 "declared component for dimension %d (%ld/%ld)",
+			 comp->name, &cons->expr->where, n+1,
+			 mpz_get_si (cons->expr->shape[n]),
+			 mpz_get_si (len));
+	  t = false;
+	}
+	  mpz_clear (len);
+	}
+
   if (!comp->attr.pointer || comp->attr.proc_pointer
 	  || cons->expr->expr_type == EXPR_NULL)
 	continue;
diff --git a/gcc/testsuite/gfortran.dg/derived_constructor_char_1.f90 b/gcc/testsuite/gfortran.dg/derived_constructor_char_1.f90
index 892a9c9f4c1..91fc4c902d8 100644
--- a/gcc/testsuite/gfortran.dg/derived_constructor_char_1.f90
+++ b/gcc/testsuite/gfortran.dg/derived_constructor_char_1.f90
@@ -5,7 +5,7 @@
 !
 !
   Type :: t5
-character (len=5) :: txt(4)
+character (len=5) :: txt(2)
   End Type t5

   character (len=3), parameter :: str3(2) = [ "ABC", "ZYX" ]
diff --git a/gcc/testsuite/gfortran.dg/pr102685.f90 b/gcc/testsuite/gfortran.dg/pr102685.f90
new file mode 100644
index 000..d325c27b32a
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/pr102685.f90
@@ -0,0 +1,30 @@
+! { dg-do compile }
+! PR fortran/102685
+
+program p
+  type t
+ integer :: a(2)
+  end type
+  type(t), parameter :: x0= t([2]) ! { dg-error "shape of component" }
+  type(t), parameter :: x1(2) = t([2]) ! { dg-error "shape of component" }
+  type(t), parameter :: x(2)  = t([integer::]) ! { dg-error "shape of component" }
+
+  type u
+ integer :: a
+ integer :: b(0)
+  end type
+  type(u), parameter :: z0(2) = u(1, 

[committed] libstdc++: Simplify variant access functions

2021-10-14 Thread Jonathan Wakely via Gcc-patches
libstdc++-v3/ChangeLog:

* include/std/variant (__variant::__get(in_place_index_t, U&&)):
Rename to __get_n and remove first argument. Replace pair of
overloads with a single function using 'if constexpr'.
(__variant::__get(Variant&&)): Adjust to use __get_n.

Tested powerpc64le-linux. Committed to trunk.

commit 4f87d4c5aec9a1eaca7be61e5c8aab4d6e61b1d8
Author: Jonathan Wakely 
Date:   Thu Oct 14 20:37:38 2021

libstdc++: Simplify variant access functions

libstdc++-v3/ChangeLog:

* include/std/variant (__variant::__get(in_place_index_t, U&&)):
Rename to __get_n and remove first argument. Replace pair of
overloads with a single function using 'if constexpr'.
(__variant::__get(Variant&&)): Adjust to use __get_n.

diff --git a/libstdc++-v3/include/std/variant b/libstdc++-v3/include/std/variant
index 6377b6731ea..b85a89d0b7b 100644
--- a/libstdc++-v3/include/std/variant
+++ b/libstdc++-v3/include/std/variant
@@ -279,27 +279,26 @@ namespace __variant
   __gnu_cxx::__aligned_membuf<_Type> _M_storage;
 };
 
-  template
-constexpr decltype(auto)
-__get(in_place_index_t<0>, _Union&& __u) noexcept
-{ return std::forward<_Union>(__u)._M_first._M_get(); }
-
   template
 constexpr decltype(auto)
-__get(in_place_index_t<_Np>, _Union&& __u) noexcept
+__get_n(_Union&& __u) noexcept
 {
-  return __variant::__get(in_place_index<_Np-1>,
- std::forward<_Union>(__u)._M_rest);
+  if constexpr (_Np == 0)
+   return std::forward<_Union>(__u)._M_first._M_get();
+  else if constexpr (_Np == 1)
+   return std::forward<_Union>(__u)._M_rest._M_first._M_get();
+  else if constexpr (_Np == 2)
+   return std::forward<_Union>(__u)._M_rest._M_rest._M_first._M_get();
+  else
+   return __variant::__get_n<_Np - 3>(
+std::forward<_Union>(__u)._M_rest._M_rest._M_rest);
 }
 
   // Returns the typed storage for __v.
   template
 constexpr decltype(auto)
 __get(_Variant&& __v) noexcept
-{
-  return __variant::__get(std::in_place_index<_Np>,
- std::forward<_Variant>(__v)._M_u);
-}
+{ return __variant::__get_n<_Np>(std::forward<_Variant>(__v)._M_u); }
 
   template
 struct _Traits


[committed] libstdc++: Make filesystem::path(path&&) always noexcept

2021-10-14 Thread Jonathan Wakely via Gcc-patches
Since r12-4065 std::basic_string is always nothrow-move-constructible,
so filesystem::path is too.

That also means that path::_S_convert(T) is noexcept when returning its
argument, because T is either a basic_string or basic_string_view, and
will be moved into the return value.

libstdc++-v3/ChangeLog:

* include/bits/fs_path.h (path(path&&)): Make unconditionally
noexcept.
(path::_S_convert(T)): Add conditional noexcept.

Tested powerpc64le-linux. Committed to trunk.

commit 373acac1c8f2d64409ccea6aea409a0e15e80a6a
Author: Jonathan Wakely 
Date:   Thu Oct 14 13:58:02 2021

libstdc++: Make filesystem::path(path&&) always noexcept

Since r12-4065 std::basic_string is always nothrow-move-constructible,
so filesystem::path is too.

That also means that path::_S_convert(T) is noexcept when returning its
argument, because T is either a basic_string or basic_string_view, and
will be moved into the return value.

libstdc++-v3/ChangeLog:

* include/bits/fs_path.h (path(path&&)): Make unconditionally
noexcept.
(path::_S_convert(T)): Add conditional noexcept.

diff --git a/libstdc++-v3/include/bits/fs_path.h 
b/libstdc++-v3/include/bits/fs_path.h
index a63e4b9ab07..d13fb12455c 100644
--- a/libstdc++-v3/include/bits/fs_path.h
+++ b/libstdc++-v3/include/bits/fs_path.h
@@ -316,10 +316,7 @@ namespace __detail
 
 path(const path& __p) = default;
 
-path(path&& __p)
-#if _GLIBCXX_USE_CXX11_ABI || _GLIBCXX_FULLY_DYNAMIC_STRING == 0
-  noexcept
-#endif
+path(path&& __p) noexcept
 : _M_pathname(std::move(__p._M_pathname)),
   _M_cmpts(std::move(__p._M_cmpts))
 { __p.clear(); }
@@ -624,6 +621,7 @@ namespace __detail
 template
   static auto
   _S_convert(_Tp __str)
+  noexcept(is_same_v)
   {
if constexpr (is_same_v)
  return __str; // No conversion needed.


[committed] c-family: Support DFP printf/scanf formats for C2X

2021-10-14 Thread Joseph Myers
When I enabled various decimal floating-point features for C2X /
stopped them being diagnosed with -pedantic for C2X, I missed the
format checking support.  The DFP printf and scanf formats are
included in C2X.  Thus, adjust the data for those formats so that they
are no longer diagnosed with -std=c2x -Wformat -pedantic.
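
For illustration, a minimal example (not part of the patch; the function
name is made up) that these changes stop diagnosing under
-std=c2x -Wformat -pedantic:

#include <stdio.h>

void
print_decimals (_Decimal32 d32, _Decimal64 d64, _Decimal128 d128)
{
  /* H selects _Decimal32, D selects _Decimal64, DD selects _Decimal128.  */
  printf ("%Hf %Df %DDf\n", d32, d64, d128);
}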

Bootstrapped with no regressions for x86_64-pc-linux-gnu.  Applied to 
mainline.

gcc/c-family/
* c-format.c (printf_length_specs, scanf_length_specs)
(print_char_table, scan_char_table): Support DFP formats for C2X.
* c-format.h (TEX_D32, TEX_D64, TEX_D128): Remove.
(T2X_D32, T2X_D64, T2X_D128): New macros.

gcc/testsuite/
* gcc.dg/format/c11-dfp-printf-1.c,
gcc.dg/format/c11-dfp-scanf-1.c, gcc.dg/format/c2x-dfp-printf-1.c,
gcc.dg/format/c2x-dfp-scanf-1.c: New tests.

diff --git a/gcc/c-family/c-format.c b/gcc/c-family/c-format.c
index c27faf71676..e735e092043 100644
--- a/gcc/c-family/c-format.c
+++ b/gcc/c-family/c-format.c
@@ -493,8 +493,8 @@ static const format_length_info printf_length_specs[] =
   { "Z", FMT_LEN_z, STD_EXT, NO_FMT, 0 },
   { "t", FMT_LEN_t, STD_C99, NO_FMT, 0 },
   { "j", FMT_LEN_j, STD_C99, NO_FMT, 0 },
-  { "H", FMT_LEN_H, STD_EXT, NO_FMT, 0 },
-  { "D", FMT_LEN_D, STD_EXT, "DD", FMT_LEN_DD, STD_EXT, 0 },
+  { "H", FMT_LEN_H, STD_C2X, NO_FMT, 0 },
+  { "D", FMT_LEN_D, STD_C2X, "DD", FMT_LEN_DD, STD_C2X, 0 },
   { NO_FMT, NO_FMT, 0 }
 };
 
@@ -530,8 +530,8 @@ static const format_length_info scanf_length_specs[] =
   { "z", FMT_LEN_z, STD_C99, NO_FMT, 0 },
   { "t", FMT_LEN_t, STD_C99, NO_FMT, 0 },
   { "j", FMT_LEN_j, STD_C99, NO_FMT, 0 },
-  { "H", FMT_LEN_H, STD_EXT, NO_FMT, 0 },
-  { "D", FMT_LEN_D, STD_EXT, "DD", FMT_LEN_DD, STD_EXT, 0 },
+  { "H", FMT_LEN_H, STD_C2X, NO_FMT, 0 },
+  { "D", FMT_LEN_D, STD_C2X, "DD", FMT_LEN_DD, STD_C2X, 0 },
   { NO_FMT, NO_FMT, 0 }
 };
 
@@ -703,15 +703,15 @@ static const format_char_info print_char_table[] =
   { "di",  0, STD_C89, { T89_I,   T99_SC,  T89_S,   T89_L,   T9L_LL,  TEX_LL,  
T99_SST, T99_PD,  T99_IM,  BADLEN,  BADLEN,  BADLEN  }, "-wp0 +'I",  "i",  NULL 
},
   { "oxX", 0, STD_C89, { T89_UI,  T99_UC,  T89_US,  T89_UL,  T9L_ULL, TEX_ULL, 
T99_ST,  T99_UPD, T99_UIM, BADLEN,  BADLEN,  BADLEN }, "-wp0#", "i",  NULL 
},
   { "u",   0, STD_C89, { T89_UI,  T99_UC,  T89_US,  T89_UL,  T9L_ULL, TEX_ULL, 
T99_ST,  T99_UPD, T99_UIM, BADLEN,  BADLEN,  BADLEN }, "-wp0'I","i",  NULL 
},
-  { "fgG", 0, STD_C89, { T89_D,   BADLEN,  BADLEN,  T99_D,   BADLEN,  T89_LD,  
BADLEN,  BADLEN,  BADLEN,  TEX_D32, TEX_D64, TEX_D128 }, "-wp0 +#'I", "",   
NULL },
-  { "eE",  0, STD_C89, { T89_D,   BADLEN,  BADLEN,  T99_D,   BADLEN,  T89_LD,  
BADLEN,  BADLEN,  BADLEN,  TEX_D32, TEX_D64, TEX_D128 }, "-wp0 +#I",  "",   
NULL },
+  { "fgG", 0, STD_C89, { T89_D,   BADLEN,  BADLEN,  T99_D,   BADLEN,  T89_LD,  
BADLEN,  BADLEN,  BADLEN,  T2X_D32, T2X_D64, T2X_D128 }, "-wp0 +#'I", "",   
NULL },
+  { "eE",  0, STD_C89, { T89_D,   BADLEN,  BADLEN,  T99_D,   BADLEN,  T89_LD,  
BADLEN,  BADLEN,  BADLEN,  T2X_D32, T2X_D64, T2X_D128 }, "-wp0 +#I",  "",   
NULL },
   { "c",   0, STD_C89, { T89_I,   BADLEN,  BADLEN,  T94_WI,  BADLEN,  BADLEN,  
BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN }, "-w","",   NULL 
},
   { "s",   1, STD_C89, { T89_C,   BADLEN,  BADLEN,  T94_W,   BADLEN,  BADLEN,  
BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN }, "-wp",   "cR", NULL 
},
   { "p",   1, STD_C89, { T89_V,   BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  
BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN }, "-w","c",  NULL 
},
   { "n",   1, STD_C89, { T89_I,   T99_SC,  T89_S,   T89_L,   T9L_LL,  BADLEN,  
T99_SST, T99_PD,  T99_IM,  BADLEN,  BADLEN,  BADLEN }, "",  "W",  NULL 
},
   /* C99 conversion specifiers.  */
-  { "F",   0, STD_C99, { T99_D,   BADLEN,  BADLEN,  T99_D,   BADLEN,  T99_LD,  
BADLEN,  BADLEN,  BADLEN,  TEX_D32, TEX_D64, TEX_D128 }, "-wp0 +#'I", "",   
NULL },
-  { "aA",  0, STD_C99, { T99_D,   BADLEN,  BADLEN,  T99_D,   BADLEN,  T99_LD,  
BADLEN,  BADLEN,  BADLEN,  TEX_D32, TEX_D64,  TEX_D128 }, "-wp0 +#",   "",   
NULL },
+  { "F",   0, STD_C99, { T99_D,   BADLEN,  BADLEN,  T99_D,   BADLEN,  T99_LD,  
BADLEN,  BADLEN,  BADLEN,  T2X_D32, T2X_D64, T2X_D128 }, "-wp0 +#'I", "",   
NULL },
+  { "aA",  0, STD_C99, { T99_D,   BADLEN,  BADLEN,  T99_D,   BADLEN,  T99_LD,  
BADLEN,  BADLEN,  BADLEN,  T2X_D32, T2X_D64,  T2X_D128 }, "-wp0 +#",   "",   
NULL },
   /* C2X conversion specifiers.  */
   { "b",   0, STD_C2X, { T2X_UI,  T2X_UC,  T2X_US,  T2X_UL,  T2X_ULL, TEX_ULL, 
T2X_ST,  T2X_UPD, T2X_UIM, BADLEN,  BADLEN,  BADLEN }, "-wp0#", "i",  NULL 
},
   /* X/Open conversion specifiers.  */
@@ -870,15 +870,15 @@ static const format_char_info scan_char_table[] =
   { "di",1, STD_C89, { T89_I,   T99_SC,  T89_S,   T89_L,   T9L_LL,  
TEX_LL,  T99_SST, T99_PD,  T99_IM,  BADLEN,  BADLEN,  BADLEN }, "*w'I", "W",   
NULL },
   { "u", 1, STD_C89, { 

[r12-4397 Regression] FAIL: gcc.dg/guality/pr54200.c -Og -DPREVENT_OPTIMIZATION line 20 z == 3 on Linux/x86_64

2021-10-14 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

4cb52980e5d5fb64a393d385923da1b51ab34606 is the first bad commit
commit 4cb52980e5d5fb64a393d385923da1b51ab34606
Author: Martin Liska 
Date:   Tue Oct 12 14:31:50 2021 +0200

Eliminate AUTODETECT_VALUE usage in options.

caused

FAIL: gcc.dg/guality/pr54200.c  -Og -DPREVENT_OPTIMIZATION  line 20 z == 3

with GCC configured with

../../gcc/configure 
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r12-4397/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="guality.exp=gcc.dg/guality/pr54200.c --target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="guality.exp=gcc.dg/guality/pr54200.c --target_board='unix{-m64\ 
-march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at skpgkp2 at gmail dot com)


[r12-4387 Regression] FAIL: gcc.target/i386/avx512fp16-vfmaddcsh-1a.c scan-assembler-times vblendvps[ \\t]+[^{\n]*%xmm[0-9]+[^\n\r]*%xmm[0-9]+[^\n\r]*%xmm[0-9]+[^\n\r]*%xmm[0-9]+(?:\n|[ \\t]+#) 2 on L

2021-10-14 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

2f9529fc62bcd7e5796c5c8c11879c9ba2ca133f is the first bad commit
commit 2f9529fc62bcd7e5796c5c8c11879c9ba2ca133f
Author: Hongyu Wang 
Date:   Thu Sep 23 13:52:16 2021 +0800

AVX512FP16: Adjust builtin for mask complex fma

caused

FAIL: gcc.target/i386/avx512fp16-vfcmaddcsh-1a.c scan-assembler-times 
vblendvps[ 
\\t]+[^{\n]*%xmm[0-9]+[^\n\r]*%xmm[0-9]+[^\n\r]*%xmm[0-9]+[^\n\r]*%xmm[0-9]+(?:\n|[
 \\t]+#) 2
FAIL: gcc.target/i386/avx512fp16-vfmaddcsh-1a.c scan-assembler-times vblendvps[ 
\\t]+[^{\n]*%xmm[0-9]+[^\n\r]*%xmm[0-9]+[^\n\r]*%xmm[0-9]+[^\n\r]*%xmm[0-9]+(?:\n|[
 \\t]+#) 2

with GCC configured with

../../gcc/configure 
--prefix=/local/skpandey/gccwork/toolwork/gcc-bisect-master/master/r12-4387/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/avx512fp16-vfcmaddcsh-1a.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/avx512fp16-vfcmaddcsh-1a.c 
--target_board='unix{-m64\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/avx512fp16-vfmaddcsh-1a.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="i386.exp=gcc.target/i386/avx512fp16-vfmaddcsh-1a.c 
--target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at skpgkp2 at gmail dot com)


Re: PATCH, rs6000] Optimization for vec_xl_sext

2021-10-14 Thread David Edelsohn via Gcc-patches
On Thu, Oct 14, 2021 at 2:17 AM HAO CHEN GUI  wrote:
>
> Hi,
>
>The patch optimizes the code generation for vec_xl_sext builtin. Now all 
> the sign extensions are done on VSX registers directly.
>
>Bootstrapped and tested on powerpc64le-linux with no regressions. Is this 
> okay for trunk? Any recommendations? Thanks a lot.
>
>I refined the patch according to Bill and David's advice. I put the 
> patch.diff and ChangeLog in attachment also in case the indentation doesn't 
> show correctly in email body.
>
>
> ChangeLog
>
> 2021-10-11 Haochen Gui 
>
>
> gcc/
>
> * config/rs6000/rs6000-call.c (altivec_expand_lxvr_builtin):
>
> Modify the expansion for sign extension. All extensions are done
>
> within VSX registers.
>
>
> gcc/testsuite/
>
> * gcc.target/powerpc/p10_vec_xl_sext.c: New test.

This is okay.

Thanks, David


Re: [RFC][patch][PR102281]Clear padding for variables that are in registers

2021-10-14 Thread Qing Zhao via Gcc-patches


On Oct 14, 2021, at 4:06 AM, Jakub Jelinek  wrote:

On Tue, Oct 12, 2021 at 10:51:45PM +, Qing Zhao wrote:
PR102281 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102281)
exposed an issue in the current implementation of the padding initialization of 
-ftrivial-auto-var-init.

Currently, we add a __builtin_clear_padding call _AFTER_ every explicit 
initialization of an auto variable:

var_decl = {init_constructor};
__builtin_clear_padding (&var_decl, 0B, 1);

The reason I added the call to EVERY auto variable that has explicit 
initialization is that the folding of __builtin_clear_padding will automatically 
turn this call into a NOP when there is no padding in the variable. So, we don't 
need to check whether the variable has padding explicitly.
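
As an illustration of that folding (a sketch, not part of the patch; the type
and function names are made up, and at the source level the builtin just takes
the address of the object):

struct has_pad { char c; int i; };   /* padding after 'c' on typical ABIs */
struct no_pad  { int a; int b; };    /* no padding */

void
example (void)
{
  struct has_pad p = { 'x', 1 };
  struct no_pad  q = { 1, 2 };
  __builtin_clear_padding (&p);   /* zeroes only the padding bytes of p */
  __builtin_clear_padding (&q);   /* no padding, so this folds away to nothing */
}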

However, always adding the call to __builtin_clear_padding (&var_decl, …) might 
introduce invalid IR when the VAR_DECL cannot be made addressable.

In order to resolve this issue, I propose the following solution:

Instead of adding the call to __builtin_clear_padding _AFTER_ the explicit 
initialization, use zero to initialize the whole variable BEFORE the explicit 
field initialization when the VAR_DECL has padding, i.e.:

If (had_padding_p (var_decl))
  var_decl = ZERO;
var_decl = {init_constructor};

This should resolve the invalid IR issue.  However, there might be more runtime 
overhead from such padding initialization since the whole variable is set 
to zero instead of only the padding.

Please let me know your comments on this.

As I've tried to explain in the PR, this is wrong.
It makes no sense whatsoever to clear padding on is_gimple_reg
variables even if they do have padding (e.g. long double/_Complex long
double), because all the GIMPLE passes will just not care about it.

Okay, I see now.

Then during the gimplification phase, I will add an “is_gimple_reg” check to decide 
whether to add the call to __builtin_clear_padding or not.


If you want to clear padding for those, it needs to be done where it
is forced into memory (e.g. expansion for returning or passing such
types in memory if they aren't passed in registers, or RA when the register
allocator decides to spill them into memory if even that is what you want to
cover).

With the current implementation, for a long double/_Complex long double auto 
variable, if it is not explicitly initialized, -ftrivial-auto-var-init will add 
a call to .DEFERRED_INIT during gimplification, and expand this call as block 
initialization including its padding during RTL expansion; so there is no issue 
when the variable is NOT explicitly initialized.

Only when it is explicitly initialized, and when it is spilled into memory 
during RTL, might its padding not be initialized. Therefore it's necessary to 
clear its padding during the RTL phase when this variable is spilled to memory.

It is tricky to fix this issue, and it’s not needed to fix PR102281; I will 
create another PR to record this long double/_Complex long double padding 
initialization issue and fix it later.


For !is_gimple_reg vars, yes, something like clear_padding_type_has_padding_p
could be useful to avoid unnecessary IL growth, but it should be implemented
more efficiently,

You mean the following is not efficient enough:

/* Return true if TYPE contains any padding bits.  */

bool
clear_padding_type_has_padding_p (tree type)
{
 bool has_padding = false;
 if (BITS_PER_UNIT == 8
 && CHAR_BIT == 8
 && clear_padding_type_may_have_padding_p (type))
   {
 HOST_WIDE_INT sz = int_size_in_bytes (type), i;
 gcc_assert (sz > 0);
 unsigned char *buf = XALLOCAVEC (unsigned char, sz);
 memset (buf, ~0, sz);
 clear_type_padding_in_mask (type, buf);
 for (i = 0; i < sz; i++)
 if (buf[i] != (unsigned char) ~0)
   {
 has_padding = true;
 break;
   }
   }
 return has_padding;
}


in particular for the answer to the question "will
__builtin_clear_padding emit any code on this type": at least for non-unions
there is no buffer covering the whole type needed; just next to the
clear_in_mask bool add another bool for the new mode, and in clear_padding_flush
in that mode just set another bool if clear_in_mask mode would clear any bit
in the mask if it were there.  For unions, I'm afraid it needs to do it
the clear_in_mask way and at the end analyze the buf.
Also, besides the answer to the question "would __builtin_clear_padding emit any
code on this type", we should have another function/another mode which
returns true if any padding whatsoever is found, including unions.

So, what’s the difference between “will __builtin_clear_padding emit any code 
on this type” and “any padding is found for this type”?
Should the answers to these two questions be the same?

For non-union it would be exactly the same thing, but in unions it would
keep using the same mode too, even if just one union member has any padding
return it.  And, for these modes, there could be shortcuts, once we find
some padding, it doesn't make sense to walk further fields.

Is it convenient for you to modify 

[COMMITTED] tree-optimization/102738 - Simplification for right shift.

2021-10-14 Thread Andrew MacLeod via Gcc-patches
As the PR observes, if the first operand of a right shift is 0 or -1, 
operand 2 doesn't matter and the result will be the same as op1, so it 
can be turned into a copy.


This patch checks for that condition and performs the operation in EVRP.

Bootstrapped on x86_64-pc-linux-gnu with no regressions.  Pushed.

Andrew

commit f0b7d4cc49ddb1c2c7474cc3f61e260aa93a96c0
Author: Andrew MacLeod 
Date:   Thu Oct 14 10:43:58 2021 -0400

Simplification for right shift.

When the first operand of a signed right shift is zero or negative one, the
RHS doesn't matter and the shift can be converted to a copy.

PR tree-optimization/102738
gcc/
* vr-values.c (simplify_using_ranges::simplify): Handle RSHIFT_EXPR.

gcc/testsuite
* gcc.dg/pr102738.c: New.

diff --git a/gcc/testsuite/gcc.dg/pr102738.c b/gcc/testsuite/gcc.dg/pr102738.c
new file mode 100644
index 000..776fcecb27a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr102738.c
@@ -0,0 +1,48 @@
+/* PR tree-optimization/102738 */
+/* { dg-options "-O2 -fdump-tree-evrp" } */
+
+/* Remove arithmetic shift right when the LHS is known to be 0 or -1.  */
+
+int a1(__int128 f, int g)
+{
+ /* Leaves f >> 127.  */
+return (f >> 127) >> g;
+}
+
+int a2(int f, int g)
+{
+ /* Leaves f >> 31.  */
+return (f >> 31) >> g;
+}
+
+int a3(int f, int g)
+{
+if (f == 0 || f == -1)
+  return f >> g;
+__builtin_unreachable();
+}
+
+int a4(int f, int g)
+{
+if (f == 0 || f == 1)
+  return (-f) >> g;
+__builtin_unreachable();
+}
+
+int a5(int f, int g)
+{
+if (f == 0 || f == 1)
+  return (f-1) >> g;
+return 0;
+}
+
+int a6(int f, int g)
+{
+if (f == 6 || f == 7)
+  return (f-7) >> g;
+__builtin_unreachable();
+}
+
+/* { dg-final { scan-tree-dump-times " >> 127" 1 "evrp" } } */
+/* { dg-final { scan-tree-dump-times " >> 31" 1 "evrp" } } */
+/* { dg-final { scan-tree-dump-times " >> " 2 "evrp" } } */
diff --git a/gcc/vr-values.c b/gcc/vr-values.c
index 9bf58f416f2..d0f7cb41bc8 100644
--- a/gcc/vr-values.c
+++ b/gcc/vr-values.c
@@ -4281,6 +4281,28 @@ simplify_using_ranges::simplify (gimple_stmt_iterator *gsi)
 	case MAX_EXPR:
 	  return simplify_min_or_max_using_ranges (gsi, stmt);
 
+	case RSHIFT_EXPR:
+	  {
+	tree op0 = gimple_assign_rhs1 (stmt);
+	tree type = TREE_TYPE (op0);
+	int_range_max range;
+	if (TYPE_SIGN (type) == SIGNED
+		&& query->range_of_expr (range, op0, stmt))
+	  {
+		unsigned prec = TYPE_PRECISION (TREE_TYPE (op0));
+		int_range<2> nzm1 (type, wi::minus_one (prec), wi::zero (prec),
+   VR_ANTI_RANGE);
+		range.intersect (nzm1);
+		// If there are no ranges other than [-1, 0] remove the shift.
+		if (range.undefined_p ())
+		  {
+		gimple_assign_set_rhs_from_tree (gsi, op0);
+		return true;
+		  }
+		return false;
+	  }
+	break;
+	  }
 	default:
 	  break;
 	}


Re: [PATCH] libstdc++: Check [ptr, end) and [ptr, ptr+n) ranges with _GLIBCXX_ASSERTIONS

2021-10-14 Thread Jonathan Wakely via Gcc-patches
On Thu, 14 Oct 2021 at 18:11, François Dumont  wrote:
>
> Hi
>
>  On a related subject I am waiting for some feedback on:
>
> https://gcc.gnu.org/pipermail/libstdc++/2021-August/053005.html

I'm concerned that this adds too much overhead for the
_GLIBCXX_ASSERTIONS case. It adds function calls which are not
necessarily inlined, and which perform arithmetic and comparisons on
the arguments. That has a runtime cost which is non-zero.

The patches I sent in this thread have zero runtime cost, because they
use the compiler built-in which compiles away to nothing if the sizes
aren't known.
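
Roughly, the check amounts to something like the following sketch (written
out in plain C for illustration; valid_range_p is just an illustrative name,
not the function the patch adds):

#include <stddef.h>

/* Reject [first, last) only if the compiler can see that it runs past the
   end of the object that 'first' points into.  When the size is unknown,
   __builtin_object_size returns (size_t)-1 and the check vanishes.  */
static inline int
valid_range_p (const char *first, const char *last)
{
  size_t sz = __builtin_object_size (first, 0);
  return sz == (size_t) -1 || (size_t) (last - first) <= sz;
}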

>
> On 11/10/21 6:49 pm, Jonathan Wakely wrote:
> > This enables lightweight checks for the __glibcxx_requires_valid_range
> > and __glibcxx_requires_string_len macros  when _GLIBCXX_ASSERTIONS is
> > defined.  By using __builtin_object_size we can check whether the end of
> > the range is part of the same object as the start of the range, and
> > detect problems like in PR 89927.
> >
> > libstdc++-v3/ChangeLog:
> >
> >   * include/debug/debug.h (__valid_range_p, __valid_range_n): New
> >   inline functions using __builtin_object_size to check ranges
> >   delimited by pointers.
> >   [_GLIBCXX_ASSERTIONS] (__glibcxx_requires_valid_range): Use
> >   __valid_range_p.
> >   [_GLIBCXX_ASSERTIONS] (__glibcxx_requires_string_len): Use
> >   __valid_range_n.
> >
> >
> > The first patch allows us to detect bugs like string("foo", "bar"),
> > like in PR 89927. Debug mode cannot currently detect this. The new
> > check uses the compiler built-in to detect when the two arguments are
> > not part of the same object. This assumes we're optimizing and the
> > compiler knows the values of the pointers. If it doesn't, then the
> > function just returns true and should inline to nothing.
>
> I see, it does not detect that the input pointers are unrelated, but when
> they are, the computed size is >= __sz.
>
> Isn't it UB to compare unrelated pointers ?

Yes, and my patch doesn't compare any pointers, does it?



Re: [PATCH] libstdc++: Check [ptr,end) and [ptr,ptr+n) ranges with _GLIBCXX_ASSERTIONS

2021-10-14 Thread François Dumont via Gcc-patches

Hi

    On a related subject I am waiting for some feedback on:

https://gcc.gnu.org/pipermail/libstdc++/2021-August/053005.html


On 11/10/21 6:49 pm, Jonathan Wakely wrote:

This enables lightweight checks for the __glibcxx_requires_valid_range
and __glibcxx_requires_string_len macros  when _GLIBCXX_ASSERTIONS is
defined.  By using __builtin_object_size we can check whether the end of
the range is part of the same object as the start of the range, and
detect problems like in PR 89927.

libstdc++-v3/ChangeLog:

* include/debug/debug.h (__valid_range_p, __valid_range_n): New
inline functions using __builtin_object_size to check ranges
delimited by pointers.
[_GLIBCXX_ASSERTIONS] (__glibcxx_requires_valid_range): Use
__valid_range_p.
[_GLIBCXX_ASSERTIONS] (__glibcxx_requires_string_len): Use
__valid_range_n.


The first patch allows us to detect bugs like string("foo", "bar"),
like in PR 89927. Debug mode cannot currently detect this. The new
check uses the compiler built-in to detect when the two arguments are
not part of the same object. This assumes we're optimizing and the
compiler knows the values of the pointers. If it doesn't, then the
function just returns true and should inline to nothing.


I see, it does not detect that the input pointers are unrelated, but when 
they are, the computed size is >= __sz.


Isn't it UB to compare unrelated pointers ?


I would like to also enable that for Debug Mode, otherwise we have
checks that work for _GLIBCXX_ASSERTIONS but not for _GLIBCXX_DEBUG. I
tried to make that work with the second patch attached to this mail,
but it doesn't abort for the example in PR 89927. I think puttingthe
checks inside the "real" debug checking functions is too many levels
of inlining and the compiler "forgets" the pointer values.

I think the first patch is worth committing. It should add no overhead
for optimized builds, and diagnoses some bugs that we do not diagnose
today. I'm less sure about the second, since it doesn't actually help.
Maybe the second one should wait for Siddhesh's
__builtin_dynamic_object_size to land on trunk.

Taking this idea further, we could do something similar for
__glibcxx_requires_string, which is currently almost useless (it only
checks if the pointer is null) but could be changed to use
__valid_range_n(_String, char_traits<...>::length(_String))
so that we can diagnose non-null terminated strings (because the
length that char-traits would give us would be larger than the size
that __builtin_object_size would give us).
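
In plain C terms the idea is roughly this sketch (illustrative only, not
proposed code; plausibly_terminated_p is a made-up name):

#include <stddef.h>
#include <string.h>

/* If the string is not NUL-terminated inside the object 's' points into,
   strlen walks past the end, so its result exceeds the object size the
   compiler knows about.  An unknown size ((size_t)-1) always passes.  */
static inline int
plausibly_terminated_p (const char *s)
{
  size_t sz = __builtin_object_size (s, 0);
  return sz == (size_t) -1 || strlen (s) < sz;
}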

Thoughts?






Re: [RFC][patch][PR102281]Clear padding for variables that are in registers

2021-10-14 Thread Qing Zhao via Gcc-patches


> On Oct 14, 2021, at 4:06 AM, Jakub Jelinek  wrote:
> 
> On Tue, Oct 12, 2021 at 10:51:45PM +, Qing Zhao wrote:
>> PR102281 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102281
>> Exposed an issue in the current implementation of the padding initialization 
>> of -ftrivial-auto-var-init.
>> 
>> Currently, we add a __builtin_clear_padding call _AFTER_ every explicit 
>> initialization of an auto variable:
>> 
>> var_decl = {init_constructor};
>> __builtin_clear_padding (&var_decl, 0B, 1);
>> 
>> The reason I added the call to EVERY auto variable that has explicit 
>> initialization is that the folding of __builtin_clear_padding will automatically 
>> turn this call into a NOP when there is no padding in the variable. So, we 
>> don't need to check whether the variable has padding explicitly. 
>> 
>> However, always adding the call to __builtin_clear_padding (&var_decl, …) 
>> might introduce invalid IR when the VAR_DECL cannot be made addressable. 
>> 
>> In order to resolve this issue, I propose the following solution:
>> 
>> Instead of adding the call to __builtin_clear_padding _AFTER_ the explicit 
>> initialization, use zero to initialize the whole variable BEFORE the explicit 
>> field initialization when the VAR_DECL has padding, i.e.:
>> 
>> If (had_padding_p (var_decl))
>>   var_decl = ZERO;
>> var_decl = {init_constructor};
>> 
>> This should resolve the invalid IR issue.  However, there might be more runtime 
>> overhead from such padding initialization since the whole variable is 
>> set to zero instead of only the padding. 
>> 
>> Please let me know your comments on this.
> 
> As I've tried to explain in the PR, this is wrong.
> It makes no sense whatsoever to clear padding on is_gimple_reg
> variables even if they do have padding (e.g. long double/_Complex long
> double), because all the GIMPLE passes will just not care about it.

Okay, I see now.

Then during the gimplification phase, I will add an “is_gimple_reg” check to decide 
whether to add the call to __builtin_clear_padding or not. 

> 
> If you want to clear padding for those, it needs to be done where it
> is forced into memory (e.g. expansion for returning or passing such
> types in memory if they aren't passed in registers, or RA when the register
> allocator decides to spill them into memory if even that is what you want to
> cover).

With the current implementation, for a long double/_Complex long double auto 
variable, if it is not explicitly initialized, -ftrivial-auto-var-init will add 
a call to .DEFERRED_INIT during gimplification, and expand this call as block 
initialization including its padding during RTL expansion; so there is no issue 
when the variable is NOT explicitly initialized.

Only when it is explicitly initialized, and when it is spilled into memory 
during RTL, might its padding not be initialized. Therefore it's necessary to 
clear its padding during the RTL phase when this variable is spilled to memory. 

It is tricky to fix this issue, and it’s not needed to fix PR102281; I will 
create another PR to record this long double/_Complex long double padding 
initialization issue and fix it later.

> 
> For !is_gimple_reg vars, yes, something like clear_padding_type_has_padding_p
> could be useful to avoid unnecessary IL growth, but it should be implemented
> more efficiently,

You mean the following is not efficient enough:

/* Return true if TYPE contains any padding bits.  */

bool
clear_padding_type_has_padding_p (tree type)
{
 bool has_padding = false;
 if (BITS_PER_UNIT == 8
 && CHAR_BIT == 8
 && clear_padding_type_may_have_padding_p (type))
   {
 HOST_WIDE_INT sz = int_size_in_bytes (type), i;
 gcc_assert (sz > 0);
 unsigned char *buf = XALLOCAVEC (unsigned char, sz);
 memset (buf, ~0, sz);
 clear_type_padding_in_mask (type, buf);
 for (i = 0; i < sz; i++)
 if (buf[i] != (unsigned char) ~0)
   {
 has_padding = true;
 break;
   }
   }
 return has_padding;
}


> in particular for the answer to a question will
> __builtin_clear_padding emit any code on this type at least for non-unions
> there is no buffer covering the whole type needed, just next to
> clear_in_mask bool add another bool for the new mode and in 
> clear_padding_flush
> in that mode just set another bool if clear_in_mask mode would clear any bit
> in the mask if it was there.  For unions, I'm afraid it needs to do it
> the clear_in_mask way and at the end analyze the buf.
> Also, besides answer to the question "would __builtin_clear_padding emit any
> code on this type" we should have another function/another mode which
> returns true if any padding whatsoever is found, including unions.

So, what’s the difference between “will __builtin_clear_padding emit any code 
on this type” and “any padding is found for this type”?
Should the answers to these two questions be the same?

> For non-union it would be exactly the same thing, but in unions it would
> keep using the same mode too, even if just one union member has any padding
> 

Re: PATCH, rs6000] Optimization for vec_xl_sext

2021-10-14 Thread Bill Schmidt via Gcc-patches
Hi Haochen,

The patch looks okay to me now.  Will defer to David for final call. :-)

Thanks!
Bill

On 10/14/21 1:17 AM, HAO CHEN GUI wrote:
> Hi,
>
>   The patch optimizes the code generation for vec_xl_sext builtin. Now all 
> the sign extensions are done on VSX registers directly.
>
>   Bootstrapped and tested on powerpc64le-linux with no regressions. Is this 
> okay for trunk? Any recommendations? Thanks a lot.
>
>   I refined the patch according to Bill and David's advice. I put the 
> patch.diff and ChangeLog in attachment also in case the indentation doesn't 
> show correctly in email body.
>
>
> ChangeLog
>
> 2021-10-11 Haochen Gui 
>
>
> gcc/
>
> * config/rs6000/rs6000-call.c (altivec_expand_lxvr_builtin):
>
> Modify the expansion for sign extension. All extensions are done
>
> within VSX registers.
>
>
> gcc/testsuite/
>
> * gcc.target/powerpc/p10_vec_xl_sext.c: New test.
>
> patch.diff
>
> diff --git a/gcc/config/rs6000/rs6000-call.c b/gcc/config/rs6000/rs6000-call.c
> index b4e13af4dc6..587e9fa2a2a 100644
> --- a/gcc/config/rs6000/rs6000-call.c
> +++ b/gcc/config/rs6000/rs6000-call.c
> @@ -9779,7 +9779,7 @@ altivec_expand_lxvr_builtin (enum insn_code icode, tree 
> exp, rtx target, bool bl
>
>    if (sign_extend)
>  {
> -  rtx discratch = gen_reg_rtx (DImode);
> +  rtx discratch = gen_reg_rtx (V2DImode);
>    rtx tiscratch = gen_reg_rtx (TImode);
>
>    /* Emit the lxvr*x insn.  */
> @@ -9788,20 +9788,31 @@ altivec_expand_lxvr_builtin (enum insn_code icode, 
> tree exp, rtx target, bool bl
>     return 0;
>    emit_insn (pat);
>
> -  /* Emit a sign extension from QI,HI,WI to double (DI).  */
> -  rtx scratch = gen_lowpart (smode, tiscratch);
> +  /* Emit a sign extension from V16QI,V8HI,V4SI to V2DI.  */
> +  rtx temp1, temp2;
>    if (icode == CODE_FOR_vsx_lxvrbx)
> -   emit_insn (gen_extendqidi2 (discratch, scratch));
> +   {
> + temp1  = simplify_gen_subreg (V16QImode, tiscratch, TImode, 0);
> + emit_insn (gen_vsx_sign_extend_qi_v2di (discratch, temp1));
> +   }
>    else if (icode == CODE_FOR_vsx_lxvrhx)
> -   emit_insn (gen_extendhidi2 (discratch, scratch));
> +   {
> + temp1  = simplify_gen_subreg (V8HImode, tiscratch, TImode, 0);
> + emit_insn (gen_vsx_sign_extend_hi_v2di (discratch, temp1));
> +   }
>    else if (icode == CODE_FOR_vsx_lxvrwx)
> -   emit_insn (gen_extendsidi2 (discratch, scratch));
> -  /*  Assign discratch directly if scratch is already DI.  */
> -  if (icode == CODE_FOR_vsx_lxvrdx)
> -   discratch = scratch;
> +   {
> + temp1  = simplify_gen_subreg (V4SImode, tiscratch, TImode, 0);
> + emit_insn (gen_vsx_sign_extend_si_v2di (discratch, temp1));
> +   }
> +  else if (icode == CODE_FOR_vsx_lxvrdx)
> +   discratch = simplify_gen_subreg (V2DImode, tiscratch, TImode, 0);
> +  else
> +   gcc_unreachable ();
>
> -  /* Emit the sign extension from DI (double) to TI (quad). */
> -  emit_insn (gen_extendditi2 (target, discratch));
> +  /* Emit the sign extension from V2DI (double) to TI (quad).  */
> +  temp2 = simplify_gen_subreg (TImode, discratch, V2DImode, 0);
> +  emit_insn (gen_extendditi2_vector (target, temp2));
>
>    return target;
>  }
> diff --git a/gcc/testsuite/gcc.target/powerpc/p10_vec_xl_sext.c 
> b/gcc/testsuite/gcc.target/powerpc/p10_vec_xl_sext.c
> new file mode 100644
> index 000..78e72ac5425
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/p10_vec_xl_sext.c
> @@ -0,0 +1,35 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target int128 } */
> +/* { dg-require-effective-target power10_ok } */
> +/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
> +
> +#include 
> +
> +vector signed __int128
> +foo1 (signed long a, signed char *b)
> +{
> +  return vec_xl_sext (a, b);
> +}
> +
> +vector signed __int128
> +foo2 (signed long a, signed short *b)
> +{
> +  return vec_xl_sext (a, b);
> +}
> +
> +vector signed __int128
> +foo3 (signed long a, signed int *b)
> +{
> +  return vec_xl_sext (a, b);
> +}
> +
> +vector signed __int128
> +foo4 (signed long a, signed long *b)
> +{
> +  return vec_xl_sext (a, b);
> +}
> +
> +/* { dg-final { scan-assembler-times {\mvextsd2q\M} 4 } } */
> +/* { dg-final { scan-assembler-times {\mvextsb2d\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvextsh2d\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mvextsw2d\M} 1 } } */


Re: [PATCH] Allow different vector types for stmt groups

2021-10-14 Thread Martin Jambor
Hi,

On Thu, Oct 14 2021, Richard Biener wrote:
> On Wed, 13 Oct 2021, Martin Jambor wrote:
>
>> Hi,
>> 
>> On Mon, Sep 27 2021, Richard Biener via Gcc-patches wrote:
>> >
>> [...]
>> >
>> > The following is what I have pushed after re-bootstrapping and testing
>> > on x86_64-unknown-linux-gnu.
>> >
>> > Richard.
>> >
>> > From fc335f9fde40d7a20a1a6e38fd6f842ed93a039e Mon Sep 17 00:00:00 2001
>> > From: Richard Biener 
>> > Date: Wed, 18 Nov 2020 09:36:57 +0100
>> > Subject: [PATCH] Allow different vector types for stmt groups
>> > To: gcc-patches@gcc.gnu.org
>> >
>> > This allows vectorization (in practice non-loop vectorization) to
>> > have a stmt participate in different vector type vectorizations.
>> > It allows us to remove vect_update_shared_vectype and replace it
>> > by pushing/popping STMT_VINFO_VECTYPE from SLP_TREE_VECTYPE around
>> > vect_analyze_stmt and vect_transform_stmt.
>> >
>> > For data-ref the situation is a bit more complicated since we
>> > analyze alignment info with a specific vector type in mind which
>> > doesn't play well when that changes.
>> >
>> > So the bulk of the change is passing down the actual vector type
>> > used for a vectorized access to the various accessors of alignment
>> > info, first and foremost dr_misalignment but also aligned_access_p,
>> > known_alignment_for_access_p, vect_known_alignment_in_bytes and
>> > vect_supportable_dr_alignment.  I took the liberty to replace
>> > ALL_CAPS macro accessors with the lower-case function invocations.
>> >
>> > The actual changes to the behavior are in dr_misalignment which now
>> > is the place factoring in the negative step adjustment as well as
>> > handling alignment queries for a vector type with bigger alignment
>> > requirements than what we can (or have) analyze(d).
>> >
>> > vect_slp_analyze_node_alignment makes use of this and upon receiving
>> > a vector type with a bigger alingment desire re-analyzes the DR
>> > with respect to it but keeps an older more precise result if possible.
>> > In this context it might be possible to do the analysis just once
>> > but instead of analyzing with respect to a specific desired alignment
>> > look for the biggest alignment we can compute a not unknown alignment.
>> >
>> > The ChangeLog includes the functional changes but not the bulk due
>> > to the alignment accessor API changes - I hope that's something good.
>> >
>> > 2021-09-17  Richard Biener  
>> >
>> >PR tree-optimization/97351
>> >PR tree-optimization/97352
>> >PR tree-optimization/82426
>> >* tree-vectorizer.h (dr_misalignment): Add vector type
>> >argument.
>> >(aligned_access_p): Likewise.
>> >(known_alignment_for_access_p): Likewise.
>> >(vect_supportable_dr_alignment): Likewise.
>> >(vect_known_alignment_in_bytes): Likewise.  Refactor.
>> >(DR_MISALIGNMENT): Remove.
>> >(vect_update_shared_vectype): Likewise.
>> >* tree-vect-data-refs.c (dr_misalignment): Refactor, handle
>> >a vector type with larger alignment requirement and apply
>> >the negative step adjustment here.
>> >(vect_calculate_target_alignment): Remove.
>> >(vect_compute_data_ref_alignment): Get explicit vector type
>> >argument, do not apply a negative step alignment adjustment
>> >here.
>> >(vect_slp_analyze_node_alignment): Re-analyze alignment
>> >when we re-visit the DR with a bigger desired alignment but
>> >keep more precise results from smaller alignments.
>> >* tree-vect-slp.c (vect_update_shared_vectype): Remove.
>> >(vect_slp_analyze_node_operations_1): Do not update the
>> >shared vector type on stmts.
>> >* tree-vect-stmts.c (vect_analyze_stmt): Push/pop the
>> >vector type of an SLP node to the representative stmt-info.
>> >(vect_transform_stmt): Likewise.
>> 
>> I have bisected an AMD zen2 10% performance regression of SPEC 2006 FP
>> 433.milc bechmark when compiled with -Ofast -march=native -flto to this
>> commit.  See also:
>> 
>>   
>> https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=412.70.0=289.70.0;
>> 
>> I am not sure if a bugzilla bug is in order because I cannot reproduce
>> the regression neither on an AMD zen3 machine nor on Intel CascadeLake,
>
> It's for sure worth a PR for tracking purposes.  But I've not been
> very successful in identifying regression causes on Zen2 - what perf
> points to is usually exactly the same assembly in both base and peak :/

OK, it's PR 102750 then.

Martin




[COMMIT] openmp: Mark declare variant directive as supported in Fortran

2021-10-14 Thread Kwok Cheung Yeung

Hello

As declare variant is now supported in the Fortran FE, I have removed 
the 'Only C and C++' for the declare variant entry in the list of OpenMP 
5.0 features supported. Committed as obvious.


Thanks

Kwok

From 2c4666fb0686a8f5a55821f1527351dc71c018b4 Mon Sep 17 00:00:00 2001
From: Kwok Cheung Yeung 
Date: Thu, 14 Oct 2021 09:29:13 -0700
Subject: [PATCH] openmp: Mark declare variant directive in documentation as
 supported in Fortran

2021-10-14  Kwok Cheung Yeung  

libgomp/
* libgomp.texi (OpenMP 5.0): Update entry for declare variant
directive.
---
 libgomp/libgomp.texi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libgomp/libgomp.texi b/libgomp/libgomp.texi
index bdd7e3ac442..af25e9df250 100644
--- a/libgomp/libgomp.texi
+++ b/libgomp/libgomp.texi
@@ -182,7 +182,7 @@ The OpenMP 4.5 specification is fully supported.
 @item Iterators @tab Y @tab
 @item @code{metadirective} directive @tab N @tab
 @item @code{declare variant} directive
-  @tab P @tab Only C and C++, simd traits not handled correctly
+  @tab P @tab simd traits not handled correctly
 @item @emph{target-offload-var} ICV and @code{OMP_TARGET_OFFLOAD}
   env variable @tab Y @tab
 @item Nested-parallel changes to @emph{max-active-levels-var} ICV @tab Y @tab
-- 
2.30.0.335.ge636282



Re: [PATCH] Allow `make tags` to work from top-level directory

2021-10-14 Thread Jeff Law via Gcc-patches




On 10/12/2021 1:56 PM, Eric Gallager wrote:

On Tue, Oct 12, 2021 at 9:18 AM Jeff Law  wrote:



On 10/11/2021 4:05 PM, Eric Gallager via Gcc-patches wrote:

On Thu, Oct 13, 2016 at 4:43 PM Eric Gallager  wrote:

On 10/13/16, Jeff Law  wrote:

On 10/06/2016 07:21 AM, Eric Gallager wrote:

The libdecnumber, libgcc, and libobjc subdirectories are missing TAGS
targets in their Makefiles. The attached patch causes them to be
skipped when running `make tags`.

ChangeLog entry:

2016-10-06  Eric Gallager  

   * Makefile.def: Mark libdecnumber, libgcc, and libobjc as missing
   TAGS target.
   * Makefile.in: Regenerate.


OK.  Please install.

Thanks,
Jeff


I'm still waiting to hear back from  about my request
for copyright assignment, which I'll need to get sorted out before I
can start committing stuff (like this patch).

Thanks,
Eric

Update: In the intervening years, I got my copyright assignment filed
and have recently become able to commit again; is your old approval
from 2016 still valid, Jeff, or do I need a re-approval?
Ref: https://gcc.gnu.org/legacy-ml/gcc-patches/2016-10/msg00370.html

It's still valid.  Just re-test and commit.

jeff

While re-testing, it seems that the `etags` command on my computer
can't be found any longer; I'm thinking gcc/Makefile.in should be
updated to stop hardcoding etags and use a variable that can be
overridden instead... should I do a separate patch for that, or
combine it with this one?

I think this will be simple enough that either way will be fine.

jeff


Re: [PATCH] libiberty: d-demangle: Add test case for function literals

2021-10-14 Thread Jeff Law via Gcc-patches




On 10/13/2021 12:43 PM, Luís Ferreira wrote:

Coverage tests don't include a case for function literals

Signed-off-by: Luís Ferreira 

libiberty/ChangeLog:

* testsuite/d-demangle-expected: add test case for function literals

Thanks.  Installed.
jeff



Re: [PATCH] libiberty: d-demangle: add test cases for simple special mangles

2021-10-14 Thread Jeff Law via Gcc-patches




On 10/13/2021 1:02 PM, Luís Ferreira wrote:

Simple mangled names (only with identifiers) are not being covered by coverage
tests.

Signed-off-by: Luís Ferreira 

libiberty/ChangeLog:

* testsuite/d-demangle-expected: add test cases for simple special 
mangles

Thanks.  Pushed to the trunk.
jeff



Re: [PATCH] Add ability to use full resolving path solver in the backward threader.

2021-10-14 Thread Jeff Law via Gcc-patches




On 10/14/2021 9:13 AM, Aldy Hernandez wrote:

The path solver runs in two modes: a quick mode which assumes any unknown
SSA names are VARYING, and a fully resolving mode using the ranger.

The backward threader currently uses the quick mode, because the fully
resolving mode was not available initially.  Since we now have the ability
in the solver (used by the hybrid threader), I thought it'd be useful to
have the knob for when the time comes.

Note that enabling the fully resolving mode will require some experimenting,
as enabling it would likely render other jump threading passes irrelevant
(VRP threading comes to mind).

There should be no functional changes as the resolver is set to false.

OK pending tests?

gcc/ChangeLog:

* tree-ssa-threadbackward.c (class back_threader): Add m_resolve.
(back_threader::back_threader): Same.
(back_threader::resolve_phi): Try to solve without looking back if
possible.
(back_threader::find_paths_to_names): Same.
(try_thread_blocks): Pass resolve argument to back threader.
(pass_early_thread_jumps::execute): Same.

OK
jeff



Re: [PATCH] Allow different vector types for stmt groups

2021-10-14 Thread Michael Matz via Gcc-patches
Hello,

On Thu, 14 Oct 2021, Richard Biener via Gcc-patches wrote:

> > I have bisected an AMD zen2 10% performance regression of SPEC 2006 FP
> > 433.milc bechmark when compiled with -Ofast -march=native -flto to this
> > commit.  See also:
> > 
> >   
> > https://lnt.opensuse.org/db_default/v4/SPEC/graph?plot.0=412.70.0=289.70.0;
> > 
> > I am not sure if a bugzilla bug is in order because I cannot reproduce
> > the regression neither on an AMD zen3 machine nor on Intel CascadeLake,
> 
> It's for sure worth a PR for tracking purposes.  But I've not been
> very successful in identifying regression causes on Zen2 - what perf
> points to is usually exactly the same assembly in both base and peak :/

Well, in this case it's at least a fairly impressive (40%) difference in 
samples for add_force_to_mom* :

> > BEFORE:
> > 18.47%105497  milc_peak.mine-  milc_peak.mine-lto-nat  [.] 
> > add_force_to_mom
> > 
> > AFTER:
> > 24.04%149010  milc_peak.mine-  milc_peak.mine-lto-nat  [.] 
> > add_force_to_mom

(as the change was in the vectorizer this shouldn't be different inlining 
decisions hopefully).


Ciao,
Michael.


PING Re: [Patch][doc][PR101843]clarification on building gcc and binutils together

2021-10-14 Thread John Henning via Gcc-patches
Hi Jeff, not sure what you mean by "all", please can you clarify?

On 9/23/21, 7:08 AM, "Gcc-patches on behalf of John Henning via Gcc-patches" 
 wrote:

Hello Jeff,

>I would strongly recommend removing all the documentation related to 
>single tree builds.  

Two questions:

(1) When you say "all", are you suggesting that in-the-gcc-tree builds of 
gmp, mpfr, mpc, and isl should no longer be documented?  Or only in-tree builds 
of binutils?

(2) Is there any truth to the suggestion (found in some google tracks) that 
when building a cross-compiler, it is easier to build binutils in the same 
tree?   For example

https://gcc.gnu.org/wiki/Building_Cross_Toolchains_with_gcc
https://www.gnu.org/software/gcc/simtest-howto.html 
https://stackoverflow.com/a/6228588

It is out of respect for existing user habit that I proposed merely 
demoting it to an "alternative" method (while "recommending" the separate 
build).  

   -john





[PATCH] Add ability to use full resolving path solver in the backward threader.

2021-10-14 Thread Aldy Hernandez via Gcc-patches
The path solver runs in two modes: a quick mode which assumes any unknown
SSA names are VARYING, and a fully resolving mode using the ranger.

The backward threader currently uses the quick mode, because the fully
resolving mode was not available initially.  Since we now have the ability
in the solver (used by the hybrid threader), I thought it'd be useful to
have the knob for when the time comes.

Note that enabling the fully resolving mode will require some experimenting,
as enabling it would likely render other jump threading passes irrelevant
(VRP threading comes to mind).

There should be no functional changes as the resolver is set to false.

OK pending tests?

gcc/ChangeLog:

* tree-ssa-threadbackward.c (class back_threader): Add m_resolve.
(back_threader::back_threader): Same.
(back_threader::resolve_phi): Try to solve without looking back if
possible.
(back_threader::find_paths_to_names): Same.
(try_thread_blocks): Pass resolve argument to back threader.
(pass_early_thread_jumps::execute): Same.
---
 gcc/tree-ssa-threadbackward.c | 41 ++-
 1 file changed, 35 insertions(+), 6 deletions(-)

diff --git a/gcc/tree-ssa-threadbackward.c b/gcc/tree-ssa-threadbackward.c
index 7c2c1112bee..8cc92a484fe 100644
--- a/gcc/tree-ssa-threadbackward.c
+++ b/gcc/tree-ssa-threadbackward.c
@@ -77,7 +77,7 @@ private:
 class back_threader
 {
 public:
-  back_threader (bool speed_p);
+  back_threader (bool speed_p, bool resolve);
   ~back_threader ();
   void maybe_thread_block (basic_block bb);
   bool thread_through_all_blocks (bool may_peel_loop_headers);
@@ -112,17 +112,22 @@ private:
   tree m_name;
   // Marker to differentiate unreachable edges.
   static const edge UNREACHABLE_EDGE;
+  // Set to TRUE if unknown SSA names along a path should be resolved
+  // with the ranger.  Otherwise, unknown SSA names are assumed to be
+  // VARYING.  Setting to true more precise but slower.
+  bool m_resolve;
 };
 
 // Used to differentiate unreachable edges, so we may stop the search
 // in a the given direction.
 const edge back_threader::UNREACHABLE_EDGE = (edge) -1;
 
-back_threader::back_threader (bool speed_p)
+back_threader::back_threader (bool speed_p, bool resolve)
   : m_profit (speed_p),
-m_solver (m_ranger, /*resolve=*/false)
+m_solver (m_ranger, resolve)
 {
   m_last_stmt = NULL;
+  m_resolve = resolve;
 }
 
 back_threader::~back_threader ()
@@ -295,7 +300,23 @@ back_threader::resolve_phi (gphi *phi, bitmap interesting)
 
  bitmap_set_bit (interesting, v);
  bitmap_set_bit (m_imports, v);
- done |= find_paths_to_names (e->src, interesting);
+
+ // When resolving unknowns, see if the incoming SSA can be
+ // resolved on entry without having to keep looking back.
+ bool keep_looking = true;
+ if (m_resolve)
+   {
+ m_path.safe_push (e->src);
+ if (maybe_register_path ())
+   {
+ keep_looking = false;
+ m_visited_bbs.add (e->src);
+   }
+ m_path.pop ();
+   }
+ if (keep_looking)
+   done |= find_paths_to_names (e->src, interesting);
+
  bitmap_clear_bit (interesting, v);
}
   else if (TREE_CODE (arg) == INTEGER_CST)
@@ -360,6 +381,14 @@ back_threader::find_paths_to_names (basic_block bb, bitmap 
interesting)
   return false;
 }
 
+  // Try to resolve the path with nothing but ranger knowledge.
+  if (m_resolve && m_path.length () > 1 && maybe_register_path ())
+{
+  m_path.pop ();
+  m_visited_bbs.remove (bb);
+  return true;
+}
+
   auto_bitmap processed;
   unsigned i;
   bool done = false;
@@ -936,7 +965,7 @@ static bool
 try_thread_blocks (function *fun)
 {
   /* Try to thread each block with more than one successor.  */
-  back_threader threader (/*speed_p=*/true);
+  back_threader threader (/*speed_p=*/true, /*resolve=*/false);
   basic_block bb;
   FOR_EACH_BB_FN (bb, fun)
 {
@@ -1005,7 +1034,7 @@ pass_early_thread_jumps::execute (function *fun)
   loop_optimizer_init (AVOID_CFG_MODIFICATIONS);
 
   /* Try to thread each block with more than one successor.  */
-  back_threader threader (/*speed_p=*/false);
+  back_threader threader (/*speed_p=*/false, /*resolve=*/false);
   basic_block bb;
   FOR_EACH_BB_FN (bb, fun)
 {
-- 
2.31.1



Re: [PATCH] rs6000: Remove unnecessary option manipulation.

2021-10-14 Thread Bill Schmidt via Gcc-patches
Hi Martin,

On 10/14/21 2:49 AM, Martin Liška wrote:
> Hello.
>
> There's follow up flag handling simplification based on
> 4ab51fa0e1705201420d87b601bd92bc643b3d52.
>
> Patch can bootstrap on ppc64le-linux-gnu and survives regression tests.
>
> Ready to be installed?
> Thanks,
> Martin
>
> gcc/ChangeLog:
>
> * config/rs6000/rs6000.c (rs6000_override_options_after_change):
> Do not set flag_rename_registers, it's already default behavior.
> Use EnabledBy for unroll_only_small_loops.
> * config/rs6000/rs6000.opt: Use EnabledBy for
> unroll_only_small_loops.
> ---
>  gcc/config/rs6000/rs6000.c   | 7 +--
>  gcc/config/rs6000/rs6000.opt | 2 +-
>  2 files changed, 2 insertions(+), 7 deletions(-)
>
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index acba4d9f26c..40146179e06 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -3472,13 +3472,8 @@ rs6000_override_options_after_change (void)
>    /* Explicit -funroll-loops turns -munroll-only-small-loops off, and
>   turns -frename-registers on.  */
>    if ((OPTION_SET_P (flag_unroll_loops) && flag_unroll_loops)
> -   || (OPTION_SET_P (flag_unroll_all_loops)
> -   && flag_unroll_all_loops))
> +   || (OPTION_SET_P (flag_unroll_all_loops && flag_unroll_all_loops)))

Looks like you got your parentheses wrong here.
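
Presumably the intended condition keeps each OPTION_SET_P check paired with
its own flag, as in the pre-patch form:

  if ((OPTION_SET_P (flag_unroll_loops) && flag_unroll_loops)
      || (OPTION_SET_P (flag_unroll_all_loops)
          && flag_unroll_all_loops))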

Thanks,
Bill

>  {
> -  if (!OPTION_SET_P (unroll_only_small_loops))
> -    unroll_only_small_loops = 0;
> -  if (!OPTION_SET_P (flag_rename_registers))
> -    flag_rename_registers = 1;
>    if (!OPTION_SET_P (flag_cunroll_grow_size))
>  flag_cunroll_grow_size = 1;
>  }
> diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
> index 9d7878f144a..faeb7423ca7 100644
> --- a/gcc/config/rs6000/rs6000.opt
> +++ b/gcc/config/rs6000/rs6000.opt
> @@ -546,7 +546,7 @@ Target Undocumented Var(rs6000_optimize_swaps) Init(1) 
> Save
>  Analyze and remove doubleword swaps from VSX computations.
>  
>  munroll-only-small-loops
> -Target Undocumented Var(unroll_only_small_loops) Init(0) Save
> +Target Undocumented Var(unroll_only_small_loops) Init(0) Save 
> EnabledBy(funroll-loops)
>  ; Use conservative small loop unrolling.
>  
>  mpower9-misc


[committed] Fix mips testsuite fallout from vectorizer changes

2021-10-14 Thread Jeff Law via Gcc-patches


The mips ports have been failing msa-insert-split.c since the change to 
enable the vectorizer at -O2.


This test expects to see insve instructions, which are used to move data around.  Those 
come as a result of the gimple optimizers using BIT_FIELD_REF expressions.


With vectorization we just emit a vector load and a vector store without 
any BIT_FIELD_REFs in gimple.  This translates into simple loads/stores 
and naturally the test fails.


Turning off the vectorizer seems like the right thing to do here.

Committed to the trunk,
Jeff
commit 8ececf9b8c5a2bf2f231db0c3bcf2be927990e7c
Author: Jeff Law 
Date:   Thu Oct 14 10:49:05 2021 -0400

Fix mips testsuite fallout from vectorizer changes

gcc/testsuite
* gcc.target/mips/msa-insert-split.c: Turn off vectorizer.

diff --git a/gcc/testsuite/gcc.target/mips/msa-insert-split.c 
b/gcc/testsuite/gcc.target/mips/msa-insert-split.c
index 50f3b8a61a1..9ad5987ac03 100644
--- a/gcc/testsuite/gcc.target/mips/msa-insert-split.c
+++ b/gcc/testsuite/gcc.target/mips/msa-insert-split.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-mfp64 -mhard-float -mmsa" } */
+/* { dg-options "-fno-tree-vectorize -mfp64 -mhard-float -mmsa" } */
 /* { dg-skip-if "code quality test" { *-*-* } { "-O0" } { "" } } */
 
 typedef double v2f64 __attribute__ ((vector_size (16)));


Re: [PATCH] aix: ensure reference to __tls_get_addr is in text section.

2021-10-14 Thread David Edelsohn via Gcc-patches
On Thu, Oct 14, 2021 at 10:10 AM CHIGOT, CLEMENT
 wrote:
>
> Hi David,
>
> The fact that csect .data is referencing csect .text doesn't mean that
> if .text is kept, .data is kept too.  It means the opposite: if .data is kept
> then .text must be kept.

Yes, we are in agreement about the purpose of the other reference
between text and data.

The __tls_get_addr reference is an effort to artificially elicit an
error if pthread is not linked in a program that uses TLS.  It is not
truly necessary for correct TLS code generation from GCC.

Your patch moves the __tls_get_addr referenced from the data section
to the text section.  As we agree, the logic of the other code implies
that the data section is used to pull in the text section, so moving
__tls_get_addr to the text section seems more fragile.  It's a game of
which section will be preserved to ensure that __tls_get_addr is
referenced to ensure that an artificial error is generated.  And it's
possible that neither text nor data sections will be referenced if
fine-grained CSECTs are used.  There is no way to make this logic
perfect.

Thanks, David


>
> That's actually what is being done by the linker with the TLS support test
> in configure.
> $ cat test.c
> __thread int a; int b; int main() { return a = b; }
>
> With ".ref __tls_get_addr" in .data:
> $ gcc -maix64 test.c -S -o test.s
> $ cat test.s
> ...
> _section_.text:
> .csect .data[RW],4
> .llong _section_.text
> .extern __tls_get_addr
> .ref __tls_get_addr
> $ gcc -maix64 test.s -o test
> $ dump -X64 -tv test
> ...
> [142]   m   0x0097 debug 0FILEC:PPC64 test.c
> [143]   m   0x16c0 .text 1  unamex.text
> [144]   a4  0x005c   00 SD   PR--
> [147]   m   0x20001298  .bss 1  externb
> [148]   a4  0x0004   00 CM   BS--
> [149]   m   0x8800 .tbss 1  externa
> [150]   a4  0x0004   00 CM   UL--
> ...
>
> Csect .data is garbage-collected by the linker. Thus the .ref doesn't matter.
>
> With ".ref __tls_get_addr" in .text:
> $ cat test.s
> _section_.text:
> .csect .data[RW],4
> .llong _section_.text
> .csect .text[PR],5
> .extern __tls_get_addr
> .ref __tls_get_addr
> $ gcc  -maix64 test.s -o test
> ld: 0711-317 ERROR: Undefined symbol: __tls_get_addr
> ld: 0711-345 Use the -bloadmap or -bnoquiet option to obtain more information.
> collect2: error: ld returned 8 exit status
>
> As csect .text is kept (because of the main function), the .ref is still
> there and the error is raised correctly.  As "-pthread" isn't passed,
> __tls_get_addr is not available.
>
> However, writing this mail, I'm wondering if we don't want to always keep both
> csects. If .data is kept, then .text is and if .text is kept, then .data is.
> Or would always keeping .data have too many side effects?
>
> Thanks,
> Clément
>
> 
> From: David Edelsohn 
> Sent: Thursday, October 14, 2021 3:42 PM
> To: CHIGOT, CLEMENT 
> Cc: gcc-patches@gcc.gnu.org 
> Subject: Re: [PATCH] aix: ensure reference to __tls_get_addr is in text 
> section.
>
> Caution! External email. Do not open attachments or click links, unless this 
> email comes from a known sender and you know the content is safe.
>
> The reference to __tls_get_addr is in the data section.  And the code
> just above creates a symbol in the text section referenced from the
> data section to ensure the text section is retained.  So this change
> doesn't make sense.  You're essentially saying that the data section
> is not used, which makes the other code useless to ensure that the
> text section is referenced.
>
> Thanks, David
>
> On Thu, Oct 14, 2021 at 3:06 AM CHIGOT, CLEMENT  
> wrote:
> >
> > The garbage collector of AIX linker might remove the reference to
> > __tls_get_addr if it's added inside an unused csect.
> >
> >
> > Clément Chigot
> > ATOS Bull SAS
> > 1 rue de Provence - 38432 Échirolles - France
> >


Re: [PATCH 1/7] ifcvt: Check if cmovs are needed.

2021-10-14 Thread Richard Sandiford via Gcc-patches
Robin Dapp  writes:
> Hi Richard,
>
>> (2) Insert:
>> 
>>  if (SUBREG_P (src))
>>src = SUBREG_REG (src);
>> 
>> here.
>> 
>> OK with those changes if that works.  Let me know if they don't —
>> I'll try to be quicker with the next review.
>
> thank you, this looks good in a first testsuite run on s390.  If you 
> have time, would you mind looking at the other outstanding patches of 
> this series as well? In case of further comments, which I am sure there 
> will be, I could test them all at once afterwards.

Which ones still need review?  I think 2/7 and 3/7 are approved,
but 4/7 was still being discussed.  AFAICT the last message on
that was:

   https://gcc.gnu.org/pipermail/gcc-patches/2021-August/576865.html

We probably need to reach a conclusion on that before 5/7.
6/7 and 7/7 looked to be s390-specific.

Thanks,
Richard


Re: [PATCH] Cleanup --params for backward threader.

2021-10-14 Thread Jeff Law via Gcc-patches




On 10/14/2021 8:25 AM, Aldy Hernandez wrote:

The new backward threader makes some of the --param knobs used to
control it questionable at best or no longer applicable at worst.

The fsm-maximum-phi-arguments param is unused and can be removed.

The max-fsm-thread-length param is block based which is a bit redundant,
since we already restrict paths based on instruction estimates.

The max-fsm-thread-paths restricts the total number of threadable paths
in a function.  We probably don't need this.  Besides, the forward
threader has no such restriction.

OK pending tests?

gcc/ChangeLog:

* doc/invoke.texi: Remove max-fsm-thread-length,
max-fsm-thread-paths, and fsm-maximum-phi-arguments.
* params.opt: Same.
* tree-ssa-threadbackward.c (back_threader::back_threader): Remove
argument.
(back_threader_registry::back_threader_registry): Same.
(back_threader_profitability::profitable_path_p): Remove
param_max_fsm_thread-length.
(back_threader_registry::register_path): Remove
m_max_allowable_paths.
OK.  I don't think any of those params were addressing pathological 
cases and thread-length is really handled better by costing based on the 
# of statements.


jeff



Re: [PATCH] Fix handling of flag_rename_registers.

2021-10-14 Thread Jeff Law via Gcc-patches




On 10/13/2021 4:02 AM, Martin Liška wrote:


works and that it is not somehow dependent on ordering?  Otherwise we have to
use EnabledBy(funroll-loops,funroll-all-loops) on frename-registers.  I guess
the EnabledBy doesn't work if the target decides to set flag_unroll_loop in
one of its hooks rather than via its option table override?  (as said, this
is all a mess...)


It's a complete mess.  The only override happens in
rs6000_override_options_after_change.  I think it can also utilize EnabledBy,
but I would like to do it in a different patch.
So what's the preferred way to handle this?  We're in the process of 
evaluating -frename-registers on our target right now and, subject to 
verification of a couple of issues, our inclination is to turn it on for 
our target at -O2.


Jeff



[PATCH] Cleanup --params for backward threader.

2021-10-14 Thread Aldy Hernandez via Gcc-patches
The new backward threader makes some of the --param knobs used to
control it questionable at best or no longer applicable at worst.

The fsm-maximum-phi-arguments param is unused and can be removed.

The max-fsm-thread-length param is block based which is a bit redundant,
since we already restrict paths based on instruction estimates.

The max-fsm-thread-paths restricts the total number of threadable paths
in a function.  We probably don't need this.  Besides, the forward
threader has no such restriction.

OK pending tests?

gcc/ChangeLog:

* doc/invoke.texi: Remove max-fsm-thread-length,
max-fsm-thread-paths, and fsm-maximum-phi-arguments.
* params.opt: Same.
* tree-ssa-threadbackward.c (back_threader::back_threader): Remove
argument.
(back_threader_registry::back_threader_registry): Same.
(back_threader_profitability::profitable_path_p): Remove
param_max_fsm_thread-length.
(back_threader_registry::register_path): Remove
m_max_allowable_paths.
---
 gcc/doc/invoke.texi   | 12 
 gcc/params.opt| 12 
 gcc/tree-ssa-threadbackward.c | 27 +++
 3 files changed, 3 insertions(+), 48 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 03234c887dc..69993270b39 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -14464,14 +14464,6 @@ Emit instrumentation calls to __tsan_func_entry() and 
__tsan_func_exit().
 Maximum number of instructions to copy when duplicating blocks on a
 finite state automaton jump thread path.
 
-@item max-fsm-thread-length
-Maximum number of basic blocks on a finite state automaton jump thread
-path.
-
-@item max-fsm-thread-paths
-Maximum number of new jump thread paths to create for a finite state
-automaton.
-
 @item parloops-chunk-size
 Chunk size of omp schedule for loops parallelized by parloops.
 
@@ -14626,10 +14618,6 @@ The maximum depth of recursive inlining for non-inline 
functions.
 Scale factor to apply to the number of statements in a threading path
 when comparing to the number of (scaled) blocks.
 
-@item fsm-maximum-phi-arguments
-Maximum number of arguments a PHI may have before the FSM threader
-will not try to thread through its block.
-
 @item uninit-control-dep-attempts
 Maximum number of nested calls to search for control dependencies
 during uninitialized variable analysis.
diff --git a/gcc/params.opt b/gcc/params.opt
index 84d642d72c5..06a6fdc9deb 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -173,10 +173,6 @@ Common Joined UInteger Var(param_ranger_logical_depth) 
Init(6) IntegerRange(1, 9
 Maximum depth of logical expression evaluation ranger will look through when
 evaluating outgoing edge ranges.
 
--param=fsm-maximum-phi-arguments=
-Common Joined UInteger Var(param_fsm_maximum_phi_arguments) Init(100) 
IntegerRange(1, 99) Param Optimization
-Maximum number of arguments a PHI may have before the FSM threader will not 
try to thread through its block.
-
 -param=fsm-scale-path-blocks=
 Common Joined UInteger Var(param_fsm_scale_path_blocks) Init(3) 
IntegerRange(1, 10) Param Optimization
 Scale factor to apply to the number of blocks in a threading path when 
comparing to the number of (scaled) statements.
@@ -537,18 +533,10 @@ The maximum number of nested indirect inlining performed 
by early inliner.
 Common Joined UInteger Var(param_max_fields_for_field_sensitive) Param
 Maximum number of fields in a structure before pointer analysis treats the 
structure as a single variable.
 
--param=max-fsm-thread-length=
-Common Joined UInteger Var(param_max_fsm_thread_length) Init(10) 
IntegerRange(1, 99) Param Optimization
-Maximum number of basic blocks on a finite state automaton jump thread path.
-
 -param=max-fsm-thread-path-insns=
 Common Joined UInteger Var(param_max_fsm_thread_path_insns) Init(100) 
IntegerRange(1, 99) Param Optimization
 Maximum number of instructions to copy when duplicating blocks on a finite 
state automaton jump thread path.
 
--param=max-fsm-thread-paths=
-Common Joined UInteger Var(param_max_fsm_thread_paths) Init(50) 
IntegerRange(1, 99) Param Optimization
-Maximum number of new jump thread paths to create for a finite state automaton.
-
 -param=max-gcse-insertion-ratio=
 Common Joined UInteger Var(param_max_gcse_insertion_ratio) Init(20) Param 
Optimization
 The maximum ratio of insertions to deletions of expressions in GCSE.
diff --git a/gcc/tree-ssa-threadbackward.c b/gcc/tree-ssa-threadbackward.c
index 1999ccf4834..7c2c1112bee 100644
--- a/gcc/tree-ssa-threadbackward.c
+++ b/gcc/tree-ssa-threadbackward.c
@@ -52,12 +52,11 @@ along with GCC; see the file COPYING3.  If not see
 class back_threader_registry
 {
 public:
-  back_threader_registry (int max_allowable_paths);
+  back_threader_registry ();
   bool register_path (const vec &, edge taken);
   bool thread_through_all_blocks (bool may_peel_loop_headers);
 private:
   

[PATCH] Minor cleanups to backward threader.

2021-10-14 Thread Aldy Hernandez via Gcc-patches
I will commit this as obvious pending tests on x86-64 Linux.
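
For context: auto_bitmap is an RAII wrapper, so the explicit BITMAP_ALLOC /
BITMAP_FREE pair becomes unnecessary.  Roughly:

  {
    auto_bitmap imports;          // allocates the bitmap on construction
    bitmap_set_bit (imports, 1);  // usable wherever a bitmap is expected
  }                               // released automatically on destruction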

gcc/ChangeLog:

* tree-ssa-threadbackward.c (class back_threader): Make m_imports
an auto_bitmap.
(back_threader::~back_threader): Do not release m_path.
---
 gcc/tree-ssa-threadbackward.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/gcc/tree-ssa-threadbackward.c b/gcc/tree-ssa-threadbackward.c
index 496b68e0a82..1999ccf4834 100644
--- a/gcc/tree-ssa-threadbackward.c
+++ b/gcc/tree-ssa-threadbackward.c
@@ -105,7 +105,7 @@ private:
   hash_set m_visited_bbs;
   // The set of SSA names, any of which could potentially change the
   // value of the final conditional in a path.
-  bitmap m_imports;
+  auto_bitmap m_imports;
   // The last statement in the path.
   gimple *m_last_stmt;
   // This is a bit of a wart.  It's used to pass the LHS SSA name to
@@ -125,13 +125,10 @@ back_threader::back_threader (bool speed_p)
 m_solver (m_ranger, /*resolve=*/false)
 {
   m_last_stmt = NULL;
-  m_imports = BITMAP_ALLOC (NULL);
 }
 
 back_threader::~back_threader ()
 {
-  m_path.release ();
-  BITMAP_FREE (m_imports);
 }
 
 // Register the current path for jump threading if it's profitable to
-- 
2.31.1



Re: [RFC] More jump threading restrictions in the presence of loops.

2021-10-14 Thread Jeff Law via Gcc-patches




On 10/14/2021 3:25 AM, Aldy Hernandez wrote:

PING.

Note, that there are no PRs and nothing really dependent on this
patch.  This has just been suggested as the right thing to do wrt
loops.

This patch fixes 6 XFAILs in our testsuite and has the added
side-effect of fixing the aarch64 bootstrap problem (though the
problem in the uninit code is still there).

This is a fundamental change in what we've traditionally allowed for
jump threading, but I think it's a good thing.  It also paves the way
for combining the validity models for both the forward and the
backward threaders.

I am happy to field the PRs this may bring about, since every change
in the cost model has us questioning whether we should or shouldn't
thread a path.  But I may need some help from y'all if there's a
missing thread that causes a regression in some other pass.  That
being said, most of the issues that have come with the threader
changes haven't been because we thread less, but because we thread
more-- so perhaps restricting things is a good thing ;-).

Just chasing down fallout from the vectorizer changes first :-)

jeff



Re: [RFC] Replace VRP with EVRP passes

2021-10-14 Thread Jeff Law via Gcc-patches




On 10/13/2021 2:58 PM, Andrew MacLeod wrote:
As work has progressed, we're pretty close to being able to 
functionally replace VRP with another EVRP pass.  At least it seems 
close enough that we should discuss if that's something we might want 
to consider for this release.   Replacing just one of the 2 VRP passes 
is another option.


First, lets examine simplifications/folds.

Running over my set of 380 GCC source files, we see the following 
results for number of cases we currently get:


Number of EVRP cases :   5789
Number of VRP1 cases :   4913
Number of VRP2 cases :   279
 combined VRP1/2:   5192

The 2 passes of VRP get a total of 5192 cases.

If we run EVRP instead of each VRP pass, we get the following results:

Number of EVRP1 cases :   5789
Number of EVRP2 cases :   7521
Number of EVRP3 cases :   2240
 combined EVRP2/3:   9761

so the EVRP passes find an additional 4569 opportunities,  or 88% 
more. This initially surprised me until it occurred to me that this is 
probably due to the pruning required for ASSERT_EXPRs, which means it 
never had a chance to see some of those cases. Notice how the second 
pass appears far more effective now.
Wow.  It'd be nice to know for sure if it's the pruning, but the data is 
pretty clear, EVRP does a considerably better job.




Regarding what we would miss if we took VRP out, if we run  EVRP 
passes first then a VRP pass immediately after, we see what VRP finds 
that EVRP cannot:


Number of EVRP2 cases :  7521
Number of VRP1 cases :   11
Number of EVRP3 cases :  2269
Number of VRP2 cases :   54

I have looked at some of these, and so far they all appear to be cases 
which are solved via the iteration model VRP uses.  Regardless, missing 
65 cases and getting 4569 new ones would seem to be a win. I will 
continue to investigate them.
Well, that would seem to support our long-held belief that the iteration 
step just isn't that important.  Good to finally get a confirmation.





== Performance ==

The threading work has been pulled out of VRP, so we get a better idea 
of what VRP's time really is.  We're showing about a 58% slowdown in VRP 
over the 2 passes.  I've begun investigating because it shouldn't be 
off by that much; I'm seeing a lot of excess time being wasted with 
callback queries from the substitute_and_fold engine when processing 
PHIs.  It should be possible to bring this all back in line as that 
isn't the model ranger should be using anyway.


I figured while I'm looking into the performance side of it, maybe we 
should start talking about whether we want to replace one or both of 
the VRP passes with an EVRP instance.



I see 3 primary options:
1 - replace both VRP passes with EVRP instances.
2 - replace VRP2 with EVRP2
3 - Replace neither, leave it as is.

I figure since the second pass of VRP doesn't get a lot to start with, 
it probably doesn't make sense to replace VRP1 and not VRP2.


Option 1 is what I would expect the strategic move next release to be; 
it seems ready now, and it's just a matter of whether we want to give it 
more time.  It would also be trivial to turn VRP back on for one or 
both later in the cycle if we determine there was something important 
missing.


Option 2 is something we really ought to consider if we don't want to 
do option 1.  There are a few PRs starting to open that have VRP not 
getting something due to the whims of more precise multi-ranges being 
converted back to a value_range, and replacing VRP2 would allow us to 
catch those.  Plus, we pick up a lot more than VRP2 does.


I would propose we add a param, similar to what EVRP has, which will 
allow us to choose which pass is called for VRP1 and VRP2, and set our 
defaults appropriately.  I wouldn't work with a hybrid like we did 
with EVRP... just choose which pass runs.  And we'll have to adjust 
some testcases based on whatever our default is.


Thoughts?

Personally I think we should give option 1 a go, and if something shows 
up over the next couple of months, or we can't get performance in line 
with where we want it, then we can switch back to VRP for one or both 
passes.  I wouldn't expect either, but one never knows :-)


If that isn't palatable for everyone, then I'd suggest option 2

I'd be happy with #1 or #2.   The only one I don't like is #3.

Jeff


Re: [PATCH 1/7] ifcvt: Check if cmovs are needed.

2021-10-14 Thread Robin Dapp via Gcc-patches

Hi Richard,


(2) Insert:

 if (SUBREG_P (src))
   src = SUBREG_REG (src);

here.

OK with those changes if that works.  Let me know if they don't —
I'll try to be quicker with the next review.


thank you, this looks good in a first testsuite run on s390.  If you 
have time, would you mind looking at the other outstanding patches of 
this series as well? In case of further comments, which I am sure there 
will be, I could test them all at once afterwards.


Thanks again
 Robin


[committed] libstdc++: Fix brainwrong in path::_S_convert(T) [PR102743]

2021-10-14 Thread Jonathan Wakely via Gcc-patches

On 13/10/21 20:41 +0100, Jonathan Wakely wrote:

Adjust the __detail::__effective_range overloads so they always return a
string or string view using std::char_traits, because we don't care
about the traits of an incoming string.

Use std::contiguous_iterator in the __effective_range(const Source&)
overload, to allow returning a basic_string_view in more cases. For the
non-contiguous cases in both __effective_range and __string_from_range,
return a std::string instead of std::u8string when the value type of the
range is char8_t.  These changes avoid unnecessary basic_string
temporaries.

Also simplify __string_from_range(Iter, Iter) to not need
std::__to_address for the contiguous case.

Combine the _S_convert(string_type) and _S_convert(const T&) overloads
into a single _S_convert(T) function which also avoids the dangling
view problem of PR 102592 (should that recur somehow).

libstdc++-v3/ChangeLog:

* include/bits/fs_path.h (__detail::__is_contiguous): New
variable template to identify contiguous iterators.
(__detail::__unified_char8_t): New alias template to decide when
to treat char8_t as char without encoding conversion.
(__detail::__effective_range(const basic_string&)): Use
std::char_traits for returned string view.
(__detail::__effective_range(const basic_string_view&)):
Likewise.
(__detail::__effective_range(const Source&)): Use
__is_contiguous to detect more cases of contiguous iterators.
Use __unified_char8_t to return a std::string instead of
std::u8string.

Tested powerpc64le-linux. Committed to trunk.




template
  static auto
-  _S_convert(const _Tp& __str)
+  _S_convert(_Tp __str)
  {
-   if constexpr (is_same_v<_Tp, string_type>)
- return __str;
-   else if constexpr (is_same_v<_Tp, basic_string_view>)
- return __str;
-   else if constexpr (is_same_v)
- return basic_string_view(__str.data(), __str.size());
+   if constexpr (is_same_v<_Tp, typename _Tp::value_type>)
+ return __str; // No conversion needed.


Yikes, this condition is wrong! Causing PR 102743 on Windows.

Fixed by the attached patch, tested x86_64-linux and x86_64-mingw32,
pushed to trunk.
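
The shape of the mistake, reduced to a standalone sketch (std::wstring used
purely for illustration, not the library code itself):

  #include <string>
  #include <type_traits>

  // A string type is never the same as its own character type, so the buggy
  // condition could never be true for a string-like type ...
  static_assert(!std::is_same_v<std::wstring, std::wstring::value_type>);
  // ... whereas comparing the character type itself is what was intended.
  static_assert(std::is_same_v<std::wstring::value_type, wchar_t>);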

commit 5e3f88838994b67580594c4679c839fff7cdeba0
Author: Jonathan Wakely 
Date:   Thu Oct 14 13:20:57 2021

libstdc++: Fix brainwrong in path::_S_convert(T) [PR102743]

This function was supposed to check whether the parameter's value type
is the same as path::value_type, and therefore needs no conversion.
Instead it checks whether the parameter is the same as its own value
type, which is never true. This means we incorrectly return a string
view for the case where T is path::string_type, instead of just
returning the string itself. The only place that happens is
path::_S_convert_loc for Windows, where we call _S_convert with a
std::wstring rvalue.

This fixes the condition in _S_convert(T).

libstdc++-v3/ChangeLog:

PR libstdc++/102743
* include/bits/fs_path.h (path::_S_convert(T)): Fix condition
for returning the same string unchanged.

diff --git a/libstdc++-v3/include/bits/fs_path.h b/libstdc++-v3/include/bits/fs_path.h
index c51bfa3095a..a63e4b9ab07 100644
--- a/libstdc++-v3/include/bits/fs_path.h
+++ b/libstdc++-v3/include/bits/fs_path.h
@@ -625,7 +625,7 @@ namespace __detail
   static auto
   _S_convert(_Tp __str)
   {
-	if constexpr (is_same_v<_Tp, typename _Tp::value_type>)
+	if constexpr (is_same_v)
 	  return __str; // No conversion needed.
 #if !defined _GLIBCXX_FILESYSTEM_IS_WINDOWS && defined _GLIBCXX_USE_CHAR8_T
 	else if constexpr (is_same_v<_Tp, std::u8string>)


[committed] libstdc++: Use more descriptive feature test macro

2021-10-14 Thread Jonathan Wakely via Gcc-patches
The out-of-class definitions of the static constants are redundant if
the __cpp_inline_variables feature is supported, so use that macro to
decide whether to define them or not.
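
As a minimal sketch of why the out-of-class definitions become redundant
(hypothetical type, not from the library):

  struct S { static constexpr int x = 1; };  // implicitly inline in C++17

  #if ! __cpp_inline_variables
  constexpr int S::x;  // only needed when inline variables are unavailable
  #endif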

libstdc++-v3/ChangeLog:

* include/bits/regex.h: Check __cpp_inline_variables instead of
__cplusplus.

Tested powerpc64le-linux. Committed to trunk.

commit 3d95867ce6867239aa4ae69a9c82915660e1071d
Author: Jonathan Wakely 
Date:   Wed Oct 6 20:03:50 2021

libstdc++: Use more descriptive feature test macro

The out-of-class definitions of the static constants are redundant if
the __cpp_inline_variables feature is supported, so use that macro to
decide whether to define them or not.

libstdc++-v3/ChangeLog:

* include/bits/regex.h: Check __cpp_inline_variables instead of
__cplusplus.

diff --git a/libstdc++-v3/include/bits/regex.h 
b/libstdc++-v3/include/bits/regex.h
index a3990183580..785edc71800 100644
--- a/libstdc++-v3/include/bits/regex.h
+++ b/libstdc++-v3/include/bits/regex.h
@@ -807,7 +807,7 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
   _AutomatonPtr _M_automaton;
 };
 
-#if __cplusplus < 201703L
+#if ! __cpp_inline_variables
   template
 constexpr regex_constants::syntax_option_type
 basic_regex<_Ch, _Tr>::icase;


Re: [PATCH] aix: ensure reference to __tls_get_addr is in text section.

2021-10-14 Thread CHIGOT, CLEMENT via Gcc-patches
Hi David,

The fact that csect .data is referencing csect .text doesn't mean that
if .text is kept, .data is kept too.  It means the opposite: if .data is kept
then .text must be kept.

That's actually what is being done by the linker with the TLS support test
in configure.
$ cat test.c
__thread int a; int b; int main() { return a = b; }

With ".ref __tls_get_addr" in .data:
$ gcc -maix64 test.c -S -o test.s
$ cat test.s
...
_section_.text:
.csect .data[RW],4
.llong _section_.text
.extern __tls_get_addr
.ref __tls_get_addr
$ gcc -maix64 test.s -o test
$ dump -X64 -tv test
...
[142]   m   0x0097 debug 0FILEC:PPC64 test.c
[143]   m   0x16c0 .text 1  unamex.text
[144]   a4  0x005c   00 SD   PR--
[147]   m   0x20001298  .bss 1  externb
[148]   a4  0x0004   00 CM   BS--
[149]   m   0x8800 .tbss 1  externa
[150]   a4  0x0004   00 CM   UL--
...

Csect .data is garbage-collected by the linker. Thus the .ref doesn't matter.

With ".ref __tls_get_addr" in .text:
$ cat test.s
_section_.text:
.csect .data[RW],4
.llong _section_.text
.csect .text[PR],5
.extern __tls_get_addr
.ref __tls_get_addr
$ gcc  -maix64 test.s -o test
ld: 0711-317 ERROR: Undefined symbol: __tls_get_addr
ld: 0711-345 Use the -bloadmap or -bnoquiet option to obtain more information.
collect2: error: ld returned 8 exit status

As csect .text is kept (because of the main function), the .ref is still there
and the error is raised correctly.  As "-pthread" isn't passed, __tls_get_addr
is not available.

However, writing this mail, I'm wondering if we don't want to always keep both
csects. If .data is kept, then .text is and if .text is kept, then .data is.
Or would always keeping .data have too many side effects?

Thanks,
Clément


From: David Edelsohn 
Sent: Thursday, October 14, 2021 3:42 PM
To: CHIGOT, CLEMENT 
Cc: gcc-patches@gcc.gnu.org 
Subject: Re: [PATCH] aix: ensure reference to __tls_get_addr is in text section.

Caution! External email. Do not open attachments or click links, unless this 
email comes from a known sender and you know the content is safe.

The reference to __tls_get_addr is in the data section.  And the code
just above creates a symbol in the text section referenced from the
data section to ensure the text section is retained.  So this change
doesn't make sense.  You're essentially saying that the data section
is not used, which makes the other code useless to ensure that the
text section is referenced.

Thanks, David

On Thu, Oct 14, 2021 at 3:06 AM CHIGOT, CLEMENT  wrote:
>
> The garbage collector of AIX linker might remove the reference to
> __tls_get_addr if it's added inside an unused csect.
>
>
> Clément Chigot
> ATOS Bull SAS
> 1 rue de Provence - 38432 Échirolles - France
>


Re: [PATCH] rs6000: Fix memory leak in rs6000_density_test

2021-10-14 Thread David Edelsohn via Gcc-patches
On Thu, Oct 14, 2021 at 9:00 AM Richard Sandiford
 wrote:
>
> rs6000_density_test has an early exit test between a call
> to get_loop_body and the corresponding free.  This would
> lead to a memory leak if the early exit is taken.
>
> Tested on powerpc64le-linux-gnu.  It's obvious that moving the
> test avoids the leak, but there are multiple ways to write it,
> so: OK to install?

Thanks for noticing this and creating a patch.

This is okay.

Thanks, David

>
> Richard
>
>
> gcc/
> * config/rs6000/rs6000.c (rs6000_density_test): Move early
> exit test further up the function.
> ---
>  gcc/config/rs6000/rs6000.c | 17 -
>  1 file changed, 8 insertions(+), 9 deletions(-)
>
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index acba4d9f26c..01a95591a5d 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -5289,20 +5289,19 @@ struct rs6000_cost_data
>  static void
>  rs6000_density_test (rs6000_cost_data *data)
>  {
> -  struct loop *loop = data->loop_info;
> -  basic_block *bbs = get_loop_body (loop);
> -  int nbbs = loop->num_nodes;
> -  loop_vec_info loop_vinfo = loop_vec_info_for_loop (data->loop_info);
> -  int vec_cost = data->cost[vect_body], not_vec_cost = 0;
> -  int i, density_pct;
> -
>/* This density test only cares about the cost of vector version of the
>   loop, so immediately return if we are passed costing for the scalar
>   version (namely computing single scalar iteration cost).  */
>if (data->costing_for_scalar)
>  return;
>
> -  for (i = 0; i < nbbs; i++)
> +  struct loop *loop = data->loop_info;
> +  basic_block *bbs = get_loop_body (loop);
> +  int nbbs = loop->num_nodes;
> +  loop_vec_info loop_vinfo = loop_vec_info_for_loop (data->loop_info);
> +  int vec_cost = data->cost[vect_body], not_vec_cost = 0;
> +
> +  for (int i = 0; i < nbbs; i++)
>  {
>basic_block bb = bbs[i];
>gimple_stmt_iterator gsi;
> @@ -5322,7 +5321,7 @@ rs6000_density_test (rs6000_cost_data *data)
>  }
>
>free (bbs);
> -  density_pct = (vec_cost * 100) / (vec_cost + not_vec_cost);
> +  int density_pct = (vec_cost * 100) / (vec_cost + not_vec_cost);
>
>if (density_pct > rs6000_density_pct_threshold
>&& vec_cost + not_vec_cost > rs6000_density_size_threshold)


RE: [PATCH] arm: Remove add_stmt_cost hook

2021-10-14 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Richard Sandiford 
> Sent: Thursday, October 14, 2021 11:57 AM
> To: gcc-patches@gcc.gnu.org
> Cc: ni...@redhat.com; Richard Earnshaw ;
> Ramana Radhakrishnan ; Kyrylo
> Tkachov 
> Subject: [PATCH] arm: Remove add_stmt_cost hook
> 
> The arm implementation of add_stmt_cost was added alongside
> arm_builtin_vectorization_cost.  At that time it was necessary
> to override the latter when overriding the former, since
> default_add_stmt_cost didn't indirect through the
> builtin_vectorization_cost target hook:
> 
>   int stmt_cost = default_builtin_vectorization_cost (kind, vectype,
>   misalign);
> 
> That was fixed by:
> 
> 2014-06-06  Bingfeng Mei  
> 
>* targhooks.c (default_add_stmt_cost): Call target specific
>hook instead of default one.
> 
> so the arm definition of add_stmt_cost is now equivalent
> to the default.
> 
> Bootrapped & regression-tested on arm-linux-gnueabihf.  OK to install?
> 

Ok.
Thanks,
Kyrill

> Richard
> 
> 
> gcc/
>   * config/arm/arm.c (arm_add_stmt_cost): Delete.
>   (TARGET_VECTORIZE_ADD_STMT_COST): Delete.
> ---
>  gcc/config/arm/arm.c | 40 
>  1 file changed, 40 deletions(-)
> 
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index d8c5d2bc7db..e51f60a1841 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -304,11 +304,6 @@ static bool aarch_macro_fusion_pair_p (rtx_insn*,
> rtx_insn*);
>  static int arm_builtin_vectorization_cost (enum vect_cost_for_stmt
> type_of_cost,
>  tree vectype,
>  int misalign ATTRIBUTE_UNUSED);
> -static unsigned arm_add_stmt_cost (vec_info *vinfo, void *data, int count,
> -enum vect_cost_for_stmt kind,
> -struct _stmt_vec_info *stmt_info,
> -tree vectype, int misalign,
> -enum vect_cost_model_location where);
> 
>  static void arm_canonicalize_comparison (int *code, rtx *op0, rtx *op1,
>bool op0_preserve_value);
> @@ -769,8 +764,6 @@ static const struct attribute_spec
> arm_attribute_table[] =
>  #undef TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST
>  #define TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST \
>arm_builtin_vectorization_cost
> -#undef TARGET_VECTORIZE_ADD_STMT_COST
> -#define TARGET_VECTORIZE_ADD_STMT_COST arm_add_stmt_cost
> 
>  #undef TARGET_CANONICALIZE_COMPARISON
>  #define TARGET_CANONICALIZE_COMPARISON \
> @@ -12242,39 +12235,6 @@ arm_builtin_vectorization_cost (enum
> vect_cost_for_stmt type_of_cost,
>  }
>  }
> 
> -/* Implement targetm.vectorize.add_stmt_cost.  */
> -
> -static unsigned
> -arm_add_stmt_cost (vec_info *vinfo, void *data, int count,
> -enum vect_cost_for_stmt kind,
> -struct _stmt_vec_info *stmt_info, tree vectype,
> -int misalign, enum vect_cost_model_location where)
> -{
> -  unsigned *cost = (unsigned *) data;
> -  unsigned retval = 0;
> -
> -  if (flag_vect_cost_model)
> -{
> -  int stmt_cost = arm_builtin_vectorization_cost (kind, vectype, 
> misalign);
> -
> -  /* Statements in an inner loop relative to the loop being
> -  vectorized are weighted more heavily.  The value here is
> -  arbitrary and could potentially be improved with analysis.  */
> -  if (where == vect_body && stmt_info
> -   && stmt_in_inner_loop_p (vinfo, stmt_info))
> - {
> -   loop_vec_info loop_vinfo = dyn_cast (vinfo);
> -   gcc_assert (loop_vinfo);
> -   count *= LOOP_VINFO_INNER_LOOP_COST_FACTOR (loop_vinfo); /*
> FIXME.  */
> - }
> -
> -  retval = (unsigned) (count * stmt_cost);
> -  cost[where] += retval;
> -}
> -
> -  return retval;
> -}
> -
>  /* Return true if and only if this insn can dual-issue only as older.  */
>  static bool
>  cortexa7_older_only (rtx_insn *insn)


Re: [PATCH 2/3][vect] Consider outside costs earlier for epilogue loops

2021-10-14 Thread Andre Vieira (lists) via Gcc-patches

Hi,

I completely forgot I still had this patch out as well.  I grouped it 
together with the unrolling because it was what motivated the change, 
but it is actually more widely applicable and can be reviewed separately.


On 17/09/2021 16:32, Andre Vieira (lists) via Gcc-patches wrote:

Hi,

This patch changes the order in which we check outside and inside 
costs for epilogue loops.  This is to ensure that a predicated epilogue 
is more likely to be picked over an unpredicated one, since it saves 
having to enter a scalar epilogue loop.


gcc/ChangeLog:

    * tree-vect-loop.c (vect_better_loop_vinfo_p): Change how 
epilogue loop costs are compared.


[committed] Fix predcom-3.c on arc-elf after vectorizer changes

2021-10-14 Thread Jeff Law via Gcc-patches
The change to turn on the vectorizer by default has caused predcom-3 to 
fail on arc-elf.


The test seems to want to verify that the loop was unrolled 3 times as 
part of the predcom pass.  Not surprisingly, vectorization changes the 
number of iterations and thus impacts unrolling decisions.


Given the test is expecting a specific amount of unrolling which is tied 
to the iterations, ISTM the best option in this case is to disable 
vectorization for this test.


Committed to the trunk,
Jeff
commit ebdf180e15d0ae18bfcb2bd822d7f096743cd4fb
Author: Jeff Law 
Date:   Thu Oct 14 09:41:57 2021 -0400

Fix predcom-3.c on arc-elf after vectorizer changes

gcc/testsuite
* gcc.dg/tree-ssa/predcom-3.c: Disable vectorizer.

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/predcom-3.c 
b/gcc/testsuite/gcc.dg/tree-ssa/predcom-3.c
index 1174cd17eec..9abbe6c1380 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/predcom-3.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/predcom-3.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -funroll-loops --param max-unroll-times=8 
-fpredictive-commoning -fdump-tree-pcom-details -fno-tree-pre" } */
+/* { dg-options "-O2 -funroll-loops --param max-unroll-times=8 
-fpredictive-commoning -fdump-tree-pcom-details -fno-tree-pre 
-fno-tree-loop-vectorize" } */
 
 int a[1000], b[1000];
 


[COMMITTED] Add FIXME note to backward threader.

2021-10-14 Thread Aldy Hernandez via Gcc-patches
There's a limitation in the path discovery bits in the backward
threader that I ran into recently and I'd like to document it so we
don't lose track of it.

Basically we stop looking if we find a threadable path through a PHI,
without taking into account that there could be multiple
paths through a PHI that resolve the path.  For example:

x_5 = PHI <10(4), 20(5), ...>
if (x_5 > 5)

I don't remember how we ended up skipping this, but it could be existing
behavior from the old bits.  It probably skipped multiple threads
through a PHI since the generic copier couldn't re-use existing
threading paths anyhow.

Documenting for later fixing.
---
 gcc/tree-ssa-threadbackward.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/gcc/tree-ssa-threadbackward.c b/gcc/tree-ssa-threadbackward.c
index 28c7ef8c872..496b68e0a82 100644
--- a/gcc/tree-ssa-threadbackward.c
+++ b/gcc/tree-ssa-threadbackward.c
@@ -282,6 +282,13 @@ back_threader::resolve_phi (gphi *phi, bitmap interesting)
  continue;
}
 
+  // FIXME: We currently stop looking if we find a threadable path
+  // through a PHI.  This is pessimistic, as there can be multiple
+  // paths that can resolve the path.  For example:
+  //
+  // x_5 = PHI <10(4), 20(5), ...>
+  // if (x_5 > 5)
+
   tree arg = gimple_phi_arg_def (phi, i);
   if (TREE_CODE (arg) == SSA_NAME)
{
-- 
2.31.1



Re: [PATCH] aix: ensure reference to __tls_get_addr is in text section.

2021-10-14 Thread David Edelsohn via Gcc-patches
The reference to __tls_get_addr is in the data section.  And the code
just above creates a symbol in the text section referenced from the
data section to ensure the text section is retained.  So this change
doesn't make sense.  You're essentially saying that the data section
is not used, which makes the other code useless to ensure that the
text section is referenced.

Thanks, David

On Thu, Oct 14, 2021 at 3:06 AM CHIGOT, CLEMENT  wrote:
>
> The garbage collector of AIX linker might remove the reference to
> __tls_get_addr if it's added inside an unused csect.
>
>
> Clément Chigot
> ATOS Bull SAS
> 1 rue de Provence - 38432 Échirolles - France
>


Re: [PATCH][RFC] Introduce TREE_AOREFWRAP to cache ao_ref in the IL

2021-10-14 Thread Michael Matz via Gcc-patches
Hello,

On Thu, 14 Oct 2021, Richard Biener wrote:

> > So, at _this_ write-through of the email I think I like the above idea 
> > best: make ao_ref be a tree (at least its storage, because it currently 
> > is a one-member-function class), make ao_ref.volatile_p be 
> > tree_base.volatile_flag (hence TREE_VOLATILE(ao_ref)) (this reduces 
> > sizeof(ao_ref) by 8), increase all nr-of-operand of each tcc_reference by 
> > 1, and make TREE_AO_REF(reftree) be "TREE_OPERAND(reftree, 
> > TREE_CODE_LENGTH(reftree) - 1)", i.e. the last operand of such 
> > tcc_reference tree.
> 
> Hmm.  I'm not sure that's really something I like - it's especially
> quite some heavy lifting while at the same time lacking true boldness
> as to changing the representation of memory refs ;)

Well, it would at least enable such changes later in an orderly fashion.

> That said - I've prototyped the TREE_ASM_WRITTEN way now because it's 
> even simpler than the original TREE_AOREFWRAP approach, see below.
> 
> Note that I'm not embedding it into the tree structure, I'm merely
> using the same allocation to store two objects, the outermost ref
> and the ao_ref associated with it.  Quote:
> 
> +  size_t length = tree_code_size (TREE_CODE (lhs));
> +  if (!TREE_ASM_WRITTEN (lhs))
> +{
> +  tree alt_lhs
> +   = ggc_alloc_cleared_tree_node_stat (length + sizeof (ao_ref));
> +  memcpy (alt_lhs, lhs, length);
> +  TREE_ASM_WRITTEN (alt_lhs) = 1;
> +  *ref = new ((char *)alt_lhs + length) ao_ref;

You need to ensure that alt_lhs+length is properly aligned for ao_ref, but 
yeah, for a hack that works.  If you really want to go that way you need 
good comments about this hack.  It's really somewhat worrisome that the 
size of the allocation depends on a bit in tree_base.
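
One way to address the alignment point while keeping the same allocation
trick might be to pad the tree size before the placement new (untested
sketch, not part of the patch):

  size_t length = tree_code_size (TREE_CODE (lhs));
  /* Pad so the trailing ao_ref is suitably aligned.  */
  size_t pad = (length + alignof (ao_ref) - 1) & ~(alignof (ao_ref) - 1);
  tree alt_lhs
    = ggc_alloc_cleared_tree_node_stat (pad + sizeof (ao_ref));
  memcpy (alt_lhs, lhs, length);
  TREE_ASM_WRITTEN (alt_lhs) = 1;
  *ref = new ((char *)alt_lhs + pad) ao_ref;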

(It's a really cute hack that works as a micro optimization, the question 
is, do we really need to go there already, are all other less hacky 
approaches not bringing similar improvements?  The cuter the hacks the 
less often they pay off in the long run of production software :) )


Ciao,
Michael.


Re: [RFC] Replace VRP with EVRP passes

2021-10-14 Thread Andrew MacLeod via Gcc-patches

On 10/14/21 8:54 AM, Richard Biener wrote:

On Thu, Oct 14, 2021 at 2:46 PM Andrew MacLeod  wrote:


I always test ada, objc.. anything that can be built as part of my
normal build :-)  so yes.

As far as I am aware, we are not missing any symbolic/relational cases,
that has all been functioning for a couple of months.



I think option 2 would be safest at least so any important iteration/symbolic
work is done early.


Certainly works as a good first step.

So let's pull the trigger on that one?


OK, we'll put something together.



Re: [PATCH][RFC] Introduce TREE_AOREFWRAP to cache ao_ref in the IL

2021-10-14 Thread Richard Biener via Gcc-patches
On Wed, 13 Oct 2021, Michael Matz wrote:

> Hello,
> 
> [this is the fourth attempt to write a comment/review/opinion for this 
> ao_ref-in-tcc_reference, please accept some possible incoherence]
> 
> On Tue, 12 Oct 2021, Richard Biener via Gcc-patches wrote:
> 
> > This prototype hack introduces a new tcc_reference TREE_AOREFWRAP
> > which we can use to wrap a reference tree, recording the ao_ref
> > associated with it.  That comes in handy when trying to optimize
> > the constant factor involved with alias stmt walking (or alias
> queries in general) where there are parts that are linear in the 
> > reference expression complexity, namely get_ref_base_and_extent,
> > which shows up usually high on profiles.
> 
> So, generally I like storing things into the IL that are impossible to 
> (re-)discover.  Remembering things that are merely slowish to rediscover 
> is less clear: the increase of IL memory use and potentially the 
> necessary pointer chasing might just trade one clearly measurable slow 
> point (the rediscovery computation) for many slow points all over the 
> place (the pointer chasing/cache effects).  Here ...

Note putting the cache in the IL rather than on-the-side was to
improve this - notably the approach embedding the ao_ref within
the outermost ref allocation (the TREE_ASM_WRITTEN thing) would
be just a pointer add away from memory we access anyway.  It also
cuts down the cost of the cache query itself to a bit test in
cache rather than querying a hashtable (and the hashtable pointer).

But yes, embedding any sort of cache into the IL is questionable.

> > The following patch is minimal as to make tree-ssa.exp=ssa-fre-*
> > not ICE and make the testcases from PR28071 and PR39326 compile
> > successfully at -O1 (both testcases show a moderately high
> > load on alias stmt walking around 25%, resp. 34%).  With the
> > patch which makes use of the cache only from stmt_may_clobber_ref_p
> > for now the compile-time improves by 7%, resp. 19% which means
> > overall the idea might be worth pursuing.
> 
> ... you seem to have a point, though.  Also, I am of the opinion that our 
> gimple memrefs could be even fatter (and also encode things like 
> multi-dimensional accesses, either right from the source code or 
> discovered by index analysis), and your idea goes into that direction.

Agreed with the issue of our reference tree representation, not sure
if the cache really goes into the fixing direction though.

> So, yay for encoding memref properties into the IL, even though here it's 
> only a cache.  You solved the necessary invalidation already.  Perhaps 
> only partly, that will be seen once exposed to the wild.
> 
> So, the only question seems to be how to encode it: either by ...
> 
> > transparent by instead of wrapping the refs with another tree
> 
> ... this (wrapping) ...
> 
> > to reallocate the outermost handled_component_p (and only those),
> 
> ... or that (aggregation).  There is a third possibility if you want this 
> only in the gimple world (which is the case): encode it not in the trees 
> but in the gimple statements.  This sort of works easily for everything 
> except calls.  I will not consider this variant, nor the side table 
> implementation.

Yeah, I was not considering to use the GIMPLE stmt itself because
we can have two (for aggregate copies) and N (for calls and asms).
I'm still considering the side table though.

> 
> While writing this email I switched between more liking one or the other, 
> multiple times.  So, I'll write down some basic facts/requirements:
> 
> 1) You basically want to add stuff to an existing structure:
> (a) by wrapping: to work seamlessly the outer tree should have similar 
> enough properties to the inner tree (e.g. also be tcc_reference) to be 
> used interchangeably in most code, except that which needs to look at 
> the added stuff.
> (b) by aggregating the stuff into the existing structure itself: if you 
> need both structs (with and without stuff) the pure thing to do is to 
> actually create two structs, once with, once without stuff.
> 2) the added stuff is optional
> 3) we have multiple things (all tcc_reference) to which to add stuff
> 4) all tcc_reference are tree_exp, which has a variable number of operands,
>    which constrains things we can do naturally (e.g. we can't add stuff 
>after tree_exp, except by constraining the number of operands)
> 
> Considering this it seems that aggregation is worse: you basically double 
> the number of structure types (at least conceptually, if you go with your 
> bit-idea).  So, some idea of wrapping seems more natural.
> 
> (I think your idea of aggregation but going with a bit flag to indicate if 
> this tcc_reference is or isn't annotated, and therefore has things 
> allocated after the variable number of operands, is a terrible hack)
> 
> There is another possibility doing something like your bit-flag 
> aggregation but with fewer hackery: if ao_ref would be a tree 

Re: [PATCH] Add forgotten documentation of param ipa-cp-recursive-freq-factor

2021-10-14 Thread Martin Liška

On 10/14/21 15:03, Martin Jambor wrote:

OK for trunk?


Please push it as obvious.

Martin


[RFC] vect: Convert cost hooks to classes

2021-10-14 Thread Richard Sandiford via Gcc-patches
The current vector cost interface has a quite a bit of redundancy
built in.  Each target that defines its own hooks has to replicate
the basic unsigned[3] management.  Currently each target also
duplicates the cost adjustment for inner loops.

This patch instead defines a vector_costs class for holding
the scalar or vector cost and allows targets to subclass it.
There is then only one costing hook: to create a new costs
structure of the appropriate type.  Everything else can be
virtual functions, with common concepts implemented in the
base class rather than in each target's derivation.
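
Very roughly, the shape of the new class (simplified; the real declarations
are in the tree-vectorizer.h changes below, and the member names here are
only indicative):

  class vector_costs
  {
  public:
    virtual unsigned add_stmt_cost (int count, vect_cost_for_stmt kind,
                                    stmt_vec_info stmt_info, tree vectype,
                                    int misalign,
                                    vect_cost_model_location where);
    virtual void finish_cost ();
    virtual ~vector_costs () {}
  protected:
    unsigned m_costs[3];  /* prologue / body / epilogue buckets.  */
  };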

This might seem like excess C++-ification, but it shaves
~100 LOC.  I've also got some follow-on changes that become
significantly easier with this patch.  Maybe it could help
with things like weighting blocks based on frequency too.

This will clash with Andre's unrolling patches.  His patches
have priority so this patch should queue behind them.

The x86 and rs6000 parts fully convert to a self-contained class.
The equivalent aarch64 changes are more complex, so this patch
just does the bare minimum.  A later patch will rework the
aarch64 bits.

Tested on aarch64-linux-gnu, arm-linux-gnueabihf, x86_64-linux-gnu
and powerpc64le-linux-gnu.  WDYT?

Richard


gcc/
* target.def (targetm.vectorize.init_cost): Replace with...
(targetm.vectorize.create_costs): ...this.
(targetm.vectorize.add_stmt_cost): Delete.
(targetm.vectorize.finish_cost): Likewise.
(targetm.vectorize.destroy_cost_data): Likewise.
* doc/tm.texi.in (TARGET_VECTORIZE_INIT_COST): Replace with...
(TARGET_VECTORIZE_CREATE_COSTS): ...this.
(TARGET_VECTORIZE_ADD_STMT_COST): Delete.
(TARGET_VECTORIZE_FINISH_COST): Likewise.
(TARGET_VECTORIZE_DESTROY_COST_DATA): Likewise.
* doc/tm.texi: Regenerate.
* tree-vectorizer.h (vec_info::vec_info): Remove target_cost_data
parameter.
(vec_info::target_cost_data): Change from a void * to a vector_costs *.
(vector_costs): New class.
(init_cost): Take a vec_info and return a vector_costs.
(dump_stmt_cost): Remove data parameter.
(add_stmt_cost): Replace vinfo and data parameters with a vector_costs.
(add_stmt_costs): Likewise.
(finish_cost): Replace data parameter with a vector_costs.
(destroy_cost_data): Delete.
* tree-vectorizer.c (dump_stmt_cost): Remove data argument and
don't print it.
(vec_info::vec_info): Remove the target_cost_data parameter and
initialize the member variable to null instead.
(vec_info::~vec_info): Delete target_cost_data instead of calling
destroy_cost_data.
(vector_costs::add_stmt_cost): New function.
(vector_costs::finish_cost): Likewise.
(vector_costs::record_stmt_cost): Likewise.
(vector_costs::adjust_cost_for_freq): Likewise.
* tree-vect-loop.c (_loop_vec_info::_loop_vec_info): Update
call to vec_info::vec_info.
(vect_compute_single_scalar_iteration_cost): Update after above
changes to costing interface.
(vect_analyze_loop_operations): Likewise.
(vect_estimate_min_profitable_iters): Likewise.
(vect_analyze_loop_2): Initialize LOOP_VINFO_TARGET_COST_DATA
at the start_over point, where it needs to be recreated after
trying without slp.  Update retry code accordingly.
* tree-vect-slp.c (_bb_vec_info::_bb_vec_info): Update call
to vec_info::vec_info.
(vect_slp_analyze_operation): Update after above changes to costing
interface.
(vect_bb_vectorization_profitable_p): Likewise.
* targhooks.h (default_init_cost): Replace with...
(default_vectorize_create_costs): ...this.
(default_add_stmt_cost): Delete.
(default_finish_cost, default_destroy_cost_data): Likewise.
* targhooks.c (default_init_cost): Replace with...
(default_vectorize_create_costs): ...this.
(default_add_stmt_cost): Delete, moving logic to vector_costs instead.
(default_finish_cost, default_destroy_cost_data): Delete.
* config/aarch64/aarch64.c (aarch64_vector_costs): Inherit from
vector_costs.  Add a constructor.
(aarch64_init_cost): Replace with...
(aarch64_vectorize_create_costs): ...this.
(aarch64_add_stmt_cost): Replace with...
(aarch64_vector_costs::add_stmt_cost): ...this.  Use record_stmt_cost
to adjust the cost for inner loops.
(aarch64_finish_cost): Replace with...
(aarch64_vector_costs::finish_cost): ...this.
(aarch64_destroy_cost_data): Delete.
(TARGET_VECTORIZE_INIT_COST): Replace with...
(TARGET_VECTORIZE_CREATE_COSTS): ...this.
(TARGET_VECTORIZE_ADD_STMT_COST): Delete.
(TARGET_VECTORIZE_FINISH_COST): Likewise.
(TARGET_VECTORIZE_DESTROY_COST_DATA): Likewise.
* config/i386/i386.c 

[PATCH] Add forgotten documentation of param ipa-cp-recursive-freq-factor

2021-10-14 Thread Martin Jambor
Hi,

Martin Liška has noticed that I forgot to document the recently added
parameter in the invoke.texi documentation.  This patch fixes it.

Tested by running make info and make pdf and examining the output.

OK for trunk?

Thanks,

Martin


gcc/ChangeLog:

2021-10-14  Martin Jambor  

* doc/invoke.texi (Optimize Options): Add entry for
ipa-cp-recursive-freq-factor.
---
 gcc/doc/invoke.texi | 4 
 1 file changed, 4 insertions(+)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 03234c887dc..64347b1ceba 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -14246,6 +14246,10 @@ Maximum depth of recursive cloning for self-recursive 
function.
 Recursive cloning only when the probability of call being executed exceeds
 the parameter.
 
+@item ipa-cp-recursive-freq-factor
+The number of times interprocedural copy propagation expects recursive
+functions to call themselves.
+
 @item ipa-cp-recursion-penalty
 Percentage penalty the recursive functions will receive when they
 are evaluated for cloning.
-- 
2.33.0



[PATCH] rs6000: Fix memory leak in rs6000_density_test

2021-10-14 Thread Richard Sandiford via Gcc-patches
rs6000_density_test has an early exit test between a call
to get_loop_body and the corresponding free.  This would
lead to a memory leak if the early exit is taken.

Tested on powerpc64le-linux-gnu.  It's obvious that moving the
test avoids the leak, but there are multiple ways to write it,
so: OK to install?

Richard


gcc/
* config/rs6000/rs6000.c (rs6000_density_test): Move early
exit test further up the function.
---
 gcc/config/rs6000/rs6000.c | 17 -
 1 file changed, 8 insertions(+), 9 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index acba4d9f26c..01a95591a5d 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -5289,20 +5289,19 @@ struct rs6000_cost_data
 static void
 rs6000_density_test (rs6000_cost_data *data)
 {
-  struct loop *loop = data->loop_info;
-  basic_block *bbs = get_loop_body (loop);
-  int nbbs = loop->num_nodes;
-  loop_vec_info loop_vinfo = loop_vec_info_for_loop (data->loop_info);
-  int vec_cost = data->cost[vect_body], not_vec_cost = 0;
-  int i, density_pct;
-
   /* This density test only cares about the cost of vector version of the
  loop, so immediately return if we are passed costing for the scalar
  version (namely computing single scalar iteration cost).  */
   if (data->costing_for_scalar)
 return;
 
-  for (i = 0; i < nbbs; i++)
+  struct loop *loop = data->loop_info;
+  basic_block *bbs = get_loop_body (loop);
+  int nbbs = loop->num_nodes;
+  loop_vec_info loop_vinfo = loop_vec_info_for_loop (data->loop_info);
+  int vec_cost = data->cost[vect_body], not_vec_cost = 0;
+
+  for (int i = 0; i < nbbs; i++)
 {
   basic_block bb = bbs[i];
   gimple_stmt_iterator gsi;
@@ -5322,7 +5321,7 @@ rs6000_density_test (rs6000_cost_data *data)
 }
 
   free (bbs);
-  density_pct = (vec_cost * 100) / (vec_cost + not_vec_cost);
+  int density_pct = (vec_cost * 100) / (vec_cost + not_vec_cost);
 
   if (density_pct > rs6000_density_pct_threshold
   && vec_cost + not_vec_cost > rs6000_density_size_threshold)


[PATCH] tree-optimization/102659 - really avoid undef overflow in if-conversion

2021-10-14 Thread Richard Biener via Gcc-patches
This plugs the remaining hole of POINTER_PLUS_EXPR with undefined
overflow.  Unfortunately we have to go to some lengths to avoid
putting invariant conversions into the loop body, since that confuses
the vectorizer's gather/scatter discovery, which relies on identifying
an invariant component of plus and minus expressions.  We can
emit those in the loop preheader, but then we have to accept the
preheader being non-empty when looking for the LOOP_VECTORIZED
internal function call in the vectorizer.
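
As a hedged, standalone illustration (mine, not taken from the PR) of the
shape of code this is about: the pointer arithmetic below is conditional in
the source, but executes unconditionally once the body is if-converted, so
its undefined-overflow form has to be rewritten, with invariant conversions
going to the preheader rather than the loop body:

void
f (int *base, long *idx, int *out, int n, int *cond)
{
  for (int i = 0; i < n; i++)
    if (cond[i])
      out[i] = *(base + idx[i]);   /* guarded POINTER_PLUS_EXPR */
}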

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-10-14  Richard Biener  

PR tree-optimization/102659
* tree-if-conv.c (if_convertible_gimple_assign_stmt_p): Also
rewrite pointer typed undefined overflow operations.
(predicate_statements): Likewise.  Make sure to emit invariant
conversions in the preheader.
* tree-vectorizer.c (vect_loop_vectorized_call): Look through
non-empty preheaders.
* tree-data-ref.c (dr_analyze_indices): Strip useless
conversions to the MEM_REF base type.
---
 gcc/tree-data-ref.c   |  1 +
 gcc/tree-if-conv.c| 37 ++---
 gcc/tree-vectorizer.c |  3 ++-
 3 files changed, 33 insertions(+), 8 deletions(-)

diff --git a/gcc/tree-data-ref.c b/gcc/tree-data-ref.c
index 18307a554fc..57bac06242f 100644
--- a/gcc/tree-data-ref.c
+++ b/gcc/tree-data-ref.c
@@ -1370,6 +1370,7 @@ dr_analyze_indices (struct indices *dri, tree ref, edge 
nest, loop_p loop)
   tree op = TREE_OPERAND (ref, 0);
   tree access_fn = analyze_scalar_evolution (loop, op);
   access_fn = instantiate_scev (nest, loop, access_fn);
+  STRIP_NOPS (access_fn);
   if (TREE_CODE (access_fn) == POLYNOMIAL_CHREC)
{
  tree memoff = TREE_OPERAND (ref, 1);
diff --git a/gcc/tree-if-conv.c b/gcc/tree-if-conv.c
index 0b6b07cfac6..15dcc1e2b94 100644
--- a/gcc/tree-if-conv.c
+++ b/gcc/tree-if-conv.c
@@ -1047,7 +1047,8 @@ if_convertible_gimple_assign_stmt_p (gimple *stmt,
fprintf (dump_file, "tree could trap...\n");
   return false;
 }
-  else if (INTEGRAL_TYPE_P (TREE_TYPE (lhs))
+  else if ((INTEGRAL_TYPE_P (TREE_TYPE (lhs))
+   || POINTER_TYPE_P (TREE_TYPE (lhs)))
   && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (lhs))
   && arith_code_with_undefined_signed_overflow
(gimple_assign_rhs_code (stmt)))
@@ -2520,6 +2521,7 @@ predicate_statements (loop_p loop)
   for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi);)
{
 gassign *stmt = dyn_cast <gassign *> (gsi_stmt (gsi));
+ tree lhs;
  if (!stmt)
;
  else if (is_false_predicate (cond)
@@ -2574,15 +2576,36 @@ predicate_statements (loop_p loop)
 
  gsi_replace (&gsi, new_stmt, true);
}
- else if (INTEGRAL_TYPE_P (TREE_TYPE (gimple_assign_lhs (stmt)))
-  && TYPE_OVERFLOW_UNDEFINED
-   (TREE_TYPE (gimple_assign_lhs (stmt)))
+ else if (((lhs = gimple_assign_lhs (stmt)), true)
+  && (INTEGRAL_TYPE_P (TREE_TYPE (lhs))
+  || POINTER_TYPE_P (TREE_TYPE (lhs)))
+  && TYPE_OVERFLOW_UNDEFINED (TREE_TYPE (lhs))
   && arith_code_with_undefined_signed_overflow
(gimple_assign_rhs_code (stmt)))
{
  gsi_remove (&gsi, true);
- gsi_insert_seq_before (&gsi, rewrite_to_defined_overflow (stmt),
-GSI_LAST_NEW_STMT);
+ gimple_seq stmts = rewrite_to_defined_overflow (stmt);
+ bool first = true;
+ for (gimple_stmt_iterator gsi2 = gsi_start (stmts);
+  !gsi_end_p (gsi2);)
+   {
+ gassign *stmt2 = as_a <gassign *> (gsi_stmt (gsi2));
+ gsi_remove (&gsi2, false);
+ /* Make sure to move invariant conversions out of the
+loop.  */
+ if (CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (stmt2))
+ && expr_invariant_in_loop_p (loop,
+  gimple_assign_rhs1 (stmt2)))
+   gsi_insert_on_edge_immediate (loop_preheader_edge (loop),
+ stmt2);
+ else if (first)
+   {
+ gsi_insert_before (&gsi, stmt2, GSI_NEW_STMT);
+ first = false;
+   }
+ else
+   gsi_insert_after (&gsi, stmt2, GSI_NEW_STMT);
+   }
}
  else if (gimple_vdef (stmt))
{
@@ -2601,7 +2624,7 @@ predicate_statements (loop_p loop)
 gimple_assign_set_rhs1 (stmt, ifc_temp_var (type, rhs, &gsi));
  update_stmt (stmt);
}
- tree lhs = gimple_get_lhs (gsi_stmt (gsi));
+ lhs = gimple_get_lhs (gsi_stmt (gsi));
  if (lhs && TREE_CODE 

Re: [RFC] Replace VRP with EVRP passes

2021-10-14 Thread Richard Biener via Gcc-patches
On Thu, Oct 14, 2021 at 2:46 PM Andrew MacLeod  wrote:
>
> On 10/14/21 4:27 AM, Richard Biener wrote:
> > On Wed, Oct 13, 2021 at 10:58 PM Andrew MacLeod  wrote:
> >> As work has progressed, we're pretty close to being able to functionally
> >> replace VRP with another EVRP pass.  At least it seems close enough that
> >> we should discuss if that's something we might want to consider for this
> >> release.   Replacing just one of the 2 VRP passes is another option.
> >>
> >> First, lets examine simplifications/folds.
> >>
> >> Running over my set of 380 GCC source files, we see the following
> >> results for number of cases we currently get:
> >>
> >> Number of EVRP cases :   5789
> >> Number of VRP1 cases :   4913
> >> Number of VRP2 cases :   279
> >>combined VRP1/2:   5192
> >>
> >> The 2 passes of VRP get a total of 5192 cases.
> >>
> >> If we run EVRP instead of each VRP pass, we get the following results:
> >>
> >> Number of EVRP1 cases :   5789
> >> Number of EVRP2 cases :   7521
> >> Number of EVRP3 cases :   2240
> >>combined EVRP2/3:   9761
> >>
> >> so the EVRP passes find an additional 4569 opportunities,  or 88% more.
> >> This initially surprised me until it occurred to me that this is
> >> probably due to the pruning required for ASSERT_EXPRs, which means it
> >> never had a chance to see some of those cases. Notice how the second
> >> pass appears far more effective now.
> >>
> >> Regarding what we would miss if we took VRP out, if we run  EVRP passes
> >> first then a VRP pass immediately after, we see what VRP finds that EVRP
> >> cannot:
> >>
> >> Number of EVRP2 cases :  7521
> >> Number of VRP1 cases :   11
> >> Number of EVRP3 cases :  2269
> >> Number of VRP2 cases :   54
> >>
> >> I have looked at some of these, and so far they all appear to be cases
> >> which are solved via the iteration model VRP uses.
> > Most should be now handled by using SCEV analysis on PHIs rather
> > than VRP iteration so can you share an example where the iteration helps?
>
> I didn't delve too deeply since I was skimming for what's missing, and
> they all seemed to be in very large programs with cyclic phis feeding
> each other.
>
> I'll pick one shortly and do a deeper dive.
>
>
> >
> >> regardless, missing
> >> 65 cases and getting 4569 new ones would seem to be a win. I will
> >> continue to investigate them.
> >>
> >> == Performance ==
> >>
> >> The threading work has been pulled out of VRP, so we get a better idea
> >> of what VRP's time really is.  We're showing about a 58% slowdown in VRP
> >> over the 2 passes.  I've begun investigating because it shouldn't be off
> >> by that much; I'm seeing a lot of excess time being wasted with callback
> >> queries from the substitute_and_fold engine when processing PHIs.  It
> >> should be possible to bring this all back in line as that isn't the model
> >> ranger should be using anyway.
> >>
> >> I figured while I'm looking into the performance side of it, maybe we
> >> should start talking about whether we want to replace one or both of the
> >> VRP passes with an EVRP instance.
> >>
> >>
> >> I see 3 primary options:
> >> 1 - replace both VRP passes with EVRP instances.
> >> 2 - replace VRP2 with EVRP2
> >> 3 - Replace neither, leave it as is.
> >>
> >> I figure since the second pass of VRP doesn't get a lot to start with,
> >> it probably doesn't make sense to replace VRP1 and not VRP2.
> >>
> >> Option 1 is what I would expect the strategic move next release to be,
> >> it seems ready now, its just a matter of whether we want to give it more
> >> time.  It would also be trivial to turn VRP back on for one or both
> >> later in the cycle if we determine there was something important missing.
> >>
> >> option 2 is something we ought to really consider if we don't want to do
> >> option 1.  There are a few PRs that are starting to open that have VRP
> >> not getting something due to the whims of more precise multi-ranges
> >> being converted back to a value_range, and replacing VRP2 would allow us
> >> to catch those..  plus, we pick up a lot more than  VRP2 does.
> >>
> >> I would propose we add a param, similar to what EVRP has which will
> >> allow us to choose which pass is called for VRP1 and VRP2 and set our
> >> defaults appropriately.  I wouldn't work with a hybrid like we did with
> >> EVRP... just choose which pass runs. And we'll have to adjust some
> >> testcases based on whatever our default is.
> >>
> >> Thoughts?
> >>
> >> Personally I think we give option 1 a go and if something shows up over
> >> the next couple of months, or we can't get performance in line with
> >> where we want it, then we can switch back to VRP for one or both
> >> passes.  I wouldn't  expect either, but one never knows :-)
> >>
> >> If that isn't palatable for everyone, then I'd suggest option 2
> > How far are you with handling the symbolic range cases VRP gets
> > with the relation work?  The 

Re: [PATCH] Replace VRP threader with a hybrid forward threader.

2021-10-14 Thread Richard Biener via Gcc-patches
On Thu, Oct 14, 2021 at 2:29 PM Aldy Hernandez  wrote:
>
> On Mon, Sep 27, 2021 at 7:29 PM Richard Biener
>  wrote:
> >
> > On September 27, 2021 6:07:40 PM GMT+02:00, Aldy Hernandez via Gcc-patches 
> >  wrote:
> > >
> > >
> > >On 9/27/21 5:27 PM, Aldy Hernandez wrote:
> > >>
> > >>
> > >> On 9/27/21 5:01 PM, Jeff Law wrote:
> > >>>
> > >>>
> > >>> On 9/24/2021 9:46 AM, Aldy Hernandez wrote:
> > >
> > >>> And the big question, is the pass running after VRP2 doing anything
> > >>> particularly useful?  Do we want to try and kill it now, or later?
> > >>
> > >> Interesting question.  Perhaps if we convert DOM threading to a hybrid
> > >> model, it will render the post-VRP threader completely useless.  Huhh...
> > >> That could kill 2 birds with one stone... we get rid of a threading
> > >> pass, and we don't need to worry about as much about the super-fast 
> > >> ranger.
> > >
> > >These are just a few of the threading passes at -O2:
> > >
> > >a.c.192t.thread3   <-- bck threader
> > >a.c.193t.dom3  <-- fwd threader
> > >a.c.194t.strlen1
> > >a.c.195t.thread4   <-- bck threader
> > >a.c.196t.vrp2
> > >a.c.197t.vrp-thread2 <-- fwd threader
> > >
> > >That's almost 4 back to back threaders!
> > >
> > >*pause for effect*
> >
> > We've always known we have too many of these once Jeff triplicated all the 
> > backwards threading ones. I do hope we manage to reduce the number for GCC 
> > 12. Esp. If the new ones are slower because they no longer use simple 
> > lattices.
>
> By the way, what is the blessed way of knowing which of the N passes
> we are in?  For instance, there are 4 back threading passes (well 5
> with ethread).  I'd like to know how to know if I'm in the 4th one,
> which is the one that runs before VRP2 threading.  I know there's
> gimple_opt_pass::set_pass_param, but that seems to only take a bool.

There's no blessed way other than to make distinct passes (but you could
add multiple params emulating a binary encoding of a sequence number, eh).

But then I'd really just try removing either the threader before dom or the
one before vrp (I suppose removing the one before VRP makes most sense).

Richard.

>
> Thanks.
> Aldy
>


Re: [PATCH] openmp, fortran: Add support for declare variant in Fortran

2021-10-14 Thread Jakub Jelinek via Gcc-patches
On Thu, Oct 14, 2021 at 11:04:59AM +0100, Kwok Cheung Yeung wrote:
> I have now dropped this. This affects test2 in
> gfortran.dg/gomp/declare-variant-8.f90, which I have added a comment to.

Thanks.

> I have added Fortran-specific tests as
> gfortran.dg/gomp/declare-variant-15.f90 to declare-variant-19.f90.

What I still miss is tests for the (proc_name : variant_name) syntax
in places where proc_name : is optional, but is supplied and is valid, like
e.g. in interface, or in subroutine/function and where proc_name specifies
the name of the containing interface or subroutine/function.
I see that syntax tested in some places with dg-error on that line and
in places where it isn't optional (e.g. at module scope before contains).
But if you want, that can be added incrementally.

> From ab03cf08c6ee4a0a6323189313cae911483a2257 Mon Sep 17 00:00:00 2001
> From: Kwok Cheung Yeung 
> Date: Wed, 13 Oct 2021 22:39:20 +0100
> Subject: [PATCH] openmp, fortran: Add support for OpenMP declare variant
>  directive in Fortran
> 
> 2021-10-13  Kwok Cheung Yeung  
> 
> gcc/c-family/
> 
>   * c-omp.c (c_omp_check_context_selector): Rename to
>   omp_check_context_selector and move to omp-general.c.
>   (c_omp_mark_declare_variant): Rename to omp_mark_declare_variant and
>   move to omp-general.c.
> 
> gcc/c/
> 
>   * c-parser.c (c_finish_omp_declare_variant): Change call from
>   c_omp_check_context_selector to omp_check_context_selector. Change
>   call from c_omp_mark_declare_variant to omp_mark_declare_variant.
> 
> gcc/cp/
> 
>   * decl.c (omp_declare_variant_finalize_one): Change call from
>   c_omp_mark_declare_variant to omp_mark_declare_variant.
>   * parser.c (cp_finish_omp_declare_variant): Change call from
>   c_omp_check_context_selector to omp_check_context_selector.
> 
> gcc/fortran/
> 
>   * gfortran.h (enum gfc_statement): Add ST_OMP_DECLARE_VARIANT.
>   (enum gfc_omp_trait_property_kind): New.
>   (struct gfc_omp_trait_property): New.
>   (gfc_get_omp_trait_property): New macro.
>   (struct gfc_omp_selector): New.
>   (gfc_get_omp_selector): New macro.
>   (struct gfc_omp_set_selector): New.
>   (gfc_get_omp_set_selector): New macro.
>   (struct gfc_omp_declare_variant): New.
>   (gfc_get_omp_declare_variant): New macro.
>   (struct gfc_namespace): Add omp_declare_variant field.
>   (gfc_free_omp_declare_variant_list): New prototype.
>   * match.h (gfc_match_omp_declare_variant): New prototype.
>   * openmp.c (gfc_free_omp_trait_property_list): New.
>   (gfc_free_omp_selector_list): New.
>   (gfc_free_omp_set_selector_list): New.
>   (gfc_free_omp_declare_variant_list): New.
>   (gfc_match_omp_clauses): Add extra optional argument.  Handle end of
>   clauses for context selectors.
>   (omp_construct_selectors, omp_device_selectors,
>   omp_implementation_selectors, omp_user_selectors): New.
>   (gfc_match_omp_context_selector): New.
>   (gfc_match_omp_context_selector_specification): New.
>   (gfc_match_omp_declare_variant): New.
>   * parse.c: Include tree-core.h and omp-general.h.
>   (decode_omp_directive): Handle 'declare variant'.
>   (case_omp_decl): Include ST_OMP_DECLARE_VARIANT.
>   (gfc_ascii_statement): Handle ST_OMP_DECLARE_VARIANT.
>   (gfc_parse_file): Initialize omp_requires_mask.
>   * symbol.c (gfc_free_namespace): Call
>   gfc_free_omp_declare_variant_list.
>   * trans-decl.c (gfc_get_extern_function_decl): Call
>   gfc_trans_omp_declare_variant.
>   (gfc_create_function_decl): Call gfc_trans_omp_declare_variant.
>   * trans-openmp.c (gfc_trans_omp_declare_variant): New.
>   * trans-stmt.h (gfc_trans_omp_declare_variant): New prototype.
> 
> gcc/
> 
>   * omp-general.c (omp_check_context_selector):  Move from c-omp.c.
>   (omp_mark_declare_variant): Move from c-omp.c.
>   (omp_context_name_list_prop): Update for Fortran strings.
>   * omp-general.h (omp_check_context_selector): New prototype.
>   (omp_mark_declare_variant): New prototype.
> 
> gcc/testsuite/
> 
>   * gfortran.dg/gomp/declare-variant-1.f90: New test.
>   * gfortran.dg/gomp/declare-variant-10.f90: New test.
>   * gfortran.dg/gomp/declare-variant-11.f90: New test.
>   * gfortran.dg/gomp/declare-variant-12.f90: New test.
>   * gfortran.dg/gomp/declare-variant-13.f90: New test.
>   * gfortran.dg/gomp/declare-variant-14.f90: New test.
>   * gfortran.dg/gomp/declare-variant-15.f90: New test.
>   * gfortran.dg/gomp/declare-variant-16.f90: New test.
>   * gfortran.dg/gomp/declare-variant-17.f90: New test.
>   * gfortran.dg/gomp/declare-variant-18.f90: New test.
>   * gfortran.dg/gomp/declare-variant-19.f90: New test.
>   * gfortran.dg/gomp/declare-variant-2.f90: New test.
>   * gfortran.dg/gomp/declare-variant-2a.f90: New test.
>   * 

Re: [RFC] Replace VRP with EVRP passes

2021-10-14 Thread Andrew MacLeod via Gcc-patches

On 10/14/21 4:27 AM, Richard Biener wrote:

On Wed, Oct 13, 2021 at 10:58 PM Andrew MacLeod  wrote:

As work has progressed, we're pretty close to being able to functionally
replace VRP with another EVRP pass.  At least it seems close enough that
we should discuss if that's something we might want to consider for this
release.   Replacing just one of the 2 VRP passes is another option.

First, lets examine simplifications/folds.

Running over my set of 380 GCC source files, we see the following
results for number of cases we currently get:

Number of EVRP cases :   5789
Number of VRP1 cases :   4913
Number of VRP2 cases :   279
   combined VRP1/2:   5192

The 2 passes of VRP get a total of 5192 cases.

If we run EVRP instead of each VRP pass, we get the following results:

Number of EVRP1 cases :   5789
Number of EVRP2 cases :   7521
Number of EVRP3 cases :   2240
   combined EVRP2/3:   9761

so the EVRP passes find an additional 4569 opportunities,  or 88% more.
This initially surprised me until it occurred to me that this is
probably due to the pruning required for ASSERT_EXPRs, which means it
never had a chance to see some of those cases. Notice how the second
pass appears far more effective now.

Regarding what we would miss if we took VRP out, if we run  EVRP passes
first then a VRP pass immediately after, we see what VRP finds that EVRP
cannot:

Number of EVRP2 cases :  7521
Number of VRP1 cases :   11
Number of EVRP3 cases :  2269
Number of VRP2 cases :   54

I have looked at some of these, and so far they all appear to be cases
which are solved via the iteration model VRP uses.

Most should be now handled by using SCEV analysis on PHIs rather
than VRP iteration so can you share an example where the iteration helps?


I didn't delve too deeply since I was skimming for what's missing, and
they all seemed to be in very large programs with cyclic phis feeding 
each other.


I'll pick one shortly and do a deeper dive.





regardless, missing
65 cases and getting 4569 new ones would seem to be a win. I will
continue to investigate them.

== Performance ==

The threading work has been pulled out of VRP, so we get a better idea
of what VRP's time really is.  We're showing about a 58% slowdown in VRP
over the 2 passes.  I've begun investigating because it shouldn't be off
by that much; I'm seeing a lot of excess time being wasted with callback
queries from the substitute_and_fold engine when processing PHIs.  It
should be possible to bring this all back in line as that isn't the model
ranger should be using anyway.

I figured while I'm looking into the performance side of it, maybe we
should start talking about whether we want to replace one or both of the
VRP passes with an EVRP instance.


I see 3 primary options:
1 - replace both VRP passes with EVRP instances.
2 - replace VRP2 with EVRP2
3 - Replace neither, leave it as is.

I figure since the second pass of VRP doesn't get a lot to start with,
it probably doesn't make sense to replace VRP1 and not VRP2.

Option 1 is what I would expect the strategic move next release to be,
it seems ready now, its just a matter of whether we want to give it more
time.  It would also be trivial to turn VRP back on for one or both
later in the cycle if we determine there was something important missing.

option 2 is something we ought to really consider if we don't want to do
option 1.  There are a few PRs that are starting to open that have VRP
not getting something due to the whims of more precise multi-ranges
being converted back to a value_range, and replacing VRP2 would allow us
to catch those..  plus, we pick up a lot more than  VRP2 does.

I would propose we add a param, similar to what EVRP has which will
allow us to choose which pass is called for VRP1 and VRP2 and set our
defaults appropriately.  I wouldn't work with a hybrid like we did with
EVRP... just choose which pass runs. And we'll have to adjust some
testcases based on whatever our default is.

Thoughts?

Personally I think we give option 1 a go and if something shows up over
the next couple of months, or we can't get performance in line with
where we want it, then we can switch back to VRP for one or both
passes.  I wouldn't  expect either, but one never knows :-)

If that isn't palatable for everyone, then I'd suggest option 2

How far are you with handling the symbolic range cases VRP gets
with the relation work?  The symbolic range handling is important
for Ada IIRC.  I would of course have hoped the testsuite would
catch important regressions there (did you test Ada?)


I always test ada, objc.. anything that can be built as part of my 
normal build :-)  so yes.


As far as I am aware, we are not missing any symbolic/relational cases, 
that has all been functioning for a couple of months.




I think option 2 would be safest at least so any important iteration/symbolic
work is done early.


Certainly works as a good first step.


Re: [PATCH] aarch64: Fix pointer parameter type in LD1 Neon intrinsics

2021-10-14 Thread Richard Sandiford via Gcc-patches
Jonathan Wright via Gcc-patches  writes:
> The pointer parameter to load a vector of signed values should itself
> be a signed type. This patch fixes two instances of this unsigned-
> signed implicit conversion in arm_neon.h.
>
> Tested relevant intrinsics with -Wpointer-sign and warnings no longer
> present.
>
> Ok for master?
>
> Thanks,
> Jonathan
>
> ---
>
> gcc/ChangeLog:
>
> 2021-10-14  Jonathan Wright  
>
>   * config/aarch64/arm_neon.h (vld1_s8_x3): Use signed type for
>   pointer parameter.
>   (vld1_s32_x3): Likewise.

OK, thanks.

Richard

> diff --git a/gcc/config/aarch64/arm_neon.h b/gcc/config/aarch64/arm_neon.h
> index 
> 2d5bf34b698a88ed934c522cc9f14f125c604a39..24068f8d7da5b123360a479f66b53fe2d4eea28e
>  100644
> --- a/gcc/config/aarch64/arm_neon.h
> +++ b/gcc/config/aarch64/arm_neon.h
> @@ -16198,7 +16198,7 @@ vld1_u8_x3 (const uint8_t *__a)
>  
>  __extension__ extern __inline int8x8x3_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -vld1_s8_x3 (const uint8_t *__a)
> +vld1_s8_x3 (const int8_t *__a)
>  {
>int8x8x3_t __i;
>__builtin_aarch64_simd_ci __o;
> @@ -16250,7 +16250,7 @@ vld1_u32_x3 (const uint32_t *__a)
>  
>  __extension__ extern __inline int32x2x3_t
>  __attribute__ ((__always_inline__, __gnu_inline__, __artificial__))
> -vld1_s32_x3 (const uint32_t *__a)
> +vld1_s32_x3 (const int32_t *__a)
>  {
>int32x2x3_t __i;
>__builtin_aarch64_simd_ci __o;


[PATCH] aarch64: Fix pointer parameter type in LD1 Neon intrinsics

2021-10-14 Thread Jonathan Wright via Gcc-patches
The pointer parameter to load a vector of signed values should itself
be a signed type. This patch fixes two instances of this unsigned-
signed implicit conversion in arm_neon.h.
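
As a minimal illustration (my own example, not part of the patch), this now
compiles cleanly with -Wpointer-sign on AArch64, whereas the old
const uint8_t * prototype made the call warn about a pointer-sign mismatch:

#include <arm_neon.h>

int8x8x3_t
load_three (const int8_t *p)
{
  return vld1_s8_x3 (p);
}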

Tested relevant intrinsics with -Wpointer-sign and warnings no longer
present.

Ok for master?

Thanks,
Jonathan

---

gcc/ChangeLog:

2021-10-14  Jonathan Wright  

* config/aarch64/arm_neon.h (vld1_s8_x3): Use signed type for
pointer parameter.
(vld1_s32_x3): Likewise.


rb14933.patch
Description: rb14933.patch


Re: [PATCH] Replace VRP threader with a hybrid forward threader.

2021-10-14 Thread Aldy Hernandez via Gcc-patches
On Mon, Sep 27, 2021 at 7:29 PM Richard Biener
 wrote:
>
> On September 27, 2021 6:07:40 PM GMT+02:00, Aldy Hernandez via Gcc-patches 
>  wrote:
> >
> >
> >On 9/27/21 5:27 PM, Aldy Hernandez wrote:
> >>
> >>
> >> On 9/27/21 5:01 PM, Jeff Law wrote:
> >>>
> >>>
> >>> On 9/24/2021 9:46 AM, Aldy Hernandez wrote:
> >
> >>> And the big question, is the pass running after VRP2 doing anything
> >>> particularly useful?  Do we want to try and kill it now, or later?
> >>
> >> Interesting question.  Perhaps if we convert DOM threading to a hybrid
> >> model, it will render the post-VRP threader completely useless.  Huhh...
> >> That could kill 2 birds with one stone... we get rid of a threading
> >> pass, and we don't need to worry about as much about the super-fast ranger.
> >
> >These are just a few of the threading passes at -O2:
> >
> >a.c.192t.thread3   <-- bck threader
> >a.c.193t.dom3  <-- fwd threader
> >a.c.194t.strlen1
> >a.c.195t.thread4   <-- bck threader
> >a.c.196t.vrp2
> >a.c.197t.vrp-thread2 <-- fwd threader
> >
> >That's almost 4 back to back threaders!
> >
> >*pause for effect*
>
> We've always known we have too many of these once Jeff triplicated all the 
> backwards threading ones. I do hope we manage to reduce the number for GCC 
> 12. Esp. If the new ones are slower because they no longer use simple 
> lattices.

By the way, what is the blessed way of knowing which of the N passes
we are in?  For instance, there are 4 back threading passes (well 5
with ethread).  I'd like to know how to tell if I'm in the 4th one,
which is the one that runs before VRP2 threading.  I know there's
gimple_opt_pass::set_pass_param, but that seems to only take a bool.

Thanks.
Aldy



[pushed] Darwin: Update quotes in driver warning messages.

2021-10-14 Thread Iain Sandoe via Gcc-patches
This adds some missing quotes around options and option argument
terms in warning messages, and avoids contractions in warning
messages.

tested on x86_64 darwin, pushed to master, thanks
Iain

Signed-off-by: Iain Sandoe 

gcc/ChangeLog:

* config/darwin-driver.c (darwin_find_version_from_kernel):
Quote internal identifiers and avoid contractions in
warnings.
(darwin_default_min_version): Likewise.
(darwin_driver_init): Likewise.
---
 gcc/config/darwin-driver.c | 52 ++
 1 file changed, 30 insertions(+), 22 deletions(-)

diff --git a/gcc/config/darwin-driver.c b/gcc/config/darwin-driver.c
index 3d7768f055d..573abae4782 100644
--- a/gcc/config/darwin-driver.c
+++ b/gcc/config/darwin-driver.c
@@ -143,7 +143,7 @@ darwin_find_version_from_kernel (void)
   if (sysctl (osversion_name, ARRAY_SIZE (osversion_name), osversion,
  &osversion_len, NULL, 0) == -1)
 {
-  warning (0, "sysctl for kern.osversion failed: %m");
+  warning (0, "% for % failed: %m");
   return NULL;
 }
 
@@ -189,7 +189,7 @@ darwin_find_version_from_kernel (void)
   return new_flag;
 
  parse_failed:
-  warning (0, "couldn%'t understand kern.osversion %q.*s",
+  warning (0, "could not understand % %q.*s",
   (int) osversion_len, osversion);
   return NULL;
 }
@@ -229,7 +229,7 @@ darwin_default_min_version (void)
   const char *checked = validate_macosx_version_min (new_flag);
   if (checked == NULL)
{
- warning (0, "could not understand version %s", new_flag);
+ warning (0, "could not understand version %qs", new_flag);
  return NULL;
}
   new_flag = xstrndup (checked, strlen (checked));
@@ -305,7 +305,7 @@ darwin_driver_init (unsigned int *decoded_options_count,
  else if (!strcmp ((*decoded_options)[i].arg, "ppc64"))
seenPPC64 = true;
  else
-   error ("this compiler does not support %s",
+   error ("this compiler does not support %qs",
   (*decoded_options)[i].arg);
  /* Now we've examined it, drop the -arch arg.  */
  if (*decoded_options_count > i) {
@@ -377,45 +377,53 @@ darwin_driver_init (unsigned int *decoded_options_count,
   /* Turn -arch  into the appropriate -m32/-m64 flag.
  If the User tried to specify multiple arch flags (which is possible with
  some Darwin compilers) warn that this mode is not supported by this
- compiler (and ignore the arch flags, which means that the default multi-
- lib will be generated).  */
+ compiler.  We take arch specifiers that agree with the default multilib
+ as the first choice and reject others.  */
   /* TODO: determine if these warnings would better be errors.  */
 #if DARWIN_X86
   if (seenPPC || seenPPC64)
-warning (0, "this compiler does not support PowerPC (arch flags ignored)");
+warning (0, "this compiler does not support PowerPC"
+   " (%<-arch%> option ignored)");
   if (seenX86)
 {
   if (seenX86_64 || seenM64)
-   warning (0, "%s conflicts with i386 (arch flags ignored)",
-   (seenX86_64? "x86_64": "m64"));
-  else if (! seenM32) /* Add -m32 if the User didn't. */
+   {
+ const char *op = (seenX86_64? "-arch x86_64": "-m64");
+ warning (0, "%qs conflicts with %<-arch i386%> (%qs ignored)",
+  op, op);
+   }
+  if (! seenM32) /* Add -m32 if the User didn't. */
appendM32 = true;
 }
   else if (seenX86_64)
 {
-  if (seenX86 || seenM32)
-   warning (0, "%s conflicts with x86_64 (arch flags ignored)",
-(seenX86? "i386": "m32"));
-  else if (! seenM64) /* Add -m64 if the User didn't. */
+  if (seenM32)
+   warning (0, "%<-m32%> conflicts with %<-arch x86_64%>"
+   " (%<-m32%> ignored)");
+  if (! seenM64) /* Add -m64 if the User didn't. */
appendM64 = true;
 }  
 #elif DARWIN_PPC
   if (seenX86 || seenX86_64)
-warning (0, "this compiler does not support X86 (arch flags ignored)");
+warning (0, "this compiler does not support x86"
+   " (%<-arch%> option ignored)");
   if (seenPPC)
 {
   if (seenPPC64 || seenM64)
-   warning (0, "%s conflicts with ppc (arch flags ignored)",
-(seenPPC64? "ppc64": "m64"));
-  else if (! seenM32) /* Add -m32 if the User didn't. */
+   {
+ const char *op = (seenPPC64? "-arch ppc64": "-m64");
+ warning (0, "%qs conflicts with %<-arch ppc%> (%qs ignored)",
+  op, op);
+   }
+  if (! seenM32) /* Add -m32 if the User didn't. */
appendM32 = true;
 }
   else if (seenPPC64)
 {
-  if (seenPPC || seenM32)
-   warning (0, "%s conflicts with ppc64 (arch flags ignored)",
-(seenPPC? "ppc": "m32"));
-  else if (! seenM64) /* Add -m64 if the User didn't. */
+  if (seenM32)
+   warning (0, "%<-m32%> 

[COMMITTED] Do not call range_on_path_entry for SSAs defined within the path

2021-10-14 Thread Aldy Hernandez via Gcc-patches
In the path solver, when requesting the range of an SSA for which we
know nothing, we ask the ranger for the range incoming to the path.
We do this by asking for all the incoming ranges to the path entry
block and unioning them.

The problem here is that we're asking for a range on path entry for an
SSA which *is* defined in the path, but which we know nothing
about:

some_global.1_2 = some_global;
_3 = (char) some_global.1_2;

This request is causing us to ask for range_on_edge of _3 on the
incoming edges to the path.  This is a bit of a nonsensical request
because _3 isn't live on entry to the path, so ranger correctly
returns UNDEFINED.  The proper thing is to avoid asking this in the
first place.

I have added a relevant assert, since it doesn't make sense to call
range_on_path_entry for SSAs defined within the path.

Tested on x86-64 Linux.

PR 102736

gcc/ChangeLog:

PR tree-optimization/102736
* gimple-range-path.cc (path_range_query::range_on_path_entry):
Assert that the requested range is defined outside the path.
(path_range_query::ssa_range_in_phi): Do not call
range_on_path_entry for SSA names that are defined within the
path.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/pr102736.c: New test.
---
 gcc/gimple-range-path.cc |  6 +-
 gcc/testsuite/gcc.dg/tree-ssa/pr102736.c | 21 +
 2 files changed, 26 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr102736.c

diff --git a/gcc/gimple-range-path.cc b/gcc/gimple-range-path.cc
index 422abfddb8f..694271306a7 100644
--- a/gcc/gimple-range-path.cc
+++ b/gcc/gimple-range-path.cc
@@ -134,6 +134,7 @@ path_range_query::defined_outside_path (tree name)
 void
 path_range_query::range_on_path_entry (irange &r, tree name)
 {
+  gcc_checking_assert (defined_outside_path (name));
   int_range_max tmp;
   basic_block entry = entry_bb ();
   bool changed = false;
@@ -258,7 +259,10 @@ path_range_query::ssa_range_in_phi (irange &r, gphi *phi)
// Using both the range on entry to the path, and the
// range on this edge yields significantly better
// results.
-   range_on_path_entry (r, arg);
+   if (defined_outside_path (arg))
+ range_on_path_entry (r, arg);
+   else
+ r.set_varying (TREE_TYPE (name));
m_ranger.range_on_edge (tmp, e_in, arg);
r.intersect (tmp);
return;
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr102736.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr102736.c
new file mode 100644
index 000..7e556f01a86
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr102736.c
@@ -0,0 +1,21 @@
+// { dg-do run }
+// { dg-options "-O1 -ftree-vrp" }
+
+int a, b = -1, c;
+int d = 1;
+static inline char e(char f, int g) { return g ? f : 0; }
+static inline char h(char f) { return f < a ? f : f < a; }
+static inline unsigned char i(unsigned char f, int g) { return g ? f : f > g; }
+void j() {
+L:
+  c = e(1, i(h(b), d));
+  if (b)
+return;
+  goto L;
+}
+int main() {
+  j();
+  if (c != 1)
+__builtin_abort ();
+  return 0;
+}
-- 
2.31.1



Re: [PATCH] options: Fix variable tracking option processing.

2021-10-14 Thread Richard Biener via Gcc-patches
On Thu, Oct 14, 2021 at 1:10 PM Martin Liška  wrote:
>
> On 10/13/21 15:29, Richard Biener wrote:
> > On Wed, Oct 13, 2021 at 3:12 PM Martin Liška  wrote:
> >>
> >> On 10/13/21 14:50, Richard Biener wrote:
> >>> It does, yes.  But that's a ^ with flag_var_tracking_assignments_toggle;)
> >>>
> >>> It's also one of the more weird flags, so it could be applied after the
> >>> otherwise single set of flag_var_tracking_assignments ...
> >>
> >> Well, it's far from being simple.
> >> Can we please make a step and install the patch I sent? I mean the latest
> >> that does the removal of AUTODETECT_VALUE.
> >
> > But parts of the patch are not obvious and you've not explained why you
> > remove all Init(AUTODETECT_VALUE) but for flag_var_tracking you
> > change it to Init(1).  I count 4 assignments to flag_var_tracking in 
> > toplev.c
> > and one in nvptx.c and c6x.c each.
>
> All right. So the assignments in these target set flag_var_tracking = 0, which
> is fine.
>
> >
> >if (flag_var_tracking_uninit == AUTODETECT_VALUE)
> >  flag_var_tracking_uninit = flag_var_tracking;
> >
> > can probably be handled by EnabledBy, but then we also have
> >
> >if (flag_var_tracking_uninit == 1)
> >  flag_var_tracking = 1;
>
> Yep, I made:
>
> fvar-tracking
> Common Var(flag_var_tracking) PerFunction EnabledBy(fvar-tracking-uninit)
>
> and made fvar-tracking enabled with OPT_LEVELS_1_PLUS.
>
> >
> > which suggests the same the other way around.  I guess
>
> The other way around is problematic as it leads to a cycle. I tried adding
> cycle detection in common_handle_option_auto (using a bitmap). But it breaks
> case OPT_Wunused, which sets various sub-values :/
>
>
> > positional handling might differ with say
> > -fvar-tracking -fno-var-tracking-uninit vs. -fno-var-tracking-uninit
> > -fvar-tracking
>
> I verified this is fine in debugger.
>
> > when using EnabledBy vs. the "explicit" code.
> >
> > +  else if (!OPTION_SET_P (flag_var_tracking) && flag_var_tracking)
> >   flag_var_tracking = optimize >= 1;
> >
> > I think _this_ should, instead of the Init(1), become an entry in
> > default_options_table with OPT_LEVELS_1_PLUS.
>
> Done.
>
> >
> > As said, besides flag_var_* the posted patch looks OK and is good to
> > commit.
>
> I'm sending an updated version that survives regression tests.
>
> Thoughts?

OK.

Thanks,
Richard.

> Martin
>
> >
> > Richard.
> >
> >>
> >> Martin


Re: [PATCH, OpenMP 5.1, Fortran] Strictly-structured block support for OpenMP directives

2021-10-14 Thread Jakub Jelinek via Gcc-patches
On Thu, Oct 14, 2021 at 12:20:51PM +0200, Jakub Jelinek via Gcc-patches wrote:
> Thinking more about the Fortran case for !$omp sections, there is an
> ambiguity.
> !$omp sections
> block
>   !$omp section
> end block
> is clear and !$omp end sections is optional, but
> !$omp sections
> block
> end block
> is ambiguous during parsing, it could be either followed by !$omp section
> and then the BLOCK would be first section, or by !$omp end sections and then
> it would be clearly the whole sections, with first section being empty
> inside of the block, or if it is followed by something else, it is
> ambiguous whether the block ... end block is part of the first section,
> followed by something and then we should be looking later for either
> !$omp section or !$omp end section to prove that, or if
> !$omp sections
> block
> end block
> was the whole sections construct and we shouldn't await anything further.
> I'm afraid back to the drawing board.

And I have to correct myself, there is no ambiguity in 5.2 here,
the important fact is hidden in sections/parallel sections being
block-associated constructs.  That means the body of the whole construct
has to be a structured-block, and by the 5.1+ definition of Fortran
structured block, it is either block ... end block or something that
doesn't start with block.
So,
!$omp sections
block
end block
a = 1
is only ambiguous in whether it is actually
!$omp sections
block
  !$omp section
end block
a = 1
or
!$omp sections
!$omp section
block
end block
!$omp end sections
a = 1
but both actually do the same thing, work roughly as !$omp single.
If one wants block statement as first in structured-block-sequence
of the first section, followed by either some further statements
or by other sections, then one needs to write
!$omp sections
!$omp section
block
end block
a = 1
...
!$omp end sections
or
!$omp sections
block
  block
  end block
  a = 1
...
end block

Your patch probably already handles it that way, but we again need
testsuite coverage to prove it is handled the way it should be in all these
cases (and that we diagnose what is invalid).

Jakub



Re: [PATCH] options: Fix variable tracking option processing.

2021-10-14 Thread Martin Liška

On 10/13/21 15:29, Richard Biener wrote:

On Wed, Oct 13, 2021 at 3:12 PM Martin Liška  wrote:


On 10/13/21 14:50, Richard Biener wrote:

It does, yes.  But that's a ^ with flag_var_tracking_assignments_toggle;)

It's also one of the more weird flags, so it could be applied after the
otherwise single set of flag_var_tracking_assignments ...


Well, it's far from being simple.
Can we please make a step and install the patch I sent? I mean the latest
that does the removal of AUTODETECT_VALUE.


But parts of the patch are not obvious and you've not explained why you
remove all Init(AUTODETECT_VALUE) but for flag_var_tracking you
change it to Init(1).  I count 4 assignments to flag_var_tracking in toplev.c
and one in nvptx.c and c6x.c each.


All right. So the assignments in these target set flag_var_tracking = 0, which
is fine.



   if (flag_var_tracking_uninit == AUTODETECT_VALUE)
 flag_var_tracking_uninit = flag_var_tracking;

can probably be handled by EnabledBy, but then we also have

   if (flag_var_tracking_uninit == 1)
 flag_var_tracking = 1;


Yep, I made:

fvar-tracking
Common Var(flag_var_tracking) PerFunction EnabledBy(fvar-tracking-uninit)

and made fvar-tracking enabled with OPT_LEVELS_1_PLUS.



which suggests the same the other way around.  I guess


The other way around is problematic as it leads to a cycle. I tried adding
cycle detection in common_handle_option_auto (using a bitmap). But it breaks
case OPT_Wunused, which sets various sub-values :/



positional handling might differ with say
-fvar-tracking -fno-var-tracking-uninit vs. -fno-var-tracking-uninit
-fvar-tracking


I verified this is fine in debugger.


when using EnabledBy vs. the "explicit" code.

+  else if (!OPTION_SET_P (flag_var_tracking) && flag_var_tracking)
  flag_var_tracking = optimize >= 1;

I think _this_ should, instead of the Init(1), become an entry in
default_options_table with OPT_LEVELS_1_PLUS.


Done.



As said, besides flag_var_* the posted patch looks OK and is good to
commit.


I'm sending an updated version that survives regression tests.

Thoughts?
Martin



Richard.



Martin
From 4d9adc296aa527d9ae504da4271e7293d77bceed Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Tue, 12 Oct 2021 14:31:50 +0200
Subject: [PATCH] Eliminate AUTODETECT_VALUE usage in options.

gcc/ChangeLog:

	* common.opt: Stop using AUTODETECT_VALUE
	and use EnabledBy where possible.
	* opts.c: Enable OPT_fvar_tracking with optimize >= 1.
	* toplev.c (AUTODETECT_VALUE): Remove macro.
	(process_options): Simplify by using EnabledBy and
	OPT_fvar_tracking.  Use OPTION_SET_P macro instead of
	AUTODETECT_VALUE.
---
 gcc/common.opt | 24 ++--
 gcc/opts.c |  1 +
 gcc/toplev.c   | 41 ++---
 3 files changed, 25 insertions(+), 41 deletions(-)

diff --git a/gcc/common.opt b/gcc/common.opt
index 1eedfeaf539..a2af7fb36e0 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -3003,19 +3003,16 @@ Common Undocumented Var(flag_use_linker_plugin)
 
 ; Positive if we should track variables, negative if we should run
 ; the var-tracking pass only to discard debug annotations, zero if
-; we're not to run it.  When flag_var_tracking == 2 (AUTODETECT_VALUE) it
-; will be set according to optimize, debug_info_level and debug_hooks
-; in process_options ().
+; we're not to run it.
 fvar-tracking
-Common Var(flag_var_tracking) Init(2) PerFunction
+Common Var(flag_var_tracking) PerFunction EnabledBy(fvar-tracking-uninit)
 Perform variable tracking.
 
 ; Positive if we should track variables at assignments, negative if
 ; we should run the var-tracking pass only to discard debug
-; annotations.  When flag_var_tracking_assignments ==
-; AUTODETECT_VALUE it will be set according to flag_var_tracking.
+; annotations.
 fvar-tracking-assignments
-Common Var(flag_var_tracking_assignments) Init(2) PerFunction
+Common Var(flag_var_tracking_assignments) PerFunction
 Perform variable tracking by annotating assignments.
 
 ; Nonzero if we should toggle flag_var_tracking_assignments after
@@ -3026,8 +3023,7 @@ Toggle -fvar-tracking-assignments.
 
 ; Positive if we should track uninitialized variables, negative if
 ; we should run the var-tracking pass only to discard debug
-; annotations.  When flag_var_tracking_uninit == AUTODETECT_VALUE it
-; will be set according to flag_var_tracking.
+; annotations.
 fvar-tracking-uninit
 Common Var(flag_var_tracking_uninit) PerFunction
 Perform variable tracking and also tag variables that are uninitialized.
@@ -3190,11 +3186,11 @@ Common Driver RejectNegative JoinedOrMissing
 Generate debug information in default format.
 
 gas-loc-support
-Common Driver Var(dwarf2out_as_loc_support) Init(2)
+Common Driver Var(dwarf2out_as_loc_support)
 Assume assembler support for (DWARF2+) .loc directives.
 
 gas-locview-support
-Common Driver Var(dwarf2out_as_locview_support) Init(2)
+Common Driver Var(dwarf2out_as_locview_support)
 Assume assembler support for view in 

Re: [PATCH] c++: fix cases of core1001/1322 by not dropping cv-qualifier of function parameter of type of typename or decltype[PR101402, PR102033, PR102034, PR102039, PR102

2021-10-14 Thread Nick Huang via Gcc-patches
hi Jason,

IMHO, I think your patch probably finally solved this long-standing Core
1001 issue. Of course it is not up to me to say so. I just want to point out
that it even solves the following case, even though it is more or less
expected if concepts and lambdas all work as expected.

template
concept IsLambdaAry3=__is_same(T, decltype(+[]{})[3]);
template
void bar(const T){}
template<>
void bar(const decltype(+[]{})[3]){}

B.Regards,


[committed] aarch64: Remove redundant flag_vect_cost_model test

2021-10-14 Thread Richard Sandiford via Gcc-patches
The aarch64 version of add_stmt_cost has a redundant test
of flag_vect_cost_model.  The current structure was based
on the contemporaneous definition of default_add_stmt_cost,
but g:d6d1127249564146429009e0682f25bd58d7a791 later removed
the flag_vect_cost_model test from the default version.

Tested on aarch64-linux-gnu, pushed.

Richard


gcc/
* config/aarch64/aarch64.c (aarch64_add_stmt_cost): Remove
redundant test for flag_vect_cost_model.
---
 gcc/config/aarch64/aarch64.c | 214 +--
 1 file changed, 105 insertions(+), 109 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index a9a1800af53..76d99d247ae 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -15487,125 +15487,121 @@ aarch64_add_stmt_cost (class vec_info *vinfo, void 
*data, int count,
   int misalign, enum vect_cost_model_location where)
 {
   auto *costs = static_cast<aarch64_vector_costs *> (data);
-  unsigned retval = 0;
 
-  if (flag_vect_cost_model)
-{
-  fractional_cost stmt_cost
-   = aarch64_builtin_vectorization_cost (kind, vectype, misalign);
-
-  bool in_inner_loop_p = (where == vect_body
- && stmt_info
- && stmt_in_inner_loop_p (vinfo, stmt_info));
-
-  /* Do one-time initialization based on the vinfo.  */
-  loop_vec_info loop_vinfo = dyn_cast<loop_vec_info> (vinfo);
-  bb_vec_info bb_vinfo = dyn_cast<bb_vec_info> (vinfo);
-  if (!costs->analyzed_vinfo && aarch64_use_new_vector_costs_p ())
-   {
- if (loop_vinfo)
-   aarch64_analyze_loop_vinfo (loop_vinfo, costs);
- else
-   aarch64_analyze_bb_vinfo (bb_vinfo, costs);
- costs->analyzed_vinfo = true;
-   }
+  fractional_cost stmt_cost
+= aarch64_builtin_vectorization_cost (kind, vectype, misalign);
 
-  /* Try to get a more accurate cost by looking at STMT_INFO instead
-of just looking at KIND.  */
-  if (stmt_info && aarch64_use_new_vector_costs_p ())
-   {
- if (vectype && aarch64_sve_only_stmt_p (stmt_info, vectype))
-   costs->saw_sve_only_op = true;
-
- /* If we scalarize a strided store, the vectorizer costs one
-vec_to_scalar for each element.  However, we can store the first
-element using an FP store without a separate extract step.  */
- if (vect_is_store_elt_extraction (kind, stmt_info))
-   count -= 1;
-
- stmt_cost = aarch64_detect_scalar_stmt_subtype
-   (vinfo, kind, stmt_info, stmt_cost);
+  bool in_inner_loop_p = (where == vect_body
+ && stmt_info
+ && stmt_in_inner_loop_p (vinfo, stmt_info));
 
- if (vectype && costs->vec_flags)
-   stmt_cost = aarch64_detect_vector_stmt_subtype (vinfo, kind,
-   stmt_info, vectype,
-   where, stmt_cost);
-   }
-
-  /* Do any SVE-specific adjustments to the cost.  */
-  if (stmt_info && vectype && aarch64_sve_mode_p (TYPE_MODE (vectype)))
-   stmt_cost = aarch64_sve_adjust_stmt_cost (vinfo, kind, stmt_info,
- vectype, stmt_cost);
-
-  if (stmt_info && aarch64_use_new_vector_costs_p ())
-   {
- /* Account for any extra "embedded" costs that apply additively
-to the base cost calculated above.  */
- stmt_cost = aarch64_adjust_stmt_cost (kind, stmt_info, vectype,
-   stmt_cost);
-
- /* If we're recording a nonzero vector loop body cost for the
-innermost loop, also estimate the operations that would need
-to be issued by all relevant implementations of the loop.  */
- auto *issue_info = aarch64_tune_params.vec_costs->issue_info;
- if (loop_vinfo
- && issue_info
- && costs->vec_flags
- && where == vect_body
- && (!LOOP_VINFO_LOOP (loop_vinfo)->inner || in_inner_loop_p)
- && vectype
- && stmt_cost != 0)
+  /* Do one-time initialization based on the vinfo.  */
+  loop_vec_info loop_vinfo = dyn_cast<loop_vec_info> (vinfo);
+  bb_vec_info bb_vinfo = dyn_cast<bb_vec_info> (vinfo);
+  if (!costs->analyzed_vinfo && aarch64_use_new_vector_costs_p ())
+{
+  if (loop_vinfo)
+   aarch64_analyze_loop_vinfo (loop_vinfo, costs);
+  else
+   aarch64_analyze_bb_vinfo (bb_vinfo, costs);
+  costs->analyzed_vinfo = true;
+}
+
+  /* Try to get a more accurate cost by looking at STMT_INFO instead
+ of just looking at KIND.  */
+  if (stmt_info && aarch64_use_new_vector_costs_p ())
+{
+  if (vectype && aarch64_sve_only_stmt_p (stmt_info, vectype))
+   costs->saw_sve_only_op = true;
+
+  /* If we scalarize a strided store, the vectorizer costs one
+vec_to_scalar for each element.  

[PATCH] arm: Remove add_stmt_cost hook

2021-10-14 Thread Richard Sandiford via Gcc-patches
The arm implementation of add_stmt_cost was added alongside
arm_builtin_vectorization_cost.  At that time it was necessary
to override the latter when overriding the former, since
default_add_stmt_cost didn't indirect through the
builtin_vectorization_cost target hook:

  int stmt_cost = default_builtin_vectorization_cost (kind, vectype,
  misalign);

That was fixed by:

2014-06-06  Bingfeng Mei  

   * targhooks.c (default_add_stmt_cost): Call target specific
   hook instead of default one.

so the arm definition of add_stmt_cost is now equivalent
to the default.

Bootstrapped & regression-tested on arm-linux-gnueabihf.  OK to install?

Richard


gcc/
* config/arm/arm.c (arm_add_stmt_cost): Delete.
(TARGET_VECTORIZE_ADD_STMT_COST): Delete.
---
 gcc/config/arm/arm.c | 40 
 1 file changed, 40 deletions(-)

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index d8c5d2bc7db..e51f60a1841 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -304,11 +304,6 @@ static bool aarch_macro_fusion_pair_p (rtx_insn*, 
rtx_insn*);
 static int arm_builtin_vectorization_cost (enum vect_cost_for_stmt 
type_of_cost,
   tree vectype,
   int misalign ATTRIBUTE_UNUSED);
-static unsigned arm_add_stmt_cost (vec_info *vinfo, void *data, int count,
-  enum vect_cost_for_stmt kind,
-  struct _stmt_vec_info *stmt_info,
-  tree vectype, int misalign,
-  enum vect_cost_model_location where);
 
 static void arm_canonicalize_comparison (int *code, rtx *op0, rtx *op1,
 bool op0_preserve_value);
@@ -769,8 +764,6 @@ static const struct attribute_spec arm_attribute_table[] =
 #undef TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST
 #define TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST \
   arm_builtin_vectorization_cost
-#undef TARGET_VECTORIZE_ADD_STMT_COST
-#define TARGET_VECTORIZE_ADD_STMT_COST arm_add_stmt_cost
 
 #undef TARGET_CANONICALIZE_COMPARISON
 #define TARGET_CANONICALIZE_COMPARISON \
@@ -12242,39 +12235,6 @@ arm_builtin_vectorization_cost (enum 
vect_cost_for_stmt type_of_cost,
 }
 }
 
-/* Implement targetm.vectorize.add_stmt_cost.  */
-
-static unsigned
-arm_add_stmt_cost (vec_info *vinfo, void *data, int count,
-  enum vect_cost_for_stmt kind,
-  struct _stmt_vec_info *stmt_info, tree vectype,
-  int misalign, enum vect_cost_model_location where)
-{
-  unsigned *cost = (unsigned *) data;
-  unsigned retval = 0;
-
-  if (flag_vect_cost_model)
-{
-  int stmt_cost = arm_builtin_vectorization_cost (kind, vectype, misalign);
-
-  /* Statements in an inner loop relative to the loop being
-vectorized are weighted more heavily.  The value here is
-arbitrary and could potentially be improved with analysis.  */
-  if (where == vect_body && stmt_info
- && stmt_in_inner_loop_p (vinfo, stmt_info))
-   {
- loop_vec_info loop_vinfo = dyn_cast<loop_vec_info> (vinfo);
- gcc_assert (loop_vinfo);
- count *= LOOP_VINFO_INNER_LOOP_COST_FACTOR (loop_vinfo); /* FIXME.  */
-   }
-
-  retval = (unsigned) (count * stmt_cost);
-  cost[where] += retval;
-}
-
-  return retval;
-}
-
 /* Return true if and only if this insn can dual-issue only as older.  */
 static bool
 cortexa7_older_only (rtx_insn *insn)


Re: [PATCH] Adjust testcase for O2 vectorization.

2021-10-14 Thread Kewen.Lin via Gcc-patches
Hi Hongtao,

on 2021/10/14 3:11 PM, liuhongt wrote:
> Hi Kewen:
>   Could you help to verify if this patch fixes those regressions
> for the rs6000 port?
> 

The ppc64le run just finished; there are still some regressions:

NA->XPASS: c-c++-common/Wstringop-overflow-2.c  -Wc++-compat   (test for 
warnings, line 194)
NA->XPASS: c-c++-common/Wstringop-overflow-2.c  -Wc++-compat   (test for 
warnings, line 212)
NA->XPASS: c-c++-common/Wstringop-overflow-2.c  -Wc++-compat   (test for 
warnings, line 296)
NA->XPASS: c-c++-common/Wstringop-overflow-2.c  -Wc++-compat   (test for 
warnings, line 314)
NA->FAIL: gcc.dg/Wstringop-overflow-21-novec.c (test for excess errors)
NA->FAIL: gcc.dg/Wstringop-overflow-21-novec.c  (test for warnings, line 18)
NA->FAIL: gcc.dg/Wstringop-overflow-21-novec.c  (test for warnings, line 29)
NA->FAIL: gcc.dg/Wstringop-overflow-21-novec.c  (test for warnings, line 45)
NA->FAIL: gcc.dg/Wstringop-overflow-21-novec.c  (test for warnings, line 55)
NA->FAIL: gcc.dg/Wstringop-overflow-76-novec.c note (test for warnings, line 
104)
NA->FAIL: gcc.dg/Wstringop-overflow-76-novec.c note (test for warnings, line 
137)
NA->FAIL: gcc.dg/Wstringop-overflow-76-novec.c note (test for warnings, line 19)
NA->FAIL: gcc.dg/Wstringop-overflow-76-novec.c note (test for warnings, line 39)
NA->FAIL: gcc.dg/Wstringop-overflow-76-novec.c note (test for warnings, line 56)
NA->FAIL: gcc.dg/Wstringop-overflow-76-novec.c note (test for warnings, line 70)
NA->FAIL: gcc.dg/Wstringop-overflow-76-novec.c (test for excess errors)
NA->FAIL: gcc.dg/Wstringop-overflow-76-novec.c  (test for warnings, line 116)
NA->FAIL: gcc.dg/Wstringop-overflow-76-novec.c  (test for warnings, line 131)
NA->FAIL: gcc.dg/Wstringop-overflow-76-novec.c  (test for warnings, line 146)
NA->FAIL: gcc.dg/Wstringop-overflow-76-novec.c  (test for warnings, line 33)
NA->FAIL: gcc.dg/Wstringop-overflow-76-novec.c  (test for warnings, line 50)
NA->FAIL: gcc.dg/Wstringop-overflow-76-novec.c  (test for warnings, line 64)
NA->FAIL: gcc.dg/Wstringop-overflow-76-novec.c  (test for warnings, line 78)
NA->FAIL: gcc.dg/Wstringop-overflow-76-novec.c  (test for warnings, line 97)
PASS->FAIL: c-c++-common/Wstringop-overflow-2.c  -std=gnu++14 (test for excess 
errors)
NA->FAIL: c-c++-common/Wstringop-overflow-2.c  -std=gnu++14  (test for 
warnings, line 229)
NA->FAIL: c-c++-common/Wstringop-overflow-2.c  -std=gnu++14  (test for 
warnings, line 230)
NA->FAIL: c-c++-common/Wstringop-overflow-2.c  -std=gnu++14  (test for 
warnings, line 331)
NA->FAIL: c-c++-common/Wstringop-overflow-2.c  -std=gnu++14  (test for 
warnings, line 332)
// omitting -std=gnu++17, -std=gnu++2a, -std=gnu++98

I'll have a look and get back to you tomorrow.

BR,
Kewen

> As discussed in [1], this patch adds xfail/target selectors to those
> testcases, and also makes a copy of them so that they can be tested w/o
> vectorization.
> 
> Newly added xfail/target selectors are used to check the vectorization
> capability of contiguous byte/double-byte storage; these scenarios
> are exactly the part of the testcases that regressed after O2
> vectorization.
> 
> [1] https://gcc.gnu.org/pipermail/gcc-patches/2021-October/581456.html.
> 
> gcc/testsuite/ChangeLog
> 
>   PR middle-end/102722
>   PR middle-end/102697
>   PR middle-end/102462
>   PR middle-end/102706
>   * c-c++-common/Wstringop-overflow-2.c: Adjust testcase with new
>   xfail/target selector.
>   * gcc.dg/Warray-bounds-51.c: Ditto.
>   * gcc.dg/Warray-parameter-3.c: Ditto.
>   * gcc.dg/Wstringop-overflow-14.c: Ditto.
>   * gcc.dg/Wstringop-overflow-21.c: Ditto.
>   * gcc.dg/Wstringop-overflow-68.c: Ditto.
>   * gcc.dg/Wstringop-overflow-76.c: Ditto.
>   * gcc.dg/Warray-bounds-48.c: Ditto.
>   * lib/target-supports.exp (check_vect_slp_vnqihi_store_usage):
>   New function.
>   (check_effective_target_vect_slp_v2qi_store): Ditto.
>   (check_effective_target_vect_slp_v4qi_store): Ditto.
>   (check_effective_target_vect_slp_v8qi_store): Ditto.
>   (check_effective_target_vect_slp_v16qi_store): Ditto.
>   (check_effective_target_vect_slp_v2hi_store): Ditto.
>   (check_effective_target_vect_slp_v4hi_store): Ditto.
>   * c-c++-common/Wstringop-overflow-2-novec.c: New test.
>   * gcc.dg/Warray-bounds-51-novec.c: New test.
>   * gcc.dg/Warray-bounds-48-novec.c: New test.
>   * gcc.dg/Warray-parameter-3-novec.c: New test.
>   * gcc.dg/Wstringop-overflow-14-novec.c: New test.
>   * gcc.dg/Wstringop-overflow-21-novec.c: New test.
>   * gcc.dg/Wstringop-overflow-76-novec.c: New test.
> ---
>  .../c-c++-common/Wstringop-overflow-2-novec.c | 348 +
>  .../c-c++-common/Wstringop-overflow-2.c   |  26 +-
>  gcc/testsuite/gcc.dg/Warray-bounds-48-novec.c | 364 ++
>  gcc/testsuite/gcc.dg/Warray-bounds-48.c   |   6 +-
>  gcc/testsuite/gcc.dg/Warray-bounds-51-novec.c |  61 +++
>  gcc/testsuite/gcc.dg/Warray-bounds-51.c   

Re: [PATCH, OpenMP 5.1, Fortran] Strictly-structured block support for OpenMP directives

2021-10-14 Thread Jakub Jelinek via Gcc-patches
On Thu, Oct 07, 2021 at 07:09:07PM +0200, Jakub Jelinek wrote:
> The workshare/parallel workshare case is unclear, I've filed
> https://github.com/OpenMP/spec/issues/3153
> for it.  Either don't allow block if workshare_stmts_only for now
> until that is clarified, or if we do, we need to make sure that it does the
> expected thing, does that gfc_trans_block_construct call ensure it?
> 
> Then we have the
> https://github.com/OpenMP/spec/issues/3154
> issue Tobias discovered, if that issue is resolved to end always applying to
> the directive before the block statement, I think your patch handles it that
> way but we want testsuite coverage for some of those cases.

Just want to follow-up on this, we now have resolutions of the
https://github.com/OpenMP/spec/issues/3153
https://github.com/OpenMP/spec/issues/3154
https://github.com/OpenMP/spec/issues/3155
issues and we can use that to guide this patch.
BLOCK is now explicitly allowed for workshare around the body of
workshare/parallel workshare or around the body of critical in it but not
arbitrarily nested.  My understanding of the patch is that it most likely
implements that, just we need a testsuite coverage that
!$omp workshare
block
  a = 1
  b = 2
  !$omp critical
  block
c = 3
  end block
end block
is fine (also with !$omp end {critical,workshare} after the block), but
that
!$omp workshare
a = 1
block
  b = 2
  c = 3
end block
!$omp end workshare
etc. is diagnosed.
For Tobias' issue that !$omp end whatever after end block for strictly
structured block binds to the directive above the strictly structured block
I think the patch also implements it but we want again testsuite coverage,
that
subroutine foo
!$omp parallel
!$omp parallel
block
end block
!$omp end parallel
!$omp end parallel
end subroutine foo
subroutine bar
!$omp teams
!$omp parallel
block
end block
!$omp end teams
end subroutine bar
is fine while e.g.
subroutine baz
!$omp parallel
!$omp parallel
block
end block
!$omp end parallel
end subroutine baz
is not (!$omp end parallel pairs with the inner parallel rather than outer,
and the outer parallel's body doesn't start with BLOCK, so needs to be
paired with its !$omp end parallel).
And lastly, the 3rd ticket clarifies that for the separating directives
for Fortran basically the 5.0 state remains except that the body can be now
also optionally wrapped in a single BLOCK.
(And for C/C++ allows no statements at all in between the separating
directives or after/before them but still requires the {}s around it like
5.1 and earlier.  Here we implement the 5.1 wording and let's stay with
that.)
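For illustration only (this is not a testcase from any patch in this thread),
the C/C++ shape being described keeps the braces around the construct even
when the individual sections are trivial:

void
two_sections (int *a, int *b)
{
  #pragma omp sections
  {
    #pragma omp section
    *a = 1;
    #pragma omp section
    *b = 2;
  }
}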
Thinking more about the Fortran case for !$omp sections, there is an
ambiguity.
!$omp sections
block
  !$omp section
end block
is clear and !$omp end sections is optional, but
!$omp sections
block
end block
is ambiguous during parsing, it could be either followed by !$omp section
and then the BLOCK would be first section, or by !$omp end sections and then
it would be clearly the whole sections, with first section being empty
inside of the block, or if it is followed by something else, it is
ambiguous whether the block ... end block is part of the first section,
followed by something and then we should be looking later for either
!$omp section or !$omp end section to prove that, or if
!$omp sections
block
end block
was the whole sections construct and we shouldn't await anything further.
I'm afraid back to the drawing board.

Jakub



[ping^2] Make sure that we get unique test names if several DejaGnu directives refer to the same line [PR102735]

2021-10-14 Thread Thomas Schwinge
Hi!

Ping, again.

Commit log updated for 
"privatization-1-compute.c results in both XFAIL and PASS".


Regards
 Thomas


On 2021-09-30T08:42:25+0200, I wrote:
> Hi!
>
> Ping.
>
> On 2021-09-22T13:03:46+0200, I wrote:
>> On 2021-09-19T11:35:00-0600, Jeff Law via Gcc-patches 
>>  wrote:
>>> A couple of goacc tests do not have unique names.
>>
>> Thanks for fixing this up, and sorry, largely my "fault", I suppose.  ;-|
>>
>>> This causes problems
>>> for the test comparison script when one of the test passes and the other
>>> fails -- in this scenario the test comparison script claims there is a
>>> regression.
>>
>> So I understand correctly that this is a problem not just for actual
>> mixed PASS vs. FAIL (which we'd like you to report anyway!) that appear
>> for the same line, but also for mixed PASS vs. XFAIL?  (Because, the
>> latter appears to be what you're addressing with your commit here.)
>>
>>> This slipped through for a while because I had turned off x86_64 testing
>>> (others test it regularly and I was revamping the tester's hardware
>>> requirements).  Now that I've acquired more x86_64 resources and turned
>>> on native x86 testing again, it's been flagged.
>>
>> (I don't follow that argument -- these test cases should be all generic?
>> Anyway, not important, I guess.)
>>
>>> This patch just adds a numeric suffix to the TODO string to disambiguate
>>> them.
>>
>> So, instead of doing this manually (always error-prone!), like you've...
>>
>>> Committed to the trunk,
>>
>>> commit f75b237254f32d5be32c9d9610983b777abea633
>>> Author: Jeff Law 
>>> Date:   Sun Sep 19 13:31:32 2021 -0400
>>>
>>> [committed] Make test names unique for a couple of goacc tests
>>
>>> --- a/gcc/testsuite/gfortran.dg/goacc/privatization-1-compute.f90
>>> +++ b/gcc/testsuite/gfortran.dg/goacc/privatization-1-compute.f90
>>> @@ -39,9 +39,9 @@ contains
>>>!$acc atomic write ! ... to force 'TREE_ADDRESSABLE'.
>>>y = a
>>>  !$acc end parallel
>>> -! { dg-note {variable 'i' in 'private' clause potentially has improper OpenACC privatization level: 'parm_decl'} "TODO" { xfail *-*-* } l_compute$c_compute }
>>> -! { dg-note {variable 'j' in 'private' clause potentially has improper OpenACC privatization level: 'parm_decl'} "TODO" { xfail *-*-* } l_compute$c_compute }
>>> -! { dg-note {variable 'a' in 'private' clause potentially has improper OpenACC privatization level: 'parm_decl'} "TODO" { xfail *-*-* } l_compute$c_compute }
>>> +! { dg-note {variable 'i' in 'private' clause potentially has improper OpenACC privatization level: 'parm_decl'} "TODO2" { xfail *-*-* } l_compute$c_compute }
>>> +! { dg-note {variable 'j' in 'private' clause potentially has improper OpenACC privatization level: 'parm_decl'} "TODO3" { xfail *-*-* } l_compute$c_compute }
>>> +! { dg-note {variable 'a' in 'private' clause potentially has improper OpenACC privatization level: 'parm_decl'} "TODO4" { xfail *-*-* } l_compute$c_compute }
>>
>> ... etc. (also similarly in a handful of earlier commits, if I remember
>> correctly), why don't we do that programmatically, like in the attached
>> "Make sure that we get unique test names if several DejaGnu directives
>> refer to the same line", once and for all?  OK to push after proper
>> testing?
>
> Attached again, for easy reference.
>
> I figure it may help if I showed an example of how this changes things;
> for the test case cited above (word-diff):
>
> PASS: gfortran.dg/goacc/privatization-1-compute.f90   -O   {+at line 40+} (test for warnings, line 39)
> PASS: gfortran.dg/goacc/privatization-1-compute.f90   -O   {+at line 41+} (test for warnings, line 22)
> PASS: gfortran.dg/goacc/privatization-1-compute.f90   -O   {+at line 42+} (test for warnings, line 39)
> PASS: gfortran.dg/goacc/privatization-1-compute.f90   -O   {+at line 43+} (test for warnings, line 22)
> PASS: gfortran.dg/goacc/privatization-1-compute.f90   -O   {+at line 44+} (test for warnings, line 39)
> PASS: gfortran.dg/goacc/privatization-1-compute.f90   -O   {+at line 45+} (test for warnings, line 22)
> XFAIL: gfortran.dg/goacc/privatization-1-compute.f90   -O  TODO2 {+at line 50+} (test for warnings, line 29)
> XFAIL: gfortran.dg/goacc/privatization-1-compute.f90   -O  TODO3 {+at line 51+} (test for warnings, line 29)
> XFAIL: gfortran.dg/goacc/privatization-1-compute.f90   -O  TODO4 {+at line 52+} (test for warnings, line 29)
> PASS: gfortran.dg/goacc/privatization-1-compute.f90   -O  TODO {+at line 53+} (test for warnings, line 29)
> PASS: gfortran.dg/goacc/privatization-1-compute.f90   -O   {+at line 54+} (test for warnings, line 29)
> PASS: gfortran.dg/goacc/privatization-1-compute.f90   -O  (test for excess errors)
>
> Given that we now amend the 'comment' by 'at line $useline"', and given
> that only ever 

Ping: [PATCH] Add a simulate_record_decl lang hook

2021-10-14 Thread Richard Sandiford via Gcc-patches
Ping

Richard Sandiford via Gcc-patches  writes:
> This patch adds a lang hook for defining a struct/RECORD_TYPE
> “as if” it had appeared directly in the source code.  It follows
> the similar existing hook for enums.
>
> It's the caller's responsibility to create the fields
> (as FIELD_DECLs) but the hook's responsibility to create
> and declare the associated RECORD_TYPE.
>
> For now the hook is hard-coded to do the equivalent of:
>
>   typedef struct NAME { FIELDS } NAME;
>
> but this could be controlled by an extra parameter if some callers
> want a different behaviour in future.
>
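
For illustration only (the type name and fields here are hypothetical, not
taken from the patch or from arm_neon.h), a call with the name "point_t" and
two double FIELD_DECLs would behave as if the source contained:

typedef struct point_t
{
  double x;
  double y;
} point_t;
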
> The motivating use case is to allow the long list of struct
> definitions in arm_neon.h to be provided by the compiler,
> which in turn unblocks various arm_neon.h optimisations.
>
> Tested on aarch64-linux-gnu, individually and with a follow-on
> patch from Jonathan that makes use of the hook.  OK to install?
>
> Richard


gcc/
* langhooks.h (lang_hooks_for_types::simulate_record_decl): New hook.
* langhooks-def.h (lhd_simulate_record_decl): Declare.
(LANG_HOOKS_SIMULATE_RECORD_DECL): Define.
(LANG_HOOKS_FOR_TYPES_INITIALIZER): Include it.
* langhooks.c (lhd_simulate_record_decl): New function.

gcc/c/
* c-tree.h (c_simulate_record_decl): Declare.
* c-objc-common.h (LANG_HOOKS_SIMULATE_RECORD_DECL): Override.
* c-decl.c (c_simulate_record_decl): New function.

gcc/cp/
* decl.c: Include langhooks-def.h.
(cxx_simulate_record_decl): New function.
* cp-objcp-common.h (cxx_simulate_record_decl): Declare.
(LANG_HOOKS_SIMULATE_RECORD_DECL): Override.
---
 gcc/c/c-decl.c   | 31 +++
 gcc/c/c-objc-common.h|  2 ++
 gcc/c/c-tree.h   |  2 ++
 gcc/cp/cp-objcp-common.h |  4 
 gcc/cp/decl.c| 38 ++
 gcc/langhooks-def.h  |  4 
 gcc/langhooks.c  | 21 +
 gcc/langhooks.h  | 10 ++
 8 files changed, 112 insertions(+)

diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
index 771efa3eadf..8d1324b118c 100644
--- a/gcc/c/c-decl.c
+++ b/gcc/c/c-decl.c
@@ -9436,6 +9436,37 @@ c_simulate_enum_decl (location_t loc, const char *name,
   input_location = saved_loc;
   return enumtype;
 }
+
+/* Implement LANG_HOOKS_SIMULATE_RECORD_DECL.  */
+
+tree
+c_simulate_record_decl (location_t loc, const char *name,
+   array_slice<const tree> fields)
+{
+  location_t saved_loc = input_location;
+  input_location = loc;
+
+  class c_struct_parse_info *struct_info;
+  tree ident = get_identifier (name);
+  tree type = start_struct (loc, RECORD_TYPE, ident, &struct_info);
+
+  for (unsigned int i = 0; i < fields.size (); ++i)
+{
+  DECL_FIELD_CONTEXT (fields[i]) = type;
+  if (i > 0)
+   DECL_CHAIN (fields[i - 1]) = fields[i];
+}
+
+  finish_struct (loc, type, fields[0], NULL_TREE, struct_info);
+
+  tree decl = build_decl (loc, TYPE_DECL, ident, type);
+  TYPE_NAME (type) = decl;
+  TYPE_STUB_DECL (type) = decl;
+  lang_hooks.decls.pushdecl (decl);
+
+  input_location = saved_loc;
+  return type;
+}

 /* Create the FUNCTION_DECL for a function definition.
DECLSPECS, DECLARATOR and ATTRIBUTES are the parts of
diff --git a/gcc/c/c-objc-common.h b/gcc/c/c-objc-common.h
index 7d35a0621e4..f4e8271f06c 100644
--- a/gcc/c/c-objc-common.h
+++ b/gcc/c/c-objc-common.h
@@ -81,6 +81,8 @@ along with GCC; see the file COPYING3.  If not see
 
 #undef LANG_HOOKS_SIMULATE_ENUM_DECL
 #define LANG_HOOKS_SIMULATE_ENUM_DECL c_simulate_enum_decl
+#undef LANG_HOOKS_SIMULATE_RECORD_DECL
+#define LANG_HOOKS_SIMULATE_RECORD_DECL c_simulate_record_decl
 #undef LANG_HOOKS_TYPE_FOR_MODE
 #define LANG_HOOKS_TYPE_FOR_MODE c_common_type_for_mode
 #undef LANG_HOOKS_TYPE_FOR_SIZE
diff --git a/gcc/c/c-tree.h b/gcc/c/c-tree.h
index d50d0cb7f2d..8578d2d1e77 100644
--- a/gcc/c/c-tree.h
+++ b/gcc/c/c-tree.h
@@ -598,6 +598,8 @@ extern tree finish_struct (location_t, tree, tree, tree,
   class c_struct_parse_info *);
 extern tree c_simulate_enum_decl (location_t, const char *,
  vec<string_int_pair> *);
+extern tree c_simulate_record_decl (location_t, const char *,
+   array_slice<const tree>);
 extern struct c_arg_info *build_arg_info (void);
 extern struct c_arg_info *get_parm_info (bool, tree);
 extern tree grokfield (location_t, struct c_declarator *,
diff --git a/gcc/cp/cp-objcp-common.h b/gcc/cp/cp-objcp-common.h
index f1704aad557..d5859406e8f 100644
--- a/gcc/cp/cp-objcp-common.h
+++ b/gcc/cp/cp-objcp-common.h
@@ -39,6 +39,8 @@ extern bool cp_handle_option (size_t, const char *, 
HOST_WIDE_INT, int,
 extern tree cxx_make_type_hook (tree_code);
 extern tree cxx_simulate_enum_decl (location_t, const char *,
 vec<string_int_pair> *);
+extern tree cxx_simulate_record_decl (location_t, const char *,
+ array_slice<const tree>);

Re: [RFC] More jump threading restrictions in the presence of loops.

2021-10-14 Thread Aldy Hernandez via Gcc-patches
PING.

Note, that there are no PRs and nothing really dependent on this
patch.  This has just been suggested as the right thing to do wrt
loops.

This patch fixes 6 XFAILs in our testsuite and has the added
side-effect of fixing the aarch64 bootstrap problem (though the
problem in the uninit code is still there).

This is a fundamental change in what we've traditionally allowed for
jump threading, but I think it's a good thing.  It also paves the way
for combining the validity models for both the forward and the
backward threaders.

I am happy to field the PRs this may bring about, since every change
in the cost model has us questioning whether we should or shouldn't
thread a path.  But I may need some help from y'all if there's a
missing thread that causes a regression in some other pass.  That
being said, most of the issues that have come with the threader
changes haven't been because we thread less, but because we thread
more-- so perhaps restricting things is a good thing ;-).

Aldy

On Wed, Oct 6, 2021 at 12:22 PM Aldy Hernandez  wrote:
>
> [Here go the bits by Richi, tested on x86-64 Linux, and ongoing tests
> on aarch64 and ppc64le.]
>
> There is a lot of fall-out from this patch, as there were many threading
> tests that assumed the restrictions introduced by this patch were valid.
> Some tests have merely shifted the threading to after loop
> optimizations, but others ended up with no threading opportunities at
> all.  Surprisingly some tests ended up with more total threads.  It was
> a crapshoot all around.
>
> On a positive note, there are 6 tests that no longer XFAIL, and one
> guality test which now passes.
>
> I would appreciate someone reviewing the test changes.  I am unsure
> whether some of the tests should be removed, XFAILed, or some other
> thing.
>
> I felt a bit queasy about such a fundamental change wrt threading, so I
> ran it through my callgrind test harness (.ii files from a bootstrap).
> There was no change in overall compilation, DOM, or the VRP threaders.
>
> However, there was a slight increase of 1.63% in the backward threader.
> I'm pretty sure we could reduce this if we incorporated the restrictions
> into their profitability code.  This way we could stop the search when
> we ran into one of these restrictions.  Not sure it's worth it at this
> point.
>
> Note, that this ad-hoc testing is not meant to replace a more thorough
> SPEC, etc test.
>
> Tested on x86-64 Linux.
>
> OK pending tests on aarch64 and ppc64le?
>
> Co-authored-by: Richard Biener 



Re: [PATCH 4/5]AArch64 sve: optimize add reduction patterns

2021-10-14 Thread Richard Sandiford via Gcc-patches
Tamar Christina  writes:
> Hi All,
>
> The following loop does a conditional reduction using an add:
>
> #include 
>
> int32_t f (int32_t *restrict array, int len, int min)
> {
>   int32_t iSum = 0;
>
>   for (int i=0; i<len; i++) {
>     if (array[i] >= min)
>iSum += array[i];
>   }
>   return iSum;
> }
>
> for this we currently generate:
>
> mov z1.b, #0
> mov z2.s, w2
> mov z3.d, z1.d
> ptrue   p2.b, all
> ld1w z0.s, p0/z, [x0, x3, lsl 2]
> cmpge   p1.s, p2/z, z0.s, z2.s
> add x3, x3, x4
> sel z0.s, p1, z0.s, z3.s
> add z1.s, p0/m, z1.s, z0.s
> whilelo p0.s, w3, w1
>
> where the SEL is unneeded as it's selecting between 0 or a value.  This can be
> optimized to just doing the conditional add on p1 instead of p0.  After this
> patch we generate:
>
> mov z2.s, w2
> mov z0.b, #0
> ptrue   p1.b, all
> ld1w z1.s, p0/z, [x0, x3, lsl 2]
> cmpge   p0.s, p0/z, z1.s, z2.s
> add x3, x3, x4
> add z0.s, p0/m, z0.s, z1.s
> whilelo p0.s, w3, w1
>
> and so we drop the SEL and the 0 move.
>
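A scalar sketch of the equivalence the new match.pd rule below relies on
(illustration only, with made-up names; it is not part of the patch):

/* Adding a value that was masked to zero ...  */
static inline int
cond_add_before (int mask1, int mask2, int b, int d)
{
  int a = mask1 ? b : 0;
  return mask2 ? d + a : d;
}

/* ... is the same as adding under the conjunction of both masks.  */
static inline int
cond_add_after (int mask1, int mask2, int b, int d)
{
  return (mask1 && mask2) ? d + b : d;
}
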
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?

OK, thanks.

Richard

> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>   * match.pd: New rule.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/sve/pred-cond-reduc.c: New test.
>
> --- inline copy of patch -- 
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 
> 19cbad7592787a568d4a7cfd62746d5844c0be5f..ec98a302ac773647413f776fba15930ad247c747
>  100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -6978,6 +6978,18 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  && element_precision (type) == element_precision (op_type))
>  (view_convert (cond_op @2 @3 @4 @5 (view_convert:op_type @1)))
>  
> +/* Detect simplification for a conditional reduction where
> +
> +   a = mask1 ? b : 0
> +   c = mask2 ? d + a : d
> +
> +   is turned into
> +
> +   c = mask1 && mask2 ? d + b : d.  */
> +(simplify
> +  (IFN_COND_ADD @0 @1 (vec_cond @2 @3 integer_zerop) @1)
> +   (IFN_COND_ADD (bit_and @0 @2) @1 @3 @1))
> +
>  /* For pointers @0 and @2 and nonnegative constant offset @1, look for
> expressions like:
>  
> diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pred-cond-reduc.c 
> b/gcc/testsuite/gcc.target/aarch64/sve/pred-cond-reduc.c
> new file mode 100644
> index 
> ..bd53025d3f17224004244dadc88e0c68ded23f12
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/sve/pred-cond-reduc.c
> @@ -0,0 +1,18 @@
> +/* { dg-do assemble { target aarch64_asm_sve_ok } } */
> +/* { dg-options "-O3 --save-temps" } */
> +
> +#include 
> +
> +int32_t f (int32_t *restrict array, int len, int min)
> +{
> +  int32_t iSum = 0;
> +
> +  for (int i=0; i<len; i++) {
> +    if (array[i] >= min)
> +   iSum += array[i];
> +  }
> +  return iSum;
> +}
> +
> +
> +/* { dg-final { scan-assembler-not {\tsel\tz[0-9]+\.s, p1, z[0-9]+\.s, 
> z[0-9]+\.s} } } */


Re: [RFC] Port git gcc-descr to Python

2021-10-14 Thread Jakub Jelinek via Gcc-patches
On Thu, Oct 14, 2021 at 11:06:55AM +0200, Tobias Burnus wrote:
> Additionally, I observe the following (independent of the conversion):
> For 7433458d871f6bfe2169b9d7d04fec64bb142924, I get:
>r0-80854-g7433458d871f6b
> The question is whether we are happy that only revisions since
> basepoints/gcc-5 have a version number or whether we want to
> use releases/gcc-x.y.0 -> x.y as fallback when basepoint does not
> exist or back-added more basepoints? Admittedly, this will add '.',
> thus, maybe the answer is 'no'?

Yes, everything older than basepoints/gcc-5 is r0-N-gNN
by design.  Digits aren't allowed there and while we could use say
underscore, it doesn't seem to be worth it for such old history,
and all the bugzilla highlighting, web redirection etc. handles just
r number - number - g hexnumber
or
r number - number
formats.  It is more common to refer to the old commits through svn
revisions anyway.

Jakub



Re: [PATCH 3/5]AArch64 sve: do not keep negated mask and inverse mask live at the same time

2021-10-14 Thread Richard Sandiford via Gcc-patches
Sorry for the slow reply.

Tamar Christina  writes:
> Hi All,
>
> The following example:
>
> void f11(double * restrict z, double * restrict w, double * restrict x,
>double * restrict y, int n)
> {
> for (int i = 0; i < n; i++) {
> z[i] = (w[i] > 0) ? w[i] : y[i];
> }
> }
>
> Generates currently:
>
> ptrue   p2.b, all
> ld1d z0.d, p0/z, [x1, x2, lsl 3]
> fcmgt   p1.d, p2/z, z0.d, #0.0
> bic p3.b, p2/z, p0.b, p1.b
> ld1d z1.d, p3/z, [x3, x2, lsl 3]
>
> and after the previous patches generates:
>
> ptrue   p3.b, all
> ld1d z0.d, p0/z, [x1, x2, lsl 3]
> fcmgt   p1.d, p0/z, z0.d, #0.0
> fcmgt   p2.d, p3/z, z0.d, #0.0
> not p1.b, p0/z, p1.b
> ld1d z1.d, p1/z, [x3, x2, lsl 3]
>
> where a duplicate comparison is performed for w[i] > 0.
>
> This is because in the vectorizer we're emitting a comparison for both a and 
> ~a
> where we just need to emit one of them and invert the other.  After this patch
> we generate:
>
> ld1d z0.d, p0/z, [x1, x2, lsl 3]
> fcmgt   p1.d, p0/z, z0.d, #0.0
> mov p2.b, p1.b
> not p1.b, p0/z, p1.b
> ld1d z1.d, p1/z, [x3, x2, lsl 3]
>
> In order to perform the check I have to fully expand the NOT stmts when
> recording them as the SSA names for the top level expressions differ but
> their arguments don't. e.g. in _31 = ~_34 the value of _34 differs but not
> the operands in _34.
>
> But we only do this when the operation is an ordered one because mixing
> ordered and unordered expressions can lead to de-optimized code.

The idea looks good, but I think it should be keyed specifically
on whether we see (in either order):

  BIT_NOT_EXPR 
  COND_EXPR 

So I think scalar_cond_masked_key should have a boolean to say
“is the comparison result inverted?”  The new BIT_NOT_EXPR handling
in scalar_cond_masked_key::get_cond_ops_from_tree would then set this
flag but keep the feeding comparison code as-is (i.e. with no call
to invert_tree_comparison).  The vectorizable_condition code should
then try looking up the comparison with the new “inverted?” flag set.

There would then be no need for tree_comparison_ordered_p.  We'd only
do the optimisation if (~cmp_result & loop_mask) is known to be needed
elsewhere.
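
A minimal standalone sketch of the shape being suggested (not GCC code;
tree_code, the operand types and the cache itself are replaced by stand-ins):

enum cmp_code { CMP_LT, CMP_LE, CMP_GT, CMP_GE, CMP_EQ, CMP_NE };

struct scalar_cond_key_sketch
{
  enum cmp_code code;  /* comparison code, recorded exactly as written  */
  int op0, op1;        /* stand-ins for the comparison operands         */
  int inverted_p;      /* nonzero when the use is ~(op0 code op1)       */
};

static int
cond_key_equal (const struct scalar_cond_key_sketch *a,
		const struct scalar_cond_key_sketch *b)
{
  return a->code == b->code && a->op0 == b->op0 && a->op1 == b->op1
	 && a->inverted_p == b->inverted_p;
}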

This line:

> +   bitop1 = GT_EXPR;

looks like it should be using the original comparison code instead
of hard-coding to GT_EXPR.

Thanks,
Richard
  
>
> Note: This patch series is working incrementally towards generating the most
>   efficient code for this and other loops in small steps. The mov is
>   created by postreload when it does a late CSE.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>   * fold-const.c (tree_comparison_ordered_p): New.
>   * fold-const.h (tree_comparison_ordered_p): New.
>   * tree-vect-stmts.c (vectorizable_condition): Check if inverse of mask
>   is live.
>   * tree-vectorizer.c (scalar_cond_masked_key::get_cond_ops_from_tree):
>   Register mask inverses.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/sve/pred-not-gen.c: Update testcase.
>
> --- inline copy of patch -- 
> diff --git a/gcc/fold-const.h b/gcc/fold-const.h
> index 
> 7bac84ba33145c17d1dac9afe70bbd1c89a4b3fa..852fc37b25023a108410fcf375604d082357efa2
>  100644
> --- a/gcc/fold-const.h
> +++ b/gcc/fold-const.h
> @@ -144,6 +144,7 @@ extern enum tree_code swap_tree_comparison (enum 
> tree_code);
>  
>  extern bool ptr_difference_const (tree, tree, poly_int64_pod *);
>  extern enum tree_code invert_tree_comparison (enum tree_code, bool);
> +extern bool tree_comparison_ordered_p (enum tree_code);
>  extern bool inverse_conditions_p (const_tree, const_tree);
>  
>  extern bool tree_unary_nonzero_warnv_p (enum tree_code, tree, tree, bool *);
> diff --git a/gcc/fold-const.c b/gcc/fold-const.c
> index 
> 7dcecc9a5c08d56703075229f762f750ed6c5d93..04991457db7e5166e8ce17d4bfa3b107f619dbc1
>  100644
> --- a/gcc/fold-const.c
> +++ b/gcc/fold-const.c
> @@ -2669,6 +2669,37 @@ invert_tree_comparison (enum tree_code code, bool 
> honor_nans)
>  }
>  }
>  
> +/* Given a tree comparison code return whether the comparison is for an
> +   ordered expression or not.  */
> +
> +bool
> +tree_comparison_ordered_p (enum tree_code code)
> +{
> +  switch (code)
> +{
> +case EQ_EXPR:
> +case NE_EXPR:
> +case GT_EXPR:
> +case GE_EXPR:
> +case LT_EXPR:
> +case LE_EXPR:
> +case LTGT_EXPR:
> +  return true;
> +case UNEQ_EXPR:
> +case UNGT_EXPR:
> +case UNGE_EXPR:
> +case UNLT_EXPR:
> +case UNLE_EXPR:
> +case ORDERED_EXPR:
> +case UNORDERED_EXPR:
> +  return false;
> +default:
> +  gcc_unreachable ();
> +}
> +}
> +
> +
> +
>  /* Similar, but return the comparison that results if the operands are
> swapped.  This is safe for 

Re: [RFC] Port git gcc-descr to Python

2021-10-14 Thread Tobias Burnus

Hi Martin,

On 12.10.21 10:59, Martin Liška wrote:

There's a complete patch that implements both git gcc-descr and
gcc-undescr
and sets corresponding git aliases to use them.


When invoking ./contrib/git-describe.py directly, I get the error:
fatal: Not a valid object name /master

I think you need something like:
-   upstream = r.stdout.strip() if r.returncode else 'origin'
+   upstream = r.stdout.strip() if r.returncode == 0 else 'origin'

 * * *

Additionally, I observe the following (independent of the conversion):
For 7433458d871f6bfe2169b9d7d04fec64bb142924, I get:
   r0-80854-g7433458d871f6b
The question is whether we are happy that only revisions since
basepoints/gcc-5 have a version number or whether we want to
use releases/gcc-x.y.0 -> x.y as fallback when basepoint does not
exist or back-added more basepoints? Admittedly, this will add '.',
thus, maybe the answer is 'no'?

Tobias

-
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634
Munich; limited liability company; Managing Directors: Thomas Heurung, Frank
Thürauf; Registered office: Munich; Commercial register: Munich, HRB 106955


Re: [RFC][patch][PR102281]Clear padding for variables that are in registers

2021-10-14 Thread Jakub Jelinek via Gcc-patches
On Tue, Oct 12, 2021 at 10:51:45PM +, Qing Zhao wrote:
> PR102281 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102281
> Exposed an issue in the current implementation of the padding initialization 
> of -ftrivial-auto-var-init.
> 
> Currently, we add __builtin_clear_padding call _AFTER_ every explicit 
> initialization of an auto variable:
> 
> var_decl = {init_constructor};
> __builtin_clear_padding (&var_decl, 0B, 1);
> 
> the reason I added the call to EVERY auto variable that has explicit 
> initialization is, the folding of __builtin_clear_padding will automatically 
> turn this call to a NOP when there is no padding in the variable. So, we 
> don't need to check whether the variable has padding explicitly. 
> 
> However, always adding the call to __builtin_clear_padding (&var_decl,…) 
> might introduce invalid IR when VAR_DECL cannot be addressable. 
> 
> In order to resolve this issue, I propose the following solution:
> 
> Instead of adding the call to __builtin_clear_padding _AFTER_ the explicit 
> initialization, Using zero to initialize the whole variable BEFORE explicit 
> fields initialization when VAR_DECL has padding, i.e:
> 
> If (had_padding_p (var_decl))
> var_decl = ZERO;
> var_decl = {init_constructor};
> 
> This should resolve the invalid IR issue.  However, there might be more run 
> time overhead from such padding initialization since the whole variable is 
> set to zero instead of only the paddings. 
> 
> Please let me know you comments on this.

As I've tried to explain in the PR, this is wrong.
It makes no sense whatsoever to clear padding on is_gimple_reg
variables even if they do have padding (e.g. long double/_Complex long
double), because all the GIMPLE passes will just not care about it.

If you want to clear padding for those, it needs to be done where it
is forced into memory (e.g. expansion for returning or passing such
types in memory if they aren't passed in registers, or RA when the register
allocator decides to spill them into memory if even that is what you want to
cover).

For !is_gimple_reg vars, yes, something like clear_padding_type_has_padding_p
could be useful to avoid unnecessary IL growth, but it should be implemented
more efficiently, in particular for the answer to a question will
__builtin_clear_padding emit any code on this type at least for non-unions
there is no buffer covering the whole type needed, just next to
clear_in_mask bool add another bool for the new mode and in clear_padding_flush
in that mode just set another bool if clear_in_mask mode would clear any bit
in the mask if it was there.  For unions, I'm afraid it needs to do it
the clear_in_mask way and at the end analyze the buf.
Also, besides answer to the question "would __builtin_clear_padding emit any
code on this type" we should have another function/another mode which
returns true if any padding whatsoever is found, including unions.
For non-union it would be exactly the same thing, but in unions it would
keep using the same mode too, even if just one union member has any padding
return it.  And, for these modes, there could be shortcuts, once we find
some padding, it doesn't make sense to walk further fields.
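
Illustrative types only (not from the patch), showing what a "does this type
have any padding at all?" query would distinguish:

struct no_pad   { int a; int b; };   /* no padding bytes at all                */
struct mid_pad  { char c; int i; };  /* 3 padding bytes after 'c' on targets
					with 4-byte-aligned int                */
union  tail_pad { char c; int i; };  /* bytes 1-3 act as padding whenever the
					'c' member is the one that was stored  */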

Jakub



Re: [PATCH] combine: Check for paradoxical subreg

2021-10-14 Thread Richard Sandiford via Gcc-patches
Sorry for the slow review.

Robin Dapp via Gcc-patches  writes:
> Hi,
>
> while evaluating another patch that introduces more lvalue paradoxical 
> subregs I ran into an ICE in combine at
>
> wide_int o = wi::insert (rtx_mode_t (outer, temp_mode),
>  rtx_mode_t (inner, dest_mode),
>  offset, width);
>
> because width (=GET_MODE_PRECISION (dest_mode)) > GET_MODE_PRECISION 
> (temp_mode).
>
>  From the comments and the code it looks like we do not want to handle a 
> paradoxical subreg here so I quickly added a check for it which prevents 
> the ICE.  The check could also be added to reg_subword_p () I guess.

Yeah, I think adding it to reg_subword_p would be better, immediately
after the GET_CODE (x) == SUBREG test.

OK with that change, thanks.

Richard

>
> Is this the right thing to do or am I missing something, i.e. my other 
> patch might be doing something it should not?
>
> Bootstrap on s390x was successful and testsuite is looking good so far.
>
> Regards
>   Robin
>
> --
>
> gcc/ChangeLog:
>
>  * combine.c (try_combine): Check for paradoxical subreg.
>
> From 2b74c66f8d97c30c45e99336c9871cccd8cf94e1 Mon Sep 17 00:00:00 2001
> From: Robin Dapp 
> Date: Tue, 22 Jun 2021 17:35:37 +0200
> Subject: [PATCH 11/11] combine paradoxical subreg fix.
>
> ---
>  gcc/combine.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/combine.c b/gcc/combine.c
> index 6476812a212..de04560db21 100644
> --- a/gcc/combine.c
> +++ b/gcc/combine.c
> @@ -2782,7 +2782,8 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, 
> rtx_insn *i0,
>&& CONST_SCALAR_INT_P (SET_SRC (temp_expr))
>&& GET_CODE (PATTERN (i3)) == SET
>&& CONST_SCALAR_INT_P (SET_SRC (PATTERN (i3)))
> -  && reg_subword_p (SET_DEST (PATTERN (i3)), SET_DEST (temp_expr)))
> +  && reg_subword_p (SET_DEST (PATTERN (i3)), SET_DEST (temp_expr))
> +  && !paradoxical_subreg_p (SET_DEST (PATTERN (i3))))
>  {
>rtx dest = SET_DEST (PATTERN (i3));
>rtx temp_dest = SET_DEST (temp_expr);


Re: [PATCH 1/7] ifcvt: Check if cmovs are needed.

2021-10-14 Thread Richard Sandiford via Gcc-patches
Hi Robin,

Thanks for the update and sorry for the late response.

Robin Dapp  writes:
> Hi Richard,
>
>> Don't we still need this code (without the REG_DEAD handling) for the
>> case in which…
>> 
>>> +  /* As we are transforming
>>> +if (x > y)
>>> +  {
>>> +a = b;
>>> +c = d;
>>> +  }
>>> +into
>>> +  a = (x > y) ...
>>> +  c = (x > y) ...
>>> +
>>> +we potentially check x > y before every set.
>>> +Even though the check might be removed by subsequent passes, this means
>>> +that we cannot transform
>>> +  if (x > y)
>>> +{
>>> +  x = y;
>>> +  ...
>>> +}
>>> +into
>>> +  x = (x > y) ...
>>> +  ...
>>> +since this would invalidate x and the following to-be-removed checks.
>>> +Therefore we introduce a temporary every time we are about to
>>> +overwrite a variable used in the check.  Costing of a sequence with
>>> +these is going to be inaccurate so only use temporaries when
>>> +needed.  */
>>> +  if (reg_overlap_mentioned_p (target, cond))
>>> +   temp = gen_reg_rtx (GET_MODE (target));
>> 
>> …this code triggers?  I don't see otherwise how later uses of x would
>> pick up “temp” instead of the original target.  E.g. suppose we had:
>> 
>>  if (x > y)
>>{
>>  x = …;
>>  z = x; // x does not die here
>>}
>> 
>> Without the loop, it looks like z would pick up the old value of x
>> (used in the comparison) instead of the new one.
>
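A standalone C illustration of the overlap hazard being discussed (the
function and variable names are made up; this is not from the patch):

/* The condition reads *x, but the converted block also writes *x, so each
   converted set has to test a saved copy of the condition, and the second
   statement must still observe the newly stored value of *x.  */
void
clamp_then_copy (int *x, int y, int *z)
{
  if (*x > y)
    {
      *x = y;    /* overwrites a value the condition depends on   */
      *z = *x;   /* must see the new *x, not the pre-store value  */
    }
}
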
> getting back to this now.  I re-added handling of the situation you 
> mentioned (even though I didn't manage to trigger it myself).

Yeah, the only reliable way to test for this might be an rtx test
(see testsuite/gcc.dg/rtl for some examples).  A test isn't necessary
though.

My only remaining concern is that bb_ok_for_noce_convert_multiple_sets
allows subreg sources:

  if (!(REG_P (src)
   || (GET_CODE (src) == SUBREG && REG_P (SUBREG_REG (src))
   && subreg_lowpart_p (src))))
return false;

These subregs could also require rewrites.  It looks like the original
code checks for overlaps correctly, but doesn't necessarily deal with
a hit correctly.  So this is at least partly pre-existing.

I think the way of handling that is as follows:

> diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
> index 017944f4f79..f1448667732 100644
> --- a/gcc/ifcvt.c
> +++ b/gcc/ifcvt.c
> @@ -98,6 +98,8 @@ static int dead_or_predicable (basic_block, basic_block, 
> basic_block,
>  edge, int);
>  static void noce_emit_move_insn (rtx, rtx);
>  static rtx_insn *block_has_only_trap (basic_block);
> +static void need_cmov_or_rewire (basic_block, hash_set<rtx_insn *> *,
> +  hash_map<rtx_insn *, int> *);
>
>  /* Count the number of non-jump active insns in BB.  */
>  
> @@ -3203,6 +3205,11 @@ noce_convert_multiple_sets (struct noce_if_info 
> *if_info)
>   auto_vec<rtx_insn *> unmodified_insns;
>int count = 0;
>  
> +  hash_set<rtx_insn *> need_no_cmov;
> +  hash_map<rtx_insn *, int> rewired_src;
> +
> +  need_cmov_or_rewire (then_bb, &need_no_cmov, &rewired_src);
> +
>FOR_BB_INSNS (then_bb, insn)
>  {
>/* Skip over non-insns.  */
> @@ -3213,26 +3220,47 @@ noce_convert_multiple_sets (struct noce_if_info 
> *if_info)
>gcc_checking_assert (set);
>  
>rtx target = SET_DEST (set);
> -  rtx temp = gen_reg_rtx (GET_MODE (target));
> -  rtx new_val = SET_SRC (set);
> +  rtx temp;
> +
> +  int *ii = rewired_src.get (insn);
> +  rtx new_val = ii == NULL ? SET_SRC (set) : temporaries[*ii];

(1) here use something like:

rtx new_val = SET_SRC (set);
if (int *ii = rewired_src.get (insn))
  new_val = simplify_replace_rtx (new_val, targets[*ii],
  temporaries[*ii]);

>rtx old_val = target;
>  
> -  /* If we were supposed to read from an earlier write in this block,
> -  we've changed the register allocation.  Rewire the read.  While
> -  we are looking, also try to catch a swap idiom.  */
> -  for (int i = count - 1; i >= 0; --i)
> - if (reg_overlap_mentioned_p (new_val, targets[i]))
> -   {
> - /* Catch a "swap" style idiom.  */
> - if (find_reg_note (insn, REG_DEAD, new_val) != NULL_RTX)
> -   /* The write to targets[i] is only live until the read
> -  here.  As the condition codes match, we can propagate
> -  the set to here.  */
> -   new_val = SET_SRC (single_set (unmodified_insns[i]));
> - else
> -   new_val = temporaries[i];
> - break;
> -   }
> +  /* As we are transforming
> +  if (x > y)
> +{
> +  a = b;
> +  c = d;
> +}
> +  into
> +a = (x > y) ...
> +c = (x > y) ...
> +
> +  we potentially check x > y before every set.
> +  Even though the check might be removed by subsequent passes, this means
> +  that we cannot transform
> +if (x > y)
> +  

Re: [PATCH, rs6000] punish reload of lfiwzx when loading an int variable [PR102169, PR102146]

2021-10-14 Thread HAO CHEN GUI via Gcc-patches



On 14/10/2021 8:12 AM, Segher Boessenkool wrote:

On Wed, Sep 29, 2021 at 04:32:19PM +0800, HAO CHEN GUI wrote:

   The patch punishes reload of alternative pair of "d, Z" for
movsi_internal1. The reload occurs if 'Z' doesn't match and generates an
additional insn. So the memory reload should be punished.

As David says, why only for loads?  But also, why not for lxsiwzx (and
stxsiwx) as well?

But, what for all other uses of lfiwzx?  And lfiwax?

We need to find out why the register allocator considers it a good idea
to use FP regs here, and fix *that*?


Let me explain why it uses FP regs.

In the IRA pass, the costs of general, float and altivec registers are all zero, so 
it prefers the 'GEN_OR_VSX_REGS' class and is finally assigned register vs32.  Not 
sure whether the altivec registers are preferable for 'GEN_OR_VSX_REGS'.

    r120: preferred GEN_OR_VSX_REGS, alternative NO_REGS, allocno 
GEN_OR_VSX_REGS

  a0(r120,l0) costs: BASE_REGS:0,0 GENERAL_REGS:0,0 FLOAT_REGS:0,0 
ALTIVEC_REGS:0,0 VSX_REGS:4000,4000 GEN_OR_FLOAT_REGS:6000,6000 
GEN_OR_VSX_REGS:6000,6000 LINK_REGS:12000,12000 CTR_REGS:12000,12000 
LINK_OR_CTR_REGS:12000,12000 SPEC_OR_GEN_REGS:12000,12000 ALL_REGS:36000,36000 
MEM:8000,8000

  Allocno a0r120 of GEN_OR_VSX_REGS(93) has 93 avail. regs  0 3-12 14-95, 
node:  0 3-12 14-95 (confl regs =  1-2 13 96-110)
  Forming thread from colorable bucket:
  Pushing a0(r120,l0)(cost 0)
  Popping a0(r120,l0)  -- assign reg 32
Disposition:
    0:r120 l0    32

In the reload pass, the third alternative pair is "r,m" and the fourth pair is "d,Z".  
As 'r' doesn't match the class of reg vs32, that pair needs an output reload and gets a 
"reject++" as well as a "losers".  For the pair "d,Z", the second operand 'Z' doesn't 
match, so it needs an input reload and also gets a "losers".  But a memory reload is not 
punished when the alternative has no additional modifiers, so it only counts towards 
"addr_losers".  The overall cost of "r,m" therefore ends up greater than that of "d,Z", 
and reload picks the FP register and the 'lfiwzx' instruction.

    0 Non input pseudo reload: reject++
  alt=2,overall=7,losers=1,rld_nregs=1
  alt=3,overall=6,losers=1,rld_nregs=0

Gui Haochen



The extra insn you talk about is because this insn only allows indexed
addressing ([reg+reg] or [reg] addressing).  That is true for very many
insns.  Reload (well, LRA in the modern world) should know about such
extra costs.  Does it not?


gcc/
     * gcc/config/rs6000/rs6000.md (movsi_internal1): disparages
     slightly the alternative 'Z' of "lfiwzx" when reload is
needed.

"Disparage", no "s".  Changelog entries are written in the imperative.


Segher


Re: [RFC] Replace VRP with EVRP passes

2021-10-14 Thread Richard Biener via Gcc-patches
On Wed, Oct 13, 2021 at 10:58 PM Andrew MacLeod  wrote:
>
> As work has progressed, we're pretty close to being able to functionally
> replace VRP with another EVRP pass.  At least it seems close enough that
> we should discuss if thats something we might want to consider for this
> release.   Replacing just one of the 2 VRP passes is another option.
>
> First, lets examine simplifications/folds.
>
> Running over my set of 380 GCC source files, we see the following
> results for number of cases we currently get:
>
> Number of EVRP cases :   5789
> Number of VRP1 cases :   4913
> Number of VRP2 cases :   279
>   combined VRP1/2:   5192
>
> The 2 passes of VRP get a total of 5192 cases.
>
> If we run EVRP instead of each VRP pass, we get the following results:
>
> Number of EVRP1 cases :   5789
> Number of EVRP2 cases :   7521
> Number of EVRP3 cases :   2240
>   combined EVRP2/3:   9761
>
> so the EVRP passes find an additional 4569 opportunities,  or 88% more.
> This initially surprised me until it occurred to me that this is
> probably due to the pruning required for ASSERT_EXPRs, which means it
> never had a chance to see some of those cases. Notice how the second
> pass appears far more effective now.
>
> Regarding what we would miss if we took VRP out, if we run  EVRP passes
> first then a VRP pass immediately after, we see what VRP finds that EVRP
> cannot:
>
> Number of EVRP2 cases :  7521
> Number of VRP1 cases :   11
> Number of EVRP3 cases :  2269
> Number of VRP2 cases :   54
>
> I have looked at some of these, and so far they all appear to be cases
> which are solved via the iteration model VRP uses.

Most should be now handled by using SCEV analysis on PHIs rather
than VRP iteration so can you share an example where the iteration helps?

> regardless, missing
> 65 cases and getting 4569 new ones would seem to be a win. I will
> continue to investigate them.
>
> == Performance ==
>
> The threading work has been pulled out of VRP, so we get a better idea
> of what VRP's time really is.  We're showing about a 58% slowdown in VRP
> over the 2 passes.  I've begun investigating because it shouldn't be off
> by that much, Im seeing a lot of excess time being wasted with callback
> queries from the substitute_and_fold engine when processing PHIs.  It
> should be possible to bring this all back in line as that isnt the model
> ranger should be using anyway.
>
> I figured while I'm looking into the performance side of it, maybe we
> should start talking about whether we want to replace one or both of the
> VRP passes with an EVRP instance.
>
>
> I see 3 primary options:
> 1 - replace both VRP passes with EVRP instances.
> 2 - replace VRP2 with EVRP2
> 3 - Replace neither, leave it as is.
>
> I figure since the second pass of VRP doesn't get a lot to start with,
> it probably doesn't make sense to replace VRP1 and not VRP2.
>
> Option 1 is what I would expect the strategic move next release to be,
> it seems ready now, its just a matter of whether we want to give it more
> time.  It would also be trivial to turn VRP back on for one for both
> later in the cycle if we determine there was something important missing.
>
> option 2 is something we ought to really consider if we don't want to do
> option 1.  There are a few PRs that are starting to open that have VRP
> not getting something due to the whims of more precise mutli-ranges
> being converted back to a value_range, and replacing VRP2 would allow us
> to catch those..  plus, we pick up a lot more than  VRP2 does.
>
> I would propose we add a param, similar to what EVRP has which will
> allow us to choose which pass is called for VRP1 and VRP2 and set our
> defaults appropriately.  I wouldn't work with a hybrid like we did with
> EVRP... just choose which pass runs. And we'll have to adjust some
> testcases based one whatever our default is.
>
> Thoughts?
>
> Personally I think we give option 1 a go and if something shows up over
> the next couple of months, or  we cant get performance in line with
> where we want it, then we can switch back to VRP for one or both
> passes.  I wouldn't  expect either, but one never knows :-)
>
> If that isn't palatable for everyone, then I'd suggest option 2

How far are you with handling the symbolic range cases VRP gets
with the relation work?  The symbolic range handling is important
for Ada IIRC.  I would of course have hoped the testsuite would
catch important regressions there (did you test Ada?)

I think option 2 would be safest at least so any important iteration/symbolic
work is done early.

Thanks,
Richard.

> Andrew
>
>
>
>
>
>

