Re: [PATCH] vect: Add bias parameter for partial vectorization

2021-11-21 Thread Kewen.Lin via Gcc-patches
Hi Robin,

on 2021/11/12 5:56 PM, Robin Dapp wrote:
> Hi Kewen and Richard,
> 
> the attached v3 addresses the comments to v2, among others:
> 
>  - Rename to load_store where appropriate.
>  - Save the adjusted length as a separate control that is used instead
> of loop_len with a bias != 0 and added to the loop header.
>  - Update the costs to reflect a bias.
> 
> Bootstrap and regtest were fine on z15 and p9.
> 

Nice!  Some minor comments are inlined below.

> Regards
>  Robin
> 
> 
> vll-v3.patch
> 

...

>  extern void expand_addsub_overflow (location_t, tree_code, tree, tree, tree,
>   bool, bool, bool, bool, tree *);
> diff --git a/gcc/tree-vect-loop-manip.c b/gcc/tree-vect-loop-manip.c
> index 4988c93fdb6..931378820ac 100644
> --- a/gcc/tree-vect-loop-manip.c
> +++ b/gcc/tree-vect-loop-manip.c
> @@ -421,6 +421,7 @@ vect_maybe_permute_loop_masks (gimple_seq *seq, 
> rgroup_controls *dest_rgm,
>  static tree
>  vect_set_loop_controls_directly (class loop *loop, loop_vec_info loop_vinfo,
>gimple_seq *preheader_seq,
> +  gimple_seq *header_seq,
>gimple_stmt_iterator loop_cond_gsi,
>rgroup_controls *rgc, tree niters,
>tree niters_skip, bool might_wrap_p)
> @@ -436,7 +437,7 @@ vect_set_loop_controls_directly (class loop *loop, 
> loop_vec_info loop_vinfo,
>tree length_limit = NULL_TREE;
>/* For length, we need length_limit to ensure length in range.  */
>if (!use_masks_p)
> -length_limit = build_int_cst (compare_type, nitems_per_ctrl);
> +  length_limit = build_int_cst (compare_type, nitems_per_ctrl);
>  

Nit, seems like an unintentional change.

>/* Calculate the maximum number of item values that the rgroup
>   handles in total, the number that it handles for each iteration
> @@ -560,8 +561,9 @@ vect_set_loop_controls_directly (class loop *loop, 
> loop_vec_info loop_vinfo,
>  {
>/* Previous controls will cover BIAS items.  This control covers the
>next batch.  */
> +  tree bias_tree;
>poly_uint64 bias = nitems_per_ctrl * i;
> -  tree bias_tree = build_int_cst (compare_type, bias);
> +  bias_tree = build_int_cst (compare_type, bias);
>  

Same as above.

>/* See whether the first iteration of the vector loop is known
>to have a full control.  */
> @@ -664,6 +666,20 @@ vect_set_loop_controls_directly (class loop *loop, 
> loop_vec_info loop_vinfo,
>  
>vect_set_loop_control (loop, ctrl, init_ctrl, next_ctrl);
>  }
> +
> +  int partial_load_bias = LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo);
> +  if (partial_load_bias != 0
> +  && partial_load_bias != VECT_PARTIAL_BIAS_UNSUPPORTED)
> +{

IIUC, we don't need to check VECT_PARTIAL_BIAS_UNSUPPORTED again here?  Since
this is the transformation stage, surely it has already been checked before?

> +  tree adjusted_len = rgc->bias_adjusted_ctrl;
> +  gassign *minus = gimple_build_assign (adjusted_len, MINUS_EXPR,
> + rgc->controls[0],
> + build_int_cst
> + (TREE_TYPE (rgc->controls[0]),
> +  -partial_load_bias));
> +  gimple_seq_add_stmt (header_seq, minus);
> +}
> +
>return next_ctrl;
>  }
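
For readers following along, the arithmetic in the hunk above can be modelled
in plain C.  This is only a sketch under assumptions: partial_load and
partial_load_last_index are hypothetical stand-ins for a length-controlled
load, not GCC APIs, and the bias of -1 models a target whose length operand
names the index of the last byte rather than a byte count.

```c
#include <assert.h>
#include <string.h>

#define VEC_BYTES 16

/* Hypothetical model of a length-controlled load with bias 0:
   copies LEN bytes and leaves the rest of DST zeroed.  */
static void
partial_load (unsigned char *dst, const unsigned char *src, unsigned len)
{
  assert (len <= VEC_BYTES);
  memset (dst, 0, VEC_BYTES);
  memcpy (dst, src, len);
}

/* Hypothetical model of a target with bias -1: the operand is the index
   of the last byte to load, so LAST_IDX + 1 bytes are copied.  */
static void
partial_load_last_index (unsigned char *dst, const unsigned char *src,
                         unsigned last_idx)
{
  assert (last_idx < VEC_BYTES);
  memset (dst, 0, VEC_BYTES);
  memcpy (dst, src, last_idx + 1);
}

/* Mirrors the MINUS_EXPR in the patch: adjusted = len - (-bias).  */
static unsigned
bias_adjusted_len (unsigned len, int bias)
{
  return len - (unsigned) -bias;
}

/* Returns 1 if both models load the same bytes for every nonzero LEN.  */
static int
bias_models_agree (void)
{
  unsigned char src[VEC_BYTES], a[VEC_BYTES], b[VEC_BYTES];

  for (unsigned i = 0; i < VEC_BYTES; i++)
    src[i] = (unsigned char) (i + 1);

  for (unsigned len = 1; len <= VEC_BYTES; len++)
    {
      partial_load (a, src, len);
      partial_load_last_index (b, src, bias_adjusted_len (len, -1));
      if (memcmp (a, b, VEC_BYTES) != 0)
        return 0;
    }
  return 1;
}
```

In this model a length of zero has no representation once the bias is -1,
which is why the self-check starts at a length of one.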
>  
> @@ -744,6 +760,7 @@ vect_set_loop_condition_partial_vectors (class loop *loop,
>   /* Set up all controls for this group.  */
>   test_ctrl = vect_set_loop_controls_directly (loop, loop_vinfo,
>&preheader_seq,
> +  &header_seq,
>loop_cond_gsi, rgc,
>niters, niters_skip,
>might_wrap_p);
> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> index e94356d76e9..ceeb6920871 100644
> --- a/gcc/tree-vect-loop.c
> +++ b/gcc/tree-vect-loop.c
> @@ -1163,6 +1163,31 @@ vect_verify_loop_lens (loop_vec_info loop_vinfo)
>if (LOOP_VINFO_LENS (loop_vinfo).is_empty ())
>  return false;
>  
> +  machine_mode len_load_mode = get_len_load_store_mode
> +(loop_vinfo->vector_mode, true).require ();
> +  machine_mode len_store_mode = get_len_load_store_mode
> +(loop_vinfo->vector_mode, false).require ();
> +
> +  signed char partial_load_bias = internal_len_load_store_bias
> +(IFN_LEN_LOAD, len_load_mode);
> +
> +  signed char partial_store_bias = internal_len_load_store_bias
> +(IFN_LEN_STORE, len_store_mode);
> +
> +  gcc_assert (partial_load_bias == partial_store_bias);
> +
> +  if (partial_load_bias == VECT_PARTIAL_BIAS_UNSUPPORTED)
> +return false;
> +
> +  LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS (loop_vinfo) = partial_load_bias;
> +

Nit, it seems better 

[PATCH 2/2] tree-optimization: [PR92342] Move b & -(a==c) optimization to the gimple level

2021-11-21 Thread apinski--- via Gcc-patches
From: Andrew Pinski 

Combine disabled this optimization in r10-254-gddbb5da5199fb42, but it makes
sense to do it at the gimple level and then let expand decide which way is
better. So this adds the transformation at the gimple level (late, as was
done for the multiply case).
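
The identity behind the new pattern can be checked in plain C.  A minimal
sketch follows; the function names are illustrative and not part of the
patch.  It relies on a comparison yielding 0 or 1, so that its negation is
0 or -1 (all bits set in two's complement).

```c
#include <assert.h>

/* Source form: mask B with the negated comparison result.  */
static int
and_neg_cmp (int a, int c, int b)
{
  return b & -(a == c);
}

/* Form the pattern rewrites to: a conditional select.  */
static int
cond_select (int a, int c, int b)
{
  return (a == c) ? b : 0;
}

/* Asserts that both forms agree on a small grid of inputs.  */
static void
check_and_neg_cmp (void)
{
  static const int vals[] = { -7, -1, 0, 1, 42 };
  const int n = (int) (sizeof vals / sizeof vals[0]);

  for (int i = 0; i < n; i++)
    for (int j = 0; j < n; j++)
      for (int k = 0; k < n; k++)
        assert (and_neg_cmp (vals[i], vals[j], vals[k])
                == cond_select (vals[i], vals[j], vals[k]));
}
```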

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

PR tree-optimization/92342

gcc/ChangeLog:

* match.pd (b & -(a CMP c) -> (a CMP c)?b:0): New pattern.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/andnegcmp-1.c: New test.
* gcc.dg/tree-ssa/andnegcmp-2.c: New test.
---
 gcc/match.pd|  8 +++-
 gcc/testsuite/gcc.dg/tree-ssa/andnegcmp-1.c | 14 ++
 gcc/testsuite/gcc.dg/tree-ssa/andnegcmp-2.c | 14 ++
 3 files changed, 35 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/andnegcmp-1.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/andnegcmp-2.c

diff --git a/gcc/match.pd b/gcc/match.pd
index ed43c321cbc..b55cbc91b57 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -1794,7 +1794,13 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (for cmp (tcc_comparison)
   (simplify
(mult:c (convert (cmp @0 @1)) @2)
-   (cond (cmp @0 @1) @2 { build_zero_cst (type); }))))
+   (cond (cmp @0 @1) @2 { build_zero_cst (type); }))
+/* (-(m1 CMP m2)) & d -> (m1 CMP m2) ? d : 0  */
+  (simplify
+   (bit_and:c (negate (convert (cmp @0 @1))) @2)
+   (cond (cmp @0 @1) @2 { build_zero_cst (type); }))
+ )
+)
 
 /* For integral types with undefined overflow and C != 0 fold
x * C EQ/NE y * C into x EQ/NE y.  */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/andnegcmp-1.c 
b/gcc/testsuite/gcc.dg/tree-ssa/andnegcmp-1.c
new file mode 100644
index 000..6f16783f169
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/andnegcmp-1.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+/* PR tree-optimization/92342 */
+
+int
+f (int m1, int m2, int c)
+{
+  int d = m1 == m2;
+  d = -d;
+  int e = d & c;
+  return e;
+}
+
+/* { dg-final { scan-tree-dump-times "\\? c_\[0-9\]\\(D\\) : 0" 1 "optimized" 
} } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/andnegcmp-2.c 
b/gcc/testsuite/gcc.dg/tree-ssa/andnegcmp-2.c
new file mode 100644
index 000..0e25c8abc39
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/andnegcmp-2.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+/* PR tree-optimization/92342 */
+
+int
+f (int m1, int m2, int c)
+{
+  int d = m1 < m2;
+  d = -d;
+  int e = c & d;
+  return e;
+}
+
+/* { dg-final { scan-tree-dump-times "\\? c_\[0-9\]\\(D\\) : 0" 1 "optimized" 
} } */
-- 
2.17.1



[PATCH 1/2] Improve/Fix (m1 CMP m2) * d -> (m1 CMP m2) ? d : 0 pattern.

2021-11-21 Thread apinski--- via Gcc-patches
From: Andrew Pinski 

The pattern here was not catching all comparisons and the multiply
was not commutative when it should have been. This patch fixes
that by using tcc_comparison and adding :c to the multiply.
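
As a plain C illustration of what the widened pattern covers: the old list
(gt lt ge le) left out == and !=, and without :c only the operand order with
the comparison first was matched.  The helper names below are made up for
this sketch and do not appear in the patch.

```c
#include <assert.h>

/* Comparison on the left of the multiply, with == and !=.  */
static int mul_eq_left (int m1, int m2, int d) { return (m1 == m2) * d; }
static int mul_ne_left (int m1, int m2, int d) { return (m1 != m2) * d; }

/* Comparison on the right; ":c" lets the pattern match this order too.  */
static int mul_eq_right (int m1, int m2, int d) { return d * (m1 == m2); }
static int mul_ne_right (int m1, int m2, int d) { return d * (m1 != m2); }

/* The select forms the pattern produces.  */
static int sel_eq (int m1, int m2, int d) { return (m1 == m2) ? d : 0; }
static int sel_ne (int m1, int m2, int d) { return (m1 != m2) ? d : 0; }

/* Asserts that each multiply form equals its select form.  */
static void
check_mult_select (void)
{
  static const int vals[] = { -3, 0, 1, 5 };
  const int n = (int) (sizeof vals / sizeof vals[0]);

  for (int i = 0; i < n; i++)
    for (int j = 0; j < n; j++)
      for (int k = 0; k < n; k++)
        {
          int m1 = vals[i], m2 = vals[j], d = vals[k];
          assert (mul_eq_left (m1, m2, d) == sel_eq (m1, m2, d));
          assert (mul_eq_right (m1, m2, d) == sel_eq (m1, m2, d));
          assert (mul_ne_left (m1, m2, d) == sel_ne (m1, m2, d));
          assert (mul_ne_right (m1, m2, d) == sel_ne (m1, m2, d));
        }
}
```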

OK? Bootstrapped and tested on x86_64-linux-gnu with no regressions.

gcc/ChangeLog:

* match.pd ((m1 CMP m2) * d -> (m1 CMP m2) ? d : 0):
Use tcc_comparison and :c for the multiply.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/multcmp-1.c: New test.
* gcc.dg/tree-ssa/multcmp-2.c: New test.
---
 gcc/match.pd  |  4 ++--
 gcc/testsuite/gcc.dg/tree-ssa/multcmp-1.c | 12 
 gcc/testsuite/gcc.dg/tree-ssa/multcmp-2.c | 12 
 3 files changed, 26 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/multcmp-1.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/multcmp-2.c

diff --git a/gcc/match.pd b/gcc/match.pd
index ca6c9eff624..ed43c321cbc 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -1791,9 +1791,9 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 
 /* (m1 CMP m2) * d -> (m1 CMP m2) ? d : 0  */
 (if (!canonicalize_math_p ())
- (for cmp (gt lt ge le)
+ (for cmp (tcc_comparison)
   (simplify
-   (mult (convert (cmp @0 @1)) @2)
+   (mult:c (convert (cmp @0 @1)) @2)
(cond (cmp @0 @1) @2 { build_zero_cst (type); }))))
 
 /* For integral types with undefined overflow and C != 0 fold
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/multcmp-1.c 
b/gcc/testsuite/gcc.dg/tree-ssa/multcmp-1.c
new file mode 100644
index 000..fb44cacde77
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/multcmp-1.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+int
+f (int m1, int m2, int c)
+{
+  int d = m1 == m2;
+  int e = d * c;
+  return e;
+}
+
+/* { dg-final { scan-tree-dump-times "\\? c_\[0-9\]\\(D\\) : 0" 1 "optimized" 
} } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/multcmp-2.c 
b/gcc/testsuite/gcc.dg/tree-ssa/multcmp-2.c
new file mode 100644
index 000..be38b2e0044
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/multcmp-2.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+
+int
+f (int m1, int m2, int c)
+{
+  int d = m1 != m2;
+  int e = c * d;
+  return e;
+}
+
+/* { dg-final { scan-tree-dump-times "\\? c_\[0-9\]\\(D\\) : 0" 1 "optimized" 
} } */
-- 
2.17.1



Re: [PATCH] Don't allow mask/sse/mmx mov in TLS code sequences.

2021-11-21 Thread Hongtao Liu via Gcc-patches
On Fri, Nov 19, 2021 at 3:53 PM Uros Bizjak via Gcc-patches
 wrote:
>
> On Fri, Nov 19, 2021 at 8:50 AM Uros Bizjak  wrote:
> >
> > On Fri, Nov 19, 2021 at 2:14 AM liuhongt  wrote:
> > >
> > > >Why is the above declared as a special memory constraint? Also the
> > > > Changed to define_memory_constraint, since reload can make them
> > > > match by converting the operand to the form ‘(mem (reg X))’, where
> > > > X is a base register (from the register class specified by
> > > > BASE_REG_CLASS).
> > >
> > > >predicate comment is missing and the description should say something
> > > >like:
> > > >
> > > >@internal TLS address that allows insn using non-integer registers
> > > Changed.
> > >
> > > >I think it is better to avoid negative logic. So, something like
> > > >
> > > >Return true if the TLS address requires insn using integer registers.
> > > >
> > > >bool
> > > >ix86_gpr_tls_address_pattern_p
> > > >
> > > >(and use not in the predicate)
> > >
> > > Changed.
> > >
> > > >> +{
> > > >> +  gcc_assert (MEM_P (mem));
> > > >> +
> > > >> +  rtx addr = XEXP (mem, 0);
> > > >> +  subrtx_var_iterator::array_type array;
> > > >> +  FOR_EACH_SUBRTX_VAR (iter, array, addr, ALL)
> > > >> +{
> > > >> +  rtx op = *iter;
> > > >> +  if (GET_CODE (op) == UNSPEC)
> > > >> +   switch (XINT (op, 1))
> > > >> + {
> > > >> + case UNSPEC_GOTNTPOFF:
> > > >> +   return false;
> > > >> + case UNSPEC_TPOFF:
> > > >> +   if (!TARGET_64BIT)
> > > >> + return false;
> > > >> +   break;
> > > >> + default:
> > > >> +   break;
> > > >> + }
> > > >> +  /* Should iter.skip_subrtxes ();
> > > >> +if there's no inner UNSPEC in addr???.  */
> > >
> > > >You should figure the above before submitting the patch.
> > > ix86_print_operand_address_as shows there could be inner UNSPEC in addr, 
> > > so
> > > remove comments.
> > >
> > > >Can you please minimize the testcase?
> > > Done;
> > >
> > > > Following a change in the assembler (see [1]), this patch disallows
> > > > mask/sse/mmx mov in TLS code sequences that require integer MOV
> > > > instructions.
> > >
> > > [1] 
> > > https://sourceware.org/git/?p=binutils-gdb.git;a=patch;h=d7e3e627027fcf37d63e284144fe27ff4eba36b5
> > >
> > > gcc/ChangeLog:
> > >
> > > PR target/103275
> > > * config/i386/i386-protos.h (ix86_gpr_tls_address_pattern_p):
> > > Declare.
> > > * config/i386/i386.c (ix86_gpr_tls_address_pattern_p): New
> > > function.
> > > * config/i386/i386.md (*movsi_internal): Don't allow
> > > mask/sse/mmx move in TLS code sequences.
> > > (*movdi_internal): Ditto.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > * gcc.target/i386/pr103275.c: New test.
> >
> > OK, with a small comment adjustment below.
>
> Oops, sorry, I was too fast. You can simplify the patch to change the
> constraint from
>
> *km to *kBk
>
> Then no renumbering is needed.
Yes, changed, thanks for the review.
This is the final patch I'm checking in.
>
> Uros.
>
> >
> > Thanks,
> > Uros.
> >
> > > ---
> > >  gcc/config/i386/constraints.md   |  5 ++
> > >  gcc/config/i386/i386-protos.h|  1 +
> > >  gcc/config/i386/i386.c   | 32 +
> > >  gcc/config/i386/i386.md  | 18 ++---
> > >  gcc/testsuite/gcc.target/i386/pr103275.c | 83 
> > >  5 files changed, 130 insertions(+), 9 deletions(-)
> > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr103275.c
> > >
> > > diff --git a/gcc/config/i386/constraints.md 
> > > b/gcc/config/i386/constraints.md
> > > index eaa582d2055..15c5950ee6f 100644
> > > --- a/gcc/config/i386/constraints.md
> > > +++ b/gcc/config/i386/constraints.md
> > > @@ -185,6 +185,11 @@ (define_special_memory_constraint "Bc"
> > >(and (match_operand 0 "memory_operand")
> > > (match_test "constant_address_p (XEXP (op, 0))")))
> > >
> > > +(define_memory_constraint "Bk"
> > > +  "@internal TLS address that allows insn using non-integer registers."
> > > +  (and (match_operand 0 "memory_operand")
> > > > +   (not (match_test "ix86_gpr_tls_address_pattern_p (op)"))))
> > > +
> > >  (define_special_memory_constraint "Bn"
> > >"@internal Memory operand without REX prefix."
> > >(match_operand 0 "norex_memory_operand"))
> > > diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
> > > index 7782cf1163f..941e91636d8 100644
> > > --- a/gcc/config/i386/i386-protos.h
> > > +++ b/gcc/config/i386/i386-protos.h
> > > @@ -240,6 +240,7 @@ extern unsigned int ix86_get_callcvt (const_tree);
> > >  #endif
> > >
> > >  extern rtx ix86_tls_module_base (void);
> > > +extern bool ix86_gpr_tls_address_pattern_p (rtx);
> > >  extern bool ix86_tls_address_pattern_p (rtx);
> > >  extern rtx ix86_rewrite_tls_address (rtx);
> > >
> > > diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> > > index 42c47d2b12b..68079e4230e 

Re: [PATCH, rs6000] optimization for vec_reve builtin [PR100868]

2021-11-21 Thread David Edelsohn via Gcc-patches
On Wed, Nov 17, 2021 at 3:28 AM HAO CHEN GUI  wrote:
>
> Hi,
>
>   The patch optimizes the vec_reve builtin on rs6000. For V2DI and V2DF, it
> is implemented by xxswapd on all targets. For V16QI, V8HI, V4SI and V4SF, it
> is implemented by a quadword byte reverse plus a halfword/word byte reverse
> when p9_vector is set.
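
The two-step scheme described above can be modelled in portable C.  This is
a sketch only: the helpers are hypothetical scalar stand-ins for the quadword
and per-element byte reverses, not the rs6000 instructions themselves.
Reversing all 16 bytes and then reversing the bytes inside each element
leaves the elements in reverse order with their contents intact.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Reverse all 16 bytes (models a quadword byte reverse).  */
static void
reverse_bytes_16 (uint8_t v[16])
{
  for (int i = 0; i < 8; i++)
    {
      uint8_t t = v[i];
      v[i] = v[15 - i];
      v[15 - i] = t;
    }
}

/* Reverse the bytes inside each ELT_SIZE-byte element
   (models a halfword or word byte reverse).  */
static void
reverse_bytes_in_elements (uint8_t v[16], int elt_size)
{
  assert (elt_size > 0 && 16 % elt_size == 0);
  for (int base = 0; base < 16; base += elt_size)
    for (int i = 0; i < elt_size / 2; i++)
      {
        uint8_t t = v[base + i];
        v[base + i] = v[base + elt_size - 1 - i];
        v[base + elt_size - 1 - i] = t;
      }
}

/* Element reversal of four 32-bit words, built from the two steps.  */
static void
reverse_words (uint32_t w[4])
{
  uint8_t bytes[16];

  memcpy (bytes, w, sizeof bytes);
  reverse_bytes_16 (bytes);
  reverse_bytes_in_elements (bytes, 4);
  memcpy (w, bytes, sizeof bytes);
}
```

For byte elements the second step is a no-op, which matches the single
instruction listed for v16qi in the ChangeLog.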
>
>   Bootstrapped and tested on powerpc64le-linux with no regressions. Is this 
> okay for trunk? Any recommendations? Thanks a lot.
>
> ChangeLog
> 2021-11-17 Haochen Gui 
>
> gcc/
> * config/rs6000/altivec.md (altivec_vreve2 for VEC_K): Use
> xxbrq for v16qi, xxbrq + xxbrh for v8hi and xxbrq + xxbrw for v4si
> or v4sf when p9_vector is set.
> (altivec_vreve2 for VEC_64): Defined. Implemented by xxswapd.
>
> gcc/testsuite/
> * gcc.target/powerpc/vec_reve_1.c: New test.
> * gcc.target/powerpc/vec_reve_2.c: Likewise.

This is okay.

Please don't send a message that contains the patch as both an inline
message and as an attachment.

Thanks, David


[PATCH 3/3] c++: P1997 array-copy extensions: Assignment, return, etc. [PR103238]

2021-11-21 Thread Will Wray via Gcc-patches
This second patch completes the work of the first 'array-copy' patch to
provide first-cut implementations of all P1997 features. It adds:

 * Assignments to arrays from array values,    a = b;
 * Placeholder auto in array declarations,     auto cp[] = a;
 * Array as a return type from functions WIP,  auto f() -> T[N];
 * Parsing of array pseudo-destructors,        a.~A()
   (only parsing for now, untested)

Assignments a = b were easily allowed by changing branch conditions.
Assignments a = {e...} were trickier (a case not mentioned in P1997):

int a[16]; a = {0,1,1,2}; a = {}; // assignments from init-lists

The semantics is the same as for struct aggregates:
(1) Aggregate initialization of an rhs array of the lhs type
(so trailing elements with no initializer are value initialized)
(2) Copy-initialization of the lhs from the rhs.

The special case of an optionally-braced array value is allowed so that
a = b and a = {b} are generally equivalent for same type arrays a and b.
However, the now special-special case of assignment from a braced string-
literal currently only supports exact-match (same as for other arrays):

char a[4]; a={"c++"} /* OK */; a={"c"} /* FAILs but should work */;
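
The two-step semantics for init-list assignment can be illustrated today
with a struct-wrapped array, where standard C already permits assignment.
This is a sketch only: struct wrap and the helper names are hypothetical,
and a compound literal stands in for the aggregate-initialized rhs.

```c
#include <assert.h>

struct wrap { int a[16]; };

/* Models "a = {0,1,1,2};" for a wrapped array.  */
static void
assign_from_list (struct wrap *lhs)
{
  /* (1) Aggregate-initialize an rhs of the lhs type; elements without an
     initializer are zeroed.  (2) Copy it into the lhs.  */
  *lhs = (struct wrap) { { 0, 1, 1, 2 } };
}

/* Models "a = {};" (spelled { 0 } here for C standards before C23).  */
static void
assign_empty (struct wrap *lhs)
{
  *lhs = (struct wrap) { { 0 } };
}

/* Fills every element with V so that overwritten elements are visible.  */
static void
fill (struct wrap *w, int v)
{
  for (int i = 0; i < 16; i++)
    w->a[i] = v;
}

/* Asserts that trailing elements are zeroed by both assignments.  */
static void
check_assignments (void)
{
  struct wrap w;

  fill (&w, 9);
  assign_from_list (&w);
  assert (w.a[1] == 1 && w.a[3] == 2 && w.a[4] == 0 && w.a[15] == 0);

  fill (&w, 9);
  assign_empty (&w);
  assert (w.a[0] == 0 && w.a[15] == 0);
}
```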

Array return from function is work in progress. The tests show what works.
I'm stuck in unfamiliar territory so it's best to submit what I have to be
reviewed for hints on how to progress.

Please try the patch; play, stress it, and report the FAILS.

PR c++/103238

gcc/c/ChangeLog:

* c-decl.c (grokdeclarator): Don't complain of array returns.

gcc/cp/ChangeLog:

* call.c (can_convert_array): Extend to include array inits.
(standard_conversion): No decay for same-type array. Call build_conv.
(implicit_conversion_1): Call reshape_init for arrays too.
* decl.c (grokdeclarator): Don't complain of array returns.
* parser.c (cp_parser_postfix_dot_deref_expression): parse array ~A().
* pt.c (tsubst_function_type): Array type return is not a failure.
(do_auto_deduction): Placeholder auto deduction of array element type.
* tree.c (lvalue_kind): clk_class should include array (I think?).
* typeck.c (cp_build_modify_expr): Call reshape init to strip optional
braces. Allow NOP_EXPR for array assignment.
(convert_for_assignment): New if-block for same-type array convert,
strips optional braces, but rejects STRING_CST rhs shorter than lhs.

gcc/testsuite/ChangeLog:

* g++.dg/init/array-copy10.C: New test. auto[] deduce 'after' PASSes
* g++.dg/init/array-copy11.C: New test. Array return 'before' XFAILs
* g++.dg/init/array-copy12.C: New test. Array return 'after' PASSes
* g++.dg/init/array-copy7.C: New test. Array assign 'before' XFAILs
* g++.dg/init/array-copy8.C: New test. Array assign 'after' PASSes
* g++.dg/init/array-copy9.C: New test. auto[] deduce 'before' XFAILs
---
 gcc/c/c-decl.c   |  2 +-
 gcc/cp/call.c| 43 +++--
 gcc/cp/decl.c|  2 +-
 gcc/cp/parser.c  |  4 +-
 gcc/cp/pt.c  | 13 +-
 gcc/cp/tree.c|  3 +-
 gcc/cp/typeck.c  | 26 +--
 gcc/testsuite/g++.dg/init/array-copy10.C | 57 +++
 gcc/testsuite/g++.dg/init/array-copy11.C | 13 ++
 gcc/testsuite/g++.dg/init/array-copy12.C | 79 
 gcc/testsuite/g++.dg/init/array-copy7.C  | 40 
 gcc/testsuite/g++.dg/init/array-copy8.C  | 56 ++
 gcc/testsuite/g++.dg/init/array-copy9.C  | 57 +++
 13 files changed, 372 insertions(+), 23 deletions(-)

diff --git a/gcc/c/c-decl.c b/gcc/c/c-decl.c
index 3e28a038095..031c43d189f 100644
--- a/gcc/c/c-decl.c
+++ b/gcc/c/c-decl.c
@@ -7055,7 +7055,7 @@ grokdeclarator (const struct c_declarator *declarator,
"returning a function");
type = integer_type_node;
  }
-   if (TREE_CODE (type) == ARRAY_TYPE)
+   if (TREE_CODE (type) == ARRAY_TYPE && !flag_array_copy)
  {
if (name)
  error_at (loc, "%qE declared as function returning an array",
diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index 4ee21c7bdbd..c73fb73d86e 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -908,29 +908,34 @@ static bool
 can_convert_array (tree atype, tree from, int flags, tsubst_flags_t complain)
 {
   tree elttype = TREE_TYPE (atype);
-  unsigned i;
 
   if (TREE_CODE (from) == CONSTRUCTOR)
 {
-  for (i = 0; i < CONSTRUCTOR_NELTS (from); ++i)
+  for (auto&& ce : CONSTRUCTOR_ELTS (from))
{
- tree val = CONSTRUCTOR_ELT (from, i)->value;
- bool ok;
- if (TREE_CODE (elttype) == ARRAY_TYPE)
-   ok = can_convert_array (elttype, val, flags, 

[PATCH 2/3] c++: P1997 array-copy extensions: Initialization [PR103238]

2021-11-21 Thread Will Wray via Gcc-patches
This patch implements initializations of arrays from array values.

The first of two 'array-copy' patches, it adds the option -farray-copy
(flag_array_copy) to enable all features of P1997 (copy related or not),
documented as experimental extensions.

It deals with initialization of array variables and member array fields.

Initialization of an array variable from an array of the same type performs
array copy-initialization; elementwise move or copy from an rvalue or lvalue
array respectively, in index order from begin to end. The existing code path
for a structured binding declaration with array initializer, auto[e...]{a};
performs the same array copy-initialization (as a special case superpower).
Borrowing from that, this was a relatively quick and easy change.

Initialization of member arrays proved much more difficult to do in general.
I resorted to trial and error, running gcc in gdb with test cases to work out
where and what to change, until eventually converging on this set of changes.

One starting point was the C special case of char array initialization from
string literals (as char array lvalue constants). However, a long-standing
bug in designated initialization of char arrays by string literals blocked
the task of extending this special case to general array type initializers.
A bugfix patch was separated out, to be merged ahead of these patches:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55227
https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584897.html

Other cases to consider, array initializations:

* by optionally brace-enclosed or paren-enclosed array values
* by possibly-designated array-valued aggregate initializers
  (within possibly-elided braced init-lists)
* by brace or paren-enclosed array values in member initialization lists
* by array-valued member initializers

The patch adds tests for these cases, and for inner initializations of nested
array elements of array type.

The work has diverged in details from the P1997 wording, including catching
up with C++20 changes such as parenthesised initialization of aggregates.
The paper will be revised to reflect the implementation experience.

It is likely that there are omissions, errors in the conditions or that changed
code is inappropriate. For example, I inserted a new call to build_array_copy
in typeck2.c:digest_init_r which may not be correct for move-enabled elements.
Please review carefully with this in mind and suggest test cases to exercise.

PR c++/103238

gcc/c-family/ChangeLog:

* c-common.c (complete_array_type): Accept array type initial_value.
* c.opt: New option -farray-copy "experimental extensions for P1997".

gcc/cp/ChangeLog:

* decl.c (do_aggregate_paren_init): Accept single array type init.
(maybe_deduce_size_from_array_init): Include same-type array inits,
or complain for not same-type arrays.
(reshape_init_r): Extend string-literal handling to all array types.
* init.c (build_aggr_init): Follow existing path for array rhs.
* typeck.c (cp_build_modify_expr): Follow path for synthetic op=.
* typeck2.c (digest_init_r): Add call to build_array_copy for
same-type arrays ('copy' feels wrong for move-eligible rhs).

gcc/ChangeLog:

* doc/invoke.texi: -farray-copy help info documentation.

gcc/testsuite/ChangeLog:

* g++.dg/init/array-copy1.C: New test. Variable init 'before' XFAILs
* g++.dg/init/array-copy2.C: New test. Variable init 'after' PASSes
* g++.dg/init/array-copy3.C: New test. Member init 'before' XFAILs
* g++.dg/init/array-copy4.C: New test. Member init 'after' PASSes
* g++.dg/init/array-copy5.C: New test. Member nsdmi & desig XFAILs
* g++.dg/init/array-copy6.C: New test. Member nsdmi & desig PASSes
---
 gcc/c-family/c-common.c |  5 +++
 gcc/c-family/c.opt  |  4 ++
 gcc/cp/decl.c   | 61 -
 gcc/cp/init.c   |  6 ++-
 gcc/cp/typeck.c |  9 +++--
 gcc/cp/typeck2.c| 30 +++
 gcc/doc/invoke.texi |  6 +++
 gcc/testsuite/g++.dg/init/array-copy1.C | 66 
 gcc/testsuite/g++.dg/init/array-copy2.C | 68 +
 gcc/testsuite/g++.dg/init/array-copy3.C | 41 
 gcc/testsuite/g++.dg/init/array-copy4.C | 42 
 gcc/testsuite/g++.dg/init/array-copy5.C | 36 +
 gcc/testsuite/g++.dg/init/array-copy6.C | 51 +
 13 files changed, 395 insertions(+), 30 deletions(-)

diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 86c007f53de..fb0b1ef294f 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -6796,6 +6796,11 @@ complete_array_type (tree *ptype, tree initial_value, 
bool do_default)
= 

[PATCH 1/3] c++: designated init of char array by string constant [PR55227]

2021-11-21 Thread Will Wray via Gcc-patches
Also address "FIXME: this code is duplicated from reshape_init" in
cp_complete_array_type by always calling reshape_init on init-list.
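
For comparison, the accepted forms look like this when written as C, where
designated initialization of a char array by a string literal, braced or
not, is already valid.  This is a sketch that mirrors cases a to d of the
new test; the check function is added for illustration only.

```c
#include <assert.h>

struct C { char a[2]; };

/* Designated, unbraced, exact-size string literal.  */
static const struct C a = { .a = "a" };
/* Designated, shorter literal: the remaining element is zeroed.  */
static const struct C b = { .a = "" };
/* Designated and braced string literals.  */
static const struct C c = { .a = { "" } };
static const struct C d = { .a = { "a" } };

/* Asserts the contents each initializer produces.  */
static void
check_desig (void)
{
  assert (a.a[0] == 'a' && a.a[1] == '\0');
  assert (b.a[0] == '\0' && b.a[1] == '\0');
  assert (c.a[0] == '\0' && c.a[1] == '\0');
  assert (d.a[0] == 'a' && d.a[1] == '\0');
}
```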

PR c++/55227

gcc/cp/ChangeLog:

* decl.c (reshape_init_r): Only call has_designator_check when
   first_initializer_p or for the inner constructor element.
(cp_complete_array_type): Call reshape_init on braced-init-list.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/desig20.C: New test.
---
 gcc/cp/decl.c| 42 +--
 gcc/testsuite/g++.dg/cpp2a/desig20.C | 48 
 2 files changed, 65 insertions(+), 25 deletions(-)

diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c
index 2ddf0e4a524..83a2d3bf8f1 100644
--- a/gcc/cp/decl.c
+++ b/gcc/cp/decl.c
@@ -6824,28 +6824,31 @@ reshape_init_r (tree type, reshape_iter *d, tree 
first_initializer_p,
   if (TREE_CODE (type) == ARRAY_TYPE
   && char_type_p (TYPE_MAIN_VARIANT (TREE_TYPE (type))))
 {
-  tree str_init = init;
-  tree stripped_str_init = stripped_init;
+  tree arr_init = init;
+  tree stripped_arr_init = stripped_init;
+  reshape_iter stripd = {};
 
   /* Strip one level of braces if and only if they enclose a single
 element (as allowed by [dcl.init.string]).  */
   if (!first_initializer_p
- && TREE_CODE (stripped_str_init) == CONSTRUCTOR
- && CONSTRUCTOR_NELTS (stripped_str_init) == 1)
+ && TREE_CODE (stripped_arr_init) == CONSTRUCTOR
+ && CONSTRUCTOR_NELTS (stripped_arr_init) == 1)
{
- str_init = (*CONSTRUCTOR_ELTS (stripped_str_init))[0].value;
- stripped_str_init = tree_strip_any_location_wrapper (str_init);
+ stripd.cur = CONSTRUCTOR_ELT (stripped_arr_init, 0);
+ arr_init = stripd.cur->value;
+ stripped_arr_init = tree_strip_any_location_wrapper (arr_init);
}
 
   /* If it's a string literal, then it's the initializer for the array
 as a whole. Otherwise, continue with normal initialization for
 array types (one value per array element).  */
-  if (TREE_CODE (stripped_str_init) == STRING_CST)
+  if (TREE_CODE (stripped_arr_init) == STRING_CST)
{
- if (has_designator_problem (d, complain))
+ if ((first_initializer_p && has_designator_problem (d, complain))
+ || (stripd.cur && has_designator_problem (&stripd, complain)))
return error_mark_node;
  d->cur++;
- return str_init;
+ return arr_init;
}
 }
 
@@ -9545,22 +9548,11 @@ cp_complete_array_type (tree *ptype, tree 
initial_value, bool do_default)
   if (initial_value)
 {
   /* An array of character type can be initialized from a
-brace-enclosed string constant.
-
-FIXME: this code is duplicated from reshape_init. Probably
-we should just call reshape_init here?  */
-  if (char_type_p (TYPE_MAIN_VARIANT (TREE_TYPE (*ptype)))
- && TREE_CODE (initial_value) == CONSTRUCTOR
- && !vec_safe_is_empty (CONSTRUCTOR_ELTS (initial_value)))
-   {
- vec<constructor_elt, va_gc> *v = CONSTRUCTOR_ELTS (initial_value);
- tree value = (*v)[0].value;
- STRIP_ANY_LOCATION_WRAPPER (value);
-
- if (TREE_CODE (value) == STRING_CST
- && v->length () == 1)
-   initial_value = value;
-   }
+brace-enclosed string constant so call reshape_init to
+remove the optional braces from a braced string literal.  */
+  if (BRACE_ENCLOSED_INITIALIZER_P (initial_value))
+   initial_value = reshape_init (*ptype, initial_value,
+ tf_warning_or_error);
 
   /* If any of the elements are parameter packs, we can't actually
 complete this type now because the array size is dependent.  */
diff --git a/gcc/testsuite/g++.dg/cpp2a/desig20.C 
b/gcc/testsuite/g++.dg/cpp2a/desig20.C
new file mode 100644
index 000..daadfa58855
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/desig20.C
@@ -0,0 +1,48 @@
+// PR c++/55227 
+// Test designated initializer for char array by string constant
+
+// { dg-options "" }
+
+struct C {char a[2];};
+
+/* Case a, designated, unbraced, string-literal of the exact same size
+   as the initialized char array; valid and accepted before and after.  */
+C a = {.a="a"};
+
+/* Cases b,c,d, designated, braced or mismatched-size, string literal,
+   previously rejected; "C99 designator 'a' outside aggregate initializer".  */
+C b = {.a=""};
+C c = {.a={""}};
+C d = {.a={"a"}};
+
+/* Case e, designated char array field and braced, designated array element(s)
+   (with GNU [N]= extension) valid and accepted before and after.  */
+C e = {.a={[0]='a'}};
+
+/* Cases f,g,h, braced string literal, 'designated' within inner braces;
+   invalid, previously accepted as positional with 'designator' ignored.  */
+C f = {{[0]="a"}}; // { dg-error "C99 designator .0. outside aggregate 
initializer" }
+C g = 

[PATCH 0/3] P1997 'array-copy' patchset [PR103238]

2021-11-21 Thread Will Wray via Gcc-patches
([PATCH 1/3], the already-submitted fix for PR c++/55227, is a dependency here.)

These patches implement C++ proposal P1997 "Relaxing restrictions on arrays",
which adds:

  C array copy semantics:
* array-array initializations
* array-array assignments
* array return-by-value from functions
 (array formal parameters are unchanged; there's no pass-by-value).

  Plus, C++ specific:
* array pseudo-destructors
* array element type deduction
 (i.e. admitting placeholder auto in array variable declarations).
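
The note that array parameters stay unchanged can be seen with existing C
semantics.  The sketch below uses a hypothetical struct wrapper to contrast
pointer decay with true pass-by-value and return-by-value; none of these
names come from the patches.

```c
#include <assert.h>

struct wrap { int a[4]; };

/* An array parameter decays to a pointer, so the callee writes into the
   caller's array.  */
static void
set_first_array (int a[4])
{
  a[0] = 99;
}

/* A struct is passed by value, so the callee only modifies its copy.  */
static void
set_first_wrapped (struct wrap w)
{
  w.a[0] = 99;
}

/* Returning a wrapped array by value is already valid C.  */
static struct wrap
make_wrapped (void)
{
  struct wrap w = { { 1, 2, 3, 4 } };
  return w;
}

/* Asserts the difference between the two parameter kinds.  */
static void
check_param_semantics (void)
{
  int arr[4] = { 1, 2, 3, 4 };
  struct wrap w = make_wrapped ();

  set_first_array (arr);
  set_first_wrapped (w);
  assert (arr[0] == 99);   /* The caller's array was changed.  */
  assert (w.a[0] == 1);    /* The caller's struct was not.  */
}
```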

The features are added as an experimental extension, disabled by default.
The patches should have no effect until enabled by the new option:

-farray-copy (flag_array_copy, a single flag to enable all features)

The extension is documented as experimental with no guarantee of stability;
features may be added, removed or changed in detail. In particular, there's
no guarantee of ABI stability; allowing array as a function return type has
ABI implications for calling conventions of the array return slot and, for
C++, name-mangling conventions must be defined.

The plan is to first merge array-copy as experimental, with ABI defined as
'what the code does', and then to go ahead with ABI work.

Will Wray (3):
  c++: designated init of char array by string constant [PR55227]
  c++: P1997 array-copy extensions: Initialization [PR103238]
  c++: P1997 array-copy extensions: Assignment, return, etc. [PR103238]

 gcc/c-family/c-common.c  |   5 ++
 gcc/c-family/c.opt   |   4 ++
 gcc/c/c-decl.c   |   2 +-
 gcc/cp/call.c|  43 +++-
 gcc/cp/decl.c| 111 ++-
 gcc/cp/init.c|   6 +-
 gcc/cp/parser.c  |   4 +-
 gcc/cp/pt.c  |  13 +++-
 gcc/cp/tree.c|   3 +-
 gcc/cp/typeck.c  |  35 --
 gcc/cp/typeck2.c |  30 ++---
 gcc/doc/invoke.texi  |   6 ++
 gcc/testsuite/g++.dg/cpp2a/desig20.C |  48 +
 gcc/testsuite/g++.dg/init/array-copy1.C  |  66 ++
 gcc/testsuite/g++.dg/init/array-copy10.C |  57 
 gcc/testsuite/g++.dg/init/array-copy11.C |  13 
 gcc/testsuite/g++.dg/init/array-copy12.C |  79 ++
 gcc/testsuite/g++.dg/init/array-copy2.C  |  68 +++
 gcc/testsuite/g++.dg/init/array-copy3.C  |  41 
 gcc/testsuite/g++.dg/init/array-copy4.C  |  42 
 gcc/testsuite/g++.dg/init/array-copy5.C  |  36 ++
 gcc/testsuite/g++.dg/init/array-copy6.C  |  51 ++
 gcc/testsuite/g++.dg/init/array-copy7.C  |  40 +++
 gcc/testsuite/g++.dg/init/array-copy8.C  |  56 
 gcc/testsuite/g++.dg/init/array-copy9.C  |  57 
 25 files changed, 835 insertions(+), 81 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/desig20.C
 create mode 100644 gcc/testsuite/g++.dg/init/array-copy1.C
 create mode 100644 gcc/testsuite/g++.dg/init/array-copy10.C
 create mode 100644 gcc/testsuite/g++.dg/init/array-copy11.C
 create mode 100644 gcc/testsuite/g++.dg/init/array-copy12.C
 create mode 100644 gcc/testsuite/g++.dg/init/array-copy2.C
 create mode 100644 gcc/testsuite/g++.dg/init/array-copy3.C
 create mode 100644 gcc/testsuite/g++.dg/init/array-copy4.C
 create mode 100644 gcc/testsuite/g++.dg/init/array-copy5.C
 create mode 100644 gcc/testsuite/g++.dg/init/array-copy6.C
 create mode 100644 gcc/testsuite/g++.dg/init/array-copy7.C
 create mode 100644 gcc/testsuite/g++.dg/init/array-copy8.C
 create mode 100644 gcc/testsuite/g++.dg/init/array-copy9.C

-- 
2.31.1



PING^3 [PATCH] rs6000: Remove builtin mask check from builtin_decl [PR102347]

2021-11-21 Thread Kewen.Lin via Gcc-patches
Hi,

Based on the discussion and the testing results in the main thread, this
patch should be safe.

Ping for this:

https://gcc.gnu.org/pipermail/gcc-patches/2021-September/580357.html

BR,
Kewen


>> on 2021/9/28 下午4:13, Kewen.Lin via Gcc-patches wrote:
>>> Hi,
>>>
>>> As discussed in PR102347, builtin_decl is currently invoked very
>>> early, while the function_decl for a builtin function is being made
>>> up.  At that time rs6000_builtin_mask can be wrong for builtins that
>>> sit in #pragma/attribute target functions, though it is updated
>>> properly later when LTO processes all nodes.
>>>
>>> This patch aligns with the practice the i386 port adopts, and also
>>> with r10-7462, by relaxing builtin mask checking in some places.
>>>
>>> Bootstrapped and regress-tested on powerpc64le-linux-gnu P9 and
>>> powerpc64-linux-gnu P8.
>>>
>>> Is it ok for trunk?
>>>
>>> BR,
>>> Kewen
>>> -
>>> gcc/ChangeLog:
>>>
>>> PR target/102347
>>> * config/rs6000/rs6000-call.c (rs6000_builtin_decl): Remove builtin
>>> mask check.
>>>
>>> gcc/testsuite/ChangeLog:
>>>
>>> PR target/102347
>>> * gcc.target/powerpc/pr102347.c: New test.
>>>
>>> ---
>>>  gcc/config/rs6000/rs6000-call.c | 14 --
>>>  gcc/testsuite/gcc.target/powerpc/pr102347.c | 15 +++
>>>  2 files changed, 19 insertions(+), 10 deletions(-)
>>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr102347.c
>>>
>>> diff --git a/gcc/config/rs6000/rs6000-call.c 
>>> b/gcc/config/rs6000/rs6000-call.c
>>> index fd7f24da818..15e0e09c07d 100644
>>> --- a/gcc/config/rs6000/rs6000-call.c
>>> +++ b/gcc/config/rs6000/rs6000-call.c
>>> @@ -13775,23 +13775,17 @@ rs6000_init_builtins (void)
>>>  }
>>>  }
>>>
>>> -/* Returns the rs6000 builtin decl for CODE.  */
>>> +/* Returns the rs6000 builtin decl for CODE.  Note that we don't check
>>> +   the builtin mask here since there could be some #pragma/attribute
>>> +   target functions and the rs6000_builtin_mask could be wrong when
>>> +   this checking happens, though it will be updated properly later.  */
>>>
>>>  tree
>>>  rs6000_builtin_decl (unsigned code, bool initialize_p ATTRIBUTE_UNUSED)
>>>  {
>>> -  HOST_WIDE_INT fnmask;
>>> -
>>>if (code >= RS6000_BUILTIN_COUNT)
>>>  return error_mark_node;
>>>
>>> -  fnmask = rs6000_builtin_info[code].mask;
>>> -  if ((fnmask & rs6000_builtin_mask) != fnmask)
>>> -{
>>> -  rs6000_invalid_builtin ((enum rs6000_builtins)code);
>>> -  return error_mark_node;
>>> -}
>>> -
>>>return rs6000_builtin_decls[code];
>>>  }
>>>
>>> diff --git a/gcc/testsuite/gcc.target/powerpc/pr102347.c 
>>> b/gcc/testsuite/gcc.target/powerpc/pr102347.c
>>> new file mode 100644
>>> index 000..05c439a8dac
>>> --- /dev/null
>>> +++ b/gcc/testsuite/gcc.target/powerpc/pr102347.c
>>> @@ -0,0 +1,15 @@
>>> +/* { dg-do link } */
>>> +/* { dg-require-effective-target power10_ok } */
>>> +/* { dg-require-effective-target lto } */
>>> +/* { dg-options "-flto -mdejagnu-cpu=power9" } */
>>> +
>>> +/* Verify there are no error messages in LTO mode.  */
>>> +
>>> +#pragma GCC target "cpu=power10"
>>> +int main ()
>>> +{
>>> +  float *b;
>>> +  __vector_quad c;
>>> +  __builtin_mma_disassemble_acc (b, &c);
>>> +  return 0;
>>> +}
>>> --
>>> 2.27.0
>>>
>>



PING^6 [PATCH] rs6000: Fix some issues in rs6000_can_inline_p [PR102059]

2021-11-21 Thread Kewen.Lin via Gcc-patches
Hi,

Gentle ping for this patch:

https://gcc.gnu.org/pipermail/gcc-patches/2021-September/578552.html

One related patch [1] is ready to commit, whose test cases rely on
this patch if no changes are applied to them.

[1] https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579658.html

BR,
Kewen

> on 2021/9/1 下午2:55, Kewen.Lin via Gcc-patches wrote:
>> Hi!
>>
>> This patch fixes the inconsistent behavior between non-LTO mode
>> and LTO mode.  As Martin pointed out, the function
>> rs6000_can_inline_p currently treats the callee as inlinable
>> whenever callee_tree is NULL, which is wrong: we should use the
>> command line options from target_option_default_node as the
>> default.  The patch also replaces rs6000_isa_flags with the flags
>> from target_option_default_node when caller_tree is NULL, as
>> rs6000_isa_flags could have changed since initialization.
>>
>> It also extends the scope of the check to the case where the callee
>> has explicitly set options.  For test case pr102059-2.c, inlining
>> could previously happen unexpectedly; that is fixed accordingly.
>>
>> As Richi/Mike pointed out, some tuning flags like MASK_P8_FUSION
>> can be neglected for inlining, so this patch also excludes them when
>> the callee is attributed with always_inline.
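
The rules described above can be modelled in a few lines.  This is only
a minimal sketch, not the code from the patch: the flag bits, the
target_opts struct and the ALWAYS_INLINE_SAFE_MASK name are hypothetical,
and the "callee flags must be a subset of caller flags" test is an
assumption about how the final comparison is done.

```cpp
#include <cstdint>

// Hypothetical flag bits, for illustration only (not the real rs6000 masks).
constexpr std::uint64_t ISA_ALTIVEC = 1u << 0;
constexpr std::uint64_t ISA_VSX = 1u << 1;
constexpr std::uint64_t TUNE_P8_FUSION = 1u << 2;  // stands in for a tuning-only flag

// Hypothetical set of flags that may be ignored for always_inline callees.
constexpr std::uint64_t ALWAYS_INLINE_SAFE_MASK = TUNE_P8_FUSION;

// Hypothetical stand-in for the per-function target options.
struct target_opts {
  std::uint64_t isa_flags;
};

// A null CALLER or CALLEE means "no function-specific options"; both then
// fall back to DEFAULTS, mirroring the use of target_option_default_node.
bool can_inline_p(const target_opts *caller, const target_opts *callee,
                  const target_opts &defaults, bool callee_always_inline) {
  std::uint64_t caller_isa = (caller ? caller : &defaults)->isa_flags;
  std::uint64_t callee_isa = (callee ? callee : &defaults)->isa_flags;

  // Tuning-only flags do not block inlining of always_inline callees.
  if (callee_always_inline)
    callee_isa &= ~ALWAYS_INLINE_SAFE_MASK;

  // Assumed rule: everything the callee needs must be enabled in the caller.
  return (caller_isa & callee_isa) == callee_isa;
}
```

With these definitions a callee without its own options is no longer
inlinable unconditionally; it is compared against the defaults like any
other function.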
>>
>> Bootstrapped and regtested on powerpc64le-linux-gnu Power9.
>>
>> BR,
>> Kewen
>> -
>> gcc/ChangeLog:
>>
>>  PR ipa/102059
>>  * config/rs6000/rs6000.c (rs6000_can_inline_p): Adjust with
>>  target_option_default_node and consider always_inline_safe flags.
>>
>> gcc/testsuite/ChangeLog:
>>
>>  PR ipa/102059
>>  * gcc.target/powerpc/pr102059-1.c: New test.
>>  * gcc.target/powerpc/pr102059-2.c: New test.
>>  * gcc.target/powerpc/pr102059-3.c: New test.
>>  * gcc.target/powerpc/pr102059-4.c: New test.
>>
>



PING^4 [PATCH v2] rs6000: Modify the way for extra penalized cost

2021-11-21 Thread Kewen.Lin via Gcc-patches
Hi,

Gentle ping for this:

https://gcc.gnu.org/pipermail/gcc-patches/2021-September/580358.html

BR,
Kewen

>>> on 2021/9/28 下午4:16, Kewen.Lin via Gcc-patches wrote:
 Hi,

 This patch follows the discussions here[1][2], where Segher
 pointed out that the existing way of guarding the extra penalized
 cost for strided/elementwise loads with a magic bound does
 not scale.

 Computing the penalty as nunits * stmt_cost can give a much
 exaggerated penalized cost: for V16QI on P8 it is
 16 * 20 = 320, which is why a bound was needed.  To make it
 better and more readable, the penalized cost is simplified
 to:

 unsigned adjusted_cost = (nunits == 2) ? 2 : 1;
 unsigned extra_cost = nunits * adjusted_cost;

 For V2DI/V2DF, it uses a penalized cost of 2 for each scalar load,
 while for the other modes it uses 1.  This is mainly concluded
 from the performance evaluations.  One thing that might be
 related: the more units a vector has, the more instructions
 are used to construct it, and the more chances there are to
 schedule them better (or even run them in parallel when enough
 units are available at that time), so it seems reasonable not
 to penalize them more.
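
To make the arithmetic concrete, the old and new penalties can be put
side by side.  This is a small standalone sketch; the function names are
made up for illustration, while the bound of 12 and the formulas are the
ones quoted in this thread.

```cpp
#include <cassert>

// Old scheme: nunits * stmt_cost, clamped to the fixed bound of 12
// (MAX_PENALIZED_COST_FOR_CTOR in the code being removed).
unsigned old_extra_cost(unsigned nunits, unsigned stmt_cost) {
  const unsigned max_penalized_cost_for_ctor = 12;
  unsigned extra_cost = nunits * stmt_cost;
  return extra_cost > max_penalized_cost_for_ctor
             ? max_penalized_cost_for_ctor
             : extra_cost;
}

// New scheme: 2 per scalar load for two-unit vectors, 1 per scalar load
// for everything else.
unsigned new_extra_cost(unsigned nunits) {
  assert(nunits > 1);  // strided/elementwise loads need at least 2 units
  unsigned adjusted_cost = (nunits == 2) ? 2 : 1;
  return nunits * adjusted_cost;
}
```

For V16QI with a statement cost of 20 the old formula yields 320 before
clamping and 12 after it; the new one yields 16.  For V2DI/V2DF the new
penalty is 4, the same as for a four-unit vector.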

 The SPEC2017 evaluations on Power8/Power9/Power10 at option
 sets O2-vect and Ofast-unroll show this change is neutral.

 Bootstrapped and regress-tested on powerpc64le-linux-gnu Power9.

 Is it ok for trunk?

 [1] https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579121.html
 [2] https://gcc.gnu.org/pipermail/gcc-patches/2021-September/580099.html
 v1: https://gcc.gnu.org/pipermail/gcc-patches/2021-September/579529.html

 BR,
 Kewen
 -
 gcc/ChangeLog:

* config/rs6000/rs6000.c (rs6000_update_target_cost_per_stmt): Adjust
the way to compute extra penalized cost.  Remove useless parameter.
(rs6000_add_stmt_cost): Adjust the call to function
rs6000_update_target_cost_per_stmt.


 ---
  gcc/config/rs6000/rs6000.c | 31 ++-
  1 file changed, 18 insertions(+), 13 deletions(-)

 diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
 index dd42b0964f1..8200e1152c2 100644
 --- a/gcc/config/rs6000/rs6000.c
 +++ b/gcc/config/rs6000/rs6000.c
 @@ -5422,7 +5422,6 @@ rs6000_update_target_cost_per_stmt (rs6000_cost_data 
 *data,
enum vect_cost_for_stmt kind,
struct _stmt_vec_info *stmt_info,
enum vect_cost_model_location where,
 -  int stmt_cost,
unsigned int orig_count)
  {

 @@ -5462,17 +5461,23 @@ rs6000_update_target_cost_per_stmt 
 (rs6000_cost_data *data,
{
  tree vectype = STMT_VINFO_VECTYPE (stmt_info);
  unsigned int nunits = vect_nunits_for_cost (vectype);
 -unsigned int extra_cost = nunits * stmt_cost;
 -/* As function rs6000_builtin_vectorization_cost shows, we have
 -   priced much on V16QI/V8HI vector construction as their units,
 -   if we penalize them with nunits * stmt_cost, it can result in
 -   an unreliable body cost, eg: for V16QI on Power8, stmt_cost
 -   is 20 and nunits is 16, the extra cost is 320 which looks
 -   much exaggerated.  So let's use one maximum bound for the
 -   extra penalized cost for vector construction here.  */
 -const unsigned int MAX_PENALIZED_COST_FOR_CTOR = 12;
 -if (extra_cost > MAX_PENALIZED_COST_FOR_CTOR)
 -  extra_cost = MAX_PENALIZED_COST_FOR_CTOR;
 +/* Don't expect strided/elementwise loads for just 1 nunit.  */
 +gcc_assert (nunits > 1);
 +/* i386 port adopts nunits * stmt_cost as the penalized cost
 +   for this kind of penalization, we used to follow it but
 +   found it could result in an unreliable body cost especially
 +   for V16QI/V8HI modes.  To make it better, we choose this
 +   new heuristic: for each scalar load, we use 2 as penalized
 +   cost for the case with 2 nunits and use 1 for the other
 +   cases.  It's without much supporting theory, mainly
 +   concluded from the broad performance evaluations on Power8,
 +   Power9 and Power10.  One possibly related point is that:
 +   vector construction for more units would use more insns,
 +   it has more chances to schedule them better (even run in
 +   parallelly when enough available units at that time), so
 +   it seems reasonable not to penalize that much for them.  */
 +unsigned int adjusted_cost = (nunits == 2) ? 2 : 1;
 +unsigned int extra_cost = nunits * adjusted_cost;
  data->extra_ctor_cost += extra_cost;
}
  }
 @@ -5510,7 +5515,7 @@ 

PING^7 [PATCH v2] combine: Tweak the condition of last_set invalidation

2021-11-21 Thread Kewen.Lin via Gcc-patches
Hi,

Gentle ping for this:

https://gcc.gnu.org/pipermail/gcc-patches/2021-June/572555.html

BR,
Kewen

>> on 2021/6/11 下午9:16, Kewen.Lin via Gcc-patches wrote:
>>> Hi Segher,
>>>
>>> Thanks for the review!
>>>
>>> on 2021/6/10 上午4:17, Segher Boessenkool wrote:
 Hi!

 On Wed, Dec 16, 2020 at 04:49:49PM +0800, Kewen.Lin wrote:
> Currently we have the check:
>
>   if (!insn
> || (value && rsp->last_set_table_tick >= 
> label_tick_ebb_start))
>   rsp->last_set_invalid = 1; 
>
> which means if we want to record some value for some reg and
> this reg got referred to before in a valid scope,

 If we already know it is *set* in this same extended basic block.
 Possibly by the same instruction btw.

> we invalidate the
> set of reg (last_set_invalid to 1).  It avoids finding the wrong
> set for one reg reference, in a case like:
>
>... op regX  // this regX could find wrong last_set below
>regX = ...   // if we think this set is valid
>... op regX

 Yup, exactly.

> But because of retry's existence, the last_set_table_tick could
> be set by some later reference insns, but we see it's set due
> to retry on the set (for that reg) insn again, such as:
>
>insn 1
>insn 2
>
>regX = ... --> (a)
>... op regX--> (b)
>
>insn 3
>
>// assume all in the same BB.
>
> Assuming we combine 1, 2 -> 3 successfully and replace them as two
> (3 insns -> 2 insns),

 This will delete insn 1 and write the combined result to insns 2 and 3.

> retrying from insn1 or insn2 again:

 Always 2, but your point remains valid.

> it will scan insn (a) again, the below condition holds for regX:
>
>   (value && rsp->last_set_table_tick >= label_tick_ebb_start)
>
> it will mark this set as invalid set.  But actually the
> last_set_table_tick here is set by insn (b) before retrying, so it
> should be safe to be taken as valid set.

 Yup.

> This proposal is to check whether last_set_table_tick was safely set
> after the current set, and to keep the set valid if so.
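
The proposed condition can be modelled roughly as follows.  This is a
simplified sketch under stated assumptions, not the code from the patch:
the reg_state struct and the function are hypothetical, and the exact
comparison (same tick, and the recorded luid lies after the set) is an
assumption based on the description in this thread.

```cpp
// Hypothetical, reduced per-register state.  The field names follow the
// thread; everything else is an assumption made for illustration.
struct reg_state {
  int last_set_table_tick;  // tick of the block that last touched the reg
  int last_set_table_luid;  // luid of the first insn using the reg there
};

// Returns true when the set at SET_LUID should be marked invalid.
bool set_is_invalid(bool have_insn, bool have_value, const reg_state &rsp,
                    int label_tick, int label_tick_ebb_start, int set_luid) {
  if (!have_insn)
    return true;
  if (!have_value)
    return false;
  // The register was not touched in the current extended basic block.
  if (rsp.last_set_table_tick < label_tick_ebb_start)
    return false;
  // Assumed refinement: the tick was recorded in this very block by an
  // insn that comes after the set, so the set can stay valid.
  bool recorded_after_set = rsp.last_set_table_tick == label_tick
                            && set_luid < rsp.last_set_table_luid;
  return !recorded_after_set;
}
```

In the retry scenario, insn (b) records the tick with a luid larger than
that of insn (a), so rescanning (a) no longer invalidates the set.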

> A full SPEC2017 build shows this patch increases successful combines
> from 1902208 to 1902243 (trivial though).

 Do you have some example, or maybe even a testcase?  :-)

>>>
>>> Sorry for the late reply, it took some time to get one reduced case.
>>>
>>> typedef struct SA *pa_t;
>>>
>>> struct SC {
>>>   int h;
>>>   pa_t elem[];
>>> };
>>>
>>> struct SD {
>>>   struct SC *e;
>>> };
>>>
>>> struct SA {
>>>   struct {
>>> struct SD f[1];
>>>   } g;
>>> };
>>>
>>> void foo(pa_t *k, char **m) {
>>>   int l, i;
>>>   pa_t a;
>>>   l = (int)a->g.f[5].e;
>>>   i = 0;
>>>   for (; i < l; i++) {
>>> k[i] = a->g.f[5].e->elem[i];
>>> m[i] = "";
>>>   }
>>> }
>>>
>>> The baseline is r12-0 and the options are "-O3 -mcpu=power9
>>> -fno-strict-aliasing".  With this patch, the generated assembly
>>> saves two rlwinm instructions.
>>>
> +  /* Record the luid of the insn whose expression involving register 
> n.  */
> +
> +  int last_set_table_luid;

 "Record the luid of the insn for which last_set_table_tick was set",
 right?

>>>
>>> But it can be updated later to a smaller luid.  How about wording
>>> like:
>>>
>>>
>>> +  /* Record the luid of the insn which uses register n, the insn should
>>> + be the first one using register n in that block of the insn which
>>> + last_set_table_tick was set for.  */
>>>
>>>
> -static void update_table_tick (rtx);
> +static void update_table_tick (rtx, int);

 Please remove this declaration instead, the function is not used until
 after its actual definition :-)

>>>
>>> Done.
>>>
> @@ -13243,7 +13247,21 @@ update_table_tick (rtx x)
>for (r = regno; r < endregno; r++)
>   {
> reg_stat_type *rsp = &reg_stat[r];
> -   rsp->last_set_table_tick = label_tick;
> +   if (rsp->last_set_table_tick >= label_tick_ebb_start)
> + {
> +   /* Later references should not have lower ticks.  */
> +   gcc_assert (label_tick >= rsp->last_set_table_tick);

 This should be obvious, but checking it 

Re: [PATCH 07/15] xtensa: Fix non-robust split condition in define_insn_and_split

2021-11-21 Thread Kewen.Lin via Gcc-patches
on 2021/11/20 上午6:26, augustine.sterl...@gmail.com wrote:
> On Thu, Nov 11, 2021 at 3:25 AM Kewen Lin  wrote:
>> gcc/ChangeLog:
>>
>> * config/xtensa/xtensa.md (movdi_internal, movdf_internal): Fix split
>> condition.
> 
> I had been hoping Max would reply (as I'm just doing legacy work
> around this these days), but seeing that he hasn't, this is approved.
> 

Thanks Augustine!  Committed as r12-5444.

BR,
Kewen


Re: [PATCH] gfortran: Improve translation of POPPAR intrinsic

2021-11-21 Thread Bernhard Reutner-Fischer via Gcc-patches
On Sun, 21 Nov 2021 19:59:35 +0100
Harald Anlauf  wrote:

> Let's have a look at the tree-dump of the existing testcase:
> 
> integer(kind=4) runtime_poppar (integer(kind=16) & restrict i)
> {
>integer(kind=4) res;
> 
>{
>  uint128_t D.4221;
> 
>  D.4221 = (uint128_t) *i;
>  res = __builtin_parityll ((unsigned long) D.4221 ^ (unsigned long)
> (D.4221 >> 64));
>}
>return res;
> }
> 
> My understanding is there is actually nothing left to do,
> as the middle-end(?) already handles this.

Well the whole point was ...
> 
> Am 21.11.21 um 01:22 schrieb Bernhard Reutner-Fischer via Fortran:
> > Roger pinged this on gcc-patches some time ago fwiw.
> > [The commit-hooks will likely fix or ignore s/bext/next/ in his
> > mail-addr]
> >
> >
> > On Sun, 14 Jun 2020 23:39:32 +0100
> > "Roger Sayle"  wrote:
> >  
> >>
> >>
> >> The following patch to gfortran's trans-intrinsic.c tweaks the generic
> >> that is produced for poppar on integer(kind=16).  Currently, the double
> >> word poppar is implemented as parityll(hipart(x))^parityll(lopart(x)),
> >> but with this patch this is now translated as
> >> parityll(hipart(x)^lopart(x)).  This will be just an aesthetic change
> >> once my tree-level parity optimization patch of 12th June is reviewed
> >> and accepted, but generating the more efficient form initially avoids a
> >> tiny bit of garbage collection when the middle-end cleans this up into
> >> its preferred form.  The semantics/correctness of this
... the above, i.e. create better code in the first place.
thanks,

> >> change are tested by the run-time tests in gfortran.dg/popcnt_poppar_2.F90
> >>
> >> This patch has been tested with "make bootstrap" and "make -k check" on
> >> x86_64-pc-linux-gnu with no regressions.  If approved, I'd very much
> >> appreciate it if the (gfortran) reviewer could commit this change for me.
> >>
> >> 2020-06-14  Roger Sayle
> >>
> >>  * trans-intrinsic.c (gfc_conv_intrinsic_popcnt_poppar): Translate
> >>  poppar(kind=16) as parityll(hipart(x)^lopart(x)) instead of
> >>  parityll(hipart(x))^parityll(lopart(x)).
> >>
> >> Thanks in advance,
> >> Roger
> >> --
> >> Roger Sayle
> >> NextMove Software
> >> Cambridge, UK
> >>
> >  
> 



Re: [PATCH] libgccjit: Add support for types used by atomic builtins [PR96066] [PR96067]

2021-11-21 Thread Antoni Boucher via Gcc-patches
Thanks for the review!
I updated the patch.

See notes below.

Le samedi 20 novembre 2021 à 13:50 -0500, David Malcolm a écrit :
> On Sat, 2021-11-20 at 11:27 -0500, Antoni Boucher wrote:
> > Hi.
> > Here's the updated patch.
> > Thanks for the review!
> 
> Thanks for the updated patch...
> 
> > 
> > Le jeudi 20 mai 2021 à 16:24 -0400, David Malcolm a écrit :
> > > On Mon, 2021-05-17 at 21:02 -0400, Antoni Boucher via Jit wrote:
> > > > Hello.
> > > > This patch fixes the issue with using atomic builtins in
> > > > libgccjit.
> > > > Thanks for reviewing it.
> > > 
> > > [...snip...]
> > >  
> > > > diff --git a/gcc/jit/jit-recording.c b/gcc/jit/jit-recording.c
> > > > index 117ff70114c..de876ff9fa6 100644
> > > > --- a/gcc/jit/jit-recording.c
> > > > +++ b/gcc/jit/jit-recording.c
> > > > @@ -2598,8 +2598,18 @@
> > > > recording::memento_of_get_pointer::accepts_writes_from (type
> > > > *rtype)
> > > >  return false;
> > > >  
> > > >    /* It's OK to assign to a (const T *) from a (T *).  */
> > > > -  return m_other_type->unqualified ()
> > > > -    ->accepts_writes_from (rtype_points_to);
> > > > +  if (m_other_type->unqualified ()
> > > > +    ->accepts_writes_from (rtype_points_to)) {
> > > > +  return true;
> > > > +  }
> > > > +
> > > > +  /* It's OK to assign to a (volatile const T *) from a
> > > > (volatile
> > > > const T *). */
> > > > +  if (m_other_type->unqualified ()->unqualified ()
> > > > +    ->accepts_writes_from (rtype_points_to->unqualified ())) {
> > > > +  return true;
> > > > +  }
> > > 
> > > Presumably you need this to get the atomic builtins working?
> > > 
> > > If I'm reading the above correctly, the new test doesn't
> > > distinguish
> > > between the 3 different kinds of qualifiers (aligned, volatile,
> > > and
> > > const), it merely tries to strip some of them off.
> > > 
> > > It's not valid to e.g. assign to a (aligned T *) from a (const T
> > > *).
> > > 
> > > Maybe we need an internal enum to discriminate between different
> > > subclasses of decorated_type?
> 
> I'm still concerned about this case, my reading of the updated patch
> is
> that this case is still not quite correctly handled (see notes
> below).
> I don't think we currently have test coverage for assignment to e.g.
> (aligned T *) from a (const T*); I feel that it should be an error,
> without an explicit cast.
> 
> Please can you add a testcase for this?

Done.

> 
> If you want to go the extra mile, given that this is code created
> through an API, you could have a testcase that iterates through all
> possible combinations of qualifiers (for both source and destination
> pointer), and verifies that libgccjit at least doesn't crash on them
> (and hopefully does the right thing on each one)  :/
> 
> (perhaps doing each one in a different gcc_jit_context)
> 
> Might be nice to update test-fuzzer.c for the new qualifiers; I don't
> think I've touched it in a long time.

Done.

> 
> [...snip...]
> 
> > diff --git a/gcc/jit/jit-recording.h b/gcc/jit/jit-recording.h
> > index 4a994fe7094..60aaba2a246 100644
> > --- a/gcc/jit/jit-recording.h
> > +++ b/gcc/jit/jit-recording.h
> > @@ -545,6 +545,8 @@ public:
> >    virtual bool is_float () const = 0;
> >    virtual bool is_bool () const = 0;
> >    virtual type *is_pointer () = 0;
> > +  virtual type *is_volatile () { return NULL; }
> > +  virtual type *is_const () { return NULL; }
> >    virtual type *is_array () = 0;
> >    virtual struct_ *is_struct () { return NULL; }
> >    virtual bool is_void () const { return false; }
> > @@ -687,6 +689,13 @@ public:
> >    /* Strip off the "const", giving the underlying type.  */
> >    type *unqualified () FINAL OVERRIDE { return m_other_type; }
> >  
> > +  virtual bool is_same_type_as (type *other)
> > +  {
> > +    return m_other_type->is_same_type_as (other->is_const ());
> > +  }
> 
> What happens if other->is_const () returns NULL, and
>   m_other_type->is_same_type_as ()
> tries to call a vfunc on it...

Fixed.
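
For readers following along, one way to guard against the NULL case
looks like this.  It is a minimal sketch with hypothetical, heavily
reduced stand-ins for the recording::type classes, and not necessarily
the fix that went into the updated patch.

```cpp
// Hypothetical base class: by default a type is not const-qualified and
// two types are the same only if they are the same object.
struct type {
  virtual ~type() = default;
  virtual type *is_const() { return nullptr; }
  virtual bool is_same_type_as(type *other) { return this == other; }
};

// Hypothetical stand-in for memento_of_get_const.
struct const_type : type {
  explicit const_type(type *other_type) : m_other_type(other_type) {}

  type *is_const() override { return m_other_type; }

  bool is_same_type_as(type *other) override {
    // If OTHER is not const-qualified, is_const () returns null and the
    // types cannot match; bail out instead of passing null further down.
    type *other_inner = other->is_const();
    if (!other_inner)
      return false;
    return m_other_type->is_same_type_as(other_inner);
  }

  type *m_other_type;
};
```

Comparing a doubly const-qualified type against a singly qualified one
then simply returns false on the second level rather than calling a
virtual function through a null pointer.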

> 
> > +
> > +  virtual type *is_const () { return m_other_type; }
> > +
> >    void replay_into (replayer *) FINAL OVERRIDE;
> >  
> >  private:
> > @@ -701,9 +710,16 @@ public:
> >    memento_of_get_volatile (type *other_type)
> >    : decorated_type (other_type) {}
> >  
> > +  virtual bool is_same_type_as (type *other)
> > +  {
> > +    return m_other_type->is_same_type_as (other->is_volatile ());
> > +  }
> 
> ...with similar considerations here.
> 
> i.e. is it possible for the user to create combinations of qualifiers
> that lead to a vfunc call with NULL "this" (and thus a segfault?)
> 
> > +
> >    /* Strip off the "volatile", giving the underlying type.  */
> >    type *unqualified () FINAL OVERRIDE { return m_other_type; }
> >  
> > +  virtual type *is_volatile () { return m_other_type; }
> > +
> >    void replay_into (replayer *) FINAL OVERRIDE;
> >  
> 
> Hope this is constructive
> Dave
> 

From b2d78a2edc1cd8b24ff88f0da608a69f1dff8229 Mon Sep 17 00:00:00 2001
From: Antoni Boucher 

Re: [PATCH] Simplify branching in algos

2021-11-21 Thread François Dumont via Gcc-patches
A recent thread on this mailing list made me remember that this proposal 
is still open.


I've updated it just to add a missing std qualification.

François

On 08/06/21 5:21 pm, Jonathan Wakely wrote:

I haven't forgotten this one, I just need to double-check that we
don't create another problem like std::rotate in 9.1

I'll try to finish the review tomorrow.

J.


On 27/05/21 07:04 +0200, François Dumont via Libstdc++ wrote:
Following the latest fixes in std::inplace_merge and std::stable_sort, you
proposed, Jonathan, to enhance branching in the former.


Here is a proposal based on yours to do so in both algos.

    libstdc++: Enhance branching in std::inplace_merge and std::stable_sort

    libstdc++-v3/ChangeLog:

    * include/bits/stl_algo.h
    (__merge_adaptive): Adapt to merge only when buffer is large enough.
    (__merge_adaptive_resize): New, adapt merge when buffer is too small.
    (__inplace_merge): Adapt, use latter.
    (__stable_sort_adaptive): Adapt to sort only when buffer is large enough.
    (__stable_sort_adaptive_resize): New, adapt sort when buffer is too small.
    (__stable_sort): Adapt, use latter.

Tested under Linux x64.
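
The resulting three-way dispatch in __inplace_merge can be summarized as
a tiny decision function.  This is only an illustrative sketch: the enum
and function names are made up, and only the order of the checks is
taken from the patch.

```cpp
#include <cstddef>

// Hypothetical labels for the three code paths.
enum class merge_strategy { adaptive, without_buffer, adaptive_resize };

// REQUESTED is the buffer size asked for (min(len1, len2) in the patch),
// OBTAINED is what the temporary buffer actually got (0 when the
// allocation failed entirely).
merge_strategy choose_merge_strategy(std::size_t requested,
                                     std::size_t obtained) {
  if (obtained == requested)
    return merge_strategy::adaptive;        // buffer large enough: plain merge
  if (obtained == 0)
    return merge_strategy::without_buffer;  // no buffer at all
  return merge_strategy::adaptive_resize;   // partial buffer: divide and retry
}
```

The common case, a fully satisfied allocation, is tested first, which
is what the __builtin_expect hints in the patch express.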

Ok to commit ?

François



diff --git a/libstdc++-v3/include/bits/stl_algo.h 
b/libstdc++-v3/include/bits/stl_algo.h

index a18bb000d0c..02ae40c1fb4 100644
--- a/libstdc++-v3/include/bits/stl_algo.h
+++ b/libstdc++-v3/include/bits/stl_algo.h
@@ -2401,28 +2401,42 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
    }

  /// This is a helper function for the merge routines.
-  templatetemplate
   typename _Pointer, typename _Compare>
    void
    __merge_adaptive(_BidirectionalIterator __first,
 _BidirectionalIterator __middle,
 _BidirectionalIterator __last,
 _Distance __len1, _Distance __len2,
- _Pointer __buffer, _Distance __buffer_size,
- _Compare __comp)
+ _Pointer __buffer, _Compare __comp)
    {
-  if (__len1 <= __len2 && __len1 <= __buffer_size)
+  if (__len1 <= __len2)
{
  _Pointer __buffer_end = _GLIBCXX_MOVE3(__first, __middle, 
__buffer);
  std::__move_merge_adaptive(__buffer, __buffer_end, __middle, 
__last,

 __first, __comp);
}
-  else if (__len2 <= __buffer_size)
+  else
{
  _Pointer __buffer_end = _GLIBCXX_MOVE3(__middle, __last, 
__buffer);

  std::__move_merge_adaptive_backward(__first, __middle, __buffer,
  __buffer_end, __last, __comp);
}
+    }
+
+  template
+    void
+    __merge_adaptive_resize(_BidirectionalIterator __first,
+    _BidirectionalIterator __middle,
+    _BidirectionalIterator __last,
+    _Distance __len1, _Distance __len2,
+    _Pointer __buffer, _Distance __buffer_size,
+    _Compare __comp)
+    {
+  if (__len1 <= __buffer_size || __len2 <= __buffer_size)
+    std::__merge_adaptive(__first, __middle, __last,
+  __len1, __len2, __buffer, __comp);
  else
{
  _BidirectionalIterator __first_cut = __first;
@@ -2450,14 +2464,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION

  _BidirectionalIterator __new_middle
    = std::__rotate_adaptive(__first_cut, __middle, __second_cut,
- __len1 - __len11, __len22, __buffer,
- __buffer_size);
-  std::__merge_adaptive(__first, __first_cut, __new_middle, 
__len11,

-    __len22, __buffer, __buffer_size, __comp);
-  std::__merge_adaptive(__new_middle, __second_cut, __last,
-    __len1 - __len11,
-    __len2 - __len22, __buffer,
-    __buffer_size, __comp);
+ __len1 - __len11, __len22,
+ __buffer, __buffer_size);
+  std::__merge_adaptive_resize(__first, __first_cut, __new_middle,
+   __len11, __len22,
+   __buffer, __buffer_size, __comp);
+  std::__merge_adaptive_resize(__new_middle, __second_cut, __last,
+   __len1 - __len11, __len2 - __len22,
+   __buffer, __buffer_size, __comp);
}
    }

@@ -2535,11 +2549,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  // [first,middle) and [middle,last).
  _TmpBuf __buf(__first, std::min(__len1, __len2));

-  if (__buf.begin() == 0)
+  if (__builtin_expect(__buf.size() == __buf.requested_size(), 
true))

+    std::__merge_adaptive
+  (__first, __middle, __last, __len1, __len2, __buf.begin(), 
__comp);

+  else if (__builtin_expect(__buf.begin() == 0, false))
std::__merge_without_buffer
  (__first, __middle, __last, __len1, __len2, __comp);
  else
-    std::__merge_adaptive
+    std::__merge_adaptive_resize
  (__first, __middle, __last, __len1, __len2, __buf.begin(),
   _DistanceType(__buf.size()), __comp);
    }
@@ -2720,34 +2737,45 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
}
    }

-  template

Re: [PATCH] libgccjit: Add support for TLS variable [PR95415]

2021-11-21 Thread David Malcolm via Gcc-patches
On Sat, 2021-11-20 at 17:34 -0500, Antoni Boucher wrote:
> Hi.
> Here's the updated patch.
> See comments below.
> Thanks for your reviews!
> 
> Le jeudi 20 mai 2021 à 16:11 -0400, David Malcolm a écrit :
> > On Tue, 2021-05-18 at 20:43 -0400, Antoni Boucher via Gcc-patches
> > wrote:
> > > Hello.
> > > This patch adds support for TLS variables.
> > > One thing to fix before we merge it is the libgccjit.map file
> > > which
> > > contains LIBGCCJIT_ABI_16 instead of LIBGCCJIT_ABI_17.
> > > LIBGCCJIT_ABI_16 was added in one of my other patches.
> > > Thanks for the review.
> > 
> > > diff --git a/gcc/jit/docs/topics/compatibility.rst
> > > b/gcc/jit/docs/topics/compatibility.rst
> > > index 239b6aa1a92..d10bc1df080 100644
> > > --- a/gcc/jit/docs/topics/compatibility.rst
> > > +++ b/gcc/jit/docs/topics/compatibility.rst
> > > @@ -243,3 +243,12 @@ embedding assembler instructions:
> > >    * :func:`gcc_jit_extended_asm_add_input_operand`
> > >    * :func:`gcc_jit_extended_asm_add_clobber`
> > >    * :func:`gcc_jit_context_add_top_level_asm`
> > > +
> > > +.. _LIBGCCJIT_ABI_17:
> > > +
> > > +``LIBGCCJIT_ABI_17``
> > > +---
> > > +``LIBGCCJIT_ABI_17`` covers the addition of an API entrypoint to
> > > set
> > > the
> > > +thread-local storage model of a variable:
> > > +
> > > +  * :func:`gcc_jit_lvalue_set_tls_model`
> > 
> > Sorry about the delay in reviewing patches.
> > 
> > Is there a summary somewhere of the various outstanding patches and
> > their associated ABI versions?  Are there dependencies between the
> > patches?
> 
> The list of patches is there:
> https://github.com/antoyo/libgccjit-patches but I don't keep them up-
> to-date.
> If that would help you, I could add a README to tell what is the new
> ABI version for each patch.
> I believe there might be some patches that depend on a previous one.

That's not needed; I think all I need to know is what the next patch
you need me to look at is (FWIW I'm about to go on vacation for a week)

[...snip...]

> 
> > 
> > > +
> > > +void
> > > +create_code (gcc_jit_context *ctxt, void *user_data)
> > > +{
> > > +  /* Let's try to inject the equivalent of:
> > > +
> > > + _Thread_local int foo;
> > > +  */
> > > +  gcc_jit_type *int_type =
> > > +    gcc_jit_context_get_type (ctxt, GCC_JIT_TYPE_INT);
> > > +
> > > +  gcc_jit_lvalue *foo =
> > > +    gcc_jit_context_new_global (
> > > +  ctxt, NULL, GCC_JIT_GLOBAL_EXPORTED, int_type, "foo");
> > > +  gcc_jit_lvalue_set_tls_model (foo,
> > > GCC_JIT_TLS_MODEL_GLOBAL_DYNAMIC);
> > 
> > How many of the different enum values can be supported?  How
> > target-
> > dependent is this?
> 
> I'm not sure what you mean here. Are you asking that I test all the
> different enum values?

That would be ideal, but I don't think it's necessary.

> The tls_model enum is defined in gcc/coretypes.h and does not seem to
> change depending on the target. Maybe there are checks elsewhere for
> that, though.

It might be that some targets only support some modes; I don't know.


[...snip...]

Thanks for the updated patch.  It looks good to push to trunk once the
earlier ones are in place, though as usual please re-test it before
pushing.

Dave



Re: [PATCH] i386: Fix up handling of target attribute [PR101180]

2021-11-21 Thread Uros Bizjak via Gcc-patches
On Sat, Nov 20, 2021 at 9:20 AM Jakub Jelinek  wrote:
>
> Hi!
>
> As shown in the testcase below, if a function has multiple target attributes
> (rather than a single one with one or more arguments) or if a function
> gets one target attribute on one declaration and another one on another
> declaration, on x86 their effect is not combined into
> DECL_FUNCTION_SPECIFIC_TARGET, but instead only the last processed target
> attribute wins.  aarch64 handles this right, the following patch follows
> what it does, i.e. only start with target_option_default_node if
> DECL_FUNCTION_SPECIFIC_TARGET is previously NULL (i.e. the first target
> attribute being processed on a function) and otherwise start from the
> previous DECL_FUNCTION_SPECIFIC_TARGET.
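
The difference between "last attribute wins" and "attributes combine"
can be shown with a toy model.  This is a sketch under loud assumptions:
option sets are reduced to a bitmask, and fn_decl, OPT_AVX, OPT_CRC32
and apply_target_attribute are hypothetical names, not GCC interfaces.

```cpp
#include <cstdint>

using option_set = std::uint32_t;  // hypothetical: one bit per option

constexpr option_set OPT_AVX = 1u << 0;
constexpr option_set OPT_CRC32 = 1u << 1;

// Hypothetical stand-in for a function declaration; has_specific_target
// plays the role of DECL_FUNCTION_SPECIFIC_TARGET being non-NULL.
struct fn_decl {
  bool has_specific_target = false;
  option_set specific_target = 0;
};

// Process one target attribute.  The base is the previously recorded
// per-function options if there are any, and the defaults otherwise.
void apply_target_attribute(fn_decl &fn, option_set defaults,
                            option_set enabled_by_attr) {
  option_set base = fn.has_specific_target ? fn.specific_target : defaults;
  fn.specific_target = base | enabled_by_attr;
  fn.has_specific_target = true;
}
```

Applying target ("avx") and then target ("crc32") to the same function
leaves both bits set, whereas always starting from the defaults would
keep only the second one.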
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2021-11-20  Jakub Jelinek  
>
> PR c++/101180
> * config/i386/i386-options.c (ix86_valid_target_attribute_p): If
> fndecl already has DECL_FUNCTION_SPECIFIC_TARGET, use that as base
> instead of target_option_default_node.
>
> * gcc.target/i386/pr101180.c: New test.

OK.

Thanks,
Uros.

>
> --- gcc/config/i386/i386-options.c.jj   2021-11-19 12:48:56.507415161 +0100
> +++ gcc/config/i386/i386-options.c  2021-11-19 13:04:31.618044781 +0100
> @@ -1443,8 +1443,11 @@ ix86_valid_target_attribute_p (tree fnde
>
>/* Initialize func_options to the default before its target options can
>   be set.  */
> +  tree old_target = DECL_FUNCTION_SPECIFIC_TARGET (fndecl);
> +  if (old_target == NULL_TREE)
> +old_target = target_option_default_node;
>cl_target_option_restore (&func_options, &func_options_set,
> -   TREE_TARGET_OPTION (target_option_default_node));
> +   TREE_TARGET_OPTION (old_target));
>
>/* FLAGS == 1 is used for target_clones attribute.  */
>new_target
> --- gcc/testsuite/gcc.target/i386/pr101180.c.jj 2021-11-19 13:24:19.334132937 
> +0100
> +++ gcc/testsuite/gcc.target/i386/pr101180.c2021-11-19 13:23:56.676454806 
> +0100
> @@ -0,0 +1,12 @@
> +/* PR c++/101180 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mno-avx -mno-crc32" } */
> +
> +#include 
> +
> +__attribute__((target ("avx"))) __attribute__((target ("crc32"))) void
> +foo (__m256 *p, unsigned int *q)
> +{
> +  __m256 c = _mm256_and_ps (p[0], p[1]);
> +  *q = __crc32b (*q, 0x55);
> +}
>
> Jakub
>
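
For readers following the thread, the difference between the old and the patched behaviour can be modelled with a toy bitmask. This is only a sketch: the feature bits, default_features and both helper functions are hypothetical stand-ins, not code from i386-options.c, and real target options are far richer than a single bitmask.

```c
/* Hypothetical feature bits standing in for ISA flags.  */
enum { FEAT_AVX = 1u << 0, FEAT_CRC32 = 1u << 1 };

/* Stand-in for target_option_default_node: the command line
   (-mno-avx -mno-crc32) left both features disabled.  */
static const unsigned default_features = 0u;

/* Old behaviour: every attribute is applied on top of the defaults,
   so whatever an earlier attribute enabled is discarded.  */
static unsigned apply_attr_from_default(unsigned previous, unsigned attr)
{
  (void) previous;
  return default_features | attr;
}

/* Patched behaviour: start from what earlier attributes produced.
   has_previous models DECL_FUNCTION_SPECIFIC_TARGET being non-NULL.  */
static unsigned apply_attr_from_previous(int has_previous, unsigned previous,
                                         unsigned attr)
{
  unsigned base = has_previous ? previous : default_features;
  return base | attr;
}
```

Applying FEAT_AVX and then FEAT_CRC32 through apply_attr_from_default leaves only FEAT_CRC32 set, which mirrors the "last attribute wins" bug; the same sequence through apply_attr_from_previous leaves both bits set.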


Re: [PATCH] gfortran: Improve translation of POPPAR intrinsic

2021-11-21 Thread Harald Anlauf via Gcc-patches

Let's have a look at the tree-dump of the existing testcase:

integer(kind=4) runtime_poppar (integer(kind=16) & restrict i)
{
  integer(kind=4) res;

  {
uint128_t D.4221;

D.4221 = (uint128_t) *i;
res = __builtin_parityll ((unsigned long) D.4221 ^ (unsigned long)
(D.4221 >> 64));
  }
  return res;
}

My understanding is there is actually nothing left to do,
as the middle-end(?) already handles this.
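
The dump relies on the identity parity(hi) ^ parity(lo) == parity(hi ^ lo). A minimal C sketch of that identity follows; parity64 is a portable stand-in I am assuming here in place of __builtin_parityll, and the two poppar128_* helpers are illustrative names, not gfortran internals.

```c
#include <stdint.h>

/* Portable stand-in for __builtin_parityll: returns 1 when an odd
   number of bits is set in x, 0 otherwise.  */
static int parity64(uint64_t x)
{
  x ^= x >> 32;
  x ^= x >> 16;
  x ^= x >> 8;
  x ^= x >> 4;
  x ^= x >> 2;
  x ^= x >> 1;
  return (int) (x & 1u);
}

/* Two-call form: parity of each 64-bit half, then XOR the results.  */
static int poppar128_two_calls(uint64_t hi, uint64_t lo)
{
  return parity64(hi) ^ parity64(lo);
}

/* One-call form: XOR the halves first, then take a single parity.  */
static int poppar128_one_call(uint64_t hi, uint64_t lo)
{
  return parity64(hi ^ lo);
}
```

Both forms agree because XOR-ing the halves cancels pairs of set bits, which never changes whether the total count is odd or even.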

On 21.11.21 at 01:22, Bernhard Reutner-Fischer via Fortran wrote:

Roger pinged this on gcc-patches some time ago fwiw.
[The commit-hooks will likely fix or ignore s/bext/next/ in his
mail-addr]


On Sun, 14 Jun 2020 23:39:32 +0100
"Roger Sayle"  wrote:

The following patch to gfortran's trans-intrinsic.c tweaks the generic that
is produced for poppar on integer(kind=16).  Currently, the double word
poppar is implemented as parityll(hipart(x))^parityll(lopart(x)), but with
this patch this is now translated as parityll(hipart(x)^lopart(x)).  This
will be just an aesthetic change once my tree-level parity optimization
patch of 12th June is reviewed and accepted, but generating the more
efficient form initially avoids a tiny bit of garbage collection when the
middle-end cleans this up into its preferred form.  The semantics/correctness
of this change are tested by the run-time tests in
gfortran.dg/popcnt_poppar_2.F90

This patch has been tested with "make bootstrap" and "make -k check" on
x86_64-pc-linux-gnu with no regressions.  If approved, I'd very much
appreciate it if the (gfortran) reviewer could commit this change for me.

2020-06-14  Roger Sayle  

 * trans-intrinsic.c (gfc_conv_intrinsic_popcnt_poppar): Translate
 poppar(kind=16) as parityll(hipart(x)^lopart(x)) instead of
 parityll(hipart(x))^parityll(lopart(x)).

Thanks in advance,
Roger
--
Roger Sayle
NextMove Software
Cambridge, UK

Re: Fix failure in merge_block.c testcase

2021-11-21 Thread Jeff Law via Gcc-patches




On 11/21/2021 8:15 AM, Jan Hubicka via Gcc-patches wrote:

Hi,
this testcase needs -fno-ipa-modref because otherwise it hits the issue
that complete loop unrolling leaves a somewhat mismatched profile.

Bootstrapped/regtested x86_64-linux, committed.

gcc/testsuite/ChangeLog:

2021-11-21  Jan Hubicka  

PR ipa/103264
* gcc.dg/tree-prof/merge_block.c: Add -fno-ipa-modref

Thank you.  I was planning to chase this down today.

jeff



Improve tracking of bases in modref

2021-11-21 Thread Jan Hubicka via Gcc-patches
Hi,
on the exchange2 benchmark we miss some useful propagation because modref gives
up very early on analyzing accesses through pointers.  For example in
int test (int *a)
{
  int i;
  for (i=0; a[i];i++);
  return i+a[i];
}

We are not able to determine that a[i] accesses are relative to a.
This is because get_access requires the SSA name that is in MEM_REF to be
PARM_DECL while in other places we use the ipa-prop helper to work out the
proper base pointers.

This patch commonizes the code in get_access and parm_map_for_arg so both
use the check properly and extends it to also figure out that newly allocated
memory is not a side effect to caller.
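
To make the two cases above concrete, here is a small C sketch. The first function mirrors the example from this message under a different name; the second is a hypothetical illustration of "newly allocated memory is not a side effect to caller" and is not taken from the new modref-15.c test, whose contents are not shown here. Whether the analysis handles a given function this way depends on the patch details.

```c
#include <stdlib.h>

/* Same shape as the example above: every load is an offset from the
   pointer parameter a, so the accesses are relative to parameter 0.  */
int count_until_zero(int *a)
{
  int i;
  for (i = 0; a[i]; i++)
    ;
  return i + a[i];
}

/* Hypothetical illustration: all stores go to memory returned by malloc
   that is freed before returning and never escapes, so no memory the
   caller can observe is written.  */
int sum_of_squares(int n)
{
  int sum = 0;
  int *tmp;

  if (n <= 0)
    return 0;
  tmp = malloc(sizeof *tmp * (size_t) n);
  if (!tmp)
    return -1;
  for (int i = 0; i < n; i++)
    tmp[i] = i * i;
  for (int i = 0; i < n; i++)
    sum += tmp[i];
  free(tmp);
  return sum;
}
```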

It improves disambiguation rates:

Alias oracle query stats:
  refs_may_alias_p: 77359588 disambiguations, 102170294 queries
  ref_maybe_used_by_call_p: 645390 disambiguations, 78392252 queries
  call_may_clobber_ref_p: 386653 disambiguations, 389576 queries
  stmt_kills_ref_p: 106470 kills, 5685744 queries
  nonoverlapping_component_refs_p: 0 disambiguations, 8923 queries
  nonoverlapping_refs_since_match_p: 30581 disambiguations, 65481 must overlaps, 97009 queries
  aliasing_component_refs_p: 56854 disambiguations, 15459249 queries
  TBAA oracle: 28236957 disambiguations 104812620 queries
   15360807 are in alias set 0
   8863925 queries asked about the same object
   99 queries asked about the same alias set
   0 access volatile
   50367859 are dependent in the DAG
   1982973 are aritificially in conflict with void *

Modref stats:
  modref kill: 71 kills, 8151 queries
  modref use: 25273 disambiguations, 704264 queries
  modref clobber: 1676006 disambiguations, 21805867 queries
  5264985 tbaa queries (0.241448 per modref query)
  762265 base compares (0.034957 per modref query)

PTA query stats:
  pt_solution_includes: 13460623 disambiguations, 40881373 queries
  pt_solutions_intersect: 1668037 disambiguations, 13958255 queries

to:

Alias oracle query stats:
  refs_may_alias_p: 77575173 disambiguations, 102390852 queries
  ref_maybe_used_by_call_p: 645932 disambiguations, 78607413 queries
  call_may_clobber_ref_p: 386813 disambiguations, 389693 queries
  stmt_kills_ref_p: 106551 kills, 5688432 queries
  nonoverlapping_component_refs_p: 0 disambiguations, 8936 queries
  nonoverlapping_refs_since_match_p: 30583 disambiguations, 65514 must overlaps, 97044 queries
  aliasing_component_refs_p: 56847 disambiguations, 15459371 queries
  TBAA oracle: 28238952 disambiguations 104938558 queries
   15435200 are in alias set 0
   8876784 queries asked about the same object
   89 queries asked about the same alias set
   0 access volatile
   50400613 are dependent in the DAG
   1986920 are aritificially in conflict with void *

Modref stats:
  modref kill: 71 kills, 8130 queries
  modref use: 30684 disambiguations, 704287 queries
  modref clobber: 1694295 disambiguations, 21697882 queries
  5233712 tbaa queries (0.241208 per modref query)
  902240 base compares (0.041582 per modref query)

PTA query stats:
  pt_solution_includes: 13495059 disambiguations, 40917961 queries
  pt_solutions_intersect: 1667032 disambiguations, 13951159 queries

So 20% more modref use disambiguations, which amounts to 0.3% more
disambiguations overall and also slightly improves the situation with the
exchange2 benchmark, while the real problem is still present (as discussed
in the PR).

gcc/ChangeLog:

2021-11-21  Jan Hubicka  

PR ipa/103227
* ipa-modref.c (parm_map_for_arg): Rename to ...
(parm_map_for_ptr): .. this one; handle static chain and calls to
malloc functions.
(modref_access_analysis::get_access): Use parm_map_for_ptr.
(modref_access_analysis::process_fnspec): Update.
(modref_access_analysis::analyze_load): Update.
(modref_access_analysis::analyze_store): Update.

gcc/testsuite/ChangeLog:

2021-11-21  Jan Hubicka  

PR ipa/103227
* gcc.dg/tree-ssa/modref-15.c: New test.

diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c
index a04e5855a9a..4f9323165ea 100644
--- a/gcc/ipa-modref.c
+++ b/gcc/ipa-modref.c
@@ -812,14 +812,15 @@ ignore_stores_p (tree caller, int flags)
   return false;
 }
 
-/* Determine parm_map for argument OP.  */
+/* Determine parm_map for PTR which is supposed to be a pointer.  */
 
 modref_parm_map
-parm_map_for_arg (tree op)
+parm_map_for_ptr (tree op)
 {
   bool offset_known;
   poly_int64 offset;
   struct modref_parm_map parm_map;
+  gcall *call;
 
   parm_map.parm_offset_known = false;
   parm_map.parm_offset = 0;
@@ -830,22 +831,26 @@ parm_map_for_arg (tree op)
   && TREE_CODE (SSA_NAME_VAR (op)) == PARM_DECL)
 {
   int index = 0;
-  for (tree t = DECL_ARGUMENTS (current_function_decl);
-  t != SSA_NAME_VAR (op); t = DECL_CHAIN (t))
-   {
- if (!t)
-   {
- index = MODREF_UNKNOWN_PARM;
- break;
-  

Fix failure in merge_block.c testcase

2021-11-21 Thread Jan Hubicka via Gcc-patches
Hi,
this testcase needs -fno-ipa-modref because otherwise it hits the issue
that complete loop unrolling leaves a somewhat mismatched profile.

Bootstrapped/regtested x86_64-linux, committed.

gcc/testsuite/ChangeLog:

2021-11-21  Jan Hubicka  

PR ipa/103264
* gcc.dg/tree-prof/merge_block.c: Add -fno-ipa-modref

diff --git a/gcc/testsuite/gcc.dg/tree-prof/merge_block.c 
b/gcc/testsuite/gcc.dg/tree-prof/merge_block.c
index 5da5ddff6a0..e8a8873f152 100644
--- a/gcc/testsuite/gcc.dg/tree-prof/merge_block.c
+++ b/gcc/testsuite/gcc.dg/tree-prof/merge_block.c
@@ -1,5 +1,5 @@
 
-/* { dg-options "-O2 -fno-ipa-pure-const -fdump-tree-optimized-blocks-details 
-fno-early-inlining" } */
+/* { dg-options "-O2 -fno-ipa-pure-const -fdump-tree-optimized-blocks-details 
-fno-early-inlining -fno-ipa-modref" } */
 int a[8];
 int t()
 {


Re: [PATCH v2] c-format: Add -Wformat-int-precision option [PR80060]

2021-11-21 Thread Daniil Stas via Gcc-patches
On Thu, 4 Nov 2021 18:25:14 -0600
Martin Sebor  wrote:

> On 10/31/21 8:13 AM, Daniil Stas wrote:
> > On Sun, 10 Oct 2021 23:10:20 +
> > Daniil Stas  wrote:
> >   
> >> This option is enabled by default when -Wformat option is enabled.
> >> A user can specify -Wno-format-int-precision to disable emitting
> >> warnings when passing an argument of an incompatible integer type
> >> to a 'd', 'i', 'o', 'u', 'x', or 'X' conversion specifier when it
> >> has the same precision as the expected type.
> >>
> >> Signed-off-by: Daniil Stas 
> >>
> >> gcc/c-family/ChangeLog:
> >>
> >>* c-format.c (check_format_types): Don't emit warnings when
> >>passing an argument of an incompatible integer type to
> >>a 'd', 'i', 'o', 'u', 'x', or 'X' conversion specifier when
> >> it has the same precision as the expected type if
> >>-Wno-format-int-precision option is specified.
> >>* c.opt: Add -Wformat-int-precision option.
> >>
> >> gcc/ChangeLog:
> >>
> >>* doc/invoke.texi: Add -Wformat-int-precision option
> >> description.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >>* c-c++-common/Wformat-int-precision-1.c: New test.
> >>* c-c++-common/Wformat-int-precision-2.c: New test.
> >> ---
> >> This is an update of patch "c-format: Add -Wformat-same-precision
> >> option [PR80060]". The changes comparing to the first patch
> >> version:
> >>
> >> - changed the option name to -Wformat-int-precision
> >> - changed the option description as was suggested by Martin
> >> - changed Wformat-int-precision-2.c to use dg-bogus instead of
> >> previous invalid syntax
> >>
> >> I also tried to combine the tests into one file with #pragma GCC
> >> diagnostic, but looks like it's not possible. I want to test that
> >> when passing just -Wformat option everything works as before my
> >> patch by default. And then in another test case to check that
> >> passing -Wno-format-int-precision disables the warning. But looks
> >> like in GCC you can't toggle the warnings such as
> >> -Wno-format-int-precision individually but only can disable the
> >> general -Wformat option that will disable all the formatting
> >> warnings together, which is not the proper test.  
> > 
> > Hi,
> > Can anyone review this patch?
> > Thank you  
> 
> I can't approve the change but it looks pretty good to me.
> 
> The documentation should wrap code symbols like int64_t, long,
> or printf in @code{} directives.
> 
> I don't think the first test needs to be restricted to just
> lp64, although I'd expect it to already be covered by the test
> suite.  The lp64 selector only tells us that int is 32 bits
> and long (and pointer) are 64, but nothing about long long so
> I suspect the test might fail on other targets.  There's llp64
> that's true for 4 byte ints and longs (but few targets match),
> and long_neq_int that's true when long is not the same size as
> int. So I think the inverse of the latter might be best, with
> int and long as arguments.  testsuite/lib/target-supports.exp
> defines these and others.
> 
> It might also be a good idea to add another case to the second
> test to exercise arguments with different precision to make
> sure -Wformat still triggers for those even  with
> -Wno-format-int-precision.
> 
> The -Wformat warnings are Joseph's domain (CC'd) so either he
> or some other C or global reviewer needs to sign off on changes
> in this area.  (Please ping the patch weekly until you get
> a response.)
> 
> Thanks
> Martin

Hi, Martin
Thanks for your response. I've sent an updated patch.

Best regards,
Daniil


[PATCH v3] c-format: Add -Wformat-int-precision option [PR80060]

2021-11-21 Thread Daniil Stas via Gcc-patches
This option is enabled by default when -Wformat option is enabled. A
user can specify -Wno-format-int-precision to disable emitting
warnings when passing an argument of an incompatible integer type to
a 'd', 'i', 'o', 'u', 'x', or 'X' conversion specifier when it has
the same precision as the expected type.

Signed-off-by: Daniil Stas 

gcc/c-family/ChangeLog:

* c-format.c (check_format_types): Don't emit warnings when
passing an argument of an incompatible integer type to
a 'd', 'i', 'o', 'u', 'x', or 'X' conversion specifier when it has
the same precision as the expected type if
-Wno-format-int-precision option is specified.
* c.opt: Add -Wformat-int-precision option.

gcc/ChangeLog:

* doc/invoke.texi: Add -Wformat-int-precision option description.

gcc/testsuite/ChangeLog:

* c-c++-common/Wformat-int-precision-1.c: New test.
* c-c++-common/Wformat-int-precision-2.c: New test.
---
Changes for v3:
  - Added additional @code{} directives to the documentation where needed.
  - Changed tests to run on "! long_neq_int" target instead of "lp64".
  - Added a test case to check that gcc still emits warnings for arguments
  with different precision even with -Wno-format-int-precision option enabled.

Changes for v2:
  - Changed the option name to -Wformat-int-precision.
  - Changed the option description as was suggested by Martin.
  - Changed Wformat-int-precision-2.c to use dg-bogus instead of previous
  invalid syntax.
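
As an illustration of the situation the option targets (a sketch, not part of the patch): where int64_t is a typedef for long, passing it to a "%lld" conversion mismatches by type name only, since long and long long may share the same precision. The helper names below are my own; the portable way to avoid the question is the PRId64 macro from <inttypes.h>.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Returns 1 when long and long long have the same size on this target,
   which is the case where only the type name differs.  */
static int long_matches_long_long(void)
{
  return sizeof(long) == sizeof(long long);
}

/* Formats n portably.  Writing "%lld" here instead would be the kind of
   call discussed in this thread: on targets where int64_t is long it
   draws a -Wformat warning although the precision is identical.  */
static int format_i64(char *buf, size_t size, int64_t n)
{
  return snprintf(buf, size, "%" PRId64, n);
}
```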

 gcc/c-family/c-format.c |  2 +-
 gcc/c-family/c.opt  |  6 ++
 gcc/doc/invoke.texi | 17 -
 .../c-c++-common/Wformat-int-precision-1.c  |  7 +++
 .../c-c++-common/Wformat-int-precision-2.c  |  8 
 5 files changed, 38 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/Wformat-int-precision-1.c
 create mode 100644 gcc/testsuite/c-c++-common/Wformat-int-precision-2.c

diff --git a/gcc/c-family/c-format.c b/gcc/c-family/c-format.c
index e735e092043..c66787f931f 100644
--- a/gcc/c-family/c-format.c
+++ b/gcc/c-family/c-format.c
@@ -4248,7 +4248,7 @@ check_format_types (const substring_loc _loc,
  && (!pedantic || i < 2)
  && char_type_flag)
continue;
-  if (types->scalar_identity_flag
+  if ((types->scalar_identity_flag || !warn_format_int_precision)
  && (TREE_CODE (cur_type) == TREE_CODE (wanted_type)
  || (INTEGRAL_TYPE_P (cur_type)
  && INTEGRAL_TYPE_P (wanted_type)))
diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index 3976fc368db..0621585a4f9 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -684,6 +684,12 @@ C ObjC C++ LTO ObjC++ Warning Alias(Wformat-overflow=, 1, 
0) IntegerRange(0, 2)
 Warn about function calls with format strings that write past the end
 of the destination region.  Same as -Wformat-overflow=1.
 
+Wformat-int-precision
+C ObjC C++ ObjC++ Var(warn_format_int_precision) Warning LangEnabledBy(C ObjC 
C++ ObjC++,Wformat=,warn_format >= 1, 0)
+Warn when passing an argument of an incompatible integer type to a 'd', 'i',
+'o', 'u', 'x', or 'X' conversion specifier even when it has the same precision
+as the expected type.
+
 Wformat-security
 C ObjC C++ ObjC++ Var(warn_format_security) Warning LangEnabledBy(C ObjC C++ 
ObjC++,Wformat=, warn_format >= 2, 0)
 Warn about possible security problems with format functions.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 4b1b58318f0..da69d804598 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -351,7 +351,7 @@ Objective-C and Objective-C++ Dialects}.
 -Werror  -Werror=*  -Wexpansion-to-defined  -Wfatal-errors @gol
 -Wfloat-conversion  -Wfloat-equal  -Wformat  -Wformat=2 @gol
 -Wno-format-contains-nul  -Wno-format-extra-args  @gol
--Wformat-nonliteral  -Wformat-overflow=@var{n} @gol
+-Wformat-nonliteral  -Wformat-overflow=@var{n} -Wformat-int-precision @gol
 -Wformat-security  -Wformat-signedness  -Wformat-truncation=@var{n} @gol
 -Wformat-y2k  -Wframe-address @gol
 -Wframe-larger-than=@var{byte-size}  -Wno-free-nonheap-object @gol
@@ -6113,6 +6113,21 @@ If @option{-Wformat} is specified, also warn if the 
format string is not a
 string literal and so cannot be checked, unless the format function
 takes its format arguments as a @code{va_list}.
 
+@item -Wformat-int-precision
+@opindex Wformat-int-precision
+@opindex Wno-format-int-precision
+Warn when passing an argument of an incompatible integer type to
+a @samp{d}, @samp{i}, @samp{o}, @samp{u}, @samp{x}, or @samp{X} conversion
+specifier even when it has the same precision as the expected type.
+For example, on targets where @code{int64_t} is a typedef for @code{long},
+the warning is issued for the @code{printf} call below even when both
+@code{long} and @code{long long} have the same size and precision.
+
+@smallexample
+  extern int64_t n;
+  printf 

[PATCH] tree-optimization: [PR31531] Improve ~a < CST, allow a nop cast inbetween ~ and a

2021-11-21 Thread apinski--- via Gcc-patches
From: Andrew Pinski 

This PR was originally for the missed optimization of a few isnegative which
had been solved a long time ago (sometime before 4.4.0). I noticed there was
one missed optimization on the gimple level. There is a match.pd pattern
for ~a < CST but we miss that there could be a nop_convert between the
comparison and the bit_not. This adds the optional cast to the current
match.pd pattern.
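
For reference, the arithmetic behind the fold can be checked in plain C. This is a sketch with helper names of my own choosing: ~a < c is equivalent to a > ~c, and with a sign-changing cast in between, (int)~u < 0 is equivalent to (int)u >= 0. The unsigned-to-int conversions assume the usual wrapping behaviour, which GCC provides.

```c
#include <limits.h>

/* Original form: compare the complement against a constant.  */
static int cmp_original(int a, int c) { return ~a < c; }

/* Folded form: swap the comparison and complement the constant.  */
static int cmp_folded(int a, int c) { return a > ~c; }

/* Form with a nop conversion between the bit_not and the comparison.  */
static int neg_with_cast(unsigned a)
{
  int b = (int) ~a;
  return b < 0;
}

/* What the extended pattern is expected to produce for it.  */
static int neg_folded(unsigned a) { return (int) a >= 0; }

/* Returns 1 when both pairs agree on a few boundary values.  */
static int check_boundaries(void)
{
  const int vals[] = { INT_MIN, -1, 0, 1, INT_MAX };
  for (int i = 0; i < 5; i++)
    {
      if (neg_with_cast((unsigned) vals[i]) != neg_folded((unsigned) vals[i]))
        return 0;
      for (int j = 0; j < 5; j++)
        if (cmp_original(vals[i], vals[j]) != cmp_folded(vals[i], vals[j]))
          return 0;
    }
  return 1;
}
```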

OK? Bootstrapped and tested on x86_64 with no regressions.

PR tree-optimization/31531

gcc/ChangeLog:

* match.pd (~X op C): Allow for an optional nop convert.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/pr31531-1.c: New test.
---
 gcc/match.pd  |  5 +++--
 gcc/testsuite/gcc.dg/tree-ssa/pr31531-1.c | 19 +++
 2 files changed, 22 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr31531-1.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 37c5be9e5f4..ca6c9eff624 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -4729,10 +4729,11 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (for cmp (simple_comparison)
  scmp (swapped_simple_comparison)
  (simplify
-  (cmp (bit_not@2 @0) CONSTANT_CLASS_P@1)
+  (cmp (nop_convert?:s (bit_not@2 @0)) CONSTANT_CLASS_P@1)
   (if (single_use (@2)
&& (TREE_CODE (@1) == INTEGER_CST || TREE_CODE (@1) == VECTOR_CST))
-   (scmp @0 (bit_not @1)
+   (with { tree type1 = TREE_TYPE (@1); }
+(scmp (convert:type1 @0) (bit_not @1))
 
 (for cmp (simple_comparison)
  /* Fold (double)float1 CMP (double)float2 into float1 CMP float2.  */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr31531-1.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr31531-1.c
new file mode 100644
index 000..c27299151eb
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr31531-1.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+/* PR tree-optimization/31531 */
+
+int f(int a)
+{
+  int b = ~a;
+  return b<0;
+}
+
+
+int f1(unsigned a)
+{
+  int b = ~a;
+  return b<0;
+}
+/* We should convert the above two functions from b <0 to ((int)a) >= 0. */
+/* { dg-final { scan-tree-dump-times ">= 0" 2 "optimized"} } */
+/* { dg-final { scan-tree-dump-times "~" 0 "optimized"} } */
-- 
2.17.1



Re: [PATCH] fortran, debug: Fix up DW_AT_rank [PR103315]

2021-11-21 Thread Mikael Morin

On 19/11/2021 at 10:40, Jakub Jelinek via Fortran wrote:


Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?


Hello,

you know probably better than me or any fortran maintainer whether it’s 
good or bad.

So OK from the fortran side.


Re: [PATCH] PR fortran/87851 - [9/10/11/12 Regression] Wrong return type for len_trim

2021-11-21 Thread Mikael Morin

On 19/11/2021 at 20:47, Harald Anlauf via Fortran wrote:

Dear Fortranners,

scalarization of the elemental intrinsic LEN_TRIM was ICEing
when the optional KIND argument was present.

The cleanest solution is to use the infrastructure added by
Mikael's fix for PR97896.  In that case it is a 1-liner.  :-)

That fix is available on mainline and on 11-branch only, though.
My suggestion is to fix the current PR only for the same branches,
leaving the regression unfixed for older ones.

Regtested on x86_64-pc-linux-gnu.  OK for mainline and 11-branch?


Your change itself is fine.
The PR was originally about a type mismatch between the gfortran library 
and the call generated by the front-end.

As the code generated contains a cast, I think it’s fine as well.
But please give Thomas (bug reporter) one more day to comment on this.
Then I think you can proceed.

Thanks.


Re: [PATCH] PR fortran/99061 - [10/11/12 Regression] ICE in gfc_conv_intrinsic_atan2d, at fortran/trans-intrinsic.c:4728

2021-11-21 Thread Mikael Morin

On 15/11/2021 at 22:38, Harald Anlauf via Fortran wrote:

Dear Fortranners,

the attached patch fixes the handling of the DEC trigonometric intrinsics
for different argument kinds.  It is based on the original patch by Steve,
which fixes the lookup for the needed intrinsics.

Regtested on x86_64-pc-linux-gnu.  OK for affected branches?


OK. Thanks.



[PATCH][_GLIBCXX_DEBUG] Enhance std::erase_if for vector/deque

2021-11-21 Thread François Dumont via Gcc-patches
I tried to use the same approach I used for node based containers but 
got ambiguity on erase calls. I think this simple version will do the work.


    libstdc++: [_GLIBCXX_DEBUG] Enhance std::erase_if for vector/deque

    libstdc++-v3/ChangeLog:

    * include/std/deque (erase_if): Use _GLIBCXX_STD_C container
    reference and __niter_wrap to limit _GLIBCXX_DEBUG mode impact.
    * include/std/vector (erase_if): Likewise.

Tested under Linux x86_64 normal and _GLIBCXX_DEBUG modes.

Ok to commit ?

François

diff --git a/libstdc++-v3/include/std/deque b/libstdc++-v3/include/std/deque
index 473479c44ac..0a3541af554 100644
--- a/libstdc++-v3/include/std/deque
+++ b/libstdc++-v3/include/std/deque
@@ -96,12 +96,19 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 erase_if(deque<_Tp, _Alloc>& __cont, _Predicate __pred)
 {
   using namespace __gnu_cxx;
+  _GLIBCXX_STD_C::deque<_Tp, _Alloc>& __ucont = __cont;
   const auto __osz = __cont.size();
-  const auto __end = __cont.end();
-  auto __removed = std::__remove_if(__cont.begin(), __end,
+  const auto __end = __ucont.end();
+  auto __removed = std::__remove_if(__ucont.begin(), __end,
 	__ops::__pred_iter(std::ref(__pred)));
-  __cont.erase(__removed, __end);
-  return __osz - __cont.size();
+  if (__removed != __end)
+	{
+	  __cont.erase(__niter_wrap(__cont.begin(), __removed),
+		   __cont.end());
+	  return __osz - __cont.size();
+	}
+
+  return 0;
 }
 
   template
@@ -109,12 +116,19 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 erase(deque<_Tp, _Alloc>& __cont, const _Up& __value)
 {
   using namespace __gnu_cxx;
+  _GLIBCXX_STD_C::deque<_Tp, _Alloc>& __ucont = __cont;
   const auto __osz = __cont.size();
-  const auto __end = __cont.end();
-  auto __removed = std::__remove_if(__cont.begin(), __end,
+  const auto __end = __ucont.end();
+  auto __removed = std::__remove_if(__ucont.begin(), __end,
 	__ops::__iter_equals_val(__value));
-  __cont.erase(__removed, __end);
-  return __osz - __cont.size();
+  if (__removed != __end)
+	{
+	  __cont.erase(__niter_wrap(__cont.begin(), __removed),
+		   __cont.end());
+	  return __osz - __cont.size();
+	}
+
+  return 0;
 }
 _GLIBCXX_END_NAMESPACE_VERSION
 } // namespace std
diff --git a/libstdc++-v3/include/std/vector b/libstdc++-v3/include/std/vector
index 890b0ddb3eb..b648b3d7309 100644
--- a/libstdc++-v3/include/std/vector
+++ b/libstdc++-v3/include/std/vector
@@ -107,12 +107,19 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 erase_if(vector<_Tp, _Alloc>& __cont, _Predicate __pred)
 {
   using namespace __gnu_cxx;
+  _GLIBCXX_STD_C::vector<_Tp, _Alloc>& __ucont = __cont;
   const auto __osz = __cont.size();
-  const auto __end = __cont.end();
-  auto __removed = std::__remove_if(__cont.begin(), __end,
+  const auto __end = __ucont.end();
+  auto __removed = std::__remove_if(__ucont.begin(), __end,
 	__ops::__pred_iter(std::ref(__pred)));
-  __cont.erase(__removed, __end);
-  return __osz - __cont.size();
+  if (__removed != __end)
+	{
+	  __cont.erase(__niter_wrap(__cont.begin(), __removed),
+		   __cont.end());
+	  return __osz - __cont.size();
+	}
+
+  return 0;
 }
 
   template
@@ -121,12 +128,19 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 erase(vector<_Tp, _Alloc>& __cont, const _Up& __value)
 {
   using namespace __gnu_cxx;
+  _GLIBCXX_STD_C::vector<_Tp, _Alloc>& __ucont = __cont;
   const auto __osz = __cont.size();
-  const auto __end = __cont.end();
-  auto __removed = std::__remove_if(__cont.begin(), __end,
+  const auto __end = __ucont.end();
+  auto __removed = std::__remove_if(__ucont.begin(), __end,
 	__ops::__iter_equals_val(__value));
-  __cont.erase(__removed, __end);
-  return __osz - __cont.size();
+  if (__removed != __end)
+	{
+	  __cont.erase(__niter_wrap(__cont.begin(), __removed),
+		   __cont.end());
+	  return __osz - __cont.size();
+	}
+
+  return 0;
 }
 _GLIBCXX_END_NAMESPACE_VERSION
 } // namespace std


[PATCH v2] Fortran manual: extend deprecated BOZ examples with X'ABC'

2021-11-21 Thread Sergei Trofimovich via Gcc-patches
From: Sergei Trofimovich 

gcc/fortran/
* gfortran.texi (BOZ literal constants): add X'ABC' to the list
of valid examples.
---
 gcc/fortran/gfortran.texi | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/fortran/gfortran.texi b/gcc/fortran/gfortran.texi
index 326470964b0..f7184147a82 100644
--- a/gcc/fortran/gfortran.texi
+++ b/gcc/fortran/gfortran.texi
@@ -1465,10 +1465,10 @@ dependent.  Gfortran interprets the sign bit as a user 
would expect.
 As a deprecated extension, GNU Fortran allows hexadecimal BOZ literal
 constants to be specified using the @code{X} prefix.  That the BOZ literal
 constant can also be specified by adding a suffix to the string, for
-example, @code{Z'ABC'} and @code{'ABC'X} are equivalent.  Additionally,
-as extension, BOZ literals are permitted in some contexts outside of
-@code{DATA} and the intrinsic functions listed in the Fortran standard.
-Use @option{-fallow-invalid-boz} to enable the extension.
+example, @code{Z'ABC'}, @code{'ABC'X} and @code{X'ABC'} are equivalent.
+Additionally, as extension, BOZ literals are permitted in some contexts
+outside of @code{DATA} and the intrinsic functions listed in the Fortran
+standard. Use @option{-fallow-invalid-boz} to enable the extension.
 
 @node Real array indices
 @subsection Real array indices
-- 
2.33.1



[PATCH] Fortran manual: fix invalid BOZ 'ABC'X example to be X'ABC'.

2021-11-21 Thread Sergei Trofimovich via Gcc-patches
From: Sergei Trofimovich 

gcc/fortran/
* gfortran.texi (BOZ literal constants): fix invalid BOZ 'ABC'X
example to be X'ABC'.
---
 gcc/fortran/gfortran.texi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/fortran/gfortran.texi b/gcc/fortran/gfortran.texi
index 326470964b0..f01a49c47cc 100644
--- a/gcc/fortran/gfortran.texi
+++ b/gcc/fortran/gfortran.texi
@@ -1465,7 +1465,7 @@ dependent.  Gfortran interprets the sign bit as a user 
would expect.
 As a deprecated extension, GNU Fortran allows hexadecimal BOZ literal
 constants to be specified using the @code{X} prefix.  That the BOZ literal
 constant can also be specified by adding a suffix to the string, for
-example, @code{Z'ABC'} and @code{'ABC'X} are equivalent.  Additionally,
+example, @code{Z'ABC'} and @code{X'ABC'} are equivalent.  Additionally,
 as extension, BOZ literals are permitted in some contexts outside of
 @code{DATA} and the intrinsic functions listed in the Fortran standard.
 Use @option{-fallow-invalid-boz} to enable the extension.
-- 
2.33.1