[PATCH] tree-optimization/107865 - ICE with outlining of loops

2022-11-24 Thread Richard Biener via Gcc-patches
The following makes sure to clear the loops' number of iterations when
outlining them as part of a SESE region, as can happen with
auto-parallelization.  The referenced SSA names would otherwise become stale.
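
For illustration only (this is not the testcase from the patch), a C
analogue of the situation: with loop parallelization the loop body is
outlined into a new function, while a cached loop->nb_iterations
expression such as "n_2(D) - 1" would still refer to SSA names of the
original function unless it is freed.

/* Compile with e.g. -O1 -ftree-parallelize-loops=2 (illustrative).  */
void
clear_all (double *x, int n)
{
  for (int i = 0; i < n; i++)
    x[i] = 0.0;
}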

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

PR tree-optimization/107865
* tree-cfg.cc (move_sese_region_to_fn): Free the number of
iterations of moved loops.

* gfortran.dg/graphite/pr107865.f90: New testcase.
---
 .../gfortran.dg/graphite/pr107865.f90  | 18 ++
 gcc/tree-cfg.cc|  2 ++
 2 files changed, 20 insertions(+)
 create mode 100644 gcc/testsuite/gfortran.dg/graphite/pr107865.f90

diff --git a/gcc/testsuite/gfortran.dg/graphite/pr107865.f90 
b/gcc/testsuite/gfortran.dg/graphite/pr107865.f90
new file mode 100644
index 000..6bddb17a1be
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/graphite/pr107865.f90
@@ -0,0 +1,18 @@
+! { dg-do compile }
+! { dg-options "-O1 -floop-parallelize-all -ftree-parallelize-loops=2" }
+
+  SUBROUTINE FNC (F)
+
+  IMPLICIT REAL (A-H)
+  DIMENSION F(N)
+
+  DO I = 1, 6
+ DO J = 1, 6
+IF (J .NE. I) THEN
+   F(I) = F(I) + 1
+END IF
+ END DO
+  END DO
+
+  RETURN
+  END
diff --git a/gcc/tree-cfg.cc b/gcc/tree-cfg.cc
index 28175312afc..0c409b435fb 100644
--- a/gcc/tree-cfg.cc
+++ b/gcc/tree-cfg.cc
@@ -7859,6 +7859,8 @@ move_sese_region_to_fn (struct function *dest_cfun, 
basic_block entry_bb,
   if (bb->loop_father->header == bb)
{
  class loop *this_loop = bb->loop_father;
+ /* Avoid the need to remap SSA names used in nb_iterations.  */
+ free_numbers_of_iterations_estimates (this_loop);
  class loop *outer = loop_outer (this_loop);
  if (outer == loop
  /* If the SESE region contains some bbs ending with
-- 
2.35.3


Re: [PATCH-1, rs6000] Generate permute index directly for little endian target [PR100866]

2022-11-24 Thread Kewen.Lin via Gcc-patches
Hi Haochen,

Sorry for the late review.

on 2022/10/11 15:38, HAO CHEN GUI wrote:
> Hi,
>   This patch modifies the helper function which generates the permute index for
> vector byte reversal and generates the permute index directly for little-endian
> targets. It saves one "xxlnor" instruction on P8 little-endian targets, as
> the original process needs an "xxlnor" to compute the complement of the index.
> 

Nice.
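
(As an aside for readers, a tiny standalone sketch, not from the patch,
of the index arithmetic: whatever the big-endian selector byte is, the
little-endian value the patch now emits is its 5-bit complement, matching
the "~swaparray[i] & 0x001f" in the hunk above and making the run-time
xxlnor unnecessary.  The BE bytes below are hypothetical; the real tables
live in swap_endian_selector_for_mode.)

#include <stdio.h>

int
main (void)
{
  unsigned char be_sel[16] = { 3, 2, 1, 0, 7, 6, 5, 4,
                               11, 10, 9, 8, 15, 14, 13, 12 };
  for (int i = 0; i < 16; i++)
    printf ("%2d: BE %2u -> LE %2u\n", i, be_sel[i],
            (unsigned int) (~be_sel[i] & 0x1f));
  return 0;
}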

> Bootstrapped and tested on ppc64 Linux BE and LE with no regressions.
> Is this okay for trunk? Any recommendations? Thanks a lot.
> 
> ChangeLog
> 2022-10-11  Haochen Gui 
> 
> gcc/
>   PR target/100866
>   * config/rs6000/rs6000-call.cc (swap_endian_selector_for_mode):
>   Generate permute index directly for little endian targets.
>   * config/rs6000/vsx.md (revb_): Call vprem directly with
>   corresponding permute indexes.
> 
> gcc/testsuite/
>   PR target/100866
>   * gcc.target/powerpc/pr100866.c: New.
> 
> patch.diff
> diff --git a/gcc/config/rs6000/rs6000-call.cc 
> b/gcc/config/rs6000/rs6000-call.cc
> index 551968b0995..bad8e9e0e52 100644
> --- a/gcc/config/rs6000/rs6000-call.cc
> +++ b/gcc/config/rs6000/rs6000-call.cc
> @@ -2839,7 +2839,10 @@ swap_endian_selector_for_mode (machine_mode mode)
>  }
> 
>for (i = 0; i < 16; ++i)
> -perm[i] = GEN_INT (swaparray[i]);
> +if (BYTES_BIG_ENDIAN)
> +  perm[i] = GEN_INT (swaparray[i]);
> +else
> +  perm[i] = GEN_INT (~swaparray[i] & 0x001f);

IMHO, it would be good to add a function comment for this function;
it's sad that we didn't have one before.  With this patch, the selector
(perm) is expected to be used with vperm direct as shown below, and it
would be good to note that explicitly for other potential callers too.

> 
>return force_reg (V16QImode, gen_rtx_CONST_VECTOR (V16QImode,
>gen_rtvec_v (16, perm)));
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index e226a93bbe5..b68eba48d2c 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -6096,8 +6096,8 @@ (define_expand "revb_"
>to the endian mode in use, i.e. in LE mode, put elements
>in BE order.  */
>rtx sel = swap_endian_selector_for_mode(mode);
> -  emit_insn (gen_altivec_vperm_ (operands[0], operands[1],
> -operands[1], sel));
> +  emit_insn (gen_altivec_vperm__direct (operands[0], operands[1],
> +   operands[1], sel));
>  }
> 
>DONE;
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr100866.c 
> b/gcc/testsuite/gcc.target/powerpc/pr100866.c
> new file mode 100644
> index 000..c708dfd502e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr100866.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_p8vector_ok } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power8" } */
> +/* { dg-final { scan-assembler-not "xxlnor" } } */

Nit: may be better with {\mxxlnor\M}?

The others look good to me.  Thanks!

BR,
Kewen

> +
> +#include 
> +
> +vector unsigned short revb(vector unsigned short a)
> +{
> +   return vec_revb(a);
> +}
> 




[PATCH V3] [x86] Fix incorrect _mm_cvtsbh_ss.

2022-11-24 Thread liuhongt via Gcc-patches
Update in V3:
Remove !flag_signaling_nans since there's already HONOR_NANS (BFmode).

Here's the patch:

After supporting real __bf16, the implementation of _mm_cvtsbh_ss went
wrong.

The patch adds a builtin to generate pslld for the intrinsic; also,
extendbfsf2 is supported with pslld when !HONOR_NANS (BFmode).

truncsfbf2 is supported with vcvtneps2bf16 when
!HONOR_NANS (BFmode) && flag_unsafe_math_optimizations.
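
For context, a minimal standalone sketch (not part of the patch) of what
extending __bf16 to float amounts to at the bit level, which is why a
16-bit left shift (pslld on the vector side) is sufficient when NaN
signalling does not need to be preserved:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* __bf16 holds the upper 16 bits of a binary32; extending is a shift.  */
static float
bf16_bits_to_float (uint16_t bits)
{
  uint32_t u = (uint32_t) bits << 16;
  float f;
  memcpy (&f, &u, sizeof f);
  return f;
}

int
main (void)
{
  printf ("%f\n", bf16_bits_to_float (0x3f80)); /* prints 1.000000 */
  return 0;
}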

gcc/ChangeLog:

PR target/107748
* config/i386/avx512bf16intrin.h (_mm_cvtsbh_ss): Refined.
* config/i386/i386-builtin-types.def (FLOAT_FTYPE_BFLOAT16):
New function type.
* config/i386/i386-builtin.def (BDESC): New builtin.
* config/i386/i386-expand.cc (ix86_expand_args_builtin):
Handle the builtin.
* config/i386/i386.md (extendbfsf2): New expander.
(extendbfsf2_1): New define_insn.
(truncsfbf2): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512bf16-cvtsbh2ss-1.c: Scan pslld.
* gcc.target/i386/extendbfsf.c: New test.
---
 gcc/config/i386/avx512bf16intrin.h|  4 +-
 gcc/config/i386/i386-builtin-types.def|  1 +
 gcc/config/i386/i386-builtin.def  |  2 +
 gcc/config/i386/i386-expand.cc|  1 +
 gcc/config/i386/i386.md   | 40 ++-
 .../gcc.target/i386/avx512bf16-cvtsbh2ss-1.c  |  3 +-
 gcc/testsuite/gcc.target/i386/extendbfsf.c| 16 
 7 files changed, 61 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/extendbfsf.c

diff --git a/gcc/config/i386/avx512bf16intrin.h 
b/gcc/config/i386/avx512bf16intrin.h
index ea1d0125b3f..75378af5584 100644
--- a/gcc/config/i386/avx512bf16intrin.h
+++ b/gcc/config/i386/avx512bf16intrin.h
@@ -46,9 +46,7 @@ extern __inline float
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm_cvtsbh_ss (__bf16 __A)
 {
-  union{ float a; unsigned int b;} __tmp;
-  __tmp.b = ((unsigned int)(__A)) << 16;
-  return __tmp.a;
+  return __builtin_ia32_cvtbf2sf (__A);
 }
 
 /* vcvtne2ps2bf16 */
diff --git a/gcc/config/i386/i386-builtin-types.def 
b/gcc/config/i386/i386-builtin-types.def
index d10de32643f..65fe070e37f 100644
--- a/gcc/config/i386/i386-builtin-types.def
+++ b/gcc/config/i386/i386-builtin-types.def
@@ -1281,6 +1281,7 @@ DEF_FUNCTION_TYPE (V4SI, V4SI, V4SI, UHI)
 DEF_FUNCTION_TYPE (V8SI, V8SI, V8SI, UHI)
 
 # BF16 builtins
+DEF_FUNCTION_TYPE (FLOAT, BFLOAT16)
 DEF_FUNCTION_TYPE (V32BF, V16SF, V16SF)
 DEF_FUNCTION_TYPE (V32BF, V16SF, V16SF, V32BF, USI)
 DEF_FUNCTION_TYPE (V32BF, V16SF, V16SF, USI)
diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index 5e0461acc00..d85b1753039 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -2838,6 +2838,8 @@ BDESC (0, OPTION_MASK_ISA2_AVX512BF16, 
CODE_FOR_avx512f_dpbf16ps_v8sf_maskz, "__
 BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_dpbf16ps_v4sf, 
"__builtin_ia32_dpbf16ps_v4sf", IX86_BUILTIN_DPBF16PS_V4SF, UNKNOWN, (int) 
V4SF_FTYPE_V4SF_V8BF_V8BF)
 BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_dpbf16ps_v4sf_mask, 
"__builtin_ia32_dpbf16ps_v4sf_mask", IX86_BUILTIN_DPBF16PS_V4SF_MASK, UNKNOWN, 
(int) V4SF_FTYPE_V4SF_V8BF_V8BF_UQI)
 BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_dpbf16ps_v4sf_maskz, 
"__builtin_ia32_dpbf16ps_v4sf_maskz", IX86_BUILTIN_DPBF16PS_V4SF_MASKZ, 
UNKNOWN, (int) V4SF_FTYPE_V4SF_V8BF_V8BF_UQI)
+BDESC (OPTION_MASK_ISA_SSE2, 0, CODE_FOR_extendbfsf2_1, 
"__builtin_ia32_cvtbf2sf", IX86_BUILTIN_CVTBF2SF, UNKNOWN, (int) 
FLOAT_FTYPE_BFLOAT16)
+
 
 /* AVX512FP16.  */
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, 
CODE_FOR_addv8hf3_mask, "__builtin_ia32_addph128_mask", 
IX86_BUILTIN_ADDPH128_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 0373c3614a4..d26e7e41445 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -10423,6 +10423,7 @@ ix86_expand_args_builtin (const struct 
builtin_description *d,
   return ix86_expand_sse_ptest (d, exp, target);
 case FLOAT128_FTYPE_FLOAT128:
 case FLOAT_FTYPE_FLOAT:
+case FLOAT_FTYPE_BFLOAT16:
 case INT_FTYPE_INT:
 case UINT_FTYPE_UINT:
 case UINT16_FTYPE_UINT16:
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 01faa911b77..9451883396c 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -130,6 +130,7 @@ (define_c_enum "unspec" [
   ;; For AVX/AVX512F support
   UNSPEC_SCALEF
   UNSPEC_PCMP
+  UNSPEC_CVTBFSF
 
   ;; Generic math support
   UNSPEC_IEEE_MIN  ; not commutative
@@ -4961,6 +4962,31 @@ (define_insn "*extendhf2"
(set_attr "prefix" "evex")
(set_attr "mode" "")])
 
+(define_expand "extendbfsf2"
+  [(set (match_operand:SF 0 "register_operand")
+   (unspec:SF
+ [(match_operand:BF 1 "register_operand")]
+UNSPEC_CVTBFS

Re: [PATCH V2] Update block move for struct param or returns

2022-11-24 Thread Jiufu Guo via Gcc-patches


Based on the discussions in previous mails:
https://gcc.gnu.org/pipermail/gcc-patches/2022-November/607139.html
https://gcc.gnu.org/pipermail/gcc-patches/2022-November/607197.html

I will update the patch accordingly, and then submit a new version.

BR,
Jeff (Jiufu)

Jiufu Guo  writes:

> Hi,
>
> When assigning a parameter to a variable, or assigning a variable to
> the return value with a struct type, a "block move" is used to expand
> the assignment. It would be better to use the register mode according
> to the target/ABI to move the blocks. This would then raise more
> opportunities for other optimization passes (cse/dse/xprop).
>
> As the example code (like code in PR65421):
>
> typedef struct SA {double a[3];} A;
> A ret_arg_pt (A *a){return *a;} // on ppc64le, only 3 lfd(s)
> A ret_arg (A a) {return a;} // just empty fun body
> void st_arg (A a, A *p) {*p = a;} //only 3 stfd(s)
>
> This patch is based on the previous version which supports assignments
> from parameter:
> https://gcc.gnu.org/pipermail/gcc-patches/2022-November/605709.html
> This patch also supports returns.
>
> I also tried to update gimplify/nrv to replace "return D.xxx;" with
> "return ;". While there is one issue: "" with
> PARALLEL code can not be accessed through address/component_ref.
> This issue blocks a few passes (e.g. sra, expand).
>
> On ppc64, some dead stores are not eliminated. e.g. for ret_arg:
> .cfi_startproc
> std 4,56(1)//redundant
> std 5,64(1)//redundant
> std 6,72(1)//redundant
> std 4,0(3)
> std 5,8(3)
> std 6,16(3)
> blr
>
> Bootstraped and regtested on ppc64le and x86_64.
>
> I'm wondering if this patch could be committed first.
> Thanks for the comments and suggestions.
>
>
> BR,
> Jeff (Jiufu)
>
>   PR target/65421
>
> gcc/ChangeLog:
>
>   * cfgexpand.cc (expand_used_vars): Add collecting return VARs.
>   (expand_gimple_stmt_1): Call expand_special_struct_assignment.
>   (pass_expand::execute): Free collections of return VARs.
>   * expr.cc (expand_special_struct_assignment): New function.
>   * expr.h (expand_special_struct_assignment): Declare.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/powerpc/pr65421-1.c: New test.
>   * gcc.target/powerpc/pr65421.c: New test.
>
> ---
>  gcc/cfgexpand.cc | 37 +
>  gcc/expr.cc  | 43 
>  gcc/expr.h   |  3 ++
>  gcc/testsuite/gcc.target/powerpc/pr65421-1.c | 21 ++
>  gcc/testsuite/gcc.target/powerpc/pr65421.c   | 19 +
>  5 files changed, 123 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr65421-1.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr65421.c
>
> diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
> index dd29c03..f185de39341 100644
> --- a/gcc/cfgexpand.cc
> +++ b/gcc/cfgexpand.cc
> @@ -341,6 +341,9 @@ static hash_map *decl_to_stack_part;
> all of them in one big sweep.  */
>  static bitmap_obstack stack_var_bitmap_obstack;
>  
> +/* Those VARs on returns.  */
> +static bitmap return_vars;
> +
>  /* An array of indices such that stack_vars[stack_vars_sorted[i]].size
> is non-decreasing.  */
>  static size_t *stack_vars_sorted;
> @@ -2158,6 +2161,24 @@ expand_used_vars (bitmap forced_stack_vars)
>  frame_phase = off ? align - off : 0;
>}
>  
> +  /* Collect VARs on returns.  */
> +  return_vars = NULL;
> +  if (DECL_RESULT (current_function_decl)
> +  && TYPE_MODE (TREE_TYPE (DECL_RESULT (current_function_decl))) == 
> BLKmode)
> +{
> +  return_vars = BITMAP_ALLOC (NULL);
> +
> +  edge_iterator ei;
> +  edge e;
> +  FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)
> + if (greturn *ret = safe_dyn_cast (last_stmt (e->src)))
> +   {
> + tree val = gimple_return_retval (ret);
> + if (val && VAR_P (val))
> +   bitmap_set_bit (return_vars, DECL_UID (val));
> +   }
> +}
> +
>/* Set TREE_USED on all variables in the local_decls.  */
>FOR_EACH_LOCAL_DECL (cfun, i, var)
>  TREE_USED (var) = 1;
> @@ -3942,6 +3963,17 @@ expand_gimple_stmt_1 (gimple *stmt)
> /* This is a clobber to mark the going out of scope for
>this LHS.  */
> expand_clobber (lhs);
> + else if ((TREE_CODE (rhs) == PARM_DECL && DECL_INCOMING_RTL (rhs)
> +   && TYPE_MODE (TREE_TYPE (rhs)) == BLKmode
> +   && (GET_CODE (DECL_INCOMING_RTL (rhs)) == PARALLEL
> +   || REG_P (DECL_INCOMING_RTL (rhs
> +  || (VAR_P (lhs) && return_vars
> +  && DECL_RTL_SET_P (DECL_RESULT (current_function_decl))
> +  && GET_CODE (
> +   DECL_RTL (DECL_RESULT (current_function_decl)))
> +   == PARALLEL
> +  && bitmap_bit_p (return_vars

Re: [PATCH V2] Use subscalar mode to move struct block for parameter

2022-11-24 Thread Jiufu Guo via Gcc-patches


Hi Richard,

Thanks a lot for your comments!

Richard Biener  writes:

> On Wed, 23 Nov 2022, Jiufu Guo wrote:
>
>> Hi Jeff,
>> 
>> Thanks a lot for your comments!
>
> Sorry for the late response ...
>
>> Jeff Law  writes:
>> 
>> > On 11/20/22 20:07, Jiufu Guo wrote:
>> >> Jiufu Guo  writes:
>> >>
>> >>> Hi,
>> >>>
>> >>> As mentioned in the previous version patch:
>> >>> https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604646.html
>> >>> The suboptimal code is generated for "assigning from parameter" or
>> >>> "assigning to return value".
>> >>> This patch enhances the assignment from parameters like the below
>> >>> cases:
>> >>> /case1.c
>> >>> typedef struct SA {double a[3];long l; } A;
>> >>> A ret_arg (A a) {return a;}
>> >>> void st_arg (A a, A *p) {*p = a;}
>> >>>
>> >>> case2.c
>> >>> typedef struct SA {double a[3];} A;
>> >>> A ret_arg (A a) {return a;}
>> >>> void st_arg (A a, A *p) {*p = a;}
>> >>>
>> >>> For this patch, bootstrap and regtest pass on ppc64{,le}
>> >>> and x86_64.
>> >>> * Besides asking for help reviewing this patch, I would like to
>> >>> consult comments about enhancing for "assigning to returns".
>> >> I updated the patch to fix the issue for returns.  This patch
>> >> adds a flag DECL_USEDBY_RETURN_P to indicate if a var is used
>> >> by a return stmt.  This patch fix the issue in expand pass only,
>> >> so, we would try to update the patch to avoid this flag.
>> >>
>> >> diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
>> >> index dd29c03..09b8ec64cea 100644
>> >> --- a/gcc/cfgexpand.cc
>> >> +++ b/gcc/cfgexpand.cc
>> >> @@ -2158,6 +2158,20 @@ expand_used_vars (bitmap forced_stack_vars)
>> >>   frame_phase = off ? align - off : 0;
>> >> }
>> >>   +  /* Collect VARs on returns.  */
>> >> +  if (DECL_RESULT (current_function_decl))
>> >> +{
>> >> +  edge_iterator ei;
>> >> +  edge e;
>> >> +  FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)
>> >> + if (greturn *ret = safe_dyn_cast (last_stmt (e->src)))
>> >> +   {
>> >> + tree val = gimple_return_retval (ret);
>> >> + if (val && VAR_P (val))
>> >> +   DECL_USEDBY_RETURN_P (val) = 1;
>
> you probably want to check && auto_var_in_fn (val, ...) since val
> might be global?
Since we are collecting the return values, they would have been built in
gimplify_return_expr.  In that function, create_tmp_reg is used to create
a local temp, so it would not be a global var here.

code piece in gimplify_return_expr:
  if (!result_decl)
result = NULL_TREE;
  else if (aggregate_value_p (result_decl, TREE_TYPE (current_function_decl)))
{

  result = result_decl;
}
  else if (gimplify_ctxp->return_temp)
result = gimplify_ctxp->return_temp;
  else
{
  result = create_tmp_reg (TREE_TYPE (result_decl));

Here, for "typedef struct SA {double a[3];}", aggregate_value_p returns
false for targets like ppc64le, because the result of "hard_function_value"
is an rtx with PARALLEL code.
And then a VAR_DECL is built via "create_tmp_reg" (actually it is not a
reg here; it builds a VAR_DECL with RECORD type and BLKmode).

I also tried to use RESULT_DECL for this kind of type, or to
let aggregate_value_p accept this kind of type.  But it seems not easy,
because "" (a RESULT_DECL with PARALLEL) is not ok for address
operations.


>
>> >> +   }
>> >> +}
>> >> +
>> >> /* Set TREE_USED on all variables in the local_decls.  */
>> >> FOR_EACH_LOCAL_DECL (cfun, i, var)
>> >>   TREE_USED (var) = 1;
>> >> diff --git a/gcc/expr.cc b/gcc/expr.cc
>> >> index d9407432ea5..20973649963 100644
>> >> --- a/gcc/expr.cc
>> >> +++ b/gcc/expr.cc
>> >> @@ -6045,6 +6045,52 @@ expand_assignment (tree to, tree from, bool 
>> >> nontemporal)
>> >> return;
>> >>   }
>
> I miss an explanatory comment here on that the following is heuristics
> and its reasoning.
>
>> >>   +  if ((TREE_CODE (from) == PARM_DECL && DECL_INCOMING_RTL (from)
>> >> +   && TYPE_MODE (TREE_TYPE (from)) == BLKmode
>
> Why check TYPE_MODE here?  Do you want AGGREGATE_TYPE_P on the type
> instead?
Checking BLKmode, because I want to make sure the param occurs in
registers and on the stack (saved from registers).
Actually, my intention is to check:
GET_MODE (DECL_INCOMING_RTL (from)) == BLKmode

For the code:
GET_MODE (DECL_INCOMING_RTL (from)) == BLKmode
&& (GET_CODE (DECL_INCOMING_RTL (from)) == PARALLEL
|| REG_P (DECL_INCOMING_RTL (from)))
this check indicates whether the param may be passed via 2 or more
registers.

Using "AGGREGATE_TYPE_P && (PARALLEL || REG_P)" may be ok and more
readable. I will give it a try.
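
To make the PARALLEL case concrete, a small example (illustrative only,
based on the structs already discussed in this thread):

/* On powerpc64le (ELFv2), a homogeneous-double struct like this arrives
   in floating-point registers, so DECL_INCOMING_RTL of the parameter is
   a PARALLEL of registers even though TYPE_MODE of the struct is
   BLKmode.  */
typedef struct SA { double a[3]; } A;

double
first_elem (A a)
{
  return a.a[0];
}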

>
>> >> +   && (GET_CODE (DECL_INCOMING_RTL (from)) == PARALLEL
>> >> +|| REG_P (DECL_INCOMING_RTL (from
>> >> +  || (VAR_P (to) && DECL_USEDBY_RETURN_P (to)
>> >> +   && TYPE_MODE (TREE_TYPE (to)) == BLKmode
>
> Likewise.
>
>> >> +   && GET_CODE (DECL_RTL (DECL_RESULT (current_function_decl)))
>> >> +== PARALLEL))
>
> Not REG_P here?
REG_P with BLK on return would 

Re: PING^2 [PATCH] Adjust the symbol for SECTION_LINK_ORDER linked_to section [PR99889]

2022-11-24 Thread Kewen.Lin via Gcc-patches
Hi Richard,

on 2022/11/23 00:08, Richard Sandiford wrote:
> "Kewen.Lin"  writes:
>> Hi Richard,
>>
>> Many thanks for your review comments!
>>
> on 2022/8/24 16:17, Kewen.Lin via Gcc-patches wrote:
>> Hi,
>>
>> As discussed in PR98125, -fpatchable-function-entry with
>> SECTION_LINK_ORDER support doesn't work well on powerpc64
>> ELFv1 because the filled "Symbol" in
>>
>>   .section name,"flags"o,@type,Symbol
>>
>> sits in .opd section instead of in the function_section
>> like .text or named .text*.
>>
>> Since we already generate one label LPFE* which sits in
>> function_section of current_function_decl, this patch is
>> to reuse it as the symbol for the linked_to section.  It
>> avoids the above ABI specific issue when using the symbol
>> concluded from current_function_decl.
>>
>> Besides, with this support some previous workarounds for
>> powerpc64 ELFv1 can be reverted.
>>
>> btw, rs6000_print_patchable_function_entry can be dropped
>> but there is another rs6000 patch which needs this rs6000
>> specific hook rs6000_print_patchable_function_entry, not
>> sure which one gets landed first, so just leave it here.
>>
>> Bootstrapped and regtested on below:
>>
>>   1) powerpc64-linux-gnu P8 with default binutils 2.27
>>  and latest binutils 2.39.
>>   2) powerpc64le-linux-gnu P9 (default binutils 2.30).
>>   3) powerpc64le-linux-gnu P10 (default binutils 2.30).
>>   4) x86_64-redhat-linux with default binutils 2.30
>>  and latest binutils 2.39.
>>   5) aarch64-linux-gnu  with default binutils 2.30
>>  and latest binutils 2.39.
>>
>>
>> [snip...]
>>
>> diff --git a/gcc/varasm.cc b/gcc/varasm.cc
>> index 4db8506b106..d4de6e164ee 100644
>> --- a/gcc/varasm.cc
>> +++ b/gcc/varasm.cc
>> @@ -6906,11 +6906,16 @@ default_elf_asm_named_section (const char *name, 
>> unsigned int flags,
>>  fprintf (asm_out_file, ",%d", flags & SECTION_ENTSIZE);
>>if (flags & SECTION_LINK_ORDER)
>>  {
>> -  tree id = DECL_ASSEMBLER_NAME (decl);
>> -  ultimate_transparent_alias_target (&id);
>> -  const char *name = IDENTIFIER_POINTER (id);
>> -  name = targetm.strip_name_encoding (name);
>> -  fprintf (asm_out_file, ",%s", name);
>> +  /* For now, only section "__patchable_function_entries"
>> + adopts flag SECTION_LINK_ORDER, internal label LPFE*
>> + was emitted in default_print_patchable_function_entry,
>> + just place it here for linked_to section.  */
>> +  gcc_assert (!strcmp (name, "__patchable_function_entries"));
>>>
>>> I like the idea of removing the rs600 workaround in favour of making the
>>> target-independent more robust.  But this seems a bit hackish.  What
>>> would we do if SECTION_LINK_ORDER was used for something else in future?
>>>
>>
>> Good question!  I think it depends on how we can get the symbol for the
>> linked_to section: if adopting the name of the decl suffers from the
>> same issue this patch wants to fix, we have to reuse the label
>> LPFE* or some kind of new artificial label in the related section;
>> otherwise we can just go with the name of the given decl, or something
>> related to that decl.  Since we can't predict any future uses, I just
>> placed an assertion here to ensure that we would revisit and adjust
>> this part at that time.  Does that sound reasonable to you?
> 
> Yeah, I guess that's good enough.  If the old scheme ends up being
> correct for some future use, we can make the new behaviour conditional
> on __patchable_function_entries.

Yes, we can check if the given section name is
"__patchable_function_entries".

> 
> So yeah, the patch LGTM to me, thanks.

Thanks again!  I rebased and re-tested it on x86/aarch64/powerpc64{,le},
just committed in r13-4294-gf120196382ac5a.

BR,
Kewen


[PATCH] [OpenMP] GC unused SIMD clones

2022-11-24 Thread Sandra Loosemore

This patch is a followup to my not-yet-reviewed patch

[PATCH v4] OpenMP: Generate SIMD clones for functions with "declare target"

https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606218.html

In comments on a previous iteration of that patch, I was asked to do 
something to delete unused SIMD clones to avoid code bloat; this is it.


I've implemented something like a simple mark-and-sweep algorithm. 
Clones that are used are marked at the point where the call is generated 
in the vectorizer.  The loop that iterates over functions to apply the 
passes after IPA is modified to defer processing of unmarked clones, and 
anything left over is deleted.
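
As a usage-level illustration (my example, not the testcase from the
patch), this is the kind of local clone that can now be collected when
the vectorizer never ends up emitting a call to it:

/* With -fopenmp, 'step' gets local SIMD clones; if no vectorized call
   to them survives, the new gc_candidate machinery deletes them instead
   of emitting dead code.  */
#pragma omp declare simd
static int
step (int x)
{
  return x + 1;
}

int
sum (int *a, int n)
{
  int s = 0;
  for (int i = 0; i < n; i++)
    s += step (a[i]);
  return s;
}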


OK to commit this along with the above-linked patch?

-Sandra

From bfffcea926d4dfb6275346237c61922a95c9e715 Mon Sep 17 00:00:00 2001
From: Sandra Loosemore 
Date: Wed, 23 Nov 2022 23:14:31 +
Subject: [PATCH] [OpenMP] GC unused SIMD clones

SIMD clones are created during the IPA phase when it is not known whether
or not the vectorizer can use them.  Clones for functions with external
linkage are part of the ABI, but local clones can be GC'ed if no calls are
found in the compilation unit after vectorization.

gcc/ChangeLog
	* cgraph.h (struct cgraph_node): Add gc_candidate bit, modify
	default constructor to initialize it.
	* cgraphunit.cc (expand_all_functions): Save gc_candidate functions
	for last and iterate to handle recursive calls.  Delete leftover
	candidates at the end.
	* omp-simd-clone.cc (simd_clone_create): Set gc_candidate bit
	on local clones.
	* tree-vect-stmts.cc (vectorizable_simd_clone_call): Clear
	gc_candidate bit when a clone is used.

gcc/testsuite/ChangeLog
	* testsuite/g++.dg/gomp/target-simd-clone-1.C: Tweak to test
	that the unused clone is GC'ed.
	* testsuite/gcc.dg/gomp/target-simd-clone-1.c: Likewise.
---
 gcc/cgraph.h  |  7 ++-
 gcc/cgraphunit.cc | 49 ---
 gcc/omp-simd-clone.cc |  5 ++
 .../g++.dg/gomp/target-simd-clone-1.C |  7 ++-
 .../gcc.dg/gomp/target-simd-clone-1.c |  6 ++-
 gcc/tree-vect-stmts.cc|  3 ++
 6 files changed, 66 insertions(+), 11 deletions(-)

diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index 4be67e3cea9..b065677a8d0 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -891,7 +891,8 @@ struct GTY((tag ("SYMTAB_FUNCTION"))) cgraph_node : public symtab_node
   versionable (false), can_change_signature (false),
   redefined_extern_inline (false), tm_may_enter_irr (false),
   ipcp_clone (false), declare_variant_alt (false),
-  calls_declare_variant_alt (false), m_uid (uid), m_summary_id (-1)
+  calls_declare_variant_alt (false), gc_candidate (false),
+  m_uid (uid), m_summary_id (-1)
   {}
 
   /* Remove the node from cgraph and all inline clones inlined into it.
@@ -1490,6 +1491,10 @@ struct GTY((tag ("SYMTAB_FUNCTION"))) cgraph_node : public symtab_node
   unsigned declare_variant_alt : 1;
   /* True if the function calls declare_variant_alt functions.  */
   unsigned calls_declare_variant_alt : 1;
+  /* True if the function should only be emitted if it is used.  This flag
+ is set for local SIMD clones when they are created and cleared if the
+ vectorizer uses them.  */
+  unsigned gc_candidate : 1;
 
 private:
   /* Unique id of the node.  */
diff --git a/gcc/cgraphunit.cc b/gcc/cgraphunit.cc
index b05d790bf8d..587daf5674e 100644
--- a/gcc/cgraphunit.cc
+++ b/gcc/cgraphunit.cc
@@ -1996,19 +1996,52 @@ expand_all_functions (void)
 
   /* Output functions in RPO so callees get optimized before callers.  This
  makes ipa-ra and other propagators to work.
- FIXME: This is far from optimal code layout.  */
-  for (i = new_order_pos - 1; i >= 0; i--)
-{
-  node = order[i];
+ FIXME: This is far from optimal code layout.
+ Make multiple passes over the list to defer processing of gc
+ candidates until all potential uses are seen.  */
+  int gc_candidates = 0;
+  int prev_gc_candidates = 0;
 
-  if (node->process)
+  while (1)
+{
+  for (i = new_order_pos - 1; i >= 0; i--)
 	{
-	  expanded_func_count++;
-	  node->process = 0;
-	  node->expand ();
+	  node = order[i];
+
+	  if (node->gc_candidate)
+	gc_candidates++;
+	  else if (node->process)
+	{
+	  expanded_func_count++;
+	  node->process = 0;
+	  node->expand ();
+	}
 	}
+  if (!gc_candidates || gc_candidates == prev_gc_candidates)
+	break;
+  prev_gc_candidates = gc_candidates;
+  gc_candidates = 0;
 }
 
+  /* Free any unused gc_candidate functions.  */
+  if (gc_candidates)
+for (i = new_order_pos - 1; i >= 0; i--)
+  {
+	node = order[i];
+	if (node->gc_candidate)
+	  {
+	struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
+	if (symtab->dump_file)
+	  fprintf (symtab->dump_file,
+		   "Deleting unused function %s\n",
+		   IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (node->decl)));
+	n

[committed] libstdc++: Change return type of std::bit_width to int (LWG 3656)

2022-11-24 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux. Pushed to trunk.

-- >8 --

libstdc++-v3/ChangeLog:

* doc/html/manual/bugs.html: Regenerate.
* doc/xml/manual/intro.xml: Document LWG 3656 change.
* include/std/bit (__bit_width, bit_width): Return int.
* testsuite/26_numerics/bit/bit.pow.two/lwg3656.cc: New test.
---
 libstdc++-v3/doc/html/manual/bugs.html|  4 
 libstdc++-v3/doc/xml/manual/intro.xml |  7 +++
 libstdc++-v3/include/std/bit  |  6 --
 .../26_numerics/bit/bit.pow.two/lwg3656.cc| 15 +++
 4 files changed, 30 insertions(+), 2 deletions(-)
 create mode 100644 
libstdc++-v3/testsuite/26_numerics/bit/bit.pow.two/lwg3656.cc

diff --git a/libstdc++-v3/doc/html/manual/bugs.html 
b/libstdc++-v3/doc/html/manual/bugs.html
index 58600cd6ede..c4a2c26ea39 100644
--- a/libstdc++-v3/doc/html/manual/bugs.html
+++ b/libstdc++-v3/doc/html/manual/bugs.html
@@ -619,4 +619,8 @@
path::lexically_relative is confused by trailing slashes

 Implement the fix for trailing slashes.
+http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-defects.html#3656"; 
target="_top">3656:
+   Inconsistent bit operations returning a count
+   
+Changed bit_width to return 
int.
 Prev Up NextLicense Home Chapter 2. 
Setup
\ No newline at end of file
diff --git a/libstdc++-v3/doc/xml/manual/intro.xml 
b/libstdc++-v3/doc/xml/manual/intro.xml
index dee01c82159..aee96e37c61 100644
--- a/libstdc++-v3/doc/xml/manual/intro.xml
+++ b/libstdc++-v3/doc/xml/manual/intro.xml
@@ -1308,6 +1308,13 @@ requirements of the license of GCC.
 Implement the fix for trailing slashes.
 
 
+http://www.w3.org/1999/xlink"; xlink:href="&DR;#3656">3656:
+   Inconsistent bit operations returning a count
+   
+
+Changed bit_width to return int.
+
+
   
 
  
diff --git a/libstdc++-v3/include/std/bit b/libstdc++-v3/include/std/bit
index 2fd80187210..3e072ef2113 100644
--- a/libstdc++-v3/include/std/bit
+++ b/libstdc++-v3/include/std/bit
@@ -361,7 +361,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 }
 
   template
-constexpr _Tp
+constexpr int
 __bit_width(_Tp __x) noexcept
 {
   constexpr auto _Nd = __gnu_cxx::__int_traits<_Tp>::__digits;
@@ -448,9 +448,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 bit_floor(_Tp __x) noexcept
 { return std::__bit_floor(__x); }
 
+  // _GLIBCXX_RESOLVE_LIB_DEFECTS
+  // 3656. Inconsistent bit operations returning a count
   /// The smallest integer greater than the base-2 logarithm of `x`.
   template
-constexpr _If_is_unsigned_integer<_Tp>
+constexpr _If_is_unsigned_integer<_Tp, int>
 bit_width(_Tp __x) noexcept
 { return std::__bit_width(__x); }
 
diff --git a/libstdc++-v3/testsuite/26_numerics/bit/bit.pow.two/lwg3656.cc 
b/libstdc++-v3/testsuite/26_numerics/bit/bit.pow.two/lwg3656.cc
new file mode 100644
index 000..4752c3b1d33
--- /dev/null
+++ b/libstdc++-v3/testsuite/26_numerics/bit/bit.pow.two/lwg3656.cc
@@ -0,0 +1,15 @@
+// { dg-options "-std=gnu++20" }
+// { dg-do compile { target c++20 } }
+
+#include 
+
+template constexpr bool is_int = false;
+template<> constexpr bool is_int = true;
+
+// LWG 3656. Inconsistent bit operations returning a count
+// Return type of std::bit_width(T) changed from T to int.
+static_assert( is_int );
+static_assert( is_int );
+static_assert( is_int );
+static_assert( is_int );
+static_assert( is_int );
-- 
2.38.1



[committed] libstdc++: Update tests on trunk [PR106201]

2022-11-24 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux. Pushed to trunk.

-- >8 --

This copies the better tests from gcc-12 to trunk.

libstdc++-v3/ChangeLog:

PR libstdc++/106201
* testsuite/27_io/filesystem/iterators/106201.cc: Improve test.
* testsuite/experimental/filesystem/iterators/106201.cc: New test.
---
 .../testsuite/27_io/filesystem/iterators/106201.cc |  8 +---
 .../experimental/filesystem/iterators/106201.cc| 14 ++
 2 files changed, 19 insertions(+), 3 deletions(-)
 create mode 100644 
libstdc++-v3/testsuite/experimental/filesystem/iterators/106201.cc

diff --git a/libstdc++-v3/testsuite/27_io/filesystem/iterators/106201.cc 
b/libstdc++-v3/testsuite/27_io/filesystem/iterators/106201.cc
index 4a64e675816..c5fefd9ac3f 100644
--- a/libstdc++-v3/testsuite/27_io/filesystem/iterators/106201.cc
+++ b/libstdc++-v3/testsuite/27_io/filesystem/iterators/106201.cc
@@ -5,8 +5,10 @@
 // PR libstdc++/106201 constraint recursion in path(Source const&) constructor.
 
 #include 
-#include 
-using I = std::counted_iterator;
+#include 
+#include 
+namespace fs = std::filesystem;
+using I = std::counted_iterator;
 static_assert( std::swappable );
-using R = std::counted_iterator;
+using R = std::counted_iterator;
 static_assert( std::swappable );
diff --git a/libstdc++-v3/testsuite/experimental/filesystem/iterators/106201.cc 
b/libstdc++-v3/testsuite/experimental/filesystem/iterators/106201.cc
new file mode 100644
index 000..017b72ef5f6
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/filesystem/iterators/106201.cc
@@ -0,0 +1,14 @@
+// { dg-options "-std=gnu++20" }
+// { dg-do compile { target c++20 } }
+// { dg-require-filesystem-ts "" }
+
+// PR libstdc++/106201 constraint recursion in path(Source const&) constructor.
+
+#include 
+#include 
+#include 
+namespace fs = std::experimental::filesystem;
+using I = std::counted_iterator;
+static_assert( std::swappable );
+using R = std::counted_iterator;
+static_assert( std::swappable );
-- 
2.38.1



[PATCH] gcc/jit/jit-recording.cc: recording::global::write_to_dump: Avoid crashes when writing pseudo-C for globals with string initializers.

2022-11-24 Thread Vibhav Pant via Gcc-patches
If a char * global was initialized with an rvalue from
`gcc_jit_context_new_string_literal` containing a format string,
dumping the context causes libgccjit to SIGSEGV due to an improperly
constructed call to vasprintf. The following code snippet can reproduce
the crash:

int main(int argc, char **argv)
{
 gcc_jit_context *ctxt = gcc_jit_context_acquire ();
 gcc_jit_lvalue *var = gcc_jit_context_new_global(
 ctxt, NULL, GCC_JIT_GLOBAL_EXPORTED,
 gcc_jit_context_get_type(ctxt, GCC_JIT_TYPE_CONST_CHAR_PTR),
 "var");
 gcc_jit_global_set_initializer_rvalue(
 var, gcc_jit_context_new_string_literal(ctxt, "%s"));
 gcc_jit_context_dump_to_file (ctxt, "output", 0);
 return 0;
}

The offending line is jit-recording.cc:4922, where a call to d.write
passes the initializer rvalue's debug string to `write` without a
format specifier. The attached patch fixes this issue.
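
The underlying mistake is the classic format-string bug; a minimal
standalone illustration (unrelated to libgccjit beyond the analogy):

#include <stdio.h>

int
main (void)
{
  const char *text = "%s";   /* looks like data, acts as a format */
  /* printf (text);  would read a missing argument, like d.write did */
  printf ("%s", text);       /* the safe spelling, matching the fix */
  return 0;
}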

Thanks,
Vibhav
-- 
Vibhav Pant
vibh...@gmail.com
GPG: 7ED1 D48C 513C A024 BE3A 785F E3FB 28CB 6AB5 9598

From e598a4076b2bff72b4a3cc29d1d70db8c53baf45 Mon Sep 17 00:00:00 2001
From: Vibhav Pant 
Date: Fri, 25 Nov 2022 02:02:09 +0530
Subject: [PATCH] jit-recording.cc: Dump string literal initializers correctly

---
 gcc/jit/jit-recording.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/jit/jit-recording.cc b/gcc/jit/jit-recording.cc
index 6ae5a667e90..7bb98ddcb42 100644
--- a/gcc/jit/jit-recording.cc
+++ b/gcc/jit/jit-recording.cc
@@ -4919,7 +4919,7 @@ recording::global::write_to_dump (dump &d)
   else if (m_rvalue_init)
 {
   d.write (" = ");
-  d.write (m_rvalue_init->get_debug_string ());
+  d.write ("%s", m_rvalue_init->get_debug_string ());
   d.write (";\n");
 }
 
-- 
2.38.1





Re: [PATCH] Fortran: error recovery on associate with bad selector [PR107577]

2022-11-24 Thread Thomas Koenig via Gcc-patches

Hi Harald,


please find attached an obvious patch by Steve for a technical
regression that resulted from improvements in error recovery
of bad uses of associate.

Regtested on x86_64-pc-linux-gnu.

Will commit soon unless there are comments.


Obvious enough, I think.  Thanks!


As a sidenote: the testcase shows that we resolve the associate
names quite often, likely more often than necessary, resulting
in many error messages produced for the same line of code.  In
the present case, each use of the bad name produces two errors,
one where it is used, and one at the associate statement.
That is probably not helpful for the user.


We have an "error" flag in gfc_expr, which we use infrequently to
avoid repetitions of errors.  If an error has already been issued
for an expression, then we could set this.  We would have to be careful
about resetting the error on matching though, so it is probably better
to only use it during resolution.

Best regards

Thomas



Re: [PATCH]AArch64 sve2: Fix expansion of division [PR107830]

2022-11-24 Thread Richard Sandiford via Gcc-patches
Tamar Christina  writes:
>> -Original Message-
>> From: Richard Sandiford 
>> Sent: Wednesday, November 23, 2022 4:18 PM
>> To: Tamar Christina 
>> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
>> ; Marcus Shawcroft
>> ; Kyrylo Tkachov 
>> Subject: Re: [PATCH]AArch64 sve2: Fix expansion of division [PR107830]
>> 
>> Tamar Christina  writes:
>> > Hi All,
>> >
>> > SVE has an actual division optab, and when using -Os we don't optimize
>> > the division away.  This means that we need to distinguish between a
>> > div which we can optimize and one we cannot even during expansion.
>> >
>> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>> >
>> > Ok for master?
>> >
>> > Thanks,
>> > Tamar
>> >
>> > gcc/ChangeLog:
>> >
>> >PR target/107830
>> >* config/aarch64/aarch64.cc
>> >(aarch64_vectorize_can_special_div_by_constant): Check validity
>> during
>> >codegen phase as well.
>> >
>> > gcc/testsuite/ChangeLog:
>> >
>> >PR target/107830
>> >* gcc.target/aarch64/sve2/pr107830.c: New test.
>> >
>> > --- inline copy of patch --
>> > diff --git a/gcc/config/aarch64/aarch64.cc
>> > b/gcc/config/aarch64/aarch64.cc index
>> >
>> 4176d7b046a126664360596b6db79a43e77ff76a..bee23625807af95d5ec15ad45
>> 702
>> > 961b2d7ab55d 100644
>> > --- a/gcc/config/aarch64/aarch64.cc
>> > +++ b/gcc/config/aarch64/aarch64.cc
>> > @@ -24322,12 +24322,15 @@
>> aarch64_vectorize_can_special_div_by_constant (enum tree_code code,
>> >if ((flags & VEC_ANY_SVE) && !TARGET_SVE2)
>> >  return false;
>> >
>> > +  wide_int val = wi::add (cst, 1);
>> > +  int pow = wi::exact_log2 (val);
>> > +  bool valid_p = pow == (int)(element_precision (vectype) / 2);
>> > +  /* SVE actually has a div operator, we we may have gotten here through
>> > + that route.  */
>> >if (in0 == NULL_RTX && in1 == NULL_RTX)
>> > -{
>> > -  wide_int val = wi::add (cst, 1);
>> > -  int pow = wi::exact_log2 (val);
>> > -  return pow == (int)(element_precision (vectype) / 2);
>> > -}
>> > +return valid_p;
>> > +  else if (!valid_p)
>> > +return false;
>> 
>> Is this equivalent to:
>> 
>>   int pow = wi::exact_log2 (cst + 1);
>>   if (pow != (int) (element_precision (vectype) / 2))
>> return false;
>> 
>>   /* We can use the optimized pattern.  */
>>   if (in0 == NULL_RTX && in1 == NULL_RTX)
>> return true;
>> 
>> ?  If so, I'd find that slightly easier to follow, but I realise it's 
>> personal taste.
>> OK with that change if it works and you agree.
>> 
>> While looking at this, I noticed that we ICE for:
>> 
>>   void f(unsigned short *restrict p1, unsigned int *restrict p2)
>>   {
>> for (int i = 0; i < 16; ++i)
>>   {
>> p1[i] /= 0xff;
>> p2[i] += 1;
>>   }
>>   }
>> 
>> for -march=armv8-a+sve2 -msve-vector-bits=512.  I guess we need to filter
>> out partial modes or (better) add support for them.  Adding support for them
>> probably requires changes to the underlying ADDHNB pattern.
>
> I've prevented the ice by checking if the expansion for the mode exists. I'd 
> like to
> defer adding partial support because when I tried I had to modify some 
> iterators
> as well and need to check that it's safe to do so.

Sounds good.

> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>   PR target/107830
>   * config/aarch64/aarch64.cc
>   (aarch64_vectorize_can_special_div_by_constant): Check validity during
>   codegen phase as well.
>
> gcc/testsuite/ChangeLog:
>
>   PR target/107830
>   * gcc.target/aarch64/sve2/pr107830-1.c: New test.
>   * gcc.target/aarch64/sve2/pr107830-2.c: New test.
>
> --- inline copy of patch 
>
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 
> 4176d7b046a126664360596b6db79a43e77ff76a..02aa1f34ac6155b877340d788c6d151b7c8d8bcd
>  100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -24322,12 +24322,18 @@ aarch64_vectorize_can_special_div_by_constant (enum 
> tree_code code,
>if ((flags & VEC_ANY_SVE) && !TARGET_SVE2)
>  return false;
>  
> +  wide_int val = wi::add (cst, 1);
> +  int pow = wi::exact_log2 (val);

Does the:

  int pow = wi::exact_log2 (cst + 1);

I suggested above not work?  That seems easier to read IMO, since there
are no other uses of "val".

> +  auto insn_code = maybe_code_for_aarch64_bitmask_udiv3 (TYPE_MODE 
> (vectype));
> +  /* SVE actually has a div operator, we may have gotten here through
> + that route.  */
> +  if (pow != (int)(element_precision (vectype) / 2)

Formatting nit: should be a space after "(int)".

OK with those changes, thanks.

Richard

> +  || insn_code == CODE_FOR_nothing)
> +return false;
> +
> +  /* We can use the optimized pattern.  */
>if (in0 == NULL_RTX && in1 == NULL_RTX)
> -{
> -  wide_int val = wi::add (cst, 1);
> -  int pow = wi::exact_log2 (val);
> -  ret

RE: [PATCH]AArch64 sve2: Fix expansion of division [PR107830]

2022-11-24 Thread Tamar Christina via Gcc-patches
> -Original Message-
> From: Richard Sandiford 
> Sent: Wednesday, November 23, 2022 4:18 PM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
> ; Marcus Shawcroft
> ; Kyrylo Tkachov 
> Subject: Re: [PATCH]AArch64 sve2: Fix expansion of division [PR107830]
> 
> Tamar Christina  writes:
> > Hi All,
> >
> > SVE has an actual division optab, and when using -Os we don't optimize
> > the division away.  This means that we need to distinguish between a
> > div which we can optimize and one we cannot even during expansion.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > PR target/107830
> > * config/aarch64/aarch64.cc
> > (aarch64_vectorize_can_special_div_by_constant): Check validity
> during
> > codegen phase as well.
> >
> > gcc/testsuite/ChangeLog:
> >
> > PR target/107830
> > * gcc.target/aarch64/sve2/pr107830.c: New test.
> >
> > --- inline copy of patch --
> > diff --git a/gcc/config/aarch64/aarch64.cc
> > b/gcc/config/aarch64/aarch64.cc index
> >
> 4176d7b046a126664360596b6db79a43e77ff76a..bee23625807af95d5ec15ad45
> 702
> > 961b2d7ab55d 100644
> > --- a/gcc/config/aarch64/aarch64.cc
> > +++ b/gcc/config/aarch64/aarch64.cc
> > @@ -24322,12 +24322,15 @@
> aarch64_vectorize_can_special_div_by_constant (enum tree_code code,
> >if ((flags & VEC_ANY_SVE) && !TARGET_SVE2)
> >  return false;
> >
> > +  wide_int val = wi::add (cst, 1);
> > +  int pow = wi::exact_log2 (val);
> > +  bool valid_p = pow == (int)(element_precision (vectype) / 2);
> > +  /* SVE actually has a div operator, we we may have gotten here through
> > + that route.  */
> >if (in0 == NULL_RTX && in1 == NULL_RTX)
> > -{
> > -  wide_int val = wi::add (cst, 1);
> > -  int pow = wi::exact_log2 (val);
> > -  return pow == (int)(element_precision (vectype) / 2);
> > -}
> > +return valid_p;
> > +  else if (!valid_p)
> > +return false;
> 
> Is this equivalent to:
> 
>   int pow = wi::exact_log2 (cst + 1);
>   if (pow != (int) (element_precision (vectype) / 2))
> return false;
> 
>   /* We can use the optimized pattern.  */
>   if (in0 == NULL_RTX && in1 == NULL_RTX)
> return true;
> 
> ?  If so, I'd find that slightly easier to follow, but I realise it's 
> personal taste.
> OK with that change if it works and you agree.
> 
> While looking at this, I noticed that we ICE for:
> 
>   void f(unsigned short *restrict p1, unsigned int *restrict p2)
>   {
> for (int i = 0; i < 16; ++i)
>   {
> p1[i] /= 0xff;
> p2[i] += 1;
>   }
>   }
> 
> for -march=armv8-a+sve2 -msve-vector-bits=512.  I guess we need to filter
> out partial modes or (better) add support for them.  Adding support for them
> probably requires changes to the underlying ADDHNB pattern.

I've prevented the ice by checking if the expansion for the mode exists. I'd 
like to
defer adding partial support because when I tried I had to modify some iterators
as well and need to check that it's safe to do so.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

PR target/107830
* config/aarch64/aarch64.cc
(aarch64_vectorize_can_special_div_by_constant): Check validity during
codegen phase as well.

gcc/testsuite/ChangeLog:

PR target/107830
* gcc.target/aarch64/sve2/pr107830-1.c: New test.
* gcc.target/aarch64/sve2/pr107830-2.c: New test.

--- inline copy of patch 

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 
4176d7b046a126664360596b6db79a43e77ff76a..02aa1f34ac6155b877340d788c6d151b7c8d8bcd
 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -24322,12 +24322,18 @@ aarch64_vectorize_can_special_div_by_constant (enum 
tree_code code,
   if ((flags & VEC_ANY_SVE) && !TARGET_SVE2)
 return false;
 
+  wide_int val = wi::add (cst, 1);
+  int pow = wi::exact_log2 (val);
+  auto insn_code = maybe_code_for_aarch64_bitmask_udiv3 (TYPE_MODE (vectype));
+  /* SVE actually has a div operator, we may have gotten here through
+ that route.  */
+  if (pow != (int)(element_precision (vectype) / 2)
+  || insn_code == CODE_FOR_nothing)
+return false;
+
+  /* We can use the optimized pattern.  */
   if (in0 == NULL_RTX && in1 == NULL_RTX)
-{
-  wide_int val = wi::add (cst, 1);
-  int pow = wi::exact_log2 (val);
-  return pow == (int)(element_precision (vectype) / 2);
-}
+return true;
 
   if (!VECTOR_TYPE_P (vectype))
return false;
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pr107830-1.c 
b/gcc/testsuite/gcc.target/aarch64/sve2/pr107830-1.c
new file mode 100644
index 
..6d8ee3615fdb0083dbde1e45a2826fb681726139
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/pr107830-1.c
@@ -0,0 +1,13 @@
+/* { dg-do c

[Patch] libgomp: Add no-target-region rev offload test + fix plugin-nvptx

2022-11-24 Thread Tobias Burnus

The nvptx reverse-offload code mishandled the case where a reverse-offload
function is not called inside any target region. In that case, the linker
does not include GOMP_target_ext and the global variable it uses, but the
plugin-nvptx.c code expected the latter to be present.

Found via sollve_vv's tests/5.0/requires/test_requires_reverse_offload.c which 
is
similar to the new testcase. (Albeit the 'if' and comments imply that the 
sollve_vv
author did not intend this.)

Solution: handle the case that the global variable does not exist
gracefully; do this check first, and only allocate dev->rev_data when it
succeeds. If not, deallocate rev_fn_table to disable reverse-offload
handling.
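
For readers, a minimal OpenMP example (mine, not the new
reverse-offload-2.c testcase) of the situation described above, a
reverse-offload region in pure host code with no enclosing target region:

/* Compile with offloading enabled; since there is no surrounding
   'target', the ancestor device is the host and no on-device
   GOMP_target_ext call is ever linked in.  */
#include <stdio.h>

#pragma omp requires reverse_offload

int
main (void)
{
  #pragma omp target device (ancestor : 1)
    printf ("runs on the host\n");
  return 0;
}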

OK for mainline?

Tobias

PS: Admittedly, the nvptx code is not yet exercised, as I still have to
submit the libgomp/target.c code handling the reverse offload (plus
enabling requires reverse_offload in plugin-nvptx.c). As is obvious from
this patch, the target.c patch is nearly but not yet completely ready.
That patch passes the three sollve_vv testcases and also the existing
libgomp testcases, but some corner cases and more testcases are still
missing.

libgomp: Add no-target-region rev offload test + fix plugin-nvptx

OpenMP permits that a 'target device(ancestor:1)' is called without being
enclosed in a target region - using the current device (i.e. the host) in
that case.  This commit adds a testcase for this.

In case of nvptx, the missing on-device 'GOMP_target_ext' call causes that
it and also the associated on-device GOMP_REV_OFFLOAD_VAR variable are not
linked in from nvptx's libgomp.a. Thus, handle the failing cuModuleGetGlobal
gracefully by disabling reverse offload and assuming that the failure is fine.

libgomp/ChangeLog:

	* plugin/plugin-nvptx.c (GOMP_OFFLOAD_load_image): Use unsigned int
	for 'i' to match 'fn_entries'; regard absent GOMP_REV_OFFLOAD_VAR
	as valid and the code having no reverse-offload code.
	* testsuite/libgomp.c-c++-common/reverse-offload-2.c: New test.

 libgomp/plugin/plugin-nvptx.c  | 36 ++--
 .../libgomp.c-c++-common/reverse-offload-2.c   | 49 ++
 2 files changed, 73 insertions(+), 12 deletions(-)

diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 0768fca350b..e803f083591 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -1390,7 +1390,8 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
   else if (rev_fn_table)
 {
   CUdeviceptr var;
-  size_t bytes, i;
+  size_t bytes;
+  unsigned int i;
   r = CUDA_CALL_NOCHECK (cuModuleGetGlobal, &var, &bytes, module,
 			 "$offload_func_table");
   if (r != CUDA_SUCCESS)
@@ -1413,12 +1414,11 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
 
   if (rev_fn_table && *rev_fn_table && dev->rev_data == NULL)
 {
-  /* cuMemHostAlloc memory is accessible on the device, if unified-shared
-	 address is supported; this is assumed - see comment in
-	 nvptx_open_device for CU_DEVICE_ATTRIBUTE_UNIFIED_ADDRESSING.   */
-  CUDA_CALL_ASSERT (cuMemHostAlloc, (void **) &dev->rev_data,
-			sizeof (*dev->rev_data), CU_MEMHOSTALLOC_DEVICEMAP);
-  CUdeviceptr dp = (CUdeviceptr) dev->rev_data;
+  /* Get the on-device GOMP_REV_OFFLOAD_VAR variable.  It should be
+	 available but it might be not.  One reason could be: if the user code
+	 has 'omp target device(ancestor:1)' in pure hostcode, GOMP_target_ext
+	 is not called on the device and, hence, it and GOMP_REV_OFFLOAD_VAR
+	 are not linked in.  */
   CUdeviceptr device_rev_offload_var;
   size_t device_rev_offload_size;
   CUresult r = CUDA_CALL_NOCHECK (cuModuleGetGlobal,
@@ -1426,11 +1426,23 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
   &device_rev_offload_size, module,
   XSTRING (GOMP_REV_OFFLOAD_VAR));
   if (r != CUDA_SUCCESS)
-	GOMP_PLUGIN_fatal ("cuModuleGetGlobal error - GOMP_REV_OFFLOAD_VAR: %s", cuda_error (r));
-  r = CUDA_CALL_NOCHECK (cuMemcpyHtoD, device_rev_offload_var, &dp,
-			 sizeof (dp));
-  if (r != CUDA_SUCCESS)
-	GOMP_PLUGIN_fatal ("cuMemcpyHtoD error: %s", cuda_error (r));
+	{
+	  free (*rev_fn_table);
+	  *rev_fn_table = NULL;
+	}
+  else
+	{
+	  /* cuMemHostAlloc memory is accessible on the device, if
+	 unified-shared address is supported; this is assumed - see comment
+	 in nvptx_open_device for CU_DEVICE_ATTRIBUTE_UNIFIED_ADDRESSING. */
+	  CUDA_CALL_ASSERT (cuMemHostAlloc, (void **) &dev->rev_data,
+			sizeof (*dev->rev_data), CU_MEMHOSTALLOC_DEVICEMAP);
+	  CUdeviceptr dp = (CUdeviceptr) dev->rev

Re: [Patch Arm] Fix PR 92999

2022-11-24 Thread Richard Earnshaw via Gcc-patches




On 11/11/2022 21:50, Ramana Radhakrishnan via Gcc-patches wrote:

On Thu, Nov 10, 2022 at 7:46 PM Ramana Radhakrishnan
 wrote:


On Thu, Nov 10, 2022 at 6:03 PM Richard Earnshaw
 wrote:




On 10/11/2022 17:21, Richard Earnshaw via Gcc-patches wrote:



On 08/11/2022 18:20, Ramana Radhakrishnan via Gcc-patches wrote:

PR92999 is a case where the VFP calling convention does not allocate
enough FP registers for a homogeneous aggregate containing FP16 values.
I believe this is the complete fix but would appreciate another set of
eyes on this.

Could I get a hand with a regression test run on an armhf environment
while I fix my environment?

gcc/ChangeLog:

PR target/92999
*  config/arm/arm.c (aapcs_vfp_allocate_return_reg): Adjust to handle
aggregates with elements smaller than SFmode.

gcc/testsuite/ChangeLog:

* gcc.target/arm/pr92999.c: New test.


Thanks,
Ramana

Signed-off-by: Ramana Radhakrishnan 


I'm not sure about this.  The AAPCS does not mention a base type of a
half-precision FP type as an appropriate homogeneous aggregate for using
VFP registers for either calling or returning.


Ooh interesting, thanks for taking a look and poking at the AAPCS and
that's a good catch. BF16 should also have the same behaviour as FP16,
I suspect?


I suspect I got caught out by the definition of the Homogeneous
Aggregate from Section 5.3.5
(https://github.com/ARM-software/abi-aa/blob/2982a9f3b512a5bfdc9e3fea5d3b298f9165c36b/aapcs32/aapcs32.rst#homogeneous-aggregates),
which simply says it's an aggregate of fundamental types, a list that
includes half-precision floating point.


A homogeneous aggregate is any aggregate that fits the general 
definition, but only HAs of specific types are of interest for the VFP 
PCS rules.


The problem we have is that when we added HFmode (and later BF16mode) 
support we didn't notice that the base types are VFP candidates, but the 
nested types (eg in records or arrays) are not.
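
A minimal illustration (mine, not the PR testcase), for an Arm hard-float
target with __fp16 enabled:

/* A single __fp16 argument is a fundamental type and may be passed in a
   VFP register, but an aggregate of __fp16 is not one of the homogeneous
   aggregates the VFP PCS recognises, unlike an aggregate of float.  */
struct hfa_f16 { __fp16 a, b, c, d; };
struct hfa_f32 { float a, b, c, d; };

__fp16 scalar_ok (__fp16 x) { return x; }
struct hfa_f16 agg_not_vfp (struct hfa_f16 x) { return x; }
struct hfa_f32 agg_vfp (struct hfa_f32 x) { return x; }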


The problems started around SVN r236269 (git:1b81a1c1bd53) when we added 
FP16 support.





FTR, ideally I should have read 7.1.2.1
https://github.com/ARM-software/abi-aa/blob/2982a9f3b512a5bfdc9e3fea5d3b298f9165c36b/aapcs32/aapcs32.rst#procedure-calling)
:)







So perhaps the bug is that we try to treat this as a homogeneous
aggregate at all.


Yep I agree - I'll take a look again tomorrow and see if I can get a fix.

(And thanks Alex for the test run, I might trouble you again while I
still (slowly) get some of my boards back up)



and as promised take 2. I'd really prefer another review on this one
to see if I've not missed anything in the cases below.


I think I'd prefer to try and fix this at the point where we accept the 
base types, ie around:


case REAL_TYPE:
  mode = TYPE_MODE (type);
  if (mode != DFmode && mode != SFmode && mode != HFmode && mode != 
BFmode)

return -1;

by changing this to something like

/* HFmode and BFmode can be passed in registers, but are not valid
   base types for an HFA, so only accept these if we are at the top
   level.  */
if (!(mode == DFmode || mode == SFmode
      || (depth == 0
          && (mode == HFmode || mode == BFmode))))
  return -1;

and we then pass depth into the recursion calls as an extra parameter, 
starting at 0 for the top level and incrementing it by 1 each time 
aapcs_vfp_sub_candidate recurses.


For the test, would it be possible to rewrite it in the style of 
gcc.target/arm/aapcs/* and put it there? That would ensure that not only 
are the caller and callee compatible, but that the values are passed in 
the correct location.


R.



RE: [PATCH 35/35 V2] arm: improve tests for vsetq_lane*

2022-11-24 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Andrea Corallo 
> Sent: Thursday, November 24, 2022 2:44 PM
> To: Kyrylo Tkachov 
> Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw
> 
> Subject: [PATCH 35/35 V2] arm: improve tests for vsetq_lane*
> 
> Kyrylo Tkachov  writes:
> 
> [...]
> 
> >> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c
> >> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c
> >> index e03e9620528..b5c9f4d5eb8 100644
> >> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c
> >> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c
> >> @@ -1,15 +1,45 @@
> >> -/* { dg-skip-if "Incompatible float ABI" { *-*-* } { "-mfloat-abi=soft" } 
> >> {""} }
> */
> >>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
> >>  /* { dg-add-options arm_v8_1m_mve_fp } */
> >>  /* { dg-additional-options "-O2" } */
> >> +/* { dg-final { check-function-bodies "**" "" } } */
> >>
> >>  #include "arm_mve.h"
> >>
> >> +/*
> >> +**foo:
> >> +**...
> >> +**vmov.16 q[0-9]+\[[0-9]+\], (?:ip|fp|r[0-9]+)(?: @.*|)
> >> +**...
> >> +*/
> >>  float16x8_t
> >>  foo (float16_t a, float16x8_t b)
> >>  {
> >> -return vsetq_lane_f16 (a, b, 0);
> >> +  return vsetq_lane_f16 (a, b, 1);
> >>  }
> >>
> >
> > Hmm, for these tests we should be able to scan for more specific codegen
> as we're setting individual lanes, so we should be able to scan for lane 1 in
> the vmov instruction, though it may need to be flipped for big-endian.
> > Thanks,
> > Kyrill
> 
> Hi Kyrill,
> 
> please find attached the updated version of this patch.
> 
> Big-endian should not be a problem, as to my understanding it is just not
> supported with MVE intrinsics.

Huh, that's right.
This version is ok.
Thanks!
Kyrill

> 
> Thanks!
> 
>   Andrea



Re: [PATCH] Make Warray-bounds alias to Warray-bounds= [PR107787]

2022-11-24 Thread Iskander Shakirzyanov via Gcc-patches
>> How did you test the patch? If you bootstrapped it and ran the
>> testsuite then it's OK.
Yes, I ran the testsuite and bootstrapped and everything seemed OK, but I
missed the failures of gcc.dg/Warray-bounds-34.c and
gcc.dg/Warray-bounds-43.c, so Franz is right. After that I fixed the
regexps in the dg directives and now everything seems OK.

> I'm pretty sure the testsuite will have regressions, as I have a very similar 
> patch lying around that needs these testsuite changes.

You are right, thank you. I missed this; I am attaching a corrected version of the patch.

> This also shows nicely why I don't like warnings with levels, what if I want 
> -Werror=array-bounds=2 + -Warray-bounds=1?

I completely agree with you, because I also thought about using -Werror=opt=X + 
-Wopt=Y; this functionality looks useful. As far as I know, gcc, while parsing an 
option with the same OPT, overwrites the old config of OPT. 

> Because I think at least -Wuse-after-free= and Wattributes= have the same 
> problem.

Yes, it looks like it; those probably should be fixed too.  

> BTW, is the duplicated warning description "Warn if an array is accessed out 
> of bounds." needed or not with Alias()?

According to other examples in common.opt, the duplicated description is not 
necessary; you are right.

> I've attached my patch, feel free to integrate the testsuite changes.

Thanks, but it seems to me that duplicating the existing tests would be redundant 
for testing the functionality of -Werror=array-bounds=X.
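
For illustration, a minimal reproducer of the behaviour under discussion could
look like this (hypothetical sketch; the actual gcc.dg/pr107787.c test in the
attached patch may differ):

  /* Hypothetical sketch, not the testcase from the patch.  */
  /* { dg-do compile } */
  /* { dg-options "-O2 -Werror=array-bounds=2" } */

  int a[4];

  int
  f (void)
  {
    return a[5]; /* should now be reported as an error, not a plain warning */
  }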


From bf047e36392dab138db10be2ec257d08c376ada5 Mon Sep 17 00:00:00 2001
From: Iskander Shakirzyanov 
Date: Thu, 24 Nov 2022 14:26:59 +
Subject: [PATCH] Make Warray-bounds alias to Warray-bounds= [PR107787]

According to documentation the -Werror= option makes the specified warning
into an error and also automatically implies this option. Then it seems that
the behavior of the compiler when specifying -Werror=array-bounds=X should be
the same as specifying "-Werror=array-bounds -Warray-bounds=X", so we expect to
receive array-bounds pass triggers and they must be processed as errors.
In practice, we observe that the array-bounds pass is indeed called, but
its responses are processed as warnings, not errors.
As I understand, this happens because Warray-bounds and Warray-bounds= are
declared as 2 different options in common.opt, so when
diagnostic_classify_diagnostic() is called, DK_ERROR is set for
the Warray-bounds= option, but in diagnostic_report_diagnostic() through
warning_at() passes opt_index of Warray-bounds, so information about
DK_ERROR is lost. Fixed by using Alias() in declaration of
Warray-bounds (similarly as in Wattribute-alias etc.)

PR driver/107787

Co-authored-by: Franz Sirl 

gcc/ChangeLog:

* common.opt (Warray-bounds): Turn into alias to
-Warray-bounds=1.
* builtins.cc (warn_array_bounds): Use OPT_Warray_bounds_
instead of OPT_Warray_bounds.
* diagnostic-spec.cc: Likewise.
* gimple-array-bounds.cc: Likewise.
* gimple-ssa-warn-restrict.cc: Likewise.

gcc/testsuite/ChangeLog:

* gcc.dg/Warray-bounds-34.c: Correct the regular
expression for -Warray-bounds=.
* gcc.dg/Warray-bounds-43.c: Likewise.
* gcc.dg/pr107787.c: New test.

gcc/c-family/ChangeLog:

* c-common.cc (warn_array_bounds): Use OPT_Warray_bounds_
instead of OPT_Warray_bounds.
---
 gcc/builtins.cc |  6 ++--
 gcc/c-family/c-common.cc|  4 +--
 gcc/common.opt  |  3 +-
 gcc/diagnostic-spec.cc  |  1 -
 gcc/gimple-array-bounds.cc  | 38 -
 gcc/gimple-ssa-warn-restrict.cc |  2 +-
 gcc/testsuite/gcc.dg/Warray-bounds-34.c |  2 +-
 gcc/testsuite/gcc.dg/Warray-bounds-43.c |  6 ++--
 gcc/testsuite/gcc.dg/pr107787.c | 13 +
 9 files changed, 43 insertions(+), 32 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr107787.c

diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index 4dc1ca672b2..02c4fefa86f 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -696,14 +696,14 @@ c_strlen (tree arg, int only_value, c_strlen_data *data, 
unsigned eltsize)
 {
   /* Suppress multiple warnings for propagated constant strings.  */
   if (only_value != 2
- && !warning_suppressed_p (arg, OPT_Warray_bounds)
- && warning_at (loc, OPT_Warray_bounds,
+ && !warning_suppressed_p (arg, OPT_Warray_bounds_)
+ && warning_at (loc, OPT_Warray_bounds_,
 "offset %qwi outside bounds of constant string",
 eltoff))
{
  if (decl)
inform (DECL_SOURCE_LOCATION (decl), "%qE declared here", decl);
- suppress_warning (arg, OPT_Warray_bounds);
+ suppress_warning (arg, OPT_Warray_bounds_);
}
   return NULL_TREE;
 }
diff --git a/gcc/c-family/c-common.cc b/gcc/c-family/c-common.cc
index 6f1f21bc4c1..b0da6886ccf 100644
--- a/gcc/c-family/c-com

[PATCH 35/35 V2] arm: improve tests for vsetq_lane*

2022-11-24 Thread Andrea Corallo via Gcc-patches
Kyrylo Tkachov  writes:

[...]

>> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c
>> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c
>> index e03e9620528..b5c9f4d5eb8 100644
>> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c
>> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c
>> @@ -1,15 +1,45 @@
>> -/* { dg-skip-if "Incompatible float ABI" { *-*-* } { "-mfloat-abi=soft" } 
>> {""} } */
>>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>>  /* { dg-add-options arm_v8_1m_mve_fp } */
>>  /* { dg-additional-options "-O2" } */
>> +/* { dg-final { check-function-bodies "**" "" } } */
>> 
>>  #include "arm_mve.h"
>> 
>> +/*
>> +**foo:
>> +**  ...
>> +**  vmov.16 q[0-9]+\[[0-9]+\], (?:ip|fp|r[0-9]+)(?: @.*|)
>> +**  ...
>> +*/
>>  float16x8_t
>>  foo (float16_t a, float16x8_t b)
>>  {
>> -return vsetq_lane_f16 (a, b, 0);
>> +  return vsetq_lane_f16 (a, b, 1);
>>  }
>> 
>
> Hmm, for these tests we should be able to scan for more specific codegen as 
> we're setting individual lanes, so we should be able to scan for lane 1 in 
> the vmov instruction, though it may need to be flipped for big-endian.
> Thanks,
> Kyrill

Hi Kyrill,

please find attached the updated version of this patch.

Big-endian should not be a problem as, to my understanding, it is just not
supported with MVE intrinsics.

Thanks!

  Andrea

>From 79f2c990553a1f793e08b9a0c4abb7dae8de7120 Mon Sep 17 00:00:00 2001
From: Andrea Corallo 
Date: Thu, 17 Nov 2022 11:06:29 +0100
Subject: [PATCH] arm: improve tests for vsetq_lane*

gcc/testsuite/ChangeLog:

* gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c: Improve test.
* gcc.target/arm/mve/intrinsics/vsetq_lane_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vsetq_lane_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vsetq_lane_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vsetq_lane_s64.c: Likewise.
* gcc.target/arm/mve/intrinsics/vsetq_lane_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vsetq_lane_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vsetq_lane_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vsetq_lane_u64.c: Likewise.
* gcc.target/arm/mve/intrinsics/vsetq_lane_u8.c: Likewise.
---
 .../arm/mve/intrinsics/vsetq_lane_f16.c   | 36 +++--
 .../arm/mve/intrinsics/vsetq_lane_f32.c   | 36 +++--
 .../arm/mve/intrinsics/vsetq_lane_s16.c   | 24 ++--
 .../arm/mve/intrinsics/vsetq_lane_s32.c   | 24 ++--
 .../arm/mve/intrinsics/vsetq_lane_s64.c   | 27 ++---
 .../arm/mve/intrinsics/vsetq_lane_s8.c| 24 ++--
 .../arm/mve/intrinsics/vsetq_lane_u16.c   | 36 +++--
 .../arm/mve/intrinsics/vsetq_lane_u32.c   | 36 +++--
 .../arm/mve/intrinsics/vsetq_lane_u64.c   | 39 ---
 .../arm/mve/intrinsics/vsetq_lane_u8.c| 36 +++--
 10 files changed, 284 insertions(+), 34 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c 
b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c
index e03e9620528..6b148a4b03d 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c
@@ -1,15 +1,45 @@
-/* { dg-skip-if "Incompatible float ABI" { *-*-* } { "-mfloat-abi=soft" } {""} 
} */
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+** ...
+** vmov.16 q[0-9]+\[1\], (?:ip|fp|r[0-9]+)(?:  @.*|)
+** ...
+*/
 float16x8_t
 foo (float16_t a, float16x8_t b)
 {
-return vsetq_lane_f16 (a, b, 0);
+  return vsetq_lane_f16 (a, b, 1);
 }
 
-/* { dg-final { scan-assembler "vmov.16"  }  } */
 
+/*
+**foo1:
+** ...
+** vmov.16 q[0-9]+\[1\], (?:ip|fp|r[0-9]+)(?:  @.*|)
+** ...
+*/
+float16x8_t
+foo1 (float16_t a, float16x8_t b)
+{
+  return vsetq_lane (a, b, 1);
+}
+
+/*
+**foo2:
+** ...
+** vmov.16 q[0-9]+\[1\], (?:ip|fp|r[0-9]+)(?:  @.*|)
+** ...
+*/
+float16x8_t
+foo2 (float16x8_t b)
+{
+  return vsetq_lane (1.1, b, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f32.c 
b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f32.c
index 2b9f1a7e627..e4e7f892e97 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f32.c
@@ -1,15 +1,45 @@
-/* { dg-skip-if "Incompatible float ABI" { *-*-* } { "-mfloat-abi=soft" } {""} 
} */
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check

Re: [PATCH v2 16/19] modula2 front end: bootstrap and documentation tools

2022-11-24 Thread Gaius Mulley via Gcc-patches
Martin Liška  writes:

> On 11/8/22 14:22, Gaius Mulley wrote:
>> Martin Liška  writes:
>> 
>> should be good - I'll complete the rst output in the scripts,
>
> Hi.
>

Hi Martin,

> As you probably noticed, the Sphinx migration didn't go well.

Yes, sorry to see this didn't happen.  Thank you for your hard work and
I hope it can occur in the future.

> However, it's still up to you if you want to use it or not for Modula
> 2.

Once modula-2 is in master I'd like to revisit rst in devel/modula-2
along with analyzer patches and m2 generics.  If successful then submit
patches in early stage 1.

> We have manuals like libgccjit, or Ada manuals
> that use RST natively and provide exported .texi files.

Ok, thanks for the pointers, I will experiment with these build runes.

> Cheers and sorry for the troubles I caused.

No problem at all - the modula-2 scripts are now improved and cleaner
due to the port.  Hopefully rst will happen sometime in the future,

regards,
Gaius


Re: [Patch] OpenMP, libgomp, gimple: omp_get_max_teams, omp_set_num_teams, and omp_{gs}et_teams_thread_limit on offload devices

2022-11-24 Thread Marcel Vollweiler

Hi Jakub,


> * testsuite/libgomp.c-c++-common/icv-4.c: Bugfix.

Better say what exactly you changed in words.


Changed.


> --- a/gcc/gimplify.cc
> +++ b/gcc/gimplify.cc
> @@ -14153,7 +14153,7 @@ optimize_target_teams (tree target, gimple_seq
*pre_p)
>struct gimplify_omp_ctx *target_ctx = gimplify_omp_ctxp;
>
>if (teams == NULL_TREE)
> -num_teams_upper = integer_one_node;
> +num_teams_upper = build_int_cst (integer_type_node, -2);
>else
>  for (c = OMP_TEAMS_CLAUSES (teams); c; c = OMP_CLAUSE_CHAIN (c))
>{

The function comment above optimize_target_teams contains detailed description
on what the values mean and why, so it definitely should document what -2 means
and when it is used.
I know you have documentation in libgomp for it, but it should be in both 
places.


I updated the comment with an explanation for "-2".



> +  intptr_t new_teams = orig_teams, new_threads = orig_threads;
> +  /* ORIG_TEAMS == -2: No explicit teams construct specified. Set to 1.

Two spaces after .


Corrected here and at other places.



> + ORIG_TEAMS == -1: TEAMS construct with NUM_TEAMS clause specified, but
the
> +  value could not be specified. No Change.

Likewise.
lowercase change ?


Corrected.



> + ORIG_TEAMS == 0: TEAMS construct without NUM_TEAMS clause.
> + Set device-specific value.
> + ORIG_TEAMS > 0: Value was already set through e.g. NUM_TEAMS clause.
> +No change.  */
> +  if (orig_teams == -2)
> +new_teams = 1;
> +  else if (orig_teams == 0)
> +{
> +  struct gomp_offload_icv_list *item = gomp_get_offload_icv_item 
(device);
> +  if (item != NULL)
> +   new_teams = item->icvs.nteams;
> +}
> +  /* The device-specific teams-thread-limit is only set if (a) an explicit 
TEAMS
> + region exists, i.e. ORIG_TEAMS > -2, and (b) THREADS was not already 
set by
> + e.g. a THREAD_LIMIT clause.  */
> +  if (orig_teams >= -2 && orig_threads == 0)

The comment talks about ORIG_TEAMS > -2, but the condition is >= -2.
So which one is it?


Thanks for the hint. It should indeed be "> -2" since teams_thread_limit "sets
the maximum number of OpenMP threads to use in each contention group created by
a teams construct" (OpenMP 5.2, section 21.6.2). So if there is no (explicit)
teams construct, then teams_thread_limit doesn't need to be copied to the 
device.



> +  /* This tests a large number of teams and threads. If it is larger than
> +2^15+1 then the according argument in the kernels arguments list
> +is encoded with two items instead of one. On NVIDIA there is an
> +adjustment for too large teams and threads. For AMD such adjustment
> +exists only for threads and will cause runtime errors with a two
> +large

s/two/too/ ?
Shouldn't amdgcn adjusts also number of teams?


I now also adjusted the number of teams in the amdgcn plugin. As the upper bound I
chose two times the number of compute units. This seems to be sufficient when
one team is executed on one compute unit. This at least avoids queueing a
large number of teams and the corresponding memory allocation.

The drawback is that a user is probably not aware of the actual number of
compute units (which is not very large on gfx cards, e.g. 120 for gfx908 and 104
for gfx90a) and thus may expect different values from omp_get_team_num(). For
instance, in something like the following:

#pragma omp target
#pragma omp teams num_teams(220)
#pragma omp distribute parallel for
  for(int i = 0; i < 220; ++i)
{
#pragma omp critical
   ... omp_get_team_num () ...
}

On a gfx90a card with 104 compute units, 12 threads are assigned to "reused"
teams (instead of having their own teams), which would not be the case without
the limit.

Alternatively, we could just define some (larger) constant number (though I
don't know a reasonable value here). But this does not actually solve the
above-mentioned drawback. I think we need to find a compromise between an
unnecessarily small upper bound and the chance of memory allocation failures
due to a too large number of teams.
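
For reference, a minimal sketch of the device-side usage this patch enables
(illustrative only; it just uses the OpenMP 5.1 API routines named in the
subject and is not one of the tests from the patch):

  /* Illustrative sketch: set and query the device-side ICVs inside a
     target region.  */
  #include <omp.h>
  #include <stdio.h>

  int
  main (void)
  {
  #pragma omp target
    {
      omp_set_num_teams (8);
      omp_set_teams_thread_limit (32);
      printf ("max teams: %d, teams thread limit: %d\n",
	      omp_get_max_teams (), omp_get_teams_thread_limit ());
    }
    return 0;
  }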



As for testcases, have you tested this in a native setup where 
dg-set-target-env-var
actually works?


Besides remote testing with offloading (which does not yet work with
dg-set-target-env-var), I also tested locally on x86_64-pc-linux-gnu with one
nvptx offload device without issues (using "make check" and verifying that
offloading is indeed used).

Marcel
This patch adds support for omp_get_max_teams, omp_set_num_teams, and
omp_{gs}et_teams_thread_limit on offload devices. That includes the usage of
device-specific ICV values (specified as environment variables or changed on a
device). In order t

Re: [Patch Arm] Fix PR 92999

2022-11-24 Thread Ramana Radhakrishnan
Ping x 2

Ramana

On Thu, 17 Nov 2022, 20:15 Ramana Radhakrishnan, 
wrote:

> On Fri, Nov 11, 2022 at 9:50 PM Ramana Radhakrishnan
>  wrote:
> >
> > On Thu, Nov 10, 2022 at 7:46 PM Ramana Radhakrishnan
> >  wrote:
> > >
> > > On Thu, Nov 10, 2022 at 6:03 PM Richard Earnshaw
> > >  wrote:
> > > >
> > > >
> > > >
> > > > On 10/11/2022 17:21, Richard Earnshaw via Gcc-patches wrote:
> > > > >
> > > > >
> > > > > On 08/11/2022 18:20, Ramana Radhakrishnan via Gcc-patches wrote:
> > > > >> PR92999 is a case where the VFP calling convention does not
> allocate
> > > > >> enough FP registers for a homogenous aggregate containing FP16
> values.
> > > > >> I believe this is the complete fix but would appreciate another
> set of
> > > > >> eyes on this.
> > > > >>
> > > > >> Could I get a hand with a regression test run on an armhf
> environment
> > > > >> while I fix my environment ?
> > > > >>
> > > > >> gcc/ChangeLog:
> > > > >>
> > > > >> PR target/92999
> > > > >> *  config/arm/arm.c (aapcs_vfp_allocate_return_reg): Adjust to
> handle
> > > > >> aggregates with elements smaller than SFmode.
> > > > >>
> > > > >> gcc/testsuite/ChangeLog:
> > > > >>
> > > > >> * gcc.target/arm/pr92999.c: New test.
> > > > >>
> > > > >>
> > > > >> Thanks,
> > > > >> Ramana
> > > > >>
> > > > >> Signed-off-by: Ramana Radhakrishnan 
> > > > >
> > > > > I'm not sure about this.  The AAPCS does not mention a base type
> of a
> > > > > half-precision FP type as an appropriate homogeneous aggregate for
> using
> > > > > VFP registers for either calling or returning.
> > >
> > > Ooh interesting, thanks for taking a look and poking at the AAPCS and
> > > that's a good catch. BF16 should also have the same behaviour as FP16
> > > , I suspect ?
> >
> > I suspect I got caught out by the definition of the Homogenous
> > aggregate from Section 5.3.5
> > ((
> https://github.com/ARM-software/abi-aa/blob/2982a9f3b512a5bfdc9e3fea5d3b298f9165c36b/aapcs32/aapcs32.rst#homogeneous-aggregates
> )
> > which simply suggests it's an aggregate of fundamental types which
> > lists half precision floating point .
> >
> > FTR, ideally I should have read 7.1.2.1
> >
> https://github.com/ARM-software/abi-aa/blob/2982a9f3b512a5bfdc9e3fea5d3b298f9165c36b/aapcs32/aapcs32.rst#procedure-calling
> )
> > :)
> >
> >
> >
> > >
> > > > >
> > > > > So perhaps the bug is that we try to treat this as a homogeneous
> > > > > aggregate at all.
> > >
> > > Yep I agree - I'll take a look again tomorrow and see if I can get a
> fix.
> > >
> > > (And thanks Alex for the test run, I might trouble you again while I
> > > still (slowly) get some of my boards back up)
> >
> >
> > and as promised take 2. I'd really prefer another review on this one
> > to see if I've not missed anything in the cases below.
>
> Ping  ?
>
> Ramana
>
> >
> > regards
> > Ramana
> >
> >
> > >
> > > regards,
> > > Ramana
> > >
> > >
> > > >
> > > > R.
>


Re: [PATCH][AArch64] Cleanup move immediate code

2022-11-24 Thread Richard Sandiford via Gcc-patches
Sorry for the very long delay in reviewing this.

Wilco Dijkstra  writes:
> Hi Richard,
>
> Here is the immediate cleanup splitoff from the previous patch:
>
> Simplify, refactor and improve various move immediate functions.
> Allow 32-bit MOVZ/N as a valid 64-bit immediate which removes special
> cases in aarch64_internal_mov_immediate.  Add new constraint so the movdi
> pattern only needs a single alternative for move immediate.

Just to make sure I understand: isn't it really just MOVN?  I would have
expected a 32-bit MOVZ to be equivalent to (and add no capabilities over)
a 64-bit MOVZ.

> Passes bootstrap and regress, OK for commit?
>
> gcc/ChangeLog:
>
> * config/aarch64/aarch64.cc (aarch64_bitmask_imm): Use unsigned type.
> (aarch64_zeroextended_move_imm): New function.
> (aarch64_move_imm): Refactor, assert mode is SImode or DImode.
> (aarch64_internal_mov_immediate): Assert mode is SImode or DImode.
> Simplify special cases.
> (aarch64_uimm12_shift): Simplify code.
> (aarch64_clamp_to_uimm12_shift): Likewise.
> (aarch64_movw_imm): Remove.
> (aarch64_float_const_rtx_p): Pass either SImode or DImode to
> aarch64_internal_mov_immediate.
> (aarch64_rtx_costs): Likewise.
> * config/aarch64/aarch64.md (movdi_aarch64): Merge 'N' and 'M'
> constraints into single 'O'.
> (mov_aarch64): Likewise.
> * config/aarch64/aarch64-protos.h (aarch64_move_imm): Use unsigned.
> (aarch64_bitmask_imm): Likewise.
> (aarch64_uimm12_shift): Likewise.
> (aarch64_zeroextended_move_imm): New prototype.
> * config/aarch64/constraints.md: Add 'O' for 32/64-bit immediates,
> limit 'N' to 64-bit only moves.
>
> ---
>
> diff --git a/gcc/config/aarch64/aarch64-protos.h 
> b/gcc/config/aarch64/aarch64-protos.h
> index 
> 1a71f02284137c64e7115b26e6aa00447596f105..a73bfa20acb9b92ae0475794c3f11c67d22feb97
>  100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -755,7 +755,7 @@ void aarch64_post_cfi_startproc (void);
>  poly_int64 aarch64_initial_elimination_offset (unsigned, unsigned);
>  int aarch64_get_condition_code (rtx);
>  bool aarch64_address_valid_for_prefetch_p (rtx, bool);
> -bool aarch64_bitmask_imm (HOST_WIDE_INT val, machine_mode);
> +bool aarch64_bitmask_imm (unsigned HOST_WIDE_INT val, machine_mode);
>  unsigned HOST_WIDE_INT aarch64_and_split_imm1 (HOST_WIDE_INT val_in);
>  unsigned HOST_WIDE_INT aarch64_and_split_imm2 (HOST_WIDE_INT val_in);
>  bool aarch64_and_bitmask_imm (unsigned HOST_WIDE_INT val_in, machine_mode 
> mode);
> @@ -792,7 +792,7 @@ bool aarch64_masks_and_shift_for_bfi_p (scalar_int_mode, 
> unsigned HOST_WIDE_INT,
> unsigned HOST_WIDE_INT,
> unsigned HOST_WIDE_INT);
>  bool aarch64_zero_extend_const_eq (machine_mode, rtx, machine_mode, rtx);
> -bool aarch64_move_imm (HOST_WIDE_INT, machine_mode);
> +bool aarch64_move_imm (unsigned HOST_WIDE_INT, machine_mode);
>  machine_mode aarch64_sve_int_mode (machine_mode);
>  opt_machine_mode aarch64_sve_pred_mode (unsigned int);
>  machine_mode aarch64_sve_pred_mode (machine_mode);
> @@ -842,8 +842,9 @@ bool aarch64_sve_float_arith_immediate_p (rtx, bool);
>  bool aarch64_sve_float_mul_immediate_p (rtx);
>  bool aarch64_split_dimode_const_store (rtx, rtx);
>  bool aarch64_symbolic_address_p (rtx);
> -bool aarch64_uimm12_shift (HOST_WIDE_INT);
> +bool aarch64_uimm12_shift (unsigned HOST_WIDE_INT);
>  int aarch64_movk_shift (const wide_int_ref &, const wide_int_ref &);
> +bool aarch64_zeroextended_move_imm (unsigned HOST_WIDE_INT);
>  bool aarch64_use_return_insn_p (void);
>  const char *aarch64_output_casesi (rtx *);
>
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 
> 5d1ab5aa42b2cda0a655d2bc69c4df19da457ab3..798363bcc449c414de5bbb4f26b8e1c64a0cf71a
>  100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -5558,12 +5558,10 @@ aarch64_bitmask_imm (unsigned HOST_WIDE_INT val)
>
>  /* Return true if VAL is a valid bitmask immediate for MODE.  */
>  bool
> -aarch64_bitmask_imm (HOST_WIDE_INT val_in, machine_mode mode)
> +aarch64_bitmask_imm (unsigned HOST_WIDE_INT val, machine_mode mode)
>  {
>if (mode == DImode)
> -return aarch64_bitmask_imm (val_in);
> -
> -  unsigned HOST_WIDE_INT val = val_in;
> +return aarch64_bitmask_imm (val);
>
>if (mode == SImode)
>  return aarch64_bitmask_imm ((val & 0x) | (val << 32));
> @@ -5602,51 +5600,60 @@ aarch64_check_bitmask (unsigned HOST_WIDE_INT val,
>  }
>
>
> -/* Return true if val is an immediate that can be loaded into a
> -   register by a MOVZ instruction.  */
> -static bool
> -aarch64_movw_imm (HOST_WIDE_INT val, scalar_int_mode mode)
> +/* Return true if immediate VAL can only be created by using a 32-bit
> +   zero-extended move immediate, 

RE: [PATCH 2/2]AArch64 Support new tbranch optab.

2022-11-24 Thread Tamar Christina via Gcc-patches
Hi,

I had a question and I figured it'd be easier to ask before I spend more time 
implementing it 😊

I had noticed that one of the other reasons that cbranch and the other optabs
like cmov explicitly emit the compare separately and use combine to match up
the final form is ifcvt.

In particular, by expanding tbranch directly to the final RTL we lose some
ifcvt opportunities because there are no patterns that can handle the new
zero_extract idiom.

So the three solutions I can think of are:

1. Don't expand tbranch to its final form immediately, but still use 
zero_extract.  This regresses -O0. (but do we care?)
2. Expand tbranch with vec_extract and provide new zero_extract based RTL
   sequences for ifcvt.
   I currently tried this, and while it works, I don't fully trust the RTL.
   In particular, unlike say combine, ifcvt doesn't allow me to add an extra
   clobber to say that CC is clobbered by the pattern.  Now tbranch itself
   also expands a clobber, so the RTL isn't wrong even after ifcvt, but I'm
   worried that the pattern can be idiom recognized and then no clobber could
   be present.  I could modify the recog code in ifcvt to try to ignore
   clobbers during matching.
3. I could expand using AND instead of zero_extract.  We have more patterns
   handling AND, but I'm not sure if this will fix the problem entirely, but
   in principle I could expand to what ANDS generates and recog that instead.
   This shouldn't regress -O0 as we wouldn't put a zero_extract explicitly in
   RTL (and we already have a pattern for ANDS).

What do you think? I personally favor 3.
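
For reference, the kind of source-level bit test this is about (an
illustrative sketch only, not one of the testcases):

  /* Illustrative sketch: a single-bit test and branch that tbranch lets
     the backend expand directly to tbz/tbnz instead of a separate
     compare and branch.  */
  void g (void);

  void
  f (unsigned int x)
  {
    if (x & (1u << 5))
      g ();
  }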

Thanks,
Tamar

> -Original Message-
> From: Richard Sandiford 
> Sent: Tuesday, November 22, 2022 2:00 PM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw
> ; nd ; Marcus Shawcroft
> 
> Subject: Re: [PATCH 2/2]AArch64 Support new tbranch optab.
> 
> Tamar Christina  writes:
> >> -Original Message-
> >> From: Richard Sandiford 
> >> Sent: Tuesday, November 15, 2022 11:34 AM
> >> To: Tamar Christina 
> >> Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw
> >> ; nd ; Marcus Shawcroft
> >> 
> >> Subject: Re: [PATCH 2/2]AArch64 Support new tbranch optab.
> >>
> >> Tamar Christina  writes:
> >> >> -Original Message-
> >> >> From: Richard Sandiford 
> >> >> Sent: Tuesday, November 15, 2022 11:15 AM
> >> >> To: Tamar Christina 
> >> >> Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw
> >> >> ; nd ; Marcus
> Shawcroft
> >> >> 
> >> >> Subject: Re: [PATCH 2/2]AArch64 Support new tbranch optab.
> >> >>
> >> >> Tamar Christina  writes:
> >> >> >> -Original Message-
> >> >> >> From: Richard Sandiford 
> >> >> >> Sent: Tuesday, November 15, 2022 10:51 AM
> >> >> >> To: Tamar Christina 
> >> >> >> Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw
> >> >> >> ; nd ; Marcus
> >> Shawcroft
> >> >> >> 
> >> >> >> Subject: Re: [PATCH 2/2]AArch64 Support new tbranch optab.
> >> >> >>
> >> >> >> Tamar Christina  writes:
> >> >> >> >> -Original Message-
> >> >> >> >> From: Richard Sandiford 
> >> >> >> >> Sent: Tuesday, November 15, 2022 10:36 AM
> >> >> >> >> To: Tamar Christina 
> >> >> >> >> Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw
> >> >> >> >> ; nd ; Marcus
> >> >> Shawcroft
> >> >> >> >> 
> >> >> >> >> Subject: Re: [PATCH 2/2]AArch64 Support new tbranch optab.
> >> >> >> >>
> >> >> >> >> Tamar Christina  writes:
> >> >> >> >> > Hello,
> >> >> >> >> >
> >> >> >> >> > Ping and updated patch.
> >> >> >> >> >
> >> >> >> >> > Bootstrapped Regtested on aarch64-none-linux-gnu and no
> >> issues.
> >> >> >> >> >
> >> >> >> >> > Ok for master?
> >> >> >> >> >
> >> >> >> >> > Thanks,
> >> >> >> >> > Tamar
> >> >> >> >> >
> >> >> >> >> > gcc/ChangeLog:
> >> >> >> >> >
> >> >> >> >> > * config/aarch64/aarch64.md (*tb1):
> >> >> >> >> > Rename
> >> >> to...
> >> >> >> >> > (*tb1): ... this.
> >> >> >> >> > (tbranch4): New.
> >> >> >> >> >
> >> >> >> >> > gcc/testsuite/ChangeLog:
> >> >> >> >> >
> >> >> >> >> > * gcc.target/aarch64/tbz_1.c: New test.
> >> >> >> >> >
> >> >> >> >> > --- inline copy of patch ---
> >> >> >> >> >
> >> >> >> >> > diff --git a/gcc/config/aarch64/aarch64.md
> >> >> >> >> > b/gcc/config/aarch64/aarch64.md index
> >> >> >> >> >
> >> >> >> >>
> >> >> >>
> >> >>
> >>
> 2bc2684b82c35a44e0a2cea6e3aaf32d939f8cdf..d7684c93fba5b717d568e1a4fd
> >> >> >> >> 71
> >> >> >> >> > 2bde55c7c72e 100644
> >> >> >> >> > --- a/gcc/config/aarch64/aarch64.md
> >> >> >> >> > +++ b/gcc/config/aarch64/aarch64.md
> >> >> >> >> > @@ -943,12 +943,29 @@ (define_insn "*cb1"
> >> >> >> >> >   (const_int 1)))]
> >> >> >> >> >  )
> >> >> >> >> >
> >> >> >> >> > -(define_insn "*tb1"
> >> >> >> >> > +(define_expand "tbranch4"
> >> >> >> >> >[(set (pc) (if_then_else
> >> >> >> >> > - (EQL (zero_extract:DI (match_operand:GPI 0
> >> >> "register_operand"
> >> >> >> >> "r")
> >> >> >> >> > -   (const_int 1)
> >> >> >> >> 

OpenMP Patch Ping

2022-11-24 Thread Tobias Burnus

Updated list as follow up to last ping at 
https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601162.html


Recent patches:

Sandra's (Tue Nov 15 04:46:15 GMT 2022)
[PATCH v4] OpenMP: Generate SIMD clones for functions with "declare target"
https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606218.html


Julian's patches - I hope I got it right as I lost track a bit:

(Tue Nov 8 14:36:17 GMT 2022)
[PATCH v2 06/11] OpenMP: lvalue parsing for map clauses (C++)
https://gcc.gnu.org/pipermail/gcc-patches/2022-November/605367.html

(Fri Sep 30 13:30:22 GMT 2022)
[PATCH v3 06/11] OpenMP: Pointers and member mappings
https://gcc.gnu.org/pipermail/gcc-patches/2022-September/602609.html

(Tue Oct 18 10:39:01 GMT 2022)
[PATCH v5 0/4] OpenMP/OpenACC: Fortran array descriptor mappings
https://gcc.gnu.org/pipermail/gcc-patches/2022-October/thread.html#603790
(I think this is partially my task to review those.)


Approved but waiting for the Fortran patches (v5) to get approved.
[PATCH v3 08/11] OpenMP/OpenACC: Rework clause expansion and nested struct 
handling
https://gcc.gnu.org/pipermail/gcc-patches/2022-September/602010.html


Possibly requiring a second look/review despite my initial comment
(which might require revisions on the patch side as well):
OpenMP: Duplicate checking for map clauses in Fortran (PR107214)
https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604033.html


Older patches:

* [PATCH 00/17] openmp, nvptx, amdgcn: 5.0 Memory Allocators
  https://gcc.gnu.org/pipermail/gcc-patches/2022-July/597976.html
  * Unified-Shared Memory & Pinned Memory

Depending on those:

* [PATCH] OpenMP, libgomp: Handle unified shared memory in 
omp_target_is_accessible.
  https://gcc.gnu.org/pipermail/gcc-patches/2022-May/594187.html

* [PATCH, OpenMP, Fortran] requires unified_shared_memory 1/2: adjust 
libgfortran memory allocators
  https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599703.html
  (Fortran part, required for ...)
* Re: [PATCH, OpenMP, Fortran] requires unified_shared_memory 2/2: insert USM 
allocators into libgfortran
  https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601059.html

And finally:

* [PATCH, libgomp] Fix chunk_size<1 for dynamic schedule
  https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599332.html
(Side remark: some other debugging support like showing the mapping being done 
as stderr output or ... would be nice as well; might depend on a 
libgomp-debug.so and/or -f...(sanitize=openmp or ...); the other open-source 
compiler has something similar.)


 * * *


Pending libgomp/nvptx patches:

(Wed Sep 21 07:45:36 GMT 2022)
[PATCH, nvptx, 1/2] Reimplement libgomp barriers for nvptx
https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601922.html

(Wed Sep 21 07:45:54 GMT 2022)
[PATCH, nvptx, 2/2] Reimplement libgomp barriers for nvptx: bar.red instruction 
support in GCC
https://gcc.gnu.org/pipermail/gcc-patches/2022-September/601925.html

Those were pinged 4 times :-(


Hopefully, I have not missed any patch

Tobias


PS: The following list covers pending patches, which have been reviewed but
need to be updated before being ready - hopefully, this list is also up to 
date:

* (No pending patch, but wwwdoc's changes-13.html + projects/gomp/ need an 
update before GCC 13)

* [Patch] OpenMP, libgomp, gimple: omp_get_max_teams, omp_set_num_teams, and
omp_{gs}et_teams_thread_limit on offload devices
Should be re-submitted any time soon (today, next few days)

* [Patch] OpenMP/Fortran: Permit end-clause on directive
  https://gcc.gnu.org/pipermail/gcc-patches/2022-August/600433.html
  Trivial patch modifications required - mostly LGTM already.

* [PATCH] libgomp: fix hang on fatal error
  https://gcc.gnu.org/pipermail/gcc-patches/2022-October/603616.html
(Patch rejected but alternative solutions were suggested.)

* Re: [Patch] OpenMP/Fortran: Use firstprivat not alloc for ptr attach for 
arrays
(Committed but failing occasionally:)
https://gcc.gnu.org/pipermail/gcc-patches/2022-November/605854.html

* "[PATCH 3/3] vect: inbranch SIMD clones"
  https://gcc.gnu.org/pipermail/gcc-patches/2022-August/599490.html
  Review comments to be addressed.

* [PATCH 0/5] [gfortran] Support for allocate directive (OpenMP 5.0)
  https://gcc.gnu.org/pipermail/gcc-patches/2022-January/588367.html

* [PATCH] openmp: fix max_vf setting for amdgcn offloading
  https://gcc.gnu.org/pipermail/gcc-patches/2022-July/598265.html
→ To be updated for review comments.
(Side note: we should at some point find a way to improve target-specific
handling; similar to the are-exceptions-supported issue of PR101544 but
there are more.)

* [PATCH, OpenMP, v4] Implement uses_allocators clause for target regions
  https://gcc.gnu.org/pipermail/gcc-patches/2022-June/596587.html
  * Needs to be revised according to review comments

* Fortran allocatable components handling (needs to be split into separate 
pieces and submitted
  separately)
  https://gcc.gnu.org/pipermail/gcc-patches/2022-April/59370

Re: [PATCH v2 16/19] modula2 front end: bootstrap and documentation tools

2022-11-24 Thread Martin Liška
On 11/8/22 14:22, Gaius Mulley wrote:
> Martin Liška  writes:
> 
>> 1) I would prefer using ' instead of ":
>>
>> $ flake8 ./gcc/m2/tools-src/tidydates.py
>> ...
>> ./gcc/m2/tools-src/tidydates.py:124:30: Q000 Double quotes found but single 
>> quotes preferred
>> ./gcc/m2/tools-src/tidydates.py:127:27: Q000 Double quotes found but single 
>> quotes preferred
>> ./gcc/m2/tools-src/tidydates.py:132:27: Q000 Double quotes found but single 
>> quotes preferred
>> ./gcc/m2/tools-src/tidydates.py:133:33: Q000 Double quotes found but single 
>> quotes preferred
>> ./gcc/m2/tools-src/tidydates.py:138:26: Q000 Double quotes found but single 
>> quotes preferred
>> ./gcc/m2/tools-src/tidydates.py:143:28: Q000 Double quotes found but
>> single quotes preferred
> 
> ah yes will switch the quotes character.
> 
>> 2) Python-names would be nicer:
>>
>> def writeTemplate(fo, magic, start, end, dates, contribution, summary,
>> lic):
>>
>> def write_template(...)
> 
> agreed, will change
> 
>> 3) def hasExt(name, ext) - please use Path from pathlib
>>
>> 4) while (str.find(line, "(*") != -1):
>>
>> '(*' in line
>> ? Similarly elsewhere.
>>
>> 5) str.find(line, ...)
>>
>> Use rather directly: line.find(...)
>>
>> 6) please use flake8:
>> https://gcc.gnu.org/codingconventions.html#python
> 
> sure will do all above - I used flake8 but maybe the plugins weren't
> enabled.  I'll try flake8 on tumbleweed.
> 
>> Thanks,
>> Martin
>>
>> P.S. I'm going to merge Sphinx branch this Wednesday, so then we should port 
>> your
>> conversion scripts to emit .rst instead of .texi.
> 
> should be good - I'll complete the rst output in the scripts,
> 
> regards,
> Gaius

Hi.

As you probably noticed, the Sphinx migration didn't go well. However, it's 
still up to you
if you want to use it or not for Modula 2. We have manuals like libgccjit, or 
Ada manuals
that use RST natively and provide exported .texi files.

Cheers and sorry for the troubles I caused.

Martin


Re: [PATCH v2] [x86] Fix incorrect _mm_cvtsbh_ss.

2022-11-24 Thread Hongtao Liu via Gcc-patches
On Thu, Nov 24, 2022 at 4:53 PM Jakub Jelinek  wrote:
>
> On Thu, Nov 24, 2022 at 09:22:00AM +0800, liuhongt via Gcc-patches wrote:
> > --- a/gcc/config/i386/i386.md
> > +++ b/gcc/config/i386/i386.md
> > @@ -130,6 +130,7 @@ (define_c_enum "unspec" [
> >;; For AVX/AVX512F support
> >UNSPEC_SCALEF
> >UNSPEC_PCMP
> > +  UNSPEC_CVTBFSF
> >
> >;; Generic math support
> >UNSPEC_IEEE_MIN; not commutative
> > @@ -4961,6 +4962,31 @@ (define_insn "*extendhf2"
> > (set_attr "prefix" "evex")
> > (set_attr "mode" "")])
> >
> > +(define_expand "extendbfsf2"
> > +  [(set (match_operand:SF 0 "register_operand")
> > + (unspec:SF
> > +   [(match_operand:BF 1 "register_operand")]
> > +  UNSPEC_CVTBFSF))]
> > + "TARGET_SSE2 && !HONOR_NANS (BFmode) && !flag_signaling_nans")
>
> I think if !HONOR_NANS (BFmode), then flag_signaling_nans doesn't matter,
> the former says that no NaNs may appear in a valid program,
> so just testing !HONOR_NANS (BFmode) should be enough.
I'll remove flag_signaling_nans.
>
> What I'm not sure about, my memory is weak, is whether one can
> safely use the fast math related tests in define_expand conditions.
> I vaguely remember init_all_optabs remembers the conditions, for
> changes say in the ISA options optabs are reinited, but not sure if
> that happens for optimization option changes like the fast math related
> options are.  So it would be perhaps safer to use just TARGET_SSE2
> as the expand condition and in the C code body do
> if (HONOR_NANS (BFmode) FAIL;
> (similarly for truncsfbf2).
> On the other side brief look at x86 insn-flags.h shows several fast math
> related checks in HAVE_* macros.
> PR92791 I found related to this was actually about
Oh, good to know that, thanks.
> optimize_function_for_{size,speed}_p (cfun)
> so maybe fast math related stuff is fine, just not the optimization for
> speed or size.
I saw that many backends (riscv, rs6000, mips, loongarch) already use HONOR_*
stuff in the expander conditions.
>
> Jakub
>


-- 
BR,
Hongtao


[committed] c++: Further -fcontract* option description fixes

2022-11-24 Thread Jakub Jelinek via Gcc-patches
On Tue, Nov 22, 2022 at 10:09:06AM -0500, Jason Merrill wrote:
> > Though, shall we have those [on|off] options at all?
> > Those are inconsistent with all other boolean options gcc has.
> > Every other boolean option is -fwhatever for it being on
> > and -fno-whatever for it being off, shouldn't the options be
> > without arguments and accept negatives (-fcontract-assumption-mode
> > vs. -fno-contract-assumption-mode etc.)?
> 
> True, but I think let's leave them alone for now, they'll probably all be
> replaced as the feature specification evolves.

If we don't want to support them for too long, another possibility
would be to use params for those instead of normal options;
params are understood to be more volatile than normal options and
can be removed easily (while for normal options we typically
keep them but error on them or silently ignore them depending on what
the option is about).

Anyway, during testing I missed that my previous patch just changed:
-FAIL: compiler driver --help=c++ option(s): "^ +-.*[^:.]\$" absent from 
output: "  -fcontract-build-level=[off|default|audit] Specify max contract 
level to generate runtime checks for"
+FAIL: compiler driver --help=c++ option(s): "^ +-.*[^:.]\$" absent from 
output: "  -fcontract-role=: Specify the semantics for all 
levels in a role (default, review), or a custom contract role with given 
semantics (ex: opt:assume,assume,assume)"
rather than actually fixing it; the test only reports the first such problem.

This patch fixes the remaining ones.

Tested with make check-gcc RUNTESTFLAGS=help.exp and committed to trunk
as obvious.

2022-11-24  Jakub Jelinek  

* c.opt (fcontract-role=, fcontract-semantic=): Terminate descriptions
with a dot.

--- gcc/c-family/c.opt.jj   2022-11-23 09:29:01.083548284 +0100
+++ gcc/c-family/c.opt  2022-11-24 11:42:29.582499720 +0100
@@ -1713,11 +1713,11 @@ C++ Joined RejectNegative
 
 fcontract-role=
 C++ Joined RejectNegative
--fcontract-role=: Specify the semantics for all levels in 
a role (default, review), or a custom contract role with given semantics (ex: 
opt:assume,assume,assume)
+-fcontract-role=: Specify the semantics for all levels in 
a role (default, review), or a custom contract role with given semantics (ex: 
opt:assume,assume,assume).
 
 fcontract-semantic=
 C++ Joined RejectNegative
--fcontract-semantic=: Specify the concrete semantics for level
+-fcontract-semantic=: Specify the concrete semantics for 
level.
 
 fcoroutines
 C++ LTO Var(flag_coroutines)

Jakub



Re: [PATCH] c: Propagate erroneous types to declaration specifiers [PR107805]

2022-11-24 Thread Florian Weimer via Gcc-patches
* Jakub Jelinek:

> On Thu, Nov 24, 2022 at 11:01:40AM +0100, Florian Weimer via Gcc-patches 
> wrote:
>> * Joseph Myers:
>> 
>> > On Tue, 22 Nov 2022, Florian Weimer via Gcc-patches wrote:
>> >
>> >> Without this change, finish_declspecs cannot tell that whether there
>> >> was an erroneous type specified, or no type at all.  This may result
>> >> in additional diagnostics for implicit ints, or missing diagnostics
>> >> for multiple types.
>> >> 
>> >>   PR c/107805
>> >> 
>> >> gcc/c/
>> >>   * c-decl.cc (declspecs_add_type): Propagate error_mark_bode
>> >>   from type to specs.
>> >> 
>> >> gcc/testsuite/
>> >>   * gcc.dg/pr107805-1.c: New test.
>> >>   * gcc.dg/pr107805-1.c: Likewise.
>> >
>> > OK.
>> 
>> Thanks.  Permission to backport this to GCC 12 after a week or two?
>
> In this case I'd wait a month, it will take some time until possible
> error recovery bugs are discovered.

Okay, I have made a note to backport it in the new year.  Hopefully
any regressions will be flagged on the PR or linked to it.

Thanks,
Florian



Re: [PATCH 5/8] middle-end: Add cltz_complement idiom recognition

2022-11-24 Thread Richard Biener via Gcc-patches
On Mon, Nov 21, 2022 at 4:53 PM Andrew Carlotti  wrote:
>
> On Mon, Nov 14, 2022 at 04:10:22PM +0100, Richard Biener wrote:
> > On Fri, Nov 11, 2022 at 7:53 PM Andrew Carlotti via Gcc-patches
> >  wrote:
> > >
> > > This recognises patterns of the form:
> > > while (n) { n >>= 1 }
> > >
> > > This patch results in improved (but still suboptimal) codegen:
> > >
> > > foo (unsigned int b) {
> > > int c = 0;
> > >
> > > while (b) {
> > > b >>= 1;
> > > c++;
> > > }
> > >
> > > return c;
> > > }
> > >
> > > foo:
> > > .LFB11:
> > > .cfi_startproc
> > > cbz w0, .L3
> > > clz w1, w0
> > > tst x0, 1
> > > mov w0, 32
> > > sub w0, w0, w1
> > > cselw0, w0, wzr, ne
> > > ret
> > >
> > > The conditional is unnecessary. phiopt could recognise a redundant csel
> > > (using cond_removal_in_builtin_zero_pattern) when one of the inputs is a
> > > clz call, but it cannot recognise the redunancy when the input is (e.g.)
> > > (32 - clz).
> > >
> > > I could perhaps extend this function to recognise this pattern in a later
> > > patch, if this is a good place to recognise more patterns.
> > >
> > > gcc/ChangeLog:
> > >
> +   PR tree-optimization/94793
> > > * tree-scalar-evolution.cc (expression_expensive_p): Add checks
> > > for c[lt]z optabs.
> > > * tree-ssa-loop-niter.cc (build_cltz_expr): New.
> > > (number_of_iterations_cltz_complement): New.
> > > (number_of_iterations_bitcount): Add call to the above.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > * lib/target-supports.exp (check_effective_target_clz)
> > > (check_effective_target_clzl, check_effective_target_clzll)
> > > (check_effective_target_ctz, check_effective_target_clzl)
> > > (check_effective_target_ctzll): New.
> > > * gcc.dg/tree-ssa/cltz-complement-max.c: New test.
> > > * gcc.dg/tree-ssa/clz-complement-char.c: New test.
> > > * gcc.dg/tree-ssa/clz-complement-int.c: New test.
> > > * gcc.dg/tree-ssa/clz-complement-long-long.c: New test.
> > > * gcc.dg/tree-ssa/clz-complement-long.c: New test.
> > > * gcc.dg/tree-ssa/ctz-complement-char.c: New test.
> > > * gcc.dg/tree-ssa/ctz-complement-int.c: New test.
> > > * gcc.dg/tree-ssa/ctz-complement-long-long.c: New test.
> > > * gcc.dg/tree-ssa/ctz-complement-long.c: New test.
> > >
> > >
> > > --
> > >
> > >
>
> [snip test diffs]
>
> > > diff --git a/gcc/tree-scalar-evolution.cc b/gcc/tree-scalar-evolution.cc
> > > index 
> > > 7e2a3e986619de87e4ae9daf16198be1f13b917c..1ac9526c69b5fe80c26022f2fa1176d222e2dfb9
> > >  100644
> > > --- a/gcc/tree-scalar-evolution.cc
> > > +++ b/gcc/tree-scalar-evolution.cc
> > > @@ -3406,12 +3406,21 @@ expression_expensive_p (tree expr, hash_map > > uint64_t> &cache,
> > >  library call for popcount when backend does not have an 
> > > instruction
> > >  to do so.  We consider this to be expensive and generate
> > >  __builtin_popcount only when backend defines it.  */
> > > +  optab optab;
> > >combined_fn cfn = get_call_combined_fn (expr);
> > >switch (cfn)
> > > {
> > > CASE_CFN_POPCOUNT:
> > > + optab = popcount_optab;
> > > + goto bitcount_call;
> > > +   CASE_CFN_CLZ:
> > > + optab = clz_optab;
> > > + goto bitcount_call;
> > > +   CASE_CFN_CTZ:
> > > + optab = ctz_optab;
> > > +bitcount_call:
> > >   /* Check if opcode for popcount is available in the mode 
> > > required.  */
> > > - if (optab_handler (popcount_optab,
> > > + if (optab_handler (optab,
> > >  TYPE_MODE (TREE_TYPE (CALL_EXPR_ARG (expr, 
> > > 0
> > >   == CODE_FOR_nothing)
> > > {
> > > @@ -3424,7 +3433,7 @@ expression_expensive_p (tree expr, hash_map > > uint64_t> &cache,
> > >  instructions.  */
> > >   if (is_a  (mode, &int_mode)
> > >   && GET_MODE_SIZE (int_mode) == 2 * UNITS_PER_WORD
> > > - && (optab_handler (popcount_optab, word_mode)
> > > + && (optab_handler (optab, word_mode)
> > >   != CODE_FOR_nothing))
> > >   break;
> > >   return true;
> > > diff --git a/gcc/tree-ssa-loop-niter.cc b/gcc/tree-ssa-loop-niter.cc
> > > index 
> > > fece876099c1687569d6351e7d2416ea6acae5b5..16e8e25919d808cea27adbd72f0be01ae2f0e547
> > >  100644
> > > --- a/gcc/tree-ssa-loop-niter.cc
> > > +++ b/gcc/tree-ssa-loop-niter.cc
> > > @@ -2198,6 +2198,195 @@ number_of_iterations_popcount (loop_p loop, edge 
> > > exit,
> > >return true;
> > >  }
> > >
> > > +/* Return an expression that counts the leading/trailing zeroes of src.  
> > > */
> >
> > Can you expand the comment on how you handle defined_at_zero and how
> > that relates to the C[LT

Re: [PATCH] c: Propagate erroneous types to declaration specifiers [PR107805]

2022-11-24 Thread Jakub Jelinek via Gcc-patches
On Thu, Nov 24, 2022 at 11:01:40AM +0100, Florian Weimer via Gcc-patches wrote:
> * Joseph Myers:
> 
> > On Tue, 22 Nov 2022, Florian Weimer via Gcc-patches wrote:
> >
> >> Without this change, finish_declspecs cannot tell that whether there
> >> was an erroneous type specified, or no type at all.  This may result
> >> in additional diagnostics for implicit ints, or missing diagnostics
> >> for multiple types.
> >> 
> >>PR c/107805
> >> 
> >> gcc/c/
> >>* c-decl.cc (declspecs_add_type): Propagate error_mark_bode
> >>from type to specs.
> >> 
> >> gcc/testsuite/
> >>* gcc.dg/pr107805-1.c: New test.
> >>* gcc.dg/pr107805-1.c: Likewise.
> >
> > OK.
> 
> Thanks.  Permission to backport this to GCC 12 after a week or two?

In this case I'd wait a month, it will take some time until possible
error recovery bugs are discovered.

Jakub



Re: [PATCH] Add a new conversion for conditional ternary set into ifcvt [PR106536]

2022-11-24 Thread Richard Biener via Gcc-patches
On Thu, Nov 24, 2022 at 8:25 AM HAO CHEN GUI  wrote:
>
> Hi Richard,
>
>
> On 2022/11/24 4:06, Richard Biener wrote:
> > Wouldn't we usually either add an optab or try to recog a canonical
> > RTL form instead of adding a new target hook for things like this?
>
> Thanks so much for your comments. Please let me make it clear.
>
> Do you mean we should create an optab for "setb" pattern (the nested
> if-then-else insn) and detect candidate insns in ifcvt pass? Then
> generate the insn with the new optab?

Yes, that would be one way to do it.  Another way would be to
generate a (to be defined) canonical form of such instruction and
see whether it can be recognized (whether there's a define_insn
for it).

Note that these were just things that came into my mind here; I'm not too
familiar with how we handle such situations, but at least I'm not
aware of dozens of target hooks to handle instruction availability.

Richard.

> My concern is that some candidate insns are target specific. For
> example, different modes cause additional zero_extend or subreg insns
> generated on different targets. So I put the detection process into a
> target hook.
>
> Looking forward to your advice.
>
> Thanks again
> Gui Haochen


Re: [PATCH] asan: Fix up error recovery for too large frames [PR107317]

2022-11-24 Thread Richard Biener via Gcc-patches
On Thu, 24 Nov 2022, Jakub Jelinek wrote:

> Hi!
> 
> asan_emit_stack_protection and functions it calls have various asserts that
> verify sanity of the stack protection instrumentation.  But, that
> verification can easily fail if we've diagnosed a frame offset overflow.
> asan_emit_stack_protection just emits some extra code in the prologue;
> if we've reported errors, we aren't producing assembly, so it doesn't
> really matter if we don't include the protection code, as compilation
> is going to fail anyway.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

> 2022-11-24  Jakub Jelinek  
> 
>   PR middle-end/107317
>   * asan.cc: Include diagnostic-core.h.
>   (asan_emit_stack_protection): Return NULL early if seen_error ().
> 
>   * gcc.dg/asan/pr107317.c: New test.
> 
> --- gcc/asan.cc.jj2022-06-28 13:03:30.613693889 +0200
> +++ gcc/asan.cc   2022-11-23 17:47:09.130332461 +0100
> @@ -64,6 +64,7 @@ along with GCC; see the file COPYING3.
>  #include "tree-inline.h"
>  #include "tree-ssa.h"
>  #include "tree-eh.h"
> +#include "diagnostic-core.h"
>  
>  /* AddressSanitizer finds out-of-bounds and use-after-free bugs
> with <2x slowdown on average.
> @@ -1818,6 +1819,11 @@ asan_emit_stack_protection (rtx base, rt
>tree str_cst, decl, id;
>int use_after_return_class = -1;
>  
> +  /* Don't emit anything when doing error recovery, the assertions
> + might fail e.g. if a function had a frame offset overflow.  */
> +  if (seen_error ())
> +return NULL;
> +
>if (shadow_ptr_types[0] == NULL_TREE)
>  asan_init_shadow_ptr_types ();
>  
> --- gcc/testsuite/gcc.dg/asan/pr107317.c.jj   2022-11-23 17:46:09.145219960 
> +0100
> +++ gcc/testsuite/gcc.dg/asan/pr107317.c  2022-11-23 17:49:45.148024097 
> +0100
> @@ -0,0 +1,13 @@
> +/* PR middle-end/107317 */
> +/* { dg-do compile { target ilp32 } } */
> +/* { dg-options "-fsanitize=address -ffat-lto-objects" } */
> +
> +void bar (float *, float *);
> +
> +void
> +foo (void)   /* { dg-error "exceeds maximum" } */
> +{
> +  float a[4];
> +  float b[2];
> +  bar (a, b);
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Frankenstrasse 146, 90461 Nuernberg,
Germany; GF: Ivo Totev, Andrew Myers, Andrew McDonald, Boudien Moerman;
HRB 36809 (AG Nuernberg)


[COMMITTED] ada: Add assertion for the implementation of storage models

2022-11-24 Thread Marc Poulhiès via Gcc-patches
From: Eric Botcazou 

We cannot generate a call to memset for an aggregate with an Others choice
when the target of the assignment has a storage model with Copy_To routine.

gcc/ada/

* gcc-interface/trans.cc (gnat_to_gnu) : Add
assertion that memset is not supposed to be used when the target has
a storage model with Copy_To routine.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/gcc-interface/trans.cc | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/ada/gcc-interface/trans.cc b/gcc/ada/gcc-interface/trans.cc
index 1cd621a9377..b9d7c015a73 100644
--- a/gcc/ada/gcc-interface/trans.cc
+++ b/gcc/ada/gcc-interface/trans.cc
@@ -7450,6 +7450,9 @@ gnat_to_gnu (Node_Id gnat_node)
  else if (Present (gnat_smo)
   && Present (Storage_Model_Copy_To (gnat_smo)))
{
+ /* We obviously cannot use memset in this case.  */
+ gcc_assert (!use_memset_p);
+
  tree t = remove_conversions (gnu_rhs, false);
 
  /* If a storage model load is present on the RHS then instantiate
-- 
2.34.1



[COMMITTED] ada: Spurious error on Lock_Free protected type with discriminants

2022-11-24 Thread Marc Poulhiès via Gcc-patches
From: Justin Squirek 

This patch corrects an issue in the compiler whereby unprefixed discriminants
appearing in protected subprograms were unable to be properly resolved -
leading to spurious resolution errors.

gcc/ada/

* sem_ch8.adb
(Find_Direct_Name): Remove bypass to reanalyze incorrectly
analyzed discriminals.
(Set_Entity_Or_Discriminal): Avoid resetting the entity field of a
discriminant reference to be the internally generated renaming
when we are in strict preanalysis mode.

Tested on x86_64-pc-linux-gnu, committed on master.

---
 gcc/ada/sem_ch8.adb | 29 -
 1 file changed, 12 insertions(+), 17 deletions(-)

diff --git a/gcc/ada/sem_ch8.adb b/gcc/ada/sem_ch8.adb
index ca306663791..841076bbd01 100644
--- a/gcc/ada/sem_ch8.adb
+++ b/gcc/ada/sem_ch8.adb
@@ -4891,6 +4891,18 @@ package body Sem_Ch8 is
  then
 null;
 
+ --  Don't replace the discriminant in strict preanalysis mode since
+ --  it can lead to errors during full analysis when the discriminant
+ --  gets referenced later.
+
+ --  This can occur in situations where a protected type contains
+ --  an expression function which references a discriminant.
+
+ elsif Preanalysis_Active
+   and then Inside_Preanalysis_Without_Freezing = 0
+ then
+null;
+
  else
 Set_Entity (N, Discriminal (E));
  end if;
@@ -6048,21 +6060,6 @@ package body Sem_Ch8 is
  if Is_Type (Entity (N)) then
 Set_Etype (N, Entity (N));
 
- --  The exception to this general rule are constants associated with
- --  discriminals of protected types because for each protected op
- --  a new set of discriminals is internally created by the frontend
- --  (see Exp_Ch9.Set_Discriminals), and the current decoration of the
- --  entity pointer may have been set as part of a preanalysis, where
- --  discriminals still reference the first subprogram or entry to be
- --  expanded (see Expand_Protected_Body_Declarations).
-
- elsif Full_Analysis
-   and then Ekind (Entity (N)) = E_Constant
-   and then Present (Discriminal_Link (Entity (N)))
-   and then Is_Protected_Type (Scope (Discriminal_Link (Entity (N
- then
-goto Find_Name;
-
  else
 declare
Entyp : constant Entity_Id := Etype (Entity (N));
@@ -6102,8 +6099,6 @@ package body Sem_Ch8 is
  return;
   end if;
 
-  <>
-
   --  Preserve relevant elaboration-related attributes of the context which
   --  are no longer available or very expensive to recompute once analysis,
   --  resolution, and expansion are over.
-- 
2.34.1



Re: [PATCH] 8/19 modula2 front end: libgm2 contents

2022-11-24 Thread Gaius Mulley via Gcc-patches
Richard Biener  writes:

> On Mon, Oct 10, 2022 at 5:35 PM Gaius Mulley via Gcc-patches
>  wrote:
>>
>>
>>
>> This patch set consists of the libgm2 makefile, autoconf sources
>> necessary to build the libm2pim, libm2iso, libm2min, libm2cor
>> and libm2log.
>
> This looks OK.  I suppose it was also tested building a cross-compiler?
>
> Can we get some up-to-date status on the build and support status for the
> list of primary and secondary platforms we list on
> https://gcc.gnu.org/gcc-13/criteria.html?

sure, here is a summary of the primary/secondary platforms, added at the
end of the page:

https://splendidisolation.ddns.net/public/modula2/patchsummary.html

regards,
Gaius


Re: [PATCH] c: Propagate erroneous types to declaration specifiers [PR107805]

2022-11-24 Thread Florian Weimer via Gcc-patches
* Joseph Myers:

> On Tue, 22 Nov 2022, Florian Weimer via Gcc-patches wrote:
>
>> Without this change, finish_declspecs cannot tell that whether there
>> was an erroneous type specified, or no type at all.  This may result
>> in additional diagnostics for implicit ints, or missing diagnostics
>> for multiple types.
>> 
>>  PR c/107805
>> 
>> gcc/c/
>>  * c-decl.cc (declspecs_add_type): Propagate error_mark_bode
>>  from type to specs.
>> 
>> gcc/testsuite/
>>  * gcc.dg/pr107805-1.c: New test.
>>  * gcc.dg/pr107805-1.c: Likewise.
>
> OK.

Thanks.  Permission to backport this to GCC 12 after a week or two?

Florian



Re: [PATCH] Make Warray-bounds alias to Warray-bounds= [PR107787]

2022-11-24 Thread Franz Sirl

On 2022-11-23 at 21:11, Richard Biener via Gcc-patches wrote:

On Wed, Nov 23, 2022 at 3:08 PM Iskander Shakirzyanov via Gcc-patches
 wrote:


Hi!
Sorry for the initially missing description.
The following patch changes the definition of -Warray-bounds to an Alias to 
-Warray-bounds=1. This is necessary for the correct use of 
-Werror=array-bounds=X, for more information see bug report 107787 
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107787)
As I understand, this happens because -Warray-bounds and -Warray-bounds= are 
declared as 2 different options in common.opt, so when 
diagnostic_classify_diagnostic() (opts-common.cc:1880) is called, DK_ERROR is 
set for the -Warray-bounds= option, but in diagnostic_report_diagnostic() 
(diagnostic.cc:1446) through warning_at() passes opt_index of -Warray-bounds, 
so information about DK_ERROR is lost.


How did you test the patch?  If you bootstrapped it and ran the
testsuite then it's OK.


Hi,

I'm pretty sure the testsuite will have regressions, as I have a very 
similar patch lying around that needs these testsuite changes.  I also
added some modified tests that check that -Werror=array-bounds=X works 
as expected (which it didn't before).  This also shows nicely why I don't 
like warnings with levels: what if I want -Werror=array-bounds=2 + 
-Warray-bounds=1?
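
To illustrate the kind of check such a test does, here is a reduced,
hypothetical sketch (not one of the attached tests; the real ones are
Warray-bounds-93.c/-94.c in the patch below):

/* { dg-do compile } */
/* { dg-options "-O2 -Werror=array-bounds=2" } */

int a[4];

int
f (void)
{
  /* a[5] is past the end of a[4]; with -Werror=array-bounds=2 this
     should be diagnosed as an error, not a plain warning.  */
  return a[5]; /* { dg-error "subscript" } */
}

That promotion from warning to error is exactly the behaviour that was
broken before.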


The reason (besides lack of time) I didn't submit that patch yet is 
that I wanted to check whether the tool that processes common.opt can be 
changed to detect warnings with duplicated warning variables, because I 
think at least -Wuse-after-free= and -Wattributes= have the same problem.


BTW, is the duplicated warning description "Warn if an array is accessed 
out of bounds." needed or not with Alias()? There are examples either 
way in common.opt.


I've attached my patch, feel free to integrate the testsuite changes.

Franz

From 9bfefe2082f55f2ad2cd19beedbfeaf9bd20fb4a Mon Sep 17 00:00:00 2001
From: Franz Sirl 
Date: Thu, 8 Jul 2021 10:25:00 +0200
Subject: [PATCH 08/11] Unify -Warray-bounds/-Warray-bounds= warning variables.

---
 gcc/builtins.cc |   6 +-
 gcc/c-family/c-common.cc|   6 +-
 gcc/common.opt  |   3 +-
 gcc/diagnostic-spec.cc  |   1 -
 gcc/gimple-array-bounds.cc  |  38 +++---
 gcc/gimple-ssa-warn-restrict.cc |   2 +-
 gcc/testsuite/gcc.dg/Warray-bounds-34.c |   2 +-
 gcc/testsuite/gcc.dg/Warray-bounds-43.c |   6 +-
 gcc/testsuite/gcc.dg/Warray-bounds-93.c | 103 
 gcc/testsuite/gcc.dg/Warray-bounds-94.c | 149 
 10 files changed, 283 insertions(+), 33 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/Warray-bounds-93.c
 create mode 100644 gcc/testsuite/gcc.dg/Warray-bounds-94.c

diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index 4dc1ca672b2..02c4fefa86f 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -696,14 +696,14 @@ c_strlen (tree arg, int only_value, c_strlen_data *data, 
unsigned eltsize)
 {
   /* Suppress multiple warnings for propagated constant strings.  */
   if (only_value != 2
- && !warning_suppressed_p (arg, OPT_Warray_bounds)
- && warning_at (loc, OPT_Warray_bounds,
+ && !warning_suppressed_p (arg, OPT_Warray_bounds_)
+ && warning_at (loc, OPT_Warray_bounds_,
 "offset %qwi outside bounds of constant string",
 eltoff))
{
  if (decl)
inform (DECL_SOURCE_LOCATION (decl), "%qE declared here", decl);
- suppress_warning (arg, OPT_Warray_bounds);
+ suppress_warning (arg, OPT_Warray_bounds_);
}
   return NULL_TREE;
 }
diff --git a/gcc/c-family/c-common.cc b/gcc/c-family/c-common.cc
index 71507d4cb0a..cf423d384f6 100644
--- a/gcc/c-family/c-common.cc
+++ b/gcc/c-family/c-common.cc
@@ -6811,7 +6811,7 @@ fold_offsetof (tree expr, tree type, enum tree_code ctx)
 definition thereof.  */
  if (TREE_CODE (v) == ARRAY_REF
  || TREE_CODE (v) == COMPONENT_REF)
-   warning (OPT_Warray_bounds,
+   warning (OPT_Warray_bounds_,
 "index %E denotes an offset "
 "greater than size of %qT",
 t, TREE_TYPE (TREE_OPERAND (expr, 0)));
@@ -8530,9 +8530,9 @@ convert_vector_to_array_for_subscript (location_t loc,
 
   index = fold_for_warn (index);
   if (TREE_CODE (index) == INTEGER_CST)
-if (!tree_fits_uhwi_p (index)
+   if (!tree_fits_uhwi_p (index)
|| maybe_ge (tree_to_uhwi (index), TYPE_VECTOR_SUBPARTS (type)))
-  warning_at (loc, OPT_Warray_bounds, "index value is out of bound");
+ warning_at (loc, OPT_Warray_bounds_, "index value is out of bound");
 
   /* We are building an ARRAY_REF so mark the vector as addressable
  to not run into the gimplifiers premature setting of DECL_GIMP

[PATCH V2] Update block move for struct param or returns

2022-11-24 Thread Jiufu Guo via Gcc-patches
Hi,

When assigning a parameter to a variable, or assigning a variable to
the return value, with a struct type, a "block move" is used to expand
the assignment.  It would be better to move the blocks using the
register mode according to the target/ABI, which would then raise more
opportunities for other optimization passes (cse/dse/xprop).

Consider the example code below (like the code in PR65421):

typedef struct SA {double a[3];} A;
A ret_arg_pt (A *a){return *a;} // on ppc64le, only 3 lfd(s)
A ret_arg (A a) {return a;} // just empty fun body
void st_arg (A a, A *p) {*p = a;} //only 3 stfd(s)

This patch is based on the previous version, which supports assignments
from a parameter:
https://gcc.gnu.org/pipermail/gcc-patches/2022-November/605709.html
This patch also supports returns.

I also tried to update gimplify/nrv to replace "return D.xxx;" with
"return <retval>;".  However, there is one issue: "<retval>" with
PARALLEL code cannot be accessed through address/component_ref.
This issue blocks a few passes (e.g. sra, expand).

On ppc64, some dead stores are not eliminated. e.g. for ret_arg:
.cfi_startproc
std 4,56(1)//redundant
std 5,64(1)//redundant
std 6,72(1)//redundant
std 4,0(3)
std 5,8(3)
std 6,16(3)
blr

Bootstrapped and regtested on ppc64le and x86_64.

I'm wondering if this patch could be committed first.
Thanks for the comments and suggestions.


BR,
Jeff (Jiufu)

PR target/65421

gcc/ChangeLog:

* cfgexpand.cc (expand_used_vars): Add collecting return VARs.
(expand_gimple_stmt_1): Call expand_special_struct_assignment.
(pass_expand::execute): Free collections of return VARs.
* expr.cc (expand_special_struct_assignment): New function.
* expr.h (expand_special_struct_assignment): Declare.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr65421-1.c: New test.
* gcc.target/powerpc/pr65421.c: New test.

---
 gcc/cfgexpand.cc | 37 +
 gcc/expr.cc  | 43 
 gcc/expr.h   |  3 ++
 gcc/testsuite/gcc.target/powerpc/pr65421-1.c | 21 ++
 gcc/testsuite/gcc.target/powerpc/pr65421.c   | 19 +
 5 files changed, 123 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr65421-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr65421.c

diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
index dd29c03..f185de39341 100644
--- a/gcc/cfgexpand.cc
+++ b/gcc/cfgexpand.cc
@@ -341,6 +341,9 @@ static hash_map<tree, size_t> *decl_to_stack_part;
all of them in one big sweep.  */
 static bitmap_obstack stack_var_bitmap_obstack;
 
+/* Those VARs on returns.  */
+static bitmap return_vars;
+
 /* An array of indices such that stack_vars[stack_vars_sorted[i]].size
is non-decreasing.  */
 static size_t *stack_vars_sorted;
@@ -2158,6 +2161,24 @@ expand_used_vars (bitmap forced_stack_vars)
 frame_phase = off ? align - off : 0;
   }
 
+  /* Collect VARs on returns.  */
+  return_vars = NULL;
+  if (DECL_RESULT (current_function_decl)
+  && TYPE_MODE (TREE_TYPE (DECL_RESULT (current_function_decl))) == 
BLKmode)
+{
+  return_vars = BITMAP_ALLOC (NULL);
+
+  edge_iterator ei;
+  edge e;
+  FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)
+   if (greturn *ret = safe_dyn_cast <greturn *> (last_stmt (e->src)))
+ {
+   tree val = gimple_return_retval (ret);
+   if (val && VAR_P (val))
+ bitmap_set_bit (return_vars, DECL_UID (val));
+ }
+}
+
   /* Set TREE_USED on all variables in the local_decls.  */
   FOR_EACH_LOCAL_DECL (cfun, i, var)
 TREE_USED (var) = 1;
@@ -3942,6 +3963,17 @@ expand_gimple_stmt_1 (gimple *stmt)
  /* This is a clobber to mark the going out of scope for
 this LHS.  */
  expand_clobber (lhs);
+   else if ((TREE_CODE (rhs) == PARM_DECL && DECL_INCOMING_RTL (rhs)
+ && TYPE_MODE (TREE_TYPE (rhs)) == BLKmode
+ && (GET_CODE (DECL_INCOMING_RTL (rhs)) == PARALLEL
+ || REG_P (DECL_INCOMING_RTL (rhs
+|| (VAR_P (lhs) && return_vars
+&& DECL_RTL_SET_P (DECL_RESULT (current_function_decl))
+&& GET_CODE (
+ DECL_RTL (DECL_RESULT (current_function_decl)))
+ == PARALLEL
+&& bitmap_bit_p (return_vars, DECL_UID (lhs
+ expand_special_struct_assignment (lhs, rhs);
else
  expand_assignment (lhs, rhs,
 gimple_assign_nontemporal_move_p (
@@ -7025,6 +7057,11 @@ pass_expand::execute (function *fun)
   /* After expanding, the return labels are no longer needed. */
   return_label = NULL;
   naked_return_label = NULL;
+  if (return_vars)
+{
+  BITMAP_FREE (return_vars);
+  return_var

[committed] testsuite: Fix up broken testcase [PR107127]

2022-11-24 Thread Jakub Jelinek via Gcc-patches
Hi!

On Thu, Nov 24, 2022 at 09:37:54AM +0800, haochen.jiang wrote:
> 8a0fce6a51915c29584427fd376b40073c328090 is the first bad commit
> commit 8a0fce6a51915c29584427fd376b40073c328090
> Author: Jakub Jelinek 
> Date:   Wed Nov 23 19:09:31 2022 +0100
> 
> c: Fix compile time hog in c_genericize [PR107127]
> 
> caused
> 
> FAIL: gcc.dg/pr107127.c (test for excess errors)

Oops, sorry.
I've added the { dg-options "" } line manually in the patch but
forgot to adjust the number of added lines.

Tested on x86_64-linux, committed to trunk as obvious.

2022-11-24  Jakub Jelinek  

PR c/107127
* gcc.dg/pr107127.c (foo): Add missing closing }.

--- gcc/testsuite/gcc.dg/pr107127.c.jj  2022-11-23 19:09:24.670333114 +0100
+++ gcc/testsuite/gcc.dg/pr107127.c 2022-11-24 10:31:48.208143681 +0100
@@ -10,3 +10,4 @@ foo (_Complex double a, double b, double
   return v[0] / c * (0 - 0 / a + 699.0 + 7.05 - 286.0 - +-4.65 + 1.57 + 0) 
* 0.1 - 3.28 + 4.22 + 0.1)) * b + 5.06)
 * 1.23 * 8.0 * 12.0 * 16.0 * 2.0 * 2.0 * 0.25 * 0.125 * 18.2 * 
-15.25 * 0.0001
 * 42.0 * 0.012 - 8.45 + 0 + 88.0 + 6.96 + 867.0 + 9.10 - 7.04 
* -1.0);
+}


Jakub



Re: [PATCH] libstdc++: Workaround buggy printf on Solaris in to_chars/float128_c++23.cc test [PR107815]

2022-11-24 Thread Jonathan Wakely via Gcc-patches
On Thu, 24 Nov 2022 at 09:20, Jakub Jelinek wrote:
>
> Hi!
>
> As mentioned in the PR, Solaris apparently can correctly handle
> printf ("%.0Lf\n", 1e+202L * __DBL_MAX__);
> which prints a 511-character number, but can't handle
> printf ("%.0Lf\n", 1e+203L * __DBL_MAX__);
> nor
> printf ("%.0Lf\n", __LDBL_MAX__);
> properly; instead of printing a 512-character number for the former and
> a 4933-character number for the latter, it handles them as
> if the user had asked for "%.0Le\n" in those cases.
>
> The following patch disables the single problematic value that fails
> in the test, and also fixes commented out debugging printouts.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK, thanks


>
> 2022-11-24  Jakub Jelinek  
>
> PR libstdc++/107815
> * testsuite/20_util/to_chars/float128_c++23.cc (test): Disable
> __FLT128_MAX__ test on Solaris.  Fix up commented out debugging
> printouts.
>
> --- libstdc++-v3/testsuite/20_util/to_chars/float128_c++23.cc.jj
> 2022-11-08 11:19:22.251768167 +0100
> +++ libstdc++-v3/testsuite/20_util/to_chars/float128_c++23.cc   2022-11-23 
> 17:02:22.380051796 +0100
> @@ -52,14 +52,17 @@ test(std::chars_format fmt = std::chars_
>  std::numbers::inv_sqrt3_v,
>  std::numbers::egamma_v,
>  std::numbers::phi_v,
> +// Solaris has non-conforming printf, see PR98384 and PR107815.
> +#if !(defined(__sun__) && defined(__svr4__))
>  std::numeric_limits::max()
> +#endif
>};
>char str1[1], str2[1];
>for (auto u : tests)
>  {
>auto [ptr1, ec1] = std::to_chars(str1, str1 + sizeof(str1), u, fmt);
>VERIFY( ec1 == std::errc() );
> -//std::cout << i << ' ' << std::string_view (str1, ptr1) << '\n';
> +//std::cout << u << ' ' << std::string_view (str1, ptr1) << '\n';
>if (fmt == std::chars_format::fixed)
> {
>   auto [ptr2, ec2] = std::to_chars(str2, str2 + (ptr1 - str1), u, 
> fmt);
> @@ -76,7 +79,7 @@ test(std::chars_format fmt = std::chars_
>
>auto [ptr5, ec5] = std::to_chars(str1, str1 + sizeof(str1), u, fmt, 
> 90);
>VERIFY( ec5 == std::errc() );
> -//std::cout << i << ' ' << std::string_view (str1, ptr5) << '\n';
> +//std::cout << u << ' ' << std::string_view (str1, ptr5) << '\n';
>v = 4.0f128;
>auto [ptr6, ec6] = std::from_chars(str1, ptr5, v,
>  fmt == std::chars_format{}
>
> Jakub
>



Re: [PATCH] libstdc++: Another merge from fast_float upstream [PR107468]

2022-11-24 Thread Jonathan Wakely via Gcc-patches
On Thu, 24 Nov 2022 at 09:23, Jakub Jelinek wrote:
>
> Hi!
>
> Upstream fast_float came up with a cheaper test for
> fegetround () == FE_TONEAREST using one float addition, one subtraction
> and one comparison.  If we know we are rounding to nearest, we can use
> the fast path in more cases than before.
> The following patch merges those changes into libstdc++.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK, thanks.


>
> 2022-11-24  Jakub Jelinek  
>
> PR libstdc++/107468
> * src/c++17/fast_float/MERGE: Adjust for merge from upstream.
> * src/c++17/fast_float/fast_float.h: Merge from fast_float
> 2ef9abbcf6a11958b6fa685a89d0150022e82e78 commit.
>
> --- libstdc++-v3/src/c++17/fast_float/MERGE.jj  2022-11-07 15:17:14.035071694 
> +0100
> +++ libstdc++-v3/src/c++17/fast_float/MERGE 2022-11-23 17:09:20.940866070 
> +0100
> @@ -1,4 +1,4 @@
> -662497742fea7055f0e0ee27e5a7ddc382c2c38e
> +2ef9abbcf6a11958b6fa685a89d0150022e82e78
>
>  The first line of this file holds the git revision number of the
>  last merge done from the master library sources.
> --- libstdc++-v3/src/c++17/fast_float/fast_float.h.jj   2022-11-07 
> 15:17:14.066071268 +0100
> +++ libstdc++-v3/src/c++17/fast_float/fast_float.h  2022-11-23 
> 17:19:41.735693122 +0100
> @@ -99,11 +99,11 @@ from_chars_result from_chars_advanced(co
> || defined(__MINGW64__)  \
> || defined(__s390x__)\
> || (defined(__ppc64__) || defined(__PPC64__) || defined(__ppc64le__) 
> || defined(__PPC64LE__)) )
> -#define FASTFLOAT_64BIT
> +#define FASTFLOAT_64BIT 1
>  #elif (defined(__i386) || defined(__i386__) || defined(_M_IX86)   \
>   || defined(__arm__) || defined(_M_ARM)   \
>   || defined(__MINGW32__) || defined(__EMSCRIPTEN__))
> -#define FASTFLOAT_32BIT
> +#define FASTFLOAT_32BIT 1
>  #else
>// Need to check incrementally, since SIZE_MAX is a size_t, avoid overflow.
>// We can never tell the register width, but the SIZE_MAX is a good 
> approximation.
> @@ -111,9 +111,9 @@ from_chars_result from_chars_advanced(co
>#if SIZE_MAX == 0x
>  #error Unknown platform (16-bit, unsupported)
>#elif SIZE_MAX == 0x
> -#define FASTFLOAT_32BIT
> +#define FASTFLOAT_32BIT 1
>#elif SIZE_MAX == 0x
> -#define FASTFLOAT_64BIT
> +#define FASTFLOAT_64BIT 1
>#else
>  #error Unknown platform (not 32-bit, not 64-bit?)
>#endif
> @@ -359,10 +359,12 @@ template  struct binary_form
>static inline constexpr int minimum_exponent();
>static inline constexpr int infinite_power();
>static inline constexpr int sign_index();
> +  static inline constexpr int min_exponent_fast_path(); // used when 
> fegetround() == FE_TONEAREST
>static inline constexpr int max_exponent_fast_path();
>static inline constexpr int max_exponent_round_to_even();
>static inline constexpr int min_exponent_round_to_even();
>static inline constexpr uint64_t max_mantissa_fast_path(int64_t power);
> +  static inline constexpr uint64_t max_mantissa_fast_path(); // used when 
> fegetround() == FE_TONEAREST
>static inline constexpr int largest_power_of_ten();
>static inline constexpr int smallest_power_of_ten();
>static inline constexpr T exact_power_of_ten(int64_t power);
> @@ -372,6 +374,22 @@ template  struct binary_form
>static inline constexpr equiv_uint hidden_bit_mask();
>  };
>
> +template <> inline constexpr int 
> binary_format::min_exponent_fast_path() {
> +#if (FLT_EVAL_METHOD != 1) && (FLT_EVAL_METHOD != 0)
> +  return 0;
> +#else
> +  return -22;
> +#endif
> +}
> +
> +template <> inline constexpr int 
> binary_format::min_exponent_fast_path() {
> +#if (FLT_EVAL_METHOD != 1) && (FLT_EVAL_METHOD != 0)
> +  return 0;
> +#else
> +  return -10;
> +#endif
> +}
> +
>  template <> inline constexpr int 
> binary_format::mantissa_explicit_bits() {
>return 52;
>  }
> @@ -418,13 +436,18 @@ template <> inline constexpr int binary_
>  template <> inline constexpr int 
> binary_format::max_exponent_fast_path() {
>return 10;
>  }
> -
> +template <> inline constexpr uint64_t 
> binary_format::max_mantissa_fast_path() {
> +  return uint64_t(2) << mantissa_explicit_bits();
> +}
>  template <> inline constexpr uint64_t 
> binary_format::max_mantissa_fast_path(int64_t power) {
>// caller is responsible to ensure that
>// power >= 0 && power <= 22
>//
>return max_mantissa_double[power];
>  }
> +template <> inline constexpr uint64_t 
> binary_format::max_mantissa_fast_path() {
> +  return uint64_t(2) << mantissa_explicit_bits();
> +}
>  template <> inline constexpr uint64_t 
> binary_format::max_mantissa_fast_path(int64_t power) {
>// caller is responsible to ensure that
>// power >= 0 && power <= 10
> @@ -619,10 +642,6 @@ parsed_number_string parse_number_string
>
>uint64_t i = 0; // an unsign

[PATCH] asan: Fix up error recovery for too large frames [PR107317]

2022-11-24 Thread Jakub Jelinek via Gcc-patches
Hi!

asan_emit_stack_protection and the functions it calls have various asserts
that verify the sanity of the stack protection instrumentation.  But that
verification can easily fail if we've diagnosed a frame offset overflow.
asan_emit_stack_protection just emits some extra code in the prologue;
if we've reported errors, we aren't producing assembly, so it doesn't
really matter if we don't include the protection code, as compilation
is going to fail anyway.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2022-11-24  Jakub Jelinek  

PR middle-end/107317
* asan.cc: Include diagnostic-core.h.
(asan_emit_stack_protection): Return NULL early if seen_error ().

* gcc.dg/asan/pr107317.c: New test.

--- gcc/asan.cc.jj  2022-06-28 13:03:30.613693889 +0200
+++ gcc/asan.cc 2022-11-23 17:47:09.130332461 +0100
@@ -64,6 +64,7 @@ along with GCC; see the file COPYING3.
 #include "tree-inline.h"
 #include "tree-ssa.h"
 #include "tree-eh.h"
+#include "diagnostic-core.h"
 
 /* AddressSanitizer finds out-of-bounds and use-after-free bugs
with <2x slowdown on average.
@@ -1818,6 +1819,11 @@ asan_emit_stack_protection (rtx base, rt
   tree str_cst, decl, id;
   int use_after_return_class = -1;
 
+  /* Don't emit anything when doing error recovery, the assertions
+ might fail e.g. if a function had a frame offset overflow.  */
+  if (seen_error ())
+return NULL;
+
   if (shadow_ptr_types[0] == NULL_TREE)
 asan_init_shadow_ptr_types ();
 
--- gcc/testsuite/gcc.dg/asan/pr107317.c.jj 2022-11-23 17:46:09.145219960 
+0100
+++ gcc/testsuite/gcc.dg/asan/pr107317.c2022-11-23 17:49:45.148024097 
+0100
@@ -0,0 +1,13 @@
+/* PR middle-end/107317 */
+/* { dg-do compile { target ilp32 } } */
+/* { dg-options "-fsanitize=address -ffat-lto-objects" } */
+
+void bar (float *, float *);
+
+void
+foo (void) /* { dg-error "exceeds maximum" } */
+{
+  float a[4];
+  float b[2];
+  bar (a, b);
+}

Jakub



Re: [PATCH] vect: Fold LEN_{LOAD, STORE} if it's for the whole vector [PR107412]

2022-11-24 Thread Richard Sandiford via Gcc-patches
"Kewen.Lin"  writes:
> Hi,
>
> As the test case in PR107412 shows, we can fold IFN .LEN_{LOAD,
> STORE} into normal vector load/store if the given length is known
> to be equal to the length of the whole vector.  It would help to
> improve overall cycles, as normally the latency of a vector access
> with length in bytes is bigger than that of a normal vector access, and
> it also saves the preparation of the length if a constant length cannot
> be encoded into the instruction (such as on power).
>
> Bootstrapped and regtested on x86_64-redhat-linux,
> aarch64-linux-gnu and powerpc64{,le}-linux-gnu.
>
> Is it ok for trunk?
>
> BR,
> Kewen
> -
>   PR tree-optimization/107412
>
> gcc/ChangeLog:
>
>   * gimple-fold.cc (gimple_fold_mask_load_store_mem_ref): Rename to ...
>   (gimple_fold_partial_load_store_mem_ref): ... this, add one parameter
>   mask_p indicating it's for mask or length, and add some handlings for
>   IFN LEN_{LOAD,STORE}.
>   (gimple_fold_mask_load): Rename to ...
>   (gimple_fold_partial_load): ... this, add one parameter mask_p.
>   (gimple_fold_mask_store): Rename to ...
>   (gimple_fold_partial_store): ... this, add one parameter mask_p.
>   (gimple_fold_call): Add the handlings for IFN LEN_{LOAD,STORE},
>   and adjust calls on gimple_fold_mask_load_store_mem_ref to
>   gimple_fold_partial_load_store_mem_ref.

Sorry to reply so late (still catching up on email), but:

>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/powerpc/pr107412.c: New test.
>   * gcc.target/powerpc/p9-vec-length-epil-8.c: Adjust scan times for
>   folded LEN_LOAD.
> ---
>  gcc/gimple-fold.cc| 57 ++-
>  .../gcc.target/powerpc/p9-vec-length-epil-8.c |  2 +-
>  gcc/testsuite/gcc.target/powerpc/pr107412.c   | 19 +++
>  3 files changed, 64 insertions(+), 14 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr107412.c
>
> diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
> index a1704784bc9..e3a087defa6 100644
> --- a/gcc/gimple-fold.cc
> +++ b/gcc/gimple-fold.cc
> @@ -5370,19 +5370,39 @@ arith_overflowed_p (enum tree_code code, const_tree 
> type,
>return wi::min_precision (wres, sign) > TYPE_PRECISION (type);
>  }
>
> -/* If IFN_MASK_LOAD/STORE call CALL is unconditional, return a MEM_REF
> +/* If IFN_{MASK,LEN}_LOAD/STORE call CALL is unconditional, return a MEM_REF
> for the memory it references, otherwise return null.  VECTYPE is the
> -   type of the memory vector.  */
> +   type of the memory vector.  MASK_P indicates it's for MASK if true,
> +   otherwise it's for LEN.  */
>
>  static tree
> -gimple_fold_mask_load_store_mem_ref (gcall *call, tree vectype)
> +gimple_fold_partial_load_store_mem_ref (gcall *call, tree vectype, bool 
> mask_p)
>  {
>tree ptr = gimple_call_arg (call, 0);
>tree alias_align = gimple_call_arg (call, 1);
> -  tree mask = gimple_call_arg (call, 2);
> -  if (!tree_fits_uhwi_p (alias_align) || !integer_all_onesp (mask))
> +  if (!tree_fits_uhwi_p (alias_align))
>  return NULL_TREE;
>
> +  if (mask_p)
> +{
> +  tree mask = gimple_call_arg (call, 2);
> +  if (!integer_all_onesp (mask))
> + return NULL_TREE;
> +} else {

Minor nit: }, else, and { should be on separate lines.  But the thing
I actually wanted to say was...

> +  tree basic_len = gimple_call_arg (call, 2);
> +  if (!tree_fits_uhwi_p (basic_len))
> + return NULL_TREE;
> +  unsigned int nargs = gimple_call_num_args (call);
> +  tree bias = gimple_call_arg (call, nargs - 1);
> +  gcc_assert (tree_fits_uhwi_p (bias));
> +  tree biased_len = int_const_binop (MINUS_EXPR, basic_len, bias);
> +  unsigned int len = tree_to_uhwi (biased_len);
> +  unsigned int vect_len
> + = GET_MODE_SIZE (TYPE_MODE (vectype)).to_constant ();
> +  if (vect_len != len)
> + return NULL_TREE;

Using "unsigned int" truncates the value.  I realise that's probably
safe in this context, since large values have undefined behaviour.
But it still seems better to use an untruncated type, so that it
looks less like an oversight.  (I think this is one case where "auto"
can help, since it gets the type right automatically.)

It would also be better to avoid the to_constant, since we haven't
proven is_constant.  How about:

  tree basic_len = gimple_call_arg (call, 2);
  if (!poly_int_tree_p (basic_len))
    return NULL_TREE;
  unsigned int nargs = gimple_call_num_args (call);
  tree bias = gimple_call_arg (call, nargs - 1);
  gcc_assert (TREE_CODE (bias) == INTEGER_CST);
  if (maybe_ne (wi::to_poly_widest (basic_len) - wi::to_widest (bias),
                GET_MODE_SIZE (TYPE_MODE (vectype))))
    return NULL_TREE;

which also avoids using tree arithmetic for the subtraction?

Thanks,
Richard


> +}
> +
>unsigned HOST_WIDE_INT align = tree_to_uhwi (alias_align);
>if (TYPE_ALIGN (vectype) != align)
>  vectype = build_aligned_t

[PATCH] libstdc++: Another merge from fast_float upstream [PR107468]

2022-11-24 Thread Jakub Jelinek via Gcc-patches
Hi!

Upstream fast_float came up with a cheaper test for
fegetround () == FE_TONEAREST using one float addition, one subtraction
and one comparison.  If we know we are rounding to nearest, we can use
the fast path in more cases than before.
The following patch merges those changes into libstdc++.
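
For reference, the idea is roughly the following; this is just a minimal
self-contained sketch of the trick (not the exact upstream code):

#include <cfenv>
#include <cstdio>
#include <limits>

// Under FE_TONEAREST both (tiny + 1.0f) and (1.0f - tiny) round to 1.0f,
// so they compare equal; under upward/downward/toward-zero rounding one
// of them is nudged off 1.0f and the comparison fails.
static bool
rounds_to_nearest_sketch ()
{
  static volatile float tiny_v = std::numeric_limits<float>::min ();
  float tiny = tiny_v;  // volatile read so the test isn't constant-folded
  return (tiny + 1.0f) == (1.0f - tiny);
}

int
main ()
{
  std::fesetround (FE_DOWNWARD);
  std::printf ("%d\n", rounds_to_nearest_sketch ());  // expect 0
  std::fesetround (FE_TONEAREST);
  std::printf ("%d\n", rounds_to_nearest_sketch ());  // expect 1
}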

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2022-11-24  Jakub Jelinek  

PR libstdc++/107468
* src/c++17/fast_float/MERGE: Adjust for merge from upstream.
* src/c++17/fast_float/fast_float.h: Merge from fast_float
2ef9abbcf6a11958b6fa685a89d0150022e82e78 commit.

--- libstdc++-v3/src/c++17/fast_float/MERGE.jj  2022-11-07 15:17:14.035071694 
+0100
+++ libstdc++-v3/src/c++17/fast_float/MERGE 2022-11-23 17:09:20.940866070 
+0100
@@ -1,4 +1,4 @@
-662497742fea7055f0e0ee27e5a7ddc382c2c38e
+2ef9abbcf6a11958b6fa685a89d0150022e82e78
 
 The first line of this file holds the git revision number of the
 last merge done from the master library sources.
--- libstdc++-v3/src/c++17/fast_float/fast_float.h.jj   2022-11-07 
15:17:14.066071268 +0100
+++ libstdc++-v3/src/c++17/fast_float/fast_float.h  2022-11-23 
17:19:41.735693122 +0100
@@ -99,11 +99,11 @@ from_chars_result from_chars_advanced(co
|| defined(__MINGW64__)  \
|| defined(__s390x__)\
|| (defined(__ppc64__) || defined(__PPC64__) || defined(__ppc64le__) || 
defined(__PPC64LE__)) )
-#define FASTFLOAT_64BIT
+#define FASTFLOAT_64BIT 1
 #elif (defined(__i386) || defined(__i386__) || defined(_M_IX86)   \
  || defined(__arm__) || defined(_M_ARM)   \
  || defined(__MINGW32__) || defined(__EMSCRIPTEN__))
-#define FASTFLOAT_32BIT
+#define FASTFLOAT_32BIT 1
 #else
   // Need to check incrementally, since SIZE_MAX is a size_t, avoid overflow.
   // We can never tell the register width, but the SIZE_MAX is a good 
approximation.
@@ -111,9 +111,9 @@ from_chars_result from_chars_advanced(co
   #if SIZE_MAX == 0x
 #error Unknown platform (16-bit, unsupported)
   #elif SIZE_MAX == 0x
-#define FASTFLOAT_32BIT
+#define FASTFLOAT_32BIT 1
   #elif SIZE_MAX == 0x
-#define FASTFLOAT_64BIT
+#define FASTFLOAT_64BIT 1
   #else
 #error Unknown platform (not 32-bit, not 64-bit?)
   #endif
@@ -359,10 +359,12 @@ template  struct binary_form
   static inline constexpr int minimum_exponent();
   static inline constexpr int infinite_power();
   static inline constexpr int sign_index();
+  static inline constexpr int min_exponent_fast_path(); // used when 
fegetround() == FE_TONEAREST
   static inline constexpr int max_exponent_fast_path();
   static inline constexpr int max_exponent_round_to_even();
   static inline constexpr int min_exponent_round_to_even();
   static inline constexpr uint64_t max_mantissa_fast_path(int64_t power);
+  static inline constexpr uint64_t max_mantissa_fast_path(); // used when 
fegetround() == FE_TONEAREST
   static inline constexpr int largest_power_of_ten();
   static inline constexpr int smallest_power_of_ten();
   static inline constexpr T exact_power_of_ten(int64_t power);
@@ -372,6 +374,22 @@ template  struct binary_form
   static inline constexpr equiv_uint hidden_bit_mask();
 };
 
+template <> inline constexpr int 
binary_format::min_exponent_fast_path() {
+#if (FLT_EVAL_METHOD != 1) && (FLT_EVAL_METHOD != 0)
+  return 0;
+#else
+  return -22;
+#endif
+}
+
+template <> inline constexpr int 
binary_format::min_exponent_fast_path() {
+#if (FLT_EVAL_METHOD != 1) && (FLT_EVAL_METHOD != 0)
+  return 0;
+#else
+  return -10;
+#endif
+}
+
 template <> inline constexpr int 
binary_format::mantissa_explicit_bits() {
   return 52;
 }
@@ -418,13 +436,18 @@ template <> inline constexpr int binary_
 template <> inline constexpr int 
binary_format::max_exponent_fast_path() {
   return 10;
 }
-
+template <> inline constexpr uint64_t 
binary_format::max_mantissa_fast_path() {
+  return uint64_t(2) << mantissa_explicit_bits();
+}
 template <> inline constexpr uint64_t 
binary_format::max_mantissa_fast_path(int64_t power) {
   // caller is responsible to ensure that
   // power >= 0 && power <= 22
   //
   return max_mantissa_double[power];
 }
+template <> inline constexpr uint64_t 
binary_format::max_mantissa_fast_path() {
+  return uint64_t(2) << mantissa_explicit_bits();
+}
 template <> inline constexpr uint64_t 
binary_format::max_mantissa_fast_path(int64_t power) {
   // caller is responsible to ensure that
   // power >= 0 && power <= 10
@@ -619,10 +642,6 @@ parsed_number_string parse_number_string
 
   uint64_t i = 0; // an unsigned int avoids signed overflows (which are bad)
 
-  while ((std::distance(p, pend) >= 8) && is_made_of_eight_digits_fast(p)) {
-i = i * 1 + parse_eight_digits_unrolled(p); // in rare cases, this 
will overflow, but that's ok
-p += 8;
-  }
   while ((p != pend) && is_integer(*p)) {
 // a multi

[PATCH 8/9] rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p4

2022-11-24 Thread Kewen Lin via Gcc-patches
The current handling in rs6000_emit_vector_compare is a bit
complicated to me, especially after we emit the vector float
comparison insn with the given code directly.  So it's better
to refactor the handling of vector integer comparison here.

This is part 4; it reworks the handling of GE/GEU/LE/LEU and
also makes the function no longer recursive.  This patch
doesn't introduce any functionality change.

gcc/ChangeLog:

* config/rs6000/rs6000.cc (rs6000_emit_vector_compare): Refine the
handlings for operators GE/GEU/LE/LEU.
---
 gcc/config/rs6000/rs6000.cc | 87 -
 1 file changed, 17 insertions(+), 70 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index b4ca7b3d1b1..a3645e321a7 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -15640,7 +15640,7 @@ output_cbranch (rtx op, const char *label, int 
reversed, rtx_insn *insn)
 }
 
 /* Emit vector compare for operands OP0 and OP1 using code RCODE.
-   DMODE is expected destination mode. This is a recursive function.  */
+   DMODE is expected destination mode.  */
 
 static rtx
 rs6000_emit_vector_compare (enum rtx_code rcode,
@@ -15649,7 +15649,7 @@ rs6000_emit_vector_compare (enum rtx_code rcode,
 {
   gcc_assert (VECTOR_UNIT_ALTIVEC_OR_VSX_P (dmode));
   gcc_assert (GET_MODE (op0) == GET_MODE (op1));
-  rtx mask;
+  rtx mask = gen_reg_rtx (dmode);
 
   /* In vector.md, we support all kinds of vector float point
  comparison operators in a comparison rtl pattern, we can
@@ -15658,7 +15658,6 @@ rs6000_emit_vector_compare (enum rtx_code rcode,
  of raising invalid exception.  */
   if (GET_MODE_CLASS (dmode) == MODE_VECTOR_FLOAT)
 {
-  mask = gen_reg_rtx (dmode);
   emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (rcode, dmode, op0, op1)));
   return mask;
 }
@@ -15667,11 +15666,7 @@ rs6000_emit_vector_compare (enum rtx_code rcode,
  have direct hardware instructions, just emit it directly
  here.  */
   if (rcode == EQ || rcode == GT || rcode == GTU)
-{
-  mask = gen_reg_rtx (dmode);
-  emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (rcode, dmode, op0, op1)));
-  return mask;
-}
+emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (rcode, dmode, op0, op1)));
   else if (rcode == LT || rcode == LTU)
 {
   /* lt{,u}(a,b) = gt{,u}(b,a)  */
@@ -15679,76 +15674,28 @@ rs6000_emit_vector_compare (enum rtx_code rcode,
   std::swap (op0, op1);
   mask = gen_reg_rtx (dmode);
   emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (code, dmode, op0, op1)));
-  return mask;
 }
-  else if (rcode == NE)
+  else if (rcode == NE || rcode == LE || rcode == LEU)
 {
-  /* ne(a,b) = ~eq(a,b)  */
+  /* ne(a,b) = ~eq(a,b); le{,u}(a,b) = ~gt{,u}(a,b)  */
+  enum rtx_code code = reverse_condition (rcode);
   mask = gen_reg_rtx (dmode);
-  emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (EQ, dmode, op0, op1)));
+  emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (code, dmode, op0, op1)));
+  enum insn_code nor_code = optab_handler (one_cmpl_optab, dmode);
+  gcc_assert (nor_code != CODE_FOR_nothing);
+  emit_insn (GEN_FCN (nor_code) (mask, mask));
+} else {
+  /* ge{,u}(a,b) = ~gt{,u}(b,a)  */
+  gcc_assert (rcode == GE || rcode == GEU);
+  enum rtx_code code = rcode == GE ? GT : GTU;
+  mask = gen_reg_rtx (dmode);
+  emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (code, dmode, op0, op1)));
   enum insn_code nor_code = optab_handler (one_cmpl_optab, dmode);
   gcc_assert (nor_code != CODE_FOR_nothing);
   emit_insn (GEN_FCN (nor_code) (mask, mask));
-  return mask;
-}
-
-  switch (rcode)
-{
-case GE:
-case GEU:
-case LE:
-case LEU:
-  /* Try GT/GTU/LT/LTU OR EQ */
-  {
-   rtx c_rtx, eq_rtx;
-   enum insn_code ior_code;
-   enum rtx_code new_code;
-
-   switch (rcode)
- {
- case  GE:
-   new_code = GT;
-   break;
-
- case GEU:
-   new_code = GTU;
-   break;
-
- case LE:
-   new_code = LT;
-   break;
-
- case LEU:
-   new_code = LTU;
-   break;
-
- default:
-   gcc_unreachable ();
- }
-
-   ior_code = optab_handler (ior_optab, dmode);
-   if (ior_code == CODE_FOR_nothing)
- return NULL_RTX;
-
-   c_rtx = rs6000_emit_vector_compare (new_code, op0, op1, dmode);
-   if (!c_rtx)
- return NULL_RTX;
-
-   eq_rtx = rs6000_emit_vector_compare (EQ, op0, op1, dmode);
-   if (!eq_rtx)
- return NULL_RTX;
-
-   mask = gen_reg_rtx (dmode);
-   emit_insn (GEN_FCN (ior_code) (mask, c_rtx, eq_rtx));
-   return mask;
-  }
-  break;
-default:
-  return NULL_RTX;
 }
 
-  /* You only get two chances.  */
-  return NULL_RTX;
+  return mask;
 }
 
 /* Emit vector conditional expression.  DEST is destination. 

[PATCH 5/9] rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p1

2022-11-24 Thread Kewen Lin via Gcc-patches
The current handling in rs6000_emit_vector_compare is a bit
complicated to me, especially after we emit the vector float
comparison insn with the given code directly.  So it's better
to refactor the handling of vector integer comparison here.

This is part 1; it removes the helper function
rs6000_emit_vector_compare_inner and moves the logic into
rs6000_emit_vector_compare.  This patch doesn't introduce any
functionality change.

gcc/ChangeLog:

* config/rs6000/rs6000.cc (rs6000_emit_vector_compare_inner): Remove.
(rs6000_emit_vector_compare): Emit rtx comparison for operators EQ/
GT/GTU directly.
---
 gcc/config/rs6000/rs6000.cc | 37 +
 1 file changed, 9 insertions(+), 28 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 94e039649f5..0a5826800c5 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -15639,30 +15639,6 @@ output_cbranch (rtx op, const char *label, int 
reversed, rtx_insn *insn)
   return string;
 }
 
-/* Return insn for VSX or Altivec comparisons.  */
-
-static rtx
-rs6000_emit_vector_compare_inner (enum rtx_code code, rtx op0, rtx op1)
-{
-  machine_mode mode = GET_MODE (op0);
-  gcc_assert (GET_MODE_CLASS (mode) != MODE_VECTOR_FLOAT);
-
-  switch (code)
-{
-default:
-  break;
-
-case EQ:
-case GT:
-case GTU:
-  rtx mask = gen_reg_rtx (mode);
-  emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (code, mode, op0, op1)));
-  return mask;
-}
-
-  return NULL_RTX;
-}
-
 /* Emit vector compare for operands OP0 and OP1 using code RCODE.
DMODE is expected destination mode. This is a recursive function.  */
 
@@ -15687,10 +15663,15 @@ rs6000_emit_vector_compare (enum rtx_code rcode,
   return mask;
 }
 
-  /* See if the comparison works as is.  */
-  mask = rs6000_emit_vector_compare_inner (rcode, op0, op1);
-  if (mask)
-return mask;
+  /* For any of vector integer comparison operators for which we
+ have direct hardware instructions, just emit it directly
+ here.  */
+  if (rcode == EQ || rcode == GT || rcode == GTU)
+{
+  mask = gen_reg_rtx (dmode);
+  emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (rcode, dmode, op0, op1)));
+  return mask;
+}
 
   bool swap_operands = false;
   bool try_again = false;
-- 
2.27.0



[PATCH] libstdc++: Workaround buggy printf on Solaris in to_chars/float128_c++23.cc test [PR107815]

2022-11-24 Thread Jakub Jelinek via Gcc-patches
Hi!

As mentioned in the PR, Solaris apparently can correctly handle
printf ("%.0Lf\n", 1e+202L * __DBL_MAX__);
which prints a 511-character number, but can't handle
printf ("%.0Lf\n", 1e+203L * __DBL_MAX__);
nor
printf ("%.0Lf\n", __LDBL_MAX__);
properly; instead of printing a 512-character number for the former and
a 4933-character number for the latter, it handles them as
if the user had asked for "%.0Le\n" in those cases.
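
A minimal standalone reproducer sketch of the calls involved (not part of
the patch, just to make the quirk described above concrete):

#include <float.h>
#include <stdio.h>

int
main (void)
{
  /* Printed correctly on Solaris (~511 digits).  */
  printf ("%.0Lf\n", 1e+202L * DBL_MAX);
  /* Mis-printed on Solaris as if "%.0Le" had been requested.  */
  printf ("%.0Lf\n", 1e+203L * DBL_MAX);
  printf ("%.0Lf\n", LDBL_MAX);
  return 0;
}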

The following patch disables the single problematic value that fails
in the test, and also fixes commented out debugging printouts.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2022-11-24  Jakub Jelinek  

PR libstdc++/107815
* testsuite/20_util/to_chars/float128_c++23.cc (test): Disable
__FLT128_MAX__ test on Solaris.  Fix up commented out debugging
printouts.

--- libstdc++-v3/testsuite/20_util/to_chars/float128_c++23.cc.jj
2022-11-08 11:19:22.251768167 +0100
+++ libstdc++-v3/testsuite/20_util/to_chars/float128_c++23.cc   2022-11-23 
17:02:22.380051796 +0100
@@ -52,14 +52,17 @@ test(std::chars_format fmt = std::chars_
 std::numbers::inv_sqrt3_v,
 std::numbers::egamma_v,
 std::numbers::phi_v,
+// Solaris has non-conforming printf, see PR98384 and PR107815.
+#if !(defined(__sun__) && defined(__svr4__))
 std::numeric_limits::max()
+#endif
   };
   char str1[1], str2[1];
   for (auto u : tests)
 {
   auto [ptr1, ec1] = std::to_chars(str1, str1 + sizeof(str1), u, fmt);
   VERIFY( ec1 == std::errc() );
-//std::cout << i << ' ' << std::string_view (str1, ptr1) << '\n';
+//std::cout << u << ' ' << std::string_view (str1, ptr1) << '\n';
   if (fmt == std::chars_format::fixed)
{
  auto [ptr2, ec2] = std::to_chars(str2, str2 + (ptr1 - str1), u, fmt);
@@ -76,7 +79,7 @@ test(std::chars_format fmt = std::chars_
 
   auto [ptr5, ec5] = std::to_chars(str1, str1 + sizeof(str1), u, fmt, 90);
   VERIFY( ec5 == std::errc() );
-//std::cout << i << ' ' << std::string_view (str1, ptr5) << '\n';
+//std::cout << u << ' ' << std::string_view (str1, ptr5) << '\n';
   v = 4.0f128;
   auto [ptr6, ec6] = std::from_chars(str1, ptr5, v,
 fmt == std::chars_format{}

Jakub



[PATCH 4/9] rs6000: Rework vector float comparison in rs6000_emit_vector_compare - p4

2022-11-24 Thread Kewen Lin via Gcc-patches
All kinds of vector float comparison operators have been
supported in an rtl comparison pattern in vector.md, so we can
just emit an rtx comparison insn with the given comparison
operator in function rs6000_emit_vector_compare instead of
checking and handling the reverse condition cases.

This is part 4; it further handles the comparison operators
LT/UNGE.  In rs6000_emit_vector_compare, for the handling
of LT, it switches to code GT, swaps the operands and tries
again; that's exactly the same as what we have in vector.md:

; lt(a,b)   = gt(b,a)

As to UNGE, in rs6000_emit_vector_compare, it uses the reversed
code LT and further operates on the result with one_cmpl;
that's also the same as what's in vector.md:

; unge(a,b) = ~lt(a,b)

This patch should not have any functionality change either.

gcc/ChangeLog:

* config/rs6000/rs6000.cc (rs6000_emit_vector_compare_inner): Emit rtx
comparison for operators LT/UNGE of MODE_VECTOR_FLOAT directly.
(rs6000_emit_vector_compare): Move assertion of no MODE_VECTOR_FLOAT to
function beginning.
---
 gcc/config/rs6000/rs6000.cc | 24 
 1 file changed, 4 insertions(+), 20 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 98754805bd2..94e039649f5 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -15645,6 +15645,7 @@ static rtx
 rs6000_emit_vector_compare_inner (enum rtx_code code, rtx op0, rtx op1)
 {
   machine_mode mode = GET_MODE (op0);
+  gcc_assert (GET_MODE_CLASS (mode) != MODE_VECTOR_FLOAT);
 
   switch (code)
 {
@@ -15654,7 +15655,6 @@ rs6000_emit_vector_compare_inner (enum rtx_code code, 
rtx op0, rtx op1)
 case EQ:
 case GT:
 case GTU:
-  gcc_assert (GET_MODE_CLASS (mode) != MODE_VECTOR_FLOAT);
   rtx mask = gen_reg_rtx (mode);
   emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (code, mode, op0, op1)));
   return mask;
@@ -15679,18 +15679,8 @@ rs6000_emit_vector_compare (enum rtx_code rcode,
  comparison operators in a comparison rtl pattern, we can
  just emit the comparison rtx insn directly here.  Besides,
  we should have a centralized place to handle the possibility
- of raising invalid exception.  For EQ/GT/GE/UNORDERED/
- ORDERED/LTGT/UNEQ, they are handled equivalently as before;
- for NE/UNLE/UNLT, they are handled with reversed code
- and inverting, it's the same as before; for LE/UNGT, they
- are handled with LE ior EQ previously, emitting directly
- here will make use of GE later, it's slightly better;
-
- FIXME: Handle the remaining vector float comparison operators
- here.  */
-  if (GET_MODE_CLASS (dmode) == MODE_VECTOR_FLOAT
-  && rcode != LT
-  && rcode != UNGE)
+ of raising invalid exception.  */
+  if (GET_MODE_CLASS (dmode) == MODE_VECTOR_FLOAT)
 {
   mask = gen_reg_rtx (dmode);
   emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (rcode, dmode, op0, op1)));
@@ -15718,23 +15708,17 @@ rs6000_emit_vector_compare (enum rtx_code rcode,
   try_again = true;
   break;
 case NE:
-case UNGE:
   /* Invert condition and try again.
 e.g., A != B becomes ~(A==B).  */
   {
-   enum rtx_code rev_code;
enum insn_code nor_code;
rtx mask2;
 
-   rev_code = reverse_condition_maybe_unordered (rcode);
-   if (rev_code == UNKNOWN)
- return NULL_RTX;
-
nor_code = optab_handler (one_cmpl_optab, dmode);
if (nor_code == CODE_FOR_nothing)
  return NULL_RTX;
 
-   mask2 = rs6000_emit_vector_compare (rev_code, op0, op1, dmode);
+   mask2 = rs6000_emit_vector_compare (EQ, op0, op1, dmode);
if (!mask2)
  return NULL_RTX;
 
-- 
2.27.0



[PATCH 2/9] rs6000: Rework vector float comparison in rs6000_emit_vector_compare - p2

2022-11-24 Thread Kewen Lin via Gcc-patches
All kinds of vector float comparison operators have been
supported in an rtl comparison pattern in vector.md, so we can
just emit an rtx comparison insn with the given comparison
operator in function rs6000_emit_vector_compare instead of
checking and handling the reverse condition cases.

This is part 2; it further handles the comparison operators
NE/UNLE/UNLT.  In rs6000_emit_vector_compare, they are
handled with the reversed code, which is queried from function
reverse_condition_maybe_unordered, and inverted with
one_cmpl_optab.  It's the same as what we have in vector.md:

; ne(a,b)   = ~eq(a,b)
; unle(a,b) = ~gt(a,b)
; unlt(a,b) = ~ge(a,b)

The operators on the right side have been supported in part 1.
This patch should not have any functionality change too.

gcc/ChangeLog:

* config/rs6000/rs6000.cc (rs6000_emit_vector_compare): Emit rtx
comparison for operators NE/UNLE/UNLT of MODE_VECTOR_FLOAT directly.
---
 gcc/config/rs6000/rs6000.cc | 20 
 1 file changed, 8 insertions(+), 12 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 5a8f7ff3bf8..09299bef6a2 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -15679,20 +15679,18 @@ rs6000_emit_vector_compare (enum rtx_code rcode,
  comparison operators in a comparison rtl pattern, we can
  just emit the comparison rtx insn directly here.  Besides,
  we should have a centralized place to handle the possibility
- of raising invalid exception.  As the first step, only check
- operators EQ/GT/GE/UNORDERED/ORDERED/LTGT/UNEQ for now, they
- are handled equivalently as before.
+ of raising invalid exception.  For EQ/GT/GE/UNORDERED/
+ ORDERED/LTGT/UNEQ, they are handled equivalently as before;
+ for NE/UNLE/UNLT, they are handled with reversed code
+ and inverting, it's the same as before.
 
  FIXME: Handle the remaining vector float comparison operators
  here.  */
   if (GET_MODE_CLASS (dmode) == MODE_VECTOR_FLOAT
-  && (rcode == EQ
- || rcode == GT
- || rcode == GE
- || rcode == UNORDERED
- || rcode == ORDERED
- || rcode == LTGT
- || rcode == UNEQ))
+  && rcode != LT
+  && rcode != LE
+  && rcode != UNGE
+  && rcode != UNGT)
 {
   mask = gen_reg_rtx (dmode);
   emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (rcode, dmode, op0, op1)));
@@ -15720,8 +15718,6 @@ rs6000_emit_vector_compare (enum rtx_code rcode,
   try_again = true;
   break;
 case NE:
-case UNLE:
-case UNLT:
 case UNGE:
 case UNGT:
   /* Invert condition and try again.
-- 
2.27.0



[PATCH 7/9] rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p3

2022-11-24 Thread Kewen Lin via Gcc-patches
The current handling in rs6000_emit_vector_compare is a bit
complicated to me, especially after we emit the vector float
comparison insn with the given code directly.  So it's better
to refactor the handling of vector integer comparison here.

This is part 3; it refactors the handling of NE.
This patch doesn't introduce any functionality change.

gcc/ChangeLog:

* config/rs6000/rs6000.cc (rs6000_emit_vector_compare): Refactor the
handlings for operator NE.
---
 gcc/config/rs6000/rs6000.cc | 30 ++
 1 file changed, 10 insertions(+), 20 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index c1aebbb5c03..b4ca7b3d1b1 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -15681,29 +15681,19 @@ rs6000_emit_vector_compare (enum rtx_code rcode,
   emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (code, dmode, op0, op1)));
   return mask;
 }
+  else if (rcode == NE)
+{
+  /* ne(a,b) = ~eq(a,b)  */
+  mask = gen_reg_rtx (dmode);
+  emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (EQ, dmode, op0, op1)));
+  enum insn_code nor_code = optab_handler (one_cmpl_optab, dmode);
+  gcc_assert (nor_code != CODE_FOR_nothing);
+  emit_insn (GEN_FCN (nor_code) (mask, mask));
+  return mask;
+}
 
   switch (rcode)
 {
-case NE:
-  /* Invert condition and try again.
-e.g., A != B becomes ~(A==B).  */
-  {
-   enum insn_code nor_code;
-   rtx mask2;
-
-   nor_code = optab_handler (one_cmpl_optab, dmode);
-   if (nor_code == CODE_FOR_nothing)
- return NULL_RTX;
-
-   mask2 = rs6000_emit_vector_compare (EQ, op0, op1, dmode);
-   if (!mask2)
- return NULL_RTX;
-
-   mask = gen_reg_rtx (dmode);
-   emit_insn (GEN_FCN (nor_code) (mask, mask2));
-   return mask;
-  }
-  break;
 case GE:
 case GEU:
 case LE:
-- 
2.27.0



[PATCH 1/9] rs6000: Rework vector float comparison in rs6000_emit_vector_compare - p1

2022-11-24 Thread Kewen Lin via Gcc-patches
All kinds of vector float comparison operators have been
supported in an rtl comparison pattern in vector.md, so we can
just emit an rtx comparison insn with the given comparison
operator in function rs6000_emit_vector_compare instead of
checking and handling the reverse condition cases.

This is part 1; it only handles the operators which were
already emitted with an rtx comparison previously in function
rs6000_emit_vector_compare_inner, namely EQ/GT/GE/ORDERED/
UNORDERED/UNEQ/LTGT.  There is no functionality change.

With this change, rs6000_emit_vector_compare_inner only
works for vector integer comparison handling; it will be
cleaned up later in the vector integer comparison rework.

gcc/ChangeLog:

* config/rs6000/rs6000.cc (rs6000_emit_vector_compare_inner): Move
MODE_VECTOR_FLOAT handlings out.
(rs6000_emit_vector_compare): Emit rtx comparison for operators EQ/GT/
GE/UNORDERED/ORDERED/UNEQ/LTGT of MODE_VECTOR_FLOAT directly, and
adjust one call site of rs6000_emit_vector_compare_inner to
rs6000_emit_vector_compare.
---
 gcc/config/rs6000/rs6000.cc | 47 -
 1 file changed, 31 insertions(+), 16 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index d2743f7bce6..5a8f7ff3bf8 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -15644,7 +15644,6 @@ output_cbranch (rtx op, const char *label, int 
reversed, rtx_insn *insn)
 static rtx
 rs6000_emit_vector_compare_inner (enum rtx_code code, rtx op0, rtx op1)
 {
-  rtx mask;
   machine_mode mode = GET_MODE (op0);
 
   switch (code)
@@ -15652,19 +15651,11 @@ rs6000_emit_vector_compare_inner (enum rtx_code code, 
rtx op0, rtx op1)
 default:
   break;
 
-case GE:
-  if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
-   return NULL_RTX;
-  /* FALLTHRU */
-
 case EQ:
 case GT:
 case GTU:
-case ORDERED:
-case UNORDERED:
-case UNEQ:
-case LTGT:
-  mask = gen_reg_rtx (mode);
+  gcc_assert (GET_MODE_CLASS (mode) != MODE_VECTOR_FLOAT);
+  rtx mask = gen_reg_rtx (mode);
   emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (code, mode, op0, op1)));
   return mask;
 }
@@ -15680,18 +15671,42 @@ rs6000_emit_vector_compare (enum rtx_code rcode,
rtx op0, rtx op1,
machine_mode dmode)
 {
-  rtx mask;
-  bool swap_operands = false;
-  bool try_again = false;
-
   gcc_assert (VECTOR_UNIT_ALTIVEC_OR_VSX_P (dmode));
   gcc_assert (GET_MODE (op0) == GET_MODE (op1));
+  rtx mask;
+
+  /* In vector.md, we support all kinds of vector float point
+ comparison operators in a comparison rtl pattern, we can
+ just emit the comparison rtx insn directly here.  Besides,
+ we should have a centralized place to handle the possibility
+ of raising invalid exception.  As the first step, only check
+ operators EQ/GT/GE/UNORDERED/ORDERED/LTGT/UNEQ for now, they
+ are handled equivalently as before.
+
+ FIXME: Handle the remaining vector float comparison operators
+ here.  */
+  if (GET_MODE_CLASS (dmode) == MODE_VECTOR_FLOAT
+  && (rcode == EQ
+ || rcode == GT
+ || rcode == GE
+ || rcode == UNORDERED
+ || rcode == ORDERED
+ || rcode == LTGT
+ || rcode == UNEQ))
+{
+  mask = gen_reg_rtx (dmode);
+  emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (rcode, dmode, op0, op1)));
+  return mask;
+}
 
   /* See if the comparison works as is.  */
   mask = rs6000_emit_vector_compare_inner (rcode, op0, op1);
   if (mask)
 return mask;
 
+  bool swap_operands = false;
+  bool try_again = false;
+
   switch (rcode)
 {
 case LT:
@@ -15791,7 +15806,7 @@ rs6000_emit_vector_compare (enum rtx_code rcode,
   if (swap_operands)
std::swap (op0, op1);
 
-  mask = rs6000_emit_vector_compare_inner (rcode, op0, op1);
+  mask = rs6000_emit_vector_compare (rcode, op0, op1, dmode);
   if (mask)
return mask;
 }
-- 
2.27.0



[PATCH 9/9] rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p5

2022-11-24 Thread Kewen Lin via Gcc-patches
The current handling in rs6000_emit_vector_compare is a bit
complicated to me, especially after we emit the vector float
comparison insn with the given code directly.  So it's better
to refactor the handling of vector integer comparison here.

This is part 5; it refactors all the handling of vector
integer comparison to make it neat.  This patch doesn't
introduce any functionality change.

gcc/ChangeLog:

* config/rs6000/rs6000.cc (rs6000_emit_vector_compare): Refactor the
handlings of vector integer comparison.
---
 gcc/config/rs6000/rs6000.cc | 68 -
 1 file changed, 44 insertions(+), 24 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index a3645e321a7..b694080840a 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -15662,34 +15662,54 @@ rs6000_emit_vector_compare (enum rtx_code rcode,
   return mask;
 }
 
-  /* For any of vector integer comparison operators for which we
- have direct hardware instructions, just emit it directly
- here.  */
-  if (rcode == EQ || rcode == GT || rcode == GTU)
-emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (rcode, dmode, op0, op1)));
-  else if (rcode == LT || rcode == LTU)
+  bool swap_operands = false;
+  bool need_invert = false;
+  enum rtx_code code = rcode;
+
+  switch (rcode)
 {
+case EQ:
+case GT:
+case GTU:
+  /* Emit directly with native hardware insn.  */
+  break;
+case LT:
+case LTU:
   /* lt{,u}(a,b) = gt{,u}(b,a)  */
-  enum rtx_code code = swap_condition (rcode);
-  std::swap (op0, op1);
-  mask = gen_reg_rtx (dmode);
-  emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (code, dmode, op0, op1)));
+  code = swap_condition (rcode);
+  swap_operands = true;
+  break;
+case NE:
+case LE:
+case LEU:
+  /* ne(a,b) = ~eq(a,b); le{,u}(a,b) = ~gt{,u}(a,b)  */
+  code = reverse_condition (rcode);
+  need_invert = true;
+  break;
+case GE:
+  /* ge(a,b) = ~gt(b,a)  */
+  code = GT;
+  swap_operands = true;
+  need_invert = true;
+  break;
+case GEU:
+  /* geu(a,b) = ~gtu(b,a)  */
+  code = GTU;
+  swap_operands = true;
+  need_invert = true;
+  break;
+default:
+  gcc_unreachable ();
+  break;
 }
-  else if (rcode == NE || rcode == LE || rcode == LEU)
+
+  if (swap_operands)
+std::swap (op0, op1);
+
+  emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (code, dmode, op0, op1)));
+
+  if (need_invert)
 {
-  /* ne(a,b) = ~eq(a,b); le{,u}(a,b) = ~gt{,u}(a,b)  */
-  enum rtx_code code = reverse_condition (rcode);
-  mask = gen_reg_rtx (dmode);
-  emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (code, dmode, op0, op1)));
-  enum insn_code nor_code = optab_handler (one_cmpl_optab, dmode);
-  gcc_assert (nor_code != CODE_FOR_nothing);
-  emit_insn (GEN_FCN (nor_code) (mask, mask));
-} else {
-  /* ge{,u}(a,b) = ~gt{,u}(b,a)  */
-  gcc_assert (rcode == GE || rcode == GEU);
-  enum rtx_code code = rcode == GE ? GT : GTU;
-  mask = gen_reg_rtx (dmode);
-  emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (code, dmode, op0, op1)));
   enum insn_code nor_code = optab_handler (one_cmpl_optab, dmode);
   gcc_assert (nor_code != CODE_FOR_nothing);
   emit_insn (GEN_FCN (nor_code) (mask, mask));
-- 
2.27.0



[PATCH 6/9] rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p2

2022-11-24 Thread Kewen Lin via Gcc-patches
The current handling in rs6000_emit_vector_compare is a bit
complicated to me, especially after we emit the vector float
comparison insn with the given code directly.  So it's better
to refactor the handling of vector integer comparison here.

This is part 2; it refactors the handling of LT and LTU.
This patch doesn't introduce any functionality change.

gcc/ChangeLog:

* config/rs6000/rs6000.cc (rs6000_emit_vector_compare): Refine the
handlings for operators LT and LTU.
---
 gcc/config/rs6000/rs6000.cc | 32 +---
 1 file changed, 9 insertions(+), 23 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 0a5826800c5..c1aebbb5c03 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -15672,22 +15672,18 @@ rs6000_emit_vector_compare (enum rtx_code rcode,
   emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (rcode, dmode, op0, op1)));
   return mask;
 }
-
-  bool swap_operands = false;
-  bool try_again = false;
+  else if (rcode == LT || rcode == LTU)
+{
+  /* lt{,u}(a,b) = gt{,u}(b,a)  */
+  enum rtx_code code = swap_condition (rcode);
+  std::swap (op0, op1);
+  mask = gen_reg_rtx (dmode);
+  emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (code, dmode, op0, op1)));
+  return mask;
+}
 
   switch (rcode)
 {
-case LT:
-  rcode = GT;
-  swap_operands = true;
-  try_again = true;
-  break;
-case LTU:
-  rcode = GTU;
-  swap_operands = true;
-  try_again = true;
-  break;
 case NE:
   /* Invert condition and try again.
 e.g., A != B becomes ~(A==B).  */
@@ -15761,16 +15757,6 @@ rs6000_emit_vector_compare (enum rtx_code rcode,
   return NULL_RTX;
 }
 
-  if (try_again)
-{
-  if (swap_operands)
-   std::swap (op0, op1);
-
-  mask = rs6000_emit_vector_compare (rcode, op0, op1, dmode);
-  if (mask)
-   return mask;
-}
-
   /* You only get two chances.  */
   return NULL_RTX;
 }
-- 
2.27.0



[PATCH 3/9] rs6000: Rework vector float comparison in rs6000_emit_vector_compare - p3

2022-11-24 Thread Kewen Lin via Gcc-patches
All kinds of vector float comparison operators have been
supported in an rtl comparison pattern in vector.md, so we can
just emit an rtx comparison insn with the given comparison
operator in function rs6000_emit_vector_compare instead of
checking and handling the reverse condition cases.

This is part 3; it further handles the comparison operators
LE/UNGT.  In rs6000_emit_vector_compare, UNGT is handled
with the reversed code LE and inverted with one_cmpl_optab,
and LE is handled with LT ior EQ, while in vector.md we have
the support:

; le(a,b)   = ge(b,a)
; ungt(a,b) = ~le(a,b)

The associated test case shows it's an improvement.

gcc/ChangeLog:

* config/rs6000/rs6000.cc (rs6000_emit_vector_compare): Emit rtx
comparison for operators LE/UNGT of MODE_VECTOR_FLOAT directly.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/vcond-fp.c: New test.
---
 gcc/config/rs6000/rs6000.cc |  9 
 gcc/testsuite/gcc.target/powerpc/vcond-fp.c | 25 +
 2 files changed, 29 insertions(+), 5 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vcond-fp.c

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 09299bef6a2..98754805bd2 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -15682,15 +15682,15 @@ rs6000_emit_vector_compare (enum rtx_code rcode,
  of raising invalid exception.  For EQ/GT/GE/UNORDERED/
  ORDERED/LTGT/UNEQ, they are handled equivalently as before;
  for NE/UNLE/UNLT, they are handled with reversed code
- and inverting, it's the same as before.
+ and inverting, it's the same as before; for LE/UNGT, they
+ are handled with LE ior EQ previously, emitting directly
+ here will make use of GE later, it's slightly better;
 
  FIXME: Handle the remaining vector float comparison operators
  here.  */
   if (GET_MODE_CLASS (dmode) == MODE_VECTOR_FLOAT
   && rcode != LT
-  && rcode != LE
-  && rcode != UNGE
-  && rcode != UNGT)
+  && rcode != UNGE)
 {
   mask = gen_reg_rtx (dmode);
   emit_insn (gen_rtx_SET (mask, gen_rtx_fmt_ee (rcode, dmode, op0, op1)));
@@ -15719,7 +15719,6 @@ rs6000_emit_vector_compare (enum rtx_code rcode,
   break;
 case NE:
 case UNGE:
-case UNGT:
   /* Invert condition and try again.
 e.g., A != B becomes ~(A==B).  */
   {
diff --git a/gcc/testsuite/gcc.target/powerpc/vcond-fp.c 
b/gcc/testsuite/gcc.target/powerpc/vcond-fp.c
new file mode 100644
index 000..b71861d3588
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vcond-fp.c
@@ -0,0 +1,25 @@
+/* { dg-require-effective-target powerpc_p8vector_ok } */
+/* { dg-options "-O2 -ftree-vectorize -fno-vect-cost-model -mpower8-vector" } */
+
+/* Test we use xvcmpge[sd]p rather than xvcmpeq[sd]p and xvcmpgt[sd]p
+   for UNGT and LE handlings.  */
+
+#define UNGT(a, b) (!__builtin_islessequal ((a), (b)))
+#define LE(a, b) (((a) <= (b)))
+
+#define TEST_VECT(NAME, TYPE)                                                \
+  __attribute__ ((noipa)) void test_##NAME##_##TYPE (TYPE *x, TYPE *y,       \
+                                                     int *res, int n)        \
+  {                                                                          \
+    for (int i = 0; i < n; i++)                                              \
+      res[i] = NAME (x[i], y[i]);                                            \
+  }
+
+#define TEST(TYPE)                                                           \
+  TEST_VECT (UNGT, TYPE)                                                     \
+  TEST_VECT (LE, TYPE)
+
+TEST (float)
+TEST (double)
+
+/* { dg-final { scan-assembler-not {\mxvcmp(gt|eq)[sd]p\M} } } */
-- 
2.27.0



[PATCH 0/9] rs6000: Rework rs6000_emit_vector_compare

2022-11-24 Thread Kewen Lin via Gcc-patches
Hi,

Following Segher's suggestion, this patch series reworks the
function rs6000_emit_vector_compare for vector float and int
in multiple steps; it's based on the previous attempts [1][2].
As mentioned in [1], the need to rework this for float is to
make a centralized place for vector float comparison handling
instead of supporting it dispersedly with swapping ops,
reversing the code, etc.  It also prepares for a subsequent
patch to handle comparison operators with or without trapping
math (PR105480).  With the handling of vector float reworked,
we can further simplify the handling of vector int as shown.

For Segher's concern about whether this rework causes any
assembly change, I constructed two testcases for vector float[3]
and int[4] respectively before; they showed that most results
are unchanged, except for a difference on LE and UNGT, which is
an improvement since it uses GE instead of GT ior EQ.  The
associated test case in patch 3/9 is a good example.

Besides, with and without the whole patch series, I built all of
SPEC2017 at options -O3 and -Ofast separately and checked the
differences in the object assembly.  The result showed that
most objects are unchanged, except for:

  * at -O3, 521.wrf_r has 9 object files and 526.blender_r has
9 object files with differences.

  * at -Ofast, 521.wrf_r has 12 object files, 526.blender_r has
one and 527.cam4_r has 4 object files with differences.

Looking into these differences, all significant ones are caused
by the known improvement mentioned above transforming GT ior EQ
to GE, which can also affect the unrolling decision due to the
insn count.  The other trivial differences are branch target
offset differences, nop differences for alignment, vsx register
number differences, etc.

I also evaluated the runtime performance of these changed
benchmarks; the result is neutral.

These patches are bootstrapped and regress-tested
incrementally on powerpc64-linux-gnu P7 & P8, and
powerpc64le-linux-gnu P9 & P10.

Is it ok for trunk?

BR,
Kewen
-
[1] https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606375.html
[2] https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606376.html
[3] https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606504.html
[4] https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606506.html

Kewen Lin (9):
  rs6000: Rework vector float comparison in rs6000_emit_vector_compare - p1
  rs6000: Rework vector float comparison in rs6000_emit_vector_compare - p2
  rs6000: Rework vector float comparison in rs6000_emit_vector_compare - p3
  rs6000: Rework vector float comparison in rs6000_emit_vector_compare - p4
  rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p1
  rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p2
  rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p3
  rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p4
  rs6000: Rework vector integer comparison in rs6000_emit_vector_compare - p5

 gcc/config/rs6000/rs6000.cc | 180 ++--
 gcc/testsuite/gcc.target/powerpc/vcond-fp.c |  25 +++
 2 files changed, 74 insertions(+), 131 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vcond-fp.c

-- 
2.27.0



[PATCH] c++: Don't clear TREE_READONLY for -fmerge-all-constants for non-aggregates [PR107558]

2022-11-24 Thread Jakub Jelinek via Gcc-patches
Hi!

The following testcase ICEs because OpenMP lowering for the shared
clause on the l variable with REFERENCE_TYPE creates a POINTER_TYPE
to REFERENCE_TYPE.
The reason is that the automatic variable has non-trivial construction
(a reference to a lambda) and -fmerge-all-constants is on, so TREE_READONLY
isn't set - omp-low handles automatic TREE_READONLY vars in shared
specially and only copies them to the construct and not back, while
!TREE_READONLY ones are assumed to be changeable.
The rationale for the PR91529 change was that gimplification can turn
some non-addressable automatic variables into TREE_STATIC ones with
-fmerge-all-constants, and therefore TREE_READONLY on them is undesirable.
But the gimplifier does that only for aggregate variables:
  switch (TREE_CODE (type))
{  
case RECORD_TYPE:
case UNION_TYPE:
case QUAL_UNION_TYPE:
case ARRAY_TYPE:
and not for anything else.  So I think clearing TREE_READONLY for
automatic integral, reference, pointer, etc. vars just because of
-fmerge-all-constants is unnecessary.
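
For illustration only (this example isn't part of the patch):

  // With -fmerge-all-constants only the aggregate below may be promoted
  // to static storage by the gimplifier; the scalar and the reference
  // always stay automatic, so keeping TREE_READONLY on them is safe.
  void
  foo ()
  {
    const int a[3] = { 1, 2, 3 };   // aggregate: may become TREE_STATIC
    const int i = 42;               // scalar: stays automatic
    const int &r = i;               // reference: stays automatic
  }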

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2022-11-24  Jakub Jelinek  

PR c++/107558
* decl.cc (cp_finish_decl): Don't clear TREE_READONLY on
automatic non-aggregate variables just because of
-fmerge-all-constants.

* g++.dg/gomp/pr107558.C: New test.

--- gcc/cp/decl.cc.jj   2022-11-19 09:21:14.662439877 +0100
+++ gcc/cp/decl.cc  2022-11-23 13:12:31.866553152 +0100
@@ -8679,8 +8679,10 @@ cp_finish_decl (tree decl, tree init, bo
 
   if (var_definition_p
  /* With -fmerge-all-constants, gimplify_init_constructor
-might add TREE_STATIC to the variable.  */
- && (TREE_STATIC (decl) || flag_merge_constants >= 2))
+might add TREE_STATIC to aggregate variables.  */
+ && (TREE_STATIC (decl)
+ || (flag_merge_constants >= 2
+ && AGGREGATE_TYPE_P (type
{
  /* If a TREE_READONLY variable needs initialization
 at runtime, it is no longer readonly and we need to
--- gcc/testsuite/g++.dg/gomp/pr107558.C.jj	2022-11-23 13:13:27.260736525 +0100
+++ gcc/testsuite/g++.dg/gomp/pr107558.C	2022-11-23 13:15:22.271041005 +0100
@@ -0,0 +1,14 @@
+// PR c++/107558
+// { dg-do compile { target c++11 } }
+// { dg-additional-options "-fmerge-all-constants" }
+// { dg-additional-options "-flto" { target lto } }
+
+int a = 15;
+
+void
+foo ()
+{
+  auto &&l = [&]() { return a; };
+#pragma omp target parallel
+  l ();
+}

Jakub



[r13-4272 Regression] FAIL: gcc.dg/guality/loop-1.c -O3 -g -DPREVENT_OPTIMIZATION line 20 i == 1 on Linux/x86_64

2022-11-24 Thread haochen.jiang via Gcc-patches
On Linux/x86_64,

8caf155a3d6e23e47bf55068ad23c23d4655a054 is the first bad commit
commit 8caf155a3d6e23e47bf55068ad23c23d4655a054
Author: Hongyu Wang 
Date:   Sat Nov 19 09:38:00 2022 +0800

i386: Only enable small loop unrolling in backend [PR 107692]

caused

FAIL: gcc.dg/guality/loop-1.c   -O2  -DPREVENT_OPTIMIZATION  line 20 i == 1
FAIL: gcc.dg/guality/loop-1.c   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  -DPREVENT_OPTIMIZATION line 20 i == 1
FAIL: gcc.dg/guality/loop-1.c   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  -DPREVENT_OPTIMIZATION line 20 i == 1
FAIL: gcc.dg/guality/loop-1.c   -O3 -g  -DPREVENT_OPTIMIZATION  line 20 i == 1

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r13-4272/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="guality.exp=gcc.dg/guality/loop-1.c --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="guality.exp=gcc.dg/guality/loop-1.c --target_board='unix{-m32\ 
-march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at haochen dot jiang at intel.com)


Re: [PATCH v2] [x86] Fix incorrect _mm_cvtsbh_ss.

2022-11-24 Thread Jakub Jelinek via Gcc-patches
On Thu, Nov 24, 2022 at 09:22:00AM +0800, liuhongt via Gcc-patches wrote:
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -130,6 +130,7 @@ (define_c_enum "unspec" [
>;; For AVX/AVX512F support
>UNSPEC_SCALEF
>UNSPEC_PCMP
> +  UNSPEC_CVTBFSF
>  
>;; Generic math support
>UNSPEC_IEEE_MIN; not commutative
> @@ -4961,6 +4962,31 @@ (define_insn "*extendhf2"
> (set_attr "prefix" "evex")
> (set_attr "mode" "")])
>  
> +(define_expand "extendbfsf2"
> +  [(set (match_operand:SF 0 "register_operand")
> + (unspec:SF
> +   [(match_operand:BF 1 "register_operand")]
> +  UNSPEC_CVTBFSF))]
> + "TARGET_SSE2 && !HONOR_NANS (BFmode) && !flag_signaling_nans")

I think if !HONOR_NANS (BFmode), then flag_signaling_nans doesn't matter;
the former says that no NaNs may appear in a valid program,
so just testing !HONOR_NANS (BFmode) should be enough.

What I'm not sure about (my memory is weak) is whether one can
safely use the fast-math-related tests in define_expand conditions.
I vaguely remember that init_all_optabs remembers the conditions, so
that for changes, say, in the ISA options the optabs are reinited, but
I'm not sure that happens for optimization option changes like the
fast-math-related options.  So it would perhaps be safer to use just
TARGET_SSE2 as the expand condition and in the C code body do
if (HONOR_NANS (BFmode)) FAIL;
(and similarly for truncsfbf2).
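
I.e. something along these lines (just an untested sketch mirroring the
patch's expander, with the check moved into the body):

(define_expand "extendbfsf2"
  [(set (match_operand:SF 0 "register_operand")
	(unspec:SF
	  [(match_operand:BF 1 "register_operand")]
	  UNSPEC_CVTBFSF))]
  "TARGET_SSE2"
{
  if (HONOR_NANS (BFmode))
    FAIL;
})
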
On the other side, a brief look at x86 insn-flags.h shows several
fast-math-related checks in HAVE_* macros.
PR92791, which I found related to this, was actually about
optimize_function_for_{size,speed}_p (cfun),
so maybe the fast-math-related stuff is fine, just not the optimization
for speed or size.

Jakub



Re: [PING] [PATCH RESEND] riscv: improve the cost model for loading a 64bit constant in rv32.

2022-11-24 Thread Sinan via Gcc-patches
> The motivation of this patch is to correct the wrong estimation of
>> the number of instructions needed for loading a 64bit constant in
>> rv32 in the current cost model(riscv_interger_cost). According to
>> the current implementation, if a constant requires more than 3
>> instructions(riscv_const_insn and riscv_legitimate_constant_p),
>> then the constant will be put into constant pool when expanding
>> gimple to rtl(legitimate_constant_p hook and emit_move_insn).
>> So the inaccurate cost model leads to the suboptimal codegen
>> in rv32 and the wrong estimation part could be corrected through
>> this fix.
>>
>> e.g. the current codegen for loading 0x839290001 in rv32
>>
>> lui a5,%hi(.LC0)
>> lw a0,%lo(.LC0)(a5)
>> lw a1,%lo(.LC0+4)(a5)
>> .LC0:
>> .word 958988289
>> .word 8
>>
>> output after this patch
>>
>> li a0,958988288
>> addi a0,a0,1
>> li a1,8
>>
>> gcc/ChangeLog:
>>
>> * config/riscv/riscv.cc (riscv_build_integer): Handle the case of loading 
>> 64bit constant in rv32.
>>
>> gcc/testsuite/ChangeLog:
>>
>> * gcc.target/riscv/rv32-load-64bit-constant.c: New test.
>>
>> Signed-off-by: Lin Sinan 
>> ---
>> gcc/config/riscv/riscv.cc | 23 +++
>> .../riscv/rv32-load-64bit-constant.c | 38 +++
>> 2 files changed, 61 insertions(+)
>> create mode 100644 gcc/testsuite/gcc.target/riscv/rv32-load-64bit-constant.c
>>
>> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
>> index 32f9ef9ade9..9dffabdc5e3 100644
>> --- a/gcc/config/riscv/riscv.cc
>> +++ b/gcc/config/riscv/riscv.cc
>> @@ -618,6 +618,29 @@ riscv_build_integer (struct riscv_integer_op *codes, 
>> HOST_WIDE_INT value,
>> }
>> }
>> 
>> + if ((value > INT32_MAX || value < INT32_MIN) && !TARGET_64BIT)
>
> Nit. It's common practice to have the TARGET test first in a series of 
> tests. It may also be advisable to break this into two lines. 
> Something like this:
>
>
> if ((!TARGET_64BIT)
> || value > INT32_MAX || value < INT32_MIN)
>
>
> That's the style most GCC folks are more accustomed to reading.
Thanks for the tips and I will change it then.
>> + {
>> + unsigned HOST_WIDE_INT loval = sext_hwi (value, 32);
>> + unsigned HOST_WIDE_INT hival = sext_hwi ((value - loval) >> 32, 32);
>> + struct riscv_integer_op alt_codes[RISCV_MAX_INTEGER_OPS],
>> + hicode[RISCV_MAX_INTEGER_OPS];
>> + int hi_cost, lo_cost;
>> +
>> + hi_cost = riscv_build_integer_1 (hicode, hival, mode);
>> + if (hi_cost < cost)
>> + {
>> + lo_cost = riscv_build_integer_1 (alt_codes, loval, mode);
>> + if (lo_cost + hi_cost < cost)
>
> Just so I'm sure. "cost" here refers strictly to other synthesized 
> forms? If so, then ISTM that we'd want to generate the new style when 
> lo_cost + hi_cost < cost OR when lo_cost + hi_cost is less than loading 
> the constant from memory -- which is almost certainly more than "3" 
> since the sequence from memory will be at least 3 instructions, two of 
> which will hit memory.
>
>
> Jeff
>
Yes, almost right. The basic idea of this patch is to improve the cost
calculation for loading a 64bit constant in rv32, instead of adding a
new way to load constants.
gcc currently loads 0x739290001LL in rv32gc with three instructions:
 li a0,958988288
 addi a0,a0,1
 li a1,7
However, when it loads 0x839290001LL, the output assembly becomes
 lui a5,%hi(.LC0)
 lw a0,%lo(.LC0)(a5)
 lw a1,%lo(.LC0+4)(a5)
 .LC0:
 .word 958988289
 .word 8
The cost calculation is inaccurate in such cases, since loading these
two constants should make no difference in rv32 (just change `li a1,7`
to `li a1,8` to load the hi part). This patch takes these cases
into consideration.
BR,
Sinan