RE: [1/3 PATCH]middle-end vect: Simplify and extend the complex numbers validation routines.

2022-01-10 Thread Tamar Christina via Gcc-patches


> -Original Message-
> From: Richard Biener 
> Sent: Monday, January 10, 2022 1:00 PM
> To: Tamar Christina 
> Cc: GCC Patches ; nd ; Richard
> Guenther 
> Subject: Re: [1/3 PATCH]middle-end vect: Simplify and extend the complex
> numbers validation routines.
> 
> On Fri, Dec 17, 2021 at 4:44 PM Tamar Christina via Gcc-patches
>  wrote:
> >
> > Hi All,
> >
> > This patch boosts the analysis for complex mul, fma and fms in order to
> > ensure that it doesn't create incorrect output.
> >
> > Essentially it adds an extra verification to check that the two nodes it's
> > going to combine do the same operations on compatible values.  The reason
> > it needs to do this is that if one computation differs from the other then,
> > with the current implementation, we have no way to deal with it since we
> > have to remove the permute.
> >
> > When we can keep the permute around we can probably handle these by
> > unrolling.
> >
> > While implementing this, since I have to do the traversal anyway, I took
> > advantage of it to simplify the code a bit.  Previously we would determine
> > whether something is a conjugate, then try to figure out which conjugate it
> > is, and then try to see if the permutes match what we expect.
> >
> > Now the code that does the traversal detects this in one go and returns to
> > us whether the operation is something that can be combined and whether a
> > conjugate is present.
> >
> > Secondly, because it does this, I can now simplify the checking code itself
> > to essentially just try to apply fixed patterns to each operation.
> >
> > The patterns represent the order in which operations should appear.  For
> > instance a complex MUL operation combines:
> >
> >   Left 1 + Right 1
> >   Left 2 + Right 2
> >
> > with a permute on the nodes consisting of:
> >
> >   { Even, Even } + { Odd, Odd  }
> >   { Even, Odd  } + { Odd, Even }
> >
> > By abstracting over these patterns the checking code becomes quite simple.
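> >
> > For illustration, a minimal scalar sketch of what those two lane
> > combinations compute (not part of the patch; it assumes even lanes hold
> > the real parts and odd lanes the imaginary parts):
> >
> >   void cmul (float *restrict c, const float *a, const float *b, int n)
> >   {
> >     for (int i = 0; i < n; i += 2)
> >       {
> >         /* { Even, Even } combined with { Odd, Odd }.  */
> >         c[i]     = a[i] * b[i]     - a[i + 1] * b[i + 1];
> >         /* { Even, Odd } combined with { Odd, Even }.  */
> >         c[i + 1] = a[i] * b[i + 1] + a[i + 1] * b[i];
> >       }
> >   }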
> >
> > As part of this I was checking the order of the operands, which was left in
> > "slp" order, as in the same order they showed up in during SLP, which means
> > that the accumulator is first.  However it looks like I didn't document
> > this, and the x86 optab was implemented assuming the same order as FMA,
> > i.e. that the accumulator is last.
> >
> > I have thus changed the order to match that of FMA and FMS, which corrects
> > the x86 codegen, and will update the Arm targets.  This has now also been
> > documented.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu,
> > x86_64-pc-linux-gnu and no regressions.
> >
> > Ok for master? and backport to GCC 11 after some stew?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > PR tree-optimization/102819
> > PR tree-optimization/103169
> > * doc/md.texi: Update docs for cfms, cfma.
> > * tree-data-ref.h (same_data_refs): Accept optional offset.
> > * tree-vect-slp-patterns.c (is_linear_load_p): Fix issue with
> > repeating patterns.
> > (vect_normalize_conj_loc): Remove.
> > (is_eq_or_top): Change to take two nodes.
> > (enum _conj_status, compatible_complex_nodes_p,
> > vect_validate_multiplication): New.
> > (class complex_add_pattern, complex_add_pattern::matches,
> > complex_add_pattern::recognize, class complex_mul_pattern,
> > complex_mul_pattern::recognize, class complex_fms_pattern,
> > complex_fms_pattern::recognize, class complex_operations_pattern,
> > complex_operations_pattern::recognize, addsub_pattern::recognize):
> > Pass new cache.
> > (complex_fms_pattern::matches, complex_mul_pattern::matches):
> > Pass new cache and use new validation code.
> > * tree-vect-slp.c (vect_match_slp_patterns_2, vect_match_slp_patterns,
> > vect_analyze_slp): Pass along cache.
> > (compatible_calls_p): Expose.
> > * tree-vectorizer.h (compatible_calls_p, slp_node_hash,
> > slp_compat_nodes_map_t): New.
> > (class vect_pattern): Update signatures to include new cache.
> >
> > gcc/testsuite/ChangeLog:
> >
> > PR tree-optimization/102819
> > PR tree-optimization/103169
> > * g++.dg/vect/pr99149.cc: xfail for now.
> > * gcc.dg/vect/complex/pr102819-1.c: New test.
> > * gcc.dg/vect/complex/pr102819-2.c: New test.
> > * gcc.dg/vect/complex/pr102819-3.c: New test.
> > * gcc.dg/vect/complex/pr102819-4.c: New test.
> > * gcc.dg/vect/complex/pr102819-5.c: New test.
> > * gcc.dg/vect/complex/pr102819-6.c: New test.
> > * gcc.dg/vect/complex/pr102819-7.c: New test.
> > * gcc.dg/vect/complex/pr102819-8.c: New test.
> > * gcc.dg/vect/complex/pr102819-9.c: New test.
> > * gcc.dg/vect/complex/pr103169.c: New test.
> >
> > --- inline copy of patch --
> > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> > i

Re: Ping: [PATCH] rs6000: Add split pattern to replace

2022-01-10 Thread Xionghu Luo via Gcc-patches



On 2022/1/11 06:55, David Edelsohn wrote:
>>> +(define_insn_and_split "sldoi_to_mov_"
> It would be more consistent with the naming convention to use
> "sldoi_to_mov" without the final "_".

OK, thanks.

> 
>>> +  [(set (match_operand:VM 0 "altivec_register_operand")
>>> + (unspec:VM [(match_operand:VM 1 "easy_vector_constant")
> Should this be "easy_vector_constant_vsldoi"?


This doesn't work. easy_vector_constant_vsldoi returns false because
vspltis_shifted returns 0 here:

vspltis_shifted (rtx op):
  /* If all elements are equal, we don't need to do VSLDOI.  */

 
(gdb) p op
$7 = (rtx_def *) (const_vector:V4SI [
(const_int 0 [0]) repeated x4
])
(gdb) p easy_vector_constant_vsldoi(op, V4SImode)
$8 = false
p easy_vector_constant(op, V4SImode)
$9 = true

> 
>>> + (match_dup 1)
>>> + (match_operand:VM 2 "u5bit_cint_operand")]
> This should be match_operand:QI, right?

Yes. 

> 
> Thanks, David
> 

-- 
Thanks,
Xionghu


Re: [PATCH] testsuite: Fix regression on m32 by r12-6087 [PR103820]

2022-01-10 Thread Richard Biener via Gcc-patches
On Tue, Jan 11, 2022 at 6:27 AM Xionghu Luo  wrote:
>
> r12-6087 avoids moving a cold bb out of a hot loop, while the original
> intent of this testcase is to hoist divides out of the loop and CSE them to
> only one divide.  So increase the loop count to turn the cold bb into a hot
> bb again.  Then the 3 divides can be rewritten with the same reciptmp.
>
> Tested pass on Power-Linux {32,64}, x86 {64,32} and i686-linux, OK for
> master?

OK.

Thanks,
Richard.

> gcc/testsuite/ChangeLog:
>
> PR 103820
> * gcc.dg/tree-ssa/recip-3.c: Adjust.
> ---
>  gcc/testsuite/gcc.dg/tree-ssa/recip-3.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/recip-3.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/recip-3.c
> index 641c91e719e..410b28044b4 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/recip-3.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/recip-3.c
> @@ -1,7 +1,7 @@
>  /* { dg-do compile } */
>  /* { dg-options "-O1 -fno-trapping-math -funsafe-math-optimizations 
> -fdump-tree-recip" } */
>
> -double F[2] = { 0.0, 0.0 }, e;
> +double F[5] = { 0.0, 0.0 }, e;
>
>  /* In this case the optimization is interesting.  */
>  float h ()
> @@ -13,7 +13,7 @@ float h ()
> d = 2.*e;
> E = 1. - d;
>
> -   for( i=0; i < 2; i++ )
> +   for( i=0; i < 5; i++ )
> if( d > 0.01 )
> {
> P = ( W < E ) ? (W - E)/d : (E - W)/d;
> @@ -23,4 +23,4 @@ float h ()
> F[0] += E / d;
>  }
>
> -/* { dg-final { scan-tree-dump-times " / " 5 "recip" } } */
> +/* { dg-final { scan-tree-dump-times " / " 1 "recip" } } */
> --
> 2.27.0.90.geebb51ba8c
>


Re: [PATCH 1v2/3][vect] Add main vectorized loop unrolling

2022-01-10 Thread Richard Biener via Gcc-patches
On Mon, 10 Jan 2022, Andre Vieira (lists) wrote:

> Hi,
> 
> I don't think I ever ended up posting the rebased version on top of the
> epilogue mode patch. So here it is, I think I had a conditional OK if I split
> the epilogue mode patch, but just want to double check this is OK for trunk?

Yes, I think I acked this.

Richard.

> 
> gcc/ChangeLog:
> 
>     * tree-vect-loop.c (vect_estimate_min_profitable_iters): Pass new
>     argument suggested_unroll_factor.
>     (vect_analyze_loop_costing): Likewise.
>     (_loop_vec_info::_loop_vec_info): Initialize new member
>     suggested_unroll_factor.
>     (vect_determine_partial_vectors_and_peeling): Make epilogue of
>     unrolled main loop use partial vectors.
>     (vect_analyze_loop_2): Pass and use new argument
>     suggested_unroll_factor.
>     (vect_analyze_loop_1): Likewise.
>     (vect_analyze_loop): Change to initialize local
>     suggested_unroll_factor and use it.
>     (vectorizable_reduction): Don't use single_defuse_cycle when
>     unrolling.
>     * tree-vectorizer.h (_loop_vec_info::_loop_vec_info): Add new member
>     suggested_unroll_factor.
>     (vector_costs::vector_costs): Add new member
>     m_suggested_unroll_factor.
>     (vector_costs::suggested_unroll_factor): New getter function.
>     (finish_cost): Set return argument suggested_unroll_factor.
> 
> 
> 
> Regards,
> Andre
> 
> On 30/11/2021 13:56, Richard Biener wrote:
> > On Tue, 30 Nov 2021, Andre Vieira (lists) wrote:
> >
> >> On 25/11/2021 12:46, Richard Biener wrote:
> >>> Oops, my fault, yes, it does.  I would suggest to refactor things so
> >>> that the mode_i = first_loop_i case is there only once.  I also wonder
> >>> if all the argument about starting at 0 doesn't apply to the
> >>> not unrolled LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P as well?  So
> >>> what's the reason to differ here?  So in the end I'd just change
> >>> the existing
> >>>
> >>> if (LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P (first_loop_vinfo))
> >>>   {
> >>>
> >>> to
> >>>
> >>> if (LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P (first_loop_vinfo)
> >>> || first_loop_vinfo->suggested_unroll_factor > 1)
> >>>   {
> >>>
> >>> and maybe revisit this when we have an actual testcase showing that
> >>> doing sth else has a positive effect?
> >>>
> >>> Thanks,
> >>> Richard.
> >> So I had a quick chat with Richard Sandiford and he is suggesting resetting
> >> mode_i to 0 for all cases.
> >>
> >> He pointed out that for some tunings the SVE mode might come after the NEON
> >> mode, which means that even for not-unrolled loop_vinfos we could end up
> >> with a suboptimal choice of mode for the epilogue.  I.e. it could be that we
> >> pick V16QI for main vectorization, but that's VNx16QI + 1 in the array, so
> >> we'd not try VNx16QI for the epilogue.
> >>
> >> This would simplify the mode-selecting cases by simply restarting at
> >> mode_i in all epilogue cases.  Is that something you'd be OK with?
> > Works for me with an updated comment.  Even better with showing a
> > testcase exercising such tuning.
> >
> > Richard.
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Ivo Totev; HRB 36809 (AG Nuernberg)


RE: [3/3 PATCH][AArch32] use canonical ordering for complex mul, fma and fms

2022-01-10 Thread Tamar Christina via Gcc-patches
ping

> -Original Message-
> From: Tamar Christina
> Sent: Monday, December 20, 2021 4:22 PM
> To: gcc-patches@gcc.gnu.org
> Cc: nd ; Ramana Radhakrishnan
> ; Richard Earnshaw
> ; ni...@redhat.com; Kyrylo Tkachov
> 
> Subject: RE: [3/3 PATCH][AArch32] use canonical ordering for complex mul,
> fma and fms
> 
> Updated version of patch following AArch64 review.
> 
> Bootstrapped Regtested on arm-none-linux-gnueabihf and no issues.
> 
> Ok for master? and backport along with the first patch?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   PR tree-optimization/102819
>   PR tree-optimization/103169
>   * config/arm/vec-common.md (cml4):
> Use
>   canonical order.
> 
> --- inline copy of patch ---
> 
> diff --git a/gcc/config/arm/vec-common.md b/gcc/config/arm/vec-
> common.md index
> e71d9b3811fde62159f5c21944fef9fe3f97b4bd..eab77ac8decce76d70f5b2594f
> 4439e6ed363e6e 100644
> --- a/gcc/config/arm/vec-common.md
> +++ b/gcc/config/arm/vec-common.md
> @@ -265,18 +265,18 @@ (define_expand "arm_vcmla"
>  ;; remainder.  Because of this, expand early.
>  (define_expand "cml4"
>[(set (match_operand:VF 0 "register_operand")
> - (plus:VF (match_operand:VF 1 "register_operand")
> -  (unspec:VF [(match_operand:VF 2 "register_operand")
> -  (match_operand:VF 3 "register_operand")]
> - VCMLA_OP)))]
> + (plus:VF (unspec:VF [(match_operand:VF 1 "register_operand")
> +  (match_operand:VF 2 "register_operand")]
> + VCMLA_OP)
> +  (match_operand:VF 3 "register_operand")))]
>"(TARGET_COMPLEX || (TARGET_HAVE_MVE &&
> TARGET_HAVE_MVE_FLOAT
> && ARM_HAVE__ARITH))
> && !BYTES_BIG_ENDIAN"
>  {
>rtx tmp = gen_reg_rtx (mode);
> -  emit_insn (gen_arm_vcmla (tmp, operands[1],
> -  operands[3], operands[2]));
> +  emit_insn (gen_arm_vcmla (tmp, operands[3],
> +  operands[2], operands[1]));
>emit_insn (gen_arm_vcmla (operands[0], tmp,
> -  operands[3], operands[2]));
> +  operands[2], operands[1]));
>DONE;
>  })



RE: [2/3 PATCH]AArch64 use canonical ordering for complex mul, fma and fms

2022-01-10 Thread Tamar Christina via Gcc-patches
ping

> -Original Message-
> From: Tamar Christina
> Sent: Monday, December 20, 2021 4:21 PM
> To: Richard Sandiford 
> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw 
> ; Marcus Shawcroft 
> ; Kyrylo Tkachov 
> Subject: RE: [2/3 PATCH]AArch64 use canonical ordering for complex 
> mul, fma and fms
> 
> 
> 
> > -Original Message-
> > From: Richard Sandiford 
> > Sent: Friday, December 17, 2021 4:49 PM
> > To: Tamar Christina 
> > Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw 
> > ; Marcus Shawcroft 
> > ; Kyrylo Tkachov
> 
> > Subject: Re: [2/3 PATCH]AArch64 use canonical ordering for complex 
> > mul, fma and fms
> >
> > Richard Sandiford  writes:
> > > Tamar Christina  writes:
> > >> Hi All,
> > >>
> > >> After the first patch in the series this updates the optabs to 
> > >> expect the canonical sequence.
> > >>
> > >> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > >>
> > >> Ok for master? and backport along with the first patch?
> > >>
> > >> Thanks,
> > >> Tamar
> > >>
> > >> gcc/ChangeLog:
> > >>
> > >>  PR tree-optimization/102819
> > >>  PR tree-optimization/103169
> > >>  * config/aarch64/aarch64-simd.md
> > (cml4,
> > >>  cmul3): Use canonical order.
> > >>  * config/aarch64/aarch64-sve.md (cml4,
> > >>  cmul3): Likewise.
> > >>
> > >> --- inline copy of patch --
> > >> diff --git a/gcc/config/aarch64/aarch64-simd.md
> > >> b/gcc/config/aarch64/aarch64-simd.md
> > >> index
> > >>
> >
> f95a7e1d91c97c9e981d75e71f0b49c02ef748ba..875896ee71324712c8034eeff9
> > c
> > >> fb5649f9b0e73 100644
> > >> --- a/gcc/config/aarch64/aarch64-simd.md
> > >> +++ b/gcc/config/aarch64/aarch64-simd.md
> > >> @@ -556,17 +556,17 @@ (define_insn
> > "aarch64_fcmlaq_lane"
> > >>  ;; remainder.  Because of this, expand early.
> > >>  (define_expand "cml4"
> > >>[(set (match_operand:VHSDF 0 "register_operand")
> > >> -(plus:VHSDF (match_operand:VHSDF 1 "register_operand")
> > >> -(unspec:VHSDF [(match_operand:VHSDF 2
> > "register_operand")
> > >> -   (match_operand:VHSDF 3
> > "register_operand")]
> > >> -   FCMLA_OP)))]
> > >> +(plus:VHSDF (unspec:VHSDF [(match_operand:VHSDF 1
> > "register_operand")
> > >> +   (match_operand:VHSDF 2
> > "register_operand")]
> > >> +   FCMLA_OP)
> > >> +(match_operand:VHSDF 3 "register_operand")))]
> > >>"TARGET_COMPLEX && !BYTES_BIG_ENDIAN"
> > >>  {
> > >>rtx tmp = gen_reg_rtx (mode);
> > >> -  emit_insn (gen_aarch64_fcmla (tmp, operands[1],
> > >> - operands[3], 
> > >> operands[2]));
> > >> +  emit_insn (gen_aarch64_fcmla (tmp, operands[3],
> > >> + operands[1], 
> > >> operands[2]));
> > >>emit_insn (gen_aarch64_fcmla (operands[0], tmp,
> > >> - operands[3], 
> > >> operands[2]));
> > >> + operands[1], 
> > >> operands[2]));
> > >>DONE;
> > >>  })
> > >>
> > >> @@ -583,9 +583,9 @@ (define_expand "cmul3"
> > >>rtx tmp = force_reg (mode, CONST0_RTX (mode));
> > >>rtx res1 = gen_reg_rtx (mode);
> > >>emit_insn (gen_aarch64_fcmla (res1, tmp,
> > >> - operands[2], 
> > >> operands[1]));
> > >> + operands[1], 
> > >> operands[2]));
> > >>emit_insn (gen_aarch64_fcmla (operands[0], res1,
> > >> - operands[2], 
> > >> operands[1]));
> > >> + operands[1], 
> > >> operands[2]));
> > >
> > > This doesn't look right.  Going from the documentation, patch 1 
> > > isn't changing the operand order for CMUL: the conjugated operand 
> > > (if there is one) is still operand 2.  The FCMLA sequences use the 
> > > opposite order, where the conjugated operand (if there is one) is
> operand 1.
> > > So I think
> >
> > I meant “the first multiplication operand” rather than “operand 1” here.
> >
> > > the reversal here is still needed.
> > >
> > > Same for the multiplication operands in CML* above.
> 
> I did actually change the order in patch 1, but didn't update the docs.
> That was done because I followed the SLP order again, but now I've
> updated them to do what the docs say.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master? and backport along with the first patch?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   PR tree-optimization/102819
>   PR tree-optimization/103169
>   * config/aarch64/aarch64-simd.md
> (cml4): Use
>   canonical order.
>   * config/aarch64/aarch64-sve.md (cml4):
> Likewise.
> 
> --- inline copy of patch ---
> 
> diff --git a/gcc/config/aarch64/aarch64-simd.md
> b/gcc/config/aarch64/aa

RE: [PATCH][AArch32]: correct usdot-product RTL patterns.

2022-01-10 Thread Tamar Christina via Gcc-patches
ping

> -Original Message-
> From: Tamar Christina
> Sent: Tuesday, December 21, 2021 12:32 PM
> To: gcc-patches@gcc.gnu.org
> Cc: nd ; Ramana Radhakrishnan
> ; Richard Earnshaw
> ; ni...@redhat.com; Kyrylo Tkachov
> 
> Subject: [PATCH][AArch32]: correct usdot-product RTL patterns.
> 
> Hi All,
> 
> There was a bug in the ACLE specification for dot product which has now been
> fixed [1].  This means some intrinsics were missing and are added by this
> patch.
> 
> Bootstrapped and regtested on arm-none-linux-gnueabihf and no issues.
> 
> Ok for master?
> 
> [1] https://github.com/ARM-software/acle/releases/tag/r2021Q3
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   * config/arm/arm_neon.h (vusdotq_s32, vusdot_laneq_s32,
>   vusdotq_laneq_s32, vsudot_laneq_s32, vsudotq_laneq_s32): New.
>   * config/arm/arm_neon_builtins.def (usdot): Add V16QI.
>   (usdot_laneq, sudot_laneq): New.
>   * config/arm/neon.md (neon_dot_laneq): New.
>   (neon_dot_lane): Remove unneeded code.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/arm/simd/vdot-2-1.c: Add new tests.
>   * gcc.target/arm/simd/vdot-2-2.c: Likewise and fix output.
> 
> --- inline copy of patch --
> diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h index
> af6ac63dc3b47830d92f199d93153ff510f658e9..2255d600549a2a1e5dbcebc03f
> 7d6a63bab9f5aa 100644
> --- a/gcc/config/arm/arm_neon.h
> +++ b/gcc/config/arm/arm_neon.h
> @@ -18930,6 +18930,13 @@ vusdot_s32 (int32x2_t __r, uint8x8_t __a,
> int8x8_t __b)
>return __builtin_neon_usdotv8qi_ssus (__r, __a, __b);  }
> 
> +__extension__ extern __inline int32x4_t __attribute__
> +((__always_inline__, __gnu_inline__, __artificial__))
> +vusdotq_s32 (int32x4_t __r, uint8x16_t __a, int8x16_t __b) {
> +  return __builtin_neon_usdotv16qi_ssus (__r, __a, __b); }
> +
>  __extension__ extern __inline int32x2_t  __attribute__
> ((__always_inline__, __gnu_inline__, __artificial__))
>  vusdot_lane_s32 (int32x2_t __r, uint8x8_t __a, @@ -18962,6 +18969,38 @@
> vsudotq_lane_s32 (int32x4_t __r, int8x16_t __a,
>return __builtin_neon_sudot_lanev16qi_sssus (__r, __a, __b, __index);  }
> 
> +__extension__ extern __inline int32x2_t __attribute__
> +((__always_inline__, __gnu_inline__, __artificial__))
> +vusdot_laneq_s32 (int32x2_t __r, uint8x8_t __a,
> +   int8x16_t __b, const int __index)
> +{
> +  return __builtin_neon_usdot_laneqv8qi_ssuss (__r, __a, __b, __index);
> +}
> +
> +__extension__ extern __inline int32x4_t __attribute__
> +((__always_inline__, __gnu_inline__, __artificial__))
> +vusdotq_laneq_s32 (int32x4_t __r, uint8x16_t __a,
> +int8x16_t __b, const int __index)
> +{
> +  return __builtin_neon_usdot_laneqv16qi_ssuss (__r, __a, __b,
> +__index); }
> +
> +__extension__ extern __inline int32x2_t __attribute__
> +((__always_inline__, __gnu_inline__, __artificial__))
> +vsudot_laneq_s32 (int32x2_t __r, int8x8_t __a,
> +   uint8x16_t __b, const int __index)
> +{
> +  return __builtin_neon_sudot_laneqv8qi_sssus (__r, __a, __b, __index);
> +}
> +
> +__extension__ extern __inline int32x4_t __attribute__
> +((__always_inline__, __gnu_inline__, __artificial__))
> +vsudotq_laneq_s32 (int32x4_t __r, int8x16_t __a,
> +uint8x16_t __b, const int __index) {
> +  return __builtin_neon_sudot_laneqv16qi_sssus (__r, __a, __b,
> +__index); }
> +
>  #pragma GCC pop_options
> 
>  #pragma GCC pop_options
> diff --git a/gcc/config/arm/arm_neon_builtins.def
> b/gcc/config/arm/arm_neon_builtins.def
> index
> f83dd4327c16c0af68f72eb6d9ca8cf21e2e56b5..1c150ed3b650a003b44901b4d
> 160a7d6f595f057 100644
> --- a/gcc/config/arm/arm_neon_builtins.def
> +++ b/gcc/config/arm/arm_neon_builtins.def
> @@ -345,9 +345,11 @@ VAR2 (UMAC_LANE, udot_lane, v8qi, v16qi)
>  VAR2 (MAC_LANE, sdot_laneq, v8qi, v16qi)
>  VAR2 (UMAC_LANE, udot_laneq, v8qi, v16qi)
> 
> -VAR1 (USTERNOP, usdot, v8qi)
> +VAR2 (USTERNOP, usdot, v8qi, v16qi)
>  VAR2 (USMAC_LANE_QUADTUP, usdot_lane, v8qi, v16qi)
>  VAR2 (SUMAC_LANE_QUADTUP, sudot_lane, v8qi, v16qi)
> +VAR2 (USMAC_LANE_QUADTUP, usdot_laneq, v8qi, v16qi)
> +VAR2 (SUMAC_LANE_QUADTUP, sudot_laneq, v8qi, v16qi)
> 
>  VAR4 (BINOP, vcadd90, v4hf, v2sf, v8hf, v4sf)
>  VAR4 (BINOP, vcadd270, v4hf, v2sf, v8hf, v4sf) diff --git
> a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md index
> 848166311b5f82c5facb66e97c2260a5aba5d302..1707d8e625079b83497a3db44
> db5e33405bb5fa1 100644
> --- a/gcc/config/arm/neon.md
> +++ b/gcc/config/arm/neon.md
> @@ -2977,9 +2977,33 @@ (define_insn "neon_dot_lane"
>   DOTPROD_I8MM)
> (match_operand:VCVTI 1 "register_operand" "0")))]
>"TARGET_I8MM"
> +  "vdot.\\t%0, %2, %P3[%c4]"
> +  [(set_attr "type" "neon_dot")]
> +)
> +
> +;; These instructions map to the __builtins for the Dot Product ;;
> +indexed operations in the v8.6 I8MM extension.
> +(define_insn "neon_dot_laneq"
> +  [(set (match_operand:VCVTI 0 "register_operand" "=w")
> + (plus:VCVTI
> +   (unspec:VCVTI [(match_op

RE: [AArch32]: correct dot-product RTL patterns.

2022-01-10 Thread Tamar Christina via Gcc-patches
ping

> -Original Message-
> From: Tamar Christina
> Sent: Tuesday, December 21, 2021 12:31 PM
> To: gcc-patches@gcc.gnu.org
> Cc: nd ; Ramana Radhakrishnan
> ; Richard Earnshaw
> ; ni...@redhat.com; Kyrylo Tkachov
> 
> Subject: [AArch32]: correct dot-product RTL patterns.
> 
> Hi All,
> 
> The previous fix for this problem was wrong due to a subtle difference
> between where NEON expects the RMW values and where the intrinsics expect
> them.
>
> The insn pattern is modeled after the intrinsics and so needs an expand for
> the vectorizer optab to switch the RTL.
>
> However, operands[3] is not expected to be written to, so the current
> pattern is bogus.
>
> Instead we use the expand to shuffle around the RTL.
>
> The vectorizer expects operands[3] and operands[0] to be the same, but the
> aarch64 intrinsics expanders expect operands[0] and operands[1] to be the
> same.
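>
> Schematically (an illustrative sketch with made-up names, not the literal
> pattern from this patch), the optab expand therefore just forwards to the
> intrinsic-style insn with the accumulator moved into the slot that pattern
> expects:
>
>   /* Vectorizer order: operands[1]/[2] are the multiplicands and
>      operands[3] is the accumulator tied to operands[0].  The insn is
>      written in intrinsic order, with the accumulator as operand 1.  */
>   emit_insn (gen_neon_udot (operands[0], operands[3],
>                             operands[1], operands[2]));
>   DONE;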
> 
> This also fixes some issues with big-endian: each dot product performs 4
> 8-bit multiplications.  However, unlike AArch64, on AArch32 we don't enter
> lanes GCC-lane-indexed aside from loads/stores.  This means no lane
> remappings are done in arm-builtins.c, and so none should be done on the
> instruction side.
>
> There are some other instructions that need inspection, as I think there are
> more incorrect ones.
>
> Thirdly, there was a bug in the ACLE specification for dot product which has
> now been fixed [1].  This means some intrinsics were missing and are added
> by this patch.
> 
> Bootstrapped and regtested on arm-none-linux-gnueabihf and no issues.
> 
> Ok for master? and active branches after some stew?
> 
> [1] https://github.com/ARM-software/acle/releases/tag/r2021Q3
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   * config/arm/arm_neon.h (vdot_laneq_u32, vdotq_laneq_u32,
>   vdot_laneq_s32, vdotq_laneq_s32): New.
>   * config/arm/arm_neon_builtins.def (sdot_laneq, udot_laneq): New.
>   * config/arm/neon.md (neon_dot): New.
>   (dot_prod): Re-order rtl.
>   (neon_dot_lane): Fix rtl order and endianness.
>   (neon_dot_laneq): New.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.target/arm/simd/vdot-compile.c: Add new cases.
>   * gcc.target/arm/simd/vdot-exec.c: Likewise.
> 
> --- inline copy of patch --
> diff --git a/gcc/config/arm/arm_neon.h b/gcc/config/arm/arm_neon.h index
> 3364b37f69dfc33082388246c03149d9ad66a634..af6ac63dc3b47830d92f199d93
> 153ff510f658e9 100644
> --- a/gcc/config/arm/arm_neon.h
> +++ b/gcc/config/arm/arm_neon.h
> @@ -18243,6 +18243,35 @@ vdotq_lane_s32 (int32x4_t __r, int8x16_t __a,
> int8x8_t __b, const int __index)
>return __builtin_neon_sdot_lanev16qi (__r, __a, __b, __index);  }
> 
> +__extension__ extern __inline uint32x2_t __attribute__
> +((__always_inline__, __gnu_inline__, __artificial__))
> +vdot_laneq_u32 (uint32x2_t __r, uint8x8_t __a, uint8x16_t __b, const
> +int __index) {
> +  return __builtin_neon_udot_laneqv8qi_s (__r, __a, __b, __index);
> +}
> +
> +__extension__ extern __inline uint32x4_t __attribute__
> +((__always_inline__, __gnu_inline__, __artificial__))
> +vdotq_laneq_u32 (uint32x4_t __r, uint8x16_t __a, uint8x16_t __b,
> + const int __index)
> +{
> +  return __builtin_neon_udot_laneqv16qi_s (__r, __a, __b, __index);
> +}
> +
> +__extension__ extern __inline int32x2_t __attribute__
> +((__always_inline__, __gnu_inline__, __artificial__))
> +vdot_laneq_s32 (int32x2_t __r, int8x8_t __a, int8x16_t __b, const int
> +__index) {
> +  return __builtin_neon_sdot_laneqv8qi (__r, __a, __b, __index); }
> +
> +__extension__ extern __inline int32x4_t __attribute__
> +((__always_inline__, __gnu_inline__, __artificial__))
> +vdotq_laneq_s32 (int32x4_t __r, int8x16_t __a, int8x16_t __b, const int
> +__index) {
> +  return __builtin_neon_sdot_laneqv16qi (__r, __a, __b, __index); }
> +
>  #pragma GCC pop_options
>  #endif
> 
> diff --git a/gcc/config/arm/arm_neon_builtins.def
> b/gcc/config/arm/arm_neon_builtins.def
> index
> fafb5c6fc51c16679ead1afda7cccfea8264fd15..f83dd4327c16c0af68f72eb6d9ca
> 8cf21e2e56b5 100644
> --- a/gcc/config/arm/arm_neon_builtins.def
> +++ b/gcc/config/arm/arm_neon_builtins.def
> @@ -342,6 +342,8 @@ VAR2 (TERNOP, sdot, v8qi, v16qi)
>  VAR2 (UTERNOP, udot, v8qi, v16qi)
>  VAR2 (MAC_LANE, sdot_lane, v8qi, v16qi)
>  VAR2 (UMAC_LANE, udot_lane, v8qi, v16qi)
> +VAR2 (MAC_LANE, sdot_laneq, v8qi, v16qi)
> +VAR2 (UMAC_LANE, udot_laneq, v8qi, v16qi)
> 
>  VAR1 (USTERNOP, usdot, v8qi)
>  VAR2 (USMAC_LANE_QUADTUP, usdot_lane, v8qi, v16qi) diff --git
> a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md index
> 8b0a396947cc8e7345f178b926128d7224fb218a..848166311b5f82c5facb66e97c
> 2260a5aba5d302 100644
> --- a/gcc/config/arm/neon.md
> +++ b/gcc/config/arm/neon.md
> @@ -2866,20 +2866,49 @@ (define_expand "cmul3"
>  })
> 
> 
> -;; These instructions map to the __builtins for the Dot Product operations.
> -(define_insn "neon_dot"
> +;; These map to the auto-vectorizer Dot Product optab.
> +;; The auto-vectorizer expects

Re: [PATCH] PR tree-optimization/103821 - Prevent exponential range calculations.

2022-01-10 Thread Richard Biener via Gcc-patches
On Tue, Jan 11, 2022 at 12:28 AM Andrew MacLeod via Gcc-patches
 wrote:
>
> This test case demonstrates an unnoticed exponential situation in range-ops.
>
> We end up unrolling the  loop, and the pattern of code creates a set of
> cascading multiplies for which we can precisely evaluate them with
> sub-ranges.
>
> For instance, we calculated :
>
> _38 = int [8192, 8192][24576, 24576][40960, 40960][57344, 57344]
>
> so _38 has 4 sub-ranges, and then we calculate:
>
> _39 = _38 * _38;
>
> we do 16 sub-range multiplications and end up with:  int [67108864,
> 67108864][201326592, 201326592][335544320, 335544320][469762048,
> 469762048][603979776, 603979776][1006632960, 1006632960][1409286144,
> 1409286144][1677721600, 1677721600][+INF, +INF]
>
> This feeds other multiplies (_39 * _39)  and progresses rapidly to blow
> up the number of sub-ranges in subsequent operations.
>
> Folding of sub-ranges is an O(n*m) process. We perform the operation on
> each pair of sub-ranges and union them.   Values like _38 * _38 that
> continue feeding each other quickly become exponential.
>
> Then combining that with union (an inherently linear operation over the
> number of sub-ranges) at each step of the way adds an additional
> quadratic operation on top of the exponential factor.
>
> This patch adjusts the wi_fold routine to recognize when the calculation
> is moving in an exponential direction and simply produce a summary result
> instead of a precise one.  The attached patch does this if (#LH
> sub-ranges * #RH sub-ranges > 12)... then it just performs the operation
> with the lower and upper bound instead.  We could choose a different
> number, but that one seems to keep things under control, and allows us
> to process up to a 3x4 operation for precision (there is a testcase in
> the testsuite for this combination, gcc.dg/tree-ssa/pr61839_2.c).
> Longer term, we might want to adjust this routine to be slightly smarter
> than that, but this is a virtually zero-risk solution this late in the
> release cycle.
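>
> A minimal sketch of the guard (illustrative only, not the exact patch
> text), using the existing irange / wi_fold interfaces:
>
>   /* If a pairwise fold would be too expensive, summarize by folding
>      just the overall bounds of each operand.  */
>   if (lh.num_pairs () * rh.num_pairs () > 12)
>     {
>       wi_fold (r, type, lh.lower_bound (), lh.upper_bound (),
>                rh.lower_bound (), rh.upper_bound ());
>       return;
>     }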

I'm not sure we can do smarter in a good way other than maybe having
a range helper that reduces an N-component range to M components
while maintaining as much precision as possible?  Like for [1, 1] u [3, 3]
u [100, 100] and requesting at most 2 elements, merge [1, 1] and [3, 3]
and not [100, 100].  That should eventually be doable in O(n log n).

> This is also a general ~1% speedup in the VRP2 pass across 380 GCC
> source files, but I'm sure it has much more dramatic results at -O3,
> which this testcase exposes.
>
> Bootstraps on x86_64-pc-linux-gnu with no regressions. OK for trunk?

OK.

Thanks,
Richard.

>
> Andrew


[PATCH] testsuite: Fix regression on m32 by r12-6087 [PR103820]

2022-01-10 Thread Xionghu Luo via Gcc-patches
r12-6087 avoids moving a cold bb out of a hot loop, while the original
intent of this testcase is to hoist divides out of the loop and CSE them to
only one divide.  So increase the loop count to turn the cold bb into a hot
bb again.  Then the 3 divides can be rewritten with the same reciptmp.
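
For illustration (a schematic example, not the actual recip-3.c testcase),
this is the kind of rewrite the recip pass performs once the divides are hot:

  float
  f (float a, float b, float c, float d)
  {
    /* Before the recip pass: three divisions by the same value d.  */
    return a / d + b / d + c / d;
  }

  /* With -funsafe-math-optimizations the pass rewrites this as roughly:
       reciptmp = 1.0f / d;
       return a * reciptmp + b * reciptmp + c * reciptmp;  */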

Tested pass on Power-Linux {32,64}, x86 {64,32} and i686-linux, OK for
master?

gcc/testsuite/ChangeLog:

PR 103820
* gcc.dg/tree-ssa/recip-3.c: Adjust.
---
 gcc/testsuite/gcc.dg/tree-ssa/recip-3.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/recip-3.c 
b/gcc/testsuite/gcc.dg/tree-ssa/recip-3.c
index 641c91e719e..410b28044b4 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/recip-3.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/recip-3.c
@@ -1,7 +1,7 @@
 /* { dg-do compile } */
 /* { dg-options "-O1 -fno-trapping-math -funsafe-math-optimizations 
-fdump-tree-recip" } */
 
-double F[2] = { 0.0, 0.0 }, e;
+double F[5] = { 0.0, 0.0 }, e;
 
 /* In this case the optimization is interesting.  */
 float h ()
@@ -13,7 +13,7 @@ float h ()
d = 2.*e;
E = 1. - d;
 
-   for( i=0; i < 2; i++ )
+   for( i=0; i < 5; i++ )
if( d > 0.01 )
{
P = ( W < E ) ? (W - E)/d : (E - W)/d;
@@ -23,4 +23,4 @@ float h ()
F[0] += E / d;
 }
 
-/* { dg-final { scan-tree-dump-times " / " 5 "recip" } } */
+/* { dg-final { scan-tree-dump-times " / " 1 "recip" } } */
-- 
2.27.0.90.geebb51ba8c



Re: [PATCH 5/6] ira: Consider modelling caller-save allocations as loop spills

2022-01-10 Thread Hans-Peter Nilsson via Gcc-patches
> From: Richard Sandiford via Gcc-patches 
> Date: Thu, 6 Jan 2022 15:48:01 +0100

> If an allocno A in an inner loop L spans a call, a parent allocno AP
> can choose to handle a call-clobbered/caller-saved hard register R
> in one of two ways:
> 
> (1) save R before each call in L and restore R after each call
> (2) spill R to memory throughout L
> 
> (2) can be cheaper than (1) in some cases, particularly if L does
> not reference A.
> 
> Before the patch we always did (1).  The patch adds support for
> picking (2) instead, when it seems cheaper.  It builds on the
> earlier support for not propagating conflicts to parent allocnos.
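> 
> For illustration (a hypothetical example, not taken from the patch;
> compute, foo and use are placeholders), consider a value that is live
> across every call in a loop but never referenced inside it:
> 
>   extern int compute (void);
>   extern void foo (int);
>   extern void use (int);
> 
>   void
>   example (int n)
>   {
>     int x = compute ();       /* x is live across the loop's calls.  */
>     for (int i = 0; i < n; i++)
>       foo (i);                /* each call clobbers caller-saved regs.  */
>     use (x);
>   }
> 
> With (1), x's register is saved and restored around each of the n calls;
> with (2), x is spilled to memory once before the loop and reloaded once
> after it, which is cheaper for large n.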
> 
> gcc/
>   PR rtl-optimization/98782
>   * ira-int.h (ira_caller_save_cost): New function.
>   (ira_caller_save_loop_spill_p): Likewise.
>   * ira-build.c (ira_propagate_hard_reg_costs): Test whether it is
>   cheaper to spill a call-clobbered register throughout a loop rather
>   than spill it around each individual call.  If so, treat all
>   call-clobbered registers as conflicts and...
>   (propagate_allocno_info): ...do not propagate call information
>   from the child to the parent.
>   * ira-color.c (move_spill_restore): Update accordingly.
>   * ira-costs.c (ira_tune_allocno_costs): Use ira_caller_save_cost.

I bisected a broken build for cris-elf to this patch.
Details in https://gcc.gnu.org/PR103974 supposedly
sufficient to find a quick resolution.

(JFTR, as you're already CC:ed by your @gcc.gnu.org account.)

Perhaps some of these patches are better postponed for stage 1?

brgds, H-P


Re: [PATCH] rs6000: Remove useless code related to -mno-power10

2022-01-10 Thread Kewen.Lin via Gcc-patches
on 2022/1/11 8:26 AM, David Edelsohn wrote:
> On Wed, Dec 29, 2021 at 4:37 AM Kewen.Lin  wrote:
>>
>> Hi,
>>
>> Option -mpower10 was made as "WarnRemoved" since commit r11-2318,
>> so -mno-power10 doesn't take effect any more.  This patch is to
>> remove one line useless code which still respects it.
>>
>> Bootstrapped and regtested on powerpc64le-linux-gnu P9 and
>> powerpc64-linux-gnu P8.
>>
>> Is it ok for trunk?
>>
>> BR,
>> Kewen
>> -
>> gcc/ChangeLog:
>>
>> * config/rs6000/rs6000.c (rs6000_disable_incompatible_switches): Remove
>> useless code related to option -mno-power10.
>> ---
>>  gcc/config/rs6000/rs6000.c | 1 -
>>  1 file changed, 1 deletion(-)
>>
>> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
>> index e82a47f4c0e..66b01e589b0 100644
>> --- a/gcc/config/rs6000/rs6000.c
>> +++ b/gcc/config/rs6000/rs6000.c
>> @@ -24825,7 +24825,6 @@ rs6000_disable_incompatible_switches (void)
>>  const HOST_WIDE_INT dep_flags; /* flags that depend on this option. 
>>  */
>>  const char *const name;/* name of the switch.  */
>>} flags[] = {
>> -{ OPTION_MASK_POWER10, OTHER_POWER10_MASKS,"power10"   },
>>  { OPTION_MASK_P9_VECTOR,   OTHER_P9_VECTOR_MASKS,  "power9-vector" },
>>  { OPTION_MASK_P8_VECTOR,   OTHER_P8_VECTOR_MASKS,  "power8-vector" },
>>  { OPTION_MASK_VSX, OTHER_VSX_VECTOR_MASKS, "vsx"   },
> 
> Okay.
> 
> Thanks, David
> 

Thanks!  Pushed via r12-6429.

BR,
Kewen


Re: [PATCH] c++: Reject in constant evaluation address comparisons of start of one var and end of another [PR89074]

2022-01-10 Thread Andrew Pinski via Gcc-patches
On Mon, Jan 10, 2022 at 6:11 AM Richard Biener via Gcc-patches
 wrote:
>
> On Thu, Jan 6, 2022 at 10:25 AM Jakub Jelinek via Gcc-patches
>  wrote:
> >
> > Hi!
> >
> > The following testcase used to be incorrectly accepted.  The match.pd
> > optimization that uses address_compare punts on folding comparison
> > of start of one object and end of another one only when those addresses
> > are cast to integral types, when the comparison is done on pointer types
> > it assumes undefined behavior and decides to fold the comparison such
> > that the addresses don't compare equal even when they at runtime they
> > could be equal.
> > But C++ says it is undefined behavior and so during constant evaluation
> > we should reject those, so this patch adds !folding_initializer &&
> > check to that spot.
> >
> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> >
> > Note, address_compare has some special cases, e.g. it assumes that
> > static vars are never adjacent to automatic vars, which is the case
> > for the usual layout where automatic vars are on the stack and after
> > .rodata/.data sections there is heap:
> >   /* Assume that automatic variables can't be adjacent to global
> >  variables.  */
> >   else if (is_global_var (base0) != is_global_var (base1))
> > ;
> > Is it ok that during constant evaluation we don't treat those as undefined
> > behavior, or shall that be with !folding_initializer && too?
> >
> > Another special case is:
> >   if ((DECL_P (base0) && TREE_CODE (base1) == STRING_CST)
> >|| (TREE_CODE (base0) == STRING_CST && DECL_P (base1))
> >|| (TREE_CODE (base0) == STRING_CST
> >&& TREE_CODE (base1) == STRING_CST
> >&& ioff0 >= 0 && ioff1 >= 0
> >&& ioff0 < TREE_STRING_LENGTH (base0)
> >&& ioff1 < TREE_STRING_LENGTH (base1)
> >   /* This is a too conservative test that the STRING_CSTs
> >  will not end up being string-merged.  */
> >&& strncmp (TREE_STRING_POINTER (base0) + ioff0,
> >TREE_STRING_POINTER (base1) + ioff1,
> >MIN (TREE_STRING_LENGTH (base0) - ioff0,
> > TREE_STRING_LENGTH (base1) - ioff1)) != 0))
> > ;
> >   else if (!DECL_P (base0) || !DECL_P (base1))
> > return 2;
> > Here we similarly assume that vars aren't adjacent to string literals
> > or vice versa.  Do we need to stick !folding_initializer && to those
> > DECL_P vs. STRING_CST cases?  Though, because of the return 2; for
> > non-DECL_P that would mean rejecting comparisons like &var == &"foobar"[3]
> > etc. which ought to be fine, no?  So perhaps we need to watch for
> > decls. vs. STRING_CSTs like for DECLs whether the address is at the start
> > or at the end of the string literal or somewhere in between (at least
> > for folding_initializer)?
> > And yet another chapter but probably unsolvable is comparison of
> > string literal addresses.  I think pedantically in C++
> > &"foo"[0] == &"foo"[0] is undefined behavior, different occurences of
> > the same string literals might still not be merged in some implementations.
> > But constexpr const char *s = "foo"; &s[0] == &s[0] should be well defined,
> > and we aren't tracking anywhere whether the string literal was the same one
> > or different (and I think other compilers don't track that either).
>
> On my TODO list is to make &"foo" invalid and instead require &CONST_DECL
> (and DECL_INITIAL of it then being "foo"), that would make it possible to
> track the "original" string literal and perform string merging in a
> more considerate way.

Interesting because I wrote this would be one way to fix PR88925.

Thanks,
Andrew Pinski

>
> Richard.
>
> >
> > 2022-01-06  Jakub Jelinek  
> >
> > PR c++/89074
> > * fold-const.c (address_compare): Punt on comparison of address of
> > one object with address of end of another object if
> > folding_initializer.
> >
> > * g++.dg/cpp1y/constexpr-89074-1.C: New test.
> >
> > --- gcc/fold-const.c.jj 2022-01-05 20:30:08.731806756 +0100
> > +++ gcc/fold-const.c2022-01-05 20:34:52.277822349 +0100
> > @@ -16627,7 +16627,7 @@ address_compare (tree_code code, tree ty
> >/* If this is a pointer comparison, ignore for now even
> >   valid equalities where one pointer is the offset zero
> >   of one object and the other to one past end of another one.  */
> > -  else if (!INTEGRAL_TYPE_P (type))
> > +  else if (!folding_initializer && !INTEGRAL_TYPE_P (type))
> >  ;
> >/* Assume that automatic variables can't be adjacent to global
> >   variables.  */
> > --- gcc/testsuite/g++.dg/cpp1y/constexpr-89074-1.C.jj   2022-01-05 
> > 20:43:03.696917484 +0100
> > +++ gcc/testsuite/g++.dg/cpp1y/constexpr-89074-1.C  2022-01-05 
> > 20:42:12.676634044 +0100
> > @@ -0,0 +1,28 @@
> > +// PR c++/89074
> > +// { dg-do compile { target c++14 } }
> > +
> > +constexpr bool
> > +foo ()
> > +

Re: Ping^1 [PATCH, rs6000] new split pattern for TI to V1TI move [PR103124]

2022-01-10 Thread HAO CHEN GUI via Gcc-patches
Segher and David,

   Thanks for your explanation. I got it. The "\m" itself is a constraint 
escape.

Gui Haochen

On 11/1/2022 9:12 AM, Segher Boessenkool wrote:
> On Mon, Jan 10, 2022 at 06:09:01PM -0500, David Edelsohn wrote:
>> On Sun, Jan 9, 2022 at 10:16 PM HAO CHEN GUI  wrote:
 +/* { dg-final { scan-assembler-not "\mmr\M" } } */
>>
>> Segher probably would prefer {\mmr\M} .
> 
> Because that one works, and the one with double quotes doesn't, yes :-)
> 
> It is a scan-assembler-not so the testcase likely won't fail, but it is
> checking the wrong thing.  In double-quoted strings "\m" means the same
> as "m", and "\M" means the same as "M" (neither escape has any special
> meaning).  If you want the regex escapes in such a string, you need to
> escape the escapes, so write "\\m" and "\\M".  It is much simpler to not
> have backslash substitution on the strings at all, so to use {\m} etc.
> 
> 
> Segher


Re: [PATCH 0/2]: C N2653 char8_t implementation

2022-01-10 Thread Joseph Myers
Please repost these patches after GCC 12 branches (updated as appropriate 
depending on whether the feature is accepted at the two-week Jan/Feb WG14 
meeting, which doesn't yet have an agenda), since we're currently 
stabilizing for the release and so not considering new features.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: Ping: [PATCH] rs6000: powerpc suboptimal boolean test of contiguous bits [PR102239]

2022-01-10 Thread David Edelsohn via Gcc-patches
On Mon, Jan 10, 2022 at 12:37 AM Xionghu Luo  wrote:
>
> Ping, thanks.
>
>
> On 2021/12/13 13:16, Xionghu Luo wrote:
> > Add specialized version to combine two instructions from
> >
> >  9: {r123:CC=cmp(r124:DI&0x6,0);clobber scratch;}
> >REG_DEAD r124:DI
> >  10: pc={(r123:CC==0)?L15:pc}
> >   REG_DEAD r123:CC
> >
> > to:
> >
> >  10: {pc={(r123:DI&0x6==0)?L15:pc};clobber scratch;clobber %0:CC;}
> >
> > then split2 will split it into one rotate-dot instruction (saving one
> > rotate-back instruction), as the shifted result doesn't matter when
> > comparing to 0 in CCEQmode.
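> >
> > For reference, the kind of source this targets (an illustrative sketch,
> > not necessarily the pr102239.c testcase; bar is a placeholder) is a
> > boolean test of contiguous bits feeding a branch:
> >
> >   extern void bar (void);
> >
> >   void foo (long arg)
> >   {
> >     if (arg & 0x6)          /* test two contiguous bits */
> >       bar ();
> >   }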
> >
> > Bootstrapped and regression tested pass on Power 8/9/10, OK for master?
> >
> > gcc/ChangeLog:
> >
> >   PR target/102239
> >   * config/rs6000/rs6000.md (*anddi3_insn_dot): New.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   PR target/102239
> >   * gcc.target/powerpc/pr102239.c: New test.
> > ---
> >  gcc/config/rs6000/rs6000-protos.h   |  1 +
> >  gcc/config/rs6000/rs6000.c  |  7 
> >  gcc/config/rs6000/rs6000.md | 38 +
> >  gcc/testsuite/gcc.target/powerpc/pr102239.c | 13 +++
> >  4 files changed, 59 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr102239.c
> >
> > diff --git a/gcc/config/rs6000/rs6000-protos.h 
> > b/gcc/config/rs6000/rs6000-protos.h
> > index 14f6b313105..3644c524376 100644
> > --- a/gcc/config/rs6000/rs6000-protos.h
> > +++ b/gcc/config/rs6000/rs6000-protos.h
> > @@ -73,6 +73,7 @@ extern int expand_block_move (rtx[], bool);
> >  extern bool expand_block_compare (rtx[]);
> >  extern bool expand_strn_compare (rtx[], int);
> >  extern bool rs6000_is_valid_mask (rtx, int *, int *, machine_mode);
> > +extern bool rs6000_is_valid_rotate_dot_mask (rtx mask, machine_mode mode);
> >  extern bool rs6000_is_valid_and_mask (rtx, machine_mode);
> >  extern bool rs6000_is_valid_shift_mask (rtx, rtx, machine_mode);
> >  extern bool rs6000_is_valid_insert_mask (rtx, rtx, machine_mode);
> > diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> > index 5e129986516..57a38cf954a 100644
> > --- a/gcc/config/rs6000/rs6000.c
> > +++ b/gcc/config/rs6000/rs6000.c
> > @@ -11606,6 +11606,13 @@ rs6000_is_valid_mask (rtx mask, int *b, int *e, 
> > machine_mode mode)
> >return true;
> >  }
> >
> > +bool
> > +rs6000_is_valid_rotate_dot_mask (rtx mask, machine_mode mode)
> > +{
> > +  int nb, ne;
> > +  return rs6000_is_valid_mask (mask, &nb, &ne, mode) && nb >= ne && ne > 0;
> > +}
> > +
> >  /* Return whether MASK (a CONST_INT) is a valid mask for any rlwinm, 
> > rldicl,
> > or rldicr instruction, to implement an AND with it in mode MODE.  */
> >
> > diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> > index 6bec2bddbde..014dc9612ea 100644
> > --- a/gcc/config/rs6000/rs6000.md
> > +++ b/gcc/config/rs6000/rs6000.md
> > @@ -3762,6 +3762,44 @@ (define_insn_and_split "*and3_2insn_dot2"
> > (set_attr "dot" "yes")
> > (set_attr "length" "8,12")])
> >
> > +(define_insn_and_split "*anddi3_insn_dot"

This pattern needs a name that better represents its purpose.  The
pattern name implies that it's operating on a combination of AND and
Record Condition bit.  Also "insn" is confusing; I think that you are
using the template from the 2insn_dot names, so this should explicitly
be 1insn. Maybe "branch_anddi3_1insn_dot", or just
"branch_anddi3_dot".

> > + [(set (pc)
> > +(if_then_else (eq (and:DI (match_operand:DI 1 "gpc_reg_operand" "%r,r")
> > +   (match_operand:DI 2 "const_int_operand" "n,n"))
> > +   (const_int 0))
> > +   (label_ref (match_operand 3 ""))
> > +   (pc)))
> > +  (clobber (match_scratch:DI 0 "=r,r"))
> > +  (clobber (reg:CC CR0_REGNO))]
> > +  "rs6000_is_valid_rotate_dot_mask (operands[2], DImode)
> > +  && TARGET_POWERPC64"
> > +  "#"
> > +  "&& reload_completed"
> > +  [(pc)]
> > +{
> > +   int nb, ne;
> > +   if (rs6000_is_valid_mask (operands[2], &nb, &ne, DImode)
> > +   && nb >= ne
> > +   && ne > 0)
> > + {
> > + unsigned HOST_WIDE_INT val = INTVAL (operands[2]);
> > + int shift = 63 - nb;
> > + rtx tmp = gen_rtx_ASHIFT (DImode, operands[1], GEN_INT (shift));
> > + tmp = gen_rtx_AND (DImode, tmp, GEN_INT (val << shift));
> > + rtx cr0 = gen_rtx_REG (CCmode, CR0_REGNO);
> > + rs6000_emit_dot_insn (operands[0], tmp, 1, cr0);
> > + rtx loc_ref = gen_rtx_LABEL_REF (VOIDmode, operands[3]);
> > + rtx cond = gen_rtx_EQ (CCEQmode, cr0, const0_rtx);
> > + rtx ite = gen_rtx_IF_THEN_ELSE (VOIDmode, cond, loc_ref, pc_rtx);
> > + emit_jump_insn (gen_rtx_SET (pc_rtx, ite));
> > + DONE;
> > + }
> > +   else
> > + FAIL;
> > +}
> > +  [(set_attr "type" "shift")
> > +   (set_attr "dot" "yes")
> > +   (set_attr "length" "8,12")])
> >
> >  (define_expand "3"
> >[(set (match_operand:SDI 0 "gpc_reg_operand")
> > diff --git a/gcc/test

Re: [PATCH] [i386] Remove register restriction on operands for andnot insn

2022-01-10 Thread Hongtao Liu via Gcc-patches
On Mon, Jan 10, 2022 at 3:21 PM Jiang, Haochen  wrote:
>
> Hi Hongtao,
>
> I have changed that message in this patch. Ok for trunk?
Ok.
>
> Thx,
> Haochen
>
> -Original Message-
> From: Hongtao Liu 
> Sent: Monday, January 10, 2022 3:25 PM
> To: Jiang, Haochen 
> Cc: GCC Patches ; Liu, Hongtao 
> 
> Subject: Re: [PATCH] [i386] Remove register restriction on operands for 
> andnot insn
>
> On Mon, Jan 10, 2022 at 2:23 PM Haochen Jiang via Gcc-patches 
>  wrote:
> >
> > Hi all,
> >
> > This patch removes the register restriction on operands for andnot insn so 
> > that it can be used from memory.
> >
> > Regtested on x86_64-pc-linux-gnu. Ok for trunk?
> >
> > BRs,
> > Haochen
> >
> > gcc/ChangeLog:
> >
> > PR target/53652
> > * config/i386/sse.md (*andnot3): Remove register restriction.
> It should be "Extend predicate of operands[1] from register_operand to 
> vector_operand".
> Similar for your commit message.
> >
> > gcc/testsuite/ChangeLog:
> >
> > PR target/53652
> > * gcc.target/i386/pr53652-1.c: New test.
> > ---
> >  gcc/config/i386/sse.md|  2 +-
> >  gcc/testsuite/gcc.target/i386/pr53652-1.c | 16 
> >  2 files changed, 17 insertions(+), 1 deletion(-)  create mode 100644
> > gcc/testsuite/gcc.target/i386/pr53652-1.c
> >
> > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index
> > 0997d9edf9d..4448b875d35 100644
> > --- a/gcc/config/i386/sse.md
> > +++ b/gcc/config/i386/sse.md
> > @@ -16630,7 +16630,7 @@
> >  (define_insn "*andnot3"
> >[(set (match_operand:VI 0 "register_operand" "=x,x,v")
> > (and:VI
> > - (not:VI (match_operand:VI 1 "register_operand" "0,x,v"))
> > + (not:VI (match_operand:VI 1 "vector_operand" "0,x,v"))
> >   (match_operand:VI 2 "bcst_vector_operand" "xBm,xm,vmBr")))]
> >"TARGET_SSE"
> >  {
> > diff --git a/gcc/testsuite/gcc.target/i386/pr53652-1.c
> > b/gcc/testsuite/gcc.target/i386/pr53652-1.c
> > new file mode 100644
> > index 000..bd07ee29f4d
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/i386/pr53652-1.c
> > @@ -0,0 +1,16 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2 -msse2" } */
> > +/* { dg-final { scan-assembler-times "pandn\[ \\t\]" 2 } } */
> > +/* { dg-final { scan-assembler-not "vpternlogq\[ \\t\]" } } */
> > +
> > +typedef unsigned long long vec __attribute__((vector_size (16))); vec
> > +g; vec f1 (vec a, vec b) {
> > +  return ~a&b;
> > +}
> > +vec f2 (vec a, vec b)
> > +{
> > +  return ~g&b;
> > +}
> > +
> > --
> > 2.18.1
> >
>
>
> --
> BR,
> Hongtao



-- 
BR,
Hongtao


Re: Ping^1 [PATCH, rs6000] new split pattern for TI to V1TI move [PR103124]

2022-01-10 Thread Segher Boessenkool
On Mon, Jan 10, 2022 at 06:09:01PM -0500, David Edelsohn wrote:
> On Sun, Jan 9, 2022 at 10:16 PM HAO CHEN GUI  wrote:
> > > +/* { dg-final { scan-assembler-not "\mmr\M" } } */
> 
> Segher probably would prefer {\mmr\M} .

Because that one works, and the one with double quotes doesn't, yes :-)

It is a scan-assembler-not so the testcase likely won't fail, but it is
checking the wrong thing.  In double-quoted strings "\m" means the same
as "m", and "\M" means the same as "M" (neither escape has any special
meaning).  If you want the regex escapes in such a string, you need to
escape the escapes, so write "\\m" and "\\M".  It is much simpler to not
have backslash substitution on the strings at all, so to use {\m} etc.


Segher


Re: [PATCH v5 2/4] tree-object-size: Handle function parameters

2022-01-10 Thread Siddhesh Poyarekar

On 10/01/2022 16:20, Jakub Jelinek wrote:

On Sat, Dec 18, 2021 at 06:05:09PM +0530, Siddhesh Poyarekar wrote:

@@ -1440,6 +1441,53 @@ cond_expr_object_size (struct object_size_info *osi, 
tree var, gimple *stmt)
return reexamine;
  }
  
+/* Find size of an object passed as a parameter to the function.  */

+
+static void
+parm_object_size (struct object_size_info *osi, tree var)
+{
+  int object_size_type = osi->object_size_type;
+  tree parm = SSA_NAME_VAR (var);
+
+  if (!(object_size_type & OST_DYNAMIC) || !POINTER_TYPE_P (TREE_TYPE (parm)))
+expr_object_size (osi, var, parm);


This looks very suspicious.  Didn't you mean { expr_object_size (...); return; }
here?
Because the code below e.g. certainly assumes OST_DYNAMIC and that
TREE_TYPE (parm) is a pointer type (otherwise TREE_TYPE (TREE_TYPE (...))
wouldn't work.


Indeed, fixed.




+
+  /* Look for access attribute.  */
+  rdwr_map rdwr_idx;
+
+  tree fndecl = cfun->decl;
+  const attr_access *access = get_parm_access (rdwr_idx, parm, fndecl);
+  tree typesize = TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (parm)));
+  tree sz = NULL_TREE;
+
+  if (access && access->sizarg != UINT_MAX)


Perhaps && typesize here?  It makes no sense to e.g. create ssa default def
when you aren't going to use it in any way.


The typesize is only for scaling; the result of 
get_or_create_ssa_default_def should get returned unscaled if it is 
non-NULL and typesize is NULL; the latter happens when the type is void *:


  sz = get_or_create_ssa_default_def (cfun, arg);
  if (sz != NULL_TREE)
{
  sz = fold_convert (sizetype, sz);
  if (typesize)
sz = size_binop (MULT_EXPR, sz, typesize);
}
}




+{
+  tree fnargs = DECL_ARGUMENTS (fndecl);
+  tree arg = NULL_TREE;
+  unsigned argpos = 0;
+
+  /* Walk through the parameters to pick the size parameter and safely
+scale it by the type size.  */
+  for (arg = fnargs; argpos != access->sizarg && arg;
+  arg = TREE_CHAIN (arg), ++argpos);


Instead of a loop with empty body wouldn't it be better to
do the work in that for loop?
I.e. take argpos != access->sizarg && from the condition,
replace arg != NULL_TREE with that argpos == access->sizarg
and add a break;?


Fixed.




+
+  if (arg != NULL_TREE && INTEGRAL_TYPE_P (TREE_TYPE (arg)))
+   {
+ sz = get_or_create_ssa_default_def (cfun, arg);


Also, I must say I'm a little bit worried about this
get_or_create_ssa_default_def call.  If the SSA_NAME doesn't exist,
so you create it and then attempt to use it but in the end don't
because e.g. some PHI's another argument was unknown etc., will
that SSA_NAME be released through release_ssa_name?
I think GIMPLE is fairly unhappy if there are SSA_NAMEs created and not
released that don't appear in the IL anywhere.


AFAICT, set_ssa_default_def ends up creating a definition for the new
SSA_NAME it creates, so it does end up in the IR and in case of object 
size computation failure, it just ends up being a dead store.  I've 
added a test to verify this:


size_t
__attribute__ ((access (__read_write__, 1, 3)))
__attribute__ ((noinline))
test_parmsz_unknown (void *obj, void *unknown, size_t sz, int cond)
{
  return __builtin_dynamic_object_size (cond ? obj : unknown, 0);
}

which works as expected and returns -1.

Thanks,
Siddhesh


Re: [PATCH] rs6000: Remove useless code related to -mno-power10

2022-01-10 Thread David Edelsohn via Gcc-patches
On Wed, Dec 29, 2021 at 4:37 AM Kewen.Lin  wrote:
>
> Hi,
>
> Option -mpower10 was made as "WarnRemoved" since commit r11-2318,
> so -mno-power10 doesn't take effect any more.  This patch is to
> remove one line useless code which still respects it.
>
> Bootstrapped and regtested on powerpc64le-linux-gnu P9 and
> powerpc64-linux-gnu P8.
>
> Is it ok for trunk?
>
> BR,
> Kewen
> -
> gcc/ChangeLog:
>
> * config/rs6000/rs6000.c (rs6000_disable_incompatible_switches): Remove
> useless code related to option -mno-power10.
> ---
>  gcc/config/rs6000/rs6000.c | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index e82a47f4c0e..66b01e589b0 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -24825,7 +24825,6 @@ rs6000_disable_incompatible_switches (void)
>  const HOST_WIDE_INT dep_flags; /* flags that depend on this option.  
> */
>  const char *const name;/* name of the switch.  */
>} flags[] = {
> -{ OPTION_MASK_POWER10, OTHER_POWER10_MASKS,"power10"   },
>  { OPTION_MASK_P9_VECTOR,   OTHER_P9_VECTOR_MASKS,  "power9-vector" },
>  { OPTION_MASK_P8_VECTOR,   OTHER_P8_VECTOR_MASKS,  "power8-vector" },
>  { OPTION_MASK_VSX, OTHER_VSX_VECTOR_MASKS, "vsx"   },

Okay.

Thanks, David


[Patch][V3][Patch 2/2]Enable -Wuninitialized + -ftrivial-auto-var-init for address taken variables.

2022-01-10 Thread Qing Zhao via Gcc-patches
Hi, Richard,

This is the second patch, which is mainly the change for "Enable
-Wuninitialized + -ftrivial-auto-var-init for address-taken variables".

Please see the detailed description below for the problem and solution of this 
patch.

This patch has been bootstrapped and regression tested on both X86 and aarch64.

Okay for GCC12?

thanks.

Qing.

=
Enable -Wuninitialized + -ftrivial-auto-var-init for address taken variables.

With -ftrivial-auto-var-init, an address-taken auto variable is replaced with
a temporary variable during gimplification, and the original auto variable
might be eliminated by compiler optimization completely. As a result, the
current uninitialized warning analysis cannot get enough information from the
IR, and therefore uninitialized warnings for address-taken variables cannot be
issued based on the current implementation of -ftrivial-auto-var-init.

For more info please refer to:
https://gcc.gnu.org/pipermail/gcc-patches/2021-August/577431.html

In order to improve this situation, we can improve uninitialized analysis
for address taken auto variables with -ftrivial-auto-var-init as following:

for the following stmt:

_1 = .DEFERRED_INIT (4, 2, &"alt_reloc"[0]);

The original user variable has been eliminated from the IR; the LHS is the
temporary variable that was created to replace it.  We will get the necessary
information from this stmt for reporting the warning message:

A. the name of the DECL from the 3rd parameter of the call;
B. the location of the DECL from the location of the call;
C. the LHS is used to hold the information on whether the warning
   has been issued or not, to suppress warning messages when needed;

The current test cases for uninitialized warnings + -ftrivial-auto-var-init
are adjusted to reflect the fact that we can issue warnings for address-taken
variables.

gcc/ChangeLog:

2022-01-10  qing zhao  

* tree-ssa-uninit.c (warn_uninit): Handle .DEFERRED_INIT call with an
anonymous SSA_NAME specially.
(check_defs): Likewise.
(warn_uninit_phi_uses): Adjust the message format for warn_uninit.
(warn_uninitialized_vars): Likewise.
(warn_uninitialized_phi): Likewise.

gcc/testsuite/ChangeLog:

2022-01-10  qing zhao  

* gcc.dg/auto-init-uninit-16.c (testfunc): Delete xfail to reflect
the fact that address taken variable can be warned.
* gcc.dg/auto-init-uninit-34.c (warn_scalar_1): Likewise.
(warn_scalar_2): Likewise.
* gcc.dg/auto-init-uninit-37.c (T1): Likewise.
(T2): Likewise.
* gcc.dg/auto-init-uninit-B.c (baz): Likewise.
—

The complete patch is attached:





0002-Enable-Wuninitialized-ftrivial-auto-var-init-for-add.patch
Description: 0002-Enable-Wuninitialized-ftrivial-auto-var-init-for-add.patch


[Patch][V3][Patch 1/2]Change the 3rd parameter of function .DEFERRED_INIT from IS_VLA to decl name

2022-01-10 Thread Qing Zhao via Gcc-patches
Hi, Richard,

I split the previous patch for "Enable -Wuninitialized +
-ftrivial-auto-var-init for address taken variables" into two separate patches.
This is the first one.

This first patch fixes (or works around) PR103720; therefore it is an
important change and needs to go into GCC12.
At the same time, this patch prepares for the second patch, which will
actually enable -Wuninitialized + -ftrivial-auto-var-init for address-taken
variables.

The reason I separated the previous patch into two is that most of the previous
concerns were about the second part of the patch (the change in
tree-ssa-uninit.c), and I don't want those concerns to prevent this first patch
from being approved for GCC12.


In this part, I addressed your comments in gimplify.c:

=
 tree decl_name
+= build_string_literal (IDENTIFIER_LENGTH (DECL_NAME (decl)) + 1,
+   IDENTIFIER_POINTER (DECL_NAME (decl)));

you need to deal with DECL_NAME being NULL.
=
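
For reference, here is a minimal sketch of one way the NULL DECL_NAME case
could be handled (variable names and placement are assumptions, not the
committed change):

  /* Fall back to an empty string literal when the decl is unnamed,
     e.g. for compiler-generated temporaries.  */
  tree decl_name;
  if (DECL_NAME (decl))
    decl_name = build_string_literal (IDENTIFIER_LENGTH (DECL_NAME (decl)) + 1,
                                      IDENTIFIER_POINTER (DECL_NAME (decl)));
  else
    decl_name = build_string_literal (1, "");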

Please also see the detailed description below for the problem and solution of 
this patch.

This first patch has been bootstrapped and regression tested on both X86 and
aarch64.

Okay for GCC12?

Thanks.

Qing.


=

 Change the 3rd parameter of function .DEFERRED_INIT from
 IS_VLA to decl name.

Currently, the 3rd parameter of function .DEFERRED_INIT is IS_VLA, which is
not needed at all.

In this patch, we change the 3rd parameter from IS_VLA to the name of the var
decl for the following purposes:

1. Fix (or work around) PR103720:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103720

As confirmed in PR103720, with the current definition of .DEFERRED_INIT,

Dom transformed:
  c$a$0_6 = .DEFERRED_INIT (8, 2, 0);
  _1 = .DEFERRED_INIT (8, 2, 0);

into:
  c$a$0_6 = .DEFERRED_INIT (8, 2, 0);
  _1 = c$a$0_6;

which is incorrect, because Dom treats the two calls to the const function
.DEFERRED_INIT as the same call, since all actual parameters are the same.

The same issue was exposed in PR102608 due to a different optimization, VN;
the fix for PR102608 is to specially handle calls to .DEFERRED_INIT in VN to
exclude them from CSE.

To fix PR103720, we could do the same as the fix for PR102608 and specially
handle calls to .DEFERRED_INIT in Dom to exclude them from being optimized.

However, in addition to Dom and VN, there are likely other optimizations that
have the same issue as PR103720 or PR102608 (when I built the Linux kernel with
-ftrivial-auto-var-init=zero -Werror, I noticed a bunch of bogus warnings).

Rather than identifying all such optimizations and specially handling calls to
.DEFERRED_INIT in each of them, changing the 3rd parameter of the
function .DEFERRED_INIT from IS_VLA to the name string of the var decl might
be a better workaround (or fix). After this change, since the 3rd actual
argument is the name string of the variable, different calls for different
variables will have different name strings as the 3rd actual argument. As a
result, optimizations that previously treated the different calls to
.DEFERRED_INIT as the same will no longer do so.
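
For illustration (with hypothetical variable names, not taken from the PR),
the two calls shown above would then look something like:

  c$a$0_6 = .DEFERRED_INIT (8, 2, &"c"[0]);
  _1 = .DEFERRED_INIT (8, 2, &"d"[0]);

and are no longer candidates for CSE because their 3rd arguments differ.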

2. Prepare for enabling -Wuninitialized + -ftrivial-auto-var-init for address
taken variables.

As discussion in the following thread:

https://gcc.gnu.org/pipermail/gcc-patches/2021-August/577431.html

With the current implementation of -ftrivial-auto-var-init and uninitialized
warning analysis, the uninitialized warning for an address-taken auto variable
might be missed since the variable is completely eliminated by optimization and
replaced with a temporary variable in all the uses.

In order to improve this situation, changing the 3rd parameter of the function
.DEFERRED_INIT to the name string of the variable will provide the necessary
information to the uninitialized warning analysis to make the missing warnings
possible.

gcc/ChangeLog:

2022-01-10  qing zhao  

* gimplify.c (gimple_add_init_for_auto_var): Delete the 3rd argument.
Change the 3rd argument of function .DEFERRED_INIT to the name of the
decl.
(gimplify_decl_expr): Delete the 3rd argument when call
gimple_add_init_for_auto_var.
* internal-fn.c (expand_DEFERRED_INIT): Update comments to reflect
the 3rd argument change of function .DEFERRED_INIT.
* tree-cfg.c (verify_gimple_call): Update comments and verification
to reflect the 3rd argument change of function .DEFERRED_INIT.
* tree-sra.c (generate_subtree_deferred_init): Delete the 3rd argument.
(sra_modify_deferred_init): Change the 3rd argument of function
.DEFERRED_INIT to the name of the decl.

gcc/testsuite/ChangeLog:

2022-01-10  qing zhao  

* c-c++-common/auto-init-1.c: Adjust testcase to reflect the 3rd
argument change of function .DEFERRED_INIT.
* c-c++-common/auto-init-10.c: Likewise.
* c-c++-common/auto-init-11.c: Likewise.
* c-c++-common/auto-init-12.c: Likewise.
* c-c++-common/auto-init-13.c: Likewise.

Re: [PATCH v5 1/4] tree-object-size: Support dynamic sizes in conditions

2022-01-10 Thread Siddhesh Poyarekar

On 10/01/2022 16:07, Jakub Jelinek wrote:

You test the above with both possibilities.


+  if (test_builtin_calloc_condphi (128, 1, 0) == 128)
+FAIL ();


But not this one, why?  Also, it would be better to have
a != ... test rather than ==, if it is the VLA, then 128 * sizeof (struct { int 
a; char b; })
?


I think I'll move the test_builtin_calloc_condphi test into the 
GIMPLE_CALL patch since that's where it becomes fully functional.


Thanks,
Siddhesh


[PATCH] PR tree-optimization/103821 - Prevent exponential range calculations.

2022-01-10 Thread Andrew MacLeod via Gcc-patches

This test case demonstrates an unnoticed exponential situation in range-ops.

We end up unrolling the loop, and the pattern of code creates a set of
cascading multiplies which we can precisely evaluate with
sub-ranges.


For instance, we calculated:

_38 = int [8192, 8192][24576, 24576][40960, 40960][57344, 57344]

so _38 has 4 sub-ranges, and then we calculate:

_39 = _38 * _38;

we do 16 sub-range multiplications and end up with:  int [67108864, 
67108864][201326592, 201326592][335544320, 335544320][469762048, 
469762048][603979776, 603979776][1006632960, 1006632960][1409286144, 
1409286144][1677721600, 1677721600][+INF, +INF]


This feeds other multiplies (_39 * _39)  and progresses rapidly to blow 
up the number of sub-ranges in subsequent operations.


Folding of sub-ranges is an O(n*m) process. We perform the operation on 
each pair of sub-ranges and union them.   Values like _38 * _38 that 
continue feeding each other quickly become exponential.


Then combining that with union (an inherently linear operation over the 
number of sub-ranges) at each step of the way adds an additional 
quadratic operation on top of the exponential factor.


This patch adjusts the wi_fold routine to recognize when the calculation
is moving in an exponential direction and simply produce a summary result
instead of a precise one.  The attached patch does this when (#LH
sub-ranges * #RH sub-ranges > 12); it then just performs the operation
with the lower and upper bounds instead.  We could choose a different
number, but that one seems to keep things under control, and it still
allows us to process up to a 3x4 operation with precision (there is a
testcase in the testsuite for this combination, gcc.dg/tree-ssa/pr61839_2.c).
Longer term, we might want to adjust this routine to be slightly smarter
than that, but this is a virtually zero-risk solution this late in the
release cycle.
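
As a worked example using the ranges above: _38 has 4 sub-ranges, so folding
_39 = _38 * _38 pairwise takes 4 * 4 = 16 sub-range multiplications and
produces 9 sub-ranges; folding _39 * _39 pairwise would then take 9 * 9 = 81.
With the threshold of 12, both of these already exceed the limit and are
summarized from the overall bounds instead, which cuts off the growth before
it compounds.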


This is also a general ~1% speedup in the VRP2 pass across 380 gcc
source files, but I'm sure it has much more dramatic results at -O3,
which this testcase exposes.


Bootstraps on x86_64-pc-linux-gnu with no regressions. OK for trunk?

Andrew
commit d8c5c37d5362bd876118949de76086daba756ace
Author: Andrew MacLeod 
Date:   Mon Jan 10 13:33:44 2022 -0500

Prevent exponential range calculations.

Produce a summary result for any operation involving too many subranges.

PR tree-optimization/103821
* range-op.cc (range_operator::fold_range): Only do precise ranges
when there are not too many subranges.

range_operator::fold_range

diff --git a/gcc/range-op.cc b/gcc/range-op.cc
index 1af42ebc376..a4f6e9eba29 100644
--- a/gcc/range-op.cc
+++ b/gcc/range-op.cc
@@ -209,10 +209,12 @@ range_operator::fold_range (irange &r, tree type,
   unsigned num_rh = rh.num_pairs ();
 
   // If both ranges are single pairs, fold directly into the result range.
-  if (num_lh == 1 && num_rh == 1)
+  // If the number of subranges grows too high, produce a summary result as the
+  // loop becomes exponential with little benefit.  See PR 103821.
+  if ((num_lh == 1 && num_rh == 1) || num_lh * num_rh > 12)
 {
-  wi_fold_in_parts (r, type, lh.lower_bound (0), lh.upper_bound (0),
-			rh.lower_bound (0), rh.upper_bound (0));
+  wi_fold_in_parts (r, type, lh.lower_bound (), lh.upper_bound (),
+			rh.lower_bound (), rh.upper_bound ());
   op1_op2_relation_effect (r, type, lh, rh, rel);
   return true;
 }


Re: Ping^1 [PATCH, rs6000] new split pattern for TI to V1TI move [PR103124]

2022-01-10 Thread David Edelsohn via Gcc-patches
On Sun, Jan 9, 2022 at 10:16 PM HAO CHEN GUI  wrote:
>
> Hi,
>
> Gentle ping this:
> https://gcc.gnu.org/pipermail/gcc-patches/2021-December/587051.html
>
> Thanks
>
> On 17/12/2021 上午 9:55, HAO CHEN GUI wrote:
> > Hi,
> >This patch defines a new split pattern for TI to V1TI move. The pattern 
> > concatenates two subreg:DI of
> > a TI to a V2DI. With the pattern, the subreg pass can do register split for 
> > TI when there is a TI to V1TI
> > move. The patch optimizes one unnecessary "mr" out on P9. The new test case 
> > illustrates it.
> >
> >Bootstrapped and tested on powerpc64-linux BE and LE with no 
> > regressions. Is this okay for trunk?
> > Any recommendations? Thanks a lot.
> >
> > ChangeLog
> > 2021-12-13 Haochen Gui 
> >
> > gcc/
> >   * config/rs6000/vsx.md (split pattern for TI to V1TI move): Defined.
> >
> > gcc/testsuite/
> >   * gcc.target/powerpc/pr103124.c: New testcase.
> >
> >
> > patch.diff
> > diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> > index bf033e31c1c..52968eb4609 100644
> > --- a/gcc/config/rs6000/vsx.md
> > +++ b/gcc/config/rs6000/vsx.md
> > @@ -6589,3 +6589,19 @@ (define_insn "xxeval"
> > [(set_attr "type" "vecperm")
> >  (set_attr "prefixed" "yes")])
> >
> > +;; Construct V1TI by vsx_concat_v2di
> > +(define_split
> > +  [(set (match_operand:V1TI 0 "vsx_register_operand")
> > + (subreg:V1TI
> > +   (match_operand:TI 1 "int_reg_operand") 0 ))]
> > +  "TARGET_P9_VECTOR && !reload_completed"
> > +  [(const_int 0)]
> > +{
> > +  rtx tmp1 = simplify_gen_subreg (DImode, operands[1], TImode, 0);
> > +  rtx tmp2 = simplify_gen_subreg (DImode, operands[1], TImode, 8);
> > +  rtx tmp3 = gen_reg_rtx (V2DImode);
> > +  emit_insn (gen_vsx_concat_v2di (tmp3, tmp1, tmp2));
> > +  rtx tmp4 = simplify_gen_subreg (V1TImode, tmp3, V2DImode, 0);
> > +  emit_move_insn (operands[0], tmp4);
> > +  DONE;
> > +})
> > diff --git a/gcc/testsuite/gcc.target/powerpc/pr103124.c 
> > b/gcc/testsuite/gcc.target/powerpc/pr103124.c
> > new file mode 100644
> > index 000..e9072d19b8e
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/powerpc/pr103124.c
> > @@ -0,0 +1,12 @@
> > +/* { dg-do compile } */
> > +/* { dg-require-effective-target powerpc_p9vector_ok } */
> > +/* { dg-require-effective-target int128 } */
> > +/* { dg-options "-O2 -mdejagnu-cpu=power9" } */
> > +/* { dg-final { scan-assembler-not "\mmr\M" } } */

Segher probably would prefer {\mmr\M} .

> > +
> > +vector __int128 add (long long a)
> > +{
> > +  vector __int128 b;
> > +  b = (vector __int128) {a};
> > +  return b;
> > +}

This is okay.

Thanks, David


Re: Ping: [PATCH] rs6000: Add split pattern to replace

2022-01-10 Thread David Edelsohn via Gcc-patches
On Mon, Jan 10, 2022 at 12:04 AM Xionghu Luo  wrote:
>
> Gentle ping, thanks.
>
>
> On 2021/12/29 09:27, Xionghu Luo wrote:
> > 7: r120:V4SI=const_vector
> > 8: r121:V4SI=unspec[r120:V4SI,r120:V4SI,0xc] 260
> >
> > with r121:v4SI = r120:V4SI when r120 is a vector with same element.
> >
> > Bootstrapped and regtested pass on powerpc64le-linux-gnu {P10, P9}
> > and powerpc64-linux-gnu {P8, P7}.  OK for master?
> >
> > gcc/ChangeLog:
> >
> >   * config/rs6000/altivec.md (sldoi_to_mov_): New.
> >
> > gcc/testsuite/ChangeLog:
> >
> >   * gcc.target/powerpc/sldoi_to_mov.c: New test.
> > ---
> >  gcc/config/rs6000/altivec.md| 11 +++
> >  gcc/testsuite/gcc.target/powerpc/sldoi_to_mov.c | 15 +++
> >  2 files changed, 26 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.target/powerpc/sldoi_to_mov.c
> >
> > diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
> > index b2909857c34..25f86dbe828 100644
> > --- a/gcc/config/rs6000/altivec.md
> > +++ b/gcc/config/rs6000/altivec.md
> > @@ -383,6 +383,17 @@ (define_split
> >  }
> >  })
> >
> > +(define_insn_and_split "sldoi_to_mov_"

It would be more consistent with the naming convention to use
"sldoi_to_mov" without the final "_".

> > +  [(set (match_operand:VM 0 "altivec_register_operand")
> > + (unspec:VM [(match_operand:VM 1 "easy_vector_constant")

Should this be "easy_vector_constant_vsldoi"?

> > + (match_dup 1)
> > + (match_operand:VM 2 "u5bit_cint_operand")]

This should be match_operand:QI, right?

Thanks, David

> > + UNSPEC_VSLDOI))]
> > +  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode) && can_create_pseudo_p ()"
> > +  "#"
> > +  "&& 1"
> > +  [(set (match_dup 0) (match_dup 1))])
> > +
> >  (define_insn "get_vrsave_internal"
> >[(set (match_operand:SI 0 "register_operand" "=r")
> >   (unspec:SI [(reg:SI VRSAVE_REGNO)] UNSPEC_GET_VRSAVE))]
> > diff --git a/gcc/testsuite/gcc.target/powerpc/sldoi_to_mov.c 
> > b/gcc/testsuite/gcc.target/powerpc/sldoi_to_mov.c
> > new file mode 100644
> > index 000..2053243c456
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/powerpc/sldoi_to_mov.c
> > @@ -0,0 +1,15 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2" } */
> > +
> > +#include 
> > +vector signed int foo1 (vector signed int a) {
> > +vector signed int b = {0};
> > +return vec_sum2s(a, b);
> > +}
> > +
> > +vector signed int foo2 (vector signed int a) {
> > +vector signed int b = {0};
> > +return vec_sld(b, b, 4);
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {\mvsldoi\M} 1 {target le} } } */
>
> --
> Thanks,
> Xionghu


[power-ieee128, committed] Enable conversion selection via environment variable

2022-01-10 Thread Thomas Koenig via Gcc-patches

Hello world,

I have just pushed the attched patch to the branch.

With this patch, the program

tkoenig@gcc-fortran:~/Tst$ cat write_env.f90
program main
  real(kind=16) :: x
  character (len=30) :: conv
  x = 1/3._16
  open 
(10,file="out.dat",status="replace",access="stream",form="unformatted")

  inquire(10,convert=conv)
  print *,conv
  write (10) 1/3._16
end program main

gives the result:

tkoenig@gcc-fortran:~/Tst$ GFORTRAN_CONVERT_UNIT="r16_ibm:10" ./a.out && 
od -w64 -t x1 out.dat

 LITTLE_ENDIAN,R16_IBM
000 55 55 55 55 55 55 d5 3f 56 55 55 55 55 55 75 3c
020
tkoenig@gcc-fortran:~/Tst$ GFORTRAN_CONVERT_UNIT="r16_ieee:10" ./a.out 
&& od -w64 -t x1 out.dat

 LITTLE_ENDIAN,R16_IEEE
000 80 55 55 55 55 55 55 55 55 55 55 55 55 55 fd 3f
020
tkoenig@gcc-fortran:~/Tst$ 
GFORTRAN_CONVERT_UNIT="big_endian:10;r16_ieee:10" ./a.out && od -w64 -t 
x1 out.dat

 BIG_ENDIAN,R16_IEEE
000 3f fd 55 55 55 55 55 55 55 55 55 55 55 55 55 80
020

so things look OK.  In the next few days, I will do a bit more
testing to see if I have missed any corner cases.

So, the only thing missing is handling of the options, but
I think that is not critical (and could be added later; two
separate possibilities might just be enough for most users :-)

So... time to merge the branch into trunk before stage 4
kicks in?

Best regards

Thomas


Handle R16 conversion for POWER in the environment variables.

This patch handles the environment variables for the REAL(KIND=16)
variables like for the little/big-endian routines, so users
who have no access to the source or are unwilling to recompile
can use this.

Syntax is, for example

GFORTRAN_CONVERT_UNIT="r16_ieee:10;little_endian:10" ./a.out

libgfortran/ChangeLog:

* runtime/environ.c (R16_IEEE): New macro.
(R16_IBM): New macro.
(next_token): Handle IBM R16 conversion cases.
(push_token): Likewise.
(mark_single): Likewise.
(do_parse): Likewise, initialize endian.
diff --git a/libgfortran/runtime/environ.c b/libgfortran/runtime/environ.c
index fe16c080797..ff10fe53f68 100644
--- a/libgfortran/runtime/environ.c
+++ b/libgfortran/runtime/environ.c
@@ -247,6 +247,11 @@ init_variables (void)
 #define SWAP 258
 #define BIG  259
 #define LITTLE   260
+#ifdef HAVE_GFC_REAL_17
+#define R16_IEEE 261
+#define R16_IBM  262
+#endif
+
 /* Some space for additional tokens later.  */
 #define INTEGER  273
 #define END  (-1)
@@ -392,6 +397,15 @@ next_token (void)
   result = match_word ("swap", SWAP);
   break;
 
+#ifdef HAVE_GFC_REAL_17
+case 'r':
+case 'R':
+  result = match_word ("r16_ieee", R16_IEEE);
+  if (result == ILLEGAL)
+	result = match_word ("r16_ibm", R16_IBM);
+  break;
+
+#endif
 case '1': case '2': case '3': case '4': case '5':
 case '6': case '7': case '8': case '9':
   result = match_integer ();
@@ -414,7 +428,8 @@ push_token (void)
 
 /* This is called when a unit is identified.  If do_count is nonzero,
increment the number of units by one.  If do_count is zero,
-   put the unit into the table.  */
+   put the unit into the table.  For POWER, we have to make sure that
+   we can also put in the conversion btween IBM and IEEE long double.  */
 
 static void
 mark_single (int unit)
@@ -428,7 +443,11 @@ mark_single (int unit)
 }
   if (search_unit (unit, &i))
 {
+#ifdef HAVE_GFC_REAL_17
+  elist[i].conv |= endian;
+#else
   elist[i].conv = endian;
+#endif
 }
   else
 {
@@ -437,7 +456,11 @@ mark_single (int unit)
 
   n_elist += 1;
   elist[i].unit = unit;
+#ifdef HAVE_GFC_REAL_17
+  elist[i].conv |= endian;
+#else
   elist[i].conv = endian;
+#endif
 }
 }
 
@@ -481,6 +504,8 @@ do_parse (void)
 
   /* Parse the string.  First, let's look for a default.  */
   tok = next_token ();
+  endian = 0;
+
   switch (tok)
 {
 case NATIVE:
@@ -499,6 +524,15 @@ do_parse (void)
   endian = GFC_CONVERT_LITTLE;
   break;
 
+#ifdef HAVE_GFC_REAL_17
+case R16_IEEE:
+  endian = GFC_CONVERT_R16_IEEE;
+  break;
+
+case R16_IBM:
+  endian = GFC_CONVERT_R16_IBM;
+  break;
+#endif
 case INTEGER:
   /* A leading digit means that we are looking at an exception.
 	 Reset the position to the beginning, and continue processing
@@ -571,6 +605,19 @@ do_parse (void)
 	goto error;
 	  endian = GFC_CONVERT_BIG;
 	  break;
+#ifdef HAVE_GFC_REAL_17
+	case R16_IEEE:
+	  if (next_token () != ':')
+	goto error;
+	  endian = GFC_CONVERT_R16_IEEE;
+	  break;
+
+	case R16_IBM:
+	  if (next_token () != ':')
+	goto error;
+	  endian = GFC_CONVERT_R16_IBM;
+	  break;
+#endif
 
 	case INTEGER:
 	  push_token ();


Re: [PATCH] Add VxWorks fixincludes hack, kernel math.h FP_ constants

2022-01-10 Thread Olivier Hainque via Gcc-patches
Hi Rasmus,

> On 17 Dec 2021, at 21:47, Olivier Hainque  wrote:
> 
>>> Don't you also need to add an fpclassify() macro? There's a
>> 
>> We have a separate "fix" for a set of such functions indeed.

> I probably can merge the two, actually. I'll do that.

We have had pretty good results with the attached patch,
which adds definitions of a few classification macros allowing
the "checking for ISO C99 support in  for C++98"
bit to pass.

Hope this would work for you as well,

Cheers,

Olivier

2022-01-10  Olivier Hainque  

* inclhack.def (vxworks_math_h_fp_c99): New hack.
* tests/base/math.h: Update.
* fixincl.x: Regenerate.



0001-Add-VxWorks-fixincludes-hack-C99-FP-classification.patch
Description: Binary data






[PATCH] Add VxWorks fixincludes hack, #include sysLib.h in time.h

2022-01-10 Thread Olivier Hainque via Gcc-patches
Hello,

This change introduces a new VxWorks fixincludes hack to make sure
there is a visible prototype of sysClkRateGet() when CLOCKS_PER_SEC
is #defined to that function in time.h for VxWorks < 7 (fixincludes is
not run otherwise).

The function is provided by sysLib.h, so we arrange to #include
this one at a spot which "works" for either kernel or rtp headers,
as with those we have for VxWorks 6.9.

This allowed us to get at least a first reasonable batch
of libstdc++ test results.


2022-01-10  Olivier Hainque  

* inclhack.def (vxworks_time_h_syslib): New hack.
* tests/base/time.h: Update.
* fixincl.x: Regenerate.




0002-Add-VxWorks-fixincludes-hack-include-sysLib.h-in-tim.patch
Description: Binary data


PING 4 [PATCH v2 1/2] add -Wuse-after-free

2022-01-10 Thread Martin Sebor via Gcc-patches

Last ping before stage 3 ends:
https://gcc.gnu.org/pipermail/gcc-patches/2021-November/585816.html

On 1/4/22 11:01, Martin Sebor wrote:

Ping.  (CC'ing Jason as requested.)

https://gcc.gnu.org/pipermail/gcc-patches/2021-November/585816.html

On 12/13/21 9:48 AM, Martin Sebor wrote:

Ping.

Jeff, I addressed your comments in the updated patch.  If there
are no other changes is the last revision okay to commit?

https://gcc.gnu.org/pipermail/gcc-patches/2021-November/585816.html

On 12/6/21 5:50 PM, Martin Sebor wrote:

Ping:
https://gcc.gnu.org/pipermail/gcc-patches/2021-November/585816.html

On 11/30/21 3:32 PM, Martin Sebor wrote:

Attached is a revised patch with the following changes based
on your comments:

1) Set and use statement uids to determine which statement
    precedes which in the same basic block.
2) Avoid testing flag_isolate_erroneous_paths_dereference.
3) Use post-dominance to decide whether to use the "maybe"
    phrasing vs a definite form.

David raised (and in our offline discussion today reiterated)
an objection to the default setting of the option being
the strictest.  I have not changed that in this revision.
See my rationale for this choice in my reply below:
https://gcc.gnu.org/pipermail/gcc-patches/2021-November/583176.html

Martin

On 11/23/21 2:16 PM, Martin Sebor wrote:

On 11/22/21 6:32 PM, Jeff Law wrote:



On 11/1/2021 4:17 PM, Martin Sebor via Gcc-patches wrote:

Patch 1 in the series detects a small subset of uses of pointers
made indeterminate by calls to deallocation functions like free
or C++ operator delete.  To control the conditions the warnings
are issued under, the new -Wuse-after-free= option provides three
levels.  At the lowest level the warning triggers only for
unconditional uses of freed pointers and doesn't warn for uses
in equality expressions.  Level 2 also warns for some conditional
uses, and level 3 also for uses in equality expressions.

I debated whether to make level 2 or 3 the default included in
-Wall.  I decided on 3 for two reasons: 1) to raise awareness
of both the problem and GCC's new ability to detect it: using
a pointer after it's been freed, even only in principle, by
a successful call to realloc, is undefined, and 2) because
it's trivial to lower the level either globally, or locally
by suppressing the warning around such misuses.
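
As an illustration of the level 3 case (an equality test of a pointer
invalidated by a successful realloc); this is a sketch, not one of the
new tests:

#include <stdlib.h>

char *f (char *p)
{
  char *q = realloc (p, 128);   /* on success, p becomes indeterminate */
  if (q != p)                   /* level 3: equality test of the freed pointer */
    return q;
  return p;
}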

I've tested the patch on x86_64-linux and by building Glibc
and Binutils/GDB.  It triggers a number of times in each, all
due to comparing invalidated pointers for equality (i.e., level
3).  I have suppressed these in GCC (libiberty) by a #pragma,
and will see how the Glibc folks want to deal with theirs (I
track them in BZ #28521).

The tests contain a number of xfails due to limitations I'm
aware of.  I marked them pr?? until the patch is approved.
I will open bugs for them before committing if I don't resolve
them in a followup.

Martin

gcc-63272-1.diff

Add -Wuse-after-free.

gcc/c-family/ChangeLog

* c.opt (-Wuse-after-free): New options.

gcc/ChangeLog:

* diagnostic-spec.c (nowarn_spec_t::nowarn_spec_t): Handle
OPT_Wreturn_local_addr and OPT_Wuse_after_free_.
* diagnostic-spec.h (NW_DANGLING): New enumerator.
* doc/invoke.texi (-Wuse-after-free): Document new option.
* gimple-ssa-warn-access.cc (pass_waccess::check_call): 
Rename...

(pass_waccess::check_call_access): ...to this.
(pass_waccess::check): Rename...
(pass_waccess::check_block): ...to this.
(pass_waccess::check_pointer_uses): New function.
(pass_waccess::gimple_call_return_arg): New function.
(pass_waccess::warn_invalid_pointer): New function.
(pass_waccess::check_builtin): Handle free and realloc.
(gimple_use_after_inval_p): New function.
(get_realloc_lhs): New function.
(maybe_warn_mismatched_realloc): New function.
(pointers_related_p): New function.
(pass_waccess::check_call): Call check_pointer_uses.
(pass_waccess::execute): Compute and free dominance info.

libcpp/ChangeLog:

* files.c (_cpp_find_file): Substitute a valid pointer for
an invalid one to avoid -Wuse-after-free.

libiberty/ChangeLog:

* regex.c: Suppress -Wuse-after-free.

gcc/testsuite/ChangeLog:

* gcc.dg/Wmismatched-dealloc-2.c: Avoid -Wuse-after-free.
* gcc.dg/Wmismatched-dealloc-3.c: Same.
* gcc.dg/attr-alloc_size-6.c: Disable -Wuse-after-free.
* gcc.dg/attr-alloc_size-7.c: Same.
* c-c++-common/Wuse-after-free-2.c: New test.
* c-c++-common/Wuse-after-free-3.c: New test.
* c-c++-common/Wuse-after-free-4.c: New test.
* c-c++-common/Wuse-after-free-5.c: New test.
* c-c++-common/Wuse-after-free-6.c: New test.
* c-c++-common/Wuse-after-free-7.c: New test.
* c-c++-common/Wuse-after-free.c: New test.
* g++.dg/warn/Wdangling-pointer.C: New test.
* g++.dg/warn/Wmismatched-dealloc-3.C: New test.
* g++.dg/warn/Wuse-after-free.C: New test.

diff --git a/gcc/gimple-ssa-warn-access.cc 
b/gcc/gimple-ssa-warn-access.cc

index 63fc27a1

PING [PATCH] Use enclosing object size if it's smaller than member [PR 101475]

2022-01-10 Thread Martin Sebor via Gcc-patches

Ping (CC'ing Jason as requested):
https://gcc.gnu.org/pipermail/gcc-patches/2021-December/587033.html

On 1/4/22 10:28, Martin Sebor wrote:

On 12/20/21 12:29 PM, Jeff Law wrote:



On 12/16/2021 12:56 PM, Martin Sebor via Gcc-patches wrote:

Enabling vectorization at -O2 caused quite a few tests for
warnings to start failing in GCC 12.  These tests were xfailed
and bugs were opened to track the problems until they can be
fully analyzed and ultimately fixed before GCC 12 is released.

I've now started going through these and the first such bug
I tackled is PR 102944.  As it turns out, the xfails there
are all due to a known limitation tracked in PR 101475: when
determining the size of a destination for a COMPONENT_REF,
unless asked for the size of the complete object,
compute_objsize() only considers the size of the referenced
member, even when the member is larger than the object would
allow.  This prevents warnings from diagnosing unvectorized
past-the-end accesses to objects in backing buffers (such as
in character arrays or allocated chunks of memory).

Many (though not all) accesses that are vectorized are diagnosed
because there the COMPONENT_REF is replaced by a MEM_REF.  But
because vectorization depends on target-specific things like
alignment requirements, what is and isn't diagnosed also tends
to be target-specific, making these tests quite brittle.

The attached patch corrects this oversight by using the complete
object's size instead of the member when the former is smaller.
Besides improving the out-of-bounds access detection it also
makes the tests behave more consistently across targets.

Tested on x86_64-linux and by building Glibc and verifying
that the change triggers no new warnings.
I must be missing something here.  How can the enclosing object be 
smaller than a member?


When the enclosing object is backed by a buffer of insufficient
size.  The buffer might be a declared character array such as
in the tests added and modified by the patch, or it might
be dynamically allocated.
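
A minimal sketch of the situation (assumed for illustration, not one of the
tests added by the patch):

struct S { int a[4]; };

char buf[4];                       /* the backing buffer is smaller than S */

void f (void)
{
  struct S *p = (struct S *) buf;
  p->a[1] = 0;                     /* member access past the end of buf */
}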

Martin




PING 4 [PATCH v2 2/2] add -Wdangling-pointer [PR #63272]

2022-01-10 Thread Martin Sebor via Gcc-patches

Last ping for this stage 1 feature before stage 3 ends:
https://gcc.gnu.org/pipermail/gcc-patches/2021-November/585819.html

On 1/4/22 11:02, Martin Sebor wrote:

Ping:
https://gcc.gnu.org/pipermail/gcc-patches/2021-November/585819.html

On 12/13/21 9:50 AM, Martin Sebor wrote:

Ping.  This patch, originally submitted on Nov. 1, has not been
reviewed yet.

https://gcc.gnu.org/pipermail/gcc-patches/2021-November/585819.html

On 12/6/21 5:51 PM, Martin Sebor wrote:

Ping:
https://gcc.gnu.org/pipermail/gcc-patches/2021-November/585819.html

On 11/30/21 3:55 PM, Martin Sebor wrote:

Attached is a revision of this patch with adjustments for
the changes to the prerequisite patch 1 in the series and
a couple of minor simplifications and slightly improved
test coverage, rested on x86_64-linux.

On 11/1/21 4:18 PM, Martin Sebor wrote:

Patch 2 in this series adds support for detecting the uses of
dangling pointers: those to auto objects that have gone out of
scope.  Like patch 1, to minimize false positives this detection
is very simplistic.  However, thanks to the more deterministic
nature of the problem (all local objects go out of scope), it is able
to detect more instances of it.  The approach I used is to simply
search the IL for clobbers that dominate uses of pointers to
the clobbered objects.  If such a use is found that's not
followed by a clobber of the same object, the warning triggers.
Similar to -Wuse-after-free, the new -Wdangling-pointer option
has multiple levels: level 1 to detect unconditional uses and
level 2 to flag conditional ones.  Unlike with -Wuse-after-free
there is no use case for testing dangling pointers for
equality, so there is no level 3.
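
A minimal sketch of the pattern being detected (illustrative only, not one of
the new tests):

int *p;

void store (void)
{
  int i = 42;
  p = &i;
}                /* i is clobbered here, so p now dangles */

int use (void)
{
  store ();
  return *p;     /* level 1: unconditional use of a dangling pointer */
}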

Tested on x86_64-linux and  by building Glibc and Binutils/GDB.
It found no problems outside of the GCC test suite.

As with the first patch in this series, the tests contain a number
of xfails due to known limitations marked with pr??.  I'll
open bugs for them before committing the patch if I don't resolve
them first in a followup.

Martin












Re: [PATCH] C++ P0482R6 char8_t: declare std::c8rtomb and std::mbrtoc8 if provided by the C library

2022-01-10 Thread Jonathan Wakely via Gcc-patches
On Mon, 10 Jan 2022 at 21:24, Tom Honermann via Libstdc++
 wrote:
>
> On 1/10/22 8:23 AM, Jonathan Wakely wrote:
> >
> >
> > On Sat, 8 Jan 2022 at 00:42, Tom Honermann via Libstdc++ wrote:
> >
> > This patch completes implementation of the C++20 proposal P0482R6
> > [1] by
> > adding declarations of std::c8rtomb() and std::mbrtoc8() in
> > <cuchar> if
> > provided by the C library in <uchar.h>.
> >
> > This patch addresses feedback provided in response to a previous
> > patch
> > submission [2].
> >
> > Autoconf changes determine if the C library declares c8rtomb and
> > mbrtoc8
> > at global scope when uchar.h is included and compiled with either
> > -fchar8_t or -std=c++20. New
> > _GLIBCXX_USE_UCHAR_C8RTOMB_MBRTOC8_FCHAR8_T
> > and _GLIBCXX_USE_UCHAR_C8RTOMB_MBRTOC8_CXX20 configuration macros
> > reflect the probe results. The <cuchar> header declares these
> > functions
> > in the std namespace only if available and the _GLIBCXX_USE_CHAR8_T
> > configuration macro is defined (by default it is defined if the C++20
> > __cpp_char8_t feature test macro is defined)
> >
> > Patches to glibc to implement c8rtomb and mbrtoc8 have been
> > submitted [3].
> >
> > New tests validate the presence of these declarations. The tests pass
> > trivially if the C library does not provide these functions.
> > Otherwise
> > they ensure that the functions are declared when <cuchar> is included
> > and either -fchar8_t or -std=c++20 is enabled.
> >
> > Tested on Linux x86_64.
> >
> > libstdc++-v3/ChangeLog:
> >
> > 2022-01-07  Tom Honermann
> >
> > * acinclude.m4 Define config macros if uchar.h provides
> > c8rtomb() and mbrtoc8().
> > * config.h.in : Re-generate.
> > * configure: Re-generate.
> > * include/c_compatibility/uchar.h: Declare ::c8rtomb and
> > ::mbrtoc8.
> > * include/c_global/cuchar: Declare std::c8rtomb and
> > std::mbrtoc8.
> > * include/c_std/cuchar: Declare std::c8rtomb and std::mbrtoc8.
> > * testsuite/21_strings/headers/cuchar/functions_std_cxx20.cc:
> > New test.
> > *
> > testsuite/21_strings/headers/cuchar/functions_std_fchar8_t.cc:
> > New test.
> >
> >
> >
> > Thanks, Tom, this looks good and I'll get it committed for GCC 12.
> Thank you!
> >
> > My only concern is that the new tests depend on an internal macro:
> >
> > +#if _GLIBCXX_USE_UCHAR_C8RTOMB_MBRTOC8_CXX20
> > +  using std::mbrtoc8;
> > +  using std::c8rtomb;
> >
> > I prefer if tests are written as "user code" when possible, and not
> > using our internal macros. That isn't always possible, and in this
> > case would require adding new effective-target keyword to
> > testsuite/lib/libstdc++.exp just for use in these two tests. I don't
> > think we should bother with that.
> I went with this approach solely due to my unfamiliarity with the test
> system. I knew there should be a way to conditionally make the test
> "pass" as unsupported or as an expected failure, but didn't know how to
> go about implementing that. I don't mind following up with an additional
> patch if such a change is desirable. I took a look at
> testsuite/lib/libstdc++.exp and it looks like it may be pretty straight
> forward to add effective-target support. It would probably be a good
> learning experience for me. I'll prototype and report back.

Yes, it's very easy to do. Take a look at the
check_effective_target_blah procs in that file, especially the later
ones that use v3_check_preprocessor_condition. You can use that to
define an effective target keyword for any preprocessor condition
(such as the new macros you're adding).

Then the test can do:
// { dg-do compile { target blah } }
which will make it UNSUPPORTED if the effective target proc doesn't return true.
See https://gcc.gnu.org/onlinedocs/gccint/Selectors.html#Selectors for
the docs on target selectors.

I'm just not sure it's worth adding a new keyword for just two tests.


> >
> > I suppose strictly speaking we should not define __cpp_lib_char8_t
> > unless these two functions are present in libc. But I'm not sure we
> > want to change that now either.
>
> All of libstdc++, libc++, and MS STL have been defining
> __cpp_lib_char8_t despite the absence of these functions, so yeah, I
> don't think we want to change that.

OK, thanks.


PING^2 (C/C++): Re: [PATCH 6/6] Add __attribute__ ((tainted))

2022-01-10 Thread David Malcolm via Gcc-patches
On Thu, 2022-01-06 at 09:08 -0500, David Malcolm wrote:
> On Sat, 2021-11-13 at 15:37 -0500, David Malcolm wrote:
> > This patch adds a new __attribute__ ((tainted)) to the C/C++
> > frontends.
> 
> Ping for GCC C/C++ mantainers for review of the C/C++ FE parts of this
> patch (attribute registration, documentation, the name of the
> attribute, etc).
> 
> (I believe it's independent of the rest of the patch kit, in that it
> could go into trunk without needing the prior patches)
> 
> Thanks
> Dave

Getting close to end of stage 3 for GCC 12, so pinging this patch
again...

https://gcc.gnu.org/pipermail/gcc-patches/2021-November/584376.html

Thanks
Dave

> 
> 
> > 
> > It can be used on function decls: the analyzer will treat as tainted
> > all parameters to the function and all buffers pointed to by
> > parameters
> > to the function.  Adding this in one place to the Linux kernel's
> > __SYSCALL_DEFINEx macro allows the analyzer to treat all syscalls as
> > having tainted inputs.  This gives additional testing beyond e.g.
> > __user
> > pointers added by earlier patches - an example of the use of this can
> > be
> > seen in CVE-2011-2210, where given:
> > 
> >  SYSCALL_DEFINE5(osf_getsysinfo, unsigned long, op, void __user *,
> > buffer,
> >  unsigned long, nbytes, int __user *, start, void
> > __user *, arg)
> > 
> > the analyzer will treat the nbytes param as under attacker control,
> > and
> > can complain accordingly:
> > 
> > taint-CVE-2011-2210-1.c: In function ‘sys_osf_getsysinfo’:
> > taint-CVE-2011-2210-1.c:69:21: warning: use of attacker-controlled
> > value
> >   ‘nbytes’ as size without upper-bounds checking [CWE-129] [-
> > Wanalyzer-tainted-size]
> >    69 | if (copy_to_user(buffer, hwrpb, nbytes) != 0)
> >   | ^~~
> > 
> > Additionally, the patch allows the attribute to be used on field
> > decls:
> > specifically function pointers.  Any function used as an initializer
> > for such a field gets treated as tainted.  An example can be seen in
> > CVE-2020-13143, where adding __attribute__((tainted)) to the "store"
> > callback of configfs_attribute:
> > 
> >   struct configfs_attribute {
> >  /* [...snip...] */
> >  ssize_t (*store)(struct config_item *, const char *, size_t)
> >    __attribute__((tainted));
> >  /* [...snip...] */
> >   };
> > 
> > allows the analyzer to see:
> > 
> >  CONFIGFS_ATTR(gadget_dev_desc_, UDC);
> > 
> > and treat gadget_dev_desc_UDC_store as tainted, so that it complains:
> > 
> > taint-CVE-2020-13143-1.c: In function ‘gadget_dev_desc_UDC_store’:
> > taint-CVE-2020-13143-1.c:33:17: warning: use of attacker-controlled
> > value
> >   ‘len + 18446744073709551615’ as offset without upper-bounds
> > checking [CWE-823] [-Wanalyzer-tainted-offset]
> >    33 | if (name[len - 1] == '\n')
> >   | ^
> > 
> > Similarly, the attribute could be used on the ioctl callback field,
> > USB device callbacks, network-handling callbacks etc.  This
> > potentially
> > gives a lot of test coverage with relatively little code annotation,
> > and
> > without necessarily needing link-time analysis (which -fanalyzer can
> > only do at present on trivial examples).
> > 
> > I believe this is the first time we've had an attribute on a field.
> > If that's an issue, I could prepare a version of the patch that
> > merely allowed it on functions themselves.
> > 
> > As before this currently still needs -fanalyzer-checker=taint (in
> > addition to -fanalyzer).
> > 
> > gcc/analyzer/ChangeLog:
> > * engine.cc: Include "stringpool.h", "attribs.h", and
> > "tree-dfa.h".
> > (mark_params_as_tainted): New.
> > (class tainted_function_custom_event): New.
> > (class tainted_function_info): New.
> > (exploded_graph::add_function_entry): Handle functions with
> > "tainted" attribute.
> > (class tainted_field_custom_event): New.
> > (class tainted_callback_custom_event): New.
> > (class tainted_call_info): New.
> > (add_tainted_callback): New.
> > (add_any_callbacks): New.
> > (exploded_graph::build_initial_worklist): Find callbacks that
> > are
> > reachable from global initializers, calling add_any_callbacks
> > on
> > them.
> > 
> > gcc/c-family/ChangeLog:
> > * c-attribs.c (c_common_attribute_table): Add "tainted".
> > (handle_tainted_attribute): New.
> > 
> > gcc/ChangeLog:
> > * doc/extend.texi (Function Attributes): Note that "tainted"
> > can
> > be used on field decls.
> > (Common Function Attributes): Add entry on "tainted"
> > attribute.
> > 
> > gcc/testsuite/ChangeLog:
> > * gcc.dg/analyzer/attr-tainted-1.c: New test.
> > * gcc.dg/analyzer/attr-tainted-misuses.c: New test.
> > * gcc.dg/analyzer/taint-CVE-2011-2210-1.c: New test.
> > * gcc.dg/analyzer/taint-CVE-2020-1

Re: [PATCH] C++ P0482R6 char8_t: declare std::c8rtomb and std::mbrtoc8 if provided by the C library

2022-01-10 Thread Tom Honermann via Gcc-patches

On 1/10/22 8:23 AM, Jonathan Wakely wrote:



On Sat, 8 Jan 2022 at 00:42, Tom Honermann via Libstdc++ wrote:


This patch completes implementation of the C++20 proposal P0482R6
[1] by
adding declarations of std::c8rtomb() and std::mbrtoc8() in
<cuchar> if
provided by the C library in <uchar.h>.

This patch addresses feedback provided in response to a previous
patch
submission [2].

Autoconf changes determine if the C library declares c8rtomb and
mbrtoc8
at global scope when uchar.h is included and compiled with either
-fchar8_t or -std=c++20. New
_GLIBCXX_USE_UCHAR_C8RTOMB_MBRTOC8_FCHAR8_T
and _GLIBCXX_USE_UCHAR_C8RTOMB_MBRTOC8_CXX20 configuration macros
reflect the probe results. The <cuchar> header declares these
functions
in the std namespace only if available and the _GLIBCXX_USE_CHAR8_T
configuration macro is defined (by default it is defined if the C++20
__cpp_char8_t feature test macro is defined)

Patches to glibc to implement c8rtomb and mbrtoc8 have been
submitted [3].

New tests validate the presence of these declarations. The tests pass
trivially if the C library does not provide these functions.
Otherwise
they ensure that the functions are declared when <cuchar> is included
and either -fchar8_t or -std=c++20 is enabled.

Tested on Linux x86_64.

libstdc++-v3/ChangeLog:

2022-01-07  Tom Honermann

        * acinclude.m4 Define config macros if uchar.h provides
        c8rtomb() and mbrtoc8().
        * config.h.in : Re-generate.
        * configure: Re-generate.
        * include/c_compatibility/uchar.h: Declare ::c8rtomb and
        ::mbrtoc8.
        * include/c_global/cuchar: Declare std::c8rtomb and
        std::mbrtoc8.
        * include/c_std/cuchar: Declare std::c8rtomb and std::mbrtoc8.
        * testsuite/21_strings/headers/cuchar/functions_std_cxx20.cc:
        New test.
        *
testsuite/21_strings/headers/cuchar/functions_std_fchar8_t.cc:
        New test.



Thanks, Tom, this looks good and I'll get it committed for GCC 12.

Thank you!


My only concern is that the new tests depend on an internal macro:

+#if _GLIBCXX_USE_UCHAR_C8RTOMB_MBRTOC8_CXX20
+  using std::mbrtoc8;
+  using std::c8rtomb;

I prefer if tests are written as "user code" when possible, and not 
using our internal macros. That isn't always possible, and in this 
case would require adding new effective-target keyword to 
testsuite/lib/libstdc++.exp just for use in these two tests. I don't 
think we should bother with that.
I went with this approach solely due to my unfamiliarity with the test 
system. I knew there should be a way to conditionally make the test 
"pass" as unsupported or as an expected failure, but didn't know how to 
go about implementing that. I don't mind following up with an additional 
patch if such a change is desirable. I took a look at 
testsuite/lib/libstdc++.exp and it looks like it may be pretty straight 
forward to add effective-target support. It would probably be a good 
learning experience for me. I'll prototype and report back.


I suppose strictly speaking we should not define __cpp_lib_char8_t 
unless these two functions are present in libc. But I'm not sure we 
want to change that now either.


All of libstdc++, libc++, and MS STL have been defining 
__cpp_lib_char8_t despite the absence of these functions, so yeah, I 
don't think we want to change that.


Tom.



Re: [PATCH] Improve sequence logic in cxx_init_decl_processing

2022-01-10 Thread Olivier Hainque via Gcc-patches



> On 10 Jan 2022, at 20:02, Jason Merrill  wrote:
> 
>> The attached patch just moves the reset above the test.
> 
> OK.

Great, thanks Jason!

>> 2021-12-30  Olivier Hainque  
>> gcc/
>>  * cp/decl.c (cxx_init_decl_processing): Move code possibly
>>  altering flag_weak before code testing it.
>> Olivier



[PATCH] i386: Introduce V2QImode vector compares [PR103861]

2022-01-10 Thread Uros Bizjak via Gcc-patches
Add V2QImode vector compares with SSE registers.

2022-01-10  Uroš Bizjak  

gcc/ChangeLog:

PR target/103861
* config/i386/i386-expand.c (ix86_expand_int_sse_cmp):
Handle V2QImode.
* config/i386/mmx.md (3):
Use VI1_16_32 mode iterator.
(*eq3): Ditto.
(*gt3): Ditto.
(*xop_maskcmp3): Ditto.
(*xop_maskcmp_uns3): Ditto.
(vec_cmp): Ditto.
(vec_cmpu): Ditto.

gcc/testsuite/ChangeLog:

PR target/103861
* gcc.target/i386/pr103861-2.c: New test.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Pushed to master.

Uros.
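
For illustration, the kind of source that now maps to a V2QImode compare
(an assumed example, not the new pr103861-2.c test):

typedef signed char v2qi __attribute__ ((vector_size (2)));

v2qi cmp (v2qi a, v2qi b)
{
  return a > b;   /* two-element QImode vector compare, now expandable via SSE */
}
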
diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index 95bba254daf..add748bcf40 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -,6 +,12 @@ ix86_expand_int_sse_cmp (rtx dest, enum rtx_code code, 
rtx cop0, rtx cop1,
  else if (code == GT && TARGET_SSE4_1)
gen = gen_sminv4qi3;
  break;
+   case E_V2QImode:
+ if (code == GTU && TARGET_SSE2)
+   gen = gen_uminv2qi3;
+ else if (code == GT && TARGET_SSE4_1)
+   gen = gen_sminv2qi3;
+ break;
case E_V8HImode:
  if (code == GTU && TARGET_SSE4_1)
gen = gen_uminv8hi3;
@@ -4537,6 +4543,7 @@ ix86_expand_int_sse_cmp (rtx dest, enum rtx_code code, 
rtx cop0, rtx cop1,
case E_V16QImode:
case E_V8QImode:
case E_V4QImode:
+   case E_V2QImode:
case E_V8HImode:
case E_V4HImode:
case E_V2HImode:
diff --git a/gcc/config/i386/mmx.md b/gcc/config/i386/mmx.md
index 4fc3e00f100..91d642187be 100644
--- a/gcc/config/i386/mmx.md
+++ b/gcc/config/i386/mmx.md
@@ -1824,10 +1824,10 @@
(set_attr "mode" "DI,TI,TI")])
 
 (define_insn "*3"
-  [(set (match_operand:VI_32 0 "register_operand" "=x,Yw")
-(sat_plusminus:VI_32
- (match_operand:VI_32 1 "register_operand" "0,Yw")
- (match_operand:VI_32 2 "register_operand" "x,Yw")))]
+  [(set (match_operand:VI_16_32 0 "register_operand" "=x,Yw")
+(sat_plusminus:VI_16_32
+ (match_operand:VI_16_32 1 "register_operand" "0,Yw")
+ (match_operand:VI_16_32 2 "register_operand" "x,Yw")))]
   "TARGET_SSE2"
   "@
p\t{%2, %0|%0, %2}
@@ -2418,10 +2418,10 @@
(set_attr "mode" "DI,TI,TI")])
 
 (define_insn "*eq3"
-  [(set (match_operand:VI_32 0 "register_operand" "=x,x")
-(eq:VI_32
- (match_operand:VI_32 1 "register_operand" "%0,x")
- (match_operand:VI_32 2 "register_operand" "x,x")))]
+  [(set (match_operand:VI_16_32 0 "register_operand" "=x,x")
+(eq:VI_16_32
+ (match_operand:VI_16_32 1 "register_operand" "%0,x")
+ (match_operand:VI_16_32 2 "register_operand" "x,x")))]
   "TARGET_SSE2"
   "@
pcmpeq\t{%2, %0|%0, %2}
@@ -2446,10 +2446,10 @@
(set_attr "mode" "DI,TI,TI")])
 
 (define_insn "*gt3"
-  [(set (match_operand:VI_32 0 "register_operand" "=x,x")
-(gt:VI_32
- (match_operand:VI_32 1 "register_operand" "0,x")
- (match_operand:VI_32 2 "register_operand" "x,x")))]
+  [(set (match_operand:VI_16_32 0 "register_operand" "=x,x")
+(gt:VI_16_32
+ (match_operand:VI_16_32 1 "register_operand" "0,x")
+ (match_operand:VI_16_32 2 "register_operand" "x,x")))]
   "TARGET_SSE2"
   "@
pcmpgt\t{%2, %0|%0, %2}
@@ -2473,10 +2473,10 @@
(set_attr "mode" "TI")])
 
 (define_insn "*xop_maskcmp3"
-  [(set (match_operand:VI_32 0 "register_operand" "=x")
-   (match_operator:VI_32 1 "ix86_comparison_int_operator"
-[(match_operand:VI_32 2 "register_operand" "x")
- (match_operand:VI_32 3 "register_operand" "x")]))]
+  [(set (match_operand:VI_16_32 0 "register_operand" "=x")
+   (match_operator:VI_16_32 1 "ix86_comparison_int_operator"
+[(match_operand:VI_16_32 2 "register_operand" "x")
+ (match_operand:VI_16_32 3 "register_operand" "x")]))]
   "TARGET_XOP"
   "vpcom%Y1\t{%3, %2, %0|%0, %2, %3}"
   [(set_attr "type" "sse4arg")
@@ -2501,10 +2501,10 @@
(set_attr "mode" "TI")])
 
 (define_insn "*xop_maskcmp_uns3"
-  [(set (match_operand:VI_32 0 "register_operand" "=x")
-   (match_operator:VI_32 1 "ix86_comparison_uns_operator"
-[(match_operand:VI_32 2 "register_operand" "x")
- (match_operand:VI_32 3 "register_operand" "x")]))]
+  [(set (match_operand:VI_16_32 0 "register_operand" "=x")
+   (match_operator:VI_16_32 1 "ix86_comparison_uns_operator"
+[(match_operand:VI_16_32 2 "register_operand" "x")
+ (match_operand:VI_16_32 3 "register_operand" "x")]))]
   "TARGET_XOP"
   "vpcom%Y1u\t{%3, %2, %0|%0, %2, %3}"
   [(set_attr "type" "ssecmp")
@@ -2527,10 +2527,10 @@
 })
 
 (define_expand "vec_cmp"
-  [(set (match_operand:VI_32 0 "register_operand")
-   (match_operator:VI_32 1 ""
- [(match_operand:VI_32 2 "register_operand")
-  (match_operand:VI_32 3 "register_operand"

[PATCH] tree-optimization/103948 - detect vector vec_cmp in expand_vector_condition

2022-01-10 Thread Uros Bizjak via Gcc-patches
Currently, expand_vector_condition detects only vcondMN and vconduMN
named RTX patterns.  Teach it to also consider vec_cmpMN and vec_cmpuMN
RTX patterns when an all-ones vector is returned for true and an all-zeros
vector is returned for false.
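
For illustration, generic vector-extension code of that shape (an assumed
example; the actual testcase will come with the follow-up target patch):

typedef int v4si __attribute__ ((vector_size (16)));

v4si f (v4si a, v4si b)
{
  /* True value is all ones, false value is all zeros, so this can be
     expanded through vec_cmpMN when no vcondMN pattern exists.  */
  return a > b ? (v4si) { -1, -1, -1, -1 } : (v4si) { 0, 0, 0, 0 };
}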

Patch by Richard, I tested it on the patched x86 target and wrote a
ChangeLog entry.

(No testcase, it will be added in a follow-up target-dependent patch).

2022-01-10  Richard Biener  

gcc/ChangeLog:

PR tree-optimization/103948
* tree-vect-generic.c (expand_vector_condition): Return true if
all ones vector is returned for true, all zeros vector for false
and the target defines corresponding vec_cmp{,u}MN named RTX pattern.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Pre-approved in the PR and pushed to master.

Uros.
diff --git a/gcc/tree-vect-generic.c b/gcc/tree-vect-generic.c
index 6afb6999cd7..5814a71a5bb 100644
--- a/gcc/tree-vect-generic.c
+++ b/gcc/tree-vect-generic.c
@@ -1052,7 +1052,9 @@ expand_vector_condition (gimple_stmt_iterator *gsi, 
bitmap dce_ssa_names)
}
 }
 
-  if (expand_vec_cond_expr_p (type, TREE_TYPE (a1), code))
+  if (expand_vec_cond_expr_p (type, TREE_TYPE (a1), code)
+  || (integer_all_onesp (b) && integer_zerop (c)
+ && expand_vec_cmp_expr_p (type, TREE_TYPE (a1), code)))
 {
   gcc_assert (TREE_CODE (a) == SSA_NAME || TREE_CODE (a) == VECTOR_CST);
   return true;


Re: [PATCH] c++: constexpr base-to-derived conversion with offset 0 [PR103879]

2022-01-10 Thread Jason Merrill via Gcc-patches

On 1/4/22 11:54, Patrick Palka wrote:

r12-136 made us canonicalize an object/offset pair with negative offset
into one with a nonnegative offset, by iteratively absorbing the
innermost component into the offset and stopping as soon as the offset
becomes nonnegative.

This patch strengthens this transformation to make it keep absorbing
even if the offset is already 0 as long as the innermost component is at
position 0 (and thus absorbing doesn't change the offset).  This lets us
accept the two constexpr testcases below, which we'd previously reject
essentially because cxx_fold_indirect_ref wasn't able to resolve
*(B*)&b.D123 (where D123 is the base subobject A at position 0) to just b.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

PR c++/103879

gcc/cp/ChangeLog:

* constexpr.c (cxx_fold_indirect_ref): Split out object/offset
canonicalization step into a local lambda.  Strengthen it to
absorb more components at position 0.  Use it before both calls
to cxx_fold_indirect_ref_1.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1y/constexpr-base2.C: New test.
* g++.dg/cpp1y/constexpr-base2a.C: New test.
---
  gcc/cp/constexpr.c| 38 +--
  gcc/testsuite/g++.dg/cpp1y/constexpr-base2.C  | 21 ++
  gcc/testsuite/g++.dg/cpp1y/constexpr-base2a.C | 25 
  3 files changed, 72 insertions(+), 12 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-base2.C
  create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-base2a.C

diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index 72be45c9e87..1ec33a00ee5 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -5144,6 +5144,25 @@ cxx_fold_indirect_ref (const constexpr_ctx *ctx, 
location_t loc, tree type,
if (!INDIRECT_TYPE_P (subtype))
  return NULL_TREE;
  
+  /* Canonicalizes the given OBJ/OFF pair by iteratively absorbing

+ the innermost component into the offset until the offset is
+ nonnegative,


Maybe "until it would make the offset positive" now that you continue 
with repeated zeros.  OK with that change.



so that cxx_fold_indirect_ref_1 can identify
+ more folding opportunities.  */
+  auto canonicalize_obj_off = [] (tree& obj, tree& off) {
+while (TREE_CODE (obj) == COMPONENT_REF
+  && (tree_int_cst_sign_bit (off) || integer_zerop (off)))
+  {
+   tree field = TREE_OPERAND (obj, 1);
+   tree pos = byte_position (field);
+   if (integer_zerop (off) && integer_nonzerop (pos))
+ /* If the offset is already 0, keep going as long as the
+component is at position 0.  */
+ break;
+   off = int_const_binop (PLUS_EXPR, off, pos);
+   obj = TREE_OPERAND (obj, 0);
+  }
+  };

if (TREE_CODE (sub) == ADDR_EXPR)
  {
tree op = TREE_OPERAND (sub, 0);
@@ -5162,7 +5181,12 @@ cxx_fold_indirect_ref (const constexpr_ctx *ctx, 
location_t loc, tree type,
return op;
}
else
-   return cxx_fold_indirect_ref_1 (ctx, loc, type, op, 0, empty_base);
+   {
+ tree off = integer_zero_node;
+ canonicalize_obj_off (op, off);
+ gcc_assert (integer_zerop (off));
+ return cxx_fold_indirect_ref_1 (ctx, loc, type, op, 0, empty_base);
+   }
  }
else if (TREE_CODE (sub) == POINTER_PLUS_EXPR
   && tree_fits_uhwi_p (TREE_OPERAND (sub, 1)))
@@ -5174,17 +5198,7 @@ cxx_fold_indirect_ref (const constexpr_ctx *ctx, 
location_t loc, tree type,
if (TREE_CODE (op00) == ADDR_EXPR)
{
  tree obj = TREE_OPERAND (op00, 0);
- while (TREE_CODE (obj) == COMPONENT_REF
-&& tree_int_cst_sign_bit (off))
-   {
- /* Canonicalize this object/offset pair by iteratively absorbing
-the innermost component into the offset until the offset is
-nonnegative, so that cxx_fold_indirect_ref_1 can identify
-more folding opportunities.  */
- tree field = TREE_OPERAND (obj, 1);
- off = int_const_binop (PLUS_EXPR, off, byte_position (field));
- obj = TREE_OPERAND (obj, 0);
-   }
+ canonicalize_obj_off (obj, off);
  return cxx_fold_indirect_ref_1 (ctx, loc, type, obj,
  tree_to_uhwi (off), empty_base);
}
diff --git a/gcc/testsuite/g++.dg/cpp1y/constexpr-base2.C 
b/gcc/testsuite/g++.dg/cpp1y/constexpr-base2.C
new file mode 100644
index 000..7cbf5bf32b7
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/constexpr-base2.C
@@ -0,0 +1,21 @@
+// PR c++/103879
+// { dg-do compile { target c++14 } }
+
+struct A {
+  int n = 42;
+};
+
+struct B : A { };
+
+struct C {
+  B b;
+};
+
+constexpr int f() {
+  C c;
+  A& a = static_cast(c.b);
+  B& b = static_cast(a);
+  return b.n;
+}
+
+static_assert(f() == 42, "");
diff --git a/gcc/testsuite/g++.dg/cpp1y/constexpr-base2a.C

Re: [PATCH] c++: "more constrained" vs staticness of memfn [PR103783]

2022-01-10 Thread Jason Merrill via Gcc-patches

On 1/4/22 13:01, Patrick Palka wrote:

Here we're rejecting the calls to g1 and g2 as ambiguous even though one
overload is more constrained than the other (and otherwise equivalent),
because the implicit 'this' parameter of the non-static overload causes
cand_parms_match to think the function parameter lists aren't equivalent.

This patch fixes this by making cand_parms_match skip over 'this'
appropriately.  Note that this bug only occurs with non-template member
functions because for the template case more_specialized_fns seems to
already skips over 'this' appropriately.

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk and perhaps 11?


OK for both.


PR c++/103783

gcc/cp/ChangeLog:

* call.c (cand_parms_match): Skip over 'this' when given one
static and one non-static member function.  Declare static.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/concepts-memfun2.C: New test.
---
  gcc/cp/call.c | 17 ++---
  gcc/testsuite/g++.dg/cpp2a/concepts-memfun2.C | 25 +++
  2 files changed, 39 insertions(+), 3 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/concepts-memfun2.C

diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index 7f7ee88deed..ed74b907828 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -11918,7 +11918,7 @@ joust_maybe_elide_copy (z_candidate *&cand)
  /* True if the defining declarations of the two candidates have equivalent
 parameters.  */
  
-bool

+static bool
  cand_parms_match (z_candidate *c1, z_candidate *c2)
  {
tree fn1 = c1->fn;
@@ -11940,8 +11940,19 @@ cand_parms_match (z_candidate *c1, z_candidate *c2)
fn1 = DECL_TEMPLATE_RESULT (t1);
fn2 = DECL_TEMPLATE_RESULT (t2);
  }
-  return compparms (TYPE_ARG_TYPES (TREE_TYPE (fn1)),
-   TYPE_ARG_TYPES (TREE_TYPE (fn2)));
+  tree parms1 = TYPE_ARG_TYPES (TREE_TYPE (fn1));
+  tree parms2 = TYPE_ARG_TYPES (TREE_TYPE (fn2));
+  if (DECL_FUNCTION_MEMBER_P (fn1)
+  && DECL_FUNCTION_MEMBER_P (fn2)
+  && (DECL_NONSTATIC_MEMBER_FUNCTION_P (fn1)
+ != DECL_NONSTATIC_MEMBER_FUNCTION_P (fn2)))
+{
+  /* Ignore 'this' when comparing the parameters of a static member
+function with those of a non-static one.  */
+  parms1 = skip_artificial_parms_for (fn1, parms1);
+  parms2 = skip_artificial_parms_for (fn2, parms2);
+}
+  return compparms (parms1, parms2);
  }
  
  /* Compare two candidates for overloading as described in

diff --git a/gcc/testsuite/g++.dg/cpp2a/concepts-memfun2.C 
b/gcc/testsuite/g++.dg/cpp2a/concepts-memfun2.C
new file mode 100644
index 000..e3845e48387
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/concepts-memfun2.C
@@ -0,0 +1,25 @@
+// PR c++/103783
+// { dg-do compile { target c++20 } }
+
+template<bool B>
+struct A {
+  template<class T> void f1() = delete;
+  template<class T> static void f1() requires B;
+
+  template<class T> void f2() requires B;
+  template<class T> static void f2() = delete;
+
+  void g1() = delete;
+  static void g1() requires B;
+
+  void g2() requires B;
+  static void g2() = delete;
+};
+
+int main() {
+  A<true> a;
+  a.f1<int>(); // OK
+  a.f2<int>(); // OK
+  a.g1(); // OK, previously rejected as ambiguous
+  a.g2(); // OK, previously rejected as ambiguous
+}




Re: [PATCH] Improve sequence logic in cxx_init_decl_processing

2022-01-10 Thread Jason Merrill via Gcc-patches

On 1/6/22 03:26, Olivier Hainque wrote:

Hello,

commit aa2c978400f3b3ca6e9f2d18598a379589e77ba0, introduced per

   https://gcc.gnu.org/pipermail/gcc-patches/2020-May/545552.html

makes references to __cxa_pure_virtual weak and this is causing
issues on some VxWorks configurations, where weak symbols are only
supported for one of the two major operating modes, and not on all
versions.

While trying to circumvent that, I noticed that the current
code in cxx_init_decl_processing does something like:

   if (flag_weak)
 /* If no definition is available, resolve references to NULL.  */
 declare_weak (abort_fndecl);
   ...
   if (! supports_one_only ())
 flag_weak = 0;


The code possibly resetting flag_weak should presumably execute
before the test checking the flag, or we'd need a comment explaining
why this surprising order is on purpose.

The attached patch just moves the reset above the test.
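For reference, a minimal sketch of the intended ordering after the move
(illustrative only; the real cxx_init_decl_processing contains other code
between these statements):

   if (! supports_one_only ())
 flag_weak = 0;
   ...
   if (flag_weak)
 /* If no definition is available, resolve references to NULL.  */
 declare_weak (abort_fndecl);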

It bootstraps/regtests fine on x86_64-linux and allows better control
on VxWorks. I'm not yet clear on some of the ramifications there (tightening
the definitions of SUPPORTS_ONE_ONLY and TARGET_SUPPORTS_WEAK yields lots of
dg test failures) but that's another story.

Ok to commit?


OK.


Thanks in advance!

2021-12-30  Olivier Hainque  

gcc/
* cp/decl.c (cxx_init_decl_processing): Move code possibly
altering flag_weak before code testing it.

Olivier









Re: [PATCH] c++: Ensure some more that immediate functions aren't gimplified [PR103912]

2022-01-10 Thread Jason Merrill via Gcc-patches

On 1/6/22 03:48, Jakub Jelinek wrote:

Hi!

Immediate functions should never be emitted into assembly, the FE doesn't
genericize them and does various things to ensure they aren't gimplified.
But the following testcase ICEs anyway due to that, because the consteval
function returns a lambda, and operator() of the lambda has
decl_function_context of the consteval function.  cgraphunit.c then
does:
   /* Preserve a functions function context node.  It will
  later be needed to output debug info.  */
   if (tree fn = decl_function_context (decl))
 {
   cgraph_node *origin_node = cgraph_node::get_create (fn);
   enqueue_node (origin_node);
 }
which enqueues the immediate function and then tries to gimplify it,
which results in ICE because it hasn't been genericized.

When I try a similar testcase with constexpr instead of consteval and
static constinit auto instead of auto in main, what happens is that
the functions are gimplified, later ipa.c discovers they aren't reachable
and sets body_removed to true for them (and clears other flags) and we end
up with debug info which has the foo and bar functions without
DW_AT_low_pc and other code specific attributes, just stuff from its BLOCK
structure and in there the lambda with DW_AT_low_pc etc.

The following patch attempts to emulate that behavior early, so that cgraph
doesn't try to gimplify those and pretends they were already gimplified
and found unused and optimized away.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2022-01-06  Jakub Jelinek  

PR c++/103912
* semantics.c (expand_or_defer_fn): For immediate functions, set
node->body_removed to true and clear analyzed, definition and
force_output.
* decl2.c (c_parse_final_cleanups): Ignore immediate functions for
expand_or_defer_fn.

* g++.dg/cpp2a/consteval26.C: New test.

--- gcc/cp/semantics.c.jj   2022-01-03 10:40:48.0 +0100
+++ gcc/cp/semantics.c  2022-01-05 12:52:11.484379138 +0100
@@ -4785,6 +4785,18 @@ expand_or_defer_fn (tree fn)
emit_associated_thunks (fn);
  
function_depth--;

+
+  if (DECL_IMMEDIATE_FUNCTION_P (fn))
+   {
+ cgraph_node *node = cgraph_node::get (fn);
+ if (node)


This can be

if (cgraph_node *node = cgraph_node::get (fn))

OK either way.


+   {
+ node->body_removed = true;
+ node->analyzed = false;
+ node->definition = false;
+ node->force_output = false;
+   }
+   }
  }
  }
  
--- gcc/cp/decl2.c.jj	2022-01-03 10:40:48.083068010 +0100

+++ gcc/cp/decl2.c  2022-01-05 12:53:34.930204119 +0100
@@ -5272,6 +5272,7 @@ c_parse_final_cleanups (void)
  if (!DECL_EXTERNAL (decl)
  && decl_needed_p (decl)
  && !TREE_ASM_WRITTEN (decl)
+ && !DECL_IMMEDIATE_FUNCTION_P (decl)
  && !node->definition)
{
  /* We will output the function; no longer consider it in this
--- gcc/testsuite/g++.dg/cpp2a/consteval26.C.jj 2022-01-05 12:42:07.918878074 
+0100
+++ gcc/testsuite/g++.dg/cpp2a/consteval26.C2022-01-05 12:40:18.853416637 
+0100
@@ -0,0 +1,39 @@
+// PR c++/103912
+// { dg-do run { target c++20 } }
+// { dg-additional-options "-O2 -g -fkeep-inline-functions" }
+
+extern "C" void abort ();
+
+struct A { A () {} };
+
+consteval auto
+foo ()
+{
+  if (1)
+;
+  return [] (A x) { return 1; };
+}
+
+consteval auto
+bar (int a)
+{
+  int b = a + 4;
+  if (1)
+;
+  return [=] (A x) { return a + b; };
+}
+
+int
+main ()
+{
+  A x;
+  auto h = foo ();
+  if (h (x) != 1)
+abort ();
+  auto i = bar (5);
+  if (i (x) != 14)
+abort ();
+  auto j = bar (42);
+  if (j (x) != 88)
+abort ();
+}

Jakub





Re: [PATCH 1v2/3][vect] Add main vectorized loop unrolling

2022-01-10 Thread Andre Vieira (lists) via Gcc-patches

Hi,

I don't think I ever ended up posting the rebased version on top of the 
epilogue mode patch. So here it is. I think I had a conditional OK if I
split the epilogue mode patch, but I just want to double check: is this OK
for trunk?



gcc/ChangeLog:

    * tree-vect-loop.c (vect_estimate_min_profitable_iters): Pass 
new argument

    suggested_unroll_factor.
    (vect_analyze_loop_costing): Likewise.
    (_loop_vec_info::_loop_vec_info): Initialize new member 
suggested_unroll_factor.
    (vect_determine_partial_vectors_and_peeling): Make epilogue of 
unrolled

    main loop use partial vectors.
    (vect_analyze_loop_2): Pass and use new argument 
suggested_unroll_factor.

    (vect_analyze_loop_1): Likewise.
    (vect_analyze_loop): Change to initialize local 
suggested_unroll_factor and use it.
    (vectorizable_reduction): Don't use single_defuse_cycle when 
unrolling.
    * tree-vectorizer.h (_loop_vec_info::_loop_vec_info): Add new 
member suggested_unroll_factor.
    (vector_costs::vector_costs): Add new member 
m_suggested_unroll_factor.

    (vector_costs::suggested_unroll_factor): New getter function.
    (finish_cost): Set return argument suggested_unroll_factor.



Regards,
Andre

On 30/11/2021 13:56, Richard Biener wrote:

On Tue, 30 Nov 2021, Andre Vieira (lists) wrote:


On 25/11/2021 12:46, Richard Biener wrote:

Oops, my fault, yes, it does.  I would suggest to refactor things so
that the mode_i = first_loop_i case is there only once.  I also wonder
if all the argument about starting at 0 doesn't apply to the
not unrolled LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P as well?  So
what's the reason to differ here?  So in the end I'd just change
the existing

if (LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P (first_loop_vinfo))
  {

to

if (LOOP_VINFO_EPIL_USING_PARTIAL_VECTORS_P (first_loop_vinfo)
|| first_loop_vinfo->suggested_unroll_factor > 1)
  {

and maybe revisit this when we have an actual testcase showing that
doing sth else has a positive effect?

Thanks,
Richard.

So I had a quick chat with Richard Sandiford and he is suggesting resetting
mode_i to 0 for all cases.

He pointed out that for some tunings the SVE mode might come after the NEON
mode, which means that even for not-unrolled loop_vinfos we could end up with
a suboptimal choice of mode for the epilogue. I.e. it could be that we pick
V16QI for main vectorization, but that's VNx16QI + 1 in the array, so we'd not
try VNx16QI for the epilogue.

This would simplify the mode selecting cases, by simply restarting at
mode_i in all epilogue cases. Is that something you'd be OK with?

Works for me with an updated comment.  Even better with showing a
testcase exercising such tuning.

Richard.

Hi,

I don't think I ever ended up posting the rebased version on top of the 
epilogue mode patch. So here it is.

OK for trunk?


gcc/ChangeLog:

* tree-vect-loop.c (vect_estimate_min_profitable_iters): Pass new 
argument
suggested_unroll_factor.
(vect_analyze_loop_costing): Likewise.
(_loop_vec_info::_loop_vec_info): Initialize new member 
suggested_unroll_factor.
(vect_determine_partial_vectors_and_peeling): Make epilogue of unrolled
main loop use partial vectors.
(vect_analyze_loop_2): Pass and use new argument 
suggested_unroll_factor.
(vect_analyze_loop_1): Likewise.
(vect_analyze_loop): Change to initialize local suggested_unroll_factor 
and use it.
(vectorizable_reduction): Don't use single_defuse_cycle when unrolling.
* tree-vectorizer.h (_loop_vec_info::_loop_vec_info): Add new member 
suggested_unroll_factor.
(vector_costs::vector_costs): Add new member m_suggested_unroll_factor.
(vector_costs::suggested_unroll_factor): New getter function.
(finish_cost): Set return argument suggested_unroll_factor.



Regards,
Andre

[v2 COMMITTED] rs6000: Add Power10 optimization for _mm_blendv*

2022-01-10 Thread Paul A. Clarke via Gcc-patches
This is the patch that was committed. Thanks for the review!
---
Power10 ISA added `xxblendv*` instructions which are realized in the
`vec_blendv` intrinsic.

Use `vec_blendv` for `_mm_blendv_epi8`, `_mm_blendv_ps`, and
`_mm_blendv_pd` compatibility intrinsics, when `_ARCH_PWR10`.

Update original implementation of _mm_blendv_epi8 to use signed types,
to better match the function parameters. Realization is unchanged.

Also, copy a test from i386 for testing `_mm_blendv_ps`.
This should have come with commit ed04cf6d73e233c74c4e55c27f1cbd89ae4710e8,
but was inadvertently omitted.
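
For context, a tiny usage sketch of the selection semantics these
compatibility intrinsics implement (the function name below is made up;
element i of the result is taken from __B whenever the sign bit of mask
element i is set, otherwise from __A):

  #include <smmintrin.h>

  static __m128
  pick_ps (__m128 a, __m128 b, __m128 mask)
  {
    /* result[i] = (mask[i] has its sign bit set) ? b[i] : a[i] */
    return _mm_blendv_ps (a, b, mask);
  }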

2022-01-10  Paul A. Clarke  

gcc
* config/rs6000/smmintrin.h (_mm_blendv_epi8): Use vec_blendv
when _ARCH_PWR10. Use signed types.
(_mm_blendv_ps): Use vec_blendv when _ARCH_PWR10.
(_mm_blendv_pd): Likewise.

gcc/testsuite
* gcc.target/powerpc/sse4_1-blendvps.c: Copy from gcc.target/i386,
adjust dg directives to suit.
---
v2: Used signed types within new and original implementation of
_mm_blendv_epi8.

 gcc/config/rs6000/smmintrin.h | 14 +++-
 .../gcc.target/powerpc/sse4_1-blendvps.c  | 65 +++
 2 files changed, 78 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/sse4_1-blendvps.c

diff --git a/gcc/config/rs6000/smmintrin.h b/gcc/config/rs6000/smmintrin.h
index 1fda04881554..b9cb46b3c1dd 100644
--- a/gcc/config/rs6000/smmintrin.h
+++ b/gcc/config/rs6000/smmintrin.h
@@ -113,9 +113,13 @@ _mm_blend_epi16 (__m128i __A, __m128i __B, const int 
__imm8)
 extern __inline __m128i __attribute__((__gnu_inline__, __always_inline__, 
__artificial__))
 _mm_blendv_epi8 (__m128i __A, __m128i __B, __m128i __mask)
 {
+#ifdef _ARCH_PWR10
+  return (__m128i) vec_blendv ((__v16qi) __A, (__v16qi) __B, (__v16qu) __mask);
+#else
   const __v16qu __seven = vec_splats ((unsigned char) 0x07);
   __v16qu __lmask = vec_sra ((__v16qu) __mask, __seven);
-  return (__m128i) vec_sel ((__v16qu) __A, (__v16qu) __B, __lmask);
+  return (__m128i) vec_sel ((__v16qi) __A, (__v16qi) __B, __lmask);
+#endif
 }
 
 extern __inline __m128
@@ -149,9 +153,13 @@ extern __inline __m128
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm_blendv_ps (__m128 __A, __m128 __B, __m128 __mask)
 {
+#ifdef _ARCH_PWR10
+  return (__m128) vec_blendv ((__v4sf) __A, (__v4sf) __B, (__v4su) __mask);
+#else
   const __v4si __zero = {0};
   const __vector __bool int __boolmask = vec_cmplt ((__v4si) __mask, __zero);
   return (__m128) vec_sel ((__v4su) __A, (__v4su) __B, (__v4su) __boolmask);
+#endif
 }
 
 extern __inline __m128d
@@ -174,9 +182,13 @@ extern __inline __m128d
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm_blendv_pd (__m128d __A, __m128d __B, __m128d __mask)
 {
+#ifdef _ARCH_PWR10
+  return (__m128d) vec_blendv ((__v2df) __A, (__v2df) __B, (__v2du) __mask);
+#else
   const __v2di __zero = {0};
   const __vector __bool long long __boolmask = vec_cmplt ((__v2di) __mask, 
__zero);
   return (__m128d) vec_sel ((__v2du) __A, (__v2du) __B, (__v2du) __boolmask);
+#endif
 }
 #endif
 
diff --git a/gcc/testsuite/gcc.target/powerpc/sse4_1-blendvps.c 
b/gcc/testsuite/gcc.target/powerpc/sse4_1-blendvps.c
new file mode 100644
index ..8fcb55383047
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/sse4_1-blendvps.c
@@ -0,0 +1,65 @@
+/* { dg-do run } */
+/* { dg-require-effective-target p8vector_hw } */
+/* { dg-options "-O2 -mpower8-vector -Wno-psabi" } */
+
+#include "sse4_1-check.h"
+
+#include <smmintrin.h>
+#include <string.h>
+
+#define NUM 20
+
+static void
+init_blendvps (float *src1, float *src2, float *mask)
+{
+  int i, msk, sign = 1; 
+
+  msk = -1;
+  for (i = 0; i < NUM * 4; i++)
+{
+  if((i % 4) == 0)
+   msk++;
+  src1[i] = i* (i + 1) * sign;
+  src2[i] = (i + 20) * sign;
+  mask[i] = (i + 120) * i;
+  if( (msk & (1 << (i % 4))))
+   mask[i] = -mask[i];
+  sign = -sign;
+}
+}
+
+static int
+check_blendvps (__m128 *dst, float *src1, float *src2,
+   float *mask)
+{
+  float tmp[4];
+  int j;
+
+  memcpy (&tmp[0], src1, sizeof (tmp));
+  for (j = 0; j < 4; j++)
+if (mask [j] < 0.0)
+  tmp[j] = src2[j];
+
+  return memcmp (dst, &tmp[0], sizeof (tmp));
+}
+
+static void
+sse4_1_test (void)
+{
+  union
+{
+  __m128 x[NUM];
+  float f[NUM * 4];
+} dst, src1, src2, mask;
+  int i;
+
+  init_blendvps (src1.f, src2.f, mask.f);
+
+  for (i = 0; i < NUM; i++)
+{
+  dst.x[i] = _mm_blendv_ps (src1.x[i], src2.x[i], mask.x[i]);
+  if (check_blendvps (&dst.x[i], &src1.f[i * 4], &src2.f[i * 4],
+ &mask.f[i * 4]))
+   abort ();
+}
+}
-- 
2.27.0



[PATCH] Fortran: make IEEE_VALUE produce signaling NaNs

2022-01-10 Thread FX via Gcc-patches
Hi,

Second part of a three-patch series to fix PR 82207 
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82207), making gfortran handle 
signaling NaNs. This part fixes the library code implementing IEEE_VALUE. To do 
so, I switched that part of the library code from Fortran to C, because in C we 
have access to all GCC built-ins related to NaNs/infinities/etc, which is super 
useful for generating the right bit patterns (instead of using roundabout ways, 
like the previous Fortran implementation, for which I am guilty).

I needed to add to kinds.h the value of TINY for each floating-point kind (which is 
used to produce denormals, by halving TINY).
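
As a rough illustration of the idea, in C (DBL_MIN stands in for the Fortran
TINY value here, and IEEE 754 binary64 semantics are assumed):

  #include <float.h>

  /* Halving the smallest normal number yields a subnormal (denormal).  */
  static double
  make_denormal (void)
  {
    return DBL_MIN / 2.0;
  }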

The patch comes with a testcase, which is still conditional on issignaling 
support at this stage (and therefore will run on glibc targets only).

I had to amend the gfortran.dg/ieee/ieee_10.f90 testcase, which produces 
signaling NaNs while -ffpe-trap=invalid is set. It passed before, but only by 
accident, because we were not actually generating signaling NaNs. I’m not sure 
what is the expected behaviour, but the patch does not affect the real 
behaviour.

Bootstrapped and regtested on x86_64-pc-gnu-linux. OK to commit?

FX



0001-Fortran-Allow-IEEE_CLASS-to-identify-signaling-NaNs.patch
Description: Binary data




0001-Fortran-allow-IEEE_VALUE-to-correctly-return-signali.patch
Description: Binary data


Re: [PATCH] Register --sysroot in the driver switches table

2022-01-10 Thread Olivier Hainque via Gcc-patches



> On 10 Jan 2022, at 09:00, Richard Biener  wrote:
> 
> On Wed, Jan 5, 2022 at 6:58 PM Olivier Hainque via Gcc-patches
>  wrote:
>> 
>> 
>> 
>>> On 5 Jan 2022, at 10:26, Olivier Hainque  wrote:
>>> 
>>> The change should also set "validated" true
>>> when requesting to save --sysroot.
>> 
>> The attached adjustment fixes the failure I could reproduce,
>> bootstraps and regtests fine on x86_64-linux, and passes a build
>> + a couple of in-house testsuites for one of our vxworks ports.
>> 
>> Ok to commit?
> 
> OK.

Great, thanks Richard :)




[PING] [PATCH] C, C++, Fortran, OpenMP: Add 'has_device_addr' clause to 'target' construct

2022-01-10 Thread Marcel Vollweiler

Hi,

I'd like to ping the patch for the OpenMP 'has_device_addr' clause on
the target construct:

https://gcc.gnu.org/pipermail/gcc-patches/2021-November/585361.html


Thanks
Marcel
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955


[PATCH] Check sorting of MAINTAINERS

2022-01-10 Thread Martin Liška

The script is capable of checking whether the names in MAINTAINERS are sorted
alphabetically. I used English locales and the script emits:

Are you fine with the suggested changes?

Cheers,
Martin

$ contrib/check-MAINTAINERS.py MAINTAINERS
Global Reviewers are fine!
Wrong order for Write After Approval:

  Mark G. Adams 
  Pedro Alves   
  Raksit Ashok  
  Matt Austern  
  David Ayers   
  Prakhar Bahuguna  
  Giovanni Bajo 
  Simon Baldwin 
  Scott Bambrough   
  Wolfgang Bangerth 
  Gergö Barany  
  Charles Baylis

  Tejas Belagod 
  Matthew Beliveau  
  Serge Belyshev

  Jon Beniston  
  Andrew Bennett

  Andrew Benson 
  Daniel Berlin 
  Pat Bernardi  
  Jan Beulich   
  Indu Bhagat   
  David Billinghurst

  Tomas Bily
  Laurynas Biveinis 
  Eric Blake
  Phil Blundell 
  Hans Boehm
  Lynn Boger
  Ian Bolton
  Andrea Bona   
  Neil Booth
  Antoni Boucher
  Robert Bowdidge   
  Joel Brobecker
  Dave Brolley  
  Christian Bruel   
  Kevin Buettner
  Andrew Burgess
  Adam Butcher  
  Andrew Cagney 
  Daniel Carrera
  Stephane Carrez   
  Gabriel Charette  
  Chandra Chavva
  Dehao Chen
- Fabien Chêne  
  Clément Chigot

  Harshit Chopra
  Tamar Christina   

  Eric Christopher  
+ Fabien Chêne  
  Paul Clarke   
  William Cohen 
  Michael Collison  
  Josh Conner   
  R. Kelley Cook
  Alex Coplan   
  Andrea Corallo
  Christian Cornelssen  
  Ludovic Courtès   
  Lawrence Crowl
  Ian Dall  
  David Daney   
  Robin Dapp
  Simon Dardis  
  Sudakshina Das
  Bud Davis 
  Chris Demetriou   
  Sameera Deshpande 
  Wilco Dijkstra
  Benoit Dupont de Dinechin 

  Jason Eckhardt
  Bernd Edlinger

  Phil Edwards  
  Mark Eggleston

  Steve Ellcey  
  Mohan Embar   
  Revital Eres  
  Marc Espie
  Ansgar Esztermann 

  Doug Evans
  Chris Fairles 
  Alessandro Fanfarillo 
  Changpeng Fang
  David Faust   
  Li Feng   
  Thomas Fitzsimmons
  Alexander Fomin   

  Brian Ford
  John Freeman  
+

[PATCH][pushed] Partially sort MAINTAINERS.

2022-01-10 Thread Martin Liška

There are more complicated names that I'm going to address in a different
patch.

Cheers,
Martin

ChangeLog:

* MAINTAINERS: Fix obvious issues with sorting.
---
 MAINTAINERS | 38 +++---
 1 file changed, 19 insertions(+), 19 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 0fd3a1f99f2..c5aeb1af174 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -23,8 +23,8 @@ Maintainers
 
 			Global Reviewers
 
-Richard Earnshaw

 Richard Biener 
+Richard Earnshaw   
 Jakub Jelinek  
 Richard Kenner 
 Jeff Law   
@@ -346,8 +346,8 @@ Robert Bowdidge 

 Joel Brobecker 
 Dave Brolley   
 Christian Bruel
-Andrew Burgess 
 Kevin Buettner 
+Andrew Burgess 
 Adam Butcher   
 Andrew Cagney  
 Daniel Carrera 
@@ -469,8 +469,8 @@ Tim Josling 

 Victor Kaplansky   
 Geoffrey Keating   
 Brendan Kehoe  
-Matthias Klose 
 Andi Kleen 
+Matthias Klose 
 Jeff Knaggs
 Michael Koch   
 Nicolas Koenig 
@@ -487,9 +487,9 @@ Doug Kwan   

 Scott Robert Ladd  
 Razya Ladelsky 
 Thierry Lafage 
-Aaron W. LaFramboise   
 Rask Ingemann Lambertsen   
 Jerome Lambourg
+Aaron W. LaFramboise   
 Asher Langton  
 Chris Lattner  
 Terry Laurenzo 
@@ -517,6 +517,7 @@ H.J. Lu 

 Xiong Hu Luo   
 Bin Bin Lv 
 Christophe Lyon
+Jun Ma 
 Luis Machado   
 Ziga Mahkovec  
 Matthew Malcomson  
@@ -545,7 +546,6 @@ Dirk Mueller

 Phil Muldoon   
 Gaius Mulley   
 Steven Munroe  
-Jun Ma 
 Szabolcs Nagy  
 Quentin Neill  
 Adam Nemet 
@@ -556,9 +556,9 @@ James Norris
 Diego Novillo  
 Dorit Nuzman   
 David O'Brien  
-Braden Obrzut  
 Carlos O'Donell
 Peter O'Gorman 
+Braden Obrzut  
 Andrea Ornstein
 Maxim Ostapenko

 Patrick Palka  
@@ -621,14 +621,15 @@ Franz Sirl

 Jan Sjodin 
 Trevor Smigiel 

 Edward Smith-Rowland   <3dw...@verizon.net>
-Jayant Sonar   
 Anatoly Sokolov
 Michael Sokolov

+Jayant Sonar   
 Richard Stallman   
 Basile Starynkevitch   
 Jakub Staszak  
 Graham Stott   
 Jeff Sturm 
+YunQiang Su
 Robert Suchanek
 Andrew Sutton  
 Gabriele Svelto
@@ -695,7 +696,6 @@ Shujing Zhao

 Jon Ziegler
 Roman Zippel   
 Josef Zlomek   
-YunQiang Su
 
 			Bug database only accounts
 
@@ -711,14 +711,14 @@ Certificate of Origin Version 1.1.  See https://gcc.gnu.org/

Re: [PATCH] x86_64: Ignore zero width bitfields in ABI and issue -Wpsabi warning about C zero width bitfield ABI changes [PR102024]

2022-01-10 Thread Uros Bizjak via Gcc-patches
On Mon, Jan 10, 2022 at 3:23 PM Michael Matz  wrote:
>
> Hello,
>
> On Mon, 20 Dec 2021, Uros Bizjak wrote:
>
> > > Thanks.
> > > I see nobody commented on Micha's post there.
> > >
> > > Here is a patch that implements it in GCC, i.e. C++ doesn't change ABI 
> > > (at least
> > > not from the past few releases) and C does for GCC:
> > >
> > > 2021-12-15  Jakub Jelinek  
> > >
> > > PR target/102024
> > > * config/i386/i386.c (classify_argument): Add zero_width_bitfields
> > > argument, when seeing DECL_FIELD_CXX_ZERO_WIDTH_BIT_FIELD 
> > > bitfields,
> > > always ignore them, when seeing other zero sized bitfields, either
> > > set zero_width_bitfields to 1 and ignore it or if equal to 2 
> > > process
> > > it.  Pass it to recursive calls.  Add wrapper
> > > with old arguments and diagnose ABI differences for C structures
> > > with zero width bitfields.  Formatting fixes.
> > >
> > > * gcc.target/i386/pr102024.c: New test.
> > > * g++.target/i386/pr102024.C: New test.
> >
> > Please get a signoff on the ABI change (perhaps HJ can help here),
> > I'll approve the implementation after that.
>
> Christmas got in the way, but I just merged the proposed change
> (zero-width bit-fields -> NO_CLASS) into the psABI.

Thanks - LGTM for the patch.

Thanks,
Uros.


Re: PING^2: [PATCH] Add --enable-first-stage-cross configure option

2022-01-10 Thread Jonathan Wakely via Gcc-patches

On 10/01/22 00:26 +0300, Serge Belyshev wrote:

Ping: [PATCH] Add --enable-first-stage-cross configure option
https://gcc.gnu.org/pipermail/gcc-patches/2021-July/575318.html


Add --enable-first-stage-cross configure option

Build a static-only, C-only compiler that is sufficient to cross-compile
glibc.  This option disables various runtime libraries that require
libc to compile, turns on --with-newlib, --without-headers,
--disable-decimal-float, --disable-shared, --disable-threads, and sets
--enable-languages=c.



Rationale: the current way of building the first-stage compiler of a cross
toolchain requires specifying a list of target libraries that are not
going to be compiled due to their dependency on the target libc.  This
list is not documented in gccinstall.texi and sometimes changes.  To
simplify the procedure, it is better to maintain that list in GCC
itself.


I think this is a great idea.

I don't think it makes any difference to this patch, but I've just
committed a change to libstdc++ so that you no longer need to add
--with-newlib when libstdc++ is configured with --without-headers
(because it's counter-intuitive to have to say which libc you're using
when not using any libc).

I only tested it for --disable-hosted-libstdcxx --without-headers
because I am not sure what libstdc++ even does if you build it hosted
but --without-headers.



Re: [PATCH, OpenMP, C/C++] Fix PR103705

2022-01-10 Thread Chung-Lin Tang

Forgot to attach the patch, here it is :P

On 2022/1/10 10:59 PM, Chung-Lin Tang wrote:

For cases like:
   #pragma omp target update from(s[0].a[0:1])

The handling in [c_]finish_omp_clauses was only peeling off ARRAY_REF once
before the loop handling COMPONENT_REF, and snagged when the base of the
component_ref is an array access. This adds the handling there for both C
and C++ front-ends.

(ICE started to happen after 
https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=6c0399378e77d029
where map/from/to clause syntax was relaxed to allow more stuff)

Tested without regressions, okay to commit?

Thanks,
Chung-Lin

     PR c++/103705

gcc/c/ChangeLog:

     * c-typeck.c (c_finish_omp_clauses): Also continue peeling off of
     outer node for ARRAY_REFs.

gcc/cp/ChangeLog:

     * semantics.c (finish_omp_clauses): Also continue peeling off of
     outer node for ARRAY_REFs.

gcc/testsuite/ChangeLog:

 * c-c++-common/gomp/pr103705.c: New test.

diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
index 8b492cf5bed..ac6618eca5c 100644
--- a/gcc/c/c-typeck.c
+++ b/gcc/c/c-typeck.c
@@ -14929,7 +14929,8 @@ c_finish_omp_clauses (tree clauses, enum 
c_omp_region_type ort)
t = TREE_OPERAND (t, 0);
}
}
- while (TREE_CODE (t) == COMPONENT_REF);
+ while (TREE_CODE (t) == COMPONENT_REF
+|| TREE_CODE (t) == ARRAY_REF);
 
  if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
  && OMP_CLAUSE_MAP_IMPLICIT (c)
diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index 645654768e3..a7435ed1266 100644
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -7931,7 +7931,8 @@ finish_omp_clauses (tree clauses, enum c_omp_region_type 
ort)
t = TREE_OPERAND (t, 0);
}
}
- while (TREE_CODE (t) == COMPONENT_REF);
+ while (TREE_CODE (t) == COMPONENT_REF
+|| TREE_CODE (t) == ARRAY_REF);
 
  if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
  && OMP_CLAUSE_MAP_IMPLICIT (c)
diff --git a/gcc/testsuite/c-c++-common/gomp/pr103705.c 
b/gcc/testsuite/c-c++-common/gomp/pr103705.c
new file mode 100644
index 000..bf4c7066d28
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/gomp/pr103705.c
@@ -0,0 +1,14 @@
+/* PR c++/103705 */
+/* { dg-do compile } */
+
+struct S
+{
+  int a[2];
+};
+
+int main (void)
+{
+  struct S s[1];
+  #pragma omp target update from(s[0].a[0:1])
+  return 0;
+}


[PATCH, OpenMP, C/C++] Fix PR103705

2022-01-10 Thread Chung-Lin Tang

For cases like:
  #pragma omp target update from(s[0].a[0:1])

The handling in [c_]finish_omp_clauses was only peeling off ARRAY_REF once
before the loop handling COMPONENT_REF, and snagged when the base of the
component_ref is an array access. This adds the handling there for both C
and C++ front-ends.

(ICE started to happen after 
https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=6c0399378e77d029
where map/from/to clause syntax was relaxed to allow more stuff)

Tested without regressions, okay to commit?

Thanks,
Chung-Lin

PR c++/103705

gcc/c/ChangeLog:

* c-typeck.c (c_finish_omp_clauses): Also continue peeling off of
outer node for ARRAY_REFs.

gcc/cp/ChangeLog:

* semantics.c (finish_omp_clauses): Also continue peeling off of
outer node for ARRAY_REFs.

gcc/testsuite/ChangeLog:

* c-c++-common/gomp/pr103705.c: New test.


Re: [PATCH 4/6] ira: Try to avoid propagating conflicts

2022-01-10 Thread Richard Sandiford via Gcc-patches
Hi Vlad,

Vladimir Makarov  writes:
> On 2022-01-06 09:47, Richard Sandiford wrote:
>> Suppose that:
>>
>> - an inner loop L contains an allocno A
>> - L clobbers hard register R while A is live
>> - A's parent allocno is AP
>>
>> Previously, propagate_allocno_info would propagate conflict sets up the
>> loop tree, so that the conflict between A and R would become a conflict
>> between AP and R (and so on for ancestors of AP).
> My thoughts for propagating hard register conflicts were to avoid 
> changing allocations on the region border as much as possible.  The 
> solution you are proposing might result in allocating R to the allocno 
> and creating moves/loads/stores on the region border when it would be 
> possible to assign R to another allocno and another hard reg to the 
> allocno in consideration.  As it is all about heuristics, it is hard to 
> say, just by speculating, what the probability of such a situation is 
> and which heuristic is better.  Only checking credible benchmarks is a 
> criterion for choosing heuristics. […]

Yeah, I agree with all of the above.  Any change to these heuristics is
likely to make some things worse: a strict improvement over the status
quo is essentially impossible.

I guess in principle, the new heuristic is more suited to those high
register pressure situations in which some spilling is inevitable.
The cases that are most likely to lose out under the new heuristics
are those where allocation is possible without spilling and where
recording the conflicts gave the allocator the extra “push” it needed to
prioritise the parent allocno over others with fewer subloop conflicts.
And the risks of that happening are probably greater if the target is
providing lop-sided costs.

(Talking of which, the aarch64 port is still providing equal costs
for loads and stores, which is something that we ought to look at.
But again, it's a sensitive heuristic, so tweaking it will need care.)

Thanks a lot for the quick reviews, really appreciate it.

I've now pushed the series.  Let's see what the fallout is :-)

Richard


Re: [AArch64] Enable generation of FRINTNZ instructions

2022-01-10 Thread Richard Biener via Gcc-patches
On Mon, 10 Jan 2022, Andre Vieira (lists) wrote:

> Yeah seems I forgot to send the latest version, my bad.
> 
> Bootstrapped on aarch64-none-linux.
> 
> OK for trunk?

The match.pd part looks OK to me.

Thanks,
Richard.

> gcc/ChangeLog:
> 
>     * config/aarch64/aarch64.md (ftrunc<mode><frintnz_mode>2): New
> pattern.
>     * config/aarch64/iterators.md (FRINTNZ): New iterator.
>     (frintnz_mode): New int attribute.
>     (VSFDF): Make iterator conditional.
>     * internal-fn.def (FTRUNC_INT): New IFN.
>     * internal-fn.c (ftrunc_int_direct): New define.
>     (expand_ftrunc_int_optab_fn): New custom expander.
>     (direct_ftrunc_int_optab_supported_p): New supported_p.
>     * match.pd: Add to the existing TRUNC pattern match.
>     * optabs.def (ftrunc_int): New entry.
>     * stor-layout.h (element_precision): Moved from here...
>     * tree.h (element_precision): ... to here.
>     (element_type): New declaration.
>     * tree.c (element_type): New function.
>     (element_precision): Changed to use element_type.
>     * tree-vect-stmts.c (vectorizable_internal_function): Add 
> support for
>     IFNs with different input types.
>     (vectorizable_call): Teach to handle IFN_FTRUNC_INT.
>     * doc/md.texi: New entry for ftrunc pattern name.
>     * doc/sourcebuild.texi (aarch64_frintzx_ok): New target.
> 
> gcc/testsuite/ChangeLog:
> 
>     * gcc.target/aarch64/merge_trunc1.c: Adapted to skip if frintNz
> instruction available.
>     * lib/target-supports.exp: Added arm_v8_5a_frintnzx_ok target.
>     * gcc.target/aarch64/frintnz.c: New test.
>     * gcc.target/aarch64/frintnz_vec.c: New test.
> 
> On 03/01/2022 12:18, Richard Biener wrote:
> > On Wed, 29 Dec 2021, Andre Vieira (lists) wrote:
> >
> >> Hi Richard,
> >>
> >> Thank you for the review, I've adopted all above suggestions downstream, I
> >> am
> >> still surprised how many style things I still miss after years of gcc
> >> development :(
> >>
> >> On 17/12/2021 12:44, Richard Sandiford wrote:
>  @@ -3252,16 +3257,31 @@ vectorizable_call (vec_info *vinfo,
>   rhs_type = unsigned_type_node;
> }
> -  int mask_opno = -1;
>  +  /* The argument that is not of the same type as the others.  */
>  +  int diff_opno = -1;
>  +  bool masked = false;
>   if (internal_fn_p (cfn))
>  -mask_opno = internal_fn_mask_index (as_internal_fn (cfn));
>  +{
>  +  if (cfn == CFN_FTRUNC_INT)
>  +/* For FTRUNC this represents the argument that carries the 
>  type of
>  the
>  +   intermediate signed integer.  */
>  +diff_opno = 1;
>  +  else
>  +{
>  +  /* For masked operations this represents the argument that 
>  carries
>  the
>  + mask.  */
>  +  diff_opno = internal_fn_mask_index (as_internal_fn (cfn));
>  +  masked = diff_opno >=  0;
>  +}
>  +}
> >>> I think it would be cleaner not to process argument 1 at all for
> >>> CFN_FTRUNC_INT.  There's no particular need to vectorise it.
> >> I agree with this,  will change the loop to continue for argument 1 when
> >> dealing with non-masked CFN's.
> >>
>   }
>  […]
>  diff --git a/gcc/tree.c b/gcc/tree.c
>  index
>  845228a055b2cfac0c9ca8c0cda1b9df4b0095c6..f1e9a1eb48769cb11aa69730e2480ed5522f78c1
>  100644
>  --- a/gcc/tree.c
>  +++ b/gcc/tree.c
>  @@ -6645,11 +6645,11 @@ valid_constant_size_p (const_tree size,
>  cst_size_error *perr /* = NULL */)
>   return true;
> }
> 
>  -/* Return the precision of the type, or for a complex or vector type the
>  -   precision of the type of its elements.  */
>  +/* Return the type, or for a complex or vector type the type of its
>  +   elements.  */
> -unsigned int
>  -element_precision (const_tree type)
>  +tree
>  +element_type (const_tree type)
> {
>   if (!TYPE_P (type))
> type = TREE_TYPE (type);
>  @@ -6657,7 +6657,16 @@ element_precision (const_tree type)
>   if (code == COMPLEX_TYPE || code == VECTOR_TYPE)
> type = TREE_TYPE (type);
> -  return TYPE_PRECISION (type);
>  +  return (tree) type;
> >>> I think we should stick a const_cast in element_precision and make
> >>> element_type take a plain “type”.  As it stands element_type is an
> >>> implicit const_cast for many cases.
> >>>
> >>> Thanks,
> >> Was just curious about something here, I thought the purpose of having
> >> element_precision (before) and element_type (now) take a const_tree as an
> >> argument was to make it clear we aren't changing the input type. I
> >> understand
> >> that as it stands element_type could be an implicit const_cast (which I
> >> should
> >> be using rather than the '(tree)' cast), but that's only if 'type' is a

Re: [PATCH] x86_64: Ignore zero width bitfields in ABI and issue -Wpsabi warning about C zero width bitfield ABI changes [PR102024]

2022-01-10 Thread Michael Matz via Gcc-patches
Hello,

On Mon, 20 Dec 2021, Uros Bizjak wrote:

> > Thanks.
> > I see nobody commented on Micha's post there.
> >
> > Here is a patch that implements it in GCC, i.e. C++ doesn't change ABI (at 
> > least
> > not from the past few releases) and C does for GCC:
> >
> > 2021-12-15  Jakub Jelinek  
> >
> > PR target/102024
> > * config/i386/i386.c (classify_argument): Add zero_width_bitfields
> > argument, when seeing DECL_FIELD_CXX_ZERO_WIDTH_BIT_FIELD bitfields,
> > always ignore them, when seeing other zero sized bitfields, either
> > set zero_width_bitfields to 1 and ignore it or if equal to 2 process
> > it.  Pass it to recursive calls.  Add wrapper
> > with old arguments and diagnose ABI differences for C structures
> > with zero width bitfields.  Formatting fixes.
> >
> > * gcc.target/i386/pr102024.c: New test.
> > * g++.target/i386/pr102024.C: New test.
> 
> Please get a signoff on the ABI change (perhaps HJ can help here),
> I'll approve the implementation after that.

Christmas got in the way, but I just merged the proposed change 
(zero-width bit-fields -> NO_CLASS) into the psABI.
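
For reference, the kind of construct the clarification is about
(hypothetical example, not taken from the testcases in the patch):

  struct S
  {
    float a;
    int : 0;   /* unnamed zero-width bit-field: now NO_CLASS, i.e. ignored
                  when classifying S for argument passing */
    float b;
  };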


Ciao,
Michael.

> 
> Uros.
> 
> >
> > --- gcc/config/i386/i386.c.jj   2021-12-10 17:00:06.024369219 +0100
> > +++ gcc/config/i386/i386.c  2021-12-15 15:04:49.245148023 +0100
> > @@ -2065,7 +2065,8 @@ merge_classes (enum x86_64_reg_class cla
> >
> >  static int
> >  classify_argument (machine_mode mode, const_tree type,
> > -  enum x86_64_reg_class classes[MAX_CLASSES], int 
> > bit_offset)
> > +  enum x86_64_reg_class classes[MAX_CLASSES], int 
> > bit_offset,
> > +  int &zero_width_bitfields)
> >  {
> >HOST_WIDE_INT bytes
> >  = mode == BLKmode ? int_size_in_bytes (type) : (int) GET_MODE_SIZE 
> > (mode);
> > @@ -2123,6 +2124,16 @@ classify_argument (machine_mode mode, co
> >  misaligned integers.  */
> >   if (DECL_BIT_FIELD (field))
> > {
> > + if (integer_zerop (DECL_SIZE (field)))
> > +   {
> > + if (DECL_FIELD_CXX_ZERO_WIDTH_BIT_FIELD (field))
> > +   continue;
> > + if (zero_width_bitfields != 2)
> > +   {
> > + zero_width_bitfields = 1;
> > + continue;
> > +   }
> > +   }
> >   for (i = (int_bit_position (field)
> > + (bit_offset % 64)) / 8 / 8;
> >i < ((int_bit_position (field) + (bit_offset % 
> > 64))
> > @@ -2160,7 +2171,8 @@ classify_argument (machine_mode mode, co
> >   num = classify_argument (TYPE_MODE (type), type,
> >subclasses,
> >(int_bit_position (field)
> > -   + bit_offset) % 512);
> > +   + bit_offset) % 512,
> > +  zero_width_bitfields);
> >   if (!num)
> > return 0;
> >   pos = (int_bit_position (field)
> > @@ -2178,7 +2190,8 @@ classify_argument (machine_mode mode, co
> >   {
> > int num;
> > num = classify_argument (TYPE_MODE (TREE_TYPE (type)),
> > -TREE_TYPE (type), subclasses, 
> > bit_offset);
> > +TREE_TYPE (type), subclasses, 
> > bit_offset,
> > +zero_width_bitfields);
> > if (!num)
> >   return 0;
> >
> > @@ -2211,7 +2224,7 @@ classify_argument (machine_mode mode, co
> >
> >   num = classify_argument (TYPE_MODE (TREE_TYPE (field)),
> >TREE_TYPE (field), subclasses,
> > -  bit_offset);
> > +  bit_offset, 
> > zero_width_bitfields);
> >   if (!num)
> > return 0;
> >   for (i = 0; i < num && i < words; i++)
> > @@ -2231,7 +2244,7 @@ classify_argument (machine_mode mode, co
> >  X86_64_SSEUP_CLASS, everything should be passed in
> >  memory.  */
> >   if (classes[0] != X86_64_SSE_CLASS)
> > - return 0;
> > +   return 0;
> >
> >   for (i = 1; i < words; i++)
> > if (classes[i] != X86_64_SSEUP_CLASS)
> > @@ -2257,8 +2270,8 @@ classify_argument (machine_mode mode, co
> >   classes[i] = X86_64_SSE_CLASS;
> > }
> >
> > - /*  If X86_64_X87UP_CLASS isn't preceded by X86_64_X87_CLASS,
> > -  everything should be passed in me

Re: [PATCH] vect: Add bias parameter for partial vectorization

2022-01-10 Thread Richard Sandiford via Gcc-patches
Robin Dapp  writes:
> Hi Richard,
>
>> I think it would be better to fold this into the existing documentation
>> a bit more:
> [..]
>
> done.
>
> Fixed the remaining nits in the attached v5.
>
> Bootstrap and regtest are good on s390x, Power9 and i386.
>
> Regards
>  Robin
>
> --
>
> gcc/ChangeLog:
>
> * config/rs6000/vsx.md: Use const0 bias predicate.
> * doc/md.texi: Document bias value.
> * internal-fn.c (expand_partial_load_optab_fn): Add bias.
> (expand_partial_store_optab_fn): Likewise.
> (internal_len_load_store_bias): New function.
> * internal-fn.h (VECT_PARTIAL_BIAS_UNSUPPORTED): New define.
> (internal_len_load_store_bias): New function.
> * tree-vect-loop-manip.c (vect_set_loop_controls_directly): Set
> bias.
> (vect_set_loop_condition_partial_vectors): Add header_seq parameter.
> * tree-vect-loop.c (vect_verify_loop_lens): Verify bias.
> (vect_estimate_min_profitable_iters): Account for bias.
> (vect_get_loop_len): Add bias-adjusted length.
> * tree-vect-stmts.c (vectorizable_store): Use.
> (vectorizable_load): Use.
> * tree-vectorizer.h (struct rgroup_controls): Add bias-adjusted
> length.
> (LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS): New macro.

OK, thanks!

Richard


Re: [PATCH] c++: Reject in constant evaluation address comparisons of start of one var and end of another [PR89074]

2022-01-10 Thread Richard Biener via Gcc-patches
On Thu, Jan 6, 2022 at 10:25 AM Jakub Jelinek via Gcc-patches
 wrote:
>
> Hi!
>
> The following testcase used to be incorrectly accepted.  The match.pd
> optimization that uses address_compare punts on folding comparison
> of start of one object and end of another one only when those addresses
> are cast to integral types; when the comparison is done on pointer types
> it assumes undefined behavior and decides to fold the comparison such
> that the addresses don't compare equal even when at runtime they
> could be equal.
> But C++ says it is undefined behavior and so during constant evaluation
> we should reject those, so this patch adds !folding_initializer &&
> check to that spot.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> Note, address_compare has some special cases, e.g. it assumes that
> static vars are never adjacent to automatic vars, which is the case
> for the usual layout where automatic vars are on the stack and after
> .rodata/.data sections there is heap:
>   /* Assume that automatic variables can't be adjacent to global
>  variables.  */
>   else if (is_global_var (base0) != is_global_var (base1))
> ;
> Is it ok that during constant evaluation we don't treat those as undefined
> behavior, or shall that be with !folding_initializer && too?
>
> Another special case is:
>   if ((DECL_P (base0) && TREE_CODE (base1) == STRING_CST)
>|| (TREE_CODE (base0) == STRING_CST && DECL_P (base1))
>|| (TREE_CODE (base0) == STRING_CST
>&& TREE_CODE (base1) == STRING_CST
>&& ioff0 >= 0 && ioff1 >= 0
>&& ioff0 < TREE_STRING_LENGTH (base0)
>&& ioff1 < TREE_STRING_LENGTH (base1)
>   /* This is a too conservative test that the STRING_CSTs
>  will not end up being string-merged.  */
>&& strncmp (TREE_STRING_POINTER (base0) + ioff0,
>TREE_STRING_POINTER (base1) + ioff1,
>MIN (TREE_STRING_LENGTH (base0) - ioff0,
> TREE_STRING_LENGTH (base1) - ioff1)) != 0))
> ;
>   else if (!DECL_P (base0) || !DECL_P (base1))
> return 2;
> Here we similarly assume that vars aren't adjacent to string literals
> or vice versa.  Do we need to stick !folding_initializer && to those
> DECL_P vs. STRING_CST cases?  Though, because of the return 2; for
> non-DECL_P that would mean rejecting comparisons like &var == &"foobar"[3]
> etc. which ought to be fine, no?  So perhaps we need to watch for
> DECLs vs. STRING_CSTs, like for DECLs, whether the address is at the start
> or at the end of the string literal or somewhere in between (at least
> for folding_initializer)?
> And yet another chapter but probably unsolvable is comparison of
> string literal addresses.  I think pedantically in C++
> &"foo"[0] == &"foo"[0] is undefined behavior; different occurrences of
> the same string literals might still not be merged in some implementations.
> But constexpr const char *s = "foo"; &s[0] == &s[0] should be well defined,
> and we aren't tracking anywhere whether the string literal was the same one
> or different (and I think other compilers don't track that either).

On my TODO list is to make &"foo" invalid and instead require &CONST_DECL
(and DECL_INITIAL of it then being "foo"); that would make it possible to
track the "original" string literal and perform string merging in a more
considerate way.

Richard.

>
> 2022-01-06  Jakub Jelinek  
>
> PR c++/89074
> * fold-const.c (address_compare): Punt on comparison of address of
> one object with address of end of another object if
> folding_initializer.
>
> * g++.dg/cpp1y/constexpr-89074-1.C: New test.
>
> --- gcc/fold-const.c.jj 2022-01-05 20:30:08.731806756 +0100
> +++ gcc/fold-const.c2022-01-05 20:34:52.277822349 +0100
> @@ -16627,7 +16627,7 @@ address_compare (tree_code code, tree ty
>/* If this is a pointer comparison, ignore for now even
>   valid equalities where one pointer is the offset zero
>   of one object and the other to one past end of another one.  */
> -  else if (!INTEGRAL_TYPE_P (type))
> +  else if (!folding_initializer && !INTEGRAL_TYPE_P (type))
>  ;
>/* Assume that automatic variables can't be adjacent to global
>   variables.  */
> --- gcc/testsuite/g++.dg/cpp1y/constexpr-89074-1.C.jj   2022-01-05 
> 20:43:03.696917484 +0100
> +++ gcc/testsuite/g++.dg/cpp1y/constexpr-89074-1.C  2022-01-05 
> 20:42:12.676634044 +0100
> @@ -0,0 +1,28 @@
> +// PR c++/89074
> +// { dg-do compile { target c++14 } }
> +
> +constexpr bool
> +foo ()
> +{
> +  int a[] = { 1, 2 };
> +  int b[] = { 3, 4 };
> +
> +  if (&a[0] == &b[0])
> +return false;
> +
> +  if (&a[1] == &b[0])
> +return false;
> +
> +  if (&a[1] == &b[1])
> +return false;
> +
> +  if (&a[2] == &b[1])
> +return false;
> +
> +  if (&a[2] == &b[0])  // { dg-error "is not a constant expression" }
> +return false;
> +
>

Re: [AArch64] Enable generation of FRINTNZ instructions

2022-01-10 Thread Andre Vieira (lists) via Gcc-patches

Yeah seems I forgot to send the latest version, my bad.

Bootstrapped on aarch64-none-linux.

OK for trunk?

gcc/ChangeLog:

    * config/aarch64/aarch64.md (ftrunc<mode><frintnz_mode>2): New 
pattern.

    * config/aarch64/iterators.md (FRINTNZ): New iterator.
    (frintnz_mode): New int attribute.
    (VSFDF): Make iterator conditional.
    * internal-fn.def (FTRUNC_INT): New IFN.
    * internal-fn.c (ftrunc_int_direct): New define.
    (expand_ftrunc_int_optab_fn): New custom expander.
    (direct_ftrunc_int_optab_supported_p): New supported_p.
    * match.pd: Add to the existing TRUNC pattern match.
    * optabs.def (ftrunc_int): New entry.
    * stor-layout.h (element_precision): Moved from here...
    * tree.h (element_precision): ... to here.
    (element_type): New declaration.
    * tree.c (element_type): New function.
    (element_precision): Changed to use element_type.
    * tree-vect-stmts.c (vectorizable_internal_function): Add 
support for

    IFNs with different input types.
    (vectorizable_call): Teach to handle IFN_FTRUNC_INT.
    * doc/md.texi: New entry for ftrunc pattern name.
    * doc/sourcebuild.texi (aarch64_frintzx_ok): New target.

gcc/testsuite/ChangeLog:

    * gcc.target/aarch64/merge_trunc1.c: Adapted to skip if frintNz 
instruction available.

    * lib/target-supports.exp: Added arm_v8_5a_frintnzx_ok target.
    * gcc.target/aarch64/frintnz.c: New test.
    * gcc.target/aarch64/frintnz_vec.c: New test.

On 03/01/2022 12:18, Richard Biener wrote:

On Wed, 29 Dec 2021, Andre Vieira (lists) wrote:


Hi Richard,

Thank you for the review, I've adopted all above suggestions downstream, I am
still surprised how many style things I still miss after years of gcc
development :(

On 17/12/2021 12:44, Richard Sandiford wrote:

@@ -3252,16 +3257,31 @@ vectorizable_call (vec_info *vinfo,
 rhs_type = unsigned_type_node;
   }
   -  int mask_opno = -1;
+  /* The argument that is not of the same type as the others.  */
+  int diff_opno = -1;
+  bool masked = false;
 if (internal_fn_p (cfn))
-mask_opno = internal_fn_mask_index (as_internal_fn (cfn));
+{
+  if (cfn == CFN_FTRUNC_INT)
+   /* For FTRUNC this represents the argument that carries the type of
the
+  intermediate signed integer.  */
+   diff_opno = 1;
+  else
+   {
+ /* For masked operations this represents the argument that carries
the
+mask.  */
+ diff_opno = internal_fn_mask_index (as_internal_fn (cfn));
+ masked = diff_opno >=  0;
+   }
+}

I think it would be cleaner not to process argument 1 at all for
CFN_FTRUNC_INT.  There's no particular need to vectorise it.

I agree with this,  will change the loop to continue for argument 1 when
dealing with non-masked CFN's.


}
[…]
diff --git a/gcc/tree.c b/gcc/tree.c
index
845228a055b2cfac0c9ca8c0cda1b9df4b0095c6..f1e9a1eb48769cb11aa69730e2480ed5522f78c1
100644
--- a/gcc/tree.c
+++ b/gcc/tree.c
@@ -6645,11 +6645,11 @@ valid_constant_size_p (const_tree size,
cst_size_error *perr /* = NULL */)
 return true;
   }
   
-/* Return the precision of the type, or for a complex or vector type the

-   precision of the type of its elements.  */
+/* Return the type, or for a complex or vector type the type of its
+   elements.  */
   -unsigned int
-element_precision (const_tree type)
+tree
+element_type (const_tree type)
   {
 if (!TYPE_P (type))
   type = TREE_TYPE (type);
@@ -6657,7 +6657,16 @@ element_precision (const_tree type)
 if (code == COMPLEX_TYPE || code == VECTOR_TYPE)
   type = TREE_TYPE (type);
   -  return TYPE_PRECISION (type);
+  return (tree) type;

I think we should stick a const_cast in element_precision and make
element_type take a plain “type”.  As it stands element_type is an
implicit const_cast for many cases.

Thanks,

Was just curious about something here, I thought the purpose of having
element_precision (before) and element_type (now) take a const_tree as an
argument was to make it clear we aren't changing the input type. I understand
that as it stands element_type could be an implicit const_cast (which I should
be using rather than the '(tree)' cast), but that's only if 'type' is a type
that isn't complex/vector, either way, we are conforming to the promise that
we aren't changing the incoming type, what the caller then does with the
result is up to them no?

I don't mind making the changes, just trying to understand the reasoning
behind it.

I'll send in a new patch with all the changes after the review on the match.pd
stuff.

I'm missing an updated patch after my initial review of the match.pd
stuff so I'm not sure what to review.  Can you re-post an updated patch?

Thanks,
Richard.

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 
3c72bdad01bfab49ee4ae6fb7b139202e89c8d34..9d04a2e088cd7d03963b58ed3708c339b841801c
 100644
--- a/gcc/config

Re: [PATCH 6/6] ira: Handle "soft" conflicts between cap and non-cap allocnos

2022-01-10 Thread Vladimir Makarov via Gcc-patches



On 2022-01-06 09:48, Richard Sandiford wrote:

This patch looks for allocno conflicts of the following form:

- One allocno (X) is a cap allocno for some non-cap allocno X2.
- X2 belongs to some loop L2.
- The other allocno (Y) is a non-cap allocno.
- Y is an ancestor of some allocno Y2 in L2.
- Y2 is not referenced in L2 (that is, ALLOCNO_NREFS (Y2) == 0).
- Y can use a different allocation from Y2.

In this case, Y's register is live across L2 but is not used within it,
whereas X's register is used only within L2.  The conflict is therefore
only "soft", in that it can easily be avoided by spilling Y2 inside L2
without affecting any insn references.

In principle we could do this for ALLOCNO_NREFS (Y2) != 0 too, with the
callers then taking Y2's ALLOCNO_MEMORY_COST into account.  There would
then be no "cliff edge" between a Y2 that has no references and a Y2 that
has (say) a single cold reference.

However, doing that isn't necessary for the PR and seems to give
variable results in practice.  (fotonik3d_r improves slightly but
namd_r regresses slightly.)  It therefore seemed better to start
with the higher-value zero-reference case and see how things go.

On top of the previous patches in the series, this fixes the exchange2
regression seen in GCC 11.

gcc/
PR rtl-optimization/98782
* ira-int.h (ira_soft_conflict): Declare.
* ira-costs.c (max_soft_conflict_loop_depth): New constant.
(ira_soft_conflict): New function.
(spill_soft_conflicts): Likewise.
(assign_hard_reg): Use them to handle the case described by
the comment above ira_soft_conflict.
(improve_allocation): Likewise.
* ira.c (check_allocation): Allow allocnos with "soft" conflicts
to share the same register.

gcc/testsuite/
* gcc.target/aarch64/reg-alloc-4.c: New test.


OK.  If something goes wrong with the patches (e.g. a lot of GCC 
testsuite failures or performance degradation), we can revert only the 
last 3 of them, as they are the ones actually changing the heuristics.  But I hope it 
will be not necessary.


Thank you again for working on the PR.  Fixing it required a big effort 
in thinking, testing and benchmarking.





Re: [PATCH 5/6] ira: Consider modelling caller-save allocations as loop spills

2022-01-10 Thread Vladimir Makarov via Gcc-patches



On 2022-01-06 09:48, Richard Sandiford wrote:

If an allocno A in an inner loop L spans a call, a parent allocno AP
can choose to handle a call-clobbered/caller-saved hard register R
in one of two ways:

(1) save R before each call in L and restore R after each call
(2) spill R to memory throughout L

(2) can be cheaper than (1) in some cases, particularly if L does
not reference A.

Before the patch we always did (1).  The patch adds support for
picking (2) instead, when it seems cheaper.  It builds on the
earlier support for not propagating conflicts to parent allocnos.
Another cost calculation improvement for calls would be taking into 
account that an allocno can be saved and restored once for several 
subsequent calls (e.g. in one BB).

gcc/
PR rtl-optimization/98782
* ira-int.h (ira_caller_save_cost): New function.
(ira_caller_save_loop_spill_p): Likewise.
* ira-build.c (ira_propagate_hard_reg_costs): Test whether it is
cheaper to spill a call-clobbered register throughout a loop rather
than spill it around each individual call.  If so, treat all
call-clobbered registers as conflicts and...
(propagate_allocno_info): ...do not propagate call information
from the child to the parent.
* ira-color.c (move_spill_restore): Update accordingly.
* ira-costs.c (ira_tune_allocno_costs): Use ira_caller_save_cost.

gcc/testsuite/
* gcc.target/aarch64/reg-alloc-3.c: New test.

OK for me.  Thank you for the patch.



Re: [PATCH 4/6] ira: Try to avoid propagating conflicts

2022-01-10 Thread Vladimir Makarov via Gcc-patches



On 2022-01-06 09:47, Richard Sandiford wrote:

Suppose that:

- an inner loop L contains an allocno A
- L clobbers hard register R while A is live
- A's parent allocno is AP

Previously, propagate_allocno_info would propagate conflict sets up the
loop tree, so that the conflict between A and R would become a conflict
between AP and R (and so on for ancestors of AP).
My thoughts for propagating hard register conflicts were to avoid 
changing allocations on the region border as much as possible.  The 
solution you are proposing might result in allocating R to the allocno 
and creating moves/loads/stores on the region border when it would be 
possible to assign R to another allocno and another hard reg to the 
allocno in consideration.  As it is all about heuristics, it is hard to 
say, just by speculating, what the probability of such a situation is 
and which heuristic is better.  Only checking credible benchmarks is a 
criterion for choosing heuristics.  It seems yours work better.  Thank you 
for putting deep thought into improving the existing heuristics in this 
and the following patches, Richard.

However, when IRA treats loops as separate allocation regions, it can
decide on a loop-by-loop basis whether to allocate a register or spill
to memory.  Conflicts in inner loops therefore don't need to become
hard conflicts in parent loops.  Instead we can record that using the
“conflicting” registers for the parent allocnos has a higher cost.
In the example above, this higher cost is the sum of:

- the cost of saving R on entry to L
- the cost of keeping the pseudo register in memory throughout L
- the cost of reloading R on exit from L

This value is also a cap on the hard register cost that A can contribute
to AP in general (not just for conflicts).  Whatever allocation we pick
for AP, there is always the option of spilling that register to memory
throughout L, so the cost to A of allocating a register to AP can't be
more than the cost of spilling A.

To take an extreme example: if allocating a register R2 to A is more
expensive than spilling A to memory, ALLOCNO_HARD_REG_COSTS (A)[R2]
could be (say) 2 times greater than ALLOCNO_MEMORY_COST (A) or 100
times greater than ALLOCNO_MEMORY_COST (A).  But this scale factor
doesn't matter to AP.  All that matters is that R2 is more expensive
than memory for A, so that allocating R2 to AP should be costed as
spilling A to memory (again assuming that A and AP are in different
allocation regions).  Propagating a factor of 100 would distort the
register costs for AP.

move_spill_restore tries to undo the propagation done by
propagate_allocno_info, so we need some extra processing there.

gcc/
PR rtl-optimization/98782
* ira-int.h (ira_allocno::might_conflict_with_parent_p): New field.
(ALLOCNO_MIGHT_CONFLICT_WITH_PARENT_P): New macro.
(ira_single_region_allocno_p): New function.
(ira_total_conflict_hard_regs): Likewise.
* ira-build.c (ira_create_allocno): Initialize
ALLOCNO_MIGHT_CONFLICT_WITH_PARENT_P.
(ira_propagate_hard_reg_costs): New function.
(propagate_allocno_info): Use it.  Try to avoid propagating
hard register conflicts to parent allocnos if we can handle
the conflicts by spilling instead.  Limit the propagated
register costs to the cost of spilling throughout the child loop.
* ira-color.c (color_pass): Use ira_single_region_allocno_p to
test whether a child and parent allocno can share the same
register.
(move_spill_restore): Adjust for the new behavior of
propagate_allocno_info.

gcc/testsuite/
* gcc.target/aarch64/reg-alloc-2.c: New test.

Thank you for the patch.  It is ok for me.



Re: [PATCH] libstdc++, v2: Add %j, %U, %w, %W time_get support, fix %y, %Y, %C, %p [PR77760]

2022-01-10 Thread Jonathan Wakely via Gcc-patches
On Fri, 17 Dec 2021 at 07:19, Jakub Jelinek via Libstdc++ <
libstd...@gcc.gnu.org> wrote:

> On Thu, Dec 16, 2021 at 03:46:47PM +0100, Jakub Jelinek via Gcc-patches
> wrote:
> > glibc strptime passes around some state, what fields in struct tm have
> been
> > set and what needs to be finalized through possibly recursive calls, and
> > at the end performs various finalizations, like applying %p so that it
> > works for both %I %p and %p %I orders, or applying century so that both
> > %C %y and %y %C work, or computation of missing fields from others
> > (e.g. from %Y and %j one can compute tm_mon, tm_mday and tm_wday,
> > from %Y %U %w, %Y %W %w, %Y %U %a, or %Y %W %w one can compute
> > tm_mon, tm_mday, tm_yday or e.g. from %Y %m %d one can compute tm_wday
> > and tm_yday).
> ...
>
> Here is an updated patch with _M_ prefixes on members instead of __
> and no uglification of parameters and automatic vars in *.cc.  No changes
> otherwise, bootstrapped/regtested on x86_64-linux and i686-linux
> successfully.
>

OK for trunk, thanks.
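
The finalization Jakub describes above (deriving tm_wday, tm_yday and
friends from whichever fields were actually parsed) relies on small date
helpers; the patch adds is_leap, day_of_the_week and day_of_the_year in an
anonymous namespace.  A standalone sketch of such helpers, purely for
illustration and not the libstdc++ implementations:

/* Illustrative only.  m is 1-12, d is 1-31; weekday 0 is Sunday.  */
static int
is_leap (int y)
{
  return (y % 4 == 0 && y % 100 != 0) || y % 400 == 0;
}

static int
day_of_the_week (int y, int m, int d)  /* Sakamoto's method.  */
{
  static const int t[] = { 0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4 };
  y -= m < 3;
  return (y + y / 4 - y / 100 + y / 400 + t[m - 1] + d) % 7;
}

static int
day_of_the_year (int y, int m, int d)  /* 0-based, as in tm_yday.  */
{
  static const int mon_yday[] = { 0, 31, 59, 90, 120, 151, 181, 212,
                                  243, 273, 304, 334 };
  return mon_yday[m - 1] + (m > 2 && is_leap (y)) + d - 1;
}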



>
> 2021-12-17  Jakub Jelinek  
>
> PR libstdc++/77760
> * include/bits/locale_facets_nonio.h (__time_get_state): New
> struct.
> (time_get::_M_extract_via_format): Declare new method with
> __time_get_state& as an extra argument.
> * include/bits/locale_facets_nonio.tcc (_M_extract_via_format): Add
> __state argument, set various fields in it while parsing.  Handle
> %j,
> %U, %w and %W, fix up handling of %y, %Y and %C, don't adjust
> tm_hour
> for %p immediately.  Add a wrapper around the method without the
> __state argument for backwards compatibility.
> (_M_extract_num): Remove all __len == 4 special cases.
> (time_get::do_get_time, time_get::do_get_date, time_get::do_get):
> Zero
> initialize __state, pass it to _M_extract_via_format and finalize
> it
> at the end.
> (do_get_year): For 1-2 digit parsed years, map 0-68 to 2000-2068,
> 69-99 to 1969-1999.  For 3-4 digit parsed years use that as year.
> (get): If do_get isn't overloaded from the locale_facets_nonio.tcc
> version, don't call do_get but call _M_extract_via_format instead
> to
> pass around state.
> * config/abi/pre/gnu.ver (GLIBCXX_3.4.30): Export
> _M_extract_via_format
> with extra __time_get_state and
> __time_get_state::_M_finalize_state.
> * src/c++98/locale_facets.cc (is_leap, day_of_the_week,
> day_of_the_year): New functions in anon namespace.
> (mon_yday): New var in anon namespace.
> (__time_get_state::_M_finalize_state): Define.
> * testsuite/22_locale/time_get/get/char/4.cc: New test.
> * testsuite/22_locale/time_get/get/wchar_t/4.cc: New test.
> * testsuite/22_locale/time_get/get_year/char/1.cc (test01): Parse
> 197
> as year 197AD instead of error.
> * testsuite/22_locale/time_get/get_year/char/5.cc (test01): Parse
> 1 as
> year 2001 instead of error.
> * testsuite/22_locale/time_get/get_year/char/6.cc: New test.
> * testsuite/22_locale/time_get/get_year/wchar_t/1.cc (test01):
> Parse
> 197 as year 197AD instead of error.
> * testsuite/22_locale/time_get/get_year/wchar_t/5.cc (test01):
> Parse
> 1 as year 2001 instead of error.
> * testsuite/22_locale/time_get/get_year/wchar_t/6.cc: New test.
>
> --- libstdc++-v3/include/bits/locale_facets_nonio.h.jj  2021-12-10
> 22:17:54.541591942 +0100
> +++ libstdc++-v3/include/bits/locale_facets_nonio.h 2021-12-17
> 00:21:33.125297992 +0100
> @@ -355,6 +355,30 @@ namespace std _GLIBCXX_VISIBILITY(defaul
>  {
>  _GLIBCXX_BEGIN_NAMESPACE_VERSION
>
> +  struct __time_get_state
> +  {
> +// Finalize state.
> +void
> +_M_finalize_state(tm* __tm);
> +
> +unsigned int _M_have_I : 1;
> +unsigned int _M_have_wday : 1;
> +unsigned int _M_have_yday : 1;
> +unsigned int _M_have_mon : 1;
> +unsigned int _M_have_mday : 1;
> +unsigned int _M_have_uweek : 1;
> +unsigned int _M_have_wweek : 1;
> +unsigned int _M_have_century : 1;
> +unsigned int _M_is_pm : 1;
> +unsigned int _M_want_century : 1;
> +unsigned int _M_want_xday : 1;
> +unsigned int _M_pad1 : 5;
> +unsigned int _M_week_no : 6;
> +unsigned int _M_pad2 : 10;
> +int _M_century;
> +int _M_pad3;
> +  };
> +
>  _GLIBCXX_BEGIN_NAMESPACE_CXX11
>
>/**
> @@ -756,6 +780,14 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
>_M_extract_via_format(iter_type __beg, iter_type __end, ios_base&
> __io,
> ios_base::iostate& __err, tm* __tm,
> const _CharT* __format) const;
> +
> +  // Extract on a component-by-component basis, via __format
> argument, with
> +  // state.
> +  iter_type
> +  _M_extract_via_format(iter_type __beg, iter_type __end, ios_base&
> __io,
> +   ios_base:

Re: [PATCH] C++ P0482R6 char8_t: declare std::c8rtomb and std::mbrtoc8 if provided by the C library

2022-01-10 Thread Jonathan Wakely via Gcc-patches
On Sat, 8 Jan 2022 at 00:42, Tom Honermann via Libstdc++ <
libstd...@gcc.gnu.org> wrote:

> This patch completes implementation of the C++20 proposal P0482R6 [1] by
> adding declarations of std::c8rtomb() and std::mbrtoc8() in <cuchar> if
> provided by the C library in <uchar.h>.
>
> This patch addresses feedback provided in response to a previous patch
> submission [2].
>
> Autoconf changes determine if the C library declares c8rtomb and mbrtoc8
> at global scope when uchar.h is included and compiled with either
> -fchar8_t or -std=c++20. New _GLIBCXX_USE_UCHAR_C8RTOMB_MBRTOC8_FCHAR8_T
> and _GLIBCXX_USE_UCHAR_C8RTOMB_MBRTOC8_CXX20 configuration macros
> reflect the probe results. The <cuchar> header declares these functions
> in the std namespace only if available and the _GLIBCXX_USE_CHAR8_T
> configuration macro is defined (by default it is defined if the C++20
> __cpp_char8_t feature test macro is defined).
>
> Patches to glibc to implement c8rtomb and mbrtoc8 have been submitted [3].
>
> New tests validate the presence of these declarations. The tests pass
> trivially if the C library does not provide these functions. Otherwise
> they ensure that the functions are declared when <cuchar> is included
> and either -fchar8_t or -std=c++20 is enabled.
>
> Tested on Linux x86_64.
>
> libstdc++-v3/ChangeLog:
>
> 2022-01-07  Tom Honermann  
>
> * acinclude.m4 Define config macros if uchar.h provides
> c8rtomb() and mbrtoc8().
> * config.h.in: Re-generate.
> * configure: Re-generate.
> * include/c_compatibility/uchar.h: Declare ::c8rtomb and
> ::mbrtoc8.
> * include/c_global/cuchar: Declare std::c8rtomb and
> std::mbrtoc8.
> * include/c_std/cuchar: Declare std::c8rtomb and std::mbrtoc8.
> * testsuite/21_strings/headers/cuchar/functions_std_cxx20.cc:
> New test.
> * testsuite/21_strings/headers/cuchar/functions_std_fchar8_t.cc:
> New test.
>


Thanks, Tom, this looks good and I'll get it committed for GCC 12.

My only concern is that the new tests depend on an internal macro:

+#if _GLIBCXX_USE_UCHAR_C8RTOMB_MBRTOC8_CXX20
+  using std::mbrtoc8;
+  using std::c8rtomb;

I prefer if tests are written as "user code" when possible, and not using
our internal macros. That isn't always possible, and in this case it would
require adding a new effective-target keyword to testsuite/lib/libstdc++.exp
just for use in these two tests. I don't think we should bother with that.

I suppose strictly speaking we should not define __cpp_lib_char8_t unless
these two functions are present in libc. But I'm not sure we want to change
that now either.


Re: [PATCH take #3] Recognize MULT_HIGHPART_EXPR in tree-ssa-math-opts pass.

2022-01-10 Thread Richard Biener via Gcc-patches
On Thu, Jan 6, 2022 at 11:39 PM Roger Sayle  wrote:
>
>
> This is the third iteration of a patch to perceive MULT_HIGHPART_EXPR
> in the middle-end.  As they say "the third time's a charm".  The first
> version implemented this in match.pd, which was considered too early.
> https://gcc.gnu.org/pipermail/gcc-patches/2020-August/551316.html
> The second version attempted to do this during RTL expansion, and was
> considered to be too late in the middle-end.
> https://gcc.gnu.org/pipermail/gcc-patches/2021-August/576922.html
> https://gcc.gnu.org/pipermail/gcc-patches/2021-August/576923.html
>
> This latest version incorporates Richard Biener's feedback/suggestion
> to perceive MULT_HIGHPART_EXPR in one of the "instruction selection
> passes", specifically tree-ssa-math-opts, where the recognition of
> highpart multiplications takes place in the same pass as widening
> multiplications.
>
> With each rewrite, the patch is also getting more aggressive in the
> set of widening multiplications that it recognizes as highpart multiplies.
> Currently any widening multiplication followed by a right shift (either
> signed or unsigned) by a bit count sufficient to eliminate the lowpart
> is recognized.  The result of this shift doesn't need to be truncated.
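
For reference, the source-level shape being recognized is something like the
following (a hand-written sketch, not taken from the patch or its testcase):

#include <stdint.h>

/* A widening multiply whose low half is discarded by the shift; the pass can
   select a MULT_HIGHPART_EXPR for this when the target provides an optab.  */
int32_t
smulh (int32_t a, int32_t b)
{
  return ((int64_t) a * b) >> 32;
}

uint32_t
umulh (uint32_t a, uint32_t b)
{
  return ((uint64_t) a * b) >> 32;
}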
> As previously, this patch confirms the target provides a suitable
> optab before introducing the MULT_HIGHPART_EXPR.  This is the reason
> the testcase is restricted to x86_64, as this pass doesn't do anything
> on some platforms, but x86_64 should be sufficient to confirm that the
> pass is working/continues to work.
>
> This patch has been tested on x86_64-pc-linux-gnu with a make bootstrap
> and make -k check (both with and without --target_board='unix{-m32}')
> with no new regressions.  Ok for mainline?

Few nits:

+static bool
+convert_mult_to_highpart (gimple *stmt, gimple_stmt_iterator *gsi)
+{
+  tree lhs = gimple_assign_lhs (stmt);

since you assume 'stmt' is a GIMPLE assignment please statically
type it as 'gassign *'.

+  gimple *def = SSA_NAME_DEF_STMT (sarg0);
+  if (!is_gimple_assign (def))
+return false;

could be written as

gassign *def = dyn_cast  (SSA_NAME_DEF_STMT (sarg0));
if (!def)
  return false;

as well to make the followup code cheaper.

+  tree mop1 = gimple_assign_rhs1 (def);
+  tree mop2 = gimple_assign_rhs2 (def);
+  tree optype = TREE_TYPE (mop1);
+  bool unsignedp = TYPE_UNSIGNED (optype);
+  unsigned int prec = TYPE_PRECISION (optype);
+
+  if (optype != TREE_TYPE (mop2)

I think mop1 and mop2 have to be compatible types (the tree-cfg.c
GIMPLE verification only tests for same precision it seems but tree.def
says they are of type T1).  That said, I think optype != TREE_TYPE (mop2)
is superfluous and too conservative at it.  Bonus points for improving the
WIDEN_MULT_EXPR IL verification.  I hope the signs of the result and
the operands are the same though neither does tree.def specify that nor
tree-cfg.c verify it.

+  /* The above transformation often creates useless type conversions, i.e.
+ when followed by a truncation, that aren't cleaned up elsewhere.  */
+  gimple *use_stmt;
+  imm_use_iterator use_iter;
+  FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, lhs)
+if (gimple_assign_cast_p (use_stmt))
+  {

I don't like this much but I see how this looks like a useful thing to do.
Since we're close to RTL expansion it might not matter much or does
this have an effect on final code generation?  Implementation wise
I'd rather have sth like the simple_dce_from_worklist helper - have
a helper that folds use stmts of SSA names whose definition stmt
we changed in a worklist manner.  Ideally this would do what
forwprop does on a stmt, and ideally ordered in a way that if
we fold two of a stmt's uses' defs we only fold that stmt once after
we've folded both use defs.

I'm not sure iterating with FOR_EACH_IMM_USE_STMT and at
the same time fiddling with immediate uses of that def is OK,
IIRC it can easily break the iteration scheme which applies sorting
and a marker.  So to fix that you need some kind of worklist
anyhow which means you could do a more simplistic
fold_stmt () on that.
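
A rough shape of that worklist approach (an untested sketch using the obvious
helpers, not a drop-in replacement for the patch hunk):

/* Collect the interesting use statements first, then fold them, so the
   immediate-use list is never modified while FOR_EACH_IMM_USE_STMT walks it.  */
auto_vec<gimple *> worklist;
imm_use_iterator use_iter;
gimple *use_stmt;
FOR_EACH_IMM_USE_STMT (use_stmt, use_iter, lhs)
  if (gimple_assign_cast_p (use_stmt))
    worklist.safe_push (use_stmt);

unsigned i;
gimple *s;
FOR_EACH_VEC_ELT (worklist, i, s)
  {
    gimple_stmt_iterator gsi2 = gsi_for_stmt (s);
    fold_stmt (&gsi2);
  }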

If the change works good enough even w/o the use folding the
patch looks good to me with that stripped.

Thanks,
Richard.

>
> 2022-01-06  Roger Sayle  
>
> gcc/ChangeLog
> * tree-ssa-math-opts.c (struct widen_mul_stats): Add a
> highpart_mults_inserted field.
> (convert_mult_to_highpart): New function to convert right shift
> of a widening multiply into a MULT_HIGHPART_EXPR.
> (math_opts_dom_walker::after_dom_children) [RSHIFT_EXPR]:
> Call new convert_mult_to_highpart function.
> (pass_optimize_widening_mul::execute): Add a statistics counter
> for tracking "highpart multiplications inserted" events.
>
> gcc/testsuite/ChangeLog
> * gcc.target/i386/mult-highpart.c: New test case.
>
>
> Thanks in advance,
> Roger
> --
>


Re: [PATCH v2] match.pd: Simplify 1 / X for integer X [PR95424]

2022-01-10 Thread Richard Biener via Gcc-patches
On Thu, Jan 6, 2022 at 11:36 AM Zhao Wei Liew  wrote:
>
> This patch implements an optimization for the following C++ code:
>
> int f(int x) {
> return 1 / x;
> }
>
> int f(unsigned int x) {
> return 1 / x;
> }
>
> Before this patch, x86-64 gcc -std=c++20 -O3 produces the following assembly:
>
> f(int):
> xor edx, edx
> mov eax, 1
> idiv edi
> ret
> f(unsigned int):
> xor edx, edx
> mov eax, 1
> div edi
> ret
>
> In comparison, clang++ -std=c++20 -O3 produces the following assembly:
>
> f(int):
> lea ecx, [rdi + 1]
> xor eax, eax
> cmp ecx, 3
> cmovb eax, edi
> ret
> f(unsigned int):
> xor eax, eax
> cmp edi, 1
> sete al
> ret
>
> Clang's output is more efficient as it avoids expensive div operations.
>
> With this patch, GCC now produces the following assembly:
>
> f(int):
> lea eax, [rdi + 1]
> cmp eax, 2
> mov eax, 0
> cmovbe eax, edi
> ret
> f(unsigned int):
> xor eax, eax
> cmp edi, 1
> sete al
> ret
>
> which is virtually identical to Clang's assembly output. Any slight
> differences in the output for f(int) are possibly related to a different
> missed optimization.
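
As a scalar illustration of the two folds (a hand-written sketch, not the
match.pd code itself), the transformation amounts to:

/* 1u / x for unsigned x: the quotient is non-zero only when x == 1.  */
unsigned int
udiv_one (unsigned int x)
{
  return x == 1;
}

/* 1 / x for signed x: x when x is -1, 0 or 1, otherwise 0.  The unsigned
   addition avoids overflow and mirrors the (x + 1) <= 2 range check that
   the pattern emits.  */
int
sdiv_one (int x)
{
  return (unsigned int) x + 1u <= 2u ? x : 0;
}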
>
> v1: https://gcc.gnu.org/pipermail/gcc-patches/2022-January/587634.html
> Changes from v1:
> 1. Refactor common if conditions.
> 2. Use build_[minus_]one_cst (type) to get -1/1 of the correct type.
> 3. Match only for TRUNC_DIV_EXPR and TYPE_PRECISION (type) > 1.
>
> gcc/ChangeLog:
>
> * match.pd: Simplify 1 / X where X is an integer.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/tree-ssa/divide-6.c: New test.
> * gcc.dg/tree-ssa/divide-7.c: New test.
> ---
>  gcc/match.pd | 15 +++
>  gcc/testsuite/gcc.dg/tree-ssa/divide-6.c |  9 +
>  gcc/testsuite/gcc.dg/tree-ssa/divide-7.c |  9 +
>  3 files changed, 33 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/divide-6.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/divide-7.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 84c9b918041..52a0f77f455 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -432,6 +432,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>&& TYPE_UNSIGNED (type))
>(trunc_div @0 @1)))
>
> + /* 1 / X -> X == 1 for unsigned integer X.
> +1 / X -> X >= -1 && X <= 1 ? X : 0 for signed integer X.
> +But not for 1 / 0 so that we can get proper warnings and errors,
> +and not for 1-bit integers as they are edge cases better handled 
> elsewhere. */
> +(simplify
> +  (trunc_div integer_onep@0 @1)
> +  (if (INTEGRAL_TYPE_P (type) && !integer_zerop (@1) && TYPE_PRECISION 
> (type) > 1)
> +(switch
> +  (if (TYPE_UNSIGNED (type))
> +(eq @1 { build_one_cst (type); }))
> +  (if (!TYPE_UNSIGNED (type))

   (if (TYPE_UNSIGNED (type))
(... A ...)
(... B ...))

works like if (x) A else B, that's shorter and faster than the switch variant.

OK with that change.

Thanks,
Richard.

> +(with { tree utype = unsigned_type_for (type); }
> +  (cond (le (plus (convert:utype @1) { build_one_cst (utype); }) { 
> build_int_cst (utype, 2); })
> +@1 { build_zero_cst (type); }))
> +
>  /* Combine two successive divisions.  Note that combining ceil_div
> and floor_div is trickier and combining round_div even more so.  */
>  (for div (trunc_div exact_div)
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/divide-6.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/divide-6.c
> new file mode 100644
> index 000..a9fc4c04058
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/divide-6.c
> @@ -0,0 +1,9 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O -fdump-tree-optimized" } */
> +
> +unsigned int f(unsigned int x) {
> +  return 1 / x;
> +}
> +
> +/* { dg-final { scan-tree-dump-not "1 / x_..D.;" "optimized" } } */
> +/* { dg-final { scan-tree-dump "x_..D. == 1;" "optimized" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/divide-7.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/divide-7.c
> new file mode 100644
> index 000..285279af7c2
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/divide-7.c
> @@ -0,0 +1,9 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O -fdump-tree-optimized" } */
> +
> +int f(int x) {
> +  return 1 / x;
> +}
> +
> +/* { dg-final { scan-tree-dump-not "1 / x_..D.;" "optimized" } } */
> +/* { dg-final { scan-tree-dump ".. <= 2 ? x_..D. : 0;" "optimized" } } */
> --
> 2.17.1
>


Re: [1/3 PATCH]middle-end vect: Simplify and extend the complex numbers validation routines.

2022-01-10 Thread Richard Biener via Gcc-patches
On Fri, Dec 17, 2021 at 4:44 PM Tamar Christina via Gcc-patches
 wrote:
>
> Hi All,
>
> This patch boosts the analysis for complex mul,fma and fms in order to ensure
> that it doesn't create an incorrect output.
>
> Essentially it adds an extra verification to check that the two nodes it's 
> going
> to combine do the same operations on compatible values.  The reason it needs 
> to
> do this is that if one computation differs from the other then with the 
> current
> implementation we have no way to deal with it since we have to remove the
> permute.
>
> When we can keep the permute around we can probably handle these by unrolling.
>
> While implementing this since I have to do the traversal anyway I took 
> advantage
> of it by simplifying the code a bit.  Previously we would determine whether
> something is a conjugate and then try to figure out which conjugate it is and
> then try to see if the permutes match what we expect.
>
> Now the code that does the traversal will detect this in one go and return to 
> us
> whether the operation is something that can be combined and whether a 
> conjugate
> is present.
>
> Secondly because it does this I can now simplify the checking code itself to
> essentially just try to apply fixed patterns to each operation.
>
> The patterns represent the order operations should appear in. For instance a
> complex MUL operation combines :
>
>   Left 1 + Right 1
>   Left 2 + Right 2
>
> with a permute on the nodes consisting of:
>
>   { Even, Even } + { Odd, Odd  }
>   { Even, Odd  } + { Odd, Even }
>
> By abstracting over these patterns the checking code becomes quite simple.
>
> As part of this I was checking the order of the operands which was left in
> "slp" order. as in, the same order they showed up in during SLP, which means
> that the accumulator is first.  However it looks like I didn't document this
> and the x86 optab was implemented assuming the same order as FMA, i.e. that
> the accumulator is last.
>
> I have this changed the order to match that of FMA and FMS which corrects the
> x86 codegen and will update the Arm targets.  This has now also been
> documented.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu,
> x86_64-pc-linux-gnu and no regressions.
>
> Ok for master? and backport to GCC 11 after some stew?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
> PR tree-optimization/102819
> PR tree-optimization/103169
> * doc/md.texi: Update docs for cfms, cfma.
> * tree-data-ref.h (same_data_refs): Accept optional offset.
> * tree-vect-slp-patterns.c (is_linear_load_p): Fix issue with 
> repeating
> patterns.
> (vect_normalize_conj_loc): Remove.
> (is_eq_or_top): Change to take two nodes.
> (enum _conj_status, compatible_complex_nodes_p,
> vect_validate_multiplication): New.
> (class complex_add_pattern, complex_add_pattern::matches,
> complex_add_pattern::recognize, class complex_mul_pattern,
> complex_mul_pattern::recognize, class complex_fms_pattern,
> complex_fms_pattern::recognize, class complex_operations_pattern,
> complex_operations_pattern::recognize, addsub_pattern::recognize): 
> Pass
> new cache.
> (complex_fms_pattern::matches, complex_mul_pattern::matches): Pass new
> cache and use new validation code.
> * tree-vect-slp.c (vect_match_slp_patterns_2, vect_match_slp_patterns,
> vect_analyze_slp): Pass along cache.
> (compatible_calls_p): Expose.
> * tree-vectorizer.h (compatible_calls_p, slp_node_hash,
> slp_compat_nodes_map_t): New.
> (class vect_pattern): Update signatures include new cache.
>
> gcc/testsuite/ChangeLog:
>
> PR tree-optimization/102819
> PR tree-optimization/103169
> * g++.dg/vect/pr99149.cc: xfail for now.
> * gcc.dg/vect/complex/pr102819-1.c: New test.
> * gcc.dg/vect/complex/pr102819-2.c: New test.
> * gcc.dg/vect/complex/pr102819-3.c: New test.
> * gcc.dg/vect/complex/pr102819-4.c: New test.
> * gcc.dg/vect/complex/pr102819-5.c: New test.
> * gcc.dg/vect/complex/pr102819-6.c: New test.
> * gcc.dg/vect/complex/pr102819-7.c: New test.
> * gcc.dg/vect/complex/pr102819-8.c: New test.
> * gcc.dg/vect/complex/pr102819-9.c: New test.
> * gcc.dg/vect/complex/pr103169.c: New test.
>
> --- inline copy of patch --
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 
> 9ec051e94e10cca9eec2773e1b8c01b74b6ea4db..60dc5b3ea6087c2824ad1467bc66e9cfebe9dcfc
>  100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -6325,12 +6325,12 @@ Perform a vector multiply and accumulate that is 
> semantically the same as
>  a multiply and accumulate of complex numbers.
>
>  @smallexample
> -  complex TYPE c[N];
> -  complex TYPE a[N];
> -  complex TYPE b[N];
> +  complex TYPE op0[N];
> +  complex TYPE op1[N];
> +  complex TYPE op2[N];
>for (int i = 0; i < N; 

Re: [PATCH] vect: Add bias parameter for partial vectorization

2022-01-10 Thread Robin Dapp via Gcc-patches
Hi Richard,

> I think it would be better to fold this into the existing documentation
> a bit more:
[..]

done.

Fixed the remaining nits in the attached v5.

Bootstrap and regtest are good on s390x, Power9 and i386.

Regards
 Robin

--

gcc/ChangeLog:

* config/rs6000/vsx.md: Use const0 bias predicate.
* doc/md.texi: Document bias value.
* internal-fn.c (expand_partial_load_optab_fn): Add bias.
(expand_partial_store_optab_fn): Likewise.
(internal_len_load_store_bias): New function.
* internal-fn.h (VECT_PARTIAL_BIAS_UNSUPPORTED): New define.
(internal_len_load_store_bias): New function.
* tree-vect-loop-manip.c (vect_set_loop_controls_directly): Set
bias.
(vect_set_loop_condition_partial_vectors): Add header_seq parameter.
* tree-vect-loop.c (vect_verify_loop_lens): Verify bias.
(vect_estimate_min_profitable_iters): Account for bias.
(vect_get_loop_len): Add bias-adjusted length.
* tree-vect-stmts.c (vectorizable_store): Use.
(vectorizable_load): Use.
* tree-vectorizer.h (struct rgroup_controls): Add bias-adjusted
length.
(LOOP_VINFO_PARTIAL_LOAD_STORE_BIAS): New macro.

diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 83d6c7b76f3..9da166f0502 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -5618,7 +5618,8 @@
 (define_expand "len_load_v16qi"
   [(match_operand:V16QI 0 "vlogical_operand")
(match_operand:V16QI 1 "memory_operand")
-   (match_operand:QI 2 "gpc_reg_operand")]
+   (match_operand:QI 2 "gpc_reg_operand")
+   (match_operand:QI 3 "zero_constant")]
   "TARGET_P9_VECTOR && TARGET_64BIT"
 {
   rtx mem = XEXP (operands[1], 0);
@@ -5632,6 +5633,7 @@
   [(match_operand:V16QI 0 "memory_operand")
(match_operand:V16QI 1 "vlogical_operand")
(match_operand:QI 2 "gpc_reg_operand")
+   (match_operand:QI 3 "zero_constant")
   ]
   "TARGET_P9_VECTOR && TARGET_64BIT"
 {
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 8fd0f8d2fe1..2af3e68ca7e 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -5206,25 +5206,43 @@ This pattern is not allowed to @code{FAIL}.
 
 @cindex @code{len_load_@var{m}} instruction pattern
 @item @samp{len_load_@var{m}}
-Load the number of vector elements specified by operand 2 from memory
-operand 1 into vector register operand 0, setting the other elements of
+Load (operand 2 - operand 3) elements from vector memory operand 1
+into vector register operand 0, setting the other elements of
 operand 0 to undefined values.  Operands 0 and 1 have mode @var{m},
 which must be a vector mode.  Operand 2 has whichever integer mode the
-target prefers.  If operand 2 exceeds the number of elements in mode
-@var{m}, the behavior is undefined.  If the target prefers the length
-to be measured in bytes rather than elements, it should only implement
-this pattern for vectors of @code{QI} elements.
+target prefers.  Operand 3 conceptually has mode @code{QI}.
+
+Operand 2 can be a variable or a constant amount.  Operand 3 specifies a
+constant bias: it is either a constant 0 or a constant -1.  The predicate on
+operand 3 must only accept the bias values that the target actually supports.
+GCC handles a bias of 0 more efficiently than a bias of -1.
+
+If (operand 2 - operand 3) exceeds the number of elements in mode
+@var{m}, the behavior is undefined.
+
+If the target prefers the length to be measured in bytes rather than
+elements, it should only implement this pattern for vectors of @code{QI}
+elements.
 
 This pattern is not allowed to @code{FAIL}.
 
 @cindex @code{len_store_@var{m}} instruction pattern
 @item @samp{len_store_@var{m}}
-Store the number of vector elements specified by operand 2 from vector
-register operand 1 into memory operand 0, leaving the other elements of
+Store (operand 2 - operand 3) vector elements from vector register operand 1
+into memory operand 0, leaving the other elements of
 operand 0 unchanged.  Operands 0 and 1 have mode @var{m}, which must be
 a vector mode.  Operand 2 has whichever integer mode the target prefers.
-If operand 2 exceeds the number of elements in mode @var{m}, the behavior
-is undefined.  If the target prefers the length to be measured in bytes
+Operand 3 conceptually has mode @code{QI}.
+
+Operand 2 can be a variable or a constant amount.  Operand 3 specifies a
+constant bias: it is either a constant 0 or a constant -1.  The predicate on
+operand 3 must only accept the bias values that the target actually supports.
+GCC handles a bias of 0 more efficiently than a bias of -1.
+
+If (operand 2 - operand 3) exceeds the number of elements in mode
+@var{m}, the behavior is undefined.
+
+If the target prefers the length to be measured in bytes
 rather than elements, it should only implement this pattern for vectors
 of @code{QI} elements.
 
diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
index 08f94b7a17a..655c04bfa5a 100644
--- a/gcc/internal-fn.c
+++ b/gcc/internal-

Re: [PATCH] libstdc++: Fix and simplify freestanding configuration [PR103866]

2022-01-10 Thread Jonathan Wakely via Gcc-patches
On Fri, 7 Jan 2022 at 13:46, Jonathan Wakely via Libstdc++ <
libstd...@gcc.gnu.org> wrote:

> Tested powerpc64le-linux and by building a mips-none-elf cross with
> --disable-hosted-libstdcxx --without-headers (which fails currently).
>
> Any objections?
>

Pushed to trunk.




>
> This fixes the --disable-hosted-libstdcxx build so that it works with
> --without-headers. Currently you need to also use --with-newlib, which
> is confusing for users who aren't actually using newlib.
>
> The AM_PROG_LIBTOOL checks are currently skipped for --with-newlib and
> --with-avrlibc builds, with this change they are also skipped when using
> --without-headers.  It would be nice if using --disable-hosted-libstdcxx
> automatically skipped those checks, but GLIBCXX_ENABLE_HOSTED comes too
> late to make the AM_PROG_LIBTOOL checks depend on $is_hosted.
>
> The checks for EOF, SEEK_CUR etc. cause the build to fail if there is no
> <stdio.h> available.  Unlike most headers, which get a HAVE_FOO_H macro,
> <stdio.h> is in autoconf's default includes, so every check tries to
> include it unconditionally. This change skips those checks for
> freestanding builds.
>
> Similarly, the checks for <stdint.h> types done by GCC_HEADER_STDINT try
> to include <stdint.h> and fail for --without-headers builds. This change
> skips the use of GCC_HEADER_STDINT for freestanding. We can probably
> stop using GCC_HEADER_STDINT entirely, since only one file uses the
> gstdint.h header that is generated, and that could easily be changed to
> use <cstdint> instead. That can wait for stage 1.
>
> We also need to skip the GLIBCXX_CROSSCONFIG stage if --without-headers
> was used, since we don't have any of the functions it deals with.
>
> The end result of the changes above is that it should not be necessary
> for a --disable-hosted-libstdcxx --without-headers build to also use
> --with-newlib.
>
> Finally, compile libsupc++ with -ffreestanding when --without-headers is
> used, so that  will use  instead of expecting it
> to come from libc.
>
> libstdc++-v3/ChangeLog:
>
> PR libstdc++/103866
> * acinclude.m4 (GLIBCXX_COMPUTE_STDIO_INTEGER_CONSTANTS): Do
> nothing for freestanding builds.
> (GLIBCXX_ENABLE_HOSTED): Define FREESTANDING_FLAGS.
> * configure.ac: Do not use AC_LIBTOOL_DLOPEN when configured
> with --without-headers.  Do not use GCC_HEADER_STDINT for
> freestanding builds.
> * libsupc++/Makefile.am (HOSTED_CXXFLAGS): Use -ffreestanding
> for freestanding builds.
> * configure: Regenerate.
> * Makefile.in: Regenerate.
> * doc/Makefile.in: Regenerate.
> * include/Makefile.in: Regenerate.
> * libsupc++/Makefile.in: Regenerate.
> * po/Makefile.in: Regenerate.
> * python/Makefile.in: Regenerate.
> * src/Makefile.in: Regenerate.
> * src/c++11/Makefile.in: Regenerate.
> * src/c++17/Makefile.in: Regenerate.
> * src/c++20/Makefile.in: Regenerate.
> * src/c++98/Makefile.in: Regenerate.
> * src/filesystem/Makefile.in: Regenerate.
> * testsuite/Makefile.in: Regenerate.
> ---
>  libstdc++-v3/Makefile.in|  1 +
>  libstdc++-v3/acinclude.m4   |  8 ++
>  libstdc++-v3/configure  | 35 ++---
>  libstdc++-v3/configure.ac   | 10 +--
>  libstdc++-v3/doc/Makefile.in|  1 +
>  libstdc++-v3/include/Makefile.in|  1 +
>  libstdc++-v3/libsupc++/Makefile.am  |  2 +-
>  libstdc++-v3/libsupc++/Makefile.in  |  3 ++-
>  libstdc++-v3/po/Makefile.in |  1 +
>  libstdc++-v3/python/Makefile.in |  1 +
>  libstdc++-v3/src/Makefile.in|  1 +
>  libstdc++-v3/src/c++11/Makefile.in  |  1 +
>  libstdc++-v3/src/c++17/Makefile.in  |  1 +
>  libstdc++-v3/src/c++20/Makefile.in  |  1 +
>  libstdc++-v3/src/c++98/Makefile.in  |  1 +
>  libstdc++-v3/src/filesystem/Makefile.in |  1 +
>  libstdc++-v3/testsuite/Makefile.in  |  1 +
>  17 files changed, 56 insertions(+), 14 deletions(-)
>
> diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
> index 635168d7e25..b770d5bcdc4 100644
> --- a/libstdc++-v3/acinclude.m4
> +++ b/libstdc++-v3/acinclude.m4
> @@ -2081,6 +2081,7 @@ dnl Compute the EOF, SEEK_CUR, and SEEK_END integer
> constants.
>  dnl
>  AC_DEFUN([GLIBCXX_COMPUTE_STDIO_INTEGER_CONSTANTS], [
>
> +if test "$is_hosted" = yes; then
>AC_CACHE_CHECK([for the value of EOF], glibcxx_cv_stdio_eof, [
>AC_COMPUTE_INT([glibcxx_cv_stdio_eof], [[EOF]],
>  [#include ],
> @@ -2104,6 +2105,7 @@ AC_DEFUN([GLIBCXX_COMPUTE_STDIO_INTEGER_CONSTANTS], [
>])
>AC_DEFINE_UNQUOTED(_GLIBCXX_STDIO_SEEK_END, $glibcxx_cv_stdio_seek_end,
>  [Define to the value of the SEEK_END integer
> constant.])
> +fi
>  ])
>
>  dnl
> @@ -2923,12 +2925,16 @@ AC_DEFUN([GLIBCXX_ENABLE_HOSTED], [
> enable_hosted_libstdcxx=yes
> ;;
>   esac])
> +  free

[committed] libstdc++: Add dg-timeout-factor to some more regex tests

2022-01-10 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux.  Pushed to trunk.


I'm seeing these fail with tool_timeout=30 on a busy machine.

libstdc++-v3/ChangeLog:

* testsuite/28_regex/algorithms/regex_replace/char/103664.cc:
Add dg-timeout-factor directive.
* testsuite/28_regex/basic_regex/84110.cc: Likewise.
* testsuite/28_regex/basic_regex/ctors/char/other.cc: Likewise.
* testsuite/28_regex/match_results/102667.cc: Likewise.
---
 .../testsuite/28_regex/algorithms/regex_replace/char/103664.cc   | 1 +
 libstdc++-v3/testsuite/28_regex/basic_regex/84110.cc | 1 +
 libstdc++-v3/testsuite/28_regex/basic_regex/ctors/char/other.cc  | 1 +
 libstdc++-v3/testsuite/28_regex/match_results/102667.cc  | 1 +
 4 files changed, 4 insertions(+)

diff --git 
a/libstdc++-v3/testsuite/28_regex/algorithms/regex_replace/char/103664.cc 
b/libstdc++-v3/testsuite/28_regex/algorithms/regex_replace/char/103664.cc
index ca75e49ed3e..c61912823d5 100644
--- a/libstdc++-v3/testsuite/28_regex/algorithms/regex_replace/char/103664.cc
+++ b/libstdc++-v3/testsuite/28_regex/algorithms/regex_replace/char/103664.cc
@@ -1,4 +1,5 @@
 // { dg-do run { target c++11 } }
+// { dg-timeout-factor 2 }
 
 #include 
 #include 
diff --git a/libstdc++-v3/testsuite/28_regex/basic_regex/84110.cc 
b/libstdc++-v3/testsuite/28_regex/basic_regex/84110.cc
index 16f928b40ef..a4d5db6c14a 100644
--- a/libstdc++-v3/testsuite/28_regex/basic_regex/84110.cc
+++ b/libstdc++-v3/testsuite/28_regex/basic_regex/84110.cc
@@ -1,4 +1,5 @@
 // { dg-do run { target c++11 } }
+// { dg-timeout-factor 2 }
 #include 
 #include 
 #include 
diff --git a/libstdc++-v3/testsuite/28_regex/basic_regex/ctors/char/other.cc 
b/libstdc++-v3/testsuite/28_regex/basic_regex/ctors/char/other.cc
index f9b68a72f0a..10c20d9d4bc 100644
--- a/libstdc++-v3/testsuite/28_regex/basic_regex/ctors/char/other.cc
+++ b/libstdc++-v3/testsuite/28_regex/basic_regex/ctors/char/other.cc
@@ -1,4 +1,5 @@
 // { dg-do run { target c++11 } }
+// { dg-timeout-factor 2 }
 #include 
 #include 
 #include 
diff --git a/libstdc++-v3/testsuite/28_regex/match_results/102667.cc 
b/libstdc++-v3/testsuite/28_regex/match_results/102667.cc
index 9e38c9edaa4..1614f3f9eb8 100644
--- a/libstdc++-v3/testsuite/28_regex/match_results/102667.cc
+++ b/libstdc++-v3/testsuite/28_regex/match_results/102667.cc
@@ -1,4 +1,5 @@
 // { dg-do run { target c++11 } }
+// { dg-timeout-factor 2 }
 
 #include 
 #include 
-- 
2.31.1



[committed] libstdc++: Update default -std option in manual

2022-01-10 Thread Jonathan Wakely via Gcc-patches
Pushed to trunk.

libstdc++-v3/ChangeLog:

* doc/xml/manual/using.xml: Update documentation around default
-std option.
* doc/html/*: Regenerate.
---
 libstdc++-v3/doc/html/index.html| 4 ++--
 libstdc++-v3/doc/html/manual/using.html | 3 ++-
 libstdc++-v3/doc/xml/manual/using.xml   | 3 ++-
 3 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/libstdc++-v3/doc/xml/manual/using.xml 
b/libstdc++-v3/doc/xml/manual/using.xml
index 65fde4609db..36b86702d22 100644
--- a/libstdc++-v3/doc/xml/manual/using.xml
+++ b/libstdc++-v3/doc/xml/manual/using.xml
@@ -16,7 +16,8 @@
   The standard library conforms to the dialect of C++ specified by the
   -std option passed to the compiler.
   By default, g++ is equivalent to
-  g++ -std=gnu++14 since GCC 6, and
+  g++ -std=gnu++17 since GCC 11, and
+  g++ -std=gnu++14 in GCC 6, 7, 8, 9, and 10, and
   g++ -std=gnu++98 for older releases.
 
 
-- 
2.31.1



[committed] libstdc++: Add -nostdinc++ for c++17 sources [PR100017]

2022-01-10 Thread Jonathan Wakely via Gcc-patches
Tested powerpc64le-linux, and build=x86_64-linux host=aarch64-linux.
Pushed to trunk. Backport to gcc-11 also needed.


When building a build!=host compiler, the just-built gcc can't be used
to build the target libstdc++ (because it is built for the host triplet,
not the build triplet). The top-level configure.ac sets up the build
flags for libstdc++ (and other "raw_cxx" libs) like this:

GCC_TARGET_TOOL(c++ for libstdc++, RAW_CXX_FOR_TARGET, CXX,
[gcc/xgcc -shared-libgcc -B$$r/$(HOST_SUBDIR)/gcc -nostdinc++ 
-L$$r/$(TARGET_SUBDIR)/libstdc++-v3/src 
-L$$r/$(TARGET_SUBDIR)/libstdc++-v3/src/.libs 
-L$$r/$(TARGET_SUBDIR)/libstdc++-v3/libsupc++/.libs],
c++)

The -nostdinc++ flag is only used for the IN-TREE-TOOL, i.e. when using
the just-built gcc/xgcc compiler. This means that the cross-compiler
used to build libstdc++ will add its own libstdc++ headers to the
include path. That results in the #include <fenv.h> in
src/c++17/floating_to_chars.cc and src/c++17/floating_from_chars.cc
doing #include_next <fenv.h> and finding the libstdc++ fenv.h wrapper
from the host compiler. Because that has the same include guard as the
<fenv.h> in the libstdc++ we're trying to build, we never reach the
underlying <fenv.h> from libc. That results in several errors of the
form:

error: 'fenv_t' has not been declared in '::'

The most correct fix would be to add -nostdinc++ to the
RAW_CXX_FOR_TARGET variable in configure.ac, or the
RAW_CXX_TARGET_EXPORTS variable in Makefile.tpl.

Another solution would be to make the libstdc++ <fenv.h> wrapper use
_GLIBCXX_INCLUDE_NEXT_C_HEADERS like our  and other C header
wrappers.

For now though, the simplest and safest solution is to just add
-nostdinc++ to the CXXFLAGS used for src/c++17/*.cc, which is what this
does.

libstdc++-v3/ChangeLog:

PR libstdc++/100017
* src/c++17/Makefile.am (AM_CXXFLAGS): Add -nostdinc++.
* src/c++17/Makefile.in: Regenerate.
---
 libstdc++-v3/src/c++17/Makefile.am | 2 +-
 libstdc++-v3/src/c++17/Makefile.in | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/libstdc++-v3/src/c++17/Makefile.am 
b/libstdc++-v3/src/c++17/Makefile.am
index f08553a1dd7..3d53f652fac 100644
--- a/libstdc++-v3/src/c++17/Makefile.am
+++ b/libstdc++-v3/src/c++17/Makefile.am
@@ -79,7 +79,7 @@ endif
 # OPTIMIZE_CXXFLAGS on the compile line so that -O2 can be overridden
 # as the occasion calls for it.
 AM_CXXFLAGS = \
-   -std=gnu++17 \
+   -std=gnu++17 -nostdinc++ \
$(glibcxx_lt_pic_flag) $(glibcxx_compiler_shared_flag) \
$(XTEMPLATE_FLAGS) $(VTV_CXXFLAGS) \
$(WARN_CXXFLAGS) $(OPTIMIZE_CXXFLAGS) $(CONFIG_CXXFLAGS) \
diff --git a/libstdc++-v3/src/c++17/Makefile.in 
b/libstdc++-v3/src/c++17/Makefile.in
index 63984ecd52a..8c02be6514f 100644
--- a/libstdc++-v3/src/c++17/Makefile.in
+++ b/libstdc++-v3/src/c++17/Makefile.in
@@ -455,7 +455,7 @@ libc__17convenience_la_SOURCES = $(sources)  $(inst_sources)
 # OPTIMIZE_CXXFLAGS on the compile line so that -O2 can be overridden
 # as the occasion calls for it.
 AM_CXXFLAGS = \
-   -std=gnu++17 \
+   -std=gnu++17 -nostdinc++ \
$(glibcxx_lt_pic_flag) $(glibcxx_compiler_shared_flag) \
$(XTEMPLATE_FLAGS) $(VTV_CXXFLAGS) \
$(WARN_CXXFLAGS) $(OPTIMIZE_CXXFLAGS) $(CONFIG_CXXFLAGS) \
-- 
2.31.1



Re: [PATCH] [aarch64/64821]: Simplify __builtin_aarch64_sqrt* into internal function .SQRT.

2022-01-10 Thread Richard Sandiford via Gcc-patches
apinski--- via Gcc-patches  writes:
> From: Andrew Pinski 
>
> This is a simple patch which simplifies the __builtin_aarch64_sqrt* builtins
> into the internal function SQRT, which allows for constant folding and other
> optimizations at the gimple level. It was originally suggested that we lower
> to __builtin_sqrt, just for __builtin_aarch64_sqrtdf, when -fno-math-errno is
> in effect, but since r6-4969-g686ee9719a4 we have the internal function SQRT
> which does the same, so we no longer need to check -fno-math-errno either.
>
> OK? Bootstrapped and tested on aarch64-linux-gnu with no regressions.
>
>   PR target/64821
>
> gcc/ChangeLog:
>
>   * config/aarch64/aarch64-builtins.c
>   (aarch64_general_gimple_fold_builtin): Handle
>   __builtin_aarch64_sqrt* and simplify into SQRT internal
>   function.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/aarch64/vsqrt-1.c: New test.
>   * gcc.target/aarch64/vsqrt-2.c: New test.
> ---
>  gcc/config/aarch64/aarch64-builtins.c  |  7 ++
>  gcc/testsuite/gcc.target/aarch64/vsqrt-1.c | 17 +
>  gcc/testsuite/gcc.target/aarch64/vsqrt-2.c | 28 ++
>  3 files changed, 52 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/vsqrt-1.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/vsqrt-2.c
>
> diff --git a/gcc/config/aarch64/aarch64-builtins.c 
> b/gcc/config/aarch64/aarch64-builtins.c
> index 58bcbd9875f..1bf487477eb 100644
> --- a/gcc/config/aarch64/aarch64-builtins.c
> +++ b/gcc/config/aarch64/aarch64-builtins.c
> @@ -2820,6 +2820,13 @@ aarch64_general_gimple_fold_builtin (unsigned int 
> fcode, gcall *stmt,
>   gimple_call_set_lhs (new_stmt, gimple_call_lhs (stmt));
>   break;
>  
> +  /* Lower sqrt builtins to gimple/internal function sqrt. */
> +  BUILTIN_VHSDF_DF (UNOP, sqrt, 2, FP)
> + new_stmt = gimple_build_call_internal (IFN_SQRT,
> +1, args[0]);

Sorry for the nit-pick, but: IMO it looks odd to split this over two lines.

> + gimple_call_set_lhs (new_stmt, gimple_call_lhs (stmt));
> + break;
> +
>   /*lower store and load neon builtins to gimple.  */
>   BUILTIN_VALL_F16 (LOAD1, ld1, 0, LOAD)
>   BUILTIN_VDQ_I (LOAD1_U, ld1, 0, LOAD)
> diff --git a/gcc/testsuite/gcc.target/aarch64/vsqrt-1.c 
> b/gcc/testsuite/gcc.target/aarch64/vsqrt-1.c
> new file mode 100644
> index 000..3207c8774ca
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/vsqrt-1.c
> @@ -0,0 +1,17 @@
> +/* PR target/64821 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> +/* Check that we constant fold sqrt(4.0) into 2.0. */
> +/* { dg-final { scan-tree-dump-times ".SQRT" 0 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "2.0" 1 "optimized" } } */

I think these would be better as scan-tree-dump-not and scan-tree-dump
respectively.  The number of 2.0s isn't important and 2.0 has the risk
of matching someone's directory name.  Probably worth backslash-quoting
the "."s too.

OK with those changes, thanks.

Richard

> +/* { dg-final { scan-assembler-times "fsqrt" 0 } } */
> +/* We should produce a fmov to d0 with 2.0 but currently don't, see PR 
> 103959. */
> +/* { dg-final { scan-assembler-times "\n\tfmov\td0, 2.0e.0" 1 { xfail *-*-* 
> } } } */
> +
> +#include <arm_neon.h>
> +
> +float64x1_t f64(void)
> +{
> +   float64x1_t a = (float64x1_t){4.0};
> +   return vsqrt_f64 (a);
> +}
> diff --git a/gcc/testsuite/gcc.target/aarch64/vsqrt-2.c 
> b/gcc/testsuite/gcc.target/aarch64/vsqrt-2.c
> new file mode 100644
> index 000..7465b79f3a7
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/vsqrt-2.c
> @@ -0,0 +1,28 @@
> +/* PR target/64821 */
> +/* { dg-do compile } */
> +/* { dg-options "-fdump-tree-optimized" } */
> +#include <arm_neon.h>
> +
> +/* Check that we lower __builtin_aarch64_sqrt* into the internal function 
> SQRT. */
> +/* { dg-final { scan-tree-dump-times "__builtin_aarch64" 0 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times ".SQRT" 4 "optimized" } } */
> +
> +float64x1_t f64(float64x1_t a)
> +{
> +  return vsqrt_f64 (a);
> +}
> +
> +float64x2_t f64q(float64x2_t a)
> +{
> +  return vsqrtq_f64 (a);
> +}
> +
> +float32x2_t f32(float32x2_t a)
> +{
> +  return vsqrt_f32 (a);
> +}
> +
> +float32x4_t f32q(float32x4_t a)
> +{
> +  return vsqrtq_f32 (a);
> +}


Re: [PATCH] Fortran: make IEEE_CLASS recognize signaling NaNs

2022-01-10 Thread FX via Gcc-patches
Thanks Mikael. I haven’t been active with gfortran development in a while, but 
I originally wrote those IEEE routines so I believe my understanding of them is 
fair. I will continue posting my next patches to gather comments (if any), but 
they’re relatively straightforward.

The main limitation (not with this patch, but with the next ones) is some 
targets have really weird floating-point formats, and I cannot test on all 
possible targets. Feel free to poke me on any issue that arises, in ML or in 
bugzilla.

Best,
FX


GCC 12.0.0 Status Report (2022-01-10), Stage 3 ends Jan 16th

2022-01-10 Thread Richard Biener via Gcc-patches
Status
==

The GCC development branch is open for general bugfixing (Stage 3)
and will transition to regression and documentation fixing only
(Stage 4) on the end of Jan 16th.

Take the quality data below with a big grain of salt - most of the
new P3 classified bugs will become P1 or P2 (generally every
regression against GCC 11 is to be considered P1 if it concerns
primary or secondary platforms).


Quality Data


Priority      #   Change from last report
--------    ---   -----------------------
P1           30   -  4
P2          307   +  1
P3          279   + 42
P4          220   + 13
P5           25
--------    ---   -----------------------
Total P1-P3 616   + 39
Total       861   + 52


Previous Report
===

https://gcc.gnu.org/pipermail/gcc/2021-November/237741.html


Re: [COMIITTED] Testsuite: Make dependence on -fdelete-null-pointer-checks explicit

2022-01-10 Thread Jonathan Wakely via Gcc-patches

CC libstdc++ and Jakub.

On 08/01/22 23:22 -0700, Sandra Loosemore wrote:

I've checked in these tweaks for various testcases that fail on
nios2-elf without an explicit -fdelete-null-pointer-checks option.  This
target is configured to build with that optimization off by default.

-Sandra

commit 04c69d0e61c0f98a010d77a79ab749d5f0aa6b67
Author: Sandra Loosemore 
Date:   Sat Jan 8 22:02:13 2022 -0800

   Testsuite: Make dependence on -fdelete-null-pointer-checks explicit

   nios2-elf target defaults to -fno-delete-null-pointer-checks, breaking
   tests that implicitly depend on that optimization.  Add the option
   explicitly on these tests.

   2022-01-08  Sandra Loosemore  

gcc/testsuite/
* g++.dg/cpp0x/constexpr-compare1.C: Add explicit
-fdelete-null-pointer-checks option.
* g++.dg/cpp0x/constexpr-compare2.C: Likewise.
* g++.dg/cpp0x/constexpr-typeid2.C: Likewise.
* g++.dg/cpp1y/constexpr-94716.C: Likewise.
* g++.dg/cpp1z/constexpr-compare1.C: Likewise.
* g++.dg/cpp1z/constexpr-if36.C: Likewise.
* gcc.dg/init-compare-1.c: Likewise.

libstdc++-v3/
* testsuite/18_support/type_info/constexpr.cc: Add explicit
-fdelete-null-pointer-checks option.


This test should not be doing anything with null pointers. Instead of
working around the error on nios2-elf, I think the front-end needs
fixing.

Maybe something is not being folded early enough for the constexpr
evaluation to work. Jakub?

$ g++ -std=gnu++23  
~/src/gcc/gcc/libstdc++-v3/testsuite/18_support/type_info/constexpr.cc -c 
-fno-delete-null-pointer-checks
/home/jwakely/src/gcc/gcc/libstdc++-v3/testsuite/18_support/type_info/constexpr.cc:49:22:
 error: non-constant condition for static assertion
   49 | static_assert( test01() );
  |~~^~
In file included from 
/home/jwakely/src/gcc/gcc/libstdc++-v3/testsuite/18_support/type_info/constexpr.cc:5:
/home/jwakely/src/gcc/gcc/libstdc++-v3/testsuite/18_support/type_info/constexpr.cc:49:22:
   in 'constexpr' expansion of 'test01()'
/home/jwakely/gcc/12/include/c++/12.0.0/typeinfo:196:19: error: '(((const 
std::type_info*)(& _ZTIi)) == ((const std::type_info*)(& _ZTIl)))' is not a 
constant expression
  196 |   return this == &__arg;
  |  ~^





[PATCH] [aarch64/64821]: Simplify __builtin_aarch64_sqrt* into internal function .SQRT.

2022-01-10 Thread apinski--- via Gcc-patches
From: Andrew Pinski 

This is a simple patch which simplifies the __builtin_aarch64_sqrt* builtins
into the internal function SQRT, which allows for constant folding and other
optimizations at the gimple level. It was originally suggested that we lower
to __builtin_sqrt, just for __builtin_aarch64_sqrtdf, when -fno-math-errno is
in effect, but since r6-4969-g686ee9719a4 we have the internal function SQRT
which does the same, so we no longer need to check -fno-math-errno either.

OK? Bootstrapped and tested on aarch64-linux-gnu with no regressions.

PR target/64821

gcc/ChangeLog:

* config/aarch64/aarch64-builtins.c
(aarch64_general_gimple_fold_builtin): Handle
__builtin_aarch64_sqrt* and simplify into SQRT internal
function.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/vsqrt-1.c: New test.
* gcc.target/aarch64/vsqrt-2.c: New test.
---
 gcc/config/aarch64/aarch64-builtins.c  |  7 ++
 gcc/testsuite/gcc.target/aarch64/vsqrt-1.c | 17 +
 gcc/testsuite/gcc.target/aarch64/vsqrt-2.c | 28 ++
 3 files changed, 52 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/vsqrt-1.c
 create mode 100644 gcc/testsuite/gcc.target/aarch64/vsqrt-2.c

diff --git a/gcc/config/aarch64/aarch64-builtins.c 
b/gcc/config/aarch64/aarch64-builtins.c
index 58bcbd9875f..1bf487477eb 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -2820,6 +2820,13 @@ aarch64_general_gimple_fold_builtin (unsigned int fcode, 
gcall *stmt,
gimple_call_set_lhs (new_stmt, gimple_call_lhs (stmt));
break;
 
+  /* Lower sqrt builtins to gimple/internal function sqrt. */
+  BUILTIN_VHSDF_DF (UNOP, sqrt, 2, FP)
+   new_stmt = gimple_build_call_internal (IFN_SQRT,
+  1, args[0]);
+   gimple_call_set_lhs (new_stmt, gimple_call_lhs (stmt));
+   break;
+
  /*lower store and load neon builtins to gimple.  */
  BUILTIN_VALL_F16 (LOAD1, ld1, 0, LOAD)
  BUILTIN_VDQ_I (LOAD1_U, ld1, 0, LOAD)
diff --git a/gcc/testsuite/gcc.target/aarch64/vsqrt-1.c 
b/gcc/testsuite/gcc.target/aarch64/vsqrt-1.c
new file mode 100644
index 000..3207c8774ca
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vsqrt-1.c
@@ -0,0 +1,17 @@
+/* PR target/64821 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-optimized" } */
+/* Check that we constant fold sqrt(4.0) into 2.0. */
+/* { dg-final { scan-tree-dump-times ".SQRT" 0 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "2.0" 1 "optimized" } } */
+/* { dg-final { scan-assembler-times "fsqrt" 0 } } */
+/* We should produce a fmov to d0 with 2.0 but currently don't, see PR 103959. 
*/
+/* { dg-final { scan-assembler-times "\n\tfmov\td0, 2.0e.0" 1 { xfail *-*-* } 
} } */
+
+#include <arm_neon.h>
+
+float64x1_t f64(void)
+{
+   float64x1_t a = (float64x1_t){4.0};
+   return vsqrt_f64 (a);
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/vsqrt-2.c 
b/gcc/testsuite/gcc.target/aarch64/vsqrt-2.c
new file mode 100644
index 000..7465b79f3a7
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/vsqrt-2.c
@@ -0,0 +1,28 @@
+/* PR target/64821 */
+/* { dg-do compile } */
+/* { dg-options "-fdump-tree-optimized" } */
+#include <arm_neon.h>
+
+/* Check that we lower __builtin_aarch64_sqrt* into the internal function 
SQRT. */
+/* { dg-final { scan-tree-dump-times "__builtin_aarch64" 0 "optimized" } } */
+/* { dg-final { scan-tree-dump-times ".SQRT" 4 "optimized" } } */
+
+float64x1_t f64(float64x1_t a)
+{
+  return vsqrt_f64 (a);
+}
+
+float64x2_t f64q(float64x2_t a)
+{
+  return vsqrtq_f64 (a);
+}
+
+float32x2_t f32(float32x2_t a)
+{
+  return vsqrt_f32 (a);
+}
+
+float32x4_t f32q(float32x4_t a)
+{
+  return vsqrtq_f32 (a);
+}
-- 
2.17.1



Re: [patch] Fix PR target/103465

2022-01-10 Thread Richard Biener via Gcc-patches
On Mon, Jan 10, 2022 at 10:19 AM Eric Botcazou via Gcc-patches
 wrote:
>
> Hi,
>
> this PR uncovered that -freorder-blocks-and-partition was working by accident
> on 64-bit Windows, i.e. the middle-end was supposed to disable it with SEH.
> After Martin's change, the middle-end properly disables it now, which is too
> bad since a significant amount of work went into its implementation for SEH.
>
> Tested on x86-64/Windows, OK for all active branches?

OK.

Richard.

>
> 2022-01-10  Eric Botcazou  
>
> PR target/103465
> * coretypes.h (unwind_info_type): Swap UI_SEH and UI_TARGET.
>
> --
> Eric Botcazou


Re: [PATCH] middle-end/101530 - fix shufflevector lowering

2022-01-10 Thread Richard Biener via Gcc-patches
On Wed, 5 Jan 2022, Jeff Law wrote:

> 
> 
> On 1/5/2022 7:17 AM, Richard Biener via Gcc-patches wrote:
> > This makes __builtin_shufflevector lowering force the result
> > of the BIT_FIELD_REF lowpart operation to a temporary as to
> > fulfil the IL verifier constraint that BIT_FIELD_REFs should
> > be always in outermost handled component position.  Trying to
> > enforce this during gimplification isn't as straight-forward
> > as here where we know we're dealing with an rvalue.
> >
> > Bootstrap & regtest running on x86_64-unknown-linux-gnu - OK for trunk?
> >
> > Thanks,
> > Richard.
> >
> > 2022-01-05  Richard Biener  
> >
> > PR middle-end/101530
> > gcc/c-family/
> >  * c-common.c (c_build_shufflevector): Wrap the BIT_FIELD_REF
> >  in a TARGET_EXPR to force a temporary.
> >
> > gcc/testsuite/
> >  * c-c++-common/builtin-shufflevector-3.c: New testcase.
> Seems quite reasonable to me.

So I forgot to mark the TARGET_EXPR as having TREE_SIDE_EFFECTS
which makes the C++ FE side elide the initialization after
wrapping the TARGET_EXPR inside a COMPOUND_EXPR.  Fixed as follows,
re-bootstrapped and tested on x86_64-unknown-linux-gnu and pushed.

Richard.

From be59671c5624fe8bf21ddb0192e97ebdfa4db381 Mon Sep 17 00:00:00 2001
From: Richard Biener 
Date: Wed, 5 Jan 2022 15:13:33 +0100
Subject: [PATCH] middle-end/101530 - fix shufflevector lowering
To: gcc-patches@gcc.gnu.org

This makes __builtin_shufflevector lowering force the result
of the BIT_FIELD_REF lowpart operation to a temporary as to
fulfil the IL verifier constraint that BIT_FIELD_REFs should
be always in outermost handled component position.  Trying to
enforce this during gimplification isn't as straight-forward
as here where we know we're dealing with an rvalue.

FAIL: c-c++-common/torture/builtin-shufflevector-1.c   -O0  execution test

2022-01-05  Richard Biener  

PR middle-end/101530
gcc/c-family/
* c-common.c (c_build_shufflevector): Wrap the BIT_FIELD_REF
in a TARGET_EXPR to force a temporary.

gcc/testsuite/
* c-c++-common/builtin-shufflevector-3.c: New testcase.
---
 gcc/c-family/c-common.c   |  7 +++
 .../c-c++-common/builtin-shufflevector-3.c| 15 +++
 2 files changed, 22 insertions(+)
 create mode 100644 gcc/testsuite/c-c++-common/builtin-shufflevector-3.c

diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 13341fa315e..4a6a4edb763 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -1243,6 +1243,13 @@ c_build_shufflevector (location_t loc, tree v0, tree v1,
   tree lpartt = build_vector_type (TREE_TYPE (ret_type), mask.length ());
   ret = build3_loc (loc, BIT_FIELD_REF,
lpartt, ret, TYPE_SIZE (lpartt), bitsize_zero_node);
+  /* Wrap the lowpart operation in a TARGET_EXPR so it gets a separate
+temporary during gimplification.  See PR101530 for cases where
+we'd otherwise end up with non-toplevel BIT_FIELD_REFs.  */
+  tree tem = create_tmp_var_raw (lpartt);
+  DECL_CONTEXT (tem) = current_function_decl;
+  ret = build4 (TARGET_EXPR, lpartt, tem, ret, NULL_TREE, NULL_TREE);
+  TREE_SIDE_EFFECTS (ret) = 1;
 }
 
   if (!c_dialect_cxx () && !wrap)
diff --git a/gcc/testsuite/c-c++-common/builtin-shufflevector-3.c 
b/gcc/testsuite/c-c++-common/builtin-shufflevector-3.c
new file mode 100644
index 000..0c9bda689ef
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/builtin-shufflevector-3.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+
+typedef int __attribute__((__vector_size__ (sizeof(int)*4))) V;
+
+int
+foo(V v, int i)
+{
+  return __builtin_shufflevector (v, v, 2, 3)[i];
+}
+
+int
+bar(V v, int i)
+{
+  return __builtin_shufflevector(v, v, 4)[0] & i;
+}
-- 
2.31.1



Re: [PATCH v3 7/7] ifcvt: Run second pass if it is possible to omit a temporary.

2022-01-10 Thread Robin Dapp via Gcc-patches
Posting the ChangeLog before pushing.

--

gcc/ChangeLog:

* ifcvt.c (noce_convert_multiple_sets_1): New function.
(noce_convert_multiple_sets): Call function a second time if we can
improve the first try.


Re: [PATCH v3 6/7] testsuite/s390: Add tests for noce_convert_multiple.

2022-01-10 Thread Robin Dapp via Gcc-patches
Posting the ChangeLog before pushing.

--

gcc/testsuite/ChangeLog:

* gcc.dg/ifcvt-4.c: Remove s390-specific check.
* gcc.target/s390/ifcvt-two-insns-bool.c: New test.
* gcc.target/s390/ifcvt-two-insns-int.c: New test.
* gcc.target/s390/ifcvt-two-insns-long.c: New test.


Re: [PATCH v3 5/7] ifcvt: Try re-using CC for conditional moves.

2022-01-10 Thread Robin Dapp via Gcc-patches
Posting the ChangeLog before pushing.

--

gcc/ChangeLog:

* ifcvt.c (cond_exec_get_condition): New parameter to allow getting the
reversed comparison.
(try_emit_cmove_seq): New function to facilitate creating a cmov
sequence.
(noce_convert_multiple_sets): Create two sequences and use the less
expensive one.


Re: [PATCH v3 4/7] ifcvt/optabs: Allow using a CC comparison for emit_conditional_move.

2022-01-10 Thread Robin Dapp via Gcc-patches
Posting the ChangeLog before pushing.

--

gcc/ChangeLog:

* rtl.h (struct rtx_comparison): New struct that holds an rtx
comparison.
* config/rs6000/rs6000.c (rs6000_emit_minmax): Use struct instead of
single parameters.
(rs6000_emit_swsqrt): Likewise.
* expmed.c (expand_sdiv_pow2): Likewise.
(emit_store_flag): Likewise.
* expr.c (expand_cond_expr_using_cmove): Likewise.
(expand_expr_real_2): Likewise.
* ifcvt.c (noce_emit_cmove): Add compare and reversed compare
parameters and allow to call directly without going through
preparation steps.
* optabs.c (emit_conditional_move_1): New function.
(expand_doubleword_shift_condmove): Use struct.
(emit_conditional_move): Use struct.
* optabs.h (emit_conditional_move): Use struct.



Re: [PATCH v3 3/7] ifcvt: Improve costs handling for noce_convert_multiple.

2022-01-10 Thread Robin Dapp via Gcc-patches
Posting the ChangeLog before pushing.

--

gcc/ChangeLog:

* ifcvt.c (bb_ok_for_noce_convert_multiple_sets): Estimate insns costs.
(noce_process_if_block): Use potential costs.


Re: [PATCH v3 2/7] ifcvt: Allow constants for noce_convert_multiple.

2022-01-10 Thread Robin Dapp via Gcc-patches
Posting the ChangeLog before pushing.

--

gcc/ChangeLog:

* ifcvt.c (noce_convert_multiple_sets): Allow constants.
(bb_ok_for_noce_convert_multiple_sets): Likewise.


Re: [PATCH v3 1/7] ifcvt: Check if cmovs are needed.

2022-01-10 Thread Robin Dapp via Gcc-patches
Hi,

I included the outstanding minor remarks and believe everything is OK'ed
now.  Still posting the ChangeLogs that I omitted before continuing.
I'd expect some fallout on other targets (hopefully nothing major) since
rtx costs are handled differently now for this code path.

Regards
 Robin

--

gcc/ChangeLog:

* ifcvt.c (need_cmov_or_rewire): New function.
(noce_convert_multiple_sets): Call it.


Re: [PATCH] middle-end/77608: object size estimate with variable offsets

2022-01-10 Thread Jakub Jelinek via Gcc-patches
On Wed, Jan 05, 2022 at 06:50:58PM +0530, Siddhesh Poyarekar wrote:
> --- a/gcc/tree-object-size.c
> +++ b/gcc/tree-object-size.c
> @@ -347,10 +347,21 @@ init_offset_limit (void)
> be positive and hence, be within OFFSET_LIMIT for valid offsets.  */
>  
>  static tree
> -size_for_offset (tree sz, tree offset, tree wholesize = NULL_TREE)
> +size_for_offset (int object_size_type,tree sz, tree offset,

Formatting: missing space between the comma and tree sz.

Otherwise LGTM.

Jakub



Re: [PATCH v5 4/4] tree-object-size: Dynamic sizes for ADDR_EXPR

2022-01-10 Thread Jakub Jelinek via Gcc-patches
On Sat, Dec 18, 2021 at 06:05:11PM +0530, Siddhesh Poyarekar wrote:
> --- a/gcc/tree-object-size.c
> +++ b/gcc/tree-object-size.c
> @@ -107,6 +107,14 @@ size_unknown_p (tree val, int object_size_type)
> ? integer_zerop (val) : integer_all_onesp (val));
>  }
>  
> +/* Return true if VAL is represents a valid size for OBJECT_SIZE_TYPE.  */
> +
> +static inline bool
> +size_valid_p (tree val, int object_size_type)
> +{
> +  return ((object_size_type & OST_DYNAMIC) || TREE_CODE (val) == 
> INTEGER_CST);
> +}

It is fine to introduce this predicate, but then it should be consistently
used wherever you check the above.

> @@ -1328,7 +1323,7 @@ plus_stmt_object_size (struct object_size_info *osi, 
> tree var, gimple *stmt)
>  return false;
>  
>/* Handle PTR + OFFSET here.  */
> -  if (TREE_CODE (op1) == INTEGER_CST
> +  if (((object_size_type & OST_DYNAMIC) || TREE_CODE (op1) == INTEGER_CST)

E.g. above.

Otherwise LGTM.

Jakub



Re: [PATCH v5 3/4] tree-object-size: Handle GIMPLE_CALL

2022-01-10 Thread Jakub Jelinek via Gcc-patches
On Sat, Dec 18, 2021 at 06:05:10PM +0530, Siddhesh Poyarekar wrote:
> Handle non-constant expressions in GIMPLE_CALL arguments.  Also handle
> alloca.
> 
> gcc/ChangeLog:
> 
>   * tree-object-size.c (alloc_object_size): Make and return
>   non-constant size expression.
>   (call_object_size): Return expression or unknown based on
>   whether dynamic object size is requested.
> 
> gcc/testsuite/ChangeLog:
> 
>   * gcc.dg/builtin-dynamic-object-size-0.c: Add new tests.
>   * gcc.dg/builtin-object-size-1.c (test1)
>   [__builtin_object_size]: Alter expected result for dynamic
>   object size.
>   * gcc.dg/builtin-object-size-2.c (test1)
>   [__builtin_object_size]: Likewise.
>   * gcc.dg/builtin-object-size-3.c (test1)
>   [__builtin_object_size]: Likewise.
>   * gcc.dg/builtin-object-size-4.c (test1)
>   [__builtin_object_size]: Likewise.

Ok.

Jakub



[PATCH] nvptx: Expand QI mode operations using SI mode instructions.

2022-01-10 Thread Roger Sayle

One of the unusual target features of the Nvidia PTX ISA is that it
doesn't provide QI mode (byte sized) operations or registers.  Somewhat
conventionally, 8-bit quantities are read from/written to memory using
special instructions, but stored internally using SImode (32-bit) registers.
GCC's middle-end accommodates targets without QImode optabs by widening
operations until suitable support is found, and with the current nvptx
backend this means 16-bit HImode operations.  The inconvenience is that
nvptx is also a TARGET_TRULY_NOOP_TRUNCATION=false target, meaning that
additional instructions are required to convert between the SImode
registers used to hold QImode values, and the HImode registers used to
operate on them (and back again).  This results in a large amount of
shuffling and type conversion in code dealing with bytes, i.e. using
char or Boolean types.

This patch improves the situation by providing expanders in the nvptx
machine description to perform QImode operations natively in SImode
instead of HImode.  An alternate implementation might be to provide
some form of target hook to specify which fallback modes to use during
RTL expansion, but I think this requirement is unusual, and a solution
entirely in the nvptx backend doesn't disturb/affect other targets.
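
In plain C terms, the effect of the new expanders is roughly the following
(a conceptual sketch only, not the actual machine description; the function
and variable names are made up):

unsigned char
qi_and_via_si (unsigned char a, unsigned char b)
{
  unsigned int wa = a;          /* the QImode input already lives in an SImode register */
  unsigned int wb = b;
  unsigned int wres = wa & wb;  /* the operation itself is performed in SImode */
  return (unsigned char) wres;  /* truncate the SImode result back to QImode */
}

Since nvptx already keeps QImode values in 32-bit registers, the widening
steps above are free and the HImode round trips simply disappear.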

The improvements can be quite dramatic, as shown in the example below:

int foo(int x, int y) { return (x==21) && (y==69); }

previously with -O2 required 15 instructions:

mov.u32 %r26, %ar0;
mov.u32 %r27, %ar1;
setp.eq.u32 %r31, %r26, 21;
selp.u32%r30, 1, 0, %r31;
mov.u32 %r29, %r30;
setp.eq.u32 %r34, %r27, 69;
selp.u32%r33, 1, 0, %r34;
mov.u32 %r32, %r33;
cvt.u16.u8  %r39, %r29;
mov.u16 %r36, %r39;
cvt.u16.u8  %r39, %r32;
mov.u16 %r37, %r39;
and.b16 %r35, %r36, %r37;
cvt.u32.u16 %r38, %r35;
cvt.u32.u8  %value, %r38;

with this patch, now requires only 7 instructions:

mov.u32 %r26, %ar0;
mov.u32 %r27, %ar1;
setp.eq.u32 %r31, %r26, 21;
setp.eq.u32 %r34, %r27, 69;
selp.u32%r37, 1, 0, %r31;
selp.u32%r38, 1, 0, %r34;
and.b32 %value, %r37, %r38;


This patch has been tested on nvptx-none hosted on x86_64-pc-linux-gnu
(including newlib) with a make and make -k check with no new failures.
Ok for mainline?


2022-01-10  Roger Sayle  

gcc/ChangeLog
* config/nvptx/nvptx.md (cmp): Renamed from *cmp.
(setcc_from_bi): Additionally support QImode.
(extendbi2): Additionally support QImode.
(zero_extendbi2): Additionally support QImode.
(any_sbinary, any_ubinary, any_sunary, any_uunary): New code
iterators for signed and unsigned, binary and unary operations.
(qi3, qi3, qi2, qi2): New
expanders to perform QImode operations using SImode instructions.
(cstoreqi4): New define_expand.
(*ext_truncsi2_qi): New define_insn.
(*zext_truncsi2_qi): New define_insn.

gcc/testsuite/ChangeLog
* gcc.target/nvptx/bool-1.c: New test case.


Thanks in advance,
Roger
--

diff --git a/gcc/config/nvptx/nvptx.md b/gcc/config/nvptx/nvptx.md
index ce74672..cc9cff7 100644
--- a/gcc/config/nvptx/nvptx.md
+++ b/gcc/config/nvptx/nvptx.md
@@ -763,7 +763,7 @@
 
 ;; Comparisons and branches
 
-(define_insn "*cmp"
+(define_insn "cmp"
   [(set (match_operand:BI 0 "nvptx_register_operand" "=R")
(match_operator:BI 1 "nvptx_comparison_operator"
   [(match_operand:HSDIM 2 "nvptx_register_operand" "R")
@@ -867,22 +867,22 @@
 ;; Conditional stores
 
 (define_insn "setcc_from_bi"
-  [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R")
-   (ne:HSDIM (match_operand:BI 1 "nvptx_register_operand" "R")
+  [(set (match_operand:QHSDIM 0 "nvptx_register_operand" "=R")
+   (ne:QHSDIM (match_operand:BI 1 "nvptx_register_operand" "R")
   (const_int 0)))]
   ""
   "%.\\tselp%t0\\t%0, 1, 0, %1;")
 
 (define_insn "extendbi2"
-  [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R")
-   (sign_extend:HSDIM
+  [(set (match_operand:QHSDIM 0 "nvptx_register_operand" "=R")
+   (sign_extend:QHSDIM
 (match_operand:BI 1 "nvptx_register_operand" "R")))]
   ""
   "%.\\tselp%t0\\t%0, -1, 0, %1;")
 
 (define_insn "zero_extendbi2"
-  [(set (match_operand:HSDIM 0 "nvptx_register_operand" "=R")
-   (zero_extend:HSDIM
+  [(set (match_operand:QHSDIM 0 "nvptx_register_operand" "=R")
+   (zero_extend:QHSDIM
 (match_operand:BI 1 "nvptx_register_operand" "R")))]
   ""
   "%.\\tselp%t0\\t%0, 1, 0, %1;")
@@ -1947,3 +1947,104 @@
 return nvptx_output_red_partition (operands[0], operands[1]);
   }
   [(set_attr "predicable" "false")])
+
+;; 

Re: [PATCH v5 2/4] tree-object-size: Handle function parameters

2022-01-10 Thread Jakub Jelinek via Gcc-patches
On Sat, Dec 18, 2021 at 06:05:09PM +0530, Siddhesh Poyarekar wrote:
> @@ -1440,6 +1441,53 @@ cond_expr_object_size (struct object_size_info *osi, 
> tree var, gimple *stmt)
>return reexamine;
>  }
>  
> +/* Find size of an object passed as a parameter to the function.  */
> +
> +static void
> +parm_object_size (struct object_size_info *osi, tree var)
> +{
> +  int object_size_type = osi->object_size_type;
> +  tree parm = SSA_NAME_VAR (var);
> +
> +  if (!(object_size_type & OST_DYNAMIC) || !POINTER_TYPE_P (TREE_TYPE 
> (parm)))
> +expr_object_size (osi, var, parm);

This looks very suspicious.  Didn't you mean { expr_object_size (...); return; }
here?  Because the code below certainly assumes OST_DYNAMIC and that
TREE_TYPE (parm) is a pointer type (otherwise TREE_TYPE (TREE_TYPE (...))
wouldn't work).

> +
> +  /* Look for access attribute.  */
> +  rdwr_map rdwr_idx;
> +
> +  tree fndecl = cfun->decl;
> +  const attr_access *access = get_parm_access (rdwr_idx, parm, fndecl);
> +  tree typesize = TYPE_SIZE_UNIT (TREE_TYPE (TREE_TYPE (parm)));
> +  tree sz = NULL_TREE;
> +
> +  if (access && access->sizarg != UINT_MAX)

Perhaps && typesize here?  It makes no sense to e.g. create ssa default def
when you aren't going to use it in any way.

> +{
> +  tree fnargs = DECL_ARGUMENTS (fndecl);
> +  tree arg = NULL_TREE;
> +  unsigned argpos = 0;
> +
> +  /* Walk through the parameters to pick the size parameter and safely
> +  scale it by the type size.  */
> +  for (arg = fnargs; argpos != access->sizarg && arg;
> +arg = TREE_CHAIN (arg), ++argpos);

Instead of a loop with an empty body, wouldn't it be better to
do the work in that for loop?
I.e. take the argpos != access->sizarg && out of the loop condition,
replace arg != NULL_TREE below with argpos == access->sizarg,
and add a break;?
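
Roughly this shape (a generic, standalone illustration of the suggested
restructuring, not the actual tree-object-size code):

#include <stddef.h>

struct param { struct param *next; int is_integral; };

/* Do the work inside the positioning loop and stop as soon as the wanted
   index is reached, instead of running an empty-body loop first and
   testing the cursor afterwards.  */
struct param *
find_size_param (struct param *params, size_t sizarg)
{
  size_t argpos = 0;
  for (struct param *p = params; p; p = p->next, ++argpos)
    if (argpos == sizarg)
      return p->is_integral ? p : NULL;
  return NULL;
}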

> +
> +  if (arg != NULL_TREE && INTEGRAL_TYPE_P (TREE_TYPE (arg)))
> + {
> +   sz = get_or_create_ssa_default_def (cfun, arg);

Also, I must say I'm a little bit worried about this
get_or_create_ssa_default_def call.  If the SSA_NAME doesn't exist,
you create it and then attempt to use it, but in the end you don't
because e.g. another argument of some PHI was unknown etc.; will
that SSA_NAME be released through release_ssa_name?
I think GIMPLE is fairly unhappy if there are SSA_NAMEs created and not
released that don't appear in the IL anywhere.

> +   if (sz != NULL_TREE)
> + {
> +   sz = fold_convert (sizetype, sz);
> +   if (typesize)
> + sz = size_binop (MULT_EXPR, sz, typesize);
> + }
> + }
> +}

Jakub



Re: [PATCH v5 1/4] tree-object-size: Support dynamic sizes in conditions

2022-01-10 Thread Jakub Jelinek via Gcc-patches
On Sat, Dec 18, 2021 at 06:05:08PM +0530, Siddhesh Poyarekar wrote:
Sorry for the delay.

> +size_t
> +__attribute__ ((noinline))
> +test_builtin_calloc_condphi (size_t cnt, size_t sz, int cond)
> +{
> +  struct
> +{
> +  int a;
> +  char b;
> +} bin[cnt];
> +
> +  char *ch = __builtin_calloc (cnt, sz);
> +  size_t ret = __builtin_dynamic_object_size (cond ? ch : (void *) &bin, 0);
> +
> +  __builtin_free (ch);
> +  return ret;
> +}

> +int
> +main (int argc, char **argv)

You don't use argc or argv, so just leave those out IMO.

> +{
> +  if (test_builtin_malloc_condphi (1) != 32)
> +FAIL ();
> +  if (test_builtin_malloc_condphi (0) != 64)
> +FAIL ();

You test the above with both possibilities.

> +  if (test_builtin_calloc_condphi (128, 1, 0) == 128)
> +FAIL ();

But not this one, why?  Also, it would be better to have
a != ... test rather than ==, if it is the VLA, then
128 * sizeof (struct { int a; char b; })?

> +/* Return true if VAL is represents an initial size for OBJECT_SIZE_TYPE.  */

s/is //

> +
> +static inline bool
> +size_initval_p (tree val, int object_size_type)

> +  phires = TREE_VEC_ELT (size, TREE_VEC_LENGTH (size) - 1);
> +  gphi *phi = create_phi_node (phires, gimple_bb (stmt));
> +  gphi *obj_phi =  as_a  (stmt);

Formatting, just one space before as_a.

> +  /* Expand all size expressions to put their definitions close to the 
> objects
> + for whom size is being computed.  */

English is not my primary language, but shouldn't whom be used just
when talking about persons?  So for which size instead?

>  
> +static void
> +dynamic_object_size (struct object_size_info *osi, tree var,
> +  tree *size, tree *wholesize)

Missing function comment.

> +  if (i < num_args)
> +sizes = wholesizes = size_unknown (object_size_type);

Perhaps
  ggc_free (sizes);
  ggc_free (wholesizes);
here before the assignment?

> +
> +  /* Point to the same TREE_VEC so that we can avoid emitting two PHI
> + nodes.  */
> +  if (!wholesize_needed)

and make this else if

> +wholesizes = sizes;

and ggc_free (wholesizes); before the assignment?
When it is very easy and provably correct that it will just be memory
to be GCed later...

Otherwise LGTM.

Jakub



RE: [1/3 PATCH]middle-end vect: Simplify and extend the complex numbers validation routines.

2022-01-10 Thread Tamar Christina via Gcc-patches
ping

> -Original Message-
> From: Tamar Christina
> Sent: Monday, December 20, 2021 4:19 PM
> To: Richard Sandiford ; Tamar Christina via 
> Gcc- patches 
> Cc: nd ; rguent...@suse.de
> Subject: RE: [1/3 PATCH]middle-end vect: Simplify and extend the 
> complex numbers validation routines.
> 
> 
> 
> > -Original Message-
> > From: Richard Sandiford 
> > Sent: Friday, December 17, 2021 4:19 PM
> > To: Tamar Christina via Gcc-patches 
> > Cc: Tamar Christina ; nd ; 
> > rguent...@suse.de
> > Subject: Re: [1/3 PATCH]middle-end vect: Simplify and extend the 
> > complex numbers validation routines.
> >
> > Just a comment on the documentation:
> >
> > Tamar Christina via Gcc-patches  writes:
> > > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index
> > >
> >
> 9ec051e94e10cca9eec2773e1b8c01b74b6ea4db..60dc5b3ea6087c2824ad1467
> > bc66
> > > e9cfebe9dcfc 100644
> > > --- a/gcc/doc/md.texi
> > > +++ b/gcc/doc/md.texi
> > > @@ -6325,12 +6325,12 @@ Perform a vector multiply and accumulate 
> > > that is semantically the same as  a multiply and accumulate of 
> > > complex
> numbers.
> > >
> > >  @smallexample
> > > -  complex TYPE c[N];
> > > -  complex TYPE a[N];
> > > -  complex TYPE b[N];
> > > +  complex TYPE op0[N];
> > > +  complex TYPE op1[N];
> > > +  complex TYPE op2[N];
> > >for (int i = 0; i < N; i += 1)
> > >  @{
> > > -  c[i] += a[i] * b[i];
> > > +  op2[i] += op1[i] * op2[i];
> > >  @}
> >
> > I think this should be:
> >
> >   op0[i] = op1[i] * op2[i] + op3[i];
> >
> > since operand 0 is the output and operand 3 is the accumulator input.
> >
> > Same idea for the others.  For:
> >
> > > @@ -6415,12 +6415,12 @@ Perform a vector multiply that is 
> > > semantically the same as multiply of  complex numbers.
> > >
> > >  @smallexample
> > > -  complex TYPE c[N];
> > > -  complex TYPE a[N];
> > > -  complex TYPE b[N];
> > > +  complex TYPE op0[N];
> > > +  complex TYPE op1[N];
> > > +  complex TYPE op2[N];
> > >for (int i = 0; i < N; i += 1)
> > >  @{
> > > -  c[i] = a[i] * b[i];
> > > +  op2[i] = op0[i] * op1[i];
> >
> > …this I think it should be:
> >
> >   op0[i] = op1[i] * op2[i];
> 
> Updated patch attached.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu 
> and no regressions.
> 
> Ok for master? and backport to GCC 11 after some stew?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>   PR tree-optimization/102819
>   PR tree-optimization/103169
>   * doc/md.texi: Update docs for cfms, cfma.
>   * tree-data-ref.h (same_data_refs): Accept optional offset.
>   * tree-vect-slp-patterns.c (is_linear_load_p): Fix issue with repeating
>   patterns.
>   (vect_normalize_conj_loc): Remove.
>   (is_eq_or_top): Change to take two nodes.
>   (enum _conj_status, compatible_complex_nodes_p,
>   vect_validate_multiplication): New.
>   (class complex_add_pattern, complex_add_pattern::matches,
>   complex_add_pattern::recognize, class complex_mul_pattern,
>   complex_mul_pattern::recognize, class complex_fms_pattern,
>   complex_fms_pattern::recognize, class complex_operations_pattern,
>   complex_operations_pattern::recognize,
> addsub_pattern::recognize): Pass
>   new cache.
>   (complex_fms_pattern::matches, complex_mul_pattern::matches):
> Pass new
>   cache and use new validation code.
>   * tree-vect-slp.c (vect_match_slp_patterns_2, 
> vect_match_slp_patterns,
>   vect_analyze_slp): Pass along cache.
>   (compatible_calls_p): Expose.
>   * tree-vectorizer.h (compatible_calls_p, slp_node_hash,
>   slp_compat_nodes_map_t): New.
>   (class vect_pattern): Update signatures include new cache.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR tree-optimization/102819
>   PR tree-optimization/103169
>   * g++.dg/vect/pr99149.cc: xfail for now.
>   * gcc.dg/vect/complex/pr102819-1.c: New test.
>   * gcc.dg/vect/complex/pr102819-2.c: New test.
>   * gcc.dg/vect/complex/pr102819-3.c: New test.
>   * gcc.dg/vect/complex/pr102819-4.c: New test.
>   * gcc.dg/vect/complex/pr102819-5.c: New test.
>   * gcc.dg/vect/complex/pr102819-6.c: New test.
>   * gcc.dg/vect/complex/pr102819-7.c: New test.
>   * gcc.dg/vect/complex/pr102819-8.c: New test.
>   * gcc.dg/vect/complex/pr102819-9.c: New test.
>   * gcc.dg/vect/complex/pr103169.c: New test.
> 
> --- inline copy of patch ---
> 
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index 
> 9ec051e94e10cca9eec2773e1b8c01b74b6ea4db..ad06b02d36876082afe4c3f3f
> b51887f7a522b23 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -6325,12 +6325,13 @@ Perform a vector multiply and accumulate that 
> is semantically the same as  a multiply and accumulate of complex numbers.
> 
>  @smallexample
> -  complex TYPE c[N];
> -  complex TYPE a[N];
> -  complex TYPE b[N];
> +  complex TYPE op0[N];
> +  complex TYPE op1[N];
> +  complex TYPE op2[N];
> +  complex TYPE op3[N];
>  

[PATCH] tree-optimization/100359 - restore unroll at -O3

2022-01-10 Thread Richard Biener via Gcc-patches
This fixes a mistake made with r8-5008 when introducing
allow_peel to the unroll code.  The intent was to allow
peeling that doesn't grow code, but the result was that
with -O3 and UL_ALL this wasn't done.  The following
restores the desired behaviour by adjusting ul to UL_NO_GROWTH
if peeling is not allowed.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2022-01-05  Richard Biener  

PR tree-optimization/100359
* tree-ssa-loop-ivcanon.c (try_unroll_loop_completely):
Allow non-growing peeling with !allow_peel and UL_ALL.

* gcc.dg/tree-ssa/pr100359.c: New testcase.
---
 gcc/testsuite/gcc.dg/tree-ssa/pr100359.c | 31 
 gcc/tree-ssa-loop-ivcanon.c  |  6 -
 2 files changed, 36 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr100359.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr100359.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr100359.c
new file mode 100644
index 000..29243522caa
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr100359.c
@@ -0,0 +1,31 @@
+/* { dg-do link } */
+/* { dg-options "-O3 -fdump-tree-cunrolli-optimized" } */
+
+extern void foo(void);
+static int b, f, *a = &b;
+int **c = &a;
+static void d() {
+  int g, h;
+  for (f = 0; f < 1; f++) {
+int *i = &b;
+{
+  int *j[3], **k = &a;
+  for (g = 0; g < 3; g++)
+for (h = 0; h < 1; h++)
+  j[g] = &b;
+  *k = j[0];
+}
+*c = i;
+  }
+}
+int main() {
+  d();
+  *a = 0;
+  if (**c)
+foo();
+  return 0;
+}
+
+/* Verify that we unroll the inner loop early even with -O3.  */
+/* { dg-final { scan-tree-dump "loop with 1 iterations completely unrolled" "cunrolli" } }  */
+/* { dg-final { scan-tree-dump "loop with 3 iterations completely unrolled" "cunrolli" } }  */
diff --git a/gcc/tree-ssa-loop-ivcanon.c b/gcc/tree-ssa-loop-ivcanon.c
index 4f1e3537f05..e2ac2044741 100644
--- a/gcc/tree-ssa-loop-ivcanon.c
+++ b/gcc/tree-ssa-loop-ivcanon.c
@@ -720,7 +720,7 @@ try_unroll_loop_completely (class loop *loop,
 exit = NULL;
 
   /* See if we can improve our estimate by using recorded loop bounds.  */
-  if ((allow_peel || maxiter == 0 || ul == UL_NO_GROWTH)
+  if ((maxiter == 0 || ul != UL_SINGLE_ITER)
   && maxiter >= 0
   && (!n_unroll_found || (unsigned HOST_WIDE_INT)maxiter < n_unroll))
 {
@@ -729,6 +729,10 @@ try_unroll_loop_completely (class loop *loop,
   /* Loop terminates before the IV variable test, so we cannot
 remove it in the last iteration.  */
   edge_to_cancel = NULL;
+  /* If we do not allow peeling and we iterate just allow cases
+that do not grow code.  */
+  if (!allow_peel && maxiter != 0)
+   ul = UL_NO_GROWTH;
 }
 
   if (!n_unroll_found)
-- 
2.31.1


[Ada] Fix bogus error on call to subprogram with incomplete profile

2022-01-10 Thread Pierre-Marie de Rodat via Gcc-patches
This fixes a bad interaction between the machinery used to build subprogram
types referencing incomplete types and the Copy-In/Copy-Out mechanism used
to implement In/Out and Out parameters of elementary types in subprograms.

The latter mechanism cannot be finalized until after incomplete types are
replaced with their full view, and both of these actions need to take place
before the first call to the subprogram is translated.  The first constraint
was not effectively met, leading to a confusing error message.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* gcc-interface/trans.c (Identifier_to_gnu): Use correct subtype.
(elaborate_profile): New function.
(Call_to_gnu): Call it on the formals and the result type before
retrieving the translated result type from the subprogram type.

diff --git a/gcc/ada/gcc-interface/trans.c b/gcc/ada/gcc-interface/trans.c
--- a/gcc/ada/gcc-interface/trans.c
+++ b/gcc/ada/gcc-interface/trans.c
@@ -1171,7 +1171,7 @@ Identifier_to_gnu (Node_Id gnat_node, tree *gnu_result_type_p)
  specific circumstances only, so evaluated lazily.  < 0 means
  unknown, > 0 means known true, 0 means known false.  */
   int require_lvalue = -1;
-  Node_Id gnat_result_type;
+  Entity_Id gnat_result_type;
   tree gnu_result, gnu_result_type;
 
   /* If the Etype of this node is not the same as that of the Entity, then
@@ -4457,6 +4457,22 @@ return_slot_opt_for_pure_call_p (tree target, tree call)
   return !bitmap_bit_p (decls, DECL_UID (target));
 }
 
+/* Elaborate types referenced in the profile (FIRST_FORMAL, RESULT_TYPE).  */
+
+static void
+elaborate_profile (Entity_Id first_formal, Entity_Id result_type)
+{
+  Entity_Id formal;
+
+  for (formal = first_formal;
+   Present (formal);
+   formal = Next_Formal_With_Extras (formal))
+(void) gnat_to_gnu_type (Etype (formal));
+
+  if (Present (result_type) && Ekind (result_type) != E_Void)
+(void) gnat_to_gnu_type (result_type);
+}
+
 /* Subroutine of gnat_to_gnu to translate gnat_node, either an N_Function_Call
or an N_Procedure_Call_Statement, to a GCC tree, which is returned.
GNU_RESULT_TYPE_P is a pointer to where we should place the result type.
@@ -4481,7 +4497,7 @@ Call_to_gnu (Node_Id gnat_node, tree *gnu_result_type_p, tree gnu_target,
   /* The FUNCTION_TYPE node giving the GCC type of the subprogram.  */
   tree gnu_subprog_type = TREE_TYPE (gnu_subprog);
   /* The return type of the FUNCTION_TYPE.  */
-  tree gnu_result_type = TREE_TYPE (gnu_subprog_type);
+  tree gnu_result_type;
   const bool frontend_builtin
 = (TREE_CODE (gnu_subprog) == FUNCTION_DECL
&& DECL_BUILT_IN_CLASS (gnu_subprog) == BUILT_IN_FRONTEND);
@@ -4496,6 +4512,7 @@ Call_to_gnu (Node_Id gnat_node, tree *gnu_result_type_p, tree gnu_target,
   bool variadic;
   bool by_descriptor;
   Entity_Id gnat_formal;
+  Entity_Id gnat_result_type;
   Node_Id gnat_actual;
   atomic_acces_t aa_type;
   bool aa_sync;
@@ -4510,6 +4527,7 @@ Call_to_gnu (Node_Id gnat_node, tree *gnu_result_type_p, tree gnu_target,
 	= Underlying_Type (Etype (Prefix (gnat_subprog)));
 
   gnat_formal = First_Formal_With_Extras (Etype (gnat_subprog));
+  gnat_result_type = Etype (Etype (gnat_subprog));
   variadic = IN (Convention (gnat_prefix_type), Convention_C_Variadic);
 
   /* If the access type doesn't require foreign-compatible representation,
@@ -4523,6 +4541,7 @@ Call_to_gnu (Node_Id gnat_node, tree *gnu_result_type_p, tree gnu_target,
 {
   /* Assume here that this must be 'Elab_Body or 'Elab_Spec.  */
   gnat_formal = Empty;
+  gnat_result_type = Empty;
   variadic = false;
   by_descriptor = false;
 }
@@ -4532,6 +4551,7 @@ Call_to_gnu (Node_Id gnat_node, tree *gnu_result_type_p, tree gnu_target,
   gcc_checking_assert (Is_Entity_Name (gnat_subprog));
 
   gnat_formal = First_Formal_With_Extras (Entity (gnat_subprog));
+  gnat_result_type = Etype (Entity_Id (gnat_subprog));
   variadic = IN (Convention (Entity (gnat_subprog)), Convention_C_Variadic);
   by_descriptor = false;
 
@@ -4549,6 +4569,7 @@ Call_to_gnu (Node_Id gnat_node, tree *gnu_result_type_p, tree gnu_target,
 
 	  if (returning_value)
 	{
+	  gnu_result_type = TREE_TYPE (gnu_subprog_type);
 	  *gnu_result_type_p = gnu_result_type;
 	  return build1 (NULL_EXPR, gnu_result_type, call_expr);
 	}
@@ -4557,7 +4578,13 @@ Call_to_gnu (Node_Id gnat_node, tree *gnu_result_type_p, tree gnu_target,
 	}
 }
 
+  /* We must elaborate the entire profile now because, if it references types
+ that were initially incomplete, their elaboration changes the contents
+ of GNU_SUBPROG_TYPE and, in particular, may change the result type.  */
+  elaborate_profile (gnat_formal, gnat_result_type);
+
   gcc_assert (FUNC_OR_METHOD_TYPE_P (gnu_subprog_type));
+  gnu_result_type = TREE_TYPE (gnu_subprog_type);
 
   if (TREE_CODE (gnu_subprog) == FUNCTION_DECL)
 {




[Ada] Fix internal error on unchecked union with component clauses

2022-01-10 Thread Pierre-Marie de Rodat via Gcc-patches
The issue arises when the unchecked union contains nested variants, i.e.
variants containing themselves a variant part, and is subject to a full
representation clause covering all the components in all the variants,
when the component clauses do not align the variant boundaries with byte
boundaries consistently.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* gcc-interface/decl.c (gnat_to_gnu_entity) : Fix
computation of boolean result in the unchecked union case.
(components_to_record): Rename MAYBE_UNUSED parameter to IN_VARIANT
and remove local variable of the same name.  Pass NULL recursively
as P_GNU_REP_LIST for nested variants in the unchecked union case.

diff --git a/gcc/ada/gcc-interface/decl.c b/gcc/ada/gcc-interface/decl.c
--- a/gcc/ada/gcc-interface/decl.c
+++ b/gcc/ada/gcc-interface/decl.c
@@ -3059,7 +3059,8 @@ gnat_to_gnu_entity (Entity_Id gnat_entity, tree gnu_expr, bool definition)
 	   Present (gnat_field);
 	   gnat_field = Next_Entity (gnat_field))
 	if ((Ekind (gnat_field) == E_Component
-		 || Ekind (gnat_field) == E_Discriminant)
+		 || (Ekind (gnat_field) == E_Discriminant
+		 && !is_unchecked_union))
 		&& No (Component_Clause (gnat_field)))
 	  {
 		all_rep = false;
@@ -7874,8 +7875,7 @@ typedef struct vinfo
 
DEBUG_INFO is true if we need to write debug information about the type.
 
-   MAYBE_UNUSED is true if this type may be unused in the end; this doesn't
-   mean that its contents may be unused as well, only the container itself.
+   IN_VARIANT is true if the component list is that of a variant.
 
FIRST_FREE_POS, if nonzero, is the first (lowest) free field position in
the outer record type down to this variant level.  It is nonzero only if
@@ -7890,7 +7890,7 @@ components_to_record (Node_Id gnat_component_list, Entity_Id gnat_record_type,
 		  tree gnu_field_list, tree gnu_record_type, int packed,
 		  bool definition, bool cancel_alignment, bool all_rep,
 		  bool unchecked_union, bool artificial, bool debug_info,
-		  bool maybe_unused, tree first_free_pos,
+		  bool in_variant, tree first_free_pos,
 		  tree *p_gnu_rep_list)
 {
   const bool needs_xv_encodings
@@ -8075,15 +8075,21 @@ components_to_record (Node_Id gnat_component_list, Entity_Id gnat_record_type,
 		= TYPE_SIZE_UNIT (gnu_record_type);
 	}
 
-	  /* Add the fields into the record type for the variant.  Note that
-	 we aren't sure to really use it at this point, see below.  */
+	  /* Add the fields into the record type for the variant but note that
+	 we aren't sure to really use it at this point, see below.  In the
+	 case of an unchecked union, we force the fields with a rep clause
+	 present in a nested variant to be moved to the outermost variant,
+	 so as to flatten the rep-ed layout as much as possible, the reason
+	 being that we cannot do any flattening when a subtype statically
+	 selects a variant later on, for example for an aggregate.  */
 	  has_rep
 	= components_to_record (Component_List (variant), gnat_record_type,
 NULL_TREE, gnu_variant_type, packed,
 definition, !all_rep_and_size, all_rep,
 unchecked_union, true, needs_xv_encodings,
 true, this_first_free_pos,
-all_rep || this_first_free_pos
+(all_rep || this_first_free_pos)
+&& !(in_variant && unchecked_union)
 ? NULL : &gnu_rep_list);
 
 	  /* Translate the qualifier and annotate the GNAT node.  */
@@ -8206,9 +8212,9 @@ components_to_record (Node_Id gnat_component_list, Entity_Id gnat_record_type,
 	  finish_record_type (gnu_union_type, nreverse (gnu_variant_list),
 			  all_rep_and_size ? 1 : 0, needs_xv_encodings);
 
-	  /* If GNU_UNION_TYPE is our record type, it means we must have an
-	 Unchecked_Union with no fields.  Verify that and, if so, just
-	 return.  */
+	  /* If GNU_UNION_TYPE is our record type, this means that we must have
+	 an Unchecked_Union whose fields are all in the variant part.  Now
+	 verify that and, if so, just return.  */
 	  if (gnu_union_type == gnu_record_type)
 	{
 	  gcc_assert (unchecked_union
@@ -8275,7 +8281,6 @@ components_to_record (Node_Id gnat_component_list, Entity_Id gnat_record_type,
 = (Convention (gnat_record_type) == Convention_Ada
&& Warn_On_Questionable_Layout
&& !(No_Reordering (gnat_record_type) && GNAT_Mode));
-  const bool in_variant = (p_gnu_rep_list != NULL);
   tree gnu_zero_list = NULL_TREE;
   tree gnu_self_list = NULL_TREE;
   tree gnu_var_list = NULL_TREE;
@@ -8640,7 +8645,7 @@ components_to_record (Node_Id gnat_component_list, Entity_Id gnat_record_type,
   TYPE_ARTIFICIAL (gnu_record_type) = artificial;
 
   finish_record_type (gnu_record_type, gnu_field_list, layout_with_rep ? 1 : 0,
-		  debug_info && !maybe_unused);
+		  debug_info && !in_variant);
 
   /* Chain the fields with zero size at the beginning of the fie

[Ada] Make pragma Inspection_Point work for constants

2022-01-10 Thread Pierre-Marie de Rodat via Gcc-patches
This entails marking the pragma as requiring an lvalue and explicitly going
to the corresponding variable of the constants, which is always built since
the front-end marks the constants as having their address taken.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* gcc-interface/trans.c (lvalue_required_p) : New case.
: Likewise.
(Pragma_to_gnu) : Fetch the corresponding
variable of a constant before marking it as addressable.

diff --git a/gcc/ada/gcc-interface/trans.c b/gcc/ada/gcc-interface/trans.c
--- a/gcc/ada/gcc-interface/trans.c
+++ b/gcc/ada/gcc-interface/trans.c
@@ -865,6 +865,20 @@ lvalue_required_p (Node_Id gnat_node, tree gnu_type, bool constant,
 	  || must_pass_by_ref (gnu_type)
 	  || default_pass_by_ref (gnu_type));
 
+case N_Pragma_Argument_Association:
+  return lvalue_required_p (gnat_parent, gnu_type, constant,
+address_of_constant);
+
+case N_Pragma:
+  if (Is_Pragma_Name (Chars (Pragma_Identifier (gnat_parent
+	{
+	  const unsigned char id
+	= Get_Pragma_Id (Chars (Pragma_Identifier (gnat_parent)));
+	  return id == Pragma_Inspection_Point;
+	}
+  else
+	return 0;
+
 case N_Indexed_Component:
   /* Only the array expression can require an lvalue.  */
   if (Prefix (gnat_parent) != gnat_node)
@@ -1387,6 +1401,9 @@ Pragma_to_gnu (Node_Id gnat_node)
 	  char *comment;
 #endif
 	  gnu_expr = maybe_unconstrained_array (gnu_expr);
+	  if (TREE_CODE (gnu_expr) == CONST_DECL
+	  && DECL_CONST_CORRESPONDING_VAR (gnu_expr))
+	gnu_expr = DECL_CONST_CORRESPONDING_VAR (gnu_expr);
 	  gnat_mark_addressable (gnu_expr);
 
 #ifdef ASM_COMMENT_START




[Ada] Switch from __sync to __atomic builtins for Lock_Free_Try_Write

2022-01-10 Thread Pierre-Marie de Rodat via Gcc-patches
Routine Lock_Free_Try_Write was using deprecated __sync GCC builtins.
Now it uses __atomic builtins, which are recommended for new code.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* libgnat/s-atopri.ads (Atomic_Compare_Exchange): Replaces
deprecated Sync_Compare_And_Swap.
* libgnat/s-atopri.adb (Lock_Free_Try_Write): Switch from __sync
to __atomic builtins.

diff --git a/gcc/ada/libgnat/s-atopri.adb b/gcc/ada/libgnat/s-atopri.adb
--- a/gcc/ada/libgnat/s-atopri.adb
+++ b/gcc/ada/libgnat/s-atopri.adb
@@ -55,23 +55,16 @@ package body System.Atomic_Primitives is
Expected : in out Atomic_Type;
Desired  : Atomic_Type) return Boolean
is
-  function My_Sync_Compare_And_Swap is
-new Sync_Compare_And_Swap (Atomic_Type);
-
-  Actual : Atomic_Type;
+  function My_Atomic_Compare_Exchange is
+new Atomic_Compare_Exchange (Atomic_Type);
 
begin
   if Expected /= Desired then
  if Atomic_Type'Atomic_Always_Lock_Free then
-Actual := My_Sync_Compare_And_Swap (Ptr, Expected, Desired);
+return My_Atomic_Compare_Exchange (Ptr, Expected'Address, Desired);
  else
 raise Program_Error;
  end if;
-
- if Actual /= Expected then
-Expected := Actual;
-return False;
- end if;
   end if;
 
   return True;


diff --git a/gcc/ada/libgnat/s-atopri.ads b/gcc/ada/libgnat/s-atopri.ads
--- a/gcc/ada/libgnat/s-atopri.ads
+++ b/gcc/ada/libgnat/s-atopri.ads
@@ -80,17 +80,20 @@ package System.Atomic_Primitives is
 
generic
   type Atomic_Type is mod <>;
-   function Sync_Compare_And_Swap
- (Ptr  : Address;
-  Expected : Atomic_Type;
-  Desired  : Atomic_Type) return Atomic_Type;
+   function Atomic_Compare_Exchange
+ (Ptr   : Address;
+  Expected  : Address;
+  Desired   : Atomic_Type;
+  Weak  : Boolean   := False;
+  Success_Model : Mem_Model := Seq_Cst;
+  Failure_Model : Mem_Model := Seq_Cst) return Boolean;
pragma Import
- (Intrinsic, Sync_Compare_And_Swap, "__sync_val_compare_and_swap");
+ (Intrinsic, Atomic_Compare_Exchange, "__atomic_compare_exchange_n");
 
-   function Sync_Compare_And_Swap_8  is new Sync_Compare_And_Swap (uint8);
-   function Sync_Compare_And_Swap_16 is new Sync_Compare_And_Swap (uint16);
-   function Sync_Compare_And_Swap_32 is new Sync_Compare_And_Swap (uint32);
-   function Sync_Compare_And_Swap_64 is new Sync_Compare_And_Swap (uint64);
+   function Atomic_Compare_Exchange_8  is new Atomic_Compare_Exchange (uint8);
+   function Atomic_Compare_Exchange_16 is new Atomic_Compare_Exchange (uint16);
+   function Atomic_Compare_Exchange_32 is new Atomic_Compare_Exchange (uint32);
+   function Atomic_Compare_Exchange_64 is new Atomic_Compare_Exchange (uint64);
 
function Atomic_Test_And_Set
  (Ptr   : System.Address;
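
For reference, a minimal standalone C sketch of the difference between the
two builtins (not part of the change above; the wrapper names are made up):
the __sync variant returns the old value, while the __atomic variant returns
a success flag and updates the expected value on failure, which is what
Lock_Free_Try_Write needs.

#include <stdbool.h>
#include <stdint.h>

/* Deprecated style: __sync_val_compare_and_swap returns the value that was
   actually found in *ptr, so the caller has to compare and update the
   expected value itself, as the old Ada code did.  */
bool
cas_sync (uint32_t *ptr, uint32_t *expected, uint32_t desired)
{
  uint32_t actual = __sync_val_compare_and_swap (ptr, *expected, desired);
  if (actual != *expected)
    {
      *expected = actual;
      return false;
    }
  return true;
}

/* Recommended style: __atomic_compare_exchange_n returns the success flag
   directly and writes the observed value back through *expected on failure;
   here a strong CAS with sequentially consistent ordering.  */
bool
cas_atomic (uint32_t *ptr, uint32_t *expected, uint32_t desired)
{
  return __atomic_compare_exchange_n (ptr, expected, desired,
                                      false /* weak */,
                                      __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
}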




[Ada] Remove CodePeer annotations for pragma Loop_Variant

2022-01-10 Thread Pierre-Marie de Rodat via Gcc-patches
Pragma Loop_Variant is now expanded into a null statement in CodePeer
mode. Remove annotations related to false positives in runtime units.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* libgnat/s-exponn.adb, libgnat/s-expont.adb,
libgnat/s-exponu.adb, libgnat/s-widthi.adb,
libgnat/s-widthu.adb: Remove CodePeer annotations for pragma
Loop_Variant.

diff --git a/gcc/ada/libgnat/s-exponn.adb b/gcc/ada/libgnat/s-exponn.adb
--- a/gcc/ada/libgnat/s-exponn.adb
+++ b/gcc/ada/libgnat/s-exponn.adb
@@ -130,9 +130,6 @@ is
  pragma Loop_Invariant
(Big (Result) * Big (Factor) ** Exp = Big (Left) ** Right);
  pragma Loop_Variant (Decreases => Exp);
- pragma Annotate
-   (CodePeer, False_Positive,
-"validity check", "confusion on generated code");
 
  if Exp rem 2 /= 0 then
 declare


diff --git a/gcc/ada/libgnat/s-expont.adb b/gcc/ada/libgnat/s-expont.adb
--- a/gcc/ada/libgnat/s-expont.adb
+++ b/gcc/ada/libgnat/s-expont.adb
@@ -130,9 +130,6 @@ is
  pragma Loop_Invariant
(Big (Result) * Big (Factor) ** Exp = Big (Left) ** Right);
  pragma Loop_Variant (Decreases => Exp);
- pragma Annotate
-   (CodePeer, False_Positive,
-"validity check", "confusion on generated code");
 
  if Exp rem 2 /= 0 then
 declare


diff --git a/gcc/ada/libgnat/s-exponu.adb b/gcc/ada/libgnat/s-exponu.adb
--- a/gcc/ada/libgnat/s-exponu.adb
+++ b/gcc/ada/libgnat/s-exponu.adb
@@ -64,9 +64,6 @@ begin
  pragma Loop_Invariant (Exp > 0);
  pragma Loop_Invariant (Result * Factor ** Exp = Left ** Right);
  pragma Loop_Variant (Decreases => Exp);
- pragma Annotate
-   (CodePeer, False_Positive,
-"validity check", "confusion on generated code");
 
  if Exp rem 2 /= 0 then
 pragma Assert


diff --git a/gcc/ada/libgnat/s-widthi.adb b/gcc/ada/libgnat/s-widthi.adb
--- a/gcc/ada/libgnat/s-widthi.adb
+++ b/gcc/ada/libgnat/s-widthi.adb
@@ -163,9 +163,6 @@ begin
  pragma Loop_Invariant (Pow = Big_10 ** (W - 2));
  pragma Loop_Invariant (Big (T) = Big (T_Init) / Pow);
  pragma Loop_Variant (Decreases => T);
- pragma Annotate
-   (CodePeer, False_Positive,
-"validity check", "confusion on generated code");
   end loop;
 
   declare


diff --git a/gcc/ada/libgnat/s-widthu.adb b/gcc/ada/libgnat/s-widthu.adb
--- a/gcc/ada/libgnat/s-widthu.adb
+++ b/gcc/ada/libgnat/s-widthu.adb
@@ -156,9 +156,6 @@ begin
  pragma Loop_Invariant (Pow = Big_10 ** (W - 2));
  pragma Loop_Invariant (Big (T) = Big (T_Init) / Pow);
  pragma Loop_Variant (Decreases => T);
- pragma Annotate
-   (CodePeer, False_Positive,
-"validity check", "confusion on generated code");
   end loop;
 
   declare




[Ada] Disable expansion of pragma Loop_Variant in CodePeer mode

2022-01-10 Thread Pierre-Marie de Rodat via Gcc-patches
Pragma Loop_Variant is expanded into code which is too complicated for
CodePeer to handle and results in messages with internal names. Disable
expansion.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* exp_prag.adb (Expand_Pragma_Loop_Variant): Disable expansion
in CodePeer mode.

diff --git a/gcc/ada/exp_prag.adb b/gcc/ada/exp_prag.adb
--- a/gcc/ada/exp_prag.adb
+++ b/gcc/ada/exp_prag.adb
@@ -2692,8 +2692,11 @@ package body Exp_Prag is
begin
   --  If pragma is not enabled, rewrite as Null statement. If pragma is
   --  disabled, it has already been rewritten as a Null statement.
+  --
+  --  Likewise, do this in CodePeer mode, because the expanded code is too
+  --  complicated for CodePeer to analyse.
 
-  if Is_Ignored (N) then
+  if Is_Ignored (N) or else CodePeer_Mode then
  Rewrite (N, Make_Null_Statement (Loc));
  Analyze (N);
  return;



