[PATCH 1/2] SVE intrinsics: Add strength reduction for division by constant.

2024-07-17 Thread Jennifer Schmitz
This patch folds signed SVE division where all divisor elements are the same
power of 2 to svasrd. Tests were added to check 1) whether the transform is
applied, i.e. asrd is used, and 2) correctness for all possible input types
for svdiv, predication, and a variety of values. As the transform is applied
only to signed integers, correctness for predication and values was only
tested for svint32_t and svint64_t.
Existing svdiv tests were adjusted such that the divisor is no longer a
power of 2.

The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
OK for mainline?

Signed-off-by: Jennifer Schmitz 

gcc/

* config/aarch64/aarch64-sve-builtins-base.cc (svdiv_impl): Implement
fold and expand.

gcc/testsuite/

* gcc.target/aarch64/sve/div_const_1.c: New test.
* gcc.target/aarch64/sve/div_const_1_run.c: Likewise.
* gcc.target/aarch64/sve/acle/asm/div_s32.c: Adjust expected output.
* gcc.target/aarch64/sve/acle/asm/div_s64.c: Likewise.




[PATCH 2/2] SVE intrinsics: Add strength reduction for division by constant.

2024-07-17 Thread Jennifer Schmitz
This patch folds signed SVE division where all divisor elements are any power
of 2 to svasr. Tests were added to check 1) whether the transform is applied
and 2) correctness for a variety of values.

The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
OK for mainline?

Signed-off-by: Jennifer Schmitz 

gcc/

* config/aarch64/aarch64-sve-builtins-base.cc (svdiv_impl): Extend
fold and expand.

gcc/testsuite/

* gcc.target/aarch64/sve/div_const_1.c: New test.
* gcc.target/aarch64/sve/div_const_1_run.c: Likewise.




Re: [PATCH] RISC-V: Fix testcase missing arch attribute

2024-07-17 Thread Kito Cheng
LGTM :)

On Wed, Jul 17, 2024 at 9:15 AM Edwin Lu  wrote:
>
> The C + F extensions imply the zcf extension on rv32. Add missing zcf
> extension for the rv32 target.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/target-attr-16.c: Update expected assembly
>
> Signed-off-by: Edwin Lu 
> ---
>  gcc/testsuite/gcc.target/riscv/target-attr-16.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/testsuite/gcc.target/riscv/target-attr-16.c 
> b/gcc/testsuite/gcc.target/riscv/target-attr-16.c
> index 1c7badccdee..c6b626d0c6c 100644
> --- a/gcc/testsuite/gcc.target/riscv/target-attr-16.c
> +++ b/gcc/testsuite/gcc.target/riscv/target-attr-16.c
> @@ -24,5 +24,5 @@ void bar (void)
>  {
>  }
>
> -/* { dg-final { scan-assembler-times ".option arch, 
> rv32i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_zicsr2p0_zifencei2p0_zaamo1p0_zalrsc1p0_zca1p0_zcd1p0_zba1p0_zbb1p0"
>  4 { target { rv32 } } } } */
> +/* { dg-final { scan-assembler-times ".option arch, 
> rv32i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_zicsr2p0_zifencei2p0_zaamo1p0_zalrsc1p0_zca1p0_zcd1p0_zcf1p0_zba1p0_zbb1p0"
>  4 { target { rv32 } } } } */
>  /* { dg-final { scan-assembler-times ".option arch, 
> rv64i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_zicsr2p0_zifencei2p0_zaamo1p0_zalrsc1p0_zca1p0_zcd1p0_zba1p0_zbb1p0"
>  4 { target { rv64 } } } } */
> --
> 2.34.1
>


Re: [RFC/RFA][PATCH 0/2] SVE intrinsics: Add strength reduction for division by constant.

2024-07-17 Thread Richard Sandiford
Jennifer Schmitz  writes:
> This patch series is part of an ongoing effort to replace the SVE intrinsic 
> svdiv
> by lower-strength instructions for division by constant. To that end, we
> implemented svdiv_impl::fold to perform the following transformation in 
> gimple:
> - Division where all divisors are the same power of 2 --> svasrd

Sounds good.

> - Division where all divisors are powers of 2 --> svasr

I don't think this is correct for negative dividends (which is why
ASRD exists).  E.g. -1 / 4 is 0 as computed by svdiv (round towards zero),
but -1 as computed by svasr (round towards -Inf).

> We chose svdiv_impl::fold as location for the implementation to have the
> transform applied as early as possible, such that other (existing or future)
> gimple optimizations can be applied on the result.
> Currently, the transform is only applied for signed integers, because
> there is no unsigned svasrd or svasr. The transform has not (yet)
> been implemented for svdivr.

FWIW, using svlsr for unsigned divisions should be OK.

> Please also comment/advise on the following:
> In a next patch, we would like to replace SVE division by constants (other
> than powers of 2) by multiplies and shifts, similarly to scalar division.
> This is planned to be implemented in the gimple_folder as well. Thoughts?

I'm a bit uneasy about going that far.  I suppose it comes down to a
question about what intrinsics are for.  Are they for describing an
algorithm, or for hand-optimising a specific implementation of the
algorithm?  IMO it's the latter.

If people want to write out a calculation in natural arithmetic, it
would be better to write the algorithm in scalar code and let the
vectoriser handle it.  That gives the opportunity for many more
optimisations than just this one.

Intrinsics are about giving programmers direct, architecture-level
control over how something is implemented.  I've seen Arm's library
teams go to great lengths to work out which out of a choice of
instruction sequences is the best one, even though the sequences in
question would look functionally equivalent to a smart-enough compiler.

So part of the work of using intrinsics is to figure out what the best
sequence is.  And IMO, part of the contract is that the compiler
shouldn't interfere with the programmer's choices too much.  If the
compiler makes a change, it must be very confident that it is a win for
the function as a whole.

Replacing one division with one shift is fine, as an aid to the programmer.
It removes the need for (say) templated functions to check for that case
manually.  Constant folding is fine too, for similar reasons.  In these
cases, there's not really a cost/benefit choice to be made between
different expansions.  One choice is objectively better in all
realistic situations.

But when it comes to general constants, there are many different choices
that could be made when deciding which constants should be open-coded
and which shouldn't.  IMO we should leave the choice to the programmer
in those cases.  If the compiler gets it wrong, there will be no way
for the programmer to force the compiler's hand ("no, when I say svdiv,
I really do mean svdiv").

Thanks,
Richard


[PATCH] libcpp, c++: Optimize initializers using #embed in C++

2024-07-17 Thread Jakub Jelinek
Hi!

This patch on top of the
https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655012.html
https://gcc.gnu.org/pipermail/gcc-patches/2024-June/655013.html
https://gcc.gnu.org/pipermail/gcc-patches/2024-July/657049.html
patches which just introduce non-optimized support for the C23 feature
and two extensions to it actually optimizes it and on top of the
https://gcc.gnu.org/pipermail/gcc-patches/2024-July/657053.html
patch which adds optimizations to C & middle-end adds similar
optimizations to the C++ FE.
The first hunk enables use of CPP_EMBED token even for C++, not just
C; the preprocessor guarantees there is always a CPP_NUMBER CPP_COMMA
before CPP_EMBED and CPP_COMMA CPP_NUMBER after it which simplifies
parsing (unless #embed is more than 2GB, in that case it could be
CPP_NUMBER CPP_COMMA CPP_EMBED CPP_COMMA CPP_EMBED CPP_COMMA CPP_EMBED
CPP_COMMA CPP_NUMBER etc. with each CPP_EMBED covering at most INT_MAX
bytes).
Similarly to the C patch, this patch parses it into RAW_DATA_CST tree
in the braced initializers (and from there peels into INTEGER_CSTs unless
it is an initializer of an std::byte array or integral array with CHAR_BIT
element precision), parses CPP_EMBED in cp_parser_expression into just
the last INTEGER_CST in it, because I think users don't need millions of
-Wunused-value warnings when they write something useless like
  int a = (
  #embed "megabyte.dat"
  );
and so most of the inner INTEGER_CSTs would be there just for the warning,
and in the rest of contexts like template argument list, function argument
list, attribute argument list, ...) parse it into a sequence of INTEGER_CSTs
(I wrote range/iterator classes to simplify that).

My dumb
cat embed-11.c
constexpr unsigned char a[] = {
#embed "cc1plus"
};
const unsigned char *b = a;
testcase where cc1plus is 492329008 bytes long when configured
--enable-checking=yes,rtl,extra against recent binutils with .base64 gas
support results in:
time ./xg++ -B ./ -S -O2 embed-11.c

real0m4.350s
user0m2.427s
sys 0m0.830s
time ./xg++ -B ./ -c -O2 embed-11.c

real0m6.932s
user0m6.034s
sys 0m0.888s
(compared to running out of memory or very long compilation).
On a shorter inclusion,
cat embed-12.c
constexpr unsigned char a[] = {
#embed "xg++"
};
const unsigned char *b = a;
where xg++ is 15225904 bytes long, this takes using GCC with the #embed
patchset except for this patch:
time ~/src/gcc/obj36/gcc/xg++ -B ~/src/gcc/obj36/gcc/ -S -O2 embed-12.c

real0m33.190s
user0m32.327s
sys 0m0.790s
and with this patch:
time ./xg++ -B ./ -S -O2 embed-12.c

real0m0.118s
user0m0.090s
sys 0m0.028s

The patch doesn't change anything on what the first patch in the series
introduces even for C++, namely that #embed is expanded (actually or as if)
into a sequence of literals like
127,69,76,70,2,1,1,3,0,0,0,0,0,0,0,0,2,0,62,0,1,0,0,0,80,211,64,0,0,0,0,0,64,0,0,0,0,0,0,0,8,253
and so each element has int type.
That is how I believe it is in C23, and the different versions of the
C++ P1967 paper specified there some casts, P1967R12 in particular
"Otherwise, the integral constant expression is the value of std::fgetc’s
return is cast to unsigned char."
but please see
https://github.com/llvm/llvm-project/pull/97274#issuecomment-2230929277
comment and whether we really want the preprocessor to preprocess it for
C++ as (or as-if)
static_cast<unsigned char>(127),static_cast<unsigned char>(69),static_cast<unsigned char>(76),static_cast<unsigned char>(70),static_cast<unsigned char>(2),...
i.e. 9 tokens per byte rather than 2, or
(unsigned char)127,(unsigned char)69,...
or
((unsigned char)127),((unsigned char)69),...
etc.
Without a literal suffix for unsigned char constant literals it is horrible,
plus the incompatibility between C and C++.  Sure, we could use the magic
form more often for C++ to save the size and do the 9 or how many tokens
form only for the boundary constants and use #embed "." 
__gnu__::__base64__("...")
for what is in between if there are at least 2 tokens inside of it.
E.g. (unsigned char)127 vs. static_cast<unsigned char>(127) behaves
differently if there is constexpr long long p[] = { ... };
...
#embed __FILE__
[p]

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk if
the rest of the series is approved?

2024-07-17  Jakub Jelinek  

libcpp/
* files.cc (finish_embed): Use CPP_EMBED even for C++.
gcc/cp/ChangeLog:
* cp-tree.h (class raw_data_iterator): New type.
(class raw_data_range): New type.
* parser.cc (cp_parser_postfix_open_square_expression): Handle
parsing of CPP_EMBED.
(cp_parser_parenthesized_expression_list): Likewise.  Use
cp_lexer_next_token_is.
(cp_parser_expression): Handle parsing of CPP_EMBED.
(cp_parser_template_argument_list): Likewise.
(cp_parser_initializer_list): Likewise.
(cp_parser_oacc_clause_tile): Likewise.
(cp_parser_omp_tile_sizes): Likewise.
* pt.cc (tsubst_expr): Handle RAW_DATA_CST.
* constexpr.cc (reduced_constant_expression_p): Likewise.
(r

Re: [PATCH 1/2] SVE intrinsics: Add strength reduction for division by constant.

2024-07-17 Thread Richard Sandiford
Jennifer Schmitz  writes:
> This patch folds signed SVE division where all divisor elements are the same
> power of 2 to svasrd. Tests were added to check 1) whether the transform is
> applied, i.e. asrd is used, and 2) correctness for all possible input types
> for svdiv, predication, and a variety of values. As the transform is applied
> only to signed integers, correctness for predication and values was only
> tested for svint32_t and svint64_t.
> Existing svdiv tests were adjusted such that the divisor is no longer a
> power of 2.
>
> The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
> OK for mainline?
>
> Signed-off-by: Jennifer Schmitz 
>
> gcc/
>
>   * config/aarch64/aarch64-sve-builtins-base.cc (svdiv_impl): Implement
>   fold and expand.
>
> gcc/testsuite/
>
>   * gcc.target/aarch64/sve/div_const_1.c: New test.
>   * gcc.target/aarch64/sve/div_const_1_run.c: Likewise.
>   * gcc.target/aarch64/sve/acle/asm/div_s32.c: Adjust expected output.
>   * gcc.target/aarch64/sve/acle/asm/div_s64.c: Likewise.
>
> From e8ffbab52ad7b9307cbfc9dbca4ef4d20e08804b Mon Sep 17 00:00:00 2001
> From: Jennifer Schmitz 
> Date: Tue, 16 Jul 2024 01:59:50 -0700
> Subject: [PATCH 1/2] SVE intrinsics: Add strength reduction for division by
>  constant.
>
> This patch folds signed SVE division where all divisor elements are the same
> power of 2 to svasrd. Tests were added to check 1) whether the transform is
> applied, i.e. asrd is used, and 2) correctness for all possible input types
> for svdiv, predication, and a variety of values. As the transform is applied
> only to signed integers, correctness for predication and values was only
> tested for svint32_t and svint64_t.
> Existing svdiv tests were adjusted such that the divisor is no longer a
> power of 2.
>
> The patch was bootstrapped and regtested on aarch64-linux-gnu, no regression.
> OK for mainline?
>
> Signed-off-by: Jennifer Schmitz 
>
> gcc/
>
>   * config/aarch64/aarch64-sve-builtins-base.cc (svdiv_impl): Implement
>   fold and expand.
>
> gcc/testsuite/
>
>   * gcc.target/aarch64/sve/div_const_1.c: New test.
>   * gcc.target/aarch64/sve/div_const_1_run.c: Likewise.
>   * gcc.target/aarch64/sve/acle/asm/div_s32.c: Adjust expected output.
>   * gcc.target/aarch64/sve/acle/asm/div_s64.c: Likewise.
> ---
>  .../aarch64/aarch64-sve-builtins-base.cc  | 44 -
>  .../gcc.target/aarch64/sve/acle/asm/div_s32.c | 60 ++--
>  .../gcc.target/aarch64/sve/acle/asm/div_s64.c | 60 ++--
>  .../gcc.target/aarch64/sve/div_const_1.c  | 34 +++
>  .../gcc.target/aarch64/sve/div_const_1_run.c  | 91 +++
>  5 files changed, 228 insertions(+), 61 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/div_const_1.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/div_const_1_run.c
>
> diff --git a/gcc/config/aarch64/aarch64-sve-builtins-base.cc 
> b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> index aa26370d397..d821cc96588 100644
> --- a/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> +++ b/gcc/config/aarch64/aarch64-sve-builtins-base.cc
> @@ -746,6 +746,48 @@ public:
>}
>  };
>  
> +class svdiv_impl : public unspec_based_function
> +{
> +public:
> +  CONSTEXPR svdiv_impl ()
> +: unspec_based_function (DIV, UDIV, UNSPEC_COND_FDIV) {}
> +
> +  gimple *
> +  fold (gimple_folder &f) const override
> +  {
> +tree divisor = gimple_call_arg (f.call, 2);
> +tree divisor_cst = uniform_integer_cst_p (divisor);
> +
> +if (f.type_suffix (0).unsigned_p)
> +  {
> + return NULL;
> +  }

We might as well test this first, since it doesn't depend on the
divisor_cst result.

Formatting nit: should be no braces for single statements, so:

if (f.type_suffix (0).unsigned_p)
  return NULL;

Same for the others.

> +
> +if (!divisor_cst)
> +  {
> + return NULL;
> +  }
> +
> +if (!integer_pow2p (divisor_cst))
> +  {
> + return NULL;
> +  }
> +
> +function_instance instance ("svasrd", functions::svasrd, 
> shapes::shift_right_imm, MODE_n, f.type_suffix_ids, GROUP_none, f.pred);

This line is above the 80 character limit.  Maybe:

function_instance instance ("svasrd", functions::svasrd,
shapes::shift_right_imm, MODE_n,
f.type_suffix_ids, GROUP_none, f.pred);

> +gcall *call = as_a <gcall *> (f.redirect_call (instance));

Looks like an oversight that redirect_call doesn't return a gcall directly.
IMO it'd better to fix that instead.

> +tree shift_amt = wide_int_to_tree (TREE_TYPE (divisor_cst), tree_log2 
> (divisor_cst));

This ought to have type uint64_t instead, to match the function prototype.
That can be had from scalar_types[VECTOR_TYPE_svuint64_t].

> +gimple_call_set_arg (call, 2, shift_amt);
> +return call;
> +  }
> +
> +  rtx
> +  expand (function_expander &e) const override
> +  {
> +return e.

[PATCH] RISC-V: Support __mulbc3 and __divbc3 in libgcc for __bf16

2024-07-17 Thread Xiao Zeng
libgcc/ChangeLog:

* Makefile.in: Support __divbc3 and __mulbc3.
* libgcc2.c (if): Support BC mode for __bf16.
(defined): Ditto.
(MTYPE): Ditto.
(CTYPE): Ditto.
(AMTYPE): Ditto.
(MODE): Ditto.
(CEXT): Ditto.
(NOTRUNC): Ditto.
* libgcc2.h (LIBGCC2_HAS_BF_MODE): Ditto.
(__attribute__): Ditto.
(__divbc3): Add __divbc3 for __bf16.
(__mulbc3): Add __mulbc3 for __bf16.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/bf16-mulbc3-divbc3.c: New test.

Signed-off-by: Xiao Zeng 
---
 .../gcc.target/riscv/bf16-mulbc3-divbc3.c | 31 +++
 libgcc/Makefile.in|  6 ++--
 libgcc/libgcc2.c  | 20 
 libgcc/libgcc2.h  | 14 +
 4 files changed, 62 insertions(+), 9 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/bf16-mulbc3-divbc3.c

diff --git a/gcc/testsuite/gcc.target/riscv/bf16-mulbc3-divbc3.c 
b/gcc/testsuite/gcc.target/riscv/bf16-mulbc3-divbc3.c
new file mode 100644
index 000..5b30de15ccf
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/bf16-mulbc3-divbc3.c
@@ -0,0 +1,31 @@
+/* { dg-do run } */
+/* { dg-options "" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" } } */
+#include <complex.h>
+
+typedef _Complex float __cbf16 __attribute__((__mode__(__BC__)));
+
+__cbf16
+divbc3 (__cbf16 rs1, __cbf16 rs2)
+{
+  return rs1 / rs2;
+}
+
+__cbf16
+mulbc3 (__cbf16 rs1, __cbf16 rs2)
+{
+  return rs1 * rs2;
+}
+
+int main()
+{
+  __cbf16 rs1 = 2.0 + 4.0 * I;
+  __cbf16 rs2 = 1.0 + 2.0 * I;
+  __cbf16 mul = -6.0 + 8.0 * I;
+  __cbf16 div = 2.0 + 0.0 * I;
+  if (mulbc3 (rs1, rs2) != mul)
+__builtin_abort();
+  if (divbc3 (rs1, rs2) != div)
+__builtin_abort();
+  return 0;
+}
diff --git a/libgcc/Makefile.in b/libgcc/Makefile.in
index 0e46e9ef768..b71fd5e2250 100644
--- a/libgcc/Makefile.in
+++ b/libgcc/Makefile.in
@@ -450,9 +450,9 @@ lib2funcs = _muldi3 _negdi2 _lshrdi3 _ashldi3 _ashrdi3 
_cmpdi2 _ucmpdi2\
_negvsi2 _negvdi2 _ctors _ffssi2 _ffsdi2 _clz _clzsi2 _clzdi2  \
_ctzsi2 _ctzdi2 _popcount_tab _popcountsi2 _popcountdi2\
_paritysi2 _paritydi2 _powisf2 _powidf2 _powixf2 _powitf2  \
-   _mulhc3 _mulsc3 _muldc3 _mulxc3 _multc3 _divhc3 _divsc3\
-   _divdc3 _divxc3 _divtc3 _bswapsi2 _bswapdi2 _clrsbsi2  \
-   _clrsbdi2 _mulbitint3
+   _mulhc3 _mulbc3 _mulsc3 _muldc3 _mulxc3 _multc3 _divhc3\
+   _divbc3 _divsc3 _divdc3 _divxc3 _divtc3 _bswapsi2 _bswapdi2\
+   _clrsbsi2 _clrsbdi2 _mulbitint3
 
 # The floating-point conversion routines that involve a single-word integer.
 # XX stands for the integer mode.
diff --git a/libgcc/libgcc2.c b/libgcc/libgcc2.c
index 3fcb85c5b92..1d2aafcfd63 100644
--- a/libgcc/libgcc2.c
+++ b/libgcc/libgcc2.c
@@ -2591,6 +2591,7 @@ NAME (TYPE x, int m)
 #endif
 
 #if((defined(L_mulhc3) || defined(L_divhc3)) && LIBGCC2_HAS_HF_MODE) \
+|| ((defined(L_mulbc3) || defined(L_divbc3)) && LIBGCC2_HAS_BF_MODE) \
 || ((defined(L_mulsc3) || defined(L_divsc3)) && LIBGCC2_HAS_SF_MODE) \
 || ((defined(L_muldc3) || defined(L_divdc3)) && LIBGCC2_HAS_DF_MODE) \
 || ((defined(L_mulxc3) || defined(L_divxc3)) && LIBGCC2_HAS_XF_MODE) \
@@ -2607,6 +2608,13 @@ NAME (TYPE x, int m)
 # define MODE  hc
 # define CEXT  __LIBGCC_HF_FUNC_EXT__
 # define NOTRUNC (!__LIBGCC_HF_EXCESS_PRECISION__)
+#elif defined(L_mulbc3) || defined(L_divbc3)
+# define MTYPE  BFtype
+# define CTYPE  BCtype
+# define AMTYPE SFtype
+# define MODE   bc
+# define CEXT   __LIBGCC_BF_FUNC_EXT__
+# define NOTRUNC (!__LIBGCC_BF_EXCESS_PRECISION__)
 #elif defined(L_mulsc3) || defined(L_divsc3)
 # define MTYPE SFtype
 # define CTYPE SCtype
@@ -2690,8 +2698,8 @@ extern void *compile_type_assert[sizeof(INFINITY) == 
sizeof(MTYPE) ? 1 : -1];
 # define TRUNC(x)  __asm__ ("" : "=m"(x) : "m"(x))
 #endif
 
-#if defined(L_mulhc3) || defined(L_mulsc3) || defined(L_muldc3) \
-|| defined(L_mulxc3) || defined(L_multc3)
+#if defined(L_mulhc3) || defined(L_mulbc3) || defined(L_mulsc3)  \
+|| defined(L_muldc3) || defined(L_mulxc3) || defined(L_multc3)
 
 CTYPE
 CONCAT3(__mul,MODE,3) (MTYPE a, MTYPE b, MTYPE c, MTYPE d)
@@ -2760,16 +2768,16 @@ CONCAT3(__mul,MODE,3) (MTYPE a, MTYPE b, MTYPE c, MTYPE 
d)
 }
 #endif /* complex multiply */
 
-#if defined(L_divhc3) || defined(L_divsc3) || defined(L_divdc3) \
-|| defined(L_divxc3) || defined(L_divtc3)
+#if defined(L_divhc3) || defined(L_divbc3) || defined(L_divsc3) \
+|| defined(L_divdc3) || defined(L_divxc3) || defined(L_divtc3)
 
 CTYPE
 CONCAT3(__div,MODE,3) (MTYPE a, MTYPE b, MTYPE c, MTYPE d)
 {
-#if defined(L_divhc3)  \
+#if (defined(L_divhc3) || defined(L_divbc3) )  \
   || (defined(L_divsc3) && defined(__LIBGCC_HAVE_HWDBL__) )
 
-  /* Half precision is handled with float

Re: [PATCH] [x86][avx512] Optimize maskstore when mask is 0 or -1 in UNSPEC_MASKMOV

2024-07-17 Thread Uros Bizjak
On Wed, Jul 17, 2024 at 8:54 AM Liu, Hongtao  wrote:
>
>
>
> > -Original Message-
> > From: Uros Bizjak 
> > Sent: Wednesday, July 17, 2024 2:52 PM
> > To: Liu, Hongtao 
> > Cc: gcc-patches@gcc.gnu.org; crazy...@gmail.com; hjl.to...@gmail.com
> > Subject: Re: [PATCH] [x86][avx512] Optimize maskstore when mask is 0 or -1
> > in UNSPEC_MASKMOV
> >
> > On Wed, Jul 17, 2024 at 3:27 AM liuhongt  wrote:
> > >
> > > Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> > > Ready push to trunk.
> > >
> > > gcc/ChangeLog:
> > >
> > > PR target/115843
> > > * config/i386/predicates.md (const0_or_m1_operand): New
> > > predicate.
> > > * config/i386/sse.md (*_store_mask_1): New
> > > pre_reload define_insn_and_split.
> > > (V): Add V32BF,V16BF,V8BF.
> > > (V4SF_V8BF): Rename to ..
> > > (V24F_128): .. this.
> > > (*vec_concat): Adjust with V24F_128.
> > > (*vec_concat_0): Ditto.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > * gcc.target/i386/pr115843.c: New test.
> > > ---
> > >  gcc/config/i386/predicates.md|  5 
> > >  gcc/config/i386/sse.md   | 32 
> > >  gcc/testsuite/gcc.target/i386/pr115843.c | 38
> > > 
> > >  3 files changed, 69 insertions(+), 6 deletions(-)  create mode 100644
> > > gcc/testsuite/gcc.target/i386/pr115843.c
> > >
> > > diff --git a/gcc/config/i386/predicates.md
> > > b/gcc/config/i386/predicates.md index 5d0bb1e0f54..680594871de
> > 100644
> > > --- a/gcc/config/i386/predicates.md
> > > +++ b/gcc/config/i386/predicates.md
> > > @@ -825,6 +825,11 @@ (define_predicate "constm1_operand"
> > >(and (match_code "const_int")
> > > (match_test "op == constm1_rtx")))
> > >
> > > +;; Match 0 or -1.
> > > +(define_predicate "const0_or_m1_operand"
> > > +  (ior (match_operand 0 "const0_operand")
> > > +   (match_operand 0 "constm1_operand")))
> > > +
> > >  ;; Match exactly eight.
> > >  (define_predicate "const8_operand"
> > >(and (match_code "const_int")
> > > diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index
> > > e44822f705b..e11610f4b88 100644
> > > --- a/gcc/config/i386/sse.md
> > > +++ b/gcc/config/i386/sse.md
> > > @@ -294,6 +294,7 @@ (define_mode_iterator V
> > > (V16SI "TARGET_AVX512F && TARGET_EVEX512") (V8SI "TARGET_AVX")
> > V4SI
> > > (V8DI "TARGET_AVX512F && TARGET_EVEX512")  (V4DI "TARGET_AVX")
> > V2DI
> > > (V32HF "TARGET_AVX512F && TARGET_EVEX512") (V16HF
> > "TARGET_AVX")
> > > V8HF
> > > +   (V32BF "TARGET_AVX512F && TARGET_EVEX512") (V16BF
> > "TARGET_AVX")
> > > + V8BF
> > > (V16SF "TARGET_AVX512F && TARGET_EVEX512") (V8SF "TARGET_AVX")
> > V4SF
> > > (V8DF "TARGET_AVX512F && TARGET_EVEX512")  (V4DF "TARGET_AVX")
> > > (V2DF "TARGET_SSE2")])
> > >
> > > @@ -430,8 +431,8 @@ (define_mode_iterator VFB_512
> > > (V16SF "TARGET_EVEX512")
> > > (V8DF "TARGET_EVEX512")])
> > >
> > > -(define_mode_iterator V4SF_V8HF
> > > -  [V4SF V8HF])
> > > +(define_mode_iterator V24F_128
> > > +  [V4SF V8HF V8BF])
> > >
> > >  (define_mode_iterator VI48_AVX512VL
> > >[(V16SI "TARGET_EVEX512") (V8SI "TARGET_AVX512VL") (V4SI
> > > "TARGET_AVX512VL") @@ -11543,8 +11544,8 @@ (define_insn
> > "*vec_concatv2sf_sse"
> > > (set_attr "mode" "V4SF,SF,DI,DI")])
> > >
> > >  (define_insn "*vec_concat"
> > > -  [(set (match_operand:V4SF_V8HF 0 "register_operand"   "=x,v,x,v")
> > > -   (vec_concat:V4SF_V8HF
> > > +  [(set (match_operand:V24F_128 0 "register_operand"   "=x,v,x,v")
> > > +   (vec_concat:V24F_128
> > >   (match_operand: 1 "register_operand" " 
> > > 0,v,0,v")
> > >   (match_operand: 2 "nonimmediate_operand" "
> > x,v,m,m")))]
> > >"TARGET_SSE"
> > > @@ -11559,8 +11560,8 @@ (define_insn "*vec_concat"
> > > (set_attr "mode" "V4SF,V4SF,V2SF,V2SF")])
> > >
> > >  (define_insn "*vec_concat_0"
> > > -  [(set (match_operand:V4SF_V8HF 0 "register_operand"   "=v")
> > > -   (vec_concat:V4SF_V8HF
> > > +  [(set (match_operand:V24F_128 0 "register_operand"   "=v")
> > > +   (vec_concat:V24F_128
> > >   (match_operand: 1 "nonimmediate_operand" "vm")
> > >   (match_operand: 2 "const0_operand")))]
> > >"TARGET_SSE2"
> > > @@ -28574,6 +28575,25 @@ (define_insn
> > "_store_mask"
> > > (set_attr "memory" "store")
> > > (set_attr "mode" "")])
> > >
> > > +(define_insn_and_split "*_store_mask_1"
> > > +  [(set (match_operand:V 0 "memory_operand")
> > > +   (unspec:V
> > > + [(match_operand:V 1 "register_operand")
> > > +  (match_dup 0)
> > > +  (match_operand: 2 "const0_or_m1_operand")]
> > > + UNSPEC_MASKMOV))]
> > > +  "TARGET_AVX512F"
> >
> > Please add "ix86_pre_reload_split ()" condition to insn constraint for
> > instructions that have to be split before reload.
> Yes, thanks for the reminder 😊
> >
> > Uros.
> >
> > > +  "#"
> > > +  "&& 1"
>

[PATCH] gimple-fold: Fix up __builtin_clear_padding lowering [PR115527]

2024-07-17 Thread Jakub Jelinek
Hi!

The builtin-clear-padding-6.c testcase fails as clear_padding_type
doesn't correctly recompute the buf->size and buf->off members after
expanding clearing of an array using a runtime loop.
buf->size should be in that case the offset after which it should continue
with next members or padding before them modulo UNITS_PER_WORD and
buf->off that offset minus buf->size.  That is what the code was doing,
but with off being the start of the loop cleared array, not its end.
So, the last hunk in gimple-fold.cc fixes that.
When adding the testcase, I've noticed that the
c-c++-common/torture/builtin-clear-padding-* tests, although clearly
written as runtime tests to test the builtins at runtime, didn't have
{ dg-do run } directive and were just compile tests because of that.
When adding that to the tests, builtin-clear-padding-1.c was already
failing without that clear_padding_type hunk too, but
builtin-clear-padding-5.c was still failing even after the change.
That is due to a bug in clear_padding_flush which the patch fixes as
well - when clear_padding_flush is called with full=true (that happens
at the end of the whole __builtin_clear_padding or on those array
padding clears done by a runtime loop), it wants to flush all the pending
padding clearings rather than just some.  If it is at the end of the whole
object, it decreases wordsize when needed to make sure the code never writes
including RMW cycles to something outside of the object:
  if ((unsigned HOST_WIDE_INT) (buf->off + i + wordsize)
  > (unsigned HOST_WIDE_INT) buf->sz)
{
  gcc_assert (wordsize > 1);
  wordsize /= 2;
  i -= wordsize;
  continue;
}
but if it is a full==true flush in the middle, this doesn't happen, but we
still process just the buffer bytes before the current end.  If that end
is not on a wordsize boundary, e.g. on the builtin-clear-padding-5.c test
the last chunk is 2 bytes, '\0', '\xff', i is 16 and end is 18,
nonzero_last might be equal to the end - i, i.e. 2 here, but still all_ones
might be true, so in some spots we just didn't emit any clearing in that
last chunk.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk
and release branches?  This affects even 11 branch, dunno if we want to
change it even there or just ignore.  It is a wrong-code issue though, where
it can overwrite random non-padding bits or bytes instead of the padding
ones.

2024-07-17  Jakub Jelinek  

PR middle-end/115527
* gimple-fold.cc (clear_padding_flush): Introduce endsize
variable and use it instead of wordsize when comparing it against
nonzero_last.
(clear_padding_type): Increment off by sz.

* c-c++-common/torture/builtin-clear-padding-1.c: Add dg-do run
directive.
* c-c++-common/torture/builtin-clear-padding-2.c: Likewise.
* c-c++-common/torture/builtin-clear-padding-3.c: Likewise.
* c-c++-common/torture/builtin-clear-padding-4.c: Likewise.
* c-c++-common/torture/builtin-clear-padding-5.c: Likewise.
* c-c++-common/torture/builtin-clear-padding-6.c: New test.

--- gcc/gimple-fold.cc.jj   2024-07-16 13:36:36.0 +0200
+++ gcc/gimple-fold.cc  2024-07-16 19:27:06.776641459 +0200
@@ -4242,7 +4242,8 @@ clear_padding_flush (clear_padding_struc
  i -= wordsize;
  continue;
}
-  for (size_t j = i; j < i + wordsize && j < end; j++)
+  size_t endsize = end - i > wordsize ? wordsize : end - i;
+  for (size_t j = i; j < i + endsize; j++)
{
  if (buf->buf[j])
{
@@ -4271,12 +4272,12 @@ clear_padding_flush (clear_padding_struc
   if (padding_bytes)
{
  if (nonzero_first == 0
- && nonzero_last == wordsize
+ && nonzero_last == endsize
  && all_ones)
{
  /* All bits are padding and we had some padding
 before too.  Just extend it.  */
- padding_bytes += wordsize;
+ padding_bytes += endsize;
  continue;
}
  if (all_ones && nonzero_first == 0)
@@ -4316,7 +4317,7 @@ clear_padding_flush (clear_padding_struc
   if (nonzero_first == wordsize)
/* All bits in a word are 0, there are no padding bits.  */
continue;
-  if (all_ones && nonzero_last == wordsize)
+  if (all_ones && nonzero_last == endsize)
{
  /* All bits between nonzero_first and end of word are padding
 bits, start counting padding_bytes.  */
@@ -4358,7 +4359,7 @@ clear_padding_flush (clear_padding_struc
  j = k;
}
}
- if (nonzero_last == wordsize)
+ if (nonzero_last == endsize)
padding_bytes = nonzero_last - zero_last;
  continue;
}
@@ -4832,6 +4833,7 @@ clear_padding_type (clear_padding_struct
  buf->off = 0;
  buf->size = 0;
  clear_padding_emit_loop (buf,

[PATCH] testsuite: Add dg-do run to another test

2024-07-17 Thread Jakub Jelinek
Hi!

This is another test which clearly has been written with the assumption that
it will be executed, but it isn't.
It works fine when it is executed on both x86_64-linux and i686-linux.

Ok for trunk?

2024-07-17  Jakub Jelinek  

* c-c++-common/torture/builtin-convertvector-1.c: Add dg-do run
directive.

--- gcc/testsuite/c-c++-common/torture/builtin-convertvector-1.c.jj 
2023-05-22 16:28:25.499147149 +0200
+++ gcc/testsuite/c-c++-common/torture/builtin-convertvector-1.c
2024-07-16 18:54:55.907042232 +0200
@@ -1,3 +1,4 @@
+/* { dg-do run } */
 /* { dg-skip-if "double support is incomplete" { "avr-*-*" } } */
 
 extern

Jakub



Re: [RFC] Proposal to support Packed Boolean Vector masks.

2024-07-17 Thread Tejas Belagod

On 7/15/24 6:05 PM, Richard Biener wrote:

On Mon, Jul 15, 2024 at 1:22 PM Tejas Belagod  wrote:


On 7/15/24 12:16 PM, Tejas Belagod wrote:

On 7/12/24 6:40 PM, Richard Biener wrote:

On Fri, Jul 12, 2024 at 3:05 PM Jakub Jelinek  wrote:


On Fri, Jul 12, 2024 at 02:56:53PM +0200, Richard Biener wrote:

Padding is only an issue for very small vectors - the obvious choice is
to disallow vector types that would require any padding.  I can hardly
see where those are faster than using a vector of up to 4 char
elements.
Problematic are 1-bit elements with 4, 2 or one element vectors,
2-bit elements
with 2 or one element vectors and 4-bit elements with 1 element
vectors.


I'd really like to avoid having to support something like
_BitInt(16372) __attribute__((vector_size (sizeof (_BitInt(16372)) *
16)))
_BitInt(2) to say size of long long could be acceptable.


I'd disallow _BitInt(n) with n >= 8, it should be just the syntactic
way to say
the element should have n (< 8) bits.


I have no idea what the stance on supporting _BitInt in C++ is,
but most certainly diverging support (or even semantics) of the
vector extension in C vs. C++ is undesirable.


I believe Clang supports it in C++ next to C, GCC doesn't and Jason
didn't
look favorably to _BitInt support in C++, so at least until something
like
that is standardized in C++ the answer is probably no.


OK, I think that rules out _BitInt use here so while bool is then natural
for 1-bit elements for 2-bit and 4-bit elements we'd have to specify the
number of bits explicitly.  There is signed_bool_precision but like
vector_mask its use is restricted to the GIMPLE frontend because
interaction with the rest of the language isn't defined.



Thanks for all the suggestions - really insightful (to me) discussions.

Yeah, BitInt seemed like it was best placed for this, but not having C++
support is definitely a blocker. But as you say, in the absence of
BitInt, bool becomes the natural choice for bit sizes 1, 2 and 4. One
way to specify non-1-bit widths could be overloading vector_size.

Also, I think overloading GIMPLE's vector_mask takes us into the
earlier-discussed territory of what it should actually mean - having it
mean the target truth type in GIMPLE but a generic vector extension in
the FE will probably confuse gcc developers more than users.


That said - we're mixing two things here.  The desire to have "proper"
svbool (fix: declare in the backend) and the desire to have "packed"
bit-precision vectors (for whatever actual reason) as part of the
GCC vector extension.



If we leave lane-disambiguation of svbool to the backend, the values I
see in supporting 1, 2 and 4 bitsizes are 1) first step towards
supporting BitInt(N) vectors possibly in the future 2) having a way for
targets to define their intrinsics' bool vector types using GNU
extensions 3) feature parity with Clang's ext_vector_type?

I believe the primary motivation for Clang to support ext_vector_type
was to have a way to define target intrinsics' vector bool type using
vector extensions.




Interestingly, Clang seems to support

typedef struct {
  _Bool i:1;
} STR;

typedef struct { _Bool i: 1; } __attribute__((vector_size (sizeof (STR)
* 4))) vec;


int foo (vec b) {
 return sizeof b;
}

I can't find documentation about how it is implemented, but I suspect
the vector is constructed as an array STR[] i.e. possibly each
bit-element padded to byte boundary etc. Also, I can't seem to apply
many operations other than sizeof.

I don't know if we've tried to support such cases in GNU in the past?


Why should we do that?  It doesn't make much sense.

single-bit vectors are what _BitInt was invented for.


Forgive me if I'm misunderstanding - I'm trying to figure out how
_BitInts can be made to have single-bit generic vector semantics. For
example, if I want to initialize a _BitInt as a vector, I can't do:


 _BitInt (4) a = (_BitInt (4)){1, 0, 1, 1};

as 'a' expects a scalar initialization.

Or if I want to convert an int vector to a bit vector, I can't do

  v4si_p = v4si_a > v4si_b;
  _BitInt (4) vbool = __builtin_convertvector (v4si_p, _BitInt (4));

Also semantics of conditionals with _BitInt behave like scalars

  _BitInt (4) p = a && b; // Here a and b are _BitInt (4), but they 
behave as scalars.


Also, I can't do things like

  typedef _BitInt (2) vbool __attribute__((vector_size(sizeof (_BitInt 
(2)) * 4)));


to force it to behave as a vector because _BitInt is disallowed here.


 2-bit and 4-bit

element vectors is what's missing, but the scope is narrow and
efficient lowering or native support is missing.  Vectors of bit
elements but with padding is just something stupid to look for
as a general feature.



Fair enough.


Thanks,
Tejas.



Richard.


Thanks,
Tejas.




[PATCH] varasm: Fix bootstrap after the .base64 changes [PR115958]

2024-07-17 Thread Jakub Jelinek
Hi!

Apparently there is a -Wsign-compare warning if ptrdiff_t has the precision
of int: then (t - s + 1 + 2) / 3 * 4 has int type while cnt is unsigned int.
This doesn't warn if ptrdiff_t has larger precision; say, on x86_64
it is 64-bit, so (t - s + 1 + 2) / 3 * 4 has long type and cnt unsigned
int.  And it doesn't warn when using older binutils (in my tests I've
used new binutils on x86_64 and old binutils on i686).
Anyway, earlier condition guarantees that t - s is at most 256-ish and
t >= s by construction, so we can just cast it to (unsigned) to avoid
the warning.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-07-17  Jakub Jelinek  

PR other/115958
* varasm.cc (default_elf_asm_output_ascii): Cast t - s to unsigned
to avoid -Wsign-compare warnings.

--- gcc/varasm.cc.jj2024-07-16 19:33:53.553501415 +0200
+++ gcc/varasm.cc   2024-07-16 22:35:05.132495049 +0200
@@ -8558,7 +8558,7 @@ default_elf_asm_output_ascii (FILE *f, c
{
  if (t == p && t != s)
{
- if (cnt <= (t - s + 1 + 2) / 3 * 4
+ if (cnt <= ((unsigned) (t - s) + 1 + 2) / 3 * 4
  && (!prev_base64 || (t - s) >= 16)
  && ((t - s) > 1 || cnt <= 2))
{
@@ -8584,7 +8584,7 @@ default_elf_asm_output_ascii (FILE *f, c
  break;
}
}
- if (cnt > (t - s + 2) / 3 * 4 && (t - s) >= 3)
+ if (cnt > ((unsigned) (t - s) + 2) / 3 * 4 && (t - s) >= 3)
{
  if (bytes_in_chunk > 0)
{

Jakub



RE: [PATCH]middle-end: fix 0 offset creation and folding [PR115936]

2024-07-17 Thread Tamar Christina
> -Original Message-
> From: Richard Biener 
> Sent: Tuesday, July 16, 2024 12:47 PM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; j...@ventanamicro.com
> Subject: Re: [PATCH]middle-end: fix 0 offset creation and folding [PR115936]
> 
> On Tue, 16 Jul 2024, Tamar Christina wrote:
> 
> > Hi All,
> >
> > As shown in PR115936, SCEV and IVOPTS create an invalid IV when the IV is
> > a pointer type:
> >
> > ivtmp.39_65 = ivtmp.39_59 + 0B;
> >
> > where the IVs are DI mode and the offset is a pointer.
> > This comes from this weird candidate:
> >
> > Candidate 8:
> >   Var befor: ivtmp.39_59
> >   Var after: ivtmp.39_65
> >   Incr POS: before exit test
> >   IV struct:
> > Type:   sizetype
> > Base:   0
> > Step:   0B
> > Biv:N
> > Overflowness wrto loop niter:   No-overflow
> >
> > This IV was always created just ended up not being used.
> >
> > This is created by SCEV.
> >
> > simple_iv_with_niters, in the case where no CHREC is found, creates an IV
> > with base == ev, step == 0;
> >
> > However, in this case EV is a POINTER_PLUS_EXPR and so the type is a
> > pointer; it ends up creating an unusable expression.
> >
> > However, IVOPTS also has code to refold the expression in case the IV is
> > a pointer.
> > For most cases it uses basetype to fold both operands, but this was
> > omitted for the 0 case.  This leads to us creating a PLUS expression with
> > mismatched types.
> >
> > This fixes that bug as well.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu,
> > x86_64-pc-linux-gnu -m32, -m64 and no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > PR tree-optimization/115936
> > * tree-scalar-evolution.cc (simple_iv_with_niters): Use sizetype for
> > pointers.
> > * tree-ssa-loop-ivopts.cc (add_iv_candidate_for_use): Use same type for
> > both operands.
> >
> > ---
> > diff --git a/gcc/tree-scalar-evolution.cc b/gcc/tree-scalar-evolution.cc
> > index
> 5aa95a2497a317f9b43408ce78a2d50c20151314..abb2bad777375889d6c980b
> 54d60699672fd5742 100644
> > --- a/gcc/tree-scalar-evolution.cc
> > +++ b/gcc/tree-scalar-evolution.cc
> > @@ -3243,7 +3243,11 @@ simple_iv_with_niters (class loop *wrto_loop, class
> loop *use_loop,
> >if (tree_does_not_contain_chrecs (ev))
> >  {
> >iv->base = ev;
> > -  iv->step = build_int_cst (TREE_TYPE (ev), 0);
> > +  tree ev_type = TREE_TYPE (ev);
> > +  if (POINTER_TYPE_P (ev_type))
> > +   ev_type = sizetype;
> > +
> > +  iv->step = build_int_cst (ev_type, 0);
> >iv->no_overflow = true;
> >return true;
> >  }
> 
> This hunk is OK.
> 
> > diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
> > index
> c3218a3e8eedbb8d0a7f14c01eeb069cb6024c29..fe130541526e74fb80fee633f
> 6c96b41437aa1c1 100644
> > --- a/gcc/tree-ssa-loop-ivopts.cc
> > +++ b/gcc/tree-ssa-loop-ivopts.cc
> > @@ -3529,7 +3529,8 @@ add_iv_candidate_for_use (struct ivopts_data *data,
> struct iv_use *use)
> >basetype = TREE_TYPE (iv->base);
> >if (POINTER_TYPE_P (basetype))
> >  basetype = sizetype;
> > -  record_common_cand (data, build_int_cst (basetype, 0), iv->step, use);
> > +  record_common_cand (data, build_int_cst (basetype, 0),
> > + fold_convert (basetype, iv->step), use);
> 
> But this looks redundant?  iv->step should already be sizetype for a
> pointer base?

Yeah, on a correctly formed candidate.  I thought it would be a good idea
to be a bit more resilient in the cases where we're changing the base
anyway.  But I equally understand that we'd want to know when a candidate
is malformed.

So I'll go with only the first hunk.

Thanks,
Tamar

> 
> Thanks,
> Richard.
> 
> >/* Compare the cost of an address with an unscaled index with the cost of
> >  an address with a scaled index and add candidate if useful.  */
> >
> >
> >
> >
> >
> 
> --
> Richard Biener 
> SUSE Software Solutions Germany GmbH,
> Frankenstrasse 146, 90461 Nuernberg, Germany;
> GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


[PATCH] rs6000: Relax some FLOAT128 expander condition for FLOAT128_IEEE_P [PR105359]

2024-07-17 Thread Kewen.Lin
Hi,

As PR105359 shows, we disable some FLOAT128 expanders for
64-bit long double, but in fact IEEE float128 types like
__ieee128 are only guarded with TARGET_FLOAT128_TYPE and
TARGET_LONG_DOUBLE_128 is only checked when determining if
we can reuse long_double_type_node.  So this patch is to
relax all affected FLOAT128 expander conditions for
FLOAT128_IEEE_P.  By the way, currently IBM double double
type __ibm128 is guarded by TARGET_LONG_DOUBLE_128, so we
have to use TARGET_LONG_DOUBLE_128 for it.  IMHO, it's not
necessary and can be enhanced later.

Btw, for all test cases mentioned in PR105359, I removed
the xfails and tested them with explicit -mlong-double-64,
both pr79004.c and float128-hw.c are tested well and
float128-hw4.c isn't tested (unsupported due to 64 bit
long double conflicts with -mabi=ieeelongdouble).

Bootstrapped and regtested on powerpc64-linux-gnu P8/P9
and powerpc64le-linux-gnu P9/P10.

I'm going to push this next week if no objections.

BR,
Kewen
-
PR target/105359

gcc/ChangeLog:

* config/rs6000/rs6000.md (@extenddf2): Remove condition
TARGET_LONG_DOUBLE_128 for FLOAT128_IEEE_P modes.
(extendsf2): Likewise.
(truncdf2): Likewise.
(truncsf2): Likewise.
(floatsi2): Likewise.
(fix_truncsi2): Likewise.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr79004.c: Remove xfails.
---
 gcc/config/rs6000/rs6000.md| 18 --
 gcc/testsuite/gcc.target/powerpc/pr79004.c | 14 ++
 2 files changed, 18 insertions(+), 14 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 276a5c9cf2d..c79858ba064 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -8845,7 +8845,8 @@ (define_insn_and_split "*mov_softfloat"
 (define_expand "@extenddf2"
   [(set (match_operand:FLOAT128 0 "gpc_reg_operand")
(float_extend:FLOAT128 (match_operand:DF 1 "gpc_reg_operand")))]
-  "TARGET_HARD_FLOAT && TARGET_LONG_DOUBLE_128"
+  "TARGET_HARD_FLOAT
+   && (TARGET_LONG_DOUBLE_128 || FLOAT128_IEEE_P (mode))"
 {
   if (FLOAT128_IEEE_P (mode))
 rs6000_expand_float128_convert (operands[0], operands[1], false);
@@ -8903,7 +8904,8 @@ (define_insn_and_split "@extenddf2_vsx"
 (define_expand "extendsf2"
   [(set (match_operand:FLOAT128 0 "gpc_reg_operand")
(float_extend:FLOAT128 (match_operand:SF 1 "gpc_reg_operand")))]
-  "TARGET_HARD_FLOAT && TARGET_LONG_DOUBLE_128"
+  "TARGET_HARD_FLOAT
+   && (TARGET_LONG_DOUBLE_128 || FLOAT128_IEEE_P (mode))"
 {
   if (FLOAT128_IEEE_P (mode))
 rs6000_expand_float128_convert (operands[0], operands[1], false);
@@ -8919,7 +8921,8 @@ (define_expand "extendsf2"
 (define_expand "truncdf2"
   [(set (match_operand:DF 0 "gpc_reg_operand")
(float_truncate:DF (match_operand:FLOAT128 1 "gpc_reg_operand")))]
-  "TARGET_HARD_FLOAT && TARGET_LONG_DOUBLE_128"
+  "TARGET_HARD_FLOAT
+   && (TARGET_LONG_DOUBLE_128 || FLOAT128_IEEE_P (mode))"
 {
   if (FLOAT128_IEEE_P (mode))
 {
@@ -8956,7 +8959,8 @@ (define_insn "truncdf2_internal2"
 (define_expand "truncsf2"
   [(set (match_operand:SF 0 "gpc_reg_operand")
(float_truncate:SF (match_operand:FLOAT128 1 "gpc_reg_operand")))]
-  "TARGET_HARD_FLOAT && TARGET_LONG_DOUBLE_128"
+  "TARGET_HARD_FLOAT
+   && (TARGET_LONG_DOUBLE_128 || FLOAT128_IEEE_P (mode))"
 {
   if (FLOAT128_IEEE_P (mode))
 rs6000_expand_float128_convert (operands[0], operands[1], false);
@@ -8973,7 +8977,8 @@ (define_expand "floatsi2"
   [(parallel [(set (match_operand:FLOAT128 0 "gpc_reg_operand")
   (float:FLOAT128 (match_operand:SI 1 "gpc_reg_operand")))
  (clobber (match_scratch:DI 2))])]
-  "TARGET_HARD_FLOAT && TARGET_LONG_DOUBLE_128"
+  "TARGET_HARD_FLOAT
+   && (TARGET_LONG_DOUBLE_128 || FLOAT128_IEEE_P (mode))"
 {
   rtx op0 = operands[0];
   rtx op1 = operands[1];
@@ -9009,7 +9014,8 @@ (define_insn "fix_trunc_helper"
 (define_expand "fix_truncsi2"
   [(set (match_operand:SI 0 "gpc_reg_operand")
(fix:SI (match_operand:FLOAT128 1 "gpc_reg_operand")))]
-  "TARGET_HARD_FLOAT && TARGET_LONG_DOUBLE_128"
+  "TARGET_HARD_FLOAT
+   && (TARGET_LONG_DOUBLE_128 || FLOAT128_IEEE_P (mode))"
 {
   rtx op0 = operands[0];
   rtx op1 = operands[1];
diff --git a/gcc/testsuite/gcc.target/powerpc/pr79004.c 
b/gcc/testsuite/gcc.target/powerpc/pr79004.c
index 60c576cd36b..ac89a4c9f32 100644
--- a/gcc/testsuite/gcc.target/powerpc/pr79004.c
+++ b/gcc/testsuite/gcc.target/powerpc/pr79004.c
@@ -100,12 +100,10 @@ void to_uns_short_store_n (TYPE a, unsigned short *p, 
long n) { p[n] = (unsigned
 void to_uns_int_store_n (TYPE a, unsigned int *p, long n) { p[n] = (unsigned 
int)a; }
 void to_uns_long_store_n (TYPE a, unsigned long *p, long n) { p[n] = (unsigned 
long)a; }

-/* On targets with 64-bit long double, some opcodes to deal with __float128 are
-   disabled, see PR target/105359.  */
-/* { dg-final { scan-assembler-not {\mbl __}   { xfail longdouble64 } }

Re: [PATCH] gimple-fold: Fix up __builtin_clear_padding lowering [PR115527]

2024-07-17 Thread Richard Biener
On Wed, 17 Jul 2024, Jakub Jelinek wrote:

> Hi!
> 
> The builtin-clear-padding-6.c testcase fails as clear_padding_type
> doesn't correctly recompute the buf->size and buf->off members after
> expanding clearing of an array using a runtime loop.
> buf->size should be in that case the offset after which it should continue
> with next members or padding before them modulo UNITS_PER_WORD and
> buf->off that offset minus buf->size.  That is what the code was doing,
> but with off being the start of the loop cleared array, not its end.
> So, the last hunk in gimple-fold.cc fixes that.
> When adding the testcase, I've noticed that the
> c-c++-common/torture/builtin-clear-padding-* tests, although clearly
> written as runtime tests to test the builtins at runtime, didn't have
> { dg-do run } directive and were just compile tests because of that.
> When adding that to the tests, builtin-clear-padding-1.c was already
> failing without that clear_padding_type hunk too, but
> builtin-clear-padding-5.c was still failing even after the change.
> That is due to a bug in clear_padding_flush which the patch fixes as
> well - when clear_padding_flush is called with full=true (that happens
> at the end of the whole __builtin_clear_padding or on those array
> padding clears done by a runtime loop), it wants to flush all the pending
> padding clearings rather than just some.  If it is at the end of the whole
> object, it decreases wordsize when needed to make sure the code never writes
> including RMW cycles to something outside of the object:
>   if ((unsigned HOST_WIDE_INT) (buf->off + i + wordsize)
>   > (unsigned HOST_WIDE_INT) buf->sz)
> {
>   gcc_assert (wordsize > 1);
>   wordsize /= 2;
>   i -= wordsize;
>   continue;
> }
> but if it is full==true flush in the middle, this doesn't happen, but we
> still process just the buffer bytes before the current end.  If that end
> is not on a wordsize boundary, e.g. on the builtin-clear-padding-5.c test
> the last chunk is 2 bytes, '\0', '\xff', i is 16 and end is 18,
> nonzero_last might be equal to the end - i, i.e. 2 here, but still all_ones
> might be true, so in some spots we just didn't emit any clearing in that
> last chunk.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk
> and release branches?  This affects even 11 branch, dunno if we want to
> change it even there or just ignore.  It is a wrong-code issue though, where
> it can overwrite random non-padding bits or bytes instead of the padding
> one.

OK.  It's a bit late for the 11 branch without some soaking on trunk - 
when do we use __builtin_clear_padding?  IIRC for C++ atomics?

Thanks,
Richard.

> 2024-07-17  Jakub Jelinek  
> 
>   PR middle-end/115527
>   * gimple-fold.cc (clear_padding_flush): Introduce endsize
>   variable and use it instead of wordsize when comparing it against
>   nonzero_last.
>   (clear_padding_type): Increment off by sz.
> 
>   * c-c++-common/torture/builtin-clear-padding-1.c: Add dg-do run
>   directive.
>   * c-c++-common/torture/builtin-clear-padding-2.c: Likewise.
>   * c-c++-common/torture/builtin-clear-padding-3.c: Likewise.
>   * c-c++-common/torture/builtin-clear-padding-4.c: Likewise.
>   * c-c++-common/torture/builtin-clear-padding-5.c: Likewise.
>   * c-c++-common/torture/builtin-clear-padding-6.c: New test.
> 
> --- gcc/gimple-fold.cc.jj 2024-07-16 13:36:36.0 +0200
> +++ gcc/gimple-fold.cc2024-07-16 19:27:06.776641459 +0200
> @@ -4242,7 +4242,8 @@ clear_padding_flush (clear_padding_struc
> i -= wordsize;
> continue;
>   }
> -  for (size_t j = i; j < i + wordsize && j < end; j++)
> +  size_t endsize = end - i > wordsize ? wordsize : end - i;
> +  for (size_t j = i; j < i + endsize; j++)
>   {
> if (buf->buf[j])
>   {
> @@ -4271,12 +4272,12 @@ clear_padding_flush (clear_padding_struc
>if (padding_bytes)
>   {
> if (nonzero_first == 0
> -   && nonzero_last == wordsize
> +   && nonzero_last == endsize
> && all_ones)
>   {
> /* All bits are padding and we had some padding
>before too.  Just extend it.  */
> -   padding_bytes += wordsize;
> +   padding_bytes += endsize;
> continue;
>   }
> if (all_ones && nonzero_first == 0)
> @@ -4316,7 +4317,7 @@ clear_padding_flush (clear_padding_struc
>if (nonzero_first == wordsize)
>   /* All bits in a word are 0, there are no padding bits.  */
>   continue;
> -  if (all_ones && nonzero_last == wordsize)
> +  if (all_ones && nonzero_last == endsize)
>   {
> /* All bits between nonzero_first and end of word are padding
>bits, start counting padding_bytes.  */
> @@ -4358,7 +4359,7 @@ clear_padding_flush (clear_padding_struc
> j = k;
>   }
>   

Re: [PATCH] testsuite: Add dg-do run to another test

2024-07-17 Thread Richard Biener
On Wed, 17 Jul 2024, Jakub Jelinek wrote:

> Hi!
> 
> This is another test which clearly has been written with the assumption that
> it will be executed, but it isn't.
> It works fine when it is executed on both x86_64-linux and i686-linux.
> 
> Ok for trunk?

OK.

> 2024-07-17  Jakub Jelinek  
> 
>   * c-c++-common/torture/builtin-convertvector-1.c: Add dg-do run
>   directive.
> 
> --- gcc/testsuite/c-c++-common/torture/builtin-convertvector-1.c.jj   
> 2023-05-22 16:28:25.499147149 +0200
> +++ gcc/testsuite/c-c++-common/torture/builtin-convertvector-1.c  
> 2024-07-16 18:54:55.907042232 +0200
> @@ -1,3 +1,4 @@
> +/* { dg-do run } */
>  /* { dg-skip-if "double support is incomplete" { "avr-*-*" } } */
>  
>  extern
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


Re: [PATCH] varasm: Fix bootstrap after the .base64 changes [PR115958]

2024-07-17 Thread Richard Biener
On Wed, 17 Jul 2024, Jakub Jelinek wrote:

> Hi!
> 
> Apparently there is a -Wsign-compare warning if ptrdiff_t has precision of
> int, then (t - s + 1 + 2) / 3 * 4 has int type while cnt unsigned int.
> This doesn't warn if ptrdiff_t has larger precision, say on x86_64
> it is 64-bit and so (t - s + 1 + 2) / 3 * 4 has long type and cnt unsigned
> int.  And it doesn't warn when using older binutils (in my tests I've
> used new binutils on x86_64 and old binutils on i686).
> Anyway, earlier condition guarantees that t - s is at most 256-ish and
> t >= s by construction, so we can just cast it to (unsigned) to avoid
> the warning.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

> 2024-07-17  Jakub Jelinek  
> 
>   PR other/115958
>   * varasm.cc (default_elf_asm_output_ascii): Cast t - s to unsigned
>   to avoid -Wsign-compare warnings.
> 
> --- gcc/varasm.cc.jj  2024-07-16 19:33:53.553501415 +0200
> +++ gcc/varasm.cc 2024-07-16 22:35:05.132495049 +0200
> @@ -8558,7 +8558,7 @@ default_elf_asm_output_ascii (FILE *f, c
>   {
> if (t == p && t != s)
>   {
> -   if (cnt <= (t - s + 1 + 2) / 3 * 4
> +   if (cnt <= ((unsigned) (t - s) + 1 + 2) / 3 * 4
> && (!prev_base64 || (t - s) >= 16)
> && ((t - s) > 1 || cnt <= 2))
>   {
> @@ -8584,7 +8584,7 @@ default_elf_asm_output_ascii (FILE *f, c
> break;
>   }
>   }
> -   if (cnt > (t - s + 2) / 3 * 4 && (t - s) >= 3)
> +   if (cnt > ((unsigned) (t - s) + 2) / 3 * 4 && (t - s) >= 3)
>   {
> if (bytes_in_chunk > 0)
>   {
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


[PATCH] testsuite: Add dg-do run to more tests

2024-07-17 Thread Sam James
All of these are for wrong-code bugs. Confirmed to be used before but
with no execution.

2024-07-17  Sam James 

PR/96369
PR/102124
PR/108692
* c-c++-common/pr96369.c: Add dg-do run directive.
* gcc.dg/torture/pr102124.c: Ditto.
* gcc.dg/pr108692.c: Ditto.
---
Inspired by Jakub's fixes earlier for other missing dg-run tests which clearly
needed execution.

Please commit if OK. Thanks.

 gcc/testsuite/c-c++-common/pr96369.c| 1 +
 gcc/testsuite/gcc.dg/pr108692.c | 1 +
 gcc/testsuite/gcc.dg/torture/pr102124.c | 1 +
 3 files changed, 3 insertions(+)

diff --git a/gcc/testsuite/c-c++-common/pr96369.c 
b/gcc/testsuite/c-c++-common/pr96369.c
index 8c468d9fec2f..9cb5aaa50420 100644
--- a/gcc/testsuite/c-c++-common/pr96369.c
+++ b/gcc/testsuite/c-c++-common/pr96369.c
@@ -1,4 +1,5 @@
 /* { dg-do compile } */
+/* { dg-do run } */
 /* { dg-options "-O" } */
 
 int main()
diff --git a/gcc/testsuite/gcc.dg/pr108692.c b/gcc/testsuite/gcc.dg/pr108692.c
index fc25bf54e45d..e5f8af5669bb 100644
--- a/gcc/testsuite/gcc.dg/pr108692.c
+++ b/gcc/testsuite/gcc.dg/pr108692.c
@@ -1,5 +1,6 @@
 /* PR tree-optimization/108692 */
 /* { dg-do compile } */
+/* { dg-do run } */
 /* { dg-options "-O2 -ftree-vectorize" } */
 
 __attribute__((noipa)) int
diff --git a/gcc/testsuite/gcc.dg/torture/pr102124.c 
b/gcc/testsuite/gcc.dg/torture/pr102124.c
index a158b4a60b69..a0eb01521242 100644
--- a/gcc/testsuite/gcc.dg/torture/pr102124.c
+++ b/gcc/testsuite/gcc.dg/torture/pr102124.c
@@ -1,4 +1,5 @@
 /* PR tree-optimization/102124 */
+/* { dg-do run } */
 
 int
 foo (const unsigned char *a, const unsigned char *b, unsigned long len)

-- 
2.45.2



Re: [PATCH] gimple-fold: Fix up __builtin_clear_padding lowering [PR115527]

2024-07-17 Thread Jakub Jelinek
On Wed, Jul 17, 2024 at 11:10:34AM +0200, Richard Biener wrote:
> OK.  It's a bit late for the 11 branch without some soaking on trunk - 
> when do we use __builtin_clear_padding?  IIRC for C++ atomics?

Apparently in GCC 11 only if one uses the builtin directly, I wrote the
support primarily for libstdc++, but it started using it only in GCC 13.
And GCC 12 and later uses it for -ftrivial-auto* and OpenMP atomics.

So maybe it won't be that serious problem if GCC 11.5 is without this fix.

Jakub



[PING][patch, avr] Implement PR90616: Improve adding symbols that are 256-byte aligned

2024-07-17 Thread Georg-Johann Lay

Ping #1 for

https://gcc.gnu.org/pipermail/gcc-patches/2024-July/656332.html


Address computation (usually an add) with symbols that are aligned
to 256 bytes does not require adding the lo8() part, as it is zero.

This patch adds a new combine insn that performs a widening add
from QImode plus such a symbol.  The case when such an aligned
symbol is added to a reg that's already in HImode can be handled
in the addhi3 asm printer.

Ok to apply?

Johann

--

AVR: target/90616 - Improve adding constants that are 0 mod 256.

This patch introduces a new insn that works as an insn combine
pattern for (plus:HI (zero_extend:HI (reg:QI)) const_0mod256_operand:HI),
which requires at most 2 instructions.  When the input register operand
is already in HImode, the addhi3 printer only adds the hi8 part when
it sees a SYMBOL_REF or CONST aligned to at least 256 bytes.
(The CONST_INT case was already handled).

gcc/
PR target/90616
* config/avr/predicates.md (const_0mod256_operand): New predicate.
* config/avr/constraints.md (Cp8): New constraint.
* config/avr/avr.md (*aligned_add_symbol): New insn.
* config/avr/avr.cc (avr_out_plus_symbol) [HImode]:
When op2 is a multiple of 256, there is no need to add / subtract
the lo8 part.
(avr_rtx_costs_1) [PLUS && HImode]: Return expected costs for
new insn *aligned_add_symbol as it applies.

pr90616-const_0mod256.diff

diff --git a/gcc/config/avr/avr.cc b/gcc/config/avr/avr.cc
index f048bf5fd41..014588dd6a7 100644
--- a/gcc/config/avr/avr.cc
+++ b/gcc/config/avr/avr.cc
@@ -9343,6 +9343,12 @@ avr_out_plus_symbol (rtx *xop, enum rtx_code code, int 
*plen)
 
   gcc_assert (mode == HImode || mode == PSImode);
 
+  if (mode == HImode
+  && const_0mod256_operand (xop[2], HImode))
+return avr_asm_len (PLUS == code
+   ? "subi %B0,hi8(-(%2))"
+   : "subi %B0,hi8(%2)", xop, plen, -1);
+
   avr_asm_len (PLUS == code
   ? "subi %A0,lo8(-(%2))" CR_TAB "sbci %B0,hi8(-(%2))"
   : "subi %A0,lo8(%2)"CR_TAB "sbci %B0,hi8(%2)",
@@ -12615,6 +12621,14 @@ avr_rtx_costs_1 (rtx x, machine_mode mode, int 
outer_code,
  *total = COSTS_N_INSNS (3);
  return true;
}
+  // *aligned_add_symbol
+  if (mode == HImode
+ && GET_CODE (XEXP (x, 0)) == ZERO_EXTEND
+ && const_0mod256_operand (XEXP (x, 1), HImode))
+   {
+ *total = COSTS_N_INSNS (1.5);
+ return true;
+   }
 
   if (GET_CODE (XEXP (x, 0)) == ZERO_EXTEND
  && REG_P (XEXP (x, 1)))
diff --git a/gcc/config/avr/avr.md b/gcc/config/avr/avr.md
index dabf4c0fc5a..72ea1292576 100644
--- a/gcc/config/avr/avr.md
+++ b/gcc/config/avr/avr.md
@@ -10077,6 +10077,23 @@ (define_expand "isinf2"
 FAIL;
   })
 
+
+;; PR90616: Adding symbols that are aligned to 256 bytes can
+;; save up to two instructions.
+(define_insn "*aligned_add_symbol"
+  [(set (match_operand:HI 0 "register_operand" "=d")
+(plus:HI (zero_extend:HI (match_operand:QI 1 "register_operand" "r"))
+ (match_operand:HI 2 "const_0mod256_operand"
"Cp8")))]
+  ""
+  {
+return REGNO (operands[0]) == REGNO (operands[1])
+  ? "ldi %B0,hi8(%2)"
+  : "mov %A0,%1\;ldi %B0,hi8(%2)";
+  }
+  [(set (attr "length")
+(symbol_ref ("2 - (REGNO (operands[0]) == REGNO (operands[1]))")))])
+
+
 
 ;; Fixed-point instructions

 (include "avr-fixed.md")
diff --git a/gcc/config/avr/constraints.md b/gcc/config/avr/constraints.md
index b4e5525d197..35448614aa7 100644
--- a/gcc/config/avr/constraints.md
+++ b/gcc/config/avr/constraints.md
@@ -253,6 +253,11 @@ (define_constraint "Cn8"
   (and (match_code "const_int")
(match_test "IN_RANGE (ival, -255, -1)")))
 
+(define_constraint "Cp8"
+  "A constant integer or symbolic operand that is at least .p2align 8."
+  (and (match_code "const_int,symbol_ref,const")
+   (match_test "const_0mod256_operand (op, HImode)")))
+
 ;; CONST_FIXED is no element of 'n' so cook our own.
 ;; "i" or "s" would match but because the insn uses iterators that cover
 ;; INT_MODE, "i" or "s" is not always possible.
diff --git a/gcc/config/avr/predicates.md b/gcc/config/avr/predicates.md
index 12013660ed1..5b49481ff0f 100644
--- a/gcc/config/avr/predicates.md
+++ b/gcc/config/avr/predicates.md
@@ -171,6 +171,20 @@ (define_predicate "reg_or_0_operand"
 (define_predicate "symbol_ref_operand"
   (match_code "symbol_ref"))
 
+;; Returns true when OP is a SYMBOL_REF, CONST or CONST_INT that is
+;; a multiple of 256, i.e. lo8(OP) = 0.
+(define_predicate "const_0mod256_operand"
+  (ior (and (match_code "symbol_ref")
+(match_test "SYMBOL_REF_DECL (op)
+ && DECL_P (SYMBOL_REF_DECL (op))
+ && DECL_ALIGN (SYMBOL_REF_DECL (op)) >= 8 * 256"))
+   (and (match_code "const")
+(match_test "GET_CODE (XEXP (op, 0)) == PLUS")
+(match_test "const_0mod256_oper

Re: [PATCH] gimple-fold: Fix up __builtin_clear_padding lowering [PR115527]

2024-07-17 Thread Richard Biener
On Wed, 17 Jul 2024, Jakub Jelinek wrote:

> On Wed, Jul 17, 2024 at 11:10:34AM +0200, Richard Biener wrote:
> > OK.  It's a bit late for the 11 branch without some soaking on trunk - 
> > when do we use __builtin_clear_padding?  IIRC for C++ atomics?
> 
> Apparently in GCC 11 only if one uses the builtin directly, I wrote the
> support primarily for libstdc++, but it started using it only in GCC 13.
> And GCC 12 and later uses it for -ftrivial-auto* and OpenMP atomics.
> 
> So maybe it won't be that serious problem if GCC 11.5 is without this fix.

OTOH that also means pushing it there should be quite safe.

Richard.


[PATCH v1] Match: Bugfix .SAT_TRUNC honor types has no mode precision [PR115961]

2024-07-17 Thread pan2 . li
From: Pan Li 

The .SAT_TRUNC matching doesn't check that the type has mode precision.
Thus a bitfield like the one below will be recognized as .SAT_TRUNC.

struct e
{
  unsigned pre : 12;
  unsigned a : 4;
};

__attribute__((noipa))
void bug (e * v, unsigned def, unsigned use) {
  e & defE = *v;
  defE.a = min_u (use + 1, 0xf);
}

This patch adds a type_has_mode_precision_p check to the .SAT_TRUNC
matching to reject such cases.

The below test suites are passed for this patch:
1. The rv64gcv fully regression tests.
2. The x86 bootstrap tests.
3. The x86 fully regression tests.

PR target/115961

gcc/ChangeLog:

* match.pd: Add type_has_mode_precision_p check for .SAT_TRUNC.

gcc/testsuite/ChangeLog:

* g++.target/i386/pr115961-run-1.C: New test.
* g++.target/riscv/rvv/base/pr115961-run-1.C: New test.

Signed-off-by: Pan Li 
---
 gcc/match.pd  |  4 +--
 .../g++.target/i386/pr115961-run-1.C  | 34 +++
 .../riscv/rvv/base/pr115961-run-1.C   | 34 +++
 3 files changed, 70 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/i386/pr115961-run-1.C
 create mode 100644 gcc/testsuite/g++.target/riscv/rvv/base/pr115961-run-1.C

diff --git a/gcc/match.pd b/gcc/match.pd
index 24a0bbead3e..8121ec09f53 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -3240,7 +3240,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (bit_ior:c (negate (convert (gt @0 INTEGER_CST@1)))
(convert @0))
  (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
-  && TYPE_UNSIGNED (TREE_TYPE (@0)))
+  && TYPE_UNSIGNED (TREE_TYPE (@0)) && type_has_mode_precision_p (type))
  (with
   {
unsigned itype_precision = TYPE_PRECISION (TREE_TYPE (@0));
@@ -3255,7 +3255,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (match (unsigned_integer_sat_trunc @0)
  (convert (min @0 INTEGER_CST@1))
  (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
-  && TYPE_UNSIGNED (TREE_TYPE (@0)))
+  && TYPE_UNSIGNED (TREE_TYPE (@0)) && type_has_mode_precision_p (type))
  (with
   {
unsigned itype_precision = TYPE_PRECISION (TREE_TYPE (@0));
diff --git a/gcc/testsuite/g++.target/i386/pr115961-run-1.C b/gcc/testsuite/g++.target/i386/pr115961-run-1.C
new file mode 100644
index 000..b8c8aef3b17
--- /dev/null
+++ b/gcc/testsuite/g++.target/i386/pr115961-run-1.C
@@ -0,0 +1,34 @@
+/* PR target/115961 */
+/* { dg-do run } */
+/* { dg-options "-O3 -fdump-rtl-expand-details" } */
+
+struct e
+{
+  unsigned pre : 12;
+  unsigned a : 4;
+};
+
+static unsigned min_u (unsigned a, unsigned b)
+{
+  return (b < a) ? b : a;
+}
+
+__attribute__((noipa))
+void bug (e * v, unsigned def, unsigned use) {
+  e & defE = *v;
+  defE.a = min_u (use + 1, 0xf);
+}
+
+__attribute__((noipa, optimize(0)))
+int main(void)
+{
+  e v = { 0xded, 3 };
+
+  bug(&v, 32, 33);
+
+  if (v.a != 0xf)
+__builtin_abort ();
+
+  return 0;
+}
+/* { dg-final { scan-rtl-dump-not ".SAT_TRUNC " "expand" } } */
diff --git a/gcc/testsuite/g++.target/riscv/rvv/base/pr115961-run-1.C b/gcc/testsuite/g++.target/riscv/rvv/base/pr115961-run-1.C
new file mode 100644
index 000..b8c8aef3b17
--- /dev/null
+++ b/gcc/testsuite/g++.target/riscv/rvv/base/pr115961-run-1.C
@@ -0,0 +1,34 @@
+/* PR target/115961 */
+/* { dg-do run } */
+/* { dg-options "-O3 -fdump-rtl-expand-details" } */
+
+struct e
+{
+  unsigned pre : 12;
+  unsigned a : 4;
+};
+
+static unsigned min_u (unsigned a, unsigned b)
+{
+  return (b < a) ? b : a;
+}
+
+__attribute__((noipa))
+void bug (e * v, unsigned def, unsigned use) {
+  e & defE = *v;
+  defE.a = min_u (use + 1, 0xf);
+}
+
+__attribute__((noipa, optimize(0)))
+int main(void)
+{
+  e v = { 0xded, 3 };
+
+  bug(&v, 32, 33);
+
+  if (v.a != 0xf)
+__builtin_abort ();
+
+  return 0;
+}
+/* { dg-final { scan-rtl-dump-not ".SAT_TRUNC " "expand" } } */
-- 
2.34.1



Re: [PATCH] testsuite: Add dg-do run to more tests

2024-07-17 Thread Jakub Jelinek
On Wed, Jul 17, 2024 at 10:27:53AM +0100, Sam James wrote:
> All of these are for wrong-code bugs. Confirmed to be used before but
> with no execution.
> 
> 2024-07-17  Sam James 

2 spaces instead of one before <.

> 
>   PR/96369
>   PR/102124
>   PR/108692
>   * c-c++-common/pr96369.c: Add dg-do run directive.
>   * gcc.dg/torture/pr102124.c: Ditto.
>   * gcc.dg/pr108692.c: Ditto.
> ---
> Inspired by Jakub's fixes earlier for other missing dg-run tests which clearly
> needed execution.
> 
> Please commit if OK. Thanks.
> 
>  gcc/testsuite/c-c++-common/pr96369.c| 1 +
>  gcc/testsuite/gcc.dg/pr108692.c | 1 +
>  gcc/testsuite/gcc.dg/torture/pr102124.c | 1 +
>  3 files changed, 3 insertions(+)
> 
> diff --git a/gcc/testsuite/c-c++-common/pr96369.c b/gcc/testsuite/c-c++-common/pr96369.c
> index 8c468d9fec2f..9cb5aaa50420 100644
> --- a/gcc/testsuite/c-c++-common/pr96369.c
> +++ b/gcc/testsuite/c-c++-common/pr96369.c
> @@ -1,4 +1,5 @@
>  /* { dg-do compile } */
> +/* { dg-do run } */

Please don't do this.
If the test should be run, there shouldn't be a dg-do compile directive (the
exception is when the two dg-do directives are conditional on effective targets
or similar and you want the test to be dg-do compile in some cases and dg-do
run in others).
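For reference, the conditional pattern described above looks roughly like this (an illustrative fragment; the lp64 selector is just an example of an effective-target check):

```
/* Unconditional: the test is always executed, no dg-do compile needed.  */
/* { dg-do run } */

/* Conditional alternative: run where possible, otherwise only compile.  */
/* { dg-do run { target lp64 } } */
/* { dg-do compile { target { ! lp64 } } } */
```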

>  /* { dg-options "-O" } */
>  
>  int main()
> diff --git a/gcc/testsuite/gcc.dg/pr108692.c b/gcc/testsuite/gcc.dg/pr108692.c
> index fc25bf54e45d..e5f8af5669bb 100644
> --- a/gcc/testsuite/gcc.dg/pr108692.c
> +++ b/gcc/testsuite/gcc.dg/pr108692.c
> @@ -1,5 +1,6 @@
>  /* PR tree-optimization/108692 */
>  /* { dg-do compile } */
> +/* { dg-do run } */

Ditto.

>  /* { dg-options "-O2 -ftree-vectorize" } */
>  
>  __attribute__((noipa)) int
> diff --git a/gcc/testsuite/gcc.dg/torture/pr102124.c b/gcc/testsuite/gcc.dg/torture/pr102124.c
> index a158b4a60b69..a0eb01521242 100644
> --- a/gcc/testsuite/gcc.dg/torture/pr102124.c
> +++ b/gcc/testsuite/gcc.dg/torture/pr102124.c
> @@ -1,4 +1,5 @@
>  /* PR tree-optimization/102124 */
> +/* { dg-do run } */
>  
>  int
>  foo (const unsigned char *a, const unsigned char *b, unsigned long len)
> 
> -- 
> 2.45.2

Jakub



RE: [RFC][middle-end] SLP Early break and control flow support in GCC

2024-07-17 Thread Tamar Christina
> -Original Message-
> From: Richard Biener 
> Sent: Tuesday, July 16, 2024 4:08 PM
> To: Tamar Christina 
> Cc: GCC Patches ; Richard Sandiford
> 
> Subject: Re: [RFC][middle-end] SLP Early break and control flow support in GCC
> 
> On Mon, 15 Jul 2024, Tamar Christina wrote:
> 
> > Hi All,
> >
> > This RFC document covers at a high level how to extend early break support in
> > GCC to support SLP and how this will be extended in the future to support
> > full control flow in GCC.
> >
> > The basic idea in this is based on the paper "All You Need Is Superword-Level
> > Parallelism: Systematic Control-Flow Vectorization with SLP"[1] but it is
> > adapted to fit within the GCC vectorizer.
> 
> An interesting read - I think the approach is viable for loop
> vectorization where we schedule the whole vectorized loop but difficult
> for basic-block vectorization as they seem to re-build the whole function
> from scratch.  They also do not address how to code-generate predicated
> not vectorized code or how they decide to handle mixed vector/non-vector
> code at all.  For example I don't think any current CPU architecture
> supports a full set of predicated _scalar_ ops and not every scalar
> op would have a vector equivalent in case one would use single-lane
> vectors.

Hmm, I'm guessing you mean here that they don't address, for BB vectorization,
how to deal with the fact that you may not always be able to vectorize the
entire function up from the seed?  I thought today we dealt with that by
splitting during discovery?  Can we not do the same, i.e. treat them as
externals?

> 
> For GCCs basic-block vectorization the main outstanding issue is one
> of scheduling and dependences with scalar code (live lane extracts,
> vector builds from scalars) as well.
> 
> > What will not be covered is the support for First-Faulting Loads or
> > alignment peeling, as these are done by a different engineer.
> >
> > == Concept and Theory ==
> >
> > Supporting Early Break in SLP requires the same theory as general control flow,
> > the only difference is in that Early break can be supported for non-masked
> > architectures while full control flow support requires a fully masked
> > architecture.
> >
> > In GCC 14 Early break was added for non-SLP by teaching the vectorizer to branch
> > to a scalar epilogue whenever an early exit is taken.  This means the vectorizer
> > itself doesn't need to know how to perform side-effects or final reductions.
> 
> With current GCC we avoid the need for predicated stmts by using the scalar
> epilog and a branch-on-all-true/false stmt.  To make that semantically
> valid, stmts are moved around.
> 
> > The additional benefit of this is that it works for any target providing a
> > vector cbranch optab implementation since we don't require masking support.  In
> > order for this to work we need to have code motion that moves side effects
> > (primarily stores) into the latch basic block.  i.e. we delay any side-effects
> > up to a point where we know the full vector iteration would have been performed.
> > For this to work however we had to disable support for epilogue vectorization as
> > when the scalar statements are moved we break the link to the original BB they
> > were in.  This means that the stmt_vinfo for the stores that need to be moved
> > will no longer be valid for epilogue vectorization and the only way to recover
> > this would be to perform a full dataflow analysis again.  We decided against
> > this as the plan of record was to switch to SLP.
> >
> > -- Predicated SSA --
> >
> > The core of the proposal is to support a form of Predicated SSA (PSSA) in the
> > vectorizer [2]. The idea is to assign a control predicate to every SSA
> > statement.  This control predicate denotes their dependencies wrt the BB they
> > are in and defines the dependencies between statements.
> >
> > As an example the following early break code:
> >
> > unsigned test4(unsigned x)
> > {
> >  unsigned ret = 0;
> >  for (int i = 0; i < N; i++)
> >  {
> >vect_b[i] = x + i;
> >if (vect_a[i]*2 != x)
> >  break;
> >vect_a[i] = x;
> >
> >  }
> >  return ret;
> > }
> >
> > is transformed into
> >
> > unsigned test4(unsigned x)
> > {
> >  unsigned ret = 0;
> >  for (int i = 0; i < N; i++)
> >  {
> >vect_b[i] = x + i;  :: !(vect_a[i]*2 != x)
> 
> why's this not 'true'?
> 

I think there are three cases here:

1. Control flow with no exit == true: in this case we can perform
   statements with side-effects in place, as you never exit the loop early.
2. Early break, non-masked: this one requires the store to be moved to the
   latch, or to the block with the last early break.  This is what we do
   today, and the predicate would cause it to be moved.
3. Early break, masked: in this case I think we need an exit block that
   performs any side effects masked, since on every exit you must still
   perform the stores, but 

Re: [PATCH v2] gimple-fold: consistent dump of builtin call simplifications

2024-07-17 Thread Richard Biener
On Tue, Jul 16, 2024 at 9:30 PM rubin.gerritsen  wrote:
>
> Changes since v1:
>  * Added DCO signoff
>  * Removed tabs from commit message
>
> --
> Previously only simplifications of the `__st[xrp]cpy_chk`
> builtins were dumped. Now all call replacement simplifications are
> dumped.
>
> Examples of statements with corresponding dumpfile entries:
>
> `printf("mystr\n");`:
>   optimized: simplified printf to __builtin_puts
> `printf("%c", 'a');`:
>   optimized: simplified printf to __builtin_putchar
> `printf("%s\n", "mystr");`:
>   optimized: simplified printf to __builtin_puts
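The folds quoted above can be exercised with a small standalone program (a sketch; the exact conditions under which each fold is applied depend on the GCC version and optimization level):

```c
#include <assert.h>
#include <stdio.h>

/* Each of these calls ignores the printf result, making them candidates
   for the puts/putchar folds whose dump messages are quoted above.  */
void
emit_messages (void)
{
  printf ("mystr\n");        /* may become __builtin_puts ("mystr")   */
  printf ("%c", 'a');        /* may become __builtin_putchar ('a')    */
  printf ("%s\n", "mystr");  /* may become __builtin_puts ("mystr")   */
}

/* printf returns the number of characters written; puts does not, which
   is one reason such folds are generally applied only when the result
   is unused.  */
int
message_length (void)
{
  return printf ("ok\n");
}
```

Compiling with -O2 and -fopt-info-optimized (or a -fdump-tree dump) would show the "simplified ... to ..." messages for the folded calls.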
>
> The below test suites passed for this patch
> * The x86 bootstrap test.
> * Manual testing with some small example code manually
>   examining dump logs, outputting the lines mentioned above.

OK.

I'll push this for you.

Richard.

> gcc/ChangeLog:
>
> * gimple-fold.cc (dump_transformation): Moved definition.
> (replace_call_with_call_and_fold): Calls dump_transformation.
> (gimple_fold_builtin_stxcpy_chk): Removes call to
> dump_transformation, now in replace_call_with_call_and_fold.
> (gimple_fold_builtin_stxncpy_chk): Removes call to
> dump_transformation, now in replace_call_with_call_and_fold.
>
> Signed-off-by: Rubin Gerritsen 
> ---
>  gcc/gimple-fold.cc | 22 ++
>  1 file changed, 10 insertions(+), 12 deletions(-)
>
> diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
> index 7c534d56bf1..b20d3a2ff9a 100644
> --- a/gcc/gimple-fold.cc
> +++ b/gcc/gimple-fold.cc
> @@ -802,6 +802,15 @@ gimplify_and_update_call_from_tree (gimple_stmt_iterator *si_p, tree expr)
>gsi_replace_with_seq_vops (si_p, stmts);
>  }
>
> +/* Print a message in the dump file recording transformation of FROM to TO.  */
> +
> +static void
> +dump_transformation (gcall *from, gcall *to)
> +{
> +  if (dump_enabled_p ())
> +dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, from, "simplified %T to %T\n",
> +  gimple_call_fn (from), gimple_call_fn (to));
> +}
>
>  /* Replace the call at *GSI with the gimple value VAL.  */
>
> @@ -835,6 +844,7 @@ static void
>  replace_call_with_call_and_fold (gimple_stmt_iterator *gsi, gimple *repl)
>  {
>gimple *stmt = gsi_stmt (*gsi);
> +  dump_transformation (as_a  (stmt), as_a  (repl));
>gimple_call_set_lhs (repl, gimple_call_lhs (stmt));
>gimple_set_location (repl, gimple_location (stmt));
>gimple_move_vops (repl, stmt);
> @@ -3090,16 +3100,6 @@ gimple_fold_builtin_memory_chk (gimple_stmt_iterator *gsi,
>return true;
>  }
>
> -/* Print a message in the dump file recording transformation of FROM to TO.  */
> -
> -static void
> -dump_transformation (gcall *from, gcall *to)
> -{
> -  if (dump_enabled_p ())
> -dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, from, "simplified %T to %T\n",
> -  gimple_call_fn (from), gimple_call_fn (to));
> -}
> -
>  /* Fold a call to the __st[rp]cpy_chk builtin.
> DEST, SRC, and SIZE are the arguments to the call.
> IGNORE is true if return value can be ignored.  FCODE is the BUILT_IN_*
> @@ -3189,7 +3189,6 @@ gimple_fold_builtin_stxcpy_chk (gimple_stmt_iterator *gsi,
>  return false;
>
>gcall *repl = gimple_build_call (fn, 2, dest, src);
> -  dump_transformation (stmt, repl);
>replace_call_with_call_and_fold (gsi, repl);
>return true;
>  }
> @@ -3235,7 +3234,6 @@ gimple_fold_builtin_stxncpy_chk (gimple_stmt_iterator *gsi,
>  return false;
>
>gcall *repl = gimple_build_call (fn, 3, dest, src, len);
> -  dump_transformation (stmt, repl);
>replace_call_with_call_and_fold (gsi, repl);
>return true;
>  }
> --
> 2.34.1
>


[PATCH] RISC-V: More support of vx and vf for autovec comparison

2024-07-17 Thread demin.han
There are still some cases which can't utilize vx or vf for autovec
comparison after the late_combine pass.

1. integer comparison when imm isn't in range of [-16, 15]
2. float imm is 0.0
3. DI or DF mode under RV32

This patch fixes the above mentioned issues.

Tested on RV32 and RV64.
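Loops of roughly the following shape illustrate cases 1 and 2 (hypothetical examples, not taken from the testsuite); for these, a vector-scalar compare such as vmsgt.vx or vmfgt.vf is preferable to materializing a whole vector constant:

```c
/* Case 1: integer immediate outside [-16, 15], too large for a .vi
   compare form, so a .vx vector-scalar compare is the natural choice.  */
void
gt100 (int *restrict out, const int *restrict a, int n)
{
  for (int i = 0; i < n; i++)
    out[i] = a[i] > 100 ? 1 : 0;
}

/* Case 2: float immediate 0.0, a candidate for a .vf compare against
   a scalar FP register.  */
void
gt0 (int *restrict out, const float *restrict a, int n)
{
  for (int i = 0; i < n; i++)
    out[i] = a[i] > 0.0f ? 1 : 0;
}
```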

gcc/ChangeLog:

* config/riscv/autovec.md: Change register_operand to nonmemory_operand.
* config/riscv/riscv-v.cc (get_cmp_insn_code): Select code according
to scalar_p.
(expand_vec_cmp): Generate scalar_p and transform op1.
* config/riscv/riscv.cc (riscv_const_insns): Add !FLOAT_MODE_P
constraint.
* config/riscv/vector.md: Add !FLOAT_MODE_P constraint.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vadd-rv32gcv-nofm.c: Fix test
* gcc.target/riscv/rvv/autovec/binop/vdiv-rv32gcv-nofm.c: Ditto
* gcc.target/riscv/rvv/autovec/binop/vmul-rv32gcv-nofm.c: Ditto
* gcc.target/riscv/rvv/autovec/binop/vsub-rv32gcv-nofm.c: Ditto
* gcc.target/riscv/rvv/autovec/cmp/vcond-1.c: Fix and add test
* gcc.target/riscv/rvv/autovec/cond/cond_copysign-rv32gcv.c: Fix
* gcc.target/riscv/rvv/autovec/cond/cond_fadd-1.c: Fix test
* gcc.target/riscv/rvv/autovec/cond/cond_fadd-2.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fadd-3.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fadd-4.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-1.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-3.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-4.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-5.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-6.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fmax-1.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fmax-2.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fmax-3.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fmax-4.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fmin-1.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fmin-2.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fmin-3.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fmin-4.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-1.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-3.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-4.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-5.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-6.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fmul-1.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fmul-2.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fmul-3.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fmul-4.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fmul-5.c: Ditto

Signed-off-by: demin.han 
---
 gcc/config/riscv/autovec.md   |  2 +-
 gcc/config/riscv/riscv-v.cc   | 72 ---
 gcc/config/riscv/riscv.cc |  2 +-
 gcc/config/riscv/vector.md|  3 +-
 .../rvv/autovec/binop/vadd-rv32gcv-nofm.c |  4 +-
 .../rvv/autovec/binop/vdiv-rv32gcv-nofm.c |  4 +-
 .../rvv/autovec/binop/vmul-rv32gcv-nofm.c |  4 +-
 .../rvv/autovec/binop/vsub-rv32gcv-nofm.c |  4 +-
 .../riscv/rvv/autovec/cmp/vcond-1.c   | 48 -
 .../rvv/autovec/cond/cond_copysign-rv32gcv.c  |  8 +--
 .../riscv/rvv/autovec/cond/cond_fadd-1.c  |  4 +-
 .../riscv/rvv/autovec/cond/cond_fadd-2.c  |  4 +-
 .../riscv/rvv/autovec/cond/cond_fadd-3.c  |  4 +-
 .../riscv/rvv/autovec/cond/cond_fadd-4.c  |  4 +-
 .../riscv/rvv/autovec/cond/cond_fma_fnma-1.c  |  4 +-
 .../riscv/rvv/autovec/cond/cond_fma_fnma-3.c  |  4 +-
 .../riscv/rvv/autovec/cond/cond_fma_fnma-4.c  |  4 +-
 .../riscv/rvv/autovec/cond/cond_fma_fnma-5.c  |  4 +-
 .../riscv/rvv/autovec/cond/cond_fma_fnma-6.c  |  4 +-
 .../riscv/rvv/autovec/cond/cond_fmax-1.c  |  4 +-
 .../riscv/rvv/autovec/cond/cond_fmax-2.c  |  4 +-
 .../riscv/rvv/autovec/cond/cond_fmax-3.c  |  4 +-
 .../riscv/rvv/autovec/cond/cond_fmax-4.c  |  4 +-
 .../riscv/rvv/autovec/cond/cond_fmin-1.c  |  4 +-
 .../riscv/rvv/autovec/cond/cond_fmin-2.c  |  4 +-
 .../riscv/rvv/autovec/cond/cond_fmin-3.c  |  4 +-
 .../riscv/rvv/autovec/cond/cond_fmin-4.c  |  4 +-
 .../riscv/rvv/autovec/cond/cond_fms_fnms-1.c  |  4 +-
 .../riscv/rvv/autovec/cond/cond_fms_fnms-3.c  |  4 +-
 .../riscv/rvv/autovec/cond/cond_fms_fnms-4.c  |  4 +-
 .../riscv/rvv/autovec/cond/cond_fms_fnms-5.c  |  4 +-
 .../riscv/rvv/autovec/cond/cond_fms_fnms-6.c  |  4 +-
 .../riscv/rvv/autovec/cond/cond_fmul-1.c  |  4 +-
 .../riscv/rvv/autovec/cond/cond_fmul-2.c  |  4 +-
 .../riscv/rvv/autovec/cond/cond_fmul-3.c  |  4 +-
 .../riscv/rvv/autovec/cond/cond_fmul-4.c  |  4 +-
 .../riscv/rvv/autovec/co

Re: [PATCH v2] gimple-fold: consistent dump of builtin call simplifications

2024-07-17 Thread Richard Biener
On Wed, Jul 17, 2024 at 12:47 PM Richard Biener
 wrote:
>
> On Tue, Jul 16, 2024 at 9:30 PM rubin.gerritsen  wrote:
> >
> > Changes since v1:
> >  * Added DCO signoff
> >  * Removed tabs from commit message
> >
> > --
> > Previously only simplifications of the `__st[xrp]cpy_chk`
> > builtins were dumped. Now all call replacement simplifications are
> > dumped.
> >
> > Examples of statements with corresponding dumpfile entries:
> >
> > `printf("mystr\n");`:
> >   optimized: simplified printf to __builtin_puts
> > `printf("%c", 'a');`:
> >   optimized: simplified printf to __builtin_putchar
> > `printf("%s\n", "mystr");`:
> >   optimized: simplified printf to __builtin_puts
> >
> > The below test suites passed for this patch
> > * The x86 bootstrap test.
> > * Manual testing with some small example code manually
> >   examining dump logs, outputting the lines mentioned above.
>
> OK.
>
> I'll push this for you.

Can you please post the patch as generated by
git format-patch and attach it?  I have problems with your
mailer wrapping lines, and even with that fixed the patch
does not apply with git am.

Richard.

> Richard.
>
> > gcc/ChangeLog:
> >
> > * gimple-fold.cc (dump_transformation): Moved definition.
> > (replace_call_with_call_and_fold): Calls dump_transformation.
> > (gimple_fold_builtin_stxcpy_chk): Removes call to
> > dump_transformation, now in replace_call_with_call_and_fold.
> > (gimple_fold_builtin_stxncpy_chk): Removes call to
> > dump_transformation, now in replace_call_with_call_and_fold.
> >
> > Signed-off-by: Rubin Gerritsen 
> > ---
> >  gcc/gimple-fold.cc | 22 ++
> >  1 file changed, 10 insertions(+), 12 deletions(-)
> >
> > diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
> > index 7c534d56bf1..b20d3a2ff9a 100644
> > --- a/gcc/gimple-fold.cc
> > +++ b/gcc/gimple-fold.cc
> > @@ -802,6 +802,15 @@ gimplify_and_update_call_from_tree (gimple_stmt_iterator *si_p, tree expr)
> >gsi_replace_with_seq_vops (si_p, stmts);
> >  }
> >
> > +/* Print a message in the dump file recording transformation of FROM to TO.  */
> > +
> > +static void
> > +dump_transformation (gcall *from, gcall *to)
> > +{
> > +  if (dump_enabled_p ())
> > +dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, from, "simplified %T to %T\n",
> > +  gimple_call_fn (from), gimple_call_fn (to));
> > +}
> >
> >  /* Replace the call at *GSI with the gimple value VAL.  */
> >
> > @@ -835,6 +844,7 @@ static void
> >  replace_call_with_call_and_fold (gimple_stmt_iterator *gsi, gimple *repl)
> >  {
> >gimple *stmt = gsi_stmt (*gsi);
> > +  dump_transformation (as_a  (stmt), as_a  (repl));
> >gimple_call_set_lhs (repl, gimple_call_lhs (stmt));
> >gimple_set_location (repl, gimple_location (stmt));
> >gimple_move_vops (repl, stmt);
> > @@ -3090,16 +3100,6 @@ gimple_fold_builtin_memory_chk (gimple_stmt_iterator *gsi,
> >return true;
> >  }
> >
> > -/* Print a message in the dump file recording transformation of FROM to TO.  */
> > -
> > -static void
> > -dump_transformation (gcall *from, gcall *to)
> > -{
> > -  if (dump_enabled_p ())
> > -dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, from, "simplified %T to %T\n",
> > -  gimple_call_fn (from), gimple_call_fn (to));
> > -}
> > -
> >  /* Fold a call to the __st[rp]cpy_chk builtin.
> > DEST, SRC, and SIZE are the arguments to the call.
> > IGNORE is true if return value can be ignored.  FCODE is the BUILT_IN_*
> > @@ -3189,7 +3189,6 @@ gimple_fold_builtin_stxcpy_chk (gimple_stmt_iterator *gsi,
> >  return false;
> >
> >gcall *repl = gimple_build_call (fn, 2, dest, src);
> > -  dump_transformation (stmt, repl);
> >replace_call_with_call_and_fold (gsi, repl);
> >return true;
> >  }
> > @@ -3235,7 +3234,6 @@ gimple_fold_builtin_stxncpy_chk (gimple_stmt_iterator *gsi,
> >  return false;
> >
> >gcall *repl = gimple_build_call (fn, 3, dest, src, len);
> > -  dump_transformation (stmt, repl);
> >replace_call_with_call_and_fold (gsi, repl);
> >return true;
> >  }
> > --
> > 2.34.1
> >


Re: [RFC] Proposal to support Packed Boolean Vector masks.

2024-07-17 Thread Richard Biener
On Wed, Jul 17, 2024 at 10:17 AM Tejas Belagod  wrote:
>
> On 7/15/24 6:05 PM, Richard Biener wrote:
> > On Mon, Jul 15, 2024 at 1:22 PM Tejas Belagod  wrote:
> >>
> >> On 7/15/24 12:16 PM, Tejas Belagod wrote:
> >>> On 7/12/24 6:40 PM, Richard Biener wrote:
>  On Fri, Jul 12, 2024 at 3:05 PM Jakub Jelinek  wrote:
> >
> > On Fri, Jul 12, 2024 at 02:56:53PM +0200, Richard Biener wrote:
> >> Padding is only an issue for very small vectors - the obvious choice is
> >> to disallow vector types that would require any padding.  I can hardly
> >> see where those are faster than using a vector of up to 4 char
> >> elements.
> >> Problematic are 1-bit elements with 4, 2 or one element vectors, 2-bit
> >> elements with 2 or one element vectors and 4-bit elements with 1 element
> >> vectors.
> >
> > I'd really like to avoid having to support something like
> > _BitInt(16372) __attribute__((vector_size (sizeof (_BitInt(16372)) * 16)))
> > _BitInt(2) to say size of long long could be acceptable.
> 
>  I'd disallow _BitInt(n) with n >= 8, it should be just the syntactic way
>  to say the element should have n (< 8) bits.
> 
> >> I have no idea what the stance of supporting _BitInt in C++ are,
> >> but most certainly diverging support (or even semantics) of the
> >> vector extension in C vs. C++ is undesirable.
> >
> > I believe Clang supports it in C++ next to C, GCC doesn't and Jason didn't
> > look favorably to _BitInt support in C++, so at least until something like
> > that is standardized in C++ the answer is probably no.
> 
>  OK, I think that rules out _BitInt use here so while bool is then natural
>  for 1-bit elements, for 2-bit and 4-bit elements we'd have to specify the
>  number of bits explicitly.  There is signed_bool_precision but like
>  vector_mask its use is restricted to the GIMPLE frontend because
>  interaction with the rest of the language isn't defined.
> 
> >>>
> >>> Thanks for all the suggestions - really insightful (to me) discussions.
> >>>
> >>> Yeah, BitInt seemed like it was best placed for this, but not having C++
> >>> support is definitely a blocker. But as you say, in the absence of
> >>> BitInt, bool becomes the natural choice for bit sizes 1, 2 and 4. One
> >>> way to specify non-1-bit widths could be overloading vector_size.
> >>>
> >>> Also, I think overloading GIMPLE's vector_mask takes us into the
> >>> earlier-discussed territory of what it should actually mean - it meaning
> >>> the target truth type in GIMPLE and a generic vector extension in the FE
> >>> will probably confuse gcc developers more than users.
> >>>
>  That said - we're mixing two things here.  The desire to have "proper"
>  svbool (fix: declare in the backend) and the desire to have "packed"
>  bit-precision vectors (for whatever actual reason) as part of the
>  GCC vector extension.
> 
> >>>
> >>> If we leave lane-disambiguation of svbool to the backend, the values I
> >>> see in supporting 1, 2 and 4 bitsizes are 1) first step towards
> >>> supporting BitInt(N) vectors possibly in the future 2) having a way for
> >>> targets to define their intrinsics' bool vector types using GNU
> >>> extensions 3) feature parity with Clang's ext_vector_type?
> >>>
> >>> I believe the primary motivation for Clang to support ext_vector_type
> >>> was to have a way to define target intrinsics' vector bool type using
> >>> vector extensions.
> >>>
> >>
> >>
> >> Interestingly, Clang seems to support
> >>
> >> typedef struct {
> >>   _Bool i:1;
> >> } STR;
> >>
> >> typedef struct { _Bool i: 1; } __attribute__((vector_size (sizeof (STR) * 4))) vec;
> >>
> >>
> >> int foo (vec b) {
> >>  return sizeof b;
> >> }
> >>
> >> I can't find documentation about how it is implemented, but I suspect
> >> the vector is constructed as an array STR[] i.e. possibly each
> >> bit-element padded to byte boundary etc. Also, I can't seem to apply
> >> many operations other than sizeof.
> >>
> >> I don't know if we've tried to support such cases in GNU in the past?
> >
> > Why should we do that?  It doesn't make much sense.
> >
> > single-bit vectors is what _BitInt was invented for.
>
> Forgive me if I'm misunderstanding - I'm trying to figure out how
> _BitInts can be made to have single-bit generic vector semantics. For
> eg. If I want to initialize a _BitInt as vector, I can't do:
>
>   _BitInt (4) a = (_BitInt (4)){1, 0, 1, 1};
>
> as 'a' expects a scalar initialization.
>
> Or if I want to convert an int vector to a bit vector, I can't do
>
>v4si_p = v4si_a > v4si_b;
>_BitInt (4) vbool = __builtin_convertvector (v4si_p, _BitInt (4));
>
> Also semantics of conditionals with _BitInt behave like scalars
>
>_BitInt (4) p = a && b; // Here a and b are _BitInt (4), but they
> behave as scalars.
>
> Also, I can't do things like
>
>typedef 
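For contrast, with today's GNU vector extension a comparison already yields a full-width integer mask vector (each lane all-ones or zero); a packed boolean vector type would store the same per-lane information in 1, 2 or 4 bits. A minimal sketch of the current semantics:

```c
#include <assert.h>

typedef int v4si __attribute__ ((vector_size (16)));

/* Comparisons on GNU extension vectors produce an integer mask vector:
   each lane is -1 (all ones) where the predicate holds, else 0.  */
v4si
make_mask (v4si a, v4si b)
{
  return a > b;
}

/* Masks combine with the usual bitwise operations.  */
v4si
and_masks (v4si m1, v4si m2)
{
  return m1 & m2;
}
```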

Re: [PATCH] RISC-V: More support of vx and vf for autovec comparison

2024-07-17 Thread juzhe.zh...@rivai.ai
Thanks for supporting vf/vx transforming.
I'd rather let Robin review this patch.



juzhe.zh...@rivai.ai
 
From: demin.han
Date: 2024-07-17 18:55
To: gcc-patches
CC: juzhe.zhong; kito.cheng; pan2.li; jeffreyalaw; rdapp.gcc
Subject: [PATCH] RISC-V: More support of vx and vf for autovec comparison
There are still some cases which can't utilize vx or vf for autovec
comparison after the late_combine pass.
 
1. integer comparison when imm isn't in range of [-16, 15]
2. float imm is 0.0
3. DI or DF mode under RV32
 
This patch fixes the above mentioned issues.
 
Tested on RV32 and RV64.
 
gcc/ChangeLog:
 
* config/riscv/autovec.md: Change register_operand to nonmemory_operand.
* config/riscv/riscv-v.cc (get_cmp_insn_code): Select code according
to scalar_p.
(expand_vec_cmp): Generate scalar_p and transform op1.
* config/riscv/riscv.cc (riscv_const_insns): Add !FLOAT_MODE_P
constraint.
* config/riscv/vector.md: Add !FLOAT_MODE_P constraint.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/autovec/binop/vadd-rv32gcv-nofm.c: Fix test
* gcc.target/riscv/rvv/autovec/binop/vdiv-rv32gcv-nofm.c: Ditto
* gcc.target/riscv/rvv/autovec/binop/vmul-rv32gcv-nofm.c: Ditto
* gcc.target/riscv/rvv/autovec/binop/vsub-rv32gcv-nofm.c: Ditto
* gcc.target/riscv/rvv/autovec/cmp/vcond-1.c: Fix and add test
* gcc.target/riscv/rvv/autovec/cond/cond_copysign-rv32gcv.c: Fix
* gcc.target/riscv/rvv/autovec/cond/cond_fadd-1.c: Fix test
* gcc.target/riscv/rvv/autovec/cond/cond_fadd-2.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fadd-3.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fadd-4.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-1.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-3.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-4.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-5.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-6.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fmax-1.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fmax-2.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fmax-3.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fmax-4.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fmin-1.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fmin-2.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fmin-3.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fmin-4.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-1.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-3.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-4.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-5.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-6.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fmul-1.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fmul-2.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fmul-3.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fmul-4.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fmul-5.c: Ditto
 
Signed-off-by: demin.han 
---
gcc/config/riscv/autovec.md   |  2 +-
gcc/config/riscv/riscv-v.cc   | 72 ---
gcc/config/riscv/riscv.cc |  2 +-
gcc/config/riscv/vector.md|  3 +-
.../rvv/autovec/binop/vadd-rv32gcv-nofm.c |  4 +-
.../rvv/autovec/binop/vdiv-rv32gcv-nofm.c |  4 +-
.../rvv/autovec/binop/vmul-rv32gcv-nofm.c |  4 +-
.../rvv/autovec/binop/vsub-rv32gcv-nofm.c |  4 +-
.../riscv/rvv/autovec/cmp/vcond-1.c   | 48 -
.../rvv/autovec/cond/cond_copysign-rv32gcv.c  |  8 +--
.../riscv/rvv/autovec/cond/cond_fadd-1.c  |  4 +-
.../riscv/rvv/autovec/cond/cond_fadd-2.c  |  4 +-
.../riscv/rvv/autovec/cond/cond_fadd-3.c  |  4 +-
.../riscv/rvv/autovec/cond/cond_fadd-4.c  |  4 +-
.../riscv/rvv/autovec/cond/cond_fma_fnma-1.c  |  4 +-
.../riscv/rvv/autovec/cond/cond_fma_fnma-3.c  |  4 +-
.../riscv/rvv/autovec/cond/cond_fma_fnma-4.c  |  4 +-
.../riscv/rvv/autovec/cond/cond_fma_fnma-5.c  |  4 +-
.../riscv/rvv/autovec/cond/cond_fma_fnma-6.c  |  4 +-
.../riscv/rvv/autovec/cond/cond_fmax-1.c  |  4 +-
.../riscv/rvv/autovec/cond/cond_fmax-2.c  |  4 +-
.../riscv/rvv/autovec/cond/cond_fmax-3.c  |  4 +-
.../riscv/rvv/autovec/cond/cond_fmax-4.c  |  4 +-
.../riscv/rvv/autovec/cond/cond_fmin-1.c  |  4 +-
.../riscv/rvv/autovec/cond/cond_fmin-2.c  |  4 +-
.../riscv/rvv/autovec/cond/cond_fmin-3.c  |  4 +-
.../riscv/rvv/autovec/cond/cond_fmin-4.c  |  4 +-
.../riscv/rvv/autovec/cond/cond_fms_fnms-1.c  |  4 +-
.../riscv/rvv/autovec/cond/cond_fms_fnms-3.c  |  4 +-
.../riscv/rvv/autovec/cond/cond_fms_fnms-4.c  |  4 +-
.../riscv/rvv/autovec/cond/cond_fms_fnms-5.c  |  4 +-
.../riscv/rvv/autovec/cond/cond_fms_fnms-6.c  |  4 +-
.../riscv/rvv/autovec/cond/cond_fmul-1.c  |  4 +-
.../riscv/rvv/autovec/cond/cond_fmul-2.c  |  4 +-
.../riscv/rvv/autovec/cond/cond_fmul-3.c  |  4 +-
.../riscv/rvv/autovec/cond/cond_fmul-4.c  |  4 +-
.../riscv/rvv/autovec/cond/cond_fmul-5.c  |  4 +-
37 fi

Re: [PATCH v1] Match: Bugfix .SAT_TRUNC honor types has no mode precision [PR115961]

2024-07-17 Thread Richard Biener
On Wed, Jul 17, 2024 at 11:48 AM  wrote:
>
> From: Pan Li 
>
> The .SAT_TRUNC matching doesn't check whether the type has mode precision.
> Thus a bitfield access like the one below will be recognized as .SAT_TRUNC.
>
> struct e
> {
>   unsigned pre : 12;
>   unsigned a : 4;
> };
>
> __attribute__((noipa))
> void bug (e * v, unsigned def, unsigned use) {
>   e & defE = *v;
>   defE.a = min_u (use + 1, 0xf);
> }
>
> This patch adds a type_has_mode_precision_p check to the .SAT_TRUNC
> matching to reject such cases.
>
> The following test suites passed for this patch:
> 1. The rv64gcv full regression tests.
> 2. The x86 bootstrap tests.
> 3. The x86 full regression tests.

Hmm, rather than restricting the matching, the issue is in the optab query, or
in this case in how *_optab_supported_p blindly uses TYPE_MODE without
either asserting the type has mode precision or failing the query in this case.

I think it would be simplest to adjust direct_optab_supported_p
(and convert_optab_supported_p) to reject such operations?  Richard, do
you agree or should callers check this instead?

So, instead of match.pd the check would need to be in vector pattern matching
and SSA math opts.  Or alternatively in internal-fn.cc as laid out above.

Richard.

> PR target/115961
>
> gcc/ChangeLog:
>
> * match.pd: Add type_has_mode_precision_p check for .SAT_TRUNC.
>
> gcc/testsuite/ChangeLog:
>
> * g++.target/i386/pr115961-run-1.C: New test.
> * g++.target/riscv/rvv/base/pr115961-run-1.C: New test.
>
> Signed-off-by: Pan Li 
> ---
>  gcc/match.pd  |  4 +--
>  .../g++.target/i386/pr115961-run-1.C  | 34 +++
>  .../riscv/rvv/base/pr115961-run-1.C   | 34 +++
>  3 files changed, 70 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/g++.target/i386/pr115961-run-1.C
>  create mode 100644 gcc/testsuite/g++.target/riscv/rvv/base/pr115961-run-1.C
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 24a0bbead3e..8121ec09f53 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -3240,7 +3240,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   (bit_ior:c (negate (convert (gt @0 INTEGER_CST@1)))
> (convert @0))
>   (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
> -  && TYPE_UNSIGNED (TREE_TYPE (@0)))
> +  && TYPE_UNSIGNED (TREE_TYPE (@0)) && type_has_mode_precision_p (type))
>   (with
>{
> unsigned itype_precision = TYPE_PRECISION (TREE_TYPE (@0));
> @@ -3255,7 +3255,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  (match (unsigned_integer_sat_trunc @0)
>   (convert (min @0 INTEGER_CST@1))
>   (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
> -  && TYPE_UNSIGNED (TREE_TYPE (@0)))
> +  && TYPE_UNSIGNED (TREE_TYPE (@0)) && type_has_mode_precision_p (type))
>   (with
>{
> unsigned itype_precision = TYPE_PRECISION (TREE_TYPE (@0));
> diff --git a/gcc/testsuite/g++.target/i386/pr115961-run-1.C 
> b/gcc/testsuite/g++.target/i386/pr115961-run-1.C
> new file mode 100644
> index 000..b8c8aef3b17
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/i386/pr115961-run-1.C
> @@ -0,0 +1,34 @@
> +/* PR target/115961 */
> +/* { dg-do run } */
> +/* { dg-options "-O3 -fdump-rtl-expand-details" } */
> +
> +struct e
> +{
> +  unsigned pre : 12;
> +  unsigned a : 4;
> +};
> +
> +static unsigned min_u (unsigned a, unsigned b)
> +{
> +  return (b < a) ? b : a;
> +}
> +
> +__attribute__((noipa))
> +void bug (e * v, unsigned def, unsigned use) {
> +  e & defE = *v;
> +  defE.a = min_u (use + 1, 0xf);
> +}
> +
> +__attribute__((noipa, optimize(0)))
> +int main(void)
> +{
> +  e v = { 0xded, 3 };
> +
> +  bug(&v, 32, 33);
> +
> +  if (v.a != 0xf)
> +__builtin_abort ();
> +
> +  return 0;
> +}
> +/* { dg-final { scan-rtl-dump-not ".SAT_TRUNC " "expand" } } */
> diff --git a/gcc/testsuite/g++.target/riscv/rvv/base/pr115961-run-1.C 
> b/gcc/testsuite/g++.target/riscv/rvv/base/pr115961-run-1.C
> new file mode 100644
> index 000..b8c8aef3b17
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/riscv/rvv/base/pr115961-run-1.C
> @@ -0,0 +1,34 @@
> +/* PR target/115961 */
> +/* { dg-do run } */
> +/* { dg-options "-O3 -fdump-rtl-expand-details" } */
> +
> +struct e
> +{
> +  unsigned pre : 12;
> +  unsigned a : 4;
> +};
> +
> +static unsigned min_u (unsigned a, unsigned b)
> +{
> +  return (b < a) ? b : a;
> +}
> +
> +__attribute__((noipa))
> +void bug (e * v, unsigned def, unsigned use) {
> +  e & defE = *v;
> +  defE.a = min_u (use + 1, 0xf);
> +}
> +
> +__attribute__((noipa, optimize(0)))
> +int main(void)
> +{
> +  e v = { 0xded, 3 };
> +
> +  bug(&v, 32, 33);
> +
> +  if (v.a != 0xf)
> +__builtin_abort ();
> +
> +  return 0;
> +}
> +/* { dg-final { scan-rtl-dump-not ".SAT_TRUNC " "expand" } } */
> --
> 2.34.1
>


Re: [PATCH] gimple ssa: Teach switch conversion to optimize powers of 2 switches

2024-07-17 Thread Richard Biener
On Tue, 16 Jul 2024, Filip Kastl wrote:

> On Wed 2024-07-10 11:34:44, Richard Biener wrote:
> > On Mon, 8 Jul 2024, Filip Kastl wrote:
> > 
> > > Hi,
> > > 
> > > I'm replying to Richard and keeping Andrew in cc since your suggestions
> > > overlap.
> > > 
> > > 
> > > On Tue 2024-06-11 14:48:06, Richard Biener wrote:
> > > > On Thu, 30 May 2024, Filip Kastl wrote:
> > > > > +/* { dg-do compile } */
> > > > > +/* { dg-options "-O2 -fdump-tree-switchconv -march=znver3" } */
> > > > 
> > > > I think it's better to enable -mpopcnt and -mbmi (or what remains
> > > > as minimal requirement).
> > > 
> > > Will do.  Currently the testcases are in the i386 directory.  After I 
> > > exchange
> > > the -march for -mpopcnt -mbmi can I put these testcases into 
> > > gcc.dg/tree-ssa?
> > > Will the -mpopcnt -mbmi options work with all target architectures?
> > 
> > No, those are i386 specific flags.  At least for popcount there's
> > dejagnu effective targets popcount, popcountl and popcountll so you
> > could do
> > 
> > /* { dg-additional-options "-mpopcnt" { target { x86_64-*-* i?86-*-* } } } 
> > */
> > 
> > and guard the tree dump scan with { target popcount } to cover other
> > archs that have popcount (without adding extra flags).
> > 
> 
> How does this take into account the FFS instruction?  If -mbmi is i386 
> specific
> then I can't just put it into dg-options, right?  And if I wanted to handle it
> similarly to how you suggest handling POPCOUNT, there would have to be
> something like { target bmi }.  Is there something like that?

I don't think so.  You can of course add architecture specific tests.

Richard.

> Note that I committed to adding x & -x == x as a fallback to POPCOUNT so now I
> do not require -mpopcount.  I now just have to ensure that the testcase only
> runs when the target supports FFS (or runs always but scans output only when
> target supports FFS).



> Cheers,
> Filip Kastl
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH,
Frankenstrasse 146, 90461 Nuernberg, Germany;
GF: Ivo Totev, Andrew McDonald, Werner Knoblich; (HRB 36809, AG Nuernberg)


[PATCH] tree-optimization/115959 - ICE with SLP condition reduction

2024-07-17 Thread Richard Biener
The following fixes how during reduction epilogue generation we
gather conditional compares for condition reductions, thereby
following the reduction chain via STMT_VINFO_REDUC_IDX.  The issue
is that SLP nodes for COND_EXPRs can have either three or four
children dependent on whether we have legacy GENERIC expressions
in the transitional pattern GIMPLE for the COND_EXPR condition.
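
The backwards indexing used by the fix can be modelled in a few lines; a hypothetical sketch (names invented, not vectorizer code) of how the same reduction index lands on the right child in both the three- and four-child forms:

```python
def cond_reduc_child(children, reduc_idx):
    """Model of the fix: a COND_EXPR SLP node carries either three
    children (cond, then, else) or four (cmp_op0, cmp_op1, then, else),
    so the reduction operand is indexed from the back of the child
    vector: len(children) - 3 + reduc_idx."""
    return children[len(children) - 3 + reduc_idx]

# Three-child form: the index addresses the child directly.
assert cond_reduc_child(["cond", "then", "else"], 2) == "else"
# Four-child form (comparison split into two operands): the same
# reduction index still reaches the correct child.
assert cond_reduc_child(["op0", "op1", "then", "else"], 2) == "else"
```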

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

PR tree-optimization/115959
* tree-vect-loop.cc (vect_create_epilog_for_reduction):
Get at the REDUC_IDX child in a safer way for COND_EXPR
nodes.

* gcc.dg/vect/pr115959.c: New testcase.
---
 gcc/testsuite/gcc.dg/vect/pr115959.c | 14 ++
 gcc/tree-vect-loop.cc| 10 +++---
 2 files changed, 21 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/vect/pr115959.c

diff --git a/gcc/testsuite/gcc.dg/vect/pr115959.c 
b/gcc/testsuite/gcc.dg/vect/pr115959.c
new file mode 100644
index 000..181d5522018
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/pr115959.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+
+int a;
+_Bool *b;
+void f()
+{
+  int t = a;
+  for (int e = 0; e < 2048; e++)
+{
+  if (!b[e])
+   t = 0;
+}
+  a = t;
+}
diff --git a/gcc/tree-vect-loop.cc b/gcc/tree-vect-loop.cc
index 2d68db613f8..191a37a1e1a 100644
--- a/gcc/tree-vect-loop.cc
+++ b/gcc/tree-vect-loop.cc
@@ -6107,9 +6107,13 @@ vect_create_epilog_for_reduction (loop_vec_info 
loop_vinfo,
(std::make_pair (gimple_assign_rhs1 (vec_stmt),
 STMT_VINFO_REDUC_IDX (cond_info) == 2));
}
- /* ???  We probably want to have REDUC_IDX on the SLP node?  */
- cond_node = SLP_TREE_CHILDREN
-   (cond_node)[STMT_VINFO_REDUC_IDX (cond_info)];
+ /* ???  We probably want to have REDUC_IDX on the SLP node?
+We have both three and four children COND_EXPR nodes
+dependent on whether the comparison is still embedded
+as GENERIC.  So work backwards.  */
+ int slp_reduc_idx = (SLP_TREE_CHILDREN (cond_node).length () - 3
+  + STMT_VINFO_REDUC_IDX (cond_info));
+ cond_node = SLP_TREE_CHILDREN (cond_node)[slp_reduc_idx];
}
}
   else
-- 
2.35.3


[PATCH] tree-optimization/104515 - store motion and clobbers

2024-07-17 Thread Richard Biener
The following addresses an old regression when end-of-object/storage
clobbers were introduced.  In particular when there's an end-of-object
clobber in a loop but no corresponding begin-of-object we can still
perform store motion of may-aliased refs when we re-issue the
end-of-object/storage on the exits but elide it from the loop.  This
should be the safest way to deal with this considering stack-slot
sharing and it should not cause missed dead store eliminations given
DSE can now follow multiple paths in case there are multiple exits.

Note that when the clobber is re-materialized only on one exit but not
on another we are erring on the side of removing the clobber on
such a path.  This should be OK (removing clobbers is always OK).

Note there's no corresponding code to handle begin-of-object/storage
during the hoisting part of loads that are part of a store motion
optimization, so this only enables stored-only store motion or cases
without such clobber inside the loop.

Bootstrapped and tested on x86_64-unknown-linux-gnu, I'll push when
the pre-commit CI is happy.

Richard.

PR tree-optimization/104515
* tree-ssa-loop-im.cc (execute_sm_exit): Add clobbers_to_prune
parameter and handle re-materializing of clobbers.
(sm_seq_valid_bb): end-of-storage/object clobbers are OK inside
an ordered sequence of stores.
(sm_seq_push_down): Refuse to push down clobbers.
(hoist_memory_references): Prune clobbers from the loop body
we re-materialized on an exit.

* g++.dg/opt/pr104515.C: New testcase.
---
 gcc/testsuite/g++.dg/opt/pr104515.C | 18 ++
 gcc/tree-ssa-loop-im.cc | 86 -
 2 files changed, 89 insertions(+), 15 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/opt/pr104515.C

diff --git a/gcc/testsuite/g++.dg/opt/pr104515.C 
b/gcc/testsuite/g++.dg/opt/pr104515.C
new file mode 100644
index 000..f5455a45aa6
--- /dev/null
+++ b/gcc/testsuite/g++.dg/opt/pr104515.C
@@ -0,0 +1,18 @@
+// { dg-do compile { target c++11 } }
+// { dg-options "-O2 -fdump-tree-lim2-details" }
+
+using T = int;
+struct Vec {
+  T* end;
+};
+void pop_back_many(Vec& v, unsigned n)
+{
+  for (unsigned i = 0; i < n; ++i) {
+--v.end;
+//  The end-of-object clobber prevented store motion of v
+v.end->~T();
+  }
+}
+
+// { dg-final { scan-tree-dump "Executing store motion of v" "lim2" } }
+// { dg-final { scan-tree-dump "Re-issueing dependent" "lim2" } }
diff --git a/gcc/tree-ssa-loop-im.cc b/gcc/tree-ssa-loop-im.cc
index 61c6339bc35..c53efbb8d59 100644
--- a/gcc/tree-ssa-loop-im.cc
+++ b/gcc/tree-ssa-loop-im.cc
@@ -2368,7 +2368,8 @@ struct seq_entry
 static void
 execute_sm_exit (class loop *loop, edge ex, vec &seq,
 hash_map &aux_map, sm_kind kind,
-edge &append_cond_position, edge &last_cond_fallthru)
+edge &append_cond_position, edge &last_cond_fallthru,
+bitmap clobbers_to_prune)
 {
   /* Sink the stores to exit from the loop.  */
   for (unsigned i = seq.length (); i > 0; --i)
@@ -2377,15 +2378,35 @@ execute_sm_exit (class loop *loop, edge ex, 
vec &seq,
   if (seq[i-1].second == sm_other)
{
  gcc_assert (kind == sm_ord && seq[i-1].from != NULL_TREE);
- if (dump_file && (dump_flags & TDF_DETAILS))
+ gassign *store;
+ if (ref->mem.ref == error_mark_node)
{
- fprintf (dump_file, "Re-issueing dependent store of ");
- print_generic_expr (dump_file, ref->mem.ref);
- fprintf (dump_file, " from loop %d on exit %d -> %d\n",
-  loop->num, ex->src->index, ex->dest->index);
+ tree lhs = gimple_assign_lhs (ref->accesses_in_loop[0].stmt);
+ if (dump_file && (dump_flags & TDF_DETAILS))
+   {
+ fprintf (dump_file, "Re-issueing dependent ");
+ print_generic_expr (dump_file, unshare_expr (seq[i-1].from));
+ fprintf (dump_file, " of ");
+ print_generic_expr (dump_file, lhs);
+ fprintf (dump_file, " from loop %d on exit %d -> %d\n",
+  loop->num, ex->src->index, ex->dest->index);
+   }
+ store = gimple_build_assign (unshare_expr (lhs),
+  unshare_expr (seq[i-1].from));
+ bitmap_set_bit (clobbers_to_prune, seq[i-1].first);
+   }
+ else
+   {
+ if (dump_file && (dump_flags & TDF_DETAILS))
+   {
+ fprintf (dump_file, "Re-issueing dependent store of ");
+ print_generic_expr (dump_file, ref->mem.ref);
+ fprintf (dump_file, " from loop %d on exit %d -> %d\n",
+  loop->num, ex->src->index, ex->dest->index);
+   }
+ store = gimple_build_assign (unshare_expr (ref->mem.ref),
+ 

[PATCH] MATCH: Add simplification for MAX and MIN to match.pd [PR109878]

2024-07-17 Thread Eikansh Gupta
Min and max can be optimized when both operands are defined by the
same variable restricted by an AND (&). For signed types, the
optimization can be done when both constants have the same sign bit.
The patch also adds an optimization for the specific case of min/max(a, a & 1).

This patch adds match pattern for:

max (a & CST0, a & CST1) -> a & CST0 IFF CST0 & CST1 == CST1
min (a & CST0, a & CST1) -> a & CST0 IFF CST0 & CST1 == CST0
min (a, a & 1) --> a & 1
max (a, a & 1) --> a

PR tree-optimization/109878

gcc/ChangeLog:

* match.pd (min/max (a & CST0, a & CST1)): New pattern.
(min/max (a, a & 1)): New pattern.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/pr109878.c: New test.
* gcc.dg/tree-ssa/pr109878-1.c: New test.
* gcc.dg/tree-ssa/pr109878-2.c: New test.
* gcc.dg/tree-ssa/pr109878-3.c: New test.

Signed-off-by: Eikansh Gupta 
---
 gcc/match.pd   | 22 
 gcc/testsuite/gcc.dg/tree-ssa/pr109878-1.c | 64 ++
 gcc/testsuite/gcc.dg/tree-ssa/pr109878-2.c | 31 +++
 gcc/testsuite/gcc.dg/tree-ssa/pr109878-3.c | 26 +
 gcc/testsuite/gcc.dg/tree-ssa/pr109878.c   | 64 ++
 5 files changed, 207 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr109878-1.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr109878-2.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr109878-3.c
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr109878.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 24a0bbead3e..029ec0b487f 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -4310,6 +4310,28 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 @0
 @2)))
 
+/* min (a & CST0, a & CST1) -> a & CST0 IFF CST0 & CST1 == CST0 */
+/* max (a & CST0, a & CST1) -> a & CST0 IFF CST0 & CST1 == CST1 */
+/* If signed a, then both the constants should have same sign. */
+(for minmax (min max)
+ (simplify
+  (minmax:c (bit_and@3 @0 INTEGER_CST@1) (bit_and@4 @0 INTEGER_CST@2))
+   (if (TYPE_UNSIGNED (type)
+|| ((tree_int_cst_sgn (@1) <= 0) == (tree_int_cst_sgn (@2) <= 0)))
+(if ((wi::to_wide (@1) & wi::to_wide (@2))
+ == ((minmax == MIN_EXPR) ? wi::to_wide (@1) : wi::to_wide (@2)))
+ @3
+
+/* min (a, a & 1) --> a & 1 */
+/* max (a, a & 1) --> a */
+(for minmax (min max)
+ (simplify
+  (minmax:c @0 (bit_and@1 @0 integer_onep))
+   (if (TYPE_UNSIGNED(type))
+(if (minmax == MIN_EXPR)
+ @1
+ @0
+
 /* Simplify min (&var[off0], &var[off1]) etc. depending on whether
the addresses are known to be less, equal or greater.  */
 (for minmax (min max)
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr109878-1.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr109878-1.c
new file mode 100644
index 000..509e59adea1
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr109878-1.c
@@ -0,0 +1,64 @@
+/* PR tree-optimization/109878 */
+/* { dg-do compile } */
+/* { dg-options "-O1 -fdump-tree-optimized" } */
+
+/* All the constant pair  used here satisfy the condition:
+   (cst0 & cst1 == cst0) || (cst0 & cst1 == cst1).
+   If the above condition is true, then MIN_EXPR is not needed. */
+int min_and(int a, int b) {
+  b = a & 3;
+  a = a & 1;
+  if (b < a)
+return b;
+  else
+return a;
+}
+
+int min_and1(int a, int b) {
+  b = a & 3;
+  a = a & 15;
+  if (b < a)
+return b;
+  else
+return a;
+}
+
+int min_and2(int a, int b) {
+  b = a & -7;
+  a = a & -3;
+  if (b < a)
+return b;
+  else
+return a;
+}
+
+int min_and3(int a, int b) {
+  b = a & -5;
+  a = a & -13;
+  if (b < a)
+return b;
+  else
+return a;
+}
+
+/* When constants are of opposite signs, the simplification will only
+   work for unsigned types. */
+unsigned int min_and4(unsigned int a, unsigned int b) {
+  b = a & 3;
+  a = a & -5;
+  if (b < a)
+return b;
+  else
+return a;
+}
+
+unsigned int min_and5(unsigned int a, unsigned int b) {
+  b = a & -3;
+  a = a & 5;
+  if (b < a)
+return b;
+  else
+return a;
+}
+
+/* { dg-final { scan-tree-dump-not " MIN_EXPR " "optimized" } } */
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr109878-2.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr109878-2.c
new file mode 100644
index 000..62846d5d784
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr109878-2.c
@@ -0,0 +1,31 @@
+/* PR tree-optimization/109878 */
+/* { dg-do compile } */
+/* { dg-options "-O1 -fdump-tree-optimized" } */
+
+/* The testcases here should not get optimized with the patch.
+   For constant pair  the condition:
+   (cst0 & cst1 == cst0) || (cst0 & cst1 == cst1)
+   is false for the constants used here. */
+int max_and(int a, int b) {
+
+  b = a & 3;
+  a = a & 5;
+  if (b > a)
+return b;
+  else
+return a;
+}
+
+/* The constants in this function satisfy the condition but a is signed.
+   For signed types both the constants should have same sign. */
+int min_and(int a, int b) {
+  b = a & 1;
+  a = a & -3;
+  if (b < a)
+return b;
+  else
+return a;
+}
+
+/* { dg-final { sc

Re: [PATCH] eh: ICE with std::initializer_list and ASan [PR115865]

2024-07-17 Thread Jakub Jelinek
On Thu, Jul 11, 2024 at 04:08:20PM -0400, Marek Polacek wrote:
> and we ICE on the empty finally in lower_try_finally_onedest while
> getting get_eh_else.
> 
> Rather than checking everywhere that a GIMPLE_TRY_FINALLY is not empty,
> I thought we could process it as if it were unreachable.

I'm not sure we need to check everywhere, the only spot is get_eh_else
from what I can see and IMHO that should be fixed with something like:

--- gcc/tree-eh.cc.jj   2024-01-03 11:51:27.490787274 +0100
+++ gcc/tree-eh.cc  2024-07-17 13:21:01.093840884 +0200
@@ -950,7 +950,7 @@ static inline geh_else *
 get_eh_else (gimple_seq finally)
 {
   gimple *x = gimple_seq_first_stmt (finally);
-  if (gimple_code (x) == GIMPLE_EH_ELSE)
+  if (x && gimple_code (x) == GIMPLE_EH_ELSE)
 {
   gcc_assert (gimple_seq_singleton_p (finally));
   return as_a  (x);

because empty finally sequence certainly doesn't contain GIMPLE_EH_ELSE.

Looking at other spots, I see it is able to deal with empty sequences:
  if (gimple_seq_singleton_p (new_seq)
  && gimple_code (gimple_seq_first_stmt (new_seq)) == GIMPLE_GOTO)
gimple_seq_singleton_p is false for empty new_seq.
  inner = gimple_seq_first_stmt (gimple_try_cleanup (tp));
  
  if (flag_exceptions)
{
  this_region = gen_eh_region_allowed (state->cur_region,
   gimple_eh_filter_types (inner));
in lower_eh_filter will not work with empty gimple_try_cleanup, but caller
will only call it if it is non-empty and starts with GIMPLE_EH_FILTER.
  gimple *inner = gimple_seq_first_stmt (gimple_try_cleanup (tp));
  eh_region this_region;
  
  this_region = gen_eh_region_must_not_throw (state->cur_region);
  this_region->u.must_not_throw.failure_decl
= gimple_eh_must_not_throw_fndecl (
as_a  (inner));
Similarly (lower_eh_must_not_throw and GIMPLE_EH_MUST_NOT_THROW).
x = gimple_seq_first_stmt (gimple_try_cleanup (try_stmt));
if (!x)
  {
replace = gimple_try_eval (try_stmt);
lower_eh_constructs_1 (state, &replace);
  }
else
  switch (gimple_code (x))
{
This case shows special case for empty cleanup and otherwise calls
lower_eh_* etc. depending on what the cleanup first statement is.
  x = gimple_seq_last_stmt (finally);
  finally_loc = x ? gimple_location (x) : tf_loc;
This allows for finally to be empty again.

> 
>   PR c++/115865
> 
> gcc/ChangeLog:
> 
>   * tree-eh.cc (lower_try_finally): If the FINALLY block is empty, treat
>   it as if it were not reachable.
> 
> gcc/testsuite/ChangeLog:
> 
>   * g++.dg/asan/initlist2.C: New test.
> ---
>  gcc/testsuite/g++.dg/asan/initlist2.C | 16 
>  gcc/tree-eh.cc|  4 ++--
>  2 files changed, 18 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/g++.dg/asan/initlist2.C
> 
> diff --git a/gcc/testsuite/g++.dg/asan/initlist2.C 
> b/gcc/testsuite/g++.dg/asan/initlist2.C
> new file mode 100644
> index 000..bce5410be33
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/asan/initlist2.C
> @@ -0,0 +1,16 @@
> +// PR c++/115865
> +// { dg-do compile }
> +// { dg-options "-fsanitize=address" }
> +
> +typedef decltype(sizeof(char)) size_t;
> +
> +namespace std {
> +template  class initializer_list {
> +  int *_M_array;
> +  size_t _M_len;
> +};
> +}
> +
> +int main() {
> +  std::initializer_list x = { 1, 2, 3 };
> +}
> diff --git a/gcc/tree-eh.cc b/gcc/tree-eh.cc
> index a776ad5c92b..6ae6dab223e 100644
> --- a/gcc/tree-eh.cc
> +++ b/gcc/tree-eh.cc
> @@ -1703,8 +1703,8 @@ lower_try_finally (struct leh_state *state, gtry *tp)
>ndests += this_tf.may_return;
>ndests += this_tf.may_throw;
>  
> -  /* If the FINALLY block is not reachable, dike it out.  */
> -  if (ndests == 0)
> +  /* If the FINALLY block is not reachable or empty, dike it out.  */
> +  if (ndests == 0 || gimple_seq_empty_p (gimple_try_cleanup (tp)))
>  {
>gimple_seq_add_seq (&this_tf.top_p_seq, gimple_try_eval (tp));
>gimple_try_set_cleanup (tp, NULL);
> 
> base-commit: 74d8accaf88f83bfcab1150bf9be5140e7ac0e94
> -- 
> 2.45.2

Jakub



Re: [RFC] Proposal to support Packed Boolean Vector masks.

2024-07-17 Thread Tejas Belagod

On 7/17/24 4:36 PM, Richard Biener wrote:

On Wed, Jul 17, 2024 at 10:17 AM Tejas Belagod  wrote:


On 7/15/24 6:05 PM, Richard Biener wrote:

On Mon, Jul 15, 2024 at 1:22 PM Tejas Belagod  wrote:


On 7/15/24 12:16 PM, Tejas Belagod wrote:

On 7/12/24 6:40 PM, Richard Biener wrote:

On Fri, Jul 12, 2024 at 3:05 PM Jakub Jelinek  wrote:


On Fri, Jul 12, 2024 at 02:56:53PM +0200, Richard Biener wrote:

Padding is only an issue for very small vectors - the obvious choice is
to disallow vector types that would require any padding.  I can hardly
see where those are faster than using a vector of up to 4 char
elements.
Problematic are 1-bit elements with 4, 2 or one element vectors,
2-bit elements
with 2 or one element vectors and 4-bit elements with 1 element
vectors.


I'd really like to avoid having to support something like
_BitInt(16372) __attribute__((vector_size (sizeof (_BitInt(16372)) *
16)))
_BitInt(2) to say size of long long could be acceptable.


I'd disallow _BitInt(n) with n >= 8, it should be just the syntactic
way to say
the element should have n (< 8) bits.


I have no idea what the stance of supporting _BitInt in C++ are,
but most certainly diverging support (or even semantics) of the
vector extension in C vs. C++ is undesirable.


I believe Clang supports it in C++ next to C, GCC doesn't and Jason
didn't
look favorably to _BitInt support in C++, so at least until something
like
that is standardized in C++ the answer is probably no.


OK, I think that rules out _BitInt use here so while bool is then natural
for 1-bit elements for 2-bit and 4-bit elements we'd have to specify the
number of bits explicitly.  There is signed_bool_precision but like
vector_mask its use is restricted to the GIMPLE frontend because
interaction with the rest of the language isn't defined.



Thanks for all the suggestions - really insightful (to me) discussions.

Yeah, BitInt seemed like it was best placed for this, but not having C++
support is definitely a blocker. But as you say, in the absence of
BitInt, bool becomes the natural choice for bit sizes 1, 2 and 4. One
way to specify non-1-bit widths could be overloading vector_size.

Also, I think overloading GIMPLE's vector_mask takes us into the
earlier-discussed territory of what it should actually mean - it meaning
the target truth type in GIMPLE and a generic vector extension in the FE
will probably confuse gcc developers more than users.


That said - we're mixing two things here.  The desire to have "proper"
svbool (fix: declare in the backend) and the desire to have "packed"
bit-precision vectors (for whatever actual reason) as part of the
GCC vector extension.



If we leave lane-disambiguation of svbool to the backend, the values I
see in supporting 1, 2 and 4 bitsizes are 1) first step towards
supporting BitInt(N) vectors possibly in the future 2) having a way for
targets to define their intrinsics' bool vector types using GNU
extensions 3) feature parity with Clang's ext_vector_type?

I believe the primary motivation for Clang to support ext_vector_type
was to have a way to define target intrinsics' vector bool type using
vector extensions.




Interestingly, Clang seems to support

typedef struct {
   _Bool i:1;
} STR;

typedef struct { _Bool i: 1; } __attribute__((vector_size (sizeof (STR)
* 4))) vec;


int foo (vec b) {
  return sizeof b;
}

I can't find documentation about how it is implemented, but I suspect
the vector is constructed as an array STR[] i.e. possibly each
bit-element padded to byte boundary etc. Also, I can't seem to apply
many operations other than sizeof.

I don't know if we've tried to support such cases in GNU in the past?


Why should we do that?  It doesn't make much sense.

single-bit vectors is what _BitInt was invented for.


Forgive me if I'm misunderstanding - I'm trying to figure out how
_BitInts can be made to have single-bit generic vector semantics. For
eg. If I want to initialize a _BitInt as vector, I can't do:

   _BitInt (4) a = (_BitInt (4)){1, 0, 1, 1};

as 'a' expects a scalar initialization.

Of if I want to convert an int vector to bit vector, I can't do

v4si_p = v4si_a > v4si_b;
_BitInt (4) vbool = __builtin_convertvector (v4si_p, _BitInt (4));

Also semantics of conditionals with _BitInt behave like scalars

_BitInt (4) p = a && b; // Here a and b are _BitInt (4), but they
behave as scalars.

Also, I can't do things like

typedef _BitInt (2) vbool __attribute__((vector_size(sizeof (_BitInt
(2)) * 4)));

to force it to behave as a vector because _BitInt is disallowed here.



All I'm trying to say is that when people want to use vector as
a large packed bitfield they can now use _BitInt instead.  Of course
with a different (but portable) API.
I don't see single-bit element vectors as something especially
useful with a "vector API".  What's the use case? (similarly
for the two- and four-bit elements, with or without padding)



I'm trying to figure out if we had a portable (generi

Re: [RFC] Proposal to support Packed Boolean Vector masks.

2024-07-17 Thread Richard Biener
On Wed, Jul 17, 2024 at 1:53 PM Tejas Belagod  wrote:
>
> On 7/17/24 4:36 PM, Richard Biener wrote:
> > On Wed, Jul 17, 2024 at 10:17 AM Tejas Belagod  
> > wrote:
> >>
> >> On 7/15/24 6:05 PM, Richard Biener wrote:
> >>> On Mon, Jul 15, 2024 at 1:22 PM Tejas Belagod  
> >>> wrote:
> 
>  On 7/15/24 12:16 PM, Tejas Belagod wrote:
> > On 7/12/24 6:40 PM, Richard Biener wrote:
> >> On Fri, Jul 12, 2024 at 3:05 PM Jakub Jelinek  wrote:
> >>>
> >>> On Fri, Jul 12, 2024 at 02:56:53PM +0200, Richard Biener wrote:
>  Padding is only an issue for very small vectors - the obvious choice 
>  is
>  to disallow vector types that would require any padding.  I can 
>  hardly
>  see where those are faster than using a vector of up to 4 char
>  elements.
>  Problematic are 1-bit elements with 4, 2 or one element vectors,
>  2-bit elements
>  with 2 or one element vectors and 4-bit elements with 1 element
>  vectors.
> >>>
> >>> I'd really like to avoid having to support something like
> >>> _BitInt(16372) __attribute__((vector_size (sizeof (_BitInt(16372)) *
> >>> 16)))
> >>> _BitInt(2) to say size of long long could be acceptable.
> >>
> >> I'd disallow _BitInt(n) with n >= 8, it should be just the syntactic
> >> way to say
> >> the element should have n (< 8) bits.
> >>
>  I have no idea what the stance of supporting _BitInt in C++ are,
>  but most certainly diverging support (or even semantics) of the
>  vector extension in C vs. C++ is undesirable.
> >>>
> >>> I believe Clang supports it in C++ next to C, GCC doesn't and Jason
> >>> didn't
> >>> look favorably to _BitInt support in C++, so at least until something
> >>> like
> >>> that is standardized in C++ the answer is probably no.
> >>
> >> OK, I think that rules out _BitInt use here so while bool is then 
> >> natural
> >> for 1-bit elements for 2-bit and 4-bit elements we'd have to specify 
> >> the
> >> number of bits explicitly.  There is signed_bool_precision but like
> >> vector_mask its use is restricted to the GIMPLE frontend because
> >> interaction with the rest of the language isn't defined.
> >>
> >
> > Thanks for all the suggestions - really insightful (to me) discussions.
> >
> > Yeah, BitInt seemed like it was best placed for this, but not having C++
> > support is definitely a blocker. But as you say, in the absence of
> > BitInt, bool becomes the natural choice for bit sizes 1, 2 and 4. One
> > way to specify non-1-bit widths could be overloading vector_size.
> >
> > Also, I think overloading GIMPLE's vector_mask takes us into the
> > earlier-discussed territory of what it should actually mean - it meaning
> > the target truth type in GIMPLE and a generic vector extension in the FE
> > will probably confuse gcc developers more than users.
> >
> >> That said - we're mixing two things here.  The desire to have "proper"
> >> svbool (fix: declare in the backend) and the desire to have "packed"
> >> bit-precision vectors (for whatever actual reason) as part of the
> >> GCC vector extension.
> >>
> >
> > If we leave lane-disambiguation of svbool to the backend, the values I
> > see in supporting 1, 2 and 4 bitsizes are 1) first step towards
> > supporting BitInt(N) vectors possibly in the future 2) having a way for
> > targets to define their intrinsics' bool vector types using GNU
> > extensions 3) feature parity with Clang's ext_vector_type?
> >
> > I believe the primary motivation for Clang to support ext_vector_type
> > was to have a way to define target intrinsics' vector bool type using
> > vector extensions.
> >
> 
> 
>  Interestingly, Clang seems to support
> 
>  typedef struct {
> _Bool i:1;
>  } STR;
> 
>  typedef struct { _Bool i: 1; } __attribute__((vector_size (sizeof (STR)
>  * 4))) vec;
> 
> 
>  int foo (vec b) {
>    return sizeof b;
>  }
> 
>  I can't find documentation about how it is implemented, but I suspect
>  the vector is constructed as an array STR[] i.e. possibly each
>  bit-element padded to byte boundary etc. Also, I can't seem to apply
>  many operations other than sizeof.
> 
>  I don't know if we've tried to support such cases in GNU in the past?
> >>>
> >>> Why should we do that?  It doesn't make much sense.
> >>>
> >>> single-bit vectors is what _BitInt was invented for.
> >>
> >> Forgive me if I'm misunderstanding - I'm trying to figure out how
> >> _BitInts can be made to have single-bit generic vector semantics. For
> >> eg. If I want to initialize a _BitInt as vector, I can't do:
> >>
> >>_BitInt (4) a = (_BitInt (4)){1, 0, 1, 1};
> >>
> >> as 'a' expects a scalar initialization

Re: [PATCH v1] c++: Hash placeholder constraint in ctp_hasher

2024-07-17 Thread Seyed Sajad Kahani
On Tue, 16 Jul 2024 at 17:05, Jason Merrill  wrote:
> The change looks good, just a couple of whitespace tweaks needed.  But
> what happened to the testcase?

I was unable to design any testcase whose behavior differs with this
patch applied, because hash collisions are handled properly
(hash-table.h:1059).

> > --- a/gcc/cp/pt.cc
> > +++ b/gcc/cp/pt.cc
> > @@ -4525,7 +4525,12 @@ struct ctp_hasher : ggc_ptr_hash
> >   val = iterative_hash_object (TEMPLATE_TYPE_LEVEL (t), val);
> >   val = iterative_hash_object (TEMPLATE_TYPE_IDX (t), val);
> >   if (TREE_CODE (t) == TEMPLATE_TYPE_PARM)
> > -  val = iterative_hash_template_arg (CLASS_PLACEHOLDER_TEMPLATE (t), 
> > val);
> > +  {
> > + val
>
> Extra space at end of line.
>
> > +   = iterative_hash_template_arg (CLASS_PLACEHOLDER_TEMPLATE (t), val);
> > + if (tree c = NON_ERROR (PLACEHOLDER_TYPE_CONSTRAINTS (t)))
> > +   val = iterative_hash_placeholder_constraint(c, val);
>
> Missing space before paren.

My apologies. Thank you for pointing those out.


Re: [PATCH v9 07/10] Give better error messages for musttail

2024-07-17 Thread Richard Biener
On Mon, Jul 8, 2024 at 7:00 PM Andi Kleen  wrote:
>
> When musttail is set, make tree-tailcall give error messages
> when it cannot handle a call. This avoids vague "other reasons"
> error messages later at expand time when it sees a musttail
> function not marked tail call.
>
> In various cases this requires delaying the error until
> the call is discovered.
>
> Also print more information on the failure to the dump file.

This looks OK now.

Thanks,
Richard.

> gcc/ChangeLog:
>
> PR83324
> * tree-tailcall.cc (maybe_error_musttail): New function.
> (suitable_for_tail_opt_p): Report error reason.
> (suitable_for_tail_call_opt_p): Report error reason.
> (find_tail_calls): Accept basic blocks with abnormal edges.
> Delay reporting of errors until the call is discovered.
> Move top level suitability checks to here.
> (tree_optimize_tail_calls_1): Remove top level checks.
> ---
>  gcc/tree-tailcall.cc | 187 +++
>  1 file changed, 154 insertions(+), 33 deletions(-)
>
> diff --git a/gcc/tree-tailcall.cc b/gcc/tree-tailcall.cc
> index 43e8c25215cb..a68079d4f507 100644
> --- a/gcc/tree-tailcall.cc
> +++ b/gcc/tree-tailcall.cc
> @@ -40,9 +40,11 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tree-eh.h"
>  #include "dbgcnt.h"
>  #include "cfgloop.h"
> +#include "intl.h"
>  #include "common/common-target.h"
>  #include "ipa-utils.h"
>  #include "tree-ssa-live.h"
> +#include "diagnostic-core.h"
>
>  /* The file implements the tail recursion elimination.  It is also used to
> analyze the tail calls in general, passing the results to the rtl level
> @@ -131,14 +133,20 @@ static tree m_acc, a_acc;
>
>  static bitmap tailr_arg_needs_copy;
>
> +static void maybe_error_musttail (gcall *call, const char *err);
> +
>  /* Returns false when the function is not suitable for tail call optimization
> -   from some reason (e.g. if it takes variable number of arguments).  */
> +   from some reason (e.g. if it takes variable number of arguments). CALL
> +   is call to report for.  */
>
>  static bool
> -suitable_for_tail_opt_p (void)
> +suitable_for_tail_opt_p (gcall *call)
>  {
>if (cfun->stdarg)
> -return false;
> +{
> +  maybe_error_musttail (call, _("caller uses stdargs"));
> +  return false;
> +}
>
>return true;
>  }
> @@ -146,35 +154,47 @@ suitable_for_tail_opt_p (void)
>  /* Returns false when the function is not suitable for tail call optimization
> for some reason (e.g. if it takes variable number of arguments).
> This test must pass in addition to suitable_for_tail_opt_p in order to 
> make
> -   tail call discovery happen.  */
> +   tail call discovery happen. CALL is call to report error for.  */
>
>  static bool
> -suitable_for_tail_call_opt_p (void)
> +suitable_for_tail_call_opt_p (gcall *call)
>  {
>tree param;
>
>/* alloca (until we have stack slot life analysis) inhibits
>   sibling call optimizations, but not tail recursion.  */
>if (cfun->calls_alloca)
> -return false;
> +{
> +  maybe_error_musttail (call, _("caller uses alloca"));
> +  return false;
> +}
>
>/* If we are using sjlj exceptions, we may need to add a call to
>   _Unwind_SjLj_Unregister at exit of the function.  Which means
>   that we cannot do any sibcall transformations.  */
>if (targetm_common.except_unwind_info (&global_options) == UI_SJLJ
>&& current_function_has_exception_handlers ())
> -return false;
> +{
> +  maybe_error_musttail (call, _("caller uses sjlj exceptions"));
> +  return false;
> +}
>
>/* Any function that calls setjmp might have longjmp called from
>   any called function.  ??? We really should represent this
>   properly in the CFG so that this needn't be special cased.  */
>if (cfun->calls_setjmp)
> -return false;
> +{
> +  maybe_error_musttail (call, _("caller uses setjmp"));
> +  return false;
> +}
>
>/* Various targets don't handle tail calls correctly in functions
>   that call __builtin_eh_return.  */
>if (cfun->calls_eh_return)
> -return false;
> +{
> +  maybe_error_musttail (call, _("caller uses __builtin_eh_return"));
> +  return false;
> +}
>
>/* ??? It is OK if the argument of a function is taken in some cases,
>   but not in all cases.  See PR15387 and PR19616.  Revisit for 4.1.  */
> @@ -182,7 +202,10 @@ suitable_for_tail_call_opt_p (void)
> param;
> param = DECL_CHAIN (param))
>  if (TREE_ADDRESSABLE (param))
> -  return false;
> +  {
> +   maybe_error_musttail (call, _("address of caller arguments taken"));
> +   return false;
> +  }
>
>return true;
>  }
> @@ -402,16 +425,42 @@ propagate_through_phis (tree var, edge e)
>return var;
>  }
>
> +/* Report an error for failing to tail convert must call CALL
> +   with error message ERR. Also clear the flag to

Re: [PATCH v2] MATCH: Simplify (a ? x : y) eq/ne (b ? x : y) [PR111150]

2024-07-17 Thread Richard Biener
On Tue, Jul 16, 2024 at 3:36 PM Eikansh Gupta  wrote:
>
> This patch adds match pattern for `(a ? x : y) eq/ne (b ? x : y)`.
> In forwprop1 pass, depending on the type of `a` and `b`, GCC produces
> `vec_cond` or `cond_expr`. Based on the observation that `(x != y)` is
> TRUE, the pattern can be optimized to produce `(a^b ? TRUE : FALSE)`.
>
> The patch adds match pattern for a, b:
> (a ? x : y) != (b ? x : y) --> (a^b) ? TRUE  : FALSE
> (a ? x : y) == (b ? x : y) --> (a^b) ? FALSE : TRUE
> (a ? x : y) != (b ? y : x) --> (a^b) ? FALSE : TRUE
> (a ? x : y) == (b ? y : x) --> (a^b) ? TRUE  : FALSE

OK.

Richard.

> PR tree-optimization/111150
>
> gcc/ChangeLog:
>
> * match.pd (`(a ? x : y) eq/ne (b ? x : y)`): New pattern.
> (`(a ? x : y) eq/ne (b ? y : x)`): New pattern.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.dg/tree-ssa/pr111150.c: New test.
> * gcc.dg/tree-ssa/pr111150-1.c: New test.
> * g++.dg/tree-ssa/pr111150.C: New test.
>
> Signed-off-by: Eikansh Gupta 
> ---
>  gcc/match.pd   | 15 +
>  gcc/testsuite/g++.dg/tree-ssa/pr111150.C   | 33 ++
>  gcc/testsuite/gcc.dg/tree-ssa/pr111150-1.c | 72 ++
>  gcc/testsuite/gcc.dg/tree-ssa/pr111150.c   | 22 +++
>  4 files changed, 142 insertions(+)
>  create mode 100644 gcc/testsuite/g++.dg/tree-ssa/pr111150.C
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr111150-1.c
>  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr111150.c
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 3759c64d461..7c125255ea3 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -5577,6 +5577,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>(vec_cond (bit_and (bit_not @0) @1) @2 @3)))
>  #endif
>
> +/* (a ? x : y) != (b ? x : y) --> (a^b) ? TRUE  : FALSE */
> +/* (a ? x : y) == (b ? x : y) --> (a^b) ? FALSE : TRUE  */
> +/* (a ? x : y) != (b ? y : x) --> (a^b) ? FALSE : TRUE  */
> +/* (a ? x : y) == (b ? y : x) --> (a^b) ? TRUE  : FALSE */
> +(for cnd (cond vec_cond)
> + (for eqne (eq ne)
> +  (simplify
> +   (eqne:c (cnd @0 @1 @2) (cnd @3 @1 @2))
> +(cnd (bit_xor @0 @3) { constant_boolean_node (eqne == NE_EXPR, type); }
> + { constant_boolean_node (eqne != NE_EXPR, type); }))
> +  (simplify
> +   (eqne:c (cnd @0 @1 @2) (cnd @3 @2 @1))
> +(cnd (bit_xor @0 @3) { constant_boolean_node (eqne != NE_EXPR, type); }
> + { constant_boolean_node (eqne == NE_EXPR, type); }
> +
>  /* Canonicalize mask ? { 0, ... } : { -1, ...} to ~mask if the mask
> types are compatible.  */
>  (simplify
> diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr111150.C 
> b/gcc/testsuite/g++.dg/tree-ssa/pr111150.C
> new file mode 100644
> index 000..ca02d8dc51e
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/tree-ssa/pr111150.C
> @@ -0,0 +1,33 @@
> +/* PR tree-optimization/111150 */
> +/* { dg-do compile } */
> +/* { dg-options "-O1 -fdump-tree-forwprop1" } */
> +typedef int v4si __attribute((__vector_size__(4 * sizeof(int;
> +
> +/* Before the patch, VEC_COND_EXPR was generated for each statement in the
> +   function. This resulted in 3 VEC_COND_EXPR. */
> +v4si f1_(v4si a, v4si b, v4si c, v4si d, v4si e, v4si f) {
> +  v4si X = a == b ? e : f;
> +  v4si Y = c == d ? e : f;
> +  return (X != Y);
> +}
> +
> +v4si f2_(v4si a, v4si b, v4si c, v4si d, v4si e, v4si f) {
> +  v4si X = a == b ? e : f;
> +  v4si Y = c == d ? e : f;
> +  return (X == Y);
> +}
> +
> +v4si f3_(v4si a, v4si b, v4si c, v4si d, v4si e, v4si f) {
> +  v4si X = a == b ? e : f;
> +  v4si Y = c == d ? f : e;
> +  return (X != Y);
> +}
> +
> +v4si f4_(v4si a, v4si b, v4si c, v4si d, v4si e, v4si f) {
> +  v4si X = a == b ? e : f;
> +  v4si Y = c == d ? f : e;
> +  return (X == Y);
> +}
> +
> +/* For each testcase, should produce only one VEC_COND_EXPR for X^Y. */
> +/* { dg-final { scan-tree-dump-times " VEC_COND_EXPR " 4 "forwprop1" } } */
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr111150-1.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/pr111150-1.c
> new file mode 100644
> index 000..6f4b21ac6bc
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr111150-1.c
> @@ -0,0 +1,72 @@
> +/* PR tree-optimization/111150 */
> +/* { dg-do compile } */
> +/* { dg-options "-O1 -fgimple -fdump-tree-forwprop1-raw" } */
> +
> +/* Checks if pattern (X ? e : f) == (Y ? e : f) gets optimized. */
> +__GIMPLE()
> +_Bool f1_(int a, int b, int c, int d, int e, int f) {
> +  _Bool X;
> +  _Bool Y;
> +  _Bool t;
> +  int t1;
> +  int t2;
> +  X = a == b;
> +  Y = c == d;
> +  /* Before the patch cond_expr was generated for these 2 statements. */
> +  t1 = X ? e : f;
> +  t2 = Y ? e : f;
> +  t = t1 == t2;
> +  return t;
> +}
> +
> +/* Checks if pattern (X ? e : f) != (Y ? e : f) gets optimized. */
> +__GIMPLE()
> +_Bool f2_(int a, int b, int c, int d, int e, int f) {
> +  _Bool X;
> +  _Bool Y;
> +  _Bool t;
> +  int t1;
> +  int t2;
> +  X = a == b;
> +  Y = c == d;
> +  t1 = X ? e : f;
> +  t2 = Y ? e : f;
> +  t = t1 != t2;
> 

[PATCH v2] c++: Hash placeholder constraint in ctp_hasher

2024-07-17 Thread Seyed Sajad Kahani
This patch addresses a difference between the hash function and the equality
function for canonical types of template parameters (ctp_hasher). The equality
function uses comptypes (typeck.cc) (with COMPARE_STRUCTURAL) and checks
constraint equality for two auto nodes (typeck.cc:1586), while the hash
function ignores it (pt.cc:4528). This leads to hash collisions that can be
avoided by using `hash_placeholder_constraint` (constraint.cc:1150).

* constraint.cc (hash_placeholder_constraint): Rename to
iterative_hash_placeholder_constraint.
(iterative_hash_placeholder_constraint): Rename from
hash_placeholder_constraint and add the initial val argument.
* cp-tree.h (hash_placeholder_constraint): Rename to
iterative_hash_placeholder_constraint.
(iterative_hash_placeholder_constraint): Renamed from
hash_placeholder_constraint and add the initial val argument.
* pt.cc (struct ctp_hasher): Updated to use
iterative_hash_placeholder_constraint in the case of a valid placeholder
constraint.
(auto_hash::hash): Reflect the renaming of hash_placeholder_constraint
to iterative_hash_placeholder_constraint.
---
 gcc/cp/constraint.cc | 4 ++--
 gcc/cp/cp-tree.h | 2 +-
 gcc/cp/pt.cc | 9 +++--
 3 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index ebf4255e5..78aacb77a 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -1971,13 +1971,13 @@ equivalent_placeholder_constraints (tree c1, tree c2)
 /* Return a hash value for the placeholder ATOMIC_CONSTR C.  */
 
 hashval_t
-hash_placeholder_constraint (tree c)
+iterative_hash_placeholder_constraint (tree c, hashval_t val)
 {
   tree t, a;
   placeholder_extract_concept_and_args (c, t, a);
 
   /* Like hash_tmpl_and_args, but skip the first argument.  */
-  hashval_t val = iterative_hash_object (DECL_UID (t), 0);
+  val = iterative_hash_object (DECL_UID (t), val);
 
   for (int i = TREE_VEC_LENGTH (a)-1; i > 0; --i)
 val = iterative_hash_template_arg (TREE_VEC_ELT (a, i), val);
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 4bb3e9c49..294e88f75 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -8588,7 +8588,7 @@ extern tree_pair finish_type_constraints  (tree, tree, 
tsubst_flags_t);
 extern tree build_constrained_parameter (tree, tree, tree = NULL_TREE);
 extern void placeholder_extract_concept_and_args (tree, tree&, tree&);
 extern bool equivalent_placeholder_constraints  (tree, tree);
-extern hashval_t hash_placeholder_constraint   (tree);
+extern hashval_t iterative_hash_placeholder_constraint (tree, hashval_t);
 extern bool deduce_constrained_parameter(tree, tree&, tree&);
 extern tree resolve_constraint_check(tree);
 extern tree check_function_concept  (tree);
diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index d1316483e..3229c3706 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -4525,7 +4525,12 @@ struct ctp_hasher : ggc_ptr_hash
 val = iterative_hash_object (TEMPLATE_TYPE_LEVEL (t), val);
 val = iterative_hash_object (TEMPLATE_TYPE_IDX (t), val);
 if (TREE_CODE (t) == TEMPLATE_TYPE_PARM)
-  val = iterative_hash_template_arg (CLASS_PLACEHOLDER_TEMPLATE (t), val);
+  {
+   val
+ = iterative_hash_template_arg (CLASS_PLACEHOLDER_TEMPLATE (t), val);
+   if (tree c = NON_ERROR (PLACEHOLDER_TYPE_CONSTRAINTS (t)))
+ val = iterative_hash_placeholder_constraint (c, val);
+  }
 if (TREE_CODE (t) == BOUND_TEMPLATE_TEMPLATE_PARM)
   val = iterative_hash_template_arg (TYPE_TI_ARGS (t), val);
 --comparing_specializations;
@@ -29605,7 +29610,7 @@ auto_hash::hash (tree t)
   if (tree c = NON_ERROR (PLACEHOLDER_TYPE_CONSTRAINTS (t)))
 /* Matching constrained-type-specifiers denote the same template
parameter, so hash the constraint.  */
-return hash_placeholder_constraint (c);
+return iterative_hash_placeholder_constraint (c, 0);
   else
 /* But unconstrained autos are all separate, so just hash the pointer.  */
 return iterative_hash_object (t, 0);
-- 
2.45.2



Re: [PATCH] rs6000: Relax some FLOAT128 expander condition for FLOAT128_IEEE_P [PR105359]

2024-07-17 Thread Peter Bergner
On 7/17/24 4:09 AM, Kewen.Lin wrote:
>   * config/rs6000/rs6000.md (@extenddf<mode>2): Remove condition
>   TARGET_LONG_DOUBLE_128 for FLOAT128_IEEE_P modes.

This all LGTM, except this ChangeLog fragment doesn't match the code changes
below.  Rather than removing TARGET_LONG_DOUBLE_128, you've added
FLOAT128_IEEE_P (<MODE>mode).


> -  "TARGET_HARD_FLOAT && TARGET_LONG_DOUBLE_128"
> +  "TARGET_HARD_FLOAT
> +   && (TARGET_LONG_DOUBLE_128 || FLOAT128_IEEE_P (<MODE>mode))"


Peter



Re: [Ping, Fortran, Patch, PR78466, coarray, v1.1] Fix Explicit cobounds of a procedures parameter not respected

2024-07-17 Thread Andre Vehreschild
Hi all,

just pinging on this patch. The attached patch is rebased to an unmodified
master as of this afternoon (CEST 3 p.m.).

Anyone in for a review?

Regtests ok on x86_64-pc-linux-gnu / Fedora 39. Ok for mainline?

Regards,
Andre

On Wed, 10 Jul 2024 11:17:44 +0200
Andre Vehreschild  wrote:

> Hi all,
>
> the attached patch fixes explicit cobounds of procedure parameters not
> respected. The central issue is, that class (array) types store their
> attributes and `as` in the first component of the derived type. This made
> comparison of existing types harder and gfortran confused generated trees for
> different cobounds. The attached patch fixes this.
>
> Note, the patch is based
> on https://gcc.gnu.org/pipermail/fortran/2024-July/060645.html . Without it 
> the
> test poly_run_2 fails.
>
> Regtests ok on x86_64-pc-linux-gnu/Fedora 39. Ok for mainline?
>
> This patch also fixes PR fortran/80774.
>
> Regards,
>   Andre
> --
> Andre Vehreschild * Email: vehre ad gmx dot de


--
Andre Vehreschild * Email: vehre ad gmx dot de
From 5de2037da1bca3e084838458b60751965020c031 Mon Sep 17 00:00:00 2001
From: Andre Vehreschild 
Date: Thu, 31 Dec 2020 10:40:30 +0100
Subject: [PATCH] Fortran: Fix Explicit cobounds of a procedures parameter not
 respected [PR78466]

Explicit cobounds of class array procedure parameters were not taken
into account.  Furthermore were different cobounds in distinct
procedure parameter lists mixed up, i.e. the last definition was taken
for all.  The bounds are now regenerated when tree's and expr's bounds
do not match.

	PR fortran/78466
	PR fortran/80774

gcc/fortran/ChangeLog:

	* array.cc (gfc_compare_array_spec): Take cotype into account.
	* class.cc (gfc_build_class_symbol): Coarrays are also arrays.
	* gfortran.h (IS_CLASS_COARRAY_OR_ARRAY): New macro to detect
	regular and coarray class arrays.
	* interface.cc (compare_components): Take codimension into
	account.
	* resolve.cc (resolve_symbol): Improve error message.
	* simplify.cc (simplify_bound_dim): Remove duplicate.
	* trans-array.cc (gfc_trans_array_cobounds): Coarrays are also
	arrays.
	(gfc_trans_array_bounds): Same.
	(gfc_trans_dummy_array_bias): Same.
	(get_coarray_as): Get the as having a non-zero codim.
	(is_explicit_coarray): Detect explicit coarrays.
	(gfc_conv_expr_descriptor): Create a new descriptor for explicit
	coarrays.
	* trans-decl.cc (gfc_build_qualified_array): Coarrays are also
	arrays.
	(gfc_build_dummy_array_decl): Same.
	(gfc_get_symbol_decl): Same.
	(gfc_trans_deferred_vars): Same.
	* trans-expr.cc (class_scalar_coarray_to_class): Get the
	descriptor from the correct location.
	(gfc_conv_variable): Pick up the descriptor when needed.
	* trans-types.cc (gfc_is_nodesc_array): Coarrays are also
	arrays.
	(gfc_get_nodesc_array_type): Indentation fix only.
	(cobounds_match_decl): Match a tree's bounds to the expr's
	bounds and return true, when they match.
	(gfc_get_derived_type): Create a new type tree/descriptor, when
	the cobounds of the existing declaration and expr do not
	match.  This happens for class arrays in parameter lists, when
	there are different cobound declarations.

gcc/testsuite/ChangeLog:

	* gfortran.dg/coarray/poly_run_1.f90: Activate old test code.
	* gfortran.dg/coarray/poly_run_2.f90: Activate test.  It was
	stopping before and passing without an error.
---
 gcc/fortran/array.cc  |  3 +
 gcc/fortran/class.cc  |  8 +-
 gcc/fortran/gfortran.h|  5 ++
 gcc/fortran/interface.cc  |  7 ++
 gcc/fortran/resolve.cc|  3 +-
 gcc/fortran/simplify.cc   |  2 -
 gcc/fortran/trans-array.cc| 53 -
 gcc/fortran/trans-decl.cc | 20 ++---
 gcc/fortran/trans-expr.cc | 34 ++---
 gcc/fortran/trans-types.cc| 74 ---
 .../gfortran.dg/coarray/poly_run_1.f90| 33 -
 .../gfortran.dg/coarray/poly_run_2.f90| 28 ---
 12 files changed, 207 insertions(+), 63 deletions(-)

diff --git a/gcc/fortran/array.cc b/gcc/fortran/array.cc
index e9934f1491b2..79c774d59a0b 100644
--- a/gcc/fortran/array.cc
+++ b/gcc/fortran/array.cc
@@ -1017,6 +1017,9 @@ gfc_compare_array_spec (gfc_array_spec *as1, gfc_array_spec *as2)
   if (as1->type != as2->type)
 return 0;

+  if (as1->cotype != as2->cotype)
+return 0;
+
   if (as1->type == AS_EXPLICIT)
 for (i = 0; i < as1->rank + as1->corank; i++)
   {
diff --git a/gcc/fortran/class.cc b/gcc/fortran/class.cc
index abe89630be3c..b9dcc0a3d98c 100644
--- a/gcc/fortran/class.cc
+++ b/gcc/fortran/class.cc
@@ -709,8 +709,12 @@ gfc_build_class_symbol (gfc_typespec *ts, symbol_attribute *attr,
  work on the declared type. All array type other than deferred shape or
  assumed rank are added to the function namespace to ensure that they
  are properly distinguished.  */
-  if (attr->dummy && !a

Re: [Ping, Fortran, Patch, PR82904] Fix [11/12/13/14/15 Regression][Coarray] ICE in make_ssa_name_fn, at tree-ssanames.c:261

2024-07-17 Thread Andre Vehreschild
Hi all,

pinging for attached patch rebased on master and my patch for 78466.

Anyone in for a review?

Regtests ok on x86_64-pc-linux-gnu / Fedora 39. Ok for mainline?

Regards,
Andre

On Wed, 10 Jul 2024 14:51:53 +0200
Andre Vehreschild  wrote:

> Hi all,
>
> the patch attached fixes the use of an uninitialized variable for the string
> length in the declaration of the char[1:_len] type (the _len!). The type for
> save'd deferred length char arrays is now char*, so that there is no need for
> the length in the type declaration anymore. The length is of course still
> provided and needed later on.
>
> I hope this fixes the ICE in the IPA: inline phase, because I never saw it. Is
> that what you had in mind @Richard?
>
> Regtests ok on x86_64-pc-linux-gnu/Fedora 39. Ok for mainline?
>
> Regards,
>   Andre
> --
> Andre Vehreschild * Email: vehre ad gmx dot de


--
Andre Vehreschild * Email: vehre ad gmx dot de
From e66e66ed6ac191763b430789f804fabafd9dfc4c Mon Sep 17 00:00:00 2001
From: Andre Vehreschild 
Date: Wed, 10 Jul 2024 14:37:37 +0200
Subject: [PATCH] Fortran: Use char* for deferred length character arrays
 [PR82904]

Compilation would randomly ICE in the IPA inline pass.  This was
caused by a saved deferred length string.  The length variable was not
set, but the variable was used in the array's declaration.  Now using a
character pointer to prevent this.

gcc/fortran/ChangeLog:

	* trans-types.cc (gfc_sym_type): Use type `char*` for saved
	deferred length char arrays.
	* trans.cc (get_array_span): Get `.span` also for `char*` typed
	arrays, i.e. for those that have INTEGER_TYPE instead of
	ARRAY_TYPE.

gcc/testsuite/ChangeLog:

	* gfortran.dg/deferred_character_38.f90: New test.
---
 gcc/fortran/trans-types.cc|  6 --
 gcc/fortran/trans.cc  |  4 +++-
 .../gfortran.dg/deferred_character_38.f90 | 20 +++
 3 files changed, 27 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/deferred_character_38.f90

diff --git a/gcc/fortran/trans-types.cc b/gcc/fortran/trans-types.cc
index 6743adfc9bab..59d72136a0de 100644
--- a/gcc/fortran/trans-types.cc
+++ b/gcc/fortran/trans-types.cc
@@ -2334,8 +2334,10 @@ gfc_sym_type (gfc_symbol * sym, bool is_bind_c)
 	  || ((sym->attr.result || sym->attr.value)
 	  && sym->ns->proc_name
 	  && sym->ns->proc_name->attr.is_bind_c)
-	  || (sym->ts.deferred && (!sym->ts.u.cl
-   || !sym->ts.u.cl->backend_decl))
+	  || (sym->ts.deferred
+	  && (!sym->ts.u.cl
+		  || !sym->ts.u.cl->backend_decl
+		  || sym->attr.save))
 	  || (sym->attr.dummy
 	  && sym->attr.value
 	  && gfc_length_one_character_type_p (&sym->ts
diff --git a/gcc/fortran/trans.cc b/gcc/fortran/trans.cc
index 1067e032621b..d4c54093cbc3 100644
--- a/gcc/fortran/trans.cc
+++ b/gcc/fortran/trans.cc
@@ -398,7 +398,9 @@ get_array_span (tree type, tree decl)
 return gfc_conv_descriptor_span_get (decl);

   /* Return the span for deferred character length array references.  */
-  if (type && TREE_CODE (type) == ARRAY_TYPE && TYPE_STRING_FLAG (type))
+  if (type
+  && (TREE_CODE (type) == ARRAY_TYPE || TREE_CODE (type) == INTEGER_TYPE)
+  && TYPE_STRING_FLAG (type))
 {
   if (TREE_CODE (decl) == PARM_DECL)
 	decl = build_fold_indirect_ref_loc (input_location, decl);
diff --git a/gcc/testsuite/gfortran.dg/deferred_character_38.f90 b/gcc/testsuite/gfortran.dg/deferred_character_38.f90
new file mode 100644
index ..d5a6c0e50136
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/deferred_character_38.f90
@@ -0,0 +1,20 @@
+! { dg-do run }
+
+! Check for PR fortran/82904
+! Contributed by G.Steinmetz  
+
+! This test checks that 'IPA pass: inline' passes.
+! The initial version of the testcase contained coarrays, which does not work
+! yet.
+
+program p
+   save
+   character(:), allocatable :: x
+   character(:), allocatable :: y
+   allocate (character(3) :: y)
+   allocate (x, source='abc')
+   y = x
+
+   if (y /= 'abc') stop 1
+end
+
--
2.45.2



Re: [Ping, Patch, Fortran, PR88624, v1] Fix Rejects allocatable coarray passed as a dummy argument

2024-07-17 Thread Andre Vehreschild
Hi all,

and another ping...

Regtests ok on x86_64-pc-linux-gnu / Fedora 39. Ok for mainline?

- Andre

On Thu, 11 Jul 2024 15:42:54 +0200
Andre Vehreschild  wrote:

> Hi all,
>
> attached patch fixes using of coarrays as dummy arguments. The coarray
> dummy argument was not dereferenced correctly, which is fixed now.
>
> Regtests ok on x86_64-pc-linux-gnu / Fedora 39. Ok for mainline.
>
> Regards,
>   Andre
> --
> Andre Vehreschild * Email: vehre ad gcc dot gnu dot org


--
Andre Vehreschild * Email: vehre ad gmx dot de
From 7af72686efe21c14672b909646862a6fd80ca7b4 Mon Sep 17 00:00:00 2001
From: Andre Vehreschild 
Date: Thu, 11 Jul 2024 10:07:12 +0200
Subject: [PATCH] [Fortran] Fix Rejects allocatable coarray passed as a dummy
 argument [88624]

Coarray parameters of procedures/functions need to be dereffed, because
they are references to the descriptor but the routine expected the
descriptor directly.

	PR fortran/88624

gcc/fortran/ChangeLog:

	* trans-expr.cc (gfc_conv_procedure_call): Treat
	pointers/references (e.g. from parameters) correctly by derefing
	them.

gcc/testsuite/ChangeLog:

	* gfortran.dg/coarray/dummy_1.f90: Add calling a function
	through a function.
---
 gcc/fortran/trans-expr.cc | 35 +--
 gcc/testsuite/gfortran.dg/coarray/dummy_1.f90 |  2 ++
 2 files changed, 27 insertions(+), 10 deletions(-)

diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
index d9eb333abcb1..feb43fdec746 100644
--- a/gcc/fortran/trans-expr.cc
+++ b/gcc/fortran/trans-expr.cc
@@ -7773,16 +7773,26 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym,
 		   && CLASS_DATA (fsym)->attr.codimension
 		   && !CLASS_DATA (fsym)->attr.allocatable)))
 	{
-	  tree caf_decl, caf_type;
+	  tree caf_decl, caf_type, caf_desc = NULL_TREE;
 	  tree offset, tmp2;

 	  caf_decl = gfc_get_tree_for_caf_expr (e);
 	  caf_type = TREE_TYPE (caf_decl);
-
-	  if (GFC_DESCRIPTOR_TYPE_P (caf_type)
-	  && (GFC_TYPE_ARRAY_AKIND (caf_type) == GFC_ARRAY_ALLOCATABLE
-		  || GFC_TYPE_ARRAY_AKIND (caf_type) == GFC_ARRAY_POINTER))
-	tmp = gfc_conv_descriptor_token (caf_decl);
+	  if (POINTER_TYPE_P (caf_type)
+	  && GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (caf_type)))
+	caf_desc = TREE_TYPE (caf_type);
+	  else if (GFC_DESCRIPTOR_TYPE_P (caf_type))
+	caf_desc = caf_type;
+
+	  if (caf_desc
+	  && (GFC_TYPE_ARRAY_AKIND (caf_desc) == GFC_ARRAY_ALLOCATABLE
+		  || GFC_TYPE_ARRAY_AKIND (caf_desc) == GFC_ARRAY_POINTER))
+	{
+	  tmp = POINTER_TYPE_P (TREE_TYPE (caf_decl))
+		  ? build_fold_indirect_ref (caf_decl)
+		  : caf_decl;
+	  tmp = gfc_conv_descriptor_token (tmp);
+	}
 	  else if (DECL_LANG_SPECIFIC (caf_decl)
 		   && GFC_DECL_TOKEN (caf_decl) != NULL_TREE)
 	tmp = GFC_DECL_TOKEN (caf_decl);
@@ -7795,8 +7805,8 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym,

 	  vec_safe_push (stringargs, tmp);

-	  if (GFC_DESCRIPTOR_TYPE_P (caf_type)
-	  && GFC_TYPE_ARRAY_AKIND (caf_type) == GFC_ARRAY_ALLOCATABLE)
+	  if (caf_desc
+	  && GFC_TYPE_ARRAY_AKIND (caf_desc) == GFC_ARRAY_ALLOCATABLE)
 	offset = build_int_cst (gfc_array_index_type, 0);
 	  else if (DECL_LANG_SPECIFIC (caf_decl)
 		   && GFC_DECL_CAF_OFFSET (caf_decl) != NULL_TREE)
@@ -7806,8 +7816,13 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym,
 	  else
 	offset = build_int_cst (gfc_array_index_type, 0);

-	  if (GFC_DESCRIPTOR_TYPE_P (caf_type))
-	tmp = gfc_conv_descriptor_data_get (caf_decl);
+	  if (caf_desc)
+	{
+	  tmp = POINTER_TYPE_P (TREE_TYPE (caf_decl))
+		  ? build_fold_indirect_ref (caf_decl)
+		  : caf_decl;
+	  tmp = gfc_conv_descriptor_data_get (tmp);
+	}
 	  else
 	{
 	  gcc_assert (POINTER_TYPE_P (caf_type));
diff --git a/gcc/testsuite/gfortran.dg/coarray/dummy_1.f90 b/gcc/testsuite/gfortran.dg/coarray/dummy_1.f90
index 33e95853ad4a..c437b2a10fc4 100644
--- a/gcc/testsuite/gfortran.dg/coarray/dummy_1.f90
+++ b/gcc/testsuite/gfortran.dg/coarray/dummy_1.f90
@@ -66,5 +66,7 @@
 if (lcobound(A, dim=1) /= 2) STOP 13
 if (ucobound(A, dim=1) /= 3) STOP 14
 if (lcobound(A, dim=2) /= 5) STOP 15
+
+call sub4(A)  ! Check PR88624 is fixed.
   end subroutine sub5
   end
--
2.45.2



Re: [Ping, Patch, Fortran, PR84244, v1] Fix ICE in recompute_tree_invariant_for_addr_expr, at tree.c:4535

2024-07-17 Thread Andre Vehreschild
Hi all,

and the last ping.

Regtests ok on x86_64-pc-linux-gnu / Fedora 39. Ok for mainline?

Regards,
Andre

On Thu, 11 Jul 2024 16:05:09 +0200
Andre Vehreschild  wrote:

> Hi all,
>
> the attached patch fixes a segfault in the compiler, where for pointer
> components of a derived type the caf_token in the component was not
> set, when the derived was previously used outside of a coarray.
>
> Regtests ok on x86_64-pc-linux-gnu / Fedora 39. Ok for mainline?
>
> Regards,
>   Andre


--
Andre Vehreschild * Email: vehre ad gmx dot de
From ff2d1145fc008f23e835054e6bb668be0430fdd7 Mon Sep 17 00:00:00 2001
From: Andre Vehreschild 
Date: Thu, 11 Jul 2024 15:44:56 +0200
Subject: [PATCH] [Fortran] Fix ICE in recompute_tree_invariant_for_addr_expr,
 at tree.c:4535 [PR84244]

Declaring an unused function with a derived type having a pointer
component and using that derived type as a coarray led the compiler to
ICE because the caf_token for the pointer was not linked into the
component correctly.

	PR fortran/84244

gcc/fortran/ChangeLog:

	* trans-types.cc (gfc_get_derived_type): When a caf_sub_token is
	generated for a component, link it to the component it is
	generated for (the previous one).

gcc/testsuite/ChangeLog:

	* gfortran.dg/coarray/ptr_comp_5.f08: New test.
---
 gcc/fortran/trans-types.cc|  6 +-
 .../gfortran.dg/coarray/ptr_comp_5.f08| 19 +++
 2 files changed, 24 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gfortran.dg/coarray/ptr_comp_5.f08

diff --git a/gcc/fortran/trans-types.cc b/gcc/fortran/trans-types.cc
index 59d72136a0de..71415385c8c3 100644
--- a/gcc/fortran/trans-types.cc
+++ b/gcc/fortran/trans-types.cc
@@ -2661,7 +2661,7 @@ gfc_get_derived_type (gfc_symbol * derived, int codimen)
   tree *chain = NULL;
   bool got_canonical = false;
   bool unlimited_entity = false;
-  gfc_component *c;
+  gfc_component *c, *last_c = nullptr;
   gfc_namespace *ns;
   tree tmp;
   bool coarray_flag, class_coarray_flag;
@@ -2961,10 +2961,14 @@ gfc_get_derived_type (gfc_symbol * derived, int codimen)
 	 types.  */
   if (class_coarray_flag || !c->backend_decl)
 	c->backend_decl = field;
+  if (c->attr.caf_token && last_c)
+	last_c->caf_token = field;

   if (c->attr.pointer && (c->attr.dimension || c->attr.codimension)
 	  && !(c->ts.type == BT_DERIVED && strcmp (c->name, "_data") == 0))
 	GFC_DECL_PTR_ARRAY_P (c->backend_decl) = 1;
+
+  last_c = c;
 }

   /* Now lay out the derived type, including the fields.  */
diff --git a/gcc/testsuite/gfortran.dg/coarray/ptr_comp_5.f08 b/gcc/testsuite/gfortran.dg/coarray/ptr_comp_5.f08
new file mode 100644
index ..ed3a8db13fa5
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/coarray/ptr_comp_5.f08
@@ -0,0 +1,19 @@
+! { dg-do compile }
+
+! Check PR84244 does not ICE anymore.
+
+program ptr_comp_5
+  integer, target :: dest = 42
+  type t
+integer, pointer :: p
+  end type
+  type(t) :: o[*]
+
+  o%p => dest
+contains
+  ! This unused routine is crucial for the ICE.
+  function f(x)
+type(t), intent(in) ::x
+  end function
+end program
+
--
2.45.2



[Fortran, Patch, coarray, PR84246] Fix for [Coarray] ICE in conv_caf_send, at fortran/trans-intrinsic.c:1950

2024-07-17 Thread Andre Vehreschild
Hi all,

attached patch fixes an ICE in coarray code, where the coarray expression
already was a pointer, which confused the compiler. Furthermore, I have removed
a rewrite to a caf_send late in the trans-phase. This is now done in the
resolve stage to have only a single place for these rewriting actions.

Regtests ok on x86_64-pc-linux-gnu / Fedora 39. Ok for mainline?

Regards,
Andre
--
Andre Vehreschild * Email: vehre ad gmx dot de
From e073b32bd792f7db92334e89e546095c9ad583f8 Mon Sep 17 00:00:00 2001
From: Andre Vehreschild 
Date: Wed, 17 Jul 2024 12:30:52 +0200
Subject: [PATCH] Fortran: Fix [Coarray] ICE in conv_caf_send, at
 fortran/trans-intrinsic.c:1950 [PR84246]

Fix ICE caused by the converted expression already being a pointer, by
checking its type.  Lift the rewrite to caf_send completely into resolve
and prevent more temporary arrays.

	PR fortran/84246

gcc/fortran/ChangeLog:

	* resolve.cc (caf_possible_reallocate): Detect arrays that may
	be reallocated by caf_send.
	(resolve_ordinary_assign): More reliably detect assignments
	where a rewrite to caf_send is needed.
	* trans-expr.cc (gfc_trans_assignment_1): Remove rewrite to
	caf_send, because this is done by resolve now.
	* trans-intrinsic.cc (conv_caf_send): Prevent unneeded temporary
	arrays.

libgfortran/ChangeLog:

	* caf/single.c (send_by_ref): Created array's lbound is now 1
	and the offset set correctly.

gcc/testsuite/ChangeLog:

	* gfortran.dg/coarray_allocate_7.f08: Adapt to array being
	allocated by caf_send.
---
 gcc/fortran/resolve.cc| 18 +++
 gcc/fortran/trans-expr.cc | 23 ---
 gcc/fortran/trans-intrinsic.cc| 17 --
 .../gfortran.dg/coarray_allocate_7.f08|  4 +---
 libgfortran/caf/single.c  |  6 ++---
 5 files changed, 32 insertions(+), 36 deletions(-)

diff --git a/gcc/fortran/resolve.cc b/gcc/fortran/resolve.cc
index 503029364c14..3bf48f793d80 100644
--- a/gcc/fortran/resolve.cc
+++ b/gcc/fortran/resolve.cc
@@ -11450,6 +11450,23 @@ gfc_resolve_blocks (gfc_code *b, gfc_namespace *ns)
 }
 }

+bool
+caf_possible_reallocate (gfc_expr *e)
+{
+  symbol_attribute caf_attr;
+  gfc_ref *last_arr_ref = nullptr;
+
+  caf_attr = gfc_caf_attr (e);
+  if (!caf_attr.codimension || !caf_attr.allocatable || !caf_attr.dimension)
+return false;
+
+  /* Only full array refs can indicate a needed reallocation.  */
+  for (gfc_ref *ref = e->ref; ref; ref = ref->next)
+if (ref->type == REF_ARRAY && ref->u.ar.dimen)
+  last_arr_ref = ref;
+
+  return last_arr_ref && last_arr_ref->u.ar.type == AR_FULL;
+}

 /* Does everything to resolve an ordinary assignment.  Returns true
if this is an interface assignment.  */
@@ -11694,6 +11711,7 @@ resolve_ordinary_assign (gfc_code *code, gfc_namespace *ns)

   bool caf_convert_to_send = flag_coarray == GFC_FCOARRAY_LIB
   && (lhs_coindexed
+	  || caf_possible_reallocate (lhs)
 	  || (code->expr2->expr_type == EXPR_FUNCTION
 	  && code->expr2->value.function.isym
 	  && code->expr2->value.function.isym->id == GFC_ISYM_CAF_GET
diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
index feb43fdec746..d4bfb39a774f 100644
--- a/gcc/fortran/trans-expr.cc
+++ b/gcc/fortran/trans-expr.cc
@@ -12598,29 +12598,6 @@ gfc_trans_assignment_1 (gfc_expr * expr1, gfc_expr * expr2, bool init_flag,

   expr1->must_finalize = 0;
 }
-  else if (flag_coarray == GFC_FCOARRAY_LIB
-	   && lhs_caf_attr.codimension && rhs_caf_attr.codimension
-	   && ((lhs_caf_attr.allocatable && lhs_refs_comp)
-	   || (rhs_caf_attr.allocatable && rhs_refs_comp)))
-{
-  /* Only detour to caf_send[get][_by_ref] () when the lhs or rhs is an
-	 allocatable component, because those need to be accessed via the
-	 caf-runtime.  No need to check for coindexes here, because resolve
-	 has rewritten those already.  */
-  gfc_code code;
-  gfc_actual_arglist a1, a2;
-  /* Clear the structures to prevent accessing garbage.  */
-  memset (&code, '\0', sizeof (gfc_code));
-  memset (&a1, '\0', sizeof (gfc_actual_arglist));
-  memset (&a2, '\0', sizeof (gfc_actual_arglist));
-  a1.expr = expr1;
-  a1.next = &a2;
-  a2.expr = expr2;
-  a2.next = NULL;
-  code.ext.actual = &a1;
-  code.resolved_isym = gfc_intrinsic_subroutine_by_id (GFC_ISYM_CAF_SEND);
-  tmp = gfc_conv_intrinsic_subroutine (&code);
-}
   else if (!is_poly_assign && expr2->must_finalize
 	   && expr1->ts.type == BT_CLASS
 	   && expr2->ts.type == BT_CLASS)
diff --git a/gcc/fortran/trans-intrinsic.cc b/gcc/fortran/trans-intrinsic.cc
index 180d0d7a88c6..d58bea30101c 100644
--- a/gcc/fortran/trans-intrinsic.cc
+++ b/gcc/fortran/trans-intrinsic.cc
@@ -1945,11 +1945,14 @@ conv_caf_send (gfc_code *code) {
   tree lhs_type = NULL_TREE;
   tree vec = null_pointer_node, rhs_vec = null_pointer_node;
   symbol_attribute lhs_caf_attr, rhs_caf_attr;
+  bool lhs_is_coindexe

Re: [RFC] Proposal to support Packed Boolean Vector masks.

2024-07-17 Thread Richard Sandiford
Richard Biener  writes:
> On Wed, Jul 17, 2024 at 1:53 PM Tejas Belagod  wrote:
>>
>> On 7/17/24 4:36 PM, Richard Biener wrote:
>> > On Wed, Jul 17, 2024 at 10:17 AM Tejas Belagod  
>> > wrote:
>> >>
>> >> On 7/15/24 6:05 PM, Richard Biener wrote:
>> >>> On Mon, Jul 15, 2024 at 1:22 PM Tejas Belagod  
>> >>> wrote:
>> 
>>  On 7/15/24 12:16 PM, Tejas Belagod wrote:
>> > On 7/12/24 6:40 PM, Richard Biener wrote:
>> >> On Fri, Jul 12, 2024 at 3:05 PM Jakub Jelinek  
>> >> wrote:
>> >>>
>> >>> On Fri, Jul 12, 2024 at 02:56:53PM +0200, Richard Biener wrote:
>>  Padding is only an issue for very small vectors - the obvious 
>>  choice is
>>  to disallow vector types that would require any padding.  I can 
>>  hardly
>>  see where those are faster than using a vector of up to 4 char
>>  elements.
>>  Problematic are 1-bit elements with 4, 2 or one element vectors,
>>  2-bit elements
>>  with 2 or one element vectors and 4-bit elements with 1 element
>>  vectors.
>> >>>
>> >>> I'd really like to avoid having to support something like
>> >>> _BitInt(16372) __attribute__((vector_size (sizeof (_BitInt(16372)) *
>> >>> 16)))
>> >>> _BitInt(2) to say size of long long could be acceptable.
>> >>
>> >> I'd disallow _BitInt(n) with n >= 8, it should be just the syntactic
>> >> way to say
>> >> the element should have n (< 8) bits.
>> >>
>>  I have no idea what the stance of supporting _BitInt in C++ are,
>>  but most certainly diverging support (or even semantics) of the
>>  vector extension in C vs. C++ is undesirable.
>> >>>
>> >>> I believe Clang supports it in C++ next to C, GCC doesn't and Jason
>> >>> didn't
>> >>> look favorably to _BitInt support in C++, so at least until something
>> >>> like
>> >>> that is standardized in C++ the answer is probably no.
>> >>
>> >> OK, I think that rules out _BitInt use here so while bool is then 
>> >> natural
>> >> for 1-bit elements for 2-bit and 4-bit elements we'd have to specify 
>> >> the
>> >> number of bits explicitly.  There is signed_bool_precision but like
>> >> vector_mask its use is restricted to the GIMPLE frontend because
>> >> interaction with the rest of the language isn't defined.
>> >>
>> >
>> > Thanks for all the suggestions - really insightful (to me) discussions.
>> >
>> > Yeah, BitInt seemed like it was best placed for this, but not having 
>> > C++
>> > support is definitely a blocker. But as you say, in the absence of
>> > BitInt, bool becomes the natural choice for bit sizes 1, 2 and 4. One
>> > way to specify non-1-bit widths could be overloading vector_size.
>> >
>> > Also, I think overloading GIMPLE's vector_mask takes us into the
>> > earlier-discussed territory of what it should actually mean - it 
>> > meaning
>> > the target truth type in GIMPLE and a generic vector extension in the 
>> > FE
>> > will probably confuse gcc developers more than users.
>> >
>> >> That said - we're mixing two things here.  The desire to have "proper"
>> >> svbool (fix: declare in the backend) and the desire to have "packed"
>> >> bit-precision vectors (for whatever actual reason) as part of the
>> >> GCC vector extension.
>> >>
>> >
>> > If we leave lane-disambiguation of svbool to the backend, the values I
>> > see in supporting 1, 2 and 4 bitsizes are 1) first step towards
>> > supporting BitInt(N) vectors possibly in the future 2) having a way for
>> > targets to define their intrinsics' bool vector types using GNU
>> > extensions 3) feature parity with Clang's ext_vector_type?
>> >
>> > I believe the primary motivation for Clang to support ext_vector_type
>> > was to have a way to define target intrinsics' vector bool type using
>> > vector extensions.
>> >
>> 
>> 
>>  Interestingly, Clang seems to support
>> 
>>  typedef struct {
>> _Bool i:1;
>>  } STR;
>> 
>>  typedef struct { _Bool i: 1; } __attribute__((vector_size (sizeof (STR)
>>  * 4))) vec;
>> 
>> 
>>  int foo (vec b) {
>>    return sizeof b;
>>  }
>> 
>>  I can't find documentation about how it is implemented, but I suspect
>>  the vector is constructed as an array STR[] i.e. possibly each
>>  bit-element padded to byte boundary etc. Also, I can't seem to apply
>>  many operations other than sizeof.
>> 
>>  I don't know if we've tried to support such cases in GNU in the past?
>> >>>
>> >>> Why should we do that?  It doesn't make much sense.
>> >>>
>> >>> single-bit vectors is what _BitInt was invented for.
>> >>
>> >> Forgive me if I'm misunderstanding - I'm trying to figure out how
>> >> _BitInts can be made to have single-bit generic vector semantic

[PATCH v3] testsuite: Avoid running incompatible Arm tests

2024-07-17 Thread Torbjörn SVENSSON
Ok for trunk and releases/gcc-14?

Changes since v2:

- The case when -mfpu and/or -mfloat-abi is defined to an incompatible value.
  With v3, the defined multilib wins the race and the flag is no longer
  overridden if set.

Changes since v1:

- Fixed a regression for armv8l-unknown-linux-gnueabihf reported by Linaro 
TCWG-CI.

--

Overriding the -mfpu and -mfloat-abi options might be incompatible with the
selected multilib. As a result, verify that the current multilib is
compatible with the effective target without changing the -mfpu or
-mfloat-abi options.

gcc/testsuite/ChangeLog:

* lib/target-supports.exp
(check_effective_target_arm_hard_vfp_ok): Check -mfpu value.
(check_effective_target_arm_fp16_ok_nocache): Do not override
-mfpu or -mfloat-abi if defined.
(check_effective_target_arm_fp16_alternative_ok_nocache):
Reuse check_effective_target_arm_fp16_ok.
(check_effective_target_arm_fp16_none_ok_nocache): Likewise.
(check_effective_target_arm_v8_neon_ok_nocache): Align checks
with skeleton from check_effective_target_arm_fp16_ok_nocache.
(check_effective_target_arm_neonv2_ok_nocache): Likewise.

Signed-off-by: Torbjörn SVENSSON 
Co-authored-by: Yvan ROUX 
---
 gcc/testsuite/lib/target-supports.exp | 156 +++---
 1 file changed, 116 insertions(+), 40 deletions(-)

diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index f001c28072f..e0de44872e0 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -4829,6 +4829,7 @@ proc check_effective_target_arm_v8_vfp_ok {} {
 
 proc check_effective_target_arm_hard_vfp_ok { } {
 if { [check_effective_target_arm32]
+&& ! [check-flags [list "" { *-*-* } { "-mfpu=*" } { "-mfpu=vfp" }]]
 && ! [check-flags [list "" { *-*-* } { "-mfloat-abi=*" } { "-mfloat-abi=hard" }]] } {
return [check_no_compiler_messages arm_hard_vfp_ok executable {
int main() { return 0;}
@@ -5405,11 +5406,12 @@ proc check_effective_target_arm_fp16_alternative_ok_nocache { } {
# Not supported by the target system.
return 0
 }
+global et_arm_fp16_flags
 global et_arm_fp16_alternative_flags
 set et_arm_fp16_alternative_flags ""
-if { [check_effective_target_arm32] } {
-   foreach flags {"" "-mfloat-abi=softfp" "-mfpu=neon-fp16"
-  "-mfpu=neon-fp16 -mfloat-abi=softfp"} {
+
+if { [check_effective_target_arm32] && [check_effective_target_arm_fp16_ok] } {
+   foreach flags [list "" $et_arm_fp16_flags] {
if { [check_no_compiler_messages_nocache \
  arm_fp16_alternative_ok object {
#if !defined (__ARM_FP16_FORMAT_ALTERNATIVE) || ! (__ARM_FP & 2)
@@ -5434,9 +5436,9 @@ proc check_effective_target_arm_fp16_alternative_ok { } {
 # format.  Some multilibs may be incompatible with the options needed.
 
 proc check_effective_target_arm_fp16_none_ok_nocache { } {
-if { [check_effective_target_arm32] } {
-   foreach flags {"" "-mfloat-abi=softfp" "-mfpu=neon-fp16"
-  "-mfpu=neon-fp16 -mfloat-abi=softfp"} {
+global et_arm_fp16_flags
+if { [check_effective_target_arm32] && [check_effective_target_arm_fp16_ok] } {
+   foreach flags [list "" $et_arm_fp16_flags] {
if { [check_no_compiler_messages_nocache \
  arm_fp16_none_ok object {
#if defined (__ARM_FP16_FORMAT_ALTERNATIVE)
@@ -5467,23 +5469,55 @@ proc check_effective_target_arm_fp16_none_ok { } {
 proc check_effective_target_arm_v8_neon_ok_nocache { } {
 global et_arm_v8_neon_flags
 set et_arm_v8_neon_flags ""
-if { [check_effective_target_arm32] } {
-   foreach flags {"" "-mfloat-abi=softfp" "-mfpu=neon-fp-armv8" "-mfpu=neon-fp-armv8 -mfloat-abi=softfp"} {
-   if { [check_no_compiler_messages_nocache arm_v8_neon_ok object {
-   #if __ARM_ARCH < 8
-   #error not armv8 or later
-   #endif
-   #include "arm_neon.h"
-   void
-   foo ()
-   {
- __asm__ volatile ("vrintn.f32 q0, q0");
-   }
-   } "$flags -march=armv8-a"] } {
-   set et_arm_v8_neon_flags $flags
-   return 1
-   }
+if { ! [check_effective_target_arm32] } {
+   return 0;
+}
+if [check-flags \
+   [list "" { *-*-* } { "-mfpu=*" } \
+{ "-mfpu=*fp-armv8*" } ]] {
+   # Multilib flags would override -mfpu.
+   return 0
+}
+if [check-flags [list "" { *-*-* } { "-mfloat-abi=soft" } { "" } ]] {
+   # Must generate floating-point instructions.
+   return 0
+}
+if [check_effective_target_arm_hf_eabi] {
+   if [check-flags [list "" { *-*-* } { "-mfpu=*" } { "" } ]] {
+   # Use existing -mfpu value and -mfloat-abi value
+   set et_arm_v8_neon

Re: [RFC] Proposal to support Packed Boolean Vector masks.

2024-07-17 Thread Richard Biener
On Wed, Jul 17, 2024 at 3:17 PM Richard Sandiford
 wrote:
[RFC] x86: Implement separate shrink wrapping

2024-07-17 Thread Michael Matz
Hello,

I have implemented the separate shrink wrapping hooks for x86, and this is 
the result.  With it we can now generate the pro- and epilogue sequences 
individually and possibly split over multiple BBs, unlike the non-separate 
shrink wrapping we implement (which can only move the whole 
prologue/epilogue sequence to a different single place).  For instance, in 
this example:

```c
int g(void);
int h(void);
int f(int x)
{
if (x >= 42) {
register int reg1 asm("bx") = x;
asm volatile("#before 1" : "+r"(reg1));
h();
return 1 + g();
}
if (x == 31) {
register int reg2 asm("r13") = x;
asm volatile("#before 2" : "+r"(reg2));
return 2 + h();
}
return x;
}
```

(the asms are there to simulate stuff that needs doing which clobbers 
some callee saved registers) the generated code is:

```asm
f:
movl%edi, %eax #nothing save/restored in fast path
cmpl$41, %edi
jg  .L6
cmpl$31, %edi
je  .L7
ret

.L6:
movq%rbx, -16(%rsp)# regsave2
subq$24, %rsp  # frame-alloc/align
movl%edi, %ebx
#before 1
callh
callg
movq8(%rsp), %rbx  # regrestore2
addq$24, %rsp
addl$1, %eax
ret

.L7:
movq%r13, -8(%rsp) # regsave1
subq$24, %rsp
movl$31, %r13d
#before 2
callh
movq16(%rsp), %r13 # regrestore1
addq$24, %rsp
addl$2, %eax
ret
```

In particular the frame allocation is also moved, and in each BB only the 
(single) register that needs saving/restoring is so, making the early-out 
as small as possible.

The patch regstraps on x86_64-linux and works as advertised.  But 
unfortunately the performance results on SPECcpu2017 on a zen3 microarch 
are a wash, and they come at the cost of a code size increase of about 2.5%. 
I guess the stack engines of recent CPUs are simply so good that the 
push/pop sequences we normally use don't matter in the OOO pipelines.  
Some benchmarks speed up by a percent, some slow down, so that's just 
noise and probably more related to code layout changes caused by the code 
size changes.

So, on recent CPU microarchs separate shrink wrapping doesn't seem useful, 
and hence this is really only a request for comments.  I don't intend to 
push for inclusion of this patch.  Though, if people benchmark it on older 
microarchs I would be thankful; if it brings anything the separate shrink 
wrapping could be included behind a tuning option.


Ciao,
Michael.

P.S.: the SPEC results are below (with only -O2, as I wanted to see the
effects for 'normal' compilations; x264 is missing for unrelated reasons;
"SSW" = separate shrink wrapping):

Name            | no SSW         | SSW
                | runtime | rate | runtime | rate
-------------------------------------------------
500.perlbench_r |  236    | 6.73 |  242    | 6.58
502.gcc_r       |  187    | 7.59 |  185    | 7.66
505.mcf_r       |  247    | 6.55 |  245    | 6.60
520.omnetpp_r   |  292    | 4.49 |  288    | 4.55
523.xalancbmk_r |  214    | 4.94 |  213    | 4.96
531.deepsjeng_r |  237    | 4.84 |  237    | 4.84
541.leela_r     |  371    | 4.46 |  369    | 4.49
548.exchange2_r |  228    | 11.5 |  232    | 11.3
557.xz_r        |  263    | 4.11 |  266    | 4.06
-------------------------------------------------
overall INTrate |           5.81 |           5.80

Name            | no SSW         | SSW
                | runtime | rate | runtime | rate
-------------------------------------------------
503.bwaves_r    |  248    | 40.4 |  248    | 40.5
507.cactuBSSN_r |  196    | 6.45 |  195    | 6.50
508.namd_r      |  186    | 5.12 |  185    | 5.12
510.parest_r    |  339    | 7.73 |  339    | 7.72
511.povray_r    |  324    | 7.21 |  327    | 7.14
519.lbm_r       |  150    | 7.03 |  150    | 7.05
521.wrf_r       |  316    | 7.09 |  317    | 7.08
526.blender_r   |  211    | 7.22 |  209    | 7.27
527.cam4_r      |  217    | 8.05 |  216    | 8.09
538.imagick_r   |  438    | 5.68 |  438    | 5.67
544.nab_r       |  234    | 7.20 |  232    | 7.24
549.fotonik3d_r |  254    | 15.3 |  254    | 15.3
554.roms_r      |  287    | 5.54 |  281    | 5.65
-------------------------------------------------
overall FPrate  |           8.19 |           8.21

-- >8 --

this adds support for the infrastructure for shrink wrapping
separate components to the x86 target.  The components we track
are individual registers to save/restore and the frame allocation
itself.
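
The bookkeeping can be pictured as a per-BB set of components.  Here is a
deliberately simplified C sketch with a made-up bitmask encoding (not the
actual shrink-wrap target hook interface): one bit per callee-saved
register plus one for the frame allocation, where the code emitted on entry
to a BB establishes the components the BB needs minus those already active:

```c
#include <assert.h>

/* Made-up encoding for illustration: bit i = callee-saved register i
   needs save/restore, top bit = the frame allocation itself.  */
#define COMP_REG(i)  (1u << (i))
#define COMP_FRAME   (1u << 31)

/* Components that must be established when entering a BB: what the BB
   needs minus what is already active on the path leading to it.  */
static unsigned
prologue_components (unsigned needed, unsigned active)
{
  return needed & ~active;
}
```

In the f() example above, .L6's set would be {rbx, frame}, .L7's {r13,
frame}, and the fast path's set is empty, which is exactly why nothing is
saved or restored there.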

There are various limitations where we give up:
* when the frame becomes too large
* when any complicated realignment is needed (DRAP or not)
* when the calling convention requires certai

[PATCH] varasm: Shorten assembly of strings with larger zero regions

2024-07-17 Thread Jakub Jelinek
Hi!

When not using the .base64 directive, we emit for long sequences of zeros
.string "foobarbaz"
.string ""
.string ""
.string ""
.string ""
.string ""
.string ""
.string ""
.string ""
.string ""
.string ""
.string ""
.string ""
The following patch changes that to
.string "foobarbaz"
.zero   12
It keeps emitting .string "" if there is just one zero or two zeros where
the first one is preceded by non-zeros, so we can have
.string "foobarbaz"
.string ""
or
.base64 "VG8gYmUgb3Igbm90IHRvIGJlLCB0aGF0IGlzIHRoZSBxdWVzdGlvbg=="
.string ""
but not 2 .string "" in a row.
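
The rule can be sketched as a standalone C function (illustrative only; the
real logic lives in default_elf_asm_output_ascii and also weighs the .base64
heuristics and escaping): the NUL terminating a .string is free, a single
extra NUL still becomes .string "", and two or more extra NULs collapse into
one .zero directive:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Sketch of the emission rule described above, not GCC's varasm code.
   Writes directives for BUF/LEN into OUT and returns the number of
   characters written.  Assumes BUF needs no escaping.  */
static int
emit_bytes (char *out, const unsigned char *buf, size_t len)
{
  char *p = out;
  size_t i = 0;
  while (i < len)
    {
      /* A .string covers the run of non-NULs plus its terminating NUL.  */
      size_t start = i;
      while (i < len && buf[i] != 0)
        i++;
      p += sprintf (p, "\t.string\t\"%.*s\"\n", (int) (i - start),
                    buf + start);
      if (i < len)
        i++;    /* the NUL consumed by the .string above */
      /* Count the extra NULs that follow: one still becomes
         .string "", two or more collapse into .zero N.  */
      size_t zeros = 0;
      while (i + zeros < len && buf[i + zeros] == 0)
        zeros++;
      if (zeros == 1)
        p += sprintf (p, "\t.string\t\"\"\n");
      else if (zeros >= 2)
        p += sprintf (p, "\t.zero\t%d\n", (int) zeros);
      i += zeros;
    }
  return (int) (p - out);
}
```

For the motivating example, "foobarbaz" followed by thirteen NULs comes out
as one .string plus .zero 12, matching the shortened output above.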

On a testcase I have with around 310440 0-255 unsigned char character
constants mostly derived from cc1plus start but with too long sequences of
0s which broke transformation to STRING_CST adjusted to have at most 126
consecutive 0s, I see:
1504498 bytes long assembly without this patch on i686-linux (without
.base64 support in binutils)
1155071 bytes long assembly with this patch on i686-linux (without .base64
support in binutils)
431390 bytes long assembly without this patch on x86_64-linux (with
.base64 support in binutils)
427593 bytes long assembly with this patch on x86_64-linux (with .base64
support in binutils)
All 4 assemble to an identical *.o file when using x86_64-linux gas with
.base64 support, and the former 2 likewise assemble to identical content
when using older x86_64-linux gas.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-07-17  Jakub Jelinek  

* varasm.cc (default_elf_asm_output_ascii): Use ASM_OUTPUT_SKIP instead
of 2 or more default_elf_asm_output_limited_string (f, "") calls and
adjust base64 heuristics correspondingly.

--- gcc/varasm.cc.jj	2024-07-16 13:36:54.259748720 +0200
+++ gcc/varasm.cc   2024-07-16 14:08:19.211753867 +0200
@@ -8538,6 +8538,7 @@ default_elf_asm_output_ascii (FILE *f, c
   if (s >= last_base64)
{
  unsigned cnt = 0;
+ unsigned char prev_c = ' ';
  const char *t;
  for (t = s; t < limit && (t - s) < (long) ELF_STRING_LIMIT - 1; t++)
{
@@ -8560,7 +8561,13 @@ default_elf_asm_output_ascii (FILE *f, c
  break;
case 1:
  if (c == 0)
-   cnt += 2 + strlen (STRING_ASM_OP) + 1;
+   {
+ if (prev_c == 0
+ && t + 1 < limit
+ && (t + 1 - s) < (long) ELF_STRING_LIMIT - 1)
+   break;
+ cnt += 2 + strlen (STRING_ASM_OP) + 1;
+   }
  else
cnt += 4;
  break;
@@ -8568,6 +8575,7 @@ default_elf_asm_output_ascii (FILE *f, c
  cnt += 2;
  break;
}
+ prev_c = c;
}
  if (cnt > ((unsigned) (t - s) + 2) / 3 * 4 && (t - s) >= 3)
{
@@ -8633,8 +8641,18 @@ default_elf_asm_output_ascii (FILE *f, c
  bytes_in_chunk = 0;
}
 
- default_elf_asm_output_limited_string (f, s);
- s = p;
+ if (p == s && p + 1 < limit && p[1] == '\0')
+   {
+ for (p = s + 2; p < limit && *p == '\0'; p++)
+   continue;
+ ASM_OUTPUT_SKIP (f, (unsigned HOST_WIDE_INT) (p - s));
+ s = p - 1;
+   }
+ else
+   {
+ default_elf_asm_output_limited_string (f, s);
+ s = p;
+   }
}
   else
{

Jakub



[PATCH] bitint: Use gsi_insert_on_edge rather than gsi_insert_on_edge_immediate [PR115887]

2024-07-17 Thread Jakub Jelinek
Hi!

The following testcase ICEs on x86_64-linux, because we try to
gsi_insert_on_edge_immediate a statement on an edge which already has
statements queued with gsi_insert_on_edge, and the deferral has been
intentional so that we don't need to deal with cfg changes in between.

The following patch uses the delayed insertion as well.
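
gsi_insert_on_edge only queues the statement; the queued insertions are
committed later in one batch, so the pass never has to cope with edges
being split underneath it mid-walk. A generic C sketch of that deferred
pattern (made-up names, not the gimple-iterator API):

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative pattern only: requests found while walking a structure
   are queued and committed in one batch afterwards, so the walk never
   observes a partially modified structure.  */
struct insertion { int edge; int stmt; };

static struct insertion queue[32];
static size_t n_queued;

/* The deferred variant: just remember the request.  */
static void
insert_on_edge (int edge, int stmt)
{
  queue[n_queued].edge = edge;
  queue[n_queued].stmt = stmt;
  n_queued++;
}

/* Commit everything at once; here we only count statements per edge,
   where real code would split edges and splice the statements in.  */
static size_t
commit_edge_insertions (size_t stmts_per_edge[], size_t n_edges)
{
  size_t done = 0;
  for (size_t i = 0; i < n_queued; i++)
    if ((size_t) queue[i].edge < n_edges)
      {
        stmts_per_edge[queue[i].edge]++;
        done++;
      }
  n_queued = 0;
  return done;
}
```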

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2024-07-17  Jakub Jelinek  

PR middle-end/115887
* gimple-lower-bitint.cc (gimple_lower_bitint): Use gsi_insert_on_edge
instead of gsi_insert_on_edge_immediate and set edge_insertions to
true.

* gcc.dg/bitint-108.c: New test.

--- gcc/gimple-lower-bitint.cc.jj   2024-06-19 21:10:33.037708202 +0200
+++ gcc/gimple-lower-bitint.cc  2024-07-17 12:17:30.407211495 +0200
@@ -6903,7 +6903,8 @@ gimple_lower_bitint (void)
if (stmt_ends_bb_p (stmt))
  {
edge e = find_fallthru_edge (gsi_bb (gsi)->succs);
-   gsi_insert_on_edge_immediate (e, g);
+   gsi_insert_on_edge (e, g);
+   edge_insertions = true;
  }
else
  gsi_insert_after (&gsi, g, GSI_SAME_STMT);
--- gcc/testsuite/gcc.dg/bitint-108.c.jj	2024-07-17 12:18:04.684768583 +0200
+++ gcc/testsuite/gcc.dg/bitint-108.c   2024-07-17 12:21:19.594252002 +0200
@@ -0,0 +1,38 @@
+/* PR middle-end/115887 */
+/* { dg-do compile { target { bitint && int128 } } } */
+/* { dg-options "-O -fnon-call-exceptions -finstrument-functions -w" } */
+
+float f;
+#if __BITINT_MAXWIDTH__ >= 1024
+#define N1024 1024
+#define N127 127
+#define N256 256
+#else
+#define N1024 64
+#define N127 64
+#define N256 64
+#endif
+
+_BitInt(N1024) a;
+
+static inline void
+bar (_BitInt(N127) b, _BitInt(N256) c, int,
+ int, int, int, int, int, int, int, int, 
+ int, int, int, int, int, int, int, int,
+ int *)
+{
+  b %= 0;
+  do
+c -= *(short *) 0;
+  while (__builtin_add_overflow_p (a, 0, 0));
+  __int128 d = b + c + f;
+}
+
+void
+foo (void)
+{
+  int x;
+  bar (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, &x);
+  while (x)
+;
+}

Jakub



Re: [PATCH 1/4] vect: Add a unified vect_get_num_copies for slp and non-slp

2024-07-17 Thread Feng Xue OS
>> +inline unsigned int
>> +vect_get_num_copies (vec_info *vinfo, slp_tree node, tree vectype = NULL)
>> +{
>> +  poly_uint64 vf;
>> +
>> +  if (loop_vec_info loop_vinfo = dyn_cast  (vinfo))
>> +vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
>> +  else
>> +vf = 1;
>> +
>> +  if (node)
>> +{
>> +  vf *= SLP_TREE_LANES (node);
>> +  if (!vectype)
>> +   vectype = SLP_TREE_VECTYPE (node);
>> +}
>> +  else
>> +gcc_checking_assert (vectype);
>
> can you make the checking assert unconditional?
>
> OK with that change.  vect_get_num_vectors will ICE anyway
> I guess, so at your choice remove the assert completely.
>

OK, I removed the assert.

Thanks,
Feng
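
For reference, the computation in the unified helper quoted above amounts
to an exact division of the vectorization factor times the SLP lane count
by the number of elements per vector; a standalone sketch with made-up
names (not the GCC functions, which work on poly_uint64):

```c
#include <assert.h>

/* Standalone sketch of the arithmetic: total lanes = vf * SLP lanes,
   divided exactly by the elements per vector; the assert mirrors
   exact_div's requirement that the division leaves no remainder.  */
static unsigned int
num_copies (unsigned int vf, unsigned int lanes, unsigned int subparts)
{
  unsigned int total = vf * lanes;
  assert (total % subparts == 0);
  return total / subparts;
}
```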


From: Richard Biener 
Sent: Monday, July 15, 2024 10:00 PM
To: Feng Xue OS
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH 1/4] vect: Add a unified vect_get_num_copies for slp and 
non-slp

On Sat, Jul 13, 2024 at 5:46 PM Feng Xue OS  wrote:
>
> Extend original vect_get_num_copies (pure loop-based) to calculate number of
> vector stmts for slp node regarding a generic vect region.
>
> Thanks,
> Feng
> ---
> gcc/
> * tree-vectorizer.h (vect_get_num_copies): New overload function.
> (vect_get_slp_num_vectors): New function.
> * tree-vect-slp.cc (vect_slp_analyze_node_operations_1): Calculate
> number of vector stmts for slp node with vect_get_num_copies.
> (vect_slp_analyze_node_operations): Calculate number of vector 
> elements
> for constant/external slp node with vect_get_num_copies.
> ---
>  gcc/tree-vect-slp.cc  | 19 +++
>  gcc/tree-vectorizer.h | 29 -
>  2 files changed, 31 insertions(+), 17 deletions(-)
>
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index d0a8531fd3b..4dadbc6854d 100644
> --- a/gcc/tree-vect-slp.cc
> +++ b/gcc/tree-vect-slp.cc
> @@ -6573,17 +6573,7 @@ vect_slp_analyze_node_operations_1 (vec_info *vinfo, 
> slp_tree node,
>   }
>  }
>else
> -{
> -  poly_uint64 vf;
> -  if (loop_vec_info loop_vinfo = dyn_cast  (vinfo))
> -   vf = loop_vinfo->vectorization_factor;
> -  else
> -   vf = 1;
> -  unsigned int group_size = SLP_TREE_LANES (node);
> -  tree vectype = SLP_TREE_VECTYPE (node);
> -  SLP_TREE_NUMBER_OF_VEC_STMTS (node)
> -   = vect_get_num_vectors (vf * group_size, vectype);
> -}
> +SLP_TREE_NUMBER_OF_VEC_STMTS (node) = vect_get_num_copies (vinfo, node);
>
>/* Handle purely internal nodes.  */
>if (SLP_TREE_CODE (node) == VEC_PERM_EXPR)
> @@ -6851,12 +6841,9 @@ vect_slp_analyze_node_operations (vec_info *vinfo, 
> slp_tree node,
>   && j == 1);
>   continue;
> }
> - unsigned group_size = SLP_TREE_LANES (child);
> - poly_uint64 vf = 1;
> - if (loop_vec_info loop_vinfo = dyn_cast  (vinfo))
> -   vf = loop_vinfo->vectorization_factor;
> +
>   SLP_TREE_NUMBER_OF_VEC_STMTS (child)
> -   = vect_get_num_vectors (vf * group_size, vector_type);
> +   = vect_get_num_copies (vinfo, child);
>   /* And cost them.  */
>   vect_prologue_cost_for_slp (child, cost_vec);
> }
> diff --git a/gcc/tree-vectorizer.h b/gcc/tree-vectorizer.h
> index 8eb3ec4df86..09923b9b440 100644
> --- a/gcc/tree-vectorizer.h
> +++ b/gcc/tree-vectorizer.h
> @@ -2080,6 +2080,33 @@ vect_get_num_vectors (poly_uint64 nunits, tree vectype)
>return exact_div (nunits, TYPE_VECTOR_SUBPARTS (vectype)).to_constant ();
>  }
>
> +/* Return the number of vectors in the context of vectorization region VINFO,
> +   needed for a group of total SIZE statements that are supposed to be
> +   interleaved together with no gap, and all operate on vectors of type
> +   VECTYPE.  If NULL, SLP_TREE_VECTYPE of NODE is used.  */
> +
> +inline unsigned int
> +vect_get_num_copies (vec_info *vinfo, slp_tree node, tree vectype = NULL)
> +{
> +  poly_uint64 vf;
> +
> +  if (loop_vec_info loop_vinfo = dyn_cast  (vinfo))
> +vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
> +  else
> +vf = 1;
> +
> +  if (node)
> +{
> +  vf *= SLP_TREE_LANES (node);
> +  if (!vectype)
> +   vectype = SLP_TREE_VECTYPE (node);
> +}
> +  else
> +gcc_checking_assert (vectype);

can you make the checking assert unconditional?

OK with that change.  vect_get_num_vectors will ICE anyway
I guess, so at your choice remove the assert completely.

Thanks,
Richard.

> +
> +  return vect_get_num_vectors (vf, vectype);
> +}
> +
>  /* Return the number of copies needed for loop vectorization when
> a statement operates on vectors of type VECTYPE.  This is the
> vectorization factor divided by the number of elements in
> @@ -2088,7 +2115,7 @@ vect_get_num_vectors (poly_uint64 nunits, tree vectype)
>  inline unsigned int
>  vect_get_num_copies (loop_vec_info loop_vinfo, tree vectype)
>  {
> -  return vect_get_num_vectors (LOOP_VINFO

Re: [PATCH] varasm: Shorten assembly of strings with larger zero regions

2024-07-17 Thread Richard Biener



> Am 17.07.2024 um 15:55 schrieb Jakub Jelinek :
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok.  Is there a more general repeat byte op available?

Richard 

> 2024-07-17  Jakub Jelinek  
> 
>* varasm.cc (default_elf_asm_output_ascii): Use ASM_OUTPUT_SKIP instead
>of 2 or more default_elf_asm_output_limited_string (f, "") calls and
>adjust base64 heuristics correspondingly.
> 
> --- gcc/varasm.cc.jj 2024-07-16 13:36:54.259748720 +0200
> +++ gcc/varasm.cc 2024-07-16 14:08:19.211753867 +0200
> @@ -8538,6 +8538,7 @@ default_elf_asm_output_ascii (FILE *f, c
>   if (s >= last_base64)
>{
>  unsigned cnt = 0;
> +  unsigned char prev_c = ' ';
>  const char *t;
>  for (t = s; t < limit && (t - s) < (long) ELF_STRING_LIMIT - 1; t++)
>{
> @@ -8560,7 +8561,13 @@ default_elf_asm_output_ascii (FILE *f, c
>  break;
>case 1:
>  if (c == 0)
> -cnt += 2 + strlen (STRING_ASM_OP) + 1;
> +{
> +  if (prev_c == 0
> +  && t + 1 < limit
> +  && (t + 1 - s) < (long) ELF_STRING_LIMIT - 1)
> +break;
> +  cnt += 2 + strlen (STRING_ASM_OP) + 1;
> +}
>  else
>cnt += 4;
>  break;
> @@ -8568,6 +8575,7 @@ default_elf_asm_output_ascii (FILE *f, c
>  cnt += 2;
>  break;
>}
> +  prev_c = c;
>}
>  if (cnt > ((unsigned) (t - s) + 2) / 3 * 4 && (t - s) >= 3)
>{
> @@ -8633,8 +8641,18 @@ default_elf_asm_output_ascii (FILE *f, c
>  bytes_in_chunk = 0;
>}
> 
> -  default_elf_asm_output_limited_string (f, s);
> -  s = p;
> +  if (p == s && p + 1 < limit && p[1] == '\0')
> +{
> +  for (p = s + 2; p < limit && *p == '\0'; p++)
> +continue;
> +  ASM_OUTPUT_SKIP (f, (unsigned HOST_WIDE_INT) (p - s));
> +  s = p - 1;
> +}
> +  else
> +{
> +  default_elf_asm_output_limited_string (f, s);
> +  s = p;
> +}
>}
>   else
>{
> 
>Jakub
> 


[PATCH v2] gimple ssa: Teach switch conversion to optimize powers of 2 switches

2024-07-17 Thread Filip Kastl
Hi,

This is the second version of my patch introducing "exponential index
transformation" to the switch conversion pass.  See the version 1 mail here:

https://gcc.gnu.org/pipermail/gcc-patches/2024-May/653120.html


Changes made


In this version I addressed the following comments:

- Linaro CI: switch-3.c failing regtest
Exp transform interfered with this test so I added -fdisable-tree-switchconv.

- richi: gsi_start_bb -> gsi_after_labels
- richi: Use int_const_binop
- richi: Merge two cycles into one
- apinski, richi: Use wide_int instead of HOST_WIDE_INT
- richi: You can use the gimple_build API for nicer code
- richi: Use -mbmi -mpopcount instead -march=znver4
Made these modifications.

- richi: Split out code generating GIMPLE for log2 and pow2p operations
Made this modification.  The final implementation differs from the one I sent
in a reply to the version 1 patch.  I chose to return gimple_seq as Richard
suggested because it seems more elegant to me.

- richi: "It is good to not leave IL in a broken state"
Removed my FIXME remark suggesting to not fix dominators because the GIMPLE
code gets removed later.


Changes not made


These are the comments that I think were meant as suggestions, not as "this
must be changed" and I'm leaving them for possible future patches:

- richi: Also handle *minus* powers of 2 and powers of 2 +- a constant
- richi: Also allow using CTZ to compute log2
- apinski, richi: Smarter handling of type of index variable (current
  implementation cannot handle shorts and chars for example)

These are the comments that I'd like to reply to here but they didn't prompt
any change to the patch:

- richi: Signed index variable with 0x8000 as its value may be a problem.
Tested the patch version 2 for this situation.  In this situation, switch
conversion evaluates exponential transform as not viable so it's fine.

- richi:
> > +  redirect_immediate_dominators (CDI_DOMINATORS, swtch_bb, cond_bb);
> > +
> > +  /* FIXME: Since exponential transform is run only if we know that the
> > +switch will be converted, we know these blocks will be removed and we
> > +maybe don't have to bother updating their dominators.  */
> 
> It's good to not leave the IL in an intermediate broken state.
> 
> > +  edge e;
> > +  edge_iterator ei;
> > +  FOR_EACH_EDGE (e, ei, swtch_bb->succs)
> > +   {
> > + basic_block bb = e->dest;
> > + if (bb == m_final_bb || bb == default_bb)
> > +   continue;
> > + set_immediate_dominator (CDI_DOMINATORS, bb, swtch_bb);
> 
> If there's an alternate edge into the cases, thus the original
> dominator wasn't the swtch_bb the dominator shouldn't change.
> I wonder why it would change at all - you are only creating
> a new edge into the default case so only that block dominance
> relationship changes?
If I read the switch conversion sources right, there cannot be an alternate
edge into the non-default cases.  switch_conversion::collect would reject that
kind of switch.  So we know that case basic blocks will always have the switch
basic block as their immediate dominator.  This code here actually just
partially reverts (only for the case basic blocks) the
redirect_immediate_dominators call that happens a few lines earlier.  That call
is needed because all basic blocks *outside* the switch that had the switch
basic block as their immediate dominator should now have cond_bb as their
immediate dominator.

- richi:
> > +   }
> > +
> > +  vec<basic_block> v;
> > +  v.create (1);
> > +  v.quick_push (m_final_bb);
> > +  iterate_fix_dominators (CDI_DOMINATORS, v, true);
> 
> The final BB should have the condition block as immediate dominator
> if it's old immediate dominator was the switch block, otherwise
> it's immediate dominator shouldn't change.
I think that this is not true.  Consider this CFG where no path through default
BB intersects final BB:

 switch BB ---+
  /  |  \     \
case BBs    default BB
  \  |  /     /
  final BB   /
     |      /

Here idom(final BB) == switch BB.
After the index exponential transform the CFG looks like this

  cond BB ------+
     |          |
 switch BB ---+ |
  /  |  \     \ |
case BBs    default BB
  \  |  /     /
  final BB   /
     |      /

It still holds that idom(final BB) == switch BB.


I bootstrapped and regtested the patch on x86_64 linux.

Can I commit the patch like this?  Or are there some things that still need
addressing?

Cheers
Filip Kastl


--- 8< ---


gimple ssa: Teach switch conversion to optimize powers of 2 switches

Sometimes a switch has case numbers that are powers of 2.  Switch
conversion usually isn't able to optimize these switches.  This patch adds
"exponential index transformation" to switch conversion.  After switch
conversion applies this transformation on the switch the index variable
of the switch becomes the exponent instead of the whole value.  For
example:

switch (i)
  {
case (1 << 0): return 0;
case (1 << 1): return 1;
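The transformation on an example like this can be sketched in plain C. This is a hand-written model of the pass's output, not its actual GIMPLE: a power-of-two guard (the pow2p check the pass inserts) followed by a switch whose index is the exponent, computed here with __builtin_ctz, one of the possible lowerings of log2.

```c
#include <assert.h>

/* Original switch over powers of two.  */
static int
orig (unsigned i)
{
  switch (i)
    {
    case 1u << 0: return 0;
    case 1u << 1: return 1;
    case 1u << 2: return 2;
    case 1u << 3: return 3;
    default:      return -1;
    }
}

/* After the exponential index transform: non-powers of two go to the
   default case, and the switch index becomes the exponent.  */
static int
transformed (unsigned i)
{
  if (i == 0 || (i & (i - 1)) != 0)   /* pow2p guard */
    return -1;
  switch (__builtin_ctz (i))          /* index is now the exponent */
    {
    case 0: return 0;
    case 1: return 1;
    case 2: return 2;
    case 3: return 3;
    default: return -1;
    }
}
```

After the rewrite the case numbers are the small dense range 0..3, which switch conversion can then turn into a lookup.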
 

Re: [PATCH] bitint: Use gsi_insert_on_edge rather than gsi_insert_on_edge_immediate [PR115887]

2024-07-17 Thread Richard Biener



> On 17.07.2024 at 16:01, Jakub Jelinek wrote:
> 
> Hi!
> 
> The following testcase ICEs on x86_64-linux, because we try to
> gsi_insert_on_edge_immediate a statement on an edge which already has
> statements queued with gsi_insert_on_edge, and the deferral has been
> intentional so that we don't need to deal with cfg changes in between.
> 
> The following patch uses the delayed insertion as well.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok

Richard 

> 2024-07-17  Jakub Jelinek  
> 
>PR middle-end/115887
>* gimple-lower-bitint.cc (gimple_lower_bitint): Use gsi_insert_on_edge
>instead of gsi_insert_on_edge_immediate and set edge_insertions to
>true.
> 
>* gcc.dg/bitint-108.c: New test.
> 
> --- gcc/gimple-lower-bitint.cc.jj 2024-06-19 21:10:33.037708202 +0200
> +++ gcc/gimple-lower-bitint.cc 2024-07-17 12:17:30.407211495 +0200
> @@ -6903,7 +6903,8 @@ gimple_lower_bitint (void)
>if (stmt_ends_bb_p (stmt))
>  {
>edge e = find_fallthru_edge (gsi_bb (gsi)->succs);
> -gsi_insert_on_edge_immediate (e, g);
> +gsi_insert_on_edge (e, g);
> +edge_insertions = true;
>  }
>else
>  gsi_insert_after (&gsi, g, GSI_SAME_STMT);
> --- gcc/testsuite/gcc.dg/bitint-108.c.jj 2024-07-17 12:18:04.684768583 +0200
> +++ gcc/testsuite/gcc.dg/bitint-108.c 2024-07-17 12:21:19.594252002 +0200
> @@ -0,0 +1,38 @@
> +/* PR middle-end/115887 */
> +/* { dg-do compile { target { bitint && int128 } } } */
> +/* { dg-options "-O -fnon-call-exceptions -finstrument-functions -w" } */
> +
> +float f;
> +#if __BITINT_MAXWIDTH__ >= 1024
> +#define N1024 1024
> +#define N127 127
> +#define N256 256
> +#else
> +#define N1024 64
> +#define N127 64
> +#define N256 64
> +#endif
> +
> +_BitInt(N1024) a;
> +
> +static inline void
> +bar (_BitInt(N127) b, _BitInt(N256) c, int,
> + int, int, int, int, int, int, int, int,
> + int, int, int, int, int, int, int, int,
> + int *)
> +{
> +  b %= 0;
> +  do
> +c -= *(short *) 0;
> +  while (__builtin_add_overflow_p (a, 0, 0));
> +  __int128 d = b + c + f;
> +}
> +
> +void
> +foo (void)
> +{
> +  int x;
> +  bar (0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, &x);
> +  while (x)
> +;
> +}
> 
>Jakub
> 


Re: [PATCH] rtl-ssa: Fix split_clobber_group [PR115928]

2024-07-17 Thread Jeff Law




On 7/16/24 8:24 AM, Richard Sandiford wrote:

One of the goals of the rtl-ssa representation was to allow a
group of consecutive clobbers to be skipped in constant time,
with amortised sublinear insertion and deletion.  This involves
putting consecutive clobbers in groups.  Splitting or joining
groups would be linear if we had to update every clobber on
each update, so the operation to query a clobber's group is
lazy and (again) amortised sublinear.

This means that, when splitting a group into two, we cannot
reuse the old group for one side.  We have to invalidate it,
so that the lazy clobber_info::group query can tell that something
has changed.  The ICE in the PR came from failing to do that.
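A toy model of that invalidation invariant, with invented structures rather than the actual rtl-ssa splay-tree code: each clobber caches a pointer to its group, a split allocates fresh groups for both halves and marks the old group invalid, and the lazy query trusts a cache only while its group is still valid. Reusing the old group for one half would make stale caches look valid, which is the bug behind the ICE.

```c
#include <assert.h>
#include <stddef.h>

struct group { int first, last, valid; };
struct clobber { int pos; struct group *cache; };

static struct group pool[8];
static int npool;

static struct group *
new_group (int first, int last)
{
  struct group *g = &pool[npool++];
  g->first = first; g->last = last; g->valid = 1;
  return g;
}

/* Lazy query: trust the cached group only while it is still valid,
   otherwise re-resolve against the live groups.  */
static struct group *
get_group (struct clobber *c, struct group **groups, int ngroups)
{
  if (c->cache && c->cache->valid)
    return c->cache;
  for (int i = 0; i < ngroups; i++)
    if (groups[i]->valid
        && groups[i]->first <= c->pos && c->pos <= groups[i]->last)
      return c->cache = groups[i];
  return NULL;
}

/* Split G at POS: a new group for BOTH sides, old group invalidated
   so that stale cached pointers are detectable.  */
static void
split_group (struct group *g, int pos, struct group **lo, struct group **hi)
{
  *lo = new_group (g->first, pos - 1);
  *hi = new_group (pos + 1, g->last);
  g->valid = 0;                  /* the crucial invalidation */
}
```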

Tested on aarch64-linux-gnu & x86_64-linux-gnu.  OK to install?

Richard


gcc/
PR rtl-optimization/115928
* rtl-ssa/accesses.h (clobber_group): Add a new constructor that
takes the first, last and root clobbers.
* rtl-ssa/internals.inl (clobber_group::clobber_group): Define it.
* rtl-ssa/accesses.cc (function_info::split_clobber_group): Use it.
Allocate a new group for both sides and invalidate the previous group.
(function_info::add_def): After calling split_clobber_group,
remove the old group from the splay tree.

gcc/testsuite/
PR rtl-optimization/115928
* gcc.dg/torture/pr115928.c: New test.

OK
jeff



Re: [PING][patch, avr] Implement PR90616: Improve adding symbols that are 256-byte aligned

2024-07-17 Thread Jeff Law




On 7/17/24 3:45 AM, Georg-Johann Lay wrote:

Ping #1 for

https://gcc.gnu.org/pipermail/gcc-patches/2024-July/656332.html


Address computation (usually add) with symbols that are aligned
to 256 bytes does not require to add the lo8() part as it is zero.

This patch adds a new combine insn that performs a widening add
from QImode plus such a symbol.  The case when such an aligned
symbol is added to a reg that's already in HImode can be handled
in the addhi3 asm printer.

Ok to apply?

Johann

--

AVR: target/90616 - Improve adding constants that are 0 mod 256.

This patch introduces a new insn that works as an insn combine
pattern for (plus:HI (zero_extend:HI (reg:QI)) const_0mod256_operand:HI),
which requires at most 2 instructions.  When the input register operand
is already in HImode, the addhi3 printer only adds the hi8 part when
it sees a SYMBOL_REF or CONST aligned to at least 256 bytes.
(The CONST_INT case was already handled).
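The core observation can be illustrated with ordinary integer arithmetic, with addresses as plain uint16_t values standing in for SYMBOL_REFs (the helper names below are invented for the sketch): when the symbol is aligned to 256 bytes its lo8 part is zero, so the widening add needs only the high byte.

```c
#include <assert.h>
#include <stdint.h>

static uint8_t lo8 (uint16_t a) { return a & 0xff; }
static uint8_t hi8 (uint16_t a) { return a >> 8; }

/* (plus:HI (zero_extend:HI (reg:QI r)) sym) when lo8(sym) == 0:
   the low byte is just r, the high byte is hi8(sym) -- no ADD/ADC
   pair is needed, since r < 256 cannot carry into the high byte.  */
static uint16_t
aligned_add (uint8_t r, uint16_t sym)
{
  /* Assumes lo8 (sym) == 0, i.e. sym is 256-byte aligned.  */
  return (uint16_t) ((hi8 (sym) << 8) | r);
}
```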

gcc/
    PR target/90616
    * config/avr/predicates.md (const_0mod256_operand): New predicate.
    * config/avr/constraints.md (Cp8): New constraint.
    * config/avr/avr.md (*aligned_add_symbol): New insn.
    * config/avr/avr.cc (avr_out_plus_symbol) [HImode]:
    When op2 is a multiple of 256, there is no need to add / subtract
    the lo8 part.
    (avr_rtx_costs_1) [PLUS && HImode]: Return expected costs for
    new insn *aligned_add_symbol as it applies.

Sorry.  I must have lost this.  Thanks for pinging.

It looks fine for the trunk.  Out of curiosity, does the avr port 
implement linker relaxing for this case?  That would seem to be 
generally helpful as it'd be able to automatically detect when the 
symbolic reference has the right low order bits.  Obviously not a 
requirement for this patch to go forward, just an observation for 
further improvements.


jeff



Re: [PATCH v5] RISC-V: use fclass insns to implement isfinite,isnormal and isinf builtins

2024-07-17 Thread Jeff Law




On 7/16/24 12:26 PM, Vineet Gupta wrote:

Hi Jeff,

On 7/13/24 07:54, Vineet Gupta wrote:

Changes since v4:
   - No functional changes.
   - Use int iterator and attr to implement expanders in md
 (inspired by loongarch patch. Thx Xi Ruoyao)

Changes since v3:
   - Remove '*' from define_insn for fclass
   - Remove the dummy expander for fclass.
   - De-duplicate the expanders code by using a helper which takes fclass
 val.

Changes since v2:
   - fclass define_insn tightened to check op0 mode "X" with additional
 expander w/o mode for callers.
   - builtins expander explicit mode check and FAIL if mode not appropriate.
   - subreg promoted handling to elide potential extension of ret val.
   - Added isinf builtin with bimodal return value as others.

Changes since v1:
   - Removed UNSPEC_{INFINITE,ISNORMAL}
   - Don't hardcode SI in patterns, try to keep X to avoid potential
 sign extension pitfalls. Implementation wise requires skipping
 :MODE specifier in match_operand which is flagged as missing mode
 warning.
---

Currently these builtins use float compare instructions which require
FP flags to be saved/restored around them.
Our perf team complained this could be costly in the uarch.
RV Base ISA already has FCLASS.{d,s,h} instruction to compare/identify FP
values w/o disturbing FP exception flags.

Coincidently, upstream very recently got support for the corresponding
optabs. So this just requires wiring up in the backend.

Tested for rv64, one additional failure g++.dg/opt/pr107569.C needs
upstream ranger fix for the new optab.



Pre-commit CI reports [1] three additional failures which are related to
ranger patches in this space, same as what Loongarch folks are seeing [2]

FAIL: gcc.dg/tree-ssa/range-sincos.c scan-tree-dump-not evrp "link_error"
FAIL: gcc.dg/tree-ssa/vrp-float-abs-1.c scan-tree-dump-not evrp "link_error"
FAIL: g++.dg/opt/pr107569.C  -std=gnu++20  scan-tree-dump-times vrp1 "return 1;" 2

Here's my testing with trunk of this morning, which corroborates with
the same:
  2024-07-16 fec38d7987dd rtl-ssa: Fix removal of order_nodes [PR115929]

  rv64imafdc_zba_zbb_zbs_zicond_zfa/  lp64d/ medlow |  321 /    56 |    0 / 0 |    6 / 1 |   # upstream
  rv64imafdc_zba_zbb_zbs_zicond_zfa/  lp64d/ medlow |  323 /    58 |    1 / 1 |    6 / 1 |   # upstream + mypatch
  rv64imafdc_zba_zbb_zbs_zicond_zfa/  lp64d/ medlow |  321 /    56 |    0 / 0 |    6 / 1 |   # upstream + mypatch + ranger patches

Having said that I don't mind if you prefer the ranger patches make it
in first.

Thx,
-Vineet

[1] https://github.com/ewlu/gcc-precommit-ci/issues/1908
[2] https://gcc.gnu.org/pipermail/gcc-patches/2024-July/656972.html
Thanks.  I didn't make the connection between Hao's patch and these 
failures, particularly the range-sincos.c case.  Not sure how I could 
have missed it given that test is explicitly mentioned in the first 
paragraph of Hao's message.


I actually gave Hao a bit of feedback last week on a couple things that 
didn't look quite right.  So hopefully we'll have that patch moving 
forward shortly.


Let's get closure on Hao's patch, then this one can go forward.

Thanks!

jeff



Re: [PATCH] RISC-V: More support of vx and vf for autovec comparison

2024-07-17 Thread Jeff Law




On 7/17/24 4:55 AM, demin.han wrote:

There are still some cases which can't utilize vx or vf for autovec
comparison after last_combine pass.

1. integer comparison when imm isn't in range of [-16, 15]
2. float imm is 0.0
3. DI or DF mode under RV32

This patch fixes the above mentioned issues.

Tested on RV32 and RV64.
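Item 1 above refers to the 5-bit signed immediate field carried by the RVV .vi compare forms; anything outside it must be materialized in a scalar register and use the .vx form. The selection can be sketched as follows (a simplified model with invented names, not the actual riscv-v.cc logic):

```c
#include <assert.h>

enum cmp_form { FORM_VI, FORM_VX };

/* RVV vector-compare .vi forms carry a 5-bit signed immediate, so
   only [-16, 15] fits; other constants need the scalar .vx form.  */
static enum cmp_form
pick_cmp_form (long imm)
{
  return (imm >= -16 && imm <= 15) ? FORM_VI : FORM_VX;
}
```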

gcc/ChangeLog:

* config/riscv/autovec.md: Change register_operand to nonmemory_operand.
* config/riscv/riscv-v.cc (get_cmp_insn_code): Select code according
  to scalar_p.
(expand_vec_cmp): Generate scalar_p and transform op1.
* config/riscv/riscv.cc (riscv_const_insns): Add !FLOAT_MODE_P
  constraint.
* config/riscv/vector.md: Add !FLOAT_MODE_P constraint.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/autovec/binop/vadd-rv32gcv-nofm.c: Fix test
* gcc.target/riscv/rvv/autovec/binop/vdiv-rv32gcv-nofm.c: Ditto
* gcc.target/riscv/rvv/autovec/binop/vmul-rv32gcv-nofm.c: Ditto
* gcc.target/riscv/rvv/autovec/binop/vsub-rv32gcv-nofm.c: Ditto
* gcc.target/riscv/rvv/autovec/cmp/vcond-1.c: Fix and add test
* gcc.target/riscv/rvv/autovec/cond/cond_copysign-rv32gcv.c: Fix
* gcc.target/riscv/rvv/autovec/cond/cond_fadd-1.c: Fix test
* gcc.target/riscv/rvv/autovec/cond/cond_fadd-2.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fadd-3.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fadd-4.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-1.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-3.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-4.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-5.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fma_fnma-6.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fmax-1.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fmax-2.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fmax-3.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fmax-4.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fmin-1.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fmin-2.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fmin-3.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fmin-4.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-1.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-3.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-4.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-5.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fms_fnms-6.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fmul-1.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fmul-2.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fmul-3.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fmul-4.c: Ditto
* gcc.target/riscv/rvv/autovec/cond/cond_fmul-5.c: Ditto

Signed-off-by: demin.han 
---
  gcc/config/riscv/autovec.md   |  2 +-
  gcc/config/riscv/riscv-v.cc   | 72 ---
  gcc/config/riscv/riscv.cc |  2 +-
  gcc/config/riscv/vector.md|  3 +-
  .../rvv/autovec/binop/vadd-rv32gcv-nofm.c |  4 +-
  .../rvv/autovec/binop/vdiv-rv32gcv-nofm.c |  4 +-
  .../rvv/autovec/binop/vmul-rv32gcv-nofm.c |  4 +-
  .../rvv/autovec/binop/vsub-rv32gcv-nofm.c |  4 +-
  .../riscv/rvv/autovec/cmp/vcond-1.c   | 48 -
  .../rvv/autovec/cond/cond_copysign-rv32gcv.c  |  8 +--
  .../riscv/rvv/autovec/cond/cond_fadd-1.c  |  4 +-
  .../riscv/rvv/autovec/cond/cond_fadd-2.c  |  4 +-
  .../riscv/rvv/autovec/cond/cond_fadd-3.c  |  4 +-
  .../riscv/rvv/autovec/cond/cond_fadd-4.c  |  4 +-
  .../riscv/rvv/autovec/cond/cond_fma_fnma-1.c  |  4 +-
  .../riscv/rvv/autovec/cond/cond_fma_fnma-3.c  |  4 +-
  .../riscv/rvv/autovec/cond/cond_fma_fnma-4.c  |  4 +-
  .../riscv/rvv/autovec/cond/cond_fma_fnma-5.c  |  4 +-
  .../riscv/rvv/autovec/cond/cond_fma_fnma-6.c  |  4 +-
  .../riscv/rvv/autovec/cond/cond_fmax-1.c  |  4 +-
  .../riscv/rvv/autovec/cond/cond_fmax-2.c  |  4 +-
  .../riscv/rvv/autovec/cond/cond_fmax-3.c  |  4 +-
  .../riscv/rvv/autovec/cond/cond_fmax-4.c  |  4 +-
  .../riscv/rvv/autovec/cond/cond_fmin-1.c  |  4 +-
  .../riscv/rvv/autovec/cond/cond_fmin-2.c  |  4 +-
  .../riscv/rvv/autovec/cond/cond_fmin-3.c  |  4 +-
  .../riscv/rvv/autovec/cond/cond_fmin-4.c  |  4 +-
  .../riscv/rvv/autovec/cond/cond_fms_fnms-1.c  |  4 +-
  .../riscv/rvv/autovec/cond/cond_fms_fnms-3.c  |  4 +-
  .../riscv/rvv/autovec/cond/cond_fms_fnms-4.c  |  4 +-
  .../riscv/rvv/autovec/cond/cond_fms_fnms-5.c  |  4 +-
  .../riscv/rvv/autovec/cond/cond_fms_fnms-6.c  |  4 +-
  .../riscv/rvv/autovec/cond/cond_fmul-1.c  |  4 +-
  .../riscv/rvv/autovec/cond/cond_fmul-2.c  |  4 +-
  .../riscv/rvv/autovec/cond/cond_fmul-3.c  |  4 +-
 

Re: [PATCH] RISC-V: More support of vx and vf for autovec comparison

2024-07-17 Thread Robin Dapp
Hi Demin,

> +  void add_integer_operand (rtx x)
> +  {
> +create_integer_operand (&m_ops[m_opno++], INTVAL (x));
> +gcc_assert (m_opno <= MAX_OPERANDS);
> +  }

Can that be folded into add_input_operand somehow?

>void add_input_operand (rtx x, machine_mode mode)
>{
>  create_input_operand (&m_ops[m_opno++], x, mode);
> @@ -284,12 +289,13 @@ public:
>  for (; num_ops; num_ops--, opno++)
>{
>   any_mem_p |= MEM_P (ops[opno]);
> - machine_mode mode = insn_data[(int) icode].operand[m_opno].mode;
> + machine_mode orig_mode = insn_data[(int) icode].operand[m_opno].mode;
> + machine_mode mode = orig_mode;
>   /* 'create_input_operand doesn't allow VOIDmode.
>  According to vector.md, we may have some patterns that do not have
>  explicit machine mode specifying the operand. Such operands are
>  always Pmode.  */
> - if (mode == VOIDmode)
> + if (orig_mode == VOIDmode)
> mode = Pmode;

Maybe source_mode and dest_mode would be a bit clearer.

> - add_input_operand (ops[opno], mode);
> + if (CONST_INT_P (ops[opno]) && orig_mode != E_VOIDmode)
> +   add_integer_operand (ops[opno]);
> + else
> +   add_input_operand (ops[opno], mode);

Indents look odd from here.  Could you double-check with clang-format?

> -  icode = code_for_pred_cmp (mode);
> +  icode = !scalar_p ? code_for_pred_cmp (mode)
> +                    : code_for_pred_cmp_scalar (mode);

Ditto.

> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index 19b9b2daa95..ad5668b2c5a 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -2140,7 +2140,7 @@ riscv_const_insns (rtx x)
>  register vec_duplicate into vmv.v.x.  */
>   scalar_mode smode = GET_MODE_INNER (GET_MODE (x));
>   if (maybe_gt (GET_MODE_SIZE (smode), UNITS_PER_WORD)
> - && !immediate_operand (elt, Pmode))
> + && !FLOAT_MODE_P (smode) && !immediate_operand (elt, Pmode))

FLOAT_MODE is a bit broad here.  Maybe rather add a case before all others that
always allows zero constants for any mode (as well as a comment)?

> -if (maybe_gt (GET_MODE_SIZE (mode), GET_MODE_SIZE (Pmode)))
> +bool gt_p = maybe_gt (GET_MODE_SIZE (mode), GET_MODE_SIZE (Pmode));
> +if (!FLOAT_MODE_P (mode) && gt_p)
>{
>  riscv_vector::emit_vlmax_insn (code_for_pred_broadcast (mode),
>  riscv_vector::UNARY_OP, operands);

Same here basically.  Isn't it just the zero constant?

> diff --git 
> a/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vadd-rv32gcv-nofm.c 
> b/gcc/testsuite/gcc.target/riscv/rvv/autovec/binop/vadd-rv32gcv-nofm.c
> index db8c653b179..b9a040f2f78 100644

I suppose the -rv64 tests also need adjustment?

Regards
 Robin


Re: [PATCH] varasm: Shorten assembly of strings with larger zero regions

2024-07-17 Thread Jakub Jelinek
On Wed, Jul 17, 2024 at 04:15:16PM +0200, Richard Biener wrote:
> Ok.  Is there a more general repeat byte op available?

I think
.skip   bytes, fill
but not sure what assemblers do support that, not sure it is that
common to have long sequences of non-zero identical bytes and
for '\0's we were already splitting the string into separate directives,
while say for
        .string "fooo"
we aren't.
The intent of this patch was just to get some common savings without
sacrifying readability too much.

Jakub



Re: [committed][PR rtl-optimization/115876][PR rtl-optimization/115916] Fix sign/carry bit handling in ext-dce

2024-07-17 Thread Jeff Law




On 7/17/24 12:46 AM, Andreas Schwab wrote:

On Jul 15 2024, Jeff Law wrote:


My change to fix a ubsan issue broke handling propagation of the
carry/sign bit down through a right shift.


What about the other ASHIFTs?
They're on my list.  Just didn't have the time to work through those 
cases mentally and didn't want to leave things in a badly broken state.


jeff


Re: [PATCH v9 08/10] Add tests for C/C++ musttail attributes

2024-07-17 Thread Andi Kleen
> Great. Does it also work in a non-template function?

Sadly it did not because there needs to be more AGGR_VIEW_EXPR handling,
as you predicted at some point. I fixed it now. Will send updated patches.

-Andi


Re: [PATCH] varasm: Shorten assembly of strings with larger zero regions

2024-07-17 Thread Richard Biener



> On 17.07.2024 at 16:45, Jakub Jelinek wrote:
> 
> On Wed, Jul 17, 2024 at 04:15:16PM +0200, Richard Biener wrote:
>> Ok.  Is there a more general repeat byte op available?
> 
> I think
>        .skip bytes, fill
> but not sure what assemblers do support that, not sure it is that
> common to have long sequences of non-zero identical bytes and
> for '\0's we were already splitting the string into separate directives,
> while say for
>        .string "fooo"
> we aren't.
> The intent of this patch was just to get some common savings without
> sacrifying readability too much.

Fair enough, I was just wondering.

Richard 

>Jakub
> 


Re: [PING][patch, avr] Implement PR90616: Improve adding symbols that are 256-byte aligned

2024-07-17 Thread Georg-Johann Lay

On 17.07.24 at 16:36, Jeff Law wrote:

On 7/17/24 3:45 AM, Georg-Johann Lay wrote:

Ping #1 for

https://gcc.gnu.org/pipermail/gcc-patches/2024-July/656332.html


Address computation (usually add) with symbols that are aligned
to 256 bytes does not require to add the lo8() part as it is zero.

This patch adds a new combine insn that performs a widening add
from QImode plus such a symbol.  The case when such an aligned
symbol is added to a reg that's already in HImode can be handled
in the addhi3 asm printer.

Ok to apply?

Johann

--

AVR: target/90616 - Improve adding constants that are 0 mod 256.

This patch introduces a new insn that works as an insn combine
pattern for (plus:HI (zero_extend:HI (reg:QI)) const_0mod256_operand:HI),
which requires at most 2 instructions.  When the input register operand
is already in HImode, the addhi3 printer only adds the hi8 part when
it sees a SYMBOL_REF or CONST aligned to at least 256 bytes.
(The CONST_INT case was already handled).

gcc/
    PR target/90616
    * config/avr/predicates.md (const_0mod256_operand): New predicate.
    * config/avr/constraints.md (Cp8): New constraint.
    * config/avr/avr.md (*aligned_add_symbol): New insn.
    * config/avr/avr.cc (avr_out_plus_symbol) [HImode]:
    When op2 is a multiple of 256, there is no need to add / subtract
    the lo8 part.
    (avr_rtx_costs_1) [PLUS && HImode]: Return expected costs for
    new insn *aligned_add_symbol as it applies.

Sorry.  I must have lost this.  Thanks for pinging.

It looks fine for the trunk.  Out of curiosity, does the avr port 
implement linker relaxing for this case?  That would seem to be 


No. avr-ld performs relaxing, but only the two cases of

- JMP/CALL to RJMP/RCALL provided the offset fits.
- [R]CALL+RET to [R]JMP provided there's no label between.

I had a look at ld relaxing some time ago, and I must admit that
I don't understand completely what they are doing.  Not much comments
and explanations there, basically a copy+paste from some other target
from decades ago...

Linker relaxing would be ADD.lo8 + ADC.hi8 => ADD.hi8 which affects
condition code.

Johann

generally helpful as it'd be able to automatically detect when the 
symbolic reference has the right low order bits.  Obviously not a 
requirement for this patch to go forward, just an observation for 
further improvements.


jeff


Re: [PATCH]middle-end: fix 0 offset creation and folding [PR115936]

2024-07-17 Thread Andrew Pinski
On Tue, Jul 16, 2024 at 4:08 AM Tamar Christina  wrote:
>
> Hi All,
>
> As shown in PR115936 SCEV and IVOPTS create an invalid IV when the IV is
> a pointer type:
>
> ivtmp.39_65 = ivtmp.39_59 + 0B;
>
> where the IVs are DI mode and the offset is a pointer.
> This comes from this weird candidate:
>
> Candidate 8:
>   Var befor: ivtmp.39_59
>   Var after: ivtmp.39_65
>   Incr POS: before exit test
>   IV struct:
> Type:   sizetype
> Base:   0
> Step:   0B
> Biv:N
> Overflowness wrto loop niter:   No-overflow
>
> This IV was always created just ended up not being used.
>
> This is created by SCEV.
>
> simple_iv_with_niters in the case where no CHREC is found creates an IV with
> base == ev, offset == 0;
>
> however in this case EV is a POINTER_PLUS_EXPR and so the type is a pointer.
> it ends up creating an unusable expression.
>
> However IVOPTS also has code to re-fold the expression in case the IV is a
> pointer.  For most cases it uses basetype to fold both operands, but the 0
> case omitted it.  This leads to us creating a PLUS expression with
> mismatched types.
>
> This fixes that bug as well.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu,
> x86_64-pc-linux-gnu -m32, -m64 and no issues.
>
> Ok for master?

Thanks for fixing this.  This looks like it has been a bug since I
introduced POINTER_PLUS_EXPR back in r0-81506-g5be014d5b728cf; before
that, PLUS_EXPR was used for pointers and didn't need this kind of
special casing.  Scalar evolution and IV-OPTs were among the places
which needed many fixups when POINTER_PLUS_EXPR came in, and this one
spot slipped through for many years (17).

Thanks,
Andrew

>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
> PR tree-optimization/115936
> * tree-scalar-evolution.cc (simple_iv_with_niters): Use sizetype for
> pointers.
> * tree-ssa-loop-ivopts.cc (add_iv_candidate_for_use): Use same type 
> for
> both operands.
>
> ---
> diff --git a/gcc/tree-scalar-evolution.cc b/gcc/tree-scalar-evolution.cc
> index 5aa95a2497a317f9b43408ce78a2d50c20151314..abb2bad777375889d6c980b54d60699672fd5742 100644
> --- a/gcc/tree-scalar-evolution.cc
> +++ b/gcc/tree-scalar-evolution.cc
> @@ -3243,7 +3243,11 @@ simple_iv_with_niters (class loop *wrto_loop, class 
> loop *use_loop,
>if (tree_does_not_contain_chrecs (ev))
>  {
>iv->base = ev;
> -  iv->step = build_int_cst (TREE_TYPE (ev), 0);
> +  tree ev_type = TREE_TYPE (ev);
> +  if (POINTER_TYPE_P (ev_type))
> +   ev_type = sizetype;
> +
> +  iv->step = build_int_cst (ev_type, 0);
>iv->no_overflow = true;
>return true;
>  }
> diff --git a/gcc/tree-ssa-loop-ivopts.cc b/gcc/tree-ssa-loop-ivopts.cc
> index c3218a3e8eedbb8d0a7f14c01eeb069cb6024c29..fe130541526e74fb80fee633f6c96b41437aa1c1 100644
> --- a/gcc/tree-ssa-loop-ivopts.cc
> +++ b/gcc/tree-ssa-loop-ivopts.cc
> @@ -3529,7 +3529,8 @@ add_iv_candidate_for_use (struct ivopts_data *data, 
> struct iv_use *use)
>basetype = TREE_TYPE (iv->base);
>if (POINTER_TYPE_P (basetype))
>  basetype = sizetype;
> -  record_common_cand (data, build_int_cst (basetype, 0), iv->step, use);
> +  record_common_cand (data, build_int_cst (basetype, 0),
> + fold_convert (basetype, iv->step), use);
>
>/* Compare the cost of an address with an unscaled index with the cost of
>  an address with a scaled index and add candidate if useful.  */
>
>
>
>
> --


[PATCH] rs6000, remove __builtin_vsx_xvcmp* built-ins

2024-07-17 Thread Carl Love

GCC maintainers:

The following patch removes the three __builtin_vsx_xvcmp[eq|ge|gt]sp
built-ins as they are similar to the overloaded vec_cmp[eq|ge|gt]
built-ins.  The difference is that the overloaded built-ins return a
vector of booleans or a vector of long long booleans, whereas the
removed built-ins returned a vector of floats or a vector of doubles.


The tests for __builtin_vsx_xvcmp[eq|ge|gt]sp and
__builtin_vsx_xvcmp[eq|ge|gt]dp are updated to use the overloaded
vec_cmp[eq|ge|gt] built-ins with the required changes for the return
type.  Note __builtin_vsx_xvcmp[eq|ge|gt]dp are used internally.


The patches have been tested on a Power 10 LE system with no regressions.

Please let me know if the patch is acceptable for mainline.  Thanks.

   Carl
-
rs6000, remove __builtin_vsx_xvcmp* built-ins

This patch removes the built-ins:
 __builtin_vsx_xvcmpeqsp, __builtin_vsx_xvcmpgesp,
 __builtin_vsx_xvcmpgtsp.

which are similar to the overloaded vec_cmpeq, vec_cmpgt and vec_cmpge
built-ins.

The difference is that the overloaded built-ins return a vector of
booleans or a vector of long long booleans depending on whether the inputs
were a vector of floats or a vector of doubles.  The removed built-ins
returned a vector of floats or a vector of doubles for the vector float and
vector double inputs respectively.

The __builtin_vsx_xvcmpeqdp, __builtin_vsx_xvcmpgedp and
__builtin_vsx_xvcmpgtdp are not removed as they are used by the
overloaded vec_cmpeq, vec_cmpgt and vec_cmpge built-ins.

The test cases for the __builtin_vsx_xvcmpeqsp, __builtin_vsx_xvcmpgesp,
__builtin_vsx_xvcmpgtsp, __builtin_vsx_xvcmpeqdp,
__builtin_vsx_xvcmpgedp and __builtin_vsx_xvcmpgtdp  are changed to use
the overloaded vec_cmpeq, vec_cmpgt, vec_cmpge built-ins.  Use of the
overloaded built-ins requires the result to be stored in a vector of
booleans of the appropriate size, or the result must be cast to the return
type used by the original __builtin_vsx_xvcmp* built-ins.
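
The all-ones/all-zeros mask semantics that the overloaded built-ins follow
can be sketched with GCC's generic vector extensions (a portable
illustration of the boolean-result behavior, not the VSX built-ins
themselves):

```c
#include <assert.h>

typedef double v2df __attribute__ ((vector_size (16)));
typedef long long v2di __attribute__ ((vector_size (16)));

/* Element-wise compare: each lane of the result is all-ones (-1) when the
   comparison holds and all-zeros otherwise, mirroring the boolean vectors
   returned by vec_cmpeq and friends.  */
v2di
cmpeq_mask (v2df a, v2df b)
{
  return (v2di) (a == b);
}
```

Casting the mask back to the operand type, as the updated tests do, is a
bit-for-bit reinterpretation rather than a value conversion.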
---
 gcc/config/rs6000/rs6000-builtins.def | 10 ---
 .../gcc.target/powerpc/vsx-builtin-3.c    | 28 ++-
 2 files changed, 21 insertions(+), 17 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def

index 77eb0f7e406..896d9686ac6 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1579,30 +1579,20 @@
   const signed int __builtin_vsx_xvcmpeqdp_p (signed int, vd, vd);
 XVCMPEQDP_P vector_eq_v2df_p {pred}

-  const vf __builtin_vsx_xvcmpeqsp (vf, vf);
-    XVCMPEQSP vector_eqv4sf {}
-
   const vd __builtin_vsx_xvcmpgedp (vd, vd);
 XVCMPGEDP vector_gev2df {}

   const signed int __builtin_vsx_xvcmpgedp_p (signed int, vd, vd);
 XVCMPGEDP_P vector_ge_v2df_p {pred}

-  const vf __builtin_vsx_xvcmpgesp (vf, vf);
-    XVCMPGESP vector_gev4sf {}
-
   const signed int __builtin_vsx_xvcmpgesp_p (signed int, vf, vf);
 XVCMPGESP_P vector_ge_v4sf_p {pred}

   const vd __builtin_vsx_xvcmpgtdp (vd, vd);
 XVCMPGTDP vector_gtv2df {}
-
   const signed int __builtin_vsx_xvcmpgtdp_p (signed int, vd, vd);
 XVCMPGTDP_P vector_gt_v2df_p {pred}

-  const vf __builtin_vsx_xvcmpgtsp (vf, vf);
-    XVCMPGTSP vector_gtv4sf {}
-
   const signed int __builtin_vsx_xvcmpgtsp_p (signed int, vf, vf);
 XVCMPGTSP_P vector_gt_v4sf_p {pred}

diff --git a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c 
b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c

index 60f91aad23c..d67f97c8011 100644
--- a/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
+++ b/gcc/testsuite/gcc.target/powerpc/vsx-builtin-3.c
@@ -156,13 +156,27 @@ int do_cmp (void)
 {
   int i = 0;

-  d[i][0] = __builtin_vsx_xvcmpeqdp (d[i][1], d[i][2]); i++;
-  d[i][0] = __builtin_vsx_xvcmpgtdp (d[i][1], d[i][2]); i++;
-  d[i][0] = __builtin_vsx_xvcmpgedp (d[i][1], d[i][2]); i++;
-
-  f[i][0] = __builtin_vsx_xvcmpeqsp (f[i][1], f[i][2]); i++;
-  f[i][0] = __builtin_vsx_xvcmpgtsp (f[i][1], f[i][2]); i++;
-  f[i][0] = __builtin_vsx_xvcmpgesp (f[i][1], f[i][2]); i++;
+  /* The __builtin_vsx_xvcmp[gt|ge|eq]dp and __builtin_vsx_xvcmp[gt|ge|eq]sp
+     built-ins have been removed in favor of the overloaded vec_cmpeq,
+     vec_cmpgt and vec_cmpge built-ins.  The __builtin_vsx_xvcmp* built-ins
+     returned a vector result of the same type as the arguments.  The
+     vec_cmp* built-ins return a vector of booleans of the same size as the
+     arguments.  Thus the result assignment must be to a boolean or cast to
+     a boolean.  Test both cases.  */
+
+  d[i][0] = (vector double) vec_cmpeq (d[i][1], d[i][2]); i++;
+  d[i][0] = (vector double) vec_cmpgt (d[i][1], d[i][2]); i++;
+  d[i][0] = (vector double) vec_cmpge (d[i][1], d[i][2]); i++;
+  bl[i][0] = vec_cmpeq (d[i][1], d[i][2]); i++;
+  bl[i][0] = vec_cmpgt (d[i][1], d[i][2]); i++;
+  bl[i][0] = vec_cmpge (d[i][1], d[i][2]); i++;
+

Re: [Ping, Fortran, Patch, PR82904] Fix [11/12/13/14/15 Regression][Coarray] ICE in make_ssa_name_fn, at tree-ssanames.c:261

2024-07-17 Thread Paul Richard Thomas
Hi Andre,

It looks good to me. I am happy to see that the principle of the patch has
Richi's blessing too.

OK for mainline. I leave it for you (and Richi?) to decide whether to
backport in time for the 14.2 release.

Regards

Paul


On Wed, 17 Jul 2024 at 14:08, Andre Vehreschild  wrote:

> Hi all,
>
> pinging for attached patch rebased on master and my patch for 78466.
>
> Anyone in for a review?
>
> Regtests ok on x86_64-pc-linux-gnu / Fedora 39. Ok for mainline?
>
> Regards,
> Andre
>
> On Wed, 10 Jul 2024 14:51:53 +0200
> Andre Vehreschild  wrote:
>
> > Hi all,
> >
> > the patch attached fixes the use of an uninitialized variable for the
> string
> > length in the declaration of the char[1:_len] type (the _len!). The type
> for
> > save'd deferred length char arrays is now char*, so that there is no
> need for
> > the length in the type declaration anymore. The length is of course still
> > provided and needed later on.
> >
> > I hope this fixes the ICE in the IPA: inline phase, because I never saw
> it. Is
> > that what you had in mind @Richard?
> >
> > Regtests ok on x86_64-pc-linux-gnu/Fedora 39. Ok for mainline?
> >
> > Regards,
> >   Andre
> > --
> > Andre Vehreschild * Email: vehre ad gmx dot de
>
>
> --
> Andre Vehreschild * Email: vehre ad gmx dot de
>


Re: [PING][patch, avr] Implement PR90616: Improve adding symbols that are 256-byte aligned

2024-07-17 Thread Jeff Law




On 7/17/24 9:26 AM, Georg-Johann Lay wrote:



It looks fine for the trunk.  Out of curiosity, does the avr port 
implement linker relaxing for this case?  That would seem to be 


No. avr-ld performs relaxing, but only the two cases of

- JMP/CALL to RJMP/RCALL provided the offset fits.
- [R]CALL+RET to [R]JMP provided there's no label between.
Yea, the first could be comparable to other targets.  The second is 
probably not all that common since the compiler should be doing that 
tail call elimination.




I had a look at ld relaxing some time ago, and I must admit that
I don't understand completely what they are doing.  Not much comments
and explanations there, basically a copy+paste from some other target
from decades ago...
Can't speak for the avr implementation, but in general, yes, odds are it 
was copied from some other target eons ago with minimal documentation.


The basics are straightforward.  The devil is in all the details.  It's 
been years since I've done any linker relaxing, but I've been immersed 
in it in the past.  I tried to comment my implementation, both in terms 
of the code sequences for the target and the interactions with the BFD 
datastructures.




Linker relaxing would be ADD.lo8 + ADC.hi8 => ADD.hi8 which affects
condition code.

In which case it'd only be safe if you knew that CC died before being used.

jeff


[PATCH] c++: wrong error initializing empty class [PR115900]

2024-07-17 Thread Marek Polacek
Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/14?

-- >8 --
In r14-409, we started handling empty bases first in cxx_fold_indirect_ref_1
so that we don't need to recurse and waste time.

This caused a bogus "modifying a const object" error.  I'm appending my
analysis from the PR, but basically, cxx_fold_indirect_ref now returns
a different object than before, and we mark the wrong thing as const,
but since we're initializing an empty object, we should avoid setting
the object constness.

~~
Pre-r14-409: we're evaluating the call to C::C(), which is in the body of
B::B(), which is the body of D::D(&d):

  C::C ((struct C *) this, NON_LVALUE_EXPR <0>)

It's a ctor so we get here:

 3118   /* Remember the object we are constructing or destructing.  */
 3119   tree new_obj = NULL_TREE;
 3120   if (DECL_CONSTRUCTOR_P (fun) || DECL_DESTRUCTOR_P (fun))
 3121 {
 3122   /* In a cdtor, it should be the first `this' argument.
 3123  At this point it has already been evaluated in the call
 3124  to cxx_bind_parameters_in_call.  */
 3125   new_obj = TREE_VEC_ELT (new_call.bindings, 0);

new_obj=(struct C *) &d.D.2656

 3126   new_obj = cxx_fold_indirect_ref (ctx, loc, DECL_CONTEXT (fun), 
new_obj);

new_obj=d.D.2656.D.2597

We proceed to evaluate the call, then we get here:

 3317   /* At this point, the object's constructor will have run, so
 3318  the object is no longer under construction, and its possible
 3319  'const' semantics now apply.  Make a note of this fact by
 3320  marking the CONSTRUCTOR TREE_READONLY.  */
 3321   if (new_obj && DECL_CONSTRUCTOR_P (fun))
 3322 cxx_set_object_constness (ctx, new_obj, /*readonly_p=*/true,
 3323   non_constant_p, overflow_p);

new_obj is still d.D.2656.D.2597, its type is "C", cxx_set_object_constness
doesn't set anything as const.  This is fine.

After r14-409: on line 3125, new_obj is (struct C *) &d.D.2656 as before,
but we go to cxx_fold_indirect_ref_1:

 5739   if (is_empty_class (type)
 5740   && CLASS_TYPE_P (optype)
 5741   && lookup_base (optype, type, ba_any, NULL, tf_none, off))
 5742 {
 5743   if (empty_base)
 5744 *empty_base = true;
 5745   return op;

type is C, which is an empty class; optype is "const D", and C is a base of D.
So we return the VAR_DECL 'd'.  Then we get to cxx_set_object_constness with
object=d, which is const, so we mark the constructor READONLY.

Then we're evaluating A::A() which has

  ((A*)this)->data = 0;

we evaluate the LHS to d.D.2656.a, for which the initializer is
{.D.2656={.a={.data=}}} which is TREE_READONLY and 'd' is const, so we think
we're modifying a const object and fail the constexpr evaluation.

PR c++/115900

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_call_expression): Set new_obj to NULL_TREE
if cxx_fold_indirect_ref set empty_base to true.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/constexpr-init23.C: New test.
---
 gcc/cp/constexpr.cc   | 14 
 gcc/testsuite/g++.dg/cpp2a/constexpr-init23.C | 22 +++
 2 files changed, 32 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/cpp2a/constexpr-init23.C

diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index f12b1dfc46d..abd3b04ea7f 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -3123,10 +3123,16 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, 
tree t,
 At this point it has already been evaluated in the call
 to cxx_bind_parameters_in_call.  */
   new_obj = TREE_VEC_ELT (new_call.bindings, 0);
-  new_obj = cxx_fold_indirect_ref (ctx, loc, DECL_CONTEXT (fun), new_obj);
-
-  if (ctx->call && ctx->call->fundef
- && DECL_CONSTRUCTOR_P (ctx->call->fundef->decl))
+  bool empty_base = false;
+  new_obj = cxx_fold_indirect_ref (ctx, loc, DECL_CONTEXT (fun), new_obj,
+  &empty_base);
+  /* If we're initializing an empty class, don't set constness, because
+cxx_fold_indirect_ref will return the wrong object to set constness
+of.  */
+  if (empty_base)
+   new_obj = NULL_TREE;
+  else if (ctx->call && ctx->call->fundef
+  && DECL_CONSTRUCTOR_P (ctx->call->fundef->decl))
{
  tree cur_obj = TREE_VEC_ELT (ctx->call->bindings, 0);
  STRIP_NOPS (cur_obj);
diff --git a/gcc/testsuite/g++.dg/cpp2a/constexpr-init23.C 
b/gcc/testsuite/g++.dg/cpp2a/constexpr-init23.C
new file mode 100644
index 000..466236d446d
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/constexpr-init23.C
@@ -0,0 +1,22 @@
+// PR c++/115900
+// { dg-do compile { target c++20 } }
+
+struct A {
+char m;
+constexpr A() { m = 0; }
+};
+
+struct C {
+  constexpr C(){ };
+};
+
+struct B : C {
+  A a;
+  constexpr B() {}
+};
+
+struct D : B { };
+

[PATCH] rs6000, Remove __builtin_vec_set_v1ti,, __builtin_vec_set_v2df, __builtin_vec_set_v2di

2024-07-17 Thread Carl Love

GCC maintainers:

This patch removes the __builtin_vec_set_v1ti, __builtin_vec_set_v2df 
and __builtin_vec_set_v2di built-ins.  The users should just use normal 
C-code to update the various vector elements.  This change was 
originally intended to be part of the earlier series of cleanup 
patches.  It was initially thought that some additional work would be 
needed to do some gimple generation instead of these built-ins.  
However, the existing default code generation does produce the needed 
code.  The code generated with normal C-code is as good or better than 
the code generated with these built-ins.
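
As an illustration of the "normal C code" alternative (a sketch with
hypothetical names, not taken from the patch), a plain element store
through GCC's generic vector subscripting lets the compiler pick the
target's vector-insert instruction without any built-in:

```c
#include <assert.h>

typedef double v2df __attribute__ ((vector_size (16)));

/* Plain C element update; the compiler selects the target's insert
   sequence (e.g. xxpermdi on VSX) on its own.  The index is reduced
   modulo 2, matching what the removed built-in path did with its
   selector.  */
v2df
set_elem (v2df v, double x, unsigned i)
{
  v[i & 1] = x;
  return v;
}
```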


The patch has been tested on Power 10 LE with no regressions.

Please let me know if the patch is acceptable for mainline.  Thanks.

   Carl

---
rs6000, Remove __builtin_vec_set_v1ti, __builtin_vec_set_v2df, 
__builtin_vec_set_v2di


Remove the built-ins, use the default gimple generation instead.

gcc/ChangeLog:
    * config/rs6000/rs6000-builtins.def (__builtin_vec_set_v1ti,
    __builtin_vec_set_v2df, __builtin_vec_set_v2di): Remove built-in
    definitions.
    * config/rs6000/rs6000-c.cc (resolve_vec_insert):  Remove if
    statements for mode == V2DFmode, mode == V2DImode and
    mode == V1TImode that reference RS6000_BIF_VEC_SET_V2DF,
    RS6000_BIF_VEC_SET_V2DI and RS6000_BIF_VEC_SET_V1TI.
---
 gcc/config/rs6000/rs6000-builtins.def | 13 -
 gcc/config/rs6000/rs6000-c.cc | 40 ---
 2 files changed, 53 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-builtins.def 
b/gcc/config/rs6000/rs6000-builtins.def

index 896d9686ac6..0ebc940f395 100644
--- a/gcc/config/rs6000/rs6000-builtins.def
+++ b/gcc/config/rs6000/rs6000-builtins.def
@@ -1263,19 +1263,6 @@
   const signed long long __builtin_vec_ext_v2di (vsll, signed int);
 VEC_EXT_V2DI nothing {extract}

-;; VEC_SET_V1TI, VEC_SET_V2DF and VEC_SET_V2DI are used in
-;; resolve_vec_insert(), rs6000-c.cc
-;; TODO: Remove VEC_SET_V1TI, VEC_SET_V2DF and VEC_SET_V2DI once the uses
-;; in resolve_vec_insert are replaced by the equivalent gimple statements.
-  const vsq __builtin_vec_set_v1ti (vsq, signed __int128, const int<0,0>);
-    VEC_SET_V1TI nothing {set}
-
-  const vd __builtin_vec_set_v2df (vd, double, const int<1>);
-    VEC_SET_V2DF nothing {set}
-
-  const vsll __builtin_vec_set_v2di (vsll, signed long long, const int<1>);
-    VEC_SET_V2DI nothing {set}
-
   const vsc __builtin_vsx_cmpge_16qi (vsc, vsc);
 CMPGE_16QI vector_nltv16qi {}

diff --git a/gcc/config/rs6000/rs6000-c.cc b/gcc/config/rs6000/rs6000-c.cc
index 6229c503bd0..c288acc200b 100644
--- a/gcc/config/rs6000/rs6000-c.cc
+++ b/gcc/config/rs6000/rs6000-c.cc
@@ -1522,46 +1522,6 @@ resolve_vec_insert (resolution *res, vec<tree, va_gc> *arglist,

   return error_mark_node;
 }

-  /* If we can use the VSX xxpermdi instruction, use that for insert.  */
-  machine_mode mode = TYPE_MODE (arg1_type);
-
-  if ((mode == V2DFmode || mode == V2DImode)
-  && VECTOR_UNIT_VSX_P (mode)
-  && TREE_CODE (arg2) == INTEGER_CST)
-    {
-  wide_int selector = wi::to_wide (arg2);
-  selector = wi::umod_trunc (selector, 2);
-  arg2 = wide_int_to_tree (TREE_TYPE (arg2), selector);
-
-  tree call = NULL_TREE;
-  if (mode == V2DFmode)
-    call = rs6000_builtin_decls[RS6000_BIF_VEC_SET_V2DF];
-  else if (mode == V2DImode)
-    call = rs6000_builtin_decls[RS6000_BIF_VEC_SET_V2DI];
-
-  /* Note, __builtin_vec_insert_ has vector and scalar types
-     reversed.  */
-  if (call)
-    {
-      *res = resolved;
-      return build_call_expr (call, 3, arg1, arg0, arg2);
-    }
-    }
-
-  else if (mode == V1TImode
-       && VECTOR_UNIT_VSX_P (mode)
-       && TREE_CODE (arg2) == INTEGER_CST)
-    {
-  tree call = rs6000_builtin_decls[RS6000_BIF_VEC_SET_V1TI];
-  wide_int selector = wi::zero(32);
-  arg2 = wide_int_to_tree (TREE_TYPE (arg2), selector);
-
-  /* Note, __builtin_vec_insert_ has vector and scalar types
-     reversed.  */
-  *res = resolved;
-  return build_call_expr (call, 3, arg1, arg0, arg2);
-    }
-
   /* Build *(((arg1_inner_type*) & (vector type){arg1}) + arg2) = arg0 with
  VIEW_CONVERT_EXPR.  i.e.:
    D.3192 = v1;
--
2.45.2




[COMMITTED] Regenerate c.opt.urls

2024-07-17 Thread Mark Wielaard
Hi,

On Wed, 2024-07-17 at 13:55 +0200, Mark Wielaard wrote:
> On Sun, 2024-07-14 at 15:31 +0200, Alejandro Colomar wrote:
> > On Sun, Jul 14, 2024 at 01:37:02PM GMT, Jonathan Wakely wrote:
> > > On Sun, 14 Jul 2024, 12:30 Alejandro Colomar via Gcc-help, <
> > > gcc-h...@gcc.gnu.org> wrote:
> > > > Did I break anything?  I see the failure being `git diff --exit-code`,
> > > > which doesn't seem like anything broken, but I don't know what that test
> > > > is for, so I'll ask.
> > > > 
> > > 
> > > It checks that necessary auto-generated files have been regenerated and
> > > committed.
> > > 
> > > If you didn't do anything related to that warning option that would have
> > > affected the c.opts.urls file then it wasn't you (I think it was a change
> > > from Marek).
> > 
> > I did add that warning option:
> > 
> > commit 44c9403ed1833ae71a59e84f9e37af3182be0df5
> > Author: Alejandro Colomar 
> > AuthorDate: Sat Jun 29 15:10:43 2024 +0200
> > Commit: Martin Uecker 
> > CommitDate: Sun Jul 14 11:41:00 2024 +0200
> > 
> > c, objc: Add -Wunterminated-string-initialization
> > 
> > Warn about the following:
> > 
> > char  s[3] = "foo";
> > 
> > I guess I should have committed some re-generated files?  Is that
> > documented somewhere?
> 
> Adding David Malcolm to CC who might know where this is documented.
> 
> But yes, after adding a new warning option you should run:
> 
> make html && cd gcc && make regenerate-opt-urls
> 
> This will produce the needed opt.url changes you should commit together
> with your change. In this case:
> 
> diff --git a/gcc/c-family/c.opt.urls b/gcc/c-family/c.opt.urls
> index 1b60ae4847b..df5f58a1eee 100644
> --- a/gcc/c-family/c.opt.urls
> +++ b/gcc/c-family/c.opt.urls
> @@ -870,6 +870,9 @@ 
> UrlSuffix(gcc/Warning-Options.html#index-Wno-unknown-pragmas) 
> LangUrlSuffix_D(gd
>  Wunsuffixed-float-constants
>  UrlSuffix(gcc/Warning-Options.html#index-Wno-unsuffixed-float-constants)
>  
> +Wunterminated-string-initialization
> +UrlSuffix(gcc/Warning-Options.html#index-Wno-unterminated-string-initialization)
> +
>  Wunused
>  UrlSuffix(gcc/Warning-Options.html#index-Wno-unused)

I made sure that was also generated locally and pushed the attached to
make the autoregen buildbot happy.

Cheers,

Mark
From bdb4db1ec966c974b9b7bf5e3d2edda93d8635aa Mon Sep 17 00:00:00 2001
From: Mark Wielaard 
Date: Wed, 17 Jul 2024 17:58:14 +0200
Subject: [PATCH] Regenerate c.opt.urls

The addition of -Wunterminated-string-initialization should have
regenerated the c.opt.urls file.

Fixes: 44c9403ed183 ("c, objc: Add -Wunterminated-string-initialization")

gcc/c-family/ChangeLog:

	* c.opt.urls: Regenerate.
---
 gcc/c-family/c.opt.urls | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/c-family/c.opt.urls b/gcc/c-family/c.opt.urls
index 1b60ae4847b1..df5f58a1eeed 100644
--- a/gcc/c-family/c.opt.urls
+++ b/gcc/c-family/c.opt.urls
@@ -870,6 +870,9 @@ UrlSuffix(gcc/Warning-Options.html#index-Wno-unknown-pragmas) LangUrlSuffix_D(gd
 Wunsuffixed-float-constants
 UrlSuffix(gcc/Warning-Options.html#index-Wno-unsuffixed-float-constants)
 
+Wunterminated-string-initialization
+UrlSuffix(gcc/Warning-Options.html#index-Wno-unterminated-string-initialization)
+
 Wunused
 UrlSuffix(gcc/Warning-Options.html#index-Wno-unused)
 
-- 
2.45.2



Re: [PATCH ver 2] rs6000, update effective target for tests builtins-10*.c and, vec_perm-runnable-i128.c

2024-07-17 Thread Carl Love




On 7/16/24 6:01 PM, Peter Bergner wrote:

On 7/16/24 6:19 PM, Carl Love wrote:

use __int128 types that are not supported on all platforms.  The
__int128 type is only supported on 64-bit platforms.  Need to check that
the platform is 64-bits and support the __int128 type.  Add the int128 and
lp64 flags to the target test.

The test cases themselves look good, but you need to update your git log entry
to not mention the lp64/64-bits since you removed them.

Yea, I didn't get the lp64 references cleaned up properly.  Sorry about that.

  Yes, currently, only
64-bit targets support __int128, but our hope is that one day, even 32-bit
targets will as well.  So how about the following text instead?


...
use __int128 types that are not supported on all platforms.  Update the
tests to check int128 effective target to avoid unsupported type errors
on unsupported platforms.


OK, changed.

 Carl




[PATCH ver 3] rs6000, update effective target for tests builtins-10*.c and, vec_perm-runnable-i128.c

2024-07-17 Thread Carl Love

GCC maintainers:

Version 3, in version 2, the ChangeLog didn't get updated to remove the 
LP64 references.  Fixed that and updated the patch description per the 
feedback from Peter.


Version 2, removed the lp64 from the target per discussion.  Tested and 
it is not needed.  The int128 qualifier is sufficient for the test to 
report as unsupported on a 32-bit Power system.


The tests:

  tests builtins-10-runnable.c
  tests builtins-10.c
  vec_perm-runnable-i128.c

generate the following errors when run on a 32-bit BE Power system with 
GCC configured with multilib enabled.


FAIL: gcc.target/powerpc/builtins-10-runnable.c (test for excess errors)
FAIL: gcc.target/powerpc/builtins-10.c (test for excess errors)
FAIL: gcc.target/powerpc/vec_perm-runnable-i128.c (test for excess errors)

The tests use the __int128 type which is not supported on 32-bit 
systems.  A check for the int128 effective target was added to the test 
cases to disable them on systems that do not support the __int128 type.  
The three tests now report "# of unsupported tests 1".


The patch has been tested on a Power 9 BE system with multilib enabled 
for GCC and on a Power 10 LE 64-bit configuration with no regression 
failures.


Please let me know if the patch is acceptable for mainline. Thanks.

   Carl
--
rs6000, update effective target for tests builtins-10*.c and 
vec_perm-runnable-i128.c


The tests:

  tests builtins-10-runnable.c
  tests builtins-10.c
  vec_perm-runnable-i128.c

use __int128 types that are not supported on all platforms.  Update the
tests to check int128 effective target to avoid unsupported type errors
on unsupported platforms.
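
For context, the kind of code such guarded tests exercise is ordinary
__int128 arithmetic, which only compiles where the type exists; a minimal
sketch (hypothetical, not from the tests themselves):

```c
#include <assert.h>
#include <stdint.h>

/* Widening 64x64->128 multiply returning the high half.  On a target
   without __int128 this file fails to compile, which is exactly what the
   int128 effective-target check guards against.  */
uint64_t
mulhi_u64 (uint64_t a, uint64_t b)
{
  unsigned __int128 p = (unsigned __int128) a * b;
  return (uint64_t) (p >> 64);
}
```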

gcc/testsuite/ChangeLog:
    * gcc.target/powerpc/builtins-10-runnable.c: Add
    target int128.
    * gcc.target/powerpc/builtins-10.c: Add
    target int128.
    * gcc.target/powerpc/vec_perm-runnable-i128: Add
    target int128.
---
 gcc/testsuite/gcc.target/powerpc/builtins-10-runnable.c   | 2 +-
 gcc/testsuite/gcc.target/powerpc/builtins-10.c    | 2 +-
 gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-10-runnable.c 
b/gcc/testsuite/gcc.target/powerpc/builtins-10-runnable.c

index dede08358e1..e2d3c990852 100644
--- a/gcc/testsuite/gcc.target/powerpc/builtins-10-runnable.c
+++ b/gcc/testsuite/gcc.target/powerpc/builtins-10-runnable.c
@@ -1,4 +1,4 @@
-/* { dg-do run } */
+/* { dg-do run { target int128 } } */
 /* { dg-require-effective-target vmx_hw } */
 /* { dg-options "-maltivec -O2 " } */

diff --git a/gcc/testsuite/gcc.target/powerpc/builtins-10.c 
b/gcc/testsuite/gcc.target/powerpc/builtins-10.c

index b00f53cfc62..007892e2731 100644
--- a/gcc/testsuite/gcc.target/powerpc/builtins-10.c
+++ b/gcc/testsuite/gcc.target/powerpc/builtins-10.c
@@ -1,4 +1,4 @@
-/* { dg-do compile } */
+/* { dg-do compile { target int128 } } */
 /* { dg-options "-O2 -maltivec" } */
 /* { dg-require-effective-target powerpc_altivec } */
 /* { dg-final { scan-assembler-times "xxsel" 6 } } */
diff --git a/gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c 
b/gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c

index 0e0d77bcb84..df1bf873cfc 100644
--- a/gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c
+++ b/gcc/testsuite/gcc.target/powerpc/vec_perm-runnable-i128.c
@@ -1,4 +1,4 @@
-/* { dg-do run } */
+/* { dg-do run { target int128 } } */
 /* { dg-require-effective-target vmx_hw } */
 /* { dg-options "-maltivec -O2 " } */

--
2.45.2




[committed] alpha: Fix duplicate !tlsgd!62 assemble error [PR115526]

2024-07-17 Thread Uros Bizjak
Add missing "cannot_copy" attribute to instructions that have to
stay in 1-1 correspondence with another insn.

PR target/115526

gcc/ChangeLog:

* config/alpha/alpha.md (movdi_er_high_g): Add cannot_copy attribute.
(movdi_er_tlsgd): Ditto.
(movdi_er_tlsldm): Ditto.
(call_value_osf_): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/alpha/pr115526.c: New test.

Tested by Maciej on Alpha/Linux target and reported in the PR.

Uros.
diff --git a/gcc/config/alpha/alpha.md b/gcc/config/alpha/alpha.md
index 1e2de5a4d15..bd92392878e 100644
--- a/gcc/config/alpha/alpha.md
+++ b/gcc/config/alpha/alpha.md
@@ -3902,7 +3902,8 @@ (define_insn "movdi_er_high_g"
   else
 return "ldq %0,%2(%1)\t\t!literal!%3";
 }
-  [(set_attr "type" "ldsym")])
+  [(set_attr "type" "ldsym")
+   (set_attr "cannot_copy" "true")])
 
 (define_split
   [(set (match_operand:DI 0 "register_operand")
@@ -3926,7 +3927,8 @@ (define_insn "movdi_er_tlsgd"
 return "lda %0,%2(%1)\t\t!tlsgd";
   else
 return "lda %0,%2(%1)\t\t!tlsgd!%3";
-})
+}
+  [(set_attr "cannot_copy" "true")])
 
 (define_insn "movdi_er_tlsldm"
   [(set (match_operand:DI 0 "register_operand" "=r")
@@ -3939,7 +3941,8 @@ (define_insn "movdi_er_tlsldm"
 return "lda %0,%&(%1)\t\t!tlsldm";
   else
 return "lda %0,%&(%1)\t\t!tlsldm!%2";
-})
+}
+  [(set_attr "cannot_copy" "true")])
 
 (define_insn "*movdi_er_gotdtp"
   [(set (match_operand:DI 0 "register_operand" "=r")
@@ -5908,6 +5911,7 @@ (define_insn "call_value_osf_"
   "HAVE_AS_TLS"
   "ldq $27,%1($29)\t\t!literal!%2\;jsr $26,($27),%1\t\t!lituse_!%2\;ldah 
$29,0($26)\t\t!gpdisp!%*\;lda $29,0($29)\t\t!gpdisp!%*"
   [(set_attr "type" "jsr")
+   (set_attr "cannot_copy" "true")
(set_attr "length" "16")])
 
 ;; We must use peep2 instead of a split because we need accurate life
diff --git a/gcc/testsuite/gcc.target/alpha/pr115526.c 
b/gcc/testsuite/gcc.target/alpha/pr115526.c
new file mode 100644
index 000..2f57903fec3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/alpha/pr115526.c
@@ -0,0 +1,46 @@
+/* PR target/115526 */
+/* { dg-do assemble } */
+/* { dg-options "-O2 -Wno-attributes -fvisibility=hidden -fPIC -mcpu=ev4" } */
+
+struct _ts {
+  struct _dtoa_state *interp;
+};
+struct Bigint {
+  int k;
+} *_Py_dg_strtod_bs;
+struct _dtoa_state {
+  struct Bigint p5s;
+  struct Bigint *freelist[];
+};
+extern _Thread_local struct _ts _Py_tss_tstate;
+typedef struct Bigint Bigint;
+int pow5mult_k;
+long _Py_dg_strtod_ndigits;
+void PyMem_Free();
+void Bfree(Bigint *v) {
+  if (v)
+{
+  if (v->k)
+   PyMem_Free();
+  else {
+   struct _dtoa_state *interp = _Py_tss_tstate.interp;
+   interp->freelist[v->k] = v;
+  }
+}
+}
+static Bigint *pow5mult(Bigint *b) {
+  for (;;) {
+if (pow5mult_k & 1) {
+  Bfree(b);
+  if (b == 0)
+return 0;
+}
+if (!(pow5mult_k >>= 1))
+  break;
+  }
+  return 0;
+}
+void _Py_dg_strtod() {
+  if (_Py_dg_strtod_ndigits)
+pow5mult(_Py_dg_strtod_bs);
+}


Re: [PATCH] Improve optimizer to avoid stack spill across pure function call

2024-07-17 Thread Jeff Law




On 7/15/24 7:53 AM, Vladimir Makarov wrote:


On 6/14/24 07:10, user202...@protonmail.com wrote:
This patch was inspired from PR 110137. It reduces the amount of stack 
spilling by ensuring that more values are constant across a pure 
function call.


It does not add any new flag; rather, it makes the optimizer generate 
more optimal code.


For the added test file, the change is the following. As can be seen, 
the number of memory operations is cut in half (almost, because rbx = 
rdi also need to be saved in the "after" version).


Before:

_Z2ggO7MyClass:
.LFB653:
    .cfi_startproc
    sub    rsp, 72
    .cfi_def_cfa_offset 80
    movdqu    xmm1, XMMWORD PTR [rdi]
    movdqu    xmm0, XMMWORD PTR [rdi+16]
    movaps    XMMWORD PTR [rsp+16], xmm1
    movaps    XMMWORD PTR [rsp], xmm0
    call    _Z1fv
    movdqa    xmm1, XMMWORD PTR [rsp+16]
    movdqa    xmm0, XMMWORD PTR [rsp]
    lea    rdx, [rsp+32]
    movaps    XMMWORD PTR [rsp+32], xmm1
    movaps    XMMWORD PTR [rsp+48], xmm0
    add    rsp, 72
    .cfi_def_cfa_offset 8
    ret
    .cfi_endproc

After:

_Z2ggO7MyClass:
.LFB653:
    .cfi_startproc
    push    rbx
    .cfi_def_cfa_offset 16
    .cfi_offset 3, -16
    mov    rbx, rdi
    sub    rsp, 32
    .cfi_def_cfa_offset 48
    call    _Z1fv
    movdqu    xmm0, XMMWORD PTR [rbx]
    movaps    XMMWORD PTR [rsp], xmm0
    movdqu    xmm0, XMMWORD PTR [rbx+16]
    movaps    XMMWORD PTR [rsp+16], xmm0
    add    rsp, 32
    .cfi_def_cfa_offset 16
    pop    rbx
    .cfi_def_cfa_offset 8
    ret
    .cfi_endproc

As explained in PR 110137, the reason I modify the RTL pass instead of 
the GIMPLE pass is that currently the code that handle the 
optimization is in the IRA.


The optimization involved is: rewrite

definition: a = something;
...
use a;

to move the definition statement right before the use statement, 
provided none of the statements in between modifies "something".


The existing code only handles the case where "something" is a memory 
reference with a fixed address.  The patch modifies the logic to also 
allow memory references whose address is not changed by the statements 
in between.
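
A minimal C sketch of the pattern (hypothetical names; a pure function may
read memory but not write it, so a load from unchanged memory can be redone
after the call instead of spilling the loaded value):

```c
#include <assert.h>

static int counter = 41;

/* Pure: reads global state, writes nothing.  */
__attribute__ ((pure)) static int
bump (void)
{
  return counter + 1;
}

int
g (const int *p)
{
  int a = *p;       /* load before the call */
  int r = bump ();  /* pure call: *p is provably unmodified across it */
  return r + a;     /* *p could instead be reloaded here, avoiding the
                       stack spill of a across the call */
}
```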


In order to do that the only way I can think of is to modify 
"validate_equiv_mem" to also validate the equivalence of the address, 
which may consist of a pseudo-register.


Nevertheless, reviews and suggestions to improve the code/explain how 
to implement it in GIMPLE phase would be appreciated.


Bootstrapped and regression tested on x86_64-pc-linux-gnu.

I think the test passes but there are some spurious failures with some 
scan-assembler-* tests; looking through them, it doesn't appear to 
pessimize the code.


I think this also fix PR 103541 as a side effect, but I'm not sure if 
the generated code is optimal (it loads from the global variable 
twice, but then there's no readily usable caller-saved register so you 
need an additional memory operation anyway)


Sorry for the big delay with the review, I missed your patch.  It is 
better to CC a patch to maintainers of the related code to avoid such 
situation.  Fortunately, Jeff Law pointed me out your patch recently. It 
is also a good practice to ping the patch if there is no response for 
one week.  Sometimes people do several pings to get attention for 
their patches.


The patch looks ok to me.  The only thing I found is that the test case 
should be in g++.target/i386 dir, not in gcc.target/i386.  I would also 
add -std=c++11 (or something more), as the test in g++.target will be 
run with different -std options and for -std=c++98 the test will not pass.


Also it is better to test such non-trivial patches on all major 
targets.  You can use compiler farm for this if you have no own 
available machines.  Otherwise, you should pay attention to new 
testsuite regressions on other targets after submitting the patch.


So the patch can be committed with the test change I wrote above.

And thank you to find the opportunity to generate a better code and 
implementing it.
Note this patch is causing the compiler to hang on the compile/pr43415.c 
testcase on visium-elf.  So there's no way for it to go forward without 
some additional testing and bugfixing.


For the submitter.  If you configure the compiler with:

./configure --target=visium-elf

Then build with

make all-gcc

Then test with
cd gcc; make check-gcc RUNTESTFLAGS=compile.exp=pr43415.c

You should see the failure (after a long wait for the dejagnu timeout to 
kick in).


jeff




Re: [COMMITTED] Regenerate c.opt.urls

2024-07-17 Thread Alejandro Colomar
Hi Mark,

On Wed, Jul 17, 2024 at 06:07:20PM GMT, Mark Wielaard wrote:
> > diff --git a/gcc/c-family/c.opt.urls b/gcc/c-family/c.opt.urls
> > index 1b60ae4847b..df5f58a1eee 100644
> > --- a/gcc/c-family/c.opt.urls
> > +++ b/gcc/c-family/c.opt.urls
> > @@ -870,6 +870,9 @@ 
> > UrlSuffix(gcc/Warning-Options.html#index-Wno-unknown-pragmas) 
> > LangUrlSuffix_D(gd
> >  Wunsuffixed-float-constants
> >  UrlSuffix(gcc/Warning-Options.html#index-Wno-unsuffixed-float-constants)
> >  
> > +Wunterminated-string-initialization
> > +UrlSuffix(gcc/Warning-Options.html#index-Wno-unterminated-string-initialization)
> > +
> >  Wunused
> >  UrlSuffix(gcc/Warning-Options.html#index-Wno-unused)
> 
> I made sure that was also generated locally and pushed the attached to
> make the autoregen buildbot happy.

Thanks!

Have a lovely day!
Alex


-- 





[PATCH] aarch64: Improve Advanced SIMD popcount expansion by using SVE [PR113860]

2024-07-17 Thread Pengxuan Zheng
This patch improves the Advanced SIMD popcount expansion by using SVE if
available.

For example, GCC currently generates the following code sequence for V2DI:
  cnt v31.16b, v31.16b
  uaddlp  v31.8h, v31.16b
  uaddlp  v31.4s, v31.8h
  uaddlp  v31.2d, v31.4s

However, by using SVE, we can generate the following sequence instead:
  ptrue   p7.b, all
  cnt z31.d, p7/m, z31.d

Similar improvements can be made for V4HI, V8HI, V2SI and V4SI too.

The scalar popcount expansion can also be improved similarly by using SVE and
those changes will be included in a separate patch.
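As an illustration (a hypothetical example, not taken from the patch), a scalar loop like the following is the kind of code whose auto-vectorized form exercises the vector popcount expansion described above, assuming -O2 and aarch64 vector support:

```c
#include <stdint.h>

/* Hypothetical example: with auto-vectorization, a loop like this over
   64-bit elements can be expanded via the V2DI popcount pattern.  */
void popcounts (const uint64_t *in, uint64_t *out, int n)
{
  for (int i = 0; i < n; i++)
    out[i] = (uint64_t) __builtin_popcountll (in[i]);
}
```

Whether the SVE form is used then depends on TARGET_SVE being available, as the expand change below shows.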

PR target/113860

gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (popcount<mode>2): Add TARGET_SVE
support.
* config/aarch64/aarch64-sve.md (@aarch64_pred_popcount<mode>): New
insn.
* config/aarch64/iterators.md (VPRED): Add V4HI, V8HI and V2SI.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/popcnt-sve.c: New test.

Signed-off-by: Pengxuan Zheng 
---
 gcc/config/aarch64/aarch64-simd.md|  9 ++
 gcc/config/aarch64/aarch64-sve.md | 12 +++
 gcc/config/aarch64/iterators.md   |  1 +
 gcc/testsuite/gcc.target/aarch64/popcnt-sve.c | 88 +++
 4 files changed, 110 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/popcnt-sve.c

diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index bbeee221f37..895d6e5eab5 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -3508,6 +3508,15 @@ (define_expand "popcount<mode>2"
(popcount:VDQHSD (match_operand:VDQHSD 1 "register_operand")))]
   "TARGET_SIMD"
   {
+if (TARGET_SVE)
+  {
+   rtx p = aarch64_ptrue_reg (<VPRED>mode);
+   emit_insn (gen_aarch64_pred_popcount<mode> (operands[0],
+   p,
+   operands[1]));
+   DONE;
+  }
+
 /* Generate a byte popcount.  */
    machine_mode mode = <bitsize> == 64 ? V8QImode : V16QImode;
 rtx tmp = gen_reg_rtx (mode);
diff --git a/gcc/config/aarch64/aarch64-sve.md 
b/gcc/config/aarch64/aarch64-sve.md
index 5331e7121d5..b5021dd2da0 100644
--- a/gcc/config/aarch64/aarch64-sve.md
+++ b/gcc/config/aarch64/aarch64-sve.md
@@ -3168,6 +3168,18 @@ (define_insn "*cond__any"
   }
 )
 
+;; Popcount predicated with a PTRUE.
+(define_insn "@aarch64_pred_popcount<mode>"
+  [(set (match_operand:VDQHSD 0 "register_operand" "=w")
+   (unspec:VDQHSD
+ [(match_operand:<VPRED> 1 "register_operand" "Upl")
+  (popcount:VDQHSD
+(match_operand:VDQHSD 2 "register_operand" "0"))]
+ UNSPEC_PRED_X))]
+  "TARGET_SVE"
+  "cnt\t%Z0.<Vetype>, %1/m, %Z2.<Vetype>"
+)
+
 ;; -
 ;;  [INT] General unary arithmetic corresponding to unspecs
 ;; -
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index f527b2cfeb8..a06159b23ea 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -2278,6 +2278,7 @@ (define_mode_attr VPRED [(VNx16QI "VNx16BI") (VNx8QI 
"VNx8BI")
 (VNx32BF "VNx8BI")
 (VNx16SI "VNx4BI") (VNx16SF "VNx4BI")
 (VNx8DI "VNx2BI") (VNx8DF "VNx2BI")
+(V4HI "VNx4BI") (V8HI "VNx8BI") (V2SI "VNx2BI")
 (V4SI "VNx4BI") (V2DI "VNx2BI")])
 
 ;; ...and again in lower case.
diff --git a/gcc/testsuite/gcc.target/aarch64/popcnt-sve.c 
b/gcc/testsuite/gcc.target/aarch64/popcnt-sve.c
new file mode 100644
index 000..8e349efe390
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/popcnt-sve.c
@@ -0,0 +1,88 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv8.2-a+sve -fno-vect-cost-model 
-fno-schedule-insns -fno-schedule-insns2" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+/*
+** f_v4hi:
+** ptrue   (p[0-7]).b, all
+** ldr d([0-9]+), \[x0\]
+** cnt z\2.h, \1/m, z\2.h
+** str d\2, \[x1\]
+** ret
+*/
+void
+f_v4hi (unsigned short *__restrict b, unsigned short *__restrict d)
+{
+  d[0] = __builtin_popcount (b[0]);
+  d[1] = __builtin_popcount (b[1]);
+  d[2] = __builtin_popcount (b[2]);
+  d[3] = __builtin_popcount (b[3]);
+}
+
+/*
+** f_v8hi:
+** ptrue   (p[0-7]).b, all
+** ldr q([0-9]+), \[x0\]
+** cnt z\2.h, \1/m, z\2.h
+** str q\2, \[x1\]
+** ret
+*/
+void
+f_v8hi (unsigned short *__restrict b, unsigned short *__restrict d)
+{
+  d[0] = __builtin_popcount (b[0]);
+  d[1] = __builtin_popcount (b[1]);
+  d[2] = __builtin_popcount (b[2]);
+  d[3] = __builtin_popcount (b[3]);
+  d[4] = __builtin_popcount (b[4]);
+  d[5] = __builtin_popcount (b[5]);
+  d[6] = __builtin_popcount (b[6]);
+  d[7] = __builtin_popcount (b[7]);
+}
+
+/*
+** f_v2si:
+** ptrue   (p[0-7]).b, all
+** ldr d([0-9]+

Re: [PATCH v2] MATCH: Simplify (a ? x : y) eq/ne (b ? x : y) [PR111150]

2024-07-17 Thread Andrew Pinski
On Wed, Jul 17, 2024 at 5:24 AM Richard Biener
 wrote:
>
> On Tue, Jul 16, 2024 at 3:36 PM Eikansh Gupta  
> wrote:
> >
> > This patch adds match pattern for `(a ? x : y) eq/ne (b ? x : y)`.
> > In forwprop1 pass, depending on the type of `a` and `b`, GCC produces
> > `vec_cond` or `cond_expr`. Based on the observation that `(x != y)` is
> > TRUE, the pattern can be optimized to produce `(a^b ? TRUE : FALSE)`.
> >
> > The patch adds match pattern for a, b:
> > (a ? x : y) != (b ? x : y) --> (a^b) ? TRUE  : FALSE
> > (a ? x : y) == (b ? x : y) --> (a^b) ? FALSE : TRUE
> > (a ? x : y) != (b ? y : x) --> (a^b) ? FALSE : TRUE
> > (a ? x : y) == (b ? y : x) --> (a^b) ? TRUE  : FALSE
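As a standalone sanity check (plain C, not GCC code), the equivalences can be verified exhaustively for boolean conditions and distinct x and y; note that the swapped-arm forms invert the XOR, matching the comments in the patch itself:

```c
/* Standalone check of the transforms, for boolean conditions a, b and
   distinct values x, y.  Returns 1 if every case holds.  */
int check_cond_xor_identity (void)
{
  int x = 5, y = 7;   /* arbitrary distinct values */
  for (int a = 0; a <= 1; a++)
    for (int b = 0; b <= 1; b++)
      {
        /* Same arm order: inequality holds exactly when a ^ b.  */
        if (((a ? x : y) != (b ? x : y)) != (a ^ b))
          return 0;
        /* Swapped arm order: inequality holds exactly when !(a ^ b).  */
        if (((a ? x : y) != (b ? y : x)) != !(a ^ b))
          return 0;
      }
  return 1;
}
```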
>
> OK.

Pushed as r15-2106-g44fcc1ca11e7ea (with one small change to the
commit message in the changelog where tabs should be used before the
*; most likely a copy and paste error).

Thanks,
Andrew

>
> Richard.
>
> > PR tree-optimization/111150
> >
> > gcc/ChangeLog:
> >
> > * match.pd (`(a ? x : y) eq/ne (b ? x : y)`): New pattern.
> > (`(a ? x : y) eq/ne (b ? y : x)`): New pattern.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.dg/tree-ssa/pr111150.c: New test.
> > * gcc.dg/tree-ssa/pr111150-1.c: New test.
> > * g++.dg/tree-ssa/pr111150.C: New test.
> >
> > Signed-off-by: Eikansh Gupta 
> > ---
> >  gcc/match.pd   | 15 +
> >  gcc/testsuite/g++.dg/tree-ssa/pr111150.C   | 33 ++
> >  gcc/testsuite/gcc.dg/tree-ssa/pr111150-1.c | 72 ++
> >  gcc/testsuite/gcc.dg/tree-ssa/pr111150.c   | 22 +++
> >  4 files changed, 142 insertions(+)
> >  create mode 100644 gcc/testsuite/g++.dg/tree-ssa/pr111150.C
> >  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr111150-1.c
> >  create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr111150.c
> >
> > diff --git a/gcc/match.pd b/gcc/match.pd
> > index 3759c64d461..7c125255ea3 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -5577,6 +5577,21 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >(vec_cond (bit_and (bit_not @0) @1) @2 @3)))
> >  #endif
> >
> > +/* (a ? x : y) != (b ? x : y) --> (a^b) ? TRUE  : FALSE */
> > +/* (a ? x : y) == (b ? x : y) --> (a^b) ? FALSE : TRUE  */
> > +/* (a ? x : y) != (b ? y : x) --> (a^b) ? FALSE : TRUE  */
> > +/* (a ? x : y) == (b ? y : x) --> (a^b) ? TRUE  : FALSE */
> > +(for cnd (cond vec_cond)
> > + (for eqne (eq ne)
> > +  (simplify
> > +   (eqne:c (cnd @0 @1 @2) (cnd @3 @1 @2))
> > +(cnd (bit_xor @0 @3) { constant_boolean_node (eqne == NE_EXPR, type); }
> > + { constant_boolean_node (eqne != NE_EXPR, type); }))
> > +  (simplify
> > +   (eqne:c (cnd @0 @1 @2) (cnd @3 @2 @1))
> > +(cnd (bit_xor @0 @3) { constant_boolean_node (eqne != NE_EXPR, type); }
> > + { constant_boolean_node (eqne == NE_EXPR, type); }
> > +
> >  /* Canonicalize mask ? { 0, ... } : { -1, ...} to ~mask if the mask
> > types are compatible.  */
> >  (simplify
> > diff --git a/gcc/testsuite/g++.dg/tree-ssa/pr111150.C 
> > b/gcc/testsuite/g++.dg/tree-ssa/pr111150.C
> > new file mode 100644
> > index 000..ca02d8dc51e
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.dg/tree-ssa/pr111150.C
> > @@ -0,0 +1,33 @@
> > +/* PR tree-optimization/111150 */
> > +/* { dg-do compile } */
> > +/* { dg-options "-O1 -fdump-tree-forwprop1" } */
> > +typedef int v4si __attribute((__vector_size__(4 * sizeof(int))));
> > +
> > +/* Before the patch, VEC_COND_EXPR was generated for each statement in the
> > +   function. This resulted in 3 VEC_COND_EXPR. */
> > +v4si f1_(v4si a, v4si b, v4si c, v4si d, v4si e, v4si f) {
> > +  v4si X = a == b ? e : f;
> > +  v4si Y = c == d ? e : f;
> > +  return (X != Y);
> > +}
> > +
> > +v4si f2_(v4si a, v4si b, v4si c, v4si d, v4si e, v4si f) {
> > +  v4si X = a == b ? e : f;
> > +  v4si Y = c == d ? e : f;
> > +  return (X == Y);
> > +}
> > +
> > +v4si f3_(v4si a, v4si b, v4si c, v4si d, v4si e, v4si f) {
> > +  v4si X = a == b ? e : f;
> > +  v4si Y = c == d ? f : e;
> > +  return (X != Y);
> > +}
> > +
> > +v4si f4_(v4si a, v4si b, v4si c, v4si d, v4si e, v4si f) {
> > +  v4si X = a == b ? e : f;
> > +  v4si Y = c == d ? f : e;
> > +  return (X == Y);
> > +}
> > +
> > +/* For each testcase, should produce only one VEC_COND_EXPR for X^Y. */
> > +/* { dg-final { scan-tree-dump-times " VEC_COND_EXPR " 4 "forwprop1" } } */
> > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr111150-1.c 
> > b/gcc/testsuite/gcc.dg/tree-ssa/pr111150-1.c
> > new file mode 100644
> > index 000..6f4b21ac6bc
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr111150-1.c
> > @@ -0,0 +1,72 @@
> > +/* PR tree-optimization/111150 */
> > +/* { dg-do compile } */
> > +/* { dg-options "-O1 -fgimple -fdump-tree-forwprop1-raw" } */
> > +
> > +/* Checks if pattern (X ? e : f) == (Y ? e : f) gets optimized. */
> > +__GIMPLE()
> > +_Bool f1_(int a, int b, int c, int d, int e, int f) {
> > +  _Bool X;
> > +  _Bool Y;
> > +  _Bool t;
> > +  int t1;
> > +  int t2;

Re: [PATCH 2/4] c++/modules: Track module purview for deferred instantiations [PR114630]

2024-07-17 Thread Jason Merrill

On 5/1/24 11:27 AM, Jason Merrill wrote:

On 5/1/24 07:11, Patrick Palka wrote:

On Wed, 1 May 2024, Nathaniel Shead wrote:


Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?

-- >8 --

When calling instantiate_pending_templates at end of parsing, any new
functions that are instantiated from this point have their module
purview set based on the current value of module_kind.

This is unideal, however, as the modules code will then treat these
instantiations as reachable and cause large swathes of the GMF to be
emitted into the module CMI, despite no code in the actual module
purview referencing it.

This patch fixes this by also remembering the value of module_kind when
the instantiation was deferred, and restoring it when doing this
deferred instantiation.  That way newly instantiated declarations
appropriately get a DECL_MODULE_PURVIEW_P appropriate for where the
instantiation was required, meaning that GMF entities won't be counted
as reachable unless referenced by an actually reachable entity.

Note that purviewness and attachment etc. is generally only determined
by the base template: this is purely for determining whether a
specialisation was declared in the module purview and hence whether it
should be streamed out.  See the comment on 'set_instantiating_module'.

PR c++/114630
PR c++/114795

gcc/cp/ChangeLog:

* cp-tree.h (struct tinst_level): Add field for tracking
module_kind.
* pt.cc (push_tinst_level_loc): Cache module_kind in new_level.
(reopen_tinst_level): Restore module_kind from level.
(instantiate_pending_templates): Save and restore module_kind so
it isn't affected by reopen_tinst_level.


LGTM.  Another approach is to instantiate all so-far deferred
instantiations and vtables once we reach the start of the module
purview, but your approach is much cleaner.


Hmm, I actually think I like the approach of instantiating at that point 
better, so that instantiations in the GMF can't depend on things in 
module purview.


And probably do the same again when switching to the private module 
fragment, when that's implemented.


Patrick mentioned these patches again today; I still think this is the 
right approach, particularly for the private module fragment.


That said, could we just clear module_kind at EOF?  As you say, only the 
purviewness of the template matters for implicit instantiations.  It is 
seeming to me like "set_instantiating_module" should only apply for 
explicit instantiations, at the point of the explicit instantiation 
declaration.


Jason



Re: [PING][patch, avr] Implement PR90616: Improve adding symbols that are 256-byte aligned

2024-07-17 Thread Georg-Johann Lay

Am 17.07.24 um 17:55 schrieb Jeff Law:

On 7/17/24 9:26 AM, Georg-Johann Lay wrote:
It looks fine for the trunk.  Out of curiosity, does the avr port 
implement linker relaxing for this case?  That would seem to be 


No. avr-ld performs relaxing, but only the two cases of

- JMP/CALL to RJMP/RCALL provided the offset fits.
- [R]CALL+RET to [R]JMP provided there's no label between.
Yea, the first could be comparable to other targets.  The second is 
probably not all that common since the compiler should be doing that 
tail call elimination.


It should.  But there are cases where gcc doesn't optimize, like

float add (float a, float b)
{
return a + b;
}

Then there are the calls that are not visible to the compiler, like

long mul (long a, long b)
{
return b * a;
}

so that the linker relaxations still have something to do.


I had a look at ld relaxing some time ago, and I must admit that
I don't understand completely what they are doing.  Not much comments
and explanations there, basically a copy+paste from some other target
from decades ago...
Can't speak for the avr implementation, but in general, yes, odds are it 
was copied from some other target eons ago with minimal documentation.


The basics are straightforward.  The devil is in all the details.  It's 
been years since I've done any linker relaxing, but I've been immersed 
in it in the past.  I tried to comment my implementation, both in terms 
of the code sequences for the target and the interactions with the BFD 
data structures.


One job for Binutils could be optimizing fixed registers like in

char mul3 (char a, char b, char c)
{
return a * b * c;
}

mul3:
mul r22,r20  ;  21  [c=12 l=3]  *mulqi3_enh
mov r22,r0
clr r1
mul r22,r24  ;  22  [c=12 l=3]  *mulqi3_enh
mov r24,r0
clr r1
ret  ;  25  [c=0 l=1]  return

The first "clr r1" is redundant due to the following mul.
Just like GCC PR20296, the only feasible solution is by letting Binutils
do the job.  But I have no idea how to adjust branches without labels
like RJMP .+20 that cross an instruction that's optimized out.

Johann


Linker relaxing would be ADD.lo8 + ADC.hi8 => ADD.hi8 which affects
condition code.

In which case it'd only be safe if you knew that CC died before being used.

jeff


Re: [PATCH v2 1/8] Fix warnings for tree formats in gfc_error

2024-07-17 Thread Marek Polacek
On Fri, Jul 12, 2024 at 04:11:48PM +0200, Paul-Antoine Arras wrote:
> This enables proper warnings for formats like %qD.

Ok.  The new lines are the same as in gcc_cdiag_char_table and
gcc_tdiag_char_table.
 
> gcc/c-family/ChangeLog:
> 
>   * c-format.cc (gcc_gfc_char_table): Add formats for tree objects.
> ---
>  gcc/c-family/c-format.cc | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/gcc/c-family/c-format.cc b/gcc/c-family/c-format.cc
> index 5bfd2fc4469..f4163c9cbc0 100644
> --- a/gcc/c-family/c-format.cc
> +++ b/gcc/c-family/c-format.cc
> @@ -847,6 +847,10 @@ static const format_char_info gcc_gfc_char_table[] =
>/* This will require a "locus" at runtime.  */
>{ "L",   0, STD_C89, { T89_V,   BADLEN,  BADLEN,  BADLEN,  BADLEN,  
> BADLEN,  BADLEN,  BADLEN,  BADLEN  }, "", "R", NULL },
>  
> +  /* These will require a "tree" at runtime.  */
> +  { "DFTV", 1, STD_C89, { T89_T,   BADLEN,  BADLEN,  BADLEN,  BADLEN,  
> BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN  }, "q+", "'",   
> NULL },
> +  { "E",   1, STD_C89, { T89_T,   BADLEN,  BADLEN,  BADLEN,  BADLEN,  
> BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN,  BADLEN  }, "q+", "",   
> NULL },
> +
>/* These will require nothing.  */
>{ "<>",0, STD_C89, NOARGUMENTS, "",  "",   NULL },
>{ NULL,  0, STD_C89, NOLENGTHS, NULL, NULL, NULL }
> -- 
> 2.45.2
> 

Marek



Re: [Committed] RISC-V: Fix testcase missing arch attribute

2024-07-17 Thread Edwin Lu

Committed! Thanks!

Edwin

On 7/17/2024 1:14 AM, Kito Cheng wrote:

LGTM :)

On Wed, Jul 17, 2024 at 9:15 AM Edwin Lu  wrote:

The C + F extensions imply the zcf extension on rv32. Add the missing zcf
extension for the rv32 target.

gcc/testsuite/ChangeLog:

 * gcc.target/riscv/target-attr-16.c: Update expected assembly

Signed-off-by: Edwin Lu 
---
  gcc/testsuite/gcc.target/riscv/target-attr-16.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/riscv/target-attr-16.c 
b/gcc/testsuite/gcc.target/riscv/target-attr-16.c
index 1c7badccdee..c6b626d0c6c 100644
--- a/gcc/testsuite/gcc.target/riscv/target-attr-16.c
+++ b/gcc/testsuite/gcc.target/riscv/target-attr-16.c
@@ -24,5 +24,5 @@ void bar (void)
  {
  }

-/* { dg-final { scan-assembler-times ".option arch, 
rv32i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_zicsr2p0_zifencei2p0_zaamo1p0_zalrsc1p0_zca1p0_zcd1p0_zba1p0_zbb1p0"
 4 { target { rv32 } } } } */
+/* { dg-final { scan-assembler-times ".option arch, 
rv32i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_zicsr2p0_zifencei2p0_zaamo1p0_zalrsc1p0_zca1p0_zcd1p0_zcf1p0_zba1p0_zbb1p0"
 4 { target { rv32 } } } } */
  /* { dg-final { scan-assembler-times ".option arch, 
rv64i2p1_m2p0_a2p1_f2p2_d2p2_c2p0_zicsr2p0_zifencei2p0_zaamo1p0_zalrsc1p0_zca1p0_zcd1p0_zba1p0_zbb1p0"
 4 { target { rv64 } } } } */
--
2.34.1



Re: [PATCH] c++: wrong error initializing empty class [PR115900]

2024-07-17 Thread Jason Merrill

On 7/17/24 12:00 PM, Marek Polacek wrote:

Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk/14?


OK.


-- >8 --
In r14-409, we started handling empty bases first in cxx_fold_indirect_ref_1
so that we don't need to recurse and waste time.

This caused a bogus "modifying a const object" error.  I'm appending my
analysis from the PR, but basically, cxx_fold_indirect_ref now returns
a different object than before, and we mark the wrong thing as const,
but since we're initializing an empty object, we should avoid setting
the object constness.

~~
Pre-r14-409: we're evaluating the call to C::C(), which is in the body of
B::B(), which is the body of D::D(&d):

   C::C ((struct C *) this, NON_LVALUE_EXPR <0>)

It's a ctor so we get here:

  3118   /* Remember the object we are constructing or destructing.  */
  3119   tree new_obj = NULL_TREE;
  3120   if (DECL_CONSTRUCTOR_P (fun) || DECL_DESTRUCTOR_P (fun))
  3121 {
  3122   /* In a cdtor, it should be the first `this' argument.
  3123  At this point it has already been evaluated in the call
  3124  to cxx_bind_parameters_in_call.  */
  3125   new_obj = TREE_VEC_ELT (new_call.bindings, 0);

new_obj=(struct C *) &d.D.2656

  3126   new_obj = cxx_fold_indirect_ref (ctx, loc, DECL_CONTEXT (fun), 
new_obj);

new_obj=d.D.2656.D.2597

We proceed to evaluate the call, then we get here:

  3317   /* At this point, the object's constructor will have run, so
  3318  the object is no longer under construction, and its possible
  3319  'const' semantics now apply.  Make a note of this fact by
  3320  marking the CONSTRUCTOR TREE_READONLY.  */
  3321   if (new_obj && DECL_CONSTRUCTOR_P (fun))
  3322 cxx_set_object_constness (ctx, new_obj, /*readonly_p=*/true,
  3323   non_constant_p, overflow_p);

new_obj is still d.D.2656.D.2597, its type is "C", cxx_set_object_constness
doesn't set anything as const.  This is fine.

After r14-409: on line 3125, new_obj is (struct C *) &d.D.2656 as before,
but we go to cxx_fold_indirect_ref_1:

  5739   if (is_empty_class (type)
  5740   && CLASS_TYPE_P (optype)
  5741   && lookup_base (optype, type, ba_any, NULL, tf_none, off))
  5742 {
  5743   if (empty_base)
  5744 *empty_base = true;
  5745   return op;

type is C, which is an empty class; optype is "const D", and C is a base of D.
So we return the VAR_DECL 'd'.  Then we get to cxx_set_object_constness with
object=d, which is const, so we mark the constructor READONLY.

Then we're evaluating A::A() which has

   ((A*)this)->data = 0;

we evaluate the LHS to d.D.2656.a, for which the initializer is
{.D.2656={.a={.data=}}} which is TREE_READONLY and 'd' is const, so we think
we're modifying a const object and fail the constexpr evaluation.

PR c++/115900

gcc/cp/ChangeLog:

* constexpr.cc (cxx_eval_call_expression): Set new_obj to NULL_TREE
if cxx_fold_indirect_ref set empty_base to true.

gcc/testsuite/ChangeLog:

* g++.dg/cpp2a/constexpr-init23.C: New test.
---
  gcc/cp/constexpr.cc   | 14 
  gcc/testsuite/g++.dg/cpp2a/constexpr-init23.C | 22 +++
  2 files changed, 32 insertions(+), 4 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp2a/constexpr-init23.C

diff --git a/gcc/cp/constexpr.cc b/gcc/cp/constexpr.cc
index f12b1dfc46d..abd3b04ea7f 100644
--- a/gcc/cp/constexpr.cc
+++ b/gcc/cp/constexpr.cc
@@ -3123,10 +3123,16 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, 
tree t,
 At this point it has already been evaluated in the call
 to cxx_bind_parameters_in_call.  */
new_obj = TREE_VEC_ELT (new_call.bindings, 0);
-  new_obj = cxx_fold_indirect_ref (ctx, loc, DECL_CONTEXT (fun), new_obj);
-
-  if (ctx->call && ctx->call->fundef
- && DECL_CONSTRUCTOR_P (ctx->call->fundef->decl))
+  bool empty_base = false;
+  new_obj = cxx_fold_indirect_ref (ctx, loc, DECL_CONTEXT (fun), new_obj,
+  &empty_base);
+  /* If we're initializing an empty class, don't set constness, because
+cxx_fold_indirect_ref will return the wrong object to set constness
+of.  */
+  if (empty_base)
+   new_obj = NULL_TREE;
+  else if (ctx->call && ctx->call->fundef
+  && DECL_CONSTRUCTOR_P (ctx->call->fundef->decl))
{
  tree cur_obj = TREE_VEC_ELT (ctx->call->bindings, 0);
  STRIP_NOPS (cur_obj);
diff --git a/gcc/testsuite/g++.dg/cpp2a/constexpr-init23.C 
b/gcc/testsuite/g++.dg/cpp2a/constexpr-init23.C
new file mode 100644
index 000..466236d446d
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp2a/constexpr-init23.C
@@ -0,0 +1,22 @@
+// PR c++/115900
+// { dg-do compile { target c++20 } }
+
+struct A {
+char m;
+constexpr A() { m = 0; }
+};
+
+struct C {
+  constexpr C()

Re: [PATCH v2] c++: Hash placeholder constraint in ctp_hasher

2024-07-17 Thread Jason Merrill

On 7/17/24 8:32 AM, Seyed Sajad Kahani wrote:

This patch addresses a difference between the hash function and the equality
function for canonical types of template parameters (ctp_hasher). The equality
function uses comptypes (typeck.cc) (with COMPARE_STRUCTURAL) and checks
constraint equality for two auto nodes (typeck.cc:1586), while the hash
function ignores it (pt.cc:4528). This leads to hash collisions that can be
avoided by using `hash_placeholder_constraint` (constraint.cc:1150).
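The underlying invariant is generic: the hasher must consider every field that the equality function compares, otherwise keys that differ only in the ignored field all collide. A small illustration (not GCC code, names hypothetical):

```cpp
#include <cstddef>
#include <functional>

struct Key { int idx; int constraint; };

// Hasher that ignores 'constraint', analogous to the pre-patch ctp_hasher
// ignoring the placeholder constraint.
struct IncompleteHash {
  std::size_t operator() (const Key &k) const
  { return std::hash<int>{} (k.idx); }
};

// Equality that compares both fields, analogous to comptypes checking
// constraint equality.
struct KeyEq {
  bool operator() (const Key &a, const Key &b) const
  { return a.idx == b.idx && a.constraint == b.constraint; }
};
```

An incomplete hasher is still correct in the sense that equal keys hash equal, but distinct keys differing only in the ignored field always collide, which is the avoidable cost this patch removes.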


Please mention in the commit message why there isn't a testcase.  OK 
with that tweak, thanks.



* constraint.cc (hash_placeholder_constraint): Rename to
iterative_hash_placeholder_constraint.
(iterative_hash_placeholder_constraint): Rename from
hash_placeholder_constraint and add the initial val argument.
* cp-tree.h (hash_placeholder_constraint): Rename to
iterative_hash_placeholder_constraint.
(iterative_hash_placeholder_constraint): Renamed from
hash_placeholder_constraint and add the initial val argument.
* pt.cc (struct ctp_hasher): Updated to use
iterative_hash_placeholder_constraint in the case of a valid placeholder
constraint.
(auto_hash::hash): Reflect the renaming of hash_placeholder_constraint 
to
iterative_hash_placeholder_constraint.
---
  gcc/cp/constraint.cc | 4 ++--
  gcc/cp/cp-tree.h | 2 +-
  gcc/cp/pt.cc | 9 +++--
  3 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index ebf4255e5..78aacb77a 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -1971,13 +1971,13 @@ equivalent_placeholder_constraints (tree c1, tree c2)
  /* Return a hash value for the placeholder ATOMIC_CONSTR C.  */
  
  hashval_t

-hash_placeholder_constraint (tree c)
+iterative_hash_placeholder_constraint (tree c, hashval_t val)
  {
tree t, a;
placeholder_extract_concept_and_args (c, t, a);
  
/* Like hash_tmpl_and_args, but skip the first argument.  */

-  hashval_t val = iterative_hash_object (DECL_UID (t), 0);
+  val = iterative_hash_object (DECL_UID (t), val);
  
for (int i = TREE_VEC_LENGTH (a)-1; i > 0; --i)

  val = iterative_hash_template_arg (TREE_VEC_ELT (a, i), val);
diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index 4bb3e9c49..294e88f75 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -8588,7 +8588,7 @@ extern tree_pair finish_type_constraints  (tree, tree, 
tsubst_flags_t);
  extern tree build_constrained_parameter (tree, tree, tree = 
NULL_TREE);
  extern void placeholder_extract_concept_and_args (tree, tree&, tree&);
  extern bool equivalent_placeholder_constraints  (tree, tree);
-extern hashval_t hash_placeholder_constraint   (tree);
+extern hashval_t iterative_hash_placeholder_constraint (tree, hashval_t);
  extern bool deduce_constrained_parameter(tree, tree&, tree&);
  extern tree resolve_constraint_check(tree);
  extern tree check_function_concept  (tree);
diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index d1316483e..3229c3706 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -4525,7 +4525,12 @@ struct ctp_hasher : ggc_ptr_hash
  val = iterative_hash_object (TEMPLATE_TYPE_LEVEL (t), val);
  val = iterative_hash_object (TEMPLATE_TYPE_IDX (t), val);
  if (TREE_CODE (t) == TEMPLATE_TYPE_PARM)
-  val = iterative_hash_template_arg (CLASS_PLACEHOLDER_TEMPLATE (t), val);
+  {
+   val
+ = iterative_hash_template_arg (CLASS_PLACEHOLDER_TEMPLATE (t), val);
+   if (tree c = NON_ERROR (PLACEHOLDER_TYPE_CONSTRAINTS (t)))
+ val = iterative_hash_placeholder_constraint (c, val);
+  }
  if (TREE_CODE (t) == BOUND_TEMPLATE_TEMPLATE_PARM)
val = iterative_hash_template_arg (TYPE_TI_ARGS (t), val);
  --comparing_specializations;
@@ -29605,7 +29610,7 @@ auto_hash::hash (tree t)
if (tree c = NON_ERROR (PLACEHOLDER_TYPE_CONSTRAINTS (t)))
  /* Matching constrained-type-specifiers denote the same template
 parameter, so hash the constraint.  */
-return hash_placeholder_constraint (c);
+return iterative_hash_placeholder_constraint (c, 0);
else
  /* But unconstrained autos are all separate, so just hash the pointer.  */
  return iterative_hash_object (t, 0);




Re: [PING][patch, avr] Implement PR90616: Improve adding symbols that are 256-byte aligned

2024-07-17 Thread Jeff Law




On 7/17/24 11:13 AM, Georg-Johann Lay wrote:

Am 17.07.24 um 17:55 schrieb Jeff Law:

On 7/17/24 9:26 AM, Georg-Johann Lay wrote:
It looks fine for the trunk.  Out of curiosity, does the avr port 
implement linker relaxing for this case?  That would seem to be 


No. avr-ld performs relaxing, but only the two cases of

- JMP/CALL to RJMP/RCALL provided the offset fits.
- [R]CALL+RET to [R]JMP provided there's no label between.
Yea, the first could be comparable to other targets.  The second is 
probably not all that common since the compiler should be doing that 
tail call elimination.


It should.  But there are cases where gcc doesn't optimize, like

float add (float a, float b)
{
     return a + b;
}
Presumably the a+b is handled via a libcall rather than a normal call? 
I guess there might be something in the path where that needs special 
handling.  It's been like 20+ years since I was last in that code. 
Conceptually I don't see a reason why libcalls would need to be special.




Then there are the calls that are not visible to the compiler, like

long mul (long a, long b)
{
     return b * a;
}

so that the linker relaxations still have something to do.
Yea, if you're emitting the call behind the back of the compiler for 
this kind of case, then the linker is your only real shot.  I did 
something like that for a few key operations on the mn102 chip eons ago.





One job for Binutils could be optimizing fixed registers like in

char mul3 (char a, char b, char c)
{
     return a * b * c;
}

mul3:
 mul r22,r20 ;  21    [c=12 l=3]  *mulqi3_enh
 mov r22,r0
 clr r1
 mul r22,r24 ;  22    [c=12 l=3]  *mulqi3_enh
 mov r24,r0
 clr r1
 ret ;  25    [c=0 l=1]  return

The first "clr r1" is void due to the following mul.
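(For readers unfamiliar with avr: r1 is the fixed zero register, and the hardware mul instruction clobbers r0/r1 with its product, which is why each multiply is followed by "clr r1". A rough C sketch of the dead-store situation being described, with hypothetical names:)

```c
/* Model of the redundancy: 'r1' is cleared, then unconditionally
   overwritten by the next 'mul' before ever being read, so the first
   clear is dead.  */
int count_live_clears (void)
{
  int r1 = -1;       /* stands in for the fixed register */
  int live = 0;

  r1 = 0;            /* first "clr r1": dead, overwritten below */
  r1 = 6 * 7;        /* "mul" clobbers r1 before any read */
  (void) r1;

  r1 = 0;            /* second "clr r1": this one is the live clear */
  live += (r1 == 0);
  return live;
}
```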
Just like GCC PR20296, the only feasible solution is by letting Binutils
do the job.  But I have no idea how to adjust branches without labels
like RJMP .+20 that cross an instruction that's optimized out.
I suspect the most important step is to prevent the assembler from 
resolving pc-relative jumps and instead emit a suitable relocation. 
Once that's done I think the branches should get adjusted automatically.



jeff








Johann


Linker relaxing would be ADD.lo8 + ADC.hi8 => ADD.hi8 which affects
condition code.
In which case it'd only be safe if you knew that CC died before being 
used.


jeff




Re: [PATCH] RISC-V: Support __mulbc3 and __divbc3 in libgcc for __bf16

2024-07-17 Thread Jeff Law




On 7/17/24 2:01 AM, Xiao Zeng wrote:

libgcc/ChangeLog:

* Makefile.in: Support __divbc3 and __mulbc3.
* libgcc2.c (if): Support BC mode for __bf16.
(defined): Ditto.
(MTYPE): Ditto.
(CTYPE): Ditto.
(AMTYPE): Ditto.
(MODE): Ditto.
(CEXT): Ditto.
(NOTRUNC): Ditto.
* libgcc2.h (LIBGCC2_HAS_BF_MODE): Ditto.
(__attribute__): Ditto.
(__divbc3): Add __divbc3 for __bf16.
(__mulbc3): Add __mulbc3 for __bf16.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/bf16-mulbc3-divbc3.c: New test.
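For background, the __mulXc3 family in libgcc implements complex multiplication; the core arithmetic, sketched here in plain float and without the infinity/NaN recovery steps the real routines perform, is (a + b*i) * (c + d*i) = (a*c - b*d) + (a*d + b*c)*i:

```c
/* Simplified sketch of the __mulXc3 arithmetic (name hypothetical);
   the real libgcc routine additionally recovers results when the naive
   computation produces NaN for infinite operands.  */
void mulc3_sketch (float a, float b, float c, float d,
                   float *re, float *im)
{
  *re = a * c - b * d;
  *im = a * d + b * c;
}
```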

It looks like this failed pre-commit testing:


https://patchwork.sourceware.org/project/gcc/patch/20240717080159.34038-1-zengx...@eswincomputing.com/



Jeff


[PATCH] c++: diagnose failed qualified lookup into current inst

2024-07-17 Thread Patrick Palka
Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
OK for trunk?

-- >8 --

When the scope of a qualified name is the current instantiation, and
qualified lookup finds nothing at template definition time, then we
know it'll find nothing at instantiation time (unless the current
instantiation has dependent bases).  So such qualified name lookup
failure can be diagnosed ahead of time as per [temp.res.general]/6.

This patch implements that, for qualified names of the form:

  this->non_existent
  a.non_existent
  A::non_existent
  typename A::non_existent
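The dependent-bases exception mentioned above can be shown with a standalone sketch (hypothetical example): 'val' might come from the dependent base, so a lookup through 'this' cannot be rejected at template definition time and must wait for instantiation.

```cpp
template <class T> struct Base { int val = 42; };

template <class T> struct Derived : Base<T> {
  // 'val' is not visible at definition time; it is only found in the
  // dependent base Base<T> once Derived is instantiated.
  int get () { return this->val; }
};
```
By contrast, when the current instantiation has no dependent bases, a name that is not found at definition time can never be found later, which is what the new early diagnostic exploits.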

It turns out we already optimistically attempt qualified lookup of
basically every qualified name, even when it's dependently scoped, and
just suppress issuing a lookup failure diagnostic after the fact when
the scope is a dependent type.  So implementing this is mostly a
matter of restricting the diagnostic suppression to "dependentish"
scopes, rather than all dependently typed scopes.

The cp_parser_conversion_function_id change is needed to avoid regressing
lookup/using8.C:

  using A::operator typename A::Nested*;

When resolving A::Nested we consider it not dependently scoped since
we entered A from cp_parser_conversion_function_id earlier.   But this
A is the implicit instantiation A not the primary template type A,
and so the lookup of Nested fails which we now diagnose.  This patch works
around this by not entering the template scope of a qualified conversion
function-id in this case, i.e. if we're in an expression vs declaration
context, by seeing if the type already went through finish_template_type
with entering_scope=true.

gcc/cp/ChangeLog:

* decl.cc (make_typename_type): Restrict name lookup failure
punting to dependentish_scope_p instead of dependent_type_p.
* error.cc (qualified_name_lookup_error): Improve diagnostic
when the scope is the current instantiation.
* parser.cc (cp_parser_diagnose_invalid_type_name): Likewise.
(cp_parser_conversion_function_id): Don't call push_scope on
a template scope unless we're in a declaration context.
(cp_parser_lookup_name): Restrict name lookup failure
punting to dependentish_scope_p instead of dependent_type_p.
* semantics.cc (finish_id_expression_1): Likewise.
* typeck.cc (finish_class_member_access_expr): Likewise.

libstdc++-v3/ChangeLog:

* include/experimental/socket
(basic_socket_iostream::basic_socket_iostream): Fix typo.
* include/tr2/dynamic_bitset
(__dynamic_bitset_base::_M_is_proper_subset_of): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/alignas18.C: Expect name lookup error for U::X.
* g++.dg/cpp0x/forw_enum13.C: Expect name lookup error for
D3::A and D4::A.
* g++.dg/parse/access13.C: Declare A::E::V to avoid name lookup
failure and preserve intent of the test.
* g++.dg/parse/enum11.C: Expect extra errors, matching the
non-template case.
* g++.dg/template/crash123.C: Avoid name lookup failure to
preserve intent of the test.
* g++.dg/template/crash124.C: Likewise.
* g++.dg/template/crash7.C: Adjust expected diagnostics.
* g++.dg/template/dtor6.C: Declare A::~A() to avoid name lookup
failure and preserve intent of the test.
* g++.dg/template/error22.C: Adjust expected diagnostics.
* g++.dg/template/static30.C: Avoid name lookup failure to
preserve intent of the test.
* g++.old-deja/g++.other/decl5.C: Adjust expected diagnostics.
* g++.dg/template/non-dependent34.C: New test.
---
 gcc/cp/decl.cc|  2 +-
 gcc/cp/error.cc   |  3 +-
 gcc/cp/parser.cc  | 10 +++--
 gcc/cp/semantics.cc   |  2 +-
 gcc/cp/typeck.cc  |  2 +-
 gcc/testsuite/g++.dg/cpp0x/alignas18.C|  3 +-
 gcc/testsuite/g++.dg/cpp0x/forw_enum13.C  |  6 +--
 gcc/testsuite/g++.dg/parse/access13.C |  1 +
 gcc/testsuite/g++.dg/parse/enum11.C   |  2 +-
 gcc/testsuite/g++.dg/template/crash123.C  |  2 +-
 gcc/testsuite/g++.dg/template/crash124.C  |  4 +-
 gcc/testsuite/g++.dg/template/crash7.C|  6 +--
 gcc/testsuite/g++.dg/template/dtor6.C |  3 +-
 gcc/testsuite/g++.dg/template/error22.C   |  2 +-
 .../g++.dg/template/non-dependent34.C | 44 +++
 gcc/testsuite/g++.dg/template/static30.C  |  4 +-
 gcc/testsuite/g++.old-deja/g++.other/decl5.C  |  2 +-
 libstdc++-v3/include/experimental/socket  |  2 +-
 libstdc++-v3/include/tr2/dynamic_bitset   |  2 +-
 19 files changed, 75 insertions(+), 27 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/template/non-dependent34.C

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index e7bb4fa3089..3c5ad554ff2 100644
--- a/gcc/cp/decl.cc
+++ b/gcc/cp/decl.cc
@@ -4536,7 +4536,7 @@ make_typename_type (tre

Re: [PATCH v1] Match: Bugfix .SAT_TRUNC honor types has no mode precision [PR115961]

2024-07-17 Thread Andrew Pinski
On Wed, Jul 17, 2024 at 4:13 AM Richard Biener
 wrote:
>
> On Wed, Jul 17, 2024 at 11:48 AM  wrote:
> >
> > From: Pan Li 
> >
> > The .SAT_TRUNC matching doesn't check the type has mode precision.  Thus
> > when bitfield like below will be recog as .SAT_TRUNC.
> >
> > struct e
> > {
> >   unsigned pre : 12;
> >   unsigned a : 4;
> > };
> >
> > __attribute__((noipa))
> > void bug (e * v, unsigned def, unsigned use) {
> >   e & defE = *v;
> >   defE.a = min_u (use + 1, 0xf);
> > }
> >
> > This patch would like to add type_has_mode_precision_p for the
> > .SAT_TRUNC matching to get rid of this.
> >
> > The below test suites are passed for this patch:
> > 1. The rv64gcv fully regression tests.
> > 2. The x86 bootstrap tests.
> > 3. The x86 fully regression tests.
>
> Hmm, rather than restricting the matching the issue is the optab query or
> in this case how *_optab_supported_p blindly uses TYPE_MODE without
> either asserting the type has mode precision or failing the query in this 
> case.
>
> I think it would be simplest to adjust direct_optab_supported_p
> (and convert_optab_supported_p) to reject such operations?  Richard, do
> you agree or should callers check this instead?

I was thinking it should be in
direct_optab_supported_p/convert_optab_supported_p/direct_internal_fn_optab
too when I did my initial analysis of the bug report. I tried to see
if there was another use of direct_optab_supported_p where this could
happen but I didn't find one.

Thanks,
Andrew

>
> So, instead of match.pd the check would need to be in vector pattern matching
> and SSA math opts.  Or alternatively in internal-fn.cc as laid out above.
>
> Richard.
>
> > PR target/115961
> >
> > gcc/ChangeLog:
> >
> > * match.pd: Add type_has_mode_precision_p check for .SAT_TRUNC.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * g++.target/i386/pr115961-run-1.C: New test.
> > * g++.target/riscv/rvv/base/pr115961-run-1.C: New test.
> >
> > Signed-off-by: Pan Li 
> > ---
> >  gcc/match.pd  |  4 +--
> >  .../g++.target/i386/pr115961-run-1.C  | 34 +++
> >  .../riscv/rvv/base/pr115961-run-1.C   | 34 +++
> >  3 files changed, 70 insertions(+), 2 deletions(-)
> >  create mode 100644 gcc/testsuite/g++.target/i386/pr115961-run-1.C
> >  create mode 100644 gcc/testsuite/g++.target/riscv/rvv/base/pr115961-run-1.C
> >
> > diff --git a/gcc/match.pd b/gcc/match.pd
> > index 24a0bbead3e..8121ec09f53 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -3240,7 +3240,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >   (bit_ior:c (negate (convert (gt @0 INTEGER_CST@1)))
> > (convert @0))
> >   (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
> > -  && TYPE_UNSIGNED (TREE_TYPE (@0)))
> > +  && TYPE_UNSIGNED (TREE_TYPE (@0)) && type_has_mode_precision_p 
> > (type))
> >   (with
> >{
> > unsigned itype_precision = TYPE_PRECISION (TREE_TYPE (@0));
> > @@ -3255,7 +3255,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >  (match (unsigned_integer_sat_trunc @0)
> >   (convert (min @0 INTEGER_CST@1))
> >   (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
> > -  && TYPE_UNSIGNED (TREE_TYPE (@0)))
> > +  && TYPE_UNSIGNED (TREE_TYPE (@0)) && type_has_mode_precision_p 
> > (type))
> >   (with
> >{
> > unsigned itype_precision = TYPE_PRECISION (TREE_TYPE (@0));
> > diff --git a/gcc/testsuite/g++.target/i386/pr115961-run-1.C 
> > b/gcc/testsuite/g++.target/i386/pr115961-run-1.C
> > new file mode 100644
> > index 000..b8c8aef3b17
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.target/i386/pr115961-run-1.C
> > @@ -0,0 +1,34 @@
> > +/* PR target/115961 */
> > +/* { dg-do run } */
> > +/* { dg-options "-O3 -fdump-rtl-expand-details" } */
> > +
> > +struct e
> > +{
> > +  unsigned pre : 12;
> > +  unsigned a : 4;
> > +};
> > +
> > +static unsigned min_u (unsigned a, unsigned b)
> > +{
> > +  return (b < a) ? b : a;
> > +}
> > +
> > +__attribute__((noipa))
> > +void bug (e * v, unsigned def, unsigned use) {
> > +  e & defE = *v;
> > +  defE.a = min_u (use + 1, 0xf);
> > +}
> > +
> > +__attribute__((noipa, optimize(0)))
> > +int main(void)
> > +{
> > +  e v = { 0xded, 3 };
> > +
> > +  bug(&v, 32, 33);
> > +
> > +  if (v.a != 0xf)
> > +__builtin_abort ();
> > +
> > +  return 0;
> > +}
> > +/* { dg-final { scan-rtl-dump-not ".SAT_TRUNC " "expand" } } */
> > diff --git a/gcc/testsuite/g++.target/riscv/rvv/base/pr115961-run-1.C 
> > b/gcc/testsuite/g++.target/riscv/rvv/base/pr115961-run-1.C
> > new file mode 100644
> > index 000..b8c8aef3b17
> > --- /dev/null
> > +++ b/gcc/testsuite/g++.target/riscv/rvv/base/pr115961-run-1.C
> > @@ -0,0 +1,34 @@
> > +/* PR target/115961 */
> > +/* { dg-do run } */
> > +/* { dg-options "-O3 -fdump-rtl-expand-details" } */
> > +
> > +struct e
> > +{
> > +  unsigned pre : 12;
> > +  unsigned a : 4;
> > +};
> > +
> > +static unsigned min_u (unsigned a, unsigned b)
> > +{
> > +  return (b < a)

[PATCH v2] testsuite: Add dg-do run to more tests

2024-07-17 Thread Sam James
All of these are for wrong-code bugs. Confirmed to be used before but
with no execution.

Tested on x86_64-pc-linux-gnu and checked test logs before/after.

2024-07-17  Sam James  

PR/96369
PR/102124
PR/108692
* c-c++-common/pr96369.c: Add dg-do run directive.
* gcc.dg/torture/pr102124.c: Ditto.
* gcc.dg/pr108692.c: Ditto.
---
v2: Fix ChangeLog format. Explicitly state how tested in commit msg.
Drop redundant dg-do compile where appropriate.

 gcc/testsuite/c-c++-common/pr96369.c| 2 +-
 gcc/testsuite/gcc.dg/pr108692.c | 2 +-
 gcc/testsuite/gcc.dg/torture/pr102124.c | 1 +
 3 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/c-c++-common/pr96369.c 
b/gcc/testsuite/c-c++-common/pr96369.c
index 8c468d9fec2f..ec58a3fc6c92 100644
--- a/gcc/testsuite/c-c++-common/pr96369.c
+++ b/gcc/testsuite/c-c++-common/pr96369.c
@@ -1,4 +1,4 @@
-/* { dg-do compile } */
+/* { dg-do run } */
 /* { dg-options "-O" } */
 
 int main()
diff --git a/gcc/testsuite/gcc.dg/pr108692.c b/gcc/testsuite/gcc.dg/pr108692.c
index fc25bf54e45d..13a27496ad9f 100644
--- a/gcc/testsuite/gcc.dg/pr108692.c
+++ b/gcc/testsuite/gcc.dg/pr108692.c
@@ -1,5 +1,5 @@
 /* PR tree-optimization/108692 */
-/* { dg-do compile } */
+/* { dg-do run } */
 /* { dg-options "-O2 -ftree-vectorize" } */
 
 __attribute__((noipa)) int
diff --git a/gcc/testsuite/gcc.dg/torture/pr102124.c 
b/gcc/testsuite/gcc.dg/torture/pr102124.c
index a158b4a60b69..a0eb01521242 100644
--- a/gcc/testsuite/gcc.dg/torture/pr102124.c
+++ b/gcc/testsuite/gcc.dg/torture/pr102124.c
@@ -1,4 +1,5 @@
 /* PR tree-optimization/102124 */
+/* { dg-do run } */
 
 int
 foo (const unsigned char *a, const unsigned char *b, unsigned long len)

-- 
2.45.2



Re: [PING][patch, avr] Implement PR90616: Improve adding symbols that are 256-byte aligned

2024-07-17 Thread Georg-Johann Lay

Am 17.07.24 um 19:51 schrieb Jeff Law:

On 7/17/24 11:13 AM, Georg-Johann Lay wrote:

Am 17.07.24 um 17:55 schrieb Jeff Law:

On 7/17/24 9:26 AM, Georg-Johann Lay wrote:
It looks fine for the trunk.  Out of curiosity, does the avr port 
implement linker relaxing for this case?  That would seem to be 


No. avr-ld performs relaxing, but only the two cases of

- JMP/CALL to RJMP/RCALL provided the offset fits.
- [R]CALL+RET to [R]JMP provided there's no label between.
Yea, the first could be comparable to other targets.  The second is 
probably not all that common since the compiler should be doing that 
tail call elimination.


It should.  But there are cases where gcc doesn't optimize, like

float add (float a, float b)
{
 return a + b;
}
Presumably the a+b is handled via a libcall rather than a normal call? I 
guess there might be something in the path where that needs special 
handling.  It's been like 20+ years since I was last in that code. 
Conceptually I don't see a reason why libcalls would need to be special.




Then there are the calls that are not visible to the compiler, like

long mul (long a, long b)
{
 return b * a;
}

so that the linker relaxations still have something to do.
Yea, if you're emitting the call behind the back of the compiler for 
this kind of case, then the linker is your only real shot.  I did 
something like that for a few key operations on the mn102 chip eons ago.



One job for Binutils could be optimizing fixed registers like in

char mul3 (char a, char b, char c)
{
 return a * b * c;
}

mul3:
 mul r22,r20 ;  21    [c=12 l=3]  *mulqi3_enh
 mov r22,r0
 clr r1
 mul r22,r24 ;  22    [c=12 l=3]  *mulqi3_enh
 mov r24,r0
 clr r1
 ret ;  25    [c=0 l=1]  return

The first "clr r1" is void due to the following mul.
Just like GCC PR20296, the only feasible solution is by letting Binutils
do the job.  But I have no idea how to adjust branches without labels
like RJMP .+20 that cross an instruction that's optimized out.
I suspect the most important step is to prevent the assembler from 
resolving pc-relative jumps and instead emit a suitable relocation. Once 
that's done I think the branches should get adjusted automatically.


Maybe that's already the case with -mlink-relax?  IIRC that was
introduced to keep the assembler from resolving label differences
when the linker may relax and hence change label differences, because
it shredded debug info.

...appears to work:

void trelax (void)
{
__asm ("rjmp .+4""\n\t"
   "rcall main"  "\n\t"
   "ret" "\n\t"
   "inc r0");
}

int main (void)
{
return 0;
}

with -mrelax, the code is:

004c :
  4c:   01 c0       rjmp  .+2       ; 0x50

004e :
  4e:   15 c0       rjmp  .+42      ; 0x7a
  50:   03 94       inc   r0
  52:   08 95       ret

so that the RJMP is still targeting the INC.  Though the
very optimization is performed by ld and not by gas.

And there is the complication that a zero_reg optimization
must only be performed on asm code from C/C++ that is using
the avr-gcc ABI.  But that could be handled by options, so
we'd have a change to the device-specs again :-/
Or maybe better by a directive like .abi gcc or so.

Johann


[pushed] genattrtab: Drop enum tags, consolidate type names

2024-07-17 Thread Richard Sandiford
genattrtab printed an "enum" tag before references to attribute
enums, but that's redundant in C++.  Removing it means that each
attribute type becomes a single token and can be easily stored
in the attr_desc structure.

Tested on aarch64-linux-gnu and using contrib/config-list.mk
(to make sure that removing "enum" didn't introduce a name clash).
Pushed to trunk.

Richard


gcc/
* genattrtab.cc (attr_desc::cxx_type): New field.
(write_attr_get, write_attr_value): Use it.
(gen_attr, find_attr, make_internal_attr): Initialize it,
dropping enum tags.
---
 gcc/genattrtab.cc | 37 ++---
 1 file changed, 14 insertions(+), 23 deletions(-)

diff --git a/gcc/genattrtab.cc b/gcc/genattrtab.cc
index 03c7d6c74a3..2a51549ddd4 100644
--- a/gcc/genattrtab.cc
+++ b/gcc/genattrtab.cc
@@ -175,6 +175,7 @@ class attr_desc
 public:
   char *name;  /* Name of attribute.  */
   const char *enum_name;   /* Enum name for DEFINE_ENUM_NAME.  */
+  const char *cxx_type;/* The associated C++ type.  */
   class attr_desc *next;   /* Next attribute.  */
   struct attr_value *first_value; /* First value of this attribute.  */
   struct attr_value *default_val; /* Default value for this attribute.  */
@@ -3083,6 +3084,7 @@ gen_attr (md_rtx_info *info)
   if (GET_CODE (def) == DEFINE_ENUM_ATTR)
 {
   attr->enum_name = XSTR (def, 1);
+  attr->cxx_type = attr->enum_name;
   et = rtx_reader_ptr->lookup_enum_type (XSTR (def, 1));
   if (!et || !et->md_p)
error_at (info->loc, "No define_enum called `%s' defined",
@@ -3092,9 +3094,13 @@ gen_attr (md_rtx_info *info)
  add_attr_value (attr, ev->name);
 }
   else if (*XSTR (def, 1) == '\0')
-attr->is_numeric = 1;
+{
+  attr->is_numeric = 1;
+  attr->cxx_type = "int";
+}
   else
 {
+  attr->cxx_type = concat ("attr_", attr->name, nullptr);
   name_ptr = XSTR (def, 1);
   while ((p = next_comma_elt (&name_ptr)) != NULL)
add_attr_value (attr, p);
@@ -4052,12 +4058,7 @@ write_attr_get (FILE *outf, class attr_desc *attr)
 
   /* Write out start of function, then all values with explicit `case' lines,
  then a `default', then the value with the most uses.  */
-  if (attr->enum_name)
-fprintf (outf, "enum %s\n", attr->enum_name);
-  else if (!attr->is_numeric)
-fprintf (outf, "enum attr_%s\n", attr->name);
-  else
-fprintf (outf, "int\n");
+  fprintf (outf, "%s\n", attr->cxx_type);
 
   /* If the attribute name starts with a star, the remainder is the name of
  the subroutine to use, instead of `get_attr_...'.  */
@@ -4103,13 +4104,8 @@ write_attr_get (FILE *outf, class attr_desc *attr)
  cached_attrs[j] = name;
cached_attr = find_attr (&name, 0);
gcc_assert (cached_attr && cached_attr->is_const == 0);
-   if (cached_attr->enum_name)
- fprintf (outf, "  enum %s", cached_attr->enum_name);
-   else if (!cached_attr->is_numeric)
- fprintf (outf, "  enum attr_%s", cached_attr->name);
-   else
- fprintf (outf, "  int");
-   fprintf (outf, " cached_%s ATTRIBUTE_UNUSED;\n", name);
+   fprintf (outf, "  %s cached_%s ATTRIBUTE_UNUSED;\n",
+cached_attr->cxx_type, name);
j++;
   }
   cached_attr_count = j;
@@ -4395,14 +4391,7 @@ write_attr_value (FILE *outf, class attr_desc *attr, rtx 
value)
 case ATTR:
   {
class attr_desc *attr2 = find_attr (&XSTR (value, 0), 0);
-   if (attr->enum_name)
- fprintf (outf, "(enum %s)", attr->enum_name);
-   else if (!attr->is_numeric)
- fprintf (outf, "(enum attr_%s)", attr->name);
-   else if (!attr2->is_numeric)
- fprintf (outf, "(int)");
-
-   fprintf (outf, "get_attr_%s (%s)", attr2->name,
+   fprintf (outf, "(%s) get_attr_%s (%s)", attr->cxx_type, attr2->name,
 (attr2->is_const ? "" : "insn"));
   }
   break;
@@ -4672,7 +4661,8 @@ find_attr (const char **name_p, int create)
 
   attr = oballoc (class attr_desc);
   attr->name = DEF_ATTR_STRING (name);
-  attr->enum_name = 0;
+  attr->enum_name = nullptr;
+  attr->cxx_type = nullptr;
   attr->first_value = attr->default_val = NULL;
   attr->is_numeric = attr->is_const = attr->is_special = 0;
   attr->next = attrs[index];
@@ -4693,6 +4683,7 @@ make_internal_attr (const char *name, rtx value, int 
special)
   attr = find_attr (&name, 1);
   gcc_assert (!attr->default_val);
 
+  attr->cxx_type = "int";
   attr->is_numeric = 1;
   attr->is_const = 0;
   attr->is_special = (special & ATTR_SPECIAL) != 0;
-- 
2.25.1



Re: [PATCH v2] gimple-fold: consistent dump of builtin call simplifications

2024-07-17 Thread Rubin Gerritsen
Sorry for the inconvenience; this time the patch is attached.

Rubin

From: Richard Biener 
Sent: 17 July 2024 1:01 PM
To: rubin.gerritsen 
Cc: gcc-patches@gcc.gnu.org 
Subject: Re: [PATCH v2] gimple-fold: consistent dump of builtin call 
simplifications

On Wed, Jul 17, 2024 at 12:47 PM Richard Biener
 wrote:
>
> On Tue, Jul 16, 2024 at 9:30 PM rubin.gerritsen  wrote:
> >
> > Changes since v1:
> >  * Added DCO signoff
> >  * Removed tabs from commit message
> >
> > --
> > Previously only simplifications of the `__st[xrp]cpy_chk`
> > were dumped. Now all call replacement simplifications are
> > dumped.
> >
> > Examples of statements with corresponding dumpfile entries:
> >
> > `printf("mystr\n");`:
> >   optimized: simplified printf to __builtin_puts
> > `printf("%c", 'a');`:
> >   optimized: simplified printf to __builtin_putchar
> > `printf("%s\n", "mystr");`:
> >   optimized: simplified printf to __builtin_puts
> >
> > The below test suites passed for this patch
> > * The x86 bootstrap test.
> > * Manual testing with some small example code manually
> >   examining dump logs, outputting the lines mentioned above.
>
> OK.
>
> I'll push this for you.

Can you please post the patch as generated by
git format-patch and attach it?  I have problems with your
mailer wrapping lines and even with that fixed the patch
not applying with git am.

Richard.

> Richard.
>
> > gcc/ChangeLog:
> >
> > * gimple-fold.cc (dump_transformation): Moved definition.
> > (replace_call_with_call_and_fold): Calls dump_transformation.
> > (gimple_fold_builtin_stxcpy_chk): Removes call to
> > dump_transformation, now in replace_call_with_call_and_fold.
> > (gimple_fold_builtin_stxncpy_chk): Removes call to
> > dump_transformation, now in replace_call_with_call_and_fold.
> >
> > Signed-off-by: Rubin Gerritsen 
> > ---
> >  gcc/gimple-fold.cc | 22 ++
> >  1 file changed, 10 insertions(+), 12 deletions(-)
> >
> > diff --git a/gcc/gimple-fold.cc b/gcc/gimple-fold.cc
> > index 7c534d56bf1..b20d3a2ff9a 100644
> > --- a/gcc/gimple-fold.cc
> > +++ b/gcc/gimple-fold.cc
> > @@ -802,6 +802,15 @@ gimplify_and_update_call_from_tree 
> > (gimple_stmt_iterator *si_p, tree expr)
> >gsi_replace_with_seq_vops (si_p, stmts);
> >  }
> >
> > +/* Print a message in the dump file recording transformation of FROM to 
> > TO.  */
> > +
> > +static void
> > +dump_transformation (gcall *from, gcall *to)
> > +{
> > +  if (dump_enabled_p ())
> > +dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, from, "simplified %T to 
> > %T\n",
> > +  gimple_call_fn (from), gimple_call_fn (to));
> > +}
> >
> >  /* Replace the call at *GSI with the gimple value VAL.  */
> >
> > @@ -835,6 +844,7 @@ static void
> >  replace_call_with_call_and_fold (gimple_stmt_iterator *gsi, gimple *repl)
> >  {
> >gimple *stmt = gsi_stmt (*gsi);
> > +  dump_transformation (as_a  (stmt), as_a  (repl));
> >gimple_call_set_lhs (repl, gimple_call_lhs (stmt));
> >gimple_set_location (repl, gimple_location (stmt));
> >gimple_move_vops (repl, stmt);
> > @@ -3090,16 +3100,6 @@ gimple_fold_builtin_memory_chk (gimple_stmt_iterator 
> > *gsi,
> >return true;
> >  }
> >
> > -/* Print a message in the dump file recording transformation of FROM to 
> > TO.  */
> > -
> > -static void
> > -dump_transformation (gcall *from, gcall *to)
> > -{
> > -  if (dump_enabled_p ())
> > -dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, from, "simplified %T to 
> > %T\n",
> > -  gimple_call_fn (from), gimple_call_fn (to));
> > -}
> > -
> >  /* Fold a call to the __st[rp]cpy_chk builtin.
> > DEST, SRC, and SIZE are the arguments to the call.
> > IGNORE is true if return value can be ignored.  FCODE is the BUILT_IN_*
> > @@ -3189,7 +3189,6 @@ gimple_fold_builtin_stxcpy_chk (gimple_stmt_iterator 
> > *gsi,
> >  return false;
> >
> >gcall *repl = gimple_build_call (fn, 2, dest, src);
> > -  dump_transformation (stmt, repl);
> >replace_call_with_call_and_fold (gsi, repl);
> >return true;
> >  }
> > @@ -3235,7 +3234,6 @@ gimple_fold_builtin_stxncpy_chk (gimple_stmt_iterator 
> > *gsi,
> >  return false;
> >
> >gcall *repl = gimple_build_call (fn, 3, dest, src, len);
> > -  dump_transformation (stmt, repl);
> >replace_call_with_call_and_fold (gsi, repl);
> >return true;
> >  }
> > --
> > 2.34.1
> >


0001-gimple-fold-consistent-dump-of-builtin-call-simplifi.patch
Description: 0001-gimple-fold-consistent-dump-of-builtin-call-simplifi.patch


Re: [PATCH] c++: diagnose failed qualified lookup into current inst

2024-07-17 Thread Jason Merrill

On 7/17/24 1:54 PM, Patrick Palka wrote:

Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
OK for trunk?


OK.


-- >8 --

When the scope of a qualified name is the current instantiation, and
qualified lookup finds nothing at template definition time, then we
know it'll find nothing at instantiation time (unless the current
instantiation has dependent bases).  So such qualified name lookup
failure can be diagnosed ahead of time as per [temp.res.general]/6.

This patch implements that, for qualified names of the form:

   this->non_existent
   a.non_existent
   A::non_existent
   typename A::non_existent

It turns out we already optimistically attempt qualified lookup of
basically every qualified name, even when it's dependently scoped, and
just suppress issuing a lookup failure diagnostic after the fact when
the scope is a dependent type.  So implementing this is mostly a
matter of restricting the diagnostic suppression to "dependentish"
scopes, rather than all dependently typed scopes.

The cp_parser_conversion_function_id change is needed to avoid regressing
lookup/using8.C:

   using A<T>::operator typename A<T>::Nested*;

When resolving A<T>::Nested we consider it not dependently scoped since
we entered A<T> from cp_parser_conversion_function_id earlier.  But this
A<T> is the implicit instantiation A<T>, not the primary template type A<T>,
and so the lookup of Nested fails, which we now diagnose.  This patch works
around this by not entering the template scope of a qualified conversion
function-id in this case, i.e. if we're in an expression vs declaration
context, by seeing if the type already went through finish_template_type
with entering_scope=true.

gcc/cp/ChangeLog:

* decl.cc (make_typename_type): Restrict name lookup failure
punting to dependentish_scope_p instead of dependent_type_p.
* error.cc (qualified_name_lookup_error): Improve diagnostic
when the scope is the current instantiation.
* parser.cc (cp_parser_diagnose_invalid_type_name): Likewise.
(cp_parser_conversion_function_id): Don't call push_scope on
a template scope unless we're in a declaration context.
(cp_parser_lookup_name): Restrict name lookup failure
punting to dependentish_scope_p instead of dependent_type_p.
* semantics.cc (finish_id_expression_1): Likewise.
* typeck.cc (finish_class_member_access_expr): Likewise.

libstdc++-v3/ChangeLog:

* include/experimental/socket
(basic_socket_iostream::basic_socket_iostream): Fix typo.
* include/tr2/dynamic_bitset
(__dynamic_bitset_base::_M_is_proper_subset_of): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/alignas18.C: Expect name lookup error for U::X.
* g++.dg/cpp0x/forw_enum13.C: Expect name lookup error for
D3::A and D4::A.
* g++.dg/parse/access13.C: Declare A::E::V to avoid name lookup
failure and preserve intent of the test.
* g++.dg/parse/enum11.C: Expect extra errors, matching the
non-template case.
* g++.dg/template/crash123.C: Avoid name lookup failure to
preserve intent of the test.
* g++.dg/template/crash124.C: Likewise.
* g++.dg/template/crash7.C: Adjust expected diagnostics.
* g++.dg/template/dtor6.C: Declare A::~A() to avoid name lookup
failure and preserve intent of the test.
* g++.dg/template/error22.C: Adjust expected diagnostics.
* g++.dg/template/static30.C: Avoid name lookup failure to
preserve intent of the test.
* g++.old-deja/g++.other/decl5.C: Adjust expected diagnostics.
* g++.dg/template/non-dependent34.C: New test.
---
  gcc/cp/decl.cc|  2 +-
  gcc/cp/error.cc   |  3 +-
  gcc/cp/parser.cc  | 10 +++--
  gcc/cp/semantics.cc   |  2 +-
  gcc/cp/typeck.cc  |  2 +-
  gcc/testsuite/g++.dg/cpp0x/alignas18.C|  3 +-
  gcc/testsuite/g++.dg/cpp0x/forw_enum13.C  |  6 +--
  gcc/testsuite/g++.dg/parse/access13.C |  1 +
  gcc/testsuite/g++.dg/parse/enum11.C   |  2 +-
  gcc/testsuite/g++.dg/template/crash123.C  |  2 +-
  gcc/testsuite/g++.dg/template/crash124.C  |  4 +-
  gcc/testsuite/g++.dg/template/crash7.C|  6 +--
  gcc/testsuite/g++.dg/template/dtor6.C |  3 +-
  gcc/testsuite/g++.dg/template/error22.C   |  2 +-
  .../g++.dg/template/non-dependent34.C | 44 +++
  gcc/testsuite/g++.dg/template/static30.C  |  4 +-
  gcc/testsuite/g++.old-deja/g++.other/decl5.C  |  2 +-
  libstdc++-v3/include/experimental/socket  |  2 +-
  libstdc++-v3/include/tr2/dynamic_bitset   |  2 +-
  19 files changed, 75 insertions(+), 27 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/template/non-dependent34.C

diff --git a/gcc/cp/decl.cc b/gcc/cp/decl.cc
index e7bb4fa3089..3c5ad554ff2 100644
--- a/gcc/c

Re: [PATCH] c++: missing -Wunused-value for ! [PR114104]

2024-07-17 Thread Patrick Palka
On Tue, 16 Jul 2024, Jason Merrill wrote:

> On 7/16/24 10:31 AM, Eric Gallager wrote:
> > On Mon, Jul 15, 2024 at 10:37 PM Patrick Palka  wrote:
> > > 
Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
> > > OK for trunk?
> > > 
> > > -- >8 --
> > > 
> > > Here we're neglecting to emit a -Wunused-value for eligible ! operator
> > > expressions, and in turn for != operator expressions that are rewritten
> > > as !(x == y), only because we don't call warn_if_unused_value on
> > > TRUTH_NOT_EXPR since its class is tcc_expression.  This patch makes us
> > > consider warning for TRUTH_NOT_EXPR as well.
> > > 
> > 
> > Eh, I think I've seen the ! operator recommended as a way to silence
> > -Wunused-value previously in cases where an actual fix isn't
> > practical; some people might be mad about this...
> 
> That sounds like weird advice to me; I'd expect the result of ! to be used,
> and we already warn if the operand is something other than bool. Clang also
> warns for the bool case.
> 
> Seems like ADDR_EXPR is another tcc_expression that could use handling here,
> e.g.
> 
> struct A { int i; };
> A& f();
> int main() {
>   &f().i; // missed warning
> }

Makes sense, like so?  Bootstrapped and regtested on x86_64-pc-linux-gnu

-- >8 --

Subject: [PATCH] c++: missing -Wunused-value for ! [PR114104]

Here we're neglecting to emit a -Wunused-value for eligible ! operator
expressions, and in turn for != operator expressions that are rewritten
as !(x == y), only because we don't call warn_if_unused_value on
TRUTH_NOT_EXPR since its class is tcc_expression.  This patch makes us
consider warning for TRUTH_NOT_EXPR as well, along with ADDR_EXPR.

PR c++/114104

gcc/cp/ChangeLog:

* cvt.cc (convert_to_void): Call warn_if_unused_value for
TRUTH_NOT_EXPR and ADDR_EXPR as well.

gcc/testsuite/ChangeLog:

* g++.dg/warn/Wunused-20.C: New test.
---
 gcc/cp/cvt.cc  |  2 ++
 gcc/testsuite/g++.dg/warn/Wunused-20.C | 19 +++
 2 files changed, 21 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/warn/Wunused-20.C

diff --git a/gcc/cp/cvt.cc b/gcc/cp/cvt.cc
index db086c017e8..d95e01c118c 100644
--- a/gcc/cp/cvt.cc
+++ b/gcc/cp/cvt.cc
@@ -1664,6 +1664,8 @@ convert_to_void (tree expr, impl_conv_void implicit, 
tsubst_flags_t complain)
  if (tclass == tcc_comparison
  || tclass == tcc_unary
  || tclass == tcc_binary
+ || code == TRUTH_NOT_EXPR
+ || code == ADDR_EXPR
  || code == VEC_PERM_EXPR
  || code == VEC_COND_EXPR)
warn_if_unused_value (e, loc);
diff --git a/gcc/testsuite/g++.dg/warn/Wunused-20.C 
b/gcc/testsuite/g++.dg/warn/Wunused-20.C
new file mode 100644
index 000..31b1920adcd
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Wunused-20.C
@@ -0,0 +1,19 @@
+// PR c++/114104
+// { dg-additional-options "-Wunused-value" }
+
+bool f();
+struct A { int i; };
+A& g();
+
+void h() {
+  !f(); // { dg-warning "value computed is not used" }
+  &g().i; // { dg-warning "value computed is not used" }
+}
+
+#if  __cplusplus >= 202002L
+[[nodiscard]] bool operator==(A&, int);
+
+void h(A a) {
+  a != 0; // { dg-warning "value computed is not used" "" { target c++20 } }
+}
+#endif
-- 
2.46.0.rc0.75.g04f5a52757


Re: [PATCH] c++: Fix ICE on valid involving variadic constructor [PR111592]

2024-07-17 Thread Jason Merrill

On 7/9/24 12:13 PM, Simon Martin wrote:

We currently ICE upon the following valid code, due to the fix made through
commit 9efe5fbde1e8


OK.


=== cut here ===
struct ignore { ignore(...) {} };
template
void InternalCompilerError(Args... args)
{ ignore{ ignore(args) ... }; }
int main() { InternalCompilerError(0, 0); }
=== cut here ===

Change 9efe5fbde1e8 avoids infinite recursion in build_over_call by returning
error_mark_node if one invokes ignore::ignore(...) with an argument of type
ignore, because otherwise we end up calling convert_arg_to_ellipsis for that
argument, and recurse into build_over_call with the exact same parameters.

This patch tightens the condition to only return error_mark_node if there's one
and only one parameter to the call being processed - otherwise we won't
infinitely recurse.

Successfully tested on x86_64-pc-linux-gnu.

PR c++/111592

gcc/cp/ChangeLog:

* call.cc (build_over_call): Only error out if there's a single
parameter of type A in a call to A::A(...).

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/variadic186.C: New test.

---
  gcc/cp/call.cc   |  1 +
  gcc/testsuite/g++.dg/cpp0x/variadic186.C | 11 +++
  2 files changed, 12 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/cpp0x/variadic186.C

diff --git a/gcc/cp/call.cc b/gcc/cp/call.cc
index 83070b2f633..0e4e9a80408 100644
--- a/gcc/cp/call.cc
+++ b/gcc/cp/call.cc
@@ -10335,6 +10335,7 @@ build_over_call (struct z_candidate *cand, int flags, 
tsubst_flags_t complain)
  a = decay_conversion (a, complain);
}
else if (DECL_CONSTRUCTOR_P (fn)
+  && vec_safe_length (args) == 1
   && same_type_ignoring_top_level_qualifiers_p (DECL_CONTEXT (fn),
 TREE_TYPE (a)))
{
diff --git a/gcc/testsuite/g++.dg/cpp0x/variadic186.C 
b/gcc/testsuite/g++.dg/cpp0x/variadic186.C
new file mode 100644
index 000..4a25a1a96bf
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/variadic186.C
@@ -0,0 +1,11 @@
+// PR c++/111592
+// { dg-do compile { target c++11 } }
+
+struct ignore { ignore(...) {} };
+
+template
+void InternalCompilerError(Args... args)
+{ ignore{ ignore(args) ... }; }
+
+int main()
+{ InternalCompilerError(0, 0); }




Re: [PATCH] c-family: Introduce the -Winvalid-noreturn flag from clang with extra tuneability

2024-07-17 Thread Jason Merrill

On 7/9/24 6:46 AM, Julian Waters wrote:

Hi Jason,

Sorry for the long period of radio silence, but I'm finally done with
all my university coursework. The main issue I see here is how to
process the Winvalid-noreturn= entry. Can I define it in c.opt, process
it in c_common_handle_option, and store whether the -W or -Wno form was
used in a bool somewhere? If I can do that somehow, and then access the
bool at the callsites where it is required, I can figure out the rest
on my own.


You should be able to handle setting the variable entirely in c.opt, 
following the pattern of -fstrong-eval-order.  The variable is defined 
by the Var(...) notation in c.opt, and the opts code handles setting it.



On Tue, Jun 11, 2024 at 10:26 AM Jason Merrill  wrote:


On 6/10/24 03:13, Julian Waters wrote:

Hi Jason,

Thanks for the reply. I'm a little bit overwhelmed with university at
the moment, would it be ok if I delay implementing this a little bit?


Sure, we're still early in GCC 15 development, no time pressure.


On Tue, Jun 4, 2024 at 1:04 AM Jason Merrill  wrote:


On 6/1/24 11:31, Julian Waters wrote:

Hi Jason,

Thanks for the reply! I'll address your comments soon. I have a
question, if there is an option defined in c.opt as an Enum, like
fstrong-eval-order, and the -no variant of the option is passed, would
the Var somehow reflect the negated option? Eg

Winvalid-noreturn=
C ObjC C++ ObjC++ Var(warn_invalid_noreturn) Joined
Enum(invalid_noreturn) Warning

Enum
Name(invalid_noreturn) Type(int)

EnumValue
Enum(invalid_noreturn) String(explicit) Value(0)


-fstrong-eval-order has

fstrong-eval-order
C++ ObjC++ Common Alias(fstrong-eval-order=, all, none)

to represent that plain -fstrong-eval-order is equivalent to
-fstrong-eval-order=all, and -fno-strong-eval-order is equivalent to =none.
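Following that pattern, a hypothetical sketch of what the c.opt entries could
look like (the option spellings, enum name, and values here are assumptions
for illustration, not a committed form):

```
; Hypothetical sketch only -- names and values are assumptions.
Winvalid-noreturn
C ObjC C++ ObjC++ Warning Alias(Winvalid-noreturn=, all, none)

Winvalid-noreturn=
C ObjC C++ ObjC++ Joined RejectNegative Var(warn_invalid_noreturn) Enum(invalid_noreturn) Warning

Enum
Name(invalid_noreturn) Type(int)

EnumValue
Enum(invalid_noreturn) String(none) Value(0)

EnumValue
Enum(invalid_noreturn) String(explicit) Value(1)

EnumValue
Enum(invalid_noreturn) String(implicit) Value(2)

EnumValue
Enum(invalid_noreturn) String(all) Value(3)
```

With such an Alias, plain -Winvalid-noreturn would map to =all and
-Wno-invalid-noreturn to =none, while RejectNegative on the = form keeps the
driver from accepting spellings like -Wno-invalid-noreturn=explicit.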


Would warn_invalid_noreturn then != 0 if
-Wno-invalid-noreturn=explicit is passed? Or is there a way to make a
warning call depend on 2 different OPT_ entries?


Typically = options will specify RejectNegative so the driver will
reject e.g. -Wno-invalid-noreturn=explicit.

Jason


best regards,
Julian

On Sat, Jun 1, 2024 at 4:57 AM Jason Merrill  wrote:


On 5/29/24 09:58, Julian Waters wrote:

Currently, gcc warns about noreturn marked functions that return both 
explicitly and implicitly, with no way to turn this warning off. clang does 
have an option for these classes of warnings, -Winvalid-noreturn. However, we 
can do better. Instead of just having 1 option that switches the warnings for 
both on and off, we can define an extra layer of granularity and have
separate options for explicit and implicit returns, as in
-Winvalid-noreturn=explicit and -Winvalid-noreturn=implicit. This patch adds
both to gcc, for compatibility with clang.


Thanks!


Do note that I am relatively new to gcc's codebase, and as such couldn't figure
out how to cleanly define a general -Winvalid-noreturn warning that switches
both on and off, for better compatibility with clang. If someone could point
out how to do so, I'll happily rewrite my patch.


See -fstrong-eval-order for an example of an option that can be used
with or without =arg.


I also do not have write access to gcc, and will need help pushing this patch 
once the green light is given


Good to know, I can take care of that.


best regards,
Julian

gcc/c-family/ChangeLog:

 * c.opt: Introduce -Winvalid-noreturn=explicit and 
-Winvalid-noreturn=implicit

gcc/ChangeLog:

 * tree-cfg.cc (pass_warn_function_return::execute): Use it

gcc/c/ChangeLog:

 * c-typeck.cc (c_finish_return): Use it
 * gimple-parser.cc (c_finish_gimple_return): Use it

gcc/config/mingw/ChangeLog:

 * mingw32.h (EXTRA_OS_CPP_BUILTINS): Fix semicolons

gcc/cp/ChangeLog:

 * coroutines.cc (finish_co_return_stmt): Use it
 * typeck.cc (check_return_expr): Use it

gcc/doc/ChangeLog:

 * invoke.texi: Document new options

From 4daf884f8bbc1e318ba93121a6fdf4139da80b64 Mon Sep 17 00:00:00 2001
From: TheShermanTanker 
Date: Wed, 29 May 2024 21:32:08 +0800
Subject: [PATCH] Introduce the -Winvalid-noreturn flag from clang with extra
 tuneability


The rationale and ChangeLog entries should be part of the commit message
(and so the git format-patch output).



Signed-off-by: TheShermanTanker 


A DCO sign-off can't use a pseudonym, sorry; please either sign off
using your real name or file a copyright assignment for the pseudonym
with the FSF.

See https://gcc.gnu.org/contribute.html#legal for more detail.


---
 gcc/c-family/c.opt |  8 
 gcc/c/c-typeck.cc  |  2 +-
 gcc/c/gimple-parser.cc |  2 +-
 gcc/config/mingw/mingw32.h |  6 +++---
 gcc/cp/coroutines.cc   |  2 +-
 gcc/cp/typeck.cc   |  2 +-
 gcc/doc/invoke.texi| 13 +
 gcc/tree-cfg.cc|  2 +-
 8 files changed, 29 insertions(+), 8 deletions(-)

diff --git a/gcc/c-famil

Re: [PATCH] c++: missing -Wunused-value for ! [PR114104]

2024-07-17 Thread Jason Merrill

On 7/17/24 3:20 PM, Patrick Palka wrote:

On Tue, 16 Jul 2024, Jason Merrill wrote:


On 7/16/24 10:31 AM, Eric Gallager wrote:

On Mon, Jul 15, 2024 at 10:37 PM Patrick Palka  wrote:


Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look
OK for trunk?

-- >8 --

Here we're neglecting to emit a -Wunused-value for eligible ! operator
expressions, and in turn for != operator expressions that are rewritten
as !(x == y), only because we don't call warn_if_unused_value on
TRUTH_NOT_EXPR since its class is tcc_expression.  This patch makes us
consider warning for TRUTH_NOT_EXPR as well.



Eh, I think I've seen the ! operator recommended as a way to silence
-Wunused-value previously in cases where an actual fix isn't
practical; some people might be mad about this...


That sounds like weird advice to me; I'd expect the result of ! to be used,
and we already warn if the operand is something other than bool. Clang also
warns for the bool case.

Seems like ADDR_EXPR is another tcc_expression that could use handling here,
e.g.

struct A { int i; };
A& f();
int main() {
   &f().i; // missed warning
}


Makes sense, like so?  Bootstrapped and regtested on x86_64-pc-linux-gnu


OK.


-- >8 --

Subject: [PATCH] c++: missing -Wunused-value for ! [PR114104]

Here we're neglecting to emit a -Wunused-value for eligible ! operator
expressions, and in turn for != operator expressions that are rewritten
as !(x == y), only because we don't call warn_if_unused_value on
TRUTH_NOT_EXPR since its class is tcc_expression.  This patch makes us
consider warning for TRUTH_NOT_EXPR as well, along with ADDR_EXPR.

PR c++/114104

gcc/cp/ChangeLog:

* cvt.cc (convert_to_void): Call warn_if_unused_value for
TRUTH_NOT_EXPR and ADDR_EXPR as well.

gcc/testsuite/ChangeLog:

* g++.dg/warn/Wunused-20.C: New test.
---
  gcc/cp/cvt.cc  |  2 ++
  gcc/testsuite/g++.dg/warn/Wunused-20.C | 19 +++
  2 files changed, 21 insertions(+)
  create mode 100644 gcc/testsuite/g++.dg/warn/Wunused-20.C

diff --git a/gcc/cp/cvt.cc b/gcc/cp/cvt.cc
index db086c017e8..d95e01c118c 100644
--- a/gcc/cp/cvt.cc
+++ b/gcc/cp/cvt.cc
@@ -1664,6 +1664,8 @@ convert_to_void (tree expr, impl_conv_void implicit, tsubst_flags_t complain)
  if (tclass == tcc_comparison
  || tclass == tcc_unary
  || tclass == tcc_binary
+ || code == TRUTH_NOT_EXPR
+ || code == ADDR_EXPR
  || code == VEC_PERM_EXPR
  || code == VEC_COND_EXPR)
warn_if_unused_value (e, loc);
diff --git a/gcc/testsuite/g++.dg/warn/Wunused-20.C b/gcc/testsuite/g++.dg/warn/Wunused-20.C
new file mode 100644
index 000..31b1920adcd
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Wunused-20.C
@@ -0,0 +1,19 @@
+// PR c++/114104
+// { dg-additional-options "-Wunused-value" }
+
+bool f();
+struct A { int i; };
+A& g();
+
+void h() {
+  !f(); // { dg-warning "value computed is not used" }
+  &g().i; // { dg-warning "value computed is not used" }
+}
+
+#if  __cplusplus >= 202002L
+[[nodiscard]] bool operator==(A&, int);
+
+void h(A a) {
+  a != 0; // { dg-warning "value computed is not used" "" { target c++20 } }
+}
+#endif



