[Bug c/114088] Please provide __builtin_c16slen and __builtin_c32slen to complement __builtin_wcslenw

2024-02-24 Thread thiago at kde dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114088

--- Comment #3 from Thiago Macieira  ---
> But __builtin_strlen *does* get optimized when the input is a string literal. 
>  Not sure about wcslen though.

It appears not to, in the test above. std::char_traits::length() calls
wcslen() whereas the char specialisation uses __builtin_strlen() explicitly.
But if the intrinsics are enabled, the two would be the same, wouldn't they?

Anyway, in the absence of a library function to call, inserting the loop is
fine; it's what is there already.

Though it would be nice to be able to provide such a function. I wrote it for
Qt (it's called qustrlen). I would try with __builtin_constant_p first to see
if the string is a literal.
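
Not Qt's actual qustrlen, but a minimal sketch of that __builtin_constant_p
dispatch idea, with made-up names:

```c
#include <stddef.h>
#include <uchar.h>

/* Plain loop; when the argument is a string literal, the compiler has a
   chance to evaluate this at compile time once it is inlined.  */
static size_t c16len_loop(const char16_t *s)
{
  size_t n = 0;
  while (s[n])
    ++n;
  return n;
}

static inline size_t my_c16slen(const char16_t *s)
{
  /* Literal contents visible at compile time: let the loop fold away.
     Otherwise a real library would dispatch to its vectorized runtime
     routine here; this sketch just reuses the plain loop.  */
  if (__builtin_constant_p(s[0]))
    return c16len_loop(s);
  return c16len_loop(s);
}

size_t five(void) { return my_c16slen(u"Hello"); }
```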

[Bug c/114088] Please provide __builtin_c16slen and __builtin_c32slen to complement __builtin_wcslenw

2024-02-24 Thread xry111 at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114088

Xi Ruoyao  changed:

   What|Removed |Added

 CC||xry111 at gcc dot gnu.org

--- Comment #2 from Xi Ruoyao  ---
(In reply to Jonathan Wakely from comment #1)
> GCC built-ins like __builtin_strlen just wrap a libc function. 
> __builtin_wcslen would generally just be a call to wcslen, which doesn't give 
> you much.

But __builtin_strlen *does* get optimized when the input is a string literal. 
Not sure about wcslen though.

[Bug target/100799] Stackoverflow in optimized code on PPC

2024-02-24 Thread bergner at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100799

--- Comment #27 from Peter Bergner  ---
(In reply to Jakub Jelinek from comment #26)
> But I still think the workaround is possible on the callee side.
> Sure, if the DECL_HIDDEN_STRING_LENGTH argument(s) is(are) used in the
> function, then there is no easy way but expect the parameter save area (ok,
> sure, it could just load from the assumed parameter location and don't
> assume the rest is there, nor allow storing to the slots it loaded them
> from).
> But that is actually not what BLAS etc. suffers from.
[snip]
> So, the workaround could be for the case of unused DECL_HIDDEN_STRING_LENGTH
> arguments at the end of PARM_DECLs don't try to load those at all and don't
> assume there is parameter save area unless the non-DECL_HIDDEN_STRING_LENGTH
> or used DECL_HIDDEN_STRING_LENGTH arguments actually require it.
So I looked closer at what the failure mode was in this PR (versus the one
you're seeing with flexiblas).  As in your case, there is a mismatch in the
number of parameters the C caller thinks there are (8 args, so no param save
area needed) versus what the Fortran callee thinks there are (9 params which
include the one hidden arg, so there is a param save area).  The Fortran
function doesn't actually access the hidden argument in our test case above, in
fact the character argument is never used either.  What I see in the rtl dumps
is that *all* incoming args have a REG_EQUIV generated that points to the param
save area (this doesn't happen when there are 8 or fewer formal params), even
for the first 8 args that are passed in registers:

(insn 2 12 3 2 (set (reg/v/f:DI 117 [ r3 ])
(reg:DI 3 3 [ r3 ])) "callee-3.c":6:1 685 {*movdi_internal64}
 (expr_list:REG_EQUIV (mem/f/c:DI (plus:DI (reg/f:DI 99 ap)
(const_int 32 [0x20])) [1 r3+0 S8 A64])
(nil)))
(insn 3 2 4 2 (set (reg/v:DI 118 [ r4 ])
(reg:DI 4 4 [ r4 ])) "callee-3.c":6:1 685 {*movdi_internal64}
 (expr_list:REG_EQUIV (mem/c:DI (plus:DI (reg/f:DI 99 ap)
(const_int 40 [0x28])) [2 r4+0 S8 A64])
(nil)))
...

We then get to RA and we end up spilling one of the pseudos associated with one
of the other parameters (not the character param JOB).  LRA then uses that
REG_EQUIV note and rather than allocating a new stack slot to spill to, it uses
the parameter save memory location for that parameter for the spill slot.  When
we store to that memory location and the C caller has not allocated the param
save area, we end up clobbering an important part of the C caller's stack,
causing a crash.

If we were to try and do a callee workaround, we would need to disable setting
those REG_EQUIV notes for the parameters... if that's even possible.  Since
Fortran uses call-by-name parameter passing, isn't the updated param value from
the callee returned in the parameter save area itself???


> Doing the workaround on the caller side is impossible, this is for calls
> from C/C++ to Fortran code, directly or indirectly called and there is
> nothing the compiler could use to guess that it actually calls Fortran code
> with hidden Fortran character arguments.
As a HUGE hammer, every caller could always allocate a param save area.  That
would "fix" the problem from this bug, but would that also fix the bug you're
seeing in flexiblas?

I'm not advocating this though.  I was thinking maybe making callers (under an
option?) conservatively assume the callee is a Fortran function and for those C
arguments that could map to a Fortran parameter with a hidden argument, bump
the number of counted args by 1.  For example, a C function with 2 char/char *
args and 6 int args would think there are 8 normal args and 2 hidden args, so
it needs to allocate a param save area.  Is that not feasible?  ...or does that
not even address the issue you're seeing in your bug?
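
For concreteness, a sketch (with made-up names, not the testcase from this PR)
of the BLAS-style call the heuristic would have to cover:

```c
/* The C caller sees 8 arguments, so it does not think it needs a
   parameter save area.  If foo_ is really a Fortran routine taking two
   CHARACTER arguments, the callee expects two extra hidden string-length
   arguments (10 in total) and may assume the save area is there.  The
   idea above is to count those two potential hidden arguments on the
   caller side and allocate the area just in case.  */
extern void foo_(const char *transa, const char *transb,
                 int *m, int *n, int *k,
                 int *lda, int *ldb, int *ldc
                 /* , long transa_len, long transb_len  -- hidden args the
                    Fortran callee may expect */);

void call_foo(int *m, int *n, int *k, int *lda, int *ldb, int *ldc)
{
  foo_("N", "T", m, n, k, lda, ldb, ldc);
}
```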

gcc-13-20240224 is now available

2024-02-24 Thread GCC Administrator via Gcc
Snapshot gcc-13-20240224 is now available on
  https://gcc.gnu.org/pub/gcc/snapshots/13-20240224/
and on various mirrors, see https://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 13 git branch
with the following options: git://gcc.gnu.org/git/gcc.git branch 
releases/gcc-13 revision acafe0f9824e77f1259de1e833886003bf8a6864

You'll find:

 gcc-13-20240224.tar.xz   Complete GCC

  SHA256=3a5aa2c45d30efbe96872d92df85ba26a9c58f0c823cc2867569f06cd606a88f
  SHA1=95e21fd541e0c5b696f2d02c98e60f0dab460246

Diffs from 13-20240217 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-13
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


[Bug tree-optimization/114093] New: Canonicalization of `a == -1 || a == 0`

2024-02-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114093

Bug ID: 114093
   Summary: Canonicalization of `a == -1 || a == 0`
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: enhancement
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pinskia at gcc dot gnu.org
  Target Milestone: ---

Take:
```
_Bool f1(int a)
{
return a == -1 || a == 0;
}

_Bool f0(signed a)
{
a = -a;
return a == 1 || a == 0;
}


_Bool f(unsigned a)
{
return a == -1u || a == 0;
}

_Bool f3(unsigned a)
{
a = -a;
return a == 1 || a == 0;
}


_Bool f2(unsigned a)
{
return (-a) <= 1;
}
```

These all should produce the exact same code as they are all equivalent (if we
ignore the (undefined) overflow possibility for f0).

This is more about canonicalizations rather than anything else.

Though I will note that on the riscv and mips targets, f is worse than the others.

LLVM's canonical form seems to be `((unsigned)a) + 1 <= 1`.
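
Written out as another variant of the testcase, that form would be (just a
sketch for comparison, not something GCC emits today):

```c
_Bool f4(unsigned a)
{
  /* a + 1 wraps to 0 for a == -1u, so this is true exactly for
     a == -1u and a == 0, matching f/f2/f3 above.  */
  return a + 1 <= 1;
}
```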

[Bug tree-optimization/114092] ADD_OVERFLOW with resulting type of `_Complex unsigned:1` should be reduced to just `(unsigned)(a) <= 1`

2024-02-24 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114092

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #2 from Jakub Jelinek  ---
Guess all of .ADD_OVERFLOW (x, 0), .ADD_OVERFLOW (0, x) and .SUB_OVERFLOW (x, 0)
could be folded, the REALPART_EXPR to (type) x and the IMAGPART_EXPR to
(type) x != x.  Just need to
figure out for which types it is beneficial and for which it isn't.
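
In C terms, a rough sketch of what that fold means for the unsigned:1 case
from the testcase (illustrative only, using the struct from comment #0):

```c
struct d { unsigned i : 1; };

/* Hand-folded version of __builtin_add_overflow_p(a, 0, b.i): the real
   part of the result is just a truncated to the 1-bit type, and the
   overflow flag is whether that truncation changed the value.  */
_Bool add0_overflows(int a)
{
  struct d t;
  t.i = a;           /* REALPART_EXPR = (type) x        */
  return t.i != a;   /* IMAGPART_EXPR = ((type) x != x) */
}
```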

[Bug tree-optimization/114092] ADD_OVERFLOW with resulting type of `_Complex unsigned:1` should be reduced to just `(unsigned)(a) <= 1`

2024-02-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114092

--- Comment #1 from Andrew Pinski  ---
I should note that LLVM (LLVM does not have __builtin_add_overflow_p) is able
to optimize:
```
_Bool f2(int a, struct d b, unsigned _BitInt(1) t)
{
return __builtin_add_overflow(a, 0, &t);
}
```
into  f1.

[Bug tree-optimization/114092] New: ADD_OVERFLOW with resulting type of `_Complex unsigned:1` should be reduced to just `(unsigned)(a) <= 1`

2024-02-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114092

Bug ID: 114092
   Summary: ADD_OVERFLOW with resulting type of `_Complex
unsigned:1` should be reduced to just `(unsigned)(a)
<= 1`
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Keywords: missed-optimization
  Severity: enhancement
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pinskia at gcc dot gnu.org
  Target Milestone: ---

Take:
```
struct d
{
  unsigned i:1;
};
_Bool f(int a, struct d b)
{
return __builtin_add_overflow_p(a, 0, b.i);
}


_Bool f1(int a, struct d b)
{
return a != 1 && a != 0;
}
```

These 2 functions should produce the same code. Here `a+0` overflows an `unsigned:1`
if the value of a is not 0 or 1.

We could extend this to any smaller types too if we want.

[Bug target/114091] gcc/config/aarch64/aarch64.cc has code requiring c++14 instead of c++11, so g++14 bootstrap fails in my example context

2024-02-24 Thread markmigm at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114091

--- Comment #2 from Mark Millard  ---
(In reply to Andrew Pinski from comment #1)
> This has already been fixed, over 2 weeks ago.
> 
> >20240114
> 
> You are using a GCC 14 snapshot from a month ago even. Please try a newer
> snapshot before reporting a bug next time.
> 
> *** This bug has been marked as a duplicate of bug 113763 ***

Sorry. I was building a FreeBSD port and I'm not a port maintainer, much
less one for FreeBSD's lang/gcc14-devel .

I've sent the port maintainer a copy of your reply. Thanks.

[Bug target/113763] [14 Regression] build fails with clang++ host compiler because aarch64.cc uses C++14 constexpr.

2024-02-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113763

Andrew Pinski  changed:

   What|Removed |Added

 CC||markmigm at gmail dot com

--- Comment #19 from Andrew Pinski  ---
*** Bug 114091 has been marked as a duplicate of this bug. ***

[Bug target/114091] gcc/config/aarch64/aarch64.cc has code requiring c++14 instead of c++11, so g++14 bootstrap fails in my example context

2024-02-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114091

Andrew Pinski  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |DUPLICATE

--- Comment #1 from Andrew Pinski  ---
This has already been fixed, over 2 weeks ago.

>20240114

You are using a GCC 14 snapshot from a month ago even. Please try a newer
snapshot before reporting a bug next time.

*** This bug has been marked as a duplicate of bug 113763 ***

[Bug c++/114091] New: gcc/config/aarch64/aarch64.cc has code requiring c++14 instead of c++11, so g++14 bootstrap fails in my example context

2024-02-24 Thread markmigm at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114091

Bug ID: 114091
   Summary: gcc/config/aarch64/aarch64.cc has code requiring c++14
instead of c++11, so g++14 bootstrap fails in my
example context
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: markmigm at gmail dot com
  Target Milestone: ---

[I'm not sure where gcc/config/aarch64/aarch64.cc fits in the
component alternatives. Feel free to correct that if I got it
wrong.] 

gcc bootstrap is based on C++11, which predates std::pair's constructors
being constexpr. The gcc/config/aarch64/aarch64.cc specific code can fail
because it uses pair constructors where constant expressions are required:

/wrkdirs/usr/ports/lang/gcc14-devel/work/gcc-14-20240114/gcc/config/aarch64/aarch64.cc:13095:50:
error: constexpr variable 'tiles' must be initialized by a constant expression
static constexpr std::pair tiles[] = {
^ ~
/wrkdirs/usr/ports/lang/gcc14-devel/work/gcc-14-20240114/gcc/config/aarch64/aarch64.cc:13096:5:
note: non-constexpr constructor 'pair' cannot be used in a
constant expression
{ 0xff, 'b' },
^

This stops the bootstrap in the example context.

This is detected when clang is doing the bootstrapping on FreeBSD.
For reference:

c++ -std=c++11  -fPIC -c   -g -DIN_GCC   -fno-strict-aliasing -fno-exceptions
-fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings
-Wcast-qual -Wno-format -Wmissing-format-attribute -Wconditionally-supported
-Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros
-Wno-overlength-strings -fno-common  -DHAVE_CONFIG_H -fPIC -I. -I.
-I/wrkdirs/usr/ports/lang/gcc14-devel/work/gcc-14-20240114/gcc
-I/wrkdirs/usr/ports/lang/gcc14-devel/work/gcc-14-20240114/gcc/.
-I/wrkdirs/usr/ports/lang/gcc14-devel/work/gcc-14-20240114/gcc/../include 
-I/wrkdirs/usr/ports/lang/gcc14-devel/work/gcc-14-20240114/gcc/../libcpp/include
-I/wrkdirs/usr/ports/lang/gcc14-devel/work/gcc-14-20240114/gcc/../libcody
-I/usr/local/include 
-I/wrkdirs/usr/ports/lang/gcc14-devel/work/gcc-14-20240114/gcc/../libdecnumber
-I/wrkdirs/usr/ports/lang/gcc14-devel/work/gcc-14-20240114/gcc/../libdecnumber/bid
-I../libdecnumber
-I/wrkdirs/usr/ports/lang/gcc14-devel/work/gcc-14-20240114/gcc/../libbacktrace 
-DLIBICONV_PLUG -o aarch64.o -MT aarch64.o -MMD -MP -MF ./.deps/aarch64.TPo
/wrkdirs/usr/ports/lang/gcc14-devel/work/gcc-14-20240114/gcc/config/aarch64/aarch64.cc

where clang used its libc++ (it's a FreeBSD context):

/usr/include/c++/v1/__utility/pair.h:225:5: note: declared here
  225 | pair(_U1&& __u1, _U2&& __u2)
  | ^

having -std=c++11 on the command line. That results in a lack of
constexpr status in libc++.

It would appear that, until the gcc bootstrap is intended to be based on
C++14 (or later), the gcc/config/aarch64/aarch64.cc code reported here
is presuming a post-C++11 context when it should not be.

New Chinese (simplified) PO file for 'gcc' (version 14.1-b20240218)

2024-02-24 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the Chinese (simplified) team of translators.  The file is available at:

https://translationproject.org/latest/gcc/zh_CN.po

(This file, 'gcc-14.1-b20240218.zh_CN.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

https://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

https://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.




[Bug tree-optimization/114090] [13/14 Regression] forwprop -fwrapv miscompilation

2024-02-24 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114090

Jakub Jelinek  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |jakub at gcc dot gnu.org
 Status|NEW |ASSIGNED

--- Comment #6 from Jakub Jelinek  ---
Created attachment 57521
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57521&action=edit
gcc14-pr114090.patch

Full untested patch.

[Bug c/114088] Please provide __builtin_c16slen and __builtin_c32slen to complement __builtin_wcslenw

2024-02-24 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114088

--- Comment #1 from Jonathan Wakely  ---
GCC built-ins like __builtin_strlen just wrap a libc function. __builtin_wcslen
would generally just be a call to wcslen, which doesn't give you much. I assume
what you want is to recognize wcslen and replace it with inline assembly code.

Similarly, if libc doesn't provide c16slen then a __builtin_c16slen isn't going
to do much.

I think what you want is better code for finding char16_t(0) or char32_t(0),
not a new built-in.

[Bug tree-optimization/114090] [13/14 Regression] forwprop -fwrapv miscompilation

2024-02-24 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114090

Jakub Jelinek  changed:

   What|Removed |Added

   Priority|P3  |P2

--- Comment #5 from Jakub Jelinek  ---
&& !TYPE_OVERFLOW_SANITIZED (type) is IMHO not needed, because both
transformations for
INT_MIN trigger UB before and after.

[Bug fortran/66499] Letters with accents change format behavior for X and T descriptors.

2024-02-24 Thread jvdelisle at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66499

--- Comment #7 from Jerry DeLisle  ---
There are two issues going on here. We do not interpret source code that is UTF-8
encoded.  This is why in our current tests for UTF-8 encoding of data files we
use hexadecimal codes.

I will have to see what the standard says about non-ASCII character sets in
source code.

If I get around this by using something like this:

char1 = 4_"Test without local char"
char2 = 4_"Test with local char "

char2(22:22) = 4_"Ã"
char2(23:23) = 4_"Ã"

$ ./a.out 
  23
  23
1234567890123456789012345678901234567890
  Test without local char  10.
  Test with local char ÃÃ10.

The string lengths now match correctly.  One can see the tabbing is still off. 
This is because the format buffer seek functions are byte oriented and when
using UTF-8 encoding we need to seek the buffer differently. In fact we have to
allocate it differently as well to maintain the four byte characters.

[Bug tree-optimization/114090] [13/14 Regression] forwprop -fwrapv miscompilation

2024-02-24 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114090

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #4 from Jakub Jelinek  ---
I'd go with
--- gcc/match.pd.jj 2024-02-22 10:09:48.678446435 +0100
+++ gcc/match.pd2024-02-24 19:23:32.201014245 +0100
@@ -453,8 +453,9 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)

 /* (x >= 0 ? x : 0) + (x <= 0 ? -x : 0) -> abs x.  */
 (simplify
-  (plus:c (max @0 integer_zerop) (max (negate @0) integer_zerop))
-  (abs @0))
+ (plus:c (max @0 integer_zerop) (max (negate @0) integer_zerop))
+ (if (ANY_INTEGRAL_TYPE_P (type) && TYPE_OVERFLOW_UNDEFINED (type))
+  (abs @0)))

 /* X * 1, X / 1 -> X.  */
 (for op (mult trunc_div ceil_div floor_div round_div exact_div)
@@ -4218,8 +4219,9 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)

 /* (x <= 0 ? -x : 0) -> max(-x, 0).  */
 (simplify
-  (cond (le @0 integer_zerop@1) (negate@2 @0) integer_zerop@1)
-  (max @2 @1))
+ (cond (le @0 integer_zerop@1) (negate@2 @0) integer_zerop@1)
+ (if (ANY_INTEGRAL_TYPE_P (type) && TYPE_OVERFLOW_UNDEFINED (type))
+  (max @2 @1)))

 /* (zero_one == 0) ? y : z  y -> ((typeof(y))zero_one * z)  y */
 (for op (bit_xor bit_ior plus)

[Bug tree-optimization/114090] [13/14 Regression] forwprop -fwrapv miscompilation

2024-02-24 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114090

--- Comment #3 from Jakub Jelinek  ---
Both the patterns look wrong for TYPE_OVERFLOW_WRAPS and the first one also for
TYPE_UNSIGNED (the second one is ok for TYPE_UNSIGNED but doesn't make much
sense there, we should have folded it to 0).  Of course, the first one is
unlikely to trigger for TYPE_UNSIGNED because MAX should have
been folded to 0.

[Bug tree-optimization/114090] [13/14 Regression] forwprop -fwrapv miscompilation

2024-02-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114090

Andrew Pinski  changed:

   What|Removed |Added

   Last reconfirmed||2024-02-24
 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1

--- Comment #2 from Andrew Pinski  ---
There most likely should be this check added:
ANY_INTEGRAL_TYPE_P (type) && !TYPE_OVERFLOW_WRAPS (type)

[Bug tree-optimization/114090] [13/14 Regression] forwprop -fwrapv miscompilation

2024-02-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114090

Andrew Pinski  changed:

   What|Removed |Added

   See Also||https://gcc.gnu.org/bugzill
   ||a/show_bug.cgi?id=94920
   Keywords||wrong-code
   Target Milestone|--- |13.3
Summary|forwprop -fwrapv|[13/14 Regression] forwprop
   |miscompilation  |-fwrapv miscompilation

--- Comment #1 from Andrew Pinski  ---
The pattern:
/* (x >= 0 ? x : 0) + (x <= 0 ? -x : 0) -> abs x.  */
(simplify
  (plus:c (max @0 integer_zerop) (max (negate @0) integer_zerop))
  (abs @0))

introduced by r13-1785-g633e9920589ddf .

[Bug tree-optimization/114090] New: forwprop -fwrapv miscompilation

2024-02-24 Thread kristerw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114090

Bug ID: 114090
   Summary: forwprop -fwrapv miscompilation
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: kristerw at gcc dot gnu.org
  Target Milestone: ---

The function f below returns an incorrect result for INT_MIN when compiled with
-O1 -fwrapv for X86_64:


__attribute__((noipa)) int f(int x) {
int w = (x >= 0 ? x : 0);
int y = -x;
int z = (y >= 0 ? y : 0);
return w + z;
}

int
main ()
{
  if (f(0x80000000) != 0)
__builtin_abort ();
  return 0;
}


What is happening is that forwprop has optimized

  w_2 = MAX_EXPR <x_1(D), 0>;
  y_3 = -x_1(D);
  z_4 = MAX_EXPR <y_3, 0>;
  _5 = w_2 + z_4;
  return _5;

to

  _5 = ABS_EXPR <x_1(D)>;
  return _5;

[Bug testsuite/114089] FAIL: gcc.dg/rtl/aarch64/pr113295-1.c (test for excess errors)

2024-02-24 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114089

--- Comment #2 from Jakub Jelinek  ---
I mean r14-9162, sorry.

[Bug testsuite/114089] FAIL: gcc.dg/rtl/aarch64/pr113295-1.c (test for excess errors)

2024-02-24 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114089

Jakub Jelinek  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 CC||jakub at gcc dot gnu.org
 Resolution|--- |FIXED
Version|13.2.1  |14.0

--- Comment #1 from Jakub Jelinek  ---
See r14-9165 ?

Re: CI for "Option handling: add documentation URLs"

2024-02-24 Thread Mark Wielaard
Hi,

On Thu, Feb 22, 2024 at 11:57:50AM +0800, YunQiang Su wrote:
> Mark Wielaard wrote on Monday, 19 February 2024 at 06:58:
> > So, I did try the regenerate-opt-urls locally, and it did generate the
> > attached diff. Which seems to show we really need this automated.
> >
> > Going over the diff. The -Winfinite-recursion in rust does indeed seem
> > new.  As do the -mapx-inline-asm-use-gpr32 and mevex512 for i386.  And
> > the avr options -mskip-bug, -mflmap and mrodata-in-ram.  The change in
> > common.opt.urls for -Wuse-after-free comes from it being moved from
> > c++ to the c-family. The changes in mips.opt.urls seem to come from
> > commit 46df1369 "doc/invoke: Remove duplicate explicit-relocs entry of
> > MIPS".
> >
> 
> For MIPS, it's due to malformed patches to invoke.texi.
> I will fix them.

Thanks. So with your commit 00bc8c0998d8 ("invoke.texi: Fix some
skipping UrlSuffix problem for MIPS") pushed now, the attached patch
fixes the remaining issues.

Is this OK to push?

> > The changes in c.opt.urls seem mostly reordering. The sorting makes
> > more sense after the diff imho. And must have come from commit
> > 4666cbde5 "Sort warning options in c-family/c.opt".
> >
> > Also the documentation for -Warray-parameter was fixed.
> >
> > So I think the regenerate-opt-urls check does work as intended. So
> > lets automate it, because it looks like nobody regenerated the
> > url.opts after updating the documentation.
> >
> > But we should first apply this diff. Could you double check it is
> > sane/correct?
> >
> > Thanks,
> >
> > Mark
> 
> 
> 
> -- 
> YunQiang Su
From c019327e919fff87ffa94799e8f521bda707a883 Mon Sep 17 00:00:00 2001
From: Mark Wielaard 
Date: Sat, 24 Feb 2024 17:34:05 +0100
Subject: [PATCH] Regenerate opt.urls

There were several commits that didn't regenerate the opt.urls files.

Fixes: 438ef143679e ("rs6000: Neuter option -mpower{8,9}-vector")
Fixes: 50c549ef3db6 ("gccrs: enable -Winfinite-recursion warnings by default")
Fixes: 25bb8a40abd9 ("Move docs for -Wuse-after-free and -Wuseless-cast")
Fixes: 48448055fb70 ("AVR: Support .rodata in Flash for AVR64* and AVR128*")
Fixes: 42503cc257fb ("AVR: Document option -mskip-bug")
Fixes: 7de5bb642c12 ("i386: [APX] Document inline asm behavior and new switch")
Fixes: 49a14ee488b8 ("Add -mevex512 into invoke.texi")
Fixes: 4666cbde5e6d ("Sort warning options in c-family/c.opt.")

gcc/config/
* rs6000/rs6000.opt.urls: Regenerate.
* avr/avr.opt.urls: Likewise.
* i386/i386.opt.urls: Likewise.
* pru/pru.opt.urls: Likewise.
* riscv/riscv.opt.urls: Likewise.

gcc/rust/
* lang.opt.urls: Regenerate.

gcc/
* common.opt.urls: Regenerate.

gcc/c-family/
* c.opt.urls: Regenerate.
---
 gcc/c-family/c.opt.urls   | 351 +++---
 gcc/common.opt.urls   |   4 +-
 gcc/config/avr/avr.opt.urls   |   9 +
 gcc/config/i386/i386.opt.urls |   8 +-
 gcc/config/pru/pru.opt.urls   |   2 +-
 gcc/config/riscv/riscv.opt.urls   |   2 +-
 gcc/config/rs6000/rs6000.opt.urls |   3 -
 gcc/rust/lang.opt.urls|   3 +
 8 files changed, 200 insertions(+), 182 deletions(-)

diff --git a/gcc/c-family/c.opt.urls b/gcc/c-family/c.opt.urls
index 5365c8e2bc54..9f97dc61a778 100644
--- a/gcc/c-family/c.opt.urls
+++ b/gcc/c-family/c.opt.urls
@@ -88,6 +88,9 @@ UrlSuffix(gcc/Warning-Options.html#index-Wabsolute-value)
 Waddress
 UrlSuffix(gcc/Warning-Options.html#index-Waddress)
 
+Waddress-of-packed-member
+UrlSuffix(gcc/Warning-Options.html#index-Waddress-of-packed-member)
+
 Waligned-new
 UrlSuffix(gcc/C_002b_002b-Dialect-Options.html#index-Waligned-new)
 
@@ -115,6 +118,9 @@ UrlSuffix(gcc/Warning-Options.html#index-Walloc-zero)
 Walloca-larger-than=
 UrlSuffix(gcc/Warning-Options.html#index-Walloca-larger-than_003d) 
LangUrlSuffix_D(gdc/Warnings.html#index-Walloca-larger-than)
 
+Warith-conversion
+UrlSuffix(gcc/Warning-Options.html#index-Warith-conversion)
+
 Warray-bounds=
 UrlSuffix(gcc/Warning-Options.html#index-Warray-bounds)
 
@@ -122,13 +128,10 @@ Warray-compare
 UrlSuffix(gcc/Warning-Options.html#index-Warray-compare)
 
 Warray-parameter
-UrlSuffix(gcc/Warning-Options.html#index-Wno-array-parameter)
+UrlSuffix(gcc/Warning-Options.html#index-Warray-parameter)
 
 Warray-parameter=
-UrlSuffix(gcc/Warning-Options.html#index-Wno-array-parameter)
-
-Wzero-length-bounds
-UrlSuffix(gcc/Warning-Options.html#index-Wzero-length-bounds)
+UrlSuffix(gcc/Warning-Options.html#index-Warray-parameter)
 
 Wassign-intercept
 
UrlSuffix(gcc/Objective-C-and-Objective-C_002b_002b-Dialect-Options.html#index-Wassign-intercept)
@@ -148,9 +151,6 @@ UrlSuffix(gcc/Warning-Options.html#index-Wbool-compare)
 Wbool-operation
 UrlSuffix(gcc/Warning-Options.html#index-Wbool-operation)
 
-Wframe-address
-UrlSuffix(gcc/Warning-Options.html#index-Wframe-address)
-
 Wbuiltin-declaration-mismatch
 UrlSuffix(gcc/Warning-Options.html#index-Wbuiltin-declaration-mismatch) 

[Bug testsuite/114089] New: FAIL: gcc.dg/rtl/aarch64/pr113295-1.c (test for excess errors)

2024-02-24 Thread danglin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114089

Bug ID: 114089
   Summary: FAIL: gcc.dg/rtl/aarch64/pr113295-1.c (test for excess
errors)
   Product: gcc
   Version: 13.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: testsuite
  Assignee: unassigned at gcc dot gnu.org
  Reporter: danglin at gcc dot gnu.org
CC: rsandifo at gcc dot gnu.org
  Target Milestone: ---
  Host: hppa64-hp-hpux11.11
Target: hppa64-hp-hpux11.11
 Build: hppa64-hp-hpux11.11

This test fails on hppa64-hp-hpux11.11.  Test lacks "target aarch64*-*-*"
restriction.

[Bug middle-end/114087] RISC-V optimization on checking certain bits set ((x & mask) == val)

2024-02-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114087

Andrew Pinski  changed:

   What|Removed |Added

   Severity|normal  |enhancement
   Keywords||missed-optimization
  Component|rtl-optimization|middle-end

[Bug c/114088] Please provide __builtin_c16slen and __builtin_c32slen to complement __builtin_wcslenw

2024-02-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114088

Andrew Pinski  changed:

   What|Removed |Added

   Severity|normal  |enhancement

[Bug c/114088] New: Please provide __builtin_c16slen and __builtin_c32slen to complement __builtin_wcslenw

2024-02-24 Thread thiago at kde dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114088

Bug ID: 114088
   Summary: Please provide __builtin_c16slen and __builtin_c32slen
to complement __builtin_wcslenw
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: thiago at kde dot org
  Target Milestone: ---

Actually, GCC doesn't have __builtin_wcslen, but Clang does. Providing these
extra two builtins would allow implementing __builtin_wcslen too. The names are
not part of the C standard, but follow the current naming construction rules
for it, similar to how "mbrtowc" and "wcslen" parallel each other.

My specific need is actually to implement char16_t string containers in C++.
I'm particularly interested in QString/QStringView, but this applies to
std::basic_string{_view} too.

For example:

std::string_view f1() { return "Hello"; }
std::wstring_view fw() { return L"Hello"; }
std::u16string_view f16() { return u"Hello"; }
std::u32string_view f32() { return U"Hello"; }

With GCC and libstdc++, the first function produces optimal code:
movl$5, %eax
leaq.LC0(%rip), %rdx
ret

For the wchar_t case, GCC emits an out-of-line call to wcslen:
pushq   %rbx
leaq.LC2(%rip), %rbx
movq%rbx, %rdi
callwcslen@PLT
movq%rbx, %rdx
popq%rbx
ret

The next two, because of the absence of a C library function, emit a loop:
xorl%eax, %eax
leaq.LC1(%rip), %rcx
.L4:
incq%rax
cmpw$0, (%rcx,%rax,2)
jne .L4
movq%rcx, %rdx
ret

Clang, meanwhile, emits optimal code for all four and so did the pre-Clang
Intel compiler. See https://gcc.godbolt.org/z/qvj7qnYbz. MSVC emits optimal code
for the char and wchar_t versions, but loops for the other two.

Clang gives up when the string gets longer, though. See
https://gcc.godbolt.org/z/54j3zr6e6. That indicates that it gave up on guessing
the loop run and would do better if the intrinsic were present.

[r14-9155 Regression] FAIL: gcc.dg/rtl/aarch64/pr113295-1.c (test for excess errors) on Linux/x86_64

2024-02-24 Thread haochen.jiang
On Linux/x86_64,

8a16e06da97f51574cfad17e2cece2e58571305d is the first bad commit
commit 8a16e06da97f51574cfad17e2cece2e58571305d
Author: Richard Sandiford 
Date:   Fri Feb 23 14:12:54 2024 +

aarch64: Add missing early-ra bookkeeping [PR113295]

caused

FAIL: gcc.dg/rtl/aarch64/pr113295-1.c (test for excess errors)

with GCC configured with

../../gcc/configure 
--prefix=/export/users/haochenj/src/gcc-bisect/master/master/r14-9155/usr 
--enable-clocale=gnu --with-system-zlib --with-demangler-in-ld 
--with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl 
--enable-libmpx x86_64-linux --disable-bootstrap

To reproduce:

$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="rtl.exp=gcc.dg/rtl/aarch64/pr113295-1.c 
--target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="rtl.exp=gcc.dg/rtl/aarch64/pr113295-1.c 
--target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="rtl.exp=gcc.dg/rtl/aarch64/pr113295-1.c 
--target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check 
RUNTESTFLAGS="rtl.exp=gcc.dg/rtl/aarch64/pr113295-1.c 
--target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email, for question about this report, contact me 
at haochen dot jiang at intel.com.)
(If you met problems with cascadelake related, disabling AVX512F in command 
line might save that.)
(However, please make sure that there is no potential problems with AVX512.)


[Bug rtl-optimization/114062] "GNAT BUG DETECTED" 13.2.0 (hppa-linux-gnu) in remove, at alloc-pool.h:437

2024-02-24 Thread danglin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114062

John David Anglin  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |INVALID

--- Comment #4 from John David Anglin  ---
Not reproducible.

[Bug rtl-optimization/114087] New: RISC-V optimization on checking certain bits set ((x & mask) == val)

2024-02-24 Thread Explorer09 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114087

Bug ID: 114087
   Summary: RISC-V optimization on checking certain bits set ((x &
mask) == val)
   Product: gcc
   Version: 13.2.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: rtl-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: Explorer09 at gmail dot com
  Target Milestone: ---

It might be common in the C family of languages to check if certain bits are
set in an integer with a code pattern like this:

```c
unsigned int x;
if ((x & 0x3000) == 0x1000) {
  // Do something...
}
```

And I was surprised that compilers like GCC and Clang didn't realize they can
use some bit shifts and inversions of bit masks to save some instructions and
emit smaller code.

Here I present 3 possible optimizations that could be implemented in a
compiler. Two of them can apply not only to RISC-V, but other RISC
architectures as well (except ARM, perhaps). The last one is specific to RISC-V
due to the 20-bit immediate operand of its "lui" (load upper immediate)
instruction.

The bit masks should be compile time constants, and the "-Os" flag (optimize
for size) is assumed.

### Test code

The example code and constants are crafted specifically for RISC-V.

Each group of `pred*` functions should function identically (if not, please let
me know; it might be a typo).

* The "a" variants are what I commonly write for checking the set bits.
* The "b" variants are what I believe the compiler should ideally transform the
code to. I wrote them to let compiler developers know how the optimization can
be done. (But in practice the "b" code might transform to "a", meaning the
"optimization" direction reversed.)
* The "c" variants are hacks to make things work. They contain `__asm__
volatile` directives to force GCC or Clang to optimize in the direction I want.
The generated assembly should present what I considered the ideal result.

```c
#include <stdint.h>
#include <stdbool.h>
#define POWER_OF_TWO_FACTOR(x) ((x) & -(x))

// ---
// Example 1: The bitwise AND mask contains lower bits in all ones.
// By converting the bitwise AND into a bitwise OR, an "addi"
// instruction can be saved.
// (This might conflict with optimizations utilizing RISC-V "bclri"
// instruction; use one or the other.)
// (In ARM there are "bic" instructions already, making this
// optimization useless.)

static uint32_t mask1 = 0x5FFF;
static uint32_t val1  = 0x14501DEF;
// static_assert((mask1 & val1) == val1);
// static_assert((mask1 & 0xFFF) == 0xFFF);

bool pred1a(uint32_t x) {
  return ((x & mask1) == val1);
}

bool pred1b(uint32_t x) {
  return ((x | ~mask1) == (val1 | ~mask1));
}

bool pred1c(uint32_t x) {
  register uint32_t temp = x | ~mask1;
  __asm__ volatile ("" : "+r" (temp));
  return (temp == (val1 | ~mask1));
}

// ---
// Example 2: The bitwise AND mask could fit an 11-bit immediate
// operand of RISC-V "andi" instruction with a help of right
// shifting. (Keep the sign bit of the immediate operand zero.)
// (This kind of optimization could also work with other RISC 
// architectures, except ARM.)

static uint32_t mask2 = 0x5550;
static uint32_t val2  = 0x1450;

// static_assert(mask2 != 0);
// static_assert((mask2 & val2) == val2);
// static_assert(mask2 / POWER_OF_TWO_FACTOR(mask2) <= 0x7FF);

bool pred2a(uint32_t x) {
  return ((x & mask2) == val2);
}

bool pred2b(uint32_t x) {
  uint32_t factor = POWER_OF_TWO_FACTOR(mask2);
  return ((x >> __builtin_ctz(factor)) & (mask2 / factor))
== (val2 / factor);
}

bool pred2c(uint32_t x) {
  uint32_t factor = POWER_OF_TWO_FACTOR(mask2);
  register uint32_t temp = x >> 20;
  __asm__ volatile ("" : "+r" (temp));
  return (temp & 0x555) == 0x145;
}

// ---
// Example 3: The bitwise AND mask could fit a 20-bit immediate
// operand of RISC-V "lui" instruction.
// Only RISC-V has this 20-bit immediate "U-type" format, AFAIK.

static uint32_t mask3 = 0x0005;
static uint32_t val3  = 0x00045014;

// static_assert(mask3 / POWER_OF_TWO_FACTOR(mask3) <= 0xF);

bool pred3a(uint32_t x) {
  return ((x & mask3) == val3);
}

bool pred3b(uint32_t x) {
  uint32_t factor = POWER_OF_TWO_FACTOR(mask3);
  return (((x / factor) << 12) & ((mask3 / factor) << 12))
== ((val3 / factor) << 12);
}

bool pred3c(uint32_t x) {
  uint32_t factor = POWER_OF_TWO_FACTOR(mask3);
  register uint32_t temp = x << 12;
  __asm__ volatile ("" : "+r" (temp));
  return (temp & ((mask3 / factor) << 12))
== ((val3 / factor) << 12);
}
```

I tested the code in the Compiler Explorer (godbolt.org).

### Generated assembly (for reference only)

```
pred1a:
li  a5,1431658496
addia5,a5,-1
and a0,a0,a5
li  a5,340795392

[Bug sanitizer/97696] ICE since ASAN_MARK does not handle poly_int sized varibales

2024-02-24 Thread rsandifo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97696

--- Comment #3 from Richard Sandiford  ---
Created attachment 57520
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57520&action=edit
Candidate patch

The attached patch seems to fix it.  I'm taking next week off, but I'll run the
patch through proper testing when I get back.

[Bug sanitizer/97696] ICE since ASAN_MARK does not handle poly_int sized varibales

2024-02-24 Thread rsandifo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97696

Richard Sandiford  changed:

   What|Removed |Added

 CC||rsandifo at gcc dot gnu.org
 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |rsandifo at gcc dot 
gnu.org

Re: [PATCH RFA] build: drop target libs from LD_LIBRARY_PATH [PR105688]

2024-02-24 Thread Alexandre Oliva
On Feb 23, 2024, Jason Merrill  wrote:

> The problem, as you say, comes when you want to both bootstrap and
> build tools that aren't involved in the bootstrap process.

It's more visible there, because those don't actively refrain from
linking dynamically with libstdc++.  But even bootstrapped matter that
involves exceptions would have to link in libgcc_s, and that would bring
about the same sort of issue.

> To support that perhaps we want POSTBOOTSTRAP_HOST_EXPORTS for host
> modules without the bootstrap tag, and add the TARGET_LIB_PATH
> directories there?

That would be welcome, but it doesn't really address the problem, does
it?  Namely, the problem that we may face two different kinds of
scenarios, and each one calls for an opposite solution.

1. system tools used in the build depend on system libraries that are
newer than the ones we're about to build

This is the scenario that you're attempting to address with these
patches.  The problem here is that the libraries being built are older
than the system libraries, and and system tools won't run if dynamically
linked with the older libraries about to be built.

2. the libraries we're about to build are newer than corresponding
system libraries, if any

This is the scenario that the current build system aims for.  Any build
tools that rely on older system libraries are likely to work just as
well with the newly built libraries.  Any newly built libraries linked
into programs used and run as part of the build have to be present in
LD_LIBRARY_PATH lest we end up trying to use the older system libraries,
which may seem to work in some settings, but is bound to break if the
differences are large enough.


For maximum clarity, consider a bootstrap with LTO using a linker
plugin.  The linker plugin is built against the newly-built libraries.
The linker that attempts to load the plugin also requires the same
libraries.  Do you see how tending to 1. breaks 2., and vice-versa?


Now add ASAN to the picture, for another set of newly-built libraries
used during bootstrap.  Also use a prebuilt linker with ASAN enabled,
for maximum clarity of the problem I'm getting at.  Do you see the
problem?


Do you agree that patching the build system to solve a problem in
scenario 1. *will* cause problems in scenario 2., *unless* the fix can
distinguish the two scenarios and behave accordingly, but that getting
that right is triky and error prone?

Do you agree that, until we get there, it's probably better to optimize
for the more common scenario?

Do you agree that the more common scenario is probably 2.?

Do you agree that, until we get a solution that works for both 1. and 2.
automatically, offering a reasonably simple workaround for 1., while
aiming to work for 2., would be a desirable stopgap?

Do you agree that adding support for users to prepend directories to the
search path, enabling them to preempt build libraries with (symlinks
to?) select newer system libraries, and documenting situations in which
this could be needed, is a reasonably simple and desirable stopgap that
enables 1. to work while defaulting to the presumed more common case 2.?

Here's a patchlet that shows the crux of what I have in mind: (nevermind
we'd make the change elsewhere, document it further elsewhere, set an
empty default and arrange for it to be passed down to sub-$(MAKE)s)

diff --git a/Makefile.in b/Makefile.in
index edb0c8a9a427f..10c7646ef98c4 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -771,7 +771,12 @@ TARGET_LIB_PATH_libatomic = 
$$r/$(TARGET_SUBDIR)/libatomic/.libs:
 
 # This is the list of directories that may be needed in RPATH_ENVVAR
 # so that programs built for the host machine work.
-HOST_LIB_PATH = 
$(HOST_LIB_PATH_gmp)$(HOST_LIB_PATH_mpfr)$(HOST_LIB_PATH_mpc)$(HOST_LIB_PATH_isl)
+# Users may set PREEMPT_HOST_LIB_PATH to a directory holding symlinks
+# to system libraries required by build tools (say the linker) that
+# are newer (as in higher-versioned) than the corresponding libraries
+# we're building.  If older libraries were to override the newer
+# system libraries, that could prevent the build tools from running.
+HOST_LIB_PATH = 
$(PREEMPT_HOST_LIB_PATH):$(HOST_LIB_PATH_gmp)$(HOST_LIB_PATH_mpfr)$(HOST_LIB_PATH_mpc)$(HOST_LIB_PATH_isl)
 
 # Define HOST_LIB_PATH_gcc here, for the sake of TARGET_LIB_PATH, ouch
 @if gcc


Now, for a more general solution that doesn't require user intervention,
configure could go about looking for system libraries in the default
search path, or in RPATH_ENV_VAR, that share the soname with those we're
about to build, identify preexisting libraries that are newer than those
we're about to build, populate a build-tree directory with symlinks to
them, and default PREEMPT_HOST_LIB_PATH to that directory.

WDYT?


-- 
Alexandre Oliva, happy hackerhttps://FSFLA.org/blogs/lxo/
   Free Software Activist   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for 

[Bug tree-optimization/114086] Boolean switches could have a lot better codegen, possibly utilizing bit-vectors

2024-02-24 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114086

--- Comment #7 from Jakub Jelinek  ---
Now, suppose we optimize the (0x >> x) & 1 case etc. provided suitable
range
of x to x & 1.
For
int
bar3 (int e)
{
  if (e <= 15U)
return e & 1;
  else
return 0;
}
phiopt optimizes this into
  return e & 1 & (e <= 15U);
so, guess we want another match.pd optimization which would turn that into e &
-15.

[Bug middle-end/113205] [14 Regression] internal compiler error: in backward_pass, at tree-vect-slp.cc:5346 since r14-3220

2024-02-24 Thread rsandifo at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113205

Richard Sandiford  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #14 from Richard Sandiford  ---
Finally fixed.

[Bug middle-end/113205] [14 Regression] internal compiler error: in backward_pass, at tree-vect-slp.cc:5346 since r14-3220

2024-02-24 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113205

--- Comment #13 from GCC Commits  ---
The trunk branch has been updated by Richard Sandiford :

https://gcc.gnu.org/g:0394ae31e832c5303f3b4aad9c66710a30c097f0

commit r14-9165-g0394ae31e832c5303f3b4aad9c66710a30c097f0
Author: Richard Sandiford 
Date:   Sat Feb 24 11:58:22 2024 +

vect: Tighten check for impossible SLP layouts [PR113205]

During its forward pass, the SLP layout code tries to calculate
the cost of a layout change on an incoming edge.  This is taken
as the minimum of two costs: one in which the source partition
keeps its current layout (chosen earlier during the pass) and
one in which the source partition switches to the new layout.
The latter can sometimes be arranged by the backward pass.

If only one of the costs is valid, the other cost was ignored.
But the PR shows that this is not safe.  If the source partition
has layout 0 (the normal layout), we have to be prepared to handle
the case in which that ends up being the only valid layout.

Other code already accounts for this restriction, e.g. see
the code starting with:

/* Reject the layout if it would make layout 0 impossible
   for later partitions.  This amounts to testing that the
   target supports reversing the layout change on edges
   to later partitions.

gcc/
PR tree-optimization/113205
* tree-vect-slp.cc (vect_optimize_slp_pass::forward_cost): Reject
the proposed layout if it does not allow a source partition with
layout 2 to keep that layout.

gcc/testsuite/
PR tree-optimization/113205
* gcc.dg/torture/pr113205.c: New test.

[Bug tree-optimization/114086] Boolean switches could have a lot better codegen, possibly utilizing bit-vectors

2024-02-24 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114086

Jakub Jelinek  changed:

   What|Removed |Added

  Component|middle-end  |tree-optimization

--- Comment #6 from Jakub Jelinek  ---
(In reply to Jan Schultke from comment #5)
> Well, it's not quite equivalent to either of the bit-shifts we've posted.

The #c4 foo2/bar2 are functionally equivalent to #c4 foo/bar; it is what gcc
actually emits for the latter.
x > 6 ? 0 : ((85 >> x) & 1)
isn't functionally equivalent to anything mentioned so far here, as it handles
negative values differently.

[Bug middle-end/113988] during GIMPLE pass: bitintlower: internal compiler error: in lower_stmt, at gimple-lower-bitint.cc:5470

2024-02-24 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113988
Bug 113988 depends on bug 114073, which changed state.

Bug 114073 Summary: during GIMPLE pass: bitintlower: internal compiler error: 
in lower_stmt, at gimple-lower-bitint.cc:5530
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114073

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

[Bug middle-end/114073] during GIMPLE pass: bitintlower: internal compiler error: in lower_stmt, at gimple-lower-bitint.cc:5530

2024-02-24 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114073

Jakub Jelinek  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #3 from Jakub Jelinek  ---
Fixed.

[Bug middle-end/114073] during GIMPLE pass: bitintlower: internal compiler error: in lower_stmt, at gimple-lower-bitint.cc:5530

2024-02-24 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114073

--- Comment #2 from GCC Commits  ---
The master branch has been updated by Jakub Jelinek :

https://gcc.gnu.org/g:5e7a176e88a2a37434cef9b1b6a37a4f8274854a

commit r14-9163-g5e7a176e88a2a37434cef9b1b6a37a4f8274854a
Author: Jakub Jelinek 
Date:   Sat Feb 24 12:44:34 2024 +0100

bitint: Handle VIEW_CONVERT_EXPRs between large/huge BITINT_TYPEs and
VECTOR/COMPLEX_TYPE etc. [PR114073]

The following patch implements support for VIEW_CONVERT_EXPRs from/to
large/huge _BitInt to/from vector or complex types or anything else but
integral/pointer types which doesn't need to live in memory.

2024-02-24  Jakub Jelinek  

PR middle-end/114073
* gimple-lower-bitint.cc (bitint_large_huge::lower_stmt): Handle
VIEW_CONVERT_EXPRs between large/huge _BitInt and
non-integer/pointer
types like vector or complex types.
(gimple_lower_bitint): Don't merge VIEW_CONVERT_EXPRs to
non-integral
types.  Fix up VIEW_CONVERT_EXPR handling.  Allow merging
VIEW_CONVERT_EXPR from non-integral/pointer types with a store.

* gcc.dg/bitint-93.c: New test.

Re: [PATCH] Use HOST_WIDE_INT_{C,UC,0,0U,1,1U} macros some more

2024-02-24 Thread Richard Biener



> Am 24.02.2024 um 08:44 schrieb Jakub Jelinek :
> 
> Hi!
> 
> I've searched for some uses of (HOST_WIDE_INT) constant or (unsigned
> HOST_WIDE_INT) constant and turned them into uses of the appropriate
> macros.
> THere are quite a few cases in non-i386 backends but I've left that out
> for now.
> The only behavior change is in build_replicated_int_cst where the
> left shift was done in HOST_WIDE_INT type but assigned to unsigned
> HOST_WIDE_INT, which I've changed into unsigned HOST_WIDE_INT shift.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok

Richard 

> 2024-02-24  Jakub Jelinek  
> 
> gcc/
>* builtins.cc (fold_builtin_isascii): Use HOST_WIDE_INT_UC macro.
>* combine.cc (make_field_assignment): Use HOST_WIDE_INT_1U macro.
>* double-int.cc (double_int::mask): Use HOST_WIDE_INT_UC macros.
>* genattrtab.cc (attr_alt_complement): Use HOST_WIDE_INT_1 macro.
>(mk_attr_alt): Use HOST_WIDE_INT_0 macro.
>* genautomata.cc (bitmap_set_bit, CLEAR_BIT): Use HOST_WIDE_INT_1
>macros.
>* ipa-strub.cc (can_strub_internally_p): Use HOST_WIDE_INT_1 macro.
>* loop-iv.cc (implies_p): Use HOST_WIDE_INT_1U macro.
>* pretty-print.cc (test_pp_format): Use HOST_WIDE_INT_C and
>HOST_WIDE_INT_UC macros.
>* rtlanal.cc (nonzero_bits1): Use HOST_WIDE_INT_UC macro.
>* tree.cc (build_replicated_int_cst): Use HOST_WIDE_INT_1U macro.
>* tree.h (DECL_OFFSET_ALIGN): Use HOST_WIDE_INT_1U macro.
>* tree-ssa-structalias.cc (dump_varinfo): Use ~HOST_WIDE_INT_0U
>macros.
>* wide-int.cc (divmod_internal_2): Use HOST_WIDE_INT_1U macro.
>* config/i386/constraints.md (define_constraint "L"): Use
>HOST_WIDE_INT_C macro.
>* config/i386/i386.md (movabsq split peephole2): Use HOST_WIDE_INT_C
>macro.
>(movl + movb peephole2): Likewise.
>* config/i386/predicates.md (x86_64_zext_immediate_operand): Likewise.
>(const_32bit_mask): Likewise.
> gcc/objc/
>* objc-encoding.cc (encode_array): Use HOST_WIDE_INT_0 macros.
> 
> --- gcc/builtins.cc.jj2024-02-06 08:43:14.84351 +0100
> +++ gcc/builtins.cc2024-02-23 22:02:48.245611359 +0100
> @@ -9326,7 +9326,7 @@ fold_builtin_isascii (location_t loc, tr
>   /* Transform isascii(c) -> ((c & ~0x7f) == 0).  */
>   arg = fold_build2 (BIT_AND_EXPR, integer_type_node, arg,
> build_int_cst (integer_type_node,
> -~ (unsigned HOST_WIDE_INT) 0x7f));
> +~ HOST_WIDE_INT_UC (0x7f)));
>   return fold_build2_loc (loc, EQ_EXPR, integer_type_node,
>  arg, integer_zero_node);
> }
> --- gcc/combine.cc.jj2024-01-03 11:51:34.028696534 +0100
> +++ gcc/combine.cc2024-02-23 22:03:36.895923405 +0100
> @@ -9745,7 +9745,7 @@ make_field_assignment (rtx x)
>   if (width >= HOST_BITS_PER_WIDE_INT)
>ze_mask = -1;
>   else
> -ze_mask = ((unsigned HOST_WIDE_INT)1 << width) - 1;
> +ze_mask = (HOST_WIDE_INT_1U << width) - 1;
> 
>   /* Complete overlap.  We can remove the source AND.  */
>   if ((and_mask & ze_mask) == ze_mask)
> --- gcc/double-int.cc.jj2024-01-03 11:51:42.086584698 +0100
> +++ gcc/double-int.cc2024-02-23 22:04:30.586164187 +0100
> @@ -671,14 +671,14 @@ double_int::mask (unsigned prec)
>   if (prec > HOST_BITS_PER_WIDE_INT)
> {
>   prec -= HOST_BITS_PER_WIDE_INT;
> -  m = ((unsigned HOST_WIDE_INT) 2 << (prec - 1)) - 1;
> +  m = (HOST_WIDE_INT_UC (2) << (prec - 1)) - 1;
>   mask.high = (HOST_WIDE_INT) m;
>   mask.low = ALL_ONES;
> }
>   else
> {
>   mask.high = 0;
> -  mask.low = prec ? ((unsigned HOST_WIDE_INT) 2 << (prec - 1)) - 1 : 0;
> +  mask.low = prec ? (HOST_WIDE_INT_UC (2) << (prec - 1)) - 1 : 0;
> }
> 
>   return mask;
> --- gcc/genattrtab.cc.jj2024-01-03 11:51:38.125639672 +0100
> +++ gcc/genattrtab.cc2024-02-23 22:05:38.043210294 +0100
> @@ -2392,7 +2392,7 @@ static rtx
> attr_alt_complement (rtx s)
> {
>   return attr_rtx (EQ_ATTR_ALT, XWINT (s, 0),
> -   ((HOST_WIDE_INT) 1) - XWINT (s, 1));
> +   HOST_WIDE_INT_1 - XWINT (s, 1));
> }
> 
> /* Return EQ_ATTR_ALT expression representing set containing elements set
> @@ -2401,7 +2401,7 @@ attr_alt_complement (rtx s)
> static rtx
> mk_attr_alt (alternative_mask e)
> {
> -  return attr_rtx (EQ_ATTR_ALT, (HOST_WIDE_INT) e, (HOST_WIDE_INT) 0);
> +  return attr_rtx (EQ_ATTR_ALT, (HOST_WIDE_INT) e, HOST_WIDE_INT_0);
> }
> 
> /* Given an expression, see if it can be simplified for a particular insn
> --- gcc/genautomata.cc.jj2024-01-03 11:51:32.524717408 +0100
> +++ gcc/genautomata.cc2024-02-23 22:07:04.667985357 +0100
> @@ -3416,13 +3416,13 @@ finish_alt_states (void)
> 
> /* Set bit number bitno in the bit string.  The macro is not side
>effect proof.  */
> -#define bitmap_set_bit(bitstring, bitno)  \
> +#define bitmap_set_bit(bitstring, bitno)  \
>   ((bitstring)[(bitno) / (sizeof 

[Bug middle-end/114086] Boolean switches could have a lot better codegen, possibly utilizing bit-vectors

2024-02-24 Thread janschultke at googlemail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114086

--- Comment #5 from Jan Schultke  ---
Well, it's not quite equivalent to either of the bit-shifts we've posted. To
account for shifting more than the operand size, it would be:

bool foo (int x)
{
  return x > 6 ? 0 : ((85 >> x) & 1);
}


This is exactly what GCC does and the branch can be explained by this range
check.

So I guess GCC already does optimize this to a bit-vector, it just doesn't find
the optimization to:

bool foo(int x)
{
return (x & -7) == 0;
}


This is very specific to this particular switch statement though. You could do
better than having a branch if the hardware supported a saturating shift, but
probably not on x86_64.

Nevermind that; if anything, this isn't middle-end.

Re: [PATCH] bitint: Handle VIEW_CONVERT_EXPRs between large/huge BITINT_TYPEs and VECTOR/COMPLEX_TYPE etc. [PR114073]

2024-02-24 Thread Richard Biener



> Am 24.02.2024 um 08:40 schrieb Jakub Jelinek :
> 
> Hi!
> 
> The following patch implements support for VIEW_CONVERT_EXPRs from/to
> large/huge _BitInt to/from vector or complex types or anything else but
> integral/pointer types which doesn't need to live in memory.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok

Richard 

> 2024-02-24  Jakub Jelinek  
> 
>PR middle-end/114073
>* gimple-lower-bitint.cc (bitint_large_huge::lower_stmt): Handle
>VIEW_CONVERT_EXPRs between large/huge _BitInt and non-integer/pointer
>types like vector or complex types.
>(gimple_lower_bitint): Don't merge VIEW_CONVERT_EXPRs to non-integral
>types.  Fix up VIEW_CONVERT_EXPR handling.  Allow merging
>VIEW_CONVERT_EXPR from non-integral/pointer types with a store.
> 
>* gcc.dg/bitint-93.c: New test.
> 
> --- gcc/gimple-lower-bitint.cc.jj2024-02-23 11:36:06.977015730 +0100
> +++ gcc/gimple-lower-bitint.cc2024-02-23 18:21:09.282751377 +0100
> @@ -5305,27 +5305,21 @@ bitint_large_huge::lower_stmt (gimple *s
>   else if (TREE_CODE (TREE_TYPE (rhs1)) == BITINT_TYPE
>   && bitint_precision_kind (TREE_TYPE (rhs1)) >= bitint_prec_large
>   && (INTEGRAL_TYPE_P (TREE_TYPE (lhs))
> -   || POINTER_TYPE_P (TREE_TYPE (lhs
> +   || POINTER_TYPE_P (TREE_TYPE (lhs))
> +   || gimple_assign_rhs_code (stmt) == VIEW_CONVERT_EXPR))
>{
>  final_cast_p = true;
> -  if (TREE_CODE (TREE_TYPE (lhs)) == INTEGER_TYPE
> -  && TYPE_PRECISION (TREE_TYPE (lhs)) > MAX_FIXED_MODE_SIZE
> +  if (((TREE_CODE (TREE_TYPE (lhs)) == INTEGER_TYPE
> +&& TYPE_PRECISION (TREE_TYPE (lhs)) > MAX_FIXED_MODE_SIZE)
> +   || (!INTEGRAL_TYPE_P (TREE_TYPE (lhs))
> +   && !POINTER_TYPE_P (TREE_TYPE (lhs
>  && gimple_assign_rhs_code (stmt) == VIEW_CONVERT_EXPR)
>{
>  /* Handle VIEW_CONVERT_EXPRs to not generally supported
> huge INTEGER_TYPEs like uint256_t or uint512_t.  These
> are usually emitted from memcpy folding and backends
> - support moves with them but that is usually it.  */
> -  if (TREE_CODE (rhs1) == INTEGER_CST)
> -{
> -  rhs1 = fold_unary (VIEW_CONVERT_EXPR, TREE_TYPE (lhs),
> - rhs1);
> -  gcc_assert (rhs1 && TREE_CODE (rhs1) == INTEGER_CST);
> -  gimple_assign_set_rhs1 (stmt, rhs1);
> -  gimple_assign_set_rhs_code (stmt, INTEGER_CST);
> -  update_stmt (stmt);
> -  return;
> -}
> + support moves with them but that is usually it.
> + Similarly handle VCEs to vector/complex types etc.  */
>  gcc_assert (TREE_CODE (rhs1) == SSA_NAME);
>  if (SSA_NAME_IS_DEFAULT_DEF (rhs1)
>  && (!SSA_NAME_VAR (rhs1) || VAR_P (SSA_NAME_VAR (rhs1
> @@ -5376,6 +5370,18 @@ bitint_large_huge::lower_stmt (gimple *s
>}
>}
>}
> +  else if (TREE_CODE (TREE_TYPE (lhs)) == BITINT_TYPE
> +   && bitint_precision_kind (TREE_TYPE (lhs)) >= bitint_prec_large
> +   && !INTEGRAL_TYPE_P (TREE_TYPE (rhs1))
> +   && !POINTER_TYPE_P (TREE_TYPE (rhs1))
> +   && gimple_assign_rhs_code (stmt) == VIEW_CONVERT_EXPR)
> +{
> +  int part = var_to_partition (m_map, lhs);
> +  gcc_assert (m_vars[part] != NULL_TREE);
> +  lhs = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (rhs1), m_vars[part]);
> +  insert_before (gimple_build_assign (lhs, rhs1));
> +  return;
> +}
> }
>   if (gimple_store_p (stmt))
> {
> @@ -5411,6 +5417,28 @@ bitint_large_huge::lower_stmt (gimple *s
>  case IMAGPART_EXPR:
>lower_cplxpart_stmt (lhs, g);
>goto handled;
> +  case VIEW_CONVERT_EXPR:
> +{
> +  tree rhs1 = gimple_assign_rhs1 (g);
> +  rhs1 = TREE_OPERAND (rhs1, 0);
> +  if (!INTEGRAL_TYPE_P (TREE_TYPE (rhs1))
> +  && !POINTER_TYPE_P (TREE_TYPE (rhs1)))
> +{
> +  tree ltype = TREE_TYPE (rhs1);
> +  addr_space_t as = TYPE_ADDR_SPACE (TREE_TYPE (lhs));
> +  ltype
> += build_qualified_type (ltype,
> +TYPE_QUALS (TREE_TYPE (lhs))
> +| ENCODE_QUAL_ADDR_SPACE (as));
> +  lhs = build1 (VIEW_CONVERT_EXPR, ltype, lhs);
> +  gimple_assign_set_lhs (stmt, lhs);
> +  gimple_assign_set_rhs1 (stmt, rhs1);
> +  gimple_assign_set_rhs_code (stmt, TREE_CODE (rhs1));
> +  update_stmt (stmt);
> +  return;
> +}
> +}
> +break;
>  default:
>break;
>  }
> @@ -6235,6 +6263,14 @@ gimple_lower_bitint (void)
>  if (gimple_assign_cast_p (SSA_NAME_DEF_STMT (s)))
>{
>  tree rhs1 = gimple_assign_rhs1 (SSA_NAME_DEF_STMT (s));
> +  if (TREE_CODE (rhs1) == VIEW_CONVERT_EXPR)
> +{
> +

Re: [PATCH] vect: Tighten check for impossible SLP layouts [PR113205]

2024-02-24 Thread Richard Biener



> Am 24.02.2024 um 11:06 schrieb Richard Sandiford :
> 
> During its forward pass, the SLP layout code tries to calculate
> the cost of a layout change on an incoming edge.  This is taken
> as the minimum of two costs: one in which the source partition
> keeps its current layout (chosen earlier during the pass) and
> one in which the source partition switches to the new layout.
> The latter can sometimes be arranged by the backward pass.
> 
> If only one of the costs is valid, the other cost was ignored.
> But the PR shows that this is not safe.  If the source partition
> has layout 0 (the normal layout), we have to be prepared to handle
> the case in which that ends up being the only valid layout.
> 
> Other code already accounts for this restriction, e.g. see
> the code starting with:
> 
>/* Reject the layout if it would make layout 0 impossible
>   for later partitions.  This amounts to testing that the
>   target supports reversing the layout change on edges
>   to later partitions.
> 
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Ok

Thanks,
Richard 

> Richard
> 
> 
> gcc/
>PR tree-optimization/113205
>* tree-vect-slp.cc (vect_optimize_slp_pass::forward_cost): Reject
>the proposed layout if it does not allow a source partition with
>layout 2 to keep that layout.
> 
> gcc/testsuite/
>PR tree-optimization/113205
>* gcc.dg/torture/pr113205.c: New test.
> ---
> gcc/testsuite/gcc.dg/torture/pr113205.c | 19 +++
> gcc/tree-vect-slp.cc|  4 
> 2 files changed, 23 insertions(+)
> create mode 100644 gcc/testsuite/gcc.dg/torture/pr113205.c
> 
> diff --git a/gcc/testsuite/gcc.dg/torture/pr113205.c 
> b/gcc/testsuite/gcc.dg/torture/pr113205.c
> new file mode 100644
> index 000..edfba7fcd0e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/torture/pr113205.c
> @@ -0,0 +1,19 @@
> +char a;
> +char *b, *c;
> +int d, e, f, g, h;
> +int *i;
> +
> +void
> +foo (void)
> +{
> +  unsigned p;
> +  d = i[0];
> +  e = i[1];
> +  f = i[2];
> +  g = i[3];
> +  p = d * b[0];
> +  p += f * c[h];
> +  p += e * b[h];
> +  p += g * c[h];
> +  a = (p + 8000) >> (__SIZEOF_INT__ * __CHAR_BIT__ / 2);
> +}
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index 7cf9504398c..895f4f7fb6b 100644
> --- a/gcc/tree-vect-slp.cc
> +++ b/gcc/tree-vect-slp.cc
> @@ -5034,6 +5034,10 @@ vect_optimize_slp_pass::forward_cost (graph_edge *ud, 
> unsigned int from_node_i,
>   cost.split (from_partition.out_degree);
>   cost.add_serial_cost (edge_cost);
> }
> +  else if (from_partition.layout == 0)
> +/* We must allow the source partition to have layout 0 as a fallback,
> +   in case all other options turn out to be impossible.  */
> +return cost;
> 
>   /* Take the minimum of that cost and the cost that applies if
>  FROM_PARTITION instead switches to TO_LAYOUT_I.  */
> --
> 2.25.1
> 


[Bug middle-end/114086] Boolean switches could have a lot better codegen, possibly utilizing bit-vectors

2024-02-24 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114086

--- Comment #4 from Jakub Jelinek  ---
But sure, confirmed for both:

int
foo (int e)
{
  switch (e)
{
case 1:
case 3:
case 5:
case 7:
case 9:
case 11:
case 13:
  return 1;
default:
  return 0;
}
}

int
bar (int e)
{
  switch (e)
{
case 1:
case 3:
case 5:
case 7:
case 9:
case 11:
case 13:
case 15:
  return 1;
default:
  return 0;
}
}

where in foo, because we emit the guarding
cmpl    $13, %edi
ja      .L1
we could simplify the in-range path to just andl $1 (the value is known to be
<= 13 there), and the bar case can indeed be done as (e & -15) == 1.
Now, the question is whether either of these optimizations should be done in
switch lowering, or whether we should do it elsewhere, where it would also
optimize hand-written code like this, e.g. if the user writes it as
int
foo2 (int e)
{
  if (e <= 13U)
return (10922 >> e) & 1;
  else
return 0;
}

int
bar2 (int e)
{
  if (e <= 15U)
return (43690 >> e) & 1;
  else
return 0;
}
Looking at clang, it can optimize bar but not foo (it uses a switch table rather
than a shift, which is worse than what gcc emits), and it emits pretty much what
gcc emits for foo2/bar2.
Perhaps phiopt could handle this for the bar2 case, and match.pd using range
info for foo2?
The next question is what should be done if the two values aren't 1 and 0 but
0 and 1, or some cst and cst + 1, or cst and cst - 1 for some arbitrary constant
cst, or cst and 0, or 0 and cst, or cst1 and cst2, and whether to emit e.g. a
conditional move in those cases.
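Just for illustration (this example is mine, not from the PR), the cst and
cst + 1 flavour would be something like

int
baz (int e)
{
  switch (e)
    {
    case 1:
    case 3:
    case 5:
    case 7:
      return 43;
    default:
      return 42;
    }
}

which could in principle be lowered to the branchless
  return 42 + ((e & -7) == 1);
(bit 0 selects the odd cases and the & -7 rejects 8 and above), or to a
conditional move on the bit-test result when the two constants aren't related
that simply.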

RE: [PATCH v1] Internal-fn: Add new internal function SAT_ADDU

2024-02-24 Thread Li, Pan2
Hi Tamar and Richard.

I gave DEF_INTERNAL_INT_EXT_FN a try in the draft patch below; I am not very
sure whether my understanding is correct (it mostly follows the popcount
implementation).
Thanks a lot.

https://gcc.gnu.org/pipermail/gcc-patches/2024-February/646442.html

Pan

-Original Message-
From: Tamar Christina  
Sent: Monday, February 19, 2024 9:05 PM
To: Li, Pan2 ; Richard Biener 
Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; Wang, Yanzhang 
; kito.ch...@gmail.com
Subject: RE: [PATCH v1] Internal-fn: Add new internal function SAT_ADDU

> -Original Message-
> From: Li, Pan2 
> Sent: Monday, February 19, 2024 12:59 PM
> To: Tamar Christina ; Richard Biener
> 
> Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; Wang, Yanzhang
> ; kito.ch...@gmail.com
> Subject: RE: [PATCH v1] Internal-fn: Add new internal function SAT_ADDU
> 
> Thanks Tamar for comments and explanations.
> 
> > I think we should actually do an indirect optab here, because the IFN can 
> > be used
> > to replace the general representation of saturating arithmetic.
> 
> > e.g. the __builtin_add_overflow case in
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112600
> > is inefficient on all targets and so the IFN can always expand to something 
> > that's
> more
> > efficient like the branchless version add_sat2.
> 
> > I think this is why you suggested a new tree code below, but we don't 
> > really need
> > tree-codes for this. It can be done cleaner using the same way as
> DEF_INTERNAL_INT_EXT_FN
> 
> Yes, the backend could choose a branchless(of course we always hate branch for
> performance) code-gen or even better there is one saturation insn.
> Good to learn DEF_INTERNAL_INT_EXT_FN, and will have a try for it.
> 
> > Richard means that there shouldn't be .SAT_ADDU and .SAT_ADDS  and that the
> sign
> > should be determined by the types at expansion time.  i.e. there should 
> > only be
> > .SAT_ADD.
> 
> Got it, my initial idea comes from that we may have two insns for saturation 
> add,
> mostly these insns need to be signed or unsigned.
> For example, slt/sltu in riscv scalar. But I am not very clear about a 
> scenario like this.
> During define_expand in backend, we hit the standard name
> sat_add_3 but can we tell it is signed or not here? AFAIK, we only have 
> QI, HI,
> SI and DI.

Yeah, the way DEF_INTERNAL_SIGNED_OPTAB_FN works is that you give it two optabs,
one for when it's signed and one for when it's unsigned, and the right one is 
picked
automatically during expansion.  But in GIMPLE you'd only have one IFN.

> Maybe I will have the answer after try DEF_INTERNAL_SIGNED_OPTAB_FN, will
> keep you posted.

Awesome, Thanks!

Tamar
> 
> Pan
> 
> -Original Message-
> From: Tamar Christina 
> Sent: Monday, February 19, 2024 4:55 PM
> To: Li, Pan2 ; Richard Biener 
> Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; Wang, Yanzhang
> ; kito.ch...@gmail.com
> Subject: RE: [PATCH v1] Internal-fn: Add new internal function SAT_ADDU
> 
> Thanks for doing this!
> 
> > -Original Message-
> > From: Li, Pan2 
> > Sent: Monday, February 19, 2024 8:42 AM
> > To: Richard Biener 
> > Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; Wang, Yanzhang
> > ; kito.ch...@gmail.com; Tamar Christina
> > 
> > Subject: RE: [PATCH v1] Internal-fn: Add new internal function SAT_ADDU
> >
> > Thanks Richard for comments.
> >
> > > I'll note that on RTL we already have SS_PLUS/US_PLUS and friends and
> > > the corresponding ssadd/usadd optabs.  There's not much documentation
> > > unfortunately besides the use of gen_*_fixed_libfunc usage where the
> comment
> > > suggests this is used for fixed-point operations.  It looks like arm uses
> > > fractional/accumulator modes for this but for example bfin has ssaddsi3.
> >
> > I find the related description about plus family in GCC internals doc but 
> > it doesn't
> > mention
> > anything about mode m here.
> >
> > (plus:m x y)
> > (ss_plus:m x y)
> > (us_plus:m x y)
> > These three expressions all represent the sum of the values represented by x
> > and y carried out in machine mode m. They diff er in their behavior on 
> > overflow
> > of integer modes. plus wraps round modulo the width of m; ss_plus saturates
> > at the maximum signed value representable in m; us_plus saturates at the
> > maximum unsigned value.
> >
> > > The natural thing is to use direct optab internal functions (that's what 
> > > you
> > > basically did, but you added a new optab, IMO without good reason).
> 
> I think we should actually do an indirect optab here, because the IFN can be 
> used
> to replace the general representation of saturating arithmetic.
> 
> e.g. the __builtin_add_overflow case in
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112600
> is inefficient on all targets and so the IFN can always expand to something 
> that's
> more
> efficient like the branchless version add_sat2.
> 
> I think this is why you suggested a new tree code below, but we don't really 
> need
> tree-codes for 

[PATCH v2] Draft|Internal-fn: Introduce internal fn saturation US_PLUS

2024-02-24 Thread pan2 . li
From: Pan Li 

Hi Richard & Tamar,

This tries DEF_INTERNAL_INT_EXT_FN as you suggested: us_plus$a3 is mapped to
the RTL representation (us_plus:m x y) in optabs.def, and expand_US_PLUS is
added in internal-fn.cc.  I am not very sure whether my understanding of
DEF_INTERNAL_INT_EXT_FN is correct.

I am not sure if we still need DEF_INTERNAL_SIGNED_OPTAB_FN here, given
the RTL representation has (ss_plus:m x y) and (us_plus:m x y) already.
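For comparison, a DEF_INTERNAL_SIGNED_OPTAB_FN based variant would presumably
look roughly like the following (just a sketch; the SAT_ADD name, the flags,
the selector and the ssadd/usadd optab names are my assumptions, not part of
this patch):

/* In internal-fn.def: a single IFN backed by two optabs; expansion would pick
   the ssadd or usadd pattern depending on the signedness of the GIMPLE type.  */
DEF_INTERNAL_SIGNED_OPTAB_FN (SAT_ADD, ECF_CONST | ECF_NOTHROW, first,
                              ssadd, usadd, binary)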

Note this patch is a draft for validation; no tests are involved here.

gcc/ChangeLog:

* builtins.def (BUILT_IN_US_PLUS): Add builtin def.
(BUILT_IN_US_PLUSIMAX): Ditto.
(BUILT_IN_US_PLUSL): Ditto.
(BUILT_IN_US_PLUSLL): Ditto.
(BUILT_IN_US_PLUSG): Ditto.
* config/riscv/riscv-protos.h (riscv_expand_us_plus): Add new
func decl for expanding us_plus.
* config/riscv/riscv.cc (riscv_expand_us_plus): Add new func
impl for expanding us_plus.
* config/riscv/riscv.md (us_plus3): Add new pattern impl
us_plus3.
* internal-fn.cc (expand_US_PLUS): Add new func impl to expand
US_PLUS.
* internal-fn.def (US_PLUS): Add new INT_EXT_FN.
* internal-fn.h (expand_US_PLUS): Add new func decl.
* match.pd: Add new simplify pattern for us_plus.
* optabs.def (OPTAB_NL): Add new OPTAB_NL to US_PLUS rtl.

Signed-off-by: Pan Li 
---
 gcc/builtins.def|  7 +
 gcc/config/riscv/riscv-protos.h |  1 +
 gcc/config/riscv/riscv.cc   | 46 +
 gcc/config/riscv/riscv.md   | 11 
 gcc/internal-fn.cc  | 26 +++
 gcc/internal-fn.def |  3 +++
 gcc/internal-fn.h   |  1 +
 gcc/match.pd| 17 
 gcc/optabs.def  |  2 ++
 9 files changed, 114 insertions(+)

diff --git a/gcc/builtins.def b/gcc/builtins.def
index f6f3e104f6a..0777b912cfa 100644
--- a/gcc/builtins.def
+++ b/gcc/builtins.def
@@ -1055,6 +1055,13 @@ DEF_GCC_BUILTIN(BUILT_IN_POPCOUNTIMAX, 
"popcountimax", BT_FN_INT_UINTMAX
 DEF_GCC_BUILTIN(BUILT_IN_POPCOUNTL, "popcountl", BT_FN_INT_ULONG, 
ATTR_CONST_NOTHROW_LEAF_LIST)
 DEF_GCC_BUILTIN(BUILT_IN_POPCOUNTLL, "popcountll", 
BT_FN_INT_ULONGLONG, ATTR_CONST_NOTHROW_LEAF_LIST)
 DEF_GCC_BUILTIN(BUILT_IN_POPCOUNTG, "popcountg", BT_FN_INT_VAR, 
ATTR_CONST_NOTHROW_TYPEGENERIC_LEAF)
+
+DEF_GCC_BUILTIN(BUILT_IN_US_PLUS, "us_plus", BT_FN_INT_UINT, 
ATTR_CONST_NOTHROW_LEAF_LIST)
+DEF_GCC_BUILTIN(BUILT_IN_US_PLUSIMAX, "us_plusimax", 
BT_FN_INT_UINTMAX, ATTR_CONST_NOTHROW_LEAF_LIST)
+DEF_GCC_BUILTIN(BUILT_IN_US_PLUSL, "us_plusl", BT_FN_INT_ULONG, 
ATTR_CONST_NOTHROW_LEAF_LIST)
+DEF_GCC_BUILTIN(BUILT_IN_US_PLUSLL, "us_plusll", BT_FN_INT_ULONGLONG, 
ATTR_CONST_NOTHROW_LEAF_LIST)
+DEF_GCC_BUILTIN(BUILT_IN_US_PLUSG, "us_plusg", BT_FN_INT_VAR, 
ATTR_CONST_NOTHROW_TYPEGENERIC_LEAF)
+
 DEF_EXT_LIB_BUILTIN(BUILT_IN_POSIX_MEMALIGN, "posix_memalign", 
BT_FN_INT_PTRPTR_SIZE_SIZE, ATTR_NOTHROW_NONNULL_LEAF)
 DEF_GCC_BUILTIN(BUILT_IN_PREFETCH, "prefetch", 
BT_FN_VOID_CONST_PTR_VAR, ATTR_NOVOPS_LEAF_LIST)
 DEF_LIB_BUILTIN(BUILT_IN_REALLOC, "realloc", BT_FN_PTR_PTR_SIZE, 
ATTR_ALLOC_WARN_UNUSED_RESULT_SIZE_2_NOTHROW_LEAF_LIST)
diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h
index 80efdf2b7e5..ba6086f1f25 100644
--- a/gcc/config/riscv/riscv-protos.h
+++ b/gcc/config/riscv/riscv-protos.h
@@ -132,6 +132,7 @@ extern void riscv_asm_output_external (FILE *, const tree, 
const char *);
 extern bool
 riscv_zcmp_valid_stack_adj_bytes_p (HOST_WIDE_INT, int);
 extern void riscv_legitimize_poly_move (machine_mode, rtx, rtx, rtx);
+extern void riscv_expand_us_plus (rtx, rtx, rtx);
 
 #ifdef RTX_CODE
 extern void riscv_expand_int_scc (rtx, enum rtx_code, rtx, rtx, bool 
*invert_ptr = 0);
diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 4100abc9dd1..23f08974f07 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -10657,6 +10657,52 @@ riscv_vector_mode_supported_any_target_p (machine_mode)
   return true;
 }
 
+/* Emit insn for the saturation addu, aka (x + y) | - ((x + y) < x).  */
+void
+riscv_expand_us_plus (rtx dest, rtx x, rtx y)
+{
+  machine_mode mode = GET_MODE (dest);
+  rtx pmode_sum = gen_reg_rtx (Pmode);
+  rtx pmode_lt = gen_reg_rtx (Pmode);
+  rtx pmode_x = gen_lowpart (Pmode, x);
+  rtx pmode_y = gen_lowpart (Pmode, y);
+  rtx pmode_dest = gen_reg_rtx (Pmode);
+
+  /* Step-1: sum = x + y  */
+  if (mode == SImode && mode != Pmode)
+{ /* Take addw to avoid the sum truncate.  */
+  rtx simode_sum = gen_reg_rtx (SImode);
+  riscv_emit_binary (PLUS, simode_sum, x, y);
+  emit_move_insn (pmode_sum, gen_lowpart (Pmode, simode_sum));
+}
+  else
+riscv_emit_binary (PLUS, pmode_sum, pmode_x, pmode_y);
+
+  /* Step-1.1: truncate sum for HI and QI as we have no insn for add QI/HI.  

[Bug middle-end/114086] Boolean switches could have a lot better codegen, possibly utilizing bit-vectors

2024-02-24 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114086

--- Comment #3 from Jakub Jelinek  ---
And the rest boils down to what code to generate for
bool
foo (int x)
{
  return ((682 >> x) & 1);
}
Both that and the switch from the #c0 testcase boil down to
  _1 = 682 >> x_2(D);
  _3 = (_Bool) _1;
or
  _6 = 682 >> _4;
  _8 = (_Bool) _6;
in the GIMPLE dump.  Now, for the foo above, gcc emits
movl    $682, %eax
btl     %edi, %eax
setc    %al
ret
and clang emits the same:
movl    $682, %eax  # imm = 0x2AA
btl     %edi, %eax
setb    %al
retq
Though, e.g. clang 14 emitted
movl    %edi, %ecx
movl    $682, %eax  # imm = 0x2AA
shrl    %cl, %eax
andb    $1, %al
retq
which is longer, dunno what is faster.

[Bug middle-end/114086] Boolean switches could have a lot better codegen, possibly utilizing bit-vectors

2024-02-24 Thread janschultke at googlemail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114086

--- Comment #2 from Jan Schultke  ---
Yeah right, the actual optimal output (which clang finds) is:

> test_switch(E):
>   test edi, -7
>   sete al
>   ret


Testing against -7 also makes sure that bit 3 and all higher bits are zero, so
the values 8 and greater correctly yield false.
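For reference, a minimal C++ sketch of that form (the test_mask name is mine;
it assumes the enumerators keep their default values 0..7 as in the original
report):

bool test_mask(E e) {
    // -7 is ~6, so the AND is zero only when the sole set bits are bits 1 and
    // 2, i.e. for the values 0, 2, 4 and 6; any value with bit 0 set or with
    // bit 3 or above set makes the result non-zero and the function false.
    return (static_cast<int>(e) & -7) == 0;
}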

Re: [PATCH RFA] build: drop target libs from LD_LIBRARY_PATH [PR105688]

2024-02-24 Thread Gaius Mulley
Iain Sandoe  writes:

> Hi Gaius,
>
>> On 22 Feb 2024, at 18:06, Gaius Mulley  wrote:
>> 
>> Iain Sandoe  writes:
>> 
>>> Right now, AFAIK the only target runtimes used by host tools are
>>> libstdc++, libgcc and libgnat.  I agree that might change with rust -
>>> since the rust folks are talking about using one of the runtimes in
>>> the FE, I am not aware of other language FEs requiring their targte
>>> runtimes to be available to the host tools (adding Gaius in case I
>>> missed something with m2 - which is quite complex inthe
>>> bootstrapping).
>
>> the m2 infrastructure translates and builds gcc/m2/gm2-libs along with
>> gcc/m2/gm2-compiler and uses these objects for cc1gm2, pge, mc etc -
>> rather than the library archives generated from /libgm2
>
> If I understand this (and my builds of the m2 stuff) correctly, this is done
> locally to the builds of the host-side components; in particular not 
> controlled
> by the top level Makefile.{tpl,def}?

Hi Iain,

yes indeed,

> (so that we do not see builds of libgm2 in stage1/2- but only in the
> stage3-target builds?
>
> in which case, this should be outside the scope of the patch here.

regards,
Gaius


[Bug rtl-optimization/114085] Internal (cross) compiler error when building libstdc++ for the H8/300 family

2024-02-24 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114085

Jonathan Wakely  changed:

   What|Removed |Added

  Component|libstdc++   |rtl-optimization

--- Comment #1 from Jonathan Wakely  ---
If the compiler crashes then that's a compiler bug, not a library bug.

Reassigning to rtl-optimization but that might not be accurate.

[Bug middle-end/114084] ICE: SIGSEGV: infinite recursion in fold_build2_loc / fold_binary_loc with _BitInt(127)

2024-02-24 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114084

--- Comment #6 from Jakub Jelinek  ---
As in the following patch, which is supposed to track the origin of the 6
something0 variables in bitmasks: bit 1 means the value comes (partly) from
op0, bit 2 means it comes (partly) from op1.
--- gcc/fold-const.cc.jj2024-02-24 09:49:09.098815803 +0100
+++ gcc/fold-const.cc   2024-02-24 11:01:34.266513041 +0100
@@ -11779,6 +11779,15 @@ fold_binary_loc (location_t loc, enum tr
  + (lit0 != 0) + (lit1 != 0)
  + (minus_lit0 != 0) + (minus_lit1 != 0)) > 2)
{
+ int var0_origin = (var0 != 0) + 2 * (var1 != 0);
+ int minus_var0_origin
+   = (minus_var0 != 0) + 2 * (minus_var1 != 0);
+ int con0_origin = (con0 != 0) + 2 * (con1 != 0);
+ int minus_con0_origin
+   = (minus_con0 != 0) + 2 * (minus_con1 != 0);
+ int lit0_origin = (lit0 != 0) + 2 * (lit1 != 0);
+ int minus_lit0_origin
+   = (minus_lit0 != 0) + 2 * (minus_lit1 != 0);
  var0 = associate_trees (loc, var0, var1, code, atype);
  minus_var0 = associate_trees (loc, minus_var0, minus_var1,
code, atype);
@@ -11791,15 +11800,19 @@ fold_binary_loc (location_t loc, enum tr

  if (minus_var0 && var0)
{
+ var0_origin |= minus_var0_origin;
  var0 = associate_trees (loc, var0, minus_var0,
  MINUS_EXPR, atype);
  minus_var0 = 0;
+ minus_var0_origin = 0;
}
  if (minus_con0 && con0)
{
+ con0_origin |= minus_con0_origin;
  con0 = associate_trees (loc, con0, minus_con0,
  MINUS_EXPR, atype);
  minus_con0 = 0;
+ minus_con0_origin = 0;
}

  /* Preserve the MINUS_EXPR if the negative part of the literal is
@@ -11815,15 +11828,19 @@ fold_binary_loc (location_t loc, enum tr
  /* But avoid ending up with only negated parts.  */
  && (var0 || con0))
{
+ minus_lit0_origin |= lit0_origin;
  minus_lit0 = associate_trees (loc, minus_lit0, lit0,
MINUS_EXPR, atype);
  lit0 = 0;
+ lit0_origin = 0;
}
  else
{
+ lit0_origin |= minus_lit0_origin;
  lit0 = associate_trees (loc, lit0, minus_lit0,
  MINUS_EXPR, atype);
  minus_lit0 = 0;
+ minus_lit0_origin = 0;
}
}

@@ -11833,37 +11850,51 @@ fold_binary_loc (location_t loc, enum tr
return NULL_TREE;

  /* Eliminate lit0 and minus_lit0 to con0 and minus_con0. */
+ con0_origin |= lit0_origin;
  con0 = associate_trees (loc, con0, lit0, code, atype);
- lit0 = 0;
+ minus_con0_origin |= minus_lit0_origin;
  minus_con0 = associate_trees (loc, minus_con0, minus_lit0,
code, atype);
- minus_lit0 = 0;

  /* Eliminate minus_con0.  */
  if (minus_con0)
{
  if (con0)
-   con0 = associate_trees (loc, con0, minus_con0,
-   MINUS_EXPR, atype);
+   {
+ con0_origin |= minus_con0_origin;
+ con0 = associate_trees (loc, con0, minus_con0,
+ MINUS_EXPR, atype);
+   }
  else if (var0)
-   var0 = associate_trees (loc, var0, minus_con0,
-   MINUS_EXPR, atype);
+   {
+ var0_origin |= minus_con0_origin;
+ var0 = associate_trees (loc, var0, minus_con0,
+ MINUS_EXPR, atype);
+   }
  else
gcc_unreachable ();
- minus_con0 = 0;
}

  /* Eliminate minus_var0.  */
  if (minus_var0)
{
  if (con0)
-   con0 = associate_trees (loc, con0, minus_var0,
-   MINUS_EXPR, atype);
+   {
+ con0_origin |= minus_var0_origin;
+ con0 = associate_trees (loc, con0, minus_var0,
+ MINUS_EXPR, atype);
+   }
  else

[Bug middle-end/114086] Boolean switches could have a lot better codegen, possibly utilizing bit-vectors

2024-02-24 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114086

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #1 from Jakub Jelinek  ---
  mov eax, edi
  and eax, 1
  ret
seems wrong without -fstrict-enums; one could call test_switch(static_cast<E>(9))
and it should return false in that case.

[pushed] Restrict gcc.dg/rtl/aarch64/pr113295-1.c to aarch64

2024-02-24 Thread Richard Sandiford
I keep forgetting that gcc.dg/rtl is the one testsuite where
tests in target-specific subdirectories aren't automatically
restricted to that target.

Pushed as obvious after testing on aarch64-linux-gnu & x86_64-linux-gnu.

Richard


gcc/testsuite/
	* gcc.dg/rtl/aarch64/pr113295-1.c: Restrict to aarch64*-*-*.
---
 gcc/testsuite/gcc.dg/rtl/aarch64/pr113295-1.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/rtl/aarch64/pr113295-1.c 
b/gcc/testsuite/gcc.dg/rtl/aarch64/pr113295-1.c
index 481fb813f61..bf6c5d1f256 100644
--- a/gcc/testsuite/gcc.dg/rtl/aarch64/pr113295-1.c
+++ b/gcc/testsuite/gcc.dg/rtl/aarch64/pr113295-1.c
@@ -1,5 +1,5 @@
+// { dg-do run { target aarch64*-*-* } }
 // { dg-options "-O2" }
-// { dg-do run }
 
 struct data {
   double x;
-- 
2.25.1



[PATCH] vect: Tighten check for impossible SLP layouts [PR113205]

2024-02-24 Thread Richard Sandiford
During its forward pass, the SLP layout code tries to calculate
the cost of a layout change on an incoming edge.  This is taken
as the minimum of two costs: one in which the source partition
keeps its current layout (chosen earlier during the pass) and
one in which the source partition switches to the new layout.
The latter can sometimes be arranged by the backward pass.

If only one of the costs is valid, the other cost was ignored.
But the PR shows that this is not safe.  If the source partition
has layout 0 (the normal layout), we have to be prepared to handle
the case in which that ends up being the only valid layout.

Other code already accounts for this restriction, e.g. see
the code starting with:

/* Reject the layout if it would make layout 0 impossible
   for later partitions.  This amounts to testing that the
   target supports reversing the layout change on edges
   to later partitions.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Richard


gcc/
PR tree-optimization/113205
* tree-vect-slp.cc (vect_optimize_slp_pass::forward_cost): Reject
the proposed layout if it does not allow a source partition with
layout 2 to keep that layout.

gcc/testsuite/
PR tree-optimization/113205
* gcc.dg/torture/pr113205.c: New test.
---
 gcc/testsuite/gcc.dg/torture/pr113205.c | 19 +++
 gcc/tree-vect-slp.cc|  4 
 2 files changed, 23 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/torture/pr113205.c

diff --git a/gcc/testsuite/gcc.dg/torture/pr113205.c 
b/gcc/testsuite/gcc.dg/torture/pr113205.c
new file mode 100644
index 000..edfba7fcd0e
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr113205.c
@@ -0,0 +1,19 @@
+char a;
+char *b, *c;
+int d, e, f, g, h;
+int *i;
+
+void
+foo (void)
+{
+  unsigned p;
+  d = i[0];
+  e = i[1];
+  f = i[2];
+  g = i[3];
+  p = d * b[0];
+  p += f * c[h];
+  p += e * b[h];
+  p += g * c[h];
+  a = (p + 8000) >> (__SIZEOF_INT__ * __CHAR_BIT__ / 2);
+}
diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
index 7cf9504398c..895f4f7fb6b 100644
--- a/gcc/tree-vect-slp.cc
+++ b/gcc/tree-vect-slp.cc
@@ -5034,6 +5034,10 @@ vect_optimize_slp_pass::forward_cost (graph_edge *ud, 
unsigned int from_node_i,
   cost.split (from_partition.out_degree);
   cost.add_serial_cost (edge_cost);
 }
+  else if (from_partition.layout == 0)
+/* We must allow the source partition to have layout 0 as a fallback,
+   in case all other options turn out to be impossible.  */
+return cost;
 
   /* Take the minimum of that cost and the cost that applies if
  FROM_PARTITION instead switches to TO_LAYOUT_I.  */
-- 
2.25.1



[Bug middle-end/114086] New: Boolean switches could have a lot better codegen, possibly utilizing bit-vectors

2024-02-24 Thread janschultke at googlemail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114086

Bug ID: 114086
   Summary: Boolean switches could have a lot better codegen,
possibly utilizing bit-vectors
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: janschultke at googlemail dot com
  Target Milestone: ---

https://godbolt.org/z/3acqbbn3E

enum struct E { a, b, c, d, e, f, g, h };

bool test_switch(E e) {
switch (e) {
case E::a:
case E::c:
case E::e:
case E::g: return true;
default: return false;
}
}


Expected output
===

test_switch(E):
  mov eax, edi
  and eax, 1
  ret



Actual output (-O3)
===

test_switch(E):
  xor eax, eax
  cmp edi, 6
  ja .L1
  mov eax, 85
  bt rax, rdi
  setc al
.L1:
  ret


Explanation
===

Boolean switches in general can be optimized a lot better than what GCC
currently does. Clang does find the optimization to a bitwise AND, although
this may be a big ask.

Generally, contiguous boolean switches (that is, switch statements where all
cases yield a boolean value and the labels are contiguous) can be optimized to
accessing a bit vector.

That switch could have been transformed into:

> return 0b01010101 >> int(e);
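As a generic illustration of that idea (bitvec_switch is a hypothetical helper,
not part of this report), the lowering for up to 64 contiguous labels amounts
to a guarded bit-test:

bool bitvec_switch(unsigned value, unsigned num_cases, unsigned long long mask) {
    // Out-of-range values take the default (false) path; in-range values
    // index into the bit vector.
    return value < num_cases && ((mask >> value) & 1);
}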

[Bug target/114083] Possible word play on conditional/unconditional

2024-02-24 Thread schwab--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114083

--- Comment #5 from Andreas Schwab  ---
Enable conditional-move operations even if unsupported by hardware.

[Bug middle-end/114084] ICE: SIGSEGV: infinite recursion in fold_build2_loc / fold_binary_loc with _BitInt(127)

2024-02-24 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114084

--- Comment #5 from Jakub Jelinek  ---
Or perhaps the
  if (ok
  && ((var0 != 0) + (var1 != 0)
  + (minus_var0 != 0) + (minus_var1 != 0)
  + (con0 != 0) + (con1 != 0)
  + (minus_con0 != 0) + (minus_con1 != 0)
  + (lit0 != 0) + (lit1 != 0)
  + (minus_lit0 != 0) + (minus_lit1 != 0)) > 2)
condition should be amended to avoid the reassociation in cases where clearly
nothing good can come out of that.  Which is if the association actually
doesn't reshuffle anything.  (var0 == 0) || (var1 == 0) && (and similarly for
the other 5 pairs) and
(ignoring the minus_* stuff that would need more thoughts on it)
(con0 != 0 && lit0 != 0) || (con1 != 0 && lit1 != 0),
then it reassociates to the original stuff in op0 and original stuff in op1,
no change.  But how the minus_* plays together with this is harder.
Perhaps if lazy we could have a bool var whether there has been any association
between subtrees from original op0 and op1, initially set to false and set if
we associate_trees between something that comes from op0 and op1, and only do
the final
associate_trees if that is the case, because if not, it should be folding of
the individual suboperands, not reassociation.

[Bug middle-end/114084] ICE: SIGSEGV: infinite recursion in fold_build2_loc / fold_binary_loc with _BitInt(127)

2024-02-24 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114084

--- Comment #4 from Jakub Jelinek  ---
Though, I must say I'm not really sure why this wouldn't recurse infinitely
even without the casts.

[Bug middle-end/114084] ICE: SIGSEGV: infinite recursion in fold_build2_loc / fold_binary_loc with _BitInt(127)

2024-02-24 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114084

--- Comment #3 from Jakub Jelinek  ---
Bet the associate code is really unprepared to have unfolded trees around,
which wasn't the case before delayed folding was introduced into the C and
C++ FEs.
Unfortunately the delaying isn't complete, because e.g. convert_to_integer_1
-> do_narrow -> fold_build2_loc happily folds.

Anyway, a quick fix could be to not try to reassociate TREE_CONSTANT parts:
--- gcc/fold-const.cc.jj2024-01-26 00:07:58.0 +0100
+++ gcc/fold-const.cc   2024-02-24 09:38:40.150808529 +0100
@@ -908,6 +908,8 @@ split_tree (tree in, tree type, enum tre
   if (TREE_CODE (in) == INTEGER_CST || TREE_CODE (in) == REAL_CST
   || TREE_CODE (in) == FIXED_CST)
 *litp = in;
+  else if (TREE_CONSTANT (in))
+*conp = in;
   else if (TREE_CODE (in) == code
   || ((! FLOAT_TYPE_P (TREE_TYPE (in)) || flag_associative_math)
   && ! SAT_FIXED_POINT_TYPE_P (TREE_TYPE (in))
@@ -956,8 +958,6 @@ split_tree (tree in, tree type, enum tre
   if (neg_var_p && var)
*minus_varp = var, var = 0;
 }
-  else if (TREE_CONSTANT (in))
-*conp = in;
   else if (TREE_CODE (in) == BIT_NOT_EXPR
   && code == PLUS_EXPR)
 {

So, the problem happens on
typedef unsigned _BitInt (__SIZEOF_INT__ * __CHAR_BIT__ - 1) T;
T a, b;

void
foo (void)
{
  b = (T) ((a | (-1U >> 1)) >> 1 | (a | 5) << 4);
}
when fold_binary_loc is called on (unsigned _BitInt(31)) a << 4 | 80 and
(unsigned _BitInt(31)) (2147483647 >> 1), but the important part is that
the op0 has the unsigned _BitInt(31) type, while op1 is NOP_EXPR to that type
from
RSHIFT_EXPR done on T type (the typedef).
Soon BIT_IOR_EXPR folding is called on
(unsigned _BitInt(31)) a << 4 and 2147483647 >> 1 | 80 where the latter is all
in T type (fold_binary_loc does STRIP_NOPS).  Because split_tree prefers same
code over TREE_CONSTANT, this splits it into the LSHIFT_EXPR var0, RSHIFT_EXPR
con1 (because it is TREE_CONSTANT) and the T type 80 literal in lit1,
everything else is NULL.  As there are 3 objects, it reassociates.  We first
associate_tree the 0 vs. 1 cases, but that just moves the *1 into *0 because
their counterparts are NULL.
Both the RSHIFT_EXPR and INTEGER_CST 80 have T type but atype is the
build_bitint_type
non-typedef type, so
11835 /* Eliminate lit0 and minus_lit0 to con0 and minus_con0.
*/
11836 con0 = associate_trees (loc, con0, lit0, code, atype);
returns NOP_EXPR of the RSHIFT_EXPR | INTEGER_CST.
And then we associate_trees the LSHIFT_EXPR with this result and so it recurses
infinitely.

Perhaps my above patch is an improvement, if we know some subtree is
TREE_CONSTANT, all we need is just wait for it to be constant folded (not sure
it would always do e.g. because of division by zero or similar) trying to
reassociate its parts with other expressions might just split the constants to
other spots instead of keeping it together.