[Bug c/114088] Please provide __builtin_c16slen and __builtin_c32slen to complement __builtin_wcslenw
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114088 --- Comment #3 from Thiago Macieira ---
> But __builtin_strlen *does* get optimized when the input is a string literal.
> Not sure about wcslen though.

It appears not to, in the test above. std::char_traits::length() calls wcslen() whereas the char specialisation uses __builtin_strlen() explicitly. But if the intrinsics are enabled, the two would be the same, wouldn't they? Anyway, in the absence of a library function to call, inserting the loop is fine; it's what is there already. Though it would be nice to be able to provide such a function. I wrote it for Qt (it's called qustrlen). I would try with __builtin_constant_p first to see if the string is a literal.
[Bug c/114088] Please provide __builtin_c16slen and __builtin_c32slen to complement __builtin_wcslenw
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114088 Xi Ruoyao changed: What|Removed |Added CC||xry111 at gcc dot gnu.org --- Comment #2 from Xi Ruoyao --- (In reply to Jonathan Wakely from comment #1) > GCC built-ins like __builtin_strlen just wrap a libc function. > __builtin_wcslen would generally just be a call to wcslen, which doesn't give > you much. But __builtin_strlen *does* get optimized when the input is a string literal. Not sure about wcslen though.
[Bug target/100799] Stackoverflow in optimized code on PPC
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100799 --- Comment #27 from Peter Bergner --- (In reply to Jakub Jelinek from comment #26) > But I still think the workaround is possible on the callee side. > Sure, if the DECL_HIDDEN_STRING_LENGTH argument(s) is(are) used in the > function, then there is no easy way but expect the parameter save area (ok, > sure, it could just load from the assumed parameter location and don't > assume the rest is there, nor allow storing to the slots it loaded them > from). > But that is actually not what BLAS etc. suffers from. [snip] > So, the workaround could be for the case of unused DECL_HIDDEN_STRING_LENGTH > arguments at the end of PARM_DECLs don't try to load those at all and don't > assume there is parameter save area unless the non-DECL_HIDDEN_STRING_LENGTH > or used DECL_HIDDEN_STRING_LENGTH arguments actually require it. So I looked closer at what the failure mode was in this PR (versus the one you're seeing with flexiblas). As in your case, there is a mismatch in the number of parameters the C caller thinks there are (8 args, so no param save area needed) versus what the Fortran callee thinks there are (9 params which include the one hidden arg, so there is a param save area). The Fortran function doesn't actually access the hidden argument in our test case above, in fact the character argument is never used either. 
What I see in the rtl dumps is that *all* incoming args have a REG_EQUIV generated that points to the param save area (this doesn't happen when there are 8 or fewer formal params), even for the first 8 args that are passed in registers: (insn 2 12 3 2 (set (reg/v/f:DI 117 [ r3 ]) (reg:DI 3 3 [ r3 ])) "callee-3.c":6:1 685 {*movdi_internal64} (expr_list:REG_EQUIV (mem/f/c:DI (plus:DI (reg/f:DI 99 ap) (const_int 32 [0x20])) [1 r3+0 S8 A64]) (nil))) (insn 3 2 4 2 (set (reg/v:DI 118 [ r4 ]) (reg:DI 4 4 [ r4 ])) "callee-3.c":6:1 685 {*movdi_internal64} (expr_list:REG_EQUIV (mem/c:DI (plus:DI (reg/f:DI 99 ap) (const_int 40 [0x28])) [2 r4+0 S8 A64]) (nil))) ... We then get to RA and we end up spilling one of the pseudos associated with one of the other parameters (not the character param JOB). LRA then uses that REG_EQUIV note and rather than allocating a new stack slot to spill to, it uses the parameter save memory location for that parameter for the spill slot. When we store to that memory location and the C caller has not allocated the param save area, we end up clobbering an important part of the C callers stack causing a crash. If we were to try and do a callee workaround, we would need to disable setting those REG_EQUIV notes for the parameters... if that's even possible. Since Fortran uses call-by-name parameter passing, isn't the updated param value from the callee returned in the parameter save area itself??? > Doing the workaround on the caller side is impossible, this is for calls > from C/C++ to Fortran code, directly or indirectly called and there is > nothing the compiler could use to guess that it actually calls Fortran code > with hidden Fortran character arguments. As a HUGE hammer, every caller could always allocate a param save area. That would "fix" the problem from this bug, but would that also fix the bug you're seeing in flexiblas? I'm not advocating this though. I was thinking maybe making callers (under an option?) 
conservatively assume the callee is a Fortran function and for those C arguments that could map to a Fortran parameter with a hidden argument, bump the number of counted args by 1. For example, a C function with 2 char/char * args and 6 int args would think there are 8 normal args and 2 hidden args, so it needs to allocate a param save area. Is that not feasible? ...or does that not even address the issue you're seeing in your bug?
gcc-13-20240224 is now available
Snapshot gcc-13-20240224 is now available on https://gcc.gnu.org/pub/gcc/snapshots/13-20240224/ and on various mirrors, see https://gcc.gnu.org/mirrors.html for details. This snapshot has been generated from the GCC 13 git branch with the following options: git://gcc.gnu.org/git/gcc.git branch releases/gcc-13 revision acafe0f9824e77f1259de1e833886003bf8a6864 You'll find: gcc-13-20240224.tar.xz Complete GCC SHA256=3a5aa2c45d30efbe96872d92df85ba26a9c58f0c823cc2867569f06cd606a88f SHA1=95e21fd541e0c5b696f2d02c98e60f0dab460246 Diffs from 13-20240217 are available in the diffs/ subdirectory. When a particular snapshot is ready for public consumption the LATEST-13 link is updated and a message is sent to the gcc list. Please do not use a snapshot before it has been announced that way.
[Bug tree-optimization/114093] New: Canonicalization of `a == -1 || a == 0`
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114093 Bug ID: 114093 Summary: Canonicalization of `a == -1 || a == 0` Product: gcc Version: 14.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: enhancement Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: pinskia at gcc dot gnu.org Target Milestone: --- Take:
```
_Bool f1(int a) { return a == -1 || a == 0; }
_Bool f0(signed a) { a = -a; return a == 1 || a == 0; }
_Bool f(unsigned a) { return a == -1u || a == 0; }
_Bool f3(unsigned a) { a = -a; return a == 1 || a == 0; }
_Bool f2(unsigned a) { return (-a) <= 1; }
```
These should all produce the exact same code as they are all equivalent (if we ignore the (undefined) overflow possibility for f0). This is more about canonicalization than anything else. Though I will note that on the riscv and mips targets, f is worse than the others. LLVM's canonical form seems to be `((unsigned)a) + 1 <= 1`.
[Bug tree-optimization/114092] ADD_OVERFLOW with resulting type of `_Complex unsigned:1` should be reduced to just `(unsigned)(a) <= 1`
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114092 Jakub Jelinek changed: What|Removed |Added CC||jakub at gcc dot gnu.org --- Comment #2 from Jakub Jelinek --- Guess all of .ADD_OVERFLOW (x, 0), .ADD_OVERFLOW (0, x) and .SUB_OVERFLOW (x, 0) to REALPART_EXPR = (type) x and IMAGPART_EXPR to (type) x != x. Just need to figure out for which types it is beneficial and for which it isn't.
[Bug tree-optimization/114092] ADD_OVERFLOW with resulting type of `_Complex unsigned:1` should be reduced to just `(unsigned)(a) <= 1`
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114092 --- Comment #1 from Andrew Pinski --- I should note that LLVM (LLVM does not have __builtin_add_overflow_p) is able to optimize:
```
_Bool f2(int a, struct d b, unsigned _BitInt(1) t) {
  return __builtin_add_overflow(a, 0, &t);
}
```
into f1.
[Bug tree-optimization/114092] New: ADD_OVERFLOW with resulting type of `_Complex unsigned:1` should be reduced to just `(unsigned)(a) <= 1`
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114092 Bug ID: 114092 Summary: ADD_OVERFLOW with resulting type of `_Complex unsigned:1` should be reduced to just `(unsigned)(a) <= 1` Product: gcc Version: 14.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: enhancement Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: pinskia at gcc dot gnu.org Target Milestone: --- Take:
```
struct d { unsigned i:1; };

_Bool f(int a, struct d b) {
  return __builtin_add_overflow_p(a, 0, b.i);
}

_Bool f1(int a, struct d b) {
  return a != 1 && a != 0;
}
```
These 2 functions should produce the same code. Here `a+0` overflows an `unsigned:1` if the value of a is not 0 or 1. We could extend this to any smaller types too if we want.
[Bug target/114091] gcc/config/aarch64/aarch64.cc has code requiring c++14 instead of c++11, so g++14 bootsrap fails in my example context
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114091 --- Comment #2 from Mark Millard --- (In reply to Andrew Pinski from comment #1) > This has already been fixed, over 2 weeks ago. > > >20240114 > > You are using a GCC 14 snapshot from a month ago even. Please try a newer > snapshot before reporting a bug next time. > > *** This bug has been marked as a duplicate of bug 113763 *** Sorry. I was building a FreeBSD port and I'm not a port maintainer, much less one for FreeBSD's lang/gcc14-devel . I've sent the port maintainer a copy of your reply. Thanks.
[Bug target/113763] [14 Regression] build fails with clang++ host compiler because aarch64.cc uses C++14 constexpr.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113763 Andrew Pinski changed: What|Removed |Added CC||markmigm at gmail dot com --- Comment #19 from Andrew Pinski --- *** Bug 114091 has been marked as a duplicate of this bug. ***
[Bug target/114091] gcc/config/aarch64/aarch64.cc has code requiring c++14 instead of c++11, so g++14 bootsrap fails in my example context
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114091 Andrew Pinski changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |DUPLICATE --- Comment #1 from Andrew Pinski --- This has already been fixed, over 2 weeks ago. >20240114 You are using a GCC 14 snapshot from a month ago even. Please try a newer snapshot before reporting a bug next time. *** This bug has been marked as a duplicate of bug 113763 ***
[Bug c++/114091] New: gcc/config/aarch64/aarch64.cc has code requiring c++14 instead of c++11, so g++14 bootsrap fails in my example context
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114091 Bug ID: 114091 Summary: gcc/config/aarch64/aarch64.cc has code requiring c++14 instead of c++11, so g++14 bootsrap fails in my example context Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: markmigm at gmail dot com Target Milestone: --- [I'm not sure where gcc/config/aarch64/aarch64.cc fits in the component alternatives. Feel free to correct that if I got it wrong.] gcc bootstrap is based on c++11, which predates the constructors for pair being constexpr. The gcc/config/aarch64/aarch64.cc specific code can fail because of using pair constructors where constant expressions are required: /wrkdirs/usr/ports/lang/gcc14-devel/work/gcc-14-20240114/gcc/config/aarch64/aarch64.cc:13095:50: error: constexpr variable 'tiles' must be initialized by a constant expression static constexpr std::pair tiles[] = { ^ ~ /wrkdirs/usr/ports/lang/gcc14-devel/work/gcc-14-20240114/gcc/config/aarch64/aarch64.cc:13096:5: note: non-constexpr constructor 'pair' cannot be used in a constant expression { 0xff, 'b' }, ^ This stops the bootstrap in the example context. This is detected when clang is doing the bootstrapping on FreeBSD. For reference: c++ -std=c++11 -fPIC -c -g -DIN_GCC -fno-strict-aliasing -fno-exceptions -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wno-narrowing -Wwrite-strings -Wcast-qual -Wno-format -Wmissing-format-attribute -Wconditionally-supported -Woverloaded-virtual -pedantic -Wno-long-long -Wno-variadic-macros -Wno-overlength-strings -fno-common -DHAVE_CONFIG_H -fPIC -I. -I. -I/wrkdirs/usr/ports/lang/gcc14-devel/work/gcc-14-20240114/gcc -I/wrkdirs/usr/ports/lang/gcc14-devel/work/gcc-14-20240114/gcc/. 
-I/wrkdirs/usr/ports/lang/gcc14-devel/work/gcc-14-20240114/gcc/../include -I/wrkdirs/usr/ports/lang/gcc14-devel/work/gcc-14-20240114/gcc/../libcpp/include -I/wrkdirs/usr/ports/lang/gcc14-devel/work/gcc-14-20240114/gcc/../libcody -I/usr/local/include -I/wrkdirs/usr/ports/lang/gcc14-devel/work/gcc-14-20240114/gcc/../libdecnumber -I/wrkdirs/usr/ports/lang/gcc14-devel/work/gcc-14-20240114/gcc/../libdecnumber/bid -I../libdecnumber -I/wrkdirs/usr/ports/lang/gcc14-devel/work/gcc-14-20240114/gcc/../libbacktrace -DLIBICONV_PLUG -o aarch64.o -MT aarch64.o -MMD -MP -MF ./.deps/aarch64.TPo /wrkdirs/usr/ports/lang/gcc14-devel/work/gcc-14-20240114/gcc/config/aarch64/aarch64.cc

where clang used its libc++ (it's a FreeBSD context):

/usr/include/c++/v1/__utility/pair.h:225:5: note: declared here
  225 |     pair(_U1&& __u1, _U2&& __u2)
      |     ^

having -std=c++11 on the command line. That results in the lack of constexpr status in libc++. It would appear that, until gcc bootstrap is based on c++14 (or later), the gcc/config/aarch64/aarch64.cc code reported here presumes a post-c++11 context when it should not.
New Chinese (simplified) PO file for 'gcc' (version 14.1-b20240218)
Hello, gentle maintainer. This is a message from the Translation Project robot. A revised PO file for textual domain 'gcc' has been submitted by the Chinese (simplified) team of translators. The file is available at: https://translationproject.org/latest/gcc/zh_CN.po (This file, 'gcc-14.1-b20240218.zh_CN.po', has just now been sent to you in a separate email.) All other PO files for your package are available in: https://translationproject.org/latest/gcc/ Please consider including all of these in your next release, whether official or a pretest. Whenever you have a new distribution with a new version number ready, containing a newer POT file, please send the URL of that distribution tarball to the address below. The tarball may be just a pretest or a snapshot, it does not even have to compile. It is just used by the translators when they need some extra translation context. The following HTML page has been updated: https://translationproject.org/domain/gcc.html If any question arises, please contact the translation coordinator. Thank you for all your work, The Translation Project robot, in the name of your translation coordinator.
[Bug tree-optimization/114090] [13/14 Regression] forwprop -fwrapv miscompilation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114090 Jakub Jelinek changed: What|Removed |Added Assignee|unassigned at gcc dot gnu.org |jakub at gcc dot gnu.org Status|NEW |ASSIGNED --- Comment #6 from Jakub Jelinek --- Created attachment 57521 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57521&action=edit gcc14-pr114090.patch Full untested patch.
[Bug c/114088] Please provide __builtin_c16slen and __builtin_c32slen to complement __builtin_wcslenw
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114088 --- Comment #1 from Jonathan Wakely --- GCC built-ins like __builtin_strlen just wrap a libc function. __builtin_wcslen would generally just be a call to wcslen, which doesn't give you much. I assume what you want is to recognize wcslen and replace it with inline assembly code. Similarly, if libc doesn't provide c16slen then a __builtin_c16slen isn't going to do much. I think what you want is better code for finding char16_t(0) or char32_t(0), not a new built-in.
[Bug tree-optimization/114090] [13/14 Regression] forwprop -fwrapv miscompilation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114090 Jakub Jelinek changed: What|Removed |Added Priority|P3 |P2 --- Comment #5 from Jakub Jelinek --- && !TYPE_OVERFLOW_SANITIZED (type) is IMHO not needed, because both transformations for INT_MIN trigger UB before and after.
[Bug fortran/66499] Letters with accents change format behavior for X and T descriptors.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66499 --- Comment #7 from Jerry DeLisle --- There are two issues going on here. We do not interpret source code that is UTF-8 encoded. This is why in our current tests for UTF-8 encoding of data files we use hexadecimal codes. I will have to see what the standard says about non-ASCII character sets in source code. If I get around this by using something like this:

char1 = 4_"Test without local char"
char2 = 4_"Test with local char "
char2(22:22) = 4_"Ã"
char2(23:23) = 4_"Ã"

$ ./a.out
23 23
1234567890123456789012345678901234567890
Test without local char 10.
Test with local char ÃÃ10.

The string lengths now match correctly. One can see the tabbing is still off. This is because the format buffer seek functions are byte oriented and when using UTF-8 encoding we need to seek the buffer differently. In fact we have to allocate it differently as well to maintain the four-byte characters.
[Bug tree-optimization/114090] [13/14 Regression] forwprop -fwrapv miscompilation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114090 Jakub Jelinek changed: What|Removed |Added CC||jakub at gcc dot gnu.org --- Comment #4 from Jakub Jelinek --- I'd go with

--- gcc/match.pd.jj	2024-02-22 10:09:48.678446435 +0100
+++ gcc/match.pd	2024-02-24 19:23:32.201014245 +0100
@@ -453,8 +453,9 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 /* (x >= 0 ? x : 0) + (x <= 0 ? -x : 0) -> abs x. */
 (simplify
- (plus:c (max @0 integer_zerop) (max (negate @0) integer_zerop))
- (abs @0))
+ (plus:c (max @0 integer_zerop) (max (negate @0) integer_zerop))
+ (if (ANY_INTEGRAL_TYPE_P (type) && TYPE_OVERFLOW_UNDEFINED (type))
+  (abs @0)))

 /* X * 1, X / 1 -> X. */
 (for op (mult trunc_div ceil_div floor_div round_div exact_div)
@@ -4218,8 +4219,9 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 /* (x <= 0 ? -x : 0) -> max(-x, 0). */
 (simplify
- (cond (le @0 integer_zerop@1) (negate@2 @0) integer_zerop@1)
- (max @2 @1))
+ (cond (le @0 integer_zerop@1) (negate@2 @0) integer_zerop@1)
+ (if (ANY_INTEGRAL_TYPE_P (type) && TYPE_OVERFLOW_UNDEFINED (type))
+  (max @2 @1)))

 /* (zero_one == 0) ? y : z y -> ((typeof(y))zero_one * z) y */
 (for op (bit_xor bit_ior plus)
[Bug tree-optimization/114090] [13/14 Regression] forwprop -fwrapv miscompilation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114090 --- Comment #3 from Jakub Jelinek --- Both patterns look wrong for TYPE_OVERFLOW_WRAPS, and the first one also for TYPE_UNSIGNED (the second one is ok for TYPE_UNSIGNED but doesn't make much sense there; we should have folded it to 0). Of course, the first one is unlikely to trigger for TYPE_UNSIGNED because MAX should have been folded to 0.
[Bug tree-optimization/114090] [13/14 Regression] forwprop -fwrapv miscompilation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114090 Andrew Pinski changed: What|Removed |Added Last reconfirmed||2024-02-24 Status|UNCONFIRMED |NEW Ever confirmed|0 |1 --- Comment #2 from Andrew Pinski --- There most likely should be this check added: ANY_INTEGRAL_TYPE_P (type) && !TYPE_OVERFLOW_WRAPS (type)
[Bug tree-optimization/114090] [13/14 Regression] forwprop -fwrapv miscompilation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114090 Andrew Pinski changed: What|Removed |Added See Also||https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94920 Keywords||wrong-code Target Milestone|--- |13.3 Summary|forwprop -fwrapv|[13/14 Regression] forwprop |miscompilation |-fwrapv miscompilation --- Comment #1 from Andrew Pinski --- The pattern:

/* (x >= 0 ? x : 0) + (x <= 0 ? -x : 0) -> abs x. */
(simplify
 (plus:c (max @0 integer_zerop) (max (negate @0) integer_zerop))
 (abs @0))

was introduced by r13-1785-g633e9920589ddf.
[Bug tree-optimization/114090] New: forwprop -fwrapv miscompilation
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114090 Bug ID: 114090 Summary: forwprop -fwrapv miscompilation Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: kristerw at gcc dot gnu.org Target Milestone: --- The function f below returns an incorrect result for INT_MIN when compiled with -O1 -fwrapv for x86_64:

__attribute__((noipa)) int f(int x)
{
  int w = (x >= 0 ? x : 0);
  int y = -x;
  int z = (y >= 0 ? y : 0);
  return w + z;
}

int main ()
{
  if (f(0x80000000) != 0)
    __builtin_abort ();
  return 0;
}

What is happening is that forwprop has optimized

  w_2 = MAX_EXPR <x_1(D), 0>;
  y_3 = -x_1(D);
  z_4 = MAX_EXPR <y_3, 0>;
  _5 = w_2 + z_4;
  return _5;

to

  _5 = ABS_EXPR <x_1(D)>;
  return _5;
[Bug testsuite/114089] FAIL: gcc.dg/rtl/aarch64/pr113295-1.c (test for excess errors)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114089 --- Comment #2 from Jakub Jelinek --- I mean r14-9162, sorry.
[Bug testsuite/114089] FAIL: gcc.dg/rtl/aarch64/pr113295-1.c (test for excess errors)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114089 Jakub Jelinek changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED CC||jakub at gcc dot gnu.org Resolution|--- |FIXED Version|13.2.1 |14.0 --- Comment #1 from Jakub Jelinek --- See r14-9165?
Re: CI for "Option handling: add documentation URLs"
Hi, On Thu, Feb 22, 2024 at 11:57:50AM +0800, YunQiang Su wrote:
> Mark Wielaard wrote on Mon, Feb 19, 2024 at 06:58:
> > So, I did try the regenerate-opt-urls locally, and it did generate the
> > attached diff. Which seems to show we really need this automated.
> >
> > Going over the diff. The -Winfinite-recursion in rust does indeed seem
> > new. As do the -mapx-inline-asm-use-gpr32 and mevex512 for i386. And
> > the avr options -mskip-bug, -mflmap and mrodata-in-ram. The change in
> > common.opt.urls for -Wuse-after-free comes from it being moved from
> > c++ to the c-family. The changes in mips.opt.urls seem to come from
> > commit 46df1369 "doc/invoke: Remove duplicate explicit-relocs entry of
> > MIPS".
>
> For MIPS, it's due to malformed patches to invoke.texi.
> I will fix them.

Thanks. So with your commit 00bc8c0998d8 ("invoke.texi: Fix some skipping UrlSuffix problem for MIPS") pushed now, the attached patch fixes the remaining issues. Is this OK to push?

> > The changes in c.opt.urls seem mostly reordering. The sorting makes
> > more sense after the diff imho. And must have come from commit
> > 4666cbde5 "Sort warning options in c-family/c.opt".
> >
> > Also the documentation for -Warray-parameter was fixed.
> >
> > So I think the regenerate-opt-urls check does work as intended. So
> > lets automate it, because it looks like nobody regenerated the
> > url.opts after updating the documentation.
> >
> > But we should first apply this diff. Could you double check it is
> > sane/correct?
> >
> > Thanks,
> >
> > Mark
>
> --
> YunQiang Su

From c019327e919fff87ffa94799e8f521bda707a883 Mon Sep 17 00:00:00 2001
From: Mark Wielaard
Date: Sat, 24 Feb 2024 17:34:05 +0100
Subject: [PATCH] Regenerate opt.urls

There were several commits that didn't regenerate the opt.urls files.
Fixes: 438ef143679e ("rs6000: Neuter option -mpower{8,9}-vector")
Fixes: 50c549ef3db6 ("gccrs: enable -Winfinite-recursion warnings by default")
Fixes: 25bb8a40abd9 ("Move docs for -Wuse-after-free and -Wuseless-cast")
Fixes: 48448055fb70 ("AVR: Support .rodata in Flash for AVR64* and AVR128*")
Fixes: 42503cc257fb ("AVR: Document option -mskip-bug")
Fixes: 7de5bb642c12 ("i386: [APX] Document inline asm behavior and new switch")
Fixes: 49a14ee488b8 ("Add -mevex512 into invoke.texi")
Fixes: 4666cbde5e6d ("Sort warning options in c-family/c.opt.")

gcc/config/
	* rs6000/rs6000.opt.urls: Regenerate.
	* avr/avr.opt.urls: Likewise.
	* i386/i386.opt.urls: Likewise.
	* pru/pru.opt.urls: Likewise.
	* riscv/riscv.opt.urls: Likewise.

gcc/rust/
	* lang.opt.urls: Regenerate.

gcc/
	* common.opt.urls: Regenerate.

gcc/c-family/
	* c.opt.urls: Regenerate.
---
 gcc/c-family/c.opt.urls           | 351 +++---
 gcc/common.opt.urls               |   4 +-
 gcc/config/avr/avr.opt.urls       |   9 +
 gcc/config/i386/i386.opt.urls     |   8 +-
 gcc/config/pru/pru.opt.urls       |   2 +-
 gcc/config/riscv/riscv.opt.urls   |   2 +-
 gcc/config/rs6000/rs6000.opt.urls |   3 -
 gcc/rust/lang.opt.urls            |   3 +
 8 files changed, 200 insertions(+), 182 deletions(-)

diff --git a/gcc/c-family/c.opt.urls b/gcc/c-family/c.opt.urls
index 5365c8e2bc54..9f97dc61a778 100644
--- a/gcc/c-family/c.opt.urls
+++ b/gcc/c-family/c.opt.urls
@@ -88,6 +88,9 @@ UrlSuffix(gcc/Warning-Options.html#index-Wabsolute-value)
 Waddress
 UrlSuffix(gcc/Warning-Options.html#index-Waddress)

+Waddress-of-packed-member
+UrlSuffix(gcc/Warning-Options.html#index-Waddress-of-packed-member)
+
 Waligned-new
 UrlSuffix(gcc/C_002b_002b-Dialect-Options.html#index-Waligned-new)

@@ -115,6 +118,9 @@ UrlSuffix(gcc/Warning-Options.html#index-Walloc-zero)
 Walloca-larger-than=
 UrlSuffix(gcc/Warning-Options.html#index-Walloca-larger-than_003d)
 LangUrlSuffix_D(gdc/Warnings.html#index-Walloca-larger-than)

+Warith-conversion
+UrlSuffix(gcc/Warning-Options.html#index-Warith-conversion)
+
 Warray-bounds=
 UrlSuffix(gcc/Warning-Options.html#index-Warray-bounds)

@@ -122,13 +128,10 @@ Warray-compare
 UrlSuffix(gcc/Warning-Options.html#index-Warray-compare)

 Warray-parameter
-UrlSuffix(gcc/Warning-Options.html#index-Wno-array-parameter)
+UrlSuffix(gcc/Warning-Options.html#index-Warray-parameter)

 Warray-parameter=
-UrlSuffix(gcc/Warning-Options.html#index-Wno-array-parameter)
-
-Wzero-length-bounds
-UrlSuffix(gcc/Warning-Options.html#index-Wzero-length-bounds)
+UrlSuffix(gcc/Warning-Options.html#index-Warray-parameter)

 Wassign-intercept
 UrlSuffix(gcc/Objective-C-and-Objective-C_002b_002b-Dialect-Options.html#index-Wassign-intercept)
@@ -148,9 +151,6 @@ UrlSuffix(gcc/Warning-Options.html#index-Wbool-compare)
 Wbool-operation
 UrlSuffix(gcc/Warning-Options.html#index-Wbool-operation)

-Wframe-address
-UrlSuffix(gcc/Warning-Options.html#index-Wframe-address)
-
 Wbuiltin-declaration-mismatch
 UrlSuffix(gcc/Warning-Options.html#index-Wbuiltin-declaration-mismatch)
[Bug testsuite/114089] New: FAIL: gcc.dg/rtl/aarch64/pr113295-1.c (test for excess errors)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114089 Bug ID: 114089 Summary: FAIL: gcc.dg/rtl/aarch64/pr113295-1.c (test for excess errors) Product: gcc Version: 13.2.1 Status: UNCONFIRMED Severity: normal Priority: P3 Component: testsuite Assignee: unassigned at gcc dot gnu.org Reporter: danglin at gcc dot gnu.org CC: rsandifo at gcc dot gnu.org Target Milestone: --- Host: hppa64-hp-hpux11.11 Target: hppa64-hp-hpux11.11 Build: hppa64-hp-hpux11.11 This test fails on hppa64-hp-hpux11.11. Test lacks "target aarch64*-*-*" restriction.
[Bug middle-end/114087] RISC-V optimization on checking certain bits set ((x & mask) == val)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114087 Andrew Pinski changed: What|Removed |Added Severity|normal |enhancement Keywords||missed-optimization Component|rtl-optimization|middle-end
[Bug c/114088] Please provide __builtin_c16slen and __builtin_c32slen to complement __builtin_wcslenw
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114088 Andrew Pinski changed: What|Removed |Added Severity|normal |enhancement
[Bug c/114088] New: Please provide __builtin_c16slen and __builtin_c32slen to complement __builtin_wcslenw
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114088 Bug ID: 114088 Summary: Please provide __builtin_c16slen and __builtin_c32slen to complement __builtin_wcslenw Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: c Assignee: unassigned at gcc dot gnu.org Reporter: thiago at kde dot org Target Milestone: --- Actually, GCC doesn't have __builtin_wcslen, but Clang does. Providing these extra two builtins would allow implementing __builtin_wcslen too. The names are not part of the C standard, but follow the current naming construction rules for it, similar to how "mbrtowc" and "wcslen" parallel. My specific need is actually to implement char16_t string containers in C++. I'm particularly interested in QString/QStringView, but this applies to std::basic_string{_view} too. For example:

std::string_view f1() { return "Hello"; }
std::wstring_view fw() { return L"Hello"; }
std::u16string_view f16() { return u"Hello"; }
std::u32string_view f32() { return U"Hello"; }

With GCC and libstdc++, the first function produces optimal code:

        movl    $5, %eax
        leaq    .LC0(%rip), %rdx
        ret

For the wchar_t case, GCC emits an out-of-line call to wcslen:

        pushq   %rbx
        leaq    .LC2(%rip), %rbx
        movq    %rbx, %rdi
        call    wcslen@PLT
        movq    %rbx, %rdx
        popq    %rbx
        ret

The next two, because of the absence of a C library function, emit a loop:

        xorl    %eax, %eax
        leaq    .LC1(%rip), %rcx
.L4:
        incq    %rax
        cmpw    $0, (%rcx,%rax,2)
        jne     .L4
        movq    %rcx, %rdx
        ret

Clang, meanwhile, emits optimal code for all four, and so did the pre-Clang Intel compiler. See https://gcc.godbolt.org/z/qvj7qnYbz. MSVC emits optimal code for the char and wchar_t versions, but loops for the other two. Clang gives up when the string gets longer, though. See https://gcc.godbolt.org/z/54j3zr6e6. That indicates that it gave up on guessing the loop run and would do better if the intrinsic were present.
[r14-9155 Regression] FAIL: gcc.dg/rtl/aarch64/pr113295-1.c (test for excess errors) on Linux/x86_64
On Linux/x86_64, 8a16e06da97f51574cfad17e2cece2e58571305d is the first bad commit commit 8a16e06da97f51574cfad17e2cece2e58571305d Author: Richard Sandiford Date: Fri Feb 23 14:12:54 2024 + aarch64: Add missing early-ra bookkeeping [PR113295] caused FAIL: gcc.dg/rtl/aarch64/pr113295-1.c (test for excess errors) with GCC configured with ../../gcc/configure --prefix=/export/users/haochenj/src/gcc-bisect/master/master/r14-9155/usr --enable-clocale=gnu --with-system-zlib --with-demangler-in-ld --with-fpmath=sse --enable-languages=c,c++,fortran --enable-cet --without-isl --enable-libmpx x86_64-linux --disable-bootstrap To reproduce: $ cd {build_dir}/gcc && make check RUNTESTFLAGS="rtl.exp=gcc.dg/rtl/aarch64/pr113295-1.c --target_board='unix{-m32}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="rtl.exp=gcc.dg/rtl/aarch64/pr113295-1.c --target_board='unix{-m32\ -march=cascadelake}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="rtl.exp=gcc.dg/rtl/aarch64/pr113295-1.c --target_board='unix{-m64}'" $ cd {build_dir}/gcc && make check RUNTESTFLAGS="rtl.exp=gcc.dg/rtl/aarch64/pr113295-1.c --target_board='unix{-m64\ -march=cascadelake}'" (Please do not reply to this email, for question about this report, contact me at haochen dot jiang at intel.com.) (If you met problems with cascadelake related, disabling AVX512F in command line might save that.) (However, please make sure that there is no potential problems with AVX512.)
[Bug rtl-optimization/114062] "GNAT BUG DETECTED" 13.2.0 (hppa-linux-gnu) in remove, at alloc-pool.h:437
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114062 John David Anglin changed: What|Removed |Added Status|UNCONFIRMED |RESOLVED Resolution|--- |INVALID --- Comment #4 from John David Anglin --- Not reproducible.
[Bug rtl-optimization/114087] New: RISC-V optimization on checking certain bits set ((x & mask) == val)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114087 Bug ID: 114087 Summary: RISC-V optimization on checking certain bits set ((x & mask) == val) Product: gcc Version: 13.2.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: rtl-optimization Assignee: unassigned at gcc dot gnu.org Reporter: Explorer09 at gmail dot com Target Milestone: --- It might be common in the C family of languages to check if certain bits are set in an integer with a code pattern like this:

```c
unsigned int x;
if ((x & 0x3000) == 0x1000) {
  // Do something...
}
```

And I am surprised that compilers like GCC and Clang didn't realize they can use some bit shifts and inversions of bit masks to save some instructions and emit smaller code. Here I present 3 possible optimizations that could be implemented in a compiler. Two of them can apply not only to RISC-V, but to other RISC architectures as well (except ARM, perhaps). The last one is specific to RISC-V due to the 20-bit immediate operand of its "lui" (load upper immediate) instruction. The bit masks should be compile-time constants, and the "-Os" flag (optimize for size) is assumed.

### Test code

The example code and constants are crafted specifically for RISC-V. Each group of `pred*` functions should behave identically (if not, please let me know; it might be a typo).

* The "a" variants are what I commonly write for checking the set bits.
* The "b" variants are what I believe the compiler should ideally transform the code to. I wrote them to let compiler developers know how the optimization can be done. (But in practice the "b" code might transform to "a", meaning the "optimization" direction is reversed.)
* The "c" variants are hacks to make things work. They contain `__asm__ volatile` directives to force GCC or Clang to optimize in the direction I want. The generated assembly should present what I consider the ideal result.
```c
#include <stdbool.h>
#include <stdint.h>

#define POWER_OF_TWO_FACTOR(x) ((x) & -(x))

// ---
// Example 1: The bitwise AND mask contains lower bits in all ones.
// By converting the bitwise AND into a bitwise OR, an "addi"
// instruction can be saved.
// (This might conflict with optimizations utilizing RISC-V "bclri"
// instruction; use one or the other.)
// (In ARM there are "bic" instructions already, making this
// optimization useless.)
static uint32_t mask1 = 0x5FFF;
static uint32_t val1 = 0x14501DEF;
// static_assert((mask1 & val1) == val1);
// static_assert((mask1 & 0xFFF) == 0xFFF);

bool pred1a(uint32_t x) {
  return ((x & mask1) == val1);
}

bool pred1b(uint32_t x) {
  return ((x | ~mask1) == (val1 | ~mask1));
}

bool pred1c(uint32_t x) {
  register uint32_t temp = x | ~mask1;
  __asm__ volatile ("" : "+r" (temp));
  return (temp == (val1 | ~mask1));
}

// ---
// Example 2: The bitwise AND mask could fit an 11-bit immediate
// operand of RISC-V "andi" instruction with a help of right
// shifting. (Keep the sign bit of the immediate operand zero.)
// (This kind of optimization could also work with other RISC
// architectures, except ARM.)
static uint32_t mask2 = 0x5550;
static uint32_t val2 = 0x1450;
// static_assert(mask2 != 0);
// static_assert((mask2 & val2) == val2);
// static_assert(mask2 / POWER_OF_TWO_FACTOR(mask2) <= 0x7FF);

bool pred2a(uint32_t x) {
  return ((x & mask2) == val2);
}

bool pred2b(uint32_t x) {
  uint32_t factor = POWER_OF_TWO_FACTOR(mask2);
  return ((x >> __builtin_ctz(factor)) & (mask2 / factor)) == (val2 / factor);
}

bool pred2c(uint32_t x) {
  uint32_t factor = POWER_OF_TWO_FACTOR(mask2);
  register uint32_t temp = x >> 20;
  __asm__ volatile ("" : "+r" (temp));
  return (temp & 0x555) == 0x145;
}

// ---
// Example 3: The bitwise AND mask could fit a 20-bit immediate
// operand of RISC-V "lui" instruction.
// Only RISC-V has this 20-bit immediate "U-type" format, AFAIK.
static uint32_t mask3 = 0x0005;
static uint32_t val3 = 0x00045014;
// static_assert(mask3 / POWER_OF_TWO_FACTOR(mask3) <= 0xF);

bool pred3a(uint32_t x) {
  return ((x & mask3) == val3);
}

bool pred3b(uint32_t x) {
  uint32_t factor = POWER_OF_TWO_FACTOR(mask3);
  return (((x / factor) << 12) & ((mask3 / factor) << 12))
         == ((val3 / factor) << 12);
}

bool pred3c(uint32_t x) {
  uint32_t factor = POWER_OF_TWO_FACTOR(mask3);
  register uint32_t temp = x << 12;
  __asm__ volatile ("" : "+r" (temp));
  return (temp & ((mask3 / factor) << 12)) == ((val3 / factor) << 12);
}
```

I tested the code in the Compiler Explorer (godbolt.org).

### Generated assembly (for reference only)

```
pred1a:
        li      a5,1431658496
        addi    a5,a5,-1
        and     a0,a0,a5
        li      a5,340795392
```
[Bug sanitizer/97696] ICE since ASAN_MARK does not handle poly_int sized varibales
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97696

--- Comment #3 from Richard Sandiford ---
Created attachment 57520
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57520&action=edit
Candidate patch

The attached patch seems to fix it. I'm taking next week off, but I'll run
the patch through proper testing when I get back.
[Bug sanitizer/97696] ICE since ASAN_MARK does not handle poly_int sized varibales
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97696

Richard Sandiford changed:

           What    |Removed                       |Added
---------------------------------------------------------------------------
                 CC|                              |rsandifo at gcc dot gnu.org
             Status|NEW                           |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org |rsandifo at gcc dot gnu.org
Re: [PATCH RFA] build: drop target libs from LD_LIBRARY_PATH [PR105688]
On Feb 23, 2024, Jason Merrill wrote:

> The problem, as you say, comes when you want to both bootstrap and
> build tools that aren't involved in the bootstrap process.

It's more visible there, because those don't actively refrain from linking
dynamically with libstdc++. But even bootstrapped matter that involves
exceptions would have to link in libgcc_s, and that would bring about the
same sort of issue.

> To support that perhaps we want POSTBOOTSTRAP_HOST_EXPORTS for host
> modules without the bootstrap tag, and add the TARGET_LIB_PATH
> directories there?

That would be welcome, but it doesn't really address the problem, does it?
Namely, the problem that we may face two different kinds of scenarios, and
each one calls for an opposite solution.

1. system tools used in the build depend on system libraries that are newer
than the ones we're about to build

This is the scenario that you're attempting to address with these patches.
The problem here is that the libraries being built are older than the
system libraries, and system tools won't run if dynamically linked with the
older libraries about to be built.

2. the libraries we're about to build are newer than corresponding system
libraries, if any

This is the scenario that the current build system aims for. Any build
tools that rely on older system libraries are likely to work just as well
with the newly built libraries. Any newly built libraries linked into
programs used and run as part of the build have to be present in
LD_LIBRARY_PATH lest we end up trying to use the older system libraries,
which may seem to work in some settings, but is bound to break if the
differences are large enough.

For maximum clarity, consider a bootstrap with LTO using a linker plugin.
The linker plugin is built against the newly-built libraries. The linker
that attempts to load the plugin also requires the same libraries.

Do you see how tending to 1. breaks 2., and vice-versa?
Now add ASAN to the picture, for another set of newly-built libraries used
during bootstrap. Also use a prebuilt linker with ASAN enabled, for maximum
clarity of the problem I'm getting at. Do you see the problem?

Do you agree that patching the build system to solve a problem in scenario
1. *will* cause problems in scenario 2., *unless* the fix can distinguish
the two scenarios and behave accordingly, but that getting that right is
tricky and error prone?

Do you agree that, until we get there, it's probably better to optimize for
the more common scenario? Do you agree that the more common scenario is
probably 2.?

Do you agree that, until we get a solution that works for both 1. and 2.
automatically, offering a reasonably simple workaround for 1., while aiming
to work for 2., would be a desirable stopgap?

Do you agree that adding support for users to prepend directories to the
search path, enabling them to preempt build libraries with (symlinks to?)
select newer system libraries, and documenting situations in which this
could be needed, is a reasonably simple and desirable stopgap that enables
1. to work while defaulting to the presumed more common case 2.?

Here's a patchlet that shows the crux of what I have in mind (nevermind
that we'd make the change elsewhere, document it further elsewhere, set an
empty default, and arrange for it to be passed down to sub-$(MAKE)s):

diff --git a/Makefile.in b/Makefile.in
index edb0c8a9a427f..10c7646ef98c4 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -771,7 +771,12 @@ TARGET_LIB_PATH_libatomic = $$r/$(TARGET_SUBDIR)/libatomic/.libs:
 # This is the list of directories that may be needed in RPATH_ENVVAR
 # so that programs built for the host machine work.
-HOST_LIB_PATH = $(HOST_LIB_PATH_gmp)$(HOST_LIB_PATH_mpfr)$(HOST_LIB_PATH_mpc)$(HOST_LIB_PATH_isl)
+# Users may set PREEMPT_HOST_LIB_PATH to a directory holding symlinks
+# to system libraries required by build tools (say the linker) that
+# are newer (as in higher-versioned) than the corresponding libraries
+# we're building.  If older libraries were to override the newer
+# system libraries, that could prevent the build tools from running.
+HOST_LIB_PATH = $(PREEMPT_HOST_LIB_PATH):$(HOST_LIB_PATH_gmp)$(HOST_LIB_PATH_mpfr)$(HOST_LIB_PATH_mpc)$(HOST_LIB_PATH_isl)
 
 # Define HOST_LIB_PATH_gcc here, for the sake of TARGET_LIB_PATH, ouch
 @if gcc

Now, for a more general solution that doesn't require user intervention,
configure could go about looking for system libraries in the default search
path, or in RPATH_ENVVAR, that share the soname with those we're about to
build, identify preexisting libraries that are newer than those we're about
to build, populate a build-tree directory with symlinks to them, and
default PREEMPT_HOST_LIB_PATH to that directory.

WDYT?

-- 
Alexandre Oliva, happy hacker            https://FSFLA.org/blogs/lxo/
Free Software Activist                   GNU Toolchain Engineer
More tolerance and less prejudice are key for inclusion and diversity
Excluding neuro-others for
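To make the stopgap concrete, here is a sketch of how a user might populate such a preempt directory. PREEMPT_HOST_LIB_PATH is the variable proposed in the patchlet and does not exist in the build system today; the library names and paths are illustrative, and a fake system libdir stands in for the real one.

```shell
# Sketch only: PREEMPT_HOST_LIB_PATH is a proposed knob, not an existing
# one.  We fake a "system" libstdc++ in a scratch dir to show the symlink
# layout the workaround relies on.
set -e
scratch=$(mktemp -d)
fake_syslib="$scratch/usr/lib"       # stand-in for the real system libdir
preempt="$scratch/preempt-libs"      # directory the user would create
mkdir -p "$fake_syslib" "$preempt"

# Pretend the system ships a libstdc++ newer than the one being built.
touch "$fake_syslib/libstdc++.so.6.0.33"
ln -s libstdc++.so.6.0.33 "$fake_syslib/libstdc++.so.6"

# Symlink the system copy so it preempts the just-built library once the
# directory is prepended to the RPATH_ENVVAR search path.
ln -s "$fake_syslib/libstdc++.so.6" "$preempt/libstdc++.so.6"

# The build would then be started as:
#   make PREEMPT_HOST_LIB_PATH="$preempt"
ls "$preempt"
```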
[Bug tree-optimization/114086] Boolean switches could have a lot better codegen, possibly utilizing bit-vectors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114086

--- Comment #7 from Jakub Jelinek ---
Now, suppose we optimize the (0x >> x) & 1 case etc. provided suitable
range of x to x & 1. For

int bar3 (int e)
{
  if (e <= 15U)
    return e & 1;
  else
    return 0;
}

phiopt optimizes this into return e & 1 & (e <= 15U); so, guess we want
another match.pd optimization which would turn that into e & -15.
[Bug middle-end/113205] [14 Regression] internal compiler error: in backward_pass, at tree-vect-slp.cc:5346 since r14-3220
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113205

Richard Sandiford changed:

           What    |Removed  |Added
----------------------------------------
             Status|ASSIGNED |RESOLVED
         Resolution|---      |FIXED

--- Comment #14 from Richard Sandiford ---
Finally fixed.
[Bug middle-end/113205] [14 Regression] internal compiler error: in backward_pass, at tree-vect-slp.cc:5346 since r14-3220
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113205

--- Comment #13 from GCC Commits ---
The trunk branch has been updated by Richard Sandiford:

https://gcc.gnu.org/g:0394ae31e832c5303f3b4aad9c66710a30c097f0

commit r14-9165-g0394ae31e832c5303f3b4aad9c66710a30c097f0
Author: Richard Sandiford
Date:   Sat Feb 24 11:58:22 2024 +

    vect: Tighten check for impossible SLP layouts [PR113205]

    During its forward pass, the SLP layout code tries to calculate the
    cost of a layout change on an incoming edge.  This is taken as the
    minimum of two costs: one in which the source partition keeps its
    current layout (chosen earlier during the pass) and one in which the
    source partition switches to the new layout.  The latter can
    sometimes be arranged by the backward pass.

    If only one of the costs is valid, the other cost was ignored.  But
    the PR shows that this is not safe.  If the source partition has
    layout 0 (the normal layout), we have to be prepared to handle the
    case in which that ends up being the only valid layout.

    Other code already accounts for this restriction, e.g. see the code
    starting with:

        /* Reject the layout if it would make layout 0 impossible
           for later partitions.  This amounts to testing that the
           target supports reversing the layout change on edges
           to later partitions.

    gcc/
            PR tree-optimization/113205
            * tree-vect-slp.cc (vect_optimize_slp_pass::forward_cost):
            Reject the proposed layout if it does not allow a source
            partition with layout 2 to keep that layout.

    gcc/testsuite/
            PR tree-optimization/113205
            * gcc.dg/torture/pr113205.c: New test.
[Bug tree-optimization/114086] Boolean switches could have a lot better codegen, possibly utilizing bit-vectors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114086

Jakub Jelinek changed:

           What    |Removed    |Added
---------------------------------------------
          Component|middle-end |tree-optimization

--- Comment #6 from Jakub Jelinek ---
(In reply to Jan Schultke from comment #5)
> Well, it's not quite equivalent to either of the bit-shifts we've posted.

The #c4 foo2/bar2 are functionally equivalent to #c4 foo/bar; it is what
gcc actually emits for the latter. x > 6 ? 0 : ((85 >> x) & 1) isn't
functionally equivalent to anything mentioned so far here, as it handles
negative values differently.
[Bug middle-end/113988] during GIMPLE pass: bitintlower: internal compiler error: in lower_stmt, at gimple-lower-bitint.cc:5470
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113988

Bug 113988 depends on bug 114073, which changed state.

Bug 114073 Summary: during GIMPLE pass: bitintlower: internal compiler
error: in lower_stmt, at gimple-lower-bitint.cc:5530
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114073

           What    |Removed  |Added
----------------------------------------
             Status|ASSIGNED |RESOLVED
         Resolution|---      |FIXED
[Bug middle-end/114073] during GIMPLE pass: bitintlower: internal compiler error: in lower_stmt, at gimple-lower-bitint.cc:5530
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114073

Jakub Jelinek changed:

           What    |Removed  |Added
----------------------------------------
             Status|ASSIGNED |RESOLVED
         Resolution|---      |FIXED

--- Comment #3 from Jakub Jelinek ---
Fixed.
[Bug middle-end/114073] during GIMPLE pass: bitintlower: internal compiler error: in lower_stmt, at gimple-lower-bitint.cc:5530
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114073

--- Comment #2 from GCC Commits ---
The master branch has been updated by Jakub Jelinek:

https://gcc.gnu.org/g:5e7a176e88a2a37434cef9b1b6a37a4f8274854a

commit r14-9163-g5e7a176e88a2a37434cef9b1b6a37a4f8274854a
Author: Jakub Jelinek
Date:   Sat Feb 24 12:44:34 2024 +0100

    bitint: Handle VIEW_CONVERT_EXPRs between large/huge BITINT_TYPEs and
    VECTOR/COMPLEX_TYPE etc. [PR114073]

    The following patch implements support for VIEW_CONVERT_EXPRs from/to
    large/huge _BitInt to/from vector or complex types or anything else
    but integral/pointer types which doesn't need to live in memory.

    2024-02-24  Jakub Jelinek

            PR middle-end/114073
            * gimple-lower-bitint.cc (bitint_large_huge::lower_stmt):
            Handle VIEW_CONVERT_EXPRs between large/huge _BitInt and
            non-integer/pointer types like vector or complex types.
            (gimple_lower_bitint): Don't merge VIEW_CONVERT_EXPRs to
            non-integral types.  Fix up VIEW_CONVERT_EXPR handling.
            Allow merging VIEW_CONVERT_EXPR from non-integral/pointer
            types with a store.

            * gcc.dg/bitint-93.c: New test.
Re: [PATCH] Use HOST_WIDE_INT_{C,UC,0,0U,1,1U} macros some more
> Am 24.02.2024 um 08:44 schrieb Jakub Jelinek:
>
> Hi!
>
> I've searched for some uses of (HOST_WIDE_INT) constant or (unsigned
> HOST_WIDE_INT) constant and turned them into uses of the appropriate
> macros.
> There are quite a few cases in non-i386 backends but I've left that out
> for now.
> The only behavior change is in build_replicated_int_cst where the
> left shift was done in HOST_WIDE_INT type but assigned to unsigned
> HOST_WIDE_INT, which I've changed into an unsigned HOST_WIDE_INT shift.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok

Richard

> 2024-02-24  Jakub Jelinek
>
> gcc/
> 	* builtins.cc (fold_builtin_isascii): Use HOST_WIDE_INT_UC macro.
> 	* combine.cc (make_field_assignment): Use HOST_WIDE_INT_1U macro.
> 	* double-int.cc (double_int::mask): Use HOST_WIDE_INT_UC macros.
> 	* genattrtab.cc (attr_alt_complement): Use HOST_WIDE_INT_1 macro.
> 	(mk_attr_alt): Use HOST_WIDE_INT_0 macro.
> 	* genautomata.cc (bitmap_set_bit, CLEAR_BIT): Use HOST_WIDE_INT_1
> 	macros.
> 	* ipa-strub.cc (can_strub_internally_p): Use HOST_WIDE_INT_1 macro.
> 	* loop-iv.cc (implies_p): Use HOST_WIDE_INT_1U macro.
> 	* pretty-print.cc (test_pp_format): Use HOST_WIDE_INT_C and
> 	HOST_WIDE_INT_UC macros.
> 	* rtlanal.cc (nonzero_bits1): Use HOST_WIDE_INT_UC macro.
> 	* tree.cc (build_replicated_int_cst): Use HOST_WIDE_INT_1U macro.
> 	* tree.h (DECL_OFFSET_ALIGN): Use HOST_WIDE_INT_1U macro.
> 	* tree-ssa-structalias.cc (dump_varinfo): Use ~HOST_WIDE_INT_0U
> 	macros.
> 	* wide-int.cc (divmod_internal_2): Use HOST_WIDE_INT_1U macro.
> 	* config/i386/constraints.md (define_constraint "L"): Use
> 	HOST_WIDE_INT_C macro.
> 	* config/i386/i386.md (movabsq split peephole2): Use HOST_WIDE_INT_C
> 	macro.
> 	(movl + movb peephole2): Likewise.
> 	* config/i386/predicates.md (x86_64_zext_immediate_operand): Likewise.
> 	(const_32bit_mask): Likewise.
> gcc/objc/
> 	* objc-encoding.cc (encode_array): Use HOST_WIDE_INT_0 macros.
> > --- gcc/builtins.cc.jj2024-02-06 08:43:14.84351 +0100 > +++ gcc/builtins.cc2024-02-23 22:02:48.245611359 +0100 > @@ -9326,7 +9326,7 @@ fold_builtin_isascii (location_t loc, tr > /* Transform isascii(c) -> ((c & ~0x7f) == 0). */ > arg = fold_build2 (BIT_AND_EXPR, integer_type_node, arg, > build_int_cst (integer_type_node, > -~ (unsigned HOST_WIDE_INT) 0x7f)); > +~ HOST_WIDE_INT_UC (0x7f))); > return fold_build2_loc (loc, EQ_EXPR, integer_type_node, > arg, integer_zero_node); > } > --- gcc/combine.cc.jj2024-01-03 11:51:34.028696534 +0100 > +++ gcc/combine.cc2024-02-23 22:03:36.895923405 +0100 > @@ -9745,7 +9745,7 @@ make_field_assignment (rtx x) > if (width >= HOST_BITS_PER_WIDE_INT) >ze_mask = -1; > else > -ze_mask = ((unsigned HOST_WIDE_INT)1 << width) - 1; > +ze_mask = (HOST_WIDE_INT_1U << width) - 1; > > /* Complete overlap. We can remove the source AND. */ > if ((and_mask & ze_mask) == ze_mask) > --- gcc/double-int.cc.jj2024-01-03 11:51:42.086584698 +0100 > +++ gcc/double-int.cc2024-02-23 22:04:30.586164187 +0100 > @@ -671,14 +671,14 @@ double_int::mask (unsigned prec) > if (prec > HOST_BITS_PER_WIDE_INT) > { > prec -= HOST_BITS_PER_WIDE_INT; > - m = ((unsigned HOST_WIDE_INT) 2 << (prec - 1)) - 1; > + m = (HOST_WIDE_INT_UC (2) << (prec - 1)) - 1; > mask.high = (HOST_WIDE_INT) m; > mask.low = ALL_ONES; > } > else > { > mask.high = 0; > - mask.low = prec ? ((unsigned HOST_WIDE_INT) 2 << (prec - 1)) - 1 : 0; > + mask.low = prec ? 
(HOST_WIDE_INT_UC (2) << (prec - 1)) - 1 : 0; > } > > return mask; > --- gcc/genattrtab.cc.jj2024-01-03 11:51:38.125639672 +0100 > +++ gcc/genattrtab.cc2024-02-23 22:05:38.043210294 +0100 > @@ -2392,7 +2392,7 @@ static rtx > attr_alt_complement (rtx s) > { > return attr_rtx (EQ_ATTR_ALT, XWINT (s, 0), > - ((HOST_WIDE_INT) 1) - XWINT (s, 1)); > + HOST_WIDE_INT_1 - XWINT (s, 1)); > } > > /* Return EQ_ATTR_ALT expression representing set containing elements set > @@ -2401,7 +2401,7 @@ attr_alt_complement (rtx s) > static rtx > mk_attr_alt (alternative_mask e) > { > - return attr_rtx (EQ_ATTR_ALT, (HOST_WIDE_INT) e, (HOST_WIDE_INT) 0); > + return attr_rtx (EQ_ATTR_ALT, (HOST_WIDE_INT) e, HOST_WIDE_INT_0); > } > > /* Given an expression, see if it can be simplified for a particular insn > --- gcc/genautomata.cc.jj2024-01-03 11:51:32.524717408 +0100 > +++ gcc/genautomata.cc2024-02-23 22:07:04.667985357 +0100 > @@ -3416,13 +3416,13 @@ finish_alt_states (void) > > /* Set bit number bitno in the bit string. The macro is not side >effect proof. */ > -#define bitmap_set_bit(bitstring, bitno) \ > +#define bitmap_set_bit(bitstring, bitno) \ > ((bitstring)[(bitno) / (sizeof
[Bug middle-end/114086] Boolean switches could have a lot better codegen, possibly utilizing bit-vectors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114086

--- Comment #5 from Jan Schultke ---
Well, it's not quite equivalent to either of the bit-shifts we've posted.
To account for shifting more than the operand size, it would be:

bool foo (int x)
{
  return x > 6 ? 0 : ((85 >> x) & 1);
}

This is exactly what GCC does, and the branch can be explained by this
range check. So I guess GCC already does optimize this to a bit-vector; it
just doesn't find the optimization to:

bool foo (int x)
{
  return (x & -7) == 0;
}

This is very specific to this particular switch statement though. You could
do better than having a branch if the hardware supported a saturating
shift, but probably not on x86_64.

Nevermind that; if anything, this isn't middle-end.
Re: [PATCH] bitint: Handle VIEW_CONVERT_EXPRs between large/huge BITINT_TYPEs and VECTOR/COMPLEX_TYPE etc. [PR114073]
> Am 24.02.2024 um 08:40 schrieb Jakub Jelinek : > > Hi! > > The following patch implements support for VIEW_CONVERT_EXPRs from/to > large/huge _BitInt to/from vector or complex types or anything else but > integral/pointer types which doesn't need to live in memory. > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? Ok Richard > 2024-02-24 Jakub Jelinek > >PR middle-end/114073 >* gimple-lower-bitint.cc (bitint_large_huge::lower_stmt): Handle >VIEW_CONVERT_EXPRs between large/huge _BitInt and non-integer/pointer >types like vector or complex types. >(gimple_lower_bitint): Don't merge VIEW_CONVERT_EXPRs to non-integral >types. Fix up VIEW_CONVERT_EXPR handling. Allow merging >VIEW_CONVERT_EXPR from non-integral/pointer types with a store. > >* gcc.dg/bitint-93.c: New test. > > --- gcc/gimple-lower-bitint.cc.jj2024-02-23 11:36:06.977015730 +0100 > +++ gcc/gimple-lower-bitint.cc2024-02-23 18:21:09.282751377 +0100 > @@ -5305,27 +5305,21 @@ bitint_large_huge::lower_stmt (gimple *s > else if (TREE_CODE (TREE_TYPE (rhs1)) == BITINT_TYPE > && bitint_precision_kind (TREE_TYPE (rhs1)) >= bitint_prec_large > && (INTEGRAL_TYPE_P (TREE_TYPE (lhs)) > - || POINTER_TYPE_P (TREE_TYPE (lhs > + || POINTER_TYPE_P (TREE_TYPE (lhs)) > + || gimple_assign_rhs_code (stmt) == VIEW_CONVERT_EXPR)) >{ > final_cast_p = true; > - if (TREE_CODE (TREE_TYPE (lhs)) == INTEGER_TYPE > - && TYPE_PRECISION (TREE_TYPE (lhs)) > MAX_FIXED_MODE_SIZE > + if (((TREE_CODE (TREE_TYPE (lhs)) == INTEGER_TYPE > +&& TYPE_PRECISION (TREE_TYPE (lhs)) > MAX_FIXED_MODE_SIZE) > + || (!INTEGRAL_TYPE_P (TREE_TYPE (lhs)) > + && !POINTER_TYPE_P (TREE_TYPE (lhs > && gimple_assign_rhs_code (stmt) == VIEW_CONVERT_EXPR) >{ > /* Handle VIEW_CONVERT_EXPRs to not generally supported > huge INTEGER_TYPEs like uint256_t or uint512_t. These > are usually emitted from memcpy folding and backends > - support moves with them but that is usually it. 
*/ > - if (TREE_CODE (rhs1) == INTEGER_CST) > -{ > - rhs1 = fold_unary (VIEW_CONVERT_EXPR, TREE_TYPE (lhs), > - rhs1); > - gcc_assert (rhs1 && TREE_CODE (rhs1) == INTEGER_CST); > - gimple_assign_set_rhs1 (stmt, rhs1); > - gimple_assign_set_rhs_code (stmt, INTEGER_CST); > - update_stmt (stmt); > - return; > -} > + support moves with them but that is usually it. > + Similarly handle VCEs to vector/complex types etc. */ > gcc_assert (TREE_CODE (rhs1) == SSA_NAME); > if (SSA_NAME_IS_DEFAULT_DEF (rhs1) > && (!SSA_NAME_VAR (rhs1) || VAR_P (SSA_NAME_VAR (rhs1 > @@ -5376,6 +5370,18 @@ bitint_large_huge::lower_stmt (gimple *s >} >} >} > + else if (TREE_CODE (TREE_TYPE (lhs)) == BITINT_TYPE > + && bitint_precision_kind (TREE_TYPE (lhs)) >= bitint_prec_large > + && !INTEGRAL_TYPE_P (TREE_TYPE (rhs1)) > + && !POINTER_TYPE_P (TREE_TYPE (rhs1)) > + && gimple_assign_rhs_code (stmt) == VIEW_CONVERT_EXPR) > +{ > + int part = var_to_partition (m_map, lhs); > + gcc_assert (m_vars[part] != NULL_TREE); > + lhs = build1 (VIEW_CONVERT_EXPR, TREE_TYPE (rhs1), m_vars[part]); > + insert_before (gimple_build_assign (lhs, rhs1)); > + return; > +} > } > if (gimple_store_p (stmt)) > { > @@ -5411,6 +5417,28 @@ bitint_large_huge::lower_stmt (gimple *s > case IMAGPART_EXPR: >lower_cplxpart_stmt (lhs, g); >goto handled; > + case VIEW_CONVERT_EXPR: > +{ > + tree rhs1 = gimple_assign_rhs1 (g); > + rhs1 = TREE_OPERAND (rhs1, 0); > + if (!INTEGRAL_TYPE_P (TREE_TYPE (rhs1)) > + && !POINTER_TYPE_P (TREE_TYPE (rhs1))) > +{ > + tree ltype = TREE_TYPE (rhs1); > + addr_space_t as = TYPE_ADDR_SPACE (TREE_TYPE (lhs)); > + ltype > += build_qualified_type (ltype, > +TYPE_QUALS (TREE_TYPE (lhs)) > +| ENCODE_QUAL_ADDR_SPACE (as)); > + lhs = build1 (VIEW_CONVERT_EXPR, ltype, lhs); > + gimple_assign_set_lhs (stmt, lhs); > + gimple_assign_set_rhs1 (stmt, rhs1); > + gimple_assign_set_rhs_code (stmt, TREE_CODE (rhs1)); > + update_stmt (stmt); > + return; > +} > +} > +break; > default: >break; > } > @@ -6235,6 +6263,14 
@@ gimple_lower_bitint (void) > if (gimple_assign_cast_p (SSA_NAME_DEF_STMT (s))) >{ > tree rhs1 = gimple_assign_rhs1 (SSA_NAME_DEF_STMT (s)); > + if (TREE_CODE (rhs1) == VIEW_CONVERT_EXPR) > +{ > +
Re: [PATCH] vect: Tighten check for impossible SLP layouts [PR113205]
> Am 24.02.2024 um 11:06 schrieb Richard Sandiford:
>
> During its forward pass, the SLP layout code tries to calculate
> the cost of a layout change on an incoming edge. This is taken
> as the minimum of two costs: one in which the source partition
> keeps its current layout (chosen earlier during the pass) and
> one in which the source partition switches to the new layout.
> The latter can sometimes be arranged by the backward pass.
>
> If only one of the costs is valid, the other cost was ignored.
> But the PR shows that this is not safe. If the source partition
> has layout 0 (the normal layout), we have to be prepared to handle
> the case in which that ends up being the only valid layout.
>
> Other code already accounts for this restriction, e.g. see
> the code starting with:
>
>    /* Reject the layout if it would make layout 0 impossible
>       for later partitions. This amounts to testing that the
>       target supports reversing the layout change on edges
>       to later partitions.
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK to install?

Ok

Thanks,
Richard

> Richard
>
> gcc/
> 	PR tree-optimization/113205
> 	* tree-vect-slp.cc (vect_optimize_slp_pass::forward_cost): Reject
> 	the proposed layout if it does not allow a source partition with
> 	layout 2 to keep that layout.
>
> gcc/testsuite/
> 	PR tree-optimization/113205
> 	* gcc.dg/torture/pr113205.c: New test.
> ---
> gcc/testsuite/gcc.dg/torture/pr113205.c | 19 +++
> gcc/tree-vect-slp.cc                    |  4
> 2 files changed, 23 insertions(+)
> create mode 100644 gcc/testsuite/gcc.dg/torture/pr113205.c
>
> diff --git a/gcc/testsuite/gcc.dg/torture/pr113205.c b/gcc/testsuite/gcc.dg/torture/pr113205.c
> new file mode 100644
> index 000..edfba7fcd0e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/torture/pr113205.c
> @@ -0,0 +1,19 @@
> +char a;
> +char *b, *c;
> +int d, e, f, g, h;
> +int *i;
> +
> +void
> +foo (void)
> +{
> +  unsigned p;
> +  d = i[0];
> +  e = i[1];
> +  f = i[2];
> +  g = i[3];
> +  p = d * b[0];
> +  p += f * c[h];
> +  p += e * b[h];
> +  p += g * c[h];
> +  a = (p + 8000) >> (__SIZEOF_INT__ * __CHAR_BIT__ / 2);
> +}
> diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc
> index 7cf9504398c..895f4f7fb6b 100644
> --- a/gcc/tree-vect-slp.cc
> +++ b/gcc/tree-vect-slp.cc
> @@ -5034,6 +5034,10 @@ vect_optimize_slp_pass::forward_cost (graph_edge *ud, unsigned int from_node_i,
>      cost.split (from_partition.out_degree);
>      cost.add_serial_cost (edge_cost);
>    }
> +  else if (from_partition.layout == 0)
> +    /* We must allow the source partition to have layout 0 as a fallback,
> +       in case all other options turn out to be impossible.  */
> +    return cost;
>
>   /* Take the minimum of that cost and the cost that applies if
>      FROM_PARTITION instead switches to TO_LAYOUT_I.  */
> --
> 2.25.1
>
[Bug middle-end/114086] Boolean switches could have a lot better codegen, possibly utilizing bit-vectors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114086

--- Comment #4 from Jakub Jelinek ---
But sure, confirmed for both:

int foo (int e)
{
  switch (e)
    {
    case 1: case 3: case 5: case 7: case 9: case 11: case 13:
      return 1;
    default:
      return 0;
    }
}

int bar (int e)
{
  switch (e)
    {
    case 1: case 3: case 5: case 7: case 9: case 11: case 13: case 15:
      return 1;
    default:
      return 0;
    }
}

where in foo, because we emit the guarding

  cmpl    $13, %edi
  ja      .L1

we could just simplify it to andl $1 when <= 13, and the bar case indeed
can be done by (e & -15) != 0;

Now, the question is if either of these optimizations should be done in the
switch lowering, or if we should do it elsewhere where it would optimize
also hand-written code like that, if the user writes it as

int foo2 (int e)
{
  if (e <= 13U)
    return (10922 >> e) & 1;
  else
    return 0;
}

int bar2 (int e)
{
  if (e <= 15U)
    return (43690 >> e) & 1;
  else
    return 0;
}

Looking at clang, it can optimize bar, but it can't optimize foo (it uses a
switch table rather than a shift, which is worse than what gcc emits), and
it emits pretty much what gcc emits for foo2/bar2. Perhaps phiopt could
handle this for the bar2 case and match.pd using range info for foo2?

Next question is what should be done if the 2 values aren't 1 and 0, but 0
and 1, or some cst and cst + 1 or cst and cst - 1 for some arbitrary
constant cst, or cst and 0, or 0 and cst, or cst1 and cst2, and whether to
emit e.g. a conditional move etc.
RE: [PATCH v1] Internal-fn: Add new internal function SAT_ADDU
Hi Tamar and Richard. Just try DEF_INTERNAL_INT_EXT_FN as below draft patch, not very sure if my understanding is correct(mostly reference the popcount implementation) here. Thanks a lot. https://gcc.gnu.org/pipermail/gcc-patches/2024-February/646442.html Pan -Original Message- From: Tamar Christina Sent: Monday, February 19, 2024 9:05 PM To: Li, Pan2 ; Richard Biener Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; Wang, Yanzhang ; kito.ch...@gmail.com Subject: RE: [PATCH v1] Internal-fn: Add new internal function SAT_ADDU > -Original Message- > From: Li, Pan2 > Sent: Monday, February 19, 2024 12:59 PM > To: Tamar Christina ; Richard Biener > > Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; Wang, Yanzhang > ; kito.ch...@gmail.com > Subject: RE: [PATCH v1] Internal-fn: Add new internal function SAT_ADDU > > Thanks Tamar for comments and explanations. > > > I think we should actually do an indirect optab here, because the IFN can > > be used > > to replace the general representation of saturating arithmetic. > > > e.g. the __builtin_add_overflow case in > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112600 > > is inefficient on all targets and so the IFN can always expand to something > > that's > more > > efficient like the branchless version add_sat2. > > > I think this is why you suggested a new tree code below, but we don't > > really need > > tree-codes for this. It can be done cleaner using the same way as > DEF_INTERNAL_INT_EXT_FN > > Yes, the backend could choose a branchless(of course we always hate branch for > performance) code-gen or even better there is one saturation insn. > Good to learn DEF_INTERNAL_INT_EXT_FN, and will have a try for it. > > > Richard means that there shouldn't be .SAT_ADDU and .SAT_ADDS and that the > sign > > should be determined by the types at expansion time. i.e. there should > > only be > > .SAT_ADD. 
> > Got it, my initial idea comes from that we may have two insns for saturation > add, > mostly these insns need to be signed or unsigned. > For example, slt/sltu in riscv scalar. But I am not very clear about a > scenario like this. > During define_expand in backend, we hit the standard name > sat_add_3 but can we tell it is signed or not here? AFAIK, we only have > QI, HI, > SI and DI. Yeah, the way DEF_INTERNAL_SIGNED_OPTAB_FN works is that you give it two optabs, one for when it's signed and one for when it's unsigned, and the right one is picked automatically during expansion. But in GIMPLE you'd only have one IFN. > Maybe I will have the answer after try DEF_INTERNAL_SIGNED_OPTAB_FN, will > keep you posted. Awesome, Thanks! Tamar > > Pan > > -Original Message- > From: Tamar Christina > Sent: Monday, February 19, 2024 4:55 PM > To: Li, Pan2 ; Richard Biener > Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; Wang, Yanzhang > ; kito.ch...@gmail.com > Subject: RE: [PATCH v1] Internal-fn: Add new internal function SAT_ADDU > > Thanks for doing this! > > > -Original Message- > > From: Li, Pan2 > > Sent: Monday, February 19, 2024 8:42 AM > > To: Richard Biener > > Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; Wang, Yanzhang > > ; kito.ch...@gmail.com; Tamar Christina > > > > Subject: RE: [PATCH v1] Internal-fn: Add new internal function SAT_ADDU > > > > Thanks Richard for comments. > > > > > I'll note that on RTL we already have SS_PLUS/US_PLUS and friends and > > > the corresponding ssadd/usadd optabs. There's not much documentation > > > unfortunately besides the use of gen_*_fixed_libfunc usage where the > comment > > > suggests this is used for fixed-point operations. It looks like arm uses > > > fractional/accumulator modes for this but for example bfin has ssaddsi3. > > > > I find the related description about plus family in GCC internals doc but > > it doesn't > > mention > > anything about mode m here. 
> > > > (plus:m x y) > > (ss_plus:m x y) > > (us_plus:m x y) > > These three expressions all represent the sum of the values represented by x > > and y carried out in machine mode m. They diff er in their behavior on > > overflow > > of integer modes. plus wraps round modulo the width of m; ss_plus saturates > > at the maximum signed value representable in m; us_plus saturates at the > > maximum unsigned value. > > > > > The natural thing is to use direct optab internal functions (that's what > > > you > > > basically did, but you added a new optab, IMO without good reason). > > I think we should actually do an indirect optab here, because the IFN can be > used > to replace the general representation of saturating arithmetic. > > e.g. the __builtin_add_overflow case in > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112600 > is inefficient on all targets and so the IFN can always expand to something > that's > more > efficient like the branchless version add_sat2. > > I think this is why you suggested a new tree code below, but we don't really > need > tree-codes for
[PATCH v2] Draft|Internal-fn: Introduce internal fn saturation US_PLUS
From: Pan Li Hi Richard & Tamar, I tried DEF_INTERNAL_INT_EXT_FN as you suggested, by mapping us_plus$a3 to the RTL representation (us_plus:m x y) in optabs.def and then expanding via expand_US_PLUS in internal-fn.cc. I am not very sure whether my understanding of DEF_INTERNAL_INT_EXT_FN is correct, nor whether we still need DEF_INTERNAL_SIGNED_OPTAB_FN here, given that the RTL representation already has (ss_plus:m x y) and (us_plus:m x y). Note this patch is a draft for validation; no tests are involved here. gcc/ChangeLog: * builtins.def (BUILT_IN_US_PLUS): Add builtin def. (BUILT_IN_US_PLUSIMAX): Ditto. (BUILT_IN_US_PLUSL): Ditto. (BUILT_IN_US_PLUSLL): Ditto. (BUILT_IN_US_PLUSG): Ditto. * config/riscv/riscv-protos.h (riscv_expand_us_plus): Add new func decl for expanding us_plus. * config/riscv/riscv.cc (riscv_expand_us_plus): Add new func impl for expanding us_plus. * config/riscv/riscv.md (us_plus3): Add new pattern impl us_plus3. * internal-fn.cc (expand_US_PLUS): Add new func impl to expand US_PLUS. * internal-fn.def (US_PLUS): Add new INT_EXT_FN. * internal-fn.h (expand_US_PLUS): Add new func decl. * match.pd: Add new simplify pattern for us_plus. * optabs.def (OPTAB_NL): Add new OPTAB_NL to US_PLUS rtl. 
Signed-off-by: Pan Li --- gcc/builtins.def| 7 + gcc/config/riscv/riscv-protos.h | 1 + gcc/config/riscv/riscv.cc | 46 + gcc/config/riscv/riscv.md | 11 gcc/internal-fn.cc | 26 +++ gcc/internal-fn.def | 3 +++ gcc/internal-fn.h | 1 + gcc/match.pd| 17 gcc/optabs.def | 2 ++ 9 files changed, 114 insertions(+) diff --git a/gcc/builtins.def b/gcc/builtins.def index f6f3e104f6a..0777b912cfa 100644 --- a/gcc/builtins.def +++ b/gcc/builtins.def @@ -1055,6 +1055,13 @@ DEF_GCC_BUILTIN(BUILT_IN_POPCOUNTIMAX, "popcountimax", BT_FN_INT_UINTMAX DEF_GCC_BUILTIN(BUILT_IN_POPCOUNTL, "popcountl", BT_FN_INT_ULONG, ATTR_CONST_NOTHROW_LEAF_LIST) DEF_GCC_BUILTIN(BUILT_IN_POPCOUNTLL, "popcountll", BT_FN_INT_ULONGLONG, ATTR_CONST_NOTHROW_LEAF_LIST) DEF_GCC_BUILTIN(BUILT_IN_POPCOUNTG, "popcountg", BT_FN_INT_VAR, ATTR_CONST_NOTHROW_TYPEGENERIC_LEAF) + +DEF_GCC_BUILTIN(BUILT_IN_US_PLUS, "us_plus", BT_FN_INT_UINT, ATTR_CONST_NOTHROW_LEAF_LIST) +DEF_GCC_BUILTIN(BUILT_IN_US_PLUSIMAX, "us_plusimax", BT_FN_INT_UINTMAX, ATTR_CONST_NOTHROW_LEAF_LIST) +DEF_GCC_BUILTIN(BUILT_IN_US_PLUSL, "us_plusl", BT_FN_INT_ULONG, ATTR_CONST_NOTHROW_LEAF_LIST) +DEF_GCC_BUILTIN(BUILT_IN_US_PLUSLL, "us_plusll", BT_FN_INT_ULONGLONG, ATTR_CONST_NOTHROW_LEAF_LIST) +DEF_GCC_BUILTIN(BUILT_IN_US_PLUSG, "us_plusg", BT_FN_INT_VAR, ATTR_CONST_NOTHROW_TYPEGENERIC_LEAF) + DEF_EXT_LIB_BUILTIN(BUILT_IN_POSIX_MEMALIGN, "posix_memalign", BT_FN_INT_PTRPTR_SIZE_SIZE, ATTR_NOTHROW_NONNULL_LEAF) DEF_GCC_BUILTIN(BUILT_IN_PREFETCH, "prefetch", BT_FN_VOID_CONST_PTR_VAR, ATTR_NOVOPS_LEAF_LIST) DEF_LIB_BUILTIN(BUILT_IN_REALLOC, "realloc", BT_FN_PTR_PTR_SIZE, ATTR_ALLOC_WARN_UNUSED_RESULT_SIZE_2_NOTHROW_LEAF_LIST) diff --git a/gcc/config/riscv/riscv-protos.h b/gcc/config/riscv/riscv-protos.h index 80efdf2b7e5..ba6086f1f25 100644 --- a/gcc/config/riscv/riscv-protos.h +++ b/gcc/config/riscv/riscv-protos.h @@ -132,6 +132,7 @@ extern void riscv_asm_output_external (FILE *, const tree, const char *); extern bool riscv_zcmp_valid_stack_adj_bytes_p 
(HOST_WIDE_INT, int); extern void riscv_legitimize_poly_move (machine_mode, rtx, rtx, rtx); +extern void riscv_expand_us_plus (rtx, rtx, rtx); #ifdef RTX_CODE extern void riscv_expand_int_scc (rtx, enum rtx_code, rtx, rtx, bool *invert_ptr = 0); diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc index 4100abc9dd1..23f08974f07 100644 --- a/gcc/config/riscv/riscv.cc +++ b/gcc/config/riscv/riscv.cc @@ -10657,6 +10657,52 @@ riscv_vector_mode_supported_any_target_p (machine_mode) return true; } +/* Emit insn for the saturation addu, aka (x + y) | - ((x + y) < x). */ +void +riscv_expand_us_plus (rtx dest, rtx x, rtx y) +{ + machine_mode mode = GET_MODE (dest); + rtx pmode_sum = gen_reg_rtx (Pmode); + rtx pmode_lt = gen_reg_rtx (Pmode); + rtx pmode_x = gen_lowpart (Pmode, x); + rtx pmode_y = gen_lowpart (Pmode, y); + rtx pmode_dest = gen_reg_rtx (Pmode); + + /* Step-1: sum = x + y */ + if (mode == SImode && mode != Pmode) +{ /* Take addw to avoid the sum truncate. */ + rtx simode_sum = gen_reg_rtx (SImode); + riscv_emit_binary (PLUS, simode_sum, x, y); + emit_move_insn (pmode_sum, gen_lowpart (Pmode, simode_sum)); +} + else +riscv_emit_binary (PLUS, pmode_sum, pmode_x, pmode_y); + + /* Step-1.1: truncate sum for HI and QI as we have no insn for add QI/HI.
[Bug middle-end/114086] Boolean switches could have a lot better codegen, possibly utilizing bit-vectors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114086 --- Comment #3 from Jakub Jelinek --- And the rest boils down to what code to generate for bool foo (int x) { return ((682 >> x) & 1); } Both that and the switch from the #c0 testcase boil down to _1 = 682 >> x_2(D); _3 = (_Bool) _1; or _6 = 682 >> _4; _8 = (_Bool) _6; in the GIMPLE dump. Now, for the foo above, gcc emits movl $682, %eax btl %edi, %eax setc %al ret and clang emits the same: movl $682, %eax # imm = 0x2AA btl %edi, %eax setb %al retq Though, e.g. clang 14 emitted movl %edi, %ecx movl $682, %eax # imm = 0x2AA shrl %cl, %eax andb $1, %al retq which is longer, dunno what is faster.
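The bt instruction tests bit x of the mask directly, which for in-range shift counts is the same predicate as the shift-and-mask form in the GIMPLE above. A small sketch of the equivalence (helper names are mine):

```c
/* Two equivalent ways to test bit x of the constant 682 (0b1010101010,
   bits 1, 3, 5, 7 and 9 set), valid for 0 <= x < 32: shift the mask
   down, or test against a single-bit mask as the x86 bt insn does.  */
static _Bool
bit_via_shift (unsigned x)
{
  return (682u >> x) & 1u;
}

static _Bool
bit_via_bt (unsigned x)
{
  return (682u & (1u << x)) != 0;
}
```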
[Bug middle-end/114086] Boolean switches could have a lot better codegen, possibly utilizing bit-vectors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114086 --- Comment #2 from Jan Schultke --- Yeah right, the actual optimal output (which clang finds) is: > test_switch(E): > test edi, -7 > sete al > ret Testing against -7 (i.e. ~6) also makes sure that inputs of 8 and greater all yield false.
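In two's complement, -7 is ~6, so the test/sete pair computes (e & ~6) == 0: true exactly for 0, 2, 4 and 6, and false for every other 32-bit value, including everything from 8 up. A sketch of the predicate (the function name is mine):

```c
/* clang's output tests e against -7 == ~6: the result is true iff no
   bit outside bits 1 and 2 is set, i.e. e is one of 0, 2, 4, 6.  */
static _Bool
test_switch_mask (unsigned e)
{
  return (e & ~6u) == 0;
}
```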
Re: [PATCH RFA] build: drop target libs from LD_LIBRARY_PATH [PR105688]
Iain Sandoe writes: > Hi Gaius, > >> On 22 Feb 2024, at 18:06, Gaius Mulley wrote: >> >> Iain Sandoe writes: >> >>> Right now, AFAIK the only target runtimes used by host tools are >>> libstdc++, libgcc and libgnat. I agree that might change with rust - >>> since the rust folks are talking about using one of the runtimes in >>> the FE, I am not aware of other language FEs requiring their target >>> runtimes to be available to the host tools (adding Gaius in case I >>> missed something with m2 - which is quite complex in the >>> bootstrapping). > >> the m2 infrastructure translates and builds gcc/m2/gm2-libs along with >> gcc/m2/gm2-compiler and uses these objects for cc1gm2, pge, mc etc - >> rather than the library archives generated from /libgm2 > > If I understand this (and my builds of the m2 stuff) correctly, this is done > locally to the builds of the host-side components; in particular not > controlled > by the top level Makefile.{tpl,def}? Hi Iain, yes indeed, > (so that we do not see builds of libgm2 in stage1/2, but only in the > stage3-target builds? > > in which case, this should be outside the scope of the patch here. regards, Gaius
[Bug rtl-optimization/114085] Internal (cross) compiler error when building libstdc++ for the H8/300 family
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114085 Jonathan Wakely changed: What|Removed |Added Component|libstdc++ |rtl-optimization --- Comment #1 from Jonathan Wakely --- If the compiler crashes then that's a compiler bug, not a library bug. Reassigning to rtl-optimization but that might not be accurate.
[Bug middle-end/114084] ICE: SIGSEGV: infinite recursion in fold_build2_loc / fold_binary_loc with _BitInt(127)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114084 --- Comment #6 from Jakub Jelinek --- As in the following patch, which is supposed to track the origin of the 6 something0 variables in bitmasks, bit 1 means it comes (partly) from op0, bit 2 means it comes (partly) from op1. --- gcc/fold-const.cc.jj2024-02-24 09:49:09.098815803 +0100 +++ gcc/fold-const.cc 2024-02-24 11:01:34.266513041 +0100 @@ -11779,6 +11779,15 @@ fold_binary_loc (location_t loc, enum tr + (lit0 != 0) + (lit1 != 0) + (minus_lit0 != 0) + (minus_lit1 != 0)) > 2) { + int var0_origin = (var0 != 0) + 2 * (var1 != 0); + int minus_var0_origin + = (minus_var0 != 0) + 2 * (minus_var1 != 0); + int con0_origin = (con0 != 0) + 2 * (con1 != 0); + int minus_con0_origin + = (minus_con0 != 0) + 2 * (minus_con1 != 0); + int lit0_origin = (lit0 != 0) + 2 * (lit1 != 0); + int minus_lit0_origin + = (minus_lit0 != 0) + 2 * (minus_lit1 != 0); var0 = associate_trees (loc, var0, var1, code, atype); minus_var0 = associate_trees (loc, minus_var0, minus_var1, code, atype); @@ -11791,15 +11800,19 @@ fold_binary_loc (location_t loc, enum tr if (minus_var0 && var0) { + var0_origin |= minus_var0_origin; var0 = associate_trees (loc, var0, minus_var0, MINUS_EXPR, atype); minus_var0 = 0; + minus_var0_origin = 0; } if (minus_con0 && con0) { + con0_origin |= minus_con0_origin; con0 = associate_trees (loc, con0, minus_con0, MINUS_EXPR, atype); minus_con0 = 0; + minus_con0_origin = 0; } /* Preserve the MINUS_EXPR if the negative part of the literal is @@ -11815,15 +11828,19 @@ fold_binary_loc (location_t loc, enum tr /* But avoid ending up with only negated parts. 
*/ && (var0 || con0)) { + minus_lit0_origin |= lit0_origin; minus_lit0 = associate_trees (loc, minus_lit0, lit0, MINUS_EXPR, atype); lit0 = 0; + lit0_origin = 0; } else { + lit0_origin |= minus_lit0_origin; lit0 = associate_trees (loc, lit0, minus_lit0, MINUS_EXPR, atype); minus_lit0 = 0; + minus_lit0_origin = 0; } } @@ -11833,37 +11850,51 @@ fold_binary_loc (location_t loc, enum tr return NULL_TREE; /* Eliminate lit0 and minus_lit0 to con0 and minus_con0. */ + con0_origin |= lit0_origin; con0 = associate_trees (loc, con0, lit0, code, atype); - lit0 = 0; + minus_con0_origin |= minus_lit0_origin; minus_con0 = associate_trees (loc, minus_con0, minus_lit0, code, atype); - minus_lit0 = 0; /* Eliminate minus_con0. */ if (minus_con0) { if (con0) - con0 = associate_trees (loc, con0, minus_con0, - MINUS_EXPR, atype); + { + con0_origin |= minus_con0_origin; + con0 = associate_trees (loc, con0, minus_con0, + MINUS_EXPR, atype); + } else if (var0) - var0 = associate_trees (loc, var0, minus_con0, - MINUS_EXPR, atype); + { + var0_origin |= minus_con0_origin; + var0 = associate_trees (loc, var0, minus_con0, + MINUS_EXPR, atype); + } else gcc_unreachable (); - minus_con0 = 0; } /* Eliminate minus_var0. */ if (minus_var0) { if (con0) - con0 = associate_trees (loc, con0, minus_var0, - MINUS_EXPR, atype); + { + con0_origin |= minus_var0_origin; + con0 = associate_trees (loc, con0, minus_var0, + MINUS_EXPR, atype); + } else
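The encoding used by the six *_origin variables in the patch above can be illustrated standalone; this toy sketch (names are mine) shows the provenance bitmask and how associating two groups of terms ORs their origins:

```c
/* Provenance bitmask as in the patch: bit 0 (value 1) set when a
   combined term derives (partly) from op0, bit 1 (value 2) when it
   derives (partly) from op1.  */
static int
origin_of (int has_op0_part, int has_op1_part)
{
  return (has_op0_part != 0) + 2 * (has_op1_part != 0);
}

/* When two groups are associated into one tree, the result derives
   from every operand either group derived from, so the masks OR.  */
static int
merge_origins (int a, int b)
{
  return a | b;
}
```

With this, the final reassociation can be skipped when no merged group ends up with origin 3, i.e. when nothing actually mixed parts of op0 with parts of op1.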
[Bug middle-end/114086] Boolean switches could have a lot better codegen, possibly utilizing bit-vectors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114086 Jakub Jelinek changed: What|Removed |Added CC||jakub at gcc dot gnu.org --- Comment #1 from Jakub Jelinek --- mov eax, edi and eax, 1 ret seems wrong without -fstrict-enums; one could call test_switch(static_cast<E>(9)) and it should return false in that case.
[pushed] Restrict gcc.dg/rtl/aarch64/pr113295-1.c to aarch64
I keep forgetting that gcc.dg/rtl is the one testsuite where tests in target-specific subdirectories aren't automatically restricted to that target. Pushed as obvious after testing on aarch64-linux-gnu & x86_64-linux-gnu. Richard gcc/testsuite/ * gcc.dg/rtl/aarch64/pr113295-1.c: Restrict to aarch64*-*-*. --- gcc/testsuite/gcc.dg/rtl/aarch64/pr113295-1.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/testsuite/gcc.dg/rtl/aarch64/pr113295-1.c b/gcc/testsuite/gcc.dg/rtl/aarch64/pr113295-1.c index 481fb813f61..bf6c5d1f256 100644 --- a/gcc/testsuite/gcc.dg/rtl/aarch64/pr113295-1.c +++ b/gcc/testsuite/gcc.dg/rtl/aarch64/pr113295-1.c @@ -1,5 +1,5 @@ +// { dg-do run { target aarch64*-*-* } } // { dg-options "-O2" } -// { dg-do run } struct data { double x; -- 2.25.1
[PATCH] vect: Tighten check for impossible SLP layouts [PR113205]
During its forward pass, the SLP layout code tries to calculate the cost of a layout change on an incoming edge. This is taken as the minimum of two costs: one in which the source partition keeps its current layout (chosen earlier during the pass) and one in which the source partition switches to the new layout. The latter can sometimes be arranged by the backward pass. Previously, if only one of the costs was valid, the other (invalid) cost was simply ignored. But the PR shows that this is not safe. If the source partition has layout 0 (the normal layout), we have to be prepared to handle the case in which that ends up being the only valid layout. Other code already accounts for this restriction, e.g. see the code starting with: /* Reject the layout if it would make layout 0 impossible for later partitions. This amounts to testing that the target supports reversing the layout change on edges to later partitions. Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK to install? Richard gcc/ PR tree-optimization/113205 * tree-vect-slp.cc (vect_optimize_slp_pass::forward_cost): Reject the proposed layout if it does not allow a source partition with layout 2 to keep that layout. gcc/testsuite/ PR tree-optimization/113205 * gcc.dg/torture/pr113205.c: New test. 
--- gcc/testsuite/gcc.dg/torture/pr113205.c | 19 +++ gcc/tree-vect-slp.cc| 4 2 files changed, 23 insertions(+) create mode 100644 gcc/testsuite/gcc.dg/torture/pr113205.c diff --git a/gcc/testsuite/gcc.dg/torture/pr113205.c b/gcc/testsuite/gcc.dg/torture/pr113205.c new file mode 100644 index 000..edfba7fcd0e --- /dev/null +++ b/gcc/testsuite/gcc.dg/torture/pr113205.c @@ -0,0 +1,19 @@ +char a; +char *b, *c; +int d, e, f, g, h; +int *i; + +void +foo (void) +{ + unsigned p; + d = i[0]; + e = i[1]; + f = i[2]; + g = i[3]; + p = d * b[0]; + p += f * c[h]; + p += e * b[h]; + p += g * c[h]; + a = (p + 8000) >> (__SIZEOF_INT__ * __CHAR_BIT__ / 2); +} diff --git a/gcc/tree-vect-slp.cc b/gcc/tree-vect-slp.cc index 7cf9504398c..895f4f7fb6b 100644 --- a/gcc/tree-vect-slp.cc +++ b/gcc/tree-vect-slp.cc @@ -5034,6 +5034,10 @@ vect_optimize_slp_pass::forward_cost (graph_edge *ud, unsigned int from_node_i, cost.split (from_partition.out_degree); cost.add_serial_cost (edge_cost); } + else if (from_partition.layout == 0) +/* We must allow the source partition to have layout 0 as a fallback, + in case all other options turn out to be impossible. */ +return cost; /* Take the minimum of that cost and the cost that applies if FROM_PARTITION instead switches to TO_LAYOUT_I. */ -- 2.25.1
[Bug middle-end/114086] New: Boolean switches could have a lot better codegen, possibly utilizing bit-vectors
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114086 Bug ID: 114086 Summary: Boolean switches could have a lot better codegen, possibly utilizing bit-vectors Product: gcc Version: 14.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: middle-end Assignee: unassigned at gcc dot gnu.org Reporter: janschultke at googlemail dot com Target Milestone: --- https://godbolt.org/z/3acqbbn3E enum struct E { a, b, c, d, e, f, g, h }; bool test_switch(E e) { switch (e) { case E::a: case E::c: case E::e: case E::g: return true; default: return false; } } Expected output === test_switch(E): mov eax, edi and eax, 1 ret Actual output (-O3) === test_switch(E): xor eax, eax cmp edi, 6 ja .L1 mov eax, 85 bt rax, rdi setc al .L1: ret Explanation === Boolean switches in general can be optimized a lot better than what GCC currently does. Clang does find the optimization to a bitwise AND, although this may be a big ask. Generally, contiguous boolean switches (that is, switch statements where all cases yield a boolean value and the labels are contiguous) can be optimized to accessing a bit vector. That switch could have been transformed into: > return 0b01010101 >> int(e);
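The proposed transformation can be checked directly: with mask 0b01010101 (85), bit i is set exactly when case i of the switch returns true, so the two forms agree for in-range inputs 0..7 (where the shift is well defined). A plain-C sketch of the equivalence (helper names are mine):

```c
/* The original switch: true for E::a, E::c, E::e, E::g (0, 2, 4, 6).  */
static _Bool
test_switch_orig (int e)
{
  switch (e)
    {
    case 0: case 2: case 4: case 6:
      return 1;
    default:
      return 0;
    }
}

/* The proposed bit-vector form: bit i of 0b01010101 (85) encodes the
   result for case i.  Only valid for 0 <= e <= 7; out-of-range values
   need a separate range check (see comment #1 about -fstrict-enums).  */
static _Bool
test_switch_bitvec (int e)
{
  return (85 >> e) & 1;
}
```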
[Bug target/114083] Possible word play on conditional/unconditional
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114083 --- Comment #5 from Andreas Schwab --- Enable conditional-move operations even if unsupported by hardware.
[Bug middle-end/114084] ICE: SIGSEGV: infinite recursion in fold_build2_loc / fold_binary_loc with _BitInt(127)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114084 --- Comment #5 from Jakub Jelinek --- Or perhaps the if (ok && ((var0 != 0) + (var1 != 0) + (minus_var0 != 0) + (minus_var1 != 0) + (con0 != 0) + (con1 != 0) + (minus_con0 != 0) + (minus_con1 != 0) + (lit0 != 0) + (lit1 != 0) + (minus_lit0 != 0) + (minus_lit1 != 0)) > 2) condition should be amended to avoid the reassociation in cases where clearly nothing good can come out of that. Which is if the association actually doesn't reshuffle anything. (var0 == 0) || (var1 == 0) && (and similarly for the other 5 pairs) and (ignoring the minus_* stuff that would need more thoughts on it) (con0 != 0 && lit0 != 0) || (con1 != 0 && lit1 != 0), then it reassociates to the original stuff in op0 and original stuff in op1, no change. But how the minus_* plays together with this is harder. Perhaps if lazy we could have a bool var whether there has been any association between subtrees from original op0 and op1, initially set to false and set if we associate_trees between something that comes from op0 and op1, and only do the final associate_trees if that is the case, because if not, it should be folding of the individual suboperands, not reassociation.
[Bug middle-end/114084] ICE: SIGSEGV: infinite recursion in fold_build2_loc / fold_binary_loc with _BitInt(127)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114084 --- Comment #4 from Jakub Jelinek --- Though, I must say I'm not really sure why this wouldn't recurse infinitely even without the casts.
[Bug middle-end/114084] ICE: SIGSEGV: infinite recursion in fold_build2_loc / fold_binary_loc with _BitInt(127)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114084 --- Comment #3 from Jakub Jelinek --- Bet the associate code is really unprepared to have unfolded trees around, which hasn't been the case before delayed folding has been introduced to C and C++ FEs. Unfortunately it isn't complete, because e.g. convert_to_integer_1 -> do_narrow -> fold_build2_loc happily folds. Anyway, quick fix could be not trying to reassociate TREE_CONSTANT parts: --- gcc/fold-const.cc.jj2024-01-26 00:07:58.0 +0100 +++ gcc/fold-const.cc 2024-02-24 09:38:40.150808529 +0100 @@ -908,6 +908,8 @@ split_tree (tree in, tree type, enum tre if (TREE_CODE (in) == INTEGER_CST || TREE_CODE (in) == REAL_CST || TREE_CODE (in) == FIXED_CST) *litp = in; + else if (TREE_CONSTANT (in)) +*conp = in; else if (TREE_CODE (in) == code || ((! FLOAT_TYPE_P (TREE_TYPE (in)) || flag_associative_math) && ! SAT_FIXED_POINT_TYPE_P (TREE_TYPE (in)) @@ -956,8 +958,6 @@ split_tree (tree in, tree type, enum tre if (neg_var_p && var) *minus_varp = var, var = 0; } - else if (TREE_CONSTANT (in)) -*conp = in; else if (TREE_CODE (in) == BIT_NOT_EXPR && code == PLUS_EXPR) { So, the problem happens on typedef unsigned _BitInt (__SIZEOF_INT__ * __CHAR_BIT__ - 1) T; T a, b; void foo (void) { b = (T) ((a | (-1U >> 1)) >> 1 | (a | 5) << 4); } when fold_binary_loc is called on (unsigned _BitInt(31)) a << 4 | 80 and (unsigned _BitInt(31)) (2147483647 >> 1), but the important part is that the op0 has the unsigned _BitInt(31) type, while op1 is NOP_EXPR to that type from RSHIFT_EXPR done on T type (the typedef). Soon BIT_IOR_EXPR folding is called on (unsigned _BitInt(31)) a << 4 and 2147483647 >> 1 | 80 where the latter is all in T type (fold_binary_loc does STRIP_NOPS). Because split_tree prefers same code over TREE_CONSTANT, this splits it into the LSHIFT_EXPR var0, RSHIFT_EXPR con1 (because it is TREE_CONSTANT) and the T type 80 literal in lit1, everything else is NULL. As there are 3 objects, it reassociates. 
We first associate_tree the 0 vs. 1 cases, but that just moves the *1 into *0 because their counterparts are NULL. Both the RSHIFT_EXPR and INTEGER_CST 80 have T type but atype is the build_bitint_type non-typedef type, so 11835 /* Eliminate lit0 and minus_lit0 to con0 and minus_con0. */ 11836 con0 = associate_trees (loc, con0, lit0, code, atype); returns NOP_EXPR of the RSHIFT_EXPR | INTEGER_CST. And then we associate_trees the LSHIFT_EXPR with this result and so it recurses infinitely. Perhaps my above patch is an improvement, if we know some subtree is TREE_CONSTANT, all we need is just wait for it to be constant folded (not sure it would always do e.g. because of division by zero or similar) trying to reassociate its parts with other expressions might just split the constants to other spots instead of keeping it together.