[gcc12 backport] PR target/105930: Split *xordi3_doubleword after reload on x86.
This is a backport of the fix for PR target/105930 from mainline to the gcc12 release branch. This patch has been retested against the gcc12 branch on x86_64-pc-linux-gnu with make bootstrap and make -k check, both with and without --target_board=unix{-m32} with no new failures. Ok for the gcc12 branch? 2022-07-09 Roger Sayle Uroš Bizjak gcc/ChangeLog PR target/105930 * config/i386/i386.md (*di3_doubleword): Split after reload. Use rtx_equal_p to avoid creating memory-to-memory moves, and emit NOTE_INSN_DELETED if operand[2] is zero (i.e. with -O0). Thanks in advance, Roger -- diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md index 7c9560fc4..1c4781d 100644 --- a/gcc/config/i386/i386.md +++ b/gcc/config/i386/i386.md @@ -10400,22 +10400,25 @@ "ix86_expand_binary_operator (, mode, operands); DONE;") (define_insn_and_split "*di3_doubleword" - [(set (match_operand:DI 0 "nonimmediate_operand") + [(set (match_operand:DI 0 "nonimmediate_operand" "=ro,r") (any_or:DI -(match_operand:DI 1 "nonimmediate_operand") -(match_operand:DI 2 "x86_64_szext_general_operand"))) +(match_operand:DI 1 "nonimmediate_operand" "0,0") +(match_operand:DI 2 "x86_64_szext_general_operand" "re,o"))) (clobber (reg:CC FLAGS_REG))] "!TARGET_64BIT - && ix86_binary_operator_ok (, DImode, operands) - && ix86_pre_reload_split ()" + && ix86_binary_operator_ok (, DImode, operands)" "#" - "&& 1" + "&& reload_completed" [(const_int 0)] { + /* This insn may disappear completely when operands[2] == const0_rtx + and operands[0] == operands[1], which requires a NOTE_INSN_DELETED. */ + bool emit_insn_deleted_note_p = false; + split_double_mode (DImode, &operands[0], 3, &operands[0], &operands[3]); if (operands[2] == const0_rtx) -emit_move_insn (operands[0], operands[1]); +emit_insn_deleted_note_p = true; else if (operands[2] == constm1_rtx) { if ( == IOR) @@ -10427,7 +10430,10 @@ ix86_expand_binary_operator (, SImode, &operands[0]); if (operands[5] == const0_rtx) -emit_move_insn (operands[3], operands[4]); +{ + if (emit_insn_deleted_note_p) + emit_note (NOTE_INSN_DELETED); +} else if (operands[5] == constm1_rtx) { if ( == IOR)
Re: [gcc12 backport] PR target/105930: Split *xordi3_doubleword after reload on x86.
On Sat, Jul 9, 2022 at 11:26 AM Roger Sayle wrote: > > > This is a backport of the fix for PR target/105930 from mainline to the > gcc12 release branch. This patch has been retested against the gcc12 > branch on x86_64-pc-linux-gnu with make bootstrap and make -k check, > both with and without --target_board=unix{-m32} with no new failures. > Ok for the gcc12 branch? > > > 2022-07-09 Roger Sayle > Uroš Bizjak > > gcc/ChangeLog > PR target/105930 > * config/i386/i386.md (*di3_doubleword): Split after > reload. Use rtx_equal_p to avoid creating memory-to-memory moves, > and emit NOTE_INSN_DELETED if operand[2] is zero (i.e. with -O0). OK. Thanks, Uros. > > Thanks in advance, > Roger > -- >
Modula-2: merge followup (brief update on the progress of the new linking implementation)
A very brief update to say that I've merged the new linking implementation back onto the devel/modula-2 branch, regards, Gaius
[x86_64 PATCH] Improved Scalar-To-Vector (STV) support for TImode to V1TImode.
This patch upgrades x86_64's scalar-to-vector (STV) pass to more aggressively transform 128-bit scalar TImode operations into vector V1TImode operations performed on SSE registers. TImode functionality already exists in STV, but only for move operations, this changes brings support for logical operations (AND, IOR, XOR, NOT and ANDN) and comparisons. The effect of these changes are conveniently demonstrated by the new sse4_1-stv-5.c test case: __int128 a[16]; __int128 b[16]; __int128 c[16]; void foo() { for (unsigned int i=0; i<16; i++) a[i] = b[i] & ~c[i]; } which when currently compiled on mainline wtih -O2 -msse4 produces: foo:xorl%eax, %eax .L2:movqc(%rax), %rsi movqc+8(%rax), %rdi addq$16, %rax notq%rsi notq%rdi andqb-16(%rax), %rsi andqb-8(%rax), %rdi movq%rsi, a-16(%rax) movq%rdi, a-8(%rax) cmpq$256, %rax jne .L2 ret but with this patch now produces: foo:xorl%eax, %eax .L2:movdqa c(%rax), %xmm0 pandn b(%rax), %xmm0 addq$16, %rax movaps %xmm0, a-16(%rax) cmpq$256, %rax jne .L2 ret Technically, the STV pass is implemented by three C++ classes, a common abstract base class "scalar_chain" that contains common functionality, and two derived classes: general_scalar_chain (which handles SI and DI modes) and timode_scalar_chain (which handles TI modes). As mentioned previously, because only TI mode moves were handled the two worker classes behaved significantly differently. These changes bring the functionality of these two classes closer together, which is reflected by refactoring more shared code from general_scalar_chain to the parent scalar_chain and reusing it from timode. There still remain significant differences (and simplifications) so the existing division of classes (as specializations) continues to make sense. Obviously, there are more changes to come (shifts and rotates), and compute_convert_gain doesn't yet have its final (tuned) form, but is already an improvement over the "return 1;" used previously. This patch has been tested on x86_64-pc-linux-gnu with make boostrap and make -k check, both with and without --target_board=unix{-m32} with no new failures. Ok for mainline? 2022-07-09 Roger Sayle gcc/ChangeLog * config/i386/i386-features.h (scalar_chain): Add fields insns_conv, n_sse_to_integer and n_integer_to_sse to this parent class, moved from general_scalar_chain. (scalar_chain::convert_compare): Protected method moved from general_scalar_chain. (mark_dual_mode_def): Make protected, not private virtual. (scalar_chain:convert_op): New private virtual method. (general_scalar_chain::general_scalar_chain): Simplify constructor. (general_scalar_chain::~general_scalar_chain): Delete destructor. (general_scalar_chain): Move insns_conv, n_sse_to_integer and n_integer_to_sse fields to parent class, scalar_chain. (general_scalar_chain::mark_dual_mode_def): Delete prototype. (general_scalar_chain::convert_compare): Delete prototype. (timode_scalar_chain::compute_convert_gain): Remove simplistic implementation, convert to a method prototype. (timode_scalar_chain::mark_dual_mode_def): Delete prototype. (timode_scalar_chain::convert_op): Prototype new virtual method. * config/i386/i386-features.cc (scalar_chain::scalar_chain): Allocate insns_conv and initialize n_sse_to_integer and n_integer_to_sse fields in constructor. (scalar_chain::scalar_chain): Free insns_conv in destructor. (general_scalar_chain::general_scalar_chain): Delete constructor, now defined in the class declaration. (general_scalar_chain::~general_scalar_chain): Delete destructor. (scalar_chain::mark_dual_mode_def): Renamed from general_scalar_chain::mark_dual_mode_def. (timode_scalar_chain::mark_dual_mode_def): Delete. (scalar_chain::convert_compare): Renamed from general_scalar_chain::convert_compare. (timode_scalar_chain::compute_convert_gain): New method to determine the gain from converting a TImode chain to V1TImode. (timode_scalar_chain::convert_op): New method to convert an operand from TImode to V1TImode. (timode_scalar_chain::convert_insn) : Only PUT_MODE on REG_EQUAL notes that were originally TImode (not CONST_INT). Handle AND, ANDN, XOR, IOR, NOT and COMPARE. (timode_mem_p): Helper predicate to check where operand is memory reference with sufficient alignment for TImode STV. (timode_scalar_to_vector_candidate_p): Use convertible_comparison_p to check whether COMPARE is convertible. Handle SET_DESTs that that are REG_P or MEM_P and SET_SRCs that are R
[committed] libstdc++: Remove obsolete comment in
libstdc++: Remove obsolete comment in header The comment is obsolete because char_traits.h do not include stl_algobase.h anymore and stl_algobase.h is included directly from a few lines below. libstdc++-v3/ChangeLog: * include/std/string: Remove obsolete comment about char_traits.h including stl_algobase.h. François diff --git a/libstdc++-v3/include/std/string b/libstdc++-v3/include/std/string index 37a4aaba9cd..62ecdb3af45 100644 --- a/libstdc++-v3/include/std/string +++ b/libstdc++-v3/include/std/string @@ -37,7 +37,7 @@ #include #include -#include // NB: In turn includes stl_algobase.h +#include #include #include #include // For operators >>, <<, and getline.
Re: Mips: Fix kernel_stat structure size
On Sat, 9 Jul 2022, Xi Ruoyao wrote: > On Fri, 2022-07-08 at 21:42 -0400, Hans-Peter Nilsson wrote: > > On Fri, 1 Jul 2022, Dimitrije Milosevic wrote: > > > > > Fix kernel_stat structure size for non-Android 32-bit Mips. > > > LLVM currently has this value for the kernel_stat structure size, > > > as per compiler-rt/lib/sanitizer- > > > common/sanitizer_platform_limits_posix.h. > > > This also resolves one of the build issues for non-Android 32-bit > > > Mips. > > > > I insist that PR105614 comment #7 is the way to go, i.e. fix > > the merge error, avoiding the faulty include that it > > reintroduced.? Was this tested on O32? > > I'm pretty sure it is *not* the way to go. > > Sanitizer does not really intercept system call. It intercepts libc > stat() or lstat() etc. calls. So you need to keep struct_kernel_stat_sz > same as the size of struct stat in libc, i. e. "the size of buffer which > *libc* stat()-like functions writing into". > > The "kernel_" in the name is just misleading. You make a sound argument, and I just refer to my old commit 9f943b2446f2d0a. Either way, the proof is in the pussing: was this tested for O32 or not? > And, if you still think it should be the way to go, let's submit the > change to LLVM and get it reviewed properly. Sorry, I've no longer interest in mips besides I'd like to see un-broken what I helped getting in a working state (support ASAN in gcc for mips O32). brgds, H-P
Re: [committed] libstdc++: Remove obsolete comment in
On Sat, 9 Jul 2022 at 13:26, François Dumont via Libstdc++ wrote: > > libstdc++: Remove obsolete comment in header > > The comment is obsolete because char_traits.h do not include > stl_algobase.h > anymore and stl_algobase.h is included directly from a few > lines > below. Nice, thanks > > libstdc++-v3/ChangeLog: > > * include/std/string: Remove obsolete comment about > char_traits.h including > stl_algobase.h. > > François
Re: [PATCH v3] tree-optimization/95821 - Convert strlen + strchr to memchr
On 6/21/2022 12:12 PM, Noah Goldstein via Gcc-patches wrote: This patch allows for strchr(x, c) to the replace with memchr(x, c, strlen(x) + 1) if strlen(x) has already been computed earlier in the tree. Handles PR95821: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95821 Since memchr doesn't need to re-find the null terminator it is faster than strchr. bootstrapped and tested on x86_64-linux. PR tree-optimization/95821 gcc/ * tree-ssa-strlen.cc (strlen_pass::handle_builtin_strchr): Emit memchr instead of strchr if strlen already computed. gcc/testsuite/ * c-c++-common/pr95821-1.c: New test. * c-c++-common/pr95821-2.c: New test. * c-c++-common/pr95821-3.c: New test. * c-c++-common/pr95821-4.c: New test. * c-c++-common/pr95821-5.c: New test. * c-c++-common/pr95821-6.c: New test. * c-c++-common/pr95821-7.c: New test. * c-c++-common/pr95821-8.c: New test. Given Jakub's involvement to-date and the fact this touches tree-ssa-strlen.cc I think Jakub should have final ACK/NAK on this. jeff
Re: [PATCH] match.pd: Add new bitwise arithmetic pattern [PR98304]
On 7/7/2022 7:59 AM, Sam Feifer via Gcc-patches wrote: Hi! This patch is meant to solve a missed optimization in match.pd. It optimizes the following expression: n - (((n > 63) ? n : 63) & -64) where the constant being negated (in this case -64) is a power of 2 and the sum of the two constants is -1. For the signed case, this gets optimized to (n <= 63) ? n : (n & 63). For the unsigned case, it gets optimized to (n & 63). In both scenarios, the number of instructions produced decreases. There are also tests for this optimization making sure the optimization happens when it is supposed to, and does not happen when it isn't. Bootstrapped/regtested on x86_64-pc-linux-gnu, ok for trunk? PR tree-optimization/98304 gcc/ChangeLog: * match.pd (n - (((n > C1) ? n : C1) & -C2)): New simplification. gcc/testsuite/ChangeLog: * gcc.c-torture/execute/pr98304-2.c: New test. * gcc.dg/pr98304-1.c: New test. OK. I'm going to assume Red Hat's assignment covers you and/or you want to contribute under the DCO. Going forward, if you're part of the tools team for Red Hat and expect to be contributing regularly, you'll probably want to get authenticated write access so that you can commit approved changes (anyone on the team should be able to help with that). I'll go ahead and push this one to the trunk. Thanks, Jeff
Re: [PATCH v2] Support --disable-fixincludes.
On 7/8/2022 5:14 AM, Martin Liška wrote: On 5/25/22 07:37, Alexandre Oliva wrote: On May 24, 2022, Martin Liška wrote: Allways install limits.h and syslimits.h header files to include folder. typo: s/Allways/Always/ Hello. Fixed. I'm a little worried about this bit, too. limitx.h includes "syslimits.h", mentioning it should be in the same directory. Perhaps it could be left in include-fixed? Well, I would like to go w/o include-fixed if the option --disable-fixincludes is used. The patch also changes syslimits.h from either the fixincluded file or gsyslimits.h to use gsyslimits.h unconditionally, which seemed wrong at first. Now I see how these two hunks work together: syslimits.h will now always #include_next , which will find it in include-fixed if it's there, and system header files otherwise. Nice!, but IMHO the commit message could be a little more verbose on the extent of the change and why that (is supposed to) work. Oh, to be honest I'm not fully familiar with how these 2 files work together. Can you explain it to me so that I can adjust the changelog entry correspondingly? It also looks like install-mkheaders installs limits-related headers for when fixincludes runs; we could probably skip the whole thing if fixincludes is disabled, but I'm also worried about how the changes above might impact post-install fixincludes: if that installs gsyslimits.h et al in include-fixed while your changes moves it to include, headers might end up in a confusing state. I haven't worked out whether that's safe, but there appears to be room for cleanups there. I've check that 'make install-mkheaders' work fine w/ and w/o --disable-fixincludes after the patch. gcc/config/mips/t-sdemtk also places syslimits.h explicitly in include/ rather than include-fixed/, as part of disabling fixincludes, which is good, but it could be cleaned up as well. Can we do that as a follow-up patch? I don't see other config fragments that might require adjustments, so I think the patch looks good; hopefully my worries are unjustified, and the cleanups it enables could be Good. We still create the README file in there and install it, even with fixincludes disabled and thus unavailable, don't we? That README then becomes misleading; we might be better off not installing it. Sure, fixed in v2 of the patch. When --disable-fixincludes is used, then no systen header files are fixed by the tools in fixincludes. Moreover, the fixincludes tools are not built any longer. typo: s/systen/system/ Fixed. Could you please check that a post-install mkheaders still has a functional limits.h with these changes? How do I check that, please? The patch is ok (with the typo fixes) if so. The cleanups it enables would be welcome as separate patches ;-) Can I install the v2? Once Alex is OK with this patch, then it'll be good to go. jeff
Re: [PATCH] Implement global ranges for all vrange types (SSA_NAME_RANGE_INFO).
On 7/6/2022 11:10 AM, Aldy Hernandez via Gcc-patches wrote: Currently SSA_NAME_RANGE_INFO only handles integer ranges, and loses half the precision in the process because its use of legacy value_range's. This patch rewrites all the SSA_NAME_RANGE_INFO (nonzero bits included) to use the recently contributed vrange_storage. With it, we'll be able to efficiently save any ranges supported by ranger in GC memory. Presently this will only be irange's, but shortly we'll add floating ranges and others to the mix. As per the discussion with the trailing_wide_ints adjustments and vrange_storage, we'll be able to save integer ranges with a maximum of 5 sub-ranges. This could be adjusted later if more sub-ranges are needed (unlikely). Since this is a behavior changing patch, I would like to take a few days for discussion, and commit early next week if all goes well. A few notes. First, we get rid of the SSA_NAME_ANTI_RANGE_P bit in the SSA_NAME since we store full resolution ranges. Perhaps it could be re-used for something else. The range_info_def struct is gone in favor of an opaque type handled by vrange_storage. It currently supports irange, but will support frange, prange, etc, in due time. From the looks of it, set_range_info was an update operation despite its name, as we improved the nonzero bits with each call, even though we clobbered the ranges. Presumably this was because doing a proper intersect of ranges lost information with the anti-range hack. We no longer have this limitation so now we formalize both set_range_info and set_nonzero_bits to an update operation. After all, we should never be losing information, but enhancing it whenever possible. This means, that if folks' finger-memory is not offended, as a follow-up, I'd like to rename set_nonzero_bits and set_range_info to update_*. I have kept the same global API we had in tree-ssanames.h, with the caveat that all set operations are now update as discussed above. There is a 2% performance penalty for evrp and a 3% penalty for VRP that is coincidentally in line with a previous improvement of the same amount in the vrange abstraction patchset. Interestingly, this penalty is mostly due to the wide int to tree dance we keep doing with irange and legacy. In a first draft of this patch where I was streaming trees directly, there was actually a small improvement instead. I hope to get some of the gain back when we move irange's to wide-ints, though I'm not in a hurry ;-). Tested and benchmarked on x86-64 Linux. I will also test on ppc64le before the final commit. Comments welcome. gcc/ChangeLog: * gimple-range.cc (gimple_ranger::export_global_ranges): Remove verification against legacy value_range. * tree-core.h (struct range_info_def): Remove. (struct irange_storage_slot): New. (struct tree_base): Remove SSA_NAME_ANTI_RANGE_P documentation. (struct tree_ssa_name): Add vrange_storage support. * tree-ssanames.cc (range_info_p): New. (range_info_fits_p): New. (range_info_alloc): New. (range_info_free): New. (range_info_get_range): New. (range_info_set_range): New. (set_range_info_raw): Remove. (set_range_info): Adjust to use vrange_storage. (set_nonzero_bits): Same. (get_nonzero_bits): Same. (duplicate_ssa_name_range_info): Remove overload taking value_range_kind. Rewrite tree overload to use vrange_storage. (duplicate_ssa_name_fn): Adjust to use vrange_storage. * tree-ssanames.h (struct range_info_def): Remove. (set_range_info): Adjust prototype to take vrange. * tree-vrp.cc (vrp_asserts::remove_range_assertions): Call duplicate_ssa_name_range_info. * tree.h (SSA_NAME_ANTI_RANGE_P): Remove. (SSA_NAME_RANGE_TYPE): Remove. * value-query.cc (get_ssa_name_range_info): Adjust to use vrange_storage. (update_global_range): Use int_range_max. (get_range_global): Remove as_a. I'll be so happy once we don't have to keep doing the conversions between the types. Anti-ranges no more! I've got no real concerns here. So unless someone objects, your plan is OK. jeff
Re: [PATCH v3] c: Extend the -Wpadded message with actual padding size
On 6/27/2022 2:04 AM, Vit Kabele wrote: gcc/ChangeLog: * stor-layout.cc (finalize_record_size): Extend warning message. gcc/testsuite/ChangeLog: * c-c++-common/Wpadded.c: New test. Thanks. I've pushed this to the trunk. jeff
Re: [PATCH v2] [PR100106] Reject unaligned subregs when strict alignment is required
On 5/5/2022 8:41 PM, Alexandre Oliva via Gcc-patches wrote: On May 5, 2022, Segher Boessenkool wrote: On Thu, May 05, 2022 at 03:52:01AM -0300, Alexandre Oliva wrote: + else if (reg && MEM_P (reg) + && STRICT_ALIGNMENT && MEM_ALIGN (reg) < GET_MODE_ALIGNMENT (omode)) +return false; Please fix the line breaks? Either do a break before every &&, or put as many things as possible on one line? I was going for conceptual grouping of alignment-related subexprs, but I don't care enough to fight for it. Note that you should never have paradoxical subregs of mem on rs6000 or any other target with INSN_SCHEDULING. Great, that alleviates some of my concerns about overreaching in this patch. +#include "../../gcc.c-torture/compile/pr100106.c" It is better to copy the 11 lines of code. 'k Please comment what the ilp32 is for (namely, the -mcpu= will barf without it).. Ack The testcase is okay with those changes, thanks! Thanks. Here's the revised patch. I'm now testing on several platforms a follow-up patch that introduces TARGET_ALLOW_SUBREG_OF_MEM. [PR100106] Reject unaligned subregs when strict alignment is required From: Alexandre Oliva The testcase for pr100106, compiled with optimization for 32-bit powerpc -mcpu=604 with -mstrict-align expands the initialization of a union from a float _Complex value into a load from an SCmode constant pool entry, aligned to 4 bytes, into a DImode pseudo, requiring 8-byte alignment. The patch that introduced the testcase modified simplify_subreg to avoid changing the MEM to outermode, but simplify_gen_subreg still creates a SUBREG or a MEM that would require stricter alignment than MEM's, and lra_constraints appears to get confused by that, repeatedly creating unsatisfiable reloads for the SUBREG until it exceeds the insn count. Avoiding the unaligned SUBREG, expand splits the DImode dest into SUBREGs and loads each SImode word of the constant pool with the proper alignment. for gcc/ChangeLog PR target/100106 * emit-rtl.cc (validate_subreg): Reject a SUBREG of a MEM that requires stricter alignment than MEM's. for gcc/testsuite/ChangeLog PR target/100106 * gcc.target/powerpc/pr100106-sa.c: New. OK. jeff
Re: [PATCH] postscan_insn hook not called after input_asm
On 3/29/2022 8:10 AM, Paul Iannetta via Gcc-patches wrote: Hi, While working on the Kalray port of gcc, I noticed that the hook TARGET_ASM_FINAL_POSTSCAN_INSN is not called after emitting an instruction coming from a basic asm block. Here is a patch which fixes this behavior. The following check: ``` $ find gcc/config/ -type f -exec grep "#define TARGET_ASM_FINAL" "{}" + gcc/config/m68k/m68k.cc:#define TARGET_ASM_FINAL_POSTSCAN_INSN m68k_asm_final_postscan_insn gcc/config/avr/avr.cc:#define TARGET_ASM_FINAL_POSTSCAN_INSN avr_asm_final_postscan_insn gcc/config/mips/mips.cc:#define TARGET_ASM_FINAL_POSTSCAN_INSN mips_final_postscan_insn ``` reveals that m68k, avr and mips are the only affected targets upstream. So I know it's been a while since you posted this patch. Do you recall the motivation behind it? ie, was there something your port is doing that required the post-scan hook to be called on an ASM? I'm not objecting to the patch, I just want to understand better what drove you to write it in the first place. IIRC you can't ask for attributes on an ASM, meaning that the m68k hook will probably abort with this change. You may need to adjust the m68k implementation so that it doesn't fault if presented with an ASM. Something like this at the start should do the trick. if (recog_memoized (insn) < 0) { flags_operand1 = flags_operand2 = NULL_RTX; return; }
Re: [PATCH][v2] tree-optimization: Fold (type)X / (type)Y [PR103855]
On 3/16/2022 6:49 PM, Zhao Wei Liew via Gcc-patches wrote: Thanks for the detailed review. I have uploaded patch v2 based on the review. v1: https://gcc.gnu.org/pipermail/gcc-patches/2022-February/590604.html Changes since v1: 1. Add patterns for the cases CST / (T)x and (T)x / CST as well. Fix test regressions caused by those patterns. 2. Support signed integers except for the possible INT_MIN / -1 case. 3. Support cases where X and Y have different precisions. On Tue, 22 Feb 2022 at 15:54, Richard Biener wrote: +/* (type)X / (type)Y -> (type)(X / Y) + when the resulting type is at least precise as the original types + and when all the types are unsigned integral types. */ +(simplify + (trunc_div (convert @0) (convert @1)) + (if (INTEGRAL_TYPE_P (type) + && INTEGRAL_TYPE_P (TREE_TYPE (@0)) + && INTEGRAL_TYPE_P (TREE_TYPE (@1)) + && TYPE_UNSIGNED (type) + && TYPE_UNSIGNED (TREE_TYPE (@0)) + && TYPE_UNSIGNED (TREE_TYPE (@1)) + && TYPE_PRECISION (TREE_TYPE (@0)) == TYPE_PRECISION (TREE_TYPE (@1)) + && TYPE_PRECISION (type) >= TYPE_PRECISION (TREE_TYPE (@0))) + (convert (trunc_div @0 @1 since you are requiring the types of @0 and @1 to match it's easier to write && types_match (TREE_TYPE(@0), TREE_TYPE (@1)) Thanks. In the new patch, TREE_TYPE (@0) and TREE_TYPE (@1) can now have different precisions, so I believe that I can't use types_match() anymore. that allows you to elide checks on either @0 or @1. I suppose the transform does not work for signed types because of the -INT_MIN / -1 overflow case? It might be possible to use expr_not_equal_to (@0, -INT_MIN) || expr_not_equal_to (@1, -1) (correctly written, lookup the existing examples in match.pd for the X % -Y transform) That's right. I have made changes to support signed types except for the INT_MIN / -1 case. I'll note that as written the transform will not catch CST / (T)x or (T)x / CST since you'll not see conversions around constants. I'm not sure whether using (convert[12]? ...) or writing special patterns with INTEGER_CST operands is more convenient. There is int_fits_type_p to check whether a constant will fit in a type without truncation or sign change. I see. I couldn't find an easy way to use (convert[12]? ...) in this case so I added 2 other patterns for the CST / (T)x and (T)x / CST cases. When @0 and @1 do not have the same type there might still be a common type that can be used and is smaller than 'type', it might be as simple as using build_nonstandard_integer_type (MIN (@0-prec, @1-prec), 1 /*unsigned_p*/). Thanks for the tip. I've made a similar change, but using either TREE_TYPE (@0) or TREE_TYPE (@1) instead of build_nonstandard_integer_type(). In the past there have been attempts to more globally narrow operations using a new pass rather than using individual patterns. So for more complicated cases that might be the way to go. There's now also the ISEL pass which does pre-RTL expansion transforms that need some global context and for example can look at SSA uses. I had wanted to do something similar to handle all integral trunc_divs, but I'm not sure where/how to start. It seems out of my league at this moment. I'll gladly explore it after this change though! Just a couple general notes in case you want to poke further in this space. Earlier you mentioned you were trying to do some of this work in expand_divmod, but the operands had rtx types rather than tree types, thus you're unable to get at the range data. In expand_expr_divmod there's treeop0, treeop1. So if they are SSA_NAMEs, then you can query for their range. Richi mentioned attempts to globally narrow during a new pass. That's a much broader chance, but has the potential to apply in a lot more places. Narrowing early would potentially expose the narrowed statements to the vectorizer which could be a win. Narrowing late is likely a win too, but it likely only helps the expansion phase generate narrower operations. This is obviously a larger project than just handling the division cases in match.pd. Tackling it is not (IMHO) a requirement to move forward. Finally, narrowing isn't always a win. There have been micro-architectures were sub-word accesses are slower than word accesses. I'm not too worried about it for this patch, but it could become more important if you were to look into a more general solution. As far as the patch is concerned. At one point I was a bit worried that expr_not_equal_to wasn't right. But after digging a bit deeper into its implementation, it should do the right thing here. Note if you wanted to see how to get to underlying range data, you can find a simple example in that function. Your new testcase needs to limit itself to targets where integers are at least 32 bits wide. Something like this at the top of the new test: /* { dg-require-effective-target int32plus } */ There are additional cases Andrew Pinski pointed out in B
[COMMITTED] Set VR_VARYING in irange::irange_single_pair_union.
The fast union operation is sometimes setting a range of the entire domain, but leaving the kind bit as VR_RANGE instead of downgrading it to VR_VARYING. Tested on x86-64 Linux. gcc/ChangeLog: * value-range.cc (irange::irange_single_pair_union): Set VR_VARYING when appropriate. --- gcc/value-range.cc | 16 +++- 1 file changed, 7 insertions(+), 9 deletions(-) diff --git a/gcc/value-range.cc b/gcc/value-range.cc index 25f1acff4a3..fd549b9ca59 100644 --- a/gcc/value-range.cc +++ b/gcc/value-range.cc @@ -1777,11 +1777,7 @@ irange::irange_single_pair_union (const irange &r) // Check for overlap/touching ranges, or single target range. if (m_max_ranges == 1 || wi::to_widest (m_base[1]) + 1 >= wi::to_widest (r.m_base[0])) - { - m_base[1] = r.m_base[1]; - if (varying_compatible_p ()) - m_kind = VR_VARYING; - } + m_base[1] = r.m_base[1]; else { // This is a dual range result. @@ -1789,8 +1785,8 @@ irange::irange_single_pair_union (const irange &r) m_base[3] = r.m_base[1]; m_num_ranges = 2; } - if (flag_checking) - verify_range (); + if (varying_compatible_p ()) + m_kind = VR_VARYING; return true; } @@ -1817,8 +1813,8 @@ irange::irange_single_pair_union (const irange &r) m_base[3] = m_base[1]; m_base[1] = r.m_base[1]; } - if (flag_checking) -verify_range (); + if (varying_compatible_p ()) +m_kind = VR_VARYING; return true; } @@ -1857,6 +1853,8 @@ irange::irange_union (const irange &r) { bool ret = irange_single_pair_union (r); set_nonzero_bits (saved_nz); + if (flag_checking) + verify_range (); return ret || ret_nz; } -- 2.36.1
Re: [PATCH] Implement global ranges for all vrange types (SSA_NAME_RANGE_INFO).
On Sat, Jul 9, 2022 at 6:16 PM Jeff Law via Gcc-patches wrote: > > > > On 7/6/2022 11:10 AM, Aldy Hernandez via Gcc-patches wrote: > > Currently SSA_NAME_RANGE_INFO only handles integer ranges, and loses > > half the precision in the process because its use of legacy > > value_range's. This patch rewrites all the SSA_NAME_RANGE_INFO > > (nonzero bits included) to use the recently contributed > > vrange_storage. With it, we'll be able to efficiently save any ranges > > supported by ranger in GC memory. Presently this will only be > > irange's, but shortly we'll add floating ranges and others to the mix. > > > > As per the discussion with the trailing_wide_ints adjustments and > > vrange_storage, we'll be able to save integer ranges with a maximum of > > 5 sub-ranges. This could be adjusted later if more sub-ranges are > > needed (unlikely). > > > > Since this is a behavior changing patch, I would like to take a few > > days for discussion, and commit early next week if all goes well. > > > > A few notes. > > > > First, we get rid of the SSA_NAME_ANTI_RANGE_P bit in the SSA_NAME > > since we store full resolution ranges. Perhaps it could be re-used > > for something else. > > > > The range_info_def struct is gone in favor of an opaque type handled > > by vrange_storage. It currently supports irange, but will support > > frange, prange, etc, in due time. > > > > From the looks of it, set_range_info was an update operation despite > > its name, as we improved the nonzero bits with each call, even though > > we clobbered the ranges. Presumably this was because doing a proper > > intersect of ranges lost information with the anti-range hack. We no > > longer have this limitation so now we formalize both set_range_info > > and set_nonzero_bits to an update operation. After all, we should > > never be losing information, but enhancing it whenever possible. This > > means, that if folks' finger-memory is not offended, as a follow-up, > > I'd like to rename set_nonzero_bits and set_range_info to update_*. > > > > I have kept the same global API we had in tree-ssanames.h, with the > > caveat that all set operations are now update as discussed above. > > > > There is a 2% performance penalty for evrp and a 3% penalty for VRP > > that is coincidentally in line with a previous improvement of the same > > amount in the vrange abstraction patchset. Interestingly, this > > penalty is mostly due to the wide int to tree dance we keep doing with > > irange and legacy. In a first draft of this patch where I was > > streaming trees directly, there was actually a small improvement > > instead. I hope to get some of the gain back when we move irange's to > > wide-ints, though I'm not in a hurry ;-). > > > > Tested and benchmarked on x86-64 Linux. I will also test on ppc64le > > before the final commit. > > > > Comments welcome. > > > > gcc/ChangeLog: > > > > * gimple-range.cc (gimple_ranger::export_global_ranges): Remove > > verification against legacy value_range. > > * tree-core.h (struct range_info_def): Remove. > > (struct irange_storage_slot): New. > > (struct tree_base): Remove SSA_NAME_ANTI_RANGE_P documentation. > > (struct tree_ssa_name): Add vrange_storage support. > > * tree-ssanames.cc (range_info_p): New. > > (range_info_fits_p): New. > > (range_info_alloc): New. > > (range_info_free): New. > > (range_info_get_range): New. > > (range_info_set_range): New. > > (set_range_info_raw): Remove. > > (set_range_info): Adjust to use vrange_storage. > > (set_nonzero_bits): Same. > > (get_nonzero_bits): Same. > > (duplicate_ssa_name_range_info): Remove overload taking > > value_range_kind. > > Rewrite tree overload to use vrange_storage. > > (duplicate_ssa_name_fn): Adjust to use vrange_storage. > > * tree-ssanames.h (struct range_info_def): Remove. > > (set_range_info): Adjust prototype to take vrange. > > * tree-vrp.cc (vrp_asserts::remove_range_assertions): Call > > duplicate_ssa_name_range_info. > > * tree.h (SSA_NAME_ANTI_RANGE_P): Remove. > > (SSA_NAME_RANGE_TYPE): Remove. > > * value-query.cc (get_ssa_name_range_info): Adjust to use > > vrange_storage. > > (update_global_range): Use int_range_max. > > (get_range_global): Remove as_a. > I'll be so happy once we don't have to keep doing the conversions > between the types. > > Anti-ranges no more! Yeah, it took a little longer than the 6 weeks Andrew had estimated originally :-P. Note that anti range kinda sorta still exist in two forms: a) If you use value_range, as it still uses the legacy stuff underneath. But any new consumers (evrp, DOM, etc), all pass an int_range or an int_range_max, so anyone who cares about ranges should never see an anti range. Later this cycle value_range will be typedefed to what is now Value_Range, which is an infinite precision range that works
[PATCH] c: Fix location for _Pragma tokens [PR97498]
Hello- PR97498 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97498) is another PR related to the fact that imprecise locations for _Pragma result in counterintuitive behavior for GCC diagnostic pragmas, which inhibit the ability to make convenient wrapper macros for enabling and disabling diagnostics in specific scopes. It looks like David did a lot of work a few years ago improving this (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69543 and https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69558), and in particular r233637 added a lot of new test coverage for cases that had regressed in the past. I think the main source of problems for all remaining issues is that we use the global input_location for deciding when/if a diagnostic should apply. I think it should be eventually doable to eliminate this, and rather properly resolve the token locations to the place they need to be so that _Pragma type wrapper macros just work the way people expect. That said, PR97498 can be solved easily with a 2-line fix without removing input_location, and I think the resulting change to input_location's value is an improvement that will benefit other areas, so I thought I'd see what you think about this patch please? Here is a typical testcase. Note the line continuations so it's all one logical line. === _Pragma("GCC diagnostic push") \ _Pragma("GCC diagnostic ignored \"-Wunused-function\"") \ static void f() {} \ _Pragma("GCC diagnostic pop") === What happens is that in C++ mode, input_location is always updated to the most recently-lexed token, so the above case works fine and does not warn when compiled with "g++ -Wunused-functions". However, in C mode, it does warn because input_location in C is almost always set to the start of the line, and is in this case. So the pop is deemed to take place prior to the definition of f(). Initially, I thought it best to change input_location for C mode to behave like C++, and always update to the most recently lexed token. Maybe that's still the right way to go, but there was a fair amount of testsuite fallout from that. Most of it, was just that we would need to change the tests to look for the new locations, and in many cases, the new locations seemed preferable to the old ones, but it seemed a bit much for now, so I took a more measured approach and just changed input_location in the specific case of processing a pragma, to be the location of the CPP_PRAGMA token. Unfortunately, it turns out that the CPP_PRAGMA token that libcpp provides to represent the _Pragma() expression doesn't have a valid location with which input_location could be overridden. Looking into that, in r232893 David added logic which sets the location of all tokens inside the _Pragma(...) to a reasonable place (namely it points to "_Pragma" at the expansion point). However, that patch didn't change the location of the CPP_PRAGMA token itself to similarly point there, so the 2nd line of this patch does that. The rest of it is just tweaking a couple tests which were sensitive to the location being output. In all these cases, the new locations seem more informative to me than the old ones. With those tweaks, bootstrap + regtest all languages looks good with no regressions. Please let me know what you think? Thanks! -Lewis [PATCH] c: Fix location for _Pragma tokens [PR97498] The handling of #pragma GCC diagnostic uses input_location, which is not always as precise as needed; in particular the relative location of some tokens and a _Pragma directive will crucially determine whether a given diagnostic is enabled or suppressed in the desired way. PR97498 shows how the C frontend ends up with input_location pointing to the beginning of the line containing a _Pragma() directive, resulting in the wrong behavior if the diagnostic to be modified pertains to some tokens found earlier on the same line. This patch fixes that by addressing two issues: a) libcpp was not assigning a valid location to the CPP_PRAGMA token generated by the _Pragma directive. b) C frontend was not setting input_location to something reasonable. With this change, the C frontend is able to change input_location to point to the _Pragma token as needed. This is just a two-line fix (one for each of a) and b)), the testsuite changes were needed only because the location on the tested warnings has been somewhat improved, so the tests need to look for the new locations. gcc/c/ChangeLog: PR preprocessor/97498 * c-parser.cc (c_parser_pragma): Set input_location to the location of the pragma, rather than the start of the line. libcpp/ChangeLog: PR preprocessor/97498 * directives.cc (destringize_and_run): Override the location of the CPP_PRAGMA token from a _Pragma directive to the location of the expansion point, as is done for the tokens lexed from it. gcc/testsuite/ChangeLog: PR preprocessor/97498 * c-c++-common/pr97498.c: New test. * c-c++-common/
Re: [PATCH v3] loongarch: fix mulsidi3_64bit instruction
在 2022/7/9 上午10:56, Xi Ruoyao 写道: v3: Relax scan-assembler pattern in test case mulw_d_w.c. It's because multiplication is Abelian and the compiler may switch the order of operands in the future. -- >8 -- (mult (sign_extend:DI rj:SI) (sign_extend:DI rk:SI)) should be "mulw.d.w", not "mul.d". gcc/ChangeLog: * config/loongarch/loongarch.md (mulsidi3_64bit): Use mulw.d.w instead of mul.d. gcc/testsuite/ChangeLog: * gcc.target/loongarch/mulw_d_w.c: New test. * gcc.c-torture/execute/mul-sext.c: New test. --- I think there is no problem with this modification. Thankes!
Re: [PATCH 0/2] loongarch: improve code generation for integer division
在 2022/7/7 上午10:23, Xi Ruoyao 写道: We were generating some unnecessary instructions for integer division. These two patches improve the code generation to compile template T div(T a, T b) { return a / b; } into a single division instruction (along with a return instruction of course) as we expected for T in {int32_t, uint32_t, int64_t}. Bootstrapped and regtested on loongarch64-linux-gnu. Ok for trunk? Xi Ruoyao (2): loongarch: add alternatives for idiv insns to improve code generation loongarch: avoid unnecessary sign-extend after 32-bit division gcc/config/loongarch/loongarch-protos.h| 1 + gcc/config/loongarch/loongarch.cc | 2 +- gcc/config/loongarch/loongarch.md | 34 -- gcc/testsuite/gcc.target/loongarch/div-1.c | 9 ++ gcc/testsuite/gcc.target/loongarch/div-2.c | 9 ++ gcc/testsuite/gcc.target/loongarch/div-3.c | 9 ++ gcc/testsuite/gcc.target/loongarch/div-4.c | 9 ++ 7 files changed, 63 insertions(+), 10 deletions(-) create mode 100644 gcc/testsuite/gcc.target/loongarch/div-1.c create mode 100644 gcc/testsuite/gcc.target/loongarch/div-2.c create mode 100644 gcc/testsuite/gcc.target/loongarch/div-3.c create mode 100644 gcc/testsuite/gcc.target/loongarch/div-4.c LGTM and spec has been tested. Thanks!
Re: PING^2: [PATCH] Add --enable-first-stage-cross configure option
On 1/9/2022 2:26 PM, Serge Belyshev wrote: Ping: [PATCH] Add --enable-first-stage-cross configure option https://gcc.gnu.org/pipermail/gcc-patches/2021-July/575318.html Add --enable-first-stage-cross configure option Build static-only, C-only compiler that is sufficient to cross compile glibc. This option disables various runtime libraries that require libc to compile, turns on --with-newlib, --without-headers, --disable-decimal-float, --disable-shared, --disable-threads, and sets --enable-languages=c. Rationale: current way of building first stage compiler of a cross toolchain requires specifying a list of target libraries that are not going to be compiled due to their dependency on target libc. This list is not documented in gccinstall.texi and sometimes changes. To simplify the procedure, it is better to maintain that list in the GCC itself. Usage example as a patch to glibc's scripts/build-many-libcs.py: diff --git a/scripts/build-many-glibcs.py b/scripts/build-many-glibcs.py index 580d25e8ee..3a6a7be76b 100755 --- a/scripts/build-many-glibcs.py +++ b/scripts/build-many-glibcs.py @@ -1446,17 +1446,7 @@ class Config(object): # required to define inhibit_libc (to stop some parts of # libgcc including libc headers); --without-headers is not # sufficient. -cfg_opts += ['--enable-languages=c', '--disable-shared', - '--disable-threads', - '--disable-libatomic', - '--disable-decimal-float', - '--disable-libffi', - '--disable-libgomp', - '--disable-libitm', - '--disable-libmpx', - '--disable-libquadmath', - '--disable-libsanitizer', - '--without-headers', '--with-newlib', +cfg_opts += ['--enable-first-stage-cross', '--with-glibc-version=%s' % self.ctx.glibc_version ] cfg_opts += self.first_gcc_cfg Bootstrapped/regtested on x86_64-pc-linux-gnu, and tested with build-many-glibcs.py with the above patch. OK for mainline? ChangeLog: * configure.ac: Add --enable-first-stage-cross. * configure: Regenerate. gcc/ChangeLog: * doc/install.texi: Document --enable-first-stage-cross. I'm not really sure we need a patch for this. Isn't it sufficient to "make all-gcc && make all-target-libgcc"? Folks have been doing that for decades. Jeff
Re: [PATCH] middle-end/104854: Avoid overread warning for strnlen and strndup
On 3/9/2022 5:39 PM, Siddhesh Poyarekar wrote: The size argument larger than size of SRC for strnlen and strndup is problematic only if SRC is not NULL terminated, which invokes undefined behaviour. In all other cases, as long as SRC is large enough to have a NULL char (i.e. size 1 or more), a larger N should not invoke a warning during compilation. Such a warning may be a suitable check for the static analyzer instead with slightly different wording suggesting that choice of size argument makes the function call equivalent to strlen/strdup. This change results in the following code going through without a warning: -- char *buf; char * foo (void) { buf = __builtin_malloc (4); __builtin_memset (buf, 'A', 4); return __builtin_strndup (buf, 5); } int main () { __builtin_printf ("%s\n", foo ()); } -- but the problem above is a missing NULL, not N being larger than the size of SRC and the overread warning in this context is confusing at best and misleading (and hinting at the wrong solution) in the worst case. gcc/ChangeLog: middle-end/104854 * gimple-ssa-warn-access.cc (check_access): New parameter. Skip warning if in read-only mode, source string is NULL terminated and has non-zero object size. (check_access): New parameter. (check_access): Adjust. (check_read_access): New parameter. Adjust for check_access change. (pass_waccess::check_builtin): Adjust check_read_access call for memcmp, memchr. (pass_waccess::maybe_check_access_sizes): Likewise. gcc/testsuite/ChangeLog: middle-end/104854 * gcc.dg/Wstringop-overread.c (test_strnlen_array, test_strndup_array): Don't expect warning for non-zero source sizes. * gcc.dg/attr-nonstring-4.c (strnlen_range): Likewise. * gcc.dg/pr78902.c: Likewise. * gcc.dg/warn-strnlen-no-nul.c: Likewise. I know this is old and the BZ has been set as CLOSED/INVALID. But it was in my TODO list, and I've got thoughts here so I might as well chime in ;-) The potential overread warning for that code seems quite reasonable to me. Yes it is the case that the length argument is sometimes unrelated to the source string. But even then where's the line for when we should and should not warn? In my mind these middle end warnings can not and will not ever be 100% accurate. Their goal is to say "hey, can someone please look at this code very closely because it might be wrong" -- and that's probably about the best we can do since many of these are trying to infer intent of the programmer. It almost feels like we want not just to have finer grained control over this particular class of wrnings, but even within the class we may want levels (and just to be clear, I'm not generally a fan of leveled warnings as the higher levels rarely get used in practice). The different levels could correspond to our ability to discern the source of the length operand -- ie, it is related to the source operand, some other operand or a constant and we could potentially treat each of those cases differently. Jeff
Re: [PING^2][PATCH][final] Handle compiler-generated asm insn
On 3/21/2022 10:14 AM, Tom de Vries via Gcc-patches wrote: On 3/21/22 14:49, Richard Biener wrote: On Mon, Mar 21, 2022 at 12:50 PM Tom de Vries wrote: On 3/21/22 08:58, Richard Biener wrote: On Thu, Mar 17, 2022 at 4:10 PM Tom de Vries via Gcc-patches wrote: On 3/9/22 13:50, Tom de Vries wrote: On 2/22/22 14:55, Tom de Vries wrote: Hi, For the nvptx port, with -mptx-comment we have in pr53465.s: ... // #APP // 9 "gcc/testsuite/gcc.c-torture/execute/pr53465.c" 1 // Start: Added by -minit-regs=3: // #NO_APP mov.u32 %r26, 0; // #APP // 9 "gcc/testsuite/gcc.c-torture/execute/pr53465.c" 1 // End: Added by -minit-regs=3: // #NO_APP ... The comments where generated using the compiler-generated equivalent of: ... asm ("// Comment"); ... but both the printed location and the NO_APP/APP are unnecessary for a compiler-generated asm insn. Fix this by handling ASM_INPUT_SOURCE_LOCATION == UNKNOWN_LOCATION in final_scan_insn_1, such what we simply get: ... // Start: Added by -minit-regs=3: mov.u32 %r26, 0; // End: Added by -minit-regs=3: ... Tested on nvptx. OK for trunk? Ping^2. Tobias just reported an ICE in PR104968, and this patch fixes it. I'd like to known whether this patch is acceptable for stage 4 or not. If not, I need to fix PR104968 in a different way. Say, disable -mcomment by default, or trying harder to propagate source info on outlined functions. Hi, thanks for the review. Usually targets use UNSPECs to emit compiler-generated "asm" instructions. Ack. [ I could go down that route eventually, but for now I'm hoping to implement this without having to change the port. ] I think an unknown location is a reasonable but not the best way to identify 'compiler-generated', we might lose the location through optimization. (why does it not use the INSN_LOCATION?) I don't know. FWIW, at the time that ASM_INPUT_SOURCE_LOCATION was introduced (2007), there was no INSN_LOCATION yet (introduced in 2012), only INSN_LOCATOR, my guess is that it has something to do with that. Rather than a location I'd use sth like DECL_ARTIFICIAL to disable 'user-mangling', do we have something like that for ASM or an insn in general? Haven't found it. If not maybe there's an unused bit on ASMs we can enable this way. Done. I've used the jump flag for that. Updated, untested patch attached. Is this what you meant? Hmm. I now read that ASM_INPUT is in every PATTERN of an insn Maybe I misunderstand, but that sounds incorrect to me. That is, can you point me to where you read that? Maybe you're referring to the fact that an ASM_INPUT may occur inside an ASM_OPERANDS, as "a convenient way to hold a string" (quoting rtl.def)? and wonder how this all works out there. That is, by default the ASM_INPUT would be artificial (for regular define_insn) but asm("") in source would mark them ASM_INPUT_USER_P or so. If you're suggesting to make it by default artificial, then that doesn't sound like a bad idea to me. In this iteration I haven't implemented this (yet), but instead explicitly marked as artificial some other uses of ASM_INPUT. But then I know nothing here. I did expect us to look at ASM_OPERANDS instead of just ASM_INPUT (but the code you are changing is about ASM_INPUT). I extended the rationale in the commit log a bit to include a description of what the rtl-equivalent of 'asm ("// Comment")' looks like, and there's no ASM_OPERANDS there. That said, the comments should probably explicitely say this is about ASM_INPUT in an ASM_OPERANDS instruction template, not some other pattern. AFAIU, this isn't about an ASM_INPUT in an ASM_OPERANDS instruction template, so at this point I haven't updated the comment. Thanks, - Tom 0015-final-Handle-compiler-generated-asm-insn.patch [final] Handle compiler-generated asm insn For the nvptx port, with -mptx-comment we have for test-case pr53465.c at mach: ... (insn 66 43 65 3 (asm_input ("// Start: Added by -minit-regs=3:")) -1 (nil)) (insn 65 66 67 3 (set (reg/v:SI 26 [ d ]) (const_int 0 [0])) 6 {*movsi_insn} (nil)) (insn 67 65 44 3 (asm_input ("// End: Added by -minit-regs=3:")) -1 (nil)) ... and in pr53465.s: ... // #APP // 9 "gcc/testsuite/gcc.c-torture/execute/pr53465.c" 1 // Start: Added by -minit-regs=3: // #NO_APP mov.u32 %r26, 0; // #APP // 9 "gcc/testsuite/gcc.c-torture/execute/pr53465.c" 1 // End: Added by -minit-regs=3: // #NO_APP ... [ The comment insns were modelled after: ... asm ("// Comment"); ... which expands to: ... (insn 5 2 6 2 (parallel [ (asm_input/v ("// Comment") test.c:4) (clobber (mem:BLK (scratch) [0 A8])) ]) "test.c":4:3 -1 (nil)) ... Note btw the differences: the comment insn has no clobber, and A
Re: [PATCH v3] loongarch: fix mulsidi3_64bit instruction
On Sun, 2022-07-10 at 09:45 +0800, Lulu Cheng wrote: > > 在 2022/7/9 上午10:56, Xi Ruoyao 写道: > > v3: Relax scan-assembler pattern in test case mulw_d_w.c. It's > > because > > multiplication is Abelian and the compiler may switch the order of > > operands in the future. > > -- >8 -- > > > > (mult (sign_extend:DI rj:SI) (sign_extend:DI rk:SI)) should be > > "mulw.d.w", not "mul.d". > > > > gcc/ChangeLog: > > > > * config/loongarch/loongarch.md (mulsidi3_64bit): Use > > mulw.d.w > > instead of mul.d. > > > > gcc/testsuite/ChangeLog: > > > > * gcc.target/loongarch/mulw_d_w.c: New test. > > * gcc.c-torture/execute/mul-sext.c: New test. > > --- > > I think there is no problem with this modification. > > Thankes! > Pushed r13-1591 and r12-8562. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: [PATCH] Implement global ranges for all vrange types (SSA_NAME_RANGE_INFO).
On 7/9/2022 1:31 PM, Aldy Hernandez wrote: On Sat, Jul 9, 2022 at 6:16 PM Jeff Law via Gcc-patches wrote: On 7/6/2022 11:10 AM, Aldy Hernandez via Gcc-patches wrote: Currently SSA_NAME_RANGE_INFO only handles integer ranges, and loses half the precision in the process because its use of legacy value_range's. This patch rewrites all the SSA_NAME_RANGE_INFO (nonzero bits included) to use the recently contributed vrange_storage. With it, we'll be able to efficiently save any ranges supported by ranger in GC memory. Presently this will only be irange's, but shortly we'll add floating ranges and others to the mix. As per the discussion with the trailing_wide_ints adjustments and vrange_storage, we'll be able to save integer ranges with a maximum of 5 sub-ranges. This could be adjusted later if more sub-ranges are needed (unlikely). Since this is a behavior changing patch, I would like to take a few days for discussion, and commit early next week if all goes well. A few notes. First, we get rid of the SSA_NAME_ANTI_RANGE_P bit in the SSA_NAME since we store full resolution ranges. Perhaps it could be re-used for something else. The range_info_def struct is gone in favor of an opaque type handled by vrange_storage. It currently supports irange, but will support frange, prange, etc, in due time. From the looks of it, set_range_info was an update operation despite its name, as we improved the nonzero bits with each call, even though we clobbered the ranges. Presumably this was because doing a proper intersect of ranges lost information with the anti-range hack. We no longer have this limitation so now we formalize both set_range_info and set_nonzero_bits to an update operation. After all, we should never be losing information, but enhancing it whenever possible. This means, that if folks' finger-memory is not offended, as a follow-up, I'd like to rename set_nonzero_bits and set_range_info to update_*. I have kept the same global API we had in tree-ssanames.h, with the caveat that all set operations are now update as discussed above. There is a 2% performance penalty for evrp and a 3% penalty for VRP that is coincidentally in line with a previous improvement of the same amount in the vrange abstraction patchset. Interestingly, this penalty is mostly due to the wide int to tree dance we keep doing with irange and legacy. In a first draft of this patch where I was streaming trees directly, there was actually a small improvement instead. I hope to get some of the gain back when we move irange's to wide-ints, though I'm not in a hurry ;-). Tested and benchmarked on x86-64 Linux. I will also test on ppc64le before the final commit. Comments welcome. gcc/ChangeLog: * gimple-range.cc (gimple_ranger::export_global_ranges): Remove verification against legacy value_range. * tree-core.h (struct range_info_def): Remove. (struct irange_storage_slot): New. (struct tree_base): Remove SSA_NAME_ANTI_RANGE_P documentation. (struct tree_ssa_name): Add vrange_storage support. * tree-ssanames.cc (range_info_p): New. (range_info_fits_p): New. (range_info_alloc): New. (range_info_free): New. (range_info_get_range): New. (range_info_set_range): New. (set_range_info_raw): Remove. (set_range_info): Adjust to use vrange_storage. (set_nonzero_bits): Same. (get_nonzero_bits): Same. (duplicate_ssa_name_range_info): Remove overload taking value_range_kind. Rewrite tree overload to use vrange_storage. (duplicate_ssa_name_fn): Adjust to use vrange_storage. * tree-ssanames.h (struct range_info_def): Remove. (set_range_info): Adjust prototype to take vrange. * tree-vrp.cc (vrp_asserts::remove_range_assertions): Call duplicate_ssa_name_range_info. * tree.h (SSA_NAME_ANTI_RANGE_P): Remove. (SSA_NAME_RANGE_TYPE): Remove. * value-query.cc (get_ssa_name_range_info): Adjust to use vrange_storage. (update_global_range): Use int_range_max. (get_range_global): Remove as_a. I'll be so happy once we don't have to keep doing the conversions between the types. Anti-ranges no more! Yeah, it took a little longer than the 6 weeks Andrew had estimated originally :-P. Note that anti range kinda sorta still exist in two forms: a) If you use value_range, as it still uses the legacy stuff underneath. But any new consumers (evrp, DOM, etc), all pass an int_range or an int_range_max, so anyone who cares about ranges should never see an anti range. Later this cycle value_range will be typedefed to what is now Value_Range, which is an infinite precision range that works for all types the ranger supports. So anti-ranges here will die a quick death. b) There are some passes which still use the deprecated irange::kind(). This method will return VR_ANTI_RANGE if the range looks like this [-MIN, 123][567,+MAX]. But kind() is
Re: [PATCH 0/2] loongarch: improve code generation for integer division
On Sun, 2022-07-10 at 10:20 +0800, Lulu Cheng wrote: > > 在 2022/7/7 上午10:23, Xi Ruoyao 写道: > > We were generating some unnecessary instructions for integer > > division. > > These two patches improve the code generation to compile > > > > template T div(T a, T b) { return a / b; } > > > > into a single division instruction (along with a return instruction > > of > > course) as we expected for T in {int32_t, uint32_t, int64_t}. > > > > Bootstrapped and regtested on loongarch64-linux-gnu. Ok for trunk? > > > > Xi Ruoyao (2): > > loongarch: add alternatives for idiv insns to improve code > > generation > > loongarch: avoid unnecessary sign-extend after 32-bit division > > > > gcc/config/loongarch/loongarch-protos.h | 1 + > > gcc/config/loongarch/loongarch.cc | 2 +- > > gcc/config/loongarch/loongarch.md | 34 -- > > > > gcc/testsuite/gcc.target/loongarch/div-1.c | 9 ++ > > gcc/testsuite/gcc.target/loongarch/div-2.c | 9 ++ > > gcc/testsuite/gcc.target/loongarch/div-3.c | 9 ++ > > gcc/testsuite/gcc.target/loongarch/div-4.c | 9 ++ > > 7 files changed, 63 insertions(+), 10 deletions(-) > > create mode 100644 gcc/testsuite/gcc.target/loongarch/div-1.c > > create mode 100644 gcc/testsuite/gcc.target/loongarch/div-2.c > > create mode 100644 gcc/testsuite/gcc.target/loongarch/div-3.c > > create mode 100644 gcc/testsuite/gcc.target/loongarch/div-4.c > > > LGTM and spec has been tested. > > Thanks! Pushed r13-1592 and r13-1593. -- Xi Ruoyao School of Aerospace Science and Technology, Xidian University
Re: [PATCH] c: Fix location for _Pragma tokens [PR97498]
On 7/9/2022 2:52 PM, Lewis Hyatt via Gcc-patches wrote: Hello- PR97498 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97498) is another PR related to the fact that imprecise locations for _Pragma result in counterintuitive behavior for GCC diagnostic pragmas, which inhibit the ability to make convenient wrapper macros for enabling and disabling diagnostics in specific scopes. It looks like David did a lot of work a few years ago improving this (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69543 and https://gcc.gnu.org/bugzilla/show_bug.cgi?id=69558), and in particular r233637 added a lot of new test coverage for cases that had regressed in the past. I think the main source of problems for all remaining issues is that we use the global input_location for deciding when/if a diagnostic should apply. I think it should be eventually doable to eliminate this, and rather properly resolve the token locations to the place they need to be I've long wanted to see our dependency on input_location be diminished with the goal of making it go away completely. so that _Pragma type wrapper macros just work the way people expect. Certainly desirable since many projects have built wrapper macros which use Pragmas to control warnings. One of the biggest QOI implementation details we've had with the warnings has been problems with location data leading to an inability to turn them off in specific locations. So I'm all for improvements, in terms of getting our location data more correct. That said, PR97498 can be solved easily with a 2-line fix without removing input_location, and I think the resulting change to input_location's value is an improvement that will benefit other areas, so I thought I'd see what you think about this patch please? Here is a typical testcase. Note the line continuations so it's all one logical line. === _Pragma("GCC diagnostic push") \ _Pragma("GCC diagnostic ignored \"-Wunused-function\"") \ static void f() {} \ _Pragma("GCC diagnostic pop") === What happens is that in C++ mode, input_location is always updated to the most recently-lexed token, so the above case works fine and does not warn when compiled with "g++ -Wunused-functions". However, in C mode, it does warn because input_location in C is almost always set to the start of the line, and is in this case. So the pop is deemed to take place prior to the definition of f(). Initially, I thought it best to change input_location for C mode to behave like C++, and always update to the most recently lexed token. Maybe that's still the right way to go, but there was a fair amount of testsuite fallout from that. Most of it, was just that we would need to change the tests to look for the new locations, and in many cases, the new locations seemed preferable to the old ones, but it seemed a bit much for now, so I took a more measured approach and just changed input_location in the specific case of processing a pragma, to be the location of the CPP_PRAGMA token. Unfortunately, it turns out that the CPP_PRAGMA token that libcpp provides to represent the _Pragma() expression doesn't have a valid location with which input_location could be overridden. Looking into that, in r232893 David added logic which sets the location of all tokens inside the _Pragma(...) to a reasonable place (namely it points to "_Pragma" at the expansion point). However, that patch didn't change the location of the CPP_PRAGMA token itself to similarly point there, so the 2nd line of this patch does that. The rest of it is just tweaking a couple tests which were sensitive to the location being output. In all these cases, the new locations seem more informative to me than the old ones. With those tweaks, bootstrap + regtest all languages looks good with no regressions. Please let me know what you think? Thanks! gcc/c/ChangeLog: PR preprocessor/97498 * c-parser.cc (c_parser_pragma): Set input_location to the location of the pragma, rather than the start of the line. libcpp/ChangeLog: PR preprocessor/97498 * directives.cc (destringize_and_run): Override the location of the CPP_PRAGMA token from a _Pragma directive to the location of the expansion point, as is done for the tokens lexed from it. gcc/testsuite/ChangeLog: PR preprocessor/97498 * c-c++-common/pr97498.c: New test. * c-c++-common/gomp/pragma-3.c: Adapt for improved warning locations. * c-c++-common/gomp/pragma-5.c: Likewise. * gcc.dg/pragma-message.c: Likewise. libgomp/ChangeLog: * testsuite/libgomp.oacc-c-c++-common/reduction-5.c: Adapt for improved warning locations. * testsuite/libgomp.oacc-c-c++-common/vred2d-128.c: Likewise. OK for the trunk. Thanks for digging into this! jeff
Re: [PATCH] middle-end/104854: Avoid overread warning for strnlen and strndup
On 10/07/2022 08:59, Jeff Law via Gcc-patches wrote: On 3/9/2022 5:39 PM, Siddhesh Poyarekar wrote: The size argument larger than size of SRC for strnlen and strndup is problematic only if SRC is not NULL terminated, which invokes undefined behaviour. In all other cases, as long as SRC is large enough to have a NULL char (i.e. size 1 or more), a larger N should not invoke a warning during compilation. Such a warning may be a suitable check for the static analyzer instead with slightly different wording suggesting that choice of size argument makes the function call equivalent to strlen/strdup. This change results in the following code going through without a warning: -- char *buf; char * foo (void) { buf = __builtin_malloc (4); __builtin_memset (buf, 'A', 4); return __builtin_strndup (buf, 5); } int main () { __builtin_printf ("%s\n", foo ()); } -- but the problem above is a missing NULL, not N being larger than the size of SRC and the overread warning in this context is confusing at best and misleading (and hinting at the wrong solution) in the worst case. gcc/ChangeLog: middle-end/104854 * gimple-ssa-warn-access.cc (check_access): New parameter. Skip warning if in read-only mode, source string is NULL terminated and has non-zero object size. (check_access): New parameter. (check_access): Adjust. (check_read_access): New parameter. Adjust for check_access change. (pass_waccess::check_builtin): Adjust check_read_access call for memcmp, memchr. (pass_waccess::maybe_check_access_sizes): Likewise. gcc/testsuite/ChangeLog: middle-end/104854 * gcc.dg/Wstringop-overread.c (test_strnlen_array, test_strndup_array): Don't expect warning for non-zero source sizes. * gcc.dg/attr-nonstring-4.c (strnlen_range): Likewise. * gcc.dg/pr78902.c: Likewise. * gcc.dg/warn-strnlen-no-nul.c: Likewise. I know this is old and the BZ has been set as CLOSED/INVALID. But it was in my TODO list, and I've got thoughts here so I might as well chime in ;-) The potential overread warning for that code seems quite reasonable to me. Yes it is the case that the length argument is sometimes unrelated to the source string. But even then where's the line for when we should and should not warn? The argument I was trying to make in the context of strnlen and strndup was that it is more likely in practice for the length argument to be a function of some other property, e.g. a destination buffer or an external limit that it is to be related to the source string. However I don't have any concrete evidence (or the cycles to find it at the moment) to either back up my claim or refute it. strndup for example seems popular for a substring alloc+copy and also for a general string copy with an application-specific upper bound, e.g. PATH_MAX. Thanks, Sid