[PATCH] [i386] Optimize __builtin_shuffle when it's used to zero the upper bits of the dest. [PR target/94680]
Hi: If the second operand of __builtin_shuffle is const vector 0, and with specific mask, it can be optimized to movq/vmovps. .i.e. foo128: - vxorps %xmm1, %xmm1, %xmm1 - vmovlhps%xmm1, %xmm0, %xmm0 + vmovq %xmm0, %xmm0 foo256: - vxorps %xmm1, %xmm1, %xmm1 - vshuff32x4 $0, %ymm1, %ymm0, %ymm0 + vmovaps %xmm0, %xmm0 foo512: - vxorps %xmm1, %xmm1, %xmm1 - vshuff32x4 $68, %zmm1, %zmm0, %zmm0 + vmovaps %ymm0, %ymm0 Bootstrapped and regtested on x86-64_iinux-gnu{-m32,}. Ok for trunk? gcc/ChangeLog: PR target/94680 * config/i386/sse.md (ssedoublevecmode): Add attribute for V64QI/V32HI/V16SI/V4DI. (ssehalfvecmode): Add attribute for V2DI/V2DF. (*vec_concatv4si_0): Extend to VI124_128. (*vec_concat_0): New pre-reload splitter. * config/i386/predicates.md (movq_parallel): New predicate. gcc/testsuite/ChangeLog: PR target/94680 * gcc.target/i386/avx-pr94680.c: New test. * gcc.target/i386/avx512f-pr94680.c: New test. * gcc.target/i386/sse2-pr94680.c: New test. -- BR, Hongtao From eec5469cdeecf0e6650e9d2963dea4117919c5d2 Mon Sep 17 00:00:00 2001 From: liuhongt Date: Thu, 22 Apr 2021 15:33:16 +0800 Subject: [PATCH] [i386] Optimize __builtin_shuffle when it's used to zero the upper bits of the dest. [PR target/94680] If the second operand of __builtin_shuffle is const vector 0, and with specific mask, it can be optimized to movq/vmovps. .i.e. foo128: - vxorps %xmm1, %xmm1, %xmm1 - vmovlhps%xmm1, %xmm0, %xmm0 + vmovq %xmm0, %xmm0 foo256: - vxorps %xmm1, %xmm1, %xmm1 - vshuff32x4 $0, %ymm1, %ymm0, %ymm0 + vmovaps %xmm0, %xmm0 foo512: - vxorps %xmm1, %xmm1, %xmm1 - vshuff32x4 $68, %zmm1, %zmm0, %zmm0 + vmovaps %ymm0, %ymm0 gcc/ChangeLog: PR target/94680 * config/i386/sse.md (ssedoublevecmode): Add attribute for V64QI/V32HI/V16SI/V4DI. (ssehalfvecmode): Add attribute for V2DI/V2DF. (*vec_concatv4si_0): Extend to VI124_128. (*vec_concat_0): New pre-reload splitter. * config/i386/predicates.md (movq_parallel): New predicate. gcc/testsuite/ChangeLog: PR target/94680 * gcc.target/i386/avx-pr94680.c: New test. * gcc.target/i386/avx512f-pr94680.c: New test. * gcc.target/i386/sse2-pr94680.c: New test. --- gcc/config/i386/predicates.md | 33 gcc/config/i386/sse.md| 37 +++-- gcc/testsuite/gcc.target/i386/avx-pr94680.c | 59 ++ .../gcc.target/i386/avx512f-pr94680.c | 78 +++ gcc/testsuite/gcc.target/i386/sse2-pr94680.c | 51 5 files changed, 250 insertions(+), 8 deletions(-) create mode 100644 gcc/testsuite/gcc.target/i386/avx-pr94680.c create mode 100644 gcc/testsuite/gcc.target/i386/avx512f-pr94680.c create mode 100644 gcc/testsuite/gcc.target/i386/sse2-pr94680.c diff --git a/gcc/config/i386/predicates.md b/gcc/config/i386/predicates.md index b1df8548af6..4b706003ed8 100644 --- a/gcc/config/i386/predicates.md +++ b/gcc/config/i386/predicates.md @@ -1524,6 +1524,39 @@ (define_predicate "misaligned_operand" (and (match_code "mem") (match_test "MEM_ALIGN (op) < GET_MODE_BITSIZE (mode)"))) +;; Return true if OP is a parallel for an mov{d,q,dqa,ps,pd} vec_select, +;; where one of the two operands of the vec_concat is const0_operand. +(define_predicate "movq_parallel" + (match_code "parallel") +{ + unsigned nelt = XVECLEN (op, 0); + unsigned nelt2 = nelt >> 1; + unsigned i; + + if (nelt < 2) +return false; + + /* Validate that all of the elements are constants, + lower halves of permute are lower halves of the first operand, + upper halves of permute come from any of the second operand. */ + for (i = 0; i < nelt; ++i) +{ + rtx er = XVECEXP (op, 0, i); + unsigned HOST_WIDE_INT ei; + + if (!CONST_INT_P (er)) + return 0; + ei = INTVAL (er); + if (i < nelt2 && ei != i) + return 0; + if (i >= nelt2 + && (ei < nelt || ei >= nelt<<1)) + return 0; +} + + return 1; +}) + ;; Return true if OP is a vzeroall operation, known to be a PARALLEL. (define_predicate "vzeroall_operation" (match_code "parallel") diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 9d3728d1cb0..b55636a3e12 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -812,19 +812,22 @@ (define_mode_attr sseintvecmodelower ;; Mapping of vector modes to a vector mode of double size (define_mode_attr ssedoublevecmode - [(V32QI "V64QI") (V16HI "V32HI") (V8SI "V16SI") (V4DI "V8DI") + [(V64QI "V128QI") (V32HI "V64HI") (V16SI "V32SI") (V8DI "V16DI") + (V32QI "V64QI") (V16HI "V32HI") (V8SI "V16SI") (V4DI "V8DI") (V16QI "V32QI") (V8HI "V16HI") (V4SI "V8SI") (V2DI "V4DI") + (V16SF "V32SF") (V8DF "V16DF") (V8SF "V16SF") (V4DF "V8DF") (V4SF "V8SF") (V2DF "V4DF")]) ;;
Re: [PATCH] Use STATIC_ASSERT for OVL_OP_MAX.
On 4/22/21 4:55 AM, Martin Liška wrote: There's an updated version of the patch, Jonathan noticed correctly the comment related to assert was not correct. Subject: [PATCH] Use STATIC_ASSERT for OVL_OP_MAX. Subject line needs "c++:" Please also include the rationale from your first message before the ChangeLog entries. OK with those adjustments. Jason
[PATCH] Switch AIX configuration to DWARF2 debugging
As requested at the end of Stage 4, this patch changes the debugging format for AIX configuration of GCC to "DWARF2". This is in preparation for removing stabs debugging support from GCC. The rs6000 configuration files remain somewhat intertwined with the stabs debugging support, but the configuration no longer generates stabs debugging information. I have been bootstrapping and testing GCC with this configuration for years. This patch means that earlier releases (Technology Levels) of AIX 7.1 and 7.2, prior to DWARF support and fixes, cannot build GCC or support GCC. Thanks, David * config/rs6000/aix71.h (PREFERRED_DEBUGGING_TYPE): Change to DWARF2_DEBUG. * config/rs6000/aix72.h (PREFERRED_DEBUGGING_TYPE): Same. diff --git a/gcc/config/rs6000/aix71.h b/gcc/config/rs6000/aix71.h index 3612ed2593b..807e260a175 100644 --- a/gcc/config/rs6000/aix71.h +++ b/gcc/config/rs6000/aix71.h @@ -272,9 +272,9 @@ extern long long intatoll(const char *); #define TARGET_AIX_VERSION 71 -/* AIX 7.1 supports DWARF3 debugging, but XCOFF remains the default. */ +/* AIX 7.1 supports DWARF3+ debugging. */ #define DWARF2_DEBUGGING_INFO 1 -#define PREFERRED_DEBUGGING_TYPE XCOFF_DEBUG +#define PREFERRED_DEBUGGING_TYPE DWARF2_DEBUG #define DEBUG_INFO_SECTION "0x1" #define DEBUG_LINE_SECTION "0x2" #define DEBUG_PUBNAMES_SECTION "0x3" diff --git a/gcc/config/rs6000/aix72.h b/gcc/config/rs6000/aix72.h index d34909283cc..36c5d994439 100644 --- a/gcc/config/rs6000/aix72.h +++ b/gcc/config/rs6000/aix72.h @@ -273,9 +273,9 @@ extern long long intatoll(const char *); #define TARGET_AIX_VERSION 72 -/* AIX 7.2 supports DWARF3 debugging, but XCOFF remains the default. */ +/* AIX 7.2 supports DWARF3+ debugging. */ #define DWARF2_DEBUGGING_INFO 1 -#define PREFERRED_DEBUGGING_TYPE XCOFF_DEBUG +#define PREFERRED_DEBUGGING_TYPE DWARF2_DEBUG #define DEBUG_INFO_SECTION "0x1" #define DEBUG_LINE_SECTION "0x2" #define DEBUG_PUBNAMES_SECTION "0x3"
Re: [PATCH] Use STATIC_ASSERT for OVL_OP_MAX.
On Thu, 22 Apr 2021 14:28:24 -0600 Martin Sebor via Gcc-patches wrote: > > enum E { e = 5 }; > > struct A { E e: 3; }; > > > > constexpr int number_of_bits () > > { > > A a = { }; > > a.e = (E)-1; > > return 32 - __builtin_clz(a.e); > > } > > > > I had the same thought about using clz. It works in this case but > not in if one of the enumerators is negative, or if the underlying > type is signed. or for -fshort-enums in this case as is.
Re: [PATCH] Fix logic error in 32-bit trampolines, PR target/98952
On Fri, Apr 09, 2021 at 05:09:07PM -0400, Michael Meissner wrote: > Fix logic error in 32-bit trampolines, PR target/98952. > > The test in the PowerPC 32-bit trampoline support is backwards. It aborts > if the trampoline size is greater than the expected size. It should abort > when the trampoline size is less than the expected size. > PR target/98952 > * config/rs6000/tramp.S (__trampoline_setup): Fix trampoline size > comparison in 32-bit. > --- a/libgcc/config/rs6000/tramp.S > +++ b/libgcc/config/rs6000/tramp.S > @@ -64,8 +64,7 @@ FUNC_START(__trampoline_setup) > mflr r11 > addi r7,r11,trampoline_initial-4-.LCF0 /* trampoline address -4 */ > > - li r8,trampoline_size /* verify that the trampoline is big > enough */ > - cmpwcr1,r8,r4 > + cmpwi cr1,r4,trampoline_size /* verify that the trampoline is big > enough */ > srwir4,r4,2 /* # words to move */ > addir9,r3,-4/* adjust pointer for lwzu */ > mtctr r4 As Will says, it looks like the ELFv2 version has the same bug. Please fix that the same way. In the commit message and the changelog, point out that you folded the cmp with the li while you were at it. It is easier to read code like this so the change is fine, but do point it out. Can you test this in a testcase somehow? That would have found the ELFv2 case, for example. Okay for trunk. Okay for backport to 11 when that branch opens again. Does this need more backports? (Those should follow after 11 of course). Thanks, Segher
Re: [PATCH] Fix logic error in 32-bit trampolines, PR target/98952
On Mon, Apr 12, 2021 at 05:02:38PM -0500, will schmidt wrote: > On Fri, 2021-04-09 at 17:09 -0400, Michael Meissner wrote: > > - li r8,trampoline_size /* verify that the trampoline is big > > enough */ > > - cmpwcr1,r8,r4 > > + cmpwi cr1,r4,trampoline_size /* verify that the trampoline is big > > enough */ > > Hmm, I spent several minutes trying to determine how cmpw behaves > differently than cmpwi before noticing you also swapped the > order of the r4,r8 operands. > > That seems OK. > > A statement in the description indicating that you used a cmpwi instead > of a cmpw since you were in the neighborhood would help call that out. In general, don't do two unrelated things, esp. when not pointing it out explicitly. > The #elif _CALL_ELF == 2 portion of tramp.S (line 159 or so) has a > similar compare stanza with respect to the order of operands on the > compare. Will this also have a backwards greater-than less-than issue? > > li r8,trampoline_size /* verify that the trampoline is big > enough */ > cmpwcr1,r8,r4 > srwir4,r4,3 /* # doublewords to move */ > addir9,r3,-8/* adjust pointer for stdu */ > mtctr r4 > blt cr1,.Labort Not sure... It should use cmpwi as well though, and then it is easier to see. Segher
[wwwdocs] IPA/LTO/profile-feedback changes
Hi, this patch adds changesentry for IPA/LTO and FDO. Honza diff --git a/htdocs/gcc-11/changes.html b/htdocs/gcc-11/changes.html index 6f58cfe8..bba16ead 100644 --- a/htdocs/gcc-11/changes.html +++ b/htdocs/gcc-11/changes.html @@ -170,6 +170,37 @@ a work-in-progress. use -g together with -gdwarf-2, -gdwarf-3 or -gdwarf-4. + +Inter-procedural optimization improvements: + + New IPA-modref pass was added to track side-effects of function calls + and improve precision of points-to-analysis. Pass can be controlled + by -fipa-modref attribute. + + Identical code folding pass was significantly improved to increase number of + unified functions and to reduce compile-time memory use. + IPA-CP heuristics improved its estimation of potential usefulness of + known loop bounds and strides by taking into account the estimated + frequency of these loops. + + +Link-time optimization improvements: + + LTO bytecode file format was optimized for smaller object files and + faster streaming. + Memory allocation of the linking stage was improved to reduce peak + memory use. + + + +Profile driven optimization improvements: + + +Using https://gcc.gnu.org/onlinedocs/gcc-10.1.0/gcc/Optimize-Options.html#index-fprofile-values;>-fprofile-values, + was improved by tracking more target values for e.g. indirect calls. + + +
Re: [PATCH] c++: Hard error with tentative parse and CTAD [PR87709]
On 4/22/21 11:34 AM, Patrick Palka wrote: As described in detail in comment #4 of this PR, when tentatively parsing a construct that can either be a type or an expression, if during the type parse we encounter an unexpected template placeholder, we need to simulate an error rather than issue a real error because the subsequent expression parse can still succeed. Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for trunk? OK. gcc/cp/ChangeLog: PR c++/87709 * parser.c (cp_parser_type_id_1): If we see a template placeholder, first try simulating an error before issuing a real error. gcc/testsuite/ChangeLog: PR c++/87709 * g++.dg/cpp1z/class-deduction86.C: New test. --- gcc/cp/parser.c| 11 +++ gcc/testsuite/g++.dg/cpp1z/class-deduction86.C | 16 2 files changed, 23 insertions(+), 4 deletions(-) create mode 100644 gcc/testsuite/g++.dg/cpp1z/class-deduction86.C diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c index fba516efa23..e1b1617da68 100644 --- a/gcc/cp/parser.c +++ b/gcc/cp/parser.c @@ -23270,10 +23270,13 @@ cp_parser_type_id_1 (cp_parser *parser, cp_parser_flags flags, location_t loc = type_specifier_seq.locations[ds_type_spec]; if (tree tmpl = CLASS_PLACEHOLDER_TEMPLATE (auto_node)) { - error_at (loc, "missing template arguments after %qT", - auto_node); - inform (DECL_SOURCE_LOCATION (tmpl), "%qD declared here", - tmpl); + if (!cp_parser_simulate_error (parser)) + { + error_at (loc, "missing template arguments after %qT", + auto_node); + inform (DECL_SOURCE_LOCATION (tmpl), "%qD declared here", + tmpl); + } } else if (parser->in_template_argument_list_p) error_at (loc, "%qT not permitted in template argument", diff --git a/gcc/testsuite/g++.dg/cpp1z/class-deduction86.C b/gcc/testsuite/g++.dg/cpp1z/class-deduction86.C new file mode 100644 index 000..a198ed24ec6 --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp1z/class-deduction86.C @@ -0,0 +1,16 @@ +// PR c++/87709 +// { dg-do compile { target c++17 } } + +template +struct lit { + lit(T) { } +}; + +template +int operator+(lit, lit) { + return 0; +} + +auto r2 = (lit(0)) + lit(0); + +static_assert(sizeof(lit(0)));
Re: [PATCH] c++: Refine enum direct-list-initialization [CWG2374]
On 4/22/21 10:49 AM, Patrick Palka wrote: This implements the wording changes of CWG2374, which clarifies the wording of P0138 to forbid e.g. direct-list-initialization of a scoped enumeration from a different scoped enumeration. Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for trunk? OK. gcc/cp/ChangeLog: DR 2374 * decl.c (is_direct_enum_init): Check the implicit convertibility requirement added by CWG 2374. gcc/testsuite/ChangeLog: DR 2374 * g++.dg/cpp1z/direct-enum-init2.C: New test. --- gcc/cp/decl.c | 8 +++- gcc/testsuite/g++.dg/cpp1z/direct-enum-init2.C | 8 2 files changed, 15 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/g++.dg/cpp1z/direct-enum-init2.C diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c index b81de8ef934..60dc2bf182d 100644 --- a/gcc/cp/decl.c +++ b/gcc/cp/decl.c @@ -6191,7 +6191,13 @@ is_direct_enum_init (tree type, tree init) && ENUM_FIXED_UNDERLYING_TYPE_P (type) && TREE_CODE (init) == CONSTRUCTOR && CONSTRUCTOR_IS_DIRECT_INIT (init) - && CONSTRUCTOR_NELTS (init) == 1) + && CONSTRUCTOR_NELTS (init) == 1 + /* DR 2374: The single element needs to be implicitly +convertible to the underlying type of the enum. */ + && can_convert_arg (ENUM_UNDERLYING_TYPE (type), + TREE_TYPE (CONSTRUCTOR_ELT (init, 0)->value), + CONSTRUCTOR_ELT (init, 0)->value, + LOOKUP_IMPLICIT, tf_none)) return true; return false; } diff --git a/gcc/testsuite/g++.dg/cpp1z/direct-enum-init2.C b/gcc/testsuite/g++.dg/cpp1z/direct-enum-init2.C new file mode 100644 index 000..5b94a8d00fe --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp1z/direct-enum-init2.C @@ -0,0 +1,8 @@ +// DR 2374 +// { dg-do compile { target c++17 } } + +enum class Orange; +enum class Apple; + +extern Orange o; +Apple a{o}; // { dg-error "cannot convert" }
[PATCH 1/2] Generate overlapping operations between two areas of memory
For op_by_pieces operations between two areas of memory on non-strict alignment target, add -foverlap-op-by-pieces=[off|on|max-memset] to generate overlapping operations to minimize number of operations if it is not a stack push which must not overlap. When operating on LENGTH bytes of memory, -foverlap-op-by-pieces=on starts with the widest usable integer size, MAX_SIZE, for LENGTH bytes and finishes with the smallest usable integer size, MIN_SIZE, for the remaining bytes where MAX_SIZE >= MIN_SIZE. If MIN_SIZE > the remaining bytes, the last operation is performed on MIN_SIZE bytes of overlapping memory from the previous operation. For memset with non-zero byte, -foverlap-op-by-pieces=max-memset generates an overlapping fill with MAX_SIZE if the number of the remaining bytes is greater than one. Tested on Linux/x86-64 with both -foverlap-op-by-pieces enabled and disabled by default. gcc/ PR middl-end/90773 * common.opt (-foverlap-op-by-pieces): New. * expr.c (by_pieces_ninsns): If -foverlap-op-by-pieces is enabled, round up size and alignment to the widest integer mode for maximum size (op_by_pieces_d): Add get_usable_mode, m_push and m_non_zero_memset. (op_by_pieces_d::op_by_pieces_d): Add 2 bool arguments to initialize m_push and m_non_zero_memset. (op_by_pieces_d::get_usable_mode): New. (op_by_pieces_d::run): Use get_usable_mode to get the largest usable integer mode and generate overlapping operations for -foverlap-op-by-pieces. (PUSHG_P): New. (move_by_pieces_d::move_by_pieces_d): Updated for op_by_pieces_d change. (store_by_pieces_d::store_by_pieces_d): Likewise. (clear_by_pieces): Likewsie. * toplev.c (process_options): Issue an error when -foverlap-op-by-pieces is used for strict alignment target. * doc/invoke.texi: Document -foverlap-op-by-pieces. gcc/testsuite/ PR middl-end/90773 * g++.dg/pr90773-1.h: New test. * g++.dg/pr90773-1a.C: Likewise. * g++.dg/pr90773-1b.C: Likewise. * g++.dg/pr90773-1c.C: Likewise. * g++.dg/pr90773-1d.C: Likewise. * gcc.target/i386/pr90773-1.c: Likewise. * gcc.target/i386/pr90773-2.c: Likewise. * gcc.target/i386/pr90773-3.c: Likewise. * gcc.target/i386/pr90773-4.c: Likewise. * gcc.target/i386/pr90773-5.c: Likewise. * gcc.target/i386/pr90773-6.c: Likewise. * gcc.target/i386/pr90773-7.c: Likewise. * gcc.target/i386/pr90773-8.c: Likewise. * gcc.target/i386/pr90773-9.c: Likewise. * gcc.target/i386/pr90773-10.c: Likewise. * gcc.target/i386/pr90773-11.c: Likewise. --- gcc/common.opt | 19 +++ gcc/doc/invoke.texi| 14 ++ gcc/expr.c | 159 - gcc/testsuite/g++.dg/pr90773-1.h | 14 ++ gcc/testsuite/g++.dg/pr90773-1a.C | 13 ++ gcc/testsuite/g++.dg/pr90773-1b.C | 5 + gcc/testsuite/g++.dg/pr90773-1c.C | 5 + gcc/testsuite/g++.dg/pr90773-1d.C | 19 +++ gcc/testsuite/gcc.target/i386/pr90773-1.c | 17 +++ gcc/testsuite/gcc.target/i386/pr90773-10.c | 13 ++ gcc/testsuite/gcc.target/i386/pr90773-11.c | 13 ++ gcc/testsuite/gcc.target/i386/pr90773-2.c | 20 +++ gcc/testsuite/gcc.target/i386/pr90773-3.c | 23 +++ gcc/testsuite/gcc.target/i386/pr90773-4.c | 13 ++ gcc/testsuite/gcc.target/i386/pr90773-5.c | 13 ++ gcc/testsuite/gcc.target/i386/pr90773-6.c | 11 ++ gcc/testsuite/gcc.target/i386/pr90773-7.c | 11 ++ gcc/testsuite/gcc.target/i386/pr90773-8.c | 13 ++ gcc/testsuite/gcc.target/i386/pr90773-9.c | 13 ++ gcc/toplev.c | 8 ++ 20 files changed, 383 insertions(+), 33 deletions(-) create mode 100644 gcc/testsuite/g++.dg/pr90773-1.h create mode 100644 gcc/testsuite/g++.dg/pr90773-1a.C create mode 100644 gcc/testsuite/g++.dg/pr90773-1b.C create mode 100644 gcc/testsuite/g++.dg/pr90773-1c.C create mode 100644 gcc/testsuite/g++.dg/pr90773-1d.C create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-1.c create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-10.c create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-11.c create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-2.c create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-3.c create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-4.c create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-5.c create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-6.c create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-7.c create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-8.c create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-9.c diff --git a/gcc/common.opt b/gcc/common.opt index a75b44ee47e..7f5b38c7810 100644 --- a/gcc/common.opt +++ b/gcc/common.opt @@ -2123,6 +2123,25 @@
[PATCH 2/2] x86: Enable -foverlap-op-by-pieces by default
Add TARGET_PREFER_OVERLAP_OP_BY_PIECES for Alder Lake and Intel core processoes with AVX2 to enable -foverlap-op-by-pieces by default. gcc/ PR middl-end/90773 * config/i386/i386-options.c (ix86_option_override_internal): Enable -foverlap-op-by-pieces by default for TARGET_PREFER_OVERLAP_OP_BY_PIECES. * config/i386/i386.h (TARGET_PREFER_OVERLAP_OP_BY_PIECES): New. * config/i386/x86-tune.def (X86_TUNE_PREFER_OVERLAP_OP_BY_PIECES): New. gcc/testsuite/ PR middl-end/90773 * gcc.target/i386/pr90773-12.c: New test. * gcc.target/i386/pr90773-13.c: Likewise. --- gcc/config/i386/i386-options.c | 3 +++ gcc/config/i386/i386.h | 2 ++ gcc/config/i386/x86-tune.def | 6 ++ gcc/testsuite/gcc.target/i386/pr90773-12.c | 11 +++ gcc/testsuite/gcc.target/i386/pr90773-13.c | 11 +++ 5 files changed, 33 insertions(+) create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-12.c create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-13.c diff --git a/gcc/config/i386/i386-options.c b/gcc/config/i386/i386-options.c index 2a12228d195..5949d2d5597 100644 --- a/gcc/config/i386/i386-options.c +++ b/gcc/config/i386/i386-options.c @@ -2821,6 +2821,9 @@ ix86_option_override_internal (bool main_args_p, if (ix86_indirect_branch != indirect_branch_keep) SET_OPTION_IF_UNSET (opts, opts_set, flag_jump_tables, 0); + if (TARGET_PREFER_OVERLAP_OP_BY_PIECES) +SET_OPTION_IF_UNSET (opts, opts_set, flag_overlap_op_by_pieces, 1); + return true; } diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h index 96b46bac238..cf24fecaddc 100644 --- a/gcc/config/i386/i386.h +++ b/gcc/config/i386/i386.h @@ -304,6 +304,8 @@ extern unsigned char ix86_tune_features[X86_TUNE_LAST]; #define TARGET_SINGLE_STRINGOP ix86_tune_features[X86_TUNE_SINGLE_STRINGOP] #define TARGET_PREFER_KNOWN_REP_MOVSB_STOSB \ ix86_tune_features[X86_TUNE_PREFER_KNOWN_REP_MOVSB_STOSB] +#define TARGET_PREFER_OVERLAP_OP_BY_PIECES \ + ix86_tune_features[X86_TUNE_PREFER_OVERLAP_OP_BY_PIECES] #define TARGET_MISALIGNED_MOVE_STRING_PRO_EPILOGUES \ ix86_tune_features[X86_TUNE_MISALIGNED_MOVE_STRING_PRO_EPILOGUES] #define TARGET_QIMODE_MATH ix86_tune_features[X86_TUNE_QIMODE_MATH] diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def index eb057a67750..848c1b53ad4 100644 --- a/gcc/config/i386/x86-tune.def +++ b/gcc/config/i386/x86-tune.def @@ -275,6 +275,12 @@ DEF_TUNE (X86_TUNE_PREFER_KNOWN_REP_MOVSB_STOSB, "prefer_known_rep_movsb_stosb", m_SKYLAKE | m_ALDERLAKE | m_CORE_AVX512) +/* X86_TUNE_PREFER_OVERLAP_OP_BY_PIECES: Enable -foverlap-op-by-pieces-run + by default. */ +DEF_TUNE (X86_TUNE_PREFER_OVERLAP_OP_BY_PIECES, + "prefer_overlap_op_by_pieces", + m_CORE_AVX2) + /* X86_TUNE_MISALIGNED_MOVE_STRING_PRO_EPILOGUES: Enable generation of compact prologues and epilogues by issuing a misaligned moves. This requires target to handle misaligned moves and partial memory stalls diff --git a/gcc/testsuite/gcc.target/i386/pr90773-12.c b/gcc/testsuite/gcc.target/i386/pr90773-12.c new file mode 100644 index 000..e45840a5b8d --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr90773-12.c @@ -0,0 +1,11 @@ +/* { dg-do compile { target { ! ia32 } } } */ +/* { dg-options "-O2 -mno-avx -msse2 -mtune=skylake" } */ + +void +foo (char *dst, char *src) +{ + __builtin_memcpy (dst, src, 255); +} + +/* { dg-final { scan-assembler-times "movdqu\[\\t \]+\[0-9\]*\\(%\[\^,\]+\\)," 16 } } */ +/* { dg-final { scan-assembler-not "mov\[bwlq\]" } } */ diff --git a/gcc/testsuite/gcc.target/i386/pr90773-13.c b/gcc/testsuite/gcc.target/i386/pr90773-13.c new file mode 100644 index 000..4d5ae8d1086 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr90773-13.c @@ -0,0 +1,11 @@ +/* { dg-do compile { target { ! ia32 } } } */ +/* { dg-options "-O2 -mno-avx -msse2 -mtune=skylake" } */ + +void +foo (char *dst) +{ + __builtin_memset (dst, 0, 255); +} + +/* { dg-final { scan-assembler-times "movups\[\\t \]+%xmm\[0-9\]+, \[0-9\]*\\(%\[\^,\]+\\)" 16 } } */ +/* { dg-final { scan-assembler-not "mov\[bwlq\]" } } */ -- 2.30.2
[PATCH 0/2] Generate overlapping operations between two areas of memory
For op_by_pieces operations between two areas of memory on non-strict alignment target, add -foverlap-op-by-pieces=[off|on|max-memset] to generate overlapping operations to minimize number of operations if it is not a stack push which must not overlap. When operating on LENGTH bytes of memory, -foverlap-op-by-pieces=on starts with the widest usable integer size, MAX_SIZE, for LENGTH bytes and finishes with the smallest usable integer size, MIN_SIZE, for the remaining bytes where MAX_SIZE >= MIN_SIZE. If MIN_SIZE > the remaining bytes, the last operation is performed on MIN_SIZE bytes of overlapping memory from the previous operation. For memset with non-zero byte, -foverlap-op-by-pieces=max-memset generates an overlapping fill with MAX_SIZE if the number of the remaining bytes is greater than one. Code sizes are reduced slightly on glibc and GCC. Performance impact on SPEC CPU 2017 on Intel Xeon are within noise range. H.J. Lu (2): Generate overlapping operations between two areas of memory x86: Enable -foverlap-op-by-pieces by default gcc/common.opt | 19 +++ gcc/config/i386/i386-options.c | 3 + gcc/config/i386/i386.h | 2 + gcc/config/i386/x86-tune.def | 6 + gcc/doc/invoke.texi| 14 ++ gcc/expr.c | 159 - gcc/testsuite/g++.dg/pr90773-1.h | 14 ++ gcc/testsuite/g++.dg/pr90773-1a.C | 13 ++ gcc/testsuite/g++.dg/pr90773-1b.C | 5 + gcc/testsuite/g++.dg/pr90773-1c.C | 5 + gcc/testsuite/g++.dg/pr90773-1d.C | 19 +++ gcc/testsuite/gcc.target/i386/pr90773-1.c | 17 +++ gcc/testsuite/gcc.target/i386/pr90773-10.c | 13 ++ gcc/testsuite/gcc.target/i386/pr90773-11.c | 13 ++ gcc/testsuite/gcc.target/i386/pr90773-12.c | 11 ++ gcc/testsuite/gcc.target/i386/pr90773-13.c | 11 ++ gcc/testsuite/gcc.target/i386/pr90773-2.c | 20 +++ gcc/testsuite/gcc.target/i386/pr90773-3.c | 23 +++ gcc/testsuite/gcc.target/i386/pr90773-4.c | 13 ++ gcc/testsuite/gcc.target/i386/pr90773-5.c | 13 ++ gcc/testsuite/gcc.target/i386/pr90773-6.c | 11 ++ gcc/testsuite/gcc.target/i386/pr90773-7.c | 11 ++ gcc/testsuite/gcc.target/i386/pr90773-8.c | 13 ++ gcc/testsuite/gcc.target/i386/pr90773-9.c | 13 ++ gcc/toplev.c | 8 ++ 25 files changed, 416 insertions(+), 33 deletions(-) create mode 100644 gcc/testsuite/g++.dg/pr90773-1.h create mode 100644 gcc/testsuite/g++.dg/pr90773-1a.C create mode 100644 gcc/testsuite/g++.dg/pr90773-1b.C create mode 100644 gcc/testsuite/g++.dg/pr90773-1c.C create mode 100644 gcc/testsuite/g++.dg/pr90773-1d.C create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-1.c create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-10.c create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-11.c create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-12.c create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-13.c create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-2.c create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-3.c create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-4.c create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-5.c create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-6.c create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-7.c create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-8.c create mode 100644 gcc/testsuite/gcc.target/i386/pr90773-9.c -- 2.30.2
[Patch] PR fortran/100154 - [9/10/11/12 Regression] ICE in gfc_conv_procedure_call, at fortran/trans-expr.c:6131
Now with the correct patch attached ... Sorry for the confusion! --- Dear Fortranners, we need to check the arguments to the affected GNU intrinsic extensions properly, and - as pointed out in the PR by Tobias - we need to allow function references that have a data pointer result. Also the argument names of the character arguments of the subroutine versions needed a fix ("c" instead of "count"). Regtested on x86_64-pc-linux-gnu. OK for mainline (12)? OK for backports after 11.1 release? Thanks, Harald PR fortran/100154 - ICE in gfc_conv_procedure_call, at fortran/trans-expr.c:6131 Add appropriate static checks for the character and status arguments to the GNU Fortran intrinsic extensions fget[c], fput[c]. Extend variable check to allow a function reference having a data pointer result. gcc/fortran/ChangeLog: PR fortran/100154 * check.c (variable_check): Allow function reference having a data pointer result. (arg_strlen_is_zero): New function. (gfc_check_fgetputc_sub): Add static check of character and status arguments. (gfc_check_fgetput_sub): Likewise. * intrinsic.c (add_subroutines): Fix argument name for the character argument to intrinsic subroutines fget[c], fput[c]. gcc/testsuite/ChangeLog: PR fortran/100154 * gfortran.dg/pr100154.f90: New test. diff --git a/gcc/fortran/check.c b/gcc/fortran/check.c index 82db8e4e1b2..1d30c93df82 100644 --- a/gcc/fortran/check.c +++ b/gcc/fortran/check.c @@ -1055,6 +1055,13 @@ variable_check (gfc_expr *e, int n, bool allow_proc) return true; } + /* F2018:R902: function reference having a data pointer result. */ + if (e->expr_type == EXPR_FUNCTION + && e->symtree->n.sym->attr.flavor == FL_PROCEDURE + && e->symtree->n.sym->attr.function + && e->symtree->n.sym->attr.pointer) +return true; + gfc_error ("%qs argument of %qs intrinsic at %L must be a variable", gfc_current_intrinsic_arg[n]->name, gfc_current_intrinsic, >where); @@ -5689,6 +5691,19 @@ gfc_check_spread (gfc_expr *source, gfc_expr *dim, gfc_expr *ncopies) /* Functions for checking FGETC, FPUTC, FGET and FPUT (subroutines and functions). */ +bool +arg_strlen_is_zero (gfc_expr *c, int n) +{ + if (gfc_var_strlen (c) == 0) +{ + gfc_error ("%qs argument of %qs intrinsic at %L must have " + "length at least 1", gfc_current_intrinsic_arg[n]->name, + gfc_current_intrinsic, >where); + return true; +} + return false; +} + bool gfc_check_fgetputc_sub (gfc_expr *unit, gfc_expr *c, gfc_expr *status) { @@ -5702,13 +5717,19 @@ gfc_check_fgetputc_sub (gfc_expr *unit, gfc_expr *c, gfc_expr *status) return false; if (!kind_value_check (c, 1, gfc_default_character_kind)) return false; + if (strcmp (gfc_current_intrinsic, "fgetc") == 0 + && !variable_check (c, 1, false)) +return false; + if (arg_strlen_is_zero (c, 1)) +return false; if (status == NULL) return true; if (!type_check (status, 2, BT_INTEGER) || !kind_value_check (status, 2, gfc_default_integer_kind) - || !scalar_check (status, 2)) + || !scalar_check (status, 2) + || !variable_check (status, 2, false)) return false; return true; @@ -5729,13 +5750,19 @@ gfc_check_fgetput_sub (gfc_expr *c, gfc_expr *status) return false; if (!kind_value_check (c, 0, gfc_default_character_kind)) return false; + if (strcmp (gfc_current_intrinsic, "fget") == 0 + && !variable_check (c, 0, false)) +return false; + if (arg_strlen_is_zero (c, 0)) +return false; if (status == NULL) return true; if (!type_check (status, 1, BT_INTEGER) || !kind_value_check (status, 1, gfc_default_integer_kind) - || !scalar_check (status, 1)) + || !scalar_check (status, 1) + || !variable_check (status, 1, false)) return false; return true; diff --git a/gcc/fortran/intrinsic.c b/gcc/fortran/intrinsic.c index 17fd92eb462..219f04f2317 100644 --- a/gcc/fortran/intrinsic.c +++ b/gcc/fortran/intrinsic.c @@ -3460,7 +3460,7 @@ add_subroutines (void) /* Argument names. These are used as argument keywords and so need to match the documentation. Please keep this list in sorted order. */ static const char -*a = "a", *c = "count", *cm = "count_max", *com = "command", +*a = "a", *c_ = "c", *c = "count", *cm = "count_max", *com = "command", *cr = "count_rate", *dt = "date", *errmsg = "errmsg", *f = "from", *fp = "frompos", *gt = "get", *h = "harvest", *han = "handler", *length = "length", *ln = "len", *md = "mode", *msk = "mask", @@ -3840,12 +3840,12 @@ add_subroutines (void) add_sym_3s ("fgetc", GFC_ISYM_FGETC, CLASS_IMPURE, BT_UNKNOWN, 0, GFC_STD_GNU, gfc_check_fgetputc_sub, NULL, gfc_resolve_fgetc_sub, ut, BT_INTEGER, di, REQUIRED, INTENT_IN, - c, BT_CHARACTER, dc, REQUIRED, INTENT_OUT, + c_, BT_CHARACTER, dc, REQUIRED,
[Patch] PR fortran/100154 - [9/10/11/12 Regression] ICE in gfc_conv_procedure_call, at fortran/trans-expr.c:6131
Dear Fortranners, we need to check the arguments to the affected GNU intrinsic extensions properly, and - as pointed out in the PR by Tobias - we need to allow function references that have a data pointer result. Also the argument names of the character arguments of the subroutine versions needed a fix ("c" instead of "count"). Regtested on x86_64-pc-linux-gnu. OK for mainline (12)? OK for backports after 11.1 release? Thanks, Harald PR fortran/100154 - ICE in gfc_conv_procedure_call, at fortran/trans-expr.c:6131 Add appropriate static checks for the character and status arguments to the GNU Fortran intrinsic extensions fget[c], fput[c]. Extend variable check to allow a function reference having a data pointer result. gcc/fortran/ChangeLog: PR fortran/100154 * check.c (variable_check): Allow function reference having a data pointer result. (arg_strlen_is_zero): New function. (gfc_check_fgetputc_sub): Add static check of character and status arguments. (gfc_check_fgetput_sub): Likewise. * intrinsic.c (add_subroutines): Fix argument name for the character argument to intrinsic subroutines fget[c], fput[c]. gcc/testsuite/ChangeLog: PR fortran/100154 * gfortran.dg/pr100154.f90: New test. diff --git a/gcc/fortran/check.c b/gcc/fortran/check.c index 82db8e4e1b2..1d30c93df82 100644 --- a/gcc/fortran/check.c +++ b/gcc/fortran/check.c @@ -5689,6 +5691,19 @@ gfc_check_spread (gfc_expr *source, gfc_expr *dim, gfc_expr *ncopies) /* Functions for checking FGETC, FPUTC, FGET and FPUT (subroutines and functions). */ +bool +arg_strlen_is_zero (gfc_expr *c, int n) +{ + if (gfc_var_strlen (c) == 0) +{ + gfc_error ("%qs argument of %qs intrinsic at %L must have " + "length at least 1", gfc_current_intrinsic_arg[n]->name, + gfc_current_intrinsic, >where); + return true; +} + return false; +} + bool gfc_check_fgetputc_sub (gfc_expr *unit, gfc_expr *c, gfc_expr *status) { @@ -5702,13 +5717,19 @@ gfc_check_fgetputc_sub (gfc_expr *unit, gfc_expr *c, gfc_expr *status) return false; if (!kind_value_check (c, 1, gfc_default_character_kind)) return false; + if (strcmp (gfc_current_intrinsic, "fgetc") == 0 + && !variable_check (c, 1, false)) +return false; + if (arg_strlen_is_zero (c, 1)) +return false; if (status == NULL) return true; if (!type_check (status, 2, BT_INTEGER) || !kind_value_check (status, 2, gfc_default_integer_kind) - || !scalar_check (status, 2)) + || !scalar_check (status, 2) + || !variable_check (status, 2, false)) return false; return true; @@ -5729,13 +5750,19 @@ gfc_check_fgetput_sub (gfc_expr *c, gfc_expr *status) return false; if (!kind_value_check (c, 0, gfc_default_character_kind)) return false; + if (strcmp (gfc_current_intrinsic, "fget") == 0 + && !variable_check (c, 0, false)) +return false; + if (arg_strlen_is_zero (c, 0)) +return false; if (status == NULL) return true; if (!type_check (status, 1, BT_INTEGER) || !kind_value_check (status, 1, gfc_default_integer_kind) - || !scalar_check (status, 1)) + || !scalar_check (status, 1) + || !variable_check (status, 1, false)) return false; return true; diff --git a/gcc/fortran/intrinsic.c b/gcc/fortran/intrinsic.c index 17fd92eb462..219f04f2317 100644 --- a/gcc/fortran/intrinsic.c +++ b/gcc/fortran/intrinsic.c @@ -3460,7 +3460,7 @@ add_subroutines (void) /* Argument names. These are used as argument keywords and so need to match the documentation. Please keep this list in sorted order. */ static const char -*a = "a", *c = "count", *cm = "count_max", *com = "command", +*a = "a", *c_ = "c", *c = "count", *cm = "count_max", *com = "command", *cr = "count_rate", *dt = "date", *errmsg = "errmsg", *f = "from", *fp = "frompos", *gt = "get", *h = "harvest", *han = "handler", *length = "length", *ln = "len", *md = "mode", *msk = "mask", @@ -3840,12 +3840,12 @@ add_subroutines (void) add_sym_3s ("fgetc", GFC_ISYM_FGETC, CLASS_IMPURE, BT_UNKNOWN, 0, GFC_STD_GNU, gfc_check_fgetputc_sub, NULL, gfc_resolve_fgetc_sub, ut, BT_INTEGER, di, REQUIRED, INTENT_IN, - c, BT_CHARACTER, dc, REQUIRED, INTENT_OUT, + c_, BT_CHARACTER, dc, REQUIRED, INTENT_OUT, st, BT_INTEGER, di, OPTIONAL, INTENT_OUT); add_sym_2s ("fget", GFC_ISYM_FGET, CLASS_IMPURE, BT_UNKNOWN, 0, GFC_STD_GNU, gfc_check_fgetput_sub, NULL, gfc_resolve_fget_sub, - c, BT_CHARACTER, dc, REQUIRED, INTENT_OUT, + c_, BT_CHARACTER, dc, REQUIRED, INTENT_OUT, st, BT_INTEGER, di, OPTIONAL, INTENT_OUT); add_sym_1s ("flush", GFC_ISYM_FLUSH, CLASS_IMPURE, BT_UNKNOWN, 0, GFC_STD_GNU, @@ -3855,12 +3855,12 @@ add_subroutines (void) add_sym_3s ("fputc", GFC_ISYM_FPUTC, CLASS_IMPURE, BT_UNKNOWN, 0, GFC_STD_GNU, gfc_check_fgetputc_sub, NULL,
Re: [PATCH] Use STATIC_ASSERT for OVL_OP_MAX.
On 4/22/21 9:41 AM, Jonathan Wakely wrote: On Thu, 22 Apr 2021 at 15:59, Martin Sebor wrote: On 4/22/21 2:52 AM, Jonathan Wakely wrote: On Thu, 22 Apr 2021, 08:47 Martin Liška, wrote: On 4/21/21 6:11 PM, Martin Sebor wrote: > On 4/21/21 2:15 AM, Martin Liška wrote: >> Hello. >> >> It's addressing the following Clang warning: >> cp/lex.c:170:45: warning: result of comparison of constant 64 with expression of type 'enum ovl_op_code' is always true [-Wtautological-constant-out-of-range-compare] >> >> Patch can bootstrap on x86_64-linux-gnu and survives regression tests. >> >> Ready to be installed? >> Thanks, >> Martin >> >> gcc/cp/ChangeLog: >> >> * cp-tree.h (STATIC_ASSERT): Prefer static assert. >> * lex.c (init_operators): Remove run-time check. >> --- >> gcc/cp/cp-tree.h | 3 +++ >> gcc/cp/lex.c | 2 -- >> 2 files changed, 3 insertions(+), 2 deletions(-) >> >> diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h >> index 81ff375f8a5..a8f72448ea9 100644 >> --- a/gcc/cp/cp-tree.h >> +++ b/gcc/cp/cp-tree.h >> @@ -5916,6 +5916,9 @@ enum ovl_op_code { >> OVL_OP_MAX >> }; >> +/* Make sure it fits in lang_decl_fn::operator_code. */ >> +STATIC_ASSERT (OVL_OP_MAX < (1 << 6)); >> + > > I wonder if there's a way to test this directly by something like > > static_assert (number-of-bits (ovl_op_info_t::ovl_op_code) > <= number-of-bits (lang_decl_fn::operator_code)); Good point, but I'm not aware of it. Maybe C++ people can chime in? ovl_op_code is an unscoped enumeration (meaning "enum" not "enum class") with no fixed underlying type (i.e. no enum-base like ": int" or ": long" is specified) which means that the number of bits in is value representation is the number of bits needed to represent the minimum and maximum enumerators: "the values of the enumeration are the values representable by a hypothetical integer type with minimal width M such that all enumerators can be represented." There is no function/utility like number-of-bits that can tell you that from the type though.You could use std::underlying_type::type to get the integral type that the compiler used to represent it, but that will probably be 'int' in this case and so all it tells you is an upper bound of no more than 32 bits, which is not useful for this purpose. I suspected there wasn't a function like that. Thanks for confirming it. I wrote the one below just to see if it could be done. It works for one bit-field but I can't think of a way to generalize it. We'd probably need a built-in for that. Perhaps one might be useful. enum E { e = 5 }; struct A { E e: 3; }; constexpr int number_of_bits () { A a = { }; a.e = (E)-1; int n = 0; for (; a.e; ++n) a.e = (E)((unsigned)a.e ^ (1 << n)); return n; } Martin Or: enum E { e = 5 }; struct A { E e: 3; }; constexpr int number_of_bits () { A a = { }; a.e = (E)-1; return 32 - __builtin_clz(a.e); } I had the same thought about using clz. It works in this case but not in if one of the enumerators is negative, or if the underlying type is signed. But you can't get the number-of-bits needed for all the values of the enum E, which is what I was referring to. If you know the enumerators go from 0 to MAX (as is the case for ovl_op_code) you can use (32 - __builtin_clz(MAX)) there too, but in the general case you don't always know the maximum enumerator without checking, and it depends whether the enumeration has a fixed underlying type. That might be another useful query to add a built-in or trait for to improve introspection: get the min and max enumerator (or even all of them, e.g., as an initializer_list or something like that). Martin
[Patch] PR fortran/100218 - target of pointer from evaluation of function-reference
Dear Fortranners, while analyzing a different PR (PR100154), Tobias pointed out that the target of a pointer from the evaluation of function-reference is allowed to be used in a variable definition context and thus as an actual argument to a function or subroutine. This seems to be a more general issue that seems to have been overlooked. The attached simple patch allows to compile and run the attached example, which is by the way already yet rejected with -std=f2003. Regtested on x86_64-pc-linux-gnu. OK for mainline? Shall we backport this to (at least) 11? Thanks, Harald Fortran - allow target of pointer from evaluation of function-reference Fortran allows the target of a pointer from the evaluation of a function-reference in a variable definition context (e.g. F2018:R902). gcc/fortran/ChangeLog: PR fortran/100218 * expr.c (gfc_check_vardef_context): Extend check to allow pointer from a function reference. gcc/testsuite/ChangeLog: PR fortran/100218 * gfortran.dg/pr100218.f90: New test. diff --git a/gcc/fortran/expr.c b/gcc/fortran/expr.c index 92a6700568d..696b9f1daac 100644 --- a/gcc/fortran/expr.c +++ b/gcc/fortran/expr.c @@ -6121,7 +6132,9 @@ gfc_check_vardef_context (gfc_expr* e, bool pointer, bool alloc_obj, } if (!pointer && sym->attr.flavor != FL_VARIABLE && !(sym->attr.flavor == FL_PROCEDURE && sym == sym->result) - && !(sym->attr.flavor == FL_PROCEDURE && sym->attr.proc_pointer)) + && !(sym->attr.flavor == FL_PROCEDURE && sym->attr.proc_pointer) + && !(sym->attr.flavor == FL_PROCEDURE + && sym->attr.function && sym->attr.pointer)) { if (context) gfc_error ("%qs in variable definition context (%s) at %L is not" diff --git a/gcc/testsuite/gfortran.dg/pr100218.f90 b/gcc/testsuite/gfortran.dg/pr100218.f90 new file mode 100644 index 000..62b18f6a935 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/pr100218.f90 @@ -0,0 +1,19 @@ +! { dg-do run } +! { dg-options "-O2 -std=f2008" } +! PR fortran/100218 - target of pointer from evaluation of function-reference + +program p + implicit none + integer, target :: z = 0 + call g (f ()) + if (z /= 1) stop 1 +contains + function f () result (r) +integer, pointer :: r +r => z + end function f + subroutine g (x) +integer, intent(out) :: x +x = 1 + end subroutine g +end program p
[PATCH] config/i386: Commentary typo fix
From: Bernhard Reutner-Fischer gcc/ChangeLog: * config/i386/x86-tune-sched-bd.c (dispatch_group): Commentary typo fix. --- gcc/config/i386/x86-tune-sched-bd.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/gcc/config/i386/x86-tune-sched-bd.c b/gcc/config/i386/x86-tune-sched-bd.c index ad0edf713f5..be38e48b271 100644 --- a/gcc/config/i386/x86-tune-sched-bd.c +++ b/gcc/config/i386/x86-tune-sched-bd.c @@ -67,7 +67,7 @@ along with GCC; see the file COPYING3. If not see #define BIG 100 -/* Dispatch groups. Istructions that affect the mix in a dispatch window. */ +/* Dispatch groups. Instructions that affect the mix in a dispatch window. */ enum dispatch_group { disp_no_group = 0, disp_load, -- 2.31.1
Re: Fix Fortran rounding issues, PR fortran/96983.
On April 22, 2021 9:09:28 PM GMT+02:00, Michael Meissner wrote: >On Wed, Apr 21, 2021 at 10:10:07AM +0200, Tobias Burnus wrote: >> On 20.04.21 08:58, Richard Biener via Fortran wrote: >> >On Mon, Apr 19, 2021 at 9:40 PM Michael Meissner via Fortran >> > wrote: >> Is there any reason to not only send the email to fortran@ _and_ >> gcc-patches@ but sending it to 13 Fortran maintainers explicitly? >(Now >> removed) > >Sorry about that. With PowerPC backend changes, I generally do >explicitly add >the maintainers so things don't get lost. > > >> >>Fix Fortran rounding issues, PR fortran/96983. >> >> >> >>Can I check this change into the GCC trunk? >> >The patch looks fine technically and is definitely an improvement >since the >> >intermediate conversion looks odd. But it might be that the >standard >> >requires such dance though the preceeding cases handled don't seem >to >> >care. I'm thinking of a FP format where round(1.6) == 3 because of >lack >> >of precision but using an intermediate FP format with higher >precision >> >would "correctly" compute 2. >> >> The patched build_round_expr is only called by ANINT / NINT; >> NINT is real → integer; ANINT is real → real >> [And the modified code is only called for NINT, reason: see comment >far below.] >> >> NINT (A[, KIND]) is described (F2018) as "Nearest integer": >> * Result Characteristics. Integer. If KIND is present, the kind type >parameter >> is that specified by the value of KIND; >> otherwise, the kind type parameter is that of default integer type. >> * The result is the integer nearest A, or if there are two >> integers equally near A, the result is whichever such integer has >the greater >> magnitude. >> * Example. NINT (2.783) has the value 3. >> >> ANINT (A[, KIND]) as "Nearest whole number": >> * The result is of type real. If KIND is present, the kind type >parameter is that >> specified by the value of KIND; otherwise, the kind type parameter >is that of A. >> * The result is the integer nearest A, or if there are two integers >equally near A, >> the result is whichever such integer has the greater magnitude. >> * Examples. ANINT (2.783) has the value 3.0. ANINT (−2.783) has the >value −3.0. >> >> >Of course the current code doesn't handle this correctly for the >> >case if llroundf either. >> >>I've not contributed to the Fortran front end before. If the >maintainers like >> >>the patch, can somebody point out if I need to do additional things >to commit >> >>the patch? >> Nothing special: a testcase already exists, committing is done as >usual >> and a PR to update you have as well. > >Given GCC 11 has branched, is it ok to backport the patch to the GCC 11 >branch >as well? I assume it is, since it fixes a regression in the compiler. Please wait until after 11.1 is released. Thanks, Richard.
Re: Fix Fortran rounding issues, PR fortran/96983.
On Wed, Apr 21, 2021 at 10:10:07AM +0200, Tobias Burnus wrote: > On 20.04.21 08:58, Richard Biener via Fortran wrote: > >On Mon, Apr 19, 2021 at 9:40 PM Michael Meissner via Fortran > > wrote: > Is there any reason to not only send the email to fortran@ _and_ > gcc-patches@ but sending it to 13 Fortran maintainers explicitly? (Now > removed) Sorry about that. With PowerPC backend changes, I generally do explicitly add the maintainers so things don't get lost. > >>Fix Fortran rounding issues, PR fortran/96983. > >> > >>Can I check this change into the GCC trunk? > >The patch looks fine technically and is definitely an improvement since the > >intermediate conversion looks odd. But it might be that the standard > >requires such dance though the preceeding cases handled don't seem to > >care. I'm thinking of a FP format where round(1.6) == 3 because of lack > >of precision but using an intermediate FP format with higher precision > >would "correctly" compute 2. > > The patched build_round_expr is only called by ANINT / NINT; > NINT is real → integer; ANINT is real → real > [And the modified code is only called for NINT, reason: see comment far > below.] > > NINT (A[, KIND]) is described (F2018) as "Nearest integer": > * Result Characteristics. Integer. If KIND is present, the kind type parameter > is that specified by the value of KIND; > otherwise, the kind type parameter is that of default integer type. > * The result is the integer nearest A, or if there are two > integers equally near A, the result is whichever such integer has the > greater > magnitude. > * Example. NINT (2.783) has the value 3. > > ANINT (A[, KIND]) as "Nearest whole number": > * The result is of type real. If KIND is present, the kind type parameter is > that > specified by the value of KIND; otherwise, the kind type parameter is that > of A. > * The result is the integer nearest A, or if there are two integers equally > near A, > the result is whichever such integer has the greater magnitude. > * Examples. ANINT (2.783) has the value 3.0. ANINT (−2.783) has the value > −3.0. > > >Of course the current code doesn't handle this correctly for the > >case if llroundf either. > >>I've not contributed to the Fortran front end before. If the maintainers > >>like > >>the patch, can somebody point out if I need to do additional things to > >>commit > >>the patch? > Nothing special: a testcase already exists, committing is done as usual > and a PR to update you have as well. Given GCC 11 has branched, is it ok to backport the patch to the GCC 11 branch as well? I assume it is, since it fixes a regression in the compiler. -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797
[committed] c++: Add testcase for already fixed PR [PR16617]
We correctly diagnose the invalid access since r11-1350. gcc/testsuite/ChangeLog: PR c++/16617 * g++.dg/template/access36.C: New test. --- gcc/testsuite/g++.dg/template/access36.C | 25 1 file changed, 25 insertions(+) create mode 100644 gcc/testsuite/g++.dg/template/access36.C diff --git a/gcc/testsuite/g++.dg/template/access36.C b/gcc/testsuite/g++.dg/template/access36.C new file mode 100644 index 000..72ca23c7017 --- /dev/null +++ b/gcc/testsuite/g++.dg/template/access36.C @@ -0,0 +1,25 @@ +// PR c++/16617 + +class B +{ + protected: + int i; +}; + +template void fr (); + +class D2 : public B +{ + friend void fr (); +}; + +template struct X +{}; + +template void fr () +{ + X<::i> x1; // { dg-error "protected" } + X<::i> x2; // { dg-error "protected" } +} + +template void fr(); -- 2.31.1.362.g311531c9de
[committed] c++: Add testcase for already fixed PR [PR84689]
We correctly accept this testcase since r11-1638. gcc/testsuite/ChangeLog: PR c++/84689 * g++.dg/cpp0x/sfinae67.C: New test. --- gcc/testsuite/g++.dg/cpp0x/sfinae67.C | 20 1 file changed, 20 insertions(+) create mode 100644 gcc/testsuite/g++.dg/cpp0x/sfinae67.C diff --git a/gcc/testsuite/g++.dg/cpp0x/sfinae67.C b/gcc/testsuite/g++.dg/cpp0x/sfinae67.C new file mode 100644 index 000..cfed92ad472 --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp0x/sfinae67.C @@ -0,0 +1,20 @@ +// PR c++/84689 +// { dg-do compile { target c++11 } } + +struct base { + void operator()(); +}; + +struct a : base { }; +struct b : base { }; + +struct f : a, b { + using a::operator(); + using b::operator(); +}; + +template auto g(int) -> decltype(T()()); +template auto g(...) -> int; + +using type = decltype(g(0)); +using type = int; -- 2.31.1.362.g311531c9de
[Patch, fortran] PR fortran/82376 - Duplicate function call using -fcheck=pointer
Hi All! Proposed patch to: PR82376 - Duplicate function call using -fcheck=pointer Patch tested only on x86_64-pc-linux-gnu. Evaluate function result and then pass a pointer, instead of a reference to the function itself, thus avoiding multiple evaluations of the function. Thank you very much. Best regards, José Rui Fortran: Fix double function call with -fcheck=pointer [PR] gcc/fortran/ChangeLog: PR fortran/82376 * trans-expr.c (gfc_conv_procedure_call): Evaluate function result and then pass a pointer. gcc/testsuite/ChangeLog: PR fortran/82376 * gfortran.dg/PR82376.f90: New test. diff --git a/gcc/fortran/trans-expr.c b/gcc/fortran/trans-expr.c index 213f32b0a67..b83b021755d 100644 --- a/gcc/fortran/trans-expr.c +++ b/gcc/fortran/trans-expr.c @@ -6014,11 +6014,8 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym, || (!e->value.function.esym && e->symtree->n.sym->attr.pointer)) && fsym && fsym->attr.target) - { - gfc_conv_expr (, e); - parmse.expr = gfc_build_addr_expr (NULL_TREE, parmse.expr); - } - + /* Make sure the function only gets called once. */ + gfc_conv_expr_reference (, e, false); else if (e->expr_type == EXPR_FUNCTION && e->symtree->n.sym->result && e->symtree->n.sym->result != e->symtree->n.sym diff --git a/gcc/testsuite/gfortran.dg/PR82376.f90 b/gcc/testsuite/gfortran.dg/PR82376.f90 new file mode 100644 index 000..cea1c2ae211 --- /dev/null +++ b/gcc/testsuite/gfortran.dg/PR82376.f90 @@ -0,0 +1,55 @@ +! { dg-do run } +! +! Test the fix for PR82376 +! + +program main_p + + integer, parameter :: n = 10 + + type :: foo_t +integer, pointer :: v =>null() + end type foo_t + + integer, save :: pcnt = 0 + + type(foo_t) :: int + integer :: i + + do i = 1, n +call init(int, i) +if(.not.associated(int%v)) stop 1 +if(int%v/=i) stop 2 +if(pcnt/=i) stop 3 + end do + +contains + + function new(data) result(this) +integer, target, intent(in) :: data + +integer, pointer :: this + +nullify(this) +this => data +pcnt = pcnt + 1 +return + end function new + + subroutine init(this, data) +type(foo_t), intent(out) :: this +integer, intent(in) :: data + +call set(this, new(data)) +return + end subroutine init + + subroutine set(this, that) +type(foo_t), intent(inout) :: this +integer, target, intent(in):: that + +this%v => that +return + end subroutine set + +end program main_p
testsuite/substr_{9,10}.f90: Move to gfortran.dg/
The test was added in r11-6687-gbdd1b1f55529da00b867ef05a53a08fbfc3d1c2e (PR93340) but placed into the wrong directly. Fixed by moving to gfortran.dg/ in commit r12-68-gac456fd981db6b0c2f7ee1ab0d17d36087a74dc2 after confirming that the tests work. Tobias - Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank Thürauf commit ac456fd981db6b0c2f7ee1ab0d17d36087a74dc2 Author: Tobias Burnus Date: Thu Apr 22 19:14:58 2021 +0200 testsuite/substr_{9,10}.f90: Move to gfortran.dg/ gcc/testsuite/ * substr_9.f90: Move to ... * gfortran.dg/substr_9.f90: ... here. * substr_10.f90: Move to ... * gfortran.dg/substr_10.f90: ... here. --- gcc/testsuite/{ => gfortran.dg}/substr_10.f90 | 0 gcc/testsuite/{ => gfortran.dg}/substr_9.f90 | 0 2 files changed, 0 insertions(+), 0 deletions(-) diff --git a/gcc/testsuite/substr_10.f90 b/gcc/testsuite/gfortran.dg/substr_10.f90 similarity index 100% rename from gcc/testsuite/substr_10.f90 rename to gcc/testsuite/gfortran.dg/substr_10.f90 diff --git a/gcc/testsuite/substr_9.f90 b/gcc/testsuite/gfortran.dg/substr_9.f90 similarity index 100% rename from gcc/testsuite/substr_9.f90 rename to gcc/testsuite/gfortran.dg/substr_9.f90
Re: [PATCH] gcov: Use system IO buffering
Martin Liška writes: > Hey. > > I/O buffering in gcov seems duplicite to what modern C library can provide. > The patch is a simplification and can provide easier interface for system > that don't have a filesystem and would like using GCOV. > > I'm going to install the patch after 11.1 if there are no objections. > > Patch can bootstrap on x86_64-linux-gnu and survives regression tests. What happens if someone compiles the C library with gcov? Being as self contained as possible (only system calls) would seem safer. -Andi
[PATCH] libstdc++: Fix semaphore to work with system_clock timeouts
The __cond_wait_until_impl function takes a steady_clock timeout, but then sometimes tries to compare it to a time from the system_clock, which won't compile. Additionally, that function gets called with system_clock timeouts, which also won't compile. This makes the function accept timeouts for either clock, and compare to the time from the right clock. This fixes the compilation error that was causing two tests to fail on non-futex targets, so we can revert the r12-11 change to disable them. libstdc++-v3/ChangeLog: * include/bits/atomic_timed_wait.h (__cond_wait_until_impl): Handle system_clock as well as steady_clock. * testsuite/30_threads/semaphore/try_acquire_for.cc: Re-enable. * testsuite/30_threads/semaphore/try_acquire_until.cc: Re-enable. I'm testing this now on x86_64-linux, powerpc64le-linux, sparc-linux, power-aix and sparc-solaris. It looks good so far, so I'll push to trunk when the tests finish. This should also go to the gcc-11 branch, or the timed waits for semaphores can't be used with system_clock times on non-futed targets. commit 6924588774a02dec63fb4b0ad19aed303accc2d2 Author: Jonathan Wakely Date: Thu Apr 22 17:26:50 2021 libstdc++: Fix semaphore to work with system_clock timeouts The __cond_wait_until_impl function takes a steady_clock timeout, but then sometimes tries to compare it to a time from the system_clock, which won't compile. Additionally, that function gets called with system_clock timeouts, which also won't compile. This makes the function accept timeouts for either clock, and compare to the time from the right clock. This fixes the compilation error that was causing two tests to fail on non-futex targets, so we can revert the r12-11 change to disable them. libstdc++-v3/ChangeLog: * include/bits/atomic_timed_wait.h (__cond_wait_until_impl): Handle system_clock as well as steady_clock. * testsuite/30_threads/semaphore/try_acquire_for.cc: Re-enable. * testsuite/30_threads/semaphore/try_acquire_until.cc: Re-enable. diff --git a/libstdc++-v3/include/bits/atomic_timed_wait.h b/libstdc++-v3/include/bits/atomic_timed_wait.h index 70e5335cfd7..ec7ff51cdbc 100644 --- a/libstdc++-v3/include/bits/atomic_timed_wait.h +++ b/libstdc++-v3/include/bits/atomic_timed_wait.h @@ -139,12 +139,16 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION // (e.g. __ulock_wait())which is better than pthread_cond_clockwait #endif // ! PLATFORM_TIMED_WAIT -// returns true if wait ended before timeout -template +// Returns true if wait ended before timeout. +// _Clock must be either steady_clock or system_clock. +template bool __cond_wait_until_impl(__condvar& __cv, mutex& __mx, - const chrono::time_point& __atime) +const chrono::time_point<_Clock, _Dur>& __atime) { + static_assert(std::__is_one_of<_Clock, chrono::steady_clock, + chrono::system_clock>::value); + auto __s = chrono::time_point_cast(__atime); auto __ns = chrono::duration_cast(__atime - __s); @@ -155,12 +159,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION }; #ifdef _GLIBCXX_USE_PTHREAD_COND_CLOCKWAIT - __cv.wait_until(__mx, CLOCK_MONOTONIC, __ts); - return chrono::steady_clock::now() < __atime; -#else - __cv.wait_until(__mx, __ts); - return chrono::system_clock::now() < __atime; -#endif // ! _GLIBCXX_USE_PTHREAD_COND_CLOCKWAIT + if constexpr (is_same_v) + __cv.wait_until(__mx, CLOCK_MONOTONIC, __ts); + else +#endif + __cv.wait_until(__mx, __ts); + return _Clock::now() < __atime; } // returns true if wait ended before timeout diff --git a/libstdc++-v3/testsuite/30_threads/semaphore/try_acquire_for.cc b/libstdc++-v3/testsuite/30_threads/semaphore/try_acquire_for.cc index 248ecb07e56..e7edc9eeef1 100644 --- a/libstdc++-v3/testsuite/30_threads/semaphore/try_acquire_for.cc +++ b/libstdc++-v3/testsuite/30_threads/semaphore/try_acquire_for.cc @@ -21,8 +21,6 @@ // { dg-require-gthreads "" } // { dg-add-options libatomic } -// { dg-skip-if "FIXME: fails" { ! futex } } - #include #include #include diff --git a/libstdc++-v3/testsuite/30_threads/semaphore/try_acquire_until.cc b/libstdc++-v3/testsuite/30_threads/semaphore/try_acquire_until.cc index eb1351cd2bf..49ba33b4999 100644 --- a/libstdc++-v3/testsuite/30_threads/semaphore/try_acquire_until.cc +++ b/libstdc++-v3/testsuite/30_threads/semaphore/try_acquire_until.cc @@ -21,8 +21,6 @@ // { dg-additional-options "-pthread" { target pthread } } // { dg-add-options libatomic } -// { dg-skip-if "FIXME: fails" { ! futex } } - #include #include #include
Re: [PATCH] Use STATIC_ASSERT for OVL_OP_MAX.
On Thu, 22 Apr 2021 at 15:59, Martin Sebor wrote: > > On 4/22/21 2:52 AM, Jonathan Wakely wrote: > > On Thu, 22 Apr 2021, 08:47 Martin Liška, wrote: > > > > On 4/21/21 6:11 PM, Martin Sebor wrote: > > > On 4/21/21 2:15 AM, Martin Liška wrote: > > >> Hello. > > >> > > >> It's addressing the following Clang warning: > > >> cp/lex.c:170:45: warning: result of comparison of constant 64 > > with expression of type 'enum ovl_op_code' is always true > > [-Wtautological-constant-out-of-range-compare] > > >> > > >> Patch can bootstrap on x86_64-linux-gnu and survives regression > > tests. > > >> > > >> Ready to be installed? > > >> Thanks, > > >> Martin > > >> > > >> gcc/cp/ChangeLog: > > >> > > >> * cp-tree.h (STATIC_ASSERT): Prefer static assert. > > >> * lex.c (init_operators): Remove run-time check. > > >> --- > > >> gcc/cp/cp-tree.h | 3 +++ > > >> gcc/cp/lex.c | 2 -- > > >> 2 files changed, 3 insertions(+), 2 deletions(-) > > >> > > >> diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h > > >> index 81ff375f8a5..a8f72448ea9 100644 > > >> --- a/gcc/cp/cp-tree.h > > >> +++ b/gcc/cp/cp-tree.h > > >> @@ -5916,6 +5916,9 @@ enum ovl_op_code { > > >> OVL_OP_MAX > > >> }; > > >> +/* Make sure it fits in lang_decl_fn::operator_code. */ > > >> +STATIC_ASSERT (OVL_OP_MAX < (1 << 6)); > > >> + > > > > > > I wonder if there's a way to test this directly by something like > > > > > > static_assert (number-of-bits (ovl_op_info_t::ovl_op_code) > > > <= number-of-bits (lang_decl_fn::operator_code)); > > > > Good point, but I'm not aware of it. Maybe C++ people can chime in? > > > > > > ovl_op_code is an unscoped enumeration (meaning "enum" not "enum class") > > with no fixed underlying type (i.e. no enum-base like ": int" or ": > > long" is specified) which means that the number of bits in is value > > representation is the number of bits needed to represent the minimum and > > maximum enumerators: > > > > "the values of the enumeration are the values representable by a > > hypothetical integer type with minimal width M such that all enumerators > > can be represented." > > > > There is no function/utility like number-of-bits that can tell you that > > from the type though.You could use > > std::underlying_type::type to get the integral type that > > the compiler used to represent it, but that will probably be 'int' in > > this case and so all it tells you is an upper bound of no more than 32 > > bits, which is not useful for this purpose. > > I suspected there wasn't a function like that. Thanks for confirming > it. I wrote the one below just to see if it could be done. It works > for one bit-field but I can't think of a way to generalize it. We'd > probably need a built-in for that. Perhaps one might be useful. > > enum E { e = 5 }; > struct A { E e: 3; }; > > constexpr int number_of_bits () > { >A a = { }; >a.e = (E)-1; >int n = 0; >for (; a.e; ++n) > a.e = (E)((unsigned)a.e ^ (1 << n)); >return n; > } > > Martin Or: enum E { e = 5 }; struct A { E e: 3; }; constexpr int number_of_bits () { A a = { }; a.e = (E)-1; return 32 - __builtin_clz(a.e); } But you can't get the number-of-bits needed for all the values of the enum E, which is what I was referring to. If you know the enumerators go from 0 to MAX (as is the case for ovl_op_code) you can use (32 - __builtin_clz(MAX)) there too, but in the general case you don't always know the maximum enumerator without checking, and it depends whether the enumeration has a fixed underlying type.
Re: [PATCH] Fix various typos.
On Thu, Apr 22, 2021 at 09:26:47AM -0600, Jeff Law via Gcc-patches wrote: > > * g++.dg/template/nontype29.C: Fix typos and missing comments. > > * gcc.dg/Warray-bounds-64.c: Likewise. > > * gcc.dg/Warray-parameter.c: Likewise. > > * gcc.dg/Wstring-compare.c: Likewise. > > * gcc.dg/format/gcc_diag-11.c: Likewise. > > * gfortran.dg/array_constructor_3.f90: Likewise. > > * gfortran.dg/matmul_bounds_9.f90: Likewise. > > * gfortran.dg/pr78033.f90: Likewise. > > * gfortran.dg/pr96325.f90: Likewise. > > OK and I think we should consider these kinds of typo fixes obvious and thus > not needing an explicit ack. Comment typos sure. The typos in dg-bogus that Richi mentioned is something that actually was a good idea to review, while in that case it was correct, the corresponding commit that added those typos into dg-bogus didn't change any diagnostics from expreession to expression, generally it could be a test that verifies the diagnostics doesn't contain the typo. Jakub
[PATCH] c++: Hard error with tentative parse and CTAD [PR87709]
As described in detail in comment #4 of this PR, when tentatively parsing a construct that can either be a type or an expression, if during the type parse we encounter an unexpected template placeholder, we need to simulate an error rather than issue a real error because the subsequent expression parse can still succeed. Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for trunk? gcc/cp/ChangeLog: PR c++/87709 * parser.c (cp_parser_type_id_1): If we see a template placeholder, first try simulating an error before issuing a real error. gcc/testsuite/ChangeLog: PR c++/87709 * g++.dg/cpp1z/class-deduction86.C: New test. --- gcc/cp/parser.c| 11 +++ gcc/testsuite/g++.dg/cpp1z/class-deduction86.C | 16 2 files changed, 23 insertions(+), 4 deletions(-) create mode 100644 gcc/testsuite/g++.dg/cpp1z/class-deduction86.C diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c index fba516efa23..e1b1617da68 100644 --- a/gcc/cp/parser.c +++ b/gcc/cp/parser.c @@ -23270,10 +23270,13 @@ cp_parser_type_id_1 (cp_parser *parser, cp_parser_flags flags, location_t loc = type_specifier_seq.locations[ds_type_spec]; if (tree tmpl = CLASS_PLACEHOLDER_TEMPLATE (auto_node)) { - error_at (loc, "missing template arguments after %qT", - auto_node); - inform (DECL_SOURCE_LOCATION (tmpl), "%qD declared here", - tmpl); + if (!cp_parser_simulate_error (parser)) + { + error_at (loc, "missing template arguments after %qT", + auto_node); + inform (DECL_SOURCE_LOCATION (tmpl), "%qD declared here", + tmpl); + } } else if (parser->in_template_argument_list_p) error_at (loc, "%qT not permitted in template argument", diff --git a/gcc/testsuite/g++.dg/cpp1z/class-deduction86.C b/gcc/testsuite/g++.dg/cpp1z/class-deduction86.C new file mode 100644 index 000..a198ed24ec6 --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp1z/class-deduction86.C @@ -0,0 +1,16 @@ +// PR c++/87709 +// { dg-do compile { target c++17 } } + +template +struct lit { + lit(T) { } +}; + +template +int operator+(lit, lit) { + return 0; +} + +auto r2 = (lit(0)) + lit(0); + +static_assert(sizeof(lit(0))); -- 2.31.1.362.g311531c9de
Re: [PATCH] Fix various typos.
On 4/22/2021 6:23 AM, Martin Liška wrote: Hello. The patch fixes various typos. Patch can bootstrap on x86_64-linux-gnu and survives regression tests. Ready to be installed? Thanks, Martin PR testsuite/100159 PR testsuite/100192 gcc/ChangeLog: * builtins.c (expand_builtin): Fix typos and missing comments. * dwarf2out.c (gen_subprogram_die): Likewise. (gen_struct_or_union_type_die): Likewise. gcc/fortran/ChangeLog: * frontend-passes.c (optimize_expr): Fix typos and missing comments. gcc/testsuite/ChangeLog: * g++.dg/template/nontype29.C: Fix typos and missing comments. * gcc.dg/Warray-bounds-64.c: Likewise. * gcc.dg/Warray-parameter.c: Likewise. * gcc.dg/Wstring-compare.c: Likewise. * gcc.dg/format/gcc_diag-11.c: Likewise. * gfortran.dg/array_constructor_3.f90: Likewise. * gfortran.dg/matmul_bounds_9.f90: Likewise. * gfortran.dg/pr78033.f90: Likewise. * gfortran.dg/pr96325.f90: Likewise. OK and I think we should consider these kinds of typo fixes obvious and thus not needing an explicit ack. jeff
[committed] libstdc++: Add options for libatomic to test
This fixes a linker error on AIX: FAIL: 30_threads/semaphore/try_acquire_posix.cc (test for excess errors) Excess errors: ld: 0711-317 ERROR: Undefined symbol: .__atomic_fetch_add_8 ld: 0711-317 ERROR: Undefined symbol: .__atomic_load_8 ld: 0711-317 ERROR: Undefined symbol: .__atomic_fetch_sub_8 ld: 0711-345 Use the -bloadmap or -bnoquiet option to obtain more information. collect2: error: ld returned 8 exit status libstdc++-v3/ChangeLog: * testsuite/30_threads/semaphore/try_acquire_posix.cc: Add options for libatomic. Tested powerpc64le-linux and powerpc-aix. Committed to trunk. I'd like to backport this to gcc-11 too. commit 58871c03318e080962f022f5d77db3c4fde3e351 Author: Jonathan Wakely Date: Thu Apr 22 15:51:08 2021 libstdc++: Add options for libatomic to test This fixes a linker error on AIX: FAIL: 30_threads/semaphore/try_acquire_posix.cc (test for excess errors) Excess errors: ld: 0711-317 ERROR: Undefined symbol: .__atomic_fetch_add_8 ld: 0711-317 ERROR: Undefined symbol: .__atomic_load_8 ld: 0711-317 ERROR: Undefined symbol: .__atomic_fetch_sub_8 ld: 0711-345 Use the -bloadmap or -bnoquiet option to obtain more information. collect2: error: ld returned 8 exit status libstdc++-v3/ChangeLog: * testsuite/30_threads/semaphore/try_acquire_posix.cc: Add options for libatomic. diff --git a/libstdc++-v3/testsuite/30_threads/semaphore/try_acquire_posix.cc b/libstdc++-v3/testsuite/30_threads/semaphore/try_acquire_posix.cc index 97e386f7b76..2d9ae40db72 100644 --- a/libstdc++-v3/testsuite/30_threads/semaphore/try_acquire_posix.cc +++ b/libstdc++-v3/testsuite/30_threads/semaphore/try_acquire_posix.cc @@ -19,6 +19,7 @@ // { dg-do run { target c++2a } } // { dg-require-effective-target pthread } // { dg-require-gthreads "" } +// { dg-add-options libatomic } #include #ifdef _GLIBCXX_HAVE_POSIX_SEMAPHORE
[committed] libstdc++: Reject std::make_shared [PR 99006]
Prior to C++20 it should be ill-formed to use std::make_shared with an array type (and we don't support the C++20 feature to make it valid yet anyway). libstdc++-v3/ChangeLog: PR libstdc++/99006 * include/bits/shared_ptr.h (allocate_shared): Assert that _Tp is not an array type. * include/bits/shared_ptr_base.h (__allocate_shared): Likewise. * testsuite/20_util/shared_ptr/creation/99006.cc: New test. Tested powerpc64le-linux. Committed to trunk. commit 55650236cd97d81f42f9fdb4f6bcb12babafe51f Author: Jonathan Wakely Date: Thu Apr 22 15:46:51 2021 libstdc++: Reject std::make_shared [PR 99006] Prior to C++20 it should be ill-formed to use std::make_shared with an array type (and we don't support the C++20 feature to make it valid yet anyway). libstdc++-v3/ChangeLog: PR libstdc++/99006 * include/bits/shared_ptr.h (allocate_shared): Assert that _Tp is not an array type. * include/bits/shared_ptr_base.h (__allocate_shared): Likewise. * testsuite/20_util/shared_ptr/creation/99006.cc: New test. diff --git a/libstdc++-v3/include/bits/shared_ptr.h b/libstdc++-v3/include/bits/shared_ptr.h index ac4aa20b794..d5386ad535f 100644 --- a/libstdc++-v3/include/bits/shared_ptr.h +++ b/libstdc++-v3/include/bits/shared_ptr.h @@ -857,6 +857,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION inline shared_ptr<_Tp> allocate_shared(const _Alloc& __a, _Args&&... __args) { + static_assert(!is_array<_Tp>::value, "make_shared not supported"); + return shared_ptr<_Tp>(_Sp_alloc_shared_tag<_Alloc>{__a}, std::forward<_Args>(__args)...); } diff --git a/libstdc++-v3/include/bits/shared_ptr_base.h b/libstdc++-v3/include/bits/shared_ptr_base.h index 8766b61fd81..71099afbf7a 100644 --- a/libstdc++-v3/include/bits/shared_ptr_base.h +++ b/libstdc++-v3/include/bits/shared_ptr_base.h @@ -1831,6 +1831,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION inline __shared_ptr<_Tp, _Lp> __allocate_shared(const _Alloc& __a, _Args&&... __args) { + static_assert(!is_array<_Tp>::value, "make_shared not supported"); + return __shared_ptr<_Tp, _Lp>(_Sp_alloc_shared_tag<_Alloc>{__a}, std::forward<_Args>(__args)...); } diff --git a/libstdc++-v3/testsuite/20_util/shared_ptr/creation/99006.cc b/libstdc++-v3/testsuite/20_util/shared_ptr/creation/99006.cc new file mode 100644 index 000..d5f7a5da5e9 --- /dev/null +++ b/libstdc++-v3/testsuite/20_util/shared_ptr/creation/99006.cc @@ -0,0 +1,9 @@ +// FIXME: This should use { target { ! c++20 } } +// { dg-do compile } + +#include + +auto p = std::make_shared(2); // { dg-error "here" } +auto q = std::make_shared(1, 2); // { dg-error "here" } + +// { dg-prune-output "static assertion failed" }
[committed] libstdc++: Fix typo in comment
libstdc++-v3/ChangeLog: * config/os/gnu-linux/os_defines.h: Fix type in comment. Tested powerpc64le-linux. Committed to trunk. commit 19aa9bc9897955817622574e62b53b24ae0837e9 Author: Jonathan Wakely Date: Thu Apr 22 15:48:29 2021 libstdc++: Fix typo in comment libstdc++-v3/ChangeLog: * config/os/gnu-linux/os_defines.h: Fix type in comment. diff --git a/libstdc++-v3/config/os/gnu-linux/os_defines.h b/libstdc++-v3/config/os/gnu-linux/os_defines.h index 64f78df8eef..d5bb2a1886e 100644 --- a/libstdc++-v3/config/os/gnu-linux/os_defines.h +++ b/libstdc++-v3/config/os/gnu-linux/os_defines.h @@ -33,7 +33,7 @@ // System-specific #define, typedefs, corrections, etc, go here. This // file will come before all others. -// This keeps isanum, et al from being propagated as macros. +// This keeps isalnum, et al from being propagated as macros. #define __NO_CTYPE 1 #include
Re: [PATCH] Use STATIC_ASSERT for OVL_OP_MAX.
On 4/22/21 2:52 AM, Jonathan Wakely wrote: On Thu, 22 Apr 2021, 08:47 Martin Liška, wrote: On 4/21/21 6:11 PM, Martin Sebor wrote: > On 4/21/21 2:15 AM, Martin Liška wrote: >> Hello. >> >> It's addressing the following Clang warning: >> cp/lex.c:170:45: warning: result of comparison of constant 64 with expression of type 'enum ovl_op_code' is always true [-Wtautological-constant-out-of-range-compare] >> >> Patch can bootstrap on x86_64-linux-gnu and survives regression tests. >> >> Ready to be installed? >> Thanks, >> Martin >> >> gcc/cp/ChangeLog: >> >> * cp-tree.h (STATIC_ASSERT): Prefer static assert. >> * lex.c (init_operators): Remove run-time check. >> --- >> gcc/cp/cp-tree.h | 3 +++ >> gcc/cp/lex.c | 2 -- >> 2 files changed, 3 insertions(+), 2 deletions(-) >> >> diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h >> index 81ff375f8a5..a8f72448ea9 100644 >> --- a/gcc/cp/cp-tree.h >> +++ b/gcc/cp/cp-tree.h >> @@ -5916,6 +5916,9 @@ enum ovl_op_code { >> OVL_OP_MAX >> }; >> +/* Make sure it fits in lang_decl_fn::operator_code. */ >> +STATIC_ASSERT (OVL_OP_MAX < (1 << 6)); >> + > > I wonder if there's a way to test this directly by something like > > static_assert (number-of-bits (ovl_op_info_t::ovl_op_code) > <= number-of-bits (lang_decl_fn::operator_code)); Good point, but I'm not aware of it. Maybe C++ people can chime in? ovl_op_code is an unscoped enumeration (meaning "enum" not "enum class") with no fixed underlying type (i.e. no enum-base like ": int" or ": long" is specified) which means that the number of bits in is value representation is the number of bits needed to represent the minimum and maximum enumerators: "the values of the enumeration are the values representable by a hypothetical integer type with minimal width M such that all enumerators can be represented." There is no function/utility like number-of-bits that can tell you that from the type though.You could use std::underlying_type::type to get the integral type that the compiler used to represent it, but that will probably be 'int' in this case and so all it tells you is an upper bound of no more than 32 bits, which is not useful for this purpose. I suspected there wasn't a function like that. Thanks for confirming it. I wrote the one below just to see if it could be done. It works for one bit-field but I can't think of a way to generalize it. We'd probably need a built-in for that. Perhaps one might be useful. enum E { e = 5 }; struct A { E e: 3; }; constexpr int number_of_bits () { A a = { }; a.e = (E)-1; int n = 0; for (; a.e; ++n) a.e = (E)((unsigned)a.e ^ (1 << n)); return n; } Martin
[PATCH] c++: Refine enum direct-list-initialization [CWG2374]
This implements the wording changes of CWG2374, which clarifies the wording of P0138 to forbid e.g. direct-list-initialization of a scoped enumeration from a different scoped enumeration. Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for trunk? gcc/cp/ChangeLog: DR 2374 * decl.c (is_direct_enum_init): Check the implicit convertibility requirement added by CWG 2374. gcc/testsuite/ChangeLog: DR 2374 * g++.dg/cpp1z/direct-enum-init2.C: New test. --- gcc/cp/decl.c | 8 +++- gcc/testsuite/g++.dg/cpp1z/direct-enum-init2.C | 8 2 files changed, 15 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/g++.dg/cpp1z/direct-enum-init2.C diff --git a/gcc/cp/decl.c b/gcc/cp/decl.c index b81de8ef934..60dc2bf182d 100644 --- a/gcc/cp/decl.c +++ b/gcc/cp/decl.c @@ -6191,7 +6191,13 @@ is_direct_enum_init (tree type, tree init) && ENUM_FIXED_UNDERLYING_TYPE_P (type) && TREE_CODE (init) == CONSTRUCTOR && CONSTRUCTOR_IS_DIRECT_INIT (init) - && CONSTRUCTOR_NELTS (init) == 1) + && CONSTRUCTOR_NELTS (init) == 1 + /* DR 2374: The single element needs to be implicitly +convertible to the underlying type of the enum. */ + && can_convert_arg (ENUM_UNDERLYING_TYPE (type), + TREE_TYPE (CONSTRUCTOR_ELT (init, 0)->value), + CONSTRUCTOR_ELT (init, 0)->value, + LOOKUP_IMPLICIT, tf_none)) return true; return false; } diff --git a/gcc/testsuite/g++.dg/cpp1z/direct-enum-init2.C b/gcc/testsuite/g++.dg/cpp1z/direct-enum-init2.C new file mode 100644 index 000..5b94a8d00fe --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp1z/direct-enum-init2.C @@ -0,0 +1,8 @@ +// DR 2374 +// { dg-do compile { target c++17 } } + +enum class Orange; +enum class Apple; + +extern Orange o; +Apple a{o}; // { dg-error "cannot convert" } -- 2.31.1.362.g311531c9de
Re: [PATCH] Remove __cplusplus >= 201103
On 4/22/21 3:59 PM, Richard Biener wrote: > For system.h please keep it working for C code (not sure if we use > a C compiler anywhere). All right. I've just did --enable-languages=all and all is fine. So there's likely no C usage of it. Martin
Re: [PATCH] [libstdc++] Fix "bare" notifications dropped by waiters check
On 2021-04-22 02:23, Jonathan Wakely wrote: On 21/04/21 18:29 -0700, Thomas Rodgers wrote: From: Thomas Rodgers NOTE - This patch also needs to be backported to gcc-11 in order for semaphore release() to work correctly on non-futex platforms. Tested sparc-sun-solaris2.11 For types that track whether or not there extant waiters (e.g. semaphore) internally, the __atomic_notify_address_bare() call was introduced to avoid the overhead of loading the atomic count of waiters. For platforms that don't have Futex, however, there was still a check for waiters, and seeing that there are none (because in the bare case, the count is not incremented), the notification is dropped. This commit addresses that case. libstdc++-v3/ChangeLog: * include/bits/atomic_wait.h: Always notify waiters in the in the case of 'bare' address notification. Repeated text: "in the in the" --- libstdc++-v3/include/bits/atomic_wait.h | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/libstdc++-v3/include/bits/atomic_wait.h b/libstdc++-v3/include/bits/atomic_wait.h index 0ac5575190c..984ed70f16c 100644 --- a/libstdc++-v3/include/bits/atomic_wait.h +++ b/libstdc++-v3/include/bits/atomic_wait.h @@ -226,9 +226,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION } void - _M_notify(const __platform_wait_t* __addr, bool __all) noexcept + _M_notify(const __platform_wait_t* __addr, bool __all, bool __bare) noexcept { -if (!_M_waiting()) +if (!(__bare || _M_waiting())) Maybe it's just me, but I would find (!__base && !__waiting) to be a clearer expression of the logic here. i.e. "return if the wait is not bare and there are no waiters" rather than "return if the wait is not either bare or has waiters". The latter makes me take a second to grok. As discussed on IRC, I went back and forth on this a couple of times and neither option seemed particularly great to me. I started down the path of propagating the std::true_type/std::false_type bits that the RAII wrapper type knows about and making this a compile time choice, but felt the change was too big to make this late in the release cycle. I will revisit this for stage1 (though ideally all of this will be moved into the .so for GCC12 and partially rewritten as part of that process anyway). The patch is OK either way though, with the ChangeLog typo fix. return; #ifdef _GLIBCXX_HAVE_PLATFORM_WAIT @@ -304,11 +304,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION } void -_M_notify(bool __all) +_M_notify(bool __all, bool __bare = false) { if (_M_addr == &_M_w._M_ver) __atomic_fetch_add(_M_addr, 1, __ATOMIC_ACQ_REL); - _M_w._M_notify(_M_addr, __all); + _M_w._M_notify(_M_addr, __all, __bare); } template// This call is to be used by atomic types which track contention externally @@ -464,7 +464,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION __detail::__platform_notify(__addr, __all); #else __detail::__bare_wait __w(__addr); -__w._M_notify(__all); +__w._M_notify(__all, true); #endif } _GLIBCXX_END_NAMESPACE_VERSION -- 2.30.2 Tested sparc-sun-solaris2.11. Committed to master, backported to release/gcc-11.
Re: [PATCH] Fix various typos.
On 4/22/21 4:00 PM, Richard Biener wrote: > For the dg-bogus did you check the emitted diagnostic not tested for > doesn't have the same misspelling? I did. I'm going to push it. Martin
i386: Fix unsigned int -> double conversion on i386 w/ -mfpmath=sse [PR100119]
2021-04-22 Uroš Bizjak gcc/ PR target/100119 * config/i386/i386-expand.c (ix86_expand_convert_uns_sidf_sse): Remove the sign with FE_DOWNWARD, where x - x = -0.0. gcc/testsuite/ PR target/100119 * gcc.target/i386/pr100119.c: New test. Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}. Pushed to master. Uros. diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c index 166c23ddb4d..516440eb5c1 100644 --- a/gcc/config/i386/i386-expand.c +++ b/gcc/config/i386/i386-expand.c @@ -1550,6 +1550,8 @@ ix86_expand_convert_uns_sixf_sse (rtx, rtx) gcc_unreachable (); } +static rtx ix86_expand_sse_fabs (rtx op0, rtx *smask); + /* Convert an unsigned SImode value into a DFmode. Only currently used for SSE, but applicable anywhere. */ @@ -1569,6 +1571,11 @@ ix86_expand_convert_uns_sidf_sse (rtx target, rtx input) x = const_double_from_real_value (TWO31r, DFmode); x = expand_simple_binop (DFmode, PLUS, fp, x, target, 0, OPTAB_DIRECT); + + /* Remove the sign with FE_DOWNWARD, where x - x = -0.0. */ + if (HONOR_SIGNED_ZEROS (DFmode) && flag_rounding_math) +x = ix86_expand_sse_fabs (x, NULL); + if (x != target) emit_move_insn (target, x); } diff --git a/gcc/testsuite/gcc.target/i386/pr100119.c b/gcc/testsuite/gcc.target/i386/pr100119.c new file mode 100644 index 000..be9472eb939 --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr100119.c @@ -0,0 +1,28 @@ +/* PR target/100119 */ +/* { dg-do run { target sse2_runtime } } */ +/* { dg-require-effective-target fenv } */ +/* { dg-options "-O2 -frounding-math -msse2 -mno-avx512f -mfpmath=sse" } */ + +#include + +double +__attribute__((noinline)) +test (unsigned int x) +{ + return x; +} + +int +main () +{ + double result; + + fesetround (FE_DOWNWARD); + + result = test (0); + + if (__builtin_signbit (result) != 0) +__builtin_abort (); + + return 0; +}
Re: [PATCH] libstdc++: Add workaround for ia32 floating atomics miscompilations [PR100184]
Wrong PR # in subject. On Thu, Apr 22, 2021 at 7:11 AM Jakub Jelinek via Gcc-patches wrote: > > Hi! > > gcc on ia32 miscompiles various atomics involving floating point, > unfortunately I'm afraid it is too late to fix that for 11.1 and > as I'm quite lost on it, it might take a while for 12 too > (disabling all the 8 peephole2s would be easiest, but then we'd > run into optimization regressions). > > While 1.cc just FAILs, with dejagnu 1.6.1 wait_notify.cc hangs the > make check even after the timeout fires. The following patch therefore > xfails the former and skips the latter. > > Tested on x86_64-linux where > make check RUNTESTFLAGS='conformance.exp=atomic_float/*.cc' > is still > === libstdc++ Summary === > > # of expected passes8 > and on i686-linux, where it is now > === libstdc++ Summary === > > # of expected passes5 > # of expected failures 1 > # of unsupported tests 1 > > Ok for trunk/11.1 ? > > 2021-04-22 Jakub Jelinek > > PR target/100182 > * testsuite/29_atomics_atomic_float/1.cc: Add dg-xfail-run-if for > ia32. > * testsuite/29_atomics/atomic_float/wait_notify.cc: Add dg-skip-if for > ia32. > > --- libstdc++-v3/testsuite/29_atomics/atomic_float/1.cc.jj 2021-01-05 > 00:13:58.552294301 +0100 > +++ libstdc++-v3/testsuite/29_atomics/atomic_float/1.cc 2021-04-22 > 14:29:19.778374653 +0200 > @@ -18,6 +18,7 @@ > // { dg-add-options ieee } > // { dg-options "-std=gnu++2a" } > // { dg-do run { target c++2a } } > +// { dg-xfail-run-if "PR100182" { ia32 } } > > #include > #include > --- libstdc++-v3/testsuite/29_atomics/atomic_float/wait_notify.cc.jj > 2021-04-20 23:46:09.214189796 +0200 > +++ libstdc++-v3/testsuite/29_atomics/atomic_float/wait_notify.cc > 2021-04-22 14:28:19.793044944 +0200 > @@ -2,6 +2,7 @@ > // { dg-do run { target c++2a } } > // { dg-require-gthreads "" } > // { dg-additional-options "-pthread" { target pthread } } > +// { dg-skip-if "PR100182" { ia32 } } > // { dg-add-options libatomic } > > // Copyright (C) 2020-2021 Free Software Foundation, Inc. > > Jakub > -- H.J.
Re: [PATCH] LTO: fallback to -flto=N if -flto=jobserver does not work.
On 4/22/21 2:47 PM, Richard Biener wrote: > On Thu, Apr 22, 2021 at 2:21 PM Martin Liška wrote: >> >> On 4/22/21 1:19 PM, Richard Biener wrote: >>> On Thu, Apr 22, 2021 at 11:02 AM Martin Liška wrote: On 4/22/21 10:04 AM, Richard Biener wrote: > On Wed, Apr 21, 2021 at 3:08 PM Martin Liška wrote: >> >> When -flto=jobserver is used and we cannot detect job server, then we can >> still fallbackto -flto=N mode. >> >> Patch can bootstrap on x86_64-linux-gnu and survives regression tests. >> >> Ready to be installed? > > I think this behavior needs to be documented - it falls back to a less > conservative (possibly system overloading) mode - which IMHO is > non-obvious and IMHO we shouldn't do. Sure, I'm sending corresponding patch. Note that it's quite common mistake that '+' is missing in Makefile rule. That was motivation for my change. >>> >>> Sure, but that change won't get this fixed. >> >> It will as linker command line will contain (-flto=jobserver) and LTO will >> fallback to -flto=N. >> >>> IMHO we should eventually >>> emit diagnostic like >>> >>> warning: could not find jobserver, compiling N jobs serially >>> >>> once N > 1 (or 2?). >> >> We do that now (for all N): >> lto-wrapper: warning: jobserver is not available: ‘MAKEFLAGS’ environment >> variable is unset >> >> >>> Likewise if people just use -flto and auto-detection >>> finds nothing: >> >> -flto != -flto=auto >> >> Yes, -flto is a serial linking and we can emit a warning. > > I'd avoid warning if there's just a single ltrans unit. That's doable and I've just done that in the attached patch. Two disadvantages: - one needs waiting for the warning after WPA - source change (>1 LTRANS) can trigger the warning > >>> warning: using serial compilation of N LTRANS jobs >>> note: refer to http:// for how to use parallel compile >>> >>> using the URL diagnostics to point to -flto=... documentation. >> >> What about making that a proper warning (-Wlto)? We have diagnostics >> infrastructure >> that prints URL links. > > Note that drivers like lto-wrapper do not have fully initialized diagnostic > machinery and use a "different" set of overloads (likewise gen* programs). I managed printing the warning: lto-wrapper: warning: jobserver is not available: ‘MAKEFLAGS’ environment variable is unset lto-wrapper: note: see the ‘-flto’ option documentation for more information and lto-wrapper: warning: using serial compilation of 128 LTRANS jobs lto-wrapper: note: see the ‘-flto’ option documentation for more information > >>> >>> That is, teach users rather than second-guessing and eventually >>> blowing things up. IMHO only the jobserver mode is safe to >>> automatically use. >> >> Well, -flto=auto is also fine and document. I think there is no possibility >> auto CPU deduction can fail. So -flto=jobserver (with missing make job >> server) >> and -flto (equal to -flto=1) worth emitting a warning. >> >> What do you think? > > Yes, that sounds reasonable. I suspect that people might want to see > -flto default to -flto=auto but then I don't think that's good because there's > no system wide job scheduler limiting things (systemd-jobserver anyone?) Done that. Thoughts? Martin > > Richard. > >> Martin >> >>> >>> Richard. >>> Martin > > Richard. > >> Thanks, >> Martin >> >> gcc/ChangeLog: >> >> * lto-wrapper.c (run_gcc): When -flto=jobserver is used, but the >> makeserver cannot be detected, then use -flto=N fallback. >> --- >> gcc/lto-wrapper.c | 3 ++- >> 1 file changed, 2 insertions(+), 1 deletion(-) >> >> diff --git a/gcc/lto-wrapper.c b/gcc/lto-wrapper.c >> index 03a5922f8ea..0b626d7c811 100644 >> --- a/gcc/lto-wrapper.c >> +++ b/gcc/lto-wrapper.c >> @@ -1585,8 +1585,9 @@ run_gcc (unsigned argc, char *argv[]) >>if (jobserver && jobserver_error != NULL) >> { >> warning (0, jobserver_error); >> - parallel = 0; >> + /* Fall back to auto parallelism. */ >> jobserver = 0; >> + auto_parallel = 1; >> } >>else if (!jobserver && jobserver_error == NULL) >> { >> -- >> 2.31.1 >> >> >From 7cd89e17c49c027027e2aed1722c0758331ac7ac Mon Sep 17 00:00:00 2001 From: Martin Liska Date: Thu, 22 Apr 2021 16:27:19 +0200 Subject: [PATCH] Print warning diagnostics for -flto issues. gcc/ChangeLog: * lto-wrapper.c (print_lto_docs_link): New function. (run_gcc): Print warning about missing job server detection after we know NR of partitions. Do the same for -flto{,=1}. * opts.c (get_option_html_page): Support -flto option. --- gcc/lto-wrapper.c | 39 +-- gcc/opts.c| 6 +- 2 files changed, 42 insertions(+), 3 deletions(-) diff --git a/gcc/lto-wrapper.c b/gcc/lto-wrapper.c index
Re: [PATCH] Fix various typos.
On Thu, Apr 22, 2021 at 3:50 PM Martin Liška wrote: > > Hello. > > The patch fixes various typos. > > Patch can bootstrap on x86_64-linux-gnu and survives regression tests. > > Ready to be installed? For the dg-bogus did you check the emitted diagnostic not tested for doesn't have the same misspelling? Otherwise OK. Richard. > Thanks, > Martin > > PR testsuite/100159 > PR testsuite/100192 > > gcc/ChangeLog: > > * builtins.c (expand_builtin): Fix typos and missing comments. > * dwarf2out.c (gen_subprogram_die): Likewise. > (gen_struct_or_union_type_die): Likewise. > > gcc/fortran/ChangeLog: > > * frontend-passes.c (optimize_expr): Fix typos and missing comments. > > gcc/testsuite/ChangeLog: > > * g++.dg/template/nontype29.C: Fix typos and missing comments. > * gcc.dg/Warray-bounds-64.c: Likewise. > * gcc.dg/Warray-parameter.c: Likewise. > * gcc.dg/Wstring-compare.c: Likewise. > * gcc.dg/format/gcc_diag-11.c: Likewise. > * gfortran.dg/array_constructor_3.f90: Likewise. > * gfortran.dg/matmul_bounds_9.f90: Likewise. > * gfortran.dg/pr78033.f90: Likewise. > * gfortran.dg/pr96325.f90: Likewise. > --- > gcc/builtins.c| 2 +- > gcc/dwarf2out.c | 4 ++-- > gcc/fortran/frontend-passes.c | 2 +- > gcc/testsuite/g++.dg/template/nontype29.C | 4 ++-- > gcc/testsuite/gcc.dg/Warray-bounds-64.c | 2 +- > gcc/testsuite/gcc.dg/Warray-parameter.c | 2 +- > gcc/testsuite/gcc.dg/Wstring-compare.c| 10 +- > gcc/testsuite/gcc.dg/format/gcc_diag-11.c | 2 +- > gcc/testsuite/gfortran.dg/array_constructor_3.f90 | 2 +- > gcc/testsuite/gfortran.dg/matmul_bounds_9.f90 | 2 +- > gcc/testsuite/gfortran.dg/pr78033.f90 | 2 +- > gcc/testsuite/gfortran.dg/pr96325.f90 | 2 +- > 12 files changed, 18 insertions(+), 18 deletions(-) > > diff --git a/gcc/builtins.c b/gcc/builtins.c > index d30c4eb62fc..8c5324bf7de 100644 > --- a/gcc/builtins.c > +++ b/gcc/builtins.c > @@ -9986,7 +9986,7 @@ expand_builtin (tree exp, rtx target, rtx subtarget, > machine_mode mode, >break; > > /* Expand it as BUILT_IN_MEMCMP_EQ first. If not successful, change it > - back to a BUILT_IN_STRCMP. Remember to delete the 3rd paramater > + back to a BUILT_IN_STRCMP. Remember to delete the 3rd parameter > when changing it to a strcmp call. */ > case BUILT_IN_STRCMP_EQ: >target = expand_builtin_memcmp (exp, target, true); > diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c > index aba16848295..c36fd5a7f6a 100644 > --- a/gcc/dwarf2out.c > +++ b/gcc/dwarf2out.c > @@ -23542,7 +23542,7 @@ gen_subprogram_die (tree decl, dw_die_ref context_die) >resolve_variable_values (); > } > > - /* Generate child dies for template paramaters. */ > + /* Generate child dies for template parameters. */ >if (early_dwarf && debug_info_level > DINFO_LEVEL_TERSE) > gen_generic_params_dies (decl); > > @@ -25471,7 +25471,7 @@ gen_struct_or_union_type_die (tree type, dw_die_ref > context_die, > >scope_die = scope_die_for (type, context_die); > > - /* Generate child dies for template paramaters. */ > + /* Generate child dies for template parameters. */ >if (!type_die && debug_info_level > DINFO_LEVEL_TERSE) > schedule_generic_params_dies_gen (type); > > diff --git a/gcc/fortran/frontend-passes.c b/gcc/fortran/frontend-passes.c > index 7d3eae67632..93ac4b4a17c 100644 > --- a/gcc/fortran/frontend-passes.c > +++ b/gcc/fortran/frontend-passes.c > @@ -373,7 +373,7 @@ optimize_expr (gfc_expr **e, int *walk_subtrees > ATTRIBUTE_UNUSED, >return 0; > } > > -/* Auxiliary function to handle the arguments to reduction intrnisics. If > the > +/* Auxiliary function to handle the arguments to reduction intrinsics. If > the > function is a scalar, just copy it; otherwise returns the new element, the > old one can be freed. */ > > diff --git a/gcc/testsuite/g++.dg/template/nontype29.C > b/gcc/testsuite/g++.dg/template/nontype29.C > index 18a3058e7e3..dd4e20f7e62 100644 > --- a/gcc/testsuite/g++.dg/template/nontype29.C > +++ b/gcc/testsuite/g++.dg/template/nontype29.C > @@ -3,7 +3,7 @@ > // { dg-do compile } > // { dg-options "-Wall" } > > -#if __cpluspls >= 201103L > +#if __cplusplus >= 201103L > > // C++ 11 test case from comment #0. > namespace comment_0 { > @@ -60,7 +60,7 @@ void h () > } // comment_2 > > > -#if __cpluspls >= 201103L > +#if __cplusplus >= 201103L > > // C++ 11 test case from comment #5. > namespace comment_5 { > diff --git a/gcc/testsuite/gcc.dg/Warray-bounds-64.c > b/gcc/testsuite/gcc.dg/Warray-bounds-64.c > index 88b88debff4..f5ebc3dd4b2 100644 > --- a/gcc/testsuite/gcc.dg/Warray-bounds-64.c > +++ b/gcc/testsuite/gcc.dg/Warray-bounds-64.c > @@ -7,7 +7,7 @@ > asks for.
Re: [PATCH] Remove __cplusplus >= 201103
On Thu, Apr 22, 2021 at 3:52 PM Martin Liška wrote: > > Right now, we require a C++11 compiler, so the check is not needed any > longer. > > Patch can bootstrap on x86_64-linux-gnu and survives regression tests. > > Ready to be installed once GCC 11.1 is released? For system.h please keep it working for C code (not sure if we use a C compiler anywhere). Richard. > Thanks, > Martin > > gcc/analyzer/ChangeLog: > > * program-state.cc (program_state::operator=): Remove > __cplusplus >= 201103. > (program_state::program_state): Likewise. > * program-state.h: Likewise. > * region-model.h (class region_model): Remove dead code. > > gcc/ChangeLog: > > * bitmap.h (class auto_bitmap): Remove > __cplusplus >= 201103. > * config/aarch64/aarch64.c: Likewise. > * gimple-ssa-store-merging.c > (store_immediate_info::store_immediate_info): > Likewise. > * sbitmap.h: Likewise. > * system.h (STATIC_ASSERT): Likewise. > --- > gcc/analyzer/program-state.cc | 2 -- > gcc/analyzer/program-state.h | 4 > gcc/analyzer/region-model.h| 5 - > gcc/bitmap.h | 2 -- > gcc/config/aarch64/aarch64.c | 2 -- > gcc/gimple-ssa-store-merging.c | 10 +- > gcc/sbitmap.h | 2 -- > gcc/system.h | 9 + > 8 files changed, 2 insertions(+), 34 deletions(-) > > diff --git a/gcc/analyzer/program-state.cc b/gcc/analyzer/program-state.cc > index f8094621309..5c690b08fd3 100644 > --- a/gcc/analyzer/program-state.cc > +++ b/gcc/analyzer/program-state.cc > @@ -731,7 +731,6 @@ program_state::operator= (const program_state ) >return *this; > } > > -#if __cplusplus >= 201103 > /* Move constructor for program_state (when building with C++11). */ > program_state::program_state (program_state &) > : m_region_model (other.m_region_model), > @@ -747,7 +746,6 @@ program_state::program_state (program_state &) > >m_valid = other.m_valid; > } > -#endif > > /* program_state's dtor. */ > > diff --git a/gcc/analyzer/program-state.h b/gcc/analyzer/program-state.h > index 898c57ff833..f16fe6ba984 100644 > --- a/gcc/analyzer/program-state.h > +++ b/gcc/analyzer/program-state.h > @@ -192,11 +192,7 @@ public: >program_state (const extrinsic_state _state); >program_state (const program_state ); >program_state& operator= (const program_state ); > - > -#if __cplusplus >= 201103 >program_state (program_state &); > -#endif > - >~program_state (); > >hashval_t hash () const; > diff --git a/gcc/analyzer/region-model.h b/gcc/analyzer/region-model.h > index 4123500f3bc..a169396dcb5 100644 > --- a/gcc/analyzer/region-model.h > +++ b/gcc/analyzer/region-model.h > @@ -414,11 +414,6 @@ class region_model >region_model (region_model_manager *mgr); >region_model (const region_model ); >~region_model (); > - > -#if 0//__cplusplus >= 201103 > - region_model (region_model &); > -#endif > - >region_model = (const region_model ); > >bool operator== (const region_model ) const; > diff --git a/gcc/bitmap.h b/gcc/bitmap.h > index 84632af7009..26138556b2a 100644 > --- a/gcc/bitmap.h > +++ b/gcc/bitmap.h > @@ -951,10 +951,8 @@ class auto_bitmap >// Prevent making a copy that references our bitmap. >auto_bitmap (const auto_bitmap &); >auto_bitmap = (const auto_bitmap &); > -#if __cplusplus >= 201103L >auto_bitmap (auto_bitmap &&); >auto_bitmap = (auto_bitmap &&); > -#endif > >bitmap_head m_bits; > }; > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c > index 12625a4bee3..62fcad88215 100644 > --- a/gcc/config/aarch64/aarch64.c > +++ b/gcc/config/aarch64/aarch64.c > @@ -221,9 +221,7 @@ public: > predicate in each predicate argument register. This means that > we need at least 12 pieces. */ >static const unsigned int MAX_PIECES = NUM_FP_ARG_REGS + NUM_PR_ARG_REGS; > -#if __cplusplus >= 201103L >static_assert (MAX_PIECES >= 8, "Need to store at least 8 predicates"); > -#endif > >/* Describes one piece of a PST. Each piece is one of: > > diff --git a/gcc/gimple-ssa-store-merging.c b/gcc/gimple-ssa-store-merging.c > index 7eb50d65a20..123c92d9b44 100644 > --- a/gcc/gimple-ssa-store-merging.c > +++ b/gcc/gimple-ssa-store-merging.c > @@ -1595,17 +1595,9 @@ store_immediate_info::store_immediate_info (unsigned > HOST_WIDE_INT bs, >: bitsize (bs), bitpos (bp), bitregion_start (brs), bitregion_end (bre), > stmt (st), order (ord), rhs_code (rhscode), n (nr), > ins_stmt (ins_stmtp), bit_not_p (bitnotp), ops_swapped_p (false), > -lp_nr (nr2) > -#if __cplusplus >= 201103L > -, ops { op0r, op1r } > +lp_nr (nr2), ops { op0r, op1r } > { > } > -#else > -{ > - ops[0] = op0r; > - ops[1] = op1r; > -} > -#endif > > /* Struct representing a group of stores to contiguous memory locations. > These are produced by the second phase (coalescing) and consumed in the >
Re: [PATCH 0/3] VAX backend preparatory updates for switching to LRA
> On Apr 21, 2021, at 5:32 PM, Maciej W. Rozycki wrote: > > ... > OTOH switching to LRA regresses code generation seriously, by making the > indexed and indirect VAX address modes severely underutilised, so while > with these changes in place the backend can be switched to LRA with just a > trivial to remove the redefinition of TARGET_LRA_P, I think it is not yet > the right time to do it. I noticed similar issues with pdp11, which at the moment allows LRA via a -mlra switch but doesn't make it the default. Another mode that isn't handled well (or at all) by LRA is autoincrement/autodecrement. It would be great if all these things could be done better, that would help several targets. (I wonder if m68k would be another; doesn't it have similar addressing modes at least on the 68040?) paul
Re: [PATCH] c++: do_class_deduction and dependent init [PR93383]
On Wed, 21 Apr 2021, Patrick Palka wrote: > On Wed, 21 Apr 2021, Jason Merrill wrote: > > > On 4/12/21 1:20 PM, Patrick Palka wrote: > > > Here we're crashing during deduction for a template placeholder from a > > > dependent initializer because one of the initializer's elements has an > > > empty TREE_TYPE, something which resolve_args and later unify_one_argument > > > don't expect. And if the deduction from a dependent initializer > > > otherwise fails, we prematurely issue an error rather than reattempting > > > the deduction at instantiation time. > > > > > > This patch makes do_class_deduction more tolerant about dependent > > > initializers, in a manner similar to what do_auto_deduction does: if > > > deduction from a dependent initializer fails, just return the original > > > placeholder unchanged. > > > > Why doesn't the type_dependent_expression_p check in do_auto_deduction catch > > this already? > > That check applies only when context != adc_unify, but here we have > context == adc_unify since we're being called from > convert_template_argument. > > And currently, when 'auto' deduction fails for a dependent initializer, > do_auto_deduction will just silently return the original placeholder: > > int val = type_unification_real (tparms, targs, parms, , 1, 0, >DEDUCE_CALL, >NULL, /*explain_p=*/false); > if (val > 0) > { > if (processing_template_decl) > /* Try again at instantiation time. */ > return type; > > so I suppose this patch just makes do_class_deduction behave more > similarly to do_auto_deduction in this situation. On second thought, I think attempting CTAD a dependent initializer as the patch does might sometimes give us the wrong answer. If e.g. the class template in question has the deduction guides template A(T) -> A; A(int) -> A; then ahead-of-time CTAD for A{v} where v is type-dependent will succeed and resolve to A, but at instantiation time the type of v might be int. So perhaps we should just have do_class_deduction punt on all type-dependent expressions, e.g. -- >8 -- gcc/cp/ChangeLog: PR c++/89565 PR c++/93383 PR c++/99200 * pt.c (do_class_deduction): Give up if the initializer is type-dependent. gcc/testsuite/ChangeLog: PR c++/89565 PR c++/93383 PR c++/99200 * g++.dg/cpp2a/nontype-class39.C: Remove dg-ice. * g++.dg/cpp2a/nontype-class45.C: New test. * g++.dg/cpp2a/nontype-class46.C: New test. --- gcc/cp/pt.c | 5 +++ gcc/testsuite/g++.dg/cpp2a/nontype-class39.C | 2 -- gcc/testsuite/g++.dg/cpp2a/nontype-class45.C | 32 gcc/testsuite/g++.dg/cpp2a/nontype-class46.C | 11 +++ 4 files changed, 48 insertions(+), 2 deletions(-) create mode 100644 gcc/testsuite/g++.dg/cpp2a/nontype-class45.C create mode 100644 gcc/testsuite/g++.dg/cpp2a/nontype-class46.C diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c index 7bcbe6dc3ce..6673f935ab6 100644 --- a/gcc/cp/pt.c +++ b/gcc/cp/pt.c @@ -29362,6 +29362,11 @@ do_class_deduction (tree ptype, tree tmpl, tree init, return error_mark_node; } + /* If the initializer is dependent, we can't resolve the class template + placeholder ahead of time. */ + if (type_dependent_expression_p (init)) +return ptype; + tree type = TREE_TYPE (tmpl); bool try_list_ctor = false; diff --git a/gcc/testsuite/g++.dg/cpp2a/nontype-class39.C b/gcc/testsuite/g++.dg/cpp2a/nontype-class39.C index 512afad8e4f..9b4da4f02ea 100644 --- a/gcc/testsuite/g++.dg/cpp2a/nontype-class39.C +++ b/gcc/testsuite/g++.dg/cpp2a/nontype-class39.C @@ -1,7 +1,5 @@ // PR c++/89565 // { dg-do compile { target c++20 } } -// { dg-additional-options "-fchecking" } -// { dg-ice "resolve_args" } template struct N{}; diff --git a/gcc/testsuite/g++.dg/cpp2a/nontype-class45.C b/gcc/testsuite/g++.dg/cpp2a/nontype-class45.C new file mode 100644 index 000..e7addf5f291 --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp2a/nontype-class45.C @@ -0,0 +1,32 @@ +// PR c++/99200 +// { dg-do compile { target c++20 } } + +template +struct A +{ + constexpr A (const char ()[N]) { for (int i = 0; i < N; i++) v[i] = s[i]; v[N] = 0; } + char v[N + 1]; +}; + +template +struct B +{ + constexpr operator const char *() { return s.v; } +}; + +template +const char * +foo () +{ + return B<__PRETTY_FUNCTION__>{}; +} + +template +const char * +bar () +{ + return B<__FUNCTION__>{}; +} + +auto a = foo (); +auto b = bar (); diff --git a/gcc/testsuite/g++.dg/cpp2a/nontype-class46.C b/gcc/testsuite/g++.dg/cpp2a/nontype-class46.C new file mode 100644 index 000..d91e800424f --- /dev/null +++ b/gcc/testsuite/g++.dg/cpp2a/nontype-class46.C @@ -0,0 +1,11 @@ +// PR c++/93383 +// { dg-do compile { target c++20 } } + +template struct A {}; + +template struct B { + void
[PATCH] testsuite/arm: Improve unsigned-float.c
The test requires an FPU, so use -march=armv7-a+fp -mfpu=auto instead of -march=armv7-a. We also remove dg-require-effective-target arm_fp_ok, but keep dg-add-options arm_fp: this enables the test to pass on arm-eabi configured with default cpu/fpu/mode. dg-require-effective-target arm_fp_ok fails on such a configuration for lack of FPU, since dg-options are not taken into account by dg-require-effective-target. Add -march=armv7-a+fp -mfpu=auto is sufficient for arm_fp options to be acceptable. This enables the test to pass on all the arm-eabi configurations I'm testing, as well as arm-linux-gnueabi when forcing -march=armv5t. 2021-04-22 Christophe Lyon gcc/testsuite/ * gcc.target/arm/unsigned-float.c: Remove arm_fp_ok, adjust dg-options. --- gcc/testsuite/gcc.target/arm/unsigned-float.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/gcc/testsuite/gcc.target/arm/unsigned-float.c b/gcc/testsuite/gcc.target/arm/unsigned-float.c index ad589d9..ea3abc7 100644 --- a/gcc/testsuite/gcc.target/arm/unsigned-float.c +++ b/gcc/testsuite/gcc.target/arm/unsigned-float.c @@ -1,8 +1,8 @@ /* { dg-do compile } */ -/* { dg-require-effective-target arm_fp_ok } */ -/* { dg-skip-if "need fp instructions" { *-*-* } { "-mfloat-abi=soft" } { "" } } */ /* { dg-skip-if "-mpure-code supports M-profile only" { *-*-* } { "-mpure-code" } } */ -/* { dg-options "-march=armv7-a -O1" } */ +/* { dg-options "-march=armv7-a+fp -mfpu=auto -O1" } */ +/* Do not require arm_ok effective target to avoid skipping on arm-eabi with + default configure options. */ /* { dg-add-options arm_fp } */ -- 2.7.4
Re: [PATCH] libstdc++: Add workaround for ia32 floating atomics miscompilations [PR100184]
On 22/04/21 14:42 +0200, Jakub Jelinek wrote: Hi! gcc on ia32 miscompiles various atomics involving floating point, unfortunately I'm afraid it is too late to fix that for 11.1 and as I'm quite lost on it, it might take a while for 12 too (disabling all the 8 peephole2s would be easiest, but then we'd run into optimization regressions). While 1.cc just FAILs, with dejagnu 1.6.1 wait_notify.cc hangs the make check even after the timeout fires. The following patch therefore xfails the former and skips the latter. Tested on x86_64-linux where make check RUNTESTFLAGS='conformance.exp=atomic_float/*.cc' is still === libstdc++ Summary === # of expected passes8 and on i686-linux, where it is now === libstdc++ Summary === # of expected passes5 # of expected failures 1 # of unsupported tests 1 Ok for trunk/11.1 ? OK, thanks. 2021-04-22 Jakub Jelinek PR target/100182 * testsuite/29_atomics_atomic_float/1.cc: Add dg-xfail-run-if for ia32. * testsuite/29_atomics/atomic_float/wait_notify.cc: Add dg-skip-if for ia32. --- libstdc++-v3/testsuite/29_atomics/atomic_float/1.cc.jj 2021-01-05 00:13:58.552294301 +0100 +++ libstdc++-v3/testsuite/29_atomics/atomic_float/1.cc 2021-04-22 14:29:19.778374653 +0200 @@ -18,6 +18,7 @@ // { dg-add-options ieee } // { dg-options "-std=gnu++2a" } // { dg-do run { target c++2a } } +// { dg-xfail-run-if "PR100182" { ia32 } } #include #include --- libstdc++-v3/testsuite/29_atomics/atomic_float/wait_notify.cc.jj 2021-04-20 23:46:09.214189796 +0200 +++ libstdc++-v3/testsuite/29_atomics/atomic_float/wait_notify.cc 2021-04-22 14:28:19.793044944 +0200 @@ -2,6 +2,7 @@ // { dg-do run { target c++2a } } // { dg-require-gthreads "" } // { dg-additional-options "-pthread" { target pthread } } +// { dg-skip-if "PR100182" { ia32 } } // { dg-add-options libatomic } // Copyright (C) 2020-2021 Free Software Foundation, Inc. Jakub
Re: [PATCH v4 2/2] x86: Add general_regs_only function attribute
On Thu, Apr 22, 2021 at 2:52 PM Richard Biener wrote: > > On Thu, Apr 22, 2021 at 2:22 PM Jakub Jelinek wrote: > > > > On Thu, Apr 22, 2021 at 01:23:20PM +0200, Richard Biener via Gcc-patches > > wrote: > > > > The question is if the pragma GCC target right now behaves incrementally > > > > or not, whether > > > > #pragma GCC target("avx2") > > > > adds -mavx2 to options if it was missing before and nothing otherwise, > > > > or if > > > > it switches other options off. If it is incremental, we could e.g. try > > > > to > > > > use the second least significant bit of global_options_set.x_* to mean > > > > this option has been set explicitly by some surrounding #pragma GCC > > > > target. > > > > The normal tests - global_options_set.x_flag_whatever could still work > > > > fine because they wouldn't care if the option was explicit from anywhere > > > > (command line or GCC target or target attribute) and just & 2 would mean > > > > it was explicit from pragma GCC target; though there is the case of > > > > bitfields... And then the inlining decision could check the & 2 flags to > > > > see what is required and what is just from command line. > > > > Or we can have some other pragma GCC that would be like target but would > > > > have flags that are explicit (and could e.g. be more restricted, to ISA > > > > options only, and let those use in addition to #pragma GCC target. > > > > > > I'm still curious as to what you think will break if always-inline does > > > what > > > it is documented to do. > > > > We will silently accept calling intrinsics that must be used only in certain > > ISA contexts, which will lead to people writing non-portable code. > > > > So -O2 -mno-avx > > #include > > > > void > > foo (__m256 *x) > > { > > x[0] = _mm256_sub_ps (x[1], x[2]); > > } > > etc. will now be accepted when it shouldn't be. > > clang rejects it like gcc with: > > 1.c:6:10: error: always_inline function '_mm256_sub_ps' requires target > > feature 'avx', but would be inlined into function 'foo' that is compiled > > without support for 'avx' > > x[0] = _mm256_sub_ps (x[1], x[2]); > > ^ > > > > Note, if I do: > > #include > > > > __attribute__((target ("no-sse3"))) void > > foo (__m256 *x) > > { > > x[0] = _mm256_sub_ps (x[1], x[2]); > > } > > and compile > > clang -S -O2 -mavx2 1.c > > 1.c:6:10: error: always_inline function '_mm256_sub_ps' requires target > > feature 'avx', but would be inlined into function 'foo' that is compiled > > without support for 'avx' > > x[0] = _mm256_sub_ps (x[1], x[2]); > > ^ > > then from the error message it seems that unlike GCC, clang remembers > > the exact target features that are needed for the intrinsics and checks just > > those. > > Though, looking at the preprocessed source, seems it uses > > static __inline __m256 __attribute__((__always_inline__, __nodebug__, > > __target__("avx"), __min_vector_width__(256))) > > _mm256_sub_ps(__m256 __a, __m256 __b) > > { > > return (__m256)((__v8sf)__a-(__v8sf)__b); > > } > > and not target pragmas. > > > > Anyway, if we tweak our intrinsic headers so that > > -#ifndef __AVX__ > > #pragma GCC push_options > > #pragma GCC target("avx") > > -#define __DISABLE_AVX__ > > -#endif /* __AVX__ */ > > > > ... > > -#ifdef __DISABLE_AVX__ > > -#undef __DISABLE_AVX__ > > #pragma GCC pop_options > > -#endif /* __DISABLE_AVX__ */ > > and do the opts_set->x_* & 2 stuff on explicit options coming out of > > target/optimize pragmas and attributes, perhaps we don't even need > > to introduce a new attribute and can handle everything magically: Oh, and any such changes will likely interact with Martins ideas to rework how optimize and target attributes work (aka adding ontop of the commandline options). That is, attribute target will then not be enough to remember the exact set of needed ISA features (as opposed to what likely clang implements?) > > 1) if it is gnu_inline extern inline, allow indirect calls, otherwise > > disallow them for always_inline functions > > There are a lot of intrinsics using extern inline __gnu_inline though... > > > 2) for the isa flags and option mismatches, only disallow opts_set->x_* & 2 > > stuff > > This will keep both intrinsics and glibc fortify macros working fine > > in all the needed use cases. > > Yes, see my example in the other mail. > > I think before we add any new attributes we should sort out the > current mess, eventually adding some testcases for desired > diagnostic. > > Richard. > > > Jakub > >
Re: [PATCH v4 2/2] x86: Add general_regs_only function attribute
On Thu, Apr 22, 2021 at 2:22 PM Jakub Jelinek wrote: > > On Thu, Apr 22, 2021 at 01:23:20PM +0200, Richard Biener via Gcc-patches > wrote: > > > The question is if the pragma GCC target right now behaves incrementally > > > or not, whether > > > #pragma GCC target("avx2") > > > adds -mavx2 to options if it was missing before and nothing otherwise, or > > > if > > > it switches other options off. If it is incremental, we could e.g. try to > > > use the second least significant bit of global_options_set.x_* to mean > > > this option has been set explicitly by some surrounding #pragma GCC > > > target. > > > The normal tests - global_options_set.x_flag_whatever could still work > > > fine because they wouldn't care if the option was explicit from anywhere > > > (command line or GCC target or target attribute) and just & 2 would mean > > > it was explicit from pragma GCC target; though there is the case of > > > bitfields... And then the inlining decision could check the & 2 flags to > > > see what is required and what is just from command line. > > > Or we can have some other pragma GCC that would be like target but would > > > have flags that are explicit (and could e.g. be more restricted, to ISA > > > options only, and let those use in addition to #pragma GCC target. > > > > I'm still curious as to what you think will break if always-inline does what > > it is documented to do. > > We will silently accept calling intrinsics that must be used only in certain > ISA contexts, which will lead to people writing non-portable code. > > So -O2 -mno-avx > #include > > void > foo (__m256 *x) > { > x[0] = _mm256_sub_ps (x[1], x[2]); > } > etc. will now be accepted when it shouldn't be. > clang rejects it like gcc with: > 1.c:6:10: error: always_inline function '_mm256_sub_ps' requires target > feature 'avx', but would be inlined into function 'foo' that is compiled > without support for 'avx' > x[0] = _mm256_sub_ps (x[1], x[2]); > ^ > > Note, if I do: > #include > > __attribute__((target ("no-sse3"))) void > foo (__m256 *x) > { > x[0] = _mm256_sub_ps (x[1], x[2]); > } > and compile > clang -S -O2 -mavx2 1.c > 1.c:6:10: error: always_inline function '_mm256_sub_ps' requires target > feature 'avx', but would be inlined into function 'foo' that is compiled > without support for 'avx' > x[0] = _mm256_sub_ps (x[1], x[2]); > ^ > then from the error message it seems that unlike GCC, clang remembers > the exact target features that are needed for the intrinsics and checks just > those. > Though, looking at the preprocessed source, seems it uses > static __inline __m256 __attribute__((__always_inline__, __nodebug__, > __target__("avx"), __min_vector_width__(256))) > _mm256_sub_ps(__m256 __a, __m256 __b) > { > return (__m256)((__v8sf)__a-(__v8sf)__b); > } > and not target pragmas. > > Anyway, if we tweak our intrinsic headers so that > -#ifndef __AVX__ > #pragma GCC push_options > #pragma GCC target("avx") > -#define __DISABLE_AVX__ > -#endif /* __AVX__ */ > > ... > -#ifdef __DISABLE_AVX__ > -#undef __DISABLE_AVX__ > #pragma GCC pop_options > -#endif /* __DISABLE_AVX__ */ > and do the opts_set->x_* & 2 stuff on explicit options coming out of > target/optimize pragmas and attributes, perhaps we don't even need > to introduce a new attribute and can handle everything magically: > > 1) if it is gnu_inline extern inline, allow indirect calls, otherwise > disallow them for always_inline functions There are a lot of intrinsics using extern inline __gnu_inline though... > 2) for the isa flags and option mismatches, only disallow opts_set->x_* & 2 > stuff > This will keep both intrinsics and glibc fortify macros working fine > in all the needed use cases. Yes, see my example in the other mail. I think before we add any new attributes we should sort out the current mess, eventually adding some testcases for desired diagnostic. Richard. > Jakub >
Re: [PATCH] LTO: fallback to -flto=N if -flto=jobserver does not work.
On Thu, Apr 22, 2021 at 2:21 PM Martin Liška wrote: > > On 4/22/21 1:19 PM, Richard Biener wrote: > > On Thu, Apr 22, 2021 at 11:02 AM Martin Liška wrote: > >> > >> On 4/22/21 10:04 AM, Richard Biener wrote: > >>> On Wed, Apr 21, 2021 at 3:08 PM Martin Liška wrote: > > When -flto=jobserver is used and we cannot detect job server, then we can > still fallbackto -flto=N mode. > > Patch can bootstrap on x86_64-linux-gnu and survives regression tests. > > Ready to be installed? > >>> > >>> I think this behavior needs to be documented - it falls back to a less > >>> conservative (possibly system overloading) mode - which IMHO is > >>> non-obvious and IMHO we shouldn't do. > >> > >> Sure, I'm sending corresponding patch. Note that it's quite common mistake > >> that '+' is missing in Makefile rule. That was motivation for my change. > > > > Sure, but that change won't get this fixed. > > It will as linker command line will contain (-flto=jobserver) and LTO will > fallback to -flto=N. > > > IMHO we should eventually > > emit diagnostic like > > > > warning: could not find jobserver, compiling N jobs serially > > > > once N > 1 (or 2?). > > We do that now (for all N): > lto-wrapper: warning: jobserver is not available: ‘MAKEFLAGS’ environment > variable is unset > > > > Likewise if people just use -flto and auto-detection > > finds nothing: > > -flto != -flto=auto > > Yes, -flto is a serial linking and we can emit a warning. I'd avoid warning if there's just a single ltrans unit. > > warning: using serial compilation of N LTRANS jobs > > note: refer to http:// for how to use parallel compile > > > > using the URL diagnostics to point to -flto=... documentation. > > What about making that a proper warning (-Wlto)? We have diagnostics > infrastructure > that prints URL links. Note that drivers like lto-wrapper do not have fully initialized diagnostic machinery and use a "different" set of overloads (likewise gen* programs). > > > > That is, teach users rather than second-guessing and eventually > > blowing things up. IMHO only the jobserver mode is safe to > > automatically use. > > Well, -flto=auto is also fine and document. I think there is no possibility > auto CPU deduction can fail. So -flto=jobserver (with missing make job server) > and -flto (equal to -flto=1) worth emitting a warning. > > What do you think? Yes, that sounds reasonable. I suspect that people might want to see -flto default to -flto=auto but then I don't think that's good because there's no system wide job scheduler limiting things (systemd-jobserver anyone?) Richard. > Martin > > > > > Richard. > > > >> Martin > >> > >>> > >>> Richard. > >>> > Thanks, > Martin > > gcc/ChangeLog: > > * lto-wrapper.c (run_gcc): When -flto=jobserver is used, but the > makeserver cannot be detected, then use -flto=N fallback. > --- > gcc/lto-wrapper.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/gcc/lto-wrapper.c b/gcc/lto-wrapper.c > index 03a5922f8ea..0b626d7c811 100644 > --- a/gcc/lto-wrapper.c > +++ b/gcc/lto-wrapper.c > @@ -1585,8 +1585,9 @@ run_gcc (unsigned argc, char *argv[]) > if (jobserver && jobserver_error != NULL) > { > warning (0, jobserver_error); > - parallel = 0; > + /* Fall back to auto parallelism. */ > jobserver = 0; > + auto_parallel = 1; > } > else if (!jobserver && jobserver_error == NULL) > { > -- > 2.31.1 > > >> >
[PATCH] libstdc++: Add workaround for ia32 floating atomics miscompilations [PR100184]
Hi! gcc on ia32 miscompiles various atomics involving floating point, unfortunately I'm afraid it is too late to fix that for 11.1 and as I'm quite lost on it, it might take a while for 12 too (disabling all the 8 peephole2s would be easiest, but then we'd run into optimization regressions). While 1.cc just FAILs, with dejagnu 1.6.1 wait_notify.cc hangs the make check even after the timeout fires. The following patch therefore xfails the former and skips the latter. Tested on x86_64-linux where make check RUNTESTFLAGS='conformance.exp=atomic_float/*.cc' is still === libstdc++ Summary === # of expected passes8 and on i686-linux, where it is now === libstdc++ Summary === # of expected passes5 # of expected failures 1 # of unsupported tests 1 Ok for trunk/11.1 ? 2021-04-22 Jakub Jelinek PR target/100182 * testsuite/29_atomics_atomic_float/1.cc: Add dg-xfail-run-if for ia32. * testsuite/29_atomics/atomic_float/wait_notify.cc: Add dg-skip-if for ia32. --- libstdc++-v3/testsuite/29_atomics/atomic_float/1.cc.jj 2021-01-05 00:13:58.552294301 +0100 +++ libstdc++-v3/testsuite/29_atomics/atomic_float/1.cc 2021-04-22 14:29:19.778374653 +0200 @@ -18,6 +18,7 @@ // { dg-add-options ieee } // { dg-options "-std=gnu++2a" } // { dg-do run { target c++2a } } +// { dg-xfail-run-if "PR100182" { ia32 } } #include #include --- libstdc++-v3/testsuite/29_atomics/atomic_float/wait_notify.cc.jj 2021-04-20 23:46:09.214189796 +0200 +++ libstdc++-v3/testsuite/29_atomics/atomic_float/wait_notify.cc 2021-04-22 14:28:19.793044944 +0200 @@ -2,6 +2,7 @@ // { dg-do run { target c++2a } } // { dg-require-gthreads "" } // { dg-additional-options "-pthread" { target pthread } } +// { dg-skip-if "PR100182" { ia32 } } // { dg-add-options libatomic } // Copyright (C) 2020-2021 Free Software Foundation, Inc. Jakub
Re: GCC 11.1 Release Candidate available from gcc.gnu.org
On Thu, Apr 22, 2021 at 01:27:40PM +0100, Jonathan Wakely wrote: > I'm just finishing testing this on various targets and will push to > trunk. I'd like to backport it to gcc-11 too, to fix PR100179. Ok for 11.1. > commit 50070d8602a07160cece5890899929e9f210244d > Author: Jonathan Wakely > Date: Thu Apr 22 11:10:06 2021 > > libstdc++: Remove #error from implementation [PR 100179] > > This removes the #error from for the case where > neither __atomic_semaphore nor __platform_semaphore is defined. > > Also rename the _GLIBCXX_REQUIRE_POSIX_SEMAPHORE macro to > _GLIBCXX_USE_POSIX_SEMAPHORE for consistency with the similar > _GLIBCXX_USE_CXX11_ABI macro that can be used to request an alternative > (ABI-changing) implementation. > > libstdc++-v3/ChangeLog: > > PR libstdc++/100179 > * include/bits/semaphore_base.h: Remove #error. > * include/std/semaphore: Do not define anything unless one of > the implementations is available. Jakub
Re: GCC 11.1 Release Candidate available from gcc.gnu.org
On 21/04/21 15:30 +0100, Jonathan Wakely wrote: On 21/04/21 13:12 +0100, Jonathan Wakely wrote: On 21/04/21 12:38 +0100, Jonathan Wakely wrote: On 20/04/21 22:12 -0700, Thomas Rodgers wrote: @@ -86,6 +88,24 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION } } +_GLIBCXX_ALWAYS_INLINE bool +_M_try_acquire() noexcept +{ + for (;;) + { + auto __err = sem_trywait(&_M_semaphore); + if (__err && (errno == EINTR)) + continue; + else if (__err && (errno == EAGAIN)) + return false; + else if (__err) + std::terminate(); + else + break; + } + return true; +} + _GLIBCXX_ALWAYS_INLINE void _M_release(std::ptrdiff_t __update) noexcept { Please just commit this part to trunk and gcc-11, not the macro renaming (as that's been fixed by Jakub already). I think on trunk I'd prefer to do the attached. WDYT? In fact I think something like this is neded even on gcc-11 branch, otherwise anything that tries to include without atomics or sem_t gets hard errors: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100179 And includes which includes , meaning is unusable on those targets. So I think removing this #error is essential: -// Note: the _GLIBCXX_REQUIRE_POSIX_SEMAPHORE macro can be used to force the -// use of Posix semaphores (sem_t). Doing so however, alters the ABI. -#if defined __cpp_lib_atomic_wait && !_GLIBCXX_REQUIRE_POSIX_SEMAPHORE using __semaphore_impl = __atomic_semaphore; -#elif _GLIBCXX_HAVE_POSIX_SEMAPHORE - using __semaphore_impl = __platform_semaphore; -#else -# error "No suitable semaphore implementation available" -#endif +#endif // __cpp_lib_atomic_wait Here's a simpler patch which just removes the #error and renames the REQUIRE macro to USE. This still dumps the whole of and in the global namespace when is included, but we'll have to live with that for the 11.1 release. I'm just finishing testing this on various targets and will push to trunk. I'd like to backport it to gcc-11 too, to fix PR100179. commit 50070d8602a07160cece5890899929e9f210244d Author: Jonathan Wakely Date: Thu Apr 22 11:10:06 2021 libstdc++: Remove #error from implementation [PR 100179] This removes the #error from for the case where neither __atomic_semaphore nor __platform_semaphore is defined. Also rename the _GLIBCXX_REQUIRE_POSIX_SEMAPHORE macro to _GLIBCXX_USE_POSIX_SEMAPHORE for consistency with the similar _GLIBCXX_USE_CXX11_ABI macro that can be used to request an alternative (ABI-changing) implementation. libstdc++-v3/ChangeLog: PR libstdc++/100179 * include/bits/semaphore_base.h: Remove #error. * include/std/semaphore: Do not define anything unless one of the implementations is available. diff --git a/libstdc++-v3/include/bits/semaphore_base.h b/libstdc++-v3/include/bits/semaphore_base.h index 84b33423fff..4948f0fd0bc 100644 --- a/libstdc++-v3/include/bits/semaphore_base.h +++ b/libstdc++-v3/include/bits/semaphore_base.h @@ -263,14 +263,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION }; #endif // __cpp_lib_atomic_wait -// Note: the _GLIBCXX_REQUIRE_POSIX_SEMAPHORE macro can be used to force the +// Note: the _GLIBCXX_USE_POSIX_SEMAPHORE macro can be used to force the // use of Posix semaphores (sem_t). Doing so however, alters the ABI. -#if defined __cpp_lib_atomic_wait && !_GLIBCXX_REQUIRE_POSIX_SEMAPHORE +#if defined __cpp_lib_atomic_wait && !_GLIBCXX_USE_POSIX_SEMAPHORE using __semaphore_impl = __atomic_semaphore; #elif _GLIBCXX_HAVE_POSIX_SEMAPHORE using __semaphore_impl = __platform_semaphore; -#else -# error "No suitable semaphore implementation available" #endif _GLIBCXX_END_NAMESPACE_VERSION diff --git a/libstdc++-v3/include/std/semaphore b/libstdc++-v3/include/std/semaphore index a1560915d83..52addca742c 100644 --- a/libstdc++-v3/include/std/semaphore +++ b/libstdc++-v3/include/std/semaphore @@ -34,6 +34,7 @@ #if __cplusplus > 201703L #include +#if __cpp_lib_atomic_wait || _GLIBCXX_HAVE_POSIX_SEMAPHORE namespace std _GLIBCXX_VISIBILITY(default) { _GLIBCXX_BEGIN_NAMESPACE_VERSION @@ -89,5 +90,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION _GLIBCXX_END_NAMESPACE_VERSION } // namespace +#endif // cpp_lib_atomic_wait || _GLIBCXX_HAVE_POSIX_SEMAPHORE #endif // C++20 #endif // _GLIBCXX_SEMAPHORE
[PATCH] Remove __cplusplus >= 201103
Right now, we require a C++11 compiler, so the check is not needed any longer. Patch can bootstrap on x86_64-linux-gnu and survives regression tests. Ready to be installed once GCC 11.1 is released? Thanks, Martin gcc/analyzer/ChangeLog: * program-state.cc (program_state::operator=): Remove __cplusplus >= 201103. (program_state::program_state): Likewise. * program-state.h: Likewise. * region-model.h (class region_model): Remove dead code. gcc/ChangeLog: * bitmap.h (class auto_bitmap): Remove __cplusplus >= 201103. * config/aarch64/aarch64.c: Likewise. * gimple-ssa-store-merging.c (store_immediate_info::store_immediate_info): Likewise. * sbitmap.h: Likewise. * system.h (STATIC_ASSERT): Likewise. --- gcc/analyzer/program-state.cc | 2 -- gcc/analyzer/program-state.h | 4 gcc/analyzer/region-model.h| 5 - gcc/bitmap.h | 2 -- gcc/config/aarch64/aarch64.c | 2 -- gcc/gimple-ssa-store-merging.c | 10 +- gcc/sbitmap.h | 2 -- gcc/system.h | 9 + 8 files changed, 2 insertions(+), 34 deletions(-) diff --git a/gcc/analyzer/program-state.cc b/gcc/analyzer/program-state.cc index f8094621309..5c690b08fd3 100644 --- a/gcc/analyzer/program-state.cc +++ b/gcc/analyzer/program-state.cc @@ -731,7 +731,6 @@ program_state::operator= (const program_state ) return *this; } -#if __cplusplus >= 201103 /* Move constructor for program_state (when building with C++11). */ program_state::program_state (program_state &) : m_region_model (other.m_region_model), @@ -747,7 +746,6 @@ program_state::program_state (program_state &) m_valid = other.m_valid; } -#endif /* program_state's dtor. */ diff --git a/gcc/analyzer/program-state.h b/gcc/analyzer/program-state.h index 898c57ff833..f16fe6ba984 100644 --- a/gcc/analyzer/program-state.h +++ b/gcc/analyzer/program-state.h @@ -192,11 +192,7 @@ public: program_state (const extrinsic_state _state); program_state (const program_state ); program_state& operator= (const program_state ); - -#if __cplusplus >= 201103 program_state (program_state &); -#endif - ~program_state (); hashval_t hash () const; diff --git a/gcc/analyzer/region-model.h b/gcc/analyzer/region-model.h index 4123500f3bc..a169396dcb5 100644 --- a/gcc/analyzer/region-model.h +++ b/gcc/analyzer/region-model.h @@ -414,11 +414,6 @@ class region_model region_model (region_model_manager *mgr); region_model (const region_model ); ~region_model (); - -#if 0//__cplusplus >= 201103 - region_model (region_model &); -#endif - region_model = (const region_model ); bool operator== (const region_model ) const; diff --git a/gcc/bitmap.h b/gcc/bitmap.h index 84632af7009..26138556b2a 100644 --- a/gcc/bitmap.h +++ b/gcc/bitmap.h @@ -951,10 +951,8 @@ class auto_bitmap // Prevent making a copy that references our bitmap. auto_bitmap (const auto_bitmap &); auto_bitmap = (const auto_bitmap &); -#if __cplusplus >= 201103L auto_bitmap (auto_bitmap &&); auto_bitmap = (auto_bitmap &&); -#endif bitmap_head m_bits; }; diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index 12625a4bee3..62fcad88215 100644 --- a/gcc/config/aarch64/aarch64.c +++ b/gcc/config/aarch64/aarch64.c @@ -221,9 +221,7 @@ public: predicate in each predicate argument register. This means that we need at least 12 pieces. */ static const unsigned int MAX_PIECES = NUM_FP_ARG_REGS + NUM_PR_ARG_REGS; -#if __cplusplus >= 201103L static_assert (MAX_PIECES >= 8, "Need to store at least 8 predicates"); -#endif /* Describes one piece of a PST. Each piece is one of: diff --git a/gcc/gimple-ssa-store-merging.c b/gcc/gimple-ssa-store-merging.c index 7eb50d65a20..123c92d9b44 100644 --- a/gcc/gimple-ssa-store-merging.c +++ b/gcc/gimple-ssa-store-merging.c @@ -1595,17 +1595,9 @@ store_immediate_info::store_immediate_info (unsigned HOST_WIDE_INT bs, : bitsize (bs), bitpos (bp), bitregion_start (brs), bitregion_end (bre), stmt (st), order (ord), rhs_code (rhscode), n (nr), ins_stmt (ins_stmtp), bit_not_p (bitnotp), ops_swapped_p (false), -lp_nr (nr2) -#if __cplusplus >= 201103L -, ops { op0r, op1r } +lp_nr (nr2), ops { op0r, op1r } { } -#else -{ - ops[0] = op0r; - ops[1] = op1r; -} -#endif /* Struct representing a group of stores to contiguous memory locations. These are produced by the second phase (coalescing) and consumed in the diff --git a/gcc/sbitmap.h b/gcc/sbitmap.h index 89a86e49f60..17660e5a65e 100644 --- a/gcc/sbitmap.h +++ b/gcc/sbitmap.h @@ -301,10 +301,8 @@ private: /* Prevent making a copy that refers to our sbitmap. */ auto_sbitmap (const auto_sbitmap &); auto_sbitmap = (const auto_sbitmap &); -#if __cplusplus >= 201103L auto_sbitmap (auto_sbitmap &&); auto_sbitmap = (auto_sbitmap &&); -#endif /* The
[PATCH] Fix various typos.
Hello. The patch fixes various typos. Patch can bootstrap on x86_64-linux-gnu and survives regression tests. Ready to be installed? Thanks, Martin PR testsuite/100159 PR testsuite/100192 gcc/ChangeLog: * builtins.c (expand_builtin): Fix typos and missing comments. * dwarf2out.c (gen_subprogram_die): Likewise. (gen_struct_or_union_type_die): Likewise. gcc/fortran/ChangeLog: * frontend-passes.c (optimize_expr): Fix typos and missing comments. gcc/testsuite/ChangeLog: * g++.dg/template/nontype29.C: Fix typos and missing comments. * gcc.dg/Warray-bounds-64.c: Likewise. * gcc.dg/Warray-parameter.c: Likewise. * gcc.dg/Wstring-compare.c: Likewise. * gcc.dg/format/gcc_diag-11.c: Likewise. * gfortran.dg/array_constructor_3.f90: Likewise. * gfortran.dg/matmul_bounds_9.f90: Likewise. * gfortran.dg/pr78033.f90: Likewise. * gfortran.dg/pr96325.f90: Likewise. --- gcc/builtins.c| 2 +- gcc/dwarf2out.c | 4 ++-- gcc/fortran/frontend-passes.c | 2 +- gcc/testsuite/g++.dg/template/nontype29.C | 4 ++-- gcc/testsuite/gcc.dg/Warray-bounds-64.c | 2 +- gcc/testsuite/gcc.dg/Warray-parameter.c | 2 +- gcc/testsuite/gcc.dg/Wstring-compare.c| 10 +- gcc/testsuite/gcc.dg/format/gcc_diag-11.c | 2 +- gcc/testsuite/gfortran.dg/array_constructor_3.f90 | 2 +- gcc/testsuite/gfortran.dg/matmul_bounds_9.f90 | 2 +- gcc/testsuite/gfortran.dg/pr78033.f90 | 2 +- gcc/testsuite/gfortran.dg/pr96325.f90 | 2 +- 12 files changed, 18 insertions(+), 18 deletions(-) diff --git a/gcc/builtins.c b/gcc/builtins.c index d30c4eb62fc..8c5324bf7de 100644 --- a/gcc/builtins.c +++ b/gcc/builtins.c @@ -9986,7 +9986,7 @@ expand_builtin (tree exp, rtx target, rtx subtarget, machine_mode mode, break; /* Expand it as BUILT_IN_MEMCMP_EQ first. If not successful, change it - back to a BUILT_IN_STRCMP. Remember to delete the 3rd paramater + back to a BUILT_IN_STRCMP. Remember to delete the 3rd parameter when changing it to a strcmp call. */ case BUILT_IN_STRCMP_EQ: target = expand_builtin_memcmp (exp, target, true); diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c index aba16848295..c36fd5a7f6a 100644 --- a/gcc/dwarf2out.c +++ b/gcc/dwarf2out.c @@ -23542,7 +23542,7 @@ gen_subprogram_die (tree decl, dw_die_ref context_die) resolve_variable_values (); } - /* Generate child dies for template paramaters. */ + /* Generate child dies for template parameters. */ if (early_dwarf && debug_info_level > DINFO_LEVEL_TERSE) gen_generic_params_dies (decl); @@ -25471,7 +25471,7 @@ gen_struct_or_union_type_die (tree type, dw_die_ref context_die, scope_die = scope_die_for (type, context_die); - /* Generate child dies for template paramaters. */ + /* Generate child dies for template parameters. */ if (!type_die && debug_info_level > DINFO_LEVEL_TERSE) schedule_generic_params_dies_gen (type); diff --git a/gcc/fortran/frontend-passes.c b/gcc/fortran/frontend-passes.c index 7d3eae67632..93ac4b4a17c 100644 --- a/gcc/fortran/frontend-passes.c +++ b/gcc/fortran/frontend-passes.c @@ -373,7 +373,7 @@ optimize_expr (gfc_expr **e, int *walk_subtrees ATTRIBUTE_UNUSED, return 0; } -/* Auxiliary function to handle the arguments to reduction intrnisics. If the +/* Auxiliary function to handle the arguments to reduction intrinsics. If the function is a scalar, just copy it; otherwise returns the new element, the old one can be freed. */ diff --git a/gcc/testsuite/g++.dg/template/nontype29.C b/gcc/testsuite/g++.dg/template/nontype29.C index 18a3058e7e3..dd4e20f7e62 100644 --- a/gcc/testsuite/g++.dg/template/nontype29.C +++ b/gcc/testsuite/g++.dg/template/nontype29.C @@ -3,7 +3,7 @@ // { dg-do compile } // { dg-options "-Wall" } -#if __cpluspls >= 201103L +#if __cplusplus >= 201103L // C++ 11 test case from comment #0. namespace comment_0 { @@ -60,7 +60,7 @@ void h () } // comment_2 -#if __cpluspls >= 201103L +#if __cplusplus >= 201103L // C++ 11 test case from comment #5. namespace comment_5 { diff --git a/gcc/testsuite/gcc.dg/Warray-bounds-64.c b/gcc/testsuite/gcc.dg/Warray-bounds-64.c index 88b88debff4..f5ebc3dd4b2 100644 --- a/gcc/testsuite/gcc.dg/Warray-bounds-64.c +++ b/gcc/testsuite/gcc.dg/Warray-bounds-64.c @@ -7,7 +7,7 @@ asks for. { dg-do compile } - { dg-options "-O2 -Wall -Warray-parameter -Wno-vla-paramater" } */ + { dg-options "-O2 -Wall -Warray-parameter -Wno-vla-parameter" } */ #define NOIPA __attribute__ ((noipa)) diff --git a/gcc/testsuite/gcc.dg/Warray-parameter.c b/gcc/testsuite/gcc.dg/Warray-parameter.c index 42be3100e45..6c5195a31be 100644 --- a/gcc/testsuite/gcc.dg/Warray-parameter.c +++
Re: [PATCH v4 2/2] x86: Add general_regs_only function attribute
On Thu, Apr 22, 2021 at 01:23:20PM +0200, Richard Biener via Gcc-patches wrote: > > The question is if the pragma GCC target right now behaves incrementally > > or not, whether > > #pragma GCC target("avx2") > > adds -mavx2 to options if it was missing before and nothing otherwise, or if > > it switches other options off. If it is incremental, we could e.g. try to > > use the second least significant bit of global_options_set.x_* to mean > > this option has been set explicitly by some surrounding #pragma GCC target. > > The normal tests - global_options_set.x_flag_whatever could still work > > fine because they wouldn't care if the option was explicit from anywhere > > (command line or GCC target or target attribute) and just & 2 would mean > > it was explicit from pragma GCC target; though there is the case of > > bitfields... And then the inlining decision could check the & 2 flags to > > see what is required and what is just from command line. > > Or we can have some other pragma GCC that would be like target but would > > have flags that are explicit (and could e.g. be more restricted, to ISA > > options only, and let those use in addition to #pragma GCC target. > > I'm still curious as to what you think will break if always-inline does what > it is documented to do. We will silently accept calling intrinsics that must be used only in certain ISA contexts, which will lead to people writing non-portable code. So -O2 -mno-avx #include void foo (__m256 *x) { x[0] = _mm256_sub_ps (x[1], x[2]); } etc. will now be accepted when it shouldn't be. clang rejects it like gcc with: 1.c:6:10: error: always_inline function '_mm256_sub_ps' requires target feature 'avx', but would be inlined into function 'foo' that is compiled without support for 'avx' x[0] = _mm256_sub_ps (x[1], x[2]); ^ Note, if I do: #include __attribute__((target ("no-sse3"))) void foo (__m256 *x) { x[0] = _mm256_sub_ps (x[1], x[2]); } and compile clang -S -O2 -mavx2 1.c 1.c:6:10: error: always_inline function '_mm256_sub_ps' requires target feature 'avx', but would be inlined into function 'foo' that is compiled without support for 'avx' x[0] = _mm256_sub_ps (x[1], x[2]); ^ then from the error message it seems that unlike GCC, clang remembers the exact target features that are needed for the intrinsics and checks just those. Though, looking at the preprocessed source, seems it uses static __inline __m256 __attribute__((__always_inline__, __nodebug__, __target__("avx"), __min_vector_width__(256))) _mm256_sub_ps(__m256 __a, __m256 __b) { return (__m256)((__v8sf)__a-(__v8sf)__b); } and not target pragmas. Anyway, if we tweak our intrinsic headers so that -#ifndef __AVX__ #pragma GCC push_options #pragma GCC target("avx") -#define __DISABLE_AVX__ -#endif /* __AVX__ */ ... -#ifdef __DISABLE_AVX__ -#undef __DISABLE_AVX__ #pragma GCC pop_options -#endif /* __DISABLE_AVX__ */ and do the opts_set->x_* & 2 stuff on explicit options coming out of target/optimize pragmas and attributes, perhaps we don't even need to introduce a new attribute and can handle everything magically: 1) if it is gnu_inline extern inline, allow indirect calls, otherwise disallow them for always_inline functions 2) for the isa flags and option mismatches, only disallow opts_set->x_* & 2 stuff This will keep both intrinsics and glibc fortify macros working fine in all the needed use cases. Jakub
Re: [PATCH] LTO: fallback to -flto=N if -flto=jobserver does not work.
On 4/22/21 1:19 PM, Richard Biener wrote: > On Thu, Apr 22, 2021 at 11:02 AM Martin Liška wrote: >> >> On 4/22/21 10:04 AM, Richard Biener wrote: >>> On Wed, Apr 21, 2021 at 3:08 PM Martin Liška wrote: When -flto=jobserver is used and we cannot detect job server, then we can still fallbackto -flto=N mode. Patch can bootstrap on x86_64-linux-gnu and survives regression tests. Ready to be installed? >>> >>> I think this behavior needs to be documented - it falls back to a less >>> conservative (possibly system overloading) mode - which IMHO is >>> non-obvious and IMHO we shouldn't do. >> >> Sure, I'm sending corresponding patch. Note that it's quite common mistake >> that '+' is missing in Makefile rule. That was motivation for my change. > > Sure, but that change won't get this fixed. It will as linker command line will contain (-flto=jobserver) and LTO will fallback to -flto=N. > IMHO we should eventually > emit diagnostic like > > warning: could not find jobserver, compiling N jobs serially > > once N > 1 (or 2?). We do that now (for all N): lto-wrapper: warning: jobserver is not available: ‘MAKEFLAGS’ environment variable is unset > Likewise if people just use -flto and auto-detection > finds nothing: -flto != -flto=auto Yes, -flto is a serial linking and we can emit a warning. > > warning: using serial compilation of N LTRANS jobs > note: refer to http:// for how to use parallel compile > > using the URL diagnostics to point to -flto=... documentation. What about making that a proper warning (-Wlto)? We have diagnostics infrastructure that prints URL links. > > That is, teach users rather than second-guessing and eventually > blowing things up. IMHO only the jobserver mode is safe to > automatically use. Well, -flto=auto is also fine and document. I think there is no possibility auto CPU deduction can fail. So -flto=jobserver (with missing make job server) and -flto (equal to -flto=1) worth emitting a warning. What do you think? Martin > > Richard. > >> Martin >> >>> >>> Richard. >>> Thanks, Martin gcc/ChangeLog: * lto-wrapper.c (run_gcc): When -flto=jobserver is used, but the makeserver cannot be detected, then use -flto=N fallback. --- gcc/lto-wrapper.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/gcc/lto-wrapper.c b/gcc/lto-wrapper.c index 03a5922f8ea..0b626d7c811 100644 --- a/gcc/lto-wrapper.c +++ b/gcc/lto-wrapper.c @@ -1585,8 +1585,9 @@ run_gcc (unsigned argc, char *argv[]) if (jobserver && jobserver_error != NULL) { warning (0, jobserver_error); - parallel = 0; + /* Fall back to auto parallelism. */ jobserver = 0; + auto_parallel = 1; } else if (!jobserver && jobserver_error == NULL) { -- 2.31.1 >>
Re: [PATCH v4 2/2] x86: Add general_regs_only function attribute
On Thu, Apr 22, 2021 at 1:58 PM H.J. Lu wrote: > > On Thu, Apr 22, 2021 at 4:23 AM Richard Biener > wrote: > > > > On Thu, Apr 22, 2021 at 12:30 PM Jakub Jelinek via Gcc-patches > > wrote: > > > > > > On Wed, Apr 21, 2021 at 06:01:07PM -0700, H.J. Lu via Gcc-patches wrote: > > > > How about this? > > > > > > > > @item general_regs_only > > > > @cindex @code{general_regs_only} function attribute, x86 > > > > The @code{general_regs_only} function attribute informs the compiler > > > > that the function uses only general purpose registers. When the > > > > compiler inlines a function with the @code{always_inline} attribute, > > > > target-specific compilation options may lead to inline failures. > > > > The @code{general_regs_only} attribute, if applicable, can be used > > > > together with the @code{always_inline} attribute to reduce inlining > > > > failure. > > > > > > I don't really like this attribute. > > > It is very specific to what you want to solve and doesn't address the > > > general problem, that always_inline means different things in different > > > code, and that is a problem for many targets, not just one. > > > > > > As has been written, in some cases it means inline always, error > > > whenever it is called indirectly which can't be optimized into a direct > > > call that can be inlined and error whenever the inlining fails for other > > > reasons. > > > > > > Another case, e.g. in the glibc fortify wrappers, is inline always > > > when the call is direct or an indirect call can be optimized into a direct > > > call and error when the inlining fails, but support indirect calls without > > > errors. > > > > > > Another case, which is most of the x86/aarch64/arm etc. intrinsics, is > > > inline always unless there is a target mismatch (roughly what is > > > actually implemented). > > > > > > Because from the always_inline attribute it is impossible to determine > > > which > > > one of those it is (for the indirect calls the rule could be > > > gnu_inline extern inline means indirect calls are ok, anything else means > > > indirect calls are bad), we need new syntax to distinguish those cases. > > > > > > general_regs_only attribute doesn't seem to be it, e.g. for the glibc > > > fortify wrappers cases I don't see why we should forbid using floating > > > point > > > in such inlines. > > > > > > So IMHO we need some new attribute for one of those, or optional parameter > > > to always_inline. > > > > > > For the intrinsic case, ideal would be if we could record which ISA flags > > > (or more generally which options) are required and which are not. Either > > > have some syntax where those would be explicitly specified in attribute > > > (but > > > frankly that would be a maintainance nightmare), or derive those from > > > surrounding pragmas. Right now we have those wrapped in > > > #ifndef __AVX2__ > > > #pragma GCC push_options > > > #pragma GCC target("avx2") > > > #define __DISABLE_AVX2__ > > > #endif /* __AVX2__ */ > > > > > > ... > > > > > > #ifdef __DISABLE_AVX2__ > > > #undef __DISABLE_AVX2__ > > > #pragma GCC pop_options > > > #endif /* __DISABLE_AVX2__ */ > > > > > > The question is if the pragma GCC target right now behaves incrementally > > > or not, whether > > > #pragma GCC target("avx2") > > > adds -mavx2 to options if it was missing before and nothing otherwise, or > > > if > > > it switches other options off. If it is incremental, we could e.g. try to > > > use the second least significant bit of global_options_set.x_* to mean > > > this option has been set explicitly by some surrounding #pragma GCC > > > target. > > > The normal tests - global_options_set.x_flag_whatever could still work > > > fine because they wouldn't care if the option was explicit from anywhere > > > (command line or GCC target or target attribute) and just & 2 would mean > > > it was explicit from pragma GCC target; though there is the case of > > > bitfields... And then the inlining decision could check the & 2 flags to > > > see what is required and what is just from command line. > > > Or we can have some other pragma GCC that would be like target but would > > > have flags that are explicit (and could e.g. be more restricted, to ISA > > > options only, and let those use in addition to #pragma GCC target. > > > > I'm still curious as to what you think will break if always-inline does what > > it is documented to do. > > No wrong code. But the compiler will generate a different error message > at the later stage if the ISA for the intrinsic isn't enabled. But all issues at hand we're trying to fix are rejections of _valid_ code. Improving diagnostics for invalid code should be secondary to properly accepting valid code. There's nothing perfect here - for diagnostics we might want to consider an __attribute__((requires_isa(a,b,c))) which says that calling a function with such attribute requires the ISAs in the list to be active. Target code can then iterate over calls _before_
Re: [PATCH v4 2/2] x86: Add general_regs_only function attribute
On Thu, Apr 22, 2021 at 4:23 AM Richard Biener wrote: > > On Thu, Apr 22, 2021 at 12:30 PM Jakub Jelinek via Gcc-patches > wrote: > > > > On Wed, Apr 21, 2021 at 06:01:07PM -0700, H.J. Lu via Gcc-patches wrote: > > > How about this? > > > > > > @item general_regs_only > > > @cindex @code{general_regs_only} function attribute, x86 > > > The @code{general_regs_only} function attribute informs the compiler > > > that the function uses only general purpose registers. When the > > > compiler inlines a function with the @code{always_inline} attribute, > > > target-specific compilation options may lead to inline failures. > > > The @code{general_regs_only} attribute, if applicable, can be used > > > together with the @code{always_inline} attribute to reduce inlining > > > failure. > > > > I don't really like this attribute. > > It is very specific to what you want to solve and doesn't address the > > general problem, that always_inline means different things in different > > code, and that is a problem for many targets, not just one. > > > > As has been written, in some cases it means inline always, error > > whenever it is called indirectly which can't be optimized into a direct > > call that can be inlined and error whenever the inlining fails for other > > reasons. > > > > Another case, e.g. in the glibc fortify wrappers, is inline always > > when the call is direct or an indirect call can be optimized into a direct > > call and error when the inlining fails, but support indirect calls without > > errors. > > > > Another case, which is most of the x86/aarch64/arm etc. intrinsics, is > > inline always unless there is a target mismatch (roughly what is > > actually implemented). > > > > Because from the always_inline attribute it is impossible to determine which > > one of those it is (for the indirect calls the rule could be > > gnu_inline extern inline means indirect calls are ok, anything else means > > indirect calls are bad), we need new syntax to distinguish those cases. > > > > general_regs_only attribute doesn't seem to be it, e.g. for the glibc > > fortify wrappers cases I don't see why we should forbid using floating point > > in such inlines. > > > > So IMHO we need some new attribute for one of those, or optional parameter > > to always_inline. > > > > For the intrinsic case, ideal would be if we could record which ISA flags > > (or more generally which options) are required and which are not. Either > > have some syntax where those would be explicitly specified in attribute (but > > frankly that would be a maintainance nightmare), or derive those from > > surrounding pragmas. Right now we have those wrapped in > > #ifndef __AVX2__ > > #pragma GCC push_options > > #pragma GCC target("avx2") > > #define __DISABLE_AVX2__ > > #endif /* __AVX2__ */ > > > > ... > > > > #ifdef __DISABLE_AVX2__ > > #undef __DISABLE_AVX2__ > > #pragma GCC pop_options > > #endif /* __DISABLE_AVX2__ */ > > > > The question is if the pragma GCC target right now behaves incrementally > > or not, whether > > #pragma GCC target("avx2") > > adds -mavx2 to options if it was missing before and nothing otherwise, or if > > it switches other options off. If it is incremental, we could e.g. try to > > use the second least significant bit of global_options_set.x_* to mean > > this option has been set explicitly by some surrounding #pragma GCC target. > > The normal tests - global_options_set.x_flag_whatever could still work > > fine because they wouldn't care if the option was explicit from anywhere > > (command line or GCC target or target attribute) and just & 2 would mean > > it was explicit from pragma GCC target; though there is the case of > > bitfields... And then the inlining decision could check the & 2 flags to > > see what is required and what is just from command line. > > Or we can have some other pragma GCC that would be like target but would > > have flags that are explicit (and could e.g. be more restricted, to ISA > > options only, and let those use in addition to #pragma GCC target. > > I'm still curious as to what you think will break if always-inline does what > it is documented to do. No wrong code. But the compiler will generate a different error message at the later stage if the ISA for the intrinsic isn't enabled. -- H.J.
[PING^3][PATCH][omp, simt] Handle alternative IV
On 12/17/20 5:46 PM, Tom de Vries wrote: > On 10/15/20 5:05 PM, Tom de Vries wrote: >> On 10/2/20 3:21 PM, Tom de Vries wrote: >>> Hi, >>> >>> Consider the test-case libgomp.c/pr81778.c added in this commit, with >>> this core loop (note: CANARY_SIZE set to 0 for simplicity): >>> ... >>> int s = 1; >>> #pragma omp target simd >>> for (int i = N - 1; i > -1; i -= s) >>> a[i] = 1; >>> ... >>> which, given that N is 32, sets a[0..31] to 1. >>> >>> After omp-expand, this looks like: >>> ... >>>: >>> simduid.7 = .GOMP_SIMT_ENTER (simduid.7); >>> .omp_simt.8 = .GOMP_SIMT_ENTER_ALLOC (simduid.7); >>> D.3193 = -s; >>> s.9 = s; >>> D.3204 = .GOMP_SIMT_LANE (); >>> D.3205 = -s.9; >>> D.3206 = (int) D.3204; >>> D.3207 = D.3205 * D.3206; >>> i = D.3207 + 31; >>> D.3209 = 0; >>> D.3210 = -s.9; >>> D.3211 = D.3210 - i; >>> D.3210 = -s.9; >>> D.3212 = D.3211 / D.3210; >>> D.3213 = (unsigned int) D.3212; >>> D.3213 = i >= 0 ? D.3213 : 0; >>> >>>: >>> if (D.3209 < D.3213) >>> goto ; [87.50%] >>> else >>> goto ; [12.50%] >>> >>>: >>> a[i] = 1; >>> D.3215 = -s.9; >>> D.3219 = .GOMP_SIMT_VF (); >>> D.3216 = (int) D.3219; >>> D.3220 = D.3215 * D.3216; >>> i = D.3220 + i; >>> D.3209 = D.3209 + 1; >>> goto ; [100.00%] >>> ... >>> >>> On nvptx, the first time bb6 is executed, i is in the 0..31 range (depending >>> on the lane that is executing) at bb entry. >>> >>> So we have the following sequence: >>> - a[0..31] is set to 1 >>> - i is updated to -32..-1 >>> - D.3209 is updated to 1 (being 0 initially) >>> - bb19 is executed, and if condition (D.3209 < D.3213) == (1 < 32) evaluates >>> to true >>> - bb6 is once more executed, which should not happen because all the >>> elements >>> that needed to be handled were already handled. >>> - consequently, elements that should not be written are written >>> - with CANARY_SIZE == 0, we may run into a libgomp error: >>> ... >>> libgomp: cuCtxSynchronize error: an illegal memory access was encountered >>> ... >>> and with CANARY_SIZE unmodified, we run into: >>> ... >>> Expected 0, got 1 at base[-961] >>> Aborted (core dumped) >>> ... >>> >>> The cause of this is as follows: >>> - because the step s is a variable rather than a constant, an alternative >>> IV (D.3209 in our example) is generated in expand_omp_simd, and the >>> loop condition is tested in terms of the alternative IV rather than >>> the original IV (i in our example). >>> - the SIMT code in expand_omp_simd works by modifying step and initial >>> value. >>> - The initial value fd->loop.n1 is loaded into a variable n1, which is >>> modified by the SIMT code and then used there-after. >>> - The step fd->loop.step is loaded into a variable step, which is is >>> modified >>> by the SIMT code, but afterwards there are uses of both step and >>> fd->loop.step. >>> - There are uses of fd->loop.step in the alternative IV handling code, >>> which should use step instead. >>> >>> Fix this by introducing an additional variable orig_step, which is not >>> modified by the SIMT code and replacing all remaining uses of fd->loop.step >>> by either step or orig_step. >>> >>> Build on x86_64-linux with nvptx accelerator, tested libgomp. >>> >>> This fixes for-5.c and for-6.c FAILs I'm currently seeing on a quadro m1200 >>> with driver 450.66. >>> >>> OK for trunk? >>> >> Ping^3. Thanks, - Tom >>> [omp, simt] Handle alternative IV >>> >>> gcc/ChangeLog: >>> >>> 2020-10-02 Tom de Vries >>> >>> * omp-expand.c (expand_omp_simd): Add step_orig, and replace uses of >>> fd->loop.step by either step or orig_step. >>> >>> libgomp/ChangeLog: >>> >>> 2020-10-02 Tom de Vries >>> >>> * testsuite/libgomp.c/pr81778.c: New test. >>> >>> --- >>> gcc/omp-expand.c | 11 >>> libgomp/testsuite/libgomp.c/pr81778.c | 48 >>> +++ >>> 2 files changed, 54 insertions(+), 5 deletions(-) >>> >>> diff --git a/gcc/omp-expand.c b/gcc/omp-expand.c >>> index 99cb4f9dda4..80e35ac0294 100644 >>> --- a/gcc/omp-expand.c >>> +++ b/gcc/omp-expand.c >>> @@ -6307,6 +6307,7 @@ expand_omp_simd (struct omp_region *region, struct >>> omp_for_data *fd) >>>n2 = OMP_CLAUSE_DECL (innerc); >>> } >>>tree step = fd->loop.step; >>> + tree orig_step = step; /* May be different from step if is_simt. */ >>> >>>bool is_simt = omp_find_clause (gimple_omp_for_clauses (fd->for_stmt), >>> OMP_CLAUSE__SIMT_); >>> @@ -6455,7 +6456,7 @@ expand_omp_simd (struct omp_region *region, struct >>> omp_for_data *fd) >>>tree altv = NULL_TREE, altn2 = NULL_TREE; >>>if (fd->collapse == 1 >>>&& !broken_loop >>> - && TREE_CODE (fd->loops[0].step) != INTEGER_CST) >>> + && TREE_CODE (orig_step) != INTEGER_CST) >>> { >>>/* The vectorizer currently punts on loops with non-constant steps >>> for the main IV (can't compute
Re: [PATCH 1/2] bpf: align function entry point to 64 bits
Hi YiFei. This is OK for both trunk and GCC 10. Thanks for the fix! [I see you don't have a copyright transfer contract in place. I believe this change, and also the patch in 2/2, are small/trivial enough to not require one, but if you plan to do more contributions in the future we will need you to do the paperwork. I just sent you more information about this privately.] > Libbpf does not treat paddings after functions well. If function > symbols does not cover a whole text section, it will emit error > similar to: > > libbpf: sec '.text': failed to find program symbol at offset 56 > > Each instruction in BPF is a multiple of 8 bytes, so align the > functions to 8 bytes, similar to how clang does it. > > 2021-04-22 YiFei Zhu > > gcc/ > > * config/bpf/bpf.h (FUNCTION_BOUNDARY): Set to 64. > --- > gcc/config/bpf/bpf.h | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/gcc/config/bpf/bpf.h b/gcc/config/bpf/bpf.h > index ef0123b56a63..7ff2f310d583 100644 > --- a/gcc/config/bpf/bpf.h > +++ b/gcc/config/bpf/bpf.h > @@ -57,8 +57,8 @@ > 64-bit at any time. */ > #define STACK_BOUNDARY 64 > > -/* Function entry points are aligned to 128 bits. */ > -#define FUNCTION_BOUNDARY 128 > +/* Function entry points are aligned to 64 bits. */ > +#define FUNCTION_BOUNDARY 64 > > /* Maximum alignment required by data of any type. */ > #define BIGGEST_ALIGNMENT 64
Re: [PATCH] aarch64: Avoid duplicating bti j insns for jump tables [PR99988]
Christophe Lyon via Gcc-patches writes: > On Wed, 21 Apr 2021 at 14:05, Richard Sandiford via Gcc-patches > wrote: >> >> Alex Coplan writes: >> > Hi Richard, >> > >> > On 15/04/2021 18:45, Richard Sandiford wrote: >> >> Looks good in general, but like you say, it's GCC 12 material. >> > >> > Thanks for the review. The attached patch addresses these comments and >> > bootstraps/regtests OK on aarch64-linux-gnu. OK for trunk? >> >> OK, thanks. >> > > The new test fails with -mabi=ilp32: > sorry, unimplemented: return address signing is only supported for > '-mabi=lp64' > > Is that OK: > diff --git a/gcc/testsuite/gcc.target/aarch64/pr99988.c > b/gcc/testsuite/gcc.target/aarch64/pr99988.c > index 2d87f41..7cca496 100644 > --- a/gcc/testsuite/gcc.target/aarch64/pr99988.c > +++ b/gcc/testsuite/gcc.target/aarch64/pr99988.c > @@ -1,4 +1,4 @@ > -/* { dg-do compile } */ > +/* { dg-do compile { target lp64 } } */ > /* { dg-options "-O2 -mbranch-protection=standard" } */ > /* { dg-final { scan-assembler-times {bti j} 13 } } */ > int a; OK, thanks. Richard
Re: [PATCH v4 2/2] x86: Add general_regs_only function attribute
On Thu, Apr 22, 2021 at 12:30 PM Jakub Jelinek via Gcc-patches wrote: > > On Wed, Apr 21, 2021 at 06:01:07PM -0700, H.J. Lu via Gcc-patches wrote: > > How about this? > > > > @item general_regs_only > > @cindex @code{general_regs_only} function attribute, x86 > > The @code{general_regs_only} function attribute informs the compiler > > that the function uses only general purpose registers. When the > > compiler inlines a function with the @code{always_inline} attribute, > > target-specific compilation options may lead to inline failures. > > The @code{general_regs_only} attribute, if applicable, can be used > > together with the @code{always_inline} attribute to reduce inlining > > failure. > > I don't really like this attribute. > It is very specific to what you want to solve and doesn't address the > general problem, that always_inline means different things in different > code, and that is a problem for many targets, not just one. > > As has been written, in some cases it means inline always, error > whenever it is called indirectly which can't be optimized into a direct > call that can be inlined and error whenever the inlining fails for other > reasons. > > Another case, e.g. in the glibc fortify wrappers, is inline always > when the call is direct or an indirect call can be optimized into a direct > call and error when the inlining fails, but support indirect calls without > errors. > > Another case, which is most of the x86/aarch64/arm etc. intrinsics, is > inline always unless there is a target mismatch (roughly what is > actually implemented). > > Because from the always_inline attribute it is impossible to determine which > one of those it is (for the indirect calls the rule could be > gnu_inline extern inline means indirect calls are ok, anything else means > indirect calls are bad), we need new syntax to distinguish those cases. > > general_regs_only attribute doesn't seem to be it, e.g. for the glibc > fortify wrappers cases I don't see why we should forbid using floating point > in such inlines. > > So IMHO we need some new attribute for one of those, or optional parameter > to always_inline. > > For the intrinsic case, ideal would be if we could record which ISA flags > (or more generally which options) are required and which are not. Either > have some syntax where those would be explicitly specified in attribute (but > frankly that would be a maintainance nightmare), or derive those from > surrounding pragmas. Right now we have those wrapped in > #ifndef __AVX2__ > #pragma GCC push_options > #pragma GCC target("avx2") > #define __DISABLE_AVX2__ > #endif /* __AVX2__ */ > > ... > > #ifdef __DISABLE_AVX2__ > #undef __DISABLE_AVX2__ > #pragma GCC pop_options > #endif /* __DISABLE_AVX2__ */ > > The question is if the pragma GCC target right now behaves incrementally > or not, whether > #pragma GCC target("avx2") > adds -mavx2 to options if it was missing before and nothing otherwise, or if > it switches other options off. If it is incremental, we could e.g. try to > use the second least significant bit of global_options_set.x_* to mean > this option has been set explicitly by some surrounding #pragma GCC target. > The normal tests - global_options_set.x_flag_whatever could still work > fine because they wouldn't care if the option was explicit from anywhere > (command line or GCC target or target attribute) and just & 2 would mean > it was explicit from pragma GCC target; though there is the case of > bitfields... And then the inlining decision could check the & 2 flags to > see what is required and what is just from command line. > Or we can have some other pragma GCC that would be like target but would > have flags that are explicit (and could e.g. be more restricted, to ISA > options only, and let those use in addition to #pragma GCC target. I'm still curious as to what you think will break if always-inline does what it is documented to do. Richard. > Jakub >
Re: [PATCH] LTO: fallback to -flto=N if -flto=jobserver does not work.
On Thu, Apr 22, 2021 at 11:02 AM Martin Liška wrote: > > On 4/22/21 10:04 AM, Richard Biener wrote: > > On Wed, Apr 21, 2021 at 3:08 PM Martin Liška wrote: > >> > >> When -flto=jobserver is used and we cannot detect job server, then we can > >> still fallbackto -flto=N mode. > >> > >> Patch can bootstrap on x86_64-linux-gnu and survives regression tests. > >> > >> Ready to be installed? > > > > I think this behavior needs to be documented - it falls back to a less > > conservative (possibly system overloading) mode - which IMHO is > > non-obvious and IMHO we shouldn't do. > > Sure, I'm sending corresponding patch. Note that it's quite common mistake > that '+' is missing in Makefile rule. That was motivation for my change. Sure, but that change won't get this fixed. IMHO we should eventually emit diagnostic like warning: could not find jobserver, compiling N jobs serially once N > 1 (or 2?). Likewise if people just use -flto and auto-detection finds nothing: warning: using serial compilation of N LTRANS jobs note: refer to http:// for how to use parallel compile using the URL diagnostics to point to -flto=... documentation. That is, teach users rather than second-guessing and eventually blowing things up. IMHO only the jobserver mode is safe to automatically use. Richard. > Martin > > > > > Richard. > > > >> Thanks, > >> Martin > >> > >> gcc/ChangeLog: > >> > >> * lto-wrapper.c (run_gcc): When -flto=jobserver is used, but the > >> makeserver cannot be detected, then use -flto=N fallback. > >> --- > >> gcc/lto-wrapper.c | 3 ++- > >> 1 file changed, 2 insertions(+), 1 deletion(-) > >> > >> diff --git a/gcc/lto-wrapper.c b/gcc/lto-wrapper.c > >> index 03a5922f8ea..0b626d7c811 100644 > >> --- a/gcc/lto-wrapper.c > >> +++ b/gcc/lto-wrapper.c > >> @@ -1585,8 +1585,9 @@ run_gcc (unsigned argc, char *argv[]) > >>if (jobserver && jobserver_error != NULL) > >> { > >> warning (0, jobserver_error); > >> - parallel = 0; > >> + /* Fall back to auto parallelism. */ > >> jobserver = 0; > >> + auto_parallel = 1; > >> } > >>else if (!jobserver && jobserver_error == NULL) > >> { > >> -- > >> 2.31.1 > >> >
Re: [PATCH][GCC 10] aarch64: Fix SVE ACLE builtins with LTO [PR99216]
Alex Coplan writes: > Hi, > > Here is a backport of my fix for PR99216. The only change w.r.t the > original patch is a bump of lto-streamer.h:LTO_minor_version. > > Bootstrapped and regtested on aarch64-linux-gnu, no issues. > > OK for GCC 10 branch? OK, thanks. Richard > As discussed in the PR, we currently have two different numbering > schemes for SVE builtins: one for C, and one for C++. This is > problematic for LTO, where we end up getting confused about which > intrinsic we're talking about. This patch inserts placeholders into the > registered_functions vector to ensure that there is a consistent > numbering scheme for both C and C++. > > This version uses integer_zero_node as a placeholder node instead of > building a function decl. This is safe because the node is only returned > by the TARGET_BUILTIN_DECL hook, which (on AArch64) is only used for > validation when builtin decls are streamed into lto1. > > gcc/ChangeLog: > > PR target/99216 > * config/aarch64/aarch64-sve-builtins.cc > (function_builder::add_function): Add placeholder_p argument, use > placeholder decls if this is set. > (function_builder::add_unique_function): Instead of conditionally adding > direct overloads, unconditionally add either a direct overload or a > placeholder. > (function_builder::add_overloaded_function): Set placeholder_p if we're > using C++ overloads. Use the obstack for string storage instead > of relying on the tree nodes. > (function_builder::add_overloaded_functions): Don't return early for > m_direct_overloads: we need to add placeholders. > * config/aarch64/aarch64-sve-builtins.h > (function_builder::add_function): Add placeholder_p argument. > * lto-streamer.h (LTO_minor_version): Bump. > > gcc/testsuite/ChangeLog: > > PR target/99216 > * g++.target/aarch64/sve/pr99216.C: New test. > > diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc > b/gcc/config/aarch64/aarch64-sve-builtins.cc > index d534ca923d9..336a1db662b 100644 > --- a/gcc/config/aarch64/aarch64-sve-builtins.cc > +++ b/gcc/config/aarch64/aarch64-sve-builtins.cc > @@ -995,12 +995,29 @@ registered_function & > function_builder::add_function (const function_instance , > const char *name, tree fntype, tree attrs, > uint64_t required_extensions, > - bool overloaded_p) > + bool overloaded_p, > + bool placeholder_p) > { >unsigned int code = vec_safe_length (registered_functions); >code = (code << AARCH64_BUILTIN_SHIFT) | AARCH64_BUILTIN_SVE; > - tree decl = simulate_builtin_function_decl (input_location, name, fntype, > - code, NULL, attrs); > + > + /* We need to be able to generate placeholders to enusre that we have a > + consistent numbering scheme for function codes between the C and C++ > + frontends, so that everything ties up in LTO. > + > + Currently, tree-streamer-in.c:unpack_ts_function_decl_value_fields > + validates that tree nodes returned by TARGET_BUILTIN_DECL are non-NULL > and > + some node other than error_mark_node. This is a holdover from when > builtin > + decls were streamed by code rather than by value. > + > + Ultimately, we should be able to remove this validation of BUILT_IN_MD > + nodes and remove the target hook. For now, however, we need to appease > the > + validation and return a non-NULL, non-error_mark_node node, so we > + arbitrarily choose integer_zero_node. */ > + tree decl = placeholder_p > +? integer_zero_node > +: simulate_builtin_function_decl (input_location, name, fntype, > + code, NULL, attrs); > >registered_function = *ggc_alloc (); >rfn.instance = instance; > @@ -1032,7 +1049,7 @@ function_builder::add_unique_function (const > function_instance , > argument_types.address ()); >tree attrs = get_attributes (instance); >registered_function = add_function (instance, name, fntype, attrs, > -required_extensions, false); > +required_extensions, false, false); > >/* Enter the function into the hash table. */ >hashval_t hash = instance.hash (); > @@ -1043,16 +1060,14 @@ function_builder::add_unique_function (const > function_instance , > >/* Also add the function under its overloaded alias, if we want > a separate decl for each instance of an overloaded function. */ > - if (m_direct_overloads || force_direct_overloads) > + char *overload_name = get_name (instance, true); > + if (strcmp (name, overload_name) != 0) > { > - char *overload_name = get_name (instance, true); > - if (strcmp (name, overload_name) != 0) > - { > - /*
Re: [PATCH][libgomp, nvptx] Fix hang in gomp_team_barrier_wait_end
On 4/21/21 7:02 PM, Alexander Monakov wrote: > On Wed, 21 Apr 2021, Tom de Vries wrote: > >>> I don't think implementing futex_wait is possible on nvptx. >>> >> >> Well, I gave it a try, attached below. Can you explain why you think >> it's not possible, or pinpoint a problem in the implementation? > > Responding only to this for now. When I said futex_wait I really meant > Linux futex wait, where the API is tied to a 32-bit futex control word > and nothing else. Your implementation works with a gomp_barrier_t that > includes more than one field. It would be confusing to call it a > "futex wait", it is not a 1:1 replacement. > > (i.e. unlike a proper futex, it can work only for gomp_barrier_t objects) Ah, I see, agreed, that makes sense. I was afraid there was some fundamental problem that I overlooked. Here's an updated version. I've tried to make it clear that the futex_wait/wake are locally used versions, not generic functionality. The main change in structure is that I'm now using the generation_to_barrier trick from the rtems port, allowing linux/bar.c to be included rather than copied (because the barrier argument is now implicit). Furthermore, I've reviewed the MEMMODELs used for the atomic accesses, and updated a few. Also now the cpu_relax from doacross.h is used. Thanks, - Tom [libgomp, nvptx] Fix hang in gomp_team_barrier_wait_end Consider the following omp fragment. ... #pragma omp target #pragma omp parallel num_threads (2) #pragma omp task ; ... This hangs at -O0 for nvptx. Investigating the behaviour gives us the following trace of events: - both threads execute GOMP_task, where they: - deposit a task, and - execute gomp_team_barrier_wake - thread 1 executes gomp_team_barrier_wait_end and, not being the last thread, proceeds to wait at the team barrier - thread 0 executes gomp_team_barrier_wait_end and, being the last thread, it calls gomp_barrier_handle_tasks, where it: - executes both tasks and marks the team barrier done - executes a gomp_team_barrier_wake which wakes up thread 1 - thread 1 exits the team barrier - thread 0 returns from gomp_barrier_handle_tasks and goes to wait at the team barrier. - thread 0 hangs. To understand why there is a hang here, it's good to understand how things are setup for nvptx. The libgomp/config/nvptx/bar.c implementation is a copy of the libgomp/config/linux/bar.c implementation, with uses of both futex_wake and do_wait replaced with uses of ptx insn bar.sync: ... if (bar->total > 1) asm ("bar.sync 1, %0;" : : "r" (32 * bar->total)); ... The point where thread 0 goes to wait at the team barrier, corresponds in the linux implementation with a do_wait. In the linux case, the call to do_wait doesn't hang, because it's waiting for bar->generation to become a certain value, and if bar->generation already has that value, it just proceeds, without any need for coordination with other threads. In the nvtpx case, the bar.sync waits until thread 1 joins it in the same logical barrier, which never happens: thread 1 is lingering in the thread pool at the thread pool barrier (using a different logical barrier), waiting to join a new team. The easiest way to fix this is to revert to the posix implementation for bar.{c,h}. That however falls back on a busy-waiting approach, and does not take advantage of the ptx bar.sync insn. Instead, we revert to the linux implementation for bar.c, and implement bar.c local functions futex_wait and futex_wake using the bar.sync insn. This is a WIP version that does not yet take performance into consideration, but instead focuses on copying a working version as completely as possible, and isolating the machine-specific changes to as few functions as possible. The bar.sync insn takes an argument specifying how many threads are participating, and that doesn't play well with the futex syntax where it's not clear in advance how many threads will be woken up. This is solved by waking up all waiting threads each time a futex_wait or futex_wake happens, and possibly going back to sleep with an updated thread count. Tested libgomp on x86_64 with nvptx accelerator, both as-is and with do_spin hardcoded to 1. libgomp/ChangeLog: 2021-04-20 Tom de Vries PR target/99555 * config/nvptx/bar.c (generation_to_barrier): New function, copied from config/rtems/bar.c. (futex_wait, futex_wake): New function. (do_spin, do_wait): New function, copied from config/linux/wait.h. (gomp_barrier_wait_end, gomp_barrier_wait_last) (gomp_team_barrier_wake, gomp_team_barrier_wait_end): (gomp_team_barrier_wait_cancel_end, gomp_team_barrier_cancel): Remove and replace with include of config/linux/bar.c. * config/nvptx/bar.h (gomp_barrier_t): Add fields waiters and lock. (gomp_barrier_init): Init new fields. * testsuite/libgomp.c-c++-common/task-detach-6.c: Remove nvptx-specific workarounds. * testsuite/libgomp.c/pr99555-1.c: Same. * testsuite/libgomp.fortran/task-detach-6.f90: Same. ---
Re: [PATCH] [i386] MASK_AVX256_SPLIT_UNALIGNED_STORE/LOAD should be cleared in opts->x_target_flags when X86_TUNE_AVX256_UNALIGNED_LOAD/STORE_OPTIMAL is enabled by target attribute.
On Thu, Apr 22, 2021 at 12:05 PM Hongtao Liu wrote: > > Hi: > Bootstrapped and regtested on x86-64_iinux-gnu{-m32,}. > Ok for trunk? > > gcc/ChangeLog: > > PR target/100093 > * config/i386/i386-options.c (ix86_option_override_internal): > Clear MASK_AVX256_SPLIT_UNALIGNED_LOAD/STORE in x_target_flags > when X86_TUNE_AVX256_UNALIGNED_LOAD/STORE_OPTIMAL is enabled > by target attribute. > > gcc/testsuite/ChangeLog: > > PR target/100093 > * gcc.target/i386/pr100093.c: New test. OK. Thanks, Uros. > > > -- > BR, > Hongtao
[PATCH 2/2] bpf: allow BSS symbols to be global symbols
Prior to this, a BSS declaration such as: int foo; static int bar; Generates: .global foo .local foo .comm foo,4,4 .local bar .comm bar,4,4 Creating symbols: b foo 0004 b bar Both symbols are local. However, libbpf bpf_object__variable_offset rquires symbols to be STB_GLOBAL & STT_OBJECT for data section lookup. This patch makes the same declaration generate: .global foo .type foo, @object .lcomm foo,4,4 .local bar .comm bar,4,4 Creating symbols: B foo 0004 b bar And libbpf will be okay with looking up the global symbol "foo". 2021-04-22 YiFei Zhu gcc/ * config/bpf/bpf.h (ASM_OUTPUT_ALIGNED_BSS): Use .type and .lcomm. --- gcc/config/bpf/bpf.h | 12 +--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/gcc/config/bpf/bpf.h b/gcc/config/bpf/bpf.h index 7ff2f310d583..55beecbcb364 100644 --- a/gcc/config/bpf/bpf.h +++ b/gcc/config/bpf/bpf.h @@ -414,9 +414,15 @@ enum reg_class Try to use asm_output_aligned_bss to implement this macro. */ #define ASM_OUTPUT_ALIGNED_BSS(FILE, DECL, NAME, SIZE, ALIGN) \ - do { \ -ASM_OUTPUT_ALIGNED_LOCAL (FILE, NAME, SIZE, ALIGN);\ - } while (0) + do \ +{ \ + ASM_OUTPUT_TYPE_DIRECTIVE (FILE, NAME, "object"); \ + fprintf ((FILE), "%s", "\t.lcomm\t");\ + assemble_name ((FILE), (NAME)); \ + fprintf ((FILE), "," HOST_WIDE_INT_PRINT_UNSIGNED ",%u\n", \ + (SIZE), (ALIGN) / BITS_PER_UNIT);\ +} \ + while (0) /*** Output and Generation of Labels. */ -- 2.31.1
[PATCH 1/2] bpf: align function entry point to 64 bits
Libbpf does not treat paddings after functions well. If function symbols does not cover a whole text section, it will emit error similar to: libbpf: sec '.text': failed to find program symbol at offset 56 Each instruction in BPF is a multiple of 8 bytes, so align the functions to 8 bytes, similar to how clang does it. 2021-04-22 YiFei Zhu gcc/ * config/bpf/bpf.h (FUNCTION_BOUNDARY): Set to 64. --- gcc/config/bpf/bpf.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/gcc/config/bpf/bpf.h b/gcc/config/bpf/bpf.h index ef0123b56a63..7ff2f310d583 100644 --- a/gcc/config/bpf/bpf.h +++ b/gcc/config/bpf/bpf.h @@ -57,8 +57,8 @@ 64-bit at any time. */ #define STACK_BOUNDARY 64 -/* Function entry points are aligned to 128 bits. */ -#define FUNCTION_BOUNDARY 128 +/* Function entry points are aligned to 64 bits. */ +#define FUNCTION_BOUNDARY 64 /* Maximum alignment required by data of any type. */ #define BIGGEST_ALIGNMENT 64 -- 2.31.1
[PATCH] [i386] MASK_AVX256_SPLIT_UNALIGNED_STORE/LOAD should be cleared in opts->x_target_flags when X86_TUNE_AVX256_UNALIGNED_LOAD/STORE_OPTIMAL is enabled by target attribute.
Hi: Bootstrapped and regtested on x86-64_iinux-gnu{-m32,}. Ok for trunk? gcc/ChangeLog: PR target/100093 * config/i386/i386-options.c (ix86_option_override_internal): Clear MASK_AVX256_SPLIT_UNALIGNED_LOAD/STORE in x_target_flags when X86_TUNE_AVX256_UNALIGNED_LOAD/STORE_OPTIMAL is enabled by target attribute. gcc/testsuite/ChangeLog: PR target/100093 * gcc.target/i386/pr100093.c: New test. -- BR, Hongtao From 6944526e088df6c4ef73ab797ee90f99c0ad566a Mon Sep 17 00:00:00 2001 From: liuhongt Date: Fri, 16 Apr 2021 11:29:10 +0800 Subject: [PATCH] [i386] MASK_AVX256_SPLIT_UNALIGNED_STORE/LOAD should be cleared in opts->x_target_flags when X86_TUNE_AVX256_UNALIGNED_LOAD/STORE_OPTIMAL is enabled by target attribute. gcc/ChangeLog: PR target/100093 * config/i386/i386-options.c (ix86_option_override_internal): Clear MASK_AVX256_SPLIT_UNALIGNED_LOAD/STORE in x_target_flags when X86_TUNE_AVX256_UNALIGNED_LOAD/STORE_OPTIMAL is enabled by target attribute. gcc/testsuite/ChangeLog: PR target/100093 * gcc.target/i386/pr100093.c: New test. --- gcc/config/i386/i386-options.c | 7 +++ gcc/testsuite/gcc.target/i386/pr100093.c | 12 2 files changed, 19 insertions(+) create mode 100644 gcc/testsuite/gcc.target/i386/pr100093.c diff --git a/gcc/config/i386/i386-options.c b/gcc/config/i386/i386-options.c index 91da2849c49..ba5f275e1e3 100644 --- a/gcc/config/i386/i386-options.c +++ b/gcc/config/i386/i386-options.c @@ -2853,9 +2853,16 @@ ix86_option_override_internal (bool main_args_p, if (!ix86_tune_features[X86_TUNE_AVX256_UNALIGNED_LOAD_OPTIMAL] && !(opts_set->x_target_flags & MASK_AVX256_SPLIT_UNALIGNED_LOAD)) opts->x_target_flags |= MASK_AVX256_SPLIT_UNALIGNED_LOAD; + else if (!main_args_p + && ix86_tune_features[X86_TUNE_AVX256_UNALIGNED_LOAD_OPTIMAL]) +opts->x_target_flags &= ~MASK_AVX256_SPLIT_UNALIGNED_LOAD; + if (!ix86_tune_features[X86_TUNE_AVX256_UNALIGNED_STORE_OPTIMAL] && !(opts_set->x_target_flags & MASK_AVX256_SPLIT_UNALIGNED_STORE)) opts->x_target_flags |= MASK_AVX256_SPLIT_UNALIGNED_STORE; + else if (!main_args_p + && ix86_tune_features[X86_TUNE_AVX256_UNALIGNED_STORE_OPTIMAL]) +opts->x_target_flags &= ~MASK_AVX256_SPLIT_UNALIGNED_STORE; /* Enable 128-bit AVX instruction generation for the auto-vectorizer. */ diff --git a/gcc/testsuite/gcc.target/i386/pr100093.c b/gcc/testsuite/gcc.target/i386/pr100093.c new file mode 100644 index 000..f32a4bc8f2e --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/pr100093.c @@ -0,0 +1,12 @@ +/* PR target/100093 */ +/* { dg-do compile } */ +/* { dg-options "-O3 -march=znver1" } */ +/* { dg-final { scan-assembler-not "vextractf128" } } */ + +__attribute__((target("tune=skylake-avx512"))) +void fill_avx2(double *__restrict__ data, int n, double value) +{ +for (int i = 0; i < n * 16; i++) { +data[i] = value; +} +} -- 2.18.1
Re: [PATCH] [libstdc++] Fix "bare" notifications dropped by waiters check
On 21/04/21 18:29 -0700, Thomas Rodgers wrote: From: Thomas Rodgers NOTE - This patch also needs to be backported to gcc-11 in order for semaphore release() to work correctly on non-futex platforms. Tested sparc-sun-solaris2.11 For types that track whether or not there extant waiters (e.g. semaphore) internally, the __atomic_notify_address_bare() call was introduced to avoid the overhead of loading the atomic count of waiters. For platforms that don't have Futex, however, there was still a check for waiters, and seeing that there are none (because in the bare case, the count is not incremented), the notification is dropped. This commit addresses that case. libstdc++-v3/ChangeLog: * include/bits/atomic_wait.h: Always notify waiters in the in the case of 'bare' address notification. Repeated text: "in the in the" --- libstdc++-v3/include/bits/atomic_wait.h | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/libstdc++-v3/include/bits/atomic_wait.h b/libstdc++-v3/include/bits/atomic_wait.h index 0ac5575190c..984ed70f16c 100644 --- a/libstdc++-v3/include/bits/atomic_wait.h +++ b/libstdc++-v3/include/bits/atomic_wait.h @@ -226,9 +226,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION } void - _M_notify(const __platform_wait_t* __addr, bool __all) noexcept + _M_notify(const __platform_wait_t* __addr, bool __all, bool __bare) noexcept { - if (!_M_waiting()) + if (!(__bare || _M_waiting())) Maybe it's just me, but I would find (!__base && !__waiting) to be a clearer expression of the logic here. i.e. "return if the wait is not bare and there are no waiters" rather than "return if the wait is not either bare or has waiters". The latter makes me take a second to grok. The patch is OK either way though, with the ChangeLog typo fix. return; #ifdef _GLIBCXX_HAVE_PLATFORM_WAIT @@ -304,11 +304,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION } void - _M_notify(bool __all) + _M_notify(bool __all, bool __bare = false) { if (_M_addr == &_M_w._M_ver) __atomic_fetch_add(_M_addr, 1, __ATOMIC_ACQ_REL); - _M_w._M_notify(_M_addr, __all); + _M_w._M_notify(_M_addr, __all, __bare); } template
[Patch, committed] gfortran.dg/pr68078.f90: Avoid increasing RLIMIT_AS
The test did fail when the virtual memory already had a hard limit != unlimited. Committed as r12-57-gfaf7d413a3f3337be1a3ac5cdf33e0e3b87b426e Tobias - Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank Thürauf commit faf7d413a3f3337be1a3ac5cdf33e0e3b87b426e Author: Tobias Burnus Date: Thu Apr 22 11:05:17 2021 +0200 gfortran.dg/pr68078.f90: Avoid increasing RLIMIT_AS pr68078.f90 tests out-of-memory handling and calls set_vm_limit to set the soft limit. However, setrlimit was then called with hard limit RLIM_INFINITY, which failed when the current hard limit was lower. gcc/testsuite/ * gfortran.dg/set_vm_limit.c (set_vm_limit): Call getrlimit, use obtained hard limit, and only call setrlimit if new softlimit is lower. diff --git a/gcc/testsuite/gfortran.dg/set_vm_limit.c b/gcc/testsuite/gfortran.dg/set_vm_limit.c index 30c4b43e0ed..8344f6fd079 100644 --- a/gcc/testsuite/gfortran.dg/set_vm_limit.c +++ b/gcc/testsuite/gfortran.dg/set_vm_limit.c @@ -8,9 +8,20 @@ void set_vm_limit (int vm_limit) { - struct rlimit rl = { vm_limit, RLIM_INFINITY }; + struct rlimit rl; int r; + r = getrlimit (RLIMIT_AS, ); + if (r) +{ + perror ("get_vm_limit"); + exit (1); +} + + if (vm_limit >= rl.rlim_cur) +return; + + rl.rlim_cur = vm_limit; r = setrlimit (RLIMIT_AS, ); if (r) {
Re: [PATCH] LTO: fallback to -flto=N if -flto=jobserver does not work.
On 4/22/21 10:04 AM, Richard Biener wrote: > On Wed, Apr 21, 2021 at 3:08 PM Martin Liška wrote: >> >> When -flto=jobserver is used and we cannot detect job server, then we can >> still fallbackto -flto=N mode. >> >> Patch can bootstrap on x86_64-linux-gnu and survives regression tests. >> >> Ready to be installed? > > I think this behavior needs to be documented - it falls back to a less > conservative (possibly system overloading) mode - which IMHO is > non-obvious and IMHO we shouldn't do. Sure, I'm sending corresponding patch. Note that it's quite common mistake that '+' is missing in Makefile rule. That was motivation for my change. Martin > > Richard. > >> Thanks, >> Martin >> >> gcc/ChangeLog: >> >> * lto-wrapper.c (run_gcc): When -flto=jobserver is used, but the >> makeserver cannot be detected, then use -flto=N fallback. >> --- >> gcc/lto-wrapper.c | 3 ++- >> 1 file changed, 2 insertions(+), 1 deletion(-) >> >> diff --git a/gcc/lto-wrapper.c b/gcc/lto-wrapper.c >> index 03a5922f8ea..0b626d7c811 100644 >> --- a/gcc/lto-wrapper.c >> +++ b/gcc/lto-wrapper.c >> @@ -1585,8 +1585,9 @@ run_gcc (unsigned argc, char *argv[]) >>if (jobserver && jobserver_error != NULL) >> { >> warning (0, jobserver_error); >> - parallel = 0; >> + /* Fall back to auto parallelism. */ >> jobserver = 0; >> + auto_parallel = 1; >> } >>else if (!jobserver && jobserver_error == NULL) >> { >> -- >> 2.31.1 >> >From b81f5288fc81256e63d44e00793e2732eb2a2d88 Mon Sep 17 00:00:00 2001 From: Martin Liska Date: Thu, 22 Apr 2021 10:59:51 +0200 Subject: [PATCH] Document -flto=jobserver fallback. gcc/ChangeLog: * doc/invoke.texi: Document what happens when GCC can't detect make's job server with -flto=jobserver. --- gcc/doc/invoke.texi | 3 +++ 1 file changed, 3 insertions(+) diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index e98b0962b9f..1cfff4c14a9 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -12340,6 +12340,9 @@ Use @option{-flto=auto} to use GNU make's job server, if available, or otherwise fall back to autodetection of the number of CPU threads present in your system. +If you specify @option{-flto=jobserver} and make's job server can't be detected, +then well fall back to the number of CPU threads present in your system. + @item -flto-partition=@var{alg} @opindex flto-partition Specify the partitioning algorithm used by the link-time optimizer. -- 2.31.1
Re: [PATCH v4 2/2] x86: Add general_regs_only function attribute
On Wed, Apr 21, 2021 at 06:01:07PM -0700, H.J. Lu via Gcc-patches wrote: > How about this? > > @item general_regs_only > @cindex @code{general_regs_only} function attribute, x86 > The @code{general_regs_only} function attribute informs the compiler > that the function uses only general purpose registers. When the > compiler inlines a function with the @code{always_inline} attribute, > target-specific compilation options may lead to inline failures. > The @code{general_regs_only} attribute, if applicable, can be used > together with the @code{always_inline} attribute to reduce inlining > failure. I don't really like this attribute. It is very specific to what you want to solve and doesn't address the general problem, that always_inline means different things in different code, and that is a problem for many targets, not just one. As has been written, in some cases it means inline always, error whenever it is called indirectly which can't be optimized into a direct call that can be inlined and error whenever the inlining fails for other reasons. Another case, e.g. in the glibc fortify wrappers, is inline always when the call is direct or an indirect call can be optimized into a direct call and error when the inlining fails, but support indirect calls without errors. Another case, which is most of the x86/aarch64/arm etc. intrinsics, is inline always unless there is a target mismatch (roughly what is actually implemented). Because from the always_inline attribute it is impossible to determine which one of those it is (for the indirect calls the rule could be gnu_inline extern inline means indirect calls are ok, anything else means indirect calls are bad), we need new syntax to distinguish those cases. general_regs_only attribute doesn't seem to be it, e.g. for the glibc fortify wrappers cases I don't see why we should forbid using floating point in such inlines. So IMHO we need some new attribute for one of those, or optional parameter to always_inline. For the intrinsic case, ideal would be if we could record which ISA flags (or more generally which options) are required and which are not. Either have some syntax where those would be explicitly specified in attribute (but frankly that would be a maintainance nightmare), or derive those from surrounding pragmas. Right now we have those wrapped in #ifndef __AVX2__ #pragma GCC push_options #pragma GCC target("avx2") #define __DISABLE_AVX2__ #endif /* __AVX2__ */ ... #ifdef __DISABLE_AVX2__ #undef __DISABLE_AVX2__ #pragma GCC pop_options #endif /* __DISABLE_AVX2__ */ The question is if the pragma GCC target right now behaves incrementally or not, whether #pragma GCC target("avx2") adds -mavx2 to options if it was missing before and nothing otherwise, or if it switches other options off. If it is incremental, we could e.g. try to use the second least significant bit of global_options_set.x_* to mean this option has been set explicitly by some surrounding #pragma GCC target. The normal tests - global_options_set.x_flag_whatever could still work fine because they wouldn't care if the option was explicit from anywhere (command line or GCC target or target attribute) and just & 2 would mean it was explicit from pragma GCC target; though there is the case of bitfields... And then the inlining decision could check the & 2 flags to see what is required and what is just from command line. Or we can have some other pragma GCC that would be like target but would have flags that are explicit (and could e.g. be more restricted, to ISA options only, and let those use in addition to #pragma GCC target. Jakub
Re: [PATCH] Use STATIC_ASSERT for OVL_OP_MAX.
There's an updated version of the patch, Jonathan noticed correctly the comment related to assert was not correct. Martin >From e035fd0549ea17ab4f8d71488f577fd1e4077fd9 Mon Sep 17 00:00:00 2001 From: Martin Liska Date: Fri, 12 Mar 2021 14:32:07 +0100 Subject: [PATCH] Use STATIC_ASSERT for OVL_OP_MAX. gcc/cp/ChangeLog: * cp-tree.h (STATIC_ASSERT): Prefer static assert. * lex.c (init_operators): Remove run-time check. --- gcc/cp/cp-tree.h | 3 +++ gcc/cp/lex.c | 2 -- 2 files changed, 3 insertions(+), 2 deletions(-) diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h index 23a77a2b2e0..cb254e0adea 100644 --- a/gcc/cp/cp-tree.h +++ b/gcc/cp/cp-tree.h @@ -5922,6 +5922,9 @@ enum ovl_op_code { OVL_OP_MAX }; +/* Make sure it fits in lang_decl_fn::ovl_op_code. */ +STATIC_ASSERT (OVL_OP_MAX < (1 << 6)); + struct GTY(()) ovl_op_info_t { /* The IDENTIFIER_NODE for the operator. */ tree identifier; diff --git a/gcc/cp/lex.c b/gcc/cp/lex.c index 73e14b8394c..43abd019e6e 100644 --- a/gcc/cp/lex.c +++ b/gcc/cp/lex.c @@ -166,8 +166,6 @@ init_operators (void) if (op_ptr->name) { - /* Make sure it fits in lang_decl_fn::operator_code. */ - gcc_checking_assert (op_ptr->ovl_op_code < (1 << 6)); tree ident = set_operator_ident (op_ptr); if (unsigned index = IDENTIFIER_CP_INDEX (ident)) { -- 2.31.1
Re: [PATCH] Use STATIC_ASSERT for OVL_OP_MAX.
On Thu, 22 Apr 2021, 08:47 Martin Liška, wrote: > On 4/21/21 6:11 PM, Martin Sebor wrote: > > On 4/21/21 2:15 AM, Martin Liška wrote: > >> Hello. > >> > >> It's addressing the following Clang warning: > >> cp/lex.c:170:45: warning: result of comparison of constant 64 with > expression of type 'enum ovl_op_code' is always true > [-Wtautological-constant-out-of-range-compare] > >> > >> Patch can bootstrap on x86_64-linux-gnu and survives regression tests. > >> > >> Ready to be installed? > >> Thanks, > >> Martin > >> > >> gcc/cp/ChangeLog: > >> > >> * cp-tree.h (STATIC_ASSERT): Prefer static assert. > >> * lex.c (init_operators): Remove run-time check. > >> --- > >> gcc/cp/cp-tree.h | 3 +++ > >> gcc/cp/lex.c | 2 -- > >> 2 files changed, 3 insertions(+), 2 deletions(-) > >> > >> diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h > >> index 81ff375f8a5..a8f72448ea9 100644 > >> --- a/gcc/cp/cp-tree.h > >> +++ b/gcc/cp/cp-tree.h > >> @@ -5916,6 +5916,9 @@ enum ovl_op_code { > >> OVL_OP_MAX > >> }; > >> +/* Make sure it fits in lang_decl_fn::operator_code. */ > >> +STATIC_ASSERT (OVL_OP_MAX < (1 << 6)); > >> + > > > > I wonder if there's a way to test this directly by something like > > > > static_assert (number-of-bits (ovl_op_info_t::ovl_op_code) > > <= number-of-bits (lang_decl_fn::operator_code)); > > Good point, but I'm not aware of it. Maybe C++ people can chime in? > ovl_op_code is an unscoped enumeration (meaning "enum" not "enum class") with no fixed underlying type (i.e. no enum-base like ": int" or ": long" is specified) which means that the number of bits in is value representation is the number of bits needed to represent the minimum and maximum enumerators: "the values of the enumeration are the values representable by a hypothetical integer type with minimal width M such that all enumerators can be represented." There is no function/utility like number-of-bits that can tell you that from the type though.You could use std::underlying_type::type to get the integral type that the compiler used to represent it, but that will probably be 'int' in this case and so all it tells you is an upper bound of no more than 32 bits, which is not useful for this purpose.
Re: [PATCH] [libstdc++] Fix "bare" notifications dropped by waiters check
On Thu, Apr 22, 2021 at 4:29 AM Thomas Rodgers wrote: > > From: Thomas Rodgers > > NOTE - This patch also needs to be backported to gcc-11 in order for > semaphore release() to work correctly on non-futex platforms. > > Tested sparc-sun-solaris2.11 > > For types that track whether or not there extant waiters (e.g. > semaphore) internally, the __atomic_notify_address_bare() call was > introduced to avoid the overhead of loading the atomic count of > waiters. For platforms that don't have Futex, however, there was > still a check for waiters, and seeing that there are none (because > in the bare case, the count is not incremented), the notification > is dropped. This commit addresses that case. This is OK for the GCC 11 branch if approved for trunk. Richard. > libstdc++-v3/ChangeLog: > * include/bits/atomic_wait.h: Always notify waiters in the > in the case of 'bare' address notification. > --- > libstdc++-v3/include/bits/atomic_wait.h | 12 ++-- > 1 file changed, 6 insertions(+), 6 deletions(-) > > diff --git a/libstdc++-v3/include/bits/atomic_wait.h > b/libstdc++-v3/include/bits/atomic_wait.h > index 0ac5575190c..984ed70f16c 100644 > --- a/libstdc++-v3/include/bits/atomic_wait.h > +++ b/libstdc++-v3/include/bits/atomic_wait.h > @@ -226,9 +226,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION >} > >void > - _M_notify(const __platform_wait_t* __addr, bool __all) noexcept > + _M_notify(const __platform_wait_t* __addr, bool __all, bool __bare) > noexcept >{ > - if (!_M_waiting()) > + if (!(__bare || _M_waiting())) > return; > > #ifdef _GLIBCXX_HAVE_PLATFORM_WAIT > @@ -304,11 +304,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > } > > void > - _M_notify(bool __all) > + _M_notify(bool __all, bool __bare = false) > { > if (_M_addr == &_M_w._M_ver) > __atomic_fetch_add(_M_addr, 1, __ATOMIC_ACQ_REL); > - _M_w._M_notify(_M_addr, __all); > + _M_w._M_notify(_M_addr, __all, __bare); > } > > template @@ -452,7 +452,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > __atomic_notify_address(const _Tp* __addr, bool __all) noexcept > { >__detail::__bare_wait __w(__addr); > - __w._M_notify(__all); > + __w._M_notify(__all, true); > } > >// This call is to be used by atomic types which track contention > externally > @@ -464,7 +464,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION > __detail::__platform_notify(__addr, __all); > #else > __detail::__bare_wait __w(__addr); > -__w._M_notify(__all); > +__w._M_notify(__all, true); > #endif >} > _GLIBCXX_END_NAMESPACE_VERSION > -- > 2.30.2 >
Re: [PATCH v4 2/2] x86: Add general_regs_only function attribute
On Thu, Apr 22, 2021 at 3:01 AM H.J. Lu wrote: > > On Wed, Apr 21, 2021 at 4:24 PM Martin Sebor wrote: > > > > On 4/21/21 2:58 PM, H.J. Lu wrote: > > > On Wed, Apr 21, 2021 at 10:09 AM Martin Sebor wrote: > > >> > > >> On 4/14/21 4:39 PM, H.J. Lu wrote: > > >>> commit 87c753ac241f25d222d46ba1ac66ceba89d6a200 > > >>> Author: H.J. Lu > > >>> Date: Fri Aug 21 09:42:49 2020 -0700 > > >>> > > >>> x86: Add target("general-regs-only") function attribute > > >>> > > >>> is incomplete since it is impossible to call integer intrinsics from > > >>> a function with general-regs-only target attribute. > > >>> > > >>> 1. Add general_regs_only function attribute to inform the compiler that > > >>> functions use only general purpose registers. When making inlining > > >>> decisions on such functions, non-GPR compiler options are excluded. > > >>> 2. Add general_regs_only attribute to x86 intrinsics which use only > > >>> general purpose registers. > > >>> > > >> ... > > >>> --- a/gcc/doc/extend.texi > > >>> +++ b/gcc/doc/extend.texi > > >>> @@ -7066,6 +7066,11 @@ On x86 targets, the @code{fentry_section} > > >>> attribute sets the name > > >>>of the section to record function entry instrumentation calls in when > > >>>enabled with @option{-pg -mrecord-mcount} > > >>> > > >>> +@item general_regs_only > > >>> +@cindex @code{general_regs_only} function attribute, x86 > > >>> +The @code{general_regs_only} attribute on functions is used to > > >>> +inform the compiler that functions use only general purpose registers. > > >> > > >> I'll just reiterate basically the same comment as before: it's not > > >> clear from the very brief description above what the requirements > > >> are for using the attribute. I'm guessing it can be applied to > > >> any function (inline or otherwise) but only has any effect when > > >> the function is actually inlined and otherwise doesn't constrain > > >> what the function can do. (Whatever the constraints are, I think > > >> the manual should spell them out, and likewise for its effects.) > > > > > > That is correct. > > > > > >> Similarly it's not clear what should be expected when the function > > >> does use some other register. Ideally, I think GCC would check and > > >> issue a nice error message whether or not the function is inlined > > >> or called. I suspect that might only be possible for inline > > >> functions that are actually called and for which the back end must > > >> emit code. > > > > > > This is what GCC does today: > > > > > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99744 > > > > Yes, that's the rather obscure error I think I commented on before > > and suggested should be improved. Based on r99744-3.c I don't think > > this has changed in the improved patch. > > My goal is to fix the inline failures, not to improve the compiler error > message. > > > > > > >> Other than that, I'd suggest to improve the phrasing a bit: > > >> > > >> The @code{general_regs_only} function attribute indicates that > > >> the function uses only general purpose registers... [text > > >> explaining constraints and errors follows]. > > >> > > >> Martin > > > > > > How about this > > > > > > @item general_regs_only > > > @cindex @code{general_regs_only} function attribute, x86 > > > The @code{general_regs_only} attribute on functions is used to inform > > > the compiler that functions use only general purpose registers. It > > > can be used together with the @code{always_inline} attribute to avoid > > > inlining failure when there is a mismatch in compiler vector options. > > > > Without an article the part "that functions use only general purpose > > registers" is unclear and/or grammatically incorrect. What functions? > > If the function the attribute is applied to, it needs an article, e.g., > > "the function" or "a function", and singular. (Otherwise it could be > > read as talking about the functions called from the one with > > the attribute, or some other functions altogether). > > > > I tried to correct that above but, if you prefer, the following would > > be closer to your phrasing but more correct/accurate: > > > >The @code{general_regs_only} function attribute informs > >the compiler that the function uses only general purpose > >registers. > > > > I don't understand what the second sentence is trying to say, and > > without a better error message for the problem in r99744, I suspect > > few users will either. I am suggesting to explain in the text you > > are adding, under what conditions inlining might fail without > > the attribute, and what effect the attribute has on the function > > that prevents the inlining failure. > > How about this? > > @item general_regs_only > @cindex @code{general_regs_only} function attribute, x86 > The @code{general_regs_only} function attribute informs the compiler > that the function uses only general purpose registers. When the > compiler inlines a function with the @code{always_inline}
Re: [PATCH 0/3] VAX backend preparatory updates for switching to LRA
On Thu, Apr 22, 2021 at 1:22 AM Maciej W. Rozycki wrote: > > Hi, > > According to the plan discussed in the context of the recent switch to > MODE_CC of the VAX backend I have been looking into switching the backend > to LRA as well. > > It has turned out quite straightforward itself, with just a couple of > minor issues triggered with a flip to LRA, one causing a build failure > with target libatomic and another causing a C testsuite regression. Also > I have come across a piece of dead code which has never ever been used for > anything and it is unclear to me what its intended purpose was. > > I have come up with this small patch series then, bundled together for > easier reference although the individual changes are independent from each > other. > > I think 3/3 is worth backporting to GCC 11 at one point, perhaps 11.2, so > that it can be easily picked downstream, as it improves code generation > with old reload and we may not have another major release still using it. > > OTOH switching to LRA regresses code generation seriously, by making the > indexed and indirect VAX address modes severely underutilised, so while > with these changes in place the backend can be switched to LRA with just a > trivial to remove the redefinition of TARGET_LRA_P, I think it is not yet > the right time to do it. > > It is not a hard show-stopper though, so while I plan to look into LRA > now to figure out what is missing there that the old reload has to satisfy > the VAX backend, the switch to LRA can now be made anytime if so required > and I am preempted for whatever reason (and nobody else gets to it). > > Questions, comments, OK to apply? Sounds like a reasonable stance to me. The patches look all good, thus they are OK to apply. Thanks, Richard. > > Maciej
Re: [PATCH] LTO: fallback to -flto=N if -flto=jobserver does not work.
On Wed, Apr 21, 2021 at 3:08 PM Martin Liška wrote: > > When -flto=jobserver is used and we cannot detect job server, then we can > still fallbackto -flto=N mode. > > Patch can bootstrap on x86_64-linux-gnu and survives regression tests. > > Ready to be installed? I think this behavior needs to be documented - it falls back to a less conservative (possibly system overloading) mode - which IMHO is non-obvious and IMHO we shouldn't do. Richard. > Thanks, > Martin > > gcc/ChangeLog: > > * lto-wrapper.c (run_gcc): When -flto=jobserver is used, but the > makeserver cannot be detected, then use -flto=N fallback. > --- > gcc/lto-wrapper.c | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/gcc/lto-wrapper.c b/gcc/lto-wrapper.c > index 03a5922f8ea..0b626d7c811 100644 > --- a/gcc/lto-wrapper.c > +++ b/gcc/lto-wrapper.c > @@ -1585,8 +1585,9 @@ run_gcc (unsigned argc, char *argv[]) >if (jobserver && jobserver_error != NULL) > { > warning (0, jobserver_error); > - parallel = 0; > + /* Fall back to auto parallelism. */ > jobserver = 0; > + auto_parallel = 1; > } >else if (!jobserver && jobserver_error == NULL) > { > -- > 2.31.1 >
Re: [Patch] libgomp/testsuite: Fix checks for dg-excess-errors
On Wed, 21 Apr 2021, Tobias Burnus wrote: > This was brought up by Richard when testing libgomp with the GCC 11 > distribution > compiler, which has both nvptx and gcn enabled – but no offloading device was > available. > > This lead to fails for: > * testsuite/libgomp.c-c++-common/function-not-offloaded.c: > → function (on purpose) not marked for offline > → issue: dg-excess-errors by lto1/mkoffload not ignored > * testsuite/libgomp.c-c++-common/variable-not-offloaded.c > → likewise for variables, except that this would succeed > with unified-shared memory > → issue: dg-excess-errors by lto1/mkoffload not ignored > * libgomp/testsuite/libgomp.c/pr86416-1.c + *-2.c > → expected fail when offloading compiler does not support > long double or float128 > → issue: error output but dg-error/dg-excess-error not active > > The reason for those fails is that used effective target checks > do not cover those cases correctly. Namely: > > * offload_device - check whether an offloading device is > available & used at run time. > → problem: compilation was done for the device but here > no device is available > * offload_device_nonshared_as > → Likewise and same issue, except that also 0 is returned > if the device has unified shared memory. > > Solution: Explicitly check for nvptx/gcn offload compilation > (but ignore device availability). > [Uses 'gcc -v |grep ^OFFLOAD_TARGET_NAMES=', which honors > -foffload=, including -foffload=disable.] > > OK for mainline and GCC 11? > > > Tested without offloading support & with nvptx & > offload support (with an offloading device available). > > @Richard: I would be happy if you could confirm that this fixes > your issue. So with this I'm down to === libgomp tests === Running target unix XPASS: libgomp.c/../libgomp.c-c++-common/pr96390.c (test for excess errors) FAIL: libgomp.c/target-28.c (test for excess errors) UNRESOLVED: libgomp.c/target-28.c compilation failed to produce executable XPASS: libgomp.c++/../libgomp.c-c++-common/pr96390.c (test for excess errors) XPASS: libgomp.c++/pr96390.C (test for excess errors) FAIL: libgomp.c++/target-13.C (test for excess errors) UNRESOLVED: libgomp.c++/target-13.C compilation failed to produce executable === libgomp Summary === # of expected passes8411 # of unexpected failures2 # of unexpected successes 3 which is nice. The target-28.c FAIL is spawn -ignore SIGHUP /home/abuild/rpmbuild/BUILD/gcc-11.0.1+git10/obj-x86_64-suse-linux/./gcc/xgcc -B/home/abuild/rpmbuild/BUILD/gcc-11.0.1+git10/obj-x86_64-suse-linux/./gcc/ -B/usr/x86_64-suse-linux/bin/ -B/usr/x86_64-suse-linux/lib/ -isystem /usr/x86_64-suse-linux/include -isystem /usr/x86_64-suse-linux/sys-include -fchecking=1 ../../../../libgomp/testsuite/libgomp.c/target-28.c -B/home/abuild/rpmbuild/BUILD/gcc-11.0.1+git10/obj-x86_64-suse-linux/x86_64-suse-linux/./libgomp/ -B/home/abuild/rpmbuild/BUILD/gcc-11.0.1+git10/obj-x86_64-suse-linux/x86_64-suse-linux/./libgomp/.libs -I/home/abuild/rpmbuild/BUILD/gcc-11.0.1+git10/obj-x86_64-suse-linux/x86_64-suse-linux/./libgomp -I../../../../libgomp/testsuite/../../include -I../../../../libgomp/testsuite/.. -Lno -fmessage-length=0 -fno-diagnostics-show-caret -fdiagnostics-color=never -B/usr/lib64/gcc/x86_64-suse-linux/11 -B/usr/bin -B/usr/lib64/gcc/x86_64-suse-linux/11 -B/usr/bin -fopenmp -O2 -L/home/abuild/rpmbuild/BUILD/gcc-11.0.1+git10/obj-x86_64-suse-linux/x86_64-suse-linux/./libgomp/.libs -lm -o ./target-28.exe^M ^[[1m/tmp/ccCLBCvn.mkoffload.2.s:233:2: ^[[0m^[[0;1;31merror: ^[[0m^[[1ms.3 changed binding to STB_GLOBAL^M ^[[0m.global s.3^M ^[[0;1;32m^^M ^[[0m^[[1m/tmp/ccCLBCvn.mkoffload.2.s:234:2: ^[[0m^[[0;1;31merror: ^[[0m^[[1ms.2 changed binding to STB_GLOBAL^M ^[[0m.global s.2^M ^[[0;1;32m^^M ^[[0mmkoffload: fatal error: /usr/bin//x86_64-suse-linux-accel-amdgcn-amdhsa-gcc-11 returned 1 exit status^M compilation terminated.^M lto-wrapper: fatal error: /usr/lib64/gcc/x86_64-suse-linux/11//accel/amdgcn-amdhsa/mkoffload returned 1 exit status^M and the target-13.C one is similar. Richard.
Re: [PATCH] Use STATIC_ASSERT for OVL_OP_MAX.
On 4/21/21 6:11 PM, Martin Sebor wrote: > On 4/21/21 2:15 AM, Martin Liška wrote: >> Hello. >> >> It's addressing the following Clang warning: >> cp/lex.c:170:45: warning: result of comparison of constant 64 with >> expression of type 'enum ovl_op_code' is always true >> [-Wtautological-constant-out-of-range-compare] >> >> Patch can bootstrap on x86_64-linux-gnu and survives regression tests. >> >> Ready to be installed? >> Thanks, >> Martin >> >> gcc/cp/ChangeLog: >> >> * cp-tree.h (STATIC_ASSERT): Prefer static assert. >> * lex.c (init_operators): Remove run-time check. >> --- >> gcc/cp/cp-tree.h | 3 +++ >> gcc/cp/lex.c | 2 -- >> 2 files changed, 3 insertions(+), 2 deletions(-) >> >> diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h >> index 81ff375f8a5..a8f72448ea9 100644 >> --- a/gcc/cp/cp-tree.h >> +++ b/gcc/cp/cp-tree.h >> @@ -5916,6 +5916,9 @@ enum ovl_op_code { >> OVL_OP_MAX >> }; >> +/* Make sure it fits in lang_decl_fn::operator_code. */ >> +STATIC_ASSERT (OVL_OP_MAX < (1 << 6)); >> + > > I wonder if there's a way to test this directly by something like > > static_assert (number-of-bits (ovl_op_info_t::ovl_op_code) > <= number-of-bits (lang_decl_fn::operator_code)); Good point, but I'm not aware of it. Maybe C++ people can chime in? > > Also, since we are now compiling in C++ 11 mode, would using > static_assert be appropriate? We do: #if __cplusplus >= 201103L #define STATIC_ASSERT(X) \ static_assert ((X), #X) #else #define STATIC_ASSERT(X) \ typedef int assertion1[(X) ? 1 : -1] ATTRIBUTE_UNUSED #endif Btw. I'm going to send a patch with remove the #else branch. Martin > > Martin > > >> struct GTY(()) ovl_op_info_t { >> /* The IDENTIFIER_NODE for the operator. */ >> tree identifier; >> diff --git a/gcc/cp/lex.c b/gcc/cp/lex.c >> index 73e14b8394c..43abd019e6e 100644 >> --- a/gcc/cp/lex.c >> +++ b/gcc/cp/lex.c >> @@ -166,8 +166,6 @@ init_operators (void) >> if (op_ptr->name) >> { >> - /* Make sure it fits in lang_decl_fn::operator_code. */ >> - gcc_checking_assert (op_ptr->ovl_op_code < (1 << 6)); >> tree ident = set_operator_ident (op_ptr); >> if (unsigned index = IDENTIFIER_CP_INDEX (ident)) >> { >> >
Re: [PATCH] aarch64: Avoid duplicating bti j insns for jump tables [PR99988]
On Wed, 21 Apr 2021 at 14:05, Richard Sandiford via Gcc-patches wrote: > > Alex Coplan writes: > > Hi Richard, > > > > On 15/04/2021 18:45, Richard Sandiford wrote: > >> Looks good in general, but like you say, it's GCC 12 material. > > > > Thanks for the review. The attached patch addresses these comments and > > bootstraps/regtests OK on aarch64-linux-gnu. OK for trunk? > > OK, thanks. > The new test fails with -mabi=ilp32: sorry, unimplemented: return address signing is only supported for '-mabi=lp64' Is that OK: diff --git a/gcc/testsuite/gcc.target/aarch64/pr99988.c b/gcc/testsuite/gcc.target/aarch64/pr99988.c index 2d87f41..7cca496 100644 --- a/gcc/testsuite/gcc.target/aarch64/pr99988.c +++ b/gcc/testsuite/gcc.target/aarch64/pr99988.c @@ -1,4 +1,4 @@ -/* { dg-do compile } */ +/* { dg-do compile { target lp64 } } */ /* { dg-options "-O2 -mbranch-protection=standard" } */ /* { dg-final { scan-assembler-times {bti j} 13 } } */ int a; Thanks, Christophe
Re: [PATCH] Use hardware_concurrency only if _GLIBCXX_HAS_GTHREADS
On 4/21/21 10:16 PM, Jakub Jelinek wrote: > On Wed, Apr 21, 2021 at 08:53:54PM +0100, Jonathan Wakely wrote: What would be IMHO a good idea would be to use configure test for #include int t = std::thread::hardware_concurrency (); and in that case use that as a fallback to the previous implementation, that will be strictly an improvement. >>> >>> That does not make sense to me. Original motivation was removal of the >>> complicated >>> implementation with a C++ standard function. > > The reason it is larger and more complicated is that it answers a different > question than std::thread::hardware_concurrency(). > hardware_concurrency() says how many threads the hw has available, if we > wanted just that, you could just keep the sysconf call. > The current implementation answers the question how many threads can this > process and its children be scheduled on. > Consider some NUMA box with hundreds of threads, but where it might > be desirable to keep specific compilations limited to some cores and do > other compilations or whatever else on other cores. > > Sure, hardware_concurrency() answer is better than no information about the > concurrency at all. But I don't really see the costs of the current > implementation, it is < 100 lines of code that have been written once and > it is unlikely they will need changing - at this point the interfaces are > fairly stable, haven't been changed on the glibc side for years. It is true > that when it was introduced in glibc, there have been 2 big changes to it > soon. All right, I understand that hardware_concurrency doesn't match exactly our needs. Thus I'm going to install the following patch that removes the FIXME. Thanks for help with that, Martin > > Jakub > >From dbe87a4229126829e0a912e9928c82b8dde7b51c Mon Sep 17 00:00:00 2001 From: Martin Liska Date: Thu, 22 Apr 2021 09:14:28 +0200 Subject: [PATCH] Remove not feasible FIXME gcc/ChangeLog: * lto-wrapper.c: Remove FIXME about usage of hardware_concurrency. The function is not on par with what we have now. --- gcc/lto-wrapper.c | 2 -- 1 file changed, 2 deletions(-) diff --git a/gcc/lto-wrapper.c b/gcc/lto-wrapper.c index 0b626d7c811..49894e4fc61 100644 --- a/gcc/lto-wrapper.c +++ b/gcc/lto-wrapper.c @@ -1304,8 +1304,6 @@ init_num_threads (void) #endif } -/* FIXME: once using -std=c++11, we can use std::thread::hardware_concurrency. */ - /* Test and return reason why a jobserver cannot be detected. */ static const char * -- 2.31.1
Re: [PATCH] Use std::thread::hardware_concurrency in lto-wrapper.c.
> Anyway, that's why we have a C++ ISO standard and so > std::thread::hardware_concurrency function implementation can (and likely > should) handle all this. Or? This is a compiler though, i.e. quite low level in the software stack, so the implementation must be conservative and thus fancy C++ features avoided. -- Eric Botcazou
Re: [PATCH] Use hardware_concurrency only if _GLIBCXX_HAS_GTHREADS
On 4/21/21 9:53 PM, Jonathan Wakely wrote: > On Wed, 21 Apr 2021 at 20:41, Martin Liška wrote: >> >> On 4/21/21 9:15 PM, Jakub Jelinek wrote: >>> On Wed, Apr 21, 2021 at 08:28:55PM +0200, Jakub Jelinek via Gcc-patches >>> wrote: > There's a patch attempt for the problem with > std::thread::hardware_concurrency where > it's used only if _GLIBCXX_HAS_GTHREADS is set. > > Does it help? > Thanks, > Martin > > gcc/ChangeLog: > > PR bootstrap/100186 > * lto-wrapper.c: Use hardware_concurrency only if > _GLIBCXX_HAS_GTHREADS. > --- > gcc/lto-wrapper.c | 6 +- > 1 file changed, 5 insertions(+), 1 deletion(-) > > diff --git a/gcc/lto-wrapper.c b/gcc/lto-wrapper.c > index 6ba401007f6..8a85b3e93a8 100644 > --- a/gcc/lto-wrapper.c > +++ b/gcc/lto-wrapper.c > @@ -1285,7 +1285,11 @@ run_gcc (unsigned argc, char *argv[]) >static char current_dir[] = { '.', DIR_SEPARATOR, '\0' }; > >/* Number of CPUs that can be used for parallel LTRANS phase. */ > - unsigned long nthreads_var = std::thread::hardware_concurrency (); > + unsigned long nthreads_var = 0; > + > +#ifdef _GLIBCXX_HAS_GTHREADS > + nthreads_var = std::thread::hardware_concurrency (); > +#endif _GLIBCXX_HAS_GTHREADS is a libstdc++ internal macro, it shouldn't be used outside of libstdc++. And, when using some other compiler or standard C++ library, nthreads_var will be always 0. That isn't an improvement. >>> >>> What would be IMHO a good idea would be to use configure test for >>> #include >>> int t = std::thread::hardware_concurrency (); >>> and in that case use that as a fallback to the previous implementation, >>> that will be strictly an improvement. >> >> That does not make sense to me. Original motivation was removal of the >> complicated >> implementation with a C++ standard function. >> >> @Jonathan: >> >> What standard says about usage of std::thread::hardware_concurrency() [1]? >> The function is defined in C++11, how can one detect if threading is enabled >> in the C++ library and use it? > > A configure test. > > The standard says it's required to exist, but libstdc++ has > traditionally not defined things that are not supportable on the > target. Which seems logical to me. > So for targets with no multithreading, we didn't define > anything in . Even though that has changed in current releases > (so that std::thread exists even if it's useless), that won't help you > with previous releases. You need a configure test. All right. > > As I said in the PR, opinions differ on whether it's better to define > everything (even if it's useless) or not define them. Arguably, users > on small embedded targets do not want the library to waste space > defining a useless std::thread class that just throws an exception if > you ever try to create a new thread (but that's what the standard > requires). > I think defining only things that work is the right way. Martin
Re: [PATCH] Use std::thread::hardware_concurrency in lto-wrapper.c.
> This patch broke bootstrap on AIX. > > std::thread is not provided in all instances. GCC is not compiled > multi-threaded by default. Right, this will very likely break on Windows too. -- Eric Botcazou
[PATCH] arm: remove error in CPP_SPEC when float-abi soft and hard are used together
arm.h has had this error message since 1997, and was never updated to take softfp into account. Anyway, it seems it was useful long ago, but it is no longer needed since option parsing has been improved: -mfloat-abi is handled via arm.opt and updates the var_float_abi variable. So, the last instance of -mfloat-abi= on the command line wins. This patch just removes this error message, thus enabling many more tests to pass on arm-eabi: * with -mcpu=cortex-a7/-mfloat-abi=soft/-march=armv7ve+simd (2 more passes) gcc.target/arm/pr52375.c g++.target/arm/pr99593.C (test for excess errors) * with -mthumb/-mfloat-abi=soft/-march=armv6s-m (115 more passes in C, 90 more in C++) gcc.target/arm/armv8_1m-fp16-move-1.c (test for excess errors) gcc.target/arm/armv8_1m-fp32-move-1.c (test for excess errors) gcc.target/arm/armv8_1m-fp64-move-1.c (test for excess errors) gcc.target/arm/armv8_2-fp16-move-1.c (test for excess errors) gcc.target/arm/cortex-m55-nodsp-flag-hard.c (test for excess errors) gcc.target/arm/cortex-m55-nofp-flag-hard.c (test for excess errors) gcc.target/arm/cortex-m55-nomve-flag-hard.c (test for excess errors) gcc.target/arm/cortex-m55-nomve.fp-flag-hard.c (test for excess errors) g++.target/arm/no_unique_address_1.C g++.target/arm/no_unique_address_2.C * with -mthumb/-mfloat-abi=soft/-march=armv7-m (153 more passes in C, 90 more in C++) gcc.dg/pr59418.c (test for excess errors) gcc.target/arm/armv8_1m-fp16-move-1.c (test for excess errors) gcc.target/arm/armv8_1m-fp32-move-1.c (test for excess errors) gcc.target/arm/armv8_1m-fp64-move-1.c (test for excess errors) gcc.target/arm/armv8_2-fp16-move-1.c (test for excess errors) gcc.target/arm/bfloat16_scalar_2_1.c (test for excess errors) gcc.target/arm/bfloat16_scalar_3_1.c (test for excess errors) gcc.target/arm/cortex-m55-nodsp-flag-hard.c (test for excess errors) gcc.target/arm/cortex-m55-nofp-flag-hard.c (test for excess errors) gcc.target/arm/cortex-m55-nomve-flag-hard.c (test for excess errors) gcc.target/arm/cortex-m55-nomve.fp-flag-hard.c (test for excess errors) gcc.target/arm/pr52375.c (test for excess errors) gcc.target/arm/simd/vld1_bf16_1.c (test for excess errors) gcc.target/arm/simd/vldn_lane_bf16_1.c (test for excess errors) gcc.target/arm/simd/vst1_bf16_1.c (test for excess errors) gcc.target/arm/simd/vstn_lane_bf16_1.c (test for excess errors) g++.target/arm/no_unique_address_1.C g++.target/arm/no_unique_address_2.C * with -mthumb/-mfloat-abi=hard/-march=armv7e-m+fp (65 more passes) gcc.target/arm/atomic-comp-swap-release-acquire-3.c (test for excess errors) gcc.target/arm/atomic-comp-swap-release-acquire-3.c scan-assembler-not dmb gcc.target/arm/atomic-comp-swap-release-acquire-3.c scan-assembler-times ldaex 4 gcc.target/arm/atomic-comp-swap-release-acquire-3.c scan-assembler-times stlex 4 gcc.target/arm/atomic-op-acq_rel-3.c (test for excess errors) gcc.target/arm/atomic-op-acq_rel-3.c scan-assembler-not dmb gcc.target/arm/atomic-op-acq_rel-3.c scan-assembler-times ldaex\tr[0-9]+, \\[r[0-9]+\\] 6 gcc.target/arm/atomic-op-acq_rel-3.c scan-assembler-times stlex\t...?, r[0-9]+, \\[r[0-9]+\\] 6 gcc.target/arm/atomic-op-acquire-3.c (test for excess errors) gcc.target/arm/atomic-op-acquire-3.c scan-assembler-not dmb gcc.target/arm/atomic-op-acquire-3.c scan-assembler-times ldaex\tr[0-9]+, \\[r[0-9]+\\] 6 gcc.target/arm/atomic-op-acquire-3.c scan-assembler-times strex\t...?, r[0-9]+, \\[r[0-9]+\\] 6 gcc.target/arm/atomic-op-char-3.c (test for excess errors) gcc.target/arm/atomic-op-char-3.c scan-assembler-not dmb gcc.target/arm/atomic-op-char-3.c scan-assembler-times ldrexb\tr[0-9]+, \\[r[0-9]+\\] 6 gcc.target/arm/atomic-op-char-3.c scan-assembler-times strexb\t...?, r[0-9]+, \\[r[0-9]+\\] 6 gcc.target/arm/atomic-op-consume-3.c (test for excess errors) gcc.target/arm/atomic-op-consume-3.c scan-assembler-not dmb gcc.target/arm/atomic-op-consume-3.c scan-assembler-times ldaex\tr[0-9]+, \\[r[0-9]+\\] 6 gcc.target/arm/atomic-op-consume-3.c scan-assembler-times strex\t...?, r[0-9]+, \\[r[0-9]+\\] 6 gcc.target/arm/atomic-op-int-3.c (test for excess errors) gcc.target/arm/atomic-op-int-3.c scan-assembler-not dmb gcc.target/arm/atomic-op-int-3.c scan-assembler-times ldrex\tr[0-9]+, \\[r[0-9]+\\] 6 gcc.target/arm/atomic-op-int-3.c scan-assembler-times strex\t...?, r[0-9]+, \\[r[0-9]+\\] 6 gcc.target/arm/atomic-op-relaxed-3.c (test for excess errors) gcc.target/arm/atomic-op-relaxed-3.c scan-assembler-not dmb gcc.target/arm/atomic-op-relaxed-3.c scan-assembler-times ldrex\tr[0-9]+, \\[r[0-9]+\\] 6 gcc.target/arm/atomic-op-relaxed-3.c scan-assembler-times strex\t...?, r[0-9]+, \\[r[0-9]+\\] 6 gcc.target/arm/atomic-op-release-3.c (test for excess errors) gcc.target/arm/atomic-op-release-3.c scan-assembler-not dmb gcc.target/arm/atomic-op-release-3.c scan-assembler-times ldrex\tr[0-9]+, \\[r[0-9]+\\] 6 gcc.target/arm/atomic-op-release-3.c scan-assembler-times stlex\t...?, r[0-9]+, \\[r[0-9]+\\] 6
Re: [PATCH] Fix target/100106 ICE in gen_movdi
On Thu, 22 Apr 2021, Bernd Edlinger wrote: > Aehm, > > forgot to mention, > > tested on arm-none-eabi (arm-sim target) > is it OK for trunk? OK next week (when RC2 looks good). Thanks, Richard. > > Thanks > Bernd. > > On 4/22/21 8:37 AM, Bernd Edlinger wrote: > > As the test case shows, the outer mode may have a higher alignment > > requirement than the inner mode here. > > > > 2021-04-22 Bernd Edlinger > > > > PR target/100106 > > * gimplify-rtx.c (simplify_context::simplify_subreg): Check the > > memory alignment for the outer mode. > > > > * gcc.c-torture/compile/pr100106.c: New testcase. > > --- > > gcc/simplify-rtx.c | 1 + > > gcc/testsuite/gcc.c-torture/compile/pr100106.c | 11 +++ > > 2 files changed, 12 insertions(+) > > create mode 100644 gcc/testsuite/gcc.c-torture/compile/pr100106.c > > > > diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c > > index d13c390..ad3b7b2 100644 > > --- a/gcc/simplify-rtx.c > > +++ b/gcc/simplify-rtx.c > > @@ -7217,6 +7217,7 @@ simplify_context::simplify_subreg (machine_mode > > outermode, rtx op, > > have instruction to move the whole thing. */ > >&& (! MEM_VOLATILE_P (op) > > || ! have_insn_for (SET, innermode)) > > + && !(STRICT_ALIGNMENT && MEM_ALIGN (op) < GET_MODE_ALIGNMENT > > (outermode)) > >&& known_le (outersize, innersize)) > > return adjust_address_nv (op, outermode, byte); > > > > diff --git a/gcc/testsuite/gcc.c-torture/compile/pr100106.c > > b/gcc/testsuite/gcc.c-torture/compile/pr100106.c > > new file mode 100644 > > index 000..7f98b4f > > --- /dev/null > > +++ b/gcc/testsuite/gcc.c-torture/compile/pr100106.c > > @@ -0,0 +1,11 @@ > > +union a { > > + float _Complex b; > > + long long c; > > +}; > > + > > +void g(union a); > > + > > +void e() { > > + union a f = {1.0f}; > > + g(f); > > +} > > > -- Richard Biener SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
Re: [PATCH] Fix target/100106 ICE in gen_movdi
Aehm, forgot to mention, tested on arm-none-eabi (arm-sim target) is it OK for trunk? Thanks Bernd. On 4/22/21 8:37 AM, Bernd Edlinger wrote: > As the test case shows, the outer mode may have a higher alignment > requirement than the inner mode here. > > 2021-04-22 Bernd Edlinger > > PR target/100106 > * gimplify-rtx.c (simplify_context::simplify_subreg): Check the > memory alignment for the outer mode. > > * gcc.c-torture/compile/pr100106.c: New testcase. > --- > gcc/simplify-rtx.c | 1 + > gcc/testsuite/gcc.c-torture/compile/pr100106.c | 11 +++ > 2 files changed, 12 insertions(+) > create mode 100644 gcc/testsuite/gcc.c-torture/compile/pr100106.c > > diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c > index d13c390..ad3b7b2 100644 > --- a/gcc/simplify-rtx.c > +++ b/gcc/simplify-rtx.c > @@ -7217,6 +7217,7 @@ simplify_context::simplify_subreg (machine_mode > outermode, rtx op, > have instruction to move the whole thing. */ >&& (! MEM_VOLATILE_P (op) > || ! have_insn_for (SET, innermode)) > + && !(STRICT_ALIGNMENT && MEM_ALIGN (op) < GET_MODE_ALIGNMENT > (outermode)) >&& known_le (outersize, innersize)) > return adjust_address_nv (op, outermode, byte); > > diff --git a/gcc/testsuite/gcc.c-torture/compile/pr100106.c > b/gcc/testsuite/gcc.c-torture/compile/pr100106.c > new file mode 100644 > index 000..7f98b4f > --- /dev/null > +++ b/gcc/testsuite/gcc.c-torture/compile/pr100106.c > @@ -0,0 +1,11 @@ > +union a { > + float _Complex b; > + long long c; > +}; > + > +void g(union a); > + > +void e() { > + union a f = {1.0f}; > + g(f); > +} >
[PATCH] Fix target/100106 ICE in gen_movdi
As the test case shows, the outer mode may have a higher alignment requirement than the inner mode here. 2021-04-22 Bernd Edlinger PR target/100106 * gimplify-rtx.c (simplify_context::simplify_subreg): Check the memory alignment for the outer mode. * gcc.c-torture/compile/pr100106.c: New testcase. --- gcc/simplify-rtx.c | 1 + gcc/testsuite/gcc.c-torture/compile/pr100106.c | 11 +++ 2 files changed, 12 insertions(+) create mode 100644 gcc/testsuite/gcc.c-torture/compile/pr100106.c diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c index d13c390..ad3b7b2 100644 --- a/gcc/simplify-rtx.c +++ b/gcc/simplify-rtx.c @@ -7217,6 +7217,7 @@ simplify_context::simplify_subreg (machine_mode outermode, rtx op, have instruction to move the whole thing. */ && (! MEM_VOLATILE_P (op) || ! have_insn_for (SET, innermode)) + && !(STRICT_ALIGNMENT && MEM_ALIGN (op) < GET_MODE_ALIGNMENT (outermode)) && known_le (outersize, innersize)) return adjust_address_nv (op, outermode, byte); diff --git a/gcc/testsuite/gcc.c-torture/compile/pr100106.c b/gcc/testsuite/gcc.c-torture/compile/pr100106.c new file mode 100644 index 000..7f98b4f --- /dev/null +++ b/gcc/testsuite/gcc.c-torture/compile/pr100106.c @@ -0,0 +1,11 @@ +union a { + float _Complex b; + long long c; +}; + +void g(union a); + +void e() { + union a f = {1.0f}; + g(f); +} -- 1.9.1