[Bug middle-end/110148] [14 Regression] TSVC s242 regression between g:c0df96b3cda5738afbba3a65bb054183c5cd5530 and g:e4c986fde56a6248f8fbe6cf0704e1da34b055d8
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110148

--- Comment #3 from cuilili ---
I reproduced the S1244 regression on znver3.

Src code:

  for (int i = 0; i < LEN_1D-1; i++) {
    a[i] = b[i] + c[i] * c[i] + b[i] * b[i] + c[i];
    d[i] = a[i] + a[i+1];
  }

Base version (assembler):

Loop1:
  vmovsd      0x60c400(%rax),%xmm2
  vmovsd      0x60ba00(%rax),%xmm1
  add         $0x8,%rax
  vaddsd      %xmm1,%xmm2,%xmm0
  vmulsd      %xmm2,%xmm2,%xmm2
  vfmadd132sd %xmm1,%xmm2,%xmm1
  vaddsd      %xmm1,%xmm0,%xmm0
  vmovsd      %xmm0,0x60cdf8(%rax)
  vaddsd      0x60ce00(%rax),%xmm0,%xmm0
  vmovsd      %xmm0,0x60aff8(%rax)
  cmp         $0x9f8,%rax
  jne         Loop1

Base + commit version (assembler):

Loop1:
  vmovsd      0x60ba00(%rax),%xmm2
  vmovsd      0x60c400(%rax),%xmm1
  add         $0x8,%rax
  vmovsd      %xmm2,%xmm2,%xmm0
  vfmadd132sd %xmm2,%xmm1,%xmm0
  vfmadd132sd %xmm1,%xmm2,%xmm1
  vaddsd      %xmm1,%xmm0,%xmm0
  vmovsd      %xmm0,0x60cdf8(%rax)
  vaddsd      0x60ce00(%rax),%xmm0,%xmm0
  vmovsd      %xmm0,0x60aff8(%rax)
  cmp         $0x9f8,%rax
  jne         Loop1

For the Base version, the mult and the FMA are dependent, which increases the latency of the critical dependency chain. I couldn't find out why znver3 regresses: the same binary running on ICX shows an 11% gain (with #define iterations 1).
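To make the dependency argument concrete, here is a minimal C sketch of the s1244 expression written in the two association orders the listings above appear to implement. The mapping of b and c onto %xmm1/%xmm2 is an assumption (the listings only show addresses), and the helper names are hypothetical. With FMA enabled and contraction allowed (e.g. -mfma with GCC's default -ffp-contract=fast), GCC may contract each multiply-add into an FMA, but it is not guaranteed to pick exactly these shapes.

```c
#include <stddef.h>

/* Hypothetical helpers modelling two evaluation orders of
   a[i] = b[i] + c[i] * c[i] + b[i] * b[i] + c[i].  */

/* "Base" shape: the square feeds the multiply-add, so the critical
   chain is mul -> fma -> add.  */
static double s1244_base(double b, double c)
{
    double sum = b + c;     /* vaddsd, off the critical chain */
    double sq = c * c;      /* vmulsd */
    double t = b * b + sq;  /* vfmadd132sd, waits for the vmulsd */
    return sum + t;         /* final vaddsd */
}

/* "Base + commit" shape: two independent multiply-adds, then one add. */
static double s1244_commit(double b, double c)
{
    double t0 = c * c + b;  /* first vfmadd132sd */
    double t1 = b * b + c;  /* second vfmadd132sd, independent of t0 */
    return t0 + t1;         /* final vaddsd */
}

/* The loop from the comment; n plays the role of LEN_1D.  a[i + 1] is
   read before it is overwritten in the next iteration.  */
void s1244(double *a, const double *b, const double *c, double *d, size_t n)
{
    for (size_t i = 0; i + 1 < n; i++) {
        a[i] = b[i] + c[i] * c[i] + b[i] * b[i] + c[i];
        d[i] = a[i] + a[i + 1];
    }
}
```

For exactly representable inputs both helpers agree; the difference the comment is about is latency, not the value computed.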
Re: [PATCH 1/5] x86: use VPTERNLOG for further bitwise two-vector operations
On 25.06.2023 06:42, Hongtao Liu wrote:
> On Wed, Jun 21, 2023 at 2:26 PM Jan Beulich via Gcc-patches
> wrote:
>>
>> +(define_code_iterator andor [and ior])
>> +(define_code_attr nlogic [(and "nor") (ior "nand")])
>> +(define_code_attr ternlog_nlogic [(and "0x11") (ior "0x77")])
>> +
>> +(define_insn "*3"
>> +  [(set (match_operand:VI 0 "register_operand" "=v,v")
>> +        (andor:VI
>> +          (not:VI (match_operand:VI 1 "bcst_vector_operand" "%v,v"))
>> +          (not:VI (match_operand:VI 2 "bcst_vector_operand" "vBr,m"]
> I'm thinking of doing it in simplify_rtx or gimple match.pd to transform
> (and (not op1)) (not op2)) -> (not: (ior: op1 op2))

This wouldn't be a win (not + andn) -> (or + not), but what's more
important is ...

> (ior (not op1) (not op2)) -> (not : (and op1 op2))
>
> Even w/o avx512f, the transformation should also benefit since it
> takes less logic operations 3 -> 2.(or 2 -> 2 for pandn).

... that these transformations (from the, as per the doc, canonical
representation of nand and nor) are already occurring in common code,
_if_ no suitable insn can be found. That was at least the conclusion I
drew from looking around a lot, supported by the code that's generated
prior to this change.

Jan
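The ternlog_nlogic immediates can be checked with the usual VPTERNLOG truth-table method: evaluate the desired Boolean function on the byte constants 0xF0, 0xCC and 0xAA, which stand for the three sources A, B and C. The sketch below assumes, consistently with the operand order used in this series, that the two inputs of nor/nand sit in the B and C slots; the helper names are hypothetical, and ternlog32 is a plain scalar model of one 32-bit lane, not an intrinsic.

```c
#include <stdint.h>

/* Truth-table columns for VPTERNLOG's sources: bit k of an immediate is the
   result for inputs (A, B, C) = ((k >> 2) & 1, (k >> 1) & 1, k & 1).  */
#define TERN_A 0xF0u
#define TERN_B 0xCCu
#define TERN_C 0xAAu

/* Immediates for the two-operand forms, inputs taken from B and C.  */
static uint8_t imm_nor(void)  { return (uint8_t)~(TERN_B | TERN_C); } /* (and (not x) (not y)) */
static uint8_t imm_nand(void) { return (uint8_t)~(TERN_B & TERN_C); } /* (ior (not x) (not y)) */

/* Scalar model of one 32-bit lane: for every bit position, the three
   source bits form an index that selects one bit of imm.  */
static uint32_t ternlog32(uint32_t a, uint32_t b, uint32_t c, uint8_t imm)
{
    uint32_t r = 0;
    for (int bit = 0; bit < 32; bit++) {
        unsigned idx = (((a >> bit) & 1u) << 2)
                       | (((b >> bit) & 1u) << 1)
                       | ((c >> bit) & 1u);
        r |= (uint32_t)((imm >> idx) & 1u) << bit;
    }
    return r;
}
```

The same evaluation reproduces the other immediates used later in this series: 0x44 for B & ~C (the andnot form), 0x55 for ~C, and 0xbb / 0xdd for C | ~B and B | ~C.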
RE: [PATCH] New finish_compare_by_pieces target hook (for x86).
On Tue, 13 June 2023 12:02, Richard Biener wrote:
> On Mon, Jun 12, 2023 at 4:04 PM Roger Sayle
> wrote:
> > The following simple test case, from PR 104610, shows that memcmp ()
> > == 0 can result in some bizarre code sequences on x86.
> >
> > int foo(char *a)
> > {
> >   static const char t[] = "0123456789012345678901234567890";
> >   return __builtin_memcmp(a, [0], sizeof(t)) == 0;
> > }
> >
> > with -O2 currently contains both:
> >         xorl    %eax, %eax
> >         xorl    $1, %eax
> > and also
> >         movl    $1, %eax
> >         xorl    $1, %eax
> >
> > Changing the return type of foo to _Bool results in the equally
> > bizarre:
> >         xorl    %eax, %eax
> >         testl   %eax, %eax
> >         sete    %al
> > and also
> >         movl    $1, %eax
> >         testl   %eax, %eax
> >         sete    %al
> >
> > All these sequences set the result to a constant, but this
> > optimization opportunity only occurs very late during compilation, by
> > basic block duplication in the 322r.bbro pass, too late for CSE or
> > peephole2 to do anything about it. The problem is that the idiom
> > expanded by compare_by_pieces for __builtin_memcmp_eq contains basic
> > blocks that can't easily be optimized by if-conversion due to the
> > multiple incoming edges on the fail block.
> >
> > In summary, compare_by_pieces generates code that looks like:
> >
> >         if (x[0] != y[0]) goto fail_label;
> >         if (x[1] != y[1]) goto fail_label;
> >         ...
> >         if (x[n] != y[n]) goto fail_label;
> >         result = 1;
> >         goto end_label;
> > fail_label:
> >         result = 0;
> > end_label:
> >
> > In theory, the RTL if-conversion pass could be enhanced to tackle
> > arbitrarily complex if-then-else graphs, but the solution proposed
> > here is to allow suitable targets to perform if-conversion during
> > compare_by_pieces. The x86, for example, can take advantage of the
> > fact that all of the above comparisons set and test the zero flag
> > (ZF), which can then be used in combination with sete. Hence
> > compare_by_pieces could instead generate:
> >
> >         if (x[0] != y[0]) goto fail_label;
> >         if (x[1] != y[1]) goto fail_label;
> >         ...
> >         if (x[n] != y[n]) goto fail_label;
> > fail_label:
> >         sete result
> >
> > which requires one less basic block, and the redundant conditional
> > branch to a label immediately after is cleaned up by GCC's existing
> > RTL optimizations.
> >
> > For the test case above, where -O2 -msse4 previously generated:
> >
> > foo:    movdqu  (%rdi), %xmm0
> >         pxor    .LC0(%rip), %xmm0
> >         ptest   %xmm0, %xmm0
> >         je      .L5
> > .L2:    movl    $1, %eax
> >         xorl    $1, %eax
> >         ret
> > .L5:    movdqu  16(%rdi), %xmm0
> >         pxor    .LC1(%rip), %xmm0
> >         ptest   %xmm0, %xmm0
> >         jne     .L2
> >         xorl    %eax, %eax
> >         xorl    $1, %eax
> >         ret
> >
> > we now generate:
> >
> > foo:    movdqu  (%rdi), %xmm0
> >         pxor    .LC0(%rip), %xmm0
> >         ptest   %xmm0, %xmm0
> >         jne     .L2
> >         movdqu  16(%rdi), %xmm0
> >         pxor    .LC1(%rip), %xmm0
> >         ptest   %xmm0, %xmm0
> > .L2:    sete    %al
> >         movzbl  %al, %eax
> >         ret
> >
> > Using a target hook allows the large amount of intelligence already in
> > compare_by_pieces to be re-used by the i386 backend, but this can also
> > help other backends with condition flags where the equality result can
> > be materialized.
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check, both with and without --target_board=unix{-m32}
> > with no new failures. Ok for mainline?
>
> What's the guarantee that the zero flag is appropriately set on all edges
> incoming now and forever?

Is there any reason why this target hook can't be removed (in future)
should it stop being useful? It's completely optional and not required
for the correct functioning of the compiler.

> Does this require target specific knowledge on how do_compare_rtx_and_jump
> is emitting RTL?

Yes. Each backend can decide how best to implement finish_compare_by_pieces
given its internal knowledge of how do_compare_rtx_and_jump works. It's not
important to the middle-end how the underlying invariants are guaranteed,
just that they are and the backend produces correct code. A backend may
store flags on the target label, or maintain state in cfun. Future changes
to the i386 backend might cause it to revert to the default
finish_compare_by_pieces, or provide an alternate implementation, but at the
moment this patch improves the code that GCC generates. Very little (in
software like GCC) is forever.

> Do you see matching this in ifcvt to be unreasonable? I'm thinking of
> "reducing" the incoming edges pairwise without actually looking at the
> ifcvt code.

There's nothing about the proposed patch that prevents or blocks improvements
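The two control-flow shapes described in this thread can be modelled at the C source level. This is only a sketch: the real expansion happens in RTL, and the 8-byte piece size, the memcpy-based loads and the function names are assumptions made for illustration. The second function mirrors the proposed x86 shape, where every failing comparison and the fall-through path reach one block that materializes the result from the last equality test (the role sete plays on x86).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Load one 8-byte piece without alignment assumptions.  */
static uint64_t load64(const unsigned char *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Current shape: separate success and fail blocks merging at end_label.  */
static int memeq_by_pieces_goto(const unsigned char *x, const unsigned char *y,
                                size_t npieces)
{
    int result;
    for (size_t i = 0; i < npieces; i++)
        if (load64(x + 8 * i) != load64(y + 8 * i))
            goto fail_label;
    result = 1;
    goto end_label;
fail_label:
    result = 0;
end_label:
    return result;
}

/* Proposed shape: all exits reach one block that turns the outcome of the
   last comparison (ZF on x86) into the result.  */
static int memeq_by_pieces_flag(const unsigned char *x, const unsigned char *y,
                                size_t npieces)
{
    bool eq = true;                 /* stands in for ZF */
    for (size_t i = 0; i < npieces; i++) {
        eq = load64(x + 8 * i) == load64(y + 8 * i);
        if (!eq)
            break;                  /* goto fail_label */
    }
    return eq;                      /* fail_label: sete result */
}
```

Both functions return the same answer; the point of the hook is that the second shape has a single join block, which is easier to turn into straight-line flag code.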
Re: [PATCH 5/5] x86: yet more PR target/100711-like splitting
On Wed, Jun 21, 2023 at 2:29 PM Jan Beulich via Gcc-patches
wrote:
>
> Following two-operand bitwise operations, add another splitter to also
> deal with not followed by broadcast all on its own, which can be
> expressed as simple embedded broadcast instead once a broadcast operand
> is actually permitted in the respective insn. While there also permit
> a broadcast operand in the corresponding expander.
The patch LGTM.
>
> gcc/
>
>         * config/i386/sse.md: New splitters to simplify
>         not;vec_duplicate as a singular vpternlog.
>         (one_cmpl2): Allow broadcast for operand 1.
>         (one_cmpl2): Likewise.
>
> gcc/testsuite/
>
>         * gcc.target/i386/pr100711-6.c: New test.
> ---
> For the purpose here (and elsewhere) bcst_vector_operand() (really:
> bcst_mem_operand()) isn't permissive enough: We'd want it to allow
> 128-bit and 256-bit types as well irrespective of AVX512VL being
> enabled. This would likely require a new predicate
> (bcst_intvec_operand()?) and a new constraint (BR? Bi?). (Yet for name
> selection it will want considering that this is applicable to certain
> non-calculational FP operations as well.)
I think so.
>
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -17156,7 +17156,7 @@
>
>  (define_expand "one_cmpl2"
>    [(set (match_operand:VI 0 "register_operand")
> -        (xor:VI (match_operand:VI 1 "vector_operand")
> +        (xor:VI (match_operand:VI 1 "bcst_vector_operand")
>                  (match_dup 2)))]
>    "TARGET_SSE"
>  {
> @@ -17168,7 +17168,7 @@
>
>  (define_insn "one_cmpl2"
>    [(set (match_operand:VI 0 "register_operand" "=v,v")
> -        (xor:VI (match_operand:VI 1 "nonimmediate_operand" "v,m")
> +        (xor:VI (match_operand:VI 1 "bcst_vector_operand" "vBr,m")
>                  (match_operand:VI 2 "vector_all_ones_operand" "BC,BC")))]
>    "TARGET_AVX512F
>     && (! 
> @@ -17191,6 +17191,19 @@
>                        (symbol_ref " == 64 || TARGET_AVX512VL")
>                        (const_int 1)))])
>
> +(define_split
> +  [(set (match_operand:VI48_AVX512F 0 "register_operand")
> +        (vec_duplicate:VI48_AVX512F
> +          (not:
> +            (match_operand: 1 "nonimmediate_operand"]
> +  " == 64 || TARGET_AVX512VL
> +   || (TARGET_AVX512F && !TARGET_PREFER_AVX256)"
> +  [(set (match_dup 0)
> +        (xor:VI48_AVX512F
> +          (vec_duplicate:VI48_AVX512F (match_dup 1))
> +          (match_dup 2)))]
> +  "operands[2] = CONSTM1_RTX (mode);")
> +
>  (define_expand "_andnot3"
>    [(set (match_operand:VI_AVX2 0 "register_operand")
>          (and:VI_AVX2
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr100711-6.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mavx512f -mno-avx512vl -mprefer-vector-width=512 -O2" } */
> +
> +typedef int v16si __attribute__ ((vector_size (64)));
> +typedef long long v8di __attribute__((vector_size (64)));
> +
> +v16si foo_v16si (const int *a)
> +{
> +    return (__extension__ (v16si) {~*a, ~*a, ~*a, ~*a, ~*a, ~*a, ~*a, ~*a,
> +                                   ~*a, ~*a, ~*a, ~*a, ~*a, ~*a, ~*a, ~*a});
> +}
> +
> +v8di foo_v8di (const long long *a)
> +{
> +    return (__extension__ (v8di) {~*a, ~*a, ~*a, ~*a, ~*a, ~*a, ~*a, ~*a});
> +}
> +
> +/* { dg-final { scan-assembler-times "vpternlog\[dq\]\[ \\t\]+\\\$0x55, \\(%(?:eax|rdi|edi)\\)\\\{1to\[1-8\]+\\\}" 2 } } */
> --

--
BR,
Hongtao
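The new test expects ~*a broadcast to every lane to become a single vpternlog with immediate 0x55 and a {1toN} memory operand. A scalar C model makes the equivalence visible. It assumes, as in AT&T syntax, that the broadcast memory operand written right after the immediate is the third (C) source; the helper names are hypothetical and only the 32-bit-lane (v16si) case is modelled. With imm 0x55 every result bit is the complement of C, so the other two sources do not matter.

```c
#include <stdint.h>

#define LANES 16   /* v16si: a 512-bit vector of 32-bit elements */

/* One 32-bit lane of a three-input bitwise function: OR together the
   minterms whose bit is set in imm (bit k <-> inputs (a,b,c) = bits of k). */
static uint32_t ternlog_lane(uint32_t a, uint32_t b, uint32_t c, uint8_t imm)
{
    uint32_t r = 0;
    for (unsigned k = 0; k < 8; k++) {
        if ((imm >> k) & 1u)
            r |= ((k & 4u) ? a : ~a) & ((k & 2u) ? b : ~b) & ((k & 1u) ? c : ~c);
    }
    return r;
}

/* Model of "vpternlogd $imm, (mem){1to16}, src2, dst": the scalar at mem
   is broadcast into every lane as the third (C) input.  */
static void ternlogd_bcst(uint32_t dst[LANES], const uint32_t src2[LANES],
                          const uint32_t *mem, uint8_t imm)
{
    for (int lane = 0; lane < LANES; lane++)
        dst[lane] = ternlog_lane(dst[lane], src2[lane], *mem, imm);
}
```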
Re: [PATCH 4/5] x86: further PR target/100711-like splitting
On Wed, Jun 21, 2023 at 2:28 PM Jan Beulich via Gcc-patches
wrote:
>
> With respective two-operand bitwise operations now expressible by a
> single VPTERNLOG, add splitters to also deal with ior and xor
> counterparts of the original and-only case. Note that the splitters need
> to be separate, as the placement of "not" differs in the final insns
> (*iornot3, *xnor3) which are intended to pick up one half of
> the result.
>
> gcc/
>
>         * config/i386/sse.md: New splitters to simplify
>         not;vec_duplicate;{ior,xor} as vec_duplicate;{iornot,xnor}.
>
> gcc/testsuite/
>
>         * gcc.target/i386/pr100711-4.c: New test.
>         * gcc.target/i386/pr100711-5.c: New test.
>
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -17366,6 +17366,36 @@
>                   (match_dup 2)))]
>    "operands[3] = gen_reg_rtx (mode);")
>
> +(define_split
> +  [(set (match_operand:VI 0 "register_operand")
> +        (ior:VI
> +          (vec_duplicate:VI
> +            (not:
> +              (match_operand: 1 "nonimmediate_operand")))
> +          (match_operand:VI 2 "vector_operand")))]
> +  " == 64 || TARGET_AVX512VL
> +   || (TARGET_AVX512F && !TARGET_PREFER_AVX256)"
> +  [(set (match_dup 3)
> +        (vec_duplicate:VI (match_dup 1)))
> +   (set (match_dup 0)
> +        (ior:VI (not:VI (match_dup 3)) (match_dup 2)))]
> +  "operands[3] = gen_reg_rtx (mode);")
> +
> +(define_split
> +  [(set (match_operand:VI 0 "register_operand")
> +        (xor:VI
> +          (vec_duplicate:VI
> +            (not:
> +              (match_operand: 1 "nonimmediate_operand")))
> +          (match_operand:VI 2 "vector_operand")))]
> +  " == 64 || TARGET_AVX512VL
> +   || (TARGET_AVX512F && !TARGET_PREFER_AVX256)"
> +  [(set (match_dup 3)
> +        (vec_duplicate:VI (match_dup 1)))
> +   (set (match_dup 0)
> +        (not:VI (xor:VI (match_dup 3) (match_dup 2]
> +  "operands[3] = gen_reg_rtx (mode);")
> +
Can we merge this splitter (xor:not) into the ior:not one with a code
iterator for xor and ior? They look the same except for the xor/ior.
There's no need to merge it into the and:not case, which has different
guard conditions.
Others LGTM.
>  (define_insn "*andnot3_mask"
>    [(set (match_operand:VI48_AVX512VL 0 "register_operand" "=v")
>          (vec_merge:VI48_AVX512VL
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr100711-4.c
> @@ -0,0 +1,42 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mavx512bw -mno-avx512vl -mprefer-vector-width=512 -O2" } */
> +
> +typedef char v64qi __attribute__ ((vector_size (64)));
> +typedef short v32hi __attribute__ ((vector_size (64)));
> +typedef int v16si __attribute__ ((vector_size (64)));
> +typedef long long v8di __attribute__((vector_size (64)));
> +
> +v64qi foo_v64qi (char a, v64qi b)
> +{
> +    return (__extension__ (v64qi) {~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a,
> +                                   ~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a,
> +                                   ~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a,
> +                                   ~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a,
> +                                   ~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a,
> +                                   ~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a,
> +                                   ~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a,
> +                                   ~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a}) | b;
> +}
> +
> +v32hi foo_v32hi (short a, v32hi b)
> +{
> +    return (__extension__ (v32hi) {~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a,
> +                                   ~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a,
> +                                   ~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a,
> +                                   ~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a}) | b;
> +}
> +
> +v16si foo_v16si (int a, v16si b)
> +{
> +    return (__extension__ (v16si) {~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a,
> +                                   ~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a}) | b;
> +}
> +
> +v8di foo_v8di (long long a, v8di b)
> +{
> +    return (__extension__ (v8di) {~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a}) | b;
> +}
> +
> +/* { dg-final { scan-assembler-times "vpternlog\[dq\]\[ \\t\]+\\\$0xbb" 4 { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler-times "vpternlog\[dq\]\[ \\t\]+\\\$0xbb" 2 { target { ia32 } } } } */
> +/* { dg-final { scan-assembler-times "vpternlog\[dq\]\[ \\t\]+\\\$0xdd" 2 { target { ia32 } } } } */
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr100711-5.c
> @@ -0,0 +1,40 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mavx512bw -mno-avx512vl -mprefer-vector-width=512 -O2" } */
> +
> +typedef char v64qi __attribute__ ((vector_size (64)));
> +typedef short v32hi __attribute__ ((vector_size (64)));
> +typedef int v16si __attribute__ ((vector_size (64)));
> +typedef long long v8di __attribute__((vector_size (64)));
> +
> +v64qi foo_v64qi (char a, v64qi b)
> +{
> +    return (__extension__ (v64qi) {~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a,
> +                                   ~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a,
> +                                   ~a, ~a, ~a, ~a, ~a, ~a,
Re: [PATCH 3/5] x86: allow memory operand for AVX2 splitter for PR target/100711
On Wed, Jun 21, 2023 at 2:28 PM Jan Beulich via Gcc-patches
wrote:
>
> The intended broadcast (with AVX512) can very well be done right from
> memory.
Ok.
>
> gcc/
>
>         * config/i386/sse.md: Permit non-immediate operand 1 in AVX2
>         form of splitter for PR target/100711.
>
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -17356,7 +17356,7 @@
>          (and:VI_AVX2
>            (vec_duplicate:VI_AVX2
>              (not:
> -              (match_operand: 1 "register_operand")))
> +              (match_operand: 1 "nonimmediate_operand")))
>            (match_operand:VI_AVX2 2 "vector_operand")))]
>    "TARGET_AVX2"
>    [(set (match_dup 3)

--
BR,
Hongtao
Re: [PATCH 2/5] x86: use VPTERNLOG also for certain andnot forms
On Wed, Jun 21, 2023 at 2:27 PM Jan Beulich via Gcc-patches
wrote:
>
> When it's the memory operand which is to be inverted, using VPANDN*
> requires a further load instruction. The same can be achieved by a
> single VPTERNLOG*. Add two new alternatives (for plain memory and
> embedded broadcast), adjusting the predicate for the first operand
> accordingly.
>
> Two pre-existing testcases actually end up being affected (improved) by
> the change, which is reflected in updated expectations there.
LGTM.
>
> gcc/
>
>         PR target/93768
>         * config/i386/sse.md (*andnot3): Add new alternatives
>         for memory form operand 1.
>
> gcc/testsuite/
>
>         PR target/93768
>         * gcc.target/i386/avx512f-andn-di-zmm-2.c: New test.
>         * gcc.target/i386/avx512f-andn-si-zmm-2.c: Adjust expectations
>         towards generated code.
>         * gcc.target/i386/pr100711-3.c: Adjust expectations for 32-bit
>         code.
>
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -17210,11 +17210,13 @@
>    "TARGET_AVX512F")
>
>  (define_insn "*andnot3"
> -  [(set (match_operand:VI 0 "register_operand" "=x,x,v")
> +  [(set (match_operand:VI 0 "register_operand" "=x,x,v,v,v")
>          (and:VI
> -          (not:VI (match_operand:VI 1 "vector_operand" "0,x,v"))
> -          (match_operand:VI 2 "bcst_vector_operand" "xBm,xm,vmBr")))]
> -  "TARGET_SSE"
> +          (not:VI (match_operand:VI 1 "bcst_vector_operand" "0,x,v,m,Br"))
> +          (match_operand:VI 2 "bcst_vector_operand" "xBm,xm,vmBr,v,v")))]
> +  "TARGET_SSE
> +   && (register_operand (operands[1], mode)
> +       || register_operand (operands[2], mode))"
>  {
>    char buf[64];
>    const char *ops;
> @@ -17281,6 +17283,15 @@
>      case 2:
>        ops = "v%s%s\t{%%2, %%1, %%0|%%0, %%1, %%2}";
>        break;
> +    case 3:
> +    case 4:
> +      tmp = "pternlog";
> +      ssesuffix = "";
> +      if (which_alternative != 4 || TARGET_AVX512VL)
> +        ops = "v%s%s\t{$0x44, %%1, %%2, %%0|%%0, %%2, %%1, $0x44}";
> +      else
> +        ops = "v%s%s\t{$0x44, %%g1, %%g2, %%g0|%%g0, %%g2, %%g1, $0x44}";
> +      break;
>      default:
>        gcc_unreachable ();
>      }
> @@ -17289,7 +17300,7 @@
>    output_asm_insn (buf, operands);
>    return "";
>  }
> -  [(set_attr "isa" "noavx,avx,avx")
> +  [(set_attr "isa" "noavx,avx,avx,*,*")
>     (set_attr "type" "sselog")
>     (set (attr "prefix_data16")
>       (if_then_else
> @@ -17297,9 +17308,12 @@
>             (eq_attr "mode" "TI"))
>         (const_string "1")
>         (const_string "*")))
> -   (set_attr "prefix" "orig,vex,evex")
> +   (set_attr "prefix" "orig,vex,evex,evex,evex")
>     (set (attr "mode")
> -       (cond [(match_test "TARGET_AVX2")
> +       (cond [(and (eq_attr "alternative" "3,4")
> +                   (match_test " < 64 && !TARGET_AVX512VL"))
> +                (const_string "XI")
> +              (match_test "TARGET_AVX2")
>                  (const_string "")
>                (match_test "TARGET_AVX")
>                  (if_then_else
> @@ -17310,7 +17324,15 @@
>                      (match_test "optimize_function_for_size_p (cfun)"))
>                  (const_string "V4SF")
>               ]
> -             (const_string "")))])
> +             (const_string "")))
> +   (set (attr "enabled")
> +        (cond [(eq_attr "alternative" "3")
> +                 (symbol_ref " == 64 || TARGET_AVX512VL")
> +               (eq_attr "alternative" "4")
> +                 (symbol_ref " == 64 || TARGET_AVX512VL
> +                              || (TARGET_AVX512F && !TARGET_PREFER_AVX256)")
> +              ]
> +              (const_string "*")))])
>
>  ;; PR target/100711: Split notl; vpbroadcastd; vpand as vpbroadcastd; vpandn
>  (define_split
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/avx512f-andn-di-zmm-2.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mavx512f -mno-avx512vl -mprefer-vector-width=512 -O2" } */
> +/* { dg-final { scan-assembler-times "vpternlogq\[ \\t\]+\\\$0x44, \\(%(?:eax|rdi|edi)\\)\\\{1to\[1-8\]+\\\}, %zmm\[0-9\]+, %zmm0" 1 } } */
> +/* { dg-final { scan-assembler-not "vpbroadcast" } } */
> +
> +#define type __m512i
> +#define vec 512
> +#define op andnot
> +#define suffix epi64
> +#define SCALAR long long
> +
> +#include "avx512-binop-2.h"
> --- a/gcc/testsuite/gcc.target/i386/avx512f-andn-si-zmm-2.c
> +++ b/gcc/testsuite/gcc.target/i386/avx512f-andn-si-zmm-2.c
> @@ -1,7 +1,7 @@
>  /* { dg-do compile } */
>  /* { dg-options "-mavx512f -O2" } */
> -/* { dg-final { scan-assembler-times "vpbroadcastd\[^\n\]*%zmm\[0-9\]+" 1 } } */
> -/* { dg-final { scan-assembler-times "vpandnd\[^\n\]*%zmm\[0-9\]+" 1 } } */
> +/* { dg-final { scan-assembler-times "vpternlogd\[ \\t\]+\\\$0x44, \\(%(?:eax|rdi|edi)\\)\\\{1to\[1-8\]+\\\}, %zmm\[0-9\]+, %zmm0" 1 } } */
> +/* { dg-final { scan-assembler-not "vpbroadcast" } } */
>
>  #define type __m512i
>  #define vec 512
> --- a/gcc/testsuite/gcc.target/i386/pr100711-3.c
> +++
Re: [PATCH 1/5] x86: use VPTERNLOG for further bitwise two-vector operations
On Wed, Jun 21, 2023 at 2:26 PM Jan Beulich via Gcc-patches
wrote:
>
> All combinations of and, ior, xor, and not involving two operands can be
> expressed that way in a single insn.
>
> gcc/
>
>         PR target/93768
>         * config/i386/i386.cc (ix86_rtx_costs): Further special-case
>         bitwise vector operations.
>         * config/i386/sse.md (*iornot3): New insn.
>         (*xnor3): Likewise.
>         (*3): Likewise.
>         (andor): New code iterator.
>         (nlogic): New code attribute.
>         (ternlog_nlogic): Likewise.
>
> gcc/testsuite/
>
>         PR target/93768
>         gcc.target/i386/avx512-binop-not-1.h: New.
>         gcc.target/i386/avx512-binop-not-2.h: New.
>         gcc.target/i386/avx512f-orn-si-zmm-1.c: New test.
>         gcc.target/i386/avx512f-orn-si-zmm-2.c: New test.
> ---
> The use of VI matches that in e.g. one_cmpl2 /
> one_cmpl2 and *andnot3, despite
> (here and there)
> - V64QI and V32HI being needlessly excluded when AVX512BW isn't enabled,
> - VTI not being covered,
> - vector modes more narrow than 16 bytes not being covered.
>
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -21178,6 +21178,32 @@ ix86_rtx_costs (rtx x, machine_mode mode
>        return false;
>
>      case IOR:
> +      if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
> +        {
> +          /* (ior (not ...) ...) can be a single insn in AVX512.  */
> +          if (GET_CODE (XEXP (x, 0)) == NOT && TARGET_AVX512F
> +              && (GET_MODE_SIZE (mode) == 64
> +                  || (TARGET_AVX512VL
> +                      && (GET_MODE_SIZE (mode) == 32
> +                          || GET_MODE_SIZE (mode) == 16
> +            {
> +              rtx right = GET_CODE (XEXP (x, 1)) != NOT
> +                          ? XEXP (x, 1) : XEXP (XEXP (x, 1), 0);
> +
> +              *total = ix86_vec_cost (mode, cost->sse_op)
> +                       + rtx_cost (XEXP (XEXP (x, 0), 0), mode,
> +                                   outer_code, opno, speed)
> +                       + rtx_cost (right, mode, outer_code, opno, speed);
> +              return true;
> +            }
> +          *total = ix86_vec_cost (mode, cost->sse_op);
> +        }
> +      else if (GET_MODE_SIZE (mode) > UNITS_PER_WORD)
> +        *total = cost->add * 2;
> +      else
> +        *total = cost->add;
> +      return false;
> +
>      case XOR:
>        if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
>          *total = ix86_vec_cost (mode, cost->sse_op);
> @@ -21198,11 +21224,20 @@ ix86_rtx_costs (rtx x, machine_mode mode
>            /* pandn is a single instruction.  */
>            if (GET_CODE (XEXP (x, 0)) == NOT)
>              {
> +              rtx right = XEXP (x, 1);
> +
> +              /* (and (not ...) (not ...)) can be a single insn in AVX512.  */
> +              if (GET_CODE (right) == NOT && TARGET_AVX512F
> +                  && (GET_MODE_SIZE (mode) == 64
> +                      || (TARGET_AVX512VL
> +                          && (GET_MODE_SIZE (mode) == 32
> +                              || GET_MODE_SIZE (mode) == 16
> +                right = XEXP (right, 0);
> +
>                *total = ix86_vec_cost (mode, cost->sse_op)
>                         + rtx_cost (XEXP (XEXP (x, 0), 0), mode,
>                                     outer_code, opno, speed)
> -                       + rtx_cost (XEXP (x, 1), mode,
> -                                   outer_code, opno, speed);
> +                       + rtx_cost (right, mode, outer_code, opno, speed);
>                return true;
>              }
>            else if (GET_CODE (XEXP (x, 1)) == NOT)
> @@ -21260,8 +21295,25 @@ ix86_rtx_costs (rtx x, machine_mode mode
>
>      case NOT:
>        if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
> -        // vnot is pxor -1.
> -        *total = ix86_vec_cost (mode, cost->sse_op) + 1;
> +        {
> +          /* (not (xor ...)) can be a single insn in AVX512.  */
> +          if (GET_CODE (XEXP (x, 0)) == XOR && TARGET_AVX512F
> +              && (GET_MODE_SIZE (mode) == 64
> +                  || (TARGET_AVX512VL
> +                      && (GET_MODE_SIZE (mode) == 32
> +                          || GET_MODE_SIZE (mode) == 16
> +            {
> +              *total = ix86_vec_cost (mode, cost->sse_op)
> +                       + rtx_cost (XEXP (XEXP (x, 0), 0), mode,
> +                                   outer_code, opno, speed)
> +                       + rtx_cost (XEXP (XEXP (x, 0), 1), mode,
> +                                   outer_code, opno, speed);
> +              return true;
> +            }
> +
> +          // vnot is pxor -1.
> +          *total = ix86_vec_cost (mode, cost->sse_op) + 1;
> +        }
>        else if (GET_MODE_SIZE (mode) > UNITS_PER_WORD)
>          *total = cost->add * 2;
>        else
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -17616,6 +17616,98 @@
>    operands[2] = force_reg (V1TImode, CONSTM1_RTX (V1TImode));
>  })
>
> +(define_insn "*iornot3"
> +  [(set (match_operand:VI
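The same guard appears three times in these ix86_rtx_costs changes: a combined not/and/ior/xor is costed as one AVX-512 instruction only for 64-byte vectors with AVX512F, or for 32- and 16-byte vectors when AVX512VL is also enabled. As an illustration only (this is not part of the patch), the condition can be written as a stand-alone predicate; the function name and its plain-bool parameters are hypothetical stand-ins for GET_MODE_SIZE (mode), TARGET_AVX512F and TARGET_AVX512VL.

```c
#include <stdbool.h>

/* Hypothetical stand-alone form of the repeated guard: true when a
   two-operand logic combination fits a single AVX-512 ternary-logic insn. */
static bool avx512_single_logic_insn_p(unsigned mode_size, bool avx512f,
                                       bool avx512vl)
{
    if (!avx512f)
        return false;
    if (mode_size == 64)           /* zmm: AVX512F alone suffices */
        return true;
    return avx512vl && (mode_size == 32 || mode_size == 16); /* ymm/xmm need VL */
}
```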
[Bug target/110400] New: Reuse vector register for both scalar and vector value.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110400

            Bug ID: 110400
           Summary: Reuse vector register for both scalar and vector value.
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: crazylht at gmail dot com
  Target Milestone: ---

From PR109812 #c18
Uroš Bizjak 2023-06-21 09:46:43 UTC
One interesting observation: clang is able to do this:

  0.09 │ │ vmovddup    -0x8(%rdx,%rsi,1),%xmm3
  ...
  0.11 │ │ vfmadd231sd %xmm2,%xmm3,%xmm1
  ...
  0.74 │ │ vfmadd231pd %xmm2,%xmm3,%xmm0

It figures out that the duplicated V2DFmode value in %xmm3 can also be
accessed in the same register as a DFmode value.

OTOH, current gcc does:

  vmovsd      (%rsi,%rax,8), %xmm1
  ...
  vmovddup    %xmm1, %xmm4
  ...
  vfmadd231pd %xmm4, %xmm0, %xmm2
  ...
  vfmadd231sd %xmm1, %xmm0, %xmm3

The above code needs two registers.

The same happens with the testcase below:

typedef double v2df __attribute__((vector_size(16)));
v2df c;
double d;
void
foo (double* __restrict a)
{
  c = __extension__(v2df) {*a, *a};
  d = *a;
}

with option: -O2 -mavx2

GCC generates

foo(double*):
        vmovsd   (%rdi), %xmm0
        vmovddup %xmm0, %xmm1
        vmovsd   %xmm0, d(%rip)
        vmovapd  %xmm1, c(%rip)

Clang

foo(double*):                          # @foo(double*)
        vmovddup (%rdi), %xmm0         # xmm0 = mem[0,0]
        vmovaps  %xmm0, c(%rip)
        vmovlps  %xmm0, d(%rip)
        retq
[Bug target/110309] Wrong code for masked load expansion
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110309

--- Comment #4 from Hongtao.liu ---
Fixed for GCC14.

Note: the unspec is not added to maskstore since vpblendd doesn't support a memory destination, so there's no chance for a maskstore to be optimized to vpblendd?
[Bug target/110309] Wrong code for masked load expansion
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110309

--- Comment #3 from CVS Commits ---
The master branch has been updated by hongtao Liu:

https://gcc.gnu.org/g:c79476da46728e2ab17e0e546262d2f6377081aa

commit r14-2070-gc79476da46728e2ab17e0e546262d2f6377081aa
Author: liuhongt
Date:   Tue Jun 20 15:41:00 2023 +0800

    Refine maskloadmn pattern with UNSPEC_MASKLOAD.

    If mem_addr points to a memory region with less than whole vector size
    bytes of accessible memory and k is a mask that would prevent reading
    the inaccessible bytes from mem_addr, add UNSPEC_MASKLOAD to prevent
    it from being transformed to vpblendd.

    gcc/ChangeLog:

            PR target/110309
            * config/i386/sse.md (maskload):
            Refine pattern with UNSPEC_MASKLOAD.
            (maskload): Ditto.
            (*_load_mask): Extend mode iterator to
            VI12HFBF_AVX512VL.
            (*_load): Ditto.

    gcc/testsuite/ChangeLog:

            * gcc.target/i386/pr110309.c: New test.
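Why a masked load must not become a full load plus vpblendd can be shown with a scalar C model. In this sketch (hypothetical helper names, eight 32-bit elements, disabled elements zeroed as an assumption of the model), the masked form reads p[i] only for set mask bits and is therefore safe on a short buffer, while the blend form reads every element before selecting.

```c
#include <stddef.h>
#include <stdint.h>

#define NELTS 8   /* e.g. a 256-bit vector of 32-bit elements */

/* Masked load: element i is read only when bit i of mask is set; in this
   model, disabled elements are zeroed and their memory is never touched. */
static void masked_load(int32_t out[NELTS], const int32_t *p, uint8_t mask)
{
    for (size_t i = 0; i < NELTS; i++)
        out[i] = ((mask >> i) & 1u) ? p[i] : 0;
}

/* What a load + vpblendd rewrite would do: read the whole vector first,
   then select.  Reading p[i] for a disabled lane may run past the end of
   the object, which the masked form is allowed to rely on never happening. */
static void load_then_blend(int32_t out[NELTS], const int32_t *p, uint8_t mask)
{
    int32_t full[NELTS];
    for (size_t i = 0; i < NELTS; i++)
        full[i] = p[i];                 /* touches all NELTS elements */
    for (size_t i = 0; i < NELTS; i++)
        out[i] = ((mask >> i) & 1u) ? full[i] : 0;
}
```

Calling load_then_blend on a three-element buffer would be undefined behaviour in C, which mirrors the inaccessible-memory case the commit message describes.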
[Bug rtl-optimization/110237] gcc.dg/torture/pr58955-2.c is miscompiled by RTL scheduling after reload
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110237

--- Comment #9 from Hongtao.liu ---
> So we can simply clear only MEM_EXPR (and MEM_OFFSET), that cuts off the
> problematic part of alias analysis. Together with UNSPEC this might be
> enough to fix things.
>
Note maskstore won't be optimized to vpblendd since vpblendd doesn't support a memory destination, so I guess there's no need to use UNSPEC for maskstore?
Re: [PATCH] RISC-V: force arg and target to reg rtx under -O0
Hi, Li. Thanks for catching this!

I think it's better to fix this issue by doing:

-    emit_insn (gen_rtx_SET (gen_lowpart (e.vector_mode (), e.target), src));
+    emit_move_insn (gen_lowpart (e.vector_mode (), e.target), src);

Thanks.

juzhe.zh...@rivai.ai

From: Li Xu
Date: 2023-06-25 11:08
To: gcc-patches
CC: kito.cheng; palmer; juzhe.zhong; Li Xu
Subject: [PATCH] RISC-V: force arg and target to reg rtx under -O0

arg and target should be expanded to reg rtx during the expand pass.
Consider the following case:

void test_vlmul_ext_v_i8mf8_i8mf4(vint8mf8_t op1) {
  vint8mf4_t res = __riscv_vlmul_ext_v_i8mf8_i8mf4(op1);
}

Compilation fails with:

test.c: In function 'test_vlmul_ext_v_i8mf8_i8mf4':
test.c:5:1: error: unrecognizable insn:
    5 | }
      | ^
(insn 30 29 0 2 (set (mem/c:VNx2QI (reg/f:DI 143) [0 x+0 S[2, 2] A32])
        (mem/c:VNx2QI (reg/f:DI 148) [0 op1+0 S[2, 2] A16])) "test.c":4:18 -1
     (nil))
during RTL pass: vregs
test.c:5:1: internal compiler error: in extract_insn, at recog.cc:2791
0x7c61b8 _fatal_insn(char const*, rtx_def const*, char const*, int, char const*)
        ../.././riscv-gcc/gcc/rtl-error.cc:108
0x7c61d7 _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
        ../.././riscv-gcc/gcc/rtl-error.cc:116
0xed58a7 extract_insn(rtx_insn*)
        ../.././riscv-gcc/gcc/recog.cc:2791
0xb7f789 instantiate_virtual_regs_in_insn
        ../.././riscv-gcc/gcc/function.cc:1611
0xb7f789 instantiate_virtual_regs
        ../.././riscv-gcc/gcc/function.cc:1984

gcc/ChangeLog:

        * config/riscv/riscv-vector-builtins-bases.cc: force arg and target
        to reg rtx.

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/rvv/base/vlmul_ext-2.c: New test.

---
 gcc/config/riscv/riscv-vector-builtins-bases.cc       | 5 -
 gcc/testsuite/gcc.target/riscv/rvv/base/vlmul_ext-2.c | 8
 2 files changed, 12 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vlmul_ext-2.c

diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index c6c53dc13a5..f135f7971fa 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -1567,7 +1567,10 @@ public:
   {
     tree arg = CALL_EXPR_ARG (e.exp, 0);
     rtx src = expand_normal (arg);
-    emit_insn (gen_rtx_SET (gen_lowpart (e.vector_mode (), e.target), src));
+    if (MEM_P (e.target))
+      e.target = force_reg (GET_MODE (e.target), e.target);
+    emit_insn (gen_rtx_SET (gen_lowpart (e.vector_mode (), e.target),
+                            MEM_P (src) ? force_reg (GET_MODE (src), src) : src));
     return e.target;
   }
 };
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/vlmul_ext-2.c b/gcc/testsuite/gcc.target/riscv/rvv/base/vlmul_ext-2.c
new file mode 100644
index 000..2b088b53546
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/vlmul_ext-2.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O0" } */
+
+#include "riscv_vector.h"
+
+void test_vlmul_ext_v_i8mf8_i8mf4(vint8mf8_t op1) {
+  vint8mf4_t res = __riscv_vlmul_ext_v_i8mf8_i8mf4(op1);
+}
--
2.17.1
[PATCH] internal-fn: Fix bug of BIAS argument index
From: Ju-Zhe Zhong

When trying to enable LEN_MASK_{LOAD,STORE} in the RISC-V port, I found
that I had made a mistake in the argument index of BIAS.

This patch is an obvious fix. Ok for trunk?

gcc/ChangeLog:

        * internal-fn.cc (expand_partial_store_optab_fn): Fix bug of BIAS
        argument index.

---
 gcc/internal-fn.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 1c2fd487e2a..9017176dc7a 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -2991,7 +2991,7 @@ expand_partial_store_optab_fn (internal_fn ifn, gcall *stmt, convert_optab optab
       maskt = gimple_call_arg (stmt, 3);
       mask = expand_normal (maskt);
       create_input_operand ([3], mask, TYPE_MODE (TREE_TYPE (maskt)));
-      biast = gimple_call_arg (stmt, 4);
+      biast = gimple_call_arg (stmt, 5);
       bias = expand_normal (biast);
       create_input_operand ([4], bias, QImode);
       icode = convert_optab_handler (optab, TYPE_MODE (type), GET_MODE (mask));
--
2.36.3
[Bug tree-optimization/110371] [14 Regression] gfortran ICE "verify_gimple failed" in gfortran.dg/vect/pr51058-2.f90 since r14-2007
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110371 --- Comment #7 from Hongtao.liu --- (In reply to Hongtao.liu from comment #6) > (In reply to Thiago Jung Bauermann from comment #0) > > Created attachment 55387 [details] > > Output of running gfortran with -freport-bug > > > > In today's trunk (tested commit 33ebb0dff9bb "configure: Implement > > --enable-host-bind-now") I get these new failures on aarch64-linux-gnu: > > > > Running gcc:gcc.target/aarch64/sve/aarch64-sve.exp ... > > FAIL: gcc.target/aarch64/sve/pack_fcvt_signed_1.c scan-assembler-times > > \\tfcvtzs\\tz[0-9]+\\.s, p[0-7]/m, z[0-9]+\\.d\\n 2 > > FAIL: gcc.target/aarch64/sve/pack_fcvt_signed_1.c scan-assembler-times > > \\tuzp1\\tz[0-9]+\\.s, z[0-9]+\\.s, z[0-9]+\\.s\\n 1 > > FAIL: gcc.target/aarch64/sve/pack_fcvt_unsigned_1.c scan-assembler-times > > \\tfcvtzu\\tz[0-9]+\\.s, p[0-7]/m, z[0-9]+\\.d\\n 2 > > FAIL: gcc.target/aarch64/sve/pack_fcvt_unsigned_1.c scan-assembler-times > > \\tuzp1\\tz[0-9]+\\.s, z[0-9]+\\.s, z[0-9]+\\.s\\n 1 > > FAIL: gcc.target/aarch64/sve/unpack_fcvt_signed_1.c scan-assembler-times > > \\tscvtf\\tz[0-9]+\\.d, p[0-7]/m, z[0-9]+\\.s\\n 2 > > FAIL: gcc.target/aarch64/sve/unpack_fcvt_signed_1.c scan-assembler-times > > \\tzip1\\tz[0-9]+\\.s, z[0-9]+\\.s, z[0-9]+\\.s\\n 1 > > FAIL: gcc.target/aarch64/sve/unpack_fcvt_signed_1.c scan-assembler-times > > \\tzip2\\tz[0-9]+\\.s, z[0-9]+\\.s, z[0-9]+\\.s\\n 1 > > FAIL: gcc.target/aarch64/sve/unpack_fcvt_unsigned_1.c scan-assembler-times > > \\tucvtf\\tz[0-9]+\\.d, p[0-7]/m, z[0-9]+\\.s\\n 2 > > FAIL: gcc.target/aarch64/sve/unpack_fcvt_unsigned_1.c scan-assembler-times > > \\tzip1\\tz[0-9]+\\.s, z[0-9]+\\.s, z[0-9]+\\.s\\n 1 > > FAIL: gcc.target/aarch64/sve/unpack_fcvt_unsigned_1.c scan-assembler-times > > \\tzip2\\tz[0-9]+\\.s, z[0-9]+\\.s, z[0-9]+\\.s\\n 1 > > === gfortran tests === > > > > For this scan-assembler failures, It looks like gcc now generates better > code, is it ok to adjust testcase to match new assembly? 
>
> current:
>         ld1d    z31.d, p7/z, [x1, x3, lsl 3]
>         fadd    z31.d, p7/m, z31.d, z30.d
>         fcvtzs  z31.d, p6/m, z31.d
>         st1w    z31.d, p7, [x0, x3, lsl 2]
>         add     x3, x3, x4
>         whilelo p7.d, w3, w2
>         b.any   .L3
>
> vs original
>         punpklo p2.h, p0.b
>         punpkhi p1.h, p0.b
>         ld1d    z0.d, p2/z, [x1, x3, lsl 3]
>         ld1d    z1.d, p1/z, [x5, x3, lsl 3]
>         fadd    z0.d, p2/m, z0.d, z2.d
>         fadd    z1.d, p1/m, z1.d, z2.d
>         fcvtzs  z0.s, p3/m, z0.d
>         fcvtzs  z1.s, p3/m, z1.d
>         uzp1    z0.s, z0.s, z1.s
>         st1w    z0.s, p0, [x0, x3, lsl 2]
>         add     x3, x3, x4
>         whilelo p0.s, w3, w2
>         b.any   .L3
>
> https://godbolt.org/z/b4cW7WKev

Or only adjust the testcases for FLOAT_EXPR, not for FIX_TRUNC_EXPR, to avoid float-to-integer overflow.
[Bug tree-optimization/110371] [14 Regression] gfortran ICE "verify_gimple failed" in gfortran.dg/vect/pr51058-2.f90 since r14-2007
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110371

--- Comment #6 from Hongtao.liu ---
(In reply to Thiago Jung Bauermann from comment #0)
> Created attachment 55387 [details]
> Output of running gfortran with -freport-bug
>
> In today's trunk (tested commit 33ebb0dff9bb "configure: Implement
> --enable-host-bind-now") I get these new failures on aarch64-linux-gnu:
>
> Running gcc:gcc.target/aarch64/sve/aarch64-sve.exp ...
> FAIL: gcc.target/aarch64/sve/pack_fcvt_signed_1.c scan-assembler-times \\tfcvtzs\\tz[0-9]+\\.s, p[0-7]/m, z[0-9]+\\.d\\n 2
> FAIL: gcc.target/aarch64/sve/pack_fcvt_signed_1.c scan-assembler-times \\tuzp1\\tz[0-9]+\\.s, z[0-9]+\\.s, z[0-9]+\\.s\\n 1
> FAIL: gcc.target/aarch64/sve/pack_fcvt_unsigned_1.c scan-assembler-times \\tfcvtzu\\tz[0-9]+\\.s, p[0-7]/m, z[0-9]+\\.d\\n 2
> FAIL: gcc.target/aarch64/sve/pack_fcvt_unsigned_1.c scan-assembler-times \\tuzp1\\tz[0-9]+\\.s, z[0-9]+\\.s, z[0-9]+\\.s\\n 1
> FAIL: gcc.target/aarch64/sve/unpack_fcvt_signed_1.c scan-assembler-times \\tscvtf\\tz[0-9]+\\.d, p[0-7]/m, z[0-9]+\\.s\\n 2
> FAIL: gcc.target/aarch64/sve/unpack_fcvt_signed_1.c scan-assembler-times \\tzip1\\tz[0-9]+\\.s, z[0-9]+\\.s, z[0-9]+\\.s\\n 1
> FAIL: gcc.target/aarch64/sve/unpack_fcvt_signed_1.c scan-assembler-times \\tzip2\\tz[0-9]+\\.s, z[0-9]+\\.s, z[0-9]+\\.s\\n 1
> FAIL: gcc.target/aarch64/sve/unpack_fcvt_unsigned_1.c scan-assembler-times \\tucvtf\\tz[0-9]+\\.d, p[0-7]/m, z[0-9]+\\.s\\n 2
> FAIL: gcc.target/aarch64/sve/unpack_fcvt_unsigned_1.c scan-assembler-times \\tzip1\\tz[0-9]+\\.s, z[0-9]+\\.s, z[0-9]+\\.s\\n 1
> FAIL: gcc.target/aarch64/sve/unpack_fcvt_unsigned_1.c scan-assembler-times \\tzip2\\tz[0-9]+\\.s, z[0-9]+\\.s, z[0-9]+\\.s\\n 1
> === gfortran tests ===
>

For these scan-assembler failures, it looks like GCC now generates better code. Is it OK to adjust the testcases to match the new assembly?
current:
        ld1d    z31.d, p7/z, [x1, x3, lsl 3]
        fadd    z31.d, p7/m, z31.d, z30.d
        fcvtzs  z31.d, p6/m, z31.d
        st1w    z31.d, p7, [x0, x3, lsl 2]
        add     x3, x3, x4
        whilelo p7.d, w3, w2
        b.any   .L3

vs original
        punpklo p2.h, p0.b
        punpkhi p1.h, p0.b
        ld1d    z0.d, p2/z, [x1, x3, lsl 3]
        ld1d    z1.d, p1/z, [x5, x3, lsl 3]
        fadd    z0.d, p2/m, z0.d, z2.d
        fadd    z1.d, p1/m, z1.d, z2.d
        fcvtzs  z0.s, p3/m, z0.d
        fcvtzs  z1.s, p3/m, z1.d
        uzp1    z0.s, z0.s, z1.s
        st1w    z0.s, p0, [x0, x3, lsl 2]
        add     x3, x3, x4
        whilelo p0.s, w3, w2
        b.any   .L3

https://godbolt.org/z/b4cW7WKev
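For readers without an SVE toolchain at hand, the "current" loop corresponds roughly to a scalar kernel of the following shape: load a double, add a loop-invariant value, truncate to a 32-bit integer, store. This is a hypothetical sketch (the function name and signature are invented for illustration), not the actual source of pack_fcvt_signed_1.c. Note that in C the truncating conversion is undefined when the value does not fit in int32_t, which is the kind of float-to-integer overflow that matters when deciding which FIX_TRUNC_EXPR testcases to adjust.

```c
#include <stdint.h>

/* Hypothetical kernel with the same shape as the SVE loop above:
   ld1d (load double), fadd (add invariant), fcvtzs (truncate toward
   zero), st1w (store 32-bit result).  Values outside the int32_t
   range make the conversion undefined behaviour in C. */
void add_and_truncate(int32_t *restrict dst, const double *restrict src,
                      double addend, int n)
{
    for (int i = 0; i < n; i++)
        dst[i] = (int32_t)(src[i] + addend);
}
```

For example, with src = {1.5, -2.75, 3.0} and addend 0.25, the kernel stores {1, -2, 3}, since conversion truncates toward zero.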
[PATCH] RISC-V: force arg and target to reg rtx under -O0
The arg and target should be expanded to reg rtx during the expand pass. Consider the following case:

void test_vlmul_ext_v_i8mf8_i8mf4(vint8mf8_t op1) {
  vint8mf4_t res = __riscv_vlmul_ext_v_i8mf8_i8mf4(op1);
}

Compilation fails with:

test.c: In function 'test_vlmul_ext_v_i8mf8_i8mf4':
test.c:5:1: error: unrecognizable insn:
    5 | }
      | ^
(insn 30 29 0 2 (set (mem/c:VNx2QI (reg/f:DI 143) [0 x+0 S[2, 2] A32])
        (mem/c:VNx2QI (reg/f:DI 148) [0 op1+0 S[2, 2] A16])) "test.c":4:18 -1
     (nil))
during RTL pass: vregs
test.c:5:1: internal compiler error: in extract_insn, at recog.cc:2791
0x7c61b8 _fatal_insn(char const*, rtx_def const*, char const*, int, char const*)
        ../.././riscv-gcc/gcc/rtl-error.cc:108
0x7c61d7 _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
        ../.././riscv-gcc/gcc/rtl-error.cc:116
0xed58a7 extract_insn(rtx_insn*)
        ../.././riscv-gcc/gcc/recog.cc:2791
0xb7f789 instantiate_virtual_regs_in_insn
        ../.././riscv-gcc/gcc/function.cc:1611
0xb7f789 instantiate_virtual_regs
        ../.././riscv-gcc/gcc/function.cc:1984

The failing insn is a memory-to-memory set, which no move pattern accepts.

gcc/ChangeLog:

        * config/riscv/riscv-vector-builtins-bases.cc: Force arg and target to reg rtx.

gcc/testsuite/ChangeLog:

        * gcc.target/riscv/rvv/base/vlmul_ext-2.c: New test.
---
 gcc/config/riscv/riscv-vector-builtins-bases.cc       | 5 -
 gcc/testsuite/gcc.target/riscv/rvv/base/vlmul_ext-2.c | 8
 2 files changed, 12 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vlmul_ext-2.c

diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index c6c53dc13a5..f135f7971fa 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -1567,7 +1567,10 @@ public:
   {
     tree arg = CALL_EXPR_ARG (e.exp, 0);
     rtx src = expand_normal (arg);
-    emit_insn (gen_rtx_SET (gen_lowpart (e.vector_mode (), e.target), src));
+    if (MEM_P (e.target))
+      e.target = force_reg (GET_MODE (e.target), e.target);
+    emit_insn (gen_rtx_SET (gen_lowpart (e.vector_mode (), e.target),
+                            MEM_P (src) ? force_reg (GET_MODE (src), src) : src));
     return e.target;
   }
 };
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/vlmul_ext-2.c b/gcc/testsuite/gcc.target/riscv/rvv/base/vlmul_ext-2.c
new file mode 100644
index 000..2b088b53546
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/vlmul_ext-2.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O0" } */
+
+#include "riscv_vector.h"
+
+void test_vlmul_ext_v_i8mf8_i8mf4(vint8mf8_t op1) {
+  vint8mf4_t res = __riscv_vlmul_ext_v_i8mf8_i8mf4(op1);
+}
-- 
2.17.1
[Bug tree-optimization/110371] [14 Regression] gfortran ICE "verify_gimple failed" in gfortran.dg/vect/pr51058-2.f90 since r14-2007
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110371

--- Comment #5 from Hongtao.liu ---
Reproduced with

typedef struct dest
{
  double m[3][3];
} dest;

typedef struct src
{
  int m[3][3];
} src;

void
foo (dest *a, src* s)
{
  for (int i = 0; i != 3; i++)
    for (int j = 0; j != 3; j++)
      a->m[i][j] = s->m[i][j];
}

for aarch64-linux-gnu.

The problem is that when there is more than one vop in vec_oprnds0, vec_dest is overwritten with the final vectype_out after the first iteration, but the intermediate step here expects cvt_type. I'm testing the patch below:

Staged changes
1 file changed, 10 insertions(+), 4 deletions(-)
gcc/tree-vect-stmts.cc | 14 ++
modified gcc/tree-vect-stmts.cc
@@ -5044,7 +5044,7 @@ vectorizable_conversion (vec_info *vinfo,
                          gimple **vec_stmt, slp_tree slp_node,
                          stmt_vector_for_cost *cost_vec)
 {
-  tree vec_dest;
+  tree vec_dest, cvt_op;
   tree scalar_dest;
   tree op0, op1 = NULL_TREE;
   loop_vec_info loop_vinfo = dyn_cast (vinfo);
@@ -5568,6 +5568,13 @@ vectorizable_conversion (vec_info *vinfo,
     case NONE:
       vect_get_vec_defs (vinfo, stmt_info, slp_node, ncopies,
                          op0, _oprnds0);
+      /* vec_dest is intermediate type operand when multi_step_cvt.  */
+      if (multi_step_cvt)
+        {
+          cvt_op = vec_dest;
+          vec_dest = vec_dsts[0];
+        }
+
       FOR_EACH_VEC_ELT (vec_oprnds0, i, vop0)
         {
           /* Arguments are ready, create the new vector stmt.  */
@@ -5575,12 +5582,11 @@ vectorizable_conversion (vec_info *vinfo,
           if (multi_step_cvt)
             {
               gcc_assert (multi_step_cvt == 1);
-              new_stmt = vect_gimple_build (vec_dest, codecvt1, vop0);
-              new_temp = make_ssa_name (vec_dest, new_stmt);
+              new_stmt = vect_gimple_build (cvt_op, codecvt1, vop0);
+              new_temp = make_ssa_name (cvt_op, new_stmt);
               gimple_assign_set_lhs (new_stmt, new_temp);
               vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi);
               vop0 = new_temp;
-              vec_dest = vec_dsts[0];
             }
           new_stmt = vect_gimple_build (vec_dest, code1, vop0);
           new_temp = make_ssa_name (vec_dest, new_stmt);
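The reduced testcase can also be exercised at run time. The sketch below wraps foo from Comment #5 together with a hypothetical scalar model, int_to_double_two_step, of a two-step conversion through a 64-bit integer intermediate, assuming that is the role cvt_type plays here. Because widening int to a 64-bit integer is exact, a correct vectorization must produce exactly the direct conversion; the bug concerned which vector type the first step's result was given (vec_dest instead of cvt_op once more than one vop is processed), not the arithmetic.

```c
#include <stdint.h>

typedef struct dest { double m[3][3]; } dest;
typedef struct src { int m[3][3]; } src;

/* Reduced testcase from Comment #5 (ICEd when vectorized for aarch64). */
void foo (dest *a, src *s)
{
  for (int i = 0; i != 3; i++)
    for (int j = 0; j != 3; j++)
      a->m[i][j] = s->m[i][j];
}

/* Hypothetical scalar model of the multi-step conversion:
   step 1 (codecvt1) widens into the intermediate type,
   step 2 (code1) converts the intermediate into the final type.
   The intermediate type is an assumption for illustration. */
static double int_to_double_two_step (int x)
{
  int64_t wide = (int64_t) x;   /* intermediate value, cvt_type */
  return (double) wide;         /* final value, vectype_out */
}
```

Both steps are exact for every int value, so the check below can compare with ==.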
Re: Re: [PATCH V1] RISC-V:Add float16 tuple type support
This issue will be addressed by this patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/622440.html
but it is still waiting for Jakub's comments.

juzhe.zh...@rivai.ai

From: Andreas Schwab
Date: 2023-06-23 18:25
To: shiyulong
CC: gcc-patches; palmer; kito.cheng; jim.wilson.gcc; juzhe.zhong; pan2.li; wuwei2016; jiawei; shihua; dje.gcc; mirimmad
Subject: Re: [PATCH V1] RISC-V:Add float16 tuple type support

../../gcc/lto-streamer-out.cc: In function 'void lto_output_init_mode_table()':
../../gcc/lto-streamer-out.cc:3177:10: error: 'void* memset(void*, int, size_t)' forming offset [256, 283] is out of the bounds [0, 256] of object 'streamer_mode_table' with type 'unsigned char [256]' [-Werror=array-bounds=]
 3177 |   memset (streamer_mode_table, '\0', MAX_MACHINE_MODE);
      |   ~~~^
In file included from ../../gcc/gimple-streamer.h:25,
                 from ../../gcc/lto-streamer-out.cc:33:
../../gcc/tree-streamer.h:78:22: note: 'streamer_mode_table' declared here
   78 | extern unsigned char streamer_mode_table[1 << 8];
      |                      ^~~
cc1plus: all warnings being treated as errors
make[3]: *** [Makefile:1180: lto-streamer-out.o] Error 1

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1
"And now for something completely different."
[PATCHv4, rs6000] Splat vector small V2DI constants with ISA 2.07 instructions [PR104124]
Hi,
  This patch adds a new insn for vector splat with small V2DI constants on P8. If the value of the constant is in RANGE (-16, 15) and is not 0 or -1, it can be loaded with vspltisw and vupkhsw on P8. This should be more efficient than loading the vector from memory.

  Compared to the last version, the main change is to remove the new constraint, use a super constraint in the insn, and move the check into the insn condition.

  Bootstrapped and tested on powerpc64-linux BE and LE with no regressions.

Thanks
Gui Haochen

ChangeLog
2023-06-25  Haochen Gui

gcc/
        PR target/104124
        * config/rs6000/altivec.md (*altivec_vupkhs_direct): Rename
        to...
        (altivec_vupkhs_direct): ...this.
        * config/rs6000/predicates.md (vspltisw_vupkhsw_constant_split): New
        predicate to test if a constant can be loaded with vspltisw and
        vupkhsw.
        (easy_vector_constant): Call vspltisw_vupkhsw_constant_p to check if
        a vector constant can be synthesized with a vspltisw and a vupkhsw.
        * config/rs6000/rs6000-protos.h (vspltisw_vupkhsw_constant_p):
        Declare.
        * config/rs6000/rs6000.cc (vspltisw_vupkhsw_constant_p): New
        function to return true if OP mode is V2DI and can be synthesized
        with vupkhsw and vspltisw.
        * config/rs6000/vsx.md (*vspltisw_v2di_split): New insn to load up
        constants with vspltisw and vupkhsw.

gcc/testsuite/
        PR target/104124
        * gcc.target/powerpc/pr104124.c: New.
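The idea can be checked with a small scalar model. The sketch below, with hypothetical function names, emulates what vspltisw followed by vupkhsw computes and encodes the candidate test as the description states it (in [-16, 15], not 0 or -1). It is an illustration under those assumptions, not the rs6000 backend code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of vspltisw: splat a signed 5-bit immediate (-16..15) into all
   four 32-bit words of a vector register. */
static void model_vspltisw(int32_t out[4], int imm5)
{
    for (int i = 0; i < 4; i++)
        out[i] = (int32_t)imm5;
}

/* Model of vupkhsw: sign-extend the two high-order 32-bit words
   (elements 0 and 1 in big-endian element numbering) into two 64-bit
   doublewords.  After a splat all words are equal, so element order
   does not affect the result here. */
static void model_vupkhsw(int64_t out[2], const int32_t in[4])
{
    out[0] = (int64_t)in[0];
    out[1] = (int64_t)in[1];
}

/* Candidate test following the patch description: the value must fit
   the vspltisw immediate range [-16, 15] and must not be 0 or -1.
   Those two have the same bit pattern at every element width, so a
   single splat already yields the V2DI constant. */
static bool v2di_splat_candidate_p(int64_t value)
{
    return value >= -16 && value <= 15 && value != 0 && value != -1;
}
```

Sign extension is what makes the two-instruction sequence work for negative values: a splatted word of -5 becomes a doubleword of -5, not 0x00000000FFFFFFFB.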
patch.diff diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md index 49b0c964f4d..2c932854c33 100644 --- a/gcc/config/rs6000/altivec.md +++ b/gcc/config/rs6000/altivec.md @@ -2542,7 +2542,7 @@ (define_insn "altivec_vupkhs" } [(set_attr "type" "vecperm")]) -(define_insn "*altivec_vupkhs_direct" +(define_insn "altivec_vupkhs_direct" [(set (match_operand:VP 0 "register_operand" "=v") (unspec:VP [(match_operand: 1 "register_operand" "v")] UNSPEC_VUNPACK_HI_SIGN_DIRECT))] diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md index 52c65534e51..f62a4d9b506 100644 --- a/gcc/config/rs6000/predicates.md +++ b/gcc/config/rs6000/predicates.md @@ -694,6 +694,12 @@ (define_predicate "xxspltib_constant_split" return num_insns > 1; }) +;; Return true if the operand is a constant that can be loaded with a vspltisw +;; instruction and then a vupkhsw instruction. + +(define_predicate "vspltisw_vupkhsw_constant_split" + (and (match_code "const_vector") + (match_test "vspltisw_vupkhsw_constant_p (op, mode)"))) ;; Return 1 if the operand is constant that can loaded directly with a XXSPLTIB ;; instruction. @@ -742,6 +748,11 @@ (define_predicate "easy_vector_constant" && xxspltib_constant_p (op, mode, _insns, )) return true; + /* V2DI constant within RANGE (-16, 15) can be synthesized with a +vspltisw and a vupkhsw. 
*/ + if (vspltisw_vupkhsw_constant_p (op, mode, )) + return true; + return easy_altivec_constant (op, mode); } diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h index 1a4fc1df668..00cb2d82953 100644 --- a/gcc/config/rs6000/rs6000-protos.h +++ b/gcc/config/rs6000/rs6000-protos.h @@ -32,6 +32,7 @@ extern void init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, int, int, int, extern int easy_altivec_constant (rtx, machine_mode); extern bool xxspltib_constant_p (rtx, machine_mode, int *, int *); +extern bool vspltisw_vupkhsw_constant_p (rtx, machine_mode, int * = nullptr); extern int vspltis_shifted (rtx); extern HOST_WIDE_INT const_vector_elt_as_int (rtx, unsigned int); extern bool macho_lo_sum_memory_operand (rtx, machine_mode); diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc index 3be5860dd9b..ae34a02b282 100644 --- a/gcc/config/rs6000/rs6000.cc +++ b/gcc/config/rs6000/rs6000.cc @@ -6638,6 +6638,36 @@ xxspltib_constant_p (rtx op, return true; } +/* Return true if OP mode is V2DI and can be synthesized with ISA 2.07 + instructions vupkhsw and vspltisw. + + Return the constant that is being split via CONSTANT_PTR. */ + +bool +vspltisw_vupkhsw_constant_p (rtx op, machine_mode mode, int *constant_ptr) +{ + HOST_WIDE_INT value; + rtx elt; + + if (!TARGET_P8_VECTOR) +return false; + + if (mode != V2DImode) +return false; + + if (!const_vec_duplicate_p (op, )) +return false; + + value = INTVAL (elt); + if (value == 0 || value == 1 + || !EASY_VECTOR_15 (value)) +return false; + + if (constant_ptr) +*constant_ptr = (int) value; + return true; +} + const char * output_vec_const_move (rtx *operands) { diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md index 7d845df5c2d..4919b073e50 100644 --- a/gcc/config/rs6000/vsx.md +++ b/gcc/config/rs6000/vsx.md @@ -1174,6 +1174,30 @@ (define_insn_and_split "*xxspltib__split" [(set_attr "type" "vecperm") (set_attr "length" "8")]) +(define_insn_and_split
[Bug middle-end/13421] IA32 bigmem pointer subtraction and –ftrapv option causes unjustified program abort
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=13421

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |baiwfg2 at gmail dot com

--- Comment #16 from Andrew Pinski ---
*** Bug 110399 has been marked as a duplicate of this bug. ***
[Bug middle-end/110399] pointer substraction causes coredump with ftrapv on edge case
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110399

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |DUPLICATE
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #2 from Andrew Pinski ---
Dup of bug 13421.

*** This bug has been marked as a duplicate of bug 13421 ***
[Bug middle-end/110399] pointer substraction causes coredump with ftrapv on edge case
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110399

--- Comment #1 from Andrew Pinski ---
32 bit, w1=2
w2=2
w3=2
w4=0
w5=2

Program received signal SIGABRT, Aborted.
[Bug c/110399] New: pointer substraction causes coredump with ftrapv on edge case
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110399

            Bug ID: 110399
           Summary: pointer substraction causes coredump with ftrapv on edge case
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c
          Assignee: unassigned at gcc dot gnu.org
          Reporter: baiwfg2 at gmail dot com
  Target Milestone: ---

The demo code is:

```c
#include <stdint.h>
#include <stdio.h>

int main() {
  {
    char *p = (char *)0x8001;
    char *q = (char *)0x7fff;
    uint32_t w = p - q;
    printf("32 bit, w1=%u\n", w);
  }
  {
    char *p = (char *)0x7fff;
    char *q = (char *)0x7ffd;
    uint32_t w2 = p - q;
    printf("w2=%u\n", w2);
  }
  {
    char *p = (char *)0x8003;
    char *q = (char *)0x8001;
    uint32_t w3 = p - q;
    printf("w3=%u\n", w3);
  }
  {
    char *p = (char *)0x8001;
    char *q = (char *)0x0001;
    uint32_t w4 = p - q;
    printf("w4=%u\n", w4); // ans is 0, not crash under -ftrapv
  }
  {
    char *p = (char *)0x8001;
    char *q = (char *)0x7fff;
    uint32_t w5 = (uintptr_t)p - (uintptr_t)q;
    printf("w5=%u\n", w5);
  }
  {
    char *p = (char *)0x8001; // use uint8_t also crash
    char *q = (char *)0x7fff; // use smaller num 0x0011, also crash
    uint32_t w6 = p - q;
    printf("w6=%u\n", w6); // crash under gcc -ftrapv, not crash under clang -ftrapv
  }
  return 0;
}
```

The statement w6 = p - q causes a core dump, but what the program actually means is unsigned pointer arithmetic. How can I make it right (that is, output 2) with the -ftrapv option? It works fine with clang -ftrapv. This happens on many GCC versions.
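The w5 case in the report already shows the usual way around this: subtract the addresses as uintptr_t, where wraparound is well defined and -ftrapv (which instruments signed arithmetic) has no signed overflow to trap on. Plain pointer subtraction yields a signed ptrdiff_t and is only defined in C for pointers into the same array object, so the p - q forms in the report fall outside what the language guarantees anyway. A minimal sketch of the uintptr_t approach, with a hypothetical helper name:

```c
#include <stdint.h>

/* Hypothetical helper: distance in bytes from Q to P, computed on
   uintptr_t.  Unsigned arithmetic wraps modulo 2^N instead of
   overflowing, so -ftrapv has nothing to trap on.  The result is
   truncated to 32 bits to match the uint32_t used in the report. */
static uint32_t byte_distance(const char *p, const char *q)
{
    return (uint32_t)((uintptr_t)p - (uintptr_t)q);
}
```

For example, with char buf[8], byte_distance(buf + 5, buf + 3) evaluates to 2.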
Re: [PATCH] RISCV: Add -m(no)-omit-leaf-frame-pointer support.
On Sat, Jun 24, 2023, at 11:01 AM, Jeff Law via Gcc-patches wrote:
> On 6/21/23 02:14, Wang, Yanzhang wrote:
>> Hi Jeff, sorry for the late reply.
>>
>>> The long branch handling is done at the assembler level.  So the clobbering
>>> of $ra isn't visible to the compiler.  Thus the compiler has to be
>>> extremely careful to not hold values in $ra because the assembler may
>>> clobber $ra.
>>
>> If the assembler modifies the $ra behavior, it seems the rules we defined in
>> riscv.cc will be ignored. For example, the $ra saving generated by this
>> patch may be modified by the assembler, and everything that depends on it
>> will be wrong. So implementing the long jump in the compiler is better.
> Basically correct.  The assembler potentially clobbers $ra.  That's why
> in the long jump patches $ra becomes a fixed register -- the compiler
> doesn't know when it's clobbered by the assembler.
>
> Even if this were done in the compiler, we'd still have to do something
> special with $ra.  The point at which decisions about register
> allocation and such are made is before the point where we know the final
> positions of jumps/labels.  It's a classic problem in GCC's design.

Do you have a reference for more information on the long jump patches?  I'm
particularly curious about why $ra was selected as the temporary instead of
$t1 like the tail pseudoinstruction uses.

-s
RE: [PATCH] SSA ALIAS: Apply LEN_MASK_{LOAD, STORE} into SSA alias analysis
Committed, thanks Jeff.

Pan

-----Original Message-----
From: Gcc-patches On Behalf Of Jeff Law via Gcc-patches
Sent: Saturday, June 24, 2023 10:09 PM
To: 钟居哲; gcc-patches
Cc: rguenther; richard.sandiford
Subject: Re: [PATCH] SSA ALIAS: Apply LEN_MASK_{LOAD, STORE} into SSA alias analysis

On 6/23/23 17:21, 钟居哲 wrote:
> Not sure since I saw MASK_STORE/LEN_STORE didn't compute size.
Yea, I think you're right.  We take the size from the LHS.  My mistake.

This is fine for the trunk.
jeff
Re: [PATCH v2] x86: make better use of VBROADCASTSS / VPBROADCASTD
On Sun, Jun 25, 2023 at 9:17 AM Liu, Hongtao wrote:
>
>
>
> > -----Original Message-----
> > From: Jan Beulich
> > Sent: Wednesday, June 21, 2023 8:40 PM
> > To: Hongtao Liu
> > Cc: gcc-patches@gcc.gnu.org; Kirill Yukhin; Liu, Hongtao
> > Subject: Re: [PATCH v2] x86: make better use of VBROADCASTSS / VPBROADCASTD
> >
> > On 21.06.2023 09:44, Jan Beulich wrote:
> > > On 21.06.2023 09:37, Hongtao Liu wrote:
> > >> On Wed, Jun 21, 2023 at 2:06 PM Jan Beulich via Gcc-patches
> > >> wrote:
> > >>>
> > >>> Isn't prefix_extra use bogus here?  What extra prefix does
> > >>> vbroadcastss
> > >> According to comments, yes, no extra prefix is needed.
> > >>
> > >> ;; There are also additional prefixes in 3DNOW, SSSE3.
> > >> ;; ssemuladd,sse4arg default to 0f24/0f25 and DREX byte,
> > >> ;; sseiadd1,ssecvt1 to 0f7a with no DREX byte.
> > >> ;; 3DNOW has 0f0f prefix, SSSE3 and SSE4_{1,2} 0f38/0f3a.
> > >
> > > Right, that's what triggered my question.  I guess dropping these
> > > "prefix_extra" really wants to be a separate patch (or maybe even
> > > multiple, but it's hard to see how to split), dealing with all of the
> > > instances which likely have accumulated simply via copy-and-paste.
> >
> > Or wait - I'm altering those lines anyway, so I could as well drop them right
> > away (and slightly shrink patch size), if that's okay with you.  Of course I
> > should then not forget to also mention this in the changelog entry.
> >
> Yes.
>Would you be okay for me to fold in that adjustment, or do you
>insist on a separate patch?
Also for this, no need for a separate patch.
>
> Jan

-- 
BR,
Hongtao
RE: [PATCH v2] x86: make better use of VBROADCASTSS / VPBROADCASTD
> -----Original Message-----
> From: Jan Beulich
> Sent: Wednesday, June 21, 2023 8:40 PM
> To: Hongtao Liu
> Cc: gcc-patches@gcc.gnu.org; Kirill Yukhin; Liu, Hongtao
> Subject: Re: [PATCH v2] x86: make better use of VBROADCASTSS / VPBROADCASTD
>
> On 21.06.2023 09:44, Jan Beulich wrote:
> > On 21.06.2023 09:37, Hongtao Liu wrote:
> >> On Wed, Jun 21, 2023 at 2:06 PM Jan Beulich via Gcc-patches
> >> wrote:
> >>>
> >>> Isn't prefix_extra use bogus here?  What extra prefix does
> >>> vbroadcastss
> >> According to comments, yes, no extra prefix is needed.
> >>
> >> ;; There are also additional prefixes in 3DNOW, SSSE3.
> >> ;; ssemuladd,sse4arg default to 0f24/0f25 and DREX byte,
> >> ;; sseiadd1,ssecvt1 to 0f7a with no DREX byte.
> >> ;; 3DNOW has 0f0f prefix, SSSE3 and SSE4_{1,2} 0f38/0f3a.
> >
> > Right, that's what triggered my question.  I guess dropping these
> > "prefix_extra" really wants to be a separate patch (or maybe even
> > multiple, but it's hard to see how to split), dealing with all of the
> > instances which likely have accumulated simply via copy-and-paste.
>
> Or wait - I'm altering those lines anyway, so I could as well drop them right
> away (and slightly shrink patch size), if that's okay with you.  Of course I
> should then not forget to also mention this in the changelog entry.
>
Yes.
> Jan
[Bug tree-optimization/110371] [14 Regression] gfortran ICE "verify_gimple failed" in gfortran.dg/vect/pr51058-2.f90 since r14-2007
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110371 --- Comment #4 from Hongtao.liu --- I'll take a look.
[Bug ada/110398] New: Program_Error sem_eval.adb:4635 explicit raise
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110398

            Bug ID: 110398
           Summary: Program_Error sem_eval.adb:4635 explicit raise
           Product: gcc
           Version: 13.1.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: ada
          Assignee: unassigned at gcc dot gnu.org
          Reporter: aj at ianozi dot com
                CC: dkm at gcc dot gnu.org
  Target Milestone: ---

Tested with godbolt: https://ada.godbolt.org/z/ezqshsxzo
Also tested on version 12 (the link uses version 13).

Steps to reproduce:

1) Create "foo.ads" with:
```
package Foo is
   subtype Bar is String (1 .. 3)
     with Dynamic_Predicate => Bar in "ABC" | "DEF";
end Foo;
```

2) Create "foobar.ads" with:
```
with Foo;
package Foobar is
   subtype Foo_Bar is Foo.Bar;
end Foobar;
```

3) Create "foobar-nested.ads" with:
```
package Foobar.Nested is
   function Test_Function (Item : Foo_Bar) return Boolean is (True);
end Foobar.Nested;
```

4) Create "example.adb" with:
```
with Foobar.Nested;
procedure Example is
   Bug : constant Boolean := Foobar.Nested.Test_Function ("ABC");
begin
   null;
end Example;
```

It fails with:
```
gcc -c -I/app/ -g -fdiagnostics-color=always -S -fverbose-asm -masm=intel -o /app/example.s -I-
gnatmake: "" compilation error
+===GNAT BUG DETECTED==+
| 13.1.0 (x86_64-linux-gnu) Program_Error sem_eval.adb:4635 explicit raise |
| Error detected at example.adb:3:42                                       |
| Compiling                                                                |
| Please submit a bug report; see https://gcc.gnu.org/bugs/ .              |
| Use a subject line meaningful to you and us to track the bug.            |
| Include the entire contents of this bug box in the report.               |
| Include the exact command that you entered.                              |
| Also include sources listed below.                                       |
+==+

Please include these source files with error report
Note that list may not be accurate in some cases,
so please double check that the problem can still
be reproduced with the set of files listed.
Consider also -gnatd.n switch (see debug.adb).

/app/foobar.ads
/app/foo.ads
/app/foobar-nested.ads

compilation abandoned
Compiler returned: 4
```
(I took this from godbolt, but the same error happens on my local systems.)

If I change the definition of "Test_Function" to the following, it works, so I'm guessing it has to do with the subtype:
```
function Test_Function (Item : Foo.Bar) return Boolean is (True);
```
RE: [PATCH V1] RISC-V:Add float16 tuple type abi
Committed, thanks Jeff.

Pan

-----Original Message-----
From: Jeff Law
Sent: Saturday, June 24, 2023 10:51 PM
To: juzhe.zh...@rivai.ai; yulong; gcc-patches
Cc: palmer; Kito.cheng; Li, Pan2; wuwei2016; jiawei; shihua; dje.gcc; pinskia; Robin Dapp
Subject: Re: [PATCH V1] RISC-V:Add float16 tuple type abi

On 6/21/23 01:46, juzhe.zh...@rivai.ai wrote:
> LGTM. Thanks.
OK from me as well.
jeff
[Bug middle-end/77294] __builtin_object_size inconsistent for member arrays
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77294

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64715

--- Comment #2 from Andrew Pinski ---
I think this is a dup of bug 64715.
[Bug middle-end/44384] builtin_object_size_ treatment of multidimensional arrays is unexpected
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44384

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |siddhesh at gcc dot gnu.org

--- Comment #6 from Andrew Pinski ---
*** Bug 110373 has been marked as a duplicate of this bug. ***
[Bug tree-optimization/110373] __builtin_object_size does not recognize subarrays in multi-dimensional arrays
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110373

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |DUPLICATE

--- Comment #1 from Andrew Pinski ---
Dup of bug 44384.

*** This bug has been marked as a duplicate of bug 44384 ***
[Bug middle-end/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173

--- Comment #32 from Vincent Lefèvre ---
(In reply to Jakub Jelinek from comment #31)
> (In reply to Vincent Lefèvre from comment #30)
> > (In reply to Jakub Jelinek from comment #29)
> > > I mean that if the compiler can't see it is in [0, 1], it will need
> > > to use 2 additions and or the 2 carry bits together.  But, because
> > > the ored carry bits are in [0, 1] range, all the higher limbs could
> > > be done using addc.
> >
> > If the compiler can't see that carryin is in [0, 1], then it must not "or"
> > the carry bits; it needs to add them, as carryout may be 2.
>
> That is not how the clang builtin works, which is why I've implemented the |
> and documented it that way, as it is a compatibility builtin.

I'm confused. In Comment 14, you said that

  *carry_out = c1 + c2;

was used. This is an addition, not an OR.
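The disagreement is about how the two partial carries are combined. When the carry-in is known to be 0 or 1, the carries from a + b and from (a + b) + carry_in can never both be set, so OR and ADD give the same result; with a wider carry-in both can be set, and only addition yields 2. A minimal sketch of that arithmetic using GCC's __builtin_add_overflow (addc_u64 and its result struct are hypothetical names, not the builtin under discussion):

```c
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint64_t sum;
    unsigned carry_add;  /* c1 + c2: can be 2 if carry_in > 1 */
    unsigned carry_or;   /* c1 | c2: never exceeds 1 */
} addc_result;

/* Two-step add-with-carry: a + b, then + carry_in, recording the
   carry-out of each step and both ways of combining them. */
static addc_result addc_u64(uint64_t a, uint64_t b, uint64_t carry_in)
{
    uint64_t s1, s2;
    bool c1 = __builtin_add_overflow(a, b, &s1);
    bool c2 = __builtin_add_overflow(s1, carry_in, &s2);
    addc_result r = { s2, (unsigned)c1 + (unsigned)c2, (unsigned)(c1 | c2) };
    return r;
}
```

For carry_in in {0, 1}: if a + b overflows, s1 is at most 2^64 - 2, so adding 1 cannot overflow again. With carry_in = 2 and a = b = UINT64_MAX, both steps overflow and the two combinations differ (2 versus 1).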
[Bug c++/110395] GCOV stuck in an infinite loop with large std::array
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110395 --- Comment #3 from Andrew Pinski --- Note it is not an infinite loop, just many basic blocks (over 4 of them) causing the performance to be very very slow.
[Bug c++/110395] GCOV stuck in an infinite loop with large std::array
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110395

Andrew Pinski changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #2 from Andrew Pinski ---
Fixed in GCC 12.1.0 by the same patch which fixed PR 92385.
[Bug gcov-profile/110395] GCOV stuck in an infinite loop with large std::array
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110395

--- Comment #1 from Andrew Pinski ---
On the trunk it takes no time at all:

[apinski@xeond2 upstream-gcc-git]$ ~/upstream-gcc/bin/g++ t.cc --coverage
[apinski@xeond2 upstream-gcc-git]$ LD_LIBRARY_PATH=~/upstream-gcc/lib64 ./a.out
[apinski@xeond2 upstream-gcc-git]$ LD_LIBRARY_PATH=~/upstream-gcc/lib64 ~/upstream-gcc/bin/gcov t.cc
t.gcno:cannot open notes file
t.gcda:cannot open data file, assuming not executed
No executable lines
[apinski@xeond2 upstream-gcc-git]$ LD_LIBRARY_PATH=~/upstream-gcc/lib64 ~/upstream-gcc/bin/gcov a-t.cc
File '/home/apinski/upstream-gcc/include/c++/14.0.0/bits/stl_construct.h'
Lines executed:100.00% of 4
Creating 'stl_construct.h.gcov'

File '/home/apinski/upstream-gcc/include/c++/14.0.0/bits/new_allocator.h'
Lines executed:50.00% of 4
Creating 'new_allocator.h.gcov'

File '/home/apinski/upstream-gcc/include/c++/14.0.0/bits/stl_vector.h'
Lines executed:95.45% of 22
Creating 'stl_vector.h.gcov'

File '/home/apinski/upstream-gcc/include/c++/14.0.0/bits/alloc_traits.h'
Lines executed:66.67% of 3
Creating 'alloc_traits.h.gcov'

File '/home/apinski/upstream-gcc/include/c++/14.0.0/bits/allocator.h'
Lines executed:100.00% of 1
Creating 'allocator.h.gcov'

File 't.cc'
Lines executed:100.00% of 5
Creating 't.cc.gcov'

File '/home/apinski/upstream-gcc/include/c++/14.0.0/array'
No executable lines
Removing 'array.gcov'

Lines executed:89.74% of 39

real    0m0.043s
user    0m0.004s
sys     0m0.002s
Re: Patch regarding addition of .symtab while generating object file from libiberty [WIP]
> Hi,
Hi,
I am sorry for the late reaction.
> I am working on the GSOC project "Bypass Assembler when generating LTO
> object files." So as a first step, I am adding .symtab along with
> __gnu_lto_slim symbol into it so that at a later stage, it can be
> recognized that this object file has been produced using -flto enabled.
> This patch is regarding the same. Although I am still testing this patch, I
> want general feedback on my code and design choice.
> I have extended simple_object_write_struct to hold a list of symbols (
> similar to sections ). A function in simple-object.c to add symbols. I am
> calling this function in lto-object.cc to add __gnu_lto_v1.
> Right now, as we are only working on ELF support first, I am adding .symtab
> in elf object files only.
>
> ---
>  gcc/lto-object.cc                |   4 +-
>  include/simple-object.h          |  10 +++
>  libiberty/simple-object-common.h |  18 +
>  libiberty/simple-object-elf.c    | 130 +--
>  libiberty/simple-object.c        |  32
>  5 files changed, 187 insertions(+), 7 deletions(-)
>
> diff --git a/gcc/lto-object.cc b/gcc/lto-object.cc
> index cb1c3a6cfb3..680977cb327 100644
> --- a/gcc/lto-object.cc
> +++ b/gcc/lto-object.cc
> @@ -187,7 +187,9 @@ lto_obj_file_close (lto_file *file)
>    int err;
>
>    gcc_assert (lo->base.offset == 0);
> -
> +  /*Add __gnu_lto_slim symbol*/
> +  if(flag_bypass_asm)
> +    simple_object_write_add_symbol (lo->sobj_w, "__gnu_lto_slim",1,1);

You can probably do this unconditionally.  The ltrans files we produce
are kind of wrong by missing the symbol table currently.

> +simple_object_write_add_symbol(simple_object_write *sobj, const char *name,
> +size_t size, unsigned int align);

Symbols have many more properties in addition to sizes and alignments.
We will eventually need to get DWARF writing, so we will need to support
them.  However, right now we only do these fake LTO object symbols, so
perhaps to start we could keep things simple and assume that size is
always 0 and align always 1 or so.
Overall this looks like a really good start to me (both the API and the implementation look reasonable to me, and it is good that you follow the coding conventions).  I guess you can create a branch (see the git info on the GCC homepage) and put the patch there?

I am also adding Ian to CC, as he is the maintainer of simple-object and he may have some ideas.

Honza
>
>  /* Release all resources associated with SIMPLE_OBJECT, including any
>     simple_object_write_section's that may have been created.  */
> diff --git a/libiberty/simple-object-common.h b/libiberty/simple-object-common.h
> index b9d10550d88..df99c9d85ac 100644
> --- a/libiberty/simple-object-common.h
> +++ b/libiberty/simple-object-common.h
> @@ -58,6 +58,24 @@ struct simple_object_write_struct
>    simple_object_write_section *last_section;
>    /* Private data for the object file format.  */
>    void *data;
> +  /*The start of the list of symbols.*/
> +  simple_object_symbol *symbols;
> +  /*The last entry in the list of symbols*/
> +  simple_object_symbol *last_symbol;
> +};
> +
> +/*A symbol in object file being created*/
> +struct simple_object_symbol_struct
> +{
> +  /*Next in the list of symbols attached to an
> +  simple_object_write*/
> +  simple_object_symbol *next;
> +  /*The name of this symbol. */
> +  char *name;
> +  /* Symbol value */
> +  unsigned int align;
> +  /* Symbol size */
> +  size_t size;
>  };
>
>  /* A section in an object file being created.  */
> diff --git a/libiberty/simple-object-elf.c b/libiberty/simple-object-elf.c
> index eee07039984..cbba88186bd 100644
> --- a/libiberty/simple-object-elf.c
> +++ b/libiberty/simple-object-elf.c
> @@ -787,9 +787,9 @@ simple_object_elf_write_ehdr (simple_object_write *sobj, int descriptor,
>      ++shnum;
>    if (shnum > 0)
>      {
> -      /* Add a section header for the dummy section and one for
> -         .shstrtab.  */
> -      shnum += 2;
> +      /* Add a section header for the dummy section,
> +         .shstrtab, .symtab and .strtab.
*/ > + shnum += 4; > } > >ehdr_size = (cl == ELFCLASS32 > @@ -882,6 +882,51 @@ simple_object_elf_write_shdr (simple_object_write > *sobj, int descriptor, > errmsg, err); > } > > +/* Write out an ELF Symbol*/ > + > +static int > +simple_object_elf_write_symbol(simple_object_write *sobj, int descriptor, > +off_t offset, unsigned int st_name, unsigned int st_value, > size_t st_size, > +unsigned char st_info, unsigned char st_other, unsigned int > st_shndx, > +const char **errmsg, int *err) > +{ > + struct simple_object_elf_attributes *attrs = > +(struct simple_object_elf_attributes *) sobj->data; > + const struct elf_type_functions* fns; > + unsigned char cl; > + size_t sym_size; > + unsigned char buf[sizeof (Elf64_External_Shdr)]; > + > + fns = attrs->type_functions; > + cl = attrs->ei_class; > + > + sym_size = (cl ==
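To make the proposed .symtab/.strtab output concrete, here is a small Python model of the bytes involved. It is a sketch only, not libiberty code: `build_symtab` and its parameters are hypothetical names. It packs Elf32_Sym (16 bytes) or Elf64_Sym (24 bytes) entries, which is the size distinction the truncated `sym_size = (cl == ...` line is about, starts the table with the mandatory all-zero null symbol, and starts the string table with an empty name. Each symbol is encoded the way a `.comm name,size,align` directive would be (st_shndx = SHN_COMMON, st_value = alignment, st_size = size); whether the libiberty patch ends up using SHN_COMMON and STB_GLOBAL/STT_OBJECT is an assumption here.

```python
import struct

# Constants from the ELF specification (<elf.h>).
STB_GLOBAL = 1
STT_OBJECT = 1
SHN_COMMON = 0xFFF2

def st_info(bind, typ):
    """ELF st_info byte: binding in the high nibble, type in the low nibble."""
    return (bind << 4) | (typ & 0xF)

def build_symtab(symbols, elfclass=64, little_endian=True):
    """Return (symtab_bytes, strtab_bytes) for a list of (name, size, align).

    Hypothetical helper: every symbol is emitted as a global common symbol,
    with st_value holding the alignment and st_size the size.
    """
    e = "<" if little_endian else ">"
    if elfclass == 64:
        fmt = e + "IBBHQQ"   # Elf64_Sym: name, info, other, shndx, value, size
    else:
        fmt = e + "IIIBBH"   # Elf32_Sym: name, value, size, info, other, shndx
    strtab = bytearray(b"\0")                 # offset 0 is the empty name
    symtab = bytearray(struct.calcsize(fmt))  # entry 0: the null symbol
    info = st_info(STB_GLOBAL, STT_OBJECT)
    for name, size, align in symbols:
        name_off = len(strtab)
        strtab += name.encode() + b"\0"
        if elfclass == 64:
            symtab += struct.pack(fmt, name_off, info, 0, SHN_COMMON, align, size)
        else:
            symtab += struct.pack(fmt, name_off, align, size, info, 0, SHN_COMMON)
    return bytes(symtab), bytes(strtab)

# The patch adds __gnu_lto_slim with size 1 and align 1.
symtab, strtab = build_symtab([("__gnu_lto_slim", 1, 1)])
print(len(symtab), strtab)  # 48 b'\x00__gnu_lto_slim\x00'
```

In a real object, the .symtab section header would additionally point at .strtab through sh_link and record the index of the first non-local symbol (here 1) in sh_info.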
gcc-13-20230624 is now available
Snapshot gcc-13-20230624 is now available on https://gcc.gnu.org/pub/gcc/snapshots/13-20230624/ and on various mirrors, see http://gcc.gnu.org/mirrors.html for details. This snapshot has been generated from the GCC 13 git branch with the following options: git://gcc.gnu.org/git/gcc.git branch releases/gcc-13 revision 896085f08f683d915c6803e4f2e8a7c816dcb1d7 You'll find: gcc-13-20230624.tar.xz Complete GCC SHA256=2b1d0ecb8b4a30fe4eb50993af34d05199792902e9d1eafb12b193ce3c52e409 SHA1=c8ee4ceaeb241df4d33eb7d68d61b8e01dc52929 Diffs from 13-20230617 are available in the diffs/ subdirectory. When a particular snapshot is ready for public consumption the LATEST-13 link is updated and a message is sent to the gcc list. Please do not use a snapshot before it has been announced that way.
[Bug target/78904] zero-extracts are not effective
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78904 --- Comment #18 from CVS Commits --- The master branch has been updated by Roger Sayle : https://gcc.gnu.org/g:8f6c747c8638d4c3c47ba2d4c8be86909e183132 commit r14-2065-g8f6c747c8638d4c3c47ba2d4c8be86909e183132 Author: Roger Sayle Date: Sat Jun 24 23:05:25 2023 +0100 i386: Add alternate representation for {and,or,xor}b %ah,%dh. A patch that I'm working on to improve RTL simplifications in the middle-end results in the regression of pr78904-1b.c, due to changes in the canonical representation of high-byte (%ah, %bh, %ch, %dh) logic. See also PR target/78904. This patch avoids/prevents those failures by adding support for the alternate representation, duplicating the existing *qi_ext_2 as *qi_ext_3 (the new version also replacing any_or with any_logic to provide *andqi_ext_3 in the same pattern). Removing the original pattern isn't trivial, as it's generated by define_split, but this can be investigated after the other pieces are approved. The current representation of this instruction is: (set (zero_extract:DI (reg/v:DI 87 [ aD.2763 ]) (const_int 8 [0x8]) (const_int 8 [0x8])) (subreg:DI (xor:QI (subreg:QI (zero_extract:DI (reg:DI 94) (const_int 8 [0x8]) (const_int 8 [0x8])) 0) (subreg:QI (zero_extract:DI (reg/v:DI 87 [ aD.2763 ]) (const_int 8 [0x8]) (const_int 8 [0x8])) 0)) 0)) after my proposed middle-end improvement, we attempt to recognize: (set (zero_extract:DI (reg/v:DI 87 [ aD.2763 ]) (const_int 8 [0x8]) (const_int 8 [0x8])) (zero_extract:DI (xor:DI (reg:DI 94) (reg/v:DI 87 [ aD.2763 ])) (const_int 8 [0x8]) (const_int 8 [0x8]))) 2023-06-24 Roger Sayle gcc/ChangeLog * config/i386/i386.md (*qi_ext_3): New define_insn.
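Why the two RTL shapes describe the same instruction for and/ior/xor can be checked with a small Python model. This is a sketch, not GCC code, and `zero_extract`, `qi_form` and `di_form` are illustrative names: `(zero_extract x 8 8)` selects bits 8..15 (the %ah-style byte), and because these operations act bit by bit, operating on the two extracted bytes in QImode gives the same byte as operating in DImode and extracting afterwards.

```python
import operator

def zero_extract(x, width=8, pos=8):
    """Model of RTL (zero_extract x width pos): bits [pos, pos + width) of x."""
    return (x >> pos) & ((1 << width) - 1)

def qi_form(op, a, b):
    # Current shape: extract both high bytes, then operate in QImode.
    return op(zero_extract(b), zero_extract(a)) & 0xFF

def di_form(op, a, b):
    # Shape produced after the middle-end change: operate in DImode, then extract.
    return zero_extract(op(b, a))

SAMPLES = [0, 1, 0xFF, 0x1234, 0xABCD,
           0xFFFF_FFFF_FFFF_FFFF, 0x0123_4567_89AB_CDEF]

def check_logic_ops():
    for op in (operator.and_, operator.or_, operator.xor):
        for a in SAMPLES:
            for b in SAMPLES:
                assert qi_form(op, a, b) == di_form(op, a, b)
    return True

print(check_logic_ops())  # True
```

The same rewrite is not valid for plus, because a carry out of the low byte changes the extracted high byte (for example a = 0xFF, b = 0x01); that is consistent with the new pattern covering only the logic operations.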
[Bug middle-end/109986] missing fold (~a | b) ^ a => ~(a & b)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109986 --- Comment #3 from Ivan Sorokin --- I tried to investigate why GCC is able to simplify `(a | b) ^ a` and `(a | ~b) ^ a` from comment 2, but not the similar-looking `(~a | b) ^ a` from comment 0.

`(a | b) ^ a` matches the following pattern from match.pd:

/* (X | Y) ^ X -> Y & ~ X*/
(simplify
 (bit_xor:c (convert1? (bit_ior:c @@0 @1)) (convert2? @0))
 (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
  (convert (bit_and @1 (bit_not @0)))))

`(a | ~b) ^ a` matches another pattern:

/* (~X | C) ^ D -> (X | C) ^ (~D ^ C) if (~D ^ C) can be simplified. */
(simplify
 (bit_xor:c (bit_ior:cs (bit_not:s @0) @1) @2)
 (bit_xor (bit_ior @0 @1) (bit_xor! (bit_not! @2) @1)))

With the substitution `X = b, C = a, D = a` it gives:

(b | a) ^ (~a ^ a)
(b | a) ^ -1
~(b | a)

`(~a | b) ^ a` is not simplifiable by this pattern because it requires that `~D ^ C` is simplifiable further, but `~a ^ b` is not. In any case, even if it were applicable, it would produce `(a | b) ^ (~a ^ b)`, which has more operations than the original expression.
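The three identities discussed in this PR can be confirmed exhaustively on 8-bit values with a short Python check. It is only a sanity check of the algebra, not of match.pd itself; the 8-bit width is an arbitrary choice, since each identity holds bit by bit and therefore at any width.

```python
MASK = 0xFF  # 8-bit lanes; the identities are bitwise, so any width behaves the same

def bnot(x):
    """Bitwise NOT restricted to 8 bits."""
    return ~x & MASK

def check_identities():
    for a in range(256):
        for b in range(256):
            # (X | Y) ^ X -> Y & ~X
            assert (a | b) ^ a == b & bnot(a)
            # (a | ~b) ^ a -> ~(b | a), via the (~X | C) ^ D pattern
            assert (a | bnot(b)) ^ a == bnot(b | a)
            # The fold requested in this PR: (~a | b) ^ a -> ~(a & b)
            assert (bnot(a) | b) ^ a == bnot(a & b)
    return True

print(check_identities())  # True
```

Per bit: if a is 0, `(~a | b) ^ a` is 1, and so is `~(a & b)`; if a is 1, both sides reduce to `~b`.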
[Bug c++/110397] types may not be defined in parameter types leads to ICE with -fdump-tree-original (or no -quiet when invoking cc1plus directly)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110397 Andrew Pinski changed: What|Removed |Added Resolution|--- |DUPLICATE Status|UNCONFIRMED |RESOLVED --- Comment #2 from Andrew Pinski --- Dup of bug 93788. *** This bug has been marked as a duplicate of bug 93788 ***
[Bug c++/93788] Segfault caused by infinite loop in cc1plus
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93788 Andrew Pinski changed: What|Removed |Added CC||stevenxia990430 at gmail dot com --- Comment #4 from Andrew Pinski --- *** Bug 110397 has been marked as a duplicate of this bug. ***
[Bug c++/110397] types may not be defined in parameter types leads to ICE with -fdump-tree-original (or no -quiet when invoking cc1plus directly)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110397 --- Comment #1 from Andrew Pinski --- Note the odd thing about this issue: it only shows up some of the time. You can reproduce it 100% of the time if you use -fdump-tree-original. Also, you don't need the include of iostream (though on godbolt you do need it if you are not using -fdump-tree-original).
[Bug c++/110344] [C++26] P2738R1 - constexpr cast from void*
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110344 --- Comment #3 from Jason Merrill --- Version of the paper testcase that just adds constexpr, that we currently reject:

#include <string_view>

struct Sheep {
  constexpr std::string_view speak() const noexcept { return "Baa"; }
};

struct Cow {
  constexpr std::string_view speak() const noexcept { return "Mooo"; }
};

class Animal_View {
private:
  const void *animal;
  std::string_view (*speak_function)(const void *);

public:
  template <typename Animal>
  constexpr Animal_View(const Animal &animal)
    : animal{&animal}, speak_function{[](const void *object) {
        return static_cast<const Animal *>(object)->speak();
      }} {}

  constexpr std::string_view speak() const noexcept {
    return speak_function(animal);
  }
};

// This is the key bit here. This is a single concrete function
// that can take anything that happens to have the "Animal_View"
// interface
constexpr std::string_view do_speak(Animal_View av) { return av.speak(); }

int main() {
  // A Cow is a cow. The only thing that makes it special
  // is that it has a "std::string_view speak() const" member
  constexpr Cow cow;
  constexpr auto result = do_speak(cow);
  return static_cast<int>(result.size());
}
[Bug c++/110344] [C++26] P2738R1 - constexpr cast from void*
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110344 Jason Merrill changed: What|Removed |Added CC||jason at gcc dot gnu.org --- Comment #2 from Jason Merrill --- Reduced version of the paper's testcase that we already (wrongly) accept:

class Doer {
private:
  const void *ob;
  int (*fn)(const void *);

public:
  template <typename T>
  constexpr Doer(const T &ob)
    : ob{&ob}, fn{[](const void *p) {
        return static_cast<const T *>(p)->doit();
      }} {}
  constexpr int operator()() const { return fn(ob); }
};

struct Thing {
  constexpr int doit() const { return 42; };
};

static_assert (Doer(Thing())() == 42);
[Bug c++/110397] New: types may not be defined in parameter types leads to ICE
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110397 Bug ID: 110397 Summary: types may not be defined in parameter types leads to ICE Product: gcc Version: 12.1.0 Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: stevenxia990430 at gmail dot com Target Milestone: --- The following invalid program reports an internal compiler error: Segmentation fault. To quickly reproduce: https://gcc.godbolt.org/z/dE96K7cGc
```
#include <iostream>
int main(){
    auto sum = ([](struct A {int b; int c;}a,...){ });
    return 0;
}
```
tested on gcc-trunk
[Bug c++/110394] Lambda capture receives wrong value
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110394 --- Comment #7 from Andrew Pinski --- (In reply to jackyguo18 from comment #6) > @Andrew Pinski - Thanks, just confirmed that that was the issue. > > Why doesn't GCC choose to delete the function (thus causing the weird > behaviour) early at lower optimization levels? > > Seems kinda strange it would work at -O2. Most likely it is due to inlining more and being more aggressive in doing some optimizations. Since using the object after its lifetime ends is undefined behavior, it just happened to work at some optimization levels and not at others.
[Bug c++/110394] Lambda capture receives wrong value
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110394 jackyguo18 at hotmail dot com changed: What|Removed |Added Status|WAITING |RESOLVED Resolution|--- |INVALID --- Comment #6 from jackyguo18 at hotmail dot com --- @Andrew Pinski - Thanks, just confirmed that that was the issue. Why doesn't GCC choose to delete the function (thus causing the weird behaviour) early at lower optimization levels? Seems kinda strange it would work at -O2.
[Bug c++/110394] Lambda capture receives wrong value
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110394 --- Comment #5 from jackyguo18 at hotmail dot com --- @Andrew Pinski - Thanks, just confirmed that that was the issue. Why doesn't GCC choose to delete the function (thus causing the weird behaviour) early at lower optimization levels? Seems kinda strange it would work at -O2.
[Bug target/108678] Windows on ARM64 platform target aarch64-w64-mingw32
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108678 --- Comment #3 from Brecht Sanders --- Any pointers on which files to edit in order to support aarch64-mingw ? I think it won't require reinventing the wheel as it will probably be a mix of existing *-mingw and aarch64-* stuff...
[Bug middle-end/102253] scalability issues with large loop depth
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102253 --- Comment #5 from Andrew Pinski --- On the trunk with the original testcase here we get: tree copy headers : 12.16 ( 19%) 0.01 ( 2%) 21.57 ( 28%) 771k ( 0%) (I suspect the rest is due to not setting release checking ...)
Re: [Patch, fortran] PR49213 - [OOP] gfortran rejects structure constructor expression
Hi Paul!

On 6/24/23 15:18, Paul Richard Thomas via Gcc-patches wrote:
> I have included the adjustment to 'gfc_is_ptr_fcn' and eliminating the extra blank line, introduced by my last patch. I played safe and went exclusively for class functions with attr.class_pointer set on the grounds that these have had all the accoutrements checked and built (ie. class_ok). I am still not sure if this is necessary or not.

Maybe it is my fault, but I find the version in the patch confusing:

@@ -816,7 +816,7 @@ bool
 gfc_is_ptr_fcn (gfc_expr *e)
 {
   return e != NULL && e->expr_type == EXPR_FUNCTION
-          && (gfc_expr_attr (e).pointer
+          && ((e->ts.type != BT_CLASS && gfc_expr_attr (e).pointer)
               || (e->ts.type == BT_CLASS
                   && CLASS_DATA (e)->attr.class_pointer));
 }

The caller 'gfc_is_ptr_fcn' already ensures e->expr_type == EXPR_FUNCTION, so gfc_expr_attr (e) boils down to:

      if (e->value.function.esym && e->value.function.esym->result)
        {
          gfc_symbol *sym = e->value.function.esym->result;
          attr = sym->attr;
          if (sym->ts.type == BT_CLASS && sym->attr.class_ok)
            {
              attr.dimension = CLASS_DATA (sym)->attr.dimension;
              attr.pointer = CLASS_DATA (sym)->attr.class_pointer;
              attr.allocatable = CLASS_DATA (sym)->attr.allocatable;
            }
        }
...
      else if (e->symtree)
        attr = gfc_variable_attr (e, NULL);

So I thought this should already do what you want if you do

gfc_is_ptr_fcn (gfc_expr *e)
{
  return e != NULL && e->expr_type == EXPR_FUNCTION
         && gfc_expr_attr (e).pointer;
}

or what am I missing?

The additional checks in gfc_expr_attr are there to avoid ICEs in case CLASS_DATA (sym) has issues, and we all know Gerhard, who has shown that he is an expert in exploiting this.

To sum up, I'd prefer to use the safer form if it works. If it doesn't, I would expect a latent issue.

The rest of the code looked good to me, but I was suspicious about the handling of CHARACTER. Nasty as I am, I modified the testcase to use character(kind=4) instead of kind=1 (see attached).
This either fails here (stop 10), or, if I activate the marked line

    !cont = tContainer('hello!') ! ### ICE! ###

I get an ICE. Can you have another look?

Thanks,
Harald

OK for trunk?

Paul

Fortran: Enable class expressions in structure constructors [PR49213]

2023-06-24  Paul Thomas

gcc/fortran
	PR fortran/49213
	* expr.cc (gfc_is_ptr_fcn): Guard pointer attribute to exclude
	class expressions.
	* resolve.cc (resolve_assoc_var): Call gfc_is_ptr_fcn to allow
	associate names with pointer function targets to be used in
	variable definition context.
	* trans-decl.cc (get_symbol_decl): Remove extraneous line.
	* trans-expr.cc (alloc_scalar_allocatable_subcomponent): Obtain
	size of intrinsic and character expressions.
	(gfc_trans_subcomponent_assign): Expand assignment to class
	components to include intrinsic and character expressions.

gcc/testsuite/
	PR fortran/49213
	* gfortran.dg/pr49213.f90 : New test

! { dg-do run }
!
! Contributed by Neil Carlson
!
program main
! character(2) :: c
  character(2,kind=4) :: c
  type :: S
    integer :: n
  end type
  type(S) :: Sobj

  type, extends(S) :: S2
    integer :: m
  end type
  type(S2) :: S2obj

  type :: T
    class(S), allocatable :: x
  end type
  type(T) :: Tobj

  Sobj = S(1)
  Tobj = T(Sobj)

  S2obj = S2(1,2)
  Tobj = T(S2obj) ! Failed here
  select type (x => Tobj%x)
    type is (S2)
      if ((x%n .ne. 1) .or. (x%m .ne. 2)) stop 1
    class default
      stop 2
  end select

  c = 4_" "
  call pass_it (T(Sobj))
  if (c .ne. 4_"S ") stop 3
  call pass_it (T(S2obj)) ! and here
  if (c .ne. 4_"S2") stop 4

  call bar

contains

  subroutine pass_it (foo)
    type(T), intent(in) :: foo
    select type (x => foo%x)
      type is (S)
        c = 4_"S "
        if (x%n .ne. 1) stop 5
      type is (S2)
        c = 4_"S2"
        if ((x%n .ne. 1) .or. (x%m .ne. 2)) stop 6
      class default
        stop 7
    end select
  end subroutine

  subroutine bar
   ! Test from comment #29 of the PR - due to Janus Weil
    type tContainer
      class(*), allocatable :: x
    end type
    integer, parameter :: i = 0
    character(7,kind=4) :: chr = 4_"goodbye"
    type(tContainer) :: cont

    cont%x = i ! linker error: undefined reference to `__copy_INTEGER_4_.3804'

    cont = tContainer(i+42) ! Failed here
    select type (z => cont%x)
      type is (integer)
        if (z .ne. 42) stop 8
      class default
        stop 9
    end select

    !cont = tContainer('hello!') ! ### ICE! ###
    cont = tContainer(4_'hello!')
    select type (z => cont%x)
      type is (character(*,kind=4))
        if (z .ne. 4_'hello!') stop 10
      class default
        stop 11
    end select

    cont = tContainer(chr)
    select type (z => cont%x)
      type is (character(*,kind=4))
        if (z .ne. 4_'goodbye') stop 12
      class default
[Bug rtl-optimization/110390] ICE on valid code on x86_64-linux-gnu with sel-scheduling: in av_set_could_be_blocked_by_bookkeeping_p, at sel-sched.cc:3609
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110390 --- Comment #1 from Zhendong Su --- Another reproducer with fewer flags (and affects 12.* and later).

Compiler Explorer: https://godbolt.org/z/fYqEz9EWx

[603] % gcctk -v
Using built-in specs.
COLLECT_GCC=gcctk
COLLECT_LTO_WRAPPER=/local/home/suz/suz-local/software/local/gcc-trunk/bin/../libexec/gcc/x86_64-pc-linux-gnu/14.0.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../gcc-trunk/configure --disable-bootstrap --enable-checking=yes --prefix=/local/suz-local/software/local/gcc-trunk --enable-sanitizers --enable-languages=c,c++ --disable-werror --disable-multilib
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 14.0.0 20230624 (experimental) [master r14-924-gd709841ae0f] (GCC)
[604] %
[604] % gcctk -O3 -fsel-sched-pipelining -fschedule-insns -fselective-scheduling2 -fPIC small.c
during RTL pass: sched2
small.c: In function ‘h’:
small.c:20:1: internal compiler error: in av_set_could_be_blocked_by_bookkeeping_p, at sel-sched.cc:3609
   20 | }
      | ^
0x7d635a av_set_could_be_blocked_by_bookkeeping_p
	../../gcc-trunk/gcc/sel-sched.cc:3609
0x7d635a code_motion_process_successors
	../../gcc-trunk/gcc/sel-sched.cc:6386
0x7d635a code_motion_path_driver
	../../gcc-trunk/gcc/sel-sched.cc:6608
0xf85b69 code_motion_process_successors
	../../gcc-trunk/gcc/sel-sched.cc:6342
0xf85b69 code_motion_path_driver
	../../gcc-trunk/gcc/sel-sched.cc:6608
0xf86c18 find_used_regs
	../../gcc-trunk/gcc/sel-sched.cc:3272
0xf86c18 collect_unavailable_regs_from_bnds
	../../gcc-trunk/gcc/sel-sched.cc:1586
0xf86c18 find_best_reg_for_expr
	../../gcc-trunk/gcc/sel-sched.cc:1649
0xf8976c fill_vec_av_set
	../../gcc-trunk/gcc/sel-sched.cc:3784
0xf8976c fill_ready_list
	../../gcc-trunk/gcc/sel-sched.cc:4014
0xf8976c find_best_expr
	../../gcc-trunk/gcc/sel-sched.cc:4374
0xf8976c fill_insns
	../../gcc-trunk/gcc/sel-sched.cc:5535
0xf8976c schedule_on_fences
	../../gcc-trunk/gcc/sel-sched.cc:7353
0xf8976c sel_sched_region_2
	../../gcc-trunk/gcc/sel-sched.cc:7491
0xf8a928 sel_sched_region_1
	../../gcc-trunk/gcc/sel-sched.cc:7533
0xf8bf46 sel_sched_region(int)
	../../gcc-trunk/gcc/sel-sched.cc:7634
0xf8bf46 sel_sched_region(int)
	../../gcc-trunk/gcc/sel-sched.cc:7619
0xf8c0e9 run_selective_scheduling()
	../../gcc-trunk/gcc/sel-sched.cc:7720
0xf6d7ed rest_of_handle_sched2
	../../gcc-trunk/gcc/sched-rgn.cc:3743
0xf6d7ed execute
	../../gcc-trunk/gcc/sched-rgn.cc:3890
Please submit a full bug report, with preprocessed source (by using -freport-bug).
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.
[605] %
[605] % cat small.c
static int a;
int b, c, d, g;
long e, f;
extern void l(char *);
void h() {
  char i;
  int j = 1 >> f / b;
 L:
  f = -(-(f % g || a) * (c && f | e));
  if (a > e)
    l("");
  if (f) {
    l("A");
    i = j / g;
  }
  if (a)
    goto L;
  d = i;
  a = 0;
}
[Bug middle-end/102253] scalability issues with large loop depth
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102253 --- Comment #4 from Andrew Pinski --- VRP/ranger uses SCEV now so it might even be worse, the testcase from PR 110396 has that behavior too.
[Bug middle-end/102253] scalability issues with large loop depth
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102253 Andrew Pinski changed: What|Removed |Added CC||luydorarko at vusra dot com --- Comment #3 from Andrew Pinski --- *** Bug 110396 has been marked as a duplicate of this bug. ***
[Bug tree-optimization/110396] Compile-time hog with -O2 and -O3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110396 Andrew Pinski changed: What|Removed |Added Resolution|--- |DUPLICATE Status|UNCONFIRMED |RESOLVED --- Comment #2 from Andrew Pinski --- This is basically a dup of bug 102253. The problem is a known scalability issue with large loop depth. How did you generate this testcase? Is it from real code, or was it just generated to try to hit some compiler bugs? *** This bug has been marked as a duplicate of bug 102253 ***
[Bug tree-optimization/110396] Compile-time hog with -O2 and -O3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110396 Andrew Pinski changed: What|Removed |Added Component|c++ |tree-optimization --- Comment #1 from Andrew Pinski --- #0 0x012f8732 in hash_table_mod1 (index=5, hash=165) at /home/apinski/src/upstream-gcc-git/gcc/gcc/hash-table.h:344 #1 hash_table::find_slot_with_hash (insert=INSERT, hash=165, comparable=, this=0x77600b28) at /home/apinski/src/upstream-gcc-git/gcc/gcc/hash-table.h:1051 #2 hash_table::find_slot (insert=INSERT, value=, this=0x77600b28) at /home/apinski/src/upstream-gcc-git/gcc/gcc/hash-table.h:435 #3 find_var_scev_info (instantiated_below=0x75247b40, var=0x75159360) at /home/apinski/src/upstream-gcc-git/gcc/gcc/tree-scalar-evolution.cc:358 #4 0x012f9312 in get_scalar_evolution (scalar=0x75159360, instantiated_below=0x75247b40) at /home/apinski/src/upstream-gcc-git/gcc/gcc/tree-scalar-evolution.cc:556 #5 analyze_scalar_evolution (loop=0x7514eaf0, var=0x75159360) at /home/apinski/src/upstream-gcc-git/gcc/gcc/tree-scalar-evolution.cc:2020 #6 0x012f8aa7 in interpret_condition_phi (condition_phi=0x751fbd00, loop=0x7514eaf0) at /home/apinski/src/upstream-gcc-git/gcc/gcc/tree-scalar-evolution.cc:1603 #7 analyze_scalar_evolution_1 (loop=0x7514eaf0, var=0x751f5dc8) at /home/apinski/src/upstream-gcc-git/gcc/gcc/tree-scalar-evolution.cc:1969 #8 0x012f8b5b in analyze_scalar_evolution_1 (loop=0x7514e960, var=0x751f5dc8) at /home/apinski/src/upstream-gcc-git/gcc/gcc/tree-scalar-evolution.cc:1950 #9 0x012f94e5 in analyze_scalar_evolution (loop=0x7514e960, var=0x751f5dc8) at /home/apinski/src/upstream-gcc-git/gcc/gcc/tree-scalar-evolution.cc:2031 #10 0x012f8aa7 in interpret_condition_phi (condition_phi=0x75209400, loop=0x7514e960) at /home/apinski/src/upstream-gcc-git/gcc/gcc/tree-scalar-evolution.cc:1603 #11 analyze_scalar_evolution_1 (loop=0x7514e960, var=0x75207870) at /home/apinski/src/upstream-gcc-git/gcc/gcc/tree-scalar-evolution.cc:1969 #12 0x012f8b5b in analyze_scalar_evolution_1 (loop=0x7514e7d0, 
var=0x75207870) at /home/apinski/src/upstream-gcc-git/gcc/gcc/tree-scalar-evolution.cc:1950 #13 0x012f94e5 in analyze_scalar_evolution (loop=0x7514e7d0, var=0x75207870) at /home/apinski/src/upstream-gcc-git/gcc/gcc/tree-scalar-evolution.cc:2031 #14 0x012f8aa7 in interpret_condition_phi (condition_phi=0x7520ab00, loop=0x7514e7d0) at /home/apinski/src/upstream-gcc-git/gcc/gcc/tree-scalar-evolution.cc:1603 #15 analyze_scalar_evolution_1 (loop=0x7514e7d0, var=0x75161900) at /home/apinski/src/upstream-gcc-git/gcc/gcc/tree-scalar-evolution.cc:1969 #16 0x012f8b5b in analyze_scalar_evolution_1 (loop=0x7514e640, var=0x75161900) at /home/apinski/src/upstream-gcc-git/gcc/gcc/tree-scalar-evolution.cc:1950 #17 0x012f94e5 in analyze_scalar_evolution (loop=0x7514e640, var=0x75161900) at /home/apinski/src/upstream-gcc-git/gcc/gcc/tree-scalar-evolution.cc:2031 #18 0x012f8aa7 in interpret_condition_phi (condition_phi=0x75172a00, loop=0x7514e640) at /home/apinski/src/upstream-gcc-git/gcc/gcc/tree-scalar-evolution.cc:1603 #19 analyze_scalar_evolution_1 (loop=0x7514e640, var=0x75159318) at /home/apinski/src/upstream-gcc-git/gcc/gcc/tree-scalar-evolution.cc:1969 #20 0x012f8b5b in analyze_scalar_evolution_1 (loop=0x7514e4b0, var=0x75159318) at /home/apinski/src/upstream-gcc-git/gcc/gcc/tree-scalar-evolution.cc:1950 #21 0x012f94e5 in analyze_scalar_evolution (loop=0x7514e4b0, var=0x75159318) at /home/apinski/src/upstream-gcc-git/gcc/gcc/tree-scalar-evolution.cc:2031 #22 0x012f8aa7 in interpret_condition_phi (condition_phi=0x7520d300, loop=0x7514e4b0) at /home/apinski/src/upstream-gcc-git/gcc/gcc/tree-scalar-evolution.cc:1603 #23 analyze_scalar_evolution_1 (loop=0x7514e4b0, var=0x74d961b0) at /home/apinski/src/upstream-gcc-git/gcc/gcc/tree-scalar-evolution.cc:1969 #24 0x012f8b5b in analyze_scalar_evolution_1 (loop=0x7514e320, var=0x74d961b0) at /home/apinski/src/upstream-gcc-git/gcc/gcc/tree-scalar-evolution.cc:1950 #25 0x012f94e5 in analyze_scalar_evolution (loop=0x7514e320, 
var=0x74d961b0) at /home/apinski/src/upstream-gcc-git/gcc/gcc/tree-scalar-evolution.cc:2031 #26 0x012f8aa7 in interpret_condition_phi (condition_phi=0x7520d500, loop=0x7514e320) at /home/apinski/src/upstream-gcc-git/gcc/gcc/tree-scalar-evolution.cc:1603 #27 analyze_scalar_evolution_1 (loop=0x7514e320, var=0x7505fd80) at /home/apinski/src/upstream-gcc-git/gcc/gcc/tree-scalar-evolution.cc:1969 #28 0x012f8b5b in analyze_scalar_evolution_1 (loop=0x7514e190, var=0x7505fd80) at
[Bug tree-optimization/110311] [14 Regression] regression in tree-optimizer
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110311 --- Comment #23 from anlauf at gcc dot gnu.org --- You could check the input arguments for validity, e.g. using ieee_is_finite from the intrinsic ieee_arithmetic module:

  use, intrinsic :: ieee_arithmetic, only: ieee_is_finite
  ...
  if (.not. ieee_is_finite (a)) then
     print *, "bad: a=", a
     stop 1
  end if

As a last resort, I still recommend what I wrote in comment #15: build (= link) your executable from the *.o files of your known-good project build tree, but replace one candidate.o with the one from the build tree that shows the problem. And I really mean: link only and run.
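The object-swapping procedure from comment #15 can be scripted so that each candidate gets its own test executable. The sketch below is hypothetical: the directory names, object names and the use of gfortran as the link driver are assumptions, and it presumes both build trees were compiled from the same sources so that their objects are link-compatible. For each candidate it takes every object from the known-good tree except the candidate, which comes from the tree showing the problem.

```python
import shlex
from pathlib import Path

def mixed_object_sets(objects, good_dir, bad_dir):
    """Yield (candidate, link_inputs): all objects from good_dir except
    `candidate`, which is taken from bad_dir."""
    for candidate in objects:
        inputs = [str(Path(bad_dir if obj == candidate else good_dir) / obj)
                  for obj in objects]
        yield candidate, inputs

def link_commands(objects, good_dir, bad_dir, linker="gfortran"):
    """One link-only command per candidate object (nothing is recompiled)."""
    cmds = []
    for candidate, inputs in mixed_object_sets(objects, good_dir, bad_dir):
        exe = "test_" + Path(candidate).stem
        cmds.append(shlex.join([linker, "-o", exe, *inputs]))
    return cmds

# Hypothetical object list and build directories.
for cmd in link_commands(["main.o", "a.o", "b.o"], "build-good", "build-bad"):
    print(cmd)
```

Running each resulting test_* executable then points at the object file (and thus the source file) whose code generation changed the behaviour.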
[Bug c++/110396] New: Compile-time hog with -O2 and -O3
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110396 Bug ID: 110396 Summary: Compile-time hog with -O2 and -O3 Product: gcc Version: unknown Status: UNCONFIRMED Severity: normal Priority: P3 Component: c++ Assignee: unassigned at gcc dot gnu.org Reporter: luydorarko at vusra dot com Target Milestone: --- Created attachment 55397 --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55397=edit Preprocessed file created by `-O2 -save-temps` Compile time hog behavior can be reproduced with: ``` g++ -O2 tmp.cpp ``` Also same behavior with `-O3`. Compiler takes far too long (more than one hour in one case) and was killed after a while. Output of `g++ -v`: ``` Using built-in specs. COLLECT_GCC=g++ COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-pc-linux-gnu/13.1.1/lto-wrapper Target: x86_64-pc-linux-gnu Configured with: /build/gcc/src/gcc/configure --enable-languages=ada,c,c++,d,fortran,go,lto,objc,obj-c++ --enable-bootstrap --prefix=/usr --libdir=/usr/lib --libexecdir=/usr/lib --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=https://bugs.archlinux.org/ --with-build-config=bootstrap-lto --with-linker-hash-style=gnu --with-system-zlib --enable-__cxa_atexit --enable-cet=auto --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-default-ssp --enable-gnu-indirect-function --enable-gnu-unique-object --enable-libstdcxx-backtrace --enable-link-serialization=1 --enable-linker-build-id --enable-lto --enable-multilib --enable-plugin --enable-shared --enable-threads=posix --disable-libssp --disable-libstdcxx-pch --disable-werror Thread model: posix Supported LTO compression algorithms: zlib zstd gcc version 13.1.1 20230429 (GCC) ``` Attachment: a-tmp.ii file created with `g++ -O2 tmp.cpp -save-temps`
[PATCH, part2, committed] Fortran: ABI for scalar CHARACTER(LEN=1),VALUE dummy argument [PR110360]
Dear all,

the first part of the patch came with a testcase that also exercised code for constant string arguments. That code was not touched by the patch, but it seems to have caused runtime failures on big-endian platforms (e.g. Power-* BE) at all optimization levels, and on x86 / -m32 at -O1 and higher (not at -O0).

I did not see any issues on x86 / -m64 at any optimization level, but I could reproduce a problem with x86 / -m32 at -O1. It appears to be related to how arguments that are to be passed by value are handled when there is a mismatch between the function prototype and the passed argument. The solution is to truncate constant string arguments that are too long. This is done by the attached patch, pushed as:

https://gcc.gnu.org/g:3f97d10aa1ff5984d6fd657f246d3f251b254ff1

* * *

I found gcc-testresults quite helpful in checking whether my patch caused trouble on architectures different from the one I'm working on. The value (pun intended) would have been even greater if the output of runtime failures were also made available. Many (Fortran) tests provide either a stop code or some, hopefully helpful, diagnostic output on stdout, intended for locating errors on platforms to which one has no direct access or with which one is less familiar. That is far better than a plain

FAIL: gfortran.dg/value_9.f90 -O1 execution test

* * *

Thanks,
Harald

From 3f97d10aa1ff5984d6fd657f246d3f251b254ff1 Mon Sep 17 00:00:00 2001
From: Harald Anlauf
Date: Sat, 24 Jun 2023 20:36:53 +0200
Subject: [PATCH] Fortran: ABI for scalar CHARACTER(LEN=1),VALUE dummy argument [PR110360]

gcc/fortran/ChangeLog:

	PR fortran/110360
	* trans-expr.cc (gfc_conv_procedure_call): Truncate constant string
	argument of length > 1 passed to scalar CHARACTER(1),VALUE dummy.
---
 gcc/fortran/trans-expr.cc | 21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
index c92fccd0be2..63e3cf9681e 100644
--- a/gcc/fortran/trans-expr.cc
+++ b/gcc/fortran/trans-expr.cc
@@ -6395,20 +6395,25 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym,
 
                  /* ABI: actual arguments to CHARACTER(len=1),VALUE
                     dummy arguments are actually passed by value.
-                    The BIND(C) case is handled elsewhere.
-                    TODO: truncate constant strings to length 1.  */
+                    Constant strings are truncated to length 1.
+                    The BIND(C) case is handled elsewhere.  */
                  if (fsym->ts.type == BT_CHARACTER
                      && !fsym->ts.is_c_interop
                      && fsym->ts.u.cl->length->expr_type == EXPR_CONSTANT
                      && fsym->ts.u.cl->length->ts.type == BT_INTEGER
                      && (mpz_cmp_ui
-                         (fsym->ts.u.cl->length->value.integer, 1) == 0)
-                     && e->expr_type != EXPR_CONSTANT)
+                         (fsym->ts.u.cl->length->value.integer, 1) == 0))
                    {
-                     parmse.expr = gfc_string_to_single_character
-                       (build_int_cst (gfc_charlen_type_node, 1),
-                        parmse.expr,
-                        e->ts.kind);
+                     if (e->expr_type != EXPR_CONSTANT)
+                       parmse.expr = gfc_string_to_single_character
+                         (build_int_cst (gfc_charlen_type_node, 1),
+                          parmse.expr,
+                          e->ts.kind);
+                     else if (e->value.character.length > 1)
+                       {
+                         e->value.character.length = 1;
+                         gfc_conv_expr (&parmse, e);
+                       }
                    }
 
                  if (fsym->attr.optional
-- 
2.35.3
[Bug tree-optimization/110311] [14 Regression] regression in tree-optimizer
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110311 --- Comment #22 from Jürgen Reuter --- (In reply to anlauf from comment #21) > I forgot to mention that you need to check that the location where a symptom > is seen sometimes may not be the location of the cause. Indeed, I think you are right and the problem is elsewhere. I don't really know where to continue.
[Bug fortran/82943] [F03] Error with type-bound procedure of parametrized derived type
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82943 Jerry DeLisle changed: What|Removed |Added CC||jvdelisle at gcc dot gnu.org --- Comment #14 from Jerry DeLisle --- (In reply to Alexander Westbrooks from comment #13) > I sent in the patch to those emails. Hopefully now the ball will start > rolling and I can slowly get this packaged into a legitimate fix. I'll post > updates here as I receive them. > > The patch is below, if you would like to try it. I did this in the GCC 14 > code. > I saw your email. Thanks for getting involved!
[Bug fortran/110360] ABI issue with character,value dummy argument
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110360 --- Comment #13 from CVS Commits --- The master branch has been updated by Harald Anlauf : https://gcc.gnu.org/g:3f97d10aa1ff5984d6fd657f246d3f251b254ff1 commit r14-2064-g3f97d10aa1ff5984d6fd657f246d3f251b254ff1 Author: Harald Anlauf Date: Sat Jun 24 20:36:53 2023 +0200 Fortran: ABI for scalar CHARACTER(LEN=1),VALUE dummy argument [PR110360] gcc/fortran/ChangeLog: PR fortran/110360 * trans-expr.cc (gfc_conv_procedure_call): Truncate constant string argument of length > 1 passed to scalar CHARACTER(1),VALUE dummy.
[Bug fortran/110360] ABI issue with character,value dummy argument
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110360 --- Comment #12 from anlauf at gcc dot gnu.org --- (In reply to anlauf from comment #11) > Created attachment 55393 [details] > Patch to truncate string argument longer than 1 > > This truncates the string to length 1 and appears to work on x86 / -m32 . > Would be interesting to get feedback on big-endian platforms. As this works here (cross-checked with valgrind) and there has been no feedback so far, I'll push this update and watch the testers.
[Bug tree-optimization/110311] [14 Regression] regression in tree-optimizer
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110311 --- Comment #21 from anlauf at gcc dot gnu.org --- I forgot to mention that you need to check that the location where a symptom is seen sometimes may not be the location of the cause.
[x86_PATCH] New *ashl_doubleword_highpart define_insn_and_split.
This patch contains a pair of (related) optimizations in i386.md that allow us to generate better code for the example below (this is a step towards fixing a bugzilla PR, but I've forgotten the number).

__int128 foo64(__int128 x, long long y)
{
  __int128 t = (__int128)y << 64;
  return x ^ t;
}

The hidden issue is that the RTL currently seen by reload contains the sign extension of y from DImode to TImode, even though this is dead (not required) for left shifts by WORD_SIZE bits or more.

(insn 11 8 12 2 (parallel [
            (set (reg:TI 0 ax [orig:91 y ] [91])
                (sign_extend:TI (reg:DI 1 dx [97])))
            (clobber (reg:CC 17 flags))
            (clobber (scratch:DI))
        ]) {extendditi2}

What makes this particularly undesirable is that the sign-extension pattern above requires an additional DImode scratch register, indicated by the clobber, which unnecessarily increases register pressure.

The proposed solution is to add a define_insn_and_split for such left shifts (of sign or zero extensions) whose result has a non-zero highpart only. In that case the extension is redundant and is eliminated, and the insn can be split after reload without scratch registers or early clobbers. This (late split) exposes a second optimization opportunity, where setting the lowpart to zero can sometimes be combined/simplified with the following instruction during peephole2.

For the test case above, we previously generated with -O2:

foo64:	xorl	%eax, %eax
	xorq	%rsi, %rdx
	xorq	%rdi, %rax
	ret

With this patch, we now generate:

foo64:	movq	%rdi, %rax
	xorq	%rsi, %rdx
	ret

Likewise, for the related -m32 test case, we go from:

foo32:	movl	12(%esp), %eax
	movl	%eax, %edx
	xorl	%eax, %eax
	xorl	8(%esp), %edx
	xorl	4(%esp), %eax
	ret

to the improved:

foo32:	movl	12(%esp), %edx
	movl	4(%esp), %eax
	xorl	8(%esp), %edx
	ret

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and make -k check, both with and without --target_board=unix{-m32}, with no new failures. Ok for mainline?
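The arithmetic behind both optimizations can be checked with a small Python model on unsigned bit patterns. This is a sketch, not GCC code: `sign_extend`, `split_highpart_shift` and the `check_*` functions are hypothetical names, and `w` = 32 or 64 stands for the -m32 and -m64 word sizes. For shift counts n with w <= n < 2*w (the range accepted by the new insn condition), the sign and zero extensions give the same result, the low word is zero, and the high word is y shifted left by n - w, which is what the split emits. It also checks the foo64/foo32 rewrite (low word of the result = low word of x, high word = high word of x XOR y) and the peephole2 identity that 0 | b, 0 ^ b and 0 + b all equal b.

```python
def mask(bits):
    return (1 << bits) - 1

def sign_extend(y, w):
    """Sign-extend a w-bit pattern to 2*w bits (like sign_extend:TI for w = 64)."""
    y &= mask(w)
    return y | (mask(w) << w) if y >> (w - 1) else y

def zero_extend(y, w):
    return y & mask(w)

def split_highpart_shift(y, n, w):
    """(hi, lo) words after the split, for w <= n < 2*w: hi = y << (n - w), lo = 0."""
    assert w <= n < 2 * w
    return ((y & mask(w)) << (n - w)) & mask(w), 0

def check_shift(samples, w):
    for y in samples:
        for n in range(w, 2 * w):
            via_sext = (sign_extend(y, w) << n) & mask(2 * w)
            via_zext = (zero_extend(y, w) << n) & mask(2 * w)
            assert via_sext == via_zext  # the extension does not matter
            hi, lo = split_highpart_shift(y, n, w)
            assert via_sext == (hi << w) | lo
    return True

def check_foo(samples, w):
    """Model of foo64/foo32: x ^ ((double-word)y << w)."""
    for a in samples:
        for b in samples:
            x = ((a & mask(w)) << w) | (b & mask(w))  # x_hi = a, x_lo = b
            x_lo, x_hi = x & mask(w), x >> w
            for y in samples:
                expected = (x ^ (sign_extend(y, w) << w)) & mask(2 * w)
                assert expected == ((x_hi ^ (y & mask(w))) << w) | x_lo
    return True

def check_peephole(samples, w):
    # "rega = 0; rega op= regb" leaves regb in rega for op in {|, ^, +}.
    for b in samples:
        b &= mask(w)
        assert (0 | b) == b and (0 ^ b) == b and ((0 + b) & mask(w)) == b
    return True

SAMPLES = [0, 1, 0x7FFF_FFFF, 0x8000_0000, 0xFFFF_FFFF,
           0x7FFF_FFFF_FFFF_FFFF, 0x8000_0000_0000_0000, 0x0123_4567_89AB_CDEF]

for w in (32, 64):
    ok = check_shift(SAMPLES, w) and check_foo(SAMPLES, w) and check_peephole(SAMPLES, w)
    print(w, ok)  # prints "32 True", then "64 True"
```

The model only covers the values computed, not register allocation; the benefit described above comes from reload never seeing the extension and its DImode scratch.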
2023-06-24  Roger Sayle

gcc/ChangeLog
        * config/i386/i386.md (peephole2): Simplify zeroing a register
        followed by an IOR, XOR or PLUS operation on it, into a move.
        (*ashl3_doubleword_highpart): New define_insn_and_split to
        eliminate (and hide from reload) unnecessary word to doubleword
        extensions that are followed by left shifts by sufficiently large
        (but valid) bit counts.

gcc/testsuite/ChangeLog
        * gcc.target/i386/ashldi3-1.c: New 32-bit test case.
        * gcc.target/i386/ashlti3-2.c: New 64-bit test case.

Thanks again,
Roger
--

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 95a6653c..7664dff 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -12206,6 +12206,18 @@
    (set_attr "type" "alu")
    (set_attr "mode" "QI")])
 
+;; Peephole2 rega = 0; rega op= regb into rega = regb.
+(define_peephole2
+  [(parallel [(set (match_operand:SWI 0 "general_reg_operand")
+                   (const_int 0))
+              (clobber (reg:CC FLAGS_REG))])
+   (parallel [(set (match_dup 0)
+                   (any_or_plus:SWI (match_dup 0)
+                                    (match_operand:SWI 1 "")))
+              (clobber (reg:CC FLAGS_REG))])]
+  ""
+  [(set (match_dup 0) (match_dup 1))])
+
 ;; Split DST = (HI<<32)|LO early to minimize register usage.
 (define_insn_and_split "*concat3_1"
   [(set (match_operand: 0 "nonimmediate_operand" "=ro,r")
@@ -13365,6 +13377,28 @@
   [(const_int 0)]
   "ix86_split_ashl (operands, operands[3], mode); DONE;")
 
+(define_insn_and_split "*ashl3_doubleword_highpart"
+  [(set (match_operand: 0 "register_operand" "=r")
+        (ashift:
+          (any_extend: (match_operand:DWIH 1 "nonimmediate_operand" "rm"))
+          (match_operand:QI 2 "const_int_operand")))
+   (clobber (reg:CC FLAGS_REG))]
+  "INTVAL (operands[2]) >= * BITS_PER_UNIT
+   && INTVAL (operands[2]) < * BITS_PER_UNIT * 2"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  split_double_mode (mode, [0], 1, [0], [3]);
+  int bits = INTVAL (operands[2]) - ( * BITS_PER_UNIT);
+  if (!rtx_equal_p (operands[3], operands[1]))
+    emit_move_insn (operands[3], operands[1]);
+  if (bits > 0)
+    emit_insn (gen_ashl3 (operands[3], operands[3], GEN_INT (bits)));
+  ix86_expand_clear (operands[0]);
+  DONE;
+})
+
 (define_insn "x86_64_shld"
   [(set (match_operand:DI 0 "nonimmediate_operand" "+r*m")
         (ior:DI (ashift:DI (match_dup 0)
diff --git a/gcc/testsuite/gcc.target/i386/ashldi3-1.c b/gcc/testsuite/gcc.target/i386/ashldi3-1.c
new file mode 100644
index 000..b61d63b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/ashldi3-1.c
@@ -0,0 +1,16 @@
+/* { dg-do compile { target ia32 } } */
+/* { dg-options "-O2" } */
+
+long long foo(long long x, int y)
+{
+  long long t = (long long)y <<
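The two facts the patch relies on can be checked with a small C++ sketch. It assumes the GCC/Clang `__int128` extension on a 64-bit target, and the function names are illustrative only. First, for a shift count n with 64 <= n < 128, every bit contributed by the extension (sign or zero) of y is shifted past bit 127, so the result is just y's own 64 bits shifted by n - 64 into the high word, with a zero low word. Second, once the low word is known to be zero, XORing it with lo(x) is simply a copy of lo(x), which is the move the new peephole2 produces.

```cpp
#include <cstdint>

// __int128 is a GCC/Clang extension; this sketch assumes a 64-bit target.
using u128 = unsigned __int128;
using i128 = __int128;

// Naive form: extend y to 128 bits first (sign or zero), then shift by n.
// The shift is done on the unsigned type so it is well defined for negative y.
constexpr u128 shl_extended(long long y, unsigned n, bool sign_extend) {
  u128 wide = sign_extend ? static_cast<u128>(static_cast<i128>(y))
                          : static_cast<u128>(static_cast<std::uint64_t>(y));
  return wide << n;
}

// Split form, valid only for 64 <= n < 128: the high word is y << (n - 64)
// and the low word is zero, so the extension is never computed.
constexpr u128 shl_highpart_only(long long y, unsigned n) {
  std::uint64_t hi = static_cast<std::uint64_t>(y) << (n - 64);
  return static_cast<u128>(hi) << 64;  // low word is zero
}

// foo64 written word by word: the low word is 0 ^ lo(x), i.e. just lo(x)
// (a plain move), and only the high word needs an XOR with y.
constexpr u128 foo64_by_words(u128 x, long long y) {
  std::uint64_t lo = static_cast<std::uint64_t>(x);
  std::uint64_t hi = static_cast<std::uint64_t>(x >> 64) ^ static_cast<std::uint64_t>(y);
  return (static_cast<u128>(hi) << 64) | lo;
}
```

The same reasoning applies to the -m32 case with 32-bit words and a 64-bit doubleword.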
[Bug fortran/82943] [F03] Error with type-bound procedure of parametrized derived type
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82943

--- Comment #13 from Alexander Westbrooks ---
I sent in the patch to those emails. Hopefully now the ball will start rolling and I can slowly get this packaged into a legitimate fix. I'll post updates here as I receive them.

The patch is below, if you would like to try it. I did this in the GCC 14 code.

diff --git a/gcc/fortran/decl.cc b/gcc/fortran/decl.cc
index d09c8bc97d9..9043a4d427f 100644
--- a/gcc/fortran/decl.cc
+++ b/gcc/fortran/decl.cc
@@ -4063,6 +4063,21 @@ gfc_get_pdt_instance (gfc_actual_arglist *param_list, gfc_symbol **sym,
           continue;
         }
 
+      /*
+        Addressing PR82943, this will fix the issue where a function/subroutine is declared as not
+        a member of the PDT instance. The reason for this is because the PDT instance did not have
+        access to its template's f2k_derived namespace in order to find the typebound procedures.
+
+        The number of references to the PDT template's f2k_derived will ensure that f2k_derived is
+        properly freed later on.
+      */
+
+      if (!instance->f2k_derived && pdt->f2k_derived)
+        {
+          instance->f2k_derived = pdt->f2k_derived;
+          instance->f2k_derived->refs++;
+        }
+
       /* Set the component kind using the parameterized expression.  */
       if ((c1->ts.kind == 0 || c1->ts.type == BT_CHARACTER)
           && c1->kind_expr != NULL)
diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index a58c60e9828..6854edb3467 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -3536,6 +3536,7 @@ void gfc_traverse_gsymbol (gfc_gsymbol *, void (*)(gfc_gsymbol *, void *), void
 gfc_typebound_proc* gfc_get_typebound_proc (gfc_typebound_proc*);
 gfc_symbol* gfc_get_derived_super_type (gfc_symbol*);
 bool gfc_type_is_extension_of (gfc_symbol *, gfc_symbol *);
+bool gfc_pdt_is_instance_of(gfc_symbol *, gfc_symbol *);
 bool gfc_type_compatible (gfc_typespec *, gfc_typespec *);
 
 void gfc_copy_formal_args_intr (gfc_symbol *, gfc_intrinsic_sym *,
diff --git a/gcc/fortran/resolve.cc b/gcc/fortran/resolve.cc
index 50b49d0cb83..6af55760321 100644
--- a/gcc/fortran/resolve.cc
+++ b/gcc/fortran/resolve.cc
@@ -14705,14 +14705,34 @@ resolve_typebound_procedure (gfc_symtree* stree)
               goto error;
             }
 
-          if (CLASS_DATA (me_arg)->ts.u.derived
-              != resolve_bindings_derived)
-            {
-              gfc_error ("Argument %qs of %qs with PASS(%s) at %L must be of"
-                         " the derived-type %qs", me_arg->name, proc->name,
-                         me_arg->name, , resolve_bindings_derived->name);
-              goto error;
-            }
+          /* The derived type is not a PDT template.  Resolve as usual */
+          if ( !resolve_bindings_derived->attr.pdt_template &&
+               (CLASS_DATA (me_arg)->ts.u.derived != resolve_bindings_derived))
+            {
+              gfc_error ("Argument %qs of %qs with PASS(%s) at %L must be of"
+                         " the derived-type %qs", me_arg->name, proc->name,
+                         me_arg->name, , resolve_bindings_derived->name);
+              goto error;
+            }
+
+          if ( resolve_bindings_derived->attr.pdt_template &&
+               !gfc_pdt_is_instance_of(resolve_bindings_derived, CLASS_DATA(me_arg)->ts.u.derived) )
+            {
+              gfc_error ("Argument %qs of %qs with PASS(%s) at %L must be of"
+                         " the parametric derived-type %qs", me_arg->name, proc->name,
+                         me_arg->name, , resolve_bindings_derived->name);
+              goto error;
+            }
+
+          if ( resolve_bindings_derived->attr.pdt_template
+               && gfc_pdt_is_instance_of(resolve_bindings_derived, CLASS_DATA(me_arg)->ts.u.derived)
+               && (me_arg->param_list != NULL)
+               && (gfc_spec_list_type(me_arg->param_list, CLASS_DATA(me_arg)->ts.u.derived) != SPEC_ASSUMED))
+            {
+              gfc_error ("All LEN type parameters of the passed dummy argument %qs of %qs"
+                         " at %L must be ASSUMED.", me_arg->name, proc->name, );
+              goto error;
+            }
 
           gcc_assert (me_arg->ts.type == BT_CLASS);
           if (CLASS_DATA (me_arg)->as
               && CLASS_DATA (me_arg)->as->rank != 0)
diff --git a/gcc/fortran/symbol.cc b/gcc/fortran/symbol.cc
index 37a9e8fa0ae..77f84de0989 100644
--- a/gcc/fortran/symbol.cc
+++ b/gcc/fortran/symbol.cc
@@ -5134,6 +5134,35 @@ gfc_type_is_extension_of (gfc_symbol *t1, gfc_symbol *t2)
   return gfc_compare_derived_types (t1, t2);
 }
 
+/* Check if a parameterized derived type t2 is an instance of a PDT template t1 */
+
+bool
+gfc_pdt_is_instance_of(gfc_symbol *t1, gfc_symbol *t2)
+{
+  if ( !t1->attr.pdt_template || !t2->attr.pdt_type )
+    return false;
+
+  /*
+    in decl.cc, gfc_get_pdt_instance, a pdt instance is given a 3 character prefix "Pdt", followed
+    by an underscore list of the kind parameters, up to a maximum of 8.
+
+    So to check if a PDT Type corresponds to the template, extract the core derive_type name,
+    and then see if it is type compatible by name...
+
+    For example:
+
+    Pdtf_2_2 -> extract out the 'f' -> see if the derived type 'f' is compatible with symbol t1
+  */
+
+  // Starting at index 3 of
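The patch comment describes recognizing an instance by its generated name ("Pdt" + template name + one "_<kind>" group per kind parameter), but the implementation is cut off above. The sketch below is a hypothetical C++ illustration of such a name check, not the patch's code and not a gfortran API; it assumes the name format is exactly as the comment describes and does not confirm how gfortran capitalizes symbol names internally. One point worth noting: cutting the name at the first underscore would break for templates such as `tough_lvl_0` from the test program, so the sketch compares the whole template name and then requires only `_<digits>` groups.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helper: does instance_name follow the scheme
// "Pdt" + template_name + zero or more "_<digits>" groups?
constexpr bool looks_like_pdt_instance(std::string_view template_name,
                                       std::string_view instance_name) {
  constexpr std::string_view prefix = "Pdt";
  if (instance_name.substr(0, prefix.size()) != prefix)
    return false;
  instance_name.remove_prefix(prefix.size());

  // Match the whole template name rather than cutting at the first '_',
  // because names such as "tough_lvl_0" contain underscores themselves.
  if (instance_name.substr(0, template_name.size()) != template_name)
    return false;
  instance_name.remove_prefix(template_name.size());

  // Whatever remains must consist solely of "_<digits>" groups.
  while (!instance_name.empty()) {
    if (instance_name.front() != '_')
      return false;
    instance_name.remove_prefix(1);
    std::size_t digits = 0;
    while (digits < instance_name.size() && instance_name[digits] >= '0' &&
           instance_name[digits] <= '9')
      ++digits;
    if (digits == 0)
      return false;
    instance_name.remove_prefix(digits);
  }
  return true;
}
```

A name-only check remains a heuristic. For example, "Pdtf_2_2" matches both a template `f` and a template `f_2`, so comparing the symbols themselves would be more robust, if the instance keeps a link to its template.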
PR82943 - Suggested patch to fix
Hello,

I am new to the GFortran community. Over the past two weeks I created a patch that should fix PR82943 for GFortran. I have attached it to this email. The patch allows the code below to compile successfully. I am working on creating test cases next, but I am new to the process so it may take me some time. After I make test cases, do I email them to you as well? Do I need to make a pull-request on github in order to get the patch reviewed?

Thank you,
Alexander Westbrooks

module testmod

  public :: foo

  type, public :: tough_lvl_0(a, b)
    integer, kind :: a = 1
    integer, len :: b
  contains
    procedure :: foo
  end type

  type, public, EXTENDS(tough_lvl_0) :: tough_lvl_1 (c)
    integer, len :: c
  contains
    procedure :: bar
  end type

  type, public, EXTENDS(tough_lvl_1) :: tough_lvl_2 (d)
    integer, len :: d
  contains
    procedure :: foobar
  end type

contains

  subroutine foo(this)
    class(tough_lvl_0(1,*)), intent(inout) :: this
  end subroutine

  subroutine bar(this)
    class(tough_lvl_1(1,*,*)), intent(inout) :: this
  end subroutine

  subroutine foobar(this)
    class(tough_lvl_2(1,*,*,*)), intent(inout) :: this
  end subroutine

end module

PROGRAM testprogram
  USE testmod

  TYPE(tough_lvl_0(1,5)) :: test_pdt_0
  TYPE(tough_lvl_1(1,5,6)) :: test_pdt_1
  TYPE(tough_lvl_2(1,5,6,7)) :: test_pdt_2

  CALL test_pdt_0%foo()

  CALL test_pdt_1%foo()
  CALL test_pdt_1%bar()

  CALL test_pdt_2%foo()
  CALL test_pdt_2%bar()
  CALL test_pdt_2%foobar()
END PROGRAM testprogram

0001-bug-patch-PR82943.patch
Description: Binary data
[x86_64 PATCH] Handle SUBREG conversions in TImode STV (for ptest).
This patch teaches i386's STV pass how to handle SUBREG conversions, i.e. that a TImode SUBREG can be transformed into a V1TImode SUBREG, without worrying about other DEFs and USEs.

A motivating example where this is useful is

typedef long long __m128i __attribute__ ((__vector_size__ (16)));
int foo (__m128i x, __m128i y)
{
  return (__int128)x == (__int128)y;
}

where with -O2 -msse4 we can now scalar-to-vector transform:

(insn 7 4 8 2 (set (reg:CCZ 17 flags)
        (compare:CCZ (subreg:TI (reg/v:V2DI 86 [ x ]) 0)
            (subreg:TI (reg/v:V2DI 87 [ y ]) 0))) {*cmpti_doubleword}

into

(insn 17 4 7 2 (set (reg:V1TI 91)
        (xor:V1TI (subreg:V1TI (reg/v:V2DI 86 [ x ]) 0)
            (subreg:V1TI (reg/v:V2DI 87 [ y ]) 0)))
     (nil))
(insn 7 17 8 2 (set (reg:CCZ 17 flags)
        (unspec:CCZ [
                (reg:V1TI 91) repeated x2
            ] UNSPEC_PTEST)) {*sse4_1_ptestv1ti}
     (expr_list:REG_DEAD (reg/v:V2DI 87 [ y ])
        (expr_list:REG_DEAD (reg/v:V2DI 86 [ x ])
            (nil

with the dramatic effect that the assembly output before:

foo:    movaps  %xmm0, -40(%rsp)
        movq    -32(%rsp), %rdx
        movq    %xmm0, %rax
        movq    %xmm1, %rsi
        movaps  %xmm1, -24(%rsp)
        movq    -16(%rsp), %rcx
        xorq    %rsi, %rax
        xorq    %rcx, %rdx
        orq     %rdx, %rax
        sete    %al
        movzbl  %al, %eax
        ret

now becomes

foo:    pxor    %xmm1, %xmm0
        xorl    %eax, %eax
        ptest   %xmm0, %xmm0
        sete    %al
        ret

i.e. a 128-bit vector doesn't need to be transferred to the scalar unit to be tested for equality. The new test case includes additional related examples that show similar improvements.

Previously we explicitly checked *cmpti_doubleword operands to be either immediate constants, or a TImode REG or a TImode MEM. By enhancing this to allow a TImode SUBREG, we now handle everything that would match the general_operand predicate, making this part of STV more like other RTL passes (lra/reload). The big change is that unlike a regular DF USE, a SUBREG USE doesn't require us to analyze and convert the rest of the DEF-USE chain.
This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and make -k check, both with and without --target_board=unix{-m32} with no new failures. Ok for mainline?

2023-06-24  Roger Sayle

gcc/ChangeLog
        * config/i386/i386-features.cc (scalar_chain::add_insn): Don't call
        analyze_register_chain if the USE is a SUBREG.
        (timode_scalar_chain::convert_op): Call gen_lowpart to convert
        TImode SUBREGs to V1TImode SUBREGs.
        (convertible_comparison_p): We can now handle all general_operands
        of *cmp_doubleword.
        (timode_remove_non_convertible_regs): We only need to check TImode
        uses that aren't TImode SUBREGs of registers in other modes.

gcc/testsuite/ChangeLog
        * gcc.target/i386/sse4_1-ptest-7.c: New test case.

Thanks in advance,
Roger
--

diff --git a/gcc/config/i386/i386-features.cc b/gcc/config/i386/i386-features.cc
index 4a3b07a..6e9ba54 100644
--- a/gcc/config/i386/i386-features.cc
+++ b/gcc/config/i386/i386-features.cc
@@ -449,7 +449,8 @@ scalar_chain::add_insn (bitmap candidates, unsigned int insn_uid,
     return true;
 
   for (ref = DF_INSN_UID_USES (insn_uid); ref; ref = DF_REF_NEXT_LOC (ref))
-    if (!DF_REF_REG_MEM_P (ref))
+    if (DF_REF_TYPE (ref) == DF_REF_REG_USE
+        && !SUBREG_P (DF_REF_REG (ref)))
       if (!analyze_register_chain (candidates, ref, disallowed))
         return false;
 
@@ -1621,7 +1622,8 @@ timode_scalar_chain::convert_op (rtx *op, rtx_insn *insn)
   else
     {
       gcc_assert (SUBREG_P (*op));
-      gcc_assert (GET_MODE (*op) == vmode);
+      if (GET_MODE (*op) != V1TImode)
+        *op = gen_lowpart (V1TImode, *op);
     }
 }
 
@@ -1912,12 +1914,8 @@ convertible_comparison_p (rtx_insn *insn, enum machine_mode mode)
   rtx op2 = XEXP (src, 1);
 
   /* *cmp_doubleword.  */
-  if ((CONST_SCALAR_INT_P (op1)
-       || ((REG_P (op1) || MEM_P (op1))
-           && GET_MODE (op1) == mode))
-      && (CONST_SCALAR_INT_P (op2)
-          || ((REG_P (op2) || MEM_P (op2))
-              && GET_MODE (op2) == mode)))
+  if (general_operand (op1, mode)
+      && general_operand (op2, mode))
     return true;
 
   /* *testti_doubleword.  */
@@ -2244,8 +2242,9 @@ timode_remove_non_convertible_regs (bitmap candidates)
                                            DF_REF_REGNO (ref));
 
         FOR_EACH_INSN_USE (ref, insn)
-          if (!DF_REF_REG_MEM_P (ref)
-              && GET_MODE (DF_REF_REG (ref)) == TImode)
+          if (DF_REF_TYPE (ref) == DF_REF_REG_USE
+              && GET_MODE (DF_REF_REG (ref)) == TImode
+              && !SUBREG_P (DF_REF_REG (ref)))
             timode_check_non_convertible_regs (candidates, regs,
                                                DF_REF_REGNO (ref));
       }
diff --git a/gcc/testsuite/gcc.target/i386/sse4_1-ptest-7.c
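The reason pxor followed by ptest of the result against itself implements 128-bit equality can be shown with a portable model. PTEST sets ZF when the AND of its operands is all zeros, and d AND d is just d, so ZF is set exactly when x ^ y == 0, which is when x == y. The sketch below models a 128-bit value as two 64-bit halves (the struct and function names are illustrative). On SSE4.1 hardware the same pair corresponds to the `_mm_xor_si128` and `_mm_testz_si128` intrinsics.

```cpp
#include <cstdint>

// Portable model of a 128-bit value as two 64-bit halves (illustrative only).
struct V128 {
  std::uint64_t lo;
  std::uint64_t hi;
};

constexpr V128 xor128(V128 a, V128 b) { return V128{a.lo ^ b.lo, a.hi ^ b.hi}; }

// Models only the ZF output of PTEST: ZF is set when (a AND b) is all zeros.
constexpr bool ptest_zf(V128 a, V128 b) {
  return ((a.lo & b.lo) | (a.hi & b.hi)) == 0;
}

// "After" sequence: pxor, then ptest the difference against itself.
// d AND d == d, so ZF is set exactly when x ^ y == 0, i.e. when x == y.
constexpr bool equal_via_ptest(V128 x, V128 y) {
  V128 d = xor128(x, y);
  return ptest_zf(d, d);
}

// "Before" sequence: XOR each half in general registers, OR them, test for 0.
constexpr bool equal_via_scalar(V128 x, V128 y) {
  return ((x.lo ^ y.lo) | (x.hi ^ y.hi)) == 0;
}
```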
Re: [PATCH] RISC-V: Split VF iterators for Zvfh(min).
On 6/22/23 07:03, Robin Dapp wrote:
> Hi,
>
> when working on FP widening/narrowing I realized the Zvfhmin handling is not ideal right now: We use the "enabled" insn attribute to disable instructions not available with Zvfhmin (but only with Zvfh).
>
> However, "enabled == 0" only disables insn alternatives, in our case all of them when the mode is a HFmode. The insn itself remains available (e.g. for combine to match) and we end up with an insn without alternatives that reload cannot handle --> ICE.
>
> The proper solution is to disable the instruction for the respective mode altogether. This patch achieves this by splitting the VF as well as VWEXTF iterators into variants with TARGET_ZVFH and TARGET_VECTOR_ELEN_FP_16 (which is true when either TARGET_ZVFH or TARGET_ZVFHMIN are true). Also, VWCONVERTI, VHF and VHF_LMUL1 need adjustments.
>
> Regards
>  Robin
>
> gcc/ChangeLog:
>
>         * config/riscv/autovec.md: VF_AUTO -> VF.
>         * config/riscv/vector-iterators.md: Introduce VF_ZVFHMIN,
>         VWEXTF_ZVFHMIN and use TARGET_ZVFH in VWCONVERTI, VHF and
>         VHF_LMUL1.
>         * config/riscv/vector.md: Use new iterators.

OK for the trunk. Thanks for walking everyone through the issues here.

jeff
[Bug c++/110394] Lambda capture receives wrong value
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110394

Xi Ruoyao changed:

What |Removed |Added
CC   |        |xry111 at gcc dot gnu.org

--- Comment #4 from Xi Ruoyao ---
(In reply to Andrew Pinski from comment #3)
> You can also try -fno-lifetime-dse to see if you get the behavior you were
> expecting too. Though I am not sure it will help extend the lifetime of the
> temporary here ...
>
>
> https://gcc.gnu.org/onlinedocs/gcc-13.1.0/gcc/Optimize-Options.html#index-flifetime-dse

-fstack-reuse=named_vars may be needed as well. -fno-lifetime-dse preserves the stores outside of the lifetime, and -fstack-reuse=named_vars disallows reusing the stack space of the temporary object for other objects.
[Bug c++/110394] Lambda capture receives wrong value
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110394 --- Comment #3 from Andrew Pinski --- You can also try -fno-lifetime-dse to see if you get the behavior you were expecting too. Though I am not sure it will help extend the lifetime of the temporary here ... https://gcc.gnu.org/onlinedocs/gcc-13.1.0/gcc/Optimize-Options.html#index-flifetime-dse
[Bug c++/110394] Lambda capture receives wrong value
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110394

Andrew Pinski changed:

What             |Removed     |Added
Status           |UNCONFIRMED |WAITING
Last reconfirmed |            |2023-06-24
Ever confirmed   |0           |1

--- Comment #2 from Andrew Pinski ---
I almost think this is a bug in your code. Take:

  auto wait_handle = tc::g_postbox->wait(
      "UpdateInputs"sv, [=](const msgpack::object& obj) -> bool { });

The temporary for tc::postbox::acceptor_type will end its lifetime at the end of that statement, but tc::g_postbox->wait stores it off into m_awaiters, and it then gets popped off with:

  wait_handle.await();

You can fix this by extending the lifetime of the temporary:

```
tc::postbox::acceptor_type t = [=](const msgpack::object& obj) -> bool {
  auto [rcv_index, rcv_value] = obj.as>();
  tc::tracef(M64MSG_VERBOSE, "index = {}", index);
  if (rcv_index != index)
    return false;
  keys->Value = rcv_value;
  return true;
};
auto wait_handle = tc::g_postbox->wait(
    "UpdateInputs"sv, t);
```

Note `-fsanitize=address` should catch this at runtime too.
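To show the lifetime issue in isolation, here is a small C++ sketch. `Postbox` is a hypothetical stand-in for `tc::postbox`, whose real interface is not shown in the report. It assumes, as comment #2 suggests, that `wait()` keeps only a reference or pointer to the caller's acceptor. Only the safe pattern (a named acceptor) is executed. Passing the lambda directly would create a temporary `std::function` that dies at the end of the `wait()` statement, which is the dangling access that `-fsanitize=address` should report.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-in for tc::postbox. wait() keeps only a pointer to the
// caller's acceptor, so that object must still be alive when dispatch() runs.
class Postbox {
 public:
  using acceptor_type = std::function<bool(int)>;

  void wait(const acceptor_type& acceptor) { awaiters_.push_back(&acceptor); }

  // Offers a message to every registered acceptor; returns how many accepted it.
  int dispatch(int message) {
    int accepted = 0;
    for (const acceptor_type* acceptor : awaiters_)
      if ((*acceptor)(message))
        ++accepted;
    awaiters_.clear();
    return accepted;
  }

 private:
  std::vector<const acceptor_type*> awaiters_;
};

// The fix from comment #2: bind the lambda to a named acceptor_type so it
// lives until the end of this function, after dispatch() has used it.
// box.wait([=](int i) { ... }) would instead store a pointer to a temporary
// std::function destroyed at the end of that statement.
int wait_for_index(Postbox& box, int index, int incoming) {
  Postbox::acceptor_type acceptor = [=](int rcv_index) { return rcv_index == index; };
  box.wait(acceptor);
  return box.dispatch(incoming);
}
```

If the real `wait()` can be changed, storing the `acceptor_type` by value in `m_awaiters` would remove the lifetime requirement from callers altogether.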
[Bug gcov-profile/110395] New: GCOV stuck in an infinite loop with large std::array
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110395

            Bug ID: 110395
           Summary: GCOV stuck in an infinite loop with large std::array
           Product: gcc
           Version: 9.4.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: gcov-profile
          Assignee: unassigned at gcc dot gnu.org
          Reporter: carlosgalvezp at gmail dot com
                CC: marxin at gcc dot gnu.org
  Target Milestone: ---

Hi!

We are bumping from GCC 7.5.0 to GCC 9.4.0 (Ubuntu 20.04) and observe that GCOV is stuck when analyzing the following minimal repro code:

#include
#include

template
class StaticVector
{
 public:
    StaticVector() = default;
    void foo(){}

 private:
    std::array data{};
};

class Foo
{
    StaticVector, 4> data_{};
};

int main()
{
    Foo f;
    return 0;
}

$ g++ --coverage main.cpp
$ ./a.out
$ gcov main.cpp

The problem goes away if I remove the value initialization for std::array in the StaticVector class (i.e. I leave the member "data" uninitialized).

The same problem happens also on GCC 11.

What might be the reason for this?

Thanks!
[Bug c++/110394] Lambda capture receives wrong value
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110394

--- Comment #1 from jackyguo18 at hotmail dot com ---
Created attachment 55396
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55396=edit
.ii file which triggers the bug

I couldn't attach the original .ii file, so I had to compress it with gzip.
[Bug other/110394] New: Lambda capture receives wrong value
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110394

            Bug ID: 110394
           Summary: Lambda capture receives wrong value
           Product: gcc
           Version: 13.1.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: other
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jackyguo18 at hotmail dot com
  Target Milestone: ---

Note that this doesn't occur in Clang, and to my knowledge, disabling strict aliasing and overflow would make no difference.

The code submitted here is actually part of a larger library. When I go to debug it, a lambda in `GetKeys(int index, BUTTONS* keys)` captures the wrong value for `index` (it should be 0, but it's 23). Changing the capture type from value to reference causes the lambda to inexplicably call the address 0x17 (decimal 23).
[Bug tree-optimization/110389] [12/13/14 Regression] wrong code at -Os and -O2 with "-fno-tree-ch -fno-expensive-optimizations -fno-ivopts -fno-tree-loop-ivcanon" on x86_64-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110389

Andrew Pinski changed:

What             |Removed |Added
Status           |UNCONFIRMED |NEW
Target Milestone |--- |12.4
Ever confirmed   |0 |1
Summary          |wrong code at -Os and -O2 with "-fno-tree-ch -fno-expensive-optimizations -fno-ivopts -fno-tree-loop-ivcanon" on x86_64-linux-gnu |[12/13/14 Regression] wrong code at -Os and -O2 with "-fno-tree-ch -fno-expensive-optimizations -fno-ivopts -fno-tree-loop-ivcanon" on x86_64-linux-gnu
Last reconfirmed |  |2023-06-24

--- Comment #1 from Andrew Pinski ---
Something goes really wrong in DOM3.

  _7 = e.5_26 + 1;
  if (_7 <= 2)
    goto ; [89.57%]
  else
    goto ; [10.43%]

is optimized to always true.
[Bug rtl-optimization/110391] [12/13/14 Regression] wrong code at -O2 and -O3 with "-fsel-sched-pipelining -fselective-scheduling2" on x86_64-linux-gnu
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110391

Andrew Pinski changed:

What             |Removed |Added
Summary          |wrong code at -O2 and -O3 with "-fsel-sched-pipelining -fselective-scheduling2" on x86_64-linux-gnu |[12/13/14 Regression] wrong code at -O2 and -O3 with "-fsel-sched-pipelining -fselective-scheduling2" on x86_64-linux-gnu
Version          |unknown |14.0
See Also         |  |https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95123
Target Milestone |--- |12.4
[Bug tree-optimization/110392] [13/14 Regression] ICE at -O3 with "-O3 -Wall -fno-tree-vrp -fno-tree-dominator-opts -fno-tree-copy-prop -fno-tree-fre -fno-tree-ccp -fno-tree-forwprop": in find_var_cmp
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110392

Andrew Pinski changed:

What             |Removed     |Added
Ever confirmed   |0           |1
Status           |UNCONFIRMED |NEW
Last reconfirmed |            |2023-06-24

--- Comment #1 from Andrew Pinski ---
Confirmed.
[Bug tree-optimization/110392] ICE at -O3 with "-O3 -Wall -fno-tree-vrp -fno-tree-dominator-opts -fno-tree-copy-prop -fno-tree-fre -fno-tree-ccp -fno-tree-forwprop": in find_var_cmp_const, at gimple-p
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110392

Andrew Pinski changed:

What             |Removed |Added
Summary          |ICE at -O3 with "-w -O3 -Wall -fno-tree-vrp -fno-tree-dominator-opts -fno-tree-copy-prop -fno-tree-fre -fno-tree-ccp -fno-tree-forwprop": in find_var_cmp_const, at gimple-predicate-analysis.cc:257 |ICE at -O3 with "-O3 -Wall -fno-tree-vrp -fno-tree-dominator-opts -fno-tree-copy-prop -fno-tree-fre -fno-tree-ccp -fno-tree-forwprop": in find_var_cmp_const, at gimple-predicate-analysis.cc:257
Version          |unknown |14.0
Target Milestone |--- |13.2
Keywords         |  |ice-on-valid-code
Re: [PATCH v7 0/6] c++, libstdc++: get std::is_object to dispatch to new built-in traits
On Tue, Jun 20, 2023 at 8:32 AM Patrick Palka wrote:
>
> On Thu, 15 Jun 2023, Ken Matsui via Libstdc++ wrote:
>
> > Hi,
> >
> > For those curious about the performance improvements of this patch, I
> > conducted a benchmark that instantiates 256k specializations of
> > is_object_v based on Patrick's code. You can find the benchmark code
> > at this link:
> >
> > https://github.com/ken-matsui/gcc-benches/blob/main/is_object_benchmark.cc
> >
> > On my computer, using the gcc HEAD of this patch for a release build,
> > the patch with -DUSE_BUILTIN took 64% less time and used 44-47% less
> > memory compared to not using it.
>
> That's more like it :D Though the benchmark should also invoke the
> trait on non-object types too, e.g. Instantiator& or Instantiator(int).

Here is the updated benchmark:

https://github.com/ken-matsui/gcc-benches/blob/main/is_object.md#sat-jun-24-080110-am-pdt-2023

Time: -74.7544%
Peak Memory Usage: -62.5913%
Total Memory Usage: -64.2708%

> >
> > Sincerely,
> > Ken Matsui
> >
> > On Mon, Jun 12, 2023 at 3:49 PM Ken Matsui
> > wrote:
> > >
> > > Hi,
> > >
> > > This patch series gets std::is_object to dispatch to built-in traits and
> > > implements the following built-in traits, on which std::is_object depends.
> > >
> > > * __is_reference
> > > * __is_function
> > > * __is_void
> > >
> > > std::is_object was depending on them with disjunction and negation.
> > >
> > > __not_<__or_, is_reference<_Tp>, is_void<_Tp>>>::type
> > >
> > > Therefore, this patch uses them directly instead of implementing an
> > > additional built-in trait __is_object, which makes the compiler slightly
> > > bigger and slower.
> > >
> > > __bool_constant
> > > __is_void(_Tp))>
> > >
> > > This would instantiate only __bool_constant and
> > > __bool_constant,
> > > which can be mostly shared. That is, the purpose of built-in traits is
> > > considered as achieved.
> > >
> > > Changes in v7
> > >
> > > * Removed an unnecessary new line.
> > >
> > > Ken Matsui (6):
> > >   c++: implement __is_reference built-in trait
> > >   libstdc++: use new built-in trait __is_reference for std::is_reference
> > >   c++: implement __is_function built-in trait
> > >   libstdc++: use new built-in trait __is_function for std::is_function
> > >   c++, libstdc++: implement __is_void built-in trait
> > >   libstdc++: make std::is_object dispatch to new built-in traits
> > >
> > >  gcc/cp/constraint.cc                          |  9 +++
> > >  gcc/cp/cp-trait.def                           |  3 +
> > >  gcc/cp/semantics.cc                           | 12
> > >  gcc/testsuite/g++.dg/ext/has-builtin-1.C      |  9 +++
> > >  gcc/testsuite/g++.dg/ext/is_function.C        | 58 +++
> > >  gcc/testsuite/g++.dg/ext/is_reference.C       | 34 +++
> > >  gcc/testsuite/g++.dg/ext/is_void.C            | 35 +++
> > >  gcc/testsuite/g++.dg/tm/pr46567.C             |  6 +-
> > >  libstdc++-v3/include/bits/cpp_type_traits.h   | 15 -
> > >  libstdc++-v3/include/debug/helper_functions.h |  5 +-
> > >  libstdc++-v3/include/std/type_traits          | 51
> > >  11 files changed, 216 insertions(+), 21 deletions(-)
> > >  create mode 100644 gcc/testsuite/g++.dg/ext/is_function.C
> > >  create mode 100644 gcc/testsuite/g++.dg/ext/is_reference.C
> > >  create mode 100644 gcc/testsuite/g++.dg/ext/is_void.C
> > >
> > > --
> > > 2.41.0
> > >
> > >
> >
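The cover letter's formulation can be sketched as follows: is_object is the negation of "function, reference or void", computed directly from the built-ins instead of through `__not_<__or_<...>>`, which avoids instantiating the intermediate class templates. This is a minimal sketch, not the libstdc++ code. `is_object_sketch` and the `SKETCH_*` macro are illustrative names. The guard assumes GCC reports these keywords through `__has_builtin`, as the has-builtin-1.C tests in these patches check. Limiting the built-in path to GCC is a simplification, and every other compiler falls back to the library traits.

```cpp
#include <type_traits>

// Use the built-ins only where GCC reports all three via __has_builtin;
// otherwise fall back to the standard library traits.
#if defined(__GNUC__) && !defined(__clang__) && defined(__has_builtin)
#  if __has_builtin(__is_function) && __has_builtin(__is_reference) && \
      __has_builtin(__is_void)
#    define SKETCH_HAVE_TYPE_TRAIT_BUILTINS 1
#  endif
#endif

// An object type is any type that is not a function, a reference, or
// (possibly cv-qualified) void.
template <typename T>
inline constexpr bool is_object_sketch =
#ifdef SKETCH_HAVE_TYPE_TRAIT_BUILTINS
    !(__is_function(T) || __is_reference(T) || __is_void(T));
#else
    !(std::is_function_v<T> || std::is_reference_v<T> || std::is_void_v<T>);
#endif

// Usage: the non-object cases from the review (references, function types).
static_assert(is_object_sketch<int> && is_object_sketch<int*>);
static_assert(!is_object_sketch<int&> && !is_object_sketch<void> && !is_object_sketch<int(int)>);
```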
Re: [PATCH] RISCV: Add -m(no)-omit-leaf-frame-pointer support.
On 6/21/23 02:14, Wang, Yanzhang wrote:

Hi Jeff, sorry for the late reply.

The long branch handling is done at the assembler level. So the clobbering of $ra isn't visible to the compiler. Thus the compiler has to be extremely careful to not hold values in $ra because the assembler may clobber $ra.

If the assembler will modify $ra, it seems the rules we defined in riscv.cc will be ignored. For example, the $ra saving generated by this patch may be modified by the assembler, and everything else that depends on it will be wrong. So implementing the long jump in the compiler is better.

Basically correct. The assembler potentially clobbers $ra. That's why in the long jump patches $ra becomes a fixed register: the compiler doesn't know when it's clobbered by the assembler.

Even if this were done in the compiler, we'd still have to do something special with $ra. The point at which decisions about register allocation and such are made is before the point where we know the final positions of jumps/labels. It's a classic problem in GCC's design.

If you're not going to use dwarf, then my recommendation is to ensure that the data you need is *always* available in the stack at known offsets. That will mean your code isn't optimized as well. It means hand written assembly code has to follow the conventions, you can't link against libraries that do not follow those conventions, etc etc. But that's the price you pay for not using dwarf (or presumably ORC/SFRAME which I haven't studied in detail).

Yes. That's right. All the libraries need to follow the same logic. But as you said, this is the price if we choose this solution. And fortunately, this will only be used in special scenarios.

The key point is you want the location of the return pointer to be consistent in every function and you want to know that every function has a frame pointer. Otherwise you end up having to either consult on-the-side tables (at which point you might as well look at ORC/SFRAME) or disassemble code in the executable to deduce where to find fp, ra, etc (which is a path to madness).

Thus for the usage scenario you're looking at, I would recommend always having a frame pointer in every function, no matter how trivial, and that $ra always be saved into a suitable slot relative to the frame pointer, again, no matter how trivial the function.

And Jeff, do you have any other comments about this patch? Should we add some descriptions somewhere in the doc?

We may need to adjust the documentation a bit since I think I'm suggesting slight changes in the behavior of existing -m options. I'd like to see an updated patch before commenting further on implementation details.

jeff
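To make the "consistent location in every function" requirement concrete, here is a minimal simulated frame-pointer walk. The frame layout (ra saved at fp - 8, the caller's fp at fp - 16) is an assumption modeled on the usual RV64 frame-pointer layout, and memory is a `std::map` rather than a real stack. All names are hypothetical. The point is only that the walk succeeds exactly when every frame on the chain follows the same convention. A single function that skips the frame pointer or the $ra save ends the backtrace, which is why the advice above is to keep both in every function, however trivial.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Simulated memory for the sketch: address -> 64-bit word.
using Memory = std::map<std::uint64_t, std::uint64_t>;

// Assumed frame convention (hypothetical, modeled on the common RV64 layout
// with a frame pointer): ra is saved at fp - 8 and the caller's fp at fp - 16.
constexpr std::uint64_t kSavedRaOffset = 8;
constexpr std::uint64_t kSavedFpOffset = 16;

// Follows the frame-pointer chain and collects return addresses. The walk
// stops at fp == 0 (outermost frame), when a slot is missing (a function did
// not follow the convention), or after max_depth frames.
std::vector<std::uint64_t> walk_frame_chain(const Memory& mem, std::uint64_t fp,
                                            std::size_t max_depth = 64) {
  std::vector<std::uint64_t> return_addresses;
  while (fp != 0 && return_addresses.size() < max_depth) {
    auto ra = mem.find(fp - kSavedRaOffset);
    auto caller_fp = mem.find(fp - kSavedFpOffset);
    if (ra == mem.end() || caller_fp == mem.end())
      break;
    return_addresses.push_back(ra->second);
    fp = caller_fp->second;
  }
  return return_addresses;
}
```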
Re: [PATCH V1] RISC-V:Add float16 tuple type abi
On 6/21/23 01:46, juzhe.zh...@rivai.ai wrote: LGTM. Thanks. OK from me as well. jeff
Re: [PATCH v2 1/2] c++: implement __is_volatile built-in trait
Here is the benchmark result for is_volatile:

https://github.com/ken-matsui/gcc-benches/blob/main/is_volatile.md#sat-jun-24-074036-am-pdt-2023

Time: -2.42335%
Peak Memory Usage: -1.07651%
Total Memory Usage: -1.62369%

On Sat, Jun 24, 2023 at 7:24 AM Ken Matsui wrote:
>
> This patch implements built-in trait for std::is_volatile.
>
> gcc/cp/ChangeLog:
>
>         * cp-trait.def: Define __is_volatile.
>         * constraint.cc (diagnose_trait_expr): Handle CPTK_IS_VOLATILE.
>         * semantics.cc (trait_expr_value): Likewise.
>         (finish_trait_expr): Likewise.
>
> gcc/testsuite/ChangeLog:
>
>         * g++.dg/ext/has-builtin-1.C: Test existence of __is_volatile.
>         * g++.dg/ext/is_volatile.C: New test.
>
> Signed-off-by: Ken Matsui
> ---
>  gcc/cp/constraint.cc                     |  3 +++
>  gcc/cp/cp-trait.def                      |  1 +
>  gcc/cp/semantics.cc                      |  4
>  gcc/testsuite/g++.dg/ext/has-builtin-1.C |  3 +++
>  gcc/testsuite/g++.dg/ext/is_volatile.C   | 19 +++
>  5 files changed, 30 insertions(+)
>  create mode 100644 gcc/testsuite/g++.dg/ext/is_volatile.C
>
> diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
> index 8cf0f2d0974..e971d67ee25 100644
> --- a/gcc/cp/constraint.cc
> +++ b/gcc/cp/constraint.cc
> @@ -3751,6 +3751,9 @@ diagnose_trait_expr (tree expr, tree args)
>      case CPTK_IS_UNION:
>        inform (loc, " %qT is not a union", t1);
>        break;
> +    case CPTK_IS_VOLATILE:
> +      inform (loc, " %qT is not a volatile type", t1);
> +      break;
>      case CPTK_IS_AGGREGATE:
>        inform (loc, " %qT is not an aggregate", t1);
>        break;
> diff --git a/gcc/cp/cp-trait.def b/gcc/cp/cp-trait.def
> index 8b7fece0cc8..414b1065a11 100644
> --- a/gcc/cp/cp-trait.def
> +++ b/gcc/cp/cp-trait.def
> @@ -82,6 +82,7 @@ DEFTRAIT_EXPR (IS_TRIVIALLY_ASSIGNABLE, "__is_trivially_assignable", 2)
>  DEFTRAIT_EXPR (IS_TRIVIALLY_CONSTRUCTIBLE, "__is_trivially_constructible", -1)
>  DEFTRAIT_EXPR (IS_TRIVIALLY_COPYABLE, "__is_trivially_copyable", 1)
>  DEFTRAIT_EXPR (IS_UNION, "__is_union", 1)
> +DEFTRAIT_EXPR (IS_VOLATILE, "__is_volatile", 1)
>  DEFTRAIT_EXPR (REF_CONSTRUCTS_FROM_TEMPORARY, "__reference_constructs_from_temporary", 2)
>  DEFTRAIT_EXPR (REF_CONVERTS_FROM_TEMPORARY, "__reference_converts_from_temporary", 2)
>  /* FIXME Added space to avoid direct usage in GCC 13.  */
> diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
> index 8fb47fd179e..10934d01504 100644
> --- a/gcc/cp/semantics.cc
> +++ b/gcc/cp/semantics.cc
> @@ -12079,6 +12079,9 @@ trait_expr_value (cp_trait_kind kind, tree type1, tree type2)
>      case CPTK_IS_ENUM:
>        return type_code1 == ENUMERAL_TYPE;
>
> +    case CPTK_IS_VOLATILE:
> +      return CP_TYPE_VOLATILE_P (type1);
> +
>      case CPTK_IS_FINAL:
>        return CLASS_TYPE_P (type1) && CLASSTYPE_FINAL (type1);
>
> @@ -12296,6 +12299,7 @@ finish_trait_expr (location_t loc, cp_trait_kind kind, tree type1, tree type2)
>      case CPTK_IS_ENUM:
>      case CPTK_IS_UNION:
>      case CPTK_IS_SAME:
> +    case CPTK_IS_VOLATILE:
>        break;
>
>      case CPTK_IS_LAYOUT_COMPATIBLE:
> diff --git a/gcc/testsuite/g++.dg/ext/has-builtin-1.C b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
> index f343e153e56..7ad640f141b 100644
> --- a/gcc/testsuite/g++.dg/ext/has-builtin-1.C
> +++ b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
> @@ -146,3 +146,6 @@
>  #if !__has_builtin (__remove_cvref)
>  # error "__has_builtin (__remove_cvref) failed"
>  #endif
> +#if !__has_builtin (__is_volatile)
> +# error "__has_builtin (__is_volatile) failed"
> +#endif
> diff --git a/gcc/testsuite/g++.dg/ext/is_volatile.C b/gcc/testsuite/g++.dg/ext/is_volatile.C
> new file mode 100644
> index 000..004e397e5e7
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/ext/is_volatile.C
> @@ -0,0 +1,19 @@
> +// { dg-do compile { target c++11 } }
> +
> +#include
> +
> +using namespace __gnu_test;
> +
> +#define SA(X) static_assert((X),#X)
> +
> +// Positive tests.
> +SA(__is_volatile(volatile int));
> +SA(__is_volatile(const volatile int));
> +SA(__is_volatile(vClassType));
> +SA(__is_volatile(cvClassType));
> +
> +// Negative tests.
> +SA(!__is_volatile(int));
> +SA(!__is_volatile(const int));
> +SA(!__is_volatile(ClassType));
> +SA(!__is_volatile(cClassType));
> --
> 2.41.0
>
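The new trait is implemented as `CP_TYPE_VOLATILE_P (type1)`, so it looks only at the top-level qualifiers of the type itself. A pointer to volatile is therefore not a volatile type, while a volatile pointer is. The sketch below uses the same fallback approach as a library wrapper might. `is_volatile_sketch` and the `SKETCH_*` macro are illustrative names. Restricting the built-in path to GCC is a simplification, and the pointer cases are an extra illustration rather than part of the patch's test.

```cpp
#include <type_traits>

// Prefer GCC's __is_volatile built-in when __has_builtin reports it;
// otherwise fall back to std::is_volatile_v.
#if defined(__GNUC__) && !defined(__clang__) && defined(__has_builtin)
#  if __has_builtin(__is_volatile)
#    define SKETCH_HAVE_IS_VOLATILE_BUILTIN 1
#  endif
#endif

template <typename T>
inline constexpr bool is_volatile_sketch =
#ifdef SKETCH_HAVE_IS_VOLATILE_BUILTIN
    __is_volatile(T);
#else
    std::is_volatile_v<T>;
#endif

// Only top-level qualification counts: a volatile pointer is volatile,
// a pointer to volatile int is not.
static_assert(is_volatile_sketch<volatile int> && is_volatile_sketch<int* volatile>);
static_assert(!is_volatile_sketch<int> && !is_volatile_sketch<volatile int*>);
```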
Re: [PATCH] GIMPLE_FOLD: Apply LEN_MASK_{LOAD, STORE} into GIMPLE_FOLD
On 6/23/23 07:48, juzhe.zh...@rivai.ai wrote:
> From: Ju-Zhe Zhong
>
> Hi, since we are going to add LEN_MASK_{LOAD,STORE} to the loop vectorizer: currently,
>
> 1. we can fold MASK_{LOAD,STORE} into MEM when the mask is all ones;
> 2. we can fold LEN_{LOAD,STORE} into MEM when (len - bias) is VF.
>
> Now, I think it makes sense to also support folding LEN_MASK_{LOAD,STORE} into MEM when both the mask is all ones and (len - bias) is VF.
>
> gcc/ChangeLog:
>
>         * gimple-fold.cc (arith_overflowed_p): Apply LEN_MASK_{LOAD,STORE}.
>         (gimple_fold_partial_load_store_mem_ref): Ditto.
>         (gimple_fold_partial_store): Ditto.
>         (gimple_fold_call): Ditto.

OK

jeff
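As a rough model of the proposed fold, the sketch below treats a length- and mask-controlled load as a scalar loop and checks when it degenerates into a plain full-vector load. The function names and parameters are hypothetical, and the sketch does not reproduce GCC's internal-function operands. The effective length is taken as `len - bias` only because that is how the message above states the condition, not as a statement of the exact internal-function semantics.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Scalar model of a length- and mask-controlled vector load. A lane is loaded
// when its index is below the effective length (len - bias) and its mask bit
// is set. Inactive lanes get else_value here; in general their contents are
// unspecified, and inactive lanes never touch memory.
template <std::size_t VF>
std::array<int, VF> len_mask_load(const int* mem, const std::array<bool, VF>& mask,
                                  long len, long bias, int else_value = 0) {
  const long active = len - bias;
  std::array<int, VF> result{};
  for (std::size_t i = 0; i < VF; ++i)
    result[i] = (static_cast<long>(i) < active && mask[i]) ? mem[i] : else_value;
  return result;
}

// The fold condition from the message: every mask lane is set and
// len - bias == VF, so the partial load reads exactly VF elements, just like
// an unconditional full-vector load (a plain MEM in GIMPLE).
template <std::size_t VF>
bool foldable_to_full_load(const std::array<bool, VF>& mask, long len, long bias) {
  if (len - bias != static_cast<long>(VF))
    return false;
  for (bool m : mask)
    if (!m)
      return false;
  return true;
}
```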
RE: [PATCH] RISC-V: Refactor the integer ternary autovec pattern
Committed, thanks Jeff.

Pan

-----Original Message-----
From: Gcc-patches On Behalf Of Jeff Law via Gcc-patches
Sent: Saturday, June 24, 2023 10:04 PM
To: Juzhe-Zhong ; gcc-patches@gcc.gnu.org
Cc: kito.ch...@sifive.com; pal...@rivosinc.com; rdapp@gmail.com
Subject: Re: [PATCH] RISC-V: Refactor the integer ternary autovec pattern

On 6/21/23 16:38, Juzhe-Zhong wrote:
> A long time ago, I encountered an ICE when trying to set the clobber register to Pmode, and I have forgotten the reason.
>
> So I clobbered an SI scratch and used PUT_MODE to make it Pmode after reload, which made the patterns look unreasonable.
>
> Following Jeff's comments, I tried again: setting the clobber register to Pmode works now, and the patterns look more reasonable.
>
> All tests pass. OK for trunk?
>
> gcc/ChangeLog:
>
>         * config/riscv/autovec.md (*fma): Set clobber to Pmode in
>         expand stage.
>         (*fma): Ditto.
>         (*fnma): Ditto.
>         (*fnma): Ditto.
OK
jeff
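For orientation, these are integer loops of the multiply-add and multiply-subtract shape that combined ternary vector patterns such as the `*fma`/`*fnma` ones above are meant to cover. This is an illustrative sketch only: the function names are hypothetical, and whether a particular GCC build actually selects those RISC-V patterns for these loops is not asserted here.

```cpp
#include <cstddef>
#include <cstdint>

// a[i] = a[i] + b[i] * c[i]: multiply, then accumulate.
void mul_add(std::int32_t* a, const std::int32_t* b, const std::int32_t* c,
             std::size_t n) {
  for (std::size_t i = 0; i < n; ++i)
    a[i] = a[i] + b[i] * c[i];
}

// a[i] = a[i] - b[i] * c[i]: the product is negated before accumulation.
void neg_mul_add(std::int32_t* a, const std::int32_t* b, const std::int32_t* c,
                 std::size_t n) {
  for (std::size_t i = 0; i < n; ++i)
    a[i] = a[i] - b[i] * c[i];
}
```

Applying `neg_mul_add` after `mul_add` with the same `b` and `c` restores the original `a`, which makes the pair easy to sanity-check.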
RE: [PATCH V3] RISC-V: Support RVV floating-point auto-vectorization
Committed, thanks Jeff.

Pan

-----Original Message-----
From: Gcc-patches On Behalf Of Jeff Law via Gcc-patches
Sent: Saturday, June 24, 2023 10:06 PM
To: Juzhe-Zhong ; gcc-patches@gcc.gnu.org
Cc: kito.ch...@sifive.com; pal...@rivosinc.com; rdapp@gmail.com
Subject: Re: [PATCH V3] RISC-V: Support RVV floating-point auto-vectorization

On 6/21/23 09:53, Juzhe-Zhong wrote:
> This patch adds RVV floating-point auto-vectorization.
> It also fixes an attribute bug in the floating-point ternary operations in vector.md.
>
> gcc/ChangeLog:
>
>         * config/riscv/autovec.md (fma4): New pattern.
>         (*fma): Ditto.
>         (fnma4): Ditto.
>         (*fnma): Ditto.
>         (fms4): Ditto.
>         (*fms): Ditto.
>         (fnms4): Ditto.
>         (*fnms): Ditto.
>         * config/riscv/riscv-protos.h (emit_vlmax_fp_ternary_insn): New
>         function.
>         * config/riscv/riscv-v.cc (emit_vlmax_fp_ternary_insn): Ditto.
>         * config/riscv/vector.md: Fix attribute bug.
OK. Thanks for digging into that clobber issue.

Jeff
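The four pattern families in the ChangeLog above (fma, fnma, fms, fnms) follow GCC's standard fused multiply-add names. Here is a scalar sketch of their per-element meaning, using `std::fma` so each result is rounded only once. The `ref_*` function names are hypothetical and exist only for illustration.

```cpp
#include <cmath>

// Per-element reference semantics, each with a single rounding.
double ref_fma(double a, double b, double c) { return std::fma(a, b, c); }    //  a*b + c
double ref_fnma(double a, double b, double c) { return std::fma(-a, b, c); }  // -(a*b) + c
double ref_fms(double a, double b, double c) { return std::fma(a, b, -c); }   //  a*b - c
double ref_fnms(double a, double b, double c) { return std::fma(-a, b, -c); } // -(a*b) - c
```

With a = 2, b = 3 and c = 4, the four functions yield 10, -2, 2 and -10 respectively.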
Re: [PATCH v2] RISC-V: Implement autovec copysign.
On 6/21/23 08:24, 钟居哲 wrote:
> LGTM.
Likewise. OK for the trunk.

jeff
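As a reminder of what a vectorized copysign computes, here is the element-wise loop shape such a pattern targets. Each result takes the magnitude of `x[i]` and the sign of `y[i]`, including the sign of a negative zero. `copysign_loop` is a hypothetical name, and this sketch does not claim which instructions the RISC-V backend emits for it.

```cpp
#include <cmath>
#include <cstddef>

// out[i] = |x[i]| carrying the sign bit of y[i].
void copysign_loop(double* out, const double* x, const double* y,
                   std::size_t n) {
  for (std::size_t i = 0; i < n; ++i)
    out[i] = std::copysign(x[i], y[i]);
}
```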
Re: [PATCH][RFC] middle-end/110237 - wrong MEM_ATTRs for partial loads/stores
On 6/22/23 00:39, Richard Biener wrote:
> I suspect there's no way to specify the desired semantics? OTOH code that looks at the MEM operand only and not the insn (which should have some UNSPEC wrapped) needs to be conservative, so maybe the alias code shouldn't assume that a (mem:V16SI ..) actually performs an access of the size of V16SI at the specified location?
I'm not aware of a way to express the semantics fully right now. We'd need some way to indicate that the MEM is a partial access and to pass along the actual length. We could do both through MEM_ATTRS with some work.

For example, we could declare that for vector modes the full semantic information is carried in the MEM_ATTRS rather than by the mode itself. That falls into a space between how we currently think of something like V16SI and BLK: the mode specifies a maximum size and how to interpret the elements, but the actual size and perhaps mask info would be found in MEM_ATTRS.

jeff
[Bug ada/105212] -gnatwu gives false error message for certain arrays.
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105212 --- Comment #2 from Honki Tonk --- The error still occurs with version 13.1.
[PATCH v2 2/2] libstdc++: use new built-in trait __is_volatile
This patch lets libstdc++ use new built-in trait __is_volatile. libstdc++-v3/ChangeLog: * include/std/type_traits (is_volatile): Use __is_volatile built-in trait. (is_volatile_v): Likewise. Signed-off-by: Ken Matsui --- libstdc++-v3/include/std/type_traits | 13 + 1 file changed, 13 insertions(+) diff --git a/libstdc++-v3/include/std/type_traits b/libstdc++-v3/include/std/type_traits index 0e7a9c9c7f3..db74b884b35 100644 --- a/libstdc++-v3/include/std/type_traits +++ b/libstdc++-v3/include/std/type_traits @@ -773,6 +773,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION : public true_type { }; /// is_volatile +#if __has_builtin(__is_volatile) + template +struct is_volatile +: public __bool_constant<__is_volatile(_Tp)> +{ }; +#else template struct is_volatile : public false_type { }; @@ -780,6 +786,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION template struct is_volatile<_Tp volatile> : public true_type { }; +#endif /// is_trivial template @@ -3214,10 +3221,16 @@ template inline constexpr bool is_const_v = false; template inline constexpr bool is_const_v = true; + +#if __has_builtin(__is_volatile) +template + inline constexpr bool is_volatile_v = __is_volatile(_Tp); +#else template inline constexpr bool is_volatile_v = false; template inline constexpr bool is_volatile_v = true; +#endif template inline constexpr bool is_trivial_v = __is_trivial(_Tp); -- 2.41.0
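A self-contained sketch of the dispatch pattern the libstdc++ patch uses: when `__has_builtin` reports the compiler trait, use it; otherwise fall back to partial specialization on `T volatile`. The names `my_is_volatile` and `MY_HAVE_IS_VOLATILE_BUILTIN` are hypothetical. The sketch also uses `std::integral_constant` instead of libstdc++'s internal `__bool_constant`. The outer `defined(__has_builtin)` guard keeps compilers without `__has_builtin` on the fallback path.

```cpp
#include <type_traits>

// Detect the built-in trait only if __has_builtin itself is available.
#if defined(__has_builtin)
#  if __has_builtin(__is_volatile)
#    define MY_HAVE_IS_VOLATILE_BUILTIN 1
#  endif
#endif

#ifdef MY_HAVE_IS_VOLATILE_BUILTIN
// Built-in path: the compiler answers directly, no specialization matching.
template <typename T>
struct my_is_volatile : std::integral_constant<bool, __is_volatile(T)> {};
#else
// Library-only fallback: the primary template says "no" ...
template <typename T>
struct my_is_volatile : std::false_type {};
// ... and a partial specialization catches top-level volatile.
template <typename T>
struct my_is_volatile<T volatile> : std::true_type {};
#endif
```

Either path looks only at top-level qualification. `int* volatile` is volatile, while `volatile int*` (a pointer to volatile) is not.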
Re: [PATCH] Improve DSE to handle stores before __builtin_unreachable ()
On 6/22/23 07:42, Jan Hubicka wrote:
>> On 6/22/23 00:31, Richard Biener wrote:
>>> I think there's a difference in that __builtin_trap () is observable while __builtin_unreachable () is not, and reaching __builtin_unreachable () invokes undefined behavior while reaching __builtin_trap () does not.
>>>
>>> So the isolation code marking the trapping code volatile should be enough, and the trap () is just there to end the basic block (and maybe be on the safe side to really trap).
>> Agreed WRT observability -- but that's not really the point of the trap, and if we wanted we could change that behavior.
>>
>> The trap is there to halt execution immediately rather than letting it keep running. That was a design decision from a security standpoint. If we've detected that we're executing undefined behavior, stop rather than potentially letting a malicious actor turn a bug into an exploit.
> Also, as discussed some time ago, the volatile loads between traps have the effect of turning previously pure/const functions into non-const ones, which is somewhat sad, so it is still on my todo list to change it this stage1 to something more careful. We discussed internal functions trap_store and trap_load which will expand to load/store + trap but will make it clear that the side effect does not count to modref.
It's been a long time since I looked at this code -- isn't it the case that we already must have had a load/store, and that all we've done is change its form (to enable more DCE) and add the volatile marker? Meaning that it couldn't have been pure/const before, could it?

Or is it the case that you want to not have the erroneous path be the sole reason to spoil pure/const detection -- does that happen often in practice?

jeff
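A minimal illustration of the situation the DSE patch targets, using the GCC/Clang built-ins. In `set_then_unreachable`, the store to `g` lies on a path that ends in `__builtin_unreachable ()`. Reaching that point is already undefined behavior, which is what makes the store a candidate for removal. `set_then_trap` shows the `__builtin_trap ()` variant whose different treatment is debated above. Whether a given GCC version actually deletes either store is not asserted here. Both function names are hypothetical.

```cpp
int g = 0;

// The store is only reachable on a path that ends in undefined behavior.
void set_then_unreachable(int x) {
  if (x < 0) {
    g = 1;                    // candidate for dead-store elimination
    __builtin_unreachable();  // reaching this point is undefined behavior
  }
}

// Same shape, but the path ends in an observable trap instead.
void set_then_trap(int x) {
  if (x < 0) {
    g = 1;
    __builtin_trap();         // halts execution with a trapping instruction
  }
}
```

Only non-negative arguments keep either function well-defined at run time. Negative arguments are shown purely to discuss what the optimizer may assume.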
[PATCH v2 1/2] c++: implement __is_volatile built-in trait
This patch implements built-in trait for std::is_volatile. gcc/cp/ChangeLog: * cp-trait.def: Define __is_volatile. * constraint.cc (diagnose_trait_expr): Handle CPTK_IS_VOLATILE. * semantics.cc (trait_expr_value): Likewise. (finish_trait_expr): Likewise. gcc/testsuite/ChangeLog: * g++.dg/ext/has-builtin-1.C: Test existence of __is_volatile. * g++.dg/ext/is_volatile.C: New test. Signed-off-by: Ken Matsui --- gcc/cp/constraint.cc | 3 +++ gcc/cp/cp-trait.def | 1 + gcc/cp/semantics.cc | 4 gcc/testsuite/g++.dg/ext/has-builtin-1.C | 3 +++ gcc/testsuite/g++.dg/ext/is_volatile.C | 19 +++ 5 files changed, 30 insertions(+) create mode 100644 gcc/testsuite/g++.dg/ext/is_volatile.C diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc index 8cf0f2d0974..e971d67ee25 100644 --- a/gcc/cp/constraint.cc +++ b/gcc/cp/constraint.cc @@ -3751,6 +3751,9 @@ diagnose_trait_expr (tree expr, tree args) case CPTK_IS_UNION: inform (loc, " %qT is not a union", t1); break; +case CPTK_IS_VOLATILE: + inform (loc, " %qT is not a volatile type", t1); + break; case CPTK_IS_AGGREGATE: inform (loc, " %qT is not an aggregate", t1); break; diff --git a/gcc/cp/cp-trait.def b/gcc/cp/cp-trait.def index 8b7fece0cc8..414b1065a11 100644 --- a/gcc/cp/cp-trait.def +++ b/gcc/cp/cp-trait.def @@ -82,6 +82,7 @@ DEFTRAIT_EXPR (IS_TRIVIALLY_ASSIGNABLE, "__is_trivially_assignable", 2) DEFTRAIT_EXPR (IS_TRIVIALLY_CONSTRUCTIBLE, "__is_trivially_constructible", -1) DEFTRAIT_EXPR (IS_TRIVIALLY_COPYABLE, "__is_trivially_copyable", 1) DEFTRAIT_EXPR (IS_UNION, "__is_union", 1) +DEFTRAIT_EXPR (IS_VOLATILE, "__is_volatile", 1) DEFTRAIT_EXPR (REF_CONSTRUCTS_FROM_TEMPORARY, "__reference_constructs_from_temporary", 2) DEFTRAIT_EXPR (REF_CONVERTS_FROM_TEMPORARY, "__reference_converts_from_temporary", 2) /* FIXME Added space to avoid direct usage in GCC 13. 
*/ diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc index 8fb47fd179e..10934d01504 100644 --- a/gcc/cp/semantics.cc +++ b/gcc/cp/semantics.cc @@ -12079,6 +12079,9 @@ trait_expr_value (cp_trait_kind kind, tree type1, tree type2) case CPTK_IS_ENUM: return type_code1 == ENUMERAL_TYPE; +case CPTK_IS_VOLATILE: + return CP_TYPE_VOLATILE_P (type1); + case CPTK_IS_FINAL: return CLASS_TYPE_P (type1) && CLASSTYPE_FINAL (type1); @@ -12296,6 +12299,7 @@ finish_trait_expr (location_t loc, cp_trait_kind kind, tree type1, tree type2) case CPTK_IS_ENUM: case CPTK_IS_UNION: case CPTK_IS_SAME: +case CPTK_IS_VOLATILE: break; case CPTK_IS_LAYOUT_COMPATIBLE: diff --git a/gcc/testsuite/g++.dg/ext/has-builtin-1.C b/gcc/testsuite/g++.dg/ext/has-builtin-1.C index f343e153e56..7ad640f141b 100644 --- a/gcc/testsuite/g++.dg/ext/has-builtin-1.C +++ b/gcc/testsuite/g++.dg/ext/has-builtin-1.C @@ -146,3 +146,6 @@ #if !__has_builtin (__remove_cvref) # error "__has_builtin (__remove_cvref) failed" #endif +#if !__has_builtin (__is_volatile) +# error "__has_builtin (__is_volatile) failed" +#endif diff --git a/gcc/testsuite/g++.dg/ext/is_volatile.C b/gcc/testsuite/g++.dg/ext/is_volatile.C new file mode 100644 index 000..004e397e5e7 --- /dev/null +++ b/gcc/testsuite/g++.dg/ext/is_volatile.C @@ -0,0 +1,19 @@ +// { dg-do compile { target c++11 } } + +#include + +using namespace __gnu_test; + +#define SA(X) static_assert((X),#X) + +// Positive tests. +SA(__is_volatile(volatile int)); +SA(__is_volatile(const volatile int)); +SA(__is_volatile(vClassType)); +SA(__is_volatile(cvClassType)); + +// Negative tests. +SA(!__is_volatile(int)); +SA(!__is_volatile(const int)); +SA(!__is_volatile(ClassType)); +SA(!__is_volatile(cClassType)); -- 2.41.0
Re: [PATCH v2 1/2] c++: implement __is_array built-in trait
Here is the benchmark result for is_array: https://github.com/ken-matsui/gcc-benches/blob/main/is_array.md#sat-jun-24-070630-am-pdt-2023 Time: -15.511% Peak Memory Usage: +0.173923% Total Memory Usage: -6.2037% On Sat, Jun 24, 2023 at 6:54 AM Ken Matsui wrote: > > This patch implements built-in trait for std::is_array. > > gcc/cp/ChangeLog: > > * cp-trait.def: Define __is_array. > * constraint.cc (diagnose_trait_expr): Handle CPTK_IS_ARRAY. > * semantics.cc (trait_expr_value): Likewise. > (finish_trait_expr): Likewise. > > gcc/testsuite/ChangeLog: > > * g++.dg/ext/has-builtin-1.C: Test existence of __is_array. > * g++.dg/ext/is_array.C: New test. > > Signed-off-by: Ken Matsui > --- > gcc/cp/constraint.cc | 3 +++ > gcc/cp/cp-trait.def | 1 + > gcc/cp/semantics.cc | 4 > gcc/testsuite/g++.dg/ext/has-builtin-1.C | 3 +++ > gcc/testsuite/g++.dg/ext/is_array.C | 28 > 5 files changed, 39 insertions(+) > create mode 100644 gcc/testsuite/g++.dg/ext/is_array.C > > diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc > index 8cf0f2d0974..7cec7eba591 100644 > --- a/gcc/cp/constraint.cc > +++ b/gcc/cp/constraint.cc > @@ -3751,6 +3751,9 @@ diagnose_trait_expr (tree expr, tree args) > case CPTK_IS_UNION: >inform (loc, " %qT is not a union", t1); >break; > +case CPTK_IS_ARRAY: > + inform (loc, " %qT is not an array", t1); > + break; > case CPTK_IS_AGGREGATE: >inform (loc, " %qT is not an aggregate", t1); >break; > diff --git a/gcc/cp/cp-trait.def b/gcc/cp/cp-trait.def > index 8b7fece0cc8..f68c7f2e8ec 100644 > --- a/gcc/cp/cp-trait.def > +++ b/gcc/cp/cp-trait.def > @@ -82,6 +82,7 @@ DEFTRAIT_EXPR (IS_TRIVIALLY_ASSIGNABLE, > "__is_trivially_assignable", 2) > DEFTRAIT_EXPR (IS_TRIVIALLY_CONSTRUCTIBLE, "__is_trivially_constructible", > -1) > DEFTRAIT_EXPR (IS_TRIVIALLY_COPYABLE, "__is_trivially_copyable", 1) > DEFTRAIT_EXPR (IS_UNION, "__is_union", 1) > +DEFTRAIT_EXPR (IS_ARRAY, "__is_array", 1) > DEFTRAIT_EXPR (REF_CONSTRUCTS_FROM_TEMPORARY, > 
"__reference_constructs_from_temporary", 2) > DEFTRAIT_EXPR (REF_CONVERTS_FROM_TEMPORARY, > "__reference_converts_from_temporary", 2) > /* FIXME Added space to avoid direct usage in GCC 13. */ > diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc > index 8fb47fd179e..22f2700ec0b 100644 > --- a/gcc/cp/semantics.cc > +++ b/gcc/cp/semantics.cc > @@ -12118,6 +12118,9 @@ trait_expr_value (cp_trait_kind kind, tree type1, > tree type2) > case CPTK_IS_UNION: >return type_code1 == UNION_TYPE; > > +case CPTK_IS_ARRAY: > + return type_code1 == ARRAY_TYPE; > + > case CPTK_IS_ASSIGNABLE: >return is_xible (MODIFY_EXPR, type1, type2); > > @@ -12296,6 +12299,7 @@ finish_trait_expr (location_t loc, cp_trait_kind > kind, tree type1, tree type2) > case CPTK_IS_ENUM: > case CPTK_IS_UNION: > case CPTK_IS_SAME: > +case CPTK_IS_ARRAY: >break; > > case CPTK_IS_LAYOUT_COMPATIBLE: > diff --git a/gcc/testsuite/g++.dg/ext/has-builtin-1.C > b/gcc/testsuite/g++.dg/ext/has-builtin-1.C > index f343e153e56..56485ae62be 100644 > --- a/gcc/testsuite/g++.dg/ext/has-builtin-1.C > +++ b/gcc/testsuite/g++.dg/ext/has-builtin-1.C > @@ -146,3 +146,6 @@ > #if !__has_builtin (__remove_cvref) > # error "__has_builtin (__remove_cvref) failed" > #endif > +#if !__has_builtin (__is_array) > +# error "__has_builtin (__is_array) failed" > +#endif > diff --git a/gcc/testsuite/g++.dg/ext/is_array.C > b/gcc/testsuite/g++.dg/ext/is_array.C > new file mode 100644 > index 000..facfed5c7cb > --- /dev/null > +++ b/gcc/testsuite/g++.dg/ext/is_array.C > @@ -0,0 +1,28 @@ > +// { dg-do compile { target c++11 } } > + > +#include > + > +using namespace __gnu_test; > + > +#define SA(X) static_assert((X),#X) > +#define SA_TEST_CATEGORY(TRAIT, X, expect) \ > + SA(TRAIT(X) == expect); \ > + SA(TRAIT(const X) == expect);\ > + SA(TRAIT(volatile X) == expect); \ > + SA(TRAIT(const volatile X) == expect) > + > +SA_TEST_CATEGORY(__is_array, int[2], true); > +SA_TEST_CATEGORY(__is_array, int[], true); > +SA_TEST_CATEGORY(__is_array, 
int[2][3], true); > +SA_TEST_CATEGORY(__is_array, int[][3], true); > +SA_TEST_CATEGORY(__is_array, float*[2], true); > +SA_TEST_CATEGORY(__is_array, float*[], true); > +SA_TEST_CATEGORY(__is_array, float*[2][3], true); > +SA_TEST_CATEGORY(__is_array, float*[][3], true); > +SA_TEST_CATEGORY(__is_array, ClassType[2], true); > +SA_TEST_CATEGORY(__is_array, ClassType[], true); > +SA_TEST_CATEGORY(__is_array, ClassType[2][3], true); > +SA_TEST_CATEGORY(__is_array, ClassType[][3], true); > + > +// Sanity check. > +SA_TEST_CATEGORY(__is_array, ClassType, false); > -- > 2.41.0 >
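For comparison with the built-in, here is the kind of library-only `is_array` implementation it can replace: a false primary template plus partial specializations for arrays of known and unknown bound. The benchmark figures above are the author's. This sketch, with the hypothetical name `my_is_array`, only shows the specialization-based shape, and does not try to explain those numbers.

```cpp
#include <cstddef>
#include <type_traits>

// Primary template: not an array.
template <typename T>
struct my_is_array : std::false_type {};

// Arrays of known bound, e.g. int[2] or const int[2] (T = const int).
template <typename T, std::size_t N>
struct my_is_array<T[N]> : std::true_type {};

// Arrays of unknown bound, e.g. int[] or int[][3] (T = int[3]).
template <typename T>
struct my_is_array<T[]> : std::true_type {};
```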
Re: [PATCH] SSA ALIAS: Apply LEN_MASK_STORE to 'ref_maybe_used_by_call_p_1'
On 6/23/23 17:20, 钟居哲 wrote:
> Not sure since I saw MASK_STORE/LEN_STORE didn't compute size.
Also OK after a re-review on my part. The code sets the size to -1 after doing the ao_ref_init_from_ptr_and_size, meaning it's not a known size.

jeff
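A toy model of why an unknown size (-1) is the conservative choice for alias queries. This is hypothetical and is not GCC's `ao_ref` API. An access is a byte offset plus a size, and an unknown size is treated as extending without bound from its offset. Such an access can still be disambiguated from something that ends before it starts, but not from anything that starts after it.

```cpp
#include <cstdint>

// Hypothetical access descriptor; size == -1 means "unknown".
struct Access {
  std::int64_t offset;
  std::int64_t size;
};

// Two accesses may overlap unless one provably ends before the other starts.
// An access of unknown size is never provably finished.
bool may_overlap(const Access& a, const Access& b) {
  const bool a_ends_first = a.size >= 0 && a.offset + a.size <= b.offset;
  const bool b_ends_first = b.size >= 0 && b.offset + b.size <= a.offset;
  return !a_ends_first && !b_ends_first;
}
```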
Re: [PATCH] SSA ALIAS: Apply LEN_MASK_{LOAD, STORE} into SSA alias analysis
On 6/23/23 17:21, 钟居哲 wrote:
> Not sure since I saw MASK_STORE/LEN_STORE didn't compute size.
Yea, I think you're right. We take the size from the LHS. My mistake.

This is fine for the trunk.

jeff
Re: [PATCH V3] RISC-V: Support RVV floating-point auto-vectorization
On 6/21/23 09:53, Juzhe-Zhong wrote:
> This patch adds RVV floating-point auto-vectorization.
> It also fixes an attribute bug in the floating-point ternary operations in vector.md.
>
> gcc/ChangeLog:
>
>         * config/riscv/autovec.md (fma4): New pattern.
>         (*fma): Ditto.
>         (fnma4): Ditto.
>         (*fnma): Ditto.
>         (fms4): Ditto.
>         (*fms): Ditto.
>         (fnms4): Ditto.
>         (*fnms): Ditto.
>         * config/riscv/riscv-protos.h (emit_vlmax_fp_ternary_insn): New
>         function.
>         * config/riscv/riscv-v.cc (emit_vlmax_fp_ternary_insn): Ditto.
>         * config/riscv/vector.md: Fix attribute bug.
OK. Thanks for digging into that clobber issue.

Jeff
Re: [PATCH] RISC-V: Refactor the integer ternary autovec pattern
On 6/21/23 16:38, Juzhe-Zhong wrote:
> A long time ago, I encountered an ICE when trying to set the clobber register to Pmode, and I have forgotten the reason.
>
> So I clobbered an SI scratch and used PUT_MODE to make it Pmode after reload, which made the patterns look unreasonable.
>
> Following Jeff's comments, I tried again: setting the clobber register to Pmode works now, and the patterns look more reasonable.
>
> All tests pass. OK for trunk?
>
> gcc/ChangeLog:
>
>         * config/riscv/autovec.md (*fma): Set clobber to Pmode in
>         expand stage.
>         (*fma): Ditto.
>         (*fnma): Ditto.
>         (*fnma): Ditto.
OK
jeff