[Bug middle-end/110148] [14 Regression] TSVC s242 regression between g:c0df96b3cda5738afbba3a65bb054183c5cd5530 and g:e4c986fde56a6248f8fbe6cf0704e1da34b055d8

2023-06-24 Thread lili.cui at intel dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110148

--- Comment #3 from cuilili  ---
I reproduced S1244 regression on znver3.

Src code:

for (int i = 0; i < LEN_1D-1; i++)
  {
    a[i] = b[i] + c[i] * c[i] + b[i] * b[i] + c[i];
    d[i] = a[i] + a[i+1];
  }

Base version:                        Base + commit version:

Assembler                            Assembler
Loop1:                               Loop1:
vmovsd 0x60c400(%rax),%xmm2          vmovsd 0x60ba00(%rax),%xmm2
vmovsd 0x60ba00(%rax),%xmm1          vmovsd 0x60c400(%rax),%xmm1
add    $0x8,%rax                     add    $0x8,%rax

vaddsd %xmm1,%xmm2,%xmm0             vmovsd %xmm2,%xmm2,%xmm0
vmulsd %xmm2,%xmm2,%xmm2             vfmadd132sd %xmm2,%xmm1,%xmm0
vfmadd132sd %xmm1,%xmm2,%xmm1        vfmadd132sd %xmm1,%xmm2,%xmm1

vaddsd %xmm1,%xmm0,%xmm0             vaddsd %xmm1,%xmm0,%xmm0
vmovsd %xmm0,0x60cdf8(%rax)          vmovsd %xmm0,0x60cdf8(%rax)
vaddsd 0x60ce00(%rax),%xmm0,%xmm0    vaddsd 0x60ce00(%rax),%xmm0,%xmm0
vmovsd %xmm0,0x60aff8(%rax)          vmovsd %xmm0,0x60aff8(%rax)
cmp    $0x9f8,%rax                   cmp    $0x9f8,%rax
jne    Loop1                         jne    Loop1


In the Base version, the FMA depends on the result of the preceding multiply,
which increases the latency of the critical dependency chain. I have not yet
found out why znver3 regresses. The same binary running on ICX shows an 11%
gain (with #define iterations 1).
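
For illustration, the two code shapes in the listing can be written out in C.
This is a minimal sketch that follows the register dataflow of the two
assembly columns; the function names and the operand names x and y are
assumptions (the listing does not say which address holds b and which holds
c), and each "u * u + v" line stands for one FMA instruction.

```c
/* Base shape: the product x*x is computed first and then used as the
   addend of the FMA, so the multiply and the FMA are serialized.  */
static double kernel_base(double x, double y)
{
    double sum = x + y;        /* vaddsd */
    double sq  = x * x;        /* vmulsd */
    double acc = y * y + sq;   /* vfmadd132sd, waits for the vmulsd result */
    return sum + acc;          /* vaddsd */
}

/* Base + commit shape: two FMAs that do not depend on each other.  */
static double kernel_commit(double x, double y)
{
    double t0 = y * y + x;     /* vmovsd + vfmadd132sd */
    double t1 = x * x + y;     /* vfmadd132sd, independent of t0 */
    return t0 + t1;            /* vaddsd */
}
```

Both functions compute x + y + x*x + y*y; only the grouping differs. For
small integer-valued inputs the results are exactly equal, while in general
the different grouping can change rounding.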

Re: [PATCH 1/5] x86: use VPTERNLOG for further bitwise two-vector operations

2023-06-24 Thread Jan Beulich via Gcc-patches
On 25.06.2023 06:42, Hongtao Liu wrote:
> On Wed, Jun 21, 2023 at 2:26 PM Jan Beulich via Gcc-patches
>  wrote:
>>
>> +(define_code_iterator andor [and ior])
>> +(define_code_attr nlogic [(and "nor") (ior "nand")])
>> +(define_code_attr ternlog_nlogic [(and "0x11") (ior "0x77")])
>> +
>> +(define_insn "*3"
>> +  [(set (match_operand:VI 0 "register_operand" "=v,v")
>> +   (andor:VI
>> + (not:VI (match_operand:VI 1 "bcst_vector_operand" "%v,v"))
>> + (not:VI (match_operand:VI 2 "bcst_vector_operand" "vBr,m"]
> I'm thinking of doing it in simplify_rtx or gimple match.pd to transform
> (and (not op1) (not op2)) -> (not: (ior: op1 op2))

This wouldn't be a win (not + andn) -> (or + not), but what's
more important is ...

> (ior (not op1) (not op2)) -> (not : (and op1 op2))
> 
> Even w/o avx512f, the transformation should also be beneficial since it
> takes fewer logic operations: 3 -> 2 (or 2 -> 2 for pandn).

... that these transformations (from the, as per the doc,
canonical representation of nand and nor) are already occurring
in common code, _if_ no suitable insn can be found. That was at
least the conclusion I drew from looking around a lot, supported
by the code that's generated prior to this change.

Jan
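
For illustration, the immediates 0x11 and 0x77 quoted above can be derived
from the VPTERNLOG truth table. This is a minimal sketch, assuming the usual
encoding in which the three input bits (A, B, C) form the index
(A << 2) | (B << 1) | C into the 8-bit immediate; the names TL_A, TL_B, TL_C
and the helper functions are hypothetical and not part of the patch.

```c
#include <stdint.h>

/* Truth-table columns of the three inputs for index (A<<2)|(B<<1)|C.  */
enum { TL_A = 0xF0, TL_B = 0xCC, TL_C = 0xAA };

/* Immediates for two-operand functions of B and C.  */
static uint8_t imm_nor(void)  { return (uint8_t) ~(TL_B | TL_C); }
static uint8_t imm_nand(void) { return (uint8_t) ~(TL_B & TL_C); }

/* The two equivalent RTL shapes discussed above, on plain integers:
   (and (not a) (not b)) versus (not (ior a b)).  */
static uint64_t nor_two_nots(uint64_t a, uint64_t b) { return ~a & ~b; }
static uint64_t nor_one_not(uint64_t a, uint64_t b)  { return ~(a | b); }
```

Under this assumption imm_nor() yields 0x11 and imm_nand() yields 0x77,
matching the ternlog_nlogic attribute in the quoted hunk.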


RE: [PATCH] New finish_compare_by_pieces target hook (for x86).

2023-06-24 Thread Roger Sayle


On Tue, 13 June 2023 12:02, Richard Biener wrote:
> On Mon, Jun 12, 2023 at 4:04 PM Roger Sayle 
> wrote:
> > The following simple test case, from PR 104610, shows that memcmp ()
> > == 0 can result in some bizarre code sequences on x86.
> >
> > int foo(char *a)
> > {
> >   static const char t[] = "0123456789012345678901234567890";
> >   return __builtin_memcmp(a, &t[0], sizeof(t)) == 0;
> > }
> >
> > with -O2 currently contains both:
> > xorl    %eax, %eax
> > xorl    $1, %eax
> > and also
> > movl    $1, %eax
> > xorl    $1, %eax
> >
> > Changing the return type of foo to _Bool results in the equally
> > bizarre:
> > xorl    %eax, %eax
> > testl   %eax, %eax
> > sete    %al
> > and also
> > movl    $1, %eax
> > testl   %eax, %eax
> > sete    %al
> >
> > All these sequences set the result to a constant, but this
> > optimization opportunity only occurs very late during compilation, by
> > basic block duplication in the 322r.bbro pass, too late for CSE or
> > peephole2 to do anything about it.  The problem is that the idiom
> > expanded by compare_by_pieces for __builtin_memcmp_eq contains basic
> > blocks that can't easily be optimized by if-conversion due to the
> > multiple incoming edges on the fail block.
> >
> > In summary, compare_by_pieces generates code that looks like:
> >
> > if (x[0] != y[0]) goto fail_label;
> > if (x[1] != y[1]) goto fail_label;
> > ...
> > if (x[n] != y[n]) goto fail_label;
> > result = 1;
> > goto end_label;
> > fail_label:
> > result = 0;
> > end_label:
> >
> > In theory, the RTL if-conversion pass could be enhanced to tackle
> > arbitrarily complex if-then-else graphs, but the solution proposed
> > here is to allow suitable targets to perform if-conversion during
> > compare_by_pieces.  The x86, for example, can take advantage that all
> > of the above comparisons set and test the zero flag (ZF), which can
> > then be used in combination with sete.  Hence compare_by_pieces could
> > instead generate:
> >
> > if (x[0] != y[0]) goto fail_label;
> > if (x[1] != y[1]) goto fail_label;
> > ...
> > if (x[n] != y[n]) goto fail_label;
> > fail_label:
> > sete result
> >
> > which requires one less basic block, and the redundant conditional
> > branch to a label immediately after is cleaned up by GCC's existing
> > RTL optimizations.
> >
> > For the test case above, where -O2 -msse4 previously generated:
> >
> > foo:    movdqu  (%rdi), %xmm0
> >         pxor    .LC0(%rip), %xmm0
> >         ptest   %xmm0, %xmm0
> >         je      .L5
> > .L2:    movl    $1, %eax
> >         xorl    $1, %eax
> >         ret
> > .L5:    movdqu  16(%rdi), %xmm0
> >         pxor    .LC1(%rip), %xmm0
> >         ptest   %xmm0, %xmm0
> >         jne     .L2
> >         xorl    %eax, %eax
> >         xorl    $1, %eax
> >         ret
> >
> > we now generate:
> >
> > foo:    movdqu  (%rdi), %xmm0
> >         pxor    .LC0(%rip), %xmm0
> >         ptest   %xmm0, %xmm0
> >         jne     .L2
> >         movdqu  16(%rdi), %xmm0
> >         pxor    .LC1(%rip), %xmm0
> >         ptest   %xmm0, %xmm0
> > .L2:    sete    %al
> >         movzbl  %al, %eax
> >         ret
> >
> > Using a target hook allows the large amount of intelligence already in
> > compare_by_pieces to be re-used by the i386 backend, but this can also
> > help other backends with condition flags where the equality result can
> > be materialized.
> >
> > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
> > and make -k check, both with and without --target_board=unix{-m32}
> > with no new failures.  Ok for mainline?
> 
> What's the guarantee that the zero flag is appropriately set on all edges
> incoming now and forever?

Is there any reason why this target hook can't be removed (in future) should it
stop being useful?  It's completely optional and not required for the correct
functioning of the compiler.

> Does this require target specific knowledge on how do_compare_rtx_and_jump
> is emitting RTL?

Yes.  Each backend can decide how best to implement finish_compare_by_pieces
given its internal knowledge of how do_compare_rtx_and_jump works.  It's not
important to the middle-end how the underlying invariants are guaranteed, just
that they are and the backend produces correct code.  A backend may store flags
on the target label, or maintain state in cfun.  Future changes to the i386
backend might cause it to revert to the default finish_compare_by_pieces, or
provide an alternate implementation, but at the moment this patch improves the
code that GCC generates.  Very little (in software like GCC) is forever.

> Do you see matching this in ifcvt to be unreasonable?  I'm thinking of
> "reducing" the incoming edges pairwise without actually looking at the ifcvt
> code.

There's nothing about the proposed patch that prevents or blocks improvements
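
For illustration, the two control-flow shapes quoted in this thread can be
modelled in C. This is a minimal sketch under stated assumptions: the
function names are hypothetical, 8-byte pieces stand in for whatever piece
size compare_by_pieces picks, and the variable zf is only a stand-in for
the x86 zero flag, which C cannot express directly.

```c
#include <stddef.h>
#include <stdint.h>

/* Shape currently emitted: two result assignments joined at end_label,
   with many edges entering fail_label.  */
static int eq_pieces_branchy(const uint64_t *x, const uint64_t *y, size_t n)
{
    int result;
    for (size_t i = 0; i < n; i++)
        if (x[i] != y[i])
            goto fail_label;
    result = 1;
    goto end_label;
fail_label:
    result = 0;
end_label:
    return result;
}

/* Proposed shape: every path reaches fail_label carrying the outcome of
   the last comparison, and the result is read from that outcome.  */
static int eq_pieces_flag(const uint64_t *x, const uint64_t *y, size_t n)
{
    int zf = 1;                  /* hypothetical stand-in for ZF */
    for (size_t i = 0; i < n; i++) {
        zf = (x[i] == y[i]);
        if (!zf)
            goto fail_label;
    }
fail_label:
    return zf;                   /* models "sete result" */
}
```

The second form relies on the invariant questioned in the review: the flag
must hold the outcome of the last comparison on every edge into fail_label.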

Re: [PATCH 5/5] x86: yet more PR target/100711-like splitting

2023-06-24 Thread Hongtao Liu via Gcc-patches
On Wed, Jun 21, 2023 at 2:29 PM Jan Beulich via Gcc-patches
 wrote:
>
> Following two-operand bitwise operations, add another splitter to also
> deal with not followed by broadcast all on its own, which can be
> expressed as simple embedded broadcast instead once a broadcast operand
> is actually permitted in the respective insn. While there also permit
> a broadcast operand in the corresponding expander.
The patch LGTM.
>
> gcc/
>
> * config/i386/sse.md: New splitters to simplify
> not;vec_duplicate as a singular vpternlog.
> (one_cmpl2): Allow broadcast for operand 1.
> (one_cmpl2): Likewise.
>
> gcc/testsuite/
>
> * gcc.target/i386/pr100711-6.c: New test.
> ---
> For the purpose here (and elsewhere) bcst_vector_operand() (really:
> bcst_mem_operand()) isn't permissive enough: We'd want it to allow
> 128-bit and 256-bit types as well irrespective of AVX512VL being
> enabled. This would likely require a new predicate
> (bcst_intvec_operand()?) and a new constraint (BR? Bi?). (Yet for name
> selection it will want considering that this is applicable to certain
> non-calculational FP operations as well.)
I think so.
>
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -17156,7 +17156,7 @@
>
>  (define_expand "one_cmpl2"
>[(set (match_operand:VI 0 "register_operand")
> -   (xor:VI (match_operand:VI 1 "vector_operand")
> +   (xor:VI (match_operand:VI 1 "bcst_vector_operand")
> (match_dup 2)))]
>"TARGET_SSE"
>  {
> @@ -17168,7 +17168,7 @@
>
>  (define_insn "one_cmpl2"
>[(set (match_operand:VI 0 "register_operand" "=v,v")
> -   (xor:VI (match_operand:VI 1 "nonimmediate_operand" "v,m")
> +   (xor:VI (match_operand:VI 1 "bcst_vector_operand" "vBr,m")
> (match_operand:VI 2 "vector_all_ones_operand" "BC,BC")))]
>"TARGET_AVX512F
> && (!
> @@ -17191,6 +17191,19 @@
>   (symbol_ref " == 64 || TARGET_AVX512VL")
>   (const_int 1)))])
>
> +(define_split
> +  [(set (match_operand:VI48_AVX512F 0 "register_operand")
> +   (vec_duplicate:VI48_AVX512F
> + (not:
> +   (match_operand: 1 "nonimmediate_operand"]
> +  " == 64 || TARGET_AVX512VL
> +   || (TARGET_AVX512F && !TARGET_PREFER_AVX256)"
> +  [(set (match_dup 0)
> +   (xor:VI48_AVX512F
> + (vec_duplicate:VI48_AVX512F (match_dup 1))
> + (match_dup 2)))]
> +  "operands[2] = CONSTM1_RTX (mode);")
> +
>  (define_expand "_andnot3"
>[(set (match_operand:VI_AVX2 0 "register_operand")
> (and:VI_AVX2
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr100711-6.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mavx512f -mno-avx512vl -mprefer-vector-width=512 -O2" } */
> +
> +typedef int v16si __attribute__ ((vector_size (64)));
> +typedef long long v8di __attribute__((vector_size (64)));
> +
> +v16si foo_v16si (const int *a)
> +{
> +return (__extension__ (v16si) {~*a, ~*a, ~*a, ~*a, ~*a, ~*a, ~*a, ~*a,
> +  ~*a, ~*a, ~*a, ~*a, ~*a, ~*a, ~*a, ~*a});
> +}
> +
> +v8di foo_v8di (const long long *a)
> +{
> +return (__extension__ (v8di) {~*a, ~*a, ~*a, ~*a, ~*a, ~*a, ~*a, ~*a});
> +}
> +
> +/* { dg-final { scan-assembler-times "vpternlog\[dq\]\[ \\t\]+\\\$0x55, \\(%(?:eax|rdi|edi)\\)\\\{1to\[1-8\]+\\\}" 2 } } */
>


-- 
BR,
Hongtao
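
For illustration, the effect of the immediate 0x55 expected by the new test
can be checked with a bitwise model of VPTERNLOGD on a single 32-bit lane.
This is a minimal sketch, assuming the usual encoding where the bits of the
three inputs form the index (a << 2) | (b << 1) | c into the immediate; the
function names are hypothetical and the model ignores masking and
broadcasting.

```c
#include <stdint.h>

/* For each bit position, the bits of a, b and c select one bit of imm.  */
static uint32_t ternlog32(uint32_t a, uint32_t b, uint32_t c, uint8_t imm)
{
    uint32_t r = 0;
    for (int bit = 0; bit < 32; bit++) {
        unsigned idx = (((a >> bit) & 1u) << 2)
                     | (((b >> bit) & 1u) << 1)
                     |  ((c >> bit) & 1u);
        r |= (uint32_t) ((imm >> idx) & 1u) << bit;
    }
    return r;
}

/* With imm 0x55 only the third input matters and the result is its
   complement, which is what a "not" of a broadcast element needs.  */
static uint32_t not_via_ternlog(uint32_t x)
{
    return ternlog32(0, 0, x, 0x55);
}
```

Since 0x55 has exactly the even-indexed bits set, the result bit is 1
whenever the third input bit is 0, regardless of the other two inputs.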


Re: [PATCH 4/5] x86: further PR target/100711-like splitting

2023-06-24 Thread Hongtao Liu via Gcc-patches
On Wed, Jun 21, 2023 at 2:28 PM Jan Beulich via Gcc-patches
 wrote:
>
> With respective two-operand bitwise operations now expressible by a
> single VPTERNLOG, add splitters to also deal with ior and xor
> counterparts of the original and-only case. Note that the splitters need
> to be separate, as the placement of "not" differs in the final insns
> (*iornot3, *xnor3) which are intended to pick up one half of
> the result.
>
> gcc/
>
> * config/i386/sse.md: New splitters to simplify
> not;vec_duplicate;{ior,xor} as vec_duplicate;{iornot,xnor}.
>
> gcc/testsuite/
>
> * gcc.target/i386/pr100711-4.c: New test.
> * gcc.target/i386/pr100711-5.c: New test.
>
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -17366,6 +17366,36 @@
> (match_dup 2)))]
>"operands[3] = gen_reg_rtx (mode);")
>
> +(define_split
> +  [(set (match_operand:VI 0 "register_operand")
> +   (ior:VI
> + (vec_duplicate:VI
> +   (not:
> + (match_operand: 1 "nonimmediate_operand")))
> + (match_operand:VI 2 "vector_operand")))]
> +  " == 64 || TARGET_AVX512VL
> +   || (TARGET_AVX512F && !TARGET_PREFER_AVX256)"
> +  [(set (match_dup 3)
> +   (vec_duplicate:VI (match_dup 1)))
> +   (set (match_dup 0)
> +   (ior:VI (not:VI (match_dup 3)) (match_dup 2)))]
> +  "operands[3] = gen_reg_rtx (mode);")
> +
> +(define_split
> +  [(set (match_operand:VI 0 "register_operand")
> +   (xor:VI
> + (vec_duplicate:VI
> +   (not:
> + (match_operand: 1 "nonimmediate_operand")))
> + (match_operand:VI 2 "vector_operand")))]
> +  " == 64 || TARGET_AVX512VL
> +   || (TARGET_AVX512F && !TARGET_PREFER_AVX256)"
> +  [(set (match_dup 3)
> +   (vec_duplicate:VI (match_dup 1)))
> +   (set (match_dup 0)
> +   (not:VI (xor:VI (match_dup 3) (match_dup 2]
> +  "operands[3] = gen_reg_rtx (mode);")
> +
Can we merge this splitter (xor:not) into the ior:not one with a code
iterator for xor and ior? They look the same except for the xor/ior.
No need to merge it into the and:not case, which has different guard conditions.
Otherwise LGTM.
>  (define_insn "*andnot3_mask"
>[(set (match_operand:VI48_AVX512VL 0 "register_operand" "=v")
> (vec_merge:VI48_AVX512VL
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr100711-4.c
> @@ -0,0 +1,42 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mavx512bw -mno-avx512vl -mprefer-vector-width=512 -O2" } */
> +
> +typedef char v64qi __attribute__ ((vector_size (64)));
> +typedef short v32hi __attribute__ ((vector_size (64)));
> +typedef int v16si __attribute__ ((vector_size (64)));
> +typedef long long v8di __attribute__((vector_size (64)));
> +
> +v64qi foo_v64qi (char a, v64qi b)
> +{
> +return (__extension__ (v64qi) {~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a,
> +   ~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a,
> +   ~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a,
> +   ~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a,
> +  ~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a,
> +  ~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a,
> +  ~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a,
> +  ~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a}) | b;
> +}
> +
> +v32hi foo_v32hi (short a, v32hi b)
> +{
> +return (__extension__ (v32hi) {~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a,
> +   ~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a,
> +   ~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a,
> +  ~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a}) | b;
> +}
> +
> +v16si foo_v16si (int a, v16si b)
> +{
> +return (__extension__ (v16si) {~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a,
> +  ~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a}) | b;
> +}
> +
> +v8di foo_v8di (long long a, v8di b)
> +{
> +return (__extension__ (v8di) {~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a}) | b;
> +}
> +
> +/* { dg-final { scan-assembler-times "vpternlog\[dq\]\[ \\t\]+\\\$0xbb" 4 { target { ! ia32 } } } } */
> +/* { dg-final { scan-assembler-times "vpternlog\[dq\]\[ \\t\]+\\\$0xbb" 2 { target { ia32 } } } } */
> +/* { dg-final { scan-assembler-times "vpternlog\[dq\]\[ \\t\]+\\\$0xdd" 2 { target { ia32 } } } } */
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr100711-5.c
> @@ -0,0 +1,40 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mavx512bw -mno-avx512vl -mprefer-vector-width=512 -O2" } */
> +
> +typedef char v64qi __attribute__ ((vector_size (64)));
> +typedef short v32hi __attribute__ ((vector_size (64)));
> +typedef int v16si __attribute__ ((vector_size (64)));
> +typedef long long v8di __attribute__((vector_size (64)));
> +
> +v64qi foo_v64qi (char a, v64qi b)
> +{
> +return (__extension__ (v64qi) {~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a,
> +   ~a, ~a, ~a, ~a, ~a, ~a, ~a, ~a,
> +   ~a, ~a, ~a, ~a, ~a, ~a, 

Re: [PATCH 3/5] x86: allow memory operand for AVX2 splitter for PR target/100711

2023-06-24 Thread Hongtao Liu via Gcc-patches
On Wed, Jun 21, 2023 at 2:28 PM Jan Beulich via Gcc-patches
 wrote:
>
> The intended broadcast (with AVX512) can very well be done right from
> memory.
Ok.
>
> gcc/
>
> * config/i386/sse.md: Permit non-immediate operand 1 in AVX2
> form of splitter for PR target/100711.
>
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -17356,7 +17356,7 @@
> (and:VI_AVX2
>   (vec_duplicate:VI_AVX2
> (not:
> - (match_operand: 1 "register_operand")))
> + (match_operand: 1 "nonimmediate_operand")))
>   (match_operand:VI_AVX2 2 "vector_operand")))]
>"TARGET_AVX2"
>[(set (match_dup 3)
>


-- 
BR,
Hongtao


Re: [PATCH 2/5] x86: use VPTERNLOG also for certain andnot forms

2023-06-24 Thread Hongtao Liu via Gcc-patches
On Wed, Jun 21, 2023 at 2:27 PM Jan Beulich via Gcc-patches
 wrote:
>
> When it's the memory operand which is to be inverted, using VPANDN*
> requires a further load instruction. The same can be achieved by a
> single VPTERNLOG*. Add two new alternatives (for plain memory and
> embedded broadcast), adjusting the predicate for the first operand
> accordingly.
>
> Two pre-existing testcases actually end up being affected (improved) by
> the change, which is reflected in updated expectations there.
LGTM.
>
> gcc/
>
> PR target/93768
> * config/i386/sse.md (*andnot3): Add new alternatives
> for memory form operand 1.
>
> gcc/testsuite/
>
> PR target/93768
> * gcc.target/i386/avx512f-andn-di-zmm-2.c: New test.
> * gcc.target/i386/avx512f-andn-si-zmm-2.c: Adjust expectations
> towards generated code.
> * gcc.target/i386/pr100711-3.c: Adjust expectations for 32-bit
> code.
>
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -17210,11 +17210,13 @@
>"TARGET_AVX512F")
>
>  (define_insn "*andnot3"
> -  [(set (match_operand:VI 0 "register_operand" "=x,x,v")
> +  [(set (match_operand:VI 0 "register_operand" "=x,x,v,v,v")
> (and:VI
> - (not:VI (match_operand:VI 1 "vector_operand" "0,x,v"))
> - (match_operand:VI 2 "bcst_vector_operand" "xBm,xm,vmBr")))]
> -  "TARGET_SSE"
> + (not:VI (match_operand:VI 1 "bcst_vector_operand" "0,x,v,m,Br"))
> + (match_operand:VI 2 "bcst_vector_operand" "xBm,xm,vmBr,v,v")))]
> +  "TARGET_SSE
> +   && (register_operand (operands[1], mode)
> +   || register_operand (operands[2], mode))"
>  {
>char buf[64];
>const char *ops;
> @@ -17281,6 +17283,15 @@
>  case 2:
>ops = "v%s%s\t{%%2, %%1, %%0|%%0, %%1, %%2}";
>break;
> +case 3:
> +case 4:
> +  tmp = "pternlog";
> +  ssesuffix = "";
> +  if (which_alternative != 4 || TARGET_AVX512VL)
> +   ops = "v%s%s\t{$0x44, %%1, %%2, %%0|%%0, %%2, %%1, $0x44}";
> +  else
> +   ops = "v%s%s\t{$0x44, %%g1, %%g2, %%g0|%%g0, %%g2, %%g1, $0x44}";
> +  break;
>  default:
>gcc_unreachable ();
>  }
> @@ -17289,7 +17300,7 @@
>output_asm_insn (buf, operands);
>return "";
>  }
> -  [(set_attr "isa" "noavx,avx,avx")
> +  [(set_attr "isa" "noavx,avx,avx,*,*")
> (set_attr "type" "sselog")
> (set (attr "prefix_data16")
>   (if_then_else
> @@ -17297,9 +17308,12 @@
> (eq_attr "mode" "TI"))
> (const_string "1")
> (const_string "*")))
> -   (set_attr "prefix" "orig,vex,evex")
> +   (set_attr "prefix" "orig,vex,evex,evex,evex")
> (set (attr "mode")
> -   (cond [(match_test "TARGET_AVX2")
> +   (cond [(and (eq_attr "alternative" "3,4")
> +   (match_test " < 64 && !TARGET_AVX512VL"))
> +(const_string "XI")
> +  (match_test "TARGET_AVX2")
>  (const_string "")
>(match_test "TARGET_AVX")
>  (if_then_else
> @@ -17310,7 +17324,15 @@
> (match_test "optimize_function_for_size_p (cfun)"))
>  (const_string "V4SF")
>   ]
> - (const_string "")))])
> + (const_string "")))
> +   (set (attr "enabled")
> +   (cond [(eq_attr "alternative" "3")
> +(symbol_ref " == 64 || TARGET_AVX512VL")
> +  (eq_attr "alternative" "4")
> +(symbol_ref " == 64 || TARGET_AVX512VL
> + || (TARGET_AVX512F && !TARGET_PREFER_AVX256)")
> + ]
> + (const_string "*")))])
>
>  ;; PR target/100711: Split notl; vpbroadcastd; vpand as vpbroadcastd; vpandn
>  (define_split
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/avx512f-andn-di-zmm-2.c
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-options "-mavx512f -mno-avx512vl -mprefer-vector-width=512 -O2" } */
> +/* { dg-final { scan-assembler-times "vpternlogq\[ \\t\]+\\\$0x44, \\(%(?:eax|rdi|edi)\\)\\\{1to\[1-8\]+\\\}, %zmm\[0-9\]+, %zmm0" 1 } } */
> +/* { dg-final { scan-assembler-not "vpbroadcast" } } */
> +
> +#define type __m512i
> +#define vec 512
> +#define op andnot
> +#define suffix epi64
> +#define SCALAR long long
> +
> +#include "avx512-binop-2.h"
> --- a/gcc/testsuite/gcc.target/i386/avx512f-andn-si-zmm-2.c
> +++ b/gcc/testsuite/gcc.target/i386/avx512f-andn-si-zmm-2.c
> @@ -1,7 +1,7 @@
>  /* { dg-do compile } */
>  /* { dg-options "-mavx512f -O2" } */
> -/* { dg-final { scan-assembler-times "vpbroadcastd\[^\n\]*%zmm\[0-9\]+" 1 } } */
> -/* { dg-final { scan-assembler-times "vpandnd\[^\n\]*%zmm\[0-9\]+" 1 } } */
> +/* { dg-final { scan-assembler-times "vpternlogd\[ \\t\]+\\\$0x44, \\(%(?:eax|rdi|edi)\\)\\\{1to\[1-8\]+\\\}, %zmm\[0-9\]+, %zmm0" 1 } } */
> +/* { dg-final { scan-assembler-not "vpbroadcast" } } */
>
>  #define type __m512i
>  #define vec 512
> --- a/gcc/testsuite/gcc.target/i386/pr100711-3.c
> +++ 

Re: [PATCH 1/5] x86: use VPTERNLOG for further bitwise two-vector operations

2023-06-24 Thread Hongtao Liu via Gcc-patches
On Wed, Jun 21, 2023 at 2:26 PM Jan Beulich via Gcc-patches
 wrote:
>
> All combinations of and, ior, xor, and not involving two operands can be
> expressed that way in a single insn.
>
> gcc/
>
> PR target/93768
> * config/i386/i386.cc (ix86_rtx_costs): Further special-case
> bitwise vector operations.
> * config/i386/sse.md (*iornot3): New insn.
> (*xnor3): Likewise.
> (*3): Likewise.
> (andor): New code iterator.
> (nlogic): New code attribute.
> (ternlog_nlogic): Likewise.
>
> gcc/testsuite/
>
> PR target/93768
> gcc.target/i386/avx512-binop-not-1.h: New.
> gcc.target/i386/avx512-binop-not-2.h: New.
> gcc.target/i386/avx512f-orn-si-zmm-1.c: New test.
> gcc.target/i386/avx512f-orn-si-zmm-2.c: New test.
> ---
> The use of VI matches that in e.g. one_cmpl2 /
> one_cmpl2 and *andnot3, despite
> (here and there)
> - V64QI and V32HI being needlessly excluded when AVX512BW isn't enabled,
> - VTI not being covered,
> - vector modes more narrow than 16 bytes not being covered.
>
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -21178,6 +21178,32 @@ ix86_rtx_costs (rtx x, machine_mode mode
>return false;
>
>  case IOR:
> +  if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
> +   {
> + /* (ior (not ...) ...) can be a single insn in AVX512.  */
> + if (GET_CODE (XEXP (x, 0)) == NOT && TARGET_AVX512F
> + && (GET_MODE_SIZE (mode) == 64
> + || (TARGET_AVX512VL
> + && (GET_MODE_SIZE (mode) == 32
> + || GET_MODE_SIZE (mode) == 16
> +   {
> + rtx right = GET_CODE (XEXP (x, 1)) != NOT
> + ? XEXP (x, 1) : XEXP (XEXP (x, 1), 0);
> +
> + *total = ix86_vec_cost (mode, cost->sse_op)
> +  + rtx_cost (XEXP (XEXP (x, 0), 0), mode,
> +  outer_code, opno, speed)
> +  + rtx_cost (right, mode, outer_code, opno, speed);
> + return true;
> +   }
> + *total = ix86_vec_cost (mode, cost->sse_op);
> +   }
> +  else if (GET_MODE_SIZE (mode) > UNITS_PER_WORD)
> +   *total = cost->add * 2;
> +  else
> +   *total = cost->add;
> +  return false;
> +
>  case XOR:
>if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
> *total = ix86_vec_cost (mode, cost->sse_op);
> @@ -21198,11 +21224,20 @@ ix86_rtx_costs (rtx x, machine_mode mode
>   /* pandn is a single instruction.  */
>   if (GET_CODE (XEXP (x, 0)) == NOT)
> {
> + rtx right = XEXP (x, 1);
> +
> + /* (and (not ...) (not ...)) can be a single insn in AVX512.  */
> + if (GET_CODE (right) == NOT && TARGET_AVX512F
> + && (GET_MODE_SIZE (mode) == 64
> + || (TARGET_AVX512VL
> + && (GET_MODE_SIZE (mode) == 32
> + || GET_MODE_SIZE (mode) == 16
> +   right = XEXP (right, 0);
> +
>   *total = ix86_vec_cost (mode, cost->sse_op)
>+ rtx_cost (XEXP (XEXP (x, 0), 0), mode,
>outer_code, opno, speed)
> -  + rtx_cost (XEXP (x, 1), mode,
> -  outer_code, opno, speed);
> +  + rtx_cost (right, mode, outer_code, opno, speed);
>   return true;
> }
>   else if (GET_CODE (XEXP (x, 1)) == NOT)
> @@ -21260,8 +21295,25 @@ ix86_rtx_costs (rtx x, machine_mode mode
>
>  case NOT:
>if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
> -   // vnot is pxor -1.
> -   *total = ix86_vec_cost (mode, cost->sse_op) + 1;
> +   {
> + /* (not (xor ...)) can be a single insn in AVX512.  */
> + if (GET_CODE (XEXP (x, 0)) == XOR && TARGET_AVX512F
> + && (GET_MODE_SIZE (mode) == 64
> + || (TARGET_AVX512VL
> + && (GET_MODE_SIZE (mode) == 32
> + || GET_MODE_SIZE (mode) == 16
> +   {
> + *total = ix86_vec_cost (mode, cost->sse_op)
> +  + rtx_cost (XEXP (XEXP (x, 0), 0), mode,
> +  outer_code, opno, speed)
> +  + rtx_cost (XEXP (XEXP (x, 0), 1), mode,
> +  outer_code, opno, speed);
> + return true;
> +   }
> +
> + // vnot is pxor -1.
> + *total = ix86_vec_cost (mode, cost->sse_op) + 1;
> +   }
>else if (GET_MODE_SIZE (mode) > UNITS_PER_WORD)
> *total = cost->add * 2;
>else
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -17616,6 +17616,98 @@
>operands[2] = force_reg (V1TImode, CONSTM1_RTX (V1TImode));
>  })
>
> +(define_insn "*iornot3"
> +  [(set (match_operand:VI 

[Bug target/110400] New: Reuse vector register for both scalar and vector value.

2023-06-24 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110400

Bug ID: 110400
   Summary: Reuse vector register for both scalar and vector
value.
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: crazylht at gmail dot com
  Target Milestone: ---

From PR109812 #c18

Uroš Bizjak 2023-06-21 09:46:43 UTC
One interesting observation:

clang is able to do this:

  0.09 │ │  vmovddup -0x8(%rdx,%rsi,1),%xmm3
  ...
  0.11 │ │  vfmadd231sd  %xmm2,%xmm3,%xmm1
  ...
  0.74 │ │  vfmadd231pd  %xmm2,%xmm3,%xmm0

It figures out that duplicated V2DFmode value in %xmm3 can also be accessed in
the same register as DFmode value.

OTOH, current gcc does:

vmovsd      (%rsi,%rax,8), %xmm1
...
vmovddup    %xmm1, %xmm4
...
vfmadd231pd %xmm4, %xmm0, %xmm2
...
vfmadd231sd %xmm1, %xmm0, %xmm3

The above code needs two registers.



The testcase below shows a similar issue:

typedef double v2df __attribute__((vector_size(16)));
v2df c;
double d;
void
foo (double* __restrict a)
{
c = __extension__(v2df) {*a, *a};
d = *a;
}

with option: -O2 -mavx2

GCC generates

foo(double*):
    vmovsd   (%rdi), %xmm0
    vmovddup %xmm0, %xmm1
    vmovsd   %xmm0, d(%rip)
    vmovapd  %xmm1, c(%rip)

Clang

foo(double*):   # @foo(double*)
    vmovddup (%rdi), %xmm0   # xmm0 = mem[0,0]
    vmovaps  %xmm0, c(%rip)
    vmovlps  %xmm0, d(%rip)
    retq

[Bug target/110309] Wrong code for masked load expansion

2023-06-24 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110309

--- Comment #4 from Hongtao.liu  ---
Fixed for GCC14.

Note: unspec is not added to maskstore since vpblendd doesn't support a memory
dest, so there's no chance for a maskstore to be optimized to vpblendd?

[Bug target/110309] Wrong code for masked load expansion

2023-06-24 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110309

--- Comment #3 from CVS Commits  ---
The master branch has been updated by hongtao Liu :

https://gcc.gnu.org/g:c79476da46728e2ab17e0e546262d2f6377081aa

commit r14-2070-gc79476da46728e2ab17e0e546262d2f6377081aa
Author: liuhongt 
Date:   Tue Jun 20 15:41:00 2023 +0800

Refine maskloadmn pattern with UNSPEC_MASKLOAD.

If mem_addr points to a memory region with less than whole vector size
bytes of accessible memory and k is a mask that would prevent reading
the inaccessible bytes from mem_addr, add UNSPEC_MASKLOAD to prevent
it to be transformed to vpblendd.

gcc/ChangeLog:

PR target/110309
* config/i386/sse.md (maskload):
Refine pattern with UNSPEC_MASKLOAD.
(maskload): Ditto.
(*_load_mask): Extend mode iterator to
VI12HFBF_AVX512VL.
(*_load): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr110309.c: New test.
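
For illustration, the hazard described in the commit message can be modelled
in scalar C. This is a minimal sketch, not the actual instruction
semantics: the function names, the fixed lane count and the pass-through
array are assumptions, and the returned count only records how many memory
lanes each variant touches.

```c
#include <stddef.h>

enum { LANES = 8 };

/* Masked load model: memory lane i is read only if bit i of k is set;
   otherwise the lane takes the pass-through value.  Returns lanes read.  */
static size_t maskload_model(int *dst, const int *mem, const int *pass,
                             unsigned k)
{
    size_t reads = 0;
    for (size_t i = 0; i < LANES; i++) {
        if ((k >> i) & 1u) {
            dst[i] = mem[i];
            reads++;
        } else {
            dst[i] = pass[i];
        }
    }
    return reads;
}

/* Blend model: the whole vector is loaded first, then lanes are selected.
   Every memory lane is read, including lanes the mask disables.  */
static size_t load_then_blend_model(int *dst, const int *mem, const int *pass,
                                    unsigned k)
{
    int tmp[LANES];
    size_t reads = 0;
    for (size_t i = 0; i < LANES; i++) {
        tmp[i] = mem[i];
        reads++;
    }
    for (size_t i = 0; i < LANES; i++)
        dst[i] = ((k >> i) & 1u) ? tmp[i] : pass[i];
    return reads;
}

/* Convenience wrappers over zero-filled buffers, for counting only.  */
static size_t lanes_read_maskload(unsigned k)
{
    int mem[LANES] = {0}, pass[LANES] = {0}, dst[LANES];
    return maskload_model(dst, mem, pass, k);
}

static size_t lanes_read_blend(unsigned k)
{
    int mem[LANES] = {0}, pass[LANES] = {0}, dst[LANES];
    return load_then_blend_model(dst, mem, pass, k);
}
```

With mask 0x0F the masked-load model touches four lanes while the blend
model touches all eight, which is the access that becomes invalid when the
upper lanes lie in inaccessible memory.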

[Bug rtl-optimization/110237] gcc.dg/torture/pr58955-2.c is miscompiled by RTL scheduling after reload

2023-06-24 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110237

--- Comment #9 from Hongtao.liu  ---

> So we can simply clear only MEM_EXPR (and MEM_OFFSET), that cuts off the
> problematic part of alias analysis.  Together with UNSPEC this might be
> enough to fix things.
> 
Note maskstore won't be optimized to vpblendd since vpblendd doesn't support a
memory dest, so I guess there is no need to use UNSPEC for maskstore?

Re: [PATCH] RISC-V: force arg and target to reg rtx under -O0

2023-06-24 Thread juzhe.zh...@rivai.ai
Hi, Li.
Thanks for catching this!

I think it would be better to fix this issue as follows:
-emit_insn (gen_rtx_SET (gen_lowpart (e.vector_mode (), e.target), src));
+emit_move_insn (gen_lowpart (e.vector_mode (), e.target), src);

Thanks.


juzhe.zh...@rivai.ai
 
From: Li Xu
Date: 2023-06-25 11:08
To: gcc-patches
CC: kito.cheng; palmer; juzhe.zhong; Li Xu
Subject: [PATCH] RISC-V: force arg and target to reg rtx under -O0
arg and target should be expanded to reg rtx during expand pass.
 
Consider the following case:
void test_vlmul_ext_v_i8mf8_i8mf4(vint8mf8_t op1) {
  vint8mf4_t res = __riscv_vlmul_ext_v_i8mf8_i8mf4(op1);
}
 
Compilation fails with:
test.c: In function 'test_vlmul_ext_v_i8mf8_i8mf4':
test.c:5:1: error: unrecognizable insn:
5 | }
  | ^
(insn 30 29 0 2 (set (mem/c:VNx2QI (reg/f:DI 143) [0 x+0 S[2, 2] A32])
(mem/c:VNx2QI (reg/f:DI 148) [0 op1+0 S[2, 2] A16])) "test.c":4:18 -1
 (nil))
during RTL pass: vregs
test.c:5:1: internal compiler error: in extract_insn, at recog.cc:2791
0x7c61b8 _fatal_insn(char const*, rtx_def const*, char const*, int, char const*)
../.././riscv-gcc/gcc/rtl-error.cc:108
0x7c61d7 _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
../.././riscv-gcc/gcc/rtl-error.cc:116
0xed58a7 extract_insn(rtx_insn*)
../.././riscv-gcc/gcc/recog.cc:2791
0xb7f789 instantiate_virtual_regs_in_insn
../.././riscv-gcc/gcc/function.cc:1611
0xb7f789 instantiate_virtual_regs
../.././riscv-gcc/gcc/function.cc:1984
 
gcc/ChangeLog:
 
* config/riscv/riscv-vector-builtins-bases.cc: force arg and target to 
reg rtx.
 
gcc/testsuite/ChangeLog:
 
* gcc.target/riscv/rvv/base/vlmul_ext-2.c: New test.
---
gcc/config/riscv/riscv-vector-builtins-bases.cc   | 5 -
gcc/testsuite/gcc.target/riscv/rvv/base/vlmul_ext-2.c | 8 
2 files changed, 12 insertions(+), 1 deletion(-)
create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vlmul_ext-2.c
 
diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index c6c53dc13a5..f135f7971fa 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -1567,7 +1567,10 @@ public:
   {
 tree arg = CALL_EXPR_ARG (e.exp, 0);
 rtx src = expand_normal (arg);
-emit_insn (gen_rtx_SET (gen_lowpart (e.vector_mode (), e.target), src));
+if (MEM_P (e.target))
+  e.target = force_reg (GET_MODE (e.target), e.target);
+emit_insn (gen_rtx_SET (gen_lowpart (e.vector_mode (), e.target),
+  MEM_P (src) ? force_reg (GET_MODE (src), src) : src));
 return e.target;
   }
};
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/vlmul_ext-2.c b/gcc/testsuite/gcc.target/riscv/rvv/base/vlmul_ext-2.c
new file mode 100644
index 000..2b088b53546
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/vlmul_ext-2.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O0" } */
+
+#include "riscv_vector.h"
+
+void test_vlmul_ext_v_i8mf8_i8mf4(vint8mf8_t op1) {
+  vint8mf4_t res = __riscv_vlmul_ext_v_i8mf8_i8mf4(op1);
+}
-- 
2.17.1
 
 


[PATCH] internal-fn: Fix bug of BIAS argument index

2023-06-24 Thread juzhe . zhong
From: Ju-Zhe Zhong 

While trying to enable LEN_MASK_{LOAD,STORE} in the RISC-V port,
I found that I had used the wrong argument index for BIAS.

This patch is an obvious fix.

OK for trunk?

gcc/ChangeLog:

* internal-fn.cc (expand_partial_store_optab_fn): Fix bug of BIAS 
argument index.

---
 gcc/internal-fn.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/internal-fn.cc b/gcc/internal-fn.cc
index 1c2fd487e2a..9017176dc7a 100644
--- a/gcc/internal-fn.cc
+++ b/gcc/internal-fn.cc
@@ -2991,7 +2991,7 @@ expand_partial_store_optab_fn (internal_fn ifn, gcall *stmt, convert_optab optab
   maskt = gimple_call_arg (stmt, 3);
   mask = expand_normal (maskt);
   create_input_operand ([3], mask, TYPE_MODE (TREE_TYPE (maskt)));
-  biast = gimple_call_arg (stmt, 4);
+  biast = gimple_call_arg (stmt, 5);
   bias = expand_normal (biast);
   create_input_operand ([4], bias, QImode);
   icode = convert_optab_handler (optab, TYPE_MODE (type), GET_MODE (mask));
-- 
2.36.3



[Bug tree-optimization/110371] [14 Regression] gfortran ICE "verify_gimple failed" in gfortran.dg/vect/pr51058-2.f90 since r14-2007

2023-06-24 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110371

--- Comment #7 from Hongtao.liu  ---
(In reply to Hongtao.liu from comment #6)
> (In reply to Thiago Jung Bauermann from comment #0)
> > Created attachment 55387 [details]
> > Output of running gfortran with -freport-bug
> > 
> > In today's trunk (tested commit 33ebb0dff9bb "configure: Implement
> > --enable-host-bind-now") I get these new failures on aarch64-linux-gnu:
> > 
> > Running gcc:gcc.target/aarch64/sve/aarch64-sve.exp ...
> > FAIL: gcc.target/aarch64/sve/pack_fcvt_signed_1.c scan-assembler-times
> > \\tfcvtzs\\tz[0-9]+\\.s, p[0-7]/m, z[0-9]+\\.d\\n 2
> > FAIL: gcc.target/aarch64/sve/pack_fcvt_signed_1.c scan-assembler-times
> > \\tuzp1\\tz[0-9]+\\.s, z[0-9]+\\.s, z[0-9]+\\.s\\n 1
> > FAIL: gcc.target/aarch64/sve/pack_fcvt_unsigned_1.c scan-assembler-times
> > \\tfcvtzu\\tz[0-9]+\\.s, p[0-7]/m, z[0-9]+\\.d\\n 2
> > FAIL: gcc.target/aarch64/sve/pack_fcvt_unsigned_1.c scan-assembler-times
> > \\tuzp1\\tz[0-9]+\\.s, z[0-9]+\\.s, z[0-9]+\\.s\\n 1
> > FAIL: gcc.target/aarch64/sve/unpack_fcvt_signed_1.c scan-assembler-times
> > \\tscvtf\\tz[0-9]+\\.d, p[0-7]/m, z[0-9]+\\.s\\n 2
> > FAIL: gcc.target/aarch64/sve/unpack_fcvt_signed_1.c scan-assembler-times
> > \\tzip1\\tz[0-9]+\\.s, z[0-9]+\\.s, z[0-9]+\\.s\\n 1
> > FAIL: gcc.target/aarch64/sve/unpack_fcvt_signed_1.c scan-assembler-times
> > \\tzip2\\tz[0-9]+\\.s, z[0-9]+\\.s, z[0-9]+\\.s\\n 1
> > FAIL: gcc.target/aarch64/sve/unpack_fcvt_unsigned_1.c scan-assembler-times
> > \\tucvtf\\tz[0-9]+\\.d, p[0-7]/m, z[0-9]+\\.s\\n 2
> > FAIL: gcc.target/aarch64/sve/unpack_fcvt_unsigned_1.c scan-assembler-times
> > \\tzip1\\tz[0-9]+\\.s, z[0-9]+\\.s, z[0-9]+\\.s\\n 1
> > FAIL: gcc.target/aarch64/sve/unpack_fcvt_unsigned_1.c scan-assembler-times
> > \\tzip2\\tz[0-9]+\\.s, z[0-9]+\\.s, z[0-9]+\\.s\\n 1
> > === gfortran tests ===
> > 
> 
> For these scan-assembler failures, it looks like GCC now generates better
> code; is it OK to adjust the testcases to match the new assembly?
> 
> current:
> ld1d    z31.d, p7/z, [x1, x3, lsl 3]
> fadd    z31.d, p7/m, z31.d, z30.d
> fcvtzs  z31.d, p6/m, z31.d
> st1w    z31.d, p7, [x0, x3, lsl 2]
> add     x3, x3, x4
> whilelo p7.d, w3, w2
> b.any   .L3
> 
> vs
> original
> punpklo p2.h, p0.b
> punpkhi p1.h, p0.b
> ld1d    z0.d, p2/z, [x1, x3, lsl 3]
> ld1d    z1.d, p1/z, [x5, x3, lsl 3]
> fadd    z0.d, p2/m, z0.d, z2.d
> fadd    z1.d, p1/m, z1.d, z2.d
> fcvtzs  z0.s, p3/m, z0.d
> fcvtzs  z1.s, p3/m, z1.d
> uzp1    z0.s, z0.s, z1.s
> st1w    z0.s, p0, [x0, x3, lsl 2]
> add     x3, x3, x4
> whilelo p0.s, w3, w2
> b.any   .L3
> 
> 
> https://godbolt.org/z/b4cW7WKev

Or adjust the testcases only for FLOAT_EXPR, not for FIX_TRUNC_EXPR, to avoid
float-to-integer overflow.
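
To make the overflow concern concrete, here is a small sketch in plain C. It is
not taken from the testcases; both helper names are made up, and it assumes the
old sequence behaves like a saturating double-to-int32 convert while the new
one converts to 64 bits and keeps only the low 32 bits on store, which is how
the two listings quoted above read.

```c
#include <stdint.h>

/* Hypothetical model of the old sequence: double -> int32 with saturation. */
static int32_t cvt_saturating_i32(double d)
{
    if (d != d)
        return 0;                       /* NaN converts to 0 */
    if (d >= 2147483648.0)
        return INT32_MAX;               /* too large: clamp */
    if (d <= -2147483649.0)
        return INT32_MIN;               /* too small: clamp */
    return (int32_t)d;                  /* in range: truncate toward zero */
}

/* Hypothetical model of the new sequence: double -> int64, then keep only
   the low 32 bits.  Assumes d fits in int64_t.  */
static int32_t cvt_wide_then_truncate_i32(double d)
{
    int64_t wide = (int64_t)d;
    uint32_t low = (uint32_t)(uint64_t)wide;

    /* Reinterpret the low 32 bits as a signed value without relying on
       implementation-defined narrowing.  */
    if (low <= (uint32_t)INT32_MAX)
        return (int32_t)low;
    return (int32_t)(low - 0x80000000u) + INT32_MIN;
}
```

For in-range inputs the two helpers agree. For 3e9 the first returns INT32_MAX
and the second wraps to a negative number, which is the kind of difference a
FIX_TRUNC_EXPR testcase could trip over.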

[Bug tree-optimization/110371] [14 Regression] gfortran ICE "verify_gimple failed" in gfortran.dg/vect/pr51058-2.f90 since r14-2007

2023-06-24 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110371

--- Comment #6 from Hongtao.liu  ---
(In reply to Thiago Jung Bauermann from comment #0)
> Created attachment 55387 [details]
> Output of running gfortran with -freport-bug
> 
> In today's trunk (tested commit 33ebb0dff9bb "configure: Implement
> --enable-host-bind-now") I get these new failures on aarch64-linux-gnu:
> 
> Running gcc:gcc.target/aarch64/sve/aarch64-sve.exp ...
> FAIL: gcc.target/aarch64/sve/pack_fcvt_signed_1.c scan-assembler-times
> \\tfcvtzs\\tz[0-9]+\\.s, p[0-7]/m, z[0-9]+\\.d\\n 2
> FAIL: gcc.target/aarch64/sve/pack_fcvt_signed_1.c scan-assembler-times
> \\tuzp1\\tz[0-9]+\\.s, z[0-9]+\\.s, z[0-9]+\\.s\\n 1
> FAIL: gcc.target/aarch64/sve/pack_fcvt_unsigned_1.c scan-assembler-times
> \\tfcvtzu\\tz[0-9]+\\.s, p[0-7]/m, z[0-9]+\\.d\\n 2
> FAIL: gcc.target/aarch64/sve/pack_fcvt_unsigned_1.c scan-assembler-times
> \\tuzp1\\tz[0-9]+\\.s, z[0-9]+\\.s, z[0-9]+\\.s\\n 1
> FAIL: gcc.target/aarch64/sve/unpack_fcvt_signed_1.c scan-assembler-times
> \\tscvtf\\tz[0-9]+\\.d, p[0-7]/m, z[0-9]+\\.s\\n 2
> FAIL: gcc.target/aarch64/sve/unpack_fcvt_signed_1.c scan-assembler-times
> \\tzip1\\tz[0-9]+\\.s, z[0-9]+\\.s, z[0-9]+\\.s\\n 1
> FAIL: gcc.target/aarch64/sve/unpack_fcvt_signed_1.c scan-assembler-times
> \\tzip2\\tz[0-9]+\\.s, z[0-9]+\\.s, z[0-9]+\\.s\\n 1
> FAIL: gcc.target/aarch64/sve/unpack_fcvt_unsigned_1.c scan-assembler-times
> \\tucvtf\\tz[0-9]+\\.d, p[0-7]/m, z[0-9]+\\.s\\n 2
> FAIL: gcc.target/aarch64/sve/unpack_fcvt_unsigned_1.c scan-assembler-times
> \\tzip1\\tz[0-9]+\\.s, z[0-9]+\\.s, z[0-9]+\\.s\\n 1
> FAIL: gcc.target/aarch64/sve/unpack_fcvt_unsigned_1.c scan-assembler-times
> \\tzip2\\tz[0-9]+\\.s, z[0-9]+\\.s, z[0-9]+\\.s\\n 1
>   === gfortran tests ===
> 

For these scan-assembler failures, it looks like GCC now generates better code;
is it OK to adjust the testcases to match the new assembly?

current:
ld1d    z31.d, p7/z, [x1, x3, lsl 3]
fadd    z31.d, p7/m, z31.d, z30.d
fcvtzs  z31.d, p6/m, z31.d
st1w    z31.d, p7, [x0, x3, lsl 2]
add     x3, x3, x4
whilelo p7.d, w3, w2
b.any   .L3

vs
original
punpklo p2.h, p0.b
punpkhi p1.h, p0.b
ld1d    z0.d, p2/z, [x1, x3, lsl 3]
ld1d    z1.d, p1/z, [x5, x3, lsl 3]
fadd    z0.d, p2/m, z0.d, z2.d
fadd    z1.d, p1/m, z1.d, z2.d
fcvtzs  z0.s, p3/m, z0.d
fcvtzs  z1.s, p3/m, z1.d
uzp1    z0.s, z0.s, z1.s
st1w    z0.s, p0, [x0, x3, lsl 2]
add     x3, x3, x4
whilelo p0.s, w3, w2
b.any   .L3


https://godbolt.org/z/b4cW7WKev

[PATCH] RISC-V: force arg and target to reg rtx under -O0

2023-06-24 Thread Li Xu
arg and target should be expanded to reg rtx during the expand pass.

Consider the following case:
void test_vlmul_ext_v_i8mf8_i8mf4(vint8mf8_t op1) {
  vint8mf4_t res = __riscv_vlmul_ext_v_i8mf8_i8mf4(op1);
}

Compilation fails with:
test.c: In function 'test_vlmul_ext_v_i8mf8_i8mf4':
test.c:5:1: error: unrecognizable insn:
5 | }
  | ^
(insn 30 29 0 2 (set (mem/c:VNx2QI (reg/f:DI 143) [0 x+0 S[2, 2] A32])
(mem/c:VNx2QI (reg/f:DI 148) [0 op1+0 S[2, 2] A16])) "test.c":4:18 -1
 (nil))
during RTL pass: vregs
test.c:5:1: internal compiler error: in extract_insn, at recog.cc:2791
0x7c61b8 _fatal_insn(char const*, rtx_def const*, char const*, int, char const*)
../.././riscv-gcc/gcc/rtl-error.cc:108
0x7c61d7 _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
../.././riscv-gcc/gcc/rtl-error.cc:116
0xed58a7 extract_insn(rtx_insn*)
../.././riscv-gcc/gcc/recog.cc:2791
0xb7f789 instantiate_virtual_regs_in_insn
../.././riscv-gcc/gcc/function.cc:1611
0xb7f789 instantiate_virtual_regs
../.././riscv-gcc/gcc/function.cc:1984

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc: force arg and target to 
reg rtx.

gcc/testsuite/ChangeLog:

* gcc.target/riscv/rvv/base/vlmul_ext-2.c: New test.
---
 gcc/config/riscv/riscv-vector-builtins-bases.cc       | 5 ++++-
 gcc/testsuite/gcc.target/riscv/rvv/base/vlmul_ext-2.c | 8 ++++++++
 2 files changed, 12 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/base/vlmul_ext-2.c

diff --git a/gcc/config/riscv/riscv-vector-builtins-bases.cc 
b/gcc/config/riscv/riscv-vector-builtins-bases.cc
index c6c53dc13a5..f135f7971fa 100644
--- a/gcc/config/riscv/riscv-vector-builtins-bases.cc
+++ b/gcc/config/riscv/riscv-vector-builtins-bases.cc
@@ -1567,7 +1567,10 @@ public:
   {
 tree arg = CALL_EXPR_ARG (e.exp, 0);
 rtx src = expand_normal (arg);
-emit_insn (gen_rtx_SET (gen_lowpart (e.vector_mode (), e.target), src));
+if (MEM_P (e.target))
+  e.target = force_reg (GET_MODE (e.target), e.target);
+emit_insn (gen_rtx_SET (gen_lowpart (e.vector_mode (), e.target),
+  MEM_P (src) ? force_reg (GET_MODE (src), src) : src));
 return e.target;
   }
 };
diff --git a/gcc/testsuite/gcc.target/riscv/rvv/base/vlmul_ext-2.c 
b/gcc/testsuite/gcc.target/riscv/rvv/base/vlmul_ext-2.c
new file mode 100644
index 000..2b088b53546
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/rvv/base/vlmul_ext-2.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gcv -mabi=lp64d -O0" } */
+
+#include "riscv_vector.h"
+
+void test_vlmul_ext_v_i8mf8_i8mf4(vint8mf8_t op1) {
+  vint8mf4_t res = __riscv_vlmul_ext_v_i8mf8_i8mf4(op1);
+}
-- 
2.17.1



[Bug tree-optimization/110371] [14 Regression] gfortran ICE "verify_gimple failed" in gfortran.dg/vect/pr51058-2.f90 since r14-2007

2023-06-24 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110371

--- Comment #5 from Hongtao.liu  ---
Reproduced with

typedef struct dest
{
  double m[3][3];
} dest;

typedef struct src
{
  int m[3][3];
} src;

void
foo (dest *a, src* s)
{
  for (int i = 0; i != 3; i++)
for (int j = 0; j != 3; j++)
  a->m[i][j] = s->m[i][j];
}


for aarch64-linux-gnu.

The problem is that when there is more than one vop in vec_oprnds0, vec_dest is
overwritten with the final vectype_out, but the first conversion step still
expects cvt_type. I'm testing the patch below:

Staged changes
1 file changed, 10 insertions(+), 4 deletions(-)
gcc/tree-vect-stmts.cc | 14 ++

modified   gcc/tree-vect-stmts.cc
@@ -5044,7 +5044,7 @@ vectorizable_conversion (vec_info *vinfo,
  gimple **vec_stmt, slp_tree slp_node,
  stmt_vector_for_cost *cost_vec)
 {
-  tree vec_dest;
+  tree vec_dest, cvt_op;
   tree scalar_dest;
   tree op0, op1 = NULL_TREE;
   loop_vec_info loop_vinfo = dyn_cast <loop_vec_info> (vinfo);
@@ -5568,6 +5568,13 @@ vectorizable_conversion (vec_info *vinfo,
 case NONE:
   vect_get_vec_defs (vinfo, stmt_info, slp_node, ncopies,
  op0, &vec_oprnds0);
+  /* vec_dest is intermediate type operand when multi_step_cvt.  */
+  if (multi_step_cvt)
+{
+  cvt_op = vec_dest;
+  vec_dest = vec_dsts[0];
+}
+
   FOR_EACH_VEC_ELT (vec_oprnds0, i, vop0)
 {
   /* Arguments are ready, create the new vector stmt.  */
@@ -5575,12 +5582,11 @@ vectorizable_conversion (vec_info *vinfo,
   if (multi_step_cvt)
 {
   gcc_assert (multi_step_cvt == 1);
-  new_stmt = vect_gimple_build (vec_dest, codecvt1, vop0);
-  new_temp = make_ssa_name (vec_dest, new_stmt);
+  new_stmt = vect_gimple_build (cvt_op, codecvt1, vop0);
+  new_temp = make_ssa_name (cvt_op, new_stmt);
   gimple_assign_set_lhs (new_stmt, new_temp);
   vect_finish_stmt_generation (vinfo, stmt_info, new_stmt, gsi);
   vop0 = new_temp;
-  vec_dest = vec_dsts[0];
 }
   new_stmt = vect_gimple_build (vec_dest, code1, vop0);
   new_temp = make_ssa_name (vec_dest, new_stmt);
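
The failure mode described in this comment can be modelled in a few lines of
plain C. This is a toy sketch, not GCC code: the enum and both functions are
invented for illustration, and the only thing they track is which type the
first conversion step would see for each operand.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the intermediate and final vector types. */
enum vtype { CVT_TYPE, VECTYPE_OUT };

/* Buggy shape: the destination is switched to the final type inside the
   loop, so the first step of every later operand sees the wrong type. */
static void first_step_types_buggy(enum vtype *seen, size_t n_ops)
{
    enum vtype vec_dest = CVT_TYPE;
    for (size_t i = 0; i < n_ops; i++) {
        seen[i] = vec_dest;         /* type used for the first step */
        vec_dest = VECTYPE_OUT;     /* overwritten for the second step */
    }
}

/* Fixed shape: the intermediate type is saved in its own variable before
   the loop, so every operand's first step sees it. */
static void first_step_types_fixed(enum vtype *seen, size_t n_ops)
{
    const enum vtype cvt_op = CVT_TYPE;
    for (size_t i = 0; i < n_ops; i++)
        seen[i] = cvt_op;
}
```

With a single operand both versions behave the same, which would explain why
the problem only shows up once vec_oprnds0 holds more than one vop.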


Re: Re: [PATCH V1] RISC-V:Add float16 tuple type support

2023-06-24 Thread juzhe.zh...@rivai.ai
This issue will be addressed by this patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-June/622440.html 
It is still waiting for Jakub's comments.



juzhe.zh...@rivai.ai
 
From: Andreas Schwab
Date: 2023-06-23 18:25
To: shiyulong
CC: gcc-patches; palmer; kito.cheng; jim.wilson.gcc; juzhe.zhong; pan2.li; 
wuwei2016; jiawei; shihua; dje.gcc; mirimmad
Subject: Re: [PATCH V1] RISC-V:Add float16 tuple type support
../../gcc/lto-streamer-out.cc: In function 'void lto_output_init_mode_table()':
../../gcc/lto-streamer-out.cc:3177:10: error: 'void* memset(void*, int, 
size_t)' forming offset [256, 283] is out of the bounds [0, 256] of object 
'streamer_mode_table' with type 'unsigned char [256]' [-Werror=array-bounds=]
3177 |   memset (streamer_mode_table, '\0', MAX_MACHINE_MODE);
  |   ~~~^
In file included from ../../gcc/gimple-streamer.h:25,
 from ../../gcc/lto-streamer-out.cc:33:
../../gcc/tree-streamer.h:78:22: note: 'streamer_mode_table' declared here
   78 | extern unsigned char streamer_mode_table[1 << 8];
  |  ^~~
cc1plus: all warnings being treated as errors
make[3]: *** [Makefile:1180: lto-streamer-out.o] Error 1
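
The diagnostic above boils down to clearing more bytes than the table holds.
A minimal sketch of that situation in plain C follows. The names are
hypothetical, the count of 284 is only inferred from the reported offsets
[256, 283], and the bounds check is an illustration, not a description of how
the referenced patch resolves the issue.

```c
#include <stddef.h>
#include <string.h>

enum { MODE_TABLE_SIZE = 1 << 8 };       /* same size as the [1 << 8] table */
enum { HYPOTHETICAL_MODE_COUNT = 284 };  /* implied by offsets [256, 283] */

static unsigned char mode_table[MODE_TABLE_SIZE];

/* Clear the first n_modes entries; refuse rather than write past the end. */
static int clear_mode_table(size_t n_modes)
{
    if (n_modes > sizeof mode_table)
        return -1;
    memset(mode_table, 0, n_modes);
    return 0;
}
```

Calling clear_mode_table with the hypothetical count is rejected, while any
count up to 256 is cleared normally.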
 
-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
"And now for something completely different."
 


[PATCHv4, rs6000] Splat vector small V2DI constants with ISA 2.07 instructions [PR104124]

2023-06-24 Thread HAO CHEN GUI via Gcc-patches
Hi,
  This patch adds a new insn for vector splat with small V2DI constants on P8.
If the value of the constant is in RANGE (-16, 15) and is not 0 or -1, it can be
loaded with vspltisw and vupkhsw on P8. This should be more efficient than
loading the vector from memory.
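
As a rough illustration of why the two-instruction sequence yields the wanted
V2DI constant, here is a scalar model in plain C. It is only a sketch: the
function names are made up, the vectors are plain arrays, and the exclusion of
particular values such as 0 is deliberately not modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Model of vspltisw: splat a 5-bit signed immediate into four words. */
static void model_vspltisw(int32_t out[4], int imm)
{
    assert(imm >= -16 && imm <= 15);
    for (int i = 0; i < 4; i++)
        out[i] = imm;
}

/* Model of vupkhsw: sign-extend two words to two doublewords.  After a
   splat all four words are equal, so which half is unpacked does not
   matter for this sketch. */
static void model_vupkhsw(int64_t out[2], const int32_t in[4])
{
    out[0] = in[0];
    out[1] = in[1];
}

/* Hypothetical helper combining both steps. */
static void splat_v2di_small(int64_t out[2], int imm)
{
    int32_t words[4];
    model_vspltisw(words, imm);
    model_vupkhsw(out, words);
}
```

Because the unpack sign-extends, negative immediates such as -5 end up as -5
in both doublewords rather than as a zero-extended 32-bit pattern.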

  Compared to the last version, the main change is to remove the new constraint,
use a super constraint in the insn, and move the check into the insn condition.

  Bootstrapped and tested on powerpc64-linux BE and LE with no regressions.

Thanks
Gui Haochen

ChangeLog
2023-06-25  Haochen Gui 

gcc/
PR target/104124
* config/rs6000/altivec.md (*altivec_vupkhs_direct): Rename
to...
(altivec_vupkhs_direct): ...this.
* config/rs6000/predicates.md (vspltisw_vupkhsw_constant_split): New
predicate to test if a constant can be loaded with vspltisw and
vupkhsw.
(easy_vector_constant): Call vspltisw_vupkhsw_constant_p to Check if
a vector constant can be synthesized with a vspltisw and a vupkhsw.
* config/rs6000/rs6000-protos.h (vspltisw_vupkhsw_constant_p): Declare.
* config/rs6000/rs6000.cc (vspltisw_vupkhsw_constant_p): New function
to return true if OP mode is V2DI and can be synthesized with vupkhsw
and vspltisw.
* config/rs6000/vsx.md (*vspltisw_v2di_split): New insn to load up
constants with vspltisw and vupkhsw.

gcc/testsuite/
PR target/104124
* gcc.target/powerpc/pr104124.c: New.

patch.diff
diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 49b0c964f4d..2c932854c33 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -2542,7 +2542,7 @@ (define_insn "altivec_vupkhs"
 }
   [(set_attr "type" "vecperm")])

-(define_insn "*altivec_vupkhs_direct"
+(define_insn "altivec_vupkhs_direct"
   [(set (match_operand:VP 0 "register_operand" "=v")
(unspec:VP [(match_operand: 1 "register_operand" "v")]
 UNSPEC_VUNPACK_HI_SIGN_DIRECT))]
diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
index 52c65534e51..f62a4d9b506 100644
--- a/gcc/config/rs6000/predicates.md
+++ b/gcc/config/rs6000/predicates.md
@@ -694,6 +694,12 @@ (define_predicate "xxspltib_constant_split"
   return num_insns > 1;
 })

+;; Return true if the operand is a constant that can be loaded with a vspltisw
+;; instruction and then a vupkhsw instruction.
+
+(define_predicate "vspltisw_vupkhsw_constant_split"
+  (and (match_code "const_vector")
+   (match_test "vspltisw_vupkhsw_constant_p (op, mode)")))

 ;; Return 1 if the operand is constant that can loaded directly with a XXSPLTIB
 ;; instruction.
@@ -742,6 +748,11 @@ (define_predicate "easy_vector_constant"
   && xxspltib_constant_p (op, mode, &num_insns, &value))
return true;

+  /* V2DI constant within RANGE (-16, 15) can be synthesized with a
+vspltisw and a vupkhsw.  */
+  if (vspltisw_vupkhsw_constant_p (op, mode, &value))
+   return true;
+
   return easy_altivec_constant (op, mode);
 }

diff --git a/gcc/config/rs6000/rs6000-protos.h 
b/gcc/config/rs6000/rs6000-protos.h
index 1a4fc1df668..00cb2d82953 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -32,6 +32,7 @@ extern void init_cumulative_args (CUMULATIVE_ARGS *, tree, 
rtx, int, int, int,

 extern int easy_altivec_constant (rtx, machine_mode);
 extern bool xxspltib_constant_p (rtx, machine_mode, int *, int *);
+extern bool vspltisw_vupkhsw_constant_p (rtx, machine_mode, int * = nullptr);
 extern int vspltis_shifted (rtx);
 extern HOST_WIDE_INT const_vector_elt_as_int (rtx, unsigned int);
 extern bool macho_lo_sum_memory_operand (rtx, machine_mode);
diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 3be5860dd9b..ae34a02b282 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -6638,6 +6638,36 @@ xxspltib_constant_p (rtx op,
   return true;
 }

+/* Return true if OP mode is V2DI and can be synthesized with ISA 2.07
+   instructions vupkhsw and vspltisw.
+
+   Return the constant that is being split via CONSTANT_PTR.  */
+
+bool
+vspltisw_vupkhsw_constant_p (rtx op, machine_mode mode, int *constant_ptr)
+{
+  HOST_WIDE_INT value;
+  rtx elt;
+
+  if (!TARGET_P8_VECTOR)
+return false;
+
+  if (mode != V2DImode)
+return false;
+
+  if (!const_vec_duplicate_p (op, &elt))
+return false;
+
+  value = INTVAL (elt);
+  if (value == 0 || value == 1
+  || !EASY_VECTOR_15 (value))
+return false;
+
+  if (constant_ptr)
+*constant_ptr = (int) value;
+  return true;
+}
+
 const char *
 output_vec_const_move (rtx *operands)
 {
diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
index 7d845df5c2d..4919b073e50 100644
--- a/gcc/config/rs6000/vsx.md
+++ b/gcc/config/rs6000/vsx.md
@@ -1174,6 +1174,30 @@ (define_insn_and_split "*xxspltib__split"
   [(set_attr "type" "vecperm")
(set_attr "length" "8")])

+(define_insn_and_split 

[Bug middle-end/13421] IA32 bigmem pointer subtraction and –ftrapv option causes unjustified program abort

2023-06-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=13421

Andrew Pinski  changed:

   What|Removed |Added

 CC||baiwfg2 at gmail dot com

--- Comment #16 from Andrew Pinski  ---
*** Bug 110399 has been marked as a duplicate of this bug. ***

[Bug middle-end/110399] pointer substraction causes coredump with ftrapv on edge case

2023-06-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110399

Andrew Pinski  changed:

   What|Removed |Added

 Resolution|--- |DUPLICATE
 Status|UNCONFIRMED |RESOLVED

--- Comment #2 from Andrew Pinski  ---
Dup of bug 13421.

*** This bug has been marked as a duplicate of bug 13421 ***

[Bug middle-end/110399] pointer substraction causes coredump with ftrapv on edge case

2023-06-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110399

--- Comment #1 from Andrew Pinski  ---
32 bit, w1=2
w2=2
w3=2
w4=0
w5=2

Program received signal SIGABRT, Aborted.

[Bug c/110399] New: pointer substraction causes coredump with ftrapv on edge case

2023-06-24 Thread baiwfg2 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110399

Bug ID: 110399
   Summary: pointer substraction causes coredump with ftrapv on
edge case
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: baiwfg2 at gmail dot com
  Target Milestone: ---

The demo code is :

```c
#include <stdint.h>
#include <stdio.h>

int main() {
{
char *p = (char *)0x8001;
char *q = (char *)0x7fff;
uint32_t w = p - q;
printf("32 bit, w1=%u\n", w);
}

{
char *p = (char *)0x7fff;
char *q = (char *)0x7ffd;
uint32_t w2 = p - q;
printf("w2=%u\n", w2);
}

{
char *p = (char *)0x8003;
char *q = (char *)0x8001;
uint32_t w3 = p - q;
printf("w3=%u\n", w3);
}

{
char *p = (char *)0x8001;
char *q = (char *)0x0001;
uint32_t w4 = p - q;
printf("w4=%u\n", w4); // ans is 0, not crash under -ftrapv
}

{
char *p = (char *)0x8001;
char *q = (char *)0x7fff;
uint32_t w5 = (uintptr_t)p - (uintptr_t)q;
printf("w5=%u\n", w5);
}

{
char *p = (char *)0x8001; // use uint8_t also crash
char *q = (char *)0x7fff; // use smaller num 0x0011, also crash
uint32_t w6 = p - q;
printf("w6=%u\n", w6); // crash under gcc -ftrapv, not crash under clang -ftrapv
}

return 0;
}
```

The statement w6 = p - q causes a core dump, but what the program actually
intends is unsigned pointer arithmetic. How can I get the right result (that
is, output 2) with the -ftrapv option? It works fine with clang -ftrapv.

This happens with many GCC versions.
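
For reference, the w5 block of the demo already shows one way to get a
distance without a signed subtraction. Wrapped into a helper it could look
like the sketch below. The function name is made up, and this is only an
illustration of the unsigned variant, not a statement about how the report
should be resolved.

```c
#include <stdint.h>

/* Hypothetical helper: distance in bytes between two pointers, computed on
   unsigned integers.  Unsigned subtraction wraps instead of overflowing,
   so it is not a signed operation that -ftrapv would check. */
static uintptr_t byte_distance(const char *hi, const char *lo)
{
    return (uintptr_t)hi - (uintptr_t)lo;
}
```

The result wraps modulo the width of uintptr_t when lo is above hi, so callers
that need a signed distance still have to decide how to interpret it.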

Re: [PATCH] RISCV: Add -m(no)-omit-leaf-frame-pointer support.

2023-06-24 Thread Stefan O'Rear via Gcc-patches
On Sat, Jun 24, 2023, at 11:01 AM, Jeff Law via Gcc-patches wrote:
> On 6/21/23 02:14, Wang, Yanzhang wrote:
>> Hi Jeff, sorry for the late reply.
>> 
>>> The long branch handling is done at the assembler level.  So the clobbering
>>> of $ra isn't visible to the compiler.  Thus the compiler has to be
>>> extremely careful to not hold values in $ra because the assembler may
>>> clobber $ra.
>> 
>> If the assembler can modify $ra, it seems the rules we defined in
>> riscv.cc will be ignored. For example, the $ra save generated by this
>> patch may be modified by the assembler, and everything that depends on it
>> will be wrong. So implementing the long jump in the compiler is better.
> Basically correct.  The assembler potentially clobbers $ra.  That's why 
> in the long jump patches $ra becomes a fixed register -- the compiler 
> doesn't know when it's clobbered by the assembler.
>
> Even if this were done in the compiler, we'd still have to do something 
> special with $ra.  The point at which decisions about register 
> allocation and such are made is before the point where we know the final 
> positions of jumps/labels.  It's a classic problem in GCC's design.

Do you have a reference for more information on the long jump patches?

I'm particularly curious about why $ra was selected as the temporary instead
of $t1 like the tail pseudoinstruction uses.

-s


RE: [PATCH] SSA ALIAS: Apply LEN_MASK_{LOAD, STORE} into SSA alias analysis

2023-06-24 Thread Li, Pan2 via Gcc-patches
Committed, thanks Jeff.

Pan

-Original Message-
From: Gcc-patches  On Behalf 
Of Jeff Law via Gcc-patches
Sent: Saturday, June 24, 2023 10:09 PM
To: 钟居哲 ; gcc-patches 
Cc: rguenther ; richard.sandiford 
Subject: Re: [PATCH] SSA ALIAS: Apply LEN_MASK_{LOAD, STORE} into SSA alias 
analysis



On 6/23/23 17:21, 钟居哲 wrote:
> Not sure since I saw MASK_STORE/LEN_STORE didn't compute size.
Yea, I think you're right.  We take the size from the LHS.  My mistake.

This is fine for the trunk.

jeff


Re: [PATCH v2] x86: make better use of VBROADCASTSS / VPBROADCASTD

2023-06-24 Thread Hongtao Liu via Gcc-patches
On Sun, Jun 25, 2023 at 9:17 AM Liu, Hongtao  wrote:
>
>
>
> > -Original Message-
> > From: Jan Beulich 
> > Sent: Wednesday, June 21, 2023 8:40 PM
> > To: Hongtao Liu 
> > Cc: gcc-patches@gcc.gnu.org; Kirill Yukhin ; Liu,
> > Hongtao 
> > Subject: Re: [PATCH v2] x86: make better use of VBROADCASTSS /
> > VPBROADCASTD
> >
> > On 21.06.2023 09:44, Jan Beulich wrote:
> > > On 21.06.2023 09:37, Hongtao Liu wrote:
> > >> On Wed, Jun 21, 2023 at 2:06 PM Jan Beulich via Gcc-patches
> > >>  wrote:
> > >>>
> > >>> Isn't prefix_extra use bogus here? What extra prefix does
> > >>> vbroadcastss
> > >> According to comments, yes, no extra prefix is needed.
> > >>
> > >> ;; There are also additional prefixes in 3DNOW, SSSE3.
> > >> ;; ssemuladd,sse4arg default to 0f24/0f25 and DREX byte, ;;
> > >> sseiadd1,ssecvt1 to 0f7a with no DREX byte.
> > >> ;; 3DNOW has 0f0f prefix, SSSE3 and SSE4_{1,2} 0f38/0f3a.
> > >
> > > Right, that's what triggered my question. I guess dropping these
> > > "prefix_extra" really wants to be a separate patch (or maybe even
> > > multiple, but it's hard to see how to split), dealing with all of the
> > > instances which likely have accumulated simply via copy-and-paste.
> >
> > Or wait - I'm altering those lines anyway, so I could as well drop them 
> > right
> > away (and slightly shrink patch size), if that's okay with you. Of course I
> > should then not forget to also mention this in the changelog entry.
> >
> Yes.
>Would you be okay for me to fold in that adjustment, or do you
>insist on a separate patch?
Also for this, no need for a separate patch.
> > Jan



-- 
BR,
Hongtao


RE: [PATCH v2] x86: make better use of VBROADCASTSS / VPBROADCASTD

2023-06-24 Thread Liu, Hongtao via Gcc-patches


> -Original Message-
> From: Jan Beulich 
> Sent: Wednesday, June 21, 2023 8:40 PM
> To: Hongtao Liu 
> Cc: gcc-patches@gcc.gnu.org; Kirill Yukhin ; Liu,
> Hongtao 
> Subject: Re: [PATCH v2] x86: make better use of VBROADCASTSS /
> VPBROADCASTD
> 
> On 21.06.2023 09:44, Jan Beulich wrote:
> > On 21.06.2023 09:37, Hongtao Liu wrote:
> >> On Wed, Jun 21, 2023 at 2:06 PM Jan Beulich via Gcc-patches
> >>  wrote:
> >>>
> >>> Isn't prefix_extra use bogus here? What extra prefix does
> >>> vbroadcastss
> >> According to comments, yes, no extra prefix is needed.
> >>
> >> ;; There are also additional prefixes in 3DNOW, SSSE3.
> >> ;; ssemuladd,sse4arg default to 0f24/0f25 and DREX byte, ;;
> >> sseiadd1,ssecvt1 to 0f7a with no DREX byte.
> >> ;; 3DNOW has 0f0f prefix, SSSE3 and SSE4_{1,2} 0f38/0f3a.
> >
> > Right, that's what triggered my question. I guess dropping these
> > "prefix_extra" really wants to be a separate patch (or maybe even
> > multiple, but it's hard to see how to split), dealing with all of the
> > instances which likely have accumulated simply via copy-and-paste.
> 
> Or wait - I'm altering those lines anyway, so I could as well drop them right
> away (and slightly shrink patch size), if that's okay with you. Of course I
> should then not forget to also mention this in the changelog entry.
> 
Yes.
> Jan


[Bug tree-optimization/110371] [14 Regression] gfortran ICE "verify_gimple failed" in gfortran.dg/vect/pr51058-2.f90 since r14-2007

2023-06-24 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110371

--- Comment #4 from Hongtao.liu  ---
I'll take a look.

[Bug ada/110398] New: Program_Error sem_eval.adb:4635 explicit raise

2023-06-24 Thread aj at ianozi dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110398

Bug ID: 110398
   Summary: Program_Error sem_eval.adb:4635 explicit raise
   Product: gcc
   Version: 13.1.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: ada
  Assignee: unassigned at gcc dot gnu.org
  Reporter: aj at ianozi dot com
CC: dkm at gcc dot gnu.org
  Target Milestone: ---

Tested with godbolt: https://ada.godbolt.org/z/ezqshsxzo

Also tested on version 12 (the link is using version 13).

Steps to reproduce:
1) Create "foo.ads" with:
```
package Foo is
  subtype Bar is String (1 .. 3) with 
Dynamic_Predicate => Bar in
"ABC" | "DEF";
end Foo;
```

2) Create "foobar.ads" with:
```
with Foo;
package Foobar is
  subtype Foo_Bar is Foo.Bar;
end Foobar;
```

3) Create "foobar-nested.ads" with:
```
package Foobar.Nested is
function Test_Function
(Item : Foo_Bar) return Boolean is (True);
end Foobar.Nested;
```

4) Create "example.adb" with:
```
with Foobar.Nested;
procedure Example is
  Bug : constant Boolean := Foobar.Nested.Test_Function ("ABC");
begin
  null;
end Example;
```

It fails with:
```
gcc -c -I/app/ -g -fdiagnostics-color=always -S -fverbose-asm -masm=intel -o
/app/example.s -I- 
gnatmake: "" compilation error
+===GNAT BUG DETECTED==+
| 13.1.0 (x86_64-linux-gnu) Program_Error sem_eval.adb:4635 explicit raise |
| Error detected at example.adb:3:42   |
| Compiling|
| Please submit a bug report; see https://gcc.gnu.org/bugs/ .  |
| Use a subject line meaningful to you and us to track the bug.|
| Include the entire contents of this bug box in the report.   |
| Include the exact command that you entered.  |
| Also include sources listed below.   |
+==+

Please include these source files with error report
Note that list may not be accurate in some cases,
so please double check that the problem can still
be reproduced with the set of files listed.
Consider also -gnatd.n switch (see debug.adb).


/app/foobar.ads
/app/foo.ads
/app/foobar-nested.ads

compilation abandoned
Compiler returned: 4
```
(I took this from godbolt but the same error happens on my local systems)

If I change the definition of "Test_Function" to the following, it works, so
I'm guessing it has to do with the subtype:
```
function Test_Function
(Item : Foo.Bar) return Boolean is (True);
```

RE: [PATCH V1] RISC-V:Add float16 tuple type abi

2023-06-24 Thread Li, Pan2 via Gcc-patches
Committed, thanks Jeff.

Pan

-Original Message-
From: Jeff Law  
Sent: Saturday, June 24, 2023 10:51 PM
To: juzhe.zh...@rivai.ai; yulong ; gcc-patches 

Cc: palmer ; Kito.cheng ; Li, Pan2 
; wuwei2016 ; jiawei 
; shihua ; dje.gcc ; 
pinskia ; Robin Dapp 
Subject: Re: [PATCH V1] RISC-V:Add float16 tuple type abi



On 6/21/23 01:46, juzhe.zh...@rivai.ai wrote:
> LGTM. Thanks.
OK from me as well.
jeff


[Bug middle-end/77294] __builtin_object_size inconsistent for member arrays

2023-06-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=77294

Andrew Pinski  changed:

   What|Removed |Added

   See Also||https://gcc.gnu.org/bugzill
   ||a/show_bug.cgi?id=64715

--- Comment #2 from Andrew Pinski  ---
I think this is a dup of bug 64715.

[Bug middle-end/44384] builtin_object_size_ treatment of multidimensional arrays is unexpected

2023-06-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44384

Andrew Pinski  changed:

   What|Removed |Added

 CC||siddhesh at gcc dot gnu.org

--- Comment #6 from Andrew Pinski  ---
*** Bug 110373 has been marked as a duplicate of this bug. ***

[Bug tree-optimization/110373] __builtin_object_size does not recognize subarrays in multi-dimensional arrays

2023-06-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110373

Andrew Pinski  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |DUPLICATE

--- Comment #1 from Andrew Pinski  ---
Dup of bug 44384.

*** This bug has been marked as a duplicate of bug 44384 ***

[Bug middle-end/79173] add-with-carry and subtract-with-borrow support (x86_64 and others)

2023-06-24 Thread vincent-gcc at vinc17 dot net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79173

--- Comment #32 from Vincent Lefèvre  ---
(In reply to Jakub Jelinek from comment #31)
> (In reply to Vincent Lefèvre from comment #30)
> > (In reply to Jakub Jelinek from comment #29)
> > > I mean that if the compiler can't see it is in [0, 1], it will need
> > > to use 2 additions and or the 2 carry bits together.  But, because
> > > the ored carry bits are in [0, 1] range, all the higher limbs could
> > > be done using addc.
> > 
> > If the compiler can't see that carryin is in [0, 1], then it must not "or"
> > the carry bits; it needs to add them, as carryout may be 2.
> 
> That is not how the clang builtin works, which is why I've implemented the |
> and documented it that way, as it is a compatibility builtin.

I'm confused. In Comment 14, you said that

  *carry_out = c1 + c2;

was used. This is an addition, not an OR.
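
The difference between the two ways of combining the carries is easy to check
with a small sketch in plain C. The helper below is hypothetical (it is not the
builtin under discussion); it computes the sum with two additions and reports
both the added and the ORed carry so they can be compared.

```c
#include <stdint.h>

/* Hypothetical helper: a + b + carry_in done as two additions.  The two
   partial carries are combined both ways for comparison. */
static uint32_t add_with_carry(uint32_t a, uint32_t b, uint32_t carry_in,
                               uint32_t *carry_sum, uint32_t *carry_or)
{
    uint32_t s1 = a + b;
    uint32_t c1 = s1 < a;           /* carry out of a + b */
    uint32_t s2 = s1 + carry_in;
    uint32_t c2 = s2 < s1;          /* carry out of adding carry_in */

    *carry_sum = c1 + c2;
    *carry_or = c1 | c2;
    return s2;
}
```

With carry_in in [0, 1] the two partial carries can never both be set, so both
combinations agree. With a = b = 0xffffffff and carry_in = 2, both partial
carries are set: the added carry is 2 while the ORed carry is 1.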

[Bug c++/110395] GCOV stuck in an infinite loop with large std::array

2023-06-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110395

--- Comment #3 from Andrew Pinski  ---
Note it is not an infinite loop, just many basic blocks (over 4 of them)
causing the performance to be very very slow.

[Bug c++/110395] GCOV stuck in an infinite loop with large std::array

2023-06-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110395

Andrew Pinski  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|UNCONFIRMED |RESOLVED

--- Comment #2 from Andrew Pinski  ---
Fixed in GCC 12.1.0 by the same patch which fixed PR 92385 .

[Bug gcov-profile/110395] GCOV stuck in an infinite loop with large std::array

2023-06-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110395

--- Comment #1 from Andrew Pinski  ---
On the trunk it takes no time at all:
[apinski@xeond2 upstream-gcc-git]$ ~/upstream-gcc/bin/g++  t.cc --coverage
[apinski@xeond2 upstream-gcc-git]$ LD_LIBRARY_PATH=~/upstream-gcc/lib64 ./a.out
[apinski@xeond2 upstream-gcc-git]$ LD_LIBRARY_PATH=~/upstream-gcc/lib64
~/upstream-gcc/bin/gcov t.cc
t.gcno:cannot open notes file
t.gcda:cannot open data file, assuming not executed
No executable lines
[apinski@xeond2 upstream-gcc-git]$ LD_LIBRARY_PATH=~/upstream-gcc/lib64
~/upstream-gcc/bin/gcov a-t.cc
File '/home/apinski/upstream-gcc/include/c++/14.0.0/bits/stl_construct.h'
Lines executed:100.00% of 4
Creating 'stl_construct.h.gcov'

File '/home/apinski/upstream-gcc/include/c++/14.0.0/bits/new_allocator.h'
Lines executed:50.00% of 4
Creating 'new_allocator.h.gcov'

File '/home/apinski/upstream-gcc/include/c++/14.0.0/bits/stl_vector.h'
Lines executed:95.45% of 22
Creating 'stl_vector.h.gcov'

File '/home/apinski/upstream-gcc/include/c++/14.0.0/bits/alloc_traits.h'
Lines executed:66.67% of 3
Creating 'alloc_traits.h.gcov'

File '/home/apinski/upstream-gcc/include/c++/14.0.0/bits/allocator.h'
Lines executed:100.00% of 1
Creating 'allocator.h.gcov'

File 't.cc'
Lines executed:100.00% of 5
Creating 't.cc.gcov'

File '/home/apinski/upstream-gcc/include/c++/14.0.0/array'
No executable lines
Removing 'array.gcov'

Lines executed:89.74% of 39

real    0m0.043s
user    0m0.004s
sys     0m0.002s

Re: Patch regarding addition of .symtab while generating object file from libiberty [WIP]

2023-06-24 Thread Jan Hubicka via Gcc
> Hi,
Hi,
I am sorry for the late reaction.
> I am working on the GSOC project "Bypass Assembler when generating LTO
> object files." So as a first step, I am adding .symtab along with
> __gnu_lto_slim symbol into it so that at a later stage, it can be
> recognized that this object file has been produced using -flto enabled.
> This patch is regarding the same. Although I am still testing this patch, I
> want general feedback on my code and design choice.
> I have extended simple_object_write_struct to hold a list of symbols (
> similar to sections ). A function in simple-object.c to add symbols. I am
> calling this function in lto-object.cc to add __gnu_lto_v1.
> Right now, as we are only working on ELF support first, I am adding .symtab
> in elf object files only.
> 
> ---
>  gcc/lto-object.cc|   4 +-
>  include/simple-object.h  |  10 +++
>  libiberty/simple-object-common.h |  18 +
>  libiberty/simple-object-elf.c| 130 +--
>  libiberty/simple-object.c|  32 
>  5 files changed, 187 insertions(+), 7 deletions(-)
> 
> diff --git a/gcc/lto-object.cc b/gcc/lto-object.cc
> index cb1c3a6cfb3..680977cb327 100644
> --- a/gcc/lto-object.cc
> +++ b/gcc/lto-object.cc
> @@ -187,7 +187,9 @@ lto_obj_file_close (lto_file *file)
>int err;
> 
>gcc_assert (lo->base.offset == 0);
> -
> +  /*Add __gnu_lto_slim symbol*/
> +  if(flag_bypass_asm)
> +simple_object_write_add_symbol (lo->sobj_w, "__gnu_lto_slim",1,1);

You can probably do this unconditionally.  The ltrans files we currently
produce are arguably wrong in that they lack a symbol table.
> +simple_object_write_add_symbol(simple_object_write *sobj, const char *name,
> +size_t size, unsigned int align);

Symbols have many more properties than size and alignment.
We will eventually need to write DWARF, so we will need to
support them. However, right now we only emit these fake LTO object
symbols, so for a start we could keep things simple and assume that
size is always 0 and align is always 1 or so.
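
The list-of-symbols design described in the patch can be sketched roughly as below. This is a simplified illustration under stated assumptions, not the libiberty code: `symbol`, `writer`, `add_symbol` and `free_symbols` are hypothetical stand-ins for the structures and function in the quoted patch, and the defaults follow the suggestion of size 0 and align 1.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical stand-in for simple_object_symbol_struct in the patch.
struct symbol
{
  symbol *next;        // next symbol in the list
  char *name;          // owned copy of the symbol name
  unsigned int align;  // alignment
  std::size_t size;    // size in bytes
};

// Hypothetical stand-in for the symbol-list part of the write struct.
struct writer
{
  symbol *symbols = nullptr;      // head of the list
  symbol *last_symbol = nullptr;  // tail, kept for O(1) append
};

// Append a symbol; the defaults assume size 0 and align 1.
symbol *add_symbol (writer *w, const char *name,
                    std::size_t size = 0, unsigned int align = 1)
{
  std::size_t len = std::strlen (name) + 1;
  char *copy = new char[len];
  std::memcpy (copy, name, len);
  symbol *s = new symbol{nullptr, copy, align, size};
  if (w->last_symbol)
    w->last_symbol->next = s;
  else
    w->symbols = s;
  w->last_symbol = s;
  return s;
}

// Release every symbol attached to the writer.
void free_symbols (writer *w)
{
  for (symbol *s = w->symbols; s != nullptr;)
    {
      symbol *next = s->next;
      delete[] s->name;
      delete s;
      s = next;
    }
  w->symbols = w->last_symbol = nullptr;
}
```

Keeping a tail pointer mirrors how the patch tracks `last_symbol`, so symbols are written out in the order they were added.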

Overall this looks like a really good start to me (both the API and the
implementation look reasonable, and it is good that you follow the
coding conventions).  I guess you can create a branch (see the git info on
the gcc homepage) and put the patch there?

I am also adding Ian to CC as he is the maintainer of simple-object and
he may have some ideas.

Honza
> 
>  /* Release all resources associated with SIMPLE_OBJECT, including any
> simple_object_write_section's that may have been created.  */
> diff --git a/libiberty/simple-object-common.h
> b/libiberty/simple-object-common.h
> index b9d10550d88..df99c9d85ac 100644
> --- a/libiberty/simple-object-common.h
> +++ b/libiberty/simple-object-common.h
> @@ -58,6 +58,24 @@ struct simple_object_write_struct
>simple_object_write_section *last_section;
>/* Private data for the object file format.  */
>void *data;
> +  /*The start of the list of symbols.*/
> +  simple_object_symbol *symbols;
> +  /*The last entry in the list of symbols*/
> +  simple_object_symbol *last_symbol;
> +};
> +
> +/*A symbol in object file being created*/
> +struct simple_object_symbol_struct
> +{
> +  /*Next in the list of symbols attached to an
> +  simple_object_write*/
> +  simple_object_symbol *next;
> +  /*The name of this symbol. */
> +  char *name;
> +  /* Symbol value */
> +  unsigned int align;
> +  /* Symbol size */
> +  size_t size;
>  };
> 
>  /* A section in an object file being created.  */
> diff --git a/libiberty/simple-object-elf.c b/libiberty/simple-object-elf.c
> index eee07039984..cbba88186bd 100644
> --- a/libiberty/simple-object-elf.c
> +++ b/libiberty/simple-object-elf.c
> @@ -787,9 +787,9 @@ simple_object_elf_write_ehdr (simple_object_write
> *sobj, int descriptor,
>  ++shnum;
>if (shnum > 0)
>  {
> -  /* Add a section header for the dummy section and one for
> - .shstrtab.  */
> -  shnum += 2;
> +  /* Add a section header for the dummy section,
> + .shstrtab, .symtab and .strtab.  */
> +  shnum += 4;
>  }
> 
>ehdr_size = (cl == ELFCLASS32
> @@ -882,6 +882,51 @@ simple_object_elf_write_shdr (simple_object_write
> *sobj, int descriptor,
> errmsg, err);
>  }
> 
> +/* Write out an ELF Symbol*/
> +
> +static int
> +simple_object_elf_write_symbol(simple_object_write *sobj, int descriptor,
> +off_t offset, unsigned int st_name, unsigned int st_value,
> size_t st_size,
> +unsigned char st_info, unsigned char st_other, unsigned int
> st_shndx,
> +const char **errmsg, int *err)
> +{
> +  struct simple_object_elf_attributes *attrs =
> +(struct simple_object_elf_attributes *) sobj->data;
> +  const struct elf_type_functions* fns;
> +  unsigned char cl;
> +  size_t sym_size;
> +  unsigned char buf[sizeof (Elf64_External_Shdr)];
> +
> +  fns = attrs->type_functions;
> +  cl = attrs->ei_class;
> +
> +  sym_size = (cl == 

gcc-13-20230624 is now available

2023-06-24 Thread GCC Administrator via Gcc
Snapshot gcc-13-20230624 is now available on
  https://gcc.gnu.org/pub/gcc/snapshots/13-20230624/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 13 git branch
with the following options: git://gcc.gnu.org/git/gcc.git branch 
releases/gcc-13 revision 896085f08f683d915c6803e4f2e8a7c816dcb1d7

You'll find:

 gcc-13-20230624.tar.xz   Complete GCC

  SHA256=2b1d0ecb8b4a30fe4eb50993af34d05199792902e9d1eafb12b193ce3c52e409
  SHA1=c8ee4ceaeb241df4d33eb7d68d61b8e01dc52929

Diffs from 13-20230617 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-13
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


[Bug target/78904] zero-extracts are not effective

2023-06-24 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78904

--- Comment #18 from CVS Commits  ---
The master branch has been updated by Roger Sayle :

https://gcc.gnu.org/g:8f6c747c8638d4c3c47ba2d4c8be86909e183132

commit r14-2065-g8f6c747c8638d4c3c47ba2d4c8be86909e183132
Author: Roger Sayle 
Date:   Sat Jun 24 23:05:25 2023 +0100

i386: Add alternate representation for {and,or,xor}b %ah,%dh.

A patch that I'm working on to improve RTL simplifications in the
middle-end results in the regression of pr78904-1b.c, due to changes in
the canonical representation of high-byte (%ah, %bh, %ch, %dh) logic.
See also PR target/78904.

This patch avoids/prevents those failures by adding support for the
alternate representation, duplicating the existing *qi_ext_2
as *qi_ext_3 (the new version also replacing any_or with
any_logic to provide *andqi_ext_3 in the same pattern).  Removing
the original pattern isn't trivial, as it's generated by define_split,
but this can be investigated after the other pieces are approved.

The current representation of this instruction is:

(set (zero_extract:DI (reg/v:DI 87 [ aD.2763 ])
(const_int 8 [0x8])
(const_int 8 [0x8]))
(subreg:DI (xor:QI (subreg:QI (zero_extract:DI (reg:DI 94)
(const_int 8 [0x8])
(const_int 8 [0x8])) 0)
(subreg:QI (zero_extract:DI (reg/v:DI 87 [ aD.2763 ])
(const_int 8 [0x8])
(const_int 8 [0x8])) 0)) 0))

after my proposed middle-end improvement, we attempt to recognize:

(set (zero_extract:DI (reg/v:DI 87 [ aD.2763 ])
(const_int 8 [0x8])
(const_int 8 [0x8]))
(zero_extract:DI (xor:DI (reg:DI 94)
(reg/v:DI 87 [ aD.2763 ]))
(const_int 8 [0x8])
(const_int 8 [0x8])))

2023-06-24  Roger Sayle  

gcc/ChangeLog
* config/i386/i386.md (*qi_ext_3): New define_insn.
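
As a rough illustration of why the two RTL forms above describe the same operation, the following sketch models a high-byte extract and insert on 64-bit values. The helper names are hypothetical and the code only models the arithmetic, not GCC's RTL semantics in full: extracting bits 8..15 of each operand and XORing them gives the same byte as XORing the full registers and then extracting bits 8..15.

```cpp
#include <cstdint>

// Bits 8..15 of x (the "high byte", e.g. %ah within %rax).
constexpr std::uint64_t high_byte (std::uint64_t x)
{
  return (x >> 8) & 0xff;
}

// Current form: extract both high bytes, then XOR them.
constexpr std::uint64_t extract_then_xor (std::uint64_t a, std::uint64_t b)
{
  return high_byte (a) ^ high_byte (b);
}

// Proposed form: XOR the full registers, then extract the high byte.
constexpr std::uint64_t xor_then_extract (std::uint64_t a, std::uint64_t b)
{
  return high_byte (a ^ b);
}

// Model of a zero_extract destination: store v into bits 8..15 of dest
// and leave all other bits unchanged.
constexpr std::uint64_t set_high_byte (std::uint64_t dest, std::uint64_t v)
{
  return (dest & ~std::uint64_t{0xff00}) | ((v & 0xff) << 8);
}

static_assert (extract_then_xor (0x1234, 0xabcd)
               == xor_then_extract (0x1234, 0xabcd),
               "both forms select the same byte");
```

The same reasoning applies to AND and OR, since all three act on each bit position independently.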

[Bug middle-end/109986] missing fold (~a | b) ^ a => ~(a & b)

2023-06-24 Thread vanyacpp at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109986

--- Comment #3 from Ivan Sorokin  ---
I tried to investigate why GCC is able to simplify `(a | b) ^ a` and `(a | ~b)
^ a` from comment 2, but not the similar-looking `(~a | b) ^ a` from comment 0.

`(a | b) ^ a` matches the following pattern from match.pd:

/* (X | Y) ^ X -> Y & ~ X*/
(simplify
 (bit_xor:c (convert1? (bit_ior:c @@0 @1)) (convert2? @0))
 (if (tree_nop_conversion_p (type, TREE_TYPE (@0)))
  (convert (bit_and @1 (bit_not @0)))))

`(a | ~b) ^ a` matches another pattern:

/* (~X | C) ^ D -> (X | C) ^ (~D ^ C) if (~D ^ C) can be simplified.  */
(simplify
 (bit_xor:c (bit_ior:cs (bit_not:s @0) @1) @2)
  (bit_xor (bit_ior @0 @1) (bit_xor! (bit_not! @2) @1)))

With substitution `X = b, C = a, D = a` it gives:

(b | a) ^ (~a ^ a)
(b | a) ^ -1
~(b | a)

`(~a | b) ^ a` is not simplifiable by this pattern because it requires that `~D
^ C` is simplifiable further, but `~a ^ b` is not. In any case, even if it were
applicable it would produce `(a | b) ^ (~a ^ b)` which has more operations than
the original expression.
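
The three identities discussed above can be checked by brute force. The sketch below is only an illustration: the helper names are made up, and it tests every pair of 8-bit values, which is sufficient because the operators involved act on each bit independently.

```cpp
#include <cstdint>

using u8 = std::uint8_t;

// Return true if lhs(a, b) == rhs(a, b) for every pair of 8-bit values.
template <typename L, typename R>
bool equal_for_all_u8 (L lhs, R rhs)
{
  for (unsigned a = 0; a < 256; ++a)
    for (unsigned b = 0; b < 256; ++b)
      if (lhs (u8 (a), u8 (b)) != rhs (u8 (a), u8 (b)))
        return false;
  return true;
}

// (a | b) ^ a  ==  b & ~a   (first match.pd pattern)
bool check_ior_xor ()
{
  return equal_for_all_u8 ([] (u8 a, u8 b) { return u8 ((a | b) ^ a); },
                           [] (u8 a, u8 b) { return u8 (b & ~a); });
}

// (a | ~b) ^ a  ==  ~(b | a)   (result of the second pattern)
bool check_ior_not_xor ()
{
  return equal_for_all_u8 ([] (u8 a, u8 b) { return u8 ((a | ~b) ^ a); },
                           [] (u8 a, u8 b) { return u8 (~(b | a)); });
}

// (~a | b) ^ a  ==  ~(a & b)   (the fold requested in comment 0)
bool check_not_ior_xor ()
{
  return equal_for_all_u8 ([] (u8 a, u8 b) { return u8 ((~a | b) ^ a); },
                           [] (u8 a, u8 b) { return u8 (~(a & b)); });
}
```

All three checks are expected to return true, which confirms that the fold from comment 0 is valid even though no existing pattern produces it.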

[Bug c++/110397] types may not be defined in parameter types leads to ICE with -fdump-tree-original (or no -quiet when invoking cc1plus directly)

2023-06-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110397

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |DUPLICATE
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #2 from Andrew Pinski  ---
Dup of bug 93788.

*** This bug has been marked as a duplicate of bug 93788 ***

[Bug c++/93788] Segfault caused by infinite loop in cc1plus

2023-06-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93788

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |stevenxia990430 at gmail dot com

--- Comment #4 from Andrew Pinski  ---
*** Bug 110397 has been marked as a duplicate of this bug. ***

[Bug c++/110397] types may not be defined in parameter types leads to ICE with -fdump-tree-original (or no -quiet when invoking cc1plus directly)

2023-06-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110397

--- Comment #1 from Andrew Pinski  ---
Note the odd thing about this issue: it only shows up some of the time.
You can reproduce it 100% of the time if you use -fdump-tree-original.
Also, the include of iostream is not needed (though if using godbolt you do
need it if not using -fdump-tree-original).

[Bug c++/110344] [C++26] P2738R1 - constexpr cast from void*

2023-06-24 Thread jason at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110344

--- Comment #3 from Jason Merrill  ---
Version of the paper testcase that just adds constexpr, that we currently
reject:

#include <string_view>
struct Sheep {
  constexpr std::string_view speak() const noexcept { return "Baa"; }
};
struct Cow {
  constexpr std::string_view speak() const noexcept { return "Mooo"; }
};
class Animal_View {
private:
  const void *animal;
  std::string_view (*speak_function)(const void *);
public:
  template <typename Animal>
  constexpr Animal_View(const Animal &animal)
    : animal{&animal}, speak_function{[](const void *object) {
        return static_cast<const Animal *>(object)->speak();
      }} {}
  constexpr std::string_view speak() const noexcept {
    return speak_function(animal);
  }
};
// This is the key bit here. This is a single concrete function
// that can take anything that happens to have the "Animal_View"
// interface
constexpr std::string_view do_speak(Animal_View av) { return av.speak(); }
int main() {
  // A Cow is a cow. The only thing that makes it special
  // is that it has a "std::string_view speak() const" member
  constexpr Cow cow;
  constexpr auto result = do_speak(cow);
  return static_cast<int>(result.size());
}

[Bug c++/110344] [C++26] P2738R1 - constexpr cast from void*

2023-06-24 Thread jason at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110344

Jason Merrill  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jason at gcc dot gnu.org

--- Comment #2 from Jason Merrill  ---
Reduced version of the paper's testcase that we already (wrongly) accept:

class Doer {
private:
  const void *ob;
  int (*fn)(const void *);
public:
  template <typename T>
  constexpr Doer(const T &t)
    : ob{&t},
      fn{[](const void *p) { return static_cast<const T *>(p)->doit(); }}
  {}
  constexpr int operator()() const { return fn(ob); }
};
struct Thing { constexpr int doit() const { return 42; }; };
static_assert (Doer(Thing())() == 42);

[Bug c++/110397] New: types may not be defined in parameter types leads to ICE

2023-06-24 Thread stevenxia990430 at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110397

            Bug ID: 110397
           Summary: types may not be defined in parameter types leads to
                    ICE
           Product: gcc
           Version: 12.1.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: stevenxia990430 at gmail dot com
  Target Milestone: ---

The following invalid program reports an internal compiler error: Segmentation
fault.

To quickly reproduce: https://gcc.godbolt.org/z/dE96K7cGc
```
#include <iostream>

int main(){
auto sum = ([](struct A {int b; int c;}a,...){
});
return 0;
}
```

tested on gcc-trunk

[Bug c++/110394] Lambda capture receives wrong value

2023-06-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110394

--- Comment #7 from Andrew Pinski  ---
(In reply to jackyguo18 from comment #6)
> @Andrew Pinski - Thanks, just confirmed that that was the issue.
> 
> Why doesn't GCC choose to delete the function (thus causing the weird
> behaviour) early at lower optimization levels?
> 
> Seems kinda strange it would work at -O2.

Most likely inlining more and being more aggressive in doing some optimizations.
Since it is undefined behavior to use the object after its lifetime ends,
it just happened to work at some optimization levels.

[Bug c++/110394] Lambda capture receives wrong value

2023-06-24 Thread jackyguo18 at hotmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110394

jackyguo18 at hotmail dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|WAITING                     |RESOLVED
         Resolution|---                         |INVALID

--- Comment #6 from jackyguo18 at hotmail dot com ---
@Andrew Pinski - Thanks, just confirmed that that was the issue.

Why doesn't GCC choose to delete the function (thus causing the weird
behaviour) early at lower optimization levels?

Seems kinda strange it would work at -O2.

[Bug c++/110394] Lambda capture receives wrong value

2023-06-24 Thread jackyguo18 at hotmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110394

--- Comment #5 from jackyguo18 at hotmail dot com ---
@Andrew Pinski - Thanks, just confirmed that that was the issue.

Why doesn't GCC choose to delete the function (thus causing the weird
behaviour) early at lower optimization levels?

Seems kinda strange it would work at -O2.

[Bug target/108678] Windows on ARM64 platform target aarch64-w64-mingw32

2023-06-24 Thread brechtsanders at users dot sourceforge.net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108678

--- Comment #3 from Brecht Sanders ---
Any pointers on which files to edit in order to support aarch64-mingw?

I think it won't require reinventing the wheel as it will probably be a mix of
existing *-mingw and aarch64-* stuff...

[Bug middle-end/102253] scalability issues with large loop depth

2023-06-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102253

--- Comment #5 from Andrew Pinski  ---
On the trunk with the original testcase here we get:
 tree copy headers  :  12.16 ( 19%)   0.01 (  2%)  21.57 ( 28%)   771k (  0%)


(I suspect the rest is due to not setting release checking ...)

Re: [Patch, fortran] PR49213 - [OOP] gfortran rejects structure constructor expression

2023-06-24 Thread Harald Anlauf via Gcc-patches

Hi Paul!

On 6/24/23 15:18, Paul Richard Thomas via Gcc-patches wrote:

I have included the adjustment to 'gfc_is_ptr_fcn' and eliminating the
extra blank line, introduced by my last patch. I played safe and went
exclusively for class functions with attr.class_pointer set on the
grounds that these have had all the accoutrements checked and built
(ie. class_ok). I am still not sure if this is necessary or not.


maybe it is my fault, but I find the version in the patch confusing:

@@ -816,7 +816,7 @@ bool
 gfc_is_ptr_fcn (gfc_expr *e)
 {
   return e != NULL && e->expr_type == EXPR_FUNCTION
- && (gfc_expr_attr (e).pointer
+ && ((e->ts.type != BT_CLASS && gfc_expr_attr (e).pointer)
  || (e->ts.type == BT_CLASS
  && CLASS_DATA (e)->attr.class_pointer));
 }

The caller 'gfc_is_ptr_fcn' has e->expr_type == EXPR_FUNCTION, so
gfc_expr_attr (e) boils down to:

  if (e->value.function.esym && e->value.function.esym->result)
{
  gfc_symbol *sym = e->value.function.esym->result;
  attr = sym->attr;
  if (sym->ts.type == BT_CLASS && sym->attr.class_ok)
{
  attr.dimension = CLASS_DATA (sym)->attr.dimension;
  attr.pointer = CLASS_DATA (sym)->attr.class_pointer;
  attr.allocatable = CLASS_DATA (sym)->attr.allocatable;
}
}
...
  else if (e->symtree)
attr = gfc_variable_attr (e, NULL);

So I thought this should already do what you want if you do

gfc_is_ptr_fcn (gfc_expr *e)
{
  return e != NULL && e->expr_type == EXPR_FUNCTION
	 && gfc_expr_attr (e).pointer;
}

or what am I missing?  The additional checks in gfc_expr_attr are
there to avoid ICEs in case CLASS_DATA (sym) has issues, and we all
know Gerhard who showed that he is an expert in exploiting this.

To sum up, I'd prefer to use the safer form if it works.  If it
doesn't, I would expect a latent issue.

The rest of the code looked good to me, but I was suspicious about
the handling of CHARACTER.

Nasty as I am, I modified the testcase to use character(kind=4)
instead of kind=1 (see attached).  This either fails here (stop 10),
or if I activate the marked line

!cont = tContainer('hello!')   ! ### ICE! ###

I get an ICE.

Can you have another look?

Thanks,
Harald






OK for trunk?

Paul

Fortran: Enable class expressions in structure constructors [PR49213]

2023-06-24  Paul Thomas  

gcc/fortran
PR fortran/49213
* expr.cc (gfc_is_ptr_fcn): Guard pointer attribute to exclude
class expressions.
* resolve.cc (resolve_assoc_var): Call gfc_is_ptr_fcn to allow
associate names with pointer function targets to be used in
variable definition context.
* trans-decl.cc (get_symbol_decl): Remove extraneous line.
* trans-expr.cc (alloc_scalar_allocatable_subcomponent): Obtain
size of intrinsic and character expressions.
(gfc_trans_subcomponent_assign): Expand assignment to class
components to include intrinsic and character expressions.

gcc/testsuite/
PR fortran/49213
* gfortran.dg/pr49213.f90 : New test
! { dg-do run }
!
! Contributed by Neil Carlson  
!
program main
! character(2) :: c
  character(2,kind=4) :: c

  type :: S
integer :: n
  end type
  type(S) :: Sobj

  type, extends(S) :: S2
integer :: m
  end type
  type(S2) :: S2obj

  type :: T
class(S), allocatable :: x
  end type
  type(T) :: Tobj

  Sobj = S(1)
  Tobj = T(Sobj)

  S2obj = S2(1,2)
  Tobj = T(S2obj)! Failed here
  select type (x => Tobj%x)
type is (S2)
  if ((x%n .ne. 1) .or. (x%m .ne. 2)) stop 1
class default
  stop 2
  end select

  c = 4_"  "
  call pass_it (T(Sobj))
  if (c .ne. 4_"S ") stop 3
  call pass_it (T(S2obj))! and here
  if (c .ne. 4_"S2") stop 4

  call bar

contains

  subroutine pass_it (foo)
type(T), intent(in) :: foo
select type (x => foo%x)
  type is (S)
c = 4_"S "
if (x%n .ne. 1) stop 5
  type is (S2)
c = 4_"S2"
if ((x%n .ne. 1) .or. (x%m .ne. 2)) stop 6
  class default
stop 7
end select
  end subroutine

  subroutine bar
   ! Test from comment #29 of the PR - due to Janus Weil
type tContainer
  class(*), allocatable :: x
end type
integer, parameter :: i = 0
character(7,kind=4) :: chr = 4_"goodbye"
type(tContainer) :: cont

cont%x = i ! linker error: undefined reference to `__copy_INTEGER_4_.3804'

cont = tContainer(i+42) ! Failed here
select type (z => cont%x)
  type is (integer)
if (z .ne. 42) stop 8
  class default
stop 9
end select

!cont = tContainer('hello!')   ! ### ICE! ###
cont = tContainer(4_'hello!')
select type (z => cont%x)
  type is (character(*,kind=4))
if (z .ne. 4_'hello!') stop 10
  class default
stop 11
end select

cont = tContainer(chr)
select type (z => cont%x)
  type is (character(*,kind=4))
if (z .ne. 4_'goodbye') stop 12
  class default
 

[Bug rtl-optimization/110390] ICE on valid code on x86_64-linux-gnu with sel-scheduling: in av_set_could_be_blocked_by_bookkeeping_p, at sel-sched.cc:3609

2023-06-24 Thread zhendong.su at inf dot ethz.ch via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110390

--- Comment #1 from Zhendong Su  ---
Another reproducer with fewer flags (and affects 12.* and later).

Compiler Explorer: https://godbolt.org/z/fYqEz9EWx

[603] % gcctk -v
Using built-in specs.
COLLECT_GCC=gcctk
COLLECT_LTO_WRAPPER=/local/home/suz/suz-local/software/local/gcc-trunk/bin/../libexec/gcc/x86_64-pc-linux-gnu/14.0.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: ../gcc-trunk/configure --disable-bootstrap
--enable-checking=yes --prefix=/local/suz-local/software/local/gcc-trunk
--enable-sanitizers --enable-languages=c,c++ --disable-werror
--disable-multilib
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 14.0.0 20230624 (experimental) [master r14-924-gd709841ae0f] (GCC)
[604] %
[604] % gcctk -O3 -fsel-sched-pipelining -fschedule-insns
-fselective-scheduling2 -fPIC small.c
during RTL pass: sched2
small.c: In function ‘h’:
small.c:20:1: internal compiler error: in
av_set_could_be_blocked_by_bookkeeping_p, at sel-sched.cc:3609
   20 | }
  | ^
0x7d635a av_set_could_be_blocked_by_bookkeeping_p
../../gcc-trunk/gcc/sel-sched.cc:3609
0x7d635a code_motion_process_successors
../../gcc-trunk/gcc/sel-sched.cc:6386
0x7d635a code_motion_path_driver
../../gcc-trunk/gcc/sel-sched.cc:6608
0xf85b69 code_motion_process_successors
../../gcc-trunk/gcc/sel-sched.cc:6342
0xf85b69 code_motion_path_driver
../../gcc-trunk/gcc/sel-sched.cc:6608
0xf86c18 find_used_regs
../../gcc-trunk/gcc/sel-sched.cc:3272
0xf86c18 collect_unavailable_regs_from_bnds
../../gcc-trunk/gcc/sel-sched.cc:1586
0xf86c18 find_best_reg_for_expr
../../gcc-trunk/gcc/sel-sched.cc:1649
0xf8976c fill_vec_av_set
../../gcc-trunk/gcc/sel-sched.cc:3784
0xf8976c fill_ready_list
../../gcc-trunk/gcc/sel-sched.cc:4014
0xf8976c find_best_expr
../../gcc-trunk/gcc/sel-sched.cc:4374
0xf8976c fill_insns
../../gcc-trunk/gcc/sel-sched.cc:5535
0xf8976c schedule_on_fences
../../gcc-trunk/gcc/sel-sched.cc:7353
0xf8976c sel_sched_region_2
../../gcc-trunk/gcc/sel-sched.cc:7491
0xf8a928 sel_sched_region_1
../../gcc-trunk/gcc/sel-sched.cc:7533
0xf8bf46 sel_sched_region(int)
../../gcc-trunk/gcc/sel-sched.cc:7634
0xf8bf46 sel_sched_region(int)
../../gcc-trunk/gcc/sel-sched.cc:7619
0xf8c0e9 run_selective_scheduling()
../../gcc-trunk/gcc/sel-sched.cc:7720
0xf6d7ed rest_of_handle_sched2
../../gcc-trunk/gcc/sched-rgn.cc:3743
0xf6d7ed execute
../../gcc-trunk/gcc/sched-rgn.cc:3890
Please submit a full bug report, with preprocessed source (by using
-freport-bug).
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.
[605] %
[605] % cat small.c
static int a;
int b, c, d, g;
long e, f;
extern void l(char *);
void h() {
  char i;
  int j = 1 >> f / b;
 L:
  f = -(-(f % g || a) * (c && f | e));
  if (a > e)
l("");
  if (f) {
l("A");
i = j / g;
  }
  if (a)
goto L;
  d = i;
  a = 0;
}

[Bug middle-end/102253] scalability issues with large loop depth

2023-06-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102253

--- Comment #4 from Andrew Pinski  ---
VRP/ranger uses SCEV now, so it might be even worse; the testcase from PR 110396
has that behavior too.

[Bug middle-end/102253] scalability issues with large loop depth

2023-06-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102253

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |luydorarko at vusra dot com

--- Comment #3 from Andrew Pinski  ---
*** Bug 110396 has been marked as a duplicate of this bug. ***

[Bug tree-optimization/110396] Compile-time hog with -O2 and -O3

2023-06-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110396

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |DUPLICATE
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #2 from Andrew Pinski  ---
This is basically a dup of bug 102253. The problem is a known
scalability issue with large loop depth.

How did you generate this testcase? Is it from real code, or was it generated
to try to hit compiler bugs?

*** This bug has been marked as a duplicate of bug 102253 ***

[Bug tree-optimization/110396] Compile-time hog with -O2 and -O3

2023-06-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110396

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
          Component|c++                         |tree-optimization

--- Comment #1 from Andrew Pinski  ---
#0  0x012f8732 in hash_table_mod1 (index=5, hash=165) at
/home/apinski/src/upstream-gcc-git/gcc/gcc/hash-table.h:344
#1  hash_table::find_slot_with_hash
(insert=INSERT, hash=165, comparable=, this=0x77600b28) at
/home/apinski/src/upstream-gcc-git/gcc/gcc/hash-table.h:1051
#2  hash_table::find_slot (insert=INSERT,
value=, this=0x77600b28) at
/home/apinski/src/upstream-gcc-git/gcc/gcc/hash-table.h:435
#3  find_var_scev_info (instantiated_below=0x75247b40, var=0x75159360)
at /home/apinski/src/upstream-gcc-git/gcc/gcc/tree-scalar-evolution.cc:358
#4  0x012f9312 in get_scalar_evolution (scalar=0x75159360,
instantiated_below=0x75247b40) at
/home/apinski/src/upstream-gcc-git/gcc/gcc/tree-scalar-evolution.cc:556
#5  analyze_scalar_evolution (loop=0x7514eaf0, var=0x75159360) at
/home/apinski/src/upstream-gcc-git/gcc/gcc/tree-scalar-evolution.cc:2020
#6  0x012f8aa7 in interpret_condition_phi
(condition_phi=0x751fbd00, loop=0x7514eaf0) at
/home/apinski/src/upstream-gcc-git/gcc/gcc/tree-scalar-evolution.cc:1603
#7  analyze_scalar_evolution_1 (loop=0x7514eaf0, var=0x751f5dc8) at
/home/apinski/src/upstream-gcc-git/gcc/gcc/tree-scalar-evolution.cc:1969
#8  0x012f8b5b in analyze_scalar_evolution_1 (loop=0x7514e960,
var=0x751f5dc8) at
/home/apinski/src/upstream-gcc-git/gcc/gcc/tree-scalar-evolution.cc:1950
#9  0x012f94e5 in analyze_scalar_evolution (loop=0x7514e960,
var=0x751f5dc8) at
/home/apinski/src/upstream-gcc-git/gcc/gcc/tree-scalar-evolution.cc:2031
#10 0x012f8aa7 in interpret_condition_phi
(condition_phi=0x75209400, loop=0x7514e960) at
/home/apinski/src/upstream-gcc-git/gcc/gcc/tree-scalar-evolution.cc:1603
#11 analyze_scalar_evolution_1 (loop=0x7514e960, var=0x75207870) at
/home/apinski/src/upstream-gcc-git/gcc/gcc/tree-scalar-evolution.cc:1969
#12 0x012f8b5b in analyze_scalar_evolution_1 (loop=0x7514e7d0,
var=0x75207870) at
/home/apinski/src/upstream-gcc-git/gcc/gcc/tree-scalar-evolution.cc:1950
#13 0x012f94e5 in analyze_scalar_evolution (loop=0x7514e7d0,
var=0x75207870) at
/home/apinski/src/upstream-gcc-git/gcc/gcc/tree-scalar-evolution.cc:2031
#14 0x012f8aa7 in interpret_condition_phi
(condition_phi=0x7520ab00, loop=0x7514e7d0) at
/home/apinski/src/upstream-gcc-git/gcc/gcc/tree-scalar-evolution.cc:1603
#15 analyze_scalar_evolution_1 (loop=0x7514e7d0, var=0x75161900) at
/home/apinski/src/upstream-gcc-git/gcc/gcc/tree-scalar-evolution.cc:1969
#16 0x012f8b5b in analyze_scalar_evolution_1 (loop=0x7514e640,
var=0x75161900) at
/home/apinski/src/upstream-gcc-git/gcc/gcc/tree-scalar-evolution.cc:1950
#17 0x012f94e5 in analyze_scalar_evolution (loop=0x7514e640,
var=0x75161900) at
/home/apinski/src/upstream-gcc-git/gcc/gcc/tree-scalar-evolution.cc:2031
#18 0x012f8aa7 in interpret_condition_phi
(condition_phi=0x75172a00, loop=0x7514e640) at
/home/apinski/src/upstream-gcc-git/gcc/gcc/tree-scalar-evolution.cc:1603
#19 analyze_scalar_evolution_1 (loop=0x7514e640, var=0x75159318) at
/home/apinski/src/upstream-gcc-git/gcc/gcc/tree-scalar-evolution.cc:1969
#20 0x012f8b5b in analyze_scalar_evolution_1 (loop=0x7514e4b0,
var=0x75159318) at
/home/apinski/src/upstream-gcc-git/gcc/gcc/tree-scalar-evolution.cc:1950
#21 0x012f94e5 in analyze_scalar_evolution (loop=0x7514e4b0,
var=0x75159318) at
/home/apinski/src/upstream-gcc-git/gcc/gcc/tree-scalar-evolution.cc:2031
#22 0x012f8aa7 in interpret_condition_phi
(condition_phi=0x7520d300, loop=0x7514e4b0) at
/home/apinski/src/upstream-gcc-git/gcc/gcc/tree-scalar-evolution.cc:1603
#23 analyze_scalar_evolution_1 (loop=0x7514e4b0, var=0x74d961b0) at
/home/apinski/src/upstream-gcc-git/gcc/gcc/tree-scalar-evolution.cc:1969
#24 0x012f8b5b in analyze_scalar_evolution_1 (loop=0x7514e320,
var=0x74d961b0) at
/home/apinski/src/upstream-gcc-git/gcc/gcc/tree-scalar-evolution.cc:1950
#25 0x012f94e5 in analyze_scalar_evolution (loop=0x7514e320,
var=0x74d961b0) at
/home/apinski/src/upstream-gcc-git/gcc/gcc/tree-scalar-evolution.cc:2031
#26 0x012f8aa7 in interpret_condition_phi
(condition_phi=0x7520d500, loop=0x7514e320) at
/home/apinski/src/upstream-gcc-git/gcc/gcc/tree-scalar-evolution.cc:1603
#27 analyze_scalar_evolution_1 (loop=0x7514e320, var=0x7505fd80) at
/home/apinski/src/upstream-gcc-git/gcc/gcc/tree-scalar-evolution.cc:1969
#28 0x012f8b5b in analyze_scalar_evolution_1 (loop=0x7514e190,
var=0x7505fd80) at

[Bug tree-optimization/110311] [14 Regression] regression in tree-optimizer

2023-06-24 Thread anlauf at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110311

--- Comment #23 from anlauf at gcc dot gnu.org ---
You could check the input arguments for validity, e.g. using ieee_is_finite
from the intrinsic ieee_arithmetic module.

  use, intrinsic :: ieee_arithmetic, only: ieee_is_finite

...

  if (.not. ieee_is_finite (a)) then
 print *, "bad: a=", a
 stop 1
  end if

As a last resort I still recommend what I wrote in comment#15: build (=link)
your executable from the *.o files in your project build tree with known-good
objects, replacing one candidate.o with the one from the build tree that shows
the problem.

And I really mean: link only and run.

[Bug c++/110396] New: Compile-time hog with -O2 and -O3

2023-06-24 Thread luydorarko at vusra dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110396

            Bug ID: 110396
           Summary: Compile-time hog with -O2 and -O3
           Product: gcc
           Version: unknown
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: luydorarko at vusra dot com
  Target Milestone: ---

Created attachment 55397
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55397&action=edit
Preprocessed file created by `-O2 -save-temps`

Compile time hog behavior can be reproduced with:
```
g++ -O2 tmp.cpp 
```
Also same behavior with `-O3`.

Compiler takes far too long (more than one hour in one case) and was killed
after a while.

Output of `g++ -v`:
```
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-pc-linux-gnu/13.1.1/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /build/gcc/src/gcc/configure
--enable-languages=ada,c,c++,d,fortran,go,lto,objc,obj-c++ --enable-bootstrap
--prefix=/usr --libdir=/usr/lib --libexecdir=/usr/lib --mandir=/usr/share/man
--infodir=/usr/share/info --with-bugurl=https://bugs.archlinux.org/
--with-build-config=bootstrap-lto --with-linker-hash-style=gnu
--with-system-zlib --enable-__cxa_atexit --enable-cet=auto
--enable-checking=release --enable-clocale=gnu --enable-default-pie
--enable-default-ssp --enable-gnu-indirect-function --enable-gnu-unique-object
--enable-libstdcxx-backtrace --enable-link-serialization=1
--enable-linker-build-id --enable-lto --enable-multilib --enable-plugin
--enable-shared --enable-threads=posix --disable-libssp --disable-libstdcxx-pch
--disable-werror
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 13.1.1 20230429 (GCC) 
```

Attachment: a-tmp.ii file created with `g++ -O2 tmp.cpp -save-temps`

[PATCH, part2, committed] Fortran: ABI for scalar CHARACTER(LEN=1),VALUE dummy argument [PR110360]

2023-06-24 Thread Harald Anlauf via Gcc-patches
Dear all,

the first part of the patch came with a testcase that also exercised
code for constant string arguments, which was not touched by that patch
but seems to have caused runtime failures on big-endian platforms
(e.g. Power-* BE) for all optimization levels, and on x86 / -m32
at -O1 and higher (not at -O0).

I did not see any issues on x86 / -m64 and any optimization level,
but could reproduce a problem with x86 / -m32 at -O1, which appears
to be related to how arguments that are to be passed by value are
handled when there is a mismatch between the function prototype
and the passed argument.  The solution is to truncate too long
constant string arguments, fixed by the attached patch, pushed as:

https://gcc.gnu.org/g:3f97d10aa1ff5984d6fd657f246d3f251b254ff1

and see attached.

* * *

I found gcc-testresults quite helpful in checking whether my patch
caused trouble on architectures different from the one I'm working
on.  The value (pun intended) would have been even greater if
the output of runtime failures were also made available.
Many (Fortran) tests provide either a stop code, or some hopefully
helpful diagnostic output on stdout intended for locating errors
on platforms to which one has no direct access, or with which one
is less familiar.  Far better than a plain

FAIL: gfortran.dg/value_9.f90   -O1  execution test

* * *

Thanks,
Harald

From 3f97d10aa1ff5984d6fd657f246d3f251b254ff1 Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Sat, 24 Jun 2023 20:36:53 +0200
Subject: [PATCH] Fortran: ABI for scalar CHARACTER(LEN=1),VALUE dummy argument
 [PR110360]

gcc/fortran/ChangeLog:

	PR fortran/110360
	* trans-expr.cc (gfc_conv_procedure_call): Truncate constant string
	argument of length > 1 passed to scalar CHARACTER(1),VALUE dummy.
---
 gcc/fortran/trans-expr.cc | 21 +
 1 file changed, 13 insertions(+), 8 deletions(-)

diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
index c92fccd0be2..63e3cf9681e 100644
--- a/gcc/fortran/trans-expr.cc
+++ b/gcc/fortran/trans-expr.cc
@@ -6395,20 +6395,25 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym,

 		/* ABI: actual arguments to CHARACTER(len=1),VALUE
 		   dummy arguments are actually passed by value.
-		   The BIND(C) case is handled elsewhere.
-		   TODO: truncate constant strings to length 1.  */
+		   Constant strings are truncated to length 1.
+		   The BIND(C) case is handled elsewhere.  */
 		if (fsym->ts.type == BT_CHARACTER
 			&& !fsym->ts.is_c_interop
 			&& fsym->ts.u.cl->length->expr_type == EXPR_CONSTANT
 			&& fsym->ts.u.cl->length->ts.type == BT_INTEGER
 			&& (mpz_cmp_ui
-			(fsym->ts.u.cl->length->value.integer, 1) == 0)
-			&& e->expr_type != EXPR_CONSTANT)
+			(fsym->ts.u.cl->length->value.integer, 1) == 0))
 		  {
-			parmse.expr = gfc_string_to_single_character
-			  (build_int_cst (gfc_charlen_type_node, 1),
-			   parmse.expr,
-			   e->ts.kind);
+			if (e->expr_type != EXPR_CONSTANT)
+			  parmse.expr = gfc_string_to_single_character
+			(build_int_cst (gfc_charlen_type_node, 1),
+			 parmse.expr,
+			 e->ts.kind);
+			else if (e->value.character.length > 1)
+			  {
+			e->value.character.length = 1;
+			gfc_conv_expr (, e);
+			  }
 		  }

 		if (fsym->attr.optional
--
2.35.3



[Bug tree-optimization/110311] [14 Regression] regression in tree-optimizer

2023-06-24 Thread juergen.reuter at desy dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110311

--- Comment #22 from Jürgen Reuter  ---
(In reply to anlauf from comment #21)
> I forgot to mention that you need to keep in mind that the location where a
> symptom is seen may not be the location of the cause.

Indeed, I think you are right and the problem is elsewhere. I don't really know
where to continue.

[Bug fortran/82943] [F03] Error with type-bound procedure of parametrized derived type

2023-06-24 Thread jvdelisle at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82943

Jerry DeLisle  changed:

   What|Removed |Added

 CC||jvdelisle at gcc dot gnu.org

--- Comment #14 from Jerry DeLisle  ---
(In reply to Alexander Westbrooks from comment #13)
> I sent in the patch to those emails. Hopefully now the ball will start
> rolling and I can slowly get this packaged into a legitimate fix. I'll post
> updates here as I receive them.
> 
> The patch is below, if you would like to try it. I did this in the GCC 14
> code.
> 
I saw your email. Thanks for getting involved!

[Bug fortran/110360] ABI issue with character,value dummy argument

2023-06-24 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110360

--- Comment #13 from CVS Commits  ---
The master branch has been updated by Harald Anlauf :

https://gcc.gnu.org/g:3f97d10aa1ff5984d6fd657f246d3f251b254ff1

commit r14-2064-g3f97d10aa1ff5984d6fd657f246d3f251b254ff1
Author: Harald Anlauf 
Date:   Sat Jun 24 20:36:53 2023 +0200

Fortran: ABI for scalar CHARACTER(LEN=1),VALUE dummy argument [PR110360]

gcc/fortran/ChangeLog:

PR fortran/110360
* trans-expr.cc (gfc_conv_procedure_call): Truncate constant string
argument of length > 1 passed to scalar CHARACTER(1),VALUE dummy.

[Bug fortran/110360] ABI issue with character,value dummy argument

2023-06-24 Thread anlauf at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110360

--- Comment #12 from anlauf at gcc dot gnu.org ---
(In reply to anlauf from comment #11)
> Created attachment 55393 [details]
> Patch to truncate string argument longer than 1
> 
> This truncates the string to length 1 and appears to work on x86 / -m32 .
> Would be interesting to get feedback on big-endian platforms.

As this works here, cross-checked with valgrind, and there is no feedback so
far, I'll push this update and watch the testers.

[Bug tree-optimization/110311] [14 Regression] regression in tree-optimizer

2023-06-24 Thread anlauf at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110311

--- Comment #21 from anlauf at gcc dot gnu.org ---
I forgot to mention that you need to keep in mind that the location where a
symptom is seen may not be the location of the cause.

[x86_PATCH] New *ashl_doubleword_highpart define_insn_and_split.

2023-06-24 Thread Roger Sayle

This patch contains a pair of (related) optimizations in i386.md that
allow us to generate better code for the example below (this is a step
towards fixing a bugzilla PR, but I've forgotten the number).

__int128 foo64(__int128 x, long long y)
{
  __int128 t = (__int128)y << 64;
  return x ^ t;
}

The hidden issue is that the RTL currently seen by reload contains
the sign extension of y from DImode to TImode, even though this is
dead (not required) for left shifts by more than WORD_SIZE bits.

(insn 11 8 12 2 (parallel [
(set (reg:TI 0 ax [orig:91 y ] [91])
(sign_extend:TI (reg:DI 1 dx [97])))
(clobber (reg:CC 17 flags))
(clobber (scratch:DI))
]) {extendditi2}

What makes this particularly undesirable is that the sign-extension
pattern above requires an additional DImode scratch register, indicated
by the clobber, which unnecessarily increases register pressure.

The proposed solution is to add a define_insn_and_split for such
left shifts (of sign or zero extensions) that only have a non-zero
highpart, where the extension is redundant and eliminated, that can
be split after reload, without scratch registers or early clobbers.
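
The claim that the extension is redundant for such shifts can be checked
with a small sketch.  The helper names below are hypothetical and are not
part of the patch; unsigned __int128 is a GCC/Clang extension, and the
shift count n is assumed to satisfy 64 <= n < 128:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers for illustration only; not taken from the patch.
using u128 = unsigned __int128;

// Widen y with sign extension, then shift left by n bits (64 <= n < 128).
inline u128 shl_sext (std::int64_t y, unsigned n)
{
  return static_cast<u128> (static_cast<__int128> (y)) << n;
}

// Widen y with zero extension, then shift left by n bits (64 <= n < 128).
inline u128 shl_zext (std::int64_t y, unsigned n)
{
  return static_cast<u128> (static_cast<std::uint64_t> (y)) << n;
}

// Conceptually what a late split can emit: the highpart is y shifted by
// the remaining n - 64 bits, and the lowpart is simply zero.
inline u128 shl_highpart_only (std::int64_t y, unsigned n)
{
  std::uint64_t hi = static_cast<std::uint64_t> (y) << (n - 64);
  return static_cast<u128> (hi) << 64;
}
```

All three functions should agree for any y, because the bits produced by
the extension are shifted out of the 128-bit result.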

This (late split) exposes a second optimization opportunity where
setting the lowpart to zero can sometimes be combined/simplified with
the following instruction during peephole2.

For the test case above, we previously generated with -O2:

foo64:  xorl%eax, %eax
xorq%rsi, %rdx
xorq%rdi, %rax
ret

with this patch, we now generate:

foo64:  movq%rdi, %rax
xorq%rsi, %rdx
ret

Likewise for the related -m32 test case, we go from:

foo32:  movl12(%esp), %eax
movl%eax, %edx
xorl%eax, %eax
xorl8(%esp), %edx
xorl4(%esp), %eax
ret

to the improved:

foo32:  movl12(%esp), %edx
movl4(%esp), %eax
xorl8(%esp), %edx
ret


This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?


2023-06-24  Roger Sayle  

gcc/ChangeLog
* config/i386/i386.md (peephole2): Simplify zeroing a register
followed by an IOR, XOR or PLUS operation on it, into a move.
(*ashl3_doubleword_highpart): New define_insn_and_split to
eliminate (and hide from reload) unnecessary word to doubleword
extensions that are followed by left shifts by sufficiently large
(but valid) bit counts.

gcc/testsuite/ChangeLog
* gcc.target/i386/ashldi3-1.c: New 32-bit test case.
* gcc.target/i386/ashlti3-2.c: New 64-bit test case.


Thanks again,
Roger
--

diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 95a6653c..7664dff 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -12206,6 +12206,18 @@
(set_attr "type" "alu")
(set_attr "mode" "QI")])
 
+;; Peephole2 rega = 0; rega op= regb into rega = regb.
+(define_peephole2
+  [(parallel [(set (match_operand:SWI 0 "general_reg_operand")
+  (const_int 0))
+ (clobber (reg:CC FLAGS_REG))])
+   (parallel [(set (match_dup 0)
+  (any_or_plus:SWI (match_dup 0)
+   (match_operand:SWI 1 "")))
+ (clobber (reg:CC FLAGS_REG))])]
+  ""
+  [(set (match_dup 0) (match_dup 1))])
+   
 ;; Split DST = (HI<<32)|LO early to minimize register usage.
 (define_insn_and_split "*concat3_1"
   [(set (match_operand: 0 "nonimmediate_operand" "=ro,r")
@@ -13365,6 +13377,28 @@
   [(const_int 0)]
   "ix86_split_ashl (operands, operands[3], mode); DONE;")
 
+(define_insn_and_split "*ashl3_doubleword_highpart"
+  [(set (match_operand: 0 "register_operand" "=r")
+   (ashift:
+ (any_extend: (match_operand:DWIH 1 "nonimmediate_operand" "rm"))
+ (match_operand:QI 2 "const_int_operand")))
+   (clobber (reg:CC FLAGS_REG))]
+  "INTVAL (operands[2]) >=  * BITS_PER_UNIT
+   && INTVAL (operands[2]) <  * BITS_PER_UNIT * 2"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+{
+  split_double_mode (mode, [0], 1, [0], [3]);
+  int bits = INTVAL (operands[2]) - ( * BITS_PER_UNIT);
+  if (!rtx_equal_p (operands[3], operands[1]))
+emit_move_insn (operands[3], operands[1]);
+  if (bits > 0)
+emit_insn (gen_ashl3 (operands[3], operands[3], GEN_INT (bits)));
+  ix86_expand_clear (operands[0]);
+  DONE;
+})
+
 (define_insn "x86_64_shld"
   [(set (match_operand:DI 0 "nonimmediate_operand" "+r*m")
 (ior:DI (ashift:DI (match_dup 0)
diff --git a/gcc/testsuite/gcc.target/i386/ashldi3-1.c 
b/gcc/testsuite/gcc.target/i386/ashldi3-1.c
new file mode 100644
index 000..b61d63b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/ashldi3-1.c
@@ -0,0 +1,16 @@
+/* { dg-do compile { target ia32 } } */
+/* { dg-options "-O2" } */
+
+long long foo(long long x, int y)
+{
+  long long t = (long long)y << 

[Bug fortran/82943] [F03] Error with type-bound procedure of parametrized derived type

2023-06-24 Thread ctechnodev at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82943

--- Comment #13 from Alexander Westbrooks  ---
I sent in the patch to those emails. Hopefully now the ball will start rolling
and I can slowly get this packaged into a legitimate fix. I'll post updates
here as I receive them.

The patch is below, if you would like to try it. I did this in the GCC 14 code.



diff --git a/gcc/fortran/decl.cc b/gcc/fortran/decl.cc
index d09c8bc97d9..9043a4d427f 100644
--- a/gcc/fortran/decl.cc
+++ b/gcc/fortran/decl.cc
@@ -4063,6 +4063,21 @@ gfc_get_pdt_instance (gfc_actual_arglist *param_list,
gfc_symbol **sym,
  continue;
}

+  /* 
+Addressing PR82943, this will fix the issue where a function/subroutine is
declared as not
+a member of the PDT instance. The reason for this is because the PDT
instance did not have
+access to its template's f2k_derived namespace in order to find the
typebound procedures.
+
+The number of references to the PDT template's f2k_derived will ensure
that f2k_derived is 
+properly freed later on.
+  */
+
+  if (!instance->f2k_derived && pdt->f2k_derived)
+  {
+instance->f2k_derived = pdt->f2k_derived;
+instance->f2k_derived->refs++;
+  }
+
   /* Set the component kind using the parameterized expression.  */
   if ((c1->ts.kind == 0 || c1->ts.type == BT_CHARACTER)
   && c1->kind_expr != NULL)
diff --git a/gcc/fortran/gfortran.h b/gcc/fortran/gfortran.h
index a58c60e9828..6854edb3467 100644
--- a/gcc/fortran/gfortran.h
+++ b/gcc/fortran/gfortran.h
@@ -3536,6 +3536,7 @@ void gfc_traverse_gsymbol (gfc_gsymbol *, void
(*)(gfc_gsymbol *, void *), void
 gfc_typebound_proc* gfc_get_typebound_proc (gfc_typebound_proc*);
 gfc_symbol* gfc_get_derived_super_type (gfc_symbol*);
 bool gfc_type_is_extension_of (gfc_symbol *, gfc_symbol *);
+bool gfc_pdt_is_instance_of(gfc_symbol *, gfc_symbol *);
 bool gfc_type_compatible (gfc_typespec *, gfc_typespec *);

 void gfc_copy_formal_args_intr (gfc_symbol *, gfc_intrinsic_sym *,
diff --git a/gcc/fortran/resolve.cc b/gcc/fortran/resolve.cc
index 50b49d0cb83..6af55760321 100644
--- a/gcc/fortran/resolve.cc
+++ b/gcc/fortran/resolve.cc
@@ -14705,14 +14705,34 @@ resolve_typebound_procedure (gfc_symtree* stree)
  goto error;
}

-  if (CLASS_DATA (me_arg)->ts.u.derived
- != resolve_bindings_derived)
-   {
- gfc_error ("Argument %qs of %qs with PASS(%s) at %L must be of"
-" the derived-type %qs", me_arg->name, proc->name,
-me_arg->name, , resolve_bindings_derived->name);
- goto error;
-   }
+  /* The derived type is not a PDT template. Resolve as usual */
+  if ( !resolve_bindings_derived->attr.pdt_template && 
+(CLASS_DATA (me_arg)->ts.u.derived != resolve_bindings_derived))
+  {
+gfc_error ("Argument %qs of %qs with PASS(%s) at %L must be of"
+" the derived-type %qs", me_arg->name, proc->name,
+me_arg->name, , resolve_bindings_derived->name);
+goto error;
+  }
+  
+  if ( resolve_bindings_derived->attr.pdt_template && 
+!gfc_pdt_is_instance_of(resolve_bindings_derived,
CLASS_DATA(me_arg)->ts.u.derived) )
+  {
+gfc_error ("Argument %qs of %qs with PASS(%s) at %L must be of"
+  " the parametric derived-type %qs", me_arg->name, proc->name,
+  me_arg->name, , resolve_bindings_derived->name);
+goto error;
+  }
+
+  if ( resolve_bindings_derived->attr.pdt_template 
+&& gfc_pdt_is_instance_of(resolve_bindings_derived,
CLASS_DATA(me_arg)->ts.u.derived)
+&& (me_arg->param_list != NULL)
+&& (gfc_spec_list_type(me_arg->param_list,
CLASS_DATA(me_arg)->ts.u.derived) != SPEC_ASSUMED))
+  {
+gfc_error ("All LEN type parameters of the passed dummy argument %qs of
%qs"
+" at %L must be ASSUMED.", me_arg->name, proc->name, );
+goto error;
+  }

   gcc_assert (me_arg->ts.type == BT_CLASS);
   if (CLASS_DATA (me_arg)->as && CLASS_DATA (me_arg)->as->rank != 0)
diff --git a/gcc/fortran/symbol.cc b/gcc/fortran/symbol.cc
index 37a9e8fa0ae..77f84de0989 100644
--- a/gcc/fortran/symbol.cc
+++ b/gcc/fortran/symbol.cc
@@ -5134,6 +5134,35 @@ gfc_type_is_extension_of (gfc_symbol *t1, gfc_symbol
*t2)
   return gfc_compare_derived_types (t1, t2);
 }

+/* Check if a parameterized derived type t2 is an instance of a PDT template
t1 */
+
+bool
+gfc_pdt_is_instance_of(gfc_symbol *t1, gfc_symbol *t2)
+{
+  if ( !t1->attr.pdt_template || !t2->attr.pdt_type )
+return false;
+
+  /* 
+in decl.cc, gfc_get_pdt_instance, a pdt instance is given a 3 character
prefix "Pdt", followed 
+by an underscore list of the kind parameters, up to a maximum of 8. 
+
+So to check if a PDT Type corresponds to the template, extract the core
derive_type name,
+and then see if it is type compatible by name...
+
+For example:
+
+Pdtf_2_2 -> extract out the 'f' -> see if the derived type 'f' is
compatible with symbol t1
+  */
+
+  // Starting at index 3 of 

PR82943 - Suggested patch to fix

2023-06-24 Thread Alexander Westbrooks via Gcc-patches
Hello,

I am new to the GFortran community. Over the past two weeks I created a
patch that should fix PR82943 for GFortran. I have attached it to this
email. The patch allows the code below to compile successfully. I am
working on creating test cases next, but I am new to the process so it may
take me some time. After I make test cases, do I email them to you as well?
Do I need to make a pull-request on github in order to get the patch
reviewed?

Thank you,

Alexander Westbrooks

module testmod

public :: foo

type, public :: tough_lvl_0(a, b)
integer, kind :: a = 1
integer, len :: b
contains
procedure :: foo
end type

type, public, EXTENDS(tough_lvl_0) :: tough_lvl_1 (c)
integer, len :: c
contains
procedure :: bar
end type

type, public, EXTENDS(tough_lvl_1) :: tough_lvl_2 (d)
integer, len :: d
contains
procedure :: foobar
end type

contains
subroutine foo(this)
class(tough_lvl_0(1,*)), intent(inout) :: this
end subroutine

subroutine bar(this)
class(tough_lvl_1(1,*,*)), intent(inout) :: this
end subroutine

subroutine foobar(this)
class(tough_lvl_2(1,*,*,*)), intent(inout) :: this
end subroutine

end module

PROGRAM testprogram
USE testmod

TYPE(tough_lvl_0(1,5)) :: test_pdt_0
TYPE(tough_lvl_1(1,5,6))   :: test_pdt_1
TYPE(tough_lvl_2(1,5,6,7)) :: test_pdt_2

CALL test_pdt_0%foo()

CALL test_pdt_1%foo()
CALL test_pdt_1%bar()

CALL test_pdt_2%foo()
CALL test_pdt_2%bar()
CALL test_pdt_2%foobar()


END PROGRAM testprogram


0001-bug-patch-PR82943.patch
Description: Binary data


[x86_64 PATCH] Handle SUBREG conversions in TImode STV (for ptest).

2023-06-24 Thread Roger Sayle

This patch teaches i386's STV pass how to handle SUBREG conversions,
i.e. that a TImode SUBREG can be transformed into a V1TImode SUBREG,
without worrying about other DEFs and USEs.

A motivating example where this is useful is

typedef long long __m128i __attribute__ ((__vector_size__ (16)));
int foo (__m128i x, __m128i y) {
  return (__int128)x == (__int128)y;
}

where with -O2 -msse4 we can now scalar-to-vector transform:

(insn 7 4 8 2 (set (reg:CCZ 17 flags)
(compare:CCZ (subreg:TI (reg/v:V2DI 86 [ x ]) 0)
(subreg:TI (reg/v:V2DI 87 [ y ]) 0))) {*cmpti_doubleword}

into

(insn 17 4 7 2 (set (reg:V1TI 91)
(xor:V1TI (subreg:V1TI (reg/v:V2DI 86 [ x ]) 0)
(subreg:V1TI (reg/v:V2DI 87 [ y ]) 0)))
 (nil))
(insn 7 17 8 2 (set (reg:CCZ 17 flags)
(unspec:CCZ [
(reg:V1TI 91) repeated x2
] UNSPEC_PTEST)) {*sse4_1_ptestv1ti}
 (expr_list:REG_DEAD (reg/v:V2DI 87 [ y ])
(expr_list:REG_DEAD (reg/v:V2DI 86 [ x ])
(nil

with the dramatic effect that the assembly output before:

foo:movaps  %xmm0, -40(%rsp)
movq-32(%rsp), %rdx
movq%xmm0, %rax
movq%xmm1, %rsi
movaps  %xmm1, -24(%rsp)
movq-16(%rsp), %rcx
xorq%rsi, %rax
xorq%rcx, %rdx
orq %rdx, %rax
sete%al
movzbl  %al, %eax
ret

now becomes

foo:pxor%xmm1, %xmm0
xorl%eax, %eax
ptest   %xmm0, %xmm0
sete%al
ret

i.e. a 128-bit vector doesn't need to be transferred to the
scalar unit to be tested for equality.  The new test case includes
additional related examples that show similar improvements.
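
As a rough illustration of why pxor followed by ptest is enough, the
comparison can be written as "xor, then test for zero" at the source
level.  This is only a sketch: the helper names are hypothetical, and it
relies on the GCC/Clang vector extension rather than on anything the
patch adds:

```cpp
#include <cassert>

// Same vector type as in the motivating example (GCC/Clang extension).
typedef long long v2di __attribute__ ((__vector_size__ (16)));

// Hypothetical helper: 128-bit equality as "xor, then test for zero",
// mirroring the pxor + ptest sequence in the new assembly.
inline bool equal_via_xor (v2di x, v2di y)
{
  v2di diff = x ^ y;   // all-zero iff every bit of x matches y
  return (diff[0] | diff[1]) == 0;
}

// Reference implementation: compare the two 64-bit halves directly.
inline bool equal_elementwise (v2di x, v2di y)
{
  return x[0] == y[0] && x[1] == y[1];
}
```

Both helpers should return the same answer for all inputs; the xor form
just never needs to look at the two halves as separate scalars.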

Previously we explicitly checked *cmpti_doubleword operands to be
either immediate constants, or a TImode REG or a TImode MEM.  By
enhancing this to allow a TImode SUBREG, we now handle everything
that would match the general_operand predicate, making this part
of STV more like other RTL passes (lra/reload).  The big change is
that unlike a regular DF USE, a SUBREG USE doesn't require us to
analyze and convert the rest of the DEF-USE chain.

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?


2023-06-24  Roger Sayle  

gcc/ChangeLog
* config/i386/i386-features.cc (scalar_chain:add_insn): Don't
call analyze_register_chain if the USE is a SUBREG.
(timode_scalar_chain::convert_op): Call gen_lowpart to convert
TImode SUBREGs to V1TImode SUBREGs.
(convertible_comparison_p): We can now handle all general_operands
of *cmp_doubleword.
(timode_remove_non_convertible_regs): We only need to check TImode
uses that aren't TImode SUBREGs of registers in other modes.

gcc/testsuite/ChangeLog
* gcc.target/i386/sse4_1-ptest-7.c: New test case.


Thanks in advance,
Roger
--

diff --git a/gcc/config/i386/i386-features.cc b/gcc/config/i386/i386-features.cc
index 4a3b07a..6e9ba54 100644
--- a/gcc/config/i386/i386-features.cc
+++ b/gcc/config/i386/i386-features.cc
@@ -449,7 +449,8 @@ scalar_chain::add_insn (bitmap candidates, unsigned int 
insn_uid,
 return true;
 
   for (ref = DF_INSN_UID_USES (insn_uid); ref; ref = DF_REF_NEXT_LOC (ref))
-if (!DF_REF_REG_MEM_P (ref))
+if (DF_REF_TYPE (ref) == DF_REF_REG_USE
+   && !SUBREG_P (DF_REF_REG (ref)))
   if (!analyze_register_chain (candidates, ref, disallowed))
return false;
 
@@ -1621,7 +1622,8 @@ timode_scalar_chain::convert_op (rtx *op, rtx_insn *insn)
   else
 {
   gcc_assert (SUBREG_P (*op));
-  gcc_assert (GET_MODE (*op) == vmode);
+  if (GET_MODE (*op) != V1TImode)
+   *op = gen_lowpart (V1TImode, *op);
 }
 }
 
@@ -1912,12 +1914,8 @@ convertible_comparison_p (rtx_insn *insn, enum 
machine_mode mode)
   rtx op2 = XEXP (src, 1);
 
   /* *cmp_doubleword.  */
-  if ((CONST_SCALAR_INT_P (op1)
-   || ((REG_P (op1) || MEM_P (op1))
-  && GET_MODE (op1) == mode))
-  && (CONST_SCALAR_INT_P (op2)
- || ((REG_P (op2) || MEM_P (op2))
- && GET_MODE (op2) == mode)))
+  if (general_operand (op1, mode)
+  && general_operand (op2, mode))
 return true;
 
   /* *testti_doubleword.  */
@@ -2244,8 +2242,9 @@ timode_remove_non_convertible_regs (bitmap candidates)
   DF_REF_REGNO (ref));
 
FOR_EACH_INSN_USE (ref, insn)
- if (!DF_REF_REG_MEM_P (ref)
- && GET_MODE (DF_REF_REG (ref)) == TImode)
+ if (DF_REF_TYPE (ref) == DF_REF_REG_USE
+ && GET_MODE (DF_REF_REG (ref)) == TImode
+ && !SUBREG_P (DF_REF_REG (ref)))
timode_check_non_convertible_regs (candidates, regs,
   DF_REF_REGNO (ref));
   }
diff --git a/gcc/testsuite/gcc.target/i386/sse4_1-ptest-7.c 

Re: [PATCH] RISC-V: Split VF iterators for Zvfh(min).

2023-06-24 Thread Jeff Law via Gcc-patches




On 6/22/23 07:03, Robin Dapp wrote:

Hi,

when working on FP widening/narrowing I realized the Zvfhmin handling
is not ideal right now:  We use the "enabled" insn attribute to disable
instructions not available with Zvfhmin (but only with Zvfh).

However, "enabled == 0" only disables insn alternatives, in our case all
of them when the mode is a HFmode.  The insn itself remains available
(e.g. for combine to match) and we end up with an insn without alternatives
that reload cannot handle --> ICE.

The proper solution is to disable the instruction for the respective
mode altogether.  This patch achieves this by splitting the VF as well
as VWEXTF iterators into variants with TARGET_ZVFH and
TARGET_VECTOR_ELEN_FP_16 (which is true when either TARGET_ZVFH or
TARGET_ZVFHMIN are true).  Also, VWCONVERTI, VHF and VHF_LMUL1 need
adjustments.

Regards
  Robin

gcc/ChangeLog:

* config/riscv/autovec.md: VF_AUTO -> VF.
* config/riscv/vector-iterators.md: Introduce VF_ZVFHMIN,
VWEXTF_ZVFHMIN and use TARGET_ZVFH in VWCONVERTI, VHF and
VHF_LMUL1.
* config/riscv/vector.md: Use new iterators.

OK for the trunk.  Thanks for walking everyone through the issues here.

jeff


[Bug c++/110394] Lambda capture receives wrong value

2023-06-24 Thread xry111 at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110394

Xi Ruoyao  changed:

   What|Removed |Added

 CC||xry111 at gcc dot gnu.org

--- Comment #4 from Xi Ruoyao  ---
(In reply to Andrew Pinski from comment #3)
> You can also try -fno-lifetime-dse to see if you get the behavior you were
> expecting too. Though I am not sure it will help extend the lifetime of the
> temporary here ...
> 
> 
> https://gcc.gnu.org/onlinedocs/gcc-13.1.0/gcc/Optimize-Options.html#index-
> flifetime-dse

-fstack-reuse=named_vars may be needed as well.  -fno-lifetime-dse preserves the
stores outside of the lifetime, and -fstack-reuse=named_vars disallows reusing
the stack space of the temporary object for other objects.

[Bug c++/110394] Lambda capture receives wrong value

2023-06-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110394

--- Comment #3 from Andrew Pinski  ---
You can also try -fno-lifetime-dse to see if you get the behavior you were
expecting too. Though I am not sure it will help extend the lifetime of the
temporary here ...


https://gcc.gnu.org/onlinedocs/gcc-13.1.0/gcc/Optimize-Options.html#index-flifetime-dse

[Bug c++/110394] Lambda capture receives wrong value

2023-06-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110394

Andrew Pinski  changed:

   What|Removed |Added

 Status|UNCONFIRMED |WAITING
   Last reconfirmed||2023-06-24
 Ever confirmed|0   |1

--- Comment #2 from Andrew Pinski  ---
I almost think this is a bug in your code.
Take:

  auto wait_handle = tc::g_postbox->wait(
"UpdateInputs"sv, [=](const msgpack::object& obj) -> bool {

  });


The temporary for tc::postbox::acceptor_type will end its lifetime at the end
of that statement but tc::g_postbox->wait stores it off into m_awaiters.

And then it gets popped off with:
  wait_handle.await();


You can fix this by extending the lifetime of the temporary:
```
  tc::postbox::acceptor_type t = [=](const msgpack::object& obj) -> bool {
auto [rcv_index, rcv_value] = obj.as>();
tc::tracef(M64MSG_VERBOSE, "index = {}", index);
if (rcv_index != index)
  return false;

keys->Value = rcv_value;
return true;
  };
  auto wait_handle = tc::g_postbox->wait(
"UpdateInputs"sv, t);

```


Note `-fsanitize=address` should catch this at runtime too.
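
A minimal, self-contained sketch of the lifetime issue described above.
It assumes the postbox keeps the address of the acceptor rather than a
copy; the real tc::postbox is not shown in this report, so every name
below is a hypothetical stand-in:

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical stand-in for tc::postbox::acceptor_type.
using acceptor_type = std::function<bool (int)>;

struct postbox_sketch
{
  // Assumption: only the address is stored, so the caller has to keep
  // the acceptor alive until await_one () has run.
  void wait (const acceptor_type &acc) { m_awaiters.push_back (&acc); }

  // Later: pop the most recently registered acceptor and call it.
  bool await_one (int value)
  {
    const acceptor_type *acc = m_awaiters.back ();
    m_awaiters.pop_back ();
    return (*acc) (value);
  }

  std::vector<const acceptor_type *> m_awaiters;
};

inline bool run_with_named_acceptor (int index, int received)
{
  postbox_sketch box;
  // Named object: it lives until the end of this function, so the
  // stored pointer is still valid when await_one () dereferences it.
  acceptor_type t = [=] (int rcv_index) { return rcv_index == index; };
  box.wait (t);
  // Passing the lambda directly, as in
  //   box.wait ([=] (int i) { return i == index; });
  // would create a temporary std::function that is destroyed at the end
  // of that statement, leaving a dangling pointer in m_awaiters.
  return box.await_one (received);
}
```

With the named object, the captured index keeps its value; with the
temporary, the capture would be read from dead storage, which would be
consistent with seeing an unrelated value such as 23.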

[Bug gcov-profile/110395] New: GCOV stuck in an infinite loop with large std::array

2023-06-24 Thread carlosgalvezp at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110395

Bug ID: 110395
   Summary: GCOV stuck in an infinite loop with large std::array
   Product: gcc
   Version: 9.4.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: gcov-profile
  Assignee: unassigned at gcc dot gnu.org
  Reporter: carlosgalvezp at gmail dot com
CC: marxin at gcc dot gnu.org
  Target Milestone: ---

Hi!

We are bumping from GCC 7.5.0 to GCC 9.4.0 (Ubuntu 20.04) and observe that GCOV
is stuck when analyzing the following minimal repro code:

#include 
#include 

template 
class StaticVector
{
 public:
StaticVector() = default;
void foo(){}

 private:
std::array data{};
};

class Foo
{
StaticVector, 4> data_{};
};

int main()
{
Foo f;
return 0;
}


$ g++ --coverage main.cpp
$ ./a.out
$ gcov main.cpp

The problem goes away if I remove the value initialization for std::array in
the StaticVector class (i.e. I leave the member "data" uninitialized).

The same problem happens also on GCC 11 

What might be the reason for this? 

Thanks!

[Bug c++/110394] Lambda capture receives wrong value

2023-06-24 Thread jackyguo18 at hotmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110394

--- Comment #1 from jackyguo18 at hotmail dot com ---
Created attachment 55396
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=55396=edit
.ii file which triggers the bug

I couldn't attach the original .ii file, so I had to compress it under gzip.

[Bug other/110394] New: Lambda capture receives wrong value

2023-06-24 Thread jackyguo18 at hotmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110394

Bug ID: 110394
   Summary: Lambda capture receives wrong value
   Product: gcc
   Version: 13.1.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: other
  Assignee: unassigned at gcc dot gnu.org
  Reporter: jackyguo18 at hotmail dot com
  Target Milestone: ---

Note that this doesn't occur in Clang, and to my knowledge, disabling strict
aliasing and overflow would make no difference.

The code submitted here is actually part of a larger library. When I go to
debug it, a lambda in `GetKeys(int index, BUTTONS* keys)` captures the wrong
value for `index`--it should be 0, but it's 23.

Changing the capture type from value to reference causes the lambda to
inexplicably call the address 0x17 (decimal 23).

[Bug tree-optimization/110389] [12/13/14 Regression] wrong code at -Os and -O2 with "-fno-tree-ch -fno-expensive-optimizations -fno-ivopts -fno-tree-loop-ivcanon" on x86_64-linux-gnu

2023-06-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110389

Andrew Pinski  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Target Milestone|--- |12.4
 Ever confirmed|0   |1
Summary|wrong code at -Os and -O2   |[12/13/14 Regression] wrong
   |with "-fno-tree-ch  |code at -Os and -O2 with
   |-fno-expensive-optimization |"-fno-tree-ch
   |s -fno-ivopts   |-fno-expensive-optimization
   |-fno-tree-loop-ivcanon" on  |s -fno-ivopts
   |x86_64-linux-gnu|-fno-tree-loop-ivcanon" on
   ||x86_64-linux-gnu
   Last reconfirmed||2023-06-24

--- Comment #1 from Andrew Pinski  ---
Something goes really wrong in DOM3.

  _7 = e.5_26 + 1;
  if (_7 <= 2)
goto ; [89.57%]
  else
goto ; [10.43%]

is optimized to always true.

[Bug rtl-optimization/110391] [12/13/14 Regression] wrong code at -O2 and -O3 with "-fsel-sched-pipelining -fselective-scheduling2" on x86_64-linux-gnu

2023-06-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110391

Andrew Pinski  changed:

   What|Removed |Added

Summary|wrong code at -O2 and -O3   |[12/13/14 Regression] wrong
   |with|code at -O2 and -O3 with
   |"-fsel-sched-pipelining |"-fsel-sched-pipelining
   |-fselective-scheduling2" on |-fselective-scheduling2" on
   |x86_64-linux-gnu|x86_64-linux-gnu
Version|unknown |14.0
   See Also||https://gcc.gnu.org/bugzill
   ||a/show_bug.cgi?id=95123
   Target Milestone|--- |12.4

[Bug tree-optimization/110392] [13/14 Regression] ICE at -O3 with "-O3 -Wall -fno-tree-vrp -fno-tree-dominator-opts -fno-tree-copy-prop -fno-tree-fre -fno-tree-ccp -fno-tree-forwprop": in find_var_cmp

2023-06-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110392

Andrew Pinski  changed:

   What|Removed |Added

 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2023-06-24

--- Comment #1 from Andrew Pinski  ---
Confirmed.

[Bug tree-optimization/110392] ICE at -O3 with "-O3 -Wall -fno-tree-vrp -fno-tree-dominator-opts -fno-tree-copy-prop -fno-tree-fre -fno-tree-ccp -fno-tree-forwprop": in find_var_cmp_const, at gimple-p

2023-06-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110392

Andrew Pinski  changed:

   What|Removed |Added

Summary|ICE at -O3 with "-w -O3 |ICE at -O3 with "-O3 -Wall
   |-Wall -fno-tree-vrp |-fno-tree-vrp
   |-fno-tree-dominator-opts|-fno-tree-dominator-opts
   |-fno-tree-copy-prop |-fno-tree-copy-prop
   |-fno-tree-fre -fno-tree-ccp |-fno-tree-fre -fno-tree-ccp
   |-fno-tree-forwprop": in |-fno-tree-forwprop": in
   |find_var_cmp_const, at  |find_var_cmp_const, at
   |gimple-predicate-analysis.c |gimple-predicate-analysis.c
   |c:257   |c:257
Version|unknown |14.0
   Target Milestone|--- |13.2
   Keywords||ice-on-valid-code

Re: [PATCH v7 0/6] c++, libstdc++: get std::is_object to dispatch to new built-in traits

2023-06-24 Thread Ken Matsui via Gcc-patches
On Tue, Jun 20, 2023 at 8:32 AM Patrick Palka  wrote:
>
> On Thu, 15 Jun 2023, Ken Matsui via Libstdc++ wrote:
>
> > Hi,
> >
> > For those curious about the performance improvements of this patch, I
> > conducted a benchmark that instantiates 256k specializations of
> > is_object_v based on Patrick's code. You can find the benchmark code
> > at this link:
> >
> > https://github.com/ken-matsui/gcc-benches/blob/main/is_object_benchmark.cc
> >
> > On my computer, using the gcc HEAD of this patch for a release build,
> > the patch with -DUSE_BUILTIN took 64% less time and used 44-47% less
> > memory compared to not using it.
>
> That's more like it :D  Though the benchmark should also invoke the
> trait on non-object types too, e.g. Instantiator& or Instantiator(int).

Here is the updated benchmark:

https://github.com/ken-matsui/gcc-benches/blob/main/is_object.md#sat-jun-24-080110-am-pdt-2023

Time: -74.7544%
Peak Memory Usage: -62.5913%
Total Memory Usage: -64.2708%

> >
> > Sincerely,
> > Ken Matsui
> >
> > On Mon, Jun 12, 2023 at 3:49 PM Ken Matsui  
> > wrote:
> > >
> > > Hi,
> > >
> > > This patch series gets std::is_object to dispatch to built-in traits and
> > > > implements the following built-in traits, on which std::is_object depends.
> > >
> > > * __is_reference
> > > * __is_function
> > > * __is_void
> > >
> > > std::is_object was depending on them with disjunction and negation.
> > >
> > > > __not_<__or_<is_function<_Tp>, is_reference<_Tp>, is_void<_Tp>>>::type
> > >
> > > Therefore, this patch uses them directly instead of implementing an 
> > > additional
> > > built-in trait __is_object, which makes the compiler slightly bigger and
> > > slower.
> > >
> > > > __bool_constant<!(__is_function(_Tp) || __is_reference(_Tp) || __is_void(_Tp))>
> > >
> > > > This would instantiate only __bool_constant<true> and
> > > > __bool_constant<false>,
> > > which can be mostly shared. That is, the purpose of built-in traits is
> > > considered as achieved.
> > >
> > > Changes in v7
> > >
> > > * Removed an unnecessary new line.
> > >
> > > Ken Matsui (6):
> > >   c++: implement __is_reference built-in trait
> > >   libstdc++: use new built-in trait __is_reference for std::is_reference
> > >   c++: implement __is_function built-in trait
> > >   libstdc++: use new built-in trait __is_function for std::is_function
> > >   c++, libstdc++: implement __is_void built-in trait
> > >   libstdc++: make std::is_object dispatch to new built-in traits
> > >
> > >  gcc/cp/constraint.cc  |  9 +++
> > >  gcc/cp/cp-trait.def   |  3 +
> > >  gcc/cp/semantics.cc   | 12 
> > >  gcc/testsuite/g++.dg/ext/has-builtin-1.C  |  9 +++
> > >  gcc/testsuite/g++.dg/ext/is_function.C| 58 +++
> > >  gcc/testsuite/g++.dg/ext/is_reference.C   | 34 +++
> > >  gcc/testsuite/g++.dg/ext/is_void.C| 35 +++
> > >  gcc/testsuite/g++.dg/tm/pr46567.C |  6 +-
> > >  libstdc++-v3/include/bits/cpp_type_traits.h   | 15 -
> > >  libstdc++-v3/include/debug/helper_functions.h |  5 +-
> > >  libstdc++-v3/include/std/type_traits  | 51 
> > >  11 files changed, 216 insertions(+), 21 deletions(-)
> > >  create mode 100644 gcc/testsuite/g++.dg/ext/is_function.C
> > >  create mode 100644 gcc/testsuite/g++.dg/ext/is_reference.C
> > >  create mode 100644 gcc/testsuite/g++.dg/ext/is_void.C
> > >
> > > --
> > > 2.41.0
> > >
> >
> >
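
The dispatch described in the quoted cover letter can be sketched in isolation as follows. This is only an illustration under stated assumptions: the names sketch_is_object and SKETCH_IS_* are hypothetical and do not appear in the patch series, and each built-in is used only when __has_builtin reports it, with the standard library trait as the fallback.

```cpp
#include <type_traits>

// Older compilers may not provide __has_builtin at all.
#ifndef __has_builtin
# define __has_builtin(x) 0
#endif

// Hypothetical helper macros: use a built-in when the compiler reports it,
// otherwise fall back to the library trait.
#if __has_builtin(__is_function)
# define SKETCH_IS_FUNCTION(T) __is_function(T)
#else
# define SKETCH_IS_FUNCTION(T) std::is_function<T>::value
#endif

#if __has_builtin(__is_reference)
# define SKETCH_IS_REFERENCE(T) __is_reference(T)
#else
# define SKETCH_IS_REFERENCE(T) std::is_reference<T>::value
#endif

#if __has_builtin(__is_void)
# define SKETCH_IS_VOID(T) __is_void(T)
#else
# define SKETCH_IS_VOID(T) std::is_void<T>::value
#endif

// A type is an object type when it is not a function, a reference, or void.
// The whole condition is a single boolean expression, so only the true and
// false specializations of integral_constant<bool, ...> are ever named.
template<typename T>
  struct sketch_is_object
  : std::integral_constant<bool,
      !(SKETCH_IS_FUNCTION(T) || SKETCH_IS_REFERENCE(T) || SKETCH_IS_VOID(T))>
  { };
```

With this shape, sketch_is_object<int&> and sketch_is_object<void> both derive from the same false specialization, which is the sharing the cover letter refers to.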


Re: [PATCH] RISCV: Add -m(no)-omit-leaf-frame-pointer support.

2023-06-24 Thread Jeff Law via Gcc-patches




On 6/21/23 02:14, Wang, Yanzhang wrote:

Hi Jeff, sorry for the late reply.


The long branch handling is done at the assembler level.  So the clobbering
of $ra isn't visible to the compiler.  Thus the compiler has to be
extremely careful to not hold values in $ra because the assembler may
clobber $ra.


If the assembler can change how $ra is used, it seems the rules we defined in
riscv.cc will be ignored. For example, the $ra save generated by this patch
may be modified by the assembler, and everything that depends on it will be
wrong. So implementing the long jump in the compiler is better.
Basically correct.  The assembler potentially clobbers $ra.  That's why 
in the long jump patches $ra becomes a fixed register -- the compiler 
doesn't know when it's clobbered by the assembler.


Even if this were done in the compiler, we'd still have to do something 
special with $ra.  The point at which decisions about register 
allocation and such are made is before the point where we know the final 
positions of jumps/labels.  It's a classic problem in GCC's design.




If you're not going to use dwarf, then my recommendation is to ensure that
the data you need is *always* available in the stack at known
offsets.   That will mean your code isn't optimized as well.  It means
hand written assembly code has to follow the conventions, you can't link
against libraries that do not follow those conventions, etc etc.  But
that's the price you pay for not using dwarf (or presumably ORC/SFRAME
which I haven't studied in detail).


Yes. That's right. All the libraries need to follow the same logic. But as
you said, this is the price if we choose this solution. And fortunately,
this will only be used in special scenarios.
The key point is you want the location of the return pointer to be 
consistent in every function and you want to know that every function 
has a frame pointer.


Otherwise you end up having to either consult on-the-side tables (at 
which point you might as well look at ORC/SFRAME) or disassembling code 
in the executable to deduce where to find fp, ra, etc (which is a path 
to madness).


Thus for the usage scenario you're looking at, I would recommend always 
having a frame pointer, every function, no matter how trivial and that 
$ra always be saved into a suitable slot relative to the frame pointer, 
again, no matter how trivial the function.
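
As a rough illustration of why a consistent slot for the frame pointer and $ra matters, here is a minimal sketch of a table-free stack walk. It assumes a hypothetical frame record (SketchFrameRecord) that every function stores at a known place; the struct layout and function names are invented for this example and are not the RISC-V psABI layout or anything from the patch.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical fixed record that every function is assumed to keep at a
// known offset from its frame pointer.
struct SketchFrameRecord {
  const SketchFrameRecord* caller_fp;  // saved frame pointer of the caller
  std::uintptr_t saved_ra;             // saved return address ($ra)
};

// Count frames by following saved frame pointers; no unwind tables and no
// disassembly are needed because every frame has the same shape.
constexpr std::size_t sketch_frame_depth(const SketchFrameRecord* fp) {
  std::size_t depth = 0;
  while (fp != nullptr) {
    ++depth;
    fp = fp->caller_fp;
  }
  return depth;
}

// Return the saved $ra of the n-th frame (0 = innermost), or 0 when the
// chain is shorter than n + 1 frames.
constexpr std::uintptr_t sketch_return_address(const SketchFrameRecord* fp,
                                               std::size_t n) {
  while (fp != nullptr && n > 0) {
    fp = fp->caller_fp;
    --n;
  }
  return fp != nullptr ? fp->saved_ra : 0;
}
```

If even one function in the chain omitted the record or stored $ra elsewhere, the loop above would follow a bogus pointer, which is why the recommendation applies to every function, however trivial.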




And Jeff, do you have any other comments about this patch? Should we add
some descriptions somewhere in the doc?
We may need to adjust the documentation a bit since I think I'm 
suggesting slight changes in the behavior of existing -m options.


I'd like to see an updated patch before commenting further on 
implementation details.


jeff


Re: [PATCH V1] RISC-V:Add float16 tuple type abi

2023-06-24 Thread Jeff Law via Gcc-patches




On 6/21/23 01:46, juzhe.zh...@rivai.ai wrote:

LGTM. Thanks.

OK from me as well.
jeff


Re: [PATCH v2 1/2] c++: implement __is_volatile built-in trait

2023-06-24 Thread Ken Matsui via Gcc-patches
Here is the benchmark result for is_volatile:

https://github.com/ken-matsui/gcc-benches/blob/main/is_volatile.md#sat-jun-24-074036-am-pdt-2023

Time: -2.42335%
Peak Memory Usage: -1.07651%
Total Memory Usage: -1.62369%

On Sat, Jun 24, 2023 at 7:24 AM Ken Matsui  wrote:
>
> This patch implements built-in trait for std::is_volatile.
>
> gcc/cp/ChangeLog:
>
> * cp-trait.def: Define __is_volatile.
> * constraint.cc (diagnose_trait_expr): Handle CPTK_IS_VOLATILE.
> * semantics.cc (trait_expr_value): Likewise.
> (finish_trait_expr): Likewise.
>
> gcc/testsuite/ChangeLog:
>
> * g++.dg/ext/has-builtin-1.C: Test existence of __is_volatile.
> * g++.dg/ext/is_volatile.C: New test.
>
> Signed-off-by: Ken Matsui 
> ---
>  gcc/cp/constraint.cc |  3 +++
>  gcc/cp/cp-trait.def  |  1 +
>  gcc/cp/semantics.cc  |  4 
>  gcc/testsuite/g++.dg/ext/has-builtin-1.C |  3 +++
>  gcc/testsuite/g++.dg/ext/is_volatile.C   | 19 +++
>  5 files changed, 30 insertions(+)
>  create mode 100644 gcc/testsuite/g++.dg/ext/is_volatile.C
>
> diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
> index 8cf0f2d0974..e971d67ee25 100644
> --- a/gcc/cp/constraint.cc
> +++ b/gcc/cp/constraint.cc
> @@ -3751,6 +3751,9 @@ diagnose_trait_expr (tree expr, tree args)
>  case CPTK_IS_UNION:
>inform (loc, "  %qT is not a union", t1);
>break;
> +case CPTK_IS_VOLATILE:
> +  inform (loc, "  %qT is not a volatile type", t1);
> +  break;
>  case CPTK_IS_AGGREGATE:
>inform (loc, "  %qT is not an aggregate", t1);
>break;
> diff --git a/gcc/cp/cp-trait.def b/gcc/cp/cp-trait.def
> index 8b7fece0cc8..414b1065a11 100644
> --- a/gcc/cp/cp-trait.def
> +++ b/gcc/cp/cp-trait.def
> @@ -82,6 +82,7 @@ DEFTRAIT_EXPR (IS_TRIVIALLY_ASSIGNABLE, 
> "__is_trivially_assignable", 2)
>  DEFTRAIT_EXPR (IS_TRIVIALLY_CONSTRUCTIBLE, "__is_trivially_constructible", 
> -1)
>  DEFTRAIT_EXPR (IS_TRIVIALLY_COPYABLE, "__is_trivially_copyable", 1)
>  DEFTRAIT_EXPR (IS_UNION, "__is_union", 1)
> +DEFTRAIT_EXPR (IS_VOLATILE, "__is_volatile", 1)
>  DEFTRAIT_EXPR (REF_CONSTRUCTS_FROM_TEMPORARY, 
> "__reference_constructs_from_temporary", 2)
>  DEFTRAIT_EXPR (REF_CONVERTS_FROM_TEMPORARY, 
> "__reference_converts_from_temporary", 2)
>  /* FIXME Added space to avoid direct usage in GCC 13.  */
> diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
> index 8fb47fd179e..10934d01504 100644
> --- a/gcc/cp/semantics.cc
> +++ b/gcc/cp/semantics.cc
> @@ -12079,6 +12079,9 @@ trait_expr_value (cp_trait_kind kind, tree type1, 
> tree type2)
>  case CPTK_IS_ENUM:
>return type_code1 == ENUMERAL_TYPE;
>
> +case CPTK_IS_VOLATILE:
> +  return CP_TYPE_VOLATILE_P (type1);
> +
>  case CPTK_IS_FINAL:
>return CLASS_TYPE_P (type1) && CLASSTYPE_FINAL (type1);
>
> @@ -12296,6 +12299,7 @@ finish_trait_expr (location_t loc, cp_trait_kind 
> kind, tree type1, tree type2)
>  case CPTK_IS_ENUM:
>  case CPTK_IS_UNION:
>  case CPTK_IS_SAME:
> +case CPTK_IS_VOLATILE:
>break;
>
>  case CPTK_IS_LAYOUT_COMPATIBLE:
> diff --git a/gcc/testsuite/g++.dg/ext/has-builtin-1.C 
> b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
> index f343e153e56..7ad640f141b 100644
> --- a/gcc/testsuite/g++.dg/ext/has-builtin-1.C
> +++ b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
> @@ -146,3 +146,6 @@
>  #if !__has_builtin (__remove_cvref)
>  # error "__has_builtin (__remove_cvref) failed"
>  #endif
> +#if !__has_builtin (__is_volatile)
> +# error "__has_builtin (__is_volatile) failed"
> +#endif
> diff --git a/gcc/testsuite/g++.dg/ext/is_volatile.C 
> b/gcc/testsuite/g++.dg/ext/is_volatile.C
> new file mode 100644
> index 000..004e397e5e7
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/ext/is_volatile.C
> @@ -0,0 +1,19 @@
> +// { dg-do compile { target c++11 } }
> +
> +#include <testsuite_tr1.h>
> +
> +using namespace __gnu_test;
> +
> +#define SA(X) static_assert((X),#X)
> +
> +// Positive tests.
> +SA(__is_volatile(volatile int));
> +SA(__is_volatile(const volatile int));
> +SA(__is_volatile(vClassType));
> +SA(__is_volatile(cvClassType));
> +
> +// Negative tests.
> +SA(!__is_volatile(int));
> +SA(!__is_volatile(const int));
> +SA(!__is_volatile(ClassType));
> +SA(!__is_volatile(cClassType));
> --
> 2.41.0
>


Re: [PATCH] GIMPLE_FOLD: Apply LEN_MASK_{LOAD, STORE} into GIMPLE_FOLD

2023-06-24 Thread Jeff Law via Gcc-patches




On 6/23/23 07:48, juzhe.zh...@rivai.ai wrote:

From: Ju-Zhe Zhong 

Hi, we are going to add LEN_MASK_{LOAD,STORE} support to the loop vectorizer.

Currently,
1. we can fold MASK_{LOAD,STORE} into MEM when the mask is all ones.
2. we can fold LEN_{LOAD,STORE} into MEM when (len - bias) is VF.

Now, I think it makes sense to also

fold LEN_MASK_{LOAD,STORE} into MEM when both the mask is all ones and
(len - bias) is VF.
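
The combined condition can be modelled outside the compiler as a small predicate. This is a sketch only: SketchPartialAccess, sketch_all_ones and sketch_can_fold_to_mem are hypothetical names, the mask is modelled as a bit set of at most 64 lanes, and len, bias and vf are assumed to be in the same unit. It is not the code in gimple-fold.cc.

```cpp
#include <cstdint>

// Hypothetical stand-in for the operands of a LEN_MASK_{LOAD,STORE} call.
struct SketchPartialAccess {
  std::uint64_t mask_bits;  // bit i set means lane i is active
  unsigned lanes;           // number of lanes in the vector (at most 64)
  std::int64_t len;         // length operand
  std::int64_t bias;        // bias operand
};

// Bit pattern with the low `lanes` bits set.
constexpr std::uint64_t sketch_all_ones(unsigned lanes) {
  return lanes >= 64 ? ~std::uint64_t{0}
                     : (std::uint64_t{1} << lanes) - 1;
}

// The access may become a plain memory reference only when no lane is
// masked off AND the length covers the whole vector.
constexpr bool sketch_can_fold_to_mem(const SketchPartialAccess& a,
                                      std::int64_t vf) {
  return a.mask_bits == sketch_all_ones(a.lanes) && a.len - a.bias == vf;
}
```

Either condition failing on its own keeps the partial access, which matches the existing MASK_ and LEN_ rules applied together.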
  
gcc/ChangeLog:


 * gimple-fold.cc (arith_overflowed_p): Apply LEN_MASK_{LOAD,STORE}.
 (gimple_fold_partial_load_store_mem_ref): Ditto.
 (gimple_fold_partial_store): Ditto.
 (gimple_fold_call): Ditto.

OK
jeff


RE: [PATCH] RISC-V: Refactor the integer ternary autovec pattern

2023-06-24 Thread Li, Pan2 via Gcc-patches
Committed, thanks Jeff.

Pan

-Original Message-
From: Gcc-patches  On Behalf 
Of Jeff Law via Gcc-patches
Sent: Saturday, June 24, 2023 10:04 PM
To: Juzhe-Zhong ; gcc-patches@gcc.gnu.org
Cc: kito.ch...@sifive.com; pal...@rivosinc.com; rdapp@gmail.com
Subject: Re: [PATCH] RISC-V: Refactor the integer ternary autovec pattern



On 6/21/23 16:38, Juzhe-Zhong wrote:
> A long time ago, I encountered an ICE when trying to set the clobber register
> to Pmode, and I forgot the reason.
> 
> So I clobbered an SI scratch and used PUT_MODE to make it Pmode after reload,
> which made the patterns look unreasonable.
> 
> Following Jeff's comments, I tried again: setting the clobber register to
> Pmode works now, and the patterns look more reasonable.
> 
> The tests all pass. OK for trunk?
> 
> gcc/ChangeLog:
> 
>  * config/riscv/autovec.md (*fma): set clobber to Pmode in 
> expand stage.
>  (*fma): Ditto.
>  (*fnma): Ditto.
>  (*fnma): Ditto.
OK
jeff


RE: [PATCH V3] RISC-V: Support RVV floating-point auto-vectorization

2023-06-24 Thread Li, Pan2 via Gcc-patches
Committed, thanks Jeff.

Pan

-Original Message-
From: Gcc-patches  On Behalf 
Of Jeff Law via Gcc-patches
Sent: Saturday, June 24, 2023 10:06 PM
To: Juzhe-Zhong ; gcc-patches@gcc.gnu.org
Cc: kito.ch...@sifive.com; pal...@rivosinc.com; rdapp@gmail.com
Subject: Re: [PATCH V3] RISC-V: Support RVV floating-point auto-vectorization



On 6/21/23 09:53, Juzhe-Zhong wrote:
> This patch adds RVV floating-point auto-vectorization.
> Also, fix attribute bug of floating-point ternary operations in vector.md.
> 
> gcc/ChangeLog:
> 
>  * config/riscv/autovec.md (fma4): New pattern.
>  (*fma): Ditto.
>  (fnma4): Ditto.
>  (*fnma): Ditto.
>  (fms4): Ditto.
>  (*fms): Ditto.
>  (fnms4): Ditto.
>  (*fnms): Ditto.
>  * config/riscv/riscv-protos.h (emit_vlmax_fp_ternary_insn): New 
> function.
>  * config/riscv/riscv-v.cc (emit_vlmax_fp_ternary_insn): Ditto.
>  * config/riscv/vector.md: Fix attribute bug.
OK.  Thanks for digging into that clobber issue.

Jeff



Re: [PATCH v2] RISC-V: Implement autovec copysign.

2023-06-24 Thread Jeff Law via Gcc-patches




On 6/21/23 08:24, 钟居哲 wrote:

LGTM.

Likewise.  OK for the trunk.
jeff


Re: [PATCH][RFC] middle-end/110237 - wrong MEM_ATTRs for partial loads/stores

2023-06-24 Thread Jeff Law via Gcc-patches




On 6/22/23 00:39, Richard Biener wrote:




I suspect there's no way to specify the desired semantics?  OTOH
code that looks at the MEM operand only and not the insn (which
should have some UNSPEC wrapped) needs to be conservative, so maybe
the alias code shouldn't assume that a (mem:V16SI ..) actually
performs an access of the size of V16SI at the specified location?
I'm not aware of a way to express the semantics fully right now.  We'd 
need some way to indicate that the MEM is a partial and pass along the 
actual length.


We could do both through MEM_ATTRS with some work. For example we could 
declare that for vector modes full semantic information is carried in 
the MEM_ATTRS rather than by the mode itself.  So it falls into a space 
between how we currently think of something like V16SI and BLK.  The 
mode specifies a maximum size and how to interpret the elements.  But 
actual size and perhaps mask info would be found in MEM_ATTRS.


jeff


[Bug ada/105212] -gnatwu gives false error message for certain arrays.

2023-06-24 Thread service at totalplanlos dot de via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105212

--- Comment #2 from Honki Tonk  ---
The error still occurs with version 13.1.

[PATCH v2 2/2] libstdc++: use new built-in trait __is_volatile

2023-06-24 Thread Ken Matsui via Gcc-patches
This patch lets libstdc++ use new built-in trait __is_volatile.

libstdc++-v3/ChangeLog:

* include/std/type_traits (is_volatile): Use __is_volatile built-in
trait.
(is_volatile_v): Likewise.

Signed-off-by: Ken Matsui 
---
 libstdc++-v3/include/std/type_traits | 13 +
 1 file changed, 13 insertions(+)

diff --git a/libstdc++-v3/include/std/type_traits 
b/libstdc++-v3/include/std/type_traits
index 0e7a9c9c7f3..db74b884b35 100644
--- a/libstdc++-v3/include/std/type_traits
+++ b/libstdc++-v3/include/std/type_traits
@@ -773,6 +773,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 : public true_type { };
 
   /// is_volatile
+#if __has_builtin(__is_volatile)
+  template<typename _Tp>
+struct is_volatile
+: public __bool_constant<__is_volatile(_Tp)>
+{ };
+#else
   template<typename>
 struct is_volatile
 : public false_type { };
@@ -780,6 +786,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template<typename _Tp>
 struct is_volatile<_Tp volatile>
 : public true_type { };
+#endif
 
   /// is_trivial
   template<typename _Tp>
@@ -3214,10 +3221,16 @@ template <typename _Tp>
   inline constexpr bool is_const_v = false;
 template <typename _Tp>
   inline constexpr bool is_const_v<const _Tp> = true;
+
+#if __has_builtin(__is_volatile)
+template <typename _Tp>
+  inline constexpr bool is_volatile_v = __is_volatile(_Tp);
+#else
 template <typename _Tp>
   inline constexpr bool is_volatile_v = false;
 template <typename _Tp>
   inline constexpr bool is_volatile_v<volatile _Tp> = true;
+#endif
 
 template <typename _Tp>
   inline constexpr bool is_trivial_v = __is_trivial(_Tp);
-- 
2.41.0
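
For experimenting with the same dispatch outside libstdc++, a self-contained sketch might look like the following. The name sketch_is_volatile is hypothetical; the built-in is assumed to be available only when __has_builtin says so, and the fallback is the usual partial specialization.

```cpp
#include <type_traits>

// Older compilers may not provide __has_builtin at all.
#ifndef __has_builtin
# define __has_builtin(x) 0
#endif

#if __has_builtin(__is_volatile)
// Built-in path: a single expression, no partial specialization to match.
template<typename T>
  struct sketch_is_volatile
  : std::integral_constant<bool, __is_volatile(T)>
  { };
#else
// Library path: primary template is false, the volatile-qualified
// partial specialization is true.
template<typename T>
  struct sketch_is_volatile
  : std::false_type { };

template<typename T>
  struct sketch_is_volatile<T volatile>
  : std::true_type { };
#endif
```

Both paths look only at the top-level qualifier, so a pointer to volatile int is not itself volatile while a volatile pointer to int is.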



Re: [PATCH] Improve DSE to handle stores before __builtin_unreachable ()

2023-06-24 Thread Jeff Law via Gcc-patches




On 6/22/23 07:42, Jan Hubicka wrote:



On 6/22/23 00:31, Richard Biener wrote:

I think there's a difference in that __builtin_trap () is observable
while __builtin_unreachable () is not and reaching __builtin_unreachable
() invokes undefined behavior while reaching __builtin_trap () does not.

So the isolation code marking the trapping code volatile should be
enough and the trap () is just there to end the basic block
(and maybe be on the safe side to really trap).

Agreed WRT observability -- but that's not really the point of the trap and
if we wanted we could change that behavior.

The trap is there to halt execution immediately rather than letting it keep
running.  That was a design decision from a security standpoint. If we've
detected that we're executing undefined behavior, stop rather than
potentially letting a malicious actor turn a bug into an exploit.


Also, as discussed some time ago, the volatile loads between traps have the
effect of turning previously pure/const functions into non-const ones, which
is somewhat sad, so it is still on my todo list to change it this stage1
to something more careful.   We discussed internal functions trap_store
and trap_load, which would expand to load/store + trap but make it
clear that the side effect does not count for modref.
It's been a long time since I looked at this code -- isn't it the case 
that we already must have had a load/store and that all we've done is 
change its form (to enable more DCE) and added the volatile marker?


Meaning that it couldn't have been pure/const before, could it?  Or is it 
the case that you want to not have the erroneous path be the sole reason 
to spoil pure/const detection -- does that happen often in practice?


jeff


[PATCH v2 1/2] c++: implement __is_volatile built-in trait

2023-06-24 Thread Ken Matsui via Gcc-patches
This patch implements built-in trait for std::is_volatile.

gcc/cp/ChangeLog:

* cp-trait.def: Define __is_volatile.
* constraint.cc (diagnose_trait_expr): Handle CPTK_IS_VOLATILE.
* semantics.cc (trait_expr_value): Likewise.
(finish_trait_expr): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/ext/has-builtin-1.C: Test existence of __is_volatile.
* g++.dg/ext/is_volatile.C: New test.

Signed-off-by: Ken Matsui 
---
 gcc/cp/constraint.cc |  3 +++
 gcc/cp/cp-trait.def  |  1 +
 gcc/cp/semantics.cc  |  4 
 gcc/testsuite/g++.dg/ext/has-builtin-1.C |  3 +++
 gcc/testsuite/g++.dg/ext/is_volatile.C   | 19 +++
 5 files changed, 30 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/ext/is_volatile.C

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index 8cf0f2d0974..e971d67ee25 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -3751,6 +3751,9 @@ diagnose_trait_expr (tree expr, tree args)
 case CPTK_IS_UNION:
   inform (loc, "  %qT is not a union", t1);
   break;
+case CPTK_IS_VOLATILE:
+  inform (loc, "  %qT is not a volatile type", t1);
+  break;
 case CPTK_IS_AGGREGATE:
   inform (loc, "  %qT is not an aggregate", t1);
   break;
diff --git a/gcc/cp/cp-trait.def b/gcc/cp/cp-trait.def
index 8b7fece0cc8..414b1065a11 100644
--- a/gcc/cp/cp-trait.def
+++ b/gcc/cp/cp-trait.def
@@ -82,6 +82,7 @@ DEFTRAIT_EXPR (IS_TRIVIALLY_ASSIGNABLE, 
"__is_trivially_assignable", 2)
 DEFTRAIT_EXPR (IS_TRIVIALLY_CONSTRUCTIBLE, "__is_trivially_constructible", -1)
 DEFTRAIT_EXPR (IS_TRIVIALLY_COPYABLE, "__is_trivially_copyable", 1)
 DEFTRAIT_EXPR (IS_UNION, "__is_union", 1)
+DEFTRAIT_EXPR (IS_VOLATILE, "__is_volatile", 1)
 DEFTRAIT_EXPR (REF_CONSTRUCTS_FROM_TEMPORARY, 
"__reference_constructs_from_temporary", 2)
 DEFTRAIT_EXPR (REF_CONVERTS_FROM_TEMPORARY, 
"__reference_converts_from_temporary", 2)
 /* FIXME Added space to avoid direct usage in GCC 13.  */
diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 8fb47fd179e..10934d01504 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -12079,6 +12079,9 @@ trait_expr_value (cp_trait_kind kind, tree type1, tree 
type2)
 case CPTK_IS_ENUM:
   return type_code1 == ENUMERAL_TYPE;
 
+case CPTK_IS_VOLATILE:
+  return CP_TYPE_VOLATILE_P (type1);
+
 case CPTK_IS_FINAL:
   return CLASS_TYPE_P (type1) && CLASSTYPE_FINAL (type1);
 
@@ -12296,6 +12299,7 @@ finish_trait_expr (location_t loc, cp_trait_kind kind, 
tree type1, tree type2)
 case CPTK_IS_ENUM:
 case CPTK_IS_UNION:
 case CPTK_IS_SAME:
+case CPTK_IS_VOLATILE:
   break;
 
 case CPTK_IS_LAYOUT_COMPATIBLE:
diff --git a/gcc/testsuite/g++.dg/ext/has-builtin-1.C 
b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
index f343e153e56..7ad640f141b 100644
--- a/gcc/testsuite/g++.dg/ext/has-builtin-1.C
+++ b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
@@ -146,3 +146,6 @@
 #if !__has_builtin (__remove_cvref)
 # error "__has_builtin (__remove_cvref) failed"
 #endif
+#if !__has_builtin (__is_volatile)
+# error "__has_builtin (__is_volatile) failed"
+#endif
diff --git a/gcc/testsuite/g++.dg/ext/is_volatile.C 
b/gcc/testsuite/g++.dg/ext/is_volatile.C
new file mode 100644
index 000..004e397e5e7
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ext/is_volatile.C
@@ -0,0 +1,19 @@
+// { dg-do compile { target c++11 } }
+
+#include <testsuite_tr1.h>
+
+using namespace __gnu_test;
+
+#define SA(X) static_assert((X),#X)
+
+// Positive tests.
+SA(__is_volatile(volatile int));
+SA(__is_volatile(const volatile int));
+SA(__is_volatile(vClassType));
+SA(__is_volatile(cvClassType));
+
+// Negative tests.
+SA(!__is_volatile(int));
+SA(!__is_volatile(const int));
+SA(!__is_volatile(ClassType));
+SA(!__is_volatile(cClassType));
-- 
2.41.0



Re: [PATCH v2 1/2] c++: implement __is_array built-in trait

2023-06-24 Thread Ken Matsui via Gcc-patches
Here is the benchmark result for is_array:

https://github.com/ken-matsui/gcc-benches/blob/main/is_array.md#sat-jun-24-070630-am-pdt-2023

Time: -15.511%
Peak Memory Usage: +0.173923%
Total Memory Usage: -6.2037%

On Sat, Jun 24, 2023 at 6:54 AM Ken Matsui  wrote:
>
> This patch implements built-in trait for std::is_array.
>
> gcc/cp/ChangeLog:
>
> * cp-trait.def: Define __is_array.
> * constraint.cc (diagnose_trait_expr): Handle CPTK_IS_ARRAY.
> * semantics.cc (trait_expr_value): Likewise.
> (finish_trait_expr): Likewise.
>
> gcc/testsuite/ChangeLog:
>
> * g++.dg/ext/has-builtin-1.C: Test existence of __is_array.
> * g++.dg/ext/is_array.C: New test.
>
> Signed-off-by: Ken Matsui 
> ---
>  gcc/cp/constraint.cc |  3 +++
>  gcc/cp/cp-trait.def  |  1 +
>  gcc/cp/semantics.cc  |  4 
>  gcc/testsuite/g++.dg/ext/has-builtin-1.C |  3 +++
>  gcc/testsuite/g++.dg/ext/is_array.C  | 28 
>  5 files changed, 39 insertions(+)
>  create mode 100644 gcc/testsuite/g++.dg/ext/is_array.C
>
> diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
> index 8cf0f2d0974..7cec7eba591 100644
> --- a/gcc/cp/constraint.cc
> +++ b/gcc/cp/constraint.cc
> @@ -3751,6 +3751,9 @@ diagnose_trait_expr (tree expr, tree args)
>  case CPTK_IS_UNION:
>inform (loc, "  %qT is not a union", t1);
>break;
> +case CPTK_IS_ARRAY:
> +  inform (loc, "  %qT is not an array", t1);
> +  break;
>  case CPTK_IS_AGGREGATE:
>inform (loc, "  %qT is not an aggregate", t1);
>break;
> diff --git a/gcc/cp/cp-trait.def b/gcc/cp/cp-trait.def
> index 8b7fece0cc8..f68c7f2e8ec 100644
> --- a/gcc/cp/cp-trait.def
> +++ b/gcc/cp/cp-trait.def
> @@ -82,6 +82,7 @@ DEFTRAIT_EXPR (IS_TRIVIALLY_ASSIGNABLE, 
> "__is_trivially_assignable", 2)
>  DEFTRAIT_EXPR (IS_TRIVIALLY_CONSTRUCTIBLE, "__is_trivially_constructible", 
> -1)
>  DEFTRAIT_EXPR (IS_TRIVIALLY_COPYABLE, "__is_trivially_copyable", 1)
>  DEFTRAIT_EXPR (IS_UNION, "__is_union", 1)
> +DEFTRAIT_EXPR (IS_ARRAY, "__is_array", 1)
>  DEFTRAIT_EXPR (REF_CONSTRUCTS_FROM_TEMPORARY, 
> "__reference_constructs_from_temporary", 2)
>  DEFTRAIT_EXPR (REF_CONVERTS_FROM_TEMPORARY, 
> "__reference_converts_from_temporary", 2)
>  /* FIXME Added space to avoid direct usage in GCC 13.  */
> diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
> index 8fb47fd179e..22f2700ec0b 100644
> --- a/gcc/cp/semantics.cc
> +++ b/gcc/cp/semantics.cc
> @@ -12118,6 +12118,9 @@ trait_expr_value (cp_trait_kind kind, tree type1, 
> tree type2)
>  case CPTK_IS_UNION:
>return type_code1 == UNION_TYPE;
>
> +case CPTK_IS_ARRAY:
> +  return type_code1 == ARRAY_TYPE;
> +
>  case CPTK_IS_ASSIGNABLE:
>return is_xible (MODIFY_EXPR, type1, type2);
>
> @@ -12296,6 +12299,7 @@ finish_trait_expr (location_t loc, cp_trait_kind 
> kind, tree type1, tree type2)
>  case CPTK_IS_ENUM:
>  case CPTK_IS_UNION:
>  case CPTK_IS_SAME:
> +case CPTK_IS_ARRAY:
>break;
>
>  case CPTK_IS_LAYOUT_COMPATIBLE:
> diff --git a/gcc/testsuite/g++.dg/ext/has-builtin-1.C 
> b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
> index f343e153e56..56485ae62be 100644
> --- a/gcc/testsuite/g++.dg/ext/has-builtin-1.C
> +++ b/gcc/testsuite/g++.dg/ext/has-builtin-1.C
> @@ -146,3 +146,6 @@
>  #if !__has_builtin (__remove_cvref)
>  # error "__has_builtin (__remove_cvref) failed"
>  #endif
> +#if !__has_builtin (__is_array)
> +# error "__has_builtin (__is_array) failed"
> +#endif
> diff --git a/gcc/testsuite/g++.dg/ext/is_array.C 
> b/gcc/testsuite/g++.dg/ext/is_array.C
> new file mode 100644
> index 000..facfed5c7cb
> --- /dev/null
> +++ b/gcc/testsuite/g++.dg/ext/is_array.C
> @@ -0,0 +1,28 @@
> +// { dg-do compile { target c++11 } }
> +
> +#include <testsuite_tr1.h>
> +
> +using namespace __gnu_test;
> +
> +#define SA(X) static_assert((X),#X)
> +#define SA_TEST_CATEGORY(TRAIT, X, expect) \
> +  SA(TRAIT(X) == expect);  \
> +  SA(TRAIT(const X) == expect);\
> +  SA(TRAIT(volatile X) == expect); \
> +  SA(TRAIT(const volatile X) == expect)
> +
> +SA_TEST_CATEGORY(__is_array, int[2], true);
> +SA_TEST_CATEGORY(__is_array, int[], true);
> +SA_TEST_CATEGORY(__is_array, int[2][3], true);
> +SA_TEST_CATEGORY(__is_array, int[][3], true);
> +SA_TEST_CATEGORY(__is_array, float*[2], true);
> +SA_TEST_CATEGORY(__is_array, float*[], true);
> +SA_TEST_CATEGORY(__is_array, float*[2][3], true);
> +SA_TEST_CATEGORY(__is_array, float*[][3], true);
> +SA_TEST_CATEGORY(__is_array, ClassType[2], true);
> +SA_TEST_CATEGORY(__is_array, ClassType[], true);
> +SA_TEST_CATEGORY(__is_array, ClassType[2][3], true);
> +SA_TEST_CATEGORY(__is_array, ClassType[][3], true);
> +
> +// Sanity check.
> +SA_TEST_CATEGORY(__is_array, ClassType, false);
> --
> 2.41.0
>
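
The behaviour the quoted test checks can be tried with a standalone sketch along these lines. The name sketch_is_array is hypothetical and not part of the patch; the built-in is used only when __has_builtin reports it, and the fallback covers bounded and unbounded arrays with partial specializations.

```cpp
#include <cstddef>
#include <type_traits>

// Older compilers may not provide __has_builtin at all.
#ifndef __has_builtin
# define __has_builtin(x) 0
#endif

#if __has_builtin(__is_array)
// Built-in path: the compiler answers directly from the type's kind.
template<typename T>
  struct sketch_is_array
  : std::integral_constant<bool, __is_array(T)>
  { };
#else
// Library path: match arrays of known bound and of unknown bound.
template<typename T>
  struct sketch_is_array
  : std::false_type { };

template<typename T, std::size_t N>
  struct sketch_is_array<T[N]>
  : std::true_type { };

template<typename T>
  struct sketch_is_array<T[]>
  : std::true_type { };
#endif
```

As in the test's SA_TEST_CATEGORY macro, cv-qualifying the array type does not change the answer, since the qualifiers apply to the elements and the type is still an array.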


Re: [PATCH] SSA ALIAS: Apply LEN_MASK_STORE to 'ref_maybe_used_by_call_p_1'

2023-06-24 Thread Jeff Law via Gcc-patches




On 6/23/23 17:20, 钟居哲 wrote:

Not sure since I saw MASK_STORE/LEN_STORE didn't compute size.
Also OK after a re-review on my part.  The code sets the size to -1 
after doing the ao_ref_init_from_ptr_and_size, meaning it's not a known 
size.


jeff



Re: [PATCH] SSA ALIAS: Apply LEN_MASK_{LOAD, STORE} into SSA alias analysis

2023-06-24 Thread Jeff Law via Gcc-patches




On 6/23/23 17:21, 钟居哲 wrote:

Not sure since I saw MASK_STORE/LEN_STORE didn't compute size.

Yea, I think you're right.  We take the size from the LHS.  My mistake.

This is fine for the trunk.

jeff


Re: [PATCH V3] RISC-V: Support RVV floating-point auto-vectorization

2023-06-24 Thread Jeff Law via Gcc-patches




On 6/21/23 09:53, Juzhe-Zhong wrote:

This patch adds RVV floating-point auto-vectorization.
Also, fix attribute bug of floating-point ternary operations in vector.md.

gcc/ChangeLog:

 * config/riscv/autovec.md (fma4): New pattern.
 (*fma): Ditto.
 (fnma4): Ditto.
 (*fnma): Ditto.
 (fms4): Ditto.
 (*fms): Ditto.
 (fnms4): Ditto.
 (*fnms): Ditto.
 * config/riscv/riscv-protos.h (emit_vlmax_fp_ternary_insn): New 
function.
 * config/riscv/riscv-v.cc (emit_vlmax_fp_ternary_insn): Ditto.
 * config/riscv/vector.md: Fix attribute bug.

OK.  Thanks for digging into that clobber issue.

Jeff



Re: [PATCH] RISC-V: Refactor the integer ternary autovec pattern

2023-06-24 Thread Jeff Law via Gcc-patches




On 6/21/23 16:38, Juzhe-Zhong wrote:

A long time ago, I encountered an ICE when trying to set the clobber register
to Pmode, and I forgot the reason.

So I clobbered an SI scratch and used PUT_MODE to make it Pmode after reload,
which made the patterns look unreasonable.

Following Jeff's comments, I tried again: setting the clobber register to
Pmode works now, and the patterns look more reasonable.

The tests all pass. OK for trunk?

gcc/ChangeLog:

 * config/riscv/autovec.md (*fma): set clobber to Pmode in expand 
stage.
 (*fma): Ditto.
 (*fnma): Ditto.
 (*fnma): Ditto.

OK
jeff

