date:20220614

[Bug target/105953] [12/13 Regression] ICE in extract_insn, at recog.cc:2791

2022-06-14 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105953

--- Comment #3 from CVS Commits  ---
The releases/gcc-12 branch has been updated by hongtao Liu:

https://gcc.gnu.org/g:bac09a893145056217b1e9a0054466a770815c43

commit r12-8482-gbac09a893145056217b1e9a0054466a770815c43
Author: liuhongt 
Date:   Tue Jun 14 16:27:04 2022 +0800

Fix ICE in extract_insn, at recog.cc:2791

(In reply to Uroš Bizjak from comment #1)
> Instruction does not accept memory operand for operand 3:
>
> (define_insn_and_split
> "*_blendv_ltint"
>   [(set (match_operand: 0 "register_operand" "=Yr,*x,x")
>   (unspec:
> [(match_operand: 1 "register_operand" "0,0,x")
>  (match_operand: 2 "vector_operand" "YrBm,*xBm,xm")
>  (subreg:
>(lt:VI48_AVX
>  (match_operand:VI48_AVX 3 "register_operand" "Yz,Yz,x")
>  (match_operand:VI48_AVX 4 "const0_operand")) 0)]
> UNSPEC_BLENDV))]
>
> The problematic insn is:
>
> (define_insn_and_split "*avx_cmp3_ltint_not"
>  [(set (match_operand:VI48_AVX  0 "register_operand")
>(vec_merge:VI48_AVX
>(match_operand:VI48_AVX 1 "vector_operand")
>(match_operand:VI48_AVX 2 "vector_operand")
>(unspec:
>  [(subreg:VI48_AVX
>   (not:
> (match_operand: 3 "vector_operand")) 0)
>   (match_operand:VI48_AVX 4 "const0_operand")
>   (match_operand:SI 5 "const_0_to_7_operand")]
>   UNSPEC_PCMP)))]
>
> which gets split to the above pattern.
>
> In the preparation statements we have:
>
>   if (!MEM_P (operands[3]))
> operands[3] = force_reg (mode, operands[3]);
>   operands[3] = lowpart_subreg (mode, operands[3], mode);
>
> Which won't fly when operand 3 is memory operand...
>

gcc/ChangeLog:

PR target/105953
* config/i386/sse.md (*avx_cmp3_ltint_not): Force_reg
operands[3].

gcc/testsuite/ChangeLog:

* g++.target/i386/pr105953.C: New test.
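
The failure mode described in the commit message can be illustrated with a toy model. This is only a sketch: `Operand`, `Kind`, `force_reg_model`, `prepare_old`, `prepare_new` and `satisfies_register_operand` are hypothetical stand-ins, not GCC's rtx types or its real `force_reg`. It shows why the old `!MEM_P` guard could hand a memory operand to a pattern whose operand 3 is a "register_operand", and why an unconditional force avoids that.

```cpp
#include <cassert>

// Toy stand-ins for illustration only; these are NOT GCC's rtx API.
enum class Kind { Reg, Mem, Const };
struct Operand { Kind kind; };

// Hypothetical analogue of force_reg: the result is always a register.
Operand force_reg_model(Operand) { return Operand{Kind::Reg}; }

// Old preparation logic: a memory operand is passed through untouched.
Operand prepare_old(Operand op) {
    if (op.kind != Kind::Mem)
        op = force_reg_model(op);
    return op;
}

// Fixed preparation logic: operand 3 is forced into a register unconditionally.
Operand prepare_new(Operand op) { return force_reg_model(op); }

// The split target only accepts a register for operand 3.
bool satisfies_register_operand(Operand op) { return op.kind == Kind::Reg; }
```

In this model, `prepare_old` leaves a `Mem` operand as it is, so the predicate check on the split result fails, which corresponds to the unrecognizable insn reported by extract_insn.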

[Bug target/105953] [12/13 Regression] ICE in extract_insn, at recog.cc:2791

2022-06-14 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105953

--- Comment #2 from CVS Commits  ---
The master branch has been updated by hongtao Liu :

https://gcc.gnu.org/g:4b1a827f024234aaf83ecfe90415e88b525d3969

commit r13-1099-g4b1a827f024234aaf83ecfe90415e88b525d3969
Author: liuhongt 
Date:   Tue Jun 14 16:27:04 2022 +0800

Fix ICE in extract_insn, at recog.cc:2791

(In reply to Uroš Bizjak from comment #1)
> Instruction does not accept memory operand for operand 3:
>
> (define_insn_and_split
> "*_blendv_ltint"
>   [(set (match_operand: 0 "register_operand" "=Yr,*x,x")
>   (unspec:
> [(match_operand: 1 "register_operand" "0,0,x")
>  (match_operand: 2 "vector_operand" "YrBm,*xBm,xm")
>  (subreg:
>(lt:VI48_AVX
>  (match_operand:VI48_AVX 3 "register_operand" "Yz,Yz,x")
>  (match_operand:VI48_AVX 4 "const0_operand")) 0)]
> UNSPEC_BLENDV))]
>
> The problematic insn is:
>
> (define_insn_and_split "*avx_cmp3_ltint_not"
>  [(set (match_operand:VI48_AVX  0 "register_operand")
>(vec_merge:VI48_AVX
>(match_operand:VI48_AVX 1 "vector_operand")
>(match_operand:VI48_AVX 2 "vector_operand")
>(unspec:
>  [(subreg:VI48_AVX
>   (not:
> (match_operand: 3 "vector_operand")) 0)
>   (match_operand:VI48_AVX 4 "const0_operand")
>   (match_operand:SI 5 "const_0_to_7_operand")]
>   UNSPEC_PCMP)))]
>
> which gets split to the above pattern.
>
> In the preparation statements we have:
>
>   if (!MEM_P (operands[3]))
> operands[3] = force_reg (mode, operands[3]);
>   operands[3] = lowpart_subreg (mode, operands[3], mode);
>
> Which won't fly when operand 3 is memory operand...
>

gcc/ChangeLog:

PR target/105953
* config/i386/sse.md (*avx_cmp3_ltint_not): Force_reg
operands[3].

gcc/testsuite/ChangeLog:

* g++.target/i386/pr105953.C: New test.

Re: [PATCH 0/9] Add debug_annotate attributes

2022-06-14 Thread Yonghong Song via Gcc-patches





On 6/7/22 2:43 PM, David Faust wrote:

Hello,

This patch series adds support for:

- Two new C-language-level attributes that allow to associate (to "annotate" or
   to "tag") particular declarations and types with arbitrary strings. As
   explained below, this is intended to be used to, for example, characterize
   certain pointer types.

- The conveyance of that information in the DWARF output in the form of a new
   DIE: DW_TAG_GNU_annotation.

- The conveyance of that information in the BTF output in the form of two new
   kinds of BTF objects: BTF_KIND_DECL_TAG and BTF_KIND_TYPE_TAG.

All of these facilities are being added to the eBPF ecosystem, and support for
them exists in some form in LLVM.

Purpose
=======

1)  Addition of C-family language constructs (attributes) to specify free-text
 tags on certain language elements, such as struct fields.

 The purpose of these annotations is to provide additional information about
 types, variables, and function parameters of interest to the kernel. A
 driving use case is to tag pointer types within the linux kernel and eBPF
 programs with additional semantic information, such as '__user' or '__rcu'.

 For example, consider the linux kernel function do_execve with the
 following declaration:

   static int do_execve(struct filename *filename,
  const char __user *const __user *__argv,
  const char __user *const __user *__envp);

 Here, __user could be defined with these annotations to record semantic
 information about the pointer parameters (e.g., they are user-provided) in
 DWARF and BTF information. Other kernel facilities such as the eBPF verifier
 can read the tags and make use of the information.

2)  Conveying the tags in the generated DWARF debug info.

 The main motivation for emitting the tags in DWARF is that the Linux kernel
 generates its BTF information via pahole, using DWARF as a source:

     +--------+  BTF                 BTF   +----------+
     | pahole |---> vmlinux.btf ---------->| verifier |
     +--------+                            +----------+
         ^                                      ^
         |                                      |
   DWARF |                                  BTF |
         |                                      |
      vmlinux                            +-------------+
      module1.ko                         | BPF program |
      module2.ko                         +-------------+
        ...

 This is because:

 a)  Unlike GCC, LLVM will only generate BTF for BPF programs.

 b)  GCC can generate BTF for whatever target with -gbtf, but there is no
 support for linking/deduplicating BTF in the linker.

 In the scenario above, the verifier needs access to the pointer tags of
 both the kernel types/declarations (conveyed in the DWARF and translated
 to BTF by pahole) and those of the BPF program (available directly in BTF).

 Another motivation for having the tag information in DWARF, unrelated to
 BPF and BTF, is that the drgn project (another DWARF consumer) also wants
 to benefit from these tags in order to differentiate between different
 kinds of pointers in the kernel.

3)  Conveying the tags in the generated BTF debug info.

 This is easy: the main purpose of having this info in BTF is for the
 compiled eBPF programs. The kernel verifier can then access the tags
 of pointers used by the eBPF programs.


For more information about these tags and the motivation behind them, please
refer to the following linux kernel discussions:

   https://lore.kernel.org/bpf/20210914223004.244411-1-...@fb.com/
   https://lore.kernel.org/bpf/20211012164838.3345699-1-...@fb.com/
   https://lore.kernel.org/bpf/2022012604.1504583-1-...@fb.com/


Implementation Overview
=======================

To enable these annotations, two new C language attributes are added:
__attribute__((debug_annotate_decl("foo"))) and
__attribute__((debug_annotate_type("bar"))). Both attributes accept a single
arbitrary string constant argument, which will be recorded in the generated
DWARF and/or BTF debug information. They have no effect on code generation.
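
A minimal sketch of how such an attribute might be used from source code. The attribute name `debug_annotate_decl` is taken from this cover letter; the macro `DECL_TAG`, the struct `conn_state` and the function `read_fd` are hypothetical. The guard makes the macro expand to nothing on compilers that do not know the proposed attribute, so the sketch builds everywhere and, as stated above, the tag does not change generated code.

```cpp
#include <cassert>

// Use the proposed attribute only when the compiler reports support for it;
// otherwise DECL_TAG expands to nothing. DECL_TAG is a hypothetical helper.
#if defined(__has_attribute)
#  if __has_attribute(debug_annotate_decl)
#    define DECL_TAG(s) __attribute__((debug_annotate_decl(s)))
#  endif
#endif
#ifndef DECL_TAG
#  define DECL_TAG(s)
#endif

// Hypothetical struct with one tagged field; the string is free-form text
// that would be recorded in the debug info only.
struct conn_state {
    int fd DECL_TAG("owned_fd");
};

// The tag has no effect on how the field is read or written.
int read_fd(const conn_state& c) { return c.fd; }
```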

Note that we are not using the same attribute names as LLVM (btf_decl_tag and
btf_type_tag, respectively). While these attributes are functionally very
similar, they have grown beyond purely BTF-specific uses, so inclusion of "btf"
in the attribute name seems misleading.

DWARF support is enabled via a new DW_TAG_GNU_annotation. When generating DWARF,
declarations and types will be checked for the corresponding attributes. If
present, a DW_TAG_GNU_annotation DIE will be created as a child of the DIE for
the annotated type or declaration, one for each tag. These DIEs link the
arbitrary tag value to the item they annotate.

For example, the following variable declaration:

   #define __typetag1

Re: [PATCH] Fix ICE in extract_insn, at recog.cc:2791

2022-06-14 Thread Uros Bizjak via Gcc-patches

On Wed, Jun 15, 2022 at 12:49 AM liuhongt  wrote:
>
> (In reply to Uroš Bizjak from comment #1)
> > Instruction does not accept memory operand for operand 3:
> >
> > (define_insn_and_split
> > "*_blendv_ltint"
> >   [(set (match_operand: 0 "register_operand" "=Yr,*x,x")
> >   (unspec:
> > [(match_operand: 1 "register_operand" "0,0,x")
> >  (match_operand: 2 "vector_operand" "YrBm,*xBm,xm")
> >  (subreg:
> >(lt:VI48_AVX
> >  (match_operand:VI48_AVX 3 "register_operand" "Yz,Yz,x")
> >  (match_operand:VI48_AVX 4 "const0_operand")) 0)]
> > UNSPEC_BLENDV))]
> >
> > The problematic insn is:
> >
> > (define_insn_and_split "*avx_cmp3_ltint_not"
> >  [(set (match_operand:VI48_AVX  0 "register_operand")
> >(vec_merge:VI48_AVX
> >(match_operand:VI48_AVX 1 "vector_operand")
> >(match_operand:VI48_AVX 2 "vector_operand")
> >(unspec:
> >  [(subreg:VI48_AVX
> >   (not:
> > (match_operand: 3 "vector_operand")) 0)
> >   (match_operand:VI48_AVX 4 "const0_operand")
> >   (match_operand:SI 5 "const_0_to_7_operand")]
> >   UNSPEC_PCMP)))]
> >
> > which gets split to the above pattern.
> >
> > In the preparation statements we have:
> >
> >   if (!MEM_P (operands[3]))
> > operands[3] = force_reg (mode, operands[3]);
> >   operands[3] = lowpart_subreg (mode, operands[3], mode);
> >
> > Which won't fly when operand 3 is memory operand...
> >
>
> Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
> Ok for trunk?
>
> gcc/ChangeLog:
>
> PR target/105953
> * config/i386/sse.md (*avx_cmp3_ltint_not): Force_reg
> operands[3].
>
> gcc/testsuite/ChangeLog:
>
> * g++.target/i386/pr105953.C: New test.

LGTM.

Thanks,
Uros.

> ---
>  gcc/config/i386/sse.md   | 3 +--
>  gcc/testsuite/g++.target/i386/pr105953.C | 4 ++++
>  2 files changed, 5 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/g++.target/i386/pr105953.C
>
> diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
> index 75609eaf9b7..3e3d96fe087 100644
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -3643,8 +3643,7 @@ (define_insn_and_split "*avx_cmp3_ltint_not"
>   gen_lowpart (mode, operands[1]));
>operands[2] = gen_lowpart (mode, operands[2]);
>
> -  if (!MEM_P (operands[3]))
> -operands[3] = force_reg (mode, operands[3]);
> +  operands[3] = force_reg (mode, operands[3]);
>operands[3] = lowpart_subreg (mode, operands[3], mode);
>  })
>
> diff --git a/gcc/testsuite/g++.target/i386/pr105953.C b/gcc/testsuite/g++.target/i386/pr105953.C
> new file mode 100644
> index 000..b423d2dfdae
> --- /dev/null
> +++ b/gcc/testsuite/g++.target/i386/pr105953.C
> @@ -0,0 +1,4 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -mavx512vl -mabi=ms" } */
> +
> +#include "pr100738-1.C"
> --
> 2.18.1
>

[Bug rtl-optimization/105041] '-fcompare-debug' failure w/ -mcpu=power6 -O2 -fharden-compares -frename-registers

2022-06-14 Thread jskumari at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105041

Surya Kumari Jangala  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED

--- Comment #8 from Surya Kumari Jangala  ---
Fixed.

[Bug middle-end/105984] New: [13 Regression] wrong code with __builtin_mul_overflow_p() at -O1

2022-06-14 Thread zsojka at seznam dot cz via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105984

Bug ID: 105984
   Summary: [13 Regression] wrong code with
__builtin_mul_overflow_p() at -O1
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Keywords: wrong-code
  Severity: normal
  Priority: P3
 Component: middle-end
  Assignee: unassigned at gcc dot gnu.org
  Reporter: zsojka at seznam dot cz
CC: jakub at gcc dot gnu.org
  Target Milestone: ---
  Host: x86_64-pc-linux-gnu

Created attachment 53139
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53139&action=edit
reduced testcase

Output:
$ x86_64-pc-linux-gnu-gcc -O1 testcase.c
$ ./a.out 
Aborted

The generated code:
main:
mov rax, QWORD PTR g[rip]
sub rax, 1
mov QWORD PTR g[rip], rax
jne .L6
mov eax, 0
ret

just shows the multiplication was evaluated to overflow at compile time.

$ x86_64-pc-linux-gnu-gcc -v
Using built-in specs.
COLLECT_GCC=/repo/gcc-trunk/binary-latest-amd64/bin/x86_64-pc-linux-gnu-gcc
COLLECT_LTO_WRAPPER=/repo/gcc-trunk/binary-trunk-r13-1092-20220614173648-g3e16b4359e8-checking-yes-rtl-df-extra-nobootstrap-amd64/bin/../libexec/gcc/x86_64-pc-linux-gnu/13.0.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /repo/gcc-trunk//configure --enable-languages=c,c++
--enable-valgrind-annotations --disable-nls --enable-checking=yes,rtl,df,extra
--disable-bootstrap --with-cloog --with-ppl --with-isl
--build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu
--target=x86_64-pc-linux-gnu --with-ld=/usr/bin/x86_64-pc-linux-gnu-ld
--with-as=/usr/bin/x86_64-pc-linux-gnu-as --disable-libstdcxx-pch
--prefix=/repo/gcc-trunk//binary-trunk-r13-1092-20220614173648-g3e16b4359e8-checking-yes-rtl-df-extra-nobootstrap-amd64
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 13.0.0 20220614 (experimental) (GCC)

[PATCH] Fix ICE in extract_insn, at recog.cc:2791

2022-06-14 Thread liuhongt via Gcc-patches

(In reply to Uroš Bizjak from comment #1)
> Instruction does not accept memory operand for operand 3:
>
> (define_insn_and_split
> "*_blendv_ltint"
>   [(set (match_operand: 0 "register_operand" "=Yr,*x,x")
>   (unspec:
> [(match_operand: 1 "register_operand" "0,0,x")
>  (match_operand: 2 "vector_operand" "YrBm,*xBm,xm")
>  (subreg:
>(lt:VI48_AVX
>  (match_operand:VI48_AVX 3 "register_operand" "Yz,Yz,x")
>  (match_operand:VI48_AVX 4 "const0_operand")) 0)]
> UNSPEC_BLENDV))]
>
> The problematic insn is:
>
> (define_insn_and_split "*avx_cmp3_ltint_not"
>  [(set (match_operand:VI48_AVX  0 "register_operand")
>(vec_merge:VI48_AVX
>(match_operand:VI48_AVX 1 "vector_operand")
>(match_operand:VI48_AVX 2 "vector_operand")
>(unspec:
>  [(subreg:VI48_AVX
>   (not:
> (match_operand: 3 "vector_operand")) 0)
>   (match_operand:VI48_AVX 4 "const0_operand")
>   (match_operand:SI 5 "const_0_to_7_operand")]
>   UNSPEC_PCMP)))]
>
> which gets split to the above pattern.
>
> In the preparation statements we have:
>
>   if (!MEM_P (operands[3]))
> operands[3] = force_reg (mode, operands[3]);
>   operands[3] = lowpart_subreg (mode, operands[3], mode);
>
> Which won't fly when operand 3 is memory operand...
>

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
Ok for trunk?

gcc/ChangeLog:

PR target/105953
* config/i386/sse.md (*avx_cmp3_ltint_not): Force_reg
operands[3].

gcc/testsuite/ChangeLog:

* g++.target/i386/pr105953.C: New test.
---
 gcc/config/i386/sse.md   | 3 +--
 gcc/testsuite/g++.target/i386/pr105953.C | 4 ++++
 2 files changed, 5 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/g++.target/i386/pr105953.C

diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 75609eaf9b7..3e3d96fe087 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -3643,8 +3643,7 @@ (define_insn_and_split "*avx_cmp3_ltint_not"
  gen_lowpart (mode, operands[1]));
   operands[2] = gen_lowpart (mode, operands[2]);
 
-  if (!MEM_P (operands[3]))
-operands[3] = force_reg (mode, operands[3]);
+  operands[3] = force_reg (mode, operands[3]);
   operands[3] = lowpart_subreg (mode, operands[3], mode);
 })
 
diff --git a/gcc/testsuite/g++.target/i386/pr105953.C b/gcc/testsuite/g++.target/i386/pr105953.C
new file mode 100644
index 000..b423d2dfdae
--- /dev/null
+++ b/gcc/testsuite/g++.target/i386/pr105953.C
@@ -0,0 +1,4 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -mavx512vl -mabi=ms" } */
+
+#include "pr100738-1.C"
-- 
2.18.1

[Bug objc/101666] Objective-C frontend crashes with `-fobjc-nilcheck`

2022-06-14 Thread iains at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101666

Iain Sandoe  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |REOPENED
         Resolution|FIXED                       |---

--- Comment #8 from Iain Sandoe  ---
needed on 11.x too.

Re: [PATCH V2]rs6000: Store complicated constant into pool

2022-06-14 Thread Segher Boessenkool

Hi!

On Tue, Jun 14, 2022 at 09:23:55PM +0800, Jiufu Guo wrote:
> This patch reduces the threshold of instruction number for storing
> constant to pool and update cost for constant and mem accessing.
> And then if building the constant needs more than 2 instructions (or
> more than 1 instruction on P10), then prefer to load it from constant
> pool.

Have you tried with different limits?  And, p10 is a red herring, you
actually test if prefixed insns are used.

>   * config/rs6000/rs6000.cc (rs6000_cannot_force_const_mem):
>   Exclude rtx with code 'HIGH'.

> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -9706,8 +9706,9 @@ rs6000_init_stack_protect_guard (void)
>  static bool
>  rs6000_cannot_force_const_mem (machine_mode mode ATTRIBUTE_UNUSED, rtx x)
>  {
> -  if (GET_CODE (x) == HIGH
> -  && GET_CODE (XEXP (x, 0)) == UNSPEC)
> +  /* Exclude CONSTANT HIGH part.  e.g.
> + (high:DI (symbol_ref:DI ("var") [flags 0xc0] )).  */
> +  if (GET_CODE (x) == HIGH)
>  return true;

So, why is this?  You didn't explain.

> @@ -11139,7 +11140,7 @@ rs6000_emit_move (rtx dest, rtx source, machine_mode mode)
>   && FP_REGNO_P (REGNO (operands[0])))
>  || !CONST_INT_P (operands[1])
>  || (num_insns_constant (operands[1], mode)
> -> (TARGET_CMODEL != CMODEL_SMALL ? 3 : 2)))
> +> (TARGET_PREFIXED ? 1 : 2)))
>  && !toc_relative_expr_p (operands[1], false, NULL, NULL)
>  && (TARGET_CMODEL == CMODEL_SMALL
>  || can_create_pseudo_p ()

This is the more obvious part.

> @@ -22101,6 +22102,14 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
>  
>  case CONST_DOUBLE:
>  case CONST_WIDE_INT:
> +  /* It may needs a few insns for const to SET. -1 for outer SET code.  */
> +  if (outer_code == SET)
> + {
> +   *total = COSTS_N_INSNS (num_insns_constant (x, mode)) - 1;
> +   return true;
> + }
> +  /* FALLTHRU */
> +
>  case CONST:
>  case HIGH:
>  case SYMBOL_REF:

But this again isn't an obvious improvement at all, and needs
performance testing -- separately of the other changes.

> @@ -22110,8 +22119,12 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
>  case MEM:
>/* When optimizing for size, MEM should be slightly more expensive
>than generating address, e.g., (plus (reg) (const)).
> -  L1 cache latency is about two instructions.  */
> -  *total = !speed ? COSTS_N_INSNS (1) + 1 : COSTS_N_INSNS (2);
> +  L1 cache latency is about two instructions.
> +  For prefixed load (pld), we would set it slightly faster than
> +  than two instructions. */
> +  *total = !speed
> +  ? COSTS_N_INSNS (1) + 1
> +  : TARGET_PREFIXED ? COSTS_N_INSNS (2) - 1 : COSTS_N_INSNS (2);
>if (rs6000_slow_unaligned_access (mode, MEM_ALIGN (x)))
>   *total += COSTS_N_INSNS (100);
>return true;

And this is completely independent of the rest as well.  Cost 5 or 7 are
completely random numbers, why did you pick these?  Does it work better
than 8, or 4, etc.?
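
For readers following the numbers: the costs 5 and 7 fall out of the quoted expression once `COSTS_N_INSNS (N)` is expanded as `N * 4`, which is how GCC's rtl.h defines it. The sketch below is a stand-alone model of that arithmetic only; `costs_n_insns` and `mem_cost` are hypothetical names, not the real rs6000_rtx_costs.

```cpp
#include <cassert>

// Mirrors the COSTS_N_INSNS macro: one instruction is 4 cost units.
constexpr int costs_n_insns(int n) { return n * 4; }

// Model of the patched MEM cost expression from the hunk quoted above.
constexpr int mem_cost(bool speed, bool target_prefixed) {
    return !speed ? costs_n_insns(1) + 1
                  : target_prefixed ? costs_n_insns(2) - 1
                                    : costs_n_insns(2);
}
```

So the size-optimized cost is 5, the prefixed-load cost is 7, and the unchanged speed cost is 8, which is the set of values the question above compares.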


> --- a/gcc/testsuite/gcc.target/powerpc/medium_offset.c
> +++ b/gcc/testsuite/gcc.target/powerpc/medium_offset.c
> @@ -1,7 +1,7 @@
>  /* { dg-do compile { target { powerpc*-*-* } } } */
>  /* { dg-require-effective-target lp64 } */
>  /* { dg-options "-O" } */
> -/* { dg-final { scan-assembler-not "\\+4611686018427387904" } } */
> +/* { dg-final { scan-assembler-times {\msldi|pld\M} 1 } } */

Why?  This is still better generated in code, no?  It should never be
loaded from a constant pool (it is hex 4000_0000_0000_0000, easy to
construct with just one or two insns).
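
A quick arithmetic check of that constant, as a sketch (the function name `medium_offset_constant` is hypothetical): the decimal value in the old scan-assembler-not pattern is a single set bit, which is why a load-immediate plus a shift is one plausible way to build it.

```cpp
#include <cassert>
#include <cstdint>

// 4611686018427387904 is 2^62, i.e. a one shifted left by 62 bits.
constexpr std::uint64_t medium_offset_constant() {
    return std::uint64_t{1} << 62;
}
```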

> --- a/gcc/testsuite/gcc.target/powerpc/pr93012.c
> +++ b/gcc/testsuite/gcc.target/powerpc/pr93012.c
> @@ -10,4 +10,4 @@ unsigned long long mskh1() { return 0x92349234ULL; }
>  unsigned long long mskl1() { return 0x2bcd2bcdULL; }
>  unsigned long long mskse() { return 0x12341234ULL; }
>  
> -/* { dg-final { scan-assembler-times {\mrldimi\M} 7 } } */
> +/* { dg-final { scan-assembler-times {\mrldimi|ld|pld\M} 7 } } */

Please make this the exact number of times you want to see rldimi and
the number of times you want a load.

Btw, you need to write
  \m(?:rldimi|ld|pld)\M
or it will mean
  \mrldimi
or
  ld
or
  pld\M
(and that "ld" will match anything that "pld$" will match of course).
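
The precedence point can be checked with a small sketch. Assumption: DejaGnu patterns are Tcl regular expressions, where `\m` and `\M` mean start and end of word; C++ `std::regex` (ECMAScript grammar) has no such escapes, so the sketch substitutes `\b` for both. The function names are hypothetical.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Ungrouped: parsed as  \brldimi  OR  ld  OR  pld\b,
// so a bare "ld" anywhere in the line is enough to match.
bool matches_ungrouped(const std::string& s) {
    static const std::regex re(R"(\brldimi|ld|pld\b)");
    return std::regex_search(s, re);
}

// Grouped: both word boundaries apply to every alternative.
bool matches_grouped(const std::string& s) {
    static const std::regex re(R"(\b(?:rldimi|ld|pld)\b)");
    return std::regex_search(s, re);
}
```

With the ungrouped form, a line containing `sldi` counts as a hit because of the embedded `ld`; the grouped form only accepts the three mnemonics as whole words.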


So no doubt this will improve things, but we need testing of each part
separately.  Also look at code size, or differences in the generated
code in general: this is much more sensitive to detect than performance,
and is not itself sensitive to things like system load, so a) is easier
to measure, and b) has more useful outputs, outputs that tell more of
the whole story.


Segher

Make cp-demangle non-recursive

2022-06-14 Thread Mohamed Atef via Gcc

Hi,
Are there any further details about this project?
Thanks
   Mohamed

libgo patch committed: Format the syscall package

2022-06-14 Thread Ian Lance Taylor via Gcc-patches

The Go formatter is starting to format documentation comments in some
cases.  As a step toward that in libgo, this patch adds blank lines
after //sys comments in the syscall package where needed, and then
runs the new formatter on the syscall package files.  This is the
libgo version of https://go.dev/cl/407136.  Bootstrapped and ran Go
testsuite on x86_64-pc-linux-gnu.  Committed to mainline.

Ian
3f4a86eef4ebc28e394a7108a2353098d2ca4856
diff --git a/gcc/go/gofrontend/MERGE b/gcc/go/gofrontend/MERGE
index 2cf7141c4fa..aeada9f8d0c 100644
--- a/gcc/go/gofrontend/MERGE
+++ b/gcc/go/gofrontend/MERGE
@@ -1,4 +1,4 @@
-0058658a9efb6e5c5faa6f0f65949beea5ddbc98
+bbb3a4347714faee620dc205674510a0f20b81ae
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
diff --git a/libgo/go/syscall/dir_plan9.go b/libgo/go/syscall/dir_plan9.go
index 4ed052de761..1667cbc02f4 100644
--- a/libgo/go/syscall/dir_plan9.go
+++ b/libgo/go/syscall/dir_plan9.go
@@ -184,6 +184,7 @@ func gbit8(b []byte) (uint8, []byte) {
 }
 
 // gbit16 reads a 16-bit number in little-endian order from b and returns it with the remaining slice of b.
+//
 //go:nosplit
 func gbit16(b []byte) (uint16, []byte) {
return uint16(b[0]) | uint16(b[1])<<8, b[2:]
diff --git a/libgo/go/syscall/errstr.go b/libgo/go/syscall/errstr.go
index 6c2441d364d..59f7a82c6d7 100644
--- a/libgo/go/syscall/errstr.go
+++ b/libgo/go/syscall/errstr.go
@@ -4,8 +4,8 @@
 // Use of this source code is governed by a BSD-style
 // license that can be found in the LICENSE file.
 
-// +build !hurd
-// +build !linux
+//go:build !hurd && !linux
+// +build !hurd,!linux
 
 package syscall
 
diff --git a/libgo/go/syscall/errstr_glibc.go b/libgo/go/syscall/errstr_glibc.go
index 5b19e6f202d..03a327dbc90 100644
--- a/libgo/go/syscall/errstr_glibc.go
+++ b/libgo/go/syscall/errstr_glibc.go
@@ -7,6 +7,7 @@
 // We use this rather than errstr.go because on GNU/Linux sterror_r
 // returns a pointer to the error message, and may not use buf at all.
 
+//go:build hurd || linux
 // +build hurd linux
 
 package syscall
diff --git a/libgo/go/syscall/exec_bsd.go b/libgo/go/syscall/exec_bsd.go
index 86e513efdea..e631593cbd9 100644
--- a/libgo/go/syscall/exec_bsd.go
+++ b/libgo/go/syscall/exec_bsd.go
@@ -49,6 +49,7 @@ func runtime_AfterForkInChild()
 // For the same reason compiler does not race instrument it.
 // The calls to RawSyscall are okay because they are assembly
 // functions that do not grow the stack.
+//
 //go:norace
 func forkAndExecInChild(argv0 *byte, argv, envv []*byte, chroot, dir *byte, attr *ProcAttr, sys *SysProcAttr, pipe int) (pid int, err Errno) {
// Declare all variables at top in case any
diff --git a/libgo/go/syscall/exec_freebsd.go b/libgo/go/syscall/exec_freebsd.go
index f02f89d1ca0..8e8ecb7e989 100644
--- a/libgo/go/syscall/exec_freebsd.go
+++ b/libgo/go/syscall/exec_freebsd.go
@@ -57,6 +57,7 @@ func runtime_AfterForkInChild()
 // For the same reason compiler does not race instrument it.
 // The calls to RawSyscall are okay because they are assembly
 // functions that do not grow the stack.
+//
 //go:norace
 func forkAndExecInChild(argv0 *byte, argv, envv []*byte, chroot, dir *byte, attr *ProcAttr, sys *SysProcAttr, pipe int) (pid int, err Errno) {
// Declare all variables at top in case any
diff --git a/libgo/go/syscall/exec_hurd.go b/libgo/go/syscall/exec_hurd.go
index 06df513c55c..a62b3e920e6 100644
--- a/libgo/go/syscall/exec_hurd.go
+++ b/libgo/go/syscall/exec_hurd.go
@@ -49,6 +49,7 @@ func runtime_AfterForkInChild()
 // For the same reason compiler does not race instrument it.
 // The calls to RawSyscall are okay because they are assembly
 // functions that do not grow the stack.
+//
 //go:norace
 func forkAndExecInChild(argv0 *byte, argv, envv []*byte, chroot, dir *byte, attr *ProcAttr, sys *SysProcAttr, pipe int) (pid int, err Errno) {
// Declare all variables at top in case any
diff --git a/libgo/go/syscall/exec_linux.go b/libgo/go/syscall/exec_linux.go
index 86fb8e84a66..77846af89e4 100644
--- a/libgo/go/syscall/exec_linux.go
+++ b/libgo/go/syscall/exec_linux.go
@@ -80,6 +80,7 @@ func runtime_AfterFork()
 func runtime_AfterForkInChild()
 
 // Implemented in clone_linux.c
+//
 //go:noescape
 func rawClone(flags _C_ulong, child_stack *byte, ptid *Pid_t, ctid *Pid_t, regs unsafe.Pointer) _C_long
 
@@ -92,6 +93,7 @@ func rawClone(flags _C_ulong, child_stack *byte, ptid *Pid_t, ctid *Pid_t, regs
 // For the same reason compiler does not race instrument it.
 // The calls to RawSyscall are okay because they are assembly
 // functions that do not grow the stack.
+//
 //go:norace
 func forkAndExecInChild(argv0 *byte, argv, envv []*byte, chroot, dir *byte, attr *ProcAttr, sys *SysProcAttr, pipe int) (pid int, err Errno) {
// Set up and fork. This returns immediately in the parent or
diff --git a/libgo/go/syscall/exec_stubs.go b/libgo/go/syscall/exec_stubs.go
index

Re: [PATCH][_Hashtable] Fix insertion of range of type convertible to value_type PR 105714

2022-06-14 Thread Jonathan Wakely via Gcc-patches

On Wed, 25 May 2022 at 06:10, François Dumont  wrote:
>
> Here is the patch to fix just what is described in PR 105714.
>
>  libstdc++: [_Hashtable] Insert range of types convertible to
> value_type PR 105714
>
>  Fix insertion of range of types convertible to value_type.
>
>  libstdc++-v3/ChangeLog:
>
>  PR libstdc++/105714
>  * include/bits/hashtable_policy.h (_ValueTypeEnforcer): New.
>  * include/bits/hashtable.h
> (_Hashtable<>::_M_insert_unique_aux): New.
>  (_Hashtable<>::_M_insert(_Arg&&, const _NodeGenerator&,
> true_type)): Use latters.
>  (_Hashtable<>::_M_insert(_Arg&&, const _NodeGenerator&,
> false_type)): Likewise.
>  (_Hashtable(_InputIterator, _InputIterator, size_type,
> const _Hash&, const _Equal&,
>  const allocator_type&, true_type)): Use this.insert range.
>  (_Hashtable(_InputIterator, _InputIterator, size_type,
> const _Hash&, const _Equal&,
>  const allocator_type&, false_type)): Use _M_insert.
>  * testsuite/23_containers/unordered_map/cons/56112.cc:
> Check how many times conversion
>  is done.
>  * testsuite/23_containers/unordered_map/insert/105714.cc:
> New test.
>  * testsuite/23_containers/unordered_set/insert/105714.cc:
> New test.
>
> Tested under Linux x64, ok to commit ?

I think "_ConvertToValueType" would be a better name than
_ValueTypeEnforcer, and all overloads of _ValueTypeEnforcer can be
const.

OK with that change, thanks.

[Bug tree-optimization/105983] Failure to optimize (b != 0) && (a >= b) as well as the same pattern with binary and

2022-06-14 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105983

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
             Blocks|                            |19987
   Last reconfirmed|                            |2022-06-14
     Ever confirmed|0                           |1

--- Comment #2 from Andrew Pinski  ---
Confirmed, the issue is GCC does not even handle:
bool f(unsigned a, unsigned b)
{
bool t = (b != 0);
bool t1 = (a >= b);
return t & t1;
}

I suspect this is a fold-const.cc transformation that has not been moved over
to match.pd yet.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=19987
[Bug 19987] [meta-bug] fold missing optimizations in general

[Bug tree-optimization/105983] Failure to optimize (b != 0) && (a >= b) as well as the same pattern with binary and

2022-06-14 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105983

--- Comment #1 from Andrew Pinski  ---
aarch64 GCC is able to compile it to:
f(unsigned int, unsigned int):
        cmp     w1, 0
        ccmp    w1, w0, 2, ne
        cset    w0, ls
        ret

While aarch64 LLVM does:
        sub     w8, w1, #1
        cmp     w8, w0
        cset    w0, lo
        ret

depending on the pipeline, they might be the same or the ccmp might be
slightly better.

[Bug tree-optimization/105983] Failure to optimize (b != 0) && (a >= b) as well as the same pattern with binary and

2022-06-14 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105983

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement

[Bug tree-optimization/105983] New: Failure to optimize (b != 0) && (a >= b) as well as the same pattern with binary and

2022-06-14 Thread gabravier at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105983

Bug ID: 105983
   Summary: Failure to optimize (b != 0) && (a >= b) as well as
the same pattern with binary and
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gabravier at gmail dot com
  Target Milestone: ---

bool f(unsigned a, unsigned b)
{
return (b != 0) && (a >= b);
}

This can be optimized to `return (b != 0) & (a >= b);`, which can in turn be
optimized to `return (b - 1) < a;`. For the `&&` form, GCC outputs code
equivalent to `return (b != 0) & (a >= b);` (at least on x86), whereas it would
output `return (b - 1) < a;` if that `&` form were compiled directly, while
LLVM has no trouble directly outputting the optimal code.
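
The equivalence behind the requested fold can be spot-checked with a short sketch. It relies on unsigned wraparound: when b is 0, `b - 1` becomes UINT_MAX, which is never below a, so the single comparison `(b - 1) < a` covers the `b != 0` test as well (this matches the `lo` condition in the LLVM output quoted in comment #1). The helper names are hypothetical.

```cpp
#include <cassert>
#include <climits>

// The form from the report.
bool f_logical(unsigned a, unsigned b) { return (b != 0) && (a >= b); }

// The folded single-comparison form.
bool f_folded(unsigned a, unsigned b) { return (b - 1) < a; }

// Compare both forms over a few ordinary and boundary values.
bool forms_agree() {
    const unsigned samples[] = {0u, 1u, 2u, 3u, 1000u, UINT_MAX - 1, UINT_MAX};
    for (unsigned a : samples)
        for (unsigned b : samples)
            if (f_logical(a, b) != f_folded(a, b))
                return false;
    return true;
}
```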

[PATCH] libcpp: Handle extended characters in user-defined literal suffix [PR103902]

2022-06-14 Thread Lewis Hyatt via Gcc-patches

Hello-

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103902

The attached patch resolves PR preprocessor/103902 as described in the patch
message inline below. bootstrap + regtest all languages was successful on
x86-64 Linux, with no new failures:

FAIL 103 103
PASS 542338 542371
UNSUPPORTED 15247 15250
UNTESTED 136 136
XFAIL 4166 4166
XPASS 17 17

Please let me know if it looks OK?

A few questions I have:

- A difference introduced with this patch is that after lexing something
like `operator ""_abc', then `_abc' is added to the identifier hash map,
whereas previously it was not. I feel like this must be OK because with the
optional space as in `operator "" _abc', it would be added with or without the
patch.

- The behavior of `#pragma GCC poison' is not consistent (including prior to
  my patch). I tried to make it more so but there is still one thing I want to
  ask about. Leaving aside extended characters for now, the inconsistency is
  that currently the poison is only checked when the suffix appears as a
  standalone token.

  #pragma GCC poison _X
  bool operator ""_X (unsigned long long);   //accepted before the patch,
 //rejected after it
  bool operator "" _X (unsigned long long);  //rejected either before or after
  const char * operator ""_X (const char *, unsigned long); //accepted before,
//rejected after
  const char * operator "" _X (const char *, unsigned long); //rejected either

  const char * s = ""_X; //accepted before the patch, rejected after it
  const bool b = 1_X; //accepted before or after 

I feel like after the patch, the behavior is the expected behavior for all
cases but the last one. Here, we allow the poisoned identifier because it's
not lexed as an identifier, it's lexed as part of a pp-number. Does it seem OK
like this or does it need to be addressed?

Thanks for taking a look!

-Lewis
Subject: [PATCH] libcpp: Handle extended characters in user-defined literal 
suffix [PR103902]

The PR complains that we do not handle UTF-8 in the suffix for a user-defined
literal, such as:

bool operator ""_π (unsigned long long);

In fact we don't handle any extended identifier characters there, whether
UTF-8, UCNs, or the $ sign. We do handle it fine if the optional space after
the "" tokens is included, since then the identifier is lexed in the "normal"
way as its own token. But when it is lexed as part of the string token, this
is handled in lex_string() with a one-off loop that is not aware of extended
characters.

This patch fixes it by adding a new function scan_cur_identifier() that can be
used to lex an identifier while in the middle of lexing another token. It is
somewhat duplicative of the code in lex_identifier(), which handles the normal
case, but I think there's no good way to avoid that without pessimizing the
usual case, since lex_identifier() takes advantage of the fact that the first
character of the identifier has already been analyzed. The code duplication is
somewhat offset by factoring out the identifier lexing diagnostics (e.g. for
poisoned identifiers), which were formerly duplicated in two places, and have
been factored into their own function that's used in (now) 3 places.

BTW, the other place that was lexing identifiers is lex_identifier_intern(),
which is used to implement #pragma push_macro and #pragma pop_macro. This does
not support extended characters either. I will add that in a subsequent patch,
because it can't directly reuse the new function, but rather needs to lex from
a string instead of a cpp_buffer.

With scan_cur_identifier(), we do also correctly warn about bidi and
normalization issues in the extended identifiers comprising the suffix, and we
check for poisoned identifiers there as well.
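
To illustrate the difference described above, here is a minimal sketch of an
identifier scan that starts in the middle of a buffer. It is not the libcpp
code: the names is_ascii_id_char, is_extended_id_char and scan_suffix are
hypothetical, and the "extended" rule is deliberately simplified to accept '$'
and any byte >= 0x80 without validating UTF-8, UCNs, normalization or bidi
characters.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical helpers for illustration only; not taken from libcpp.

// ASCII-only rule, comparable to a one-off loop that is unaware of
// extended characters.
inline bool is_ascii_id_char(unsigned char c)
{
  return c == '_' || (c >= '0' && c <= '9')
         || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
}

// Simplified "extended" rule (an assumption of this sketch): also accept
// '$' and every byte of a multi-byte UTF-8 sequence. A real lexer would
// decode and validate the code points instead.
inline bool is_extended_id_char(unsigned char c)
{
  return is_ascii_id_char(c) || c == '$' || c >= 0x80;
}

// Return the length in bytes of the identifier-like run starting at pos.
// The first character is not treated specially here.
template <typename Pred>
std::size_t scan_suffix(std::string_view buf, std::size_t pos, Pred ok)
{
  std::size_t end = pos;
  while (end < buf.size() && ok(static_cast<unsigned char>(buf[end])))
    ++end;
  return end - pos;
}
```

For the input `""_π (` with the scan starting just after the two quote
characters, the ASCII-only rule stops after `_`, while the simplified extended
rule covers `_π` (one byte for `_` plus two UTF-8 bytes for π).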

PR preprocessor/103902

libcpp/ChangeLog:

* lex.cc (identifier_diagnostics_on_lex): New function refactors
common code from...
(lex_identifier_intern): ...here, and...
(lex_identifier): ...here.
(struct scan_id_result): New struct to hold the result of...
(scan_cur_identifier): ...new function.
(create_literal2): New function.
(is_macro): Removed function that is now handled directly in
lex_string() and lex_raw_string().
(is_macro_not_literal_suffix): Likewise.
(lit_accum::create_literal2): New function.
(lex_raw_string): Make use of new function scan_cur_identifier().
(lex_string): Likewise.

gcc/testsuite/ChangeLog:

* g++.dg/cpp0x/udlit-extended-id-1.C: New test.
* g++.dg/cpp0x/udlit-extended-id-2.C: New test.
* g++.dg/cpp0x/udlit-extended-id-3.C: New test.

diff --git a/gcc/testsuite/g++.dg/cpp0x/udlit-extended-id-1.C 
b/gcc/testsuite/g++.dg/cpp0x/udlit-extended-id-1.C
new file mode 100644
index 000..411d4fdd0ba
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/udlit-extended-id-1.C
@@ -0,0 +1,68 @@
+// { dg-do run

[Bug c++/105982] [13 Regression] internal compiler error: in lookup_template_class, at cp/pt.cc:10361

2022-06-14 Thread redi at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105982

Jonathan Wakely  changed:

   What|Removed |Added

   Last reconfirmed||2022-06-14
 Status|UNCONFIRMED |NEW
 CC||ppalka at gcc dot gnu.org
  Known to fail||13.0
   Target Milestone|--- |13.0
  Known to work||12.1.0
 Ever confirmed|0   |1

--- Comment #2 from Jonathan Wakely  ---
Regression started with r13-1045

c++: optimize specialization of nested templated classes

[Bug c++/105982] [13 Regression] internal compiler error: in lookup_template_class, at cp/pt.cc:10361

2022-06-14 Thread redi at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105982

--- Comment #1 from Jonathan Wakely  ---
Created attachment 53138
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53138&action=edit
-freport-bug output

[Bug middle-end/101836] __builtin_object_size(P->M, 1) where M is an array and the last member of a struct fails

2022-06-14 Thread jakub at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101836

--- Comment #30 from Jakub Jelinek  ---
grep shows:
common.opt:Common Alias(Wattribute_alias=, 1, 0) Warning
common.opt:Common Alias(Wimplicit-fallthrough=,3,0) Warning
c-family/c.opt:C ObjC C++ ObjC++ Warning Alias(Warray-parameter=, 2, 0)
c-family/c.opt:C++ ObjC++ Warning Alias(Wcatch-value=, 1, 0)
c-family/c.opt:C ObjC C++ LTO ObjC++ Alias(Wdangling-pointer=, 2, 0) Warning
c-family/c.opt:C ObjC C++ ObjC++ Warning Alias(Wformat=, 1, 0)
c-family/c.opt:C ObjC C++ LTO ObjC++ Warning Alias(Wformat-overflow=, 1, 0) IntegerRange(0, 2)
c-family/c.opt:C ObjC C++ LTO ObjC++ Warning Alias(Wformat-truncation=, 1, 0)
c-family/c.opt:C ObjC C++ LTO ObjC++ Warning Alias(Wstringop-overflow=, 2, 0)
c-family/c.opt:C++ Warning Alias(Wplacement-new=, 1, 0)
c-family/c.opt:C ObjC C++ ObjC++ Warning Alias(Wshift-overflow=, 1, 0)
c-family/c.opt:C ObjC C++ ObjC++ Warning Alias(Wunused-const-variable=, 2, 0)
c-family/c.opt:C++ ObjC++ Alias(faligned-new=,1,0)
fortran/lang.opt:Fortran Alias(ftail-call-workaround=,1,0)

[Bug c++/105982] New: [13 Regression] internal compiler error: in lookup_template_class, at cp/pt.cc:10361

2022-06-14 Thread redi at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105982

Bug ID: 105982
   Summary: [13 Regression] internal compiler error: in
lookup_template_class, at cp/pt.cc:10361
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Keywords: ice-on-valid-code
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: redi at gcc dot gnu.org
  Target Milestone: ---

I think this is a new-ish regression. It happens when running the libstdc++
testsuite with -std=c++20 i.e. in the $target/libstdc++-v3 build dir:

make check RUNTESTFLAGS="conformance.exp=21_strings/*/deduction.cc
--target_board=unix/-std=gnu++20"

In file included from
/home/jwakely/src/gcc/build/x86_64-pc-linux-gnu/libstdc++-v3/include/string:52,
 from
/home/jwakely/src/gcc/gcc/libstdc++-v3/testsuite/21_strings/basic_string/cons/char/deduction.cc:20:
/home/jwakely/src/gcc/build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/basic_string.h:
In substitution of 'template
basic_string(typename std::__cxx11::basic_string<_CharT, _Traits,
_Alloc>::_Alloc_traits_impl<_Traits, void>::size_type, _CharT, const _Alloc&)->
std::__cxx11::basic_string<_CharT, _Traits, _Alloc> [with _CharT =
std::allocator; _Traits = std::char_traits >; _Alloc
= std::allocator >;  =
std::allocator >]':
/home/jwakely/src/gcc/gcc/libstdc++-v3/testsuite/21_strings/basic_string/cons/char/deduction.cc:53:
  required from here
/home/jwakely/src/gcc/build/x86_64-pc-linux-gnu/libstdc++-v3/include/bits/basic_string.h:656:
internal compiler error: in lookup_template_class, at cp/pt.cc:10361
0x71a527 lookup_template_class(tree_node*, tree_node*, tree_node*, tree_node*,
int, int)
/home/jwakely/src/gcc/gcc/gcc/cp/pt.cc:10361
0xb638d1 tsubst_aggr_type
/home/jwakely/src/gcc/gcc/gcc/cp/pt.cc:13802
0xb638d1 tsubst_aggr_type
/home/jwakely/src/gcc/gcc/gcc/cp/pt.cc:13753
0xb57c74 tsubst(tree_node*, tree_node*, int, tree_node*)
/home/jwakely/src/gcc/gcc/gcc/cp/pt.cc:16282
0xb643d4 tsubst_arg_types
/home/jwakely/src/gcc/gcc/gcc/cp/pt.cc:15251
0xb64759 tsubst_arg_types
/home/jwakely/src/gcc/gcc/gcc/cp/pt.cc:15228
0xb64759 tsubst_function_type
/home/jwakely/src/gcc/gcc/gcc/cp/pt.cc:15406
0xb5766e tsubst(tree_node*, tree_node*, int, tree_node*)
/home/jwakely/src/gcc/gcc/gcc/cp/pt.cc:16200
0xb3ebe6 tsubst_function_decl
/home/jwakely/src/gcc/gcc/gcc/cp/pt.cc:14179
0xb4198a tsubst_decl
/home/jwakely/src/gcc/gcc/gcc/cp/pt.cc:14656
0xb56bf6 instantiate_template(tree_node*, tree_node*, int)
/home/jwakely/src/gcc/gcc/gcc/cp/pt.cc:21776
0xb6c88a fn_type_unification(tree_node*, tree_node*, tree_node*, tree_node*
const*, unsigned int, tree_node*, unification_kind_t, int, conversion**, bool,
bool)
/home/jwakely/src/gcc/gcc/gcc/cp/pt.cc:22289
0x960b81 add_template_candidate_real
/home/jwakely/src/gcc/gcc/gcc/cp/call.cc:3555
0x961bc3 add_template_candidate
/home/jwakely/src/gcc/gcc/gcc/cp/call.cc:3643
0x961bc3 add_candidates
/home/jwakely/src/gcc/gcc/gcc/cp/call.cc:6191
0x967d37 add_candidates
/home/jwakely/src/gcc/gcc/gcc/cp/call.cc:4717
0x967d37 perform_overload_resolution
/home/jwakely/src/gcc/gcc/gcc/cp/call.cc:4725
0x968179 perform_dguide_overload_resolution(tree_node*, vec const*, int)
/home/jwakely/src/gcc/gcc/gcc/cp/call.cc:4789
0xb32b93 do_class_deduction
/home/jwakely/src/gcc/gcc/gcc/cp/pt.cc:30214
0xb32b93 do_auto_deduction(tree_node*, tree_node*, tree_node*, int,
auto_deduction_context, tree_node*, int)
/home/jwakely/src/gcc/gcc/gcc/cp/pt.cc:30302
Please submit a full bug report, with preprocessed source (by using
-freport-bug).
Please include the complete backtrace with any bug report.
See  for instructions.

[Bug target/90777] [10/11/12/13 Regression] pr84828 testcase ICEs for m32 x86_64,i686-darwin*

2022-06-14 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=90777

Andrew Pinski  changed:

   What|Removed |Added

 CC||gs...@t-online.de

--- Comment #6 from Andrew Pinski  ---
*** Bug 105979 has been marked as a duplicate of this bug. ***

[Bug target/105979] ICE in change_stack, at reg-stack.cc:2660

2022-06-14 Thread pinskia at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105979

Andrew Pinski  changed:

   What|Removed |Added

 Resolution|--- |DUPLICATE
 Status|UNCONFIRMED |RESOLVED

--- Comment #1 from Andrew Pinski  ---
Dup of bug 90777.

*** This bug has been marked as a duplicate of bug 90777 ***

[Bug middle-end/101836] __builtin_object_size(P->M, 1) where M is an array and the last member of a struct fails

2022-06-14 Thread qinzhao at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101836

--- Comment #29 from qinzhao at gcc dot gnu.org ---
(In reply to Jakub Jelinek from comment #28)
> (In reply to Qing Zhao from comment #27)
> > > Wouldn't this be -fno-strict-flex-arrays, i.e. the current behaviour?
> > 
> > Yes, it’s the same.  =0 is aliased with -fno-strict-flex-arrays.
> 
> That is indeed what we do for many options, -fno-whatever is alias to
> -fwhatever=0 (or -fwhatever=something for options which take enums and not
> numbers).

Thank you for the info.
Could you point me to an example of such an option? Then I can check how to
implement this alias relationship between =0 and -fno-strict-flex-arrays.

[Bug libstdc++/59048] operator== between std::string and const char* slower than strcmp

2022-06-14 Thread redi at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59048

Jonathan Wakely  changed:

   What|Removed |Added

   Target Milestone|--- |13.0
 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #20 from Jonathan Wakely  ---
Fixed for GCC 13.

[Bug libstdc++/62187] std::string==const char* could compare sizes first

2022-06-14 Thread redi at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62187

Jonathan Wakely  changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|ASSIGNED|RESOLVED
   Target Milestone|--- |13.0

--- Comment #9 from Jonathan Wakely  ---
Fixed for GCC 13.

[Bug libstdc++/105957] __n * sizeof(_Tp) might overflow under consteval context for std::allocator

2022-06-14 Thread redi at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105957

Jonathan Wakely  changed:

   What|Removed |Added

   Target Milestone|--- |12.2

--- Comment #3 from Jonathan Wakely  ---
Fixed on trunk. gcc-12 backport to follow.

[committed] libstdc++: Check lengths first in operator== for basic_string [PR62187]

2022-06-14 Thread Jonathan Wakely via Gcc-patches

Tested powerpc64le-linux, pushed to trunk.

-- >8 --

As confirmed by LWG 2852, the calls to traits_type::compare do not need
to be observable, so we can make operator== compare string lengths
first and return immediately for non-equal lengths. This avoids doing a
slow string comparison for "abc...xyz" == "abc...xy". Previously we only
did this optimization for std::char_traits, but we can enable it
unconditionally thanks to LWG 2852.

For comparisons with a const char* we can call traits_type::length right
away to do the same optimization. That strlen call can be folded away
for constant arguments, making it very efficient.

For the pre-C++20 operator== and operator!= overloads we can swap the
order of the arguments to take advantage of the operator== improvements.
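
As a rough standalone illustration of the length-first idea (not the library
code itself), the comparison with a C string can be sketched as a free
function. The name equal_length_first is hypothetical; the body follows the
shape of the patched operator== shown in the diff below.

```cpp
#include <cstddef>
#include <string>

// Hypothetical free function for illustration; it mirrors the shape of the
// patched operator== but is not part of libstdc++.
template <typename CharT, typename Traits, typename Alloc>
bool equal_length_first(const std::basic_string<CharT, Traits, Alloc>& lhs,
                        const CharT* rhs)
{
  // Compare the lengths first; only when they match is the character
  // data compared.
  return lhs.size() == Traits::length(rhs)
         && !Traits::compare(lhs.data(), rhs, lhs.size());
}
```

With a string literal as the second argument, the Traits::length call is the
part that the commit message expects the compiler to fold to a constant.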

libstdc++-v3/ChangeLog:

PR libstdc++/62187
* include/bits/basic_string.h (operator==): Always compare
lengths before checking string contents.
[!__cpp_lib_three_way_comparison] (operator==, operator!=):
Reorder arguments.
---
 libstdc++-v3/include/bits/basic_string.h | 24 ++--
 1 file changed, 10 insertions(+), 14 deletions(-)

diff --git a/libstdc++-v3/include/bits/basic_string.h 
b/libstdc++-v3/include/bits/basic_string.h
index a34b3d9ed28..57247e306dc 100644
--- a/libstdc++-v3/include/bits/basic_string.h
+++ b/libstdc++-v3/include/bits/basic_string.h
@@ -3627,17 +3627,10 @@ _GLIBCXX_END_NAMESPACE_CXX11
 operator==(const basic_string<_CharT, _Traits, _Alloc>& __lhs,
   const basic_string<_CharT, _Traits, _Alloc>& __rhs)
 _GLIBCXX_NOEXCEPT
-{ return __lhs.compare(__rhs) == 0; }
-
-  template
-_GLIBCXX20_CONSTEXPR
-inline
-typename __gnu_cxx::__enable_if<__is_char<_CharT>::__value, bool>::__type
-operator==(const basic_string<_CharT>& __lhs,
-  const basic_string<_CharT>& __rhs) _GLIBCXX_NOEXCEPT
-{ return (__lhs.size() == __rhs.size()
- && !std::char_traits<_CharT>::compare(__lhs.data(), __rhs.data(),
-   __lhs.size())); }
+{
+  return __lhs.size() == __rhs.size()
+  && !_Traits::compare(__lhs.data(), __rhs.data(), __lhs.size());
+}
 
   /**
*  @brief  Test equivalence of string and C string.
@@ -3650,7 +3643,10 @@ _GLIBCXX_END_NAMESPACE_CXX11
 inline bool
 operator==(const basic_string<_CharT, _Traits, _Alloc>& __lhs,
   const _CharT* __rhs)
-{ return __lhs.compare(__rhs) == 0; }
+{
+  return __lhs.size() == _Traits::length(__rhs)
+  && !_Traits::compare(__lhs.data(), __rhs, __lhs.size());
+}
 
 #if __cpp_lib_three_way_comparison
   /**
@@ -3691,7 +3687,7 @@ _GLIBCXX_END_NAMESPACE_CXX11
 inline bool
 operator==(const _CharT* __lhs,
   const basic_string<_CharT, _Traits, _Alloc>& __rhs)
-{ return __rhs.compare(__lhs) == 0; }
+{ return __rhs == __lhs; }
 
   // operator !=
   /**
@@ -3717,7 +3713,7 @@ _GLIBCXX_END_NAMESPACE_CXX11
 inline bool
 operator!=(const _CharT* __lhs,
   const basic_string<_CharT, _Traits, _Alloc>& __rhs)
-{ return !(__lhs == __rhs); }
+{ return !(__rhs == __lhs); }
 
   /**
*  @brief  Test difference of string and C string.
-- 
2.34.3

[committed] libstdc++: Inline all basic_string::compare overloads [PR59048]

2022-06-14 Thread Jonathan Wakely via Gcc-patches

Tested powerpc64le-linux, pushed to trunk.

-- >8 --

Defining the compare member functions inline allows calls to
traits_type::length and std::min to be inlined, taking advantage of
constant expression arguments. When not inline, the compiler prefers to
use the explicit instantiation definitions in libstdc++.so and can't
take advantage of constant arguments.
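
The logic being inlined can be sketched outside the class as follows. This is
a simplified stand-in, assuming std::string and std::char_traits<char>; the
names length_order and compare_cstr are hypothetical, and length_order only
returns the sign of the length difference rather than reproducing the
library's internal _S_compare helper.

```cpp
#include <algorithm>
#include <cstddef>
#include <string>

// Hypothetical stand-in for the length tie-break; returns only -1, 0 or 1.
inline int length_order(std::size_t n1, std::size_t n2)
{
  return n1 < n2 ? -1 : (n1 > n2 ? 1 : 0);
}

// Sketch of compare(const char*): compare the common prefix, then fall
// back to the lengths when the prefixes are equal.
inline int compare_cstr(const std::string& lhs, const char* s)
{
  using traits = std::char_traits<char>;
  const std::size_t size = lhs.size();
  const std::size_t osize = traits::length(s);
  const std::size_t len = std::min(size, osize);
  int r = traits::compare(lhs.data(), s, len);
  if (!r)
    r = length_order(size, osize);
  return r;
}
```

Because the whole body is visible at the call site, a call such as
compare_cstr(str, "abc") gives the optimizer a chance to evaluate
traits::length and std::min on the constant argument.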

libstdc++-v3/ChangeLog:

PR libstdc++/59048
* include/bits/basic_string.h (compare): Define inline.
* include/bits/basic_string.tcc (compare): Remove out-of-line
definitions.
* include/bits/cow_string.h (compare): Define inline.
* testsuite/21_strings/basic_string/operations/compare/char/3.cc:
New test.
---
 libstdc++-v3/include/bits/basic_string.h  | 63 --
 libstdc++-v3/include/bits/basic_string.tcc| 85 ---
 libstdc++-v3/include/bits/cow_string.h| 63 --
 .../basic_string/operations/compare/char/3.cc |  7 ++
 4 files changed, 123 insertions(+), 95 deletions(-)
 create mode 100644 
libstdc++-v3/testsuite/21_strings/basic_string/operations/compare/char/3.cc

diff --git a/libstdc++-v3/include/bits/basic_string.h 
b/libstdc++-v3/include/bits/basic_string.h
index f76ddf970c6..a34b3d9ed28 100644
--- a/libstdc++-v3/include/bits/basic_string.h
+++ b/libstdc++-v3/include/bits/basic_string.h
@@ -3235,7 +3235,17 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
   */
   _GLIBCXX20_CONSTEXPR
   int
-  compare(size_type __pos, size_type __n, const basic_string& __str) const;
+  compare(size_type __pos, size_type __n, const basic_string& __str) const
+  {
+   _M_check(__pos, "basic_string::compare");
+   __n = _M_limit(__pos, __n);
+   const size_type __osize = __str.size();
+   const size_type __len = std::min(__n, __osize);
+   int __r = traits_type::compare(_M_data() + __pos, __str.data(), __len);
+   if (!__r)
+ __r = _S_compare(__n, __osize);
+   return __r;
+  }
 
   /**
*  @brief  Compare substring to a substring.
@@ -3263,7 +3273,19 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
   _GLIBCXX20_CONSTEXPR
   int
   compare(size_type __pos1, size_type __n1, const basic_string& __str,
- size_type __pos2, size_type __n2 = npos) const;
+ size_type __pos2, size_type __n2 = npos) const
+  {
+   _M_check(__pos1, "basic_string::compare");
+   __str._M_check(__pos2, "basic_string::compare");
+   __n1 = _M_limit(__pos1, __n1);
+   __n2 = __str._M_limit(__pos2, __n2);
+   const size_type __len = std::min(__n1, __n2);
+   int __r = traits_type::compare(_M_data() + __pos1,
+  __str.data() + __pos2, __len);
+   if (!__r)
+ __r = _S_compare(__n1, __n2);
+   return __r;
+  }
 
   /**
*  @brief  Compare to a C string.
@@ -3281,7 +3303,17 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
   */
   _GLIBCXX20_CONSTEXPR
   int
-  compare(const _CharT* __s) const _GLIBCXX_NOEXCEPT;
+  compare(const _CharT* __s) const _GLIBCXX_NOEXCEPT
+  {
+   __glibcxx_requires_string(__s);
+   const size_type __size = this->size();
+   const size_type __osize = traits_type::length(__s);
+   const size_type __len = std::min(__size, __osize);
+   int __r = traits_type::compare(_M_data(), __s, __len);
+   if (!__r)
+ __r = _S_compare(__size, __osize);
+   return __r;
+  }
 
   // _GLIBCXX_RESOLVE_LIB_DEFECTS
   // 5 String::compare specification questionable
@@ -3306,7 +3338,18 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
   */
   _GLIBCXX20_CONSTEXPR
   int
-  compare(size_type __pos, size_type __n1, const _CharT* __s) const;
+  compare(size_type __pos, size_type __n1, const _CharT* __s) const
+  {
+   __glibcxx_requires_string(__s);
+   _M_check(__pos, "basic_string::compare");
+   __n1 = _M_limit(__pos, __n1);
+   const size_type __osize = traits_type::length(__s);
+   const size_type __len = std::min(__n1, __osize);
+   int __r = traits_type::compare(_M_data() + __pos, __s, __len);
+   if (!__r)
+ __r = _S_compare(__n1, __osize);
+   return __r;
+  }
 
   /**
*  @brief  Compare substring against a character %array.
@@ -3335,7 +3378,17 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
   _GLIBCXX20_CONSTEXPR
   int
   compare(size_type __pos, size_type __n1, const _CharT* __s,
- size_type __n2) const;
+ size_type __n2) const
+  {
+   __glibcxx_requires_string_len(__s, __n2);
+   _M_check(__pos, "basic_string::compare");
+   __n1 = _M_limit(__pos, __n1);
+   const size_type __len = std::min(__n1, __n2);
+   int __r = traits_type::compare(_M_data() + __pos, __s, __len);
+   if (!__r)
+ __r = _S_compare(__n1, __n2);
+   return __r;
+  }
 
 #if __cplusplus >= 202002L
   constexpr bool
diff --git

[committed] libstdc++: Fix indentation in allocator base classes

2022-06-14 Thread Jonathan Wakely via Gcc-patches

Tested powerpc64le-linux, pushed to trunk.

-- >8 --

libstdc++-v3/ChangeLog:

* include/bits/new_allocator.h: Fix indentation.
* include/ext/malloc_allocator.h: Likewise.
---
 libstdc++-v3/include/bits/new_allocator.h   | 6 +++---
 libstdc++-v3/include/ext/malloc_allocator.h | 6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/libstdc++-v3/include/bits/new_allocator.h 
b/libstdc++-v3/include/bits/new_allocator.h
index 1a5bc51b956..92ae9847f1c 100644
--- a/libstdc++-v3/include/bits/new_allocator.h
+++ b/libstdc++-v3/include/bits/new_allocator.h
@@ -119,9 +119,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   allocate(size_type __n, const void* = static_cast(0))
   {
 #if __cplusplus >= 201103L
-// _GLIBCXX_RESOLVE_LIB_DEFECTS
-// 3308. std::allocator().allocate(n)
-static_assert(sizeof(_Tp) != 0, "cannot allocate incomplete types");
+   // _GLIBCXX_RESOLVE_LIB_DEFECTS
+   // 3308. std::allocator().allocate(n)
+   static_assert(sizeof(_Tp) != 0, "cannot allocate incomplete types");
 #endif
 
if (__builtin_expect(__n > this->_M_max_size(), false))
diff --git a/libstdc++-v3/include/ext/malloc_allocator.h 
b/libstdc++-v3/include/ext/malloc_allocator.h
index b61e9a85bb2..82b3f0a1c6f 100644
--- a/libstdc++-v3/include/ext/malloc_allocator.h
+++ b/libstdc++-v3/include/ext/malloc_allocator.h
@@ -103,9 +103,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   allocate(size_type __n, const void* = 0)
   {
 #if __cplusplus >= 201103L
-// _GLIBCXX_RESOLVE_LIB_DEFECTS
-// 3308. std::allocator().allocate(n)
-static_assert(sizeof(_Tp) != 0, "cannot allocate incomplete types");
+   // _GLIBCXX_RESOLVE_LIB_DEFECTS
+   // 3308. std::allocator().allocate(n)
+   static_assert(sizeof(_Tp) != 0, "cannot allocate incomplete types");
 #endif
 
if (__builtin_expect(__n > this->_M_max_size(), false))
-- 
2.34.3

[committed] libstdc++: Check for size overflow in constexpr allocation [PR105957]

2022-06-14 Thread Jonathan Wakely via Gcc-patches

Tested powerpc64le-linux, pushed to trunk.

-- >8 --

libstdc++-v3/ChangeLog:

PR libstdc++/105957
* include/bits/allocator.h (allocator::allocate): Check for
overflow in constexpr allocation.
* testsuite/20_util/allocator/105975.cc: New test.
---
 libstdc++-v3/include/bits/allocator.h  |  7 ++-
 .../testsuite/20_util/allocator/105975.cc  | 18 ++
 2 files changed, 24 insertions(+), 1 deletion(-)
 create mode 100644 libstdc++-v3/testsuite/20_util/allocator/105975.cc

diff --git a/libstdc++-v3/include/bits/allocator.h 
b/libstdc++-v3/include/bits/allocator.h
index ee1121b080a..aec0b374fd1 100644
--- a/libstdc++-v3/include/bits/allocator.h
+++ b/libstdc++-v3/include/bits/allocator.h
@@ -184,7 +184,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   allocate(size_t __n)
   {
if (std::__is_constant_evaluated())
- return static_cast<_Tp*>(::operator new(__n * sizeof(_Tp)));
+ {
+   if (__builtin_mul_overflow(__n, sizeof(_Tp), &__n))
+ std::__throw_bad_array_new_length();
+   return static_cast<_Tp*>(::operator new(__n));
+ }
+
return __allocator_base<_Tp>::allocate(__n, 0);
   }
 
diff --git a/libstdc++-v3/testsuite/20_util/allocator/105975.cc 
b/libstdc++-v3/testsuite/20_util/allocator/105975.cc
new file mode 100644
index 000..4342aeade04
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/allocator/105975.cc
@@ -0,0 +1,18 @@
+// { dg-options "-std=gnu++20" }
+// { dg-do compile { target c++20 } }
+
+// PR libstdc++/105957
+
+#include <memory>
+
+consteval bool test_pr105957()
+{
+  std::allocator<long long> a;
+  auto n = std::size_t(-1) / (sizeof(long long) - 1);
+  auto p = a.allocate(n); // { dg-error "constexpr" }
+  a.deallocate(p, n);
+  return true;
+}
+static_assert( test_pr105957() );
+
+// { dg-error "throw_bad_array_new_length" "" { target *-*-* } 0 }
-- 
2.34.3
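
The overflow check in the patch above can be tried in isolation with a small
sketch. The helper names byte_count_overflows and checked_byte_count are
hypothetical; __builtin_mul_overflow is a GCC/Clang builtin, so this assumes
one of those compilers, and it throws std::bad_array_new_length directly
instead of calling the library-internal std::__throw_bad_array_new_length.

```cpp
#include <cstddef>
#include <new>

// Hypothetical helper: true if n * sizeof(T) does not fit in std::size_t.
template <typename T>
bool byte_count_overflows(std::size_t n)
{
  std::size_t bytes;
  return __builtin_mul_overflow(n, sizeof(T), &bytes);
}

// Hypothetical helper: return the byte count, or throw on overflow.
template <typename T>
std::size_t checked_byte_count(std::size_t n)
{
  std::size_t bytes;
  if (__builtin_mul_overflow(n, sizeof(T), &bytes))
    throw std::bad_array_new_length();
  return bytes;
}
```

The element count used in the new test, std::size_t(-1) / (sizeof(long long) - 1),
is large enough that multiplying it by sizeof(long long) wraps around, which is
the case the check is meant to catch.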

[Bug libstdc++/62187] std::string==const char* could compare sizes first

2022-06-14 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62187

--- Comment #8 from CVS Commits  ---
The master branch has been updated by Jonathan Wakely :

https://gcc.gnu.org/g:6abe341558abec40c9c44d76e7fb4fb3978e894b

commit r13-1096-g6abe341558abec40c9c44d76e7fb4fb3978e894b
Author: Jonathan Wakely 
Date:   Tue Jun 14 16:19:32 2022 +0100

libstdc++: Check lengths first in operator== for basic_string [PR62187]

As confirmed by LWG 2852, the calls to traits_type::compare do not need
to be observable, so we can make operator== compare string lengths
first and return immediately for non-equal lengths. This avoids doing a
slow string comparison for "abc...xyz" == "abc...xy". Previously we only
did this optimization for std::char_traits, but we can enable it
unconditionally thanks to LWG 2852.

For comparisons with a const char* we can call traits_type::length right
away to do the same optimization. That strlen call can be folded away
for constant arguments, making it very efficient.

For the pre-C++20 operator== and operator!= overloads we can swap the
order of the arguments to take advantage of the operator== improvements.

libstdc++-v3/ChangeLog:

PR libstdc++/62187
* include/bits/basic_string.h (operator==): Always compare
lengths before checking string contents.
[!__cpp_lib_three_way_comparison] (operator==, operator!=):
Reorder arguments.

[Bug libstdc++/59048] operator== between std::string and const char* slower than strcmp

2022-06-14 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59048

--- Comment #19 from CVS Commits  ---
The master branch has been updated by Jonathan Wakely :

https://gcc.gnu.org/g:1b65779f46f16b4fffd0591f5e58722c1e7cde8d

commit r13-1095-g1b65779f46f16b4fffd0591f5e58722c1e7cde8d
Author: Jonathan Wakely 
Date:   Tue Jun 14 14:54:27 2022 +0100

libstdc++: Inline all basic_string::compare overloads [PR59048]

Defining the compare member functions inline allows calls to
traits_type::length and std::min to be inlined, taking advantage of
constant expression arguments. When not inline, the compiler prefers to
use the explicit instantiation definitions in libstdc++.so and can't
take advantage of constant arguments.

libstdc++-v3/ChangeLog:

PR libstdc++/59048
* include/bits/basic_string.h (compare): Define inline.
* include/bits/basic_string.tcc (compare): Remove out-of-line
definitions.
* include/bits/cow_string.h (compare): Define inline.
* testsuite/21_strings/basic_string/operations/compare/char/3.cc:
New test.

[Bug libstdc++/105957] __n * sizeof(_Tp) might overflow under consteval context for std::allocator

2022-06-14 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105957

--- Comment #2 from CVS Commits  ---
The master branch has been updated by Jonathan Wakely :

https://gcc.gnu.org/g:0a9af7b4ef1b8aa85cc8820acf54d41d1569fc10

commit r13-1093-g0a9af7b4ef1b8aa85cc8820acf54d41d1569fc10
Author: Jonathan Wakely 
Date:   Tue Jun 14 14:37:25 2022 +0100

libstdc++: Check for size overflow in constexpr allocation [PR105957]

libstdc++-v3/ChangeLog:

PR libstdc++/105957
* include/bits/allocator.h (allocator::allocate): Check for
overflow in constexpr allocation.
* testsuite/20_util/allocator/105975.cc: New test.

Re: [PATCH 2/5] xtensa: Add support for sibling call optimization

2022-06-14 Thread Max Filippov via Gcc-patches

linker-plugin -flto-partition=none  execution test
FAIL: gcc.c-torture/execute/pr90949.c   -O2  execution test
FAIL: gcc.c-torture/execute/pr90949.c   -O3 -g  execution test
FAIL: gcc.c-torture/execute/pr90949.c   -O2 -flto
-fno-use-linker-plugin -flto-partition=none  execution test
FAIL: gcc.c-torture/execute/pr90949.c   -O2 -flto -fuse-linker-plugin
-fno-fat-lto-objects  execution test
FAIL: gcc.c-torture/execute/printf-2.c   -O2  execution test
FAIL: gcc.c-torture/execute/printf-2.c   -O3 -g  execution test
FAIL: gcc.c-torture/execute/printf-2.c   -Os  execution test
FAIL: gcc.c-torture/execute/printf-2.c   -O2 -flto
-fno-use-linker-plugin -flto-partition=none  execution test
FAIL: gcc.c-torture/execute/printf-2.c   -O2 -flto -fuse-linker-plugin
-fno-fat-lto-objects  execution test
FAIL: gcc.dg/packed-array.c execution test
FAIL: gcc.dg/pr20115.c execution test
FAIL: gcc.dg/pr44404.c execution test
FAIL: gcc.dg/pr81292-2.c execution test
FAIL: gcc.dg/strlenopt-31.c execution test
FAIL: gcc.dg/strlenopt-81.c execution test
FAIL: gcc.dg/torture/builtin-complex-1.c   -O2  execution test
FAIL: gcc.dg/torture/builtin-complex-1.c   -O3 -fomit-frame-pointer
-funroll-loops -fpeel-loops -ftracer -finline-functions  execution
test
FAIL: gcc.dg/torture/builtin-complex-1.c   -O3 -g  execution test
FAIL: gcc.dg/torture/builtin-complex-1.c   -Os  execution test
FAIL: gcc.dg/torture/builtin-complex-1.c   -O2 -flto
-fno-use-linker-plugin -flto-partition=none  execution test
FAIL: gcc.dg/torture/pr56661.c   -Os  execution test
FAIL: gcc.dg/torture/pr65077.c   -O2  execution test
FAIL: gcc.dg/torture/pr65077.c   -O3 -fomit-frame-pointer
-funroll-loops -fpeel-loops -ftracer -finline-functions  execution
test
FAIL: gcc.dg/torture/pr65077.c   -O3 -g  execution test
FAIL: gcc.dg/torture/pr65077.c   -Os  execution test
FAIL: gcc.dg/torture/pr65077.c   -O2 -flto -fno-use-linker-plugin
-flto-partition=none  execution test
FAIL: gcc.dg/torture/pr67916.c   -O2  execution test
FAIL: gcc.dg/torture/pr67916.c   -O3 -fomit-frame-pointer
-funroll-loops -fpeel-loops -ftracer -finline-functions  execution
test
FAIL: gcc.dg/torture/pr67916.c   -O3 -g  execution test
FAIL: gcc.dg/torture/pr67916.c   -Os  execution test
FAIL: gcc.dg/torture/pr67916.c   -O2 -flto -fno-use-linker-plugin
-flto-partition=none  execution test
FAIL: gcc.dg/tree-ssa/cswtch-3.c execution test
FAIL: gcc.dg/tree-ssa/predcom-dse-5.c execution test
FAIL: gcc.dg/tree-ssa/predcom-dse-6.c execution test
FAIL: gcc.dg/tree-ssa/predcom-dse-7.c execution test

The code generated for e.g. gcc.c-torture/execute/921208-2.c looks like this:

   .file   "921208-2.c"
   .text
   .literal_position
   .align  4
   .global g
   .type   g, @function
g:
   ret.n
   .size   g, .-g
   .literal_position
   .literal .LC1, g@PLT
   .literal .LC3, 1072693248
   .literal .LC4, 1073741824
   .align  4
   .global f
   .type   f, @function
f:
   addi    sp, sp, -16
   s32i.n  a13, sp, 4
   l32r    a13, .LC3
   s32i.n  a12, sp, 8
   s32i.n  a14, sp, 0
   movi.n  a12, 0
   l32r    a14, .LC1
   s32i.n  a0, sp, 12
   mov.n   a3, a13
   mov.n   a4, a12
   mov.n   a5, a13
   mov.n   a2, a12
   callx0  a14
   l32i.n  a0, sp, 12
   l32i.n  a14, sp, 0
   mov.n   a4, a12
   mov.n   a5, a13
   l32i.n  a12, sp, 8
   l32i.n  a13, sp, 4
   l32r    a3, .LC4
   movi.n  a2, 0
   addi    sp, sp, 16
   jx      a14
   .size   f, .-f
   .section        .text.startup,"ax",@progbits
   .literal_position
   .literal .LC5, f@PLT
   .literal .LC6, exit@PLT
   .align  4
   .global main
   .type   main, @function
main:
   addi    sp, sp, -16
   l32r    a2, .LC5
   s32i.n  a0, sp, 12
   callx0  a2
   l32r    a3, .LC6
   movi.n  a2, 0
   callx0  a3
   .size   main, .-main
   .ident  "GCC: (GNU) 13.0.0 20220614 (experimental)"

-- 
Thanks.
-- Max

[Bug target/105975] OpenMP/nvptx offloading: 'internal compiler error: in maybe_legitimize_operand, at optabs.cc:7785'

2022-06-14 Thread rsandifo at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105975

--- Comment #1 from rsandifo at gcc dot gnu.org ---
Could you point to a specific test and give command-line arguments?
I'm not set up to do an nvptx test run.

[Bug fortran/105954] ICE in gfc_element_size, at fortran/target-memory.cc:132

2022-06-14 Thread anlauf at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105954

--- Comment #4 from anlauf at gcc dot gnu.org ---
Anyway, there's likely an ordering issue looking at array bounds and using
them.  Moving the type decl to a module, the problem seems to disappear:

module m
  implicit none
  integer, parameter :: n = -1
  real:: a(3,2:n)
  type t
 real :: b(3,2:n)
  end type
end module m
program p
  use m
  implicit none
  type(t) :: d
  integer, parameter :: k = sizeof(a)
  integer, parameter :: j = sizeof(d)
  integer, parameter :: l = storage_size(d)
end

This compiles and produces the right numbers.

[Bug target/34422] Bootstrap error with --enable-fixed-point (configure should reject --enable-fixed-point on platforms that don't support it)

2022-06-14 Thread egallager at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=34422

Eric Gallager  changed:

   What|Removed |Added

   Keywords||patch
   URL||https://gcc.gnu.org/pipermail/gcc-patches/2022-June/596654.html

--- Comment #11 from Eric Gallager  ---
Patch posted to gcc-patches:
https://gcc.gnu.org/pipermail/gcc-patches/2022-June/596654.html

[PATCH] gcc/configure.ac: fix --enable-fixed-point enablement [PR34422]

2022-06-14 Thread Eric Gallager via Gcc-patches

So, in investigating PR target/34422, I discovered that the gcc
subdirectory's configure script had an instance of AC_ARG_ENABLE with
its 3rd and 4th arguments reversed: the one where it warns that the
--enable-fixed-point flag is being ignored is the one where that flag
hasn't even been passed in the first place. The attached patch puts
the warning in the correct argument to the macro in question. (I'm not
including the regeneration of gcc/configure in the patch this time
since that confused people last time.) OK to commit, with an
appropriate ChangeLog?


patch-gcc_configure.diff
Description: Binary data

[Bug fortran/105954] ICE in gfc_element_size, at fortran/target-memory.cc:132

2022-06-14 Thread anlauf at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105954

--- Comment #3 from anlauf at gcc dot gnu.org ---
(In reply to anlauf from comment #2)
> Reduced testcase:
> 
>   integer, parameter :: m = sizeof(d) ! ICE for n < 1

In target-memory.cc we run into int_size_in_bytes(), which returns -12
for n=0, and -24 for n=-1, and so on...

(gdb) l
127 gfc_typespec ts;
128 HOST_WIDE_INT size;
129 ts = e->ts;
130 type = gfc_typenode_for_spec (&ts);
131 size = int_size_in_bytes (type);
132 gcc_assert (size >= 0);
133 *siz = size;
134   }

It is strange that -fdump-fortran-original shows the same output for n=0 and
n=-1 when commenting out the ICEing line, so I wonder where this difference
comes from.  Some help interpreting tree type here might be helpful...
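
The two values quoted above (-12 for n=0, -24 for n=-1) match what one
gets for real :: a(3,2:n) if the extent of the second dimension is taken
as upper - lower + 1 without clamping at zero.  The following is only a
sketch of that arithmetic, assuming a 4-byte default real; the helper
names are made up for illustration and this is not the gfortran code:

```c
/* Extent of a dimension declared as lower:upper, with no clamping.
   For 2:0 this yields -1, for 2:-1 it yields -2. */
static long raw_extent(long lower, long upper)
{
    return upper - lower + 1;
}

/* Extent as Fortran defines it: a dimension whose upper bound is below
   its lower bound has extent zero. */
static long clamped_extent(long lower, long upper)
{
    long e = upper - lower + 1;
    return e > 0 ? e : 0;
}

/* Size in bytes of real :: a(3,2:n), assuming sizeof(real) == 4. */
static long size_raw(long n)
{
    return 3 * raw_extent(2, n) * 4;
}

static long size_clamped(long n)
{
    return 3 * clamped_extent(2, n) * 4;
}
```

With these helpers, size_raw(0) and size_raw(-1) reproduce the negative
sizes seen in gdb, while size_clamped() gives 0 for every n < 2.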

[Bug libstdc++/97944] 30_threads/jthread/95989.cc fails randomly

2022-06-14 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97944

--- Comment #9 from CVS Commits  ---
The releases/gcc-10 branch has been updated by Jonathan Wakely:

https://gcc.gnu.org/g:29b676bcf10b0a6c04e8acdf91f18b28bf5b1501

commit r10-10835-g29b676bcf10b0a6c04e8acdf91f18b28bf5b1501
Author: Jonathan Wakely 
Date:   Tue Nov 24 23:22:01 2020 +

libstdc++: Disable failing test [PR97944]

Disable this test on the branch. It's already disabled for gcc-11 and
later.

libstdc++-v3/ChangeLog:

PR libstdc++/97944
* testsuite/30_threads/jthread/95989.cc: Mark XFAIL.

[PATCH] i386: Disallow sibcall when calling ifunc functions with PIC register

2022-06-14 Thread H.J. Lu via Gcc-patches

Disallow sibcall when calling ifunc functions with PIC register so that
the PIC register can be restored.

gcc/

PR target/105960
* config/i386/i386.cc (ix86_function_ok_for_sibcall): Return
false if PIC register is used when calling ifunc functions.

gcc/testsuite/

PR target/105960
* gcc.target/i386/pr105960.c: New test.
---
 gcc/config/i386/i386.cc  |  9 +
 gcc/testsuite/gcc.target/i386/pr105960.c | 19 +++
 2 files changed, 28 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/i386/pr105960.c

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 3d189e124e4..1ca7836e11e 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -1015,6 +1015,15 @@ ix86_function_ok_for_sibcall (tree decl, tree exp)
}
 }
 
+  if (decl && ix86_use_pseudo_pic_reg ())
+{
+  /* When PIC register is used, it must be restored after ifunc
+function returns.  */
+   cgraph_node *node = cgraph_node::get (decl);
+   if (node && node->ifunc_resolver)
+return false;
+}
+
   /* Otherwise okay.  That also includes certain types of indirect calls.  */
   return true;
 }
diff --git a/gcc/testsuite/gcc.target/i386/pr105960.c b/gcc/testsuite/gcc.target/i386/pr105960.c
new file mode 100644
index 00000000000..db137a1642d
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr105960.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-require-ifunc "" } */
+/* { dg-options "-O2 -fpic" } */
+
+__attribute__((target_clones("default","fma")))
+static inline double
+expfull_ref(double x)
+{
+  return __builtin_pow(x, 0.1234);
+}
+
+double
+exp_ref(double x)
+{
+  return expfull_ref(x);
+}
+
+/* { dg-final { scan-assembler "jmp\[ \t\]*expfull_ref@PLT" { target { ! ia32 } } } } */
+/* { dg-final { scan-assembler "call\[ \t\]*expfull_ref@PLT" { target ia32 } } } */
-- 
2.36.1

[Bug c/105981] New: Wrong code generated when compiling for arm cortex-a72 in AARCH32 with -mbig-endian

2022-06-14 Thread gjimenez at teldat dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105981

Bug ID: 105981
   Summary: Wrong code generated when compiling for arm cortex-a72
in AARCH32 with -mbig-endian
   Product: gcc
   Version: 12.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gjimenez at teldat dot com
  Target Milestone: ---

Created attachment 53137
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53137&action=edit
files with the source and results for cortex-a72 and cortex-a53

When compiling a file("test.c") with a simple function like this:
void test(char *buf)
{
__builtin_strcpy(buf, "abcd1234");
}

With this:
../x-tools/armeb-none-eabi/bin/armeb-none-eabi-gcc -c -Wall -Wextra -O3
-mbig-endian -ffreestanding -fbuiltin -nostdinc -save-temps -mcpu=cortex-a72
test.c -o test.a72

The code generated is:
.LC0:
.ascii  "abcd1234\000"
.text
.align  2
.global test
.syntax unified
.arm
.type   test, %function
test:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
movw    r3, #:lower16:.LC0
movt    r3, #:upper16:.LC0
strd    r4, [sp, #-8]!
ldrd    r4, [r3]
ldrb    r3, [r3, #8]    @ zero_extendqisi2
str     r5, [r0]        @ unaligned
str     r4, [r0, #4]    @ unaligned
ldrd    r4, [sp]
add     sp, sp, #8
strb    r3, [r0, #8]
bx      lr


String read:
In "ldrd r4, [r3]" it reads the string "abcd1234" into r4 and r5. (I am
skipping the null string terminator; it is written correctly.)


Buffer fill:
Then it writes r5 to the start of the buffer with "str r5, [r0]",
and then r4 at offset 4 of the buffer with "str r4, [r0, #4]".

The result is that the buffer holds the string with its two 32-bit words
swapped: "1234abcd".

*** WRONG RESULT ***
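
To make the effect described above concrete, here is a small
target-independent sketch in C.  It only models the order of the two
4-byte stores; copy_as_words is a made-up helper, not anything GCC
emits, and it says nothing about why the cortex-a72 tuning picks this
sequence:

```c
#include <string.h>

/* Copy an 8-character string plus its terminator as two 4-byte words.
   With swap_words != 0 the second word is stored first, which mimics
   the "str r5, [r0]" / "str r4, [r0, #4]" pair from the report. */
static void copy_as_words(char *dst, const char *src, int swap_words)
{
    char first[4], second[4];

    memcpy(first, src, 4);        /* bytes 0..3 of the source */
    memcpy(second, src + 4, 4);   /* bytes 4..7 of the source */

    if (swap_words) {
        memcpy(dst, second, 4);
        memcpy(dst + 4, first, 4);
    } else {
        memcpy(dst, first, 4);
        memcpy(dst + 4, second, 4);
    }
    dst[8] = src[8];              /* the terminator is copied separately */
}
```

Calling copy_as_words (buf, "abcd1234", 1) on a 9-byte buffer leaves
"1234abcd" in buf; with swap_words == 0 the string is copied unchanged.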






But, if we compile the same file with -mcpu=cortex-a53 it works fine:
.LC0:
.ascii  "abcd1234\000"
.text
.align  2
.global test
.syntax unified
.arm
.type   test, %function
test:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
movw    r3, #:lower16:.LC0
movt    r3, #:upper16:.LC0
mov     r2, r0
ldmia   r3!, {r0, r1}
str     r0, [r2]        @ unaligned
str     r1, [r2, #4]    @ unaligned
ldrb    r3, [r3]        @ zero_extendqisi2
strb    r3, [r2, #8]
bx      lr

String read:
ldmia   r3!, {r0, r1}
Buffer fill:
str     r0, [r2]        @ unaligned
str     r1, [r2, #4]    @ unaligned

*** CORRECT ORDER ***




The gcc release:
git log -1
commit 1ea978e3066ac565a1ec28a96a4d61eaf38e2726 (HEAD, tag:
releases/gcc-12.1.0)
Author: Jakub Jelinek 
Date:   Fri May 6 07:07:53 2022 +

Update ChangeLog and version files for release


git remote -v
origin  git://gcc.gnu.org/git/gcc.git (fetch)
origin  git://gcc.gnu.org/git/gcc.git (push)




The output when I add the "-v" option:

../x-tools/armeb-none-eabi/bin/armeb-none-eabi-gcc -v -c -Wall -Wextra -O3
-mbig-endian -ffreestanding -fbuiltin -nostdinc -save-temps -mcpu=cortex-a72
test.c -o test.a72
Using built-in specs.
COLLECT_GCC=../x-tools/armeb-none-eabi/bin/armeb-none-eabi-gcc
Target: armeb-none-eabi
Configured with:
/home/gjimenez/gcc/crosstool-ng/.build/armeb-none-eabi/src/gcc/configure
--build=x86_64-build_pc-linux-gnu --host=x86_64-build_pc-linux-gnu
--target=armeb-none-eabi --prefix=/home/gjimenez/x-tools/armeb-none-eabi
--exec_prefix=/home/gjimenez/x-tools/armeb-none-eabi
--with-local-prefix=/home/gjimenez/x-tools/armeb-none-eabi/armeb-none-eabi
--with-headers=/home/gjimenez/x-tools/armeb-none-eabi/armeb-none-eabi/include
--with-newlib --enable-threads=no --disable-shared
--with-pkgversion='crosstool-NG 1.25.0.36_a3e3d73' --enable-__cxa_atexit
--disable-libgomp --disable-libmudflap --disable-libmpx --disable-libssp
--disable-libquadmath --disable-libquadmath-support --disable-libstdcxx-verbose
--with-gmp=/home/gjimenez/gcc/crosstool-ng/.build/armeb-none-eabi/buildtools
--with-mpfr=/home/gjimenez/gcc/crosstool-ng/.build/armeb-none-eabi/buildtools
--with-mpc=/home/gjimenez/gcc/crosstool-ng/.build/armeb-none-eabi/buildtools
--with-isl=/home/gjimenez/gcc/crosstool-ng/.build/armeb-none-eabi/buildtools
--disable-lto --enable-target-optspace --disable-nls --enable-multiarch
--with-multilib-list='rmprofile aprofile' --enable-languages=c,c++
Thread model: single
Supported LTO compression algorithms: zlib
gcc version 12.1.0 (crosstool-NG 1.25.0.36_a3e3d73)
COLLECT_GCC_OPTIONS='-v' '-c' '-Wall' '-Wextra' '-O3' '-mbig-endian'
'-ffreestanding' '-fbuiltin' '-nostdinc' '-save-temps' '-mcpu=cortex-a72' '-o'
'test.a72' '-mfloat-abi=soft'

[Bug libstdc++/99290] std::filesystem::copy does not always report errors for recursion

2022-06-14 Thread redi at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99290

Jonathan Wakely  changed:

   What|Removed |Added

   Target Milestone|11.4|10.4

--- Comment #7 from Jonathan Wakely  ---
Also fixed for 10.4 now.

[Bug libstdc++/99290] std::filesystem::copy does not always report errors for recursion

2022-06-14 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99290

--- Comment #6 from CVS Commits  ---
The releases/gcc-10 branch has been updated by Jonathan Wakely:

https://gcc.gnu.org/g:43cbff3da5a856d1b18a9ad33b337ab829af73ed

commit r10-10833-g43cbff3da5a856d1b18a9ad33b337ab829af73ed
Author: Jonathan Wakely 
Date:   Thu Apr 28 13:06:31 2022 +0100

libstdc++: Fix error reporting in filesystem::copy [PR99290]

The recursive calls to filesystem::copy should stop if any of them
reports an error.

libstdc++-v3/ChangeLog:

PR libstdc++/99290
* src/c++17/fs_ops.cc (fs::copy): Pass error_code to
directory_iterator constructor, and check on each iteration.
* src/filesystem/ops.cc (fs::copy): Likewise.
* testsuite/27_io/filesystem/operations/copy.cc: Check for
errors during recursion.
* testsuite/experimental/filesystem/operations/copy.cc:
Likewise.

(cherry picked from commit 4e117418fb71f508c479e0144500f4da9cc92520)

[Bug fortran/105954] ICE in gfc_element_size, at fortran/target-memory.cc:132

2022-06-14 Thread anlauf at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105954

anlauf at gcc dot gnu.org changed:

   What|Removed |Added

 CC||anlauf at gcc dot gnu.org
   Last reconfirmed||2022-06-14
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW

--- Comment #2 from anlauf at gcc dot gnu.org ---
Confirmed.

Reduced testcase:

program p
  implicit none
  integer, parameter :: n = 0 !-1
  real:: a(3,2:n)
  type t
 real :: a(3,2:n)
  end type
  type(t) :: d
  integer, parameter :: k = sizeof(a) ! OK
  integer, parameter :: m = sizeof(d) ! ICE for n < 1
end

Re: [PATCH] opts: improve option suggestion

2022-06-14 Thread Jeff Law via Gcc-patches





On 5/24/2022 2:45 AM, Martin Liška wrote:

PING^1

On 5/12/22 09:10, Martin Liška wrote:

On 5/11/22 20:49, David Malcolm wrote:

On Wed, 2022-05-11 at 16:49 +0200, Martin Liška wrote:

In case where we have 2 equally good candidates like
-ftrivial-auto-var-init=
-Wtrivial-auto-var-init

for -ftrivial-auto-var-init, we should take the candidate that
has a difference in trailing sign symbol.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

 PR driver/105564

gcc/ChangeLog:

 * spellcheck.cc (test_find_closest_string): Add new test.
 * spellcheck.h (class best_match): Prefer a difference in
 trailing sign symbol.

OK
jeff
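
For readers who have not followed the thread, the tie-break rule
described in the quoted patch can be sketched roughly as below.  This is
only an illustration of the idea, assuming both candidates are already
known to be equally close to the option the user typed; the function
names are hypothetical and this is not the code in spellcheck.h:

```c
#include <string.h>

/* Return 1 if cand is goal followed by exactly one extra character,
   e.g. goal "-ftrivial-auto-var-init" and cand
   "-ftrivial-auto-var-init=". */
static int only_trailing_char_differs(const char *goal, const char *cand)
{
    size_t n = strlen(goal);
    return strlen(cand) == n + 1 && strncmp(goal, cand, n) == 0;
}

/* Choose between two candidates assumed to be equally distant from
   goal: prefer b only when b differs from goal by a trailing character
   and a does not; otherwise keep a. */
static const char *prefer_trailing(const char *goal,
                                   const char *a, const char *b)
{
    if (only_trailing_char_differs(goal, b)
        && !only_trailing_char_differs(goal, a))
        return b;
    return a;
}
```

For the example in the patch description, prefer_trailing
("-ftrivial-auto-var-init", "-Wtrivial-auto-var-init",
"-ftrivial-auto-var-init=") returns the candidate ending in '='.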

[Bug c++/105980] New: [11/12/13 Regression] ICE in final_scan_insn_1, at final.cc:2811

2022-06-14 Thread gscfq--- via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105980

Bug ID: 105980
   Summary: [11/12/13 Regression] ICE in final_scan_insn_1, at
final.cc:2811
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gs...@t-online.de
  Target Milestone: ---

Started with r11 between 20201115 and 20201122, using -m16 or -m32 :
(with files pr41257.C, pr47366.C, pr83549.C, pr83619.C, etc.)


$ gcc-13-20220612 -c pr41257.C -fpie -m32 -mforce-indirect-call
pr41257.C: In member function 'virtual void C::_ZTv0_n16_N1CD1Ev()':
pr41257.C:20:1: error: insn does not satisfy its constraints:
   20 | }
  | ^
(insn 6 5 7 (set (reg:SI 82)
(plus:SI (reg:SI 82)
(const:SI (unspec:SI [
(symbol_ref:SI ("*.LTHUNK2") [flags 0x3] )
] UNSPEC_GOTOFF "pr41257.C":12:8 227 {*leasi}
 (nil))
pr41257.C:20:1: internal compiler error: in final_scan_insn_1, at final.cc:2811
0x698229 _fatal_insn(char const*, rtx_def const*, char const*, int, char
const*)
../../gcc/rtl-error.cc:108
0x698252 _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
../../gcc/rtl-error.cc:118
0xa85023 final_scan_insn_1
../../gcc/final.cc:2811
0xa8505b final_scan_insn(rtx_insn*, _IO_FILE*, int, int, int*)
../../gcc/final.cc:2940
0xa85304 final_1
../../gcc/final.cc:1997
0x10c6f52 x86_output_mi_thunk
../../gcc/config/i386/i386.cc:21482
0x98b867 expand_thunk(cgraph_node*, bool, bool)
../../gcc/symtab-thunks.cc:388
0x99bca6 cgraph_node::assemble_thunks_and_aliases()
../../gcc/cgraphunit.cc:1757
0x99bc61 cgraph_node::assemble_thunks_and_aliases()
../../gcc/cgraphunit.cc:1779
0x99be90 cgraph_node::expand()
../../gcc/cgraphunit.cc:1898
0x99d10f output_in_order
../../gcc/cgraphunit.cc:2142
0x99d10f symbol_table::compile()
../../gcc/cgraphunit.cc:2346
0x99fa9f symbol_table::compile()
../../gcc/cgraphunit.cc:2533
0x99fa9f symbol_table::finalize_compilation_unit()
../../gcc/cgraphunit.cc:2530

[Bug c++/105979] New: ICE in change_stack, at reg-stack.cc:2660

2022-06-14 Thread gscfq--- via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105979

Bug ID: 105979
   Summary: ICE in change_stack, at reg-stack.cc:2660
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gs...@t-online.de
  Target Milestone: ---

Started with r9, at -O1, with file g++.dg/ext/pr84828.C :


$ gcc-13-20220612 -c pr84828.C -O1 -mfpmath=387 -fno-tree-loop-optimize
pr84828.C: In function 'void foo(float, double)':
pr84828.C:10:7: error: output constraint 0 must specify a single register
   10 |   asm volatile ("" : "+f" (c)); // { dg-error "must specify a
single register" }
  |   ^~~
during RTL pass: stack
pr84828.C:13:1: internal compiler error: in change_stack, at reg-stack.cc:2660
   13 | }
  | ^
0x103d37c change_stack
../../gcc/reg-stack.cc:2660
0x1041a57 compensate_edge
../../gcc/reg-stack.cc:2941
0x1041a57 compensate_edges
../../gcc/reg-stack.cc:2972
0x1041a57 convert_regs
../../gcc/reg-stack.cc:3276
0x1041a57 reg_to_stack
../../gcc/reg-stack.cc:3385
0x1041a57 rest_of_handle_stack_regs
../../gcc/reg-stack.cc:3441
0x1041a57 execute
../../gcc/reg-stack.cc:3473

[Bug c++/105978] New: ICE: nodes with unreleased memory found

2022-06-14 Thread gscfq--- via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105978

Bug ID: 105978
   Summary: ICE: nodes with unreleased memory found
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gs...@t-online.de
  Target Milestone: ---

This started with r10 (gcc configured with --enable-checking=yes)
(case from g++.dg/torture/pr43611.C, but other cases ICE earlier)


$ gcc-13-20220612 -c pr43611.C -O2 -fdisable-ipa-inline
cc1plus: note: disable pass ipa-inline for functions in the range of [0,
4294967295]
_ZN1BIiEC2Ev/2 (B<  >::B() [with
 = int]) @0x7ff744966220
  Type: function definition analyzed
  Visibility: semantic_interposition external public comdat
  References:
  Referring: _ZN1BIiEC1Ev/3 (alias)
  Availability: available
  Function flags: count:1073741824 (estimated locally) body
  Called by:
  Calls: _ZN1AIiE4initEi/9 (1073741824 (estimated locally),1.00 per call) (can
throw external) _ZN1AIiE4initEi/9 (1073741824 (estimated locally),1.00 per
call) (can throw external)
pr43611.C:22:12: internal compiler error: nodes with unreleased memory found
   22 | B < int > b;
  |^
0xbc4de9 symbol_table::compile()
../../gcc/cgraphunit.cc:2381
0xbc831f symbol_table::compile()
../../gcc/cgraphunit.cc:2533
0xbc831f symbol_table::finalize_compilation_unit()
../../gcc/cgraphunit.cc:2530

[Bug c/105977] New: [12/13 Regression] ICE in gimple_call_static_chain_flags, at gimple.cc:1636

2022-06-14 Thread gscfq--- via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105977

Bug ID: 105977
   Summary: [12/13 Regression] ICE in
gimple_call_static_chain_flags, at gimple.cc:1636
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c
  Assignee: unassigned at gcc dot gnu.org
  Reporter: gs...@t-online.de
  Target Milestone: ---

Started between 20211107 and 2024, at -Os :
(from gcc.c-torture/execute/920612-2.c)
(gcc configured with --enable-checking=yes)


$ cat z1.c
void f ()
{
  int i = 0;
  int a (int x)
{
  while (x)
i++, x--;
  return x;
}
  if (a (2) != 0)
return;
}


$ gcc-13-20220612 -c z1.c -O2 -fdisable-ipa-inline
cc1: note: disable pass ipa-inline for functions in the range of [0,
4294967295]
$
$ gcc-13-20220612 -c z1.c -Os -fdisable-ipa-inline
cc1: note: disable pass ipa-inline for functions in the range of [0,
4294967295]
during GIMPLE pass: alias
z1.c: In function 'f':
z1.c:1:6: internal compiler error: in gimple_call_static_chain_flags, at
gimple.cc:1636
1 | void f ()
  |  ^
0xa75e29 gimple_call_static_chain_flags(gcall const*)
../../gcc/gimple.cc:1636
0x1086681 handle_rhs_call
../../gcc/tree-ssa-structalias.cc:4345
0x1087fe5 find_func_aliases_for_call
../../gcc/tree-ssa-structalias.cc:5010
0x1087fe5 find_func_aliases
../../gcc/tree-ssa-structalias.cc:5113
0x108b7a6 compute_points_to_sets
../../gcc/tree-ssa-structalias.cc:7536
0x108b7a6 compute_may_aliases()
../../gcc/tree-ssa-structalias.cc:8044
0xd0bbde execute_function_todo
../../gcc/passes.cc:2057
0xd0c712 execute_todo
../../gcc/passes.cc:2139

[Bug rtl-optimization/105041] '-fcompare-debug' failure w/ -mcpu=power6 -O2 -fharden-compares -frename-registers

2022-06-14 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105041

--- Comment #7 from CVS Commits  ---
The master branch has been updated by Segher Boessenkool :

https://gcc.gnu.org/g:3e16b4359e86b36676ed01219e6deafa95f3c16b

commit r13-1092-g3e16b4359e86b36676ed01219e6deafa95f3c16b
Author: Surya Kumari Jangala 
Date:   Fri Jun 10 19:52:57 2022 +0530

regrename: Fix -fcompare-debug issue in check_new_reg_p [PR105041]

In check_new_reg_p, the nregs of a du chain is computed by obtaining the
MODE of the first element in the chain, and then calling
hard_regno_nregs() with the MODE. But the first element of the chain can
be a DEBUG_INSN whose mode need not be the same as the rest of the
elements in the du chain. This was resulting in fcompare-debug failure
as check_new_reg_p was returning a different result with -g for the same
candidate register. We can instead obtain nregs from the du chain
itself.

2022-06-10  Surya Kumari Jangala  

gcc/
PR rtl-optimization/105041
* regrename.cc (check_new_reg_p): Use nregs value from du chain.

gcc/testsuite/
PR rtl-optimization/105041
* gcc.target/powerpc/pr105041.c: New test.
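
The failure mode described in the commit message can be shown with a toy
model.  The structures below are hypothetical stand-ins and not the
regrename.cc data structures; the sketch only shows why a value derived
from the first chain element changes when a debug-only element is
prepended, while a value recorded on the chain itself does not:

```c
/* Toy model only: none of these names exist in GCC. */
struct toy_elem {
    int is_debug;    /* 1 for an element that exists only with -g */
    int mode_nregs;  /* registers needed by this element's mode */
};

struct toy_chain {
    int nregs;                     /* recorded for the chain as a whole */
    const struct toy_elem *elems;  /* elements in chain order */
    int n_elems;
};

/* Derive nregs from the first element, whatever it happens to be. */
static int nregs_from_first(const struct toy_chain *c)
{
    return c->elems[0].mode_nregs;
}

/* Use the value stored on the chain, independent of element order. */
static int nregs_from_chain(const struct toy_chain *c)
{
    return c->nregs;
}
```

If a chain with and without a leading debug element is passed to both
helpers, nregs_from_first can differ between the two while
nregs_from_chain stays the same, which is the property -fcompare-debug
relies on.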

[Bug tree-optimization/97185] inconsistent builtin elimination for impossible range

2022-06-14 Thread msebor at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97185

--- Comment #2 from Martin Sebor  ---
There's a heuristic for ranges of allocation sizes to exclude zero
(size_range_flags) that comes into play here.  The actual range isn't
"impossible" in the sense it's necessarily invalid.  It just means the string
function call is either a no-op or out of bounds, and so can be eliminated as
an optimization.  With the optimization consistently implemented the warning
will also go away (eliminating the calls will prevent sanitizers from detecting
the out of bounds ones, so that might be a consideration).

In general, a low > high range denoted an anti-range before Ranger was
introduced (i.e., ~[high, low]).  With Ranger it's the corresponding union of
two ranges.  Some of the cruft for dealing with anti-ranges is still around,
such as in get_size_range() in pointer-query.cc.  The code should be migrated
to the irange class and the representation probably also updated to print
something more sensible (e.g., the union [MIN, high) U (low, MAX]; we talked
about introducing a pretty-printer % directive for ranges to make the format
consistent across all diagnostics).

[Bug target/105960] [12/13 Regression] Crash in 32-bit mode

2022-06-14 Thread wwcsmail at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105960

Wolfgang Wander  changed:

   What|Removed |Added

 CC||wwcsmail at gmail dot com

--- Comment #8 from Wolfgang Wander  ---
Thanks H.J,

tried and this indeed fixes the issue!

[Bug middle-end/105951] [12/13 Regression] ICE in emit_store_flag, at expmed.cc:6027

2022-06-14 Thread jakub at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105951

Jakub Jelinek  changed:

   What|Removed |Added

 Status|NEW |ASSIGNED
   Assignee|unassigned at gcc dot gnu.org  |jakub at gcc dot gnu.org

--- Comment #3 from Jakub Jelinek  ---
Created attachment 53136
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53136&action=edit
gcc13-pr105951.patch

Untested fix.

GCC 10.3.1 Status Report (2022-06-14)

2022-06-14 Thread Jakub Jelinek via Gcc

Status
======

The GCC 10 branch is in regression and documentation fixing mode.

After the release of GCC 9.5 it's time to do another release
from the 10 branch - GCC 10.4.  I will do a GCC 10.4 release candidate
next week, June 21st, followed by the release a week after that
if no serious problems arise.

If you have pending backports, please commit them to GCC 10 branch
this week or on Monday next week.  In particular I'd like to see
backport of Honza's PR105739 fix and I'll need to backport PR105732
because it is a P1.

It's also a very good point in time to ensure the branch still builds
and has reasonable testresults for the target you maintain.

Quality Data
============

Priority          #   Change from last report
--------        ---   -----------------------
P1                1   +   1
P2              407   +  81
P3               52   +  21
P4              204   +  26
P5               24   +   1
--------        ---   -----------------------
Total P1-P3     460   + 103
Total           688   + 130


Previous Report
===============

https://gcc.gnu.org/pipermail/gcc/2021-April/235360.html

[Bug c/105950] > O2 optimization causes runtime (SIGILL) during main initialization

2022-06-14 Thread jkanapes at yahoo dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105950

--- Comment #26 from John Kanapes  ---
On Tuesday, June 14, 2022 at 06:37:17 PM GMT+3, redi at gcc dot gnu.org wrote:

 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105950

--- Comment #25 from Jonathan Wakely  ---
(In reply to John Kanapes from comment #22)
>> It took you 4 posts to explain me what to do.
>> It took me 4 posts to understand what you were talking about.
>> You should explain better.

>You should read better. Comment 3 is perfectly clear.

>"For UBSan, you can't just compile with -fsanitize=undefined, you need to link
> with that flag as well."

> That 

I am sorry, that's not at all clear to me. It tells me that I first need to
compile with -fsanitize=undefined and then link with it. Since I couldn't even
compile with it, due to the undefined references to ubsan_handles, how could I
link? I tried to link against the -lubsan library, just to get past
compilation, and you told me that this was wrong. Finally I ended up just
linking with that flag. You have to bear in mind that we programmers do not
use linker flags and do not know about them :(

How could I explain it better?
"-fsanitize=undefined is a linker flag. Example: gcc test.o example.o ${LIBS}
-fsanitize=undefined -o test"

(In reply to John Kanapes from comment #23)
>> What do you do with the sources after the ticket?

>They will stay attached here. If you don't want them to be public, you need to
>reduce it to something smaller that still shows the bug (which you've said you
>can't) or put them somewhere online and persuade somebody here to download them
>and try to reproduce and reduce it for you.
That is extra work for you. I wasn't able to do it, and I know the code.
But if you can do something about it when you find the bug, I would be
grateful. I will upload it to my Dropbox account and then send you the link.
TIA
John

Re: [committed] openmp: Conforming device numbers and omp_{initial, invalid}_device

2022-06-14 Thread Thomas Schwinge

Hi Jakub!

On 2022-06-13T14:06:39+0200, Jakub Jelinek via Gcc-patches wrote:
> OpenMP 5.2 changed once more what device numbers are allowed.

> libgomp/

>   * testsuite/libgomp.c-c++-common/target-is-accessible-1.c (main): Add
>   test with omp_initial_device.  Use -5 instead of -1 for negative value
>   test.
>   * testsuite/libgomp.fortran/target-is-accessible-1.f90 (main):
>   Likewise.  Reorder stop numbers.

In an offloading configuration, I'm seeing:

PASS: libgomp.fortran/get-mapped-ptr-1.f90   -O  (test for excess errors)
[-PASS:-]{+FAIL:+} libgomp.fortran/get-mapped-ptr-1.f90   -O  execution test

Does that one need similar treatment?

It FAILs in 'STOP 1'; 'libgomp.fortran/get-mapped-ptr-1.f90':

 1 program main
 2   use omp_lib
 3   use iso_c_binding
 4   implicit none (external, type)
 5   integer :: d, id
 6   type(c_ptr) :: p
 7   integer, target :: q
 8
 9   d = omp_get_default_device ()
10   id = omp_get_initial_device ()
11
12   if (d < 0 .or. d >= omp_get_num_devices ()) &
13 d = id
14
15   p = omp_target_alloc (c_sizeof (q), d)
16   if (.not. c_associated (p)) &
17 stop 0  ! okay
18
19   if (omp_target_associate_ptr (c_loc (q), p, c_sizeof (q), &
20 0_c_size_t, d) == 0) then
21
22 if(c_associated (omp_get_mapped_ptr (c_loc (q), -1))) &
23   stop 1
[...]


Grüße
 Thomas


> --- libgomp/testsuite/libgomp.c-c++-common/target-is-accessible-1.c.jj 2022-05-23 21:44:48.950848384 +0200
> +++ libgomp/testsuite/libgomp.c-c++-common/target-is-accessible-1.c 2022-06-13 13:10:56.471535852 +0200
> @@ -17,7 +17,10 @@ main ()
>if (!omp_target_is_accessible (p, sizeof (int), id))
>  __builtin_abort ();
>
> -  if (omp_target_is_accessible (p, sizeof (int), -1))
> +  if (!omp_target_is_accessible (p, sizeof (int), omp_initial_device))
> +__builtin_abort ();
> +
> +  if (omp_target_is_accessible (p, sizeof (int), -5))
>  __builtin_abort ();
>
>if (omp_target_is_accessible (p, sizeof (int), n + 1))
> --- libgomp/testsuite/libgomp.fortran/target-is-accessible-1.f90.jj 2022-05-23 21:44:48.954848343 +0200
> +++ libgomp/testsuite/libgomp.fortran/target-is-accessible-1.f90 2022-06-13 13:12:08.133819977 +0200
> @@ -19,12 +19,15 @@ program main
>if (omp_target_is_accessible (p, c_sizeof (d), id) /= 1) &
>  stop 2
>
> -  if (omp_target_is_accessible (p, c_sizeof (d), -1) /= 0) &
> +  if (omp_target_is_accessible (p, c_sizeof (d), omp_initial_device) /= 1) &
>  stop 3
>
> -  if (omp_target_is_accessible (p, c_sizeof (d), n + 1) /= 0) &
> +  if (omp_target_is_accessible (p, c_sizeof (d), -5) /= 0) &
>  stop 4
>
> +  if (omp_target_is_accessible (p, c_sizeof (d), n + 1) /= 0) &
> +stop 5
> +
>! Currently, a host pointer is accessible if the device supports shared
>! memory or omp_target_is_accessible is executed on the host. This
>! test case must be adapted when unified shared memory is avialable.
> @@ -35,14 +38,14 @@ program main
>  !$omp end target
>
>  if (omp_target_is_accessible (p, c_sizeof (d), d) /= shared_mem) &
> -  stop 5;
> +  stop 6;
>
>  if (omp_target_is_accessible (c_loc (a), 128 * sizeof (a(1)), d) /= shared_mem) &
> -  stop 6;
> +  stop 7;
>
>  do i = 1, 128
>if (omp_target_is_accessible (c_loc (a(i)), sizeof (a(i)), d) /= shared_mem) &
> -stop 7;
> +stop 8;
>  end do
>
>end do

Re: Question on cgraph_edge::call_stmt during LTO

2022-06-14 Thread Martin Jambor

Hello Erick,

sorry for a late reply, I've been recovering from an injury recently.

On Thu, Jun 02 2022, Erick Ochoa wrote:
> Hi Martin,
>
> Thanks for the tips! I have implemented an edge summary which:
>
> * is allocated at IPA analysis phase
> * streamed out in ipcp_write_transformation_summaries
> * streamed in in ipcp_read_transformation_summaries
>
> However, before the implementation of this edge summary we had another
> mechanism of propagating the information all the way until it was used in a
> SIMPLE_IPA_PASS executed after all LGEN stages were finished (after
> all_regular_ipa_passes). After changing the implementation to use edge
> summaries, I find that the information is conserved during inlining (the
> duplication hook prints out the new edges that gets formed via inlining
> with the correct information), however it is not found in the
> SIMPLE_IPA_PASS that gets executed after all_regular_ipa_passes.

I have discussed your situation with Honza and we could not think of a
reason why this is happening to you.  Summaries have destructors, so we
suggest you put a breakpoint in there and see where your summaries get
deallocated.

>
> What is perhaps more interesting is that if I run with -fno-ipa-pure-const
> and no -fno-ipa-modref, I can still see the cgraph_nodes and edges of the
> inlined methods, along with the information needed. But not in the ones
> that have been inlined. I believe this could be just that when these
> options are disabled, cgraph_nodes might not be reclaimed.

In a late SIMPLE_IPA_PASS?  That is really weird, the inlining
transformation code quite clearly removes those.  How do you even get at
these nodes and edges, by traversing the call graph?  If not, you might
indeed be looking at stale data structures.

> I understand that there are many differences between SIMPLE_IPA_PASSes and
> regular IPA_PASSes, but at the moment I am unsure how to narrow down my
> search for a fix. Is this something that could be caused by:
>
> * memory management: (I am not familiar with the memory management in GCC
> and it is a bit difficult to understand.) I have removed the bodies of the
> my_edge_summary::remove (cgraph_edge*) and my_edge_summary::remove
> (cgraph_edge *, my_edge_summary_instance *) so I don't think this might be
> it. However, the class my_edge_summary still copies some of the structure
> in the other transformation summaries, so there is a macro GTY((for_user))
> in the class declaration and the information is stored in a vec  va_gc> *my_info.
> * missing implementation details in the duplicate functions: Looking at
> ipa_edge_args_sum_t::duplicate, it is a relatively complex function. I also
> noticed that it does something else when the dst->caller has been inlined.
> Should I also update the cgraph_edge that disappears when dst->caller is
> inlined to its caller?
> * something else?
>

You probably do not need that complexity.
ipa_edge_args_sum_t::duplicate does kind of reference counting so that
it can then estimate what references should look like after cloning and
inlining, and speculation and speculation-resolutions make this complex.
Unless you need to track something similar or treat speculative edges
especially for some reason, you can just copy your data and be done with
it.

> Any direction is greatly appreciated!

Sorry if I did not help much and good luck.

Martin

>
> On Sat, 21 May 2022 at 00:13, Martin Jambor  wrote:
>
>> Hello,
>>
>> On Fri, May 20 2022, Erick Ochoa via Gcc wrote:
>> > Hi,
>> >
>> > I'm working on a pass that looks into the estimated values during ipa-cp
>> > and stores them for a later analysis/pass. I would like to store the real
>> > arguments' estimates in a cgraph_edge::call_stmt or somewhere else that
>> > makes similar sense. (Note, this is different from the formal parameters'
>> > estimates which can be found in the lattice print out of ipa-cp).
>>
>> the statement is not the right place to store such pass-specific
>> information, for reasons you described and more (especially simple
>> memory use efficiency).
>>
>> Instead they should be placed into an "edge summary" (also sometimes
>> called "call summary"), a structure similar to ipa_edge_args_sum (in
>> ipa-prop.h and ipa-prop.cc).  Unlike ipa_edge_args_sum, which is
>> allocated at analysis phase, then streamed out and in in case of LTO,
>> and used and then thrown away during the IPA analysis phase, your summary would
>> need to be allocated at IPA analysis time, then streamed out in
>> ipcp_write_transformation_summaries, streamed in in
>> ipcp_read_transformation_summaries so that they can be used in the
>> transformation phase.
>>
>> Usually a simple implementation of the duplication hook of an edge
>> summary is enough for the data to survive cloning and inlining and the
>> like.
>>
>> Martin
>>
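
As a rough illustration of the "just copy your data" advice in this thread,
here is a toy model in plain C. It deliberately does not use GCC's summary
classes: `edge_estimates` and `edge_estimates_duplicate` are hypothetical
names, and the sketch only shows that a duplication hook for simple per-edge
data can be a plain deep copy.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-edge payload: one estimate per actual argument.  */
struct edge_estimates
{
  size_t count;
  long *values;
};

/* Toy duplication hook: give DST its own copy of SRC's data so the
   two edges can later be modified or released independently.
   Returns 0 on success, -1 if allocation fails.  */
int
edge_estimates_duplicate (const struct edge_estimates *src,
                          struct edge_estimates *dst)
{
  dst->count = 0;
  dst->values = NULL;
  if (src->count == 0)
    return 0;

  dst->values = malloc (src->count * sizeof *dst->values);
  if (dst->values == NULL)
    return -1;

  memcpy (dst->values, src->values, src->count * sizeof *dst->values);
  dst->count = src->count;
  return 0;
}
```

In a real pass the copy would live in the summary's duplication hook and the
storage would follow GCC's own allocation rules; none of that is modelled
here.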

[Bug bootstrap/44425] configure should probe prefix for gmp/mpfr/mpc

2022-06-14 Thread egallager at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44425

Eric Gallager  changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 CC||rguenth at gcc dot gnu.org
 Resolution|--- |WONTFIX

--- Comment #5 from Eric Gallager  ---
All right never mind; apparently this isn't actually wanted:
https://gcc.gnu.org/pipermail/gcc-patches/2022-June/596639.html

[Bug middle-end/101836] __builtin_object_size(P->M, 1) where M is an array and the last member of a struct fails

2022-06-14 Thread jakub at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101836

--- Comment #28 from Jakub Jelinek  ---
(In reply to Qing Zhao from comment #27)
> > Wouldn't this be -fno-strict-flex-arrays, i.e. the current behaviour?
> 
> Yes, it’s the same.  =0 is aliased with -fno-strict-flex-arrays.

That is indeed what we do for many options: -fno-whatever is an alias for
-fwhatever=0 (or -fwhatever=something for options which take enums and not
numbers).

[Bug c++/105976] New: -Wuse-after-free warning with std::shared_ptr[]>::reset

2022-06-14 Thread anabelsmruggiero at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105976

Bug ID: 105976
   Summary: -Wuse-after-free warning with
std::shared_ptr[]>::reset
   Product: gcc
   Version: 12.1.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: anabelsmruggiero at gmail dot com
  Target Milestone: ---

GCC 12.1 emits a use-after-free warning when calling
std::shared_ptr[]>::reset outside of main:


#include <memory>

void testArrOfArr(){
    std::shared_ptr<std::shared_ptr<double[]>[]> testPtr;
    testPtr.reset(new std::shared_ptr< double[] >[6]);
}

void testArr(){
    std::shared_ptr<std::shared_ptr<double>[]> testPtr;
    testPtr.reset(new std::shared_ptr< double >[6]);
}

int main(){
    // Uncommenting the body of main squelches all warnings
    /*
    std::shared_ptr<std::shared_ptr<double>[]> testArrPtr;
    testArrPtr.reset(new std::shared_ptr< double >[6]);

    std::shared_ptr<std::shared_ptr<double[]>[]> testArrOfArrPtr;
    testArrOfArrPtr.reset(new std::shared_ptr< double[] >[6]);
    */
}

Compiler arguments: -std=c++20 -Wall -Wextra -O2
This warning occurs at or above -O2 and disappears below that.

I also have this repro on Godbolt with a few additional lines that end up
squelching the warning: https://godbolt.org/z/YaKP985es

c++: Elide calls to NOP module initializers

2022-06-14 Thread Nathan Sidwell via Gcc-patches

This implements NOP module initializer elision. Each CMI gains a new 
flag informing importers whether its initializer actually does something 
(initializes its own globals, and/or calls initializers of its 
imports).  This allows an importer to determine whether to call it.


nathan


--
Nathan Sidwell

Re: [PATCH 1/2]middle-end Support optimized division by pow2 bitmask

2022-06-14 Thread Richard Biener via Gcc-patches




> On 14.06.2022 at 17:58, Tamar Christina via Gcc-patches wrote:
> 
> 
> 
>> -Original Message-
>> From: Richard Sandiford 
>> Sent: Tuesday, June 14, 2022 2:43 PM
>> To: Richard Biener 
>> Cc: Tamar Christina ; gcc-patches@gcc.gnu.org;
>> nd 
>> Subject: Re: [PATCH 1/2]middle-end Support optimized division by pow2
>> bitmask
>> 
>> Richard Biener  writes:
>>> On Mon, 13 Jun 2022, Tamar Christina wrote:
>>>
>>>>> -Original Message-
>>>>> From: Richard Biener
>>>>> Sent: Monday, June 13, 2022 12:48 PM
>>>>> To: Tamar Christina
>>>>> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Sandiford
>>>>>
>>>>> Subject: RE: [PATCH 1/2]middle-end Support optimized division by
>>>>> pow2 bitmask
>>>>>
>>>>> On Mon, 13 Jun 2022, Tamar Christina wrote:
>>>>>
>>>>>>> -Original Message-
>>>>>>> From: Richard Biener
>>>>>>> Sent: Monday, June 13, 2022 10:39 AM
>>>>>>> To: Tamar Christina
>>>>>>> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Sandiford
>>>>>>>
>>>>>>> Subject: Re: [PATCH 1/2]middle-end Support optimized division
>>>>>>> by
>>>>>>> pow2 bitmask
>>>>>>>
>>>>>>> On Mon, 13 Jun 2022, Richard Biener wrote:
>>>>>>>
>>>>>>>> On Thu, 9 Jun 2022, Tamar Christina wrote:
>>>>>>>>
>>>>>>>>> Hi All,
>>>>>>>>>
>>>>>>>>> In plenty of image and video processing code it's common to
>>>>>>>>> modify pixel values by a widening operation and then scale
>>>>>>>>> them back into range
>>>>>>> by dividing by 255.
>>>>>>>>>
>>>>>>>>> This patch adds an optab to allow us to emit an optimized
>>>>>>>>> sequence when doing an unsigned division that is equivalent to:
>>>>>>>>>
>>>>>>>>>   x = y / (2 ^ (bitsize (y)/2)-1)
>>>>>>>>>
>>>>>>>>> Bootstrapped Regtested on aarch64-none-linux-gnu,
>>>>>>>>> x86_64-pc-linux-gnu and no issues.
>>>>>>>>>
>>>>>>>>> Ok for master?
>>>>>>>>
>>>>>>>> Looking at 2/2 it seems that this is the wrong way to attack
>>>>>>>> the problem.  The ISA doesn't have such instruction so adding
>>>>>>>> an optab looks premature.  I suppose that there's no unsigned
>>>>>>>> vector integer division and thus we open-code that in a different
>> way?
>>>>>>>> Isn't the correct thing then to fixup that open-coding if it
>>>>>>>> is more
>>>>> efficient?
>>>>>>>
>>>>>>
>>>>>> The problem is that even if you fixup the open-coding it would
>>>>>> need to be something target specific? The sequence of
>>>>>> instructions we generate don't have a GIMPLE representation.  So
>>>>>> whatever is generated I'd have to fixup in RTL then.
>>>>>
>>>>> What's the operation that doesn't have a GIMPLE representation?
>>>>
>>>> For NEON use two operations:
>>>> 1. Add High narrowing lowpart, essentially doing (a +w b) >>.n bitsize(a)/2
>>>> Where the + widens and the >> narrows.  So you give it two
>>>> shorts, get a byte 2. Add widening add of lowpart so basically
>>>> lowpart (a +w b)
>>>>
>>>> For SVE2 we use a different sequence, we use two back-to-back
>> sequences of:
>>>> 1. Add narrow high part (bottom).  In SVE the Top and Bottom instructions
>> select
>>>>   Even and odd elements of the vector rather than "top half" and "bottom
>> half".
>>>>
>>>>   So this instruction does : Add each vector element of the first source
>> vector to the
>>>>   corresponding vector element of the second source vector, and place
>> the most
>>>> significant half of the result in the even-numbered half-width
>> destination elements,
>>>> while setting the odd-numbered elements to zero.
>>>>
>>>> So there's an explicit permute in there. The instructions are
>>>> sufficiently different that there wouldn't be a single GIMPLE
>> representation.
>>> 
>>> I see.  Are these also useful to express scalar integer division?
>>> 
>>> I'll defer to others to ack the special udiv_pow2_bitmask optab or
>>> suggest some piecemeal things other targets might be able to do as
>>> well.  It does look very special.  I'd also bikeshed it to
>>> udiv_pow2m1 since 'bitmask' is less obvious than 2^n-1 (assuming I
>>> interpreted 'bitmask' correctly ;)).  It seems to be even less general
>>> since it is a unary op and the actual divisor is constrained by the
>>> mode itself?
>> 
>> Yeah, those were my concerns as well.  For n-bit numbers, the same kind of
>> arithmetic transformation can be used for any 2^m-1 for m in [n/2, n), so
>> from a target-independent point of view, m==n/2 isn't particularly special.
>> Hard-coding one value of m would make sense if there was an underlying
>> instruction that did exactly this, but like you say, there isn't.
>> 
>> Would a compromise be to define an optab for ADDHN and then add a vector
>> pattern for this division that (at least initially) prefers ADDHN over the
>> current approach whenever ADDHN is available?  We could then adapt the
>> conditions on the pattern if other targets also provide ADDHN but don't want
>> this transform.  (I think the other instructions in the pattern already have
>> optabs.)
>> 
>> That still leaves open the question about what to do about SVE2,
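
To make the arithmetic in this thread concrete, here is a minimal scalar
sketch in plain C for 16-bit inputs and a divisor of 255 (the m == n/2 case
with n == 16). It only models the "add, take the high half, widening add,
take the high half again" shape described above; it is not the NEON or SVE2
sequence from the patch. The function names and the constant 257 are
assumptions made for this illustration.

```c
#include <stdint.h>

/* Hypothetical helper (not from the patch): computes x / 255 for a
   16-bit unsigned x without a division, using widened arithmetic.  */
uint16_t
div255_addhn_style (uint16_t x)
{
  /* Widen first so that x + 257 cannot wrap.  */
  uint32_t wide = x;
  /* "Add and take the high half": (x + 257) >> 8.  */
  uint32_t hi = (wide + 257u) >> 8;
  /* Widening add of that result, then take the high half again.  */
  return (uint16_t) ((wide + hi) >> 8);
}

/* Returns 1 if the shortcut agrees with plain division for every
   16-bit input, 0 otherwise.  */
int
div255_matches_everywhere (void)
{
  for (uint32_t x = 0; x <= 0xFFFFu; x++)
    if (div255_addhn_style ((uint16_t) x) != x / 255u)
      return 0;
  return 1;
}
```

Calling `div255_matches_everywhere ()` is a cheap way to convince yourself
that the shortcut is exact over the whole 16-bit range when the intermediate
sums are kept in a wider type.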

[Bug middle-end/101836] __builtin_object_size(P->M, 1) where M is an array and the last member of a struct fails

2022-06-14 Thread qing.zhao at oracle dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101836

--- Comment #27 from Qing Zhao  ---
> On Jun 14, 2022, at 11:39 AM, siddhesh at gcc dot gnu.org 
>  wrote:
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101836
> 
> --- Comment #26 from Siddhesh Poyarekar  ---
> (In reply to qinzhao from comment #25)
>> So, based on all the discussion so far, how about the following:
>> 
>> ** add the following gcc option:
>> 
>> -fstrict-flex-arrays=[0|1|2|3]
>> 
>> when -fstrict-flex-arrays=0:
>> treat all trailing arrays as flexible arrays. the default behavior;
> 
> Wouldn't this be -fno-strict-flex-arrays, i.e. the current behaviour?

Yes, it’s the same.  =0 is aliased with -fno-strict-flex-arrays.

The point is, the larger the value of LEVEL, the stricter the treatment of
flexible arrays.

i.e., 0 is the least strict, and 3 is the strictest mode.

But we can delete the level 0 if not necessary.
> 
>> when -fstrict-flex-arrays=1:
>> Only treating [], [0], and [1] as flexible array;
>> 
>> when -fstrict-flex-arrays=2:
>> Only treating [] and [0] as flexible array;
>> 
>> when -fstrict-flex-arrays=3:
>> Only treating [] as flexible array; The strictest level.
> 
> If yes, then you end up having:
> 
> -fstrict-flex-arrays=[1|2|3]
> 
> with, I suppose, 1 as the default based on Jakub's comment about maximum
> compatibility support.
Yes.  And 3 is the one Kees requested for kernel usage.
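
The proposed levels can be summarised as a small decision table. The sketch
below is a hypothetical model of the proposal as written in this comment, not
GCC code: `treated_as_flexible` and `FLEX_UNSPECIFIED` are names invented for
the illustration, and the semantics could still change as the discussion
continues.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Declared bound of a trailing array; FLEX_UNSPECIFIED models "[]".  */
enum { FLEX_UNSPECIFIED = -1 };

/* Hypothetical model of the proposal in the thread, not GCC code:
   returns true if a trailing array declared with BOUND elements would
   still be treated as a flexible array at the given LEVEL (0..3).  */
bool
treated_as_flexible (int level, long bound)
{
  assert (level >= 0 && level <= 3);
  switch (level)
    {
    case 0:  return true;                       /* any trailing array */
    case 1:  return bound <= 1;                 /* [], [0], [1] */
    case 2:  return bound <= 0;                 /* [], [0] */
    default: return bound == FLEX_UNSPECIFIED;  /* [] only */
    }
}

/* Some of the declaration shapes under discussion.  */
struct fam_c99 { size_t len; char data[]; };   /* C99 flexible array member */
struct fam_one { size_t len; char data[1]; };  /* pre-C99 "struct hack" */
struct fam_big { size_t len; char data[8]; };  /* ordinary trailing array */
```

Under this model, `struct fam_one` stops being treated as flexible at level 2
and above, while `struct fam_c99` is treated as flexible at every level.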

RE: [PATCH 1/2]middle-end Support optimized division by pow2 bitmask

2022-06-14 Thread Tamar Christina via Gcc-patches




> -Original Message-
> From: Richard Sandiford 
> Sent: Tuesday, June 14, 2022 2:43 PM
> To: Richard Biener 
> Cc: Tamar Christina ; gcc-patches@gcc.gnu.org;
> nd 
> Subject: Re: [PATCH 1/2]middle-end Support optimized division by pow2
> bitmask
> 
> Richard Biener  writes:
> > On Mon, 13 Jun 2022, Tamar Christina wrote:
> >
> >> > -Original Message-
> >> > From: Richard Biener 
> >> > Sent: Monday, June 13, 2022 12:48 PM
> >> > To: Tamar Christina 
> >> > Cc: gcc-patches@gcc.gnu.org; nd ; Richard Sandiford
> >> > 
> >> > Subject: RE: [PATCH 1/2]middle-end Support optimized division by
> >> > pow2 bitmask
> >> >
> >> > On Mon, 13 Jun 2022, Tamar Christina wrote:
> >> >
> >> > > > -Original Message-
> >> > > > From: Richard Biener 
> >> > > > Sent: Monday, June 13, 2022 10:39 AM
> >> > > > To: Tamar Christina 
> >> > > > Cc: gcc-patches@gcc.gnu.org; nd ; Richard Sandiford
> >> > > > 
> >> > > > Subject: Re: [PATCH 1/2]middle-end Support optimized division
> >> > > > by
> >> > > > pow2 bitmask
> >> > > >
> >> > > > On Mon, 13 Jun 2022, Richard Biener wrote:
> >> > > >
> >> > > > > On Thu, 9 Jun 2022, Tamar Christina wrote:
> >> > > > >
> >> > > > > > Hi All,
> >> > > > > >
> >> > > > > > In plenty of image and video processing code it's common to
> >> > > > > > modify pixel values by a widening operation and then scale
> >> > > > > > them back into range
> >> > > > by dividing by 255.
> >> > > > > >
> >> > > > > > This patch adds an optab to allow us to emit an optimized
> >> > > > > > sequence when doing an unsigned division that is equivalent to:
> >> > > > > >
> >> > > > > >    x = y / (2 ^ (bitsize (y)/2)-1)
> >> > > > > >
> >> > > > > > Bootstrapped Regtested on aarch64-none-linux-gnu,
> >> > > > > > x86_64-pc-linux-gnu and no issues.
> >> > > > > >
> >> > > > > > Ok for master?
> >> > > > >
> >> > > > > Looking at 2/2 it seems that this is the wrong way to attack
> >> > > > > the problem.  The ISA doesn't have such instruction so adding
> >> > > > > an optab looks premature.  I suppose that there's no unsigned
> >> > > > > vector integer division and thus we open-code that in a different
> way?
> >> > > > > Isn't the correct thing then to fixup that open-coding if it
> >> > > > > is more
> >> > efficient?
> >> > > >
> >> > >
> >> > > The problem is that even if you fixup the open-coding it would
> >> > > need to be something target specific? The sequence of
> >> > > instructions we generate don't have a GIMPLE representation.  So
> >> > > whatever is generated I'd have to fixup in RTL then.
> >> >
> >> > What's the operation that doesn't have a GIMPLE representation?
> >>
> >> For NEON use two operations:
> >> 1. Add High narrowing lowpart, essentially doing (a +w b) >>.n bitsize(a)/2
> >> Where the + widens and the >> narrows.  So you give it two
> >> shorts, get a byte 2. Add widening add of lowpart so basically
> >> lowpart (a +w b)
> >>
> >> For SVE2 we use a different sequence, we use two back-to-back
> sequences of:
> >> 1. Add narrow high part (bottom).  In SVE the Top and Bottom instructions
> select
> >>Even and odd elements of the vector rather than "top half" and "bottom
> half".
> >>
> >>So this instruction does : Add each vector element of the first source
> vector to the
> >>corresponding vector element of the second source vector, and place
> the most
> >> significant half of the result in the even-numbered half-width
> destination elements,
> >> while setting the odd-numbered elements to zero.
> >>
> >> So there's an explicit permute in there. The instructions are
> >> sufficiently different that there wouldn't be a single GIMPLE
> representation.
> >
> > I see.  Are these also useful to express scalar integer division?
> >
> > I'll defer to others to ack the special udiv_pow2_bitmask optab or
> > suggest some piecemeal things other targets might be able to do as
> > well.  It does look very special.  I'd also bikeshed it to
> > udiv_pow2m1 since 'bitmask' is less obvious than 2^n-1 (assuming I
> > interpreted 'bitmask' correctly ;)).  It seems to be even less general
> > since it is a unary op and the actual divisor is constrained by the
> > mode itself?
> 
> Yeah, those were my concerns as well.  For n-bit numbers, the same kind of
> arithmetic transformation can be used for any 2^m-1 for m in [n/2, n), so
> from a target-independent point of view, m==n/2 isn't particularly special.
> Hard-coding one value of m would make sense if there was an underlying
> instruction that did exactly this, but like you say, there isn't.
> 
> Would a compromise be to define an optab for ADDHN and then add a vector
> pattern for this division that (at least initially) prefers ADDHN over the
> current approach whenever ADDHN is available?  We could then adapt the
> conditions on the pattern if other targets also provide ADDHN but don't want
> this transform.  (I think the other instructions in the pattern already have
> optabs.)
> 
> That still leaves

[Bug rtl-optimization/104637] [10/11 Regression] ICE: maximum number of LRA assignment passes is achieved (30) with -Og -fno-forward-propagate -mavx since r9-5221-gd8fcab689435a29d

2022-06-14 Thread vmakarov at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104637

--- Comment #14 from Vladimir Makarov  ---
I've just ported the two patches to gcc-10 and gcc-11 release branches.

gcc-10 required additional work besides just cherry-picking.

The patches were successfully bootstrapped and tested on x86-64.

[Bug rtl-optimization/104637] [10/11 Regression] ICE: maximum number of LRA assignment passes is achieved (30) with -Og -fno-forward-propagate -mavx since r9-5221-gd8fcab689435a29d

2022-06-14 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104637

--- Comment #13 from CVS Commits  ---
The releases/gcc-10 branch has been updated by Vladimir Makarov
:

https://gcc.gnu.org/g:4c6e66a4dba5bbbcf343c1f6a58f355e270e79b9

commit r10-10831-g4c6e66a4dba5bbbcf343c1f6a58f355e270e79b9
Author: Jakub Jelinek 
Date:   Wed Mar 2 11:04:35 2022 +0100

testsuite: Fix up pr104637 testcase [PR104637]

This testcase FAILs everywhere for 3 reasons:
1) the testcase can't work on ia32, where sizeof (long double) == 12
   and as it is not a power of 2, we disallow creating vectors with such
   elements, -mx32 and -m64 are fine
2) the testcase emits a lot of -Wdiv-by-zero warnings, I've just added
   -Wno-div-by-zero to dg-options
3) my fault, when tweaking the testcase I've missed 33 initializers of
   a 32 element vector which didn't change anything on the ICE, but is
   still reported

This patch fixes all of it, tested with
RUNTESTFLAGS='--target_board=unix\{-m32,-m64\} i386.exp=pr104637.c'
both without the LRA fix where it ICEs and with it where it passes
everywhere.

2022-03-02  Jakub Jelinek  

PR rtl-optimization/104637
* gcc.target/i386/pr104637.c: Don't run on ia32.  Add
-Wno-div-by-zero to dg-options.
(foo): Remove extraneous initializer.

[Bug rtl-optimization/104637] [10/11 Regression] ICE: maximum number of LRA assignment passes is achieved (30) with -Og -fno-forward-propagate -mavx since r9-5221-gd8fcab689435a29d

2022-06-14 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104637

--- Comment #12 from CVS Commits  ---
The releases/gcc-10 branch has been updated by Vladimir Makarov
:

https://gcc.gnu.org/g:688703569091edfd0400523d85cbb44d15aa61ea

commit r10-10830-g688703569091edfd0400523d85cbb44d15aa61ea
Author: Vladimir N. Makarov 
Date:   Mon Feb 28 16:43:50 2022 -0500

[PR104637] LRA: Split hard regs as many as possible on one subpass

LRA hard reg split subpass is a small subpass used as the last
resort for LRA when it can not assign a hard reg to a reload
pseudo by other ways (e.g. by spilling non-reload pseudos).  For
simplicity the subpass works on one split base (as each split
changes pseudo live range info).  In this case it results in
reaching maximal possible number of subpasses.  The patch
implements as many non-overlapping hard reg splits
as possible on each subpass.

gcc/ChangeLog:

PR rtl-optimization/104637
* lra-assigns.c (lra_split_hard_reg_for): Split hard regs as many
as possible on one subpass.

gcc/testsuite/ChangeLog:

PR rtl-optimization/104637
* gcc.target/i386/pr104637.c: New.

[Bug rtl-optimization/104637] [10/11 Regression] ICE: maximum number of LRA assignment passes is achieved (30) with -Og -fno-forward-propagate -mavx since r9-5221-gd8fcab689435a29d

2022-06-14 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104637

--- Comment #11 from CVS Commits  ---
The releases/gcc-11 branch has been updated by Vladimir Makarov
:

https://gcc.gnu.org/g:0b518d844a49b2ee48d07e17cce855a4eec59490

commit r11-10065-g0b518d844a49b2ee48d07e17cce855a4eec59490
Author: Jakub Jelinek 
Date:   Wed Mar 2 11:04:35 2022 +0100

testsuite: Fix up pr104637 testcase [PR104637]

This testcase FAILs everywhere for 3 reasons:
1) the testcase can't work on ia32, where sizeof (long double) == 12
   and as it is not a power of 2, we disallow creating vectors with such
   elements, -mx32 and -m64 are fine
2) the testcase emits a lot of -Wdiv-by-zero warnings, I've just added
   -Wno-div-by-zero to dg-options
3) my fault, when tweaking the testcase I've missed 33 initializers of
   a 32 element vector which didn't change anything on the ICE, but is
   still reported

This patch fixes all of it, tested with
RUNTESTFLAGS='--target_board=unix\{-m32,-m64\} i386.exp=pr104637.c'
both without the LRA fix where it ICEs and with it where it passes
everywhere.

2022-03-02  Jakub Jelinek  

PR rtl-optimization/104637
* gcc.target/i386/pr104637.c: Don't run on ia32.  Add
-Wno-div-by-zero to dg-options.
(foo): Remove extraneous initializer.

[Bug rtl-optimization/104637] [10/11 Regression] ICE: maximum number of LRA assignment passes is achieved (30) with -Og -fno-forward-propagate -mavx since r9-5221-gd8fcab689435a29d

2022-06-14 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104637

--- Comment #10 from CVS Commits  ---
The releases/gcc-11 branch has been updated by Vladimir Makarov
:

https://gcc.gnu.org/g:776283dd1946f1563a59d8f527697e0206f5390e

commit r11-10064-g776283dd1946f1563a59d8f527697e0206f5390e
Author: Vladimir N. Makarov 
Date:   Mon Feb 28 16:43:50 2022 -0500

[PR104637] LRA: Split hard regs as many as possible on one subpass

LRA hard reg split subpass is a small subpass used as the last
resort for LRA when it can not assign a hard reg to a reload
pseudo by other ways (e.g. by spilling non-reload pseudos).  For
simplicity the subpass works on one split base (as each split
changes pseudo live range info).  In this case it results in
reaching maximal possible number of subpasses.  The patch
implements as many non-overlapping hard reg splits
as possible on each subpass.

gcc/ChangeLog:

PR rtl-optimization/104637
* lra-assigns.c (lra_split_hard_reg_for): Split hard regs as many
as possible on one subpass.

gcc/testsuite/ChangeLog:

PR rtl-optimization/104637
* gcc.target/i386/pr104637.c: New.

[Bug gcov-profile/100980] [GCOV]The assignment statement in the “for” structure caused the wrong coverage

2022-06-14 Thread njuwy at smail dot nju.edu.cn via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100980

Yang Wang  changed:

   What|Removed |Added

 Resolution|FIXED   |---
 Status|RESOLVED|NEW

--- Comment #3 from Yang Wang  ---
The coverage is correct in later versions (gcov 11.1.0 and 12.1.0). Is it fixed?

[Bug gcov-profile/101618] [GCOV] Wrong coverage caused by call site in a "for" statement

2022-06-14 Thread njuwy at smail dot nju.edu.cn via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101618

Yang Wang  changed:

   What|Removed |Added

 Resolution|FIXED   |---
 Status|RESOLVED|UNCONFIRMED

--- Comment #2 from Yang Wang  ---
The coverage is correct in later versions (gcov 11.1.0 and 12.1.0). Is it fixed?

[Bug middle-end/101836] __builtin_object_size(P->M, 1) where M is an array and the last member of a struct fails

2022-06-14 Thread siddhesh at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101836

--- Comment #26 from Siddhesh Poyarekar  ---
(In reply to qinzhao from comment #25)
> So, based on all the discussion so far, how about the following:
> 
> ** add the following gcc option:
> 
> -fstrict-flex-arrays=[0|1|2|3]
> 
> when -fstrict-flex-arrays=0:
> treat all trailing arrays as flexible arrays. the default behavior;

Wouldn't this be -fno-strict-flex-arrays, i.e. the current behaviour?

> when -fstrict-flex-arrays=1:
> Only treating [], [0], and [1] as flexible array;
> 
> when -fstrict-flex-arrays=2:
> Only treating [] and [0] as flexible array;
> 
> when -fstrict-flex-arrays=3:
> Only treating [] as flexible array; The strictest level.

If yes, then you end up having:

-fstrict-flex-arrays=[1|2|3]

with, I suppose, 1 as the default based on Jakub's comment about maximum
compatibility support.

[Bug c/105950] > O2 optimization causes runtime (SIGILL) during main initialization

2022-06-14 Thread redi at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105950

--- Comment #25 from Jonathan Wakely  ---
(In reply to John Kanapes from comment #22)
> It took you 4 posts to explain me what to do.
> It took me 4 posts to understand what you were talking about.
> You should explain better.

You should read better. Comment 3 is perfectly clear.

"For UBSan, you can't just compile with -fsanitize=undefined, you need to link
with that flag as well."


(In reply to John Kanapes from comment #23)
> What do you do with the sources after the ticket?

They will stay attached here. If you don't want them to be public, you need to
reduce it to something smaller that still shows the bug (which you've said you
can't) or put them somewhere online and persuade somebody here to download them
and try to reproduce and reduce it for you.

[PATCH v1.1] tree-optimization/105736: Don't let error_mark_node escape for ADDR_EXPR

2022-06-14 Thread Siddhesh Poyarekar

The addr_expr computation does not check for error_mark_node before
returning the size expression.  This used to work in the constant case
because the conversion to uhwi would end up causing it to return
size_unknown, but that won't work for the dynamic case.

Modify the control flow to explicitly return size_unknown if the offset
computation returns an error_mark_node.

gcc/ChangeLog:

PR tree-optimization/105736
* tree-object-size.cc (addr_object_size): Return size_unknown
when object offset computation returns an error.

gcc/testsuite/ChangeLog:

PR tree-optimization/105736
* gcc.dg/builtin-dynamic-object-size-0.c (TV4, val3,
test_pr105736): New struct declaration, variable and function to
test PR.
(main): Use them.

Signed-off-by: Siddhesh Poyarekar 
---
Changes from v1:
- Used FAIL() instead of __builtin_abort() in the test.

Tested:

- x86_64 bootstrap and test
- --with-build-config=bootstrap-ubsan build

May I also backport this to gcc12?

 .../gcc.dg/builtin-dynamic-object-size-0.c| 18 +
 gcc/tree-object-size.cc   | 20 ++-
 2 files changed, 29 insertions(+), 9 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c
index b5b0b3a677c..01a280b2d7b 100644
--- a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c
+++ b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c
@@ -479,6 +479,20 @@ test_loop (int *obj, size_t sz, size_t start, size_t end, int incr)
   return __builtin_dynamic_object_size (ptr, 0);
 }
 
+/* Other tests.  */
+
+struct TV4
+{
+  __attribute__((vector_size (sizeof (int) * 4))) int v;
+};
+
+struct TV4 val3;
+int *
+test_pr105736 (struct TV4 *a)
+{
+  return &a->v[0];
+}
+
 unsigned nfails = 0;
 
 #define FAIL() ({ \
@@ -633,6 +647,10 @@ main (int argc, char **argv)
 FAIL ();
   if (test_loop (arr, 42, 20, 52, 1) != 0)
 FAIL ();
+  /* pr105736.  */
+  int *t = test_pr105736 (&val3);
+  if (__builtin_dynamic_object_size (t, 0) != -1)
+FAIL ();
 
   if (nfails > 0)
 __builtin_abort ();
diff --git a/gcc/tree-object-size.cc b/gcc/tree-object-size.cc
index 5ca87ae3504..12bc0868b77 100644
--- a/gcc/tree-object-size.cc
+++ b/gcc/tree-object-size.cc
@@ -695,19 +695,21 @@ addr_object_size (struct object_size_info *osi, const_tree ptr,
var_size = pt_var_size;
   bytes = compute_object_offset (TREE_OPERAND (ptr, 0), var);
   if (bytes != error_mark_node)
-   bytes = size_for_offset (var_size, bytes);
-  if (var != pt_var
- && pt_var_size
- && TREE_CODE (pt_var) == MEM_REF
- && bytes != error_mark_node)
{
- tree bytes2 = compute_object_offset (TREE_OPERAND (ptr, 0), pt_var);
- if (bytes2 != error_mark_node)
+ bytes = size_for_offset (var_size, bytes);
+ if (var != pt_var && pt_var_size && TREE_CODE (pt_var) == MEM_REF)
{
- bytes2 = size_for_offset (pt_var_size, bytes2);
- bytes = size_binop (MIN_EXPR, bytes, bytes2);
+ tree bytes2 = compute_object_offset (TREE_OPERAND (ptr, 0),
+  pt_var);
+ if (bytes2 != error_mark_node)
+   {
+ bytes2 = size_for_offset (pt_var_size, bytes2);
+ bytes = size_binop (MIN_EXPR, bytes, bytes2);
+   }
}
}
+  else
+   bytes = size_unknown (object_size_type);
 
   wholebytes
= object_size_type & OST_SUBOBJECT ? var_size : pt_var_wholesize;
-- 
2.35.3

[Bug c/105950] > O2 optimization causes runtime (SIGILL) during main initialization

2022-06-14 Thread sam at gentoo dot org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105950

Sam James  changed:

   What|Removed |Added

 CC||sam at gentoo dot org

--- Comment #24 from Sam James  ---
Please be polite on these bugs. There's a lot of documentation online about how
to use UBsan.

It's not ideal to upload a tarball with all of the bits, but if it's what's
needed, then I guess so be it. Some build systems, like Meson, make it easier
to enable sanitizers.

GCC's bug tracker isn't for general support on how to use build systems and
flags. 

The bug tracker is public and I don't think one can delete their own
attachments.

Are you saying that when you use -fsanitize=undefined and run your program, it
gets SIGILL'd?

[PATCH] tree-optimization/105736: Don't let error_mark_node escape for ADDR_EXPR

2022-06-14 Thread Siddhesh Poyarekar

The addr_expr computation does not check for error_mark_node before
returning the size expression.  This used to work in the constant case
because the conversion to uhwi would end up causing it to return
size_unknown, but that won't work for the dynamic case.

Modify the control flow to explicitly return size_unknown if the offset
computation returns an error_mark_node.

gcc/ChangeLog:

PR tree-optimization/105736
* tree-object-size.cc (addr_object_size): Return size_unknown
when object offset computation returns an error.

gcc/testsuite/ChangeLog:

PR tree-optimization/105736
* gcc.dg/builtin-dynamic-object-size-0.c (TV4, val3,
test_pr105736): New struct declaration, variable and function to
test PR.
(main): Use them.

Signed-off-by: Siddhesh Poyarekar 
---

Tested:

- x86_64 bootstrap and test
- --with-build-config=bootstrap-ubsan build

May I also backport this to gcc12?

 .../gcc.dg/builtin-dynamic-object-size-0.c| 19 ++
 gcc/tree-object-size.cc   | 20 ++-
 2 files changed, 30 insertions(+), 9 deletions(-)

diff --git a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c
index b5b0b3a677c..90f303ef40e 100644
--- a/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c
+++ b/gcc/testsuite/gcc.dg/builtin-dynamic-object-size-0.c
@@ -479,6 +479,20 @@ test_loop (int *obj, size_t sz, size_t start, size_t end, int incr)
   return __builtin_dynamic_object_size (ptr, 0);
 }
 
+/* Other tests.  */
+
+struct TV4
+{
+  __attribute__((vector_size (sizeof (int) * 4))) int v;
+};
+
+struct TV4 val3;
+int *
+test_pr105736 (struct TV4 *a)
+{
+  return &a->v[0];
+}
+
 unsigned nfails = 0;
 
 #define FAIL() ({ \
@@ -633,6 +647,11 @@ main (int argc, char **argv)
 FAIL ();
   if (test_loop (arr, 42, 20, 52, 1) != 0)
 FAIL ();
+  /* pr105736.  */
+  int *t = test_pr105736 (&val3);
+  if (__builtin_dynamic_object_size (t, 0) != -1)
+__builtin_abort ();
+
 
   if (nfails > 0)
 __builtin_abort ();
diff --git a/gcc/tree-object-size.cc b/gcc/tree-object-size.cc
index 5ca87ae3504..12bc0868b77 100644
--- a/gcc/tree-object-size.cc
+++ b/gcc/tree-object-size.cc
@@ -695,19 +695,21 @@ addr_object_size (struct object_size_info *osi, const_tree ptr,
var_size = pt_var_size;
   bytes = compute_object_offset (TREE_OPERAND (ptr, 0), var);
   if (bytes != error_mark_node)
-   bytes = size_for_offset (var_size, bytes);
-  if (var != pt_var
- && pt_var_size
- && TREE_CODE (pt_var) == MEM_REF
- && bytes != error_mark_node)
{
- tree bytes2 = compute_object_offset (TREE_OPERAND (ptr, 0), pt_var);
- if (bytes2 != error_mark_node)
+ bytes = size_for_offset (var_size, bytes);
+ if (var != pt_var && pt_var_size && TREE_CODE (pt_var) == MEM_REF)
{
- bytes2 = size_for_offset (pt_var_size, bytes2);
- bytes = size_binop (MIN_EXPR, bytes, bytes2);
+ tree bytes2 = compute_object_offset (TREE_OPERAND (ptr, 0),
+  pt_var);
+ if (bytes2 != error_mark_node)
+   {
+ bytes2 = size_for_offset (pt_var_size, bytes2);
+ bytes = size_binop (MIN_EXPR, bytes, bytes2);
+   }
}
}
+  else
+   bytes = size_unknown (object_size_type);
 
   wholebytes
= object_size_type & OST_SUBOBJECT ? var_size : pt_var_wholesize;
-- 
2.35.3

[Bug libstdc++/59048] operator== between std::string and const char* slower than strcmp

2022-06-14 Thread redi at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59048

Jonathan Wakely  changed:

           What    |Removed                        |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org  |redi at gcc dot gnu.org
             Status|NEW                            |ASSIGNED

[Bug c/105970] ICE in ix86_function_arg, at config/i386/i386.cc:3351

2022-06-14 Thread hjl.tools at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105970

--- Comment #2 from H.J. Lu  ---
(In reply to Uroš Bizjak from comment #1)
> Probably something like:
> 
> diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
> index 3d189e124e4..f158cc3aaea 100644
> --- a/gcc/config/i386/i386.cc
> +++ b/gcc/config/i386/i386.cc
> @@ -3348,7 +3348,7 @@ ix86_function_arg (cumulative_args_t cum_v, const function_arg_info &arg)
>if (POINTER_TYPE_P (arg.type))
> {
>   /* This is the pointer argument.  */
> - gcc_assert (TYPE_MODE (arg.type) == Pmode);
> + gcc_assert (TYPE_MODE (arg.type) == ptr_mode);

This looks reasonable since pointer mode should be ptr_mode.

>   /* It is at -WORD(AP) in the current frame in interrupt and
>  exception handlers.  */
>   reg = plus_constant (Pmode, arg_pointer_rtx, -UNITS_PER_WORD);
> 
> Pointer mode and Pmode can be distinct for x32 target.  However, I have no
> idea what goes into interrupt frame for x32. Let's ask HJ.

[Bug target/105960] [12/13 Regression] Crash in 32-bit mode

2022-06-14 Thread hjl.tools at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105960

H.J. Lu  changed:

           What    |Removed                        |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org  |hjl.tools at gmail dot com

--- Comment #7 from H.J. Lu  ---
Created attachment 53135
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53135&action=edit
A patch

Try this.

[Bug middle-end/101836] __builtin_object_size(P->M, 1) where M is an array and the last member of a struct fails

2022-06-14 Thread qinzhao at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101836

--- Comment #25 from qinzhao at gcc dot gnu.org ---
So, based on all the discussion so far, how about the following:

** Add the following GCC option:

-fstrict-flex-arrays=[0|1|2|3]

when -fstrict-flex-arrays=0:
treat all trailing arrays as flexible arrays; this is the default behavior.

when -fstrict-flex-arrays=1:
only treat [], [0], and [1] as flexible arrays;

when -fstrict-flex-arrays=2:
only treat [] and [0] as flexible arrays;

when -fstrict-flex-arrays=3:
only treat [] as a flexible array; this is the strictest level.

Any comments?
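
As an illustration only, the four proposed levels can be modelled as a small predicate over the declared bound of a trailing array. This is a sketch of the proposal as worded in this comment, not of any compiler implementation; the function name, the constant, and the encoding of `[]` as -1 are hypothetical.

```cpp
#include <cassert>

// Hypothetical encoding for this sketch: a trailing array declared as `T a[]`
// has bound kUnspecifiedBound; `T a[0]` has bound 0, `T a[1]` has bound 1, etc.
constexpr int kUnspecifiedBound = -1;

// Returns true if a *trailing* array with the given declared bound would be
// treated as a flexible array under the proposed -fstrict-flex-arrays=LEVEL.
inline bool treated_as_flexible(int level, int declared_bound) {
  switch (level) {
    case 0:  // all trailing arrays (the proposed default)
      return true;
    case 1:  // [], [0] and [1]
      return declared_bound <= 1;
    case 2:  // [] and [0]
      return declared_bound <= 0;
    case 3:  // only []
      return declared_bound == kUnspecifiedBound;
    default:  // levels outside 0..3 are not part of the proposal
      return false;
  }
}
```

For example, a trailing `char buf[4]` would count as flexible only at level 0, while `char buf[1]` would stop counting at level 2.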

[Bug libstdc++/62187] std::string==const char* could compare sizes first

2022-06-14 Thread redi at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62187

Jonathan Wakely  changed:

           What    |Removed                        |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org  |redi at gcc dot gnu.org
             Status|NEW                            |ASSIGNED

--- Comment #7 from Jonathan Wakely  ---
(In reply to Jonathan Wakely from comment #5)
> I've also created an LWG issue about this,

Rather than a new issue, this was added to https://wg21.link/lwg2852 

The resolution was to confirm that operator== doesn't need to call compare if
it can determine the result another way. That means we can do the length check
unconditionally.
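
A minimal sketch of the idea, assuming a free helper function rather than the real libstdc++ operator== (the name equals_cstr is hypothetical): compare the lengths first and only look at the characters when they match.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper for illustration; this is not libstdc++'s code.
// It decides equality from the lengths alone whenever they differ, and
// only compares characters when both sides have the same length.
inline bool equals_cstr(const std::string &lhs, const char *rhs) {
  const std::size_t rhs_len = std::char_traits<char>::length(rhs);
  if (lhs.size() != rhs_len)
    return false;  // different lengths can never compare equal
  return std::char_traits<char>::compare(lhs.data(), rhs, rhs_len) == 0;
}
```

Note that a std::string with an embedded NUL is longer than any C string sharing its prefix, so the length check alone already rejects it.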

[Bug c/105950] > O2 optimization causes runtime (SIGILL) during main initialization

2022-06-14 Thread jkanapes at yahoo dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105950

--- Comment #23 from John Kanapes  ---
Hi,

I have not been able to recreate the issue with simpler programs that use the
same resources. I will need to upload my sources. Is it OK to upload a tar.gz
archive with a test directory with the sources and a makefile? What do you do
with the sources after the ticket?

TIA

[Bug target/105975] New: OpenMP/nvptx offloading: 'internal compiler error: in maybe_legitimize_operand, at optabs.cc:7785'

2022-06-14 Thread tschwinge at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105975

Bug ID: 105975
   Summary: OpenMP/nvptx offloading: 'internal compiler error: in
maybe_legitimize_operand, at optabs.cc:7785'
   Product: gcc
   Version: 12.0
Status: UNCONFIRMED
  Keywords: openmp
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: tschwinge at gcc dot gnu.org
CC: jakub at gcc dot gnu.org, rsandifo at gcc dot gnu.org,
vries at gcc dot gnu.org
  Target Milestone: ---
Target: nvptx

The recent commit r13-1068-g1d205dbac1e1754c01c22a31bd1688126545401e "Factor
out common internal-fn idiom" causes a class of ICEs in OpenMP/nvptx offloading
compilation: 'during RTL pass: expand', 'internal compiler error: in
maybe_legitimize_operand, at optabs.cc:7785', seen for a lot of libgomp
OpenMP/nvptx offloading test cases (with '-O1' and higher).

0xb1b0b3 maybe_legitimize_operand
[...]/source-gcc/gcc/optabs.cc:7785
0xb1b0b3 maybe_legitimize_operands(insn_code, unsigned int, unsigned int,
expand_operand*)
[...]/source-gcc/gcc/optabs.cc:7936
0xb1b139 maybe_gen_insn(insn_code, unsigned int, expand_operand*)
[...]/source-gcc/gcc/optabs.cc:7955
0xb1a8b8 maybe_expand_insn(insn_code, unsigned int, expand_operand*)
[...]/source-gcc/gcc/optabs.cc:7998
0xb1a8b8 expand_insn(insn_code, unsigned int, expand_operand*)
[...]/source-gcc/gcc/optabs.cc:8029
0x95dcb3 expand_fn_using_insn
[...]/source-gcc/gcc/internal-fn.cc:193
0x6d3ee7 expand_call_stmt
[...]/source-gcc/gcc/cfgexpand.cc:2737
0x6d3ee7 expand_gimple_stmt_1
[...]/source-gcc/gcc/cfgexpand.cc:3869

For extra entertainment: when running with '-wrapper "$GDB",-q,--args', we get
'[Inferior 1 (process [...]) exited normally]'...  (Maybe Valgrind could help? 
Unless someone directly pinpoints the issue, of course.)

I've not yet determined whether it's a latent problem just exposed by this
commit, or whether the commit itself has an issue.  It's not magically fixed by
the related subsequent commit
r13-1069-gf8baf4004ef965ce7a9edf6d2f5eb99adb15803a "Add a general mapping from
internal fns to target insns".

'gcc/internal-fn.cc':

193       expand_insn (icode, opno, ops);

'gcc/optabs.cc':

8026    expand_insn (enum insn_code icode, unsigned int nops,
8027                 class expand_operand *ops)
8028    {
8029      if (!maybe_expand_insn (icode, nops, ops))

7995    maybe_expand_insn (enum insn_code icode, unsigned int nops,
7996                       class expand_operand *ops)
7997    {
7998      rtx_insn *pat = maybe_gen_insn (icode, nops, ops);

7951    maybe_gen_insn (enum insn_code icode, unsigned int nops,
7952                    class expand_operand *ops)
7953    {
7954      gcc_assert (nops == (unsigned int) insn_data[(int) icode].n_generator_args);
7955      if (!maybe_legitimize_operands (icode, 0, nops, ops))

7935      /* Otherwise try legitimizing the operand on its own.  */
7936      if (j == i && !maybe_legitimize_operand (icode, opno + i, &ops[i]))

7784    case EXPAND_OUTPUT:
7785      gcc_assert (mode != VOIDmode);

[Bug target/105974] New: [13 Regression] ICE: RTL check: expected elt 0 type 'i' or 'n', have 'w' (rtx const_int) in arm_bfi_1_p, at config/arm/arm.cc:10214

2022-06-14 Thread zsojka at seznam dot cz via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105974

Bug ID: 105974
   Summary: [13 Regression] ICE: RTL check: expected elt 0 type
'i' or 'n', have 'w' (rtx const_int) in arm_bfi_1_p,
at config/arm/arm.cc:10214
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Keywords: build, ice-on-valid-code
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: zsojka at seznam dot cz
  Target Milestone: ---
  Host: x86_64-pc-linux-gnu
Target: armv7a-hardfloat-linux-gnueabi

Created attachment 53134
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53134&action=edit
reduced testcase

This currently breaks build with RTL checking enabled.

Compiler output:
$ /repo/build-gcc-trunk-armv7a-hardfloat/./gcc/cc1 -O2 -march=armv7-a+vfpv4
testcase.c 
 __gnu_fractqquda
Analyzing compilation unit
Performing interprocedural optimizations
 <*free_lang_data> {heap 932k}  {heap 932k} 
{heap 932k}  {heap 1212k}  {heap 1684k}
 {heap 1684k}  {heap 1684k} 
{heap 1684k}Streaming LTO
  {heap 1684k}  {heap 1684k}  {heap
1684k}  {heap 1684k}  {heap 1684k}  {heap 1684k} 
{heap 1684k}  {heap 1684k}  {heap 1684k}  {heap
1684k}  {heap 1684k}  {heap 1684k} 
{heap 1684k}  {heap 1684k}Assembling functions:
 __gnu_fractqquda
during RTL pass: combine

testcase.c: In function '__gnu_fractqquda':
testcase.c:9:1: internal compiler error: RTL check: expected elt 0 type 'i' or
'n', have 'w' (rtx const_int) in arm_bfi_1_p, at config/arm/arm.cc:10214
9 | }
  | ^
0x71d11e rtl_check_failed_type2(rtx_def const*, int, int, int, char const*,
int, char const*)
/repo/gcc-trunk/gcc/rtl.cc:907
0x7d01d3 arm_bfi_1_p
/repo/gcc-trunk/gcc/config/arm/arm.cc:10214
0x14406d6 arm_bfi_p
/repo/gcc-trunk/gcc/config/arm/arm.cc:10255
0x14406d6 arm_rtx_costs_internal
/repo/gcc-trunk/gcc/config/arm/arm.cc:11027
0x14406d6 arm_rtx_costs
/repo/gcc-trunk/gcc/config/arm/arm.cc:12058
0x102c33e rtx_cost(rtx_def*, machine_mode, rtx_code, int, bool)
/repo/gcc-trunk/gcc/rtlanal.cc:4629
0x1b69e98 set_src_cost
/repo/gcc-trunk/gcc/rtl.h:2943
0x1b69e98 distribute_and_simplify_rtx
/repo/gcc-trunk/gcc/combine.cc:10013
0x1b77941 simplify_logical
/repo/gcc-trunk/gcc/combine.cc:7103
0x1b77941 combine_simplify_rtx
/repo/gcc-trunk/gcc/combine.cc:6330
0x1b79d19 subst
/repo/gcc-trunk/gcc/combine.cc:5605
0x1b7d3d7 try_combine
/repo/gcc-trunk/gcc/combine.cc:3288
0x1b85dd5 combine_instructions
/repo/gcc-trunk/gcc/combine.cc:1266
0x1b85dd5 rest_of_handle_combine
/repo/gcc-trunk/gcc/combine.cc:14976
0x1b85dd5 execute
/repo/gcc-trunk/gcc/combine.cc:15021
Please submit a full bug report, with preprocessed source (by using
-freport-bug).
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.

$ /repo/build-gcc-trunk-armv7a-hardfloat/./gcc/xgcc -v
Using built-in specs.
COLLECT_GCC=/repo/build-gcc-trunk-armv7a-hardfloat/./gcc/xgcc
Target: armv7a-hardfloat-linux-gnueabi
Configured with: /repo/gcc-trunk//configure --enable-languages=c,c++
--enable-valgrind-annotations --disable-nls --enable-checking=yes,rtl,df,extra
--with-cloog --with-ppl --with-isl --with-float=hard --with-fpu=vfpv4
--with-arch=armv7-a --with-sysroot=/usr/armv7a-hardfloat-linux-gnueabi
--build=x86_64-pc-linux-gnu --host=x86_64-pc-linux-gnu
--target=armv7a-hardfloat-linux-gnueabi
--with-ld=/usr/bin/armv7a-hardfloat-linux-gnueabi-ld
--with-as=/usr/bin/armv7a-hardfloat-linux-gnueabi-as --disable-libstdcxx-pch
--prefix=/repo/gcc-trunk//binary-trunk-r13-1089-20220614140553-g8f6c317b3a1-checking-yes-rtl-df-extra-armv7a-hardfloat
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 13.0.0 20220614 (experimental) (GCC)

[Bug target/105960] [12/13 Regression] Crash in 32-bit mode

2022-06-14 Thread hjl.tools at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105960

--- Comment #6 from H.J. Lu  ---
This is caused by r12-5771.

[Bug middle-end/105638] Redundant stores aren't removed by DSE

2022-06-14 Thread hjl.tools at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105638

H.J. Lu  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED
   Target Milestone|---                         |13.0

--- Comment #3 from H.J. Lu  ---
Fixed.

Re: GSoC Blog Post 0 - GCCprefab build system

2022-06-14 Thread Wileam Yonatan Phan via Gcc

Hi Jonathan,

I just pushed a commit to update the build script and config files to use the
release tags instead of the tip of the branch. Thanks again for pointing this
out!

Thanks,
Wil

---

Hi Damian,

That's indeed a tricky issue to implement with the build script if the user
doesn't have sudo rights to install software on the system using the package
manager. Maybe I can make the script download the tarballs and build them from
sources.

Full disclosure: the script currently assumes all prerequisites have been
successfully installed, but based on the discussion here, I can add several
lines to check for their existence using `command -v`.

I'll start working on this later tonight.

Thanks,
Wil

On Tue, 2022-06-14 at 04:16 -0700, Damian Rouson wrote:
> On Mon, Jun 13, 2022 at 8:27 AM Jonathan Wakely via Fortran <
> fort...@gcc.gnu.org> wrote:
> > It doesn't include them, but they are standard system packages that
> > everybody can install without downloading the sources and building
> > them from scratch. 
> 
> unless the person is on a system on which they are not preinstalled and a
> system for which the person doesn’t have the sudo privileges that package
> managers often require.  What I’m describing is the norm for a lot of
> government employees and even many people at private corporations with strict
> security policies.  For what it’s worth, I’ve been assisting someone who
> contacted me with this very issue over the past few days.  Building the
> entire stack from source is the least painful option for this person. 
> 
> > You still need to have the other prerequisites listed at
> > https://gcc.gnu.org/install/prerequisites.html
> 
> That is a long and daunting list for a newcomer.  I’ve listened to gfortran
> developers describe building gfortran as “easy” for more than a decade
> now.  Simply saying it’s easy doesn’t make it so. I don’t know that I’ve ever
> met someone who described the process as easy unless that person was a
> gfortran developer. 
> 
> Damian
> 
> 
>

[Bug target/105920] __builtin_cpu_supports ("f16c") should check AVX

2022-06-14 Thread hjl.tools at gmail dot com via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105920

H.J. Lu  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED
   Target Milestone|---                         |11.4

--- Comment #2 from H.J. Lu  ---
Fixed for GCC 13 by

https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=751f306688508b08842d0ab967dee8e6c3b91351

Fixed for GCC 12.2 by:

https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=4b06b7304066fb1016e017d15e189f2e745dceae

Fixed for GCC 11.4 by

https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=30c1cde3adec938606cd49b1b4a262590b496719

Re: [PATCH 1/2]middle-end Support optimized division by pow2 bitmask

2022-06-14 Thread Richard Sandiford via Gcc-patches

Richard Biener  writes:
> On Mon, 13 Jun 2022, Tamar Christina wrote:
>
>> > -Original Message-
>> > From: Richard Biener 
>> > Sent: Monday, June 13, 2022 12:48 PM
>> > To: Tamar Christina 
>> > Cc: gcc-patches@gcc.gnu.org; nd ; Richard Sandiford
>> > 
>> > Subject: RE: [PATCH 1/2]middle-end Support optimized division by pow2
>> > bitmask
>> > 
>> > On Mon, 13 Jun 2022, Tamar Christina wrote:
>> > 
>> > > > -Original Message-
>> > > > From: Richard Biener 
>> > > > Sent: Monday, June 13, 2022 10:39 AM
>> > > > To: Tamar Christina 
>> > > > Cc: gcc-patches@gcc.gnu.org; nd ; Richard Sandiford
>> > > > 
>> > > > Subject: Re: [PATCH 1/2]middle-end Support optimized division by
>> > > > pow2 bitmask
>> > > >
>> > > > On Mon, 13 Jun 2022, Richard Biener wrote:
>> > > >
>> > > > > On Thu, 9 Jun 2022, Tamar Christina wrote:
>> > > > >
>> > > > > > Hi All,
>> > > > > >
>> > > > > > In plenty of image and video processing code it's common to
>> > > > > > modify pixel values by a widening operation and then scale them
>> > > > > > back into range
>> > > > by dividing by 255.
>> > > > > >
>> > > > > > This patch adds an optab to allow us to emit an optimized
>> > > > > > sequence when doing an unsigned division that is equivalent to:
>> > > > > >
>> > > > > >    x = y / (2 ^ (bitsize (y)/2) - 1)
>> > > > > >
>> > > > > > Bootstrapped Regtested on aarch64-none-linux-gnu,
>> > > > > > x86_64-pc-linux-gnu and no issues.
>> > > > > >
>> > > > > > Ok for master?
>> > > > >
>> > > > > Looking at 2/2 it seems that this is the wrong way to attack the
>> > > > > problem.  The ISA doesn't have such instruction so adding an optab
>> > > > > looks premature.  I suppose that there's no unsigned vector
>> > > > > integer division and thus we open-code that in a different way?
>> > > > > Isn't the correct thing then to fixup that open-coding if it is more
>> > efficient?
>> > > >
>> > >
>> > > The problem is that even if you fixup the open-coding it would need to
>> > > be something target specific? The sequence of instructions we generate
>> > > don't have a GIMPLE representation.  So whatever is generated I'd have
>> > > to fixup in RTL then.
>> > 
>> > What's the operation that doesn't have a GIMPLE representation?
>> 
>> For NEON use two operations:
>> 1. Add High narrowing lowpart, essentially doing (a +w b) >>.n bitsize(a)/2
>>    Where the + widens and the >> narrows.  So you give it two shorts, get a byte
>> 2. Add widening add of lowpart so basically lowpart (a +w b)
>> 
>> For SVE2 we use a different sequence, we use two back-to-back sequences of:
>> 1. Add narrow high part (bottom).  In SVE the Top and Bottom instructions select
>>    Even and odd elements of the vector rather than "top half" and "bottom half".
>> 
>>    So this instruction does : Add each vector element of the first source vector to the
>>    corresponding vector element of the second source vector, and place the most
>>    significant half of the result in the even-numbered half-width destination elements,
>>    while setting the odd-numbered elements to zero.
>> 
>> So there's an explicit permute in there. The instructions are sufficiently different that there
>> wouldn't be a single GIMPLE representation.
>
> I see.  Are these also useful to express scalar integer division?
>
> I'll defer to others to ack the special udiv_pow2_bitmask optab
> or suggest some piecemeal things other targets might be able to do as
> well.  It does look very special.  I'd also bikeshed it to
> udiv_pow2m1 since 'bitmask' is less obvious than 2^n-1 (assuming
> I interpreted 'bitmask' correctly ;)).  It seems to be even less
> general since it is an unary op and the actual divisor is constrained
> by the mode itself?

Yeah, those were my concerns as well.  For n-bit numbers, the same kind
of arithmetic transformation can be used for any 2^m-1 for m in [n/2, n),
so from a target-independent point of view, m==n/2 isn't particularly
special.  Hard-coding one value of m would make sense if there was an
underlying instruction that did exactly this, but like you say, there
isn't.
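
The claim about the range of m can be checked by brute force for n = 16. The sketch below assumes one particular shift/add form, (x + ((x + 2^m + 1) >> m)) >> m, as the "same kind of arithmetic transformation"; that formula and the function names are assumptions of this example, not something stated in this message.

```cpp
#include <cassert>
#include <cstdint>

// Assumed shift/add replacement for x / (2^m - 1); the arithmetic is done in
// 32 bits so that intermediate sums of 16-bit inputs cannot overflow.
inline std::uint32_t div_pow2m1(std::uint32_t x, unsigned m) {
  const std::uint32_t bias = (1u << m) + 1u;
  return (x + ((x + bias) >> m)) >> m;
}

// Counts the 16-bit inputs for which the shift/add form disagrees with a
// real division by 2^m - 1.
inline std::uint32_t count_mismatches_16bit(unsigned m) {
  const std::uint32_t divisor = (1u << m) - 1u;
  std::uint32_t bad = 0;
  for (std::uint32_t x = 0; x <= 0xFFFFu; ++x)
    if (div_pow2m1(x, m) != x / divisor)
      ++bad;
  return bad;
}
```

Under this assumed form, every m from 8 to 15 gives zero mismatches over all 16-bit inputs, while m = 7 (below n/2) does not, for example at x = 25400.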

Would a compromise be to define an optab for ADDHN and then add a vector
pattern for this division that (at least initially) prefers ADDHN over
the current approach whenever ADDHN is available?  We could then adapt
the conditions on the pattern if other targets also provide ADDHN but
don't want this transform.  (I think the other instructions in the
pattern already have optabs.)

That still leaves open the question about what to do about SVE2,
but the underlying problem there is that the vectoriser doesn't
know about the B/T layout.

Thanks,
Richard

Re: [PATCH take #2] Fold truncations of left shifts in match.pd

2022-06-14 Thread Richard Biener via Gcc-patches

On Sun, Jun 5, 2022 at 1:12 PM Roger Sayle  wrote:
>
>
> Hi Richard,
> Many thanks for taking the time to explain how vectorization is supposed
> to work.  I now see that vect_recog_rotate_pattern in tree-vect-patterns.cc
> is supposed to handle lowering of rotations to (vector) shifts, and
> completely agree that adding support for signed types (using appropriate
> casts to unsigned_type_for and casting the result back to the original
> signed type) is a better approach to avoid the regression of pr98674.c.
>
> I've also implemented your suggestions of combining the proposed new
> (convert (lshift @1 INTEGER_CST@2)) with the existing one, and at the
> same time including support for valid shifts greater than the narrower
> type, such as (short)(x << 20),  to constant zero.  Although this optimization
> is already performed during the tree-ssa passes, it's convenient to
> also catch it here during constant folding.
>
> This revised patch has been tested on x86_64-pc-linux-gnu with
> make bootstrap and make -k check, both with and without
> --target_board=unix{-m32}, with no new failures.  Ok for mainline?

OK.

Thanks,
Richard.
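
For readers following along, the folding discussed in the quoted message below can be modelled on unsigned integers. This is a sketch under assumptions: it uses unsigned 32-bit and 16-bit types to stay clear of signed-overflow questions, the function names are hypothetical, and it is not the match.pd pattern itself.

```cpp
#include <cassert>
#include <cstdint>

// Reference form: shift in 32 bits, then truncate to 16 bits.
// The shift count must be below 32 for the shift to be well-defined.
inline std::uint16_t shift_then_truncate(std::uint32_t x, unsigned c) {
  return static_cast<std::uint16_t>(x << c);
}

// Narrowed form: only the low 16 bits of x can reach the result, and a
// shift count of 16 or more pushes all of them out, leaving zero.
inline std::uint16_t truncate_then_shift(std::uint32_t x, unsigned c) {
  if (c >= 16)
    return 0;
  const std::uint32_t low = x & 0xFFFFu;
  return static_cast<std::uint16_t>(low << c);
}

// Checks that both forms agree for every valid shift count 0..31.
inline bool shifts_agree(std::uint32_t x) {
  for (unsigned c = 0; c < 32; ++c)
    if (shift_then_truncate(x, c) != truncate_then_shift(x, c))
      return false;
  return true;
}
```

The case `(short)(x << 20)` from the message corresponds to the `c >= 16` branch, which always yields zero.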

> 2022-06-05  Roger Sayle  
> Richard Biener  
>
> gcc/ChangeLog
> * match.pd (convert (lshift @1 INTEGER_CST@2)): Narrow integer
> left shifts by a constant when the result is truncated, and the
> shift constant is well-defined.
> * tree-vect-patterns.cc (vect_recog_rotate_pattern): Add
> support for rotations of signed integer types, by lowering
> using unsigned vector shifts.
>
> gcc/testsuite/ChangeLog
> * gcc.dg/fold-convlshift-4.c: New test case.
> * gcc.dg/optimize-bswaphi-1.c: Update found bswap count.
> * gcc.dg/tree-ssa/pr61839_3.c: Shift is now optimized before VRP.
> * gcc.dg/vect/vect-over-widen-1-big-array.c: Remove obsolete tests.
> * gcc.dg/vect/vect-over-widen-1.c: Likewise.
> * gcc.dg/vect/vect-over-widen-3-big-array.c: Likewise.
> * gcc.dg/vect/vect-over-widen-3.c: Likewise.
> * gcc.dg/vect/vect-over-widen-4-big-array.c: Likewise.
> * gcc.dg/vect/vect-over-widen-4.c: Likewise.
>
>
> Thanks again,
> Roger
> --
>
> > -Original Message-
> > From: Richard Biener 
> > Sent: 02 June 2022 12:03
> > To: Roger Sayle 
> > Cc: GCC Patches 
> > Subject: Re: [PATCH] Fold truncations of left shifts in match.pd
> >
> > On Thu, Jun 2, 2022 at 12:55 PM Roger Sayle 
> > wrote:
> > >
> > >
> > > Hi Richard,
> > > > +  /* RTL expansion knows how to expand rotates using shift/or.  */
> > > > + if (icode == CODE_FOR_nothing
> > > > +  && (code == LROTATE_EXPR || code == RROTATE_EXPR)
> > > > +  && optab_handler (ior_optab, vec_mode) != CODE_FOR_nothing
> > > > +  && optab_handler (ashl_optab, vec_mode) != CODE_FOR_nothing)
> > > > +icode = (int) optab_handler (lshr_optab, vec_mode);
> > > >
> > > > but we then get the vector costing wrong.
> > >
> > > The issue is that we currently get the (relative) vector costing wrong.
> > > Currently for gcc.dg/vect/pr98674.c, the vectorizer thinks the scalar
> > > code requires two shifts and an ior, so believes it's profitable to
> > > vectorize this loop using two vector shifts and a vector ior.  But
> > > once match.pd simplifies the truncate and recognizes the HImode rotate we
> > > end up with:
> > >
> > > pr98674.c:6:16: note:   ==> examining statement: _6 = _1 r>> 8;
> > > pr98674.c:6:16: note:   vect_is_simple_use: vectype vector(8) short int
> > > pr98674.c:6:16: note:   vect_is_simple_use: operand 8, type of def: constant
> > > pr98674.c:6:16: missed:   op not supported by target.
> > > pr98674.c:8:33: missed:   not vectorized: relevant stmt not supported: _6 = _1 r>> 8;
> > > pr98674.c:6:16: missed:  bad operation or unsupported loop bound.
> > >
> > >
> > > Clearly, it's a win to vectorize HImode rotates, when the backend can
> > > perform 8 (or 16) rotations at a time, but using 3 vector instructions, even
> > > when a scalar rotate can be performed in a single instruction.
> > > Fundamentally, vectorization may still be desirable/profitable even when
> > > the backend doesn't provide an optab.
> >
> > Yes, as said it's tree-vect-patterns.cc's job to handle this not natively
> > supported rotate by re-writing it.  Can you check why vect_recog_rotate_pattern
> > does not do this?  Ah, the code only handles !TYPE_UNSIGNED (type) - not sure why
> > though (for rotates it should not matter and for the lowered sequence we can
> > convert to desired signedness to get arithmetic/logical shifts)?
> >
> > > The current situation where the i386's backend provides expanders to
> > > lower rotations (or vcond) into individual instruction sequences, also
> > > interferes with vector costing.   It's the vector cost function that needs
> > > to be fixed, not the generated code made worse (or the backend bloated
> > > performing its own RTL

RE: [PATCH 1/2]middle-end Support optimized division by pow2 bitmask

2022-06-14 Thread Tamar Christina via Gcc-patches

> -Original Message-
> From: Richard Biener 
> Sent: Tuesday, June 14, 2022 2:19 PM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Sandiford
> 
> Subject: RE: [PATCH 1/2]middle-end Support optimized division by pow2
> bitmask
> 
> On Mon, 13 Jun 2022, Tamar Christina wrote:
> 
> > > -Original Message-
> > > From: Richard Biener 
> > > Sent: Monday, June 13, 2022 12:48 PM
> > > To: Tamar Christina 
> > > Cc: gcc-patches@gcc.gnu.org; nd ; Richard Sandiford
> > > 
> > > Subject: RE: [PATCH 1/2]middle-end Support optimized division by
> > > pow2 bitmask
> > >
> > > On Mon, 13 Jun 2022, Tamar Christina wrote:
> > >
> > > > > -Original Message-
> > > > > From: Richard Biener 
> > > > > Sent: Monday, June 13, 2022 10:39 AM
> > > > > To: Tamar Christina 
> > > > > Cc: gcc-patches@gcc.gnu.org; nd ; Richard Sandiford
> > > > > 
> > > > > Subject: Re: [PATCH 1/2]middle-end Support optimized division by
> > > > > pow2 bitmask
> > > > >
> > > > > On Mon, 13 Jun 2022, Richard Biener wrote:
> > > > >
> > > > > > On Thu, 9 Jun 2022, Tamar Christina wrote:
> > > > > >
> > > > > > > Hi All,
> > > > > > >
> > > > > > > In plenty of image and video processing code it's common to
> > > > > > > modify pixel values by a widening operation and then scale
> > > > > > > them back into range
> > > > > by dividing by 255.
> > > > > > >
> > > > > > > This patch adds an optab to allow us to emit an optimized
> > > > > > > sequence when doing an unsigned division that is equivalent to:
> > > > > > >
> > > > > > > >    x = y / (2 ^ (bitsize (y)/2) - 1)
> > > > > > >
> > > > > > > Bootstrapped Regtested on aarch64-none-linux-gnu,
> > > > > > > x86_64-pc-linux-gnu and no issues.
> > > > > > >
> > > > > > > Ok for master?
> > > > > >
> > > > > > Looking at 2/2 it seems that this is the wrong way to attack
> > > > > > the problem.  The ISA doesn't have such instruction so adding
> > > > > > an optab looks premature.  I suppose that there's no unsigned
> > > > > > vector integer division and thus we open-code that in a different
> way?
> > > > > > Isn't the correct thing then to fixup that open-coding if it
> > > > > > is more
> > > efficient?
> > > > >
> > > >
> > > > The problem is that even if you fixup the open-coding it would
> > > > need to be something target specific? The sequence of instructions
> > > > we generate don't have a GIMPLE representation.  So whatever is
> > > > generated I'd have to fixup in RTL then.
> > >
> > > What's the operation that doesn't have a GIMPLE representation?
> >
> > For NEON use two operations:
> > 1. Add High narrowing lowpart, essentially doing (a +w b) >>.n bitsize(a)/2
> >    Where the + widens and the >> narrows.  So you give it two shorts, get a byte
> > 2. Add widening add of lowpart so basically lowpart (a +w b)
> >
> > For SVE2 we use a different sequence, we use two back-to-back sequences of:
> > 1. Add narrow high part (bottom).  In SVE the Top and Bottom instructions select
> >    Even and odd elements of the vector rather than "top half" and "bottom half".
> >
> >    So this instruction does : Add each vector element of the first source vector to the
> >    corresponding vector element of the second source vector, and place the most
> >    significant half of the result in the even-numbered half-width destination elements,
> >    while setting the odd-numbered elements to zero.
> >
> > So there's an explicit permute in there. The instructions are
> > sufficiently different that there wouldn't be a single GIMPLE representation.
> 
> I see.  Are these also useful to express scalar integer division?

Hmm, not these exact instructions, as they only exist for vectors.  Scalar code
may potentially benefit from rewriting this to (x + ((x + 257) >> 8)) >> 8,
which avoids the multiply with the magic constant.  But the problem here is
that, unless it is undone for vectors, it would likely generate worse code if
vectorized exactly like this on most ISAs compared to what we have now.
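
A quick way to convince oneself of the scalar rewrite is to apply it to the pixel-scaling use case from the original patch description: multiply two 8-bit values in a wider type, then divide by 255. The sketch below assumes the product is held in a 32-bit unsigned value; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// The shift/add form quoted above, intended for inputs that fit in 16 bits.
constexpr std::uint32_t div255_shift_add(std::uint32_t x) {
  return (x + ((x + 257u) >> 8)) >> 8;
}

// Typical use: scale one 8-bit channel by another, i.e. (a * b) / 255.
inline std::uint8_t scale_u8(std::uint8_t a, std::uint8_t b) {
  const std::uint32_t product = static_cast<std::uint32_t>(a) * b;
  return static_cast<std::uint8_t>(div255_shift_add(product));
}

// Counts the (a, b) pairs where the shift/add form disagrees with a real
// division of the widened product by 255.
inline std::uint32_t count_scale_mismatches() {
  std::uint32_t bad = 0;
  for (std::uint32_t a = 0; a <= 255u; ++a)
    for (std::uint32_t b = 0; b <= 255u; ++b)
      if (div255_shift_add(a * b) != (a * b) / 255u)
        ++bad;
  return bad;
}
```

The loop covers all 65536 input pairs, so it is cheap enough to run as a sanity check.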

> 
> I'll defer to others to ack the special udiv_pow2_bitmask optab or suggest
> some piecemeal things other targets might be able to do as well.  It does look
> very special.  I'd also bikeshed it to
> udiv_pow2m1 since 'bitmask' is less obvious than 2^n-1 (assuming I
> interpreted 'bitmask' correctly ;)).  It seems to be even less general since
> it is a unary op and the actual divisor is constrained by the mode itself?

I am happy to change the name, and quite happy to add the constant as an
argument.   I had only made it this specific because this was the only fairly
common operation I had found.  Though perhaps it's indeed better to keep
the optab a bit more general?

Thanks,
Tamar

> 
> Richard.
> 
> > >
> > > I think for costing you could resort to the *_cost functions as used
> > > by synth_mult and friends.
> > >
> > > > The problem with this is that it seemed fragile. We generate from
> > > > the
> > > > Vectorizer:
> > > >
> > > >   vect__3.8_35 = MEM

[Bug c++/105838] [10/11/12/13 Regression] g++ 12.1.0 runs out of memory or time when building const std::vector of std::strings

2022-06-14 Thread jakub at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105838

Jakub Jelinek  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jason at gcc dot gnu.org,
                   |                            |redi at gcc dot gnu.org

--- Comment #6 from Jakub Jelinek  ---
Note, for say
#include <string>
#include <vector>

void foo (const std::vector<std::string> &);
int main ()
{
  const std::vector<std::string> lst = {
  "aahing", "aaliis", "aarrgh", "abacas", "abacus", "abakas", "abamps",
"abands", "abased", "abaser", "abases", "abasia" };
  foo (lst);
}
one gets terrible code from both g++ and clang++, in both cases it is serial
code calling many std::string ctors with the string literal arguments
that perhaps later on are inlined.  Over 21000 times in a row.  That also means
over 21000 memory allocations etc.
For your game, the obvious first question would be if you really need a
std::vector of std::string in this case and if a normal array of const char *
strings wouldn't be better, since that can be initialized at compile time.
Or, if you really need the std::vector, whether it wouldn't be better to
use an array of const char * and build the vector from it (sizeof (arr) /
sizeof (arr[0]) to reserve that many elts in the vector, then a loop that
will construct the std::string objects and move them into the list).
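
A minimal sketch of that second suggestion, assuming a short word table (the names words and build_word_list are hypothetical, and emplace_back is used here to construct each std::string in place instead of constructing and then moving it):

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Plain table of string literals; this part needs no run-time construction.
static const char *const words[] = {"aahing", "aaliis", "aarrgh", "abacas"};

// Builds the vector from the table in a loop instead of one huge
// initializer list: one reserve, then one std::string per entry.
inline std::vector<std::string> build_word_list() {
  const std::size_t count = sizeof(words) / sizeof(words[0]);
  std::vector<std::string> lst;
  lst.reserve(count);
  for (std::size_t i = 0; i < count; ++i)
    lst.emplace_back(words[i]);
  return lst;
}
```

The source for the loop stays the same size however long the table grows, which is the point of the suggestion.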

On the compiler side, a question is if we shouldn't detect such kind of
initializers and if they have over some param determined number of elements
which have the same type / kind (or at least a large sequence of such), don't
emit those
std::allocator::allocator ();
try
  {
std::__cxx11::basic_string::basic_string<> (_4,
"aahing", );
D.37581 = D.37581 + 32;
D.37582 = D.37582 + -1;
_5 = D.37581;
try
  {
std::allocator::allocator ();
try
  {
   
std::__cxx11::basic_string::basic_string<> (_5, "aaliis", );
D.37581 = D.37581 + 32;
D.37582 = D.37582 + -1;
_6 = D.37581;
try
  {
...
but a loop.  Doesn't have to be just for the STL types, if we have
struct S { S (int); ... };
  const S s[] = { 1, 3, 22, 42, 132, -12, 18, 19, 32, 0, 25, ... };
then again there should be some upper limit over which we'd just emit:
  const S s[count];
  static const int stemp[count] = { 1, 3, 22, 42, 132, -12, 18, 19, 32, 0, 25, ... };
  for (size_t x = 0; x < count; ++x) S (&s[x], stemp[x]);
or so (of course, with destruction possibility if some ctor may throw).

Re: [PATCH] Do not erase warning data in gimple_set_location

2022-06-14 Thread Richard Biener via Gcc-patches

On Tue, Jun 14, 2022 at 12:49 PM Eric Botcazou  wrote:
>
> > Hmm, I think instead of special-casing UNKNOWN_LOCATION
> > what gimple_set_location should probably do is either not copy
> > warnings at all or union them.  Btw, gimple_set_location also
> > removes a previously set BLOCK (but gimple_set_block preserves
> > the location locus and diagnostic override).
> >
> > So I'd be tempted to axe the copy_warning () completely here.
>
> The first thing I tried, but it regressed the original testcase IIRC.
>
> Even my minimal patch manages to break bootstrap on ARM:
>
> buildslave/workspace/tcwg_gnu_1/abe/snapshots/gcc.git~master/libcpp/lex.cc:
> 1523:9: error: pointer used after ‘void operator delete(void*, std::size_t)’
> [-Werror=use-after-free]
> # 00:31:04 make[3]: *** [Makefile:227: lex.o] Error 1
> # 00:31:04 make[2]: *** [Makefile:9527: all-stage3-libcpp] Error 2
> # 00:31:35 make[1]: *** [Makefile:25887: stage3-bubble] Error 2
> # 00:31:35 make: *** [Makefile:1072: all] Error 2
>
>   /* Don't warn for cases like when a cdtor returns 'this' on ARM.  */
>   else if (warning_suppressed_p (var, OPT_Wuse_after_free))
>     return;
>
> because warning-control.cc:copy_warning also clobbers the warning data of the
> destination.  We have in cp/decl.cc:maybe_return_this the lines:
>
>   /* Return the address of the object.  */
>   tree val = DECL_ARGUMENTS (current_function_decl);
>   suppress_warning (val, OPT_Wuse_after_free);
>
> -Wuse-after-free is suppressed for the location of VAL and the TREE_NO_WARNING
> bit set on it.  But other expressions may have the same location as VAL and
> the TREE_NO_WARNING bit _not_ set, so when you call copy_warning (expr, expr)
> (we do that a lot after failed folding) for them, copy_warning erases the
> warning data of the location.
>
> I have installed the obvious fixlet after testing on x86-64/Linux, but the
> decoupling between TREE_NO_WARNING bit and location looks a bit problematic.

Thanks - that makes more sense.

>
> * warning-control.cc (copy_warning) [generic version]: Do not erase
> the warning data of the destination location when the no-warning
> bit is not set on the source.
> (copy_warning) [tree version]: Return early if TO is equal to FROM.
> (copy_warning) [gimple version]: Likewise.
> testsuite/
> * g++.dg/warn/Wuse-after-free5.C: New test.
>
> --
> Eric Botcazou
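
The failure mode described above can be reproduced with a toy model.  This is
only a sketch under the assumption that warning data is keyed by location while
each expression carries its own no-warning bit; the types and the
copy_warning_old / copy_warning_new names are hypothetical and are not GCC's
real warning-control.cc code:

```cpp
#include <map>
#include <set>

// Toy model, not GCC's real data structures.
typedef int location;
typedef int option;

struct expr
{
  location loc;
  bool no_warning;
};

static std::map<location, std::set<option> > nowarn_map;

void
suppress_warning (expr &e, option opt)
{
  e.no_warning = true;
  nowarn_map[e.loc].insert (opt);
}

bool
warning_suppressed_p (const expr &e, option opt)
{
  if (!e.no_warning)
    return false;
  std::map<location, std::set<option> >::const_iterator it
    = nowarn_map.find (e.loc);
  return it != nowarn_map.end () && it->second.count (opt) != 0;
}

// Old behaviour as described above: when the source bit is clear, the data
// of the destination location is erased, even for copy_warning (e, e).
void
copy_warning_old (expr &to, const expr &from)
{
  if (from.no_warning)
    nowarn_map[to.loc] = nowarn_map[from.loc];
  else
    nowarn_map.erase (to.loc);
  to.no_warning = from.no_warning;
}

// Behaviour per the ChangeLog: return early on self-copy and do not erase
// the destination's data when the source bit is clear.
void
copy_warning_new (expr &to, const expr &from)
{
  if (&to == &from)
    return;
  if (from.no_warning)
    nowarn_map[to.loc] = nowarn_map[from.loc];
  to.no_warning = from.no_warning;
}
```

In this model, two expressions sharing a location are enough to lose a
suppression: calling copy_warning_old (other, other) on the one without the bit
wipes the data that the one with the bit relies on.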

[PATCH V2]rs6000: Store complicated constant into pool

2022-06-14 Thread Jiufu Guo via Gcc-patches

Hi,

This patch is nearly the same as:
https://gcc.gnu.org/pipermail/gcc-patches/2022-May/595378.html

The concept of this patch is similar to the patches attached to
PR63281, e.g.
https://gcc.gnu.org/bugzilla/attachment.cgi?id=42186

I tested perlbench from SPEC2017.  As expected, at -O2 on P9,
there is a ~2% performance improvement with this patch.

This patch reduces the instruction-count threshold for storing a
constant into the pool and updates the costs of constants and memory
accesses.  If building the constant needs more than 2 instructions (or
more than 1 instruction on P10), then we prefer to load it from the
constant pool.

Bootstrap and regtest pass on ppc64le and ppc64.
Is this ok for trunk?  Thanks for comments and suggestions.


BR,
Jiufu


PR target/63281

gcc/ChangeLog:
2022-06-14  Jiufu Guo  
Alan Modra 

* config/rs6000/rs6000.cc (rs6000_cannot_force_const_mem):
Exclude rtx with code 'HIGH'.
(rs6000_emit_move): Update threshold of const insn.
(rs6000_rtx_costs): Update cost of constant and mem.

gcc/testsuite/ChangeLog:
2022-06-14  Jiufu Guo  
Alan Modra 

* gcc.target/powerpc/medium_offset.c: Update.
* gcc.target/powerpc/pr93012.c: Update.
* gcc.target/powerpc/pr63281.c: New test.


---
 gcc/config/rs6000/rs6000.cc   | 23 +++
 .../gcc.target/powerpc/medium_offset.c|  2 +-
 gcc/testsuite/gcc.target/powerpc/pr63281.c| 11 +
 gcc/testsuite/gcc.target/powerpc/pr93012.c|  2 +-
 4 files changed, 31 insertions(+), 7 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr63281.c

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index cd291f93019..90c91a8e1ea 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -9706,8 +9706,9 @@ rs6000_init_stack_protect_guard (void)
 static bool
 rs6000_cannot_force_const_mem (machine_mode mode ATTRIBUTE_UNUSED, rtx x)
 {
-  if (GET_CODE (x) == HIGH
-  && GET_CODE (XEXP (x, 0)) == UNSPEC)
+  /* Exclude CONSTANT HIGH part.  e.g.
+ (high:DI (symbol_ref:DI ("var") [flags 0xc0] )).  */
+  if (GET_CODE (x) == HIGH)
 return true;
 
   /* A TLS symbol in the TOC cannot contain a sum.  */
@@ -11139,7 +11140,7 @@ rs6000_emit_move (rtx dest, rtx source, machine_mode mode)
&& FP_REGNO_P (REGNO (operands[0])))
   || !CONST_INT_P (operands[1])
   || (num_insns_constant (operands[1], mode)
-  > (TARGET_CMODEL != CMODEL_SMALL ? 3 : 2)))
+  > (TARGET_PREFIXED ? 1 : 2)))
   && !toc_relative_expr_p (operands[1], false, NULL, NULL)
   && (TARGET_CMODEL == CMODEL_SMALL
   || can_create_pseudo_p ()
@@ -22101,6 +22102,14 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
 
 case CONST_DOUBLE:
 case CONST_WIDE_INT:
+  /* It may need a few insns for const to SET. -1 for outer SET code.  */
+  if (outer_code == SET)
+   {
+ *total = COSTS_N_INSNS (num_insns_constant (x, mode)) - 1;
+ return true;
+   }
+  /* FALLTHRU */
+
 case CONST:
 case HIGH:
 case SYMBOL_REF:
@@ -22110,8 +22119,12 @@ rs6000_rtx_costs (rtx x, machine_mode mode, int outer_code,
 case MEM:
   /* When optimizing for size, MEM should be slightly more expensive
 than generating address, e.g., (plus (reg) (const)).
-L1 cache latency is about two instructions.  */
-  *total = !speed ? COSTS_N_INSNS (1) + 1 : COSTS_N_INSNS (2);
+L1 cache latency is about two instructions.
+For prefixed load (pld), we would set it slightly faster than
+two instructions. */
+  *total = !speed
+? COSTS_N_INSNS (1) + 1
+: TARGET_PREFIXED ? COSTS_N_INSNS (2) - 1 : COSTS_N_INSNS (2);
   if (rs6000_slow_unaligned_access (mode, MEM_ALIGN (x)))
*total += COSTS_N_INSNS (100);
   return true;
diff --git a/gcc/testsuite/gcc.target/powerpc/medium_offset.c b/gcc/testsuite/gcc.target/powerpc/medium_offset.c
index f29eba08c38..4889e8fa8ec 100644
--- a/gcc/testsuite/gcc.target/powerpc/medium_offset.c
+++ b/gcc/testsuite/gcc.target/powerpc/medium_offset.c
@@ -1,7 +1,7 @@
 /* { dg-do compile { target { powerpc*-*-* } } } */
 /* { dg-require-effective-target lp64 } */
 /* { dg-options "-O" } */
-/* { dg-final { scan-assembler-not "\\+4611686018427387904" } } */
+/* { dg-final { scan-assembler-times {\msldi|pld\M} 1 } } */
 
 static int x;
 
diff --git a/gcc/testsuite/gcc.target/powerpc/pr63281.c b/gcc/testsuite/gcc.target/powerpc/pr63281.c
new file mode 100644
index 000..469a8f64400
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr63281.c
@@ -0,0 +1,11 @@
+/* PR target/63281 */
+/* { dg-do compile { target lp64 } } */
+/* { dg-options "-O2 -std=c99" } */
+
+void
+foo (unsigned long long *a)
+{
+  *a =

RE: [PATCH 1/2]middle-end Support optimized division by pow2 bitmask

2022-06-14 Thread Richard Biener via Gcc-patches

On Mon, 13 Jun 2022, Tamar Christina wrote:

> > -Original Message-
> > From: Richard Biener 
> > Sent: Monday, June 13, 2022 12:48 PM
> > To: Tamar Christina 
> > Cc: gcc-patches@gcc.gnu.org; nd ; Richard Sandiford
> > 
> > Subject: RE: [PATCH 1/2]middle-end Support optimized division by pow2
> > bitmask
> > 
> > On Mon, 13 Jun 2022, Tamar Christina wrote:
> > 
> > > > -Original Message-
> > > > From: Richard Biener 
> > > > Sent: Monday, June 13, 2022 10:39 AM
> > > > To: Tamar Christina 
> > > > Cc: gcc-patches@gcc.gnu.org; nd ; Richard Sandiford
> > > > 
> > > > Subject: Re: [PATCH 1/2]middle-end Support optimized division by
> > > > pow2 bitmask
> > > >
> > > > On Mon, 13 Jun 2022, Richard Biener wrote:
> > > >
> > > > > On Thu, 9 Jun 2022, Tamar Christina wrote:
> > > > >
> > > > > > Hi All,
> > > > > >
> > > > > > In plenty of image and video processing code it's common to
> > > > > > modify pixel values by a widening operation and then scale them
> > > > > > back into range by dividing by 255.
> > > > > >
> > > > > > This patch adds an optab to allow us to emit an optimized
> > > > > > sequence when doing an unsigned division that is equivalent to:
> > > > > >
> > > > > >    x = y / (2 ^ (bitsize (y)/2) - 1)
> > > > > >
> > > > > > Bootstrapped Regtested on aarch64-none-linux-gnu,
> > > > > > x86_64-pc-linux-gnu and no issues.
> > > > > >
> > > > > > Ok for master?
> > > > >
> > > > > Looking at 2/2 it seems that this is the wrong way to attack the
> > > > > problem.  The ISA doesn't have such instruction so adding an optab
> > > > > looks premature.  I suppose that there's no unsigned vector
> > > > > integer division and thus we open-code that in a different way?
> > > > > > Isn't the correct thing then to fixup that open-coding if it is more
> > > > > > efficient?
> > > >
> > >
> > > The problem is that even if you fixup the open-coding it would need to
> > > be something target specific? The sequence of instructions we generate
> > > don't have a GIMPLE representation.  So whatever is generated I'd have
> > > to fixup in RTL then.
> > 
> > What's the operation that doesn't have a GIMPLE representation?
> 
> For NEON use two operations:
> 1. Add High narrowing lowpart, essentially doing (a +w b) >>.n bitsize(a)/2
>    Where the + widens and the >> narrows.  So you give it two shorts, get a byte
> 2. Add widening add of lowpart so basically lowpart (a +w b)
> 
> For SVE2 we use a different sequence, we use two back-to-back sequences of:
> 1. Add narrow high part (bottom).  In SVE the Top and Bottom instructions select
>    Even and odd elements of the vector rather than "top half" and "bottom half".
> 
>    So this instruction does : Add each vector element of the first source vector to the
>    corresponding vector element of the second source vector, and place the most
>    significant half of the result in the even-numbered half-width destination elements,
>    while setting the odd-numbered elements to zero.
> 
> So there's an explicit permute in there. The instructions are sufficiently different that there
> wouldn't be a single GIMPLE representation.

I see.  Are these also useful to express scalar integer division?

I'll defer to others to ack the special udiv_pow2_bitmask optab
or suggest some piecemeal things other targets might be able to do as
well.  It does look very special.  I'd also bikeshed it to
udiv_pow2m1 since 'bitmask' is less obvious than 2^n-1 (assuming
I interpreted 'bitmask' correctly ;)).  It seems to be even less
general since it is a unary op and the actual divisor is constrained
by the mode itself?

Richard.
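
For readers following the arithmetic, here is a scalar sketch of the 16-bit
case (division by 2^8 - 1 = 255).  The first helper mirrors the "h* 32897"
and ">> 7" steps from the vectorizer dump quoted in this thread; the second is
an add-and-take-high-half form in the spirit of the NEON sequence described
above.  The constant 257 and all function names are assumptions for
illustration and are not taken from the patch:

```cpp
#include <cstdint>

// Reference: plain unsigned division by 2^(16/2) - 1 = 255.
inline std::uint16_t
div255_ref (std::uint16_t y)
{
  return y / 255;
}

// Highpart multiply by 32897 followed by a shift by 7, i.e. the scalar
// equivalent of "h* 32897" and ">> 7" in the vectorizer dump.
inline std::uint16_t
div255_mulh (std::uint16_t y)
{
  std::uint32_t hi = (static_cast<std::uint32_t> (y) * 32897u) >> 16;
  return static_cast<std::uint16_t> (hi >> 7);
}

// Two adds, each followed by taking the high half:
// (y + ((y + 257) >> 8)) >> 8, computed in 32 bits to avoid overflow.
inline std::uint16_t
div255_addhn (std::uint16_t y)
{
  std::uint32_t t = (static_cast<std::uint32_t> (y) + 257u) >> 8;
  return static_cast<std::uint16_t> ((y + t) >> 8);
}

// Exhaustively compare both forms against the reference for all 16-bit
// inputs.
inline bool
div255_forms_agree ()
{
  for (std::uint32_t i = 0; i <= 0xffffu; ++i)
    {
      std::uint16_t y = static_cast<std::uint16_t> (i);
      if (div255_mulh (y) != div255_ref (y)
	  || div255_addhn (y) != div255_ref (y))
	return false;
    }
  return true;
}
```

The add-based form needs no multiplier and no magic constant beyond 257, which
is why matching on the exact 32897 multiplier in the open-coded sequence is
fragile.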

> > 
> > I think for costing you could resort to the *_cost functions as used by
> > synth_mult and friends.
> > 
> > > The problem with this is that it seemed fragile. We generate from the
> > > Vectorizer:
> > >
> > >   vect__3.8_35 = MEM  [(uint8_t *)_21];
> > >   vect_patt_28.9_37 = WIDEN_MULT_LO_EXPR  > vect_cst__36>;
> > >   vect_patt_28.9_38 = WIDEN_MULT_HI_EXPR  > vect_cst__36>;
> > >   vect_patt_19.10_40 = vect_patt_28.9_37 h* { 32897, 32897, 32897, 32897, 32897, 32897, 32897, 32897 };
> > >   vect_patt_19.10_41 = vect_patt_28.9_38 h* { 32897, 32897, 32897, 32897, 32897, 32897, 32897, 32897 };
> > >   vect_patt_25.11_42 = vect_patt_19.10_40 >> 7;
> > >   vect_patt_25.11_43 = vect_patt_19.10_41 >> 7;
> > >   vect_patt_11.12_44 = VEC_PACK_TRUNC_EXPR  > > vect_patt_25.11_43>;
> > >
> > > and if the magic constants change then we miss the optimization. I
> > > could rewrite the open coding to use shifts alone, but that might be a
> > > regression for some uarches I would imagine.
> > 
> > OK, so you do have a highpart multiply.  I suppose the pattern is too deep to
> > be recognized by combine?  What's the RTL good vs. bad before combine of
> > one of the expressions?
> 
> Yeah combine only tries 2-3 instructions, but to use these sequences we have to
> match the entire chain as the instructions do the

Re: [PATCH][WIP] have configure probe prefix for gmp/mpfr/mpc [PR44425]

2022-06-14 Thread Richard Biener via Gcc-patches

On Mon, Jun 13, 2022 at 4:27 PM Eric Gallager  wrote:
>
> On Mon, Jun 13, 2022 at 7:02 AM Richard Biener
>  wrote:
> >
> > On Thu, Jun 2, 2022 at 5:54 PM Eric Gallager via Gcc-patches
> >  wrote:
> > >
> > > So, I'm working on fixing PR bootstrap/44425, and have this patch to
> > > have the top-level configure script check in the value passed to
> > > `--prefix=` when looking for gmp/mpfr/mpc. It "works" (in that
> > > configuring with just `--prefix=` and none of
> > > `--with-gmp=`/`--with-mpfr=`/`--with-mpc=` now works where it failed
> > > before), but unfortunately it results in a bunch of duplicated
> > > `-I`/`-L` flags stuck in ${gmplibs} and ${gmpinc}... is that
> > > acceptable or should I try another approach?
> >
> > --prefix is documented as being used for installing (architecture
> > independent) files; I'm not sure it is a good idea to probe that for
> > gmp/mpfr/mpc installs used for the host.
> >
> > Richard.
> >
> > > Eric
>
> So... I guess that means we should close bug 44425 as INVALID or
> WONTFIX then? https://gcc.gnu.org/bugzilla/show_bug.cgi?id=44425

That would be my reaction, yes.

[Bug c++/105838] [10/11/12/13 Regression] g++ 12.1.0 runs out of memory or time when building const std::vector of std::strings

2022-06-14 Thread rguenth at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105838

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at gcc dot gnu.org

--- Comment #5 from Richard Biener  ---
btw, the unincluded testcase ended up too small, not matching the posted
numbers (I had to hit reload and cut it further at that point ...).

[Bug c++/105838] [10/11/12/13 Regression] g++ 12.1.0 runs out of memory or time when building const std::vector of std::strings

2022-06-14 Thread rguenth at gcc dot gnu.org via Gcc-bugs

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105838

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|g++ 12.1.0 runs out of      |[10/11/12/13 Regression]
                   |memory or time when         |g++ 12.1.0 runs out of
                   |building const std::vector  |memory or time when
                   |of std::strings             |building const std::vector
                   |                            |of std::strings
             Blocks|                            |93199
   Target Milestone|---                         |10.4
           Priority|P3                          |P2
                 CC|                            |ebotcazou at gcc dot gnu.org,
                   |                            |rguenth at gcc dot gnu.org

--- Comment #4 from Richard Biener  ---
Memory usage is from cleanup_empty_eh_merge_phis which deals with a very large
number of incoming edges, recording the edge/var mappings.  This likely runs
into

  /* The post-order traversal may lead to quadraticness in the redirection
 of incoming EH edges from inner LPs, so first try to walk the region
 tree from inner to outer LPs in order to eliminate these edges.  */

where we end up redirecting more and more edges again and again.  Still the
peak memory use is odd, but it might simply be GC garbage piling up in the
CFG manipulation odyssey.

It's the removal of must_not_throw (MNT) regions - with just 3 elements we go in ehcleanup1 from

Before removal of unreachable regions:
Eh tree:
   25 must_not_throw
   1 cleanup land:{12,}
 24 cleanup
 23 must_not_throw
 2 cleanup land:{11,}
   22 must_not_throw
   3 cleanup land:{10,}
 21 must_not_throw
 4 cleanup land:{9,}
   20 must_not_throw
   5 cleanup land:{1,}
 19 must_not_throw
 6 cleanup land:{8,}
   18 must_not_throw
   7 cleanup land:{2,}
 17 must_not_throw
 8 cleanup land:{7,}
   16 must_not_throw
   9 cleanup land:{3,}
 15 must_not_throw
 10 cleanup land:{6,}
   14 must_not_throw
   11 cleanup land:{5,}
 13 must_not_throw
 12 cleanup land:{4,}

to

After removal of unreachable regions:
Eh tree:
   1 cleanup land:{12,}
 2 cleanup land:{11,}
   3 cleanup land:{10,}
 4 cleanup land:{9,}
   5 cleanup land:{1,}
 6 cleanup land:{8,}
   7 cleanup land:{2,}
 8 cleanup land:{7,}
   9 cleanup land:{3,}
 10 cleanup land:{6,}
   11 cleanup land:{5,}
 12 cleanup land:{4,}

but we do this in a sub-optimal order.  Axing the first walk:

  for (i = vec_safe_length (cfun->eh->lp_array) - 1; i >= 1; --i)
    {
      lp = (*cfun->eh->lp_array)[i];
      if (lp)
        changed |= cleanup_empty_eh (lp);
    }

fixes this but it will go against the PR93199 fix in r10-5868-g5eaf0c498f718f,
which the followup r11-3234-gaab6194d0898f5 preserved.  I fear the
optimal order is different for the clobber optimizations and the edge
redirection overhead.

In any case a fix should be evaluated against the PR93199 testcase as well.


Referenced Bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93199
[Bug 93199] [9 Regression] Compile time hog in sink_clobbers
