[PATCH] tree-optimization/107865 - ICE with outlining of loops

2022-11-24 Thread Richard Biener via Gcc-patches
The following makes sure to clear a loop's number of iterations when
outlining it as part of a SESE region, as can happen with
auto-parallelization.  The referenced SSA names become stale otherwise.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

PR tree-optimization/107865
* tree-cfg.cc (move_sese_region_to_fn): Free the number of
iterations of moved loops.

* gfortran.dg/graphite/pr107865.f90: New testcase.
---
 .../gfortran.dg/graphite/pr107865.f90  | 18 ++
 gcc/tree-cfg.cc|  2 ++
 2 files changed, 20 insertions(+)
 create mode 100644 gcc/testsuite/gfortran.dg/graphite/pr107865.f90

diff --git a/gcc/testsuite/gfortran.dg/graphite/pr107865.f90 b/gcc/testsuite/gfortran.dg/graphite/pr107865.f90
new file mode 100644
index 000..6bddb17a1be
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/graphite/pr107865.f90
@@ -0,0 +1,18 @@
+! { dg-do compile }
+! { dg-options "-O1 -floop-parallelize-all -ftree-parallelize-loops=2" }
+
+  SUBROUTINE FNC (F)
+
+  IMPLICIT REAL (A-H)
+  DIMENSION F(N)
+
+  DO I = 1, 6
+     DO J = 1, 6
+        IF (J .NE. I) THEN
+           F(I) = F(I) + 1
+        END IF
+     END DO
+  END DO
+
+  RETURN
+  END
diff --git a/gcc/tree-cfg.cc b/gcc/tree-cfg.cc
index 28175312afc..0c409b435fb 100644
--- a/gcc/tree-cfg.cc
+++ b/gcc/tree-cfg.cc
@@ -7859,6 +7859,8 @@ move_sese_region_to_fn (struct function *dest_cfun, basic_block entry_bb,
       if (bb->loop_father->header == bb)
         {
           class loop *this_loop = bb->loop_father;
+          /* Avoid the need to remap SSA names used in nb_iterations.  */
+          free_numbers_of_iterations_estimates (this_loop);
           class loop *outer = loop_outer (this_loop);
           if (outer == loop
               /* If the SESE region contains some bbs ending with
-- 
2.35.3


Re: Please, really, make `-masm=intel` the default for x86

2022-11-24 Thread LIU Hao via Gcc

On 2022/11/25 15:37, Hi-Angel wrote:

Why? A default is merely a default. I don't really see why changing
that should help you specifically. A decision "which assembly syntax
to use" is one that makes a project like ones you're contributing to,
not GCC. If they decided to use AT&T syntax, they won't switch to
Intel just because a compiler toolchain has changed their default.



There's a lot more than that. The AT&T syntax usually surprises people; and more importantly,
miserable beginners on GCC inline assembly can't start learning from the official Intel
documentation, but have to learn from some non-standard, insane and incompatible dialect. That's
just too unfortunate.


The AT&T syntax should really die out, but if it is kept the default, that is never going to happen.



If you care specifically about the projects you are contributing to,
then those are the ones whom you need to convince to switch to "intel"
assembly syntax, not the GCC developers. Because as I said, changing a
default in GCC will hardly make any change to those other projects.


That is a poor reason for putting up with a piece of 50-year-old evilness and refusing to move 
forward. Upgrading the compiler is always a big change, and updating sources should be expected, 
when we take `-Werror` into account.



--
Best regards,
LIU Hao





Re: [PATCH-1, rs6000] Generate permute index directly for little endian target [PR100866]

2022-11-24 Thread Kewen.Lin via Gcc-patches
Hi Haochen,

Sorry for the late review.

on 2022/10/11 15:38, HAO CHEN GUI wrote:
> Hi,
>   This patch modifies the helper function which generates permute index for
> vector byte reversion and generates permute index directly for little endian
> targets. It saves one "xxlnor" instruction on P8 little endian targets as
> the original process needs an "xxlnor" to calculate complement for the index.
> 

Nice.

> Bootstrapped and tested on ppc64 Linux BE and LE with no regressions.
> Is this okay for trunk? Any recommendations? Thanks a lot.
> 
> ChangeLog
> 2022-10-11  Haochen Gui 
> 
> gcc/
>   PR target/100866
>   * config/rs6000/rs6000-call.cc (swap_endian_selector_for_mode):
>   Generate permute index directly for little endian targets.
>   * config/rs6000/vsx.md (revb_): Call vperm directly with
>   corresponding permute indexes.
> 
> gcc/testsuite/
>   PR target/100866
>   * gcc.target/powerpc/pr100866.c: New.
> 
> patch.diff
> diff --git a/gcc/config/rs6000/rs6000-call.cc b/gcc/config/rs6000/rs6000-call.cc
> index 551968b0995..bad8e9e0e52 100644
> --- a/gcc/config/rs6000/rs6000-call.cc
> +++ b/gcc/config/rs6000/rs6000-call.cc
> @@ -2839,7 +2839,10 @@ swap_endian_selector_for_mode (machine_mode mode)
>  }
> 
>for (i = 0; i < 16; ++i)
> -perm[i] = GEN_INT (swaparray[i]);
> +if (BYTES_BIG_ENDIAN)
> +  perm[i] = GEN_INT (swaparray[i]);
> +else
> +  perm[i] = GEN_INT (~swaparray[i] & 0x001f);

IMHO, it would be good to add a function comment for this function,
it's sad that we didn't have it before.  With this patch, the selector (perm) is
expected to be used with vperm direct as shown below, it would be good to note it
explicitly for other potential callers too.
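
To see why the complemented selector can be produced without the extra instruction: for a 5-bit index the bitwise complement is the same as subtracting from 31, so the little-endian selector can be computed when the constant vector is built rather than with a run-time "xxlnor". A minimal sketch in plain C, not the rs6000 code itself; the names le_selector_index and complement_matches_reversal are made up for illustration:

```c
#include <stdint.h>

/* Hypothetical stand-in for the little-endian branch of the patch:
   complement a vperm selector byte and keep the low five bits.  */
static uint8_t le_selector_index (uint8_t be_index)
{
  return (uint8_t) (~be_index & 0x1f);
}

/* For any index in 0..31 the result equals 31 - index.  Returns 1 when
   that holds for all 32 values, 0 otherwise.  */
static int complement_matches_reversal (void)
{
  for (unsigned i = 0; i < 32; ++i)
    if (le_selector_index ((uint8_t) i) != 31 - i)
      return 0;
  return 1;
}
```

Calling complement_matches_reversal () is a quick way to convince oneself that "~x & 0x1f" and "31 - x" agree over the whole selector range.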

> 
>return force_reg (V16QImode, gen_rtx_CONST_VECTOR (V16QImode,
>gen_rtvec_v (16, perm)));
> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index e226a93bbe5..b68eba48d2c 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -6096,8 +6096,8 @@ (define_expand "revb_"
>to the endian mode in use, i.e. in LE mode, put elements
>in BE order.  */
>rtx sel = swap_endian_selector_for_mode(mode);
> -  emit_insn (gen_altivec_vperm_ (operands[0], operands[1],
> -operands[1], sel));
> +  emit_insn (gen_altivec_vperm__direct (operands[0], operands[1],
> +   operands[1], sel));
>  }
> 
>DONE;
> diff --git a/gcc/testsuite/gcc.target/powerpc/pr100866.c b/gcc/testsuite/gcc.target/powerpc/pr100866.c
> new file mode 100644
> index 000..c708dfd502e
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr100866.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target powerpc_p8vector_ok } */
> +/* { dg-options "-O2 -mdejagnu-cpu=power8" } */
> +/* { dg-final { scan-assembler-not "xxlnor" } } */

Nit: may be better with {\mxxlnor\M}?

The others look good to me.  Thanks!

BR,
Kewen

> +
> +#include 
> +
> +vector unsigned short revb(vector unsigned short a)
> +{
> +   return vec_revb(a);
> +}
> 




Re: Please, really, make `-masm=intel` the default for x86

2022-11-24 Thread Dave Blanchard


On Fri, 25 Nov 2022 at 09:40, LIU Hao via Gcc  wrote:
>> One annoying thing about GCC is that, for x86, if I need to write a piece of inline assembly then I
>> have to do it twice: once in AT&T syntax and once in Intel syntax.

> Why? A default is merely a default. I don't really see why changing
> that should help you specifically. A decision "which assembly syntax
> to use" is one that makes a project like ones you're contributing to,
> not GCC. If they decided to use AT&T syntax, they won't switch to
> Intel just because a compiler toolchain has changed their default.

While I sympathize with the desire to get rid of crud (and I agree that AT&T 
syntax is crud), as stated above it wouldn't really make a practical 
difference. For distro maintainers it would likely break some/many older 
packages which assumed the old default behavior, thus requiring a number of 
patches. Usually not a big deal in and of itself (though it can be if the build 
system for that package is particularly junky), but when you consider there are 
so many packages including GCC always deprecating and changing things, it adds 
up to a lot of work to keep up with it all.

-- 
Dave Blanchard 


[Bug tree-optimization/106923] [13 Regression] ICE in eliminate_unnecessary_stmts, at tree-ssa-dce.cc:1512 since r13-2518-ga262f969d6fd936f

2022-11-24 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106923

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|unassigned at gcc dot gnu.org  |hubicka at gcc dot gnu.org
             Status|NEW                         |ASSIGNED

Re: Please, really, make `-masm=intel` the default for x86

2022-11-24 Thread Hi-Angel via Gcc
On Fri, 25 Nov 2022 at 09:40, LIU Hao via Gcc  wrote:
> One annoying thing about GCC is that, for x86, if I need to write a piece of inline assembly then I
> have to do it twice: once in AT&T syntax and once in Intel syntax.

Why? A default is merely a default. I don't really see why changing
that should help you specifically. A decision "which assembly syntax
to use" is one that makes a project like ones you're contributing to,
not GCC. If they decided to use AT&T syntax, they won't switch to
Intel just because a compiler toolchain has changed their default.

If you care specifically about the projects you are contributing to,
then those are the ones whom you need to convince to switch to "intel"
assembly syntax, not the GCC developers. Because as I said, changing a
default in GCC will hardly make any change to those other projects.


[Bug tree-optimization/107865] [12/13 Regression] ICE in verify_loop_structure, at cfgloop.cc:1748 (Error: loop 3's number of iterations '_61 > 0 ? (uint128_t) (_61 + -1) : 0' references the released

2022-11-24 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107865

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
     Ever confirmed|0                           |1
           Keywords|                            |ice-checking
           Assignee|unassigned at gcc dot gnu.org  |rguenth at gcc dot gnu.org
   Last reconfirmed|                            |2022-11-25
             Status|UNCONFIRMED                 |ASSIGNED
   Target Milestone|---                         |12.3

--- Comment #1 from Richard Biener  ---
Testing a patch.

Please, really, make `-masm=intel` the default for x86

2022-11-24 Thread LIU Hao via Gcc
I am a Windows developer and I have been writing x86 and amd64 assembly for more than ten years. One
annoying thing about GCC is that, for x86, if I need to write a piece of inline assembly then I have
to do it twice: once in AT&T syntax and once in Intel syntax.



The AT&T syntax is an awkward foreign dialect, designed originally for PDP-11 and spoken by bumpkins
that knew little about x86 or ARM. No official Intel or AMD documentation ever adopts it. The syntax
is terrible. Consider:


   movl $1, %eax  ; k; moves $1 into EAX
  ; but in high-level languages we expect '%eax = $1',
  ; so it goes awkwardly backwards.

If this looks fine to you, please re-consider:

  cmpl $1, %eax
  jg .L1  ; does this mean 'jump if $1 is greater than %eax'
  ; or something stupidly reversed?
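
For reference, GCC's extended asm accepts a `{att|intel}` alternative form inside the template, so one statement can be assembled under either `-masm=` setting. The sketch below uses it to answer the CMP question: both spellings set the flags from "x - 1", so the greater-than condition means "x is greater than 1". It assumes an x86 target and a GCC-compatible compiler; the function name greater_than_one is made up for illustration, and other targets fall back to plain C.

```c
/* Returns nonzero when x > 1 (signed comparison).  */
static int greater_than_one (int x)
{
#if defined (__GNUC__) && (defined (__x86_64__) || defined (__i386__))
  unsigned char result;
  /* Inside {att|intel} the compiler emits the alternative that matches
     the selected dialect.  AT&T "cmpl $1, x" and Intel "cmp x, 1" both
     compute the flags of x - 1, so SETG stores 1 exactly when x > 1.  */
  __asm__ ("{cmpl $1, %1|cmp %1, 1}\n\t"
           "setg %0"
           : "=q" (result)
           : "r" (x)
           : "cc");
  return result;
#else
  /* Assumption: on non-x86 targets the plain C comparison stands in.  */
  return x > 1;
#endif
}
```

The same template builds with `-masm=att` and `-masm=intel`; only the instructions that differ between the dialects need the braces.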

If CMP still looks fine to you, please consider how to write VFMADD231PD in AT&T syntax, really.


I have been tired of such inconsistency. For God's sake, please deprecate it.


--
Best regards,
LIU Hao




[Bug target/99889] Add powerpc ELFv1 support for -fpatchable-function-entry* with "o" sections

2022-11-24 Thread linkw at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99889

Kewen Lin  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|ASSIGNED                    |RESOLVED

--- Comment #4 from Kewen Lin  ---
Should be fixed on trunk.

[Bug target/107863] [10/11/12/13 Regression] ICE with unrecognizable insn when using -funsigned-char with some SSE/AVX builtins

2022-11-24 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107863

--- Comment #8 from Hongtao.liu  ---
(In reply to Hongtao.liu from comment #7)
> > -  if (width < HOST_BITS_PER_WIDE_INT)
> > +  if (width < HOST_BITS_PER_WIDE_INT
> > +  && (mode != QImode || !flag_signed_char))
> typo should be 
> +  && (mode != QImode || flag_signed_char))

I guess not, flag_signed_char is not an exact map to QImode.

[PATCH V3] [x86] Fix incorrect _mm_cvtsbh_ss.

2022-11-24 Thread liuhongt via Gcc-patches
Update in V3:
Remove !flag_signaling_nans since there's already HONOR_NANS (BFmode).

Here's the patch:

After supporting real __bf16, the implementation of _mm_cvtsbh_ss went
wrong.

The patch adds a builtin to generate pslld for the intrinsic; also
extendbfsf2 is supported with pslld when !HONOR_NANS (BFmode).

truncsfbf2 is supported with vcvtneps2bf16 when
!HONOR_NANS (BFmode) && flag_unsafe_math_optimizations.
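
For readers unfamiliar with the format: a bfloat16 value has the same layout as the upper 16 bits of an IEEE single, so widening it is a 16-bit left shift of the raw bits, which is what pslld does here. The sketch below shows that bit-level operation in portable C, assuming the input is supplied as its raw 16-bit pattern; bf16_bits_to_float is a hypothetical helper, not part of the patch.

```c
#include <stdint.h>
#include <string.h>

/* Widen a bfloat16 bit pattern to float by placing it in the upper
   half of a 32-bit word and reinterpreting the result.  */
static float bf16_bits_to_float (uint16_t bits)
{
  uint32_t wide = (uint32_t) bits << 16;
  float out;
  memcpy (&out, &wide, sizeof out);   /* bit copy, not a value conversion */
  return out;
}
```

The removed intrinsic body applied an integer cast to a `__bf16` value; once `__bf16` is a real floating-point type, that cast presumably converts the value numerically instead of reinterpreting its bits, which is why a dedicated builtin is used instead.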

gcc/ChangeLog:

PR target/107748
* config/i386/avx512bf16intrin.h (_mm_cvtsbh_ss): Refined.
* config/i386/i386-builtin-types.def (FLOAT_FTYPE_BFLOAT16):
New function type.
* config/i386/i386-builtin.def (BDESC): New builtin.
* config/i386/i386-expand.cc (ix86_expand_args_builtin):
Handle the builtin.
* config/i386/i386.md (extendbfsf2): New expander.
(extendbfsf2_1): New define_insn.
(truncsfbf2): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/avx512bf16-cvtsbh2ss-1.c: Scan pslld.
* gcc.target/i386/extendbfsf.c: New test.
---
 gcc/config/i386/avx512bf16intrin.h|  4 +-
 gcc/config/i386/i386-builtin-types.def|  1 +
 gcc/config/i386/i386-builtin.def  |  2 +
 gcc/config/i386/i386-expand.cc|  1 +
 gcc/config/i386/i386.md   | 40 ++-
 .../gcc.target/i386/avx512bf16-cvtsbh2ss-1.c  |  3 +-
 gcc/testsuite/gcc.target/i386/extendbfsf.c| 16 
 7 files changed, 61 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/i386/extendbfsf.c

diff --git a/gcc/config/i386/avx512bf16intrin.h b/gcc/config/i386/avx512bf16intrin.h
index ea1d0125b3f..75378af5584 100644
--- a/gcc/config/i386/avx512bf16intrin.h
+++ b/gcc/config/i386/avx512bf16intrin.h
@@ -46,9 +46,7 @@ extern __inline float
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm_cvtsbh_ss (__bf16 __A)
 {
-  union{ float a; unsigned int b;} __tmp;
-  __tmp.b = ((unsigned int)(__A)) << 16;
-  return __tmp.a;
+  return __builtin_ia32_cvtbf2sf (__A);
 }
 
 /* vcvtne2ps2bf16 */
diff --git a/gcc/config/i386/i386-builtin-types.def b/gcc/config/i386/i386-builtin-types.def
index d10de32643f..65fe070e37f 100644
--- a/gcc/config/i386/i386-builtin-types.def
+++ b/gcc/config/i386/i386-builtin-types.def
@@ -1281,6 +1281,7 @@ DEF_FUNCTION_TYPE (V4SI, V4SI, V4SI, UHI)
 DEF_FUNCTION_TYPE (V8SI, V8SI, V8SI, UHI)
 
 # BF16 builtins
+DEF_FUNCTION_TYPE (FLOAT, BFLOAT16)
 DEF_FUNCTION_TYPE (V32BF, V16SF, V16SF)
 DEF_FUNCTION_TYPE (V32BF, V16SF, V16SF, V32BF, USI)
 DEF_FUNCTION_TYPE (V32BF, V16SF, V16SF, USI)
diff --git a/gcc/config/i386/i386-builtin.def b/gcc/config/i386/i386-builtin.def
index 5e0461acc00..d85b1753039 100644
--- a/gcc/config/i386/i386-builtin.def
+++ b/gcc/config/i386/i386-builtin.def
@@ -2838,6 +2838,8 @@ BDESC (0, OPTION_MASK_ISA2_AVX512BF16, 
CODE_FOR_avx512f_dpbf16ps_v8sf_maskz, "__
 BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_dpbf16ps_v4sf, "__builtin_ia32_dpbf16ps_v4sf", IX86_BUILTIN_DPBF16PS_V4SF, UNKNOWN, (int) V4SF_FTYPE_V4SF_V8BF_V8BF)
 BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_dpbf16ps_v4sf_mask, "__builtin_ia32_dpbf16ps_v4sf_mask", IX86_BUILTIN_DPBF16PS_V4SF_MASK, UNKNOWN, (int) V4SF_FTYPE_V4SF_V8BF_V8BF_UQI)
 BDESC (0, OPTION_MASK_ISA2_AVX512BF16, CODE_FOR_avx512f_dpbf16ps_v4sf_maskz, "__builtin_ia32_dpbf16ps_v4sf_maskz", IX86_BUILTIN_DPBF16PS_V4SF_MASKZ, UNKNOWN, (int) V4SF_FTYPE_V4SF_V8BF_V8BF_UQI)
+BDESC (OPTION_MASK_ISA_SSE2, 0, CODE_FOR_extendbfsf2_1, "__builtin_ia32_cvtbf2sf", IX86_BUILTIN_CVTBF2SF, UNKNOWN, (int) FLOAT_FTYPE_BFLOAT16)
+
 
 /* AVX512FP16.  */
 BDESC (OPTION_MASK_ISA_AVX512VL, OPTION_MASK_ISA2_AVX512FP16, CODE_FOR_addv8hf3_mask, "__builtin_ia32_addph128_mask", IX86_BUILTIN_ADDPH128_MASK, UNKNOWN, (int) V8HF_FTYPE_V8HF_V8HF_V8HF_UQI)
diff --git a/gcc/config/i386/i386-expand.cc b/gcc/config/i386/i386-expand.cc
index 0373c3614a4..d26e7e41445 100644
--- a/gcc/config/i386/i386-expand.cc
+++ b/gcc/config/i386/i386-expand.cc
@@ -10423,6 +10423,7 @@ ix86_expand_args_builtin (const struct builtin_description *d,
   return ix86_expand_sse_ptest (d, exp, target);
 case FLOAT128_FTYPE_FLOAT128:
 case FLOAT_FTYPE_FLOAT:
+case FLOAT_FTYPE_BFLOAT16:
 case INT_FTYPE_INT:
 case UINT_FTYPE_UINT:
 case UINT16_FTYPE_UINT16:
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 01faa911b77..9451883396c 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -130,6 +130,7 @@ (define_c_enum "unspec" [
   ;; For AVX/AVX512F support
   UNSPEC_SCALEF
   UNSPEC_PCMP
+  UNSPEC_CVTBFSF
 
   ;; Generic math support
   UNSPEC_IEEE_MIN  ; not commutative
@@ -4961,6 +4962,31 @@ (define_insn "*extendhf2"
(set_attr "prefix" "evex")
(set_attr "mode" "")])
 
+(define_expand "extendbfsf2"
+  [(set (match_operand:SF 0 "register_operand")
+   (unspec:SF
+ [(match_operand:BF 1 "register_operand")]
+

[Bug target/107863] [10/11/12/13 Regression] ICE with unrecognizable insn when using -funsigned-char with some SSE/AVX builtins

2022-11-24 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107863

--- Comment #7 from Hongtao.liu  ---

> -  if (width < HOST_BITS_PER_WIDE_INT)
> +  if (width < HOST_BITS_PER_WIDE_INT
> +  && (mode != QImode || !flag_signed_char))
typo should be 
+  && (mode != QImode || flag_signed_char))

[Bug target/107863] [10/11/12/13 Regression] ICE with unrecognizable insn when using -funsigned-char with some SSE/AVX builtins

2022-11-24 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107863

--- Comment #6 from Hongtao.liu  ---
For pattern
(set (reg:QI 607)
(const_int 255 [0xff]))

general_operand returns false for op const_int 255 in QImode, since
trunc_int_for_mode (INTVAL (op), mode) returns -1 while INTVAL (op) is 255.

---cut from general_operand (rtx, machine_mode)--
  if (CONST_INT_P (op)
  && mode != VOIDmode
  && trunc_int_for_mode (INTVAL (op), mode) != INTVAL (op))
return false;
---cut end-


and trunc_int_for_mode sign-extends rather than zero-extends, even for
!flag_signed_char.

cut from trunc_int_for_mode
  /* Sign-extend for the requested mode.  */

  if (width < HOST_BITS_PER_WIDE_INT)
{
  HOST_WIDE_INT sign = 1;
  sign <<= width - 1;
  c &= (sign << 1) - 1;
  c ^= sign;
  c -= sign;
}

  return c;
--cut end--
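
To make the mismatch concrete, here is a standalone sketch of the sign-extension step quoted above, together with the comparison general_operand performs. It is an illustration under stated assumptions, not the GCC source: the function names are made up, the width is passed directly instead of being derived from a machine mode, and unsigned intermediates are used to sidestep signed-overflow concerns.

```c
#include <stdint.h>

/* Sign-extend C from WIDTH bits (8 for QImode).  WIDTH must be at
   least 1; values of 64 or more leave C unchanged.  */
static int64_t sign_extend_to_width (int64_t c, unsigned width)
{
  if (width < 64)
    {
      uint64_t sign = (uint64_t) 1 << (width - 1);
      uint64_t u = (uint64_t) c & ((sign << 1) - 1);  /* keep low WIDTH bits */
      u ^= sign;
      u -= sign;                                      /* propagate the sign bit */
      return (int64_t) u;
    }
  return c;
}

/* Mirrors the check in general_operand: the constant is accepted only
   if truncating it to the mode leaves it unchanged.  */
static int constant_fits_mode (int64_t value, unsigned width)
{
  return sign_extend_to_width (value, width) == value;
}
```

With a width of 8, the value 255 becomes -1, so constant_fits_mode (255, 8) is false, whereas the canonical form -1 passes.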


Should we do something like 


modified   gcc/explow.cc
@@ -64,7 +64,8 @@ trunc_int_for_mode (HOST_WIDE_INT c, machine_mode mode)

   /* Sign-extend for the requested mode.  */

-  if (width < HOST_BITS_PER_WIDE_INT)
+  if (width < HOST_BITS_PER_WIDE_INT
+  && (mode != QImode || !flag_signed_char))
 {
   HOST_WIDE_INT sign = 1;
   sign <<= width - 1;

Re: [PATCH V2] Update block move for struct param or returns

2022-11-24 Thread Jiufu Guo via Gcc-patches


Based on the discussions in previous mails:
https://gcc.gnu.org/pipermail/gcc-patches/2022-November/607139.html
https://gcc.gnu.org/pipermail/gcc-patches/2022-November/607197.html

I will update the patch accordingly, and then submit a new version.

BR,
Jeff (Jiufu)

Jiufu Guo  writes:

> Hi,
>
> When assigning a parameter to a variable, or assigning a variable to
> return value with struct type, "block move" is used to expand
> the assignment. It would be better to use the register mode according
> to the target/ABI to move the blocks. And then this would raise more
> opportunities for other optimization passes (cse/dse/xprop).
>
> As the example code (like code in PR65421):
>
> typedef struct SA {double a[3];} A;
> A ret_arg_pt (A *a){return *a;} // on ppc64le, only 3 lfd(s)
> A ret_arg (A a) {return a;} // just empty fun body
> void st_arg (A a, A *p) {*p = a;} //only 3 stfd(s)
>
> This patch is based on the previous version which supports assignments
> from parameter:
> https://gcc.gnu.org/pipermail/gcc-patches/2022-November/605709.html
> This patch also supports returns.
>
> I also tried to update gimplify/nrv to replace "return D.xxx;" with
> "return ;". While there is one issue: "" with
> PARALLEL code can not be accessed through address/component_ref.
> This issue blocks a few passes (e.g. sra, expand).
>
> On ppc64, some dead stores are not eliminated. e.g. for ret_arg:
> .cfi_startproc
> std 4,56(1) // redundant
> std 5,64(1) // redundant
> std 6,72(1) // redundant
> std 4,0(3)
> std 5,8(3)
> std 6,16(3)
> blr
>
> Bootstrapped and regtested on ppc64le and x86_64.
>
> I'm wondering if this patch could be committed first.
> Thanks for the comments and suggestions.
>
>
> BR,
> Jeff (Jiufu)
>
>   PR target/65421
>
> gcc/ChangeLog:
>
>   * cfgexpand.cc (expand_used_vars): Add collecting return VARs.
>   (expand_gimple_stmt_1): Call expand_special_struct_assignment.
>   (pass_expand::execute): Free collections of return VARs.
>   * expr.cc (expand_special_struct_assignment): New function.
>   * expr.h (expand_special_struct_assignment): Declare.
>
> gcc/testsuite/ChangeLog:
>
>   * gcc.target/powerpc/pr65421-1.c: New test.
>   * gcc.target/powerpc/pr65421.c: New test.
>
> ---
>  gcc/cfgexpand.cc | 37 +
>  gcc/expr.cc  | 43 
>  gcc/expr.h   |  3 ++
>  gcc/testsuite/gcc.target/powerpc/pr65421-1.c | 21 ++
>  gcc/testsuite/gcc.target/powerpc/pr65421.c   | 19 +
>  5 files changed, 123 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr65421-1.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/pr65421.c
>
> diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
> index dd29c03..f185de39341 100644
> --- a/gcc/cfgexpand.cc
> +++ b/gcc/cfgexpand.cc
> @@ -341,6 +341,9 @@ static hash_map *decl_to_stack_part;
> all of them in one big sweep.  */
>  static bitmap_obstack stack_var_bitmap_obstack;
>  
> +/* Those VARs on returns.  */
> +static bitmap return_vars;
> +
>  /* An array of indices such that stack_vars[stack_vars_sorted[i]].size
> is non-decreasing.  */
>  static size_t *stack_vars_sorted;
> @@ -2158,6 +2161,24 @@ expand_used_vars (bitmap forced_stack_vars)
>  frame_phase = off ? align - off : 0;
>}
>  
> +  /* Collect VARs on returns.  */
> +  return_vars = NULL;
> +  if (DECL_RESULT (current_function_decl)
> +  && TYPE_MODE (TREE_TYPE (DECL_RESULT (current_function_decl))) == BLKmode)
> +{
> +  return_vars = BITMAP_ALLOC (NULL);
> +
> +  edge_iterator ei;
> +  edge e;
> +  FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)
> + if (greturn *ret = safe_dyn_cast (last_stmt (e->src)))
> +   {
> + tree val = gimple_return_retval (ret);
> + if (val && VAR_P (val))
> +   bitmap_set_bit (return_vars, DECL_UID (val));
> +   }
> +}
> +
>/* Set TREE_USED on all variables in the local_decls.  */
>FOR_EACH_LOCAL_DECL (cfun, i, var)
>  TREE_USED (var) = 1;
> @@ -3942,6 +3963,17 @@ expand_gimple_stmt_1 (gimple *stmt)
> /* This is a clobber to mark the going out of scope for
>this LHS.  */
> expand_clobber (lhs);
> + else if ((TREE_CODE (rhs) == PARM_DECL && DECL_INCOMING_RTL (rhs)
> +   && TYPE_MODE (TREE_TYPE (rhs)) == BLKmode
> +   && (GET_CODE (DECL_INCOMING_RTL (rhs)) == PARALLEL
> +   || REG_P (DECL_INCOMING_RTL (rhs
> +  || (VAR_P (lhs) && return_vars
> +  && DECL_RTL_SET_P (DECL_RESULT (current_function_decl))
> +  && GET_CODE (
> +   DECL_RTL (DECL_RESULT (current_function_decl)))
> +   == PARALLEL
> +  && bitmap_bit_p 

Re: [PATCH V2] Use subscalar mode to move struct block for parameter

2022-11-24 Thread Jiufu Guo via Gcc-patches


Hi Richard,

Thanks a lot for your comments!

Richard Biener  writes:

> On Wed, 23 Nov 2022, Jiufu Guo wrote:
>
>> Hi Jeff,
>> 
>> Thanks a lot for your comments!
>
> Sorry for the late response ...
>
>> Jeff Law  writes:
>> 
>> > On 11/20/22 20:07, Jiufu Guo wrote:
>> >> Jiufu Guo  writes:
>> >>
>> >>> Hi,
>> >>>
>> >>> As mentioned in the previous version patch:
>> >>> https://gcc.gnu.org/pipermail/gcc-patches/2022-October/604646.html
>> >>> The suboptimal code is generated for "assigning from parameter" or
>> >>> "assigning to return value".
>> >>> This patch enhances the assignment from parameters like the below
>> >>> cases:
>> >>> case1.c
>> >>> typedef struct SA {double a[3];long l; } A;
>> >>> A ret_arg (A a) {return a;}
>> >>> void st_arg (A a, A *p) {*p = a;}
>> >>>
>> >>> case2.c
>> >>> typedef struct SA {double a[3];} A;
>> >>> A ret_arg (A a) {return a;}
>> >>> void st_arg (A a, A *p) {*p = a;}
>> >>>
>> >>> For this patch, bootstrap and regtest pass on ppc64{,le}
>> >>> and x86_64.
>> >>> * Besides asking for help reviewing this patch, I would like to
>> >>> consult comments about enhancing for "assigning to returns".
>> >> I updated the patch to fix the issue for returns.  This patch
>> >> adds a flag DECL_USEDBY_RETURN_P to indicate if a var is used
>> >> by a return stmt.  This patch fixes the issue in expand pass only,
>> >> so, we would try to update the patch to avoid this flag.
>> >>
>> >> diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
>> >> index dd29c03..09b8ec64cea 100644
>> >> --- a/gcc/cfgexpand.cc
>> >> +++ b/gcc/cfgexpand.cc
>> >> @@ -2158,6 +2158,20 @@ expand_used_vars (bitmap forced_stack_vars)
>> >>   frame_phase = off ? align - off : 0;
>> >> }
>> >>   +  /* Collect VARs on returns.  */
>> >> +  if (DECL_RESULT (current_function_decl))
>> >> +{
>> >> +  edge_iterator ei;
>> >> +  edge e;
>> >> +  FOR_EACH_EDGE (e, ei, EXIT_BLOCK_PTR_FOR_FN (cfun)->preds)
>> >> + if (greturn *ret = safe_dyn_cast (last_stmt (e->src)))
>> >> +   {
>> >> + tree val = gimple_return_retval (ret);
>> >> + if (val && VAR_P (val))
>> >> +   DECL_USEDBY_RETURN_P (val) = 1;
>
> you probably want to check && auto_var_in_fn (val, ...) since val
> might be global?
Since we are collecting the return values, the variable is the one built in
gimplify_return_expr.  That function uses create_tmp_reg, which creates
a local temporary, so it cannot be a global var here.

code piece in gimplify_return_expr:
  if (!result_decl)
result = NULL_TREE;
  else if (aggregate_value_p (result_decl, TREE_TYPE (current_function_decl)))
{

  result = result_decl;
}
  else if (gimplify_ctxp->return_temp)
result = gimplify_ctxp->return_temp;
  else
{
  result = create_tmp_reg (TREE_TYPE (result_decl));

Here, for "typedef struct SA {double a[3];}", aggregate_value_p returns
false on targets like ppc64le, because the result of "hard_function_value"
is an rtx with PARALLEL code.
A VAR_DECL is then built via "create_tmp_reg" (it is not actually a
reg here; it builds a VAR_DECL with RECORD type and BLK mode).

I also tried the way to use RESULT_DECL for this kind of type, or
let aggregate_value_p accept this kind of type.  But it seems not easy,
because " (RESULT_DECL with PARALLEL)" is not ok for address
operations.


>
>> >> +   }
>> >> +}
>> >> +
>> >> /* Set TREE_USED on all variables in the local_decls.  */
>> >> FOR_EACH_LOCAL_DECL (cfun, i, var)
>> >>   TREE_USED (var) = 1;
>> >> diff --git a/gcc/expr.cc b/gcc/expr.cc
>> >> index d9407432ea5..20973649963 100644
>> >> --- a/gcc/expr.cc
>> >> +++ b/gcc/expr.cc
>> >> @@ -6045,6 +6045,52 @@ expand_assignment (tree to, tree from, bool nontemporal)
>> >> return;
>> >>   }
>
> I miss an explanatory comment here on that the following is heuristics
> and its reasoning.
>
>> >>   +  if ((TREE_CODE (from) == PARM_DECL && DECL_INCOMING_RTL (from)
>> >> +   && TYPE_MODE (TREE_TYPE (from)) == BLKmode
>
> Why check TYPE_MODE here?  Do you want AGGREGATE_TYPE_P on the type
> instead?
I check for BLKmode because I want to make sure the param is passed in
registers and also lives on the stack (saved from the registers).
Actually, my intention is to check:
GET_MODE (DECL_INCOMING_RTL (from)) == BLKmode

For code:
GET_MODE (DECL_INCOMING_RTL (from)) == BLKmode
&& (GET_CODE (DECL_INCOMING_RTL (from)) == PARALLEL
|| REG_P (DECL_INCOMING_RTL (from)))
This check indicates whether the param may be passed in two or more
registers.

Using "AGGREGATE_TYPE_P && (PARALLEL || REG_P)" may be OK and more
readable. I will test it.

>
>> >> +   && (GET_CODE (DECL_INCOMING_RTL (from)) == PARALLEL
>> >> +|| REG_P (DECL_INCOMING_RTL (from
>> >> +  || (VAR_P (to) && DECL_USEDBY_RETURN_P (to)
>> >> +   && TYPE_MODE (TREE_TYPE (to)) == BLKmode
>
> Likewise.
>
>> >> +   && GET_CODE (DECL_RTL (DECL_RESULT (current_function_decl)))
>> >> +== PARALLEL))
>
> Not REG_P here?
REG_P with BLK on return would 

[Bug tree-optimization/107865] New: [12/13 Regression] ICE in verify_loop_structure, at cfgloop.cc:1748 (Error: loop 3's number of iterations '_61 > 0 ? (uint128_t) (_61 + -1) : 0' references the rele

2022-11-24 Thread asolokha at gmx dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107865

Bug ID: 107865
   Summary: [12/13 Regression] ICE in verify_loop_structure, at
cfgloop.cc:1748 (Error: loop 3's number of iterations
'_61 > 0 ? (uint128_t) (_61 + -1) : 0' references the
released SSA name '_61')
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Keywords: ice-on-valid-code
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: asolokha at gmx dot com
  Target Milestone: ---

gfortran 13.0.0 20221120 snapshot (g:a16a5460447eaaff0b4468064e4d7b1cc8fc42eb)
ICEs when compiling the following testcase w/ -O1 -floop-parallelize-all
-ftree-parallelize-loops=2:

  SUBROUTINE FNC (F)

  IMPLICIT REAL (A-H)
  DIMENSION F(N)

  DO I = 1, 6
 DO J = 1, 6
IF (J .NE. I) THEN
   F(I) = F(I) + 1
END IF
 END DO
  END DO

  RETURN
  END

% gfortran-13 -O1 -floop-parallelize-all -ftree-parallelize-loops=2 -c g7tbwxor.f
g7tbwxor.f:1:20:

1 |   SUBROUTINE FNC (F)
  |^
Error: loop 3's number of iterations '_61 > 0 ? (uint128_t) (_61 + -1) : 0'
references the released SSA name '_61'
during GIMPLE pass: ompexpssa
g7tbwxor.f:1:20: internal compiler error: in verify_loop_structure, at cfgloop.cc:1748
0x6ac564 verify_loop_structure()
        /var/tmp/portage/sys-devel/gcc-13.0.0_p20221120/work/gcc-13-20221120/gcc/cfgloop.cc:1748
0x1ea4a0b expand_omp_taskreg
        /var/tmp/portage/sys-devel/gcc-13.0.0_p20221120/work/gcc-13-20221120/gcc/omp-expand.cc:1513
0x1ea91d7 expand_omp_synch
        /var/tmp/portage/sys-devel/gcc-13.0.0_p20221120/work/gcc-13-20221120/gcc/omp-expand.cc:8653
0x1ea91d7 expand_omp
        /var/tmp/portage/sys-devel/gcc-13.0.0_p20221120/work/gcc-13-20221120/gcc/omp-expand.cc:10610
0x1eab175 execute_expand_omp
        /var/tmp/portage/sys-devel/gcc-13.0.0_p20221120/work/gcc-13-20221120/gcc/omp-expand.cc:10813

Re: PING^2 [PATCH] Adjust the symbol for SECTION_LINK_ORDER linked_to section [PR99889]

2022-11-24 Thread Kewen.Lin via Gcc-patches
Hi Richard,

on 2022/11/23 00:08, Richard Sandiford wrote:
> "Kewen.Lin"  writes:
>> Hi Richard,
>>
>> Many thanks for your review comments!
>>
> on 2022/8/24 16:17, Kewen.Lin via Gcc-patches wrote:
>> Hi,
>>
>> As discussed in PR98125, -fpatchable-function-entry with
>> SECTION_LINK_ORDER support doesn't work well on powerpc64
>> ELFv1 because the filled "Symbol" in
>>
>>   .section name,"flags"o,@type,Symbol
>>
>> sits in .opd section instead of in the function_section
>> like .text or named .text*.
>>
>> Since we already generate one label LPFE* which sits in
>> function_section of current_function_decl, this patch is
>> to reuse it as the symbol for the linked_to section.  It
>> avoids the above ABI specific issue when using the symbol
>> concluded from current_function_decl.
>>
>> Besides, with this support some previous workarounds for
>> powerpc64 ELFv1 can be reverted.
>>
>> btw, rs6000_print_patchable_function_entry can be dropped
>> but there is another rs6000 patch which needs this rs6000
>> specific hook rs6000_print_patchable_function_entry, not
>> sure which one gets landed first, so just leave it here.
>>
>> Bootstrapped and regtested on below:
>>
>>   1) powerpc64-linux-gnu P8 with default binutils 2.27
>>  and latest binutils 2.39.
>>   2) powerpc64le-linux-gnu P9 (default binutils 2.30).
>>   3) powerpc64le-linux-gnu P10 (default binutils 2.30).
>>   4) x86_64-redhat-linux with default binutils 2.30
>>  and latest binutils 2.39.
>>   5) aarch64-linux-gnu  with default binutils 2.30
>>  and latest binutils 2.39.
>>
>>
>> [snip...]
>>
>> diff --git a/gcc/varasm.cc b/gcc/varasm.cc
>> index 4db8506b106..d4de6e164ee 100644
>> --- a/gcc/varasm.cc
>> +++ b/gcc/varasm.cc
>> @@ -6906,11 +6906,16 @@ default_elf_asm_named_section (const char *name, unsigned int flags,
>>  fprintf (asm_out_file, ",%d", flags & SECTION_ENTSIZE);
>>if (flags & SECTION_LINK_ORDER)
>>  {
>> -  tree id = DECL_ASSEMBLER_NAME (decl);
>> -  ultimate_transparent_alias_target ();
>> -  const char *name = IDENTIFIER_POINTER (id);
>> -  name = targetm.strip_name_encoding (name);
>> -  fprintf (asm_out_file, ",%s", name);
>> +  /* For now, only section "__patchable_function_entries"
>> + adopts flag SECTION_LINK_ORDER, internal label LPFE*
>> + was emitted in default_print_patchable_function_entry,
>> + just place it here for linked_to section.  */
>> +  gcc_assert (!strcmp (name, "__patchable_function_entries"));
>>>
>>> I like the idea of removing the rs6000 workaround in favour of making the
>>> target-independent code more robust.  But this seems a bit hackish.  What
>>> would we do if SECTION_LINK_ORDER was used for something else in future?
>>>
>>
>> Good question!  I think it depends on how we can get the symbol for the
>> linked_to section, if adopting the name of the decl will suffer the
>> similar issue which this patch wants to fix, we have to reuse the label
>> LPFE* or some kind of new artificial label in the related section; or
>> we can just go with the name of the given decl, or something related to
>> that decl.  Since we can't predict any future uses, I just placed an
>> assertion here to ensure that we would revisit and adjust this part at
>> that time.  Does it sound reasonable to you?
> 
> Yeah, I guess that's good enough.  If the old scheme ends up being
> correct for some future use, we can make the new behaviour conditional
> on __patchable_function_entries.

Yes, we can check if the given section name is
"__patchable_function_entries".

> 
> So yeah, the patch LGTM to me, thanks.

Thanks again!  I rebased and re-tested it on x86/aarch64/powerpc64{,le},
just committed in r13-4294-gf120196382ac5a.

BR,
Kewen


[Bug target/99889] Add powerpc ELFv1 support for -fpatchable-function-entry* with "o" sections

2022-11-24 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=99889

--- Comment #3 from CVS Commits  ---
The master branch has been updated by Kewen Lin :

https://gcc.gnu.org/g:f120196382ac5ac49ec4a60f8abad42f22d45a91

commit r13-4294-gf120196382ac5ac49ec4a60f8abad42f22d45a91
Author: Kewen.Lin 
Date:   Thu Nov 24 21:17:28 2022 -0600

Adjust the symbol for SECTION_LINK_ORDER linked_to section [PR99889]

As discussed in PR98125, -fpatchable-function-entry with
SECTION_LINK_ORDER support doesn't work well on powerpc64
ELFv1 because the filled "Symbol" in

  .section name,"flags"o,@type,Symbol

sits in .opd section instead of in the function_section
like .text or named .text*.

Since we already generate one label LPFE* which sits in
function_section of current_function_decl, this patch is
to reuse it as the symbol for the linked_to section.  It
avoids the above ABI specific issue when using the symbol
concluded from current_function_decl.

Besides, with this support some previous workarounds can
be reverted.

PR target/99889

gcc/ChangeLog:

* config/rs6000/rs6000.cc (rs6000_print_patchable_function_entry):
Adjust to call function default_print_patchable_function_entry.
* targhooks.cc (default_print_patchable_function_entry_1): Remove
and move the flags preparation ...
(default_print_patchable_function_entry): ... here, adjust to use
current_function_funcdef_no for label no.
* targhooks.h (default_print_patchable_function_entry_1): Remove.
* varasm.cc (default_elf_asm_named_section): Adjust code for
__patchable_function_entries section support with LPFE label.

gcc/testsuite/ChangeLog:

* g++.dg/pr93195a.C: Remove the skip on powerpc*-*-* 64-bit.
* gcc.target/aarch64/pr92424-2.c: Adjust LPFE1 with LPFE0.
* gcc.target/aarch64/pr92424-3.c: Likewise.
* gcc.target/i386/pr93492-2.c: Likewise.
* gcc.target/i386/pr93492-3.c: Likewise.
* gcc.target/i386/pr93492-4.c: Likewise.
* gcc.target/i386/pr93492-5.c: Likewise.

[Bug target/107863] [10/11/12/13 Regression] ICE with unrecognizable insn when using -funsigned-char with some SSE/AVX builtins

2022-11-24 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107863

--- Comment #5 from Hongtao.liu  ---
Also I get below from build_common_tree_nodes

  /* Define `char', which is like either `signed char' or `unsigned char'
 but not the same as either.  */
  char_type_node
= (signed_char
   ? make_signed_type (CHAR_TYPE_SIZE)
   : make_unsigned_type (CHAR_TYPE_SIZE));

So using char_type_node should be OK with -funsigned-char?
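
The comment quoted above from build_common_tree_nodes can be illustrated at the language level. The following is only a sketch in portable C++, not GCC-internal code: it checks that plain char is a distinct type from both signed char and unsigned char, whichever signedness the target or -funsigned-char selects. The name plain_char_is_signed is introduced here for illustration.

```cpp
#include <type_traits>

// Plain char is a third type: it shares its representation with either
// signed char or unsigned char, but it is never the same type as either.
static_assert(!std::is_same<char, signed char>::value,
              "char and signed char are distinct types");
static_assert(!std::is_same<char, unsigned char>::value,
              "char and unsigned char are distinct types");
static_assert(sizeof(char) == sizeof(signed char),
              "all three character types have the same size");

// Only the signedness of plain char depends on the target and on flags
// such as -funsigned-char or -fsigned-char.
constexpr bool plain_char_is_signed = std::is_signed<char>::value;
```

Since name mangling depends on the type identity rather than on the signedness, the three types mangle differently no matter which flag is in effect.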

[Bug target/107863] [10/11/12/13 Regression] ICE with unrecognizable insn when using -funsigned-char with some SSE/AVX builtins

2022-11-24 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107863

Hongtao.liu  changed:

   What|Removed |Added

 CC||hjl.tools at gmail dot com

--- Comment #4 from Hongtao.liu  ---
Git blame shows it started from
-cut from git blame
Richard Henderson 2009-11-26 10:39 i386-builtin-types.awk(DEF_VECTOR_TYPE):
Allow an optional 3rd argument to define the mode
---cut end-

Intrinsics are inlines, and usually map to one or several instructions which
are not related to name mangling.

HJ do you know why?

[PATCH] [OpenMP] GC unused SIMD clones

2022-11-24 Thread Sandra Loosemore

This patch is a followup to my not-yet-reviewed patch

[PATCH v4] OpenMP: Generate SIMD clones for functions with "declare target"

https://gcc.gnu.org/pipermail/gcc-patches/2022-November/606218.html

In comments on a previous iteration of that patch, I was asked to do 
something to delete unused SIMD clones to avoid code bloat; this is it.


I've implemented something like a simple mark-and-sweep algorithm. 
Clones that are used are marked at the point where the call is generated 
in the vectorizer.  The loop that iterates over functions to apply the 
passes after IPA is modified to defer processing of unmarked clones, and 
anything left over is deleted.


OK to commit this along with the above-linked patch?

-Sandra

From bfffcea926d4dfb6275346237c61922a95c9e715 Mon Sep 17 00:00:00 2001
From: Sandra Loosemore 
Date: Wed, 23 Nov 2022 23:14:31 +
Subject: [PATCH] [OpenMP] GC unused SIMD clones

SIMD clones are created during the IPA phase when it is not known whether
or not the vectorizer can use them.  Clones for functions with external
linkage are part of the ABI, but local clones can be GC'ed if no calls are
found in the compilation unit after vectorization.

gcc/ChangeLog
	* cgraph.h (struct cgraph_node): Add gc_candidate bit, modify
	default constructor to initialize it.
	* cgraphunit.cc (expand_all_functions): Save gc_candidate functions
	for last and iterate to handle recursive calls.  Delete leftover
	candidates at the end.
	* omp-simd-clone.cc (simd_clone_create): Set gc_candidate bit
	on local clones.
	* tree-vect-stmts.cc (vectorizable_simd_clone_call): Clear
	gc_candidate bit when a clone is used.

gcc/testsuite/ChangeLog
	* testsuite/g++.dg/gomp/target-simd-clone-1.C: Tweak to test
	that the unused clone is GC'ed.
	* testsuite/gcc.dg/gomp/target-simd-clone-1.c: Likewise.
---
 gcc/cgraph.h  |  7 ++-
 gcc/cgraphunit.cc | 49 ---
 gcc/omp-simd-clone.cc |  5 ++
 .../g++.dg/gomp/target-simd-clone-1.C |  7 ++-
 .../gcc.dg/gomp/target-simd-clone-1.c |  6 ++-
 gcc/tree-vect-stmts.cc|  3 ++
 6 files changed, 66 insertions(+), 11 deletions(-)

diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index 4be67e3cea9..b065677a8d0 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -891,7 +891,8 @@ struct GTY((tag ("SYMTAB_FUNCTION"))) cgraph_node : public symtab_node
   versionable (false), can_change_signature (false),
   redefined_extern_inline (false), tm_may_enter_irr (false),
   ipcp_clone (false), declare_variant_alt (false),
-  calls_declare_variant_alt (false), m_uid (uid), m_summary_id (-1)
+  calls_declare_variant_alt (false), gc_candidate (false),
+  m_uid (uid), m_summary_id (-1)
   {}
 
   /* Remove the node from cgraph and all inline clones inlined into it.
@@ -1490,6 +1491,10 @@ struct GTY((tag ("SYMTAB_FUNCTION"))) cgraph_node : public symtab_node
   unsigned declare_variant_alt : 1;
   /* True if the function calls declare_variant_alt functions.  */
   unsigned calls_declare_variant_alt : 1;
+  /* True if the function should only be emitted if it is used.  This flag
+ is set for local SIMD clones when they are created and cleared if the
+ vectorizer uses them.  */
+  unsigned gc_candidate : 1;
 
 private:
   /* Unique id of the node.  */
diff --git a/gcc/cgraphunit.cc b/gcc/cgraphunit.cc
index b05d790bf8d..587daf5674e 100644
--- a/gcc/cgraphunit.cc
+++ b/gcc/cgraphunit.cc
@@ -1996,19 +1996,52 @@ expand_all_functions (void)
 
   /* Output functions in RPO so callees get optimized before callers.  This
  makes ipa-ra and other propagators to work.
- FIXME: This is far from optimal code layout.  */
-  for (i = new_order_pos - 1; i >= 0; i--)
-{
-  node = order[i];
+ FIXME: This is far from optimal code layout.
+ Make multiple passes over the list to defer processing of gc
+ candidates until all potential uses are seen.  */
+  int gc_candidates = 0;
+  int prev_gc_candidates = 0;
 
-  if (node->process)
+  while (1)
+{
+  for (i = new_order_pos - 1; i >= 0; i--)
 	{
-	  expanded_func_count++;
-	  node->process = 0;
-	  node->expand ();
+	  node = order[i];
+
+	  if (node->gc_candidate)
+	gc_candidates++;
+	  else if (node->process)
+	{
+	  expanded_func_count++;
+	  node->process = 0;
+	  node->expand ();
+	}
 	}
+  if (!gc_candidates || gc_candidates == prev_gc_candidates)
+	break;
+  prev_gc_candidates = gc_candidates;
+  gc_candidates = 0;
 }
 
+  /* Free any unused gc_candidate functions.  */
+  if (gc_candidates)
+for (i = new_order_pos - 1; i >= 0; i--)
+  {
+	node = order[i];
+	if (node->gc_candidate)
+	  {
+	struct function *fn = DECL_STRUCT_FUNCTION (node->decl);
+	if (symtab->dump_file)
+	  fprintf (symtab->dump_file,
+		   "Deleting unused function %s\n",
+		   IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (node->decl)));
+	

[committed] libstdc++: Change return type of std::bit_width to int (LWG 3656)

2022-11-24 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux. Pushed to trunk.

-- >8 --

libstdc++-v3/ChangeLog:

* doc/html/manual/bugs.html: Regenerate.
* doc/xml/manual/intro.xml: Document LWG 3656 change.
* include/std/bit (__bit_width, bit_width): Return int.
* testsuite/26_numerics/bit/bit.pow.two/lwg3656.cc: New test.
---
 libstdc++-v3/doc/html/manual/bugs.html|  4 
 libstdc++-v3/doc/xml/manual/intro.xml |  7 +++
 libstdc++-v3/include/std/bit  |  6 --
 .../26_numerics/bit/bit.pow.two/lwg3656.cc| 15 +++
 4 files changed, 30 insertions(+), 2 deletions(-)
 create mode 100644 
libstdc++-v3/testsuite/26_numerics/bit/bit.pow.two/lwg3656.cc

diff --git a/libstdc++-v3/doc/html/manual/bugs.html 
b/libstdc++-v3/doc/html/manual/bugs.html
index 58600cd6ede..c4a2c26ea39 100644
--- a/libstdc++-v3/doc/html/manual/bugs.html
+++ b/libstdc++-v3/doc/html/manual/bugs.html
@@ -619,4 +619,8 @@
path::lexically_relative is confused by trailing slashes

 Implement the fix for trailing slashes.
+http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-defects.html#3656; 
target="_top">3656:
+   Inconsistent bit operations returning a count
+   
+Changed bit_width to return 
int.
 Prev Up NextLicense Home Chapter 2. 
Setup
\ No newline at end of file
diff --git a/libstdc++-v3/doc/xml/manual/intro.xml 
b/libstdc++-v3/doc/xml/manual/intro.xml
index dee01c82159..aee96e37c61 100644
--- a/libstdc++-v3/doc/xml/manual/intro.xml
+++ b/libstdc++-v3/doc/xml/manual/intro.xml
@@ -1308,6 +1308,13 @@ requirements of the license of GCC.
 Implement the fix for trailing slashes.
 
 
+http://www.w3.org/1999/xlink; xlink:href="#3656">3656:
+   Inconsistent bit operations returning a count
+   
+
+Changed bit_width to return int.
+
+
   
 
  
diff --git a/libstdc++-v3/include/std/bit b/libstdc++-v3/include/std/bit
index 2fd80187210..3e072ef2113 100644
--- a/libstdc++-v3/include/std/bit
+++ b/libstdc++-v3/include/std/bit
@@ -361,7 +361,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 }
 
   template<typename _Tp>
-constexpr _Tp
+constexpr int
 __bit_width(_Tp __x) noexcept
 {
   constexpr auto _Nd = __gnu_cxx::__int_traits<_Tp>::__digits;
@@ -448,9 +448,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 bit_floor(_Tp __x) noexcept
 { return std::__bit_floor(__x); }
 
+  // _GLIBCXX_RESOLVE_LIB_DEFECTS
+  // 3656. Inconsistent bit operations returning a count
   /// The smallest integer greater than the base-2 logarithm of `x`.
   template<typename _Tp>
-constexpr _If_is_unsigned_integer<_Tp>
+constexpr _If_is_unsigned_integer<_Tp, int>
 bit_width(_Tp __x) noexcept
 { return std::__bit_width(__x); }
 
diff --git a/libstdc++-v3/testsuite/26_numerics/bit/bit.pow.two/lwg3656.cc 
b/libstdc++-v3/testsuite/26_numerics/bit/bit.pow.two/lwg3656.cc
new file mode 100644
index 000..4752c3b1d33
--- /dev/null
+++ b/libstdc++-v3/testsuite/26_numerics/bit/bit.pow.two/lwg3656.cc
@@ -0,0 +1,15 @@
+// { dg-options "-std=gnu++20" }
+// { dg-do compile { target c++20 } }
+
+#include <bit>
+
+template<typename T> constexpr bool is_int = false;
+template<> constexpr bool is_int<int> = true;
+
+// LWG 3656. Inconsistent bit operations returning a count
+// Rturn type of std::bit_width(T) changed from T to int.
+static_assert( is_int );
+static_assert( is_int );
+static_assert( is_int );
+static_assert( is_int );
+static_assert( is_int );
-- 
2.38.1
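
To show what the LWG 3656 change means for callers, here is a small stand-alone sketch. bit_width_sketch is a hypothetical function written for this example and is not the libstdc++ implementation; it computes the same quantity (the number of bits needed to represent the value) and, like the patched std::bit_width, returns int whatever the unsigned argument type is.

```cpp
#include <type_traits>

// Number of bits needed to represent x: 0 for x == 0, otherwise one more
// than the position of the highest set bit.  The result is a count, so it
// is returned as int rather than as T.
template <typename T>
constexpr int bit_width_sketch(T x) noexcept {
  static_assert(std::is_unsigned<T>::value, "unsigned integer types only");
  return x == 0 ? 0 : 1 + bit_width_sketch<T>(static_cast<T>(x >> 1));
}

static_assert(bit_width_sketch(0u) == 0, "zero needs no bits");
static_assert(bit_width_sketch(1u) == 1, "one needs one bit");
static_assert(bit_width_sketch(255u) == 8, "255 needs eight bits");
```

Returning a count as int keeps it consistent with the other counting operations in the header and avoids surprises when the argument is a narrow type such as unsigned char.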



[committed] libstdc++: Update tests on trunk [PR106201]

2022-11-24 Thread Jonathan Wakely via Gcc-patches
Tested x86_64-linux. Pushed to trunk.

-- >8 --

This copies the better tests from gcc-12 to trunk.

libstdc++-v3/ChangeLog:

PR libstdc++/106201
* testsuite/27_io/filesystem/iterators/106201.cc: Improve test.
* testsuite/experimental/filesystem/iterators/106201.cc: New test.
---
 .../testsuite/27_io/filesystem/iterators/106201.cc |  8 +---
 .../experimental/filesystem/iterators/106201.cc| 14 ++
 2 files changed, 19 insertions(+), 3 deletions(-)
 create mode 100644 
libstdc++-v3/testsuite/experimental/filesystem/iterators/106201.cc

diff --git a/libstdc++-v3/testsuite/27_io/filesystem/iterators/106201.cc 
b/libstdc++-v3/testsuite/27_io/filesystem/iterators/106201.cc
index 4a64e675816..c5fefd9ac3f 100644
--- a/libstdc++-v3/testsuite/27_io/filesystem/iterators/106201.cc
+++ b/libstdc++-v3/testsuite/27_io/filesystem/iterators/106201.cc
@@ -5,8 +5,10 @@
 // PR libstdc++/106201 constraint recursion in path(Source const&) constructor.
 
 #include 
-#include 
-using I = std::counted_iterator;
+#include 
+#include 
+namespace fs = std::filesystem;
+using I = std::counted_iterator;
 static_assert( std::swappable );
-using R = std::counted_iterator;
+using R = std::counted_iterator;
 static_assert( std::swappable );
diff --git a/libstdc++-v3/testsuite/experimental/filesystem/iterators/106201.cc 
b/libstdc++-v3/testsuite/experimental/filesystem/iterators/106201.cc
new file mode 100644
index 000..017b72ef5f6
--- /dev/null
+++ b/libstdc++-v3/testsuite/experimental/filesystem/iterators/106201.cc
@@ -0,0 +1,14 @@
+// { dg-options "-std=gnu++20" }
+// { dg-do compile { target c++20 } }
+// { dg-require-filesystem-ts "" }
+
+// PR libstdc++/106201 constraint recursion in path(Source const&) constructor.
+
+#include 
+#include 
+#include 
+namespace fs = std::experimental::filesystem;
+using I = std::counted_iterator;
+static_assert( std::swappable );
+using R = std::counted_iterator;
+static_assert( std::swappable );
-- 
2.38.1



[Bug libstdc++/106201] filesystem::directory_iterator is a borrowable range?

2022-11-24 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106201

--- Comment #13 from CVS Commits  ---
The master branch has been updated by Jonathan Wakely :

https://gcc.gnu.org/g:3892251498c16c9507cf8471f4f10676212e9ead

commit r13-4292-g3892251498c16c9507cf8471f4f10676212e9ead
Author: Jonathan Wakely 
Date:   Tue Nov 22 21:51:06 2022 +

libstdc++: Update tests on trunk [PR106201]

This copies the better tests from gcc-12 to trunk.

libstdc++-v3/ChangeLog:

PR libstdc++/106201
* testsuite/27_io/filesystem/iterators/106201.cc: Improve test.
* testsuite/experimental/filesystem/iterators/106201.cc: New test.

[Bug c++/107864] [10/11/12/13 Regression] ICE (seg fault) in check_return_expr or instantiate_body with concepts and specialized version

2022-11-24 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107864

Jonathan Wakely  changed:

   What|Removed |Added

 CC||jason at gcc dot gnu.org

--- Comment #7 from Jonathan Wakely  ---
Regression started with r10-3735-gcb57504a550158 "Update the concepts
implementation to conform to C++20."

[Bug target/107551] g++ 12.2 test fails

2022-11-24 Thread brjd_epdjq36 at kygur dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107551

--- Comment #8 from Brjd  ---
__get_cpuid(0): eax=0xa, ebx=0x756e6547, ecx=0x6c65746e, edx=0x49656e69
__get_cpuid(1): eax=0x106ca, ebx=0x20800, ecx=0x40e39d, edx=0xbfe9fbff

[Bug tree-optimization/97832] AoSoA complex caxpy-like loops: AVX2+FMA -Ofast 7 times slower than -O3

2022-11-24 Thread already5chosen at yahoo dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97832

--- Comment #14 from Michael_S  ---
I tested a smaller test bench from Comment 3 with gcc trunk on godbolt.
Issue appears to be only partially fixed.
-Ofast result is no longer the horror that it was before, but it is still not as
good as -O3 or -O2. -Ofast code generation is still strange and there are a few
vblendpd instructions that serve no useful purpose.

And -O2/O3 is still not as good as it should be or as good as icc.
But, as mentioned in my original post, over-aggressive load+op combining is a
separate problem.

[Bug ipa/107661] [13 Regression] lambdas get merged incorrectly in tempaltes, cause llvm-12 miscompilation since r13-3358-ge0403e95689af7

2022-11-24 Thread slyfox at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107661

--- Comment #18 from Sergei Trofimovich  ---
The fix also fixed all initial llvm-12's test suite failures for me. Thank you!

gcc-10-20221124 is now available

2022-11-24 Thread GCC Administrator via Gcc
Snapshot gcc-10-20221124 is now available on
  https://gcc.gnu.org/pub/gcc/snapshots/10-20221124/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 10 git branch
with the following options: git://gcc.gnu.org/git/gcc.git branch 
releases/gcc-10 revision 187c235b65c3c23bbbcef6ece4343dc924c5d60c

You'll find:

 gcc-10-20221124.tar.xz   Complete GCC

  SHA256=37aec2f46a173c37dd86269664dc754c57116721043b028cad54c20d6669aa22
  SHA1=9831d55ccea8b80da069b33a060ea8a7e3aab9b5

Diffs from 10-20221117 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-10
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


[Bug fortran/107819] ICE in gfc_check_argument_var_dependency, at fortran/dependency.cc:978

2022-11-24 Thread mikael at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107819

--- Comment #9 from Mikael Morin  ---
(In reply to anlauf from comment #7)
> 
> In the meantime, do you have an idea where to force the generation of a
> temporary?  I've been scrolling through gfc_conv_procedure_call to see
> if that might be the right place, but that's not a small function...

It seems the semantics when an argument has the value attribute is the same as
the case ELEM_CHECK_VARIABLE in my previous comment.
So forcing the value of the elemental argument to ELEM_CHECK_VARIABLE at some
appropriate place could possibly work.

[Bug c++/107864] [10/11/12/13 Regression] ICE (seg fault) in check_return_expr or instantiate_body with concepts and specialized version

2022-11-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107864

Andrew Pinski  changed:

   What|Removed |Added

Summary|[10/11/12/13 Regression]|[10/11/12/13 Regression]
   |ICE (seg fault) in  |ICE (seg fault) in
   |check_return_expr with  |check_return_expr or
   |concepts and specialized|instantiate_body with
   |version |concepts and specialized
   ||version

--- Comment #6 from Andrew Pinski  ---
Note without the concept there, GCC does not ICE.

Note changing it slightly (remove the if return type):
template<typename T>
void j( T const& val ) requires true {}
template<> void j( int const& val ) { }
void g() {
  j(1);
}
--- CUT ---
Gives a different ICE:
t.cc: In instantiation of ‘void j(const T&) requires  true [with T = int]’:
t.cc:5:4:   required from here
t.cc:2:39: internal compiler error: Segmentation fault
2 | void j( T const& val ) requires true {}
  |   ^
0x120e72f crash_signal
/home/apinski/src/upstream-gcc/gcc/gcc/toplev.cc:314
0xbd42d1 instantiate_body
/home/apinski/src/upstream-gcc/gcc/gcc/cp/pt.cc:26494
0xbd52ba instantiate_decl(tree_node*, bool, bool)
/home/apinski/src/upstream-gcc/gcc/gcc/cp/pt.cc:26774
0xbf2b2b instantiate_pending_templates(int)
/home/apinski/src/upstream-gcc/gcc/gcc/cp/pt.cc:26852
0xaaa7fb c_parse_final_cleanups()
/home/apinski/src/upstream-gcc/gcc/gcc/cp/decl2.cc:4940
0xcd38b0 c_common_parse_file()
/home/apinski/src/upstream-gcc/gcc/gcc/c-family/c-opts.cc:1266
Please submit a full bug report, with preprocessed source (by using
-freport-bug).
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.

[Bug c++/107864] [10/11/12/13 Regression] ICE (seg fault) in check_return_expr with concepts and specialized version

2022-11-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107864

Andrew Pinski  changed:

   What|Removed |Added

  Known to fail||10.1.0, 11.1.0
  Known to work||9.5.0
   Target Milestone|--- |10.5
Summary|ICE (seg fault) in  |[10/11/12/13 Regression]
   |check_return_expr with  |ICE (seg fault) in
   |concepts and specialized|check_return_expr with
   |version |concepts and specialized
   ||version

--- Comment #5 from Andrew Pinski  ---
Works in GCC 9.5.0 with -std=c++2a -fconcepts.

[Bug c++/107861] C++ static_assert() does not honor -fwrapv, leading to compilation error

2022-11-24 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107861

--- Comment #8 from Jonathan Wakely  ---
(In reply to Jonathan Wakely from comment #7)
> int_max+1 is not a core constant (C++20 7.7 [expr.const] paragraph 5, bullet

oops, s/core constant/core constant expression/

> 5.7), so is not usable as the condition in a static_assert, so the program
> is ill-formed.

[Bug c++/107864] Internal Compiler Error (Large Project, C++20)

2022-11-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107864

Andrew Pinski  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1
   Last reconfirmed||2022-11-24

--- Comment #4 from Andrew Pinski  ---
Reduced testcase:
template<typename T>
T j( T const& val ) requires true
{
  return val;
}
template<> int j( int const& val ) { return 0; }
int t = j(1);

[Bug c++/107864] Internal Compiler Error (Large Project, C++20)

2022-11-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107864

--- Comment #3 from Andrew Pinski  ---
(In reply to Andrew Pinski from comment #2)
> Hmm, I don't know if I reduced it too much but this might be __bfloat16_t
> related ...

It is not; I changed 0.0bf16 to just 0.0f and it still fails.  Still reducing it.

[Bug fortran/107819] ICE in gfc_check_argument_var_dependency, at fortran/dependency.cc:978

2022-11-24 Thread mikael at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107819

--- Comment #8 from Mikael Morin  ---
(In reply to anlauf from comment #3)
> Could need help by some expert on this...

I guess I qualify as expert.
Reading the code again after years, it is not exactly crystal clear...

Here is a dump of what I could gather about the gfc_check_fncall_dependency and
friends functions.


The different gfc_dep_check cases are the following:


ELEM_DONT_CHECK_VARIABLE:
This is the simple case of direct subroutine call.
As per the 15.5.2.13 I quoted above, this is invalid:
  call elem_sub(a(2:n), a(1:n-1))
while this isn't
  call elem_sub(a, a)

so we can always generate:
  do i = ...
call elem_sub(a(...), a(...))
  end do

without caring for temporaries


ELEM_CHECK_VARIABLE:
This is the case of multiple elemental procedures.
For example:
  call elem_sub(a, elem_func(a))

The semantics is like:
  tmp = elem_func(a)
  call elem_sub(a, tmp)

Here, elem_sub can write to a without modifying tmp, and we have to
preserve that.
We generate code like this:
  do i = ...
call elem_sub(tmp(i), elem_func(a(i)))
  end do
  a = tmp
and try to avoid the temporary tmp if possible.
We explore the second argument to elem_sub, looking for the same variable
as the expression from the first one, and we generate a temporary
if we find it.  But there is no need if they are strictly the same
variable reference.


NOT_ELEMENTAL:
This is the case of the presence of transpose in the expression
For example, for elem_sub(var, elem_func(transpose(var))), the semantics is:
  tmp1 = transpose(var)
  tmp2 = elem_func(tmp1)
  call elem_sub(var, tmp2)

which we try to preserve, but with fewer temporaries.
We try to generate
  do i = ..., j = ...
call elem_sub(tmp(i,j), elem_func(var(j,i)))
  end do
  var = tmp

and try to avoid the temporary tmp if possible (it's not with this example).
We have to make sure that if the same variable appears in a subexpression
of the argument, a temporary is generated.
Contrary to the previous case, we have to generate the temporary
even if the variable references are strictly the same.
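
The NOT_ELEMENTAL case can be sketched outside Fortran. The following is only an illustration in C++ with a 2x2 matrix; elem_sub is modelled as a plain element assignment and elem_func as a multiplication by 10, both of which are assumptions made for the example. It shows why the write has to go through a temporary once a transposed read of the same variable is involved.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Mat = std::array<std::array<int, 2>, 2>;

// Stand-in for the elemental function in the example above.
inline int elem_func(int x) { return 10 * x; }

// Intended semantics: every read of var sees the original values, because
// results go to a temporary that is copied back afterwards.
inline Mat apply_with_temporary(const Mat& var) {
  Mat tmp{};
  for (std::size_t i = 0; i < 2; ++i)
    for (std::size_t j = 0; j < 2; ++j)
      tmp[i][j] = elem_func(var[j][i]);
  return tmp;
}

// Without the temporary, var(2,1) is computed from a var(1,2) that this very
// loop has already overwritten, so the result differs.
inline void apply_in_place(Mat& var) {
  for (std::size_t i = 0; i < 2; ++i)
    for (std::size_t j = 0; j < 2; ++j)
      var[i][j] = elem_func(var[j][i]);
}
```

For the input rows (1, 2) and (3, 4) the version with the temporary yields rows (10, 30) and (20, 40), while the in-place version stores 300 in the second row, first column, because it reads the already updated element.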

[Bug c++/107864] Internal Compiler Error (Large Project, C++20)

2022-11-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107864

--- Comment #2 from Andrew Pinski  ---
Hmm, I don't know if I reduced it too much but this might be __bfloat16_t
related ...

[Bug fortran/107819] ICE in gfc_check_argument_var_dependency, at fortran/dependency.cc:978

2022-11-24 Thread anlauf at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107819

--- Comment #7 from anlauf at gcc dot gnu.org ---
(In reply to Mikael Morin from comment #6)
> (In reply to anlauf from comment #5)
> > Second, I stumbled over:
> > 
> > ! 15.5.2.3 Argument association
> > ! (4) A present dummy argument with the VALUE attribute becomes argument
> > ! associated with a definable anonymous data object whose initial value is
> > ! the value of the actual argument.
> > 
> Ouch! You're right, this makes the part I quoted above irrelevant.
> And it explicitly asks for a temporary.

I've asked Intel if they agree with this interpretation.

In the meantime, do you have an idea where to force the generation of a
temporary?  I've been scrolling through gfc_conv_procedure_call to see
if that might be the right place, but that's not a small function...

[Bug libstdc++/107850] [12/13 Regression] std::erase_if (map) forces predicate to takes a const value_type

2022-11-24 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107850

Jonathan Wakely  changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |redi at gcc dot gnu.org
 Status|UNCONFIRMED |ASSIGNED
 Ever confirmed|0   |1
   Last reconfirmed||2022-11-24

[Bug analyzer/107158] False postives from -Wanalyzer-malloc-leak on tin-2.6.2

2022-11-24 Thread urs at akk dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107158

--- Comment #9 from urs at akk dot org ---
After commit ce917b0422c145779b83e005afd8433c0c86fb06 this doesn't show up
anymore.

[Bug c++/107861] C++ static_assert() does not honor -fwrapv, leading to compilation error

2022-11-24 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107861

--- Comment #7 from Jonathan Wakely  ---
(In reply to Markus F.X.J. Oberhumer from comment #6)
> Please note that I'm explicitly using "int_max" and not "INT_MAX",

It's a constexpr variable with the same value, so that makes no difference
whatsoever.

> and I'd
> appreciate if you could give me a link where the standard says this is
> "ill-formed". Thanks!

The condition in a static_assert must be a constant-expression (C++20 9.1
[dcl.pre] paragraph 6).

int_max+1 is not a core constant (C++20 7.7 [expr.const] paragraph 5, bullet
5.7), so is not usable as the condition in a static_assert, so the program is
ill-formed.
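
For code that wants a compile-time check in this area without evaluating the overflowing addition, one option is to test the operands against the limits first. This is only a sketch; add_would_overflow is a hypothetical helper written for this example, not something taken from the bug report.

```cpp
#include <limits>

// True if a + b would not fit in int.  Only comparisons and subtractions
// that cannot overflow are evaluated, so the function is usable inside a
// constant expression for every pair of arguments.
constexpr bool add_would_overflow(int a, int b) noexcept {
  return (b > 0 && a > std::numeric_limits<int>::max() - b)
      || (b < 0 && a < std::numeric_limits<int>::min() - b);
}

constexpr int int_max = std::numeric_limits<int>::max();

static_assert(add_would_overflow(int_max, 1), "int_max + 1 does not fit in int");
static_assert(!add_would_overflow(int_max - 1, 1), "int_max - 1 + 1 fits");

// By contrast, the following is ill-formed because the addition itself
// overflows during constant evaluation:
//   static_assert(int_max + 1 < 0, "wraps");
```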

[PATCH] gcc/jit/jit-recording.cc: recording::global::write_to_dump: Avoid crashes when writing psuedo-C for globals with string initializers.

2022-11-24 Thread Vibhav Pant via Gcc-patches
If a char * global was initialized with an rvalue from
`gcc_jit_context_new_string_literal` containing a format string,
dumping the context causes libgccjit to SIGSEGV due to an improperly
constructed call to vasprintf. The following code snippet can reproduce
the crash:

#include <libgccjit.h>

int main(int argc, char **argv)
{
 gcc_jit_context *ctxt = gcc_jit_context_acquire ();
 gcc_jit_lvalue *var = gcc_jit_context_new_global(
 ctxt, NULL, GCC_JIT_GLOBAL_EXPORTED,
 gcc_jit_context_get_type(ctxt, GCC_JIT_TYPE_CONST_CHAR_PTR),
 "var");
 gcc_jit_global_set_initializer_rvalue(
 var, gcc_jit_context_new_string_literal(ctxt, "%s"));
 /* Dumping the context is what triggers the crash.  */
 gcc_jit_context_dump_to_file (ctxt, "output", 0);
 gcc_jit_context_release (ctxt);
 return 0;
}

The offending line is jit-recording.cc:4922, where a call to d.write
passes the initializer rvalue's debug string to `write` without a
format specifier. The attached patch fixes this issue.
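
The underlying pitfall is the usual one with printf-style interfaces. Below is a minimal stand-alone sketch; format_to_string and write_literal are hypothetical helpers introduced for illustration and are not libgccjit's dump class. It shows the safe pattern of always passing arbitrary text as an argument to a fixed "%s" format.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>
#include <string>

// Hypothetical printf-style writer: formats into a fixed buffer and returns
// the result (truncated to the buffer size if it is too long).
inline std::string format_to_string(const char* fmt, ...) {
  char buf[256];
  va_list ap;
  va_start(ap, fmt);
  int len = std::vsnprintf(buf, sizeof buf, fmt, ap);
  va_end(ap);
  assert(len >= 0);  // a negative value signals an encoding error
  return std::string(buf);
}

// Safe: the text is data, never a format.  Any '%' it contains is copied
// through unchanged.
inline std::string write_literal(const char* text) {
  return format_to_string("%s", text);
}

// Unsafe (deliberately not called): with text == "%s" the formatter would
// try to read a char* argument that was never passed, which is undefined
// behaviour.
//   format_to_string(text);
```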

Thanks,
Vibhav
-- 
Vibhav Pant
vibh...@gmail.com
GPG: 7ED1 D48C 513C A024 BE3A 785F E3FB 28CB 6AB5 9598

From e598a4076b2bff72b4a3cc29d1d70db8c53baf45 Mon Sep 17 00:00:00 2001
From: Vibhav Pant 
Date: Fri, 25 Nov 2022 02:02:09 +0530
Subject: [PATCH] jit-recording.cc: Dump string literal initializers correctly

---
 gcc/jit/jit-recording.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/jit/jit-recording.cc b/gcc/jit/jit-recording.cc
index 6ae5a667e90..7bb98ddcb42 100644
--- a/gcc/jit/jit-recording.cc
+++ b/gcc/jit/jit-recording.cc
@@ -4919,7 +4919,7 @@ recording::global::write_to_dump (dump &d)
   else if (m_rvalue_init)
 {
   d.write (" = ");
-  d.write (m_rvalue_init->get_debug_string ());
+  d.write ("%s", m_rvalue_init->get_debug_string ());
   d.write (";\n");
 }
 
-- 
2.38.1





[Bug target/107863] [10/11/12/13 Regression] ICE with unrecognizable insn when using -funsigned-char with some SSE/AVX builtins

2022-11-24 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107863

Richard Biener  changed:

   What|Removed |Added

 Target||x86_64-*-*

--- Comment #3 from Richard Biener  ---
Not sure how name mangling is a concern for intrinsics...

[Bug c++/107861] C++ static_assert() does not honor -fwrapv, leading to compilation error

2022-11-24 Thread markus at oberhumer dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107861

--- Comment #6 from Markus F.X.J. Oberhumer  ---
Please note that I'm explicitly using "int_max" and not "INT_MAX", and I'd
appreciate if you could give me a link where the standard says this is
"ill-formed". Thanks!

[Bug c++/107864] Internal Compiler Error (Large Project, C++20)

2022-11-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107864

--- Comment #1 from Andrew Pinski  ---
/path/to/mycodedistr/include/mycode/grfx/_cpu/../color/../../_stdafx/_math.hpp:111:23:
internal compiler error: Segmentation fault
0x120e72f crash_signal
/home/apinski/src/upstream-gcc/gcc/gcc/toplev.cc:314
0xc5c9b2 check_return_expr(tree_node*, bool*)
/home/apinski/src/upstream-gcc/gcc/gcc/cp/typeck.cc:11062
0xc07eee finish_return_stmt(tree_node*)
/home/apinski/src/upstream-gcc/gcc/gcc/cp/semantics.cc:1229
0xbca0db tsubst_expr(tree_node*, tree_node*, int, tree_node*)
/home/apinski/src/upstream-gcc/gcc/gcc/cp/pt.cc:18580
0xbc9931 tsubst_expr(tree_node*, tree_node*, int, tree_node*)
/home/apinski/src/upstream-gcc/gcc/gcc/cp/pt.cc:18932
0xbd4295 tsubst_expr(tree_node*, tree_node*, int, tree_node*)
/home/apinski/src/upstream-gcc/gcc/gcc/cp/pt.cc:18556
0xbd4295 instantiate_body
/home/apinski/src/upstream-gcc/gcc/gcc/cp/pt.cc:26484
0xbd52ba instantiate_decl(tree_node*, bool, bool)
/home/apinski/src/upstream-gcc/gcc/gcc/cp/pt.cc:26774
0xbf2b2b instantiate_pending_templates(int)
/home/apinski/src/upstream-gcc/gcc/gcc/cp/pt.cc:26852
0xaaa7fb c_parse_final_cleanups()
/home/apinski/src/upstream-gcc/gcc/gcc/cp/decl2.cc:4940
0xcd38b0 c_common_parse_file()
/home/apinski/src/upstream-gcc/gcc/gcc/c-family/c-opts.cc:1266
Please submit a full bug report, with preprocessed source (by using
-freport-bug).
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.

Let me try to reduce it.

[Bug target/107863] [10/11/12/13 Regression] ICE with unrecognizable insn when using -funsigned-char with some SSE/AVX builtins

2022-11-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107863

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |FIXME

--- Comment #2 from Andrew Pinski  ---
From i386-builtin-types.def:
# ??? Logically this should be intQI_type_node, but that maps to "signed char"
# which is a different type than "char" even if "char" is signed.  This must
# match the usage in emmintrin.h and changing this would change name mangling
# and so is not advisable.
DEF_PRIMITIVE_TYPE (QI, char_type_node)

[Bug c++/107861] C++ static_assert() does not honor -fwrapv, leading to compilation error

2022-11-24 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107861

--- Comment #5 from Jonathan Wakely  ---
To expand on that, the standard allows compilers to do anything for undefined
behaviour, including making it valid with well-defined semantics. So wrapping
for undefined overflow is a conforming extension. But the outcome of overflow
in a constant expression is not undefined, so the implementation isn't allowed
to choose the behaviour. It has to be ill-formed (although maybe that could be
a pedantic warning when -fwrapv is used).

[Bug c/107831] Missed optimization: -fclash-stack-protection causes unnecessary code generation for dynamic stack allocations that are clearly less than a page

2022-11-24 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107831

--- Comment #8 from Jakub Jelinek  ---
Alloca itself doesn't touch the stack on many architectures, and the code
doesn't have to have a function call in between.

[Bug target/107863] [10/11/12/13 Regression] ICE with unrecognizable insn when using -funsigned-char with some SSE/AVX builtins

2022-11-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107863

Andrew Pinski  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Target Milestone|---                         |10.5
            Summary|ICE with unrecognizable     |[10/11/12/13 Regression]
                   |insn when using             |ICE with unrecognizable
                   |-funsigned-char with some   |insn when using
                   |AVX builtins                |-funsigned-char with some
                   |                            |SSE/AVX builtins
      Known to work|                            |4.4.7
           Keywords|                            |ice-on-valid-code
      Known to fail|                            |4.5.3
   Last reconfirmed|                            |2022-11-24
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1

--- Comment #1 from Andrew Pinski  ---
Reduced to just:
#include <immintrin.h>

__m128i f(__m128i a) {
   return _mm_insert_epi8(a, -1, 2);
}

This only requires -msse4.1 -funsigned-char to reproduce the ICE.

;; _4 = __builtin_ia32_vec_set_v16qi (_1, 255, 2);

(insn 7 6 8 (set (reg:QI 86)
(const_int 255 [0xff])) "/app/example.cpp":4:11 -1
 (nil))

Without -funsigned-char:
;; _4 = __builtin_ia32_vec_set_v16qi (_1, -1, 2);

(insn 7 6 8 (set (reg:QI 86)
(const_int -1 [0xffffffffffffffff])) "/app/example.cpp":4:11 -1
 (nil))


I suspect the issue is the definition of __builtin_ia32_vec_set_v16qi uses char
type rather than signed/unsigned char here ...
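
The two dumps differ only in how the constant -1 is represented once plain char is unsigned. RTL expects a CONST_INT to be sign-extended from the width of its mode, so 255 is not a canonical 8-bit (QImode) constant while -1 is. A small sketch of that canonicalization; `canonical_qi` is a hypothetical helper for illustration, not GCC's implementation:

```cpp
#include <cassert>

// Hypothetical helper: keep the low 8 bits of V and sign-extend from bit 7,
// which is the form an 8-bit constant is expected to have.
constexpr long long canonical_qi (long long v)
{
  return (v & 0xff) >= 0x80 ? (v & 0xff) - 0x100 : (v & 0xff);
}

// 255 and -1 are the same 8-bit pattern; only -1 is the canonical spelling.
static_assert (canonical_qi (255) == -1, "255 canonicalizes to -1");
static_assert (canonical_qi (-1) == -1, "-1 is already canonical");
```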

[Bug c++/107861] C++ static_assert() does not honor -fwrapv, leading to compilation error

2022-11-24 Thread redi at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107861

--- Comment #4 from Jonathan Wakely  ---
Andrew is right. The C++ standard says this is ill-formed, the -fwrapv option
isn't allowed to change that.

The option means that runtime overflow is well-defined instead of undefined,
but that doesn't change static compile-time behaviour that is ill-formed.
Overflow in constant expressions and template arguments is still an error, even
if the arithmetic would wrap at runtime.
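
Code that wants a compile-time answer to "would x + 1 overflow?" can avoid evaluating the overflowing expression at all. A minimal sketch, assuming that is the intent of the check in the report; `inc_would_overflow` is a hypothetical name:

```cpp
#include <cassert>
#include <limits>

// Compare against the limit instead of computing x + 1, so no overflow
// occurs and the check is usable in a constant expression with or
// without -fwrapv.
constexpr bool inc_would_overflow (int x)
{
  return x == std::numeric_limits<int>::max ();
}

constexpr int int_max = std::numeric_limits<int>::max ();
static_assert (inc_would_overflow (int_max), "INT_MAX + 1 would overflow");
static_assert (!inc_would_overflow (0), "0 + 1 does not overflow");
```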

[Bug c++/107864] New: Internal Compiler Error (Large Project, C++20)

2022-11-24 Thread ian at geometrian dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107864

Bug ID: 107864
   Summary: Internal Compiler Error (Large Project, C++20)
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: ian at geometrian dot com
  Target Milestone: ---

Created attachment 53962
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53962&action=edit
".zip" file containing "main.cpp", the repro

Using g++ built from source yesterday, getting internal compiler error
(segfault) on a large C++20 project. A preprocessed file (lightly edited and
zipped for file size limit) reproducing the error is attached.

Compiled with:

g++ main.cpp -std=c++20 -mavx2

The output begins with:

In file included from
/path/to/mycodedistr/include/mycode/grfx/_cpu/../color/../../_stdafx/matr.hpp:7,
 from
/path/to/mycodedistr/include/mycode/grfx/_cpu/../color/../../_stdafx.hpp:254,
 from
/path/to/mycodedistr/include/mycode/grfx/_cpu/../color/tuples.hpp:3,
 from
/path/to/mycodedistr/include/mycode/grfx/_cpu/framebuffer.hpp:3,
 from
/path/to/mycodedistr/src/mycode/grfx/_cpu/framebuffer.cpp:1:
   
/path/to/mycodedistr/include/mycode/grfx/_cpu/../color/../../_stdafx/_math.hpp:
In instantiation of ‘T BR::Math::round(const T&) requires 
is_floating_point_v<T> [with T = float]’:
/path/to/mycodedistr/include/mycode/grfx/color/convert.hpp:132:29:  
required from here
   
/path/to/mycodedistr/include/mycode/grfx/_cpu/../color/../../_stdafx/_math.hpp:111:23:
internal compiler error: Segmentation fault
0x7f3e5c1be51f ???
./signal/../sysdeps/unix/sysv/linux/x86_64/libc_sigaction.c:0
0x7f3e5c1a5d8f __libc_start_call_main
../sysdeps/nptl/libc_start_call_main.h:58
0x7f3e5c1a5e3f __libc_start_main_impl
../csu/libc-start.c:392
Please submit a full bug report

"gcc -v" includes:

Target: x86_64-pc-linux-gnu
Configured with: ../gcc-src/configure --enable-languages=c,c++
--enable-multiarch --program-suffix=-trunk
gcc version 13.0.0 20221124 (experimental) (GCC)

[Bug c/107831] Missed optimization: -fclash-stack-protection causes unnecessary code generation for dynamic stack allocations that are clearly less than a page

2022-11-24 Thread pskocik at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107831

--- Comment #7 from Petr Skocik  ---
(In reply to Jakub Jelinek from comment #4)
> Say for
> void bar (char *);
> void
> foo (int x, int y)
> {
>   __attribute__((assume (x < 64)));
>   for (int i = 0; i < y; ++i)
> bar (__builtin_alloca (x));
> }
> all the alloca calls are known to be small, yet they can quickly cross pages.
> Similarly:
> void
> baz (int x)
> {
>   if (x >= 512) __builtin_unreachable ();
>   char a[x];
>   bar (a);
>   char b[x];
>   bar (b);
>   char c[x];
>   bar (c);
>   char d[x];
>   bar (d);
>   char e[x];
>   bar (e);
>   char f[x];
>   bar (f);
>   char g[x];
>   bar (g);
>   char h[x];
>   bar (h);
>   char i[x];
>   bar (i);
>   char j[x];
>   bar (j);
> }
> All the VLAs here are small, yet together they can cross a page.
> So, we'd need to punt for dynamic allocations in loops and for others
> estimate
> the maximum size of all the allocations together (+ __builtin_alloca
> overhead + normal frame size).

I think this shouldn't need probes either (unless you tried to coalesce the
allocations) on architectures where making a function call touches the stack.
Also alloca's of less than or equal to half a page intertwined with writes
anywhere to the allocated blocks should be always safe (but I guess I'll just
turn stack-clash-protection off in the one file where I'm making such clearly
safe dynamic stack allocations).
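
The estimate Jakub describes in the quoted comment can be sketched as follows. This only illustrates the reasoning and is not GCC's logic; `can_skip_probes`, its parameters and the 4096-byte page size used in the note below are assumptions, and alloca overhead is ignored:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical check: probes could be skipped only if no dynamic allocation
// sits in a loop and the worst-case total (frame plus every allocation)
// stays below one page.
bool can_skip_probes (const std::vector<std::size_t> &max_alloc_sizes,
                      bool any_in_loop, std::size_t frame_size,
                      std::size_t page_size)
{
  if (any_in_loop)
    return false;               // repetition count unknown, must punt
  std::size_t total = frame_size;
  for (std::size_t size : max_alloc_sizes)
    {
      total += size;
      if (total >= page_size)
        return false;           // together they can cross a page
    }
  return true;
}
```

For the baz example, ten VLAs of at most 511 bytes add up to as much as 5110 bytes, which exceeds a 4096-byte page, so the check fails even though each allocation is small.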

[Bug fortran/107819] ICE in gfc_check_argument_var_dependency, at fortran/dependency.cc:978

2022-11-24 Thread mikael at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107819

--- Comment #6 from Mikael Morin  ---
(In reply to anlauf from comment #5)
> (In reply to Mikael Morin from comment #4)
> > But is it required to generate a temporary?
> > As I understand it, the code is invalid, and (correctly) diagnosed, so there
> > is nothing else to do.
> > It's invalid because of 15.5.2.13 Restrictions on entities associated with
> > dummy arguments:
> > (4) If the value of the entity or any subobject of it is affected through
> > the dummy argument, then at any time during the invocation and execution of
> > the procedure, either before or after the definition, it shall be referenced
> > only through that dummy argument unless (...)
> 
> Right.
> 
> I was confused by two observations.  First, NAG & Cray seem to generate
> temporaries, while Intel and NVidia don't and would agree with gfortran
> after the patch.
> 
> Second, I stumbled over:
> 
> ! 15.5.2.3 Argument association
> ! (4) A present dummy argument with the VALUE attribute becomes argument
> ! associated with a definable anonymous data object whose initial value is
> ! the value of the actual argument.
> 
Ouch! You're right, this makes the part I quoted above irrelevant.
And it explicitly asks for a temporary.

> So it boils down to what ELEMENTAL actually means in that context.  F2018:
> 
> 15.8.3 Elemental subroutine actual arguments
> 
> ! In a reference to an elemental subroutine, if the actual arguments
> ! corresponding to INTENT(OUT) and INTENT(INOUT) dummy arguments are
> ! arrays, the values of the elements, if any, of the results are the same
> ! as would be obtained if the subroutine had been applied separately, in
> ! array element order, to corresponding elements of each array actual
> ! argument.
> 
> So I read this that
> 
>call s (a(n), a)
> 
> is to be interpreted as
> 
>   do i = 1, size (a)
>  call s (a(n(i)), a(i))
>   end do
> 
> and this would actually be well-defined behavior... ;-)

With your quote from 15.5.2.3 above, it would be more like:
do i = 1, size(a)
  tmp(i) = a(n(i))
end do
do i = 1, size(a)
  call s(tmp(i), a(i))
end do
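
The difference between the two interpretations shows up as soon as n permutes the elements. A C++ analogue as a sketch; the body of `s` (multiply by ten) and the function names are assumptions made for illustration, not taken from the testcase:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for the elemental subroutine s (x, y) with x passed by value.
inline void s (int x, int &y) { y = x * 10; }

// call s (a(n(i)), a(i)) element by element, reading a while it is updated.
std::vector<int> apply_in_place (std::vector<int> a,
                                 const std::vector<std::size_t> &n)
{
  for (std::size_t i = 0; i < a.size (); ++i)
    s (a[n[i]], a[i]);
  return a;
}

// Copy all VALUE arguments into a temporary first, then run the calls.
std::vector<int> apply_with_temp (std::vector<int> a,
                                  const std::vector<std::size_t> &n)
{
  std::vector<int> tmp (a.size ());
  for (std::size_t i = 0; i < a.size (); ++i)
    tmp[i] = a[n[i]];
  for (std::size_t i = 0; i < a.size (); ++i)
    s (tmp[i], a[i]);
  return a;
}
```

With a = {1, 2, 3} and zero-based n = {2, 0, 1}, the in-place loop yields {30, 300, 3000} because later calls read already-updated elements, while the version with the temporary yields {30, 10, 20}.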

[Bug target/107863] New: ICE with unrecognizable insn when using -funsigned-char with some AVX builtins

2022-11-24 Thread bouanto at zoho dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107863

Bug ID: 107863
   Summary: ICE with unrecognizable insn when using
-funsigned-char with some AVX builtins
   Product: gcc
   Version: 12.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: bouanto at zoho dot com
  Target Milestone: ---

Hi.
When I compile the following code:

#include <immintrin.h>

int main(int argc, char* argv[]) {
__m256i a = _mm256_set1_epi8(4);
__m256i b = _mm256_set1_epi8(2);
__m256i mask = _mm256_insert_epi8(_mm256_set1_epi8(0), -1, 2);
__m256i r = (__m256i) __builtin_ia32_pblendvb256 ((__v32qi)a, (__v32qi)b,
(__v32qi)mask);
return 0;
}

with the following command:

gcc main.c -o main -mavx512f -funsigned-char

I get the following error:

main.c: In function ‘main’:
main.c:9:1: error: unrecognizable insn:
9 | }
  | ^
(insn 655 654 656 2 (set (reg:QI 607)
(const_int 255 [0xff])) "main.c":6:20 -1
 (nil))
during RTL pass: vregs
main.c:9:1: internal compiler error: in extract_insn, at recog.cc:2791
0x1840d78 internal_error(char const*, ...)
???:0
0x62a3ac fancy_abort(char const*, int, char const*)
???:0
0x60555b _fatal_insn(char const*, rtx_def const*, char const*, int, char
const*)
???:0
0x60557d _fatal_insn_not_found(rtx_def const*, char const*, int, char const*)
???:0

The code compiles when not using -funsigned-char.

I'm not sure what would be the fix for this. Would it make sense that builtins
never use the char type, but instead use either unsigned char or signed char?

Re: [PATCH] Fortran: error recovery on associate with bad selector [PR107577]

2022-11-24 Thread Thomas Koenig via Gcc-patches

Hi Harald,


please find attached an obvious patch by Steve for a technical
regression that resulted from improvements in error recovery
of bad uses of associate.

Regtested on x86_64-pc-linux-gnu.

Will commit soon unless there are comments.


Obvious enough, I think.  Thanks!


As a sidenote: the testcase shows that we resolve the associate
names quite often, likely more often than necessary, resulting
in many error messages produced for the same line of code.  In
the present case, each use of the bad name produces two errors,
one where it is used, and one at the associate statement.
That is probably not helpful for the user.


We have an "error" flag in gfc_expr, which we use infrequently to
avoid repetitions of errors.  If an error has already been issued
for an expression, then we could set this.  We would have to be careful
about resetting the error on matching though, so it is probably better
to only use it during resolution.

Best regards

Thomas



[Bug c++/107862] Returning an std::vector from a lambda fails to be constexpr, while a custom class with allocated storage works

2022-11-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107862

--- Comment #1 from Andrew Pinski  ---
I don't think this is valid code.
In the function dynamic_data_to_array you have:
std::array data;

But test.size() is not a constexpr unless test is a constexpr. Making test a
constexpr does not work, as it has dynamic allocation.

In your Test struct you have:
constexpr unsigned size() const { return 1; }

So it is always a constexpr ...

Note clang also rejects this code for the same reason as GCC even with libc++.
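
The distinction drawn here, a size() that never reads the object versus one that does, can be shown in isolation. A minimal sketch with hypothetical types (`FixedSize`, `StoredSize`), not the reporter's code:

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// size() returns a literal and never reads *this.
struct FixedSize
{
  int value = 0;
  constexpr std::size_t size () const { return 1; }
};

// size() reads a data member, so its result depends on the object's value.
struct StoredSize
{
  std::size_t count = 1;
  constexpr std::size_t size () const { return count; }
};

inline std::size_t fixed_demo ()
{
  FixedSize test;                        // not declared constexpr
  std::array<int, test.size ()> data{};  // fine: nothing is read from 'test'
  return data.size ();
}

inline std::size_t stored_demo ()
{
  constexpr StoredSize test{};           // has to be constexpr for the next line
  std::array<int, test.size ()> data{};  // reads test.count
  return data.size ();
}
```

A std::vector is in the second category, and an object that owns a live allocation cannot itself be declared constexpr, which matches the diagnostic in the report.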

Re: [PATCH]AArch64 sve2: Fix expansion of division [PR107830]

2022-11-24 Thread Richard Sandiford via Gcc-patches
Tamar Christina  writes:
>> -Original Message-
>> From: Richard Sandiford 
>> Sent: Wednesday, November 23, 2022 4:18 PM
>> To: Tamar Christina 
>> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
>> ; Marcus Shawcroft
>> ; Kyrylo Tkachov 
>> Subject: Re: [PATCH]AArch64 sve2: Fix expansion of division [PR107830]
>> 
>> Tamar Christina  writes:
>> > Hi All,
>> >
>> > SVE has an actual division optab, and when using -Os we don't optimize
>> > the division away.  This means that we need to distinguish between a
>> > div which we can optimize and one we cannot even during expansion.
>> >
>> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>> >
>> > Ok for master?
>> >
>> > Thanks,
>> > Tamar
>> >
>> > gcc/ChangeLog:
>> >
>> >PR target/107830
>> >* config/aarch64/aarch64.cc
>> >(aarch64_vectorize_can_special_div_by_constant): Check validity
>> during
>> >codegen phase as well.
>> >
>> > gcc/testsuite/ChangeLog:
>> >
>> >PR target/107830
>> >* gcc.target/aarch64/sve2/pr107830.c: New test.
>> >
>> > --- inline copy of patch --
>> > diff --git a/gcc/config/aarch64/aarch64.cc
>> > b/gcc/config/aarch64/aarch64.cc index
>> >
>> 4176d7b046a126664360596b6db79a43e77ff76a..bee23625807af95d5ec15ad45
>> 702
>> > 961b2d7ab55d 100644
>> > --- a/gcc/config/aarch64/aarch64.cc
>> > +++ b/gcc/config/aarch64/aarch64.cc
>> > @@ -24322,12 +24322,15 @@
>> aarch64_vectorize_can_special_div_by_constant (enum tree_code code,
>> >if ((flags & VEC_ANY_SVE) && !TARGET_SVE2)
>> >  return false;
>> >
>> > +  wide_int val = wi::add (cst, 1);
>> > +  int pow = wi::exact_log2 (val);
>> > +  bool valid_p = pow == (int)(element_precision (vectype) / 2);
>> > +  /* SVE actually has a div operator, we we may have gotten here through
>> > + that route.  */
>> >if (in0 == NULL_RTX && in1 == NULL_RTX)
>> > -{
>> > -  wide_int val = wi::add (cst, 1);
>> > -  int pow = wi::exact_log2 (val);
>> > -  return pow == (int)(element_precision (vectype) / 2);
>> > -}
>> > +return valid_p;
>> > +  else if (!valid_p)
>> > +return false;
>> 
>> Is this equivalent to:
>> 
>>   int pow = wi::exact_log2 (cst + 1);
>>   if (pow != (int) (element_precision (vectype) / 2))
>> return false;
>> 
>>   /* We can use the optimized pattern.  */
>>   if (in0 == NULL_RTX && in1 == NULL_RTX)
>> return true;
>> 
>> ?  If so, I'd find that slightly easier to follow, but I realise it's 
>> personal taste.
>> OK with that change if it works and you agree.
>> 
>> While looking at this, I noticed that we ICE for:
>> 
>>   void f(unsigned short *restrict p1, unsigned int *restrict p2)
>>   {
>> for (int i = 0; i < 16; ++i)
>>   {
>> p1[i] /= 0xff;
>> p2[i] += 1;
>>   }
>>   }
>> 
>> for -march=armv8-a+sve2 -msve-vector-bits=512.  I guess we need to filter
>> out partial modes or (better) add support for them.  Adding support for them
>> probably requires changes to the underlying ADDHNB pattern.
>
> I've prevented the ICE by checking if the expansion for the mode exists. I'd
> like to defer adding partial support because when I tried I had to modify
> some iterators as well and need to check that it's safe to do so.

Sounds good.

> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
>   PR target/107830
>   * config/aarch64/aarch64.cc
>   (aarch64_vectorize_can_special_div_by_constant): Check validity during
>   codegen phase as well.
>
> gcc/testsuite/ChangeLog:
>
>   PR target/107830
>   * gcc.target/aarch64/sve2/pr107830-1.c: New test.
>   * gcc.target/aarch64/sve2/pr107830-2.c: New test.
>
> --- inline copy of patch 
>
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index 
> 4176d7b046a126664360596b6db79a43e77ff76a..02aa1f34ac6155b877340d788c6d151b7c8d8bcd
>  100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -24322,12 +24322,18 @@ aarch64_vectorize_can_special_div_by_constant (enum 
> tree_code code,
>if ((flags & VEC_ANY_SVE) && !TARGET_SVE2)
>  return false;
>  
> +  wide_int val = wi::add (cst, 1);
> +  int pow = wi::exact_log2 (val);

Does the:

  int pow = wi::exact_log2 (cst + 1);

I suggested above not work?  That seems easier to read IMO, since there
are no other uses of "val".

> +  auto insn_code = maybe_code_for_aarch64_bitmask_udiv3 (TYPE_MODE 
> (vectype));
> +  /* SVE actually has a div operator, we may have gotten here through
> + that route.  */
> +  if (pow != (int)(element_precision (vectype) / 2)

Formatting nit: should be a space after "(int)".

OK with those changes, thanks.

Richard

> +  || insn_code == CODE_FOR_nothing)
> +return false;
> +
> +  /* We can use the optimized pattern.  */
>if (in0 == NULL_RTX && in1 == NULL_RTX)
> -{
> -  wide_int val = wi::add (cst, 1);
> -  int pow = wi::exact_log2 (val);
> -  
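
The condition both versions of the patch test is that the divisor plus one is a power of two whose exponent is half the element precision, for example 0xff for 16-bit elements. A standalone sketch using plain integers instead of wide_int; `exact_log2_u64` and `is_bitmask_divisor` are hypothetical names, and the real hook additionally checks that an expansion exists for the mode:

```cpp
#include <cassert>
#include <cstdint>

// Return log2 (x) if x is an exact power of two, otherwise -1.
constexpr int exact_log2_u64 (std::uint64_t x)
{
  return x == 0 ? -1
         : (x & (x - 1)) != 0 ? -1
         : x == 1 ? 0
         : 1 + exact_log2_u64 (x >> 1);
}

// True if CST == 2^(precision / 2) - 1.
constexpr bool is_bitmask_divisor (std::uint64_t cst, unsigned precision)
{
  return exact_log2_u64 (cst + 1) == static_cast<int> (precision / 2);
}
```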

[Bug c++/107862] New: Returning an std::vector from a lambda fails to be constexpr, while a custom class with allocated storage works

2022-11-24 Thread milasudril at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107862

Bug ID: 107862
   Summary: Returning an std::vector from a lambda fails to be
constexpr, while a custom class with allocated storage
works
   Product: gcc
   Version: 12.2.0
   URL: https://godbolt.org/z/dve3Yx8ax
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: milasudril at gmail dot com
  Target Milestone: ---
  Host: x86-64_linux_gnu
Target: x86-64_linux_gnu

Minimal working example:


#include 
#include 
#include 

struct Test {
using value_type = int;
constexpr Test(){ data = new value_type(0); }
constexpr ~Test(){ delete data; }

constexpr unsigned size() const { return 1; }
constexpr auto begin() const { return data; }
constexpr auto end() const { return data + 1; }

  private:
value_type* data;
};

consteval auto dynamic_data_to_array(auto generator) {
auto test = generator();
using VT = typename decltype(test)::value_type;

std::array data;
std::copy(test.begin(), test.end(), data.begin());
return data;
}

constexpr Test get_test() { return Test(); }

int main()
{
// Does not work  constexpr auto data0 = dynamic_data_to_array([] { return std::vector{0};});
constexpr auto data1 = dynamic_data_to_array([] { return Test{};});

return data1[0];
}


It works to return a Test object here, but not an std::vector. Compiler output:

:22:29:   in 'constexpr' expansion of 'test.std::vector::size()'
:22:33: error: the value of 'test' is not usable in a constant
expression
   22 | std::array data;
  | ^~~~
:19:10: note: 'test' was not declared 'constexpr'
   19 | auto test = generator();
  |  ^~~~
:22:29: note: in template argument for type 'long unsigned int'
   22 | std::array data;
  |~^~
:22:29:   in 'constexpr' expansion of 'test.std::vector::size()'
:22:33: error: the value of 'test' is not usable in a constant
expression
   22 | std::array data;
  | ^~~~
:19:10: note: 'test' was not declared 'constexpr'
   19 | auto test = generator();
  |  ^~~~
:22:29: note: in template argument for type 'long unsigned int'
   22 | std::array data;

I am not sure if this is by the standard, or if this is related to some missing
implementation details in the standard library. Note: clang doesn't like it
either.

RE: [PATCH]AArch64 sve2: Fix expansion of division [PR107830]

2022-11-24 Thread Tamar Christina via Gcc-patches
> -Original Message-
> From: Richard Sandiford 
> Sent: Wednesday, November 23, 2022 4:18 PM
> To: Tamar Christina 
> Cc: gcc-patches@gcc.gnu.org; nd ; Richard Earnshaw
> ; Marcus Shawcroft
> ; Kyrylo Tkachov 
> Subject: Re: [PATCH]AArch64 sve2: Fix expansion of division [PR107830]
> 
> Tamar Christina  writes:
> > Hi All,
> >
> > SVE has an actual division optab, and when using -Os we don't optimize
> > the division away.  This means that we need to distinguish between a
> > div which we can optimize and one we cannot even during expansion.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > PR target/107830
> > * config/aarch64/aarch64.cc
> > (aarch64_vectorize_can_special_div_by_constant): Check validity
> during
> > codegen phase as well.
> >
> > gcc/testsuite/ChangeLog:
> >
> > PR target/107830
> > * gcc.target/aarch64/sve2/pr107830.c: New test.
> >
> > --- inline copy of patch --
> > diff --git a/gcc/config/aarch64/aarch64.cc
> > b/gcc/config/aarch64/aarch64.cc index
> >
> 4176d7b046a126664360596b6db79a43e77ff76a..bee23625807af95d5ec15ad45
> 702
> > 961b2d7ab55d 100644
> > --- a/gcc/config/aarch64/aarch64.cc
> > +++ b/gcc/config/aarch64/aarch64.cc
> > @@ -24322,12 +24322,15 @@
> aarch64_vectorize_can_special_div_by_constant (enum tree_code code,
> >if ((flags & VEC_ANY_SVE) && !TARGET_SVE2)
> >  return false;
> >
> > +  wide_int val = wi::add (cst, 1);
> > +  int pow = wi::exact_log2 (val);
> > +  bool valid_p = pow == (int)(element_precision (vectype) / 2);
> > +  /* SVE actually has a div operator, we we may have gotten here through
> > + that route.  */
> >if (in0 == NULL_RTX && in1 == NULL_RTX)
> > -{
> > -  wide_int val = wi::add (cst, 1);
> > -  int pow = wi::exact_log2 (val);
> > -  return pow == (int)(element_precision (vectype) / 2);
> > -}
> > +return valid_p;
> > +  else if (!valid_p)
> > +return false;
> 
> Is this equivalent to:
> 
>   int pow = wi::exact_log2 (cst + 1);
>   if (pow != (int) (element_precision (vectype) / 2))
> return false;
> 
>   /* We can use the optimized pattern.  */
>   if (in0 == NULL_RTX && in1 == NULL_RTX)
> return true;
> 
> ?  If so, I'd find that slightly easier to follow, but I realise it's 
> personal taste.
> OK with that change if it works and you agree.
> 
> While looking at this, I noticed that we ICE for:
> 
>   void f(unsigned short *restrict p1, unsigned int *restrict p2)
>   {
> for (int i = 0; i < 16; ++i)
>   {
> p1[i] /= 0xff;
> p2[i] += 1;
>   }
>   }
> 
> for -march=armv8-a+sve2 -msve-vector-bits=512.  I guess we need to filter
> out partial modes or (better) add support for them.  Adding support for them
> probably requires changes to the underlying ADDHNB pattern.

I've prevented the ICE by checking if the expansion for the mode exists. I'd
like to defer adding partial support because when I tried I had to modify some
iterators as well and need to check that it's safe to do so.

Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

Ok for master?

Thanks,
Tamar

gcc/ChangeLog:

PR target/107830
* config/aarch64/aarch64.cc
(aarch64_vectorize_can_special_div_by_constant): Check validity during
codegen phase as well.

gcc/testsuite/ChangeLog:

PR target/107830
* gcc.target/aarch64/sve2/pr107830-1.c: New test.
* gcc.target/aarch64/sve2/pr107830-2.c: New test.

--- inline copy of patch 

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 
4176d7b046a126664360596b6db79a43e77ff76a..02aa1f34ac6155b877340d788c6d151b7c8d8bcd
 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -24322,12 +24322,18 @@ aarch64_vectorize_can_special_div_by_constant (enum 
tree_code code,
   if ((flags & VEC_ANY_SVE) && !TARGET_SVE2)
 return false;
 
+  wide_int val = wi::add (cst, 1);
+  int pow = wi::exact_log2 (val);
+  auto insn_code = maybe_code_for_aarch64_bitmask_udiv3 (TYPE_MODE (vectype));
+  /* SVE actually has a div operator, we may have gotten here through
+ that route.  */
+  if (pow != (int)(element_precision (vectype) / 2)
+  || insn_code == CODE_FOR_nothing)
+return false;
+
+  /* We can use the optimized pattern.  */
   if (in0 == NULL_RTX && in1 == NULL_RTX)
-{
-  wide_int val = wi::add (cst, 1);
-  int pow = wi::exact_log2 (val);
-  return pow == (int)(element_precision (vectype) / 2);
-}
+return true;
 
   if (!VECTOR_TYPE_P (vectype))
return false;
diff --git a/gcc/testsuite/gcc.target/aarch64/sve2/pr107830-1.c 
b/gcc/testsuite/gcc.target/aarch64/sve2/pr107830-1.c
new file mode 100644
index 
..6d8ee3615fdb0083dbde1e45a2826fb681726139
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve2/pr107830-1.c
@@ -0,0 +1,13 @@
+/* { dg-do 

[Bug c++/107861] C++ static_assert() does not honor -fwrapv, leading to compilation error

2022-11-24 Thread markus at oberhumer dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107861

--- Comment #3 from Markus F.X.J. Oberhumer  ---
Indeed.

And just for reference I had also reported this as clang bug in
https://github.com/llvm/llvm-project/issues/59195

[Bug c++/107861] C++ static_assert() does not honor -fwrapv, leading to compilation error

2022-11-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107861

--- Comment #2 from Andrew Pinski  ---
Note clang rejects even just:

#include <limits.h>

#define wrap_inc(x) ((x) + 1 < (x))

constexpr int int_max = INT_MAX;

bool b0 = wrap_inc(int_max);

[Bug c++/107861] C++ static_assert() does not honor -fwrapv, leading to compilation error

2022-11-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107861

--- Comment #1 from Andrew Pinski  ---
Right, this is by design and I don't think it is a bug. The reasoning is that
C++ constant expressions require the compiler to work out whether an evaluation
overflows, and the answer can change the behavior of well-defined code at
compile time.

There is most likely a way to get a substitution failure with the overflow
too.
I tried with C++20 concepts but I am not so good at it.

[Patch] libgomp: Add no-target-region rev offload test + fix plugin-nvptx

2022-11-24 Thread Tobias Burnus

The nvptx reverse-offload code mishandled the case that there was a reverse
offload function that isn't called inside a target region. In that case,
the linker did not include GOMP_target_ext and the global variable it uses.
But the plugin-nvptx.c code expected that the latter is present.

Found via sollve_vv's tests/5.0/requires/test_requires_reverse_offload.c which
is similar to the new testcase. (Albeit the 'if' and comments imply that the
sollve_vv author did not intend this.)

Solution: Handle it gracefully that the global variable does not exist - and
do this check first - and only when successful allocate dev->rev_data. If not,
deallocate rev_fn_table to disable reverse offload handling.

OK for mainline?

Tobias

PS: Admittedly, the nvptx code is not yet exercised as I still have to submit
the libgomp/target.c code handling the reverse offload (+ enabling requires
reverse_offload in plugin-nvptx.c). As it is obvious from this patch, the
target.c patch is nearly but not yet completely ready. - That patch passes the
three sollve_vv testcases and also the existing libgomp testcases, but some
corner cases and more testcases are missing.
-
libgomp: Add no-target-region rev offload test + fix plugin-nvptx

OpenMP permits that a 'target device(ancestor:1)' is called without being
enclosed in a target region - using the current device (i.e. the host) in
that case.  This commit adds a testcase for this.

In case of nvptx, the missing on-device 'GOMP_target_ext' call causes that
it and also the associated on-device GOMP_REV_OFFLOAD_VAR variable are not
linked in from nvptx's libgomp.a. Thus, handle the failing cuModuleGetGlobal
gracefully by disabling reverse offload and assuming that the failure is fine.
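
The control flow of the fix, looking the variable up first and allocating only on success, can be sketched with stand-ins for the CUDA calls. Everything below (`Module`, `Device`, `lookup_global`, `setup_reverse_offload`) is hypothetical and only mirrors the shape of the change, not plugin-nvptx.c itself:

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical model of a loaded device module: symbol name -> address.
struct Module { std::map<std::string, int *> globals; };

// Hypothetical device state; rev_data stays empty while reverse offload
// is disabled.
struct Device { std::unique_ptr<int> rev_data; };

// Stand-in for a symbol lookup that may legitimately fail.
int *lookup_global (const Module &m, const std::string &name)
{
  auto it = m.globals.find (name);
  return it == m.globals.end () ? nullptr : it->second;
}

// Returns true if reverse offload stays enabled.  The string key is a
// placeholder for whatever symbol name the real macro expands to.
bool setup_reverse_offload (const Module &m, Device &dev,
                            std::vector<int> &rev_fn_table)
{
  int *var = lookup_global (m, "GOMP_REV_OFFLOAD_VAR");
  if (var == nullptr)
    {
      rev_fn_table.clear ();          // symbol not linked in: disable quietly
      return false;
    }
  dev.rev_data.reset (new int (0));   // allocate only after the lookup worked
  return true;
}
```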

libgomp/ChangeLog:

	* plugin/plugin-nvptx.c (GOMP_OFFLOAD_load_image): Use unsigned int
	for 'i' to match 'fn_entries'; regard absent GOMP_REV_OFFLOAD_VAR
	as valid and the code having no reverse-offload code.
	* testsuite/libgomp.c-c++-common/reverse-offload-2.c: New test.

 libgomp/plugin/plugin-nvptx.c  | 36 ++--
 .../libgomp.c-c++-common/reverse-offload-2.c   | 49 ++
 2 files changed, 73 insertions(+), 12 deletions(-)

diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 0768fca350b..e803f083591 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -1390,7 +1390,8 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
   else if (rev_fn_table)
 {
   CUdeviceptr var;
-  size_t bytes, i;
+  size_t bytes;
+  unsigned int i;
   r = CUDA_CALL_NOCHECK (cuModuleGetGlobal, &var, &bytes, module,
 			 "$offload_func_table");
   if (r != CUDA_SUCCESS)
@@ -1413,12 +1414,11 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
 
   if (rev_fn_table && *rev_fn_table && dev->rev_data == NULL)
 {
-  /* cuMemHostAlloc memory is accessible on the device, if unified-shared
-	 address is supported; this is assumed - see comment in
-	 nvptx_open_device for CU_DEVICE_ATTRIBUTE_UNIFIED_ADDRESSING.   */
-  CUDA_CALL_ASSERT (cuMemHostAlloc, (void **) &dev->rev_data,
-			sizeof (*dev->rev_data), CU_MEMHOSTALLOC_DEVICEMAP);
-  CUdeviceptr dp = (CUdeviceptr) dev->rev_data;
+  /* Get the on-device GOMP_REV_OFFLOAD_VAR variable.  It should be
+	 available but it might be not.  One reason could be: if the user code
+	 has 'omp target device(ancestor:1)' in pure hostcode, GOMP_target_ext
+	 is not called on the device and, hence, it and GOMP_REV_OFFLOAD_VAR
+	 are not linked in.  */
   CUdeviceptr device_rev_offload_var;
   size_t device_rev_offload_size;
   CUresult r = CUDA_CALL_NOCHECK (cuModuleGetGlobal,
@@ -1426,11 +1426,23 @@ GOMP_OFFLOAD_load_image (int ord, unsigned version, const void *target_data,
   &device_rev_offload_size, module,
   XSTRING (GOMP_REV_OFFLOAD_VAR));
   if (r != CUDA_SUCCESS)
-	GOMP_PLUGIN_fatal ("cuModuleGetGlobal error - GOMP_REV_OFFLOAD_VAR: %s", cuda_error (r));
-  r = CUDA_CALL_NOCHECK (cuMemcpyHtoD, device_rev_offload_var, &dp,
-			 sizeof (dp));
-  if (r != CUDA_SUCCESS)
-	GOMP_PLUGIN_fatal ("cuMemcpyHtoD error: %s", cuda_error (r));
+	{
+	  free (*rev_fn_table);
+	  *rev_fn_table = NULL;
+	}
+  else
+	{
+	  /* cuMemHostAlloc memory is accessible on the device, if
+	 unified-shared address is supported; this is assumed - see comment
+	 in nvptx_open_device for CU_DEVICE_ATTRIBUTE_UNIFIED_ADDRESSING. */
+	  CUDA_CALL_ASSERT (cuMemHostAlloc, (void **) &dev->rev_data,
+			sizeof (*dev->rev_data), CU_MEMHOSTALLOC_DEVICEMAP);
+	  CUdeviceptr dp = (CUdeviceptr) dev->rev_data;
+	  r = 

[Bug target/107860] Compilation failure, ambiguous fisttp

2022-11-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107860

Andrew Pinski  changed:

   What|Removed |Added

   Host|x86_64-apple-darwin21   |aarch64-apple-darwin21
  Build|x86_64-apple-darwin21   |aarch64-apple-darwin21

--- Comment #5 from Andrew Pinski  ---
Note you could in theory just build a cross compiler, but you would still need
a wrapper around as so that configure detects the correct thing.

[Bug target/107860] Compilation failure, ambiguous fisttp

2022-11-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107860

Andrew Pinski  changed:

   What|Removed |Added

 Status|WAITING |UNCONFIRMED
 Ever confirmed|1   |0

--- Comment #4 from Andrew Pinski  ---

configure:27158: checking assembler for filds and fists mnemonics
configure:27167: /usr/bin/as -o conftest.o conftest.s >&5
conftest.s:1:8: error: unknown token in expression
filds (%ebp); fists (%ebp)
   ^
conftest.s:1:7: error: invalid operand
filds (%ebp); fists (%ebp)
  ^

Oh, this is a configure issue with how the assembler (as) is tested here.
But I don't know exactly how to fix it, because running GCC's configure with no
options is the right thing to do for the target.

You might need a wrapper around as to do the right thing for this kind of
build.

[Bug c++/107755] [10/11/12/13 Regression] ICE: in fold_convert_loc, at fold-const.c:2435, with -Wlogical-op, implicit user-defined conversion operator, template function, logical operator, and conditi

2022-11-24 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107755

Jakub Jelinek  changed:

   What|Removed |Added

 CC||jakub at gcc dot gnu.org

--- Comment #4 from Jakub Jelinek  ---
One possible fix would be not to emit these warnings while
processing_template_decl, the warning goes too deep into the expressions and
even when the operands are neither type nor value dependent, if
processing_template_decl they are far from what the middle-end code can grok.
I wonder where IMPLICIT_CONV_EXPR gets lost, so the COND_EXPR with
boolean_type_node type has RECORD_TYPE operand.

[Bug target/107860] Compilation failure, ambiguous fisttp

2022-11-24 Thread simon at pushface dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107860

--- Comment #3 from simon at pushface dot org ---
Created attachment 53961
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53961&action=edit
gcc/config.log

As requested (this time, sorry about previous attempt)

[Bug fortran/107577] [13 Regression] ICE in find_array_spec, at fortran/resolve.cc:5008 since r13-1757-gf838d15641d256e2

2022-11-24 Thread anlauf at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107577

anlauf at gcc dot gnu.org changed:

   What|Removed |Added

 Resolution|--- |FIXED
 Status|NEW |RESOLVED

--- Comment #5 from anlauf at gcc dot gnu.org ---
Fixed.

Thanks for the report, and to Steve for the patch.

[Bug target/107860] Compilation failure, ambiguous fisttp

2022-11-24 Thread simon at pushface dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107860

--- Comment #2 from simon at pushface dot org ---
Created attachment 53960
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=53960&action=edit
gcc/config.log

As requested

[Bug libgcc/107728] _umoddi3.o has reference to the unwinder at -O0

2022-11-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107728

Andrew Pinski  changed:

   What|Removed |Added

  Component|bootstrap   |libgcc
 Status|RESOLVED|UNCONFIRMED
Summary|with -O0, libgcc in the |_umoddi3.o has reference to
   |first stage compiler has|the unwinder at -O0
   |reference to libc functions |
   Keywords||build
 Target||sh4eb-buildroot-linux-gnu
 Resolution|MOVED   |---

--- Comment #6 from Andrew Pinski  ---
There still might not be much to be done, really.
But let's see ...

[Bug bootstrap/107728] with -O0, libgcc in the first stage compiler has reference to libc functions

2022-11-24 Thread arnout at mind dot be via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107728

--- Comment #5 from Arnout Vandecappelle  ---
Based on

> glibc builds needs to be fixed such that it does not reference the function 
> in the unwinder at -O0

I've traced through the map file why this symbol is pulled in:

/.../libgcc.a(unwind-dw2.o)
  /.../libgcc.a(_umoddi3.o) (_Unwind_Resume)
/.../libgcc.a(unwind-dw2-fde-dip.o)
  /.../libgcc.a(unwind-dw2.o) (_Unwind_Find_FDE)

In other words, _umoddi3.o is the one that pulls in the unwinder.

[Bug c++/107861] New: C++ static_assert() does not honor -fwrapv, leading to compilation error

2022-11-24 Thread markus at oberhumer dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107861

Bug ID: 107861
   Summary: C++ static_assert() does not honor -fwrapv, leading to
compilation error
   Product: gcc
   Version: 12.2.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: markus at oberhumer dot com
  Target Milestone: ---

C++ static_assert() does not honor -fwrapv, leading to compilation error

godbolt link: https://gcc.godbolt.org/z/Po8vc5Kex

$ g++ --version
g++ (GCC) 12.2.1 20220819 (Red Hat 12.2.1-

$ g++ -fwrapv -Wno-overflow -c test.cpp
test.cpp:7:34: error: non-constant condition for static assertion

$ cat test.cpp

// compile with: g++ -fwrapv -Wno-overflow -c test.cpp
//
// extra rant: -Wno-overflow should not be needed here!

#include <climits>

#define wrap_inc(x) ((x) + 1 < (x))

constexpr int int_max = INT_MAX;

bool b0 = wrap_inc(int_max);
const bool b1 = wrap_inc(int_max);
constexpr bool b2 = wrap_inc(int_max);
static_assert(b2, ""); // works

// error: non-constant condition for static assertion
static_assert(wrap_inc(int_max), "");

[Bug tree-optimization/107839] spurious "may be used uninitialized" warning while all uses are under "if (c)"

2022-11-24 Thread vincent-gcc at vinc17 dot net via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107839

--- Comment #3 from Vincent Lefèvre  ---
(In reply to Richard Biener from comment #2)
> it's loop invariant motion that hoists the v + v compute out of the loop
> and thus outside of its controlling condition.  You can see it's careful
> to not introduce undefined overflow that is possibly conditionally
> executed only but it fails to consider the case of 'v' being conditionally
> uninitialized.
> 
> It's very difficult to do the right thing here - it might be tempting to
> hoist the compute as
> 
>   if (c)
> tem = v+v;
>   while (1)
> if (c)
>   f(tem);

Couldn't the -Wmaybe-uninitialized warning be disabled on hoisted code, so that
the controlling condition wouldn't be needed?

To make sure not to disable potential warnings, the information that v was used
for tem should be kept together with tem in the loop. Something like
((void)v,tem), though GCC doesn't currently warn on that if v is uninitialized
(but that's another issue that should be solved).

However...

> Maybe the simplest thing would be to never hoist v + v, or only
> hoist it when the controlling branch is not loop invariant.
> 
> The original testcase is probably more "sensible", does it still have
> a loop invariant controlling condition and a loop invariant computation
> under that control?

In my tmd/binary32/hrcases.c file, there doesn't seem to be a loop invariant,
so I'm wondering what is the real cause. The code looks like the following:

static inline double cldiff (clock_t t1, clock_t t0)
{
  return (double) (t1 - t0) / CLOCKS_PER_SEC;
}

and in a function hrsearch() where its mprog argument (named c above) is an
integer that enables progress output when it is nonzero:

  if (mprog)
{
  mctr = 0;
  nctr = 0;
  t0 = ti = clock ();
}

  do
{
[...]
  if (mprog && ++mctr == mprog)
{
  mctr = 0;
  tj = clock ();
  mpfr_fprintf (stderr, "[exponent %ld: %8.2fs %8.2fs  %5lu / %lu]\n",
e, cldiff (tj, ti), cldiff (tj, t0), ++nctr, nprog);
  ti = tj;
}
[...]
}
  while (mpfr_get_exp (x) < e + 2);

The warning I get is

In function ‘cldiff’,
inlined from ‘hrsearch’ at hrcases.c:298:11,
inlined from ‘main’ at hrcases.c:520:9:
hrcases.c:46:23: warning: ‘t0’ may be used uninitialized
[-Wmaybe-uninitialized]
   46 |   return (double) (t1 - t0) / CLOCKS_PER_SEC;
  |   ^
hrcases.c: In function ‘main’:
hrcases.c:128:11: note: ‘t0’ was declared here
  128 |   clock_t t0, ti, tj;
  |   ^~

So the operation on t0 is tj - t0, and as tj is set just before, I don't see
how it can be used in a loop invariant.

This can be simplified as follows:

int f (int);
void g (int mprog)
{
  int t0, ti, tj;

  if (mprog)
t0 = ti = f(0);

  do
if (mprog)
  {
tj = f(0);
f(tj - ti);
f(tj - t0);
ti = tj;
  }
  while (f(0));
}

and I get

tst.c: In function ‘g’:
tst.c:13:9: warning: ‘t0’ may be used uninitialized [-Wmaybe-uninitialized]
   13 | f(tj - ti);
  | ^~
tst.c:4:7: note: ‘t0’ was declared here
4 |   int t0, ti, tj;
  |   ^~

BTW, the warning is incorrect: I can't see t0 in "f(tj - ti);".

[Bug fortran/107819] ICE in gfc_check_argument_var_dependency, at fortran/dependency.cc:978

2022-11-24 Thread anlauf at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107819

--- Comment #5 from anlauf at gcc dot gnu.org ---
(In reply to Mikael Morin from comment #4)
> But is it required to generate a temporary?
> As I understand it, the code is invalid, and (correctly) diagnosed, so there
> is nothing else to do.
> It's invalid because of 15.5.2.13 Restrictions on entities associated with
> dummy arguments:
> (4) If the value of the entity or any subobject of it is affected through
> the dummy argument, then at any time during the invocation and execution of
> the procedure, either before or after the definition, it shall be referenced
> only through that dummy argument unless (...)

Right.

I was confused by two observations.  First, NAG & Cray seem to generate
temporaries, while Intel and NVidia don't and would agree with gfortran
after the patch.

Second, I stumbled over:

! 15.5.2.3 Argument association
! (4) A present dummy argument with the VALUE attribute becomes argument
! associated with a definable anonymous data object whose initial value is
! the value of the actual argument.

So it boils down to what ELEMENTAL actually means in that context.  F2018:

15.8.3 Elemental subroutine actual arguments

! In a reference to an elemental subroutine, if the actual arguments
! corresponding to INTENT(OUT) and INTENT(INOUT) dummy arguments are
! arrays, the values of the elements, if any, of the results are the same
! as would be obtained if the subroutine had been applied separately, in
! array element order, to corresponding elements of each array actual
! argument.

So I read this as saying that

   call s (a(n), a)

is to be interpreted as

  do i = 1, size (a)
 call s (a(n(i)), a(i))
  end do

and this would actually be well-defined behavior... ;-)
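
To make the two readings concrete, here is a minimal C++ sketch. It is only an illustration: the body of `s` (copy `x` into `y`), the values of `a` and `n`, and the helper names `apply_in_order` and `apply_with_temporary` are assumptions made up for this example, not taken from the report, and indices are 0-based. It contrasts applying the subroutine element by element in array element order with first evaluating `a(n)` into a temporary.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using Vec = std::array<int, 4>;
using Idx = std::array<std::size_t, 4>;

// Hypothetical stand-in for the elemental subroutine: x is passed by
// value (like a VALUE dummy argument), y is the output argument.
inline void s(int x, int &y) { y = x; }

// Reading 1: apply s separately, in array element order, reading
// a[n[i]] at the time of each call (no temporary for a(n)).
inline Vec apply_in_order(Vec a, const Idx &n)
{
  for (std::size_t i = 0; i < a.size(); ++i)
    s(a[n[i]], a[i]);
  return a;
}

// Reading 2: evaluate the whole actual argument a(n) into a temporary
// first, then apply s to each element.
inline Vec apply_with_temporary(Vec a, const Idx &n)
{
  Vec tmp{};
  for (std::size_t i = 0; i < a.size(); ++i)
    tmp[i] = a[n[i]];
  for (std::size_t i = 0; i < a.size(); ++i)
    s(tmp[i], a[i]);
  return a;
}
```

With the assumed inputs a = {-1, -2, -3, -4} and n = {3, 1, 0, 2}, the in-order version yields {-4, -2, -4, -4} while the version with a temporary yields {-4, -2, -1, -3}, which is the kind of divergence between compilers described above.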

[Bug target/107860] Compilation failure, ambiguous fisttp

2022-11-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107860

Andrew Pinski  changed:

   What|Removed |Added

 Status|UNCONFIRMED |WAITING
   Last reconfirmed||2022-11-24
 Ever confirmed|0   |1

[Bug target/107860] Compilation failure, ambiguous fisttp

2022-11-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107860

--- Comment #1 from Andrew Pinski  ---
Can you attach config.log from the gcc directory?

It should have detected filds:

gcc_GAS_CHECK_FEATURE([filds and fists mnemonics],
   gcc_cv_as_ix86_filds,,
   [filds (%ebp); fists (%ebp)],,
   [AC_DEFINE(HAVE_AS_IX86_FILDS, 1,
 [Define if your assembler uses filds and fists mnemonics.])])

[Bug target/106609] [12 Regression] sh3eb-elf cross compiler is being miscompiled since r12-1525-g3155d51bfd1de8b6c4645

2022-11-24 Thread sebastien.michelland--- via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106609

--- Comment #15 from Sébastien Michelland  ---
Thanks, turns out my bisected commit was related after all...

I can confirm that test cases from OP and #4 (with protocol from OP) are no
longer broken for me on yesterday's master.

Re: How to debug while using LTO?

2022-11-24 Thread Richard Biener via Gcc



> On 24.11.2022 at 17:28, Stefan Schulze Frielinghaus via Gcc wrote:
> 
> Hi everyone,
> 
> Currently I'm looking into a wrong-code bug and would like to understand
> a certain optimization done by combine during local transformation.
> Without LTO I would simply debug cc1 and step through combine.  However,
> with LTO enabled AFAIK I have to debug lto1 instead.  In order to get
> the lto1 command line of interest according to
> https://gcc.gnu.org/legacy-ml/gcc/2009-11/msg00047.html
> I have to pass -Wl,-debug to gcc in order to get the command for
> collect2 to which itself I have to pass -plugin-opt=-debug in order to
> get the command for lto-wrapper.  According to the aforementioned mail I
> should add option -debug to lto-wrapper, however, it appears to me that
> option -debug was removed.  I gave options -v and -### a chance without
> luck, i.e., those only print the usual environment variables and
> afterwards a list of object files like
> 
> /tmp/ccPEIV35.ltrans0.ltrans.o
> /tmp/ccNmpKfS.debug.temp.o
> /tmp/cceiCIFg.debug.temp.o
> /tmp/ccZ4Qc7E.debug.temp.o
> ...
> 
> but no lto1 command.  Thus, how do you retrieve the lto1 command?
> 
> While desperate I retrieved it manually via strace.  However, the lto1
> command refers to temporary files which have been erased meanwhile.  I
> actually didn't expect that because I added -save-temps to all the
> intermediate commands which is also reflected in the environment
> variable COLLECT_GCC_OPTIONS.  Thus, how do you keep temporary files?

Adding -v -save-temps and then running gdb on the lto1 command works and is 
what I usually do.

Richard 

> Cheers,
> Stefan


[Bug bootstrap/107728] with -O0, libgcc in the first stage compiler has reference to libc functions

2022-11-24 Thread arnout at mind dot be via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107728

Arnout Vandecappelle  changed:

   What|Removed |Added

 CC||arnout at mind dot be

--- Comment #4 from Arnout Vandecappelle  ---
This bug was already reported before at glibc:
https://sourceware.org/bugzilla/show_bug.cgi?id=29621

There it was concluded that it is due to gcc stage1 being built with
CFLAGS_FOR_TARGET="-O0". This is not something that is tested in glibc's
testing infrastructure.

It's not clear to me whether the issue needs to be solved in GCC or in glibc.

[Bug modula2/107611] mc-boot-ch/Gtermios.cc etc. don't compile on Mac OS X 10.7

2022-11-24 Thread gaius at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107611

Gaius Mulley  changed:

   What|Removed |Added

 Status|UNCONFIRMED |RESOLVED
 Resolution|--- |FIXED

--- Comment #2 from Gaius Mulley  ---
Many thanks - applying patch!

[Bug bootstrap/107860] New: Compilation failure, ambiguous fisttp

2022-11-24 Thread simon at pushface dot org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107860

Bug ID: 107860
   Summary: Compilation failure, ambiguous fisttp
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: bootstrap
  Assignee: unassigned at gcc dot gnu.org
  Reporter: simon at pushface dot org
  Target Milestone: ---

Building the snapshot gcc-13-20221120 on macOS 13 (actually an aarch64 machine,
but using x86_64-apple-darwin21 compiler under Rosetta) with Command Line Tools
14.1.
Source patched as commit ac50541 for PR107781.

Phase 1 (actually configured with --disable-bootstrap) fails with this (I was
building with -j7, so had to extract the relevant parts of the log):

checking __sync extensions...
/Volumes/Miscellaneous1/x86_64/gcc-13-20221120/gcc/./gcc/xgcc
-B/Volumes/Miscellaneous1/x86_64/gcc-13-20221120/gcc/./gcc/
-B/opt/gcc-13-20221120/x86_64-apple-darwin21/bin/
-B/opt/gcc-13-20221120/x86_64-apple-darwin21/lib/ -isystem
/opt/gcc-13-20221120/x86_64-apple-darwin21/include -isystem
/opt/gcc-13-20221120/x86_64-apple-darwin21/sys-include
--sysroot=/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk   -c -g -O2 
-fno-common  -W -Wall -gnatpg -nostdinc  -fno-toplevel-reorder  \
  g-debpoo.adb -o g-debpoo.o
...
/var/folders/ch/k_zwspdx3qsfbt1_x21zld6mgn/T//ccJJp5X6.s:11992:2: error:
ambiguous instructions require an explicit suffix (could be 'fisttps', or
'fisttpl')
fisttp  -408(%rbp)
^
/var/folders/ch/k_zwspdx3qsfbt1_x21zld6mgn/T//ccJJp5X6.s:12278:2: error:
ambiguous instructions require an explicit suffix (could be 'fisttps', or
'fisttpl')
fisttp  -408(%rbp)
^
...
make[6]: *** [g-debpoo.o] Error 1

Configure script (BUILD set to x86_64-apple-darwin21):
+++
XCODE=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX.sdk
CLU=/Library/Developer/CommandLineTools/SDKs/MacOSX.sdk

$GCC_SRC/configure  \
  --prefix=$PREFIX  \
  --without-libiconv-prefix \
  --disable-libmudflap  \
  --disable-libstdcxx-pch   \
  --disable-libsanitizer\
  --disable-libcc1  \
  --disable-libcilkrts  \
  --disable-multilib\
  --disable-nls \
  --enable-languages=c,c++,ada  \
  --host=$BUILD \
  --target=$BUILD   \
  --build=$BUILD\
  --without-isl \
  --with-build-sysroot="$(xcrun --show-sdk-path)"   \
  --with-sysroot=   \
  --with-specs="%{!sysroot=*:--sysroot=%:if-exists-else($XCODE $CLU)}"  \
  --with-build-config=no\
  --disable-bootstrap   \
   CFLAGS=-Wno-deprecated-declarations  \
   CXXFLAGS=-Wno-deprecated-declarations
+++

[Bug libstdc++/91456] std::function and std::is_invocable_r do not understand guaranteed elision

2022-11-24 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91456

--- Comment #10 from CVS Commits  ---
The releases/gcc-12 branch has been updated by Jonathan Wakely:

https://gcc.gnu.org/g:db206f15f7091382cb981ade3c75f4c3e3559ab8

commit r12-8930-gdb206f15f7091382cb981ade3c75f4c3e3559ab8
Author: Jonathan Wakely 
Date:   Fri Sep 23 13:28:37 2022 +0100

libstdc++: Fix std::is_nothrow_invocable_r for uncopyable prvalues
[PR91456]

This is the last missing piece of PR 91456.

This also removes the only use of the C++11 version of
std::is_nothrow_invocable.

libstdc++-v3/ChangeLog:

PR libstdc++/91456
* include/std/type_traits (__is_nothrow_invocable): Remove.
(__is_invocable_impl::__nothrow_type): New member type which
checks if the conversion can throw.
(__is_nt_invocable_impl): Replace class template with alias
template to __is_nt_invocable_impl::__nothrow_type.
* testsuite/20_util/is_nothrow_invocable/91456.cc: New test.
* testsuite/20_util/is_nothrow_convertible/value.cc: Remove
macro used by value_ext.cc test.
* testsuite/20_util/is_nothrow_convertible/value_ext.cc: Remove
test for non-standard __is_nothrow_invocable trait.

(cherry picked from commit 71c828f84572d933979468baf2cf744180258ee4)

How to debug while using LTO?

2022-11-24 Thread Stefan Schulze Frielinghaus via Gcc
Hi everyone,

Currently I'm looking into a wrong-code bug and would like to understand
a certain optimization done by combine during local transformation.
Without LTO I would simply debug cc1 and step through combine.  However,
with LTO enabled AFAIK I have to debug lto1 instead.  In order to get
the lto1 command line of interest according to
https://gcc.gnu.org/legacy-ml/gcc/2009-11/msg00047.html
I have to pass -Wl,-debug to gcc in order to get the command for
collect2 to which itself I have to pass -plugin-opt=-debug in order to
get the command for lto-wrapper.  According to the aforementioned mail I
should add option -debug to lto-wrapper, however, it appears to me that
option -debug was removed.  I gave options -v and -### a chance without
luck, i.e., those only print the usual environment variables and
afterwards a list of object files like

/tmp/ccPEIV35.ltrans0.ltrans.o
/tmp/ccNmpKfS.debug.temp.o
/tmp/cceiCIFg.debug.temp.o
/tmp/ccZ4Qc7E.debug.temp.o
...

but no lto1 command.  Thus, how do you retrieve the lto1 command?

While desperate I retrieved it manually via strace.  However, the lto1
command refers to temporary files which have been erased meanwhile.  I
actually didn't expect that because I added -save-temps to all the
intermediate commands which is also reflected in the environment
variable COLLECT_GCC_OPTIONS.  Thus, how do you keep temporary files?

Cheers,
Stefan


Re: [Patch Arm] Fix PR 92999

2022-11-24 Thread Richard Earnshaw via Gcc-patches




On 11/11/2022 21:50, Ramana Radhakrishnan via Gcc-patches wrote:

On Thu, Nov 10, 2022 at 7:46 PM Ramana Radhakrishnan
 wrote:


On Thu, Nov 10, 2022 at 6:03 PM Richard Earnshaw
 wrote:




On 10/11/2022 17:21, Richard Earnshaw via Gcc-patches wrote:



On 08/11/2022 18:20, Ramana Radhakrishnan via Gcc-patches wrote:

PR92999 is a case where the VFP calling convention does not allocate
enough FP registers for a homogeneous aggregate containing FP16 values.
I believe this is the complete fix but would appreciate another set of
eyes on this.

Could I get a hand with a regression test run on an armhf environment
while I fix my environment ?

gcc/ChangeLog:

PR target/92999
*  config/arm/arm.c (aapcs_vfp_allocate_return_reg): Adjust to handle
aggregates with elements smaller than SFmode.

gcc/testsuite/ChangeLog:

* gcc.target/arm/pr92999.c: New test.


Thanks,
Ramana

Signed-off-by: Ramana Radhakrishnan 


I'm not sure about this.  The AAPCS does not mention a base type of a
half-precision FP type as an appropriate homogeneous aggregate for using
VFP registers for either calling or returning.


Ooh interesting, thanks for taking a look and poking at the AAPCS, and
that's a good catch. BF16 should also have the same behaviour as FP16,
I suspect?


I suspect I got caught out by the definition of the homogeneous
aggregate from Section 5.3.5
(https://github.com/ARM-software/abi-aa/blob/2982a9f3b512a5bfdc9e3fea5d3b298f9165c36b/aapcs32/aapcs32.rst#homogeneous-aggregates),
which simply suggests it's an aggregate of fundamental types, which
lists half-precision floating point.


A homogeneous aggregate is any aggregate that fits the general 
definition, but only HAs of specific types are of interest for the VFP 
PCS rules.


The problem we have is that when we added HFmode (and later BF16mode) 
support we didn't notice that the base types are VFP candidates, but the 
nested types (eg in records or arrays) are not.


The problems started around SVN r236269 (git:1b81a1c1bd53) when we added 
FP16 support.





FTR, ideally I should have read 7.1.2.1
(https://github.com/ARM-software/abi-aa/blob/2982a9f3b512a5bfdc9e3fea5d3b298f9165c36b/aapcs32/aapcs32.rst#procedure-calling)
:)







So perhaps the bug is that we try to treat this as a homogeneous
aggregate at all.


Yep I agree - I'll take a look again tomorrow and see if I can get a fix.

(And thanks Alex for the test run, I might trouble you again while I
still (slowly) get some of my boards back up)



and as promised take 2. I'd really prefer another review on this one
to see if I've not missed anything in the cases below.


I think I'd prefer to try and fix this at the point where we accept the 
base types, ie around:


case REAL_TYPE:
  mode = TYPE_MODE (type);
  if (mode != DFmode && mode != SFmode && mode != HFmode && mode != BFmode)
    return -1;

by changing this to something like

/* HFmode and BFmode can be passed in registers, but are not valid
   base types for an HFA, so only accept these if we are at the top
   level.  */
if (!(mode == DFmode || mode == SFmode
  || (depth == 0
  && (mode == HFmode || mode == BFmode)))
   return -1;

and we then pass depth into the recursion calls as an extra parameter, 
starting at 0 for the top level and incrementing it by 1 each time 
aapcs_vfp_sub_candidate recurses.
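
A minimal sketch of that idea, written as a self-contained toy model rather than as GCC code: the `Type`, `Mode` and `Kind` types and the helpers `real`, `array_of`, `record`, `sub_candidate` and `is_vfp_candidate` are all hypothetical names invented for illustration, and the model assumes a candidate needs one to four elements of a single base mode. It only shows how a depth parameter lets HF/BF through at the top level while rejecting them inside arrays or records.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Toy model of a type tree; none of these names are GCC's.
enum class Mode { None, DF, SF, HF, BF };
enum class Kind { Real, Array, Record };

struct Type {
  Kind kind = Kind::Real;
  Mode mode = Mode::None;      // base mode, for Kind::Real
  int count = 0;               // element count, for Kind::Array
  std::vector<Type> members;   // element type (Array) or fields (Record)
};

inline Type real(Mode m) { Type t; t.mode = m; return t; }

inline Type array_of(Type elem, int n)
{
  Type t;
  t.kind = Kind::Array;
  t.count = n;
  t.members.push_back(std::move(elem));
  return t;
}

inline Type record(std::vector<Type> fields)
{
  Type t;
  t.kind = Kind::Record;
  t.members = std::move(fields);
  return t;
}

// Returns the number of base-type elements, or -1 if T is not a candidate.
// SEEN records the single base mode found so far; DEPTH is 0 at the top
// level and grows by 1 on each recursion.
inline int sub_candidate(const Type &t, Mode &seen, int depth)
{
  switch (t.kind) {
  case Kind::Real: {
    const Mode m = t.mode;
    // HF and BF are accepted only at the top level, never as the base
    // type of a nested aggregate.
    if (!(m == Mode::DF || m == Mode::SF
          || (depth == 0 && (m == Mode::HF || m == Mode::BF))))
      return -1;
    if (seen == Mode::None)
      seen = m;
    return seen == m ? 1 : -1;
  }
  case Kind::Array: {
    if (t.members.empty())
      return -1;
    const int n = sub_candidate(t.members.front(), seen, depth + 1);
    return n < 0 ? -1 : n * t.count;
  }
  case Kind::Record: {
    int total = 0;
    for (const Type &field : t.members) {
      const int n = sub_candidate(field, seen, depth + 1);
      if (n < 0)
        return -1;
      total += n;
    }
    return total;
  }
  }
  return -1;
}

// In this toy model, a candidate has 1 to 4 elements of one base mode.
inline bool is_vfp_candidate(const Type &t)
{
  Mode seen = Mode::None;
  const int n = sub_candidate(t, seen, 0);
  return n >= 1 && n <= 4;
}
```

In this model a bare `real(Mode::HF)` is accepted, whereas `record({real(Mode::HF), real(Mode::HF)})` is rejected because the HF members are seen at depth 1.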


For the test, would it be possible to rewrite it in the style of 
gcc.target/arm/aapcs/* and put it there? That would ensure that not only 
are the caller and callee compatible, but that the values are passed in 
the correct location.


R.



RE: [PATCH 35/35 V2] arm: improve tests for vsetq_lane*

2022-11-24 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Andrea Corallo 
> Sent: Thursday, November 24, 2022 2:44 PM
> To: Kyrylo Tkachov 
> Cc: gcc-patches@gcc.gnu.org; Richard Earnshaw
> 
> Subject: [PATCH 35/35 V2] arm: improve tests for vsetq_lane*
> 
> Kyrylo Tkachov  writes:
> 
> [...]
> 
> >> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c
> >> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c
> >> index e03e9620528..b5c9f4d5eb8 100644
> >> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c
> >> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c
> >> @@ -1,15 +1,45 @@
> >> -/* { dg-skip-if "Incompatible float ABI" { *-*-* } { "-mfloat-abi=soft" } 
> >> {""} }
> */
> >>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
> >>  /* { dg-add-options arm_v8_1m_mve_fp } */
> >>  /* { dg-additional-options "-O2" } */
> >> +/* { dg-final { check-function-bodies "**" "" } } */
> >>
> >>  #include "arm_mve.h"
> >>
> >> +/*
> >> +**foo:
> >> +**...
> >> +**vmov.16 q[0-9]+\[[0-9]+\], (?:ip|fp|r[0-9]+)(?: @.*|)
> >> +**...
> >> +*/
> >>  float16x8_t
> >>  foo (float16_t a, float16x8_t b)
> >>  {
> >> -return vsetq_lane_f16 (a, b, 0);
> >> +  return vsetq_lane_f16 (a, b, 1);
> >>  }
> >>
> >
> > Hmm, for these tests we should be able to scan for more specific codegen
> as we're setting individual lanes, so we should be able to scan for lane 1 in
> the vmov instruction, though it may need to be flipped for big-endian.
> > Thanks,
> > Kyrill
> 
> Hi Kyrill,
> 
> please find attached the updated version of this patch.
> 
> Big-endian should not be a problem as, to my understanding, it is just not
> supported with MVE intrinsics.

Huh, that's right.
This version is ok.
Thanks!
Kyrill

> 
> Thanks!
> 
>   Andrea
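
For reference, the difference between scanning for any lane and scanning for lane 1 specifically can be sketched with a small C++ regex check. This is only an illustration: the sample assembly lines, the helper names and the use of `std::regex` (ECMAScript syntax, rather than the Tcl regexps that the testsuite directives use) are assumptions, not part of the patch.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Generic form, modelled on the quoted test: accepts any lane index.
inline bool matches_any_lane(const std::string &line)
{
  static const std::regex re(
      R"(vmov\.16\s+q[0-9]+\[[0-9]+\], (?:ip|fp|r[0-9]+))");
  return std::regex_search(line, re);
}

// Stricter form: only accepts a move into lane 1.
inline bool matches_lane1(const std::string &line)
{
  static const std::regex re(
      R"(vmov\.16\s+q[0-9]+\[1\], (?:ip|fp|r[0-9]+))");
  return std::regex_search(line, re);
}
```

A hypothetical line such as "vmov.16 q0[1], r0" satisfies both checks, while "vmov.16 q0[0], r0" satisfies only the generic one, so the stricter pattern would catch a wrong lane.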



Re: [PATCH] Make Warray-bounds alias to Warray-bounds= [PR107787]

2022-11-24 Thread Iskander Shakirzyanov via Gcc-patches
>> How did you test the patch? If you bootstrapped it and ran the
>> testsuite then it's OK.
Yes, I ran the testsuite and bootstrapped, and everything seemed OK, but I
missed the failures of tests gcc.dg/Warray-bounds-34.c and
gcc.dg/Warray-bounds-43.c, so Franz is right. I have since fixed the regexps in
the dg directives, and now everything seems OK.

> I'm pretty sure the testsuite will have regressions, as I have a very similar 
> patch lying around that needs these testsuite changes.

You are right, thank you. I missed this; attaching a corrected version of the patch.

> This also shows nicely why I don't like warnings with levels, what if I want 
> -Werror=array-bounds=2 + -Warray-bounds=1?

I completely agree with you, because I also thought about using -Werror=opt=X +
-Wopt=Y; this functionality looks useful. As far as I know, gcc, while parsing an
option with the same OPT, overwrites the old config of OPT.

> Because I think at least -Wuse-after-free= and Wattributes= have the same 
> problem.

Yes, looks like this, probably should be fixed too.  

> BTW, is the duplicated warning description "Warn if an array is accessed out 
> of bounds." needed or not with Alias()?

According to other examples in common.opt, duplicated description is not 
necessary, you are right.

> I've attached my patch, feel free to integrate the testsuite changes.

Thanks, but duplicating the existing tests seems redundant to me for testing
the functionality of -Werror=array-bounds=X.


From bf047e36392dab138db10be2ec257d08c376ada5 Mon Sep 17 00:00:00 2001
From: Iskander Shakirzyanov 
Date: Thu, 24 Nov 2022 14:26:59 +
Subject: [PATCH] Make Warray-bounds alias to Warray-bounds= [PR107787]

According to documentation the -Werror= option makes the specified warning
into an error and also automatically implies this option. Then it seems that
the behavior of the compiler when specifying -Werror=array-bounds=X should be
the same as specifying "-Werror=array-bounds -Warray-bounds=X", so we expect to
receive array-bounds pass triggers and they must be processed as errors.
In practice, we observe that the array-bounds pass is indeed called, but
its responses are processed as warnings, not errors.
As I understand, this happens because Warray-bounds and Warray-bounds= are
declared as 2 different options in common.opt, so when
diagnostic_classify_diagnostic() is called, DK_ERROR is set for
the Warray-bounds= option, but in diagnostic_report_diagnostic() through
warning_at() passes opt_index of Warray-bounds, so information about
DK_ERROR is lost. Fixed by using Alias() in declaration of
Warray-bounds (similarly as in Wattribute-alias etc.)
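
The effect described above can be modelled with a small toy sketch. It is not GCC's diagnostic machinery: the `Classifier` class, the `Opt` and `Kind` enums and the helper `after_werror_array_bounds_eq` are hypothetical names used only to show how a classification stored under one option index is missed when the diagnostic is reported under a different index.

```cpp
#include <cassert>
#include <map>

enum class Kind { Warning, Error };

// Two separate option indices, as when -Warray-bounds and -Warray-bounds=
// are declared as distinct options (names are illustrative only).
enum class Opt { Warray_bounds, Warray_bounds_eq };

class Classifier {
public:
  // Record that diagnostics for option O should be reported as kind K.
  void classify(Opt o, Kind k) { overrides_[o] = k; }

  // Kind actually used when a diagnostic is emitted under option O;
  // without an override it stays a plain warning.
  Kind report(Opt o) const
  {
    const auto it = overrides_.find(o);
    return it == overrides_.end() ? Kind::Warning : it->second;
  }

private:
  std::map<Opt, Kind> overrides_;
};

// Models -Werror=array-bounds=X: only the "=" option is classified.
inline Classifier after_werror_array_bounds_eq()
{
  Classifier c;
  c.classify(Opt::Warray_bounds_eq, Kind::Error);
  return c;
}
```

In this model, reporting under `Opt::Warray_bounds` still yields a warning even though `Opt::Warray_bounds_eq` was classified as an error; reporting under the same index that was classified, which is what using a single option index achieves, yields the error.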

PR driver/107787

Co-authored-by: Franz Sirl 

gcc/ChangeLog:

* common.opt (Warray-bounds): Turn into alias to
-Warray-bounds=1.
* builtins.cc (warn_array_bounds): Use OPT_Warray_bounds_
instead of OPT_Warray_bounds.
* diagnostic-spec.cc: Likewise.
* gimple-array-bounds.cc: Likewise.
* gimple-ssa-warn-restrict.cc: Likewise.

gcc/testsuite/ChangeLog:

* gcc.dg/Warray-bounds-34.c: Correct the regular
expression for -Warray-bounds=.
* gcc.dg/Warray-bounds-43.c: Likewise.
* gcc.dg/pr107787.c: New test.

gcc/c-family/ChangeLog:

* c-common.cc (warn_array_bounds): Use OPT_Warray_bounds_
instead of OPT_Warray_bounds.
---
 gcc/builtins.cc |  6 ++--
 gcc/c-family/c-common.cc|  4 +--
 gcc/common.opt  |  3 +-
 gcc/diagnostic-spec.cc  |  1 -
 gcc/gimple-array-bounds.cc  | 38 -
 gcc/gimple-ssa-warn-restrict.cc |  2 +-
 gcc/testsuite/gcc.dg/Warray-bounds-34.c |  2 +-
 gcc/testsuite/gcc.dg/Warray-bounds-43.c |  6 ++--
 gcc/testsuite/gcc.dg/pr107787.c | 13 +
 9 files changed, 43 insertions(+), 32 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr107787.c

diff --git a/gcc/builtins.cc b/gcc/builtins.cc
index 4dc1ca672b2..02c4fefa86f 100644
--- a/gcc/builtins.cc
+++ b/gcc/builtins.cc
@@ -696,14 +696,14 @@ c_strlen (tree arg, int only_value, c_strlen_data *data, unsigned eltsize)
 {
   /* Suppress multiple warnings for propagated constant strings.  */
   if (only_value != 2
- && !warning_suppressed_p (arg, OPT_Warray_bounds)
- && warning_at (loc, OPT_Warray_bounds,
+ && !warning_suppressed_p (arg, OPT_Warray_bounds_)
+ && warning_at (loc, OPT_Warray_bounds_,
 "offset %qwi outside bounds of constant string",
 eltoff))
{
  if (decl)
inform (DECL_SOURCE_LOCATION (decl), "%qE declared here", decl);
- suppress_warning (arg, OPT_Warray_bounds);
+ suppress_warning (arg, OPT_Warray_bounds_);
}
   return NULL_TREE;
 }
diff --git a/gcc/c-family/c-common.cc b/gcc/c-family/c-common.cc
index 6f1f21bc4c1..b0da6886ccf 100644
--- 

[Bug c++/84469] structured binding inside for all loop thinks it is type depedent when it is not (inside a template)

2022-11-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84469

Andrew Pinski  changed:

   What|Removed |Added

 CC||pilarlatiesa at gmail dot com

--- Comment #5 from Andrew Pinski  ---
*** Bug 107858 has been marked as a duplicate of this bug. ***

[Bug c++/107858] structured binding with auto type and for all loop in a template considered a type dependent name

2022-11-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107858

Andrew Pinski  changed:

   What|Removed |Added

 Status|NEW |RESOLVED
 Resolution|--- |DUPLICATE

--- Comment #2 from Andrew Pinski  ---
Dup of bug 84469.

*** This bug has been marked as a duplicate of bug 84469 ***

[Bug c++/107858] structured binding with auto type and for all loop in a template considered a type dependent name

2022-11-24 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107858

Andrew Pinski  changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1
Summary|Variable in generic lambda  |structed binding with auto
   |incorrectly considered to   |type and for all loop in a
   |be a dependent name |template considered a type
   ||dependent name
   Last reconfirmed||2022-11-24

--- Comment #1 from Andrew Pinski  ---
This is unrelated to generic lambdas and can be reproduced with just a templated
function:
```
struct y
{
  template
  void foo() const {}
};

template
struct pair
{
  T a, b;
};

template
void bar(void)
{
pair x[10];
for (auto const &[a, b] : x)
  a.foo<0>();
}
```

Note the range-based for loop is required.
I wonder if that is because begin and end are considered type dependent ...
But without structured bindings, GCC does not consider them type dependent.
Confirmed.

[PATCH 35/35 V2] arm: improve tests for vsetq_lane*

2022-11-24 Thread Andrea Corallo via Gcc-patches
Kyrylo Tkachov  writes:

[...]

>> diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c
>> b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c
>> index e03e9620528..b5c9f4d5eb8 100644
>> --- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c
>> +++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c
>> @@ -1,15 +1,45 @@
>> -/* { dg-skip-if "Incompatible float ABI" { *-*-* } { "-mfloat-abi=soft" } 
>> {""} } */
>>  /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
>>  /* { dg-add-options arm_v8_1m_mve_fp } */
>>  /* { dg-additional-options "-O2" } */
>> +/* { dg-final { check-function-bodies "**" "" } } */
>> 
>>  #include "arm_mve.h"
>> 
>> +/*
>> +**foo:
>> +**  ...
>> +**  vmov.16 q[0-9]+\[[0-9]+\], (?:ip|fp|r[0-9]+)(?: @.*|)
>> +**  ...
>> +*/
>>  float16x8_t
>>  foo (float16_t a, float16x8_t b)
>>  {
>> -return vsetq_lane_f16 (a, b, 0);
>> +  return vsetq_lane_f16 (a, b, 1);
>>  }
>> 
>
> Hmm, for these tests we should be able to scan for more specific codegen as 
> we're setting individual lanes, so we should be able to scan for lane 1 in 
> the vmov instruction, though it may need to be flipped for big-endian.
> Thanks,
> Kyrill

Hi Kyrill,

please find attached the updated version of this patch.

Big-endian should not be a problem since, as far as I understand, it is
simply not supported with MVE intrinsics.

Thanks!

  Andrea

From 79f2c990553a1f793e08b9a0c4abb7dae8de7120 Mon Sep 17 00:00:00 2001
From: Andrea Corallo 
Date: Thu, 17 Nov 2022 11:06:29 +0100
Subject: [PATCH] arm: improve tests for vsetq_lane*

gcc/testsuite/ChangeLog:

* gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c: Improve test.
* gcc.target/arm/mve/intrinsics/vsetq_lane_f32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vsetq_lane_s16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vsetq_lane_s32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vsetq_lane_s64.c: Likewise.
* gcc.target/arm/mve/intrinsics/vsetq_lane_s8.c: Likewise.
* gcc.target/arm/mve/intrinsics/vsetq_lane_u16.c: Likewise.
* gcc.target/arm/mve/intrinsics/vsetq_lane_u32.c: Likewise.
* gcc.target/arm/mve/intrinsics/vsetq_lane_u64.c: Likewise.
* gcc.target/arm/mve/intrinsics/vsetq_lane_u8.c: Likewise.
---
 .../arm/mve/intrinsics/vsetq_lane_f16.c   | 36 +++--
 .../arm/mve/intrinsics/vsetq_lane_f32.c   | 36 +++--
 .../arm/mve/intrinsics/vsetq_lane_s16.c   | 24 ++--
 .../arm/mve/intrinsics/vsetq_lane_s32.c   | 24 ++--
 .../arm/mve/intrinsics/vsetq_lane_s64.c   | 27 ++---
 .../arm/mve/intrinsics/vsetq_lane_s8.c| 24 ++--
 .../arm/mve/intrinsics/vsetq_lane_u16.c   | 36 +++--
 .../arm/mve/intrinsics/vsetq_lane_u32.c   | 36 +++--
 .../arm/mve/intrinsics/vsetq_lane_u64.c   | 39 ---
 .../arm/mve/intrinsics/vsetq_lane_u8.c| 36 +++--
 10 files changed, 284 insertions(+), 34 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c
index e03e9620528..6b148a4b03d 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f16.c
@@ -1,15 +1,45 @@
-/* { dg-skip-if "Incompatible float ABI" { *-*-* } { "-mfloat-abi=soft" } {""} } */
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
 
 #include "arm_mve.h"
 
+/*
+**foo:
+** ...
+** vmov.16 q[0-9]+\[1\], (?:ip|fp|r[0-9]+)(?:  @.*|)
+** ...
+*/
 float16x8_t
 foo (float16_t a, float16x8_t b)
 {
-return vsetq_lane_f16 (a, b, 0);
+  return vsetq_lane_f16 (a, b, 1);
 }
 
-/* { dg-final { scan-assembler "vmov.16"  }  } */
 
+/*
+**foo1:
+** ...
+** vmov.16 q[0-9]+\[1\], (?:ip|fp|r[0-9]+)(?:  @.*|)
+** ...
+*/
+float16x8_t
+foo1 (float16_t a, float16x8_t b)
+{
+  return vsetq_lane (a, b, 1);
+}
+
+/*
+**foo2:
+** ...
+** vmov.16 q[0-9]+\[1\], (?:ip|fp|r[0-9]+)(?:  @.*|)
+** ...
+*/
+float16x8_t
+foo2 (float16x8_t b)
+{
+  return vsetq_lane (1.1, b, 1);
+}
+
+/* { dg-final { scan-assembler-not "__ARM_undef" } } */
\ No newline at end of file
diff --git a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f32.c b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f32.c
index 2b9f1a7e627..e4e7f892e97 100644
--- a/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f32.c
+++ b/gcc/testsuite/gcc.target/arm/mve/intrinsics/vsetq_lane_f32.c
@@ -1,15 +1,45 @@
-/* { dg-skip-if "Incompatible float ABI" { *-*-* } { "-mfloat-abi=soft" } {""} } */
 /* { dg-require-effective-target arm_v8_1m_mve_fp_ok } */
 /* { dg-add-options arm_v8_1m_mve_fp } */
 /* { dg-additional-options "-O2" } */
+/* { dg-final { 

[Bug target/106609] [12 Regression] sh3eb-elf cross compiler is being miscompiled since r12-1525-g3155d51bfd1de8b6c4645

2022-11-24 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106609

Jakub Jelinek  changed:

   What|Removed |Added

 CC||law at gcc dot gnu.org
   Priority|P3  |P4
Summary|[12/13 Regression]  |[12 Regression] sh3eb-elf
   |sh3eb-elf cross compiler is |cross compiler is being
   |being miscompiled since |miscompiled since
   |r12-1525-g3155d51bfd1de8b6c |r12-1525-g3155d51bfd1de8b6c
   |4645|4645

--- Comment #14 from Jakub Jelinek  ---
Ugh, wasted time.
This is (well, was) a SH backend bug, fixed apparently on the trunk by Jeff
in r13-4118-ge214cab68cb34e77622b91113f7698cf137bbdd6
gcc/config/sh/sh_treg_combine.cc
contained bogus
// FIXME: Remove dependency on SH predicate function somehow.
extern int t_reg_operand (rtx, machine_mode);
extern int negt_reg_operand (rtx, machine_mode);
declarations, when the definitions of those functions actually were
bool t_reg_operand (rtx op, machine_mode mode ATTRIBUTE_UNUSED) { ... }
bool negt_reg_operand (rtx op, machine_mode mode ATTRIBUTE_UNUSED) { ... }
As the x86_64 ABI only guarantees the state of the low 8 bits of a bool
return value, the callee could leave garbage in the upper 56 bits of the %rax
register, and the caller, having been told the function returns int, would
then test 32 bits against zero.
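
The effect of that mismatch can be modelled in a few lines. This is only a
sketch, not the SH backend code: it treats %rax as a 64-bit value whose low
8 bits hold the bool result and whose upper bits hold leftover garbage, and
compares what a caller tests when it believes the return type is int versus
bool. The helper names (fake_rax_after_bool_return, caller_sees_as_int,
caller_sees_as_bool) are hypothetical.

```cpp
#include <cstdint>

// Hypothetical model of %rax after a callee returns the bool `value`:
// only the low 8 bits are defined, the rest is whatever was left there.
constexpr std::uint64_t
fake_rax_after_bool_return (bool value, std::uint64_t garbage)
{
  return (garbage & ~std::uint64_t{0xff}) | (value ? 1u : 0u);
}

// Caller compiled against "extern int f (...)": tests the low 32 bits.
constexpr bool
caller_sees_as_int (std::uint64_t rax)
{
  return static_cast<std::uint32_t> (rax) != 0;
}

// Caller compiled against "bool f (...)": tests the low 8 bits only.
constexpr bool
caller_sees_as_bool (std::uint64_t rax)
{
  return static_cast<std::uint8_t> (rax) != 0;
}
```

With garbage such as 0xdeadbeefcafe1200 and a false result, the bool view is
false while the int view is true, which is the kind of wrong answer the bogus
extern declarations allowed.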

[Bug middle-end/107840] ICE when compiling cursed setjmp/longjmp nested function calls and non-local jumps

2022-11-24 Thread marxin at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107840

Martin Liška  changed:

   What|Removed |Added

 CC||marxin at gcc dot gnu.org
 Ever confirmed|0   |1
   Last reconfirmed||2022-11-24
 Status|UNCONFIRMED |NEW

--- Comment #4 from Martin Liška  ---
Also happens with 4.8.0+.

Re: [PATCH v2 16/19] modula2 front end: bootstrap and documentation tools

2022-11-24 Thread Gaius Mulley via Gcc-patches
Martin Liška  writes:

> On 11/8/22 14:22, Gaius Mulley wrote:
>> Martin Liška  writes:
>> 
>> should be good - I'll complete the rst output in the scripts,
>
> Hi.
>

Hi Martin,

> As you probably noticed, the Sphinx migration didn't go well.

Yes, sorry to see this didn't happen.  Thank you for your hard work and
I hope it can occur in the future.

> However, it's still up to you if you want to use it or not for Modula
> 2.

Once modula-2 is in master I'd like to revisit rst in devel/modula-2
along with analyzer patches and m2 generics.  If successful then submit
patches in early stage 1.

> We have manuals like libgccjit, or Ada manuals
> that use RST natively and provide exported .texi files.

Ok thanks for the pointers, I will experiment with these build runes.

> Cheers and sorry for the troubles I caused.

No problem at all - the modula-2 scripts are now improved and cleaner
due to the port.  Hopefully rst will happen sometime in the future,

regards,
Gaius


Re: [Patch] OpenMP, libgomp, gimple: omp_get_max_teams, omp_set_num_teams, and omp_{gs}et_teams_thread_limit on offload devices

2022-11-24 Thread Marcel Vollweiler

Hi Jakub,


> * testsuite/libgomp.c-c++-common/icv-4.c: Bugfix.

Better say what exactly you changed in words.


Changed.


> --- a/gcc/gimplify.cc
> +++ b/gcc/gimplify.cc
> @@ -14153,7 +14153,7 @@ optimize_target_teams (tree target, gimple_seq
*pre_p)
>struct gimplify_omp_ctx *target_ctx = gimplify_omp_ctxp;
>
>if (teams == NULL_TREE)
> -num_teams_upper = integer_one_node;
> +num_teams_upper = build_int_cst (integer_type_node, -2);
>else
>  for (c = OMP_TEAMS_CLAUSES (teams); c; c = OMP_CLAUSE_CHAIN (c))
>{

The function comment above optimize_target_teams contains detailed description
on what the values mean and why, so it definitely should document what -2 means
and when it is used.
I know you have documentation in libgomp for it, but it should be in both 
places.


I updated the comment with an explanation for "-2".



> +  intptr_t new_teams = orig_teams, new_threads = orig_threads;
> +  /* ORIG_TEAMS == -2: No explicit teams construct specified. Set to 1.

Two spaces after .


Corrected here and at other places.



> + ORIG_TEAMS == -1: TEAMS construct with NUM_TEAMS clause specified, but
the
> +  value could not be specified. No Change.

Likewise.
lowercase change ?


Corrected.



> + ORIG_TEAMS == 0: TEAMS construct without NUM_TEAMS clause.
> + Set device-specific value.
> + ORIG_TEAMS > 0: Value was already set through e.g. NUM_TEAMS clause.
> +No change.  */
> +  if (orig_teams == -2)
> +new_teams = 1;
> +  else if (orig_teams == 0)
> +{
> +  struct gomp_offload_icv_list *item = gomp_get_offload_icv_item 
(device);
> +  if (item != NULL)
> +   new_teams = item->icvs.nteams;
> +}
> +  /* The device-specific teams-thread-limit is only set if (a) an explicit 
TEAMS
> + region exists, i.e. ORIG_TEAMS > -2, and (b) THREADS was not already 
set by
> + e.g. a THREAD_LIMIT clause.  */
> +  if (orig_teams >= -2 && orig_threads == 0)

The comment talks about ORIG_TEAMS > -2, but the condition is >= -2.
So which one is it?


Thanks for the hint. It should indeed be "> -2" since teams_thread_limit "sets
the maximum number of OpenMP threads to use in each contention group created by
a teams construct" (OpenMP 5.2, section 21.6.2). So if there is no (explicit)
teams construct, then teams_thread_limit doesn't need to be copied to the
device.
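
For reference, the ORIG_TEAMS cases from the comment and the corrected
"> -2" condition can be sketched as below. This is a simplified model, not
the libgomp code: device_icvs stands in for the item returned by
gomp_get_offload_icv_item, and the teams_thread_limit field and the
resolve_launch_dims name are assumptions made for the illustration.

```cpp
#include <cstdint>

// Hypothetical stand-in for the per-device ICV list item; `present == false`
// models a NULL item.
struct device_icvs
{
  bool present;
  std::intptr_t nteams;
  std::intptr_t teams_thread_limit;  // assumed field name
};

struct launch_dims
{
  std::intptr_t teams;
  std::intptr_t threads;
};

inline launch_dims
resolve_launch_dims (std::intptr_t orig_teams, std::intptr_t orig_threads,
                     const device_icvs &icvs)
{
  launch_dims dims = { orig_teams, orig_threads };

  if (orig_teams == -2)
    dims.teams = 1;               // No explicit teams construct.
  else if (orig_teams == 0 && icvs.present)
    dims.teams = icvs.nteams;     // teams construct without num_teams.
  // orig_teams == -1 or orig_teams > 0: no change.

  // Only for an explicit teams region (strictly greater than -2) and only
  // if no thread limit was already set.
  if (orig_teams > -2 && orig_threads == 0 && icvs.present)
    dims.threads = icvs.teams_thread_limit;

  return dims;
}
```

For example, with device ICVs of 8 teams and a thread limit of 64, an input
of (-2, 0) yields (1, 0), while (0, 0) yields (8, 64).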



> +  /* This tests a large number of teams and threads. If it is larger than
> +2^15+1 then the according argument in the kernels arguments list
> +is encoded with two items instead of one. On NVIDIA there is an
> +adjustment for too large teams and threads. For AMD such adjustment
> +exists only for threads and will cause runtime errors with a two
> +large

s/two/too/ ?
Shouldn't amdgcn also adjust the number of teams?


I have now also adjusted the number of teams in the amdgcn plugin. As upper
bound I chose two times the number of compute units. This seems to be sufficient
when one team is executed on one compute unit. This at least avoids the queueing
of a large number of teams and the corresponding memory allocation.

The drawback is that a user is probably not aware of the actual number of
compute units (which is not very large on gfx cards, e.g. 120 for gfx908 and 104
for gfx90a) and thus maybe expects different values from omp_get_team_num(). For
instance in something like the following:

#pragma omp target
#pragma omp teams num_teams(220)
#pragma omp distribute parallel for
  for(int i = 0; i < 220; ++i)
{
#pragma omp critical
   ... omp_get_team_num () ...
}

On a gfx90a card with 104 compute units, 12 threads are assigned to "reused"
teams (instead of having their own teams), which would not be the case without
the limit.
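
The arithmetic behind the number 12 is simply 220 - 2 * 104. As a small
sketch (the helper names are hypothetical and this is not the plugin code;
the factor of two is the upper bound described above):

```cpp
// Upper bound of two teams per compute unit, as described above.
constexpr int
capped_num_teams (int requested, int compute_units)
{
  return requested > 2 * compute_units ? 2 * compute_units : requested;
}

// Number of requested teams that cannot get a team of their own and
// therefore end up in a "reused" team.
constexpr int
teams_lost_to_cap (int requested, int compute_units)
{
  return requested - capped_num_teams (requested, compute_units);
}
```

With 104 compute units, capped_num_teams (220, 104) is 208 and
teams_lost_to_cap (220, 104) is 12; with 120 compute units the request for
220 teams is below the bound of 240 and passes through unchanged.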

Alternatively, we could just define some (larger) constant number (though I
don't know a reasonable value here). But this does not actually solve the
drawback mentioned above. I think we need to find a compromise between an
unnecessarily small upper bound and the chance of getting memory allocation
failures due to too large a number of teams.



As for testcases, have you tested this in a native setup where
dg-set-target-env-var actually works?


Besides remote testing with offloading (which does not yet work with
dg-set-target-env-var), I also tested locally on x86_64-pc-linux-gnu with one
nvptx offload device without issues (using "make check" and verifying that
offloading is indeed used).

Marcel
-
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634
München; limited liability company; Managing directors: Thomas Heurung, Frank
Thürauf; Registered office: München; Registration court: München, HRB 106955
This patch adds support for omp_get_max_teams, omp_set_num_teams, and
omp_{gs}et_teams_thread_limit on offload devices. That includes the usage of
device-specific ICV values (specified as environment variables or changed on a
device). In order 

[Bug tree-optimization/107859] New: Fail to optimize rot13

2022-11-24 Thread denis.campredon at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107859

Bug ID: 107859
   Summary: Fail to optimize rot13
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
  Assignee: unassigned at gcc dot gnu.org
  Reporter: denis.campredon at gmail dot com
  Target Milestone: ---

Compiled with -O2, the following functions produce different assembly although
they compute the same things:



unsigned rot13_1(unsigned c) {
  if(c >= 'A' && c <= 'Z') return 'A' + ((c -'A') + 13)%26;
  __builtin_unreachable();
}

unsigned rot13_2(unsigned c) {
  if (c >= 'A' && c <= 'M' ) return c + 13;
  else if (c >= 'N' && c <= 'Z' ) return c - 13;
  __builtin_unreachable();
}

unsigned rot13_3(unsigned c) {
  if(c >= 'A' && c <= 'Z') return  c + (c > 'Z' - 13 ? -13 : 13);
  __builtin_unreachable();
}

unsigned rot13_4(unsigned c) {
  if(c >= 'A' && c <= 'Z') return  c + 13 + (c > 'Z' - 13 ? -26 : 0);
  __builtin_unreachable();
}

--

rot13_1(unsigned int):
lea edx, [rdi-52]
mov rax, rdx
imul    rdx, rdx, 1321528399
shr rdx, 35
imul    edx, edx, 26
sub eax, edx
add eax, 65
ret
rot13_2(unsigned int):
lea edx, [rdi-65]
lea eax, [rdi+13]
sub edi, 13
cmp edx, 12
cmova   eax, edi
ret
rot13_3(unsigned int):
cmp edi, 78
sbb eax, eax
and eax, 26
lea eax, [rax-13+rdi]
ret
rot13_4(unsigned int):
cmp edi, 78
sbb eax, eax
not eax
and eax, -26
lea eax, [rax+13+rdi]
ret
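
That the four variants do compute the same thing on their defined input range
can be checked exhaustively. A small sketch: the four functions are copied
from above, and rot13_variants_agree is a hypothetical helper added only for
this check. Inputs outside 'A'..'Z' are not tried because they reach
__builtin_unreachable.

```cpp
unsigned rot13_1(unsigned c) {
  if(c >= 'A' && c <= 'Z') return 'A' + ((c -'A') + 13)%26;
  __builtin_unreachable();
}

unsigned rot13_2(unsigned c) {
  if (c >= 'A' && c <= 'M' ) return c + 13;
  else if (c >= 'N' && c <= 'Z' ) return c - 13;
  __builtin_unreachable();
}

unsigned rot13_3(unsigned c) {
  if(c >= 'A' && c <= 'Z') return  c + (c > 'Z' - 13 ? -13 : 13);
  __builtin_unreachable();
}

unsigned rot13_4(unsigned c) {
  if(c >= 'A' && c <= 'Z') return  c + 13 + (c > 'Z' - 13 ? -26 : 0);
  __builtin_unreachable();
}

// Hypothetical helper: true if all four variants agree on every input
// in the range for which they are defined.
bool rot13_variants_agree() {
  for (unsigned c = 'A'; c <= 'Z'; ++c) {
    unsigned expected = rot13_1(c);
    if (rot13_2(c) != expected
        || rot13_3(c) != expected
        || rot13_4(c) != expected)
      return false;
  }
  return true;
}
```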

[Bug tree-optimization/107413] Perf loss ~14% on 519.lbm_r SPEC cpu2017 benchmark with r8-7132-gb5b33e113434be

2022-11-24 Thread cvs-commit at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107413

--- Comment #12 from CVS Commits  ---
The master branch has been updated by Wilco Dijkstra :

https://gcc.gnu.org/g:0c1b0a23f1fe7db6a2e391b7cb78cff90032

commit r13-4291-g0c1b0a23f1fe7db6a2e391b7cb78cff90032
Author: Wilco Dijkstra 
Date:   Wed Nov 23 17:27:19 2022 +

AArch64: Add fma_reassoc_width [PR107413]

Add a reassociation width for FMA in per-CPU tuning structures. Keep
the existing setting of 1 for cores with 2 FMA pipes (this disables
reassociation), and use 4 for cores with 4 FMA pipes.  This improves
SPECFP2017 on Neoverse V1 by ~1.5%.

gcc/
PR tree-optimization/107413
* config/aarch64/aarch64.cc (struct tune_params): Add
fma_reassoc_width to all CPU tuning structures.
(aarch64_reassociation_width): Use fma_reassoc_width.
* config/aarch64/aarch64-protos.h (struct tune_params): Add
fma_reassoc_width.
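
As a hand-written illustration of what the width means (a sketch of the
idea, not output of the reassociation pass and not code from this commit): a
width of 1 keeps a single dependent chain of FMAs, while a width of 4 lets
four independent chains proceed in parallel and combines them at the end. The
function names are hypothetical, and for floating point the compiler only
does this kind of regrouping itself when reassociation is permitted.

```cpp
#include <cmath>
#include <cstddef>

// Width 1: every fma depends on the result of the previous one.
inline double
dot_width1 (const double *a, const double *b, std::size_t n)
{
  double acc = 0.0;
  for (std::size_t i = 0; i < n; ++i)
    acc = std::fma (a[i], b[i], acc);
  return acc;
}

// Width 4: four independent accumulator chains, summed at the end.
inline double
dot_width4 (const double *a, const double *b, std::size_t n)
{
  double acc[4] = { 0.0, 0.0, 0.0, 0.0 };
  std::size_t i = 0;
  for (; i + 4 <= n; i += 4)
    for (std::size_t k = 0; k < 4; ++k)
      acc[k] = std::fma (a[i + k], b[i + k], acc[k]);
  for (; i < n; ++i)            // Leftover elements.
    acc[0] = std::fma (a[i], b[i], acc[0]);
  return (acc[0] + acc[1]) + (acc[2] + acc[3]);
}

// Small exactly representable inputs, so both groupings give the same sum.
inline bool
widths_agree_on_sample ()
{
  double a[10], b[10];
  for (std::size_t i = 0; i < 10; ++i)
    {
      a[i] = static_cast<double> (i + 1);
      b[i] = 2.0;
    }
  return dot_width1 (a, b, 10) == 110.0 && dot_width4 (a, b, 10) == 110.0;
}
```

In general the two groupings can round differently, which is why the sample
uses small integers.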

[Bug target/106609] [12/13 Regression] sh3eb-elf cross compiler is being miscompiled since r12-1525-g3155d51bfd1de8b6c4645

2022-11-24 Thread jakub at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106609

Jakub Jelinek  changed:

   What|Removed |Added

   Last reconfirmed||2022-11-24
 Status|UNCONFIRMED |NEW
 CC||jakub at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #13 from Jakub Jelinek  ---
(In reply to Mikael Pettersson from comment #12)
> I tried compiling the gcc-13 cross compiler using the broken gcc-12 host
> compiler and -mtune-ctrl=^use_bt but that didn't help.
> 
> I then tried rebuilding the broken gcc-12 host compiler with the new
> splitters disabled, one by one. Disabling the "*bt_setcqi" one did
> unbreak the gcc-13 cross-compiler:
> 
> diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
> index 48532eb7ddf..0780ba992f3 100644
> --- a/gcc/config/i386/i386.md
> +++ b/gcc/config/i386/i386.md
> @@ -12830,7 +12830,7 @@
>   (const_int 1)
>   (zero_extend:SI (match_operand:QI 2 "register_operand"
> (clobber (reg:CC FLAGS_REG))]
> -  "TARGET_USE_BT && ix86_pre_reload_split ()"
> +  "0 && TARGET_USE_BT && ix86_pre_reload_split ()"
>"#"
>"&& 1"
>[(set (reg:CCC FLAGS_REG)

Ok, reproduced with last night's gcc trunk as the x86_64-linux system compiler
(with/without the above patch) and r12-8924-ga6b1f6126de5e4 as 12 branch for
the cross-compiler.  The difference appears first in the
sh_treg_combine2 dump
-not a condition store
-other set found - aborting trace
+inverted condition store
+tracing ccreg
+set of ccreg not found
+
+cbranch trace summary:
etc.
And bisection points to insn-preds.o.

[Bug fortran/107819] ICE in gfc_check_argument_var_dependency, at fortran/dependency.cc:978

2022-11-24 Thread mikael at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107819

Mikael Morin  changed:

   What|Removed |Added

 CC||mikael at gcc dot gnu.org

--- Comment #4 from Mikael Morin  ---
(In reply to anlauf from comment #3)
> But then no temporary is generated for a(n), which means we miss a
> corresponding check elsewhere.
> 
But is it required to generate a temporary?
As I understand it, the code is invalid, and (correctly) diagnosed, so there is
nothing else to do.
It's invalid because of 15.5.2.13 Restrictions on entities associated with
dummy arguments:
(4) If the value of the entity or any subobject of it is affected through the
dummy argument, then at any time during the invocation and execution of the
procedure, either before or after the definition, it shall be referenced only
through that dummy argument unless (...)

[Bug c++/107858] New: Variable in generic lambda incorrectly considered to be a dependent name

2022-11-24 Thread pilarlatiesa at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=107858

Bug ID: 107858
   Summary: Variable in generic lambda incorrectly considered to
be a dependent name
   Product: gcc
   Version: 11.1.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: c++
  Assignee: unassigned at gcc dot gnu.org
  Reporter: pilarlatiesa at gmail dot com
  Target Milestone: ---

$ cat test.cpp

struct T
{
  template
  void foo() const {}
};

template
struct pair
{
  T a, b;
};

void bar()
{
  [](auto)
  {
pair x[10];
for (auto const &[a, b] : x) a.foo<0>();
  };
}

$ g++-11 -c test.cpp 
test.cpp: In lambda function:
test.cpp:19:43: error: expected primary-expression before ‘)’ token
   19 | for (auto const &[a, b] : x) a.foo<0>();
  |   ^

$ g++-11 -v
Using built-in specs.
COLLECT_GCC=g++-11
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-linux-gnu/11/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none:amdgcn-amdhsa
OFFLOAD_TARGET_DEFAULT=1
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu
11.1.0-1ubuntu1~20.04' --with-bugurl=file:///usr/share/doc/gcc-11/README.Bugs
--enable-languages=c,ada,c++,go,brig,d,fortran,objc,obj-c++,m2 --prefix=/usr
--with-gcc-major-version-only --program-suffix=-11
--program-prefix=x86_64-linux-gnu- --enable-shared --enable-linker-build-id
--libexecdir=/usr/lib --without-included-gettext --enable-threads=posix
--libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu
--enable-libstdcxx-debug --enable-libstdcxx-time=yes
--with-default-libstdcxx-abi=new --enable-gnu-unique-object
--disable-vtable-verify --enable-plugin --enable-default-pie --with-system-zlib
--enable-libphobos-checking=release --with-target-system-zlib=auto
--enable-objc-gc=auto --enable-multiarch --disable-werror --disable-cet
--with-arch-32=i686 --with-abi=m64 --with-multilib-list=m32,m64,mx32
--enable-multilib --with-tune=generic
--enable-offload-targets=nvptx-none=/build/gcc-11-2V7zgg/gcc-11-11.1.0/debian/tmp-nvptx/usr,amdgcn-amdhsa=/build/gcc-11-2V7zgg/gcc-11-11.1.0/debian/tmp-gcn/usr
--without-cuda-driver --enable-checking=release --build=x86_64-linux-gnu
--host=x86_64-linux-gnu --target=x86_64-linux-gnu
--with-build-config=bootstrap-lto-lean --enable-link-serialization=2
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 11.1.0 (Ubuntu 11.1.0-1ubuntu1~20.04)


It can be worked around by typing a.template foo<0>(), which shouldn’t be
necessary.
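
For completeness, a self-contained variant with the workaround applied. This
is a sketch: the template parameter lists are not visible in the listing
above, so `template <int N>`, `template <typename U>` and `Pair<Widget>` are
assumptions, as are the renamed types, the wrapper function and the return
values added to make the result observable. It needs C++17 for the structured
binding.

```cpp
struct Widget
{
  template <int N>
  int foo() const { return N; }
};

template <typename U>
struct Pair
{
  U a, b;
};

// Hypothetical wrapper around the generic lambda from the report.
inline int call_through_binding()
{
  auto f = [](auto)
  {
    Pair<Widget> x[10] = {};
    int total = 0;
    for (auto const &[a, b] : x)
      {
        (void) b;
        // The "template" keyword is the workaround; a.foo<3>() is what
        // should be accepted here without it.
        total += a.template foo<3>();
      }
    return total;
  };
  return f(0);
}
```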

Re: [Patch Arm] Fix PR 92999

2022-11-24 Thread Ramana Radhakrishnan
Ping x 2

Ramana

On Thu, 17 Nov 2022, 20:15 Ramana Radhakrishnan, 
wrote:

> On Fri, Nov 11, 2022 at 9:50 PM Ramana Radhakrishnan
>  wrote:
> >
> > On Thu, Nov 10, 2022 at 7:46 PM Ramana Radhakrishnan
> >  wrote:
> > >
> > > On Thu, Nov 10, 2022 at 6:03 PM Richard Earnshaw
> > >  wrote:
> > > >
> > > >
> > > >
> > > > On 10/11/2022 17:21, Richard Earnshaw via Gcc-patches wrote:
> > > > >
> > > > >
> > > > > On 08/11/2022 18:20, Ramana Radhakrishnan via Gcc-patches wrote:
> > > > >> PR92999 is a case where the VFP calling convention does not
> allocate
> > > > >> enough FP registers for a homogenous aggregate containing FP16
> values.
> > > > >> I believe this is the complete fix but would appreciate another
> set of
> > > > >> eyes on this.
> > > > >>
> > > > >> Could I get a hand with a regression test run on an armhf
> environment
> > > > >> while I fix my environment ?
> > > > >>
> > > > >> gcc/ChangeLog:
> > > > >>
> > > > >> PR target/92999
> > > > >> *  config/arm/arm.c (aapcs_vfp_allocate_return_reg): Adjust to
> handle
> > > > >> aggregates with elements smaller than SFmode.
> > > > >>
> > > > >> gcc/testsuite/ChangeLog:
> > > > >>
> > > > >> * gcc.target/arm/pr92999.c: New test.
> > > > >>
> > > > >>
> > > > >> Thanks,
> > > > >> Ramana
> > > > >>
> > > > >> Signed-off-by: Ramana Radhakrishnan 
> > > > >
> > > > > I'm not sure about this.  The AAPCS does not mention a base type
> of a
> > > > > half-precision FP type as an appropriate homogeneous aggregate for
> using
> > > > > VFP registers for either calling or returning.
> > >
> > > Ooh interesting, thanks for taking a look and poking at the AAPCS and
> > > that's a good catch. BF16 should also have the same behaviour as FP16
> > > , I suspect ?
> >
> > I suspect I got caught out by the definition of the Homogenous
> > aggregate from Section 5.3.5
> > ((
> https://github.com/ARM-software/abi-aa/blob/2982a9f3b512a5bfdc9e3fea5d3b298f9165c36b/aapcs32/aapcs32.rst#homogeneous-aggregates
> )
> > which simply suggests it's an aggregate of fundamental types which
> > lists half precision floating point .
> >
> > FTR, ideally I should have read 7.1.2.1
> >
> https://github.com/ARM-software/abi-aa/blob/2982a9f3b512a5bfdc9e3fea5d3b298f9165c36b/aapcs32/aapcs32.rst#procedure-calling
> )
> > :)
> >
> >
> >
> > >
> > > > >
> > > > > So perhaps the bug is that we try to treat this as a homogeneous
> > > > > aggregate at all.
> > >
> > > Yep I agree - I'll take a look again tomorrow and see if I can get a
> fix.
> > >
> > > (And thanks Alex for the test run, I might trouble you again while I
> > > still (slowly) get some of my boards back up)
> >
> >
> > and as promised take 2. I'd really prefer another review on this one
> > to see if I've not missed anything in the cases below.
>
> Ping  ?
>
> Ramana
>
> >
> > regards
> > Ramana
> >
> >
> > >
> > > regards,
> > > Ramana
> > >
> > >
> > > >
> > > > R.
>

