Re: [r12-3321 Regression] FAIL: gfortran.dg/PR100914.f90 -Os (test for excess errors) on Linux/x86_64

2021-09-02 Thread Sandra Loosemore

On 9/2/21 10:18 PM, sunil.k.pandey wrote:

On Linux/x86_64,

93b6b2f614eb692d1d8126ec6cb946984a9d01d7 is the first bad commit
commit 93b6b2f614eb692d1d8126ec6cb946984a9d01d7
Author: Sandra Loosemore 
Date:   Wed Aug 18 07:22:03 2021 -0700

 libgfortran: Further fixes for GFC/CFI descriptor conversions.

caused

FAIL: gfortran.dg/PR100914.f90   -O0  (test for excess errors)
FAIL: gfortran.dg/PR100914.f90   -O1  (test for excess errors)
FAIL: gfortran.dg/PR100914.f90   -O2  (test for excess errors)
FAIL: gfortran.dg/PR100914.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess errors)
FAIL: gfortran.dg/PR100914.f90   -O3 -g  (test for excess errors)
FAIL: gfortran.dg/PR100914.f90   -Os  (test for excess errors)


I did not see this failure in my own testing, but I also had this patch 
for another known bug in my stack:


https://gcc.gnu.org/pipermail/fortran/2021-August/056382.html

I just pinged that earlier this evening as it has not been reviewed yet. 
 I'll rebuild/retest without that patch and confirm that's what the 
problem is.


-Sandra


[PATCH] Fix some GC issues in the aarch64 back-end.

2021-09-02 Thread apinski--- via Gcc-patches
From: Andrew Pinski 

I got some ICEs in my latest testing while running the libstdc++ testsuite.
The problem was connected to types.  I had just touched the builtins code,
but nothing that could have caused this, so I looked for types and
variables that were not being marked with GTY.

OK? Bootstrapped and tested on aarch64-linux-gnu with no regressions.

gcc/ChangeLog:

* config/aarch64/aarch64-builtins.c (struct aarch64_simd_type_info):
Mark with GTY.
(aarch64_simd_types): Likewise.
(aarch64_simd_intOI_type_node): Likewise.
(aarch64_simd_intCI_type_node): Likewise.
(aarch64_simd_intXI_type_node): Likewise.
* config/aarch64/aarch64.h (aarch64_fp16_type_node): Likewise.
(aarch64_fp16_ptr_type_node): Likewise.
(aarch64_bf16_type_node): Likewise.
(aarch64_bf16_ptr_type_node): Likewise.
---
 gcc/config/aarch64/aarch64-builtins.c | 10 +-
 gcc/config/aarch64/aarch64.h  |  8 
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index d441437..9f37a71 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -594,7 +594,7 @@ enum aarch64_simd_type
 };
 #undef ENTRY
 
-struct aarch64_simd_type_info
+struct GTY(()) aarch64_simd_type_info
 {
   enum aarch64_simd_type type;
 
@@ -626,14 +626,14 @@ struct aarch64_simd_type_info
 
 #define ENTRY(E, M, Q, G)  \
   {E, "__" #E, #G "__" #E, NULL_TREE, NULL_TREE, E_##M##mode, qualifier_##Q},
-static struct aarch64_simd_type_info aarch64_simd_types [] = {
+static GTY(()) struct aarch64_simd_type_info aarch64_simd_types [] = {
 #include "aarch64-simd-builtin-types.def"
 };
 #undef ENTRY
 
-static tree aarch64_simd_intOI_type_node = NULL_TREE;
-static tree aarch64_simd_intCI_type_node = NULL_TREE;
-static tree aarch64_simd_intXI_type_node = NULL_TREE;
+static GTY(()) tree aarch64_simd_intOI_type_node = NULL_TREE;
+static GTY(()) tree aarch64_simd_intCI_type_node = NULL_TREE;
+static GTY(()) tree aarch64_simd_intXI_type_node = NULL_TREE;
 
 /* The user-visible __fp16 type, and a pointer to that type.  Used
across the back-end.  */
diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index bfffbcd..a5ba6c2 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -1262,13 +1262,13 @@ extern const char *host_detect_local_cpu (int argc, const char **argv);
 
 /* This type is the user-visible __fp16, and a pointer to that type.  We
need it in many places in the backend.  Defined in aarch64-builtins.c.  */
-extern tree aarch64_fp16_type_node;
-extern tree aarch64_fp16_ptr_type_node;
+extern GTY(()) tree aarch64_fp16_type_node;
+extern GTY(()) tree aarch64_fp16_ptr_type_node;
 
 /* This type is the user-visible __bf16, and a pointer to that type.  Defined
in aarch64-builtins.c.  */
-extern tree aarch64_bf16_type_node;
-extern tree aarch64_bf16_ptr_type_node;
+extern GTY(()) tree aarch64_bf16_type_node;
+extern GTY(()) tree aarch64_bf16_ptr_type_node;
 
 /* The generic unwind code in libgcc does not initialize the frame pointer.
So in order to unwind a function using a frame pointer, the very first
-- 
1.8.3.1



[r12-3321 Regression] FAIL: gfortran.dg/PR100914.f90 -Os (test for excess errors) on Linux/x86_64

2021-09-02 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

93b6b2f614eb692d1d8126ec6cb946984a9d01d7 is the first bad commit
commit 93b6b2f614eb692d1d8126ec6cb946984a9d01d7
Author: Sandra Loosemore 
Date:   Wed Aug 18 07:22:03 2021 -0700

libgfortran: Further fixes for GFC/CFI descriptor conversions.

caused

FAIL: gfortran.dg/PR100914.f90   -O0  (test for excess errors)
FAIL: gfortran.dg/PR100914.f90   -O1  (test for excess errors)
FAIL: gfortran.dg/PR100914.f90   -O2  (test for excess errors)
FAIL: gfortran.dg/PR100914.f90   -O3 -fomit-frame-pointer -funroll-loops -fpeel-loops -ftracer -finline-functions  (test for excess errors)
FAIL: gfortran.dg/PR100914.f90   -O3 -g  (test for excess errors)
FAIL: gfortran.dg/PR100914.f90   -Os  (test for excess errors)

with GCC configured with



To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gfortran.dg/PR100914.f90 --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gfortran.dg/PR100914.f90 --target_board='unix{-m32\ -march=cascadelake}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gfortran.dg/PR100914.f90 --target_board='unix{-m64}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg.exp=gfortran.dg/PR100914.f90 --target_board='unix{-m64\ -march=cascadelake}'"

(Please do not reply to this email.  For questions about this report, contact me
at skpgkp2 at gmail dot com.)


Re: [RFC/PATCH] ipa-inline: Add target info into fn summary [PR102059]

2021-09-02 Thread Kewen.Lin via Gcc-patches
Hi Segher,

Thanks for the comments!

On 2021/9/3 at 1:44 AM, Segher Boessenkool wrote:
> Hi!
> 
> On Wed, Sep 01, 2021 at 03:02:22PM +0800, Kewen.Lin wrote:
>> It introduces two target hooks, need_ipa_fn_target_info and
>> update_ipa_fn_target_info.  The former lets the target do some
>> preliminary checks and decide whether to collect target-specific
>> information for this function.  In some special cases, it can
>> predict the analysis result and push it early without any
>> scanning.  The latter lets analyze_function_body pass gimple
>> stmts down, just like the fp_expressions handling, so the target
>> can do its own tricks.
>>
>> To keep it simple, this patch uses HOST_WIDE_INT to record the
>> flags, just like what we use for isa_flags.  For rs6000's HTM
>> need, one HOST_WIDE_INT variable is quite enough, but it seems
>> good to have one auto_vec for scalability, as I noticed some
>> targets have more than one HOST_WIDE_INT flag.  For now, this
>> target information collection is only for always_inline functions;
>> ipa_merge_fn_summary_after_inlining deals with merging the target
>> information.
> 
> These flags can in principle be separate from any flags the target
> keeps, so 64 bits will be enough for a long time.  If we want to
> architect that better, we should really architect the way all targets
> do target flags first.  Let's not go there now :-)
> 
> So just one HOST_WIDE_INT, not a stack of them please?

I considered this.  It's fine to use these customized bits in the target hook,
but back in the target hook can_inline_p, we would have to decode them into the
corresponding isa_flags bits one by one, which is less efficient than just
using the whole mask when there are more interesting bits.

As discussed with Richi, in theory a target could, if it likes, scan for many
ISA features based on its own decisions, so there could be many more bits.
Another thing that led me to use a vector is that the i386 port's
ix86_can_inline_p currently checks x_ix86_target_flags, x_ix86_isa_flags,
x_ix86_isa_flags2, arch, tune, etc.; one HOST_WIDE_INT seems insufficient
for it if it wants to check more.  ;-)

> 
>> --- a/gcc/config/rs6000/rs6000-call.c
>> +++ b/gcc/config/rs6000/rs6000-call.c
>> @@ -13642,6 +13642,17 @@ rs6000_builtin_decl (unsigned code, bool initialize_p ATTRIBUTE_UNUSED)
>>return rs6000_builtin_decls[code];
>>  }
>>  
>> +/* Return true if the builtin with CODE has any mask bits set
>> +   which are specified by MASK.  */
>> +
>> +bool
>> +rs6000_builtin_mask_set_p (unsigned code, HOST_WIDE_INT mask)
>> +{
>> +  gcc_assert (code < RS6000_BUILTIN_COUNT);
>> +  HOST_WIDE_INT fnmask = rs6000_builtin_info[code].mask;
>> +  return fnmask & mask;
>> +}
> 
> The "_p" does not say that "any bits" part, which is crucial here.  So
> name this something like "rs6000_fn_has_any_of_these_mask_bits"?  Yes
> the name sucks, because this interface does :-P
> 

Thanks for the name, will fix it.  :)

> Is it useful to have "any" semantics at all?  Otherwise, require this
> to be passed just a single bit?
> 

Since we cannot force callers to pass in just one bit, we would have to assert
it with something like:

   gcc_assert (__builtin_popcount(mask) == 1);

to claim it's checking a single bit.  But the implementation still
supports checking any bits, so I thought we could just document it as checking
any bits, with a single bit being one special case.

Yeah, I'm not sure there is a need to check any bits, but something like
checking whether the FRSQRTE and FRSQRTES bifs exist could pass
(RS6000_BTM_FRSQRTE | RS6000_BTM_FRSQRTES), so is it fine to keep the
"any bits" semantics?

> The implicit "!!" (or "!= 0", same thing) that casting to bool does
> might be better explicit, too?  A cast to bool changes value so is more
> surprising than other casts.

OK, will fix it.
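A minimal standalone sketch of the two interface options discussed above, assuming `int64_t` in place of `HOST_WIDE_INT` and hypothetical `BTM_*` bit values (not the real `RS6000_BTM_*` constants).  It spells out the `!= 0` comparison instead of relying on the conversion to `bool`.  One detail worth noting for the single-bit variant: on a 64-bit mask the assertion needs `__builtin_popcountll`, because `__builtin_popcount` takes an `unsigned int` and would ignore the upper half.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for RS6000_BTM_* bits; the real values differ.  */
#define BTM_HTM      ((int64_t) 1 << 0)
#define BTM_FRSQRTE  ((int64_t) 1 << 1)
#define BTM_FRSQRTES ((int64_t) 1 << 2)

/* "Any" semantics: true if FNMASK shares at least one bit with MASK.
   The comparison with 0 is explicit rather than an implicit bool cast.  */
static bool
fn_has_any_of_these_mask_bits (int64_t fnmask, int64_t mask)
{
  return (fnmask & mask) != 0;
}

/* Single-bit semantics: BIT must have exactly one bit set.  The 64-bit
   popcount builtin is needed because the mask is 64 bits wide.  */
static bool
fn_has_mask_bit (int64_t fnmask, int64_t bit)
{
  assert (__builtin_popcountll ((unsigned long long) bit) == 1);
  return (fnmask & bit) != 0;
}
```

With the "any" form, a caller can ask about FRSQRTE and FRSQRTES in one call by passing `BTM_FRSQRTE | BTM_FRSQRTES`; the single-bit form would reject that mask at the assertion.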

> 
>> +  /* Assume inline asm can use any instruction features.  */
>> +  if (gimple_code (stmt) == GIMPLE_ASM)
>> +{
>> +  info[0] = -1;
>> +  return false;
>> +}
> 
> What is -1 here?  "All options set"?  Does that work?  Reliably?
> 

Good question.  In the current implementation it's reliable, since we do
the "~" operation first and then & with the interesting bits (OPTION_MASK_HTM
here), but I think your concern about whether conflicting bits can co-exist
is reasonable.  I intended it to cover any future interesting bits, but I
agree it's better to just set the corresponding interesting bits to make it
clear.

Will fix it.
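The reliability argument can be checked with a small model.  It assumes, hypothetically, that the inlining check rejects a callee when `callee_info & ~caller_flags & INTERESTING_BITS` is nonzero; `MASK_HTM`, `INTERESTING_BITS` and `inline_asm_info` are illustrative stand-ins, not GCC's definitions.  Under that assumption, `-1` and the explicit HTM bit give the same answer today.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for OPTION_MASK_HTM; the real value differs.  */
#define MASK_HTM ((int64_t) 1 << 5)

/* Hypothetical set of features compared between caller and callee;
   only HTM for now.  */
#define INTERESTING_BITS MASK_HTM

/* Assumed form of the check: every interesting feature the callee uses
   must also be enabled in the caller.  */
static bool
callee_features_ok (int64_t callee_info, int64_t caller_flags)
{
  return (callee_info & ~caller_flags & INTERESTING_BITS) == 0;
}

/* Inline asm: claim the HTM feature explicitly instead of setting all
   64 bits with -1.  */
static int64_t
inline_asm_info (void)
{
  return MASK_HTM;
}
```

The trade-off mentioned above is visible here: with `-1`, any bit later added to `INTERESTING_BITS` is covered automatically, whereas an explicit value such as `inline_asm_info ()` has to be updated by hand.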

>> +  if (fndecl && fndecl_built_in_p (fndecl, BUILT_IN_MD))
>> +{
>> +  enum rs6000_builtins fcode =
>> +(enum rs6000_builtins) DECL_MD_FUNCTION_CODE (fndecl);
>> +  /* HTM bifs definitely exploit HTM insns.  */
>> +  if (rs6000_builtin_mask_set_p ((unsigned) fcode, RS6000_BTM_HTM))
> 
> Why the cast here?  Please change the parameter type, instead?  It is
> fine to use enums specific to our backend in that backend itself :-)
> 

I followed the existing rs6000_builtin_decl, and only just noticed that it's a hook.
Will fix it.

>> @@ -1146,6 +1147,16 @@ ipa_dump_fn_summa

Re: Ping ^ 2: [PATCH] rs6000: Expand fmod and remainder when built with fast-math [PR97142]

2021-09-02 Thread Xionghu Luo via Gcc-patches
Resending the patch that addresses Will's comments.


fmod/fmodf and remainder/remainderf can be expanded inline instead of
being emitted as library calls when building with fast-math, which is
much faster.

fmodf:
 fdivs   f0,f1,f2
 friz    f0,f0
 fnmsubs f1,f2,f0,f1

remainderf:
 fdivs   f0,f1,f2
 frin    f0,f0
 fnmsubs f1,f2,f0,f1

SPEC2017 Ofast P8LE: 511.povray_r +1.14%,  526.blender_r +1.72%

gcc/ChangeLog:

2021-09-03  Xionghu Luo  

PR target/97142
* config/rs6000/rs6000.md (fmod<mode>3): New define_expand.
(remainder<mode>3): Likewise.

gcc/testsuite/ChangeLog:

2021-09-03  Xionghu Luo  

PR target/97142
* gcc.target/powerpc/pr97142.c: New test.
---
 gcc/config/rs6000/rs6000.md| 36 ++
 gcc/testsuite/gcc.target/powerpc/pr97142.c | 35 +
 2 files changed, 71 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr97142.c

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index c8cdc42533c..84820d3b5cb 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -4932,6 +4932,42 @@ (define_insn "fre"
   [(set_attr "type" "fp")
(set_attr "isa" "*,")])
 
+(define_expand "fmod<mode>3"
+  [(use (match_operand:SFDF 0 "gpc_reg_operand"))
+   (use (match_operand:SFDF 1 "gpc_reg_operand"))
+   (use (match_operand:SFDF 2 "gpc_reg_operand"))]
+  "TARGET_HARD_FLOAT
+  && TARGET_FPRND
+  && flag_unsafe_math_optimizations"
+{
+  rtx div = gen_reg_rtx (<MODE>mode);
+  emit_insn (gen_div<mode>3 (div, operands[1], operands[2]));
+
+  rtx friz = gen_reg_rtx (<MODE>mode);
+  emit_insn (gen_btrunc<mode>2 (friz, div));
+
+  emit_insn (gen_nfms<mode>4 (operands[0], operands[2], friz, operands[1]));
+  DONE;
+ })
+
+(define_expand "remainder<mode>3"
+  [(use (match_operand:SFDF 0 "gpc_reg_operand"))
+   (use (match_operand:SFDF 1 "gpc_reg_operand"))
+   (use (match_operand:SFDF 2 "gpc_reg_operand"))]
+  "TARGET_HARD_FLOAT
+  && TARGET_FPRND
+  && flag_unsafe_math_optimizations"
+{
+  rtx div = gen_reg_rtx (<MODE>mode);
+  emit_insn (gen_div<mode>3 (div, operands[1], operands[2]));
+
+  rtx frin = gen_reg_rtx (<MODE>mode);
+  emit_insn (gen_round<mode>2 (frin, div));
+
+  emit_insn (gen_nfms<mode>4 (operands[0], operands[2], frin, operands[1]));
+  DONE;
+ })
+
 (define_insn "*rsqrt2"
   [(set (match_operand:SFDF 0 "gpc_reg_operand" "=,wa")
(unspec:SFDF [(match_operand:SFDF 1 "gpc_reg_operand" ",wa")]
diff --git a/gcc/testsuite/gcc.target/powerpc/pr97142.c b/gcc/testsuite/gcc.target/powerpc/pr97142.c
new file mode 100644
index 000..e5306eb681b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr97142.c
@@ -0,0 +1,35 @@
+/* { dg-do compile } */
+/* { dg-options "-Ofast" } */
+
+#include <math.h>
+
+float test1 (float x, float y)
+{
+  return fmodf (x, y);
+}
+
+double test2 (double x, double y)
+{
+  return fmod (x, y);
+}
+
+float test3 (float x, float y)
+{
+  return remainderf (x, y);
+}
+
+double test4 (double x, double y)
+{
+  return remainder (x, y);
+}
+
+/* { dg-final { scan-assembler-not {\mbl fmod\M} } } */
+/* { dg-final { scan-assembler-not {\mbl fmodf\M} } } */
+/* { dg-final { scan-assembler-not {\mbl remainder\M} } } */
+/* { dg-final { scan-assembler-not {\mbl remainderf\M} } } */
+/* { dg-final { scan-assembler-times {\mfdiv\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mfdivs\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mfnmsub\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mfnmsubs\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mfriz\M} 2 } } */
+/* { dg-final { scan-assembler-times {\mfrin\M} 2 } } */
-- 
2.25.1



PING Re: [PATCH, Fortran] Revert to non-multilib-specific ISO_Fortran_binding.h

2021-09-02 Thread Sandra Loosemore

On 8/18/21 8:57 PM, Sandra Loosemore wrote:


This is a follow-up to commit fef67987cf502fe322e92ddce22eea7ac46b4d75:

https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=fef67987cf502fe322e92ddce22eea7ac46b4d75 



I realized last week that having multilib-specific versions of 
ISO_Fortran_binding.h (generated by running the compiler to ask what 
kinds it supports) was still broken outside of the test support; the 
directory where it's being installed isn't on GCC's normal search path. 
It seemed to me that it was better to try to find some other solution 
for this problem than to venture down what appears to be a rat hole.


I've come up with this patch to return to a single ISO_Fortran_binding.h 
file that uses preprocessor magic to identify the Fortran kind 
corresponding to the standard C long double type and the GCC extension 
types __float128 and int128_t.  I haven't attempted to undo the 
follow-up patches that fixed in-tree testing; the static .h file is 
still copied to the build directory, and it can still be referenced with 
<> syntax during testing.
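The general idea of inferring a Fortran kind from a C type with the preprocessor can be sketched as follows.  This is a hypothetical illustration keyed on `LDBL_MANT_DIG` from `<float.h>`, not the logic in the actual patch; the kind numbers assume gfortran's usual REAL kinds (8 for double, 10 for x87 extended, 16 for IEEE quad or IBM double-double), and `EXAMPLE_LDBL_KIND` is a made-up name.

```c
#include <float.h>

/* Hypothetical: choose the gfortran REAL kind matching C long double
   by looking at its significand width.  */
#if LDBL_MANT_DIG == DBL_MANT_DIG
#  define EXAMPLE_LDBL_KIND 8    /* long double has the same format as double */
#elif LDBL_MANT_DIG == 64
#  define EXAMPLE_LDBL_KIND 10   /* x87 80-bit extended precision */
#elif LDBL_MANT_DIG == 106 || LDBL_MANT_DIG == 113
#  define EXAMPLE_LDBL_KIND 16   /* IBM double-double or IEEE binary128 */
#else
#  define EXAMPLE_LDBL_KIND (-1) /* unrecognized format: no matching kind */
#endif

static const int example_ldbl_kind = EXAMPLE_LDBL_KIND;
```

Because the mapping is resolved when the header is compiled, a single installed header can serve every multilib, which is the property the message is after.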


Any complaints about either the overall strategy here, or the logic to 
infer the C type -> kind mapping?  Or OK to commit?

Ping!

https://gcc.gnu.org/pipermail/fortran/2021-August/056382.html

-Sandra


Re: [PATCH v3] MIPS: add .module mipsREV/.set arch= to all output asm file

2021-09-02 Thread YunQiang Su
YunQiang Su wrote on Fri, 3 Sep 2021 at 9:53 AM:
>
> Maciej W. Rozycki wrote on Fri, 3 Sep 2021 at 9:48 AM:
> >
> > On Thu, 2 Sep 2021, YunQiang Su wrote:
> >
> > > diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
> > > index 493d3de48..743a1d0fe 100644
> > > --- a/gcc/config/mips/mips.c
> > > +++ b/gcc/config/mips/mips.c
> > > @@ -9896,6 +9896,12 @@ mips_file_start (void)
> > >else
> > >  fputs ("\t.module\tnooddspreg\n", asm_out_file);
> > >
> > > +  if (!global_options_set.x_mips_arch_option
> > > +  || startswith(mips_arch_info->name, "mips"))
>
> There is a code style problem here. I will fix it.
>
> > > +fprintf (asm_out_file, "\t.module\t%s\n", mips_arch_info->name);
> > > +  else
> > > +fprintf (asm_out_file, "\t.set\tarch=%s\n", mips_arch_info->name);
> >
> >  Why not consistently `.module' for both legs?  And actually why not just
> > `.module arch=...' in all cases?
> >
>
> Thanks for your advice. I will give it a try.

Ohh, it really works. w.
The testsuite is running.

> The reason is that I didn't know about this method.
>
> >   Maciej


Re: [PATCH v3] MIPS: add .module mipsREV/.set arch= to all output asm file

2021-09-02 Thread YunQiang Su
Maciej W. Rozycki wrote on Fri, 3 Sep 2021 at 9:48 AM:
>
> On Thu, 2 Sep 2021, YunQiang Su wrote:
>
> > diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
> > index 493d3de48..743a1d0fe 100644
> > --- a/gcc/config/mips/mips.c
> > +++ b/gcc/config/mips/mips.c
> > @@ -9896,6 +9896,12 @@ mips_file_start (void)
> >else
> >  fputs ("\t.module\tnooddspreg\n", asm_out_file);
> >
> > +  if (!global_options_set.x_mips_arch_option
> > +  || startswith(mips_arch_info->name, "mips"))

There is a code style problem here. I will fix it.

> > +fprintf (asm_out_file, "\t.module\t%s\n", mips_arch_info->name);
> > +  else
> > +fprintf (asm_out_file, "\t.set\tarch=%s\n", mips_arch_info->name);
>
>  Why not consistently `.module' for both legs?  And actually why not just
> `.module arch=...' in all cases?
>

Thanks for your advice. I will give it a try.
The reason is that I didn't know about this method.

>   Maciej


[PATCH] [aarch64] Fix target/95969: __builtin_aarch64_im_lane_boundsi interferes with gimple

2021-09-02 Thread apinski--- via Gcc-patches
From: Andrew Pinski 

This patch adds simple folding of __builtin_aarch64_im_lane_boundsi in the
cases where we are not going to emit an error.  It fixes the problem by
removing the call from the IR.
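The condition under which the new helper allows the lane check to be removed can be modelled in plain C.  This sketch assumes the three arguments are already known constants (the patch returns false when any of them is not an INTEGER_CST) and uses `uint64_t` in place of `widest_int`; `lane_check_removable` is an illustrative name.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when the bounds check can never fail: neither size is zero and
   LANE is below the lane count TOTALSIZE / ELEMENTSIZE.  The division
   and comparison are unsigned, like wi::udiv_trunc and wi::ltu_p.  */
static bool
lane_check_removable (uint64_t totalsize, uint64_t elementsize,
                      uint64_t lane)
{
  if (totalsize == 0 || elementsize == 0)
    return false;
  return lane < totalsize / elementsize;
}
```

An out-of-range lane is deliberately left alone so that the existing expansion still reports the error.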

OK? Bootstrapped and tested on aarch64-linux-gnu with no regressions.

gcc/ChangeLog:

* config/aarch64/aarch64-builtins.c (aarch64_fold_builtin_lane_check):
New function.
(aarch64_general_fold_builtin): Handle AARCH64_SIMD_BUILTIN_LANE_CHECK.
(aarch64_general_gimple_fold_builtin): Likewise.
---
 gcc/config/aarch64/aarch64-builtins.c | 35 +++
 1 file changed, 35 insertions(+)

diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index f6b41d9c200..d4414373aa4 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -29,6 +29,7 @@
 #include "rtl.h"
 #include "tree.h"
 #include "gimple.h"
+#include "ssa.h"
 #include "memmodel.h"
 #include "tm_p.h"
 #include "expmed.h"
@@ -2333,6 +2334,27 @@ aarch64_general_builtin_rsqrt (unsigned int fn)
   return NULL_TREE;
 }
 
+/* Return true if the lane check can be removed as there is no
+   error going to be emitted.  */
+static bool
+aarch64_fold_builtin_lane_check (tree arg0, tree arg1, tree arg2)
+{
+  if (TREE_CODE (arg0) != INTEGER_CST)
+    return false;
+  if (TREE_CODE (arg1) != INTEGER_CST)
+    return false;
+  if (TREE_CODE (arg2) != INTEGER_CST)
+    return false;
+
+  auto totalsize = wi::to_widest (arg0);
+  auto elementsize = wi::to_widest (arg1);
+  if (totalsize == 0 || elementsize == 0)
+    return false;
+  auto lane = wi::to_widest (arg2);
+  auto high = wi::udiv_trunc (totalsize, elementsize);
+  return wi::ltu_p (lane, high);
+}
+
 #undef VAR1
 #define VAR1(T, N, MAP, FLAG, A) \
   case AARCH64_SIMD_BUILTIN_##T##_##N##A:
@@ -2353,6 +2375,11 @@ aarch64_general_fold_builtin (unsigned int fcode, tree type,
   VAR1 (UNOP, floatv4si, 2, ALL, v4sf)
   VAR1 (UNOP, floatv2di, 2, ALL, v2df)
return fold_build1 (FLOAT_EXPR, type, args[0]);
+  case AARCH64_SIMD_BUILTIN_LANE_CHECK:
+   if (n_args == 3
+   && aarch64_fold_builtin_lane_check (args[0], args[1], args[2]))
+ return fold_convert (void_type_node, integer_zero_node);
+   break;
   default:
break;
 }
@@ -2440,6 +2467,14 @@ aarch64_general_gimple_fold_builtin (unsigned int fcode, gcall *stmt)
}
  break;
}
+case AARCH64_SIMD_BUILTIN_LANE_CHECK:
+  if (aarch64_fold_builtin_lane_check (args[0], args[1], args[2]))
+   {
+ unlink_stmt_vdef (stmt);
+ release_defs (stmt);
+ new_stmt = gimple_build_nop ();
+   }
+  break;
 default:
   break;
 }
-- 
2.17.1



Re: [PATCH v3] MIPS: add .module mipsREV/.set arch= to all output asm file

2021-09-02 Thread Maciej W. Rozycki
On Thu, 2 Sep 2021, YunQiang Su wrote:

> diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
> index 493d3de48..743a1d0fe 100644
> --- a/gcc/config/mips/mips.c
> +++ b/gcc/config/mips/mips.c
> @@ -9896,6 +9896,12 @@ mips_file_start (void)
>else
>  fputs ("\t.module\tnooddspreg\n", asm_out_file);
>  
> +  if (!global_options_set.x_mips_arch_option
> +  || startswith(mips_arch_info->name, "mips"))
> +fprintf (asm_out_file, "\t.module\t%s\n", mips_arch_info->name);
> +  else
> +fprintf (asm_out_file, "\t.set\tarch=%s\n", mips_arch_info->name);

 Why not consistently `.module' for both legs?  And actually why not just 
`.module arch=...' in all cases?

  Maciej
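To make the two alternatives concrete, here is a hypothetical sketch that formats the directive into a buffer instead of writing to `asm_out_file`.  `v3_arch_directive` mirrors the logic of the v3 patch (with `strncmp` standing in for GCC's `startswith` helper), and `suggested_arch_directive` uses `.module arch=NAME` for every architecture, as suggested in the review; both function names are made up for illustration.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Mirrors the v3 patch: `.module NAME' when no -march was given or the
   name starts with "mips", otherwise `.set arch=NAME'.  */
static int
v3_arch_directive (char *buf, size_t size, const char *arch_name,
                   bool arch_option_set)
{
  if (!arch_option_set || strncmp (arch_name, "mips", 4) == 0)
    return snprintf (buf, size, "\t.module\t%s\n", arch_name);
  return snprintf (buf, size, "\t.set\tarch=%s\n", arch_name);
}

/* Suggested alternative: one directive form for every architecture.  */
static int
suggested_arch_directive (char *buf, size_t size, const char *arch_name)
{
  return snprintf (buf, size, "\t.module\tarch=%s\n", arch_name);
}
```

The single form removes the special case for CPU names such as octeon; per the follow-up in this thread, `.module arch=` was accepted by the assembler in the author's testing.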


[PATCH v3] MIPS: add .module mipsREV/.set arch= to all output asm file

2021-09-02 Thread YunQiang Su
Currently, the asm output file for MIPS has no ISA revision info.
This can cause trouble; for example, if:

  the assembler defaults to mips1, and
  gcc defaults to fpxx,

then to assemble the output of gcc -S, we have to pass -mips2
to the assembler.

The same situation arises for CPUs that have extension insns;
Octeon is an example.  For those, we can just add ".set arch=octeon".
---
 gcc/config/mips/mips.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index 493d3de48..743a1d0fe 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -9896,6 +9896,12 @@ mips_file_start (void)
   else
 fputs ("\t.module\tnooddspreg\n", asm_out_file);
 
+  if (!global_options_set.x_mips_arch_option
+  || startswith(mips_arch_info->name, "mips"))
+fprintf (asm_out_file, "\t.module\t%s\n", mips_arch_info->name);
+  else
+fprintf (asm_out_file, "\t.set\tarch=%s\n", mips_arch_info->name);
+
 #else
 #ifdef HAVE_AS_GNU_ATTRIBUTE
   {
-- 
2.30.2



[PATCH v2] c++: Fix cp_tree_equal for template value args using dependent sizeof/alignof/noexcept expressions

2021-09-02 Thread Barrett Adair via Gcc-patches
Thanks for the feedback, Jason. Coming back to this today, the problem
appears much deeper than I realized. I've attached another WIP version of
the patch, including a couple of new test cases based on your feedback (for
now, please excuse any misformatted dg-error comments).

The dependent-name16.C case demonstrates an error message regression caused
by this patch; I'm trying to fix the regression. When compiling
dependent-name16.C and breaking at cp/tree.c:3922, DECL_CONTEXT(t2) is NULL
because the "g" declaration is still being parsed near the top of
cp_parser_init_declarator, and context is only set later during that
function. DECL_CONTEXT(t1), on the other hand, is set because the "f"
declaration was already parsed.

I'm beginning to believe that a proper solution to this problem would
require decorating the function template type parameter nodes with more
function information (e.g. at least scope and name) prior to parsing the
trailing return type, if not somehow setting the DECL_CONTEXT earlier in
some form -- am I missing something?

Also, in the first place, I'm a little confused why we insert dependent-arg
instantiations into the specialization/instantiation hash tables before any
top-level instantiation occurs. From a bird's eye view, the
benefit/necessity of this design is unclear. Can anyone point me to some
background reading here?

Thanks,
Barrett
From 90baf57f388a0aed45f72cac5b1df920ec61b6ab Mon Sep 17 00:00:00 2001
From: Barrett Adair 
Date: Fri, 20 Aug 2021 15:37:36 -0500
Subject: [PATCH] Fix cp_tree_equal for template value args using dependent
 sizeof/alignof/noexcept expressions

---
 gcc/cp/tree.c  | 12 
 gcc/testsuite/g++.dg/template/canon-type-15.C  |  7 +++
 gcc/testsuite/g++.dg/template/canon-type-16.C  |  6 ++
 gcc/testsuite/g++.dg/template/canon-type-17.C  |  5 +
 gcc/testsuite/g++.dg/template/canon-type-18.C  |  6 ++
 .../g++.dg/template/dependent-name15.C | 18 ++
 .../g++.dg/template/dependent-name16.C | 16 
 7 files changed, 70 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/template/canon-type-15.C
 create mode 100644 gcc/testsuite/g++.dg/template/canon-type-16.C
 create mode 100644 gcc/testsuite/g++.dg/template/canon-type-17.C
 create mode 100644 gcc/testsuite/g++.dg/template/canon-type-18.C
 create mode 100644 gcc/testsuite/g++.dg/template/dependent-name15.C
 create mode 100644 gcc/testsuite/g++.dg/template/dependent-name16.C

diff --git a/gcc/cp/tree.c b/gcc/cp/tree.c
index 3c62dd74380..72e616a4eb7 100644
--- a/gcc/cp/tree.c
+++ b/gcc/cp/tree.c
@@ -3920,6 +3920,8 @@ cp_tree_equal (tree t1, tree t2)
 	 template.  */
 
   if (comparing_specializations
+  && DECL_CONTEXT (t1)
+  && DECL_CONTEXT (t2)
 	  && DECL_CONTEXT (t1) != DECL_CONTEXT (t2))
 	/* When comparing hash table entries, only an exact match is
 	   good enough; we don't want to replace 'this' with the
@@ -4015,6 +4017,16 @@ cp_tree_equal (tree t1, tree t2)
 	else
 	  return cp_tree_equal (o1, o2);
   }
+case NOEXCEPT_EXPR:
+  {
+	tree o1 = TREE_OPERAND (t1, 0);
+	tree o2 = TREE_OPERAND (t2, 0);
+
+	if (TREE_CODE (o1) != TREE_CODE (o2))
+	  return false;
+
+	return cp_tree_equal (o1, o2);
+  }
 
 case MODOP_EXPR:
   {
diff --git a/gcc/testsuite/g++.dg/template/canon-type-15.C b/gcc/testsuite/g++.dg/template/canon-type-15.C
new file mode 100644
index 000..b001b5c841d
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/canon-type-15.C
@@ -0,0 +1,7 @@
+// { dg-do compile { target c++11 } }
+template struct size_c{ static constexpr unsigned value = u; };
+namespace g {
+template auto return_size(T t) -> size_c;
+template auto return_size(T t) -> size_c;
+}
+static_assert(decltype(g::return_size('a'))::value == 1u, "");
diff --git a/gcc/testsuite/g++.dg/template/canon-type-16.C b/gcc/testsuite/g++.dg/template/canon-type-16.C
new file mode 100644
index 000..99361cbac30
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/canon-type-16.C
@@ -0,0 +1,6 @@
+// { dg-do compile { target c++11 } }
+template struct bool_c{ static constexpr bool value = u; };
+template auto noexcepty(T t) -> bool_c;
+template auto noexcepty(T t) -> bool_c;
+struct foo { void operator()() noexcept; };
+static_assert(decltype(noexcepty(foo{}))::value, "");
diff --git a/gcc/testsuite/g++.dg/template/canon-type-17.C b/gcc/testsuite/g++.dg/template/canon-type-17.C
new file mode 100644
index 000..0555c8d0a42
--- /dev/null
+++ b/gcc/testsuite/g++.dg/template/canon-type-17.C
@@ -0,0 +1,5 @@
+// { dg-do compile { target c++11 } }
+template struct size_c{ static constexpr unsigned value = u; };
+template auto return_size(T... t) -> size_c;
+template auto return_size(T... t) -> size_c;
+static_assert(decltype(return_size('a'))::value == 1u, "");
diff --git a/gcc/testsuite/g++.dg/template/canon-type-18.C b/gcc/testsuite/g++.dg/template/canon-type-18.C
new file mode 100644
index 00

Re: [PATCH] warn for more impossible null pointer tests [PR102103]

2021-09-02 Thread Martin Sebor via Gcc-patches

Attached is an updated patch with Jason's suggested change to use
handled_component_p(), retested on x86_64-linux and with Glibc.
Adding more tests led to more changes but hopefully also a better
end result.

I've changed the warning suppression from a cast to void* to one
to intptr_t, in part because that's what Clang does, and in part
because I couldn't get it to work consistently between C and C++
(the C front end seems to introduce a NOP_EXPR in some cases even
when there is no cast in the source).  In hindsight, a cast to
void* doesn't seem like the most intuitive way to avoid this sort
of warning.
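The suppression idiom described above can be sketched as follows.  It assumes the behaviour of the updated patch, under which converting the address to `intptr_t` before the null test silences `-Waddress`; whether a particular GCC release warns for each form may differ.  `ADDR_IS_NULL` is a hypothetical macro name.

```c
#include <stdint.h>

static int a[2][2];

/* `a[0] == 0' is always false because a[0] designates a declared array;
   that is the kind of test the enhanced -Waddress diagnoses.  The
   intended test was probably on the element value:  */
static int
first_elem_is_zero (void)
{
  return a[0][0] == 0;
}

/* Macro escape hatch: the cast to intptr_t marks the null test as
   deliberate, e.g. when the same macro is also used with pointers.  */
#define ADDR_IS_NULL(p) ((intptr_t) (p) == 0)

static int
row_is_null (void)
{
  return ADDR_IS_NULL (a[0]);
}
```

Since `a` has static storage it is zero-initialized, so `first_elem_is_zero` returns 1, while `row_is_null` still evaluates to 0: the cast changes only the diagnostic, not the result.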

Jeff, I didn't get your reply to my first post for some reason so
let me copy your question here and answer it below:

  how does this interact with targets that allow objects at address
  0?   We have a few targets like that and that makes me wonder if
  we should be suppressing some, if not all, of these warnings for
  targets that turn on -fno-delete-null-pointer-checks?

I didn't see any code related to -Waddress that tries to handle those
targets.  I have very little experience with them but I do know that
AVR makes it possible to pin an object to a fixed address by means
of attribute address.  The test case below triggers -Waddress (both
with and without my changes):

__attribute__ ((address (0))) int x;

int f (void)
{
  if (&x == 0) return -1;   // -Waddress

  int *p = &x;
  int t = *p;
  if (!p) __builtin_abort ();   // folded to false
  return t;
}

But the optimized code doesn't have the second test so I'm not sure
that the address attribute here does what I think it does (I'd
expect the test to be folded to true).  If you have a better test
case or other targets for me to try I can look into it some more.

Martin

On 9/2/21 8:39 AM, Martin Sebor wrote:

On 9/2/21 7:43 AM, Jason Merrill wrote:

On 9/1/21 6:27 PM, Martin Sebor wrote:

On 9/1/21 3:39 PM, Jason Merrill wrote:

On 9/1/21 4:33 PM, Martin Sebor wrote:

On 9/1/21 1:21 PM, Jason Merrill wrote:

On 8/31/21 10:08 PM, Martin Sebor wrote:
A Coverity run recently uncovered a latent bug in GCC that GCC should
be able to detect itself: comparing the address of a declared object
for equality to null, similar to:

   int f (void)
   {
     int a[2][2];
     return &a == 0;
   }

GCC issues -Waddress for this code, but the bug Coverity found was
actually closer to the following:

   int f (void)
   {
     int a[2][2];
     return a[0] == 0;
   }

where the hapless author (yours truly) meant to compare the value
of a[0][0] (as in r12-3268).

This variant is not diagnosed even though the bug in it is the same
and I'd expect more likely to occur in practice.  (&a[0] == 0 isn't
diagnosed either, though that's a less likely mistake to make).

The attached patch enhances -Waddress to detect this variant along
with a number of other similar instances of the problem, including
comparing the address of array members to null.

Besides these, the patch also issues -Waddress for null equality
tests of pointer-plus expressions such as in:

   int g (int i)
   {
     return a[0] + i == 0;
   }

and in C++ more instances of pointers to members.

Testing on x86_64-linux, besides a few benign issues in GCC sources,
a regression test run shows a failure in gcc.dg/Waddress.c.  That's
a test added after GCC for some reason stopped warning for one of
the basic cases that other tools warn about (comparing an array to
null).  I suspect the change was unintentional because GCC still
warns for other very similar expressions.  The reporter who also
submitted the test in pr36299 argued that the warning wasn't
helpful because tests for arrays sometimes come from macros, and
the test was committed after it was noted that GCC no longer warned
for the reporter's simple case.  While it's certainly true that
the warning can be triggered by the null equality tests in macros
(the patch exposed two such instances in GCC) they are easy to
avoid (the patch adds an additional escape hatch).  At the same
time, as is evident from the Coverity bug report and from the two
issues the enhancement exposes in the Fortran front end (even if
benign), issuing the warning in these cases does help find bugs
or mistaken assumptions.  With that, I've changed the test to
expect the restored -Waddress warning instead.

Testing with Glibc exposed a couple of harmless comparisons of
arrays in a large macro in vfprintf-internal.c.  I'll submit a fix
to avoid the -Waddress instances if/when this enhancement is
approved.

Testing with Binutils/GDB also turned up a couple of pointless
comparisons of arrays to null and a couple of uses in macros that
can be trivially suppressed.

Martin

PS Clang issues a warning for some of the same null pointer tests
the patch diagnoses, including gcc.dg/Waddress.c, but under at
least three different options: some under -Wpointer-bool-conversion,
others under -Wtautological-pointer-compare, and others still under
-Wtautological-compare.




Re: [PATCH] Jit, testsuite: Amend expect processing to tolerate more platforms.

2021-09-02 Thread Iain Sandoe
Hi David,

> On 2 Sep 2021, at 15:47, David Malcolm  wrote:
> 
> On Thu, 2021-08-19 at 19:59 +0100, Iain Sandoe wrote:

>> OK for master?
> 
> Did you try this with RUN_UNDER_VALGRIND set?  Assuming that that still
> works, yes, looks good to me.

For what configuration parameters is this expected to work?

With unpatched master, configured with "--enable-languages=all
--enable-host-shared --enable-default-pie", and running
RUN_UNDER_VALGRIND=1 make check-jit

I am seeing a large number of fails of the form:

ERROR: verbose: illegal argument: --19611-- When reading debug info from /tmp/libgccjit-9Op2CX/fake.so:
ERROR: verbose: illegal argument: --19611-- parse_CU_Header: is neither DWARF2 nor DWARF3 nor DWARF4

and a lot of timeouts.
I'm guessing there's something needed to make this DWARF-5 friendly?
(this is on gcc123)

Will report on the various test permutations once I get this sorted out …

cheers
Iain



Re: Re: [PATCH] jit : Generate debug info for variables

2021-09-02 Thread Petter Tomner via Gcc-patches
Thanks for the review. I will post a revision to the mailing list.

I have not filed a copyright assignment with the FSF, but if the DCO is enough
I'll add the Signed-off-by tags. I also read the mail about the testcase of this patch.

Regards, Petter

-Original Message-
From: David Malcolm  
Sent: 2 September 2021 17:21
To: Petter Tomner ; gcc-patches@gcc.gnu.org; j...@gcc.gnu.org
Subject: Re: Re: [PATCH] jit : Generate debug info for variables

On Tue, 2021-08-31 at 00:23 +, Petter Tomner via Gcc-patches wrote:
> Well I seemed to have attached the wrong testcase. Here is the proper 
> one attached.
> 
> Regards,
> 
> -Original Message-
> From: Petter Tomner
> Sent: 31 August 2021 02:14
> To: gcc-patches@gcc.gnu.org; j...@gcc.gnu.org
> Subject: [PATCH] jit : Generate debug info for variables
> 
> Hi,
> 
> This is a patch to generate debug info for local variables as well as
> globals.
> With this, "ptype foo", "info variables", "info locals" etc. work when
> debugging in GDB.
> 
> Finalization of global variable declarations is moved to after locations
> are handled, as the Fortran, C, Go etc. front ends do it.  Also, primitive
> types have their TYPE_NAME set for debug info on types to work.
> 
> Below is the patch, and I attached a testcase.  Since it requires GDB
> to run, it might not be suitable?  make check-jit runs fine on Debian
> x64.
> 
> Regards,

> From 6a5d24cbe80429d19042e643bd4c23940cd185fa Mon Sep 17 00:00:00 2001
> From: Petter Tomner 
> Date: Mon, 30 Aug 2021 01:45:11 +0200
> Subject: [PATCH 2/2] libgccjit: Test cases for debug info
> 
> Assure that debug info is available for variables.
> 
> gcc/testsuite/jit.dg/
>   jit.exp: Helper function
>   test-debuginfo.c

Again, please provide non-empty ChangeLog entries.

You can use contrib/gcc-changelog/git_check_commit.py to validate them.

I don't see "Signed-off-by" tags in the patches.  Have you either filed a 
copyright assignment with the FSF, or can you please add the tags to certify 
that you wrote the patches and can contribute them, see:
  https://gcc.gnu.org/contribute.html#legal
  https://gcc.gnu.org/dco.html

[...snip...]

> +proc jit-check-debug-info { obj_file cmds match } {
> +verbose "Checking debug info for $obj_file with match: $match"
> +
> +if { [catch {exec gdb -v} fid] } {
> +verbose "No gdb seems to be in path. Can't check debug info. Reporting expected fail."
> +xfail "No gdb seems to be in path. Can't check debug info"

I think this should be "unsupported" rather than "xfail".

[...snip...]

> diff --git a/gcc/testsuite/jit.dg/test-debuginfo.c b/gcc/testsuite/jit.dg/test-debuginfo.c
> new file mode 100644
> index 000..0af5779fdd1
> --- /dev/null
> +++ b/gcc/testsuite/jit.dg/test-debuginfo.c
> @@ -0,0 +1,72 @@
> +/* Essentially this test checks that debug info is generated for globals,
> +   locals and functions, including type info.  The comment below is used
> +   as fake code (does not affect the test, use for manual debugging). */
> +/*
> +int a_global_for_test_debuginfo;
> +int main (int argc, char **argv)
> +{
> +int a_local_for_test_debuginfo = 2;
> +return a_global_for_test_debuginfo + a_local_for_test_debuginfo; 
> +} */

This is OK, but maybe using gcc_jit_context_dump_to_file with update_locations 
== 1 might be more sustainable in the long run?  See:
  https://gcc.gnu.org/onlinedocs/jit/topics/locations.html#faking-it

OK otherwise.

Thanks
Dave



Re: [PATCH 2/8] coroutines: Add a helper for creating local vars.

2021-09-02 Thread Jason Merrill via Gcc-patches

On 9/1/21 6:53 AM, Iain Sandoe wrote:


This is primarily code factoring, but we take this opportunity
to rename some of the implementation variables (which we intend
to expose to debugging) so that they are in the implementation
namespace.

Signed-off-by: Iain Sandoe 

gcc/cp/ChangeLog:

* coroutines.cc (coro_build_artificial_var): New.
(build_actor_fn): Use var builder, rename vars to use
implementation namespace.
(coro_rewrite_function_body): Likewise.
(morph_fn_to_coro): Likewise.
---
  gcc/cp/coroutines.cc | 69 +++-
  1 file changed, 43 insertions(+), 26 deletions(-)

diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
index 2d68098f242..b8501032969 100644
--- a/gcc/cp/coroutines.cc
+++ b/gcc/cp/coroutines.cc
@@ -1474,6 +1474,29 @@ coro_build_cvt_void_expr_stmt (tree expr, location_t loc)
return coro_build_expr_stmt (t, loc);
  }
  
+/* Helpers to build an artificial var, with location LOC, NAME and TYPE, in
+   CTX, and with initializer INIT.  */
+
+static tree
+coro_build_artificial_var (location_t loc, tree name, tree type, tree ctx,
+  tree init)
+{
+  tree res = build_lang_decl (VAR_DECL, name, type);
+  DECL_SOURCE_LOCATION (res) = loc;
+  DECL_CONTEXT (res) = ctx;
+  DECL_ARTIFICIAL (res) = true;
+  DECL_INITIAL (res) = init;
+  return res;
+}
+
+static tree
+coro_build_artificial_var (location_t loc, const char *name, tree type,
+  tree ctx, tree init)
+{
+  return coro_build_artificial_var (loc, get_identifier (name),
+   type, ctx, init);
+}
+
  /* Helpers for label creation:
 1. Create a named label in the specified context.  */
  
@@ -2113,12 +2136,10 @@ build_actor_fn (location_t loc, tree coro_frame_type, tree actor, tree fnbody,

tree top_block = make_node (BLOCK);
BIND_EXPR_BLOCK (actor_bind) = top_block;
  
-  tree continuation = build_lang_decl (VAR_DECL,
-  get_identifier ("actor.continue"),
-  void_coro_handle_type);
-  DECL_ARTIFICIAL (continuation) = 1;
-  DECL_IGNORED_P (continuation) = 1;
-  DECL_CONTEXT (continuation) = actor;
+  tree continuation = coro_build_artificial_var (loc, "_Coro_actor_continue",
+void_coro_handle_type, actor,
+NULL_TREE);
+
BIND_EXPR_VARS (actor_bind) = continuation;
  
/* Link in the block associated with the outer scope of the re-written

@@ -4069,12 +4090,11 @@ coro_rewrite_function_body (location_t fn_start, tree fnbody, tree orig,
 fn_start, NULL, /*musthave=*/true);
/* Create and initialize the initial-await-resume-called variable per
 [dcl.fct.def.coroutine] / 5.3.  */
-  tree i_a_r_c = build_lang_decl (VAR_DECL, get_identifier ("i_a_r_c"),
- boolean_type_node);
-  DECL_ARTIFICIAL (i_a_r_c) = true;
+  tree i_a_r_c = coro_build_artificial_var (fn_start, "_Coro_i_a_r_c",


I might use a bit longer name, since the user doesn't have the comment 
above to explain the acronym.  :)


OK either way.


+   boolean_type_node, orig,
+   boolean_false_node);
DECL_CHAIN (i_a_r_c) = var_list;
var_list = i_a_r_c;
-  DECL_INITIAL (i_a_r_c) = boolean_false_node;
add_decl_expr (i_a_r_c);
/* Start the try-catch.  */
tree tcb = build_stmt (fn_start, TRY_BLOCK, NULL_TREE, NULL_TREE);
@@ -4459,8 +4479,10 @@ morph_fn_to_coro (tree orig, tree *resumer, tree *destroyer)
add_stmt (ramp_bind);
tree ramp_body = push_stmt_list ();
  
-  tree coro_fp = build_lang_decl (VAR_DECL, get_identifier ("coro.frameptr"),
- coro_frame_ptr);
+  tree zeroinit = build1_loc (fn_start, CONVERT_EXPR,
+ coro_frame_ptr, integer_zero_node);
+  tree coro_fp = coro_build_artificial_var (fn_start, "_Coro_frameptr",
+   coro_frame_ptr, orig, zeroinit);
tree varlist = coro_fp;
  
/* To signal that we need to cleanup copied function args.  */

@@ -4478,21 +4500,19 @@ morph_fn_to_coro (tree orig, tree *resumer, tree *destroyer)
  
/* Signal that we need to clean up the promise object on exception.  */

tree coro_promise_live
-   = build_lang_decl (VAR_DECL, get_identifier ("coro.promise.live"),
- boolean_type_node);
-  DECL_ARTIFICIAL (coro_promise_live) = true;
+= coro_build_artificial_var (fn_start, "_Coro_promise_live",
+boolean_type_node, orig, boolean_false_node);
DECL_CHAIN (coro_promise_live) = varlist;
varlist = coro_promise_live;
-  DECL_INITIAL (coro_promise_live) = boolean_false_node;
+
/* When the get-return-ob

Re: [PATCH 1/8] coroutines : Use DECL_VALUE_EXPR instead of rewriting vars.

2021-09-02 Thread Jason Merrill via Gcc-patches

On 9/1/21 6:52 AM, Iain Sandoe wrote:

Hi,

Variables that need to persist over suspension expressions
must be preserved by being copied into the coroutine frame.

The initial implementations do this manually in the transform
code.  However, that has various disadvantages - including
that the debug connections are lost between the original var
and the frame copy.

The revised implementation makes use of DECL_VALUE_EXPRs to
contain the frame offset expressions, so that the original
var names are preserved in the code.

This process is also applied to the function parms which are
always copied to the frame.  In this case the decls need to be
copied since they are used in two different contexts during
the re-write (in the building of the ramp function, and in
the actor function itself).

This will assist in improvement of debugging (PR 99215).
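As a rough illustration only, the following toy model in plain C mimics the idea behind the revised implementation: uses keep referring to the original variable, and a separately recorded "value expression" says where its storage now lives in the frame. Every name here (struct frame, struct var_decl, var_location, move_to_frame) is hypothetical and unrelated to GCC's actual tree API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical coroutine frame holding a variable that must persist
   across a suspension point.  */
struct frame
{
  int counter;
};

/* Hypothetical declaration record.  */
struct var_decl
{
  const char *name;   /* The user-visible name, never rewritten.  */
  int *value_expr;    /* Toy stand-in for DECL_VALUE_EXPR; NULL if unset.  */
  int local_storage;  /* Storage used while there is no value expression.  */
};

/* Every use of the variable resolves through here, so no use site has
   to be rewritten when the storage moves into the frame.  */
static int *
var_location (struct var_decl *v)
{
  return v->value_expr != NULL ? v->value_expr : &v->local_storage;
}

/* Move V into frame F by recording a value expression, loosely analogous
   to SET_DECL_VALUE_EXPR plus DECL_HAS_VALUE_EXPR_P in the patch.  */
static void
move_to_frame (struct var_decl *v, struct frame *f)
{
  f->counter = v->local_storage;
  v->value_expr = &f->counter;
}
```

Because the use sites never change, nothing has to be walked and substituted, and the variable's name stays attached to the frame copy, which is the debugging benefit described above.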


OK.


Signed-off-by: Iain Sandoe 

gcc/cp/ChangeLog:

* coroutines.cc (transform_local_var_uses): Record
frame offset expressions as DECL_VALUE_EXPRs instead of
rewriting them.
---
  gcc/cp/coroutines.cc | 105 +++
  1 file changed, 5 insertions(+), 100 deletions(-)

diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
index ceb3d3be75e..2d68098f242 100644
--- a/gcc/cp/coroutines.cc
+++ b/gcc/cp/coroutines.cc
@@ -1974,8 +1974,7 @@ transform_local_var_uses (tree *stmt, int *do_subtree, void *d)
local_vars_transform *lvd = (local_vars_transform *) d;
  
/* For each var in this bind expr (that has a frame id, which means it was
- accessed), build a frame reference for each and then walk the bind expr
- statements, substituting the frame ref for the original var.  */
+ accessed), build a frame reference and add it as the DECL_VALUE_EXPR.  */
  
if (TREE_CODE (*stmt) == BIND_EXPR)
  {
@@ -1991,13 +1990,9 @@ transform_local_var_uses (tree *stmt, int *do_subtree, void *d)
  /* Re-write the variable's context to be in the actor func.  */
  DECL_CONTEXT (lvar) = lvd->context;
  
-	/* For capture proxies, this could include the decl value expr.  */

-   if (local_var.is_lambda_capture || local_var.has_value_expr_p)
- {
-   tree ve = DECL_VALUE_EXPR (lvar);
-   cp_walk_tree (&ve, transform_local_var_uses, d, NULL);
+ /* For capture proxies, this could include the decl value expr.  */
+ if (local_var.is_lambda_capture || local_var.has_value_expr_p)
continue; /* No frame entry for this.  */
- }
  
  	  /* TODO: implement selective generation of fields when vars are

 known not-used.  */
@@ -2011,103 +2006,13 @@ transform_local_var_uses (tree *stmt, int *do_subtree, void *d)
  tree fld_idx = build3_loc (lvd->loc, COMPONENT_REF, TREE_TYPE (lvar),
 lvd->actor_frame, fld_ref, NULL_TREE);
  local_var.field_idx = fld_idx;
-   }
-  /* FIXME: we should be able to do this in the loop above, but (at least
-for range for) there are cases where the DECL_INITIAL contains
-forward references.
-So, now we've built the revised var in the frame, substitute uses of
-it in initializers and the bind expr body.  */
-  for (lvar = BIND_EXPR_VARS (*stmt); lvar != NULL;
-  lvar = DECL_CHAIN (lvar))
-   {
- /* we need to walk some of the decl trees, which might contain
-references to vars replaced at a higher level.  */
- cp_walk_tree (&DECL_INITIAL (lvar), transform_local_var_uses, d,
-   NULL);
- cp_walk_tree (&DECL_SIZE (lvar), transform_local_var_uses, d, NULL);
- cp_walk_tree (&DECL_SIZE_UNIT (lvar), transform_local_var_uses, d,
-   NULL);
+ SET_DECL_VALUE_EXPR (lvar, fld_idx);
+ DECL_HAS_VALUE_EXPR_P (lvar) = true;
}
cp_walk_tree (&BIND_EXPR_BODY (*stmt), transform_local_var_uses, d, NULL);
-
-  /* Now we have processed and removed references to the original vars,
-we can drop those from the bind - leaving capture proxies alone.  */
-  for (tree *pvar = &BIND_EXPR_VARS (*stmt); *pvar != NULL;)
-   {
- bool existed;
- local_var_info &local_var
-   = lvd->local_var_uses->get_or_insert (*pvar, &existed);
- gcc_checking_assert (existed);
-
- /* Leave lambda closure captures alone, we replace the *this
-pointer with the frame version and let the normal process
-deal with the rest.
-Likewise, variables with their value found elsewhere.
-Skip past unused ones too.  */
- if (local_var.is_lambda_capture
-|| local_var.has_value_expr_p
-|| local_var.field_id == NULL_TREE)
-   {
- pvar = &DECL_CHAIN (*pvar);
- continue;
-   }
-
- /* Discard this one, we replaced it.  */
- *pvar = DECL_CHAIN (*pvar);
-   }

Re: [PATCH] libstdc++: Define macro before it is first checked

2021-09-02 Thread Jonathan Wakely via Gcc-patches
On Thu, 2 Sept 2021 at 19:00, Jonathan Wakely wrote:
>
> * include/bits/atomic_wait.h (_GLIBCXX_HAVE_PLATFORM_WAIT):
> Define before first attempt to check it.
>
> Tested x86_64-linux and powerpc64-linux, not committed yet.

Actually ignore that ... I tested the wrong patch. This one introduces
a new FAIL, which I have a fix for, but it will have to wait for next
week.


> I think we need this, otherwise __platform_wait_uses_type is false
> for all T.
>
>
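The ChangeLog entry describes an ordering bug: if a feature macro is first tested before its #define has been seen, that first test silently takes the fallback branch (the message notes this would make __platform_wait_uses_type false for all T). A minimal sketch, using a made-up macro name rather than the real _GLIBCXX_HAVE_PLATFORM_WAIT and assuming an #ifdef-style test:

```c
#include <assert.h>

/* First test of the hypothetical feature macro: it is not defined yet,
   so the fallback branch is taken.  */
#ifdef SKETCH_HAVE_PLATFORM_WAIT
enum { early_check_sees_macro = 1 };
#else
enum { early_check_sees_macro = 0 };
#endif

/* The definition arrives too late for the test above.  */
#define SKETCH_HAVE_PLATFORM_WAIT 1

/* Only tests after this point see the macro.  */
#ifdef SKETCH_HAVE_PLATFORM_WAIT
enum { late_check_sees_macro = 1 };
#else
enum { late_check_sees_macro = 0 };
#endif
```

Moving the #define above the first test makes both checks agree, which is what "Define before first attempt to check it" amounts to.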



[PATCH] Fix target/102173 ICE after error recovery

2021-09-02 Thread apinski--- via Gcc-patches
From: Andrew Pinski 

After the recent r12-3278-823685221de986a change, the testcase
gcc.target/aarch64/sve/acle/general-c/type_redef_1.c started
to ICE as the code was not ready for error_mark_node in the
type.  This fixes that and the testcase now passes.

gcc/ChangeLog:

* config/aarch64/aarch64-sve-builtins.cc (register_vector_type):
Handle error_mark_node as the type of the type_decl.
---
 gcc/config/aarch64/aarch64-sve-builtins.cc | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/config/aarch64/aarch64-sve-builtins.cc b/gcc/config/aarch64/aarch64-sve-builtins.cc
index f71b287570e..bc92213665c 100644
--- a/gcc/config/aarch64/aarch64-sve-builtins.cc
+++ b/gcc/config/aarch64/aarch64-sve-builtins.cc
@@ -3416,6 +3416,7 @@ register_vector_type (vector_type_index type)
  installing an incorrect type.  */
   if (decl
   && TREE_CODE (decl) == TYPE_DECL
+  && TREE_TYPE (decl) != error_mark_node
   && TYPE_MAIN_VARIANT (TREE_TYPE (decl)) == vectype)
 vectype = TREE_TYPE (decl);
   acle_vector_types[0][type] = vectype;
-- 
2.17.1



Re: [PATCH] c++: parameter pack inside constexpr if [PR101764]

2021-09-02 Thread Jason Merrill via Gcc-patches

On 8/30/21 10:05 PM, Patrick Palka wrote:

Here when partially substituting into the pack expansion, substitution
into the constexpr if yields a still-dependent tree, so tsubst_expr
returns an IF_STMT with an unsubstituted IF_COND and with
IF_STMT_EXTRA_ARGS added to.  Hence after partial substitution
the pack expansion pattern still refers to the parameter pack 'ts' of
level 2 (and it's thus represented in the new PACK_EXPANSION_PARAMETER_PACKS)
even though the partially instantiated generic lambda admits only one
level of template arguments.



This causes us to crash during the
subsequent instantiation with the lambda's template arguments because of
the level mismatch.  (Likewise when the constexpr if is replaced by a
requires-expr, which too uses the extra args mechanism for delaying
partial instantiation.)



So essentially, a pack expansion pattern that contains a parameter pack
inside an "extra args" tree doesn't play well with partial substitution
thereof.  This patch fixes this by forcing such pack expansions to use
the extra args mechanism as well.


Why is this specific to parameter packs?  Won't non-pack template 
parameters also suffer from the level mismatch?  I'd think it would be 
simpler to just note when the pattern contains a constexpr if or 
requires-expression, for which we can't substitute into the pattern like 
a pack expansion, and know we need to use extra args in that case.



Bootstrapped and regtested on x86_64-pc-linux-gnu, does this look OK for
trunk?

PR c++/101764

gcc/cp/ChangeLog:

* cp-tree.h (PACK_EXPANSION_FORCE_EXTRA_ARGS_P): New accessor
macro.
* pt.c (uses_extra_args_mechanism_p): New function.
(find_parameter_pack_data::found_pack_within_extra_args_tree_p):
New data member.
(find_parameter_pack_data::inside_extra_args_tree_p): Likewise.
(find_parameter_packs_r): Detect parameter packs within "extra
args" trees and set found_pack_within_extra_args_tree_p
appropriately.
(make_pack_expansion): Set PACK_EXPANSION_FORCE_EXTRA_ARGS_P if
found_pack_within_extra_args_tree_p.
(use_pack_expansion_extra_args_p): Return true if there were
unsubstituted packs and PACK_EXPANSION_FORCE_EXTRA_ARGS_P.
(tsubst_pack_expansion): Pass the pack expansion to
use_pack_expansion_extra_args_p.

gcc/testsuite/ChangeLog:

* g++.dg/cpp1z/constexpr-if35.C: New test.
---
  gcc/cp/cp-tree.h|  5 ++
  gcc/cp/pt.c | 69 -
  gcc/testsuite/g++.dg/cpp1z/constexpr-if35.C | 18 ++
  3 files changed, 90 insertions(+), 2 deletions(-)
  create mode 100644 gcc/testsuite/g++.dg/cpp1z/constexpr-if35.C

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index ce7ca53a113..06dec495428 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -493,6 +493,7 @@ extern GTY(()) tree cp_global_trees[CPTI_MAX];
CONSTRUCTOR_C99_COMPOUND_LITERAL (in CONSTRUCTOR)
OVL_NESTED_P (in OVERLOAD)
DECL_MODULE_EXPORT_P (in _DECL)
+  PACK_EXPANSION_FORCE_EXTRA_ARGS_P (in *_PACK_EXPANSION)
 4: IDENTIFIER_MARKED (IDENTIFIER_NODEs)
TREE_HAS_CONSTRUCTOR (in INDIRECT_REF, SAVE_EXPR, CONSTRUCTOR,
  CALL_EXPR, or FIELD_DECL).
@@ -3902,6 +3903,10 @@ struct GTY(()) lang_decl {
  /* True iff this pack expansion is for auto... in lambda init-capture.  */
  #define PACK_EXPANSION_AUTO_P(NODE) TREE_LANG_FLAG_2 (NODE)
  
+/* True if we must use PACK_EXPANSION_EXTRA_ARGS to avoid partial
+   substitution into this pack expansion.  */
+#define PACK_EXPANSION_FORCE_EXTRA_ARGS_P(NODE) TREE_LANG_FLAG_3 (NODE)
+
  /* True iff the wildcard can match a template parameter pack.  */
  #define WILDCARD_PACK_P(NODE) TREE_LANG_FLAG_0 (NODE)
  
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index fcf3ac31b25..a92dff88d9d 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -3855,6 +3855,20 @@ expand_builtin_pack_call (tree call, tree args, tsubst_flags_t complain,
return NULL_TREE;
  }
  
+/* Return true if the tree T uses the extra args mechanism for
+   deferring partial substitution into it.  */
+
+static bool
+uses_extra_args_mechanism_p (tree t)
+{
+  return (PACK_EXPANSION_P (t)
+ || TREE_CODE (t) == REQUIRES_EXPR
+ || (TREE_CODE (t) == IF_STMT
+ && IF_STMT_CONSTEXPR_P (t)));
+}
+
+static tree find_parameter_packs_r (tree *, int *, void*);
+
  /* Structure used to track the progress of find_parameter_packs_r.  */
  struct find_parameter_pack_data
  {
@@ -3867,6 +3881,16 @@ struct find_parameter_pack_data
  
/* True iff we're making a type pack expansion.  */
bool type_pack_expansion_p;
+
+  /* True iff we found a pack inside a subtree that uses the extra
+ args mechanism.  */
+  bool found_pack_within_extra_args_tree_p = false;
+
+private:
+  /* True iff find_parameter_packs_r is currently visiting a tree
+ that uses the extra args mechanism or a subtree ther

Re: [PATCH V3 0/6] Initial support for AVX512FP16

2021-09-02 Thread Joseph Myers
On Thu, 2 Sep 2021, Iain Sandoe via Gcc-patches wrote:

> diff --git a/libgcc/soft-fp/eqdf2.c b/libgcc/soft-fp/eqdf2.c
> index 2a44ee377ce..a3bb664f5f1 100644
> --- a/libgcc/soft-fp/eqdf2.c
> +++ b/libgcc/soft-fp/eqdf2.c
> @@ -28,6 +28,7 @@
> License along with the GNU C Library; if not, see
> .  */
>  
> +#define DarwinMode DF
>  #include "soft-fp.h"
>  #include "double.h"

All these files are supposed to be taken unmodified from glibc.  They 
shouldn't contain any OS-specific code, such as a define of DarwinMode.  
sfp-machine.h, however, is libgcc-local, hence putting the definition of 
strong_alias there.

So you need some other way to extract the argument type of name in order 
to use it in a declaration of aliasname.  E.g.

__typeof (_Generic (name,
CMPtype (*) (HFtype, HFtype): (HFtype) 0,
CMPtype (*) (SFtype, SFtype): (SFtype) 0,
CMPtype (*) (DFtype, DFtype): (DFtype) 0,
CMPtype (*) (TFtype, TFtype): (TFtype) 0))

Now in fact I think the include ordering means none of the *type macros 
are defined here.  But if you do e.g.

typedef float alias_SFtype __attribute__ ((mode (SF)));

and similar, you could use alias_SFtype in the above.  And so keep the 
changes to the Darwin-specific parts of the libgcc-local sfp-machine.h.
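A self-contained sketch of the suggested approach follows. It assumes only SFmode and DFmode, leaving HFmode out because its availability depends on the target configuration. The alias_* typedefs follow the suggestion above, while cmp_sf, cmp_df, ARG_TYPE_OF, sf_value and df_value are hypothetical names used only for illustration. __typeof of the selected _Generic association recovers the argument type from the function itself, so no per-file define such as DarwinMode is needed.

```c
#include <assert.h>

/* Mode-based typedefs along the lines suggested above.  */
typedef float alias_SFtype __attribute__ ((mode (SF)));
typedef float alias_DFtype __attribute__ ((mode (DF)));
typedef int alias_CMPtype;

/* Hypothetical macro: the argument type of comparison function NAME.
   _Generic selects the association matching NAME's (decayed) function
   pointer type; __typeof yields the type of the selected expression.
   Neither evaluates its operand.  */
#define ARG_TYPE_OF(name)                                              \
  __typeof (_Generic ((name),                                          \
    alias_CMPtype (*) (alias_SFtype, alias_SFtype): (alias_SFtype) 0,  \
    alias_CMPtype (*) (alias_DFtype, alias_DFtype): (alias_DFtype) 0))

/* Hypothetical comparison functions standing in for e.g. __eqdf2.  */
alias_CMPtype cmp_sf (alias_SFtype x, alias_SFtype y) { return x != y; }
alias_CMPtype cmp_df (alias_DFtype x, alias_DFtype y) { return x != y; }

/* Objects whose types are derived from the functions themselves, as an
   alias declaration for NAME would need.  */
ARG_TYPE_OF (cmp_sf) sf_value = 1.5f;
ARG_TYPE_OF (cmp_df) df_value = 2.5;
```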

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH V3 0/6] Initial support for AVX512FP16

2021-09-02 Thread Joseph Myers
One of the committed changes breaks the build of libgcc for 32-bit x86 
configurations without SSE2 enabled by default:

In file included from /scratch/jmyers/glibc-bot/src/gcc/libgcc/soft-fp/extendhfsf2.c:31:
/scratch/jmyers/glibc-bot/src/gcc/libgcc/soft-fp/half.h:62:1: error: unable to emulate 'HF'
   62 | typedef float HFtype __attribute__ ((mode (HF)));
  | ^~~

(this showed up with my glibc bot building for i686-gnu).

Such a configuration should still support HFmode when you build user code 
with appropriate options.  I.e., the functions in question do need to be 
built into libgcc, so that user code can link against them, so you need to 
arrange for an explicit -msse2 to be used when building the HFmode libgcc 
functions (but not any other libgcc functions).

-- 
Joseph S. Myers
jos...@codesourcery.com


[r12-3310 Regression] FAIL: gcc.dg/torture/fp-int-convert-float16-timode.c -Os execution test on Linux/x86_64

2021-09-02 Thread sunil.k.pandey via Gcc-patches
On Linux/x86_64,

b387e664cfa4e9dd010a3f64d446308d6d84a5d2 is the first bad commit
commit b387e664cfa4e9dd010a3f64d446308d6d84a5d2
Author: liuhongt 
Date:   Mon Jul 5 17:31:46 2021 +0800

libgcc: Enable hfmode soft-sf/df/xf/tf extensions and truncations.

caused

FAIL: gcc.dg/torture/fp-int-convert-float16.c   -Os  execution test
FAIL: gcc.dg/torture/fp-int-convert-float16-timode.c   -Os  execution test

with GCC configured with



To reproduce:

$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg-torture.exp=gcc.dg/torture/fp-int-convert-float16.c --target_board='unix{-m32}'"
$ cd {build_dir}/gcc && make check RUNTESTFLAGS="dg-torture.exp=gcc.dg/torture/fp-int-convert-float16-timode.c --target_board='unix{-m32}'"

(Please do not reply to this email; for questions about this report, contact me
at skpgkp2 at gmail dot com)


Re: [PATCH, V2 2/3] targhooks: New target hook for CTF/BTF debug info emission

2021-09-02 Thread Indu Bhagat via Gcc-patches

On 8/26/21 11:12 PM, Richard Biener wrote:

On Thu, Aug 26, 2021 at 8:55 PM Indu Bhagat  wrote:


On 8/26/21 3:03 AM, Richard Biener wrote:

On Tue, Aug 24, 2021 at 7:07 PM Indu Bhagat  wrote:


On 8/18/21 12:00 AM, Richard Biener wrote:

On Tue, Aug 17, 2021 at 7:26 PM Indu Bhagat  wrote:


On 8/17/21 1:04 AM, Richard Biener wrote:

On Mon, Aug 16, 2021 at 7:39 PM Indu Bhagat  wrote:


On 8/10/21 4:54 AM, Richard Biener wrote:

On Thu, Aug 5, 2021 at 2:52 AM Indu Bhagat via Gcc-patches
 wrote:


This patch adds a new target hook to detect if the CTF container can allow the
emission of CTF/BTF debug info at DWARF debug info early finish time. Some
backends, e.g., BPF when generating code for CO-RE usecase, may need to emit
the CTF/BTF debug info sections around the time when late DWARF debug is
finalized (dwarf2out_finish).


Without looking at the dwarf2out.c usage in the next patch - I think
the CTF part should be always emitted from dwarf2out_early_finish, the
"hooks" should somehow arrange for the alternate output specific data
to be preserved until dwarf2out_finish time so the late BTF data can
be emitted from there.

Lumping everything together now just makes it harder to see what info
is required to persist and thus make LTO support more intrusive than
necessary.


In principle, I agree the approach to split generate/emit CTF/BTF like
you mention is ideal.  But, the BTF CO-RE relocations format is such
that the .BTF section cannot be finalized until .BTF.ext contents are
all fully known (David Faust summarizes this issue in the other thread
"[PATCH, V2 3/3] dwarf2out: Emit BTF in dwarf2out_finish for BPF CO-RE
usecase".)

In summary, the .BTF.ext section refers to strings in the .BTF section.
These strings are added at the time the CO-RE relocations are added.
Recall that the .BTF section's header has information about the .BTF
string table start offset and length. So, this means the "CTF part" (or
the .BTF section) cannot simply be emitted in the dwarf2out_early_finish
because it's not ready yet. If it is still unclear, please let me know.

My judgement here is that the BTF format itself is not amenable to split
early/late emission like DWARF. BTF has no linker support yet either.
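A much-simplified model of the ordering problem described above, with hypothetical names and not the real BTF encoding: strings share one table and are referenced by byte offset, and the header records the table's length. The table starts with an empty string at offset 0, as BTF string sections do; the later string stands in for one added while recording a CO-RE relocation, and its content is made up.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical string table: NUL-terminated strings appended back to
   back and referenced by byte offset.  */
struct strtab
{
  char buf[256];
  unsigned len;   /* What a header would record as the table length.  */
};

/* Append S and return its offset within the table.  */
static unsigned
strtab_add (struct strtab *t, const char *s)
{
  unsigned off = t->len;
  size_t n = strlen (s) + 1;          /* Include the terminating NUL.  */
  assert (t->len + n <= sizeof t->buf);
  memcpy (t->buf + t->len, s, n);
  t->len += (unsigned) n;
  return off;
}
```

Once the late string is appended, a length computed at early-finish time is stale, which is why the header (and so the .BTF section) cannot be finalized until every string is known.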


But are the strings used for the CO-RE relocations not all present already?
Or does the "CTF part" have only "foo", "bar" and "baz" while the CO-RE
part wants to output sth like "foo->bar.baz" (which IMHO would be quite
stupid also for size purposes)?



Yes, the latter ("foo->bar.baz") is closer to what the format does for
CO-RE relocations!


That said, fix the format.

Alternatively hand the CO-RE part its own string table (what's the fuss
with re-using the CTF string table if there's nothing to share ...)



BTF and .BTF.ext formats are specified already by implementations in the
kernel, libbpf, and LLVM. For that matter, I should add BPF CO-RE to the
mix and say that BPF CO-RE capability _and_ .BTF/.BTF.ext debug formats
have been defined already by the BPF kernel developers/associated
entities. At this time, we, as GCC developers simply extending the BPF
backend/BTF generation support in GCC, cannot fix the format. That ship
has sailed.


Hmm, well.  How about emitting .BTF.ext.string from GCC and having the linker
merge the .BTF.ext.string section with the CTF string section then?  You can't
really say "the ship has sailed" if I read the CTF webpage - there seem to be
many format changes planned.

Well.  Guess that was it from my side on the topic of ranting about the
not well thought out debug format ;)

Richard.


Hello Richard,

As we clarified in this thread, BTF/CO-RE format cannot be changed. What
are your thoughts on this patch set now ? Is this OK ?


Since the issue is intrinsic to BTF/CO-RE and not the actual target can we
do w/o a target hook by just gating on BTF_WITH_CORE as debug format
or so?

Richard.



The issue is intrinsic to BTF debug format *when* CO-RE is in effect, so
it is not entirely target independent because the whole "Compile Once -
Run Everywhere" scheme is BPF backend specific.


I see.


The debug information generation routines need to know if CO-RE is in
effect (to finalize BTF debug info generation late and not early). Now,
because it is the user who selects it via the -mco-re option, we need to
have a way to detect this at run-time. Guarding it with a definition
like BTF_WITH_CORE (is this what you meant?) will not work.


I was thinking about having BTF_CORE_DEBUG in addition to BTF_DEBUG
and thus have this part of the debug info format.  That would be
straight-forward in case the option to enable it were not backend
specific but I guess it might be valid for the backend to alter
ops->x_write_symbols in the backend option processing code.



This is doable. I updated the patch series and have posted V3.

Thanks
Indu


But, yes, we can do without a target hook. We can keep a global var in
the BTF context in btfout.c / CTF container (CTFC) which can be updated
by the backend whe

Re: [Committed] [PATCH 2/4] (v4) On-demand locations within string-literals

2021-09-02 Thread Thomas Schwinge
Hi!

On 2021-09-02T15:59:14+0200, I wrote:
> On 2016-08-05T14:16:58-0400, David Malcolm  wrote:
>> Committed to trunk as r239175; I'm attaching the final version of the
>> patch for reference.
>
> David, you've added here 'gcc/input.h:struct location_hash' (see quoted
> below), which will be useful elsewhere, so:
>
>> --- a/gcc/input.c
>> +++ b/gcc/input.c
>
>> +/* Internal function.  Canonicalize LOC into a form suitable for
>> +   use as a key within the database, stripping away macro expansion,
>> +   ad-hoc information, and range information, using the location of
>> +   the start of LOC within an ordinary linemap.  */
>> +
>> +location_t
>> +string_concat_db::get_key_loc (location_t loc)
>> +{
>> +  loc = linemap_resolve_location (line_table, loc, LRK_SPELLING_LOCATION,
>> +  NULL);
>> +
>> +  loc = get_range_from_loc (line_table, loc).m_start;
>> +
>> +  return loc;
>> +}
>
> OK to push the attached
> "Harden 'gcc/input.c:string_concat_db::get_key_loc'"?  (This fell out of
> my analysis for development work elsewhere.)

My suggested patch was:

--- a/gcc/input.c
+++ b/gcc/input.c
@@ -1483,6 +1483,9 @@ string_concat_db::get_key_loc (location_t loc)

   loc = get_range_from_loc (line_table, loc).m_start;

+  /* Ascertain that 'loc' is valid as a key in 'm_table'.  */
+  gcc_checking_assert (!RESERVED_LOCATION_P (loc));
+
   return loc;
 }

Uh, I should've looked at the correct test logs...  This change actually
does regress 'c-c++-common/substring-location-PR-87721.c' and
'gcc.dg/plugin/diagnostic-test-string-literals-1.c': for these, we do see
'BUILTINS_LOCATION' (via 'string_concat_db::record_string_concatenation').
Unless someone tells me that's unexpected (I'm completely lost in this
code...), I shall change/generalize my changes to provide both a
'location_hash' only using 'UNKNOWN_LOCATION' as a spare value for
'Empty' (as currently used here) and another variant additionally using
'BUILTINS_LOCATION' as spare value for 'Deleted'.


Regards
 Thomas


>> --- a/gcc/input.h
>> +++ b/gcc/input.h
>
>> +struct location_hash : int_hash <location_t, UNKNOWN_LOCATION> { };
>> +
>> +class GTY(()) string_concat_db
>> +{
>> +[...]
>> +  hash_map  *m_table;
>> +};
>
> OK to push the attached
> "Generalize 'gcc/input.h:struct location_hash'"?

My suggested patch was:

> Subject: [PATCH 2/2] Generalize 'gcc/input.h:struct location_hash'
>
> This is currently only used here ('gcc/input.h:class string_concat_db'), but is
> actually generally useful, so advertize it as such.
>
> Per the rationale given, we may use 'BUILTINS_LOCATION' as spare value for
> 'Deleted', in addition to the existing use of 'UNKNOWN_LOCATION' as spare value
> for 'Empty'.
>
>   gcc/
>   * input.h (location_hash): Use 'BUILTINS_LOCATION' as spare value
>   for 'Deleted'.  Turn into a '#define'.
> ---
>  gcc/input.h | 21 +++--
>  1 file changed, 19 insertions(+), 2 deletions(-)
>
> diff --git a/gcc/input.h b/gcc/input.h
> index e6881072c5f..46971a2684c 100644
> --- a/gcc/input.h
> +++ b/gcc/input.h
> @@ -36,6 +36,25 @@ extern GTY(()) class line_maps *saved_line_table;
> both UNKNOWN_LOCATION and BUILTINS_LOCATION fit into that.  */
>  STATIC_ASSERT (BUILTINS_LOCATION < RESERVED_LOCATION_COUNT);
>
> +/* Hasher for 'location_t' values satisfying '!RESERVED_LOCATION_P', thus able
> +   to use 'UNKNOWN_LOCATION'/'BUILTINS_LOCATION' as spare values for
> +   'Empty'/'Deleted'.  */
> +/* If the following is used more than once, 'gengtype' generates duplicate
> +   functions (thus: "error: redefinition of 'void gt_ggc_mx(location_hash&)'"
> +   etc.):
> +
> +   struct location_hash
> + : int_hash {};
> +
> +   Likewise for this:
> +
> +   typedef int_hash
> + location_hash;
> +
> +   Thus, use a plain ol' '#define':
> +*/
> +#define location_hash int_hash BUILTINS_LOCATION>
> +
>  extern bool is_location_from_builtin_token (location_t);
>  extern expanded_location expand_location (location_t);
>
> @@ -230,8 +249,6 @@ public:
>location_t * GTY ((atomic)) m_locs;
>  };
>
> -struct location_hash : int_hash  { };
> -
>  class GTY(()) string_concat_db
>  {
>   public:
-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955


Re: [PATCH] c++: shortcut bad convs during overload resolution [PR101904]

2021-09-02 Thread Jason Merrill via Gcc-patches

On 8/31/21 3:15 PM, Patrick Palka wrote:

On Mon, 30 Aug 2021, Patrick Palka wrote:


In the context of overload resolution we have the notion of a "bad"
argument conversion, which is a conversion that "would be permitted
with a bending of the language standards", and we handle such bad
conversions specially.  In particular, we rank a bad conversion as
better than no conversion but worse than a good conversion, and a bad
conversion doesn't necessarily make a candidate unviable.  With the
flag -fpermissive, we permit the situation where overload resolution
selects a candidate that contains a bad conversion (which we call a
non-strictly viable candidate).  And without the flag we issue a
distinct permerror in this situation instead.

One consequence of this de facto behavior is that in order to distinguish
a non-strictly viable candidate from an unviable candidate, if we
encounter a bad argument conversion during overload resolution we must
keep converting subsequent arguments because a subsequent conversion
could render the candidate unviable instead of just non-strictly viable.
But checking subsequent arguments can force template instantiations and
result in otherwise avoidable hard errors.  And in particular, all
'this' conversions are at worst bad, so this means the const/ref-qualifiers
of a member function can't be used to prune a candidate quickly, which
is the subject of the mentioned PR.

This patch tries to improve the situation without changing the de facto
output of add_candidates.  Specifically, when considering a candidate
during overload resolution this patch makes us shortcut argument
conversion checking upon encountering the first bad conversion
(tentatively marking the candidate as non-strictly viable, though it
could ultimately be unviable) under the assumption that we'll eventually
find a strictly viable candidate anyway (rendering the distinction
between non-strictly viable and unviable moot, since both are worse
than a strictly viable candidate).  If this assumption turns out to be
false, we'll fully reconsider the candidate under the de facto behavior
(without the shortcutting).

So in the best case (there's a strictly viable candidate), we avoid
some argument conversions and/or template argument deduction that may
cause a hard error.  In the worst case (there's no such candidate), we
have to redundantly consider some candidates twice.  (In a previous
version of the patch, to avoid this redundant checking I created a new
"deferred" conversion type that represents a conversion that is yet to
be performed, and instead of reconsidering a candidate I just realized
its deferred conversions.  But it doesn't seem this redundancy is a
significant performance issue to justify the added complexity of this
other approach.)
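
The two-pass scheme described above can be modeled in a few lines. This is a toy C++ sketch, not the code in 'call.c'. The types ('conv', 'viability', 'candidate') are hypothetical, and each argument conversion is a callback so that a skipped conversion, which might be expensive or might error, is observable.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Outcome of one argument conversion: good, bad (accepted only by
// "bending" the rules, as with -fpermissive), or impossible.
enum class conv { good, bad, none };

enum class viability { strict, non_strict, unviable };

// A candidate's conversions are callbacks, standing in for work that may be
// expensive or fail hard (e.g. template instantiation).
struct candidate
{
  std::vector<std::function<conv ()>> convs;
};

// Classify CAND.  With SHORTCUT, stop at the first bad conversion and
// tentatively call the candidate non-strictly viable, although a later
// conversion might have made it unviable.
viability
classify (const candidate &cand, bool shortcut)
{
  viability result = viability::strict;
  for (const auto &c : cand.convs)
    switch (c ())
      {
      case conv::none:
        return viability::unviable;
      case conv::bad:
        result = viability::non_strict;
        if (shortcut)
          return result;
        break;
      case conv::good:
        break;
      }
  return result;
}

// First pass with the shortcut.  Only if no candidate is strictly viable
// does the non-strict/unviable distinction matter, so only then are the
// shortcut candidates rechecked in full.
std::vector<viability>
classify_all (const std::vector<candidate> &cands)
{
  std::vector<viability> out;
  bool any_strict = false;
  for (const auto &c : cands)
    {
      out.push_back (classify (c, /*shortcut=*/true));
      if (out.back () == viability::strict)
        any_strict = true;
    }
  if (!any_strict)
    for (std::size_t i = 0; i < cands.size (); ++i)
      if (out[i] == viability::non_strict)
        out[i] = classify (cands[i], /*shortcut=*/false);
  return out;
}
```

In the best case the costly conversion after the first bad one is never evaluated. In the worst case the tentatively non-strict candidates are classified twice, which is the redundancy the patch accepts.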


OK, thanks.


Lots of care was taken to preserve the de facto behavior w.r.t.
non-strictly viable candidates, but I wonder how important this behavior
is nowadays?  Can the notion of a non-strictly viable candidate be done
away with, or is it here to stay?


To expand on this, as a concrete alternative to this optimistic shortcutting
trick we could maybe recognize non-strictly viable candidates only when
-fpermissive (and just mark them as unviable when not -fpermissive).  IIUC
this would be a backwards compatible change overall -- only diagnostics would
be affected, probably for the better, since we'd explain the rejection reason
for more candidates in the event of overload resolution failure.

Here's a testcase for which such a change would result in better diagnostics:

   struct A {
     void f(int, int) const; // #1
     void f(int);            // #2
   };

   int main() {
     const A a;
     a.f(0);
   }

We currently consider #2 to be a better candidate than #1 because the
bad conversion of the 'this' argument makes it only non-strictly
viable, whereas #1 is considered unviable due to the arity mismatch.
So overload resolution selects #2 and we end up making no mention of #1
in the subsequent diagnostic:

   : In function ‘int main()’:
   :8:8: error: passing ‘const A’ as ‘this’ argument discards qualifiers [-fpermissive]
   :3:8: note:   in call to ‘void A::f(int)’

Better would be to explain why neither candidate is a match:

   :8:6: error: no matching function for call to ‘A::f(int) const’
   :2:8: note: candidate: ‘void A::f(int, int) const’
   :2:8: note:   candidate expects 2 arguments, 1 provided
   :3:8: note: candidate: ‘void A::f(int)’
   :3:8: note:   passing ‘const A*’ as ‘this’ argument discards qualifiers


Is that better?  Focusing diagnostics on the candidate you probably 
meant seems helpful in most cases.




Same for

   void f(int, int);
   void f(int*);

   int main() {
     f(42);
   }

for which we currently emit

   : In function ‘int main()’:
   :5:5: error: invalid conversion from ‘int’ to ‘int*’ [-fpermissive]
   :2:8: note:   initializing argument 1 of ‘void f(int*)’

instead of

   : In function ‘int main()’:
   :5:4: error: no ma

Re: [PATCH] Generate XXSPLTIDP on power10.

2021-09-02 Thread Segher Boessenkool
On Wed, Sep 01, 2021 at 04:22:13PM -0400, Michael Meissner wrote:
> On Tue, Aug 31, 2021 at 06:41:30PM -0500, Segher Boessenkool wrote:
> > Hi!
> > 
> > Please do two separate patches.  The first that adds the instruction
> > (with a bit pattern, i.e. integer, input), and perhaps a second pattern
> > that has an fp as input and uses it if the constant is valid for the
> > insn (survives being converted to SP and back to DP (or the other way
> > around), and is not denormal).  That can be two patches if you want,
> > but :-)
> 
> Well we already have the two stages for the built-in.  But the point of the
> work is to make it work without the explicit builtin.

There should be a define_insn that works with just a bit pattern,
without floating point in sight.

> > Having the integer intermediate step not only makes the code hugely less
> > complicated, but it also allows e.g.
> 
> 
> > ===
> > typedef unsigned long long v2u64 __attribute__ ((vector_size (16)));
> > v2u64 f(void)
> > {
> > v2u64 x = { 0x8000, 0x8000 };
> > return x;
> > }
> > ===
> > 
> > to be optimised properly.
> > 
> > The second part is letting the existing code use such FP (and integer!)
> > constants.
> 
> It is somewhat harder to do this as integer values, because you have to check
> whether all of the bits are correct, including skipping the SF denormal bits.

You have to *anyway*.  And it is easy enough.
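
The integer-side check could look roughly like this C++ sketch. It decides whether a 64-bit pattern, read as a double, can come from a single-precision immediate. The value must survive a round trip through float unchanged, and the float must not be denormal. The function name and interface are hypothetical, not the rs6000 back end's. NaNs are simply rejected as a simplifying assumption, and IEEE 754 binary32/binary64 formats are assumed.

```cpp
#include <cassert>
#include <cfloat>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper: return true if BITS, read as an IEEE double, is
// exactly the widening of a normal (or zero/infinite) IEEE single, and
// store that single's bit pattern in *IMM.
bool
dp_bits_fit_sp_immediate (std::uint64_t bits, std::uint32_t *imm)
{
  double d;
  std::memcpy (&d, &bits, sizeof d);

  // Simplifying assumption: leave NaNs alone.
  if (std::isnan (d))
    return false;

  // Finite values beyond the float range cannot be represented, and
  // narrowing them would be undefined in C++.
  if (!std::isinf (d) && std::fabs (d) > FLT_MAX)
    return false;

  const float f = static_cast<float> (d);

  // Reject single-precision denormals.
  if (std::fpclassify (f) == FP_SUBNORMAL)
    return false;

  // The round trip must reproduce every bit, including the sign of zero.
  const double back = static_cast<double> (f);
  std::uint64_t back_bits;
  std::memcpy (&back_bits, &back, sizeof back_bits);
  if (back_bits != bits)
    return false;

  std::memcpy (imm, &f, sizeof *imm);
  return true;
}
```

Because the comparison is on the full bit pattern, -0.0 (0x8000000000000000) is accepted. Values such as 0.1, which round when narrowed, are rejected.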

> I tend to think in real life, the only win will be -0.0
> (i.e. 0x8000 that you mentioned).  I doubt many V2DI constants will
> be appropriate candidates for this.

There are a few more.  But the point is that the code is *simpler* like
this.  It usually is -- shortcuts often take more time in the end.

> > (Btw, initialising the value (although the function always writes it) is
> > not defensive programming, it is hiding bugs.  IMNSHO :-) )
> 
> And avoiding warnings.

Shutting up warnings that your control flow is not good is not a good
idea usually.  Don't fight the compiler: it does not matter if you have
better taste than it, it will win.  Instead, write better control flow.
There might even be an actual bug hiding in there!

> > > +/* Whether a permute type instruction is a prefixed instruction.  This is
> > > +   called from the prefixed attribute processing.  */
> > > +
> > > +bool
> > > +prefixed_permute_p (rtx_insn *insn)
> > 
> > What does this have to do with this patch?
> 
> This is used by the prefixed type attribute to know that the xxsplti{w,dp,32dx}
> instructions are prefixed by default, but unlike paddi, they don't want a
> leading 'p' in the instruction.

They are not permute insns, that's where your code turns into an enigma.

You should not have a function for prefixed insns that are not a variant
of non-prefixed insns: that is *most* prefixed insns (eventually it will
be, right now you see memory insns mostly of course).  Pick out the
special case, instead (hint: if it has a "not" in the description, you
probably picked the wrong side).

So you have an "is_prefixed" thing, all prefixed insns, and an
"is_prefixed_variant" (or some better name :-) ) function, that returns
true for those prefixed insns written with a leading "p" on an existing
base insn (and actually have the original opcode as suffix).
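
The split suggested above might look like this in outline. The sketch uses a hypothetical C++ struct in place of the real insn attributes, so it only shows the shape of the two predicates. 'is_prefixed' covers every prefixed insn. 'is_prefixed_variant' picks out the special case spelled 'p' plus a base mnemonic, as paddi is for addi.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the information the insn attributes carry.
struct insn_desc
{
  std::string base;       // mnemonic without any 'p'
  bool prefixed;          // encoded as prefix word + suffix word (8 bytes)
  bool p_variant;         // prefixed form of an existing base insn
};

// Every prefixed insn, e.g. for length and alignment purposes.
bool
is_prefixed (const insn_desc &d)
{
  return d.prefixed;
}

// The special case: prefixed insns written as 'p' + the base mnemonic.
bool
is_prefixed_variant (const insn_desc &d)
{
  return d.prefixed && d.p_variant;
}

int
insn_length (const insn_desc &d)
{
  return is_prefixed (d) ? 8 : 4;
}

std::string
mnemonic (const insn_desc &d)
{
  return is_prefixed_variant (d) ? "p" + d.base : d.base;
}
```

With this split, xxspltidp is simply a prefixed insn that is not a variant. No "not a permute" style negative test is needed.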

> > > @@ -7755,15 +7760,16 @@ (define_insn "movsf_hardfloat"
> > > @@ -8051,20 +8057,21 @@ (define_insn "*mov_hardfloat32"
> > > @@ -8091,19 +8098,19 @@ (define_insn "*mov_softfloat32"
> > > @@ -8125,18 +8132,19 @@ (define_insn "*mov_hardfloat64"
> > > @@ -8170,6 +8178,7 @@ (define_insn "*mov_softfloat64"
> > 
> > It would be a good idea to merge many of these patterns again.  We can
> > do this now that we have the "isa" and "enabled" attributes.
> 
> I don't plan to do this right now.

The longer you wait the more work it becomes, and the more work other
things (like this patch) become as well.


Segher


Re: [PATCH] PR fortran/93834 - [9/10/11/12 Regression] ICE in trans_caf_is_present, at fortran/trans-intrinsic.c:8469

2021-09-02 Thread Harald Anlauf via Gcc-patches
Hi Tobias,

> Consider:
> 
> type t
>   integer, allocatable :: a
> end type t
> 
> type(t) :: var[*]
> print *, allocated(var%a)
> print *, allocated(var[1]%a)
> end

what is the problem here?  Can you elaborate?

BTW: Intel accepts it, we (currently) accept it, my patch does not change 
anything.

> I think pointer has a likewise issue.
> It should be sufficient to get rid of the
> attr.codimension.

???

>   * * *
> 
> Note regarding pointers: F2018:C1542 also does not
> apply to intrinsics, cf. note below C1542 (quoted below).
> 
>   * * *
> 
> By itself, I do not see why accessing the value of a
> coindexed variable is a problem – just (de)allocating it should
> cause problems.
> 
> With pointers, undefined might be an additional issue.
> 
> Thus, while
>   allocate( coindexed object )
> has issues and is invalid – all refs to F2018:
>   C950 (R932) An allocate-object shall not be a coindexed object.
> 
> I do not see why
>   allocated ( ...)
> should be invalid; in particular, just a NULL value check is needed.
> 
> Likewise for
>   associated  ( )
> 
> Besides exceptions for polymorphic allocatables, I find:
> 
> C1537 An actual argument that is a coindexed object shall not have a pointer 
> ultimate component.
> 
> C1542 The actual argument corresponding to a dummy pointer shall not be a 
> coindexed object.
> Note 1: Constraint C1542 does not apply to any intrinsic procedure because an
> intrinsic procedure is defined in terms of its actual arguments.
> 
> For allocatables, there is:
> "If the actual argument is a coindexed object with an allocatable ultimate
> component, the dummy argument shall have the INTENT (IN) or the VALUE 
> attribute."

I am afraid you lost me here.  There are other PRs involving coarrays.
Are you suggesting I should solve them all?

Withdrawing the patch.

Harald



Re: [PATCH] libstdc++: Define macro before it is first checked

2021-09-02 Thread Thomas Rodgers via Gcc-patches
Agreed.

On Thu, Sep 2, 2021 at 10:58 AM Jonathan Wakely  wrote:

> Signed-off-by: Jonathan Wakely 
>
> libstdc++-v3/ChangeLog:
>
> * include/bits/atomic_wait.h (_GLIBCXX_HAVE_PLATFORM_WAIT):
> Define before first attempt to check it.
>
> Tested x86_64-linux and powerpc64-linux, not committed yet.
>
> I think we need this, otherwise __platform_wait_uses_type is false
> for all T.
>
>
>


[PATCH] libstdc++: Define macro before it is first checked

2021-09-02 Thread Jonathan Wakely via Gcc-patches
Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* include/bits/atomic_wait.h (_GLIBCXX_HAVE_PLATFORM_WAIT):
Define before first attempt to check it.

Tested x86_64-linux and powerpc64-linux, not committed yet.

I think we need this, otherwise __platform_wait_uses_type is false
for all T.
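
The failure mode is purely a matter of preprocessor ordering, and a few lines of standalone C++ can reproduce it. All names below are hypothetical stand-ins for the libstdc++ ones. The size check is a simplified version of '__platform_wait_uses_type' that omits the alignment test.

```cpp
#include <cassert>
#include <type_traits>

// Stand-in for the futex word type; all names here are hypothetical.
using sketch_wait_t = int;

// Buggy order: this check runs before the macro below is defined, so the
// preprocessor always selects the '#else' branch.
#ifdef SKETCH_HAVE_PLATFORM_WAIT
constexpr bool checked_too_early = true;
#else
constexpr bool checked_too_early = false;
#endif

#define SKETCH_HAVE_PLATFORM_WAIT 1

// Correct order: the macro is defined before it is tested.
template <typename T>
constexpr bool sketch_wait_uses_type =
#ifdef SKETCH_HAVE_PLATFORM_WAIT
  std::is_scalar<T>::value && sizeof (T) == sizeof (sketch_wait_t);
#else
  false;
#endif
```

No diagnostic is issued for the early check. It silently evaluates to false, which matches the symptom described above.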


commit 9b8a78b5c923efd2f9d10eaed3bfd43dd6a91fe3
Author: Jonathan Wakely 
Date:   Tue Aug 31 15:51:09 2021

libstdc++: Define macro before it is first checked

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* include/bits/atomic_wait.h (_GLIBCXX_HAVE_PLATFORM_WAIT):
Define before first attempt to check it.

diff --git a/libstdc++-v3/include/bits/atomic_wait.h b/libstdc++-v3/include/bits/atomic_wait.h
index 07bb744d822..64b71e408eb 100644
--- a/libstdc++-v3/include/bits/atomic_wait.h
+++ b/libstdc++-v3/include/bits/atomic_wait.h
@@ -56,9 +56,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   namespace __detail
   {
 #ifdef _GLIBCXX_HAVE_LINUX_FUTEX
+#define _GLIBCXX_HAVE_PLATFORM_WAIT 1
 using __platform_wait_t = int;
 static constexpr size_t __platform_wait_alignment = 4;
 #else
+// define _GLIBCXX_HAVE_PLATFORM_WAIT and implement __platform_wait()
+// and __platform_notify() if there is a more efficient primitive supported
+// by the platform (e.g. __ulock_wait()/__ulock_wake()) which is better than
+// a mutex/condvar based wait.
 using __platform_wait_t = uint64_t;
 static constexpr size_t __platform_wait_alignment
   = __alignof__(__platform_wait_t);
@@ -70,7 +75,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #ifdef _GLIBCXX_HAVE_PLATFORM_WAIT
   = is_scalar_v<_Tp>
&& ((sizeof(_Tp) == sizeof(__detail::__platform_wait_t))
-   && (alignof(_Tp*) >= __platform_wait_alignment));
+   && (alignof(_Tp*) >= __detail::__platform_wait_alignment));
 #else
   = false;
 #endif
@@ -78,7 +83,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   namespace __detail
   {
 #ifdef _GLIBCXX_HAVE_LINUX_FUTEX
-#define _GLIBCXX_HAVE_PLATFORM_WAIT 1
 enum class __futex_wait_flags : int
 {
 #ifdef _GLIBCXX_HAVE_LINUX_FUTEX_PRIVATE
@@ -118,11 +122,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 static_cast(__futex_wait_flags::__wake_private),
 __all ? INT_MAX : 1);
   }
-#else
-// define _GLIBCX_HAVE_PLATFORM_WAIT and implement __platform_wait()
-// and __platform_notify() if there is a more efficient primitive supported
-// by the platform (e.g. __ulock_wait()/__ulock_wake()) which is better than
-// a mutex/condvar based wait
 #endif
 
 inline void
@@ -192,9 +191,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 struct __waiter_pool_base
 {
 #ifdef __cpp_lib_hardware_interference_size
-static constexpr auto _S_align = hardware_destructive_interference_size;
+  static constexpr auto _S_align = hardware_destructive_interference_size;
 #else
-static constexpr auto _S_align = 64;
+  static constexpr auto _S_align = 64;
 #endif
 
   alignas(_S_align) __platform_wait_t _M_wait = 0;


Re: [PATCH 2/2] Get rid of all float-int special cases in validate_subreg.

2021-09-02 Thread Segher Boessenkool
On Tue, Aug 31, 2021 at 07:17:49PM +0800, liuhongt via Gcc-patches wrote:
>   * emit-rtl.c (validate_subreg): Get rid of all float-int
>   special cases.

This caused various regressions on powerpc.  Please revert this until
this can be done safely (the comment this patch deletes says why it can
not be done yet).


Segher


Re: [PATCH] Check the type of mask while generating cond_op in gimple simplication.

2021-09-02 Thread Richard Sandiford via Gcc-patches
Hongtao Liu via Gcc-patches  writes:
> On Wed, Sep 1, 2021 at 8:52 PM Richard Sandiford
>  wrote:
>>
>> Richard Biener  writes:
>> > On Wed, Sep 1, 2021 at 8:28 AM Hongtao Liu  wrote:
>> >>
>> >> On Tue, Aug 31, 2021 at 7:56 PM Richard Biener
>> >>  wrote:
>> >> >
>> >> > On Tue, Aug 31, 2021 at 12:18 PM Hongtao Liu  wrote:
>> >> > >
>> >> > > On Mon, Aug 30, 2021 at 8:25 PM Richard Biener via Gcc-patches
>> >> > >  wrote:
>> >> > > >
>> >> > > > On Fri, Aug 27, 2021 at 8:53 AM liuhongt  
>> >> > > > wrote:
>> >> > > > >
>> >> > > > >   When gimple simplification tries to combine op and vec_cond_expr
>> >> > > > > into cond_op, it doesn't check whether the mask type matches.
>> >> > > > > This causes an ICE when expanding cond_op with a mismatched mode.
>> >> > > > >   This patch adds a function named
>> >> > > > > cond_vectorized_internal_fn_supported_p, which, unlike
>> >> > > > > vectorized_internal_fn_supported_p, also checks the mask type.
>> >> > > > >
>> >> > > > >   Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
>> >> > > > >   Ok for trunk?
>> >> > > > >
>> >> > > > > gcc/ChangeLog:
>> >> > > > >
>> >> > > > > PR middle-end/102080
>> >> > > > > * internal-fn.c 
>> >> > > > > (cond_vectorized_internal_fn_supported_p): New functions.
>> >> > > > > * internal-fn.h 
>> >> > > > > (cond_vectorized_internal_fn_supported_p): New declaration.
>> >> > > > > * match.pd: Check the type of mask while generating
>> >> > > > > cond_op in gimple simplification.
>> >> > > > >
>> >> > > > > gcc/testsuite/ChangeLog:
>> >> > > > >
>> >> > > > > PR middle-end/102080
>> >> > > > > * gcc.target/i386/pr102080.c: New test.
>> >> > > > > ---
>> >> > > > >  gcc/internal-fn.c                        | 22 ++
>> >> > > > >  gcc/internal-fn.h                        |  1 +
>> >> > > > >  gcc/match.pd                             | 24
>> >> > > > >  gcc/testsuite/gcc.target/i386/pr102080.c | 16
>> >> > > > >  4 files changed, 55 insertions(+), 8 deletions(-)
>> >> > > > >  create mode 100644 gcc/testsuite/gcc.target/i386/pr102080.c
>> >> > > > >
>> >> > > > > diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c
>> >> > > > > index 1360a00f0b9..8b2b65db1a7 100644
>> >> > > > > --- a/gcc/internal-fn.c
>> >> > > > > +++ b/gcc/internal-fn.c
>> >> > > > > @@ -4102,6 +4102,28 @@ expand_internal_call (gcall *stmt)
>> >> > > > >expand_internal_call (gimple_call_internal_fn (stmt), stmt);
>> >> > > > >  }
>> >> > > > >
>> >> > > > > +/* Check cond_op for vector modes since vectorized_internal_fn_supported_p
>> >> > > > > +   doesn't check if mask type matches.  */
>> >> > > > > +bool
>> >> > > > > +cond_vectorized_internal_fn_supported_p (internal_fn ifn, tree type,
>> >> > > > > +                                         tree mask_type)
>> >> > > > > +{
>> >> > > > > +  if (!vectorized_internal_fn_supported_p (ifn, type))
>> >> > > > > +return false;
>> >> > > > > +
>> >> > > > > +  machine_mode mask_mode;
>> >> > > > > +  machine_mode vmode = TYPE_MODE (type);
>> >> > > > > +  int size1, size2;
>> >> > > > > +  if (VECTOR_MODE_P (vmode)
>> >> > > > > +      && targetm.vectorize.get_mask_mode (vmode).exists(&mask_mode)
>> >> > > > > +      && GET_MODE_SIZE (mask_mode).is_constant (&size1)
>> >> > > > > +      && GET_MODE_SIZE (TYPE_MODE (mask_type)).is_constant (&size2)
>> >> > > > > +      && size1 != size2)
>> >> > > >
>> >> > > > Why do we check for equal size rather than just mode equality which
>> >> > > I originally thought  TYPE_MODE of vector(8)  was
>> >> > > not QImode, Changed the patch to check mode equality.
>> >> > > Update patch.
>> >> >
>> >> > Looking at all this it seems the match.pd patterns should not have
>> >> > used vectorized_internal_fn_supported_p but direct_internal_fn_supported_p,
>> >> > which is equivalent here because we're always working with vector modes?
>>
>> Yeah, looks like it.
>>
>> >> > And then shouldn't we look at the actual optab to see whether the mask
>> >> > mode matches the expectation, rather than going around via the target
>> >> > hook, which may not have enough context to decide which mask mode to use?
>> >> How about this?
>> >>
>> >> +/* Return true if target supports cond_op with data TYPE and
>> >> +   mask MASK_TYPE.  */
>> >> +bool
>> >> +cond_internal_fn_supported_p (internal_fn ifn, tree type,
>> >> +   tree mask_type)
>> >> +{
>> >> +  tree_pair types = tree_pair (type, type);
>> >> +  optab tmp = direct_internal_fn_optab (ifn, types);
>> >> +  machine_mode vmode = TYPE_MODE (type);
>> >> +  insn_code icode = direct_optab_handler (tmp, vmode);
>> >> +  if (icode == CODE_FOR_nothing)
>> >> +return false;
>> >> +
>> >> +  machine_mode mask_mode = TYPE_MODE (mask_type);
>> >> +  /* Can't create rtx and use insn
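
Richard Biener suggests asking the pattern selected for the operation and data mode which mask mode it expects, instead of going through the target hook. That approach can be modeled as follows. This is a toy C++ sketch with hypothetical names ('toy_target', 'cond_op_supported_p'). Modes are plain strings, and the mode names used in the usage check are arbitrary examples. It is not GCC's optab interface.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Toy model: for each (operation, data mode) the target can expand as a
// conditional operation, record the mask mode that pattern expects.
// Modes are plain strings; nothing here is GCC's real optab interface.
struct toy_target
{
  std::map<std::pair<std::string, std::string>, std::string> cond_patterns;

  bool
  cond_op_supported_p (const std::string &op, const std::string &data_mode,
                       const std::string &mask_mode) const
  {
    auto it = cond_patterns.find ({op, data_mode});
    if (it == cond_patterns.end ())
      return false;                  // no pattern at all
    return it->second == mask_mode;  // pattern exists; mask must match too
  }
};
```

The point of the design is that the mask mode is taken from the pattern that will actually be used for expansion. A separate guess about what the target might prefer is never consulted.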

[committed] libstdc++: Implement std::atomic::compare_exchange_weak

2021-09-02 Thread Jonathan Wakely via Gcc-patches
For some reason r170217 didn't add compare_exchange_weak to the
__atomic_base partial specialization, and so weak compare exchange
operations on pointers use compare_exchange_strong instead.

This adds __atomic_base::compare_exchange_weak and then uses it in
std::atomic::compare_exchange_weak.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* include/bits/atomic_base.h (__atomic_base::compare_exchange_weak):
Add new functions.
* include/std/atomic (atomic::compare_exchange_weak): Use
it.

Tested x86_64-linux. Committed to trunk.
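
Weak compare-exchange is intended for retry loops like the one below, where a spurious failure costs only another iteration. On load-linked/store-conditional targets it may avoid the extra loop that a strong compare-exchange needs. Here is a generic C++ usage sketch on 'std::atomic<node*>'. The 'node' type is hypothetical, and this is not code from the patch.

```cpp
#include <atomic>
#include <cassert>

struct node
{
  int value;
  node *next;
};

// Push N onto the lock-free stack headed by HEAD.  On failure,
// compare_exchange_weak stores the head it actually saw into n->next, so
// the loop just retries; a spurious failure is harmless here.
void
push (std::atomic<node *> &head, node *n)
{
  n->next = head.load (std::memory_order_relaxed);
  while (!head.compare_exchange_weak (n->next, n,
                                      std::memory_order_release,
                                      std::memory_order_relaxed))
    {
      // n->next now holds the current head; try again.
    }
}
```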

commit 89cf858571c58a58ca51dbbf3975582ebab41e2d
Author: Jonathan Wakely 
Date:   Thu Sep 2 16:47:31 2021

libstdc++: Implement std::atomic::compare_exchange_weak

For some reason r170217 didn't add compare_exchange_weak to the
__atomic_base partial specialization, and so weak compare exchange
operations on pointers use compare_exchange_strong instead.

This adds __atomic_base::compare_exchange_weak and then uses it in
std::atomic::compare_exchange_weak.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* include/bits/atomic_base.h (__atomic_base::compare_exchange_weak):
Add new functions.
* include/std/atomic (atomic::compare_exchange_weak): Use
it.

diff --git a/libstdc++-v3/include/bits/atomic_base.h b/libstdc++-v3/include/bits/atomic_base.h
index cbe1da6d125..71e1de078b5 100644
--- a/libstdc++-v3/include/bits/atomic_base.h
+++ b/libstdc++-v3/include/bits/atomic_base.h
@@ -846,6 +846,28 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
return __atomic_exchange_n(&_M_p, __p, int(__m));
   }
 
+  _GLIBCXX_ALWAYS_INLINE bool
+  compare_exchange_weak(__pointer_type& __p1, __pointer_type __p2,
+   memory_order __m1,
+   memory_order __m2) noexcept
+  {
+   __glibcxx_assert(__is_valid_cmpexch_failure_order(__m2));
+
+   return __atomic_compare_exchange_n(&_M_p, &__p1, __p2, 1,
+  int(__m1), int(__m2));
+  }
+
+  _GLIBCXX_ALWAYS_INLINE bool
+  compare_exchange_weak(__pointer_type& __p1, __pointer_type __p2,
+   memory_order __m1,
+   memory_order __m2) volatile noexcept
+  {
+   __glibcxx_assert(__is_valid_cmpexch_failure_order(__m2));
+
+   return __atomic_compare_exchange_n(&_M_p, &__p1, __p2, 1,
+  int(__m1), int(__m2));
+  }
+
   _GLIBCXX_ALWAYS_INLINE bool
   compare_exchange_strong(__pointer_type& __p1, __pointer_type __p2,
  memory_order __m1,
diff --git a/libstdc++-v3/include/std/atomic b/libstdc++-v3/include/std/atomic
index bdbbfd5c8f8..936dd50ba1c 100644
--- a/libstdc++-v3/include/std/atomic
+++ b/libstdc++-v3/include/std/atomic
@@ -595,13 +595,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   bool
   compare_exchange_weak(__pointer_type& __p1, __pointer_type __p2,
memory_order __m1, memory_order __m2) noexcept
-  { return _M_b.compare_exchange_strong(__p1, __p2, __m1, __m2); }
+  { return _M_b.compare_exchange_weak(__p1, __p2, __m1, __m2); }
 
   bool
   compare_exchange_weak(__pointer_type& __p1, __pointer_type __p2,
memory_order __m1,
memory_order __m2) volatile noexcept
-  { return _M_b.compare_exchange_strong(__p1, __p2, __m1, __m2); }
+  { return _M_b.compare_exchange_weak(__p1, __p2, __m1, __m2); }
 
   bool
   compare_exchange_weak(__pointer_type& __p1, __pointer_type __p2,


[committed] libstdc++: Tweak whitespace in

2021-09-02 Thread Jonathan Wakely via Gcc-patches
Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* include/std/atomic: Tweak whitespace.

Tested x86_64-linux. Committed to trunk.

commit 892400f1f21ccee98dddcd90677038ce266248c8
Author: Jonathan Wakely 
Date:   Thu Sep 2 16:08:25 2021

libstdc++: Tweak whitespace in 

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* include/std/atomic: Tweak whitespace.

diff --git a/libstdc++-v3/include/std/atomic b/libstdc++-v3/include/std/atomic
index b395c65d468..bdbbfd5c8f8 100644
--- a/libstdc++-v3/include/std/atomic
+++ b/libstdc++-v3/include/std/atomic
@@ -560,7 +560,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   { return _M_b.is_lock_free(); }
 
 #if __cplusplus >= 201703L
-static constexpr bool is_always_lock_free = ATOMIC_POINTER_LOCK_FREE == 2;
+  static constexpr bool is_always_lock_free
+   = ATOMIC_POINTER_LOCK_FREE == 2;
 #endif
 
   void
@@ -660,6 +661,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 notify_all() const noexcept
 { _M_b.notify_all(); }
 #endif // __cpp_lib_atomic_wait
+
   __pointer_type
   fetch_add(ptrdiff_t __d,
memory_order __m = memory_order_seq_cst) noexcept
@@ -721,7 +723,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   using __base_type::operator=;
 
 #if __cplusplus >= 201703L
-static constexpr bool is_always_lock_free = ATOMIC_CHAR_LOCK_FREE == 2;
+  static constexpr bool is_always_lock_free = ATOMIC_CHAR_LOCK_FREE == 2;
 #endif
 };
 
@@ -744,7 +746,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   using __base_type::operator=;
 
 #if __cplusplus >= 201703L
-static constexpr bool is_always_lock_free = ATOMIC_CHAR_LOCK_FREE == 2;
+  static constexpr bool is_always_lock_free = ATOMIC_CHAR_LOCK_FREE == 2;
 #endif
 };
 
@@ -767,7 +769,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   using __base_type::operator=;
 
 #if __cplusplus >= 201703L
-static constexpr bool is_always_lock_free = ATOMIC_CHAR_LOCK_FREE == 2;
+  static constexpr bool is_always_lock_free = ATOMIC_CHAR_LOCK_FREE == 2;
 #endif
 };
 
@@ -790,7 +792,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   using __base_type::operator=;
 
 #if __cplusplus >= 201703L
-static constexpr bool is_always_lock_free = ATOMIC_SHORT_LOCK_FREE == 2;
+  static constexpr bool is_always_lock_free = ATOMIC_SHORT_LOCK_FREE == 2;
 #endif
 };
 
@@ -813,7 +815,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   using __base_type::operator=;
 
 #if __cplusplus >= 201703L
-static constexpr bool is_always_lock_free = ATOMIC_SHORT_LOCK_FREE == 2;
+  static constexpr bool is_always_lock_free = ATOMIC_SHORT_LOCK_FREE == 2;
 #endif
 };
 
@@ -836,7 +838,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   using __base_type::operator=;
 
 #if __cplusplus >= 201703L
-static constexpr bool is_always_lock_free = ATOMIC_INT_LOCK_FREE == 2;
+  static constexpr bool is_always_lock_free = ATOMIC_INT_LOCK_FREE == 2;
 #endif
 };
 
@@ -859,7 +861,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   using __base_type::operator=;
 
 #if __cplusplus >= 201703L
-static constexpr bool is_always_lock_free = ATOMIC_INT_LOCK_FREE == 2;
+  static constexpr bool is_always_lock_free = ATOMIC_INT_LOCK_FREE == 2;
 #endif
 };
 
@@ -882,7 +884,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   using __base_type::operator=;
 
 #if __cplusplus >= 201703L
-static constexpr bool is_always_lock_free = ATOMIC_LONG_LOCK_FREE == 2;
+  static constexpr bool is_always_lock_free = ATOMIC_LONG_LOCK_FREE == 2;
 #endif
 };
 
@@ -905,7 +907,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   using __base_type::operator=;
 
 #if __cplusplus >= 201703L
-static constexpr bool is_always_lock_free = ATOMIC_LONG_LOCK_FREE == 2;
+  static constexpr bool is_always_lock_free = ATOMIC_LONG_LOCK_FREE == 2;
 #endif
 };
 
@@ -928,7 +930,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   using __base_type::operator=;
 
 #if __cplusplus >= 201703L
-static constexpr bool is_always_lock_free = ATOMIC_LLONG_LOCK_FREE == 2;
+  static constexpr bool is_always_lock_free = ATOMIC_LLONG_LOCK_FREE == 2;
 #endif
 };
 
@@ -951,7 +953,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   using __base_type::operator=;
 
 #if __cplusplus >= 201703L
-static constexpr bool is_always_lock_free = ATOMIC_LLONG_LOCK_FREE == 2;
+  static constexpr bool is_always_lock_free = ATOMIC_LLONG_LOCK_FREE == 2;
 #endif
 };
 
@@ -974,7 +976,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   using __base_type::operator=;
 
 #if __cplusplus >= 201703L
-static constexpr bool is_always_lock_free = ATOMIC_WCHAR_T_LOCK_FREE == 2;
+  static constexpr bool is_always_lock_free = ATOMIC_WCHAR_T_LOCK_FREE == 2;
 #endif
 };
 
@@ -998,7 +1000,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   using __base_type::operator=;
 
 #if __cplusplus > 201402L
-static constexpr bool is_always_lock_free = ATOMIC_CHAR8_T_LOCK_FREE == 2;
+  static constexpr bool is_always_lock_free
+   = AT

[committed] libstdc++: Remove "no stronger" assertion in compare exchange [PR102177]

2021-09-02 Thread Jonathan Wakely via Gcc-patches
P0418R2 removed some preconditions from std::atomic::compare_exchange_*
but we still enforce them via __glibcxx_assert. This removes those
assertions.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

PR c++/102177
* include/bits/atomic_base.h (__is_valid_cmpexch_failure_order):
New function to check if a memory order is valid for the failure
case of compare exchange operations.
(__atomic_base::compare_exchange_weak): Simplify assertions
by using __is_valid_cmpexch_failure_order.
(__atomic_base::compare_exchange_strong): Likewise.
(__atomic_base::compare_exchange_weak): Likewise.
(__atomic_base::compare_exchange_strong): Likewise.
(__atomic_impl::compare_exchange_weak): Add assertion.
(__atomic_impl::compare_exchange_strong): Likewise.
* include/std/atomic (atomic::compare_exchange_weak): Likewise.
(atomic::compare_exchange_strong): Likewise.

Tested x86_64-linux. Committed to trunk. I think we should backport
this to gcc-11 too, as it was a C++17 change and these assertions fail
for valid (if questionable) C++17 programs.
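
Here is a minimal C++ sketch of the rule that remains after P0418R2. The failure ordering must not be 'release' or 'acq_rel', but it may now be stronger than the success ordering. 'valid_failure_order' and 'try_claim' are hypothetical names. libstdc++'s '__is_valid_cmpexch_failure_order' additionally masks its argument with '__memory_order_mask' before comparing.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper mirroring the remaining rule: a failed
// compare-exchange performs no store, so its ordering cannot be release
// or acq_rel.  Nothing requires it to be "no stronger" than the success
// ordering any more.
constexpr bool
valid_failure_order (std::memory_order m) noexcept
{
  return m != std::memory_order_release && m != std::memory_order_acq_rel;
}

static_assert (valid_failure_order (std::memory_order_seq_cst), "");
static_assert (!valid_failure_order (std::memory_order_acq_rel), "");

// Relaxed on success, acquire on failure: the failure ordering is stronger,
// which the old "no stronger" assertion rejected but C++17 allows.
bool
try_claim (std::atomic<int> &flag)
{
  int expected = 0;
  return flag.compare_exchange_strong (expected, 1,
                                       std::memory_order_relaxed,
                                       std::memory_order_acquire);
}
```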



commit dba1ab212292839572fda60df00965e094a11252
Author: Jonathan Wakely 
Date:   Thu Sep 2 15:29:22 2021

libstdc++: Remove "no stronger" assertion in compare exchange [PR102177]

P0418R2 removed some preconditions from std::atomic::compare_exchange_*
but we still enforce them via __glibcxx_assert. This removes those
assertions.

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

PR c++/102177
* include/bits/atomic_base.h (__is_valid_cmpexch_failure_order):
New function to check if a memory order is valid for the failure
case of compare exchange operations.
(__atomic_base::compare_exchange_weak): Simplify assertions
by using __is_valid_cmpexch_failure_order.
(__atomic_base::compare_exchange_strong): Likewise.
(__atomic_base::compare_exchange_weak): Likewise.
(__atomic_base::compare_exchange_strong): Likewise.
(__atomic_impl::compare_exchange_weak): Add assertion.
(__atomic_impl::compare_exchange_strong): Likewise.
* include/std/atomic (atomic::compare_exchange_weak): Likewise.
(atomic::compare_exchange_strong): Likewise.

diff --git a/libstdc++-v3/include/bits/atomic_base.h b/libstdc++-v3/include/bits/atomic_base.h
index 20cf1343c58..cbe1da6d125 100644
--- a/libstdc++-v3/include/bits/atomic_base.h
+++ b/libstdc++-v3/include/bits/atomic_base.h
@@ -121,6 +121,13 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   | __memory_order_modifier(__m & __memory_order_modifier_mask));
   }
 
+  constexpr bool
+  __is_valid_cmpexch_failure_order(memory_order __m) noexcept
+  {
+return (__m & __memory_order_mask) != memory_order_release
+   && (__m & __memory_order_mask) != memory_order_acq_rel;
+  }
+
   _GLIBCXX_ALWAYS_INLINE void
   atomic_thread_fence(memory_order __m) noexcept
   { __atomic_thread_fence(int(__m)); }
@@ -511,13 +518,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   compare_exchange_weak(__int_type& __i1, __int_type __i2,
memory_order __m1, memory_order __m2) noexcept
   {
-   memory_order __b2 __attribute__ ((__unused__))
- = __m2 & __memory_order_mask;
-   memory_order __b1 __attribute__ ((__unused__))
- = __m1 & __memory_order_mask;
-   __glibcxx_assert(__b2 != memory_order_release);
-   __glibcxx_assert(__b2 != memory_order_acq_rel);
-   __glibcxx_assert(__b2 <= __b1);
+   __glibcxx_assert(__is_valid_cmpexch_failure_order(__m2));
 
return __atomic_compare_exchange_n(&_M_i, &__i1, __i2, 1,
   int(__m1), int(__m2));
@@ -528,13 +529,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
memory_order __m1,
memory_order __m2) volatile noexcept
   {
-   memory_order __b2 __attribute__ ((__unused__))
- = __m2 & __memory_order_mask;
-   memory_order __b1 __attribute__ ((__unused__))
- = __m1 & __memory_order_mask;
-   __glibcxx_assert(__b2 != memory_order_release);
-   __glibcxx_assert(__b2 != memory_order_acq_rel);
-   __glibcxx_assert(__b2 <= __b1);
+   __glibcxx_assert(__is_valid_cmpexch_failure_order(__m2));
 
return __atomic_compare_exchange_n(&_M_i, &__i1, __i2, 1,
   int(__m1), int(__m2));
@@ -560,13 +555,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   compare_exchange_strong(__int_type& __i1, __int_type __i2,
  memory_order __m1, memory_order __m2) noexcept
   {
-   memory_order __b2 __attribute__ ((__unused__))
- = __m2 & __memory_order_mask;
-   memory_order __b1 __attribute__ ((__unused__))
- = __m1 & __memory_order_mask;
-   __glibcxx_assert(__b2 != memory_order_

[committed] libstdc++: Define std::invoke_r for C++23 (P2136R3)

2021-09-02 Thread Jonathan Wakely via Gcc-patches
We already supported this feature as std::__invoke, for internal use.
This just adds a public version of it to .

Internal uses should continue to include  and use
std::__invoke so that they don't need to include all of .

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* include/std/functional (invoke_r): Define.
* include/std/version (__cpp_lib_invoke_r): Define.
* testsuite/20_util/function_objects/invoke/version.cc: Check
for __cpp_lib_invoke_r as well as __cpp_lib_invoke.
* testsuite/20_util/function_objects/invoke/4.cc: New test.

Tested x86_64-linux. Committed to trunk.
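`std::invoke_r` calls the callable as `std::invoke` does and then converts the result to the requested type `R`, or discards it when `R` is `void`. The sketch below approximates that behaviour in C++17 for compilers without C++23 support. `my_invoke_r` and `sq` are hypothetical names. The constraint is expressed with `std::enable_if_t` rather than a requires-clause, and unlike the library version the function is not `constexpr`, because `std::invoke` only became `constexpr` in C++20.

```cpp
#include <functional>
#include <type_traits>
#include <utility>

// Hypothetical C++17 approximation of C++23 std::invoke_r (P2136R3).
template<typename R, typename F, typename... Args>
std::enable_if_t<std::is_invocable_r_v<R, F, Args...>, R>
my_invoke_r(F&& f, Args&&... args)
  noexcept(std::is_nothrow_invocable_r_v<R, F, Args...>)
{
  if constexpr (std::is_void_v<R>)
    // INVOKE the callable and discard whatever it returns.
    std::invoke(std::forward<F>(f), std::forward<Args>(args)...);
  else
    // INVOKE the callable and implicitly convert the result to R.
    return std::invoke(std::forward<F>(f), std::forward<Args>(args)...);
}

// Example callable, like the one in the new test file.
int sq(int i) { return i * i; }
```

For example, `my_invoke_r<char>(sq, 4)` yields `'\x10'` with type `char`, and `my_invoke_r<void>(sq, 2)` just calls `sq`.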

commit 5b73abd1a5f44f72e36bc7aefd423816083291ea
Author: Jonathan Wakely 
Date:   Thu Sep 2 11:54:12 2021

libstdc++: Define std::invoke_r for C++23 (P2136R3)

We already supported this feature as std::__invoke, for internal use.
This just adds a public version of it to .

Internal uses should continue to include  and use
std::__invoke so that they don't need to include all of .

Signed-off-by: Jonathan Wakely 

libstdc++-v3/ChangeLog:

* include/std/functional (invoke_r): Define.
* include/std/version (__cpp_lib_invoke_r): Define.
* testsuite/20_util/function_objects/invoke/version.cc: Check
for __cpp_lib_invoke_r as well as __cpp_lib_invoke.
* testsuite/20_util/function_objects/invoke/4.cc: New test.

diff --git a/libstdc++-v3/include/std/functional 
b/libstdc++-v3/include/std/functional
index 131e6629341..0b257926fd5 100644
--- a/libstdc++-v3/include/std/functional
+++ b/libstdc++-v3/include/std/functional
@@ -96,6 +96,21 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   return std::__invoke(std::forward<_Callable>(__fn),
   std::forward<_Args>(__args)...);
 }
+
+#if __cplusplus > 202002L
+# define __cpp_lib_invoke_r 202106L
+
+  /// Invoke a callable object and convert the result to _Res.
+  template
+requires is_invocable_r_v<_Res, _Callable, _Args...>
+constexpr _Res
+invoke_r(_Callable&& __fn, _Args&&... __args)
+noexcept(is_nothrow_invocable_r_v<_Res, _Callable, _Args...>)
+{
+  return std::__invoke_r<_Res>(std::forward<_Callable>(__fn),
+  std::forward<_Args>(__args)...);
+}
+#endif // C++23
 #endif // C++17
 
   template 202002L
 // c++2b
+#define __cpp_lib_invoke_r 202106L
 #define __cpp_lib_is_scoped_enum 202011L
 #define __cpp_lib_string_contains 202011L
 #define __cpp_lib_to_underlying 202102L
diff --git a/libstdc++-v3/testsuite/20_util/function_objects/invoke/4.cc 
b/libstdc++-v3/testsuite/20_util/function_objects/invoke/4.cc
new file mode 100644
index 000..3ee6711f687
--- /dev/null
+++ b/libstdc++-v3/testsuite/20_util/function_objects/invoke/4.cc
@@ -0,0 +1,59 @@
+// { dg-options "-std=gnu++2b" }
+// { dg-do compile { target c++23 } }
+
+#include 
+
+#ifndef __cpp_lib_invoke_r
+# error Feature-test macro for invoke_r is missing in 
+#elif __cpp_lib_invoke_r < 202106L
+# error Feature-test macro for invoke_r has the wrong value in 
+#endif
+
+constexpr int sq(int i) { return i * i; }
+
+template
+constexpr bool chk(Val&& val, Expected&& exp)
+{
+  return std::is_same_v && val == exp;
+}
+
+void
+test01()
+{
+  static_assert( chk( std::invoke(sq, 2), 4 ) );
+  static_assert( chk( std::invoke_r(sq, 3), 9 ) );
+  static_assert( chk( std::invoke_r(sq, 4), '\x10' ) );
+}
+
+struct abstract {
+  virtual ~abstract() = 0;
+  void operator()() noexcept;
+};
+
+static_assert( noexcept(std::invoke(std::declval())),
+"It should be possible to use abstract types with INVOKE" );
+
+static_assert( noexcept(std::invoke_r(std::declval())),
+"It should be possible to use abstract types with INVOKE" );
+
+struct F {
+  void operator()() &;
+  void operator()() && noexcept;
+  int operator()(int);
+  double* operator()(int, int) noexcept;
+};
+struct D { D(void*); };
+
+static_assert( !noexcept(std::invoke(std::declval())) );
+static_assert( noexcept(std::invoke(std::declval())) );
+static_assert( !noexcept(std::invoke(std::declval(), 1)) );
+static_assert( noexcept(std::invoke(std::declval(), 1, 2)) );
+
+static_assert( !noexcept(std::invoke_r(std::declval())) );
+static_assert( noexcept(std::invoke_r(std::declval())) );
+static_assert( !noexcept(std::invoke_r(std::declval(), 1)) );
+static_assert( !noexcept(std::invoke_r(std::declval(), 1)) );
+static_assert( !noexcept(std::invoke_r(std::declval(), 1)) );
+static_assert( noexcept(std::invoke_r(std::declval(), 1, 2)) );
+static_assert( noexcept(std::invoke_r(std::declval(), 1, 2)) );
+static_assert( !noexcept(std::invoke_r(std::declval(), 1, 2)) );
diff --git a/libstdc++-v3/testsuite/20_util/function_objects/invoke/version.cc 
b/libstdc++-v3/testsuite/20_util/function_objects/invoke/version.cc
index cf1a46a1ada..2dc71aea504 100644
--- a/libstdc++-v3/testsuite/20_util/function_objects/invoke/version.cc
+++ b/libstdc++-v3/testsuite/20_util

Re: [RFC/PATCH] ipa-inline: Add target info into fn summary [PR102059]

2021-09-02 Thread Segher Boessenkool
Hi!

On Wed, Sep 01, 2021 at 03:02:22PM +0800, Kewen.Lin wrote:
> It introduces two target hooks, need_ipa_fn_target_info and
> update_ipa_fn_target_info.  The former allows the target to do
> some preliminary checks and decide whether to collect target-specific
> information for this function.  In some special cases,
> it can predict the analysis result and push it early without
> any scanning.  The latter allows analyze_function_body
> to pass gimple stmts down, just like the fp_expressions handling,
> so the target can do its own tricks.
> 
> To make it simple, this patch uses HOST_WIDE_INT to record the
> flags just like what we use for isa_flags.  For rs6000's HTM
> need, one HOST_WIDE_INT variable is quite enough, but it seems
> good to have one auto_vec for scalability as I noticed some
> targets have more than one HOST_WIDE_INT flag.  For now, this
> target information collection is only for always_inline functions;
> ipa_merge_fn_summary_after_inlining deals with target
> information merging.

These flags can in principle be separate from any flags the target
keeps, so 64 bits will be enough for a long time.  If we want to
architect that better, we should really architect the way all targets
do target flags first.  Let's not go there now :-)

So just one HOST_WIDE_INT, not a stack of them please?

> --- a/gcc/config/rs6000/rs6000-call.c
> +++ b/gcc/config/rs6000/rs6000-call.c
> @@ -13642,6 +13642,17 @@ rs6000_builtin_decl (unsigned code, bool 
> initialize_p ATTRIBUTE_UNUSED)
>return rs6000_builtin_decls[code];
>  }
>  
> +/* Return true if the builtin with CODE has any mask bits set
> +   which are specified by MASK.  */
> +
> +bool
> +rs6000_builtin_mask_set_p (unsigned code, HOST_WIDE_INT mask)
> +{
> +  gcc_assert (code < RS6000_BUILTIN_COUNT);
> +  HOST_WIDE_INT fnmask = rs6000_builtin_info[code].mask;
> +  return fnmask & mask;
> +}

The "_p" does not say that "any bits" part, which is crucial here.  So
name this something like "rs6000_fn_has_any_of_these_mask_bits"?  Yes
the name sucks, because this interface does :-P

Is it useful to have "any" semantics at all?  Otherwise, require this
to be passed just a single bit?

The implicit "!!" (or "!= 0", same thing) that casting to bool does
might be better explicit, too?  A cast to bool changes value so is more
surprising than other casts.
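A minimal sketch of the interface shape being suggested, assuming a plain 64-bit mask table: the name says "any", the comparison with zero is explicit instead of relying on the implicit conversion to `bool`, and the builtin code is taken as an enum so callers need no cast. Every `demo_*`/`DEMO_*` name is hypothetical; the rs6000 back end has its own builtin table and `RS6000_BTM_*` bits.

```cpp
#include <cstdint>

// Hypothetical builtin codes and mask bits, for illustration only.
enum demo_builtin : unsigned
{
  DEMO_BUILTIN_VADD,
  DEMO_BUILTIN_TBEGIN,
  DEMO_BUILTIN_COUNT
};

constexpr std::uint64_t DEMO_BTM_ALTIVEC = std::uint64_t (1) << 0;
constexpr std::uint64_t DEMO_BTM_HTM = std::uint64_t (1) << 1;

// Per-builtin mask, indexed by demo_builtin.
constexpr std::uint64_t demo_builtin_mask[DEMO_BUILTIN_COUNT] = {
  DEMO_BTM_ALTIVEC,  // DEMO_BUILTIN_VADD
  DEMO_BTM_HTM       // DEMO_BUILTIN_TBEGIN
};

// True if builtin CODE has any of the bits in MASK set.  The enum
// parameter avoids casts at call sites, and "!= 0" makes the conversion
// to bool explicit.
constexpr bool
demo_builtin_has_any_mask_bits (demo_builtin code, std::uint64_t mask)
{
  return (demo_builtin_mask[code] & mask) != 0;
}

// For the single-bit alternative: true if MASK has exactly one bit set.
constexpr bool
demo_single_bit_p (std::uint64_t mask)
{
  return mask != 0 && (mask & (mask - 1)) == 0;
}
```

If only single bits were ever passed, a caller-side check such as `demo_single_bit_p` would make the "any" wording unnecessary.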

> +  /* Assume inline asm can use any instruction features.  */
> +  if (gimple_code (stmt) == GIMPLE_ASM)
> +{
> +  info[0] = -1;
> +  return false;
> +}

What is -1 here?  "All options set"?  Does that work?  Reliably?

> +  if (fndecl && fndecl_built_in_p (fndecl, BUILT_IN_MD))
> + {
> +   enum rs6000_builtins fcode =
> + (enum rs6000_builtins) DECL_MD_FUNCTION_CODE (fndecl);
> +   /* HTM bifs definitely exploit HTM insns.  */
> +   if (rs6000_builtin_mask_set_p ((unsigned) fcode, RS6000_BTM_HTM))

Why the cast here?  Please change the parameter type, instead?  It is
fine to use enums specific to our backend in that backend itself :-)

> @@ -1146,6 +1147,16 @@ ipa_dump_fn_summary (FILE *f, struct cgraph_node *node)
> fprintf (f, "  calls:\n");
> dump_ipa_call_summary (f, 4, node, s);
> fprintf (f, "\n");
> +   HOST_WIDE_INT flags;
> +   for (int i = 0; s->target_info.iterate (i, &flags); i++)
> + {
> +   if (i == 0)
> + {
> +   fprintf (f, "  target_info flags:");
> + }

Don't use blocks around single statements please.

> +  /* Only look for target information for inlinable always_inline functions. 
>  */
> +  bool scan_for_target_info =
> +(info->inlinable
> + && DECL_DISREGARD_INLINE_LIMITS (node->decl)
> + && lookup_attribute ("always_inline", DECL_ATTRIBUTES (node->decl))
> + && targetm.target_option.need_ipa_fn_target_info (node->decl,
> +info->target_info));

Don't use unnecessary parens please.


Segher


Re: [patch][version 6] add -ftrivial-auto-var-init and variable attribute "uninitialized" to gcc

2021-09-02 Thread Qing Zhao via Gcc-patches
Sorry for my late reply (I just came back from vacation last night).

> On Aug 23, 2021, at 8:55 AM, Richard Biener  wrote:
> 
 
 
 Looks like for the following code:
 
 3026   if (!reg_lhs)
 3027 {
 3028 /* If this is a VLA or the variable is not in register,
 3029expand to a memset to initialize it.  */
 3030   mark_addressable (lhs);
 3031   tree var_addr = build_fold_addr_expr (lhs);
 3032 
 3033   tree value = (init_type == AUTO_INIT_PATTERN) ?
 3034 build_int_cst (integer_type_node,
 3035INIT_PATTERN_VALUE) :
 3036 integer_zero_node;
 3037   tree m_call = build_call_expr (builtin_decl_implicit 
 (BUILT_IN_MEMSET),
 3038  3, var_addr, value, var_size);
 3039   /* Expand this memset call.  */
 3040   expand_builtin_memset (m_call, NULL_RTX, TYPE_MODE (var_type));
 3041 }
 
 At line 3030, “lhs” could be a SSA_NAME.
 
 My questions are:
 
 1. Could the routine “mark_addressable” and “build_fold_addr_expr” be 
 applied on SSA_NAME?
>>> 
>>> No.
>>> 
 2. Could the routine “expand_builtin_memset” be applied on the memset call 
 whose “DEST” is
   an address expression on SSA_NAME? 
>>> 
>>> No.
>>> 
 3. Within “expand_DEFERRED_INIT”, can I call “expand_builtin_memset” to 
 expand .DEFERRED_INIT?
>>> 
>>> Well, not with "invalid" GENERIC I fear (address of a SSA name).
>>> 
 I suspect that one of the above 3 might be the issue, but not sure which 
 one?
>>> 
>>> All of the above ;)  So while reg_lhs is now precise as to how the
>>> variable will end up (the SSA name will end up as a stack variable in this
>>> case, for whatever reason), expansion via memcpy only works when
>>> working on the RTL representation.  The usual "workaround" (ugh)
>>> is to use make_tree (), so in the !reg_lhs path you'd do
>>> 
>>> /* Get a new GENERIC representation for the RTL.  That's necessary
>>>in case LHS is an SSA name.  */
>>> lhs = make_tree (TREE_TYPE (lhs), tem);
>> 
>> This resolved the issue.
>> 
>> Another question,
>> 
>> Previously, I used
>> 
>>if (TREE_CODE (lhs) == SSA_NAME)
>>   lhs = SSA_NAME_VAR (lhs);
>> 
>> To resolve this issue. The purpose looks the same as “make_tree”: just
>> get a GENERIC tree for the LHS.
>> 
>> Why did you say using SSA_NAME_VAR is broken?  Is it because SSA_NAME_VAR will
>> not always return a valid TREE?
> 
> Because it's simply the wrong entity - I have no idea why that even
> worked.  Ah, cfgexpand associates it with some DECL_RTL for the 
> benefit of debug info.  But it's still wrong.
> 
>> Should I use the following instead?
>> 
>>   if (TREE_CODE (lhs) == SSA_NAME && SSA_NAME_VAR (lhs))
>>     lhs = SSA_NAME_VAR (lhs);
> 
> No.  A SSA_NAME_VAR can have multiple SSA_NAMEs (obviously) and
> they do not necessarily have to be allocated to the same variable
> partition - that is, there's no 1:1 relationship between SSA_NAME
> and stack slot or (pseudo) register.  You want to initialize the
> storage associated with the SSA_NAME in the .DEFERRED_INIT call,
> not some other storage.

Okay, thanks for the explanation. Now I understand.
> 
>>> 
>>> alternatively you could maybe do
>>> 
>>> if (DECL_P (lhs))
>>>   {
>>> +  rtx tem = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
>>> +  reg_lhs = !MEM_P (tem);
>>>   }
>>> else if (TREE_CODE (lhs) == SSA_NAME)
>>>   reg_lhs = true;
>>> else
>>>   reg_lhs = false;
>>> 
>>> thus treat SSA names as register storage always (even if it will end
>>> up on the stack).
>> 
>> My question here: for an SSA_NAME of a complicated structure type, will
>> expanding through memset be better than expand_assignment?
> 
> It depends.  In the end I'd consider it a missed-optimization bug on
> the side that generates worse code - but I do expect cases will exist
> for both.
Okay. I agree.

>  Clearly memset will be worse when dealing with register
> initialization (thus the !MEM_P check) and I expect memset to be OK
> for stack where member-wise init esp. with non-zero might turn up
> worse code.

So, since all SSA_NAMEs are treated as reg_lhs, they will be expanded
through “expand_assignment” as member-wise init, and therefore might generate
worse code for an SSA_NAME that actually lives on the stack.

Qing

> 
> Richard.
> 
>> Qing
>>> 



Re: [PATCH] tree-optimization/102176 - locally compute participating SLP stmts

2021-09-02 Thread Richard Sandiford via Gcc-patches
Richard Biener  writes:
> This performs local re-computation of participating scalar stmts
> in BB vectorization subgraphs to allow precise computation of
> liveness of scalar stmts after vectorization and thus precise
> costing.  This treats all extern defs as live but continues
> to optimistically handle scalar defs that we think we can handle
> by lane-extraction even though that can still fail late during
> code-generation.
>
> Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
>
> Any comments?

LGTM.

> Thanks,
> Richard.
>
> 2021-09-02  Richard Biener  
>
>   PR tree-optimization/102176
>   * tree-vect-slp.c (vect_slp_gather_vectorized_scalar_stmts):
>   New function.
>   (vect_bb_slp_scalar_cost): Use the computed set of
>   vectorized scalar stmts instead of relying on the out-of-date
>   and not accurate PURE_SLP_STMT.
>   (vect_bb_vectorization_profitable_p): Compute the set
>   of vectorized scalar stmts.
> ---
>  gcc/tree-vect-slp.c | 69 +
>  1 file changed, 64 insertions(+), 5 deletions(-)
>
> diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
> index fa3566f3d06..024a1c38a23 100644
> --- a/gcc/tree-vect-slp.c
> +++ b/gcc/tree-vect-slp.c
> @@ -5104,6 +5104,42 @@ vect_bb_partition_graph (bb_vec_info bb_vinfo)
>  }
>  }
>  
> +/* Compute the set of scalar stmts participating in internal and external
> +   nodes.  */
> +
> +static void
> +vect_slp_gather_vectorized_scalar_stmts (vec_info *vinfo, slp_tree node,
> +  hash_set &visited,
> +  hash_set &vstmts,
> +  hash_set &estmts)
> +{
> +  int i;
> +  stmt_vec_info stmt_info;
> +  slp_tree child;
> +
> +  if (visited.add (node))
> +return;

Probably said this before, but it's confusing that hash_set::add
returns true if it did nothing, whereas bitmap_set_bit returns
false if it did nothing.  (The bitmap version seems more natural IMO.)

Not a problem with this patch of course. :-)
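The gather step in the patch is a depth-first walk guarded by a visited set. The toy C++ sketch below reproduces that shape with standard containers; `Node` and `gather` are hypothetical stand-ins, not GCC types. It also shows the convention difference mentioned above: `std::unordered_set::insert` reports `true` in `.second` when the element was newly added, which matches `bitmap_set_bit`, whereas `hash_set::add` returns `true` when the element was already present.

```cpp
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for an SLP tree node (not GCC's _slp_tree).
struct Node
{
  bool internal;                  // like SLP_TREE_DEF_TYPE == vect_internal_def
  std::vector<int> scalar_stmts;  // stmt ids covered by an internal node
  std::vector<int> scalar_defs;   // defining stmt ids of an external node's ops
  std::vector<Node *> children;
};

// Internal nodes add their scalar stmts to VSTMTS and recurse into their
// children; external nodes add the stmts defining their operands to ESTMTS.
void
gather (const Node *node, std::unordered_set<const Node *> &visited,
        std::unordered_set<int> &vstmts, std::unordered_set<int> &estmts)
{
  // insert ().second is true only if NODE was not yet in the set
  // (bitmap_set_bit-style), the opposite sense of hash_set::add.
  if (!visited.insert (node).second)
    return;

  if (node->internal)
    {
      vstmts.insert (node->scalar_stmts.begin (), node->scalar_stmts.end ());
      for (const Node *child : node->children)
        if (child)
          gather (child, visited, vstmts, estmts);
    }
  else
    estmts.insert (node->scalar_defs.begin (), node->scalar_defs.end ());
}
```

A node reachable along several paths is processed once, so shared subtrees contribute their stmts only a single time.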

Richard

> +
> +  if (SLP_TREE_DEF_TYPE (node) == vect_internal_def)
> +{
> +  FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), i, stmt_info)
> + vstmts.add (stmt_info);
> +
> +  FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child)
> + if (child)
> +   vect_slp_gather_vectorized_scalar_stmts (vinfo, child, visited,
> +vstmts, estmts);
> +}
> +  else
> +for (tree def : SLP_TREE_SCALAR_OPS (node))
> +  {
> + stmt_vec_info def_stmt = vinfo->lookup_def (def);
> + if (def_stmt)
> +   estmts.add (def_stmt);
> +  }
> +}
> +
> +
>  /* Compute the scalar cost of the SLP node NODE and its children
> and return it.  Do not account defs that are marked in LIFE and
> update LIFE according to uses of NODE.  */
> @@ -5112,6 +5148,7 @@ static void
>  vect_bb_slp_scalar_cost (vec_info *vinfo,
>slp_tree node, vec *life,
>stmt_vector_for_cost *cost_vec,
> +  hash_set &vectorized_scalar_stmts,
>hash_set &visited)
>  {
>unsigned i;
> @@ -5148,8 +5185,7 @@ vect_bb_slp_scalar_cost (vec_info *vinfo,
> {
>   stmt_vec_info use_stmt_info = vinfo->lookup_stmt (use_stmt);
>   if (!use_stmt_info
> - || !PURE_SLP_STMT
> -   (vect_stmt_to_vectorize (use_stmt_info)))
> + || !vectorized_scalar_stmts.contains (use_stmt_info))
> {
>   (*life)[i] = true;
>   break;
> @@ -5212,7 +5248,7 @@ vect_bb_slp_scalar_cost (vec_info *vinfo,
> subtree_life.safe_splice (*life);
>   }
> vect_bb_slp_scalar_cost (vinfo, child, &subtree_life, cost_vec,
> -visited);
> +vectorized_scalar_stmts, visited);
> subtree_life.truncate (0);
>   }
>  }
> @@ -5254,11 +5290,33 @@ vect_bb_vectorization_profitable_p (bb_vec_info 
> bb_vinfo,
> SLP_INSTANCE_TREE (instance), visited);
>  }
>  
> +  /* Compute the set of scalar stmts we know will go away 'locally' when
> + vectorizing.  This used to be tracked with just PURE_SLP_STMT but that's
> + not accurate for nodes promoted extern late or for scalar stmts that
> + are used both in extern defs and in vectorized defs.  */
> +  hash_set vectorized_scalar_stmts;
> +  hash_set scalar_stmts_in_externs;
> +  hash_set visited;
> +  FOR_EACH_VEC_ELT (slp_instances, i, instance)
> +{
> +  vect_slp_gather_vectorized_scalar_stmts (bb_vinfo,
> +SLP_INSTANCE_TREE (instance),
> +visited,
> +vectorized_scalar_stmts,
> + 

Re: [PATCH v2] [MIPS]: add .module mipsREV to all output asm file

2021-09-02 Thread Jeff Law via Gcc-patches

On 8/28/2021 1:08 AM, Xi Ruoyao wrote:

On Fri, 2021-08-27 at 15:36 -0600, Jeff Law wrote:


It's easier when someone has to debug the code later.
enums show up in debug output by default, while #defines do not.

switch (mips_isa)
    {
  case MIPS_ISA_MIPS1: return "mips1";
  // ...
    }

It looks better, and (maybe) generates better code.  Just my 2 cents
though.

Coding standards would have that as

switch (mips_isa)
    {
    case MIPS_ISA_MIPS_1:
  return "mips1";
    ...
    }

There is some existing code using "case ... : return ..." in one line in
mips.c, so I thought it was standard :(.
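For reference, a self-contained sketch of the enum-plus-switch approach in the GNU layout described above, assuming the `.module mipsN` form suggested by the patch subject. The enumerator names and both helper functions are hypothetical and only echo the names used in this thread.

```cpp
#include <string>

// Hypothetical ISA enumeration, not GCC's actual definitions.
enum mips_isa_level
{
  MIPS_ISA_MIPS1,
  MIPS_ISA_MIPS2,
  MIPS_ISA_MIPS32,
  MIPS_ISA_MIPS64
};

/* Return the ISA name for a ".module" directive, or NULL if unknown.  */
static const char *
mips_isa_name (enum mips_isa_level isa)
{
  switch (isa)
    {
    case MIPS_ISA_MIPS1:
      return "mips1";
    case MIPS_ISA_MIPS2:
      return "mips2";
    case MIPS_ISA_MIPS32:
      return "mips32";
    case MIPS_ISA_MIPS64:
      return "mips64";
    }
  return nullptr;
}

/* Build the directive text, e.g. ".module mips32".  */
static std::string
mips_module_directive (enum mips_isa_level isa)
{
  const char *name = mips_isa_name (isa);
  return name ? std::string (".module ") + name : std::string ();
}
```

Leaving out a `default:` label lets `-Wswitch` flag any enumerator that is added later but not handled.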


Presumably .module is supported by all reasonably modern versions of
GAS?

It's added by the commit in binutils-gdb:


commit 919731affbef19fcad8dddb0a595bb05755cb345
Author: mfortune 
Date:   Tue May 20 13:28:20 2014 +0100

 Add MIPS .module directive


So it should be supported since binutils-2.25.

If we want to support old binutils we'll need something like "-fno-mips-
module-directive" and "--without-mips-module-directive".  My suggestion
is just bumping the binutils requirement for mips*-*-*.

That's old enough that I'm not going to worry about it :-)  By the time
gcc-12 hits the streets, binutils-2.25 will be roughly 8 years old.


Jeff


Re: [PATCH V3 0/6] Initial support for AVX512FP16

2021-09-02 Thread Iain Sandoe via Gcc-patches
Patch below fixes bootstrap,

OK if it passes testing on x86_64 darwin/linux?
(if !OK .. then suggestions welcome)

thanks
Iain

> On 2 Sep 2021, at 16:18, Hongtao Liu  wrote:
> 
> 
> 
> On Thursday, September 2, 2021, Iain Sandoe  wrote:
> Hi Hongtao.
> 
> > On 2 Sep 2021, at 07:06, Hongtao Liu via Gcc-patches 
> >  wrote:
> > 
> > I'm going to check in the first 3 patches which are already approved.
> > 
> >  Update hf soft-fp from glibc.
> >  [i386] Enable _Float16 type for TARGET_SSE2 and above.
> >  [i386] libgcc: Enable hfmode soft-sf/df/xf/tf extensions and
> >truncations.
> 
> Bootstrap on Darwin x86_64 is broken on at least AVX512 and i5 cpus at 
> revision
> r12-3311-g1e6267b33526.
> 
> "fp-machine.h:81:22: error: unknown type name 'TFtype'; did you mean 
> 'HFtype’?”
> 
> any immediate ideas on what might be the issue?
> thanks
>  
> Seems to be related to the part below, which is not changed by my patch, and
> TFtype is defined in quad.h
> 
>  76 /* Define ALIASNAME as a strong alias for NAME.  */
>   77 #if defined __MACH__
>   78 /* Mach-O doesn't support aliasing.  If these functions ever return
>   79anything but CMPtype we need to revisit this... */
>   80 #define strong_alias(name, aliasname) \
>   81   CMPtype aliasname (TFtype a, TFtype b) { return name(a, b); }
>   82 #else
> 
> Would you try to add
> typedef float TFtype __attribute__ ((mode (TF))); 
> Here to see if it fixes the issue.

I don’t think it’s quite as simple as that - this is what I’m testing:

[PATCH] libgcc, soft-float: Fix strong_alias macro use for Darwin.

Darwin does not support strong symbol aliases and a work-
around is provided in sfp-machine.h where a second function
is created that simply calls the original.  However this
needs the arguments to the synthesized function to track
the mode of the original function.

So the fix here is to adjust the macro to allow the mode to
be provided and then to set it as needed before the header
is included.
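The workaround idea can be sketched in portable C++ as a macro that emits a forwarding function instead of a symbol alias, with the argument type supplied per use. All `demo_*`/`DEMO_*` names are hypothetical and `CMPtype` is modelled as `int`. The patch itself keeps the two-argument `strong_alias` and instead derives the type from a `DarwinMode` definition made before `sfp-machine.h` is included; passing the type as a macro argument is a simplification chosen here.

```cpp
// Hypothetical stand-in for CMPtype.
typedef int demo_cmptype;

// Where strong aliases are unavailable, emit a second function that just
// calls the first.  The argument type is a parameter, so the same macro
// works for double, float, or any other operand type.
#define DEMO_FORWARDING_ALIAS(argtype, name, aliasname) \
  demo_cmptype aliasname (argtype a, argtype b) { return name (a, b); }

// Toy comparisons returning 0 when equal and nonzero otherwise.
demo_cmptype demo_eqdf2 (double a, double b) { return a == b ? 0 : 1; }
demo_cmptype demo_eqsf2 (float a, float b) { return a == b ? 0 : 1; }

DEMO_FORWARDING_ALIAS (double, demo_eqdf2, demo_nedf2)
DEMO_FORWARDING_ALIAS (float, demo_eqsf2, demo_nesf2)
```

With a fixed argument type (as with `TFtype` before the fix), the `float` wrapper above would silently take the wrong operand type, which is the mismatch the patch addresses.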

Signed-off-by: Iain Sandoe 

libgcc/ChangeLog:

* config/i386/sfp-exceptions.c (DarwinMode): Set
arbitrarily to DF mode (the strong_alias macro is
not used here).
* config/i386/sfp-machine.h: Adjust strong_alias macro
so that the type can be provided per case.
* soft-fp/eqdf2.c (DarwinMode): Set to DF mode.
* soft-fp/eqhf2.c (DarwinMode): Set to HF mode.
* soft-fp/eqsf2.c (DarwinMode): Set to SF mode.
* soft-fp/eqtf2.c (DarwinMode): Set to TF mode.
* soft-fp/gedf2.c (DarwinMode): Set to DF mode.
* soft-fp/gesf2.c (DarwinMode): Set to SF mode.
* soft-fp/getf2.c (DarwinMode): Set to TF mode.
* soft-fp/ledf2.c (DarwinMode): Set to DF mode.
* soft-fp/lesf2.c (DarwinMode): Set to SF mode.
* soft-fp/letf2.c (DarwinMode): Set to TF mode.
---
 libgcc/config/i386/sfp-exceptions.c | 1 +
 libgcc/config/i386/sfp-machine.h| 9 ++---
 libgcc/soft-fp/eqdf2.c  | 1 +
 libgcc/soft-fp/eqhf2.c  | 1 +
 libgcc/soft-fp/eqsf2.c  | 1 +
 libgcc/soft-fp/eqtf2.c  | 1 +
 libgcc/soft-fp/gedf2.c  | 1 +
 libgcc/soft-fp/gesf2.c  | 1 +
 libgcc/soft-fp/getf2.c  | 1 +
 libgcc/soft-fp/ledf2.c  | 1 +
 libgcc/soft-fp/lesf2.c  | 1 +
 libgcc/soft-fp/letf2.c  | 1 +
 12 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/libgcc/config/i386/sfp-exceptions.c 
b/libgcc/config/i386/sfp-exceptions.c
index edb6a57bb35..7431cf93e33 100644
--- a/libgcc/config/i386/sfp-exceptions.c
+++ b/libgcc/config/i386/sfp-exceptions.c
@@ -22,6 +22,7 @@
  */
 
 #ifndef _SOFT_FLOAT
+#define DarwinMode DF
 #include "sfp-machine.h"
 
 struct fenv
diff --git a/libgcc/config/i386/sfp-machine.h b/libgcc/config/i386/sfp-machine.h
index f15d29d3755..2cb6119b8f8 100644
--- a/libgcc/config/i386/sfp-machine.h
+++ b/libgcc/config/i386/sfp-machine.h
@@ -75,10 +75,13 @@ void __sfp_handle_exceptions (int);
 
 /* Define ALIASNAME as a strong alias for NAME.  */
 #if defined __MACH__
-/* Mach-O doesn't support aliasing.  If these functions ever return
-   anything but CMPtype we need to revisit this... */
+/* Mach-O doesn't support aliasing, so we build a secondary function for
+   the alias - this needs the type of the arguments to be provided as
+   DarwinFtype.  If these functions ever return anything but CMPtype
+   we need to revisit this... */
 #define strong_alias(name, aliasname) \
-  CMPtype aliasname (TFtype a, TFtype b) { return name(a, b); }
+  typedef float DarwinFtype __attribute__((mode (DarwinMode))); \
+  CMPtype aliasname (DarwinFtype a, DarwinFtype b) { return name(a, b); }
 #else
 # define strong_alias(name, aliasname) _strong_alias(name, aliasname)
 # define _strong_alias(name, aliasname) \
diff --git a/libgcc/soft-fp/eqdf2.c b/libgcc/soft-fp/eqdf2.c
index 2a44ee377ce..a3bb664f5f1 100644
--- a/libgcc/soft-fp/eqdf2.c
+++ b/libgcc/soft-fp/eqdf2.c
@@ -28,6 +28,7 @@
Lic

Re: [PATCH] introduce predicate analysis class

2021-09-02 Thread Jeff Law via Gcc-patches

On 8/30/2021 2:03 PM, Martin Sebor via Gcc-patches wrote:

The predicate analysis subset of the tree-ssa-uninit pass isn't
necessarily specific to the detection of uninitialized reads.
Suitably parameterized, the same core logic could be used in
other warning passes to improve their S/N ratio, or issue more
nuanced diagnostics (e.g., when an invalid access cannot be
ruled out but is not certain to happen either, issue
a "may be invalid" type of warning rather than "is invalid").
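To make the "may be invalid" versus "is invalid" distinction concrete, here is a deliberately tiny C++ model, not the pass's actual algorithm. Each predicate is reduced to a closed interval on a single integer variable, the value is initialized only where the definition's predicate holds, and a guarded use is classified by comparing the two intervals. The real analysis works on chains of predicates along control-flow edges; `interval_pred`, `verdict` and `classify_use` are hypothetical names.

```cpp
// Hypothetical toy model: a predicate constrains one integer variable
// to the closed interval [lo, hi].
struct interval_pred
{
  long lo, hi;
};

enum class verdict { safe, maybe_invalid, invalid };

// Classify a use guarded by USE when the value is only initialized on
// paths where DEF holds.
verdict
classify_use (interval_pred def, interval_pred use)
{
  if (def.lo <= use.lo && use.hi <= def.hi)
    return verdict::safe;           // every guarded use sees an init
  if (use.hi < def.lo || def.hi < use.lo)
    return verdict::invalid;        // no guarded use sees an init
  return verdict::maybe_invalid;    // the intervals only partly overlap
}
```

The middle outcome is the one a "may be invalid" warning would correspond to.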

Separating the predicate analysis logic from the uninitialized
pass and exposing a narrow API should also make it easier to
understand and evolve each part independently of the other,
or replace one with a better implementation without modifying
the other.(*)

As the first step in this direction, the attached patch extracts
the predicate analysis logic out of the pass, turns the interface
into public class members, and hides the internals in either
private members or static functions defined in a new source file.
(**)

The changes should have no externally observable effect (i.e.,
should cause no changes in warnings), except on the contents of
the uninitialized dump.  While making the changes I enhanced
the dumps to help me follow the logic.  Turning some previously
free-standing functions into members involved changing their
signatures and adjusting their callers.  While making these
changes I also renamed some of them as well as some variables for
improved clarity.  Finally, I moved declarations of locals
closer to their point of initialization.

Tested on x86_64-linux.  Besides the usual bootstrap/regtest
I also tentatively verified the generality of the new class
interfaces by making use of it in -Warray-bounds.  Beyond that,
I'd like to make use of it in the new gimple-ssa-warn-access pass
and, longer term, any other flow-sensitive warnings that might
benefit from it.

Martin

[*] A review of open -Wuninitialized bugs I did while working
on this project made me aware of a number of opportunities to
improve the analyzer to reduce the number of false positives
-Wmaybe-uninitialized suffers from.

[**] The class isn't fully general and, like the uninit pass,
only works with PHI nodes.  I plan to generalize it to compute
the set of predicates between any two basic blocks.

gcc-predanal.diff

Factor predicate analysis out of tree-ssa-uninit.c into its own module.

gcc/ChangeLog:

* Makefile.in (OBJS): Add gimple-predicate-analysis.o.
* tree-ssa-uninit.c (max_phi_args): Move to gimple-predicate-analysis.
(MASK_SET_BIT, MASK_TEST_BIT, MASK_EMPTY): Same.
(check_defs):
(can_skip_redundant_opnd):
(compute_uninit_opnds_pos): Adjust to namespace change.
(find_pdom): Move to gimple-predicate-analysis.cc.
(find_dom): Same.
(struct uninit_undef_val_t): New.
(is_non_loop_exit_postdominating): Move to gimple-predicate-analysis.cc.
(find_control_equiv_block): Same.
(MAX_NUM_CHAINS, MAX_CHAIN_LEN, MAX_POSTDOM_CHECK): Same.
(MAX_SWITCH_CASES): Same.
(compute_control_dep_chain): Same.
(find_uninit_use): Use predicate analyzer.
(struct pred_info): Move to gimple-predicate-analysis.
(convert_control_dep_chain_into_preds): Same.
(find_predicates): Same.
(collect_phi_def_edges): Same.
(warn_uninitialized_phi): Use predicate analyzer.
(find_def_preds): Move to gimple-predicate-analysis.
(dump_pred_info): Same.
(dump_pred_chain): Same.
(dump_predicates): Same.
(destroy_predicate_vecs): Remove.
(execute_late_warn_uninitialized): New.
(get_cmp_code): Move to gimple-predicate-analysis.
(is_value_included_in): Same.
(value_sat_pred_p): Same.
(find_matching_predicate_in_rest_chains): Same.
(is_use_properly_guarded): Same.
(prune_uninit_phi_opnds): Same.
(find_var_cmp_const): Same.
(use_pred_not_overlap_with_undef_path_pred): Same.
(pred_equal_p): Same.
(is_neq_relop_p): Same.
(is_neq_zero_form_p): Same.
(pred_expr_equal_p): Same.
(is_pred_expr_subset_of): Same.
(is_pred_chain_subset_of): Same.
(is_included_in): Same.
(is_superset_of): Same.
(pred_neg_p): Same.
(simplify_pred): Same.
(simplify_preds_2): Same.
(simplify_preds_3): Same.
(simplify_preds_4): Same.
(simplify_preds): Same.
(push_pred): Same.
(push_to_worklist): Same.
(get_pred_info_from_cmp): Same.
(is_degenerated_phi): Same.
(normalize_one_pred_1): Same.
(normalize_one_pred): Same.
(normalize_one_pred_chain): Same.
(normalize_preds): Same.
(can_one_predicate_be_invalidated_p): Same.
(can_chain_union_be_invalidated_p): Same.
(uninit_uses_cannot_happen): Same.
(pass_late_warn_uninitialized::execute): Define.
   

[PATCH v2 0/4] libffi: Sync with upstream

2021-09-02 Thread H.J. Lu via Gcc-patches
Change in the v2 patch:

1. Disable static trampolines by default.


GCC has maintained a copy of a libffi snapshot from 2009 and cherry-picked fixes
from upstream over the last 10+ years.  In the meantime, libffi upstream
has been changed significantly with new features, bug fixes and new target
support.  Here is a set of patches to sync with libffi 3.4.2 release and
make it easier to sync with libffi upstream:

1. Document how to sync with upstream.
2. Add scripts to help sync with upstream.
3. Sync with libffi 3.4.2. This patch is quite big.  It is available at

https://gitlab.com/x86-gcc/gcc/-/commit/15e80c879c571f79a0e57702848a9df5fba5be2f
4. Integrate libffi build and testsuite with GCC.

H.J. Lu (4):
  libffi: Add HOWTO_MERGE, autogen.sh and merge.sh
  libffi: Sync with libffi 3.4.2
  libffi: Integrate build with GCC
  libffi: Integrate testsuite with GCC testsuite

 libffi/.gitattributes |4 +
 libffi/ChangeLog.libffi   | 7743 -
 libffi/HOWTO_MERGE|   13 +
 libffi/LICENSE|2 +-
 libffi/LICENSE-BUILDTOOLS |  353 +
 libffi/MERGE  |4 +
 libffi/Makefile.am|  135 +-
 libffi/Makefile.in|  219 +-
 libffi/README |  450 -
 libffi/README.md  |  495 ++
 libffi/acinclude.m4   |   38 +-
 libffi/autogen.sh |   11 +
 libffi/configure  |  487 +-
 libffi/configure.ac   |   91 +-
 libffi/configure.host |   97 +-
 libffi/doc/Makefile.am|3 +
 libffi/doc/libffi.texi|  382 +-
 libffi/doc/version.texi   |8 +-
 libffi/fficonfig.h.in |   21 +-
 libffi/generate-darwin-source-and-headers.py  |  143 +-
 libffi/include/Makefile.am|2 +-
 libffi/include/Makefile.in|3 +-
 libffi/include/ffi.h.in   |  213 +-
 libffi/include/ffi_cfi.h  |   21 +
 libffi/include/ffi_common.h   |   50 +-
 libffi/include/tramp.h|   45 +
 libffi/libffi.map.in  |   24 +-
 libffi/libffi.pc.in   |2 +-
 libffi/libffi.xcodeproj/project.pbxproj   |  530 +-
 libffi/libtool-version|   25 +-
 libffi/man/Makefile.in|1 +
 libffi/mdate-sh   |   39 +-
 libffi/merge.sh   |   51 +
 libffi/msvcc.sh   |  134 +-
 libffi/src/aarch64/ffi.c  |  536 +-
 libffi/src/aarch64/ffitarget.h|   35 +-
 libffi/src/aarch64/internal.h |   33 +
 libffi/src/aarch64/sysv.S |  189 +-
 libffi/src/aarch64/win64_armasm.S |  506 ++
 libffi/src/alpha/ffi.c|6 +-
 libffi/src/arc/ffi.c  |6 +-
 libffi/src/arm/ffi.c  |  380 +-
 libffi/src/arm/ffitarget.h|   24 +-
 libffi/src/arm/internal.h |   10 +
 libffi/src/arm/sysv.S |  304 +-
 libffi/src/arm/sysv_msvc_arm32.S  |  311 +
 libffi/src/closures.c |  489 +-
 libffi/src/cris/ffi.c |4 +-
 libffi/src/csky/ffi.c |  395 +
 libffi/src/csky/ffitarget.h   |   63 +
 libffi/src/csky/sysv.S|  371 +
 libffi/src/dlmalloc.c |7 +-
 libffi/src/frv/ffi.c  |4 +-
 libffi/src/ia64/ffi.c |   30 +-
 libffi/src/ia64/ffitarget.h   |3 +-
 libffi/src/ia64/unix.S|9 +-
 libffi/src/java_raw_api.c |6 +-
 libffi/src/kvx/asm.h  |5 +
 libffi/src/kvx/ffi.c  |  273 +
 libffi/src/kvx/ffitarget.h|   75 +
 libffi/src/kvx/sysv.S |  127 +
 libffi/src/m32r/ffi.c |2 +-
 libffi/src/m68k/ffi.c |4 +-
 libffi/src/m68k/sysv.S|   29 +-
 libffi/src/m88k/ffi.c |8 +-
 libffi/src/metag/ffi.c|   14 +-
 libffi/src/microblaze/ffi.c   |   10 +-
 libffi/src/mips/ffi.c |  146 +-
 libffi/src/mips/ffitarget.h   |   23 +-
 libffi/src/mips/n32.S |  151 +-
 libffi/src/mips/o32.S |  177 +-
 libffi/src/moxie/eabi.S   |2 +-
 libffi/src/moxie/ffi

[PATCH v2 2/4] libffi: Sync with libffi 3.4.2

2021-09-02 Thread H.J. Lu via Gcc-patches
Merged commit: f9ea41683444ebe11cfa45b05223899764df28fb
---
 libffi/.gitattributes | 4 +
 libffi/ChangeLog.libffi   |  7743 +-
 libffi/LICENSE| 2 +-
 libffi/LICENSE-BUILDTOOLS |   353 +
 libffi/MERGE  | 4 +
 libffi/Makefile.am|   249 +-
 libffi/Makefile.in|  1944 --
 libffi/README |   450 -
 libffi/README.md  |   495 +
 libffi/acinclude.m4   |38 +-
 libffi/aclocal.m4 |  1202 -
 libffi/configure  | 19411 
 libffi/configure.ac   |   199 +-
 libffi/configure.host |97 +-
 libffi/doc/Makefile.am| 3 +
 libffi/doc/libffi.texi|   382 +-
 libffi/doc/version.texi   | 8 +-
 libffi/fficonfig.h.in |   208 -
 libffi/generate-darwin-source-and-headers.py  |   143 +-
 libffi/include/Makefile.am| 8 +-
 libffi/include/Makefile.in|   565 -
 libffi/include/ffi.h.in   |   213 +-
 libffi/include/ffi_cfi.h  |21 +
 libffi/include/ffi_common.h   |50 +-
 libffi/include/tramp.h|45 +
 libffi/libffi.map.in  |24 +-
 libffi/libffi.pc.in   | 2 +-
 libffi/libffi.xcodeproj/project.pbxproj   |   530 +-
 libffi/libtool-version|25 +-
 libffi/man/Makefile.in|   515 -
 libffi/mdate-sh   |   205 -
 libffi/msvcc.sh   |   134 +-
 libffi/src/aarch64/ffi.c  |   536 +-
 libffi/src/aarch64/ffitarget.h|35 +-
 libffi/src/aarch64/internal.h |33 +
 libffi/src/aarch64/sysv.S |   189 +-
 libffi/src/aarch64/win64_armasm.S |   506 +
 libffi/src/alpha/ffi.c| 6 +-
 libffi/src/arc/ffi.c  | 6 +-
 libffi/src/arm/ffi.c  |   380 +-
 libffi/src/arm/ffitarget.h|24 +-
 libffi/src/arm/internal.h |10 +
 libffi/src/arm/sysv.S |   304 +-
 libffi/src/arm/sysv_msvc_arm32.S  |   311 +
 libffi/src/closures.c |   489 +-
 libffi/src/cris/ffi.c | 4 +-
 libffi/src/csky/ffi.c |   395 +
 libffi/src/csky/ffitarget.h   |63 +
 libffi/src/csky/sysv.S|   371 +
 libffi/src/dlmalloc.c | 7 +-
 libffi/src/frv/ffi.c  | 4 +-
 libffi/src/ia64/ffi.c |30 +-
 libffi/src/ia64/ffitarget.h   | 3 +-
 libffi/src/ia64/unix.S| 9 +-
 libffi/src/java_raw_api.c | 6 +-
 libffi/src/kvx/asm.h  | 5 +
 libffi/src/kvx/ffi.c  |   273 +
 libffi/src/kvx/ffitarget.h|75 +
 libffi/src/kvx/sysv.S |   127 +
 libffi/src/m32r/ffi.c | 2 +-
 libffi/src/m68k/ffi.c | 4 +-
 libffi/src/m68k/sysv.S|29 +-
 libffi/src/m88k/ffi.c | 8 +-
 libffi/src/metag/ffi.c|14 +-
 libffi/src/microblaze/ffi.c   |10 +-
 libffi/src/mips/ffi.c |   146 +-
 libffi/src/mips/ffitarget.h   |23 +-
 libffi/src/mips/n32.S |   151 +-
 libffi/src/mips/o32.S |   177 +-
 libffi/src/moxie/eabi.S   | 2 +-
 libffi/src/moxie/ffi.c|27 +-
 libffi/src/nios2/ffi.c| 4 +-
 libffi/src/pa/ffi.c   |   216 +-
 libffi/src/pa/ffitarget.h |11 +-
 libffi/src/pa/hpux32.S|76 +-
 libffi/src/pa/linux.S |   160 +-
 libffi/src/powerpc/asm.h  | 4 +-
 libffi/src/powerpc/darwin_closure.S   | 6 +-
 libffi/src/powerpc/ffi.c  |10 +-
 libffi/src/powerpc/ffi_darwin.c   |48 +-
 libffi/src/powerpc/ffi_linux64.c  |   247 +-
 libffi/src/powerpc/ffi_powerpc.h  |25 +-
 libffi/src/powerpc/ffitarget.h|14 +-
 libffi/src/powerpc/linux64.S  |   111 +-
 libffi/src/powerpc/linux64_closure.S  |70 +-
 libffi/src/pow

[PATCH v2 3/4] libffi: Integrate build with GCC

2021-09-02 Thread H.J. Lu via Gcc-patches
1. Integrate with GCC build.
2. Disable static trampolines by default.
3. Support multilib.

* Makefile.am (AUTOMAKE_OPTIONS): Add info-in-builddir.
(ACLOCAL_AMFLAGS): Set to -I .. -I ../config.
(SUBDIRS): Don't add doc.
(TEXINFO_TEX): New.
(MAKEINFOFLAGS): Likewise.
(info_TEXINFOS): Likewise.
(STAMP_GENINSRC): Likewise.
(STAMP_BUILD_INFO): Likewise.
(all-local): Likewise.
(stamp-geninsrc): Likewise.
(doc/libffi.info): Likewise.
(stamp-build-info:): Likewise.
(CLEANFILES): Likewise.
(MAINTAINERCLEANFILES): Likewise.
(AM_MAKEFLAGS): Likewise.
(all-recursive): Likewise.
(install-recursive): Likewise.
(mostlyclean-recursive): Likewise.
(clean-recursive): Likewise.
(distclean-recursive): Likewise.
(maintainer-clean-recursive): Likewise.
(LTLDFLAGS): Replace libtool-ldflags with ../libtool-ldflags.
(AM_CFLAGS): Add -g -fexceptions.
(libffi.map-sun): Replace make_sunver.pl with
../contrib/make_sunver.pl.
(dist-hook): Removed.
Include $(top_srcdir)/../multilib.am.
* configure.ac: Add AM_ENABLE_MULTILIB.
Remove the frv*-elf check.
(AX_ENABLE_BUILDDIR): Removed.
(AM_INIT_AUTOMAKE): Add [no-dist].
Add --enable-generated-files-in-srcdir.
(C_CONFIG_MACRO_DIR): Removed.
(AX_COMPILER_VENDOR): Likewise.
(AX_CC_MAXOPT): Likewise.
(AX_CFLAGS_WARN_ALL): Likewise.
Remove the GCC check.
(SYMBOL_UNDERSCORE): Removed.
(AX_CHECK_COMPILE_FLAG): Likewise.
Remove --disable-docs.
(ACX_CHECK_PROG_VER): Check makeinfo.
(BUILD_DOCS): Updated.
(exec-static-tramp): Don't enable use of static exec trampolines
by default.
Remove --disable-multi-os-directory.
(GCC_WITH_TOOLEXECLIBDIR): New.
Support cross host.
Support --enable-multilib.
* include/Makefile.am (nodist_include_HEADERS): Removed.
(gcc_version): New.
(toollibffidir): Likewise.
(toollibffi_HEADERS): Likewise.
* Makefile.in: Regenerate.
(GCC_BASE_VER): New.
(AC_CONFIG_FILES): Remove doc/Makefile.
(AC_CONFIG_LINKS): New.
* aclocal.m4: Likewise.
* configure: Likewise.
* fficonfig.h.in: Likewise.
* mdate-sh: Likewise.
* include/Makefile.in: Likewise.
* man/Makefile.in: Likewise.
* testsuite/Makefile.in: Likewise.
---
 libffi/Makefile.am   |   116 +-
 libffi/Makefile.in   |  1963 
 libffi/aclocal.m4|  1202 ++
 libffi/configure | 19584 +
 libffi/configure.ac  |   130 +-
 libffi/fficonfig.h.in|   227 +
 libffi/include/Makefile.am   | 6 +-
 libffi/include/Makefile.in   |   566 +
 libffi/man/Makefile.in   |   516 +
 libffi/mdate-sh  |   224 +
 libffi/testsuite/Makefile.in |   606 +
 11 files changed, 25054 insertions(+), 86 deletions(-)
 create mode 100644 libffi/Makefile.in
 create mode 100644 libffi/aclocal.m4
 create mode 100755 libffi/configure
 create mode 100644 libffi/fficonfig.h.in
 create mode 100644 libffi/include/Makefile.in
 create mode 100644 libffi/man/Makefile.in
 create mode 100755 libffi/mdate-sh
 create mode 100644 libffi/testsuite/Makefile.in

diff --git a/libffi/Makefile.am b/libffi/Makefile.am
index 1b18198ad18..02e36176c67 100644
--- a/libffi/Makefile.am
+++ b/libffi/Makefile.am
@@ -1,18 +1,10 @@
 ## Process this with automake to create Makefile.in
 
-AUTOMAKE_OPTIONS = foreign subdir-objects
+AUTOMAKE_OPTIONS = foreign subdir-objects info-in-builddir
 
-ACLOCAL_AMFLAGS = -I m4
+ACLOCAL_AMFLAGS = -I .. -I ../config
 
 SUBDIRS = include testsuite man
-if BUILD_DOCS
-## This hack is needed because it doesn't seem possible to make a
-## conditional info_TEXINFOS in Automake.  At least Automake 1.14
-## either gives errors -- if this attempted in the most
-## straightforward way -- or simply unconditionally tries to build the
-## info file.
-SUBDIRS += doc
-endif
 
 EXTRA_DIST = LICENSE ChangeLog.old \
m4/libtool.m4 m4/lt~obsolete.m4 \
@@ -26,6 +18,90 @@ EXTRA_DIST = LICENSE ChangeLog.old \
 # local.exp is generated by configure
 DISTCLEANFILES = local.exp
 
+# Automake Documentation:
+# If your package has Texinfo files in many directories, you can use the
+# variable TEXINFO_TEX to tell Automake where to find the canonical
+# `texinfo.tex' for your package. The value of this variable should be
+# the relative path from the current `Makefile.am' to `texinfo.tex'.
+TEXINFO_TEX   = ../gcc/doc/include/texinfo.tex
+
+# Defines info, dvi, pdf and html targets
+MAKEINFOFLAGS = -I $(srcdir)/../gcc/doc/include
+info_TEXINFOS = doc/libffi.texi
+
+# AM_CONDITIONAL on

[PATCH v2 1/4] libffi: Add HOWTO_MERGE, autogen.sh and merge.sh

2021-09-02 Thread H.J. Lu via Gcc-patches
Add scripts for syncing with libffi upstream:

1. Clone libffi repo.
2. Checkout the specific commit.
3. Remove the unused files.
4. Add new files and remove old files if needed.

* HOWTO_MERGE: New file.
* autogen.sh: Likewise.
* merge.sh: Likewise.
---
 libffi/HOWTO_MERGE | 13 
 libffi/autogen.sh  | 11 ++
 libffi/merge.sh| 51 ++
 3 files changed, 75 insertions(+)
 create mode 100644 libffi/HOWTO_MERGE
 create mode 100755 libffi/autogen.sh
 create mode 100755 libffi/merge.sh

diff --git a/libffi/HOWTO_MERGE b/libffi/HOWTO_MERGE
new file mode 100644
index 000..5b92b10c15c
--- /dev/null
+++ b/libffi/HOWTO_MERGE
@@ -0,0 +1,13 @@
+In general, merging process should not be very difficult, but we need to
+track GCC-specific patches carefully.  Here is a general list of actions
+required to perform the merge:
+
+* Checkout recent GCC tree.
+* Run merge.sh script from the libffi directory.
+* Add new files and remove old files if needed.
+* Apply all needed GCC-specific patches to libffi (note that some of
+  them might be already included to upstream).  The list of these patches
+  is stored into LOCAL_PATCHES file.  May need to re-run autogen.sh to
+  regenerate configure and Makefile.in files.
+* Send your patches for review to GCC Patches Mailing List (gcc-patches@gcc.gnu.org).
+* Update LOCAL_PATCHES file when you've committed the whole patch set with new revisions numbers.
diff --git a/libffi/autogen.sh b/libffi/autogen.sh
new file mode 100755
index 000..95bfc389faf
--- /dev/null
+++ b/libffi/autogen.sh
@@ -0,0 +1,11 @@
+#!/bin/sh
+#exec autoreconf -v -i
+
+rm -rf autom4te.cache
+aclocal  -I .. -I ../config
+autoheader -I .. -I ../config
+autoconf
+automake --foreign --add-missing --copy Makefile
+automake --foreign include/Makefile
+automake --foreign man/Makefile
+automake --foreign testsuite/Makefile
diff --git a/libffi/merge.sh b/libffi/merge.sh
new file mode 100755
index 000..b36fbb92185
--- /dev/null
+++ b/libffi/merge.sh
@@ -0,0 +1,51 @@
+#!/bin/bash
+
+# FIXME: do we need a license (or whatever else) header here?
+
+# This script merges libffi sources from upstream.
+
+# Default to the tip of master branch.
+commit=${1-master}
+
+fatal() {
+  echo "$1"
+  exit 1;
+}
+
+get_upstream() {
+  rm -rf upstream
+  git clone https://github.com/libffi/libffi.git upstream
+  pushd upstream
+  git checkout $commit || fatal "Failed to checkout $commit"
+  popd
+}
+
+get_current_rev() {
+  cd upstream
+  git rev-parse HEAD
+}
+
+pwd | grep 'libffi$' || \
+  fatal "Run this script from the libffi directory"
+get_upstream
+CUR_REV=$(get_current_rev)
+echo Current upstream revision: $CUR_REV
+
+# Remove the unused files.
+pushd upstream
+rm -rf ChangeLog.old .appveyor* .ci .github .gitignore .travis* \
+   config.guess config.sub libtool-ldflags m4 make_sunver.pl \
+   msvc_build
+rm -rf .git autogen.sh
+cp -a . ..
+popd
+
+rm -rf upstream
+
+# Update the MERGE file.
+cat << EOF > MERGE
+$CUR_REV
+
+The first line of this file holds the git revision number of the
+last merge done from the master library sources.
+EOF
-- 
2.31.1



[PATCH v2 4/4] libffi: Integrate testsuite with GCC testsuite

2021-09-02 Thread H.J. Lu via Gcc-patches
* testsuite/lib/libffi.exp (load_gcc_lib): Load library from GCC
testsuite.
Load target-supports.exp and target-supports-dg.exp.
(libffi-init): Use libraries in GCC build tree.
(libffi_target_compile): Link with -lstdc++ for C++ sources.
---
 libffi/testsuite/lib/libffi.exp | 24 +++-
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/libffi/testsuite/lib/libffi.exp b/libffi/testsuite/lib/libffi.exp
index 4f4dd48d2c6..d8bf6269a36 100644
--- a/libffi/testsuite/lib/libffi.exp
+++ b/libffi/testsuite/lib/libffi.exp
@@ -15,12 +15,15 @@
 # .
 
 proc load_gcc_lib { filename } {
-global srcdir
-load_file $srcdir/lib/$filename
+global srcdir loaded_libs
+load_file $srcdir/../../gcc/testsuite/lib/$filename
+set loaded_libs($filename) ""
 }
 
 load_lib dg.exp
 load_lib libgloss.exp
+load_gcc_lib target-supports.exp
+load_gcc_lib target-supports-dg.exp
 load_gcc_lib target-libpath.exp
 load_gcc_lib wrapper.exp
 
@@ -277,6 +280,7 @@ proc libffi-init { args } {
 global srcdir
 global blddirffi
 global objdir
+global blddircxx
 global TOOL_OPTIONS
 global tool
 global libffi_include
@@ -285,13 +289,13 @@ proc libffi-init { args } {
 global ld_library_path
 global compiler_vendor
 
-if ![info exists blddirffi] {
-   set blddirffi [pwd]/..
-}
-
+set blddirffi [lookfor_file [get_multilibs] libffi]
 verbose "libffi $blddirffi"
+set blddircxx [lookfor_file [get_multilibs] libstdc++-v3]
+verbose "libstdc++ $blddircxx"
+
+set compiler_vendor "gnu"
 
-# Which compiler are we building with?
 if { [string match $compiler_vendor "gnu"] } {
 set gccdir [lookfor_file $tool_root_dir gcc/libgcc.a]
 if {$gccdir != ""} {
@@ -320,6 +324,8 @@ proc libffi-init { args } {
 
 # add the library path for libffi.
 append ld_library_path ":${blddirffi}/.libs"
+# add the library path for libstdc++ as well.
+append ld_library_path ":${blddircxx}/src/.libs"
 
 verbose "ld_library_path: $ld_library_path"
 
@@ -332,6 +338,7 @@ proc libffi-init { args } {
 if { $libffi_dir != "" } {
set libffi_dir [file dirname ${libffi_dir}]
set libffi_link_flags "-L${libffi_dir}/.libs"
+   lappend libffi_link_flags "-L${blddircxx}/src/.libs"
 }
 
 set_ld_library_path_env_vars
@@ -398,9 +405,8 @@ proc libffi_target_compile { source dest type options } {
lappend options "libs= -lpthread"
 }
 
-# this may be required for g++, but just confused clang.
 if { [string match "*.cc" $source] } {
-lappend options "c++"
+   lappend options "ldflags=-lstdc++"
 }
 
 if { [string match "arc*-*-linux*" $target_triplet] } {
-- 
2.31.1



Re: [PATCH V3 0/6] Initial support for AVX512FP16

2021-09-02 Thread Hongtao Liu via Gcc-patches
On Thursday, September 2, 2021, H.J. Lu  wrote:

> On Wed, Sep 1, 2021 at 11:00 PM Hongtao Liu  wrote:
> >
> > I'm going to check in the first 3 patches which are already approved.
> >
> >   Update hf soft-fp from glibc.
> >   [i386] Enable _Float16 type for TARGET_SSE2 and above.
> >   [i386] libgcc: Enable hfmode soft-sf/df/xf/tf extensions and
> > truncations.
> >
> > On Mon, Aug 2, 2021 at 2:31 PM liuhongt  wrote:
> > >
> > > Update from v2:
> > >
> > > 1. Support -fexcess-precision=16 which will enable
> > > FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 when backend supports _Float16.
> > > 2. Update ix86_get_excess_precision, so -fexcess-precision=standard
> > > should not do anything different from -fexcess-precision=fast
> > >  regarding _Float16.
> > > 3. Avoiding macroization of HFmode patterns.
> > > 4. Allow (subreg:SI (reg:HF)).
> > > 5. Update documents corresponding exactly to the code changes in
> > > the same patch.
> > > 6. According to 32bit abi, pass vector _Float16 by sse registers
> > > for 32-bit mode, not stack.
> > >
> > > Guo, Xuepeng (1):
> > >   AVX512FP16: Initial support for AVX512FP16 feature and scalar _Float16
> > > instructions.
> > >
> > > liuhongt (5):
> > >   Update hf soft-fp from glibc.
> > >   [i386] Enable _Float16 type for TARGET_SSE2 and above.
> > >   [i386] libgcc: Enable hfmode soft-sf/df/xf/tf extensions and
> > > truncations.
> > >   Support -fexcess-precision=16 which will enable
> > > FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 when backend supports _Float16.
> > >   AVX512FP16: Support vector init/broadcast/set/extract for FP16.
> > >
>
> I got
>
> FAIL: gcc.dg/torture/fp-int-convert-float16.c   -Os  execution test
> FAIL: gcc.dg/torture/fp-int-convert-float16-timode.c   -Os  execution test
>
> with -m32:

I guess it hit some excess precision issue with the x87 FPU.

>
> [hjl@gnu-skx-1 gcc]$ ./xgcc -B./ -m32 /export/gnu/import/git/gitlab/x86-gcc/gcc/testsuite/gcc.dg/torture/fp-int-convert-float16.c -m32 -Os -march=i686 -mfpmath=sse -msse2
> [hjl@gnu-skx-1 gcc]$ ./a.out
> Aborted (core dumped)
> [hjl@gnu-skx-1 gcc]$
>
> H.J.
>


-- 
BR,
Hongtao


Re: [PATCH] c++, abi: Set DECL_FIELD_ABI_IGNORED on C++ zero width bitfields [PR102024]

2021-09-02 Thread Jason Merrill via Gcc-patches

On 9/2/21 4:49 AM, Jakub Jelinek wrote:

On Thu, Sep 02, 2021 at 12:19:03AM +0200, Jakub Jelinek via Gcc-patches wrote:

Ah, thanks for the archeology.  So it indeed seems that, in theory, there was
an ABI change for C between GCC 3.4 and 4.0 on some of the targets, like
x86_64, that already existed in the 3.2-ish era.  I actually couldn't see a
difference between C and C++ in that era on x86_64, e.g. 3.3 C and C++ both
work as C and C++ do now, as if the zero-width bitfields aren't removed.
Before the PR42217 fix the C++ FE wasn't really removing the zero-width
bitfields properly, so it is actually a 4.5/4.4-ish regression for C++.


Ok, verified even the C FE used to suffer from the same issue as PR42217 and
didn't actually ever remove any zero width bitfields, while grokfield put
the field width into DECL_INITIAL, then finish_struct did:
  for (x = fieldlist; x; x = TREE_CHAIN (x))
    {
...
      if (DECL_INITIAL (x))
        {
          unsigned HOST_WIDE_INT width = tree_low_cst (DECL_INITIAL (x), 1);
          DECL_SIZE (x) = bitsize_int (width);
          DECL_BIT_FIELD (x) = 1;
          SET_DECL_C_BIT_FIELD (x);
        }

      DECL_INITIAL (x) = 0;
...
    }
and only a few lines later it did:
  /* Delete all zero-width bit-fields from the fieldlist.  */
  {
    tree *fieldlistp = &fieldlist;
    while (*fieldlistp)
      if (TREE_CODE (*fieldlistp) == FIELD_DECL && DECL_INITIAL (*fieldlistp))
        *fieldlistp = TREE_CHAIN (*fieldlistp);
      else
        fieldlistp = &TREE_CHAIN (*fieldlistp);
  }
but DECL_INITIAL was already guaranteed to be NULL here.  PR42217 actually
was the same problem as PR102019, but was fixed by actually making the
zero-width bit-field removal work when it never worked before.
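To make the dead-code problem concrete, here is a minimal C model of the two quoted loops. It is a hypothetical sketch, not GCC code: struct field_sketch and its members merely stand in for FIELD_DECL, DECL_INITIAL, DECL_BIT_FIELD and DECL_SIZE. The first pass clears the marker that the second pass tests, so the deletion loop can never fire; a predicate based on the recorded bit-field width does remove the zero-width field.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified model of a FIELD_DECL chain; the member names
   are illustrative and are not GCC's real tree accessors.  */
struct field_sketch
{
  struct field_sketch *next;
  int has_initial;  /* stands in for DECL_INITIAL (x) != NULL */
  int initial;      /* the width stored in DECL_INITIAL, if any */
  int is_bit_field; /* stands in for DECL_BIT_FIELD (x) */
  int width;        /* stands in for DECL_SIZE (x) of a bit-field */
};

/* Mirrors the first loop: turn the stored width into bit-field
   information, then clear the DECL_INITIAL marker on every field.  */
static void
finish_fields (struct field_sketch *list)
{
  for (struct field_sketch *x = list; x; x = x->next)
    {
      if (x->has_initial)
        {
          x->width = x->initial;
          x->is_bit_field = 1;
        }
      x->has_initial = 0;
    }
}

/* Mirrors the second loop with the pointer-to-pointer deletion idiom.
   It tests the marker that finish_fields just cleared, so it never
   deletes anything.  */
static struct field_sketch *
delete_by_initial (struct field_sketch *list)
{
  struct field_sketch **p = &list;
  while (*p)
    if ((*p)->has_initial)
      *p = (*p)->next;
    else
      p = &(*p)->next;
  return list;
}

/* Same idiom with a predicate that still holds after finish_fields:
   unlink bit-fields whose recorded width is zero.  */
static struct field_sketch *
delete_zero_width (struct field_sketch *list)
{
  struct field_sketch **p = &list;
  while (*p)
    if ((*p)->is_bit_field && (*p)->width == 0)
      *p = (*p)->next;
    else
      p = &(*p)->next;
  return list;
}

/* Number of fields left in the chain.  */
static size_t
count_fields (const struct field_sketch *list)
{
  size_t n = 0;
  for (; list; list = list->next)
    n++;
  return n;
}
```

For a chain "a, zero-width bit-field, b", delete_by_initial leaves all three fields after finish_fields has run, while delete_zero_width leaves two.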


Ah, oops :/


Here is an updated patch that instead uses separate macros for the
previous DECL_FIELD_ABI_IGNORED meaning and for the C++ zero-width
bitfields.  The backends don't need any changes in that case (until they
want to actually use the new macro for the -Wpsabi or ABI decisions):


LGTM.


2021-09-02  Jakub Jelinek  

PR target/102024
gcc/
* tree.h (DECL_FIELD_ABI_IGNORED): Changed into rvalue only macro
that is false if DECL_BIT_FIELD.
(SET_DECL_FIELD_ABI_IGNORED, DECL_FIELD_CXX_ZERO_WIDTH_BIT_FIELD,
SET_DECL_FIELD_CXX_ZERO_WIDTH_BIT_FIELD): Define.
* tree-streamer-out.c (pack_ts_decl_common_value_fields): For
DECL_BIT_FIELD stream DECL_FIELD_CXX_ZERO_WIDTH_BIT_FIELD instead
of DECL_FIELD_ABI_IGNORED.
* tree-streamer-in.c (unpack_ts_decl_common_value_fields): Use
SET_DECL_FIELD_ABI_IGNORED instead of writing to
DECL_FIELD_ABI_IGNORED and for DECL_BIT_FIELD use
SET_DECL_FIELD_CXX_ZERO_WIDTH_BIT_FIELD instead.
* lto-streamer-out.c (hash_tree): For DECL_BIT_FIELD hash
DECL_FIELD_CXX_ZERO_WIDTH_BIT_FIELD instead of DECL_FIELD_ABI_IGNORED.
gcc/cp/
* class.c (build_base_field): Use SET_DECL_FIELD_ABI_IGNORED
instead of writing to DECL_FIELD_ABI_IGNORED.
(layout_class_type): Likewise.  In the place where zero-width
bitfields used to be removed, use
SET_DECL_FIELD_CXX_ZERO_WIDTH_BIT_FIELD on those fields instead.
gcc/lto/
* lto-common.c (compare_tree_sccs_1): Also compare
DECL_FIELD_CXX_ZERO_WIDTH_BIT_FIELD values.
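The macro split described in this ChangeLog stores two meanings in one flag bit (decl_flag_0) and uses DECL_BIT_FIELD (decl_flag_1) to tell them apart. Below is a minimal C sketch of that pattern, assuming a hypothetical struct field_decl_sketch in place of the real tree node. The accessors for the C++ zero-width bit-field meaning are modelled on the ChangeLog only, since their actual definitions are cut off in the quoted diff.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a FIELD_DECL: only the two flag bits
   discussed in the patch are modelled.  */
struct field_decl_sketch
{
  unsigned bit_field : 1;   /* role of decl_flag_1 (DECL_BIT_FIELD) */
  unsigned shared_flag : 1; /* role of decl_flag_0, two meanings */
};

/* Meaning 1: "ignored for ABI decisions", only for non-bit-fields.  */
static bool
field_abi_ignored (const struct field_decl_sketch *f)
{
  return !f->bit_field && f->shared_flag;
}

static void
set_field_abi_ignored (struct field_decl_sketch *f, bool val)
{
  assert (!f->bit_field);  /* like the gcc_checking_assert in the setter */
  f->shared_flag = val;
}

/* Meaning 2: "C++ zero-width bit-field", only for bit-fields.  */
static bool
field_cxx_zero_width_bit_field (const struct field_decl_sketch *f)
{
  return f->bit_field && f->shared_flag;
}

static void
set_field_cxx_zero_width_bit_field (struct field_decl_sketch *f, bool val)
{
  assert (f->bit_field);
  f->shared_flag = val;
}
```

Reading the bit through the accessor for the other meaning simply yields false, which is why existing backend users of DECL_FIELD_ABI_IGNORED need no changes, while writing through the wrong setter trips the assertion.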

--- gcc/tree.h.jj   2021-09-01 21:30:30.551306387 +0200
+++ gcc/tree.h  2021-09-02 10:34:43.559851006 +0200
@@ -2852,16 +2852,34 @@ extern void decl_value_expr_insert (tree
  /* In a FIELD_DECL, indicates this field should be bit-packed.  */
  #define DECL_PACKED(NODE) (FIELD_DECL_CHECK (NODE)->base.u.bits.packed_flag)
  
+/* Nonzero in a FIELD_DECL means it is a bit field, and must be accessed
+   specially.  */
+#define DECL_BIT_FIELD(NODE) (FIELD_DECL_CHECK (NODE)->decl_common.decl_flag_1)
+
  /* In a FIELD_DECL, indicates this field should be ignored for ABI decisions
 like passing/returning containing struct by value.
 Set for C++17 empty base artificial FIELD_DECLs as well as
 empty [[no_unique_address]] non-static data members.  */
  #define DECL_FIELD_ABI_IGNORED(NODE) \
-  (FIELD_DECL_CHECK (NODE)->decl_common.decl_flag_0)
+  (!DECL_BIT_FIELD (NODE) && (NODE)->decl_common.decl_flag_0)
+#define SET_DECL_FIELD_ABI_IGNORED(NODE, VAL) \
+  do { \
+gcc_checking_assert (!DECL_BIT_FIELD (NODE));  \
+FIELD_DECL_CHECK (NODE)->decl_common.decl_flag_0 = (VAL);   \
+  } while (0)
  
-/* Nonzero in a FIELD_DECL means it is a bit field, and must be accessed
-   specially.  */
-#define DECL_BIT_FIELD(NODE) (FIELD_DECL_CHECK (NODE)->decl_common.decl_flag_1)
+/* In a FIELD_DECL, indicates C++ zero-width bitfield that used to be
+   removed from the IL since PR42217 until PR101539 and by that changed
+   the ABI on several targets.  This flag is provided so that the backends
+   can decide on th

Re: [PATCH] Run pass_sink_code once more before store_mergin

2021-09-02 Thread Martin Jambor
Hi,

On Tue, May 18 2021, Xionghu Luo via Gcc-patches wrote:
>

[...]

> From 7fcc6ca9ef3b6acbfbcbd3da4be1d1c0eef4be80 Mon Sep 17 00:00:00 2001
> From: Xiong Hu Luo 
> Date: Mon, 17 May 2021 20:46:15 -0500
> Subject: [PATCH] Run pass_sink_code once more before store_merging
>
> The GIMPLE sink code pass runs quite early, so later GIMPLE optimization
> passes may expose new opportunities; this patch runs the sink code pass
> once more before store_merging.  For detailed discussion, please refer to:
> https://gcc.gnu.org/pipermail/gcc-patches/2020-December/562352.html
>
> Tested SPEC2017 performance on P8LE: 544.nab_r improves by 2.43%, other
> cases show no big changes, and GEOMEAN improves only slightly, by 0.25%.
>
> gcc/ChangeLog:
>
>   * passes.def: Add sink_code before store_merging.
>   * tree-ssa-sink.c (pass_sink_code::clone): New.


Unfortunately, this seems to have caused PR 102178
(https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102178)

Sorry about the bad news,

Martin



Re: [PATCH V3 0/6] Initial support for AVX512FP16

2021-09-02 Thread H.J. Lu via Gcc-patches
On Wed, Sep 1, 2021 at 11:00 PM Hongtao Liu  wrote:
>
> I'm going to check in the first 3 patches which are already approved.
>
>   Update hf soft-fp from glibc.
>   [i386] Enable _Float16 type for TARGET_SSE2 and above.
>   [i386] libgcc: Enable hfmode soft-sf/df/xf/tf extensions and
> truncations.
>
> On Mon, Aug 2, 2021 at 2:31 PM liuhongt  wrote:
> >
> > Update from v2:
> >
> > 1. Support -fexcess-precision=16 which will enable
> > FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 when backend supports _Float16.
> > 2. Update ix86_get_excess_precision, so -fexcess-precision=standard
> > should not do anything different from -fexcess-precision=fast
> >  regarding _Float16.
> > 3. Avoiding macroization of HFmode patterns.
> > 4. Allow (subreg:SI (reg:HF)).
> > 5. Update documents corresponding exactly to the code changes in
> > the same patch.
> > 6. According to 32bit abi, pass vector _Float16 by sse registers
> > for 32-bit mode, not stack.
> >
> > Guo, Xuepeng (1):
> >   AVX512FP16: Initial support for AVX512FP16 feature and scalar _Float16
> > instructions.
> >
> > liuhongt (5):
> >   Update hf soft-fp from glibc.
> >   [i386] Enable _Float16 type for TARGET_SSE2 and above.
> >   [i386] libgcc: Enable hfmode soft-sf/df/xf/tf extensions and
> > truncations.
> >   Support -fexcess-precision=16 which will enable
> > FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 when backend supports _Float16.
> >   AVX512FP16: Support vector init/broadcast/set/extract for FP16.
> >

I got

FAIL: gcc.dg/torture/fp-int-convert-float16.c   -Os  execution test
FAIL: gcc.dg/torture/fp-int-convert-float16-timode.c   -Os  execution test

with -m32:

[hjl@gnu-skx-1 gcc]$ ./xgcc -B./ -m32 /export/gnu/import/git/gitlab/x86-gcc/gcc/testsuite/gcc.dg/torture/fp-int-convert-float16.c -m32 -Os -march=i686 -mfpmath=sse -msse2
[hjl@gnu-skx-1 gcc]$ ./a.out
Aborted (core dumped)
[hjl@gnu-skx-1 gcc]$

H.J.


Re: [PATCH] improve note location and refactor warn_uninit

2021-09-02 Thread Martin Sebor via Gcc-patches

On 8/27/21 5:23 PM, Jeff Law wrote:



On 8/26/2021 1:30 PM, Martin Sebor via Gcc-patches wrote:

On 8/26/21 10:38 AM, Martin Sebor wrote:

On 8/26/21 1:00 AM, Richard Biener wrote:

On Wed, Aug 25, 2021 at 10:03 PM Martin Sebor  wrote:


Richard, some time ago you mentioned you'd had trouble getting
-Wuninitialized to print the note pointing to the uninitialized
variable.  I said I'd look into it, and so I did.  The attached
patch simplifies the warn_uninit() function to get rid of some
redundant cruft: besides a few pointless null tests and
a duplicate argument it also does away with ancient code that's
responsible for the problem you saw.  It turns out that tests
for the problem are already in the test suite but have been
xfailed since the day they were added, so the patch simply
removes the xfails.  I'll go ahead and commit this cleanup
as obvious later this week unless you have suggestions for
further changes.


I do see lots of useful refactoring.

-  if (context != NULL && gimple_has_location (context))
-    location = gimple_location (context);
-  else if (phiarg_loc != UNKNOWN_LOCATION)
-    location = phiarg_loc;
-  else
-    location = DECL_SOURCE_LOCATION (var);
+  /* Use either the location of the read statement or that of the PHI
+ argument, or that of the uninitialized variable, in that order,
+ whichever is valid.  */
+  location_t location = gimple_location (context);
+  if (location == UNKNOWN_LOCATION)
+    {
+  if (phi_arg_loc == UNKNOWN_LOCATION)
+   location = DECL_SOURCE_LOCATION (var);
+  else
+   location = phi_arg_loc;
+    }

the comment is an improvement but I think the original flow is
easier to follow ;)
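For reference, the fallback order described in the new comment (read statement, then PHI argument, then the variable's declaration) can be sketched as a small C function with flat early returns, close to the original flow. This is a hypothetical illustration: loc_sketch_t and UNKNOWN_LOC_SKETCH stand in for GCC's location_t and UNKNOWN_LOCATION (which is 0), and a missing or location-less context statement is modelled by passing UNKNOWN_LOC_SKETCH as stmt_loc.

```c
#include <assert.h>

/* Hypothetical stand-ins for location_t and UNKNOWN_LOCATION.  */
typedef unsigned int loc_sketch_t;
#define UNKNOWN_LOC_SKETCH ((loc_sketch_t) 0)

/* Use the read statement's location if known, else the PHI argument's
   location if known, else the variable's declaration location.  */
static loc_sketch_t
pick_warning_location (loc_sketch_t stmt_loc, loc_sketch_t phi_arg_loc,
                       loc_sketch_t decl_loc)
{
  if (stmt_loc != UNKNOWN_LOC_SKETCH)
    return stmt_loc;
  if (phi_arg_loc != UNKNOWN_LOC_SKETCH)
    return phi_arg_loc;
  return decl_loc;
}
```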


It doesn't really simplify things so I can revert it.


I've done that and pushed r12-3165, and...


-  if (xloc.file != floc.file
- || linemap_location_before_p (line_table, location, cfun_loc)
- || linemap_location_before_p (line_table, cfun->function_end_locus,
-   location))
-   inform (DECL_SOURCE_LOCATION (var), "%qD was declared here", var);

...
+  inform (var_loc, "%qD was declared here", var);

and this is the actual change that "fixes" the missing diagnostics?  Can
you explain what the reason for the original song-and-dance was?  It
looks like it tries to print the 'was declared here' only for locations
within the current function (but then it probably intended to do that
on the DECL_SOURCE_LOCATION (var), not on whatever we are
choosing now)?


The author(s) of the logic wanted to print the note only for variables
declared in functions other than the one the uninitialized read is in.
The idea first appears in pr17506, comment #25 (and attachment 12131).
At that time GCC would print no note at all and pr17506 was about
inlining making it difficult to find the uninitialized variable.
The originally reported problem can still be reproduced on Godbolt
(with the original very large translation unit):
https://godbolt.org/z/8WW43rxnd

I've reduced it to a small test case:
https://godbolt.org/z/rnPEfPqPf

It looks like they didn't think it would also be a problem if
the variable was defined and read in the same function, even
if the read was far away from the declaration.



That said, I'd prefer if you can commit the refactoring independently
of this core change and can try to dig why this was added and what
it was supposed to do.


Sure, let me do that.  Please let me know if the fix for the note
is okay to commit as is on its own.


...the removal of the test guarding the note is attached.



Martin



Thanks,
Richard.


Tested on x86_64-linux.

Martin





gcc-17506.diff

Improve note location.

Related:
PR tree-optimization/17506 - warning about uninitialized variable points to wrong location
PR testsuite/37182 - Revision 139286 caused gcc.dg/pr17506.c and gcc.dg/uninit-15.c

gcc/ChangeLog:

PR tree-optimization/17506
PR testsuite/37182
* tree-ssa-uninit.c (warn_uninit): Remove conditional guarding note.

gcc/testsuite/ChangeLog:

PR tree-optimization/17506
PR testsuite/37182
* gcc.dg/diagnostic-tree-expr-ranges-2.c: Add expected output.
* gcc.dg/uninit-15-O0.c: Remove xfail.
* gcc.dg/uninit-15.c: Same.
OK if neither Joern nor Nathan object by Wednesday morning (want to give 
them a couple workdays to chime in if they feel the need).


Having heard nothing back I've just pushed r12-3315.

Martin


Re: Sv: [PATCH] jit : Generate debug info for variables

2021-09-02 Thread David Malcolm via Gcc-patches
On Tue, 2021-08-31 at 00:23 +, Petter Tomner via Gcc-patches wrote:
> Well I seemed to have attached the wrong testcase. Here is the proper
> one attached.
> 
> Regards,
> 
> -Original Message-
> From: Petter Tomner 
> Sent: 31 August 2021 02:14
> To: gcc-patches@gcc.gnu.org; j...@gcc.gnu.org
> Subject: [PATCH] jit : Generate debug info for variables
> 
> Hi,
> 
> This is a patch to generate debug info for local variables as well as
> globals. 
> With this, "ptype foo", "info variables", "info locals" etc works when
> debugging in GDB.
> 
> Finalizing of global variable declarations is moved to after locations
> are handled, and is done the way Fortran, C, Go etc. do it. Also,
> primitive types have their TYPE_NAME set so that debug info on types
> works.
> 
> Below is the patch, and I attached a testcase. Since it requires GDB
> to run, it might not be suitable? Make check-jit runs fine on Debian x64.
> 
> Regards,

> From 6a5d24cbe80429d19042e643bd4c23940cd185fa Mon Sep 17 00:00:00 2001
> From: Petter Tomner 
> Date: Mon, 30 Aug 2021 01:45:11 +0200
> Subject: [PATCH 2/2] libgccjit: Test cases for debug info
> 
> Assure that debug info is available for variables.
> 
> gcc/testsuite/jit.dg/
>   jit.exp: Helper function
>   test-debuginfo.c

Again, please provide non-empty ChangeLog entries.

You can use contrib/gcc-changelog/git_check_commit.py to validate them.

I don't see "Signed-off-by" tags in the patches.  Have you either filed
a copyright assignment with the FSF, or can you please add the tags to
certify that you wrote the patches and can contribute them, see:
  https://gcc.gnu.org/contribute.html#legal
  https://gcc.gnu.org/dco.html

[...snip...]

> +proc jit-check-debug-info { obj_file cmds match } {
> +verbose "Checking debug info for $obj_file with match: $match"
> +
> +if { [catch {exec gdb -v} fid] } {
> +verbose "No gdb seems to be in path. Can't check debug info. Reporting expected fail."
> +xfail "No gdb seems to be in path. Can't check debug info"

I think this should be "unsupported" rather than "xfail".

[...snip...]

> diff --git a/gcc/testsuite/jit.dg/test-debuginfo.c b/gcc/testsuite/jit.dg/test-debuginfo.c
> new file mode 100644
> index 000..0af5779fdd1
> --- /dev/null
> +++ b/gcc/testsuite/jit.dg/test-debuginfo.c
> @@ -0,0 +1,72 @@
> +/* Essentially this test checks that debug info are generated for globals
> +   locals and functions, including type info.  The comment bellow is used
> +   as fake code (does not affect the test, use for manual debugging). */
> +/*
> +int a_global_for_test_debuginfo;
> +int main (int argc, char **argv)
> +{
> +int a_local_for_test_debuginfo = 2;
> +return a_global_for_test_debuginfo + a_local_for_test_debuginfo;
> +}
> +*/

This is OK, but maybe using gcc_jit_context_dump_to_file with
update_locations == 1 might be more sustainable in the long run?  See:
  https://gcc.gnu.org/onlinedocs/jit/topics/locations.html#faking-it

OK otherwise.

Thanks
Dave



[PATCH V3 0/6] Initial support for AVX512FP16

2021-09-02 Thread Hongtao Liu via Gcc-patches
On Thursday, September 2, 2021, Iain Sandoe  wrote:

> Hi Hongtao.
>
> > On 2 Sep 2021, at 07:06, Hongtao Liu via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> >
> > I'm going to check in the first 3 patches which are already approved.
> >
> >  Update hf soft-fp from glibc.
> >  [i386] Enable _Float16 type for TARGET_SSE2 and above.
> >  [i386] libgcc: Enable hfmode soft-sf/df/xf/tf extensions and
> >    truncations.
>
> Bootstrap on Darwin x86_64 is broken on at least AVX512 and i5 CPUs at
> revision r12-3311-g1e6267b33526.
>
> "fp-machine.h:81:22: error: unknown type name 'TFtype'; did you mean 'HFtype'?"
>
> any immediate ideas on what might be the issue?
> thanks


It seems to be related to the part below, which is not changed by my patch;
TFtype is defined in quad.h.

76  /* Define ALIASNAME as a strong alias for NAME.  */
77  #if defined __MACH__
78  /* Mach-O doesn't support aliasing.  If these functions ever return
79     anything but CMPtype we need to revisit this... */
80  #define strong_alias(name, aliasname) \
81    CMPtype aliasname (TFtype a, TFtype b) { return name(a, b); }
82  #else

Could you try adding

  typedef float TFtype __attribute__ ((mode (TF)));

here to see if it fixes the issue?
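On Mach-O the macro does not create an alias. It expands to a forwarding function whose parameters are declared as TFtype, so every file that expands it needs TFtype in scope, which is consistent with the reported error. Below is a minimal sketch of that pattern. `CMPtype` as `int`, `TFtype` as `double`, and `cmp_impl`/`cmp_alias` are hypothetical stand-ins, not libgcc's real definitions.

```c
#include <assert.h>

/* Hypothetical stand-ins for illustration only: in soft-fp, TFtype is a
   128-bit floating type and CMPtype is defined per target.  */
typedef int CMPtype;
typedef double TFtype;

/* Shape of the Mach-O fallback quoted above: no alias, just a forwarding
   function.  Its expansion names TFtype, so TFtype must already be
   declared in any translation unit that uses the macro.  */
#define strong_alias(name, aliasname) \
  CMPtype aliasname (TFtype a, TFtype b) { return name (a, b); }

/* Hypothetical three-way comparison: returns -1, 0 or 1.  */
static CMPtype
cmp_impl (TFtype a, TFtype b)
{
  return (a > b) - (a < b);
}

/* Expands to a real function definition named cmp_alias.  */
strong_alias (cmp_impl, cmp_alias)
```

Expanding `strong_alias` in a file where the `TFtype` typedef is missing would fail with the same kind of "unknown type name" diagnostic.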

Iain
>
>

> >
> > On Mon, Aug 2, 2021 at 2:31 PM liuhongt  wrote:
> >>
> >> Update from v2:
> >>
> >> 1. Support -fexcess-precision=16 which will enable
> >> FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 when backend supports _Float16.
> >> 2. Update ix86_get_excess_precision, so -fexcess-precision=standard
> >> should not do anything different from -fexcess-precision=fast
> >> regarding _Float16.
> >> 3. Avoiding macroization of HFmode patterns.
> >> 4. Allow (subreg:SI (reg:HF)).
> >> 5. Update documents corresponding exactly to the code changes in
> >> the same patch.
> >> 6. According to 32bit abi, pass vector _Float16 by sse registers
> >> for 32-bit mode, not stack.
> >>
> >> Guo, Xuepeng (1):
> >>  AVX512FP16: Initial support for AVX512FP16 feature and scalar _Float16
> >>    instructions.
> >>
> >> liuhongt (5):
> >>  Update hf soft-fp from glibc.
> >>  [i386] Enable _Float16 type for TARGET_SSE2 and above.
> >>  [i386] libgcc: Enable hfmode soft-sf/df/xf/tf extensions and
> >>    truncations.
> >>  Support -fexcess-precision=16 which will enable
> >>    FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 when backend supports _Float16.
> >>  AVX512FP16: Support vector init/broadcast/set/extract for FP16.
> >>
> >> gcc/ada/gcc-interface/misc.c           |   3 +
> >> gcc/c-family/c-common.c                |   6 +-
> >> gcc/c-family/c-cppbuiltin.c            |   6 +-
> >> gcc/common.opt                         |   5 +-
> >> gcc/common/config/i386/cpuinfo.h       |   2 +
> >> gcc/common/config/i386/i386-common.c   |  26 +-
> >> gcc/common/config/i386/i386-cpuinfo.h  |   1 +
> >> gcc/common/config/i386/i386-isas.h     |   1 +
> >> gcc/config.gcc                         |   2 +-
> >> gcc/config/aarch64/aarch64.c           |   1 +
> >> gcc/config/arm/arm.c                   |   1 +
> >> gcc/config/i386/avx512fp16intrin.h     | 225 ++
> >> gcc/config/i386/cpuid.h                |   1 +
> >> gcc/config/i386/i386-builtin-types.def |   7 +-
> >> gcc/config/i386/i386-builtins.c        |  23 +
> >> gcc/config/i386/i386-c.c               |   2 +
> >> gcc/config/i386/i386-expand.c          | 129 +-
> >> gcc/config/i386/i386-isa.def           |   1 +
> >> gcc/config/i386/i386-modes.def         |  13 +-
> >> gcc/config/i386/i386-options.c         |   4 +-
> >> gcc/config/i386/i386.c                 | 243 +--
> >> gcc/config/i386/i386.h                 |  29 +-
> >> gcc/config/i386/i386.md                | 291 +

Re: [PATCH] jit : Generate debug info for variables

2021-09-02 Thread David Malcolm via Gcc-patches
On Tue, 2021-08-31 at 00:13 +, Petter Tomner via Gcc-patches wrote:
> Hi,
> 
> This is a patch to generate debug info for local variables as well as
> globals.  With this, "ptype foo", "info variables", "info locals" etc.
> work when debugging in GDB.
> 
> Finalizing of global variable declarations is moved to after locations
> are handled, and is done as Fortran, C, Go etc. do it.  Also, primitive
> types have their TYPE_NAME set, so that debug info on types works.
> 
> Below is the patch, and I attached a testcase.  Since it requires GDB
> to run, it might not be suitable?  Make check-jit runs fine on Debian x64.

Thanks for the patches.  Overall, looks good, but I have some review
nits...

Reviewing patch 1 in this email...

> From d77e77104024c7ae9ce31b419dad1f0a5801fda7 Mon Sep 17 00:00:00 2001
> From: Petter Tomner 
> Date: Mon, 30 Aug 2021 01:44:07 +0200
> Subject: [PATCH 1/2] libgccjit: Generate debug info for variables
> 
> Finalize declares via available helpers after location is set. Set
> TYPE_NAME of primitives and friends to "int" etc. Debug info is now
> set properly for variables.
> 
> 2021-08-31  Petter Tomner   
> 
> gcc/jit
> jit-playback.c
> jit-playback.h
> gcc/testsuite/jit.dg/
> test-error-array-bounds.c: Array is not unsigned

Can you write non-empty ChangeLog entries please.

[...snip...]

> --- a/gcc/jit/jit-playback.c
> +++ b/gcc/jit/jit-playback.c

[...snip...]

> @@ -2984,15 +2975,22 @@ replay ()
>  {
>    int i;
>    function *func;
> -
> +  tree global;
>    /* No GC can happen yet; process the cached source locations.  */
>    handle_locations ();
>  
> +  /* Finalize globals. See how FORTRAN 95 does it in gfc_be_parse_file()
> + for a simple reference. */
> +  FOR_EACH_VEC_ELT (m_globals, i, global)
> +    rest_of_decl_compilation (global, true, true);
> +
> +  wrapup_global_declarations (m_globals.address(), m_globals.length());
> +
>    /* We've now created tree nodes for the stmts in the various blocks
> -    in each function, but we haven't built each function's single stmt
> -    list yet.  Do so now.  */
> +  in each function, but we haven't built each function's single stmt
> +  list yet.  Do so now.  */
>    FOR_EACH_VEC_ELT (m_functions, i, func)
> -   func->build_stmt_list ();
> + func->build_stmt_list ();

Looks like some whitespace churn above; did your text editor
accidentally convert tabs to spaces?  I prefer to avoid changes that
touch lines without changing things, as it messes up e.g. "git blame".

In case you haven't discovered it yet, "git add -p" is very helpful for
just staging individual hunks within a changed file.

[...snip...]

...plus some comments about the testcase which I'll post in reply to
the other patch.


Dave



Re: [PATCH] Jit, testsuite: Amend expect processing to tolerate more platforms.

2021-09-02 Thread David Malcolm via Gcc-patches
On Thu, 2021-08-19 at 19:59 +0100, Iain Sandoe wrote:
> Hi,
> 
> Preface:
> 
> this is the last patch for now in my series - with this applied Darwin
> reports the same results as Linux (at least, for modern x86_64
> platform versions).
> 
> Note
> a)  that the expect expression in {fixed}host_execute seems to depend
> on the assumption that the dejagnu.h output is used by the testcase
> and that the executable’s output can be seen to end with the totals
> produced there (which might in itself be erroneous, see 3).
> 
> b) the main GCC testsuite processing does not do this; rather the expect
> expression is somewhat simple and the output from the executable
> is copied into a secondary buffer, which is then processed by
> prune expressions and then to find the requisite matches (so most
> of the issues seen below do not occur there).
> 
>  patch discussion
> 
> The current 'fixed_host_execute' implementation fails on Darwin
> platforms for a number of reasons:
> 
> 1/ If the sub-process spawn fails (e.g. because of missing or malformed
>    params), rather than reporting the fail output into the
>    match stream, as indicated by the expect manual, it terminates
>    the script.
> 
>  - We fix this by (a) checking that the executable is valid as well
>    as existing (b) we put the spawn into a catch block and report
>    a failure.
> 
> 2/ There is no recovery path at all for a buffer-full case (and we
>    do see buffer-full events with the default sizes).
> 
>  - Added by the patch here, however it is not as sophisticated as
>    the methods used by dejagnu internally.  Here we set the process
>    to be "nowait" and then close the connection - with the intent
>    that this will terminate the spawned process.
> 
> 3/  The expect logic assumes that 'Totals:' is a valid indicator
>     for the end of the spawned process output.  This is not true
>     even for the default dejagnu header (there are a number of
>     additional reporting lines after).  In addition to this, there
>     are some tests that intentionally produce more output after
>     the totals report (and there are tests that do not use that
>     mechanism at all).
> 
>     The effect is that we might arrive at the "wait" for the spawned
>     process to finish - but that process might not have completed
>     all its output.  For Darwin, at least, that causes a deadlock
>     between expect and the spawnee - the latter is doing a
>     non-cancellable write and the former is waiting for the latter to
>     terminate.  For some reason this does not seem to affect Linux;
>     perhaps the pty implementation allows the write(s) to
>     proceed even though there is no reader.
> 
>  -  This is fixed by modifying the loop termination condition to be
>     either EOF (which will be the 'correct' condition) or a timeout
>     which would represent an error either in the runtime or in the
>     parsing of the output.  As added precautions, we only try to
>     wait if there is a correctly-spawned process, and we are also
>     specific about which process we are waiting for.
> 
> 4/  Darwin appears to have a bug in either the tcl or termios
>     'cooking' code that occasionally inserts an additional CR char
>     into the stream - thus '\n' => '\r\r\n' instead of '\r\n'. The
>     original program output is correct (it only contains a single
>     \n) - the additional character is being inserted somewhere in
>     the translations applied before the output reaches expect.
> 
>     The logic of this expect implementation does not tolerate single
>     \r or \n characters (it will fail with a timeout or buffer-full
>     if that occurs).
> 
>  -  This is fixed by having a line-end match that is adjusted for
>     Darwin.
> 
> 5/  The default buffer size does seem to be too small in some cases,
>     noting that GCC uses 1 as the match buffer size and the
>     default is 2000.
> 
>  -  Fixed by increasing the size to 8192.
> 
> 6/  There is a somewhat arbitrary dumping of output where we match
>     ^$prefix\tSOMETHING... and then process the something.  This
>     essentially allows the match to start at any place in the buffer
>     following any collection of non-line-end chars.
> 
>  -  Fixed by amending the match for 'general' lines to accommodate
>     these cases, and reporting such lines to the log.  At least this
>     should allow debugging of any cases where output that should be
>     recognized is being dropped.
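Item 4 amounts to a single rule: any run of CRs followed by an LF counts as one line end. The patch expresses this as an expect pattern in Tcl. The C below is only a language-neutral sketch of the same rule, and `line_end_len`/`count_lines` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Return the length of a line terminator starting at S: "\n", "\r\n",
   "\r\r\n" or, more generally, any run of CRs followed by an LF.
   Return 0 if S does not start with such a terminator.  */
static size_t
line_end_len (const char *s)
{
  size_t n = 0;
  while (s[n] == '\r')
    n++;
  return s[n] == '\n' ? n + 1 : 0;
}

/* Count the terminated lines in BUF using the tolerant rule above.  */
static size_t
count_lines (const char *buf)
{
  size_t lines = 0;
  for (size_t i = 0; buf[i] != '\0';)
    {
      size_t len = line_end_len (buf + i);
      if (len)
        {
          lines++;
          i += len;
        }
      else
        i++;
    }
  return lines;
}
```

This sketch accepts a bare LF but not a bare CR. The text above does not say whether the actual pattern treats those cases the same way.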
> 
> tested on i686, x86_64-darwin, x86_64,powerpc64-linux,

Which versions of DejaGnu, BTW?

> OK for master?

Did you try this with RUN_UNDER_VALGRIND set?  Assuming that that still
works, yes, looks good to me.

Thanks for the patch
Dave




Re: [PATCH RFA] tree: Change error_operand_p to an inline function

2021-09-02 Thread Marek Polacek via Gcc-patches
On Thu, Sep 02, 2021 at 10:41:52AM -0400, Jason Merrill via Gcc-patches wrote:
> I've thought for a while that many of the macros in tree.h and such should
> become inline functions.  This one in particular was confusing Coverity; the
> null check in the macro made it think that all code guarded by
> error_operand_p would also need null checks.
> 
> Tested x86_64-pc-linux-gnu.  OK for trunk?

Please!  One nit:

> ---
>  gcc/tree.h | 9 ++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/gcc/tree.h b/gcc/tree.h
> index 2c8973f34e2..5f27f02df9e 100644
> --- a/gcc/tree.h
> +++ b/gcc/tree.h
> @@ -4349,9 +4349,12 @@ tree_strip_any_location_wrapper (tree exp)
>  
>  /* True if NODE is an erroneous expression.  */

s/NODE/T/
  
> -#define error_operand_p(NODE)\
> -  ((NODE) == error_mark_node \
> -   || ((NODE) && TREE_TYPE ((NODE)) == error_mark_node))
> +inline bool
> +error_operand_p (const_tree t)
> +{
> +  return (t == error_mark_node
> +          || (t && TREE_TYPE (t) == error_mark_node));
> +}
>  
>  /* Return the number of elements encoded directly in a VECTOR_CST.  */
>  
> 
> base-commit: 483e400870601f650c80f867ec781cd5f83507d6
> -- 
> 2.27.0
> 

Marek



[PATCH RFA] tree: Change error_operand_p to an inline function

2021-09-02 Thread Jason Merrill via Gcc-patches
I've thought for a while that many of the macros in tree.h and such should
become inline functions.  This one in particular was confusing Coverity; the
null check in the macro made it think that all code guarded by
error_operand_p would also need null checks.

Tested x86_64-pc-linux-gnu.  OK for trunk?

---
 gcc/tree.h | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/gcc/tree.h b/gcc/tree.h
index 2c8973f34e2..5f27f02df9e 100644
--- a/gcc/tree.h
+++ b/gcc/tree.h
@@ -4349,9 +4349,12 @@ tree_strip_any_location_wrapper (tree exp)
 
 /* True if NODE is an erroneous expression.  */
 
-#define error_operand_p(NODE)  \
-  ((NODE) == error_mark_node   \
-   || ((NODE) && TREE_TYPE ((NODE)) == error_mark_node))
+inline bool
+error_operand_p (const_tree t)
+{
+  return (t == error_mark_node
+          || (t && TREE_TYPE (t) == error_mark_node));
+}
 
 /* Return the number of elements encoded directly in a VECTOR_CST.  */
 

base-commit: 483e400870601f650c80f867ec781cd5f83507d6
-- 
2.27.0
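Jason's motivation is Coverity's null-check inference. There is also a more general difference between the two forms. A function-like macro re-evaluates its argument at each use, while an inline function evaluates it once and type-checks it. The sketch below shows this using hypothetical stand-ins for `tree`, `TREE_TYPE` and `error_mark_node` (not GCC's real definitions), and counts how often the operand expression runs. It uses `static inline` because it is C; in GCC's C++ sources, plain `inline` is what the patch uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins, not GCC's definitions: a node only records
   its type, and error_mark_node is one distinguished node.  */
struct tree_node
{
  const struct tree_node *type;
};
typedef const struct tree_node *const_tree;

static const struct tree_node error_mark_node_storage = { NULL };
#define error_mark_node (&error_mark_node_storage)
#define TREE_TYPE(t) ((t)->type)

/* Old form: NODE is pasted in, and evaluated, up to three times.  */
#define error_operand_p_macro(NODE) \
  ((NODE) == error_mark_node \
   || ((NODE) && TREE_TYPE ((NODE)) == error_mark_node))

/* New form: T is evaluated once and must convert to const_tree.  */
static inline bool
error_operand_p (const_tree t)
{
  return (t == error_mark_node
          || (t && TREE_TYPE (t) == error_mark_node));
}

/* Counts how often an operand expression is evaluated.  */
static int calls;

static const_tree
next_operand (const_tree t)
{
  calls++;
  return t;
}
```

With a non-erroneous operand, the macro runs `next_operand` three times while the function runs it once. That is harmless here, but it matters when the operand has side effects or is expensive.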



Re: [PATCH] warn for more impossible null pointer tests

2021-09-02 Thread Martin Sebor via Gcc-patches

On 9/2/21 7:43 AM, Jason Merrill wrote:

On 9/1/21 6:27 PM, Martin Sebor wrote:

On 9/1/21 3:39 PM, Jason Merrill wrote:

On 9/1/21 4:33 PM, Martin Sebor wrote:

On 9/1/21 1:21 PM, Jason Merrill wrote:

On 8/31/21 10:08 PM, Martin Sebor wrote:

A Coverity run recently uncovered a latent bug in GCC that GCC should
be able to detect itself: comparing the address of a declared object
for equality to null, similar to:

   int f (void)
   {
     int a[2][2];
     return &a == 0;
   }

GCC issues -Waddress for this code, but the bug Coverity found was
actually closer to the following:

   int f (void)
   {
     int a[2][2];
     return a[0] == 0;
   }

where the hapless author (yours truly) meant to compare the value
of a[0][0] (as in r12-3268).

This variant is not diagnosed even though the bug in it is the same
and I'd expect it to be more likely to occur in practice.  (&a[0] == 0 isn't
diagnosed either, though that's a less likely mistake to make).
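The two comparisons above behave quite differently at run time. Because `a[0]` is an array, it decays to `&a[0][0]`, the address of a local object, so the first test below is always false; only the second actually reads an element. This is a minimal sketch, and the function names and array contents are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Mistaken test: a[0] is an array, so in the comparison it decays to
   &a[0][0], the address of a local object, which is never null.  The
   result is always false no matter what V is.  */
static bool
first_is_zero_buggy (int v)
{
  int a[2][2] = { { v, 1 }, { 2, 3 } };
  return a[0] == 0;
}

/* Intended test: compare the value of the first element.  */
static bool
first_is_zero_fixed (int v)
{
  int a[2][2] = { { v, 1 }, { 2, 3 } };
  return a[0][0] == 0;
}
```

The buggy version compiles silently without the enhanced -Waddress, which is exactly why the mistake can go unnoticed.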

The attached patch enhances -Waddress to detect this variant along
with a number of other similar instances of the problem, including
comparing the address of array members to null.

Besides these, the patch also issues -Waddress for null equality
tests of pointer-plus expressions such as in:

   int g (int i)
   {
     return a[0] + i == 0;
   }

and in C++ more instances of pointers to members.

Besides a few benign issues in GCC sources, a regression test run on
x86_64-linux shows a failure in gcc.dg/Waddress.c.  That's
a test added after GCC for some reason stopped warning for one of
the basic cases that other tools warn about (comparing an array to
null).  I suspect the change was unintentional because GCC still
warns for other very similar expressions.  The reporter, who also
submitted the test in pr36299, argued that the warning wasn't
helpful because tests for arrays sometimes come from macros, and
the test was committed after it was noted that GCC no longer warned
for the reporter's simple case.  While it's certainly true that
the warning can be triggered by the null equality tests in macros
(the patch exposed two such instances in GCC), they are easy to
avoid (the patch adds an additional escape hatch).  At the same
time, as is evident from the Coverity bug report and from the two
issues the enhancement exposes in the FORTRAN front end (even if
benign), issuing the warning in these cases does help find bugs
or mistaken assumptions.  With that, I've changed the test to
expect the restored -Waddress warning instead.

Testing with Glibc exposed a couple of harmless comparisons of
arrays in a large macro in vfprintf-internal.c.  I'll submit a fix
to avoid the -Waddress instances if/when this enhancement is
approved.

Testing with Binutils/GDB also turned up a couple of pointless
comparisons of arrays to null and a couple of uses in macros that
can be trivially suppressed.

Martin

PS Clang issues a warning for some of the same null pointer tests
the patch diagnoses, including gcc.dg/Waddress.c, but under at
least three different options: some under -Wpointer-bool-conversion,
others under -Wtautological-pointer-compare, and others still under
-Wtautological-compare.



+  while (TREE_CODE (cop) == ARRAY_REF
+         || TREE_CODE (cop) == COMPONENT_REF)
+    {
+      unsigned opno = TREE_CODE (cop) == COMPONENT_REF;
+      cop = TREE_OPERAND (cop, opno);
+    }


1) Maybe 'while (handled_component_p (cop))'?
2) Why handle COMPONENT_REF differently?  Operand 1 is the 
FIELD_DECL, which doesn't have an address of its own.


This is because the address of a field is never null, regardless of
what the P in &P->M points to.


True, though I'd change "invalid" to "undefined" in the comment for 
decl_with_nonnull_addr_p.



(With the caveat mentioned in
the comment further up about the pointer used to access the member
being nonnull.)  So this is diagnosed:

   extern struct { int m; } *p;
   bool b = &p->m == 0;

Using handled_component_p() in a loop would prevent that.


Would it?  p isn't declared weak.


Maybe I misunderstood.  This loop:

   while (handled_component_p (cop))
     cop = TREE_OPERAND (cop, 0);

would unwrap the COMPONENT_REF from cop and terminate with it set
to INDIRECT_REF for which decl_with_nonnull_addr_p() would return
false.  But if you meant to keep the body as is and just change
the condition, that would work.  If you think that's better,
e.g., because it would handle more cases, I'm all for it.


It's worth considering the other codes in handled_component_p, at least. 
BIT_FIELD_REF we shouldn't see an address of.  I don't think the C++
front end produces ARRAY_RANGE_REF.  I'd think REALPART_EXPR and 
IMAGPART_EXPR would be like COMPONENT_REF.  I guess we would want to 
look through VIEW_CONVERT_EXPR.


Okay, let me update the patch then and repost before committing.




For array_refs, the loop gets us the decl to mention in the warning.
But this should work too and looks cleaner:

   cop = TREE_OPERAND (cop, 0);

 

Re: [Committed] [PATCH 2/4] (v4) On-demand locations within string-literals

2021-09-02 Thread Thomas Schwinge
Hi!

On 2016-08-05T14:16:58-0400, David Malcolm  wrote:
> Committed to trunk as r239175; I'm attaching the final version of the
> patch for reference.

David, you've added here 'gcc/input.h:struct location_hash' (see quoted
below), which will be useful elsewhere, so:

> --- a/gcc/input.c
> +++ b/gcc/input.c

> +/* Internal function.  Canonicalize LOC into a form suitable for
> +   use as a key within the database, stripping away macro expansion,
> +   ad-hoc information, and range information, using the location of
> +   the start of LOC within an ordinary linemap.  */
> +
> +location_t
> +string_concat_db::get_key_loc (location_t loc)
> +{
> +  loc = linemap_resolve_location (line_table, loc, LRK_SPELLING_LOCATION,
> +   NULL);
> +
> +  loc = get_range_from_loc (line_table, loc).m_start;
> +
> +  return loc;
> +}

OK to push the attached
"Harden 'gcc/input.c:string_concat_db::get_key_loc'"?  (This fell out of
my analysis for development work elsewhere.)

> --- a/gcc/input.h
> +++ b/gcc/input.h

> +struct location_hash : int_hash  { };
> +
> +class GTY(()) string_concat_db
> +{
> +[...]
> +  hash_map  *m_table;
> +};

OK to push the attached
"Generalize 'gcc/input.h:struct location_hash'"?


Regards
 Thomas


-
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634
München; limited liability company; Managing Directors: Thomas Heurung,
Frank Thürauf; Registered office: München; Registry court: München,
HRB 106955
From 521c94471ae2f044f8cca8025bfa8db2d2936aea Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Tue, 31 Aug 2021 23:05:46 +0200
Subject: [PATCH 1/2] Harden 'gcc/input.c:string_concat_db::get_key_loc'

We're using 'UNKNOWN_LOCATION' as a spare value for 'Empty', so should
ascertain that we don't use it as a key additionally.

Follow-up to r239175 (commit 88faa309e5d6c6171b957daaf2f800920869)
"On-demand locations within string-literals".

	gcc/
	* input.c (string_concat_db::get_key_loc): Harden.
---
 gcc/input.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/gcc/input.c b/gcc/input.c
index 4b809862e02..98b8bb64618 100644
--- a/gcc/input.c
+++ b/gcc/input.c
@@ -1483,6 +1483,9 @@ string_concat_db::get_key_loc (location_t loc)
 
   loc = get_range_from_loc (line_table, loc).m_start;
 
+  /* Ascertain that 'loc' is valid as a key in 'm_table'.  */
+  gcc_checking_assert (!RESERVED_LOCATION_P (loc));
+
   return loc;
 }
 
-- 
2.33.0

From 349a3172f64db93ee98ea39b36489b702b6596ab Mon Sep 17 00:00:00 2001
From: Thomas Schwinge 
Date: Tue, 31 Aug 2021 23:30:25 +0200
Subject: [PATCH 2/2] Generalize 'gcc/input.h:struct location_hash'

This is currently only used here ('gcc/input.h:class string_concat_db'), but is
actually generally useful, so advertize it as such.

Per the rationale given, we may use 'BUILTINS_LOCATION' as spare value for
'Deleted', in addition to the existing use of 'UNKNOWN_LOCATION' as spare value
for 'Empty'.

	gcc/
	* input.h (location_hash): Use 'BUILTINS_LOCATION' as spare value
	for 'Deleted'.  Turn into a '#define'.
---
 gcc/input.h | 21 +++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/gcc/input.h b/gcc/input.h
index e6881072c5f..46971a2684c 100644
--- a/gcc/input.h
+++ b/gcc/input.h
@@ -36,6 +36,25 @@ extern GTY(()) class line_maps *saved_line_table;
both UNKNOWN_LOCATION and BUILTINS_LOCATION fit into that.  */
 STATIC_ASSERT (BUILTINS_LOCATION < RESERVED_LOCATION_COUNT);
 
+/* Hasher for 'location_t' values satisfying '!RESERVED_LOCATION_P', thus able
+   to use 'UNKNOWN_LOCATION'/'BUILTINS_LOCATION' as spare values for
+   'Empty'/'Deleted'.  */
+/* If the following is used more than once, 'gengtype' generates duplicate
+   functions (thus: "error: redefinition of 'void gt_ggc_mx(location_hash&)'"
+   etc.):
+
+   struct location_hash
+ : int_hash {};
+
+   Likewise for this:
+
+   typedef int_hash
+ location_hash;
+
+   Thus, use a plain ol' '#define':
+*/
+#define location_hash int_hash
+
 extern bool is_location_from_builtin_token (location_t);
 extern expanded_location expand_location (location_t);
 
@@ -230,8 +249,6 @@ public:
   location_t * GTY ((atomic)) m_locs;
 };
 
-struct location_hash : int_hash  { };
-
 class GTY(()) string_concat_db
 {
  public:
-- 
2.33.0
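Both patches rely on the open-addressing convention behind `int_hash`. Two key values are set aside to mark Empty and Deleted slots, so a real key equal to either one would be misread. The sketch below shows that convention in plain C. It assumes, as in GCC where `location_t` is an `unsigned int`, that 0 and 1 play the roles of UNKNOWN_LOCATION and BUILTINS_LOCATION. It uses linear probing for brevity, which is not how GCC's `hash_table` probes, and all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins: 0 and 1 mirror the reserved location values
   used as spare keys for Empty and Deleted.  */
typedef unsigned int loc_t;
enum { LOC_EMPTY = 0, LOC_DELETED = 1, LOC_RESERVED_COUNT = 2 };

#define TABLE_SIZE 8u /* a power of two, so masking gives the bucket */

/* A zero-initialized table has every slot Empty.  */
struct loc_set
{
  loc_t slots[TABLE_SIZE];
};

/* Insert KEY; return true if it was added, false if it was already
   present or the table is full.  */
static bool
loc_set_insert (struct loc_set *s, loc_t key)
{
  /* A key equal to a spare value would be indistinguishable from an
     Empty or Deleted slot; cf. the RESERVED_LOCATION_P assertion.  */
  assert (key >= LOC_RESERVED_COUNT);
  size_t i = key & (TABLE_SIZE - 1);
  size_t free_slot = TABLE_SIZE; /* TABLE_SIZE means "none found yet" */
  for (size_t n = 0; n < TABLE_SIZE; n++, i = (i + 1) & (TABLE_SIZE - 1))
    {
      if (s->slots[i] == key)
        return false;
      if (s->slots[i] == LOC_DELETED)
        {
          if (free_slot == TABLE_SIZE)
            free_slot = i; /* reusable, but KEY may still be further on */
        }
      else if (s->slots[i] == LOC_EMPTY)
        {
          if (free_slot == TABLE_SIZE)
            free_slot = i;
          break; /* an Empty slot ends every probe sequence */
        }
    }
  if (free_slot == TABLE_SIZE)
    return false;
  s->slots[free_slot] = key;
  return true;
}

static bool
loc_set_contains (const struct loc_set *s, loc_t key)
{
  size_t i = key & (TABLE_SIZE - 1);
  for (size_t n = 0; n < TABLE_SIZE; n++, i = (i + 1) & (TABLE_SIZE - 1))
    {
      if (s->slots[i] == LOC_EMPTY)
        return false;
      if (s->slots[i] == key)
        return true; /* Deleted slots are simply probed past */
    }
  return false;
}

/* Mark KEY's slot Deleted so later entries in its chain stay reachable.  */
static void
loc_set_remove (struct loc_set *s, loc_t key)
{
  size_t i = key & (TABLE_SIZE - 1);
  for (size_t n = 0; n < TABLE_SIZE; n++, i = (i + 1) & (TABLE_SIZE - 1))
    {
      if (s->slots[i] == LOC_EMPTY)
        return;
      if (s->slots[i] == key)
        {
          s->slots[i] = LOC_DELETED;
          return;
        }
    }
}
```

If a caller looked up the Deleted value itself (1 here), `loc_set_contains` would "find" any tombstone on its probe path. Asserting that keys are never reserved values rules that out.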



Re: [PATCH] warn for more impossible null pointer tests

2021-09-02 Thread Jason Merrill via Gcc-patches

On 9/1/21 6:27 PM, Martin Sebor wrote:

On 9/1/21 3:39 PM, Jason Merrill wrote:

On 9/1/21 4:33 PM, Martin Sebor wrote:

On 9/1/21 1:21 PM, Jason Merrill wrote:

On 8/31/21 10:08 PM, Martin Sebor wrote:

A Coverity run recently uncovered a latent bug in GCC that GCC should
be able to detect itself: comparing the address of a declared object
for equality to null, similar to:

   int f (void)
   {
     int a[2][2];
     return &a == 0;
   }

GCC issues -Waddress for this code, but the bug Coverity found was
actually closer to the following:

   int f (void)
   {
     int a[2][2];
     return a[0] == 0;
   }

where the hapless author (yours truly) meant to compare the value
of a[0][0] (as in r12-3268).

This variant is not diagnosed even though the bug in it is the same
and I'd expect it to be more likely to occur in practice.  (&a[0] == 0 isn't
diagnosed either, though that's a less likely mistake to make).

The attached patch enhances -Waddress to detect this variant along
with a number of other similar instances of the problem, including
comparing the address of array members to null.

Besides these, the patch also issues -Waddress for null equality
tests of pointer-plus expressions such as in:

   int g (int i)
   {
     return a[0] + i == 0;
   }

and in C++ more instances of pointers to members.

Besides a few benign issues in GCC sources, a regression test run on
x86_64-linux shows a failure in gcc.dg/Waddress.c.  That's
a test added after GCC for some reason stopped warning for one of
the basic cases that other tools warn about (comparing an array to
null).  I suspect the change was unintentional because GCC still
warns for other very similar expressions.  The reporter, who also
submitted the test in pr36299, argued that the warning wasn't
helpful because tests for arrays sometimes come from macros, and
the test was committed after it was noted that GCC no longer warned
for the reporter's simple case.  While it's certainly true that
the warning can be triggered by the null equality tests in macros
(the patch exposed two such instances in GCC), they are easy to
avoid (the patch adds an additional escape hatch).  At the same
time, as is evident from the Coverity bug report and from the two
issues the enhancement exposes in the FORTRAN front end (even if
benign), issuing the warning in these cases does help find bugs
or mistaken assumptions.  With that, I've changed the test to
expect the restored -Waddress warning instead.

Testing with Glibc exposed a couple of harmless comparisons of
arrays in a large macro in vfprintf-internal.c.  I'll submit a fix
to avoid the -Waddress instances if/when this enhancement is
approved.

Testing with Binutils/GDB also turned up a couple of pointless
comparisons of arrays to null and a couple of uses in macros that
can be trivially suppressed.

Martin

PS Clang issues a warning for some of the same null pointer tests
the patch diagnoses, including gcc.dg/Waddress.c, but under at
least three different options: some under -Wpointer-bool-conversion,
others under -Wtautological-pointer-compare, and others still under
-Wtautological-compare.



+  while (TREE_CODE (cop) == ARRAY_REF
+         || TREE_CODE (cop) == COMPONENT_REF)
+    {
+      unsigned opno = TREE_CODE (cop) == COMPONENT_REF;
+      cop = TREE_OPERAND (cop, opno);
+    }


1) Maybe 'while (handled_component_p (cop))'?
2) Why handle COMPONENT_REF differently?  Operand 1 is the 
FIELD_DECL, which doesn't have an address of its own.


This is because the address of a field is never null, regardless of
what the P in &P->M points to.


True, though I'd change "invalid" to "undefined" in the comment for 
decl_with_nonnull_addr_p.



(With the caveat mentioned in
the comment further up about the pointer used to access the member
being nonnull.)  So this is diagnosed:

   extern struct { int m; } *p;
   bool b = &p->m == 0;

Using handled_component_p() in a loop would prevent that.


Would it?  p isn't declared weak.


Maybe I misunderstood.  This loop:

   while (handled_component_p (cop))
     cop = TREE_OPERAND (cop, 0);

would unwrap the COMPONENT_REF from cop and terminate with it set
to INDIRECT_REF for which decl_with_nonnull_addr_p() would return
false.  But if you meant to keep the body as is and just change
the condition, that would work.  If you think that's better,
e.g., because it would handle more cases, I'm all for it.


It's worth considering the other codes in handled_component_p, at least. 
BIT_FIELD_REF we shouldn't see an address of.  I don't think the C++
front end produces ARRAY_RANGE_REF.  I'd think REALPART_EXPR and 
IMAGPART_EXPR would be like COMPONENT_REF.  I guess we would want to 
look through VIEW_CONVERT_EXPR.



For array_refs, the loop gets us the decl to mention in the warning.
But this should work too and looks cleaner:

   cop = TREE_OPERAND (cop, 0);

   /* Get the outermost array.  */
   while (TREE_CODE (cop) == ARRAY_REF)
     cop = TREE_OPERAND (cop, 0);

[PATCH] tree-optimization/102176 - locally compute participating SLP stmts

2021-09-02 Thread Richard Biener via Gcc-patches
This performs local re-computation of participating scalar stmts
in BB vectorization subgraphs to allow precise computation of
liveness of scalar stmts after vectorization and thus precise
costing.  This treats all extern defs as live but continues
to optimistically handle scalar defs that we think we can handle
by lane-extraction even though that can still fail late during
code-generation.
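The gathering step can be modeled on a toy graph. In this sketch all names are hypothetical and sets are bitmasks of statement ids. External nodes list their defining statements directly rather than going through `lookup_def`. Internal nodes contribute their scalar statements, and external nodes contribute the statements that define their operands. A visited list ensures shared subgraphs are walked once. The result is the vectorized set minus the externally used set. The root statements of each instance, which the patch also adds, are not modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model, not GCC's data structures: statement ids are bit positions
   and sets of statements are bitmasks.  */
enum def_kind { INTERNAL_DEF, EXTERNAL_DEF };

struct slp_node
{
  enum def_kind kind;
  unsigned stmts; /* internal: scalar stmts it vectorizes;
                     external: stmts defining its scalar operands */
  const struct slp_node *children[2];
};

#define MAX_NODES 16

static void
gather (const struct slp_node *node, const struct slp_node **visited,
        size_t *n_visited, unsigned *vstmts, unsigned *estmts)
{
  for (size_t i = 0; i < *n_visited; i++)
    if (visited[i] == node)
      return; /* shared subgraph: walk it only once */
  assert (*n_visited < MAX_NODES);
  visited[(*n_visited)++] = node;

  if (node->kind == INTERNAL_DEF)
    {
      *vstmts |= node->stmts;
      for (size_t i = 0; i < 2; i++)
        if (node->children[i])
          gather (node->children[i], visited, n_visited, vstmts, estmts);
    }
  else
    *estmts |= node->stmts;
}

/* Scalar stmts expected to go away: everything gathered from internal
   nodes, minus stmts whose defs external nodes still need.  */
static unsigned
removable_stmts (const struct slp_node *const *roots, size_t n_roots)
{
  const struct slp_node *visited[MAX_NODES];
  size_t n_visited = 0;
  unsigned vstmts = 0, estmts = 0;
  for (size_t i = 0; i < n_roots; i++)
    gather (roots[i], visited, &n_visited, &vstmts, &estmts);
  return vstmts & ~estmts;
}
```

In this model, a statement that is both vectorized and used to build an external operand stays live. This corresponds to the patch removing `scalar_stmts_in_externs` from `vectorized_scalar_stmts`.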

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Any comments?

Thanks,
Richard.

2021-09-02  Richard Biener  

PR tree-optimization/102176
* tree-vect-slp.c (vect_slp_gather_vectorized_scalar_stmts):
New function.
(vect_bb_slp_scalar_cost): Use the computed set of
vectorized scalar stmts instead of relying on the out-of-date
and not accurate PURE_SLP_STMT.
(vect_bb_vectorization_profitable_p): Compute the set
of vectorized scalar stmts.
---
 gcc/tree-vect-slp.c | 69 +
 1 file changed, 64 insertions(+), 5 deletions(-)

diff --git a/gcc/tree-vect-slp.c b/gcc/tree-vect-slp.c
index fa3566f3d06..024a1c38a23 100644
--- a/gcc/tree-vect-slp.c
+++ b/gcc/tree-vect-slp.c
@@ -5104,6 +5104,42 @@ vect_bb_partition_graph (bb_vec_info bb_vinfo)
 }
 }
 
+/* Compute the set of scalar stmts participating in internal and external
+   nodes.  */
+
+static void
+vect_slp_gather_vectorized_scalar_stmts (vec_info *vinfo, slp_tree node,
+hash_set &visited,
+hash_set &vstmts,
+hash_set &estmts)
+{
+  int i;
+  stmt_vec_info stmt_info;
+  slp_tree child;
+
+  if (visited.add (node))
+return;
+
+  if (SLP_TREE_DEF_TYPE (node) == vect_internal_def)
+{
+  FOR_EACH_VEC_ELT (SLP_TREE_SCALAR_STMTS (node), i, stmt_info)
+   vstmts.add (stmt_info);
+
+  FOR_EACH_VEC_ELT (SLP_TREE_CHILDREN (node), i, child)
+   if (child)
+ vect_slp_gather_vectorized_scalar_stmts (vinfo, child, visited,
+  vstmts, estmts);
+}
+  else
+for (tree def : SLP_TREE_SCALAR_OPS (node))
+  {
+   stmt_vec_info def_stmt = vinfo->lookup_def (def);
+   if (def_stmt)
+ estmts.add (def_stmt);
+  }
+}
+
+
 /* Compute the scalar cost of the SLP node NODE and its children
and return it.  Do not account defs that are marked in LIFE and
update LIFE according to uses of NODE.  */
@@ -5112,6 +5148,7 @@ static void
 vect_bb_slp_scalar_cost (vec_info *vinfo,
 slp_tree node, vec *life,
 stmt_vector_for_cost *cost_vec,
+hash_set &vectorized_scalar_stmts,
 hash_set &visited)
 {
   unsigned i;
@@ -5148,8 +5185,7 @@ vect_bb_slp_scalar_cost (vec_info *vinfo,
  {
stmt_vec_info use_stmt_info = vinfo->lookup_stmt (use_stmt);
if (!use_stmt_info
-   || !PURE_SLP_STMT
- (vect_stmt_to_vectorize (use_stmt_info)))
+   || !vectorized_scalar_stmts.contains (use_stmt_info))
  {
(*life)[i] = true;
break;
@@ -5212,7 +5248,7 @@ vect_bb_slp_scalar_cost (vec_info *vinfo,
  subtree_life.safe_splice (*life);
}
  vect_bb_slp_scalar_cost (vinfo, child, &subtree_life, cost_vec,
-  visited);
+  vectorized_scalar_stmts, visited);
  subtree_life.truncate (0);
}
 }
@@ -5254,11 +5290,33 @@ vect_bb_vectorization_profitable_p (bb_vec_info 
bb_vinfo,
  SLP_INSTANCE_TREE (instance), visited);
 }
 
+  /* Compute the set of scalar stmts we know will go away 'locally' when
+ vectorizing.  This used to be tracked with just PURE_SLP_STMT but that's
+ not accurate for nodes promoted extern late or for scalar stmts that
+ are used both in extern defs and in vectorized defs.  */
+  hash_set vectorized_scalar_stmts;
+  hash_set scalar_stmts_in_externs;
+  hash_set visited;
+  FOR_EACH_VEC_ELT (slp_instances, i, instance)
+{
+  vect_slp_gather_vectorized_scalar_stmts (bb_vinfo,
+  SLP_INSTANCE_TREE (instance),
+  visited,
+  vectorized_scalar_stmts,
+  scalar_stmts_in_externs);
+  for (stmt_vec_info rstmt : SLP_INSTANCE_ROOT_STMTS (instance))
+   vectorized_scalar_stmts.add (rstmt);
+}
+  /* Scalar stmts used as defs in external nodes need to be preserved, so
+ remove them from vectorized_scalar_stmts.  */
+  for (stmt_vec_info stmt : scalar_stmts_in_externs)
+vectorized_scalar_stmts.remove (stmt);
+
   /* Calculate scalar cost and sum the c

Re: [RFC/PATCH] ipa-inline: Add target info into fn summary [PR102059]

2021-09-02 Thread Richard Biener via Gcc-patches
On Thu, Sep 2, 2021 at 3:11 PM Kewen.Lin  wrote:
>
> on 2021/9/2 at 7:51 PM, Richard Biener wrote:
> > On Thu, Sep 2, 2021 at 1:13 PM Kewen.Lin  wrote:
> >>
> >> Hi Richi,
> >>
> >> Thanks for the comments!
> >>
> >> on 2021/9/2 at 5:25 PM, Richard Biener wrote:
> >>> On Wed, Sep 1, 2021 at 9:02 AM Kewen.Lin  wrote:
> 
>  Hi!
> 
>  Power ISA 2.07 (Power8) introduces the transactional memory feature
>  but ISA 3.1 (Power10) removes it.  This exposes a troublesome
>  issue, as PR102059 shows.  Users define a function with the
>  target pragma cpu=power10, which calls a function with the
>  attribute always_inline that inherits the command line option
>  cpu=power8, which enables HTM implicitly.  The current isa_flags
>  check doesn't allow this inlining due to "target specific
>  option mismatch", and an error message is emitted.
> 
>  Normally, the callee function isn't intended to exploit the HTM
>  feature, but the default flag setting makes it look like it does.
>  As Richi raised in the PR, we have the fp_expressions flag in the
>  function summary, which allows us to check whether the function
>  actually contains any floating point expressions, to avoid overkill.
>  This patch follows a similar idea but is more target specific:
>  for this rs6000-specific requirement on the HTM feature check,
>  we would like to check rs6000-specific HTM built-in functions
>  and inline assembly.  This allows targets to do their own
>  customized checks and updates.
> 
>  It introduces two target hooks, need_ipa_fn_target_info and
>  update_ipa_fn_target_info.  The former allows the target to do
>  a preliminary check and decide whether or not to collect
>  target-specific information for this function.  In some special
>  cases, it can predict the analysis result and push it early
>  without any scanning.  The latter allows analyze_function_body
>  to pass gimple stmts down, just like the fp_expressions handling,
>  so the target can do its own tricks.  I initially made them one
>  hook with a boolean indicating whether it's the initial call, but
>  the code looked a bit ugly; separating them seems to give
>  better readability.
> 
>  To keep it simple, this patch uses HOST_WIDE_INT to record the
>  flags, just like what we use for isa_flags.  For rs6000's HTM
>  need, one HOST_WIDE_INT variable is quite enough, but it seems
>  good to have an auto_vec for scalability, as I noticed some
>  targets have more than one HOST_WIDE_INT flag.  For now, this
>  target information collection is only for always_inline functions;
>  the function ipa_merge_fn_summary_after_inlining deals with
>  merging target information.
> 
>  The patch has been bootstrapped and regress-tested on
>  powerpc64le-linux-gnu Power9.
> 
>  Is it on the right track?
> >>>
> >>> +  if (always_inline)
> >>> +{
> >>> +  cgraph_node *callee_node = cgraph_node::get (callee);
> >>> +  if (ipa_fn_summaries && ipa_fn_summaries->get (callee_node) != NULL)
> >>> +   {
> >>> + if (dump_file)
> >>> +   ipa_dump_fn_summary (dump_file, callee_node);
> >>> + const vec &info =
> >>> +   ipa_fn_summaries->get (callee_node)->target_info;
> >>> + if (!info.is_empty ())
> >>> +   always_inline_safe_mask |= ~info[0] & OPTION_MASK_HTM;
> >>> +   }
> >>> +
> >>> +  caller_isa &= ~always_inline_safe_mask;
> >>> +  callee_isa &= ~always_inline_safe_mask;
> >>> +}
> >>>
> >>> that's a bit convoluted but obviously the IPA info can be used for
> >>> non-always_inline cases as well.
> >>>
> >>> As said above, the info can be produced for non-always-inline functions
> >>> as well; the usual case would be LTO inlining across TUs compiled
> >>> with different target options.  In your case the special -mcpu=power10
> >>> TU would otherwise not be able to inline from a general -mcpu=power8 TU.
> >>>
> >>
> >> Agreed, it can be extended to non-always_inline cases.  always_inline
> >> is a kind of user-"forced" requirement, and the compiler emits an error if
> >> it fails to inline, while non-always_inline cases only get a warning.
> >> Considering that the scanning might be costly for some big functions, I
> >> guessed it would be good to start with always_inline as the first step.
> >> But if different target options among LTO TUs are a common use case, I
> >> think it's worth extending it now.
> >
> > I was merely looking at this from the perspective of test coverage - with
> > restricting it to always-inline we're not going to notice issues very
> > reliably I think.
> >
>
> Got it, will extend it to support all inlinable functions in the next version.
>
> >>> On the streaming side we possibly have to take care of the
> >>> GPU offloading path, where we likely want to avoid pushing host target
> >>> bits to the GPU target in some way.
> >>>
> >>
> >> I guess this comment

Re: [RFC/PATCH] ipa-inline: Add target info into fn summary [PR102059]

2021-09-02 Thread Kewen.Lin via Gcc-patches
on 2021/9/2 at 7:51 PM, Richard Biener wrote:
> On Thu, Sep 2, 2021 at 1:13 PM Kewen.Lin  wrote:
>>
>> Hi Richi,
>>
>> Thanks for the comments!
>>
>> on 2021/9/2 at 5:25 PM, Richard Biener wrote:
>>> On Wed, Sep 1, 2021 at 9:02 AM Kewen.Lin  wrote:

 Hi!

 Power ISA 2.07 (Power8) introduces the transactional memory feature
 but ISA 3.1 (Power10) removes it.  This exposes a troublesome
 issue, as PR102059 shows.  Users define a function with the
 target pragma cpu=power10, which calls a function with the
 attribute always_inline that inherits the command line option
 cpu=power8, which enables HTM implicitly.  The current isa_flags
 check doesn't allow this inlining due to "target specific
 option mismatch", and an error message is emitted.

 Normally, the callee function isn't intended to exploit the HTM
 feature, but the default flag setting makes it look like it does.
 As Richi raised in the PR, we have the fp_expressions flag in the
 function summary, which allows us to check whether the function
 actually contains any floating point expressions, to avoid overkill.
 This patch follows a similar idea but is more target specific:
 for this rs6000-specific requirement on the HTM feature check,
 we would like to check rs6000-specific HTM built-in functions
 and inline assembly.  This allows targets to do their own
 customized checks and updates.

 It introduces two target hooks, need_ipa_fn_target_info and
 update_ipa_fn_target_info.  The former allows the target to do
 a preliminary check and decide whether or not to collect
 target-specific information for this function.  In some special
 cases, it can predict the analysis result and push it early
 without any scanning.  The latter allows analyze_function_body
 to pass gimple stmts down, just like the fp_expressions handling,
 so the target can do its own tricks.  I initially made them one
 hook with a boolean indicating whether it's the initial call, but
 the code looked a bit ugly; separating them seems to give
 better readability.

 To keep it simple, this patch uses HOST_WIDE_INT to record the
 flags, just like what we use for isa_flags.  For rs6000's HTM
 need, one HOST_WIDE_INT variable is quite enough, but it seems
 good to have an auto_vec for scalability, as I noticed some
 targets have more than one HOST_WIDE_INT flag.  For now, this
 target information collection is only for always_inline functions;
 the function ipa_merge_fn_summary_after_inlining deals with
 merging target information.

 The patch has been bootstrapped and regress-tested on
 powerpc64le-linux-gnu Power9.

 Is it on the right track?
>>>
>>> +  if (always_inline)
>>> +{
>>> +  cgraph_node *callee_node = cgraph_node::get (callee);
>>> +  if (ipa_fn_summaries && ipa_fn_summaries->get (callee_node) != NULL)
>>> +   {
>>> + if (dump_file)
>>> +   ipa_dump_fn_summary (dump_file, callee_node);
>>> + const vec &info =
>>> +   ipa_fn_summaries->get (callee_node)->target_info;
>>> + if (!info.is_empty ())
>>> +   always_inline_safe_mask |= ~info[0] & OPTION_MASK_HTM;
>>> +   }
>>> +
>>> +  caller_isa &= ~always_inline_safe_mask;
>>> +  callee_isa &= ~always_inline_safe_mask;
>>> +}
>>>
>>> that's a bit convoluted but obviously the IPA info can be used for
>>> non-always_inline cases as well.
>>>
>>> As said above, the info can be produced for non-always-inline functions
>>> as well; the usual case would be LTO inlining across TUs compiled
>>> with different target options.  In your case the special -mcpu=power10
>>> TU would otherwise not be able to inline from a general -mcpu=power8 TU.
>>>
>>
>> Agreed, it can be extended to non-always_inline cases.  always_inline
>> is a kind of user-"forced" requirement, and the compiler emits an error if
>> it fails to inline, while non-always_inline cases only get a warning.
>> Considering that the scanning might be costly for some big functions, I
>> guessed it would be good to start with always_inline as the first step.
>> But if different target options among LTO TUs are a common use case, I
>> think it's worth extending it now.
> 
> I was merely looking at this from the perspective of test coverage - with
> restricting it to always-inline we're not going to notice issues very
> reliably I think.
> 

Got it, will extend it to support all inlinable functions in the next version.

>>> On the streaming side we possibly have to take care of the
>>> GPU offloading path, where we likely want to avoid pushing host target
>>> bits to the GPU target in some way.
>>>
>>
>> I guess this comment is about lto_stream_offload_p.  I just did some quick
>> checks: this flag seems to guard things going into the offload_lto section,
>> while the function summary has its own section, so it seems fine?
> 
> Yes, but the data, since it's target-specific, is interpreted differently

[PATCH V2] Set bound/cmp/control for until wrap loop.

2021-09-02 Thread Jiufu Guo via Gcc-patches


Changes since V1:
* Add more test cases
* Add a comment for the exit-condition transform
* Remove the duplicate setting of niter->control

This patch resets niter->control, niter->bound and niter->cmp in
number_of_iterations_until_wrap.

Bootstrap and testing pass on ppc64 and x86, and the test cases
in the PR pass.  Is this OK for trunk?

One note: in this patch, IVbase is still biased by 1 step.


BR.
Jiufu Guo

gcc/ChangeLog:

2021-09-02  Jiufu Guo  

PR tree-optimization/102087
* tree-ssa-loop-niter.c (number_of_iterations_until_wrap):
Update bound/cmp/control for niter.

gcc/testsuite/ChangeLog:

2021-09-02  Jiufu Guo  

PR tree-optimization/102087
* gcc.dg/pr102087.c: New test.

---
 gcc/tree-ssa-loop-niter.c   | 16 ++-
 gcc/testsuite/gcc.dg/pr102087.c | 35 +
 2 files changed, 50 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr102087.c

diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c
index 7af92d1c893..75109407124 100644
--- a/gcc/tree-ssa-loop-niter.c
+++ b/gcc/tree-ssa-loop-niter.c
@@ -1482,7 +1482,7 @@ number_of_iterations_until_wrap (class loop *, tree type, affine_iv *iv0,
 affine_iv *iv1, class tree_niter_desc *niter)
 {
   tree niter_type = unsigned_type_for (type);
-  tree step, num, assumptions, may_be_zero;
+  tree step, num, assumptions, may_be_zero, span;
   wide_int high, low, max, min;
 
   may_be_zero = fold_build2 (LE_EXPR, boolean_type_node, iv1->base, iv0->base);
@@ -1557,6 +1557,20 @@ number_of_iterations_until_wrap (class loop *, tree type, affine_iv *iv0,
 
   niter->control.no_overflow = false;
 
+  /* Update bound and exit condition as:
+ bound = niter * STEP + (IVbase - STEP).
+ { IVbase - STEP, +, STEP } != bound
+ Here, biasing IVbase by 1 step makes 'bound' be the value before wrap.
+ */
+  niter->control.base = fold_build2 (MINUS_EXPR, niter_type,
+niter->control.base, niter->control.step);
+  span = fold_build2 (MULT_EXPR, niter_type, niter->niter,
+ fold_convert (niter_type, niter->control.step));
+  niter->bound = fold_build2 (PLUS_EXPR, niter_type, span,
+ fold_convert (niter_type, niter->control.base));
+  niter->bound = fold_convert (type, niter->bound);
+  niter->cmp = NE_EXPR;
+
   return true;
 }
 
diff --git a/gcc/testsuite/gcc.dg/pr102087.c b/gcc/testsuite/gcc.dg/pr102087.c
new file mode 100644
index 000..fc60cbda066
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr102087.c
@@ -0,0 +1,35 @@
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+
+unsigned __attribute__ ((noinline))
+foo (int *__restrict__ a, int *__restrict__ b, unsigned l, unsigned n)
+{
+  while (n < ++l)
+*a++ = *b++ + 1;
+  return l;
+}
+
+volatile int a[1];
+unsigned b;
+int c;
+
+int
+check ()
+{
+  int d;
+  for (; b > 1; b++)
+for (c = 0; c < 2; c++)
+  for (d = 0; d < 2; d++)
+   a[0];
+  return 0;
+}
+
+char **Gif_ClipImage_gfi_0;
+int Gif_ClipImage_y, Gif_ClipImage_shift;
+void
+Gif_ClipImage ()
+{
+  for (; Gif_ClipImage_y >= Gif_ClipImage_shift; Gif_ClipImage_y++)
+Gif_ClipImage_gfi_0[Gif_ClipImage_shift]
+  = Gif_ClipImage_gfi_0[Gif_ClipImage_y];
+}
-- 
2.17.1



Re: [PATCH]AArch64 RFC: Don't cost all scalar operations during vectorization if scalar will fuse

2021-09-02 Thread Richard Biener via Gcc-patches
On Wed, Sep 1, 2021 at 3:47 PM Richard Biener
 wrote:
>
> On Tue, Aug 31, 2021 at 4:50 PM Richard Sandiford via Gcc-patches
>  wrote:
> >
> > Tamar Christina  writes:
> > > Hi All,
> > >
> > > As the vectorizer has improved in capability over time, it has started
> > > over-vectorizing.  This has caused regressions on the order of 1-7x on
> > > libraries that Arm produces.
> > >
> > > The vector costs actually do make a lot of sense and I don't think that
> > > they are wrong.  I think that the costs for the scalar code are wrong.
> > >
> > > In particular, the costing doesn't take into account that scalar operations
> > > can/will fuse, as this happens in RTL.  Because of this, the costs for the
> > > scalars always end up being higher.
> > >
> > > As an example the loop in PR 97984:
> > >
> > > void x (long * __restrict a, long * __restrict b)
> > > {
> > >   a[0] *= b[0];
> > >   a[1] *= b[1];
> > >   a[0] += b[0];
> > >   a[1] += b[1];
> > > }
> > >
> > > generates:
> > >
> > > x:
> > > ldp x2, x3, [x0]
> > > ldr x4, [x1]
> > > ldr q1, [x1]
> > > mul x2, x2, x4
> > > ldr x4, [x1, 8]
> > > fmov d0, x2
> > > ins v0.d[1], x3
> > > mul x1, x3, x4
> > > ins v0.d[1], x1
> > > add v0.2d, v0.2d, v1.2d
> > > str q0, [x0]
> > > ret
> > >
> > > For an actual loop, the prologue costs would make the loop too expensive, so
> > > we produce the scalar output, but with SLP there are no loop overhead costs,
> > > so we end up trying to vectorize this.  Because SLP discovery is started
> > > from the stores, we will end up vectorizing and costing the add but not the MUL.
> > >
> > > To counter this, the patch adjusts the costing when it finds an operation
> > > that can be fused and discounts the cost of the "other" operation being fused in.
> > >
> > > The attached testcase shows that even when we discount it, we still get
> > > vectorized code when it is profitable to do so, e.g. SVE.
> > >
> > > This also happens with other operations, such as scalar operations where
> > > shifts can be fused in, or e.g. bfxil.  As such, I'm sending this for feedback.
> > >
> > > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> > >
> > > Ok for master? If the approach is acceptable I can add support for more.
> > >
> > > Thanks,
> > > Tamar
> > >
> > > gcc/ChangeLog:
> > >
> > >   PR target/97984
> > >   * config/aarch64/aarch64.c (aarch64_add_stmt_cost): Check for fusing
> > >   madd.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >   PR target/97984
> > >   * gcc.target/aarch64/pr97984-1.c: New test.
> > >   * gcc.target/aarch64/pr97984-2.c: New test.
> > >   * gcc.target/aarch64/pr97984-3.c: New test.
> > >   * gcc.target/aarch64/pr97984-4.c: New test.
> > >
> > > --- inline copy of patch --
> > > diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> > > index 4cd4b037f2606e515ad8f4669d2cd13a509dd0a4..329b556311310d86aaf546d7b395a3750a9d57d4 100644
> > > --- a/gcc/config/aarch64/aarch64.c
> > > +++ b/gcc/config/aarch64/aarch64.c
> > > @@ -15536,6 +15536,39 @@ aarch64_add_stmt_cost (class vec_info *vinfo, void *data, int count,
> > >   stmt_cost = aarch64_sve_adjust_stmt_cost (vinfo, kind, stmt_info,
> > > vectype, stmt_cost);
> > >
> > > +  /* Scale costs if operation is fusing.  */
> > > +  if (stmt_info && kind == scalar_stmt)
> > > +  {
> > > + if (gassign *stmt = dyn_cast<gassign *> (STMT_VINFO_STMT (stmt_info)))
> > > +   {
> > > + switch (gimple_assign_rhs_code (stmt))
> > > + {
> > > + case PLUS_EXPR:
> > > + case MINUS_EXPR:
> > > +   {
> > > + /* Check if operation can fuse into MSUB or MADD.  */
> > > + tree rhs1 = gimple_assign_rhs1 (stmt);
> > > + if (gassign *stmt1 = dyn_cast<gassign *> (SSA_NAME_DEF_STMT (rhs1)))
> > > +   if (gimple_assign_rhs_code (stmt1) == MULT_EXPR)
> > > + {
> > > +   stmt_cost = 0;
> > > +   break;
> > > +}
> > > + tree rhs2 = gimple_assign_rhs2 (stmt);
> > > + if (gassign *stmt2 = dyn_cast<gassign *> (SSA_NAME_DEF_STMT (rhs2)))
> > > +   if (gimple_assign_rhs_code (stmt2) == MULT_EXPR)
> > > + {
> > > +   stmt_cost = 0;
> > > +   break;
> > > + }
> > > +   }
> > > +   break;
> > > + default:
> > > +   break;
> > > + }
> > > +   }
> > > +  }
> > > +
> >
> > The difficulty with this is that we can also use MLA-type operations
> > for SVE, and for Advanced SIMD if the mode is not DI.  It's not just
> > a scalar thing.
> >
> > We already take the combination 

Re: [Patch v2] C, C++, Fortran, OpenMP: Add support for device-modifiers for 'omp target device'

2021-09-02 Thread Jakub Jelinek via Gcc-patches
On Thu, Sep 02, 2021 at 02:09:25PM +0200, Marcel Vollweiler wrote:
> gcc/testsuite/ChangeLog:
> 
>   * gfortran.dg/gomp/target-device-ancestor-4.f90: Comment out dg-final
>   to avoid UNRESOLVED.

Ok, thanks.
> 
> diff --git a/gcc/testsuite/gfortran.dg/gomp/target-device-ancestor-4.f90 b/gcc/testsuite/gfortran.dg/gomp/target-device-ancestor-4.f90
> index 540b3d0..63872fa 100644
> --- a/gcc/testsuite/gfortran.dg/gomp/target-device-ancestor-4.f90
> +++ b/gcc/testsuite/gfortran.dg/gomp/target-device-ancestor-4.f90
> @@ -11,4 +11,4 @@
>  
>  end
>  
> -! { dg-final { scan-tree-dump "pragma omp target \[^\n\r)]*device\\(ancestor:1\\)" "original" } }
> +! TODO: dg-final { scan-tree-dump-times "pragma omp target \[^\n\r)]*device\\(ancestor:1\\)" 1 "original" } }


Jakub



Re: [Patch v2] C, C++, Fortran, OpenMP: Add support for device-modifiers for 'omp target device'

2021-09-02 Thread Marcel Vollweiler



On 01.09.2021 at 11:02, Jakub Jelinek wrote:

On Wed, Sep 01, 2021 at 09:06:31AM +0200, Christophe Lyon wrote:

   * gfortran.dg/gomp/target-device-ancestor-4.f90: New test.




The last new test fails on aarch64:
  /gcc/testsuite/gfortran.dg/gomp/target-device-ancestor-4.f90:7:15: Error: Sorry, 'reverse_offload' clause at (1) on REQUIRES directive is not yet supported
compiler exited with status 1
PASS: gfortran.dg/gomp/target-device-ancestor-4.f90   -O   (test for errors, line 7)
XFAIL: gfortran.dg/gomp/target-device-ancestor-4.f90   -O  sorry, unimplemented: 'ancestor' not yet supported (test for warnings, line 9)
PASS: gfortran.dg/gomp/target-device-ancestor-4.f90   -O  (test for excess errors)
gfortran.dg/gomp/target-device-ancestor-4.f90   -O  : dump file does not exist
UNRESOLVED: gfortran.dg/gomp/target-device-ancestor-4.f90   -O  scan-tree-dump original "pragma omp target [^\n\r)]*device\\(ancestor:1\\)"


It is UNRESOLVED everywhere.  Unlike the C/C++ FEs that emit the original
dump even if there are errors/sorry during parsing, the Fortran FE doesn't
do that.
So I think either the dg-final should be xfailed or removed for now.


XFAILing the dg-final does not seem to work with a missing dump (it still
results in UNRESOLVED).  Instead, I commented out the dg-final with "TODO",
similar to other tests, and hope that this is OK.



  Jakub



Marcel

-
Siemens Electronic Design Automation GmbH; Address: Arnulfstraße 201, 80634
München; limited liability company (GmbH); Managing Directors: Thomas
Heurung, Frank Thürauf; Registered office: München; Registration court:
München, HRB 106955
gcc/testsuite/ChangeLog:

* gfortran.dg/gomp/target-device-ancestor-4.f90: Comment out dg-final
to avoid UNRESOLVED.

diff --git a/gcc/testsuite/gfortran.dg/gomp/target-device-ancestor-4.f90 b/gcc/testsuite/gfortran.dg/gomp/target-device-ancestor-4.f90
index 540b3d0..63872fa 100644
--- a/gcc/testsuite/gfortran.dg/gomp/target-device-ancestor-4.f90
+++ b/gcc/testsuite/gfortran.dg/gomp/target-device-ancestor-4.f90
@@ -11,4 +11,4 @@
 
 end
 
-! { dg-final { scan-tree-dump "pragma omp target \[^\n\r)]*device\\(ancestor:1\\)" "original" } }
+! TODO: dg-final { scan-tree-dump-times "pragma omp target \[^\n\r)]*device\\(ancestor:1\\)" 1 "original" } }


Re: [RFC/PATCH] ipa-inline: Add target info into fn summary [PR102059]

2021-09-02 Thread Richard Biener via Gcc-patches
On Thu, Sep 2, 2021 at 1:13 PM Kewen.Lin  wrote:
>
> Hi Richi,
>
> Thanks for the comments!
>
> on 2021/9/2 at 5:25 PM, Richard Biener wrote:
> > On Wed, Sep 1, 2021 at 9:02 AM Kewen.Lin  wrote:
> >>
> >> Hi!
> >>
> >> Power ISA 2.07 (Power8) introduces the transactional memory feature
> >> but ISA 3.1 (Power10) removes it.  This exposes a troublesome
> >> issue, as PR102059 shows.  Users define a function with the
> >> target pragma cpu=power10, which calls a function with the
> >> attribute always_inline that inherits the command line option
> >> cpu=power8, which enables HTM implicitly.  The current isa_flags
> >> check doesn't allow this inlining due to "target specific
> >> option mismatch", and an error message is emitted.
> >>
> >> Normally, the callee function isn't intended to exploit the HTM
> >> feature, but the default flag setting makes it look like it does.
> >> As Richi raised in the PR, we have the fp_expressions flag in the
> >> function summary, which allows us to check whether the function
> >> actually contains any floating point expressions, to avoid overkill.
> >> This patch follows a similar idea but is more target specific:
> >> for this rs6000-specific requirement on the HTM feature check,
> >> we would like to check rs6000-specific HTM built-in functions
> >> and inline assembly.  This allows targets to do their own
> >> customized checks and updates.
> >>
> >> It introduces two target hooks, need_ipa_fn_target_info and
> >> update_ipa_fn_target_info.  The former allows the target to do
> >> a preliminary check and decide whether or not to collect
> >> target-specific information for this function.  In some special
> >> cases, it can predict the analysis result and push it early
> >> without any scanning.  The latter allows analyze_function_body
> >> to pass gimple stmts down, just like the fp_expressions handling,
> >> so the target can do its own tricks.  I initially made them one
> >> hook with a boolean indicating whether it's the initial call, but
> >> the code looked a bit ugly; separating them seems to give
> >> better readability.
> >>
> >> To keep it simple, this patch uses HOST_WIDE_INT to record the
> >> flags, just like what we use for isa_flags.  For rs6000's HTM
> >> need, one HOST_WIDE_INT variable is quite enough, but it seems
> >> good to have an auto_vec for scalability, as I noticed some
> >> targets have more than one HOST_WIDE_INT flag.  For now, this
> >> target information collection is only for always_inline functions;
> >> the function ipa_merge_fn_summary_after_inlining deals with
> >> merging target information.
> >>
> >> The patch has been bootstrapped and regress-tested on
> >> powerpc64le-linux-gnu Power9.
> >>
> >> Is it on the right track?
> >
> > +  if (always_inline)
> > +{
> > +  cgraph_node *callee_node = cgraph_node::get (callee);
> > +  if (ipa_fn_summaries && ipa_fn_summaries->get (callee_node) != NULL)
> > +   {
> > + if (dump_file)
> > +   ipa_dump_fn_summary (dump_file, callee_node);
> > + const vec &info =
> > +   ipa_fn_summaries->get (callee_node)->target_info;
> > + if (!info.is_empty ())
> > +   always_inline_safe_mask |= ~info[0] & OPTION_MASK_HTM;
> > +   }
> > +
> > +  caller_isa &= ~always_inline_safe_mask;
> > +  callee_isa &= ~always_inline_safe_mask;
> > +}
> >
> > that's a bit convoluted but obviously the IPA info can be used for
> > non-always_inline cases as well.
> >
> > As said above, the info can be produced for non-always-inline functions
> > as well; the usual case would be LTO inlining across TUs compiled
> > with different target options.  In your case the special -mcpu=power10
> > TU would otherwise not be able to inline from a general -mcpu=power8 TU.
> >
>
> Agreed, it can be extended to non-always_inline cases.  always_inline
> is a kind of user-"forced" requirement, and the compiler emits an error if
> it fails to inline, while non-always_inline cases only get a warning.
> Considering that the scanning might be costly for some big functions, I
> guessed it would be good to start with always_inline as the first step.
> But if different target options among LTO TUs are a common use case, I
> think it's worth extending it now.

I was merely looking at this from the perspective of test coverage - with
restricting it to always-inline we're not going to notice issues very
reliably I think.

> > On the streaming side we possibly have to take care of the
> > GPU offloading path, where we likely want to avoid pushing host target
> > bits to the GPU target in some way.
> >
>
> I guess this comment is about lto_stream_offload_p.  I just did some quick
> checks: this flag seems to guard things going into the offload_lto section,
> while the function summary has its own section, so it seems fine?

Yes, but the data, since it's target-specific, is interpreted differently by
the host target than by the offload target, so IMHO we should drop this
to a conservative state when offloading?

> > Your case is specifically lookin

Re: [PATCH] avoid transform at run until wrap comparesion

2021-09-02 Thread Jiufu Guo via Gcc-patches
Richard Biener  writes:

> On Thu, 2 Sep 2021, Jiufu Guo wrote:
>
>> When transforming
>>   {iv0.base, iv0.step} LT/LE {iv1.base, iv1.step}
>> to
>>   {iv0.base, iv0.step - iv1.step} LT/LE {iv1.base, 0}
>> 
>> there would be an error if 'iv0.step - iv1.step' is negative,
>> which means running until wrap/overflow.
>> 
>> For example:
>>    {1, +, 1} <= {4, +, 3} => {1, +, -2} <= {4, +, 0}
>> 
>> This patch avoids this kind of transform.
>> 
>> Bootstrap and regtest pass on ppc64le.
>> Is this ok for trunk?
>
> This looks like PR100740, see the discussion starting at
> https://gcc.gnu.org/pipermail/gcc-patches/2021-June/571570.html
Oh, thanks!
>
> We seem to be at a dead end figuring what's exactly required
> to make the transform valid and I have my doubts that arguing
> with overflow we're not running in circles.
>
> My last attempt was
>
> +  if (code != NE_EXPR)
> +   {
> + if (TREE_CODE (step) != INTEGER_CST)
> +   return false;
> + if (!iv0->no_overflow || !iv1->no_overflow)
> +   return false;
> + /* The new IV does not overflow if the step has the same
> +sign and is less in magnitude.  */
> + if (tree_int_cst_sign_bit (iv0->step) != tree_int_cst_sign_bit (step)
> + || wi::geu_p (wi::abs (wi::to_wide (step)),
> +   wi::abs (wi::to_wide (iv0->step
> +   return false;
> +   }
>
> where your patch at least misses { 1, +, -1 } < { -3, + , -3 }
> transforming to { 1, +, +2 } < -3, thus a positive step but
> we're still iterating until wrap?  That is, the sign of the
> combined step cannot really be the issue at hand.
Hmm, this transform may be invalid in a few cases.
If "iv0->step - iv1->step" is positive, there may still be some
cases that are not valid for LE/LT, like the example you gave (or
{0, +, 5} < {MAX - 30, 3}: iv0 walks faster, but iv1 overflows
early).
Also, cycles or infinite loops may occur in NE cases.

Similar to this patch's check of whether the combined STEP is negative,
I found that the linked thread also discusses
'if (tree_int_cst_lt (iv0->step, iv1->step))'.
For these LE/LT cases, the transformation seems to always be
invalid, right?

>
> Bin argued my condition is too strict and I agreed, see
> https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574334.html
> which is the last mail in the thread so far (still without a fix :/)
>
> Somewhere it was said that we eventually should simply put
> preconditions into ->assumptions rather than trying to
> check ->no_overflow and friends, not sure if that's really
> applicable here.
Yep, adding to ->assumptions could help in some cases;
the assumption may include "checking on iv0.base/iv1.base".

Take the example "{0, +, 5} < {MAX - 30, 3}": it becomes
"{0, +, 2} < {MAX - 30, 0}", which calls number_of_iterations_lt
and gets a niter that is not the same as the real niter
(the original real niter may be 10: 30/3, or less if iv1 does not
overflow).
For "{base0, +, 5} < {base1, 3}", we may get a niter from
"{base0, +, 0} < {base1, 3}", and then use an 'assumption' that
checks whether the niter is valid.

And for NE_EXPR, the assumption may be more complicated
because of possible cycles/infinite loops.

BR.
Jiufu

>
> Richard.
>
>> BR.
>> Jiufu
>> 
>> gcc/ChangeLog:
>> 
>> 2021-09-02  Jiufu Guo  
>> 
>>  PR tree-optimization/102131
>>  * tree-ssa-loop-niter.c (number_of_iterations_cond):
>>  Avoid transform until wrap condition
>> 
>> gcc/testsuite/ChangeLog:
>> 
>> 2021-09-02  Jiufu Guo  
>> 
>>  PR tree-optimization/102131
>>  * gcc.dg/pr102131.c: New test.
>> 
>> ---
>>  gcc/tree-ssa-loop-niter.c   | 18 +
>>  gcc/testsuite/gcc.dg/pr102131.c | 69 +
>>  2 files changed, 87 insertions(+)
>>  create mode 100644 gcc/testsuite/gcc.dg/pr102131.c
>> 
>> diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c
>> index 7af92d1c893..52ce517af89 100644
>> --- a/gcc/tree-ssa-loop-niter.c
>> +++ b/gcc/tree-ssa-loop-niter.c
>> @@ -1866,6 +1866,24 @@ number_of_iterations_cond (class loop *loop,
>>|| !iv0->no_overflow || !iv1->no_overflow))
>>  return false;
>>  
>> +  /* GT/GE has been transformed to LT/LE already.
>> +cmp_code could be LT, LE or NE
>> +
>> +For LE/LT transform
>> +{iv0.base, iv0.step} LT/LE {iv1.base, iv1.step}
>> +to
>> +{iv0.base, iv0.step - iv1.step} LT/LE {iv1.base, 0}
>> +Negative iv0.step - iv1.step means decreasing until wrap,
>> +then the transform is not accurate.
>> +
>> +For example:
>> +{1, +, 1} <= {4, +, 3}
>> +is not same with
>> +{1, +, -2} <= {4, +, 0}
>> +   */
>> +  if ((code == LE_EXPR || code == LT_EXPR) && tree_int_cst_sign_bit 
>> (step))
>> +return false;
>> +
>>iv0->step = step;
>>if (!POINTER_TYPE_P (type))
>>  iv0->no_overflow = false;
>> diff --git a/gcc/testsuite/gcc.dg/pr102131.c 
>> b/gcc/testsuite/gcc.dg/pr102131.c
>> new file mode 100644
>> index 0

Re: [PATCH V3 0/6] Initial support for AVX512FP16

2021-09-02 Thread Iain Sandoe via Gcc-patches
Hi Hongtao.

> On 2 Sep 2021, at 07:06, Hongtao Liu via Gcc-patches 
>  wrote:
> 
> I'm going to check in the first 3 patches which are already approved.
> 
>  Update hf soft-fp from glibc.
>  [i386] Enable _Float16 type for TARGET_SSE2 and above.
>  [i386] libgcc: Enable hfmode soft-sf/df/xf/tf extensions and
>truncations.

Bootstrap on Darwin x86_64 is broken on at least AVX512 and i5 cpus at revision
r12-3311-g1e6267b33526.

"fp-machine.h:81:22: error: unknown type name 'TFtype'; did you mean 'HFtype'?"

any immediate ideas on what might be the issue?
thanks
Iain

> 
> On Mon, Aug 2, 2021 at 2:31 PM liuhongt  wrote:
>> 
>> Update from v2:
>> 
>> 1. Support -fexcess-precision=16 which will enable
>> FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 when backend supports _Float16.
>> 2. Update ix86_get_excess_precision, so -fexcess-precision=standard
>> should not do anything different from -fexcess-precision=fast
>> regarding _Float16.
>> 3. Avoiding macroization of HFmode patterns.
>> 4. Allow (subreg:SI (reg:HF)).
>> 5. Update documents corresponding exactly to the code changes in
>> the same patch.
>> 6. Following the 32-bit ABI, pass vector _Float16 in SSE registers
>> in 32-bit mode, not on the stack.
>> 
>> Guo, Xuepeng (1):
>>  AVX512FP16: Initial support for AVX512FP16 feature and scalar _Float16
>>instructions.
>> 
>> liuhongt (5):
>>  Update hf soft-fp from glibc.
>>  [i386] Enable _Float16 type for TARGET_SSE2 and above.
>>  [i386] libgcc: Enable hfmode soft-sf/df/xf/tf extensions and
>>truncations.
>>  Support -fexcess-precision=16 which will enable
>>FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16 when backend supports _Float16.
>>  AVX512FP16: Support vector init/broadcast/set/extract for FP16.
>> 
>> gcc/ada/gcc-interface/misc.c  |   3 +
>> gcc/c-family/c-common.c   |   6 +-
>> gcc/c-family/c-cppbuiltin.c   |   6 +-
>> gcc/common.opt|   5 +-
>> gcc/common/config/i386/cpuinfo.h  |   2 +
>> gcc/common/config/i386/i386-common.c  |  26 +-
>> gcc/common/config/i386/i386-cpuinfo.h |   1 +
>> gcc/common/config/i386/i386-isas.h|   1 +
>> gcc/config.gcc|   2 +-
>> gcc/config/aarch64/aarch64.c  |   1 +
>> gcc/config/arm/arm.c  |   1 +
>> gcc/config/i386/avx512fp16intrin.h| 225 ++
>> gcc/config/i386/cpuid.h   |   1 +
>> gcc/config/i386/i386-builtin-types.def|   7 +-
>> gcc/config/i386/i386-builtins.c   |  23 +
>> gcc/config/i386/i386-c.c  |   2 +
>> gcc/config/i386/i386-expand.c | 129 +-
>> gcc/config/i386/i386-isa.def  |   1 +
>> gcc/config/i386/i386-modes.def|  13 +-
>> gcc/config/i386/i386-options.c|   4 +-
>> gcc/config/i386/i386.c| 243 +--
>> gcc/config/i386/i386.h|  29 +-
>> gcc/config/i386/i386.md   | 291 -
>> gcc/config/i386/i386.opt  |   4 +
>> gcc/config/i386/immintrin.h   |   4 +
>> gcc/config/i386/sse.md| 397 +-
>> gcc/config/m68k/m68k.c|   2 +
>> gcc/config/s390/s390.c|   2 +
>> gcc/coretypes.h   |   3 +-
>> gcc/doc/extend.texi   |  22 +
>> gcc/doc/invoke.texi   |  10 +-
>> gcc/doc/tm.texi   |  14 +-
>> gcc/doc/tm.texi.in|   3 +
>> gcc/emit-rtl.c|   5 +
>> gcc/flag-types.h  |   3 +-
>> gcc/fortran/options.c |   3 +
>> gcc/lto/lto-lang.c|   3 +
>> gcc/target.def|  11 +-
>> gcc/testsuite/g++.dg/other/i386-2.C   |   2 +-
>> gcc/testsuite/g++.dg/other/i386-3.C   |   2 +-
>> gcc/testsuite/g++.target/i386/float16-1.C |   8 +
>> gcc/testsuite/g++.target/i386/float16-2.C |  14 +
>> gcc/testsuite/g++.target/i386/float16-3.C |  10 +
>> gcc/testsuite/gcc.target/i386/avx-1.c |   2 +-
>> gcc/testsuite/gcc.target/i386/avx-2.c |   2 +-
>> gcc/testsuite/gcc.target/i386/avx512-check.h  |   3 +
>> .../gcc.target/i386/avx512fp16-12a.c  |  21 +
>> .../gcc.target/i386/avx512fp16-12b.c  |  27 ++
>> gcc/testsuite/gcc.target/i386/float16-3a.c|  10 +
>> gcc/testsuite/gcc.target/i386/float16-3b.c|  10 +
>> gcc/testsuite/gcc.target/i386/float16-4a.c|  10 +
>> gcc/testsuite/gcc.target/i386/float16-4b.c|  10 +
>> gcc/testsuite/gcc.target/i386/float16-5.c |  12 +
>> gcc/testsuite/gcc.target/i386/float16-6.c |   8 +
>> gcc/testsuite/gcc.target/i386/funcspec-56.inc |   2 +
>> gcc/testsuite/gcc.target/i386/pr54855-12.c|  14 +
>> gcc/test

Re: [RFC/PATCH] ipa-inline: Add target info into fn summary [PR102059]

2021-09-02 Thread Kewen.Lin via Gcc-patches
Hi Richi,

Thanks for the comments!

on 2021/9/2 5:25 PM, Richard Biener wrote:
> On Wed, Sep 1, 2021 at 9:02 AM Kewen.Lin  wrote:
>>
>> Hi!
>>
>> Power ISA 2.07 (Power8) introduces the transactional memory (HTM)
>> feature, but ISA 3.1 (Power10) removes it.  This exposes a
>> troublesome issue, as PR102059 shows.  Users define a function
>> with target pragma cpu=power10, and it calls a function with
>> attribute always_inline which inherits the command-line option
>> cpu=power8, which enables HTM implicitly.  The current isa_flags
>> check doesn't allow this inlining due to "target specific
>> option mismatch", and an error message is emitted.
>>
>> Normally, the callee function isn't intended to exploit the HTM
>> feature, but the default flag setting makes it look as if it does.
>> As Richi raised in the PR, we have the fp_expressions flag in the
>> function summary, which lets us check whether the function actually
>> contains any floating-point expressions, to avoid overkill.  This
>> patch follows a similar idea but is more target specific: for this
>> rs6000 port-specific requirement on the HTM feature check, we would
>> like to check for rs6000-specific HTM built-in functions and inline
>> assembly.  The approach allows targets to do their own customized
>> checks and updates.
>>
>> It introduces two target hooks, need_ipa_fn_target_info and
>> update_ipa_fn_target_info.  The former allows the target to do
>> a preliminary check and decide whether to collect target-specific
>> information for this function.  In some special cases it can
>> predict the analysis result and push it early without any
>> scanning.  The latter allows analyze_function_body to pass gimple
>> stmts down, just like the fp_expressions handling, so the target
>> can do its own tricks.  I initially had them as one hook with a
>> boolean to indicate whether it's the initial call, but the code
>> looked a bit ugly; separating them seems more readable.
>>
>> To keep it simple, this patch uses HOST_WIDE_INT to record the
>> flags, just like what we use for isa_flags.  For rs6000's HTM
>> need, one HOST_WIDE_INT variable is quite enough, but it seems
>> good to have an auto_vec for scalability, as I noticed some
>> targets have more than one HOST_WIDE_INT flag.  For now, this
>> target information is only collected for always_inline
>> functions; ipa_merge_fn_summary_after_inlining deals with merging
>> the target information.
>>
>> The patch has been bootstrapped and regress-tested on
>> powerpc64le-linux-gnu Power9.
>>
>> Is it on the right track?
> 
> +  if (always_inline)
> +{
> +  cgraph_node *callee_node = cgraph_node::get (callee);
> +  if (ipa_fn_summaries && ipa_fn_summaries->get (callee_node) != NULL)
> +   {
> + if (dump_file)
> +   ipa_dump_fn_summary (dump_file, callee_node);
> + const vec<HOST_WIDE_INT> &info =
> +   ipa_fn_summaries->get (callee_node)->target_info;
> + if (!info.is_empty ())
> +   always_inline_safe_mask |= ~info[0] & OPTION_MASK_HTM;
> +   }
> +
> +  caller_isa &= ~always_inline_safe_mask;
> +  callee_isa &= ~always_inline_safe_mask;
> +}
> 
> that's a bit convoluted but obviously the IPA info can be used for
> non-always_inline cases as well.
> 
> As said above the info can be produced for not always-inline functions
> as well, the usual case would be for LTO inlining across TUs compiled
> with different target options.  In your case the special -mcpu=power10
> TU would otherwise not be able to inline from a general -mcpu=power8 TU.
> 

I agree it can be extended to non-always_inline cases.  always_inline is
a kind of user-"forced" requirement, and the compiler emits an error if
it fails to inline, while a non-always_inline function only gets a
warning instead.  Since the scanning might be considered costly for some
big functions, I guessed it would be good to start with always_inline as
the first step.  But if different target options among LTO TUs is a
common use case, I think it's worth extending it now.

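The masking in the quoted always_inline hunk can be modeled on plain integers.  This is only a sketch under stated assumptions: the bit values and the function `can_inline_isa` are hypothetical, `info0` stands for `info[0]` from the callee's `target_info`, `have_info` for the summary being present and non-empty, and we assume the subset rule (callee ISA flags must be a subset of the caller's) that the rs6000 inlining hook applies.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit values for illustration only; the real rs6000
   OPTION_MASK_* encodings differ.  */
#define OPTION_MASK_HTM   (UINT64_C (1) << 0)
#define OPTION_MASK_OTHER (UINT64_C (1) << 1) /* stands for all other ISA bits */

/* Sketch of the quoted hunk: when the callee is always_inline and the
   first word of its target info (info0) has the HTM bit clear, ignore
   HTM on both sides; then require callee_isa to be a subset of
   caller_isa.  */
static bool
can_inline_isa (uint64_t caller_isa, uint64_t callee_isa,
                bool always_inline, bool have_info, uint64_t info0)
{
  uint64_t always_inline_safe_mask = 0;

  if (always_inline && have_info)
    always_inline_safe_mask |= ~info0 & OPTION_MASK_HTM;

  caller_isa &= ~always_inline_safe_mask;
  callee_isa &= ~always_inline_safe_mask;
  return (caller_isa & callee_isa) == callee_isa;
}
```

With a cpu=power10 caller (no HTM bit) and a cpu=power8 callee (HTM bit set), inlining is rejected without summary info, allowed when `info0` says the callee uses no HTM, and rejected again when `info0` has the HTM bit set.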
> On the streaming side we possibly have to take care about the
> GPU offloading path where we likely want to avoid pushing host target
> bits to the GPU target in some way.
> 

I guess this comment is about lto_stream_offload_p.  I did some quick
checks: this flag seems to guard what goes into the offload_lto section,
while the function summary has its own section, so it seems fine?

> Your case is specifically looking for HTM target builtins - for more general
> cases, like for example deciding whether we can inline across, say,
> -mlzcnt on x86 the scanning would have to include at least direct internal-fns
> mapping to optabs that could change.  With such inlining we might also
> work against heuristic tuning decisions based on the ISA availability
> making code (much) more expensive to expand without such ISA availability,
> but that wouldn't be a correctness issue then.

OK, I would assume the hook's gimple * parameter will also be enough for
this example.  :)

IMHO, even with this

Re: [PATCH] avoid transform at run until wrap comparesion

2021-09-02 Thread Bin.Cheng via Gcc-patches
On Thu, Sep 2, 2021 at 6:18 PM Richard Biener  wrote:
>
> On Thu, 2 Sep 2021, Jiufu Guo wrote:
>
> > When transforming
> >   {iv0.base, iv0.step} LT/LE {iv1.base, iv1.step}
> > to
> >   {iv0.base, iv0.step - iv1.step} LT/LE {iv1.base, 0}
> >
> > There would be an error if 'iv0.step - iv1.step' is negative,
> > which means running until wrap/overflow.
> >
> > For example:
> >{1, +, 1} <= {4, +, 3} => {1, +, -2} <= {4, +, 0}
> >
> > This patch avoids this kind of transform.
> >
> > Bootstrap and regtest pass on ppc64le.
> > Is this ok for trunk?
>
> This looks like PR100740, see the discussion starting at
> https://gcc.gnu.org/pipermail/gcc-patches/2021-June/571570.html
>
> We seem to be at a dead end figuring what's exactly required
> to make the transform valid and I have my doubts that arguing
> with overflow we're not running in circles.
>
> My last attempt was
>
> +  if (code != NE_EXPR)
> +   {
> + if (TREE_CODE (step) != INTEGER_CST)
> +   return false;
> + if (!iv0->no_overflow || !iv1->no_overflow)
> +   return false;
> + /* The new IV does not overflow if the step has the same
> +sign and is less in magnitude.  */
> + if (tree_int_cst_sign_bit (iv0->step) != tree_int_cst_sign_bit (step)
> + || wi::geu_p (wi::abs (wi::to_wide (step)),
> +   wi::abs (wi::to_wide (iv0->step))))
> +   return false;
> +   }
>
> where your patch at least misses { 1, +, -1 } < { -3, + , -3 }
> transforming to { 1, +, +2 } < -3, thus a positive step but
> we're still iterating until wrap?  That is, the sign of the
> combined step cannot really be the issue at hand.
>
> Bin argued my condition is too strict and I agreed, see
> https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574334.html
> which is the last mail in the thread so far (still without a fix :/)
>
> Somewhere it was said that we eventually should simply put
> preconditions into ->assumptions rather than trying to
> check ->no_overflow and friends, not sure if that's really
I did some experiments that can fix the PRs; however, they cause new
failures in graphite, possibly because of missing cases.  I didn't have
time to look into the graphite tests back then.

Thanks,
bin
> applicable here.
>
> Richard.
>
> > BR.
> > Jiufu
> >
> > gcc/ChangeLog:
> >
> > 2021-09-02  Jiufu Guo  
> >
> >   PR tree-optimization/102131
> >   * tree-ssa-loop-niter.c (number_of_iterations_cond):
> >   Avoid transform until wrap condition
> >
> > gcc/testsuite/ChangeLog:
> >
> > 2021-09-02  Jiufu Guo  
> >
> >   PR tree-optimization/102131
> >   * gcc.dg/pr102131.c: New test.
> >
> > ---
> >  gcc/tree-ssa-loop-niter.c   | 18 +
> >  gcc/testsuite/gcc.dg/pr102131.c | 69 +
> >  2 files changed, 87 insertions(+)
> >  create mode 100644 gcc/testsuite/gcc.dg/pr102131.c
> >
> > diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c
> > index 7af92d1c893..52ce517af89 100644
> > --- a/gcc/tree-ssa-loop-niter.c
> > +++ b/gcc/tree-ssa-loop-niter.c
> > @@ -1866,6 +1866,24 @@ number_of_iterations_cond (class loop *loop,
> > || !iv0->no_overflow || !iv1->no_overflow))
> >   return false;
> >
> > +  /* GT/GE has been transformed to LT/LE already.
> > + cmp_code could be LT, LE or NE
> > +
> > + For LE/LT transform
> > + {iv0.base, iv0.step} LT/LE {iv1.base, iv1.step}
> > + to
> > + {iv0.base, iv0.step - iv1.step} LT/LE {iv1.base, 0}
> > + Negative iv0.step - iv1.step means decreasing until wrap,
> > + then the transform is not accurate.
> > +
> > + For example:
> > + {1, +, 1} <= {4, +, 3}
> > + is not same with
> > + {1, +, -2} <= {4, +, 0}
> > +   */
> > +  if ((code == LE_EXPR || code == LT_EXPR) && tree_int_cst_sign_bit (step))
> > + return false;
> > +
> >iv0->step = step;
> >if (!POINTER_TYPE_P (type))
> >   iv0->no_overflow = false;
> > diff --git a/gcc/testsuite/gcc.dg/pr102131.c b/gcc/testsuite/gcc.dg/pr102131.c
> > new file mode 100644
> > index 000..0fcfaa132da
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/pr102131.c
> > @@ -0,0 +1,69 @@
> > +/* { dg-do run } */
> > +
> > +unsigned int a;
> > +int
> > +fun1 ()
> > +{
> > +  unsigned b = 0;
> > +  int c = 1;
> > +  for (; b < 3; b++)
> > +{
> > +  while (c < b)
> > + __builtin_abort ();
> > +  for (a = 0; a < 3; a++)
> > + c++;
> > +}
> > +  return 0;
> > +}
> > +
> > +unsigned b;
> > +int
> > +fun2 ()
> > +{
> > +  unsigned c = 0;
> > +  for (a = 0; a < 2; a++)
> > +for (b = 0; b < 2; b++)
> > +  if (++c < a)
> > + __builtin_abort ();
> > +  return 0;
> > +}
> > +
> > +int
> > +fun3 (void)
> > +{
> > +  unsigned a, b, c = 0;
> > +  for (a = 0; a < 10; a++)
> > +{
> > +  for (b = 0; b < 2; b++)
> > + {
> > +   c++;
> > +   if (c < a)
> > + {
> > +   __builtin_abort (

Re: [PATCH] tree-optimization/102155 - fix LIM fill_always_executed_in CFG walk

2021-09-02 Thread Richard Biener via Gcc-patches
On Thu, 2 Sep 2021, Xionghu Luo wrote:

> 
> 
> On 2021/9/2 16:50, Richard Biener wrote:
> > On Thu, 2 Sep 2021, Richard Biener wrote:
> > 
> >> On Thu, 2 Sep 2021, Xionghu Luo wrote:
> >>
> >>>
> >>>
> >>> On 2021/9/1 17:58, Richard Biener wrote:
>  This fixes the CFG walk order of fill_always_executed_in to use
 RPO order rather than the dominator based order computed by
>  get_loop_body_in_dom_order.  That fixes correctness issues with
>  unordered dominator children.
> 
>  The RPO order computed by rev_post_order_and_mark_dfs_back_seme in
>  its for-iteration mode is a good match for the algorithm.
> 
>  Xionghu, I've tried to only fix the CFG walk order issue and not
>  change anything else with this so we have a more correct base
>  to work against.  The code still walks inner loop bodies
>  up to loop depth times and thus is quadratic in the loop depth.
> 
>  Bootstrapped and tested on x86_64-unknown-linux-gnu, if you don't
>  have any comments I plan to push this and then revisit what we
>  were circling around.
> >>>
> >>> LGTM, thanks.
> >>
> >> I pushed it, thought again in the attempt to build a testcase and
> >> concluded I was wrong with the apparent mishandling of
> >> contains_call - get_loop_body_in_dom_order seems to be exactly
> >> correct for this specific case.  So I reverted the commit again.
> > 
> > And I figured what the
> > 
> >/* In a loop that is always entered we may proceed anyway.
> >   But record that we entered it and stop once we leave it.
> > */
> > 
> > comment was about.  The code was present before the fix for PR78185
> > and it was supposed to catch the case where the entered inner loop
> > is not finite.  Just as the testcase from PR78185 shows the
> > stopping was done too late when the exit block was already marked
> > as to be always executed.  A simpler fix for PR78185 would have been
> > to move
> > 
> >if (!flow_bb_inside_loop_p (inn_loop, bb))
> >  break;
> > 
> > before setting of last = bb.  In fact the installed fix was more
> > pessimistic than that given it terminated already when entering
> > a possibly infinite loop.  So we can improve that by doing
> > something like this, which should also improve the situation for some of
> > the cases you were looking at?
> > 
> > What remains is that we continue to stop when entering a
> > not always executed loop:
> > 
> >if (bb->loop_father->header == bb)
> >  {
> >if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
> >  break;
> 
> Yes.  This causes the blocks after the inner loop to miss being checked
> for whether they are actually ALWAYS_EXECUTED.  I am afraid O(N^2) is
> inevitable here...

Yes.  What we can try is pre-computing whether a loop
has a call or an inner loop that might not terminate, and then,
when that's true for the loop to be entered, continue to break;
but when not, skip processing that loop's blocks (but we still
fill the blocks array, and we do need to do this in the order
for the loop we're processing ...).

So what I was thinking was to somehow embed the dominator
walk of get_loop_body_in_dom_order and instead of pre-recording
the above info (call, infinite loop) for loops, pre-record
it on the dominator tree so that we can ask "in any of our
dominator children, is there a call or an infinite loop" and
thus cut the dominator walk at loop header blocks that are
not dominating the outer loop latch ...

Of course the simplistic solution might be to simply do

   if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb)
       && ((loop_depth (bb->loop_father) - loop_depth (loop))
           > param_max_lim_loop_depth_lookahead))
     break;

and thus limit the processing of conditionally executed inner
loops by relative depth ... as you say the actual processing
is unlikely to be the bottleneck for the degenerate cases
of a very deep nest of conditionally executed loops.

But still for this case get_loop_body_in_dom_order is
doing quadratic processing so we can also say that
another linear walk over the produced array does not
increase complexity.

> > 
> > that I can at this point only explain by possible efficiency
> > concerns?  Any better idea on that one?
> From experiments, breaking early out of the inner loop does not seem to
> take less time than a full inner loop walk.  I will take more precise
> measurements with a larger data set on fill_always_executed_in_1
> if necessary.
> 
> My previous v2 patch also tried to update inn_loop level by level
> when exiting from inner loops, but that proved to be unnecessary,
> though you were worried about the dominance order produced by
> get_loop_body_in_dom_order.
> 
> > 
> > I'm going to test the patch below which improves the situation for
> > 
> > volatile int flag, bar;
> > double foo (double *valp)
> > {
> >double sum = 0;
> >for (int i = 0; i < 256; ++i)
> >  {
> >for (int j = 0; j < 256; ++j)
>

Re: [PATCH] PR fortran/93834 - [9/10/11/12 Regression] ICE in trans_caf_is_present, at fortran/trans-intrinsic.c:8469

2021-09-02 Thread Tobias Burnus

Hi Harald,

On 24.08.21 22:36, Harald Anlauf via Fortran wrote:

here's a pretty obvious one: we didn't properly check the arguments
for intrinsics when these had to be ALLOCATABLE and in the case that
argument was a coarray object.  Simple solution: just reuse a check
that was used for pointer etc.

Regtested on x86_64-pc-linux-gnu.  OK for mainline / backports?

...

  PR fortran/93834
  * check.c (allocatable_check): A coindexed array element is not an
  allocatable object.

First, I think the patch does not do what's on its label:

+  if (attr.codimension && gfc_is_coindexed (e))
+{


Consider:

type t
  integer, allocatable :: a
end type t

type(t) :: var[*]
print *, allocated(var%a)
print *, allocated(var[1]%a)
end

I think the pointer check has the same issue.
It should be sufficient to get rid of the
attr.codimension test.

 * * *

Note regarding pointers: F2018:C1542 also does not
apply to intrinsics, cf. note below C1542 (quoted below).

 * * *

By itself, I do not see why accessing the value of a
coindexed variable is a problem – just (de)allocating it should
cause problems.

With pointers, undefined might be an additional issue.

Thus, while
 allocate( coindexed object )
has issues and is invalid – all refs to F2018:
 C950 (R932) An allocate-object shall not be a coindexed object.

I do not see why
 allocated ( ...)
should be invalid; in particular, just a NULL value check is needed.

Likewise for
 associated  ( )

Besides exceptions for polymorphic allocatables, I find:

C1537 An actual argument that is a coindexed object shall not have a pointer 
ultimate component.

C1542 The actual argument corresponding to a dummy pointer shall not be a 
coindexed object.
Note 1: Constraint C1542 does not apply to any intrinsic procedure because an 
intrinsic procedure is defined in
terms of its actual arguments.

For allocatables, there is:
"If the actual argument is a coindexed object with an allocatable ultimate
component, the dummy argument shall have the INTENT (IN) or the VALUE 
attribute."


Tobias

-
Siemens Electronic Design Automation GmbH; Anschrift: Arnulfstraße 201, 80634 
München; Gesellschaft mit beschränkter Haftung; Geschäftsführer: Thomas 
Heurung, Frank Thürauf; Sitz der Gesellschaft: München; Registergericht 
München, HRB 106955


Re: [PATCH] avoid transform at run until wrap comparesion

2021-09-02 Thread Richard Biener via Gcc-patches
On Thu, 2 Sep 2021, Jiufu Guo wrote:

> When transforming
>   {iv0.base, iv0.step} LT/LE {iv1.base, iv1.step}
> to
>   {iv0.base, iv0.step - iv1.step} LT/LE {iv1.base, 0}
> 
> There would be an error if 'iv0.step - iv1.step' is negative,
> which means running until wrap/overflow.
> 
> For example:
>{1, +, 1} <= {4, +, 3} => {1, +, -2} <= {4, +, 0}
> 
> This patch avoids this kind of transform.
> 
> Bootstrap and regtest pass on ppc64le.
> Is this ok for trunk?

This looks like PR100740, see the discussion starting at
https://gcc.gnu.org/pipermail/gcc-patches/2021-June/571570.html

We seem to be at a dead end figuring what's exactly required
to make the transform valid and I have my doubts that arguing
with overflow we're not running in circles.

My last attempt was

+  if (code != NE_EXPR)
+   {
+ if (TREE_CODE (step) != INTEGER_CST)
+   return false;
+ if (!iv0->no_overflow || !iv1->no_overflow)
+   return false;
+ /* The new IV does not overflow if the step has the same
+sign and is less in magnitude.  */
+ if (tree_int_cst_sign_bit (iv0->step) != tree_int_cst_sign_bit (step)
+ || wi::geu_p (wi::abs (wi::to_wide (step)),
+   wi::abs (wi::to_wide (iv0->step))))
+   return false;
+   }

where your patch at least misses { 1, +, -1 } < { -3, + , -3 }
transforming to { 1, +, +2 } < -3, thus a positive step but
we're still iterating until wrap?  That is, the sign of the
combined step cannot really be the issue at hand.
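The counterexample can be checked with a small simulation.  This is a sketch, not GCC code: `count_lt_iters` and `combined_step_ok` are hypothetical helpers, the IVs are evaluated modulo 256 only so that wrapping happens within a few steps, and `combined_step_ok` restates just the sign/magnitude part of the quoted condition on plain `long` values rather than on `wide_int`.

```c
#include <stdbool.h>

/* Hypothetical helper: count how many times a < b holds before it first
   fails, with a = {base0, +, step0} and b = {base1, +, step1} evaluated
   in 8-bit unsigned arithmetic (both wrap modulo 256).  */
static unsigned
count_lt_iters (unsigned char base0, unsigned char step0,
                unsigned char base1, unsigned char step1)
{
  unsigned char a = base0, b = base1;
  unsigned n = 0;

  while (a < b && n < 1000)   /* the cap only guards the simulation */
    {
      a = (unsigned char) (a + step0);
      b = (unsigned char) (b + step1);
      n++;
    }
  return n;
}

/* Plain-integer restatement of the quoted check: the combined step
   (iv0.step - iv1.step) is accepted only if it has the same sign as
   iv0.step and is strictly smaller in magnitude.  */
static bool
combined_step_ok (long iv0_step, long combined_step)
{
  bool same_sign = (iv0_step < 0) == (combined_step < 0);
  long mag0 = iv0_step < 0 ? -iv0_step : iv0_step;
  long magc = combined_step < 0 ? -combined_step : combined_step;

  return same_sign && magc < mag0;
}
```

Here {1, +, -1} < {-3, +, -3} is `count_lt_iters (1, 255, 253, 253)` and returns 2, while the transformed {1, +, 2} < -3 is `count_lt_iters (1, 2, 253, 0)` and returns 126, even though the combined step 2 is positive and would pass a sign-only check.  `combined_step_ok (-1, 2)` rejects this case because the signs differ.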

Bin argued my condition is too strict and I agreed, see
https://gcc.gnu.org/pipermail/gcc-patches/2021-July/574334.html
which is the last mail in the thread so far (still without a fix :/)

Somewhere it was said that we eventually should simply put
preconditions into ->assumptions rather than trying to
check ->no_overflow and friends, not sure if that's really
applicable here.

Richard.

> BR.
> Jiufu
> 
> gcc/ChangeLog:
> 
> 2021-09-02  Jiufu Guo  
> 
>   PR tree-optimization/102131
>   * tree-ssa-loop-niter.c (number_of_iterations_cond):
>   Avoid transform until wrap condition
> 
> gcc/testsuite/ChangeLog:
> 
> 2021-09-02  Jiufu Guo  
> 
>   PR tree-optimization/102131
>   * gcc.dg/pr102131.c: New test.
> 
> ---
>  gcc/tree-ssa-loop-niter.c   | 18 +
>  gcc/testsuite/gcc.dg/pr102131.c | 69 +
>  2 files changed, 87 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.dg/pr102131.c
> 
> diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c
> index 7af92d1c893..52ce517af89 100644
> --- a/gcc/tree-ssa-loop-niter.c
> +++ b/gcc/tree-ssa-loop-niter.c
> @@ -1866,6 +1866,24 @@ number_of_iterations_cond (class loop *loop,
> || !iv0->no_overflow || !iv1->no_overflow))
>   return false;
>  
> +  /* GT/GE has been transformed to LT/LE already.
> + cmp_code could be LT, LE or NE
> +
> + For LE/LT transform
> + {iv0.base, iv0.step} LT/LE {iv1.base, iv1.step}
> + to
> + {iv0.base, iv0.step - iv1.step} LT/LE {iv1.base, 0}
> + Negative iv0.step - iv1.step means decreasing until wrap,
> + then the transform is not accurate.
> +
> + For example:
> + {1, +, 1} <= {4, +, 3}
> + is not same with
> + {1, +, -2} <= {4, +, 0}
> +   */
> +  if ((code == LE_EXPR || code == LT_EXPR) && tree_int_cst_sign_bit (step))
> + return false;
> +
>iv0->step = step;
>if (!POINTER_TYPE_P (type))
>   iv0->no_overflow = false;
> diff --git a/gcc/testsuite/gcc.dg/pr102131.c b/gcc/testsuite/gcc.dg/pr102131.c
> new file mode 100644
> index 000..0fcfaa132da
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/pr102131.c
> @@ -0,0 +1,69 @@
> +/* { dg-do run } */
> +
> +unsigned int a;
> +int
> +fun1 ()
> +{
> +  unsigned b = 0;
> +  int c = 1;
> +  for (; b < 3; b++)
> +{
> +  while (c < b)
> + __builtin_abort ();
> +  for (a = 0; a < 3; a++)
> + c++;
> +}
> +  return 0;
> +}
> +
> +unsigned b;
> +int
> +fun2 ()
> +{
> +  unsigned c = 0;
> +  for (a = 0; a < 2; a++)
> +for (b = 0; b < 2; b++)
> +  if (++c < a)
> + __builtin_abort ();
> +  return 0;
> +}
> +
> +int
> +fun3 (void)
> +{
> +  unsigned a, b, c = 0;
> +  for (a = 0; a < 10; a++)
> +{
> +  for (b = 0; b < 2; b++)
> + {
> +   c++;
> +   if (c < a)
> + {
> +   __builtin_abort ();
> + }
> + }
> +}
> +  return 0;
> +}
> +
> +int
> +fun4 ()
> +{
> +  unsigned i;
> +  for (i = 0; i < 3; ++i)
> +{
> +  if (i > i * 2)
> + __builtin_abort ();
> +}
> +  return 0;
> +}
> +
> +int
> +main ()
> +{
> +  fun1 ();
> +  fun2 ();
> +  fun3 ();
> +  fun4 ();
> +  return 0;
> +}
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)


Re: [PATCH] tree-optimization/102155 - fix LIM fill_always_executed_in CFG walk

2021-09-02 Thread Xionghu Luo via Gcc-patches



On 2021/9/2 16:50, Richard Biener wrote:
> On Thu, 2 Sep 2021, Richard Biener wrote:
> 
>> On Thu, 2 Sep 2021, Xionghu Luo wrote:
>>
>>>
>>>
>>> On 2021/9/1 17:58, Richard Biener wrote:
 This fixes the CFG walk order of fill_always_executed_in to use
 RPO order rather than the dominator based order computed by
 get_loop_body_in_dom_order.  That fixes correctness issues with
 unordered dominator children.

 The RPO order computed by rev_post_order_and_mark_dfs_back_seme in
 its for-iteration mode is a good match for the algorithm.

 Xionghu, I've tried to only fix the CFG walk order issue and not
 change anything else with this so we have a more correct base
 to work against.  The code still walks inner loop bodies
 up to loop depth times and thus is quadratic in the loop depth.

 Bootstrapped and tested on x86_64-unknown-linux-gnu, if you don't
 have any comments I plan to push this and then revisit what we
 were circling around.
>>>
>>> LGTM, thanks.
>>
>> I pushed it, thought again in the attempt to build a testcase and
>> concluded I was wrong with the apparent mishandling of
>> contains_call - get_loop_body_in_dom_order seems to be exactly
>> correct for this specific case.  So I reverted the commit again.
> 
> And I figured what the
> 
>/* In a loop that is always entered we may proceed anyway.
>   But record that we entered it and stop once we leave it.
> */
> 
> comment was about.  The code was present before the fix for PR78185
> and it was supposed to catch the case where the entered inner loop
> is not finite.  Just as the testcase from PR78185 shows the
> stopping was done too late when the exit block was already marked
> as to be always executed.  A simpler fix for PR78185 would have been
> to move
> 
>if (!flow_bb_inside_loop_p (inn_loop, bb))
>  break;
> 
> before setting of last = bb.  In fact the installed fix was more
> pessimistic than that given it terminated already when entering
> a possibly infinite loop.  So we can improve that by doing
> something like this, which should also improve the situation for some of
> the cases you were looking at?
> 
> What remains is that we continue to stop when entering a
> not always executed loop:
> 
>if (bb->loop_father->header == bb)
>  {
>if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
>  break;

Yes.  This causes the blocks after the inner loop to miss being checked
for whether they are actually ALWAYS_EXECUTED.  I am afraid O(N^2) is
inevitable here...

> 
> that I can at this point only explain by possible efficiency
> concerns?  Any better idea on that one?
From experiments, breaking early out of the inner loop does not seem to
take less time than a full inner loop walk.  I will take more precise
measurements with a larger data set on fill_always_executed_in_1
if necessary.

My previous v2 patch also tried to update inn_loop level by level
when exiting from inner loops, but that proved to be unnecessary,
though you were worried about the dominance order produced by
get_loop_body_in_dom_order.

> 
> I'm going to test the patch below which improves the situation for
> 
> volatile int flag, bar;
> double foo (double *valp)
> {
>double sum = 0;
>for (int i = 0; i < 256; ++i)
>  {
>for (int j = 0; j < 256; ++j)
>  bar = flag;
>if (flag)
>  sum += 1.;
>sum += *valp;
>  }
>return sum;
> }

The patch still fails to handle cases like this:


struct X { int i; int j; int k; };
volatile int m;

void bar (struct X *x, int n, int l, int k)
{
  for (int i = 0; i < l; i++)
    {
      if (k)
        for (int j = 0; j < l; j++)
          {
            if (m)
              x->i = m;
            else
              x->i = 1 - m;

            int *r = &x->k;
            int tem2 = *r;
            x->k += tem2 * j;
          }

      x->i = m;
    }
}

x->i is still not marked ALWAYS_EXECUTED for the outer loop.


> 
> Thanks,
> Richard.
> 
> diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c
> index d9f75d5025e..f0c93d6a882 100644
> --- a/gcc/tree-ssa-loop-im.c
> +++ b/gcc/tree-ssa-loop-im.c
> @@ -3044,23 +3044,27 @@ fill_always_executed_in_1 (class loop *loop,
> sbitmap contains_call)
>edge_iterator ei;
>bb = bbs[i];
>   
> + if (!flow_bb_inside_loop_p (inn_loop, bb))
> +   {
> + /* When we are leaving a possibly infinite inner loop
> +we have to stop processing.  */
> + if (!finite_loop_p (inn_loop))
> +   break;
> + /* If the loop was finite we can continue with processing
> +the loop we exited to.  */
> + inn_loop = bb->loop_father;
> +   }
> +
>if (dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
>  last = bb;
>   
>if (bitmap_bit_p (contains_call, bb->index))
>  break;
>   

[PATCH] avoid transform at run until wrap comparesion

2021-09-02 Thread Jiufu Guo via Gcc-patches
When transforming
  {iv0.base, iv0.step} LT/LE {iv1.base, iv1.step}
to
  {iv0.base, iv0.step - iv1.step} LT/LE {iv1.base, 0}

There would be an error if 'iv0.step - iv1.step' is negative,
which means running until wrap/overflow.

For example:
   {1, +, 1} <= {4, +, 3} => {1, +, -2} <= {4, +, 0}

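A small simulation makes the mismatch concrete.  This is a sketch, not GCC code: the hypothetical helper `count_le_iters` evaluates both IVs in 8-bit unsigned arithmetic (an assumption chosen only so that wrap-around happens within a few dozen steps) and counts how many times the `<=` exit test holds before it first fails.

```c
/* Hypothetical helper: count how many times the exit test a <= b holds
   before it first fails, where a = {base0, +, step0} and
   b = {base1, +, step1} are evaluated in 8-bit unsigned arithmetic,
   so both IVs wrap modulo 256.  */
static unsigned
count_le_iters (unsigned char base0, unsigned char step0,
                unsigned char base1, unsigned char step1)
{
  unsigned char a = base0, b = base1;
  unsigned n = 0;

  /* The cap only keeps the simulation itself from running forever.  */
  while (a <= b && n < 1000)
    {
      a = (unsigned char) (a + step0);
      b = (unsigned char) (b + step1);
      n++;
    }
  return n;
}
```

`count_le_iters (1, 1, 4, 3)` models {1, +, 1} <= {4, +, 3} and returns 84 (the test holds until b wraps to 0), whereas `count_le_iters (1, 254, 4, 0)` models {1, +, -2} <= {4, +, 0} (-2 is 254 modulo 256) and returns 1, because a wraps to 255 after the first step.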
This patch avoids this kind of transform.

Bootstrap and regtest pass on ppc64le.
Is this ok for trunk?

BR.
Jiufu

gcc/ChangeLog:

2021-09-02  Jiufu Guo  

PR tree-optimization/102131
* tree-ssa-loop-niter.c (number_of_iterations_cond):
Avoid transform until wrap condition

gcc/testsuite/ChangeLog:

2021-09-02  Jiufu Guo  

PR tree-optimization/102131
* gcc.dg/pr102131.c: New test.

---
 gcc/tree-ssa-loop-niter.c   | 18 +
 gcc/testsuite/gcc.dg/pr102131.c | 69 +
 2 files changed, 87 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/pr102131.c

diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c
index 7af92d1c893..52ce517af89 100644
--- a/gcc/tree-ssa-loop-niter.c
+++ b/gcc/tree-ssa-loop-niter.c
@@ -1866,6 +1866,24 @@ number_of_iterations_cond (class loop *loop,
  || !iv0->no_overflow || !iv1->no_overflow))
return false;
 
+  /* GT/GE has been transformed to LT/LE already.
+   cmp_code could be LT, LE or NE
+
+   For LE/LT transform
+   {iv0.base, iv0.step} LT/LE {iv1.base, iv1.step}
+   to
+   {iv0.base, iv0.step - iv1.step} LT/LE {iv1.base, 0}
+   Negative iv0.step - iv1.step means decreasing until wrap,
+   then the transform is not accurate.
+
+   For example:
+   {1, +, 1} <= {4, +, 3}
+   is not same with
+   {1, +, -2} <= {4, +, 0}
+   */
+  if ((code == LE_EXPR || code == LT_EXPR) && tree_int_cst_sign_bit (step))
+   return false;
+
   iv0->step = step;
   if (!POINTER_TYPE_P (type))
iv0->no_overflow = false;
diff --git a/gcc/testsuite/gcc.dg/pr102131.c b/gcc/testsuite/gcc.dg/pr102131.c
new file mode 100644
index 000..0fcfaa132da
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr102131.c
@@ -0,0 +1,69 @@
+/* { dg-do run } */
+
+unsigned int a;
+int
+fun1 ()
+{
+  unsigned b = 0;
+  int c = 1;
+  for (; b < 3; b++)
+{
+  while (c < b)
+   __builtin_abort ();
+  for (a = 0; a < 3; a++)
+   c++;
+}
+  return 0;
+}
+
+unsigned b;
+int
+fun2 ()
+{
+  unsigned c = 0;
+  for (a = 0; a < 2; a++)
+for (b = 0; b < 2; b++)
+  if (++c < a)
+   __builtin_abort ();
+  return 0;
+}
+
+int
+fun3 (void)
+{
+  unsigned a, b, c = 0;
+  for (a = 0; a < 10; a++)
+{
+  for (b = 0; b < 2; b++)
+   {
+ c++;
+ if (c < a)
+   {
+ __builtin_abort ();
+   }
+   }
+}
+  return 0;
+}
+
+int
+fun4 ()
+{
+  unsigned i;
+  for (i = 0; i < 3; ++i)
+{
+  if (i > i * 2)
+   __builtin_abort ();
+}
+  return 0;
+}
+
+int
+main ()
+{
+  fun1 ();
+  fun2 ();
+  fun3 ();
+  fun4 ();
+  return 0;
+}
-- 
2.17.1



[PATCH] Refine fix for PR78185, improve LIM for code after inner loops

2021-09-02 Thread Richard Biener via Gcc-patches
This refines the fix for PR78185 after understanding that the code
associated with the comment 'In a loop that is always entered we may
proceed anyway.  But record that we entered it and stop once we leave
it.' was supposed to protect us from leaving possibly infinite inner
loops.  The simpler fix of moving the misplaced stopping code
can then be refined to continue processing when the exited inner
loop is finite, improving invariant motion for cases like the one in the
added testcase.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-09-02  Richard Biener  

* tree-ssa-loop-im.c (fill_always_executed_in_1): Refine
fix for PR78185 and continue processing when leaving
finite inner loops.

* gcc.dg/tree-ssa/ssa-lim-16.c: New testcase.
---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-16.c | 19 +
 gcc/tree-ssa-loop-im.c | 33 +-
 2 files changed, 38 insertions(+), 14 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-16.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-16.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-16.c
new file mode 100644
index 000..741582b745f
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-lim-16.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-lim2-details" } */
+
+volatile int flag, bar;
+double foo (double *valp)
+{
+  double sum = 0;
+  for (int i = 0; i < 256; ++i)
+{
+  for (int j = 0; j < 256; ++j)
+bar = flag;
+  if (flag)
+sum += 1.;
+  sum += *valp; // we should move the load of *valp out of the loop
+}
+  return sum;
+}
+
+/* { dg-final { scan-tree-dump-times "Moving statement" 1 "lim2" } } */
diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c
index d9f75d5025e..01f3fc2f7f0 100644
--- a/gcc/tree-ssa-loop-im.c
+++ b/gcc/tree-ssa-loop-im.c
@@ -3044,23 +3044,27 @@ fill_always_executed_in_1 (class loop *loop, sbitmap contains_call)
  edge_iterator ei;
  bb = bbs[i];
 
+ if (!flow_bb_inside_loop_p (inn_loop, bb))
+   {
+ /* When we are leaving a possibly infinite inner loop
+we have to stop processing.  */
+ if (!finite_loop_p (inn_loop))
+   break;
+ /* If the loop was finite we can continue with processing
+the loop we exited to.  */
+ inn_loop = bb->loop_father;
+   }
+
  if (dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
last = bb;
 
  if (bitmap_bit_p (contains_call, bb->index))
break;
 
+ /* If LOOP exits from this BB stop processing.  */
  FOR_EACH_EDGE (e, ei, bb->succs)
-   {
- /* If there is an exit from this BB.  */
- if (!flow_bb_inside_loop_p (loop, e->dest))
-   break;
- /* Or we enter a possibly non-finite loop.  */
- if (flow_loop_nested_p (bb->loop_father,
- e->dest->loop_father)
- && ! finite_loop_p (e->dest->loop_father))
-   break;
-   }
+   if (!flow_bb_inside_loop_p (loop, e->dest))
+ break;
  if (e)
break;
 
@@ -3069,22 +3073,23 @@ fill_always_executed_in_1 (class loop *loop, sbitmap contains_call)
  if (bb->flags & BB_IRREDUCIBLE_LOOP)
break;
 
- if (!flow_bb_inside_loop_p (inn_loop, bb))
-   break;
-
  if (bb->loop_father->header == bb)
{
  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
break;
 
  /* In a loop that is always entered we may proceed anyway.
-But record that we entered it and stop once we leave it.  */
+But record that we entered it and stop once we leave it
+since it might not be finite.  */
  inn_loop = bb->loop_father;
}
}
 
   while (1)
{
+ if (dump_enabled_p ())
+   dump_printf (MSG_NOTE, "BB %d is always executed in loop %d\n",
+last->index, loop->num);
  SET_ALWAYS_EXECUTED_IN (last, loop);
  if (last == loop->header)
break;
-- 
2.31.1


Re: [RFC/PATCH] ipa-inline: Add target info into fn summary [PR102059]

2021-09-02 Thread Richard Biener via Gcc-patches
On Wed, Sep 1, 2021 at 9:02 AM Kewen.Lin  wrote:
>
> Hi!
>
> Power ISA 2.07 (Power8) introduces the transactional memory feature
> but ISA 3.1 (Power10) removes it.  This exposes a troublesome
> issue, as PR102059 shows.  A user defines a function with the
> target pragma cpu=power10, and it calls a function with the
> always_inline attribute that inherits the command line option
> cpu=power8, which enables HTM implicitly.  The current isa_flags
> check doesn't allow this inlining due to "target specific
> option mismatch", and an error message is emitted.
>
> Normally, the callee function isn't intended to exploit the HTM
> feature, but the default flag setting makes it look as if it does.
> As Richi raised in the PR, we have the fp_expressions flag in the
> function summary, which allows us to check whether the function actually
> contains any floating point expressions, to avoid overkill.
> So this patch follows a similar idea but is more target
> specific: for this rs6000 port specific requirement on the HTM
> feature check, we would like to check rs6000 specific HTM
> built-in functions and inline assembly, so the patch allows targets
> to do their own customized checks and updates.
>
> It introduces two target hooks, need_ipa_fn_target_info and
> update_ipa_fn_target_info.  The former allows the target to do
> some preliminary checks and decide whether to collect target specific
> information for this function.  In some special cases,
> it can predict the analysis result and push it early without
> any scanning.  The latter allows analyze_function_body
> to pass gimple stmts down, just like the fp_expressions handling,
> so the target can do its own tricks.  I initially implemented them as
> one hook with a boolean indicating whether it's the initial call, but
> the code looked a bit ugly; separating them seems to give
> better readability.
>
> To keep it simple, this patch uses HOST_WIDE_INT to record the
> flags, just like what we use for isa_flags.  For rs6000's HTM
> need, one HOST_WIDE_INT variable is quite enough, but it seems
> good to have an auto_vec for scalability, as I noticed some
> targets have more than one HOST_WIDE_INT flag.  For now, this
> target information is only collected for always_inline functions;
> ipa_merge_fn_summary_after_inlining deals with target
> information merging.
>
> The patch has been bootstrapped and regress-tested on
> powerpc64le-linux-gnu Power9.
>
> Is it on the right track?

+  if (always_inline)
+{
+  cgraph_node *callee_node = cgraph_node::get (callee);
+  if (ipa_fn_summaries && ipa_fn_summaries->get (callee_node) != NULL)
+   {
+ if (dump_file)
+   ipa_dump_fn_summary (dump_file, callee_node);
+ const vec &info =
+   ipa_fn_summaries->get (callee_node)->target_info;
+ if (!info.is_empty ())
+   always_inline_safe_mask |= ~info[0] & OPTION_MASK_HTM;
+   }
+
+  caller_isa &= ~always_inline_safe_mask;
+  callee_isa &= ~always_inline_safe_mask;
+}

that's a bit convoluted but obviously the IPA info can be used for
non-always_inline cases as well.

As said above the info can be produced for not always-inline functions
as well, the usual case would be for LTO inlining across TUs compiled
with different target options.  In your case the special -mcpu=power10
TU would otherwise not be able to inline from a general -mcpu=power8 TU.

On the streaming side we possibly have to take care about the
GPU offloading path where we likely want to avoid pushing host target
bits to the GPU target in some way.

Your case is specifically looking for HTM target builtins - for more general
cases, like for example deciding whether we can inline across, say,
-mlzcnt on x86 the scanning would have to include at least direct internal-fns
mapping to optabs that could change.  With such inlining we might also
work against heuristic tuning decisions based on the ISA availability
making code (much) more expensive to expand without such ISA availability,
but that wouldn't be a correctness issue then.

Otherwise the overall bits look OK but I'll leave the details to our IPA folks.

Thanks,
Richard.

>
> Any comments are highly appreciated!
>
> BR,
> Kewen
> --
> gcc/ChangeLog:
>
> PR ipa/102059
> * config/rs6000/rs6000-call.c (rs6000_builtin_mask_set_p): New
> function.
> * config/rs6000/rs6000-internal.h (rs6000_builtin_mask_set_p): New
> declare.
> * config/rs6000/rs6000.c (TARGET_NEED_IPA_FN_TARGET_INFO): New macro.
> (TARGET_UPDATE_IPA_FN_TARGET_INFO): Likewise.
> (rs6000_need_ipa_fn_target_info): New function.
> (rs6000_update_ipa_fn_target_info): Likewise.
> (rs6000_can_inline_p): Adjust for ipa function summary target info.
> * ipa-fnsummary.c (ipa_dump_fn_summary): Adjust for ipa function
> summary target info.
> (analyze_function_body): Adjust for ipa function summary target
> info and call hook rs6000_need_ipa_fn_target_info and
> rs6000_update_ipa_fn_t

Re: [PATCH] Set bound/cmp/control for until wrap loop.

2021-09-02 Thread Jiufu Guo via Gcc-patches
Richard Biener  writes:

> On Tue, 31 Aug 2021, guojiufu wrote:
>
>> On 2021-08-30 20:02, Richard Biener wrote:
>> > On Mon, 30 Aug 2021, guojiufu wrote:
>> > 
>> >> On 2021-08-30 14:15, Jiufu Guo wrote:
>> >> > Hi,
>> >> >
>> >> > In patch r12-3136, niter->control, niter->bound and niter->cmp are
>> >> > derived from number_of_iterations_lt.  However, for the 'until wrap' condition,
>> >> > the calculation in number_of_iterations_lt does not match the requirements
>> >> > of their definitions or the requirements in determine_exit_conditions.
>> >> >
>> >> > This patch calculates niter->control, niter->bound and niter->cmp in
>> >> > number_of_iterations_until_wrap.
>> >> >
>> >> > The ICEs in the PR are fixed by this patch.
>> >> > Bootstrap and reg-tests pass on ppc64/ppc64le and x86.
>> >> > Is this ok for trunk?
>> >> >
>> >> > BR.
>> >> > Jiufu Guo
>> >> >
>> >> Add ChangeLog:
>> >> gcc/ChangeLog:
>> >> 
>> >> 2021-08-30  Jiufu Guo  
>> >> 
>> >> PR tree-optimization/102087
>> >> * tree-ssa-loop-niter.c (number_of_iterations_until_wrap):
>> >> Set bound/cmp/control for niter.
>> >> 
>> >> gcc/testsuite/ChangeLog:
>> >> 
>> >> 2021-08-30  Jiufu Guo  
>> >> 
>> >> PR tree-optimization/102087
>> >> * gcc.dg/vect/pr101145_3.c: Update tests.
>> >> * gcc.dg/pr102087.c: New test.
>> >> 
>> >> > ---
>> >> >  gcc/tree-ssa-loop-niter.c  | 14 +-
>> >> >  gcc/testsuite/gcc.dg/pr102087.c| 25 +
>> >> >  gcc/testsuite/gcc.dg/vect/pr101145_3.c |  4 +++-
>> >> >  3 files changed, 41 insertions(+), 2 deletions(-)
>> >> >  create mode 100644 gcc/testsuite/gcc.dg/pr102087.c
>> >> >
>> >> > diff --git a/gcc/tree-ssa-loop-niter.c b/gcc/tree-ssa-loop-niter.c
>> >> > index 7af92d1c893..747f04d3ce0 100644
>> >> > --- a/gcc/tree-ssa-loop-niter.c
>> >> > +++ b/gcc/tree-ssa-loop-niter.c
>> >> > @@ -1482,7 +1482,7 @@ number_of_iterations_until_wrap (class loop *, tree type, affine_iv *iv0,
>> >> >  affine_iv *iv1, class tree_niter_desc *niter)
>> >> >  {
>> >> >tree niter_type = unsigned_type_for (type);
>> >> > -  tree step, num, assumptions, may_be_zero;
>> >> > +  tree step, num, assumptions, may_be_zero, span;
>> >> >wide_int high, low, max, min;
>> >> >
>> >> >may_be_zero = fold_build2 (LE_EXPR, boolean_type_node, iv1->base,
>> >> > iv0->base);
>> >> > @@ -1513,6 +1513,8 @@ number_of_iterations_until_wrap (class loop *, tree type, affine_iv *iv0,
>> >> >   low = wi::to_wide (iv0->base);
>> >> >  else
>> >> > low = min;
>> >> > +
>> >> > +  niter->control = *iv1;
>> >> >  }
>> >> >/* {base, -C} < n.  */
>> >> >else if (tree_int_cst_sign_bit (iv0->step) && integer_zerop (iv1->step))
>> >> > @@ -1533,6 +1535,8 @@ number_of_iterations_until_wrap (class loop *, tree type, affine_iv *iv0,
>> >> >   high = wi::to_wide (iv1->base);
>> >> >  else
>> >> > high = max;
>> >> > +
>> >> > +  niter->control = *iv0;
>> >> >  }
>> >> >else
>> >> >  return false;
>> > 
>> > it looks like the above two should already be in effect from the
>> > caller (guarding with integer_nozerop)?
>> 
>> I added them just so that all of these fields are set in one function.
>> Yes, they have already been set in the caller; I could remove them here.
>> 
>> > 
>> >> > @@ -1556,6 +1560,14 @@ number_of_iterations_until_wrap (class loop *, tree type, affine_iv *iv0,
>> >> >niter->assumptions, assumptions);
>> >> >
>> >> >niter->control.no_overflow = false;
>> >> > +  niter->control.base = fold_build2 (MINUS_EXPR, niter_type,
>> >> > +niter->control.base, niter->control.step);
>> > 
>> > how do we know IVn - STEP doesn't already wrap?
>> 
>> The last IV value just crosses the max/min value of the type
>> at the last iteration, so IVn - STEP is the value nearest
>> to max (or min) and does not wrap.
>> 
>> > A comment might be
>> > good to explain you're turning the simplified exit condition into
>> > 
>> >{ IVbase - STEP, +, STEP } != niter * STEP + (IVbase - STEP)
>> > 
>> > which, when mathematically looking at it makes me wonder why there's
>> > the seemingly redundant '- STEP' term?  Also is NE_EXPR really
>> > correct since STEP might be not 1?  Only for non equality compares
I may have missed this question in the previous mail.  If STEP is not 1, NE_EXPR
would still be correct, because niter is an integer, and so
after 'niter' iterations the value is exactly 'base + niter * STEP'.

BR,
Jiufu.
>> > the '- STEP' should matter?
>> 
>> I need to add comments for this.  This is a little tricky.
>> The last value of the original IV crosses max/min by at most one STEP,
>> and at that point wrapping has already happened.
>> Using "{IVbase, +, STEP} != niter * STEP + IVbase" is not wrong
>> as far as the exit condition is concerned.
>> 
>> But this would not work well with existing code:
>> like deter

Re: [ARM] PR66791: Replace builtins for vdup_n and vmov_n intrinsics

2021-09-02 Thread Christophe Lyon via Gcc-patches
On Tue, Aug 24, 2021 at 10:17 AM Kyrylo Tkachov wrote:

>
>
> > -Original Message-
> > From: Prathamesh Kulkarni 
> > Sent: 24 August 2021 09:01
> > To: Christophe Lyon 
> > Cc: Kyrylo Tkachov; gcc Patches
> > Subject: Re: [ARM] PR66791: Replace builtins for vdup_n and vmov_n intrinsics
> >
> > On Tue, 17 Aug 2021 at 11:55, Prathamesh Kulkarni wrote:
> > >
> > > On Thu, 12 Aug 2021 at 19:04, Christophe Lyon wrote:
> > > >
> > > >
> > > >
> > > > On Thu, Aug 12, 2021 at 1:54 PM Prathamesh Kulkarni wrote:
> > > >>
> > > >> On Wed, 11 Aug 2021 at 22:23, Christophe Lyon wrote:
> > > >> >
> > > >> >
> > > >> >
> > > >> > On Thu, Jun 24, 2021 at 6:29 PM Kyrylo Tkachov via Gcc-patches wrote:
> > > >> >>
> > > >> >>
> > > >> >>
> > > >> >> > -Original Message-
> > > >> >> > From: Prathamesh Kulkarni 
> > > >> >> > Sent: 24 June 2021 12:11
> > > >> >> > To: gcc Patches; Kyrylo Tkachov
> > > >> >> > Subject: [ARM] PR66791: Replace builtins for vdup_n and vmov_n intrinsics
> > > >> >> >
> > > >> >> > Hi,
> > > >> >> > This patch replaces builtins for vdup_n and vmov_n.
> > > >> >> > The patch results in regression for pr51534.c.
> > > >> >> > Consider following function:
> > > >> >> >
> > > >> >> > uint8x8_t f1 (uint8x8_t a) {
> > > >> >> >   return vcgt_u8(a, vdup_n_u8(0));
> > > >> >> > }
> > > >> >> >
> > > >> >> > code-gen before patch:
> > > >> >> > f1:
> > > >> >> > vmov.i32  d16, #0  @ v8qi
> > > >> >> > vcgt.u8 d0, d0, d16
> > > >> >> > bx lr
> > > >> >> >
> > > >> >> > code-gen after patch:
> > > >> >> > f1:
> > > >> >> > vceq.i8 d0, d0, #0
> > > >> >> > vmvnd0, d0
> > > >> >> > bx lr
> > > >> >> >
> > > >> >> > I am not sure which one is better, though?
> > > >> >>
> > > >> >
> > > >> > Hi Prathamesh,
> > > >> >
> > > >> > This patch introduces a regression on non-hardfp configs (e.g. arm-linux-gnueabi or arm-eabi):
> > > >> > FAIL:  gcc:gcc.target/arm/arm.exp=gcc.target/arm/pr51534.c scan-assembler-times vmov.i32[ \t]+[dD][0-9]+, #0x 3
> > > >> > FAIL:  gcc:gcc.target/arm/arm.exp=gcc.target/arm/pr51534.c scan-assembler-times vmov.i32[ \t]+[qQ][0-9]+, #4294967295 3
> > > >> >
> > > >> > Can you fix this?
> > > >> The issue is, for following test:
> > > >>
> > > >> #include 
> > > >>
> > > >> uint8x8_t f1 (uint8x8_t a) {
> > > >>   return vcge_u8(a, vdup_n_u8(0));
> > > >> }
> > > >>
> > > >> armhf code-gen:
> > > >> f1:
> > > >> vmov.i32  d0, #0x  @ v8qi
> > > >> bxlr
> > > >>
> > > >> arm softfp code-gen:
> > > >> f1:
> > > >> mov r0, #-1
> > > >> mov r1, #-1
> > > >> bx  lr
> > > >>
> > > >> The code-gen for both is same upto split2 pass:
> > > >>
> > > >> (insn 10 6 11 2 (set (reg/i:V8QI 16 s0)
> > > >> (const_vector:V8QI [
> > > >> (const_int -1 [0x]) repeated x8
> > > >> ])) "foo.c":5:1 1052 {*neon_movv8qi}
> > > >>  (expr_list:REG_EQUAL (const_vector:V8QI [
> > > >> (const_int -1 [0x]) repeated x8
> > > >> ])
> > > >> (nil)))
> > > >> (insn 11 10 13 2 (use (reg/i:V8QI 16 s0)) "foo.c":5:1 -1
> > > >>  (nil))
> > > >>
> > > >> and for softfp targets, the split2 pass splits the assignment into r0 and r1:
> > > >>
> > > >> (insn 15 6 16 2 (set (reg:SI 0 r0)
> > > >> (const_int -1 [0x])) "foo.c":5:1 740 {*thumb2_movsi_vfp}
> > > >>  (nil))
> > > >> (insn 16 15 11 2 (set (reg:SI 1 r1 [+4 ])
> > > >> (const_int -1 [0x])) "foo.c":5:1 740 {*thumb2_movsi_vfp}
> > > >>  (nil))
> > > >> (insn 11 16 13 2 (use (reg/i:V8QI 0 r0)) "foo.c":5:1 -1
> > > >>  (nil))
> > > >>
> > > >> I suppose we could use a dg-scan for r[0-9]+, #-1 for softfp targets?
> > > >>
> > > > Yes, probably, or try with check-function-bodies.
> > > Hi,
> > > Sorry for the late response. Does the attached patch look OK ?
> > ping https://gcc.gnu.org/pipermail/gcc-patches/2021-August/577532.html
>
> Ok.
>


Sorry Prathamesh, this does not quite work. See
https://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/r12-3294-g7a6f40d0452ec76e126c2612dcfa32f3c73e2315/report-build-info.html
(red cells in the gcc column)

Can you have a look?

Thanks

Christophe

Thanks,
> Kyrill
>
> >
> > Thanks,
> > Prathamesh
> > >
> > > Thanks,
> > > Prathamesh
> > > >
> > > >  Christophe
> > > >
> > > >> Thanks,
> > > >> Prathamesh
> > > >> >
> > > >> > Thanks
> > > >> >
> > > >> > Christophe
> > > >> >
> > > >> >
> > > >> >>
> > > >> >> I think they're equivalent in practice; in any case the patch itself is good (move away from RTL builtins).
> > > >> >> Ok.
> > > >> >> Thanks,
> > > >> >> Kyrill
> > > >> >>
> > > >> >> >
> > > >> >> > Also, this patch regressed bf16_dup.c on arm-linux-gnueabi,

Re: [PATCH] tree-optimization/102155 - fix LIM fill_always_executed_in CFG walk

2021-09-02 Thread Richard Biener via Gcc-patches
On Thu, 2 Sep 2021, Richard Biener wrote:

> On Thu, 2 Sep 2021, Xionghu Luo wrote:
> 
> > 
> > 
> > On 2021/9/1 17:58, Richard Biener wrote:
> > > This fixes the CFG walk order of fill_always_executed_in to use
> > > RPO oder rather than the dominator based order computed by
> > > get_loop_body_in_dom_order.  That fixes correctness issues with
> > > unordered dominator children.
> > > 
> > > The RPO order computed by rev_post_order_and_mark_dfs_back_seme in
> > > its for-iteration mode is a good match for the algorithm.
> > > 
> > > Xionghu, I've tried to only fix the CFG walk order issue and not
> > > change anything else with this so we have a more correct base
> > > to work against.  The code still walks inner loop bodies
> > > up to loop depth times and thus is quadratic in the loop depth.
> > > 
> > > Bootstrapped and tested on x86_64-unknown-linux-gnu, if you don't
> > > have any comments I plan to push this and then revisit what we
> > > were circling around.
> > 
> > LGTM, thanks.
> 
> I pushed it, thought again in the attempt to build a testcase and
> concluded I was wrong about the apparent mishandling of
> contains_call - get_loop_body_in_dom_order seems to be exactly
> correct for this specific case.  So I reverted the commit again.

And I figured what the

  /* In a loop that is always entered we may proceed anyway.
     But record that we entered it and stop once we leave it.  */

comment was about.  The code was present before the fix for PR78185
and it was supposed to catch the case where the entered inner loop
is not finite.  As the testcase from PR78185 shows, the
stopping was done too late, when the exit block was already marked
as always executed.  A simpler fix for PR78185 would have been
to move

  if (!flow_bb_inside_loop_p (inn_loop, bb))
break;

before setting last = bb.  In fact the installed fix was more
pessimistic than that, given that it already terminated when entering
a possibly infinite loop.  So we can improve on that with something
like the patch below, which should also improve the situation for some
of the cases you were looking at.

What remains is that we continue to stop when entering a
not always executed loop:

  if (bb->loop_father->header == bb)
{
  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
break;

that I can at this point only explain by possible efficiency
concerns?  Any better idea on that one?

I'm going to test the patch below which improves the situation for

volatile int flag, bar;
double foo (double *valp)
{
  double sum = 0;
  for (int i = 0; i < 256; ++i)
{
  for (int j = 0; j < 256; ++j)
bar = flag;
  if (flag)
sum += 1.;
  sum += *valp;
}
  return sum;
}

Thanks,
Richard.

diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c
index d9f75d5025e..f0c93d6a882 100644
--- a/gcc/tree-ssa-loop-im.c
+++ b/gcc/tree-ssa-loop-im.c
@@ -3044,23 +3044,27 @@ fill_always_executed_in_1 (class loop *loop, sbitmap contains_call)
  edge_iterator ei;
  bb = bbs[i];
 
+ if (!flow_bb_inside_loop_p (inn_loop, bb))
+   {
+ /* When we are leaving a possibly infinite inner loop
+we have to stop processing.  */
+ if (!finite_loop_p (inn_loop))
+   break;
+ /* If the loop was finite we can continue with processing
+the loop we exited to.  */
+ inn_loop = bb->loop_father;
+   }
+
  if (dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
last = bb;
 
  if (bitmap_bit_p (contains_call, bb->index))
break;
 
+ /* If LOOP exits from this BB stop processing.  */
  FOR_EACH_EDGE (e, ei, bb->succs)
-   {
- /* If there is an exit from this BB.  */
- if (!flow_bb_inside_loop_p (loop, e->dest))
-   break;
- /* Or we enter a possibly non-finite loop.  */
- if (flow_loop_nested_p (bb->loop_father,
- e->dest->loop_father)
- && ! finite_loop_p (e->dest->loop_father))
-   break;
-   }
+   if (!flow_bb_inside_loop_p (loop, e->dest))
+ break;
  if (e)
break;
 
@@ -3069,16 +3073,14 @@ fill_always_executed_in_1 (class loop *loop, sbitmap contains_call)
  if (bb->flags & BB_IRREDUCIBLE_LOOP)
break;
 
- if (!flow_bb_inside_loop_p (inn_loop, bb))
-   break;
-
  if (bb->loop_father->header == bb)
{
  if (!dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
break;
 
  /* In a loop that is always entered we may proceed anyway.
-But record that we entered it and stop once we leave it.  */
+But record that we entered it and stop once we leave it
+since it might not be finite.

Re: [PATCH] c++, abi: Set DECL_FIELD_ABI_IGNORED on C++ zero width bitfields [PR102024]

2021-09-02 Thread Jakub Jelinek via Gcc-patches
On Thu, Sep 02, 2021 at 12:19:03AM +0200, Jakub Jelinek via Gcc-patches wrote:
> Ah, thanks for the archeology.  So it indeed seems that, in theory, there was an
> ABI change for C between GCC 3.4 and 4.0 on some of the targets, like x86_64, which
> already existed in the 3.2-ish era.  I actually couldn't see a difference
> between C and C++ in that era on x86_64, e.g. 3.3 C and C++ both behave like
> C and C++ do now, as if the zero width bitfields weren't removed.
> Before the PR42217 fix the C++ FE wasn't really removing the zero width bitfields
> properly, so it is actually a 4.5/4.4-ish regression for C++.

Ok, verified even the C FE used to suffer from the same issue as PR42217 and
didn't actually ever remove any zero width bitfields, while grokfield put
the field width into DECL_INITIAL, then finish_struct did:
  for (x = fieldlist; x; x = TREE_CHAIN (x))
{
...
  if (DECL_INITIAL (x))
{
  unsigned HOST_WIDE_INT width = tree_low_cst (DECL_INITIAL (x), 1);
  DECL_SIZE (x) = bitsize_int (width);
  DECL_BIT_FIELD (x) = 1;
  SET_DECL_C_BIT_FIELD (x);
}

  DECL_INITIAL (x) = 0;
...
}
and only a few lines later it did:
  /* Delete all zero-width bit-fields from the fieldlist.  */
  {
tree *fieldlistp = &fieldlist;
while (*fieldlistp)
  if (TREE_CODE (*fieldlistp) == FIELD_DECL && DECL_INITIAL (*fieldlistp))
*fieldlistp = TREE_CHAIN (*fieldlistp);
  else
fieldlistp = &TREE_CHAIN (*fieldlistp);
  }
but DECL_INITIAL was already guaranteed to be NULL here.  PR42217 actually
was the same problem as PR102019, but was fixed by actually making the
zero-width bit-field removal work when it never worked before.

Here is an updated patch that instead uses separate macros for the
previous DECL_FIELD_ABI_IGNORED meaning and for the C++ zero-width
bitfields.  The backends don't need any changes in that case (until they
want to actually use the new macro for the -Wpsabi or ABI decisions):

2021-09-02  Jakub Jelinek  

PR target/102024
gcc/
* tree.h (DECL_FIELD_ABI_IGNORED): Changed into rvalue only macro
that is false if DECL_BIT_FIELD.
(SET_DECL_FIELD_ABI_IGNORED, DECL_FIELD_CXX_ZERO_WIDTH_BIT_FIELD,
SET_DECL_FIELD_CXX_ZERO_WIDTH_BIT_FIELD): Define.
* tree-streamer-out.c (pack_ts_decl_common_value_fields): For
DECL_BIT_FIELD stream DECL_FIELD_CXX_ZERO_WIDTH_BIT_FIELD instead
of DECL_FIELD_ABI_IGNORED.
* tree-streamer-in.c (unpack_ts_decl_common_value_fields): Use
SET_DECL_FIELD_ABI_IGNORED instead of writing to
DECL_FIELD_ABI_IGNORED and for DECL_BIT_FIELD use
SET_DECL_FIELD_CXX_ZERO_WIDTH_BIT_FIELD instead.
* lto-streamer-out.c (hash_tree): For DECL_BIT_FIELD hash
DECL_FIELD_CXX_ZERO_WIDTH_BIT_FIELD instead of DECL_FIELD_ABI_IGNORED.
gcc/cp/
* class.c (build_base_field): Use SET_DECL_FIELD_ABI_IGNORED
instead of writing to DECL_FIELD_ABI_IGNORED.
(layout_class_type): Likewise.  In the place where zero-width
bitfields used to be removed, use
SET_DECL_FIELD_CXX_ZERO_WIDTH_BIT_FIELD on those fields instead.
gcc/lto/
* lto-common.c (compare_tree_sccs_1): Also compare
DECL_FIELD_CXX_ZERO_WIDTH_BIT_FIELD values.

--- gcc/tree.h.jj   2021-09-01 21:30:30.551306387 +0200
+++ gcc/tree.h  2021-09-02 10:34:43.559851006 +0200
@@ -2852,16 +2852,34 @@ extern void decl_value_expr_insert (tree
 /* In a FIELD_DECL, indicates this field should be bit-packed.  */
 #define DECL_PACKED(NODE) (FIELD_DECL_CHECK (NODE)->base.u.bits.packed_flag)
 
+/* Nonzero in a FIELD_DECL means it is a bit field, and must be accessed
+   specially.  */
+#define DECL_BIT_FIELD(NODE) (FIELD_DECL_CHECK (NODE)->decl_common.decl_flag_1)
+
 /* In a FIELD_DECL, indicates this field should be ignored for ABI decisions
like passing/returning containing struct by value.
Set for C++17 empty base artificial FIELD_DECLs as well as
empty [[no_unique_address]] non-static data members.  */
 #define DECL_FIELD_ABI_IGNORED(NODE) \
-  (FIELD_DECL_CHECK (NODE)->decl_common.decl_flag_0)
+  (!DECL_BIT_FIELD (NODE) && (NODE)->decl_common.decl_flag_0)
+#define SET_DECL_FIELD_ABI_IGNORED(NODE, VAL) \
+  do { \
+gcc_checking_assert (!DECL_BIT_FIELD (NODE));  \
+FIELD_DECL_CHECK (NODE)->decl_common.decl_flag_0 = (VAL);  \
+  } while (0)
 
-/* Nonzero in a FIELD_DECL means it is a bit field, and must be accessed
-   specially.  */
-#define DECL_BIT_FIELD(NODE) (FIELD_DECL_CHECK (NODE)->decl_common.decl_flag_1)
+/* In a FIELD_DECL, indicates C++ zero-width bitfield that used to be
+   removed from the IL since PR42217 until PR101539 and by that changed
+   the ABI on several targets.  This flag is provided so that the backends
+   can decide on the ABI with zero-width bitfields and emit -Wpsabi
+   warnings.  */
+#define DECL_FIEL

Re: [PATCH v3] md/define_c_enum: support value assignation

2021-09-02 Thread YunQiang Su
Andrew Pinski via Gcc-patches wrote on Thursday, 2 September 2021 at 5:28 AM:
>
> On Tue, Aug 31, 2021 at 4:22 AM YunQiang Su  wrote:
> >
> > Currently, the enums from define_c_enum and define_enum can only
> > have consecutive values starting from 0.
> >
> > In fact we can support the same behaviour as C, for example
> >   (define_enum "mips_isa" [(mips1 1) mips2 (mips32 32) mips32r2]),
> > then we can get
> >   enum mips_isa {
> > MIPS_ISA_MIPS1 = 1,
> > MIPS_ISA_MIPS2 = 2,
> > MIPS_ISA_MIPS32 = 32,
> > MIPS_ISA_MIPS32R2 = 33
> >   };
> >
> > gcc/ChangeLog:
> > * read-md.c (md_reader::handle_enum): support value assignation.
> > * doc/md.texi: record define_c_enum value assignation support.
> > ---
> >  gcc/doc/md.texi |  4 
> >  gcc/read-md.c   | 21 +
> >  2 files changed, 21 insertions(+), 4 deletions(-)
> >
> > diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> > index f8047aefc..2b41cb7fb 100644
> > --- a/gcc/doc/md.texi
> > +++ b/gcc/doc/md.texi
> > @@ -11074,6 +11074,8 @@ The syntax is as follows:
> >  (define_c_enum "@var{name}" [
> >@var{value0}
> >@var{value1}
> > +  (@var{value32} 32)
> > +  @var{value33}
> >@dots{}
> >@var{valuen}
> >  ])
> > @@ -11086,6 +11088,8 @@ in @file{insn-constants.h}:
> >  enum @var{name} @{
> >@var{value0} = 0,
> >@var{value1} = 1,
> > +  @var{value32} = 32,
> > +  @var{value33} = 33,
> >@dots{}
> >@var{valuen} = @var{n}
> >  @};
> > diff --git a/gcc/read-md.c b/gcc/read-md.c
> > index bb419e0f6..0fbe924d1 100644
> > --- a/gcc/read-md.c
> > +++ b/gcc/read-md.c
> > @@ -902,7 +902,8 @@ void
> >  md_reader::handle_enum (file_location loc, bool md_p)
> >  {
> >char *enum_name, *value_name;
> > -  struct md_name name;
> > +  unsigned int cur_value;
> > +  struct md_name name, value;
> >struct enum_type *def;
> >struct enum_value *ev;
> >void **slot;
> > @@ -928,6 +929,7 @@ md_reader::handle_enum (file_location loc, bool md_p)
> >*slot = def;
> >  }
> >
> > +  cur_value = def->num_values;
> >require_char_ws ('[');
> >
> >while ((c = read_skip_spaces ()) != ']')
> > @@ -937,8 +939,18 @@ md_reader::handle_enum (file_location loc, bool md_p)
> >   error_at (loc, "unterminated construct");
> >   exit (1);
> > }
> > -  unread_char (c);
> > -  read_name (&name);
> > +  if (c == '(')
> > +   {
> > + read_name (&name);
> > + read_name (&value);
> > + require_char_ws (')');
> > + cur_value = atoi(value.string);
>
> We really should be avoiding adding atoi.  Yes there are uses already

It is not a user input value, as the value comes from our source code.

> in the source but https://gcc.gnu.org/PR44574 exists to track those
> uses.
>

Your concern still applies:
 how large a range should we support here, for define_enum?

> Thanks,
> Andrew
>
>
> > +   }
> > +  else
> > +   {
> > + unread_char (c);
> > + read_name (&name);
> > +   }
> >
> >ev = XNEW (struct enum_value);
> >ev->next = 0;
> > @@ -954,11 +966,12 @@ md_reader::handle_enum (file_location loc, bool md_p)
> >   ev->name = value_name;
> > }
> >ev->def = add_constant (get_md_constants (), value_name,
> > - md_decimal_string (def->num_values), def);
> > + md_decimal_string (cur_value), def);
> >
> >*def->tail_ptr = ev;
> >def->tail_ptr = &ev->next;
> >def->num_values++;
> > +  cur_value++;
> >  }
> >  }
> >
> > --
> > 2.30.2
> >


Re: [PATCH] tree-optimization/102155 - fix LIM fill_always_executed_in CFG walk

2021-09-02 Thread Richard Biener via Gcc-patches
On Thu, 2 Sep 2021, Xionghu Luo wrote:

> 
> 
> On 2021/9/1 17:58, Richard Biener wrote:
> > This fixes the CFG walk order of fill_always_executed_in to use
> > RPO oder rather than the dominator based order computed by
> > get_loop_body_in_dom_order.  That fixes correctness issues with
> > unordered dominator children.
> > 
> > The RPO order computed by rev_post_order_and_mark_dfs_back_seme in
> > its for-iteration mode is a good match for the algorithm.
> > 
> > Xionghu, I've tried to only fix the CFG walk order issue and not
> > change anything else with this so we have a more correct base
> > to work against.  The code still walks inner loop bodies
> > up to loop depth times and thus is quadratic in the loop depth.
> > 
> > Bootstrapped and tested on x86_64-unknown-linux-gnu, if you don't
> > have any comments I plan to push this and then revisit what we
> > were circling around.
> 
> LGTM, thanks.

I pushed it, then thought again while attempting to build a testcase and
concluded I was wrong about the apparent mishandling of
contains_call: get_loop_body_in_dom_order seems to be exactly
correct for this specific case.  So I reverted the commit again.

Richard.

> > 
> > Richard.
> > 
> > 2021-09-01  Richard Biener  
> > 
> >  PR tree-optimization/102155
> >  * tree-ssa-loop-im.c (fill_always_executed_in_1): Iterate
> >  over a part of the RPO array and do not recurse here.
> >  Dump blocks marked as always executed.
> >  (fill_always_executed_in): Walk over the RPO array and
> >  process loops whose header we run into.
> >  (loop_invariant_motion_in_fun): Compute the first RPO
> >  using rev_post_order_and_mark_dfs_back_seme in iteration
> >  order and pass that to fill_always_executed_in.
> > ---
> >   gcc/tree-ssa-loop-im.c | 136 ++---
> >   1 file changed, 73 insertions(+), 63 deletions(-)
> > 
> > diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c
> > index d9f75d5025e..f3706dcdb8a 100644
> > --- a/gcc/tree-ssa-loop-im.c
> > +++ b/gcc/tree-ssa-loop-im.c
> > @@ -3025,77 +3025,74 @@ do_store_motion (void)
> >   /* Fills ALWAYS_EXECUTED_IN information for basic blocks of LOOP, i.e.
> >  for each such basic block bb records the outermost loop for that execution
> >  of its header implies execution of bb.  CONTAINS_CALL is the bitmap of
> > -   blocks that contain a nonpure call.  */
> > +   blocks that contain a nonpure call.  The blocks of LOOP start at index
> > +   START of the RPO array of size N.  */
> >   
> >   static void
> > -fill_always_executed_in_1 (class loop *loop, sbitmap contains_call)
> > +fill_always_executed_in_1 (function *fun, class loop *loop,
> > +  int *rpo, int start, int n, sbitmap contains_call)
> >   {
> > -  basic_block bb = NULL, *bbs, last = NULL;
> > -  unsigned i;
> > -  edge e;
> > +  basic_block last = NULL;
> > class loop *inn_loop = loop;
> >
> > -  if (ALWAYS_EXECUTED_IN (loop->header) == NULL)
> > +  for (int i = start; i < n; i++)
> >   {
> > -  bbs = get_loop_body_in_dom_order (loop);
> > -
> > -  for (i = 0; i < loop->num_nodes; i++)
> > -   {
> > - edge_iterator ei;
> > - bb = bbs[i];
> > -
> > - if (dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
> > -   last = bb;
> > +  basic_block bb = BASIC_BLOCK_FOR_FN (fun, rpo[i]);
> > +  /* Stop when we iterated over all blocks in this loop.  */
> > +  if (!flow_bb_inside_loop_p (loop, bb))
> > +   break;
> >
> > -   if (bitmap_bit_p (contains_call, bb->index))
> > -   break;
> > +  if (dominated_by_p (CDI_DOMINATORS, loop->latch, bb))
> > +   last = bb;
> >
> > -   FOR_EACH_EDGE (e, ei, bb->succs)
> > -   {
> > - /* If there is an exit from this BB.  */
> > - if (!flow_bb_inside_loop_p (loop, e->dest))
> > -   break;
> > - /* Or we enter a possibly non-finite loop.  */
> > - if (flow_loop_nested_p (bb->loop_father,
> > - e->dest->loop_father)
> > - && ! finite_loop_p (e->dest->loop_father))
> > -   break;
> > -   }
> > - if (e)
> > -   break;
> > +  if (bitmap_bit_p (contains_call, bb->index))
> > +   break;
> >
> > -   /* A loop might be infinite (TODO use simple loop analysis
> > -to disprove this if possible).  */
> > - if (bb->flags & BB_IRREDUCIBLE_LOOP)
> > +  edge_iterator ei;
> > +  edge e;
> > +  FOR_EACH_EDGE (e, ei, bb->succs)
> > +   {
> > + /* If there is an exit from this BB.  */
> > + if (!flow_bb_inside_loop_p (loop, e->dest))
> >break;
> > -
> > - if (!flow_bb_inside_loop_p (inn_loop, bb))
> > + /* Or we enter a possibly non-finite loop.  */
> > + if (flow_loop_nested_p (bb->loop_father,
> > + e->dest->loop_father)
> > + && ! finite_loop_p (e->dest->loop_father))
> > break;
> > +   }
> > +  if (e)
> > +   break;
> >
> > -   if (bb->loop_father->header == bb)
> > -   {
>
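For readers unfamiliar with the term, a reverse post-order (RPO) lists every block before its successors, except along back edges. The following is a minimal sketch on a hypothetical toy CFG; `struct cfg`, `dfs` and `compute_rpo` are illustrative only and are not GCC's `rev_post_order_and_mark_dfs_back_seme`.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_BB 16

/* Hypothetical toy CFG: succs[b] lists the successors of block b and
   must be terminated by -1 within MAX_BB entries.  */
struct cfg
{
  int n;
  int succs[MAX_BB][MAX_BB];
};

/* Depth-first walk recording each block after all of its DFS
   descendants (post-order).  */
static void
dfs (const struct cfg *g, int b, bool *visited, int *post, int *npost)
{
  visited[b] = true;
  for (int i = 0; g->succs[b][i] != -1; i++)
    if (!visited[g->succs[b][i]])
      dfs (g, g->succs[b][i], visited, post, npost);
  post[(*npost)++] = b;
}

/* Store the reverse post-order of the blocks reachable from ENTRY in
   RPO and return how many blocks were written.  */
static int
compute_rpo (const struct cfg *g, int entry, int *rpo)
{
  bool visited[MAX_BB] = { false };
  int post[MAX_BB];
  int npost = 0;

  assert (g->n <= MAX_BB);
  dfs (g, entry, visited, post, &npost);
  for (int i = 0; i < npost; i++)
    rpo[i] = post[npost - 1 - i];
  return npost;
}
```

For the CFG 0 → 1, 1 → {2, 4}, 2 → 3, 3 → 1 (block 1 is a loop header, 3 → 1 its back edge and 4 the exit), this yields 0 1 4 2 3: the exit block lands between the header and the rest of the loop body. The patch's walk stops at the first block outside LOOP, so it appears to rely on each loop's blocks being consecutive in the array. As we understand it, that is what the for-iteration mode of rev_post_order_and_mark_dfs_back_seme provides, and a plain RPO like this one does not guarantee it.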

Re: [PATCH] match.pd: Demote IFN_{ADD,SUB,MUL}_OVERFLOW operands [PR99591]

2021-09-02 Thread Richard Biener via Gcc-patches
On Thu, 2 Sep 2021, Jakub Jelinek wrote:

> Hi!
> 
> The overflow builtins work on infinite precision integers and then convert
> to the result type's precision, so any argument promotions are useless.
> The expand_arith_overflow expansion is able to demote the arguments itself
> through get_range_pos_neg and get_min_precision calls and if needed promote
> to whatever mode it decides to perform the operations in, but if there are
> any promotions it demoted, those are already expanded.  Normally combine
> would remove the useless sign or zero extensions when it sees the result
> of those is only used in a lowpart subreg, but typically those lowpart
> subregs appear multiple times in the pattern so that they describe properly
> the overflow behavior and combine gives up, so we end up with e.g.
> movswl  %si, %esi
> movswl  %di, %edi
> imulw   %si, %di
> seto    %al
> where both movswl insns are useless.
> 
> The following patch fixes it by demoting operands of the ifns (only gets
> rid of integral to integral conversions that increase precision).
> While IFN_{ADD,MUL}_OVERFLOW are commutative and just one simplify would be
> enough, IFN_SUB_OVERFLOW is not, therefore two simplifications.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

It appears we're careful to accept differently typed operands on
those IFNs, at least in the few places I found, so OK.

Thanks,
Richard.

> 2021-09-02  Jakub Jelinek  
> 
>   PR tree-optimization/99591
>   * match.pd: Demote operands of IFN_{ADD,SUB,MUL}_OVERFLOW if they
>   were promoted.
> 
>   * gcc.target/i386/pr99591.c: New test.
>   * gcc.target/i386/pr97950.c: Match or reject setb or jn?b instructions
>   together with seta or jn?a.
> 
> --- gcc/match.pd.jj   2021-08-30 08:36:11.226516509 +0200
> +++ gcc/match.pd  2021-09-01 10:43:15.072908430 +0200
> @@ -5587,6 +5587,21 @@ (define_operator_list COND_TERNARY
> (with { tree t = TREE_TYPE (@0), cpx = build_complex_type (t); }
>  (cmp (imagpart (IFN_MUL_OVERFLOW:cpx @0 @1)) { build_zero_cst (t); })
>  
> +/* Demote operands of IFN_{ADD,SUB,MUL}_OVERFLOW.  */
> +(for ovf (IFN_ADD_OVERFLOW IFN_SUB_OVERFLOW IFN_MUL_OVERFLOW)
> + (simplify
> +  (ovf (convert@2 @0) @1)
> +  (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
> +   && INTEGRAL_TYPE_P (TREE_TYPE (@2))
> +   && TYPE_PRECISION (TREE_TYPE (@2)) > TYPE_PRECISION (TREE_TYPE (@0)))
> +   (ovf @0 @1)))
> + (simplify
> +  (ovf @1 (convert@2 @0))
> +  (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
> +   && INTEGRAL_TYPE_P (TREE_TYPE (@2))
> +   && TYPE_PRECISION (TREE_TYPE (@2)) > TYPE_PRECISION (TREE_TYPE (@0)))
> +   (ovf @1 @0))))
> +
>  /* Simplification of math builtins.  These rules must all be optimizations
> as well as IL simplifications.  If there is a possibility that the new
> form could be a pessimization, the rule should go in the canonicalization
> --- gcc/testsuite/gcc.target/i386/pr99591.c.jj   2021-09-01 10:49:32.286556087 +0200
> +++ gcc/testsuite/gcc.target/i386/pr99591.c   2021-09-01 10:49:17.450766597 +0200
> @@ -0,0 +1,32 @@
> +/* PR tree-optimization/99591 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +/* { dg-final { scan-assembler-not "\tmovs\[bw]l\t" } } */
> +
> +int
> +foo (signed char a, signed char b)
> +{
> +  signed char r;
> +  return __builtin_add_overflow (a, b, &r);
> +}
> +
> +int
> +bar (short a, short b)
> +{
> +  short r;
> +  return __builtin_add_overflow (a, b, &r);
> +}
> +
> +int
> +baz (signed char a, signed char b)
> +{
> +  signed char r;
> +  return __builtin_add_overflow ((int) a, (int) b, &r);
> +}
> +
> +int
> +qux (short a, short b)
> +{
> +  short r;
> +  return __builtin_add_overflow ((int) a, (int) b, &r);
> +}
> --- gcc/testsuite/gcc.target/i386/pr97950.c.jj   2020-11-24 23:16:27.601004905 +0100
> +++ gcc/testsuite/gcc.target/i386/pr97950.c   2021-09-02 09:28:06.934382216 +0200
> @@ -1,10 +1,10 @@
>  /* PR target/95950 */
>  /* { dg-do compile } */
>  /* { dg-options "-O2 -mtune=generic" } */
> -/* { dg-final { scan-assembler-times "\tseta\t" 4 } } */
> +/* { dg-final { scan-assembler-times "\tset\[ab]\t" 4 } } */
>  /* { dg-final { scan-assembler-times "\tseto\t" 16 } } */
>  /* { dg-final { scan-assembler-times "\tsetc\t" 4 } } */
> -/* { dg-final { scan-assembler-not "\tjn?a\t" } } */
> +/* { dg-final { scan-assembler-not "\tjn?\[ab]\t" } } */
>  /* { dg-final { scan-assembler-not "\tjn?o\t" } } */
>  /* { dg-final { scan-assembler-not "\tjn?c\t" } } */
>  
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
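The reasoning that operand promotions are useless follows from the builtins computing in infinite precision: promoting a signed char to int does not change its mathematical value. Below is a quick exhaustive check of that reasoning for signed char operands. It is a sketch assuming GCC or Clang, both of which provide `__builtin_add_overflow`, `__builtin_sub_overflow` and `__builtin_mul_overflow`; `count_promotion_mismatches` is a hypothetical name.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* For every pair of signed char operands, compare the overflow flag and
   the stored (signed char) result when the operands are passed as-is
   versus first promoted to int.  Returns the number of mismatches.  */
static int
count_promotion_mismatches (void)
{
  int mismatches = 0;

  for (int i = SCHAR_MIN; i <= SCHAR_MAX; i++)
    for (int j = SCHAR_MIN; j <= SCHAR_MAX; j++)
      {
        signed char a = (signed char) i, b = (signed char) j;
        signed char r1, r2;
        bool o1, o2;

        o1 = __builtin_add_overflow (a, b, &r1);
        o2 = __builtin_add_overflow ((int) a, (int) b, &r2);
        mismatches += (o1 != o2) || (r1 != r2);

        o1 = __builtin_sub_overflow (a, b, &r1);
        o2 = __builtin_sub_overflow ((int) a, (int) b, &r2);
        mismatches += (o1 != o2) || (r1 != r2);

        o1 = __builtin_mul_overflow (a, b, &r1);
        o2 = __builtin_mul_overflow ((int) a, (int) b, &r2);
        mismatches += (o1 != o2) || (r1 != r2);
      }
  return mismatches;
}
```

A result of 0 means dropping the widening casts, as the baz and qux testcases do at the source level, never changes the flag or the stored value.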


[PATCH] match.pd: Demote IFN_{ADD,SUB,MUL}_OVERFLOW operands [PR99591]

2021-09-02 Thread Jakub Jelinek via Gcc-patches
Hi!

The overflow builtins work on infinite precision integers and then convert
to the result type's precision, so any argument promotions are useless.
The expand_arith_overflow expansion is able to demote the arguments itself
through get_range_pos_neg and get_min_precision calls and if needed promote
to whatever mode it decides to perform the operations in, but if there are
any promotions it demoted, those are already expanded.  Normally combine
would remove the useless sign or zero extensions when it sees the result
of those is only used in a lowpart subreg, but typically those lowpart
subregs appear multiple times in the pattern so that they describe properly
the overflow behavior and combine gives up, so we end up with e.g.
movswl  %si, %esi
movswl  %di, %edi
imulw   %si, %di
seto    %al
where both movswl insns are useless.

The following patch fixes it by demoting operands of the ifns (only gets
rid of integral to integral conversions that increase precision).
While IFN_{ADD,MUL}_OVERFLOW are commutative and just one simplify would be
enough, IFN_SUB_OVERFLOW is not, therefore two simplifications.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2021-09-02  Jakub Jelinek  

PR tree-optimization/99591
* match.pd: Demote operands of IFN_{ADD,SUB,MUL}_OVERFLOW if they
were promoted.

* gcc.target/i386/pr99591.c: New test.
* gcc.target/i386/pr97950.c: Match or reject setb or jn?b instructions
together with seta or jn?a.

--- gcc/match.pd.jj 2021-08-30 08:36:11.226516509 +0200
+++ gcc/match.pd   2021-09-01 10:43:15.072908430 +0200
@@ -5587,6 +5587,21 @@ (define_operator_list COND_TERNARY
(with { tree t = TREE_TYPE (@0), cpx = build_complex_type (t); }
 (cmp (imagpart (IFN_MUL_OVERFLOW:cpx @0 @1)) { build_zero_cst (t); })
 
+/* Demote operands of IFN_{ADD,SUB,MUL}_OVERFLOW.  */
+(for ovf (IFN_ADD_OVERFLOW IFN_SUB_OVERFLOW IFN_MUL_OVERFLOW)
+ (simplify
+  (ovf (convert@2 @0) @1)
+  (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
+   && INTEGRAL_TYPE_P (TREE_TYPE (@2))
+   && TYPE_PRECISION (TREE_TYPE (@2)) > TYPE_PRECISION (TREE_TYPE (@0)))
+   (ovf @0 @1)))
+ (simplify
+  (ovf @1 (convert@2 @0))
+  (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
+   && INTEGRAL_TYPE_P (TREE_TYPE (@2))
+   && TYPE_PRECISION (TREE_TYPE (@2)) > TYPE_PRECISION (TREE_TYPE (@0)))
+   (ovf @1 @0))))
+
 /* Simplification of math builtins.  These rules must all be optimizations
as well as IL simplifications.  If there is a possibility that the new
form could be a pessimization, the rule should go in the canonicalization
--- gcc/testsuite/gcc.target/i386/pr99591.c.jj   2021-09-01 10:49:32.286556087 +0200
+++ gcc/testsuite/gcc.target/i386/pr99591.c   2021-09-01 10:49:17.450766597 +0200
@@ -0,0 +1,32 @@
+/* PR tree-optimization/99591 */
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler-not "\tmovs\[bw]l\t" } } */
+
+int
+foo (signed char a, signed char b)
+{
+  signed char r;
+  return __builtin_add_overflow (a, b, &r);
+}
+
+int
+bar (short a, short b)
+{
+  short r;
+  return __builtin_add_overflow (a, b, &r);
+}
+
+int
+baz (signed char a, signed char b)
+{
+  signed char r;
+  return __builtin_add_overflow ((int) a, (int) b, &r);
+}
+
+int
+qux (short a, short b)
+{
+  short r;
+  return __builtin_add_overflow ((int) a, (int) b, &r);
+}
--- gcc/testsuite/gcc.target/i386/pr97950.c.jj   2020-11-24 23:16:27.601004905 +0100
+++ gcc/testsuite/gcc.target/i386/pr97950.c   2021-09-02 09:28:06.934382216 +0200
@@ -1,10 +1,10 @@
 /* PR target/95950 */
 /* { dg-do compile } */
 /* { dg-options "-O2 -mtune=generic" } */
-/* { dg-final { scan-assembler-times "\tseta\t" 4 } } */
+/* { dg-final { scan-assembler-times "\tset\[ab]\t" 4 } } */
 /* { dg-final { scan-assembler-times "\tseto\t" 16 } } */
 /* { dg-final { scan-assembler-times "\tsetc\t" 4 } } */
-/* { dg-final { scan-assembler-not "\tjn?a\t" } } */
+/* { dg-final { scan-assembler-not "\tjn?\[ab]\t" } } */
 /* { dg-final { scan-assembler-not "\tjn?o\t" } } */
 /* { dg-final { scan-assembler-not "\tjn?c\t" } } */
 

Jakub
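The new patterns strip a conversion only when it increases precision (TYPE_PRECISION of the converted type greater than that of the operand). A narrowing conversion cannot be dropped, because it may change the operand's value and with it the overflow result. Below is a small sketch of the difference. It assumes GCC's documented modulo-2^N reduction for out-of-range conversions to signed integer types, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Operands kept at int precision: the infinite-precision sum is
   checked against the range of signed char.  */
static bool
add_overflows_wide (int x, int y)
{
  signed char r;
  return __builtin_add_overflow (x, y, &r);
}

/* Operands narrowed first: with GCC, 300 becomes 300 - 256 = 44 before
   the builtin ever sees it, so the narrowing cast is not removable.  */
static bool
add_overflows_narrowed (int x, int y)
{
  signed char r;
  return __builtin_add_overflow ((signed char) x, (signed char) y, &r);
}
```

For x = 300, y = 0 the first reports overflow and the second does not, while for in-range operands such as 1 and 2 both agree.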



Re: DWARF for extern variable

2021-09-02 Thread Richard Biener via Gcc-patches
On Wed, Sep 1, 2021 at 7:25 PM Indu Bhagat  wrote:
>
> On 8/24/21 12:55 AM, Richard Biener wrote:
> > On Mon, Aug 23, 2021 at 11:18 PM Indu Bhagat via Gcc-patches
> >  wrote:
> >>
> >> Hello,
> >>
> >> What is the expected DWARF for an extern variable in the following cases? I
> >> am seeing that the generated DWARF differs between gcc8.4.1 and gcc-trunk.
> >>
> >>
> >> Testcase 2
> >> --
> >> extern const char a[];
> >> const char a[] = "testme";
> >>
> >> Testcase 2 Behavior
> >> 
> >> - Both gcc-trunk and gcc8.4.1 generate two DW_TAG_variable DIEs (the
> >> defining decl holds the reference to the non-defining decl via
> >> DW_AT_specification)
> >> - However, gcc8.4.1 does not generate any DWARF for the type of the defining
> >> decl (const char[7]), whereas gcc-trunk does.
> >>
> >> ## DWARF for testcase 2 with gcc-trunk is as follows:
> >> <...>
> >><1><22>: Abbrev Number: 2 (DW_TAG_array_type)
> >>   <23>   DW_AT_type: <0x39>
> >>   <27>   DW_AT_sibling : <0x2d>
> >><2><2b>: Abbrev Number: 5 (DW_TAG_subrange_type)
> >><2><2c>: Abbrev Number: 0
> >><1><2d>: Abbrev Number: 1 (DW_TAG_const_type)
> >>   <2e>   DW_AT_type: <0x22>
> >><1><32>: Abbrev Number: 3 (DW_TAG_base_type)
> >>   <33>   DW_AT_byte_size   : 1
> >>   <34>   DW_AT_encoding: 6(signed char)
> >>   <35>   DW_AT_name: (indirect string, offset: 0x2035): char
> >><1><39>: Abbrev Number: 1 (DW_TAG_const_type)
> >>   <3a>   DW_AT_type: <0x32>
> >><1><3e>: Abbrev Number: 6 (DW_TAG_variable)
> >>   <3f>   DW_AT_name: a
> >>   <41>   DW_AT_decl_file   : 1
> >>   <42>   DW_AT_decl_line   : 1
> >>   <43>   DW_AT_decl_column : 19
> >>   <44>   DW_AT_type: <0x2d>
> >>   <48>   DW_AT_external: 1
> >>   <48>   DW_AT_declaration : 1
> >><1><48>: Abbrev Number: 2 (DW_TAG_array_type)
> >>   <49>   DW_AT_type: <0x39>
> >>   <4d>   DW_AT_sibling : <0x58>
> >><2><51>: Abbrev Number: 7 (DW_TAG_subrange_type)
> >>   <52>   DW_AT_type: <0x5d>
> >>   <56>   DW_AT_upper_bound : 6
> >><2><57>: Abbrev Number: 0
> >><1><58>: Abbrev Number: 1 (DW_TAG_const_type)
> >>   <59>   DW_AT_type: <0x48>
> >><1><5d>: Abbrev Number: 3 (DW_TAG_base_type)
> >>   <5e>   DW_AT_byte_size   : 8
> >>   <5f>   DW_AT_encoding: 7(unsigned)
> >>   <60>   DW_AT_name: (indirect string, offset: 0x2023): long unsigned int
> >><1><64>: Abbrev Number: 8 (DW_TAG_variable)
> >>   <65>   DW_AT_specification: <0x3e>
> >>   <69>   DW_AT_decl_line   : 2
> >>   <6a>   DW_AT_decl_column : 12
> >>   <6b>   DW_AT_type: <0x58>
> >
> > I suppose having both a DW_AT_specification and a DW_AT_type
> > is somewhat at odds.  It's likely because the definition specifies
> > the size of the array while the specification does not.  Not sure
> > what should be best done here.
> >
> > Richard.
>
> Hmm, I thought the DWARF generated by gcc-trunk for testcase 2 is
> coherent and matches the source: the DW_AT_type of the defining
> declaration correctly gives the type as const char[7], while its
> DW_AT_specification points to the non-defining decl (whose type is
> const char[] with no size info).
>
> The DWARF generated by gcc-8.4.1, however, does seem to be missing
> information. It should describe the defining decl and hence the size,
> i.e., a DW_AT_type pointing to an array type whose DW_TAG_subrange_type
> has DW_AT_upper_bound = 6, as above. Shouldn't it?

Yes.
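The size information under discussion exists only because the definition completes the array type. Below is a minimal sketch of testcase 2 with a hypothetical helper that computes the value DWARF records as DW_AT_upper_bound. For a C array (lower bound 0) that is the last valid index.

```c
#include <assert.h>
#include <stddef.h>

/* Non-defining declaration: incomplete type const char[], so its
   DW_TAG_subrange_type carries no bound.  */
extern const char a[];

/* The definition completes the type to const char[7]: the six
   characters of "testme" plus the terminating NUL.  */
const char a[] = "testme";

/* Hypothetical helper: element count minus one, i.e. the inclusive
   upper bound DWARF uses for a zero-based C array.  */
static size_t
a_upper_bound (void)
{
  return sizeof a / sizeof a[0] - 1;
}
```

Here `sizeof a` is 7, so the upper bound is 6, matching the DW_AT_upper_bound that gcc-trunk emits for the defining declaration's type.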

> Indu
>
> >
> >>   <6f>   DW_AT_location: 9 byte block: 3 0 0 0 0 0 0 0 0 (DW_OP_addr: 0)
> >><1><79>: Abbrev Number: 0
> >>
> >> ## DWARF for testcase 2 with gcc8.4.1 is as follows:
> >><1><21>: Abbrev Number: 2 (DW_TAG_array_type)
> >>   <22>   DW_AT_type: <0x38>
> >>   <26>   DW_AT_sibling : <0x2c>
> >><2><2a>: Abbrev Number: 3 (DW_TAG_subrange_type)
> >><2><2b>: Abbrev Number: 0
> >><1><2c>: Abbrev Number: 4 (DW_TAG_const_type)
> >>   <2d>   DW_AT_type: <0x21>
> >><1><31>: Abbrev Number: 5 (DW_TAG_base_type)
> >>   <32>   DW_AT_byte_size   : 1
> >>   <33>   DW_AT_encoding: 6(signed char)
> >>   <34>   DW_AT_name: (indirect string, offset: 0x1e04): char
> >><1><38>: Abbrev Number: 4 (DW_TAG_const_type)
> >>   <39>   DW_AT_type: <0x31>
> >><1><3d>: Abbrev Number: 6 (DW_TAG_variable)
> >>   <3e>   DW_AT_name: a
> >>   <40>   DW_AT_decl_file   : 1
> >>   <41>   DW_AT_decl_line   : 1
> >>   <42>   DW_AT_decl_column : 19
> >>   <43>   DW_AT_type: <0x2c>
> >>   <47>   DW_AT_external: 1
> >>   <47>   DW_AT_declaration : 1
> >><1><47>: Abbrev Number: 5 (DW_TAG_base_type)
> >>   <48>   DW_AT_byte_size   : 8
> >>