Re: [PATCH/RFC] Add a new memory gathering optimization for loop (PR98598)

2021-05-06 Thread Feng Xue OS via Gcc-patches
>> gcc/
>> PR tree-optimization/98598
>> * Makefile.in (OBJS): Add tree-ssa-loop-mgo.o.
>> * common.opt (-ftree-loop-mgo): New option.
> 
> Just a quick comment - -ftree-loop-mgo is user-facing and it isn't really a 
> good
> name.  -floop-mgo would be better but still I'd have no idea what this would 
> do.
> 
> I don't have a good suggestion here other than to expand it to
> -floop-gather-memory (?!).

OK. Better than "mgo", this abbr. is only a term for development use.

> The option documentation isn't informative either.
> 
> From:
> 
>   outer-loop ()
> {
>   inner-loop (iter, iter_count)
> {
>   Type1 v1 = LOAD (iter);
>   Type2 v2 = LOAD (v1);
>   Type3 v3 = LOAD (v2);
>   ...
>   iter = NEXT (iter);
> }
> }
> 
> To:
> 
>   typedef struct cache_elem
> {
>   bool   init;
>   Type1  c_v1;
>   Type2  c_v2;
>   Type3  c_v3;
> } cache_elem;
> 
>   cache_elem *cache_arr = calloc (iter_count, sizeof (cache_elem));
> 
>   outer-loop ()
> {
>   size_t cache_idx = 0;
> 
>   inner-loop (iter, iter_count)
> {
>   if (!cache_arr[cache_idx]->init)
> {
>   v1 = LOAD (iter);
>   v2 = LOAD (v1);
>   v3 = LOAD (v2);
> 
>   cache_arr[cache_idx]->init = true;
>   cache_arr[cache_idx]->c_v1 = v1;
>   cache_arr[cache_idx]->c_v2 = v2;
>   cache_arr[cache_idx]->c_v3 = v3;
> }
>   else
> {
>   v1 = cache_arr[cache_idx]->c_v1;
>   v2 = cache_arr[cache_idx]->c_v2;
>   v3 = cache_arr[cache_idx]->c_v3;
> }
>   ...
>   cache_idx++;
>   iter = NEXT (iter);
> }
> }
> 
>   free (cache_arr);
> 
> This is a _very_ special transform.  What it seems to do is
> optimize the dependent loads for outer loop iteration n > 1
> by caching the result(s).  If that's possible then you should
> be able to distribute the outer loop to one doing the caching
> and one using the cache.  Then this transform would be more
> like a tradidional array expansion of scalars?  In some cases
> also loop interchange could remove the need for the caching.
> 
> Doing MGO as the very first loop pass thus looks bad, I think
> MGO should be much later, for example after interchange.
> I also think that MGO should work in concert with loop
> distribution (which could have an improved cost model)
> rather than being a separate pass.
> 
> Your analysis phase looks quite expensive, building sth
> like a on-the side representation very closely matching SSA.
> It seems to work from PHI defs to uses, which looks backwards.

Did not catch this point very clearly. Would you please detail it more?

> You seem to roll your own dependence analysis code :/  Please
> have a look at loop distribution.
> 
> Also you build an actual structure type for reasons that escape
> me rather than simply accessing the allocated storage at
> appropriate offsets.
> 
> I think simply calling 'calloc' isn't OK because you might need
> aligned storage and because calloc might not be available.
> Please at least use 'malloc' and make sure MALLOC_ABI_ALIGNMENT
> is large enough for the data you want to place (or perform
> dynamic re-alignment yourself).  We probably want some generic
> middle-end utility to obtain aligned allocated storage at some
> point.
> 
> As said above I think you want to re-do this transform as
> a loop distribution transform.  I think if caching works then
> the loads should be distributable and the loop distribution
> transform should be enhanced to expand the scalars to arrays.

I checked code of loop distribution, and its trigger strategy seems
to be very conservative, now only targets simple and regular
index-based loop, and could not handle link-list traversal, which
consists of a series of discrete memory accesses, and MGO would
matter a lot. Additionally, for some complicate cases,  we could
not completely decompose MGO as two separate loops for
"do caching" and "use caching" respectively. An example:

for (i = 0; i < N; i++)
  {
for (j = 0; j < i; j++)
   {
   Type1 v1 = LOAD_FN1 (j);
   Type2 v2 = LOAD_FN2 (v1);
   Type3 v3 = LOAD_FN3 (v2);

   ...

   condition = ...
   }

if (condition)
  break;
  }

We should not cache all loads (Totally N) in one step since some
of them might be invalid after "condition" breaks loops. We have to
mix up "do caching" and "use caching", and let them dynamically
switched against "init" flag.  But loop distribution does have some
overlap on analysis and transformation with MGO, we will try to
see if there is a way to unify them.

Thanks,
Feng

[PATCH RFC] c++: don't call 'rvalue' in coroutines code

2021-05-06 Thread Jason Merrill via Gcc-patches
A change to check glvalue_p rather than specifically for TARGET_EXPR
revealed issues with the coroutines code's use of the 'rvalue' function,
which shouldn't be used on class glvalues, so I've removed those calls.

In build_co_await I just dropped them, because I don't see anything in the
co_await specification that indicates that we would want to move from an
lvalue result of operator co_await.  And simplified that code while I was
touching it; cp_build_modify_expr (...INIT_EXPR...) will call the
constructor.

In morph_fn_to_coro I changed the handling of the rvalue reference coroutine
frame field to use move, to treat the rval ref as an xvalue.  I used
forward_parm to pass the function parms to the constructor for the field.
And I simplified the return handling so we get the desired rvalue semantics
from the normal implicit move on return.

Iain, does this all make sense to you?

Incidentally, what is the standard citation for default-initializing the
non-void return value of the function if get_return_object returns void?
All I see is "The expression /promise/.get_return_object() is used to
initialize the glvalue result or prvalue result object of a call to a
coroutine", which suggests to me that that situation should be ill-formed.

gcc/cp/ChangeLog:

* coroutines.cc (build_co_await): Don't call 'rvalue'.
(flatten_await_stmt): Simplify initialization.
(morph_fn_to_coro): Change 'rvalue' to 'move'.  Simplify.

gcc/testsuite/ChangeLog:

* g++.dg/coroutines/coro-bad-gro-00-class-gro-scalar-return.C:
Adjust diagnostic.
---
 gcc/cp/coroutines.cc  | 117 +-
 .../coro-bad-gro-00-class-gro-scalar-return.C |   2 +-
 2 files changed, 31 insertions(+), 88 deletions(-)

diff --git a/gcc/cp/coroutines.cc b/gcc/cp/coroutines.cc
index dbd703a67cc..03543d5c112 100644
--- a/gcc/cp/coroutines.cc
+++ b/gcc/cp/coroutines.cc
@@ -950,18 +950,11 @@ build_co_await (location_t loc, tree a, 
suspend_point_kind suspend_kind)
   e_proxy = o;
   o = NULL_TREE; /* The var is already present.  */
 }
-  else if (type_build_ctor_call (o_type))
-{
-  e_proxy = get_awaitable_var (suspend_kind, o_type);
-  releasing_vec arg (make_tree_vector_single (rvalue (o)));
-  o = build_special_member_call (e_proxy, complete_ctor_identifier,
-, o_type, LOOKUP_NORMAL,
-tf_warning_or_error);
-}
   else
 {
   e_proxy = get_awaitable_var (suspend_kind, o_type);
-  o = build2 (INIT_EXPR, o_type, e_proxy, rvalue (o));
+  o = cp_build_modify_expr (loc, e_proxy, INIT_EXPR, o,
+   tf_warning_or_error);
 }
 
   /* I suppose we could check that this is contextually convertible to bool.  
*/
@@ -2989,15 +2982,8 @@ flatten_await_stmt (var_nest_node *n, hash_set 
*promoted,
  gcc_checking_assert (!already_present);
  tree inner = TREE_OPERAND (init, 1);
  gcc_checking_assert (TREE_CODE (inner) != COND_EXPR);
- if (type_build_ctor_call (var_type))
-   {
- releasing_vec p_in (make_tree_vector_single (init));
- init = build_special_member_call (var, complete_ctor_identifier,
-   _in, var_type, LOOKUP_NORMAL,
-   tf_warning_or_error);
-   }
- else
-   init = build2 (INIT_EXPR, var_type, var, init);
+ init = cp_build_modify_expr (input_location, var, INIT_EXPR, init,
+  tf_warning_or_error);
  /* Simplify for the case that we have an init containing the temp
 alone.  */
  if (t == n->init && n->var == NULL_TREE)
@@ -4862,43 +4848,19 @@ morph_fn_to_coro (tree orig, tree *resumer, tree 
*destroyer)
  vec_safe_push (promise_args, this_ref);
}
  else if (parm.rv_ref)
-   vec_safe_push (promise_args, rvalue(fld_idx));
+   vec_safe_push (promise_args, move (fld_idx));
  else
vec_safe_push (promise_args, fld_idx);
 
  if (parm.rv_ref || parm.pt_ref)
/* Initialise the frame reference field directly.  */
-   r = build_modify_expr (fn_start, TREE_OPERAND (fld_idx, 0),
-  parm.frame_type, INIT_EXPR,
-  DECL_SOURCE_LOCATION (arg), arg,
-  DECL_ARG_TYPE (arg));
- else if (type_build_ctor_call (parm.frame_type))
-   {
- vec *p_in;
- if (CLASS_TYPE_P (parm.frame_type)
- && classtype_has_non_deleted_move_ctor (parm.frame_type))
-   p_in = make_tree_vector_single (move (arg));
- else if (lvalue_p (arg))
-   p_in = make_tree_vector_single (rvalue (arg));
- else
-   p_in = make_tree_vector_single (arg);
- 

PowerPC64 ELFv2 -fpatchable-function-entry

2021-05-06 Thread Alan Modra via Gcc-patches
PowerPC64 ELFv2 dual entry point functions have a couple of problems
with -fpatchable-function-entry.  One is that the nops added after the
global entry land in the global entry code which is constrained to be
a power of two number of instructions, and zero global entry code has
special meaning for linkage.  The other is that the global entry code
isn't always used on function entry, and some uses of
-fpatchable-function-entry might want to affect all entries to the
function.  So this patch arranges to put one batch of nops before the
global entry, and the other after the local entry point.

PR target/98125
* config/rs6000/rs6000.c (rs6000_print_patchable_function_entry): New
function.
(TARGET_ASM_PRINT_PATCHABLE_FUNCTION_ENTRY): Define.
* config/rs6000/rs6000-logue.c: Include targhooks.h.
(rs6000_output_function_prologue): Handle nops for
-fpatchable-function-entry after local entry point.

diff --git a/gcc/config/rs6000/rs6000-logue.c b/gcc/config/rs6000/rs6000-logue.c
index b0ac183ceff..ffa3bb3dcf1 100644
--- a/gcc/config/rs6000/rs6000-logue.c
+++ b/gcc/config/rs6000/rs6000-logue.c
@@ -51,6 +51,7 @@
 #include "gstab.h"  /* for N_SLINE */
 #include "dbxout.h" /* dbxout_ */
 #endif
+#include "targhooks.h"
 
 static int rs6000_ra_ever_killed (void);
 static void is_altivec_return_reg (rtx, void *);
@@ -3991,6 +3992,10 @@ rs6000_output_function_prologue (FILE *file)
   fputs (",1\n", file);
 }
 
+  int nops_after_entry = crtl->patch_area_size - crtl->patch_area_entry;
+  if (nops_after_entry > 0)
+default_print_patchable_function_entry (file, nops_after_entry, false);
+
   /* Output -mprofile-kernel code.  This needs to be done here instead of
  in output_function_profile since it must go after the ELFv2 ABI
  local entry point.  */
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index d43c36e7f1a..97f1b3e0674 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -1404,6 +1404,10 @@ static const struct attribute_spec 
rs6000_attribute_table[] =
 #undef TARGET_ASM_FUNCTION_EPILOGUE
 #define TARGET_ASM_FUNCTION_EPILOGUE rs6000_output_function_epilogue
 
+#undef TARGET_ASM_PRINT_PATCHABLE_FUNCTION_ENTRY
+#define TARGET_ASM_PRINT_PATCHABLE_FUNCTION_ENTRY \
+  rs6000_print_patchable_function_entry
+
 #undef TARGET_ASM_OUTPUT_ADDR_CONST_EXTRA
 #define TARGET_ASM_OUTPUT_ADDR_CONST_EXTRA rs6000_output_addr_const_extra
 
@@ -14953,6 +14957,33 @@ rs6000_assemble_visibility (tree decl, int vis)
 }
 #endif
 
+/* Write NOPs into the asm outfile FILE around a function entry.  This
+   routine may be called twice per function to put NOPs before and after
+   the function entry.  If RECORD_P is true the location of the NOPs will
+   be recorded by default_print_patchable_function_entry in a special
+   object section called "__patchable_function_entries".  Disable output
+   of any NOPs for the second call.  Those, if any, are output by
+   rs6000_output_function_prologue.  This means that for ELFv2 any NOPs
+   after the function entry are placed after the local entry point, not
+   the global entry point.  NOPs after the entry may be found at
+   record_loc + nops_before * 4 + local_entry_offset.  This holds true
+   when nops_before is zero.  */
+
+static void
+rs6000_print_patchable_function_entry (FILE *file,
+  unsigned HOST_WIDE_INT patch_area_size 
ATTRIBUTE_UNUSED,
+  bool record_p)
+{
+  /* Always call default_print_patchable_function_entry when RECORD_P in
+ order to output the location of the NOPs, but use the size of the
+ area before the entry on both possible calls.  If RECORD_P is true
+ on the second call then the area before the entry was zero size and
+ thus no NOPs will be output.  */
+  if (record_p)
+default_print_patchable_function_entry (file, crtl->patch_area_entry,
+   record_p);
+}
+
 enum rtx_code
 rs6000_reverse_condition (machine_mode mode, enum rtx_code code)
 {


Revert "rs6000: Avoid -fpatchable-function-entry* regressions on powerpc64 be [PR98125]"

2021-05-06 Thread Alan Modra via Gcc-patches
This reverts commit b680b9049737198d010e49cf434704c6a6ed2b3f now
that the PowerPC64 ELFv1 regression is fixed properly.

PR testsuite/98125
* targhooks.h (default_print_patchable_function_entry_1): Delete.
* targhooks.c (default_print_patchable_function_entry_1): Delete.
(default_print_patchable_function_entry): Expand above.
* config/rs6000/rs6000.c (TARGET_ASM_PRINT_PATCHABLE_FUNCTION_ENTRY):
Don't define.
(rs6000_print_patchable_function_entry): Delete.
* testsuite/g++.dg/pr93195a.C: Revert 2021-04-03 change.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 663eed4f055..d43c36e7f1a 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -1362,10 +1362,6 @@ static const struct attribute_spec 
rs6000_attribute_table[] =
 #define TARGET_ASM_ASSEMBLE_VISIBILITY rs6000_assemble_visibility
 #endif
 
-#undef TARGET_ASM_PRINT_PATCHABLE_FUNCTION_ENTRY
-#define TARGET_ASM_PRINT_PATCHABLE_FUNCTION_ENTRY \
-  rs6000_print_patchable_function_entry
-
 #undef TARGET_SET_UP_BY_PROLOGUE
 #define TARGET_SET_UP_BY_PROLOGUE rs6000_set_up_by_prologue
 
@@ -14957,30 +14953,6 @@ rs6000_assemble_visibility (tree decl, int vis)
 }
 #endif
 
-/* Write PATCH_AREA_SIZE NOPs into the asm outfile FILE around a function
-   entry.  If RECORD_P is true and the target supports named sections,
-   the location of the NOPs will be recorded in a special object section
-   called "__patchable_function_entries".  This routine may be called
-   twice per function to put NOPs before and after the function
-   entry.  */
-
-void
-rs6000_print_patchable_function_entry (FILE *file,
-  unsigned HOST_WIDE_INT patch_area_size,
-  bool record_p)
-{
-  unsigned int flags = SECTION_WRITE | SECTION_RELRO;
-  /* When .opd section is emitted, the function symbol
- default_print_patchable_function_entry_1 is emitted into the .opd section
- while the patchable area is emitted into the function section.
- Don't use SECTION_LINK_ORDER in that case.  */
-  if (!(TARGET_64BIT && DEFAULT_ABI != ABI_ELFv2)
-  && HAVE_GAS_SECTION_LINK_ORDER)
-flags |= SECTION_LINK_ORDER;
-  default_print_patchable_function_entry_1 (file, patch_area_size, record_p,
-   flags);
-}
-
 enum rtx_code
 rs6000_reverse_condition (machine_mode mode, enum rtx_code code)
 {
diff --git a/gcc/targhooks.c b/gcc/targhooks.c
index 952fad422eb..d69c9a2d819 100644
--- a/gcc/targhooks.c
+++ b/gcc/targhooks.c
@@ -1832,15 +1832,17 @@ default_compare_by_pieces_branch_ratio (machine_mode)
   return 1;
 }
 
-/* Helper for default_print_patchable_function_entry and other
-   print_patchable_function_entry hook implementations.  */
+/* Write PATCH_AREA_SIZE NOPs into the asm outfile FILE around a function
+   entry.  If RECORD_P is true and the target supports named sections,
+   the location of the NOPs will be recorded in a special object section
+   called "__patchable_function_entries".  This routine may be called
+   twice per function to put NOPs before and after the function
+   entry.  */
 
 void
-default_print_patchable_function_entry_1 (FILE *file,
- unsigned HOST_WIDE_INT
- patch_area_size,
- bool record_p,
- unsigned int flags)
+default_print_patchable_function_entry (FILE *file,
+   unsigned HOST_WIDE_INT patch_area_size,
+   bool record_p)
 {
   const char *nop_templ = 0;
   int code_num;
@@ -1862,6 +1864,9 @@ default_print_patchable_function_entry_1 (FILE *file,
   patch_area_number++;
   ASM_GENERATE_INTERNAL_LABEL (buf, "LPFE", patch_area_number);
 
+  unsigned int flags = SECTION_WRITE | SECTION_RELRO;
+  if (HAVE_GAS_SECTION_LINK_ORDER)
+   flags |= SECTION_LINK_ORDER;
   switch_to_section (get_section ("__patchable_function_entries",
  flags, current_function_decl));
   assemble_align (POINTER_SIZE);
@@ -1878,25 +1883,6 @@ default_print_patchable_function_entry_1 (FILE *file,
 output_asm_insn (nop_templ, NULL);
 }
 
-/* Write PATCH_AREA_SIZE NOPs into the asm outfile FILE around a function
-   entry.  If RECORD_P is true and the target supports named sections,
-   the location of the NOPs will be recorded in a special object section
-   called "__patchable_function_entries".  This routine may be called
-   twice per function to put NOPs before and after the function
-   entry.  */
-
-void
-default_print_patchable_function_entry (FILE *file,
-   unsigned HOST_WIDE_INT patch_area_size,
-   bool record_p)
-{
-  unsigned int flags = SECTION_WRITE | SECTION_RELRO;
-  if 

PowerPC64 ELFv1 -fpatchable-function-entry

2021-05-06 Thread Alan Modra via Gcc-patches
On PowerPC64 ELFv1 function symbols are defined on function
descriptors in an .opd section rather than in the function code.
.opd is not split up by the PowerPC64 backend for comdat groups or
other situations where per-function sections are required.  Thus
SECTION_LINK_ORDER can't use the function name to reference a suitable
section for ordering:  The .opd section might contain many other
function descriptors and they may be in a different order to the final
function code placement.  This patch arranges to use a code label
instead of the function name symbol.

I chose to emit the label inside default_elf_asm_named_section,
immediately before the .section directive using the label, and in case
someone uses .previous or the like, need to save and restore the
current section when switching to the function code section to emit
the label.  That requires a tweak to switch_to_section in order to get
the current section.  I checked all the TARGET_ASM_NAMED_SECTION
functions and unnamed.callback functions and it appears none will be
affected by that tweak.

PR target/98125
* varasm.c (default_elf_asm_named_section): Use a function
code label rather than the function symbol as the "o" argument.
(switch_to_section): Don't set in_section until section
directive has been emitted.

diff --git a/gcc/varasm.c b/gcc/varasm.c
index 97c1e6fff25..5f95f8cfa75 100644
--- a/gcc/varasm.c
+++ b/gcc/varasm.c
@@ -6866,6 +6866,26 @@ default_elf_asm_named_section (const char *name, 
unsigned int flags,
   *f = '\0';
 }
 
+  char func_label[256];
+  if (flags & SECTION_LINK_ORDER)
+{
+  static int recur;
+  if (recur)
+   gcc_unreachable ();
+  else
+   {
+ ++recur;
+ section *save_section = in_section;
+ static int func_code_labelno;
+ switch_to_section (function_section (decl));
+ ++func_code_labelno;
+ ASM_GENERATE_INTERNAL_LABEL (func_label, "LPFC", func_code_labelno);
+ ASM_OUTPUT_LABEL (asm_out_file, func_label);
+ switch_to_section (save_section);
+ --recur;
+   }
+}
+
   fprintf (asm_out_file, "\t.section\t%s,\"%s\"", name, flagchars);
 
   /* default_section_type_flags (above) knows which flags need special
@@ -6893,11 +6913,8 @@ default_elf_asm_named_section (const char *name, 
unsigned int flags,
fprintf (asm_out_file, ",%d", flags & SECTION_ENTSIZE);
   if (flags & SECTION_LINK_ORDER)
{
- tree id = DECL_ASSEMBLER_NAME (decl);
- ultimate_transparent_alias_target ();
- const char *name = IDENTIFIER_POINTER (id);
- name = targetm.strip_name_encoding (name);
- fprintf (asm_out_file, ",%s", name);
+ fputc (',', asm_out_file);
+ assemble_name_raw (asm_out_file, func_label);
}
   if (HAVE_COMDAT_GROUP && (flags & SECTION_LINKONCE))
{
@@ -7821,11 +7838,6 @@ switch_to_section (section *new_section, tree decl)
   else if (in_section == new_section)
 return;
 
-  if (new_section->common.flags & SECTION_FORGET)
-in_section = NULL;
-  else
-in_section = new_section;
-
   switch (SECTION_STYLE (new_section))
 {
 case SECTION_NAMED:
@@ -7843,6 +7855,11 @@ switch_to_section (section *new_section, tree decl)
   break;
 }
 
+  if (new_section->common.flags & SECTION_FORGET)
+in_section = NULL;
+  else
+in_section = new_section;
+
   new_section->common.flags |= SECTION_DECLARED;
 }
 


PR98125, PowerPC64 -fpatchable-function-entry

2021-05-06 Thread Alan Modra via Gcc-patches
This series of patches fixes -fpatchable-function-entry on PowerPC64
ELFv1 so that SECTION_LINK_ORDER (.section 'o' arg) is now supported,
and on PowerPC64 ELFv2 to not break the global entry code.

Bootstrapped powerpc64le-linux and x86_64-linux all langs.  I did see
one regression on both targets, libgo runtime/pprof.  It's unclear to
me what that means.

Alan Modra (3):
  PowerPC64 ELFv1 -fpatchable-function-entry
  Revert "rs6000: Avoid -fpatchable-function-entry* regressions on
powerpc64 be [PR98125]"
  PowerPC64 ELFv2 -fpatchable-function-entry

 gcc/config/rs6000/rs6000-logue.c |  5 
 gcc/config/rs6000/rs6000.c   | 47 +---
 gcc/targhooks.c  | 38 --
 gcc/targhooks.h  |  3 --
 gcc/testsuite/g++.dg/pr93195a.C  |  1 -
 gcc/varasm.c | 37 ++---
 6 files changed, 69 insertions(+), 62 deletions(-)



Re: [PATCH/RFC] combine: Tweak the condition of last_set invalidation

2021-05-06 Thread Kewen.Lin via Gcc-patches
Hi Segher,

>>
>> I think this should be postponed to stage 1 though?  Or is there
>> anything very urgent in it?
>>
> 
> Yeah, I agree that this belongs to stage1, and there isn't anything
> urgent about it.  Thanks for all further comments above!
> 

Gentle ping this:

https://gcc.gnu.org/pipermail/gcc-patches/2020-December/562015.html

BR,
Kewen


[PATCH] forwprop: Support vec perm fed by CTOR and CTOR/CST [PR99398]

2021-05-06 Thread Kewen.Lin via Gcc-patches
Hi, 

This patch is to teach forwprop to optimize some cases where the
permutated operands of vector permutation are from two same type
CTOR and CTOR or one CTOR and one VECTOR CST.  It aggressively
takes VIEW_CONVERT_EXPR as trivial copies and transform the vector
permutation into vector CTOR.

Bootstrapped/regtested on powerpc64le-linux-gnu P9, powerpc64-linux-gnu P8,
x86_64-redhat-linux and aarch64-linux-gnu.

Is it ok for trunk?

BR,
Kewen
--
gcc/ChangeLog:

PR tree-optimization/99398
* tree-ssa-forwprop.c (get_prop_source_stmt): Add optional argument
view_conv_prop to indicate whether to take VIEW_CONVERT_EXPR as
trivial copy.  Add handlings for this argument.
(remove_prop_source_from_use): Likewise.
(simplify_permutation): Optimize some cases where the fed operands
are CTOR/CST and propagated through VIEW_CONVERT_EXPR.  Add the
call to vec_perm_indices::new_shrinked_vector.
* vec-perm-indices.c (vec_perm_indices::new_shrinked_vector): New
function.
* vec-perm-indices.h (vec_perm_indices::new_shrinked_vector): New
declare.

gcc/testsuite/ChangeLog:

PR tree-optimization/99398
* gcc.target/powerpc/vec-perm-ctor-run.c: New test.
* gcc.target/powerpc/vec-perm-ctor.c: New test.
* gcc.target/powerpc/vec-perm-ctor.h: New test.
---
 .../gcc.target/powerpc/vec-perm-ctor-run.c| 124 +
 .../gcc.target/powerpc/vec-perm-ctor.c|   9 +
 .../gcc.target/powerpc/vec-perm-ctor.h| 163 ++
 gcc/tree-ssa-forwprop.c   | 135 +--
 gcc/vec-perm-indices.c|  64 +++
 gcc/vec-perm-indices.h|   1 +
 6 files changed, 481 insertions(+), 15 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-perm-ctor-run.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-perm-ctor.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-perm-ctor.h

diff --git a/gcc/testsuite/gcc.target/powerpc/vec-perm-ctor-run.c 
b/gcc/testsuite/gcc.target/powerpc/vec-perm-ctor-run.c
new file mode 100644
index 000..987d6db999c
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/vec-perm-ctor-run.c
@@ -0,0 +1,124 @@
+/* { dg-do run } */
+/* { dg-require-effective-target vsx_hw } */
+/* { dg-options "-O2 -mvsx" } */
+
+#include "vec-perm-ctor.h"
+
+#include 
+
+int
+main ()
+{
+  du a_du = 100ULL;
+  du b_du = 200ULL;
+
+  di a_di = -100;
+  di b_di = 200;
+
+  df a_df = 10.0;
+  df b_df = 20.0;
+
+  si a_si = 12;
+  si b_si = -25;
+  si c_si = -37;
+  si d_si = 50;
+
+  sf a_sf = 30.0f;
+  sf b_sf = 40.0f;
+  sf c_sf = 50.0f;
+  sf d_sf = 60.0f;
+
+  hu a_hu = 10;
+  hu b_hu = 20;
+  hu c_hu = 30;
+  hu d_hu = 40;
+  hu e_hu = 50;
+  hu f_hu = 60;
+  hu g_hu = 70;
+  hu h_hu = 80;
+
+  qi a_qi = 10;
+  qi b_qi = 20;
+  qi c_qi = -30;
+  qi d_qi = 40;
+  qi e_qi = -50;
+  qi f_qi = 60;
+  qi g_qi = 70;
+  qi h_qi = -80;
+
+  v2du res1 = test_ctor_ctor_same_du (a_du, b_du);
+  if (res1[0] != a_du || res1[1] != b_du)
+abort ();
+
+  v2df res2 = test_ctor_ctor_same_df (a_df, b_df);
+  if (res2[0] != a_df || res2[1] != b_df)
+abort ();
+
+  v4si res3 = test_ctor_ctor_same_si (a_si, b_si, c_si, d_si);
+  if (res3[0] != a_si || res3[1] != b_si || res3[2] != c_si || res3[3] != d_si)
+abort ();
+
+  v4sf res4 = test_ctor_ctor_same_sf (a_sf, b_sf, c_sf, d_sf);
+  if (res4[0] != a_sf || res4[1] != b_sf || res4[2] != c_sf || res4[3] != d_sf)
+abort ();
+
+  v8hu res5
+= test_ctor_ctor_same_hu (a_hu, b_hu, c_hu, d_hu, e_hu, f_hu, g_hu, h_hu);
+
+  if (res5[0] != a_hu || res5[1] != b_hu || res5[2] != c_hu || res5[3] != d_hu
+  || res5[4] != e_hu || res5[5] != f_hu || res5[6] != g_hu
+  || res5[7] != h_hu)
+abort ();
+
+  v16qi res6
+= test_ctor_ctor_same_qi (a_qi, b_qi, c_qi, d_qi, e_qi, f_qi, g_qi, h_qi);
+
+  if (res6[0] != a_qi || res6[1] != b_qi || res6[2] != c_qi || res6[3] != d_qi
+  || res6[4] != a_qi || res6[5] != b_qi || res6[6] != c_qi
+  || res6[7] != d_qi || res6[8] != e_qi || res6[9] != f_qi
+  || res6[10] != g_qi || res6[11] != h_qi || res6[12] != e_qi
+  || res6[13] != f_qi || res6[14] != g_qi || res6[15] != h_qi)
+abort ();
+
+  v2du res7 = test_ctor_cst_same_du (a_du, b_du);
+  if (res7[0] != a_du || res7[1] != 100)
+abort ();
+
+  v4sf res8 = test_ctor_cst_same_sf (a_sf, b_sf);
+  if (res8[0] != a_sf || res8[1] != 2.0f || res8[2] != b_sf || res8[3] != 4.0f)
+abort ();
+
+  v2df res9 = test_ctor_cst_same_df (a_df, b_df);
+  if (res9[0] != b_df || res9[1] != 200.0)
+abort ();
+
+  v4si res10 = test_cst_ctor_same_si (a_si, b_si);
+  if (res10[0] != 1 || res10[1] != 3 || res10[2] != a_si || res10[3] != b_si)
+abort ();
+
+  v2di res11 = test_ctor_cst_diff_di_si (a_di, b_di);
+  /* Need to take care of the endianness since the function converts vector
+ const to one different vector type (element 

Ping: [PATCH 1/2] correct BB frequencies after loop changed

2021-05-06 Thread guojiufu via Gcc-patches

Gentle ping.

Original message: 
https://gcc.gnu.org/pipermail/gcc-patches/2020-October/555871.html



Thanks,
Jiufu Guo.


[PATCH] rs6000: Support more short/char to float conversion

2021-05-06 Thread Kewen.Lin via Gcc-patches
Hi,

For some cases that when we load unsigned char/short values from
the appropriate unsigned char/short memories and convert them to
double/single precision floating point value, there would be
implicit conversions to int first.  It makes GCC not leverage the
P9 instructions lxsibzx/lxsihzx.  This patch is to add the related
define_insn_and_split to support this kind of scenario.

Bootstrapped/regtested on powerpc64le-linux-gnu P9 and
powerpc64-linux-gnu P8.

Is it ok for trunk?

BR,
Kewen
--
gcc/ChangeLog:

* config/rs6000/rs6000.md
(floatsi2_lfiwax__mem_zext): New
define_insn_and_split.

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/p9-fpcvt-3.c: New test.
---
 gcc/config/rs6000/rs6000.md   | 22 +++
 gcc/testsuite/gcc.target/powerpc/p9-fpcvt-3.c | 27 +++
 2 files changed, 49 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/p9-fpcvt-3.c

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index c8cdc42533c..3ac7ed20852 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -5504,6 +5504,28 @@ (define_insn_and_split "floatsi2_lfiwax_mem"
   [(set_attr "length" "8")
(set_attr "type" "fpload")])
 
+(define_insn_and_split "floatsi2_lfiwax__mem_zext"
+  [(set (match_operand:SFDF 0 "gpc_reg_operand" "=d,")
+   (float:SFDF
+(zero_extend:SI
+ (match_operand:QHI 1 "indexed_or_indirect_operand" "Z,Z"
+   (clobber (match_scratch:DI 2 "=d,wa"))]
+  "TARGET_HARD_FLOAT &&  && TARGET_P9_VECTOR
+   && TARGET_POWERPC64 && TARGET_DIRECT_MOVE"
+  "#"
+  "&& 1"
+  [(pc)]
+{
+  operands[1] = rs6000_force_indexed_or_indirect_mem (operands[1]);
+  if (GET_CODE (operands[2]) == SCRATCH)
+operands[2] = gen_reg_rtx (DImode);
+  emit_insn (gen_zero_extendhidi2 (operands[2], operands[1]));
+  emit_insn (gen_floatdi2 (operands[0], operands[2]));
+  DONE;
+}
+  [(set_attr "length" "8")
+   (set_attr "type" "fpload")])
+
 (define_insn "lfiwzx"
   [(set (match_operand:DI 0 "gpc_reg_operand" "=d,wa,wa,wa")
(unspec:DI [(match_operand:SI 1 "reg_or_indexed_operand" "Z,Z,r,wa")]
diff --git a/gcc/testsuite/gcc.target/powerpc/p9-fpcvt-3.c 
b/gcc/testsuite/gcc.target/powerpc/p9-fpcvt-3.c
new file mode 100644
index 000..d3bbe36b759
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/p9-fpcvt-3.c
@@ -0,0 +1,27 @@
+/* { dg-do compile { target lp64 } } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-mdejagnu-cpu=power9 -O2" } */
+
+/* Note that for unsigned cases, the differences from those ones in
+   p9-fpcvt-2.c is that they will be converted to int implicitly first
+   and then to floating point.  */
+
+double sc_df (signed char*p, double df) { return *p + df; }
+double uc_df (unsigned char  *p, double df) { return *p + df; }
+double ss_df (signed short   *p, double df) { return *p + df; }
+double us_df (unsigned short *p, double df) { return *p + df; }
+
+float sc_sf (signed char*p, float sf) { return *p + sf; }
+float uc_sf (unsigned char  *p, float sf) { return *p + sf; }
+float ss_sf (signed short   *p, float sf) { return *p + sf; }
+float us_sf (unsigned short *p, float sf) { return *p + sf; }
+
+/* { dg-final { scan-assembler "lxsibzx"  } } */
+/* { dg-final { scan-assembler "lxsihzx"  } } */
+/* { dg-final { scan-assembler "vextsb2d" } } */
+/* { dg-final { scan-assembler "vextsh2d" } } */
+/* { dg-final { scan-assembler-not "mfvsrd"   } } */
+/* { dg-final { scan-assembler-not "mfvsrwz"  } } */
+/* { dg-final { scan-assembler-not "mtvsrd"   } } */
+/* { dg-final { scan-assembler-not "mtvsrwa"  } } */
+/* { dg-final { scan-assembler-not "mtvsrwz"  } } */
-- 
2.17.1



[PATCH] rs6000: Adjust rs6000_density_test for strided_load

2021-05-06 Thread Kewen.Lin via Gcc-patches
Hi,

We noticed that SPEC2017 503.bwaves_r run time degrades by about 8%
on P8 and P9 if we enabled vectorization at O2 fast-math.

Comparing to Ofast, compiler doesn't do the loop interchange on the
innermost loop, it's not profitable to vectorize it then.  Since
with loop vectorization, the loop becomes very intensive (density
ratio is 83), there are many scalar loads and further to construct
vector, it's bad that the vector CTORs have to wait for the required
loads are ready.

Now we have the function rs6000_density_test to check this kind of
intensive case, but for this case, the threshold is too generic and
a bit high.  This patch is to tweak the density heuristics by
introducing some more thresholds for strided_load, avoid to affect
some potential bmks sensitive to DENSITY_PCT_THRESHOLD change which
is generic.

Bootstrapped/regtested on powerpc64le-linux-gnu P9.

Nothing remarkable was observed with SPEC2017 Power9 full run,
excepting for bwaves_r degradation has been fixed.

Is it ok for trunk?

BR,
Kewen
--
gcc/ChangeLog:

* config/rs6000/rs6000.c (rs6000_density_test): Add new heuristics
for strided_load density check.
---
 gcc/config/rs6000/rs6000.c | 88 +-
 1 file changed, 77 insertions(+), 11 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index ffdf10098a9..5ae40d6f4ce 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -5245,12 +5245,16 @@ rs6000_density_test (rs6000_cost_data *data)
   const int DENSITY_PCT_THRESHOLD = 85;
   const int DENSITY_SIZE_THRESHOLD = 70;
   const int DENSITY_PENALTY = 10;
+  const int DENSITY_LOAD_PCT_THRESHOLD = 80;
+  const int DENSITY_LOAD_FOR_CTOR_PCT_THRESHOLD = 65;
+  const int DENSITY_LOAD_SIZE_THRESHOLD = 20;
   struct loop *loop = data->loop_info;
   basic_block *bbs = get_loop_body (loop);
   int nbbs = loop->num_nodes;
   loop_vec_info loop_vinfo = loop_vec_info_for_loop (data->loop_info);
   int vec_cost = data->cost[vect_body], not_vec_cost = 0;
   int i, density_pct;
+  unsigned int nload_total = 0, nctor_for_strided = 0, nload_for_ctor = 0;
 
   /* Only care about cost of vector version, so exclude scalar version here.  
*/
   if (LOOP_VINFO_TARGET_COST_DATA (loop_vinfo) != (void *) data)
@@ -5272,21 +5276,83 @@ rs6000_density_test (rs6000_cost_data *data)
  if (!STMT_VINFO_RELEVANT_P (stmt_info)
  && !STMT_VINFO_IN_PATTERN_P (stmt_info))
not_vec_cost++;
+ else
+   {
+ stmt_vec_info vstmt_info = vect_stmt_to_vectorize (stmt_info);
+ if (STMT_VINFO_DATA_REF (vstmt_info)
+ && DR_IS_READ (STMT_VINFO_DATA_REF (vstmt_info)))
+   {
+ if (STMT_VINFO_STRIDED_P (vstmt_info))
+   {
+ unsigned int ncopies = 1;
+ unsigned int nunits = 1;
+ /* TODO: For VMAT_STRIDED_SLP, the total CTOR can be
+fewer due to group access.  Simply handle it here
+for now.  */
+ if (!STMT_SLP_TYPE (vstmt_info))
+   {
+ tree vectype = STMT_VINFO_VECTYPE (vstmt_info);
+ ncopies = vect_get_num_copies (loop_vinfo, vectype);
+ nunits = vect_nunits_for_cost (vectype);
+   }
+ unsigned int nloads = ncopies * nunits;
+ nload_for_ctor += nloads;
+ nload_total += nloads;
+ nctor_for_strided += ncopies;
+   }
+ else
+   nload_total++;
+   }
+   }
}
 }
-
   free (bbs);
-  density_pct = (vec_cost * 100) / (vec_cost + not_vec_cost);
 
-  if (density_pct > DENSITY_PCT_THRESHOLD
-  && vec_cost + not_vec_cost > DENSITY_SIZE_THRESHOLD)
-{
-  data->cost[vect_body] = vec_cost * (100 + DENSITY_PENALTY) / 100;
-  if (dump_enabled_p ())
-   dump_printf_loc (MSG_NOTE, vect_location,
-"density %d%%, cost %d exceeds threshold, penalizing "
-"loop body cost by %d%%", density_pct,
-vec_cost + not_vec_cost, DENSITY_PENALTY);
+  if (vec_cost + not_vec_cost > DENSITY_SIZE_THRESHOLD)
+{
+  density_pct = (vec_cost * 100) / (vec_cost + not_vec_cost);
+  if (density_pct > DENSITY_PCT_THRESHOLD)
+   {
+ data->cost[vect_body] = vec_cost * (100 + DENSITY_PENALTY) / 100;
+ if (dump_enabled_p ())
+   dump_printf_loc (MSG_NOTE, vect_location,
+"density %d%%, cost %d exceeds threshold, "
+"penalizing loop body cost by %d%%.\n",
+density_pct, vec_cost + not_vec_cost,
+DENSITY_PENALTY);
+   }
+  /* For one loop which has a large proportion 

[PATCH] rs6000: Make density_test only for vector version

2021-05-06 Thread Kewen.Lin via Gcc-patches
Hi,

When I was investigating density_test heuristics, I noticed that
the current rs6000_density_test could be used for single scalar
iteration cost calculation, through the call trace:
  vect_compute_single_scalar_iteration_cost
-> rs6000_finish_cost
 -> rs6000_density_test

It looks unexpected as its desriptive function comments and Bill
helped to confirm this needs to be fixed (thanks!).

So this patch is to check the passed data, if it's the same as
the one in loop_vinfo, it indicates it's working on vector version
cost calculation, otherwise just early return.

Bootstrapped/regtested on powerpc64le-linux-gnu P9.

Nothing remarkable was observed with SPEC2017 Power9 full run.

Is it ok for trunk?

BR,
Kewen
--
gcc/ChangeLog:

* config/rs6000/rs6000.c (rs6000_density_test): Early return if
calculating single scalar iteration cost.
 gcc/config/rs6000/rs6000.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 48b8efd732b..ffdf10098a9 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -5252,6 +5252,10 @@ rs6000_density_test (rs6000_cost_data *data)
   int vec_cost = data->cost[vect_body], not_vec_cost = 0;
   int i, density_pct;
 
+  /* Only care about cost of vector version, so exclude scalar version here.  
*/
+  if (LOOP_VINFO_TARGET_COST_DATA (loop_vinfo) != (void *) data)
+return;
+
   for (i = 0; i < nbbs; i++)
 {
   basic_block bb = bbs[i];
-- 
2.17.1



Re: [PATCH] run early sprintf warning after SSA (PR 100325)

2021-05-06 Thread Martin Sebor via Gcc-patches

On 5/6/21 8:32 AM, Aldy Hernandez wrote:



On 5/5/21 9:26 AM, Richard Biener wrote:

On Wed, May 5, 2021 at 1:32 AM Martin Sebor via Gcc-patches
 wrote:


With no optimization, -Wformat-overflow and -Wformat-truncation
runs early to detect a subset of simple bugs.  But as it turns out,
the pass runs just a tad too early, before SSA.  That causes it to
miss a class of problems that can easily be detected once code is
in SSA form, and I would expect might also cause false positives.

The attached change moves the sprintf pass just after pass_build_ssa,
similar to other early flow-sensitive warnings (-Wnonnull-compare and
-Wuninitialized).


Makes sense.  I suppose walloca might also benefit from SSA - it seems
to do range queries which won't work quite well w/o SSA?


The early Walloca pass that runs without optimization doesn't do much, 
as we've never had ranges so early.  All it does is diagnose _every_ 
call to alloca(), if -Walloca is passed:


   // The first time this pass is called, it is called before
   // optimizations have been run and range information is unavailable,
   // so we can only perform strict alloca checking.
   if (first_time_p)
     return warn_alloca != 0;

Though, I suppose we could move the first alloca pass after SSA is 
available and make it the one and only pass, since ranger only needs 
SSA.  However, I don't know how well this would work without value 
numbering or CSE.  For example, for gcc.dg/Walloca-4.c the gimple is:


    :
   _1 = rear_ptr_9(D) - w_10(D);
   _2 = (long unsigned int) _1;
   if (_2 <= 4095)
     goto ; [INV]
   else
     goto ; [INV]

    :
   _3 = rear_ptr_9(D) - w_10(D);
   _4 = (long unsigned int) _3;
   src_16 = __builtin_alloca (_4);
   goto ; [INV]

No ranges can be determined for _4.  However, if either FRE or DOM run, 
as they do value numbering and CSE respectively, we could easily 
determine a range as the above would become:


   :
   _1 = rear_ptr_9(D) - w_10(D);
   _2 = (long unsigned int) _1;
   if (_2 <= 4095)
     goto ; [INV]
   else
     goto ; [INV]

    :
   src_16 = __builtin_alloca (_2);
   goto ; [INV]

I'm inclined to leave the first alloca pass before SSA runs, as it 
doesn't do anything with ranges.  If anyone's open to a simple -O0 CSE 
type pass, it would be a different story.  Thoughts?


Improving the analysis at -O0 and getting better warnings that are
more consistent with what is issued with optimization would be very
helpful (as as long as it doesn't compromise debugging experience
of course).

Martin



Aldy





[committed] preprocessor: Fix pp-number lexing of digit separators [PR83873, PR97604]

2021-05-06 Thread Joseph Myers
When the preprocessor lexes preprocessing numbers in lex_number, it
accepts digit separators in more cases than actually permitted in
pp-numbers by the standard syntax.

One thing this accepts is adjacent digit separators; there is some
code to reject those later, but as noted in bug 83873 it fails to
cover the case of adjacent digit separators within a floating-point
exponent.  Accepting adjacent digit separators only results in a
missing diagnostic, not in valid code being rejected or being accepted
with incorrect semantics, because the correct lexing in such a case
would have '' start the following preprocessing tokens, and no valid
preprocessing token starts '' while ' isn't valid on its own as a
preprocessing token either.  So this patch fixes that case by moving
the error for adjacent digit separators to lex_number (allowing a more
specific diagnostic than if '' were excluded from the pp-number
completely).

Other cases inappropriately accepted involve digit separators before
'.', 'e+', 'e-', 'p+' or 'p-' (or corresponding uppercase variants).
In those cases, as shown by the test digit-sep-pp-number.C added, this
can result in valid code being wrongly rejected as a result of too
many characters being included in the pp-number.  So this case is
fixed by terminating the pp-number at the correct character according
to the standard.  That test also covers the case where a digit
separator was followed by an identifier-nondigit that is not a
nondigit (e.g. a UCN); that case was already handled correctly.

Bootstrapped with no regressions for x86_64-pc-linux-gnu.  Applied to 
mainline.

libcpp/
PR c++/83873
PR preprocessor/97604
* lex.c (lex_number): Reject adjacent digit separators here.  Do
not allow digit separators before '.' or an exponent with sign.
* expr.c (cpp_classify_number): Do not check for adjacent digit
separators here.

gcc/testsuite/
PR c++/83873
PR preprocessor/97604
* g++.dg/cpp1y/digit-sep-neg-2.C,
g++.dg/cpp1y/digit-sep-pp-number.C: New tests.
* g++.dg/cpp1y/digit-sep-line-neg.C, g++.dg/cpp1y/digit-sep-neg.C:
Adjust expected messages.

diff --git a/gcc/testsuite/g++.dg/cpp1y/digit-sep-line-neg.C 
b/gcc/testsuite/g++.dg/cpp1y/digit-sep-line-neg.C
index fa3b1352109..239d0287b74 100644
--- a/gcc/testsuite/g++.dg/cpp1y/digit-sep-line-neg.C
+++ b/gcc/testsuite/g++.dg/cpp1y/digit-sep-line-neg.C
@@ -2,3 +2,4 @@
 // { dg-do preprocess { target c++14 } }
 
 #line 0''123 // { dg-error "is not a positive integer" }
+// { dg-error "adjacent digit separators" "adjacent" { target *-*-* } .-1 }
diff --git a/gcc/testsuite/g++.dg/cpp1y/digit-sep-neg-2.C 
b/gcc/testsuite/g++.dg/cpp1y/digit-sep-neg-2.C
new file mode 100644
index 000..09393aaf838
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/digit-sep-neg-2.C
@@ -0,0 +1,4 @@
+// Test adjacent digit separators rejected in exponent (bug 83873).
+// { dg-do compile { target c++14 } }
+
+double d = 1.0e1''0; /* { dg-error "adjacent digit separators" } */
diff --git a/gcc/testsuite/g++.dg/cpp1y/digit-sep-neg.C 
b/gcc/testsuite/g++.dg/cpp1y/digit-sep-neg.C
index 5343e52c5a5..300fe51041b 100644
--- a/gcc/testsuite/g++.dg/cpp1y/digit-sep-neg.C
+++ b/gcc/testsuite/g++.dg/cpp1y/digit-sep-neg.C
@@ -16,7 +16,7 @@ main()
   unsigned u = 0b0001''''''U; // { dg-error "digit 
separator outside digit sequence" }
 
   double d = 0.0;
-  d = 1'.602'176'565e-19; // { dg-error "digit separator adjacent to decimal 
point" }
+  d = 1'.602'176'565e-19; // { dg-warning "multi-character" }
   d = 1.'602'176'565e-19; // { dg-error "digit separator adjacent to decimal 
point" }
   d = 1.602''176'565e-19; // { dg-error "adjacent digit separators" }
   d = 1.602'176'565'e-19; // { dg-error "digit separator adjacent to exponent" 
}
@@ -29,4 +29,5 @@ main()
 
 // { dg-error "exponent has no digits" "exponent has no digits" { target *-*-* 
} 23 }
 // { dg-error "expected ';' before" "expected ';' before" { target *-*-* } 15 }
+// { dg-error "expected ';' before" "expected ';' before" { target *-*-* } 19 }
 // { dg-error "expected ';' before" "expected ';' before" { target *-*-* } 26 }
diff --git a/gcc/testsuite/g++.dg/cpp1y/digit-sep-pp-number.C 
b/gcc/testsuite/g++.dg/cpp1y/digit-sep-pp-number.C
new file mode 100644
index 000..9777382224d
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/digit-sep-pp-number.C
@@ -0,0 +1,17 @@
+// Test lexing of pp-numbers does not allow digit separators that do
+// not form part of the pp-number syntax, when the code is valid with
+// correct lexing but not with too many characters accepted in the
+// pp-number (bug 97604).
+// { dg-do compile { target c++14 } }
+
+static_assert (0x0'e-0xe == 0, "signs");
+
+#define a0 '.' -
+#define acat(x) a ## x
+static_assert (acat (0'.') == 0, ".");
+
+// This case was not actually buggy.
+#define c0(x) 0
+#define b0 c0 (
+#define bcat(x) b ## x
+static_assert (bcat (0'\u00c0')) == 0, 

Re: [PATCH] Hashtable PR96088

2021-05-06 Thread François Dumont via Gcc-patches

Hi

    Considering your feedback on backtrace in debug mode is going to 
take me some time so here is another one.


    Compared to latest submission I've added a _Hash_arg_t partial 
specialization for std::hash<>. It is not strictly necessary for the 
moment but when we will eventually remove its nested argument_type it 
will be. I also wonder if it is not easier to handle for the compiler, 
not sure about that thought.


Tested under Linux x86_64, ok to commit ?

François


On 04/12/20 10:10 am, François Dumont wrote:
Following submission of the heterogeneous lookup in unordered 
containers I rebased this patch on top of it.


Appart from reducing its size because of some code reuse the 
heterogeneous lookup had no impact on this one. This is because when I 
cannot find out if conversion from inserted element type to hash 
functor can throw then I pass the element as-is, like if hash functor 
was transparent.


    libstdc++: Limit allocation on iterator insertion in Hashtable [PR 
96088]


    Detect Hash functor argument type to find out if it is different 
to the
    container key_type and if a temporary instance needs to be 
generated to invoke
    the functor from the iterator value_type key part. If this 
temporary generation
    can throw a key_type instance is generated at Hashtable level and 
used to call

    the functors and, if necessary, moved to the storage.

    libstdc++-v3/ChangeLog:

    PR libstdc++/96088
    * include/bits/hashtable_policy.h (_Select2nd): New.
    (_NodeBuilder<>): New.
    (_ReuseOrAllocNode<>::operator()): Use variadic template 
args.

    (_AllocNode<>::operator()): Likewise.
    (_Hash_code_base<>::_M_hash_code): Add _Kt template 
parameter.

    (_Hashtable_base<>::_M_equals): Add _Kt template parameter.
    * include/bits/hashtable.h
    (_Hashtable<>::__node_builder_t): New.
    (_Hashtable<>::_M_find_before_node): Add _Kt template 
parameter.

    (_Hashtable<>::_M_find_node): Likewise.
    (_Hashtable<>::_Hash_arg_t): New.
    (_Hashtable<>::_S_forward_key): New.
(_Hashtable<>::_M_insert_unique<>(_Kt&&, _Arg&&, const _NodeGenerator&)):
 New.
    (_Hashtable<>::_M_insert): Use latter.
    * testsuite/23_containers/unordered_map/96088.cc: New test.
    * testsuite/23_containers/unordered_multimap/96088.cc: New 
test.
    * testsuite/23_containers/unordered_multiset/96088.cc: New 
test.

    * testsuite/23_containers/unordered_set/96088.cc: New test.
    * testsuite/util/replacement_memory_operators.h
    (counter::_M_increment): New.
    (counter::_M_decrement): New.
    (counter::reset()): New.

Note that I plan to change counter type name to something more 
meaningful but only when back to stage 1.


François

On 24/10/20 4:25 pm, François Dumont wrote:

Hi

    Just a rebase of this patch.

François

On 17/10/20 6:21 pm, François Dumont wrote:

I eventually would like to propose the following resolution.

For multi-key containers I kept the same resolution build the node 
first and compute has code from the node key.


For unique-key ones I change behavior when I can't find out hash 
functor argument type. I am rather using the iterator key type and 
just hope that the user's functors are prepared for it.


For now I am using functor argument_type which is deprecated. I just 
hope that the day we remove it we will have a compiler built-in to 
get any functor argument type given an input type.


    libstdc++: Limit allocation on iterator insertion in Hashtable 
[PR 96088]


    Detect Hash functor argument type to find out if it is different 
to the
    container key_type and if a temporary instance needs to be 
generated to invoke
    the functor from the iterator value_type key part. If this 
temporary generation
    can throw a key_type instance is generated at Hashtable level 
and use to call

    the functors and, if needed, move it to the storage.

    libstdc++-v3/ChangeLog:

    PR libstdc++/96088
    * include/bits/hashtable_policy.h (_Select2nd): New.
    (_NodeBuilder<>): New.
    (_ReuseOrAllocNode<>::operator()): Use varriadic 
template args.

    (_AllocNode<>::operator()): Likewise.
    (_Hash_code_base<>::_M_hash_code): Add _KType template 
parameter.
    (_Hashtable_base<>::_M_equals): Add _KType template 
parameter.

    * include/bits/hashtable.h
    (_Hashtable<>::__node_builder_t): New.
    (_Hashtable<>::_M_find_before_node): Add _KType template 
parameter.

    (_Hashtable<>::_M_find_node): Likewise.
    (_Hashtable<>::_Hash_arg_t): New.
    (_Hashtable<>::_S_forward_key): New.
(_Hashtable<>::_M_insert_unique<>(_KType&&, _Arg&&, const 
_NodeGenerator&)):

 New.
    (_Hashtable<>::_M_insert): Use latter.
    * 

Re: [PATCH] phiopt: Optimize (x <=> y) cmp z [PR94589]

2021-05-06 Thread Marc Glisse

On Thu, 6 May 2021, Jakub Jelinek via Gcc-patches wrote:


Though, (x&1) == x is equivalent to both (x&~1)==0 and to x < 2U
and from the latter two it isn't obvious which one is better/more canonical.
On aarch64 I don't see differences in number of insns nor in their size:
 10:13001c00sxtbw0, w0
 14:721f781ftst w0, #0xfffe
 18:1a9f17e0csetw0, eq  // eq = none
 1c:d65f03c0ret
vs.
 20:12001c00and w0, w0, #0xff
 24:7100041fcmp w0, #0x1
 28:1a9f87e0csetw0, ls  // ls = plast
 2c:d65f03c0ret
On x86_64 same number of insns, but the comparison is shorter (note, the
spaceship result is a struct with signed char based enum):
 10:31 c0   xor%eax,%eax
 12:81 e7 fe 00 00 00   and$0xfe,%edi
 18:0f 94 c0sete   %al
 1b:c3  retq
 1c:0f 1f 40 00 nopl   0x0(%rax)
vs.
 20:31 c0   xor%eax,%eax
 22:40 80 ff 01 cmp$0x1,%dil
 26:0f 96 c0setbe  %al
 29:c3  retq
Generally, I'd think that the comparison should be better because it
will be most common in user code that way and VRP will be able to do the
best thing for it.


We can probably do it in 2 steps, first something like

(for cmp (eq ne)
 (simplify
  (cmp (bit_and:c @0 @1) @0)
  (cmp (@0 (bit_not! @1)) { build_zero_cst (TREE_TYPE (@0)); })))

to get rid of the double use, and then simplify X==0 to X<=~C if C is a 
mask 111...000 (I thought we already had a function to detect such masks, 
or the 000...111, but I can't find them anymore).


I agree that the comparison seems preferable, although if X is signed, the 
way GIMPLE represents types will add an inconvenient cast. And I think VRP 
already manages to use the bit test to derive a range.


--
Marc Glisse


Re: [Patch, fortran] PRs 46691 and 99819: Assumed and explicit size class arrays

2021-05-06 Thread Paul Richard Thomas via Gcc-patches
It's 46991 of course.

Many thanks

Paul


On Thu, 6 May 2021 at 17:15, Paul Richard Thomas <
paul.richard.tho...@gmail.com> wrote:

> Blast! Thanks for pointing it out. The testcase is in a directory
> ~/prs/pr46691, which I then took from the editor. Original sin and all
> that.
>
> Paul
>
>
> On Thu, 6 May 2021 at 17:06, Jonathan Wakely  wrote:
>
>> PR 46691 is the wrong PR number:
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=46691
>>
>> The comment in the testcase is wrong, and the ChangeLog file will get
>> auto-updated with the wrong number overnight (you can manually edit it
>> and push the fix tomorrow after it's been generated).
>>
>>
>>
>
> --
> "If you can't explain it simply, you don't understand it well enough" -
> Albert Einstein
>


-- 
"If you can't explain it simply, you don't understand it well enough" -
Albert Einstein


Re: [Patch, fortran] PRs 46691 and 99819: Assumed and explicit size class arrays

2021-05-06 Thread Paul Richard Thomas via Gcc-patches
Blast! Thanks for pointing it out. The testcase is in a directory
~/prs/pr46691, which I then took from the editor. Original sin and all
that.

Paul


On Thu, 6 May 2021 at 17:06, Jonathan Wakely  wrote:

> PR 46691 is the wrong PR number:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=46691
>
> The comment in the testcase is wrong, and the ChangeLog file will get
> auto-updated with the wrong number overnight (you can manually edit it
> and push the fix tomorrow after it's been generated).
>
>
>

-- 
"If you can't explain it simply, you don't understand it well enough" -
Albert Einstein


Re: [Patch] + [nvptx RFH/RFC]: OpenMP: Fix SIMT for complex/float reduction with && and ||

2021-05-06 Thread Jakub Jelinek via Gcc-patches
On Thu, May 06, 2021 at 04:21:40PM +0200, Tobias Burnus wrote:
>   * omp-low.c (lower_rec_simd_input_clauses): Set max_vf = 1 if
>   a truth_value_p reduction variable is nonintegral.
>   (lower_rec_input_clauses): Also handle SIMT part
>   for complex/float recution with && and ||.

s/recution/reduction/

> --- a/gcc/omp-low.c
> +++ b/gcc/omp-low.c
> @@ -4389,14 +4389,28 @@ lower_rec_simd_input_clauses (tree new_var, 
> omp_context *ctx,
>   {
> for (tree c = gimple_omp_for_clauses (ctx->stmt); c;
>  c = OMP_CLAUSE_CHAIN (c))
> - if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_REDUCTION
> - && OMP_CLAUSE_REDUCTION_PLACEHOLDER (c))
> -   {
> - /* UDR reductions are not supported yet for SIMT, disable
> -SIMT.  */
> - sctx->max_vf = 1;
> - break;
> + {
> +   if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_REDUCTION)
> + continue;
> +
> +   if (OMP_CLAUSE_REDUCTION_PLACEHOLDER (c))
> + {
> +   /* UDR reductions are not supported yet for SIMT, disable
> +  SIMT.  */
> +   sctx->max_vf = 1;
> +   break;
> + }
> +
> +   if (truth_value_p (OMP_CLAUSE_REDUCTION_CODE (c))
> +   && !INTEGRAL_TYPE_P (TREE_TYPE (new_var)))
> + {
> +   /* Doing boolean operations on non-boolean types is
> +  for conformance only, it's not worth supporting this
> +  for SIMT.  */

This comment needs to be adjusted to talk about non-integral types.

> +   sctx->max_vf = 1;
> +   break;
> }
> + }
>   }
>if (maybe_gt (sctx->max_vf, 1U))
>   {
> @@ -6432,28 +6446,34 @@ lower_rec_input_clauses (tree clauses, gimple_seq 
> *ilist, gimple_seq *dlist,
>  
> gimplify_assign (unshare_expr (ivar), x, [0]);
>  
> -   if (sctx.is_simt)
> - {
> -   if (!simt_lane)
> - simt_lane = create_tmp_var (unsigned_type_node);
> -   x = build_call_expr_internal_loc
> - (UNKNOWN_LOCATION, IFN_GOMP_SIMT_XCHG_BFLY,
> -  TREE_TYPE (ivar), 2, ivar, simt_lane);
> -   x = build2 (code, TREE_TYPE (ivar), ivar, x);
> -   gimplify_assign (ivar, x, [2]);
> - }
> tree ivar2 = ivar;
> tree ref2 = ref;
> +   tree zero = NULL_TREE;
> if (is_fp_and_or)
>   {
> -   tree zero = build_zero_cst (TREE_TYPE (ivar));
> +   zero = build_zero_cst (TREE_TYPE (ivar));
> ivar2 = fold_build2_loc (clause_loc, NE_EXPR,
>  integer_type_node, ivar,
>  zero);
> ref2 = fold_build2_loc (clause_loc, NE_EXPR,
> integer_type_node, ref, zero);
>   }
> -   x = build2 (code, TREE_TYPE (ref), ref2, ivar2);
> +   if (sctx.is_simt)
> + {
> +   if (!simt_lane)
> + simt_lane = create_tmp_var (unsigned_type_node);
> +   x = build_call_expr_internal_loc
> + (UNKNOWN_LOCATION, IFN_GOMP_SIMT_XCHG_BFLY,
> +  TREE_TYPE (ivar), 2, ivar, simt_lane);
> +   if (is_fp_and_or)
> + x = fold_build2_loc (clause_loc, NE_EXPR,
> +  integer_type_node, x, zero);
> +   x = build2 (code, TREE_TYPE (ivar2), ivar2, x);
> +   if (is_fp_and_or)
> + x = fold_convert (TREE_TYPE (ivar), x);
> +   gimplify_assign (ivar, x, [2]);
> + }
> +   x = build2 (code, TREE_TYPE (ref2), ref2, ivar2);
> if (is_fp_and_or)
>   x = fold_convert (TREE_TYPE (ref), x);
> ref = build_outer_var_ref (var, ctx);

Is this hunk still needed when the first hunk is in?
I mean, this is in code guarded with
is_simd && lower_rec_simd_input_clauses (...) and that function
will return false for if (known_eq (sctx->max_vf, 1U)) which the first hunk
ensures.
So sctx.is_simt && is_fp_and_or shouldn't be true in that code.

Jakub



Re: [PATCH] run early sprintf warning after SSA (PR 100325)

2021-05-06 Thread Aldy Hernandez via Gcc-patches




On 5/5/21 9:26 AM, Richard Biener wrote:

On Wed, May 5, 2021 at 1:32 AM Martin Sebor via Gcc-patches
 wrote:


With no optimization, -Wformat-overflow and -Wformat-truncation
runs early to detect a subset of simple bugs.  But as it turns out,
the pass runs just a tad too early, before SSA.  That causes it to
miss a class of problems that can easily be detected once code is
in SSA form, and I would expect might also cause false positives.

The attached change moves the sprintf pass just after pass_build_ssa,
similar to other early flow-sensitive warnings (-Wnonnull-compare and
-Wuninitialized).


Makes sense.  I suppose walloca might also benefit from SSA - it seems
to do range queries which won't work quite well w/o SSA?


The early Walloca pass that runs without optimization doesn't do much, 
as we've never had ranges so early.  All it does is diagnose _every_ 
call to alloca(), if -Walloca is passed:


  // The first time this pass is called, it is called before
  // optimizations have been run and range information is unavailable,
  // so we can only perform strict alloca checking.
  if (first_time_p)
return warn_alloca != 0;

Though, I suppose we could move the first alloca pass after SSA is 
available and make it the one and only pass, since ranger only needs 
SSA.  However, I don't know how well this would work without value 
numbering or CSE.  For example, for gcc.dg/Walloca-4.c the gimple is:


   :
  _1 = rear_ptr_9(D) - w_10(D);
  _2 = (long unsigned int) _1;
  if (_2 <= 4095)
goto ; [INV]
  else
goto ; [INV]

   :
  _3 = rear_ptr_9(D) - w_10(D);
  _4 = (long unsigned int) _3;
  src_16 = __builtin_alloca (_4);
  goto ; [INV]

No ranges can be determined for _4.  However, if either FRE or DOM run, 
as they do value numbering and CSE respectively, we could easily 
determine a range as the above would become:


  :
  _1 = rear_ptr_9(D) - w_10(D);
  _2 = (long unsigned int) _1;
  if (_2 <= 4095)
goto ; [INV]
  else
goto ; [INV]

   :
  src_16 = __builtin_alloca (_2);
  goto ; [INV]

I'm inclined to leave the first alloca pass before SSA runs, as it 
doesn't do anything with ranges.  If anyone's open to a simple -O0 CSE 
type pass, it would be a different story.  Thoughts?


Aldy



Re: [Patch, fortran] PRs 46691 and 99819: Assumed and explicit size class arrays

2021-05-06 Thread Jonathan Wakely via Gcc-patches
PR 46691 is the wrong PR number:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=46691

The comment in the testcase is wrong, and the ChangeLog file will get
auto-updated with the wrong number overnight (you can manually edit it
and push the fix tomorrow after it's been generated).




Re: [Patch] + [nvptx RFH/RFC]: OpenMP: Fix SIMT for complex/float reduction with && and ||

2021-05-06 Thread Tobias Burnus

On 06.05.21 15:12, Tom de Vries wrote:


WIP patch below tries that approach and fixes the ICE,

Thanks!

but this simple example still doesn't work:
...
   #pragma omp target parallel reduction(&&: andf)


Try: map(andf). [Cf. PR99928 with pending patch at
https://gcc.gnu.org/pipermail/gcc-patches/2021-April/567838.html ]

I have now added your WIP patch to my patch, honoring the comment by Jakub.
I also copied the _Complex int example to -6.c to have also a target
version for this.

Lightly tested for now w/ and w/o offloading, will run the testsuite now.

Tobias

-
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München 
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank 
Thürauf
OpenMP: Fix SIMT for complex/float reduction with && and ||

2021-05-06  Tobias Burnus 
	Tom de Vries  

gcc/ChangeLog:

	* omp-low.c (lower_rec_simd_input_clauses): Set max_vf = 1 if
	a truth_value_p reduction variable is nonintegral.
	(lower_rec_input_clauses): Also handle SIMT part
	for complex/float recution with && and ||.

libgomp/ChangeLog:

	* testsuite/libgomp.c-c++-common/reduction-5.c: New test, testing
	complex/floating-point || + && reduction with 'omp target'.
	* testsuite/libgomp.c-c++-common/reduction-5.c: Likewise.

 gcc/omp-low.c  |  58 --
 .../testsuite/libgomp.c-c++-common/reduction-5.c   | 192 
 .../testsuite/libgomp.c-c++-common/reduction-6.c   | 195 +
 3 files changed, 426 insertions(+), 19 deletions(-)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 26ceaf74b2d..c3c72241486 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -4389,14 +4389,28 @@ lower_rec_simd_input_clauses (tree new_var, omp_context *ctx,
 	{
 	  for (tree c = gimple_omp_for_clauses (ctx->stmt); c;
 	   c = OMP_CLAUSE_CHAIN (c))
-	if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_REDUCTION
-		&& OMP_CLAUSE_REDUCTION_PLACEHOLDER (c))
-	  {
-		/* UDR reductions are not supported yet for SIMT, disable
-		   SIMT.  */
-		sctx->max_vf = 1;
-		break;
+	{
+	  if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_REDUCTION)
+		continue;
+
+	  if (OMP_CLAUSE_REDUCTION_PLACEHOLDER (c))
+		{
+		  /* UDR reductions are not supported yet for SIMT, disable
+		 SIMT.  */
+		  sctx->max_vf = 1;
+		  break;
+		}
+
+	  if (truth_value_p (OMP_CLAUSE_REDUCTION_CODE (c))
+		  && !INTEGRAL_TYPE_P (TREE_TYPE (new_var)))
+		{
+		  /* Doing boolean operations on non-boolean types is
+		 for conformance only, it's not worth supporting this
+		 for SIMT.  */
+		  sctx->max_vf = 1;
+		  break;
 	  }
+	}
 	}
   if (maybe_gt (sctx->max_vf, 1U))
 	{
@@ -6432,28 +6446,34 @@ lower_rec_input_clauses (tree clauses, gimple_seq *ilist, gimple_seq *dlist,
 
 		  gimplify_assign (unshare_expr (ivar), x, [0]);
 
-		  if (sctx.is_simt)
-			{
-			  if (!simt_lane)
-			simt_lane = create_tmp_var (unsigned_type_node);
-			  x = build_call_expr_internal_loc
-			(UNKNOWN_LOCATION, IFN_GOMP_SIMT_XCHG_BFLY,
-			 TREE_TYPE (ivar), 2, ivar, simt_lane);
-			  x = build2 (code, TREE_TYPE (ivar), ivar, x);
-			  gimplify_assign (ivar, x, [2]);
-			}
 		  tree ivar2 = ivar;
 		  tree ref2 = ref;
+		  tree zero = NULL_TREE;
 		  if (is_fp_and_or)
 			{
-			  tree zero = build_zero_cst (TREE_TYPE (ivar));
+			  zero = build_zero_cst (TREE_TYPE (ivar));
 			  ivar2 = fold_build2_loc (clause_loc, NE_EXPR,
 		   integer_type_node, ivar,
 		   zero);
 			  ref2 = fold_build2_loc (clause_loc, NE_EXPR,
 		  integer_type_node, ref, zero);
 			}
-		  x = build2 (code, TREE_TYPE (ref), ref2, ivar2);
+		  if (sctx.is_simt)
+			{
+			  if (!simt_lane)
+			simt_lane = create_tmp_var (unsigned_type_node);
+			  x = build_call_expr_internal_loc
+			(UNKNOWN_LOCATION, IFN_GOMP_SIMT_XCHG_BFLY,
+			 TREE_TYPE (ivar), 2, ivar, simt_lane);
+			  if (is_fp_and_or)
+			x = fold_build2_loc (clause_loc, NE_EXPR,
+		 integer_type_node, x, zero);
+			  x = build2 (code, TREE_TYPE (ivar2), ivar2, x);
+			  if (is_fp_and_or)
+			x = fold_convert (TREE_TYPE (ivar), x);
+			  gimplify_assign (ivar, x, [2]);
+			}
+		  x = build2 (code, TREE_TYPE (ref2), ref2, ivar2);
 		  if (is_fp_and_or)
 			x = fold_convert (TREE_TYPE (ref), x);
 		  ref = build_outer_var_ref (var, ctx);
diff --git a/libgomp/testsuite/libgomp.c-c++-common/reduction-5.c b/libgomp/testsuite/libgomp.c-c++-common/reduction-5.c
new file mode 100644
index 000..8ac9930b241
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/reduction-5.c
@@ -0,0 +1,192 @@
+/* C / C++'s logical AND and OR operators take any scalar argument
+   which compares (un)equal to 0 - the result 1 or 0 and of type int.
+
+   In this testcase, the int result is again converted to a floating-poing
+   or complex type.
+
+   While having a floating-point/complex array element with || and && can make
+   sense, having a 

Re: [Patch] + [nvptx RFH/RFC]: OpenMP: Fix SIMT for complex/float reduction with && and ||

2021-05-06 Thread Tom de Vries
On 5/6/21 3:12 PM, Tom de Vries wrote:
> On 5/6/21 12:30 PM, Jakub Jelinek wrote:
>> On Thu, May 06, 2021 at 12:17:03PM +0200, Tobias Burnus wrote:
>>> OpenMP: Fix SIMT for complex/float reduction with && and ||
>>>
>>> gcc/ChangeLog:
>>>
>>> * omp-low.c (lower_rec_input_clauses): Also handle SIMT part
>>> for complex/float recution with && and ||.
>>>
>>> libgomp/ChangeLog:
>>>
>>> * testsuite/libgomp.c-c++-common/reduction-5.c: New test, testing
>>> complex/floating-point || + && recduction with 'omp target'.
>>
>> As the float/complex ||/&& reductions are IMHO just conformance issues, not
>> something anyone would actually use in meaningful code - floats or complex
>> aren't the most obvious or efficient holders of boolean values - I think
>> punting SIMT on those isn't a workaround, but the right solution.
>>
> 
> Ack.
> 
> WIP patch below tries that approach and fixes the ICE, but this simple
> example still doesn't work:
> ...
> int
> main ()
> {
>   float andf = 1;
> 
>   #pragma omp target parallel reduction(&&: andf)
>   for (int i=0; i < 1024; ++i)
> andf = andf && 0.0;
> 
>   if ((int)andf != 0)
> __builtin_abort ();
> 
>   return 0;
> }
> ...

Hm, after rewriting things like this:
...
  #pragma omp target map (tofrom: andf)
  #pragma omp parallel reduction(&&: andf)
  for (int i=0; i < 1024; ++i)
andf = andf && 0.0;
...
it does work.

My limited openmp knowledge is not enough to decide whether the fail of
the first variant is a test-case issue, or a gcc issue.

Thanks,
- Tom


Re: [Patch] + [nvptx RFH/RFC]: OpenMP: Fix SIMT for complex/float reduction with && and ||

2021-05-06 Thread Jakub Jelinek via Gcc-patches
On Thu, May 06, 2021 at 03:12:59PM +0200, Tom de Vries wrote:
> +   if (truth_value_p (OMP_CLAUSE_REDUCTION_CODE (c))
> +   && TREE_CODE (TREE_TYPE (new_var)) != BOOLEAN_TYPE)

I would use && !INTEGRAL_TYPE_P (TREE_TYPE (new_var))
Especially in C code using || or && with int or other non-_Bool types
will pretty frequent.
Of course, if that doesn't work for SIMT either, it needs further work
and punting on those could be a temporary workaround.  But it would be
a preexisting issue, not something introduced with accepting &&/|| for
floating/complex types - we've accepted &&/|| for integral types forever.

Jakub



Re: [Patch] + [nvptx RFH/RFC]: OpenMP: Fix SIMT for complex/float reduction with && and ||

2021-05-06 Thread Tom de Vries
On 5/6/21 12:30 PM, Jakub Jelinek wrote:
> On Thu, May 06, 2021 at 12:17:03PM +0200, Tobias Burnus wrote:
>> OpenMP: Fix SIMT for complex/float reduction with && and ||
>>
>> gcc/ChangeLog:
>>
>>  * omp-low.c (lower_rec_input_clauses): Also handle SIMT part
>>  for complex/float recution with && and ||.
>>
>> libgomp/ChangeLog:
>>
>>  * testsuite/libgomp.c-c++-common/reduction-5.c: New test, testing
>>  complex/floating-point || + && recduction with 'omp target'.
> 
> As the float/complex ||/&& reductions are IMHO just conformance issues, not
> something anyone would actually use in meaningful code - floats or complex
> aren't the most obvious or efficient holders of boolean values - I think
> punting SIMT on those isn't a workaround, but the right solution.
> 

Ack.

WIP patch below tries that approach and fixes the ICE, but this simple
example still doesn't work:
...
int
main ()
{
  float andf = 1;

  #pragma omp target parallel reduction(&&: andf)
  for (int i=0; i < 1024; ++i)
andf = andf && 0.0;

  if ((int)andf != 0)
__builtin_abort ();

  return 0;
}
...

Thanks,
- Tom


diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 26ceaf74b2d..d8f2487054f 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -4389,14 +4389,28 @@ lower_rec_simd_input_clauses (tree new_var, omp_context *ctx,
 	{
 	  for (tree c = gimple_omp_for_clauses (ctx->stmt); c;
 	   c = OMP_CLAUSE_CHAIN (c))
-	if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_REDUCTION
-		&& OMP_CLAUSE_REDUCTION_PLACEHOLDER (c))
-	  {
-		/* UDR reductions are not supported yet for SIMT, disable
-		   SIMT.  */
-		sctx->max_vf = 1;
-		break;
-	  }
+	{
+	  if (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_REDUCTION)
+		continue;
+
+	  if (OMP_CLAUSE_REDUCTION_PLACEHOLDER (c))
+		{
+		  /* UDR reductions are not supported yet for SIMT, disable
+		 SIMT.  */
+		  sctx->max_vf = 1;
+		  break;
+		}
+
+	  if (truth_value_p (OMP_CLAUSE_REDUCTION_CODE (c))
+		  && TREE_CODE (TREE_TYPE (new_var)) != BOOLEAN_TYPE)
+		{
+		  /* Doing boolean operations on non-boolean types is
+		 for conformance only, it's not worth supporting this
+		 for SIMT.  */
+		  sctx->max_vf = 1;
+		  break;
+		}
+	}
 	}
   if (maybe_gt (sctx->max_vf, 1U))
 	{


[PATCH] refactor SSA rewriting timevars

2021-05-06 Thread Richard Biener
This avoids too deep stacks of timevars during incremental
SSA rewrite and basically use TV_TREE_INTO_SSA for all into-SSA
rewrite work and TV_TREE_SSA_INCREMENTAL for update_ssa.

Bootstrapped and tested on x86_64-unknown-linux-gnu, pushed.

2021-05-06  Richard Biener  

* timevar.def (TV_TREE_INSERT_PHI_NODES): Remove.
(TV_TREE_SSA_REWRITE_BLOCKS): Likewise.
(TV_TREE_INTO_SSA): New.
* tree-into-ssa.c (insert_phi_nodes): Do not account separately.
(rewrite_blocks): Likewise.
(pass_data_build_ssa): Account to TV_TREE_INTO_SSA.
---
 gcc/timevar.def |  3 +--
 gcc/tree-into-ssa.c | 11 +--
 2 files changed, 2 insertions(+), 12 deletions(-)

diff --git a/gcc/timevar.def b/gcc/timevar.def
index 63c0b3306de..16d1657436c 100644
--- a/gcc/timevar.def
+++ b/gcc/timevar.def
@@ -162,9 +162,8 @@ DEFTIMEVAR (TV_TREE_EARLY_VRP, "tree Early VRP")
 DEFTIMEVAR (TV_TREE_COPY_PROP, "tree copy propagation")
 DEFTIMEVAR (TV_FIND_REFERENCED_VARS  , "tree find ref. vars")
 DEFTIMEVAR (TV_TREE_PTA , "tree PTA")
-DEFTIMEVAR (TV_TREE_INSERT_PHI_NODES , "tree PHI insertion")
-DEFTIMEVAR (TV_TREE_SSA_REWRITE_BLOCKS, "tree SSA rewrite")
 DEFTIMEVAR (TV_TREE_SSA_OTHER   , "tree SSA other")
+DEFTIMEVAR (TV_TREE_INTO_SSA, "tree SSA rewrite")
 DEFTIMEVAR (TV_TREE_SSA_INCREMENTAL  , "tree SSA incremental")
 DEFTIMEVAR (TV_TREE_OPS , "tree operand scan")
 DEFTIMEVAR (TV_TREE_SSA_DOMINATOR_OPTS   , "dominator optimization")
diff --git a/gcc/tree-into-ssa.c b/gcc/tree-into-ssa.c
index ef990604030..85adb1ad8c7 100644
--- a/gcc/tree-into-ssa.c
+++ b/gcc/tree-into-ssa.c
@@ -1071,8 +1071,6 @@ insert_phi_nodes (bitmap_head *dfs)
   unsigned i;
   var_info *info;
 
-  timevar_push (TV_TREE_INSERT_PHI_NODES);
-
   /* When the gimplifier introduces SSA names it cannot easily avoid
  situations where abnormal edges added by CFG construction break
  the use-def dominance requirement.  For this case rewrite SSA
@@ -1141,8 +1139,6 @@ insert_phi_nodes (bitmap_head *dfs)
   insert_phi_nodes_for (info->var, idf, false);
   BITMAP_FREE (idf);
 }
-
-  timevar_pop (TV_TREE_INSERT_PHI_NODES);
 }
 
 
@@ -2282,9 +2278,6 @@ rewrite_update_dom_walker::after_dom_children 
(basic_block bb ATTRIBUTE_UNUSED)
 static void
 rewrite_blocks (basic_block entry, enum rewrite_mode what)
 {
-  /* Rewrite all the basic blocks in the program.  */
-  timevar_push (TV_TREE_SSA_REWRITE_BLOCKS);
-
   block_defs_stack.create (10);
 
   /* Recursively walk the dominator tree rewriting each statement in
@@ -2305,8 +2298,6 @@ rewrite_blocks (basic_block entry, enum rewrite_mode what)
 }
 
   block_defs_stack.release ();
-
-  timevar_pop (TV_TREE_SSA_REWRITE_BLOCKS);
 }
 
 class mark_def_dom_walker : public dom_walker
@@ -2402,7 +2393,7 @@ const pass_data pass_data_build_ssa =
   GIMPLE_PASS, /* type */
   "ssa", /* name */
   OPTGROUP_NONE, /* optinfo_flags */
-  TV_TREE_SSA_OTHER, /* tv_id */
+  TV_TREE_INTO_SSA, /* tv_id */
   PROP_cfg, /* properties_required */
   PROP_ssa, /* properties_provided */
   0, /* properties_destroyed */
-- 
2.26.2


Re: [committed] libstdc++: Use unsigned char argument to std::isdigit

2021-05-06 Thread Jonathan Wakely via Gcc-patches

On 05/05/21 22:14 +0100, Jonathan Wakely wrote:

On 05/05/21 21:57 +0200, François Dumont via Libstdc++ wrote:

On 05/05/21 2:01 pm, Jonathan Wakely via Libstdc++ wrote:

Passing plain char to isdigit is undefined if the value is negative.

libstdc++-v3/ChangeLog:

* include/std/charconv (__from_chars_alnum): Pass unsigned
char to std::isdigit.

Tested powerpc64le-linux. Committed to trunk.


   unsigned char __c = *__first;
-      if (std::isdigit(__c))
+      if (std::isdigit(static_cast(__c)))

I am very curious to know what this static_cast does 
on __c which is already unsigned char ? If it does I'll just start 
to hate C++ :-)


Maybe you wanted to put it on the previous *__first ?


Ugh, yes, but it's not even needed there because the implicit
conversion is fine.

We do need to fix the isspace calls in src/c++11/debug.cc but this one
was already correct. Thanks!


I've reverted that useless change, thanks for noticing it.




[committed] libstdc++: Fix definition of std::remove_cvref_t

2021-05-06 Thread Jonathan Wakely via Gcc-patches
I originally defined std::remove_cvref_t in terms of the internal
__remove_cvref_t trait, to avoid instantiating the remove_cvref class
template. However, as described in P1715R0 that is observable by users
and is thus non-conforming.

This defines remove_cvref_t as specified in the standard.

libstdc++-v3/ChangeLog:

* include/std/type_traits (remove_cvref_t): Define in terms of
remove_cvref.
* testsuite/20_util/remove_cvref/value.cc: Check alias.

Tested powerpc64le-linux. Committed to trunk.

commit 0e79e63026e43ad0577812ffb405dac8fa88af5b
Author: Jonathan Wakely 
Date:   Thu May 6 13:40:53 2021

libstdc++: Fix definition of std::remove_cvref_t

I originally defined std::remove_cvref_t in terms of the internal
__remove_cvref_t trait, to avoid instantiating the remove_cvref class
template. However, as described in P1715R0 that is observable by users
and is thus non-conforming.

This defines remove_cvref_t as specified in the standard.

libstdc++-v3/ChangeLog:

* include/std/type_traits (remove_cvref_t): Define in terms of
remove_cvref.
* testsuite/20_util/remove_cvref/value.cc: Check alias.

diff --git a/libstdc++-v3/include/std/type_traits 
b/libstdc++-v3/include/std/type_traits
index 1f8b57b04a0..eaf06fcb036 100644
--- a/libstdc++-v3/include/std/type_traits
+++ b/libstdc++-v3/include/std/type_traits
@@ -3223,12 +3223,21 @@ template 
   /// Remove references and cv-qualifiers.
   template
 struct remove_cvref
-{
-  using type = __remove_cvref_t<_Tp>;
-};
+: remove_cv<_Tp>
+{ };
 
   template
-using remove_cvref_t = __remove_cvref_t<_Tp>;
+struct remove_cvref<_Tp&>
+: remove_cv<_Tp>
+{ };
+
+  template
+struct remove_cvref<_Tp&&>
+: remove_cv<_Tp>
+{ };
+
+  template
+using remove_cvref_t = typename remove_cvref<_Tp>::type;
 
 #define __cpp_lib_type_identity 201806L
   /// Identity metafunction.
diff --git a/libstdc++-v3/testsuite/20_util/remove_cvref/value.cc 
b/libstdc++-v3/testsuite/20_util/remove_cvref/value.cc
index d4a28422977..a4f50d433dc 100644
--- a/libstdc++-v3/testsuite/20_util/remove_cvref/value.cc
+++ b/libstdc++-v3/testsuite/20_util/remove_cvref/value.cc
@@ -48,3 +48,10 @@ void test01()
   static_assert(is_same::type,
const int()>::value, "");
 }
+
+// Declare using nested name of class template
+template T func(typename std::remove_cvref::type);
+// Define using alias
+template T func(std::remove_cvref_t t) { return t; }
+// Call must not be ambiguous
+int i = func(1);


[PATCH] Avoid update_ssa quadraticness in loop splitting

2021-05-06 Thread Richard Biener
We already take care to not apply loop splitting to IL produced
by splitting so we should be able to delay updating SSA and
loop-closed SSA that was left broken after loop versioning
until after we processed all opportunities.

Bootstrapped and tested on x86_64-unknown-linux-gnu, I've also
built SPEC CPU 2017 with and without LTO for extra verification
and added a -fopt-info-loop message to one path where it was missing
to verify we are actually splitting loops in SPEC (we do).

Pushed.

2021-05-06  Richard Biener  

* tree-ssa-loop-split.c (split_loop): Delay updating SSA form.
Output an opt-info message.
(do_split_loop_on_cond): Likewise.
(tree_ssa_split_loops): Update SSA form here.
---
 gcc/tree-ssa-loop-split.c | 17 +++--
 1 file changed, 7 insertions(+), 10 deletions(-)

diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
index b80b6a75e62..3a09bbc39e5 100644
--- a/gcc/tree-ssa-loop-split.c
+++ b/gcc/tree-ssa-loop-split.c
@@ -589,7 +589,6 @@ split_loop (class loop *loop1)
   profile_probability::always (),
   true);
gcc_assert (loop2);
-   update_ssa (TODO_update_ssa);
 
edge new_e = connect_loops (loop1, loop2);
connect_loop_phis (loop1, loop2, new_e);
@@ -621,14 +620,13 @@ split_loop (class loop *loop1)
 
free_original_copy_tables ();
 
-   /* We destroyed LCSSA form above.  Eventually we might be able
-  to fix it on the fly, for now simply punt and use the helper.  */
-   rewrite_into_loop_closed_ssa_1 (NULL, 0, SSA_OP_USE, loop1);
-
changed = true;
if (dump_file && (dump_flags & TDF_DETAILS))
  fprintf (dump_file, ";; Loop split.\n");
 
+   if (dump_enabled_p ())
+ dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, guard_stmt, "loop split\n");
+
/* Only deal with the first opportunity.  */
break;
   }
@@ -1532,8 +1530,6 @@ do_split_loop_on_cond (struct loop *loop1, edge 
invar_branch)
   to_loop1->flags |= true_invar ? EDGE_FALSE_VALUE : EDGE_TRUE_VALUE;
   to_loop2->flags |= true_invar ? EDGE_TRUE_VALUE : EDGE_FALSE_VALUE;
 
-  update_ssa (TODO_update_ssa);
-
   /* Due to introduction of a control flow edge from loop1 latch to loop2
  pre-header, we should update PHIs in loop2 to reflect this connection
  between loop1 and loop2.  */
@@ -1541,8 +1537,6 @@ do_split_loop_on_cond (struct loop *loop1, edge 
invar_branch)
 
   free_original_copy_tables ();
 
-  rewrite_into_loop_closed_ssa_1 (NULL, 0, SSA_OP_USE, loop1);
-
   return true;
 }
 
@@ -1644,7 +1638,10 @@ tree_ssa_split_loops (void)
   free_dominance_info (CDI_POST_DOMINATORS);
 
   if (changed)
-return TODO_cleanup_cfg;
+{
+  rewrite_into_loop_closed_ssa (NULL, TODO_update_ssa);
+  return TODO_cleanup_cfg;
+}
   return 0;
 }
 
-- 
2.26.2


[PATCH] tree-optimization/100434 - DSE aggregate call LHS

2021-05-06 Thread Richard Biener
This makes DSE consider aggregate LHS of calls as dead, for pure
or const calls the whole stmt and for others by removing the LHS.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

I wonder if there's a more canonical test for call LHS that cannot
be removed than

+  if (gimple_call_return_slot_opt_p (as_a (stmt))
+ && (TREE_ADDRESSABLE (TREE_TYPE (gimple_call_fntype (stmt)))
+ || !poly_int_tree_p
+   (TYPE_SIZE (TREE_TYPE (gimple_call_fntype (stmt))
+   return;

?

Richard.

2021-05-05  Richard Biener  

PR tree-optimization/100434
* tree-ssa-dse.c (initialize_ao_ref_for_dse): Handle
call LHS.
(dse_optimize_stmt): Handle call LHS by dropping the
LHS or the whole call if it doesn't have other
side-effects.
(pass_dse::execute): Adjust.

* gcc.dg/tree-ssa/ssa-dse-43.c: New testcase.
---
 gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-43.c |  22 +
 gcc/tree-ssa-dse.c | 107 +
 2 files changed, 87 insertions(+), 42 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-43.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-43.c 
b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-43.c
new file mode 100644
index 000..f8785e9da46
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dse-43.c
@@ -0,0 +1,22 @@
+/* { dg-do compile } */
+/* { dg-options "-O -fdump-tree-dse1-details" } */
+
+struct X { int x; };
+struct X x;
+
+extern struct X foo (void);
+void bar()
+{
+  x = foo();
+  x = (struct X){};
+}
+
+extern struct X __attribute__((const)) foo2 (int);
+void bar2()
+{
+  x = foo2 (1);
+  x = foo2 (2);
+}
+
+/* { dg-final { scan-tree-dump-times "Deleted dead store in call LHS: x = foo 
" 1 "dse1" } } */
+/* { dg-final { scan-tree-dump-times "Deleted dead store: x = foo2 " 1 "dse1" 
} } */
diff --git a/gcc/tree-ssa-dse.c b/gcc/tree-ssa-dse.c
index 63c876a1ff2..c3939a6417f 100644
--- a/gcc/tree-ssa-dse.c
+++ b/gcc/tree-ssa-dse.c
@@ -140,10 +140,13 @@ initialize_ao_ref_for_dse (gimple *stmt, ao_ref *write)
  break;
}
 }
-  else if (is_gimple_assign (stmt))
+  else if (tree lhs = gimple_get_lhs (stmt))
 {
-  ao_ref_init (write, gimple_assign_lhs (stmt));
-  return true;
+  if (TREE_CODE (lhs) != SSA_NAME)
+   {
+ ao_ref_init (write, lhs);
+ return true;
+   }
 }
   return false;
 }
@@ -1035,7 +1038,7 @@ delete_dead_or_redundant_assignment (gimple_stmt_iterator 
*gsi, const char *type
post dominates the first store, then the first store is dead.  */
 
 static void
-dse_optimize_stmt (gimple_stmt_iterator *gsi, sbitmap live_bytes)
+dse_optimize_stmt (function *fun, gimple_stmt_iterator *gsi, sbitmap 
live_bytes)
 {
   gimple *stmt = gsi_stmt (*gsi);
 
@@ -1113,49 +1116,69 @@ dse_optimize_stmt (gimple_stmt_iterator *gsi, sbitmap 
live_bytes)
}
 }
 
-  if (is_gimple_assign (stmt))
+  bool by_clobber_p = false;
+
+  /* Check if this statement stores zero to a memory location,
+ and if there is a subsequent store of zero to the same
+ memory location.  If so, remove the subsequent store.  */
+  if (gimple_assign_single_p (stmt)
+  && initializer_zerop (gimple_assign_rhs1 (stmt)))
+dse_optimize_redundant_stores (stmt);
+
+  /* Self-assignments are zombies.  */
+  if (is_gimple_assign (stmt)
+  && operand_equal_p (gimple_assign_rhs1 (stmt),
+ gimple_assign_lhs (stmt), 0))
+;
+  else
 {
-  bool by_clobber_p = false;
-
-  /* Check if this statement stores zero to a memory location,
-and if there is a subsequent store of zero to the same
-memory location.  If so, remove the subsequent store.  */
-  if (gimple_assign_single_p (stmt)
- && initializer_zerop (gimple_assign_rhs1 (stmt)))
-   dse_optimize_redundant_stores (stmt);
-
-  /* Self-assignments are zombies.  */
-  if (operand_equal_p (gimple_assign_rhs1 (stmt),
-  gimple_assign_lhs (stmt), 0))
-   ;
-  else
-   {
- bool byte_tracking_enabled
-   = setup_live_bytes_from_ref (, live_bytes);
- enum dse_store_status store_status;
- store_status = dse_classify_store (, stmt,
-byte_tracking_enabled,
-live_bytes, _clobber_p);
- if (store_status == DSE_STORE_LIVE)
-   return;
+  bool byte_tracking_enabled
+ = setup_live_bytes_from_ref (, live_bytes);
+  enum dse_store_status store_status;
+  store_status = dse_classify_store (, stmt,
+byte_tracking_enabled,
+live_bytes, _clobber_p);
+  if (store_status == DSE_STORE_LIVE)
+   return;
 
- if (store_status == DSE_STORE_MAYBE_PARTIAL_DEAD)
-   {
- maybe_trim_partially_dead_store (, 

Re: [Patch] + [nvptx RFH/RFC]: OpenMP: Fix SIMT for complex/float reduction with && and ||

2021-05-06 Thread Jakub Jelinek via Gcc-patches
On Thu, May 06, 2021 at 12:17:03PM +0200, Tobias Burnus wrote:
> OpenMP: Fix SIMT for complex/float reduction with && and ||
> 
> gcc/ChangeLog:
> 
>   * omp-low.c (lower_rec_input_clauses): Also handle SIMT part
>   for complex/float recution with && and ||.
> 
> libgomp/ChangeLog:
> 
>   * testsuite/libgomp.c-c++-common/reduction-5.c: New test, testing
>   complex/floating-point || + && recduction with 'omp target'.

As the float/complex ||/&& reductions are IMHO just conformance issues, not
something anyone would actually use in meaningful code - floats or complex
aren't the most obvious or efficient holders of boolean values - I think
punting SIMT on those isn't a workaround, but the right solution.

Jakub



Re: [PATCH 2/2] libstdc++: Implement LWG 3533 changes to foo_view::iterator::base()

2021-05-06 Thread Jonathan Wakely via Gcc-patches

On 05/05/21 16:23 -0400, Patrick Palka via Libstdc++ wrote:

Tested on x86_64-pc-linux-gnu, does this look OK for trunk/10/11?


Yes, thanks.


libstdc++-v3/ChangeLog:

* include/std/ranges (filter_view::_Iterator::base): Make the
const& overload return a const reference and remove its
constraint as per LWG 3533. Make unconditionally noexcept.
(transform_view::_Iterator::base): Likewise.
(elements_view::_Iterator::base): Likewise.
---
libstdc++-v3/include/std/ranges | 14 ++
1 file changed, 6 insertions(+), 8 deletions(-)

diff --git a/libstdc++-v3/include/std/ranges b/libstdc++-v3/include/std/ranges
index 7075fa3ae6e..bc11505c167 100644
--- a/libstdc++-v3/include/std/ranges
+++ b/libstdc++-v3/include/std/ranges
@@ -1199,9 +1199,8 @@ namespace views::__adaptor
_M_parent(__parent)
{ }

-   constexpr _Vp_iter
-   base() const &
- requires copyable<_Vp_iter>
+   constexpr const _Vp_iter&
+   base() const & noexcept
{ return _M_current; }

constexpr _Vp_iter
@@ -1467,9 +1466,8 @@ namespace views::__adaptor
: _M_current(std::move(__i._M_current)), _M_parent(__i._M_parent)
  { }

- constexpr _Base_iter
- base() const &
-   requires copyable<_Base_iter>
+ constexpr const _Base_iter&
+ base() const & noexcept
  { return _M_current; }

  constexpr _Base_iter
@@ -3403,8 +3401,8 @@ namespace views::__adaptor
: _M_base(std::move(base))
  { }

-  constexpr _Vp
-  base() const& requires copy_constructible<_Vp>
+  constexpr const _Vp&
+  base() const & noexcept
  { return _M_base; }

  constexpr _Vp
--
2.31.1.442.g7e39198978





Re: [PATCH 1/2] libstdc++: Implement LWG 3391 changes to move/counted_iterator::base

2021-05-06 Thread Jonathan Wakely via Gcc-patches

On 05/05/21 16:23 -0400, Patrick Palka via Libstdc++ wrote:

Tested on x86_64-pc-linux-gnu, does this look OK for trunk/10/11?


Yes, thanks.


libstdc++-v3/ChangeLog:

* include/bits/stl_iterator.h (move_iterator::base): Make the
const& overload return a const reference and remove its
constraint as per LWG 3391.  Make unconditionally noexcept.
(counted_iterator::base): Likewise.
* testsuite/24_iterators/move_iterator/lwg3391.cc: New test.
* testsuite/24_iterators/move_iterator/move_only.cc: Adjust
has_member_base concept to decay-copy the result of base().
---
libstdc++-v3/include/bits/stl_iterator.h  | 13 ++-
.../24_iterators/move_iterator/lwg3391.cc | 37 +++
.../24_iterators/move_iterator/move_only.cc   |  8 +++-
3 files changed, 48 insertions(+), 10 deletions(-)
create mode 100644 libstdc++-v3/testsuite/24_iterators/move_iterator/lwg3391.cc

diff --git a/libstdc++-v3/include/bits/stl_iterator.h 
b/libstdc++-v3/include/bits/stl_iterator.h
index 049f83cff90..2409cd71f86 100644
--- a/libstdc++-v3/include/bits/stl_iterator.h
+++ b/libstdc++-v3/include/bits/stl_iterator.h
@@ -1409,11 +1409,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  base() const
  { return _M_current; }
#else
-  constexpr iterator_type
-  base() const &
-#if __cpp_lib_concepts
-   requires copy_constructible
-#endif
+  constexpr const iterator_type&
+  base() const & noexcept
  { return _M_current; }

  constexpr iterator_type
@@ -2141,10 +2138,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  return *this;
}

-  constexpr _It
-  base() const &
-  noexcept(is_nothrow_copy_constructible_v<_It>)
-  requires copy_constructible<_It>
+  constexpr const _It&
+  base() const & noexcept
  { return _M_current; }

  constexpr _It
diff --git a/libstdc++-v3/testsuite/24_iterators/move_iterator/lwg3391.cc 
b/libstdc++-v3/testsuite/24_iterators/move_iterator/lwg3391.cc
new file mode 100644
index 000..18e015777cd
--- /dev/null
+++ b/libstdc++-v3/testsuite/24_iterators/move_iterator/lwg3391.cc
@@ -0,0 +1,37 @@
+// Copyright (C) 2021 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// { dg-options "-std=gnu++2a" }
+// { dg-do compile { target c++2a } }
+
+// Verify LWG 3391 changes.
+
+#include 
+#include 
+
+#include 
+
+using __gnu_test::test_range;
+using __gnu_test::input_iterator_wrapper_nocopy;
+
+void
+test01()
+{
+  extern test_range rx;
+  auto v = rx | std::views::take(5);
+  std::ranges::begin(v) != std::ranges::end(v);
+}
diff --git a/libstdc++-v3/testsuite/24_iterators/move_iterator/move_only.cc 
b/libstdc++-v3/testsuite/24_iterators/move_iterator/move_only.cc
index 639adfab0e2..5537dfbf3cd 100644
--- a/libstdc++-v3/testsuite/24_iterators/move_iterator/move_only.cc
+++ b/libstdc++-v3/testsuite/24_iterators/move_iterator/move_only.cc
@@ -43,7 +43,13 @@ template<> struct std::iterator_traits
static_assert(std::input_iterator);

template
-  concept has_member_base = requires (T t) { std::forward(t).base(); };
+  concept has_member_base = requires (T t) {
+// LWG 3391 made the const& overload of move_iterator::base()
+// unconstrained and return a const reference.  So rather than checking
+// whether base() is valid (which is now trivially true in an unevaluated
+// context), the below now checks whether decay-copying base() is valid.
+[](auto){}(std::forward(t).base());
+  };

using move_only_move_iterator = std::move_iterator;

--
2.31.1.442.g7e39198978





Re: [PATCH] phiopt: Optimize (x <=> y) cmp z [PR94589]

2021-05-06 Thread Jakub Jelinek via Gcc-patches
On Wed, May 05, 2021 at 06:52:27PM +0200, Jakub Jelinek via Gcc-patches wrote:
> On Wed, May 05, 2021 at 01:45:29PM +0200, Marc Glisse wrote:
> > On Tue, 4 May 2021, Jakub Jelinek via Gcc-patches wrote:
> > 
> > > 2) the pr94589-2.C testcase should be matching just 12 times each, but 
> > > runs
> > > into operator>=(strong_ordering, unspecified) being defined as
> > > (_M_value&1)==_M_value
> > > rather than _M_value>=0.  When not honoring NaNs, the 2 case should be
> > > unreachable and so (_M_value&1)==_M_value is then equivalent to 
> > > _M_value>=0,
> > > but is not a single use but two uses.  I'll need to pattern match that 
> > > case
> > > specially.
> > 
> > Somewhere in RTL (_M_value&1)==_M_value is turned into (_M_value&-2)==0,
> > that could be worth doing already in GIMPLE.
> 
> Apparently it is
>   /* Simplify eq/ne (and/ior x y) x/y) for targets with a BICS instruction or
>  constant folding if x/y is a constant.  */
>   if ((code == EQ || code == NE)
>   && (op0code == AND || op0code == IOR)
>   && !side_effects_p (op1)
>   && op1 != CONST0_RTX (cmp_mode))
> {
>   /* Both (eq/ne (and x y) x) and (eq/ne (ior x y) y) simplify to
>  (eq/ne (and (not y) x) 0).  */
> ...
>   /* Both (eq/ne (and x y) y) and (eq/ne (ior x y) x) simplify to
>  (eq/ne (and (not x) y) 0).  */
> Yes, doing that on GIMPLE for the case where the not argument is constant
> would simplify the phiopt follow-up (it would be single imm use then).

Though, (x&1) == x is equivalent to both (x&~1)==0 and to x < 2U
and from the latter two it isn't obvious which one is better/more canonical.
On aarch64 I don't see differences in number of insns nor in their size:
  10:   13001c00sxtbw0, w0
  14:   721f781ftst w0, #0xfffe
  18:   1a9f17e0csetw0, eq  // eq = none
  1c:   d65f03c0ret
vs.
  20:   12001c00and w0, w0, #0xff
  24:   7100041fcmp w0, #0x1
  28:   1a9f87e0csetw0, ls  // ls = plast
  2c:   d65f03c0ret
On x86_64 same number of insns, but the comparison is shorter (note, the
spaceship result is a struct with signed char based enum):
  10:   31 c0   xor%eax,%eax
  12:   81 e7 fe 00 00 00   and$0xfe,%edi
  18:   0f 94 c0sete   %al
  1b:   c3  retq   
  1c:   0f 1f 40 00 nopl   0x0(%rax)
vs.
  20:   31 c0   xor%eax,%eax
  22:   40 80 ff 01 cmp$0x1,%dil
  26:   0f 96 c0setbe  %al
  29:   c3  retq   
Generally, I'd think that the comparison should be better because it
will be most common in user code that way and VRP will be able to do the
best thing for it.

Before we decide how to optimize it, can't we change libstdc++
to use the more efficient implementation?

I.e. either following, or { return (__v._M_value & ~1) == 0; } ?

2021-05-06  Jakub Jelinek  

* compare (operator>=(partial_ordering, __cmp_cat::__unspec),
operator<=(__cmp_cat::__unspec, partial_ordering)): Simplify to
(unsigned char) __v._M_value < 2.

--- gcc/compare.jj  2021-03-10 17:36:41.288491012 +0100
+++ gcc/compare 2021-05-06 11:44:37.519553192 +0200
@@ -107,7 +107,7 @@ namespace std
 
 friend constexpr bool
 operator>=(partial_ordering __v, __cmp_cat::__unspec) noexcept
-{ return __cmp_cat::type(__v._M_value & 1) == __v._M_value; }
+{ return static_cast(__v._M_value) < 2; }
 
 friend constexpr bool
 operator< (__cmp_cat::__unspec, partial_ordering __v) noexcept
@@ -119,7 +119,7 @@ namespace std
 
 friend constexpr bool
 operator<=(__cmp_cat::__unspec, partial_ordering __v) noexcept
-{ return __cmp_cat::type(__v._M_value & 1) == __v._M_value; }
+{ return static_cast(__v._M_value) < 2; }
 
 friend constexpr bool
 operator>=(__cmp_cat::__unspec, partial_ordering __v) noexcept


Jakub



[Patch] + [nvptx RFH/RFC]: OpenMP: Fix SIMT for complex/float reduction with && and ||

2021-05-06 Thread Tobias Burnus

The complex/float && and || reduction patch missed a target testcase
(→ attached) which revealed that also a SIMT needed some special
handling, but just runs on non-SIMT systems.

The omp-low.c patch is rather simple - and I think it semantically
okay.
[Note to the change: It looks more completed than it is:
- moving 'zero' decl out of the 'if' block
- moving that if block before the 'if (sctx.is_simt)' block
-  'if (is_fp_and_or)' to the 'if (sctx.is_simt)' block.]

I think at least the testcase should be added, possibly also
the omp-low.c change – albeit I get a later ICE (see below),
which needs either an XFAIL or a fix.

 * * *

ICE with NVPTX:

When the device lto1 starts, it fails when expanding the
intrinsic XCHG_BFLY function.

We have 'ivar' = complex float, which at rtx level is
converted to a concatenation (via gen_reg_rtx()).
In omp-low.c:
  IFN_GOMP_SIMT_XCHG_BFLY (TREE_TYPE(ivar), ivar, simt_lane)

Later in expand_GOMP_SIMT_XCHG_BFLY, we call:
371   expand_insn (targetm.code_for_omp_simt_xchg_bfly, 3, ops);
which fails by running into unreachable of 'expand_insn'
7844  if (!maybe_expand_insn (icode, nops, ops))
7845gcc_unreachable ();

icode = CODE_FOR_omp_simt_xchg_bfly
nops = 3

(gdb) p ops[0]->type
$3 = EXPAND_OUTPUT

(gdb) p debug(ops[0]->value)
(concat:SC (reg:SF 85)
(reg:SF 86))

(gdb) p ops[1]->type
$5 = EXPAND_INPUT

(gdb) p debug(ops[1]->value)
(concat:SC (reg:SF 26 [ orfc ])
(reg:SF 27 [ orfc+4 ]))

(gdb) p ops[2]->type
$7 = EXPAND_INPUT

(gdb) p debug(ops[2]->value)
(reg:SI 52 [ _74 ])

The mentioned concat happens in


How to fix this? Or does this fall into the same category as
PR100321 (fixed by: r12-395, Disable SIMT for user-defined reduction) with its
follow-up PR 100408?

Small testcase is:

_Complex float rcf[1024];
int
reduction_or ()
{
  _Complex float orfc = 0;
  for (int i=0; i < 1024; ++i)
orfc = orfc || rcf[i];
  return __real__ orfc;
}

Tobias

-
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München 
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank 
Thürauf
OpenMP: Fix SIMT for complex/float reduction with && and ||

gcc/ChangeLog:

	* omp-low.c (lower_rec_input_clauses): Also handle SIMT part
	for complex/float recution with && and ||.

libgomp/ChangeLog:

	* testsuite/libgomp.c-c++-common/reduction-5.c: New test, testing
	complex/floating-point || + && recduction with 'omp target'.

 gcc/omp-low.c  |  30 ++--
 .../testsuite/libgomp.c-c++-common/reduction-5.c   | 192 +
 2 files changed, 210 insertions(+), 12 deletions(-)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 26ceaf7..46220c5 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -6432,28 +6432,34 @@ lower_rec_input_clauses (tree clauses, gimple_seq *ilist, gimple_seq *dlist,
 
 		  gimplify_assign (unshare_expr (ivar), x, [0]);
 
-		  if (sctx.is_simt)
-			{
-			  if (!simt_lane)
-			simt_lane = create_tmp_var (unsigned_type_node);
-			  x = build_call_expr_internal_loc
-			(UNKNOWN_LOCATION, IFN_GOMP_SIMT_XCHG_BFLY,
-			 TREE_TYPE (ivar), 2, ivar, simt_lane);
-			  x = build2 (code, TREE_TYPE (ivar), ivar, x);
-			  gimplify_assign (ivar, x, [2]);
-			}
 		  tree ivar2 = ivar;
 		  tree ref2 = ref;
+		  tree zero = NULL_TREE;
 		  if (is_fp_and_or)
 			{
-			  tree zero = build_zero_cst (TREE_TYPE (ivar));
+			  zero = build_zero_cst (TREE_TYPE (ivar));
 			  ivar2 = fold_build2_loc (clause_loc, NE_EXPR,
 		   integer_type_node, ivar,
 		   zero);
 			  ref2 = fold_build2_loc (clause_loc, NE_EXPR,
 		  integer_type_node, ref, zero);
 			}
-		  x = build2 (code, TREE_TYPE (ref), ref2, ivar2);
+		  if (sctx.is_simt)
+			{
+			  if (!simt_lane)
+			simt_lane = create_tmp_var (unsigned_type_node);
+			  x = build_call_expr_internal_loc
+			(UNKNOWN_LOCATION, IFN_GOMP_SIMT_XCHG_BFLY,
+			 TREE_TYPE (ivar), 2, ivar, simt_lane);
+			  if (is_fp_and_or)
+			x = fold_build2_loc (clause_loc, NE_EXPR,
+		 integer_type_node, x, zero);
+			  x = build2 (code, TREE_TYPE (ivar2), ivar2, x);
+			  if (is_fp_and_or)
+			x = fold_convert (TREE_TYPE (ivar), x);
+			  gimplify_assign (ivar, x, [2]);
+			}
+		  x = build2 (code, TREE_TYPE (ref2), ref2, ivar2);
 		  if (is_fp_and_or)
 			x = fold_convert (TREE_TYPE (ref), x);
 		  ref = build_outer_var_ref (var, ctx);
diff --git a/libgomp/testsuite/libgomp.c-c++-common/reduction-5.c b/libgomp/testsuite/libgomp.c-c++-common/reduction-5.c
new file mode 100644
index 000..346c882
--- /dev/null
+++ b/libgomp/testsuite/libgomp.c-c++-common/reduction-5.c
@@ -0,0 +1,192 @@
+/* C / C++'s logical AND and OR operators take any scalar argument
+   which compares (un)equal to 0 - the result 1 or 0 and of type int.
+
+   In this testcase, the int result is again converted to a floating-poing
+   or complex type.
+
+   While having a floating-point/complex 

Re: [PATCH] phiopt: Use gphi *phi instead of gimple *phi some more

2021-05-06 Thread Richard Biener
On Thu, 6 May 2021, Jakub Jelinek wrote:

> Hi!
> 
> Various functions in phiopt are also called with a gphi * but use
> gimple * argument for it.  Fixed thusly, bootstrapped/regtested on
> x86_64-linux and i686-linux, ok for trunk?

Yes (obvious even).

Richard.

> 2021-05-06  Jakub Jelinek  
> 
>   * tree-ssa-phiopt.c (value_replacement, minmax_replacement,
>   abs_replacement, xor_replacement,
>   cond_removal_in_popcount_clz_ctz_pattern,
>   replace_phi_edge_with_variable): Change type of phi argument from
>   gimple * to gphi *.
> 
> --- gcc/tree-ssa-phiopt.c.jj  2021-05-05 15:06:23.253189139 +0200
> +++ gcc/tree-ssa-phiopt.c 2021-05-05 18:02:07.600019038 +0200
> @@ -57,23 +57,23 @@ static bool conditional_replacement (bas
>  static gphi *factor_out_conditional_conversion (edge, edge, gphi *, tree, 
> tree,
>   gimple *);
>  static int value_replacement (basic_block, basic_block,
> -   edge, edge, gimple *, tree, tree);
> +   edge, edge, gphi *, tree, tree);
>  static bool minmax_replacement (basic_block, basic_block,
> - edge, edge, gimple *, tree, tree);
> + edge, edge, gphi *, tree, tree);
>  static bool abs_replacement (basic_block, basic_block,
> -  edge, edge, gimple *, tree, tree);
> +  edge, edge, gphi *, tree, tree);
>  static bool xor_replacement (basic_block, basic_block,
> -  edge, edge, gimple *, tree, tree);
> +  edge, edge, gphi *, tree, tree);
>  static bool spaceship_replacement (basic_block, basic_block,
>  edge, edge, gphi *, tree, tree);
>  static bool cond_removal_in_popcount_clz_ctz_pattern (basic_block, 
> basic_block,
> -   edge, edge, gimple *,
> +   edge, edge, gphi *,
> tree, tree);
>  static bool cond_store_replacement (basic_block, basic_block, edge, edge,
>   hash_set *);
>  static bool cond_if_else_store_replacement (basic_block, basic_block, 
> basic_block);
>  static hash_set * get_non_trapping ();
> -static void replace_phi_edge_with_variable (basic_block, edge, gimple *, 
> tree);
> +static void replace_phi_edge_with_variable (basic_block, edge, gphi *, tree);
>  static void hoist_adjacent_loads (basic_block, basic_block,
> basic_block, basic_block);
>  static bool gate_hoist_loads (void);
> @@ -389,7 +389,7 @@ tree_ssa_phiopt_worker (bool do_store_el
>  
>  static void
>  replace_phi_edge_with_variable (basic_block cond_block,
> - edge e, gimple *phi, tree new_tree)
> + edge e, gphi *phi, tree new_tree)
>  {
>basic_block bb = gimple_bb (phi);
>basic_block block_to_remove;
> @@ -1113,8 +1113,7 @@ absorbing_element_p (tree_code code, tre
>  
>  static int
>  value_replacement (basic_block cond_bb, basic_block middle_bb,
> -edge e0, edge e1, gimple *phi,
> -tree arg0, tree arg1)
> +edge e0, edge e1, gphi *phi, tree arg0, tree arg1)
>  {
>gimple_stmt_iterator gsi;
>gimple *cond;
> @@ -1422,8 +1421,7 @@ value_replacement (basic_block cond_bb,
>  
>  static bool
>  minmax_replacement (basic_block cond_bb, basic_block middle_bb,
> - edge e0, edge e1, gimple *phi,
> - tree arg0, tree arg1)
> + edge e0, edge e1, gphi *phi, tree arg0, tree arg1)
>  {
>tree result;
>edge true_edge, false_edge;
> @@ -2266,7 +2264,7 @@ spaceship_replacement (basic_block cond_
>  static bool
>  cond_removal_in_popcount_clz_ctz_pattern (basic_block cond_bb,
> basic_block middle_bb,
> -   edge e1, edge e2, gimple *phi,
> +   edge e1, edge e2, gphi *phi,
> tree arg0, tree arg1)
>  {
>gimple *cond;
> @@ -2424,7 +2422,7 @@ cond_removal_in_popcount_clz_ctz_pattern
>  static bool
>  abs_replacement (basic_block cond_bb, basic_block middle_bb,
>edge e0 ATTRIBUTE_UNUSED, edge e1,
> -  gimple *phi, tree arg0, tree arg1)
> +  gphi *phi, tree arg0, tree arg1)
>  {
>tree result;
>gassign *new_stmt;
> @@ -2548,7 +2546,7 @@ abs_replacement (basic_block cond_bb, ba
>  static bool
>  xor_replacement (basic_block cond_bb, basic_block middle_bb,
>edge e0 ATTRIBUTE_UNUSED, edge e1,
> -  gimple *phi, tree arg0, tree arg1)
> +  gphi *phi, tree arg0, tree arg1)
>  {
>if (!INTEGRAL_TYPE_P (TREE_TYPE (arg1)))
>  return false;
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE Software Solutions 

Re: [committed] Get avr building again

2021-05-06 Thread Senthil Kumar Selvaraj via Gcc-patches


Jeff Law via Gcc-patches writes:

> Removes references to CC_STATUS_INIT from the avr port, which should get 
> it to the point of building again.
>
>
> Committed to the trunk.

Thanks, I was about to send a patch for that.

Regards
Senthil
>
>
> Jeff



RE: [PATCH 3/4][AArch32]: Add support for sign differing dot-product usdot for NEON.

2021-05-06 Thread Tamar Christina via Gcc-patches


> -Original Message-
> From: Christophe Lyon 
> Sent: Thursday, May 6, 2021 10:23 AM
> To: Tamar Christina 
> Cc: gcc Patches ; nd 
> Subject: Re: [PATCH 3/4][AArch32]: Add support for sign differing dot-
> product usdot for NEON.
> 
> On Wed, 5 May 2021 at 19:39, Tamar Christina via Gcc-patches  patc...@gcc.gnu.org> wrote:
> >
> > Hi All,
> >
> > This adds optabs implementing usdot_prod.
> >
> > The following testcase:
> >
> > #define N 480
> > #define SIGNEDNESS_1 unsigned
> > #define SIGNEDNESS_2 signed
> > #define SIGNEDNESS_3 signed
> > #define SIGNEDNESS_4 unsigned
> >
> > SIGNEDNESS_1 int __attribute__ ((noipa)) f (SIGNEDNESS_1 int res,
> > SIGNEDNESS_3 char *restrict a,
> >SIGNEDNESS_4 char *restrict b)
> > {
> >   for (__INTPTR_TYPE__ i = 0; i < N; ++i)
> > {
> >   int av = a[i];
> >   int bv = b[i];
> >   SIGNEDNESS_2 short mult = av * bv;
> >   res += mult;
> > }
> >   return res;
> > }
> >
> > Generates
> >
> > f:
> > vmov.i32q8, #0  @ v4si
> > add r3, r2, #480
> > .L2:
> > vld1.8  {q10}, [r2]!
> > vld1.8  {q9}, [r1]!
> > vusdot.s8   q8, q9, q10
> > cmp r3, r2
> > bne .L2
> > vadd.i32d16, d16, d17
> > vpadd.i32   d16, d16, d16
> > vmov.32 r3, d16[0]
> > add r0, r0, r3
> > bx  lr
> >
> > instead of
> >
> > f:
> > vmov.i32q8, #0  @ v4si
> > add r3, r2, #480
> > .L2:
> > vld1.8  {q9}, [r2]!
> > vld1.8  {q11}, [r1]!
> > cmp r3, r2
> > vmull.s8 q10, d18, d22
> > vmull.s8 q9, d19, d23
> > vaddw.s16   q8, q8, d20
> > vaddw.s16   q8, q8, d21
> > vaddw.s16   q8, q8, d18
> > vaddw.s16   q8, q8, d19
> > bne .L2
> > vadd.i32d16, d16, d17
> > vpadd.i32   d16, d16, d16
> > vmov.32 r3, d16[0]
> > add r0, r0, r3
> > bx  lr
> >
> > For NEON.  I couldn't figure out if the MVE instruction vmlaldav.s16
> > could be used to emulate this.  Because it would require additional
> > widening to work I left MVE out of this patch set but perhaps someone
> should take a look.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> I guess you mean arm-linux-gnueabihf ?
> 

Oops, yeah, automatic pilot..

> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > * config/arm/neon.md (usdot_prod): New.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/arm/simd/vusdot-autovec.c: New test.
> >
> > --- inline copy of patch --
> > diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md index
> >
> fec2cc91d24b6eff7b6fc8fdd54f39b3d646c468..23ad411178db77c5d19bee7452
> bc
> > 1070331c1aa0 100644
> > --- a/gcc/config/arm/neon.md
> > +++ b/gcc/config/arm/neon.md
> > @@ -3075,6 +3075,24 @@ (define_expand "dot_prod"
> >DONE;
> >  })
> >
> > +;; Auto-vectorizer pattern for usdot
> > +(define_expand "usdot_prod"
> > +  [(set (match_operand:VCVTI 0 "register_operand")
> > +   (plus:VCVTI (unspec:VCVTI [(match_operand: 1
> > +   "register_operand")
> > +  (match_operand: 2
> > +   "register_operand")]
> > +UNSPEC_DOT_US)
> > +   (match_operand:VCVTI 3 "register_operand")))]
> > +  "TARGET_I8MM"
> > +{
> > +  emit_insn (
> > +gen_neon_usdot (operands[3], operands[3], operands[1],
> > +   operands[2]));
> > +  emit_insn (gen_rtx_SET (operands[0], operands[3]));
> > +  DONE;
> > +})
> > +
> >  (define_expand "neon_copysignf"
> >[(match_operand:VCVTF 0 "register_operand")
> > (match_operand:VCVTF 1 "register_operand") diff --git
> > a/gcc/testsuite/gcc.target/arm/simd/vusdot-autovec.c
> > b/gcc/testsuite/gcc.target/arm/simd/vusdot-autovec.c
> > new file mode 100644
> > index
> >
> ..7cc56f68817d77d6950df0ab37
> 2d
> > 6fbaad6b3813
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/arm/simd/vusdot-autovec.c
> > @@ -0,0 +1,38 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O3 -march=armv8.2-a+i8mm" } */
> > +
> > +#define N 480
> > +#define SIGNEDNESS_1 unsigned
> > +#define SIGNEDNESS_2 signed
> > +#define SIGNEDNESS_3 signed
> > +#define SIGNEDNESS_4 unsigned
> > +
> > +SIGNEDNESS_1 int __attribute__ ((noipa)) f (SIGNEDNESS_1 int res,
> > +SIGNEDNESS_3 char *restrict a,
> > +   SIGNEDNESS_4 char *restrict b)
> > +{
> > +  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
> > +{
> > +  int av = a[i];
> > +  int bv = b[i];
> > +  SIGNEDNESS_2 short mult = av * bv;
> > +  res += mult;
> > +}
> > +  return res;
> > +}
> > +
> > +SIGNEDNESS_1 int __attribute__ ((noipa)) g (SIGNEDNESS_1 int res,
> > +SIGNEDNESS_3 char *restrict b,
> > +   

Re: [PATCH 3/4][AArch32]: Add support for sign differing dot-product usdot for NEON.

2021-05-06 Thread Christophe Lyon via Gcc-patches
On Wed, 5 May 2021 at 19:39, Tamar Christina via Gcc-patches
 wrote:
>
> Hi All,
>
> This adds optabs implementing usdot_prod.
>
> The following testcase:
>
> #define N 480
> #define SIGNEDNESS_1 unsigned
> #define SIGNEDNESS_2 signed
> #define SIGNEDNESS_3 signed
> #define SIGNEDNESS_4 unsigned
>
> SIGNEDNESS_1 int __attribute__ ((noipa))
> f (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict a,
>SIGNEDNESS_4 char *restrict b)
> {
>   for (__INTPTR_TYPE__ i = 0; i < N; ++i)
> {
>   int av = a[i];
>   int bv = b[i];
>   SIGNEDNESS_2 short mult = av * bv;
>   res += mult;
> }
>   return res;
> }
>
> Generates
>
> f:
> vmov.i32q8, #0  @ v4si
> add r3, r2, #480
> .L2:
> vld1.8  {q10}, [r2]!
> vld1.8  {q9}, [r1]!
> vusdot.s8   q8, q9, q10
> cmp r3, r2
> bne .L2
> vadd.i32d16, d16, d17
> vpadd.i32   d16, d16, d16
> vmov.32 r3, d16[0]
> add r0, r0, r3
> bx  lr
>
> instead of
>
> f:
> vmov.i32q8, #0  @ v4si
> add r3, r2, #480
> .L2:
> vld1.8  {q9}, [r2]!
> vld1.8  {q11}, [r1]!
> cmp r3, r2
> vmull.s8 q10, d18, d22
> vmull.s8 q9, d19, d23
> vaddw.s16   q8, q8, d20
> vaddw.s16   q8, q8, d21
> vaddw.s16   q8, q8, d18
> vaddw.s16   q8, q8, d19
> bne .L2
> vadd.i32d16, d16, d17
> vpadd.i32   d16, d16, d16
> vmov.32 r3, d16[0]
> add r0, r0, r3
> bx  lr
>
> For NEON.  I couldn't figure out if the MVE instruction vmlaldav.s16 could be
> used to emulate this.  Because it would require additional widening to work I
> left MVE out of this patch set but perhaps someone should take a look.
>
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.

I guess you mean arm-linux-gnueabihf ?

>
> Ok for master?
>
> Thanks,
> Tamar
>
> gcc/ChangeLog:
>
> * config/arm/neon.md (usdot_prod): New.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/arm/simd/vusdot-autovec.c: New test.
>
> --- inline copy of patch --
> diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
> index 
> fec2cc91d24b6eff7b6fc8fdd54f39b3d646c468..23ad411178db77c5d19bee7452bc1070331c1aa0
>  100644
> --- a/gcc/config/arm/neon.md
> +++ b/gcc/config/arm/neon.md
> @@ -3075,6 +3075,24 @@ (define_expand "dot_prod"
>DONE;
>  })
>
> +;; Auto-vectorizer pattern for usdot
> +(define_expand "usdot_prod"
> +  [(set (match_operand:VCVTI 0 "register_operand")
> +   (plus:VCVTI (unspec:VCVTI [(match_operand: 1
> +   "register_operand")
> +  (match_operand: 2
> +   "register_operand")]
> +UNSPEC_DOT_US)
> +   (match_operand:VCVTI 3 "register_operand")))]
> +  "TARGET_I8MM"
> +{
> +  emit_insn (
> +gen_neon_usdot (operands[3], operands[3], operands[1],
> +   operands[2]));
> +  emit_insn (gen_rtx_SET (operands[0], operands[3]));
> +  DONE;
> +})
> +
>  (define_expand "neon_copysignf"
>[(match_operand:VCVTF 0 "register_operand")
> (match_operand:VCVTF 1 "register_operand")
> diff --git a/gcc/testsuite/gcc.target/arm/simd/vusdot-autovec.c 
> b/gcc/testsuite/gcc.target/arm/simd/vusdot-autovec.c
> new file mode 100644
> index 
> ..7cc56f68817d77d6950df0ab372d6fbaad6b3813
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/simd/vusdot-autovec.c
> @@ -0,0 +1,38 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -march=armv8.2-a+i8mm" } */
> +
> +#define N 480
> +#define SIGNEDNESS_1 unsigned
> +#define SIGNEDNESS_2 signed
> +#define SIGNEDNESS_3 signed
> +#define SIGNEDNESS_4 unsigned
> +
> +SIGNEDNESS_1 int __attribute__ ((noipa))
> +f (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict a,
> +   SIGNEDNESS_4 char *restrict b)
> +{
> +  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
> +{
> +  int av = a[i];
> +  int bv = b[i];
> +  SIGNEDNESS_2 short mult = av * bv;
> +  res += mult;
> +}
> +  return res;
> +}
> +
> +SIGNEDNESS_1 int __attribute__ ((noipa))
> +g (SIGNEDNESS_1 int res, SIGNEDNESS_3 char *restrict b,
> +   SIGNEDNESS_4 char *restrict a)
> +{
> +  for (__INTPTR_TYPE__ i = 0; i < N; ++i)
> +{
> +  int av = a[i];
> +  int bv = b[i];
> +  SIGNEDNESS_2 short mult = av * bv;
> +  res += mult;
> +}
> +  return res;
> +}
> +
> +/* { dg-final { scan-assembler-times {vusdot.s8} 2 { target { 
> arm-*-*-gnueabihf } } } } */
>
>
> --


Re: [Patch, fortran] PRs 46691 and 99819: Assumed and explicit size class arrays

2021-05-06 Thread Andre Vehreschild via Gcc-patches
Hi Paul,

this and the Changelog LGTM at least for 12. Give it a consolidation time before
applying to 11. Having had some issues in the vicinity of the code you addressed
I am quite happy to see how easy the fix looks.

Any chances you can take a look at
 https://gcc.gnu.org/pipermail/fortran/2021-April/055990.html
?

Regards,
Andre

On Thu, 6 May 2021 07:57:05 +0100
Paul Richard Thomas via Fortran  wrote:

> Hi All,
>
> Although I had undertaken to concentrate on PDTs, PR99819 so intrigued me
> that I became locked into it :-( After extensive, fruitless rummaging
> through decl.c and trans-decl.c, I realised that the problem was far
> simpler than it seemed and that it lay in class.c. After that PR was fixed,
> PR46691 was a trivial follow up.
>
> The comments in the patch explain the fixes. I left a TODO for the extent
> checking of assumed size class arrays. I will try to fix it before pushing.
>
> Regtested on FC33/x86_64 and checked against the 'other brand'. OK for
> 12-branch and, perhaps, 11-branch?
>
> Regards
>
> Paul
>
> Fortran: Assumed and explicit size class arrays [PR46691/99819].
>
> 2021-05-06  Paul Thomas  
>
> gcc/fortran/ChangeLog
>
> PR fortran/46691
> PR fortran/99819
> * class.c (gfc_build_class_symbol): Class array types that are
> not deferred shape or assumed rank are given a unique name and
> placed in the procedure namespace.
> * trans-array.c (gfc_trans_g77_array): Obtain the data pointer
> for class arrays.
> (gfc_trans_dummy_array_bias): Suppress the runtime error for
> extent violations in explicit shape class arrays because it
> always fails.
> * trans-expr.c (gfc_conv_procedure_call): Handle assumed size
> class actual arguments passed to non-descriptor formal args by
> using the data pointer, stored as the symbol's backend decl.
>
> gcc/testsuite/ChangeLog
>
> PR fortran/46691
> PR fortran/99819
> * gfortran.dg/class_dummy_6.f90: New test.
> * gfortran.dg/class_dummy_6.f90: New test.


--
Andre Vehreschild * Email: vehre ad gmx dot de


Re: [PATCH] regcprop: Fix another cprop_hardreg bug [PR100342]

2021-05-06 Thread Jakub Jelinek via Gcc-patches
On Wed, May 05, 2021 at 07:44:46PM +0200, Jakub Jelinek via Gcc-patches wrote:
> So perhaps just the vd->e[dr].mode in there could change to
> GET_MODE (src) and drop the previous PR98694 change?

I've bootstrapped/regtested that successfully on the trunk
(on {x86_64,i686}-linux), though haven't attempted to merge the two comments:

2021-05-06  Jakub Jelinek  

PR rtl-optimization/100342
* regcprop.c (copy_value): When copying a source reg in a wider
mode than it has recorded for the value, adjust recorded destination
mode too.

* gcc.target/i386/pr100342.c: New test.

--- gcc/regcprop.c.jj   2020-04-30 17:41:37.624675304 +0200
+++ gcc/regcprop.c  2021-05-05 16:24:01.667308941 +0200
@@ -382,10 +382,22 @@ copy_value (rtx dest, rtx src, struct va
  (set (reg:HI R5) (reg:HI R[0-4]))
 
  in which all registers have only 16 defined bits.  */
-  else if (partial_subreg_p (vd->e[sr].mode, GET_MODE (src))
-  && partial_subreg_p (vd->e[sr].mode,
-   vd->e[vd->e[sr].oldest_regno].mode))
-return;
+
+  /* If a narrower value is copied using wider mode, the upper bits
+ are undefined (could be e.g. a former paradoxical subreg).  Signal
+ in that case we've only copied value using the narrower mode.
+ Consider:
+ (set (reg:DI r14) (mem:DI ...))
+ (set (reg:QI si) (reg:QI r14))
+ (set (reg:DI bp) (reg:DI r14))
+ (set (reg:DI r14) (const_int ...))
+ (set (reg:DI dx) (reg:DI si))
+ (set (reg:DI si) (const_int ...))
+ (set (reg:DI dx) (reg:DI bp))
+ The last set is not redundant, while the low 8 bits of dx are already
+ equal to low 8 bits of bp, the other bits are undefined.  */
+  else if (partial_subreg_p (vd->e[sr].mode, GET_MODE (src)))
+set_value_regno (dr, vd->e[sr].mode, vd);
 
   /* Link DR at the end of the value chain used by SR.  */
 
--- gcc/testsuite/gcc.target/i386/pr100342.c.jj 2021-05-05 17:01:29.139356719 
+0200
+++ gcc/testsuite/gcc.target/i386/pr100342.c2021-05-05 17:01:14.287521150 
+0200
@@ -0,0 +1,70 @@
+/* PR rtl-optimization/100342 */
+/* { dg-do run { target int128 } } */
+/* { dg-options "-O2 -fno-dse -fno-forward-propagate -Wno-psabi -mno-sse2" } */
+
+#define SHL(x, y) ((x) << ((y) & (sizeof(x) * 8 - 1)))
+#define SHR(x, y) ((x) >> ((y) & (sizeof(x) * 8 - 1)))
+#define ROR(x, y) (SHR(x, y)) | (SHL(x, (sizeof(x) * 8 - (y
+#define SHLV(x, y) ((x) << ((y) & (sizeof((x)[0]) * 8 - 1)))
+#define SHLSV(x, y) ((x) << ((y) & (sizeof((y)[0]) * 8 - 1)))
+typedef unsigned char A;
+typedef unsigned char __attribute__((__vector_size__ (8))) B;
+typedef unsigned char __attribute__((__vector_size__ (16))) C;
+typedef unsigned char __attribute__((__vector_size__ (32))) D;
+typedef unsigned char __attribute__((__vector_size__ (64))) E;
+typedef unsigned short F;
+typedef unsigned short __attribute__((__vector_size__ (16))) G;
+typedef unsigned int H;
+typedef unsigned int __attribute__((__vector_size__ (32))) I;
+typedef unsigned long long J;
+typedef unsigned long long __attribute__((__vector_size__ (8))) K;
+typedef unsigned long long __attribute__((__vector_size__ (32))) L;
+typedef unsigned long long __attribute__((__vector_size__ (64))) M;
+typedef unsigned __int128 N;
+typedef unsigned __int128 __attribute__((__vector_size__ (16))) O;
+typedef unsigned __int128 __attribute__((__vector_size__ (32))) P;
+typedef unsigned __int128 __attribute__((__vector_size__ (64))) Q;
+B v1;
+D v2;
+L v3;
+K v4;
+I v5;
+O v6;
+
+B
+foo (A a, C b, E c, F d, G e, H f, J g, M h, N i, P j, Q k)
+{
+  b &= (A) f;
+  k += a;
+  G l = e;
+  D m = v2 >= (A) (J) v1;
+  J r = a + g;
+  L n = v3 <= f;
+  k -= i / f;
+  l -= (A) g;
+  c |= (A) d;
+  b -= (A) i;
+  J o = ROR (__builtin_clz (r), a);
+  K p = v4 | f, q = v4 <= f;
+  P s = SHLV (SHLSV (__builtin_bswap64 (i), (P) (0 < j)) <= 0, j);
+  n += a <= r;
+  M t = (M) (a / SHLV (c, 0)) != __builtin_bswap64 (i);
+  I u = f - v5;
+  E v = (E) h + (E) t + (E) k;
+  D w = (union { D b[2]; }) { }.b[0] + ((union { E b; }) v).b[1] + m + (D) u + 
(D) n + (D) s;
+  C x = ((union { D b; }) w).b[1] + b + (C) l + (C) v6;
+  B y = ((union { C a; B b; }) x).b + ((union { C a; B b[2]; }) x).b[1] + (B) 
p + (B) q;
+  J z = i + o;
+  F z2 = z;
+  A z3 = z2;
+  return y + z3;
+}
+
+int
+main ()
+{
+  B x = foo (0, (C) { }, (E) { }, 10, (G) { }, 4, 2, (M) { }, 
123842323652213865LL, (P) { 1 }, (Q) { });
+  if ((J) x != 0x2e2c2e2c2e2c2e30ULL)
+__builtin_abort();
+  return 0;
+}


Jakub



[PATCH] phiopt: Use gphi *phi instead of gimple *phi some more

2021-05-06 Thread Jakub Jelinek via Gcc-patches
Hi!

Various functions in phiopt are also called with a gphi * but use
gimple * argument for it.  Fixed thusly, bootstrapped/regtested on
x86_64-linux and i686-linux, ok for trunk?

2021-05-06  Jakub Jelinek  

* tree-ssa-phiopt.c (value_replacement, minmax_replacement,
abs_replacement, xor_replacement,
cond_removal_in_popcount_clz_ctz_pattern,
replace_phi_edge_with_variable): Change type of phi argument from
gimple * to gphi *.

--- gcc/tree-ssa-phiopt.c.jj2021-05-05 15:06:23.253189139 +0200
+++ gcc/tree-ssa-phiopt.c   2021-05-05 18:02:07.600019038 +0200
@@ -57,23 +57,23 @@ static bool conditional_replacement (bas
 static gphi *factor_out_conditional_conversion (edge, edge, gphi *, tree, tree,
gimple *);
 static int value_replacement (basic_block, basic_block,
- edge, edge, gimple *, tree, tree);
+ edge, edge, gphi *, tree, tree);
 static bool minmax_replacement (basic_block, basic_block,
-   edge, edge, gimple *, tree, tree);
+   edge, edge, gphi *, tree, tree);
 static bool abs_replacement (basic_block, basic_block,
-edge, edge, gimple *, tree, tree);
+edge, edge, gphi *, tree, tree);
 static bool xor_replacement (basic_block, basic_block,
-edge, edge, gimple *, tree, tree);
+edge, edge, gphi *, tree, tree);
 static bool spaceship_replacement (basic_block, basic_block,
   edge, edge, gphi *, tree, tree);
 static bool cond_removal_in_popcount_clz_ctz_pattern (basic_block, basic_block,
- edge, edge, gimple *,
+ edge, edge, gphi *,
  tree, tree);
 static bool cond_store_replacement (basic_block, basic_block, edge, edge,
hash_set *);
 static bool cond_if_else_store_replacement (basic_block, basic_block, 
basic_block);
 static hash_set * get_non_trapping ();
-static void replace_phi_edge_with_variable (basic_block, edge, gimple *, tree);
+static void replace_phi_edge_with_variable (basic_block, edge, gphi *, tree);
 static void hoist_adjacent_loads (basic_block, basic_block,
  basic_block, basic_block);
 static bool gate_hoist_loads (void);
@@ -389,7 +389,7 @@ tree_ssa_phiopt_worker (bool do_store_el
 
 static void
 replace_phi_edge_with_variable (basic_block cond_block,
-   edge e, gimple *phi, tree new_tree)
+   edge e, gphi *phi, tree new_tree)
 {
   basic_block bb = gimple_bb (phi);
   basic_block block_to_remove;
@@ -1113,8 +1113,7 @@ absorbing_element_p (tree_code code, tre
 
 static int
 value_replacement (basic_block cond_bb, basic_block middle_bb,
-  edge e0, edge e1, gimple *phi,
-  tree arg0, tree arg1)
+  edge e0, edge e1, gphi *phi, tree arg0, tree arg1)
 {
   gimple_stmt_iterator gsi;
   gimple *cond;
@@ -1422,8 +1421,7 @@ value_replacement (basic_block cond_bb,
 
 static bool
 minmax_replacement (basic_block cond_bb, basic_block middle_bb,
-   edge e0, edge e1, gimple *phi,
-   tree arg0, tree arg1)
+   edge e0, edge e1, gphi *phi, tree arg0, tree arg1)
 {
   tree result;
   edge true_edge, false_edge;
@@ -2266,7 +2264,7 @@ spaceship_replacement (basic_block cond_
 static bool
 cond_removal_in_popcount_clz_ctz_pattern (basic_block cond_bb,
  basic_block middle_bb,
- edge e1, edge e2, gimple *phi,
+ edge e1, edge e2, gphi *phi,
  tree arg0, tree arg1)
 {
   gimple *cond;
@@ -2424,7 +2422,7 @@ cond_removal_in_popcount_clz_ctz_pattern
 static bool
 abs_replacement (basic_block cond_bb, basic_block middle_bb,
 edge e0 ATTRIBUTE_UNUSED, edge e1,
-gimple *phi, tree arg0, tree arg1)
+gphi *phi, tree arg0, tree arg1)
 {
   tree result;
   gassign *new_stmt;
@@ -2548,7 +2546,7 @@ abs_replacement (basic_block cond_bb, ba
 static bool
 xor_replacement (basic_block cond_bb, basic_block middle_bb,
 edge e0 ATTRIBUTE_UNUSED, edge e1,
-gimple *phi, tree arg0, tree arg1)
+gphi *phi, tree arg0, tree arg1)
 {
   if (!INTEGRAL_TYPE_P (TREE_TYPE (arg1)))
 return false;

Jakub



[PATCH] arm: Avoid emitting bogus CFA adjusts for CMSE nonsecure calls [PR99725]

2021-05-06 Thread Alex Coplan via Gcc-patches
Hi all,

The PR shows us attaching REG_CFA_ADJUST_CFA notes to stack pointer
adjustments emitted in cmse_nonsecure_call_inline_register_clear (when
-march=armv8.1-m.main). However, the stack pointer is not guaranteed to
be the CFA reg. If we're at -O0 or we have -fno-omit-frame-pointer, then
the frame pointer will be used as the CFA reg, and these notes on the sp
adjustments will lead to ICEs in dwarf2out_frame_debug_adjust_cfa.

This patch avoids emitting these notes if the current function has a
frame pointer.

Testing:
 * Bootstrapped and regtested on arm-linux-gnueabihf, no regressions.
 * Regtested an arm-eabi cross configured with --with-arch=armv8.1-m.main, no
   regressions.

OK for trunk and backports as appropriate?

Thanks,
Alex

gcc/ChangeLog:

PR target/99725
* config/arm/arm.c (cmse_nonsecure_call_inline_register_clear):
Avoid emitting CFA adjusts on the sp if we have the fp.

gcc/testsuite/ChangeLog:

PR target/99725
* gcc.target/arm/cmse/pr99725.c: New test.
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 0371d9818fd..2962071adfd 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -18774,10 +18774,14 @@ cmse_nonsecure_call_inline_register_clear (void)
  imm = gen_int_mode (- lazy_store_stack_frame_size, SImode);
  add_insn = emit_insn (gen_addsi3 (stack_pointer_rtx,
stack_pointer_rtx, imm));
- arm_add_cfa_adjust_cfa_note (add_insn,
-  - lazy_store_stack_frame_size,
-  stack_pointer_rtx,
-  stack_pointer_rtx);
+ /* If we have the frame pointer, then it will be the
+CFA reg.  Otherwise, the stack pointer is the CFA
+reg, so we need to emit a CFA adjust.  */
+ if (!frame_pointer_needed)
+   arm_add_cfa_adjust_cfa_note (add_insn,
+- lazy_store_stack_frame_size,
+stack_pointer_rtx,
+stack_pointer_rtx);
  emit_insn (gen_lazy_store_multiple_insn (stack_pointer_rtx));
}
  /* Save VFP callee-saved registers.  */
@@ -18815,10 +18819,11 @@ cmse_nonsecure_call_inline_register_clear (void)
  rtx_insn *add_insn =
emit_insn (gen_addsi3 (stack_pointer_rtx,
   stack_pointer_rtx, imm));
- arm_add_cfa_adjust_cfa_note (add_insn,
-  lazy_store_stack_frame_size,
-  stack_pointer_rtx,
-  stack_pointer_rtx);
+ if (!frame_pointer_needed)
+   arm_add_cfa_adjust_cfa_note (add_insn,
+lazy_store_stack_frame_size,
+stack_pointer_rtx,
+stack_pointer_rtx);
}
  /* Restore VFP callee-saved registers.  */
  else
diff --git a/gcc/testsuite/gcc.target/arm/cmse/pr99725.c 
b/gcc/testsuite/gcc.target/arm/cmse/pr99725.c
new file mode 100644
index 000..284da184f96
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/cmse/pr99725.c
@@ -0,0 +1,5 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-mcmse -g" } */
+typedef int __attribute__((cmse_nonsecure_call)) (*t)();
+t f;
+void g() { f(); }


Re: [PATCH] split loop for NE condition.

2021-05-06 Thread Richard Biener
On Thu, 6 May 2021, guojiufu wrote:

> On 2021-05-03 20:18, Richard Biener wrote:
> > On Thu, 29 Apr 2021, Jiufu Guo wrote:
> > 
> >> When there is the possibility that overflow may happen on the loop index,
> >> a few optimizations would not happen. For example code:
> >> 
> >> foo (int *a, int *b, unsigned k, unsigned n)
> >> {
> >>   while (++k != n)
> >> a[k] = b[k]  + 1;
> >> }
> >> 
> >> For this code, if "l > n", overflow may happen.  if "l < n" at begining,
> >> it could be optimized (e.g. vectorization).
> >> 
> >> We can split the loop into two loops:
> >> 
> >>   while (++k > n)
> >> a[k] = b[k]  + 1;
> >>   while (l++ < n)
> >> a[k] = b[k]  + 1;
> >> 
> >> then for the second loop, it could be optimized.
> >> 
> >> This patch is splitting this kind of small loop to achieve better
> >> performance.
> >> 
> >> Bootstrap and regtest pass on ppc64le.  Is this ok for trunk?
> > 
> > Do you have any statistics on how often this splits a loop during
> > bootstrap (use --with-build-config=bootstrap-O3)?  Or alternatively
> > on SPEC?
> 
> In SPEC2017, there are ~240 loops are split.  And I saw some performance
> improvement on xz.
> I would try bootstrap-O3 (encounter ICE).
> 
> > 
> > Actual comments on the patch inline.
> > 
> >> Thanks!
> >> 
> >> Jiufu Guo.
> >> 
> >> gcc/ChangeLog:
> >> 
> >> 2021-04-29  Jiufu Guo  
> >> 
> >>  * params.opt (max-insns-ne-cond-split): New.
> >>  * tree-ssa-loop-split.c (connect_loop_phis): Add new param.
> >>  (get_ne_cond_branch): New function.
> >>  (split_ne_loop): New function.
> >>  (split_loop_on_ne_cond): New function.
> >>  (tree_ssa_split_loops): Use split_loop_on_ne_cond.
> >> 
> >> gcc/testsuite/ChangeLog:
> >> 2021-04-29  Jiufu Guo  
> >> 
> >>  * gcc.dg/loop-split1.c: New test.
> >> 
> >> ---
> >>  gcc/params.opt |   4 +
> >>  gcc/testsuite/gcc.dg/loop-split1.c |  28 
> >>  gcc/tree-ssa-loop-split.c  | 219 
> >> -
> >>  3 files changed, 247 insertions(+), 4 deletions(-)
> >>  create mode 100644 gcc/testsuite/gcc.dg/loop-split1.c
> >> 
> >> diff --git a/gcc/params.opt b/gcc/params.opt
> >> index 2e4cbdd7a71..900b59b5136 100644
> >> --- a/gcc/params.opt
> >> +++ b/gcc/params.opt
> >> @@ -766,6 +766,10 @@ Min. ratio of insns to prefetches to enable
> >> prefetching for a loop with an unkno
> >> Common Joined UInteger Var(param_min_loop_cond_split_prob) Init(30)
> >> IntegerRange(0, 100) Param Optimization
> >> The minimum threshold for probability of semi-invariant condition statement
> >> to trigger loop split.
> >> 
> >> +-param=max-insns-ne-cond-split=
> >> +Common Joined UInteger Var(param_max_insn_ne_cond_split) Init(64) Param
> >> Optimization
> >> +The maximum threshold for insnstructions number of a loop with ne
> >> condition to split.
> >> +
> >>  -param=min-nondebug-insn-uid=
> >>  Common Joined UInteger Var(param_min_nondebug_insn_uid) Param
> >>  The minimum UID to be used for a nondebug insn.
> >> diff --git a/gcc/testsuite/gcc.dg/loop-split1.c
> >> b/gcc/testsuite/gcc.dg/loop-split1.c
> >> new file mode 100644
> >> index 000..4c466aa9f54
> >> --- /dev/null
> >> +++ b/gcc/testsuite/gcc.dg/loop-split1.c
> >> @@ -0,0 +1,28 @@
> >> +/* { dg-do compile } */
> >> +/* { dg-options "-O2 -fsplit-loops -fdump-tree-lsplit-details" } */
> >> +
> >> +void
> >> +foo (int *a, int *b, unsigned l, unsigned n)
> >> +{
> >> +  while (++l != n)
> >> +a[l] = b[l]  + 1;
> >> +}
> >> +
> >> +void
> >> +foo1 (int *a, int *b, unsigned l, unsigned n)
> >> +{
> >> +  while (l++ != n)
> >> +a[l] = b[l]  + 1;
> >> +}
> >> +
> >> +unsigned
> >> +foo2 (char *a, char *b, unsigned l, unsigned n)
> >> +{
> >> +  while (++l != n)
> >> +if (a[l] != b[l])
> >> +  break;
> >> +
> >> +  return l;
> >> +}
> >> +
> >> +/* { dg-final { scan-tree-dump-times "Loop split" 3 "lsplit" } } */
> >> diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
> >> index b80b6a75e62..a6d28078e5e 100644
> >> --- a/gcc/tree-ssa-loop-split.c
> >> +++ b/gcc/tree-ssa-loop-split.c
> >> @@ -41,6 +41,7 @@ along with GCC; see the file COPYING3.  If not see
> >>  #include "cfghooks.h"
> >>  #include "gimple-fold.h"
> >>  #include "gimplify-me.h"
> >> +#include "tree-ssa-loop-ivopts.h"
> >> 
> >>  /* This file implements two kinds of loop splitting.
> >> 
> >> @@ -233,7 +234,8 @@ easy_exit_values (class loop *loop)
> >> this.  The loops need to fulfill easy_exit_values().  */
> >> 
> >> static void
> >> -connect_loop_phis (class loop *loop1, class loop *loop2, edge new_e)
> >> +connect_loop_phis (class loop *loop1, class loop *loop2, edge new_e,
> >> + bool use_prev = false)
> >>  {
> >>basic_block rest = loop_preheader_edge (loop2)->src;
> >>gcc_assert (new_e->dest == rest);
> >> @@ -248,13 +250,14 @@ connect_loop_phis (class loop *loop1, class loop
> >> *loop2, edge new_e)
> >> !gsi_end_p (psi_first);
> >> gsi_next (_first), gsi_next (_second))
> >>  {
> 

Re: [PATCH] IBM Z: Fix error checking for builtin vec_permi

2021-05-06 Thread Andreas Krebbel via Gcc-patches
On 5/6/21 9:56 AM, Marius Hillenbrand wrote:
> Hi,
> 
> this patch fixes the check of immediate operands to the builtin vec_permi and
> adds a new test for this built-in.
> 
> Reg-rested and bootstrapped on s390x.
> 
> Is it OK for master? Is it OK for backporting to gcc-11?
> 
> Regards,
> Marius
> 
> 
> --8<--8<-8<-
> 
> The builtin vec_permi is peculiar in that its immediate operand is
> encoded differently than the immediate operand that is backing the
> builtin. This fixes the check for the immediate operand, adding a
> regression test in the process.
> 
> This partially reverts commit 3191c1f4488d1f7563b563d7ae2a102a26f16d82
> 
> gcc/ChangeLog:
> 
> 2021-05-04  Marius Hillenbrand  
> 
> * config/s390/s390-builtins.def (O_M5, O1_M5, ...): Remove unused 
> macros.
> (s390_vec_permi_s64, s390_vec_permi_b64, s390_vec_permi_u64)
> (s390_vec_permi_dbl, s390_vpdi): Use the O3_U2 type for the immediate
> operand.
>   * config/s390/s390.c (s390_const_operand_ok): Remove unused
>   values.
> 
> gcc/testsuite/ChangeLog:
> 
> * gcc.target/s390/zvector/imm-range-error-1.c: Fix test for
>   __builtin_s390_vpdi.
> * gcc.target/s390/zvector/vec-permi.c: New test for builtin
>   vec_permi.

Ok for mainline and GCC 11 branch. Thanks for the fix!

Andreas


RE: [GCC-10 backport][PATCH] arm: Fix testisms introduced with fix for pr target/95646.

2021-05-06 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Srinath Parvathaneni 
> Sent: 05 May 2021 14:32
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov ; Richard Earnshaw
> 
> Subject: [GCC-10 backport][PATCH] arm: Fix testisms introduced with fix for
> pr target/95646.
> 
> Hi,
> 
> This is a backport to gcc-10, cleanly applied on the branch.
> 
> This patch changes the test to use the effective-target machinery disables
> the
> error message "ARMv8-M Security Extensions incompatible with selected
> FPU" when
> -mfloat-abi=soft.
> Further changes 'asm' to '__asm__' to avoid failures with '-std=' options.
> 
> Regression tested on arm-none-eabi.
> 
> Is this Ok for GCC-10 branch?
> 

Ok.
Thanks,
Kyrill

> Regards,
> Srinath.
> 
> gcc/ChangeLog:
> 2020-07-06  Andre Vieira  
> 
>   * config/arm/arm.c (arm_options_perform_arch_sanity_checks): Do
> not
>   check +D32 for CMSE if -mfloat-abi=soft
> 
> gcc/testsuite/ChangeLog:
> 2020-07-06  Andre Vieira  
> 
>   * gcc.target/arm/pr95646.c: Fix testism.
> 
> (cherry picked from commit 80297f897758f59071968ddff2a04a8d11481117)
> 
> 
> ### Attachment also inlined for ease of reply
> ###
> 
> 
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index
> 6f4381fd6e959321d8d319fafdce4079c7b54e5f..c3bbd9fd5e177f07b37610df
> 57d4f02bd0402761 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -3850,7 +3850,7 @@ arm_options_perform_arch_sanity_checks (void)
> 
>/* We don't clear D16-D31 VFP registers for cmse_nonsecure_call
> functions
>   and ARMv8-M Baseline and Mainline do not allow such configuration.  */
> -  if (use_cmse && LAST_VFP_REGNUM > LAST_LO_VFP_REGNUM)
> +  if (use_cmse && TARGET_HARD_FLOAT && LAST_VFP_REGNUM >
> LAST_LO_VFP_REGNUM)
>  error ("ARMv8-M Security Extensions incompatible with selected FPU");
> 
> 
> diff --git a/gcc/testsuite/gcc.target/arm/pr95646.c
> b/gcc/testsuite/gcc.target/arm/pr95646.c
> index
> 12d06a0c8c1ed7de1f8d4d15130432259e613a32..cde1b2d9d36a4e39cd916f
> dcc9eef424a22bd589 100644
> --- a/gcc/testsuite/gcc.target/arm/pr95646.c
> +++ b/gcc/testsuite/gcc.target/arm/pr95646.c
> @@ -1,10 +1,7 @@
>  /* { dg-do compile } */
> -/* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { "-march=*" 
> } { "-
> march=armv8-m.base" } } */
> -/* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { "-mcpu=*" } 
> { "-
> mcpu=cortex-m23" } } */
> -/* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { "-mfpu=*" } 
> { } }
> */
> -/* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { 
> "-mfloat-abi=*" }
> { "-mfloat-abi=soft" } } */
> -/* { dg-options "-mcpu=cortex-m23 -mcmse" } */
> -/* { dg-additional-options "-Os" } */
> +/* { dg-require-effective-target arm_arch_v8m_base_ok } */
> +/* { dg-add-options arm_arch_v8m_base } */
> +/* { dg-additional-options "-mcmse -Os" } */
>  /* { dg-final { check-function-bodies "**" "" } } */
> 
>  int __attribute__ ((cmse_nonsecure_entry))
> @@ -27,6 +24,6 @@ foo (void)
>  int __attribute__ ((cmse_nonsecure_entry))
>  bar (void)
>  {
> -  asm ("": : : "r9");
> +  __asm__ ("" : : : "r9");
>return 1;
>  }



RE: [GCC-10 backport][PATCH] arm: PR target/95646: Do not clobber callee saved registers with CMSE.

2021-05-06 Thread Kyrylo Tkachov via Gcc-patches



> -Original Message-
> From: Srinath Parvathaneni 
> Sent: 05 May 2021 14:32
> To: gcc-patches@gcc.gnu.org
> Cc: Kyrylo Tkachov ; Richard Earnshaw
> 
> Subject: [GCC-10 backport][PATCH] arm: PR target/95646: Do not clobber
> callee saved registers with CMSE.
> 
> Hi,
> 
> This is a backport to gcc-10, cleanly applied on the branch.
> 
> As reported in bugzilla when the -mcmse option is used while compiling for
> size
> (-Os) with a thumb-1 target the generated code will clear the registers 
> r7-r10.
> These however are callee saved and should be preserved accross ABI
> boundaries.
> The reason this happens is because these registers are made "fixed" when
> optimising for size with Thumb-1 in a way to make sure they are not used, as
> pushing and popping hi-registers requires extra moves to and from LO_REGS.
> 
> To fix this, this patch uses 'callee_saved_reg_p', which accounts for this
> optimisation, instead of 'call_used_or_fixed_reg_p'. Be aware of
> 'callee_saved_reg_p''s definition, as it does still take call used registers
> into account, which aren't callee_saved in my opinion, so it is a rather
> misnoemer, works in our advantage here though as it does exactly what we
> need.
> 
> Regression tested on arm-none-eabi.
> 
> Is this Ok for GCC-10 branch?

Ok.
Thanks,
Kyrill

> 
> Regards,
> Srinath.
> 
> gcc/ChangeLog:
> 2020-06-19  Andre Vieira  
> 
>   PR target/95646
>   * config/arm/arm.c: (cmse_nonsecure_entry_clear_before_return):
> Use
>   'callee_saved_reg_p' instead of 'calL_used_or_fixed_reg_p'.
> 
> gcc/testsuite/ChangeLog:
> 2020-06-19  Andre Vieira  
> 
>   PR target/95646
>   * gcc.target/arm/pr95646.c: New test.
> 
> (cherry picked from commit 5f426554fd804d65509875d706d8b8bc3a48393b)
> 
> 
> ### Attachment also inlined for ease of reply
> ###
> 
> 
> diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
> index
> 781bcc8ca42e10524595cb6c90b61450a41f739e..6f4381fd6e959321d8d319f
> afdce4079c7b54e5f 100644
> --- a/gcc/config/arm/arm.c
> +++ b/gcc/config/arm/arm.c
> @@ -27011,7 +27011,7 @@ cmse_nonsecure_entry_clear_before_return
> (void)
>   continue;
>if (IN_RANGE (regno, IP_REGNUM, PC_REGNUM))
>   continue;
> -  if (call_used_or_fixed_reg_p (regno)
> +  if (!callee_saved_reg_p (regno)
> && (!IN_RANGE (regno, FIRST_VFP_REGNUM, LAST_VFP_REGNUM)
> || TARGET_HARD_FLOAT))
>   bitmap_set_bit (to_clear_bitmap, regno);
> diff --git a/gcc/testsuite/gcc.target/arm/pr95646.c
> b/gcc/testsuite/gcc.target/arm/pr95646.c
> new file mode 100644
> index
> ..12d06a0c8c1ed7de1f8d4d
> 15130432259e613a32
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arm/pr95646.c
> @@ -0,0 +1,32 @@
> +/* { dg-do compile } */
> +/* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { "-march=*" 
> } { "-
> march=armv8-m.base" } } */
> +/* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { "-mcpu=*" } 
> { "-
> mcpu=cortex-m23" } } */
> +/* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { "-mfpu=*" } 
> { } }
> */
> +/* { dg-skip-if "avoid conflicting multilib options" { *-*-* } { 
> "-mfloat-abi=*" }
> { "-mfloat-abi=soft" } } */
> +/* { dg-options "-mcpu=cortex-m23 -mcmse" } */
> +/* { dg-additional-options "-Os" } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> +
> +int __attribute__ ((cmse_nonsecure_entry))
> +foo (void)
> +{
> +  return 1;
> +}
> +/* { { dg-final { scan-assembler-not "mov\tr9, r0" } } */
> +
> +/*
> +** __acle_se_bar:
> +**   mov (r[0-3]), r9
> +**   push{\1}
> +** ...
> +**   pop {(r[0-3])}
> +**   mov r9, \2
> +** ...
> +**   bxnslr
> +*/
> +int __attribute__ ((cmse_nonsecure_entry))
> +bar (void)
> +{
> +  asm ("": : : "r9");
> +  return 1;
> +}



Re: [Patch, fortran] PRs 46691 and 99819: Assumed and explicit size class arrays

2021-05-06 Thread Paul Richard Thomas via Gcc-patches
Hi All,

Please find below a corrected ChangeLog.

Sorry that I didn't get it right first go.

Paul

Fortran: Assumed and explicit size class arrays [PR46691/99819].

2021-05-06  Paul Thomas  

gcc/fortran/ChangeLog

PR fortran/46691
PR fortran/99819
* class.c (gfc_build_class_symbol): Remove the error that
disables assumed size class arrays. Class array types that are
not deferred shape or assumed rank are given a unique name and
placed in the procedure namespace.
* trans-array.c (gfc_trans_g77_array): Obtain the data pointer
for class arrays.
(gfc_trans_dummy_array_bias): Suppress the runtime error for
extent violations in explicit shape class arrays because it
always fails.
* trans-expr.c (gfc_conv_procedure_call): Handle assumed size
class actual arguments passed to non-descriptor formal args by
using the data pointer, stored as the symbol's backend decl.

gcc/testsuite/ChangeLog

PR fortran/46691
PR fortran/99819
* gfortran.dg/class_dummy_6.f90: New test.
* gfortran.dg/class_dummy_7.f90: New test.


On Thu, 6 May 2021 at 07:57, Paul Richard Thomas <
paul.richard.tho...@gmail.com> wrote:

> Hi All,
>
> Although I had undertaken to concentrate on PDTs, PR99819 so intrigued me
> that I became locked into it :-( After extensive, fruitless rummaging
> through decl.c and trans-decl.c, I realised that the problem was far
> simpler than it seemed and that it lay in class.c. After that PR was fixed,
> PR46691 was a trivial follow up.
>
> The comments in the patch explain the fixes. I left a TODO for the extent
> checking of assumed size class arrays. I will try to fix it before pushing.
>
> Regtested on FC33/x86_64 and checked against the 'other brand'. OK for
> 12-branch and, perhaps, 11-branch?
>
> Regards
>
> Paul
>
> Fortran: Assumed and explicit size class arrays [PR46691/99819].
>
> 2021-05-06  Paul Thomas  
>
> gcc/fortran/ChangeLog
>
> PR fortran/46691
> PR fortran/99819
> * class.c (gfc_build_class_symbol): Class array types that are
> not deferred shape or assumed rank are given a unique name and
> placed in the procedure namespace.
> * trans-array.c (gfc_trans_g77_array): Obtain the data pointer
> for class arrays.
> (gfc_trans_dummy_array_bias): Suppress the runtime error for
> extent violations in explicit shape class arrays because it
> always fails.
> * trans-expr.c (gfc_conv_procedure_call): Handle assumed size
> class actual arguments passed to non-descriptor formal args by
> using the data pointer, stored as the symbol's backend decl.
>
> gcc/testsuite/ChangeLog
>
> PR fortran/46691
> PR fortran/99819
> * gfortran.dg/class_dummy_6.f90: New test.
> * gfortran.dg/class_dummy_6.f90: New test.
>
>

-- 
"If you can't explain it simply, you don't understand it well enough" -
Albert Einstein


[PATCH] ipa/100373 - fix emutls lowering compare-debug issue

2021-05-06 Thread Richard Biener
emutls figured that tls uses in debug-insns need lowering but
that obviously has effects on code-generation as can be seen
in the following IL diff with the new testcase:

[local count: 1073741824]:
-  a = 0;
+  # DEBUG BEGIN_STMT
   _4 = __builtin___emutls_get_address (&__emutls_v.b);
+  # DEBUG D#1 => *_4
+  # DEBUG d => (long int) D#1
+  # DEBUG BEGIN_STMT
+  a = 0;
+  # DEBUG BEGIN_STMT
   *_4 = 0;
   return;

where it figured the debug use of b in the original

   [local count: 1073741824]:
  # DEBUG BEGIN_STMT
  # DEBUG D#1 => b
  # DEBUG d => (long int) D#1
  # DEBUG BEGIN_STMT
  a = 0;

needs lowering (it maybe does when we want to produce perfect
debug but that's just bad luck).

The following patch fixes this by avoiding to create a new
emutls address when visiting debug stmts and instead resets them.
Another option might be to simply not lower debug stmt uses
but I have no way to verify actual debug info for this.

Bootstrapped and tested on x86_64-unknown-linux-gnu, Iain reports
this fixes the bootstrap-debug miscompare on darwin and also
passes testing there.

Pushed.

2021-05-05  Richard Biener  

PR ipa/100373
* tree-emutls.c (gen_emutls_addr): Pass in whether we're
dealing with a debug use and only query existing addresses
if so.
(lower_emutls_1): Avoid splitting out addresses for debug
stmts, reset the debug stmt when we fail to find existing
lowered addresses.
(lower_emutls_phi_arg): Set wi.stmt.

* gcc.dg/pr100373.c: New testcase.
---
 gcc/testsuite/gcc.dg/pr100373.c | 11 +++
 gcc/tree-emutls.c   | 17 +
 2 files changed, 24 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr100373.c

diff --git a/gcc/testsuite/gcc.dg/pr100373.c b/gcc/testsuite/gcc.dg/pr100373.c
new file mode 100644
index 000..d4cd52a95de
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr100373.c
@@ -0,0 +1,11 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fcompare-debug" } */
+
+int a;
+_Thread_local int b;
+void c()
+{
+  long d = b;
+  a = 0;
+  b = 0;
+}
diff --git a/gcc/tree-emutls.c b/gcc/tree-emutls.c
index 1c9c5d5aee1..92cb6194f21 100644
--- a/gcc/tree-emutls.c
+++ b/gcc/tree-emutls.c
@@ -394,13 +394,13 @@ struct lower_emutls_data
Append any new computation statements required to D->SEQ.  */
 
 static tree
-gen_emutls_addr (tree decl, struct lower_emutls_data *d)
+gen_emutls_addr (tree decl, struct lower_emutls_data *d, bool for_debug)
 {
   /* Compute the address of the TLS variable with help from runtime.  */
   tls_var_data *data = tls_map->get (varpool_node::get (decl));
   tree addr = data->access;
 
-  if (addr == NULL)
+  if (addr == NULL && !for_debug)
 {
   varpool_node *cvar;
   tree cdecl;
@@ -480,7 +480,7 @@ lower_emutls_1 (tree *ptr, int *walk_subtrees, void 
*cb_data)
*ptr = t = unshare_expr (t);
 
  /* If we're allowed more than just is_gimple_val, continue.  */
- if (!wi->val_only)
+ if (!wi->val_only || is_gimple_debug (wi->stmt))
{
  *walk_subtrees = 1;
  return NULL_TREE;
@@ -536,7 +536,15 @@ lower_emutls_1 (tree *ptr, int *walk_subtrees, void 
*cb_data)
   return NULL_TREE;
 }
 
-  addr = gen_emutls_addr (t, d);
+  addr = gen_emutls_addr (t, d, is_gimple_debug (wi->stmt));
+  if (!addr)
+{
+  gimple_debug_bind_reset_value (wi->stmt);
+  update_stmt (wi->stmt);
+  wi->changed = false;
+  /* Stop walking operands.  */
+  return error_mark_node;
+}
   if (is_addr)
 {
   /* Replace "" with "addr" in the statement.  */
@@ -590,6 +598,7 @@ lower_emutls_phi_arg (gphi *phi, unsigned int i,
   memset (, 0, sizeof (wi));
   wi.info = d;
   wi.val_only = true;
+  wi.stmt = phi;
   walk_tree (>def, lower_emutls_1, , NULL);
 
   /* For normal statements, we let update_stmt do its job.  But for phi
-- 
2.26.2


[PATCH] Fix IPA SRA removal of DECL_BY_REFERENCE return

2021-05-06 Thread Richard Biener
While doing bogus call LHS removal I noticed that cloning with
dropping a return value creates a bogus replacement for a
DECL_BY_REFERENCE DECL_RESULT, resulting in MEM_REFs of
aggregates rather than pointers.  The following fixes this
latent issue.

I don't have a testcase that ICEs without doing a bogus DSE
transform though.

Bootstrap & regtest running on x86_64-unknown-linux-gnu.

2021-05-06  Richard Biener  

* tree-inline.c (tree_function_versioning): Fix DECL_BY_REFERENCE
return variable removal.
---
 gcc/tree-inline.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/gcc/tree-inline.c b/gcc/tree-inline.c
index 49a5850f410..8f945b88c12 100644
--- a/gcc/tree-inline.c
+++ b/gcc/tree-inline.c
@@ -6367,6 +6367,8 @@ tree_function_versioning (tree old_decl, tree new_decl,
   tree resdecl_repl = copy_result_decl_to_var (DECL_RESULT (old_decl),
   );
   declare_inline_vars (NULL, resdecl_repl);
+  if (DECL_BY_REFERENCE (DECL_RESULT (old_decl)))
+   resdecl_repl = build_fold_addr_expr (resdecl_repl);
   insert_decl_map (, DECL_RESULT (old_decl), resdecl_repl);
 
   DECL_RESULT (new_decl)
-- 
2.26.2


[Ada] ACATS 4.1R-c611a04: Class-wide preconditions in dispatching calls

2021-05-06 Thread Pierre-Marie de Rodat
This patch is a partial implementation of the semantics mandated in
AI12-0195 concerning class-wide preconditions on dispatching calls: the
precondition that applies is that of the denoted subprogram entity, not
that of the body that is actually executed.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* exp_disp.adb (Build_Class_Wide_Check): Extending the
functionality of this routine to climb to the ancestors
searching for the enclosing overridden dispatching primitive
that has a class-wide precondition to generate the check.diff --git a/gcc/ada/exp_disp.adb b/gcc/ada/exp_disp.adb
--- a/gcc/ada/exp_disp.adb
+++ b/gcc/ada/exp_disp.adb
@@ -709,10 +709,13 @@ package body Exp_Disp is
   Eq_Prim_Op  : Entity_Id := Empty;
   Controlling_Tag : Node_Id;
 
-  procedure Build_Class_Wide_Check;
+  procedure Build_Class_Wide_Check (E : Entity_Id);
   --  If the denoted subprogram has a class-wide precondition, generate a
   --  check using that precondition before the dispatching call, because
-  --  this is the only class-wide precondition that applies to the call.
+  --  this is the only class-wide precondition that applies to the call;
+  --  otherwise climb to the ancestors searching for the enclosing
+  --  overridden primitive of E that has a class-wide precondition (and
+  --  use it to generate the check).
 
   function New_Value (From : Node_Id) return Node_Id;
   --  From is the original Expression. New_Value is equivalent to a call
@@ -723,7 +726,14 @@ package body Exp_Disp is
   -- Build_Class_Wide_Check --
   
 
-  procedure Build_Class_Wide_Check is
+  procedure Build_Class_Wide_Check (E : Entity_Id) is
+ Subp : Entity_Id := E;
+
+ function Has_Class_Wide_Precondition
+   (Subp : Entity_Id) return Boolean;
+ --  Evaluates if the dispatching subprogram Subp has a class-wide
+ --  precondition.
+
  function Replace_Formals (N : Node_Id) return Traverse_Result;
  --  Replace occurrences of the formals of the subprogram by the
  --  corresponding actuals in the call, given that this check is
@@ -735,6 +745,32 @@ package body Exp_Disp is
  --  has not been analyzed yet, in which case we use the Chars
  --  field to recognize intended occurrences of the formals.
 
+ -
+ -- Has_Class_Wide_Precondition --
+ -
+
+ function Has_Class_Wide_Precondition
+   (Subp : Entity_Id) return Boolean
+ is
+Prec : Node_Id := Empty;
+
+ begin
+if Present (Contract (Subp))
+  and then Present (Pre_Post_Conditions (Contract (Subp)))
+then
+   Prec := Pre_Post_Conditions (Contract (Subp));
+
+   while Present (Prec) loop
+  exit when Pragma_Name (Prec) = Name_Precondition
+and then Class_Present (Prec);
+  Prec := Next_Pragma (Prec);
+   end loop;
+end if;
+
+return Present (Prec)
+  and then not Is_Ignored (Prec);
+ end Has_Class_Wide_Precondition;
+
  -
  -- Replace_Formals --
  -
@@ -750,27 +786,46 @@ package body Exp_Disp is
if Present (Entity (N)) and then Is_Formal (Entity (N)) then
   while Present (F) loop
  if F = Entity (N) then
-Rewrite (N, New_Copy_Tree (A));
-
---  If the formal is class-wide, and thus not a
---  controlling argument, preserve its type because
---  it may appear in a nested call with a class-wide
---  parameter.
+if not Is_Controlling_Actual (N) then
+   Rewrite (N, New_Copy_Tree (A));
+
+   --  If the formal is class-wide, and thus not a
+   --  controlling argument, preserve its type because
+   --  it may appear in a nested call with a class-wide
+   --  parameter.
+
+   if Is_Class_Wide_Type (Etype (F)) then
+  Set_Etype (N, Etype (F));
+
+   --  Conversely, if this is a controlling argument
+   --  (in a dispatching call in the condition) that
+   --  is a dereference, the source is an access-to-
+   --  -class-wide type, so preserve the dispatching
+   --  nature of the call in the rewritten condition.
+
+   elsif Nkind (Parent (N)) = N_Explicit_Dereference
+ and then 

[Ada] Avoid repeated analysis of constraint ranges

2021-05-06 Thread Pierre-Marie de Rodat
Don't call Analyse just before Process_Range_Expr_In_Decl, because the
latter starts with a call to Analyze_And_Resolve anyway.

This fixes a violation with the required sequencing in resolution of
overloaded nodes that requires the Etype being Any_Type and
Is_Overloading being False before calling Add_One_Interp for multiple
interpretations.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* sem_ch3.adb (Constraint_Index): Remove redundant problematic
analysis.diff --git a/gcc/ada/sem_ch3.adb b/gcc/ada/sem_ch3.adb
--- a/gcc/ada/sem_ch3.adb
+++ b/gcc/ada/sem_ch3.adb
@@ -14171,8 +14171,6 @@ package body Sem_Ch3 is
   then
  --  A Range attribute will be transformed into N_Range by Resolve
 
- Analyze (S);
- Set_Etype (S, T);
  R := S;
 
  Process_Range_Expr_In_Decl (R, T);




[Ada] Bad expansion with -gnato2 and if expression

2021-05-06 Thread Pierre-Marie de Rodat
When dealing with -gnato2 inside an if expression, we might drop on the
floor the If or Else_Nodes.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* exp_ch4.adb (Expand_N_If_Expression):
Apply_Arithmetic_Overflow_Check will not deal with
Then/Else_Actions so skip minimizing overflow checks if any
actions are present.diff --git a/gcc/ada/exp_ch4.adb b/gcc/ada/exp_ch4.adb
--- a/gcc/ada/exp_ch4.adb
+++ b/gcc/ada/exp_ch4.adb
@@ -5915,9 +5915,14 @@ package body Exp_Ch4 is
--  Start of processing for Expand_N_If_Expression
 
begin
-  --  Check for MINIMIZED/ELIMINATED overflow mode
+  --  Check for MINIMIZED/ELIMINATED overflow mode.
+  --  Apply_Arithmetic_Overflow_Check will not deal with Then/Else_Actions
+  --  so skip this step if any actions are present.
 
-  if Minimized_Eliminated_Overflow_Check (N) then
+  if Minimized_Eliminated_Overflow_Check (N)
+and then No (Then_Actions (N))
+and then No (Else_Actions (N))
+  then
  Apply_Arithmetic_Overflow_Check (N);
  return;
   end if;




[Ada] AI12-0411: Add "bool" to Interfaces.C

2021-05-06 Thread Pierre-Marie de Rodat
This AI adds in Interfaces.C a binding to the C type _Bool/bool.  Note
that this AI makes ambiguous the construct pragma Assert (False) when
using Interfaces.C. To hopefully provide slightly better backward
compatibility, we make Interfaces.C.Extensions.bool a renaming of
Interfaces.C.bool so that the two types are compatible.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* libgnat/i-c.ads (bool): New type.
* libgnat/i-cexten.ads, libgnat/i-cexten__128.ads (bool): Now
a subtype of Interfaces.C.bool.
* libgnarl/s-interr__vxworks.adb (Interrupt_Manager): Qualify
False.
* libgnarl/s-interr.adb, libgnarl/s-interr__hwint.adb,
libgnarl/s-tasini.adb, libgnarl/s-tasren.adb,
libgnarl/s-tassta.adb, libgnarl/s-tpobmu.adb,
libgnarl/s-tpobop.adb, libgnarl/s-tpopmo.adb: Replace
Assert (False) by Assert (Standard.False).diff --git a/gcc/ada/libgnarl/s-interr.adb b/gcc/ada/libgnarl/s-interr.adb
--- a/gcc/ada/libgnarl/s-interr.adb
+++ b/gcc/ada/libgnarl/s-interr.adb
@@ -783,7 +783,7 @@ package body System.Interrupts is
   null;
 
when others =>
-  pragma Assert (False);
+  pragma Assert (Standard.False);
   null;
 end case;
 
@@ -1228,7 +1228,7 @@ package body System.Interrupts is
 when X : others =>
System.IO.Put_Line ("Exception in Interrupt_Manager");
System.IO.Put_Line (Ada.Exceptions.Exception_Information (X));
-   pragma Assert (False);
+   pragma Assert (Standard.False);
  end;
   end loop;
end Interrupt_Manager;


diff --git a/gcc/ada/libgnarl/s-interr__hwint.adb b/gcc/ada/libgnarl/s-interr__hwint.adb
--- a/gcc/ada/libgnarl/s-interr__hwint.adb
+++ b/gcc/ada/libgnarl/s-interr__hwint.adb
@@ -1023,7 +1023,7 @@ package body System.Interrupts is
null;
 
 when others =>
-   pragma Assert (False);
+   pragma Assert (Standard.False);
null;
  end;
   end loop;


diff --git a/gcc/ada/libgnarl/s-interr__vxworks.adb b/gcc/ada/libgnarl/s-interr__vxworks.adb
--- a/gcc/ada/libgnarl/s-interr__vxworks.adb
+++ b/gcc/ada/libgnarl/s-interr__vxworks.adb
@@ -1040,7 +1040,7 @@ package body System.Interrupts is
null;
 
 when others =>
-   pragma Assert (False);
+   pragma Assert (Standard.False);
null;
  end;
   end loop;


diff --git a/gcc/ada/libgnarl/s-tasini.adb b/gcc/ada/libgnarl/s-tasini.adb
--- a/gcc/ada/libgnarl/s-tasini.adb
+++ b/gcc/ada/libgnarl/s-tasini.adb
@@ -420,7 +420,7 @@ package body System.Tasking.Initialization is
 when Terminated
| Unactivated
 =>
-   pragma Assert (False);
+   pragma Assert (Standard.False);
null;
 
 when Activating
@@ -535,7 +535,7 @@ package body System.Tasking.Initialization is
  C := C.Common.All_Tasks_Link;
   end loop;
 
-  pragma Assert (False);
+  pragma Assert (Standard.False);
end Remove_From_All_Tasks_List;
 
---


diff --git a/gcc/ada/libgnarl/s-tasren.adb b/gcc/ada/libgnarl/s-tasren.adb
--- a/gcc/ada/libgnarl/s-tasren.adb
+++ b/gcc/ada/libgnarl/s-tasren.adb
@@ -165,7 +165,7 @@ package body System.Tasking.Rendezvous is
 
  --  Should never get here ???
 
- pragma Assert (False);
+ pragma Assert (Standard.False);
  raise Standard'Abort_Signal;
   end if;
 
@@ -236,7 +236,7 @@ package body System.Tasking.Rendezvous is
 
  --  Should never get here ???
 
- pragma Assert (False);
+ pragma Assert (Standard.False);
  raise Standard'Abort_Signal;
   end if;
 
@@ -646,7 +646,7 @@ package body System.Tasking.Rendezvous is
 
  --  Should never get here ???
 
- pragma Assert (False);
+ pragma Assert (Standard.False);
  raise Standard'Abort_Signal;
   end if;
 
@@ -1251,7 +1251,7 @@ package body System.Tasking.Rendezvous is
 
  --  Should never get here ???
 
- pragma Assert (False);
+ pragma Assert (Standard.False);
  raise Standard'Abort_Signal;
   end if;
 
@@ -1400,7 +1400,7 @@ package body System.Tasking.Rendezvous is
 
 --  Should never get here
 
-pragma Assert (False);
+pragma Assert (Standard.False);
 null;
   end case;
 


diff --git a/gcc/ada/libgnarl/s-tassta.adb b/gcc/ada/libgnarl/s-tassta.adb
--- a/gcc/ada/libgnarl/s-tassta.adb
+++ b/gcc/ada/libgnarl/s-tassta.adb
@@ -578,7 +578,7 @@ package body System.Tasking.Stages is
 
  --  ??? Should never get here
 
- pragma Assert (False);
+ pragma Assert (Standard.False);
  raise Standard'Abort_Signal;
   end if;
 


diff --git a/gcc/ada/libgnarl/s-tpobmu.adb 

[Ada] In CodePeer mode, use regular-exception handling

2021-05-06 Thread Pierre-Marie de Rodat
After a change done for GNATProve in 2016, CodePeer_Mode was not using
the regular exception mechanism. This resulted in imprecise control-flow
graphs in some cases. Revert that change for CodePeer. And, since for
GNATprove_Mode Operating_Mode is never Generate_Code, we can simplify
the code a bit.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* gnat1drv.adb (Adjust_Global_Switches): Simplify logic.diff --git a/gcc/ada/gnat1drv.adb b/gcc/ada/gnat1drv.adb
--- a/gcc/ada/gnat1drv.adb
+++ b/gcc/ada/gnat1drv.adb
@@ -633,13 +633,9 @@ procedure Gnat1drv is
   end if;
 
   --  Set and check exception mechanism. This is only meaningful when
-  --  compiling, and in particular not meaningful for special modes used
-  --  for program analysis rather than compilation: CodePeer mode and
-  --  GNATprove mode.
+  --  generating code.
 
-  if Operating_Mode = Generate_Code
-and then not (CodePeer_Mode or GNATprove_Mode)
-  then
+  if Operating_Mode = Generate_Code then
  case Targparm.Frontend_Exceptions_On_Target is
 when True =>
case Targparm.ZCX_By_Default_On_Target is




[Ada] Missing semantic error on ineffective Others_Clause

2021-05-06 Thread Pierre-Marie de Rodat
Compiler fails to reject an Others_Clause in an aggregate for a
constrained array type when previous components of the aggregate cover
the full index range of the array subtype, and the expression in the
Others_Clause has a type incompatible with the component type of the
array. The Others_Clause does not generate any code but the construct is
illegal. The error was previously reported only in -gnatc mode.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* exp_aggr.adb (Expand_Array_Aggregate): If the expression in an
Others_Clause has not been analyzed because previous analysis of
the enclosing aggregate showed the clause to be ineffective i.e.
cover a null range, analyze it now to detect a possible type
illegality.diff --git a/gcc/ada/exp_aggr.adb b/gcc/ada/exp_aggr.adb
--- a/gcc/ada/exp_aggr.adb
+++ b/gcc/ada/exp_aggr.adb
@@ -5958,6 +5958,21 @@ package body Exp_Aggr is
 
 if Nkind (First (Choice_List (Assoc))) = N_Others_Choice then
Others_Present (Dim) := True;
+
+   --  An others_clause may be superfluous if previous components
+   --  cover the full given range of a constrained array. In such
+   --  a case an others_clause does not contribute any additional
+   --  components and has not been analyzed. We analyze it now to
+   --  detect type errors in the expression, even though no code
+   --  will be generated for it.
+
+   if Dim = Aggr_Dimension
+ and then Nkind (Assoc) /= N_Iterated_Component_Association
+ and then not Analyzed (Expression (Assoc))
+ and then not Box_Present (Assoc)
+   then
+  Preanalyze_And_Resolve (Expression (Assoc), Ctyp);
+   end if;
 end if;
  end if;
 




[Ada] Assert failure on pragma Inline in procedure body

2021-05-06 Thread Pierre-Marie de Rodat
Is_Inline_Pragma isn't always dealing properly with N not being a list
member and Spec_Id being null. This is now fixed.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* sem_ch6.adb (Is_Inline_Pragma): Protect against N not being a
list member in both branches.diff --git a/gcc/ada/sem_ch6.adb b/gcc/ada/sem_ch6.adb
--- a/gcc/ada/sem_ch6.adb
+++ b/gcc/ada/sem_ch6.adb
@@ -3416,15 +3416,13 @@ package body Sem_Ch6 is
 Prag := Empty;
  end if;
 
- if Present (Prag) then
+ if Present (Prag) and then Is_List_Member (N) then
 if Present (Spec_Id) then
-   if Is_List_Member (N)
- and then Is_List_Member (Unit_Declaration_Node (Spec_Id))
+   if Is_List_Member (Unit_Declaration_Node (Spec_Id))
  and then In_Same_List (N, Unit_Declaration_Node (Spec_Id))
then
   Analyze (Prag);
end if;
-
 else
--  Create a subprogram declaration, to make treatment uniform.
--  Make the sloc of the subprogram name that of the entity in




[Ada] Do not second-guess the hardware for underflow handling of Scaling

2021-05-06 Thread Pierre-Marie de Rodat
This defers to the hardware implementation for the handling of underflow
in the Scaling routine also in the case where denormals are not supported.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* libgnat/s-fatgen.adb (Scaling): Use single handling of
underflow.  Add pragma Annotate.diff --git a/gcc/ada/libgnat/s-fatgen.adb b/gcc/ada/libgnat/s-fatgen.adb
--- a/gcc/ada/libgnat/s-fatgen.adb
+++ b/gcc/ada/libgnat/s-fatgen.adb
@@ -781,11 +781,9 @@ package body System.Fat_Gen is
  --  Check for underflow
 
  elsif Adjustment < IEEE_Emin - Exp then
---  Check for gradual underflow
+--  Check for possibly gradual underflow (up to the hardware)
 
-if T'Denorm
-  and then Adjustment >= IEEE_Emin - Mantissa - Exp
-then
+if Adjustment >= IEEE_Emin - Mantissa - Exp then
Expf := IEEE_Emin;
Expi := Exp + Adjustment - Expf;
 
@@ -810,6 +808,9 @@ package body System.Fat_Gen is
 --  Given that Expi >= -Mantissa, only -64 is problematic
 
 if Expi = -64 then
+   pragma Annotate
+ (CodePeer, Intentional, "test always false",
+  "test always false in some instantiations");
XX := XX / 2.0;
Expi := -63;
 end if;




[Ada] Set Raises_CE flag only in Apply_Compile_Time_Constraint_Error

2021-05-06 Thread Pierre-Marie de Rodat
Routine Apply_Compile_Time_Constraint_Error, when operating in GNAT
mode, always sets Raises_Constraint_Error flag, so there is no need to
follow it with calls to Set_Raises_Constraint_Error. These calls only
had an effect in GNATprove mode, but they were following few calls to
Apply_Compile_Time_Constraint_Error.

Now the call to Set_Raises_Constraint_Error is executed exclusively as
part of Apply_Compile_Time_Constraint_Error, both in GNAT and GNATprove
modes.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* sem_ch4.adb (Analyze_Selected_Component): Remove explicit call
to Set_Raises_Constraint_Error on statically missing component.
* sem_eval.adb (Eval_Arithmetic_Op): Likewise for static
divisions by integer and real zeros.
* sem_util.adb (Apply_Compile_Time_Constraint_Error): Call
Set_Raises_Constraint_Error before exiting early in GNATprove
mode.diff --git a/gcc/ada/sem_ch4.adb b/gcc/ada/sem_ch4.adb
--- a/gcc/ada/sem_ch4.adb
+++ b/gcc/ada/sem_ch4.adb
@@ -5452,8 +5452,6 @@ package body Sem_Ch4 is
(N, "component not present in }??",
 CE_Discriminant_Check_Failed,
 Ent => Prefix_Type);
-
- Set_Raises_Constraint_Error (N);
  return;
   end if;
 


diff --git a/gcc/ada/sem_eval.adb b/gcc/ada/sem_eval.adb
--- a/gcc/ada/sem_eval.adb
+++ b/gcc/ada/sem_eval.adb
@@ -2124,7 +2124,6 @@ package body Sem_Eval is
  Apply_Compile_Time_Constraint_Error
(N, "division by zero", CE_Divide_By_Zero,
 Warn => not Stat or SPARK_Mode = On);
- Set_Raises_Constraint_Error (N);
  return;
 
   --  Otherwise we can do the division
@@ -2226,7 +2225,6 @@ package body Sem_Eval is
if UR_Is_Zero (Right_Real) then
   Apply_Compile_Time_Constraint_Error
 (N, "division by zero", CE_Divide_By_Zero);
-  Set_Raises_Constraint_Error (N);
   return;
end if;
 


diff --git a/gcc/ada/sem_util.adb b/gcc/ada/sem_util.adb
--- a/gcc/ada/sem_util.adb
+++ b/gcc/ada/sem_util.adb
@@ -1436,6 +1436,7 @@ package body Sem_Util is
   --  generate a check message.
 
   if GNATprove_Mode then
+ Set_Raises_Constraint_Error (N);
  return;
   end if;
 




[Ada] Make Is_OK_Static_Subtype use Is_Static_Subtype

2021-05-06 Thread Pierre-Marie de Rodat
The reason for this change is two-fold:
- There is code duplication between Is_OK_Static_Subtype and
  Is_Static_Subtype
- Is_Static_Subtype is more correct than Is_OK_Static_Subtype (e.g. the
  dynamic predicate checks are more complete in Is_Static_Subtype).

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* sem_eval.adb (Is_OK_Static_Subtype): Call Is_Static_Subtype,
remove redundant checks.diff --git a/gcc/ada/sem_eval.adb b/gcc/ada/sem_eval.adb
--- a/gcc/ada/sem_eval.adb
+++ b/gcc/ada/sem_eval.adb
@@ -5549,23 +5549,16 @@ package body Sem_Eval is
  return False;
   end if;
 
-  Anc_Subt := Ancestor_Subtype (Typ);
-
-  if Anc_Subt = Empty then
- Anc_Subt := Base_T;
-  end if;
+  --  Then, check if the subtype is strictly static. This takes care of
+  --  checking for generics and predicates.
 
-  if Is_Generic_Type (Root_Type (Base_T))
-or else Is_Generic_Actual_Type (Base_T)
-  then
- return False;
-
-  elsif Has_Dynamic_Predicate_Aspect (Typ) then
+  if not Is_Static_Subtype (Typ) then
  return False;
+  end if;
 
   --  String types
 
-  elsif Is_String_Type (Typ) then
+  if Is_String_Type (Typ) then
  return
Ekind (Typ) = E_String_Literal_Subtype
  or else
@@ -5579,6 +5572,12 @@ package body Sem_Eval is
 return True;
 
  else
+Anc_Subt := Ancestor_Subtype (Typ);
+
+if No (Anc_Subt) then
+   Anc_Subt := Base_T;
+end if;
+
 --  Scalar_Range (Typ) might be an N_Subtype_Indication, so use
 --  Get_Type_{Low,High}_Bound.
 




[Ada] Reset x87 FPU to 64-bit precision for floating-point I/O on Linux

2021-05-06 Thread Pierre-Marie de Rodat
This is basically done on all x86/x86-64 native systems except for Linux
and makes it possible to work around other precision settings, e.g Java's.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* init.c (__gnat_init_float): Use full version on Linux too.diff --git a/gcc/ada/init.c b/gcc/ada/init.c
--- a/gcc/ada/init.c
+++ b/gcc/ada/init.c
@@ -2749,11 +2749,7 @@ __gnat_install_handler (void)
 /* __gnat_init_float */
 /*/
 
-/* This routine is called as each process thread is created, for possible
-   initialization of the FP processor.  This version is used under INTERIX
-   and WIN32.  */
-
-#if defined (_WIN32) || defined (__INTERIX) \
+#if defined (_WIN32) || defined (__INTERIX) || defined (__linux__) \
   || defined (__Lynx__) || defined(__NetBSD__) || defined(__FreeBSD__) \
   || defined (__OpenBSD__) || defined (__DragonFly__) || defined(__QNX__)
 
@@ -2763,13 +2759,10 @@ void
 __gnat_init_float (void)
 {
 #if defined (__i386__) || defined (__x86_64__)
-
-  /* This is used to properly initialize the FPU on an x86 for each
- process thread.  */
-
+  /* This is used to properly initialize the FPU to 64-bit precision on an x86
+ for each process thread and also for floating-point I/O.  */
   asm ("finit");
-
-#endif  /* Defined __i386__ */
+#endif
 }
 #endif
 




[Ada] Fix off-by-one bug in underflow handling of Scaling

2021-05-06 Thread Pierre-Marie de Rodat
Gradual underflow needs to be used for one more exponent.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* libgnat/s-fatgen.adb (Scaling): Fix off-by-one bug for underflow.diff --git a/gcc/ada/libgnat/s-fatgen.adb b/gcc/ada/libgnat/s-fatgen.adb
--- a/gcc/ada/libgnat/s-fatgen.adb
+++ b/gcc/ada/libgnat/s-fatgen.adb
@@ -784,7 +784,7 @@ package body System.Fat_Gen is
 --  Check for gradual underflow
 
 if T'Denorm
-  and then Adjustment >= IEEE_Emin - (Mantissa - 1) - Exp
+  and then Adjustment >= IEEE_Emin - Mantissa - Exp
 then
Expf := IEEE_Emin;
Expi := Exp + Adjustment - Expf;
@@ -807,6 +807,13 @@ package body System.Fat_Gen is
 Float_Word (IEEE_Ebias + Expf) * Exp_Factor;
 
  if Expi < 0 then
+--  Given that Expi >= -Mantissa, only -64 is problematic
+
+if Expi = -64 then
+   XX := XX / 2.0;
+   Expi := -63;
+end if;
+
 XX := XX / T (UST.Long_Long_Unsigned (2) ** (-Expi));
  end if;
 




[Ada] Spurious constraint error on conversion of access types

2021-05-06 Thread Pierre-Marie de Rodat
This patch fixes an error in the compiler whereby a spurious constraint
error is raised at runtime on type conversions between
access-to-discriminanted types when the object being converted is null.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* checks.adb (Make_Discriminant_Constraint_Check): Add check for
null when the type being converted is an access type.diff --git a/gcc/ada/checks.adb b/gcc/ada/checks.adb
--- a/gcc/ada/checks.adb
+++ b/gcc/ada/checks.adb
@@ -3658,6 +3658,20 @@ package body Checks is
  Cond := Build_Discriminant_Checks (Expr, Expr_Type);
  Set_Discriminant_Constraint (Expr_Type, Old_Constraints);
 
+ --  Conversion between access types requires that we check for null
+ --  before checking discriminants.
+
+ if Is_Access_Type (Etype (Expr)) then
+Cond := Make_And_Then (Loc,
+  Left_Opnd  =>
+Make_Op_Ne (Loc,
+  Left_Opnd  =>
+Duplicate_Subexpr_No_Checks
+  (Expr, Name_Req => True),
+  Right_Opnd => Make_Null (Loc)),
+  Right_Opnd => Cond);
+ end if;
+
  Insert_Action (N,
Make_Raise_Constraint_Error (Loc,
  Condition => Cond,




[Ada] Make new implementation of System.Fat_Gen.Valid more robust

2021-05-06 Thread Pierre-Marie de Rodat
The comparison of any denormalized number with 0.0 may return True on
hardware not supporting denormals, so we need to do a bit comparison.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* libgnat/s-fatgen.adb (Valid): Do a bit comparison with 0.0
when denormalized numbers are not supported.diff --git a/gcc/ada/libgnat/s-fatgen.adb b/gcc/ada/libgnat/s-fatgen.adb
--- a/gcc/ada/libgnat/s-fatgen.adb
+++ b/gcc/ada/libgnat/s-fatgen.adb
@@ -959,7 +959,19 @@ package body System.Fat_Gen is
   else pragma Assert (Exp = IEEE_Emin - 1);
  --  This is a denormalized number, valid if T'Denorm is True or 0.0
 
- return T'Denorm or else X.all = 0.0;
+ if T'Denorm then
+return True;
+
+ --  Note that we cannot do a direct comparison with 0.0 because the
+ --  hardware may evaluate it to True for all denormalized numbers.
+
+ else
+--  First clear the sign bit (the exponent is already zero)
+
+Rep (MSW) := Rep (MSW) and not Sign_Mask;
+
+return (for all J in 0 .. Rep_Last => Rep (J) = 0);
+ end if;
   end if;
end Valid;
 




[Ada] Fix handling of PATs

2021-05-06 Thread Pierre-Marie de Rodat
When trying to compare two fields, one that's a packed array of known
length and one that's a packed array of unknown length, Expand_Packed_Eq
converts each side of the equality to its packed array type
and then says that's enough if the types are modular integer types.
That's correct, however, only if they're the same PATs.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* exp_pakd.adb (Expand_Packed_Eq): Fix handling of PATs.diff --git a/gcc/ada/exp_pakd.adb b/gcc/ada/exp_pakd.adb
--- a/gcc/ada/exp_pakd.adb
+++ b/gcc/ada/exp_pakd.adb
@@ -1909,9 +1909,10 @@ package body Exp_Pakd is
   --  where PAT is the packed array type. This works fine, since in the
   --  modular case we guarantee that the unused bits are always zeroes.
   --  We do have to compare the lengths because we could be comparing
-  --  two different subtypes of the same base type.
+  --  two different subtypes of the same base type. We can only do this
+  --  if the PATs on both sides are the same.
 
-  if Is_Modular_Integer_Type (PAT) then
+  if Is_Modular_Integer_Type (PAT) and then PAT = Etype (R) then
  Rewrite (N,
Make_And_Then (Loc,
  Left_Opnd =>




[Ada] Remove hardcoded pragma Warnings from the formal vectors library

2021-05-06 Thread Pierre-Marie de Rodat
Pragma Warnings in the generic formal vectors library was only
suppressing a message in one specific instantiation within one specific
GNATprove test.  This pragma was referring to "type T defined at line
4", which is exactly the type declared in that test; the suppression
didn't work for any other instantiations.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* libgnat/a-cofove.adb (Insert_Space): Remove hardcoded pragma
Warnings.diff --git a/gcc/ada/libgnat/a-cofove.adb b/gcc/ada/libgnat/a-cofove.adb
--- a/gcc/ada/libgnat/a-cofove.adb
+++ b/gcc/ada/libgnat/a-cofove.adb
@@ -868,11 +868,7 @@ is
 --  less than 0, so it is safe to compute the following sum without
 --  fear of overflow.
 
-pragma Warnings
-  (Off, "value not in range of type ""T"" defined at line 4");
 Index := No_Index + Index_Type'Base (Count_Type'Last);
-pragma Warnings
-  (On, "value not in range of type ""T"" defined at line 4");
 
 if Index <= Index_Type'Last then
 




[Ada] Implement tiered support for floating-point exponentiation

2021-05-06 Thread Pierre-Marie de Rodat
This changes the implementation of floating-point exponentiation from
using Long_Long_Float for all floating-point types to using a base type
tailored to the type being operated on.

The computation is done in double precision internally, which gives more
accurate results for Long_Float and Long_Long_Float.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* Makefile.rtl (GNATRTL_NONTASKING_OBJS): Add s-exponr, s-exnflt
and s-exnlfl.
* exp_ch4.adb (Expand_N_Op_Expon): Use RE_Exn_Float for Short_Float.
* rtsfind.ads (RTU_Id): Add System_Exn_Flt and System_Exn_LFlt.
(RE_Id): Adjust entries for RE_Exn_Float and RE_Exn_Long_Float.
(RE_Unit_Table): Likewise.
* libgnat/s-exnflt.ads: New file.
* libgnat/s-exnlfl.ads: Likewise.
* libgnat/s-exnllf.ads: Change to mere instantiation.
* libgnat/s-exnllf.adb: Move implementation to...
* libgnat/s-exponr.ads: New generic unit.
* libgnat/s-exponr.adb: ...here and also make it generic.
(Expon): Do the computation in double precision internally.diff --git a/gcc/ada/Makefile.rtl b/gcc/ada/Makefile.rtl
--- a/gcc/ada/Makefile.rtl
+++ b/gcc/ada/Makefile.rtl
@@ -583,6 +583,8 @@ GNATRTL_NONTASKING_OBJS= \
   s-exctab$(objext) \
   s-exctra$(objext) \
   s-exnint$(objext) \
+  s-exnflt$(objext) \
+  s-exnlfl$(objext) \
   s-exnllf$(objext) \
   s-exnlli$(objext) \
   s-expint$(objext) \
@@ -590,6 +592,7 @@ GNATRTL_NONTASKING_OBJS= \
   s-expllu$(objext) \
   s-expmod$(objext) \
   s-exponn$(objext) \
+  s-exponr$(objext) \
   s-expont$(objext) \
   s-exponu$(objext) \
   s-expuns$(objext) \


diff --git a/gcc/ada/exp_ch4.adb b/gcc/ada/exp_ch4.adb
--- a/gcc/ada/exp_ch4.adb
+++ b/gcc/ada/exp_ch4.adb
@@ -9081,15 +9081,12 @@ package body Exp_Ch4 is
   --  overflow), and if there is an infinity generated and a range check
   --  is required, the check will fail anyway.
 
-  --  Historical note: we used to convert everything to Long_Long_Float
-  --  and call a single common routine, but this had the undesirable effect
-  --  of giving different results for small static exponent values and the
-  --  same dynamic values.
-
   else
  pragma Assert (Is_Floating_Point_Type (Rtyp));
 
- if Rtyp = Standard_Float then
+ --  Short_Float and Float are the same type for GNAT
+
+ if Rtyp = Standard_Short_Float or else Rtyp = Standard_Float then
 Etyp := Standard_Float;
 Rent := RE_Exn_Float;
 


diff --git /dev/null b/gcc/ada/libgnat/s-exnflt.ads
new file mode 100644
--- /dev/null
+++ b/gcc/ada/libgnat/s-exnflt.ads
@@ -0,0 +1,41 @@
+--
+--  --
+-- GNAT RUN-TIME COMPONENTS --
+--  --
+--   S Y S T E M . E X N _ F L T--
+--  --
+-- S p e c  --
+--  --
+--Copyright (C) 2021, Free Software Foundation, Inc.--
+--  --
+-- GNAT is free software;  you can  redistribute it  and/or modify it under --
+-- terms of the  GNU General Public License as published  by the Free Soft- --
+-- ware  Foundation;  either version 3,  or (at your option) any later ver- --
+-- sion.  GNAT is distributed in the hope that it will be useful, but WITH- --
+-- OUT ANY WARRANTY;  without even the  implied warranty of MERCHANTABILITY --
+-- or FITNESS FOR A PARTICULAR PURPOSE. --
+--  --
+-- As a special exception under Section 7 of GPL version 3, you are granted --
+-- additional permissions described in the GCC Runtime Library Exception,   --
+-- version 3.1, as published by the Free Software Foundation.   --
+--  --
+-- You should have received a copy of the GNU General Public License and--
+-- a copy of the GCC Runtime Library Exception along with this program; --
+-- see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see--
+-- .  --
+--  --
+-- GNAT was originally developed  by the GNAT team at  New York University. --
+-- Extensive contributions were provided by Ada Core Technologies Inc.  --
+-- 

[Ada] Fix restriction No_Enumeration_Maps on both Image attributes

2021-05-06 Thread Pierre-Marie de Rodat
Restriction No_Enumeration_Maps was only checked on Value attribute (and
its Wide and Wide_Wide variants). Now it is checked also on Img, Image
and its Wide and Wide_Wide variants, just like described in the GNAT RM.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* sem_attr.adb (Check_Enum_Image): Reword comment; add
Check_Enumeration_Maps parameter.  Now this routine combines
both referencing enumeration literals and checking restriction
No_Enumeration_Maps, if required.
(Analyze_Attribute): Remove duplicated code and instead call
Check_Enum_Image.diff --git a/gcc/ada/sem_attr.adb b/gcc/ada/sem_attr.adb
--- a/gcc/ada/sem_attr.adb
+++ b/gcc/ada/sem_attr.adb
@@ -318,14 +318,20 @@ package body Sem_Attr is
   procedure Check_E2;
   --  Check that two attribute arguments are present
 
-  procedure Check_Enum_Image;
-  --  If the prefix type of 'Image is an enumeration type, set all its
-  --  literals as referenced, since the image function could possibly end
-  --  up referencing any of the literals indirectly. Same for Enum_Val.
-  --  Set the flag only if the reference is in the main code unit. Same
-  --  restriction when resolving 'Value; otherwise an improperly set
-  --  reference when analyzing an inlined body will lose a proper
-  --  warning on a useless with_clause.
+  procedure Check_Enum_Image (Check_Enumeration_Maps : Boolean := False);
+  --  Common processing for the Image and Value family of attributes,
+  --  including their Wide and Wide_Wide versions, Enum_Val and Img.
+  --
+  --  If the prefix type of an attribute is an enumeration type, set all
+  --  its literals as referenced, since the attribute function can
+  --  indirectly reference any of the literals. Set the referenced flag
+  --  only if the attribute is in the main code unit; otherwise an
+  --  improperly set reference when analyzing an inlined body will lose a
+  --  proper warning on a useless with_clause.
+  --
+  --  If Check_Enumeration_Maps is True, then the attribute expansion
+  --  requires enumeration maps, so check whether restriction
+  --  No_Enumeration_Maps is active.
 
   procedure Check_First_Last_Valid;
   --  Perform all checks for First_Valid and Last_Valid attributes
@@ -1527,7 +1533,7 @@ package body Sem_Attr is
 Validate_Non_Static_Attribute_Function_Call;
  end if;
 
- Check_Enum_Image;
+ Check_Enum_Image (Check_Enumeration_Maps => True);
 
  --  Check restriction No_Fixed_IO. Note the check of Comes_From_Source
  --  to avoid giving a duplicate message for when Image attributes
@@ -1962,10 +1968,22 @@ package body Sem_Attr is
   -- Check_Enum_Image --
   --
 
-  procedure Check_Enum_Image is
+  procedure Check_Enum_Image (Check_Enumeration_Maps : Boolean := False) is
  Lit : Entity_Id;
 
   begin
+ --  Ensure that Check_Enumeration_Maps parameter is set precisely for
+ --  attributes whose implementation requires enumeration maps.
+
+ pragma Assert
+   (Check_Enumeration_Maps = (Attr_Id in Attribute_Image
+   | Attribute_Img
+   | Attribute_Value
+   | Attribute_Wide_Image
+   | Attribute_Wide_Value
+   | Attribute_Wide_Wide_Image
+   | Attribute_Wide_Wide_Value));
+
  --  When an enumeration type appears in an attribute reference, all
  --  literals of the type are marked as referenced. This must only be
  --  done if the attribute reference appears in the current source.
@@ -1975,6 +1993,10 @@ package body Sem_Attr is
  if Is_Enumeration_Type (P_Base_Type)
and then In_Extended_Main_Code_Unit (N)
  then
+if Check_Enumeration_Maps then
+   Check_Restriction (No_Enumeration_Maps, N);
+end if;
+
 Lit := First_Literal (P_Base_Type);
 while Present (Lit) loop
Set_Referenced (Lit);
@@ -7116,33 +7138,7 @@ package body Sem_Attr is
   =>
  Check_E1;
  Check_Scalar_Type;
-
- --  Case of enumeration type
-
- --  When an enumeration type appears in an attribute reference, all
- --  literals of the type are marked as referenced. This must only be
- --  done if the attribute reference appears in the current source.
- --  Otherwise the information on references may differ between a
- --  normal compilation and one that performs inlining.
-
- if Is_Enumeration_Type (P_Type)
-   and then In_Extended_Main_Code_Unit (N)
- then
-

[Ada] Remove unused initial value in Read_Library_Info_From_Full

2021-05-06 Thread Pierre-Marie de Rodat
Remove an initial value of a local variable that were never used. Static
analysis tools like CodePeer should easily recognize that this variable
is only accessed after being written.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* osint.adb (Read_Library_Info_From_Full): Cleanup unused
initial value.diff --git a/gcc/ada/osint.adb b/gcc/ada/osint.adb
--- a/gcc/ada/osint.adb
+++ b/gcc/ada/osint.adb
@@ -2564,7 +2564,7 @@ package body Osint is
   --  Read data from the file
 
   declare
- Actual_Len : Integer := 0;
+ Actual_Len : Integer;
 
  Lo : constant Text_Ptr := 0;
  --  Low bound for allocated text buffer




[Ada] Crash on if_expression used as index of discriminant-dependent array

2021-05-06 Thread Pierre-Marie de Rodat
The compiler aborts on an if_expression used as an index when the object
being indexed is a discriminant-dependent component, and the frontend
generates an improper range check on the expression instead of the
required index check.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* sem_res.adb (Resolve_If_Expression): If the context of the
expression is an indexed_component, resolve the expression and
its dependent_expressions with the base type of the index, to
ensure that an index check is generated when resolving the
enclosing indexxed_component, and avoid an improper use of
discriminants out of scope, when the index type is
discriminant-dependent.diff --git a/gcc/ada/sem_res.adb b/gcc/ada/sem_res.adb
--- a/gcc/ada/sem_res.adb
+++ b/gcc/ada/sem_res.adb
@@ -9095,6 +9095,16 @@ package body Sem_Res is
   --  that the context in general allows sliding, while a qualified
   --  expression forces equality of bounds.
 
+  Result_Type  : Entity_Id := Typ;
+  --  So in most cases the type of the If_Expression and of its
+  --  dependent expressions is that of the context. However, if
+  --  the expression is the index of an Indexed_Component, we must
+  --  ensure that a proper index check is applied, rather than a
+  --  range check on the index type (which might be discriminant
+  --  dependent). In this case we resolve with the base type of the
+  --  index type, and the index check is generated in the resolution
+  --  of the indexed_component above.
+
   -
   -- Apply_Check --
   -
@@ -9118,10 +9128,10 @@ package body Sem_Res is
  else
 Rewrite (Expr,
   Make_Qualified_Expression (Loc,
-Subtype_Mark => New_Occurrence_Of (Typ, Loc),
+Subtype_Mark => New_Occurrence_Of (Result_Type, Loc),
 Expression   => Relocate_Node (Expr)));
 
-Analyze_And_Resolve (Expr, Typ);
+Analyze_And_Resolve (Expr, Result_Type);
  end if;
   end Apply_Check;
 
@@ -9140,6 +9150,12 @@ package body Sem_Res is
  return;
   end if;
 
+  if Nkind (Parent (N)) = N_Indexed_Component
+or else Nkind (Parent (Parent (N))) = N_Indexed_Component
+  then
+ Result_Type := Base_Type (Typ);
+  end if;
+
   Then_Expr := Next (Condition);
 
   if No (Then_Expr) then
@@ -9149,7 +9165,7 @@ package body Sem_Res is
   Else_Expr := Next (Then_Expr);
 
   Resolve (Condition, Any_Boolean);
-  Resolve (Then_Expr, Typ);
+  Resolve (Then_Expr, Result_Type);
   Apply_Check (Then_Expr);
 
   --  If ELSE expression present, just resolve using the determined type
@@ -9163,7 +9179,7 @@ package body Sem_Res is
 Resolve (Else_Expr, Any_Real);
 
  else
-Resolve (Else_Expr, Typ);
+Resolve (Else_Expr, Result_Type);
  end if;
 
  Apply_Check (Else_Expr);
@@ -9187,7 +9203,7 @@ package body Sem_Res is
   elsif Root_Type (Typ) = Standard_Boolean then
  Else_Expr :=
Convert_To (Typ, New_Occurrence_Of (Standard_True, Sloc (N)));
- Analyze_And_Resolve (Else_Expr, Typ);
+ Analyze_And_Resolve (Else_Expr, Result_Type);
  Append_To (Expressions (N), Else_Expr);
 
   else
@@ -9195,7 +9211,7 @@ package body Sem_Res is
  Append_To (Expressions (N), Error);
   end if;
 
-  Set_Etype (N, Typ);
+  Set_Etype (N, Result_Type);
 
   if not Error_Posted (N) then
  Eval_If_Expression (N);




[Ada] Remove excessive conditions in iterations across components

2021-05-06 Thread Pierre-Marie de Rodat
When iterating with First_Component/Next_Component we don't need to
check that the Ekind of the iterator variable is E_Component.

Code cleanup related to record equality and unchecked unions; behaviour
is unaffected.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* exp_ch3.adb (Expand_Freeze_Array_Type): Remove excessive
condition.
(Expand_N_Object_Declaration): Likewise.
(Build_Equivalent_Aggregate): Likewise.
(Initialization_Warning): Likewise; change another excessive
condition into assertion.
* freeze.adb (Freeze_Entity): Remove excessive condition.diff --git a/gcc/ada/exp_ch3.adb b/gcc/ada/exp_ch3.adb
--- a/gcc/ada/exp_ch3.adb
+++ b/gcc/ada/exp_ch3.adb
@@ -5477,9 +5477,7 @@ package body Exp_Ch3 is
   First_Component (Base_Type (Underlying_Type (Etype (Typ;
 Comp := First_Component (Typ);
 while Present (Comp) loop
-   if Ekind (Comp) = E_Component
- and then Chars (Comp) = Chars (Old_Comp)
-   then
+   if Chars (Comp) = Chars (Old_Comp) then
   Set_Discriminant_Checking_Func
 (Comp, Discriminant_Checking_Func (Old_Comp));
end if;
@@ -6153,8 +6151,7 @@ package body Exp_Ch3 is
 
 Comp := First_Component (Full_Type);
 while Present (Comp) loop
-   if Ekind (Comp) = E_Component
- and then Present (Expression (Parent (Comp)))
+   if Present (Expression (Parent (Comp)))
  and then
not Is_OK_Static_Expression (Expression (Parent (Comp)))
then
@@ -6186,9 +6183,7 @@ package body Exp_Ch3 is
 
 Comp := First_Component (Full_Type);
 while Present (Comp) loop
-   if Ekind (Comp) = E_Component
- and then Present (Expression (Parent (Comp)))
-   then
+   if Present (Expression (Parent (Comp))) then
   Append_To (Component_Associations (Aggr),
 Make_Component_Association (Loc,
   Choices=> New_List (New_Occurrence_Of (Comp, Loc)),
@@ -9001,11 +8996,10 @@ package body Exp_Ch3 is
   begin
  Comp := First_Component (E);
  while Present (Comp) loop
-if Ekind (Comp) = E_Discriminant
-  or else
-(Nkind (Parent (Comp)) = N_Component_Declaration
-  and then Present (Expression (Parent (Comp
-then
+pragma Assert
+  (Nkind (Parent (Comp)) = N_Component_Declaration);
+
+if Present (Expression (Parent (Comp))) then
Warning_Needed := True;
exit;
 end if;


diff --git a/gcc/ada/freeze.adb b/gcc/ada/freeze.adb
--- a/gcc/ada/freeze.adb
+++ b/gcc/ada/freeze.adb
@@ -6847,10 +6847,9 @@ package body Freeze is
 begin
Comp := First_Component (E);
while Present (Comp) loop
-  Typ  := Etype (Comp);
+  Typ := Etype (Comp);
 
-  if Ekind (Comp) = E_Component
-and then Is_Access_Type (Typ)
+  if Is_Access_Type (Typ)
 and then Scope (Typ) /= E
 and then Base_Type (Designated_Type (Typ)) = E
 and then Is_Itype (Designated_Type (Typ))




[Ada] Remove redundant condition for Image attribute and Ada version

2021-05-06 Thread Pierre-Marie de Rodat
Routine Error_Msg_Ada_2012_Feature includes a test on Ada_Version, so
there is no need to explicitly test the same condition just before the
call.

Tested on x86_64-pc-linux-gnu, committed on trunk

gcc/ada/

* sem_attr.adb (Analyze_Image_Attribute): Remove redundant
condition; add a missing header box.diff --git a/gcc/ada/sem_attr.adb b/gcc/ada/sem_attr.adb
--- a/gcc/ada/sem_attr.adb
+++ b/gcc/ada/sem_attr.adb
@@ -1467,6 +1467,10 @@ package body Sem_Attr is
  --  Check that Image_Type is legal as the type of a prefix of 'Image.
  --  Legality depends on the Ada language version.
 
+ --
+ -- Check_Image_Type --
+ --
+
  procedure Check_Image_Type (Image_Type : Entity_Id) is
  begin
 --  Image_Type may be empty in case of another error detected,
@@ -1493,7 +1497,7 @@ package body Sem_Attr is
 Set_Etype (N, Str_Typ);
 Check_Image_Type (Etype (P));
 
-if Attr_Id /= Attribute_Img and then Ada_Version < Ada_2012 then
+if Attr_Id /= Attribute_Img then
Error_Msg_Ada_2012_Feature ("|Object''Image", Sloc (P));
 end if;
  else




Re: [PATCH] split loop for NE condition.

2021-05-06 Thread guojiufu via Gcc-patches

On 2021-05-03 20:18, Richard Biener wrote:

On Thu, 29 Apr 2021, Jiufu Guo wrote:

When there is the possibility that overflow may happen on the loop 
index,

a few optimizations would not happen. For example code:

foo (int *a, int *b, unsigned k, unsigned n)
{
  while (++k != n)
a[k] = b[k]  + 1;
}

For this code, if "l > n", overflow may happen.  if "l < n" at 
begining,

it could be optimized (e.g. vectorization).

We can split the loop into two loops:

  while (++k > n)
a[k] = b[k]  + 1;
  while (l++ < n)
a[k] = b[k]  + 1;

then for the second loop, it could be optimized.

This patch is splitting this kind of small loop to achieve better 
performance.


Bootstrap and regtest pass on ppc64le.  Is this ok for trunk?


Do you have any statistics on how often this splits a loop during
bootstrap (use --with-build-config=bootstrap-O3)?  Or alternatively
on SPEC?


In SPEC2017, there are ~240 loops are split.  And I saw some performance 
improvement on xz.

I would try bootstrap-O3 (encounter ICE).



Actual comments on the patch inline.


Thanks!

Jiufu Guo.

gcc/ChangeLog:

2021-04-29  Jiufu Guo  

* params.opt (max-insns-ne-cond-split): New.
* tree-ssa-loop-split.c (connect_loop_phis): Add new param.
(get_ne_cond_branch): New function.
(split_ne_loop): New function.
(split_loop_on_ne_cond): New function.
(tree_ssa_split_loops): Use split_loop_on_ne_cond.

gcc/testsuite/ChangeLog:
2021-04-29  Jiufu Guo  

* gcc.dg/loop-split1.c: New test.

---
 gcc/params.opt |   4 +
 gcc/testsuite/gcc.dg/loop-split1.c |  28 
 gcc/tree-ssa-loop-split.c  | 219 
-

 3 files changed, 247 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/loop-split1.c

diff --git a/gcc/params.opt b/gcc/params.opt
index 2e4cbdd7a71..900b59b5136 100644
--- a/gcc/params.opt
+++ b/gcc/params.opt
@@ -766,6 +766,10 @@ Min. ratio of insns to prefetches to enable 
prefetching for a loop with an unkno
 Common Joined UInteger Var(param_min_loop_cond_split_prob) Init(30) 
IntegerRange(0, 100) Param Optimization
 The minimum threshold for probability of semi-invariant condition 
statement to trigger loop split.


+-param=max-insns-ne-cond-split=
+Common Joined UInteger Var(param_max_insn_ne_cond_split) Init(64) 
Param Optimization
+The maximum threshold for insnstructions number of a loop with ne 
condition to split.

+
 -param=min-nondebug-insn-uid=
 Common Joined UInteger Var(param_min_nondebug_insn_uid) Param
 The minimum UID to be used for a nondebug insn.
diff --git a/gcc/testsuite/gcc.dg/loop-split1.c 
b/gcc/testsuite/gcc.dg/loop-split1.c

new file mode 100644
index 000..4c466aa9f54
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/loop-split1.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fsplit-loops -fdump-tree-lsplit-details" } */
+
+void
+foo (int *a, int *b, unsigned l, unsigned n)
+{
+  while (++l != n)
+a[l] = b[l]  + 1;
+}
+
+void
+foo1 (int *a, int *b, unsigned l, unsigned n)
+{
+  while (l++ != n)
+a[l] = b[l]  + 1;
+}
+
+unsigned
+foo2 (char *a, char *b, unsigned l, unsigned n)
+{
+  while (++l != n)
+if (a[l] != b[l])
+  break;
+
+  return l;
+}
+
+/* { dg-final { scan-tree-dump-times "Loop split" 3 "lsplit" } } */
diff --git a/gcc/tree-ssa-loop-split.c b/gcc/tree-ssa-loop-split.c
index b80b6a75e62..a6d28078e5e 100644
--- a/gcc/tree-ssa-loop-split.c
+++ b/gcc/tree-ssa-loop-split.c
@@ -41,6 +41,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "cfghooks.h"
 #include "gimple-fold.h"
 #include "gimplify-me.h"
+#include "tree-ssa-loop-ivopts.h"

 /* This file implements two kinds of loop splitting.

@@ -233,7 +234,8 @@ easy_exit_values (class loop *loop)
this.  The loops need to fulfill easy_exit_values().  */

 static void
-connect_loop_phis (class loop *loop1, class loop *loop2, edge new_e)
+connect_loop_phis (class loop *loop1, class loop *loop2, edge new_e,
+  bool use_prev = false)
 {
   basic_block rest = loop_preheader_edge (loop2)->src;
   gcc_assert (new_e->dest == rest);
@@ -248,13 +250,14 @@ connect_loop_phis (class loop *loop1, class loop 
*loop2, edge new_e)

!gsi_end_p (psi_first);
gsi_next (_first), gsi_next (_second))
 {
-  tree init, next, new_init;
+  tree init, next, new_init, prev;
   use_operand_p op;
   gphi *phi_first = psi_first.phi ();
   gphi *phi_second = psi_second.phi ();

   init = PHI_ARG_DEF_FROM_EDGE (phi_first, firste);
   next = PHI_ARG_DEF_FROM_EDGE (phi_first, firstn);
+  prev = PHI_RESULT (phi_first);
   op = PHI_ARG_DEF_PTR_FROM_EDGE (phi_second, seconde);
   gcc_assert (operand_equal_for_phi_arg_p (init, USE_FROM_PTR 
(op)));


@@ -279,7 +282,7 @@ connect_loop_phis (class loop *loop1, class loop 
*loop2, edge new_e)


   gphi * newphi = create_phi_node (new_init, rest);
   add_phi_arg (newphi, init, skip_first, 

[PATCH] IBM Z: Fix error checking for builtin vec_permi

2021-05-06 Thread Marius Hillenbrand via Gcc-patches
Hi,

this patch fixes the check of immediate operands to the builtin vec_permi and
adds a new test for this built-in.

Reg-rested and bootstrapped on s390x.

Is it OK for master? Is it OK for backporting to gcc-11?

Regards,
Marius


--8<--8<-8<-

The builtin vec_permi is peculiar in that its immediate operand is
encoded differently than the immediate operand that is backing the
builtin. This fixes the check for the immediate operand, adding a
regression test in the process.

This partially reverts commit 3191c1f4488d1f7563b563d7ae2a102a26f16d82

gcc/ChangeLog:

2021-05-04  Marius Hillenbrand  

* config/s390/s390-builtins.def (O_M5, O1_M5, ...): Remove unused 
macros.
(s390_vec_permi_s64, s390_vec_permi_b64, s390_vec_permi_u64)
(s390_vec_permi_dbl, s390_vpdi): Use the O3_U2 type for the immediate
operand.
* config/s390/s390.c (s390_const_operand_ok): Remove unused
values.

gcc/testsuite/ChangeLog:

* gcc.target/s390/zvector/imm-range-error-1.c: Fix test for
__builtin_s390_vpdi.
* gcc.target/s390/zvector/vec-permi.c: New test for builtin
vec_permi.
---
 gcc/config/s390/s390-builtins.def | 44 ++-
 gcc/config/s390/s390.c|  7 ++-
 .../s390/zvector/imm-range-error-1.c  |  2 +-
 .../gcc.target/s390/zvector/vec-permi.c   | 54 +++
 4 files changed, 76 insertions(+), 31 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/zvector/vec-permi.c

diff --git a/gcc/config/s390/s390-builtins.def 
b/gcc/config/s390/s390-builtins.def
index f77ab750d22..8ca002dc55a 100644
--- a/gcc/config/s390/s390-builtins.def
+++ b/gcc/config/s390/s390-builtins.def
@@ -29,7 +29,6 @@
 #undef O_U16
 #undef O_U32
 
-#undef O_M5
 #undef O_M12
 
 #undef O_S2
@@ -89,11 +88,6 @@
 #undef O3_U32
 #undef O4_U32
 
-#undef O1_M5
-#undef O2_M5
-#undef O3_M5
-#undef O4_M5
-
 #undef O1_M12
 #undef O2_M12
 #undef O3_M12
@@ -164,20 +158,19 @@
 #define O_U168 /* unsigned 16 bit literal */
 #define O_U329 /* unsigned 32 bit literal */
 
-#define O_M510 /* matches bitmask of 5 */
-#define O_M12   11 /* matches bitmask of 12 */
+#define O_M12   10 /* matches bitmask of 12 */
 
-#define O_S212 /* signed  2 bit literal */
-#define O_S313 /* signed  3 bit literal */
-#define O_S414 /* signed  4 bit literal */
-#define O_S515 /* signed  5 bit literal */
-#define O_S816 /* signed  8 bit literal */
-#define O_S12   17 /* signed 12 bit literal */
-#define O_S16   18 /* signed 16 bit literal */
-#define O_S32   19 /* signed 32 bit literal */
+#define O_S211 /* signed  2 bit literal */
+#define O_S312 /* signed  3 bit literal */
+#define O_S413 /* signed  4 bit literal */
+#define O_S514 /* signed  5 bit literal */
+#define O_S815 /* signed  8 bit literal */
+#define O_S12   16 /* signed 12 bit literal */
+#define O_S16   17 /* signed 16 bit literal */
+#define O_S32   18 /* signed 32 bit literal */
 
-#define O_ELEM  20 /* Element selector requiring modulo arithmetic. */
-#define O_LIT   21 /* Operand must be a literal fitting the target type.  */
+#define O_ELEM  19 /* Element selector requiring modulo arithmetic. */
+#define O_LIT   20 /* Operand must be a literal fitting the target type.  */
 
 #define O_SHIFT 5
 
@@ -230,11 +223,6 @@
 #define O3_U32 (O_U32 << (2 * O_SHIFT))
 #define O4_U32 (O_U32 << (3 * O_SHIFT))
 
-#define O1_M5 O_M5
-#define O2_M5 (O_M5 << O_SHIFT)
-#define O3_M5 (O_M5 << (2 * O_SHIFT))
-#define O4_M5 (O_M5 << (3 * O_SHIFT))
-
 #define O1_M12 O_M12
 #define O2_M12 (O_M12 << O_SHIFT)
 #define O3_M12 (O_M12 << (2 * O_SHIFT))
@@ -671,12 +659,12 @@ OB_DEF_VAR (s390_vec_perm_dbl,  s390_vperm,   
  0,
 B_DEF  (s390_vperm, vec_permv16qi,  0, 
 B_VX,   0,  BT_FN_UV16QI_UV16QI_UV16QI_UV16QI)
 
 OB_DEF (s390_vec_permi, s390_vec_permi_s64, 
s390_vec_permi_dbl, B_VX,   BT_FN_OV4SI_OV4SI_OV4SI_INT)
-OB_DEF_VAR (s390_vec_permi_s64, s390_vpdi,  0, 
 O3_M5,  BT_OV_V2DI_V2DI_V2DI_INT)
-OB_DEF_VAR (s390_vec_permi_b64, s390_vpdi,  0, 
 O3_M5,  BT_OV_BV2DI_BV2DI_BV2DI_INT)
-OB_DEF_VAR (s390_vec_permi_u64, s390_vpdi,  0, 
 O3_M5,  BT_OV_UV2DI_UV2DI_UV2DI_INT)
-OB_DEF_VAR (s390_vec_permi_dbl, s390_vpdi,  0, 
 O3_M5,  BT_OV_V2DF_V2DF_V2DF_INT)
+OB_DEF_VAR (s390_vec_permi_s64, s390_vpdi,  0, 
 O3_U2,  BT_OV_V2DI_V2DI_V2DI_INT)
+OB_DEF_VAR (s390_vec_permi_b64, s390_vpdi,  0, 
 O3_U2,  BT_OV_BV2DI_BV2DI_BV2DI_INT)
+OB_DEF_VAR (s390_vec_permi_u64, s390_vpdi,  0, 
 O3_U2,  BT_OV_UV2DI_UV2DI_UV2DI_INT)
+OB_DEF_VAR 

Re: [PATCH 3/3] go: use htab_eq_string in godump

2021-05-06 Thread Richard Biener via Gcc-patches
On Thu, May 6, 2021 at 12:44 AM Tom Tromey  wrote:
>
> This changes godump to use the new htab_eq_string function.

OK.

> gcc
>
> * godump.c (string_hash_eq): Remove.
> (go_finish): Use htab_eq_string.
> ---
>  gcc/godump.c | 14 +++---
>  1 file changed, 3 insertions(+), 11 deletions(-)
>
> diff --git a/gcc/godump.c b/gcc/godump.c
> index 7864d9d63e5d..cf9989490356 100644
> --- a/gcc/godump.c
> +++ b/gcc/godump.c
> @@ -109,14 +109,6 @@ macro_hash_del (void *v)
>XDELETE (mhv);
>  }
>
> -/* For the string hash tables.  */
> -
> -static int
> -string_hash_eq (const void *y1, const void *y2)
> -{
> -  return strcmp ((const char *) y1, (const char *) y2) == 0;
> -}
> -
>  /* A macro definition.  */
>
>  static void
> @@ -1374,11 +1366,11 @@ go_finish (const char *filename)
>real_debug_hooks->finish (filename);
>
>container.type_hash = htab_create (100, htab_hash_string,
> - string_hash_eq, NULL);
> +htab_eq_string, NULL);
>container.invalid_hash = htab_create (10, htab_hash_string,
> -   string_hash_eq, NULL);
> +   htab_eq_string, NULL);
>container.keyword_hash = htab_create (50, htab_hash_string,
> -string_hash_eq, NULL);
> +   htab_eq_string, NULL);
>obstack_init (_obstack);
>
>keyword_hash_init ();
> --
> 2.26.3
>


Re: [PATCH 2/3] gcc: use htab_eq_string

2021-05-06 Thread Richard Biener via Gcc-patches
On Thu, May 6, 2021 at 12:43 AM Tom Tromey  wrote:
>
> This changes one spot in GCC to use the new htab_eq_string function.

OK.

> gcc
>
> * gengtype-state.c (read_state): Use htab_eq_string.
> (string_eq): Remove.
> ---
>  gcc/gengtype-state.c | 11 +--
>  1 file changed, 1 insertion(+), 10 deletions(-)
>
> diff --git a/gcc/gengtype-state.c b/gcc/gengtype-state.c
> index 891f2e18a610..a8fde959f4eb 100644
> --- a/gcc/gengtype-state.c
> +++ b/gcc/gengtype-state.c
> @@ -2556,15 +2556,6 @@ equals_type_number (const void *ty1, const void *ty2)
>return type1->state_number == type2->state_number;
>  }
>
> -static int
> -string_eq (const void *a, const void *b)
> -{
> -  const char *a0 = (const char *)a;
> -  const char *b0 = (const char *)b;
> -
> -  return (strcmp (a0, b0) == 0);
> -}
> -
>
>  /* The function reading the state, called by main from gengtype.c.  */
>  void
> @@ -2588,7 +2579,7 @@ read_state (const char *path)
>state_seen_types =
>  htab_create (2017, hash_type_number, equals_type_number, NULL);
>state_ident_tab =
> -htab_create (4027, htab_hash_string, string_eq, NULL);
> +htab_create (4027, htab_hash_string, htab_eq_string, NULL);
>read_state_version (version_string);
>read_state_srcdir ();
>read_state_languages ();
> --
> 2.26.3
>


Re: [PATCH 1/3] libiberty: add htab_eq_string

2021-05-06 Thread Richard Biener via Gcc-patches
On Thu, May 6, 2021 at 12:41 AM Tom Tromey  wrote:
>
> The libiberty hash table includes a helper function for strings, but
> no equality function.  Consequently, this equality function has been
> reimplemented a number of times in both the gcc and binutils-gdb
> source trees.  This patch adds the function to the libiberty hash
> table, as a step toward the goal of removing all the copies.
>
> One change to gcc is included here.  Normally I would have put this in
> the next patch, but gensupport.c used the most natural name for its
> reimplementation of this function, and this can't coexist with the
> extern function in libiberty.

OK.

> include
>
> * hashtab.h (htab_eq_string): Declare.
>
> libiberty
>
> * hashtab.c (htab_eq_string): New function.
>
> gcc
>
> * gensupport.c (htab_eq_string): Remove.
> ---
>  gcc/gensupport.c| 8 
>  include/hashtab.h   | 3 +++
>  libiberty/hashtab.c | 7 +++
>  3 files changed, 10 insertions(+), 8 deletions(-)
>
> diff --git a/gcc/gensupport.c b/gcc/gensupport.c
> index 0f19bd706646..e1ca06dbc1ec 100644
> --- a/gcc/gensupport.c
> +++ b/gcc/gensupport.c
> @@ -2322,14 +2322,6 @@ gen_reader::handle_unknown_directive (file_location 
> loc, const char *rtx_name)
>  process_rtx (x, loc);
>  }
>
> -/* Comparison function for the mnemonic hash table.  */
> -
> -static int
> -htab_eq_string (const void *s1, const void *s2)
> -{
> -  return strcmp ((const char*)s1, (const char*)s2) == 0;
> -}
> -
>  /* Add mnemonic STR with length LEN to the mnemonic hash table
> MNEMONIC_HTAB.  A trailing zero end character is appended to STR
> and a permanent heap copy of STR is created.  */
> diff --git a/include/hashtab.h b/include/hashtab.h
> index b3a6265eeb6e..77c5eec79055 100644
> --- a/include/hashtab.h
> +++ b/include/hashtab.h
> @@ -192,6 +192,9 @@ extern htab_eq htab_eq_pointer;
>  /* A hash function for null-terminated strings.  */
>  extern hashval_t htab_hash_string (const void *);
>
> +/* An equality function for null-terminated strings.  */
> +extern int htab_eq_string (const void *, const void *);
> +
>  /* An iterative hash function for arbitrary data.  */
>  extern hashval_t iterative_hash (const void *, size_t, hashval_t);
>  /* Shorthand for hashing something with an intrinsic size.  */
> diff --git a/libiberty/hashtab.c b/libiberty/hashtab.c
> index 0c7208effe11..7c424e8f6cc1 100644
> --- a/libiberty/hashtab.c
> +++ b/libiberty/hashtab.c
> @@ -841,6 +841,13 @@ htab_hash_string (const PTR p)
>return r;
>  }
>
> +/* An equality function for null-terminated strings.  */
> +int
> +htab_eq_string (const void *a, const void *b)
> +{
> +  return strcmp ((const char *) a, (const char *) b) == 0;
> +}
> +
>  /* DERIVED FROM:
>  
>  lookup2.c, by Bob Jenkins, December 1996, Public Domain.
> --
> 2.26.3
>


Re: [PATCH] -Walloca-larger-than with constant sizes at -O0 (PR 100425)

2021-05-06 Thread Richard Biener via Gcc-patches
On Wed, May 5, 2021 at 7:29 PM Martin Sebor  wrote:
>
> On 5/5/21 1:32 AM, Richard Biener wrote:
> > On Wed, May 5, 2021 at 4:20 AM Martin Sebor via Gcc-patches
> >  wrote:
> >>
> >> Even when explicitly enabled, -Walloca-larger-than doesn't run
> >> unless optimization is enabled as well.  This prevents diagnosing
> >> alloca calls with constant arguments in excess of the limit that
> >> could otherwise be flagged even at -O0, making the warning less
> >> consistent and less useful than is possible.
> >>
> >> The attached patch enables -Walloca-larger-than for calls with
> >> constant arguments in excess of the limit even at -O0 (variable
> >> arguments are only handled with optimization, when VRP runs).
> >
> > Hmm, but then the pass runs even without -Walloca or -Walloca-larger-than
> > or -Wvla[-larger-than].  It performs an IL walk we should avoid in those
> > cases.
> >
> > So the patch is OK but can you please come up with a gate that disables
> > the pass when all of the warnings it handles won't fire anyway?
>
> -W{alloca,vla}-larger-than=PTRDIFF_MAX are enabled by default so
> the pass needs to do walk.
>
> FWIW, it would make sense to me to consolidate all the checking of
> calls for arguments with excessive sizes/values into the same pass
> and single walk (with code still in separate source files).  As it
> is, some are done in their own passes (like alloca and sprintf),
> and others during expansion (-Wstringop-overflow), and others in
> calls.c (-Walloc-size-larger-than).  That leads to repetitive code
> and inconsistent approaches and inconsistent false positives and
> negatives (because some are done at -O0 but others require
> optimization).

True - that would be a nice cleanup (and speedup as well).

Richard.

> Martin
>
> >
> > Thanks,
> > Richard.
> >
> >> Martin
>


[Patch, fortran] PRs 46691 and 99819: Assumed and explicit size class arrays

2021-05-06 Thread Paul Richard Thomas via Gcc-patches
Hi All,

Although I had undertaken to concentrate on PDTs, PR99819 so intrigued me
that I became locked into it :-( After extensive, fruitless rummaging
through decl.c and trans-decl.c, I realised that the problem was far
simpler than it seemed and that it lay in class.c. After that PR was fixed,
PR46691 was a trivial follow up.

The comments in the patch explain the fixes. I left a TODO for the extent
checking of assumed size class arrays. I will try to fix it before pushing.

Regtested on FC33/x86_64 and checked against the 'other brand'. OK for
12-branch and, perhaps, 11-branch?

Regards

Paul

Fortran: Assumed and explicit size class arrays [PR46691/99819].

2021-05-06  Paul Thomas  

gcc/fortran/ChangeLog

PR fortran/46691
PR fortran/99819
* class.c (gfc_build_class_symbol): Class array types that are
not deferred shape or assumed rank are given a unique name and
placed in the procedure namespace.
* trans-array.c (gfc_trans_g77_array): Obtain the data pointer
for class arrays.
(gfc_trans_dummy_array_bias): Suppress the runtime error for
extent violations in explicit shape class arrays because it
always fails.
* trans-expr.c (gfc_conv_procedure_call): Handle assumed size
class actual arguments passed to non-descriptor formal args by
using the data pointer, stored as the symbol's backend decl.

gcc/testsuite/ChangeLog

PR fortran/46691
PR fortran/99819
* gfortran.dg/class_dummy_6.f90: New test.
* gfortran.dg/class_dummy_6.f90: New test.
diff --git a/gcc/fortran/class.c b/gcc/fortran/class.c
index 89353218417..93118ad3455 100644
--- a/gcc/fortran/class.c
+++ b/gcc/fortran/class.c
@@ -630,6 +630,7 @@ gfc_get_len_component (gfc_expr *e, int k)
component '_vptr' which determines the dynamic type.  When this CLASS
entity is unlimited polymorphic, then also add a component '_len' to
store the length of string when that is stored in it.  */
+static int ctr = 0;
 
 bool
 gfc_build_class_symbol (gfc_typespec *ts, symbol_attribute *attr,
@@ -645,13 +646,6 @@ gfc_build_class_symbol (gfc_typespec *ts, symbol_attribute *attr,
 
   gcc_assert (as);
 
-  if (*as && (*as)->type == AS_ASSUMED_SIZE)
-{
-  gfc_error ("Assumed size polymorphic objects or components, such "
-		 "as that at %C, have not yet been implemented");
-  return false;
-}
-
   if (attr->class_ok)
 /* Class container has already been built.  */
 return true;
@@ -693,7 +687,30 @@ gfc_build_class_symbol (gfc_typespec *ts, symbol_attribute *attr,
   else
 ns = ts->u.derived->ns;
 
-  gfc_find_symbol (name, ns, 0, );
+  /* Although this might seem to be counterintuitive, we can build separate
+ class types with different array specs because the TKR interface checks
+ work on the declared type. All array type other than deferred shape or
+ assumed rank are added to the function namespace to ensure that they
+ are properly distinguished.  */
+  if (attr->dummy && !attr->codimension && (*as)
+  && !((*as)->type == AS_DEFERRED || (*as)->type == AS_ASSUMED_RANK))
+{
+  char *sname;
+  ns = gfc_current_ns;
+  gfc_find_symbol (name, ns, 0, );
+  /* If a local class type with this name already exists, update the
+	 name with an index.  */
+  if (fclass)
+	{
+	  fclass = NULL;
+	  sname = xasprintf ("%s_%d", name, ++ctr);
+	  free (name);
+	  name = sname;
+	}
+}
+  else
+gfc_find_symbol (name, ns, 0, );
+
   if (fclass == NULL)
 {
   gfc_symtree *st;
diff --git a/gcc/fortran/trans-array.c b/gcc/fortran/trans-array.c
index e99980fd223..6d38ea78273 100644
--- a/gcc/fortran/trans-array.c
+++ b/gcc/fortran/trans-array.c
@@ -6524,7 +6524,14 @@ gfc_trans_g77_array (gfc_symbol * sym, gfc_wrapped_block * block)
   /* Set the pointer itself if we aren't using the parameter directly.  */
   if (TREE_CODE (parm) != PARM_DECL)
 {
-  tmp = convert (TREE_TYPE (parm), GFC_DECL_SAVED_DESCRIPTOR (parm));
+  tmp = GFC_DECL_SAVED_DESCRIPTOR (parm);
+  if (sym->ts.type == BT_CLASS)
+	{
+	  tmp = build_fold_indirect_ref_loc (input_location, tmp);
+	  tmp = gfc_class_data_get (tmp);
+	  tmp = gfc_conv_descriptor_data_get (tmp);
+	}
+  tmp = convert (TREE_TYPE (parm), tmp);
   gfc_add_modify (, parm, tmp);
 }
   stmt = gfc_finish_block ();
@@ -6626,7 +6633,8 @@ gfc_trans_dummy_array_bias (gfc_symbol * sym, tree tmpdesc,
   && VAR_P (sym->ts.u.cl->backend_decl))
 gfc_conv_string_length (sym->ts.u.cl, NULL, );
 
-  checkparm = (as->type == AS_EXPLICIT
+  /* TODO: Fix the exclusion of class arrays from extent checking.  */
+  checkparm = (as->type == AS_EXPLICIT && !is_classarray
 	   && (gfc_option.rtcheck & GFC_RTCHECK_BOUNDS));
 
   no_repack = !(GFC_DECL_PACKED_ARRAY (tmpdesc)
diff --git a/gcc/fortran/trans-expr.c b/gcc/fortran/trans-expr.c
index 213f32b0a67..5f5479561c2 100644
--- a/gcc/fortran/trans-expr.c
+++ b/gcc/fortran/trans-expr.c
@@ -6420,6 +6420,15 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym,
 fsym ? fsym->attr.intent