Re: [PATCH v2 3/3] pretty-print: Don't translate escape sequences to windows console API

2024-05-12 Thread LIU Hao

On 2024-05-10 01:02, Peter Damianov wrote:

-  if (GetConsoleMode (h, &mode))
-/* If it is a console, translate ANSI escape codes as needed.  */
+  if (GetConsoleMode (h, &mode) && !(mode & ENABLE_VIRTUAL_TERMINAL_PROCESSING))
+/* If it is a console, and doesn't support ANSI escape codes, translate
+ * them as needed.
+ */


nitpicking: This should probably be


+ * them as needed.  */



CC'ing Jonathan Yong. This series of patches looks good to me.



--
Best regards,
LIU Hao





Re: [Patch, fortran] PR113363 - ICE on ASSOCIATE and unlimited polymorphic function

2024-05-12 Thread Paul Richard Thomas
Hi Harald,

Please find attached my resubmission for pr113363. The changes are as
follows:
(i) The chunk in gfc_conv_procedure_call is new. This was the source of one
of the memory leaks;
(ii) The incorporation of the _len field in trans_class_assignment was done
for the pr84006 patch;
(iii) The source of all the invalid memory accesses and so on was down to
the use of realloc. I tried all sorts of workarounds such as testing the
vptrs and the sizes but only free followed by malloc worked. I have no idea
at all why this is the case; and
(iv) I took account of your remarks about the chunk in trans-array.cc by
removing it, and of your remark that the chunk in trans-stmt.cc would leak
frontend memory.
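
The two allocation strategies contrasted in (iii) can be sketched outside the
compiler. This is a minimal illustration in plain C++, not the code that
gfortran emits; the helper names resize_keep and resize_fresh are
hypothetical. It only shows the semantic difference (realloc carries the old
bytes over, free followed by malloc does not) and makes no claim about why
realloc misbehaved in this PR.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: resize a buffer and keep its old contents.
// On failure realloc returns nullptr and the old block stays valid.
void *resize_keep (void *buf, std::size_t new_size)
{
  return std::realloc (buf, new_size);
}

// Hypothetical helper: drop the old buffer and allocate a new one.
// The new block is zero-filled only to make the example deterministic;
// malloc itself leaves the contents indeterminate.
void *resize_fresh (void *buf, std::size_t new_size)
{
  std::free (buf);
  void *p = std::malloc (new_size);
  if (p)
    std::memset (p, 0, new_size);
  return p;
}
```

With realloc the payload that was in the old block is still there after the
call; with free plus malloc nothing from the old block is carried over, so
whatever is assigned next has to fill the new block completely.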

OK for mainline (and -14 branch after a few weeks)?

Regards

Paul

Fortran: Fix wrong code in unlimited polymorphic assignment [PR113363]

2024-05-12  Paul Thomas  

gcc/fortran
PR fortran/113363
* trans-array.cc (gfc_array_init_size): Use the expr3 dtype so
that the correct element size is used.
* trans-expr.cc (gfc_conv_procedure_call): Remove restriction
that ss and ss->loop be present for the finalization of class
array function results.
(trans_class_assignment): Use free and malloc, rather than
realloc, for character expressions assigned to unlimited poly
entities.
* trans-stmt.cc (gfc_trans_allocate): Build a correct rhs for
the assignment of an unlimited polymorphic 'source'.

gcc/testsuite/
PR fortran/113363
* gfortran.dg/pr113363.f90: New test.


> > The first chunk in trans-array.cc ensures that the array dtype is set to
> > the source dtype. The second chunk ensures that the lhs _len field does
> > not default to zero and so is specific to dynamic types of character.
> >
>
> Why the two gfc_copy_ref?  valgrind pointed me to the tail
> of gfc_copy_ref which already has:
>
>dest->next = gfc_copy_ref (src->next);
>
> so this looks redundant and leaks frontend memory?
>
> ***
>
> Playing with the testcase, I find several invalid writes with
> valgrind, or a heap buffer overflow with -fsanitize=address .
>
>
>
diff --git a/gcc/fortran/trans-array.cc b/gcc/fortran/trans-array.cc
index 7ec33fb1598..c5b56f4e273 100644
--- a/gcc/fortran/trans-array.cc
+++ b/gcc/fortran/trans-array.cc
@@ -5957,6 +5957,11 @@ gfc_array_init_size (tree descriptor, int rank, int corank, tree * poffset,
   tmp = gfc_conv_descriptor_dtype (descriptor);
   gfc_add_modify (pblock, tmp, gfc_get_dtype_rank_type (rank, type));
 }
+  else if (expr3_desc && GFC_DESCRIPTOR_TYPE_P (TREE_TYPE (expr3_desc)))
+{
+  tmp = gfc_conv_descriptor_dtype (descriptor);
+  gfc_add_modify (pblock, tmp, gfc_conv_descriptor_dtype (expr3_desc));
+}
   else
 {
   tmp = gfc_conv_descriptor_dtype (descriptor);
diff --git a/gcc/fortran/trans-expr.cc b/gcc/fortran/trans-expr.cc
index 4590aa6edb4..e315e2d3370 100644
--- a/gcc/fortran/trans-expr.cc
+++ b/gcc/fortran/trans-expr.cc
@@ -8245,8 +8245,7 @@ gfc_conv_procedure_call (gfc_se * se, gfc_symbol * sym,
 	 call the finalization function of the temporary. Note that the
 	 nullification of allocatable components needed by the result
 	 is done in gfc_trans_assignment_1.  */
-  if (expr && ((gfc_is_class_array_function (expr)
-		&& se->ss && se->ss->loop)
+  if (expr && (gfc_is_class_array_function (expr)
 		   || gfc_is_alloc_class_scalar_function (expr))
 	  && se->expr && GFC_CLASS_TYPE_P (TREE_TYPE (se->expr))
 	  && expr->must_finalize)
@@ -12028,18 +12027,25 @@ trans_class_assignment (stmtblock_t *block, gfc_expr *lhs, gfc_expr *rhs,
 
   /* Reallocate if dynamic types are different. */
   gfc_init_block (&re_alloc);
-  tmp = fold_convert (pvoid_type_node, class_han);
-  re = build_call_expr_loc (input_location,
-builtin_decl_explicit (BUILT_IN_REALLOC), 2,
-tmp, size);
-  re = fold_build2_loc (input_location, MODIFY_EXPR, TREE_TYPE (tmp), tmp,
-			re);
-  tmp = fold_build2_loc (input_location, NE_EXPR,
-			 logical_type_node, rhs_vptr, old_vptr);
-  re = fold_build3_loc (input_location, COND_EXPR, void_type_node,
-			tmp, re, build_empty_stmt (input_location));
-  gfc_add_expr_to_block (&re_alloc, re);
-
+  if (UNLIMITED_POLY (lhs) && rhs->ts.type == BT_CHARACTER)
+	{
+	  gfc_add_expr_to_block (&re_alloc, gfc_call_free (class_han));
+	  gfc_allocate_using_malloc (&re_alloc, class_han, size, NULL_TREE);
+	}
+  else
+	{
+	  tmp = fold_convert (pvoid_type_node, class_han);
+	  re = build_call_expr_loc (input_location,
+builtin_decl_explicit (BUILT_IN_REALLOC),
+2, tmp, size);
+	  re = fold_build2_loc (input_location, MODIFY_EXPR, TREE_TYPE (tmp),
+tmp, re);
+	  tmp = fold_build2_loc (input_location, NE_EXPR,
+ logical_type_node, rhs_vptr, old_vptr);
+	  re = fold_build3_loc (input_location, COND_EXPR, void_type_node,
+tmp, re, build_empty_stmt (input_location));
+	  gfc_add_expr_to_block (&re_alloc, re);
+	}
   tree realloc_expr = lhs->ts.type == BT_CLASS ?
 	  gfc_finish_block 

[COMMITTED] Regenerate cygming.opt.urls and mingw.opt.urls

2024-05-12 Thread Mark Wielaard
The new cygming.opt.urls and mingw.opt.urls files in the gcc/config/mingw
directory need to be generated by make regenerate-opt-urls in the gcc
subdirectory. They still contained references to the gcc/config/i386
directory from which they were copied.

Fixes: 1f05dfc131c7 ("Reuse MinGW from i386 for AArch64")
Fixes: e8d003736e6c ("Rename "x86 Windows Options" to "Cygwin and MinGW Options"")

gcc/ChangeLog:

* config/mingw/cygming.opt.urls: Regenerate.
* config/mingw/mingw.opt.urls: Likewise.
---
 gcc/config/mingw/cygming.opt.urls | 7 +++
 gcc/config/mingw/mingw.opt.urls   | 2 +-
 2 files changed, 4 insertions(+), 5 deletions(-)

diff --git a/gcc/config/mingw/cygming.opt.urls b/gcc/config/mingw/cygming.opt.urls
index c624e22e4427..af11c4997609 100644
--- a/gcc/config/mingw/cygming.opt.urls
+++ b/gcc/config/mingw/cygming.opt.urls
@@ -1,4 +1,4 @@
-; Autogenerated by regenerate-opt-urls.py from gcc/config/i386/cygming.opt and generated HTML
+; Autogenerated by regenerate-opt-urls.py from gcc/config/mingw/cygming.opt and generated HTML
 
 mconsole
 UrlSuffix(gcc/Cygwin-and-MinGW-Options.html#index-mconsole)
@@ -9,9 +9,8 @@ UrlSuffix(gcc/Cygwin-and-MinGW-Options.html#index-mdll)
 mnop-fun-dllimport
 UrlSuffix(gcc/Cygwin-and-MinGW-Options.html#index-mnop-fun-dllimport)
 
-; skipping UrlSuffix for 'mthreads' due to multiple URLs:
-;   duplicate: 'gcc/Cygwin-and-MinGW-Options.html#index-mthreads-1'
-;   duplicate: 'gcc/x86-Options.html#index-mthreads'
+mthreads
+UrlSuffix(gcc/Cygwin-and-MinGW-Options.html#index-mthreads-1)
 
 mwin32
 UrlSuffix(gcc/Cygwin-and-MinGW-Options.html#index-mwin32)
diff --git a/gcc/config/mingw/mingw.opt.urls b/gcc/config/mingw/mingw.opt.urls
index f8ee5be6a535..40fb086606b2 100644
--- a/gcc/config/mingw/mingw.opt.urls
+++ b/gcc/config/mingw/mingw.opt.urls
@@ -1,4 +1,4 @@
-; Autogenerated by regenerate-opt-urls.py from gcc/config/i386/mingw.opt and generated HTML
+; Autogenerated by regenerate-opt-urls.py from gcc/config/mingw/mingw.opt and generated HTML
 
 mcrtdll=
 UrlSuffix(gcc/Cygwin-and-MinGW-Options.html#index-mcrtdll)
-- 
2.39.3



[PATCH] fortran: Assume there is no cyclic reference with submodule symbols [PR99798]

2024-05-12 Thread Mikael Morin
Hello,

Here is my final patch to fix the ICE of PR99798.
It's maybe overly verbose with comments, but the memory management is
hopefully clarified.
I tested this with a full fortran regression test on x86_64-linux and a
manual check with valgrind on the testcase.
OK for master?

-- 8< --

This prevents a premature release of memory with procedure symbols from
submodules, causing random compiler crashes.

The problem is a fragile detection of cyclic references, which can match
with procedures host-associated from a module in submodules, in cases where it
shouldn't.  The formal namespace is released, and with it the dummy arguments
symbols of the procedure.  But there is no cyclic reference, so the procedure
symbol itself is not released and remains, with pointers to its dummy arguments
now dangling.

The fix adds a condition to avoid the case, and refactors to a new predicate
by the way.  Part of the original condition is also removed, for lack of a
reason to keep it.
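
To make the failure mode concrete, here is a toy model of reference counting
with a self reference through a scope object. It is only a sketch: Sym, Scope
and the three functions are hypothetical stand-ins, not the gfortran data
structures, and the flag scope_refers_back stands for whatever knowledge the
real predicate has about the presence of a cycle.

```cpp
#include <cassert>

struct Sym;

// Toy scope that may point back at the procedure it belongs to.
struct Scope
{
  Sym *owner = nullptr;
};

// Toy symbol: starts with one reference held by its declaring scope.
struct Sym
{
  int refs = 1;
  Scope *formal = nullptr;
  bool freed = false;
};

// Link a symbol and its scope; the back pointer counts as a reference.
void attach (Sym &s, Scope &sc)
{
  s.formal = &sc;
  sc.owner = &s;
  ++s.refs;
}

// Plain reference counting: returns true once the symbol is freed.
bool release_plain (Sym &s)
{
  if (--s.refs > 0)
    return false;
  s.freed = true;
  return true;
}

// Drop the scope first when only the self reference would remain.
// If the scope does not really refer back, it must be left alone,
// otherwise the symbol survives with a dangling scope pointer.
bool release_breaking_cycle (Sym &s, bool scope_refers_back)
{
  if (s.formal != nullptr && s.refs == 2 && scope_refers_back)
    {
      s.formal->owner = nullptr;
      s.formal = nullptr;
      --s.refs;
    }
  return release_plain (s);
}
```

With a real cycle, release_plain alone never frees the symbol, while
release_breaking_cycle does. Without a cycle (the submodule case described
above), tearing down the scope at refs == 2 would leave a live symbol whose
scope is gone, which is why the extra condition matters.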

PR fortran/99798

gcc/fortran/ChangeLog:

* symbol.cc (gfc_release_symbol): Move the condition guarding
the handling cyclic references...
(cyclic_reference_break_needed): ... here as a new predicate.
Remove superfluous parts.  Add a condition preventing any premature
release with submodule symbols.

gcc/testsuite/ChangeLog:

* gfortran.dg/submodule_33.f08: New test.
---
 gcc/fortran/symbol.cc  | 54 +-
 gcc/testsuite/gfortran.dg/submodule_33.f08 | 20 
 2 files changed, 72 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/submodule_33.f08

diff --git a/gcc/fortran/symbol.cc b/gcc/fortran/symbol.cc
index 8f7deac1d1e..0a1646def67 100644
--- a/gcc/fortran/symbol.cc
+++ b/gcc/fortran/symbol.cc
@@ -3179,6 +3179,57 @@ gfc_free_symbol (gfc_symbol *&sym)
 }
 
 
+/* Returns true if the symbol SYM has, through its FORMAL_NS field, a reference
+   to itself which should be eliminated for the symbol memory to be released
+   via normal reference counting.
+
+   The implementation is crucial as it controls the proper release of symbols,
+   especially (contained) procedure symbols, which can represent a lot of memory
+   through the namespace of their body.
+
+   We try to avoid freeing too much memory (causing dangling pointers), to not
+   leak too much (wasting memory), and to avoid expensive walks of the symbol
+   tree (which would be the correct way to check for a cycle).  */
+
+bool
+cyclic_reference_break_needed (gfc_symbol *sym)
+{
+  /* Normal symbols don't reference themselves.  */
+  if (sym->formal_ns == nullptr)
+return false;
+
+  /* Procedures at the root of the file do have a self reference, but they don't
+ have a reference in a parent namespace preventing the release of the
+ procedure namespace, so they can use the normal reference counting.  */
+  if (sym->formal_ns == sym->ns)
+return false;
+
+  /* If sym->refs == 1, we can use normal reference counting.  If sym->refs > 2,
+ the symbol won't be freed anyway, with or without cyclic reference.  */
+  if (sym->refs != 2)
+return false;
+
+  /* Procedure symbols host-associated from a module in submodules are special,
+ because the namespace of the procedure block in the submodule is different
+ from the FORMAL_NS namespace generated by host-association.  So there are
+ two different namespaces representing the same procedure namespace.  As
+ FORMAL_NS comes from host-association, which only imports symbols visible
+ from the outside (dummy arguments basically), we can assume there is no
+ self reference through FORMAL_NS in that case.  */
+  if (sym->attr.host_assoc && sym->attr.used_in_submodule)
+return false;
+
+  /* We can assume that contained procedures have cyclic references, because
+ the symbol of the procedure itself is accessible in the procedure body
+ namespace.  So we assume that symbols with a formal namespace different
+ from the declaration namespace and two references, one of which is about
+ to be removed, are procedures with just the self reference left.  At this
+ point, the symbol SYM matches that pattern, so we return true here to
+ permit the release of SYM.  */
+  return true;
+}
+
+
 /* Decrease the reference counter and free memory when we reach zero.
Returns true if the symbol has been freed, false otherwise.  */
 
@@ -3188,8 +3239,7 @@ gfc_release_symbol (gfc_symbol *&sym)
   if (sym == NULL)
 return false;
 
-  if (sym->formal_ns != NULL && sym->refs == 2 && sym->formal_ns != sym->ns
-  && (!sym->attr.entry || !sym->module))
+  if (cyclic_reference_break_needed (sym))
 {
   /* As formal_ns contains a reference to sym, delete formal_ns just
 before the deletion of sym.  */
diff --git a/gcc/testsuite/gfortran.dg/submodule_33.f08 b/gcc/testsuite/gfortran.dg/submodule_33.f08
new file mode 100644
index 000..b61d750def1
-

[PATCH] c++/modules: Ensure all partial specialisations are tracked [PR114947]

2024-05-12 Thread Nathaniel Shead
Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?

-- >8 --

Constrained partial specialisations aren't all necessarily tracked on
the instantiation table.  The modules code uses a separate
'partial_specializations' table to track them instead to ensure that
they get walked and emitted when emitting a module, but currently this
does not always happen.

The attached testcase fails in two ways.  First, because the partial
specialisation is just a declaration (and not a definition),
'set_defining_module' never ends up getting called on it and so it never
gets added to the partial specialisation table.  We fix this by ensuring
that when partial specializations are created they always get added, and
so we never miss one. To prevent adding partial specialisations multiple
times we split this out as a new function.

The second way it fails is that when exporting the primary interface for
a module with partitions, we also re-walk the specializations of all
imported partitions to merge them into a single BMI.  So this patch
ensures that after calling 'match_mergeable_specialization' we also
ensure that if the name came from a partition it gets added to the
specialization table so that a dependency is correctly created for it.
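
As a reminder of what "just a declaration (and not a definition)" means for a
partial specialisation, here is a small standalone sketch. It is not the
testcase from the PR: it uses an unconstrained partial specialisation so that
it compiles as C++11, and the names holder and is_complete are hypothetical.

```cpp
#include <cassert>
#include <type_traits>

// Primary template, fully defined.
template <typename T>
struct holder
{
  static constexpr int kind = 0;
};

// Partial specialisation that is declared but never defined.  It still
// takes part in matching, so holder<int *> names an incomplete type.
template <typename T>
struct holder<T *>;

// Detects whether T is complete at the point of instantiation.
template <typename T, typename = void>
struct is_complete : std::false_type
{
};

template <typename T>
struct is_complete<T, decltype (void (sizeof (T)))> : std::true_type
{
};
```

Here is_complete<holder<int>>::value is true and
is_complete<holder<int *>>::value is false: the declaration-only partial
specialisation is selected even though it has no body, which is the kind of
entity the description above says was being missed.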

PR c++/114947

gcc/cp/ChangeLog:

* cp-tree.h (set_defining_module_for_partial_spec): Declare.
* module.cc (trees_in::decl_value): Track partial specs coming
from partitions.
(set_defining_module): Don't track partial specialisations here
anymore.
(set_defining_module_for_partial_spec): New function.
* pt.cc (process_partial_specialization): Call it.

gcc/testsuite/ChangeLog:

* g++.dg/modules/partial-4_a.C: New test.
* g++.dg/modules/partial-4_b.C: New test.

Signed-off-by: Nathaniel Shead 
---
 gcc/cp/cp-tree.h   |  1 +
 gcc/cp/module.cc   | 22 ++
 gcc/cp/pt.cc   |  2 ++
 gcc/testsuite/g++.dg/modules/partial-4_a.C |  8 
 gcc/testsuite/g++.dg/modules/partial-4_b.C |  5 +
 5 files changed, 34 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/modules/partial-4_a.C
 create mode 100644 gcc/testsuite/g++.dg/modules/partial-4_b.C

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index db098c32f2d..2580bf05fb2 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -7418,6 +7418,7 @@ extern unsigned get_importing_module (tree, bool = false) ATTRIBUTE_PURE;
 /* Where current instance of the decl got declared/defined/instantiated.  */
 extern void set_instantiating_module (tree);
 extern void set_defining_module (tree);
+extern void set_defining_module_for_partial_spec (tree);
 extern void maybe_key_decl (tree ctx, tree decl);
 extern void propagate_defining_module (tree decl, tree orig);
 extern void remove_defining_module (tree decl);
diff --git a/gcc/cp/module.cc b/gcc/cp/module.cc
index 520dd710549..3ca963cb3e9 100644
--- a/gcc/cp/module.cc
+++ b/gcc/cp/module.cc
@@ -8416,6 +8416,11 @@ trees_in::decl_value ()
  add_mergeable_specialization (!is_type, &spec, decl, spec_flags);
}
 
+  /* When making a CMI from a partition we're going to need to walk partial
+specializations again, so make sure they're tracked.  */
+  if (state->is_partition () && (spec_flags & 2))
+   set_defining_module_for_partial_spec (inner);
+
   if (NAMESPACE_SCOPE_P (decl)
  && (mk == MK_named || mk == MK_unique
  || mk == MK_enum || mk == MK_friend_spec)
@@ -19246,13 +19251,22 @@ set_defining_module (tree decl)
  vec_safe_push (class_members, decl);
}
}
-  else if (DECL_IMPLICIT_TYPEDEF_P (decl)
-  && CLASSTYPE_TEMPLATE_SPECIALIZATION (TREE_TYPE (decl)))
-   /* This is a partial or explicit specialization.  */
-   vec_safe_push (partial_specializations, decl);
 }
 }
 
+/* Also remember DECL if it's a newly declared class template partial
+   specialization, because these are not necessarily added to the
+   instantiation tables.  */
+
+void
+set_defining_module_for_partial_spec (tree decl)
+{
+  if (module_p ()
+  && DECL_IMPLICIT_TYPEDEF_P (decl)
+  && CLASSTYPE_TEMPLATE_SPECIALIZATION (TREE_TYPE (decl)))
+vec_safe_push (partial_specializations, decl);
+}
+
 void
 set_originating_module (tree decl, bool friend_p ATTRIBUTE_UNUSED)
 {
diff --git a/gcc/cp/pt.cc b/gcc/cp/pt.cc
index 1816bfd1f40..6d33bac90b0 100644
--- a/gcc/cp/pt.cc
+++ b/gcc/cp/pt.cc
@@ -5456,6 +5456,8 @@ process_partial_specialization (tree decl)
   gcc_checking_assert (!TI_PARTIAL_INFO (tinfo));
   TI_PARTIAL_INFO (tinfo) = build_template_info (tmpl, NULL_TREE);
 
+  set_defining_module_for_partial_spec (decl);
+
   for (inst = DECL_TEMPLATE_INSTANTIATIONS (maintmpl); inst;
inst = TREE_CHAIN (inst))
 {
diff --git a/gcc/testsuite/g++.dg/modules/partial-4_a.C 
b/gcc/testsuite/g++.dg/modules/partia

[PATCH v3] driver: Output to a temp file; rename upon success [PR80182]

2024-05-12 Thread Peter Damianov
Currently, commands like:
gcc -o file.c -lm
will delete the user's code.

This patch makes the linker write executables to a temp file, and then renames
the temp file if successful. This fixes the case above, but has limitations.
The source file will still get overwritten if the link "succeeds", such as the
case of: gcc -o file.c -lm -r

It's not perfect, but it should hopefully stop some people from ruining their
day.
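
The general pattern (write to a temporary name, rename only on success) can
be sketched in a few lines of standard C++. This is an illustration under
stated assumptions, not the driver code: write_then_rename and read_all are
hypothetical helpers, and the step_succeeded flag stands in for "the linker
ran without errors".

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Write CONTENTS to TMP_PATH, then move it over FINAL_PATH only when the
// write worked and STEP_SUCCEEDED is true.  Otherwise the temporary file
// is removed and FINAL_PATH is left untouched.
bool write_then_rename (const std::string &tmp_path,
			const std::string &final_path,
			const std::string &contents,
			bool step_succeeded)
{
  std::FILE *f = std::fopen (tmp_path.c_str (), "wb");
  if (!f)
    return false;
  bool ok = (std::fwrite (contents.data (), 1, contents.size (), f)
	     == contents.size ());
  ok = (std::fclose (f) == 0) && ok;

  if (!ok || !step_succeeded)
    {
      std::remove (tmp_path.c_str ());
      return false;
    }
  return std::rename (tmp_path.c_str (), final_path.c_str ()) == 0;
}

// Read a whole file into a string; returns an empty string on failure.
std::string read_all (const std::string &path)
{
  std::string out;
  std::FILE *f = std::fopen (path.c_str (), "rb");
  if (!f)
    return out;
  char buf[256];
  std::size_t n;
  while ((n = std::fread (buf, 1, sizeof buf, f)) > 0)
    out.append (buf, n);
  std::fclose (f);
  return out;
}
```

One caveat of this sketch: whether std::rename replaces an existing
destination is implementation-defined. POSIX systems replace it, while some
other hosts fail instead, so a portable version would need extra handling.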

gcc/ChangeLog:
PR driver/80182
* gcc.cc (output_file_temp): New global variable.
(driver_handle_option): Create temp file for executable output.
(driver::maybe_run_linker): Rename output_file_temp to output_file if
the linker ran successfully.

Signed-off-by: Peter Damianov 
---

v3: don't attempt to create temp files -> rename for -o /dev/null

 gcc/gcc.cc | 53 +
 1 file changed, 37 insertions(+), 16 deletions(-)

diff --git a/gcc/gcc.cc b/gcc/gcc.cc
index 830a4700a87..5e38c6e578a 100644
--- a/gcc/gcc.cc
+++ b/gcc/gcc.cc
@@ -2138,6 +2138,11 @@ static int have_E = 0;
 /* Pointer to output file name passed in with -o. */
 static const char *output_file = 0;
 
+/* We write the output file to a temp file, and rename it if linking
+   is successful. This is to prevent mistakes like: gcc -o file.c -lm from
+   deleting the user's code.  */
+static const char *output_file_temp = 0;
+
 /* Pointer to input file name passed in with -truncate.
This file should be truncated after linking. */
 static const char *totruncate_file = 0;
@@ -4610,10 +4615,18 @@ driver_handle_option (struct gcc_options *opts,
 #if defined(HAVE_TARGET_EXECUTABLE_SUFFIX) || defined(HAVE_TARGET_OBJECT_SUFFIX)
   arg = convert_filename (arg, ! have_c, 0);
 #endif
-  output_file = arg;
+  output_file_temp = output_file = arg;
+  /* If creating an executable, create a temp file for the output, unless
+ -o /dev/null was requested. This will later get renamed, if the linker
+ succeeds.  */
+  if (!have_c && strcmp (output_file, HOST_BIT_BUCKET) != 0)
+{
+  output_file_temp = make_temp_file ("");
+  record_temp_file (output_file_temp, false, true);
+}
   /* On some systems, ld cannot handle "-o" without a space.  So
 split the option from its argument.  */
-  save_switch ("-o", 1, &arg, validated, true);
+  save_switch ("-o", 1, &output_file_temp, validated, true);
   return true;
 
 case OPT_pie:
@@ -9266,22 +9279,30 @@ driver::maybe_run_linker (const char *argv0) const
   linker_was_run = (tmp != execution_count);
 }
 
-  /* If options said don't run linker,
- complain about input files to be given to the linker.  */
-
-  if (! linker_was_run && !seen_error ())
-for (i = 0; (int) i < n_infiles; i++)
-  if (explicit_link_files[i]
- && !(infiles[i].language && infiles[i].language[0] == '*'))
+  if (!seen_error ())
+{
+  if (linker_was_run)
+   /* If the linker finished without errors, rename the output from the
+  temporary file to the real output name.  */
+   rename (output_file_temp, output_file);
+  else
{
- warning (0, "%s: linker input file unused because linking not done",
-  outfiles[i]);
- if (access (outfiles[i], F_OK) < 0)
-   /* This is can be an indication the user specifed an errorneous
-  separated option value, (or used the wrong prefix for an
-  option).  */
-   error ("%s: linker input file not found: %m", outfiles[i]);
+ /* If options said don't run linker,
+complain about input files to be given to the linker.  */
+ for (i = 0; (int) i < n_infiles; i++)
+   if (explicit_link_files[i]
+   && !(infiles[i].language && infiles[i].language[0] == '*'))
+ {
+   warning (0, "%s: linker input file unused because linking not done",
+outfiles[i]);
+   if (access (outfiles[i], F_OK) < 0)
+ /* This is can be an indication the user specifed an errorneous
+separated option value, (or used the wrong prefix for an
+option).  */
+ error ("%s: linker input file not found: %m", outfiles[i]);
+ }
}
+}
 }
 
 /* The end of "main".  */
-- 
2.39.2



[to-be-committed] [RISC-V] Improve single inverted bit extraction

2024-05-12 Thread Jeff Law
So the first time I sent this, I attached the wrong patch.  As a result 
the CI system wasn't happy.


The second time I sent the right patch, but I don't see evidence the CI 
system ran the correct patch through.  So I'm just starting over ;-)


--

So this patch fixes a minor code generation inefficiency that (IIRC) the
RAU team discovered a while ago in spec.

If we want the inverted value of a single bit we can use bext to extract
the bit, then seqz to invert the value (if viewed as a 0/1 truth value).

The RTL is fairly convoluted, but it's basically a right shift to get
the bit into position, bitwise-not then masking off all but the low bit.
So it's a 3->2 combine, hidden by the fact that and-not is a
define_insn_and_split, so it actually looks like a 2->2 combine.
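
At the source level the equivalence being exploited looks like this. The two
functions below are a hypothetical C++ sketch (the names are not from the
patch): the first is the extract-then-compare form that maps onto bext plus
seqz, the second is the shift, bitwise-not, mask form that the RTL resembles.
Both assume n is less than 32.

```cpp
#include <cassert>
#include <cstdint>

// Extract bit N and compare with zero (the bext + seqz shape).
inline bool bit_is_clear (std::uint32_t x, unsigned n)
{
  return ((x >> n) & 1u) == 0;
}

// Shift right, invert, then keep only the low bit (the RTL shape).
inline bool bit_is_clear_not_mask (std::uint32_t x, unsigned n)
{
  return ((~(x >> n)) & 1u) != 0;
}
```

The two forms agree for every x and every valid n, because inverting before
masking flips exactly the bit that the mask keeps.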

We've run this through Ventana's internal CI (which includes
zba_zbb_zbs) and I've run it in my own tester (rv64gc, rv32gcv).  I'll
wait for the upstream CI to finish with positive results before pushing.

Jeff

* config/riscv/riscv.cc (riscv_build_integer_1): Recognize cases where
we can use shNadd to improve constant synthesis.
(riscv_move_integer): Handle code generation for shNadd.

gcc/testsuite
* gcc.target/riscv/synthesis-1.c: Also count shNadd instructions.
* gcc.target/riscv/synthesis-3.c: New test.

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index d76a72d30e0..cf2fa04d4c4 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -711,6 +711,30 @@ (define_insn "*bext"
   "bext\t%0,%1,%2"
   [(set_attr "type" "bitmanip")])
 
+;; This is a bext followed by a seqz.  Normally this would be a 3->2 split
+;; But the and-not pattern with a constant operand is a define_insn_and_split,
+;; so this looks like a 2->2 split, which combine rejects.  So implement it
+;; as a define_insn_and_split as well.
+(define_insn_and_split "*bextseqzdisi"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+   (and:DI
+ (not:DI
+   (subreg:DI
+ (lshiftrt:SI
+   (match_operand:SI 1 "register_operand" "r")
+   (match_operand:QI 2 "register_operand" "r")) 0))
+  (const_int 1)))]
+  "TARGET_64BIT && TARGET_ZBS"
+  "#"
+  "&& 1"
+  [(set (match_dup 0) (and:DI (subreg:DI
+   (lshiftrt:SI (match_dup 1)
+(match_dup 2)) 0)
+ (const_int 1)))
+   (set (match_dup 0) (eq:DI (match_dup 0) (const_int 0)))]
+  ""
+  [(set_attr "type" "bitmanip")])
+
 ;; When performing `(a & (1UL << bitno)) ? 0 : -1` the combiner
 ;; usually has the `bitno` typed as X-mode (i.e. no further
 ;; zero-extension is performed around the bitno).
diff --git a/gcc/testsuite/gcc.target/riscv/zbs-bext-2.c b/gcc/testsuite/gcc.target/riscv/zbs-bext-2.c
new file mode 100644
index 000..53f47dc3afe
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zbs-bext-2.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zbs -mabi=lp64" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" } } */
+
+
+_Bool match(const int ch, int fMap) {
+return ((fMap & (1<<(ch))) == 0);
+}
+
+
+/* { dg-final { scan-assembler-times "bext\t" 1 } } */
+/* { dg-final { scan-assembler-times "seqz\t" 1 } } */
+/* { dg-final { scan-assembler-not "sraw\t" } } */
+/* { dg-final { scan-assembler-not "not\t" } } */
+/* { dg-final { scan-assembler-not "andi\t" } } */


Re: [PATCH] fortran: Assume there is no cyclic reference with submodule symbols [PR99798]

2024-05-12 Thread Paul Richard Thomas
Hi Mikael,

That is an ingenious solution. Given the complexity, I think that the
comments are well warranted.

OK for master and, I would suggest, 14-branch after a few weeks.

Thanks!

Paul

On Sun, 12 May 2024 at 14:16, Mikael Morin  wrote:

> Hello,
>
> Here is my final patch to fix the ICE of PR99798.
> It's maybe overly verbose with comments, but the memory management is
> hopefully clarified.
> I tested this with a full fortran regression test on x86_64-linux and a
> manual check with valgrind on the testcase.
> OK for master?

[PATCH, committed] Fortran: fix frontend memleak

2024-05-12 Thread Harald Anlauf
Dear all,

the attached obvious patch fixes a frontend memleak that was introduced
recently, and which shows up when checking for inquiry references.
I came across it when working on pr115039.
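
The shape of the fix is the usual "optional out-parameter" idiom: a matcher
that allocates a result only when the caller asks for one. The sketch below
is hypothetical (ref_node, live_nodes and is_inquiry_name are illustration
names, and the list of inquiry names is an assumption), and it is not the
implementation of is_inquiry_ref.

```cpp
#include <cassert>
#include <cstring>

// Toy result node; the caller owns any node handed back through OUT.
struct ref_node
{
  int kind;
};

// Counts allocations so the example can show when one happens.
static int live_nodes = 0;

// Returns true if NAME is one of the recognised inquiry names.  A node is
// allocated only when OUT is non-null, so a caller that just wants the
// yes/no answer can pass nullptr and has nothing to free.
bool is_inquiry_name (const char *name, ref_node **out)
{
  static const char *const names[] = { "re", "im", "kind", "len" };
  for (int i = 0; i < 4; i++)
    if (std::strcmp (name, names[i]) == 0)
      {
	if (out != nullptr)
	  {
	    *out = new ref_node{ i };
	    ++live_nodes;
	  }
	return true;
      }
  return false;
}
```

A caller that passes a real pointer but never frees the node leaks it on
every successful match; passing nullptr avoids the allocation altogether.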

Committed after regtesting as r15-391-g13b6ac4ebd04f0.

Thanks,
Harald

From 13b6ac4ebd04f0703d92828c9268b0b216890b0d Mon Sep 17 00:00:00 2001
From: Harald Anlauf 
Date: Sun, 12 May 2024 21:48:03 +0200
Subject: [PATCH] Fortran: fix frontend memleak

gcc/fortran/ChangeLog:

	* primary.cc (gfc_match_varspec): Replace 'ref' argument to
	is_inquiry_ref() by NULL when the result is not needed to avoid
	a memleak.
---
 gcc/fortran/primary.cc | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/fortran/primary.cc b/gcc/fortran/primary.cc
index 606e84432be..8e7833769a8 100644
--- a/gcc/fortran/primary.cc
+++ b/gcc/fortran/primary.cc
@@ -2250,7 +2250,7 @@ gfc_match_varspec (gfc_expr *primary, int equiv_flag, bool sub_flag,
 	 can be found.  If this was an inquiry reference with the same name
 	 as a derived component and the associate-name type is not derived
 	 or class, this is fixed up in 'gfc_fixup_inferred_type_refs'.  */
-  if (mm == MATCH_YES && is_inquiry_ref (name, &tmp)
+  if (mm == MATCH_YES && is_inquiry_ref (name, NULL)
 	  && !(sym->ts.type == BT_UNKNOWN
 		&& gfc_find_derived_types (sym, gfc_current_ns, name)))
 	inquiry = true;
--
2.35.3



Re: [Patch, fortran] PR113363 - ICE on ASSOCIATE and unlimited polymorphic function

2024-05-12 Thread Harald Anlauf

Hi Paul,

this looks all good now, and is OK for mainline as well as backporting!

***

While playing with the testcase, I found 3 remaining smaller issues that
are pre-existing, so they should not delay your present work.  To make
it clear: these are not regressions.

When "maliciously" perturbing the testcase by adding parentheses in the
right places, I see the following:

Replacing

  associate (var => foo ()) ! OK after r14-9489-g3fd46d859cda10

by

  associate (var => (foo ()))

gives an ICE here with 14-branch and 15-mainline.

Similarly replacing

  allocate (y, source = x(1))   ! Gave zero length here

by

  allocate (y, source = (x(1)))

Furthermore, replacing

  allocate(x, source = foo ())
by

  allocate(x, source = (foo ()))

gives a runtime segfault with both 14-branch and 15-mainline.
So this is something for another day...

Thanks for the patch!

Harald


On 12.05.24 at 13:27, Paul Richard Thomas wrote:

Hi Harald,

Please find attached my resubmission for pr113363. The changes are as
follows:
(i) The chunk in gfc_conv_procedure_call is new. This was the source of one
of the memory leaks;
(ii) The incorporation of the _len field in trans_class_assignment was done
for the pr84006 patch;
(iii) The source of all the invalid memory accesses and so on was down to
the use of realloc. I tried all sorts of workarounds such as testing the
vptrs and the sizes but only free followed by malloc worked. I have no idea
at all why this is the case; and
(iv) I took account of your remarks about the chunk in trans-array.cc by
removing it and that the chunk in trans-stmt.cc would leak frontend memory.

OK for mainline (and -14 branch after a few weeks)?

Regards

Paul

Fortran: Fix wrong code in unlimited polymorphic assignment [PR113363]

2024-05-12  Paul Thomas  

gcc/fortran
PR fortran/113363
* trans-array.cc (gfc_array_init_size): Use the expr3 dtype so
that the correct element size is used.
* trans-expr.cc (gfc_conv_procedure_call): Remove restriction
that ss and ss->loop be present for the finalization of class
array function results.
(trans_class_assignment): Use free and malloc, rather than
realloc, for character expressions assigned to unlimited poly
entities.
* trans-stmt.cc (gfc_trans_allocate): Build a correct rhs for
the assignment of an unlimited polymorphic 'source'.

gcc/testsuite/
PR fortran/113363
* gfortran.dg/pr113363.f90: New test.



The first chunk in trans-array.cc ensures that the array dtype is set to
the source dtype. The second chunk ensures that the lhs _len field does
not default to zero and so is specific to dynamic types of character.



Why the two gfc_copy_ref?  valgrind pointed me to the tail
of gfc_copy_ref which already has:

dest->next = gfc_copy_ref (src->next);

so this looks redundant and leaks frontend memory?

***

Playing with the testcase, I find several invalid writes with
valgrind, or a heap buffer overflow with -fsanitize=address .










[PATCH] rtlanal: Correct cost regularization in pattern_cost

2024-05-12 Thread HAO CHEN GUI
Hi,
   The cost returned from set_src_cost might be zero. Zero for
pattern_cost means unknown cost, so the regularization converts the zero
to COSTS_N_INSNS (1).

   // pattern_cost
   cost = set_src_cost (SET_SRC (set), GET_MODE (SET_DEST (set)), speed);
   return cost > 0 ? cost : COSTS_N_INSNS (1);

   But if set_src_cost returns a nonzero value less than COSTS_N_INSNS (1), it is
left untouched and returned as-is by pattern_cost. Thus a "zero" from set_src_cost
ends up costing more than a "one" from set_src_cost.

  For instance, i386 returns cost "one" for zero_extend op.
//ix86_rtx_costs
case ZERO_EXTEND:
  /* The zero extensions is often completely free on x86_64, so make
 it as cheap as possible.  */
  if (TARGET_64BIT && mode == DImode
  && GET_MODE (XEXP (x, 0)) == SImode)
*total = 1;

  This patch fixes the problem by converting all costs which are less than
COSTS_N_INSNS (1) to COSTS_N_INSNS (1).
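
  As an illustration only (this is not part of the patch), the difference
between the old and the new regularization can be sketched in standalone C.
The sketch assumes the usual rtl.h definition of COSTS_N_INSNS as N * 4;
regularize_old and regularize_new are hypothetical names standing in for
the final return statement of pattern_cost.

```c
/* Stand-in for GCC's COSTS_N_INSNS (assumed to be N * 4, as in rtl.h).  */
#define COSTS_N_INSNS(N) ((N) * 4)

/* Old behaviour: only a zero (unknown) source cost is bumped to the
   cost of one instruction; small nonzero costs pass through.  */
static int regularize_old(int cost)
{
    return cost > 0 ? cost : COSTS_N_INSNS(1);
}

/* Proposed behaviour: every cost below one instruction, including
   zero, is clamped to the cost of one instruction.  */
static int regularize_new(int cost)
{
    return cost > COSTS_N_INSNS(1) ? cost : COSTS_N_INSNS(1);
}
```

With these definitions, a source cost of 0 becomes 4 under both versions,
but a source cost of 1 stays 1 under the old code and becomes 4 under the
new code, which removes the inversion described above.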

  Bootstrapped and tested on x86 and powerpc64-linux BE and LE with no
regressions. Is it OK for the trunk?

Thanks
Gui Haochen

ChangeLog
rtlanal: Correct cost regularization in pattern_cost

For the pattern_cost (insn_cost), the smallest known cost is
COSTS_N_INSNS (1) and zero means the cost is unknown.  The method calls
set_src_cost which might return 0 or a value less than COSTS_N_INSNS (1).
For these cases, pattern_cost should always return COSTS_N_INSNS (1).
Current regularization is wrong and a value less than COSTS_N_INSNS (1)
but larger than 0 will be returned.  This patch corrects it.

gcc/
* rtlanal.cc (pattern_cost): Return COSTS_N_INSNS (1) when the cost
is less than COSTS_N_INSNS (1).

patch.diff
diff --git a/gcc/rtlanal.cc b/gcc/rtlanal.cc
index 4158a531bdd..f7b3d7d72ce 100644
--- a/gcc/rtlanal.cc
+++ b/gcc/rtlanal.cc
@@ -5762,7 +5762,7 @@ pattern_cost (rtx pat, bool speed)
 return 0;

   cost = set_src_cost (SET_SRC (set), GET_MODE (SET_DEST (set)), speed);
-  return cost > 0 ? cost : COSTS_N_INSNS (1);
+  return cost > COSTS_N_INSNS (1) ? cost : COSTS_N_INSNS (1);
 }

 /* Calculate the cost of a single instruction.  A return value of zero


[to-be-committed] [RISC-V] Improve single inverted bit extraction - v2

2024-05-12 Thread Jeff Law


So the first version failed CI and after looking at the patch again, I 
think it can be improved.


First, the output pattern might as well go ahead and use the 
zero_extract form.


Second, we should be able to handle cases where all the ops are in 
word_mode as well as when the shift is in a narrow mode.


Third, the testcase should cover additional modes.

Fourth, fix some lint issues with tabs vs spaces.

This has only been lightly tested, so it should be interesting to see 
what CI shows.
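
For readers following along, here is a rough C sketch (not part of the
patch) of why a single inverted bit test can be done as a bit extract
followed by a compare with zero. The function names are hypothetical, the
operands are unsigned 64-bit values to avoid undefined shifts, and the
shift count is assumed to be below 64.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Source-level form, as in the testcase: is bit CH of FMAP clear?  */
static bool bit_clear_mask(unsigned ch, uint64_t fmap)
{
    return (fmap & ((uint64_t)1 << ch)) == 0;
}

/* Shape of the RTL the new pattern matches: (~(fmap >> ch)) & 1.  */
static bool bit_clear_rtl(unsigned ch, uint64_t fmap)
{
    return ((~(fmap >> ch)) & 1) != 0;
}

/* Shape of the split result: extract one bit, then test it for zero.  */
static bool bit_clear_bext_seqz(unsigned ch, uint64_t fmap)
{
    uint64_t bit = (fmap >> ch) & 1;  /* single-bit extract */
    return bit == 0;                  /* set-if-equal-to-zero */
}

/* Check that all three forms agree on a few sample values.  */
static bool forms_agree(void)
{
    static const uint64_t samples[] = {
        0, 1, 0x80000000u, 0xdeadbeefcafef00dULL, UINT64_MAX
    };
    for (size_t s = 0; s < sizeof samples / sizeof samples[0]; s++)
        for (unsigned ch = 0; ch < 64; ch++) {
            bool want = bit_clear_mask(ch, samples[s]);
            if (bit_clear_rtl(ch, samples[s]) != want
                || bit_clear_bext_seqz(ch, samples[s]) != want)
                return false;
        }
    return true;
}
```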


Jeff

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index d76a72d30e0..724511b6df3 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -711,6 +711,49 @@ (define_insn "*bext"
   "bext\t%0,%1,%2"
   [(set_attr "type" "bitmanip")])
 
+;; This is a bext followed by a seqz.  Normally this would be a 3->2 split
+;; But the and-not pattern with a constant operand is a define_insn_and_split,
+;; so this looks like a 2->2 split, which combine rejects.  So implement it
+;; as a define_insn_and_split as well.
+(define_insn_and_split "*bextseqzdisi"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+   (and:DI
+ (not:DI
+   (subreg:DI
+ (lshiftrt:SI
+   (match_operand:SI 1 "register_operand" "r")
+   (match_operand:QI 2 "register_operand" "r")) 0))
+ (const_int 1)))]
+  "TARGET_64BIT && TARGET_ZBS"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+   (zero_extract:DI (match_dup 1)
+(const_int 1)
+(zero_extend:DI (match_dup 2))))
+   (set (match_dup 0) (eq:DI (match_dup 0) (const_int 0)))]
+  "operands[1] = gen_lowpart (word_mode, operands[1]);"
+  [(set_attr "type" "bitmanip")])
+
+(define_insn_and_split "*bextseqzdisi"
+  [(set (match_operand:X 0 "register_operand" "=r")
+   (and:X
+ (not:X
+   (lshiftrt:X
+ (match_operand:X 1 "register_operand" "r")
+ (match_operand:QI 2 "register_operand" "r")))
+ (const_int 1)))]
+  "TARGET_ZBS"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+   (zero_extract:X (match_dup 1)
+   (const_int 1)
+   (zero_extend:X (match_dup 2))))
+   (set (match_dup 0) (eq:X (match_dup 0) (const_int 0)))]
+  "operands[1] = gen_lowpart (word_mode, operands[1]);"
+  [(set_attr "type" "bitmanip")])
+
 ;; When performing `(a & (1UL << bitno)) ? 0 : -1` the combiner
 ;; usually has the `bitno` typed as X-mode (i.e. no further
 ;; zero-extension is performed around the bitno).
diff --git a/gcc/testsuite/gcc.target/riscv/zbs-bext-2.c 
b/gcc/testsuite/gcc.target/riscv/zbs-bext-2.c
new file mode 100644
index 000..719df442fed
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zbs-bext-2.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zbs -mabi=lp64" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" } } */
+
+
+_Bool match(const int ch, int fMap) {
+return ((fMap & (1<<(ch))) == 0);
+}
+
+_Bool match2(const int ch, int fMap) {
+return ((fMap & (1UL<<(ch))) == 0);
+}
+
+
+/* { dg-final { scan-assembler-times "bext\t" 1 } } */
+/* { dg-final { scan-assembler-times "seqz\t" 1 } } */
+/* { dg-final { scan-assembler-not "sraw\t" } } */
+/* { dg-final { scan-assembler-not "not\t" } } */
+/* { dg-final { scan-assembler-not "andi\t" } } */


[x86 SSE] Improve handling of ternlog instructions in i386/sse.md

2024-05-12 Thread Roger Sayle

This patch improves the way that the x86 backend recognizes and
expands AVX512's bitwise ternary logic (vpternlog) instructions.

As a motivating example consider the following code which calculates
the carry out from a (binary) full adder:

typedef unsigned long long v4di __attribute((vector_size(32)));

v4di foo(v4di a, v4di b, v4di c)
{
return (a & b) | ((a ^ b) & c);
}

with -O2 -march=cascadelake current mainline produces:

foo:    vpternlogq  $96, %ymm0, %ymm1, %ymm2
        vmovdqa %ymm0, %ymm3
        vmovdqa %ymm2, %ymm0
        vpternlogq  $248, %ymm3, %ymm1, %ymm0
        ret

with the patch below, we now generate a single instruction:

foo:    vpternlogq  $232, %ymm2, %ymm1, %ymm0
        ret


The AVX512 vpternlog[qd] instructions are a very cool addition to the
x86 instruction set, that can calculate any Boolean function of three
inputs in a single fast instruction.  As the truth table for any
three-input function has 8 rows, any specific function can be represented
by specifying those bits, i.e. by an 8-bit byte, an immediate integer
between 0 and 255.

Examples of ternary functions and their indices are given below:

0x01   1:  ~((b|a)|c)
0x02   2:  (~(b|a))&c
0x03   3:  ~(b|a)
0x04   4:  (~(c|a))&b
0x05   5:  ~(c|a)
0x06   6:  (c^b)&~a
0x07   7:  ~((c&b)|a)
0x08   8:  (~a&c)&b (~a&b)&c (c&b)&~a
0x09   9:  ~((c^b)|a)
0x0a  10:  ~a&c
0x0b  11:  ~((~c&b)|a) (~b|c)&~a
0x0c  12:  ~a&b
0x0d  13:  ~((~b&c)|a) (~c|b)&~a
0x0e  14:  (c|b)&~a
0x0f  15:  ~a
0x10  16:  (~(c|b))&a
0x11  17:  ~(c|b)
...
0xf4 244:  (~c&b)|a
0xf5 245:  ~c|a
0xf6 246:  (c^b)|a
0xf7 247:  (~(c&b))|a
0xf8 248:  (c&b)|a
0xf9 249:  (~(c^b))|a
0xfa 250:  c|a
0xfb 251:  (c|a)|~b (~b|a)|c (~b|c)|a
0xfc 252:  b|a
0xfd 253:  (b|a)|~c (~c|a)|b (~c|b)|a
0xfe 254:  (b|a)|c (c|a)|b (c|b)|a
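
To make the indexing concrete, here is a small standalone C sketch (an
illustration, not code from the patch). It assumes the bit numbering implied
by the table above: row (a<<2)|(b<<1)|c of the truth table is stored in that
bit of the index. The names ternlog_index, carry_out, c_or_a and not_a are
hypothetical.

```c
#include <stdint.h>

/* Build the 8-bit index by evaluating F on all eight input rows.
   Row number is (a << 2) | (b << 1) | c.  */
static uint8_t ternlog_index(int (*f)(int a, int b, int c))
{
    uint8_t idx = 0;
    for (int row = 0; row < 8; row++) {
        int a = (row >> 2) & 1;
        int b = (row >> 1) & 1;
        int c = row & 1;
        if (f(a, b, c) & 1)
            idx |= (uint8_t)(1u << row);
    }
    return idx;
}

/* The same index falls out of evaluating the expression once on the
   three truth-table columns a = 0xF0, b = 0xCC, c = 0xAA.  */
static uint8_t carry_out_by_columns(void)
{
    const unsigned a = 0xF0, b = 0xCC, c = 0xAA;
    return (uint8_t)((a & b) | ((a ^ b) & c));
}

/* Carry out of a full adder, as in the motivating example.  */
static int carry_out(int a, int b, int c)
{
    return (a & b) | ((a ^ b) & c);
}

/* Two entries from the table above, for cross-checking.  */
static int c_or_a(int a, int b, int c) { (void)b; return c | a; }
static int not_a(int a, int b, int c) { (void)b; (void)c; return !a; }
```

Under these assumptions the carry-out function maps to 232 (0xe8), matching
the immediate in the single-instruction output above, while c|a maps to 250
and ~a maps to 15, matching the table.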

A naive implementation (in many compilers) might be to add define_insn
patterns for all 256 different functions.  The situation is even
worse as many of these Boolean functions don't have a "canonical form"
(as produced by simplify_rtx) and would each need multiple patterns.
See the space-separated equivalent expressions in the table above.

This need to provide instruction "templates" might explain why GCC,
LLVM and ICC all exhibit similar coverage problems in their ability
to recognize x86 ternlog ternary functions.

Perhaps a unique feature of GCC's design is that in addition to regular
define_insn templates, machine descriptions can also perform pattern
matching via a match_operator (and its corresponding predicate).
This patch introduces a ternlog_operand predicate that matches a
(possibly infinite) set of expression trees, identifying those that
have at most three unique operands.  This then allows a
define_insn_and_split to recognize suitable expressions and then
transform them into the appropriate UNSPEC_VTERNLOG as a pre-reload
splitter.  This design allows combine to smash together arbitrarily
complex Boolean expressions, then transform them into an UNSPEC
before register allocation.  As an "optimization", where possible
ix86_expand_ternlog generates a simpler binary operation, using
AND, XOR, IOR or ANDN where possible, and in a few cases attempts
to "canonicalize" the ternlog, by reordering or duplicating operands,
so that later CSE passes have a hope of spotting equivalent values.
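
The general idea of walking an expression tree, collecting at most three
distinct operands and computing the index on the way, can be sketched as
follows. This is a toy model only: struct expr, tern_idx and the demo
functions are hypothetical and operate on a made-up tree type rather than on
RTL, so they should not be read as the implementation in the patch.

```c
#include <stddef.h>

enum op { LEAF, NOT, AND, IOR, XOR };

/* Toy expression node; leaf_id identifies the operand of a LEAF.  */
struct expr {
    enum op op;
    int leaf_id;
    const struct expr *lhs, *rhs;
};

/* Truth-table columns for the first, second and third distinct leaf.  */
static const int column[3] = { 0xF0, 0xCC, 0xAA };

/* Return the 8-bit index of E, recording distinct leaves in ARGS, or
   -1 if E uses more than three distinct leaves.  */
static int tern_idx(const struct expr *e, int args[3], int *nargs)
{
    int l, r;
    switch (e->op) {
    case LEAF:
        for (int i = 0; i < *nargs; i++)
            if (args[i] == e->leaf_id)
                return column[i];      /* operand seen before */
        if (*nargs == 3)
            return -1;                 /* a fourth operand: give up */
        args[*nargs] = e->leaf_id;
        return column[(*nargs)++];
    case NOT:
        l = tern_idx(e->lhs, args, nargs);
        return l < 0 ? -1 : (~l & 0xFF);
    default:
        l = tern_idx(e->lhs, args, nargs);
        if (l < 0)
            return -1;
        r = tern_idx(e->rhs, args, nargs);
        if (r < 0)
            return -1;
        if (e->op == AND)
            return l & r;
        if (e->op == IOR)
            return l | r;
        return l ^ r;
    }
}

/* (a & b) | ((a ^ b) & c): three distinct operands.  */
static int demo_carry_out(void)
{
    const struct expr a = { LEAF, 0, NULL, NULL };
    const struct expr b = { LEAF, 1, NULL, NULL };
    const struct expr c = { LEAF, 2, NULL, NULL };
    const struct expr ab = { AND, 0, &a, &b };
    const struct expr axb = { XOR, 0, &a, &b };
    const struct expr axbc = { AND, 0, &axb, &c };
    const struct expr top = { IOR, 0, &ab, &axbc };
    int args[3], nargs = 0;
    return tern_idx(&top, args, &nargs);
}

/* (a & b) | (c ^ d): four distinct operands, so not a ternlog.  */
static int demo_four_operands(void)
{
    const struct expr a = { LEAF, 0, NULL, NULL };
    const struct expr b = { LEAF, 1, NULL, NULL };
    const struct expr c = { LEAF, 2, NULL, NULL };
    const struct expr d = { LEAF, 3, NULL, NULL };
    const struct expr ab = { AND, 0, &a, &b };
    const struct expr cd = { XOR, 0, &c, &d };
    const struct expr top = { IOR, 0, &ab, &cd };
    int args[3], nargs = 0;
    return tern_idx(&top, args, &nargs);
}
```

In this toy model the carry-out tree evaluates to 232, and the four-operand
tree is rejected with -1.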

Another benefit of this patch is that it improves the code
generated for PR target/115021 [see comment #1].

This patch leaves the existing ternlog patterns in sse.md (for now),
many of which are made obsolete by these changes.  In theory we now
only need one define_insn for UNSPEC_VTERNLOG.  One complication from
these previous variants was that they inconsistently used decimal vs.
hexadecimal to specify the immediate constant operand in assembly
language, making the list of tweaks to the testsuite with this patch
larger than it might have been.  I propose to remove the vestigial
patterns in a follow-up patch, once this approach has baked (proven
to be stable) on mainline.


This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?

2024-05-12  Roger Sayle  

gcc/ChangeLog
PR target/115021
* config/i386/i386-expand.cc (ix86_expand_args_builtin): Call
fixup_modeless_constant before testing predicates.  Only call
copy_to_mode_reg on memory operands (after the first one).
(ix86_gen_bcst_mem): Helper function to convert a CONST_VECTOR
into a VEC_DUPLICATE if possible.
(ix86_ternlog_idx):  Convert an RTX expression into a ternlog
index between 0 and 255, recording the operands in ARGS, if
possible or return -1 if this is not possible/valid.
(ix86_ternlog_leaf_p): Helper function to identify "leaves"
of a ternlog expression, e.g. REG_P, MEM_P, CONST_VECTOR, etc.
(ix86_ternlog_operand_p): Test whether an expression is suitable
for and

[SUBREG V4 0/4] Add DF_LIVE_SUBREG data and apply to IRA and LRA

2024-05-12 Thread Juzhe-Zhong
V3: Address comments from Dimitar Dimitrov

V4: Move detailed function from subreg-live-range.h to subreg-live-range.cc.

These patches add a new data flow problem, DF_LIVE_SUBREG,
which tracks subreg liveness, and then apply it to the IRA and LRA
passes (enabled via -O3 or -ftrack-subreg-liveness). These patches
are for GCC 15, and the code has been pushed to the devel/subreg-coalesce
branch. In addition, my colleague Shuo Chen will also be involved in some
of the remaining work; thank you for your support.

These patches are separated from the subreg-coalesce patches submitted
a few months ago. I refactored the code according to the comments. The next
patches will support subreg coalescing based on them. Here are some data
about the build time of SPEC INT 2017 (x86-64 target):

                          baseline   baseline(+track-subreg-liveness)
specint2017 build time :  1892s      1883s

Regarding build times, I've run it a few times, and the runs with tracking
enabled all seem to take less time. Since the difference is small, it's
possible that it's just a change in environment. But a real improvement is
theoretically possible, since supporting subreg liveness could have reduced
the number of live regs.

For memory usage, I tried PR 69609 under valgrind: peak memory size grew from
2003910656 to 2003947520, a very small increase.

Note that these patches don't enable register coalescing with subreg liveness
in IRA/LRA, so no performance change is expected.

We will enable register coalescing with subreg liveness tracking in the
follow-up patches.

Bootstrapped and regtested on x86-64 with no regressions.

Co-authored-by: Lehua Ding 

Juzhe-Zhong (4):
  DF: Add -ftrack-subreg-liveness option
  DF: Add DF_LIVE_SUBREG problem
  IRA: Apply DF_LIVE_SUBREG data
  LRA: Apply DF_LIVE_SUBREG data

 gcc/Makefile.in  |   1 +
 gcc/common.opt   |   4 +
 gcc/common.opt.urls  |   3 +
 gcc/df-problems.cc   | 886 ++-
 gcc/df.h | 159 +++
 gcc/doc/invoke.texi  |   8 +
 gcc/ira-build.cc |   7 +-
 gcc/ira-color.cc |   8 +-
 gcc/ira-emit.cc  |  12 +-
 gcc/ira-lives.cc |   7 +-
 gcc/ira.cc   |  19 +-
 gcc/lra-coalesce.cc  |  27 +-
 gcc/lra-constraints.cc   | 109 -
 gcc/lra-int.h|   4 +
 gcc/lra-lives.cc | 357 
 gcc/lra-remat.cc |   8 +-
 gcc/lra-spills.cc|  27 +-
 gcc/lra.cc   |  10 +-
 gcc/opts.cc  |   1 +
 gcc/regs.h   |   5 +
 gcc/sbitmap.cc   |  98 +
 gcc/sbitmap.h|   2 +
 gcc/subreg-live-range.cc | 233 ++
 gcc/subreg-live-range.h  |  60 +++
 gcc/timevar.def  |   1 +
 25 files changed, 1920 insertions(+), 136 deletions(-)
 create mode 100644 gcc/subreg-live-range.cc
 create mode 100644 gcc/subreg-live-range.h

-- 
2.36.3



[SUBREG V4 4/4] LRA: Apply DF_LIVE_SUBREG data

2024-05-12 Thread Juzhe-Zhong
---
 gcc/lra-coalesce.cc|  27 +++-
 gcc/lra-constraints.cc | 109 ++---
 gcc/lra-int.h  |   4 +
 gcc/lra-lives.cc   | 357 -
 gcc/lra-remat.cc   |   8 +-
 gcc/lra-spills.cc  |  27 +++-
 gcc/lra.cc |  10 +-
 7 files changed, 430 insertions(+), 112 deletions(-)

diff --git a/gcc/lra-coalesce.cc b/gcc/lra-coalesce.cc
index a9b5b51cb3f..9416775a009 100644
--- a/gcc/lra-coalesce.cc
+++ b/gcc/lra-coalesce.cc
@@ -186,19 +186,28 @@ static bitmap_head used_pseudos_bitmap;
 /* Set up USED_PSEUDOS_BITMAP, and update LR_BITMAP (a BB live info
bitmap).  */
 static void
-update_live_info (bitmap lr_bitmap)
+update_live_info (bitmap all, bitmap full, bitmap partial)
 {
   unsigned int j;
   bitmap_iterator bi;
 
   bitmap_clear (&used_pseudos_bitmap);
-  EXECUTE_IF_AND_IN_BITMAP (&coalesced_pseudos_bitmap, lr_bitmap,
+  EXECUTE_IF_AND_IN_BITMAP (&coalesced_pseudos_bitmap, all,
FIRST_PSEUDO_REGISTER, j, bi)
 bitmap_set_bit (&used_pseudos_bitmap, first_coalesced_pseudo[j]);
-  if (! bitmap_empty_p (&used_pseudos_bitmap))
+  if (!bitmap_empty_p (&used_pseudos_bitmap))
 {
-  bitmap_and_compl_into (lr_bitmap, &coalesced_pseudos_bitmap);
-  bitmap_ior_into (lr_bitmap, &used_pseudos_bitmap);
+  bitmap_and_compl_into (all, &coalesced_pseudos_bitmap);
+  bitmap_ior_into (all, &used_pseudos_bitmap);
+
+  if (flag_track_subreg_liveness)
+   {
+ bitmap_and_compl_into (full, &coalesced_pseudos_bitmap);
+ bitmap_ior_and_compl_into (full, &used_pseudos_bitmap, partial);
+
+ bitmap_and_compl_into (partial, &coalesced_pseudos_bitmap);
+ bitmap_ior_and_compl_into (partial, &used_pseudos_bitmap, full);
+   }
 }
 }
 
@@ -301,8 +310,12 @@ lra_coalesce (void)
   bitmap_initialize (&used_pseudos_bitmap, &reg_obstack);
   FOR_EACH_BB_FN (bb, cfun)
 {
-  update_live_info (df_get_live_in (bb));
-  update_live_info (df_get_live_out (bb));
+  update_live_info (df_get_subreg_live_in (bb),
+   df_get_subreg_live_full_in (bb),
+   df_get_subreg_live_partial_in (bb));
+  update_live_info (df_get_subreg_live_out (bb),
+   df_get_subreg_live_full_out (bb),
+   df_get_subreg_live_partial_out (bb));
   FOR_BB_INSNS_SAFE (bb, insn, next)
if (INSN_P (insn)
&& bitmap_bit_p (&involved_insns_bitmap, INSN_UID (insn)))
diff --git a/gcc/lra-constraints.cc b/gcc/lra-constraints.cc
index e945a4da451..c9246e6be58 100644
--- a/gcc/lra-constraints.cc
+++ b/gcc/lra-constraints.cc
@@ -6565,34 +6565,86 @@ update_ebb_live_info (rtx_insn *head, rtx_insn *tail)
{
  if (prev_bb != NULL)
{
- /* Update df_get_live_in (prev_bb):  */
+ /* Update subreg live (prev_bb):  */
+ bitmap subreg_all_in = df_get_subreg_live_in (prev_bb);
+ bitmap subreg_full_in = df_get_subreg_live_full_in (prev_bb);
+ bitmap subreg_partial_in = df_get_subreg_live_partial_in 
(prev_bb);
+ subregs_live *range_in = df_get_subreg_live_range_in (prev_bb);
  EXECUTE_IF_SET_IN_BITMAP (&check_only_regs, 0, j, bi)
if (bitmap_bit_p (&live_regs, j))
- bitmap_set_bit (df_get_live_in (prev_bb), j);
-   else
- bitmap_clear_bit (df_get_live_in (prev_bb), j);
+ {
+   bitmap_set_bit (subreg_all_in, j);
+   if (flag_track_subreg_liveness)
+ {
+   bitmap_set_bit (subreg_full_in, j);
+   if (bitmap_bit_p (subreg_partial_in, j))
+ {
+   bitmap_clear_bit (subreg_partial_in, j);
+   range_in->remove_range (j);
+ }
+ }
+ }
+   else if (bitmap_bit_p (subreg_all_in, j))
+ {
+   bitmap_clear_bit (subreg_all_in, j);
+   if (flag_track_subreg_liveness)
+ {
+   bitmap_clear_bit (subreg_full_in, j);
+   if (bitmap_bit_p (subreg_partial_in, j))
+ {
+   bitmap_clear_bit (subreg_partial_in, j);
+   range_in->remove_range (j);
+ }
+ }
+ }
}
+ bitmap subreg_all_out = df_get_subreg_live_out (curr_bb);
  if (curr_bb != last_bb)
{
- /* Update df_get_live_out (curr_bb):  */
+ /* Update subreg live (curr_bb):  */
+ bitmap subreg_full_out = df_get_subreg_live_full_out (curr_bb);
+ bitmap subreg_partial_out = df_get_subreg_live_partial_out 
(curr_bb);
+ subregs_live *range_out = df_get_subreg_liv

[SUBREG V4 1/4] DF: Add -ftrack-subreg-liveness option

2024-05-12 Thread Juzhe-Zhong
---
 gcc/common.opt  | 4 
 gcc/common.opt.urls | 3 +++
 gcc/doc/invoke.texi | 8 
 gcc/opts.cc | 1 +
 4 files changed, 16 insertions(+)

diff --git a/gcc/common.opt b/gcc/common.opt
index 40cab3cb36a..5710e817abe 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2163,6 +2163,10 @@ fira-share-spill-slots
 Common Var(flag_ira_share_spill_slots) Init(1) Optimization
 Share stack slots for spilled pseudo-registers.
 
+ftrack-subreg-liveness
+Common Var(flag_track_subreg_liveness) Init(0) Optimization
+Track subreg liveness information.
+
 fira-verbose=
 Common RejectNegative Joined UInteger Var(flag_ira_verbose) Init(5)
 -fira-verbose= Control IRA's level of diagnostic messages.
diff --git a/gcc/common.opt.urls b/gcc/common.opt.urls
index f71ed80a34b..59f27a6f7c6 100644
--- a/gcc/common.opt.urls
+++ b/gcc/common.opt.urls
@@ -880,6 +880,9 @@ 
UrlSuffix(gcc/Optimize-Options.html#index-fira-share-save-slots)
 fira-share-spill-slots
 UrlSuffix(gcc/Optimize-Options.html#index-fira-share-spill-slots)
 
+ftrack-subreg-liveness
+UrlSuffix(gcc/Optimize-Options.html#index-ftrack-subreg-liveness)
+
 fira-verbose=
 UrlSuffix(gcc/Developer-Options.html#index-fira-verbose)
 
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index ddcd5213f06..fbcde8aa745 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -13188,6 +13188,14 @@ Disable sharing of stack slots allocated for 
pseudo-registers.  Each
 pseudo-register that does not get a hard register gets a separate
 stack slot, and as a result function stack frames are larger.
 
+@opindex ftrack-subreg-liveness
+@item -ftrack-subreg-liveness
+Enable tracking subreg liveness information. This infomation allows IRA
+and LRA to support subreg coalesce feature which can improve the quality
+of register allocation.
+
+This option is enabled at level @option{-O3} for all targets.
+
 @opindex flra-remat
 @item -flra-remat
 Enable CFG-sensitive rematerialization in LRA.  Instead of loading
diff --git a/gcc/opts.cc b/gcc/opts.cc
index 14d1767e48f..8fe3a213807 100644
--- a/gcc/opts.cc
+++ b/gcc/opts.cc
@@ -698,6 +698,7 @@ static const struct default_options default_options_table[] 
=
 { OPT_LEVELS_3_PLUS, OPT_funswitch_loops, NULL, 1 },
 { OPT_LEVELS_3_PLUS, OPT_fvect_cost_model_, NULL, VECT_COST_MODEL_DYNAMIC 
},
 { OPT_LEVELS_3_PLUS, OPT_fversion_loops_for_strides, NULL, 1 },
+{ OPT_LEVELS_3_PLUS, OPT_ftrack_subreg_liveness, NULL, 1 },
 
 /* -O3 parameters.  */
 { OPT_LEVELS_3_PLUS, OPT__param_max_inline_insns_auto_, NULL, 30 },
-- 
2.36.3



[SUBREG V4 2/4] DF: Add DF_LIVE_SUBREG problem

2024-05-12 Thread Juzhe-Zhong
---
 gcc/Makefile.in  |   1 +
 gcc/df-problems.cc   | 886 ++-
 gcc/df.h | 159 +++
 gcc/regs.h   |   5 +
 gcc/sbitmap.cc   |  98 +
 gcc/sbitmap.h|   2 +
 gcc/subreg-live-range.cc | 233 ++
 gcc/subreg-live-range.h  |  60 +++
 gcc/timevar.def  |   1 +
 9 files changed, 1444 insertions(+), 1 deletion(-)
 create mode 100644 gcc/subreg-live-range.cc
 create mode 100644 gcc/subreg-live-range.h

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index a7f15694c34..67d2e3ca1bc 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1684,6 +1684,7 @@ OBJS = \
store-motion.o \
streamer-hooks.o \
stringpool.o \
+   subreg-live-range.o \
substring-locations.o \
target-globals.o \
targhooks.o \
diff --git a/gcc/df-problems.cc b/gcc/df-problems.cc
index 88ee0dd67fc..01f1f850925 100644
--- a/gcc/df-problems.cc
+++ b/gcc/df-problems.cc
@@ -28,6 +28,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "target.h"
 #include "rtl.h"
 #include "df.h"
+#include "subreg-live-range.h"
 #include "memmodel.h"
 #include "tm_p.h"
 #include "insn-config.h"
@@ -1344,8 +1345,891 @@ df_lr_verify_transfer_functions (void)
   bitmap_clear (&all_blocks);
 }
 
+/*
+   REGISTER AND SUBREGS LIVES
+   Like DF_LR, but include tracking subreg liveness.  Currently used to provide
+   subreg liveness related information to the register allocator.  The subreg
+   information is currently tracked for registers that satisfy the following
+   conditions:
+ 1.  REG is a pseudo register
+ 2.  MODE_SIZE > UNIT_SIZE
+ 3.  MODE_SIZE is a multiple of UNIT_SIZE
+ 4.  REG is used via subreg pattern
+   Assuming: MODE = the machine mode of the REG
+MODE_SIZE = GET_MODE_SIZE (MODE)
+UNIT_SIZE = REGMODE_NATURAL_SIZE (MODE)
+   Condition 3 is currently strict, maybe it can be removed in the future, but
+   for now it is sufficient.
+*/
+
+/* These two empty data are used as default data in case the user does not turn
+ * on the track-subreg-liveness feature.  */
+bitmap_head df_subreg_empty_bitmap;
+subregs_live df_subreg_empty_live;
+
+/* Private data for live_subreg problem.  */
+struct df_live_subreg_problem_data
+{
+  /* Record registers that need to track subreg liveness.  */
+  bitmap_head tracked_regs;
+  /* An obstack for the bitmaps we need for this problem.  */
+  bitmap_obstack live_subreg_bitmaps;
+};
+
+/* Helper functions.  */
+
+static df_live_subreg_bb_info *
+df_live_subreg_get_bb_info (unsigned int index)
+{
+  if (index < df_live_subreg->block_info_size)
+return &static_cast<df_live_subreg_bb_info *> (
+  df_live_subreg->block_info)[index];
+  else
+return nullptr;
+}
+
+static df_live_subreg_local_bb_info *
+get_live_subreg_local_bb_info (unsigned int bb_index)
+{
+  return df_live_subreg_get_bb_info (bb_index);
+}
+
+/* Return true if regno is a multireg.  */
+bool
+multireg_p (int regno)
+{
+  if (regno < FIRST_PSEUDO_REGISTER)
+return false;
+  rtx regno_rtx = regno_reg_rtx[regno];
+  machine_mode reg_mode = GET_MODE (regno_rtx);
+  poly_int64 total_size = GET_MODE_SIZE (reg_mode);
+  poly_int64 natural_size = REGMODE_NATURAL_SIZE (reg_mode);
+  return maybe_gt (total_size, natural_size)
+&& multiple_p (total_size, natural_size);
+}
+
+/* Return true if the REGNO need be track with subreg liveness.  */
+
+static bool
+need_track_subreg_p (unsigned regno)
+{
+  auto problem_data
+= (struct df_live_subreg_problem_data *) df_live_subreg->problem_data;
+  return bitmap_bit_p (&problem_data->tracked_regs, regno);
+}
+
+/* Fill RANGE with the subreg range for OP in REGMODE_NATURAL_SIZE granularity.
+ */
+void
+init_range (rtx op, sbitmap range)
+{
+  rtx reg = SUBREG_P (op) ? SUBREG_REG (op) : op;
+  machine_mode reg_mode = GET_MODE (reg);
+
+  if (!read_modify_subreg_p (op))
+{
+  bitmap_set_range (range, 0, get_nblocks (reg_mode));
+  return;
+}
+
+  rtx subreg = op;
+  machine_mode subreg_mode = GET_MODE (subreg);
+  poly_int64 offset = SUBREG_BYTE (subreg);
+  int nblocks = get_nblocks (reg_mode);
+  poly_int64 unit_size = REGMODE_NATURAL_SIZE (reg_mode);
+  poly_int64 subreg_size = GET_MODE_SIZE (subreg_mode);
+  poly_int64 left = offset + subreg_size;
+
+  int subreg_start = -1;
+  int subreg_nblocks = -1;
+  for (int i = 0; i < nblocks; i += 1)
+{
+  poly_int64 right = unit_size * (i + 1);
+  if (subreg_start < 0 && maybe_lt (offset, right))
+   subreg_start = i;
+  if (subreg_nblocks < 0 && maybe_le (left, right))
+   {
+ subreg_nblocks = i + 1 - subreg_start;
+ break;
+   }
+}
+  gcc_assert (subreg_start >= 0 && subreg_nblocks > 0);
+
+  bitmap_set_range (range, subreg_start, subreg_nblocks);
+}
+
+/* Remove R

[SUBREG V4 3/4] IRA: Apply DF_LIVE_SUBREG data

2024-05-12 Thread Juzhe-Zhong
---
 gcc/ira-build.cc |  7 ---
 gcc/ira-color.cc |  8 
 gcc/ira-emit.cc  | 12 ++--
 gcc/ira-lives.cc |  7 ---
 gcc/ira.cc   | 19 ---
 5 files changed, 30 insertions(+), 23 deletions(-)

diff --git a/gcc/ira-build.cc b/gcc/ira-build.cc
index ea593d5a087..283ff36d3dd 100644
--- a/gcc/ira-build.cc
+++ b/gcc/ira-build.cc
@@ -1921,7 +1921,8 @@ create_bb_allocnos (ira_loop_tree_node_t bb_node)
   create_insn_allocnos (PATTERN (insn), NULL, false);
   /* It might be a allocno living through from one subloop to
  another.  */
-  EXECUTE_IF_SET_IN_REG_SET (df_get_live_in (bb), FIRST_PSEUDO_REGISTER, i, bi)
+  EXECUTE_IF_SET_IN_REG_SET (df_get_subreg_live_in (bb), FIRST_PSEUDO_REGISTER,
+i, bi)
 if (ira_curr_regno_allocno_map[i] == NULL)
   ira_create_allocno (i, false, ira_curr_loop_tree_node);
 }
@@ -1937,9 +1938,9 @@ create_loop_allocnos (edge e)
   bitmap_iterator bi;
   ira_loop_tree_node_t parent;
 
-  live_in_regs = df_get_live_in (e->dest);
+  live_in_regs = df_get_subreg_live_in (e->dest);
   border_allocnos = ira_curr_loop_tree_node->border_allocnos;
-  EXECUTE_IF_SET_IN_REG_SET (df_get_live_out (e->src),
+  EXECUTE_IF_SET_IN_REG_SET (df_get_subreg_live_out (e->src),
 FIRST_PSEUDO_REGISTER, i, bi)
 if (bitmap_bit_p (live_in_regs, i))
   {
diff --git a/gcc/ira-color.cc b/gcc/ira-color.cc
index b9ae32d1b4d..bfebc48ef83 100644
--- a/gcc/ira-color.cc
+++ b/gcc/ira-color.cc
@@ -2786,8 +2786,8 @@ ira_loop_edge_freq (ira_loop_tree_node_t loop_node, int 
regno, bool exit_p)
   FOR_EACH_EDGE (e, ei, loop_node->loop->header->preds)
if (e->src != loop_node->loop->latch
&& (regno < 0
-   || (bitmap_bit_p (df_get_live_out (e->src), regno)
-   && bitmap_bit_p (df_get_live_in (e->dest), regno))))
+   || (bitmap_bit_p (df_get_subreg_live_out (e->src), regno)
+   && bitmap_bit_p (df_get_subreg_live_in (e->dest), regno))))
  freq += EDGE_FREQUENCY (e);
 }
   else
@@ -2795,8 +2795,8 @@ ira_loop_edge_freq (ira_loop_tree_node_t loop_node, int 
regno, bool exit_p)
   auto_vec<edge> edges = get_loop_exit_edges (loop_node->loop);
   FOR_EACH_VEC_ELT (edges, i, e)
if (regno < 0
-   || (bitmap_bit_p (df_get_live_out (e->src), regno)
-   && bitmap_bit_p (df_get_live_in (e->dest), regno)))
+   || (bitmap_bit_p (df_get_subreg_live_out (e->src), regno)
+   && bitmap_bit_p (df_get_subreg_live_in (e->dest), regno)))
  freq += EDGE_FREQUENCY (e);
 }
 
diff --git a/gcc/ira-emit.cc b/gcc/ira-emit.cc
index d347f11fa02..8075b082e36 100644
--- a/gcc/ira-emit.cc
+++ b/gcc/ira-emit.cc
@@ -510,8 +510,8 @@ generate_edge_moves (edge e)
 return;
   src_map = src_loop_node->regno_allocno_map;
   dest_map = dest_loop_node->regno_allocno_map;
-  regs_live_in_dest = df_get_live_in (e->dest);
-  regs_live_out_src = df_get_live_out (e->src);
+  regs_live_in_dest = df_get_subreg_live_in (e->dest);
+  regs_live_out_src = df_get_subreg_live_out (e->src);
   EXECUTE_IF_SET_IN_REG_SET (regs_live_in_dest,
 FIRST_PSEUDO_REGISTER, regno, bi)
 if (bitmap_bit_p (regs_live_out_src, regno))
@@ -1229,16 +1229,16 @@ add_ranges_and_copies (void)
 destination block) to use for searching allocnos by their
 regnos because of subsequent IR flattening.  */
   node = IRA_BB_NODE (bb)->parent;
-  bitmap_copy (live_through, df_get_live_in (bb));
+  bitmap_copy (live_through, df_get_subreg_live_in (bb));
   add_range_and_copies_from_move_list
(at_bb_start[bb->index], node, live_through, REG_FREQ_FROM_BB (bb));
-  bitmap_copy (live_through, df_get_live_out (bb));
+  bitmap_copy (live_through, df_get_subreg_live_out (bb));
   add_range_and_copies_from_move_list
(at_bb_end[bb->index], node, live_through, REG_FREQ_FROM_BB (bb));
   FOR_EACH_EDGE (e, ei, bb->succs)
{
- bitmap_and (live_through,
- df_get_live_in (e->dest), df_get_live_out (bb));
+ bitmap_and (live_through, df_get_subreg_live_in (e->dest),
+ df_get_subreg_live_out (bb));
  add_range_and_copies_from_move_list
((move_t) e->aux, node, live_through,
 REG_FREQ_FROM_EDGE_FREQ (EDGE_FREQUENCY (e)));
diff --git a/gcc/ira-lives.cc b/gcc/ira-lives.cc
index e07d3dc3e89..7641184069d 100644
--- a/gcc/ira-lives.cc
+++ b/gcc/ira-lives.cc
@@ -1254,7 +1254,8 @@ process_out_of_region_eh_regs (basic_block bb)
   if (! eh_p)
 return;
 
-  EXECUTE_IF_SET_IN_BITMAP (df_get_live_out (bb), FIRST_PSEUDO_REGISTER, i, bi)
+  EXECUTE_IF_SET_IN_BITMAP (df_get_subreg_live_out (bb), FIRST_PSEUDO_REGISTER,
+   i, bi)
 {
   ira_allocno_t a = ira_curr_regno_allocno_map[i];
   for (int n = ALLOCNO_NUM_OBJECTS (a) - 1; n >= 0; n--)
@@ -1288,7 +1289,7

Re: [COMMITTED] Regenerate cygming.opt.urls and mingw.opt.urls

2024-05-12 Thread Christophe Lyon
Thank you Mark and sorry for missing this during the reviews.

Christophe


On Sun, 12 May 2024, 14:54, Mark Wielaard wrote:

> The new cygming.opt.urls and mingw.opt.urls in the
> gcc/config/mingw directory need to be generated by make
> regenerate-opt-urls in the gcc subdirectory. They still contained
> references to the gcc/config/i386 directory from which they were
> copied.
>
> Fixes: 1f05dfc131c7 ("Reuse MinGW from i386 for AArch64")
> Fixes: e8d003736e6c ("Rename "x86 Windows Options" to "Cygwin and MinGW
> Options"")
>
> gcc/ChangeLog:
>
> * config/mingw/cygming.opt.urls: Regenerate.
> * config/mingw/mingw.opt.urls: Likewise.
> ---
>  gcc/config/mingw/cygming.opt.urls | 7 +++
>  gcc/config/mingw/mingw.opt.urls   | 2 +-
>  2 files changed, 4 insertions(+), 5 deletions(-)
>
> diff --git a/gcc/config/mingw/cygming.opt.urls
> b/gcc/config/mingw/cygming.opt.urls
> index c624e22e4427..af11c4997609 100644
> --- a/gcc/config/mingw/cygming.opt.urls
> +++ b/gcc/config/mingw/cygming.opt.urls
> @@ -1,4 +1,4 @@
> -; Autogenerated by regenerate-opt-urls.py from
> gcc/config/i386/cygming.opt and generated HTML
> +; Autogenerated by regenerate-opt-urls.py from
> gcc/config/mingw/cygming.opt and generated HTML
>
>  mconsole
>  UrlSuffix(gcc/Cygwin-and-MinGW-Options.html#index-mconsole)
> @@ -9,9 +9,8 @@ UrlSuffix(gcc/Cygwin-and-MinGW-Options.html#index-mdll)
>  mnop-fun-dllimport
>  UrlSuffix(gcc/Cygwin-and-MinGW-Options.html#index-mnop-fun-dllimport)
>
> -; skipping UrlSuffix for 'mthreads' due to multiple URLs:
> -;   duplicate: 'gcc/Cygwin-and-MinGW-Options.html#index-mthreads-1'
> -;   duplicate: 'gcc/x86-Options.html#index-mthreads'
> +mthreads
> +UrlSuffix(gcc/Cygwin-and-MinGW-Options.html#index-mthreads-1)
>
>  mwin32
>  UrlSuffix(gcc/Cygwin-and-MinGW-Options.html#index-mwin32)
> diff --git a/gcc/config/mingw/mingw.opt.urls
> b/gcc/config/mingw/mingw.opt.urls
> index f8ee5be6a535..40fb086606b2 100644
> --- a/gcc/config/mingw/mingw.opt.urls
> +++ b/gcc/config/mingw/mingw.opt.urls
> @@ -1,4 +1,4 @@
> -; Autogenerated by regenerate-opt-urls.py from gcc/config/i386/mingw.opt
> and generated HTML
> +; Autogenerated by regenerate-opt-urls.py from gcc/config/mingw/mingw.opt
> and generated HTML
>
>  mcrtdll=
>  UrlSuffix(gcc/Cygwin-and-MinGW-Options.html#index-mcrtdll)
> --
> 2.39.3
>
>


Re: [PATCHv2] rs6000: Enable overlapped by-pieces operations

2024-05-12 Thread Kewen.Lin
on 2024/5/10 17:29, HAO CHEN GUI wrote:
> Hi,
>   This patch enables overlapped by-piece operations. On rs6000, the default
> move/set/clear ratio is 2, so the overlap is only enabled for compare
> by-pieces.
> 
>   Compared to the previous version, the only change is removing the power8
> requirement from the test case.
> https://gcc.gnu.org/pipermail/gcc-patches/2024-May/651045.html
> 
>   Bootstrapped and tested on powerpc64-linux BE and LE with no
> regressions. Is it OK for the trunk?

OK, thanks!

BR,
Kewen

> 
> Thanks
> Gui Haochen
> 
> ChangeLog
> rs6000: Enable overlapped by-pieces operations
> 
> This patch enables overlapped by-piece operations by defining
> TARGET_OVERLAP_OP_BY_PIECES_P to true.  On rs6000, default move/set/clear
> ratio is 2.  So the overlap is only enabled with compare by-pieces.
> 
> gcc/
>   * config/rs6000/rs6000.cc (TARGET_OVERLAP_OP_BY_PIECES_P): Define.
> 
> gcc/testsuite/
>   * gcc.target/powerpc/block-cmp-9.c: New.
> 
> patch.diff
> diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
> index 117999613d8..e713a1e1d57 100644
> --- a/gcc/config/rs6000/rs6000.cc
> +++ b/gcc/config/rs6000/rs6000.cc
> @@ -1776,6 +1776,9 @@ static const scoped_attribute_specs *const 
> rs6000_attribute_table[] =
>  #undef TARGET_CONST_ANCHOR
>  #define TARGET_CONST_ANCHOR 0x8000
> 
> +#undef TARGET_OVERLAP_OP_BY_PIECES_P
> +#define TARGET_OVERLAP_OP_BY_PIECES_P hook_bool_void_true
> +
>  
> 
>  /* Processor table.  */
> diff --git a/gcc/testsuite/gcc.target/powerpc/block-cmp-9.c 
> b/gcc/testsuite/gcc.target/powerpc/block-cmp-9.c
> new file mode 100644
> index 000..f16429c2ffb
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/block-cmp-9.c
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +/* { dg-final { scan-assembler-not {\ml[hb]z\M} } } */
> +
> +/* Test if by-piece overlap compare is enabled and following case is
> +   implemented by two overlap word loads and compares.  */
> +
> +int foo (const char* s1, const char* s2)
> +{
> +  return __builtin_memcmp (s1, s2, 7) == 0;
> +}
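
To illustrate what the new test checks: a 7-byte equality compare can be done with two 4-byte loads per buffer, at offsets 0 and 3, overlapping on byte 3. The following is only a minimal C sketch of that idea, not the code GCC emits; the helper names `load32` and `equal7_overlapped` are made up for illustration.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: unaligned 4-byte load, written with memcpy so it
   is well defined for any alignment.  */
static inline uint32_t
load32 (const char *p)
{
  uint32_t v;
  memcpy (&v, p, sizeof v);
  return v;
}

/* Compare 7 bytes for equality with two overlapping word loads per
   buffer: bytes 0..3 and bytes 3..6.  Byte 3 is compared twice.  */
int
equal7_overlapped (const char *s1, const char *s2)
{
  return load32 (s1) == load32 (s2)
	 && load32 (s1 + 3) == load32 (s2 + 3);
}
```

Without overlap the same compare would be split into 4 + 2 + 1 bytes, which needs halfword and byte loads; that is what the scan-assembler-not pattern in block-cmp-9.c rules out.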



Ping [PATCH-1, rs6000] Add a new type of CC mode - CCBCD for bcd insns [PR100736, PR114732]

2024-05-12 Thread HAO CHEN GUI
Hi,
  Gentle ping for this series of patches.
[PATCH-1, rs6000] Add a new type of CC mode - CCBCD for bcd insns [PR100736, PR114732]
https://gcc.gnu.org/pipermail/gcc-patches/2024-April/650217.html
[PATCH-2, rs6000] Add a new type of CC mode - CCLTEQ
https://gcc.gnu.org/pipermail/gcc-patches/2024-April/650218.html
[PATCH-3, rs6000] Set CC mode of vector string isolate insns to CCEQ
https://gcc.gnu.org/pipermail/gcc-patches/2024-April/650219.html
[PATCH-4, rs6000] Optimize single cc bit reverse implementation
https://gcc.gnu.org/pipermail/gcc-patches/2024-April/650220.html
[PATCH-5, rs6000] Replace explicit CC bit reverse with common format
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/650766.html
[PATCH-6, rs6000] Split setcc to two insns after reload
https://gcc.gnu.org/pipermail/gcc-patches/2024-May/650856.html

Thanks
Gui Haochen

On 2024/4/30 15:18, HAO CHEN GUI wrote:
> Hi,
>   This is the first patch in a series optimizing CC modes on
> rs6000.
> 
>   bcd insns set all four bits of a CR field, but their single-bit
> reverse behavior differs from CCFP's. The fourth bit of a bcd CR field
> indicates overflow or an invalid number; it is not a bit for the
> unordered test. So the "le" test should be reversed to "gt", not "ungt",
> and the "ge" test should be reversed to "lt", not "unlt". That's the
> root cause of PR100736 and PR114732.
> 
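
The reversal problem can be illustrated with a small C model of one CR field. This is only a sketch: the bit masks and function names are hypothetical, and it assumes that "le" is evaluated as "GT bit clear".

```c
#include <stdbool.h>

/* Hypothetical masks for the four bits of one CR field as set by a bcd
   insn.  The fourth bit means overflow/invalid here, not "unordered".  */
enum { CR_LT = 8, CR_GT = 4, CR_EQ = 2, CR_OVF = 1 };

/* "le" modelled as "GT bit clear" (an assumption of this sketch).  */
bool
bcd_le (unsigned cr)
{
  return (cr & CR_GT) == 0;
}

/* Reversal wanted for bcd: plain "gt".  */
bool
bcd_rev_le_gt (unsigned cr)
{
  return (cr & CR_GT) != 0;
}

/* CCFP-style reversal: "ungt" also fires on the fourth bit.  */
bool
ccfp_rev_le_ungt (unsigned cr)
{
  return (cr & (CR_GT | CR_OVF)) != 0;
}
```

For a negative result that also overflowed (`CR_LT | CR_OVF`), `bcd_le` and `ccfp_rev_le_ungt` both return true, so the CCFP-style reversal is not the logical negation of "le", while `bcd_rev_le_gt` returns false as expected.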
>   This patch fixes the issue by adding a new type of CC mode - CCBCD for
> all bcd insns. Here a new setcc_rev pattern is added for CCBCD. It will
> be merged into a uniform pattern for all CC modes in a subsequent
> patch.
> 
>   The rtl code "unordered" is still used for testing overflow or an
> invalid number. IMHO, "unordered" on a CC mode can be considered as
> testing whether the fourth bit of a CR field is set, and "eq" on a CC
> mode as testing whether the third bit is set. Thus we avoid
> creating lots of unspecs for the CR bit testing.
> 
>   Bootstrapped and tested on powerpc64-linux BE and LE with no
> regressions. Is it OK for the trunk?
> 
> Thanks
> Gui Haochen
> 
> 
> ChangeLog
> rs6000: Add a new type of CC mode - CCBCD for bcd insns
> 
> gcc/
>   PR target/100736
>   PR target/114732
>   * config/rs6000/altivec.md (bcd_): Replace CCFP
>   with CCBCD.
>   (*bcd_test_): Likewise.
>   (*bcd_test2_): Likewise.
>   (bcd__): Likewise.
>   (*bcdinvalid_): Likewise.
>   (bcdinvalid_): Likewise.
>   (bcdshift_v16qi): Likewise.
>   (bcdmul10_v16qi): Likewise.
>   (bcddiv10_v16qi): Likewise.
>   (peephole for bcd_add/sub): Likewise.
>   * config/rs6000/predicates.md (branch_comparison_operator): Add CCBCD
>   and its supported comparison codes.
>   * config/rs6000/rs6000-modes.def (CC_MODE): Add CCBCD.
>   * config/rs6000/rs6000.cc (validate_condition_mode): Add CCBCD
>   assertion.
>   * config/rs6000/rs6000.md (CC_any): Add CCBCD.
>   (ccbcd_rev): New code iterator.
>   (*_cc): New insn and split pattern for CCBCD reverse
>   compare.
> 
> gcc/testsuite/
>   PR target/100736
>   PR target/114732
>   * gcc.target/powerpc/pr100736.c: New.
>   * gcc.target/powerpc/pr114732.c: New.
> 
> patch.diff
> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
> index bb20441c096..9fa8cf89f61 100644
> --- a/gcc/config/rs6000/altivec.md
> +++ b/gcc/config/rs6000/altivec.md
> @@ -4443,7 +4443,7 @@ (define_insn "bcd_"
> (match_operand:VBCD 2 "register_operand" "v")
> (match_operand:QI 3 "const_0_to_1_operand" "n")]
>UNSPEC_BCD_ADD_SUB))
> -   (clobber (reg:CCFP CR6_REGNO))]
> +   (clobber (reg:CCBCD CR6_REGNO))]
>"TARGET_P8_VECTOR"
>"bcd. %0,%1,%2,%3"
>[(set_attr "type" "vecsimple")])
> @@ -4454,8 +4454,8 @@ (define_insn "bcd_"
>  ;; probably should be one that can go in the VMX (Altivec) registers, so we
>  ;; can't use DDmode or DFmode.
>  (define_insn "*bcd_test_"
> -  [(set (reg:CCFP CR6_REGNO)
> - (compare:CCFP
> +  [(set (reg:CCBCD CR6_REGNO)
> + (compare:CCBCD
>(unspec:V2DF [(match_operand:VBCD 1 "register_operand" "v")
>  (match_operand:VBCD 2 "register_operand" "v")
>  (match_operand:QI 3 "const_0_to_1_operand" "i")]
> @@ -4472,8 +4472,8 @@ (define_insn "*bcd_test2_"
> (match_operand:VBCD 2 "register_operand" "v")
> (match_operand:QI 3 "const_0_to_1_operand" "i")]
>UNSPEC_BCD_ADD_SUB))
> -   (set (reg:CCFP CR6_REGNO)
> - (compare:CCFP
> +   (set (reg:CCBCD CR6_REGNO)
> + (compare:CCBCD
>(unspec:V2DF [(match_dup 1)
>  (match_dup 2)
>  (match_dup 3)]
> @@ -4566,8 +4566,8 @@ (define_insn "vclrrb"
> [(set_attr "type" "vecsimple")])
> 
>  (define_expand "bcd__"
> -  [(parallel [(set (reg:CCFP CR6_REGNO)
> -(compare:CCFP
> +  [(parallel [(set (reg:CCBCD CR6_REGNO)
> + 

Re: Re: [PATCH v1 1/1] RISC-V: Nan-box the result of movbf on soft-bf16

2024-05-12 Thread Xiao Zeng
2024-05-09 04:01  Jeff Law  wrote:
>
>
>
>On 5/7/24 6:38 PM, Xiao Zeng wrote:
>> 1 This patch implements the Nan-box of bf16.
>>
>> 2 Please refer to the Nan-box implementation of hf16 in:
>> 
>>
>> 3 The discussion about Nan-box can be found on the website:
>> 
>>
>> 4 The following tests pass with this patch:
>>  * The full riscv regression test.
>>
>> gcc/ChangeLog:
>>
>> * config/riscv/riscv.cc (riscv_legitimize_move): Expand movbf
>> with Nan-boxing value.
>> * config/riscv/riscv.md (*movbf_softfloat_boxing): New pattern.
>>
>> gcc/testsuite/ChangeLog:
>>
>> * gcc.target/riscv/_Bfloat16-nanboxing.c: New test.
>> ---
>>   gcc/config/riscv/riscv.cc | 51 ++-
>>   gcc/config/riscv/riscv.md | 12 -
>>   .../gcc.target/riscv/_Bfloat16-nanboxing.c    | 38 ++
>>   3 files changed, 76 insertions(+), 25 deletions(-)
>>   create mode 100644 gcc/testsuite/gcc.target/riscv/_Bfloat16-nanboxing.c
>>
>> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
>> index 545e68566dc..be2cb245733 100644
>> --- a/gcc/config/riscv/riscv.cc
>> +++ b/gcc/config/riscv/riscv.cc
>> @@ -3120,35 +3120,38 @@ riscv_legitimize_move (machine_mode mode, rtx dest, 
>> rtx src)
>
>>  
>> - if (TARGET_HARD_FLOAT
>> - && !TARGET_ZFHMIN && mode == HFmode
>> - && REG_P (dest) && FP_REG_P (REGNO (dest))
>> - && REG_P (src) && !FP_REG_P (REGNO (src))
>> - && can_create_pseudo_p ())
>[ ... ]
>
>> +  if (TARGET_HARD_FLOAT
>> +  && ((!TARGET_ZFHMIN && mode == HFmode)
>> +  || (!TARGET_ZFBFMIN && mode == BFmode))
>> +  && REG_P (dest) && FP_REG_P (REGNO (dest)) && REG_P (src)
>> +  && !FP_REG_P (REGNO (src)) && can_create_pseudo_p ())
>
>So there's a bit of gratuitous rewriting going on here.  I realize you
>were fixing formatting problems (thanks!), but I don't see a need to
>rewrite the tests starting with REG_P.  I put those back in their
>original form with the whitespace fixes.
Thanks

>
>I'll push the fixed version momentarily. 
Thanks

>
>Thanks again!
>
>jeff
> 
In the past few days, I have been on vacation. Okay, let's continue to push 
forward with bf16.

Thanks
Xiao Zeng
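
For reference, NaN-boxing a 16-bit value such as bf16 into a 32-bit FP register means filling the upper 16 bits with ones. The following is a minimal C sketch of the bit manipulation only, assuming a 32-bit register; the function names are illustrative and not taken from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Box a 16-bit FP bit pattern into a 32-bit register image by setting
   the upper 16 bits to all ones.  */
uint32_t
nanbox16 (uint16_t bits)
{
  return UINT32_C (0xFFFF0000) | bits;
}

/* A register image holds a valid boxed 16-bit value only if the upper
   16 bits are all ones.  */
bool
is_nanboxed16 (uint32_t reg)
{
  return (reg >> 16) == 0xFFFFu;
}

/* Recover the 16-bit payload from a boxed register image.  */
uint16_t
unbox16 (uint32_t reg)
{
  return (uint16_t) (reg & 0xFFFFu);
}
```

For example, the bf16 bit pattern 0x3F80 (1.0) boxes to 0xFFFF3F80.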



[PATCH] Don't reduce estimated unrolled size for innermost loop.

2024-05-12 Thread liuhongt
As the testcase in the PR shows, cunrolli at O3 may prevent vectorization
of the innermost loop and increase register pressure.
The patch removes the 1/3 reduction of unr_insns for the innermost loop
under UL_ALL.
The ul != UL_ALL check is needed since complete unrolling of some small
loops at O2 relies on the reduction.

Bootstrapped and regtested on x86_64-pc-linux-gnu{-m32,}.
No big impact for SPEC2017.
Ok for trunk?
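
The effect on the size estimate can be sketched in plain C as below. This is a simplified model, not the GCC function: the struct, the function name and the two boolean parameters are hypothetical stand-ins, and the subtraction of eliminated_by_peeling is an assumption since the hunk context does not show it in full.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the fields of struct loop_size used here.  */
struct loop_size_model
{
  long overall;
  long eliminated_by_peeling;
  long last_iteration;
  long last_iteration_eliminated_by_peeling;
};

/* innermost_nested: the loop has no inner loop and sits inside a real
   outer loop.  ul_all: the unroll level is UL_ALL.  */
long
estimated_unrolled_size_model (const struct loop_size_model *size,
			       unsigned long nunroll,
			       bool innermost_nested, bool ul_all)
{
  long unr_insns
    = (long) nunroll * (size->overall - size->eliminated_by_peeling);
  if (nunroll == 0)
    unr_insns = 0;
  unr_insns += (size->last_iteration
		- size->last_iteration_eliminated_by_peeling);

  /* Keep the 2/3 scaling everywhere except for an innermost loop
     unrolled under UL_ALL.  */
  if (!innermost_nested || !ul_all)
    unr_insns = unr_insns * 2 / 3;

  if (unr_insns <= 0)
    unr_insns = 1;
  return unr_insns;
}
```

With overall = 30, last_iteration = 30 and nunroll = 15, the model returns 480 for an innermost loop under UL_ALL and 320 in the other cases.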

gcc/ChangeLog:

PR tree-optimization/112325
* tree-ssa-loop-ivcanon.cc (estimated_unrolled_size): Add 2
new parameters: loop and ul, and remove unr_insns reduction
for innermost loop.
(try_unroll_loop_completely): Pass loop and ul to
estimated_unrolled_size.

gcc/testsuite/ChangeLog:

* gcc.dg/tree-ssa/pr112325.c: New test.
* gcc.dg/vect/pr69783.c: Add extra option --param
max-completely-peeled-insns=300.
---
 gcc/testsuite/gcc.dg/tree-ssa/pr112325.c | 57 
 gcc/testsuite/gcc.dg/vect/pr69783.c  |  2 +-
 gcc/tree-ssa-loop-ivcanon.cc | 16 +--
 3 files changed, 71 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr112325.c

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr112325.c 
b/gcc/testsuite/gcc.dg/tree-ssa/pr112325.c
new file mode 100644
index 000..14208b3e7f8
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr112325.c
@@ -0,0 +1,57 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-cunrolli-details" } */
+
+typedef unsigned short ggml_fp16_t;
+static float table_f32_f16[1 << 16];
+
+inline static float ggml_lookup_fp16_to_fp32(ggml_fp16_t f) {
+unsigned short s;
+__builtin_memcpy(&s, &f, sizeof(unsigned short));
+return table_f32_f16[s];
+}
+
+typedef struct {
+ggml_fp16_t d;
+ggml_fp16_t m;
+unsigned char qh[4];
+unsigned char qs[32 / 2];
+} block_q5_1;
+
+typedef struct {
+float d;
+float s;
+char qs[32];
+} block_q8_1;
+
+void ggml_vec_dot_q5_1_q8_1(const int n, float * restrict s, const void * 
restrict vx, const void * restrict vy) {
+const int qk = 32;
+const int nb = n / qk;
+
+const block_q5_1 * restrict x = vx;
+const block_q8_1 * restrict y = vy;
+
+float sumf = 0.0;
+
+for (int i = 0; i < nb; i++) {
+unsigned qh;
+__builtin_memcpy(&qh, x[i].qh, sizeof(qh));
+
+int sumi = 0;
+
+for (int j = 0; j < qk/2; ++j) {
+const unsigned char xh_0 = ((qh >> (j + 0)) << 4) & 0x10;
+const unsigned char xh_1 = ((qh >> (j + 12)) ) & 0x10;
+
+const int x0 = (x[i].qs[j] & 0xF) | xh_0;
+const int x1 = (x[i].qs[j] >> 4) | xh_1;
+
+sumi += (x0 * y[i].qs[j]) + (x1 * y[i].qs[j + qk/2]);
+}
+
+sumf += (ggml_lookup_fp16_to_fp32(x[i].d)*y[i].d)*sumi + 
ggml_lookup_fp16_to_fp32(x[i].m)*y[i].s;
+}
+
+*s = sumf;
+}
+
+/* { dg-final { scan-tree-dump {(?n)Not unrolling loop [1-9] \(--param 
max-completely-peel-times limit reached} "cunrolli"} } */
diff --git a/gcc/testsuite/gcc.dg/vect/pr69783.c 
b/gcc/testsuite/gcc.dg/vect/pr69783.c
index 5df95d0ce4e..a1f75514d72 100644
--- a/gcc/testsuite/gcc.dg/vect/pr69783.c
+++ b/gcc/testsuite/gcc.dg/vect/pr69783.c
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-require-effective-target vect_float } */
-/* { dg-additional-options "-Ofast -funroll-loops" } */
+/* { dg-additional-options "-Ofast -funroll-loops --param 
max-completely-peeled-insns=300" } */
 
 #define NXX 516
 #define NYY 516
diff --git a/gcc/tree-ssa-loop-ivcanon.cc b/gcc/tree-ssa-loop-ivcanon.cc
index bf017137260..5e0eca647a1 100644
--- a/gcc/tree-ssa-loop-ivcanon.cc
+++ b/gcc/tree-ssa-loop-ivcanon.cc
@@ -444,7 +444,9 @@ tree_estimate_loop_size (class loop *loop, edge exit, edge 
edge_to_cancel,
 
 static unsigned HOST_WIDE_INT
 estimated_unrolled_size (struct loop_size *size,
-unsigned HOST_WIDE_INT nunroll)
+unsigned HOST_WIDE_INT nunroll,
+enum unroll_level ul,
+class loop* loop)
 {
   HOST_WIDE_INT unr_insns = ((nunroll)
 * (HOST_WIDE_INT) (size->overall
@@ -453,7 +455,15 @@ estimated_unrolled_size (struct loop_size *size,
 unr_insns = 0;
   unr_insns += size->last_iteration - 
size->last_iteration_eliminated_by_peeling;
 
-  unr_insns = unr_insns * 2 / 3;
+  /* For innermost loop, loop body is not likely to be simplied as much as 1/3.
+ and may increase a lot of register pressure.
+ UL != UL_ALL is need to unroll small loop at O2.  */
+  class loop *loop_father = loop_outer (loop);
+  if (loop->inner || !loop_father
+  || loop_father->latch == EXIT_BLOCK_PTR_FOR_FN (cfun)
+  || ul != UL_ALL)
+unr_insns = unr_insns * 2 / 3;
+
   if (unr_insns <= 0)
 unr_insns = 1;
 
@@ -837,7 +847,7 @@ try_unroll_loop_completely (class loop *loop,
 
  unsigned HOST_WIDE_INT ninsns = size.overall;
  unsigned HOST_WIDE_INT unr_insns
-   = es

MAINTAINERS: Add myself to write after approval

2024-05-12 Thread Xiao Zeng
ChangeLog:

* MAINTAINERS: Add myself.
---
 MAINTAINERS | 1 +
 1 file changed, 1 insertion(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 71e02abc426..361059fd55c 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -738,6 +738,7 @@ Kwok Cheung Yeung   

 Greta Yorsh
 David Yuste
 Adhemerval Zanella 
+Xiao Zeng   
 Dennis Zhang   
 Yufeng Zhang   
 Qing Zhao  
-- 
2.17.1



[PATCH] report message for operator %a on unaddressible exp

2024-05-12 Thread Jiufu Guo
Hi,

PR96866 arises when gcc prints asm code for the modifier "%a", which
requires an address operand, while the operand has the constraint "X",
which allows non-address forms.  With this patch, an error message is
reported to indicate the invalid asm operands.

Bootstrap&regtest pass on ppc64{,le}.
Is this ok for trunk?

BR,
Jeff(Jiufu Guo)

PR target/96866

gcc/ChangeLog:

* config/rs6000/rs6000.cc (print_operand_address):

gcc/testsuite/ChangeLog:

* gcc.target/powerpc/pr96866-1.c: New test.
* gcc.target/powerpc/pr96866-2.c: New test.

---
 gcc/config/rs6000/rs6000.cc  |  6 ++
 gcc/testsuite/gcc.target/powerpc/pr96866-1.c | 15 +++
 gcc/testsuite/gcc.target/powerpc/pr96866-2.c | 10 ++
 3 files changed, 31 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr96866-1.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/pr96866-2.c

diff --git a/gcc/config/rs6000/rs6000.cc b/gcc/config/rs6000/rs6000.cc
index 117999613d8..50943d76f79 100644
--- a/gcc/config/rs6000/rs6000.cc
+++ b/gcc/config/rs6000/rs6000.cc
@@ -14659,6 +14659,12 @@ print_operand_address (FILE *file, rtx x)
   else if (SYMBOL_REF_P (x) || GET_CODE (x) == CONST
   || GET_CODE (x) == LABEL_REF)
 {
+  if (this_is_asm_operands && !address_operand (x, VOIDmode))
+   {
+ output_operand_lossage ("invalid expression as operand");
+ return;
+   }
+
   output_addr_const (file, x);
   if (small_data_operand (x, GET_MODE (x)))
fprintf (file, "@%s(%s)", SMALL_DATA_RELOC,
diff --git a/gcc/testsuite/gcc.target/powerpc/pr96866-1.c 
b/gcc/testsuite/gcc.target/powerpc/pr96866-1.c
new file mode 100644
index 000..6554a472a11
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr96866-1.c
@@ -0,0 +1,15 @@
+/* It's to verify no ICE here, ignore error messages about invalid 'asm'.  */
+/* { dg-excess-errors "pr96866-2.c" } */
+/* { dg-options "-fPIC -O2" } */
+
+int x[2];
+
+int __attribute__ ((noipa))
+f1 (void)
+{
+  int n;
+  int *p = x;
+  *p++;
+  __asm__ volatile("ld %0, %a1" : "=r"(n) : "X"(p));
+  return n;
+}
diff --git a/gcc/testsuite/gcc.target/powerpc/pr96866-2.c 
b/gcc/testsuite/gcc.target/powerpc/pr96866-2.c
new file mode 100644
index 000..a5ec96f29dd
--- /dev/null
+++ b/gcc/testsuite/gcc.target/powerpc/pr96866-2.c
@@ -0,0 +1,10 @@
+/* It's to verify no ICE here, ignore error messages about invalid 'asm'.  */
+/* { dg-excess-errors "pr96866-2.c" } */
+/* { dg-options "-fPIC -O2" } */
+
+void
+f (void)
+{
+  extern int x;
+  __asm__ volatile("#%a0" ::"X"(&x));
+}
-- 
2.25.1



[PATCH] c++/modules: Remember that header units have CMIs

2024-05-12 Thread Nathaniel Shead
Bootstrapped and regtested on x86_64-pc-linux-gnu, OK for trunk?

-- >8 --

This appears to be an oversight in the definition of module_has_cmi_p;
this comes up transitively in other functions used for e.g. determining
whether a name could potentially be accessed in a different translation
unit.

This change will allow us to use the function directly in more places
that need to do additional work only if generating a module CMI in the
future.
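
As a rough illustration of the predicate change, here is a C sketch of the bitmask test before and after. The MK_* names come from the patch, but the bit values and the function names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical bit values; only the names are taken from the patch.  */
enum module_kind_bits
{
  MK_PURVIEW = 1 << 0,
  MK_INTERFACE = 1 << 1,
  MK_PARTITION = 1 << 2,
  MK_HEADER = 1 << 3
};

/* Old predicate: header units are not reported as having a CMI.  */
bool
has_cmi_old (unsigned kind)
{
  return (kind & (MK_INTERFACE | MK_PARTITION)) != 0;
}

/* New predicate: header units are included.  */
bool
has_cmi_new (unsigned kind)
{
  return (kind & (MK_INTERFACE | MK_PARTITION | MK_HEADER)) != 0;
}
```

The two predicates differ only for a kind that has MK_HEADER set and neither MK_INTERFACE nor MK_PARTITION.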

gcc/cp/ChangeLog:

* cp-tree.h (module_has_cmi_p): Also true for header units.

gcc/testsuite/ChangeLog:

* g++.dg/modules/linkage-3_a.H: New test.
* g++.dg/modules/linkage-3_b.C: New test.
* g++.dg/modules/linkage-3_c.C: New test.

Signed-off-by: Nathaniel Shead 
---
 gcc/cp/cp-tree.h   |  2 +-
 gcc/testsuite/g++.dg/modules/linkage-3_a.H | 19 +++
 gcc/testsuite/g++.dg/modules/linkage-3_b.C |  9 +
 gcc/testsuite/g++.dg/modules/linkage-3_c.C | 10 ++
 4 files changed, 39 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/g++.dg/modules/linkage-3_a.H
 create mode 100644 gcc/testsuite/g++.dg/modules/linkage-3_b.C
 create mode 100644 gcc/testsuite/g++.dg/modules/linkage-3_c.C

diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h
index db098c32f2d..609904918ba 100644
--- a/gcc/cp/cp-tree.h
+++ b/gcc/cp/cp-tree.h
@@ -7379,7 +7379,7 @@ inline bool module_interface_p ()
 inline bool module_partition_p ()
 { return module_kind & MK_PARTITION; }
 inline bool module_has_cmi_p ()
-{ return module_kind & (MK_INTERFACE | MK_PARTITION); }
+{ return module_kind & (MK_INTERFACE | MK_PARTITION | MK_HEADER); }
 
 inline bool module_purview_p ()
 { return module_kind & MK_PURVIEW; }
diff --git a/gcc/testsuite/g++.dg/modules/linkage-3_a.H 
b/gcc/testsuite/g++.dg/modules/linkage-3_a.H
new file mode 100644
index 000..1e0ebd927e2
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/linkage-3_a.H
@@ -0,0 +1,19 @@
+// { dg-additional-options "-fmodule-header -Wno-error=c++20-extensions" }
+// { dg-module-cmi {} }
+
+// Like linkage-1, but for header units.
+
+// External linkage definitions must be declared as 'inline' to satisfy
+// [module.import] p6, so we don't need to care about voldemort types in
+// function definitions.
+
+// Strictly speaking this is not required to be supported:
+// [module.import] p5 says that when two different TUs import header-names
+// identifying the same header or source file, it is unspecified whether
+// they import the same header unit, and thus 's' could be a different entity
+// in each TU.  But with out current implementation this seems to reasonable to
+// allow (and it does currently work).
+
+struct { int y; } s;
+decltype(s) f();  // { dg-warning "used but not defined" "" { target 
c++17_down } }
+inline auto x = f();
diff --git a/gcc/testsuite/g++.dg/modules/linkage-3_b.C 
b/gcc/testsuite/g++.dg/modules/linkage-3_b.C
new file mode 100644
index 000..935ef6150ec
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/linkage-3_b.C
@@ -0,0 +1,9 @@
+// { dg-additional-options "-fmodules-ts" }
+
+struct {} unrelated;
+
+import "linkage-3_a.H";
+
+decltype(s) f() {
+  return { 123 };
+}
diff --git a/gcc/testsuite/g++.dg/modules/linkage-3_c.C 
b/gcc/testsuite/g++.dg/modules/linkage-3_c.C
new file mode 100644
index 000..be527ff552d
--- /dev/null
+++ b/gcc/testsuite/g++.dg/modules/linkage-3_c.C
@@ -0,0 +1,10 @@
+// { dg-module-do run }
+// { dg-additional-options "-fmodules-ts" }
+
+import "linkage-3_a.H";
+
+int main() {
+  auto a = x.y;
+  if (a != 123)
+__builtin_abort();
+}
-- 
2.43.2



[PATCH][_Hashtable] Fix some implementation inconsistencies

2024-05-12 Thread François Dumont

    libstdc++: [_Hashtable] Fix some implementation inconsistencies

    Get rid of the different usages of the mutable keyword except in
    _Prime_rehash_policy where it is preserved for abi compatibility reasons.

    Fix comment to explain that we need the computation of bucket index to be
    noexcept to be able to rehash the container when needed.

    For Standard instantiations through std::unordered_xxx containers we already
    force caching of hash code when hash functor is not noexcept so it is
    guaranteed.

    The static_assert purpose in _Hashtable on _M_bucket_index is thus limited
    to usages of _Hashtable with exotic _Hashtable_traits.

    libstdc++-v3/ChangeLog:

            * include/bits/hashtable_policy.h (_NodeBuilder<>::_S_build): Remove
            const qualification on _NodeGenerator instance.
            (_ReuseOrAllocNode<>::operator()(_Args&&...)): Remove const qualification.
            (_ReuseOrAllocNode<>::_M_nodes): Remove mutable.
            (_Insert_base<>::_M_insert_range): Remove _NodeGetter const qualification.
            (_Hash_code_base<>::_M_bucket_index(const _Hash_node_value<>&, size_t)):
            Simplify noexcept declaration, we already static_assert that _RangeHash
            functor is noexcept.
            * include/bits/hashtable.h: Rework comments. Remove const qualifier on
            _NodeGenerator& arguments.

Tested under Linux x64, ok to commit?

François

diff --git a/libstdc++-v3/include/bits/hashtable.h 
b/libstdc++-v3/include/bits/hashtable.h
index cd3e1ac297c..e8e51714d72 100644
--- a/libstdc++-v3/include/bits/hashtable.h
+++ b/libstdc++-v3/include/bits/hashtable.h
@@ -48,7 +48,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 using __cache_default
   =  __not_<__and_,
-  // Mandatory to have erase not throwing.
+  // Mandatory for the rehash process.
   __is_nothrow_invocable>>;
 
   // Helper to conditionally delete the default constructor.
@@ -481,7 +481,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   template
void
-   _M_assign(_Ht&&, const _NodeGenerator&);
+   _M_assign(_Ht&&, _NodeGenerator&);
 
   void
   _M_move_assign(_Hashtable&&, true_type);
@@ -919,7 +919,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   template
std::pair
-   _M_insert_unique(_Kt&&, _Arg&&, const _NodeGenerator&);
+   _M_insert_unique(_Kt&&, _Arg&&, _NodeGenerator&);
 
   template
static __conditional_t<
@@ -939,7 +939,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   template
std::pair
-   _M_insert_unique_aux(_Arg&& __arg, const _NodeGenerator& __node_gen)
+   _M_insert_unique_aux(_Arg&& __arg, _NodeGenerator& __node_gen)
{
  return _M_insert_unique(
_S_forward_key(_ExtractKey{}(std::forward<_Arg>(__arg))),
@@ -948,7 +948,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   template
std::pair
-   _M_insert(_Arg&& __arg, const _NodeGenerator& __node_gen,
+   _M_insert(_Arg&& __arg, _NodeGenerator& __node_gen,
  true_type /* __uks */)
{
  using __to_value
@@ -959,7 +959,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   template
iterator
-   _M_insert(_Arg&& __arg, const _NodeGenerator& __node_gen,
+   _M_insert(_Arg&& __arg, _NodeGenerator& __node_gen,
  false_type __uks)
{
  using __to_value
@@ -972,7 +972,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template
iterator
_M_insert(const_iterator, _Arg&& __arg,
- const _NodeGenerator& __node_gen, true_type __uks)
+ _NodeGenerator& __node_gen, true_type __uks)
{
  return
_M_insert(std::forward<_Arg>(__arg), __node_gen, __uks).first;
@@ -982,7 +982,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template
iterator
_M_insert(const_iterator, _Arg&&,
- const _NodeGenerator&, false_type __uks);
+ _NodeGenerator&, false_type __uks);
 
   size_type
   _M_erase(true_type __uks, const key_type&);
@@ -1414,7 +1414,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   void
   _Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal,
 _Hash, _RangeHash, _Unused, _RehashPolicy, _Traits>::
-  _M_assign(_Ht&& __ht, const _NodeGenerator& __node_gen)
+  _M_assign(_Ht&& __ht, _NodeGenerator& __node_gen)
   {
__buckets_ptr __buckets = nullptr;
if (!_M_buckets)
@@ -1656,8 +1656,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 ~_Hashtable() noexcept
 {
   // Getting a bucket index from a node shall not throw because it is used
-  // in methods (erase, swap...) that shall not throw. Need a complete
-  // type to check this, so do it in the destructor not at class scope.
+  // during the rehash process. This static_assert purpose is limited to 
usage
+  // of _Hashtable with _Hashtable_traits requesting non-cached hash code.
+  

[to-be-committed] [RISC-V] Improve single inverted bit extraction - v3

2024-05-12 Thread Jeff Law


The only change in v2 vs v3 is testsuite adjustments for the updated 
sequences and fixing the name of the second pattern.


--


So this patch fixes a minor code generation inefficiency that (IIRC) the
RAU team discovered a while ago in spec.

If we want the inverted value of a single bit we can use bext to extract
the bit, then seq to invert the value (if viewed as a 0/1 truth value).

The RTL is fairly convoluted, but it's basically a right shift to get
the bit into position, bitwise-not then masking off all but the low bit.
So it's a 3->2 combine, hidden by the fact that and-not is a
define_insn_and_split, so it actually looks like a 2->2 combine.
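
In C terms, the two forms compute the same value. The sketch below only illustrates that equivalence on 32-bit values; the function names are made up and it says nothing about the actual RTL.

```c
#include <assert.h>
#include <stdint.h>

/* Form matched by the new pattern: shift right, bitwise-not, then mask
   off all but the low bit.  */
unsigned
inverted_bit_shift_not_and (uint32_t x, unsigned n)
{
  assert (n < 32);
  return (unsigned) (~(x >> n) & 1u);
}

/* Replacement form: extract the bit (what bext does), then test it
   against zero (what seqz does).  */
unsigned
inverted_bit_bext_seqz (uint32_t x, unsigned n)
{
  assert (n < 32);
  uint32_t bit = (x >> n) & 1u;
  return bit == 0;
}
```

Both functions return 1 when bit n of x is clear and 0 when it is set.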

We've run this through Ventana's internal CI (which includes
zba_zbb_zbs) and I've run it in my own tester (rv64gc, rv32gcv).  I'll
wait for the upstream CI to finish with positive results before pushing.

Jeff

gcc/
* config/riscv/bitmanip.md (bextseqzdisi): New patterns.

gcc/testsuite/

* gcc.target/riscv/zbs-bext-2.c: New test.
* gcc.target/riscv/zbs-bext.c: Fix one of the possible expected
sequences.


diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index d76a72d30e0..724511b6df3 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -711,6 +711,49 @@ (define_insn "*bext"
   "bext\t%0,%1,%2"
   [(set_attr "type" "bitmanip")])
 
+;; This is a bext followed by a seqz.  Normally this would be a 3->2 split
+;; But the and-not pattern with a constant operand is a define_insn_and_split,
+;; so this looks like a 2->2 split, which combine rejects.  So implement it
+;; as a define_insn_and_split as well.
+(define_insn_and_split "*bextseqzdisi"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+   (and:DI
+ (not:DI
+   (subreg:DI
+ (lshiftrt:SI
+   (match_operand:SI 1 "register_operand" "r")
+   (match_operand:QI 2 "register_operand" "r")) 0))
+ (const_int 1)))]
+  "TARGET_64BIT && TARGET_ZBS"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+   (zero_extract:DI (match_dup 1)
+(const_int 1)
+(zero_extend:DI (match_dup 2
+   (set (match_dup 0) (eq:DI (match_dup 0) (const_int 0)))]
+  "operands[1] = gen_lowpart (word_mode, operands[1]);"
+  [(set_attr "type" "bitmanip")])
+
+(define_insn_and_split "*bextseqz"
+  [(set (match_operand:X 0 "register_operand" "=r")
+   (and:X
+ (not:X
+   (lshiftrt:X
+ (match_operand:X 1 "register_operand" "r")
+ (match_operand:QI 2 "register_operand" "r")))
+ (const_int 1)))]
+  "TARGET_ZBS"
+  "#"
+  "&& 1"
+  [(set (match_dup 0)
+   (zero_extract:X (match_dup 1)
+   (const_int 1)
+   (zero_extend:X (match_dup 2
+   (set (match_dup 0) (eq:X (match_dup 0) (const_int 0)))]
+  "operands[1] = gen_lowpart (word_mode, operands[1]);"
+  [(set_attr "type" "bitmanip")])
+
 ;; When performing `(a & (1UL << bitno)) ? 0 : -1` the combiner
 ;; usually has the `bitno` typed as X-mode (i.e. no further
 ;; zero-extension is performed around the bitno).
diff --git a/gcc/testsuite/gcc.target/riscv/zbs-bext-2.c 
b/gcc/testsuite/gcc.target/riscv/zbs-bext-2.c
new file mode 100644
index 000..79f120b2286
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/zbs-bext-2.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zbs -mabi=lp64" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" } } */
+
+
+_Bool match(const int ch, int fMap) {
+return ((fMap & (1<<(ch))) == 0);
+}
+
+_Bool match2(const int ch, int fMap) {
+return ((fMap & (1UL<<(ch))) == 0);
+}
+
+
+/* { dg-final { scan-assembler-times "bext\t" 2 } } */
+/* { dg-final { scan-assembler-times "seqz\t|xori\t" 2 } } */
+/* { dg-final { scan-assembler-not "sraw\t" } } */
+/* { dg-final { scan-assembler-not "not\t" } } */
+/* { dg-final { scan-assembler-not "andi\t" } } */
diff --git a/gcc/testsuite/gcc.target/riscv/zbs-bext.c 
b/gcc/testsuite/gcc.target/riscv/zbs-bext.c
index ff75dad6528..0db97f5ab59 100644
--- a/gcc/testsuite/gcc.target/riscv/zbs-bext.c
+++ b/gcc/testsuite/gcc.target/riscv/zbs-bext.c
@@ -38,7 +38,7 @@ long bext64_4(long a, char bitno)
 
 /* { dg-final { scan-assembler-times "bexti\t" 1 } } */
 /* { dg-final { scan-assembler-times "bext\t" 5 } } */
-/* { dg-final { scan-assembler-times "xori\t|snez\t" 1 } } */
+/* { dg-final { scan-assembler-times "xori\t|seqz\t" 1 } } */
 /* { dg-final { scan-assembler-times "addi\t" 1 } } */
 /* { dg-final { scan-assembler-times "neg\t" 1 } } */
 /* { dg-final { scan-assembler-not {\mandi} } } */


Re: Fix gnu versioned namespace mode 00/03

2024-05-12 Thread François Dumont



On 07/05/2024 18:15, Iain Sandoe wrote:

Hi François


On 4 May 2024, at 22:11, François Dumont  wrote:

Here is the list of patches to restore gnu versioned namespace mode.

1/3: Bump gnu version namespace

This is important to be done first so that once build of gnu versioned 
namespace is fixed there is no chance to have another build of '__8' version 
with a different abi than last successful '__8' build.

2/3: Fix build using cxx11 abi for versioned namespace

3/3: Proposal to default to "new" abi when dual abi is disabled and accept any 
default-libstdcxx-abi whether dual abi is enabled or not.

All testsuite run for following configs:

- dual abi

- gcc4-compatible only abi

- new only abi

- versioned namespace abi

At the risk of delaying this (a bit) - I think we should also consider items 
like once_call that have broken impls.
Do you have any pointer to this once_call problem? Sorry, I'm not aware 
of it (apart from your messages).
  in the current library - and at least get proposed replacements available 
behind the versioned namespace; rather than using up a namespace version with 
the current broken code.


I'm not proposing to fix all library bugs on all platforms with this 
patch, just fix the versioned namespace mode.


As to do so I also need to adopt cxx11 abi in versioned mode it already 
justify a bump of version.




I have a proposed once_call replacement (but I think Jonathan also has one or 
more alternatives there)

Please can we try to identify any other similar blocked fixes?


How ? We can only count on bugzilla bug reports to do so, no ?

If we face another similar problem in the future, after gcc 15 release, 
then we'll just have to bump again. Is it such a problem ?


The reason I'm proposing to integrate this patch this early in gcc 15 
stage is to have time to integrate any other library fix/optimization 
that could make use of it. I already have 1 on my side for the hashtable 
implementation. I hope your once_call fix also have time to be ready for 
gcc 15, no ?


François
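
For readers following the thread, the effect of a versioned namespace can be sketched with a plain inline namespace. This is only an illustration under assumptions: `mylib`, `v9` and `widget` are hypothetical names, not libstdc++ code, and `v9` merely stands in for a version tag such as the '__8' mentioned above. The version becomes part of every type's identity, so builds with different tags cannot be mixed silently, which is why a bump comes before any ABI change.

```cpp
#include <cassert>
#include <string>
#include <typeinfo>

// Hypothetical library. "v9" stands in for a version tag; identifiers
// containing a double underscore are reserved for the implementation,
// so this sketch avoids them.
namespace mylib {
inline namespace v9 {
struct widget {
  int value = 0;
};
}  // namespace v9
}  // namespace mylib

// Users never spell the version: both names denote the same type.
bool same_type() {
  return typeid(mylib::widget) == typeid(mylib::v9::widget);
}

// The implementation-defined type name; with GCC and Clang this is the
// mangled name, which includes the inline namespace.
std::string widget_type_name() {
  return typeid(mylib::widget).name();
}

// Small self-check of the first property.
void check_same_type() {
  assert(same_type());
}
```

Renaming `v9` to a new tag changes every symbol that mentions `widget`, while source code using `mylib::widget` compiles unchanged.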



Re: [PATCH] internal-fn: Do not force vcond operand to reg.

2024-05-12 Thread Robin Dapp
> How does this make a difference in the end?  I'd expect say forwprop to
> fix things?

In general we try to only add the masking "boilerplate" of our
instructions at split time so fwprop, combine et al. can do their
work uninhibited by it (and we don't need numerous
(if_then_else ... (if_then_else) ...) combinations in our patterns).
A vec constant, though, we expand directly to a masked representation,
which makes further simplification difficult.  I can experiment with
changing that if preferred.

My thinking was, however, that for other operations like binops we
directly emit the right variant via expand_operands without
forcing to a reg and don't even need to fwprop so I wanted to
imitate that.

Regards
 Robin



Re: [PATCH 1/13] rs6000, Remove __builtin_vsx_cmple* builtins

2024-05-12 Thread Kewen.Lin
Hi,

on 2024/4/20 05:16, Carl Love wrote:
> 
> rs6000, Remove __builtin_vsx_cmple* builtins
> 
> The built-ins __builtin_vsx_cmple_u16qi, __builtin_vsx_cmple_u2di,
> __builtin_vsx_cmple_u4si and __builtin_vsx_cmple_u8hi should take
> unsigned arguments and return an unsigned result.  The current definitions
> take signed arguments and return signed results which is incorrect.
> 
> The signed and unsigned versions of __builtin_vsx_cmple* are not
> documented in extend.texi.  Also there are no test cases for the
> built-ins.
> 
> Users can use the existing vec_cmple as PVIPR defines instead of
> __builtin_vsx_cmple_u16qi, __builtin_vsx_cmple_u2di,
> __builtin_vsx_cmple_u4si and __builtin_vsx_cmple_u8hi,
> __builtin_vsx_cmple_16qi, __builtin_vsx_cmple_2di,
> __builtin_vsx_cmple_4si and __builtin_vsx_cmple_8hi,
> __builtin_altivec_cmple_1ti, __builtin_altivec_cmple_u1ti.
> 
> Hence these built-ins are redundant and are removed by this patch.

OK for trunk, thanks.

BR,
Kewen

> 
> gcc/ChangeLog:
>   * config/rs6000/rs6000-builtin.cc (RS6000_BIF_CMPLE_16QI,
>   RS6000_BIF_CMPLE_U16QI, RS6000_BIF_CMPLE_8HI,
>   RS6000_BIF_CMPLE_U8HI, RS6000_BIF_CMPLE_4SI, RS6000_BIF_CMPLE_U4SI,
>   RS6000_BIF_CMPLE_2DI, RS6000_BIF_CMPLE_U2DI, RS6000_BIF_CMPLE_1TI,
>   RS6000_BIF_CMPLE_U1TI): Remove case statements.
>   * config/rs6000/rs6000-builtins.def (__builtin_vsx_cmple_16qi,
>   __builtin_vsx_cmple_2di, __builtin_vsx_cmple_4si,
>   __builtin_vsx_cmple_8hi, __builtin_vsx_cmple_u16qi,
>   __builtin_vsx_cmple_u2di, __builtin_vsx_cmple_u4si,
>   __builtin_vsx_cmple_u8hi): Remove built-in definitions.
> ---
>  gcc/config/rs6000/rs6000-builtin.cc   | 13 
>  gcc/config/rs6000/rs6000-builtins.def | 30 ---
>  2 files changed, 43 deletions(-)
> 
> diff --git a/gcc/config/rs6000/rs6000-builtin.cc b/gcc/config/rs6000/rs6000-builtin.cc
> index 320affd79e3..ac9f16fe51a 100644
> --- a/gcc/config/rs6000/rs6000-builtin.cc
> +++ b/gcc/config/rs6000/rs6000-builtin.cc
> @@ -2027,19 +2027,6 @@ rs6000_gimple_fold_builtin (gimple_stmt_iterator *gsi)
>fold_compare_helper (gsi, GT_EXPR, stmt);
>return true;
>  
> -case RS6000_BIF_CMPLE_16QI:
> -case RS6000_BIF_CMPLE_U16QI:
> -case RS6000_BIF_CMPLE_8HI:
> -case RS6000_BIF_CMPLE_U8HI:
> -case RS6000_BIF_CMPLE_4SI:
> -case RS6000_BIF_CMPLE_U4SI:
> -case RS6000_BIF_CMPLE_2DI:
> -case RS6000_BIF_CMPLE_U2DI:
> -case RS6000_BIF_CMPLE_1TI:
> -case RS6000_BIF_CMPLE_U1TI:
> -  fold_compare_helper (gsi, LE_EXPR, stmt);
> -  return true;
> -
>  /* flavors of vec_splat_[us]{8,16,32}.  */
>  case RS6000_BIF_VSPLTISB:
>  case RS6000_BIF_VSPLTISH:
> diff --git a/gcc/config/rs6000/rs6000-builtins.def b/gcc/config/rs6000/rs6000-builtins.def
> index 3bc7fed6956..7c36976a089 100644
> --- a/gcc/config/rs6000/rs6000-builtins.def
> +++ b/gcc/config/rs6000/rs6000-builtins.def
> @@ -1337,30 +1337,6 @@
>const vss __builtin_vsx_cmpge_u8hi (vus, vus);
>  CMPGE_U8HI vector_nltuv8hi {}
>  
> -  const vsc __builtin_vsx_cmple_16qi (vsc, vsc);
> -CMPLE_16QI vector_ngtv16qi {}
> -
> -  const vsll __builtin_vsx_cmple_2di (vsll, vsll);
> -CMPLE_2DI vector_ngtv2di {}
> -
> -  const vsi __builtin_vsx_cmple_4si (vsi, vsi);
> -CMPLE_4SI vector_ngtv4si {}
> -
> -  const vss __builtin_vsx_cmple_8hi (vss, vss);
> -CMPLE_8HI vector_ngtv8hi {}
> -
> -  const vsc __builtin_vsx_cmple_u16qi (vsc, vsc);
> -CMPLE_U16QI vector_ngtuv16qi {}
> -
> -  const vsll __builtin_vsx_cmple_u2di (vsll, vsll);
> -CMPLE_U2DI vector_ngtuv2di {}
> -
> -  const vsi __builtin_vsx_cmple_u4si (vsi, vsi);
> -CMPLE_U4SI vector_ngtuv4si {}
> -
> -  const vss __builtin_vsx_cmple_u8hi (vss, vss);
> -CMPLE_U8HI vector_ngtuv8hi {}
> -
>const vd __builtin_vsx_concat_2df (double, double);
>  CONCAT_2DF vsx_concat_v2df {}
>  
> @@ -3117,12 +3093,6 @@
>const vbq __builtin_altivec_cmpge_u1ti (vuq, vuq);
>  CMPGE_U1TI vector_nltuv1ti {}
>  
> -  const vbq __builtin_altivec_cmple_1ti (vsq, vsq);
> -CMPLE_1TI vector_ngtv1ti {}
> -
> -  const vbq __builtin_altivec_cmple_u1ti (vuq, vuq);
> -CMPLE_U1TI vector_ngtuv1ti {}
> -
>const unsigned long long __builtin_altivec_cntmbb (vuc, const int<1>);
>  VCNTMBB vec_cntmb_v16qi {}
>  
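
The commit message's point about signedness can be illustrated with a portable scalar model. This is a sketch only: `cmple_lanes` and the two helpers are hypothetical names, and the loop models lane-wise "less than or equal" semantics rather than the rs6000 implementation. It shows that the same 0xFF bit pattern compares differently depending on whether the lane type is signed or unsigned, which is why a prototype with the wrong signedness matters.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Lane-wise a <= b: an all-ones lane for true, 0 for false.
// T is the lane type that decides how the bits are interpreted.
template <typename T, std::size_t N>
std::array<T, N> cmple_lanes(const std::array<T, N>& a,
                             const std::array<T, N>& b) {
  std::array<T, N> out{};
  for (std::size_t i = 0; i < N; ++i)
    out[i] = (a[i] <= b[i]) ? static_cast<T>(~T{0}) : T{0};
  return out;
}

// Bit pattern 0xFF read as a signed lane is -1, and -1 <= 1 holds.
bool le_as_signed() {
  std::array<std::int8_t, 1> a{{std::int8_t{-1}}};
  std::array<std::int8_t, 1> b{{std::int8_t{1}}};
  return cmple_lanes(a, b)[0] != 0;
}

// The same bit pattern read as an unsigned lane is 255, and 255 <= 1 fails.
bool le_as_unsigned() {
  std::array<std::uint8_t, 1> a{{std::uint8_t{0xFF}}};
  std::array<std::uint8_t, 1> b{{std::uint8_t{1}}};
  return cmple_lanes(a, b)[0] != 0;
}

// Small self-check: the two interpretations disagree.
void check_signedness_matters() {
  assert(le_as_signed() != le_as_unsigned());
}
```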




Re: Fix gnu versioned namespace mode 00/03

2024-05-12 Thread Iain Sandoe



> On 13 May 2024, at 06:06, François Dumont  wrote:
> 
> 
> On 07/05/2024 18:15, Iain Sandoe wrote:
>> Hi François
>> 
>>> On 4 May 2024, at 22:11, François Dumont  wrote:
>>> 
>>> Here is the list of patches to restore gnu versioned namespace mode.
>>> 
>>> 1/3: Bump gnu version namespace
>>> 
>>> This is important to be done first so that once build of gnu versioned 
>>> namespace is fixed there is no chance to have another build of '__8' 
>>> version with a different abi than last successful '__8' build.
>>> 
>>> 2/3: Fix build using cxx11 abi for versioned namespace
>>> 
>>> 3/3: Proposal to default to "new" abi when dual abi is disabled and accept 
>>> any default-libstdcxx-abi either dual abi is enabled or not.
>>> 
>>> All testsuite run for following configs:
>>> 
>>> - dual abi
>>> 
>>> - gcc4-compatible only abi
>>> 
>>> - new only abi
>>> 
>>> - versioned namespace abi
>> At the risk of delaying this (a bit) - I think we should also consider items 
>> like once_call that have broken impls.
> Do you have any pointer to this once_call problem, sorry I'm not aware about 
> it (apart from your messages).

(although this mentions one specific target, it applies more widely).
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66146

Also, AFAICT, any nested once_call is a problem (not just exceptions).

>>  in the current library - and at least get proposed replacements available 
>> behind the versioned namespace; rather than using up a namespace version 
>> with the current broken code.
> 
> I'm not proposing to fix all library bugs on all platforms with this patch, 
> just fix the versioned namespace mode.

Sorry, I was not intending to suggest that (although perhaps my comments read 
that way).

I was trying to suggest that, where we have proposed fixes that are blocked
because they are ABI breaks, those could be put behind the versioned
namespace (it was not my intention to suggest that such additions should be
part of this patch series).

> As to do so I also need to adopt cxx11 abi in versioned mode it already 
> justify a bump of version.

I see - it’s just a bit strange that we are bumping a version for a mode that
does not currently work; however, I guess someone might have deployed it even
so.
> 
> The reason I'm proposing to integrate this patch this early in gcc 15 stage 
> is to have time to integrate any other library fix/optimization that could 
> make use of it. I already have 1 on my side for the hashtable implementation

Ah, then I think we are aiming for the same thing.

> . I hope your once_call fix also have time to be ready for gcc 15, no ?

Yes; if we put it behind the versioned namespace - there are (I think) several 
proposed solutions to that specific issue.

thanks
Iain

> 
> François



[PATCH] c++: Avoid using __array_rank as a variable name [PR115061]

2024-05-12 Thread Ken Matsui
This patch fixes a compilation error when building GCC using Clang.
Since __array_rank is used as a built-in trait name, use rank instead.

PR c++/115061

gcc/cp/ChangeLog:

* semantics.cc (finish_trait_expr): Use rank instead of
__array_rank.

Signed-off-by: Ken Matsui 
---
 gcc/cp/semantics.cc | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/cp/semantics.cc b/gcc/cp/semantics.cc
index 43b175f92fd..df62e2d80db 100644
--- a/gcc/cp/semantics.cc
+++ b/gcc/cp/semantics.cc
@@ -12914,10 +12914,10 @@ finish_trait_expr (location_t loc, cp_trait_kind kind, tree type1, tree type2)
   tree val;
   if (kind == CPTK_RANK)
 {
-  size_t __array_rank = 0;
+  size_t rank = 0;
   for (; TREE_CODE (type1) == ARRAY_TYPE; type1 = TREE_TYPE (type1))
-   ++__array_rank;
-  val = build_int_cst (size_type_node, __array_rank);
+   ++rank;
+  val = build_int_cst (size_type_node, rank);
 }
   else
 val = (trait_expr_value (kind, type1, type2)
-- 
2.44.0
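
As a side note for readers, the computation in the hunk above (count how many array dimensions can be peeled off a type) can be sketched in standalone C++. This is an illustration under assumptions, not GCC code: `rank_of` and `rank_of_matrix` are hypothetical names, and the result is cross-checked against the standard `std::rank` trait. It also uses an ordinary local name, `rank`, since identifiers containing a double underscore are reserved for the implementation and may collide with a compiler's built-in trait names.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical helper mirroring the loop in the patch: peel one array
// dimension per step and count the steps.
template <typename T>
struct rank_of : std::integral_constant<std::size_t, 0> {};

template <typename T, std::size_t N>
struct rank_of<T[N]>
    : std::integral_constant<std::size_t, 1 + rank_of<T>::value> {};

template <typename T>
struct rank_of<T[]>
    : std::integral_constant<std::size_t, 1 + rank_of<T>::value> {};

static_assert(rank_of<int>::value == 0, "a non-array type has rank 0");
static_assert(rank_of<int[4]>::value == 1, "one dimension");
static_assert(rank_of<int[2][3]>::value == std::rank<int[2][3]>::value,
              "agrees with the standard trait");

// An ordinary identifier is enough for the local counter.
std::size_t rank_of_matrix() {
  std::size_t rank = rank_of<int[2][3]>::value;
  return rank;
}

// Small self-check of the helper.
void check_rank() {
  assert(rank_of_matrix() == 2);
}
```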



Re: gcc/DATESTAMP wasn't updated since 20240507

2024-05-12 Thread Rainer Orth
Richard Biener  writes:

> On Thu, 9 May 2024, Jakub Jelinek wrote:
>
>> On Thu, May 09, 2024 at 12:14:43PM +0200, Jakub Jelinek wrote:
>> > On Thu, May 09, 2024 at 12:04:38PM +0200, Rainer Orth wrote:
>> > > I just noticed that gcc/DATESTAMP wasn't updated yesterday and today,
>> > > staying at 20240507.
>> > 
>> > I think it is because of the r15-268 commit, we do support
>> > This reverts commit ...
>> > when the referenced commit contains a ChangeLog message, but here
>> > it doesn't, as it is a revert commit.
>> 
>> Indeed and also the r15-311 commit.
>> Please don't Revert Revert, we don't really support that, had to fix it all
>> by hand.
>
> I do wonder if we can run the ChangeLog processing checks as part of
> the pre-commit hook and reject such pushes.  It seems we have two
> implementations, one in the pre-commit hook and the processing itself
> rather than having a single implementation that can run in two modes?

Unfortunately, the datestamp is again stuck at 20240509.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University