date:20240608

[Patch, fortran] PR59104

2024-06-08 Thread Paul Richard Thomas

Hi All,

The attached fixes a problem that, judging by the comments, has been looked
at periodically over the last ten years but just looked to be too
fiendishly complicated to fix. This is in no small part because of the
confusing ordering of dummies in the tlink chain and the unintuitive
placement of all deferred initializations to the front of the init chain in
the wrapped block.

The result of the existing ordering is that the initialization code for
non-dummy variables that depends on the function result occurs before any
initialization code for the function result itself. The fix ensures that:
(i) These variables are placed correctly in the tlink chain, respecting
inter-dependencies; and (ii) The dependent initializations are placed at
the end of the wrapped block init chain.  The details appear in the
comments in the patch. It is entirely possible that a less clunky fix
exists but I failed to find it.

OK for mainline?

Regards

Paul


diff --git a/gcc/fortran/dependency.cc b/gcc/fortran/dependency.cc
index bafe8cbc5bc..97ace8c778e 100644
--- a/gcc/fortran/dependency.cc
+++ b/gcc/fortran/dependency.cc
@@ -2497,3 +2497,63 @@ gfc_omp_expr_prefix_same (gfc_expr *lexpr, gfc_expr *rexpr)
 
   return true;
 }
+
+
+/* gfc_function_dependency returns true for non-dummy symbols with dependencies
+   on an old-fashioned function result (ie. proc_name = proc_name->result).
+   This is used to ensure that initialization code appears after the function
+   result is treated and that any mutual dependencies between these symbols are
+   respected.  */
+
+static bool
+dependency_fcn (gfc_expr *e, gfc_symbol *sym,
+		 int *f ATTRIBUTE_UNUSED)
+{
+  if (e == NULL)
+return false;
+
+  if (e && e->expr_type == EXPR_VARIABLE
+  && e->symtree
+  && e->symtree->n.sym == sym)
+return true;
+
+  return false;
+}
+
+
+bool
+gfc_function_dependency (gfc_symbol *sym, gfc_symbol *proc_name)
+{
+  bool front = false;
+
+  if (proc_name && proc_name->attr.function
+  && proc_name == proc_name->result
+  && !(sym->attr.dummy || sym->attr.result))
+{
+  if (sym->as && sym->as->type == AS_EXPLICIT)
+	{
+	  for (int dim = 0; dim < sym->as->rank; dim++)
+	{
+	  if (sym->as->lower[dim]
+		  && sym->as->lower[dim]->expr_type != EXPR_CONSTANT)
+		front = gfc_traverse_expr (sym->as->lower[dim], proc_name,
+	   dependency_fcn, 0);
+	  if (front)
+		break;
+	  if (sym->as->upper[dim]
+		  && sym->as->upper[dim]->expr_type != EXPR_CONSTANT)
+		front = gfc_traverse_expr (sym->as->upper[dim], proc_name,
+	   dependency_fcn, 0);
+	  if (front)
+		break;
+	}
+	}
+
+  if (sym->ts.type == BT_CHARACTER
+	  && sym->ts.u.cl && sym->ts.u.cl->length
+	  && sym->ts.u.cl->length->expr_type != EXPR_CONSTANT)
+	front = gfc_traverse_expr (sym->ts.u.cl->length, proc_name,
+   dependency_fcn, 0);
+}
+  return front;
+ }
diff --git a/gcc/fortran/dependency.h b/gcc/fortran/dependency.h
index ea4bd04b0e8..0fa5f93d0fc 100644
--- a/gcc/fortran/dependency.h
+++ b/gcc/fortran/dependency.h
@@ -23,7 +23,7 @@ enum gfc_dep_check
 {
   NOT_ELEMENTAL,/* Not elemental case: normal dependency check.  */
   ELEM_CHECK_VARIABLE,  /* Test whether variables overlap.  */
-  ELEM_DONT_CHECK_VARIABLE  /* Test whether variables overlap only if used 
+  ELEM_DONT_CHECK_VARIABLE  /* Test whether variables overlap only if used
 			   in an expression.  */
 };
 
@@ -43,3 +43,5 @@ bool gfc_are_equivalenced_arrays (gfc_expr *, gfc_expr *);
 bool gfc_omp_expr_prefix_same (gfc_expr *, gfc_expr *);
 
 gfc_expr * gfc_discard_nops (gfc_expr *);
+
+bool gfc_function_dependency (gfc_symbol *, gfc_symbol *);
\ No newline at end of file
diff --git a/gcc/fortran/error.cc b/gcc/fortran/error.cc
index 65e38b0e866..60f607ecc4f 100644
--- a/gcc/fortran/error.cc
+++ b/gcc/fortran/error.cc
@@ -892,7 +892,7 @@ error_print (const char *type, const char *format0, va_list argp)
 #else
 	  m = INTTYPE_MAXIMUM (ptrdiff_t);
 #endif
-	  m = 2 * m + 1;  
+	  m = 2 * m + 1;
 	  error_uinteger (a & m);
 	}
 	  else
diff --git a/gcc/fortran/symbol.cc b/gcc/fortran/symbol.cc
index 0a1646def67..7e39981e843 100644
--- a/gcc/fortran/symbol.cc
+++ b/gcc/fortran/symbol.cc
@@ -27,6 +27,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "parse.h"
 #include "match.h"
 #include "constructor.h"
+#include "dependency.h"
 
 
 /* Strings for all symbol attributes.  We use these for dumping the
@@ -948,15 +949,18 @@ conflict_std:
 void
 gfc_set_sym_referenced (gfc_symbol *sym)
 {
+  gfc_symbol *proc_name = sym->ns->proc_name ? sym->ns->proc_name : NULL;
 
   if (sym->attr.referenced)
 return;
 
   sym->attr.referenced = 1;
 
-  /* Remember which order dummy variables are accessed in.  */
-  if (sym->attr.dummy)
-sym->dummy_order = next_dummy_order++;
+  /* Remember which order dummy variables and symbols with function result
+ dependencies are accessed in.  *

Re: [PATCH] [tree-prof] skip if errors were seen [PR113681]

2024-06-08 Thread Jeff Law





On 4/15/24 10:03 PM, Alexandre Oliva wrote:

On Mar 29, 2024, Alexandre Oliva  wrote:


On Mar 22, 2024, Jeff Law  wrote:

On 3/9/24 2:11 AM, Alexandre Oliva wrote:

ipa_tree_profile asserts that the symtab is in IPA_SSA state, but we
don't reach that state and ICE if e.g. ipa-strub passes report errors.
Skip this pass if errors were seen.
Regstrapped on x86_64-linux-gnu.  Ok to install?

for  gcc/ChangeLog
PR tree-optimization/113681
* tree-profiling.cc (pass_ipa_tree_profile::gate): Skip if
seen_errors.
for  gcc/testsuite/ChangeLog
PR tree-optimization/113681
* c-c++-common/strub-pr113681.c: New.

So I've really never dug into strub, but this would seem to imply that
an error from strub is non-fatal?



Yeah.  I believe that's no different from other passes.



Various other passes have seen_errors guards, but ipa-prof didn't.


Specifically, pass_build_ssa_passes in passes.cc is gated with
!seen_errors(), so we skip all the passes bundled in it, and don't
advance the symtab state to IPA_SSA.  So other passes that would require
IPA_SSA need to be gated similarly.
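
The gating argument above can be sketched with a toy model. This is only an illustration under stated assumptions: the `toy_*` names are hypothetical and do not correspond to GCC's pass manager API. It shows why a pass that asserts on the IPA_SSA state needs the same error gate as the pass group that establishes that state.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: these names mirror the discussion but are not GCC's API.  */
enum toy_symtab_state { TOY_PARSING, TOY_IPA_SSA };

struct toy_compiler
{
  int error_count;               /* Errors reported so far.  */
  enum toy_symtab_state state;   /* How far the symbol table has advanced.  */
  bool profile_ran;              /* Set when the profiling pass executes.  */
};

static bool
toy_seen_errors (const struct toy_compiler *c)
{
  return c->error_count > 0;
}

/* Stand-in for the pass group that builds SSA and advances the state.  */
static void
toy_build_ssa_passes (struct toy_compiler *c)
{
  if (toy_seen_errors (c))
    return;                      /* Gated: state stays at TOY_PARSING.  */
  c->state = TOY_IPA_SSA;
}

/* Stand-in for a pass that requires the IPA_SSA state.  */
static void
toy_ipa_tree_profile (struct toy_compiler *c)
{
  if (toy_seen_errors (c))
    return;                      /* The kind of gate the patch adds.  */
  assert (c->state == TOY_IPA_SSA);
  c->profile_ran = true;
}

/* Run the two stages in order; ERRORS simulates an earlier pass failing.  */
static struct toy_compiler
toy_run_pipeline (int errors)
{
  struct toy_compiler c = { errors, TOY_PARSING, false };
  toy_build_ssa_passes (&c);
  toy_ipa_tree_profile (&c);
  return c;
}
```

Without the early return in `toy_ipa_tree_profile`, a run with `errors > 0` would hit the assertion, which is the toy analogue of the ICE described in the PR.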


I suppose the insertion point for the strubm pass was one where other
passes didn't previously issue errors, so that wasn't an issue for
ipa-prof.  But now it is.


The patch needed adjustments to resolve conflicts with unrelated
changes.


[tree-prof] skip if errors were seen [PR113681]

ipa_tree_profile asserts that the symtab is in IPA_SSA state, but we
don't reach that state and ICE if e.g. ipa-strub passes report errors.
Skip this pass if errors were seen.

Regstrapped on x86_64-linux-gnu.  Ok to install?


for  gcc/ChangeLog

PR tree-optimization/113681
* tree-profiling.cc (pass_ipa_tree_profile::gate): Skip if
seen_errors.

for  gcc/testsuite/ChangeLog

PR tree-optimization/113681
* c-c++-common/strub-pr113681.c: New.

OK.
jeff

Re: [middle-end PATCH] Prefer PLUS over IOR in RTL expansion of multi-word shifts/rotates.

2024-06-08 Thread Jeff Law





On 1/18/24 12:54 PM, Roger Sayle wrote:


This patch tweaks RTL expansion of multi-word shifts and rotates to use
PLUS rather than IOR for disjunctive operations.  During expansion of
these operations, the middle-end creates RTL like (X<<C1) | (X>>C2)
where the constants C1 and C2 guarantee that bits don't overlap.
Hence the IOR can be performed by any any_or_plus operation, such as
IOR, XOR or PLUS; for word-size operations where carry chains aren't
an issue these should all be equally fast (single-cycle) instructions.
The benefit of this change is that targets with shift-and-add insns,
like x86's lea, can benefit from the LSHIFT-ADD form.
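
The disjoint-bits argument can be checked with a small C sketch. It assumes a 64-bit value split into two 32-bit words; the function names are made up for illustration and are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Shift a 64-bit value, held as two 32-bit words, left by N (0 < N < 32).
   The high word is combined with IOR, as expansion did before the patch.  */
static uint64_t
shl64_ior (uint32_t hi, uint32_t lo, unsigned n)
{
  assert (n > 0 && n < 32);
  uint32_t new_hi = (hi << n) | (lo >> (32 - n));
  uint32_t new_lo = lo << n;
  return ((uint64_t) new_hi << 32) | new_lo;
}

/* Same shift, but the two disjoint bit ranges are combined with PLUS.
   No carry can occur because (hi << n) has its low N bits clear and
   (lo >> (32 - n)) only occupies those low N bits.  */
static uint64_t
shl64_plus (uint32_t hi, uint32_t lo, unsigned n)
{
  assert (n > 0 && n < 32);
  uint32_t new_hi = (hi << n) + (lo >> (32 - n));
  uint32_t new_lo = lo << n;
  return ((uint64_t) new_hi << 32) | new_lo;
}
```

Because the two operands never share a set bit, IOR, XOR and PLUS all give the same high word, which is what lets a target pick a shift-and-add instruction.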

An example of a backend that benefits is ARC, which is demonstrated
by these two simple functions:

unsigned long long foo(unsigned long long x) { return x<<2; }

which with -O2 is currently compiled to:

foo:    lsr     r2,r0,30
        asl_s   r1,r1,2
        asl_s   r0,r0,2
        j_s.d   [blink]
        or_s    r1,r1,r2

with this patch becomes:

foo:    lsr     r2,r0,30
        add2    r1,r2,r1
        j_s.d   [blink]
        asl_s   r0,r0,2

unsigned long long bar(unsigned long long x) { return (x<<2)|(x>>62); }

which with -O2 is currently compiled to 6 insns + return:

bar:    lsr     r12,r0,30
        asl_s   r3,r1,2
        asl_s   r0,r0,2
        lsr_s   r1,r1,30
        or_s    r0,r0,r1
        j_s.d   [blink]
        or      r1,r12,r3

with this patch becomes 4 insns + return:

bar:    lsr     r3,r1,30
        lsr     r2,r0,30
        add2    r1,r2,r1
        j_s.d   [blink]
        add2    r0,r3,r0


This patch has been tested on x86_64-pc-linux-gnu with make bootstrap
and make -k check, both with and without --target_board=unix{-m32}
with no new failures.  Ok for mainline?


2024-01-18  Roger Sayle  

gcc/ChangeLog
 * expmed.cc (expand_shift_1): Use add_optab instead of ior_optab
 to generate PLUS instead of IOR when unioning disjoint bitfields.
 * optabs.cc (expand_subword_shift): Likewise.
 (expand_binop): Likewise for double-word rotate.

Also note that on some targets like RISC-V, there's more freedom to
generate compressed instructions from "add" rather than "or".


Anyway, given the time elapsed since submission, I went ahead and 
retested on x86, then committed & pushed to the trunk.


Thanks!

jeff

[PATCH v2] MIPS: Use signaling fcmp instructions for LT/LE/LTGT

2024-06-08 Thread YunQiang Su

LT/LE: c.lt.fmt/c.le.fmt on pre-R6 and cmp.lt.fmt/cmp.le.fmt on R6 have
different semantics:
   c.lt.fmt signals on every NaN, including qNaN;
   cmp.lt.fmt signals only on sNaN, not on qNaN;
   cmp.slt.fmt has the same semantics as c.lt.fmt;
   RTL lt/le signal on qNaN.

In `s__using_`, the RTL operations `lt`/`le` are converted to the c/cmp
lt/le conditions, which is correct for C.cond.fmt but not for
CMP.cond.fmt.  Convert them to slt/sle if ISA_HAS_CCF.

LTGT signals on qNaN.  `sne` on R6 has the same semantics, while pre-R6
has only the inverse, `ngl`.  Thus for RTL we have to use `uneq` as the
operator and introduce a new CC mode, CCEmode, to mark it as signaling.

This patch can fix
   gcc.dg/torture/pr91323.c for pre-R6;
   gcc.dg/torture/builtin-iseqsig-* for R6.
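
As a rough model of the semantics listed above (not code from the MIPS backend; every `toy_*` name is hypothetical), the following C sketch records which compare flavour raises Invalid Operation on which kind of NaN, and the lt/le to slt/sle renaming proposed for R6.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical model of the compare flavours listed above; the enum and
   function names are illustrative, not taken from the MIPS backend.  */
enum toy_fcmp
{
  TOY_C_LT,      /* pre-R6 c.lt.fmt   */
  TOY_CMP_LT,    /* R6 cmp.lt.fmt     */
  TOY_CMP_SLT    /* R6 cmp.slt.fmt    */
};

/* Return true if the compare raises Invalid Operation for a NaN operand.
   SIGNALING_NAN distinguishes sNaN (true) from qNaN (false).  */
static bool
toy_raises_invalid (enum toy_fcmp insn, bool signaling_nan)
{
  switch (insn)
    {
    case TOY_C_LT:
    case TOY_CMP_SLT:
      return true;               /* Signals on every NaN.  */
    case TOY_CMP_LT:
      return signaling_nan;      /* Quiet: signals on sNaN only.  */
    }
  return false;
}

/* Pick the R6 condition name that keeps RTL lt/le signaling.  */
static const char *
toy_r6_fcond (const char *fcond)
{
  assert (fcond != NULL);
  if (strcmp (fcond, "lt") == 0)
    return "slt";
  if (strcmp (fcond, "le") == 0)
    return "sle";
  return fcond;
}
```

In this model, emitting `cmp.lt.fmt` for an RTL `lt` loses the exception on qNaN, while `cmp.slt.fmt` matches the pre-R6 `c.lt.fmt` behaviour.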

gcc:
* config/mips/mips-modes.def: New CC_MODE CCE.
* config/mips/mips-protos.h (mips_output_compare): New function.
* config/mips/mips.cc (mips_allocate_fcc): Set CCEmode count=1.
(mips_emit_compare): Use CCEmode for LTGT/LT/LE for pre-R6.
(mips_output_compare): New function. Convert lt/le to slt/sle
for R6; convert ueq to ngl for CCEmode.
(mips_hard_regno_mode_ok_uncached): Mention CCEmode.
* config/mips/mips.h: Mention CCEmode for LOAD_EXTEND_OP.
* config/mips/mips.md (FPCC): Add CCE.
(define_mode_iterator MOVECC): Mention CCE.
(define_mode_attr reg): Add CCE with "z".
(define_mode_attr fpcmp): Add CCE with "c".
(define_code_attr fcond): ltgt should use sne instead of ne.
(s__using_): Call mips_output_compare.
---
 gcc/config/mips/mips-modes.def |  1 +
 gcc/config/mips/mips-protos.h  |  2 ++
 gcc/config/mips/mips.cc        | 48 +++---
 gcc/config/mips/mips.h         |  2 +-
 gcc/config/mips/mips.md        | 19 +-
 5 files changed, 61 insertions(+), 11 deletions(-)

diff --git a/gcc/config/mips/mips-modes.def b/gcc/config/mips/mips-modes.def
index 323570928fc..21f50a22546 100644
--- a/gcc/config/mips/mips-modes.def
+++ b/gcc/config/mips/mips-modes.def
@@ -54,4 +54,5 @@ ADJUST_ALIGNMENT (CCV4, 16);
 CC_MODE (CCDSP);
 
 /* For floating point conditions in FP registers.  */
+CC_MODE (CCE);
 CC_MODE (CCF);
diff --git a/gcc/config/mips/mips-protos.h b/gcc/config/mips/mips-protos.h
index 835f42128b9..fcc0a0ae663 100644
--- a/gcc/config/mips/mips-protos.h
+++ b/gcc/config/mips/mips-protos.h
@@ -394,4 +394,6 @@ extern bool mips_bit_clear_p (enum machine_mode, unsigned HOST_WIDE_INT);
 extern void mips_bit_clear_info (enum machine_mode, unsigned HOST_WIDE_INT,
  int *, int *);
 
+extern const char *mips_output_compare (const char *fpcmp, const char *fcond,
+   const char *fmt, const char *fpcc_mode, bool swap);
 #endif /* ! GCC_MIPS_PROTOS_H */
diff --git a/gcc/config/mips/mips.cc b/gcc/config/mips/mips.cc
index 278d9446482..b7acf041903 100644
--- a/gcc/config/mips/mips.cc
+++ b/gcc/config/mips/mips.cc
@@ -5659,7 +5659,7 @@ mips_allocate_fcc (machine_mode mode)
 
   gcc_assert (TARGET_HARD_FLOAT && ISA_HAS_8CC);
 
-  if (mode == CCmode)
+  if (mode == CCmode || mode == CCEmode)
 count = 1;
   else if (mode == CCV2mode)
 count = 2;
@@ -5788,17 +5788,57 @@ mips_emit_compare (enum rtx_code *code, rtx *op0, rtx *op1, bool need_eq_ne_p)
  /* Three FP conditions cannot be implemented by reversing the
 operands for C.cond.fmt, instead a reversed condition code is
 required and a test for false.  */
+ machine_mode ccmode = CCmode;
+ switch (*code)
+   {
+   case LTGT:
+   case LT:
+   case LE:
+ ccmode = CCEmode;
+ break;
+   default:
+ break;
+   }
  *code = mips_reversed_fp_cond (&cmp_code) ? EQ : NE;
  if (ISA_HAS_8CC)
-   *op0 = mips_allocate_fcc (CCmode);
+   *op0 = mips_allocate_fcc (ccmode);
  else
-   *op0 = gen_rtx_REG (CCmode, FPSW_REGNUM);
+   *op0 = gen_rtx_REG (ccmode, FPSW_REGNUM);
}
 
   *op1 = const0_rtx;
   mips_emit_binary (cmp_code, *op0, cmp_op0, cmp_op1);
 }
 }
+
+
+const char *
+mips_output_compare (const char *fpcmp, const char *fcond,
+   const char *fmt, const char *fpcc_mode, bool swap)
+{
+  const char *fc = fcond;
+
+  if (ISA_HAS_CCF)
+{
+  /* c.lt.fmt is signaling, while cmp.lt.fmt is quiet.  */
+  if (strcmp (fcond, "lt") == 0)
+   fc = "slt";
+  else if (strcmp (fcond, "le") == 0)
+   fc = "sle";
+}
+  else if (strcmp (fpcc_mode, "cce") == 0)
+{
+  /* It was LTGT, while we have only inverse one.  It was then converted
+to UNEQ by mips_reversed_fp_cond, and we used CCEmode to mark it.
+Lets convert it back to ngl now.  */
+  if (strcmp (fcond, "ueq") == 0)
+   fc = "ngl";
+}
+  if (swap)
+return concat(fpcmp, ".", fc,

Re: [patch] install.texi (nvptx): Recommend nvptx-tools 2024-05-30

2024-06-08 Thread Gerald Pfeifer

On Mon, 3 Jun 2024, Richard Biener wrote:
> install.texi also has the issue that it's not pre-packaged in a
> easy to discover and readable file in the release tarballs and that
> the online version is only for trunk.

The latter is only partially true: we generally try to keep it applicable 
more broadly - to a fault at times, if you look at some of the recent 
pruning I had to do.

Gerald

Re: [RFC/RFA] [PATCH 08/12] Add a new pass for naive CRC loops detection

2024-06-08 Thread Jeff Law

On 5/29/24 5:12 AM, Mariam Arutunian wrote:

IIRC we looked at the problem of canonicalizing the loop into a form
where we didn't necessarily have conditional blocks, instead we had
branchless sequences for the conditional xor and dealing with the high
bit in the crc.  My recollection was that the coremark CRC loop would
always canonicalize, but that in general we still saw multiple CRC
implementations that did not canonicalize and thus we still needed the
more complex matching.  Correct?

The loop in CoreMark is not fully canonicalized in that form,
as there are still branches present for the conditional XOR operation.
I checked that using the -O2 and -O3 flags.
A bit of a surprise.  Though it may be the case that some of the 
canonicalization steps are happening later in the pipeline.  No worries 
as I think we'd already concluded that we'd see at least some CRC 
implementations that wouldn't canonicalize down to branchless sequences 
for the conditional xor.
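
To make "branchless sequence for the conditional xor" concrete, here is a minimal C sketch of one bit step of a naive MSB-first CRC-8 in both forms. The polynomial and the function names are assumptions chosen for illustration; they are not taken from CoreMark or from the patch series.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical polynomial, chosen only for illustration.  */
#define TOY_POLY 0x07u

/* One bit of a naive MSB-first CRC-8 update, with a conditional XOR.  */
static uint8_t
crc8_step_branchy (uint8_t crc)
{
  if (crc & 0x80u)
    return (uint8_t) ((crc << 1) ^ TOY_POLY);
  return (uint8_t) (crc << 1);
}

/* The same step with the branch folded into a mask: the mask is all ones
   when the high bit is set and zero otherwise.  */
static uint8_t
crc8_step_branchless (uint8_t crc)
{
  uint8_t mask = (uint8_t) -(crc >> 7);
  return (uint8_t) ((crc << 1) ^ (TOY_POLY & mask));
}

/* Check that both forms agree on every 8-bit input; returns the number of
   inputs checked.  */
static int
crc8_check_steps (void)
{
  int checked = 0;
  for (unsigned v = 0; v < 256; v++)
    {
      assert (crc8_step_branchy ((uint8_t) v)
	      == crc8_step_branchless ((uint8_t) v));
      checked++;
    }
  return checked;
}
```

A matcher that only understood the branchless form would miss sources written in the first style unless an earlier pass rewrote them, which is the concern raised above.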

 > +
 > +gimple *
 > +crc_optimization::find_shift_after_xor (tree xored_crc)
 > +{
 > +  imm_use_iterator imm_iter;
 > +  use_operand_p use_p;
 > +
 > +  if (TREE_CODE (xored_crc) != SSA_NAME)
 > +    return nullptr;
If we always expect XORED_CRC to be an SSA_NAME, we might be able to use
gcc_assert (TREE_CODE (XORED_CRC) == SSA_NAME);

I'm not sure that it always has to be an SSA_NAME.

For a logical operation like XOR it should always have the form

SSA_NAME = SSA_NAME ^ (SSA_NAME | CONSTANT)

The constant might be a vector  constant, but the basic form won't 
change.  It's one of the nicer properties of gimple.  In contrast RTL 
would allow a variety of lvalues and rvalues, including MEMs, REGs, 
SUBREGs, extensions, other binary ops, etc etc.

 > +
 > +/* Set M_PHI_FOR_CRC and M_PHI_FOR_DATA fields.
 > +   Returns false if there are more than two (as in CRC calculation only CRC's
 > +   and data's phi may exist) or no phi statements in STMTS (at least there must
 > +   be CRC's phi).
 > +   Otherwise, returns true.  */
 > +
 > +bool
 > +crc_optimization::set_crc_and_data_phi (auto_vec<gimple *> &stmts)
 > +{
 > +  for (auto stmt_it = stmts.begin (); stmt_it != stmts.end (); stmt_it++)
 > +    {
 > +      if (is_a<gphi *> (*stmt_it) && bb_loop_header_p (gimple_bb (*stmt_it)))
 > +     {
 > +       if (!m_phi_for_crc)
 > +         m_phi_for_crc = as_a<gphi *> (*stmt_it);
 > +       else if (!m_phi_for_data)
 > +         m_phi_for_data = as_a<gphi *> (*stmt_it);
 > +       else
 > +         {
 > +           if (dump_file && (dump_flags & TDF_DETAILS))
 > +             fprintf (dump_file, "Xor-ed variable depends on more than 2 "
 > +                                 "phis.\n");
 > +           return false;
 > +         }
 > +     }
 > +    }
 > +  return m_phi_for_crc;
Hmm.  For a given PHI, how do we know if it's for the data item or the
crc item, or something else (like a loop counter) entirely?

I trace the def-use chain upwards from the XOR statement to determine 
which PHI node corresponds to CRC and data.
Since we assume the loop calculates CRC, I expect only variables 
representing data and CRC to participate in these operations.
In the implementations I support, the loop counter is used only for the 
iteration.
Any misidentification of CRC and data would occur only if the loop
doesn't calculate CRC, in which case the subsequent checks would fail,
leading the algorithm to identify it as not CRC.

Here, the PHI nodes for CRC and data might be swapped.
I just assume that the first PHI found is CRC and the second is data.
I determine them correctly later with the
swap_crc_and_data_if_needed function.

Ah, OK.  That probably deserves a comment in this code.

jeff

Re: [RFC/RFA] [PATCH 01/12] Implement internal functions for efficient CRC computation

2024-06-08 Thread Jeff Law





On 5/27/24 7:51 AM, Mariam Arutunian wrote:



I carefully reviewed the indentation of the code using different editors 
and viewers, and everything appeared correct.
I double-checked the specific sections mentioned, and they also looked 
right.

In this reply message I see that it's not correct. I'll try to fix it.

Thanks for double-checking.  It's one of the downsides of email based flows.

Jeff

Re: [RFC/RFA] [PATCH 08/12] Add a new pass for naive CRC loops detection

2024-06-08 Thread Jeff Law





On 6/4/24 7:41 AM, Mariam Arutunian wrote:
Mariam, your thoughts on whether or not those two phases could handle a
loop with two CRC calculations inside, essentially creating two calls to
our new builtins?

It is feasible, but it would likely demand considerable effort and 
additional work to implement effectively.
Thanks for the confirmation.  I suspect it likely doesn't come up often 
in practice either.






The key would be to only simulate the use-def cycle from the loop-closed PHI 
(plus the loop control of course, but niter/SCEV should be enough there) and 
just replace that LC PHI, leaving loop DCE to DCE.


Thank you, this is a good idea to just replace the PHI and leave the loop to 
DCE to remove only single CRC parts.
It does seem like replacing the PHI when we have an optimizable case 
might simplify that aspect of the implementation.





The current pass only verifies cases where a single CRC calculation is 
performed within the loop. During the verification phase,
I ensure that there are no other calculations aside from those necessary for 
the considered CRC computation.

Also, when I was investigating the bitwise CRC implementations used in 
different software, in all cases the loop was calculating just one CRC and no 
other calculations were done.
Thus, in almost all cases, the first phase will filter out non-CRCs, and during 
the second phase, only real CRCs with no other calculations will be executed.
This ensures that unnecessary statements won't be executed in most cases.
But we may have had a degree of sampling bias here.  If I remember 
correctly I used the initial filtering pass as the "trigger" to report a 
potential CRC case.  If that initial filtering pass rejected cases with 
other calculations in the loop, then we never would have seen those.




Leaving the loop to DCE will simplify the process of removing parts connected 
to a single CRC calculation.
However, since now we detect a loop that only calculates a single CRC, we can 
entirely remove it at this stage without additional checks.
Let's evaluate this option as we get to the later patches in the series. 
 What I like about Richard's suggestion is that it "just works" and it 
will continue to work, even as the overall infrastructure changes.  In 
contrast a bespoke loop removal implementation in a specific pass may 
need adjustment if other aspects of our infrastructure change.






If we really want a separate pass (or utility to work on a single 
loop) then we might consider moving some of the final value replacement 
code that doesn’t work with only SCEV there as well. There’s also 
special code in loop distribution for strlen recognition now, not 
exactly fitting in.



Note I had patches to do final value replacement on demand from CD-DCE when it
figures a loop has no side effects besides its reduction outputs (still want
to pick this up at some point again).


Oh, this could provide useful insights for our implementation.
Are you thinking of reusing that on-demand analysis to reduce the set of 
loops we analyze?


Jeff

Re: [PING] [contrib] validate_failures.py: fix python 3.12 escape sequence warnings

2024-06-08 Thread Jeff Law





On 5/14/24 8:12 AM, Gabi Falk wrote:

Hi,

This one still needs review:

https://inbox.sourceware.org/gcc-patches/20240415233833.104460-1-gabif...@gmx.com/

I think I just ACK'd an equivalent patch from someone else this week.

jeff

Re: [PATCH 1/5] RISC-V: Remove float vector eqne pattern

2024-06-08 Thread Jeff Law





On 3/1/24 1:12 AM, Demin Han wrote:

Hi juzhe,

I also first thought it was related to commutativity.

The following things led me to do the removal:

1. No tests fail in regression testing.

2. When I write if (a == 2) and if (2 == a), the results are the same.

GCC canonicalizes comparisons so that constants appear second.
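
A toy version of that canonicalization, with hypothetical types that are not GCC's RTL or GIMPLE structures, might look like the sketch below. It only illustrates the idea: a lone constant operand is moved to the second position, and for asymmetric codes the condition is swapped to match.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy expression model; these types are illustrative, not GCC's.  */
enum toy_cmp { TOY_EQ, TOY_NE, TOY_LT, TOY_GT };

struct toy_operand
{
  bool is_constant;
  int value;       /* Constant value, or a variable id.  */
};

struct toy_compare
{
  enum toy_cmp code;
  struct toy_operand op0, op1;
};

/* Return the comparison code to use after the operands are swapped.  */
static enum toy_cmp
toy_swap_condition (enum toy_cmp code)
{
  switch (code)
    {
    case TOY_LT: return TOY_GT;
    case TOY_GT: return TOY_LT;
    default:     return code;   /* EQ and NE are symmetric.  */
    }
}

/* Put a lone constant operand second, adjusting the code to match.  */
static struct toy_compare
toy_canonicalize (struct toy_compare c)
{
  if (c.op0.is_constant && !c.op1.is_constant)
    {
      struct toy_operand tmp = c.op0;
      c.op0 = c.op1;
      c.op1 = tmp;
      c.code = toy_swap_condition (c.code);
    }
  assert (!c.op0.is_constant || c.op1.is_constant);
  return c;
}
```

Under this model `2 == a` and `a == 2` reach later stages in the same shape, so identical output for the two spellings does not by itself show that a pattern is unused.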

Jeff

Re: [committed] i386: Implement .SAT_ADD for unsigned scalar integers [PR112600]

2024-06-08 Thread Uros Bizjak

On Sat, Jun 8, 2024 at 2:09 PM Gerald Pfeifer  wrote:
>
> On Sat, 8 Jun 2024, Uros Bizjak wrote:
> > gcc/ChangeLog:
> >
> > * config/i386/i386.md (usadd3): New expander.
> > (x86_movcc_0_m1_neg): Use SWI mode iterator.
>
> When you write "committed", did you actually push?

Yes, IIRC, the request was to mark pushed change with the word "committed".

> If so, us being on Git now it might be good to adjust terminology.

No problem, I can say "pushed" if that is more descriptive.

Thanks,
Uros.

Re: [PATCH 1/5] RISC-V: Remove float vector eqne pattern

2024-06-08 Thread Jeff Law





On 2/29/24 11:27 PM, demin.han wrote:

We can unify eqne and other comparison operations.

Tested on RV32 and RV64

gcc/ChangeLog:

* config/riscv/riscv-vector-builtins-bases.cc: Remove eqne cond
* config/riscv/vector.md (@pred_eqne_scalar): Remove patterns
(*pred_eqne_scalar_merge_tie_mask): Ditto
(*pred_eqne_scalar): Ditto
(*pred_eqne_scalar_narrow): Ditto
So I'll tentatively ACK this for the trunk, assuming Robin doesn't 
object before Tuesday's patchwork meeting.


jeff

Re: [PATCH 1/5] RISC-V: Remove float vector eqne pattern

2024-06-08 Thread Jeff Law





On 5/16/24 1:21 PM, Robin Dapp wrote:

Can the eqne pattern removal patches be committed first?


Please first make sure you test with corner cases, NaNs in
particular.  I'm pretty sure we don't have any test cases for
those.

But isn't canonicalization of EQ/NE safe, even for IEEE NaN and +-0.0?

target = (a == b) ? x : y
target = (a != b) ? y : x

Are equivalent, even for IEEE IIRC.
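
A quick way to convince oneself is the C sketch below, which assumes default (non fast-math) floating-point semantics. The helper names are invented for this illustration. It relies on `a != b` being exactly the negation of `a == b` under IEEE rules, including the unordered case.

```c
#include <assert.h>
#include <math.h>

/* Select with an EQ test.  */
static int
select_eq (double a, double b, int x, int y)
{
  return (a == b) ? x : y;
}

/* Select with the NE test and swapped arms.  */
static int
select_ne_swapped (double a, double b, int x, int y)
{
  return (a != b) ? y : x;
}

/* Compare both forms on NaN and signed-zero inputs; returns the number of
   operand pairs checked.  */
static int
check_eq_ne_equivalence (void)
{
  const double vals[] = { NAN, 0.0, -0.0, 1.0, INFINITY };
  const int n = (int) (sizeof vals / sizeof vals[0]);
  int checked = 0;

  for (int i = 0; i < n; i++)
    for (int j = 0; j < n; j++)
      {
	assert (select_eq (vals[i], vals[j], 1, 2)
		== select_ne_swapped (vals[i], vals[j], 1, 2));
	checked++;
      }
  return checked;
}
```

The same trick does not carry over to ordered comparisons: `!(a < b)` is not `a >= b` when either operand is a NaN, so this only supports the EQ/NE case discussed here.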

jeff

Re: [PATCH 0/2] fix RISC-V zcmp popretz [PR113715]

2024-06-08 Thread Jeff Law





On 6/5/24 8:42 PM, Fei Gao wrote:


But let's back up and get a good explanation of what the problem is.
Based on patch 2/2 it looks like we have lost an assignment to the
return register.

To someone not familiar with this code, it sounds to me like we've made
a mistake earlier and we're now defining a hook that lets us go back and
fix that earlier mistake.   I'm probably wrong, but so far that's what
it sounds like.

Hi Jeff

You're right. Let me rephrase patch 2/2 with more details. Search for /*feigao to
locate the point I'm trying to explain.

code snippets from gcc/function.cc
void
thread_prologue_and_epilogue_insns (void)
{
...
   /*feigao:
         targetm.gen_epilogue () is called here to generate epilogue sequence.

https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=b27d323a368033f0b37e93c57a57a35fd9997864
Commit above tries in targetm.gen_epilogue () to detect if
there's li a0,0 insn at the end of insn chain, if so, cm.popret
is replaced by cm.popretz and li a0,0 insn is deleted.
So that seems like the critical issue.  Generation of the 
prologue/epilogue really shouldn't be changing other instructions in the 
instruction stream.  I'm not immediately aware of another target that 
does that, and it seems like a rather risky thing to do.



It looks like the cm.popretz's RTL exposes the assignment to a0 and 
there's a DCE pass that runs after insertion of the prologue/epilogue. 
So I would suggest leaving the assignment to a0 in the RTL chain and see 
if the later DCE pass after prologue generation eliminates the redundant 
assignment.  That seems a lot cleaner.




Jeff

Re: [Patch, PR Fortran/90072] Polymorphic Dispatch to Polymophic Return Type Memory Leak

2024-06-08 Thread Tobias Burnus


Andre Vehreschild wrote:

PS That's good news about the funding. Maybe we will get to see "built in"
coarrays soon?

You hopefully will see Nikolas work on the shared memory coarray support, if
that is what you mean by "built in" coarrays. I will be working on the
distributed memory coarray support esp. fixing the module issues and some other
team related things.


Cool! (Both of it.)

I assume "distributed memory coarray support" is still based on Open
Coarrays?

* * *

I am asking because there is a coarray API being defined: Parallel Runtime
Interface for Fortran (PRIF), https://go.lbl.gov/prif

with an implementation called Caffeine – CoArray Fortran Framework of
Efficient Interfaces to Network Environments,
https://crd.lbl.gov/caffeine which uses GASNet or POSIX processes.

Well, among the implementers is (unsurprisingly?) Damian – and the
idea seems to be that LLVM's FLANG will use the API.

Tobias

PS: I think it might be useful in the long run to support both
PRIF/Caffeine and OpenCoarrays.

I have attached my hello-world patch for -fcoarray=prif that I wrote
after ISC-HPC; it only handles this_image() / num_images() + init/stop.
I got confirmation by the PRIF developers that the next revision will
permit calling __prif_MOD_prif_init multiple times such that one can use
it in the constructor for static coarrays, which won't work otherwise.
gcc/ChangeLog:

	* flag-types.h (enum gfc_fcoarray):

gcc/fortran/ChangeLog:

	* invoke.texi:
	* lang.opt:
	* trans-decl.cc (gfc_build_builtin_function_decls):
	(create_main_function):
	* trans-intrinsic.cc (trans_this_image):
	(trans_num_images):
	* trans.h (GTY):

 gcc/flag-types.h   |  3 ++-
 gcc/fortran/invoke.texi|  7 +-
 gcc/fortran/lang.opt   |  5 +++-
 gcc/fortran/trans-decl.cc  | 56 --
 gcc/fortran/trans-intrinsic.cc | 42 +++
 gcc/fortran/trans.h|  5 
 6 files changed, 108 insertions(+), 10 deletions(-)

diff --git a/gcc/flag-types.h b/gcc/flag-types.h
index 5a2b461fa75..babd747c01d 100644
--- a/gcc/flag-types.h
+++ b/gcc/flag-types.h
@@ -427,7 +427,8 @@ enum gfc_fcoarray
 {
   GFC_FCOARRAY_NONE = 0,
   GFC_FCOARRAY_SINGLE,
-  GFC_FCOARRAY_LIB
+  GFC_FCOARRAY_LIB,
+  GFC_FCOARRAY_PRIF
 };
 
 
diff --git a/gcc/fortran/invoke.texi b/gcc/fortran/invoke.texi
index 40e8e4a7cdd..331a40d31db 100644
--- a/gcc/fortran/invoke.texi
+++ b/gcc/fortran/invoke.texi
@@ -1753,7 +1753,12 @@ Single-image mode, i.e. @code{num_images()} is always one.
 
 @item @samp{lib}
 Library-based coarray parallelization; a suitable GNU Fortran coarray
-library needs to be linked.
+library needs to be linked such as @url{http://opencoarrays.org}.
+
+@item @samp{prif}
+Using the Parallel Runtime Interface for Fortran (PRIF),
+@url{https://go.lbl.gov/@/prif}; for instance, via Caffeine,
+@url{https://go.lbl.gov/@/caffeine}.
 @end table
 
 
diff --git a/gcc/fortran/lang.opt b/gcc/fortran/lang.opt
index 5efd4a0129a..9ba957d5571 100644
--- a/gcc/fortran/lang.opt
+++ b/gcc/fortran/lang.opt
@@ -786,7 +786,7 @@ Copy array sections into a contiguous block on procedure entry.
 
 fcoarray=
 Fortran RejectNegative Joined Enum(gfc_fcoarray) Var(flag_coarray) Init(GFC_FCOARRAY_NONE)
--fcoarray=	Specify which coarray parallelization should be used.
+-fcoarray=	Specify which coarray parallelization should be used.
 
 Enum
 Name(gfc_fcoarray) Type(enum gfc_fcoarray) UnknownError(Unrecognized option: %qs)
@@ -800,6 +800,9 @@ Enum(gfc_fcoarray) String(single) Value(GFC_FCOARRAY_SINGLE)
 EnumValue
 Enum(gfc_fcoarray) String(lib) Value(GFC_FCOARRAY_LIB)
 
+EnumValue
+Enum(gfc_fcoarray) String(prif) Value(GFC_FCOARRAY_PRIF)
+
 fcheck=
 Fortran RejectNegative JoinedOrMissing
 -fcheck=[...]	Specify which runtime checks are to be performed.
diff --git a/gcc/fortran/trans-decl.cc b/gcc/fortran/trans-decl.cc
index dca7779528b..d1c0e2ee997 100644
--- a/gcc/fortran/trans-decl.cc
+++ b/gcc/fortran/trans-decl.cc
@@ -170,6 +170,10 @@ tree gfor_fndecl_co_sum;
 tree gfor_fndecl_caf_is_present;
 tree gfor_fndecl_caf_random_init;
 
+tree gfor_fndecl_prif_init;
+tree gfor_fndecl_prif_stop;
+tree gfor_fndecl_prif_this_image_no_coarray;
+tree gfor_fndecl_prif_num_images;
 
 /* Math functions.  Many other math functions are handled in
trans-intrinsic.cc.  */
@@ -4147,6 +4151,31 @@ gfc_build_builtin_function_decls (void)
 	get_identifier (PREFIX("caf_random_init")),
 	void_type_node, 2, logical_type_node, logical_type_node);
 }
+  else if (flag_coarray == GFC_FCOARRAY_PRIF)
+{
+  tree pint_type = build_pointer_type (integer_type_node);
+  tree pbool_type = build_pointer_type (boolean_type_node);
+  tree pintmax_type_node = get_typenode_from_name (INTMAX_TYPE);
+  pintmax_type_node = build_pointer_type (pintmax_type_node);
+
+  gfor_fndecl_prif_init = gfc_build_library_function_decl_with_spec (
+	get_identifier ("__prif_MOD_prif_init"), ". W ",
+	void_type_node, 1, pint

Re: [PATCH] haifa-sched: Avoid the fusion priority of the fused insn to affect the subsequent insn sequence.

2024-06-08 Thread Jeff Law





On 6/6/24 8:51 PM, Jin Ma wrote:


I am very sorry that I did not check the commit information carefully. The 
statement is somewhat inaccurate.


When insns 1 and 2, and 3 and 4, can be fused, there is the
following sequence:

;;    insn |
;;      1  | sp=sp-0x18
;;  +   2  | [sp+0x10]=ra
;;      3  | [sp+0x8]=s0
;;      4  | [sp+0x0]=s1



The fusion priority of the insn 2, 3, and 4 are the same. According to
the current algorithm, since abs(0x10-0x8)


;;    insn |
;;      1  | sp=sp-0x18
;;  +   2  | [sp+0x10]=ra
;;      4  | [sp+0x8]=s1
;;  +   3  | [sp+0x0]=s0

gcc/ChangeLog:



  * haifa-sched.cc (rank_for_schedule): Likewise.


When insns 1 and 2, and 4 and 3, can be fused, there is the
following sequence:

;;    insn |
;;      1  | sp=sp-0x18
;;  +   2  | [sp+0x10]=ra
;;      3  | [sp+0x8]=s0
;;      4  | [sp+0x0]=s1

The fusion priority of the insn 2, 3, and 4 are the same. According to
the current algorithm, since abs(0x10-0x8)

I'd really love to see a testcase here, particularly since I'm still
having trouble understanding the code you're currently getting vs the
code you want.


Furthermore, I think I need to understand the end motivation here.  I 
always think of fusion priority as bringing insns consecutive so that 
a peephole pass can then squash two or more insns into a single insn. 
The canonical case being load/store pairs.
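
As a toy illustration of the load/store-pair case (this is not the
haifa-sched code; Store and count_adjacent_pairs are hypothetical names),
the sketch below counts how many neighbouring stores in a given order could
be squashed into a pair. It assumes two stores pair only when they sit next
to each other and their offsets differ by exactly one 8-byte slot:

```cpp
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical model of a stack store: [sp + offset] = some register.
struct Store
{
  int insn_id;
  long offset;
};

// Count pairs greedily, left to right: two stores are paired when they are
// consecutive in the sequence and their offsets differ by exactly SLOT bytes.
// A store that has been paired is not reused for a second pair.
inline std::size_t
count_adjacent_pairs (const std::vector<Store> &seq, long slot = 8)
{
  std::size_t pairs = 0;
  for (std::size_t i = 0; i + 1 < seq.size ();)
    {
      if (std::labs (seq[i].offset - seq[i + 1].offset) == slot)
        {
          ++pairs;
          i += 2;
        }
      else
        ++i;
    }
  return pairs;
}
```

Under this model only the order of the stores matters, which is why a
scheduler heuristic that keeps pairable insns adjacent is useful at all.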



If you're trying to generate pairs, then that's fine.  I just want to 
make sure I understand the goal.  And if you're trying to generate pairs 
what actually can be paired?  I must admit I don't have any notable 
experience with the thead core extensions.


If you're just trying to keep the instructions consecutive in the IL, 
then I don't think fusion priorities are a significant concern.  Much 
more important for that case is the fusion pair detection (which I think 
is about to get a lot more attention in the near future).


Jeff

Re: [committed v4] libstdc++: Fix std::ranges::iota is not included in numeric [PR108760]

2024-06-08 Thread Jonathan Wakely

On Sat, 8 Jun 2024 at 16:56, Ulrich Drepper  wrote:
>
> On Sat, Jun 8, 2024 at 5:03 PM Jonathan Wakely  wrote:
> > I'm in two minds about backporting this one. It would be good to fix the
> > non-conformance problem for the release branches, but it also
> > potentially breaks some code that uses ranges::iota without including
> > .
>
> I say add the change as soon as possible so that there is as little
> code as possible relying on the non-standard header.

Yes, that's good rationale for doing the backports.
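
For user code that might be affected, a defensive pattern could look like
the sketch below. fill_sequence is a hypothetical helper, not something from
the patch. It includes both headers so that it builds against releases from
before and after the move, and it falls back to std::iota when the library
does not advertise ranges::iota:

```cpp
#include <algorithm> // where releases without the fix declare ranges::iota
#include <numeric>   // std::iota, and ranges::iota per the standard
#include <vector>

// Fill V with START, START + 1, ...  Uses std::ranges::iota when the
// library advertises it via __cpp_lib_ranges_iota, else std::iota.
inline void
fill_sequence (std::vector<int> &v, int start)
{
#if defined(__cpp_lib_ranges_iota) && __cpp_lib_ranges_iota >= 202202L
  std::ranges::iota (v, start);
#else
  std::iota (v.begin (), v.end (), start);
#endif
}
```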

Re: [wwwdocs] gcc-15/changes.html + projects/gomp: update for new OpenMP features

2024-06-08 Thread Tobias Burnus


Hi Gerald,

Gerald Pfeifer wrote:

Looks like a janitorial task to fix the absolute links, possibly
excluding those with /git, /onlinedocs, /wiki – or assuming that the
main page is GCC.gnu.org, relying on the redirects.

It's on my list. A first quick check indicates there isn't much to do,
though. :-)


You could consider

htdocs/search.html:

to avoid a redirect (but it is not a broken link);
otherwise, I concur that it seems to be (mostly) fine :-)

* * *


+  loop-transformation constructs are now supported.
I'm thinking "loop transformation" in English? Or is this a specific term
from the standard?

Loop transformation happens at the end. But e.g "(#pragma omp) unroll
full" is a directive and, e.g.
...
is a construct (= directive + structured block (if any) + end directive
(if any)).
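
To make the directive/construct distinction concrete, here is a small
sketch (sum_first_four is a hypothetical function, and the pragma is guarded
so the code also builds without OpenMP support). The pragma line alone is
the directive; the directive together with the for loop is the construct:

```cpp
// The directive is the "#pragma omp unroll full" line by itself; the
// construct is that directive together with the loop it applies to.
inline int
sum_first_four (const int *a)
{
  int sum = 0;
#ifdef _OPENMP
#pragma omp unroll full
#endif
  for (int i = 0; i < 4; ++i)
    sum += a[i];
  return sum;
}
```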

I believe there was a misunderstanding and I wasn't clear enough: I was
wondering whether instead of "loop-transformation" the patch should have
"loop transformation".

In your response you use the version without dash, so I guess we agree?
:-)


(Pedantically it's a hyphen (-) and not a(n en/em) dash (–/—), i.e. '-' 
not '--' or '---' in TeX.)


No, we don't. – There is a difference whether the two words are used 
alone or as a modifier to a noun, as in "this is well defined" vs. "a 
well-defined project".


Thus, while "loop transformation happens" is without hyphen (as we both 
agree),* for "loop(-| )transformation constructs" the (non-)usage of 
hyphens is not well defined; grouping wise, those are clearly '((loop 
transformation) constructs)' and not '(loop (transformation constructs))'.


I believe both variants are perfectly fine.

BTW: In the OpenMP pre-6.0 draft (TR12), the verb 'transform' is now 
used as a noun not with the suffix '-ation' but with the suffix '-ing' 
(also referred to as gerund) such that a section title now uses 
"Loop-Transforming Constructs"; I think for '(word) plus (-ing word)' – 
used as modifier –, a hyphen is a tad more common than for '(word) plus 
(word with -ation suffix)'.


Tobias

* The Oxford Guide to Style points out some words that do get 
hyphenated: clear-cut, drip-proof, take-off, part-time, … – or to refer 
to the abstract meaning rather than literal: bull's-eye, crow's-feet, … 
– Formerly, present participle plus noun got hyphenated when the compound 
was acted on: walking-stick, walking-frame. Likewise, it was formerly 
normal in British English to hyphenate a single adjectival noun and the 
noun it modified: note-cue, title-page, volume-number (less common now, 
but can linger in some combinations). And until recently: small 
scale-factory (vs. small-scale factory), white water-lily (vs. 
white-water lily).

Re: Reverted recent patches to resource.cc

2024-06-08 Thread Jeff Law





On 5/29/24 8:07 PM, Jeff Law wrote:



On 5/29/24 7:28 PM, Hans-Peter Nilsson wrote:

From: Hans-Peter Nilsson 
Date: Mon, 27 May 2024 19:51:47 +0200



2: Does not depend on 1, but corrects an incidentally found wart:
find_basic_block calls fail too often.  Replace them with "modern"
insn-to-basic-block cross-referencing.

3: Just an addendum to 2: removes an "if", where the condition is now
always-true, dominated by a gcc_assert, and where the change in
indentation was too ugly.

4: Corrects another incidentally found wart: for the last 15 years the
code in resource.cc has only been called from within reorg.cc (and
reorg.c), specifically not possibly before calling init_resource_info
or after free_resource_info, so we can discard the code that tests
certain allocated arrays for NULL.  I didn't even bother with a
gcc_assert; besides some gen*-generated files, only reorg.cc includes
resource.h (not to be confused with the system sys/resource.h).
A grep says the #include resource.h can be removed from those gen*
files and presumably from RESOURCE_H(!) as well.  Some Other Time.
Also, removed a redundant "if (tinfo != NULL)" and moved the then-code
into the previous then-clause.

   resource.cc: Replace calls to find_basic_block with cfgrtl
 BLOCK_FOR_INSN
   resource.cc (mark_target_live_regs): Remove check for bb not found
   resource.cc: Remove redundant conditionals


I had to revert those last three patches due to PR
bootstrap/115284.  I hope to revisit once I have a means to
reproduce (and fix) the underlying bug.  It doesn't have to
be a bug with those changes per-se: IMHO the "improved"
lifetimes could just as well have uncovered a bug elsewhere
in reorg.  It's still on me to resolve that situation; done.
I'm just glad the cause was the incidental improvements and
not the original bug I wanted to fix.

There appears to be only a single supported SPARC machine in
cfarm: cfarm216, and I currently can't reach it due to what
appears to be issues at my end.  I guess I'll either fix
that or breathe life into sparc-elf+sim.

Or if you've got a reasonable server to use, QEMU might save you :-)



Even better option: the sh4/sh4eb-linux-gnu ports with the 
execute/ieee/fp-cmp-5.c test.  That started failing at execution at -O2 
with the first patch in the series and there are very clear assembly 
differences before/after your change.  Meaning you can probably look at 
them with just a cross compile and compare the before/after.



Jeff

Re: [committed v4] libstdc++: Fix std::ranges::iota is not included in numeric [PR108760]

2024-06-08 Thread Ulrich Drepper

On Sat, Jun 8, 2024 at 5:03 PM Jonathan Wakely  wrote:
> I'm in two minds about backporting this one. It would be good to fix the
> non-conformance problem for the release branches, but it also
> potentially breaks some code that uses ranges::iota without including
> <numeric>.

I say add the change as soon as possible so that there is as little
code as possible relying on the non-standard header.

[committed v2][libstdc++] Add constexpr specifier to function __atomic_impl::__clear_padding

2024-06-08 Thread Jonathan Wakely

Here's what I pushed to trunk, using the macro instead of the plain
keyword, and with a testcase.

Thanks for the patch, Deev.

Tested x86_64-linux. Pushed to trunk. I'll backport this too.

-- >8 --

This is called from the std::atomic constructor,
which needs to be usable in constant expressions.

libstdc++-v3/ChangeLog:

* include/bits/atomic_base.h (__atomic_impl::__clear_padding):
Add missing constexpr specifier.
* testsuite/29_atomics/atomic_float/constinit.cc: New test.

Co-authored-by: Jonathan Wakely 
---
 libstdc++-v3/include/bits/atomic_base.h | 2 +-
 libstdc++-v3/testsuite/29_atomics/atomic_float/constinit.cc | 3 +++
 2 files changed, 4 insertions(+), 1 deletion(-)
 create mode 100644 libstdc++-v3/testsuite/29_atomics/atomic_float/constinit.cc

diff --git a/libstdc++-v3/include/bits/atomic_base.h 
b/libstdc++-v3/include/bits/atomic_base.h
index 062f1549740..20901b7fc06 100644
--- a/libstdc++-v3/include/bits/atomic_base.h
+++ b/libstdc++-v3/include/bits/atomic_base.h
@@ -968,7 +968,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   }
 
 template<typename _Tp>
-  _GLIBCXX_ALWAYS_INLINE _Tp*
+  _GLIBCXX_ALWAYS_INLINE _GLIBCXX14_CONSTEXPR _Tp*
   __clear_padding(_Tp& __val) noexcept
   {
auto* __ptr = std::__addressof(__val);
diff --git a/libstdc++-v3/testsuite/29_atomics/atomic_float/constinit.cc 
b/libstdc++-v3/testsuite/29_atomics/atomic_float/constinit.cc
new file mode 100644
index 000..6b3f4f76b4c
--- /dev/null
+++ b/libstdc++-v3/testsuite/29_atomics/atomic_float/constinit.cc
@@ -0,0 +1,3 @@
+// { dg-do compile { target c++20 } }
+#include <atomic>
+constinit std::atomic<float> a(0.0f);
-- 
2.45.1
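
The reason the specifier matters can be shown with a reduced sketch.
atomic_like and clear_padding_sketch are hypothetical stand-ins, not the
libstdc++ code: a constructor can only be used for constant initialization
if every function it calls is itself usable in a constant expression.

```cpp
// Hypothetical stand-in for a helper that a constructor calls; it must be
// constexpr for the constructor to be usable in constant expressions.
template<typename T>
constexpr T *
clear_padding_sketch (T &val) noexcept
{
  // The library function does more than this; the sketch only returns the
  // address so that it stays valid as a C++14 constexpr function.
  return &val;
}

template<typename T>
struct atomic_like
{
  T value;

  constexpr atomic_like (T v) noexcept : value (v)
  {
    clear_padding_sketch (value);
  }
};

// Constant initialization: this only compiles because both the constructor
// and the helper it calls are constexpr.
constexpr atomic_like<float> zero (0.0f);
static_assert (zero.value == 0.0f, "initialized at compile time");
```

Dropping constexpr from clear_padding_sketch makes the definition of zero
ill-formed, which mirrors what the new constinit test guards against.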

[committed v4] libstdc++: Fix std::ranges::iota is not included in numeric [PR108760]

2024-06-08 Thread Jonathan Wakely

From: Michael Levine 

I committed the missing include separately, and pushed Michael's change
as attached (with some whitespace tweaks and a changelog entry).

Thanks for the patch, Michael!

Tested x86_64-linux. Pushed to trunk.

I'm in two minds about backporting this one. It would be good to fix the
non-conformance problem for the release branches, but it also
potentially breaks some code that uses ranges::iota without including
<numeric>. Ideally we'd make ranges::iota available in *both* <algorithm>
and <numeric> for gcc-13 and gcc-14, as a transition aid. I'm not sure
I can be bothered to move it to a separate header to make that work, nor
to include all of bits/ranges_algo.h in <numeric>.

-- >8 --

Before this patch, using std::ranges::iota required including
<algorithm> when it should have been sufficient to only include
<numeric>.

libstdc++-v3/ChangeLog:

PR libstdc++/108760
* include/bits/ranges_algo.h (ranges::out_value_result):
Move to <bits/ranges_algobase.h>.
(ranges::iota_result, ranges::__iota_fn, ranges::iota): Move to
<numeric>.
* include/bits/ranges_algobase.h (ranges::out_value_result):
Move to here.
* include/std/numeric (ranges::iota_result, ranges::__iota_fn)
(ranges::iota): Move to here.
* testsuite/25_algorithms/iota/1.cc: Renamed to ...
* testsuite/26_numerics/iota/2.cc: ... here.

Signed-off-by: Michael Levine 
---
 libstdc++-v3/include/bits/ranges_algo.h   | 52 ---
 libstdc++-v3/include/bits/ranges_algobase.h   | 24 +
 libstdc++-v3/include/std/numeric  | 38 ++
 .../iota/1.cc => 26_numerics/iota/2.cc}   |  2 +-
 4 files changed, 63 insertions(+), 53 deletions(-)
 rename libstdc++-v3/testsuite/{25_algorithms/iota/1.cc => 
26_numerics/iota/2.cc} (96%)

diff --git a/libstdc++-v3/include/bits/ranges_algo.h 
b/libstdc++-v3/include/bits/ranges_algo.h
index 62faff173bd..d258be0b93f 100644
--- a/libstdc++-v3/include/bits/ranges_algo.h
+++ b/libstdc++-v3/include/bits/ranges_algo.h
@@ -3521,58 +3521,6 @@ namespace ranges
 
 #endif // __glibcxx_ranges_contains
 
-#if __glibcxx_ranges_iota >= 202202L // C++ >= 23
-
-  template<typename _Out, typename _Tp>
-struct out_value_result
-{
-  [[no_unique_address]] _Out out;
-  [[no_unique_address]] _Tp value;
-
-  template<typename _Out2, typename _Tp2>
-   requires convertible_to<const _Out&, _Out2>
- && convertible_to<const _Tp&, _Tp2>
-   constexpr
-   operator out_value_result<_Out2, _Tp2>() const &
-   { return {out, value}; }
-
-  template<typename _Out2, typename _Tp2>
-   requires convertible_to<_Out, _Out2>
- && convertible_to<_Tp, _Tp2>
-   constexpr
-   operator out_value_result<_Out2, _Tp2>() &&
-   { return {std::move(out), std::move(value)}; }
-};
-
-  template<typename _Out, typename _Tp>
-using iota_result = out_value_result<_Out, _Tp>;
-
-  struct __iota_fn
-  {
-template<input_or_output_iterator _Out, sentinel_for<_Out> _Sent, weakly_incrementable _Tp>
-  requires indirectly_writable<_Out, const _Tp&>
-  constexpr iota_result<_Out, _Tp>
-  operator()(_Out __first, _Sent __last, _Tp __value) const
-  {
-   while (__first != __last)
- {
-   *__first = static_cast<const _Tp&>(__value);
-   ++__first;
-   ++__value;
- }
-   return {std::move(__first), std::move(__value)};
-  }
-
-template<weakly_incrementable _Tp, output_range<const _Tp&> _Range>
-  constexpr iota_result<borrowed_iterator_t<_Range>, _Tp>
-  operator()(_Range&& __r, _Tp __value) const
-  { return (*this)(ranges::begin(__r), ranges::end(__r), std::move(__value)); }
-  };
-
-  inline constexpr __iota_fn iota{};
-
-#endif // __glibcxx_ranges_iota
-
 #if __glibcxx_ranges_find_last >= 202207L // C++ >= 23
 
   struct __find_last_fn
diff --git a/libstdc++-v3/include/bits/ranges_algobase.h 
b/libstdc++-v3/include/bits/ranges_algobase.h
index e1f00838818..7ce5ac314f2 100644
--- a/libstdc++-v3/include/bits/ranges_algobase.h
+++ b/libstdc++-v3/include/bits/ranges_algobase.h
@@ -35,6 +35,7 @@
 #include 
 #include 
 #include 
+#include  // __memcpy
 #include  // ranges::begin, ranges::range etc.
 #include   // __invoke
 #include  // __is_byte
@@ -71,6 +72,29 @@ namespace ranges
__is_move_iterator> = true;
   } // namespace __detail
 
+#if __glibcxx_ranges_iota >= 202202L // C++ >= 23
+  template<typename _Out, typename _Tp>
+struct out_value_result
+{
+  [[no_unique_address]] _Out out;
+  [[no_unique_address]] _Tp value;
+
+  template<typename _Out2, typename _Tp2>
+   requires convertible_to<const _Out&, _Out2>
+ && convertible_to<const _Tp&, _Tp2>
+   constexpr
+   operator out_value_result<_Out2, _Tp2>() const &
+   { return {out, value}; }
+
+  template<typename _Out2, typename _Tp2>
+   requires convertible_to<_Out, _Out2>
+ && convertible_to<_Tp, _Tp2>
+   constexpr
+   operator out_value_result<_Out2, _Tp2>() &&
+   { return {std::move(out), std::move(value)}; }
+};
+#endif // __glibcxx_ranges_iota
+
   struct __equal_fn
   {
 template _Sent1,
diff --git a/libstdc++-v3/include/std/numeric b/libstdc++-v3/include/std/numeric
index c912db4a519..201bb8e74a1 100644
--- a/libstdc++-v3/include/std/numeric
+++ b/libstdc++-v3/include/std/numeric
@@ -89,6 +89,10 @@
 #define __glibcxx_want_sa

[committed] libstdc++: Define __cpp_lib_ranges in <algorithm>

2024-06-08 Thread Jonathan Wakely

Tested x86_64-linux. Pushed to trunk.

-- >8 --

The __cpp_lib_ranges macro is missing from <algorithm>.

libstdc++-v3/ChangeLog:

* include/std/algorithm: Define __glibcxx_want_ranges.
* testsuite/25_algorithms/headers/algorithm/synopsis.cc: Check
feature test macro in C++20 mode.
---
 libstdc++-v3/include/std/algorithm| 1 +
 .../testsuite/25_algorithms/headers/algorithm/synopsis.cc | 8 
 2 files changed, 9 insertions(+)

diff --git a/libstdc++-v3/include/std/algorithm 
b/libstdc++-v3/include/std/algorithm
index a4602a8807e..163e6b5dca7 100644
--- a/libstdc++-v3/include/std/algorithm
+++ b/libstdc++-v3/include/std/algorithm
@@ -67,6 +67,7 @@
 #define __glibcxx_want_constexpr_algorithms
 #define __glibcxx_want_freestanding_algorithm
 #define __glibcxx_want_parallel_algorithm
+#define __glibcxx_want_ranges
 #define __glibcxx_want_ranges_contains
 #define __glibcxx_want_ranges_find_last
 #define __glibcxx_want_ranges_fold
diff --git a/libstdc++-v3/testsuite/25_algorithms/headers/algorithm/synopsis.cc 
b/libstdc++-v3/testsuite/25_algorithms/headers/algorithm/synopsis.cc
index 8c61a614a47..08a47aa95c3 100644
--- a/libstdc++-v3/testsuite/25_algorithms/headers/algorithm/synopsis.cc
+++ b/libstdc++-v3/testsuite/25_algorithms/headers/algorithm/synopsis.cc
@@ -19,6 +19,14 @@
 
 #include <algorithm>
 
+#if __cplusplus >= 202002L
+#ifndef __cpp_lib_ranges
+# error "Feature test macro for ranges is missing in <algorithm>"
+#elif __cpp_lib_ranges < 201911L
+# error "Feature test macro for ranges has wrong value in <algorithm>"
+#endif
+#endif
+
 namespace std
  {
   // 25.1, non-modifying sequence operations:
-- 
2.45.1
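
From the user side, the effect of the change can be observed with a check
like the following sketch. ranges_macro_after_algorithm is a hypothetical
helper; it reports what __cpp_lib_ranges looks like when only <algorithm>
has been included:

```cpp
#include <algorithm>

// Returns the value of __cpp_lib_ranges as seen after including only
// <algorithm>, or 0 when the macro is not defined (for example before
// C++20, or with a library that does not define it in this header).
constexpr long
ranges_macro_after_algorithm ()
{
#ifdef __cpp_lib_ranges
  return __cpp_lib_ranges;
#else
  return 0L;
#endif
}
```

A result of 0 in C++20 mode would indicate the situation the commit message
describes; any non-zero result should be at least 201911L.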

Re: [RFC/RFA] [PATCH 03/12] RISC-V: Add CRC expander to generate faster CRC.

2024-06-08 Thread Jeff Law





On 6/8/24 1:53 AM, Richard Sandiford wrote:



> I realise there are many ways of writing this out there though,
> so that's just a suggestion.  (And only lightly tested.)
>
> FWIW, we could easily extend the interface to work on wide_ints if we
> ever need it for N>63.

I think there's constraints elsewhere that keep us in the N<=63 range. 
If we extended things elsewhere to include TI then we could fully 
support 64bit CRCs.


I don't *think* it's that hard, but we haven't actually tried.

Jeff

[PATCH v3 6/6] aarch64: Add DLL import/export to AArch64 target

2024-06-08 Thread Evgeny Karpov

This patch reuses the MinGW implementation to enable DLL import/export
functionality for the aarch64-w64-mingw32 target. It also modifies
environment configurations for MinGW.

gcc/ChangeLog:

* config.gcc: Add winnt-dll.o, which contains the DLL
import/export implementation.
* config/aarch64/aarch64.cc (aarch64_legitimize_pe_coff_symbol):
Add a conditional function that reuses the MinGW implementation
for COFF and does nothing otherwise.
(aarch64_load_symref_appropriately): Add dllimport
implementation.
(aarch64_expand_call): Likewise.
(aarch64_legitimize_address): Likewise.
* config/aarch64/cygming.h (SYMBOL_FLAG_DLLIMPORT): Modify MinGW
environment to support DLL import/export.
(SYMBOL_FLAG_DLLEXPORT): Likewise.
(SYMBOL_REF_DLLIMPORT_P): Likewise.
(SYMBOL_FLAG_STUBVAR): Likewise.
(SYMBOL_REF_STUBVAR_P): Likewise.
(TARGET_VALID_DLLIMPORT_ATTRIBUTE_P): Likewise.
(TARGET_ASM_FILE_END): Likewise.
(SUB_TARGET_RECORD_STUB): Likewise.
(GOT_ALIAS_SET): Likewise.
(PE_COFF_EXTERN_DECL_SHOULD_BE_LEGITIMIZED): Likewise.
(HAVE_64BIT_POINTERS): Likewise.
---
 gcc/config.gcc|  4 +++-
 gcc/config/aarch64/aarch64.cc | 37 +++
 gcc/config/aarch64/cygming.h  | 26 ++--
 3 files changed, 64 insertions(+), 3 deletions(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index d053b98efa8..331285b7b6d 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -1276,10 +1276,12 @@ aarch64-*-mingw*)
tm_file="${tm_file} mingw/mingw32.h"
tm_file="${tm_file} mingw/mingw-stdint.h"
tm_file="${tm_file} mingw/winnt.h"
+   tm_file="${tm_file} mingw/winnt-dll.h"
tmake_file="${tmake_file} aarch64/t-aarch64"
target_gtfiles="$target_gtfiles \$(srcdir)/config/mingw/winnt.cc"
+   target_gtfiles="$target_gtfiles \$(srcdir)/config/mingw/winnt-dll.cc"
extra_options="${extra_options} mingw/cygming.opt mingw/mingw.opt"
-   extra_objs="${extra_objs} winnt.o"
+   extra_objs="${extra_objs} winnt.o winnt-dll.o"
c_target_objs="${c_target_objs} msformat-c.o"
d_target_objs="${d_target_objs} winnt-d.o"
tmake_file="${tmake_file} mingw/t-cygming"
diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 3418e57218f..5706b9aeb6b 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -860,6 +860,10 @@ static const attribute_spec aarch64_gnu_attributes[] =
   { "Advanced SIMD type", 1, 1, false, true,  false, true,  NULL, NULL },
   { "SVE type",  3, 3, false, true,  false, true,  NULL, NULL 
},
   { "SVE sizeless type",  0, 0, false, true,  false, true,  NULL, NULL },
+#if TARGET_DLLIMPORT_DECL_ATTRIBUTES
+  { "dllimport", 0, 0, false, false, false, false, handle_dll_attribute, NULL 
},
+  { "dllexport", 0, 0, false, false, false, false, handle_dll_attribute, NULL 
},
+#endif
 #ifdef SUBTARGET_ATTRIBUTE_TABLE
   SUBTARGET_ATTRIBUTE_TABLE
 #endif
@@ -2819,6 +2823,15 @@ tls_symbolic_operand_type (rtx addr)
   return tls_kind;
 }
 
+rtx aarch64_legitimize_pe_coff_symbol (rtx addr, bool inreg)
+{
+#if TARGET_PECOFF
+  return legitimize_pe_coff_symbol (addr, inreg);
+#else
+  return NULL_RTX;
+#endif
+}
+
 /* We'll allow lo_sum's in addresses in our legitimate addresses
so that combine would take care of combining addresses where
necessary, but for generation purposes, we'll generate the address
@@ -2865,6 +2878,17 @@ static void
 aarch64_load_symref_appropriately (rtx dest, rtx imm,
   enum aarch64_symbol_type type)
 {
+  /* If legitimize returns a value
+ copy it directly to the destination and return.  */
+
+  rtx tmp = aarch64_legitimize_pe_coff_symbol (imm, true);
+
+  if (tmp)
+{
+   emit_insn (gen_rtx_SET (dest, tmp));
+   return;
+}
+
   switch (type)
 {
 case SYMBOL_SMALL_ABSOLUTE:
@@ -11233,6 +11257,12 @@ aarch64_expand_call (rtx result, rtx mem, rtx cookie, 
bool sibcall)
 
   gcc_assert (MEM_P (mem));
   callee = XEXP (mem, 0);
+
+  tmp = aarch64_legitimize_pe_coff_symbol (callee, false);
+
+  if (tmp)
+callee = tmp;
+
   mode = GET_MODE (callee);
   gcc_assert (mode == Pmode);
 
@@ -12709,6 +12739,13 @@ aarch64_anchor_offset (HOST_WIDE_INT offset, 
HOST_WIDE_INT size,
 static rtx
 aarch64_legitimize_address (rtx x, rtx /* orig_x  */, machine_mode mode)
 {
+  if (TARGET_DLLIMPORT_DECL_ATTRIBUTES)
+{
+  rtx tmp = aarch64_legitimize_pe_coff_symbol (x, true);
+  if (tmp)
+   return tmp;
+}
+
   /* Try to split X+CONST into Y=X+(CONST & ~mask), Y+(CONST&mask),
  where mask is selected by alignment and size of the offset.
  We try to pick as large a range for the offset as possible to
diff --git a/gcc/config/aarch64/cygming.h b/gcc/config/aarch64/cygming.h
index 76623153080..e26488735db
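
The call pattern used throughout this patch (ask the PE/COFF hook first, and
continue with the generic path when it returns nothing) can be sketched
outside of GCC as follows. All names here are hypothetical stand-ins: a
std::string plays the role of an rtx, an empty string plays the role of
NULL_RTX, and the "__imp_" prefix is only illustrative.

```cpp
#include <string>

// Stand-in for GCC's rtx; an empty string plays the role of NULL_RTX.
using sym_t = std::string;

#ifndef SKETCH_TARGET_PECOFF
#define SKETCH_TARGET_PECOFF 0
#endif

// Hypothetical counterpart of the wrapper: forwards on PE/COFF, otherwise
// reports "nothing to do" so the caller continues with its usual path.
inline sym_t
legitimize_pe_coff_symbol_sketch (const sym_t &addr)
{
#if SKETCH_TARGET_PECOFF
  return "__imp_" + addr;
#else
  (void) addr;
  return sym_t ();
#endif
}

// Caller side: a non-empty result from the hook wins, anything else falls
// through to the generic handling (modelled here by returning IMM as is).
inline sym_t
load_symref_sketch (const sym_t &imm)
{
  sym_t tmp = legitimize_pe_coff_symbol_sketch (imm);
  if (!tmp.empty ())
    return tmp;
  return imm;
}
```

Keeping the conditional compilation inside one wrapper means the three call
sites stay free of #if blocks.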

[PATCH v3 5/6] Adjust DLL import/export implementation for AArch64

2024-06-08 Thread Evgeny Karpov

The DLL import/export mingw implementation, originally from ix86, requires
minor adjustments to be compatible with AArch64.

gcc/ChangeLog:

* config/i386/cygming.h (PE_COFF_EXTERN_DECL_SHOULD_BE_LEGITIMIZED):
Declare whether an external declaration should be legitimized.
(HAVE_64BIT_POINTERS): Define whether the target supports 64-bit
pointers.
* config/mingw/mingw32.h (defined): Use the correct DllMainCRTStartup
entry function.
* config/mingw/winnt-dll.cc (defined): Exclude ix86-related code.
---
 gcc/config/i386/cygming.h | 5 +
 gcc/config/mingw/mingw32.h| 2 +-
 gcc/config/mingw/winnt-dll.cc | 2 +-
 3 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/gcc/config/i386/cygming.h b/gcc/config/i386/cygming.h
index 4bb8d7f920c..0493b3be875 100644
--- a/gcc/config/i386/cygming.h
+++ b/gcc/config/i386/cygming.h
@@ -472,3 +472,8 @@ do {\
 
 #undef GOT_ALIAS_SET
 #define GOT_ALIAS_SET mingw_GOT_alias_set ()
+
+#define PE_COFF_EXTERN_DECL_SHOULD_BE_LEGITIMIZED \
+  (ix86_cmodel == CM_LARGE_PIC || ix86_cmodel == CM_MEDIUM_PIC)
+
+#define HAVE_64BIT_POINTERS TARGET_64BIT_DEFAULT
diff --git a/gcc/config/mingw/mingw32.h b/gcc/config/mingw/mingw32.h
index fa6e307476c..0c9d5424942 100644
--- a/gcc/config/mingw/mingw32.h
+++ b/gcc/config/mingw/mingw32.h
@@ -82,7 +82,7 @@ along with GCC; see the file COPYING3.  If not see
 #endif
 
 #undef SUB_LINK_ENTRY
-#if TARGET_64BIT_DEFAULT
+#if HAVE_64BIT_POINTERS
 #define SUB_LINK_ENTRY SUB_LINK_ENTRY64
 #else
 #define SUB_LINK_ENTRY SUB_LINK_ENTRY32
diff --git a/gcc/config/mingw/winnt-dll.cc b/gcc/config/mingw/winnt-dll.cc
index 1354402a959..66c445cba77 100644
--- a/gcc/config/mingw/winnt-dll.cc
+++ b/gcc/config/mingw/winnt-dll.cc
@@ -206,7 +206,7 @@ legitimize_pe_coff_symbol (rtx addr, bool inreg)
}
 }
 
-  if (ix86_cmodel != CM_LARGE_PIC && ix86_cmodel != CM_MEDIUM_PIC)
+  if (!PE_COFF_EXTERN_DECL_SHOULD_BE_LEGITIMIZED)
 return NULL_RTX;
 
   if (GET_CODE (addr) == SYMBOL_REF
-- 
2.25.1
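
One detail worth checking in a change like this: the use site in
winnt-dll.cc negates the macro, so the macro body has to expand as a single
parenthesized expression for the negation to cover the whole condition. The
sketch below uses hypothetical names and enum values to show how the two
readings differ; the second function spells out, step by step, what
"!m == A || m == B" evaluates to once unary "!" has bound to m alone.

```cpp
// Hypothetical values; the real enum in GCC has more members.
enum code_model { CM_SMALL = 0, CM_MEDIUM_PIC = 1, CM_LARGE_PIC = 2 };

// Whole condition parenthesized, so "!" at the use site negates all of it.
#define SHOULD_LEGITIMIZE_SKETCH(m) \
  ((m) == CM_LARGE_PIC || (m) == CM_MEDIUM_PIC)

// Intended early-exit test: true when legitimization should be skipped.
inline bool
skip_parenthesized (code_model m)
{
  return !SHOULD_LEGITIMIZE_SKETCH (m);
}

// Meaning of "!m == CM_LARGE_PIC || m == CM_MEDIUM_PIC" when the macro body
// is not parenthesized: "!" applies to the first operand only.
inline bool
skip_unparenthesized (code_model m)
{
  int negated_first = (m == CM_SMALL) ? 1 : 0; // value of !m
  return negated_first == CM_LARGE_PIC || m == CM_MEDIUM_PIC;
}
```

With these values the two functions disagree for CM_SMALL and for
CM_MEDIUM_PIC, so the early return would trigger in the wrong cases.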

[PATCH v3 4/6] aarch64: Add selectany attribute handling

2024-06-08 Thread Evgeny Karpov

This patch extends the aarch64 attributes list with the selectany
attribute for the aarch64-w64-mingw32 target and reuses the mingw
implementation to handle it.

* config/aarch64/aarch64.cc:
Extend the aarch64 attributes list.
* config/aarch64/cygming.h (SUBTARGET_ATTRIBUTE_TABLE):
Define the selectany attribute.
---
 gcc/config/aarch64/aarch64.cc | 5 -
 gcc/config/aarch64/cygming.h  | 3 +++
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
index 13191ec8e34..3418e57218f 100644
--- a/gcc/config/aarch64/aarch64.cc
+++ b/gcc/config/aarch64/aarch64.cc
@@ -859,7 +859,10 @@ static const attribute_spec aarch64_gnu_attributes[] =
  NULL },
   { "Advanced SIMD type", 1, 1, false, true,  false, true,  NULL, NULL },
   { "SVE type",  3, 3, false, true,  false, true,  NULL, NULL 
},
-  { "SVE sizeless type",  0, 0, false, true,  false, true,  NULL, NULL }
+  { "SVE sizeless type",  0, 0, false, true,  false, true,  NULL, NULL },
+#ifdef SUBTARGET_ATTRIBUTE_TABLE
+  SUBTARGET_ATTRIBUTE_TABLE
+#endif
 };
 
 static const scoped_attribute_specs aarch64_gnu_attribute_table =
diff --git a/gcc/config/aarch64/cygming.h b/gcc/config/aarch64/cygming.h
index 0d048879311..76623153080 100644
--- a/gcc/config/aarch64/cygming.h
+++ b/gcc/config/aarch64/cygming.h
@@ -154,6 +154,9 @@ still needed for compilation.  */
 flag_stack_check = STATIC_BUILTIN_STACK_CHECK; \
   } while (0)
 
+#define SUBTARGET_ATTRIBUTE_TABLE \
+  { "selectany", 0, 0, true, false, false, false, \
+mingw_handle_selectany_attribute, NULL }
 
 #define SUPPORTS_ONE_ONLY 1
 
-- 
2.25.1
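
The mechanism this patch relies on, an optional macro that expands to extra
rows inside an array initializer, can be sketched in isolation. The struct
and macro names below are hypothetical and the entry layout is much simpler
than GCC's attribute_spec:

```cpp
#include <cstddef>
#include <cstring>

// Simplified, hypothetical stand-in for an attribute table entry.
struct attr_entry_sketch
{
  const char *name;
  int min_len;
  int max_len;
};

// A subtarget header may define this to contribute extra rows.
#define SUBTARGET_ATTRS_SKETCH { "selectany", 0, 0 }

static const attr_entry_sketch attrs_sketch[] =
{
  { "SVE type", 3, 3 },
  { "SVE sizeless type", 0, 0 },
#ifdef SUBTARGET_ATTRS_SKETCH
  SUBTARGET_ATTRS_SKETCH
#endif
};

constexpr std::size_t attrs_sketch_count
  = sizeof (attrs_sketch) / sizeof (attrs_sketch[0]);

// Linear lookup by name, enough to show that the spliced row is present.
inline bool
has_attr_sketch (const char *name)
{
  for (std::size_t i = 0; i < attrs_sketch_count; ++i)
    if (std::strcmp (attrs_sketch[i].name, name) == 0)
      return true;
  return false;
}
```

The comma after the last fixed row is what lets the macro rows follow it,
which matches the comma the patch adds after the "SVE sizeless type" entry.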

[PATCH v3 3/6] Rename functions for reuse in AArch64

2024-06-08 Thread Evgeny Karpov

This patch renames functions related to dllimport/dllexport
and selectany functionality. These functions will be reused
in the aarch64-w64-mingw32 target.

gcc/ChangeLog:

* config/i386/cygming.h (mingw_pe_record_stub):
Rename functions in mingw folder which will be reused for
aarch64.
(TARGET_ASM_FILE_END): Update to new target-independent name.
(SUBTARGET_ATTRIBUTE_TABLE): Likewise.
(TARGET_VALID_DLLIMPORT_ATTRIBUTE_P): Likewise.
(SUB_TARGET_RECORD_STUB): Likewise.
* config/i386/i386-protos.h (ix86_handle_selectany_attribute): Likewise.
(mingw_handle_selectany_attribute): Likewise.
(i386_pe_valid_dllimport_attribute_p): Likewise.
(mingw_pe_valid_dllimport_attribute_p): Likewise.
(i386_pe_file_end): Likewise.
(mingw_pe_file_end): Likewise.
(i386_pe_record_stub): Likewise.
(mingw_pe_record_stub): Likewise.
* config/mingw/winnt.cc (ix86_handle_selectany_attribute): Likewise.
(mingw_handle_selectany_attribute): Likewise.
(i386_pe_valid_dllimport_attribute_p): Likewise.
(mingw_pe_valid_dllimport_attribute_p): Likewise.
(i386_pe_record_stub): Likewise.
(mingw_pe_record_stub): Likewise.
(i386_pe_file_end): Likewise.
(mingw_pe_file_end): Likewise.
* config/mingw/winnt.h (mingw_handle_selectany_attribute):
Declare functionality that will be reused by multiple targets.
(mingw_pe_file_end): Likewise.
(mingw_pe_record_stub): Likewise.
(mingw_pe_valid_dllimport_attribute_p): Likewise.
---
 gcc/config/i386/cygming.h | 6 +++---
 gcc/config/i386/i386-protos.h | 3 ---
 gcc/config/mingw/winnt.cc | 8 
 gcc/config/mingw/winnt.h  | 6 +-
 4 files changed, 12 insertions(+), 11 deletions(-)

diff --git a/gcc/config/i386/cygming.h b/gcc/config/i386/cygming.h
index 56945f00c11..4bb8d7f920c 100644
--- a/gcc/config/i386/cygming.h
+++ b/gcc/config/i386/cygming.h
@@ -344,7 +344,7 @@ do {\
 
 /* Output function declarations at the end of the file.  */
 #undef TARGET_ASM_FILE_END
-#define TARGET_ASM_FILE_END i386_pe_file_end
+#define TARGET_ASM_FILE_END mingw_pe_file_end
 
 /* Kludge because of missing PE-COFF support for early LTO debug.  */
 #undef  TARGET_ASM_LTO_START
@@ -445,7 +445,7 @@ do {\
 
 #define SUBTARGET_ATTRIBUTE_TABLE \
   { "selectany", 0, 0, true, false, false, false, \
-ix86_handle_selectany_attribute, NULL }
+mingw_handle_selectany_attribute, NULL }
   /* { name, min_len, max_len, decl_req, type_req, fn_type_req,
affects_type_identity, handler, exclude } */
 
@@ -453,7 +453,7 @@ do {\
 #undef NO_PROFILE_COUNTERS
 #define NO_PROFILE_COUNTERS 1
 
-#define TARGET_VALID_DLLIMPORT_ATTRIBUTE_P i386_pe_valid_dllimport_attribute_p
+#define TARGET_VALID_DLLIMPORT_ATTRIBUTE_P mingw_pe_valid_dllimport_attribute_p
 #define TARGET_CXX_ADJUST_CLASS_AT_DEFINITION 
i386_pe_adjust_class_at_definition
 #define SUBTARGET_MANGLE_DECL_ASSEMBLER_NAME i386_pe_mangle_decl_assembler_name
 
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index a9171c3d2d8..4f48dc0bf75 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -269,7 +269,6 @@ extern unsigned int ix86_local_alignment (tree, 
machine_mode,
 extern unsigned int ix86_minimum_alignment (tree, machine_mode,
unsigned int);
 extern tree ix86_handle_shared_attribute (tree *, tree, tree, int, bool *);
-extern tree ix86_handle_selectany_attribute (tree *, tree, tree, int, bool *);
 extern int x86_field_alignment (tree, int);
 extern tree ix86_valid_target_attribute_tree (tree, tree,
  struct gcc_options *,
@@ -309,12 +308,10 @@ extern void ix86_register_pragmas (void);
 extern void i386_pe_record_external_function (tree, const char *);
 extern bool i386_pe_binds_local_p (const_tree);
 extern const char *i386_pe_strip_name_encoding_full (const char *);
-extern bool i386_pe_valid_dllimport_attribute_p (const_tree);
 extern void i386_pe_asm_output_aligned_decl_common (FILE *, tree,
const char *,
HOST_WIDE_INT,
HOST_WIDE_INT);
-extern void i386_pe_file_end (void);
 extern void i386_pe_asm_lto_start (void);
 extern void i386_pe_asm_lto_end (void);
 extern void i386_pe_start_function (FILE *, const char *, tree);
diff --git a/gcc/config/mingw/winnt.cc b/gcc/config/mingw/winnt.cc
index 9901576ade0..803e5f5ec85 100644
--- a/gcc/config/mingw/winnt.cc
+++ b/gcc/config/mingw/winnt.cc
@@ -71,8 +71,8 @@ ix86_handle_shared_attribute (tree *node, tree name, tree, 
int,
 /* Handle a "selectany" attribute;
arg

[PATCH v3 2/6] Extract ix86 dllimport implementation to mingw

2024-06-08 Thread Evgeny Karpov


From 8f6fd4775792b443a72dfbc8d95bf3ff5b516d18 Mon Sep 17 00:00:00 2001
From: Evgeny Karpov 
Date: Thu, 6 Jun 2024 22:38:35 +0200
Subject: [PATCH v3 2/6] Extract ix86 dllimport implementation to mingw

This patch extracts the ix86 implementation for expanding a SYMBOL
into its corresponding dllimport, far-address, or refptr symbol.
It will be reused in the aarch64-w64-mingw32 target.
The implementation is copied as is from i386/i386.cc with
minor changes to follow the code style.

Also this patch replaces the original DLL import/export
implementation in ix86 with mingw.

gcc/ChangeLog:

* config.gcc: Add winnt-dll.o, which contains the DLL
import/export implementation.
* config/i386/cygming.h (SUB_TARGET_RECORD_STUB): Remove the
old implementation. Rename the required function to MinGW.
Use MinGW implementation for COFF and nothing otherwise.
(GOT_ALIAS_SET): Likewise.
* config/i386/i386-expand.cc (ix86_expand_move): Likewise.
* config/i386/i386-expand.h (ix86_GOT_alias_set): Likewise.
(legitimize_pe_coff_symbol): Likewise.
* config/i386/i386-protos.h (i386_pe_record_stub): Likewise.
* config/i386/i386.cc (is_imported_p): Likewise.
(legitimate_pic_address_disp_p): Likewise.
(ix86_GOT_alias_set): Likewise.
(legitimize_pic_address): Likewise.
(legitimize_tls_address): Likewise.
(struct dllimport_hasher): Likewise.
(GTY): Likewise.
(get_dllimport_decl): Likewise.
(legitimize_pe_coff_extern_decl): Likewise.
(legitimize_dllimport_symbol): Likewise.
(legitimize_pe_coff_symbol): Likewise.
(ix86_legitimize_address): Likewise.
* config/i386/i386.h (GOT_ALIAS_SET): Likewise.
* config/mingw/winnt.cc (i386_pe_record_stub): Likewise.
(mingw_pe_record_stub): Likewise.
* config/mingw/winnt.h (mingw_pe_record_stub): Likewise.
* config/mingw/t-cygming: Add the winnt-dll.o compilation.
* config/mingw/winnt-dll.cc: New file.
* config/mingw/winnt-dll.h: New file.
---
 gcc/config.gcc |  12 +-
 gcc/config/i386/cygming.h  |   5 +-
 gcc/config/i386/i386-expand.cc |   4 +-
 gcc/config/i386/i386-expand.h  |   2 -
 gcc/config/i386/i386-protos.h  |   1 -
 gcc/config/i386/i386.cc| 205 ++---
 gcc/config/i386/i386.h |   2 +
 gcc/config/mingw/t-cygming |   6 +
 gcc/config/mingw/winnt-dll.cc  | 231 +
 gcc/config/mingw/winnt-dll.h   |  30 +
 gcc/config/mingw/winnt.cc  |   2 +-
 gcc/config/mingw/winnt.h   |   1 +
 12 files changed, 298 insertions(+), 203 deletions(-)
 create mode 100644 gcc/config/mingw/winnt-dll.cc
 create mode 100644 gcc/config/mingw/winnt-dll.h

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 553a310f4bd..d053b98efa8 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -2177,11 +2177,13 @@ i[4567]86-wrs-vxworks*|x86_64-wrs-vxworks7*)
 i[34567]86-*-cygwin*)
tm_file="${tm_file} i386/unix.h i386/bsd.h i386/gas.h i386/cygming.h 
i386/cygwin.h i386/cygwin-stdint.h"
tm_file="${tm_file} mingw/winnt.h"
+   tm_file="${tm_file} mingw/winnt-dll.h"
xm_file=i386/xm-cygwin.h
tmake_file="${tmake_file} mingw/t-cygming t-slibgcc"
target_gtfiles="$target_gtfiles \$(srcdir)/config/mingw/winnt.cc"
+   target_gtfiles="$target_gtfiles \$(srcdir)/config/mingw/winnt-dll.cc"
extra_options="${extra_options} mingw/cygming.opt i386/cygwin.opt"
-   extra_objs="${extra_objs} winnt.o winnt-stubs.o"
+   extra_objs="${extra_objs} winnt.o winnt-stubs.o winnt-dll.o"
c_target_objs="${c_target_objs} msformat-c.o"
cxx_target_objs="${cxx_target_objs} winnt-cxx.o msformat-c.o"
d_target_objs="${d_target_objs} cygwin-d.o"
@@ -2196,11 +2198,13 @@ x86_64-*-cygwin*)
need_64bit_isa=yes
tm_file="${tm_file} i386/unix.h i386/bsd.h i386/gas.h i386/cygming.h 
i386/cygwin.h i386/cygwin-w64.h i386/cygwin-stdint.h"
tm_file="${tm_file} mingw/winnt.h"
+   tm_file="${tm_file} mingw/winnt-dll.h"
xm_file=i386/xm-cygwin.h
tmake_file="${tmake_file} mingw/t-cygming t-slibgcc"
target_gtfiles="$target_gtfiles \$(srcdir)/config/mingw/winnt.cc"
+   target_gtfiles="$target_gtfiles \$(srcdir)/config/mingw/winnt-dll.cc"
extra_options="${extra_options} mingw/cygming.opt i386/cygwin.opt"
-   extra_objs="${extra_objs} winnt.o winnt-stubs.o"
+   extra_objs="${extra_objs} winnt.o winnt-stubs.o winnt-dll.o"
c_target_objs="${c_target_objs} msformat-c.o"
cxx_target_objs="${cxx_target_objs} winnt-cxx.o msformat-c.o"
d_target_objs="${d_target_objs} cygwin-d.o"
@@ -2266,6 +2270,7 @@ i[34567]86-*-mingw* | x86_64-*-mingw*)
esac
tm_file="${tm_file} mingw/mingw-stdint.h"
tm_file="${tm_file} mingw/winnt.h"
+   tm_file="${tm_file} mingw/winnt-dll.h"
t

[PATCH v3 2/6] Extract ix86 dllimport implementation to mingw

2024-06-08 Thread Evgeny Karpov




v3-0002-Extract-ix86-dllimport-implementation-to-mingw.patch
Description: v3-0002-Extract-ix86-dllimport-implementation-to-mingw.patch

[PATCH v3 1/6] Move mingw_* declarations to the mingw folder

2024-06-08 Thread Evgeny Karpov

This patch refactors recent changes to move mingw-related
functionality to the mingw folder. More renamings to the mingw_
prefix will be done in follow-up commits.

This is the first commit in the second patch series to add DLL
import/export implementation to AArch64.

Coauthors: Zac Walker,
Mark Harmstone and
Ron Riddle

Refactored, prepared, and validated by
Radek Barton and
Evgeny Karpov

gcc/ChangeLog:

* config.gcc: Move mingw_* declarations to mingw.
* config/aarch64/aarch64-protos.h
(mingw_pe_maybe_record_exported_symbol): Likewise.
(mingw_pe_section_type_flags): Likewise.
(mingw_pe_unique_section): Likewise.
(mingw_pe_encode_section_info): Likewise.
* config/aarch64/cygming.h
(mingw_pe_asm_named_section): Likewise.
(mingw_pe_declare_function_type): Likewise.
* config/i386/i386-protos.h
(mingw_pe_unique_section): Likewise.
(mingw_pe_declare_function_type): Likewise.
(mingw_pe_maybe_record_exported_symbol): Likewise.
(mingw_pe_encode_section_info): Likewise.
(mingw_pe_section_type_flags): Likewise.
(mingw_pe_asm_named_section): Likewise.
* config/mingw/winnt.h: New file.
---
 gcc/config.gcc  |  4 
 gcc/config/aarch64/aarch64-protos.h |  5 -
 gcc/config/aarch64/cygming.h|  4 
 gcc/config/i386/i386-protos.h   |  6 --
 gcc/config/mingw/winnt.h| 33 +
 5 files changed, 37 insertions(+), 15 deletions(-)
 create mode 100644 gcc/config/mingw/winnt.h

diff --git a/gcc/config.gcc b/gcc/config.gcc
index e500ba63e32..553a310f4bd 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -1275,6 +1275,7 @@ aarch64-*-mingw*)
tm_file="${tm_file} aarch64/cygming.h"
tm_file="${tm_file} mingw/mingw32.h"
tm_file="${tm_file} mingw/mingw-stdint.h"
+   tm_file="${tm_file} mingw/winnt.h"
tmake_file="${tmake_file} aarch64/t-aarch64"
target_gtfiles="$target_gtfiles \$(srcdir)/config/mingw/winnt.cc"
extra_options="${extra_options} mingw/cygming.opt mingw/mingw.opt"
@@ -2175,6 +2176,7 @@ i[4567]86-wrs-vxworks*|x86_64-wrs-vxworks7*)
;;
 i[34567]86-*-cygwin*)
tm_file="${tm_file} i386/unix.h i386/bsd.h i386/gas.h i386/cygming.h 
i386/cygwin.h i386/cygwin-stdint.h"
+   tm_file="${tm_file} mingw/winnt.h"
xm_file=i386/xm-cygwin.h
tmake_file="${tmake_file} mingw/t-cygming t-slibgcc"
target_gtfiles="$target_gtfiles \$(srcdir)/config/mingw/winnt.cc"
@@ -2193,6 +2195,7 @@ i[34567]86-*-cygwin*)
 x86_64-*-cygwin*)
need_64bit_isa=yes
tm_file="${tm_file} i386/unix.h i386/bsd.h i386/gas.h i386/cygming.h 
i386/cygwin.h i386/cygwin-w64.h i386/cygwin-stdint.h"
+   tm_file="${tm_file} mingw/winnt.h"
xm_file=i386/xm-cygwin.h
tmake_file="${tmake_file} mingw/t-cygming t-slibgcc"
target_gtfiles="$target_gtfiles \$(srcdir)/config/mingw/winnt.cc"
@@ -2262,6 +2265,7 @@ i[34567]86-*-mingw* | x86_64-*-mingw*)
;;
esac
tm_file="${tm_file} mingw/mingw-stdint.h"
+   tm_file="${tm_file} mingw/winnt.h"
tmake_file="${tmake_file} t-winnt mingw/t-cygming t-slibgcc"
 case ${target} in
x86_64-w64-*)
diff --git a/gcc/config/aarch64/aarch64-protos.h 
b/gcc/config/aarch64/aarch64-protos.h
index 1d3f94c813e..42639e9efcf 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -1110,11 +1110,6 @@ extern void aarch64_output_patchable_area (unsigned int, 
bool);
 
 extern void aarch64_adjust_reg_alloc_order ();
 
-extern void mingw_pe_maybe_record_exported_symbol (tree, const char *, int);
-extern unsigned int mingw_pe_section_type_flags (tree, const char *, int);
-extern void mingw_pe_unique_section (tree, int);
-extern void mingw_pe_encode_section_info (tree, rtx, int);
-
 bool aarch64_optimize_mode_switching (aarch64_mode_entity);
 void aarch64_restore_za (rtx);
 
diff --git a/gcc/config/aarch64/cygming.h b/gcc/config/aarch64/cygming.h
index 2e7b01feb76..0d048879311 100644
--- a/gcc/config/aarch64/cygming.h
+++ b/gcc/config/aarch64/cygming.h
@@ -51,10 +51,6 @@ still needed for compilation.  */
 #include 
 #endif
 
-extern void mingw_pe_asm_named_section (const char *, unsigned int, tree);
-extern void mingw_pe_declare_function_type (FILE *file, const char *name,
-   int pub);
-
 #define TARGET_ASM_NAMED_SECTION  mingw_pe_asm_named_section
 
 /* Select attributes for named sections.  */
diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index f37d207ae64..a924cb3b620 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -306,16 +306,10 @@ extern void ix86_target_macros (void);
 extern void ix86_register_pragmas (void);
 
 /* In winnt.cc  */
-extern void mingw_pe_unique_section (tree, int);
-extern void mingw_pe_declare_function_type (FILE *, const ch

[PATCH v3 0/6] Add DLL import/export implementation to AArch64

2024-06-08 Thread Evgeny Karpov

I am resubmitting the patch series without changes to the patchwork due to an 
issue with the mail client in the previous submission.
https://gcc.gnu.org/pipermail/gcc-patches/2024-June/653894.html
https://gcc.gnu.org/pipermail/gcc-patches/2024-June/653952.html

Regards,
Evgeny

Evgeny Karpov (6):
  Move mingw_* declarations to the mingw folder
  Extract ix86 dllimport implementation to mingw
  Rename functions for reuse in AArch64
  aarch64: Add selectany attribute handling
  Adjust DLL import/export implementation for AArch64
  aarch64: Add DLL import/export to AArch64 target

 gcc/config.gcc  |  20 ++-
 gcc/config/aarch64/aarch64-protos.h |   5 -
 gcc/config/aarch64/aarch64.cc   |  42 -
 gcc/config/aarch64/cygming.h|  33 +++-
 gcc/config/i386/cygming.h   |  16 +-
 gcc/config/i386/i386-expand.cc  |   4 +-
 gcc/config/i386/i386-expand.h   |   2 -
 gcc/config/i386/i386-protos.h   |  10 --
 gcc/config/i386/i386.cc | 205 ++--
 gcc/config/i386/i386.h  |   2 +
 gcc/config/mingw/mingw32.h  |   2 +-
 gcc/config/mingw/t-cygming  |   6 +
 gcc/config/mingw/winnt-dll.cc   | 231 
 gcc/config/mingw/winnt-dll.h|  30 
 gcc/config/mingw/winnt.cc   |  10 +-
 gcc/config/mingw/winnt.h|  38 +
 16 files changed, 423 insertions(+), 233 deletions(-)
 create mode 100644 gcc/config/mingw/winnt-dll.cc
 create mode 100644 gcc/config/mingw/winnt-dll.h
 create mode 100644 gcc/config/mingw/winnt.h

-- 
2.25.1

[PATCH v2 0/6] Add DLL import/export implementation to AArch64

2024-06-08 Thread Evgeny Karpov

I am resubmitting the patch series without changes to the patchwork due to an 
issue with the mail client in the previous submission.
https://gcc.gnu.org/pipermail/gcc-patches/2024-June/653894.html
https://gcc.gnu.org/pipermail/gcc-patches/2024-June/653952.html

Regards,
Evgeny

Evgeny Karpov (6):
  Move mingw_* declarations to the mingw folder
  Extract ix86 dllimport implementation to mingw
  Rename functions for reuse in AArch64
  aarch64: Add selectany attribute handling
  Adjust DLL import/export implementation for AArch64
  aarch64: Add DLL import/export to AArch64 target

 gcc/config.gcc  |  20 ++-
 gcc/config/aarch64/aarch64-protos.h |   5 -
 gcc/config/aarch64/aarch64.cc   |  42 -
 gcc/config/aarch64/cygming.h|  33 +++-
 gcc/config/i386/cygming.h   |  16 +-
 gcc/config/i386/i386-expand.cc  |   4 +-
 gcc/config/i386/i386-expand.h   |   2 -
 gcc/config/i386/i386-protos.h   |  10 --
 gcc/config/i386/i386.cc | 205 ++--
 gcc/config/i386/i386.h  |   2 +
 gcc/config/mingw/mingw32.h  |   2 +-
 gcc/config/mingw/t-cygming  |   6 +
 gcc/config/mingw/winnt-dll.cc   | 231 
 gcc/config/mingw/winnt-dll.h|  30 
 gcc/config/mingw/winnt.cc   |  10 +-
 gcc/config/mingw/winnt.h|  38 +
 16 files changed, 423 insertions(+), 233 deletions(-)
 create mode 100644 gcc/config/mingw/winnt-dll.cc
 create mode 100644 gcc/config/mingw/winnt-dll.h
 create mode 100644 gcc/config/mingw/winnt.h

-- 
2.25.1

[PATCH v2 2/6] Extract ix86 dllimport implementation to mingw

2024-06-08 Thread Evgeny Karpov

This patch makes changes to the i386.cc file, which contains the ASCII 0x0C 
character. However, this character was replaced by the mail client, and the 
patchwork could not validate the series. I am resubmitting the patch as an 
attachment.



v2-0002-Extract-ix86-dllimport-implementation-to-mingw.patch
Description: v2-0002-Extract-ix86-dllimport-implementation-to-mingw.patch

Re: [committed] i386: Implement .SAT_ADD for unsigned scalar integers [PR112600]

2024-06-08 Thread Gerald Pfeifer

On Sat, 8 Jun 2024, Uros Bizjak wrote:
> gcc/ChangeLog:
> 
> * config/i386/i386.md (usadd3): New expander.
> (x86_movcc_0_m1_neg): Use SWI mode iterator.

When you write "committed", did you actually push? 

If so, us being on Git now it might be good to adjust terminology.

Gerald

[pushed] wwwdocs: *: Refer to /onlinedocs fully qualified via gcc.gnu.org

2024-06-08 Thread Gerald Pfeifer

Thanks to Tobias for pointing these two out.

Pushed.

Gerald
---
 htdocs/gcc-14/porting_to.html | 2 +-
 htdocs/gcc-5/changes.html | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/htdocs/gcc-14/porting_to.html b/htdocs/gcc-14/porting_to.html
index ef02e071..3de15d02 100644
--- a/htdocs/gcc-14/porting_to.html
+++ b/htdocs/gcc-14/porting_to.html
@@ -428,7 +428,7 @@ exec /usr/bin/gcc -fpermissive "$@"
 
 C code generators that cannot be updated to generate valid standard C
 can emit
-#pragma GCC 
diagnostic warning
+https://gcc.gnu.org/onlinedocs/gcc-14.1.0/gcc/Diagnostic-Pragmas.html";>#pragma
 GCC diagnostic warning
 directives to turn these errors back into warnings:
 
 
diff --git a/htdocs/gcc-5/changes.html b/htdocs/gcc-5/changes.html
index ab3da60b..81ad4be3 100644
--- a/htdocs/gcc-5/changes.html
+++ b/htdocs/gcc-5/changes.html
@@ -438,7 +438,7 @@ version 2 and the current setting.
 
   Runtime Library (libstdc++)
   
-A Dual
+A https://gcc.gnu.org/onlinedocs/libstdc++/manual/using_dual_abi.html";>Dual
 ABI is provided by the library. A new ABI is enabled by default.
 The old ABI is still supported and can be used by defining the macro
 _GLIBCXX_USE_CXX11_ABI to 0 before
-- 
2.45.1

Re: [wwwdocs] gcc-15/changes.html + projects/gomp: update for new OpenMP features

2024-06-08 Thread Gerald Pfeifer

On Fri, 7 Jun 2024, Tobias Burnus wrote:
>> +  https://gcc.gnu.org/projects/gomp/";>OpenMP
>> Can you please make this a relative link, i.e. "../projects/gomp/"?
> Good point. I thought such links should be absolute because of (www.)GNU.org,
> i.e.
> 
> https://www.gnu.org/software/gcc/releases.html

We only need to use absolute links for material only available on 
gcc.gnu.org such as Bugzilla, the Wiki, or /onlinedocs.

Everything directly under wwwdocs/htdocs can be relative.

> GNU.org does not have the documentation, but going to
> https://www.gnu.org/software/gcc/onlinedocs/ or a subpage redirects (302
> temporary redirect) to the GCC website. Likewise for '../git' but for
> '../wiki' it has a HTTP 404 not found; fortunately, ../wiki/ works.
> 
> I think there are plenty of links which could be relative ones but are
> absolute ones.

The original trigger to be careful and provide absolute links was  
https://www.gnu.org/software/gcc/ which was established early in the
days of GCC 2.95 after the egcs/gcc/GCC reconciliation.

This is also useful in case anyone wants to use these pages locally.

> Looks like a janitorial task to fix the absolute links, possibly 
> excluding those with /git, /onlinedocs, /wiki – or assuming that the 
> main page is GCC.gnu.org, relying on the redirects.

It's on my list. A first quick check indicates there isn't much to do, 
though. :-)

> o In any case, those links are probably broken on GNU.org:
> 
> htdocs/gcc-14/porting_to.html: href="/onlinedocs/gcc-14.1.0/gcc/Diagnostic-Pragmas.html">#pragma GCC
> diagnostic warning
> 
> htdocs/gcc-5/changes.html:    A  href="/onlinedocs/libstdc++/manual/using_dual_abi.html">Dual

Yes, those are definitely in need of being fixed. I'll do so in a minute. 
Thanks for pointing out the two.

>> +  loop-transformation constructs are now supported.
>> I'm thinking "loop transformation" in English? Or is this a specific term
>> from the standard?
> Loop transformation happens at the end. But e.g "(#pragma omp) unroll 
> full" is a directive and, e.g.
> 
> #pragma omp unroll partial(2)
> 
> for (int i=0; i < n; i++)
> 
> a[i] = 5;
> 
> is a construct (= directive + structured block (if any) + end directive 
> (if any)).

I believe there was a misunderstanding and I wasn't clear enough: I was 
wondering whether instead of "loop-transformation" the patch should have 
"loop transformation".

In your response you use the version without dash, so I guess we agree? 
:-)

Cheers,
Gerald

Re: [RFC/RFA] [PATCH 06/12] aarch64: Implement new expander for efficient CRC computation

2024-06-08 Thread Richard Sandiford

Mariam Arutunian  writes:
> This patch introduces two new expanders for the aarch64 backend,
> dedicated to generate optimized code for CRC computations.
> The new expanders are designed to leverage specific hardware capabilities
> to achieve faster CRC calculations,
> particularly using the pmul or crc32 instructions when supported by the
> target architecture.

Thanks for porting this to aarch64!

> Expander 1: Bit-Forward CRC (crc4)
> For targets that support pmul instruction (TARGET_AES),
> the expander will generate code that uses the pmul (crypto_pmulldi)
> instruction for CRC computation.
>
> Expander 2: Bit-Reversed CRC (crc_rev4)
> The expander first checks if the target supports the CRC32 instruction set
> (TARGET_CRC32)
> and the polynomial in use is 0x1EDC6F41 (iSCSI). If the conditions are met,
> it emits calls to the corresponding crc32 instruction (crc32b, crc32h,
> crc32w, or crc32x depending on the data size).
> If the target does not support crc32 but supports pmul, it then uses the
> pmul (crypto_pmulldi) instruction for bit-reversed CRC computation.
>
> Otherwise table-based CRC is generated.
>
>   gcc/config/aarch64/
>
> * aarch64-protos.h (aarch64_expand_crc_using_clmul): New extern
> function declaration.
> (aarch64_expand_reversed_crc_using_clmul):  Likewise.
> * aarch64.cc (aarch64_expand_crc_using_clmul): New function.
> (aarch64_expand_reversed_crc_using_clmul):  Likewise.
> * aarch64.md (UNSPEC_CRC, UNSPEC_CRC_REV):  New unspecs.
> (crc_rev4): New expander for reversed CRC.
> (crc4): New expander for bit-forward CRC.
> * iterators.md (crc_data_type): New mode attribute.
>
>   gcc/testsuite/gcc.target/aarch64/
>
> * crc-1-pmul.c: Likewise.
> * crc-10-pmul.c: Likewise.
> * crc-12-pmul.c: Likewise.
> * crc-13-pmul.c: Likewise.
> * crc-14-pmul.c: Likewise.
> * crc-17-pmul.c: Likewise.
> * crc-18-pmul.c: Likewise.
> * crc-21-pmul.c: Likewise.
> * crc-22-pmul.c: Likewise.
> * crc-23-pmul.c: Likewise.
> * crc-4-pmul.c: Likewise.
> * crc-5-pmul.c: Likewise.
> * crc-6-pmul.c: Likewise.
> * crc-7-pmul.c: Likewise.
> * crc-8-pmul.c: Likewise.
> * crc-9-pmul.c: Likewise.
> * crc-CCIT-data16-pmul.c: Likewise.
> * crc-CCIT-data8-pmul.c: Likewise.
> * crc-coremark-16bitdata-pmul.c: Likewise.
> * crc-crc32-data16.c: New test.
> * crc-crc32-data32.c: Likewise.
> * crc-crc32-data8.c: Likewise.
>
> Signed-off-by: Mariam Arutunian
> diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
> index 1d3f94c813e..167e1140f0d 100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -1117,5 +1117,8 @@ extern void mingw_pe_encode_section_info (tree, rtx, 
> int);
>  
>  bool aarch64_optimize_mode_switching (aarch64_mode_entity);
>  void aarch64_restore_za (rtx);
> +void aarch64_expand_crc_using_clmul (rtx *);
> +void aarch64_expand_reversed_crc_using_clmul (rtx *);
> +
>  
>  #endif /* GCC_AARCH64_PROTOS_H */
> diff --git a/gcc/config/aarch64/aarch64.cc b/gcc/config/aarch64/aarch64.cc
> index ee12d8897a8..05cd0296d38 100644
> --- a/gcc/config/aarch64/aarch64.cc
> +++ b/gcc/config/aarch64/aarch64.cc
> @@ -30265,6 +30265,135 @@ aarch64_retrieve_sysreg (const char *regname, bool 
> write_p, bool is128op)
>return sysreg->encoding;
>  }
>  
> +/* Generate assembly to calculate CRC
> +   using carry-less multiplication instruction.
> +   OPERANDS[1] is input CRC,
> +   OPERANDS[2] is data (message),
> +   OPERANDS[3] is the polynomial without the leading 1.  */
> +
> +void
> +aarch64_expand_crc_using_clmul (rtx *operands)

This should probably be pmul rather than clmul.

> +{
> +  /* Check and keep arguments.  */
> +  gcc_assert (!CONST_INT_P (operands[0]));
> +  gcc_assert (CONST_INT_P (operands[3]));
> +  rtx crc = operands[1];
> +  rtx data = operands[2];
> +  rtx polynomial = operands[3];
> +
> +  unsigned HOST_WIDE_INT
> +  crc_size = GET_MODE_BITSIZE (GET_MODE (operands[0])).to_constant ();
> +  gcc_assert (crc_size <= 32);
> +  unsigned HOST_WIDE_INT
> +  data_size = GET_MODE_BITSIZE (GET_MODE (data)).to_constant ();

We could instead make the interface:

void
aarch64_expand_crc_using_pmul (scalar_mode crc_mode, scalar_mode data_mode,
   rtx *operands)

so that the lines above don't need the to_constant.  This should "just
work" on the .md file side, since the modes being passed are naturally
scalar_mode.

I think it'd be worth asserting also that data_size <= crc_size.
(Although we could handle any MAX (data_size, crc_size) <= 32
with some adjustment.)

> +
> +  /* Calculate the quotient.  */
> +  unsigned HOST_WIDE_INT
> +  q = gf2n_poly_long_div_quotient (UINTVAL (polynomial), crc_size + 1);
> +
> +  /* CRC calculation's main part.  */
> +  if (crc_size > data_size)
> +crc = expand_shift (RSHIFT_EXPR, DImode, crc, crc_size - data_size,
> + NULL_R

Re: [PATCH] c++: Make _cast<> parsing more robust to errors [PR108438]

2024-06-08 Thread Simon Martin

Hi Jason,

On 7 Jun 2024, at 19:30, Jason Merrill wrote:

> On 6/7/24 08:12, Simon Martin wrote:
>> We ICE upon the following when trying to emit a 
>> -Wlogical-not-parentheses
>> warning:
>>
>> === cut here ===
>> template  T foo (T arg, T& ref, T* ptr) {
>>int a = 1;
>>return static_cast(a);
>> }
>> === cut here ===
>>
>> This patch makes *_cast<*> parsing more robust by skipping to the 
>> closing '>'
>> upon error in the target type.
>>
>> Successfully tested on x86_64-pc-linux-gnu.
>>
>> (Note that I have a patch pending review that also adds 
>> g++.dg/parse/crash74.C;
>> I will obviously handle the name conflict at commit time)
>>
>>  PR c++/108438
>>
>> gcc/cp/ChangeLog:
>>
>>  * parser.cc (cp_parser_postfix_expression): Skip to the closing '>'
>>  upon error parsing the target type of *_cast<*> expressions.
>>
>> gcc/testsuite/ChangeLog:
>>
>>  * g++.dg/parse/crash74.C: New test.
>>
>> ---
>>   gcc/cp/parser.cc | 3 ++-
>>   gcc/testsuite/g++.dg/parse/crash74.C | 9 +
>>   2 files changed, 11 insertions(+), 1 deletion(-)
>>   create mode 100644 gcc/testsuite/g++.dg/parse/crash74.C
>>
>> diff --git a/gcc/cp/parser.cc b/gcc/cp/parser.cc
>> index bc4a2359153..3516c2aa38b 100644
>> --- a/gcc/cp/parser.cc
>> +++ b/gcc/cp/parser.cc
>> @@ -7569,7 +7569,8 @@ cp_parser_postfix_expression (cp_parser 
>> *parser, bool address_p, bool cast_p,
>>NULL);
>>  parser->in_type_id_in_expr_p = saved_in_type_id_in_expr_p;
>>  /* Look for the closing `>'.  */
>> -cp_parser_require (parser, CPP_GREATER, RT_GREATER);
>> +if (!cp_parser_require (parser, CPP_GREATER, RT_GREATER))
>> +  cp_parser_skip_to_end_of_template_parameter_list (parser);
>
> Looks like this could use 
> cp_parser_require_end_of_template_parameter_list.
Indeed, thanks for pointing me to this function.
>
> OK with that change.
Merged with the change made via 
https://gcc.gnu.org/g:2c9643c27ecddb7f597d34009d89e932b4aca58e

-- Simon
>
> Jason

[pushed] wwwdocs: gcc-12: Break up markup of list of AArch64 options

2024-06-08 Thread Gerald Pfeifer

When showing a list of options marked up as code, each individual option 
should be marked up, not the entire list and the commas as part of that.

Pushed.

Gerald
---
 htdocs/gcc-12/changes.html | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/htdocs/gcc-12/changes.html b/htdocs/gcc-12/changes.html
index 2f790e0b..9e2cee50 100644
--- a/htdocs/gcc-12/changes.html
+++ b/htdocs/gcc-12/changes.html
@@ -691,7 +691,8 @@ function Multiply (S1, S2 : Sign) return Sign is
 AArch64 & arm
 
   Newer revisions of the Arm Architecture are supported as arguments to the
-  -march option: armv8.7-a, armv8.8-a, armv9-a.
+  -march option: armv8.7-a, 
+  armv8.8-a, armv9-a.
   The Arm Cortex-A510 CPU is now supported through the cortex-a510
argument to the -mcpu and -mtune options.
   
-- 
2.45.1

[committed] i386: Implement .SAT_ADD for unsigned scalar integers [PR112600]

2024-06-08 Thread Uros Bizjak

The following testcase:

unsigned
add_sat(unsigned x, unsigned y)
{
unsigned z;
return __builtin_add_overflow(x, y, &z) ? -1u : z;
}

currently compiles (-O2) to:

add_sat:
addl%esi, %edi
jc  .L3
movl%edi, %eax
ret
.L3:
orl $-1, %eax
ret

We can expand through usadd{m}3 optab to use carry flag from the addition
and generate branchless code using SBB instruction implementing:

unsigned res = x + y;
res |= -(res < x);

add_sat:
addl%esi, %edi
sbbl%eax, %eax
orl %edi, %eax
ret
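
For reference, the two-line identity above can be checked in plain C against the builtin-based testcase. This is only a sketch: the function names add_sat_branchless and add_sat_builtin are made up for illustration and do not appear in the patch, and the generated code depends on the compiler version and options.

```c
/* Branchless unsigned saturating add, following the identity quoted above.
   The function names are hypothetical and not part of the patch.  */
unsigned
add_sat_branchless (unsigned x, unsigned y)
{
  unsigned res = x + y;   /* Wraps modulo 2^N when the sum overflows.  */
  res |= -(res < x);      /* All-ones mask exactly when the sum wrapped.  */
  return res;
}

/* Reference version using the overflow builtin, as in the testcase.  */
unsigned
add_sat_builtin (unsigned x, unsigned y)
{
  unsigned z;
  return __builtin_add_overflow (x, y, &z) ? -1u : z;
}
```

Both functions should return the same value for every input. The first is the form that maps naturally onto an add followed by a mask built from the carry.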

PR target/112600

gcc/ChangeLog:

* config/i386/i386.md (usadd3): New expander.
(x86_movcc_0_m1_neg): Use SWI mode iterator.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr112600-a.c: New test.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Uros.
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index ffcf63e1cba..bc2ef819df6 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -9870,6 +9870,26 @@ (define_insn_and_split "*sub3_ne_0"
 operands[1] = force_reg (mode, operands[1]);
 })
 
+(define_expand "usadd3"
+  [(set (match_operand:SWI 0 "register_operand")
+   (us_plus:SWI (match_operand:SWI 1 "register_operand")
+(match_operand:SWI 2 "")))]
+  ""
+{
+  rtx res = gen_reg_rtx (mode);
+  rtx msk = gen_reg_rtx (mode);
+  rtx dst;
+
+  emit_insn (gen_add3_cc_overflow_1 (res, operands[1], operands[2]));
+  emit_insn (gen_x86_movcc_0_m1_neg (msk));
+  dst = expand_simple_binop (mode, IOR, res, msk,
+operands[0], 1, OPTAB_DIRECT);
+
+  if (!rtx_equal_p (dst, operands[0]))
+emit_move_insn (operands[0], dst);
+  DONE;
+})
+
 ;; The patterns that match these are at the end of this file.
 
 (define_expand "xf3"
@@ -24945,8 +24965,8 @@ (define_insn "*x86_movcc_0_m1_neg"
 
 (define_expand "x86_movcc_0_m1_neg"
   [(parallel
-[(set (match_operand:SWI48 0 "register_operand")
- (neg:SWI48 (ltu:SWI48 (reg:CCC FLAGS_REG) (const_int 0))))
+[(set (match_operand:SWI 0 "register_operand")
+ (neg:SWI (ltu:SWI (reg:CCC FLAGS_REG) (const_int 0))))
  (clobber (reg:CC FLAGS_REG))])])
 
 (define_split
diff --git a/gcc/testsuite/gcc.target/i386/pr112600-a.c 
b/gcc/testsuite/gcc.target/i386/pr112600-a.c
new file mode 100644
index 000..fa122bc7a3f
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr112600-a.c
@@ -0,0 +1,32 @@
+/* PR target/112600 */
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler-times "sbb" 4 } } */
+
+unsigned char
+add_sat_char (unsigned char x, unsigned char y)
+{
+  unsigned char z;
+  return __builtin_add_overflow(x, y, &z) ? -1u : z;
+}
+
+unsigned short
+add_sat_short (unsigned short x, unsigned short y)
+{
+  unsigned short z;
+  return __builtin_add_overflow(x, y, &z) ? -1u : z;
+}
+
+unsigned int
+add_sat_int (unsigned int x, unsigned int y)
+{
+  unsigned int z;
+  return __builtin_add_overflow(x, y, &z) ? -1u : z;
+}
+
+unsigned long
+add_sat_long (unsigned long x, unsigned long y)
+{
+  unsigned long z;
+  return __builtin_add_overflow(x, y, &z) ? -1ul : z;
+}

RE: [PATCH v3] RISC-V: Implement .SAT_SUB for unsigned scalar int

2024-06-08 Thread Li, Pan2

> LGTM.

Committed, thanks Robin.

> Let's keep in mind that min/max will save us two insns(?)
> and a conditional move would save us one.

Got it, cmov is well designed for such case(s).
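
To make the min/max remark concrete, here is a small C sketch of two equivalent ways to write unsigned saturating subtraction. The function names are hypothetical, and neither form is claimed to be the exact one matched by the patch; they only illustrate the mask-based and min-based shapes being compared.

```c
/* Mask form: keep x - y when x >= y, otherwise force the result to 0.
   Hypothetical name, for illustration only.  */
unsigned
sub_sat_mask (unsigned x, unsigned y)
{
  return (x - y) & -(unsigned) (x >= y);
}

/* Min form: subtract min (x, y) from x, so the result never goes
   below zero.  Hypothetical name, for illustration only.  */
unsigned
sub_sat_min (unsigned x, unsigned y)
{
  unsigned m = x < y ? x : y;
  return x - m;
}
```

On a target with a native min or conditional-move instruction, the second form needs no separate compare-and-mask sequence, which is where the instruction savings mentioned above would come from.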

Pan


-Original Message-
From: Robin Dapp  
Sent: Friday, June 7, 2024 9:57 PM
To: Li, Pan2 ; gcc-patches@gcc.gnu.org
Cc: rdapp@gmail.com; juzhe.zh...@rivai.ai; kito.ch...@gmail.com
Subject: Re: [PATCH v3] RISC-V: Implement .SAT_SUB for unsigned scalar int

LGTM.

Let's keep in mind that min/max will save us two insns(?)
and a conditional move would save us one.

Regards
 Robin

[PATCH] MIPS/testsuite: add -mno-branch-likely to r10k-cache-barrier-13.c

2024-06-08 Thread YunQiang Su

In mips.cc(mips_reorg_process_insns), there is this claim:

Also delete cache barriers if the last instruction
was an annulled branch.  INSN will not be speculatively
executed.

And with -O1 on mips64, we can generate binary code like this,
which fails this test.

gcc/testsuite
* gcc.target/mips/r10k-cache-barrier-13.c: Add -mno-branch-likely
option.
---
 gcc/testsuite/gcc.target/mips/r10k-cache-barrier-13.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.target/mips/r10k-cache-barrier-13.c 
b/gcc/testsuite/gcc.target/mips/r10k-cache-barrier-13.c
index ee9c84b5988..ac005fb08b3 100644
--- a/gcc/testsuite/gcc.target/mips/r10k-cache-barrier-13.c
+++ b/gcc/testsuite/gcc.target/mips/r10k-cache-barrier-13.c
@@ -1,4 +1,4 @@
-/* { dg-options "-mr10k-cache-barrier=store" } */
+/* { dg-options "-mr10k-cache-barrier=store -mno-branch-likely" } */
 
 /* Test that indirect calls are protected.  */
 
-- 
2.39.3 (Apple Git-146)

Re: [RFC/RFA] [PATCH 03/12] RISC-V: Add CRC expander to generate faster CRC.

2024-06-08 Thread Richard Sandiford

Thanks a lot for doing this!  It's a really nice series.

Just had a comment on the long division helper:

Mariam Arutunian  writes:
> +/* Return the quotient of polynomial long division of x^2N by POLYNOMIAL
> +   in GF (2^N).  */

It looks like there might be an off-by-one discrepancy between the comment
and the code.  The comment suggests that N is the degree of the polynomial
(crc_size), whereas the callers seem to pass crc_size + 1.  This doesn't
matter in practice since...

> +
> +unsigned HOST_WIDE_INT
> +gf2n_poly_long_div_quotient (unsigned HOST_WIDE_INT polynomial, size_t n)
> +{
> +  vec x2n;
> +  vec pol, q;
> +  /* Create vector of bits, for the polynomial.  */
> +  pol.create (n + 1);
> +  for (size_t i = 0; i < n; i++)
> +{
> +  pol.quick_push (polynomial & 1);
> +  polynomial >>= 1;
> +}
> +  pol.quick_push (1);
> +
> +  /* Create vector for x^2n polynomial.  */
> +  x2n.create (2 * n - 1);
> +  for (size_t i = 0; i < 2 * (n - 1); i++)
> +x2n.safe_push (0);
> +  x2n.safe_push (1);

...this compensates by setting the dividend to x^(2N-2).  And although
the first loop reads crc_size+1 bits from polynomial before adding the
implicit leading 1, only the low crc_size elements of poly affect the
result.

If we do pass crc_size as N, a simpler way of writing the routine might be:

{
  /* The result has degree N, so needs N + 1 bits.  */
  gcc_assert (n < 64);

  /* Perform a division step for the x^2N coefficient.  At this point the
 quotient and remainder have N implicit trailing zeros.  */
  unsigned HOST_WIDE_INT quotient = 1;
  unsigned HOST_WIDE_INT remainder = polynomial;

  /* Process the coefficients for x^(2N-1) down to x^N, with each step
 reducing the number of implicit trailing zeros by one.  */
  for (unsigned int i = 0; i < n; ++i)
{
  bool coeff = remainder & (HOST_WIDE_INT_1U << (n - 1));
  quotient = (quotient << 1) | coeff;
  remainder = (remainder << 1) ^ (coeff ? polynomial : 0);
}
  return quotient;
}

I realise there are many ways of writing this out there though,
so that's just a suggestion.  (And only lightly tested.)
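
For anyone who wants to try the suggested routine outside of GCC, here is a standalone C sketch using uint64_t in place of unsigned HOST_WIDE_INT. The names gf2_poly_div_quotient, gf2_clmul and gf2_quotient_is_valid are made up for this example. The checker multiplies the quotient back by the full polynomial and confirms that what is left of x^(2N) has degree below N; it assumes POLYNOMIAL has no bits set at or above bit N.

```c
#include <assert.h>
#include <stdint.h>

/* Quotient of x^(2N) divided by P(x) = x^N + POLYNOMIAL over GF(2).
   POLYNOMIAL holds only the low N coefficients; the leading x^N term
   is implicit.  The quotient has degree N, so it needs N + 1 bits.  */
uint64_t
gf2_poly_div_quotient (uint64_t polynomial, unsigned n)
{
  assert (n >= 1 && n < 64);

  /* Division step for the x^(2N) coefficient.  */
  uint64_t quotient = 1;
  uint64_t remainder = polynomial;

  /* Coefficients for x^(2N-1) down to x^N.  */
  for (unsigned i = 0; i < n; ++i)
    {
      uint64_t coeff = (remainder >> (n - 1)) & 1;
      quotient = (quotient << 1) | coeff;
      remainder = (remainder << 1) ^ (coeff ? polynomial : 0);
    }
  return quotient;
}

/* Carry-less (GF(2)) product of A and B.  The caller must make sure
   the result fits in 64 bits.  */
static uint64_t
gf2_clmul (uint64_t a, uint64_t b)
{
  uint64_t r = 0;
  for (; b != 0; b >>= 1, a <<= 1)
    if (b & 1)
      r ^= a;
  return r;
}

/* Return 1 if x^(2N) - Q * P has degree below N, where Q is the
   computed quotient.  Limited to N <= 31 so the product fits.  */
int
gf2_quotient_is_valid (uint64_t polynomial, unsigned n)
{
  assert (n >= 1 && n <= 31);
  uint64_t p = (UINT64_C (1) << n) | polynomial;
  uint64_t q = gf2_poly_div_quotient (polynomial, n);
  uint64_t rem = gf2_clmul (q, p) ^ (UINT64_C (1) << (2 * n));
  return rem < (UINT64_C (1) << n);
}
```

As a small worked case, dividing x^4 by x^2 + x + 1 gives quotient x^2 + x, so gf2_poly_div_quotient (3, 2) should return 6.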

FWIW, we could easily extend the interface to work on wide_ints if we
ever need it for N>63.  

Thanks,
Richard

> +
> +  q.create (n);
> +  for (size_t i = 0; i < n; i++)
> +q.quick_push (0);
> +
> +  /* Calculate the quotient of x^2n/polynomial.  */
> +  for (int i = n - 1; i >= 0; i--)
> +{
> +  int d = x2n[i + n - 1];
> +  if (d == 0)
> + continue;
> +  for (int j = i + n - 1; j >= i; j--)
> + x2n[j] ^= (pol[j - i]);
> +  q[i] = 1;
> +}
> +
> +  /* Get the number from the vector of 0/1s.  */
> +  unsigned HOST_WIDE_INT quotient = 0;
> +  for (size_t i = 0; i < q.length (); i++)
> +{
> +  quotient <<= 1;
> +  quotient = quotient | q[q.length () - i - 1];
> +}
> +  return quotient;
> +}

[pushed] wwwdocs: news: Update links re GCC Runtime Library Exception

2024-06-08 Thread Gerald Pfeifer

Note this is more than just http->https.

Pushed. 

Gerald
---
 htdocs/news.html | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/htdocs/news.html b/htdocs/news.html
index aeac6935..5f652d90 100644
--- a/htdocs/news.html
+++ b/htdocs/news.html
@@ -678,13 +678,13 @@ platforms that support dynamically loadable objects.
 
 The GCC Steering Committee, along with the Free Software Foundation
 and the Software Freedom Law Center, is pleased to announce the release
-of a new http://www.gnu.org/licenses/gcc-exception.html";>GCC
+of a new https://www.gnu.org/licenses/gcc-exception-3.1.html";>GCC
 Runtime Library Exception.
 
 This license exception has been developed to allow various GCC
 libraries to upgrade to GPLv3.  It will also enable the development
 of a plugin framework for GCC.
-(http://www.gnu.org/licenses/gcc-exception-faq.html";>Rationale
+(https://www.gnu.org/licenses/gcc-exception-3.1-faq.html";>Rationale
 document and FAQ)
 
 
-- 
2.45.1
