Re: [RFC/RFA] [PATCH 03/12] RISC-V: Add CRC expander to generate faster CRC.

2024-06-09 Thread Mariam Arutunian
On Sat, Jun 8, 2024, 09:53 Richard Sandiford 
wrote:

> Thanks a lot for doing this!  It's a really nice series.
>


Thank you for your positive feedback and for your review and suggestions on
the patch series.

> Just had a comment on the long division helper:
>
> Mariam Arutunian  writes:
> > +/* Return the quotient of polynomial long division of x^2N by POLYNOMIAL
> > +   in GF (2^N).  */
>
> It looks like there might be an off-by-one discrepancy between the comment
> and the code.  The comment suggests that N is the degree of the polynomial
> (crc_size), whereas the callers seem to pass crc_size + 1.  This doesn't
> matter in practice since...
>
> > +
> > +unsigned HOST_WIDE_INT
> > +gf2n_poly_long_div_quotient (unsigned HOST_WIDE_INT polynomial, size_t n)
> > +{
> > +  vec x2n;
> > +  vec pol, q;
> > +  /* Create vector of bits, for the polynomial.  */
> > +  pol.create (n + 1);
> > +  for (size_t i = 0; i < n; i++)
> > +{
> > +  pol.quick_push (polynomial & 1);
> > +  polynomial >>= 1;
> > +}
> > +  pol.quick_push (1);
> > +
> > +  /* Create vector for x^2n polynomial.  */
> > +  x2n.create (2 * n - 1);
> > +  for (size_t i = 0; i < 2 * (n - 1); i++)
> > +x2n.safe_push (0);
> > +  x2n.safe_push (1);
>
> ...this compensates by setting the dividend to x^(2N-2).  And although
> the first loop reads crc_size+1 bits from polynomial before adding the
> implicit leading 1, only the low crc_size elements of poly affect the
> result.
>


Yes. Initially, I implemented it quickly using an implementation I found
with the intention of refining it later.


> If we do pass crc_size as N, a simpler way of writing the routine might be:
>
> {
>   /* The result has degree N, so needs N + 1 bits.  */
>   gcc_assert (n < 64);
>
>   /* Perform a division step for the x^2N coefficient.  At this point the
>  quotient and remainder have N implicit trailing zeros.  */
>   unsigned HOST_WIDE_INT quotient = 1;
>   unsigned HOST_WIDE_INT remainder = polynomial;
>
>   /* Process the coefficients for x^(2N-1) down to x^N, with each step
>  reducing the number of implicit trailing zeros by one.  */
>   for (unsigned int i = 0; i < n; ++i)
> {
>   bool coeff = remainder & (HOST_WIDE_INT_1U << (n - 1));
>   quotient = (quotient << 1) | coeff;
>   remainder = (remainder << 1) ^ (coeff ? polynomial : 0);
> }
>   return quotient;
> }
>
> I realise there are many ways of writing this out there though,
> so that's just a suggestion.  (And only lightly tested.)
>

Thanks, I appreciate your input.
I'm currently on vacation, but after I return, I'll apply all the changes
and send a new version.


> FWIW, we could easily extend the interface to work on wide_ints if we
> ever need it for N>63.



The problem is keeping the whole quotient. For a degree-64 polynomial, it
may need 65 bits. Jeff already answered this. Alternatively, there might be
a method to perform the calculation without retaining that extra bit, but I
haven't explored it yet.
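
One possible way to avoid the extra bit, sketched here as an assumption and not as what the patch does: the quotient of x^2N by a monic degree-N polynomial always has degree N, so its leading coefficient is always 1 and can be left implicit. The helper below follows the loop structure of Richard's suggestion but starts the quotient at 0, so only the low N bits are kept and N = 64 still fits in 64 bits. The name `gf2_quotient_low_bits` and the use of `uint64_t` in place of `unsigned HOST_WIDE_INT` are illustrative only.

```c
#include <assert.h>
#include <stdint.h>

/* Return the low N bits of the quotient of x^(2N) divided by
   P(x) = x^N + POLY over GF(2), where POLY holds the low N coefficients.
   The x^N coefficient of the quotient is always 1 and is left implicit,
   so the result fits in 64 bits even for N == 64.  */
uint64_t
gf2_quotient_low_bits (uint64_t poly, unsigned n)
{
  assert (n >= 1 && n <= 64);

  uint64_t quotient = 0;        /* The implicit leading 1 is not stored.  */
  uint64_t remainder = poly;    /* Remainder after the x^(2N) step.  */
  uint64_t top = UINT64_C (1) << (n - 1);

  /* Process the coefficients for x^(2N-1) down to x^N.  Bits shifted
     above position N-1 are never read again, so overflow is harmless.  */
  for (unsigned i = 0; i < n; ++i)
    {
      uint64_t coeff = (remainder & top) != 0;
      quotient = (quotient << 1) | coeff;
      remainder = (remainder << 1) ^ (coeff ? poly : 0);
    }
  return quotient;
}
```

For the CRC-8 polynomial x^8 + x^2 + x + 1 (low bits 0x07) this returns 0x07, so the full quotient is 0x107 once the implicit leading bit is restored.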


Thank you again,
Mariam


> Thanks,
> Richard
>
> > +
> > +  q.create (n);
> > +  for (size_t i = 0; i < n; i++)
> > +q.quick_push (0);
> > +
> > +  /* Calculate the quotient of x^2n/polynomial.  */
> > +  for (int i = n - 1; i >= 0; i--)
> > +{
> > +  int d = x2n[i + n - 1];
> > +  if (d == 0)
> > + continue;
> > +  for (int j = i + n - 1; j >= i; j--)
> > + x2n[j] ^= (pol[j - i]);
> > +  q[i] = 1;
> > +}
> > +
> > +  /* Get the number from the vector of 0/1s.  */
> > +  unsigned HOST_WIDE_INT quotient = 0;
> > +  for (size_t i = 0; i < q.length (); i++)
> > +{
> > +  quotient <<= 1;
> > +  quotient = quotient | q[q.length () - i - 1];
> > +}
> > +  return quotient;
> > +}
>


[pushed] doc: Remove link to www.amelek.gda.pl/avr/

2024-06-09 Thread Gerald Pfeifer
The entire server/site appears to have been gone for a while.

gcc:
* doc/install.texi (avr): Remove link to www.amelek.gda.pl/avr/.
---
 gcc/doc/install.texi | 2 --
 1 file changed, 2 deletions(-)

diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index 906c78aaca5..2addafd2465 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -4016,8 +4016,6 @@ can also be obtained from:
 @itemize @bullet
 @item
 @uref{http://www.nongnu.org/avr/,,http://www.nongnu.org/avr/}
-@item
-@uref{http://www.amelek.gda.pl/avr/,,http://www.amelek.gda.pl/avr/}
 @end itemize
 
 The following error:
-- 
2.45.1


[committed] i386: Implement .SAT_SUB for unsigned scalar integers [PR112600]

2024-06-09 Thread Uros Bizjak
The following testcase:

unsigned
sub_sat (unsigned x, unsigned y)
{
  unsigned res;
  res = x - y;
  res &= -(x >= y);
  return res;
}

currently compiles (-O2) to:

sub_sat:
        movl    %edi, %edx
        xorl    %eax, %eax
        subl    %esi, %edx
        cmpl    %esi, %edi
        setnb   %al
        negl    %eax
        andl    %edx, %eax
        ret

We can expand through ussub{m}3 optab to use carry flag from the subtraction
and generate code using SBB instruction implementing:

unsigned res = x - y;
res &= ~(-(x < y));

sub_sat:
        subl    %esi, %edi
        sbbl    %eax, %eax
        notl    %eax
        andl    %edi, %eax
        ret
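
The two formulations can be compared directly in C. This is only a sketch to illustrate the equivalence the expander relies on; the function names are made up for the example and are not part of the patch.

```c
#include <assert.h>

/* Original form: keep the difference only when x >= y.  */
unsigned
sub_sat_ge_mask (unsigned x, unsigned y)
{
  unsigned res = x - y;
  res &= -(unsigned) (x >= y);
  return res;
}

/* Borrow form: -(x < y) is all ones exactly when the subtraction
   borrows, so inverting it gives the mask that keeps or clears RES.  */
unsigned
sub_sat_borrow_mask (unsigned x, unsigned y)
{
  unsigned res = x - y;
  unsigned borrow = -(unsigned) (x < y);
  return res & ~borrow;
}

/* Abort if the two forms disagree for the given inputs.  */
void
check_sub_sat (unsigned x, unsigned y)
{
  assert (sub_sat_ge_mask (x, y) == sub_sat_borrow_mask (x, y));
}
```

Both functions return `x - y` when `x >= y` and 0 otherwise; the second one matches the shape of the SBB/NOT/AND sequence.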

PR target/112600

gcc/ChangeLog:

* config/i386/i386.md (ussub3): New expander.
(sub_3): Ditto.

gcc/testsuite/ChangeLog:

* gcc.target/i386/pr112600-b.c: New test.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Uros.
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index bc2ef819df6..d69bc8d6e48 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -8436,6 +8436,14 @@ (define_expand "usubv4"
   "ix86_fixup_binary_operands_no_copy (MINUS, mode, operands,
   TARGET_APX_NDD);")
 
+(define_expand "sub_3"
+  [(parallel [(set (reg:CC FLAGS_REG)
+  (compare:CC
+(match_operand:SWI 1 "nonimmediate_operand")
+(match_operand:SWI 2 "")))
+ (set (match_operand:SWI 0 "register_operand")
+  (minus:SWI (match_dup 1) (match_dup 2)))])])
+
 (define_insn "*sub_3"
   [(set (reg FLAGS_REG)
(compare (match_operand:SWI 1 "nonimmediate_operand" "0,0,rm,r")
@@ -9883,7 +9891,28 @@ (define_expand "usadd3"
   emit_insn (gen_add3_cc_overflow_1 (res, operands[1], operands[2]));
   emit_insn (gen_x86_movcc_0_m1_neg (msk));
   dst = expand_simple_binop (mode, IOR, res, msk,
-operands[0], 1, OPTAB_DIRECT);
+operands[0], 1, OPTAB_WIDEN);
+
+  if (!rtx_equal_p (dst, operands[0]))
+emit_move_insn (operands[0], dst);
+  DONE;
+})
+
+(define_expand "ussub3"
+  [(set (match_operand:SWI 0 "register_operand")
+   (us_minus:SWI (match_operand:SWI 1 "register_operand")
+ (match_operand:SWI 2 "")))]
+  ""
+{
+  rtx res = gen_reg_rtx (mode);
+  rtx msk = gen_reg_rtx (mode);
+  rtx dst;
+
+  emit_insn (gen_sub_3 (res, operands[1], operands[2]));
+  emit_insn (gen_x86_movcc_0_m1_neg (msk));
+  msk = expand_simple_unop (mode, NOT, msk, NULL, 1);
+  dst = expand_simple_binop (mode, AND, res, msk,
+operands[0], 1, OPTAB_WIDEN);
 
   if (!rtx_equal_p (dst, operands[0]))
 emit_move_insn (operands[0], dst);


Re: [PATCH] ifcvt.cc: Prevent excessive if-conversion for conditional moves

2024-06-09 Thread YunQiang Su
> >
> > gcc/ChangeLog:
> >
> >   * ifcvt.cc (cond_move_process_if_block):
> >   Consider the result of targetm.noce_conversion_profitable_p()
> >   when replacing the original sequence with the converted one.
> > Thanks.  I pushed this to the trunk.
>

Sorry for the delayed report. With this patch the test
gcc.target/mips/movcc-3.c fails.


> Jeff



-- 
YunQiang Su


Re: [PATCH] ifcvt.cc: Prevent excessive if-conversion for conditional moves

2024-06-09 Thread YunQiang Su
YunQiang Su wrote on Sun, Jun 9, 2024 at 18:25:
>
> > >
> > > gcc/ChangeLog:
> > >
> > >   * ifcvt.cc (cond_move_process_if_block):
> > >   Consider the result of targetm.noce_conversion_profitable_p()
> > >   when replacing the original sequence with the converted one.
> > > Thanks.  I pushed this to the trunk.
> >
>
> Sorry for the delayed report. With this patch the test
> gcc.target/mips/movcc-3.c fails.
>

The problem may be caused by the difference between `seq` and `edge e`.
In `seq`, there may be a compare operation, while
`default_max_noce_ifcvt_seq_cost` only counts the branch operation.

The rtx_cost may consider the compare operation in `seq` as quite expensive.
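
A toy model of the mismatch described above, with entirely hypothetical names and cost numbers (this is not GCC's costing code): if the converted sequence is charged for the compare plus the conditional moves, while the budget reflects only the branch, a conversion that merely replaces the branch can still be rejected.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical cost breakdown of a converted sequence.  */
struct toy_seq_cost
{
  unsigned compare;     /* Cost charged for the compare, if any.  */
  unsigned cond_moves;  /* Cost charged for the conditional moves.  */
};

/* The conversion is accepted only if the whole sequence fits in a
   budget that was derived from the branch alone.  */
bool
toy_conversion_profitable (struct toy_seq_cost seq, unsigned branch_budget)
{
  return seq.compare + seq.cond_moves <= branch_budget;
}
```

With a budget of 4, a sequence costing 4 for its conditional moves passes, but the same sequence fails once an expensive compare is added to it.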


-- 
YunQiang Su


Re: [PING] [contrib] validate_failures.py: fix python 3.12 escape sequence warnings

2024-06-09 Thread Gabi Falk
Hi,

On Sat, Jun 08, 2024 at 03:34:02PM -0600, Jeff Law wrote:
> On 5/14/24 8:12 AM, Gabi Falk wrote:
> > Hi,
> >
> > This one still needs review:
> >
> > https://inbox.sourceware.org/gcc-patches/20240415233833.104460-1-gabif...@gmx.com/
> I think I just ACK'd an equivalent patch from someone else this week.

Looks like it hasn't been merged yet, and I couldn't find it in the
mailing list archive.
Anyway, I hope either one gets merged soon. :)

--
gabi


[PATCH] LoongArch: Use bstrins for "value & (-1u << const)"

2024-06-09 Thread Xi Ruoyao
A move/bstrins pair is as fast as a (addi.w|lu12i.w|lu32i.d|lu52i.d)/and
pair, and twice as fast as a srli/slli pair.  When the src reg and the dst
reg happen to be the same, the move instruction can be optimized away.
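
To make the targeted operation concrete: masking with `-1u << const` simply clears the low `const` bits, which is what a bstrins from the zero register does. The C sketch below models that. `low_mask_len` is a hypothetical stand-in written for this example (it is not the `low_bitmask_len` helper from the LoongArch backend), and `clear_low_bits` models the effect of the instruction, not its encoding.

```c
#include <assert.h>
#include <stdint.h>

/* Return k if M == 2^k - 1 for some 1 <= k <= 64, otherwise -1.  */
int
low_mask_len (uint64_t m)
{
  if (m == 0 || (m & (m + 1)) != 0)
    return -1;

  int k = 0;
  while (m != 0)
    {
      k++;
      m >>= 1;
    }
  return k;
}

/* Clear bits [LEN-1:0] of VALUE, as inserting zeros into that
   bit range would.  */
uint64_t
clear_low_bits (uint64_t value, int len)
{
  assert (len >= 1 && len <= 64);
  if (len == 64)
    return 0;
  return value & ~((UINT64_C (1) << len) - 1);
}
```

For `a & -32` the inverted mask is 31, so `low_mask_len` gives 5 and the AND is equivalent to clearing bits 4..0, matching the `4,0` operands the new tests scan for.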

gcc/ChangeLog:

* config/loongarch/predicates.md (high_bitmask_operand): New
predicate.
* config/loongarch/constraints.md (Yy): New constraint.
* config/loongarch/loongarch.md (and3_align): New
define_insn_and_split.

gcc/testsuite/ChangeLog:

* gcc.target/loongarch/bstrins-1.c: New test.
* gcc.target/loongarch/bstrins-2.c: New test.
---

Bootstrapped and regtested on loongarch64-linux-gnu.  Ok for trunk?

 gcc/config/loongarch/constraints.md|  5 +
 gcc/config/loongarch/loongarch.md  | 17 +
 gcc/config/loongarch/predicates.md |  4 
 gcc/testsuite/gcc.target/loongarch/bstrins-1.c |  9 +
 gcc/testsuite/gcc.target/loongarch/bstrins-2.c | 14 ++
 5 files changed, 49 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/loongarch/bstrins-1.c
 create mode 100644 gcc/testsuite/gcc.target/loongarch/bstrins-2.c

diff --git a/gcc/config/loongarch/constraints.md b/gcc/config/loongarch/constraints.md
index f07d31650d2..12cf5e2924a 100644
--- a/gcc/config/loongarch/constraints.md
+++ b/gcc/config/loongarch/constraints.md
@@ -94,6 +94,7 @@
 ;;   "A constant @code{move_operand} that can be safely loaded using
 ;;   @code{la}."
 ;;"Yx"
+;;"Yy"
 ;; "Z" -
 ;;"ZC"
 ;;  "A memory operand whose address is formed by a base register and offset
@@ -291,6 +292,10 @@ (define_constraint "Yx"
"@internal"
(match_operand 0 "low_bitmask_operand"))
 
+(define_constraint "Yy"
+   "@internal"
+   (match_operand 0 "high_bitmask_operand"))
+
 (define_constraint "YI"
   "@internal
A replicated vector const in which the replicated value is in the range
diff --git a/gcc/config/loongarch/loongarch.md b/gcc/config/loongarch/loongarch.md
index 5c80c169cbf..25c1d323ba0 100644
--- a/gcc/config/loongarch/loongarch.md
+++ b/gcc/config/loongarch/loongarch.md
@@ -1542,6 +1542,23 @@ (define_insn "and3_extended"
   [(set_attr "move_type" "pick_ins")
(set_attr "mode" "")])
 
+(define_insn_and_split "and3_align"
+  [(set (match_operand:GPR 0 "register_operand" "=r")
+   (and:GPR (match_operand:GPR 1 "register_operand" "r")
+(match_operand:GPR 2 "high_bitmask_operand" "Yy")))]
+  ""
+  "#"
+  ""
+  [(set (match_dup 0) (match_dup 1))
+   (set (zero_extract:GPR (match_dup 0) (match_dup 2) (const_int 0))
+   (const_int 0))]
+{
+  int len;
+
+  len = low_bitmask_len (mode, ~INTVAL (operands[2]));
+  operands[2] = GEN_INT (len);
+})
+
 (define_insn_and_split "*bstrins__for_mask"
   [(set (match_operand:GPR 0 "register_operand" "=r")
(and:GPR (match_operand:GPR 1 "register_operand" "r")
diff --git a/gcc/config/loongarch/predicates.md b/gcc/config/loongarch/predicates.md
index eba7f246c84..58e406ea522 100644
--- a/gcc/config/loongarch/predicates.md
+++ b/gcc/config/loongarch/predicates.md
@@ -293,6 +293,10 @@ (define_predicate "low_bitmask_operand"
   (and (match_code "const_int")
(match_test "low_bitmask_len (mode, INTVAL (op)) > 12")))
 
+(define_predicate "high_bitmask_operand"
+  (and (match_code "const_int")
+   (match_test "low_bitmask_len (mode, ~INTVAL (op)) > 0")))
+
 (define_predicate "d_operand"
   (and (match_code "reg")
(match_test "GP_REG_P (REGNO (op))")))
diff --git a/gcc/testsuite/gcc.target/loongarch/bstrins-1.c b/gcc/testsuite/gcc.target/loongarch/bstrins-1.c
new file mode 100644
index 000..7cb3a952322
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/bstrins-1.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=loongarch64 -mabi=lp64d" } */
+/* { dg-final { scan-assembler "bstrins\\.d\t\\\$r4,\\\$r0,4,0" } } */
+
+long
+x (long a)
+{
+  return a & -32;
+}
diff --git a/gcc/testsuite/gcc.target/loongarch/bstrins-2.c b/gcc/testsuite/gcc.target/loongarch/bstrins-2.c
new file mode 100644
index 000..9777f502e5a
--- /dev/null
+++ b/gcc/testsuite/gcc.target/loongarch/bstrins-2.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=loongarch64 -mabi=lp64d" } */
+/* { dg-final { scan-assembler "bstrins\\.d\t\\\$r\[0-9\]+,\\\$r0,4,0" } } */
+
+struct aligned_buffer {
+  _Alignas(32) char x[1024];
+};
+
+extern int f(char *);
+int g(void)
+{
+  struct aligned_buffer buf;
+  return f(buf.x);
+}
-- 
2.45.2



Re: [PATCH v2] Target-independent store forwarding avoidance.

2024-06-09 Thread Jeff Law



On 6/7/24 4:31 PM, Jeff Law wrote:



I've actually added it to my tester just to see if there's any fallout. 
It'll take a week to churn through the long running targets that 
bootstrap in QEMU, but the crosses should have data Monday.
The first round naturally didn't trigger anything because the option is 
off by default.  So I twiddled it to be on at -O1 and above.


epiphany-elf ICEs in gen_rtx_SUBREG with the attached .i file compiled 
with -O2:



root@577c7458c93a://home/jlaw/jenkins/workspace/epiphany-elf/epiphany-elf-obj/newlib/epiphany-elf/newlib/libm/complex# epiphany-elf-gcc -O2 libm_a-cacos.i
during RTL pass: avoid_store_forwarding
../../../..//newlib-cygwin/newlib/libm/complex/cacos.c: In function 'cacos':
../../../..//newlib-cygwin/newlib/libm/complex/cacos.c:99:1: internal compiler error: in gen_rtx_SUBREG, at emit-rtl.cc:1032
0x614538 gen_rtx_SUBREG(machine_mode, rtx_def*, poly_int<1u, unsigned long>)
../../..//gcc/gcc/emit-rtl.cc:1032
0x614538 gen_rtx_SUBREG(machine_mode, rtx_def*, poly_int<1u, unsigned long>)
../../..//gcc/gcc/emit-rtl.cc:1030
0xe82216 process_forwardings
../../..//gcc/gcc/avoid-store-forwarding.cc:273
0xe82216 avoid_store_forwarding
../../..//gcc/gcc/avoid-store-forwarding.cc:489
0xe82667 execute
../../..//gcc/gcc/avoid-store-forwarding.cc:558



ft32-elf ICE'd in bitmap_check_index at various optimization levels:


FAIL: execute/pr108498-2.c   -O1  (internal compiler error: in bitmap_check_index, at sbitmap.h:104)
FAIL: execute/pr108498-2.c   -O1  (test for excess errors)
FAIL: execute/pr108498-2.c   -O2  (internal compiler error: in bitmap_check_index, at sbitmap.h:104)
FAIL: execute/pr108498-2.c   -O2  (test for excess errors)
FAIL: execute/pr108498-2.c   -O3 -g  (internal compiler error: in bitmap_check_index, at sbitmap.h:104)
FAIL: execute/pr108498-2.c   -O3 -g  (test for excess errors)



avr, c6x,

lm32-elf failed to build libgcc with an ICE in leaf_function_p, I 
haven't isolated that yet.



There were other failures as well.  But you've got a few to start with 
and we can retest pretty easily as the patch evolves.


jeff

# 0 "../../../..//newlib-cygwin/newlib/libm/complex/cacos.c"
# 1 
"//home/jlaw/jenkins/workspace/epiphany-elf/epiphany-elf-obj/newlib/epiphany-elf/newlib//"
# 0 ""
# 0 ""
# 1 "../../../..//newlib-cygwin/newlib/libm/complex/cacos.c"
# 77 "../../../..//newlib-cygwin/newlib/libm/complex/cacos.c"
# 1 
"/home/jlaw/jenkins/workspace/epiphany-elf/newlib-cygwin/newlib/libc/include/complex.h"
 1 3 4
# 15 
"/home/jlaw/jenkins/workspace/epiphany-elf/newlib-cygwin/newlib/libc/include/complex.h"
 3 4
# 1 
"/home/jlaw/jenkins/workspace/epiphany-elf/newlib-cygwin/newlib/libc/include/sys/cdefs.h"
 1 3 4
# 45 
"/home/jlaw/jenkins/workspace/epiphany-elf/newlib-cygwin/newlib/libc/include/sys/cdefs.h"
 3 4
# 1 
"/home/jlaw/jenkins/workspace/epiphany-elf/newlib-cygwin/newlib/libc/include/machine/_default_types.h"
 1 3 4







# 1 
"/home/jlaw/jenkins/workspace/epiphany-elf/newlib-cygwin/newlib/libc/include/sys/features.h"
 1 3 4
# 28 
"/home/jlaw/jenkins/workspace/epiphany-elf/newlib-cygwin/newlib/libc/include/sys/features.h"
 3 4
# 1 "./_newlib_version.h" 1 3 4
# 29 
"/home/jlaw/jenkins/workspace/epiphany-elf/newlib-cygwin/newlib/libc/include/sys/features.h"
 2 3 4
# 9 
"/home/jlaw/jenkins/workspace/epiphany-elf/newlib-cygwin/newlib/libc/include/machine/_default_types.h"
 2 3 4
# 41 
"/home/jlaw/jenkins/workspace/epiphany-elf/newlib-cygwin/newlib/libc/include/machine/_default_types.h"
 3 4

# 41 
"/home/jlaw/jenkins/workspace/epiphany-elf/newlib-cygwin/newlib/libc/include/machine/_default_types.h"
 3 4
typedef signed char __int8_t;

typedef unsigned char __uint8_t;
# 55 
"/home/jlaw/jenkins/workspace/epiphany-elf/newlib-cygwin/newlib/libc/include/machine/_default_types.h"
 3 4
typedef short int __int16_t;

typedef short unsigned int __uint16_t;
# 77 
"/home/jlaw/jenkins/workspace/epiphany-elf/newlib-cygwin/newlib/libc/include/machine/_default_types.h"
 3 4
typedef long int __int32_t;

typedef long unsigned int __uint32_t;
# 103 
"/home/jlaw/jenkins/workspace/epiphany-elf/newlib-cygwin/newlib/libc/include/machine/_default_types.h"
 3 4
typedef long long int __int64_t;

typedef long long unsigned int __uint64_t;
# 134 
"/home/jlaw/jenkins/workspace/epiphany-elf/newlib-cygwin/newlib/libc/include/machine/_default_types.h"
 3 4
typedef signed char __int_least8_t;

typedef unsigned char __uint_least8_t;
# 160 
"/home/jlaw/jenkins/workspace/epiphany-elf/newlib-cygwin/newlib/libc/include/machine/_default_types.h"
 3 4
typedef short int __int_least16_t;

typedef short unsigned int __uint_least16_t;
# 182 
"/home/jlaw/jenkins/workspace/epiphany-elf/newlib-cygwin/newlib/libc/include/machine/_default_types.h"
 3 4
typedef long int __int_least32_t;

typedef long unsigned int __uint_least32_t;
# 200 
"/home/jlaw/jenkins/workspace/epiphany-elf/newlib-cygwin/newlib/libc/include/machine/_default_types.h"
 3 4
typedef long long in

Re: [PING] [contrib] validate_failures.py: fix python 3.12 escape sequence warnings

2024-06-09 Thread Jeff Law




On 6/9/24 5:45 AM, Gabi Falk wrote:
> Hi,
>
> On Sat, Jun 08, 2024 at 03:34:02PM -0600, Jeff Law wrote:
>> On 5/14/24 8:12 AM, Gabi Falk wrote:
>>> Hi,
>>>
>>> This one still needs review:
>>>
>>> https://inbox.sourceware.org/gcc-patches/20240415233833.104460-1-gabif...@gmx.com/
>> I think I just ACK'd an equivalent patch from someone else this week.
>
> Looks like it hasn't been merged yet, and I couldn't find it in the
> mailing list archive.
> Anyway, I hope either one gets merged soon. :)
I'm sure it will.  The variant I ACK'd is from someone with commit
privs, so they'll push it to the tree when convenient for them.


jeff



Re: [PATCH] ifcvt.cc: Prevent excessive if-conversion for conditional moves

2024-06-09 Thread Jeff Law




On 6/9/24 5:28 AM, YunQiang Su wrote:
> YunQiang Su wrote on Sun, Jun 9, 2024 at 18:25:
>>
>>>> gcc/ChangeLog:
>>>>
>>>>    * ifcvt.cc (cond_move_process_if_block):
>>>>    Consider the result of targetm.noce_conversion_profitable_p()
>>>>    when replacing the original sequence with the converted one.
>>> Thanks.  I pushed this to the trunk.
>>
>> Sorry for the delayed report. With this patch the test
>> gcc.target/mips/movcc-3.c fails.
>
> The problem may be caused by the difference between `seq` and `edge e`.
> In `seq`, there may be a compare operation, while
> `default_max_noce_ifcvt_seq_cost` only counts the branch operation.
>
> The rtx_cost may consider the compare operation in `seq` as quite expensive.
Overall it sounds like a target issue to me -- ie, now that we're 
testing for profitability instead of just assuming it's profitable some 
targets need adjustment.  Either in their costing model or in the 
testsuite expectations.


Jeff



Re: [to-be-committed] [RISC-V] Use Zbkb for general 64 bit constants when profitable

2024-06-09 Thread Jeff Law




On 6/7/24 11:49 AM, Andreas Schwab wrote:
> In file included from ../../gcc/rtl.h:3973,
>                  from ../../gcc/config/riscv/riscv.cc:31:
> In function 'rtx_def* init_rtx_fmt_ee(rtx, machine_mode, rtx, rtx)',
>     inlined from 'rtx_def* gen_rtx_fmt_ee_stat(rtx_code, machine_mode, rtx, rtx)' at ./genrtl.h:50:26,
>     inlined from 'void riscv_move_integer(rtx, rtx, long int, machine_mode)' at ../../gcc/config/riscv/riscv.cc:2786:10:
> ./genrtl.h:37:16: error: 'x' may be used uninitialized [-Werror=maybe-uninitialized]
>    37 |   XEXP (rt, 0) = arg0;
> ../../gcc/config/riscv/riscv.cc: In function 'void riscv_move_integer(rtx, rtx, long int, machine_mode)':
> ../../gcc/config/riscv/riscv.cc:2723:7: note: 'x' was declared here
>  2723 |   rtx x;
>       |       ^
> cc1plus: all warnings being treated as errors
Thanks.  I guess the change in control flow in there does hide x's state 
pretty well.  It may not even be provable as initialized without knowing 
how this routine interacts with the costing phase that fills in the codes.


I'll take care of it.

Thanks again,
jeff



[committed] [RISC-V] Fix false-positive uninitialized variable

2024-06-09 Thread Jeff Law
Andreas noted we were getting an uninit warning after the recent 
constant synthesis changes.  Essentially there's no way for the uninit 
analysis code to know the first entry in the CODES array is a UNKNOWN 
which will set X before its first use.


So trivial initialization with NULL_RTX is the obvious fix.
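
The shape of the problem can be shown outside GCC. In this sketch (the types and names are invented for illustration and only loosely mirror `riscv_move_integer`), the first table entry always assigns `x`, but that is a property of the data and not of the control flow, so a compiler analysing the function alone cannot rule out a read of an uninitialized `x`. Giving `x` an initial value, as the patch does with NULL_RTX, removes the ambiguity.

```c
#include <assert.h>
#include <stddef.h>

enum toy_code { TOY_UNKNOWN, TOY_ADD, TOY_SHIFT };

struct toy_op
{
  enum toy_code code;
  unsigned long value;
};

/* Build a value from a table of operations.  By convention the first
   entry is TOY_UNKNOWN and loads the starting value, but nothing in
   this function's control flow guarantees that.  */
unsigned long
toy_build_value (const struct toy_op *codes, size_t num_ops)
{
  unsigned long x = 0;  /* Defensive initialization, mirroring the fix.  */

  for (size_t i = 0; i < num_ops; i++)
    {
      if (codes[i].code == TOY_UNKNOWN)
        x = codes[i].value;
      else if (codes[i].code == TOY_ADD)
        x = x + codes[i].value;
      else
        x = x << codes[i].value;
    }
  return x;
}
```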

Pushed to the trunk.

Jeff

commit 932c6f8dd8859afb13475c2de466bd1a159530da
Author: Jeff Law 
Date:   Sun Jun 9 09:17:55 2024 -0600

[committed] [RISC-V] Fix false-positive uninitialized variable

Andreas noted we were getting an uninit warning after the recent constant
synthesis changes.  Essentially there's no way for the uninit analysis code to
know the first entry in the CODES array is a UNKNOWN which will set X before
its first use.

So trivial initialization with NULL_RTX is the obvious fix.

Pushed to the trunk.

gcc/

* config/riscv/riscv.cc (riscv_move_integer): Initialize "x".

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 95f3636f8e4..c17141d909a 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -2720,7 +2720,7 @@ riscv_move_integer (rtx temp, rtx dest, HOST_WIDE_INT value,
   struct riscv_integer_op codes[RISCV_MAX_INTEGER_OPS];
   machine_mode mode;
   int i, num_ops;
-  rtx x;
+  rtx x = NULL_RTX;
 
   mode = GET_MODE (dest);
   /* We use the original mode for the riscv_build_integer call, because HImode


Ping^3: [PATCH 0/2] Fix two test failures with --enable-default-pie [PR70150]

2024-06-09 Thread Xi Ruoyao
Ping https://gcc.gnu.org/pipermail/gcc-patches/2024-May/650763.html
again, adding more reviewers into CC...

On Mon, 2024-05-06 at 12:45 +0800, Xi Ruoyao wrote:
> In GCC 14.1-rc1, there are two new (comparing to GCC 13) failures if
> the build is configured --enable-default-pie.  Let's fix them.
> 
> Tested on x86_64-linux-gnu.  Ok for trunk and releases/gcc-14?
> 
> Xi Ruoyao (2):
>   i386: testsuite: Add -no-pie for pr113689-1.c [PR70150]
>   i386: testsuite: Adapt fentryname3.c for r14-811 change [PR70150]
> 
>  gcc/testsuite/gcc.target/i386/fentryname3.c | 3 +--
>  gcc/testsuite/gcc.target/i386/pr113689-1.c  | 2 +-
>  2 files changed, 2 insertions(+), 3 deletions(-)

-- 
Xi Ruoyao 
School of Aerospace Science and Technology, Xidian University


Re: [Patch, fortran] PR59104

2024-06-09 Thread Paul Richard Thomas
Hi All,

I have extended the testcase - see below and have
s/dependent_decls_2/dependent_decls_2.f90/ in the ChangeLog.

Cheers

Paul

! { dg-do run }
!
! Fix for PR59104 in which the dependence on the old style function result
! was not taken into account in the ordering of auto array allocation and
! characters with dependent lengths.
!
! Contributed by Tobias Burnus  
!
module m
   implicit none
   integer, parameter :: dp = kind([double precision::])
   contains
  function f(x)
 integer, intent(in) :: x
 real(dp) f(x/2)
 real(dp) g(x/2)
 integer y(size (f)+1) ! This was the original problem
 integer z(size (f) + size (y)) ! Found in development of the fix
 integer w(size (f) + size (y) + x) ! Check dummy is OK
 f = 10.0
 y = 1! Stop -Wall from complaining
 z = 1
 g = 1
 w = 1
 if (size (f) .ne. 1) stop 1
 if (size (g) .ne. 1) stop 2
 if (size (y) .ne. 2) stop 3
 if (size (z) .ne. 3) stop 4
 if (size (w) .ne. 5) stop 5
  end function f
  function e(x) result(f)
 integer, intent(in) :: x
 real(dp) f(x/2)
 real(dp) g(x/2)
 integer y(size (f)+1)
 integer z(size (f) + size (y)) ! As was this.
 integer w(size (f) + size (y) + x)
 f = 10.0
 y = 1
 z = 1
 g = 1
 w = 1
 if (size (f) .ne. 2) stop 6
 if (size (g) .ne. 2) stop 7
 if (size (y) .ne. 3) stop 8
 if (size (z) .ne. 5) stop 9
 if (size (w) .ne. 9) stop 10
  end function
  function d(x)  ! After fixes to arrays, what was needed was known!
integer, intent(in) :: x
character(len = x/2) :: d
character(len = len (d)) :: line
character(len = len (d) + len (line)) :: line2
character(len = len (d) + len (line) + x) :: line3
line = repeat ("a", len (d))
line2 = repeat ("b", x)
line3 = repeat ("c", len (line3))
if (len (line2) .ne. x) stop 11
if (line3 .ne. "cccccccc") stop 12
d = line
  end
end module m

program p
   use m
   implicit none
   real(dp) y

   y = sum (f (2))
   if (int (y) .ne. 10) stop 13
   y = sum (e (4))
   if (int (y) .ne. 20) stop 14
   if (d (4) .ne. "aa") stop 15
end program p



On Sun, 9 Jun 2024 at 07:14, Paul Richard Thomas <
paul.richard.tho...@gmail.com> wrote:

> Hi All,
>
> The attached fixes a problem that, judging by the comments, has been
> looked at periodically over the last ten years but just looked to be too
> fiendishly complicated to fix. This is not in small part because of the
> confusing ordering of dummies in the tlink chain and the unintuitive
> placement of all deferred initializations to the front of the init chain in
> the wrapped block.
>
> The result of the existing ordering is that the initialization code for
> non-dummy variables that depends on the function result occurs before any
> initialization code for the function result itself. The fix ensures that:
> (i) These variables are placed correctly in the tlink chain, respecting
> inter-dependencies; and (ii) The dependent initializations are placed at
> the end of the wrapped block init chain.  The details appear in the
> comments in the patch. It is entirely possible that a less clunky fix
> exists but I failed to find it.
>
> OK for mainline?
>
> Regards
>
> Paul
>
>
>
>


[to-be-committed] [RISC-V] Use bext for extracting a bit into a SImode object

2024-06-09 Thread Jeff Law
bext is defined as (src >> n) & 1.  With that formulation, the "& 1" in
particular means the result is implicitly zero extended.  So we can safely
use it on SI objects for rv64 without the need to do any explicit extension.
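
The extension argument can be checked in plain C. The sketch below is not the RTL pattern; `bext32` is a made-up helper that models `(src >> n) & 1` on a 32-bit value, assuming 0 <= n < 32. Because the result is 0 or 1, its bit 31 is clear, so sign-extending and zero-extending it to 64 bits give the same value.

```c
#include <assert.h>
#include <stdint.h>

/* Extract bit N of SRC as a 32-bit value (0 or 1).  */
uint32_t
bext32 (uint32_t src, unsigned n)
{
  assert (n < 32);
  return (src >> n) & 1u;
}

/* Widen the extracted bit to 64 bits by zero extension.  */
uint64_t
bext32_zext (uint32_t src, unsigned n)
{
  return (uint64_t) bext32 (src, n);
}

/* Widen the extracted bit to 64 bits by sign extension.  */
uint64_t
bext32_sext (uint32_t src, unsigned n)
{
  return (uint64_t) (int64_t) (int32_t) bext32 (src, n);
}
```

Both widening helpers agree for every input in range, which is why no explicit extension is needed after the bit extract.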


This patch adds the obvious pattern and a few testcases.   I think one 
of the tests is derived from coremark, the other two from spec2017.


This has churned through Ventana's CI system repeatedly since it was 
first written.  Assuming pre-commit CI doesn't complain, I'll commit it 
on Raphael's behalf later today or Monday.



Jeff

gcc/
* config/riscv/bitmanip.md (*bextdisi): New pattern.

gcc/testsuite

* gcc.target/riscv/bext-ext.c: New test.

commit e32599b6c863cffa594ab1eca8f4e11562c4bc6a
Author: Raphael Zinsly 
Date:   Fri Mar 22 16:20:21 2024 -0600

Improvement to bext discovery from Raphael.  Extracted from his "Add Zbs 
extended patterns" MR.

diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
index 00560be6161..6559d4d6950 100644
--- a/gcc/config/riscv/bitmanip.md
+++ b/gcc/config/riscv/bitmanip.md
@@ -854,6 +854,18 @@ (define_insn_and_split "*bextseqzdisi"
   "operands[1] = gen_lowpart (word_mode, operands[1]);"
   [(set_attr "type" "bitmanip")])
 
+;; The logical-and against 0x1 implicitly extends the result.   So we can treat
+;; an SImode bext as-if it's DImode without any explicit extension.
+(define_insn "*bextdisi"
+  [(set (match_operand:DI 0 "register_operand" "=r")
+(and:DI (subreg:DI (lshiftrt:SI
+(match_operand:SI 1 "register_operand" "r")
+(match_operand:QI 2 "register_operand" "r")) 0)
+(const_int 1)))]
+  "TARGET_64BIT && TARGET_ZBS"
+  "bext\t%0,%1,%2"
+  [(set_attr "type" "bitmanip")])
+
 ;; When performing `(a & (1UL << bitno)) ? 0 : -1` the combiner
 ;; usually has the `bitno` typed as X-mode (i.e. no further
 ;; zero-extension is performed around the bitno).
diff --git a/gcc/testsuite/gcc.target/riscv/bext-ext.c b/gcc/testsuite/gcc.target/riscv/bext-ext.c
new file mode 100644
index 000..eeef07d7013
--- /dev/null
+++ b/gcc/testsuite/gcc.target/riscv/bext-ext.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-options "-march=rv64gc_zbs -mabi=lp64" } */
+/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-O1" } } */
+typedef unsigned long uint64_t;
+typedef unsigned int uint32_t;
+
+uint64_t bext1 (int dst, const uint32_t i)
+{
+  uint64_t checks = 1U;
+  checks &= dst >> i;
+  return checks;
+}
+
+int bext2 (int dst, int i_denom)
+{
+  dst = 1 & (dst >> i_denom);
+  return dst;
+}
+
+const uint32_t bext3 (uint32_t bit_count, uint32_t symbol)
+{
+  return (symbol >> bit_count) & 1;
+}
+
+/* { dg-final { scan-assembler-times "bext\t" 3 } } */
+/* { dg-final { scan-assembler-not "sllw\t"} } */
+/* { dg-final { scan-assembler-not "srlw\t"} } */


Re: Reverted recent patches to resource.cc

2024-06-09 Thread Hans-Peter Nilsson
> Date: Sat, 8 Jun 2024 11:10:21 -0600
> From: Jeff Law 

> >>>resource.cc: Replace calls to find_basic_block with cfgrtl
> >>>  BLOCK_FOR_INSN
> >>>resource.cc (mark_target_live_regs): Remove check for bb not found
> >>>resource.cc: Remove redundant conditionals
> >>
> >> I had to revert those last three patches due to PR
> >> bootstrap/115284.  I hope to revisit once I have a means to
> >> reproduce (and fix) the underlying bug.  It doesn't have to
> >> be a bug with those changes per-se: IMHO the "improved"
> >> lifetimes could just as well have uncovered a bug elsewhere
> >> in reorg.  It's still on me to resolve that situation; done.
> >> I'm just glad the cause was the incidental improvements and
> >> not the original bug I wanted to fix.
> >>
> >> There appears to be only a single supported SPARC machine in
> >> cfarm: cfarm216, and I currently can't reach it due to what
> >> appears to be issues at my end.  I guess I'll either fix
> >> that or breathe life into sparc-elf+sim.
> > Or if you've got a reasonable server to use, QEMU might save you :-)
> > 
> 
> Even better option.  The sh4/sh4eb-linux-gnu ports with 
> execute/ieee/fp-cmp-5.c test.  That started execution failing at -O2 
> with the first patch in the series and there are very clear assembly 
> differences before/after your change.  Meaning you can probably look at 
> them with just a cross compile and compare the before/after.

Interesting, thanks for this.  (I'd expect assembly
differences, but only for slightly improved performance.)

Not sure when I'll revisit the underlying problem at
bootstrap/115284 though, perhaps not this week.  (For
context, for the gallery, for the record: the bootstrap
problem there is solved.)

brgds, H-P


Re: [Patch, fortran] PR59104

2024-06-09 Thread Harald Anlauf

Hi Paul,

your approach sounds entirely reasonable.

But as the following addition to the testcase shows, there seem to
be loopholes left.

When I add the following to function f:

 integer :: l1(size(y))
 integer :: l2(size(z))
 print *, size (l1), size (l2), size (z)

I get:

   0   0   3

Expected:

   2   3   3

Can you please check?

Thanks,
Harald


On 09.06.24 at 17:57, Paul Richard Thomas wrote:

Hi All,

I have extended the testcase - see below and have
s/dependent_decls_2/dependent_decls_2.f90/ in the ChangeLog.

Cheers

Paul

! { dg-do run }
!
! Fix for PR59104 in which the dependence on the old style function result
! was not taken into account in the ordering of auto array allocation and
! characters with dependent lengths.
!
! Contributed by Tobias Burnus  
!
module m
implicit none
integer, parameter :: dp = kind([double precision::])
contains
   function f(x)
  integer, intent(in) :: x
  real(dp) f(x/2)
  real(dp) g(x/2)
  integer y(size (f)+1) ! This was the original problem
  integer z(size (f) + size (y)) ! Found in development of the fix
  integer w(size (f) + size (y) + x) ! Check dummy is OK
  f = 10.0
  y = 1! Stop -Wall from complaining
  z = 1
  g = 1
  w = 1
  if (size (f) .ne. 1) stop 1
  if (size (g) .ne. 1) stop 2
  if (size (y) .ne. 2) stop 3
  if (size (z) .ne. 3) stop 4
  if (size (w) .ne. 5) stop 5
   end function f
   function e(x) result(f)
  integer, intent(in) :: x
  real(dp) f(x/2)
  real(dp) g(x/2)
  integer y(size (f)+1)
  integer z(size (f) + size (y)) ! As was this.
  integer w(size (f) + size (y) + x)
  f = 10.0
  y = 1
  z = 1
  g = 1
  w = 1
  if (size (f) .ne. 2) stop 6
  if (size (g) .ne. 2) stop 7
  if (size (y) .ne. 3) stop 8
  if (size (z) .ne. 5) stop 9
  if (size (w) .ne. 9) stop 10
   end function
   function d(x)  ! After fixes to arrays, what was needed was known!
 integer, intent(in) :: x
 character(len = x/2) :: d
 character(len = len (d)) :: line
 character(len = len (d) + len (line)) :: line2
 character(len = len (d) + len (line) + x) :: line3
 line = repeat ("a", len (d))
 line2 = repeat ("b", x)
 line3 = repeat ("c", len (line3))
 if (len (line2) .ne. x) stop 11
 if (line3 .ne. "") stop 12
 d = line
   end
end module m

program p
use m
implicit none
real(dp) y

y = sum (f (2))
if (int (y) .ne. 10) stop 13
y = sum (e (4))
if (int (y) .ne. 20) stop 14
if (d (4) .ne. "aa") stop 15
end program p



On Sun, 9 Jun 2024 at 07:14, Paul Richard Thomas <
paul.richard.tho...@gmail.com> wrote:


Hi All,

The attached fixes a problem that, judging by the comments, has been
looked at periodically over the last ten years but just looked to be too
fiendishly complicated to fix. This is in no small part because of the
confusing ordering of dummies in the tlink chain and the unintuitive
placement of all deferred initializations to the front of the init chain in
the wrapped block.

The result of the existing ordering is that the initialization code for
non-dummy variables that depends on the function result occurs before any
initialization code for the function result itself. The fix ensures that:
(i) These variables are placed correctly in the tlink chain, respecting
inter-dependencies; and (ii) The dependent initializations are placed at
the end of the wrapped block init chain.  The details appear in the
comments in the patch. It is entirely possible that a less clunky fix
exists but I failed to find it.

OK for mainline?

Regards

Paul











Re: [PATCH] FreeBSD: Stop linking _p libs for -pg as of FreeBSD 14

2024-06-09 Thread Gerald Pfeifer
On Fri, 13 Aug 2021, Andreas Tobler via Gcc-patches wrote:
> I would like to commit the attached patch to trunk and after a settling 
> period also to all open branches.
> Is this ok?

Our MAINTAINERS file has the following entry:

  freebsd   Andreas Tobler   

So ... yes. :-)
 
Seeing this did not make it into our tree, I applied the patchset,
bootstrapped on x86_64-unknown-freebsd13.2 and pushed with a minor 
simplification to the ChangeLog. Patch as pushed below...

Gerald


commit 48abb540701447b0cd9df7542720ab65a34fc1b1
Author: Andreas Tobler 
Date:   Sun Jun 9 23:18:04 2024 +0200

FreeBSD: Stop linking _p libs for -pg as of FreeBSD 14

As of FreeBSD version 14, FreeBSD no longer provides profiled system
libraries like libc_p and libpthread_p. Stop linking against them if
the FreeBSD major version is 14 or more.

gcc:
* config/freebsd-spec.h: Change fbsd-lib-spec for FreeBSD > 13,
do not link against profiled system libraries if -pg is invoked.
Add a define to note about this change.
* config/aarch64/aarch64-freebsd.h: Use the note to inform if
-pg is invoked on FreeBSD > 13.
* config/arm/freebsd.h: Likewise.
* config/i386/freebsd.h: Likewise.
* config/i386/freebsd64.h: Likewise.
* config/riscv/freebsd.h: Likewise.
* config/rs6000/freebsd64.h: Likewise.
* config/rs6000/sysv4.h: Likewise.

diff --git a/gcc/config/aarch64/aarch64-freebsd.h b/gcc/config/aarch64/aarch64-freebsd.h
index 53cc17a1caf..e26d69ce46c 100644
--- a/gcc/config/aarch64/aarch64-freebsd.h
+++ b/gcc/config/aarch64/aarch64-freebsd.h
@@ -35,6 +35,7 @@
 #undef  FBSD_TARGET_LINK_SPEC
 #define FBSD_TARGET_LINK_SPEC " \
 %{p:%nconsider using `-pg' instead of `-p' with gprof (1)}  \
+" FBSD_LINK_PG_NOTE "  \
 %{v:-V} \
 %{assert*} %{R*} %{rpath*} %{defsym*}   \
 %{shared:-Bshareable %{h*} %{soname*}}  \
diff --git a/gcc/config/arm/freebsd.h b/gcc/config/arm/freebsd.h
index 9d0a5a842ab..ee4860ae637 100644
--- a/gcc/config/arm/freebsd.h
+++ b/gcc/config/arm/freebsd.h
@@ -47,6 +47,7 @@
 #undef LINK_SPEC
 #define LINK_SPEC "\
   %{p:%nconsider using `-pg' instead of `-p' with gprof (1)}   \
+  " FBSD_LINK_PG_NOTE "                                          \
   %{v:-V}  \
   %{assert*} %{R*} %{rpath*} %{defsym*}                         \
   %{shared:-Bshareable %{h*} %{soname*}}   \
diff --git a/gcc/config/freebsd-spec.h b/gcc/config/freebsd-spec.h
index a6d1ad1280f..f43056bf2cf 100644
--- a/gcc/config/freebsd-spec.h
+++ b/gcc/config/freebsd-spec.h
@@ -92,19 +92,29 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
libc, depending on whether we're doing profiling or need threads support.
(similar to the default, except no -lg, and no -p).  */
 
+#if FBSD_MAJOR < 14
+#define FBSD_LINK_PG_NOTHREADS "%{!pg: -lc}  %{pg: -lc_p}"
+#define FBSD_LINK_PG_THREADS   "%{!pg: %{pthread:-lpthread} -lc} " \
+   "%{pg: %{pthread:-lpthread} -lc_p}"
+#define FBSD_LINK_PG_NOTE ""
+#else
+#define FBSD_LINK_PG_NOTHREADS "%{-lc} "
+#define FBSD_LINK_PG_THREADS   "%{pthread:-lpthread} -lc "
+#define FBSD_LINK_PG_NOTE "%{pg:%nFreeBSD no longer provides profiled "\
+ "system libraries}"
+#endif
+
 #ifdef FBSD_NO_THREADS
 #define FBSD_LIB_SPEC "                                         \
   %{pthread: %eThe -pthread option is only supported on FreeBSD when gcc \
 is built with the --enable-threads configure-time option.} \
   %{!shared:   \
-%{!pg: -lc}                                                    \
-%{pg:  -lc_p}  \
+" FBSD_LINK_PG_NOTHREADS " \
   }"
 #else
 #define FBSD_LIB_SPEC "                                         \
   %{!shared:   \
-%{!pg: %{pthread:-lpthread} -lc}   \
-%{pg:  %{pthread:-lpthread_p} -lc_p}   \
+" FBSD_LINK_PG_THREADS "   \
   }\
   %{shared:\
 %{pthread:-lpthread} -lc   \
diff --git a/gcc/config/i386/freebsd.h b/gcc/config/i386/freebsd.h
index 3c57dc7cfae..583c752bb76 100644
--- a/gcc/config/i386/fr

Re: [PATCH] FreeBSD: Stop linking _p libs for -pg as of FreeBSD 14

2024-06-09 Thread Andrew Pinski
On Sun, Jun 9, 2024 at 2:22 PM Gerald Pfeifer  wrote:
>
> On Fri, 13 Aug 2021, Andreas Tobler via Gcc-patches wrote:
> > I would like to commit the attached patch to trunk and after a settling
> > period also to all open branches.
> > Is this ok?
>
> Our MAINTAINERS file has the following entry:
>
>   freebsd   Andreas Tobler   
>
> So ... yes. :-)
>
> Seeing this did not make it into our tree, I applied the patchset,
> bootstrapped on x86_64-unknown-freebsd13.2 and pushed with a minor
> simplification to the ChangeLog. Patch as pushed below...

Note this was https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113218
which I just closed as fixed. I am not sure if someone was going to
backport it though.

Thanks,
Andrew

>
> Gerald
>
>
> commit 48abb540701447b0cd9df7542720ab65a34fc1b1
> Author: Andreas Tobler 
> Date:   Sun Jun 9 23:18:04 2024 +0200
>
> FreeBSD: Stop linking _p libs for -pg as of FreeBSD 14
>
> As of FreeBSD version 14, FreeBSD no longer provides profiled system
> libraries like libc_p and libpthread_p. Stop linking against them if
> the FreeBSD major version is 14 or more.
>
> gcc:
> * config/freebsd-spec.h: Change fbsd-lib-spec for FreeBSD > 13,
> do not link against profiled system libraries if -pg is invoked.
> Add a define to note about this change.
> * config/aarch64/aarch64-freebsd.h: Use the note to inform if
> -pg is invoked on FreeBSD > 13.
> * config/arm/freebsd.h: Likewise.
> * config/i386/freebsd.h: Likewise.
> * config/i386/freebsd64.h: Likewise.
> * config/riscv/freebsd.h: Likewise.
> * config/rs6000/freebsd64.h: Likewise.
> * config/rs6000/sysv4.h: Likewise.
>
> diff --git a/gcc/config/aarch64/aarch64-freebsd.h b/gcc/config/aarch64/aarch64-freebsd.h
> index 53cc17a1caf..e26d69ce46c 100644
> --- a/gcc/config/aarch64/aarch64-freebsd.h
> +++ b/gcc/config/aarch64/aarch64-freebsd.h
> @@ -35,6 +35,7 @@
>  #undef  FBSD_TARGET_LINK_SPEC
>  #define FBSD_TARGET_LINK_SPEC " \
>  %{p:%nconsider using `-pg' instead of `-p' with gprof (1)}  \
> +" FBSD_LINK_PG_NOTE "  \
>  %{v:-V} \
>  %{assert*} %{R*} %{rpath*} %{defsym*}   \
>  %{shared:-Bshareable %{h*} %{soname*}}  \
> diff --git a/gcc/config/arm/freebsd.h b/gcc/config/arm/freebsd.h
> index 9d0a5a842ab..ee4860ae637 100644
> --- a/gcc/config/arm/freebsd.h
> +++ b/gcc/config/arm/freebsd.h
> @@ -47,6 +47,7 @@
>  #undef LINK_SPEC
>  #define LINK_SPEC "\
>%{p:%nconsider using `-pg' instead of `-p' with gprof (1)}   \
> +  " FBSD_LINK_PG_NOTE "                                        \
>%{v:-V}  \
>%{assert*} %{R*} %{rpath*} %{defsym*}                          \
>%{shared:-Bshareable %{h*} %{soname*}}   \
> diff --git a/gcc/config/freebsd-spec.h b/gcc/config/freebsd-spec.h
> index a6d1ad1280f..f43056bf2cf 100644
> --- a/gcc/config/freebsd-spec.h
> +++ b/gcc/config/freebsd-spec.h
> @@ -92,19 +92,29 @@ see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
> libc, depending on whether we're doing profiling or need threads support.
> (similar to the default, except no -lg, and no -p).  */
>
> +#if FBSD_MAJOR < 14
> +#define FBSD_LINK_PG_NOTHREADS "%{!pg: -lc}  %{pg: -lc_p}"
> +#define FBSD_LINK_PG_THREADS   "%{!pg: %{pthread:-lpthread} -lc} " \
> +   "%{pg: %{pthread:-lpthread} -lc_p}"
> +#define FBSD_LINK_PG_NOTE ""
> +#else
> +#define FBSD_LINK_PG_NOTHREADS "%{-lc} "
> +#define FBSD_LINK_PG_THREADS   "%{pthread:-lpthread} -lc "
> +#define FBSD_LINK_PG_NOTE "%{pg:%nFreeBSD no longer provides profiled "\
> + "system libraries}"
> +#endif
> +
>  #ifdef FBSD_NO_THREADS
>  #define FBSD_LIB_SPEC "                                       \
>%{pthread: %eThe -pthread option is only supported on FreeBSD when gcc \
>  is built with the --enable-threads configure-time option.} \
>%{!shared:   \
> -%{!pg: -lc}                                                  \
> -%{pg:  -lc_p}  \
> +" FBSD_LINK_PG_NOTHREADS " \
>}"
>  #else
>  #define FBSD_LIB_SPEC "                                       \
>%{!shared:   \
> -%{!pg: %{pthread:-lpthread} -lc}   \
> -%{pg:  %{pthread:-lpthread_p} -lc_p}   \
> +" 

Re: [PATCH] fixincludes: bypass the math_exception fix on __cplusplus

2024-06-09 Thread Bruce Korb
:-D Looks good to me. EXCEPT I think the test sample file would need a 
change, too. I didn't see that.


On 6/7/24 02:37, FX Coudert wrote:

The fixincludes fix “math_exception” is being applied overly broadly, including 
many targets which don’t need it, like darwin (and probably all non-glibc 
targets). I’m not sure if it is still needed on any target, but because I can’t 
be absolutely positive about that, I don’t want to remove it. But it dates from 
before 1998.

In subsequent times (2000) it was bypassed on glibc headers, as well as Solaris 
10. It was still needed on Solaris 8 and 9, which are (AFAICT) unsupported 
nowadays. The fix was originally bypassed on __cplusplus, which is the correct 
thing to do, but that bypass was neutralized to cater to a bug on Solaris 8 and 
9 headers. Now that those are gone… let’s revert to the previous bypass.


Bootstrapped and regtested on x86_64-apple-darwin23, where it no longer “fixes” 
the header unnecessarily.
OK to push?

FX





Re: [PATCH] fixincludes: bypass the math_exception fix on __cplusplus

2024-06-09 Thread FX Coudert
> :-D Looks good to me. EXCEPT I think the test sample file would need a 
> change, too. I didn't see that.

Running “make check” produces the additional diff, which I’ll add to the patch 
before I push. Does it look okay?

FX




diff --git a/fixincludes/tests/base/math.h b/fixincludes/tests/base/math.h
index 7b92f29a409..3c378c5df95 100644
--- a/fixincludes/tests/base/math.h
+++ b/fixincludes/tests/base/math.h
@@ -7,12 +7,6 @@
 This had to be done to correct non-standard usages in the
 original, manufacturer supplied header file.  */
 
-#ifndef FIXINC_WRAP_MATH_H_MATH_EXCEPTION
-#define FIXINC_WRAP_MATH_H_MATH_EXCEPTION 1
-
-#ifdef __cplusplus
-#define exception __math_exception
-#endif
 #if defined( BROKEN_CABS_CHECK )
@@ -146,8 +140,3 @@ int foo;
 #endif /* _C99 */
 
 #endif  /* VXWORKS_MATH_H_FP_C99_CHECK */
-#ifdef __cplusplus
-#undef exception
-#endif
-
-#endif  /* FIXINC_WRAP_MATH_H_MATH_EXCEPTION */



Re: [PATCH] ifcvt.cc: Prevent excessive if-conversion for conditional moves

2024-06-09 Thread YunQiang Su
> > The rtx_cost may consider the compare operation in `seq` as quite expensive.
> Overall it sounds like a target issue to me -- ie, now that we're
> testing for profitability instead of just assuming it's profitable some
> targets need adjustment.  Either in their costing model or in the
> testsuite expectations.
>

Yes, you are right.  I found the real problem, in mips-cpus.def:

MIPS_CPU ("mips32", PROCESSOR_4KC, MIPS_ISA_MIPS32,
PTF_AVOID_BRANCHLIKELY_ALWAYS)
MIPS_CPU ("mips64", PROCESSOR_5KC, MIPS_ISA_MIPS64,
PTF_AVOID_BRANCHLIKELY_ALWAYS)
MIPS_CPU ("mips64r2", PROCESSOR_5KC, MIPS_ISA_MIPS64R2,
PTF_AVOID_BRANCHLIKELY_ALWAYS)
MIPS_CPU ("mips64r3", PROCESSOR_5KC, MIPS_ISA_MIPS64R3,
PTF_AVOID_BRANCHLIKELY_ALWAYS)
MIPS_CPU ("mips64r5", PROCESSOR_5KC, MIPS_ISA_MIPS64R5,
PTF_AVOID_BRANCHLIKELY_ALWAYS)

Here PROCESSOR_4KC and PROCESSOR_5KC are both FPU-less.

> Jeff
>


-- 
YunQiang Su


[PING] Re: [PATCH v7 1/9] Improve must tail in RTL backend

2024-06-09 Thread Andi Kleen


Need reviewers for the tree and middle-end parts, as well as the C frontend.

Thanks!

-Andi


[PATCH] aarch64: Improve popcount for bytes [PR113042]

2024-06-09 Thread Andrew Pinski
For popcount for bytes, we don't need the reduction addition
after the vector cnt instruction as we are only counting one
byte's popcount.
This implements a new define_expand to handle that.
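
As a rough illustration of why the reduction can be dropped, here is a
portable C model written for this note (it is not code from the patch, and
cnt_8b and popcount_byte are made-up names).  It assumes the vector cnt
instruction produces an independent bit count per byte lane, so lane 0
already holds the byte's popcount whatever the other lanes contain:

```c
#include <assert.h>
#include <stdint.h>

/* Model of a per-byte population count over eight byte lanes packed
   into a 64-bit value (lane 0 is the least significant byte).  */
static inline uint64_t
cnt_8b (uint64_t v)
{
  uint64_t out = 0;
  for (int lane = 0; lane < 8; lane++)
    {
      uint8_t byte = (uint8_t) (v >> (8 * lane));
      uint8_t bits = 0;
      while (byte)
        {
          bits = (uint8_t) (bits + (byte & 1));
          byte = (uint8_t) (byte >> 1);
        }
      out |= (uint64_t) bits << (8 * lane);
    }
  return out;
}

/* Byte popcount: take lane 0 of the per-lane result.  UPPER stands for
   whatever happens to sit in the other seven lanes; it does not affect
   the answer, so no cross-lane addition is required.  */
static inline uint8_t
popcount_byte (uint8_t x, uint64_t upper)
{
  uint64_t v = (upper << 8) | x;
  uint8_t count = (uint8_t) cnt_8b (v);
  assert (count <= 8);
  return count;
}
```

For a wider type every lane contributes, which is where the addv reduction
is still needed.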

Bootstrapped and tested on aarch64-linux-gnu with no regressions.

PR target/113042

gcc/ChangeLog:

* config/aarch64/aarch64.md (popcountqi2): New pattern.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/popcnt5.c: New test.

Signed-off-by: Andrew Pinski 
---
 gcc/config/aarch64/aarch64.md  | 26 ++
 gcc/testsuite/gcc.target/aarch64/popcnt5.c | 19 
 2 files changed, 45 insertions(+)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/popcnt5.c

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 389a1906e23..ebaf7ec9970 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -5358,6 +5358,32 @@ (define_expand "popcount<mode>2"
 }
 })
 
+/* Popcount for byte can remove the reduction part after the popcount.
+   For optimization reasons, enabling this for CSSC. */
+(define_expand "popcountqi2"
+  [(set (match_operand:QI 0 "register_operand" "=w")
+   (popcount:QI (match_operand:QI 1 "register_operand" "w")))]
+  "TARGET_CSSC || TARGET_SIMD"
+{
+  rtx in = operands[1];
+  rtx out = operands[0];
+  if (TARGET_CSSC)
+{
+  rtx tmp = gen_reg_rtx (SImode);
+  rtx out1 = gen_reg_rtx (SImode);
+  emit_insn (gen_zero_extendqisi2 (tmp, in));
+  emit_insn (gen_popcountsi2 (out1, tmp));
+  emit_move_insn (out, gen_lowpart (QImode, out1));
+  DONE;
+}
+  rtx v = gen_reg_rtx (V8QImode);
+  rtx v1 = gen_reg_rtx (V8QImode);
+  emit_move_insn (v, gen_lowpart (V8QImode, in));
+  emit_insn (gen_popcountv8qi2 (v1, v));
+  emit_move_insn (out, gen_lowpart (QImode, v1));
+  DONE;
+})
+
 (define_insn "clrsb<mode>2"
   [(set (match_operand:GPI 0 "register_operand" "=r")
 (clrsb:GPI (match_operand:GPI 1 "register_operand" "r")))]
diff --git a/gcc/testsuite/gcc.target/aarch64/popcnt5.c b/gcc/testsuite/gcc.target/aarch64/popcnt5.c
new file mode 100644
index 00000000000..406369d9b29
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/popcnt5.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" } } */
+/* PR target/113042 */
+
+#pragma GCC target "+nocssc"
+
+/*
+** h8:
+** ldr b[0-9]+, \[x0\]
+** cnt v[0-9]+.8b, v[0-9]+.8b
+** smov w0, v[0-9]+.b\[0\]
+** ret
+*/
+/* We should not need the addv here since we only need a byte popcount. */
+
+unsigned h8 (const unsigned char *a) {
+ return __builtin_popcountg (a[0]);
+}
-- 
2.42.0



[_Hashtable] Optimize destructor

2024-06-09 Thread François Dumont

Hi

libstdc++: [_Hashtable] Optimize destructor

The _Hashtable destructor does not need to call the clear() method, which in
addition to destroying all nodes also resets all buckets to nullptr.
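
The idea can be sketched in C with a simplified table (an assumption-laden
model, not the libstdc++ implementation: struct table, table_clear and
table_destroy are hypothetical names).  clear has to leave the table
reusable, so it zeroes the bucket array; a destructor frees that array
right away, so zeroing it first is wasted work:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

struct node { struct node *next; int key; };

struct table
{
  struct node **buckets;  /* Bucket array pointing into the node list.  */
  size_t bucket_count;
  struct node *first;     /* Head of the single list holding all nodes.  */
  size_t size;
};

/* Free every node reachable from N.  */
static inline void
deallocate_nodes (struct node *n)
{
  while (n)
    {
      struct node *next = n->next;
      free (n);
      n = next;
    }
}

/* clear: the table stays usable afterwards, so the buckets are reset.  */
static inline void
table_clear (struct table *t)
{
  assert (t != NULL);
  deallocate_nodes (t->first);
  memset (t->buckets, 0, t->bucket_count * sizeof *t->buckets);
  t->first = NULL;
  t->size = 0;
}

/* Destructor: nothing reads the buckets again, so skip the memset and
   release nodes and bucket storage directly.  */
static inline void
table_destroy (struct table *t)
{
  assert (t != NULL);
  deallocate_nodes (t->first);
  free (t->buckets);
}
```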

libstdc++-v3/ChangeLog:

    * include/bits/hashtable.h (~_Hashtable()): Replace clear call with
    a _M_deallocate_nodes call.

Tested under Linux x64, ok to commit ?

François
diff --git a/libstdc++-v3/include/bits/hashtable.h b/libstdc++-v3/include/bits/hashtable.h
index e8e51714d72..45b232111da 100644
--- a/libstdc++-v3/include/bits/hashtable.h
+++ b/libstdc++-v3/include/bits/hashtable.h
@@ -1666,7 +1666,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
"Cache the hash code or qualify your functors involved"
" in hash code and bucket index computation with noexcept");
 
-  clear();
+  this->_M_deallocate_nodes(_M_begin());
   _M_deallocate_buckets();
 }
 


Re: [PATCH] Hard register asm constraint

2024-06-09 Thread Stefan Schulze Frielinghaus
Ping.

On Fri, May 24, 2024 at 11:13:12AM +0200, Stefan Schulze Frielinghaus wrote:
> This implements hard register constraints for inline asm.  A hard register
> constraint is of the form {regname} where regname is any valid register.  This
> basically renders register asm superfluous.  For example, the snippet
> 
> int test (int x, int y)
> {
>   register int r4 asm ("r4") = x;
>   register int r5 asm ("r5") = y;
>   unsigned int copy = y;
>   asm ("foo %0,%1,%2" : "+d" (r4) : "d" (r5), "d" (copy));
>   return r4;
> }
> 
> could be rewritten into
> 
> int test (int x, int y)
> {
>   asm ("foo %0,%1,%2" : "+{r4}" (x) : "{r5}" (y), "d" (y));
>   return x;
> }
> 
> As a side-effect this also solves the problem of call-clobbered registers.
> That being said, I was wondering whether we could utilize this feature in order
> to get rid of local register asm automatically?  For example, converting
> 
> // Result will be in r2 on s390
> extern int bar (void);
> 
> void test (void)
> {
>   register int x asm ("r2") = 42;
>   bar ();
>   asm ("foo %0\n" :: "r" (x));
> }
> 
> into
> 
> void test (void)
> {
>   int x = 42;
>   bar ();
>   asm ("foo %0\n" :: "{r2}" (x));
> }
> 
> in order to get rid of the limitation of call-clobbered registers which may
> lead to subtle bugs---especially if you think of non-obvious calls e.g.
> introduced by sanitizer/tracer/whatever.  Since such a transformation has the
> potential to break existing code do you see any edge cases where this might be
> problematic or even show stoppers?  Currently, even
> 
> int test (void)
> {
>   register int x asm ("r2") = 42;
>   register int y asm ("r2") = 24;
>   asm ("foo %0,%1\n" :: "r" (x), "r" (y));
> }
> 
> is allowed which seems error prone to me.  Thus, if 100% backwards
> compatibility would be required, then automatically converting every register
> asm to the new mechanism isn't viable.  Still quite a lot could be transformed.
> Any thoughts?
> 
> Currently I allow multiple alternatives as demonstrated by
> gcc/testsuite/gcc.target/s390/asm-hard-reg-2.c.  However, since a hard register
> constraint is pretty specific I could also think of erroring out in case of
> alternatives.  Are there any real use cases out there for multiple
> alternatives where one would like to use hard register constraints?
> 
> With the current implementation we have a "user visible change" in the sense
> that for
> 
> void test (void)
> {
>   register int x asm ("r2") = 42;
>   register int y asm ("r2") = 24;
>   asm ("foo   %0,%1\n" : "=r" (x), "=r" (y));
> }
> 
> we do not get the error
> 
>   "invalid hard register usage between output operands"
> 
> anymore but rather
> 
>   "multiple outputs to hard register: %r2"
> 
> This is due to the error handling in gimplify_asm_expr ().  Speaking of errors,
> I also error out earlier than before which means that e.g. in pr87600-2.c only
> the first error is reported and processing is stopped afterwards which means
> the subsequent tests fail.
> 
> I've been skimming through all targets and it looks to me as if none is using
> curly brackets for their constraints.  Of course, I may have missed something.
> 
> Cheers,
> Stefan
> 
> PS: Current state for Clang: https://reviews.llvm.org/D105142
> 
> ---
>  gcc/cfgexpand.cc  |  42 ---
>  gcc/genpreds.cc   |   4 +-
>  gcc/gimplify.cc   | 115 +-
>  gcc/lra-constraints.cc|  17 +++
>  gcc/recog.cc  |  14 ++-
>  gcc/stmt.cc   | 102 +++-
>  gcc/stmt.h|  10 +-
>  .../gcc.target/s390/asm-hard-reg-1.c  | 103 
>  .../gcc.target/s390/asm-hard-reg-2.c  |  29 +
>  .../gcc.target/s390/asm-hard-reg-3.c  |  24 
>  gcc/testsuite/lib/scanasm.exp |   4 +
>  11 files changed, 407 insertions(+), 57 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/s390/asm-hard-reg-1.c
>  create mode 100644 gcc/testsuite/gcc.target/s390/asm-hard-reg-2.c
>  create mode 100644 gcc/testsuite/gcc.target/s390/asm-hard-reg-3.c
> 
> diff --git a/gcc/cfgexpand.cc b/gcc/cfgexpand.cc
> index 557cb28733b..47f71a2e803 100644
> --- a/gcc/cfgexpand.cc
> +++ b/gcc/cfgexpand.cc
> @@ -2955,44 +2955,6 @@ expand_asm_loc (tree string, int vol, location_t locus)
>emit_insn (body);
>  }
>  
> -/* Return the number of times character C occurs in string S.  */
> -static int
> -n_occurrences (int c, const char *s)
> -{
> -  int n = 0;
> -  while (*s)
> -n += (*s++ == c);
> -  return n;
> -}
> -
> -/* A subroutine of expand_asm_operands.  Check that all operands have
> -   the same number of alternatives.  Return true if so.  */
> -
> -static bool
> -check_operand_nalternatives (const vec &constraints)
> -{
> -  unsigned len = constraints.length();
> -  if (len > 0)
> -{
> -  int nalternatives = n

Re: [Patch, fortran] PR59104

2024-06-09 Thread Paul Richard Thomas
Hi Harald,

Thanks for the loophole detection! It is obvious now I see it, as is the
fix. I'll get on to it as soon as I find some time.

Cheers

Paul


On Sun, 9 Jun 2024 at 21:35, Harald Anlauf  wrote:

> Hi Paul,
>
> your approach sounds entirely reasonable.
>
> But as the following addition to the testcase shows, there seem to
> be loopholes left.
>
> When I add the following to function f:
>
>   integer :: l1(size(y))
>   integer :: l2(size(z))
>   print *, size (l1), size (l2), size (z)
>
> I get:
>
> 0   0   3
>
> Expected:
>
> 2   3   3
>
> Can you please check?
>
> Thanks,
> Harald
>
>
> Am 09.06.24 um 17:57 schrieb Paul Richard Thomas:
> > Hi All,
> >
> > I have extended the testcase - see below and have
> > s/dependent_decls_2/dependent_decls_2.f90/ in the ChangeLog.
> >
> > Cheers
> >
> > Paul
> >
> > ! { dg-do run }
> > !
> > ! Fix for PR59104 in which the dependence on the old style function result
> > ! was not taken into account in the ordering of auto array allocation and
> > ! characters with dependent lengths.
> > !
> > ! Contributed by Tobias Burnus  
> > !
> > module m
> > implicit none
> > integer, parameter :: dp = kind([double precision::])
> > contains
> >function f(x)
> >   integer, intent(in) :: x
> >   real(dp) f(x/2)
> >   real(dp) g(x/2)
> >   integer y(size (f)+1) ! This was the original problem
> >   integer z(size (f) + size (y)) ! Found in development of the fix
> >   integer w(size (f) + size (y) + x) ! Check dummy is OK
> >   f = 10.0
> >   y = 1! Stop -Wall from complaining
> >   z = 1
> >   g = 1
> >   w = 1
> >   if (size (f) .ne. 1) stop 1
> >   if (size (g) .ne. 1) stop 2
> >   if (size (y) .ne. 2) stop 3
> >   if (size (z) .ne. 3) stop 4
> >   if (size (w) .ne. 5) stop 5
> >end function f
> >function e(x) result(f)
> >   integer, intent(in) :: x
> >   real(dp) f(x/2)
> >   real(dp) g(x/2)
> >   integer y(size (f)+1)
> >   integer z(size (f) + size (y)) ! As was this.
> >   integer w(size (f) + size (y) + x)
> >   f = 10.0
> >   y = 1
> >   z = 1
> >   g = 1
> >   w = 1
> >   if (size (f) .ne. 2) stop 6
> >   if (size (g) .ne. 2) stop 7
> >   if (size (y) .ne. 3) stop 8
> >   if (size (z) .ne. 5) stop 9
> >   if (size (w) .ne. 9) stop 10
> >end function
> >function d(x)  ! After fixes to arrays, what was needed was known!
> >  integer, intent(in) :: x
> >  character(len = x/2) :: d
> >  character(len = len (d)) :: line
> >  character(len = len (d) + len (line)) :: line2
> >  character(len = len (d) + len (line) + x) :: line3
> >  line = repeat ("a", len (d))
> >  line2 = repeat ("b", x)
> >  line3 = repeat ("c", len (line3))
> >  if (len (line2) .ne. x) stop 11
> >  if (line3 .ne. "") stop 12
> >  d = line
> >end
> > end module m
> >
> > program p
> > use m
> > implicit none
> > real(dp) y
> >
> > y = sum (f (2))
> > if (int (y) .ne. 10) stop 13
> > y = sum (e (4))
> > if (int (y) .ne. 20) stop 14
> > if (d (4) .ne. "aa") stop 15
> > end program p
> >
> >
> >
> > On Sun, 9 Jun 2024 at 07:14, Paul Richard Thomas <
> > paul.richard.tho...@gmail.com> wrote:
> >
> >> Hi All,
> >>
> >> The attached fixes a problem that, judging by the comments, has been
> >> looked at periodically over the last ten years but just looked to be too
> >> fiendishly complicated to fix. This is in no small part because of the
> >> confusing ordering of dummies in the tlink chain and the unintuitive
> >> placement of all deferred initializations to the front of the init chain in
> >> the wrapped block.
> >>
> >> The result of the existing ordering is that the initialization code for
> >> non-dummy variables that depends on the function result occurs before any
> >> initialization code for the function result itself. The fix ensures that:
> >> (i) These variables are placed correctly in the tlink chain, respecting
> >> inter-dependencies; and (ii) The dependent initializations are placed at
> >> the end of the wrapped block init chain.  The details appear in the
> >> comments in the patch. It is entirely possible that a less clunky fix
> >> exists but I failed to find it.
> >>
> >> OK for mainline?
> >>
> >> Regards
> >>
> >> Paul
> >>
> >>
> >>
> >>
> >
>
>


Re: [PATCH v2] Target-independent store forwarding avoidance.

2024-06-09 Thread Richard Biener
On Fri, 7 Jun 2024, Jeff Law wrote:

> 
> 
> On 6/6/24 4:10 AM, Manolis Tsamis wrote:
> > This pass detects cases of expensive store forwarding and tries to avoid
> > them
> > by reordering the stores and using suitable bit insertion sequences.
> > For example it can transform this:
> > 
> >   strb w2, [x1, 1]
> >   ldr x0, [x1]  # Expensive store forwarding to larger load.
> > 
> > To:
> > 
> >   ldr x0, [x1]
> >   strb w2, [x1]
> >   bfi x0, x2, 0, 8
> > 
> > Assembly like this can appear with bitfields or type punning / unions.
> > On stress-ng when running the cpu-union microbenchmark the following speedups
> > have been observed.
> > 
> >    Neoverse-N1:      +29.4%
> >    Intel Coffeelake: +13.1%
> >    AMD 5950X:        +17.5%
> > 
> > gcc/ChangeLog:
> > 
> >  * Makefile.in: Add avoid-store-forwarding.o.
> >  * common.opt: New option -favoid-store-forwarding.
> >  * params.opt: New param store-forwarding-max-distance.
> >  * doc/invoke.texi: Document new pass.
> >  * doc/passes.texi: Document new pass.
> >  * passes.def: Schedule a new pass.
> >  * tree-pass.h (make_pass_rtl_avoid_store_forwarding): Declare.
> >  * avoid-store-forwarding.cc: New file.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> >  * gcc.target/aarch64/avoid-store-forwarding-1.c: New test.
> >  * gcc.target/aarch64/avoid-store-forwarding-2.c: New test.
> >  * gcc.target/aarch64/avoid-store-forwarding-3.c: New test.
> >  * gcc.target/aarch64/avoid-store-forwarding-4.c: New test.
> So this is getting a lot more interesting.  I think the first time I looked at
> this it was more concerned with stores feeding something like a load-pair and
> avoiding the store forwarding penalty for that case.  Am I mis-remembering, or
> did it get significantly more general?
> 
> 
> 
> 
> 
> > +
> > +static unsigned int stats_sf_detected = 0;
> > +static unsigned int stats_sf_avoided = 0;
> > +
> > +static rtx
> > +get_load_mem (rtx expr)
> Needs a function comment.  You should probably mention that EXPR must be a
> single_set in that comment.
> 
> 
> 
>  +
> > +  rtx dest;
> > +  if (eliminate_load)
> > +dest = gen_reg_rtx (load_inner_mode);
> > +  else
> > +dest = SET_DEST (load);
> > +
> > +  int move_to_front = -1;
> > +  int total_cost = 0;
> > +
> > +  /* Check if we can emit bit insert instructions for all forwarded stores.  */
> > +  FOR_EACH_VEC_ELT (stores, i, it)
> > +{
> > +  it->mov_reg = gen_reg_rtx (GET_MODE (it->store_mem));
> > +  rtx_insn *insns = NULL;
> > +
> > +  /* If we're eliminating the load then find the store with zero offset
> > +and use it as the base register to avoid a bit insert.  */
> > +  if (eliminate_load && it->offset == 0)
> So how often is this triggering?  We have various codes in the gimple optimizers
> to detect store followed by a load from the same address and do the
> forwarding.  If they're happening with any frequency that would be a good sign
> code in DOM and elsewhere isn't working well.
> 
> The way these passes detect this case is to take the store, flip the operands
> around (ie, it looks like a load) and enter that into the expression hash
> tables.  After that standard redundancy elimination approaches will work.
> 
> 
> > +   {
> > + start_sequence ();
> > +
> > + /* We can use a paradoxical subreg to force this to a wider mode, as
> > +the only use will be inserting the bits (i.e., we don't care about
> > +the value of the higher bits).  */
> Which may be a good hint about the cases you're capturing -- if the
> modes/sizes differ that would make more sense since I don't think we're as
> likely to be capturing those cases.

Yeah, we handle stores from constants quite well and FRE can forward
stores from SSA names to smaller loads by inserting BIT_FIELD_REFs but 
it does have some additional restrictions.
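
For readers following along, the kind of source pattern under discussion (a
narrow store followed by a wider load of the same memory) can be sketched in
C as below.  This is only an illustration written for this note, with
made-up names (union pun, store_byte_then_load, load_then_insert); it shows
that loading first and then inserting the byte yields the same value, which
is the equivalence the transformation relies on:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

union pun { uint64_t whole; uint8_t bytes[8]; };

/* Narrow store feeding a wider load: the shape that can hit a
   store-forwarding stall.  */
static inline uint64_t
store_byte_then_load (union pun *p, unsigned idx, uint8_t b)
{
  assert (idx < 8);
  p->bytes[idx] = b;   /* 1-byte store ...  */
  return p->whole;     /* ... followed by an 8-byte load.  */
}

/* Reordered form: do the wide load first, perform the store, then patch
   the stored byte into the loaded value (the role of the bit insert).  */
static inline uint64_t
load_then_insert (union pun *p, unsigned idx, uint8_t b)
{
  assert (idx < 8);
  uint64_t v = p->whole;
  uint8_t tmp[8];

  p->bytes[idx] = b;
  memcpy (tmp, &v, sizeof tmp);
  tmp[idx] = b;
  memcpy (&v, tmp, sizeof v);
  return v;
}
```

Both functions leave memory in the same state and return the same value,
independent of byte order.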

> 
> > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > index 4e8967fd8ab..c769744d178 100644
> > --- a/gcc/doc/invoke.texi
> > +++ b/gcc/doc/invoke.texi
> > @@ -12657,6 +12657,15 @@ loop unrolling.
> >   This option is enabled by default at optimization levels @option{-O1},
> >   @option{-O2}, @option{-O3}, @option{-Os}.
> >   
> > +@opindex favoid-store-forwarding
> > +@item -favoid-store-forwarding
> > +@itemx -fno-avoid-store-forwarding
> > +Many CPUs will stall for many cycles when a load partially depends on previous
> > +smaller stores.  This pass tries to detect such cases and avoid the penalty by
> > +changing the order of the load and store and then fixing up the loaded value.
> > +
> > +Disabled by default.
> Is there any particular reason why this would be off by default at -O1 or
> higher?  It would seem to me that on modern cores that this transformation
> should easily be a win.  Even on an old in-order core, avoiding the load with
> the bit insert is likely profitable, just not as much so.

I would think it's up to the targets to decide on a default.

> > diff --git a/

[PATCH] AVX-512: Pacify -Wshift-overflow=2. [PR115409]

2024-06-09 Thread Collin Funk
Left-shifting 1 by 31 bits in a 32-bit signed int is undefined behavior.
Since unsigned int is 32 bits wide, making the constant unsigned fixes it
and silences the warning.
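
A minimal standalone sketch of the difference, assuming a 32-bit int (the
helper names are invented for this note and are not part of the headers):

```c
#include <assert.h>
#include <stdint.h>

/* Build the sign-bit mask from an unsigned constant.  1U << 31 is well
   defined for a 32-bit unsigned int, whereas 1 << 31 would overflow a
   32-bit signed int.  */
static inline uint32_t
sign_bit_mask (void)
{
  return 1U << 31;
}

/* XOR with the mask flips only the top bit of a 32-bit lane.  */
static inline uint32_t
flip_sign_bit (uint32_t lane)
{
  return lane ^ sign_bit_mask ();
}

/* Small self-check that a caller or test can invoke.  */
static inline void
check_sign_bit_mask (void)
{
  assert (sign_bit_mask () == UINT32_C (0x80000000));
  assert (flip_sign_bit (flip_sign_bit (42)) == 42);
}
```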

gcc/ChangeLog:

PR target/115409
* config/i386/avx512fp16intrin.h (_mm512_conj_pch): Make the
constant unsigned before shifting.
* config/i386/avx512fp16vlintrin.h (_mm256_conj_pch): Likewise.
(_mm_conj_pch): Likewise.

Signed-off-by: Collin Funk 
---
 gcc/config/i386/avx512fp16intrin.h   | 2 +-
 gcc/config/i386/avx512fp16vlintrin.h | 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/config/i386/avx512fp16intrin.h b/gcc/config/i386/avx512fp16intrin.h
index f86050b2087..1869a920dd3 100644
--- a/gcc/config/i386/avx512fp16intrin.h
+++ b/gcc/config/i386/avx512fp16intrin.h
@@ -3355,7 +3355,7 @@ extern __inline __m512h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm512_conj_pch (__m512h __A)
 {
-  return (__m512h) _mm512_xor_epi32 ((__m512i) __A, _mm512_set1_epi32 (1<<31));
+  return (__m512h) _mm512_xor_epi32 ((__m512i) __A, _mm512_set1_epi32 (1U<<31));
 }
 
 extern __inline __m512h
diff --git a/gcc/config/i386/avx512fp16vlintrin.h b/gcc/config/i386/avx512fp16vlintrin.h
index a1e1cb567ff..405a06bbb9e 100644
--- a/gcc/config/i386/avx512fp16vlintrin.h
+++ b/gcc/config/i386/avx512fp16vlintrin.h
@@ -181,7 +181,7 @@ extern __inline __m256h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm256_conj_pch (__m256h __A)
 {
-  return (__m256h) _mm256_xor_epi32 ((__m256i) __A, _mm256_avx512_set1_epi32 (1<<31));
+  return (__m256h) _mm256_xor_epi32 ((__m256i) __A, _mm256_avx512_set1_epi32 (1U<<31));
 }
 
 extern __inline __m256h
@@ -209,7 +209,7 @@ extern __inline __m128h
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
 _mm_conj_pch (__m128h __A)
 {
-  return (__m128h) _mm_xor_epi32 ((__m128i) __A, _mm_avx512_set1_epi32 (1<<31));
+  return (__m128h) _mm_xor_epi32 ((__m128i) __A, _mm_avx512_set1_epi32 (1U<<31));
 }
 
 extern __inline __m128h
-- 
2.45.2