[Bug target/115687] RISC-V optimization when "lui" instructions can be merged

2024-06-27 Thread palmer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115687

--- Comment #8 from Palmer Dabbelt  ---
(In reply to Andrew Waterman from comment #6)
> I note MIPS sets TARGET_CONST_ANCHOR to 0x8000, and that architecture's
> ADDIU instruction has a 16-bit immediate.  RISC-V's ADDI instruction has a
> 12-bit immediate, so presumably we should be setting it to 0x800.

Ya, sorry, I wasn't paying attention -- regardless I think Vineet's on the
right track here with the splitter messing with us here, the incoming code has
a constant anchor already so dealing with those is sort of a different problem.

If removing that splitter is on the TODO list that seems reasonable, though I'd
taken a very different approach and just hacked up a post-split CSE as it seems
like we could end up in more situations like this.  I have no idea if that's a
sane idea, I sent an RFC to the lists to see what people think...

[Bug target/115687] RISC-V optimization when "lui" instructions can be merged

2024-06-27 Thread palmer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115687

--- Comment #5 from palmer at gcc dot gnu.org ---
(In reply to Andrew Pinski from comment #3)
> (In reply to Andrew Pinski from comment #2)
> > There is some code in cse.cc which does handle this.
> > See
> > https://gcc.gnu.org/onlinedocs/gccint/Misc.html#index-
> > TARGET_005fCONST_005fANCHOR also.
> 
> MIPS, aarch64 and rs6000 all define TARGET_CONST_ANCHOR (well MIPS sets
> targetm.const_anchor depending on if it is mips16/micromips or mips32/64).

Oh, thanks, I didn't know about that.  It looks like just adding 

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 9bba5da016e..6080298c36c 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -12019,6 +12019,9 @@ riscv_c_mode_for_floating_type (enum tree_index ti)
 #undef TARGET_C_MODE_FOR_FLOATING_TYPE
 #define TARGET_C_MODE_FOR_FLOATING_TYPE riscv_c_mode_for_floating_type

+#undef TARGET_CONST_ANCHOR
+#define TARGET_CONST_ANCHOR 0x4000
+
 struct gcc_target targetm = TARGET_INITIALIZER;

 #include "gt-riscv.h"

isn't enough for us here, but it seems like we should have something along
those lines.  2d7c73ee5ea ("AArch64: Enable TARGET_CONST_ANCHOR") has a test
case, so hopefully it's not that tricky to get something that exposes the issue
on RISC-V as well...

[Bug target/115687] RISC-V optimization when "lui" instructions can be merged

2024-06-27 Thread palmer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115687

--- Comment #4 from palmer at gcc dot gnu.org ---
Just poking around a bit: I think this is coming from CSE, which is replacing

(insn 5 2 6 2 (set (reg:DI 135)
(const_int 16384 [0x4000])) "pr115687.c":7:12 275 {*movdi_64bit}
 (nil))
(insn 6 5 7 2 (set (reg:DI 12 a2)
(plus:DI (reg:DI 135)
(const_int -16 [0xfff0]))) "pr115687.c":7:12 5 {adddi3}
 (expr_list:REG_EQUAL (const_int 16368 [0x3ff0])
(nil)))
(insn 7 6 8 2 (set (reg:DI 136)
(const_int 16384 [0x4000])) "pr115687.c":7:12 275 {*movdi_64bit}
 (nil))
(insn 8 7 9 2 (set (reg:DI 11 a1)
(plus:DI (reg:DI 136)
(const_int 32 [0x20]))) "pr115687.c":7:12 5 {adddi3}
 (expr_list:REG_EQUAL (const_int 16416 [0x4020])
(nil)))
(insn 9 8 10 2 (set (reg:DI 137)
(const_int 16384 [0x4000])) "pr115687.c":7:12 275 {*movdi_64bit}
 (nil))

with

(insn 5 2 6 2 (set (reg:DI 135)
(const_int 16384 [0x4000])) "pr115687.c":7:12 275 {*movdi_64bit}
 (nil))
(insn 6 5 7 2 (set (reg:DI 12 a2)
(const_int 16368 [0x3ff0])) "pr115687.c":7:12 273 {*mvconst_internal}
 (expr_list:REG_DEAD (reg:DI 135)
(expr_list:REG_EQUAL (const_int 16368 [0x3ff0])
(nil
(insn 7 6 8 2 (set (reg:DI 136)
(reg:DI 135)) "pr115687.c":7:12 275 {*movdi_64bit}
 (expr_list:REG_EQUAL (const_int 16384 [0x4000])
(nil)))
(insn 8 7 9 2 (set (reg:DI 11 a1)
(const_int 16416 [0x4020])) "pr115687.c":7:12 273 {*mvconst_internal}
 (expr_list:REG_DEAD (reg:DI 136)
(expr_list:REG_EQUAL (const_int 16416 [0x4020])
(nil
(insn 9 8 10 2 (set (reg:DI 137)
(reg:DI 135)) "pr115687.c":7:12 275 {*movdi_64bit}
 (expr_list:REG_EQUAL (const_int 16384 [0x4000])
(nil)))
(insn 10 9 11 2 (set (reg:DI 10 a0)
(const_int 16400 [0x4010])) "pr115687.c":7:12 273 {*mvconst_internal}
 (expr_list:REG_DEAD (reg:DI 137)
(expr_list:REG_EQUAL (const_int 16400 [0x4010])
(nil

that seems to be as-designed -- or at least as this comment in cse.cc seems to
be describing

  /* Find cheapest and skip it for the next time.   For items
 of equal cost, use this order:
 src_folded, src, src_eqv, src_related and hash table entry.  */

That seems like a bit of a heuristic, but I haven't poked around this stuff to
really understand how it's handling multiple uses of the incoming constant
anchor.

[Bug target/115687] RISC-V optimization when "lui" instructions can be merged

2024-06-27 Thread palmer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115687

palmer at gcc dot gnu.org changed:

   What|Removed |Added

   Last reconfirmed||2024-06-27
 Ever confirmed|0   |1
 CC||palmer at gcc dot gnu.org
 Status|UNCONFIRMED |NEW

--- Comment #1 from palmer at gcc dot gnu.org ---
Looks reasonable to me.  Here's a slightly smaller test case

long bar(unsigned int a, unsigned int b, unsigned int c);

long foo(void) {
unsigned int a = 0x4010; // (0x4 << 12) + 0x10
unsigned int b = 0x4020; // (0x4 << 12) + 0x20
unsigned int c = 0x3FF0; // (0x4 << 12) - 0x10
return bar(a, b, c);
}

which under -O2 produces

li  a2,16384
li  a1,16384
li  a0,16384
addia2,a2,-16
addia1,a1,32
addia0,a0,16
tailbar

Jeff had been doing a bunch of constant generation stuff, not sure if he's got
a fix for this one in the works?

[Bug target/115217] New: Register pairs can't be encoded in RISC-V inline asm blocks

2024-05-24 Thread palmer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115217

Bug ID: 115217
   Summary: Register pairs can't be encoded in RISC-V inline asm
blocks
   Product: gcc
   Version: 15.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: palmer at gcc dot gnu.org
  Target Milestone: ---

Alex is trying to do the amocas.q support in Linux, which operates on paired X
registers by providing only the even register in the instruction encoding.  As
far as I can tell we don't have a way to encode this in inline assembly
constraints: we can just force a pair of registers via the asm/register
variable type tricks, but we don't have a constraint for allocating them.

IIUC just doing something like the following isn't sufficient, as we'd need to
also read/clobber the paired register and I'm not sure how to plumb that into
inline assembly.

It's not super clear if that's a real performance problem as there's not going
to be a ton of 128-bit CAS instances in code, but I figured it's worth filing a
bug just in case anyone's looking or there's some inline asm trick I don't know
about.  Both x86_64 and aarch64 look like they're using explicit register
numbers in their 128-bit cmpxchg code in Linux, so maybe that's a sign this
isn't worth doing?

diff --git a/gcc/config/riscv/constraints.md b/gcc/config/riscv/constraints.md
index a9ee346af6f..c372c5f8853 100644
--- a/gcc/config/riscv/constraints.md
+++ b/gcc/config/riscv/constraints.md
@@ -141,6 +141,10 @@ (define_constraint "T"
   (and (match_operand 0 "move_operand")
(match_test "CONSTANT_P (op)")))

+(define_constraint "re"
+  "The even part of an X register pair, not including the x0-x0 pair"
+  (match_test "even_x_register_pair (op)"))
+
 ;; Zfa constraints.

 (define_constraint "zfli"
diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
index 0fb5729fdcf..70ba741ab54 100644
--- a/gcc/config/riscv/predicates.md
+++ b/gcc/config/riscv/predicates.md
@@ -201,6 +201,11 @@ (define_predicate "zcmp_mv_sreg_operand"
 : IN_RANGE (REGNO (op), S0_REGNUM, S1_REGNUM)
 || IN_RANGE (REGNO (op), S2_REGNUM, S7_REGNUM)")))

+;; Some operations (like amocas.q) operate on X register pairs.
+(define_predicate "even_x_register_pair"
+  (and (match_code "reg")
+   (match_test "GP_REG_P (REGNO (op)) && (REGNO (op) % 2 == 0)")))
+
 ;; Only use branch-on-bit sequences when the mask is not an ANDI immediate.
 (define_predicate "branch_on_bit_operand"
   (and (match_code "const_int")

[Bug target/114809] [RISC-V RVV] Counting elements might be simpler

2024-04-22 Thread palmer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114809

palmer at gcc dot gnu.org changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Keywords||missed-optimization
 Ever confirmed|0   |1
 CC||palmer at gcc dot gnu.org
   Last reconfirmed||2024-04-22

--- Comment #1 from palmer at gcc dot gnu.org ---
Thanks.

Sounds like there's really two issues here: a missed peephole and a more
complex set of micro-architectural tradeoffs.

The peephole seems like a pretty straight-forward missed optimization, if
you've got a smaller reproducer it's probably worth filing another bug for it. 
We're right at the end of the GCC-14 release process and ended up with some
last-minute breakages so stuff is pretty chaotic right now, having the bug will
make it easier to avoid forgetting about this.

The reduction looks way more complicated to me.  Just thinking a bit as I'm
watching the regressions run, I think there's a few options for generating the
code here:

* Do we accumulate into a vector and then reduce, or reduce and then
accumulate?
* Do we reduce via a sum-reduction or a popcnt?
* Do we reconfigure to a wider type or handle the overflow?

I think this will depend on the cost model for the hardware: we're essentially
trading off operations of one flavor of op for another, and that's going to
depend on how these ops perform.  Your suggestion is essentially a
reconfiguration vs reduction trade-off, which is probably going to be
implementation-specific.

Do you have a system that this code performs poorly on?  If there's something
concrete to target and we're not generating good code that's pretty actionable,
otherwise I think this one is going to be hard to reason about for a bit.

[Bug target/114175] [13/14] RISC-V: Execution test failures on gcc.dg/c23-stdarg-6.c

2024-02-29 Thread palmer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114175

--- Comment #18 from palmer at gcc dot gnu.org ---
(In reply to palmer from comment #17)
> (In reply to Edwin Lu from comment #16)
> > So if I understand correctly, there may also be a problem where it's trying
> > to create that named first argument but also trying to pass it as a variadic
> > argument.
> 
> Ya, sounds like that could very likely be the source of the bug.

and to be a little less vague: I'd guess we're just treating "unnamed" as
"variadic" somewhere in the calling convention code, and that we're missing the
special case of large return values being unnamed but not variadic arguments
(even in variadic functions).

[Bug target/114175] [13/14] RISC-V: Execution test failures on gcc.dg/c23-stdarg-6.c

2024-02-29 Thread palmer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114175

--- Comment #17 from palmer at gcc dot gnu.org ---
(In reply to Edwin Lu from comment #16)
> (In reply to palmer from comment #15)
> > It's a little easier to see from the float version of the code.
> > 
> > $ cat gcc/testsuite/gcc.dg/c23-stdarg-6.c 
> > /* Test C23 variadic functions with no named parameters, or last named
> >parameter with a declaration not allowed in C17.  Execution tests.  */
> > /* { dg-do run } */
> > /* { dg-options "-std=c23 -pedantic-errors" } */
> > 
> > #include 
> > #include 
> > 
> > extern void abort (void);
> > extern void exit (int);
> > struct s { char c[1000]; };
> > 
> > struct s
> > f (...)
> > {
> >   va_list ap;
> >   va_start (ap);
> >   int r = va_arg (ap, double);
> >   va_end (ap);
> >   struct s ret = {};
> >   ret.c[0] = r;
> >   ret.c[999] = 42;
> >   return ret;
> > }
> > 
> > int
> > main ()
> > {
> >   struct s x = f (1.0);
> >   fprintf(stderr, "%d\n", x.c[0]);
> >   if (x.c[0] != 1)
> > abort ();
> >   exit (0);
> > }
> > $ riscv64-unknown-linux-gnu-gcc gcc/testsuite/gcc.dg/c23-stdarg-6.c -o test
> > -std=c2x -static -O3
> > $ riscv64-unknown-linux-gnu-objdump -d test
> > ...
> > 00010412 :
> > ...
> >1042e:   850amv  a0,sp
> > ...
> >10438:   112000efjal 1054a 
> > ...
> > 0001054a :
> >1054a:   f20507d3fmv.d.x fa5,a0
> > 
> > The psABI says
> > 
> > A callee with variadic arguments is responsible for copying the contents
> > of registers used to pass variadic arguments to the vararg save area,
> > which must be contiguous with arguments passed on the stack.
> > 
> > which I'm taking to mean the "1.0" is meant to be passed in a register.  It
> > also says
> > 
> > Values are returned in the same manner as a first named argument of the
> > same type would be passed. If such an argument would have been passed by
> > reference, the caller allocates memory for the return value, and passes
> > the address as an implicit first parameter.
> > 
> 
> The psABI also says this in the paragraph before
> 
>   In the base integer calling convention, variadic arguments are passed 
>   in the same manner as named arguments, with one exception. Variadic 
>   arguments with 2×XLEN-bit alignment and size at most 2×XLEN bits are
>   passed in an aligned register pair (i.e., the first register in the
> pair 
>   is even-numbered), or on the stack by value if none is available.
> After a
>   variadic argument has been passed on the stack, all future arguments
> will
>   also be passed on the stack (i.e. the last argument register may be
> left 
>   unused due to the aligned register pair rule).

Edwin and I were talking in the office a bit before he posted this.  My
interpretation (and IIUC he agrees) is that this clause doesn't apply here: the
psABI says the return value is passed as if it was a named argument, so even
though it's passed on the stack we should continue to pass small variadic
arguments in registers.

We should check with LLVM, though, just to make sure everyone is interpreting
things the same way.  GCC is inconsistent between the caller and callee here,
so we might as well match what LLVM is doing.

> > So I think we're screwing up both ends of this one: the caller is passing
> > the return struct in a0 (losing the first arg), which the callee is
> > obtaining the first argument from a0 (losing the return struct).
> > 
> > That all very much seems like a backend bug to me.
> 
> So if I understand correctly, there may also be a problem where it's trying
> to create that named first argument but also trying to pass it as a variadic
> argument.

Ya, sounds like that could very likely be the source of the bug.

[Bug target/114175] [13/14] RISC-V: Execution test failures on gcc.dg/c23-stdarg-6.c

2024-02-29 Thread palmer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114175

--- Comment #15 from palmer at gcc dot gnu.org ---
It's a little easier to see from the float version of the code.

$ cat gcc/testsuite/gcc.dg/c23-stdarg-6.c 
/* Test C23 variadic functions with no named parameters, or last named
   parameter with a declaration not allowed in C17.  Execution tests.  */
/* { dg-do run } */
/* { dg-options "-std=c23 -pedantic-errors" } */

#include 
#include 

extern void abort (void);
extern void exit (int);
struct s { char c[1000]; };

struct s
f (...)
{
  va_list ap;
  va_start (ap);
  int r = va_arg (ap, double);
  va_end (ap);
  struct s ret = {};
  ret.c[0] = r;
  ret.c[999] = 42;
  return ret;
}

int
main ()
{
  struct s x = f (1.0);
  fprintf(stderr, "%d\n", x.c[0]);
  if (x.c[0] != 1)
abort ();
  exit (0);
}
$ riscv64-unknown-linux-gnu-gcc gcc/testsuite/gcc.dg/c23-stdarg-6.c -o test
-std=c2x -static -O3
$ riscv64-unknown-linux-gnu-objdump -d test
...
00010412 :
...
   1042e:   850amv  a0,sp
...
   10438:   112000efjal 1054a 
...
0001054a :
   1054a:   f20507d3fmv.d.x fa5,a0

The psABI says

A callee with variadic arguments is responsible for copying the contents
of registers used to pass variadic arguments to the vararg save area,
which must be contiguous with arguments passed on the stack.

which I'm taking to mean the "1.0" is meant to be passed in a register.  It
also says

Values are returned in the same manner as a first named argument of the
same type would be passed. If such an argument would have been passed by
reference, the caller allocates memory for the return value, and passes
the address as an implicit first parameter.

So I think we're screwing up both ends of this one: the caller is passing the
return struct in a0 (losing the first arg), which the callee is obtaining the
first argument from a0 (losing the return struct).

That all very much seems like a backend bug to me.

[Bug target/114175] [13/14] RISC-V: Execution test failures on gcc.dg/c23-stdarg-6.c

2024-02-29 Thread palmer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114175

palmer at gcc dot gnu.org changed:

   What|Removed |Added

   Last reconfirmed||2024-02-29
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW

--- Comment #14 from palmer at gcc dot gnu.org ---
Looks like it's a problem with the struct return argument mixing with va_start
/ va_arg.  This much smaller test case still fails, and on gcc-13

$ cat gcc/testsuite/gcc.dg/c23-stdarg-6.c
/* Test C23 variadic functions with no named parameters, or last named
   parameter with a declaration not allowed in C17.  Execution tests.  */
/* { dg-do run } */
/* { dg-options "-std=c23 -pedantic-errors" } */

#include 
#include 

extern void abort (void);
extern void exit (int);
struct s { char c[1000]; };

struct s
f (...)
{
  va_list ap;
  va_start (ap);
  double r = va_arg (ap, int);
  va_end (ap);
  struct s ret = {};
  ret.c[0] = r;
  ret.c[999] = 42;
  return ret;
}

int
main ()
{
  struct s x = f (1);
  fprintf(stderr, "%d\n", x.c[0]);
  if (x.c[0] != 1)
abort ();
  exit (0);
}
$ riscv64-unknown-linux-gnu-gcc gcc/testsuite/gcc.dg/c23-stdarg-6.c -o test
-std=c2x -static -O3
$ qemu-riscv64 ./test
16
Aborted

The output value seems to change from time to time, which smells like some
uninitialized access.  I'd bet we're just not properly skipping over the output
stack space in riscv_va_start().  Not quite sure where to start, though, as
ours is so much simpler than arm64 that it's going to take a bit to figure out
what's going on.

[Bug target/114175] [13/14] RISC-V: Execution test failures on gcc.dg/c23-stdarg-6.c

2024-02-29 Thread palmer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114175

palmer at gcc dot gnu.org changed:

   What|Removed |Added

 CC||palmer at gcc dot gnu.org

--- Comment #9 from palmer at gcc dot gnu.org ---
(In reply to Jakub Jelinek from comment #8)
> Guess somebody should read the psABI, figure out whether it is passed right
> on the caller side (without the patch or with it) or callee and debug
> afterwards.

Do you have a pointer to which call is actually failing?  I don't have a clean
tree right now and my box is backed up testing linux/glibc merges...

[Bug other/109668] 'python' vs. 'python3'

2024-02-09 Thread palmer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109668

palmer at gcc dot gnu.org changed:

   What|Removed |Added

 CC||palmer at gcc dot gnu.org

--- Comment #3 from palmer at gcc dot gnu.org ---
Jan-Benedict Glaw is reporting (via a crosstool-ng bug
<https://github.com/crosstool-ng/crosstool-ng/issues/2039>) that we've got a
few python2 scripts in the RISC-V port that can just be converted over.  I just
sent along a patch to clean that up.

[Bug target/113686] [RISC-V] TLS (Local Exec) relaxation on structures (LE)

2024-01-31 Thread palmer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113686

palmer at gcc dot gnu.org changed:

   What|Removed |Added

 CC||nelsonc1225 at sourceware dot 
org,
   ||palmer at gcc dot gnu.org
 Status|UNCONFIRMED |NEW
 Ever confirmed|0   |1
   Last reconfirmed||2024-01-31

--- Comment #1 from palmer at gcc dot gnu.org ---
(In reply to H. Peter Anvin from comment #0)
> When the Local Exec TLS model is in use, gcc generates inefficient code for
> accessing the member of a structure:
> 
> struct foobar {
>int alpha;
>int beta;
> };
> 
> _Thread_local struct foobar foo;
> 
> void func(int bar)
> {
> foo.beta = bar;
> }
> 
> # Version 1
> luia1,%tprel_hi(foo)
> adda1,a1,tp,%tprel_add(foo)
> addi   a1,a1,%tprel_lo(foo)
> sw a0,4(a1)
> 
> However, in this case it could be generated as:
> 
> # Version 2
> luia1,%tprel_hi(sym+4)
> addi   a1,a1,tp,%tprel_add(sym+4)
> sw a0,%tprel_lo(sym+4)(a1)
> 
> ... which, if %tprel_hi(sym+4) == 0, as it often is for small embedded
> software, the linker can relax to a simple (tp) reference:
> 
> # Version 2a (post-relaxation with small .tbss)
> sw a0,%tprel_lo(sym+4)(tp)
> 
> The linker will *not* relax version 1 all the way; leaving an unnecessary mv:
> 
> # Version 1a (post-relaxation with small .tbss)
> mv a1,tp
> sw a0,%tprel_lo(sym+4)(tp)
> 
> It is of course trickier for the case of multiple subsequent references to
> the structure if the structure is not aligned, as gcc can't know a priori
> where the 4K breaks are[*]. The version 1 code is more efficient in that
> case (3 instructions + 1 instruction/field as opposed to 3
> instructions/field.)
> 
> However, if the structure *is* aligned, gcc will still not optimize 1 into 2.
> 
> There are at least a few options I see:
> 
> 1. gcc option: gcc can generate version 2 code for a single field reference,
> or if the alignment is such that all fields are guaranteed to fall inside
> the same 4K window.

IIUC we could do this without adding anything to the linker or psABI, it's just
better code from GCC (we already have TPREL_LO12_S for the stores).  That's
just better code so it seems uncontroversial to me.

> 2. gcc and optional ABI option: introduce a "TLS TE-tiny" model for deep
> embedded use, where the combined size of the TSS area is limited to 4K
> equivalent to the way direct gp references [or zero, if the global pointer
> is 0] work. Thus, direct (tp) references can be used.

Unless I'm missing something, we never emit direct GP references from GCC right
now.  We rely on the linker to relax them.

> NOTE: With the current binutils, this will error unless .option norelax is
> in effect. It might be desirable to instead have a new relocation type,
> which would require binutils support. Alternatively, ld should recognize
> that the TLS offset is within +/- 2K and suppress the warning in that case
> (since at that point the address is available the the linker.)
> 
> The linker could be further optimized by allowing the TLS to offset;
> presumably equivalently to the __global_pointer$ symbol.
> 
> 3. binutils option: teach ld to relax these kinds of chained pointer
> references.

I'd favor adding support better for relaxing TP-relative sequences to the
linker where we can, it avoids the need for a new code model and we've already
got most of the linker complexity as it's required for GP.  So I think we can
essentially just call these LD missed optimizations.  Nelson might be out for a
bit, but I added him to the CC list.

> [*] Rant: in my opinion, the lui/auipc instructions are fundamentally
> misdesigned by not having an overlap bit to guarantee a sizable window.

I agree we've got auipc issues, it bites us all over the place (we essentially
can't share a hi* between multiple lo*s, as we don't know when overflow is
going to happen).  There'd been some vague proposals to add a third relocation
in the chain to align things, but I think they fizzled out because it'd require
talking to the psABI folks.

I think we're broadly safe for lui, though, so not sure if I'm missing
something there?  The low bits are always 0 so the intermediate alignment is
known.

[Bug libstdc++/84568] libstdc++-v3 configure checks for atomic operations fail on riscv

2024-01-18 Thread palmer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84568

--- Comment #13 from palmer at gcc dot gnu.org ---
I just stumbled back into this one.  I think it's fixed?

[Bug target/113087] [14] RISC-V rv64gcv vector: Runtime mismatch with rv64gc

2023-12-22 Thread palmer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=113087

palmer at gcc dot gnu.org changed:

   What|Removed |Added

 CC||palmer at gcc dot gnu.org

--- Comment #22 from palmer at gcc dot gnu.org ---
(In reply to JuzheZhong from comment #19)
> (In reply to Vineet Gupta from comment #18)
> > (In reply to JuzheZhong from comment #17)
> > > PLCT told me they passed with zvl256b.
> > > 
> > > I always run SPEC with FIXED-VLMAX since we always care about peak
> > > performance
> > > on our board.
> > 
> > Sure we all have our preferred peak performance configs. But the compiler
> > needs to work for all vendors' configs. So as a test, can you try a scalable
> > build run at your end to at least see if you can see those issues ?
> 
> I am not able to build and test SPEC since I don't have QEMU and SPEC
> environment.

Sorry, I'm kind of confused here: you're saying you can't build/test SPEC, but
then above saying you run SPEC.

> I should ask my colleague to do that but they are quite busy with company's
> things and frankly I can't pull more resource on open source work from my
> company.

[Bug target/112531] [14] RISC-V: gcc.dg/unroll-8.c rtl-dump scan errors with --param=riscv-autovec-preference=scalable

2023-11-21 Thread palmer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112531

palmer at gcc dot gnu.org changed:

   What|Removed |Added

   Last reconfirmed||2023-11-21
 Ever confirmed|0   |1
 CC||palmer at gcc dot gnu.org
 Status|UNCONFIRMED |NEW

--- Comment #5 from palmer at gcc dot gnu.org ---
(In reply to Robin Dapp from comment #4)
> Personally, I don't mind having some FAILs as long as we know them and
> understand the reason for them.   I wouldn't insist on "fixing" them but
> don't mind if others prefer to have the results "clean".  Probably a matter
> of taste.

IIUC every target still has some FAILs, so it's kind of just par for the
course.  That said, if we're going to put the work into root causing the
failure far enough to determine it's invalid then we're most of the way to just
making the failure disappear.  I guess it's a little more work upfront, but
otherwise everyone has to maintain some list of "tests that FAIL, but we're
ignoring".  We had some of that in the riscv-gnu-toolchain allowlist, but even
then it becomes clunky to maintain.

So I think we're unlikely to ever get them all, but at least for the ones that
are somewhat easy to root cause I think we might as well just fix them.  I just
sent along a fix for this one:
https://inbox.sourceware.org/gcc-patches/20231121232704.12336-5-pal...@rivosinc.com/

[Bug target/112295] RISC-V: Short forward branch pessimisation for ALU operations

2023-10-30 Thread palmer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112295

palmer at gcc dot gnu.org changed:

   What|Removed |Added

   Last reconfirmed||2023-10-30
 Ever confirmed|0   |1
 CC||palmer at gcc dot gnu.org
 Status|UNCONFIRMED |NEW

--- Comment #1 from palmer at gcc dot gnu.org ---
IIRC something along these lines came up when originally doing the SFB stuff. 
I don't remember if the SiFive 7-series can fuse the proposed code as there's
an extra a0 read in there.  I do remember missing some fusion opportunities but
deciding they weren't worth worrying about at the time as the CMOV got most of
the benefit.

Even if SiFive can't fuse these, we'll probably end up with implementations
that can at some point.  So IMO it's a valid missed optimization.

[Bug target/111600] [14 Regression] RISC-V bootstrap time regression

2023-09-27 Thread palmer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111600

palmer at gcc dot gnu.org changed:

   What|Removed |Added

 CC||palmer at gcc dot gnu.org,
   ||vineetg at gcc dot gnu.org

--- Comment #7 from palmer at gcc dot gnu.org ---
(In reply to Andreas Schwab from comment #3)
> Here are the build times of the stage1 compiler:
> 
> 20230714  21573
> 20230722  19932   -7.6%
> 20230728  21608   +8.4%
> 20230804  21841   +1.0%
> 20230811  25016   +14.5%
> 20230818  25429   +1.7%
> 20230825  25872   +1.7%
> 20230901  25965   +0.4%
> 20230908  28824   +11.0%
> 20230915  30926   +7.3%
> 20230922  40180   +30.0%

Did anything else change?  The latest binutils has better debug support, so I
could imagine us ending up with some longer compiler times as a result -- there
has to be more than just that here, though.

Aside from that we have had a ton of vector codegen go in over the last few
months, but this is a pretty huge increase so I agree it's worrisome.  I'm
adding Vineet to the CC list, as he's been doing some SPEC runs.  I don't think
we've had any major runtime regressions, but looks like dwarf2out.cc times have
crept up a bit which is also worrisome.

Also what exactly are you timing?  Native boostraps on QEMU?

[Bug target/104831] RISCV libatomic LR.aq/SC.rl pair insufficient for SEQ_CST

2023-09-25 Thread palmer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104831

palmer at gcc dot gnu.org changed:

   What|Removed |Added

   Assignee|unassigned at gcc dot gnu.org  |patrick at rivosinc dot 
com
 Status|UNCONFIRMED |ASSIGNED
 Ever confirmed|0   |1
   Last reconfirmed||2023-09-25

--- Comment #10 from palmer at gcc dot gnu.org ---
This should be fixed, looks like we just forgot to close the bug.  I've
assigned it to Patrick to make sure everything's finished.

[Bug c/111518] relro protection not working in riscv

2023-09-21 Thread palmer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111518

palmer at gcc dot gnu.org changed:

   What|Removed |Added

 CC||palmer at gcc dot gnu.org

--- Comment #1 from palmer at gcc dot gnu.org ---
(In reply to sattdeepan from comment #0)
> 1. Compile with -z,relro,-z,now flag:
> gcc -g -Wl,-z,norelro -O0  -o test_partial test_relro.c

Those don't match: the comment says relro+now, but the command line says
norelro.  I'm just double checking to make sure the run is from a relro+now
build, as opposed to a norelro build.

Also, I get a warning building the code.  I don't think it'll result in bad
behavior here, though.

$ riscv64-unknown-linux-gnu-gcc test.c -o test
test.c: In function ‘main’:
test.c:7:35: warning: passing argument 1 of ‘strtol’ from incompatible pointer
type [-Wincompatible-pointer-types]
7 | size_t *p = (size_t *) strtol(argv[1], NULL, 16);
  |   ^~~
  |   |
  |   int *
In file included from test.c:2:
/usr/riscv64-unknown-linux-gnu/usr/include/stdlib.h:177:48: note: expected
‘const char * restrict’ but argument is of type ‘int *’
  177 | extern long int strtol (const char *__restrict __nptr,
  | ~~~^~

[Bug target/111501] RISC-V: non-optimal casting when shifting

2023-09-20 Thread palmer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111501

palmer at gcc dot gnu.org changed:

   What|Removed |Added

   Last reconfirmed||2023-09-20
   Keywords||missed-optimization
 Status|UNCONFIRMED |NEW
  Component|c   |target
 Ever confirmed|0   |1
 CC||palmer at gcc dot gnu.org,
   ||vineetg at gcc dot gnu.org

--- Comment #1 from palmer at gcc dot gnu.org ---
Adding Vineet.

[Bug target/111139] RISC-V: improve scalar constants cost model

2023-08-24 Thread palmer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=39

palmer at gcc dot gnu.org changed:

   What|Removed |Added

   Last reconfirmed||2023-08-24
 Ever confirmed|0   |1
 CC||palmer at gcc dot gnu.org
 Status|UNCONFIRMED |NEW

--- Comment #1 from palmer at gcc dot gnu.org ---
I think maybe Jeff said he was going to look at this?  IMO even just a
refactoring here would be great, like Vineet is pointing out it's become a bit
of a mess.

Also: Edwin is refactoring the rest of the types.

[Bug target/111065] [RISCV] t-linux-multilib specifies incorrect multilib reuse patterns

2023-08-18 Thread palmer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111065

palmer at gcc dot gnu.org changed:

   What|Removed |Added

 CC||palmer at gcc dot gnu.org

--- Comment #3 from palmer at gcc dot gnu.org ---
(In reply to Tommy Murphy from comment #2)
> Thanks @Kito Cheng - but I don't really understand how your comment relates
> to the specific issue of the t-linux-multilib reuse "mappings" being
> incorrect (and possibly the reverse of what was originally intended?)? Maybe
> you can clarify? Thanks again.

The Linux and ELF multilibs are different: for Linux we assumed ISA
compatibility was up to the distro, so multilib just handles the ABI side of
things.  That said, C does bleed into the ABI so we should really fix that --
presumably we'd need some psABI work there, compatibility is going to be a bit
clunky so it's probably best to just add two explicit ABI variants.

[Bug target/111020] RFE: RISC-V: ability to cherry-pick additional instructions

2023-08-14 Thread palmer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111020

palmer at gcc dot gnu.org changed:

   What|Removed |Added

 CC||palmer at gcc dot gnu.org

--- Comment #4 from palmer at gcc dot gnu.org ---
(In reply to Andrew Pinski from comment #3)
> (In reply to H. Peter Anvin from comment #2)
> > Named subsets are, inherently, designed to make sense toward mass-produced
> > products where the hardware and software are designed (mostly)
> > independently. However, what I mean with "very deep embedded use" is
> > hardware and software being co-designed.
> > 
> > The RISC-V ISA policy is that those are considered vendor-specific subsets
> > and are to be given an X* name; however, gcc obviously needs to be able to
> > understand the meaning of this X* name. At this point there is no way to do
> > without changing the source code in nontrivial ways.
> > 
> > Regardless of if it is done in source code or at runtime, by implementing a
> > fine-grained, preferably table-driven, approach to subsets in gcc then it
> > would be very simple for a hardware implementor to define their custom
> > X-subsets without a lot of surgery to the code, *and* it makes it possible
> > to take it one step further and allowing custom (or newly defined! - there
> > have been multiple instances already of new subsets of existing instructions
> > defined a posteori) instruction subsets to be defined in a configuration
> > file.
> 
> I am 100% disagree here. Because if you do this there would be a huge
> explosion of what is and is not considered a subset. THIS is why it should
> be defined at the ISA level instead. Why just CTZ for ZBB what next just
> bseti or bexti of ZBS?
> 
> defining the specific set during your development is different from a
> production compiler really. GCC should aim for production compiler quality
> even for highly embedded targets.

IMO adding some config file for custom subsets is going to make more headaches
than it fixes.  For a while we had args like "-mno-div", but that's kind of
hacky and we eventually ended up with Zmmul to handle it -- having an external
config file controlling this would expose a lot of interface surface we don't
have a sane way to test.

If vendors want a custom subset then they can make one, it'll just be called
"X${vendor}${subset}".  We've already got a few forks/subsets floating around,
look at the T-Head and Ventana stuff.  For a few instructions it's pretty
mechanical, aside from fixing whatever fallout comes from splitting off the
subset.

We do currently require (IIRC we still didn't write this down) some amount of
public commitment to hardware availability to take that code, but if that's the
problem we should try and figure something out.  It's certainly a pain for
vendors to keep in-development trees around, but we're trading that off with
upstream pain -- I've found these sorts of subsets drift around until the HW
actually ships, so we don't want to end up stuck keeping around subsets that
didn't ship.

Vendors also have the option of just implementing all the instructions (via
some trap or microcode or whatever), thus turning this into a performance
problem.  That sort of just trades one problem for another, but we've got some
examples of this as well (SiFive traps on a bunch of stuff, for example).

[Bug target/110748] RISC-V: optimize store of DF 0.0

2023-07-20 Thread palmer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110748

--- Comment #7 from palmer at gcc dot gnu.org ---
(In reply to palmer from comment #6)
> (In reply to Jeffrey A. Law from comment #5)
> > I'd bet it's const_0_operand not allowing CONST_DOUBLE.
> > 
> > The question is what unintended side effects we'd have if we allowed
> > CONST_DOUBLE 0.0 in const_0_operand.
> 
> We don't have a architectural 0 register, so we'd probably end up needing to
> refactor some stuff.  It's probably smoother to add some sort of
> "reg_or_0_or_0f_operand" type predicate, and then convert the floating-point
> stuff that takes X registers over to that (at least stores and
> integer->float conversions, but maybe some comparisons too?).

Should have said "We don't have a architectural 0 floating-point register", we
have x0 (which is why that reg_or_0 stuff shows up).

[Bug target/110748] RISC-V: optimize store of DF 0.0

2023-07-20 Thread palmer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110748

palmer at gcc dot gnu.org changed:

   What|Removed |Added

 CC||palmer at gcc dot gnu.org

--- Comment #6 from palmer at gcc dot gnu.org ---
(In reply to Jeffrey A. Law from comment #5)
> I'd bet it's const_0_operand not allowing CONST_DOUBLE.
> 
> The question is what unintended side effects we'd have if we allowed
> CONST_DOUBLE 0.0 in const_0_operand.

We don't have a architectural 0 register, so we'd probably end up needing to
refactor some stuff.  It's probably smoother to add some sort of
"reg_or_0_or_0f_operand" type predicate, and then convert the floating-point
stuff that takes X registers over to that (at least stores and integer->float
conversions, but maybe some comparisons too?).

[Bug target/110722] New: FP is Saved/Restored around inline assembly

2023-07-18 Thread palmer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110722

Bug ID: 110722
   Summary: FP is Saved/Restored around inline assembly
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: palmer at gcc dot gnu.org
  Target Milestone: ---

I'm not sure if this is some ABI-related requirement that I've managed to
forget about, but it looks like we're saving/restoring FP around inline
assembly.

long fp_asm(long arg0)
{
asm volatile ("addi a0, a0, 1" : "+r"(arg0));
return arg0;
}

produces

fp_asm(long):
addisp,sp,-16
sd  s0,8(sp)
addis0,sp,16
addi a0, a0, 1
ld  s0,8(sp)
addisp,sp,16
jr  ra

We've got a ton of inline assembly in Linux these days and defconfig has
`-fno-omit-frame-pointer`, so this probably manifests as a performance issue
for someone somewhere -- though Clement just ran into it because he was
curious, so I don't have anything concrete.

[Bug target/110478] RISC-V multilib gcc zicsr in the -march causing incorrect libgcc to be used

2023-06-29 Thread palmer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110478

palmer at gcc dot gnu.org changed:

   What|Removed |Added

 CC||palmer at gcc dot gnu.org

--- Comment #5 from palmer at gcc dot gnu.org ---
(In reply to Bin Meng from comment #4)
> I can't get the build to pass with the same configure scripts on current GCC
> HEAD :(
> 
> --host=x86_64-linux-gnu --build=aarch64-linux --target=riscv64-linux
> --enable-targets=all
> --prefix=/home/arnd/cross/x86_64/gcc-13.1.0-nolibc/riscv64-linux
> --enable-languages=c --without-headers --disable-bootstrap --disable-nls
> --disable-threads --disable-shared --disable-libmudflap --disable-libssp
> --disable-libgomp --disable-decimal-float --disable-libquadmath
> --disable-libatomic --disable-libcc1 --disable-libmpx
> --enable-checking=release --with-static-standard-libraries
> 
> Error below:
> 
> Checking multilib configuration for libgcc...
> make[2]: Entering directory '/home/bmeng/git/gcc/riscv64-linux/libgcc'
> Makefile:183: ../.././gcc/libgcc.mvars: No such file or directory
> make[2]: *** No rule to make target '../.././gcc/libgcc.mvars'.  Stop.
> make[2]: Leaving directory '/home/bmeng/git/gcc/riscv64-linux/libgcc'
> make[1]: *** [Makefile:12855: all-target-libgcc] Error 2

It's building for me using riscv-gnu-toolchain and 070a6bf0bdc ("Update
documentation to clarify a GCC extension [PR c/77650]").  If the failure is
still reproducing on a HEAD can you give me a pointer to the exact commit? 
Also might be better to put that in a different bug, it's probably not the same
issue.

[Bug target/109989] RISC-V: Missing sign extension with int to float conversion with 64bit soft floats

2023-06-21 Thread palmer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109989

--- Comment #4 from palmer at gcc dot gnu.org ---
I left some cruft in that reproducer, it should have been

volatile float f[2];
int x[2];

void func() {
  x[0] = -1;
  x[1] = 2;

  for (int i = 0; i < 1; ++i)
f[i] = x[i];
}

Not sure what's going on yet, but nothing jumps out in that bisected patch. 
I'm guessing we've just got something wrong with poly/scalar conversion,
there's a bunch of implicit assumptions based around si/di conversions in our
backend and I bet we're violating something there.  That's all been a sticking
point for a while and I think some of the Ventana guys were looking into
cleaning it up, maybe Jeff has a better idea?

[Bug target/109989] RISC-V: Missing sign extension with int to float conversion with 64bit soft floats

2023-06-21 Thread palmer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109989

palmer at gcc dot gnu.org changed:

   What|Removed |Added

 Status|UNCONFIRMED |NEW
   Last reconfirmed||2023-06-21
 Ever confirmed|0   |1
 CC||palmer at gcc dot gnu.org

--- Comment #3 from palmer at gcc dot gnu.org ---
I can confirm this is a bug.  The shortest reproducer I can get is:

$ cat test.c
volatile float f[2];
int x[2];

float fconv(int x);

void func() {
  x[0] = -1;
  x[1] = 2; // Removal of this line avoids the bug

  for (int i = 0; i < 1; ++i)
f[i] = x[i];
}
$ ./toolchain/install/bin/riscv64-unknown-linux-gnu-gcc test.c -S -o-
-march=rv64imac -mabi=lp64 -O1 -ftree-slp-vectorize -funroll-loops
-fdump-rtl-all
...
func:
addisp,sp,-16
sd  ra,8(sp)
li  a0,3
sllira,a0,32
addia0,ra,-1
lui a5,%hi(x)
sd  a0,%lo(x)(a5)
call__floatsisf
lui t0,%hi(f)
sw  a0,%lo(f)(t0)
ld  ra,8(sp)
addisp,sp,16
jr  ra
...

With the bug being that a0 contains a non-ABI-canonical value at the call to
__floatsisf.

[Bug target/110201] RISC-V: __builtin_riscv_sm4ks and __builtin_riscv_sm4ed produce invalid assembly

2023-06-19 Thread palmer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110201

--- Comment #6 from palmer at gcc dot gnu.org ---
(In reply to Craig Topper from comment #3)
> I don't have a testsuite. I saw that gcc had crypto builtins and I happened
> to noticed the tests in gcc weren't passing constant arguments.
> 
> We also have a divergence in names between clang and gcc for some crypto
> builtins. We really need to define a scalar crypto intrinsic header file.

OK, let's try and get that sorted out?  We're generally not supposed to be
merging intrinsics without some sort of spec to point at, but we did a pretty
poor job at that for the V intrinsics and it looks like we've slipped a bit
here too.

Unless I'm missing something we haven't released GCC with the crypto intrinsics
yet, so we should be safe to fix bugs there as they come up.

[Bug target/110201] RISC-V: __builtin_riscv_sm4ks and __builtin_riscv_sm4ed produce invalid assembly

2023-06-19 Thread palmer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110201

--- Comment #5 from palmer at gcc dot gnu.org ---
(In reply to Jeffrey A. Law from comment #4)
> Yea, the tests aren't great.  They'll be better shortly.  They'll test
> non-constant arguments and out-of-range constants, expecting a suitable
> diagnostic.  They'll also test the extrema of valid constants.

Awesome, thanks!

[Bug target/110201] RISC-V: __builtin_riscv_sm4ks and __builtin_riscv_sm4ed produce invalid assembly

2023-06-19 Thread palmer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110201

--- Comment #2 from palmer at gcc dot gnu.org ---
Do you guys have a test suite for these, or did you just happen to run into it?
 The intrinsic testing has been a bit of a blind spot in GCC land.

[Bug target/110146] New: ICE in riscv_vector::function_builder::add_unique_function()

2023-06-06 Thread palmer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110146

Bug ID: 110146
   Summary: ICE in
riscv_vector::function_builder::add_unique_function()
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: palmer at gcc dot gnu.org
  Target Milestone: ---

A few of us were talking about this in the patchwork sync today, I think Juzhe
might have a fix already.  I'm getting a few thousand ICEs running the test
suite, they started yesterday for me.

Executing on host:
/scratch/merges/rgt-gcc-trunk/toolchain/build-gcc-linux-stage2/gcc/xgcc
-B/scratch/merges/rgt-gcc-trunk/toolchain/build-gcc-linux-stage2/gcc/
/scratch/merges/rgt-gcc-trunk/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.target/riscv/rvv/base/binop_vx_constraint-30.c
 -march=rv32imac -mabi=ilp32 -mcmodel=medlow   -fdiagnostics-plain-output  
-march=rv32gcv -mabi=ilp32d -O3 -S   -o binop_vx_constraint-30.s(timeout =
600)
spawn -ignore SIGHUP
/scratch/merges/rgt-gcc-trunk/toolchain/build-gcc-linux-stage2/gcc/xgcc
-B/scratch/merges/rgt-gcc-trunk/toolchain/build-gcc-linux-stage2/gcc/
/scratch/merges/rgt-gcc-trunk/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.target/riscv/rvv/base/binop_vx_constraint-30.c
-march=rv32imac -mabi=ilp32 -mcmodel=medlow -fdiagnostics-plain-output
-march=rv32gcv -mabi=ilp32d -O3 -S -o binop_vx_constraint-30.s
In file included from
/scratch/merges/rgt-gcc-trunk/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.target/riscv/rvv/base/riscv_vector.h:8,
 from
/scratch/merges/rgt-gcc-trunk/riscv-gnu-toolchain/gcc/gcc/testsuite/gcc.target/riscv/rvv/base/binop_vx_constraint-30.c:4:
/scratch/merges/rgt-gcc-trunk/toolchain/build-gcc-linux-stage2/gcc/include/riscv_vector.h:94:9:
internal compiler error: Segmentation fault
0x1107ac3 crash_signal
/scratch/merges/rgt-gcc-trunk/riscv-gnu-toolchain/gcc/gcc/toplev.cc:314
0x7fcf6952508f ???
   
/build/glibc-SzIz7B/glibc-2.31/signal/../sysdeps/unix/sysv/linux/x86_64/sigaction.c:0
0x14022bb tree_class_check(tree_node*, tree_code_class, char const*, int, char
const*)
/scratch/merges/rgt-gcc-trunk/riscv-gnu-toolchain/gcc/gcc/tree.h:3689
0x14022bb type_hash_canon_hash(tree_node*)
/scratch/merges/rgt-gcc-trunk/riscv-gnu-toolchain/gcc/gcc/tree.cc:6028
0x1418475 build_function_type(tree_node*, tree_node*, bool)
/scratch/merges/rgt-gcc-trunk/riscv-gnu-toolchain/gcc/gcc/tree.cc:7454
0x1586775
riscv_vector::function_builder::add_unique_function(riscv_vector::function_instance
const&, riscv_vector::function_shape const*, tree_node*, vec&)
   
/scratch/merges/rgt-gcc-trunk/riscv-gnu-toolchain/gcc/gcc/config/riscv/riscv-vector-builtins.cc:3414
0x158715b build_one
   
/scratch/merges/rgt-gcc-trunk/riscv-gnu-toolchain/gcc/gcc/config/riscv/riscv-vector-builtins-shapes.cc:52
0x15871f7 build_all
   
/scratch/merges/rgt-gcc-trunk/riscv-gnu-toolchain/gcc/gcc/config/riscv/riscv-vector-builtins-shapes.cc:69
0x15871f7 riscv_vector::build_base::build(riscv_vector::function_builder&,
riscv_vector::function_group_info const&) const
   
/scratch/merges/rgt-gcc-trunk/riscv-gnu-toolchain/gcc/gcc/config/riscv/riscv-vector-builtins-shapes.cc:84
0x1580cf6
riscv_vector::function_builder::register_function_group(riscv_vector::function_group_info
const&)
   
/scratch/merges/rgt-gcc-trunk/riscv-gnu-toolchain/gcc/gcc/config/riscv/riscv-vector-builtins.cc:3259
0x1580cf6 riscv_vector::handle_pragma_vector()
   
/scratch/merges/rgt-gcc-trunk/riscv-gnu-toolchain/gcc/gcc/config/riscv/riscv-vector-builtins.cc:4072
0x156757e riscv_pragma_intrinsic
   
/scratch/merges/rgt-gcc-trunk/riscv-gnu-toolchain/gcc/gcc/config/riscv/riscv-c.cc:191
0xa9747a c_parser_pragma
   
/scratch/merges/rgt-gcc-trunk/riscv-gnu-toolchain/gcc/gcc/c/c-parser.cc:13330
0xac4a55 c_parser_external_declaration
   
/scratch/merges/rgt-gcc-trunk/riscv-gnu-toolchain/gcc/gcc/c/c-parser.cc:1906
0xac521d c_parser_translation_unit
   
/scratch/merges/rgt-gcc-trunk/riscv-gnu-toolchain/gcc/gcc/c/c-parser.cc:1779
0xac521d c_parse_file()
   
/scratch/merges/rgt-gcc-trunk/riscv-gnu-toolchain/gcc/gcc/c/c-parser.cc:24657
0xb33d69 c_common_parse_file()
   
/scratch/merges/rgt-gcc-trunk/riscv-gnu-toolchain/gcc/gcc/c-family/c-opts.cc:1248
Please submit a full bug report, with preprocessed source (by using
-freport-bug).
Please include the complete backtrace with any bug report.
See <https://gcc.gnu.org/bugs/> for instructions.
compiler exited with status 1

[Bug target/109972] RISC-V: Could use umodsi3/udivsi3/divsi3 libcalls for 32-bit division/remainder on RV64 without M extension

2023-06-01 Thread palmer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109972

palmer at gcc dot gnu.org changed:

   What|Removed |Added

 Ever confirmed|0   |1
 CC||palmer at gcc dot gnu.org
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2023-06-01
   Severity|normal  |enhancement

--- Comment #1 from palmer at gcc dot gnu.org ---
Thanks.  Craig and I had talked about this offline, it looks like a real
improvement to me.  We're not super worried about rv32 or code size, maybe Kito
is?

[Bug target/109933] __atomic_test_and_set is broken for BIG ENDIAN riscv targets

2023-05-23 Thread palmer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109933

palmer at gcc dot gnu.org changed:

   What|Removed |Added

 CC||palmer at gcc dot gnu.org

--- Comment #7 from palmer at gcc dot gnu.org ---
(In reply to Rory Bolt from comment #6)
> Ah... that code makes so much sense now...
> 
> So my original comment about simply using a different constant was too
> simplistic; what is being attempted is to shift the constant 1 into the
> correct byte position since the flag is only a single byte but the atomic
> store is done on a word... so the shift logic will need to be rewritten for
> big endian targets. This also explains the masking of the low order address
> bits...
> 
> Interesting!

That seems likely to be the culprit.  Do you have time to send a patch?

We should probably also poke through the other sub-word patterns and make sure
nothing else got dropped for BE.

[Bug target/104338] RISC-V: Subword atomics result in library calls

2023-05-16 Thread palmer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104338

--- Comment #20 from palmer at gcc dot gnu.org ---
(In reply to rvalue from comment #19)
> (In reply to Aurelien Jarno from comment #18)
> > I wonder if the following patch should also be backported, as it
> > doesn't make sense to link with -latomic anymore with inline subword atomic
> > operations
> 
> Agreed. It's now meaningless to keep this workaround for RISC-V as the
> problem has been resolved.

Yep.  Can someone send the backport?

[Bug target/109547] New: RISC-V: Multiple vsetvli for load/store loop

2023-04-18 Thread palmer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=109547

Bug ID: 109547
   Summary: RISC-V: Multiple vsetvli for load/store loop
   Product: gcc
   Version: 14.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: palmer at gcc dot gnu.org
  Target Milestone: ---

I was just poking around with a simple loop using the vector intrinsics and
found some odd generated code.  This is on the gcc-13 branch, but that's pretty
close to trunk so I'm filing it for 14.  I'm probably not going to have time to
look for a bit, as it seems to just be a performance issue.

$ cat test.c
#include 

void func(unsigned char *out, unsigned char *in, unsigned long len) {
  unsigned long i = 0;
  while (i < len) {
unsigned long vl = __riscv_vsetvl_e8m1(len - i);
vuint8m1_t r = __riscv_vle8_v_u8m1(in + i, vl);
__riscv_vse8_v_u8m1(out + i, r, vl);
i += vl;
  }
}
$ ../toolchain/install/bin/riscv64-unknown-linux-gnu-gcc test.c -O3 -c -S -o-
-march=rv64gcv -fdump-rtl-all
.file   "test.c"
.option nopic
.attribute arch,
"rv64i2p0_m2p0_a2p0_f2p0_d2p0_c2p0_v1p0_zve32f1p0_zve32x1p0_zve64d1p0_zve64f1p0_zve64x1p0_zvl128b1p0_zvl32b1p0_zvl64b1p0"
.attribute unaligned_access, 0
.attribute stack_align, 16
.text
.align  1
.globl  func
.type   func, @function
func:
.LFB2:
.cfi_startproc
beq a2,zero,.L1
li  a5,0
.L3:
sub a4,a2,a5
add a6,a1,a5
add a3,a0,a5
vsetvli a4,a4,e8,m1,ta,mu
vsetvli zero,a4,e8,m1,ta,ma
add a5,a5,a4
vle8.v  v24,0(a6)
vse8.v  v24,0(a3)
bgtua2,a5,.L3
.L1:
ret
.cfi_endproc
.LFE2:
.size   func, .-func
.ident  "GCC: (g85b95ea729c) 13.0.1 20230417 (prerelease)"
.section.note.GNU-stack,"",@progbits

[Bug rtl-optimization/108826] Inefficient address generation on POWER and RISC-V

2023-02-16 Thread palmer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108826

--- Comment #5 from palmer at gcc dot gnu.org ---
We've run into a handful of things that look like this before, I'm not sure if
it's a backend issue or something more general.  There's two patterns here that
are frequently bad on RISC-V: "unsigned int" array indices and unsigned int
shifting.  I think they might both boil down to some problems we have tracking
the high parts of registers around ABI boundaries.

FWIW, the smallest bad code I can get is

unsigned int func(unsigned int ui) {
return (ui >> 6 & 5) << 2;
}

func:
srliw   a0,a0,6
slliw   a0,a0,2
andia0,a0,20
ret

which is particularly awkward as enough is going right to try and move that
andi, but we still end up with the double shifts.

[Bug target/104338] RISC-V: Subword atomics result in library calls

2023-01-26 Thread palmer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104338

--- Comment #12 from palmer at gcc dot gnu.org ---
I've got a somewhat recently rebased version of Patrick's patch floating
around, it passed testing but I got hung up on the futex_time64 thing and
forgot about it.  Not sure if folks think it's too late for the upcoming CGC
release, but I wouldn't be opposed to taking it -- looks like distros aro going
to apply workarounds if we don't do something, so at least this way there'll be
a single workaround in trunk.

There's some bigger fixes in the works for the whole memory model as we've got
other issues, but since those are a bit tricker it might be worth just doing
the stop-gap thing for now.

[Bug target/106585] RISC-V: Mis-optimized code gen for zbs

2022-12-08 Thread palmer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106585

--- Comment #10 from palmer at gcc dot gnu.org ---
(In reply to Andrew Waterman from comment #9)
> On Wed, Dec 7, 2022 at 7:02 PM palmer at gcc dot gnu.org via Gcc-bugs
>  wrote:
> >
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106585
> >
> > palmer at gcc dot gnu.org changed:
> >
> >What|Removed |Added
> > 
> >          CC||palmer at gcc dot gnu.org
> >
> > --- Comment #8 from palmer at gcc dot gnu.org ---
> > (In reply to Jeffrey A. Law from comment #7)
> > > Raphael and I are poking at this a bit.  I can't convince myself that it's
> > > actually safe to use GPR for the bit manipulation patterns.
> > >
> > > For rv64 I'm pretty sure the b* instructions are operating on 64bit
> > > quantities only.  Meaning they might twiddle the SI sign bit without
> > > extending.  If we were to change these patterns to use GPR and the result
> > > then fed an addw (for example) then we would have inconsistent register
> > > state as operand twiddled by the prior b* pattern wouldn't have been sign
> > > extended.
> > >
> > > To be clear, I think this is a limitation imposed by the ISA docs, not GCC
> > > where this will be reasonably well defined.
> >
> > So you're worried about addw (and the various other OP-32 instructions) 
> > needing
> > signed extended high parts in registers in order to function as expected?  
> > I've
> > never gotten that from the ISA manual, there might be some vestigial 
> > MIPS-isms
> > floating around the RISC-V port that indicate that though (as we've got 
> > similar
> > constraints for the comparisons).
> >
> > That said, I'v gone and actually read the ISA manual here and it's not at 
> > all
> > specific.  I'm seeing
> >
> > ADDW and SUBW are RV64I-only instructions that are defined analogously
> > to ADD and SUB but operate on 32-bit values and produce signed 32-bit
> > results.  Overflows are ignored, and the low 32-bits of the result is
> > sign-extended to 64-bits and written to the destination register.
> >
> > which doesn't explicitly say the high 32-bits of the inputs are ignored.  As
> > far as I can tell "32-bit values" isn't defined anywhere, so that's not so
> > useful.
> >
> > Do you know if there's any hardware that needs extended values for addw and
> > friends?  That'd almost certainly break a lot of binaries, but I could
> > certainly buy an argument saying it's to the spec (and the actual words in 
> > the
> > spec, not just this "anything goes" compatibility stuff).
> 
> The spec explicitly says that the upper 32 bits of the inputs are
> ignored; you just need to read a few paragraphs up.
> https://github.com/riscv/riscv-isa-manual/blob/
> b7080e0d18765730ff4f3d07b866b4884a8be401/src/rv64.tex#L18-L21

Ah, sorry I missed that.  So I think we're fine from the ISA side of things,
it's just the SW constraints to worry about.

> 
> >
> > > With that in mind I think the only path forward is new patterns that 
> > > (sadly)
> > > use explicit subregs for sources, but still set a DImode destination.
> > >
> > > I'm the newbie here, so if I've misinterpreted the ISA docs incorrectly,
> > > don't hesitate to let me know.
> >
> > Kind of just a related FYI: the comparison instructions and various bits of 
> > the
> > ABI do require values in canonical forms (the ABI stuff isn't exactly sign
> > extended, but there's a rule).  That's all a big fragile.

[Bug target/106585] RISC-V: Mis-optimized code gen for zbs

2022-12-07 Thread palmer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106585

palmer at gcc dot gnu.org changed:

   What|Removed |Added

 CC||palmer at gcc dot gnu.org

--- Comment #8 from palmer at gcc dot gnu.org ---
(In reply to Jeffrey A. Law from comment #7)
> Raphael and I are poking at this a bit.  I can't convince myself that it's
> actually safe to use GPR for the bit manipulation patterns.
> 
> For rv64 I'm pretty sure the b* instructions are operating on 64bit
> quantities only.  Meaning they might twiddle the SI sign bit without
> extending.  If we were to change these patterns to use GPR and the result
> then fed an addw (for example) then we would have inconsistent register
> state as operand twiddled by the prior b* pattern wouldn't have been sign
> extended.
> 
> To be clear, I think this is a limitation imposed by the ISA docs, not GCC
> where this will be reasonably well defined.

So you're worried about addw (and the various other OP-32 instructions) needing
signed extended high parts in registers in order to function as expected?  I've
never gotten that from the ISA manual, there might be some vestigial MIPS-isms
floating around the RISC-V port that indicate that though (as we've got similar
constraints for the comparisons).

That said, I'v gone and actually read the ISA manual here and it's not at all
specific.  I'm seeing

ADDW and SUBW are RV64I-only instructions that are defined analogously
to ADD and SUB but operate on 32-bit values and produce signed 32-bit
results.  Overflows are ignored, and the low 32-bits of the result is
sign-extended to 64-bits and written to the destination register.

which doesn't explicitly say the high 32-bits of the inputs are ignored.  As
far as I can tell "32-bit values" isn't defined anywhere, so that's not so
useful.

Do you know if there's any hardware that needs extended values for addw and
friends?  That'd almost certainly break a lot of binaries, but I could
certainly buy an argument saying it's to the spec (and the actual words in the
spec, not just this "anything goes" compatibility stuff).

> With that in mind I think the only path forward is new patterns that (sadly)
> use explicit subregs for sources, but still set a DImode destination.
> 
> I'm the newbie here, so if I've misinterpreted the ISA docs incorrectly,
> don't hesitate to let me know.

Kind of just a related FYI: the comparison instructions and various bits of the
ABI do require values in canonical forms (the ABI stuff isn't exactly sign
extended, but there's a rule).  That's all a big fragile.

[Bug target/106602] riscv: suboptimal codegen for zero_extendsidi2_shifted w/o bitmanip

2022-11-01 Thread palmer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106602

palmer at gcc dot gnu.org changed:

   What|Removed |Added

 CC||palmer at gcc dot gnu.org

--- Comment #8 from palmer at gcc dot gnu.org ---
(In reply to Jeffrey A. Law from comment #7)
> There's some code in combine that's supposed to take advantage of REG_EQUAL
> notes which is supposed to help with this kind of scenario.  Digging into
> that might help.

IMO that's the right way to go here.  I think anything we do in the RISC-V
backend would likely just push around the problem: splitting early seems like
generally the right thing to do, but we'll eventually trip up combine just by
virtue of making instruction sequences longer.  IIUC something like REG_EQUAL
would allow us to keep both flavors around so something can sort it out later.

That said, I've never really reached this deep into the middle end so it's all
a bit over my head and I decided it'd be saner to just close the file and say
nothing ;)

[Bug target/106815] [13 Regression] ICE: in riscv_excess_precision, at config/riscv/riscv.cc:5967 with -fexcess-precision=16 on any input

2022-09-02 Thread palmer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106815

palmer at gcc dot gnu.org changed:

   What|Removed |Added

 CC||palmer at gcc dot gnu.org
 Ever confirmed|0   |1
 Status|UNCONFIRMED |NEW
   Last reconfirmed||2022-09-03

--- Comment #1 from palmer at gcc dot gnu.org ---
Thanks.  I'm pretty sure all we need is

diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
index 675d92c0961..9b6d3e95b1b 100644
--- a/gcc/config/riscv/riscv.cc
+++ b/gcc/config/riscv/riscv.cc
@@ -5962,6 +5962,7 @@ riscv_excess_precision (enum excess_precision_type type)
   return (TARGET_ZFH ? FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16
 : FLT_EVAL_METHOD_PROMOTE_TO_FLOAT);
 case EXCESS_PRECISION_TYPE_IMPLICIT:
+case EXCESS_PRECISION_TYPE_FLOAT16:
   return FLT_EVAL_METHOD_PROMOTE_TO_FLOAT16;
 default:
   gcc_unreachable ();

I'll send it out assuming it passes the tests.

[Bug middle-end/106818] code is genereated differently with or without 'extern'

2022-09-02 Thread palmer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106818

--- Comment #8 from palmer at gcc dot gnu.org ---
(In reply to Andrew Pinski from comment #7)
> (In reply to baoshan from comment #6)
> > > really of unknown alignment then sharing the lui might not work.
> > Can you elaborate why shareing the lui might not work?

Unless I've managed to screw up some bit arithmetic here, it's just overflow
that we're not detecting at link time:

$ cat test.c 
extern char glob[4];

int _start(void) {
int *i = (int *)glob;
return *i;
}
$ cat glob.s 
.section .sdata
.balign 4096
.global empty
empty:
.rep 2046
.byte 0
.endr
.global glob
glob:
.byte 1, 2, 3, 4
$ riscv64-linux-gnu-gcc test.c glob.s -O3 -o test -static -fno-PIE
-mcmodel=medlow -mexplicit-relocs -nostdlib
$ riscv64-linux-gnu-objdump -d test
...
0001010c <_start>:
   1010c:   66c9lui a3,0x12
   1010e:   7ff6c703lbu a4,2047(a3) # 127ff 
   10112:   7fe6c603lbu a2,2046(a3)
   10116:   8006c783lbu a5,-2048(a3)
   1011a:   8016c503lbu a0,-2047(a3)
...

So that's going to load

a3 = 0x127ff 
a2 = 0x127fd
a5 = 0x11800
a6 = 0x11801

Which is wrong.

We can't detect it at link time because both relocations are being processed
correctly, they just don't know about each other (and really can't, because
there's nothing coupling them together).

> Linker relaxation not coming in and relaxing it to be use gp offsets instead.
> It is one of the worst parts of the riscv toolchain ...

Though this time linker relaxation is actually biting us twice:

First, it's masking this problem for small programs: if these accesses are all
within range of GP we end up producing executables that function fine, as the
relaxation calculates the full addresses to use as GP offsets.

Second, the GP relaxations just don't work when we share LUIs for
possibly-misaligned symbols because we delete the LUI if the first low-half is
within GP range.  For example:

$ cat glob.s 
.section .sdata
.global empty
empty:
.rep 4090
.byte 0
.endr
.global glob
glob:
.byte 1, 2, 3, 4
$ riscv64-linux-gnu-gcc test.c glob.s -O3 -o test -static -fno-PIE
-mcmodel=medlow -mexplicit-relocs --save-temps -nostdlib
$ riscv64-linux-gnu-objdump -d test
...
0001010c <_start>:
   1010c:   7fb1c703lbu a4,2043(gp) # 12127 
   10110:   7fa1c603lbu a2,2042(gp) # 12126 
   10114:   1286c783lbu a5,296(a3)
   10118:   1296c503lbu a0,297(a3)
...

We had that problem with the AUIPC->GP relaxation as well, but could fix it
there because the low half points to the high half.  Here I think there's also
nothing we can do in the linker, as there's no way to tell when the result of
the LUI is completely unused -- we could deal with simple cases like this, but
with control flow there's no way to handle all of them.

[Bug middle-end/106818] code is genereated differently with or without 'extern'

2022-09-02 Thread palmer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106818

palmer at gcc dot gnu.org changed:

   What|Removed |Added

 CC||palmer at gcc dot gnu.org

--- Comment #4 from palmer at gcc dot gnu.org ---
(In reply to Andrew Pinski from comment #2)
> Most likely known alignment or not.
> Riscv targets are sensitive to alignment.

Not sure I'm allowed to paste the code in for them, but that's what's going on
here: with -mtune=thead-c906 both cases have a single store, the default is for
Rocket which has very slow misaligned accesses.

That said, I think we actually have a bug here: if the extern symbol was really
of unknown alignment then sharing the lui might not work.

[Bug target/106807] RISC-V: libatomic routines are infinate loops

2022-09-01 Thread palmer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106807

--- Comment #3 from palmer at gcc dot gnu.org ---
(In reply to Andreas Schwab from comment #1)
> That happens if you use a modified compiler that automatically adds
> -latomic, so that configure in libatomic thinks that the builtins are
> available and expanded inline.

I'm just running trunk (or at least trying to, something from the environment
may have leaked in).  I do find that my multilib libatomic autoconf runs have
-pthread, which causes -latomic on RISC-V.  I'm not sure what's made that
happen, though.

Maybe the right answer here is to just go commit
https://gcc.gnu.org/pipermail/gcc-patches/2022-August/600151.html rather than
spending time trying to figure out how my environment has broken?

[Bug target/106807] New: RISC-V: libatomic routines are infinate loops

2022-09-01 Thread palmer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106807

Bug ID: 106807
   Summary: RISC-V: libatomic routines are infinate loops
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: palmer at gcc dot gnu.org
  Target Milestone: ---

We've started compiling some libatomic routines to infinite loops, for example
testsuite/gcc.dg/atomic/stdatomic-load-1.c ends up with

000107ce <__atomic_fetch_add_1>:
   107ce:   4615li  a2,5
   107d0:   bffdj   107ce <__atomic_fetch_add_1>

This results in a bunch of test suite failures.

[Bug target/106544] riscv_print_operand does not check to see if the operands are valid to do INTVAL on them

2022-08-06 Thread palmer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106544

palmer at gcc dot gnu.org changed:

   What|Removed |Added

 CC||palmer at gcc dot gnu.org

--- Comment #1 from palmer at gcc dot gnu.org ---
Do you have an example of how to trigger the ICE?  I don't know what RTL
checking is and poking around on the internet isn't helping much.  I've got
something I think should fix the problem (just checking CONST_INT_P every
time), but i'd like to have a test case.

[Bug target/106517] New: RISC-V: Inefficient Generated Code for Floating Point to Integer Rounds

2022-08-03 Thread palmer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=106517

Bug ID: 106517
   Summary: RISC-V: Inefficient Generated Code for Floating Point
to Integer Rounds
   Product: gcc
   Version: 13.0
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: palmer at gcc dot gnu.org
  Target Milestone: ---

RISC-V has a handful of floating-point conversion instructions that we don't
appear to be taking advantage of.  For example

long f(double in)
{
return __builtin_floor(in);
}

generates a call to the floor() library routine, while I believe we can
implement in via just a "fcvt.l.d a0, fa0, rdn" (RISC-V clang and arm64 GCC). 
There are a bunch of similar patterns, the aarch64 test suite seems to have
pretty good coverage of them.

We should port those tests over to RISC-V, figure out which conversions we can
implement directly, and then fix whatever's broken.  I started poking around a
bit and found that even some of the conversions where we have MD file patterns
aren't behaving as expected, so there might be some deeper issue going on.

This has come up in a handful of forums lately and while we're still hoping to
find some time to look into it, I figured it'd be best to open at least a basic
bug so at least we can have one place to track the issues.

[Bug target/105355] -msmall-data-limit= unexpectedly accepts a separate argument

2022-05-11 Thread palmer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105355

palmer at gcc dot gnu.org changed:

   What|Removed |Added

 CC||palmer at gcc dot gnu.org

--- Comment #6 from palmer at gcc dot gnu.org ---
Sorry, I hadn't seen the original bug.  This looks good to me, I don't think
there was any use case for something like "-msmall-data-limit= N" (ie, with the
space).  Looks like that's been there since the original port, so it was
probably just an oversight.

[Bug tree-optimization/102892] [12/13 Regression] Dead Code Elimination Regression at -O3 (trunk vs 11.2.0)

2022-05-03 Thread palmer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102892

--- Comment #13 from palmer at gcc dot gnu.org ---
I just posted a patch
<https://gcc.gnu.org/pipermail/gcc-patches/2022-May/593995.html> that removes
the undefined behavior from this test case, with that it links on RISC-V.

[Bug tree-optimization/102892] [12/13 Regression] Dead Code Elimination Regression at -O3 (trunk vs 11.2.0)

2022-05-03 Thread palmer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102892

palmer at gcc dot gnu.org changed:

   What|Removed |Added

 CC||palmer at gcc dot gnu.org

--- Comment #12 from palmer at gcc dot gnu.org ---
I just posted a patch
<https://gcc.gnu.org/pipermail/gcc-patches/2022-May/593995.html> that removes
the undefined behavior from this test case, with that it links on RISC-V.

[Bug target/104338] RISC-V: Subword atomics result in library calls

2022-04-07 Thread palmer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104338

palmer at gcc dot gnu.org changed:

   What|Removed |Added

 CC||kito.cheng at gmail dot com

--- Comment #6 from palmer at gcc dot gnu.org ---
Kito pointed out earlier today that it should already be possible to default to
libatomic via a --with-specs=... configure-time argument already, so one option
here would be to just add an example/reference spec to GCC.  That would allow
distros to opt in to the "always link libatomic" behavior, if they want to risk
the ABI-related issues like we see in libstdcxx (which we'd of course have to
fix).  It doesn't sort out the long-tail issues related to ABI compatibility
between GCC and LLVM (and the suggested mappings), but at least it gives folks
a unified mechanism for doing this.

I know it's pretty late, but that seems like something we could do on the
GCC-12 timeline.  It seems like the distro folks are pretty fed up with waiting
so they're just going to backport/hack this if we miss GCC-12, might as well
have one way for that to happen.

[Bug target/104338] RISC-V: Subword atomics result in library calls

2022-04-07 Thread palmer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104338

--- Comment #5 from palmer at gcc dot gnu.org ---
(In reply to rvalue from comment #4)
> In short term, maybe we can change the spec to link against libatomic by
> default (implemented in
> https://github.com/riscv-collab/riscv-gcc/commit/
> 2c4857d0981501b7c50bbf228de9e287611f8ae5). It will solve a lot of build
> errors if we revert the value of `LIB_SPEC` instead of only link against
> libatomic when `-pthread` is present.
> 
> Detailed talk about this:
> https://github.com/riscv-collab/riscv-gcc/issues/337

We talked through some options like that and decided it was too risky for
GCC-12.  We already found one ABI break related to this (see 84568), and want
to make sure we give distros adequate advance notice before something that we
know to break ABIs.

That said, it's really not a GCC ABI break, it's a per-package configure issue.
 We can fix the libstdcxx fallout, which is the only bit we know about right
now (though it's not like we've scrubbed builds for this).  If the folks
building distros think it's better to risk the ABI breaks rather than chase
around the build failures, then I'm fine rushing something in to GCC-12.

I see Andreas is already here, I'm having some trouble adding anyone else
though (I can never quite figure out Bugzilla...).

[Bug target/104831] RISCV libatomic LR.aq/SC.rl pair insufficient for SEQ_CST

2022-03-07 Thread palmer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104831

--- Comment #1 from palmer at gcc dot gnu.org ---
I'm not quite sure what the rules on targeting 12 for this one: it's not
technically a regression, as it's always been broken, but it is a bug.  I'd err
on the side of taking a fix, as we're just strengthening the GCC implementation
so it should be pretty safe.  That said, anything related to memory models has
the potential for complicated fallout so I could understand wanting to wait
just to play it safe.

Patrick has a patch in progress.

[Bug libstdc++/84568] libstdc++-v3 configure checks for atomic operations fail on riscv

2022-02-08 Thread palmer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84568

--- Comment #10 from palmer at gcc dot gnu.org ---
(In reply to Jonathan Wakely from comment #7)
> (In reply to Jonathan Wakely from comment #6)
> > (In reply to Jonathan Wakely from comment #5)
> > > (In reply to palmer from comment #3)
> > > > It looks like LLVM already has inline atomics, so presumably the same 
> > > > issues
> > > > would arise when mixing libstdc++ libraries compiled with LLVM and GCC.
> > > 
> > > Yup.
> > 
> > But not just "when mixing libstdc++ libraries". When mixing pretty much any
> > C++ code that uses libstdc++ headers.
> 
> Oh actually, sorry, that's wrong. The atomic policy for libstdc++ is set at
> configure time, based on the GCC building it. We define a macro, and that is
> fixed for the lifetime of that libstdc++ installation. So it doesn't matter
> if you compile those same headers with Clang, which _could_ use atomic
> built-ins, the atomic policy is still decided by the macro which doesn't
> change after installation.
> 
> So it's only a problem when mixing user code compiled with old and new
> libstdc++ headers.
> 
> And I've been confusing the _GLIBCXX_ATOMIC_BUILTINS macro with the
> _GLIBCXX_HAVE_ATOMIC_LOCK_POLICY macro.

I hadn't even noticed the _GLIBCXX_HAVE_ATOMIC_LOCK_POLICY macro, thanks!

> If _GLIBCXX_ATOMIC_BUILTINS changes from undefined to 1, I think that's OK.
> Old code will still call the non-inline functions in libstdc++.so, but those
> will now be consistent with the inline ones that new code is calling.

My specific worry was users mixing in routines from two different versions of
libstdc++: for example, maybe there's some statically linked executable that
puts a shared pointer into a mmap'd region, which it then expects to work when 

After signing off last night I realized that none of that would work anyway,
though, as even with the same library on both ends users would end up with a
different mutex and thus races.  So I think that one isn't worth worrying
about.

[snip, I'm mixing two replies here]

> Thanks. Changing that will cause an ABI break in the headers (and so affect
> user code, not just the libstdc++.so library).
> 
> Clang and GCC will still be compatible, because the macros are still set
> once by configure when building libstdc++.

>From my reading of this, GCC and clang will build libstdc++ binaries with
incompatible ABIs: clang has inline atomics, so the 2-byte CAS check will
succeed and we'll end up with libstdcxx_atomic_lock_policy=atomic .  I don't
actually have a clang build around to test that with, though, and I'm not sure
if folks are shipping clang-built libstdc++ anywhere (and if so, are expecting
it to be compatible with a GCC-built libstdc++).

> One solution would be to override the checks in libstdc++-v3/acinclude.m4 so
> that _GLIBCXX_HAVE_ATOMIC_LOCK_POLICY is also #undef for RISC-V, even after
> the atomic built-ins are supported. That would preserve the ABI, but would
> mean ref-counting in libstdc++ is sub-optimal.
> 
> Or let the default change, and vendors who want to preserve the old ABI can
> configure with --with-libstdcxx-lock-policy=mutex to override the default.

I guess I'm not really sure here: normally I'd say we're stuck with the default
being ABI compatible, but I don't know the rules in libstdc++.  I'm assuming
that forcing the default to be mutex could still allow users who want atomic to
configure with --with-libstdcxx-lock-policy=atomic, so at least there's a path
forward.  IIUC users will get link errors when moving between the two flavors
(the explicit template instantiations will have different integer values), so
at least there's a way for distros to make sure they've re-built everything if
they want to change.

I could also imagine a much more complicated third option: essentially
upgrading the mutex to an RW-like lock and allowing the atomic-based routines
to proceed concurrently.  I poked around the locking code a bit have no idea if
this would even be possible, it's all complicated enough that it seems like at
best a bad idea.

I guess this is really a distro sort of question, but I'd lean towards forcing
the default to mutex on RISC-V, thus keeping ABI compatibility.  Then at least
the distros can pick if they want to have a flag day around this.

[Bug libstdc++/84568] libstdc++-v3 configure checks for atomic operations fail on riscv

2022-02-07 Thread palmer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84568

palmer at gcc dot gnu.org changed:

   What|Removed |Added

 CC||palmer at gcc dot gnu.org

--- Comment #3 from palmer at gcc dot gnu.org ---
I'm not sure what the right way to go about fixing this is.

Assuming the inline atomics
<https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104338#c3> work out we'll end up
with libstdc++ using atomics when compiled with newer GCCs and using locks when
compiled with older GCCs.  Those two implementations aren't compatible with
each other, as the direct atomics won't respect the lock.  That certainly means
these two implementations can't coexist, but I'm not actually sure if this is
an ABI break because I don't know what the ABI surface is supposed to be here.

As far as I can tell simple uses cases are safe here, as __exchange_and_add
isn't inlined outside of the shared library (though I'm not seeing anything
guaranteeing that, so I may be wrong here).  Doing something like trying to
mmap a shared_ptr and access from both flavors of the library (maybe one's
statically linked, for example) would also break, and I could imagine that
existing in real code.

It looks like LLVM already has inline atomics, so presumably the same issues
would arise when mixing libstdc++ libraries compiled with LLVM and GCC.

[Bug target/104338] New: RISC-V: Subword atomics result in library calls

2022-02-01 Thread palmer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104338

Bug ID: 104338
   Summary: RISC-V: Subword atomics result in library calls
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: target
  Assignee: unassigned at gcc dot gnu.org
  Reporter: palmer at gcc dot gnu.org
  Target Milestone: ---

There's a handful of bugs sort of related to this one, but nothing specific. 
This has been a long-standing issue and I think folks are generally familiar
with it, but just to summarize: we don't have sub-word atomic instruction in
RISC-V, so we just call out to the libatomic routines.  This causes fallout in
a handful of places (see 86005 and 81358, for example) and there's been some
attempts to resolve it but nothing appears to have stuck.

I figured it'd be a good starter project for Patrick, as he's yet to do any GCC
stuff.  He's working through it and doesn't have anything to post yet, but
figured I'd just open the bug now so folks knew what was going on from our end.

[Bug target/94136] GCC doc for built-in function __builtin___clear_cache() not 100% correct

2021-04-29 Thread palmer at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=94136

palmer at gcc dot gnu.org changed:

   What|Removed |Added

 CC||palmer at gcc dot gnu.org

--- Comment #4 from palmer at gcc dot gnu.org ---
IMO this isn't actually a documentation bug, it's a functionality bug.  I've
sent a patch, sorry it took a while -- I hadn't seen this one, but just
stumbled across Jim's mostly unrelated email and happened to find this bug.

[Bug target/85492] riscv64: endless loop when throwing an exception from a constructor

2018-04-27 Thread palmer at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85492

--- Comment #5 from palmer at gcc dot gnu.org ---
Thanks Jim.  This looks good to me, are you comfortable submitting glibc
patches?  If so then I'll commit it, otherwise I can send it out myself.

[Bug target/85492] riscv64: endless loop when throwing an exception from a constructor

2018-04-26 Thread palmer at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=85492

palmer at gcc dot gnu.org changed:

   What|Removed |Added

 CC||palmer at gcc dot gnu.org
   Assignee|unassigned at gcc dot gnu.org  |palmer at gcc dot 
gnu.org

--- Comment #2 from palmer at gcc dot gnu.org ---
After talking with Jim, I have a suspicion this is a glibc bug -- we changed
some stuff in the symbol resolution path as part of the upstreaming process and
I bet we screwed something up.  I'm going to leave this bug open for now but
investigate.

[Bug target/82717] [RISCV] Default value of the -mabi option doesn't match documentation

2017-10-27 Thread palmer at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82717

palmer at gcc dot gnu.org changed:

   What|Removed |Added

 Status|ASSIGNED|RESOLVED
 Resolution|--- |FIXED

--- Comment #7 from palmer at gcc dot gnu.org ---
Thanks!

[Bug target/82717] [RISCV] Default value of the -mabi option doesn't match documentation

2017-10-27 Thread palmer at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82717

--- Comment #6 from palmer at gcc dot gnu.org ---
Author: palmer
Date: Fri Oct 27 15:22:43 2017
New Revision: 254153

URL: https://gcc.gnu.org/viewcvs?rev=254153=gcc=rev
Log:
RISC-V: Correct and improve the "-mabi" documentation

The documentation for the "-mabi" argument on RISC-V was incorrect.  We
chose to treat this as a documentation bug rather than a code bug, and
to make the documentation match what GCC currently does.  In the
process, I also improved the documentation a bit.

Thanks to Alex Bradbury for finding the bug!

PR target/82717: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82717

gcc/ChangeLog

2017-10-27  Palmer Dabbelt  <pal...@dabbelt.com>

PR target/82717
* doc/invoke.texi (RISC-V) <-mabi>: Correct and improve.

Modified:
trunk/gcc/ChangeLog
trunk/gcc/doc/invoke.texi

[Bug target/82717] [RISCV] Default value of the -mabi option doesn't match documentation

2017-10-25 Thread palmer at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82717

--- Comment #4 from palmer at gcc dot gnu.org ---
(In reply to Alex Bradbury from comment #2)
> (In reply to palmer from comment #1)
> > Thanks Alex -- you're correct that this is a documentation/code mismatch.  I
> > just talked to Andrew and we think it's best to change the documentation. 
> > How does this sound:
> > 
> > """
> > @item -mabi=@var{ABI-string}
> > @opindex mabi
> > Specify integer and floating-point calling convention.  The default for this
> > argument is system dependent, users who want a specific calling convention
> > should specify one explicitly.  The valid calling conventions are: ilp32,
> > ilp32f, ilp32d, lp64, lp64f, and lp64d.
> > """
> 
> I can see how a doc fix probably makes most sense at this point, as the
> behaviour has been shipping in GCC for a while. Although I like the idea of
> less typing it could be that inferring a default ABI from the target -march
> was a little too magic in the first place.

We kicked around the idea of making GCC match the docs, but decided that the
whole reason we have two arguments here is because there's no way to guess at
the ABI from the ISA.

For example, what happens if you pass "-march=rv64imafc" on a Linux system? 
The "use the natural ABI" convention would say you get "-mabi=lp64f", but we
don't build as part of the multilib set of Linux.

> I'd also suggest adding another sentence or two to explain that ilp32 and
> lp64 describe soft-float calling conventions, while the f and d suffixes
> indicate hard single or double precision floating point calling conventions.

"""
@item -mabi=@var{ABI-string}
@opindex mabi
Specify integer and floating-point calling convention.  @var{ABI-string}
contains two parts: the size of integer types and the registers used for
floating-point types.  For example "-march=rv64ifd -mabi=lp64d" means that
"long" and pointers are 64-bit (implicitly defining "int" to be 32-bit), and
that floating-point values up to 64 bits wide are passed in F registers. 
Contrast this with "-march=rv64ifd -mabi=lp64f", which still allows the
compiler to generate code that uses the F and D extensions but only allows
floating-point values up to 32 bits long to be passed in registers; or
"-march=rv64ifd -mabi=lp64", in which no floating-point arguments will be
passed in registers.

The default for this argument is system dependent, users who want a specific
calling convention should specify one explicitly.  The valid calling
conventions are: ilp32, ilp32f, ilp32d, lp64, lp64f, and lp64d.  Some calling
conventions are impossible to implement on some ISAs: for example,
"-march=rv32if -mabi=ilp32d" is invalid because the ABI requires 64-bit values
be passed in F registers, but F registers are only 32 bits wide.
"""

[Bug target/82717] [RISCV] Default value of the -mabi option doesn't match documentation

2017-10-25 Thread palmer at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82717

palmer at gcc dot gnu.org changed:

   What|Removed |Added

 Status|UNCONFIRMED |ASSIGNED
   Last reconfirmed||2017-10-25
 CC||palmer at gcc dot gnu.org
   Assignee|unassigned at gcc dot gnu.org  |palmer at gcc dot 
gnu.org
 Ever confirmed|0   |1

--- Comment #1 from palmer at gcc dot gnu.org ---
Thanks Alex -- you're correct that this is a documentation/code mismatch.  I
just talked to Andrew and we think it's best to change the documentation.  How
does this sound:

"""
@item -mabi=@var{ABI-string}
@opindex mabi
Specify integer and floating-point calling convention.  The default for this
argument is system dependent, users who want a specific calling convention
should specify one explicitly.  The valid calling conventions are: ilp32,
ilp32f, ilp32d, lp64, lp64f, and lp64d.
"""

[Bug target/79912] [7 regression] LRA unable to generate reloads after r245655

2017-03-20 Thread palmer at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79912

--- Comment #18 from palmer at gcc dot gnu.org ---
Author: palmer
Date: Mon Mar 20 16:43:21 2017
New Revision: 246283

URL: https://gcc.gnu.org/viewcvs?rev=246283=gcc=rev
Log:
RISC-V: Don't prefer FP_REGS for integers

On RISC-V we can't store integers in floating-point registers as this is
forbidden by the ISA.  We've always disallowed this, but we were
setting the preferred mode to FP_REGS for some integer modes.  This
caused the LRA to blow up with some hard to read error messages.

This patch removes the prefered mode hook, as the right thing to do here
is nothing.

Thanks to Kito for finding the bug, and mpf for the fix.  See also
<https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79912>.

PR target/79912

Modified:
trunk/gcc/ChangeLog
trunk/gcc/config/riscv/riscv.c