Re: [PATCH] RISC-V: use fclass insns to implement isfinite and isnormal builtins

2024-06-28 Thread Andrew Waterman
+1 to any change that reduces the number of fflags accesses.


On Fri, Jun 28, 2024 at 5:54 PM Vineet Gupta  wrote:
>
> Currently isfinite and isnormal use float compare instructions with FP
> flags saved/restored around them. Our perf team complained this could be
> costly in the uarch. The RV base ISA already has the FCLASS.{d,s,h}
> instructions to do FP compares w/o disturbing the FP exception flags.
>
> Coincidentally, upstream just a few days back got support for the
> corresponding optabs. All that is needed is to wire these up in the
> backend.
>
> I was also hoping to get __builtin_inf() done but unfortunately it
> requires a little more RTL foo/bar to implement a tri-modal return.
>
> Currently going through CI testing.
>
> gcc/ChangeLog:
> * config/riscv/riscv.md: Add UNSPEC_FCLASS, UNSPEC_ISFINITE,
> UNSPEC_ISNORMAL.
> define_insn for fclass.
> define_expand for isfinite and isnormal.
>
> gcc/testsuite/ChangeLog:
> * gcc.target/riscv/fclass.c: New test.
>
> Signed-off-by: Vineet Gupta 
> ---
>  gcc/config/riscv/riscv.md   | 56 +
>  gcc/testsuite/gcc.target/riscv/fclass.c | 18 
>  2 files changed, 74 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/fclass.c
>
> diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
> index ff37125e3f28..fc4441916137 100644
> --- a/gcc/config/riscv/riscv.md
> +++ b/gcc/config/riscv/riscv.md
> @@ -68,6 +68,9 @@
>UNSPEC_FMAX
>UNSPEC_FMINM
>UNSPEC_FMAXM
> +  UNSPEC_FCLASS
> +  UNSPEC_ISFINITE
> +  UNSPEC_ISNORMAL
>
>;; Stack tie
>UNSPEC_TIE
> @@ -3436,6 +3439,59 @@
> (set_attr "mode" "")
> (set (attr "length") (const_int 16))])
>
> +;; fclass instruction output bitmap
> +;;   0 negative infinity
> +;;   1 negative normal number.
> +;;   2 negative subnormal number.
> +;;   3 -0
> +;;   4 +0
> +;;   5 positive subnormal number.
> +;;   6 positive normal number.
> +;;   7 positive infinity
> +;;   8 signaling NaN.
> +;;   9 quiet NaN
> +(define_insn "fclass<mode>"
> +  [(set (match_operand:SI  0 "register_operand" "=r")
> +   (unspec:SI [(match_operand:ANYF 1 "register_operand" " f")]
> +  UNSPEC_FCLASS))]
> +  "TARGET_HARD_FLOAT"
> +  "fclass.<fmt>\t%0,%1"
> +  [(set_attr "type" "fcmp")
> +   (set_attr "mode" "<UNITMODE>")])
> +
> +(define_expand "isfinite<mode>2"
> +  [(set (match_operand:SI  0 "register_operand" "=r")
> +   (unspec:SI [(match_operand:ANYF 1 "register_operand" " f")]
> +  UNSPEC_ISFINITE))]
> +  "TARGET_HARD_FLOAT"
> +{
> +  rtx tmp = gen_reg_rtx (SImode);
> +  emit_insn (gen_fclass<mode> (tmp, operands[1]));
> +  riscv_emit_binary (AND, tmp, tmp, GEN_INT (0x7e));
> +  rtx cmp = gen_rtx_NE (SImode, tmp, const0_rtx);
> +  emit_insn (gen_cstoresi4 (operands[0], cmp, tmp, const0_rtx));
> +
> +  DONE;
> +})
> +
> +;; TODO: isinf is a bit tricky as it requires a trimodal return
> +;;  1 if 0x80, -1 if 0x1, 0 otherwise
> +
> +(define_expand "isnormal<mode>2"
> +  [(set (match_operand:SI  0 "register_operand" "=r")
> +   (unspec:SI [(match_operand:ANYF 1 "register_operand" " f")]
> +  UNSPEC_ISNORMAL))]
> +  "TARGET_HARD_FLOAT"
> +{
> +  rtx tmp = gen_reg_rtx (SImode);
> +  emit_insn (gen_fclass<mode> (tmp, operands[1]));
> +  riscv_emit_binary (AND, tmp, tmp, GEN_INT (0x42));
> +  rtx cmp = gen_rtx_NE (SImode, tmp, const0_rtx);
> +  emit_insn (gen_cstoresi4 (operands[0], cmp, tmp, const0_rtx));
> +
> +  DONE;
> +})
> +
>  (define_insn "*seq_zero_<GPR:mode><X:mode>"
>[(set (match_operand:GPR   0 "register_operand" "=r")
> (eq:GPR (match_operand:X 1 "register_operand" " r")
> diff --git a/gcc/testsuite/gcc.target/riscv/fclass.c 
> b/gcc/testsuite/gcc.target/riscv/fclass.c
> new file mode 100644
> index ..0dfac982ebeb
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/fclass.c
> @@ -0,0 +1,18 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target hard_float } */
> +/* { dg-options "-march=rv64gc -mabi=lp64d  -ftrapping-math" { target { rv64 } } } */
> +/* { dg-options "-march=rv32gc -mabi=ilp32d -ftrapping-math" { target { rv32 } } } */
> +
> +int t_isfinite(double a)
> +{
> +  return __builtin_isfinite(a);
> +}
> +
> +int t_isnormal(double a)
> +{
> +  return __builtin_isnormal(a);
> +}
> +
> +/* { dg-final { scan-assembler-not   {\mfrflags}  } } */
> +/* { dg-final { scan-assembler-not   {\mfsflags}  } } */
> +/* { dg-final { scan-assembler-times {\tfclass} 2 } } */
> --
> 2.34.1
>
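For readers unfamiliar with the masks, here is a small C model of the fclass bitmap from the patch comment above. This is a sketch, not compiler output: the bit assignments are taken on faith from the comment, and the classification is done with the host's fpclassify/signbit.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* Software model of the fclass result bitmap described in the patch
   comment (bit numbers assumed from that comment, not from hardware).  */
static uint32_t fclass_model(double x)
{
    int neg = signbit(x) != 0;
    switch (fpclassify(x)) {
    case FP_INFINITE:  return neg ? 1u << 0 : 1u << 7;
    case FP_NORMAL:    return neg ? 1u << 1 : 1u << 6;
    case FP_SUBNORMAL: return neg ? 1u << 2 : 1u << 5;
    case FP_ZERO:      return neg ? 1u << 3 : 1u << 4;
    default:           return 1u << 9;  /* treat all NaNs as quiet here */
    }
}

/* The expanders AND the fclass result with these masks: 0x7e selects
   bits 1..6 (all finite classes), 0x42 selects bits 1 and 6 (negative
   and positive normal numbers).  */
static int isfinite_model(double x) { return (fclass_model(x) & 0x7e) != 0; }
static int isnormal_model(double x) { return (fclass_model(x) & 0x42) != 0; }
```

Nothing here touches fflags, which is the entire point of preferring fclass over FP compares.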


Re: [PATCH v2 1/3] RISC-V: Add basic Zaamo and Zalrsc support

2024-06-04 Thread Andrew Waterman
On Tue, Jun 4, 2024 at 10:31 AM Patrick O'Neill  wrote:
>
> On 6/3/24 20:00, Kito Cheng wrote:
>
> Hi Patrick:
>
> One dumb question around Zaamo and Zalrsc: could we still get correct
> atomic semantics with only Zaamo or only Zalrsc? I guess Zalrsc-only is
> probably OK, but how about Zaamo-only?
>
> This is a very valid question - AFAIK Zalrsc is always correct and
> Zaamo is _not_ always correct.
>
> We use the mappings present in the PSABI doc when directly emitting
> insns.
>
> LR/SC sequences can approximate atomic insns with a retry loop, so GCC
> will emit valid asm for any 'a' extension usage (patch 3/3 adds this
> support).
>
> Zaamo cannot approximate LR/SC sequences, so GCC emits a libatomic call
> if your code requires LR/SC. This _is_ invalid behavior and is
> discussed here: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86005

Note also there's an old proof that the Zaamo instructions are
insufficient to emulate CAS.  Since LR/SC _is_ sufficient to emulate
CAS, it follows logically that Zaamo is insufficient to emulate LR/SC.
https://cs.brown.edu/~mph/Herlihy91/p124-herlihy.pdf
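To make the split concrete, here is a hedged sketch (not the GCC implementation) of the two builtin shapes the thread is contrasting. On RISC-V, fetch-and-add has a direct AMO encoding, while compare-and-swap does not and needs an LR/SC retry loop, hence the Zaamo-only fallback to libatomic:

```c
#include <assert.h>

/* Lowers to a single Zaamo instruction (amoadd.w) on RISC-V.  */
static int fetch_add(int *p, int inc)
{
    return __atomic_fetch_add(p, inc, __ATOMIC_SEQ_CST);
}

/* No single AMO encoding exists for CAS in the base 'A' extension;
   this expands to an lr.w/sc.w retry loop on Zalrsc targets, or, as
   discussed above, a libatomic call on Zaamo-only targets.  */
static int cas(int *p, int expected, int desired)
{
    __atomic_compare_exchange_n(p, &expected, desired, 0,
                                __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
    return expected;  /* old value, CAS-style (updated on failure) */
}
```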

>
> TLDR: Zaamo can only support AMO ops and will emit libatomic calls for
> LR/SC ops, which is invalid behavior when mixed with atomic
> loads/stores/AMO ops (currently observable on trunk with non-atomic
> targets emitting fenced loads/stores mixed with libatomic calls).
>
> And another question around authorship: I notice you are listed as
> co-author, and it's signed off by Edwin, but according to the mail (and
> the result of git pw patch apply) the main author is you? So I'm just
> curious who the main author is: there's no need to list Co-authored-by
> again if it's you, and the author info needs updating if it's Edwin. I
> know you two are at the same company, so it may not be a big issue, but
> personally I'd like to get the authorship correct if possible :P
>
> Edwin wrote the initial 1/3 patch and I did edits on top of that.
> Authorship got clobbered when I was rebasing. If this revision
> gets approved I'll fix it before merging. Thanks for catching this!
>
> Thanks!
> Patrick
>
> [1] How to update author for single commit:
> https://stackoverflow.com/questions/3042437/how-can-i-change-the-commit-author-for-a-single-commit
>
> On Tue, Jun 4, 2024 at 5:54 AM Patrick O'Neill  wrote:
>
> The A extension has been split into two parts: Zaamo and Zalrsc.
> This patch adds basic support by making the A extension imply Zaamo and
> Zalrsc.
>
> Zaamo/Zalrsc spec: https://github.com/riscv/riscv-zaamo-zalrsc/tags
> Ratification: https://jira.riscv.org/browse/RVS-1995
>
> gcc/ChangeLog:
>
> * common/config/riscv/riscv-common.cc: Add Zaamo and Zalrsc.
> * config/riscv/arch-canonicalize: Make A imply Zaamo and Zalrsc.
> * config/riscv/riscv.opt: Add Zaamo and Zalrsc
> * config/riscv/sync.md: Convert TARGET_ATOMIC to TARGET_ZAAMO and
> TARGET_ZALRSC.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/attribute-15.c: Adjust expected arch string.
> * gcc.target/riscv/attribute-16.c: Ditto.
> * gcc.target/riscv/attribute-17.c: Ditto.
> * gcc.target/riscv/attribute-18.c: Ditto.
> * gcc.target/riscv/pr110696.c: Ditto.
> * gcc.target/riscv/rvv/base/pr114352-1.c: Ditto.
> * gcc.target/riscv/rvv/base/pr114352-3.c: Ditto.
>
> Signed-off-by: Edwin Lu 
> Co-authored-by: Patrick O'Neill 
> ---
>  gcc/common/config/riscv/riscv-common.cc   | 11 +--
>  gcc/config/riscv/arch-canonicalize|  1 +
>  gcc/config/riscv/riscv.opt|  6 +++-
>  gcc/config/riscv/sync.md  | 30 +--
>  gcc/testsuite/gcc.target/riscv/attribute-15.c |  2 +-
>  gcc/testsuite/gcc.target/riscv/attribute-16.c |  2 +-
>  gcc/testsuite/gcc.target/riscv/attribute-17.c |  2 +-
>  gcc/testsuite/gcc.target/riscv/attribute-18.c |  2 +-
>  gcc/testsuite/gcc.target/riscv/pr110696.c |  2 +-
>  .../gcc.target/riscv/rvv/base/pr114352-1.c|  4 +--
>  .../gcc.target/riscv/rvv/base/pr114352-3.c|  8 ++---
>  11 files changed, 41 insertions(+), 29 deletions(-)
>
> diff --git a/gcc/common/config/riscv/riscv-common.cc 
> b/gcc/common/config/riscv/riscv-common.cc
> index 88204393fde..78dfd6b1470 100644
> --- a/gcc/common/config/riscv/riscv-common.cc
> +++ b/gcc/common/config/riscv/riscv-common.cc
> @@ -79,6 +79,9 @@ static const riscv_implied_info_t riscv_implied_info[] =
>{"f", "zicsr"},
>{"d", "zicsr"},
>
> +  {"a", "zaamo"},
> +  {"a", "zalrsc"},
> +
>{"zdinx", "zfinx"},
>{"zfinx", "zicsr"},
>{"zdinx", "zicsr"},
> @@ -255,6 +258,8 @@ static const struct riscv_ext_version 
> riscv_ext_version_table[] =
>{"za64rs",  ISA_SPEC_CLASS_NONE, 1, 0},
>{"za128rs", ISA_SPEC_CLASS_NONE, 1, 0},
>{"zawrs", ISA_SPEC_CLASS_NONE, 1, 0},
> +  {"zaamo", ISA_SPEC_CLASS_NONE, 1, 0},
> +  {"zalrsc", ISA_SPEC_CLASS_NONE, 1, 0},
>
>{"zba", ISA_SPEC_CLASS_NONE, 1, 0},
>{"zbb", ISA_SPEC_CLASS_NONE, 1, 

Re: [PATCH][risc-v] libstdc++: Preserve signbit of nan when converting float to double [PR113578]

2024-05-08 Thread Andrew Waterman
On Tue, May 7, 2024 at 9:46 AM Jonathan Wakely  wrote:
>
> On Tue, 7 May 2024 at 17:39, Jonathan Wakely  wrote:
> >
> > On Tue, 7 May 2024 at 17:33, Jeff Law wrote:
> > >
> > >
> > >
> > > On 5/7/24 9:36 AM, Andreas Schwab wrote:
> > > > On Mai 07 2024, Jonathan Wakely wrote:
> > > >
> > > >> +#ifdef __riscv
> > > >> +return _M_insert(__builtin_copysign((double)__f,
> > > >> +
> > > >> (double)-__builtin_signbit(__f));
> > > >
> > > > Should this use static_cast?
> >
> > Meh. It wouldn't fit in 80 columns any more with static_cast, and it
> > means exactly the same thing.
> >
> > > And it's missing a close paren.
> >
> > Now that's more important! Thanks.
>
> Also, I've just realised that signbit might return a negative value if
> the signbit is set. The spec only says it returns non-zero if the
> signbit is set.
>
> So maybe we want:
>
> #ifdef __riscv
> const int __neg = __builtin_signbit(__f) ? -1 : 0;
> return _M_insert(__builtin_copysign(static_cast<double>(__f),
>   static_cast<double>(__neg)));
> #else
> return _M_insert(static_cast<double>(__f));
> #endif

We can avoid the signbit call altogether by taking advantage of the
fact that type-punning the float to an int, then converting that int
to a double, will produce a double with the sign of the original
value, with no exceptions raised in the process.  (I don't know
whether we're allowed to use std::bit_cast in this context, but a
type-punning memcpy would have the same effect.)

  int __i = std::bit_cast<int>(__f);
  return _M_insert(__builtin_copysign(static_cast<double>(__f),
                                      static_cast<double>(__i)));

Empirically, this saves 3 instructions on RV64 or 1 instruction on
RV32 (as measured on GCC 13.2.0).  Note, I'm not trying to drag-race
on performance here.  Rather, I'm trying to minimize the extent to
which this RISC-V idiosyncrasy results in static code-size bloat.

BTW, I agree with Palmer that adding a __builtin with these semantics
seems advisable if this pattern turns out to recur frequently.


Re: [RFA][RISC-V] Improve constant synthesis for constants with 2 bits set

2024-04-30 Thread Andrew Waterman
On Tue, Apr 30, 2024 at 8:54 PM Jeff Law  wrote:
>
>
> In doing some preparation work for using zbkb's pack instructions for
> constant synthesis I figured it would be wise to get a sense of how well
> our constant synthesis is actually working and address any clear issues.
>
> So the first glaring inefficiency is in our handling of constants with a
> small number of bits set.  Let's start with just two bits set.   There
> are 2016 distinct constants in that space (rv64).  With Zbs enabled the
> absolute worst we should ever do is two instructions (bseti+bseti).  Yet
> we have 503 cases where we're generating 3+ instructions when there's
> just two bits set in the constant.  A constant like 0x80001000
> generates 4 instructions!
>
> This patch adds bseti (and indirectly binvi if we needed it) as a first
> class citizen for constant synthesis.  There's two components to this
> change.
>
> First, we can't generate an IOR with a constant like (1 << 45) as an
> operand.  The IOR/XOR define_insn is in riscv.md.  The constant argument
> for those patterns must match an arith_operand which means its not
> really usable for generating bseti directly in the cases we care about
> (at least one of the bits will be in the 32..63 range and thus won't
> match arith_operand).
>
> We have a few things we could do.  One would be to extend the existing
> pattern to incorporate bseti cases.  But I suspect folks like the
> separation of the base architecture (riscv.md) from the Zb* extensions
> (bitmanip.md).  We could also try to generate the RTL for bseti
> directly, bypassing gen_fmt_ee (which forces undesirable constants into
> registers based on the predicate of the appropriate define_insn).
> Neither of these seemed particularly appealing to me.
>
> So what I've done instead is to make ior/xor a define_expand and have
> the expander allow a wider set of constant operands when Zbs is enabled.
>   That allows us to keep the bulk of Zb* support inside bitmanip.md and
> continue to use gen_fmt_ee in the constant synthesis paths.

Seems like a clean solution to me.

>
> Note the code generation in this case is designed to first set as many
> bits as we can with lui, then with addi since those can both set
> multiple bits at a time.  If there are any residual bits left to set we
> can emit bseti instructions up to the current cost ceiling.
>
> This results in fixing all of the 503 2-bit set cases where we emitted
> too many instructions.  It also significantly helps other scenarios with
> more bits set.
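The example constant from the post makes the win obvious. A C model of bseti (my own illustration, not the patch's code) shows that 0x8000000000001000, which previously took 4 instructions, is just two set bits:

```c
#include <assert.h>
#include <stdint.h>

/* Model of the Zbs bseti instruction: rd = rs1 | (1 << imm).  */
static uint64_t bseti(uint64_t rs1, unsigned imm)
{
    return rs1 | (1ull << (imm & 63));
}

/* 0x8000000000001000 has only bits 63 and 12 set, so starting from
   x0 two bseti instructions synthesize it.  */
static uint64_t synth_example(void)
{
    return bseti(bseti(0, 63), 12);
}
```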
>
> The testcase I'm including verifies the number of instructions we
> generate for the full set of 2016 possible cases.  Obviously this won't
> be possible as we increase the number of bits (there are something like
> 48k cases with just 3 bits set).
>
> Build and regression tested on rv64gc.  OK for the trunk?
>
>
> Thanks,
> Jeff
>


Re: [gcc-15 2/3] RISC-V: avoid LUI based const mat: keep stack offsets aligned

2024-03-19 Thread Andrew Waterman
On Tue, Mar 19, 2024 at 1:05 PM Vineet Gupta  wrote:
>
>
>
> On 3/19/24 06:10, Jeff Law wrote:
> > On 3/19/24 12:48 AM, Andrew Waterman wrote:
> >> On Mon, Mar 18, 2024 at 5:28 PM Vineet Gupta  wrote:
> >>> On 3/16/24 13:21, Jeff Law wrote:
> >>>> |   59944:add s0,sp,2047  <
> >>>> |   59948:mv  a2,a0
> >>>> |   5994c:mv  a3,a1
> >>>> |   59950:mv  a0,sp
> >>>> |   59954:li  a4,1
> >>>> |   59958:lui a1,0x1
> >>>> |   5995c:add s0,s0,1 <---
> >>>> |   59960:jal 59a3c
> >>>>
> >>>> SP here becomes unaligned, even if transitively which is undesirable as
> >>>> well as incorrect:
> >>>>- ABI requires stack to be 8 byte aligned
> >>>>- asm code looks weird and unexpected
> >>>>- to the user it might falsely seem like a compiler bug even when not,
> >>>>  specially when staring at asm for debugging unrelated issue.
> >>>> It's not ideal, but I think it's still ABI compliant as-is.  If it
> >>>> wasn't, then I suspect things like virtual origins in Ada couldn't be
> >>>> made ABI compliant.
> >>> To be clear, are you suggesting ADD sp, sp, 2047 is ABI-compliant?
> >>> I'd still like to avoid it as I'm sure someone will complain about it.
> >>>
> >>>>> With the patch, we get following correct code instead:
> >>>>>
> >>>>> | ..
> >>>>> | 59944: add s0,sp,2032
> >>>>> | ..
> >>>>> | 5995c: add s0,s0,16
> >>>> Alternately you could tighten the positive side of the range of the
> >>>> splitter from patch 1/3 so that you could always use 2032 rather than
> >>>> 2047 on the first addi.   ie instead of allowing 2048..4094, allow
> >>>> 2048..4064.
> >>> 2033..4064 vs. 2048..4094
> >>>
> >>> Yeah I was a bit split about this as well. Since you are OK with either,
> >>> I'll keep them as-is and perhaps add this observation to commitlog.
> >> There's a subset of embedded use cases where an interrupt service
> >> routine continues on the same stack as the interrupted thread,
> >> requiring sp to always have an ABI-compliant value (i.e. 16B aligned,
> >> and with no important data on the stack at an address below sp).
> >>
> >> Although not all use cases care about this property, it seems more
> >> straightforward to maintain the invariant everywhere, rather than
> >> selectively enforce it.
> > Just to be clear, the changes don't misalign the stack pointer at all.
> > They merely have the potential to create *another* pointer into the
> > stack which may or may not be aligned.  Which is totally normal, it's no
> > different than taking the address of a char on the stack.
>
> Right, I never saw any add sp,sp,2047 getting generated - not even in the
> first version of the patch, which lacked any filtering of stack regs via
> riscv_reg_frame_related () and obviously didn't have the stack variant
> of the splitter. I don't know whether that is just luck and insufficient
> testing exposure (I only spot-checked buildroot libc, vmlinux) or
> whether something somewhere enforces it.
>
> However, given that a misaligned pointer off of the stack is a non-issue,
> I think we can do the following:
>
> 1. keep just one splitter with 2047 based predicates and constraint (and
> not 2032) for both stack-related and general regs.
> 2. gate the splitter on only operands[0] being not stack related
> (currently it checks for either [0] or [1]) - this allows the prominent
> case where SP is simply a src, and avoids any potential shenanigans
> to SP itself.

Agreed.  I misread the original code (add s0,sp,2047 looks a lot like
add sp,sp,2047 from a quick glance on a cell phone).

>
> -Vineet


Re: RISC-V: Use convert instructions instead of calling library functions

2024-03-19 Thread Andrew Waterman
On Tue, Mar 19, 2024 at 9:23 AM Palmer Dabbelt  wrote:

> On Mon, 18 Mar 2024 20:50:14 PDT (-0700), jeffreya...@gmail.com wrote:
> >
> >
> > On 3/18/24 3:09 AM, Jivan Hakobyan wrote:
> >> As RV has round instructions it is reasonable to use them instead of
> >> calling the library functions.
> >>
> >> With my patch for the following C code:
> >> double foo(double a) {
> >>  return ceil(a);
> >> }
> >>
> >> GCC generates the following ASM code (before it was tail call)
> >> foo:
> >>  fabs.d  fa4,fa0
> >>  lui a5,%hi(.LC0)
> >>  fld fa3,%lo(.LC0)(a5)
> >>  flt.d   a5,fa4,fa3
> >>  beq a5,zero,.L3
> >>  fcvt.l.d a5,fa0,rup
>
> I'm not sure exactly what context this is in, but my reading of
> "according to the current rounding mode" means we'd usually use the
> dynamic rounding mode.
>

ceil doesn't depend on the current rounding mode; rup is correct.  For
rint, you'd be correct.


>
> >>  fcvt.d.l fa4,a5
> >>  fsgnj.d fa0,fa4,fa0
> >> .L3:
> >>  ret
> >>
> >> .LC0:
> >>  .word   0
> >>  .word   1127219200 // 0x4330
> >>
> >>
> >> The patch I have evaluated on SPEC2017.
> >> Counted dynamic instructions counts and got the following improvements
> >>
> >> 510.parest_r     262 m    -
> >> 511.povray_r     2.1 b    0.04%
> >> 521.wrf_r        269 m    -
> >> 526.blender_r    3 b      0.1%
> >> 527.cam4_r       15 b     0.6%
> >> 538.imagick_r    365 b    7.6%
> >>
> >> Overall executed 385 billion fewer instructions which is 0.5%.
> > A few more notes.
> >
> > The sequence Jivan is using is derived from LLVM.  The condition in the
> > generated code tests for those values that are supposed to pass through
> > unaltered.  The condition in the pattern ensures we do something
> > sensible WRT FE_INEXACT and mirrors how other ports handle these insns.
>
> My only worry here is that when we were doing the other patterns we
> decided not to do rint.  I can't remember exactly why, but from reading
> the docs we probably just skipped it due to the inexact handling and Zfa
> having an instruction that just does this.  FP stuff is always a bit of
> a time sink, so at that point it probably just fell off the priority
> list.
>
> I'm not really an FP guy, so I usually just poke around what the other
> ports generate and try to figure out what's going on.  arm64 has the Zfa
> instruction and x86 FP is complicated, so I'm not sure exactly who else
> to look at for this sort of stuff.  From just looking at the code,
> though, I think there's two issues -- I'm not really an FP person,
> though, so take this with a grain of salt:
>
> IIUC what we've got here doesn't actually set the inexact flag for the
> bounds check failure case


I think the original code was correct.  If you exceed the bounds, then by
construction you know that the input was already an integer, and so not
setting NX is correct.  If you didn't exceed the bounds, then fcvt.l.d will
set NX when appropriate.

, as we're just loading up an exact constant
> and doing the bounds check.  We're also not clamping to INT_MAX-type
> values, but not sure if we're supposed to.  I think we could fix both of
> those by adjusting the expansion to something like
>
>   fabs.d  fa4,fa0
>   lui a5,%hi(.LC0)
>   fld fa3,%lo(.LC0)(a5)
>   flt.d   a5,fa4,fa3
>   bne a5,zero,.L3
>   mv  fa0, fa3
>  .L3:
>   fcvt.l.d a5,fa0,rup
>   fcvt.d.l fa4,a5
>   fsgnj.d fa0,fa4,fa0
>   ret
>
> and then adjusting the constant to be an epsilon larger than INT_MAX so
> it'll still trigger the clamping but also inexact.
>

ceil/rint/etc. are supposed to work for the whole range of their
floating-point type; the range of the integral types isn't supposed to
affect the result.  The original code was correct in this regard, too.


>
> There's also a pair of changes to the ISA in 2020 that added the
> conversion inexact handling requirement, it was a grey area before.  I
> don't remember exactly what happened there, but I did remember it
> happening.  I don't think anyone cares all that much about the
> performance of systems that target the older ISAs, so maybe we just
> restrict the non-libcall expansion to ISAs that contain the new wording?
>

Re: [gcc-15 2/3] RISC-V: avoid LUI based const mat: keep stack offsets aligned

2024-03-19 Thread Andrew Waterman
On Mon, Mar 18, 2024 at 5:28 PM Vineet Gupta  wrote:
>
>
>
> On 3/16/24 13:21, Jeff Law wrote:
> > |   59944:add s0,sp,2047  <
> > |   59948:mv  a2,a0
> > |   5994c:mv  a3,a1
> > |   59950:mv  a0,sp
> > |   59954:li  a4,1
> > |   59958:lui a1,0x1
> > |   5995c:add s0,s0,1 <---
> > |   59960:jal 59a3c
> >
> > SP here becomes unaligned, even if transitively which is undesirable as
> > well as incorrect:
> >   - ABI requires stack to be 8 byte aligned
> >   - asm code looks weird and unexpected
> >   - to the user it might falsely seem like a compiler bug even when not,
> > specially when staring at asm for debugging unrelated issue.
> > It's not ideal, but I think it's still ABI compliant as-is.  If it
> > wasn't, then I suspect things like virtual origins in Ada couldn't be
> > made ABI compliant.
>
> To be clear, are you suggesting ADD sp, sp, 2047 is ABI-compliant?
> I'd still like to avoid it as I'm sure someone will complain about it.
>
> >> With the patch, we get following correct code instead:
> >>
> >> | ..
> >> | 59944: add s0,sp,2032
> >> | ..
> >> | 5995c: add s0,s0,16
> > Alternately you could tighten the positive side of the range of the
> > splitter from patch 1/3 so that you could always use 2032 rather than
> > 2047 on the first addi.   ie instead of allowing 2048..4094, allow
> > 2048..4064.
>
> 2033..4064 vs. 2048..4094
>
> Yeah I was a bit split about this as well. Since you are OK with either,
> I'll keep them as-is and perhaps add this observation to commitlog.

There's a subset of embedded use cases where an interrupt service
routine continues on the same stack as the interrupted thread,
requiring sp to always have an ABI-compliant value (i.e. 16B aligned,
and with no important data on the stack at an address below sp).

Although not all use cases care about this property, it seems more
straightforward to maintain the invariant everywhere, rather than
selectively enforce it.

>
> Thx,
> -Vineet


Re: [PATCH 5/5] RISC-V: Support vmsxx.vx for autovec comparison of vec and imm

2024-03-01 Thread Andrew Waterman
On Fri, Mar 1, 2024 at 4:07 AM Robin Dapp  wrote:
>
> Hi Han,
>
> in addition to what Juzhe mentioned (and that late-combine is going
> to handle such cases) it should be noted that register pressure
> should not be the only consideration here.  Many uarchs have a higher
> latency for register-file-crossing moves.  At least without spilling
> the vv variant is preferable, with spilling it very much depends.

And of course there are uarches for which this is not the case (e.g.
post-commit decoupled vector unit), in which case the .vx and .vf
versions are preferable to the .vv form regardless of vector register
pressure, because they reduce vector regfile access energy (especially
if a splat can be avoided).  So it's a job for -mtune.
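A loop of roughly this shape is where the choice arises (my illustration, not the patch's testcase). With RVV enabled the compare may use the vmslt.vx form, reading the threshold from a scalar register, instead of first splatting it into a vector register for a .vv compare; which form wins is the -mtune question above:

```c
#include <assert.h>

/* Element-wise compare against a scalar: a candidate for vmslt.vx
   under autovectorization (e.g. -march=rv64gcv), versus splat plus
   vmslt.vv.  On the host this is just a scalar loop.  */
static void mask_less_than(const int *a, int threshold,
                           unsigned char *mask, int n)
{
    for (int i = 0; i < n; i++)
        mask[i] = a[i] < threshold;
}
```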

>
>
> Regards
>  Robin
>


Re: [PATCH] RISC-V: Don't make Ztso imply A

2023-12-15 Thread Andrew Waterman
On Fri, Dec 15, 2023 at 1:38 PM Jeff Law  wrote:
>
>
>
> On 12/12/23 20:54, Palmer Dabbelt wrote:
> > I can't actually find anything in the ISA manual that makes Ztso imply
> > A.  In theory the memory ordering is just a different thing that the set
> > of availiable instructions (ie, Ztso without A would still imply TSO for
> > loads and stores).  It also seems like a configuration that could be
> > sane to build: without A it's all but impossible to write any meaningful
> > multi-core code, and TSO is really cheap for a single core.
> >
> > That said, I think it's kind of reasonable to provide A to users asking
> > for Ztso.  So maybe even if this was a mistake it's the right thing to
> > do?
> >
> > gcc/ChangeLog:
> >
> >   * common/config/riscv/riscv-common.cc (riscv_implied_info):
> >   Remove {"ztso", "a"}.
> I'd tend to think step #1 is to determine what the ISA intent is,
> meaning engagement with RVI.
>
> We've got time for that engagement and to adjust based on the result.
> So I'd tend to defer until we know if Ztso should imply A or not.

Palmer is correct.  There is no coupling between Ztso and A.  (And
there are uncontrived examples of such systems: e.g. embedded
processors without caches that don't support the LR/SC instructions,
but happen to be TSO.)

>
> jeff


Re: [PATCH] RISC-V: Fix wrong tune parameters on int_div

2023-10-27 Thread Andrew Waterman
On Fri, Oct 27, 2023 at 6:44 AM Jeff Law  wrote:
>
>
>
> On 10/27/23 01:37, juzhe.zh...@rivai.ai wrote:
> > LGTM from my side.
> >
> > The original integer division COST seems too low.
> Almost certainly, though there may be good reasons why it was initially
> set so low.  I'm generally hesitant to change things like that without
> either someone with knowledge of the code/uarch stepping in with a
> recommendation or some kind of analysis showing they're wrong.

It is probably just a bug, as the cores I'm familiar with mentioned in
the original post do benefit from converting division by constant into
multiplication.  (We should make sure we continue emitting a division
instruction when optimizing for size, though.)
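The transformation whose profitability the int_div cost gates is the classic reciprocal multiply. As a concrete example (standard magic-number construction, not code from the patch): unsigned division by 9 becomes one widening multiply and a shift, since 0x38E38E39 is ceil(2^33 / 9) and the identity holds for every 32-bit dividend:

```c
#include <assert.h>
#include <stdint.h>

/* Division by the constant 9 replaced by multiply + shift, the kind
   of sequence the div cost is compared against.  0x38E38E39 * 9 is
   2^33 + 1, so floor(x * M / 2^33) equals x / 9 for all 32-bit x.  */
static uint32_t div9(uint32_t x)
{
    return (uint32_t)(((uint64_t)x * 0x38E38E39ull) >> 33);
}
```

With int_div costed at only 6, this multi-insn sequence looks more expensive than the divide and never fires, which is the bug being fixed.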

>
> >
> > Hi, Jeff and Kito. Could take a look at this patch ?
> It's on the list.
>
> jeff


Re: [PATCH] RISC-V: Fix wrong tune parameters on int_div

2023-10-27 Thread Andrew Waterman
On Fri, Oct 27, 2023 at 6:55 AM Jeff Law  wrote:
>
>
>
> On 10/27/23 01:49, Robin Dapp wrote:
> >> @@ -346,7 +346,7 @@ static const struct riscv_tune_param rocket_tune_info 
> >> = {
> >> {COSTS_N_INSNS (4), COSTS_N_INSNS (5)},  /* fp_mul */
> >> {COSTS_N_INSNS (20), COSTS_N_INSNS (20)},/* fp_div */
> >> {COSTS_N_INSNS (4), COSTS_N_INSNS (4)},  /* int_mul */
> >> -  {COSTS_N_INSNS (6), COSTS_N_INSNS (6)},   /* int_div */
> >> +  {COSTS_N_INSNS (33), COSTS_N_INSNS (65)}, /* int_div */
> >> 1,   /* issue_rate */
> >> 3,   /* branch_cost */
> >> 5,   /* memory_cost */
> >> @@ -361,7 +361,7 @@ static const struct riscv_tune_param 
> >> sifive_7_tune_info = {
> >> {COSTS_N_INSNS (4), COSTS_N_INSNS (5)},  /* fp_mul */
> >> {COSTS_N_INSNS (20), COSTS_N_INSNS (20)},/* fp_div */
> >> {COSTS_N_INSNS (4), COSTS_N_INSNS (4)},  /* int_mul */
> >> -  {COSTS_N_INSNS (6), COSTS_N_INSNS (6)},   /* int_div */
> >> +  {COSTS_N_INSNS (33), COSTS_N_INSNS (65)}, /* int_div */
> >> 2,   /* issue_rate */
> >> 4,   /* branch_cost */
> >> 3,   /* memory_cost */
> >> @@ -376,7 +376,7 @@ static const struct riscv_tune_param 
> >> thead_c906_tune_info = {
> >> {COSTS_N_INSNS (4), COSTS_N_INSNS (5)}, /* fp_mul */
> >> {COSTS_N_INSNS (20), COSTS_N_INSNS (20)}, /* fp_div */
> >> {COSTS_N_INSNS (4), COSTS_N_INSNS (4)}, /* int_mul */
> >> -  {COSTS_N_INSNS (6), COSTS_N_INSNS (6)}, /* int_div */
> >> +  {COSTS_N_INSNS (18), COSTS_N_INSNS (34)}, /* int_div */
> >> 1,/* issue_rate */
> >> 3,/* branch_cost */
> >> 5,/* memory_cost */
> >
> > Instruction costs don't really correspond to latencies even though
> > sometimes they are used as if they were.  I'm a bit wary of using
> > e.g. 65 which would disparage each use of an integer division inside
> > a sequence.
> >
> > Could you check which costs we need in order to still emit your wanted
> > sequence?  Maybe we can use values a bit lower than yours and still
> > get the proper code.  Where is the decision being made actually?
> The main use of costing of a div/mod instruction is to guide the
> reciprocal division code when dividing by a constant.In that context
> we're comparing costs against a sequence of multiplies, shifts, add/sub
> insns which are almost always costed by their latency.  So using latency
> for division is a reasonable place to start.
>
> The other thing that might be worth investigating for those processors
> would be to set "use_divmod_expansion" in the cost structure.  I've
> heard talk of fusing div/mod into divmod, though I'm not aware of any
> part implementing that fusion

I'm also unaware of existing implementations that fuse these
operations; div + mul + sub is probably best for most uarches...

> (from a prior life, that would seem to
> require a 2nd output port on the integer unit which could be highly
> undesirable).

...but it can be done more cheaply than this, so I wouldn't foreclose
on the possibility.  Nevertheless, future work, as you say.

> Anyway, this could be a followup item for Yangyu if it
> looks profitable.
>
> jeff


Re: [committed] [PR target/93062] RISC-V: Handle long conditional branches for RISC-V

2023-10-11 Thread Andrew Waterman
On Tue, Oct 10, 2023 at 8:26 PM Jeff Law  wrote:
>
>
>
> On 10/10/23 18:24, Andrew Waterman wrote:
> > I remembered another concern since we discussed this patch privately.
> > Using ra for long calls results in a sequence that will corrupt the
> > return-address stack.
> Yup.  We've actually got data on that internally, it's not showing up in
> a significant way in practice.
>
>
> > I know nothing
> > about the complexity of register scavenging, but it would be nice to
> > opportunistically use a scratch register (other than t0), falling back
> > to ra only when necessary.
> The nice thing about making $ra fixed is some can add a register
> scavenging approach, then fall back to $ra if they're unable to find a
> register to reuse.
>
> >
> > Tangentially, I noticed the patch uses `jump label, ra' for far
> > branches but uses `call label' for far jumps.  These corrupt the RAS
> > in opposite ways (the former pops the RAS and the latter pushes it.
> > Any reason for using a different sequence in one than the other?
> I'd noticed it as well -- that's the way it was in the patch that was
> already in Ventana's tree ;-)  My plan was to address that separately
> after dropping in enough infrastructure to allow me to force everything
> to be far branches for testing purposes.

Sounds like we're thinking many of the same thoughts... thanks for
dragging this patch towards the finish line!

>
> jeff


Re: [committed] [PR target/93062] RISC-V: Handle long conditional branches for RISC-V

2023-10-10 Thread Andrew Waterman
I remembered another concern since we discussed this patch privately.
Using ra for long calls results in a sequence that will corrupt the
return-address stack.  Corrupting the RAS is potentially more costly
than mispredicting a branch, since it can result in a cascading
sequence of mispredictions as the program returns up the stack.  Of
course, if these long calls are dynamically quite rare, this isn't the
end of the world.  But it's always preferable to use a register other
than ra or t0 to avoid this performance reduction.  I know nothing
about the complexity of register scavenging, but it would be nice to
opportunistically use a scratch register (other than t0), falling back
to ra only when necessary.

Tangentially, I noticed the patch uses `jump label, ra' for far
branches but uses `call label' for far jumps.  These corrupt the RAS
in opposite ways (the former pops the RAS and the latter pushes it).
Any reason for using a different sequence in one than the other?



On Tue, Oct 10, 2023 at 3:11 PM Jeff Law  wrote:
>
>
> Ventana has had a variant of this patch from Andrew W. in its tree for
> at least a year.   I'm dusting it off and submitting it on Andrew's behalf.
>
> There's multiple approaches we could be using here.
>
> First we could make $ra fixed and use it as the scratch register for the
> long branch sequences.
>
> Second, we could add a match_scratch to all the conditional branch
> patterns and allow the register allocator to assign the scratch register
> from the pool of GPRs.
>
> Third we could do register scavenging.  This can usually work, though it
> can get complex in some scenarios.
>
> Fourth we could use trampolines for extended reach.
>
> Andrew's original patch did a bit of the first approach (make $ra fixed)
> and mostly the second approach.  The net is it was probably the worst in
> terms of impacting code generation -- we lost a register *and* forced
> every branch instruction to get a scratch register allocated.
>
> I had expected the second approach to produce better code than the
> first, but that wasn't actually the case in practice.  It's probably a
> combination of allocating a GPR at every branch point (even with a life
> of a single insn, there's a cost) and perhaps the additional operands on
> conditional branches spoiling simplistic pattern matching in one or more
> passes.
>
> In addition to performing better based on dynamic instruction counts,
> the first approach is significantly simpler to implement.  Given those
> two positives, that's what I've chosen to go with.  Yes it does remove
> $ra from the set of registers available, but the impact of that is *tiny*.
>
> If someone wanted to dive into one of the other approaches to address a
> real world impact, that's great.  If that happens I would strongly
> suggest also evaluating perlbench from spec2017.  It seems particularly
> sensitive to this issue in terms of approach #2's impact on code generation.
>
> I've built & regression tested this variant on the vt1 configuration
> without regressions.  Earlier versions have been bootstrapped as well.
>
> Pushed to the trunk,
>
> Jeff
>


Re: RISC-V sign extension query

2023-09-18 Thread Andrew Waterman via Gcc-patches
Vineet,

Your understanding of the ABI is correct; both int and unsigned int
arguments must already be sign-extended.  The sext.w is semantically
unnecessary; the bltu could correctly reference a1 instead of a6.

Good luck eliminating it!

Andrew


On Mon, Sep 18, 2023 at 12:45 PM Vineet Gupta  wrote:
>
> Hi Jeff, Andrew
>
> I've been looking into redundant sign extension and while there are
> things to be improved in REE, there's something I wanted to confirm
> before heading off into the weeds.
>
> Consider the test below:
>
> int foo(int unused, int n, unsigned y, unsigned delta){
>int s = 0;
>unsigned int x = 0;// if int, sext elided
>    for (; x < n; x += delta)
>      s += x+y;
>return s;
> }
>
> -O2 -march=rv64gc_zba_zbb_zbs
>
> foo2:
>  sext.w a6,a1    # 1
>  beq    a1,zero,.L4
>  li     a5,0
>  li     a0,0
> .L3:
>  addw   a4,a2,a5
>  addw   a5,a3,a5
>  addw   a0,a4,a0
>  bltu   a5,a6,.L3
>  ret
> .L4:
>  li     a0,0
>  ret
>
> I believe the SEXT.W is not semantically needed as a1 is supposed to be
> sign extended already at call site as per psABI [1]. I quote
>
>  "When passed in registers or on the stack, integer scalars narrower
> than XLEN bits are widened according to the sign of their type up to 32
> bits, then sign-extended to XLEN bits"
>
> However currently RISC-V backend thinks otherwise: changing @x to int,
> causes the sign extend to go away. I think both cases should
> behave the same (and not generate SEXT.w) given the ABI clause above.
> Note that this manifests in initial RTL expand itself
> generating/or-not-generating the sign_extend so if it is unnecessary we
> can avoid late fixups in REE.
>
> So the question is for you to confirm whether my conclusions above are correct.
>
> For the cases which do require sign extends, but not being eliminated
> due to "missing definition(s)" I'm working on adapting Ajit's REE ABI
> interfaces work [2] to work for RISC-V as well.
>
> Thx,
> -Vineet
>
> [1]
> https://github.com/riscv-non-isa/riscv-elf-psabi-doc/blob/master/riscv-cc.adoc
> [2] https://gcc.gnu.org/pipermail/gcc-patches/2023-September/630713.html


Re: RISC-V: Replace not + bitwise_imm with li + bitwise_not

2023-09-11 Thread Andrew Waterman via Gcc-patches
Note this is a size-speed tradeoff, as the Zcb extension has a
16-bit-wide C.NOT instruction.  Might want to suppress this
optimization when Zcb is present and the function is being optimized
for size.


On Mon, Sep 11, 2023 at 9:52 AM Jivan Hakobyan via Gcc-patches
 wrote:
>
> In the case when we have C code like this
>
> int foo (int a) {
>return 100 & ~a;
> }
>
> GCC generates the following instruction sequence
>
> foo:
>  not a0,a0
>  andi a0,a0,100
>  ret
>
> This patch replaces that with this sequence
> foo:
>  li a5,100
>  andn a0,a5,a0
>  ret
>
> The profitability comes from an out-of-order processor being able to
> issue the "li a5, 100" at any time after it's fetched while "not a0, a0" has
> to wait until any prior setter of a0 has reached completion.
>
>
> gcc/ChangeLog:
> * config/riscv/bitmanip.md (*_not_const): New split
> pattern.
>
> gcc/testsuite/ChangeLog:
> * gcc.target/riscv/zbb-andn-orn-01.c: New test.
> * gcc.target/riscv/zbb-andn-orn-02.c: Likewise.
>
>
> --
> With the best regards
> Jivan Hakobyan


Re: [RFC PATCH 0/2] RISC-V: __builtin_riscv_pause for all environment

2023-08-13 Thread Andrew Waterman via Gcc-patches
On Sun, Aug 13, 2023 at 12:53 PM Philipp Tomsich
 wrote:
>
> On Sat, 12 Aug 2023 at 01:31, Jeff Law via Gcc-patches
>  wrote:
> >
> >
> >
> > On 8/9/23 16:39, Tsukasa OI wrote:
> > > On 2023/08/10 5:05, Jeff Law wrote:
> >
> > >> I'd tend to think we do not want to expose the intrinsic unless the
> > >> right extensions are enabled -- even though the encoding is a no-op and
> > >> we could emit it as a .insn.
> > >
> > > I think that makes sense.  The only reason I implemented the
> > > no-'Zihintpause' version is because GCC 13 implemented the built-in
> > > unconditionally.  If the compatibility breakage is considered minimal (I
> > > don't know, though), I'm ready to submit 'Zihintpause'-only version of
> > > this patch set.
> > While it's a compatibility break I don't think we have a need to
> > preserve this kind of compatibility.  I suspect anyone using
> > __builtin_riscv_pause was probably already turning on Zihintpause and if
> > they weren't they should have been :-0
> >
> >
> > I'm sure we'll kick this around in the Tuesday meeting and hopefully
> > make a decision about the desired direction.  You're obviously welcome
> > to join if you're inclined.  Let me know if you need an invite.
>
> The original discussion (and I believe that Andrew was the decisive
> voice in the end) came to the conclusion that—given that pause is a
> true hint—it could always be enabled.

I continue to think that, since it's semantically valid to execute a
HINT on any implementation, there's little utility in ever rejecting
the HINT builtins, or in rejecting explicit HINTs in asm, irrespective
of -march.  But ultimately it isn't a big deal either way.

> We had originally expected to enable it only if Zihintpause was part
> of the target architecture, but viewing it as "just a name for an
> already existing pure hint" also made sense.
> Note that on systems that don't implement Zihintpause, the hint is
> guaranteed to not have an architectural effect.
>
> That said, I don't really have a strong leaning one way or another.
> Philipp.


Re: [PATCH] RISC-V: improve codegen for repeating large constants [3]

2023-07-01 Thread Andrew Waterman via Gcc-patches
On Sat, Jul 1, 2023 at 7:04 AM Jeff Law  wrote:
>
>
>
> On 7/1/23 02:00, Andrew Waterman wrote:
>
> >
> > Yeah, that might end up being a false economy for superscalars.
> >
> > In general, I wouldn't recommend spending too many cleverness beans on
> > non-Zba+Zbb implementations.  Going forward, we should expect that
> > even very simple cores provide those extensions.
> I suspect you under-estimate how difficult it is to get the distros to
> move forward on baseline ISAs.

Yeah, true.

>
> jeff


Re: [PATCH] RISC-V: improve codegen for repeating large constants [3]

2023-07-01 Thread Andrew Waterman via Gcc-patches
On Fri, Jun 30, 2023 at 5:36 PM Palmer Dabbelt  wrote:
>
> On Fri, 30 Jun 2023 17:25:54 PDT (-0700), Andrew Waterman wrote:
> > On Fri, Jun 30, 2023 at 5:13 PM Vineet Gupta  wrote:
> >>
> >>
> >>
> >> On 6/30/23 16:50, Andrew Waterman wrote:
> >> > I don't believe this is correct; the subtraction is needed to account
> >> > for the fact that the low part might be negative, resulting in a
> >> > borrow from the high part.  See the output for your test case below:
> >> >
> >> > $ cat test.c
> >> > #include <stdio.h>
> >> >
> >> > int main()
> >> > {
> >> >unsigned long result, tmp;
> >> >
> >> > asm (
> >> >"li  %1,-252645376\n"
> >> >"addi %1,%1,240\n"
> >> >"slli %0,%1,32\n"
> >> >"add %0,%0,%1"
> >> >  : "=r" (result), "=r" (tmp));
> >> >
> >> >printf("%lx\n", result);
> >> >
> >> >return 0;
> >> > }
> >> > $ riscv64-unknown-elf-gcc -O2 test.c
> >> > $ spike pk a.out
> >> > bbl loader
> >> > f0f0f0eff0f0f0f0
> >> > $
> >>
> >> Thx for the quick feedback Andrew. I'm clearly lacking in signed math :-(
> >> So is it possible to have a better code seq for the testcase at all ?
> >
> > You're welcome!
> >
> > When Zba is implemented, then inserting a zext.w would do the trick;
> > see below.  (The generalization is that the zext.w is needed if the
> > 32-bit constant is negative.)  When Zba is not implemented, I think
> > the original sequence is optimal.
> >
> > li  a5, -252645376
> > addi   a5, a5, 240
> > slli   a0, a5, 32
> > zext.w  a5, a5
> > add a0, a0, a5
>
> For the non-Zba case, I think we can leverage the two high parts
> starting out the same to save an instruction generating the constant.
> So for the original code sequence of
>
> li  a5,-252645376
> addi a5,a5,241
> li   a0,-252645376
> slli a5,a5,32
> addi a0,a0,240
> add a0,a5,a0
> ret
>
> we could instead generate
>
> li  a5,-252645376
> addi a0,a5,240
> addi a5,a5,241
> slli a5,a5,32
> add a0,a5,a0
> ret
>
> which IIUC produces the same result.  I think something along the lines
> of this (with the corresponding cost function updates) would do it
>
> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> index de578b5b899..32b6033a966 100644
> --- a/gcc/config/riscv/riscv.cc
> +++ b/gcc/config/riscv/riscv.cc
> @@ -704,7 +704,13 @@ riscv_split_integer (HOST_WIDE_INT val, machine_mode 
> mode)
>rtx hi = gen_reg_rtx (mode), lo = gen_reg_rtx (mode);
>
>riscv_move_integer (hi, hi, hival, mode);
> -  riscv_move_integer (lo, lo, loval, mode);
> +  if (riscv_integer_cost (loval - hival) + 1 < riscv_integer_cost 
> (loval)) {
> +rtx delta = gen_reg_rtx (mode);
> +riscv_move_integer (delta, delta, loval - hival, mode);
> +lo = gen_rtx_fmt_ee (PLUS, mode, hi, delta);
> +  } else {
> +riscv_move_integer (lo, lo, loval, mode);
> +  }
>
>hi = gen_rtx_fmt_ee (ASHIFT, mode, hi, GEN_INT (32));
>hi = force_reg (mode, hi);
>
> though I suppose that would produce a slightly different sequence that has the
> same number of instructions but a slightly longer dependency chain, something
> more like
>
> li  a5,-252645376
> addi a5,a5,241
> addi a0,a5,-1
> slli a5,a5,32
> add a0,a5,a0
> ret
>
> Take that all with a grain of salt, though, as I just ate some very spicy
> chicken and can barely see straight :)

Yeah, that might end up being a false economy for superscalars.

In general, I wouldn't recommend spending too many cleverness beans on
non-Zba+Zbb implementations.  Going forward, we should expect that
even very simple cores provide those extensions.

>
>
> >
> >
> >>
> >> -Vineet
> >>
> >> >
> >> >
> >> > On Fri, Jun 30, 2023 at 4:42 PM Vineet Gupta  
> >> > wrote:
> >> >>
> >> >>
> >> >> On 6/30/23 16:33, Vineet Gupta wrote:
> >> >>> Ran into a minor snafu in const splitting code when playing with test
> >> &

Re: [PATCH] RISC-V: improve codegen for repeating large constants [3]

2023-06-30 Thread Andrew Waterman via Gcc-patches
On Fri, Jun 30, 2023 at 5:13 PM Vineet Gupta  wrote:
>
>
>
> On 6/30/23 16:50, Andrew Waterman wrote:
> > I don't believe this is correct; the subtraction is needed to account
> > for the fact that the low part might be negative, resulting in a
> > borrow from the high part.  See the output for your test case below:
> >
> > $ cat test.c
> > #include <stdio.h>
> >
> > int main()
> > {
> >unsigned long result, tmp;
> >
> > asm (
> >"li  %1,-252645376\n"
> >"addi %1,%1,240\n"
> >"slli %0,%1,32\n"
> >"add %0,%0,%1"
> >  : "=r" (result), "=r" (tmp));
> >
> >printf("%lx\n", result);
> >
> >return 0;
> > }
> > $ riscv64-unknown-elf-gcc -O2 test.c
> > $ spike pk a.out
> > bbl loader
> > f0f0f0eff0f0f0f0
> > $
>
> Thx for the quick feedback Andrew. I'm clearly lacking in signed math :-(
> So is it possible to have a better code seq for the testcase at all ?

You're welcome!

When Zba is implemented, then inserting a zext.w would do the trick;
see below.  (The generalization is that the zext.w is needed if the
32-bit constant is negative.)  When Zba is not implemented, I think
the original sequence is optimal.

li  a5, -252645376
addi    a5, a5, 240
slli    a0, a5, 32
zext.w  a5, a5
add a0, a0, a5


>
> -Vineet
>
> >
> >
> > On Fri, Jun 30, 2023 at 4:42 PM Vineet Gupta  wrote:
> >>
> >>
> >> On 6/30/23 16:33, Vineet Gupta wrote:
> >>> Ran into a minor snafu in const splitting code when playing with test
> >>> case from an old PR/23813.
> >>>
> >>>long long f(void) { return 0xF0F0F0F0F0F0F0F0ull; }
> >>>
> >>> This currently generates
> >>>
> >>>li  a5,-252645376
> >>>addi a5,a5,241
> >>>li   a0,-252645376
> >>>slli a5,a5,32
> >>>addi a0,a0,240
> >>>add a0,a5,a0
> >>>ret
> >>>
> >>> The signed math in hival extraction introduces an additional bit,
> >>> causing loval == hival check to fail.
> >>>
> >>> | riscv_split_integer (val=-1085102592571150096, mode=E_DImode) at 
> >>> ../gcc/config/riscv/riscv.cc:702
> >>> | 702   unsigned HOST_WIDE_INT loval = sext_hwi (val, 32);
> >>> | (gdb)n
> >>> | 703   unsigned HOST_WIDE_INT hival = sext_hwi ((val - loval) >> 32, 32);
> >>> | (gdb)
> >> FWIW (and I missed adding this observation to the changelog) I pondered
> >> about using unsigned loval/hival with zext_hwi() but that in certain
> >> cases can cause additional insns
> >>
> >> e.g. constant 0x8000_ is codegen to LI 1 +SLLI 31 vs, LI
> >> 0x_8000
> >>
> >>
> >>> | 704   rtx hi = gen_reg_rtx (mode), lo = gen_reg_rtx (mode);
> >>> | (gdb) p/x val
> >>> | $2 = 0xf0f0f0f0f0f0f0f0
> >>> | (gdb) p/x loval
> >>> | $3 = 0xf0f0f0f0
> >>> | (gdb) p/x hival
> >>> | $4 = 0xf0f0f0f1
> >>>  ^^^
> >>> Fix that by eliding the subtraction in shift.
> >>>
> >>> With patch:
> >>>
> >>>li  a5,-252645376
> >>>addi a5,a5,240
> >>>slli a0,a5,32
> >>>add a0,a0,a5
> >>>ret
> >>>
> >>> gcc/ChangeLog:
> >>>
> >>>* config/riscv/riscv.cc (riscv_split_integer): hival computation
> >>>  do elide subtraction of loval.
> >>>* (riscv_split_integer_cost): Ditto.
> >>>* (riscv_build_integer): Ditto
> >>>
> >>> Signed-off-by: Vineet Gupta 
> >>> ---
> >>> I wasn't planning to do any more work on large const stuff, but just ran 
> >>> into it
> >>> on a random BZ entry when trying to search for redundant constant stuff.
> >>> The test seemed too good to pass :-)
> >>> ---
> >>>gcc/config/riscv/riscv.cc | 6 +++---
> >>>1 file changed, 3 insertions(+), 3 deletions(-)
> >>>
> >>> diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> >>> index 5ac187c1b1b4..377d3aac794b 100644
> >>> --- a/gcc/config/riscv/riscv.cc
> >>> +++ b/gcc/config/riscv/riscv.cc

Re: [PATCH] RISC-V: improve codegen for repeating large constants [3]

2023-06-30 Thread Andrew Waterman via Gcc-patches
I don't believe this is correct; the subtraction is needed to account
for the fact that the low part might be negative, resulting in a
borrow from the high part.  See the output for your test case below:

$ cat test.c
#include <stdio.h>

int main()
{
  unsigned long result, tmp;

asm (
  "li  %1,-252645376\n"
  "addi %1,%1,240\n"
  "slli %0,%1,32\n"
  "add %0,%0,%1"
: "=r" (result), "=r" (tmp));

  printf("%lx\n", result);

  return 0;
}
$ riscv64-unknown-elf-gcc -O2 test.c
$ spike pk a.out
bbl loader
f0f0f0eff0f0f0f0
$


On Fri, Jun 30, 2023 at 4:42 PM Vineet Gupta  wrote:
>
>
>
> On 6/30/23 16:33, Vineet Gupta wrote:
> > Ran into a minor snafu in const splitting code when playing with test
> > case from an old PR/23813.
> >
> >   long long f(void) { return 0xF0F0F0F0F0F0F0F0ull; }
> >
> > This currently generates
> >
> >   li  a5,-252645376
> >   addi a5,a5,241
> >   li   a0,-252645376
> >   slli a5,a5,32
> >   addi a0,a0,240
> >   add a0,a5,a0
> >   ret
> >
> > The signed math in hival extraction introduces an additional bit,
> > causing loval == hival check to fail.
> >
> > | riscv_split_integer (val=-1085102592571150096, mode=E_DImode) at 
> > ../gcc/config/riscv/riscv.cc:702
> > | 702   unsigned HOST_WIDE_INT loval = sext_hwi (val, 32);
> > | (gdb)n
> > | 703   unsigned HOST_WIDE_INT hival = sext_hwi ((val - loval) >> 32, 32);
> > | (gdb)
>
> FWIW (and I missed adding this observation to the changelog) I pondered
> about using unsigned loval/hival with zext_hwi() but that in certain
> cases can cause additional insns
>
> e.g. constant 0x8000_ is codegen to LI 1 +SLLI 31 vs, LI
> 0x_8000
>
>
> > | 704   rtx hi = gen_reg_rtx (mode), lo = gen_reg_rtx (mode);
> > | (gdb) p/x val
> > | $2 = 0xf0f0f0f0f0f0f0f0
> > | (gdb) p/x loval
> > | $3 = 0xf0f0f0f0
> > | (gdb) p/x hival
> > | $4 = 0xf0f0f0f1
> > ^^^
> > Fix that by eliding the subtraction in shift.
> >
> > With patch:
> >
> >   li  a5,-252645376
> >   addi a5,a5,240
> >   slli a0,a5,32
> >   add a0,a0,a5
> >   ret
> >
> > gcc/ChangeLog:
> >
> >   * config/riscv/riscv.cc (riscv_split_integer): hival computation
> > do elide subtraction of loval.
> >   * (riscv_split_integer_cost): Ditto.
> >   * (riscv_build_integer): Ditto
> >
> > Signed-off-by: Vineet Gupta 
> > ---
> > I wasn't planning to do any more work on large const stuff, but just ran 
> > into it
> > on a random BZ entry when trying to search for redundant constant stuff.
> > The test seemed too good to pass :-)
> > ---
> >   gcc/config/riscv/riscv.cc | 6 +++---
> >   1 file changed, 3 insertions(+), 3 deletions(-)
> >
> > diff --git a/gcc/config/riscv/riscv.cc b/gcc/config/riscv/riscv.cc
> > index 5ac187c1b1b4..377d3aac794b 100644
> > --- a/gcc/config/riscv/riscv.cc
> > +++ b/gcc/config/riscv/riscv.cc
> > @@ -643,7 +643,7 @@ riscv_build_integer (struct riscv_integer_op *codes, 
> > HOST_WIDE_INT value,
> > && (value > INT32_MAX || value < INT32_MIN))
> >   {
> > unsigned HOST_WIDE_INT loval = sext_hwi (value, 32);
> > -  unsigned HOST_WIDE_INT hival = sext_hwi ((value - loval) >> 32, 32);
> > +  unsigned HOST_WIDE_INT hival = sext_hwi (value >> 32, 32);
> > struct riscv_integer_op alt_codes[RISCV_MAX_INTEGER_OPS];
> > struct riscv_integer_op hicode[RISCV_MAX_INTEGER_OPS];
> > int hi_cost, lo_cost;
> > @@ -674,7 +674,7 @@ riscv_split_integer_cost (HOST_WIDE_INT val)
> >   {
> > int cost;
> > unsigned HOST_WIDE_INT loval = sext_hwi (val, 32);
> > -  unsigned HOST_WIDE_INT hival = sext_hwi ((val - loval) >> 32, 32);
> > +  unsigned HOST_WIDE_INT hival = sext_hwi (val >> 32, 32);
> > struct riscv_integer_op codes[RISCV_MAX_INTEGER_OPS];
> >
> > cost = 2 + riscv_build_integer (codes, loval, VOIDmode);
> > @@ -700,7 +700,7 @@ static rtx
> >   riscv_split_integer (HOST_WIDE_INT val, machine_mode mode)
> >   {
> > unsigned HOST_WIDE_INT loval = sext_hwi (val, 32);
> > -  unsigned HOST_WIDE_INT hival = sext_hwi ((val - loval) >> 32, 32);
> > +  unsigned HOST_WIDE_INT hival = sext_hwi (val >> 32, 32);
> > rtx hi = gen_reg_rtx (mode), lo = gen_reg_rtx (mode);
> >
> > riscv_move_integer (lo, lo, loval, mode);
>


Re: [PATCH] RISC-V: Synthesize power-of-two constants.

2023-05-30 Thread Andrew Waterman via Gcc-patches
This turns out to be a de-optimization for implementations with any
amount of temporal execution (which is most machines with LMUL > 1 and
even some machines with LMUL <= 1).  Scalar instructions are generally
cheaper than multi-cycle-occupancy vector operations, so reducing
scalar work by increasing vector work is normally not a good tradeoff.
(And even if the vector instruction has unit occupancy, it likely
burns a bit more energy.)  The best generic scheme to load 143 into
all elements of a vector register is to first load 143 into a scalar
register, then use vmv.v.x.  If the proposed scheme is profitable on
some implementations in some circumstances, it should probably be
enabled only when tuning for that implementation.


On Tue, May 30, 2023 at 12:14 PM Robin Dapp via Gcc-patches
 wrote:
>
> Hi,
>
> I figured I'd send this patch that I quickly hacked together some
> days back.  It's likely going to be controversial because we don't
> have vector costs in place at all yet and even with costs it's
> probably debatable as the emitted sequence is longer :)
> I'm willing to defer or ditch it altogether but as it's small and
> localized why not at least discuss it quickly.
>
> For immediates that are powers of two, instead of loading them into a
> GPR and then broadcasting (incurring the scalar-vector latency) we
> can synthesize them with a vmv.vi and a vsll.v.i.  Depending on actual
> costs we could also add more complicated synthesis patterns in the
> future.
>
> Regards
>  Robin
>
> gcc/ChangeLog:
>
> * config/riscv/riscv-selftests.cc (run_const_vector_selftests):
> Adjust expectation.
> * config/riscv/riscv-v.cc (expand_const_vector): Synthesize
> power-of-two constants.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/rvv/autovec/vmv-imm-fixed-rv32.c: Adjust test
> expectation.
> * gcc.target/riscv/rvv/autovec/vmv-imm-fixed-rv64.c: Dito.
> * gcc.target/riscv/rvv/autovec/vmv-imm-rv32.c: Dito.
> * gcc.target/riscv/rvv/autovec/vmv-imm-rv64.c: Dito.
> ---
>  gcc/config/riscv/riscv-selftests.cc   |  9 +-
>  gcc/config/riscv/riscv-v.cc   | 31 +++
>  .../riscv/rvv/autovec/vmv-imm-fixed-rv32.c|  5 +--
>  .../riscv/rvv/autovec/vmv-imm-fixed-rv64.c|  5 +--
>  .../riscv/rvv/autovec/vmv-imm-rv32.c  |  5 +--
>  .../riscv/rvv/autovec/vmv-imm-rv64.c  |  5 +--
>  6 files changed, 51 insertions(+), 9 deletions(-)
>
> diff --git a/gcc/config/riscv/riscv-selftests.cc 
> b/gcc/config/riscv/riscv-selftests.cc
> index 1bf1a648fa1..21fa460bb1f 100644
> --- a/gcc/config/riscv/riscv-selftests.cc
> +++ b/gcc/config/riscv/riscv-selftests.cc
> @@ -259,9 +259,16 @@ run_const_vector_selftests (void)
>   rtx_insn *insn = get_last_insn ();
>   rtx src = XEXP (SET_SRC (PATTERN (insn)), 1);
>   /* 1. Should be vmv.v.i for in rang of -16 ~ 15.
> -2. Should be vmv.v.x for exceed -16 ~ 15.  */
> +2. For 16 (and appropriate higher powers of two)
> +   expect a shift because we emit a
> +   vmv.v.i v1, 8 and a
> +   vsll.v.i v1, v1, 1.
> +3. Should be vmv.v.x for everything else.  */
>   if (IN_RANGE (val, -16, 15))
> ASSERT_TRUE (rtx_equal_p (src, dup));
> + else if (IN_RANGE (val, 16, 16))
> +   ASSERT_TRUE (GET_CODE (src) == ASHIFT
> +&& INTVAL (XEXP (src, 1)) == 1);
>   else
> ASSERT_TRUE (
>   rtx_equal_p (src,
> diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
> index b381970140d..b295a48bb9d 100644
> --- a/gcc/config/riscv/riscv-v.cc
> +++ b/gcc/config/riscv/riscv-v.cc
> @@ -560,6 +560,7 @@ expand_const_vector (rtx target, rtx src)
>rtx elt;
> >if (const_vec_duplicate_p (src, &elt))
>  {
> +  HOST_WIDE_INT val = INTVAL (elt);
>rtx tmp = register_operand (target, mode) ? target : gen_reg_rtx 
> (mode);
>/* Element in range -16 ~ 15 integer or 0.0 floating-point,
>  we use vmv.v.i instruction.  */
> @@ -568,6 +569,36 @@ expand_const_vector (rtx target, rtx src)
>   rtx ops[] = {tmp, src};
>   emit_vlmax_insn (code_for_pred_mov (mode), RVV_UNOP, ops);
> }
> +  /* If we can reach the immediate by loading an immediate and shifting,
> +assume this is cheaper than loading a scalar.
> +A power-of-two value > 15 cannot be loaded with vmv.v.i but we can
> +load 8 into a vector register and shift it.  */
> +  else if (val > 15 && wi::popcount (val) == 1
> +  && exact_log2 (val) - 3 /* exact_log2 (8)  */
> +  <= 15)
> +   {
> + /* We could also allow shifting an immediate and adding
> +another one if VAL is suitable.
> +This would allow us to synthesize constants like
> +

Re: [RFC] RISC-V: Add proposed Ztso atomic mappings

2023-05-05 Thread Andrew Waterman via Gcc-patches
On Fri, May 5, 2023 at 2:42 PM Hans Boehm  wrote:
>
> I think A.6-tso also needs to change the last line in the table from lr.aqrl 
> ... sc to lr.aq ... sc.rl, otherwise I think we have problems with a 
> subsequent A.7-tso generated l.aq . Otherwise I agree.
>
> I certainly agree that, given the Ztso extension, there should be a standard 
> compiler-implemented mapping that leverages it. I'm personally much less 
> enthusiastic about calling it an ABI. I'd like to see clarity that the RVWMO 
> ABI is the standard we expect portable libraries to be prepared to use.

There's already a ratified sentiment that effectively implies this.
Ztso is not required by the RVA profiles, and so it follows that any
binary that's compatible across RVA-profile implementations cannot
assume the presence of Ztso.  (I agree the ABI should encode this
property for de jure purposes, too, but it's already a de facto
requirement.)

> If they want to test for and use Ztso internally, fine. But having users deal 
> with two different ABIs seems like a very high cost for avoiding some 
> (basically no-op?) fences.
>
> Hans
>
>
>
> On Fri, May 5, 2023 at 1:11 PM Andrea Parri  wrote:
>>
>> On Fri, May 05, 2023 at 12:18:12PM -0700, Palmer Dabbelt wrote:
>> > On Fri, 05 May 2023 11:55:31 PDT (-0700), Andrea Parri wrote:
>> > > On Fri, May 05, 2023 at 10:12:56AM -0700, Patrick O'Neill wrote:
>> > > > The RISC-V Ztso extension currently has no effect on generated code.
>> > > > With the additional ordering constraints guarenteed by Ztso, we can 
>> > > > emit
>> > > > more optimized atomic mappings than the RVWMO mappings.
>> > > >
>> > > > This patch implements Andrea Parri's proposed Ztso mappings ("Proposed
>> > > > Mapping").
>> > > >   
>> > > > https://github.com/preames/public-notes/blob/master/riscv-tso-mappings.rst
>> > > >
>> > > > LLVM has implemented this same mapping (Ztso is still behind a
>> > > > experimental flag in LLVM, so there is *not* a defined ABI for this 
>> > > > yet).
>> > > >   https://reviews.llvm.org/D143076
>> > >
>> > > Given the recent patches/discussions, it seems worth pointing out the
>> > > the Ztso mappings referred to above was designed to be compatible with
>> > > the mappings in Table A.6 and that they are _not_ compatible with the
>> > > mappings in Table A.7 or with a "subset" of A.7 (even assuming RVTSO).
>> >
>> > I guess that brings up the question of what we should do about WMO/TSO
>> > compatibility.  IIUC the general plan has been that WMO binaries would be
>> > compatible with TSO binaries when run on TSO systems, and that TSO binaries
>> > would require TSO systems.
>> >
>> > I suppose it would be possible to have TSO produce binaries that would run
>> > on WMO systems by just emitting a bunch of extra fences, but I don't think
>> > anyone wants that?
>> >
>> > We've always just assumed that WMO binaries would be compatible with TSO
>> > binaries, but I don't think it's ever really been concretely discussed.
>> > Having an ABI break here wouldn't be the craziest idea as it'd let us fix
>> > some other issues, but that'd certainly need to be pretty widely discussed.
>> >
>> > Do we have an idea of what A.7-compatible TSO mappings would look like?
>>
>> As in riscv-tso-mappings.rst but with
>>
>>   atomic_store(memory_order_seq_cst)  |  s{b|h|w|d} ; fence rw,rw
>>
>> would be A.7-compatible: call the resulting mappings "A.6-tso".
>>
>> A.6-tso is (also) compatible with the following subset of A.7:
>>
>> C/C++ Construct | A.7-tso Mapping
>> --
>> Non-atomic load | l{b|h|w|d}
>> atomic_load(memory_order_relaxed)| l{b|h|w|d}
>> atomic_load(memory_order_acquire)   | l{b|h|w|d}
>> atomic_load(memory_order_seq_cst)   | l{b|h|w|d}.aq
>> --
>> Non-atomic store| s{b|h|w|d}
>> atomic_store(memory_order_relaxed)  | s{b|h|w|d}
>> atomic_store(memory_order_release)  | s{b|h|w|d}
>> atomic_store(memory_order_seq_cst)  | s{b|h|w|d}.rl
>> --
>> atomic_thread_fence(memory_order_acquire)   | nop
>> atomic_thread_fence(memory_order_release)   | nop
>> atomic_thread_fence(memory_order_acq_rel)   | nop
>> atomic_thread_fence(memory_order_seq_cst)   | fence rw,rw
>> --
>> C/C++ Construct | RVTSO AMO Mapping
>> atomic_<op>(memory_order_relaxed)   | amo<op>.{w|d}
>> atomic_<op>(memory_order_acquire)   | amo<op>.{w|d}
>> atomic_<op>(memory_order_release)   | amo<op>.{w|d}
>> atomic_<op>(memory_order_acq_rel)   | amo<op>.{w|d}
>> atomic_<op>(memory_order_seq_cst)   | amo<op>.{w|d}

Re: [PATCH] RISC-V: Allow RVV VMS{Compare}(V1, V1) simplify to VMSET

2023-04-29 Thread Andrew Waterman via Gcc-patches
On Sat, Apr 29, 2023 at 8:06 AM Jeff Law via Gcc-patches <
gcc-patches@gcc.gnu.org> wrote:
>
>
>
> On 4/28/23 20:55, Li, Pan2 wrote:
> > Thanks Jeff for comments.
> >
> > It makes sense to me. For the EQ operator we should have CONSTM1.
> That's not the way I interpret the RVV documentation.  Of course it's
> not terribly clear.I guess one could do some experiments with qemu
> or try to dig into the sail code and figure out the intent from those.
>
>
>
> Does this mean s390 parts have a similar issue here? Then for instructions
> like VMSEQ, we need to adjust the simplify_rtx up to a point.
> You'd have to refer to the s390 instruction set reference to understand
> precisely how the vector compares work.
>
> But as it stands this really isn't a simplify-rtx question, but a
> question of the semantics of risc-v.   What happens with the high bits
> in the destination mask register is critical -- and if risc-v doesn't
> set them to all ones in this case, then that would mean that defining
> that macro is simply wrong for risc-v.

The relevant statement in the spec is that "the tail elements are always
updated with a tail-agnostic policy".  The vmset.m instruction will cause
mask register bits [0, vl-1] to be set to 1; elements [vl, VLMAX-1] will
either be undisturbed or set to 1, i.e., effectively unspecified.

>
> jeff


Re: [PATCH v1] [RFC] Improve folding for comparisons with zero in tree-ssa-forwprop.

2023-03-17 Thread Andrew Waterman via Gcc-patches
On Fri, Mar 17, 2023 at 6:16 AM Philipp Tomsich
 wrote:
>
> On Fri, 17 Mar 2023 at 09:31, Richard Biener  
> wrote:
> >
> > On Thu, Mar 16, 2023 at 4:27 PM Manolis Tsamis  
> > wrote:
> > >
> > > For this C testcase:
> > >
> > > void g();
> > > void f(unsigned int *a)
> > > {
> > >   if (++*a == 1)
> > > g();
> > > }
> > >
> > > GCC will currently emit a comparison with 1 by using the value
> > > of *a after the increment. This can be improved by comparing
> > > against 0 and using the value before the increment. As a result
> > > there is a potentially shorter dependency chain (no need to wait
> > > for the result of +1) and on targets with compare zero instructions
> > > the generated code is one instruction shorter.
> >
> > The downside is we now need two registers and their lifetime overlaps.
> >
> > Your patch mixes changing / inverting a parameter (which seems unneeded
> > for the actual change) with preferring compares against zero.
> >
> > What's the reason to specifically prefer compares against zero?  On x86
> > we have add that sets flags, so ++*a == 0 would be preferred, but
> > for your sequence we'd need a test reg, reg; branch on zero, so we do
> > not save any instruction.
>
> AArch64, RISC-V and MIPS support a branch-on-(not-)equals-zero, while
> comparing against a constant requires to load any non-zero value into
> a register first.

Equality comparisons against 0 are also slightly cheaper than equality
comparisons against 1 on x86, though it's a code-size difference, not
an instruction-count difference.  Not sure this changes the
story--just pointing out that this optimization might be slightly more
generally applicable than it initially seems.

>
> This feels a bit like we need to call onto the backend to check
> whether comparisons against 0 are cheaper.
>
> Obviously, the underlying issue become worse if the immediate can not
> be built up in a single instruction.
> Using RISC-V as an example (primarily, as RISC-V makes it particularly
> easy to run into multi-instruction sequences for constants), we can
> construct the following case:
>
>   void f(unsigned int *a)
>   {
> if ((*a += 0x900) == 0x900)
>g();
>   }
>
> which GCC 12.2.0 (trunk may already be smart enough to reuse the
> constant once loaded into register, but I did not check…) with -O3
> turns into:
>
> f:
>   lw a4,0(a0)
>   li a5,4096
>   addiw a5,a5,-1792
>   addw a4,a5,a4
>   li a5,4096
>   sw a4,0(a0)
>   addi a5,a5,-1792
>   beq a4,a5,.L4
>   ret
> .L4:
>   tail g
>
> Thanks,
> Philipp.
>
>
> On Fri, 17 Mar 2023 at 09:31, Richard Biener  
> wrote:
> >
> > On Thu, Mar 16, 2023 at 4:27 PM Manolis Tsamis  
> > wrote:
> > >
> > > For this C testcase:
> > >
> > > void g();
> > > void f(unsigned int *a)
> > > {
> > >   if (++*a == 1)
> > > g();
> > > }
> > >
> > > GCC will currently emit a comparison with 1 by using the value
> > > of *a after the increment. This can be improved by comparing
> > > against 0 and using the value before the increment. As a result
> > > there is a potentially shorter dependency chain (no need to wait
> > > for the result of +1) and on targets with compare zero instructions
> > > the generated code is one instruction shorter.
> >
> > The downside is we now need two registers and their lifetime overlaps.
> >
> > Your patch mixes changing / inverting a parameter (which seems unneeded
> > for the actual change) with preferring compares against zero.
> >
> > What's the reason to specifically prefer compares against zero?  On x86
> > we have add that sets flags, so ++*a == 0 would be preferred, but
> > for your sequence we'd need a test reg, reg; branch on zero, so we do
> > not save any instruction.
> >
> > We do have quite some number of bugreports with regards to making VRPs
> > life harder when splitting things this way.  It's easier for VRP to handle
> >
> >   _1 = _2 + 1;
> >   if (_1 == 1)
> >
> > than it is
> >
> >   _1 = _2 + 1;
> >   if (_2 == 0)
> >
> > where VRP fails to derive a range for _1 on the _2 == 0 branch.  So besides
> > the life-range issue there's other side-effects as well.  Maybe ranger 
> > meanwhile
> > can handle the above case?
> >
> > What's the overall effect of the change on a larger code base?
> >
> > Thanks,
> > Richard.
> >
> > >
> > > Example from Aarch64:
> > >
> > > Before
> > > ldr w1, [x0]
> > > add w1, w1, 1
> > > str w1, [x0]
> > > cmp w1, 1
> > > beq .L4
> > > ret
> > >
> > > After
> > > ldr w1, [x0]
> > > add w2, w1, 1
> > > str w2, [x0]
> > > cbz w1, .L4
> > > ret
> > >
> > > gcc/ChangeLog:
> > >
> > > * tree-ssa-forwprop.cc (combine_cond_expr_cond):
> > > (forward_propagate_into_comparison_1): Optimize
> > > for zero comparisons.
> > >
> > > Signed-off-by: Manolis Tsamis 
> > > ---
> > >
> > >  gcc/tree-ssa-forwprop.cc | 41 +++-
> > >  1 file changed, 

Re: RISC-V: Add divmod instruction support

2023-02-20 Thread Andrew Waterman via Gcc-patches
On Sat, Feb 18, 2023 at 1:30 PM Palmer Dabbelt  wrote:
>
> On Sat, 18 Feb 2023 13:06:02 PST (-0800), jeffreya...@gmail.com wrote:
> >
> >
> > On 2/18/23 11:26, Palmer Dabbelt wrote:
> >> On Fri, 17 Feb 2023 06:02:40 PST (-0800), gcc-patches@gcc.gnu.org wrote:
> >>> Hi all,
> >>> If we have division and remainder calculations with the same operands:
> >>>
> >>>   a = b / c;
> >>>   d = b % c;
> >>>
> >>> We can replace the calculation of remainder with multiplication +
> >>> subtraction, using the result from the previous division:
> >>>
> >>>   a = b / c;
> >>>   d = a * c;
> >>>   d = b - d;
> >>>
> >>> Which will be faster.
> >>
> >> Do you have any benchmarks that show that performance increase?  The ISA
> >> manual specifically says the suggested sequence is div+mod, and while
> >> those suggestions don't always pan out for real hardware it's likely
> >> that at least some implementations will end up with the ISA-suggested
> >> fusions.
> > It'll almost certainly be visible in mcf.  Been there, done that.  In
> > fact, that's why I asked the team Matevos works on to poke at this case
> > as I went through this issue on another processor.
> >
> > It can also be run through LLVM's MCA to estimate counts if you've got a
> > pipeline description.  THe div+rem will come out at around ~40c while a
> > div+mul+sub should weigh in around 25c for Veyron v1.
>
> Do you have a link to the patches somewhere?  I couldn't find them
> online, just the custom instruction support.  Or even just some docs
> describing what the pipeline does, as just basing one performance model
> on another is kind of a double-edged sword.
>
> That said, I think just knowing the processor doesn't do the div+mod
> fusion is sufficient to turn something like this on for the mtune for
> that processor.  That's different than turning it on globally, though --
> unless it turns out nobody is actually doing the fusion suggested in the
> ISA manual, which wouldn't be super surprising.
>
> Maybe some of the SiFive and T-Head folks can chime in on whether or not
> their processors perform the fusion in question -- and if so, do the
> instructions need to say back-to-back?

AFAIK, the sequence with the multiplication will normally be faster on
SiFive cores when both the quotient and the remainder are needed.

>  It doesn't look like we're
> really targeting the code sequences the ISA suggests as it stands, so
> maybe it's OK to just switch the default over too?
>
> It also brings up the question of mulh+mul fusions, which I don't think
> we've really looked at (though maybe they're a lot less important for
> rv64).


Re: [PATCH] RISC-V: Note that __builtin_riscv_pause() implies Xgnuzihintpausestate

2022-12-17 Thread Andrew Waterman
On Sat, Dec 17, 2022 at 2:21 AM Andreas Schwab  wrote:
>
> On Dez 17 2022, Andrew Waterman wrote:
>
> > On Sat, Dec 17, 2022 at 2:10 AM Andreas Schwab  
> > wrote:
> >>
> >> On Dez 17 2022, Andrew Waterman wrote:
> >>
> >> > It took me a few minutes to understand the purpose of this chicanery, but
> >> > there's indeed a contradiction in the ISA spec.  HINT instructions _do_
> >> > affect architectural state in a limited fashion--namely, updating the PC.
> >>
> >> How can an insn _not_ affect the PC? (Other than the trivial infinite
> >> loop.)
> >
> > Heh, yeah, that's roughly what I meant by "common-sense reading" (and
> > that's my rationale for simply clarifying the spec and nuking this
> > Xgnuzihintpausestate extension).
>
> My point is that the implicit update of the PC cannot be part of the
> architectural state in the first place.  Even the trivial infinite loop
> has this, before the actual state change (setting PC back) is performed.

It's just a definitional issue.  By analogy, this is why we have the
concept of "explicit memory access" (the thing a load or store is
trying to do) and "implicit memory access" (all of the other memory
accesses, like the instruction fetch or page-table walk).  The PC
update is very much an architectural-state change, but it would be
fair to call it an "implicit architectural-state change" to contrast
with e.g. writing an x-register being an "explicit architectural state
change".

Anyway, I don't think we're disagreeing with each other (and I still
think there's no problem to be solved here).

>
> --
> Andreas Schwab, sch...@linux-m68k.org
> GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
> "And now for something completely different."


Re: [PATCH] RISC-V: Note that __builtin_riscv_pause() implies Xgnuzihintpausestate

2022-12-17 Thread Andrew Waterman
On Sat, Dec 17, 2022 at 2:10 AM Andreas Schwab  wrote:
>
> On Dez 17 2022, Andrew Waterman wrote:
>
> > It took me a few minutes to understand the purpose of this chicanery, but
> > there's indeed a contradiction in the ISA spec.  HINT instructions _do_
> > affect architectural state in a limited fashion--namely, updating the PC.
>
> How can an insn _not_ affect the PC? (Other than the trivial infinite
> loop.)

Heh, yeah, that's roughly what I meant by "common-sense reading" (and
that's my rationale for simply clarifying the spec and nuking this
Xgnuzihintpausestate extension).

>
> --
> Andreas Schwab, sch...@linux-m68k.org
> GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510  2552 DF73 E780 A9DA AEC1
> "And now for something completely different."


Re: [PATCH] RISC-V: Note that __builtin_riscv_pause() implies Xgnuzihintpausestate

2022-12-17 Thread Andrew Waterman
It took me a few minutes to understand the purpose of this chicanery, but
there's indeed a contradiction in the ISA spec.  HINT instructions _do_
affect architectural state in a limited fashion--namely, updating the PC.
So, it's incorrect to say that PAUSE changes no architectural state.
Because these statements are contradictory, a common-sense reading should
parse this as "PAUSE changes no architectural state in the same informal
sense as other HINTs".  Otherwise, PAUSE wouldn't actually be a HINT.

I'm just going to delete the erroneous text.  This eliminates the
contradiction and makes the spec consistent with both the de facto and de
jure golden models, which behave in the common-sense manner Palmer's
Xgnuzihintpausestate extension would suggest.

To avoid confusion, I strongly suggest deleting all references
to Xgnuzihintpausestate, since its existence invites a question that no
longer needs to be answered.

cc'ing Greg since AFAICS he merged in the erroneous text.

On Fri, Dec 16, 2022 at 8:48 AM Palmer Dabbelt  wrote:

> On Mon, 28 Nov 2022 10:45:51 PST (-0800), Palmer Dabbelt wrote:
> > On Fri, 18 Nov 2022 09:01:08 PST (-0800), Palmer Dabbelt wrote:
> >> On Thu, 17 Nov 2022 22:59:08 PST (-0800), Kito Cheng wrote:
> >>> Wait, what's Xgnuzihintpausestate???
> >>
> >> I just made it up, it's defined right next to the name like those
> >> profile extensions are.  I figured that's the most RISC-V way to define
> >> something like this, but we could just drop it and run with the
> >> definition -- IIRC we just stuck a comment in for Linux and QEMU, I
> >> doubt anyone is actually going to implement the "doesn't touch PC"
> >> version of pause.
> >
> > Just checking up on this one.  I don't care a ton about the name, just
> > that we document where we're intentionally violating the specs.
>
> I'm just committing this one, no big deal if you want to change the
> wording.  I just want it out of my queue.
>
> >
> >>
> >>> On Fri, Nov 18, 2022 at 12:30 PM Palmer Dabbelt 
> wrote:
> 
>  gcc/ChangeLog:
> 
>  * doc/extend.texi (__builtin_riscv_pause): Imply
>  Xgnuzihintpausestate.
>  ---
>   gcc/doc/extend.texi | 4 +++-
>   1 file changed, 3 insertions(+), 1 deletion(-)
> 
>  diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
>  index b1dd39e64b8..26f14e61bc8 100644
>  --- a/gcc/doc/extend.texi
>  +++ b/gcc/doc/extend.texi
>  @@ -21103,7 +21103,9 @@ Returns the value that is currently set in
> the @samp{tp} register.
>   @end deftypefn
> 
>   @deftypefn {Built-in Function}  void __builtin_riscv_pause (void)
>  -Generates the @code{pause} (hint) machine instruction.
>  +Generates the @code{pause} (hint) machine instruction.  This implies
> the
>  +Xgnuzihintpausestate extension, which redefines the @code{pause}
> instruction to
>  +change architectural state.
>   @end deftypefn
> 
>   @node RX Built-in Functions
>  --
>  2.38.1
> 
>


Re: [PATCH] RISC-V: Remove unit-stride store from ta attribute

2022-12-16 Thread Andrew Waterman
On Fri, Dec 16, 2022 at 1:59 PM Palmer Dabbelt  wrote:
>
> On Fri, 16 Dec 2022 12:01:04 PST (-0800), jeffreya...@gmail.com wrote:
> >
> >
> > On 12/14/22 04:36, juzhe.zh...@rivai.ai wrote:
> >> From: Ju-Zhe Zhong 
> >>
> >> Since store instructions don't care about tail policy, we remove
> >> vste from "ta" attribute. Hence, we could have more fusion chances
> >> and better optimization.
> >>
> >> gcc/ChangeLog:
> >>
> >>  * config/riscv/vector.md: Remove vste.
> > Just to confirm that I understand the basic model.  Vector stores only
> > update active elements, thus they don't care about tail policy, right?
> >
> > Assuming that's the case, then this is OK.
>
> That had been my assumption as well, but I don't see that explicitly
> called out in the ISA manual.  I see
>
>Masked vector stores only update active memory elements.
>
> where "active" is defined as
>
> * The _body_ elements are those whose element index is greater than or 
> equal
> to the initial value in the `vstart` register, and less than the current
> vector length setting in `vl`. The body can be split into two disjoint 
> subsets:
>
> ** The _active_ elements during a vector instruction's execution are the
> elements within the body and where the current mask is enabled at that 
> element
> position.  The active elements can raise exceptions and update the 
> destination
> vector register group.
>
> but I don't see anything about the unmasked stores.  The blurb about
> tail elements only applies to registers groups, not memory addresses, so
> I think that's kind of a grey area there too.  I was pretty sure the intent
> here was to have tail elements not updated in memory, so hopefully I'm just
> missing something in the spec.

As discussed on the github issue, I think there is sufficient
justification in the spec to say that vector stores are forbidden from
accessing tail elements even if unmasked.  (And of course the ISA
would be pretty useless if that weren't the case...) But there's no
reason not to clarify the language in the spec, so as to make it
easier for readers to grok.

>
> I open an issue to check: https://github.com/riscv/riscv-v-spec/issues/846


Re: [PATCH] RISC-V: Handle "(a & twobits) == singlebit" in branches using Zbs

2022-11-17 Thread Andrew Waterman
On Thu, Nov 17, 2022 at 10:52 AM Philipp Tomsich
 wrote:
>
> On Thu, 17 Nov 2022 at 19:33, Andrew Waterman  wrote:
> >
> > Am I wrong to worry that this will increase dynamic instruction count
> > when used in a loop?  The obvious code is more efficient when the
> > constant loads can be hoisted out of a loop.  Or does the cost model
> > account for this somehow?
>
> With this change merged, GCC still hoists the constants out of the
> loop (just checked with a quick test case).
> So the cost model seems correct (whether intentionally or accidentally).

Cool, thanks for checking.

>
> Thanks,
> Philipp.
>
> >
> >
> > On Sun, Nov 13, 2022 at 12:50 PM Philipp Tomsich
> >  wrote:
> > >
> > > Use Zbs when generating a sequence for "if ((a & twobits) == singlebit) 
> > > ..."
> > > that can be expressed as bexti + bexti + andn.
> > >
> > > gcc/ChangeLog:
> > >
> > > * config/riscv/bitmanip.md 
> > > (*branch_mask_twobits_equals_singlebit):
> > > Handle "if ((a & T) == C)" using Zbs, when T has 2 bits set and C 
> > > has one
> > > of these two bits set.
> > > * config/riscv/predicates.md (const_twobits_operand): New 
> > > predicate.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > * gcc.target/riscv/zbs-if_then_else-01.c: New test.
> > >
> > > Signed-off-by: Philipp Tomsich 
> > > ---
> > >
> > >  gcc/config/riscv/bitmanip.md  | 42 +++
> > >  gcc/config/riscv/predicates.md|  5 +++
> > >  .../gcc.target/riscv/zbs-if_then_else-01.c| 20 +
> > >  3 files changed, 67 insertions(+)
> > >  create mode 100644 gcc/testsuite/gcc.target/riscv/zbs-if_then_else-01.c
> > >
> > > diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
> > > index 7a8f4e35880..2cea394671f 100644
> > > --- a/gcc/config/riscv/bitmanip.md
> > > +++ b/gcc/config/riscv/bitmanip.md
> > > @@ -690,3 +690,45 @@
> > >"TARGET_ZBS"
> > >[(set (match_dup 0) (zero_extract:X (match_dup 1) (const_int 1) 
> > > (match_dup 2)))
> > > (set (match_dup 0) (xor:X (match_dup 0) (const_int 1)))])
> > > +
> > > +;; IF_THEN_ELSE: test for 2 bits of opposite polarity
> > > +(define_insn_and_split "*branch_mask_twobits_equals_singlebit"
> > > +  [(set (pc)
> > > +   (if_then_else (match_operator 1 "equality_operator"
> > > +  [(and:X (match_operand:X 2 "register_operand" "r")
> > > +  (match_operand:X 3 "const_twobits_operand" 
> > > "i"))
> > > +   (match_operand:X 4 "single_bit_mask_operand" 
> > > "i")])
> > > +(label_ref (match_operand 0 "" ""))
> > > +(pc)))
> > > +   (clobber (match_scratch:X 5 "="))
> > > +   (clobber (match_scratch:X 6 "="))]
> > > +  "TARGET_ZBS && TARGET_ZBB && !SMALL_OPERAND (INTVAL (operands[3]))"
> > > +  "#"
> > > +  "&& reload_completed"
> > > +  [(set (match_dup 5) (zero_extract:X (match_dup 2)
> > > + (const_int 1)
> > > + (match_dup 8)))
> > > +   (set (match_dup 6) (zero_extract:X (match_dup 2)
> > > + (const_int 1)
> > > + (match_dup 9)))
> > > +   (set (match_dup 6) (and:X (not:X (match_dup 6)) (match_dup 5)))
> > > +   (set (pc) (if_then_else (match_op_dup 1 [(match_dup 6) (const_int 0)])
> > > +  (label_ref (match_dup 0))
> > > +  (pc)))]
> > > +{
> > > +   unsigned HOST_WIDE_INT twobits_mask = UINTVAL (operands[3]);
> > > +   unsigned HOST_WIDE_INT singlebit_mask = UINTVAL (operands[4]);
> > > +
> > > +   /* Make sure that the reference value has one of the bits of the mask 
> > > set */
> > > +   if ((twobits_mask & singlebit_mask) == 0)
> > > +  FAIL;
> > > +
> > > +   int setbit = ctz_hwi (singlebit_mask);
> > > +   int clearbit = ctz_hwi (twobits_mask & ~singlebit_mask);
> > > +
> > > +   operands[1] = gen_rtx_fmt_ee (GET_CODE (operands[1

Re: [PATCH] RISC-V: Handle "(a & twobits) == singlebit" in branches using Zbs

2022-11-17 Thread Andrew Waterman
Am I wrong to worry that this will increase dynamic instruction count
when used in a loop?  The obvious code is more efficient when the
constant loads can be hoisted out of a loop.  Or does the cost model
account for this somehow?


On Sun, Nov 13, 2022 at 12:50 PM Philipp Tomsich
 wrote:
>
> Use Zbs when generating a sequence for "if ((a & twobits) == singlebit) ..."
> that can be expressed as bexti + bexti + andn.
>
> gcc/ChangeLog:
>
> * config/riscv/bitmanip.md 
> (*branch_mask_twobits_equals_singlebit):
> Handle "if ((a & T) == C)" using Zbs, when T has 2 bits set and C has 
> one
> of these two bits set.
> * config/riscv/predicates.md (const_twobits_operand): New predicate.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/riscv/zbs-if_then_else-01.c: New test.
>
> Signed-off-by: Philipp Tomsich 
> ---
>
>  gcc/config/riscv/bitmanip.md  | 42 +++
>  gcc/config/riscv/predicates.md|  5 +++
>  .../gcc.target/riscv/zbs-if_then_else-01.c| 20 +
>  3 files changed, 67 insertions(+)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/zbs-if_then_else-01.c
>
> diff --git a/gcc/config/riscv/bitmanip.md b/gcc/config/riscv/bitmanip.md
> index 7a8f4e35880..2cea394671f 100644
> --- a/gcc/config/riscv/bitmanip.md
> +++ b/gcc/config/riscv/bitmanip.md
> @@ -690,3 +690,45 @@
>"TARGET_ZBS"
>[(set (match_dup 0) (zero_extract:X (match_dup 1) (const_int 1) (match_dup 
> 2)))
> (set (match_dup 0) (xor:X (match_dup 0) (const_int 1)))])
> +
> +;; IF_THEN_ELSE: test for 2 bits of opposite polarity
> +(define_insn_and_split "*branch_mask_twobits_equals_singlebit"
> +  [(set (pc)
> +   (if_then_else (match_operator 1 "equality_operator"
> +  [(and:X (match_operand:X 2 "register_operand" "r")
> +  (match_operand:X 3 "const_twobits_operand" 
> "i"))
> +   (match_operand:X 4 "single_bit_mask_operand" "i")])
> +(label_ref (match_operand 0 "" ""))
> +(pc)))
> +   (clobber (match_scratch:X 5 "="))
> +   (clobber (match_scratch:X 6 "="))]
> +  "TARGET_ZBS && TARGET_ZBB && !SMALL_OPERAND (INTVAL (operands[3]))"
> +  "#"
> +  "&& reload_completed"
> +  [(set (match_dup 5) (zero_extract:X (match_dup 2)
> + (const_int 1)
> + (match_dup 8)))
> +   (set (match_dup 6) (zero_extract:X (match_dup 2)
> + (const_int 1)
> + (match_dup 9)))
> +   (set (match_dup 6) (and:X (not:X (match_dup 6)) (match_dup 5)))
> +   (set (pc) (if_then_else (match_op_dup 1 [(match_dup 6) (const_int 0)])
> +  (label_ref (match_dup 0))
> +  (pc)))]
> +{
> +   unsigned HOST_WIDE_INT twobits_mask = UINTVAL (operands[3]);
> +   unsigned HOST_WIDE_INT singlebit_mask = UINTVAL (operands[4]);
> +
> +   /* Make sure that the reference value has one of the bits of the mask set 
> */
> +   if ((twobits_mask & singlebit_mask) == 0)
> +  FAIL;
> +
> +   int setbit = ctz_hwi (singlebit_mask);
> +   int clearbit = ctz_hwi (twobits_mask & ~singlebit_mask);
> +
> +   operands[1] = gen_rtx_fmt_ee (GET_CODE (operands[1]) == NE ? EQ : NE,
> +mode, operands[6], GEN_INT(0));
> +
> +   operands[8] = GEN_INT (setbit);
> +   operands[9] = GEN_INT (clearbit);
> +})
> diff --git a/gcc/config/riscv/predicates.md b/gcc/config/riscv/predicates.md
> index 490bff688a7..6e34829a59b 100644
> --- a/gcc/config/riscv/predicates.md
> +++ b/gcc/config/riscv/predicates.md
> @@ -321,6 +321,11 @@
>(and (match_code "const_int")
> (match_test "popcount_hwi (~UINTVAL (op)) == 2")))
>
> +;; A CONST_INT operand that has exactly two bits set.
> +(define_predicate "const_twobits_operand"
> +  (and (match_code "const_int")
> +   (match_test "popcount_hwi (UINTVAL (op)) == 2")))
> +
>  ;; A CONST_INT operand that fits into the unsigned half of a
>  ;; signed-immediate after the top bit has been cleared.
>  (define_predicate "uimm_extra_bit_operand"
> diff --git a/gcc/testsuite/gcc.target/riscv/zbs-if_then_else-01.c 
> b/gcc/testsuite/gcc.target/riscv/zbs-if_then_else-01.c
> new file mode 100644
> index 000..d249a841ff9
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/zbs-if_then_else-01.c
> @@ -0,0 +1,20 @@
> +/* { dg-do compile } */
> +/* { dg-options "-march=rv64gc_zbb_zbs -mabi=lp64" } */
> +/* { dg-skip-if "" { *-*-* } { "-O0" "-Og" "-O1" } } */
> +
> +void g();
> +
> +void f1 (long a)
> +{
> +  if ((a & ((1ul << 33) | (1 << 4))) == (1ul << 33))
> +g();
> +}
> +
> +void f2 (long a)
> +{
> +  if ((a & 0x12) == 0x10)
> +g();
> +}
> +
> +/* { dg-final { scan-assembler-times "bexti\t" 2 } } */
> +/* { dg-final { scan-assembler-times "andn\t" 1 } } */
> --
> 2.34.1
>


Re: [PATCH] [PR24021] Implement PLUS_EXPR range-op entry for floats.

2022-11-08 Thread Andrew Waterman
On Tue, Nov 8, 2022 at 10:11 AM Jakub Jelinek  wrote:
>
> On Tue, Nov 08, 2022 at 09:44:40AM -0800, Andrew Waterman wrote:
> > On Tue, Nov 8, 2022 at 3:20 AM Jakub Jelinek via Gcc-patches
> >  wrote:
> > >
> > > On Mon, Nov 07, 2022 at 04:41:23PM +0100, Aldy Hernandez wrote:
> > > > As suggested upthread, I have also adjusted update_nan_sign() to drop
> > > > the NAN sign to VARYING if both operands are NAN.  As an optimization
> > > > I keep the sign if both operands are NAN and have the same sign.
> > >
> > > For NaNs this still relies on something IEEE754 doesn't guarantee,
> > > as I cited, after a binary operation the sign bit of the NaN is
> > > unspecified, whether there is one NaN operand or two.
> > > It might be that all CPUs handle it the way you've implemented
> > > (that for one NaN operand the sign of NaN result will be the same
> > > as that NaN operand and for two it will be the sign of one of the two
> > > NaNs operands, never something else), but I think we'd need to check
> > > more than one implementation for that (I've only tried x86_64 and thus
> > > SSE behavior in it), so one would need to test i387 long double behavior
> > > too, ARM/AArch64, PowerPC, s390{,x}, RISCV, ...
> > > The guarantee given by IEEE754 is only for those copy, negate, abs, 
> > > copySign
> > > operations, so copying values around, NEG_EXPR, ABS_EXPR, __builtin_fabs*,
> > > __builtin_copysign*.
> >
> > FWIW, RISC-V canonicalizes NaNs by clearing the sign bit; the signs of
> > the input NaNs do not factor in.
>
> Just for binary operations and some unary, or also the ones that
> IEEE754 spells out (moves, negations, absolute value and copysign)?

I should've been more specific in my earlier email: I was referring to
the arithmetic operators.  Copysign and friends do not canonicalize
NaNs.

>
> Jakub
>


Re: [PATCH] [PR24021] Implement PLUS_EXPR range-op entry for floats.

2022-11-08 Thread Andrew Waterman
On Tue, Nov 8, 2022 at 3:20 AM Jakub Jelinek via Gcc-patches
 wrote:
>
> On Mon, Nov 07, 2022 at 04:41:23PM +0100, Aldy Hernandez wrote:
> > As suggested upthread, I have also adjusted update_nan_sign() to drop
> > the NAN sign to VARYING if both operands are NAN.  As an optimization
> > I keep the sign if both operands are NAN and have the same sign.
>
> For NaNs this still relies on something IEEE754 doesn't guarantee,
> as I cited, after a binary operation the sign bit of the NaN is
> unspecified, whether there is one NaN operand or two.
> It might be that all CPUs handle it the way you've implemented
> (that for one NaN operand the sign of NaN result will be the same
> as that NaN operand and for two it will be the sign of one of the two
> NaNs operands, never something else), but I think we'd need to check
> more than one implementation for that (I've only tried x86_64 and thus
> SSE behavior in it), so one would need to test i387 long double behavior
> too, ARM/AArch64, PowerPC, s390{,x}, RISCV, ...
> The guarantee given by IEEE754 is only for those copy, negate, abs, copySign
> operations, so copying values around, NEG_EXPR, ABS_EXPR, __builtin_fabs*,
> __builtin_copysign*.

FWIW, RISC-V canonicalizes NaNs by clearing the sign bit; the signs of
the input NaNs do not factor in.

>
> Otherwise LGTM (but would be nice to get into GCC13 not just
> +, but also -, *, /, sqrt at least).
>
> Jakub
>


Re: [PATCH] RISC-V: Libitm add RISC-V support.

2022-10-27 Thread Andrew Waterman
I'm surprised by the hard-coded 128-byte cache line size.  If we need
to hard-code a value, it should be 64 (in accordance with the RVA
profiles, see https://github.com/riscv/riscv-profiles/blob/main/profiles.adoc),
but ideally this would be queried dynamically.


On Thu, Oct 27, 2022 at 3:51 AM Xiongchuan Tan via Gcc-patches
 wrote:
>
> libitm/ChangeLog:
>
> * configure.tgt: Add riscv support.
> * config/riscv/asm.h: New file.
> * config/riscv/sjlj.S: New file.
> * config/riscv/target.h: New file.
> ---
>  libitm/config/riscv/asm.h|  52 +
>  libitm/config/riscv/sjlj.S   | 144 +++
>  libitm/config/riscv/target.h |  50 
>  libitm/configure.tgt |   2 +
>  4 files changed, 248 insertions(+)
>  create mode 100644 libitm/config/riscv/asm.h
>  create mode 100644 libitm/config/riscv/sjlj.S
>  create mode 100644 libitm/config/riscv/target.h
>
> diff --git a/libitm/config/riscv/asm.h b/libitm/config/riscv/asm.h
> new file mode 100644
> index 000..6ba5e2c
> --- /dev/null
> +++ b/libitm/config/riscv/asm.h
> @@ -0,0 +1,52 @@
> +/* Copyright (C) 2022 Free Software Foundation, Inc.
> +   Contributed by Xiongchuan Tan .
> +
> +   This file is part of the GNU Transactional Memory Library (libitm).
> +
> +   Libitm is free software; you can redistribute it and/or modify it
> +   under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3 of the License, or
> +   (at your option) any later version.
> +
> +   Libitm is distributed in the hope that it will be useful, but WITHOUT ANY
> +   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
> +   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> +   more details.
> +
> +   Under Section 7 of GPL version 3, you are granted additional
> +   permissions described in the GCC Runtime Library Exception, version
> +   3.1, as published by the Free Software Foundation.
> +
> +   You should have received a copy of the GNU General Public License and
> +   a copy of the GCC Runtime Library Exception along with this program;
> +   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
> +   .  */
> +
> +#ifndef _RV_ASM_H
> +#define _RV_ASM_H
> +
> +#if __riscv_xlen == 64
> +#  define GPR_L ld
> +#  define GPR_S sd
> +#  define SZ_GPR 8
> +#elif __riscv_xlen == 32
> +#  define GPR_L lw
> +#  define GPR_S sw
> +#  define SZ_GPR 4
> +#else
> +#  error Unsupported XLEN (must be 64-bit or 32-bit).
> +#endif
> +
> +#if defined(__riscv_flen) && __riscv_flen == 64
> +#  define FPR_L fld
> +#  define FPR_S fsd
> +#  define SZ_FPR 8
> +#elif defined(__riscv_flen) && __riscv_flen == 32
> +#  define FPR_L flw
> +#  define FPR_S fsw
> +#  define SZ_FPR 4
> +#else
> +#  define SZ_FPR 0
> +#endif
> +
> +#endif  /* _RV_ASM_H */
> diff --git a/libitm/config/riscv/sjlj.S b/libitm/config/riscv/sjlj.S
> new file mode 100644
> index 000..6f25cb5
> --- /dev/null
> +++ b/libitm/config/riscv/sjlj.S
> @@ -0,0 +1,144 @@
> +/* Copyright (C) 2022 Free Software Foundation, Inc.
> +   Contributed by Xiongchuan Tan .
> +
> +   This file is part of the GNU Transactional Memory Library (libitm).
> +
> +   Libitm is free software; you can redistribute it and/or modify it
> +   under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3 of the License, or
> +   (at your option) any later version.
> +
> +   Libitm is distributed in the hope that it will be useful, but WITHOUT ANY
> +   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
> +   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> +   more details.
> +
> +   Under Section 7 of GPL version 3, you are granted additional
> +   permissions described in the GCC Runtime Library Exception, version
> +   3.1, as published by the Free Software Foundation.
> +
> +   You should have received a copy of the GNU General Public License and
> +   a copy of the GCC Runtime Library Exception along with this program;
> +   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
> +   .  */
> +
> +#include "asmcfi.h"
> +#include "asm.h"
> +
> +   .text
> +   .align  2
> +   .global _ITM_beginTransaction
> +   .type   _ITM_beginTransaction, @function
> +
> +_ITM_beginTransaction:
> +   cfi_startproc
> +mv a1, sp
> +   addi sp, sp, -(14*SZ_GPR+12*SZ_FPR)
> +   cfi_adjust_cfa_offset(14*SZ_GPR+12*SZ_FPR)
> +
> +   /* Return Address */
> +   GPR_S ra, 0*SZ_GPR(sp)
> +   cfi_rel_offset(ra, 0*SZ_GPR)
> +
> +   /* Caller's sp */
> +   GPR_S a1, 1*SZ_GPR(sp)
> +
> +   /* Caller's s0/fp */
> +   GPR_S fp, 2*SZ_GPR(sp)
> +   cfi_rel_offset(fp, 2*SZ_GPR)
> +
> +   /* Callee-saved registers */
> +   GPR_S s1, 3*SZ_GPR(sp)
> +   

Re: [PATCH] RISC-V: Default to tuning for the thead-c906

2022-10-05 Thread Andrew Waterman
I agree with Kito; I don't support merging this patch.  My reasoning is twofold:

- The default settings should be fairly neutral, avoiding codegen that
severely disadvantages some targets.  Misaligned memory accesses are
certainly a problematic case in that respect.  (And it's highly
asymmetric: the perf upside for implementations that do support this
feature is much lower than the perf downside for implementations that
don't.)  I haven't reviewed the rest of the C906 tuning decisions, but
I'd raise the same concern if any of them are similarly non-neutral.

- Both the RISC-V ISA spec and the more recent RVA22 profile spec
explicitly caution about the perf impacts of such a tuning decision.
The RVA22 profile spec goes further to explicitly state that standard
software distributions should not do what this patch does.
https://github.com/riscv/riscv-profiles/commit/b09dc934e8cb8c1dfb7ddead8bdf509408e94609#diff-01b3848d5e44a205fafe2a7b93dbf353138763b76110bb833b71ac4afe5bR593-R595


Andrew


On Wed, Oct 5, 2022 at 6:59 PM Kito Cheng via Gcc-patches
 wrote:
>
> -1 for this, default enable fast unaligned access could cause many
> problems, and lots of RISC-V cores
> don't support HW unaligned access (Rocket-base RISC-V core, most
> SiFive core, and most Andes core IIRC),
> change this to default means package from RISC-V linux distro might
> contain unaligned access by default.
>
> The default one should be the safest option, in case people really
> want that, they could use --with-tune and
> --with-arch to change that.
>
>
> On Wed, Oct 5, 2022 at 1:57 PM Andrew Pinski via Gcc-patches
>  wrote:
> >
> > On Tue, Oct 4, 2022 at 8:55 PM Palmer Dabbelt  wrote:
> > >
> > > The C906 is by far the most widely available RISC-V processor, so let's
> > > default to tuning for it.
> > >
> > > gcc/ChangeLog
> > >
> > > * config/riscv/riscv.h (RISCV_TUNE_STRING_DEFAULT): Change to
> > > thead-c906.
> > > * doc/invoke.texi (RISC-V -mtune): Change the default to
> > > thead-c906.
> > >
> > > ---
> >
> > I am ok with this as --with-tune and --with-arch works as ways of
> > changing the default still.
> >
> > Thanks,
> > Andrew
> >
> > >
> > > This has come up a handful of times, most recently during the Cauldron.
> > > It seems like a grey area to me: we're changing the behavior of some
> > > command-line arguments (ie, everything that doesn't specify -mtune), but
> > > we sort of change that anyway as the tuning parameters change between
> > > releases.
> > >
> > > I'm not really seeing much of a precedent from the other ports.  It
> > > looks like aarch64 sort of changed the default in 02fdbd5beb0
> > > ("[AArch64] [-mtune cleanup 2/5] Tune for Cortex-A53 by default.") but I
> > > think at that point -mtune=generic and -mtune=cortex-a53 were equivalent
> > > so I'm not sure that counts.  I can't quite sort out if the default x86
> > > tuning has ever changed, but the tuning parameters have changed.  I
> > > don't see any way around having the tuning parameters change as they're
> > > pretty tightly coupled to the GCC internals, but changing to a different
> > > tuning target is a bit bigger of a change.
> > >
> > > We also have a bit of a special case here: -mtune is in theory only a
> > > performance issue, but this change will emit a lot more misaligned
> > > accesses and we've seen those trigger bugs in the trap handlers before.
> > > Those bugs are elsewhere so it's sort of not a GCC problem, but I'm sure
> > > there's still users out there with broken firmware and this may cause
> > > visible fallout.  We can just tell those users their systems were always
> > > broken, but that's never a fun way to do things.
> > >
> > > I figured the easiest way to talk about this would be to just send the
> > > patch, but I definitely don't plan on committing it without some
> > > discussion.
> > > ---
> > >  gcc/config/riscv/riscv.h | 2 +-
> > >  gcc/doc/invoke.texi  | 2 +-
> > >  2 files changed, 2 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
> > > index 363113c6511..1d9379fa5ee 100644
> > > --- a/gcc/config/riscv/riscv.h
> > > +++ b/gcc/config/riscv/riscv.h
> > > @@ -40,7 +40,7 @@ along with GCC; see the file COPYING3.  If not see
> > >  #endif
> > >
> > >  #ifndef RISCV_TUNE_STRING_DEFAULT
> > > -#define RISCV_TUNE_STRING_DEFAULT "rocket"
> > > +#define RISCV_TUNE_STRING_DEFAULT "thead-c906"
> > >  #endif
> > >
> > >  extern const char *riscv_expand_arch (int argc, const char **argv);
> > > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> > > index e0c2c57c9b2..2a9ea3455f6 100644
> > > --- a/gcc/doc/invoke.texi
> > > +++ b/gcc/doc/invoke.texi
> > > @@ -28529,7 +28529,7 @@ particular CPU name.  Permissible values for this 
> > > option are: @samp{rocket},
> > >  @samp{thead-c906}, @samp{size}, and all valid options for 
> > > @option{-mcpu=}.
> > >
> > >  When @option{-mtune=} is not specified, use the setting from 
> > > 

Re: Supporting RISC-V Vendor Extensions in the GNU Toolchain

2022-05-16 Thread Andrew Waterman
On Fri, May 13, 2022 at 3:38 AM Philipp Tomsich 
wrote:

> On Fri, 13 May 2022 at 12:00, Christoph Müllner 
> wrote:
> >
> > On Wed, May 11, 2022 at 2:02 AM Palmer Dabbelt 
> wrote:
> > >
> > > [Sorry for cross-posting to a bunch of lists, I figured it'd be best to
> > > have all the discussions in one thread.]
> > >
> > > We currently only support what is defined by official RISC-V
> > > specifications in the various GNU toolchain projects.  There's
> certainly
> > > some grey areas there, but in general that means not taking code that
> > > relies on drafts or vendor defined extensions, even if that would
> result
> > > in higher performance or more featured systems for users.
> > >
> > > The original goal of these policies were to steer RISC-V implementers
> > > towards a common set of specifications, but over the last year or so
> > > it's become abundantly clear that this is causing more harm than good.
> > > All extant RISC-V systems rely on behaviors defined outside the
> official
> > > specifications, and while that's technically always been the case we've
> > > gotten to the point where trying to ignore that fact is impacting real
> > > users on real systems.  There's been consistent feedback from users
> that
> > > we're not meeting their needs, which can clearly be seen in the many
> out
> > > of tree patch sets in common use.
> > >
> > > There's been a handful of discussions about this, but we've yet to have
> > > a proper discussion on the mailing lists.  From the various discussions
> > > I've had it seems that folks are broadly in favor of supporting vendor
> > > extensions, but the devil's always in the details with this sort of
> > > thing so I thought it'd be best to write something up so we can have a
> > > concrete discussion.
> > >
> > > The idea is to start taking code that depends on vendor-defined
> behavior
> > > into the core GNU toolchain ports, as long as it meets the following
> > > criteria:
> > >
> > > * An ISA manual is available that can be redistributed/archived,
> defines
> > >   the behaviors in question as one or more vendor-specific extensions,
> > >   and is clearly versioned.  The RISC-V foundation is setting various
> > >   guidelines around how vendor-defined extensions and instructions
> > >   should be named, we strongly suggest that vendors follow those
> > >   conventions whenever possible (this is all new, though, so exactly
> > >   what's necessary from vendor specifications will likely evolve as we
> > >   learn).
> > > * There is a substantial user base that depends on the behavior in
> > >   question, which probably means there is hardware in the wild that
> > >   implements the extensions and users that require those extensions in
> > >   order for that hardware to be useful for common applications.  This
> is
> > >   always going to be a grey area, but it's essentially the same spot
> > >   everyone else is in.
>
> I must take exception to the "There is a substantial user base" rule,
> as this conflicts with the goal of avoiding fragmentation: the support
> for vendor-defined extensions should ideally have landed in an
> upstream release before the silicon is widely released.


The "substantial user base" standard is specious.  Other recent emails have
suggested that phrase is a euphemism for "widely available
consumer-programmable product."  It fails to account for RISC-V's most
important constituency: people using open-source toolchains to program
non-mass-market parts.

Our existing regime of "no custom extension support in standard software"
is draconian, but at least it's self-consistent.  I'd support retaining
it.  But, if we do change it, it's preposterous to reject extensions
like XVentanaCondOps on the basis that their silicon isn't available on
AliExpress.

  This would
> see these extensions being sent upstream significantly before
> widespread sampling (and sometimes around the time of the announcement
> of a device on the roadmap).  Simply put: I want everyone defining
> vendor extensions to contribute to our mainline development efforts
> instead of extending their own ancient forks.
>
> I suspect that this rule is intended to ensure that experimental,
> purely academic, or "closed" (as in: even if you have the silicon, it
> will be so deeply embedded that no one can run their own software —
> e.g. radio baseband controllers) extensions don't make the maintenance
> work harder.  If that is the case: could we use wording such as (a
> native speaker might wordsmith something more accurate) "accessible to
> run user-developed software" and "intended for a wider audience"?
>
> > > * There is a mechanism for testing the code in question without direct
> > >   access to hardware, which in practice means a QEMU port (or whatever
> > >   simulator is relevant in the space and that folks use for testing) or
> > >   some community commitment to long-term availability of the hardware
> > >   for testing (something like the GCC compile farm, for 

Re: [PATCH 1/1] RISC-V: Fix canonical extension order (K and J)

2022-04-23 Thread Andrew Waterman
Neither K nor J is an extension that exists, and so it doesn't make
sense to mandate any particular ordering.  The better change would be
to delete the letters `k' and `j' from that string, so that we aren't
enforcing constraints that don't serve a useful purpose.

cf. 
https://github.com/riscv/riscv-isa-manual/commit/f5f9c27010b69a015958ffebe1ac5a34f8776dff

On Sat, Apr 23, 2022 at 10:26 PM Tsukasa OI via Gcc-patches
 wrote:
>
> This commit fixes canonical extension order to follow the RISC-V ISA
> Manual draft-20210402-1271737 or later.
>
> gcc/ChangeLog:
>
> * common/config/riscv/riscv-common.cc (riscv_supported_std_ext):
> Fix "K" extension prefix to be placed before "J".
> ---
>  gcc/common/config/riscv/riscv-common.cc | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gcc/common/config/riscv/riscv-common.cc 
> b/gcc/common/config/riscv/riscv-common.cc
> index 1501242e296..0b0ec2c4ec5 100644
> --- a/gcc/common/config/riscv/riscv-common.cc
> +++ b/gcc/common/config/riscv/riscv-common.cc
> @@ -594,7 +594,7 @@ riscv_subset_list::lookup (const char *subset, int 
> major_version,
>  static const char *
>  riscv_supported_std_ext (void)
>  {
> -  return "mafdqlcbjktpvn";
> +  return "mafdqlcbkjtpvn";
>  }
>
>  /* Parsing subset version.
> --
> 2.32.0
>


Re: [PATCH] RISC-V: Document `auipc' and `bitmanip' `type' attributes

2022-01-27 Thread Andrew Waterman
LGTM, thanks for correcting this oversight in my patch.


On Thu, Jan 27, 2022 at 2:09 PM Maciej W. Rozycki  wrote:
>
> Document new `auipc' and `bitmanip' `type' attributes added respectively
> with commit 88108b27dda9 ("RISC-V: Add sifive-7 pipeline description.")
> and commit 283b1707f237 ("RISC-V: Implement instruction patterns for ZBA
> extension.") but not listed so far.
>
> gcc/
> * config/riscv/riscv.md: Document `auipc' and `bitmanip' `type'
> attributes.
> ---
> Hi,
>
>  There's also the `rotate' `type' attribute, but it's nowhere used, so it
> might be worth removing.  As not-a-regression however that would be GCC 13
> material I guess.
>
>  OK to apply?
>
>   Maciej
> ---
>  gcc/config/riscv/riscv.md |2 ++
>  1 file changed, 2 insertions(+)
>
> gcc-riscv-auipc-bitmanip-type.diff
> Index: gcc/gcc/config/riscv/riscv.md
> ===
> --- gcc.orig/gcc/config/riscv/riscv.md
> +++ gcc/gcc/config/riscv/riscv.md
> @@ -150,6 +150,7 @@
>  ;; mfc transfer from coprocessor
>  ;; const   load constant
>  ;; arith   integer arithmetic instructions
> +;; auipc   integer addition to PC
>  ;; logical  integer logical instructions
>  ;; shift   integer shift instructions
>  ;; slt set less than instructions
> @@ -167,6 +168,7 @@
>  ;; multi   multiword sequence (or user asm statements)
>  ;; nop no operation
>  ;; ghost   an instruction that produces no real code
> +;; bitmanipbit manipulation instructions
>  (define_attr "type"
>"unknown,branch,jump,call,load,fpload,store,fpstore,
> mtc,mfc,const,arith,logical,shift,slt,imul,idiv,move,fmove,fadd,fmul,


Re: [PATCH][GCC13?] RISC-V: Replace `smin'/`smax' RTL patterns with `fmin'/`fmax'

2022-01-20 Thread Andrew Waterman
On Thu, Jan 20, 2022 at 12:30 PM Palmer Dabbelt  wrote:
>
> On Thu, 20 Jan 2022 07:44:25 PST (-0800), ma...@embecosm.com wrote:
> > RISC-V FMIN and FMAX machine instructions are IEEE-754-conformant[1]:
> >
> > "For FMIN and FMAX, if at least one input is a signaling NaN, or if both
> > inputs are quiet NaNs, the result is the canonical NaN.  If one operand
> > is a quiet NaN and the other is not a NaN, the result is the non-NaN
> > operand."
> >
> > as required by our `fminM3' and `fmaxM3' standard RTL patterns.
> >
> > However we only define `sminM3' and `smaxM3' standard RTL patterns to
> > produce the FMIN and FMAX machine instructions, which in turn causes the
> > `__builtin_fmin' and `__builtin_fmax' family of intrinsics to emit the
> > corresponding libcalls rather than the relevant machine instructions.
> >
> > Rename the `smin<mode>3' and `smax<mode>3' patterns to `fmin<mode>3' and
> > `fmax<mode>3' respectively then, removing the need to use libcalls for
> > IEEE 754 semantics with the minimum and maximum operations.
> >
> > [1] "The RISC-V Instruction Set Manual, Volume I: User-Level ISA",
> > Document Version 2.2, May 7, 2017, Section 8.3 "NaN Generation and
> > Propagation", p. 48
> >
> >   gcc/
> >   * config/riscv/riscv.md (smin<mode>3): Rename pattern to...
> >   (fmin<mode>3): ... this.
> >   (smax<mode>3): Likewise...
> >   (fmax<mode>3): ... this.
> > ---
> > Hi,
> >
> >  It's not clear to me how it's been missed or whether there is anything I
> > might be actually missing.  It looks to me like a clear oversight however.
>
> I'm not really a floating point person, but IIUC It's actually on
> purpose: earlier versions of the ISA spec didn't have this behavior, and
> at the time we originally merged the GCC port we decided to play it
> safe.  Pretty sure we discussed this before on the GCC mailing list
> ,maybe around the time the glibc port was going upstream?  I think Jim
> was the one who figured out how all the specs fit together.
>
> I can't find those older discussions, but this definitely recently came
> up in glibc:
> https://sourceware.org/pipermail/libc-alpha/2021-October/131637.html .
> Looks like back then nobody knew of any hardware that ran glibc and
> implemented the old behavior, but there also haven't been patches posted
> yet so it's not set in stone.
>
> It's probably worth repeating the question here since there are a lot of
> RISC-V users that don't use glibc but do use GCC.  I don't know of
> anyone who implemented the old floating point standards off the top of
> my head, even in embedded land, but I'm pretty lost when it comes to ISA
> versioning these days so I might be missing something.
>
> One option could be to tie this to the ISA spec version and emit the
> required emulation routines, but I don't think that's worth bothering to
> do unless someone knows of an implementation that implements the old
> behavior.

The old formulation of the instructions were never ratified as a
RISC-V standard.  I don't think we need to hamstring ourselves here by
assuming the possibility of their implementation.

>
> > And in any case this change has passed full GCC regression testing (except
> > for the D frontend, which has stopped being built recently due to a defect
> > in Debian I haven't yet got to getting fixed) with the `riscv64-linux-gnu'
> > target using the HiFive Unmatched (U74 CPU) target board, so it seems to
> > be doing the right thing.
> >
> >  Timing might be a bit unfortunate for this submission and given that it is
> > not a regression fix I guess this is GCC 13 material.  Please let me know
> > otherwise.
> >
> >  In any case OK to apply (when the time comes)?
>
> IMO waiting is the right way to go, as if this does uncover any issues
> they'll be a long-tail sort of thing.  That way we'll at least have a
> whole release cycle for folks to test on their hardware, which is about
> as good as we can do here.
>
> Acked-by: Palmer Dabbelt  # for 13
>
> Someone should probably do the glibc version, too ;)
>
> >
> >   Maciej
> > ---
> >  gcc/config/riscv/riscv.md |4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > gcc-riscv-fmin-fmax.diff
> > Index: gcc/gcc/config/riscv/riscv.md
> > ===
> > --- gcc.orig/gcc/config/riscv/riscv.md
> > +++ gcc/gcc/config/riscv/riscv.md
> > @@ -1214,7 +1214,7 @@
> >  ;;
> >  ;;  
> >
> > -(define_insn "smin<mode>3"
> > +(define_insn "fmin<mode>3"
> >[(set (match_operand:ANYF 0 "register_operand" "=f")
> >   (smin:ANYF (match_operand:ANYF 1 "register_operand" " f")
> >  (match_operand:ANYF 2 "register_operand" " f")))]
> > @@ -1223,7 +1223,7 @@
> >[(set_attr "type" "fmove")
> > (set_attr "mode" "<UNITMODE>")])
> >
> > -(define_insn "smax<mode>3"
> > +(define_insn "fmax<mode>3"
> >[(set (match_operand:ANYF 0 "register_operand" "=f")
> >   (smax:ANYF (match_operand:ANYF 1 "register_operand" " f")
> >  

Re: [PATCH] RISC-V: Enable overlap-by-pieces in case of fast unaliged access

2021-07-26 Thread Andrew Waterman
On Thu, Jul 22, 2021 at 10:27 AM Palmer Dabbelt  wrote:
>
> On Thu, 22 Jul 2021 06:29:46 PDT (-0700), gcc-patches@gcc.gnu.org wrote:
> > Could you add a testcase? Otherwise LGTM.
> >
> > Option: -O2 -mtune=thead-c906 -march=rv64gc -mabi=lp64
> > void foo(char *dst){
> >__builtin_memset(dst, 0, 15);
> > }
>
> I'd like to see:
>
> * Test results.  This is only on for one target right now, so relying on
>   it to just work on others isn't a good idea.
> * Something to demonstrate this doesn't break -mstrict-align.
> * Some sort of performance analysis.  Most machines that support
> >   unaligned access do so with some performance degradation,

Also, some machines that gracefully support misaligned accesses under
most circumstances nevertheless experience a perf degradation when the
load depends on two stores that overlap partially but not fully.  This
transformation will obviously trigger such behavior from time to time.

Note, I'm not objecting to this patch; I'm only responding to Palmer's
comment.  It makes sense to enable this kind of optimization for
-mtune=native etc., just not for standard software distributions.


>   it's unclear
> >   that this conversion will actually manifest a performance improvement.
>   I don't have a C906 and don't know what workloads people care about
>   running on one, but I'm certainly not convinced this is a win --
>   what's listed here looks to be the best case, and that's only saving
>   two instructions to generate a pretty pathological sequence
>   (misaligned access that conflicts with a prior store).
>
> Jojo: do you have any description of the C906 pipeline?  Specifically in
> this case it'd be good to know how it handles unaligned accesses.
>
> >
> > On Thu, Jul 22, 2021 at 8:53 PM Christoph Muellner via Gcc-patches
> >  wrote:
> >>
> >> This patch enables the overlap-by-pieces feature of the by-pieces
> >> infrastructure for inlining builtins in case the target has set
> >> riscv_slow_unaligned_access_p to false.
> >>
> >> To demonstrate the effect for targets with fast unaligned access,
> >> the following code sequences are generated for a 15-byte memset-zero.
> >>
> >> Without overlap_op_by_pieces we get:
> >>   8e:   00053023sd  zero,0(a0)
> >>   92:   00052423sw  zero,8(a0)
> >>   96:   00051623sh  zero,12(a0)
> >>   9a:   00050723sb  zero,14(a0)
> >>
> >> With overlap_op_by_pieces we get:
> >>   7e:   00053023sd  zero,0(a0)
> >>   82:   000533a3sd  zero,7(a0)
> >>
> >> gcc/ChangeLog:
> >>
> >> * config/riscv/riscv.c (riscv_overlap_op_by_pieces): New function.
> >> (TARGET_OVERLAP_OP_BY_PIECES_P): Connect to
> >> riscv_overlap_op_by_pieces.
> >>
> >> Signed-off-by: Christoph Muellner 
> >> ---
> >>  gcc/config/riscv/riscv.c | 11 +++
> >>  1 file changed, 11 insertions(+)
> >>
> >> diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
> >> index 576960bb37c..98c76ba657a 100644
> >> --- a/gcc/config/riscv/riscv.c
> >> +++ b/gcc/config/riscv/riscv.c
> >> @@ -5201,6 +5201,14 @@ riscv_slow_unaligned_access (machine_mode, unsigned 
> >> int)
> >>return riscv_slow_unaligned_access_p;
> >>  }
> >>
> >> +/* Implement TARGET_OVERLAP_OP_BY_PIECES_P.  */
> >> +
> >> +static bool
> >> +riscv_overlap_op_by_pieces (void)
> >> +{
> >> +  return !riscv_slow_unaligned_access_p;
> >> +}
> >> +
> >>  /* Implement TARGET_CAN_CHANGE_MODE_CLASS.  */
> >>
> >>  static bool
> >> @@ -5525,6 +5533,9 @@ riscv_asan_shadow_offset (void)
> >>  #undef TARGET_SLOW_UNALIGNED_ACCESS
> >>  #define TARGET_SLOW_UNALIGNED_ACCESS riscv_slow_unaligned_access
> >>
> >> +#undef TARGET_OVERLAP_OP_BY_PIECES_P
> >> +#define TARGET_OVERLAP_OP_BY_PIECES_P riscv_overlap_op_by_pieces
> >> +
> >>  #undef TARGET_SECONDARY_MEMORY_NEEDED
> >>  #define TARGET_SECONDARY_MEMORY_NEEDED riscv_secondary_memory_needed
> >>
> >> --
> >> 2.31.1
> >>


Re: [PATCH] [RISCV] Add Pattern for builtin overflow

2021-05-01 Thread Andrew Waterman
On Thu, Apr 29, 2021 at 3:02 PM Jim Wilson  wrote:
>
> On Wed, Apr 28, 2021 at 4:04 PM Andrew Waterman  wrote:
>>
>> > This is a good suggestion, but in the interests of making forward progress 
>> > here, I'd like to accept the patch and then file these as bugzillas as 
>> > ways to further improve the patch.
>>
>> Agreed, these potential improvements are definitely not blockers.
>
>
> Turns out Levy had time to work on the patch after all, and submitted a 
> fourth version with your improvements.

Cool.  Thank you, Levy!

>
> Jim


Re: [PATCH] [RISCV] Add Pattern for builtin overflow

2021-04-28 Thread Andrew Waterman
On Wed, Apr 28, 2021 at 1:18 PM Jim Wilson  wrote:
>
> On Tue, Apr 27, 2021 at 12:45 AM Andrew Waterman  wrote:
>>
>> > signed addition (SImode with RV64):
>> > add t0, t1, t2
>> > sext.w  t3, t0
>> > bne t0, t3, overflow
>>
>> The following version has the same instruction count but offers more ILP:
>>
>>   add t0, t1, t2
>>   addw t3, t1, t2
>>   bne t0, t3, overflow
>
>
> This is a good suggestion, but in the interests of making forward progress 
> here, I'd like to accept the patch and then file these as bugzillas as ways 
> to further improve the patch.

Agreed, these potential improvements are definitely not blockers.

>>
>> > unsigned addition (SImode with RV64):
>> > sext.w  t3, t1
>> > addw    t0, t1, t2
>> > bltu    t0, t3, overflow
>>
>> I think you can do this in two instructions, similar to the previous pattern:
>>
>>   addw t0, t1, t2
>>   bltu t0, t1, overflow
>
>
> Likewise.
>>
>> > signed subtraction (SImode with RV64):
>> > sub t0, t1, t2
>> > sext.w  t3, t0
>> > bne t0, t3, overflow
>>
>> See analogous addition comment.
>
>
> Likewise.
>>
>>
>> > unsigned subtraction (SImode with RV64):
>> > sext.w  t3, t1
>> > subw    t0, t1, t2
>> > bltu    t0, t3, overflow
>>
>> See analogous addition comment.
>
>
> Likewise.
>>
>> > unsigned multiplication (SImode with RV64):
>> > slli    t0,t0,32
>> > slli    t1,t1,32
>> > srli    t0,t0,32
>> > srli    t1,t1,32
>> > mul t0,t0,t1
>> > srai    t5,t0,32
>> > bne t5, 0, overflow
>>
>> I think you can eliminate the first two right shifts by replacing mul
>> with mulhu... something like:
>>
>>   slli rx, rx, 32
>>   slli ry, ry, 32
>>   mulhu rz, rx, ry
>>   srli rt, rz, 32
>>   bnez rt, overflow
>
>
> Likewise, except this should be a separate bugzilla.
>
> Jim


Re: [PATCH] [RISCV] Add Pattern for builtin overflow

2021-04-27 Thread Andrew Waterman
On Tue, Apr 27, 2021 at 12:18 AM Levy Hsu  wrote:
>
> From: LevyHsu 
>
> Added implementation for builtin overflow detection, new patterns are listed 
> below.
>
> ---
> Addition:
>
> signed addition (SImode with RV32 || DImode with RV64):
> add t0, t1, t2
> slti    t3, t2, 0
> slt t4, t0, t1
> bne t3, t4, overflow
>
> signed addition (SImode with RV64):
> add t0, t1, t2
> sext.w  t3, t0
> bne t0, t3, overflow

The following version has the same instruction count but offers more ILP:

  add t0, t1, t2
  addw t3, t1, t2
  bne t0, t3, overflow

>
> unsigned addition (SImode with RV32 || DImode with RV64):
> add t0, t1, t2
> bltu    t0, t1, overflow
>
> unsigned addition (SImode with RV64):
> sext.w  t3, t1
> addw    t0, t1, t2
> bltu    t0, t3, overflow

I think you can do this in two instructions, similar to the previous pattern:

  addw t0, t1, t2
  bltu t0, t1, overflow

> ---
> Subtraction:
>
> signed subtraction (SImode with RV32 || DImode with RV64):
> sub t0, t1, t2
> slti    t3, t2, 0
> slt t4, t1, t0
> bne t3, t4, overflow
>
> signed subtraction (SImode with RV64):
> sub t0, t1, t2
> sext.w  t3, t0
> bne t0, t3, overflow

See analogous addition comment.

>
> unsigned subtraction (SImode with RV32 || DImode with RV64):
> sub t0, t1, t2
> bltu    t1, t0, overflow
>
> unsigned subtraction (SImode with RV64):
> sext.w  t3, t1
> subw    t0, t1, t2
> bltu    t0, t3, overflow

See analogous addition comment.

> ---
> Multiplication:
>
> signed multiplication (SImode with RV32 || DImode with RV64):
> mulh    t4, t1, t2
> mul t0, t1, t2
> srai    t5, t0, 31/63 (RV32/64)
> bne t4, t5, overflow
>
> signed multiplication (SImode with RV64):
> mul t0, t1, t2
> sext.w  t3, t0
> bne t0, t3, overflow
>
> unsigned multiplication (SImode with RV32 || DImode with RV64 ):
> mulhu   t4, t1, t2
> mul t0, t1, t2
> bne t4, 0,  overflow
>
> unsigned multiplication (SImode with RV64):
> slli    t0,t0,32
> slli    t1,t1,32
> srli    t0,t0,32
> srli    t1,t1,32
> mul t0,t0,t1
> srai    t5,t0,32
> bne t5, 0, overflow

I think you can eliminate the first two right shifts by replacing mul
with mulhu... something like:

  slli rx, rx, 32
  slli ry, ry, 32
  mulhu rz, rx, ry
  srli rt, rz, 32
  bnez rt, overflow

>
> ---
> ---
>  gcc/config/riscv/riscv.c  |   8 ++
>  gcc/config/riscv/riscv.h  |   5 +
>  gcc/config/riscv/riscv.md | 240 ++
>  3 files changed, 253 insertions(+)
>
> diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
> index d489717b2a5..cf94f5c9658 100644
> --- a/gcc/config/riscv/riscv.c
> +++ b/gcc/config/riscv/riscv.c
> @@ -351,6 +351,14 @@ static const struct riscv_tune_info 
> riscv_tune_info_table[] = {
>{ "size", generic, &optimize_size_tune_info },
>  };
>
> +/* Implement TARGET_MIN_ARITHMETIC_PRECISION.  */
> +
> +static unsigned int
> +riscv_min_arithmetic_precision (void)
> +{
> +  return 32;
> +}
> +
>  /* Return the riscv_tune_info entry for the given name string.  */
>
>  static const struct riscv_tune_info *
> diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
> index 172c7ca7c98..a6f451b97e3 100644
> --- a/gcc/config/riscv/riscv.h
> +++ b/gcc/config/riscv/riscv.h
> @@ -121,6 +121,11 @@ extern const char *riscv_default_mtune (int argc, const 
> char **argv);
>  #define MIN_UNITS_PER_WORD 4
>  #endif
>
> +/* Allows SImode op in builtin overflow pattern, see internal-fn.c.  */
> +
> +#undef TARGET_MIN_ARITHMETIC_PRECISION
> +#define TARGET_MIN_ARITHMETIC_PRECISION riscv_min_arithmetic_precision
> +
>  /* The `Q' extension is not yet supported.  */
>  #define UNITS_PER_FP_REG (TARGET_DOUBLE_FLOAT ? 8 : 4)
>
> diff --git a/gcc/config/riscv/riscv.md b/gcc/config/riscv/riscv.md
> index 36012ad1f77..c82017a4bce 100644
> --- a/gcc/config/riscv/riscv.md
> +++ b/gcc/config/riscv/riscv.md
> @@ -462,6 +462,81 @@
>[(set_attr "type" "arith")
> (set_attr "mode" "DI")])
>
> +(define_expand "addv<mode>4"
> +  [(set (match_operand:GPR 0 "register_operand" "=r,r")
> +(plus:GPR (match_operand:GPR 1 "register_operand" " r,r")
> +(match_operand:GPR 2 "arith_operand"" r,I")))
> +(label_ref (match_operand 3 "" ""))]
> +  ""
> +{
> +  if (TARGET_64BIT && <MODE>mode == SImode)
> +  {
> +rtx t3 = gen_reg_rtx (DImode);
> +rtx t4 = gen_reg_rtx (DImode);
> +rtx t5 = gen_reg_rtx (DImode);
> +rtx t6 = gen_reg_rtx (DImode);
> +
> +if (GET_CODE 

Re: [PATCH] RISC-V: Zihintpause: add __builtin_riscv_pause

2021-01-06 Thread Andrew Waterman via Gcc-patches
I've got a contrary opinion:

Since HINTs are guaranteed to execute as no-ops--e.g., this one is
just a FENCE instruction, which is already a mandatory part of the
base ISA--they don't _need_ to be called out as separate extensions in
the toolchain.

Although there's nothing fundamentally wrong with Kito's suggestion,
it seems like an extra hoop to jump through without commensurate
benefit.  I see no reason to restrict the use of __builtin_pause,
since all RISC-V implementations, including old ones, are required to
support it.  And, of course, that's the reason we encoded it this way
:)


On Wed, Jan 6, 2021 at 7:35 PM Kito Cheng  wrote:
>
> Hi Philipp:
>
> Could you add zihintpause to -march parser and guard that on the
> pattern and builtin like zifencei[1-2]?
>
> And could you sent a PR to
> https://github.com/riscv/riscv-c-api-doc/blob/master/riscv-c-api.md to
> mention __builtin_riscv_pause?
>
> Thanks!
>
> [1] march parser change:
> https://github.com/gcc-mirror/gcc/commit/b03be74bad08c382da47e048007a78fa3fb4ef49
> [2] Default version for ext.:
> https://github.com/gcc-mirror/gcc/commit/4b81528241ca682025d92558ff6aeec91dafdca8
>
>
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/riscv/builtin_pause.c
> > @@ -0,0 +1,10 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2" }  */
> > +
> > +void test_pause()
>
> I would suggest you change the function name in the testcase,
> otherwise the scan-assembler test will always pass even if you didn't
> generate "pause" instruction.
>
>
> > +{
> > +  __builtin_riscv_pause ();
> > +}
> > +
> > +/* { dg-final { scan-assembler "pause" } } */
> > +
> > --
> > 2.18.4
> >


Re: [RISC-V] Add support for AddressSanitizer on RISC-V GCC

2020-08-19 Thread Andrew Waterman
I'm having trouble understanding why different ports chose their
various constants--e.g., SPARC uses 1<<29 for 32-bit and 1<<43 for
64-bit, whereas x86 uses 1<<29 and 0x7fff8000, respectively.  So I
can't comment on the choice of the constant 1<<36 for RISC-V.  But
isn't it a problem that 1<<36 is not a valid Pmode value for ILP32?


On Wed, Aug 19, 2020 at 1:02 AM Joshua via Gcc-patches
 wrote:
>
> From: cooper.joshua 
>
> gcc/
>
> * config/riscv/riscv.c (asan_shadow_offset): Implement the offset of 
> asan shadow memory for risc-v.
> (asan_shadow_offset): new macro definition.
> ---
>
>  gcc/config/riscv/riscv.c | 11 +++
>  1 file changed, 11 insertions(+)
>
> diff --git a/gcc/config/riscv/riscv.c b/gcc/config/riscv/riscv.c
> index 63b0c38..b85b459 100644
> --- a/gcc/config/riscv/riscv.c
> +++ b/gcc/config/riscv/riscv.c
> @@ -5292,6 +5292,14 @@ riscv_gpr_save_operation_p (rtx op)
>return true;
>  }
>
> +/* Implement TARGET_ASAN_SHADOW_OFFSET.  */
> +
> +static unsigned HOST_WIDE_INT
> +riscv_asan_shadow_offset (void)
> +{
> +  return HOST_WIDE_INT_1U << 36;
> +}
> +
>  /* Initialize the GCC target structure.  */
>  #undef TARGET_ASM_ALIGNED_HI_OP
>  #define TARGET_ASM_ALIGNED_HI_OP "\t.half\t"
> @@ -5475,6 +5483,9 @@ riscv_gpr_save_operation_p (rtx op)
>  #undef TARGET_NEW_ADDRESS_PROFITABLE_P
>  #define TARGET_NEW_ADDRESS_PROFITABLE_P riscv_new_address_profitable_p
>
> +#undef TARGET_ASAN_SHADOW_OFFSET
> +#define TARGET_ASAN_SHADOW_OFFSET riscv_asan_shadow_offset
> +
>  struct gcc_target targetm = TARGET_INITIALIZER;
>
>  #include "gt-riscv.h"
> --
> 2.7.4
>


Re: [PATCH] RISC-V/libgcc: Reduce the size of RV64 millicode by 6 bytes

2020-07-30 Thread Andrew Waterman
IIRC, I didn't use this approach originally because I wanted to avoid
the additional dynamic instruction.  But I agree that code size is the
more important metric for users of this feature.  LGTM.


On Thu, Jul 30, 2020 at 1:56 PM Maciej W. Rozycki via Gcc-patches
 wrote:
>
> Rewrite code sequences throughout the 64-bit RISC-V `__riscv_save_*'
> routines replacing `li t1, -48', `li t1, -64', and `li t1, -80',
> instructions, which do not have a compressed encoding, respectively with
> `li t1, 3', `li t1, 4', and `li t1, 4', which do, and then adjusting the
> remaining code accordingly observing that `sub sp, sp, t1' takes the
> same amount of space as an `slli t1, t1, 4'/`add sp, sp, t1' instruction
> pair does, again due to the use of compressed encodings, saving 6 bytes
> total.
>
> This change does increase code size by 4 bytes for RISC-V processors
> lacking the compressed instruction set, however their users couldn't
> care about the code size or they would have chosen an implementation
> that does have the compressed instructions, wouldn't they?
>
> libgcc/
> * config/riscv/save-restore.S [__riscv_xlen == 64]
> (__riscv_save_10, __riscv_save_8, __riscv_save_6, __riscv_save_4)
> (__riscv_save_2): Replace negative immediates used for the final
> stack pointer adjustment with positive ones, right-shifted by 4.
> ---
> Hi,
>
>  This is hopefully functionally obviously correct.  There were also no
> regressions in `riscv64-linux-gnu' lp64d multilib testing across all our
> testsuites (with QEMU run in the Linux user emulation mode) where target
> libraries, including glibc, have been built with `-Os -msave-restore',
> that is with millicode enabled.
>
>  OK to apply?
>
>   Maciej
> ---
>  libgcc/config/riscv/save-restore.S |   13 +++--
>  1 file changed, 7 insertions(+), 6 deletions(-)
>
> gcc-riscv-libgcc-save-sll.diff
> Index: gcc/libgcc/config/riscv/save-restore.S
> ===
> --- gcc.orig/libgcc/config/riscv/save-restore.S
> +++ gcc/libgcc/config/riscv/save-restore.S
> @@ -45,7 +45,7 @@ FUNC_BEGIN (__riscv_save_10)
>.cfi_restore 27
>addi sp, sp, -112
>.cfi_def_cfa_offset 112
> -  li t1, -16
> +  li t1, 1
>  .Ls10:
>sd s10, 16(sp)
>.cfi_offset 26, -96
> @@ -60,7 +60,7 @@ FUNC_BEGIN (__riscv_save_8)
>.cfi_restore 27
>addi sp, sp, -112
>.cfi_def_cfa_offset 112
> -  li t1, -32
> +  li t1, 2
>  .Ls8:
>sd s8, 32(sp)
>.cfi_offset 24, -80
> @@ -77,7 +77,7 @@ FUNC_BEGIN (__riscv_save_6)
>.cfi_restore 27
>addi sp, sp, -112
>.cfi_def_cfa_offset 112
> -  li t1, -48
> +  li t1, 3
>  .Ls6:
>sd s6, 48(sp)
>.cfi_offset 22, -64
> @@ -99,7 +99,7 @@ FUNC_BEGIN (__riscv_save_4)
>.cfi_restore 27
>addi sp, sp, -112
>.cfi_def_cfa_offset 112
> -  li t1, -64
> +  li t1, 4
>  .Ls4:
>sd s4, 64(sp)
>.cfi_offset 20, -48
> @@ -123,7 +123,7 @@ FUNC_BEGIN (__riscv_save_2)
>.cfi_restore 27
>addi sp, sp, -112
>.cfi_def_cfa_offset 112
> -  li t1, -80
> +  li t1, 5
>  .Ls2:
>sd s2, 80(sp)
>.cfi_offset 18, -32
> @@ -133,9 +133,10 @@ FUNC_BEGIN (__riscv_save_2)
>.cfi_offset 8, -16
>sd ra, 104(sp)
>.cfi_offset 1, -8
> +  slli t1, t1, 4
># CFA info is not correct in next 2 instruction since t1's
># value is depend on how may register really save.
> -  sub sp, sp, t1
> +  add sp, sp, t1
>jr t0
>.cfi_endproc
>  FUNC_END (__riscv_save_12)
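The equivalence behind the rewrite is easy to sanity-check: the old sequence computed sp - t1 with a negative t1, while the new one computes sp + (t1 << 4) with a small positive t1. A quick model of the arithmetic (the immediates are those from the patch above):

```python
# Old sequence: li t1, -16*n ; sub sp, sp, t1     (net effect: sp += 16*n)
# New sequence: li t1, n ; slli t1, t1, 4 ; add sp, sp, t1
for n, old_imm in [(1, -16), (2, -32), (3, -48), (4, -64), (5, -80)]:
    sp = 0x4000
    old_sp = sp - old_imm       # sub sp, sp, t1 with t1 = -16*n
    new_sp = sp + (n << 4)      # slli t1, t1, 4 ; add sp, sp, t1
    assert old_sp == new_sp == sp + 16 * n
```

The small positive immediates fit the compressed c.li encoding, which is where the 6-byte saving comes from; the extra slli is the added dynamic instruction Andrew mentions.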


Re: [RFC/PATCH] IFN: Fix mask_{load,store} optab support macros

2020-06-24 Thread Andrew Waterman
On Wed, Jun 24, 2020 at 5:56 PM Jim Wilson  wrote:
>
> On Wed, Jun 24, 2020 at 1:35 AM Richard Sandiford
>  wrote:
> > Richard Biener  writes:
> > > AVX512 would have V16SImode and SImode because the mask would have
> > > an integer mode?  Likewise I could imagine RISC-V using V4SImode and 
> > > V4QImode
> > > or however their mask registers look like.
>
> RISC-V has 7 mask modes, currently.  Masks are defined to always fit
> in one register.  So if you have a register group of 8 (mlen=8) and an
> element length of 8-bits (elen=8) then you have 1 bit of mask per
> element which completely fills one register.  If you have a register
> group of 1 (mlen=1) and an element length of 64-bits (elen=64) then
> you have 64-bits of mask per element which completely fills one
> register.  So we need a mode for each possible way of tiling a
> register with mask bits, which is 2**x for x=(0..6).

The more recent (and hopefully final) ISA design has element i masked
by mask bit i, irrespective of element size--i.e., mlen=1
unconditionally.  Sorry about the churn.
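Under the final design described above, masking is independent of element width: element i is governed by mask bit i. A minimal model of that semantics:

```python
def masked_add(dst, a, b, mask):
    """Element i is updated iff mask bit i is set, regardless of element width."""
    return [x + y if (mask >> i) & 1 else d
            for i, (d, x, y) in enumerate(zip(dst, a, b))]

# The same mask value applies identically to 8-bit or 64-bit elements.
assert masked_add([0, 0, 0, 0], [1, 2, 3, 4], [10, 20, 30, 40], 0b0101) \
    == [11, 0, 33, 0]
```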

>
> > But what I mean is, once you know the vector mode, there should only
> > be one “choice” of mask mode.
>
> This is true for RISC-V also.  The mask mode is determined by the
> number of registers in the group and the element width, so there is
> only one valid mask mode for each vector mode.

And of course this is still true.

>
> Jim


Re: [committed] riscv: Fix up riscv_atomic_assign_expand_fenv [PR94950]

2020-05-06 Thread Andrew Waterman
Thank you!


On Wed, May 6, 2020 at 12:43 AM Jakub Jelinek  wrote:
>
> Hi!
>
> Similarly to the fixes on many other targets, riscv needs to use TARGET_EXPR
> to avoid having the create_tmp_var_raw temporaries without proper DECL_CONTEXT
> and not mentioned in local decls.
>
> Committed as obvious to trunk.
>
> 2020-05-06  Jakub Jelinek  
>
> PR target/94950
> * config/riscv/riscv-builtins.c (riscv_atomic_assign_expand_fenv): Use
> TARGET_EXPR instead of MODIFY_EXPR for first assignment to old_flags.
>
> --- gcc/config/riscv/riscv-builtins.c.jj2020-01-12 11:54:36.384413846 
> +0100
> +++ gcc/config/riscv/riscv-builtins.c   2020-05-05 12:41:08.005288721 +0200
> @@ -283,8 +283,8 @@ riscv_atomic_assign_expand_fenv (tree *h
>tree fsflags = GET_BUILTIN_DECL (CODE_FOR_riscv_fsflags);
>tree old_flags = create_tmp_var_raw (RISCV_ATYPE_USI);
>
> -  *hold = build2 (MODIFY_EXPR, RISCV_ATYPE_USI, old_flags,
> - build_call_expr (frflags, 0));
> +  *hold = build4 (TARGET_EXPR, RISCV_ATYPE_USI, old_flags,
> + build_call_expr (frflags, 0), NULL_TREE, NULL_TREE);
>*clear = build_call_expr (fsflags, 1, old_flags);
>*update = NULL_TREE;
>  }
>
> Jakub
>


Re: [PATCH v2 1/2] RISC-V: Add shorten_memrefs pass

2019-10-31 Thread Andrew Waterman
Nevertheless, as Jim observed, it's a great burden on the RISC-V
backend maintainers to support all these passes.  Are you saying WD is
willing to put its money where its mouth is and will provide active
support for these passes?

On Thu, Oct 31, 2019 at 2:42 AM Nidal Faour  wrote:
>
> A tests case for this patch has been written and pushed to WD github 
> repository at the following link:
> https://github.com/westerndigitalcorporation/riscv32-Code-density-test-bench/tree/master/common_load_store_base_address_offset
>
> the test case itself has been written based on a real product scenario.
> We got a saving of about 10% in code size of the test itself.
>
> Best Regards,
>
> Nidal Faour
> Staff Engineer, R Engineering – Firmware & Toolchain, CTO Group
>
> Western Digital®
> Migdal Tefen 24959, P.O Box 3
> Email: nidal.fa...@wdc.com
> Office: +972-4-9078756
> Mobile: +972-50-8867756
>
> >-Original Message-
> >From: Jim Wilson 
> >Sent: Thursday, 31 October 2019 1:57
> >To: Craig Blackmore 
> >Cc: GCC Patches ; Ofer Shinaar
> >; Nidal Faour ; Kito Cheng
> >; Jeff Law 
> >Subject: Re: [PATCH v2 1/2] RISC-V: Add shorten_memrefs pass
> >
> >CAUTION: This email originated from outside of Western Digital. Do not click
> >on links or open attachments unless you recognize the sender and know that
> >the content is safe.
> >
> >
> >On Fri, Oct 25, 2019 at 10:40 AM Craig Blackmore
> > wrote:
> >> This patch aims to allow more load/store instructions to be compressed
> >> by replacing a load/store of 'base register + large offset' with a new
> >> load/store of 'new base + small offset'. If the new base gets stored
> >> in a compressed register, then the new load/store can be compressed.
> >> Since there is an overhead in creating the new base, this change is
> >> only attempted when 'base register' is referenced in at least 4 
> >> load/stores in
> >a basic block.
> >>
> >> The optimization is implemented in a new RISC-V specific pass called
> >> shorten_memrefs which is enabled for RVC targets. It has been
> >> developed for the 32-bit lw/sw instructions but could also be extended to
> >64-bit ld/sd in future.
> >
> >The fact that this needs 4 load/stores in a block with the same base address
> >means it won't trigger very often.  But it does seem to work.
> >I see about a 0.05% code size reduction for a rv32gc newlib nano libc.a.
> >There might be other codes that benefit more.
> >
> >I'm concerned about the number of RISC-V specific optimization passes
> >people are writing.  I've seen at least 3 so far.  This one is at least
> >small and simple enough to understand that it doesn't look like it will
> >cause maintenance problems.
> >
> >The config.gcc change conflicts with the save-restore optimization pass that
> >Andrew Burgess added, but that is a trivial fix.
> >
> >The code only works for 32-bit load/stores.  rv64 has compressed 64-bit
> >load/stores, and the F and D extensions have float and double compressed
> >loads and stores.  The patch would be more useful if it handled all of these.
> >
> >The patch doesn't check the other operand, it only looks at the memory
> >operand.  This results in some cases where the code rewrites instructions
> >that can never be compressed.  For instance, given
> >
> >  void store1z (int *array)
> >  {
> >    array[200] = 0;
> >    array[201] = 0;
> >    array[202] = 0;
> >    array[203] = 0;
> >  }
> >
> >the patch increases code size by 4 bytes because it rewrites the stores,
> >but we don't have compressed store 0, and x0 is not a compressed reg, so
> >there is no point in rewriting the stores.  I can also create examples
> >that show a size increase with loads but it is a little trickier.
> >  extern int sub1 (int, int, int, int, int, int, int);
> >
> >  int load1a (int a0, int a1, int a2, int a3, int a4, int *array)
> >  {
> >    int a = 0;
> >    a += array[200];
> >    a += array[201];
> >    a += array[202];
> >    a += array[203];
> >    return sub1 (a0, a1, a2, a3, a4, 0, a);
> >  }
> >
> >The register allocator will pass a0-a4 directly through from args to the
> >call, leaving only a5-a7 for the load base address and dest, and only one
> >of those regs is a compressed reg, but we need two, so these loads can't
> >be compressed.  The code still gets rewritten, resulting in a size
> >increase for the extra add.  Not sure if you can do anything about that
> >though, since you are doing this before register allocation.
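Jim's compressibility conditions can be collected into a small predicate. This is only an illustrative sketch of the standard RVC rules for c.lw/c.sw (both registers must be in x8-x15, and the offset must be a 4-byte-aligned value in [0, 124]), not the pass's actual logic, which lives in riscv_compressed_lw_address_p:

```python
COMPRESSED_REGS = set(range(8, 16))  # x8-x15 (s0, s1, a0-a5)

def clw_csw_compressible(base_reg, data_reg, offset):
    """Can a 32-bit load/store be encoded as c.lw/c.sw?  (Sketch of RVC rules.)"""
    return (base_reg in COMPRESSED_REGS
            and data_reg in COMPRESSED_REGS
            and 0 <= offset <= 124
            and offset % 4 == 0)

# array[200] is byte offset 800: out of range, so shorten_memrefs rewrites
# the base to bring the offset into range (32 is a hypothetical rebased value).
assert not clw_csw_compressible(10, 11, 800)
assert clw_csw_compressible(10, 11, 32)

# But storing zero uses x0, which is not a compressed reg, so rewriting the
# store1z example above can never pay off.
assert not clw_csw_compressible(10, 0, 32)
```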
> >
> >It isn't clear what the change to riscv_address_cost is for.  That should
> >be commented.
> >
> >I'd suggest parens in the riscv_compressed_lw_offset_p return statement,
> >that makes it easier to align the code correctly (in emacs at least).
> >
> >The RISCV_MAX_COMPRESSED_LW_OFFSET should be moved down to the
> >"ISA constants" section of riscv.h, and maybe renamed similar to the other
> >constants.
> >
> >There is a REG_P check in get_si_mem_reg.  That should probably handle
> >SUBREGs too.
> >
> >A testcase to verify the patch would be nice.  I have one I wrote for 

Re: [PATCH v2 1/2] RISC-V: Add shorten_memrefs pass

2019-10-26 Thread Andrew Waterman
I don't know enough to say whether the legitimize_address hook is
sufficient for the intended purpose, but I am sure that RISC-V's
concerns are not unique: other GCC targets have to cope with
offset-size constraints that are coupled to register-allocation
constraints.


On Sat, Oct 26, 2019 at 11:21 AM Jeff Law  wrote:
>
> On 10/25/19 11:39 AM, Craig Blackmore wrote:
> > This patch aims to allow more load/store instructions to be compressed by
> > replacing a load/store of 'base register + large offset' with a new 
> > load/store
> > of 'new base + small offset'. If the new base gets stored in a compressed
> > register, then the new load/store can be compressed. Since there is an 
> > overhead
> > in creating the new base, this change is only attempted when 'base 
> > register' is
> > referenced in at least 4 load/stores in a basic block.
> >
> > The optimization is implemented in a new RISC-V specific pass called
> > shorten_memrefs which is enabled for RVC targets. It has been developed for 
> > the
> > 32-bit lw/sw instructions but could also be extended to 64-bit ld/sd in 
> > future.
> >
> > Tested on bare metal rv32i, rv32iac, rv32im, rv32imac, rv32imafc, rv64imac,
> > rv64imafdc via QEMU. No regressions.
> >
> > gcc/ChangeLog:
> >
> >   * config.gcc: Add riscv-shorten-memrefs.o to extra_objs for riscv.
> >   * config/riscv/riscv-passes.def: New file.
> >   * config/riscv/riscv-protos.h (make_pass_shorten_memrefs): Declare.
> >   * config/riscv/riscv-shorten-memrefs.c: New file.
> >   * config/riscv/riscv.c (tree-pass.h): New include.
> >   (riscv_compressed_reg_p): New Function
> >   (riscv_compressed_lw_offset_p): Likewise.
> >   (riscv_compressed_lw_address_p): Likewise.
> >   (riscv_shorten_lw_offset): Likewise.
> >   (riscv_legitimize_address): Attempt to convert base + large_offset
> > to compressible new_base + small_offset.
> >   (riscv_address_cost): Make anticipated compressed load/stores
> > cheaper for code size than uncompressed load/stores.
> >   (riscv_register_priority): Move compressed register check to
> > riscv_compressed_reg_p.
> >   * config/riscv/riscv.h (RISCV_MAX_COMPRESSED_LW_OFFSET): Define.
> >   * config/riscv/riscv.opt (mshorten-memefs): New option.
> >   * config/riscv/t-riscv (riscv-shorten-memrefs.o): New rule.
> >   (PASSES_EXTRA): Add riscv-passes.def.
> >   * doc/invoke.texi: Document -mshorten-memrefs.
> This has traditionally been done via the the legitimize_address hook.
> Is there some reason that hook is insufficient for this case?
>
> The hook, IIRC, is called out explow.c.
>
> Jeff
>


Re: [PATCH 1/2] gcc/riscv: Include more registers in SIBCALL_REGS

2019-08-19 Thread Andrew Waterman
x5 is used as an alternate link register, so using it for sibcalls
will confuse hardware return-address stacks and reduce performance for
implementations that have one.

The reason I excluded a0-a7 is that those are used to pass arguments
to the sibcallee.  It's of course possible that's more restrictive
than necessary, but make sure to test before merging.


On Mon, Aug 19, 2019 at 12:16 PM Andrew Burgess
 wrote:
>
> The current SIBCALL_REGS are x6, x7, and x28 to x31.  These are all
> caller saved registers, however, they are not all of the caller saved
> registers.
>
> I don't see any reason why we couldn't add t1, and a0 to a7 into this
> set, and this is what this patch does.
>
> gcc/ChangeLog:
>
> * config/riscv/riscv.h (REG_CLASS_CONTENTS): Update SIBCALL_REGS.
> ---
>  gcc/ChangeLog| 4 
>  gcc/config/riscv/riscv.h | 2 +-
>  2 files changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
> index 5fc9be8edbf2..bb8240bb849a 100644
> --- a/gcc/config/riscv/riscv.h
> +++ b/gcc/config/riscv/riscv.h
> @@ -400,7 +400,7 @@ enum reg_class
>  #define REG_CLASS_CONTENTS \
>  {  \
>{ 0x, 0x, 0x },  /* NO_REGS */   \
> -  { 0xf0c0, 0x, 0x },  /* SIBCALL_REGS */  \
> +  { 0xf003fce0, 0x, 0x },  /* SIBCALL_REGS */  \
>{ 0xffc0, 0x, 0x },  /* JALR_REGS */ \
>{ 0x, 0x, 0x },  /* GR_REGS */   \
>{ 0x, 0x, 0x },  /* FP_REGS */   \
> --
> 2.14.5
>


Re: [riscv] Wrap ASM_OUTPUT_LABELREF in do {} while (0)

2017-11-12 Thread Andrew Waterman
Thanks, Tom.

On Sun, Nov 12, 2017 at 8:44 AM, Tom de Vries 
wrote:

> Hi,
>
> this patch wraps riscv.h's ASM_OUTPUT_LABELREF in "do {} while (0)".
>
> Build for riscv64.
>
> Committed as obvious.
>
> Thanks,
> - Tom
>


Re: [PATCH] RISC-V: Document the medlow and medany code models

2017-11-01 Thread Andrew Waterman
Thanks for caring enough about our patches to even notice the grammar :)

On Tue, Oct 31, 2017 at 11:09 PM, Sandra Loosemore
<san...@codesourcery.com> wrote:
> On 10/31/2017 10:27 PM, Andrew Waterman wrote:
>>
>> I have to disagree.  It's standard to not hyphenate an
>> adverb-adjective compound, since they tend not to be ambiguous.  But
>> if the standard in the GCC documentation is to hyphenate, I will not
>> stand in the way!
>
>
> Heh, you got me on this one -- it's one of those weird special cases, the
> "adverb ending in -ly plus participle" case listed here:
>
> http://www.chicagomanualofstyle.org/16/images/ch07_tab01.pdf
>
> So the patch is OK as-is, but maybe I need a few new brain cells.  :-P
>
> -Sandra
>


Re: [PATCH] RISC-V: Document the medlow and medany code models

2017-10-31 Thread Andrew Waterman
I have to disagree.  It's standard to not hyphenate an
adverb-adjective compound, since they tend not to be ambiguous.  But
if the standard in the GCC documentation is to hyphenate, I will not
stand in the way!

On Tue, Oct 31, 2017 at 8:47 PM, Sandra Loosemore
 wrote:
> On 10/31/2017 06:54 PM, Palmer Dabbelt wrote:
>>
>> This documentation is patterned off the aarch64 -mcmodel documentation.
>>
>> gcc/ChangeLog:
>>
>> 2017-10-31  Palmer Dabbelt  
>>
>>  * doc/invoke.texi (RISC-V Options): Explicitly name the medlow
>>  and medany code models, and describe what they do.
>> ---
>>   gcc/doc/invoke.texi | 17 +
>>   1 file changed, 13 insertions(+), 4 deletions(-)
>>
>> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
>> index 43acbcbbcd77..8903afaeeffc 100644
>> --- a/gcc/doc/invoke.texi
>> +++ b/gcc/doc/invoke.texi
>> @@ -989,7 +989,7 @@ See RS/6000 and PowerPC Options.
>>   -msmall-data-limit=@var{N-bytes} @gol
>>   -msave-restore  -mno-save-restore @gol
>>   -mstrict-align -mno-strict-align @gol
>> --mcmodel=@var{code-model} @gol
>> +-mcmodel=medlow -mcmodel=medany @gol
>>   -mexplicit-relocs  -mno-explicit-relocs @gol}
>>
>>   @emph{RL78 Options}
>
>
> Hmmm.  I have a mild preference for keeping the metasyntactic variable in
> the option documentation and using a nested table to list the possible
> values, but I see that the documentation for other backends that support
> this option are not consistent, so I guess it is OK to expand it like that
> as long as you don't anticipate adding additional keywords in the future.
>
>> @@ -21765,9 +21765,18 @@ Use smaller but slower prologue and epilogue
>> code.
>>   @opindex mstrict-align
>>   Do not generate unaligned memory accesses.
>>
>> -@item -mcmodel=@var{code-model}
>> -@opindex mcmodel
>> -Specify the code model.
>> +@item -mcmodel=medlow
>> +@opindex mcmodel=medlow
>> +Generate code for the medium-low code model. The program and its
>> statically
>> +defined symbols must lie within a single 2 GiB address range and must lie
>> +between absolute addresses -2 GiB and +2 GiB. Programs can be statically
>> or
>> +dynamically linked. This is the default code model.
>> +
>> +@item -mcmodel=medany
>> +@opindex mcmodel=medany
>> +Generate code for the medium-any code model. The program and its
>> statically
>> +defined symbols must be within any single 2 GiB address range. Programs
>> can be
>> +statically or dynamically linked.
>>
>>   @end table
>
>
> s/statically defined symbols/statically-defined symbols/g
>
> Bonus points if you make that fix in the aarch64 documentation you copied
> this from, too.  :-)
>
> -Sandra
>


Re: Fix riscv port breakage.

2017-07-12 Thread Andrew Waterman
Thank you for the fix and the cleanup, Jeff!


Re: [PATCH][riscv] Fix build due to INT16_MAX issue

2017-02-07 Thread Andrew Waterman
Approved.

And thanks for the reminder about the MAINTAINERS file.

On Tue, Feb 7, 2017 at 2:45 AM, Kyrill Tkachov
 wrote:
> Hi all,
>
> I tried building a cc1 for riscv-x-elf and I got a build error about
> INT16_MAX not being defined on my system.
> I think it's not a standard macro.  This patch fixes that with what I think
> is the equivalent definition using
> GCC modes logic.
>
> With this my build proceeds to build a cc1 successfully. No testing beyond
> that.
>
> Is this ok for trunk?
>
> Thanks,
> Kyrill
>
> P.S. Palmer, Andrew, I didn't see your names and emails in the MAINTAINERS
> file, you should update that when
> you get the chance.
>
>
> 2016-02-07  Kyrylo Tkachov  
>
> * config/riscv/riscv.c (riscv_build_integer_1): Avoid use of INT16_MAX.


Re: [PATCH 1/6] RISC-V Port: gcc/config/riscv/riscv.c

2017-01-31 Thread Andrew Waterman
On Tue, Jan 31, 2017 at 10:01 AM, Richard Henderson <r...@redhat.com> wrote:
> On 01/30/2017 04:53 PM, Andrew Waterman wrote:
>> The ISA spec references an out-of-date calling convention, and will be
>> removed in the next revision to orthogonalize the ABI from the ISA.
>> We are in the process of drafting a RISC-V ELF psABI spec, which the
>> GCC port targets.
>> https://github.com/riscv/riscv-elf-psabi-doc/blob/master/riscv-elf.md
>> In this calling convention, the GPRs and FPRs are allocated
>> separately, which is more performant, especially when XLEN and FLEN
>> are not equal.
>
> Good to know.
>
> Re the new spec, you don't require arguments stored in two fp registers to be
> aligned, but you do require arguments stored in two int registers to be
> aligned?  Why?
>
> I see that as a waste when it comes to argument lists (for xlen=32) like
>
>   void foo(class *x, int64_t y, int a, int b, int c, int d)
>
> It's not as if the ISA has a store-multiple-register instruction that requires
> even register numbers...

We will reconsider only aligning arguments in the varargs case.

>
>
> r~


Re: [PATCH 2/6] RISC-V Port: gcc

2017-01-31 Thread Andrew Waterman
On Tue, Jan 31, 2017 at 10:32 AM, Richard Henderson <r...@redhat.com> wrote:
> On 01/30/2017 05:10 PM, Andrew Waterman wrote:
>>>> +(define_expand "clear_cache"
>>>> +  [(match_operand 0 "pmode_register_operand")
>>>> +   (match_operand 1 "pmode_register_operand")]
>>>> +  ""
>>>> +  "
>>>> +{
>>>> +  emit_insn (gen_fence_i ());
>>>> +  DONE;
>>>> +}")
>>>
>>>
>>> Do you need a FENCE before the FENCE.I?
>>
>> It's actually not clear to me what the semantics of clear_cache are
>> for multiprocessors.  Can you shed some light?
>>
>> If thread A does modifies code then sets a flag, then thread B reads
>> the flag and executes a FENCE.I, then thread A needs a FENCE before
>> setting the flag and thread B needs a fence before the FENCE.I.  But,
>> is it not the software's responsibility to insert both fences, rather
>> than assuming one of the fences is folded into clear_cache?
>
> Your introduction of "flag" confuses the issue.
>
> Having re-read the description in section 2.7, I see that FENCE.I is
> thread-local and is all that is required for a single thread to sync its own I
> and D caches.  I think perhaps I'd mis-read or mis-remembered before.
>
> Practically speaking, I'm not sure we have put any real thought about what
> needs to happen for threads using on-stack trampolines.  Certainly no other 
> gcc
> port attempts to broadcast the need for an icache flush to other cpus.
>
> So just leave as-is -- __builtin_clear_cache works properly for the local 
> thread.

Sorry, I jumped right to the assumption that you were suggesting a
FENCE for multiprocessor synchronization.  Indeed, it's not necessary
for enforcing ordering between local stores and FENCE.I.

>
>
>>>> +(define_insn "call_value_multiple_internal"
>>>> +  [(set (match_operand 0 "register_operand" "")
>>>> +   (call (mem:SI (match_operand 1 "call_insn_operand" "l,S"))
>>>> + (match_operand 2 "" "")))
>>>> +   (set (match_operand 3 "register_operand" "")
>>>> +   (call (mem:SI (match_dup 1))
>>>> + (match_dup 2)))
>>>
>>>
>>> Any reason for this?  Your return value registers are sequential.  The
>>> normal thing to do is just use e.g. (reg:TI 10).
>>
>> I think we'd need different patterns for mixed int/FP struct returns
>> (which use a0 and fa0) if we took that approach.
>
> Ah.  Other targets get away with using a PARALLEL.
>
> >From sparc.c, function_arg_record_value:
>
>   data.ret = gen_rtx_PARALLEL (mode, rtvec_alloc (data.stack + nregs));
>
> and sparc.md:
>
> (define_insn "*call_value_symbolic_sp64"
>   [(set (match_operand 0 "" "")
> (call (mem:DI (match_operand:DI 1 "symbolic_operand" "s"))
>   (match_operand 2 "" "")))
>(clobber (reg:DI O7_REG))]
>
> So you wind up with things like
>
>   (set (parallel [
>  (reg:DI o0)
>  (reg:DF fp0)
>  (reg:DI o1)
>  (reg:DF fp1)
> ])
>(call (mem:DI (symbol_ref:DI "function"))
>  (const_int 0)))
>
> We do the same for x86_64 -- in function_value_64:
>
>   ret = construct_container (mode, orig_mode, valtype, 1,
>  X86_64_REGPARM_MAX, X86_64_SSE_REGPARM_MAX,
>  x86_64_int_return_registers, 0);

Thanks--these pointers were quite helpful.

>
>
> r~


Re: [PATCH 2/6] RISC-V Port: gcc

2017-01-30 Thread Andrew Waterman
On Fri, Jan 20, 2017 at 10:41 PM, Richard Henderson  wrote:
> On 01/11/2017 06:30 PM, Palmer Dabbelt wrote:
>>
>> +(define_register_constraint "f" "TARGET_HARD_FLOAT ? FP_REGS : NO_REGS"
>> +  "A floating-point register (if available).")
>> +
>
>
> I know this is the Traditional Way, but I do wonder if using the new enable
> attribute on the alternatives would be better.  No need to change change;
> just something that makes me wonder.

Yeah, we could look into that later.

>
>> +(define_constraint "Q"
>> +  "@internal"
>> +  (match_operand 0 "const_arith_operand"))
>
>
> How does this differ from "I"?
>
>> +
>> +(define_memory_constraint "W"
>> +  "@internal
>> +   A memory address based on a member of @code{BASE_REG_CLASS}."
>> +  (and (match_code "mem")
>> +   (match_operand 0 "memory_operand")))
>
>
> How does this differ (materially) from "m"?  Or from just
>
>   (match_operand 0 "memory_operand")
>
> for that matter?
>
>
>> +;;
>> +;; DI -> SI optimizations
>> +;;
>> +
>> +;; Simplify (int)(a + 1), etc.
>> +(define_peephole2
>> +  [(set (match_operand:DI 0 "register_operand")
>> +   (match_operator:DI 4 "modular_operator"
>> + [(match_operand:DI 1 "register_operand")
>> +  (match_operand:DI 2 "arith_operand")]))
>> +   (set (match_operand:SI 3 "register_operand")
>> +   (truncate:SI (match_dup 0)))]
>> +  "TARGET_64BIT && (REGNO (operands[0]) == REGNO (operands[3]) ||
>> peep2_reg_dead_p (2, operands[0]))
>> +   && (GET_CODE (operands[4]) != ASHIFT || (CONST_INT_P (operands[2]) &&
>> INTVAL (operands[2]) < 32))"
>> +  [(set (match_dup 3)
>> + (truncate:SI
>> +(match_op_dup:DI 4
>> +  [(match_operand:DI 1 "register_operand")
>> +   (match_operand:DI 2 "arith_operand")])))])
>
>
> I take from this that combine + ree fail to do the job?
>
> I must say I'm less than thrilled about the use of truncate instead of
> simply properly representing the sign-extensions that are otherwise required
> by the ABI.  RISCV is unlike MIPS in that the 32-bit operations (addw et al)
> do not require sign-extended inputs in order to produce a correct output
> value.
>
> Consider Alpha as a counter-example to MIPS, where the ABI requires
> sign-extensions at function edges and the ISA requires sign-extensions for
> comparisons.

We've revamped the port to use TRULY_NOOP_TRUNCATION.

>>
>>
>> +(define_insn "*local_pic_storesi"
>> +  [(set (mem:ANYF (match_operand 0 "absolute_symbolic_operand" ""))
>> +   (match_operand:ANYF 1 "register_operand" "f"))
>> +   (clobber (reg:SI T0_REGNUM))]
>> +  "TARGET_HARD_FLOAT && !TARGET_64BIT && USE_LOAD_ADDRESS_MACRO
>> (operands[0])"
>> +  "\t%1,%0,t0"
>> +  [(set (attr "length") (const_int 8))])
>
>
> Use match_scratch so that you need not hard-code T0.
> And likewise for the other pic stores.
>
>> +(define_predicate "const_0_operand"
>> +  (and (match_code "const_int,const_double,const_vector")
>> +   (match_test "op == CONST0_RTX (GET_MODE (op))")))
>
>
> Probably want to support const_wide_int.
>
>> +  /* Don't handle multi-word moves this way; we don't want to introduce
>> + the individual word-mode moves until after reload.  */
>> +  if (GET_MODE_SIZE (mode) > UNITS_PER_WORD)
>> +return false;
>
>
> Why not?  Do the subreg passes not do a good job?  I would expect it to do
> well on a target like RISCV.
>
>> +(define_predicate "move_operand"
>> +  (match_operand 0 "general_operand")
>
>
> Normally this is handled by TARGET_LEGITIMATE_CONSTANT_P.
>
>> +/* Target CPU builtins.  */
>> +#define TARGET_CPU_CPP_BUILTINS()  \
>> +  do   \
>> +{  \
>> +  builtin_define ("__riscv");  \
>
>
> Perhaps better to move this to riscv-c.c?
>
>> +  if (TARGET_MUL)  \
>> +   builtin_define ("__riscv_mul"); \
>> +  if (TARGET_DIV)  \
>> +   builtin_define ("__riscv_div"); \
>> +  if (TARGET_DIV && TARGET_MUL)\
>> +   builtin_define ("__riscv_muldiv");  \
>
>
> Out of curiosity, why are these split when the M extension doesn't do so?

Implementations that have MUL but not DIV cannot claim conformance to
the M extension, but they do exist, and it's easy enough for us to
support them.  AFAIK, they are all FPGA soft cores, where MUL maps
very cheaply to the DSP macros, but DIV is relatively more expensive.

>
>> +/* Define this macro if it is advisable to hold scalars in registers
>> +   in a wider mode than that declared by the program.  In such cases,
>> +   the value is constrained to be within the 

Re: [PATCH 1/6] RISC-V Port: gcc/config/riscv/riscv.c

2017-01-30 Thread Andrew Waterman
Thanks for the feedback, Richard.  We've addressed the bulk of it, and
added some explanatory comments in the few cases where the current
implementation makes sense, but for less than obvious reasons.  We
will submit a v2 patch set reflecting these changes in the next couple
of days.

A few responses are in-line below.

On Fri, Jan 20, 2017 at 6:32 PM, Richard Henderson  wrote:
>
> On 01/11/2017 06:30 PM, Palmer Dabbelt wrote:
>>
>> +/* The largest number of operations needed to load an integer constant.
>> +   The worst case is LUI, ADDI, SLLI, ADDI, SLLI, ADDI, SLLI, ADDI,
>> +   but we may attempt and reject even worse sequences.  */
>> +#define RISCV_MAX_INTEGER_OPS 32
>
>
> Why would you?  Surely after you exhaust 8 you'd just abandon that search as 
> unprofitable.
>
>> +  if (cost > 2 && (value & 1) == 0)
>> +{
>> +  int shift = 0;
>> +  while ((value & 1) == 0)
>> +   shift++, value >>= 1;
>
>
>   shift = ctz_hwi (value);
>
> You also may want to test for
>
>   value | (HOST_WIDE_INT_M1U << (HOST_BITS_PER_WIDE_INT - shift))
>
> i.e. shifting out leading 1's.  As well as checking with shift - 12.  The 
> latter is interesting for shifting a 20-bit value up into the high word.
>
> I once proposed a generic framework for this that cached results for 
> computation of sequences and costs.  Unfortunately it never gained traction. 
> Perhaps it's time to try again.
>
>> +  /* Try filling trailing bits with 1s.  */
>> +  while ((value << shift) >= 0)
>> +   shift++;
>
>
> clz_hwi.
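For readers following along, the recursion under discussion bottoms out in the LUI+ADDI split. A rough model (not the port's actual code) of materializing a 32-bit constant, including the sign adjustment needed because ADDI's 12-bit immediate is sign-extended:

```python
def lui_addi_split(value):
    """Split a 32-bit constant into a LUI hi20 part and an ADDI lo12 part."""
    lo = ((value & 0xfff) ^ 0x800) - 0x800   # sign-extended low 12 bits
    hi = (value - lo) & 0xffffffff           # what LUI must materialize
    assert hi & 0xfff == 0                   # LUI only sets bits 31:12
    return hi, lo

def materialize(value):
    hi, lo = lui_addi_split(value)
    # lui rd, hi>>12 ; addi rd, rd, lo   (LUI alone suffices when lo == 0)
    return (hi + lo) & 0xffffffff

for v in (0x12345678, 0x7ffff800, 0xdeadbeef, 0x800, 0xfff):
    assert materialize(v) == v
```

Constants that fail this two-instruction pattern are what drive the SLLI/ADDI recursion (and the trailing-zero and trailing-ones tricks) in riscv_build_integer_1.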
>
>> +  return GET_CODE (x) == SYMBOL_REF && SYMBOL_REF_TLS_MODEL (x) != 0;
>
>
> SYMBOL_REF_P.
>
>> +riscv_symbol_binds_local_p (const_rtx x)
>> +{
>> +  return (SYMBOL_REF_DECL (x)
>> + ? targetm.binds_local_p (SYMBOL_REF_DECL (x))
>> + : SYMBOL_REF_LOCAL_P (x));
>
>
> Missing SYMOL_REF_P?
>
> Surely encode_section_info will already have set SYMBOL_FLAG_LOCAL, and thus 
> you need not invoke targetm.binds_local_p again.
>
>> +case LABEL_REF:
>> +  if (LABEL_REF_NONLOCAL_P (x))
>> +   return SYMBOL_GOT_DISP;
>> +  break;
>
>
> Non-local labels are not local to the current function, but they are still 
> local to the translation unit (they'll be local to one of the outer functions 
> of a nested function).
>
>> +  switch (*symbol_type)
>> +{
>> +case SYMBOL_ABSOLUTE:
>> +case SYMBOL_PCREL:
>> +case SYMBOL_TLS_LE:
>> +  return (int32_t) INTVAL (offset) == INTVAL (offset);
>
>
> Why?
>
>
>> +case MINUS:
>> +  if (float_mode_p
>> + && !HONOR_NANS (mode)
>> + && !HONOR_SIGNED_ZEROS (mode))
>> +   {
>> + /* See if we can use NMADD or NMSUB.  See riscv.md for the
>> +associated patterns.  */
>> + rtx op0 = XEXP (x, 0);
>> + rtx op1 = XEXP (x, 1);
>> + if (GET_CODE (op0) == MULT && GET_CODE (XEXP (op0, 0)) == NEG)
>> +   {
>> + *total = (tune_info->fp_mul[mode == DFmode]
>> +   + set_src_cost (XEXP (XEXP (op0, 0), 0), mode, speed)
>> +   + set_src_cost (XEXP (op0, 1), mode, speed)
>> +   + set_src_cost (op1, mode, speed));
>> + return true;
>> +   }
>> + if (GET_CODE (op1) == MULT)
>> +   {
>> + *total = (tune_info->fp_mul[mode == DFmode]
>> +   + set_src_cost (op0, mode, speed)
>> +   + set_src_cost (XEXP (op1, 0), mode, speed)
>> +   + set_src_cost (XEXP (op1, 1), mode, speed));
>> + return true;
>> +   }
>> +   }
>
>
> Do we not fold these to FMA + NEG?  If not, that's surprising and maybe 
> should be fixed.  Also, you appear to be missing costs for FMA in 
> riscv_rtx_costs.
>
>> +   case UNORDERED:
>> + *code = EQ;
>> + /* Fall through.  */
>> +
>> +   case ORDERED:
>> + /* a == a && b == b */
>> + tmp0 = gen_reg_rtx (SImode);
>> + riscv_emit_binary (EQ, tmp0, cmp_op0, cmp_op0);
>> + tmp1 = gen_reg_rtx (SImode);
>> + riscv_emit_binary (EQ, tmp1, cmp_op1, cmp_op1);
>> + *op0 = gen_reg_rtx (SImode);
>> + riscv_emit_binary (AND, *op0, tmp0, tmp1);
>> + break;
>
>
> Better with FCLASS + AND?  At least for a branch?

The code path is slightly shorter using FEQ instead.  If FEQ were two
or more cycles slower than FCLASS, then FCLASS would be a win for
latency, but that is not the case for any known implementation.
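The expansion under discussion is the standard NaN self-test; as plain C, this is a sketch of the semantics rather than the backend expansion itself:

```c
#include <assert.h>
#include <math.h>

/* ORDERED(a, b) holds iff neither operand is NaN; the expansion above
   computes it as (a == a) & (b == b) using two FEQ instructions.  FEQ
   is a quiet comparison, so it raises the invalid-operation flag only
   for signaling NaN inputs.  */
static int fp_ordered (double a, double b)
{
  return (a == a) && (b == b);
}

/* UNORDERED is simply the complement.  */
static int fp_unordered (double a, double b)
{
  return !fp_ordered (a, b);
}
```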

>
>> +static int
>> +riscv_flatten_aggregate_field (const_tree type,
>> +  riscv_aggregate_field fields[2],
>> +  int n, HOST_WIDE_INT offset)
>
>
> I don't see any code within to bound N to 2, so as not to overflow FIELDS.  
> Am I missing something?
>
> Are you missing code for COMPLEX_TYPE?  In the default case I only see 
> SCALAR_FLOAT_TYPE_P.
>
>> +  memset (info, 0, sizeof (*info));
>> +  

Re: [PATCH 3/6] RISC-V Port: libgcc

2017-01-24 Thread Andrew Waterman
On Fri, Jan 20, 2017 at 10:53 PM, Richard Henderson  wrote:
> On 01/11/2017 06:30 PM, Palmer Dabbelt wrote:
>>
>> +__riscv_save_12:
>> +  addi sp, sp, -112
>> +  li t1, 0
>> +  sd s11, 8(sp)
>> +  j .Ls10
>> +
>> +__riscv_save_11:
>> +__riscv_save_10:
>> +  addi sp, sp, -112
>> +  li t1, -16
>
>
> No unwind info?

Next patch set will have unwind info.

>
>
> r~


Re: [PATCH 2/6] RISC-V Port: gcc

2017-01-17 Thread Andrew Waterman
On Tue, Jan 17, 2017 at 12:48 PM, Karsten Merker <mer...@debian.org> wrote:
> On Mon, Jan 16, 2017 at 09:37:15PM -0800, Palmer Dabbelt wrote:
>> On Sat, 14 Jan 2017 02:05:27 PST (-0800), mer...@debian.org wrote:
>> > Palmer Dabbelt wrote:
>> >
>> >> diff --git a/gcc/config/riscv/linux.h b/gcc/config/riscv/linux.h
>> >> new file mode 100644
>> >> index 000..045f6cc
>> >> --- /dev/null
>> >> +++ b/gcc/config/riscv/linux.h
>> >> [...]
>> >>  +#define GLIBC_DYNAMIC_LINKER "/lib" XLEN_SPEC "/" ABI_SPEC "/ld.so.1"
>> >
>> > [with XLEN_SPEC being either 32 or 64 and ABI_SPEC being one of
>> >  ilp32, ilp32f, ilp32d, lp64, lp64f, lp64d]
> [...]
>> > I am not fully happy with the way the dynamic linker path (which
>> > gets embedded into every Linux executable built by gcc and
>> > therefore cannot be changed later) is defined here.  The dynamic
>> > linker path must be unique over all platforms for which a Linux
>> > port exists to make multiarch installations (i.e. having
>> > dynamically linked binaries for multiple architectures/ABIs in
>> > the same root filesystem) possible.  The path specifier as cited
>> > above contains nothing that makes the linker path inherently
>> > specific to RISC-V.  While there is AFAIK no other architecture
>> > that currently uses exactly this specific linker path model with
>> > the ABI specifier as a separate subdirectory (instead of encoding
>> > it into the filename), so that there technically isn't a naming
>> > conflict, I think that we should follow the convention of the
>> > other "modern" Linux architectures, which all include the
>> > architecture name in their linker path:
>> >
>> >   * arm64:/lib/ld-linux-aarch64.so.1
>> >   * armhf:/lib/ld-linux-armhf.so.3
>> >   * ia64: /lib/ld-linux-ia64.so.2
>> >   * mips n64: /lib64/ld-linux-mipsn8.so.1
>> >   * nios2:/lib/ld-linux-nios2.so.1
>> >   * x86_64:   /lib64/ld-linux-x86-64.so.2
>> >
>> > So the actual ld.so binary should be called something like
>> > "ld-linux-rv.so.1" instead of just "ld.so.1". With everything
>> > else staying the same, that would give us a dynamic linker path
>> > along the lines of "/lib64/lp64f/ld-linux-rv.so.1" for an RV64G
>> > system.
> [...]
>> Just to be clear, the paths you'd like would look exactly like
>>
>>   rv32-ilp32: /lib32/ilp32/ld-linux-rv.so.1
>>   rv64-lp64d: /lib64/lp64d/ld-linux-rv.so.1
>>
>> ?
>
> Yes, that is what I had in mind.
>
>> If so, that should be a pretty straight-forward change.  I'll
>> incorporate it into our v2 patchset.  I'd also be OK with something
>> like "/lib64/lp64d/ld-linux-rv64imafd-lp64d.so.1", if for some reason
>> that's better (it looks a bit more like the other architectures to
>> me).  I'm really OK with pretty much anything here, so feel free to
>> offer suggestions -- otherwise I'll just go with what's listed above.
>
> Including the ABI specifier twice, i.e. both as a subdirectory
> name (.../lp64d/...) and as part of the ld.so filename
> (ld-linux-rv64imafd-lp64d.so.1) doesn't seem useful to me. The
> ABI specifier must be a part of the dynamic linker path, but
> having it in there once should be enough :-).
>
> Whether one encodes the ABI specifier inside the ld.so filename
> or in the form of a subdirectory AFAICS doesn't make any
> technical difference and appears to me largely as a matter of
> taste.  My proposal above was just the minimal possible change
> against the existing code that would fulfill my aim.
>
> The other Linux platforms commonly don't use subdirectories and
> instead encode the ABI specifier as part of the ld.so filename
> (e.g. the "hf" part in /lib/ld-linux-armhf.so.3 specifies
> hardfloat EABI, and the "n8" part in
> /lib64/ld-linux-mipsn8.so.1 specifies a specific MIPS ABI variant),
> while RISC-V currently encodes the ABI variant as a subdirectory name.
>
> Stefan O'Rear wrote on the RISC-V sw-dev list that he would prefer to
> encode the ABI specifier as part of the ld.so filename and put
> everything in /lib instead of differentiating the directory by XLEN,
> which would keep things largely similar to the other Linux platforms.
> Based on your two examples above that would result in something like:
>
> rv32-ilp32: /lib/ld-linux-rv32ilp32.so.1
> rv64-lp64d: /lib/ld-linux-rv64lp64d.so.1
>
> I am happy with any of these v

Re: [PATCH 1/6] RISC-V Port: gcc/config/riscv/riscv.c

2017-01-16 Thread Andrew Waterman
On Thu, Jan 12, 2017 at 1:38 PM, Joseph Myers  wrote:
> On Wed, 11 Jan 2017, Palmer Dabbelt wrote:
>
>> +#include 
>
> This is included in system.h, so don't include it here.

OK.

>
>> +  error ("unknown cpu `%s' for -mtune", cpu_string);
>
> This is using very-old-style `' quotes.  Diagnostics should use e.g. %qs
> for quoting the output of a single % directive, or %< and %> for quoting
> anything more complicated, so that Unicode quotes can be used when the
> locale permits.
>
> Likewise elsewhere in this patch and in patch 2.

Will do, throughout.

>
>> +#undef TARGET_LRA_P
>> +#define TARGET_LRA_P hook_bool_void_true
>
> Using LRA is the default; you shouldn't need this definition.

Ah, glad to see LRA is now the default.

>
> I don't see a definition of TARGET_ATOMIC_ASSIGN_EXPAND_FENV.  Since you
> have floating-point exceptions, I'd expect lack of this to result in
> c11-atomic-exec-5.c failing.

Indeed, and after implementing the hook, those failures disappeared.  Thanks.

>
> --
> Joseph S. Myers
> jos...@codesourcery.com


Re: [PATCH 2/6] RISC-V Port: gcc

2017-01-13 Thread Andrew Waterman
On Thu, Jan 12, 2017 at 2:30 PM, Joseph Myers  wrote:
> On Wed, 11 Jan 2017, Palmer Dabbelt wrote:
>
>> +static void
>> +riscv_parse_arch_string (const char *isa, int *flags)
>
> This should be passed the location from riscv_handle_option...
>
>> +  error ("-march=%s: ISA string must begin with rv32 or rv64", isa);
>
>  ... so you can use error_at with an explicit location (for all errors in
> this function).
>
>> +static bool
>> +riscv_handle_option (struct gcc_options *opts,
>> +  struct gcc_options *opts_set ATTRIBUTE_UNUSED,
>> +  const struct cl_decoded_option *decoded,
>> +  location_t loc ATTRIBUTE_UNUSED)
>
> The location will no longer then be ATTRIBUTE_UNUSED.

Will do.

>
>> + supported_defaults="abi arch tune"
>
> So you have three supported defaults here ...
>
>> +/* Support for a compile-time default CPU, et cetera.  The rules are:
>> +   --with-arch is ignored if -march is specified.
>> +   --with-abi is ignored if -mabi is specified.
>> +   --with-tune is ignored if -mtune is specified.  */
>> +#define OPTION_DEFAULT_SPECS \
>> +  {"march", "%{!march=*:-march=%(VALUE)}" }, \
>> +  {"mabi", "%{!mabi=*:-mabi=%(VALUE)}" }, \
>> +  {"tune", "%{!mtune=*:-mtune=%(VALUE)}" }, \
>> +  {"arch", "%{!march=*:-march=%(VALUE)}" }, \
>> +  {"abi", "%{!mabi=*:-mabi=%(VALUE)}" }, \
>
>  ... why do you need the "march" and "mabi" entries here, when I'd expect
> three entries as well?

Indeed, we do not.

>
>> +#undef ASM_SPEC
>> +#define ASM_SPEC "\
>> +%(subtarget_asm_debugging_spec) \
>> +%{fPIC|fpic|fPIE|fpie:-fpic} \
>
> I'd expect you to use FPIE_OR_FPIC_SPEC here to enable
> --enable-default-pie.

Thanks for pointing this out.  We missed that this macro was
introduced after we started the port.

>
> --
> Joseph S. Myers
> jos...@codesourcery.com


Re: [PATCH 3/6] RISC-V Port: libgcc

2017-01-12 Thread Andrew Waterman
Thanks again for your feedback.

On Thu, Jan 12, 2017 at 3:30 PM, Joseph Myers  wrote:
> On Wed, 11 Jan 2017, Palmer Dabbelt wrote:
>
>> +riscv*-*-linux*)
>> + tmake_file="${tmake_file} t-softfp-sfdf riscv/t-softfp${host_address} t-softfp riscv/t-elf riscv/t-elf${host_address}"
>> + extra_parts="$extra_parts crtbegin.o crtend.o crti.o crtn.o crtendS.o crtbeginT.o"
>> + md_unwind_header=riscv/linux-unwind.h
>> + ;;
>> +riscv*-*-*)
>> + tmake_file="${tmake_file} t-softfp-sfdf riscv/t-softfp${host_address} t-softfp riscv/t-elf riscv/t-elf${host_address}"
>> + extra_parts="$extra_parts crtbegin.o crtend.o crti.o crtn.o"
>> + ;;
>
> This looks like you're building soft-fp functions into libgcc for all
> types whether or not you have hardware floating point support for them.
>
> If your ABIs are such that hardware and software floating point are ABI
> compatible at the function call level, then both copies of libgcc (the
> shared library, at least) should indeed have the same ABI (so a soft-float
> program can run with a hard-float copy of libgcc) - but for the hardware
> types, it's better to use t-hardfp.  If they are not ABI compatible, it's
> best for libgcc to contain only the functions that are actually needed.
> That is, in general, it only needs to contain functions that are not
> implemented in hardware, or are implemented in hardware but might not be
> implemented in hardware for some configurations using the same ABI (and in
> the latter case, the t-hardfp implementations are preferred).

Yes, some soft-float routines are needlessly built for some ABIs.
We'll rectify this.

>
>> +#define _FP_NANFRAC_S((_FP_QNANBIT_S << 1) - 1)
>> +#define _FP_NANFRAC_D((_FP_QNANBIT_D << 1) - 1), -1
>> +#define _FP_NANFRAC_Q((_FP_QNANBIT_Q << 1) - 1), -1, -1, -1
>
> This is different from the default NaN the specification says is used by
> hardware (all mantissa bits clear except for the MSB used to indicate a
> quiet NaN).  I'd expect the soft-fp configuration to make the same choices
> here as hardware.
>
>> +#define _FP_NANFRAC_S((_FP_QNANBIT_S << 1) - 1)
>> +#define _FP_NANFRAC_D((_FP_QNANBIT_D << 1) - 1)
>> +#define _FP_NANFRAC_Q((_FP_QNANBIT_Q << 1) - 1), -1
>
> Likewise.
>
>> +#define _FP_KEEPNANFRACP 1
>
> And since the hardware semantics don't propagate payloads I'd expect this
> to be zero, and ...
>
>> +/* From my experiments it seems X is chosen unless one of the
>> +   NaNs is sNaN,  in which case the result is NANSIGN/NANFRAC.  */
>> +#define _FP_CHOOSENAN(fs, wc, R, X, Y, OP)   \
>
>  ... this to use a canonical NaN unconditionally, so that again you do the
> same as hardware (the comment here is actively misleading in this case as
> it describes something contrary to the hardware specification as being
> what experiments show hardware does).

Thanks for pointing this out.  We will make the soft-float canonical
NaN value and NaN propagation behavior match the ISA.

>
>> +#define FP_ROUNDMODE (_fcw >> 5)
>
> I'm unclear from the specification whether the high 24 bits of fcsr are
> architecturally defined always to read as zero, or whether that's only the
> case in present architecture versions and they are reserved for possible
> future feature additions.  If the latter, it would seem desirable to mask
> the result of shifting so existing binaries using the soft-fp code
> continue to work on future hardware that might set some of the high bits.

I will see to it that the ISA spec is clarified on this point.  In the
mean time, I will obviate the issue by accessing the rounding mode and
exceptions through the frm and fflags shadow CSRs, rather than through
the fcsr.  (This should also be more performant.)
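A masked version along the lines Joseph suggests might look like the following; the macro names are illustrative, not soft-fp's actual spelling, and the committed fix instead reads the frm shadow CSR directly:

```c
#include <assert.h>
#include <stdint.h>

/* RISC-V fcsr layout: fflags occupy bits [4:0], frm bits [7:5], and the
   bits above 7 are currently reserved.  Masking after the shift keeps
   the extraction correct even if future hardware defines those high
   bits.  */
#define RISCV_FRM_SHIFT 5
#define RISCV_FRM_MASK  0x7u

static unsigned riscv_round_mode (uint32_t fcsr)
{
  return (fcsr >> RISCV_FRM_SHIFT) & RISCV_FRM_MASK;
}
```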

>
>> +#define  __LITTLE_ENDIAN 1234
>> +#define  __BIG_ENDIAN4321
>> +
>> +#if defined __big_endian__
>> +# define __BYTE_ORDER __BIG_ENDIAN
>> +#else
>> +# define __BYTE_ORDER __LITTLE_ENDIAN
>> +#endif
>
> As far as I can tell the port is always little-endian and there is no
> __big_endian__ macro, so that #if should not be there.

Indeed.

>
> --
> Joseph S. Myers
> jos...@codesourcery.com


Re: [PATCH 6/6] RISC-V Port: gcc/testsuite

2017-01-12 Thread Andrew Waterman
On Thu, Jan 12, 2017 at 3:42 PM, Joseph Myers  wrote:
> On Wed, 11 Jan 2017, Palmer Dabbelt wrote:
>
> As I understand it, you have -m options that can change the ABI between
> 32-bit and 64-bit.  In such a case you mustn't check for riscv*64*-*-*
> target triplets in the testsuite, since that behave incorrectly when
> testing with an option that changes to the other one of 32-bit and 64-bit
> from the configured default.  Instead, you need to allow for all
> riscv*-*-* triplets and combine as needed with a test such as lp64 or
> ilp32.

Yes, that makes sense.  This will be rectified in v2 of the patch set.  Thanks.

>
> --
> Joseph S. Myers
> jos...@codesourcery.com


Re: New Port for RISC-V

2017-01-12 Thread Andrew Waterman
Thank you for taking the time to give us feedback.

On Thu, Jan 12, 2017 at 9:30 AM, Joseph Myers  wrote:
> General observation: I see no documentation (no changes to .texi files)
> anywhere in this patch series.  I also don't see updates to config-list.mk
> to add RISC-V targets to the set that builds.  (You should make sure the
> port builds cleanly when built with current mainline GCC and configured
> with --enable-werror-always; config-list.mk uses --enable-werror-always
> and it's a rough way of making sure in a cross build that there aren't
> warnings that would make a native bootstrap fail.  It should build cleanly
> for both 32-bit and 64-bit hosts.)
>
> See sourcebuild.texi, "Back End", for a description of various places that
> may need updating for a new port, including lots of things that need
> documenting if your port has them (not all ports will have all those
> target-specific features).  That includes website updates.

Thanks for the pointer.  We'll write documentation, adjust the build
list, and look at sourcebuild.texi for further guidance before
submitting an updated version of the patch set.

>
> Regarding binutils support: to be clear, does 2.28 branch have all the
> features / fixes known to be required at present and will subsequent fixes
> go on there as needed?

Yes, and we will continue to apply fixes to both the trunk and the 2.28 branch.

>
> Regarding Linux kernel support: how confident are you that the signal
> frame ABI, to the extent that this patch embeds it in linux-unwind.h, will
> remain unchanged?  To what extent has that port been reviewed by Linux
> kernel people familiar with the right way to add new Linux kernel ports?

I will consult with the kernel folks on this matter.

>
> Looking at the architecture specification to resolve some questions about
> the port, I see the statement (section 7.4 Subnormal Arithmetic) "In the
> parlance of the IEEE standard, tininess is detected after rounding---that
> is, the underflow exception is raised only if the rounded result is
> subnormal, even if the unrounded result would have been subnormal.".
> That "that is" is *not* an accurate description of what after-rounding
> tininess detection means.  After-rounding tininess detection means the
> result is tiny if the result *rounded to normal precision but with
> infinite exponent range* has an exponent outside the normal range.  This
> means that some results that round to the least normal value (given the
> actual finite exponent range) count as tiny (the exact range of results
> with that property depends on the rounding mode - in round-to-nearest, for
> example, if s is the greatest subnormal and s+u is the least normal, it's
> the half-open interval [s + u/2, s + 3u/4)).  If your processor behaves as
> described in "that is", that does not conform to IEEE 754-2008 and you
> won't get clean glibc test results, for example.

That is an error in the specification; the intent is that RISC-V
implementations conform to IEEE 754-2008.  The next revision of the
spec will simply remove the "that is" clause.  Thank you for pointing
this out.

>
> I also see the architecture has roundTiesToAway support for your hardware
> binary floating point.  Do you intend to support that rounding mode from C
> code?  (I expect a fair number of glibc changes would be needed to do so,
> although the soft-fp part shouldn't be that big.)

Our expectation was that roundTiesToAway would be primarily used in
hand-written library routines, and that application code wouldn't be
interested in using it because most architectures do not provide it
for binary FP.  Perhaps that is not forward-thinking.  We will consult
with the glibc maintainers about the possibility of a RISC-V-specific
extension to the fenv API.

>
> --
> Joseph S. Myers
> jos...@codesourcery.com


[PATCHv2][libatomic] Avoid misaligned atomic operations

2015-01-20 Thread Andrew Waterman
My original patch only fixed libat_fetch_op; this one applies the same
fix to libat_op_fetch, as well.

When using word-wide CAS to emulate atomic fetch-and-op, addresses should
be word-aligned to avoid exceptions on some targets.
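The fix's address arithmetic can be sketched in isolation. This assumes an 8-byte atomic word, a 2-byte access, and little-endian lane numbering (so the INVERT_MASK XOR in the patch is a no-op); the function names are illustrative, not libatomic's:

```c
#include <assert.h>
#include <stdint.h>

#define WORDSIZE 8  /* bytes per atomic word on a 64-bit target */

/* Word-aligned address of the word containing ADDR.  */
static uintptr_t aligned_word (uintptr_t addr)
{
  return addr & ~(uintptr_t) (WORDSIZE - 1);
}

/* Bit offset of the sub-word lane within the containing word.  */
static unsigned lane_shift (uintptr_t addr)
{
  return (unsigned) (addr % WORDSIZE) * 8;
}

/* Mask selecting a 2-byte lane within the containing word.  */
static uint64_t lane_mask (uintptr_t addr)
{
  return (uint64_t) 0xffff << lane_shift (addr);
}
```

The CAS loop then loads the aligned word, replaces only the masked lane, and retries on failure, so the hardware never sees a misaligned access.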

The problem manifested in a new port I'm working on as a failure in test
gcc.dg/atomic/stdatomic-op-1.c, and I've confirmed that this patch
fixes it.  x86_64-unknown-linux still bootstraps, but that is admittedly
of little significance, since it doesn't use these routines.

2015-01-09  Andrew Waterman water...@cs.berkeley.edu

* fop_n.c (libat_fetch_op): Align address to word boundary.
(libat_op_fetch): Likewise.
---
 libatomic/fop_n.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/libatomic/fop_n.c b/libatomic/fop_n.c
index 307184d..854d648 100644
--- a/libatomic/fop_n.c
+++ b/libatomic/fop_n.c
@@ -112,9 +112,9 @@ SIZE(C2(libat_fetch_,NAME)) (UTYPE *mptr, UTYPE opval, int smodel)
 
   pre_barrier (smodel);
 
-  wptr = (UWORD *)mptr;
-  shift = 0;
-  mask = -1;
+  wptr = (UWORD *)((uintptr_t)mptr & -WORDSIZE);
+  shift = (((uintptr_t)mptr % WORDSIZE) * CHAR_BIT) ^ SIZE(INVERT_MASK);
+  mask = SIZE(MASK) << shift;

   wopval = (UWORD)opval << shift;
   woldval = __atomic_load_n (wptr, __ATOMIC_RELAXED);
@@ -136,9 +136,9 @@ SIZE(C3(libat_,NAME,_fetch)) (UTYPE *mptr, UTYPE opval, int smodel)
 
   pre_barrier (smodel);
 
-  wptr = (UWORD *)mptr;
-  shift = 0;
-  mask = -1;
+  wptr = (UWORD *)((uintptr_t)mptr & -WORDSIZE);
+  shift = (((uintptr_t)mptr % WORDSIZE) * CHAR_BIT) ^ SIZE(INVERT_MASK);
+  mask = SIZE(MASK) << shift;

   wopval = (UWORD)opval << shift;
   woldval = __atomic_load_n (wptr, __ATOMIC_RELAXED);
-- 
2.2.1



[PATCH][libatomic] Avoid misaligned atomic operations

2015-01-10 Thread Andrew Waterman
When using word-wide CAS to emulate atomic fetch-and-op, addresses should
be word-aligned to avoid exceptions on some targets.

The problem manifested in a new port I'm working on as a failure in test
gcc.dg/atomic/stdatomic-op-1.c, and I've confirmed that this patch
fixes it.  x86_64-unknown-linux still bootstraps, but that is admittedly
of little significance, since that target doesn't use these routines.

FYI, I don't have commit access.

2015-01-09  Andrew Waterman water...@cs.berkeley.edu

* fop_n.c (libat_fetch_op): Align address to word boundary.
---
 libatomic/fop_n.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/libatomic/fop_n.c b/libatomic/fop_n.c
index 307184d..c2556eb 100644
--- a/libatomic/fop_n.c
+++ b/libatomic/fop_n.c
@@ -112,9 +112,9 @@ SIZE(C2(libat_fetch_,NAME)) (UTYPE *mptr, UTYPE opval, int smodel)
 
   pre_barrier (smodel);
 
-  wptr = (UWORD *)mptr;
-  shift = 0;
-  mask = -1;
+  wptr = (UWORD *)((uintptr_t)mptr & -WORDSIZE);
+  shift = (((uintptr_t)mptr % WORDSIZE) * CHAR_BIT) ^ SIZE(INVERT_MASK);
+  mask = SIZE(MASK) << shift;

   wopval = (UWORD)opval << shift;
   woldval = __atomic_load_n (wptr, __ATOMIC_RELAXED);
-- 
2.2.1



Re: [ping] libatomic: Fix sub-word CAS synthesis on LP64 targets

2014-11-08 Thread Andrew Waterman
Thank you for approving the patch.  I neglected to mention that I do
not have write access.  Would you or someone else be so kind as to
commit this?

On Fri, Nov 7, 2014 at 5:20 AM, Richard Henderson r...@redhat.com wrote:
 On 11/06/2014 09:24 PM, Andrew Waterman wrote:
 2014-10-23  Andrew Waterman water...@cs.berkeley.edu

   * cas_n.c (libat_compare_exchange): Add missing cast.

 Ok.


 r~


[ping] libatomic: Fix sub-word CAS synthesis on LP64 targets

2014-11-06 Thread Andrew Waterman
Greetings.  This is just a ping about a one-liner.  The original email was
here and is inlined below.

https://gcc.gnu.org/ml/gcc-patches/2014-10/msg02465.html

A missing cast causes the compare value to be truncated, resulting in
spurious CAS failures.

The problem manifested in a new port I'm working on as a failure in test
gcc.dg/atomic/c11-atomic-exec-2.c, and I've confirmed that this patch
fixes it.  I also verified that x86_64-unknown-linux still bootstraps
(which is admittedly vacuous, since x86-64 doesn't exercise this code).

2014-10-23  Andrew Waterman water...@cs.berkeley.edu

  * cas_n.c (libat_compare_exchange): Add missing cast.
---
 libatomic/cas_n.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libatomic/cas_n.c b/libatomic/cas_n.c
index 801262d..a885afa 100644
--- a/libatomic/cas_n.c
+++ b/libatomic/cas_n.c
@@ -70,7 +70,7 @@ SIZE(libat_compare_exchange) (UTYPE *mptr, UTYPE *eptr, UTYPE newval,
   mask = -1;
 }
 
-  weval = *eptr << shift;
+  weval = (UWORD)*eptr << shift;
   wnewval = (UWORD)newval << shift;
   woldval = __atomic_load_n (wptr, __ATOMIC_RELAXED);
   do
-- 
2.1.1



[PATCH] Fix sub-word CAS synthesis on LP64 targets

2014-10-23 Thread Andrew Waterman
A missing cast causes the compare value to be truncated, resulting in
spurious CAS failures.

The problem manifested in a new port I'm working on as a failure in test
gcc.dg/atomic/c11-atomic-exec-2.c, and I've confirmed that this patch
fixes it.  I also verified that x86_64-unknown-linux still bootstraps
(which is admittedly vacuous, since x86-64 doesn't exercise this code).
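The effect of the missing cast is easy to reproduce standalone (illustrative types, not libatomic's macros): with a one-byte compare value in a 64-bit container word, `*eptr << shift` promotes only to int, so lanes at byte offsets 4-7 are lost.

```c
#include <assert.h>
#include <stdint.h>

/* Correct form: widen to the container word type before shifting, as
   the patch does with (UWORD)*eptr << shift.  Without the cast, an
   8-bit value promotes only to int, and a shift of 32 or more is
   undefined behavior that in practice drops the compare value,
   causing the spurious CAS failures described above.  */
static uint64_t widen_then_shift (uint8_t eval, unsigned shift)
{
  return (uint64_t) eval << shift;
}
```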

2014-10-23  Andrew Waterman water...@cs.berkeley.edu

  * cas_n.c (libat_compare_exchange): Add missing cast.
---
 libatomic/cas_n.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libatomic/cas_n.c b/libatomic/cas_n.c
index 801262d..a885afa 100644
--- a/libatomic/cas_n.c
+++ b/libatomic/cas_n.c
@@ -70,7 +70,7 @@ SIZE(libat_compare_exchange) (UTYPE *mptr, UTYPE *eptr, UTYPE newval,
   mask = -1;
 }
 
-  weval = *eptr << shift;
+  weval = (UWORD)*eptr << shift;
   wnewval = (UWORD)newval << shift;
   woldval = __atomic_load_n (wptr, __ATOMIC_RELAXED);
   do
-- 
2.1.1