[SH, committed]: Fix outage caused by secondary combine pass (was: Re: [RFC/PATCH] libgcc: sh: Use soft-fp for non-hosted SH3/SH4)

2024-07-20 Thread Oleg Endo
Hi,

I've committed the attached patch to fix the full gcc + libstdc++ build on
sh-elf.

Best regards,
Oleg Endo



On Sat, 2024-07-06 at 07:35 -0600, Jeff Law wrote:
> 
> On 7/5/24 1:28 AM, Sébastien Michelland wrote:
> > Hi Oleg!
> > 
> > > I don't understand why this is being limited to SH3 and SH4 only?
> > > Almost all SH4 systems out there have an FPU (unless special 
> > > configurations
> > > are used).  So I'd say if switching to soft-fp, then for SH-anything, not
> > > just SH3/SH4.
> > > 
> > > If it yields some improvements for some users, I'm all for it.
> > 
> > Yeah I just defaulted to SH3/SH4 conservatively because that's the only 
> > hardware I have. (My main platform also happens to be one of these SH4 
> > without an FPU, the SH4AL-DSP.)
> > 
> > Once this is tested/validated on simulator, I'll happily simplify the 
> > patch to apply to all SH.
> > 
> > > I think it would make sense to test it using sh-sim on SH2 big-endian and
> > > little endian at least, as that doesn't have an FPU and hence would run
> > > tests utilizing soft-fp.
> > > 
> > > After building the toolchain for --target=sh-elf, you can use this to run
> > > the testsuite in the simulator:
> > > 
> > > make -k check RUNTESTFLAGS="--target_board=sh-sim\{-m2/-ml,-m2/-mb}"
> > > 
> > > (add make -j parameter according to your needs -- it will be slow)
> > 
> > Alright, it might take a little bit.
> > 
> > Building the combined tree of gcc/binutils/newlib masters (again 
> > following [1]) gives me an ICE in libstdc++-v3/src/libbacktrace, 
> > irrespective of my libgcc change:
> This is almost certainly a poorly written pattern.  I just fixed a bunch 
> of these, but not this one.  Essentially a recent change in the generic 
> parts of the compiler is exposing some bugs in the SH backend. 
> Specifically:
> 
> > ;; Store (negated) T bit as all zeros or ones in a reg.  
> > ;;  subc Rn,Rn   ! Rn = Rn - Rn - T; T = T
> > ;;  not Rn,Rn   ! Rn = 0 - Rn
> > ;; 
> > ;; Note the call to sh_split_treg_set_expr may clobber
> > ;; the T reg.  We must express this, even though it's
> > ;; not immediately obvious this pattern changes the
> > ;; T register.
> > (define_insn_and_split "mov_neg_si_t"
> >   [(set (match_operand:SI 0 "arith_reg_dest" "=r")
> >         (neg:SI (match_operand 1 "treg_set_expr")))
> >    (clobber (reg:SI T_REG))]
> >   "TARGET_SH1" 
> > {
> >   gcc_assert (t_reg_operand (operands[1], VOIDmode));
> >   return "subc  %0,%0";
> > }
> >   "&& can_create_pseudo_p () && !t_reg_operand (operands[1], VOIDmode)"
> >   [(const_int 0)]
> > {
> >   sh_treg_insns ti = sh_split_treg_set_expr (operands[1], curr_insn);
> >   emit_insn (gen_mov_neg_si_t (operands[0], get_t_reg_rtx ()));
> > 
> >   if (ti.remove_trailing_nott ())
> >     emit_insn (gen_one_cmplsi2 (operands[0], operands[0]));
> > 
> >   DONE; 
> > }
> >   [(set_attr "type" "arith")])
> 
> 
> As written this pattern could match after register allocation is 
> complete and thus we can't create new pseudos (the condition TARGET_SH1 
> controls that behavior).  operands[1] won't necessarily be the T 
> register in that case.
> 
> The split condition fails because we can't create new pseudos, so it's 
> left as-is.  At final assembly time the assertion triggers.
> 
> The "&& can_create_pseudo_p ()" part of the split condition should be 
> moved into the main condition.  I think that's all that's necessary to 
> fix this problem.  It'd probably be best if Oleg went through the 
> various define_insn_and_split patterns that utilize can_create_pseudo_p in 
> their split condition and evaluated them.
> 
> I only fixed the most obvious cases in my change from this morning.  I 
> don't typically work on the SH port and for changes which aren't 
> obviously correct, Oleg is in a better position to evaluate the proper fix.
> 
> jeff
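
As a side note on the instruction sequence named in the pattern comment quoted above, here is a minimal C++ sketch of what `subc Rn,Rn` followed by `not Rn,Rn` computes. It assumes the usual SH semantics (`subc Rm,Rn` computes Rn - Rm - T with the borrow written to T, and `not` is a bitwise complement). The `Cpu` struct and all function names are hypothetical and exist only for this illustration; none of this is GCC code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of an SH core reduced to one register and the T bit.
struct Cpu {
  std::uint32_t rn;
  bool t;
};

// subc Rn,Rn: Rn = Rn - Rn - T.  A borrow occurs exactly when T was set,
// so T keeps its previous value.
void subc_rn_rn(Cpu& c) {
  const std::uint32_t t = c.t ? 1u : 0u;
  c.rn = c.rn - c.rn - t;  // 0 if T was clear, 0xFFFFFFFF if T was set
  c.t = (t != 0u);
}

// not Rn,Rn: bitwise complement of the register.
void not_rn_rn(Cpu& c) { c.rn = ~c.rn; }

// T bit stored as all zeros or all ones.
std::uint32_t store_t(bool t) {
  Cpu c{0x1234u, t};
  subc_rn_rn(c);
  return c.rn;
}

// Negated T bit stored as all zeros or all ones.
std::uint32_t store_negated_t(bool t) {
  Cpu c{0x1234u, t};
  subc_rn_rn(c);
  not_rn_rn(c);
  return c.rn;
}

// Small self-check of the four possible outcomes.
void check_t_bit_model() {
  assert(store_t(true) == 0xFFFFFFFFu);
  assert(store_t(false) == 0u);
  assert(store_negated_t(true) == 0u);
  assert(store_negated_t(false) == 0xFFFFFFFFu);
}
```

The initial register value does not matter, because Rn - Rn cancels it out; only the T bit survives into the result.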
From 9e740e7d71d02369774e1380902bddd9681c463f Mon Sep 17 00:00:00 2001
From: Oleg Endo 
Date: Sun, 21 Jul 2024 14:11:21 +0900
Subject: [PATCH] SH: Fix outage caused by recently added 2nd combine pass after reg alloc

I've also confirmed on the CSiBE set that the secondary combine pass is
actually beneficial on SH.  It does result in some code size reductions.

gcc/ChangeLog:
	* config/sh/sh.md (mov_neg_si_t): Allow insn and split after
	register allocation.
	(*treg_noop_move): New insn.
---
 

[gcc r15-2183] SH: Fix outage caused by recently added 2nd combine pass after reg alloc

2024-07-20 Thread Oleg Endo via Gcc-cvs
https://gcc.gnu.org/g:58b78cf068b3b24c11d7812a5f4de865e9cdb8b4

commit r15-2183-g58b78cf068b3b24c11d7812a5f4de865e9cdb8b4
Author: Oleg Endo 
Date:   Sun Jul 21 14:11:21 2024 +0900

SH: Fix outage caused by recently added 2nd combine pass after reg alloc

I've also confirmed on the CSiBE set that the secondary combine pass is
actually beneficial on SH.  It does result in some code size reductions.

gcc/ChangeLog:
* config/sh/sh.md (mov_neg_si_t): Allow insn and split after
register allocation.
(*treg_noop_move): New insn.

Diff:
---
 gcc/config/sh/sh.md | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/gcc/config/sh/sh.md b/gcc/config/sh/sh.md
index 3e978254ab0c..7eee12ca6b8a 100644
--- a/gcc/config/sh/sh.md
+++ b/gcc/config/sh/sh.md
@@ -8408,7 +8408,7 @@
   gcc_assert (t_reg_operand (operands[1], VOIDmode));
   return "subc %0,%0";
 }
-  "&& can_create_pseudo_p () && !t_reg_operand (operands[1], VOIDmode)"
+  "&& !t_reg_operand (operands[1], VOIDmode)"
   [(const_int 0)]
 {
   sh_treg_insns ti = sh_split_treg_set_expr (operands[1], curr_insn);
@@ -8421,6 +8421,14 @@
 }
   [(set_attr "type" "arith")])
 
+;; no-op T bit move which can result from other optimizations.
+(define_insn_and_split "*treg_noop_move"
+  [(set (reg:SI T_REG) (reg:SI T_REG))]
+  "TARGET_SH1"
+  "#"
+  "&& 1"
+  [(const_int 0)])
+
 ;; Invert the T bit.
 ;; On SH2A we can use the nott insn.  On anything else this must be done with
 ;; multiple insns like:
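
To illustrate why the split condition in the diff above matters, here is a toy C++ model of the failure described earlier in the thread: the insn condition still matches after register allocation, the old split condition does not, and the output template's assertion then fires. This is a sketch under simplifying assumptions, not GCC code. `PassState` and every function name are hypothetical, and the two booleans merely stand in for `can_create_pseudo_p ()` and `t_reg_operand (operands[1], VOIDmode)`.

```cpp
#include <cassert>

// Toy model: the compiler state relevant to mov_neg_si_t, reduced to two flags.
struct PassState {
  bool can_create_pseudo;  // stands in for can_create_pseudo_p (); false after RA
  bool op1_is_t_reg;       // stands in for t_reg_operand (operands[1], VOIDmode)
};

// Insn condition "TARGET_SH1": assumed true for every target modeled here.
bool insn_cond(const PassState&) { return true; }

// Old split condition: "&& can_create_pseudo_p () && !t_reg_operand (...)".
bool old_split_cond(const PassState& s) {
  return insn_cond(s) && s.can_create_pseudo && !s.op1_is_t_reg;
}

// New split condition from the patch: "&& !t_reg_operand (...)".
bool new_split_cond(const PassState& s) {
  return insn_cond(s) && !s.op1_is_t_reg;
}

// True if the insn reaches final output unsplit with a non-T operand,
// which is the situation where the gcc_assert in the template triggers.
bool hits_assert(const PassState& s, bool split_applies) {
  return insn_cond(s) && !split_applies && !s.op1_is_t_reg;
}

// Small self-check: after RA, with an operand that is not the T register.
void check_split_model() {
  const PassState after_ra{false, false};
  assert(hits_assert(after_ra, old_split_cond(after_ra)));
  assert(!hits_assert(after_ra, new_split_cond(after_ra)));
}
```

The model says nothing about whether the split body itself is safe to run after register allocation; the commit message only states that the insn and split are now allowed there.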


Re: [RFC/PATCH] libgcc: sh: Use soft-fp for non-hosted SH3/SH4

2024-07-08 Thread Oleg Endo


Hi,

> > > > The default sh-elf configuration has no multi-libs for SH3 and SH4 
> > > > variants
> > > > without FPU (from what I can see).  So it won't use soft-fp so much 
> > > > during
> > > > sim testing.  So please change to soft-fp for sh*, not just SH3/SH4.
> > 
> > Got it, done that locally, and will update patch once tested.
> > 
> > > > Here's an old proposed change to the simtest instructions to not use
> > > > combined trees:
> > > > 
> > > > https://gcc.gnu.org/pipermail/gcc-patches/attachments/20140815/fb38918e/attachment.bin
> > 
> > Thanks for the instructions. Apologies for the back-and-forth as I'm 
> > pretty new with this infrastructure (I usually do research stuff on LLVM).

No need to apologize.  I know this is a tedious and annoying thing to go
through and there is only very little useful information out there.

> > The split-tree build goes better, still fails with GCC 15 (as expected, 
> > though somehow my custom toolchain did build originally) and sort of 
> > works with GCC 14.

> > The binutils/gdb repos have been merged since that attachment, and 
> > while I can build binutils only with --disable-gdb, building gdb (in 
> > another build folder, reconfiguring from scratch) seems iffy. The global 
> > CFLAGS/CXXFLAGS to switch to 32-bit affects at least parts of binutils, 
> > resulting in a broken toolchain due to architecture mixup:

It shouldn't be needed to build GDB separately or to specify the -m32 flags.
Not sure why you have to do that.

I've just tried the following configure lines:

binutils-gdb (binutils-2_41-release)
<..>/configure --target=sh-elf --prefix=/usr/local --disable-nls 
--disable-werror --enable-initfini-array

gcc (any version)
<..>/configure --target=sh-elf --prefix=/usr/local --enable-languages=c,c++,lto 
--disable-nls --disable-werror --with-newlib --enable-lto --enable-multilib 
--with-system-zlib --disable-libstdcxx-verbose --disable-symvers

newlib (latest)
CFLAGS_FOR_TARGET="-Wno-error=implicit-function-declaration -Wno-implicit-int 
-ffunction-sections -fdata-sections -flto" <..>/newlib/configure --host=sh-elf 
--target=sh-elf --prefix=/usr/local --enable-multilib 
--enable-newlib-io-c99-formats

Note that the latest newlib version will try to create multilib directories
one directory above its current build directory for some reason.  So just
create another sub-directory in the build directory and do the config and
build from there.

Other than that, the build steps are the same as before.


I could reproduce the issue with the latest GCC when building libstdc++. 
I'm working on a fix for it.


Unfortunately I'm also getting the SIGBUS error when running a C++ program
that uses std::cout / std::cerr.

To be honest, I don't remember what the issue was/is, whether this has ever
worked at all or not.  I've tried rewinding everything back ~10 years ago
but was still getting the same error.  Using printf from the simulator seems
to work fine though.  So I guess a bunch of C++ tests of the GCC testsuite
will fail on the simulator, but that could be tolerable -- it never passed
all the tests on the simulator anyway.  It's still a good way to test for
regressions that could be introduced by a patch.

> > How active are the main types? Like are there still new products 
> > designed with these (maybe the J2)?

There is some activity on the software side which mainly stems from folks
using old parts and systems.  I'd say the biggest activity is now people
hammering on Sega 32X (SH2), Saturn (SH2) and Dreamcast (SH4), but I might
be biased here.

As for new hardware, I'm not sure.  Apparently it's still possible to
license SH4A(+FPU) and SH4AL-DSP IP cores from Renesas, but I doubt anybody
is really doing that.  Some parts are still being manufactured, like SH2A
for some niche applications. Don't know what j-core people are up to these
days.  Some of the SH MCUs have been re-implemented as open source gateware
for the MiSTer FPGA project.

> > 
> > I'd be interested to learn more about the history of the SH backend, if 
> > anyone wrote that up somewhere...
> > 

From what I know it started during the earlier cygwin days in the 90s,
originally contracted by Hitachi to complement their own in-house C compiler
and also to allow sh-linux to happen at some point.  It was entertained by
Renesas for a while through further contracted support work but eventually
they have abandoned it.  STmicro was also a licensee of the SH4 CPU for
their TV set top boxes and had a few guys submitting patches now and then
for a while.  But the whole thing basically went on life support about 10
years ago.

Perhaps Jeff or others can give more insight on the historical parts.


Best regards,
Oleg Endo


Re: [RFC/PATCH] libgcc: sh: Use soft-fp for non-hosted SH3/SH4

2024-07-06 Thread Oleg Endo
Hi,

( For some weird reason I keep losing Sebastien's messages ... )

On Sat, 2024-07-06 at 07:35 -0600, Jeff Law wrote:
> 
> On 7/5/24 1:28 AM, Sébastien Michelland wrote:
> > Hi Oleg!
> > 
> > > I don't understand why this is being limited to SH3 and SH4 only?
> > > Almost all SH4 systems out there have an FPU (unless special 
> > > configurations
> > > are used).  So I'd say if switching to soft-fp, then for SH-anything, not
> > > just SH3/SH4.
> > > 
> > > If it yields some improvements for some users, I'm all for it.
> > 
> > Yeah I just defaulted to SH3/SH4 conservatively because that's the only 
> > hardware I have. (My main platform also happens to be one of these SH4 
> > without an FPU, the SH4AL-DSP.)

Oh, wow, especially rare type!

> > 
> > Once this is tested/validated on simulator, I'll happily simplify the 
> > patch to apply to all SH.

The default sh-elf configuration has no multi-libs for SH3 and SH4 variants
without FPU (from what I can see).  So it won't use soft-fp so much during
sim testing.  So please change to soft-fp for sh*, not just SH3/SH4.

> > 
> > > I think it would make sense to test it using sh-sim on SH2 big-endian and
> > > little endian at least, as that doesn't have an FPU and hence would run
> > > tests utilizing soft-fp.
> > > 
> > > After building the toolchain for --target=sh-elf, you can use this to run
> > > the testsuite in the simulator:
> > > 
> > > make -k check RUNTESTFLAGS="--target_board=sh-sim\{-m2/-ml,-m2/-mb}"
> > > 
> > > (add make -j parameter according to your needs -- it will be slow)
> > 
> > Alright, it might take a little bit.
> > 
> > Building the combined tree of gcc/binutils/newlib masters (again 
> > following [1])
> > 

I have never built the toolchain using a combined tree.  Like you said, it's
difficult to debug and so on.  I've only built it separately and never had
any issues with this approach on multiple platforms/targets.

Here's an old proposed change to the simtest instructions to not use
combined trees:

https://gcc.gnu.org/pipermail/gcc-patches/attachments/20140815/fb38918e/attachment.bin


> 

> This is almost certainly a poorly written pattern.  I just fixed a bunch 
> of these, but not this one.  Essentially a recent change in the generic 
> parts of the compiler is exposing some bugs in the SH backend. 

The patterns were written and tested to the best of our knowledge at that
time many years ago.  Nobody thought that we'd get a 2nd combine pass after
RA.  Anyway, I'll have a look at the remaining patterns.

Sebastien, in the meantime you could also try out and test your changes on
the latest GCC 14 branch, which shouldn't have those issues.

Best regards,
Oleg Endo


Re: [committed] Fix various sh define_insn_and_split predicates

2024-07-06 Thread Oleg Endo



On Sat, 2024-07-06 at 06:40 -0600, Jeff Law wrote:
> The sh4-linux-gnu port has failed to bootstrap since the introduction of 
> late combine due to failures to split certain insns.
> 
> This is caused by incorrect predicates in various define_insn_and_split 
> patterns.  Essentially the insn's predicate is something like 
> "TARGET_SH1".  The split predicate is "&& can_create_pseudo_p ()".  So 
> these patterns will match post-reload, but be un-splittable.  So at 
> assembly output time, we get the failure as the output template is "#".
> 
> This patch fixes the most obvious & egregious cases by bringing the 
> split condition into the insn's predicate and leaving "&& 1" as the 
> split condition.  That's enough to get sh4-linux-gnu bootstrapping again 
> and I'm hoping it does the same for sh4eb-linux-gnu.
> 
> Pushing to the trunk.
> 

Thanks, Jeff!

Best regards,
Oleg Endo


Re: [RFC/PATCH] libgcc: sh: Use soft-fp for non-hosted SH3/SH4

2024-07-03 Thread Oleg Endo
Hi!

On Wed, 2024-07-03 at 19:28 +0200, Sébastien Michelland wrote:
> On 2024-07-03 17:59, Jeff Law wrote:
> > On 7/3/24 3:59 AM, Sébastien Michelland wrote:
> > > libgcc's fp-bit.c is quite slow and most modern/developed architectures
> > > have switched to using the soft-fp library. This patch does so for
> > > free-standing/unknown-OS SH3/SH4 builds, using soft-fp's default 
> > > parameters
> > > for the most part, most notably no exceptions.
> > > 
> > > A quick run of Whetstone (built with OpenLibm) on an SH4 machine shows
> > > about x3 speedup (~320 -> 1050 Kwhets/s).
> > > 
> > > I'm sending this as RFC because I'm quite unsure about testing. I built
> > > the compiler and ran the benchmark, but I don't know if GCC has a test
> > > for soft-fp correctness and whether I can run that in my non-hosted
> > > environment. Any advice?
> > > 
> > > Cheers,
> > > Sébastien
> > > 
> > > libgcc/ChangeLog:
> > > 
> > >  * config.host: Use soft-fp library for non-hosted SH3/SH4
> > >  instead of fpdbit.
> > >  * config/sh/sfp-machine.h: New.

> > I'd really like to hear from Oleg on this, though given we're using the 
> > soft-fp library on other targets it seems reasonable at a high level.

I don't understand why this is being limited to SH3 and SH4 only?
Almost all SH4 systems out there have an FPU (unless special configurations
are used).  So I'd say if switching to soft-fp, then for SH-anything, not
just SH3/SH4.

If it yields some improvements for some users, I'm all for it.

> > As far as testing, the GCC testsuite has some FP components which would 
> > implicitly test soft fp on any target that doesn't have hardware 
> > floating point.
> 
> Thank you. I went this route, following the guide [1] and the 
> instructions for cross-compiling [2] before hitting "Newlib does not 
> support CPU sh3eb" which I should have seen coming.
> 
> There are plenty of random ports lying around but just grabbing one 
> doesn't feel right (and I don't have a canonical one to go to as I 
> usually run a custom libc for... mostly bad reasons).
> 
> Deferring maybe again to the few SH users... how do you usually do it?
> 
> 

I think it would make sense to test it using sh-sim on SH2 big-endian and
little endian at least, as that doesn't have an FPU and hence would run
tests utilizing soft-fp.

After building the toolchain for --target=sh-elf, you can use this to run
the testsuite in the simulator:

make -k check RUNTESTFLAGS="--target_board=sh-sim\{-m2/-ml,-m2/-mb}"

(add make -j parameter according to your needs -- it will be slow)

Let me know if you have any further questions.

Best regards,
Oleg Endo


Re: [RFC PATCH] cse: Add another CSE pass after split1

2024-06-28 Thread Oleg Endo
Hi,

On Thu, 2024-06-27 at 14:56 -0700, Palmer Dabbelt wrote:
> This is really more of a question than a patch.
> 
> Looking at PR/115687 I managed to convince myself there's a general
> class of problems here: splitting might produce constant subexpressions,
> but as far as I can tell there's nothing to eliminate those constant
> subexpressions.  So I very quickly threw together a CSE that doesn't
> fold expressions, and it does eliminate the high-part constants in
> question.

Maybe this is somewhat relevant ... 

On SH there was/is a need to hoist constant loads outside of loops, which
might form as part of combine/split1 optimization.


https://gcc.gnu.org/bugzilla/attachment.cgi?id=55543&action=diff


Don't know about others, but maybe it would make sense to have those passes
permanently added for everyone, with conditional opt-in/opt-out to keep the
compile times down.


Best regards,
Oleg Endo


Re: [committed] Remove compromised sh test

2024-06-26 Thread Oleg Endo
On Wed, 2024-06-26 at 18:30 -0600, Jeff Law wrote:
> > > 
> > 
> > OK, then what's the default config of your test setup / triplet?
> > Can you please show the generated code that you get?  Because - like I said
> > - I can't reproduce it.
> test01:
>  sts.l   pr,@-r15    ! 31 [c=4 l=2]  movsi_i/10
>  add     #-4,r15     ! 32 [c=4 l=2]  *addsi3/0
>  mov.l   .L3,r0      ! 26 [c=10 l=2]  movsi_i/0
>  jsr     @r0         ! 12 [c=5 l=2]  call_valuei
>  mov.l   r6,@r15     ! 4 [c=4 l=2]  movsi_i/8
>  mov.l   @r15,r1     ! 29 [c=1 l=2]  movsi_i/5
>  add     r1,r0       ! 30 [c=4 l=2]  *addsi3/0
>  add     #4,r15      ! 36 [c=4 l=2]  *addsi3/0
>  lds.l   @r15+,pr    ! 38 [c=1 l=2]  movsi_i/14
>  rts
>  nop                 ! 40 [c=0 l=4]  *return_i
> 
> 
> Note that there's a scheduling barrier in the RTL between insns 30 and 
> 36.  So instructions prior to insn 36 can't be used to fill the delay slot.
> 

Thanks.  Now I'm also seeing the same result.  Needed to specify -O2 to get
that.  -O1 was not enough it seems.

I don't know why you said that the code for this case improved -- it has
not?!

I think the test is still valid.  The reason for the failure might be
different from the original one (the scheduling barrier for whatever
reason), but the end result is the same -- the last delay slot is not
stuffed, although the 'add r1,r0' could go in there.

I'd like to revert the removal of this test case, as it catches a valid
issue.

Best regards,
Oleg Endo







Re: [committed] Remove compromised sh test

2024-06-26 Thread Oleg Endo



On Wed, 2024-06-26 at 16:39 -0600, Jeff Law wrote:
> 
> On 6/26/24 4:12 PM, Oleg Endo wrote:
> > 
> > 
> > On Wed, 2024-06-26 at 07:22 -0600, Jeff Law wrote:
> > > Surya's recent patch to IRA improves the code for sh/pr54602-1.c
> > > slightly.  Specifically it's able to eliminate a save/restore in the
> > > prologue/epilogue and a bit of register shuffling.
> > > 
> > > As a result there literally aren't any insns that can be used to fill
> > > the delay slot of the return, so a nop gets emitted and the test fails.
> > > 
> > > Given there literally aren't any insns to move into the delay slot, the
> > > best course of action is to just drop the test.
> > > 
> > > Pushed to the trunk.
> > > 
> > > Jeff
> > 
> > I can't reproduce what you are saying.
> > Which triplet and flags is your test setup using?
> > 
> > For this test case, GCC 13 with -m4 -ml -O1 -fno-pic:
> No -m flags at all.   As plain of a testrun as you can do.
> 

OK, then what's the default config of your test setup / triplet?
Can you please show the generated code that you get?  Because - like I said
- I can't reproduce it.

Best regards,
Oleg Endo


Re: [committed] Remove compromised sh test

2024-06-26 Thread Oleg Endo



On Wed, 2024-06-26 at 07:22 -0600, Jeff Law wrote:
> Surya's recent patch to IRA improves the code for sh/pr54602-1.c 
> slightly.  Specifically it's able to eliminate a save/restore in the 
> prologue/epilogue and a bit of register shuffling.
> 
> As a result there literally aren't any insns that can be used to fill 
> the delay slot of the return, so a nop gets emitted and the test fails.
> 
> Given there literally aren't any insns to move into the delay slot, the 
> best course of action is to just drop the test.
> 
> Pushed to the trunk.
> 
> Jeff

I can't reproduce what you are saying.
Which triplet and flags is your test setup using?

For this test case, GCC 13 with -m4 -ml -O1 -fno-pic:

_test01:
mov.l   r8,@-r15
sts.l   pr,@-r15
mov.l   .L3,r0
jsr @r0
mov r6,r8
add r8,r0
lds.l   @r15+,pr
rts 
mov.l   @r15+,r8
.L3:
.long   _test00


current GCC master branch with -m4 -ml -O1 -fno-pic:

_test01:
mov.l   r8,@-r15
sts.l   pr,@-r15
mov.l   .L3,r0
jsr @r0
mov r6,r8
add r8,r0
lds.l   @r15+,pr
rts
mov.l   @r15+,r8
.L4:
.align 2
.L3:
.long   _test00


Best regards,
Oleg Endo


Re: [PATCH 6/6] Add a late-combine pass [PR106594]

2024-06-20 Thread Oleg Endo


On Thu, 2024-06-20 at 14:34 +0100, Richard Sandiford wrote:
> 
> I tried compiling at least one target per CPU directory and comparing
> the assembly output for parts of the GCC testsuite.  This is just a way
> of getting a flavour of how the pass performs; it obviously isn't a
> meaningful benchmark.  All targets seemed to improve on average:
> 
> Target                 Tests   Good   Bad   %Good    Delta  Median
> ======                 =====   ====   ===   =====    =====  ======
> aarch64-linux-gnu       2215   1975   240  89.16%    -4159      -1
> aarch64_be-linux-gnu    1569   1483    86  94.52%   -10117      -1
> alpha-linux-gnu         1454   1370    84  94.22%    -9502      -1
> amdgcn-amdhsa           5122   4671   451  91.19%   -35737      -1
> arc-elf                 2166   1932   234  89.20%   -37742      -1
> arm-linux-gnueabi       1953   1661   292  85.05%   -12415      -1
> arm-linux-gnueabihf     1834   1549   285  84.46%   -11137      -1
> avr-elf                 4789   4330   459  90.42%  -441276      -4
> bfin-elf                2795   2394   401  85.65%   -19252      -1
> bpf-elf                 3122   2928   194  93.79%    -8785      -1
> c6x-elf                 2227   1929   298  86.62%   -17339      -1
> cris-elf                3464   3270   194  94.40%   -23263      -2
> csky-elf                2915   2591   324  88.89%   -22146      -1
> epiphany-elf            2399   2304    95  96.04%   -28698      -2
> fr30-elf                7712   7299   413  94.64%   -99830      -2
> frv-linux-gnu           3332   2877   455  86.34%   -25108      -1
> ft32-elf                2775   2667   108  96.11%   -25029      -1
> h8300-elf               3176   2862   314  90.11%   -29305      -2
> hppa64-hp-hpux11.23     4287   4247    40  99.07%   -45963      -2
> ia64-linux-gnu          2343   1946   397  83.06%    -9907      -2
> iq2000-elf              9684   9637    47  99.51%  -126557      -2
> lm32-elf                2681   2608    73  97.28%   -59884      -3
> loongarch64-linux-gnu   1303   1218    85  93.48%   -13375      -2
> m32r-elf                1626   1517   109  93.30%    -9323      -2
> m68k-linux-gnu          3022   2620   402  86.70%   -21531      -1
> mcore-elf               2315   2085   230  90.06%   -24160      -1
> microblaze-elf          2782   2585   197  92.92%   -16530      -1
> mipsel-linux-gnu        1958   1827   131  93.31%   -15462      -1
> mipsisa64-linux-gnu     1655   1488   167  89.91%   -16592      -2
> mmix                    4914   4814   100  97.96%   -63021      -1
> mn10300-elf             3639   3320   319  91.23%   -34752      -2
> moxie-rtems             3497   3252   245  92.99%   -87305      -3
> msp430-elf              4353   3876   477  89.04%   -23780      -1
> nds32le-elf             3042   2780   262  91.39%   -27320      -1
> nios2-linux-gnu         1683   1355   328  80.51%    -8065      -1
> nvptx-none              2114   1781   333  84.25%   -12589      -2
> or1k-elf                3045   2699   346  88.64%   -14328      -2
> pdp11                   4515   4146   369  91.83%   -26047      -2
> pru-elf                 1585   1245   340  78.55%    -5225      -1
> riscv32-elf             2122   2000   122  94.25%  -101162      -2
> riscv64-elf             1841   1726   115  93.75%   -49997      -2
> rl78-elf                2823   2530   293  89.62%   -40742      -4
> rx-elf                  2614   2480   134  94.87%   -18863      -1
> s390-linux-gnu          1591   1393   198  87.55%   -16696      -1
> s390x-linux-gnu         2015   1879   136  93.25%   -21134      -1
> sh-linux-gnu            1870   1507   363  80.59%    -9491      -1
> sparc-linux-gnu         1123   1075    48  95.73%   -14503      -1
> sparc-wrs-vxworks       1121   1073    48  95.72%   -14578      -1
> sparc64-linux-gnu       1096   1021    75  93.16%   -15003      -1
> v850-elf                1897   1728   169  91.09%   -11078      -1
> vax-netbsdelf           3035   2995    40  98.68%   -27642      -1
> visium-elf              1392   1106   286  79.45%    -7984      -2
> xstormy16-elf           2577   2071   506  80.36%   -13061      -1
> 
> 

Since you have already briefly compared some of the code, can you share
those cases which get worse and might require some potential follow up
patches?

Best regards,
Oleg Endo
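
For reading the table quoted in this message, the columns appear to relate as Tests = Good + Bad and %Good = Good / Tests. The following C++ sketch checks that reading against the sh-linux-gnu row. The `Row` struct and helper names are hypothetical and only serve this illustration; the numbers are copied from the table.

```cpp
#include <cassert>
#include <cmath>

// One row of the quoted table (hypothetical struct; Delta and Median omitted).
struct Row {
  const char* target;
  int tests;
  int good;
  int bad;
};

// Percentage of tests counted as "Good", before rounding.
double percent_good(const Row& r) { return 100.0 * r.good / r.tests; }

// Round to two decimal places, matching the precision shown in the table.
double round2(double x) { return std::round(x * 100.0) / 100.0; }

// Self-check against the sh-linux-gnu row: 1870 tests, 1507 good, 363 bad.
void check_sh_row() {
  const Row sh{"sh-linux-gnu", 1870, 1507, 363};
  assert(sh.good + sh.bad == sh.tests);
  assert(std::fabs(round2(percent_good(sh)) - 80.59) < 1e-9);
}
```

By this measure sh-linux-gnu sits near the low end of the table, which fits the request above for the cases that got worse.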


Re: [PATCH 4/6] sh: Make *minus_plus_one work after RA

2024-06-20 Thread Oleg Endo


On Thu, 2024-06-20 at 14:34 +0100, Richard Sandiford wrote:
> *minus_plus_one had no constraints, which meant that it could be
> matched after RA with operands 0, 1 and 2 all being different.
> The associated split instead requires operand 0 to be tied to
> operand 1.

Thanks for spotting this.  Makes sense, please install.

Best regards,
Oleg Endo

> 
> gcc/
>   * config/sh/sh.md (*minus_plus_one): Add constraints.
> ---
>  gcc/config/sh/sh.md | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/gcc/config/sh/sh.md b/gcc/config/sh/sh.md
> index 92a1efeb811..9491b49e55b 100644
> --- a/gcc/config/sh/sh.md
> +++ b/gcc/config/sh/sh.md
> @@ -1642,9 +1642,9 @@ (define_insn_and_split "*addc"
>  ;; matched.  Split this up into a simple sub add sequence, as this will save
>  ;; us one sett insn.
>  (define_insn_and_split "*minus_plus_one"
> -  [(set (match_operand:SI 0 "arith_reg_dest" "")
> - (plus:SI (minus:SI (match_operand:SI 1 "arith_reg_operand" "")
> -(match_operand:SI 2 "arith_reg_operand" ""))
> +  [(set (match_operand:SI 0 "arith_reg_dest" "=r")
> + (plus:SI (minus:SI (match_operand:SI 1 "arith_reg_operand" "0")
> +(match_operand:SI 2 "arith_reg_operand" "r"))
>(const_int 1)))]
>"TARGET_SH1"
>"#"
> -- 
> 2.25.1
> 
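
The need for the "0" constraint in the patch above can be sketched with a toy register model. It assumes SH's two-operand forms, `sub Rm,Rn` (Rn = Rn - Rm) and `add #imm,Rn` (Rn = Rn + imm), and assumes the split emits a sub followed by an add on the destination register. The `Regs` type and all function names are hypothetical; this is not GCC code.

```cpp
#include <array>
#include <cassert>

// Hypothetical register file: 16 general purpose registers.
using Regs = std::array<int, 16>;

// sub Rm,Rn: Rn = Rn - Rm (two-operand form, Rn is source and destination).
void sub(Regs& r, int m, int n) { r[n] -= r[m]; }

// add #imm,Rn: Rn = Rn + imm.
void add_imm(Regs& r, int imm, int n) { r[n] += imm; }

// Run the sub/add sequence with 'dest' as Rn and return the value left in dest.
// The sequence never reads operand 1 explicitly: it is only correct when
// dest already holds operand 1, i.e. when operand 0 is tied to operand 1.
int split_sequence(Regs r, int dest, int src2) {
  sub(r, src2, dest);
  add_imm(r, 1, dest);
  return r[dest];
}

// The value the pattern is meant to compute: (op1 - op2) + 1.
int wanted(const Regs& r, int src1, int src2) { return r[src1] - r[src2] + 1; }

// Self-check: tied operands give the right value, untied ones do not.
void check_minus_plus_one() {
  Regs r{};
  r[1] = 10;
  r[2] = 3;
  r[3] = 100;
  assert(split_sequence(r, 1, 2) == wanted(r, 1, 2));  // dest == op1
  assert(split_sequence(r, 3, 2) != wanted(r, 1, 2));  // dest != op1
}
```

With operands 0, 1 and 2 all different, which the unconstrained pattern allowed after RA, the sequence subtracts from whatever the destination happened to hold.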


Re: [PATCH 45/52] sh: New hook implementation sh_c_mode_for_floating_type

2024-06-02 Thread Oleg Endo


Hi!

On Sun, 2024-06-02 at 22:01 -0500, Kewen Lin wrote:
> This is to remove macro LONG_DOUBLE_TYPE_SIZE define in
> sh port, and add new port specific hook implementation
> sh_c_mode_for_floating_type.
> 

The SH parts look OK to me.

Best regards,
Oleg Endo


> gcc/ChangeLog:
> 
>   * config/sh/sh.cc (sh_c_mode_for_floating_type): New function.
>   (TARGET_C_MODE_FOR_FLOATING_TYPE): New macro.
>   * config/sh/sh.h (LONG_DOUBLE_TYPE_SIZE): Remove.
> ---
>  gcc/config/sh/sh.cc | 18 ++++++++++++++++++
>  gcc/config/sh/sh.h  | 10 ----------
>  2 files changed, 18 insertions(+), 10 deletions(-)
> 
> diff --git a/gcc/config/sh/sh.cc b/gcc/config/sh/sh.cc
> index ef3c2e6791d..bc017420381 100644
> --- a/gcc/config/sh/sh.cc
> +++ b/gcc/config/sh/sh.cc
> @@ -328,6 +328,7 @@ static unsigned int sh_hard_regno_nregs (unsigned int, 
> machine_mode);
>  static bool sh_hard_regno_mode_ok (unsigned int, machine_mode);
>  static bool sh_modes_tieable_p (machine_mode, machine_mode);
>  static bool sh_can_change_mode_class (machine_mode, machine_mode, 
> reg_class_t);
> +static machine_mode sh_c_mode_for_floating_type (enum tree_index);
>  
>  TARGET_GNU_ATTRIBUTES (sh_attribute_table,
>  {
> @@ -664,6 +665,9 @@ TARGET_GNU_ATTRIBUTES (sh_attribute_table,
>  #undef  TARGET_HAVE_SPECULATION_SAFE_VALUE
>  #define TARGET_HAVE_SPECULATION_SAFE_VALUE speculation_safe_value_not_needed
>  
> +#undef TARGET_C_MODE_FOR_FLOATING_TYPE
> +#define TARGET_C_MODE_FOR_FLOATING_TYPE sh_c_mode_for_floating_type
> +
>  struct gcc_target targetm = TARGET_INITIALIZER;
>  
>  
> @@ -10674,6 +10678,20 @@ sh_can_change_mode_class (machine_mode from, 
> machine_mode to,
>return true;
>  }
>  
> +/* Implement TARGET_C_MODE_FOR_FLOATING_TYPE.  Return SFmode or DFmode
> +   for TI_DOUBLE_TYPE which is for double type, go with the default one
> +   for the others.  */
> +
> +static machine_mode
> +sh_c_mode_for_floating_type (enum tree_index ti)
> +{
> +  /* Since the SH2e has only `float' support, it is desirable to make all
> + floating point types equivalent to `float'.  */
> +  if (ti == TI_DOUBLE_TYPE)
> +return TARGET_FPU_SINGLE_ONLY ? SFmode : DFmode;
> +  return default_mode_for_floating_type (ti);
> +}
> +
>  /* Return true if registers in machine mode MODE will likely be
> allocated to registers in small register classes.  */
>  bool
> diff --git a/gcc/config/sh/sh.h b/gcc/config/sh/sh.h
> index 7d3a3f08338..53cad85d122 100644
> --- a/gcc/config/sh/sh.h
> +++ b/gcc/config/sh/sh.h
> @@ -425,9 +425,6 @@ extern const sh_atomic_model& selected_atomic_model 
> (void);
>  /* Width in bits of a `long long'.  */
>  #define LONG_LONG_TYPE_SIZE 64
>  
> -/* Width in bits of a `long double'.  */
> -#define LONG_DOUBLE_TYPE_SIZE 64
> -
>  /* Width of a word, in units (bytes).  */
>  #define UNITS_PER_WORD   (4)
>  #define MIN_UNITS_PER_WORD 4
> @@ -1433,13 +1430,6 @@ extern bool current_function_interrupt;
> Do not define this if the table should contain absolute addresses.  */
>  #define CASE_VECTOR_PC_RELATIVE 1
>  
> -/* Define it here, so that it doesn't get bumped to 64-bits on SHmedia.  */
> -#define FLOAT_TYPE_SIZE 32
> -
> -/* Since the SH2e has only `float' support, it is desirable to make all
> -   floating point types equivalent to `float'.  */
> -#define DOUBLE_TYPE_SIZE (TARGET_FPU_SINGLE_ONLY ? 32 : 64)
> -
>  /* 'char' is signed by default.  */
>  #define DEFAULT_SIGNED_CHAR  1
>  
> -- 
> 2.43.0
> 


Re: [PATCH] Add extra copy of the ifcombine pass after pre [PR102793]

2024-05-16 Thread Oleg Endo


On Thu, 2024-05-16 at 10:35 +0200, Richard Biener wrote:
> On Fri, Apr 5, 2024 at 8:14 PM Andrew Pinski  wrote:
> > 
> > On Fri, Apr 5, 2024 at 5:28 AM Manolis Tsamis  
> > wrote:
> > > 
> > > If we consider code like:
> > > 
> > > if (bar1 == x)
> > >   return foo();
> > > if (bar2 != y)
> > >   return foo();
> > > return 0;
> > > 
> > > We would like the ifcombine pass to convert this to:
> > > 
> > > if (bar1 == x || bar2 != y)
> > >   return foo();
> > > return 0;
> > > 
> > > > The ifcombine pass can handle this transformation but it is run very 
> > > > early and
> > > > it misses the opportunity because there are two separate blocks for foo().
> > > The pre pass is good at removing duplicate code and blocks and due to that
> > > running ifcombine again after it can increase the number of successful
> > > conversions.
> > 
> > I do think we should have something similar to re-running
> > ssa-ifcombine but I think it should be much later, like after the loop
> > optimizations are done.
> > Maybe just a simplified version of it (that does the combining and not
> > the optimizations part) included in isel or pass_optimize_widening_mul
> > (which itself should most likely become part of isel or renamed since
> > it handles more than just widening multiply these days).
> 
> I've long wished we had a (late?) pass that can also undo if-conversion
> (basically do what RTL expansion would later do).  Maybe
> gimple-predicate-analysis.cc (what's used by uninit analysis) can
> represent mixed CFG + if-converted conditions so we can optimize
> it and code-gen the condition in a more optimal manner much like
> we have if-to-switch, switch-conversion and switch-expansion.
> 
> That said, I agree that re-running ifcombine should be later.  And there's
> still the old task of splitting tail-merging from PRE (and possibly making
> it more effective).

Sorry to butt in, but it might be a little bit relevant and caught my
attention.

I've got this SH patch sitting around
https://gcc.gnu.org/bugzilla/attachment.cgi?id=55543

The idea is basically to run an additional loop pass after combine and
split1.  The main purpose is to hoist constant loads out of loops. Such
constant loads might be formed (in this particular case) during combine
transformations.
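
As a rough source-level picture of what such a loop pass is after, here is a
minimal sketch.  It is not code from the patch; the function names and the
constant are made up for illustration, and the volatile local merely stands in
for a constant load that sits inside the loop body.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical constant that has to be loaded before it can be used.
constexpr std::uint32_t mask = 0x00FF00FFu;

// Shape before hoisting: the constant is (conceptually) loaded again in
// every iteration.  The volatile local models that repeated load.
inline std::uint32_t sum_masked_naive (const std::uint32_t* p, std::size_t n)
{
  std::uint32_t sum = 0;
  for (std::size_t i = 0; i < n; ++i)
    {
      volatile std::uint32_t m = mask;
      sum += p[i] & m;
    }
  return sum;
}

// Shape after hoisting: the loop-invariant load is done once, outside.
inline std::uint32_t sum_masked_hoisted (const std::uint32_t* p, std::size_t n)
{
  const std::uint32_t m = mask;
  std::uint32_t sum = 0;
  for (std::size_t i = 0; i < n; ++i)
    sum += p[i] & m;
  return sum;
}
```

Both functions compute the same value; only the placement of the load differs.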

The patch adds a new file gcc/config/sh/sh_loop.cc, which has some boiler-
plate code copy pasted from other places to get the loop pass setup and
going.

Any thoughts on this way of doing it?


Best regards,
Oleg Endo


[committed][SH] Fix 101737

2024-03-02 Thread Oleg Endo
Hi,

The attached patch should fix PR 101737.  It's a rather obvious oversight. 
Sanity tested with 'make all-gcc'.  Committed to master, gcc-13, gcc-12,
gcc-11.

Cheers,
Oleg


gcc/ChangeLog:
PR target/101737
* config/sh/sh.cc (sh_is_nott_insn): Handle case where the input
is not an insn, but e.g. a code label.
From 4ff8ffe7331cf174668cf5c729fd68ff327ab014 Mon Sep 17 00:00:00 2001
From: Oleg Endo 
Date: Sun, 3 Mar 2024 14:58:58 +0900
Subject: [PATCH] SH: Fix 101737

gcc/ChangeLog:
	PR target/101737
	* config/sh/sh.cc (sh_is_nott_insn): Handle case where the input
	is not an insn, but e.g. a code label.
---
 gcc/config/sh/sh.cc | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/gcc/config/sh/sh.cc b/gcc/config/sh/sh.cc
index 2c4..ef3c2e6 100644
--- a/gcc/config/sh/sh.cc
+++ b/gcc/config/sh/sh.cc
@@ -11766,9 +11766,10 @@ sh_insn_operands_modified_between_p (rtx_insn* operands_insn,
negates the T bit and stores the result in the T bit.  */
 bool
 sh_is_nott_insn (const rtx_insn* i)
 {
-  return i != NULL && GET_CODE (PATTERN (i)) == SET
+  return i != NULL_RTX && PATTERN (i) != NULL_RTX
+	 && GET_CODE (PATTERN (i)) == SET
 	 && t_reg_operand (XEXP (PATTERN (i), 0), VOIDmode)
 	 && negt_reg_operand (XEXP (PATTERN (i), 1), VOIDmode);
 }
 
--
libgit2 1.6.4



Re: [PATCH] m68k: restore bootstrap

2024-02-18 Thread Oleg Endo
On Sun, 2024-02-18 at 08:42 -0700, Jeff Law wrote:
> 
> On 2/18/24 02:18, Mikael Pettersson wrote:
> > m68k fails to bootstrap since -ffold-mem-offsets was introduced,
> > in what looks like wrong-code during stage2.
> > 
> > To restore bootstrap this disables -ffold-mem-offsets on m68k.
> > It's not ideal, but better than keeping bootstraps broken until
> > the root cause is debugged and fixed.
> > 
> > Tested with a bootstrap and regression test run on m68k-linux-gnu.
> > 
> > Ok for master? (I'll need help getting it committed.)
> > 
> > gcc/
> > PR target/113357
> > * config/m68k/m68k.cc (m68k_option_override): Disable
> > -ffold-mem-offsets.  Fix typo in comment.
> Definitely not OK.  This needs to be debugged further; just disabling 
> the pass is not the right solution here.
> 
> It is also worth noting I'm bootstrapping and regression testing the 
> m68k weekly.
> 
> 

Jeff, could you please consider sharing your test setup so that others can
reproduce it as well?

It'd be much better if more people had access to a unified test setup and
methodology.

Best regards,
Oleg Endo


Re: Building a GCC backend for the STM8

2024-01-30 Thread Oleg Endo
Hi,

On Sun, 2024-01-28 at 04:41 +0100, Sophie 'Tyalie' Friedrich via Gcc wrote:
> Hello dear people,
> 
> I want to try building a GCC compiler backend for the STM8 
> micro-controller target in order to make this wonderful architecture 
> more accessible.
> 
> But as I'm fairly new in this area of building compiler backends for 
> GCC, I would need a bit of guidance / read material to get started. Do 
> you have recommendations for anything? And is there interest in such work?
> 
> With best regards
> Tyalie

GCC might be a bit difficult for 8-bit targets.  For example, if you
look at RL78, it had to resort to some virtual register set workaround
because GCC's usual register allocation at that time couldn't deal with it. 
8-bit targets like AVR seem to be a bit easier.

Some other interesting options for 8-bit targets are SDCC (STM8 already
supported it seems) and llvm-mos project (LLVM port originally for 6502).

Cheers,
Oleg


Re: [committed] Adjust expectations for pr59533-1.c

2024-01-21 Thread Oleg Endo


On Sun, 2024-01-21 at 19:14 -0700, Jeff Law wrote:
> The change for pr111267 twiddled code generation for sh/pr59533-1.c
> 
> We end up eliminating two comparisons, but require two shll instructions 
> to do so.  And in a couple places we're using an addc sequence rather 
> than a subc sequence.   This patch adjusts the expected codegen for the 
> test as all those are either a wash or a
> 
> The fwprop change does cause some code regressions on the same test. 
> I'll file a distinct bug for that issue.
> 
> Pushed to the trunk,
> 
> Jeff

Thanks for keeping an eye on this.

Note that on SH4 the comparison insns are of MT type, which increases
likelihood of parallel execution.  So it's better to use those e.g. to shift
out the MSB into T bit than shll.
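
At the source level the two ways of getting at the MSB look like this.  This
is only a sketch with made-up function names; which SH instruction the backend
actually picks for either form depends on its patterns and is not something
the snippet can show.

```cpp
#include <cstdint>

// MSB test written as a signed comparison against zero.
inline bool msb_by_compare (std::int32_t x)
{
  return x < 0;
}

// MSB test written as a logical shift of the top bit down to bit 0.
inline bool msb_by_shift (std::int32_t x)
{
  return (static_cast<std::uint32_t> (x) >> 31) != 0;
}
```

Both forms yield the same bit, so preferring the comparison form is purely a
scheduling consideration.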

Cheers,
Oleg


Re: [RFA] New pass for sign/zero extension elimination

2023-11-19 Thread Oleg Endo
On Sun, 2023-11-19 at 19:51 -0700, Jeff Law wrote:
> 
> On 11/19/23 18:22, Oleg Endo wrote:
> > 
> > On Sun, 2023-11-19 at 17:47 -0700, Jeff Law wrote:
> > > This is work originally started by Joern @ Embecosm.
> > > 
> > > There's been a long standing sense that we're generating too many
> > > sign/zero extensions on the RISC-V port.  REE is useful, but it's really
> > > focused on a relatively narrow part of the extension problem.
> > > 
> > > What Joern's patch does is introduce a new pass which tracks liveness of
> > > chunks of pseudo regs.  Specifically it tracks bits 0..7, 8..15, 16..31
> > > and 32..63.
> > > 
> > > If it encounters a sign/zero extend that sets bits that are never read,
> > > then it replaces the sign/zero extension with a narrowing subreg.  The
> > > narrowing subreg usually gets eliminated by subsequent passes (it's just
> > > a copy after all).
> > > 
> > 
> > Have you tried it on SH, too?  (and if so any numbers?)


> Just bootstrap with C regression testing on sh4/sh4eb.  No data on 
> improvements.
> 

Alright.  I'll check what it does for SH once it's in.

Cheers,
Oleg


Re: [RFA] New pass for sign/zero extension elimination

2023-11-19 Thread Oleg Endo


On Sun, 2023-11-19 at 17:47 -0700, Jeff Law wrote:
> This is work originally started by Joern @ Embecosm.
> 
> There's been a long standing sense that we're generating too many 
> sign/zero extensions on the RISC-V port.  REE is useful, but it's really 
> focused on a relatively narrow part of the extension problem.
> 
> What Joern's patch does is introduce a new pass which tracks liveness of 
> chunks of pseudo regs.  Specifically it tracks bits 0..7, 8..15, 16..31 
> and 32..63.
> 
> If it encounters a sign/zero extend that sets bits that are never read, 
> then it replaces the sign/zero extension with a narrowing subreg.  The 
> narrowing subreg usually gets eliminated by subsequent passes (it's just 
> a copy after all).
> 

Have you tried it on SH, too?  (and if so any numbers?)

It sounds like this one would be great to remove some of the sign/zero
extension removal hackery that I've accumulated in the SH backend.
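
A small sketch of the kind of case the description refers to, where an
extension sets bits that are never read.  The function names are made up and
this says nothing about how the pass itself is implemented.

```cpp
#include <cstdint>

// Sign-extends both inputs, but only the low 8 bits of the sum are kept,
// so the bits produced by the extensions are never read.
inline std::uint8_t add_low8_extended (std::int8_t a, std::int8_t b)
{
  std::int32_t wa = a;   // sign extension
  std::int32_t wb = b;   // sign extension
  return static_cast<std::uint8_t> (wa + wb);
}

// Same result computed from the raw 8-bit patterns, without sign extension.
inline std::uint8_t add_low8_narrow (std::uint8_t a, std::uint8_t b)
{
  return static_cast<std::uint8_t> (a + b);
}
```

Since the low 8 bits of a sum depend only on the low 8 bits of its operands,
the two functions agree for every input pair.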

Cheers,
Oleg


[SH][committed] Fix PR 111001

2023-10-23 Thread Oleg Endo
The attached patch fixes PR 111001.

Committed to master, cherry-picked to GCC-13, GCC-12 and GCC-11.
Sanity tested with 'make all-gcc'.
Bootstrapped on GCC-13 sh4-linux by Adrian.

Cheers,
Oleg

gcc/ChangeLog:

PR target/111001
* config/sh/sh_treg_combine.cc (sh_treg_combine::record_set_of_reg):
Skip over nop move insns.
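
To illustrate the idea of following reg-reg copies while stepping over nop
copies, here is a toy model.  It is an assumption-laden sketch: toy_set and
find_defining_set are hypothetical names, and the real code in
sh_treg_combine.cc works on RTL insns, not on a vector like this.

```cpp
#include <vector>

// Hypothetical, heavily simplified stand-in for a "set" insn.
struct toy_set
{
  int dst;        // register being set
  int src;        // source register, only meaningful when is_copy is true
  bool is_copy;   // true for a reg-reg copy
};

// Walk backwards from the last insn and return the index of the insn that
// really defines 'reg', following reg-reg copies.  Returns -1 if not found.
inline int find_defining_set (const std::vector<toy_set>& insns, int reg)
{
  for (int i = static_cast<int> (insns.size ()) - 1; i >= 0; --i)
    {
      const toy_set& s = insns[i];
      if (s.dst != reg)
        continue;
      if (!s.is_copy)
        return i;
      if (s.src == reg)
        continue;        // nop copy of the reg onto itself: step over it
      reg = s.src;       // real copy: keep looking for the copied reg
    }
  return -1;
}
```

The self-copy contributes nothing to the search, so it is skipped instead of
being recorded as a copy to follow.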

From 4414818f4e5de54ea3c353e2ebb2e79a89ae211b Mon Sep 17 00:00:00 2001
From: Oleg Endo 
Date: Mon, 23 Oct 2023 22:08:37 +0900
Subject: [PATCH] SH: Fix PR 111001

gcc/ChangeLog:

	PR target/111001
	* config/sh/sh_treg_combine.cc (sh_treg_combine::record_set_of_reg):
	Skip over nop move insns.
---
 gcc/config/sh/sh_treg_combine.cc | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/gcc/config/sh/sh_treg_combine.cc b/gcc/config/sh/sh_treg_combine.cc
index f6553c0..685ca54 100644
--- a/gcc/config/sh/sh_treg_combine.cc
+++ b/gcc/config/sh/sh_treg_combine.cc
@@ -731,9 +731,16 @@ sh_treg_combine::record_set_of_reg (rtx reg, rtx_insn *start_insn,
 	  new_entry.cstore_type = cstore_inverted;
 	}
   else if (REG_P (new_entry.cstore.set_src ()))
 	{
-	  // If it's a reg-reg copy follow the copied reg.
+	  // If it's a reg-reg copy follow the copied reg, but ignore
+	  // nop copies of the reg onto itself.
+	  if (REGNO (new_entry.cstore.set_src ()) == REGNO (reg))
+	{
+	  i = prev_nonnote_nondebug_insn_bb (i);
+	  continue;
+	}
+
 	  new_entry.cstore_reg_reg_copies.push_back (new_entry.cstore);
 	  reg = new_entry.cstore.set_src ();
 	  i = new_entry.cstore.insn;
 
--
libgit2 1.3.2



[SH][committed] Fix PR 101177

2023-10-20 Thread Oleg Endo
The attached patch fixes PR 101177.

Committed to master, cherry-picked to GCC-13, GCC-12 and GCC-11.
Sanity tested with 'make all-gcc'.

Cheers,
Oleg

gcc/ChangeLog:

PR target/101177
* config/sh/sh.md (unnamed split pattern): Fix comparison of
find_regno_note result.

From 3ce4e99303d01d348229cca22bf8d3dd63004e01 Mon Sep 17 00:00:00 2001
From: Oleg Endo 
Date: Fri, 20 Oct 2023 18:48:34 +0900
Subject: [PATCH] SH: Fix PR 101177

Fix accidentally inverted comparison.

gcc/ChangeLog:

	PR target/101177
	* config/sh/sh.md (unnamed split pattern): Fix comparison of
	find_regno_note result.
---
 gcc/config/sh/sh.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/sh/sh.md b/gcc/config/sh/sh.md
index 76e7774..93374c6 100644
--- a/gcc/config/sh/sh.md
+++ b/gcc/config/sh/sh.md
@@ -841,9 +841,9 @@
   rtx reg = operands[0];
   if (SUBREG_P (reg))
 reg = SUBREG_REG (reg);
   gcc_assert (REG_P (reg));
-  if (find_regno_note (curr_insn, REG_DEAD, REGNO (reg)) != NULL_RTX)
+  if (find_regno_note (curr_insn, REG_DEAD, REGNO (reg)) == NULL_RTX)
 FAIL;
 
   /* FIXME: Maybe also search the predecessor basic blocks to catch
  more cases.  */
--
libgit2 1.3.2



Re: RISC-V: Added support for CRC.

2023-09-26 Thread Oleg Endo
On Sun, 2023-09-24 at 00:05 +0100, Joern Rennecke wrote:
> 
> Although maybe Oleg Endo's library, as mentioned in
> https://gcc.gnu.org/pipermail/gcc-patches/2022-March/591748.html ,
> might be suitable?  What is the license for that?
> 
> 

I haven't published the library, but I think I could do that.

It's a C++-14 header-only thing and uses templates + constexpr to generate
the .rodata lookup tables.  It's convenient for an application project, as
it doesn't require any generator tool in the build.  This might not be a big
advantage in the context of GCC.

Since the tables are computed during compile-time, there is no particular
optimization implemented.  The run-time function is also nothing fancy:

static constexpr uint8_t table_index (value_type rem, uint8_t x)
{
  if (ReflectInput)
    return x ^ rem;
  else
    return x ^ (BitCount > 8 ? (rem >> (BitCount - 8))
                             : (rem << (8 - BitCount)));
}

static constexpr value_type shift (value_type rem)
{
  return ReflectInput ? rem >> 8 : rem << 8;
}

static value_type
default_process_bytes (value_type rem, const uint8_t* in, const uint8_t* in_end)
{
  for (; in != in_end; ++in)
    {
      auto i = table_index (rem, *in);
      rem = table[i] ^ shift (rem);
    }
  return rem;
}
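
For readers without the library at hand, a self-contained sketch of the same
scheme for one concrete case (reflected CRC-32, polynomial 0xEDB88320 in
reflected form) could look as follows.  This is not the library code; the
names crc32_table, make_crc32_table and crc32_reflected are made up for the
example.

```cpp
#include <cstdint>

struct crc32_table
{
  std::uint32_t entry[256];
};

// Computes the 256-entry lookup table at compile time (C++14 constexpr).
constexpr crc32_table make_crc32_table ()
{
  crc32_table t = {};
  for (std::uint32_t i = 0; i < 256; ++i)
    {
      std::uint32_t rem = i;
      for (int bit = 0; bit < 8; ++bit)
        rem = (rem & 1u) ? (rem >> 1) ^ 0xEDB88320u : (rem >> 1);
      t.entry[i] = rem;
    }
  return t;
}

constexpr crc32_table crc32_tab = make_crc32_table ();

// Byte-wise table lookup, i.e. the ReflectInput case of the functions above,
// with initial remainder and final xor both 0xFFFFFFFF.
inline std::uint32_t
crc32_reflected (const std::uint8_t* in, const std::uint8_t* in_end)
{
  std::uint32_t rem = 0xFFFFFFFFu;
  for (; in != in_end; ++in)
    {
      const std::uint8_t i = static_cast<std::uint8_t> (rem ^ *in);
      rem = crc32_tab.entry[i] ^ (rem >> 8);
    }
  return rem ^ 0xFFFFFFFFu;
}
```

With these parameters the usual check input "123456789" gives 0xCBF43926.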

Anyway, let me know if anyone is interested.

Cheers,
Oleg


[SH][committed] Fix PR 101469

2023-07-13 Thread Oleg Endo
Hi,

The attached patch fixes PR 101469.
Tested by the original reporter Rin Okuyama on NetBSD with GCC 10.5.
Applied to master, GCC 11, GCC 12, GCC 13 after 'make all' sanity check.

Cheers,
Oleg


gcc/ChangeLog:

PR target/101469
* config/sh/sh.md (peephole2): Handle case where eliminated reg
is also used by the address of the following memory
operand.

diff --git a/gcc/config/sh/sh.md b/gcc/config/sh/sh.md
index 4622dba0121..76e7774cef3 100644
--- a/gcc/config/sh/sh.md
+++ b/gcc/config/sh/sh.md
@@ -10680,6 +10680,45 @@
&& peep2_reg_dead_p (2, operands[1]) && peep2_reg_dead_p (3, operands[0])"
   [(const_int 0)]
 {
+  if (MEM_P (operands[3]) && reg_overlap_mentioned_p (operands[0], operands[3]))
+{
+  // Take care when the eliminated operand[0] register is part of
+  // the destination memory address.
+  rtx addr = XEXP (operands[3], 0);
+
+  if (REG_P (addr))
+	operands[3] = replace_equiv_address (operands[3], operands[1]);
+
+  else if (GET_CODE (addr) == PLUS && REG_P (XEXP (addr, 0))
+	   && CONST_INT_P (XEXP (addr, 1))
+	   && REGNO (operands[0]) == REGNO (XEXP (addr, 0)))
+	operands[3] = replace_equiv_address (operands[3],
+			gen_rtx_PLUS (SImode, operands[1], XEXP (addr, 1)));
+
+  else if (GET_CODE (addr) == PLUS && REG_P (XEXP (addr, 0))
+	   && REG_P (XEXP (addr, 1)))
+{
+  // register + register address  @(R0, Rn)
+  // can change only the Rn in the address, not R0.
+  if (REGNO (operands[0]) == REGNO (XEXP (addr, 0))
+	  && REGNO (XEXP (addr, 0)) != 0)
+	{
+	  operands[3] = replace_equiv_address (operands[3],
+			gen_rtx_PLUS (SImode, operands[1], XEXP (addr, 1)));
+	}
+  else if (REGNO (operands[0]) == REGNO (XEXP (addr, 1))
+		   && REGNO (XEXP (addr, 1)) != 0)
+{
+	  operands[3] = replace_equiv_address (operands[3],
+			gen_rtx_PLUS (SImode, XEXP (addr, 0), operands[1]));
+}
+  else
+FAIL;
+}
+  else
+FAIL;
+}
+
   emit_insn (gen_addsi3 (operands[1], operands[1], operands[2]));
   sh_peephole_emit_move_insn (operands[3], operands[1]);
 })


Re: wishlist: support for shorter pointers

2023-07-04 Thread Oleg Endo
> I think a C++ class (or rather, class template) with inline functions is 
> the way to go here.  gcc's optimiser will give good code, and the C++ 
> class will let you get nice syntax to hide the messy details.
> 
> There is no good way to do this in C.  Named address spaces would be a 
> possibility, but require quite a bit of effort and change to the 
> compiler to implement, and they don't give you anything that you would 
> not get from a C++ class.
> 
> (That's not quite true - named address spaces can, I believe, also 
> influence the section name used for allocation of data defined in these 
> spaces, which cannot be done by a C++ class.)
> 

Does the C++ template class shebang work for storing "short code pointers"
for things like compile-time/link-time generated function tables?  Haven't
tried it myself, but somehow I doubt it.
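
For reference, the data-pointer variant of such a class template might be
sketched like this.  The names pool and short_ptr are hypothetical, and the
fixed 256-element pool is an assumption made only to keep the example small.
It does not answer the code pointer question.

```cpp
#include <cstdint>

// Hypothetical fixed pool; short pointers are indices relative to it.
template <typename T>
struct pool
{
  static T storage[256];
};

template <typename T> T pool<T>::storage[256];

template <typename T>
class short_ptr
{
public:
  // 'p' must point into pool<T>::storage.
  explicit short_ptr (T* p)
  : index_ (static_cast<std::uint16_t> (p - pool<T>::storage)) { }

  T& operator * () const { return pool<T>::storage[index_]; }
  T* get () const { return pool<T>::storage + index_; }

private:
  std::uint16_t index_;   // 2 bytes instead of a full-width pointer
};
```

The base address is implied by the type, so only the 16-bit index is stored.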

Cheers,
Oleg




Re: RFA: crc builtin functions & optimizations

2022-03-14 Thread Oleg Endo
On Mon, 2022-03-14 at 18:04 -0700, Andrew Pinski via Gcc-patches wrote:
> On Mon, Mar 14, 2022 at 5:33 PM Joern Rennecke
>  wrote:
> > 
> > Most microprocessors have efficient ways to perform CRC operations, be
> > that with lookup tables, rotates, or even special instructions.
> > However, because we lack a representation for CRC in the compiler, we
> > can't do proper instruction selection.  With this patch I seek out to
> > rectify this,
> > I've avoided using a mode name for the built-in functions because that
> > would tie the semantics to the size of the addressable unit.  We
> > generally use abbreviations like s/l/ll for type names, which is all
> > right when the type can be widened without changing semantics.  For
> > the data input, however, we also have to consider the shift count that
> > is tied to it.  That is why I used a number to designate the width of
> > the data input and shift.
> > 
> > For machine support, I made a start with 8 and 16 bit little-endian
> > CRC for RISCV using a
> > lookup table.  I am sure once we have the basic infrastructure in the
> > tree, we'll get more
> > contributions of suitable named patterns for various ports.
> 
> 
> A few points.
> There are at least 9 different polynomials for the CRC-8 in common use today.
> For CRC-32 there are 5 different polynomials used.
> You don't have a patch to invoke.texi adding the descriptions of the builtins.
> How is your polynom 3rd argument described? Is it similar to how it is
> done on the wiki for the CRC?
> Does it make sense to have to list the most common polynomials in the
> documentation?
> 
> Also I am sorry but micro-optimizing coremarks is just wrong. Maybe it
> is better to pick the CRC32 that is inside zip instead for a testcase
> and benchmarking against?
> Or even the CRC32C for iSCSI/ext4.
> 
> I see you also don't optimize the case where you have three other
> variants of polynomials that are reversed, reciprocal and reversed
> reciprocal.

In my own CRC library I've got ~30 'commonly used' CRC types, based on
the following generic definition:

template <
  // number of crc result bits (polynomial order in bits)
  unsigned int BitCount,

  // normal polynomial without the leading 1 bit.
  typename crc_impl_detail::select_int::type TruncPoly,

  // initial remainder
  typename crc_impl_detail::select_int::type InitRem = 0,

  // final xor value
  typename crc_impl_detail::select_int::type FinalXor = 0,

  // input data byte reflected before processing (LSB / MSB first)
  bool ReflectInput = false,

  // output CRC reflected before the xor
  bool ReflectRemainder = false >
class crc
{
...
};


and then it goes like ...

// CRC-1 (most hardware; also known as parity bit)
// x + 1
typedef crc < 1, 0x01 > crc_1;

// CRC-3
typedef crc < 3, 0x03, 0x07, 0x00, true, true> crc_3;

...

// CRC-32 (ISO 3309, ANSI X3.66, FIPS PUB 71, FED-STD-1003, ITU-T V.42,
// Ethernet, SATA, MPEG-2, Gzip, PKZIP, POSIX cksum, PNG, ZMODEM)
// x^32 + x^26 + x^23 + x^22 + x^16 + x^12 + x^11 + x^10 + x^8 + x^7 + x^5 +
// x^4 + x^2 + x + 1
typedef crc < 32, 0x04C11DB7, 0x, 0x, true, true > crc_32;

typedef crc < 32, 0x04C11DB7, 0x7FFF, 0x7FFF, false, false > crc_32_mpeg2;
typedef crc < 32, 0x04C11db7, 0x, 0x, false, false > crc_32_posix;

...


It then generates the lookup tables at compile time into .rodata for
the types that are used in the program, which is great for MCUs with
more flash/ROM than RAM.

Specific CRC types can be overridden if the system has a better way to
calculate the CRC, e.g. as hardware peripheral.

This being a library makes it relatively easy to tune and customize for
various systems.

How would that work together with your proposal?

Cheers,
Oleg



Re: [PATCH] sh-linux fix target cpu

2022-01-30 Thread Oleg Endo
On Fri, 2022-01-28 at 15:18 -0700, Jeff Law via Gcc-patches wrote:
> 
> On 1/12/2022 2:02 AM, Yoshinori Sato wrote:
> > sh-linux does not support any SH1 or little-endian SH2a.
> > Add exceptions for them.
> > 
> > gcc/ChangeLog:
> > 
> > * config/sh/t-linux (MULTILIB_EXCEPTIONS): Add m1, mb/m1 and m2a.
> Thanks.  Technically this is probably too late to make gcc-12 as we're 
> in stage4 (regression fixes only).  But it was posted during stage3 
> (general bugfixing) and is very very low risk.
> 
> I went ahead and committed it for you.
> 
> Thanks, and sorry for the delays.


Thanks, Jeff!

Cheers,
Oleg



Re: Test results for gccrs on Debian unstable sh4

2021-06-09 Thread Oleg Endo
On Wed, 2021-06-09 at 15:49 +0200, John Paul Adrian Glaubitz wrote:
> Hi!
> 
> Just built revision ff4715d79e2c17d270db8b94315aa6b574f48994 on Debian 
> unstable sh4
> (SuperH) on my Renesas SH-7785LCR evaluation board.
> 
> Testsuite passes without any issues, attaching the full log.
> 
> CC'ing two SuperH-related mailing list in case someone is interested.
> 
> Adrian
> === rust tests ===
> 
> Schedule of variations:
> unix
> 
> Running target unix
> Using /usr/share/dejagnu/baseboards/unix.exp as board description file for 
> target.
> Using /usr/share/dejagnu/config/unix.exp as generic interface file for target.
> Using /srv/glaubitz/gccrs/gcc/testsuite/config/default.exp as 
> tool-and-target-specific interface file.
> Running /srv/glaubitz/gccrs/gcc/testsuite/rust/compile/compile.exp ...
> Running /srv/glaubitz/gccrs/gcc/testsuite/rust/compile/torture/compile.exp ...
> Running /srv/glaubitz/gccrs/gcc/testsuite/rust/compile/xfail/xfail.exp ...
> Running /srv/glaubitz/gccrs/gcc/testsuite/rust/execute/torture/execute.exp ...
> 
> === rust Summary ===
> 
> # of expected passes            2415
> # of expected failures          15
> make[2]: Leaving directory '/srv/glaubitz/gccrs/build/gcc'
> make[1]: Leaving directory '/srv/glaubitz/gccrs/build/gcc'
> 


Nice! Thanks for your efforts, Adrian!

Cheers,
Oleg



Re: [PATCH 10/11] sh: Update unexpected empty split condition

2021-06-01 Thread Oleg Endo
On Wed, 2021-06-02 at 00:05 -0500, Kewen Lin wrote:
> gcc/ChangeLog:
> 
>   * config/sh/sh.md (doloop_end_split): Fix empty split condition.
> ---
>  gcc/config/sh/sh.md | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/config/sh/sh.md b/gcc/config/sh/sh.md
> index e3af9ae21c1..93ee7c9a7de 100644
> --- a/gcc/config/sh/sh.md
> +++ b/gcc/config/sh/sh.md
> @@ -6424,7 +6424,7 @@ (define_insn_and_split "doloop_end_split"
> (clobber (reg:SI T_REG))]
>"TARGET_SH2"
>"#"
> -  ""
> +  "&& 1"
>[(parallel [(set (reg:SI T_REG)
>  (eq:SI (match_dup 2) (const_int 1)))
> (set (match_dup 0) (plus:SI (match_dup 2) (const_int -1)))])

This is OK (obvious).

Cheers,
Oleg



Re: PING [PATCH] RX new builtin function

2020-08-10 Thread Oleg Endo
On Mon, 2020-08-10 at 13:51 +0300, Darius Galis wrote:
> 
> I've found the following patch 
> https://gcc.gnu.org/legacy-ml/gcc-patches/2018-11/msg00983.html, but it 
> is not in the latest sources.
> Could you please let me know why it was not added? I'm willing to do any 
> rework necessary in order for it to be accepted to the latest sources.

I think it'd be better to fix and/or improve the backend code so that
the compiler generates the bset instruction automatically.  Otherwise
this built-in is not very useful for user code.  Except for the use
case of atomic-or on byte memory location, as a "side effect".  But
that should be implemented as such -- as atomics.
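
In source terms the two use cases would look roughly like this.  It is a
sketch with made-up function names; whether a given compiler turns the first
one into a single bit-set instruction is exactly the backend question above.

```cpp
#include <atomic>
#include <cstdint>

// Plain read-modify-write on a byte.  A backend is free to turn this into
// a single bit-set instruction where the target has one.
inline void set_bit (volatile std::uint8_t* p, unsigned bit)
{
  *p = static_cast<std::uint8_t> (*p | (1u << bit));
}

// Atomic-or on a byte, expressed as an atomic operation.
// Returns the value before the update.
inline std::uint8_t atomic_set_bit (std::atomic<std::uint8_t>& v, unsigned bit)
{
  return v.fetch_or (static_cast<std::uint8_t> (1u << bit));
}
```

Neither form needs a dedicated built-in in user code.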

There are a couple of PRs that might be relevant/interesting

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93587
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83832
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81823

Cheers,
Oleg



Re: [committed] Fix latent bug due to peephole2 pass dropping REG_INC notes

2020-05-31 Thread Oleg Endo
Hi Jeff,

On Sun, 2020-05-31 at 11:20 -0600, Jeff Law via Gcc-patches wrote:
> 
> The peephole2 pass makes some attempt to update various notes, but that 
> doesn't
> include REG_INC notes.  While I could trivially fix this in the H8 port, I
> wouldn't be terribly surprised if the lack of a REG_INC note could cause 
> problems
> on other ports.  So I think it's best to fix in the peephole pass.
> 
> As it turns out we already have  a function (two copies even!) to scan an insn
> for auto-inc addressing modes and add an appropriate note.
> 
> This patch moves that code from reload & lra into rtlanal.c and calls it from 
> the
> peephole pass.

I ran into this issue a while ago.
See also config/sh/sh.c, function sh_check_add_incdec_notes.

Is it possible to somehow fold all that into a universal solution?

Cheers,
Oleg



Re: size of exception handling (Was: performance of exception handling)

2020-05-12 Thread Oleg Endo
On Tue, 2020-05-12 at 09:20 +0200, Freddie Chopin wrote:
> 
> I actually have to build my own toolchain instead of the one provided
> by ARM, because to really NOT use C++ exceptions, you have to recompile
> the whole libstdc++ with `-fno-exceptions -fno-rtti` (yes, I know they
> provide the "nano" libraries, but the options they used for newlib
> don't suit my needs - this is "too minimized"). If you pass these two
> flags during compilation and linking of your own application, this
> disables these features only in your code. As libstdc++ is compiled
> with exceptions and RTTI enabled, ...

IMHO this is a fundamental flaw in the whole concept of using
pre-compiled, pre-installed libraries somewhere in the toolchain, in
particular for this kind of cross-compilation scenario.  Like you say,
when we set "exceptions off" it usually means for the whole embedded
app, and the whole embedded app usually means all the OS and runtime
libraries and everything, not just the user code.

One option is to not use the pre-compiled toolchain libstdc++ but build
it from source (or use another c++ std lib of your choice), as part of
the whole project, with the desired project settings.


BTW, just to throw in my 2-cents into the "I'm using MCU" pool of
pain/joy ... in one of my projects I'm using STM32F051K6U6, 32 KB
flash, 8 KB RAM, running all C++ code with shared C++ RPC libraries to
communicate with other (bigger) devices.  Exceptions, RTTI, threads
have to be turned off and only the header-only things from the stdlib
can be used and no heap allocations.  Otherwise the thing doesn't fit. 
Don't feel like rewriting the whole thing either.  There are some
annoyances when turning off exceptions and RTTI which results in
increased code maintenance.  I'd definitely be good and highly
appreciated if there were any improvements in the area of exception
handling.

Cheers,
Oleg



Re: Spam, bounces and gcc list removal

2020-03-21 Thread Oleg Endo
On Sat, 2020-03-21 at 13:08 -0700, H.J. Lu via Gcc wrote:
> On Sat, Mar 21, 2020 at 12:40 PM Thomas Koenig via Gcc <
> gcc@gcc.gnu.org> wrote:
> > 
> > Hi,
> > 
> > > since the change to the new list management, there has been
> > > an uptick of spam getting through. Spam is bounced by my ISP,
> > > and this just resulted in a warning that there were too many
> > > bounces and that I would get removed from the list unless I
> > > confirmed it (which I then did).
> > 
> > This has now happened a second time, and this question
> 
> Same here.
> 

Same here.


Cheers,
Oleg



Re: Minor regression due to recent IRA changes

2020-03-07 Thread Oleg Endo
On Thu, 2020-03-05 at 08:51 -0700, Jeff Law wrote:
> 
> FWIW I've got an sh4/sh4eb bootstrap and regression test running with
> HONOR_REG_ALLOC_ORDER defined.  As Vlad mentioned, that may be a
> viable workaround.
> 

I've had a look at the good old CSiBE code size results and poked at
some of the cases.  Overall, it seems to help code quality when
HONOR_REG_ALLOC_ORDER is defined on SH.

sum:                3383449 -> 3379629     -3820 / -0.112903 %
avg:                                     -212.22 / -0.271573 %
max: flex-2.5.31     253514 -> 253718       +204 / +0.080469 %
min: bzip2-1.0.2      67202 -> 65938       -1264 / -1.880896 %


However, even with HONOR_REG_ALLOC_ORDER defined, the simple test case
from PR 81426 https://gcc.gnu.org/bugzilla/attachment.cgi?id=47159
fails to compile without -mlra (use options -m4 -matomic-model=soft-gusa on 
regular non-linux sh-elf cross compiler).

How about the bootstrap, Jeff?  Did it help anything?

Cheers,
Oleg




Re: Minor regression due to recent IRA changes

2020-02-29 Thread Oleg Endo
On Sat, 2020-02-29 at 12:35 -0700, Jeff Law wrote:
> 
> Yup.  That was roughly what I was thinking and roughly the worry I had with
> trying to squash out the quality regressions.  But it may ultimately be the
> only way to really resolve these issues.

Another idea would be to let RA see R0, but ignore all the R0
constraints.  Then try fixing up everything afterwards.  If R0 is
removed from the allocatable reg list, there will be one register less
for it to work with and I'd expect some code quality regressions.  But
in order to fix up all the R0 cases after the regular RA/reload, I
believe it will have to re-do a lot of (similar) work that has been
done by the regular RA already.  One thing that comes instantly to mind
are loops and the use of R0 as index/base register in memory addressing
... it just sounds like a lot of duplicate work in general.

> 
> DJ's work on the m32c IIRC might be useful if you do try to chase this stuff
> down.  Essentially there weren't really enough registers.  So he had the port
> pretend to have more than it really did, then had a post-reload pass to do the
> final allocation into the target's actual register file.
> 

AFAIK DJ did the same (or similar) thing for RL78.  IMHO that just
shows that one type of RA/reload does not fit all.  Perhaps it'd be
better to have the option of different RA/reload implementations, which
implement different strategies for different needs and priorities.

Anyway, on SH the R0 problem seems to go away with LRA for the most
part.  I don't know if anything has been put in LRA specifically to
address such cases, or it works by general definition of the design, or
it's just a mere coincidence.  If it's the latter case, I'm not sure
what to expect in the future.  Perhaps it will start breaking again if
changes for other targets are being made to LRA.

Cheers,
Oleg



Re: Minor regression due to recent IRA changes

2020-02-29 Thread Oleg Endo
On Sat, 2020-02-29 at 09:38 -0700, Jeff Law wrote:
> 
> It really would have just been a workaround for some of the R0 issues anyway. 
> I think at its core R0 on the SH probably needs to be treated more like a
> temporary rather than a general register.  But that's probably a huge change,
> both in terms of just getting it working right and in terms of addressing the
> code quality regressions that would introduce.
> 

I think one of the major issues is that R0 is a constraint in several
addressing modes for memory accesses.  I believe I once had the idea of
hiding R0 from RA ... then insert reg-reg copies (to load R0) after
RA/reload ... and then somehow do back propagation to get rid of the
reg-reg copies again.  Another idea was to run a pre-RA pass to pre-
allocate all R0 things.  But I think it's all just running in sqrt(1)
circles after all.

Cheers,
Oleg



Re: Minor regression due to recent IRA changes

2020-02-29 Thread Oleg Endo
On Sat, 2020-02-29 at 08:57 -0700, Jeff Law wrote:
> 
> > > It could open a can of worms.  Off the top of my head, R0 is used to
> > > hold the function return value, and R0:R1 to return structs with
> > > sizeof > 4 bytes.  So if DImode is allocated to R0, it implicitly uses
> > > R0:R1, AFAIR, doesn't it?  Would that kind of thing cause troubles?
> 
> It might.  We might have to move a pair or even a quad if you have modes that
> cover r0-r3. It may not be feasible in practice.  I was just thinking off the
> top of my head.
> 

Yeah, for instance 'double _Complex' will be returned in R0-R3 when
compiling for 'without FPU'.  How about adding a target hook or look-up 
table (default 1:1 mapping for other targets)?  Would that be an
option?

Cheers,
Oleg



Re: Minor regression due to recent IRA changes

2020-02-29 Thread Oleg Endo
On Sat, 2020-02-29 at 08:47 -0700, Jeff Law wrote:
> 
> It's almost certainly the case that the recent IRA changes are going to stress
> R0 more.  If I'm reading what Vlad did correctly, one of the tie-breakers it's
> using now is to choose the lowest numbered register when all else is equal.
> So R0 on SH is likely going to be more problematical.
> 
> I wonder if just reordering the regs on the SH (and adjusting the debug output
> to keep that working) would be enough to mitigate some of the R0 problems.

It could open a can of worms.  Off the top of my head, R0 is used to
hold the function return value, and R0:R1 to return structs with
sizeof > 4 bytes.  So if DImode is allocated to R0, it implicitly uses
R0:R1, AFAIR, doesn't it?  Would that kind of thing cause troubles?

Cheers,
Oleg



Re: Minor regression due to recent IRA changes

2020-02-29 Thread Oleg Endo
On Fri, 2020-02-28 at 13:24 -0700, Jeff Law wrote:
> This change:
> 
> > commit 3133bed5d0327e8a9cd0a601b7ecdb9de4fc825d
> > Author: Vladimir N. Makarov 
> > Date:   Sun Feb 23 16:20:05 2020 -0500
> > 
> > Changing cost propagation and ordering colorable bucket
> > heuristics for
> > PR93564.
> > 
> > 2020-02-23  Vladimir Makarov  
> > 
> > PR rtl-optimization/93564
> > * ira-color.c (struct update_cost_queue_elem): New
> > member start.
> > (queue_update_cost, get_next_update_cost): Add new arg
> > start.
> > (allocnos_conflict_p): New function.
> > (update_costs_from_allocno): Add new arg
> > conflict_cost_update_p.
> > Add checking conflicts with allocnos_conflict_p.
> > (update_costs_from_prefs, restore_costs_from_copies):
> > Adjust
> > update_costs_from_allocno calls.
> > (update_conflict_hard_regno_costs): Add checking
> > conflicts with
> > allocnos_conflict_p.  Adjust calls of queue_update_cost
> > and
> > get_next_update_cost.
> > (assign_hard_reg): Adjust calls of
> > queue_update_cost.  Add
> > debugging print.
> > (bucket_allocno_compare_func): Restore previous
> > version.
> > 
> 
> Is causing c-torture/compile/sync-1 to fail with an ICE on sh4eb
> (search for "Tests that now fail, but worked before"):
> 
> 
> http://3.14.90.209:8080/job/sh4eb-linux-gnu/lastFailedBuild/console
> 
> 
> In the .log we have:
> 
> > /home/gcc/gcc/gcc/testsuite/gcc.c-torture/compile/sync-1.c:253:1: error:
> > unable to find a register to spill in class 'R0_REGS'
> > /home/gcc/gcc/gcc/testsuite/gcc.c-torture/compile/sync-1.c:253:1: error:
> > this is the insn:
> > (insn 209 207 212 2 (parallel [
> > (set (subreg:SI (reg:HI 431) 0)
> > (unspec_volatile:SI [
> > (mem/v:HI (reg/f:SI 299) [-1  S2 A16])
> > (subreg:HI (reg:SI 6 r6 [orig:425 uc+-3 ] [425]) 2)
> > (reg:HI 5 r5 [orig:428 sc+-1 ] [428])
> > ] UNSPECV_CMPXCHG_1))
> > (set (mem/v:HI (reg/f:SI 299) [-1  S2 A16])
> > (unspec_volatile:HI [
> > (const_int 0 [0])
> > ] UNSPECV_CMPXCHG_2))
> > (set (reg:SI 147 t)
> > (unspec_volatile:SI [
> > (const_int 0 [0])
> > ] UNSPECV_CMPXCHG_3))
> > (clobber (scratch:SI))
> > (clobber (reg:SI 0 r0))
> > (clobber (reg:SI 1 r1))
> > ]) "/home/gcc/gcc/gcc/testsuite/gcc.c-torture/compile/sync-1.c":245:8 406 {atomic_compare_and_swaphi_soft_gusa}
> >  (expr_list:REG_DEAD (reg:HI 5 r5 [orig:428 sc+-1 ] [428])
> > (expr_list:REG_DEAD (reg:SI 6 r6 [orig:425 uc+-3 ] [425])
> > (expr_list:REG_DEAD (reg/f:SI 299)
> > (expr_list:REG_UNUSED (reg:HI 431)
> > (expr_list:REG_UNUSED (reg:SI 1 r1)
> > (expr_list:REG_UNUSED (reg:SI 0 r0)
> > (nil
> > 
> 
> You should be able to trigger it with a cross compiler at -O2 with
> the attached
> testcase.
> 
> This could well be a target issue.  I haven't tried to debug it.  If
> it's a
> target issue, I'm fully comfortable punting it to the SH folks for
> resolving.

The R0_REGS spill failure is a general problem, in particular with old
reload.  The atomic patterns tend to trigger it in one circumstance or
another.  The IRA change probably just stresses it more.  Perhaps it
will go away with -mlra.

However, LRA on SH still has its own issues, so it can't be generally
enabled by default yet, unfortunately.  See also some of the recent
posts in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93877

Cheers,
Oleg



Re: AW: [PATCH] m68k architecture: support ccmode + lra

2019-12-13 Thread Oleg Endo
On Fri, 2019-12-13 at 16:09 +0100, John Paul Adrian Glaubitz wrote:
> Hi!
> 
> On 12/13/19 4:06 PM, Oleg Endo wrote:
> > > What are the combine2 patches?
> > 
> > See the other thread that I've linked in my message.
> 
> I don't see any patch there.

You'd have to dig back through the discussion.
And I think there were a couple of versions.  Anyway, I don't think it
has made it into trunk yet.

> > 
> > Have you tried rebuilding debian on/for SH with -mlra enabled for
> > *everything*?  Do you have an easy way of doing that?  It would be
> > interesting to see how it goes.
> 
> Yes, that would be possible. We would have to enable -mlra in gcc by
> default and then trigger a rebuild for 10.000 source packages. But
> that would take a while to finish.
> 

Better start now then :)
No hurry though.

Cheers,
Oleg



Re: AW: [PATCH] m68k architecture: support ccmode + lra

2019-12-13 Thread Oleg Endo
On Fri, 2019-12-13 at 15:57 +0100, John Paul Adrian Glaubitz wrote:
> Hello Segher!
> 
> > With LRA, sh builds fine (with the combine2 patches).  I have no idea
> > if correct code is generated, but it doesn't ICE anymore.
> 
> What are the combine2 patches?

See the other thread that I've linked in my message.


>  And I would support switching SH to LRA as
> there are a few cases (Debian packages) where GCC fails with an internal
> compiler error which I reported to the GCC bugzilla.

Have you tried rebuilding debian on/for SH with -mlra enabled for
*everything*?  Do you have an easy way of doing that?  It would be
interesting to see how it goes.

Cheers,
Oleg





Re: AW: [PATCH] m68k architecture: support ccmode + lra

2019-12-13 Thread Oleg Endo
On Fri, 2019-12-13 at 08:09 -0600, Segher Boessenkool wrote:
> On Fri, Dec 13, 2019 at 10:06:20PM +0900, Oleg Endo wrote:
> > On Fri, 2019-12-13 at 05:03 -0600, Segher Boessenkool wrote:
> > > On Thu, Dec 12, 2019 at 09:32:27AM +, Richard Sandiford
> > > wrote:
> > > > I doubt it will be long before we deprecate
> > > > all targets that require old reload.)
> > > 
> > > Do we wait until GCC 12 (to remove old reload completely)?  If
> > > not, we
> > > should deprecate it now.
> > > 
> > 
> > Segher, could you please re-run your tests on SH with -mlra as
> > mentioned here?
> > https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00133.html
> > 
> > > I'm thinking of making -mlra the default on SH.
> 
> With LRA, sh builds fine (with the combine2 patches).  I have no idea
> if correct code is generated, but it doesn't ICE anymore.
> 

Great, thanks for checking.  I'll try to run some more tests.

Cheers,
Oleg



Re: AW: [PATCH] m68k architecture: support ccmode + lra

2019-12-13 Thread Oleg Endo
On Fri, 2019-12-13 at 05:03 -0600, Segher Boessenkool wrote:
> On Thu, Dec 12, 2019 at 09:32:27AM +, Richard Sandiford wrote:
> > I doubt it will be long before we deprecate
> > all targets that require old reload.)
> 
> Do we wait until GCC 12 (to remove old reload completely)?  If not, we
> should deprecate it now.
> 

Segher, could you please re-run your tests on SH with -mlra as
mentioned here?
https://gcc.gnu.org/ml/gcc-patches/2019-12/msg00133.html

I'm thinking of making -mlra the default on SH.

Cheers,
Oleg





Re: AW: [PATCH] m68k architecture: support ccmode + lra

2019-12-07 Thread Oleg Endo
On Tue, 2019-11-26 at 07:38 +0100, ste...@franke.ms wrote:
> > On 11/21/19 10:30 AM, ste...@franke.ms wrote:
> > > Hi there,
> > > 
> > > here is mc68k's patch to switch the m68k architecture over to ccmode and
> > > lra. See https://github.com/mc68kghost/gcc 68k-ccmode branch.
> > 
> > Bernd Schmidt posted a conversion of the m68k port to ccmode a couple
> > weeks before yours.  We've already ACK'd it for installing onto the trunk.
> > 
> > Jeff
> 
> To be honest:
> - 8 days is hardly "a couple weeks before"
> - ccmode is not the same as ccmode+lra
> 
> The paperwork for contributing to fsf is on the way and the repo at
> https://github.com/mc68kghost/gcc got an update. Tests are not yet at 100%
> (master branch fails too many tests) but it's closer to master branch now.
> The code is about 50% identical, a fair amount has swapped cmp/bcc, a few
> cases are a tad worse and some yield surprisingly better code.
> 

You can still submit patches for further improvements, like adding
support for LRA.  Now that the main CCmode conversion is on trunk and
has been confirmed and tested, it should be much easier for you to
pinpoint problems in your changes.

Cheers,
Oleg



Re: Add a new combine pass

2019-12-06 Thread Oleg Endo
On Fri, 2019-12-06 at 16:51 -0600, Segher Boessenkool wrote:
> On Wed, Dec 04, 2019 at 07:43:30PM +0900, Oleg Endo wrote:
> > On Tue, 2019-12-03 at 12:05 -0600, Segher Boessenkool wrote:
> > > > Hmm ... the R0 problem ... SH doesn't override class_likely_spilled
> > > > explicitly, but it's got a R0_REGS class with only one said reg in it. 
> > > > So the default impl of class_likely_spilled should do its thing.
> > > 
> > > Yes, good point.  So what happened here?
> > 
> > "Something, somewhere, went terribly wrong"...
> > 
> > insn 18 wants to do
> > 
> > mov.l @(r4,r6),r0
> > 
> > But it can't because the reg+reg address mode has a R0 constraint
> > itself.  So it needs to be changed to
> > 
> > mov   r4,r0
> > mov.l @(r0,r6),r0
> > 
> > And it can't handle that.  Or only sometimes?  Don't remember.
> > 
> > >   Is it just RA messing things
> > > up, unrelated to the new pass?
> > 
> > Yep, I think so.  The additional pass seems to create "tougher" code so
> > reload passes out earlier than usual.  We've had the same issue when
> > trying address mode selection optimization.  In fact that was one huge
> > showstopper.
> 
> So maybe you should have a define_insn_and_split that allows any two
> regs and replaces one by r0 if neither is (and a move to r0 before the
> load)?  Split after reload of course.
> 
> It may be admitting defeat, but it may even result in better code as
> well ;-)
> 

AFAIR I've tried that already and it was like running in circles,
i.e. it didn't help.  Perhaps it might work if R0_REGS was hidden from
RA altogether.  But that sounds like opening a whole other can of
worms.  Another idea I was entertaining was to do a custom RTL pass to
pre-allocate all R0 constraints before the real full RA.  But then the
whole reload stuff would still have the same issue as above.  So all
the wallpapering is just moot.  A proper fix of the actual problem
would be more appropriate.

Cheers,
Oleg



Re: Add a new combine pass

2019-12-04 Thread Oleg Endo
On Tue, 2019-12-03 at 12:05 -0600, Segher Boessenkool wrote:
> On Tue, Dec 03, 2019 at 10:33:48PM +0900, Oleg Endo wrote:
> > On Mon, 2019-11-25 at 16:47 -0600, Segher Boessenkool wrote:
> > > 
> > > > > - sh (that's sh4-linux):
> > > > > 
> > > > > /home/segher/src/kernel/net/ipv4/af_inet.c: In function 
> > > > > 'snmp_get_cpu_field':
> > > > > /home/segher/src/kernel/net/ipv4/af_inet.c:1638:1: error: unable to 
> > > > > find a register to spill in class 'R0_REGS'
> > > > >  1638 | }
> > > > >   | ^
> > > > > /home/segher/src/kernel/net/ipv4/af_inet.c:1638:1: error: this is the 
> > > > > insn:
> > > > > (insn 18 17 19 2 (set (reg:SI 0 r0)
> > > > > (mem:SI (plus:SI (reg:SI 4 r4 [178])
> > > > > (reg:SI 6 r6 [171])) [17 *_3+0 S4 A32])) 
> > > > > "/home/segher/src/kernel/net/ipv4/af_inet.c":1638:1 188 {movsi_i}
> > > > >  (expr_list:REG_DEAD (reg:SI 4 r4 [178])
> > > > > (expr_list:REG_DEAD (reg:SI 6 r6 [171])
> > > > > (nil
> > > > > /home/segher/src/kernel/net/ipv4/af_inet.c:1638: confused by earlier 
> > > > > errors, bailing out
> > > > 
> > > > Would have to look more at this one.  Seems odd that it can't allocate
> > > > R0 when it's already the destination and when R0 can't be live before
> > > > > the insn.  But there again, this is reload, so my enthusiasm for
> > > > > looking is a bit limited :-)
> > > 
> > > It wants to use r0 in some other insn, so it needs to spill it here, but
> > > cannot.  This is what class_likely_spilled is for.
> > 
> > Hmm ... the R0 problem ... SH doesn't override class_likely_spilled
> > explicitly, but it's got a R0_REGS class with only one said reg in it. 
> > So the default impl of class_likely_spilled should do its thing.
> 
> Yes, good point.  So what happened here?

"Something, somewhere, went terribly wrong"...

insn 18 wants to do

mov.l @(r4,r6),r0

But it can't because the reg+reg address mode has a R0 constraint
itself.  So it needs to be changed to

mov   r4,r0
mov.l @(r0,r6),r0

And it can't handle that.  Or only sometimes?  Don't remember.


>   Is it just RA messing things
> up, unrelated to the new pass?
> 

Yep, I think so.  The additional pass seems to create "tougher" code so
reload passes out earlier than usual.  We've had the same issue when
trying address mode selection optimization.  In fact that was one huge
showstopper.

Cheers,
Oleg



Re: Add a new combine pass

2019-12-03 Thread Oleg Endo
On Mon, 2019-11-25 at 16:47 -0600, Segher Boessenkool wrote:
> 
> > > - sh (that's sh4-linux):
> > > 
> > > /home/segher/src/kernel/net/ipv4/af_inet.c: In function 
> > > 'snmp_get_cpu_field':
> > > /home/segher/src/kernel/net/ipv4/af_inet.c:1638:1: error: unable to find 
> > > a register to spill in class 'R0_REGS'
> > >  1638 | }
> > >   | ^
> > > /home/segher/src/kernel/net/ipv4/af_inet.c:1638:1: error: this is the 
> > > insn:
> > > (insn 18 17 19 2 (set (reg:SI 0 r0)
> > > (mem:SI (plus:SI (reg:SI 4 r4 [178])
> > > (reg:SI 6 r6 [171])) [17 *_3+0 S4 A32])) 
> > > "/home/segher/src/kernel/net/ipv4/af_inet.c":1638:1 188 {movsi_i}
> > >  (expr_list:REG_DEAD (reg:SI 4 r4 [178])
> > > (expr_list:REG_DEAD (reg:SI 6 r6 [171])
> > > (nil
> > > /home/segher/src/kernel/net/ipv4/af_inet.c:1638: confused by earlier 
> > > errors, bailing out
> > 
> > Would have to look more at this one.  Seems odd that it can't allocate
> > R0 when it's already the destination and when R0 can't be live before
> > > the insn.  But there again, this is reload, so my enthusiasm for looking
> > is a bit limited :-)
> 
> It wants to use r0 in some other insn, so it needs to spill it here, but
> cannot.  This is what class_likely_spilled is for.
> 

Hmm ... the R0 problem ... SH doesn't override class_likely_spilled
explicitly, but it's got a R0_REGS class with only one said reg in it. 
So the default impl of class_likely_spilled should do its thing.

LRA is available on SH and often fixes the R0 problems -- but not
always.  Maybe it got better over time, haven't checked.

Could you re-run the SH build tests with -mlra, please?

Cheers,
Oleg




Re: [PATCH 2/4] The main m68k cc0 conversion

2019-11-23 Thread Oleg Endo
On Sat, 2019-11-23 at 10:36 -0700, Jeff Law wrote:
> 
> > Any news on this? I would be in favor of merging these patches as I
> > have
> > tested them successfully on Debian by building the gcc-snapshot
> > package
> > with the patches applied. I used all four patches plus the
> > additional one
> > required to be able to build the kernel.
> 
> Not really.  I've already indicated to Bernd that he should go ahead
> and
> commit the changes and we can iterate on any problems that arise.
> 
> > 
> > I did not see the bootstrap problems that Mikael reported and
> > Andreas has
> > reported that the issue is not reliably reproducible on Aranym. He
> > suspected
> > that it might be a bug in the MMU emulation in Aranym which only
> > triggers
> > randomly depending on the current memory layout.
> 
> Good to know it wasn't reproducible and might ultimately be a bug in
> aranym.
> 

Meanwhile, another patch that's supposed to do the same thing was
posted:

https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02131.html

Cheers,
Oleg



Re: [PATCH 0/4][MSP430] Tweaks to default configuration to reduce code size

2019-11-08 Thread Oleg Endo
On Fri, 2019-11-08 at 13:27 +, Jozef Lawrynowicz wrote:
> 
> Yes, I should have used -flto in my examples. But it doesn't help remove these
> CRT library functions which are normally either directly added to the
> list of functions to run before main (via .init, .ctors or .init_array) or 
> used
> in functions which are themselves added to this list.
> 
> The unnecessary functions we want to remove are:
>   deregister_tm_clones
>   register_tm_clones
>   __do_global_dtors_aux
>   frame_dummy
> LTO can't remove any of them.
> 

Ah, right, good point.  That's not MSP430 specific actually.  For those
things I usually have custom init code, which also does other things
occasionally.  Stripping off global dtors is then an option in the
build system which takes care of it (in my case, I do it by modifying
the generated linker script).

But again, as with the exceptions, it might be better to implement
these kinds of things outside of the compiler, e.g. by building the app
with -nostartfiles -nodefaultlibs and providing your own substitutes.

Another option is to patch those things in using the OS part of the
target triplet.

Cheers,
Oleg



Re: [PATCH 0/4][MSP430] Tweaks to default configuration to reduce code size

2019-11-08 Thread Oleg Endo
On Thu, 2019-11-07 at 21:31 +, Jozef Lawrynowicz wrote:
> When building small programs for MSP430, the impact of the unused
> functions pulled in from the CRT libraries is quite noticeable. Most of these
> relate to features that will never be used for MSP430 (Transactional memory,
> supporting shared objects and dynamic linking), or rarely used (exception
> handling).

There's a magic switch, which does the business, at least for me, most
of the time:

   -flto

If you're trying to bring down the executable size as much as possible,
but don't use -flto, I think something is wrong.

Cheers,
Oleg



Re: [PATCH 2/4] MSP430: Disable exception handling by default for C++

2019-11-07 Thread Oleg Endo
On Thu, 2019-11-07 at 21:37 +, Jozef Lawrynowicz wrote:
> The code size bloat added by building C++ programs using libraries containing
> support for exceptions is significant. When using simple constructs such as
> static variables, sometimes many kB from the libraries are unnecessarily
> pulled in.
> 
> So this patch disables exceptions by default for MSP430 when compiling for C++,
> by implicitly passing -fno-exceptions unless -fexceptions is passed.

It is extremely annoying when GCC's default standard behavior differs
across different targets.  And as a consequence, you have to add a load
of workarounds and disable other things, like fiddling with the
testsuite.  It's the same thing as setting "double = float" to get more
"speed" by default.

I would strongly advise against making such non-standard behaviors the
default in the vanilla compiler.  C++ normally has exceptions enabled.
If a user doesn't want them and is willing to deal with all the
consequences, then we already have a mechanism to do that:
 -fno-exceptions

Perhaps it's generally more useful to add a global configure option for
GCC to disable exception handling by default.  Then you can provide a
turn-key toolchain to your customers as well -- just add an option to
the configure line.

Cheers,
Oleg



Re: [PATCH][RFC] C++-style iterators for FOR_EACH_IMM_USE_STMT

2019-11-03 Thread Oleg Endo
On Wed, 2019-10-30 at 10:27 +0100, Richard Biener wrote:
> 
> Hmm, not sure - I'd like to write
> 
>  for (gimple *use_stmt : imm_stmt_uses (SSAVAR))
>for (use_operand_p use_p :  from 
> above>)
>  ...
> 
> I don't see how that's possible.  It would need to be "awkward" like
> 
>  for (auto it : imm_stmt_uses (SSAVAR))
>{
>  gimple *use_stmt = *it;
>  for (use_operand_p use_p : it)
>...
>}
> 
> so the first loop's iteration object is the actual iterator and you'd
> have to do extra indirection to get at the actual stmt you iterated
> to.
> 
> So I'd extend C++ (hah) to allow
> 
>   for (gimple *use_stmt : imm_stmt_uses (SSAVAR))
> for (use_operand_p use_p : auto)
>   ...
> 
> where 'auto' magically selects the next iterator object in scope
> [that matches].
> 
> ;)

Have you applied for a patent yet? :D

How about this one?

for (gimple* use_stmt : imm_stmt_uses (SSAVAR))
  for (use_operand_p use_p : imm_uses_on_stmt (*use_stmt))

... where helper function "imm_uses_on_stmt" returns a range object
that offers a begin and end function and its own iterator type.
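
To make that a bit more concrete, here is a minimal, self-contained
sketch of the idea.  It does not use GCC's real types: toy_stmt,
toy_use, iter_range, toy_stmt_uses and toy_uses_on_stmt are all made-up
stand-ins for illustration.  A real imm_stmt_uses / imm_uses_on_stmt
would have to walk the immediate-use lists instead of std::vectors.

```cpp
#include <cassert>
#include <vector>

// Toy stand-ins for illustration only, NOT GCC's gimple / use_operand_p.
struct toy_use { int operand_index; };
struct toy_stmt { int id; std::vector<toy_use> uses; };

// Anything that offers begin () and end () can be used in a
// range-based for loop.
template <typename It>
struct iter_range
{
  It first, last;
  It begin () const { return first; }
  It end () const { return last; }
};

// Hypothetical analogue of imm_stmt_uses (SSAVAR).
inline iter_range<std::vector<toy_stmt>::iterator>
toy_stmt_uses (std::vector<toy_stmt>& stmts)
{
  return { stmts.begin (), stmts.end () };
}

// Hypothetical analogue of imm_uses_on_stmt (*use_stmt).
inline iter_range<std::vector<toy_use>::iterator>
toy_uses_on_stmt (toy_stmt& stmt)
{
  return { stmt.uses.begin (), stmt.uses.end () };
}

// Two-level iteration: the inner range is derived from the outer
// loop variable, so no shared iterator object is needed.
inline int
count_uses (std::vector<toy_stmt>& stmts)
{
  int n = 0;
  for (toy_stmt& use_stmt : toy_stmt_uses (stmts))
    for (toy_use& use_p : toy_uses_on_stmt (use_stmt))
      {
        (void) use_p;
        ++n;
      }
  return n;
}
```

The point is only that the inner helper takes the outer loop variable
as its argument, which is what removes the need for the 'auto' magic.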


Another concept that could be interesting is filter iterators.

We used a simplistic re-implementation (c++03) to avoid dragging in
boost when working on AMS
https://github.com/erikvarga/gcc/blob/master/gcc/filter_iterator.h

Example uses are
https://github.com/erikvarga/gcc/blob/master/gcc/ams.h#L845
https://github.com/erikvarga/gcc/blob/master/gcc/ams.cc#L3715


I think there are also some places in RTL where filter iterators could
be used, e.g. "iterate over all MEMs in an RTL" could be made to look
something like this:

  for (auto&& i : filter_rtl (my_rtl_obj, MEM_P))
   ...
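
For readers who haven't seen one, a filter iterator can be sketched in
a few lines of C++11.  This is not the code from filter_iterator.h
linked above; filter_range, make_filter, is_even and sum_even are
hypothetical names, and the sketch only supports forward iteration
over a standard container.

```cpp
#include <cassert>
#include <iterator>
#include <vector>

// Minimal forward-only filter range.  Illustration only.
template <typename It, typename Pred>
class filter_range
{
public:
  class iterator
  {
  public:
    iterator (It cur, It last, const Pred* pred)
      : m_cur (cur), m_last (last), m_pred (pred)
    {
      skip ();
    }

    typename std::iterator_traits<It>::reference
    operator * () const { return *m_cur; }

    iterator& operator ++ () { ++m_cur; skip (); return *this; }

    bool operator != (const iterator& other) const
    {
      return m_cur != other.m_cur;
    }

  private:
    // Advance to the next element that satisfies the predicate.
    void skip ()
    {
      while (m_cur != m_last && !(*m_pred) (*m_cur))
        ++m_cur;
    }

    It m_cur, m_last;
    const Pred* m_pred;
  };

  filter_range (It first, It last, Pred pred)
    : m_first (first), m_last (last), m_pred (pred) { }

  iterator begin () const { return iterator (m_first, m_last, &m_pred); }
  iterator end () const { return iterator (m_last, m_last, &m_pred); }

private:
  It m_first, m_last;
  Pred m_pred;
};

template <typename C, typename Pred>
filter_range<typename C::const_iterator, Pred>
make_filter (const C& c, Pred pred)
{
  return filter_range<typename C::const_iterator, Pred>
    (c.begin (), c.end (), pred);
}

inline bool is_even (int x) { return x % 2 == 0; }

// Usage: visit only the elements accepted by the predicate.
inline int
sum_even (const std::vector<int>& v)
{
  int sum = 0;
  for (auto&& i : make_filter (v, is_even))
    sum += i;
  return sum;
}
```

A filter_rtl as in the example above would additionally need an
iterator that walks the sub-expressions of an RTL object, which is
left out here.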


Anyway, maybe it can plant some ideas.

Cheers,
Oleg



Re: [RFH][libgcc] fp-bit bit ordering (PR 78804)

2019-11-03 Thread Oleg Endo
On Fri, 2019-10-11 at 23:27 +0900, Oleg Endo wrote:
> On Thu, 2019-10-03 at 19:34 -0600, Jeff Law wrote:
> > 
> > So probably the most interesting target for this test is v850-elf
> > as
> > it's got a reasonably well functioning simulator, hard and soft FP
> > targets, little endian, and I'm familiar with its current set of
> > failures.
> > 
> > I can confirm that your patch makes no difference in the test
> > results
> > (which includes execution results).
> > 
> > In fact, there haven't been any problems on any target in my tester
> > that
> > I can tie back to this change.
> > 
> > At this point I'd say let's go for it.
> > 
> 
> Thanks, Jeff.  I'll commit it to trunk if there are no further
> objections some time next week.
> 

I've just committed it as r277752.

Personally I'd like to install it on GCC 8 and 9 branches as well.
Any thoughts on that?

Cheers,
Oleg




Re: [PATCH][RFC] C++-style iterators for FOR_EACH_IMM_USE_STMT

2019-10-29 Thread Oleg Endo
On Tue, 2019-10-29 at 11:26 +0100, Richard Biener wrote:
> While I converted other iterators requiring special BREAK_FROM_XYZ
> a few years ago FOR_EACH_IMM_USE_STMT is remaining.  I've pondered
> a bit but cannot arrive at a "nice" solution here with just one
> iterator as the macros happen to use.  For reference, the macro use
> is
> 
>   imm_use_iterator iter;
>   gimple *use_stmt;
>   FOR_EACH_IMM_USE_STMT (use_stmt, iter, name)
> {
>   use_operand_p use_p;
>   FOR_EACH_IMM_USE_ON_STMT (use_p, iter)
> ;
> }
> 
> which expands to (w/o macros)
> 
>imm_use_iterator iter; 
>    for (gimple *use_stmt = first_imm_use_stmt (&iter, name);
>         !end_imm_use_stmt_p (&iter);
>         use_stmt = next_imm_use_stmt (&iter))
>      for (use_operand_p use_p = first_imm_use_on_stmt (&iter);
>           !end_imm_use_on_stmt_p (&iter);
>           use_p = next_imm_use_on_stmt (&iter))
>        ;
> 
> and my foolish C++ attempt results in
> 
>  for (imm_use_stmt_iter it = SSAVAR; !it.end_p (); ++it)
>for (imm_use_stmt_iter::use_on_stmt it2 = it; !it2.end_p ();
> ++it2)
>  ;
> 
> with *it providing the gimple * USE_STMT and *it2 the use_operand_p.
> The complication here is to map the two-level iteration to "the C++
> way".
> Are there any STL examples mimicking this?  Of course with C++11 we
> could
> do
> 
>   for (imm_use_stmt_iter it = SSAVAR; !it.end_p (); ++it)
> for (auto it2 = it.first_use_on_stmt (); !it2.end_p (); ++it2)
>   ;
> 
> but that's not much nicer either.

Is there a way to structure it so that the iterators follow the
standard iterator concepts?  That would increase the chances of it
becoming nicer by utilizing range-based for loops.

Cheers,
Oleg



Re: [PATCH v2 1/2] RISC-V: Add shorten_memrefs pass

2019-10-26 Thread Oleg Endo
On Sat, 2019-10-26 at 14:04 -0600, Jeff Law wrote:
> On 10/26/19 1:33 PM, Andrew Waterman wrote:
> > I don't know enough to say whether the legitimize_address hook is
> > sufficient for the intended purpose, but I am sure that RISC-V's
> > concerns are not unique: other GCC targets have to cope with
> > offset-size constraints that are coupled to register-allocation
> > constraints.
> 
> > Yup.  I think every risc port in the 90s faced this problem.  I
> always
> wished for a generic mechanism for ports to handle this problem.
> 
> Regardless, it's probably worth investigating.
> 

What we tried to do with the address mode selection (AMS) optimization
some time ago was the following:

  - Extract memory accesses from the insns stream and put them in 
"access sequences".  Also analyze the address expression and try   
to find effective base addresses by tracing back address 
calculations.

  - For each memory access, get a set of address mode alternatives and 
the corresponding costs from the backend.  The full context of each
access is provided, so the backend can detect things like 
"in this access sequence, FP loads dominate" and use this 
information to tune the alternative costs.

  - Given the alternatives and costs for each memory access, the pass 
would then try to minimize the costs of the whole memory access
sequence, taking costs of address modification insns into account. 

I think this is quite generic, but of course also complex.  The
optimization problem itself is hard.  There was some research done by
others using CPLEX or PBQP solvers.  To keep things simple we used a 
backtracking algorithm and handled only a limited set of scenarios. 
For example, I think we could not get loop constructs to work nicely to
improve post-inc address mode utilization.
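
To illustrate the kind of search meant here, below is a toy
backtracking minimizer.  It is not the AMS implementation.  The cost
model is an assumption made up for the example: every access offers a
few alternatives, each alternative has an access cost and requires the
base register to hold a certain offset, and changing the base register
between two accesses costs a fixed reg_mod_cost.  All names
(alternative, reg_mod_cost, search, min_sequence_cost) are
hypothetical.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <vector>

struct alternative
{
  int access_cost;  // cost of the access itself with this address mode
  int base_offset;  // offset the base register must hold for this mode
};

// Assumed toy cost of one address modification insn.
const int reg_mod_cost = 2;

// Try every alternative for access IDX and recurse.  Partial solutions
// that already cost at least as much as the best complete one are
// dropped, which is valid because all costs are non-negative.
static void
search (const std::vector<std::vector<alternative> >& accesses,
        std::size_t idx, int cur_base, int cost_so_far, int& best)
{
  if (cost_so_far >= best)
    return;
  if (idx == accesses.size ())
    {
      best = cost_so_far;
      return;
    }
  for (const alternative& a : accesses[idx])
    {
      int mod = (a.base_offset == cur_base) ? 0 : reg_mod_cost;
      search (accesses, idx + 1, a.base_offset,
              cost_so_far + a.access_cost + mod, best);
    }
}

// Minimal total cost of the whole access sequence.
inline int
min_sequence_cost (const std::vector<std::vector<alternative> >& accesses,
                   int initial_base)
{
  int best = INT_MAX;
  search (accesses, 0, initial_base, 0, best);
  return best;
}
```

Even in this toy form the search is exponential in the length of the
access sequence, which is why only a limited set of scenarios was
practical.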

The SH ISA has very similar properties to ARM thumb and RVC, and
perhaps others.  Advantages would not be limited to RISC only, even
CISC ISAs like M68K, RX, ... can benefit from it, as the "proper
optimization" can reduce the instruction sizes by shortening the
addresses in the instruction stream.

If anyone is interested, here is the AMS optimization pass class:

https://github.com/erikvarga/gcc/blob/master/gcc/ams.h
https://github.com/erikvarga/gcc/blob/master/gcc/ams.cc

It's using a different style to callback into the backend code.  Not
GCC's "hooks" but a delegate pattern.  SH backend's delegate
implementation is here

https://github.com/erikvarga/gcc/blob/master/gcc/config/sh/sh.c#L11897

We were getting some improvements in the generated code, but it started
putting pressure on register allocation related issues in the SH
backend (the R0 problem), so we could not do more proper testing.

Maybe somebody can get some ideas out of it.

Cheers,
Oleg



Re: [PATCH v2 1/2] RISC-V: Add shorten_memrefs pass

2019-10-26 Thread Oleg Endo
On Sat, 2019-10-26 at 12:21 -0600, Jeff Law wrote:
> On 10/25/19 11:39 AM, Craig Blackmore wrote:
> > This patch aims to allow more load/store instructions to be
> > compressed by
> > replacing a load/store of 'base register + large offset' with a new
> > load/store
> > of 'new base + small offset'. If the new base gets stored in a
> > compressed
> > register, then the new load/store can be compressed. Since there is
> > an overhead
> > in creating the new base, this change is only attempted when 'base
> > register' is
> > referenced in at least 4 load/stores in a basic block.
> > 
> > The optimization is implemented in a new RISC-V specific pass
> > called
> > shorten_memrefs which is enabled for RVC targets. It has been
> > developed for the
> > 32-bit lw/sw instructions but could also be extended to 64-bit
> > ld/sd in future.
> > 
> > Tested on bare metal rv32i, rv32iac, rv32im, rv32imac, rv32imafc,
> > rv64imac,
> > rv64imafdc via QEMU. No regressions.
> > 
> > gcc/ChangeLog:
> > 
> > * config.gcc: Add riscv-shorten-memrefs.o to extra_objs for
> > riscv.
> > * config/riscv/riscv-passes.def: New file.
> > * config/riscv/riscv-protos.h (make_pass_shorten_memrefs):
> > Declare.
> > * config/riscv/riscv-shorten-memrefs.c: New file.
> > * config/riscv/riscv.c (tree-pass.h): New include.
> > (riscv_compressed_reg_p): New Function
> > (riscv_compressed_lw_offset_p): Likewise.
> > (riscv_compressed_lw_address_p): Likewise.
> > (riscv_shorten_lw_offset): Likewise.
> > (riscv_legitimize_address): Attempt to convert base +
> > large_offset
> > to compressible new_base + small_offset.
> > (riscv_address_cost): Make anticipated compressed load/stores
> > cheaper for code size than uncompressed load/stores.
> > (riscv_register_priority): Move compressed register check to
> > riscv_compressed_reg_p.
> > * config/riscv/riscv.h (RISCV_MAX_COMPRESSED_LW_OFFSET):
> > Define.
> > > * config/riscv/riscv.opt (mshorten-memrefs): New option.
> > * config/riscv/t-riscv (riscv-shorten-memrefs.o): New rule.
> > (PASSES_EXTRA): Add riscv-passes.def.
> > * doc/invoke.texi: Document -mshorten-memrefs.
> 
> > This has traditionally been done via the legitimize_address hook.
> > Is there some reason that hook is insufficient for this case?
> > 
> > The hook, IIRC, is called out of explow.c.
> 

This sounds like some of my addressing mode selection (AMS) attempts on
SH.  Haven't looked at the patch (sorry), but I'm sure the problem is
pretty much the same.

On SH legitimize_address is used to do ... "something" ... to the
address in order to make the displacement fit.  The issue is,
legitimize_address doesn't get any context so it can't even try to find
a locally optimal base address or something like that.
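
A small sketch of why the context matters.  It is not taken from the
RISC-V patch or from the SH backend; max_short_disp, rebased, rebase
and all_fit are hypothetical, and the displacement limit is just an
assumed number.  With all offsets of a sequence visible at once, one
new base can be chosen so that every access gets a short displacement.
A hook that sees one address at a time cannot make that choice.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Assumed limit: largest displacement a short load/store can encode.
const int max_short_disp = 124;

struct rebased
{
  int base_adjust;         // new_base = old_base + base_adjust
  std::vector<int> disps;  // displacements relative to new_base
};

// Use the smallest offset in the sequence as the new base, so that
// all remaining displacements are non-negative and as small as
// possible.
inline rebased
rebase (const std::vector<int>& offsets)
{
  rebased r;
  r.base_adjust = offsets.empty ()
    ? 0 : *std::min_element (offsets.begin (), offsets.end ());
  for (int off : offsets)
    r.disps.push_back (off - r.base_adjust);
  return r;
}

// True if every access can use the short encoding after rebasing.
inline bool
all_fit (const rebased& r)
{
  for (int d : r.disps)
    if (d < 0 || d > max_short_disp)
      return false;
  return true;
}
```

For offsets 1000, 1004, 1012 and 1100 this picks base + 1000 and all
four accesses fit.  For 1000 and 2000 no single base helps.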

Cheers,
Oleg



Re: [RFH][libgcc] fp-bit bit ordering (PR 78804)

2019-10-11 Thread Oleg Endo
On Thu, 2019-10-03 at 19:34 -0600, Jeff Law wrote:
> 
> So probably the most interesting target for this test is v850-elf as
> it's got a reasonably well functioning simulator, hard and soft FP
> targets, little endian, and I'm familiar with its current set of
> failures.
> 
> I can confirm that your patch makes no difference in the test results
> (which includes execution results).
> 
> In fact, there haven't been any problems on any target in my tester
> that
> I can tie back to this change.
> 
> At this point I'd say let's go for it.
> 

Thanks, Jeff.  I'll commit it to trunk if there are no further
objections some time next week.

It's a latent issue, not a regression.  OK for the open branches, too? 
Any opinions on that?

Cheers,
Oleg



Re: [RFH][libgcc] fp-bit bit ordering (PR 78804)

2019-10-11 Thread Oleg Endo
Hi Bernd,

On Sun, 2019-09-29 at 08:49 +, Bernd Edlinger wrote:
> 
> But I cannot tell if the bitfield access generates
> more efficient code or identical code than the
> original variant when no ms bitfields are used.
> That needs closer inspection of the generated
> assembler code, a simple bootstrap / regtest will
> probably not be sufficient.

Thanks for your comment.

The bitfields in this case are used for packing and unpacking the FP
values.  So I gave it a try on RX and SH as examples (with trunk
compiler).  I've extracted the unpack function for a "double" value and
compared the generated assembly code.  Please see the attached file.

The bitfield code on RX is equivalent.  On SH the bitfield code is
slightly worse.  It gets *a lot* worse when the struct is passed by
reference/pointer, then things get completely out of control.

Theoretically, if the bitfields and the shifts-and-ands extract/pack
the same bits from the underlying base integer value, the resulting
code should be the same in both cases (see RX).  But it depends on what
the backend does, or tries to do (see SH).

> But my thought is if the -mms-bitfields option has such
> an impact on this structure, then it would be good if there
> was a built-in define that can be used to adjust to and/or
> diagnose the problem at compile time.

I think it'd be better to add static_assert on the expected sizes of
the various float value structs.
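
A minimal sketch of that idea, separate from the attached file.  It
assumes a little-endian target with GCC-style bitfield layout and an
IEEE 754 64-bit double; bits_le, unpacked, unpack_bitfields and
unpack_shifts are names made up for this example.  If the bitfield
layout does not pack into 64 bits (as can happen with MS-style
layout), the static_assert stops the build instead of silently
producing wrong values.

```cpp
#include <cassert>
#include <cstring>

typedef unsigned long long fractype;

// Assumes little-endian, GCC-style bitfield allocation.
struct bits_le
{
  fractype fraction : 52;
  unsigned int exp : 11;
  unsigned int sign : 1;
};

// Diagnose an unexpected layout at compile time.
static_assert (sizeof (bits_le) == 8, "unexpected bitfield layout");
static_assert (sizeof (double) == 8, "expects a 64-bit double");

struct unpacked
{
  fractype fraction;
  int exp;
  int sign;
};

// Unpack via bitfields.
inline unpacked
unpack_bitfields (double d)
{
  bits_le b;
  std::memcpy (&b, &d, sizeof b);
  return { b.fraction, (int) b.exp, (int) b.sign };
}

// Unpack the same bits via shifts and masks.
inline unpacked
unpack_shifts (double d)
{
  fractype v;
  std::memcpy (&v, &d, sizeof v);
  return { v & ((((fractype) 1) << 52) - 1),
           (int) ((v >> 52) & ((1u << 11) - 1)),
           (int) (v >> 63) };
}
```

Both functions should return the same values on such a target.
Whether they also compile to the same code is up to the backend, as
the RX and SH results show.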


> 
> I think that is missing right now, but wouldn't it be nice to have
> a define like __MS_BITFIELD_LAYOUT__ ?

Honestly, I'm not a big fan of the whole MS bitfield thing.  On RX it
was put there a long time ago for compatibility with some other
compiler.  I'm not sure if it's relevant at all anymore.  So I don't
want to add any more to the pile ...

Cheers,
Oleg

typedef unsigned long long fractype;

struct unpacked_double
{
  fractype fraction;
  int exp;
  int sign;
};

struct  __attribute__ ((packed))
bits_little_endian 
{
  fractype fraction:52 __attribute__ ((packed));
  unsigned int exp:11 __attribute__ ((packed));
  unsigned int sign:1 __attribute__ ((packed));
};

static_assert (sizeof (bits_little_endian) == 8, "");
static_assert (sizeof (long long) == 8, "");


// 

unpacked_double
unpack_d_bitfields (bits_little_endian val)
{
  fractype fraction = val.fraction;
  int exp = val.exp;
  int sign = val.sign;

  return { fraction, exp, sign };
}

/*
RX -O2

__Z18unpack_d_bitfieldsRK18bits_little_endian:
	mov.L	r2, r4	 ; 4	[c=4 l=2]  *movsi_internal/4
	shlr	#20, r2, r3	 ; 9	[c=8 l=3]  lshrsi3/2
	add	#-16, r0	 ; 58	[c=4 l=3]  addsi3_internal/3
	mov.L	#0xf, r2	 ; 57	[c=4 l=5]  *movsi_internal/2
	and	r4, r2	 ; 43	[c=4 l=2]  andsi3/0
	and	#0x7ff, r3	 ; 44	[c=4 l=4]  andsi3/3
	shlr	#31, r4	 ; 45	[c=8 l=2]  lshrsi3/1
	rtsd	#16		 ; 61	[c=4 l=2]  deallocate_and_return

SH -O2

__Z18unpack_d_bitfieldsRK18bits_little_endian:
.LFB0:
	.cfi_startproc
	add	#-8,r15	! 85	[c=4 l=2]  *addsi3/0
	.cfi_def_cfa_offset 8
	mov	r2,r0	! 2	[c=4 l=2]  movsi_ie/1
	mov.l	.L3,r3	! 32	[c=10 l=2]  movsi_ie/0
	mov	#-21,r2	! 82	[c=4 l=2]  movsi_ie/2
	mov	r5,r1	! 81	[c=4 l=2]  movsi_ie/1
	add	r1,r1	! 16	[c=4 l=2]  ashlsi3_k/0
	shld	r2,r1	! 71	[c=4 l=2]  lshrsi3_d
	mov	r5,r2	! 83	[c=4 l=2]  movsi_ie/1
	shll	r2		! 72	[c=0 l=2]  shll
	mov.l	r4,@r0	! 44	[c=4 l=2]  movsi_ie/8
	movt	r2		! 73	[c=4 l=2]  movt
	mov.l	r1,@(8,r0)	! 35	[c=4 l=2]  movsi_ie/8
	and	r3,r5	! 33	[c=4 l=2]  *andsi_compact/3
	mov.l	r2,@(12,r0)	! 36	[c=4 l=2]  movsi_ie/8
	mov.l	r5,@(4,r0)	! 45	[c=4 l=2]  movsi_ie/8
	rts			! 91	[c=0 l=2]  *return_i
	add	#8,r15	! 90	[c=4 l=2]  *addsi3/0
	.cfi_endproc

*/

// 

unpacked_double
unpack_d_no_bitfields (long long val)
{
  fractype fraction = val & ((((fractype)1) << 52) - 1);
  int exp = ((int)(val >> 52)) & ((1 << 11) - 1);
  int sign = ((int)(val >> (52 + 11))) & 1;

  return { fraction, exp, sign };
}

/*
RX -O2
	mov.L	r2, r4	 ; 4	[c=4 l=2]  *movsi_internal/4
	shar	#20, r2, r3	 ; 13	[c=8 l=3]  ashrsi3/2
	add	#-16, r0	 ; 56	[c=4 l=3]  addsi3_internal/3
	mov.L	#0xf, r2	 ; 55	[c=4 l=5]  *movsi_internal/2
	and	r4, r2	 ; 42	[c=4 l=2]  andsi3/0
	and	#0x7ff, r3	 ; 43	[c=4 l=4]  andsi3/3
	shlr	#31, r4	 ; 44	[c=8 l=2]  lshrsi3/1
	rtsd	#16		 ; 59	[c=4 l=2]  deallocate_and_return

SH -O2
	mov	r2,r0	! 2	[c=4 l=2]  movsi_ie/1
	mov.l	.L6,r1	! 10	[c=10 l=2]  movsi_ie/0
	mov.l	r4,@r2	! 9	[c=4 l=2]  movsi_ie/8
	and	r5,r1	! 11	[c=4 l=2]  *andsi_compact/3
	mov.l	r1,@(4,r2)	! 12	[c=4 l=2]  movsi_ie/8
	mov	#-21,r2	! 69	[c=4 l=2]  movsi_ie/2
	mov	r5,r1	! 68	[c=4 l=2]  movsi_ie/1
	shll	r5		! 59	[c=0 l=2]  shll
	add	r1,r1	! 18	[c=4 l=2]  ashlsi3_k/0
	shld	r2,r1	! 58	[c=4 l=2]  lshrsi3_d
	movt	r5		! 60	[c=4 l=2]  movt
	mov.l	r1,@(8,r0)	! 20	[c=4 l=2]  movsi_ie/8
	rts			! 74	[c=0 l=2]  *return_i
	mov.l	r5,@(12,r0)	! 28	[c=4 l=2]  movsi_ie/8
.L7:
	.align 2
.L6:
	.long	1048575

*/
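
The claim that both variants extract the same bits can be checked on a
host machine.  The following is only a sketch: it assumes a little-endian
host and a GCC-compatible compiler that allocates bit-fields starting at
the least significant bit.  The names bits_le, unpack_bitfields,
unpack_shifts and same_result are illustrative.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

typedef std::uint64_t fractype;

struct unpacked { fractype fraction; int exp; int sign; };

struct __attribute__ ((packed)) bits_le
{
  fractype fraction : 52;
  unsigned int exp : 11;
  unsigned int sign : 1;
};

// Bit-field variant: view the raw 64 bits through the struct.
inline unpacked
unpack_bitfields (std::uint64_t raw)
{
  bits_le b;
  static_assert (sizeof (bits_le) == sizeof (std::uint64_t),
                 "layout must be 8 bytes");
  std::memcpy (&b, &raw, sizeof b);
  return { b.fraction, (int) b.exp, (int) b.sign };
}

// Shift-and-mask variant: independent of the bit-field layout.
inline unpacked
unpack_shifts (std::uint64_t raw)
{
  fractype fraction = raw & ((((fractype) 1) << 52) - 1);
  int exp = ((int) (raw >> 52)) & ((1 << 11) - 1);
  int sign = ((int) (raw >> (52 + 11))) & 1;
  return { fraction, exp, sign };
}

// True if both variants agree on the fields of D.
inline bool
same_result (double d)
{
  std::uint64_t raw;
  std::memcpy (&raw, &d, sizeof raw);
  unpacked a = unpack_bitfields (raw);
  unpacked b = unpack_shifts (raw);
  return a.fraction == b.fraction && a.exp == b.exp && a.sign == b.sign;
}
```

Agreement on the host says nothing about the quality of the code a given
backend generates, which is the point of the RX and SH listings above.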



Re: [SH][committed] Fix PR 88630

2019-10-11 Thread Oleg Endo
On Fri, 2019-10-11 at 00:36 +0900, Oleg Endo wrote:
> Hi,
> 
> When we did the refactoring of SH's FPSCR handling back in GCC 5, we
> missed one thing regarding ST-40, it seems.
> 
> The attached patch fixes the issue as confirmed on the real
> hardware. 
> Also tested on sh-sim with
> make -k check RUNTESTFLAGS="--target_board=sh-sim\{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb}"
> 
> 
> Committed to trunk, GCC 9, GCC 8 as r276809, r276825, r276837.

I forgot about GCC 7.  Committed there as well as r276877.

Cheers,
Oleg



[SH][committed] Fix PR 88630

2019-10-10 Thread Oleg Endo
Hi,

When we did the refactoring of SH's FPSCR handling back in GCC 5, we
missed one thing regarding ST-40, it seems.

The attached patch fixes the issue as confirmed on the real hardware. 
Also tested on sh-sim with
make -k check RUNTESTFLAGS="--target_board=sh-sim\{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb}"


Committed to trunk, GCC 9, GCC 8 as r276809, r276825, r276837.

Cheers,
Oleg

gcc/ChangeLog:
PR target/88630
* config/sh/sh.h (TARGET_FPU_SH4_300): New macro.
* config/sh/sh.c (sh_option_override): Enable fsca and fsrra insns
also for TARGET_FPU_SH4_300.
(sh_emit_mode_set): Check for TARGET_FPU_SH4_300 instead of
TARGET_SH4_300.
* config/sh/sh.md (toggle_pr): Add TARGET_FPU_SH4_300 condition.
(negsf2): Expand to either negsf2_fpscr or negsf2_no_fpscr.
(*negsf2_i): Split into ...
(negsf2_fpscr, negsf2_no_fpscr): ... these new patterns.
(abssf2): Expand to either abssf2_fpscr or abssf2_no_fpscr.
(*abssf2_i): Split into ...
(abssf2_fpscr, abssf2_no_fpscr): ... these new patterns.
(negdf2): Expand to either negdf2_fpscr or negdf2_no_fpscr.
(*negdf2_i): Split into ...
(negdf2_fpscr, negdf2_no_fpscr): ... these new patterns.
(absdf2): Expand to either absdf2_fpscr or absdf2_no_fpscr.
(*absdf2_i): Split into ...
(absdf2_fpscr, absdf2_no_fpscr): ... these new patterns.
Index: gcc/config/sh/sh.c
===
--- gcc/config/sh/sh.c	(revision 276411)
+++ gcc/config/sh/sh.c	(working copy)
@@ -958,11 +958,13 @@
   if (flag_unsafe_math_optimizations)
 {
   /* Enable fsca insn for SH4A if not otherwise specified by the user.  */
-  if (global_options_set.x_TARGET_FSCA == 0 && TARGET_SH4A_FP)
+  if (global_options_set.x_TARGET_FSCA == 0
+	  && (TARGET_SH4A_FP || TARGET_FPU_SH4_300))
 	TARGET_FSCA = 1;
 
   /* Enable fsrra insn for SH4A if not otherwise specified by the user.  */
-  if (global_options_set.x_TARGET_FSRRA == 0 && TARGET_SH4A_FP)
+  if (global_options_set.x_TARGET_FSRRA == 0
+	  && (TARGET_SH4A_FP || TARGET_FPU_SH4_300))
 	TARGET_FSRRA = 1;
 }
 
@@ -12490,7 +12492,7 @@
 sh_emit_mode_set (int entity ATTRIBUTE_UNUSED, int mode,
 		  int prev_mode, HARD_REG_SET regs_live ATTRIBUTE_UNUSED)
 {
-  if ((TARGET_SH4A_FP || TARGET_SH4_300)
+  if ((TARGET_SH4A_FP || TARGET_FPU_SH4_300)
   && prev_mode != FP_MODE_NONE && prev_mode != mode)
 {
   emit_insn (gen_toggle_pr ());
Index: gcc/config/sh/sh.h
===
--- gcc/config/sh/sh.h	(revision 276410)
+++ gcc/config/sh/sh.h	(working copy)
@@ -69,6 +69,8 @@
FPU is disabled (which makes it compatible with SH4al-dsp).  */
 #define TARGET_SH4A_FP (TARGET_SH4A && TARGET_FPU_ANY)
 
+/* True if the FPU is a SH4-300 variant.  */
+#define TARGET_FPU_SH4_300 (TARGET_FPU_ANY && TARGET_SH4_300)
 
 /* This is not used by the SH2E calling convention  */
 #define TARGET_VARARGS_PRETEND_ARGS(FUN_DECL) \
Index: gcc/config/sh/sh.md
===
--- gcc/config/sh/sh.md	(revision 276410)
+++ gcc/config/sh/sh.md	(working copy)
@@ -9163,7 +9163,7 @@
 	(xor:SI (reg:SI FPSCR_REG) (const_int FPSCR_PR)))
(set (reg:SI FPSCR_MODES_REG)
 	(unspec_volatile:SI [(const_int 0)] UNSPECV_FPSCR_MODES))]
-  "TARGET_SH4A_FP"
+  "TARGET_SH4A_FP || TARGET_FPU_SH4_300"
   "fpchg"
   [(set_attr "type" "fpscr_toggle")])
 
@@ -9391,15 +9391,31 @@
 (define_expand "negsf2"
   [(set (match_operand:SF 0 "fp_arith_reg_operand")
 	(neg:SF (match_operand:SF 1 "fp_arith_reg_operand")))]
-  "TARGET_SH2E")
+  "TARGET_FPU_ANY"
+{
+  if (TARGET_FPU_SH4_300)
+emit_insn (gen_negsf2_fpscr (operands[0], operands[1]));
+  else
+emit_insn (gen_negsf2_no_fpscr (operands[0], operands[1]));
+  DONE;
+})
 
-(define_insn "*negsf2_i"
+(define_insn "negsf2_no_fpscr"
   [(set (match_operand:SF 0 "fp_arith_reg_operand" "=f")
 	(neg:SF (match_operand:SF 1 "fp_arith_reg_operand" "0")))]
-  "TARGET_SH2E"
+  "TARGET_FPU_ANY && !TARGET_FPU_SH4_300"
   "fneg	%0"
   [(set_attr "type" "fmove")])
 
+(define_insn "negsf2_fpscr"
+  [(set (match_operand:SF 0 "fp_arith_reg_operand" "=f")
+	(neg:SF (match_operand:SF 1 "fp_arith_reg_operand" "0")))
+   (use (reg:SI FPSCR_MODES_REG))]
+  "TARGET_FPU_SH4_300"
+  "fneg	%0"
+  [(set_attr "type" "fmove")
+   (set_attr "fp_mode" "single")])
+
 (define_expand "sqrtsf2"
   [(set (match_operand:SF 0 "fp_arith_reg_operand" "")
 	(sqrt:SF (match_operand:SF 1 "fp_arith_reg_operand" "")))]
@@ -9489,15 +9505,31 @@
 (define_expand "abssf2"
   [(set (match_operand:SF 0 "fp_arith_reg_operand")
 	(abs:SF (match_operand:SF 1 "fp_arith_reg_operand")))]
-  "TARGET_SH2E")
+  "TARGET_FPU_ANY"
+{
+  if (TARGET_FPU_SH4_300)
+emit_insn (gen_abssf2_fpscr (operands[0], operands[1]));
+  else
+emit_insn (gen_abssf2_no_fpscr (operands[0], 

Re: [RFH][libgcc] fp-bit bit ordering (PR 78804)

2019-10-02 Thread Oleg Endo
On Tue, 2019-10-01 at 14:21 -0600, Jeff Law wrote:
> 
> So the ask is to just test this on some LE targets?  I can do that :-)
> 
> I'll throw it in.  Analysis will be slightly more difficult than
> usual
> as we've got some fallout from Richard S's work, but it's certainly
> do-able.

Thanks a lot!


> ps.  And yes, I've got a request to the build farm folks to get a
> jenkins instance on the build farm.  Once that's in place I can have
> my tester start publishing results that everyone can see.

Sounds great.  Would it be possible for other people to give the auto
tester patches for testing and get the results back from it?  Or
something like that?

Cheers,
Oleg



[SH][committed] Fix PR 88562

2019-10-01 Thread Oleg Endo
Hi,

The attached patch fixes PR 88562.

Tested on trunk with
   make -k check RUNTESTFLAGS="--target_board=sh-sim\{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb}"

Committed to trunk, GCC 9, GCC 8, GCC 7 as r276411, r276412, r276413, r276414.

Cheers,
Oleg


gcc/ChangeLog:
PR target/88562
* config/sh/sh.c (sh_extending_set_of_reg::use_as_extended_reg): Use
sh_check_add_incdec_notes to preserve REG_INC notes when replacing
a memory access insn.
Index: gcc/config/sh/sh.c
===
--- gcc/config/sh/sh.c	(revision 276264)
+++ gcc/config/sh/sh.c	(working copy)
@@ -12068,9 +12068,11 @@
 	rtx r = gen_reg_rtx (SImode);
 	rtx_insn* i0;
 	if (from_mode == QImode)
-	  i0 = emit_insn_after (gen_extendqisi2 (r, set_src), insn);
+	  i0 = sh_check_add_incdec_notes (
+			emit_insn_after (gen_extendqisi2 (r, set_src), insn));
 	else if (from_mode == HImode)
-	  i0 = emit_insn_after (gen_extendhisi2 (r, set_src), insn);
+	  i0 = sh_check_add_incdec_notes (
+			emit_insn_after (gen_extendhisi2 (r, set_src), insn));
 	else
 	  gcc_unreachable ();
 


[RFH][libgcc] fp-bit bit ordering (PR 78804)

2019-09-28 Thread Oleg Endo
Hi,

I've been dragging this patch along with me for a while.
At the moment, I don't have the resources to fully test it as requested
by Ian in the PR discussion.

So I would like to ask for general comments on this one and hope that
folks with bigger automated test setups can run the patch through their
machinery for little endian targets.


Summary of the story:

I've noticed this issue on the RX on GCC 6, but it seems it's been
there forever.

On RX, fp-bit is used for software floating point emulation.  The RX
target also uses "MS bit-field" layout by default.  This means that
code like

struct
{
  fractype fraction:FRACBITS __attribute__ ((packed));
  unsigned int exp:EXPBITS __attribute__ ((packed));
  unsigned int sign:1 __attribute__ ((packed));
} bits;

will result in sizeof (bits) != 8

For some reason, this bit-field style declaration is used only for
FLOAT_BIT_ORDER_MISMATCH, which generally seems to be set for little
endian targets.  In other cases (i.e. big endian) open coded bit field
extraction and packing is used on the base integer type, like

 fraction = src->value_raw & ((((fractype)1) << FRACBITS) - 1);
 exp = ((int)(src->value_raw >> FRACBITS)) & ((1 << EXPBITS) - 1);
 sign = ((int)(src->value_raw >> (FRACBITS + EXPBITS))) & 1;

This works of course regardless of the bit-field packing layout of the
target.
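
As an illustration of the open-coded approach, here is a sketch of
packing and unpacking the raw bits of a double with shifts and masks
only.  FRACBITS and EXPBITS are given the values for double; the helper
names pack_raw, raw_fraction, raw_exp and raw_sign are made up and do not
appear in fp-bit.c.

```cpp
#include <cstdint>

typedef std::uint64_t fractype;
constexpr int FRACBITS = 52;   // fraction bits of a double
constexpr int EXPBITS = 11;    // exponent bits of a double

// Assemble the raw bit pattern from fraction, biased exponent and sign.
constexpr fractype
pack_raw (fractype fraction, int exp, int sign)
{
  return (fraction & ((((fractype) 1) << FRACBITS) - 1))
         | (((fractype) (exp & ((1 << EXPBITS) - 1))) << FRACBITS)
         | (((fractype) (sign & 1)) << (FRACBITS + EXPBITS));
}

// Extract the individual fields again.
constexpr fractype
raw_fraction (fractype raw)
{
  return raw & ((((fractype) 1) << FRACBITS) - 1);
}

constexpr int
raw_exp (fractype raw)
{
  return ((int) (raw >> FRACBITS)) & ((1 << EXPBITS) - 1);
}

constexpr int
raw_sign (fractype raw)
{
  return ((int) (raw >> (FRACBITS + EXPBITS))) & 1;
}

// 1.0 is sign 0, biased exponent 1023, fraction 0.
static_assert (pack_raw (0, 1023, 0) == 0x3FF0000000000000ULL, "1.0");
// -2.0 is sign 1, biased exponent 1024, fraction 0.
static_assert (pack_raw (0, 1024, 1) == 0xC000000000000000ULL, "-2.0");
```

Since only integer arithmetic on the base type is involved, the result
does not depend on how the target lays out bit-fields.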

Joseph suggested packing the 'bits' struct, which would fix the issue.
https://gcc.gnu.org/ml/gcc-bugs/2017-08/msg01651.html

However, I would like to propose to remove the special case of
FLOAT_BIT_ORDER_MISMATCH altogether as in the attached patch.

Any comments?

Cheers,
Oleg



libgcc/ChangeLog

PR libgcc/78804
* fp-bit.h: Remove FLOAT_BIT_ORDER_MISMATCH.
* fp-bit.c (pack_d, unpack_d): Remove special cases for 
FLOAT_BIT_ORDER_MISMATCH.
* config/arc/t-arc: Remove FLOAT_BIT_ORDER_MISMATCH.
Index: libgcc/config/arc/t-arc
===
--- libgcc/config/arc/t-arc	(revision 251045)
+++ libgcc/config/arc/t-arc	(working copy)
@@ -46,7 +46,6 @@
 
 dp-bit.c: $(srcdir)/fp-bit.c
 	echo '#ifndef __big_endian__' > dp-bit.c
-	echo '#define FLOAT_BIT_ORDER_MISMATCH' >> dp-bit.c
 	echo '#endif' >> dp-bit.c
 	echo '#include "fp-bit.h"' >> dp-bit.c
 	echo '#include "config/arc/dp-hack.h"' >> dp-bit.c
@@ -55,7 +54,6 @@
 fp-bit.c: $(srcdir)/fp-bit.c
 	echo '#define FLOAT' > fp-bit.c
 	echo '#ifndef __big_endian__' >> fp-bit.c
-	echo '#define FLOAT_BIT_ORDER_MISMATCH' >> fp-bit.c
 	echo '#endif' >> fp-bit.c
 	echo '#include "config/arc/fp-hack.h"' >> fp-bit.c
 	cat $(srcdir)/fp-bit.c >> fp-bit.c
Index: libgcc/fp-bit.c
===
--- libgcc/fp-bit.c	(revision 251045)
+++ libgcc/fp-bit.c	(working copy)
@@ -316,12 +316,7 @@
   /* We previously used bitfields to store the number, but this doesn't
  handle little/big endian systems conveniently, so use shifts and
  masks */
-#ifdef FLOAT_BIT_ORDER_MISMATCH
-  dst.bits.fraction = fraction;
-  dst.bits.exp = exp;
-  dst.bits.sign = sign;
-#else
-# if defined TFLOAT && defined HALFFRACBITS
+#if defined TFLOAT && defined HALFFRACBITS
  {
halffractype high, low, unity;
int lowsign, lowexp;
@@ -394,11 +389,10 @@
  }
dst.value_raw = ((fractype) high << HALFSHIFT) | low;
  }
-# else
+#else
   dst.value_raw = fraction & ((((fractype)1) << FRACBITS) - (fractype)1);
   dst.value_raw |= ((fractype) (exp & ((1 << EXPBITS) - 1))) << FRACBITS;
   dst.value_raw |= ((fractype) (sign & 1)) << (FRACBITS | EXPBITS);
-# endif
 #endif
 
 #if defined(FLOAT_WORD_ORDER_MISMATCH) && !defined(FLOAT)
@@ -450,12 +444,7 @@
   src = 
 #endif
   
-#ifdef FLOAT_BIT_ORDER_MISMATCH
-  fraction = src->bits.fraction;
-  exp = src->bits.exp;
-  sign = src->bits.sign;
-#else
-# if defined TFLOAT && defined HALFFRACBITS
+#if defined TFLOAT && defined HALFFRACBITS
  {
halffractype high, low;

@@ -498,11 +487,10 @@
 	 }
  }
  }
-# else
+#else
   fraction = src->value_raw & ((((fractype)1) << FRACBITS) - 1);
   exp = ((int)(src->value_raw >> FRACBITS)) & ((1 << EXPBITS) - 1);
   sign = ((int)(src->value_raw >> (FRACBITS + EXPBITS))) & 1;
-# endif
 #endif
 
   dst->sign = sign;
Index: libgcc/fp-bit.h
===
--- libgcc/fp-bit.h	(revision 251045)
+++ libgcc/fp-bit.h	(working copy)
@@ -128,10 +128,6 @@
 #define NO_DI_MODE
 #endif
 
-#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
-#define FLOAT_BIT_ORDER_MISMATCH
-#endif
-
 #if __BYTE_ORDER__ != __FLOAT_WORD_ORDER__
 #define FLOAT_WORD_ORDER_MISMATCH
 #endif
@@ -354,16 +350,6 @@
 # endif
 #endif
 
-#ifdef FLOAT_BIT_ORDER_MISMATCH
-  struct
-{
-  fractype fraction:FRACBITS __attribute__ ((packed));
-  unsigned int exp:EXPBITS __attribute__ ((packed));
-  unsigned int sign:1 __attribute__ ((packed));
-}
-  bits;
-#endif
-
 #ifdef _DEBUG_BITFLOAT
   struct
 {


Re: [PATCH v2] libitm: sh: avoid absolute relocation in shared library (PR 86712)

2019-09-28 Thread Oleg Endo
On Sat, 2018-08-04 at 18:00 +0900, Oleg Endo wrote:
> On Fri, 2018-08-03 at 14:54 -0600, Jeff Law wrote:
> > On 07/28/2018 07:04 AM, slyfox.inbox.ru via gcc-patches wrote:
> > > 
> > > From: Sergei Trofimovich 
> > > 
> > > Cc: Andreas Schwab 
> > > Cc: Torvald Riegel 
> > > Cc: Alexandre Oliva 
> > > Cc: Oleg Endo 
> > > Cc: Kaz Kojima 
> > > Signed-off-by: Sergei Trofimovich 
> > > ---
> > >  libitm/config/sh/sjlj.S | 4 ++--
> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/libitm/config/sh/sjlj.S b/libitm/config/sh/sjlj.S
> > > index 043f36749be..f265ab8f898 100644
> > > --- a/libitm/config/sh/sjlj.S
> > > +++ b/libitm/config/sh/sjlj.S
> > > @@ -53,7 +53,7 @@ _ITM_beginTransaction:
> > >  #else
> > >   cfi_def_cfa_offset (4*10)
> > >  #endif
> > > -#if defined HAVE_ATTRIBUTE_VISIBILITY || !defined __PIC__
> > > +#if !defined __PIC__
> > >   mov.l   .Lbegin, r1
> > >   jsr @r1
> > >    mov r15, r5
> > > @@ -78,7 +78,7 @@ _ITM_beginTransaction:
> > >  
> > >   .align  2
> > >  .Lbegin:
> > > -#if defined HAVE_ATTRIBUTE_VISIBILITY || !defined __PIC__
> > > +#if !defined __PIC__
> > >   .long   GTM_begin_transaction
> > >  #else
> > >   .long   GTM_begin_transaction@PCREL-(.Lbegin0-.)
> > > 
> > 
> > Thanks.  I installed this version.
> > 
> 
> Thanks Jeff.
> If there are no objections, I'll backport it to the 7 and 8 branches.
> 
> Cheers,
> Oleg


Finally committed to GCC 8 as r276246 and to GCC 7 as r276247.

Cheers,
Oleg




[SH][committed] Fix PR 86805

2019-09-28 Thread Oleg Endo
Hi,

This also sets TARGET_HAVE_SPECULATION_SAFE_VALUE to
speculation_safe_value_not_needed for SH.

Tested with "make all-gcc".

Committed on trunk as r276244 and on GCC 9 as r276245.

Cheers,
Oleg

gcc/ChangeLog

PR target/86805
* config/sh/sh.c (TARGET_HAVE_SPECULATION_SAFE_VALUE): Define.
Index: gcc/config/sh/sh.c
===
--- gcc/config/sh/sh.c	(revision 276243)
+++ gcc/config/sh/sh.c	(working copy)
@@ -661,6 +661,9 @@
 #undef TARGET_CONSTANT_ALIGNMENT
 #define TARGET_CONSTANT_ALIGNMENT constant_alignment_word_strings
 
+#undef  TARGET_HAVE_SPECULATION_SAFE_VALUE
+#define TARGET_HAVE_SPECULATION_SAFE_VALUE speculation_safe_value_not_needed
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 


[SH][committed] Fix PR 80672

2019-09-28 Thread Oleg Endo
Hi,

The attached patch fixes PR 80672.
Tested by building the compiler with "make all-gcc" and manually
invoking it and checking that the option is parsed as expected.

Committed to trunk as r276240, GCC 9 as r276241, GCC 8 as r276242, GCC
7 as r276243.

Cheers,
Oleg


gcc/ChangeLog
PR target/80672
* config/sh/sh.c (parse_validate_atomic_model_option): Use
std::string::compare instead of std::string::find.
Index: gcc/config/sh/sh.c
===
--- gcc/config/sh/sh.c	(revision 276235)
+++ gcc/config/sh/sh.c	(working copy)
@@ -734,7 +734,7 @@
 {
   if (tokens[i] == "strict")
 	ret.strict = true;
-  else if (tokens[i].find ("gbr-offset=") == 0)
+  else if (!tokens[i].compare (0, strlen ("gbr-offset="), "gbr-offset="))
 	{
 	  std::string offset_str = tokens[i].substr (strlen ("gbr-offset="));
 	  ret.tcb_gbr_offset = integral_argument (offset_str.c_str ());
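
Both forms in the hunk above test for a prefix.  The difference is that
compare only inspects the first strlen ("gbr-offset=") characters, while
find may scan the whole string before it can report that the prefix is
absent.  A standalone sketch of the idiom follows; the helpers has_prefix
and gbr_offset_value are hypothetical and not part of sh.c.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// True if S begins with PREFIX.  If S is shorter than PREFIX the
// compared substring is shorter too, so the result is false.
inline bool
has_prefix (const std::string& s, const char* prefix)
{
  return s.compare (0, std::strlen (prefix), prefix) == 0;
}

// Return the text following "gbr-offset=" in TOKEN.
inline std::string
gbr_offset_value (const std::string& token)
{
  assert (has_prefix (token, "gbr-offset="));
  return token.substr (std::strlen ("gbr-offset="));
}
```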


Re: [committed] [PR target/85993] Remove dead conditional in SH target code

2019-09-28 Thread Oleg Endo
On Sun, 2018-07-15 at 14:30 -0600, Jeff Law wrote:
> 
> Per Oleg's comment in the PR, the second block is dead and should be
> removed...
> 
> Committing to the trunk.   While I'm confident this won't change
> anything, my tester will bootstrap sh4 & sh4eb overnight for
> additional
> verification.
> 

Probably irrelevant, but just for the record ...

I've just backported this patch to GCC 7 and GCC 8 branches as r276237
and r276239.

Tested briefly with "make all-gcc".

Cheers,
Oleg



Re: [1/9] Simplify the implementation of HARD_REG_SET

2019-09-10 Thread Oleg Endo
On Mon, 2019-09-09 at 19:05 +0100, Richard Sandiford wrote:
> 
> Yeah.  I might come back to this later and look at a fuller
> transition
> to C++ (or at least to try to get rid of CLEAR_HARD_REG_SET).
> 

Maybe you can just typedef it to std::bitset ;)
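
To make the suggestion concrete, a rough sketch of what such a typedef
could look like.  The register count and the names hard_reg_set and
regs_from_to are placeholders; the real number of hard registers is
target specific, and this is not how GCC defines HARD_REG_SET.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Placeholder register count, chosen only for this example.
constexpr std::size_t NUM_HARD_REGS = 64;

typedef std::bitset<NUM_HARD_REGS> hard_reg_set;

// Build the set of registers FIRST..LAST (inclusive).
inline hard_reg_set
regs_from_to (std::size_t first, std::size_t last)
{
  hard_reg_set s;                 // default constructed: all bits clear
  for (std::size_t r = first; r <= last; ++r)
    s.set (r);
  return s;
}
```

With a type like this, s.reset () would take the place of a macro such
as CLEAR_HARD_REG_SET, and the set operations become the usual &, | and
~ operators.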

Cheers,
Oleg



Re: [PATCH 26/30] Changes to sh

2019-06-28 Thread Oleg Endo
On Tue, 2019-06-25 at 15:22 -0500, acsaw...@linux.ibm.com wrote:
> From: Aaron Sawdey 
> 
>   * config/sh/sh.md (movmemsi): Change name to cpymemsi.
> ---
>  gcc/config/sh/sh.md | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gcc/config/sh/sh.md b/gcc/config/sh/sh.md
> index 8354377..ed70e34 100644
> --- a/gcc/config/sh/sh.md
> +++ b/gcc/config/sh/sh.md
> @@ -8906,7 +8906,7 @@
>  
>  ;; String/block move insn.
>  
> -(define_expand "movmemsi"
> +(define_expand "cpymemsi"
>[(parallel [(set (mem:BLK (match_operand:BLK 0))
>  (mem:BLK (match_operand:BLK 1)))
> (use (match_operand:SI 2 "nonmemory_operand"))

Looks like a trivial change.  It's OK.

Cheers,
Oleg



Re: [PATCH] RX: Add rx-*-linux target

2019-06-03 Thread Oleg Endo
On Sun, 2019-06-02 at 12:37 -0500, Segher Boessenkool wrote:
> 
> This is -m64bit-doubles/-m32bit-doubles, and t-rx already multilibs
> that?

Yes, it does.

> And the default is 32-bit.  So why does rx-linux need something
> different?
> You could make a point for wanting 64-bit doubles as default, even;
> disabling it completely does not seem like a good plan.
> 

On RX, "default to 32-bit" means DF=SF, always.  I.e. DF is disabled
for good.  Because of that, I usually build RX stuff with
-m64bit-doubles.  It should actually be the default and -m32bit-doubles
should be some kind of "optimization" switch.  So in my opinion,
-m64bit-doubles should be the default for a linux target, to allow
having actual doubles, and not just floats.
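
A package that relies on real doubles could guard against a DF=SF
configuration at compile time.  This is only a sketch of such a guard;
the constant name double_is_wider_than_float is made up.

```cpp
#include <limits>

// Both assertions fail to compile when double is only 32 bits wide.
static_assert (sizeof (double) == 8, "double is not 64 bits wide");
static_assert (std::numeric_limits<double>::digits == 53,
               "double does not have a 53-bit significand");

// Usable in ordinary conditions where a hard error is not wanted.
constexpr bool double_is_wider_than_float = sizeof (double) > sizeof (float);
```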

Cheers,
Oleg




Re: [PATCH] RX: Add rx-*-linux target

2019-06-02 Thread Oleg Endo
On Sun, 2019-06-02 at 20:26 +0900, Yoshinori Sato wrote:
> On Fri, 31 May 2019 09:16:18 +0900,
> Jeff Law wrote:
> > 
> > On 5/29/19 12:27 PM, Jeff Law wrote:
> > > On 5/23/19 6:05 AM, Yoshinori Sato wrote:
> > > > I ported the Linux kernel to Renesas RX.
> > > > 
> > > > The rx-*-elf target outputs a binary that differs from
> > > > standard ELF.
> > > > It has the same format as the Renesas compiler.
> > > > 
> > > > But the linux kernel requires the standard ELF format.
> > > > I want to define a rx-*-linux target so that I can generate
> > > > a standard ELF binary.
> > > 
> > > Presumably you're resubmitting after your assignment got recorded
> > > (I
> > > think I saw that fly by recently).
> > > 
> > > I'll construct a ChangeLog and install this on the trunk.
> > 
> > So this is causing libgcc to fail to build for rx-elf.  The problem
> > is
> > the DF=SF #define.  I think you need to split that out so that it's
> > only
> > used for rx-linux.
> > 
> > Jeff
> 
> OK, I fixed it.
> I tried building the rx-elf target and it succeeds.
> 

Setting DF=SF is the wrong thing to do IMHO.  RX can do DF just fine in
software.  If this is hardcoded like that in the roots of the
toolchain, it will make compiling packages that actually need real DF
completely impossible, won't it?  We also don't set DI = SI just
because the hardware is bad at DI ... 

Just my 2 cents.

Cheers,
Oleg



Re: Recent combine change causing regressions

2019-05-13 Thread Oleg Endo
On Mon, 2019-05-13 at 09:38 -0500, Segher Boessenkool wrote:
> On Mon, May 13, 2019 at 08:27:15AM -0600, Jeff Law wrote:
> > sh3-linux-gnu and sh3eb-linux-gnu:
> 
> I test sh2 and sh4, but not sh3 :-)
> 
> > Tests that now fail, but worked before (3 tests):
> > 
> > gcc.target/sh/pr51244-11.c scan-assembler-not subc|and|bra
> > gcc.target/sh/pr51244-11.c scan-assembler-times bf\t0f 1
> > gcc.target/sh/pr51244-11.c scan-assembler-times bt\t0f 1
> > 
> > Previously we'd match this pattern:
> > 
> > (define_insn "*cset_zero"
> >   [(set (match_operand:SI 0 "arith_reg_dest" "=r")
> > (if_then_else:SI (match_operand:SI 1 "cbranch_treg_value")
> >  (match_operand:SI 2 "arith_reg_operand"
> > "0")
> >  (const_int 0)))]
> >   "TARGET_SH1 && TARGET_ZDCBRANCH"
> > 
> > After your change we no longer try to do so.
> > 
> > I really don't care about the SH port.  But isn't this really a
> > symptom
> > of a larger problem.  Namely that by not generating if-then-else
> > you've
> > hosed every target that implements conditional moves via if-then-
> > else
> > constructs?
> 
> I tested on 30-something targets (all *-linux), and only mips64
> regressed
> a little, everything else improved.  So the current tuning is better
> than
> what it was before.  No doubt it can be improved though!
> 
> This is only if-then-else for a single bit, fwiw.
> 
> I'll build some sh3-linux if I find a cycle or two.
> 

Hmmm .. on SH3 TARGET_ZDCBRANCH should be off, afair.
What would be the alternative now for the if-then-else?

Cheers,
Oleg



Re: Parallelize the compilation using Threads

2019-02-15 Thread Oleg Endo
On Tue, 2019-02-12 at 15:12 +0100, Richard Biener wrote:
> On Mon, Feb 11, 2019 at 10:46 PM Giuliano Belinassi
>  wrote:
> > 
> > Hi,
> > 
> > I was just wondering what API I should use to spawn threads and
> > control
> > their flow. Should I use OpenMP, pthreads, or something else?
> > 
> > My concern is that we might break compatibility with something. If we use
> > OpenMP, I'm afraid that we will break compatibility with compilers
> > not
> > supporting it. On the other hand, If we use pthread, we will break
> > compatibility with non-POSIX systems (Windows).
> 
> I'm not sure we have a thread abstraction for the host - we do have
> one for the target via libgcc gthr.h though.  For prototyping I'd
> resort
> to this same interface and fixup the host != target case as needed.

Or maybe, in the year 2019, we could assume that most c++ compilers
which are used to compile GCC support c++11 and come with an adequate
 implementation...  yeah, I know, sounds jacked :)

Cheers,
Oleg



Re: [PATCH v2] libitm: sh: avoid absolute relocation in shared library (PR 86712)

2018-08-04 Thread Oleg Endo
On Fri, 2018-08-03 at 14:54 -0600, Jeff Law wrote:
> On 07/28/2018 07:04 AM, slyfox.inbox.ru via gcc-patches wrote:
> > 
> > From: Sergei Trofimovich 
> > 
> > Cc: Andreas Schwab 
> > Cc: Torvald Riegel 
> > Cc: Alexandre Oliva 
> > Cc: Oleg Endo 
> > Cc: Kaz Kojima 
> > Signed-off-by: Sergei Trofimovich 
> > ---
> >  libitm/config/sh/sjlj.S | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/libitm/config/sh/sjlj.S b/libitm/config/sh/sjlj.S
> > index 043f36749be..f265ab8f898 100644
> > --- a/libitm/config/sh/sjlj.S
> > +++ b/libitm/config/sh/sjlj.S
> > @@ -53,7 +53,7 @@ _ITM_beginTransaction:
> >  #else
> >     cfi_def_cfa_offset (4*10)
> >  #endif
> > -#if defined HAVE_ATTRIBUTE_VISIBILITY || !defined __PIC__
> > +#if !defined __PIC__
> >     mov.l   .Lbegin, r1
> >     jsr @r1
> >      mov r15, r5
> > @@ -78,7 +78,7 @@ _ITM_beginTransaction:
> >  
> >     .align  2
> >  .Lbegin:
> > -#if defined HAVE_ATTRIBUTE_VISIBILITY || !defined __PIC__
> > +#if !defined __PIC__
> >     .long   GTM_begin_transaction
> >  #else
> >     .long   GTM_begin_transaction@PCREL-(.Lbegin0-.)
> > 
> Thanks.  I installed this version.
> 

Thanks Jeff.
If there are no objections, I'll backport it to the 7 and 8 branches.

Cheers,
Oleg


Re: [PATCH] RX TARGET_RTX_COSTS function

2018-02-22 Thread Oleg Endo
On Thu, 2018-02-22 at 15:41 +0000, Nick Clifton wrote:


> > > gcc/ChangeLog:
> > >   * config/rx/rx.c (rx_rtx_costs): New function.
> > >   (TARGET_RTX_COSTS): Override to use rx_rtx_costs.
> Approved - please apply.
> 

Thanks.  Committed as r257905.

Cheers,
Oleg


Re: [PATCH] RX TARGET_RTX_COSTS function

2018-02-22 Thread Oleg Endo
Ping.

On Thu, 2018-02-15 at 23:07 +0900, Oleg Endo wrote:
> On Wed, 2018-02-14 at 01:06 +0900, Oleg Endo wrote:
> > 
> >  
> > Do you happen to have any other numbers on the resulting code
> > size/speed?  Looking at the new costs that the patch introduces,
> > I'd
> > expect there to be some more changes than just the 1/x...
> > 
> I've checked your proposed patch with the CSiBE set for code size
> changes.
> 
> With your patch, the code size of the whole set:
> sum:  2806044 -> 2801346    -4698 / -0.167424 %
> 
> 
> Taking out this piece
> 
>     case IF_THEN_ELSE:
>   *total = COSTS_N_INSNS (3);
>   return true;
> 
> from the rx_rtx_costs results in:
> sum:  2806044 -> 2801099    -4945 / -0.176227 %
> 
> 
> Taking out another piece 
> 
>       if (GET_CODE (XEXP (x, 0)) == MEM
>  || GET_CODE (XEXP (x, 1)) == MEM)
>    *total = COSTS_N_INSNS (3);
>   else
> 
> results in:
> sum:  2806044 -> 2800315    -5729 / -0.204166 %
> 
> So I'd like to propose the attached patch instead, as it eliminates 1
> KByte of code more from the whole set.
> 
> Just in case, I'm testing it now with
>   "make -k check" on rx-sim for c and c++
> 
> OK for trunk if it passes?
> 
> Cheers,
> Oleg
> 
> gcc/ChangeLog:
>   * config/rx/rx.c (rx_rtx_costs): New function.
>   (TARGET_RTX_COSTS): Override to use rx_rtx_costs.


Re: [RX] Fix PR 83831 -- Unused bclr, bnot, bset insns

2018-02-16 Thread Oleg Endo
On Wed, 2018-02-14 at 21:34 +0900, Oleg Endo wrote:


> > Approved - please apply - and thanks very much for doing this!
> Thanks!  Committed as r257655.
> 

Testing another patch
https://gcc.gnu.org/ml/gcc-patches/2018-02/msg00903.html

revealed a bug.
I've committed the attached somewhat obvious patch as r257735.

Cheers,
Oleg

gcc/ChangeLog:
PR target/83831
* config/rx/rx.c (rx_fuse_in_memory_bitop): Convert shift operand
to QImode.

gcc/testsuite/ChangeLog:
PR target/83831
* gcc.target/rx/pr83831.c (test_3, test_6): Adjust test cases.
Index: gcc/config/rx/rx.c
===
--- gcc/config/rx/rx.c	(revision 257733)
+++ gcc/config/rx/rx.c	(working copy)
@@ -3515,7 +3515,7 @@
 if (volatile_insn_p (PATTERN (i)) || CALL_P (i))
   return false;
 
-  emit_insn (gen_insn (mem, operands[1]));
+  emit_insn (gen_insn (mem, gen_lowpart (QImode, operands[1])));
   set_insn_deleted (op2_def.insn);
   set_insn_deleted (op0_use);
   return true;
Index: gcc/testsuite/gcc.target/rx/pr83831.c
===
--- gcc/testsuite/gcc.target/rx/pr83831.c	(revision 257733)
+++ gcc/testsuite/gcc.target/rx/pr83831.c	(working copy)
@@ -1,8 +1,8 @@
 /* { dg-do compile }  */
 /* { dg-options "-O1" }  */
 /* { dg-final { scan-assembler-times "bclr" 6 } }  */
-/* { dg-final { scan-assembler-times "bset" 6 } }  */
-/* { dg-final { scan-assembler-times "bnot" 6 } }  */
+/* { dg-final { scan-assembler-times "bset" 7 } }  */
+/* { dg-final { scan-assembler-times "bnot" 7 } }  */
 
 void
 test_0 (char* x, unsigned int y)
@@ -29,13 +29,14 @@
 }
 
 void
-test_3 (char* x, unsigned int y)
+test_3 (char* x, unsigned int y, unsigned int z)
 {
-  /* Expect 4x bset here.  */
+  /* Expect 5x bset here.  */
   x[0] |= 0x10;
   x[1] = y | (1 << 1);
   x[2] |= 0x10;
   x[65000] |= 0x10;
+  x[5] |= 1 << z;
 }
 
 unsigned int
@@ -53,13 +54,14 @@
 }
 
 void
-test_6 (char* x, unsigned int y)
+test_6 (char* x, unsigned int y, unsigned int z)
 {
-  /* Expect 4x bnot here.  */
+  /* Expect 5x bnot here.  */
   x[0] ^= 0x10;
   x[1] = y ^ (1 << 1);
   x[2] ^= 0x10;
   x[65000] ^= 0x10;
+  x[5] ^= 1 << z;
 }
 
 unsigned int


Re: [PATCH] RX TARGET_RTX_COSTS function

2018-02-15 Thread Oleg Endo
On Wed, 2018-02-14 at 01:06 +0900, Oleg Endo wrote:
> 
> Do you happen to have any other numbers on the resulting code
> size/speed?  Looking at the new costs that the patch introduces, I'd
> expect there to be some more changes than just the 1/x...
> 

I've checked your proposed patch with the CSiBE set for code size
changes.

With your patch, the code size of the whole set:
sum:  2806044 -> 2801346    -4698 / -0.167424 %


Taking out this piece

    case IF_THEN_ELSE:
  *total = COSTS_N_INSNS (3);
  return true;

from the rx_rtx_costs results in:
sum:  2806044 -> 2801099    -4945 / -0.176227 %


Taking out another piece 

      if (GET_CODE (XEXP (x, 0)) == MEM
 || GET_CODE (XEXP (x, 1)) == MEM)
   *total = COSTS_N_INSNS (3);
  else

results in:
sum:  2806044 -> 2800315    -5729 / -0.204166 %

So I'd like to propose the attached patch instead, as it eliminates 1
KByte of code more from the whole set.
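
The "1 KByte more" figure follows directly from the three sums quoted in
this message.  A small sketch of the arithmetic, with constant names
chosen only for this example:

```cpp
// Total CSiBE code size in bytes, as quoted in this message.
constexpr long base = 2806044;
constexpr long with_original_patch = 2801346;
constexpr long with_both_pieces_removed = 2800315;

// Bytes saved relative to the unpatched compiler.
constexpr long saved_original = base - with_original_patch;
constexpr long saved_proposed = base - with_both_pieces_removed;

// Additional saving of the proposed variant over the original patch.
constexpr long extra_saving = saved_proposed - saved_original;

// Saving as a percentage of the base size.
constexpr double percent (long saved) { return 100.0 * saved / base; }

static_assert (saved_original == 4698, "first measurement");
static_assert (saved_proposed == 5729, "third measurement");
static_assert (extra_saving == 1031, "roughly 1 KByte");
```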

Just in case, I'm testing it now with
  "make -k check" on rx-sim for c and c++

OK for trunk if it passes?

Cheers,
Oleg

gcc/ChangeLog:
* config/rx/rx.c (rx_rtx_costs): New function.
(TARGET_RTX_COSTS): Override to use rx_rtx_costs.
Index: gcc/config/rx/rx.c
===
--- gcc/config/rx/rx.c	(revision 257655)
+++ gcc/config/rx/rx.c	(working copy)
@@ -2976,6 +2976,62 @@
 }
 
 static bool
+rx_rtx_costs (rtx x, machine_mode mode, int outer_code ATTRIBUTE_UNUSED,
+	  int opno ATTRIBUTE_UNUSED, int* total, bool speed)
+{
+  if (x == const0_rtx)
+{
+  *total = 0;
+  return true;
+}
+
+  switch (GET_CODE (x))
+{
+case MULT:
+  if (mode == DImode)
+	{
+	  *total = COSTS_N_INSNS (2);
+	  return true;
+	}
+  /* fall through */
+
+case PLUS:
+case MINUS:
+case AND:
+case COMPARE:
+case IOR:
+case XOR:
+  *total = COSTS_N_INSNS (1);
+  return true;
+
+case DIV:
+  if (speed)
+	/* This is the worst case for a division.  Pessimize divisions when
+	   not optimizing for size and allow reciprocal optimizations which
+	   produce bigger code.  */
+	*total = COSTS_N_INSNS (20);
+  else
+	*total = COSTS_N_INSNS (3);
+  return true;
+
+case UDIV:
+  if (speed)
+	/* This is the worst case for a division.  Pessimize divisions when
+	   not optimizing for size and allow reciprocal optimizations which
+	   produce bigger code.  */
+	*total = COSTS_N_INSNS (18);
+  else
+	*total = COSTS_N_INSNS (3);
+  return true;
+
+default:
+  break;
+}
+
+  return false;
+}
+
+static bool
 rx_can_eliminate (const int from ATTRIBUTE_UNUSED, const int to)
 {
   /* We can always eliminate to the frame pointer.
@@ -3709,6 +3765,9 @@
 #undef  TARGET_MODES_TIEABLE_P
 #define TARGET_MODES_TIEABLE_P			rx_modes_tieable_p
 
+#undef  TARGET_RTX_COSTS
+#define TARGET_RTX_COSTS rx_rtx_costs
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-rx.h"
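
The cost table in the patch above can be read more easily as a
standalone model.  The sketch below mirrors the switch in rx_rtx_costs
but uses hypothetical enums (code_model, mode_model) instead of GCC's rtx
codes and machine modes, and returns -1 where the real hook returns false
to defer to the generic costs.  The scaling by 4 matches GCC's
COSTS_N_INSNS macro.

```cpp
#include <cassert>

// GCC's COSTS_N_INSNS (N) scales an instruction count by 4.
constexpr int costs_n_insns (int n) { return n * 4; }

// Hypothetical stand-ins for the rtx codes and modes used in the patch.
enum class code_model { const_zero, mult, plus, minus, and_op, compare,
                        ior, xor_op, div, udiv, other };
enum class mode_model { si, di };

// Modelled cost, or -1 if the decision is left to the generic code.
inline int
rx_cost_model (code_model code, mode_model mode, bool speed)
{
  switch (code)
    {
    case code_model::const_zero:
      return 0;

    case code_model::mult:
      // DImode multiply is costed as two insns, anything else as one
      // (the patch does this via a fall through).
      return costs_n_insns (mode == mode_model::di ? 2 : 1);

    case code_model::plus:
    case code_model::minus:
    case code_model::and_op:
    case code_model::compare:
    case code_model::ior:
    case code_model::xor_op:
      return costs_n_insns (1);

    case code_model::div:
      // Expensive when optimizing for speed, cheap when optimizing
      // for size.
      return costs_n_insns (speed ? 20 : 3);

    case code_model::udiv:
      return costs_n_insns (speed ? 18 : 3);

    default:
      return -1;
    }
}
```

The speed and size cases for DIV and UDIV show the design choice: a high
cost for speed allows reciprocal transformations, a low cost for size
keeps the compact divide instruction.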


Re: [RX] Fix PR 83831 -- Unused bclr, bnot, bset insns

2018-02-14 Thread Oleg Endo
On Tue, 2018-02-13 at 17:04 +0000, Nick Clifton wrote:
> 
> > gcc/ChangeLog:
> > 
> > PR target/83831
> > * config/rx/rx-protos.h (rx_reg_dead_or_unused_after_insn,
> > rx_copy_reg_dead_or_unused_notes, rx_fuse_in_memory_bitop): New
> > declarations.
> > (set_of_reg): New struct.
> > (rx_find_set_of_reg, rx_find_use_of_reg): New functions.
> > * config/rx/rx.c (rx_reg_dead_or_unused_after_insn,
> > rx_copy_reg_dead_or_unused_notes, rx_fuse_in_memory_bitop): New
> > functions.
> > * config/rx/rx.md (andsi3, iorsi3, xorsi3): Convert to insn_and_split.
> > Split into bitclr, bitset, bitinvert patterns if appropriate.
> > (*bitset, *bitinvert, *bitclr): Convert to named insn_and_split and
> > use rx_fuse_in_memory_bitop.
> > (*bitset_in_memory, *bitinvert_in_memory, *bitclr_in_memory): Convert
> > to named insn, correct maximum insn length.
> > 
> > gcc/testsuite/ChangeLog:
> > 
> > PR target/83831
> > * gcc.target/rx/pr83831.c: New tests.

> Approved - please apply - and thanks very much for doing this!

Thanks!  Committed as r257655.

Cheers,
Oleg


[RX] Fix PR 83831 -- Unused bclr, bnot, bset insns

2018-02-13 Thread Oleg Endo
Hi,

The attached patch fixes the deficits mentioned in PR 83831 which is
about unused bclr, bnot and bset instructions.

For some simple cases, the combine pass can successfully fuse a load-
modify-store insn sequence into an insn that operates on a memory
directly.  However, in some cases where it thinks it's too complex, it
will not try to combine the insns.

What I'm doing here is similar to what I've been doing on SH in the
split1 pass after combine -- manually walking the insns up/down
(limited to the BB) to find the def/use and fuse 3 insns into 1, if
it's possible to do so.
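
For illustration, a minimal sketch of the kind of source code that gives rise to such load-modify-store sequences on a byte in memory.  The helper names below are hypothetical and not part of the patch or the test case; whether a given compiler version actually emits bset, bclr or bnot for them depends on the options used and has not been checked here.

```cpp
#include <cstdint>

// Hypothetical helpers, only meant to show the source-level shape of a
// load / modify / store on a memory byte.  Each one is a candidate for a
// single in-memory bit instruction on targets that have one.

// Set bit 'n' of the byte at 'p'.
inline void set_bit (volatile std::uint8_t* p, unsigned n)
{
  *p = static_cast<std::uint8_t> (*p | (1u << n));
}

// Clear bit 'n' of the byte at 'p'.
inline void clear_bit (volatile std::uint8_t* p, unsigned n)
{
  *p = static_cast<std::uint8_t> (*p & ~(1u << n));
}

// Invert bit 'n' of the byte at 'p'.
inline void toggle_bit (volatile std::uint8_t* p, unsigned n)
{
  *p = static_cast<std::uint8_t> (*p ^ (1u << n));
}
```

Comparing the generated assembly of such functions before and after the patch is one way to see whether the three insns got fused into one.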

For that I've copy-pasted some of the RTL utility functions from SH.  I
will propose folding and moving those into rtl.h / rtlanal.c during
next stage 1.

With that patch, I get a code size decrease of about 1 KByte on a
larger application.

The attached patch is the version for GCC 8 (trunk).  I've posted
versions for GCC 6 and GCC 7 in the PR.  All 3 patches have been tested
with 
   "make -k check" on rx-sim for c and c++

with no new failures.

OK for trunk?

Cheers,
Oleg

gcc/ChangeLog:

PR target/83831
* config/rx/rx-protos.h (rx_reg_dead_or_unused_after_insn,
rx_copy_reg_dead_or_unused_notes, rx_fuse_in_memory_bitop): New
declarations.
(set_of_reg): New struct.
(rx_find_set_of_reg, rx_find_use_of_reg): New functions.
* config/rx/rx.c (rx_reg_dead_or_unused_after_insn,
rx_copy_reg_dead_or_unused_notes, rx_fuse_in_memory_bitop): New
functions.
* config/rx/rx.md (andsi3, iorsi3, xorsi3): Convert to insn_and_split.
Split into bitclr, bitset, bitinvert patterns if appropriate.
(*bitset, *bitinvert, *bitclr): Convert to named insn_and_split and
use rx_fuse_in_memory_bitop.
(*bitset_in_memory, *bitinvert_in_memory, *bitclr_in_memory): Convert
to named insn, correct maximum insn length.

gcc/testsuite/ChangeLog:

PR target/83831
* gcc.target/rx/pr83831.c: New tests.

Index: gcc/config/rx/rx-protos.h
===
--- gcc/config/rx/rx-protos.h	(revision 257549)
+++ gcc/config/rx/rx-protos.h	(working copy)
@@ -63,6 +63,112 @@
 extern void		rx_split_cbranch (machine_mode, enum rtx_code,
 	  rtx, rtx, rtx);
 extern machine_mode	rx_select_cc_mode (enum rtx_code, rtx, rtx);
+
+extern bool rx_reg_dead_or_unused_after_insn (const rtx_insn* i, int regno);
+extern void rx_copy_reg_dead_or_unused_notes (rtx reg, const rtx_insn* src,
+	  rtx_insn* dst);
+
+extern bool rx_fuse_in_memory_bitop (rtx* operands, rtx_insn* curr_insn,
+ rtx (*gen_insn)(rtx, rtx));
+
+/* Result value of rx_find_set_of_reg.  */
+struct set_of_reg
+{
+  /* The insn where sh_find_set_of_reg stopped looking.
+ Can be NULL_RTX if the end of the insn list was reached.  */
+  rtx_insn* insn;
+
+  /* The set rtx of the specified reg if found, NULL_RTX otherwise.  */
+  const_rtx set_rtx;
+
+  /* The set source rtx of the specified reg if found, NULL_RTX otherwise.
+ Usually, this is the most interesting return value.  */
+  rtx set_src;
+};
+
+/* FIXME: Copy-pasta from SH.  Move to rtl.h.
+   Given a reg rtx and a start insn, try to find the insn that sets
+   the specified reg by using the specified insn stepping function,
+   such as 'prev_nonnote_nondebug_insn_bb'.  When the insn is found,
+   try to extract the rtx of the reg set.  */
+template <typename F> inline set_of_reg
+rx_find_set_of_reg (rtx reg, rtx_insn* insn, F stepfunc,
+		bool ignore_reg_reg_copies = false)
+{
+  set_of_reg result;
+  result.insn = insn;
+  result.set_rtx = NULL_RTX;
+  result.set_src = NULL_RTX;
+
+  if (!REG_P (reg) || insn == NULL_RTX)
+return result;
+
+  for (rtx_insn* i = stepfunc (insn); i != NULL_RTX; i = stepfunc (i))
+{
+  if (BARRIER_P (i))
+	break;
+  if (!INSN_P (i) || DEBUG_INSN_P (i))
+	  continue;
+  if (reg_set_p (reg, i))
+	{
+	  if (CALL_P (i))
+	break;
+
+	  result.insn = i;
+	  result.set_rtx = set_of (reg, i);
+
+	  if (result.set_rtx == NULL_RTX || GET_CODE (result.set_rtx) != SET)
+	break;
+
+	  result.set_src = XEXP (result.set_rtx, 1);
+
+	  if (ignore_reg_reg_copies && REG_P (result.set_src))
+	{
+	  reg = result.set_src;
+	  continue;
+	}
+	  if (ignore_reg_reg_copies && SUBREG_P (result.set_src)
+	  && REG_P (SUBREG_REG (result.set_src)))
+	{
+	  reg = SUBREG_REG (result.set_src);
+	  continue;
+	}
+
+	  break;
+	}
+}
+
+  /* If the searched reg is found inside a (mem (post_inc:SI (reg))), set_of
+ will return NULL and set_rtx will be NULL.
+ In this case report a 'not found'.  result.insn will always be non-null
+ at this point, so no need to check it.  */
+  if (result.set_src != NULL && result.set_rtx == NULL)
+result.set_src = NULL;
+
+  return result;
+}
+
+/* FIXME: Move to rtlh.h.  */
+template <typename F> inline rtx_insn*
+rx_find_use_of_reg (rtx reg, rtx_insn* insn, F 

Re: [PATCH] RX TARGET_RTX_COSTS function

2018-02-13 Thread Oleg Endo
Hi,

On Tue, 2018-02-13 at 15:54 +, Sebastian Perta wrote:
> 
> The patch required some changes (the prototype, second param more
> exactly, has changed) in order to compile in the trunk.

Could you please send the patch as an attachment?  The formatting looks
a bit off (tabs spaces etc).

> So I updated this and I also made a further change to the patch: I
> disabled the replacement of the division with multiplication of
> reciprocals on -Os because it increases code size for example for the
> following division:

Do you happen to have any other numbers on the resulting code
size/speed?  Looking at the new costs that the patch introduces, I'd
expect there to be some more changes than just the 1/x...
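
For reference, a small sketch of what "replacing the division with a multiplication by the reciprocal" means for a constant divisor.  The function names are made up for this example, and the constant is the usual one for an unsigned 32-bit division by 10; it is not taken from the patch, and the actual sequence the compiler picks may differ.

```cpp
#include <cstdint>

// Plain division; with a high division cost the compiler is expected to
// avoid emitting an actual divide instruction for this.
inline std::uint32_t div10_plain (std::uint32_t x)
{
  return x / 10u;
}

// The same result computed with a widening multiply and a shift.
// 0xCCCCCCCD is ceil (2^35 / 10), which is exact for all 32-bit inputs.
// This is typically faster than a divide, but needs more insns, which is
// why it can be a loss when optimizing for size.
inline std::uint32_t div10_reciprocal (std::uint32_t x)
{
  const std::uint64_t product = static_cast<std::uint64_t> (x) * 0xCCCCCCCDull;
  return static_cast<std::uint32_t> (product >> 35);
}
```

The DIV/UDIV costs returned by the cost function are what steers the compiler between these two forms.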

Cheers,
Oleg


Re: PING [PATCH] RX movsicc degrade fix

2018-02-12 Thread Oleg Endo
On Mon, 2018-02-12 at 11:06 +, Sebastian Perta wrote:


> > > 1) there should be a space between * and the filename
> The spaces are there (see the changelog), the renesas mail server
> removes them sometimes

You might want to send around your patches as email attachments.  That
avoids formatting issues.

Cheers,
Oleg


[SH][committed] Fix PR 81485

2018-01-21 Thread Oleg Endo
Hi,

The following fixes PR 81485.
Tested with
  make -k check RUNTESTFLAGS="--target_board=sh-sim\{-m2/-ml,-m2/-mb,-m2a/-mb,-m4/-ml,-m4/-mb,-m4a/-ml,-m4a/-mb}"

Committed as r256930.

Cheers,
Oleg

gcc/ChangeLog:
PR target/81485
* config/sh/sh-protos.h (sh_find_set_of_reg): Remove assert.

Index: gcc/config/sh/sh-protos.h
===
--- gcc/config/sh/sh-protos.h	(revision 256929)
+++ gcc/config/sh/sh-protos.h	(working copy)
@@ -228,9 +228,13 @@
 	}
 }
 
-  if (result.set_src != NULL)
-gcc_assert (result.insn != NULL && result.set_rtx != NULL);
-
+  /* If the searched reg is found inside a (mem (post_inc:SI (reg))), set_of
+ will return NULL and set_rtx will be NULL.
+ In this case report a 'not found'.  result.insn will always be non-null
+ at this point, so no need to check it.  */
+  if (result.set_src != NULL && result.set_rtx == NULL)
+result.set_src = NULL;
+ 
   return result;
 }
 


[SH][committed] Fix PR 80870

2018-01-20 Thread Oleg Endo
Hi,

The following fixed PR 80870.
For whatever reason one of the source files in config/sh was still
including <algorithm> and <vector> directly...

Committed as r256926 (trunk), r256928 (GCC 7), r256929 (GCC 6).

Cheers,
Oleg

gcc/ChangeLog:
PR target/80870
* config/sh/sh_optimize_sett_clrt.cc:
Use INCLUDE_ALGORITHM and INCLUDE_VECTOR instead of direct includes.

Index: gcc/config/sh/sh_optimize_sett_clrt.cc
===
--- gcc/config/sh/sh_optimize_sett_clrt.cc	(revision 256924)
+++ gcc/config/sh/sh_optimize_sett_clrt.cc	(working copy)
@@ -20,6 +20,8 @@
 #define IN_TARGET_CODE 1
 
 #include "config.h"
+#define INCLUDE_ALGORITHM
+#define INCLUDE_VECTOR
 #include "system.h"
 #include "coretypes.h"
 #include "backend.h"
@@ -29,9 +31,6 @@
 #include "cfgrtl.h"
 #include "tree-pass.h"
 
-#include <algorithm>
-#include <vector>
-
 /*
 This pass tries to eliminate unnecessary sett or clrt instructions in cases
 where the ccreg value is already known to be the same as the constant set
Index: gcc/config/sh/sh_optimize_sett_clrt.cc
===
--- gcc/config/sh/sh_optimize_sett_clrt.cc	(revision 256924)
+++ gcc/config/sh/sh_optimize_sett_clrt.cc	(working copy)
@@ -18,6 +18,8 @@
 .  */
 
 #include "config.h"
+#define INCLUDE_ALGORITHM
+#define INCLUDE_VECTOR
 #include "system.h"
 #include "coretypes.h"
 #include "backend.h"
@@ -27,9 +29,6 @@
 #include "cfgrtl.h"
 #include "tree-pass.h"
 
-#include <algorithm>
-#include <vector>
-
 /*
 This pass tries to eliminate unnecessary sett or clrt instructions in cases
 where the ccreg value is already known to be the same as the constant set


[wwwdocs][committed] Mention GCC 7 changes for SH and RX

2018-01-20 Thread Oleg Endo
Hi,

Somehow I never managed to commit the attached patch.
Better late than never.

Cheers,
Oleg

? gcc7_rx_sh_update.patch
Index: htdocs/gcc-7/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-7/changes.html,v
retrieving revision 1.96
diff -r1.96 changes.html
1065c1065,1067
< 
---
> RX
> Basic support for atomic built-in function has been added.  It is currently
> implemented by flipping interrupts off and on as needed.
1067c1069,1094
< 
---
> SH
> 
>   Support for SH5/SH64 has been removed.
>   Improved utilization of delay slots on SH2A.
>   Improved utilization of zero-displacement conditional branches.
>   The following deprecated options have been removed
> 
>   -mcbranchdi
>   -mcmpeqdi
>   -minvalid-symbols
>   -msoft-atomic
>   -mspace
>   -madjust-unroll
> 
>   
>   Support for the following SH2A instructions has been added
> 
>   mov.b  @-Rm,R0
>   mov.w  @-Rm,R0
>   mov.l  @-Rm,R0
>   mov.b  R0,@Rn+
>   mov.w  R0,@Rn+
>   mov.l  R0,@Rn+
> 
>   
> 


Re: [RX] Fix PR 81819

2018-01-13 Thread Oleg Endo
On Thu, 2018-01-11 at 17:32 +, Nick Clifton wrote:
> 
> > gcc/ChangeLog:
> > PR target/81819
> > * config/rx/rx.c (rx_is_restricted_memory_address):
> > Handle SUBREG case.
> Go ahead make my day^H^H^H^H^H^H
> 
> Approved - please apply.

Committed as r256578 (trunk) and r256579 (GCC 7 branch).

Cheers,
Oleg


[RX] Fix PR 81819

2018-01-11 Thread Oleg Endo
Hi,

The attached patch fixes PR 81819, which popped up on GCC 7.  I assume
it's also there on trunk, but can't build my app with trunk compiler
because it's got other issues.

Unfortunately I can't add the reproducer test case since it happens
only when building a bigger app with LTO.  But I have confirmed that
with this fix the app builds and runs.

OK for trunk and GCC 7?

Cheers,
Oleg

gcc/ChangeLog:
PR target/81819
* config/rx/rx.c (rx_is_restricted_memory_address):
Handle SUBREG case.

Index: gcc/config/rx/rx.c
===
--- gcc/config/rx/rx.c	(revision 256385)
+++ gcc/config/rx/rx.c	(working copy)
@@ -284,6 +284,9 @@
   /* Simple memory addresses are OK.  */
   return true;
 
+case SUBREG:
+  return RX_REG_P (SUBREG_REG (mem));
+
 case PRE_DEC:
 case POST_INC:
   return false;


Re: [RX] Fix PR 81821

2018-01-11 Thread Oleg Endo
On Thu, 2018-01-11 at 15:10 +, Nick Clifton wrote:
> 
> > OK for trunk and GCC 7?
> Approved.  Do you have access to the repository, or would you like me
> to apply the patch for you ?

Thanks.  Committed as r256536 (trunk) and r256538 (GCC 7).

Cheers,
Oleg


[RX] Fix PR 81821

2018-01-11 Thread Oleg Endo
Hi,

The attached patch fixes PR 81821.  It is the same as the patch
proposed in the PR.  I have briefly checked it in a private application
that uses atomic variables and I think it's rather obvious.

I have added atomics support on RX in GCC 7 and that was an initial
bug.  Thus I'd like to also apply it to the GCC 7 branch.

OK for trunk and GCC 7?

Cheers,
Oleg

gcc/ChangeLog:
* config/rx/rx.md (BW): New mode attribute.
(sync_lock_test_and_setsi): Add mode suffix to insn output.

Index: gcc/config/rx/rx.md
===
--- gcc/config/rx/rx.md	(revision 256385)
+++ gcc/config/rx/rx.md	(working copy)
@@ -2167,6 +2167,7 @@
   [(plus "add") (minus "sub") (ior "ior") (xor "xor") (and "and")])
 
 (define_mode_iterator QIHI [QI HI])
+(define_mode_attr BW [(QI "B") (HI "W")])
 
 (define_insn "sync_lock_test_and_setsi"
   [(set (match_operand:SI 0 "register_operand"   "=r,r")
@@ -2208,7 +2209,7 @@
(set (match_dup 1)
 	(match_operand:QIHI 2 "register_operand""0"))]
   ""
-  "xchg\t%1, %0"
+  "xchg\t%1.<BW>, %0"
   [(set_attr "length" "6")
(set_attr "timings" "22")]
 )


Re: [PATCH v2] libgo: Add support for sh

2018-01-10 Thread Oleg Endo
On Wed, 2018-01-10 at 06:25 -0800, Ian Lance Taylor wrote:
> 
> Thanks.  I finally took a look at this.  I don't know much about SH,
> but I don't think we want to add each SH variant as a separate GOARCH
> value.  As you can see from the list you modified in
> libgo/go/go/build/syslist.go, the difference between GOARCH values is
> essentially the calling convention.  There are many different kinds of
> x86 processors, but since the only calling convention difference is
> between 32-bit and 64-bit, the list has only 386 and amd64.  Similarly
> it seems to me we should have only sh and shbe in the list for SH
> processors.

On SH the calling convention depends on the processor features which
are available/used.  For example there is a "no-fpu" mode which uses
software fp and passes all fp values in gp regs, while the "normal"
mode is to pass fp values in fp regs.  Some of the CPU variants like
SH2 imply "no-fpu".

Cheers,
Oleg


Re: [PATCH] RX pragma address

2018-01-05 Thread Oleg Endo
On Fri, 2018-01-05 at 12:12 +, Sebastian Perta wrote:
> 
> > > 
> > > Is this for some kind of legacy backwards compatibility of
> > > something?

> Sort of, this is required by the following tool
> https://www.renesas.com/en-eu/products/software-tools/tools/code-generator/ap4-application-leading-tool-applilet.html

> There are not many plans to improve this tool and other solutions
> (which might be more complex) might not be possible to implement in
> this tool.

I see.

> 
> The only way I can think of is to put the variable in a separate
> section (using section attribute) and then in the linker script put
> that section at the desired address.
> The problem is AP4 does not touch the linker script it only generates
> source code.
> 
> Do you have any other ideas/suggestions? I'm very happy to listen.

If you can compile the code only as plain C, for example

#define const_addr_var(addr, type) (*(volatile type*)addr)

#define DTCCR const_addr_var (0x00082400, uint8_t)
#define DTCSTS const_addr_var (0x0008240E, uint16_t)


If you can compile the code as C++11 there are certainly more options,
albeit probably not useful for generated code.  For example I do those
things this way:

// read-write hardware register with constant address
static constexpr hw_reg_rw> DTCCR = { };

// read-only hardware register with constant address
static constexpr hw_reg_r> DTCSTS = { };


In both cases (C and C++) the following will compile to the same code:

void test_wr (void)
{
  DTCCR = 123;
}

int test_rd (void)
{
  return DTCSTS;
}

volatile void* get_reg_addr (void)
{
  return 
}

For a possible implementation of the hw_reg thingy see
https://github.com/shared-ptr/bits/blob/master/hw_reg.hpp
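
To give a rough idea without following the link, here is a minimal sketch of such a wrapper.  It is an assumption-laden illustration and not the implementation from the URL above: the policy types fixed_addr and host_backed, and the exact template parameters, are made up for this example.  host_backed only exists so that the code can be tried on a normal host where the real register addresses are not accessible.

```cpp
#include <cstdint>

// Address policy for real hardware: a compile-time constant address.
template <std::uintptr_t Addr> struct fixed_addr
{
  template <typename T> static volatile T* ptr ()
  {
    return reinterpret_cast<volatile T*> (Addr);
  }
};

// Address policy for trying things out on a host: backed by an ordinary
// variable.  Note that all registers of the same type T share the storage.
struct host_backed
{
  template <typename T> static volatile T* ptr ()
  {
    static volatile T storage;
    return &storage;
  }
};

// Read-only register: can be read and have its address taken.
template <typename T, typename Where> struct hw_reg_r
{
  operator T () const { return *Where::template ptr<T> (); }
  volatile T* addr () const { return Where::template ptr<T> (); }
};

// Read-write register: additionally accepts assignments.  The object
// itself is empty and const; only the referenced memory changes.
template <typename T, typename Where> struct hw_reg_rw
{
  operator T () const { return *Where::template ptr<T> (); }
  volatile T* addr () const { return Where::template ptr<T> (); }

  const hw_reg_rw& operator = (T value) const
  {
    *Where::template ptr<T> () = value;
    return *this;
  }
};

// On the target one would write something like:
//   constexpr hw_reg_rw<std::uint8_t, fixed_addr<0x00082400>> DTCCR = { };
// For a host-side experiment:
constexpr hw_reg_rw<std::uint8_t, host_backed> TEST_REG = { };
```

With that, "TEST_REG = 123;" stores through the volatile pointer and reading TEST_REG loads through it, so after inlining there should be nothing left but the plain memory access.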

But really, if that is just for some code generator, why not simply
adjust the code generator to spit out normal C code instead of
cluttering the compiler with non-portable pragmas?  You have also
posted a similar thing for RL78 a while ago, so in the end the same
pragma thing would be re-implemented in the compiler three times (M32C,
RL78, RX)?  In that case, maybe better move it out from the M32C target
and make it available for every target?

Cheers,
Oleg


Re: [PATCH] RX pragma address

2018-01-05 Thread Oleg Endo
Hi,

On Fri, 2018-01-05 at 11:03 +, Sebastian Perta wrote:
> 
> Hello, 
> 
> The following patch adds a new pragma, "pragma address" for RX.
> The patch updates extend.texi and add a test case to the regression
> as well.
> For the test case I checked that the test is getting picked up in gcc.log;
> unfortunately for the .texi part I don't know where to look/what to do
> to get the documentation generated.
> 
> This is similar to the pragma address implemented for M32C.

Is this for some kind of legacy backwards compatibility of something?
There are ways how to achieve the same with standard C and C++ language
features ... so I was wondering what's the purpose of this?

Cheers,
Oleg


[SH][committed] Fix PR 83111

2017-11-23 Thread Oleg Endo
Hi,

The attached patch fixes PR 83111.
Committed to mainline as r255096 and to GCC 7 branch as r255097.

Cheers,
Oleg

gcc/ChangeLog

PR target/83111
* config/sh/sh.md (udivsi3, divsi3, sibcall_value_pcrel,
sibcall_value_pcrel_fdpic): Use local variable instead of
operands[3].
(calli_tbr_rel): Add missing operand 2.
(call_valuei_tbr_rel): Add missing operand 3.

Index: gcc/config/sh/sh.md
===
--- gcc/config/sh/sh.md	(revision 255095)
+++ gcc/config/sh/sh.md	(working copy)
@@ -2277,8 +2277,8 @@
   ""
 {
   rtx last;
+  rtx func_ptr = gen_reg_rtx (Pmode);
 
-  operands[3] = gen_reg_rtx (Pmode);
   /* Emit the move of the address to a pseudo outside of the libcall.  */
   if (TARGET_DIVIDE_CALL_TABLE)
 {
@@ -2298,16 +2298,16 @@
 	  emit_move_insn (operands[0], operands[2]);
 	  DONE;
 	}
-  function_symbol (operands[3], "__udivsi3_i4i", SFUNC_GOT);
-  last = gen_udivsi3_i4_int (operands[0], operands[3]);
+  function_symbol (func_ptr, "__udivsi3_i4i", SFUNC_GOT);
+  last = gen_udivsi3_i4_int (operands[0], func_ptr);
 }
   else if (TARGET_DIVIDE_CALL_FP)
 {
-  rtx lab = function_symbol (operands[3], "__udivsi3_i4", SFUNC_STATIC).lab;
+  rtx lab = function_symbol (func_ptr, "__udivsi3_i4", SFUNC_STATIC).lab;
   if (TARGET_FPU_SINGLE)
-	last = gen_udivsi3_i4_single (operands[0], operands[3], lab);
+	last = gen_udivsi3_i4_single (operands[0], func_ptr, lab);
   else
-	last = gen_udivsi3_i4 (operands[0], operands[3], lab);
+	last = gen_udivsi3_i4 (operands[0], func_ptr, lab);
 }
   else if (TARGET_SH2A)
 {
@@ -2318,8 +2318,8 @@
 }
   else
 {
-  rtx lab = function_symbol (operands[3], "__udivsi3", SFUNC_STATIC).lab;
-  last = gen_udivsi3_i1 (operands[0], operands[3], lab);
+  rtx lab = function_symbol (func_ptr, "__udivsi3", SFUNC_STATIC).lab;
+  last = gen_udivsi3_i1 (operands[0], func_ptr, lab);
 }
   emit_move_insn (gen_rtx_REG (SImode, 4), operands[1]);
   emit_move_insn (gen_rtx_REG (SImode, 5), operands[2]);
@@ -2405,22 +2405,22 @@
   ""
 {
   rtx last;
+  rtx func_ptr = gen_reg_rtx (Pmode);
 
-  operands[3] = gen_reg_rtx (Pmode);
   /* Emit the move of the address to a pseudo outside of the libcall.  */
   if (TARGET_DIVIDE_CALL_TABLE)
 {
-  function_symbol (operands[3], sh_divsi3_libfunc, SFUNC_GOT);
-  last = gen_divsi3_i4_int (operands[0], operands[3]);
+  function_symbol (func_ptr, sh_divsi3_libfunc, SFUNC_GOT);
+  last = gen_divsi3_i4_int (operands[0], func_ptr);
 }
   else if (TARGET_DIVIDE_CALL_FP)
 {
-  rtx lab = function_symbol (operands[3], sh_divsi3_libfunc,
+  rtx lab = function_symbol (func_ptr, sh_divsi3_libfunc,
  SFUNC_STATIC).lab;
   if (TARGET_FPU_SINGLE)
-	last = gen_divsi3_i4_single (operands[0], operands[3], lab);
+	last = gen_divsi3_i4_single (operands[0], func_ptr, lab);
   else
-	last = gen_divsi3_i4 (operands[0], operands[3], lab);
+	last = gen_divsi3_i4 (operands[0], func_ptr, lab);
 }
   else if (TARGET_SH2A)
 {
@@ -2431,8 +2431,8 @@
 }
   else
 {
-  function_symbol (operands[3], sh_divsi3_libfunc, SFUNC_GOT);
-  last = gen_divsi3_i1 (operands[0], operands[3]);
+  function_symbol (func_ptr, sh_divsi3_libfunc, SFUNC_GOT);
+  last = gen_divsi3_i1 (operands[0], func_ptr);
 }
   emit_move_insn (gen_rtx_REG (SImode, 4), operands[1]);
   emit_move_insn (gen_rtx_REG (SImode, 5), operands[2]);
@@ -6519,6 +6519,7 @@
   [(call (mem (match_operand:SI 0 "symbol_ref_operand" ""))
 	 (match_operand 1 "" ""))
(use (reg:SI FPSCR_MODES_REG))
+   (use (match_scratch 2))
(clobber (reg:SI PR_REG))]
   "TARGET_SH2A && sh2a_is_function_vector_call (operands[0])"
 {
@@ -6629,6 +6630,7 @@
 	(call (mem:SI (match_operand:SI 1 "symbol_ref_operand" ""))
 	  (match_operand 2 "" "")))
(use (reg:SI FPSCR_MODES_REG))
+   (use (match_scratch 3))
(clobber (reg:SI PR_REG))]
   "TARGET_SH2A && sh2a_is_function_vector_call (operands[1])"
 {
@@ -7044,13 +7046,11 @@
   [(const_int 0)]
 {
   rtx lab = PATTERN (gen_call_site ());
-  rtx call_insn;
+  rtx tmp =  gen_rtx_REG (SImode, R1_REG);
 
-  operands[3] =  gen_rtx_REG (SImode, R1_REG);
-
-  sh_expand_sym_label2reg (operands[3], operands[1], lab, true);
-  call_insn = emit_call_insn (gen_sibcall_valuei_pcrel (operands[0],
-			operands[3],
+  sh_expand_sym_label2reg (tmp, operands[1], lab, true);
+  rtx call_insn = emit_call_insn (gen_sibcall_valuei_pcrel (operands[0],
+			tmp,
 			operands[2],
 			copy_rtx (lab)));
   SIBLING_CALL_P (call_insn) = 1;
@@ -7078,12 +7078,11 @@
   [(const_int 0)]
 {
   rtx lab = PATTERN (gen_call_site ());
+  rtx tmp = gen_rtx_REG (SImode, R1_REG);
 
-  operands[3] =  gen_rtx_REG (SImode, R1_REG);
-
-  sh_expand_sym_label2reg (operands[3], operands[1], lab, true);
+  sh_expand_sym_label2reg (tmp, operands[1], lab, true);
   rtx i = 

Re: Bit-field struct member sign extension pattern results in redundant

2017-08-18 Thread Oleg Endo
On Fri, 2017-08-18 at 10:29 +1200, Michael Clark wrote:
> 
> This one is quite interesting:
> 
> - https://cx.rv8.io/g/WXWMTG
> 
> It’s another target independent bug. x86 is using some LEA followed
> by SAR trick with a 3 bit shift. Surely SHL 27, SAR 27 would suffice.
> In any case RISC-V seems like a nice target to try to fix this
> codegen for, as its less risk than attempting a fix in x86 ;-)
> 
> - https://github.com/riscv/riscv-gcc/issues/89
> 
> code:
> 
>   template 
>   inline T signextend(const T x)
>   {
>   struct {T x:B;} s;
>   return s.x = x;
>   }
> 
>   int sx5(int x) {
>   return signextend(x);
>   }
> 
> riscv asm:
> 
>   sx5(int):
>     slliw a0,a0,3
>     slliw a0,a0,24
>     sraiw a0,a0,24
>     sraiw a0,a0,3
>     ret
> 
> hand coded riscv asm
> 
>   sx5(int):
>     slliw a0,a0,27
>     sraiw a0,a0,27
>     ret
> 

Maybe related ...

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67644
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=50521


Cheers,
Oleg


Re: Overwhelmed by GCC frustration

2017-08-17 Thread Oleg Endo
On Wed, 2017-08-16 at 19:04 -0500, Segher Boessenkool wrote:
> 
> LRA is easier to work with than old reload, and that makes it better
> maintainable.
> 
> Making LRA handle everything reload did is work, and someone needs to
> do it.
> 
> LRA probably needs a few more target hooks (a _few_) to guide its
> decisions.

Like Georg-Johann mentioned before, LRA has been targeted mainly for
mainstream ISAs.  And actually it's a pretty reasonable choice.  Again,
I don't think that "one RA to rule them all" is a scalable approach.
But that's just my opinion.

Cheers,
Oleg


Re: Limit SH strncmp inline expansion (PR target/78460)

2017-08-16 Thread Oleg Endo
On Tue, 2017-08-15 at 23:44 +, Joseph Myers wrote:
> 
> > This is an older issue.  Please also add a reference to PR 67712 in
> > your commit.  Can you also apply it to GCC 6 branch please?

> I can't reproduce the problem with GCC 6 branch; the glibc testsuite 
> builds fine without out-of-memory issues, as does the test included
> in the  patch, despite the GCC code in question being essentially
> unchanged.  That's why the bug appears to me as a regression in GCC
> 7.

OK, thanks for the investigation!

Cheers,
Oleg


Re: Optimizing away deletion of null pointers with g++

2017-08-16 Thread Oleg Endo
On Wed, 2017-08-16 at 13:30 +0200, Paolo Carlini wrote:
> 
> I didn't understand why we don't already handle the easy case:
> 
> constexpr int* ptr = nullptr;
> delete ptr;
> 

What about overriding the global delete operator with some user defined
implementation?  Is there something in the C++ standard that says the
invocation can be completely omitted, i.e. on which side of the call
the nullptr check is being done?

One possible use case could be overriding the global delete operator to
count the number of invocations, incl. for nullptr.  Not sure how
useful that is though.
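
A minimal sketch of that use case, just to make it concrete.  The counter and the helper function are hypothetical names for this example.  The sketch only counts a delete of a non-null pointer; as far as I understand the standard, whether the deallocation function is called at all for a null operand is not something portable code can rely on, which is exactly the question above.

```cpp
#include <cstddef>
#include <cstdlib>
#include <new>

// Number of calls to the replaced global operator delete.
static unsigned long g_delete_calls = 0;

// Replace operator new as well, so that new/delete stay a matching pair.
void* operator new (std::size_t n)
{
  if (void* p = std::malloc (n ? n : 1))
    return p;
  throw std::bad_alloc ();
}

void operator delete (void* p) noexcept
{
  ++g_delete_calls;
  std::free (p);
}

// Sized variant (used by newer language versions) forwards to the
// unsized one, so each delete-expression is counted once.
void operator delete (void* p, std::size_t) noexcept
{
  ::operator delete (p);
}

// The volatile-qualified pointer keeps the compiler from pairing up the
// new and the delete and removing both.
int* volatile g_sink;

// Returns how many operator delete calls one 'delete' of a real object
// caused.
unsigned long deletes_for_one_object ()
{
  g_sink = new int (42);
  const unsigned long before = g_delete_calls;
  delete g_sink;
  return g_delete_calls - before;
}
```

If the compiler were to drop the call for a pointer it can prove to be null, a counter like this would simply see fewer invocations.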

Cheers,
Oleg


Re: Overwhelmed by GCC frustration

2017-08-16 Thread Oleg Endo
On Wed, 2017-08-16 at 15:53 +0200, Georg-Johann Lay wrote:
> 
> This means it's actually waste of time to work on these
> backends.  The code will finally end up in the dustbin as cc0
> backends are considered undesired ballast that has to be
> "jettisoned".
> 
> "Deprecate all cc0" is just a nice formulation of "deprecate
> most of the cc0 backends".
> 
> Just the fact that the backends that get most attention and attract
> most developers don't use cc0 doesn't mean cc0 is a useless device.

The desire to get rid of old, crusty and unmaintained stuff is somehow
understandable...


> First of all, LRA cannot cope with cc0 (Yes, I know deprecating
> cc0 is just to deprecate all non-LRA BEs).  LRA asserts that
> accessing the frame doesn't change condition code. LRA doesn't
> provide replacement for LEGITIMIZE_RELOAD_ADDRESS.  Hence LRA
> focuses just on comfortable, orthogonal targets.

It seems LRA is being praised so much, but all those niche BEs and
corner cases get zero support.  There are several known instances of SH
code regressions with LRA, and that's why I haven't switched it to
LRA. 

I think the problem is that it's very difficult to make a register
allocator that works well for everything.  The last attempt ended in
reload.  And eventually LRA will go down the same route.  So instead of
trying to fit a round peg in a square hole, maybe we should just have
the options for round and square pegs and holes.


Cheers,
Oleg


Re: Limit SH strncmp inline expansion (PR target/78460)

2017-08-15 Thread Oleg Endo
On Tue, 2017-08-15 at 21:15 +, Joseph Myers wrote:
> GCC mainline built for sh4-linux-gnu runs out of memory building a
> glibc test, which calls strncmp with very large constant size
> argument, resulting in the SH inline strncmp expansion trying to
> inline a fully unrolled expansion of strncmp for that size.
> 
> This patch limits that fully unrolled expansion to the case of less
> than 32 bytes.  This is explicitly *not* trying to be optimal in any
> way (very likely a lower threshold makes sense), just to limit enough
> to avoid the out-of-memory issue in the glibc testsuite.
> 
> I have *not* run the GCC testsuite for SH.  I have verified that this
> allows the glibc testsuite to build OK, with both GCC mainline and
> GCC 7 branch (and that the included test builds quickly with patched
> GCC, runs out of memory with unpatched GCC).
> 
> OK to commit (mainline and GCC 7 branch)?

Yes, that's OK.
This is an older issue.  Please also add a reference to PR 67712 in
your commit.  Can you also apply it to GCC 6 branch please?

Thanks!

Cheers,
Oleg


Re: [PATCH 6/6] qsort comparator consistency checking

2017-08-03 Thread Oleg Endo
On Thu, 2017-08-03 at 19:31 +0300, Alexander Monakov wrote:
> 
> My mistake, but the main point remains: STL uses 'sort' without the
> 'q'.

I think it'd be better if GCC's custom containers somewhat tried to
follow C++ standard container idioms.  Chopping off the 'q' is one step
towards it.

Cheers,
Oleg

