Re: Add a new combine pass

Richard Sandiford Mon, 25 Nov 2019 13:41:34 -0800

Segher Boessenkool <seg...@kernel.crashing.org> writes:
> Hi!
>
> On Mon, Nov 18, 2019 at 05:55:13PM +0000, Richard Sandiford wrote:
>> Richard Sandiford <richard.sandif...@arm.com> writes:
>> > (It's 23:35 local time, so it's still just about stage 1. :-))
>> 
>> Or actually, just under 1 day after end of stage 1.  Oops.
>> Could have sworn stage 1 ended on the 17th :-(  Only realised
>> I'd got it wrong when catching up on Saturday's email traffic.
>> 
>> And inevitably, I introduced a couple of stupid mistakes while
>> trying to clean the patch up for submission by that (non-)deadline.
>> Here's a version that fixes an inverted overlapping memref check
>> and that correctly prunes the use list for combined instructions.
>> (This last one is just a compile-time saving -- the old code was
>> correct, just suboptimal.)
>
> I've build the Linux kernel with the previous version, as well as this
> one.  R0 is unmodified GCC, R1 is the first patch, R2 is this one:
>
> (I've forced --param=run-combine=6 for R1 and R2):
> (Percentages are relative to R0):
>
>                     R0        R1        R2        R1        R2
>        alpha   6107088   6101088   6101088   99.902%   99.902%
>          arc   4008224   4006568   4006568   99.959%   99.959%
>          arm   9206728   9200936   9201000   99.937%   99.938%
>        arm64  13056174  13018174  13018194   99.709%   99.709%
>        armhf         0         0         0         0         0
>          c6x   2337237   2337077   2337077   99.993%   99.993%
>         csky   3356602         0         0         0         0
>        h8300   1166996   1166776   1166776   99.981%   99.981%
>         i386  11352159         0         0         0         0
>         ia64  18230640  18167000  18167000   99.651%   99.651%
>         m68k   3714271         0         0         0         0
>   microblaze   4982749   4979945   4979945   99.944%   99.944%
>         mips   8499309   8495205   8495205   99.952%   99.952%
>       mips64   7042036   7039816   7039816   99.968%   99.968%
>        nds32   4486663         0         0         0         0
>        nios2   3680001   3679417   3679417   99.984%   99.984%
>     openrisc   4226076   4225868   4225868   99.995%   99.995%
>       parisc   7681895   7680063   7680063   99.976%   99.976%
>     parisc64   8677077   8676581   8676581   99.994%   99.994%
>      powerpc  10687611  10682199  10682199   99.949%   99.949%
>    powerpc64  17671082  17658570  17658570   99.929%   99.929%
>  powerpc64le  17671082  17658570  17658570   99.929%   99.929%
>      riscv32   1554938   1554758   1554758   99.988%   99.988%
>      riscv64   6634342   6632788   6632788   99.977%   99.977%
>         s390  13049643  13014939  13014939   99.734%   99.734%
>           sh   3254743         0         0         0         0
>      shnommu   1632364   1632124   1632124   99.985%   99.985%
>        sparc   4404993   4399593   4399593   99.877%   99.877%
>      sparc64   6796711   6797491   6797491  100.011%  100.011%
>       x86_64  19713174  19712817  19712817   99.998%   99.998%
>       xtensa         0         0         0         0         0


Thanks for running these.

> There are fivew new failures, with either of the combine2 patches.  And
> all five are actually different (different symptoms, at least):
>
> - csky fails on libgcc build:
>
> /home/segher/src/gcc/libgcc/fp-bit.c: In function '__fixdfsi':
> /home/segher/src/gcc/libgcc/fp-bit.c:1405:1: error: unable to generate 
> reloads for:
>  1405 | }
>       | ^
> (insn 199 86 87 8 (parallel [
>             (set (reg:SI 101)
>                 (plus:SI (reg:SI 98)
>                     (const_int -32 [0xffffffffffffffe0])))
>             (set (reg:CC 33 c)
>                 (lt:CC (plus:SI (reg:SI 98)
>                         (const_int -32 [0xffffffffffffffe0]))
>                     (const_int 0 [0])))
>         ]) "/home/segher/src/gcc/libgcc/fp-bit.c":1403:23 207 {*cskyv2_declt}
>      (nil))
> during RTL pass: reload
>
> Target problem?

Yeah, looks like it.  The pattern is:

(define_insn "*cskyv2_declt"
  [(set (match_operand:SI 0 "register_operand" "=r")
        (plus:SI (match_operand:SI 1 "register_operand" "r")
                 (match_operand:SI 2 "const_int_operand" "Uh")))
   (set (reg:CC CSKY_CC_REGNUM)
        (lt:CC (plus:SI (match_dup 1) (match_dup 2))
               (const_int 0)))]
  "CSKY_ISA_FEATURE (2E3)"
  "declt\t%0, %1, %M2"
)

So the predicate accepts all const_ints but the constraint doesn't.

> - i386 goes into an infinite loop compiling, or at least an hour or so...
> Erm I forgot too record what it was compiling.  I did attach a GDB...  It
> is something from lra_create_live_ranges.

Hmm.

> - m68k:
>
> /home/segher/src/kernel/fs/exec.c: In function 'copy_strings':
> /home/segher/src/kernel/fs/exec.c:590:1: internal compiler error: in 
> final_scan_insn_1, at final.c:3048
>   590 | }
>       | ^
> 0x10408307 final_scan_insn_1
>         /home/segher/src/gcc/gcc/final.c:3048
> 0x10408383 final_scan_insn(rtx_insn*, _IO_FILE*, int, int, int*)
>         /home/segher/src/gcc/gcc/final.c:3152
> 0x10408797 final_1
>         /home/segher/src/gcc/gcc/final.c:2020
> 0x104091f7 rest_of_handle_final
>         /home/segher/src/gcc/gcc/final.c:4658
> 0x104091f7 execute
>         /home/segher/src/gcc/gcc/final.c:4736
>
> and that line is
>             gcc_assert (prev_nonnote_insn (insn) == last_ignored_compare);

Ah, this'll be while m68k was still a cc0 target.  Yeah, I should probably
just skip the whole pass for cc0.

> - nds32:
>
> /tmp/ccC8Czca.s: Assembler messages:
> /tmp/ccC8Czca.s:3144: Error: Unrecognized operand/register, lmw.bi 
> [$fp+(-60)],[$fp],$r11,0x0.
>
> /tmp/ccl8o20c.s: Assembler messages:
> /tmp/ccl8o20c.s:2449: Error: Unrecognized operand/register, lmw.bi 
> $r9,[$fp],[$fp+(-132)],0x0.
>
> /tmp/ccZxjwHd.s: Assembler messages:
> /tmp/ccZxjwHd.s:4776: Error: Unrecognized operand/register, lmw.bi 
> [$fp+(-52)],[$fp],[$fp+(-56)],0x0.
>
> /tmp/cczjOS3d.s: Assembler messages:
> /tmp/cczjOS3d.s:2336: Error: Unrecognized operand/register, lmw.bi 
> $r16,[$fp],$r7,0x0.
>
> and more.  All lmw.bi...  target issue?

Yeah, looks like it wasn't expecting this pattern to be generated
automatically before RA, so it doesn't have constraints (and probably
couldn't, since the registers need to be consecutive).

> - sh (that's sh4-linux):
>
> /home/segher/src/kernel/net/ipv4/af_inet.c: In function 'snmp_get_cpu_field':
> /home/segher/src/kernel/net/ipv4/af_inet.c:1638:1: error: unable to find a 
> register to spill in class 'R0_REGS'
>  1638 | }
>       | ^
> /home/segher/src/kernel/net/ipv4/af_inet.c:1638:1: error: this is the insn:
> (insn 18 17 19 2 (set (reg:SI 0 r0)
>         (mem:SI (plus:SI (reg:SI 4 r4 [178])
>                 (reg:SI 6 r6 [171])) [17 *_3+0 S4 A32])) 
> "/home/segher/src/kernel/net/ipv4/af_inet.c":1638:1 188 {movsi_i}
>      (expr_list:REG_DEAD (reg:SI 4 r4 [178])
>         (expr_list:REG_DEAD (reg:SI 6 r6 [171])
>             (nil))))
> /home/segher/src/kernel/net/ipv4/af_inet.c:1638: confused by earlier errors, 
> bailing out

Would have to look more at this one.  Seems odd that it can't allocate
R0 when it's already the destination and when R0 can't be live before
the insn.  But there again, this is reload, so my enthuasiasm for looking
is a bit limited :-)

> Looking at just binary size, which is a good stand-in for how many insns
> it combined:
>
>                     R2
>        arm64   99.709%
>         ia64   99.651%
>         s390   99.734%
>        sparc   99.877%
>      sparc64  100.011%
>
> (These are those that are not between 99.9% and 100.0%).
>
> So only sparc64 regressed, and just a tiny bit (I can look at what that
> is, if there is interest).  But 32-bit sparc improved, and s390, arm64,
> and ia64 got actual benefit.
>
> Again this is just code size, not analysing the actually changed code.

OK.  Certainly not an earth-shattering improvement then, but not
entirely worthless either.

> I did look at the powerpc64le changes.  It is almost completely load-
> with-update (and store-with-update) insns that make the difference, but
> there are also some dot insns.  The extra mr. are usually not a good
> idea, but the extsw. are.  Sometimes this causes *more* insns in the end
> (register move insns), but that is the exception.
>
> This mr. problem is there with combine already, btw.  In the end it is
> caused by this just not being something good to do on pseudos, it would
> be better to do this after RA, in a peephole or similar.  OTOH it isn't
> actually really important for performance either way.
>
> Btw, does the new pass use TARGET_LEGITIMATE_COMBINED_INSN?  It probably
> should.  (That would be the hook where we would probably want to prevent
> generating mr. insns).

No, it doesn't use that yet, but I agree it should.  Will fix.

I see combine also tests cannot_copy_insn_p.  I'm not sure whether that's
appropriate for the new pass or not.  Arguably it's not copying the
instruction, it's just moving it to be in parallel with something else.
(But then that's largely true of the combine case too.)

Thanks,
Richard

Re: Add a new combine pass

Reply via email to