Re: [PATCH] Don't combine across likely spilled hard reg setters (PR rtl-optimization/59477)

2014-01-15 Thread Joern Rennecke
On 15 January 2014 21:55, Jakub Jelinek  wrote:
...
> The patch removes the likely_spilled_retval stuff Joern wrote a few years
> ago because I believe this change should obsolete that.  But, have tried to
> reproduce the ICE using sh-elf target and haven't managed to do so, 4.9 is
> quite a different compiler from 2005-ish gcc.

You wouldn't see an ICE, but wrong-code.
That being said, it sounds like your case 2) should handle this.


Re: [PATCH] Don't combine across likely spilled hard reg setters (PR rtl-optimization/59477)

2014-01-16 Thread Jakub Jelinek
On Thu, Jan 16, 2014 at 03:41:57AM +, Joern Rennecke wrote:
> On 15 January 2014 21:55, Jakub Jelinek  wrote:
> ...
> > The patch removes the likely_spilled_retval stuff Joern wrote a few years
> > ago because I believe this change should obsolete that.  But, have tried to
> > reproduce the ICE using sh-elf target and haven't managed to do so, 4.9 is
> > quite a different compiler from 2005-ish gcc.
> 
> You wouldn't see an ICE, but wrong-code.

I've mistyped it above, yeah, I was looking with your functions reverted
and without the rest of my patch if combiner would even attempt to combine
something into the second insn (loading fr1, after the fr0 set) and that
didn't happen on the trunk.

> That being said, it sounds like your case 2) should handle this.

Anyway, for all the changed *.o files I've done:
for i in `cat /tmp/M1`; do readelf -Ws $i | awk '($4=="FUNC" && 
$7!="UND"){print $8" "$3}' | sort > /tmp/1; readelf -Ws ../obj877/$i | awk 
'($4=="FUNC" && $7!="UND"){print $8" "$3}' | sort > /tmp/2; j=`cat /tmp/1 | wc 
-l`; k=`diff -up /tmp/1 /tmp/2 | grep -v '^+++' | grep '^+' | wc -l`; echo $i 
$k/$j; done
to see how many functions in those files actually changed because of this
patch and the numbers are below.  The numbers are smaller though, because
it only looks at function sizes, there could be code generation change even
in function that has the same size (e.g. if later alignment makes the
change not visible in the size).  For x86_64, the sums are that
5384 functions out of the 43856 symbols (in these ~ 8% of all *.o files)
changed function size.  That still looks like a big change.

I wonder if instead of not creating the LOG_LINKS we could just somehow taint
them, add a flag to struct insn_link that we need to be careful about
combining across that link, then in combine_instructions or the flags together
from all the used links and if try_combine is called with this
crosses_likely_spilled_setter_p
flag, if recog accepts the result look at all constraints on all operands of the
new insn and if for any constraint anywhere REG_CLASS_FROM_CONSTRAINT is likely
spilled, punt.  Thoughts on if this could be reliable?

i686:
gcc/builtins.o 0/201
gcc/gimple-fold.o 1/71
gcc/tree-sra.o 1/124
gcc/tree-ssa-structalias.o 0/164
gcc/tree-vect-stmts.o 0/128
gcc/tree-vrp.o 0/172
gcc/cp/error.o 0/61
i686-pc-linux-gnu/libcilkrts/.libs/cilk-abi-cilk-for.o 0/16
x86_64:
gcc/builtins.o 0/195
gcc/cfgexpand.o 0/112
gcc/cfghooks.o 0/56
gcc/cgraphunit.o 1/30
gcc/df-core.o 1/82
gcc/final.o 2/89
gcc/fold-const.o 1/144
gcc/ggc-common.o 2/45
gcc/gimple-fold.o 0/70
gcc/gimple-pretty-print.o 1/74
gcc/godump.o 3/18
gcc/haifa-sched.o 3/162
gcc/i386.o 1/500
gcc/ipa-inline.o 0/46
gcc/opts-common.o 1/18
gcc/opts.o 1/17
gcc/passes.o 7/120
gcc/reg-stack.o 0/44
gcc/sched-rgn.o 1/86
gcc/sel-sched-ir.o 1/238
gcc/toplev.o 0/27
gcc/tree-eh.o 0/138
gcc/tree-inline.o 1/111
gcc/tree-sra.o 1/121
gcc/tree-ssa-live.o 0/45
gcc/tree-ssa-phiprop.o 0/24
gcc/tree-ssa-sccvn.o 0/119
gcc/tree-ssa-structalias.o 0/165
gcc/tree-vect-stmts.o 1/129
gcc/tree-vrp.o 1/176
gcc/varasm.o 2/196
gcc/ada/ali-util.o 1/111
gcc/ada/errout.o 1/86
gcc/ada/lib-xref.o 1/93
gcc/ada/urealp.o 2/88
gcc/ada/utils.o 0/141
gcc/build/read-rtl.o 2/23
gcc/c/c-decl.o 1/160
gcc/c/c-objc-common.o 1/8
gcc/c-family/c-opts.o 0/20
gcc/c-family/c-pch.o 1/8
gcc/cp/cp-objcp-common.o 1/15
gcc/cp/optimize.o 0/6
gcc/fortran/expr.o 1/72
gcc/fortran/intrinsic.o 0/44
gcc/fortran/scanner.o 1/47
gcc/go/expressions.o 81/765
gcc/go/gogo.o 14/351
gcc/go/gogo-tree.o 5/64
gcc/go/go.o 1/6
gcc/go/import-archive.o 1/24
gcc/go/import.o 1/47
gcc/go/parse.o 4/148
gcc/go/runtime.o 3/11
gcc/go/statements.o 21/402
gcc/go/types.o 31/566
gcc/go/unsafe.o 0/5
x86_64-unknown-linux-gnu/boehm-gc/.libs/finalize.o 2/23
x86_64-unknown-linux-gnu/boehm-gc/.libs/pthread_support.o 3/36
x86_64-unknown-linux-gnu/boehm-gc/.libs/ptr_chck.o 1/9
x86_64-unknown-linux-gnu/libcilkrts/.libs/cilk-abi-cilk-for.o 2/15
x86_64-unknown-linux-gnu/libcilkrts/.libs/cilk_fiber-unix.o 0/15
x86_64-unknown-linux-gnu/libgfortran/.libs/environ.o 2/18
x86_64-unknown-linux-gnu/libgfortran/.libs/file_pos.o 1/4
x86_64-unknown-linux-gnu/libgfortran/.libs/inquire.o 1/2
x86_64-unknown-linux-gnu/libgfortran/.libs/intrinsics.o 2/34
x86_64-unknown-linux-gnu/libgfortran/.libs/list_read.o 1/22
x86_64-unknown-linux-gnu/libgfortran/.libs/open.o 2/4
x86_64-unknown-linux-gnu/libgfortran/.libs/transfer.o 3/49
x86_64-unknown-linux-gnu/libgfortran/.libs/unit.o 1/19
x86_64-unknown-linux-gnu/libgfortran/.libs/unix.o 1/60
x86_64-unknown-linux-gnu/libgo/.libs/bufio.o 1/70
x86_64-unknown-linux-gnu/libgo/.libs/bytes.o 2/87
x86_64-unknown-linux-gnu/libgo/.libs/expvar.o 0/49
x86_64-unknown-linux-gnu/libgo/.libs/flag.o 4/94
x86_64-unknown-linux-gnu/libgo/.libs/fmt.o 8/159
x86_64-unknown-linux-gnu/libgo/.libs/image.o 2/132
x86_64-unknown-linux-gnu/libgo/.libs/io.o 2/44
x86_64-unknown-linux-gnu/libgo/.libs/malloc.o 0/19
x86_64-unknown-linux-gnu/libgo/.libs/net.o 32/533
x86_64-unk

Re: [PATCH] Don't combine across likely spilled hard reg setters (PR rtl-optimization/59477)

2014-01-16 Thread Richard Biener
On Wed, 15 Jan 2014, Jakub Jelinek wrote:

> Hi!
> 
> As discussed in the PR, when combine combines something across a setter
> of a likely spilled non-fixed hard register, we may get RA ICE, as the
> combined insn might need for reload a hard reg that is live at the point
> where combiner has combined the insn.
> 
> The following patch attempts to fix that by not creating LOG_LINKS in
> certain cases.

Err - your description makes it sound like a register allocator / reload / 
LRA issue but you are papering over it in combine?

Vlad claims that reload or LRA cannot fix up here but surely splitting
the live range of the live-crossing hardreg twice and spilling it
must be possible, no?

>  There are two cases which the patch handles separately:
> 1) CALL_INSN and preceeding load_register_parameters sequence
>(from the call insn up to the earliest insn in the sequence that
>sets some likely spilled non-fixed hard register

that's PR59477, right?

> 2) other setters of likely spilled non-fixed hard register
>(loading of return value of a function into the return register(s),
>explicit register vars, ...)
> 
> 1) is handled specially, by disallowing LOG_LINKS from before the first
>insn into the sequence, because in between that first insn and
>the CALL_INSN a likely spilled hard register is live.  LOG_LINKS from
>before the sequence to after the CALL_INSN or from within the sequence
>to after the call or from within the sequence to within the sequence
>are allowed
> 2) other setters are handled just like a basic block boundary for LOG_LINKS,
>no LOG_LINKS are allowed to cross that boundary

It seems this is very conservative - the time combine runs you
only have pseudos in the insn that will eventually create the
conflict.  In fact,

(define_insn "*ashl3_1"
  [(set (match_operand:SWI48 0 "nonimmediate_operand" "=rm,r,r")
(ashift:SWI48 (match_operand:SWI48 1 "nonimmediate_operand" 
"0,l,rm")
  (match_operand:QI 2 "nonmemory_operand" 
"c,M,r")))
   (clobber (reg:CC FLAGS_REG))]
  "ix86_binary_operator_ok (ASHIFT, mode, operands)"

suggests that IRA can happily choose sth else than cx here even though
Vlad claims

"As r90 in inns 29 needs to be in cx and cx is already alive, neither 
reload nor LRA can do anything."

Other passes may have exactly the same issue - if live-ranges cannot
be split during IRA then two shift insns with overlapping shift
amount life-ranges will cause exactly the same problem, no?

Jakub says

I don't think this is a RA bug, the problem is I think that combine 
changes:
 (insn 20 19 21 2 (set (reg:DI 2 cx)
 (const_int 0 [0])) pr59477.C:20 62 {*movdi_internal_rex64}
  (nil))
 
-(insn 21 20 22 2 (set (reg:SI 37 r8)
-(subreg:SI (reg:DI 64) 0)) pr59477.C:20 64 {*movsi_internal}
- (expr_list:REG_DEAD (reg:DI 64)
-(nil)))
+(insn 21 20 22 2 (parallel [
+(set (reg:DI 37 r8)
+(ashift:DI (reg:DI 65)
+(subreg:QI (reg:SI 63 [ v ]) 0)))
+(clobber (reg:CC 17 flags))
+]) pr59477.C:20 503 {*ashldi3_1}
+ (expr_list:REG_UNUSED (reg:CC 17 flags)
+(expr_list:REG_DEAD (reg:SI 63 [ v ])
+(expr_list:REG_DEAD (reg:DI 65)
+(nil)

why is there a non-argument setup insn inbetween argument setup
and the call?  Having argument register constraints handled like

;; bar (_20, _20, _20, _20);

(insn 21 20 22 (set (reg:SI 2 cx)
(reg:SI 97 [ D.1780 ])) t.c:10 -1
 (nil))

(insn 22 21 23 (set (reg:SI 1 dx)
(reg:SI 97 [ D.1780 ])) t.c:10 -1
 (nil))

(insn 23 22 24 (set (reg:SI 4 si)
(reg:SI 97 [ D.1780 ])) t.c:10 -1
 (nil))

(insn 24 23 25 (set (reg:SI 5 di)
(reg:SI 97 [ D.1780 ])) t.c:10 -1
 (nil))

(call_insn 25 24 0 (call (mem:QI (symbol_ref:DI ("bar") [flags 0x41]  
) [0 bar S1 A8])

looks less than ideal - we get a live-range split which is
probably good for RA, but ultimatively there shouldn't be separate
insns involving hard regs before reload - if anything gets
between those insns this limits RA / reload freedom.  We
should be able to tell the RA that the call needs its inputs in
certain registers - just as we do with asm()s.

So, looking at what combine does:

Trying 18 -> 28:
Successfully matched this instruction:
(set (reg:SI 37 r8)
(subreg:SI (reg:DI 89 [ D.2308 ]) 0))

that's

(insn 18 17 20 2 (parallel [
(set (reg:DI 95)
(ior:DI (reg:DI 93)
(reg:DI 91 [ D.2309 ])))
(clobber (reg:CC 17 flags))
]) t.c:4 421 {*iordi_1}
 (expr_list:REG_DEAD (reg:DI 93)
(expr_list:REG_DEAD (reg:DI 91 [ D.2309 ])
(expr_list:REG_UNUSED (reg:CC 17 flags)
(nil)

(insn 28 27 29 2 (set (reg:SI 37 r8)
(subreg:SI (reg:DI 95) 0)) t.c:20 90 {*movsi_internal}
 (expr_list:REG_DEAD (reg:DI 95)
(nil)))

Trying 11 -> 28:
Successfully matched this instructio

Re: [PATCH] Don't combine across likely spilled hard reg setters (PR rtl-optimization/59477)

2014-01-16 Thread Joern Rennecke
On 16 January 2014 08:26, Jakub Jelinek  wrote:

> I wonder if instead of not creating the LOG_LINKS we could just somehow taint
> them, add a flag to struct insn_link that we need to be careful about
> combining across that link, then in combine_instructions or the flags together
> from all the used links and if try_combine is called with this
> crosses_likely_spilled_setter_p
> flag, if recog accepts the result look at all constraints on all operands of 
> the
> new insn and if for any constraint anywhere REG_CLASS_FROM_CONSTRAINT is 
> likely
> spilled, punt.  Thoughts on if this could be reliable?

You would miss spills from secondary/tertiary reloads.  OTOH, as
Richard already said,
what you do check is at the same time paranoidly conservative.

Since the failure mode here is an ICE, I think it would be appropriate
to use a target
hook to look at the live likely-spilled register and the combined
insn, and tell if
the combination is safe - and set the default not to worry.


Re: [PATCH] Don't combine across likely spilled hard reg setters (PR rtl-optimization/59477)

2014-01-20 Thread Eric Botcazou
> I think the problem is still either a missed feature in LRA/reload
> (split live-ranges), a problem in how we represent calls & argument
> setup (magic hard-reg uses), or a backend problem (should spill
> itself if necessary, via a post-reload splitter or always spill
> and undo via a peephole2).
>
> Of course papering over in combine might be the best at this
> stage.  So the above was just observations from the less experienced
> people in this area.

Yes, this is ultimately a RA issue, but an ancient and hard one as far as I 
understand so papering over it makes sense I think.  That being said, while 
Joern's machinery was IMO acceptable because relatively simple and localized, 
Jakub's looks more complicated and far-reaching (that's why I suggested to try 
to extend the existing machinery if possible) so I think that we ought to try 
something simpler first.

-- 
Eric Botcazou


Re: [PATCH] Don't combine across likely spilled hard reg setters (PR rtl-optimization/59477)

2014-01-20 Thread Vladimir Makarov

On 1/20/2014, 4:12 AM, Eric Botcazou wrote:

I think the problem is still either a missed feature in LRA/reload
(split live-ranges), a problem in how we represent calls & argument
setup (magic hard-reg uses), or a backend problem (should spill
itself if necessary, via a post-reload splitter or always spill
and undo via a peephole2).

Of course papering over in combine might be the best at this
stage.  So the above was just observations from the less experienced
people in this area.


Yes, this is ultimately a RA issue, but an ancient and hard one as far as I
understand so papering over it makes sense I think.  That being said, while
Joern's machinery was IMO acceptable because relatively simple and localized,
Jakub's looks more complicated and far-reaching (that's why I suggested to try
to extend the existing machinery if possible) so I think that we ought to try
something simpler first.



I've just started to work on fixing this in LRA.  But I am not going to 
work on fixing this in reload.  It would be just wasting enormous amount 
of time.




Re: [PATCH] Don't combine across likely spilled hard reg setters (PR rtl-optimization/59477)

2014-01-20 Thread Eric Botcazou
> I've just started to work on fixing this in LRA.  But I am not going to
> work on fixing this in reload.  It would be just wasting enormous amount
> of time.

Yes, fixing it in LRA only would be sufficient for PR rtl-optimization/59477, 
since it's an x86-specific issue.  Other architectures aren't as sensitive as 
x86 in this area, and the existing counter-measures in combine.c are usually 
sufficient for them.

-- 
Eric Botcazou