Hello, It seems to me that tree balancing risk of producing wrong result due to overflow of subexpression.
Say a = INT_MIN, b = 10, c = 10, d = INT_MAX. If ((a + b) + c) + d)) becomes ((a + b) + (c + d)) c + d will overflow and the original won't. So the behaviour of two are different. Though the architecture may manage to produce correct result, it is undefined I think. Cheers, Bingfeng > -----Original Message----- > From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On > Behalf Of Ian Bolton > Sent: 20 November 2009 15:05 > To: gcc@gcc.gnu.org > Subject: Worth balancing the tree before scheduling? > > From some simple experiments (see below), it appears as > though GCC aims > to > create a lop-sided tree when there are constants involved > (func1 below), > but a balanced tree when there aren't (func2 below). > > Our assumption is that GCC likes having constants all near to > each other > to > aid with tree-based optimisations, but I'm fairly sure that, when it > comes > to scheduling, it would be better to have a balanced tree, so > sched has > more > choices about what to schedule next? > > The impact of limiting sched's options can be seen if we look at the > pseudo-assembly produced by GCC for our architecture: > > func1: > LSHIFT $c3,$c1,3 # tmp137, a, > ADD $c2,$c2,$c3 # tmp138, b, tmp137 > ADD $c1,$c2,$c1 #, tmp138, a > > We think it would be better to avoid using the temporary: > > func1: > ADD $c2,$c1,$c2 # tmp137, a, b > LSHIFT $c1,$c1,3 # tmp138, a, > ADD $c1,$c2,$c1 # <result>, tmp137, tmp138 > > As it currently stands, sched doesn't have the option to do > this because > its input (shown in func.c.172r.asmcons below) is arranged > such that the > first add depends on the shift and the second add depends on the first > add. > > If the tree were balanced, sched would have the option to do the add > first. > And, providing the logic was there in sched, we could make it > choose to > schedule such that we limit the number of temporaries used. > > Maybe one of the RTL passes prior to scheduling has the potential to > balance the tree/RTL, but just isn't on our architecture? > > ====================================== > func.c: > -------------------------------------- > int func1 (int a, int b) > { > /* the original expression */ > return a + b + (a << 3); > } > > > int func2 (int a, int b, int c) > { > /* the original expression */ > return a + b + (a << c); > } > > > ====================================== > > ====================================== > func.c.129t.supress_extend: > -------------------------------------- > ;; Function func1 (func1) > > func1 (int a, int b) > { > <bb 2>: > return (b + (a << 3)) + a; > } > > func2 (int a, int b, int c) > { > <bb 2>: > return (b + a) + (a << c); > } > > > ====================================== > > ====================================== > func.c.172r.asmcons: > -------------------------------------- > > ;; Function func1 (func1) > > ;; Pred edge ENTRY [100.0%] (fallthru) > (note 5 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK) > > (insn 2 5 3 2 func.c:2 (set (reg/v:SI 134 [ a ]) > (reg:SI 1 $c1 [ a ])) 45 {*movsi} (expr_list:REG_DEAD > (reg:SI 1 > $c1 [ a ]) > (nil))) > > (note 3 2 4 2 NOTE_INSN_DELETED) > > (note 4 3 7 2 NOTE_INSN_FUNCTION_BEG) > > (insn 7 4 8 2 func.c:2 (set (reg:SI 137) > (ashift:SI (reg/v:SI 134 [ a ]) > (const_int 3 [0x3]))) 80 {ashlsi3} (nil)) > > (insn 8 7 9 2 func.c:2 (set (reg:SI 138) > (plus:SI (reg:SI 2 $c2 [ b ]) > (reg:SI 137))) 65 {*addsi3} (expr_list:REG_DEAD > (reg:SI 137) > (expr_list:REG_DEAD (reg:SI 2 $c2 [ b ]) > (nil)))) > > > (note 9 8 14 2 NOTE_INSN_DELETED) > > > (insn 14 9 20 2 func.c:5 (set (reg/i:SI 1 $c1) > (plus:SI (reg:SI 138) > (reg/v:SI 134 [ a ]))) 65 {*addsi3} (expr_list:REG_DEAD > (reg:SI 138) > (expr_list:REG_DEAD (reg/v:SI 134 [ a ]) > (nil)))) > > > (insn 20 14 0 2 func.c:5 (use (reg/i:SI 1 $c1)) -1 (nil)) > > ;; Function func2 (func2) > > ;; Pred edge ENTRY [100.0%] (fallthru) > (note 6 1 2 2 [bb 2] NOTE_INSN_BASIC_BLOCK) > > > (insn 2 6 3 2 func.c:8 (set (reg/v:SI 134 [ a ]) > (reg:SI 1 $c1 [ a ])) 45 {*movsi} (expr_list:REG_DEAD > (reg:SI 1 > $c1 [ a ]) > (nil))) > > > (note 3 2 4 2 NOTE_INSN_DELETED) > > > (note 4 3 5 2 NOTE_INSN_DELETED) > > > (note 5 4 8 2 NOTE_INSN_FUNCTION_BEG) > > > (insn 8 5 9 2 func.c:8 (set (reg:SI 138) > (plus:SI (reg:SI 2 $c2 [ b ]) > (reg/v:SI 134 [ a ]))) 65 {*addsi3} (expr_list:REG_DEAD > (reg:SI 2 $c2 [ b ]) > (nil))) > > > (insn 9 8 10 2 func.c:8 (set (reg:SI 139) > (ashift:SI (reg/v:SI 134 [ a ]) > (reg:SI 3 $c3 [ c ]))) 80 {ashlsi3} (expr_list:REG_DEAD > (reg/v:SI 134 [ a ]) > (expr_list:REG_DEAD (reg:SI 3 $c3 [ c ]) > (nil)))) > > > (note 10 9 15 2 NOTE_INSN_DELETED) > > > (insn 15 10 21 2 func.c:11 (set (reg/i:SI 1 $c1) > (plus:SI (reg:SI 138) > (reg:SI 139))) 65 {*addsi3} (expr_list:REG_DEAD > (reg:SI 139) > (expr_list:REG_DEAD (reg:SI 138) > (nil)))) > > > (insn 21 15 0 2 func.c:11 (use (reg/i:SI 1 $c1)) -1 (nil)) > > ====================================== > > Cheers, > Ian > >