On Thu, Jun 29, 2017 at 1:20 PM, Wilco Dijkstra <wilco.dijks...@arm.com> wrote:
> Richard Biener wrote:
>> Hurugalawadi, Naveen wrote:
>> > The code (m1 > m2) * d code should be optimized as m1> m2 ? d : 0.
>
>> What's the reason of this transform?  I expect that the HW multiplier
>> is quite fast given one operand is either zero or one and a multiplication
>> is a gimple operation that's better handled in optimizations than
>> COND_EXPRs which eventually expand to conditional code which
>> would be much slower.
>
> Even really fast multipliers have several cycles latency, and this is
> generally fixed irrespectively of the inputs. Maybe you were thinking
> about division?
>
> Additionally integer multiply typically has much lower throughput than other
> ALU operations like conditional move - a modern CPU may have 4 ALUs
> but only 1 multiplier, so removing redundant integer multiplies is always
> good.
>
> Note (m1 > m2) is also a conditional expression which will result in branches
> for floating point expressions and on some targets even for integers. Moving
> the multiply into the conditional expression generates the best code:
>
> Integer version:
> f1:
>         cmp     w0, 100
>         csel    w0, w1, wzr, gt
>         ret
> f2:
>         cmp     w0, 100
>         cset    w0, gt
>         mul     w0, w0, w1
>         ret
>
> Float version:
> f3:
>         movi    v1.2s, #0
>         cmp     w0, 100
>         fcsel   s0, s0, s1, gt
>         ret
> f4:
>         cmp     w0, 100
>         bgt     .L8
>         movi    v1.2s, #0
>         fmul    s0, s0, s1  // eh???
> .L8:
>         ret

But then

int f (int m, int c)
{
  return (m & 1) * c;
}

int g (int m, int c)
{
  if (m & 1 != 0)
    return c;
  return 0;
}

f:
.LFB0:
        .cfi_startproc
        andl    $1, %edi
        movl    %edi, %eax
        imull   %esi, %eax
        ret

g:
.LFB1:
        .cfi_startproc
        movl    %edi, %eax
        andl    $1, %eax
        cmovne  %esi, %eax
        ret

anyway.  As a general comment to the patch please do it as a pattern in
match.pd

(match boolean_value_range_p
 @0
 (if (INTEGRAL_TYPE_P (type)
      && TYPE_PRECISION (type) == 1)))
(match boolean_value_range_p
 INTEGER_CST
 (if (integer_zerop (t) || integer_onep (t))))
(match boolean_value_range_p
 SSA_NAME
 (if (INTEGRAL_TYPE_P (type)
      && ~get_nonzero_bits (t) == 1)))

(simplify
 (mult:c boolean_value_range_p@0 @1)
 (cond @0 @1 @0))

or something like that.

Richard.

> Wilco
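
For reference, a minimal C sketch of the two source forms being compared in
this thread, written so the value equivalence can be checked directly.  The
function names (mul_cmp, sel_cmp, mul_bit, sel_bit, count_mismatches) are made
up for illustration and do not come from the patch; the sketch only shows that
both forms compute the same result when one operand is 0 or 1, and says nothing
about which one a given target compiles better.

```c
#include <assert.h>
#include <limits.h>

/* Multiply form from the thread: (m1 > m2) * d.  */
static int mul_cmp (int m1, int m2, int d) { return (m1 > m2) * d; }

/* Conditional form the transform would produce: m1 > m2 ? d : 0.  */
static int sel_cmp (int m1, int m2, int d) { return m1 > m2 ? d : 0; }

/* Same pair for an operand whose only possibly-nonzero bit is bit 0.  */
static int mul_bit (int m, int c) { return (m & 1) * c; }
static int sel_bit (int m, int c) { return (m & 1) ? c : 0; }

/* Count the sample inputs on which the two forms disagree.
   The multiplier is always 0 or 1, so no signed overflow can occur.  */
static int count_mismatches (void)
{
  static const int samples[] =
    { INT_MIN, -101, -1, 0, 1, 2, 100, 101, INT_MAX };
  const int n = sizeof samples / sizeof samples[0];
  int bad = 0;

  for (int i = 0; i < n; i++)
    for (int j = 0; j < n; j++)
      {
        int a = samples[i], b = samples[j];
        bad += mul_bit (a, b) != sel_bit (a, b);
        for (int k = 0; k < n; k++)
          bad += mul_cmp (a, b, samples[k]) != sel_cmp (a, b, samples[k]);
      }
  return bad;
}

/* Hypothetical self-check helper; aborts if any sample disagrees.  */
static void check_equivalence (void)
{
  assert (count_mismatches () == 0);
}
```

Calling check_equivalence () from a test driver is enough to exercise it.
Compiling each pair with -O2 -S on the target of interest shows whether the
multiply or the conditional select wins there.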