Hi Segher,
On 04/11/15 23:50, Segher Boessenkool wrote:
Hi Kyrill,
On Mon, Nov 02, 2015 at 02:15:27PM +0000, Kyrill Tkachov wrote:
This patch attempts to restrict combine from transforming ZERO_EXTEND
and SIGN_EXTEND operations into and-bitmask and weird SUBREG expressions
when they appear inside MULT expressions. This is because a MULT rtx
containing these extend operations is usually a well-understood widening
multiply operation.
Right. I have often wanted combine to try plain substitution as well
as substitution-and-simplification: simplification (which uses
nonzero_bits etc.) often makes a mess of even moderately complex
instructions. But since we do not have that yet, and your 24% number
is nicely impressive, let's try to do no simplification for widening
mults, as in your patch.
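[For illustration, an editorial sketch of the widening-multiply shape
under discussion; the function name is hypothetical:

/* On a 64-bit target this expands to
   (mult:DI (sign_extend:DI (reg:SI ...))
            (sign_extend:DI (reg:SI ...)))
   which should match aarch64's smull pattern directly.  */
long long
smull_c (int a, int b)
{
  return (long long) a * b;
}
]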
With this patch on aarch64 I saw increased generation of smaddl and
umaddl instructions in places where the combination of smull and add
instructions had previously been rejected because the extend operands
were being transformed into these weird equivalent expressions.
Overall, I see 24% more [su]maddl instructions generated for SPEC2006,
with no performance regressions.
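[Editorial sketch of the kind of source that benefits; the function
name is hypothetical:

/* With the extends preserved through combine, the smull and the add
   should now fuse into a single smaddl (umaddl for the unsigned
   variant).  */
long long
smaddl_c (long long acc, int a, int b)
{
  return acc + (long long) a * b;
}
]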
The testcase in the patch is the smallest one I could produce that
demonstrates the issue I'm trying to solve.
Does this approach look ok?
In broad lines it does. Could you try this patch instead? Not tested
etc. (other than building an aarch64 cross and your test case); I'll do
that tomorrow if you like the result.
Thanks, that looks less intrusive. I did try it out on arm and
aarch64. It does work on the aarch64 testcase. However, there's also
a correctness regression; I'll try to explain inline...
Segher
diff --git a/gcc/combine.c b/gcc/combine.c
index c3db2e0..3bf7ffb 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -5284,6 +5284,15 @@ subst (rtx x, rtx from, rtx to, int in_dest, int in_cond, int unique_copy)
|| GET_CODE (SET_DEST (x)) == PC))
fmt = "ie";
+  /* Substituting into the operands of a widening MULT is not likely
+     to create RTL matching a machine insn.  */
+  if (code == MULT
+      && (GET_CODE (XEXP (x, 0)) == ZERO_EXTEND
+          || GET_CODE (XEXP (x, 0)) == SIGN_EXTEND)
+      && (GET_CODE (XEXP (x, 1)) == ZERO_EXTEND
+          || GET_CODE (XEXP (x, 1)) == SIGN_EXTEND))
+    return x;
I think we should also add:
&& REG_P (XEXP (XEXP (x, 0), 0))
&& REG_P (XEXP (XEXP (x, 1), 0))
to the condition. Otherwise I've seen regressions in the arm
testsuite, in particular the gcc.target/arm/smlatb-1.s test, which
tries to match the pattern:
(define_insn "*maddhisi4tb"
  [(set (match_operand:SI 0 "s_register_operand" "=r")
        (plus:SI (mult:SI (ashiftrt:SI
                           (match_operand:SI 1 "s_register_operand" "r")
                           (const_int 16))
                          (sign_extend:SI
                           (match_operand:HI 2 "s_register_operand" "r")))
                 (match_operand:SI 3 "s_register_operand" "r")))]
There we have a sign_extend of a shift that we want to convert to the form
that the pattern expects. So adding the checks for REG_P fixes that for me.
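For reference, a sketch of the subst hunk with those checks folded in
(an editorial reconstruction of the suggestion, not the final patch):

  /* Substituting into the operands of a widening MULT is not likely
     to create RTL matching a machine insn, but only back off when
     both extends wrap plain registers.  */
  if (code == MULT
      && (GET_CODE (XEXP (x, 0)) == ZERO_EXTEND
          || GET_CODE (XEXP (x, 0)) == SIGN_EXTEND)
      && REG_P (XEXP (XEXP (x, 0), 0))
      && (GET_CODE (XEXP (x, 1)) == ZERO_EXTEND
          || GET_CODE (XEXP (x, 1)) == SIGN_EXTEND)
      && REG_P (XEXP (XEXP (x, 1), 0)))
    return x;

That way a sign_extend of something more complex, like the shift
above, can still be simplified into the form the pattern expects.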
For the correctness issue I saw on aarch64, the shortest case I could reduce is:
short int a[16], b[16];
void
f5 (void)
{
  a[0] = b[0] / 14;
}
We synthesise the division and lose one of the intermediate immediates.
The good codegen is:
f5:
adrp x1, b
mov w0, 9363 // <--- This gets lost!
movk w0, 0x9249, lsl 16 // <--- This gets lost!
adrp x2, a
ldrsh w1, [x1, #:lo12:b]
smull x0, w1, w0
lsr x0, x0, 32
add w0, w1, w0
asr w0, w0, 3
sub w0, w0, w1, asr 31
strh w0, [x2, #:lo12:a]
ret
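[Editorial note for context: the mov/movk pair materialises
0x92492493, i.e. -1840700269, the magic multiplier for signed division
by 14. The synthesised sequence corresponds to this C sketch (function
name hypothetical), so losing the constant leaves w0 undefined at the
smull:

/* Signed division by 14 via multiply-high; each step maps onto one
   instruction in the good sequence above.  */
int
div14 (int n)
{
  int q = (int) (((long long) n * -1840700269) >> 32); /* smull; lsr 32 */
  q += n;                      /* add w0, w1, w0 */
  q >>= 3;                     /* asr w0, w0, 3 */
  q += (unsigned int) n >> 31; /* sub w0, w0, w1, asr 31 */
  return q;
}
]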
The bad one with this patch is:
adrp x1, b
adrp x2, a
ldrsh w1, [x1, #:lo12:b]
smull x0, w1, w0
lsr x0, x0, 32
add w0, w1, w0
asr w0, w0, 3
sub w0, w0, w1, asr 31
strh w0, [x2, #:lo12:a]
ret
The problematic hunk there is the subst hunk. In the expression 'x':
(mult:DI (sign_extend:DI (reg:SI 80 [ b ]))
         (sign_extend:DI (reg:SI 82)))
it tries to substitute 'from': (reg:SI 82)
with 'to': (const_int -1840700269 [0xffffffff92492493]).
Since we now return just 'x', combine thinks that the substitution
succeeded and eliminates the immediate move.
Is there a way that subst can signal some kind of "failed to
substitute" result? If not, I got it to work by using that check to
set the in_dest parameter for the subsequent recursive call to subst,
in a similar way to my original patch, but I was hoping to avoid
overloading the meaning of in_dest.
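[An editorial sketch of that workaround, assuming in_dest suppresses
the simplification in the recursive subst calls; not the exact patch:

  /* Instead of returning X, note that we are inside a widening MULT
     so substitution into the operands proceeds without the
     simplification that breaks the extends.  */
  if (code == MULT
      && (GET_CODE (XEXP (x, 0)) == ZERO_EXTEND
          || GET_CODE (XEXP (x, 0)) == SIGN_EXTEND)
      && (GET_CODE (XEXP (x, 1)) == ZERO_EXTEND
          || GET_CODE (XEXP (x, 1)) == SIGN_EXTEND))
    in_dest = 1;
]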
Thanks,
Kyrill
+
/* Get the mode of operand 0 in case X is now a SIGN_EXTEND of a
constant. */
if (fmt[0] == 'e')
@@ -8455,6 +8464,15 @@ force_to_mode (rtx x, machine_mode mode, unsigned HOST_WIDE_INT mask,
/* ... fall through ... */
case MULT:
+      /* Substituting into the operands of a widening MULT is not likely to
+         create RTL matching a machine insn.  */
+      if (code == MULT
+          && (GET_CODE (XEXP (x, 0)) == ZERO_EXTEND
+              || GET_CODE (XEXP (x, 0)) == SIGN_EXTEND)
+          && (GET_CODE (XEXP (x, 1)) == ZERO_EXTEND
+              || GET_CODE (XEXP (x, 1)) == SIGN_EXTEND))
+        return gen_lowpart_or_truncate (mode, x);
+
/* For PLUS, MINUS and MULT, we need any bits less significant than the
most significant bit in MASK since carries from those bits will
affect the bits we are interested in. */