This is the i386 backend-specific piece of my revised patch for PR middle-end/98865, where Richard Biener has suggested that I perform the desired transformation during RTL expansion, where the backend can control whether it is profitable to convert a multiplication into a bit-wise AND and a negation.

This works well for x86_64, but alas exposes a latent bug with -m32, where a DImode multiplication incorrectly appears to be cheaper than negdi2+anddi3(!?). The fix to ix86_rtx_costs is to report that a DImode (multi-word) multiplication actually requires three SImode multiplications and two SImode additions. This also corrects the cost of TImode multiplication on TARGET_64BIT.
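
To illustrate where the "three multiplications and two additions" figure comes from, here is a minimal C sketch of a 64-bit multiply built from 32-bit halves. It is not code from GCC; the names mul64_via_32 and mul64_selfcheck are hypothetical, and the sketch only assumes the usual schoolbook decomposition, in which the product of the two high halves falls entirely above bit 63 and is dropped.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative only: multiply two 64-bit values using 32-bit operands,
   roughly what a 32-bit target has to do for a DImode multiply.  */
static uint64_t
mul64_via_32 (uint64_t a, uint64_t b)
{
  uint32_t a_lo = (uint32_t) a, a_hi = (uint32_t) (a >> 32);
  uint32_t b_lo = (uint32_t) b, b_hi = (uint32_t) (b >> 32);

  /* Multiplication 1: widening 32x32->64 product of the low halves.  */
  uint64_t low = (uint64_t) a_lo * b_lo;

  /* Multiplications 2 and 3: cross products; only their low 32 bits
     can reach the 64-bit result.  */
  uint32_t cross1 = a_lo * b_hi;
  uint32_t cross2 = a_hi * b_lo;

  /* Additions 1 and 2: fold both cross products into the high word.  */
  uint32_t high = (uint32_t) (low >> 32) + cross1 + cross2;

  return ((uint64_t) high << 32) | (uint32_t) low;
}

/* Hypothetical helper: compare against the native 64-bit multiply.  */
static void
mul64_selfcheck (void)
{
  assert (mul64_via_32 (3, 5) == 15);
  assert (mul64_via_32 (0x100000000ULL, 2) == 0x200000000ULL);
}
```

Calling mul64_selfcheck () from a test harness should pass silently; the same decomposition applies to a TImode multiply built from 64-bit halves.
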

This patch has been tested on x86_64-pc-linux-gnu with make bootstrap and make -k check, both with and without --target_board=unix{-m32}, with no new failures. This change avoids the need for a !ia32 target selector for the upcoming new test case gcc.target/i386/pr98865.c. Ok for mainline?

2022-05-17  Roger Sayle  <ro...@nextmovesoftware.com>

gcc/ChangeLog
        * config/i386/i386.cc (ix86_rtx_costs) [MULT]: When mode size
        is wider than word_mode, a multiplication costs three word_mode
        multiplications and two word_mode additions.

Thanks in advance,
Roger
--

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 86752a6..e8a2229 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -20634,7 +20634,17 @@ ix86_rtx_costs (rtx x, machine_mode mode, int outer_code_i, int opno,
               op0 = XEXP (op0, 0), mode = GET_MODE (op0);
             }
 
-          *total = (cost->mult_init[MODE_INDEX (mode)]
+          int mult_init;
+          // Double word multiplication requires 3 mults and 2 adds.
+          if (GET_MODE_SIZE (mode) > UNITS_PER_WORD)
+            {
+              mult_init = 3 * cost->mult_init[MODE_INDEX (word_mode)]
+                          + 2 * cost->add;
+              nbits *= 3;
+            }
+          else mult_init = cost->mult_init[MODE_INDEX (mode)];
+
+          *total = (mult_init
                     + nbits * cost->mult_bit
                     + rtx_cost (op0, mode, outer_code, opno, speed)
                     + rtx_cost (op1, mode, outer_code, opno, speed));
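
For readers who want to experiment with the arithmetic outside of GCC, the following standalone C sketch mirrors the shape of the new cost computation. It is a toy model under stated assumptions: struct toy_costs, toy_mult_cost and every number fed to them are hypothetical, the real code indexes cost->mult_init by machine mode, and the rtx_cost terms for the two operands are left out.

```c
#include <assert.h>

/* Hypothetical cost table.  Field names echo those visible in the
   patch, but this struct and any values used with it are made up.  */
struct toy_costs
{
  int add;            /* cost of a word-sized addition */
  int mult_init_word; /* start-up cost of a word-sized multiply */
  int mult_init_mode; /* table entry for the mode being costed */
  int mult_bit;       /* per-bit multiply cost term */
};

/* Toy version of the MULT cost: a multiply wider than a word is
   charged as three word multiplies plus two word additions, and the
   per-bit term is tripled; otherwise the table entry is used.  */
static int
toy_mult_cost (const struct toy_costs *cost, int mode_size,
               int units_per_word, int nbits)
{
  int mult_init;
  if (mode_size > units_per_word)
    {
      mult_init = 3 * cost->mult_init_word + 2 * cost->add;
      nbits *= 3;
    }
  else
    mult_init = cost->mult_init_mode;

  return mult_init + nbits * cost->mult_bit;
}
```

With made-up values add = 1, mult_init_word = 3, mult_init_mode = 3 and mult_bit = 1, an 8-byte multiply on a 4-byte-word target with nbits = 7 comes to 3*3 + 2*1 + 21*1 = 32, against 3 + 7 = 10 for a single-word multiply.
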