On Fri, Feb 26, 2016 at 7:50 AM, James Greenhalgh <james.greenha...@arm.com> wrote: > On Thu, Feb 25, 2016 at 09:25:45AM +0000, Kyrill Tkachov wrote: >> Hi all, >> >> In this wrong-code PR we get bad code when synthesising a TImode right shift >> by variable amount using DImode shifts during expand. >> >> The expand_double_word_shift function expands two paths: one where the >> variable amount is greater than GET_MODE_BITSIZE (DImode) (word_mode for >> aarch64) at runtime >> and one where it's less than that, and performs a conditional select between >> the two in the end. >> >> The second path is the one that goes wrong here. >> The algorithm it uses for calculating a TImode shift when the variable shift >> amount is <64 is: >> >> >> ------DImode----- ------DImode----- ------DImode----- ------DImode----- >> { |__ target-hi ___| |___ target-lo ___| } = { |___ source-hi ___| |___ >> source-lo ___| } >> var_amnt >> >> 1) carry = source-hi << 1 >> 2) tmp = ~var_amnt // either that or BITS_PER_WORD - 1 - var_amnt depending >> on shift_truncation_mask >> 3) carry = carry << tmp // net effect is that carry is source-hi shifted >> left by BITS_PER_WORD - var_amnt >> 4) target-lo = source-lo >>u var_amnt //unsigned shift. >> 5) target-lo = target-lo | carry >> 6) target-hi = source-hi >> var_amnt >> >> where the two DImode words source-hi and source-lo are the two words of the >> source TImode value and var_amnt is the register holding the variable shift >> amount. This is performed by the expand_subword_shift function in optabs.c. >> >> Step 2) is valid only if the target truncates the shift amount by the width >> of the type its shifting, that is SHIFT_COUNT_TRUNCATED is true and >> TARGET_SHIFT_TRUNCATION_MASK is 63 in this case. >> >> Step 3) is the one that goes wrong. On aarch64 a left shift is usually >> implemented using the LSL instruction but we also have alternatives that can >> use the SIMD registers and the USHL instruction. In this case we end up >> using the USHL instruction. However, although the USHL instruction does >> a DImode shift, it doesn't truncate the shift amount by 64, but rather by >> 255. Furthermore, the shift amount is interpreted as a 2's complement signed >> integer and if it's negative performs a right shift. This is in contrast >> with the normal LSL instruction which just performs an unsigned shift by an >> amount truncated by 64. >> >> Now, on aarch64 SHIFT_COUNT_TRUNCATED is defined as !TARGET_SIMD, so we don't >> assume shift amounts are truncated unless we know we can only ever use the >> LSL instructions for variable shifts. >> >> However, we still return 63 as the TARGET_SHIFT_TRUNCATION_MASK for DImode, >> so expand_subword_shift assumes that since the mask is non-zero it's a valid >> shift truncation mask. The documentation for TARGET_SHIFT_TRUNCATION_MASK >> says: >> " The default implementation of this function returns >> @code{GET_MODE_BITSIZE (@var{mode}) - 1} if @code{SHIFT_COUNT_TRUNCATED} >> and 0 otherwise." >> >> So since for TARGET_SIMD we cannot guarantee that all shifts truncate their >> amount, we should be returning 0 in TARGET_SHIFT_TRUNCATION_MASK when >> SHIFT_COUNT_TRUNCATED is false. This is what the patch does, and with it >> step 2) becomes: >> 2) tmp = BITS_PER_WORD - 1 - var_amnt >> which behaves correctly on aarch64. >> >> Unfortunately, the testcase from the PR has very recently gone latent on >> trunk because it depends on register allocation picking a particular >> alternative from the *aarch64_ashl_sisd_or_int_<mode>3 pattern in aarch64.md. >> I tried to do some inline asm tricks to get it to force the correct >> constraints but was unsuccessful. Nevertheless I've included the testcase in >> the patch. I suppose it's better than nothing. > > Thanks for the great write-up. This level of detail is very helpful for > review. > > OK for trunk. > > Thanks, > James > >> >> Bootstrapped and tested on aarch64. >> >> Ok for trunk? >> >> Thanks, >> Kyrill >> >> >> 2016-02-25 Kyrylo Tkachov <kyrylo.tkac...@arm.com> >> >> PR target/69613 >> * config/aarch64/aarch64.c (aarch64_shift_truncation_mask): >> Return 0 if !SHIFT_COUNT_TRUNCATED >> >> 2016-02-25 Kyrylo Tkachov <kyrylo.tkac...@arm.com> >> >> PR target/69613 >> * gcc.dg/torture/pr69613.c: New test. > >> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c >> index >> d0d15b4feee70a5ca6af8dd16c7667cbcd844dbf..2e69e345545e591d1de76c2d175aac476e6e1107 >> 100644 >> --- a/gcc/config/aarch64/aarch64.c >> +++ b/gcc/config/aarch64/aarch64.c >> @@ -11171,7 +11171,8 @@ static unsigned HOST_WIDE_INT >> aarch64_shift_truncation_mask (machine_mode mode) >> { >> return >> - (aarch64_vector_mode_supported_p (mode) >> + (!SHIFT_COUNT_TRUNCATED >> + || aarch64_vector_mode_supported_p (mode) >> || aarch64_vect_struct_mode_p (mode)) ? 0 : (GET_MODE_BITSIZE (mode) - >> 1); >> } >> >> diff --git a/gcc/testsuite/gcc.dg/torture/pr69613.c >> b/gcc/testsuite/gcc.dg/torture/pr69613.c >> new file mode 100644 >> index >> 0000000000000000000000000000000000000000..44f2b0cc91ac4b12c4d255b608f95bc8bf016923 >> --- /dev/null >> +++ b/gcc/testsuite/gcc.dg/torture/pr69613.c >> @@ -0,0 +1,40 @@ >> +/* PR target/69613. */ >> +/* { dg-do run { target int128 } } */ >> +/* { dg-additional-options "-mavx" { target { i?86-*-* x86_64-*-* } } } */ ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
This is incorrect. You need to check AVX run-time to run it on x86. -- H.J.