On November 30, 2020 7:18:37 PM GMT+01:00, Richard Sandiford <richard.sandif...@arm.com> wrote: >Richard Biener <richard.guent...@gmail.com> writes: >> On November 30, 2020 4:29:41 PM GMT+01:00, Richard Sandiford via >Gcc-patches <gcc-patches@gcc.gnu.org> wrote: >>>dse.c:find_shift_sequence tries to represent a store and load >>>back as a shift right followed by a truncation. It therefore >>>needs to find an integer mode in which to do the shift right. >>>The loop it uses has the form: >>> >>> FOR_EACH_MODE_FROM (new_mode_iter, >>> smallest_int_mode_for_size (GET_MODE_BITSIZE (read_mode))) >>> >>>which implicitly assumes that read_mode has an equivalent integer >mode. >>>As shown in the testcase, not all modes have such an integer mode. >> >> But if no such mode exists iterating won't help either? So why not >simply fail when the mode does not exist? > >You mean test against the maximum integer mode before the loop, >but keep the smallest_int_mode_for_size call as-is? I'm not sure >that's going to be more efficient in practice, and it seems less >obviously correct. > >Alternatively, we could have a version of smallest_int_mode_for_size >that returns failure instead of asserting.
OH, didn't remember this one asserts... But >smallest_int_mode_for_size >is itself a FOR_EACH_MODE_* iterator, so it would iterate just as much >as >the patch does. > >smallest_int_mode_for_size also has some __int20 etc. handling that I >don't think we want here, and probably avoid by luck. (We already know >at this point that any shift value would be nonzero, which probably >weeds out most of the problematic cases for PSImode targets.) Ok, I see. >find_shift_sequence already iterates over the modes itself and it >already has custom requirements in terms of when to break and when >to continue. It just seems simpler and more obvious for the loop to >iterate over all the modes directly rather than delegate part of the >iteration to another function. Fine. Jeff has already acked the patch. Richard. >Thanks, >Richard > > > >> >>>This patch just makes the code start from the smallest integer mode >and >>>skip modes that are too small. The loop already breaks at the first >>>mode wider than word_mode. >>> >>>Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK for trunk and >>>GCC 10 branch? >>> >>>Richard >>> >>> >>>gcc/ >>> PR rtl-optimization/98037 >>> * dse.c (find_shift_sequence): Iterate over all integers and >>> skip modes that are too small. >>> >>>gcc/testsuite/ >>> PR rtl-optimization/98037 >>> * gcc.target/aarch64/sve/acle/general/pr98037.c: New test. >>>--- >>> gcc/dse.c | 5 >+++-- >>> gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr98037.c | 6 >++++++ >>> 2 files changed, 9 insertions(+), 2 deletions(-) >>>create mode 100644 >>>gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr98037.c >>> >>>diff --git a/gcc/dse.c b/gcc/dse.c >>>index d65266b5476..651e6e7e71e 100644 >>>--- a/gcc/dse.c >>>+++ b/gcc/dse.c >>>@@ -1757,8 +1757,7 @@ find_shift_sequence (poly_int64 access_size, >>> the machine. */ >>> >>> opt_scalar_int_mode new_mode_iter; >>>- FOR_EACH_MODE_FROM (new_mode_iter, >>>- smallest_int_mode_for_size (GET_MODE_BITSIZE (read_mode))) >>>+ FOR_EACH_MODE_IN_CLASS (new_mode_iter, MODE_INT) >>> { >>> rtx target, new_reg, new_lhs; >>> rtx_insn *shift_seq, *insn; >>>@@ -1767,6 +1766,8 @@ find_shift_sequence (poly_int64 access_size, >>> new_mode = new_mode_iter.require (); >>> if (GET_MODE_BITSIZE (new_mode) > BITS_PER_WORD) >>> break; >>>+ if (maybe_lt (GET_MODE_SIZE (new_mode), GET_MODE_SIZE >>>(read_mode))) >>>+ continue; >>> >>> /* Try a wider mode if truncating the store mode to NEW_MODE >>> requires a real instruction. */ >>>diff --git >>>a/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr98037.c >>>b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr98037.c >>>new file mode 100644 >>>index 00000000000..b91e940b18e >>>--- /dev/null >>>+++ b/gcc/testsuite/gcc.target/aarch64/sve/acle/general/pr98037.c >>>@@ -0,0 +1,6 @@ >>>+/* { dg-options "-msve-vector-bits=1024 -O3" } */ >>>+ >>>+typedef __SVInt8_t vec __attribute__((arm_sve_vector_bits(1024))); >>>+struct pair { vec v[2]; }; >>>+void use (struct pair *); >>>+vec f (struct pair p) { vec v = p.v[1]; use (&p); return v; }