Richard Biener <richard.guent...@gmail.com> writes: > On Fri, Jul 20, 2018 at 12:22 PM Richard Sandiford > <richard.sandif...@arm.com> wrote: >> >> We couldn't vectorise: >> >> for (int j = 0; j < n; ++j) >> { >> for (int i = 0; i < 16; ++i) >> a[i] = (b[i] + c[i]) >> 1; >> a += step; >> b += step; >> c += step; >> } >> >> at -O3 because cunrolli unrolled the inner loop and SLP couldn't handle >> AVG_FLOOR patterns (see also PR86504). The problem was some overly >> strict checking of pattern statements compared to normal statements >> in vect_get_and_check_slp_defs: >> >> switch (gimple_code (def_stmt)) >> { >> case GIMPLE_PHI: >> case GIMPLE_ASSIGN: >> break; >> >> default: >> if (dump_enabled_p ()) >> dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location, >> "unsupported defining stmt:\n"); >> return -1; >> } >> >> The easy fix would have been to add GIMPLE_CALL to the switch, >> but I don't think the switch is doing anything useful. We only create >> pattern statements that the rest of the vectoriser can handle, and code >> in this function and elsewhere check whether SLP is possible. >> >> I'm also not sure why: >> >> if (!first && !oprnd_info->first_pattern >> /* Allow different pattern state for the defs of the >> first stmt in reduction chains. */ >> && (oprnd_info->first_dt != vect_reduction_def >> >> is necessary. All that should matter is that the statements in the >> node are "similar enough". It turned out to be quite hard to find a >> convincing example that used a mixture of pattern and non-pattern >> statements, so bb-slp-pow-1.c is the best I could come up with. >> But it does show that the combination of "xi * xi" statements and >> "pow (xj, 2) -> xj * xj" patterns are handled correctly. >> >> The patch therefore just removes the whole if block. >> >> The loop also needed commutative swapping to be extended to at least >> AVG_FLOOR. >> >> This gives +3.9% on 525.x264_r at -O3. >> >> Tested on aarch64-linux-gnu (with and without SVE), aarch64_be-elf >> and x86_64-linux-gnu. OK to install? > > Always nice to see seemingly dead code go. OK if you can still > build SPEC with this change and pass a test run. > > At least I _do_ seem to remember having "issues" in this area...
Thanks. I've now applied it after testing SPEC2006 and SPEC2017 on aarch64-linux-gnu and x86_64-linux-gnu. I also ran it through our internal SVE benchmark set, which includes those two and various other things. Richard