On Fri, 9 Aug 2019, Uros Bizjak wrote: > On Mon, Aug 5, 2019 at 3:09 PM Uros Bizjak <ubiz...@gmail.com> wrote: > > > > > > > > (define_mode_iterator MAXMIN_IMODE [SI "TARGET_SSE4_1"] [DI > > > > > > > "TARGET_AVX512F"]) > > > > > > > > > > > > > > and then we need to split DImode for 32bits, too. > > > > > > > > > > > > For now, please add "TARGET_64BIT && TARGET_AVX512F" for DImode > > > > > > condition, I'll provide _doubleword splitter later. > > > > > > > > > > Shouldn't that be TARGET_AVX512VL instead? Or does the insn use %g0 > > > > > etc. > > > > > to force use of %zmmN? > > > > > > > > It generates V4SI mode, so - yes, AVX512VL. > > > > > > case SMAX: > > > case SMIN: > > > case UMAX: > > > case UMIN: > > > if ((mode == DImode && (!TARGET_64BIT || !TARGET_AVX512VL)) > > > || (mode == SImode && !TARGET_SSE4_1)) > > > return false; > > > > > > so there's no way to use AVX512VL for 32bit? > > > > There is a way, but on 32bit targets, we need to split DImode > > operation to a sequence of SImode operations for unconverted pattern. > > This is of course doable, but somehow more complex than simply > > emitting a DImode compare + DImode cmove, which is what current > > splitter does. So, a follow-up task. > > Please find attached the complete .md part that enables SImode for > TARGET_SSE4_1 and DImode for TARGET_AVX512VL for both, 32bit and 64bit > targets. The patterns also allows for memory operand 2, so STV has > chance to create the vector pattern with implicit load. In case STV > fails, the memory operand 2 is loaded to the register first; operand > 2 is used in compare and cmove instruction, so pre-loading of the > operand should be beneficial.
Thanks. > Also note, that splitting should happen rarely. Due to the cost > function, STV should effectively always convert minmax to a vector > insn. I've analyzed the 464.h264ref slowdown on Haswell and it is due to this kind of "simple" conversion: 5.50 │1d0: test %esi,%es 0.07 │ mov $0x0,%ex │ cmovs %eax,%es 5.84 │ imul %r8d,%es to 0.65 │1e0: vpxor %xmm0,%xmm0,%xmm0 0.32 │ vpmaxs -0x10(%rsp),%xmm0,%xmm0 40.45 │ vmovd %xmm0,%eax 2.45 │ imul %r8d,%eax which looks like a RA artifact in the end. We spill %esi only with -mstv here as STV introduces a (subreg:V4SI ...) use of a pseudo ultimatively set from di. STV creates an additional pseudo for this (copy-in) but it places that copy next to the original def rather than next to the start of the chain it converts which is probably the issue why we spill. And this is because it inserts those at each definition of the pseudo rather than just at the reaching definition(s) or at the uses of the pseudo in the chain (that because there may be defs of that pseudo in the chain itself). Note that STV emits such "conversion" copies as simple reg-reg moves: (insn 1094 3 4 2 (set (reg:SI 777) (reg/v:SI 438 [ y ])) "refbuf.i":4:1 -1 (nil)) but those do not prevail very long (this one gets removed by CSE2). So IRA just sees the (subreg:V4SI (reg/v:SI 438 [ y ]) 0) use and computes r438: preferred NO_REGS, alternative NO_REGS, allocno NO_REGS a297(r438,l0) costs: SSE_REGS:5628,5628 MEM:3618,3618 so I wonder if STV shouldn't instead emit gpr->xmm moves here (but I guess nothing again prevents RTL optimizers from combining that with the single-use in the max instruction...). So this boils down to STV splitting live-ranges but other passes undoing that and then RA not considering splitting live-ranges here, arriving at unoptimal allocation. A testcase showing this issue is (simplified from 464.h264ref UMVLine16Y_11): unsigned short UMVLine16Y_11 (short unsigned int * Pic, int y, int width) { if (y != width) { y = y < 0 ? 0 : y; return Pic[y * width]; } return Pic[y]; } where the condition and the Pic[y] load mimics the other use of y. Different, even worse spilling is generated by unsigned short UMVLine16Y_11 (short unsigned int * Pic, int y, int width) { y = y < 0 ? 0 : y; return Pic[y * width] + y; } I guess this all shows that STVs "trick" of simply wrapping integer mode pseudos in (subreg:vector-mode ...) is bad? I've added a (failing) testcase to reflect the above. Richard.