On Tue, Apr 4, 2017 at 2:00 PM, Jakub Jelinek <ja...@redhat.com> wrote: > On Tue, Apr 04, 2017 at 08:39:59AM +0200, Uros Bizjak wrote: >> > Any thoughts on what to do to generate reasonable code when the shift count >> > comes from memory (e.g. as int variable) or is in the low bits of some XMM >> > regioster? >> >> The problem with int variable from memory is, that shifts access full >> 128bits for their count operand, so this is effectively a no-go. If >> there is a 128bit count value in memory, we can maybe define shift >> pattern with: >> >> (subreg:DI (match_operand:V2DI 2 "general_operand" "xmN,vmN")) >> >> ? > > Well, if the original memory is say int, then we can't just read it as V2DI > or V4SI.
Of course. The above was for the case when we *want* to load from memory. The insn loads full 128bit value. >> > First of all, perhaps we could have some combiner (or peephole) pattern >> > that would >> > transform sign-extend from e.g. SI to DI on the shift count into >> > zero-extend >> > if there are no other uses of the extension result - if the shift count is >> > negative in SImode (or even QImode), then it is already large number and >> > the >> > upper 32 bits or more don't really change anything on that. >> >> We can introduce shift patterns with embedded extensions, and split >> them to zext + shift. These new patterns can be easily macroized with >> any_extend code iterator and SWI124 mode iterator, so we avoid pattern >> explosion. > > I assume split those before reload. Because we want to give reload a chance > to do the zero extension on GPRs if it is more beneficial, and it might > choose to store it into memory and load into XMM from memory and that is > hard to do after reload. Yes, split before reload, and hope that alternative's decorations play well with RA. >> > Then perhaps we could emit pmovzxdq for SSE4.1+ instead of going through >> > GPRs and back, or for SSE2 pxor on a scratch reg and punpck* to get it zero >> > extended. Not sure if we want to add =v / vm alternative to >> > zero_extendsidi2*, it already has some x but with ?s that prevent the RA >> > from using it. So thoughts on that? >> >> The ? is there to discourage RA from allocating xmm reg (all these >> alternatives have * on xmm reg), in effect instructing RA to prefer >> GPRs. If the value is already in xmm reg, then I expect ? alternative >> will be used. So, yes, v/v alternative as you proposed would be a good >> addition to zero_extendsidi alternatives. Please note though that >> pmovzxdq operates on a vector value, so memory operands should be >> avoided. > > With ? in front of it or without? I admit I've only tried so far: I'd leave ?* in this case. In my experience, RA allocates alternative with ?* only when really needed. > @@ -4049,24 +4049,29 @@ (define_expand "extendsidi2" > }) > > (define_insn "*extendsidi2_rex64" > - [(set (match_operand:DI 0 "register_operand" "=*a,r") > - (sign_extend:DI (match_operand:SI 1 "nonimmediate_operand" "*0,rm")))] > + [(set (match_operand:DI 0 "register_operand" "=*a,r,v") > + (sign_extend:DI (match_operand:SI 1 "nonimmediate_operand" > "*0,rm,vm")))] > "TARGET_64BIT" > "@ > {cltq|cdqe} > - movs{lq|x}\t{%1, %0|%0, %1}" > - [(set_attr "type" "imovx") > - (set_attr "mode" "DI") > - (set_attr "prefix_0f" "0") > - (set_attr "modrm" "0,1")]) > + movs{lq|x}\t{%1, %0|%0, %1} > + %vpmovsxdq\t{%1, %0|%0, %1}" > + [(set_attr "isa" "*,*,sse4") > + (set_attr "type" "imovx,imovx,ssemov") > + (set_attr "mode" "DI,DI,TI") > + (set_attr "prefix_0f" "0,0,*") > + (set_attr "prefix_extra" "*,*,1") > + (set_attr "prefix" "orig,orig,maybe_evex") > + (set_attr "modrm" "0,1,*")]) > > > and with the ? in front of v it for some reason didn't trigger. > I'll try the zero_extendsidi2 now and see how it works. > >> OK for trunk and backports. > > Committed to trunk so far, backports in a week or so when I backport > dozens of other patches together with it. > > Jakub Uros.