On Tue, Apr 4, 2017 at 2:00 PM, Jakub Jelinek <ja...@redhat.com> wrote:
> On Tue, Apr 04, 2017 at 08:39:59AM +0200, Uros Bizjak wrote:
>> > Any thoughts on what to do to generate reasonable code when the shift count
>> > comes from memory (e.g. as int variable) or is in the low bits of some XMM
>> > regioster?
>>
>> The problem with int variable from memory is, that shifts access full
>> 128bits for their count operand, so this is effectively a no-go. If
>> there is a 128bit count value in memory, we can maybe define shift
>> pattern with:
>>
>> (subreg:DI (match_operand:V2DI 2 "general_operand" "xmN,vmN"))
>>
>> ?
>
> Well, if the original memory is say int, then we can't just read it as V2DI
> or V4SI.

Of course. The above was for the case when we *want* to load from
memory. The insn loads full 128bit value.

>> > First of all, perhaps we could have some combiner (or peephole) pattern 
>> > that would
>> > transform sign-extend from e.g. SI to DI on the shift count into 
>> > zero-extend
>> > if there are no other uses of the extension result - if the shift count is
>> > negative in SImode (or even QImode), then it is already large number and 
>> > the
>> > upper 32 bits or more don't really change anything on that.
>>
>> We can introduce shift patterns with embedded extensions, and split
>> them to zext + shift. These new patterns can be easily macroized with
>> any_extend code iterator and SWI124 mode iterator, so we avoid pattern
>> explosion.
>
> I assume split those before reload.  Because we want to give reload a chance
> to do the zero extension on GPRs if it is more beneficial, and it might
> choose to store it into memory and load into XMM from memory and that is
> hard to do after reload.

Yes, split before reload, and hope that alternative's decorations play
well with RA.

>> > Then perhaps we could emit pmovzxdq for SSE4.1+ instead of going through
>> > GPRs and back, or for SSE2 pxor on a scratch reg and punpck* to get it zero
>> > extended.  Not sure if we want to add =v / vm alternative to
>> > zero_extendsidi2*, it already has some x but with ?s that prevent the RA
>> > from using it.  So thoughts on that?
>>
>> The ? is there to discourage RA from allocating xmm reg (all these
>> alternatives have * on xmm reg), in effect instructing RA to prefer
>> GPRs. If the value is already in xmm reg, then I expect ? alternative
>> will be used. So, yes, v/v alternative as you proposed would be a good
>> addition to zero_extendsidi alternatives. Please note though that
>> pmovzxdq operates on a vector value, so memory operands should be
>> avoided.
>
> With ? in front of it or without?  I admit I've only tried so far:

I'd leave ?* in this case. In my experience, RA allocates alternative
with ?* only when really needed.

> @@ -4049,24 +4049,29 @@ (define_expand "extendsidi2"
>  })
>
>  (define_insn "*extendsidi2_rex64"
> -  [(set (match_operand:DI 0 "register_operand" "=*a,r")
> -       (sign_extend:DI (match_operand:SI 1 "nonimmediate_operand" "*0,rm")))]
> +  [(set (match_operand:DI 0 "register_operand" "=*a,r,v")
> +       (sign_extend:DI (match_operand:SI 1 "nonimmediate_operand" 
> "*0,rm,vm")))]
>    "TARGET_64BIT"
>    "@
>     {cltq|cdqe}
> -   movs{lq|x}\t{%1, %0|%0, %1}"
> -  [(set_attr "type" "imovx")
> -   (set_attr "mode" "DI")
> -   (set_attr "prefix_0f" "0")
> -   (set_attr "modrm" "0,1")])
> +   movs{lq|x}\t{%1, %0|%0, %1}
> +   %vpmovsxdq\t{%1, %0|%0, %1}"
> +  [(set_attr "isa" "*,*,sse4")
> +   (set_attr "type" "imovx,imovx,ssemov")
> +   (set_attr "mode" "DI,DI,TI")
> +   (set_attr "prefix_0f" "0,0,*")
> +   (set_attr "prefix_extra" "*,*,1")
> +   (set_attr "prefix" "orig,orig,maybe_evex")
> +   (set_attr "modrm" "0,1,*")])
>
>
> and with the ? in front of v it for some reason didn't trigger.
> I'll try the zero_extendsidi2 now and see how it works.
>
>> OK for trunk and backports.
>
> Committed to trunk so far, backports in a week or so when I backport
> dozens of other patches together with it.
>
>         Jakub

Uros.

Reply via email to