Hi,

On 2020/9/1 01:04, Segher Boessenkool wrote:
> Hi!
> 
> On Mon, Aug 31, 2020 at 04:06:47AM -0500, Xiong Hu Luo wrote:
>> vec_insert accepts 3 arguments, arg0 is input vector, arg1 is the value
>> to be insert, arg2 is the place to insert arg1 to arg0.  This patch adds
>> __builtin_vec_insert_v4si[v4sf,v2di,v2df,v8hi,v16qi] for vec_insert to
>> not expand too early in gimple stage if arg2 is variable, to avoid generate
>> store hit load instructions.
>>
>> For Power9 V4SI:
>>      addi 9,1,-16
>>      rldic 6,6,2,60
>>      stxv 34,-16(1)
>>      stwx 5,9,6
>>      lxv 34,-16(1)
>> =>
>>      addis 9,2,.LC0@toc@ha
>>      addi 9,9,.LC0@toc@l
>>      mtvsrwz 33,5
>>      lxv 32,0(9)
>>      sradi 9,6,2
>>      addze 9,9
>>      sldi 9,9,2
>>      subf 9,9,6
>>      subfic 9,9,3
>>      sldi 9,9,2
>>      subfic 9,9,20
>>      lvsl 13,0,9
>>      xxperm 33,33,45
>>      xxperm 32,32,45
>>      xxsel 34,34,33,32
> 
> For v a V4SI, x a SI, j some int,  what do we generate for
>    v[j&3] = x;
> ?
> This should be exactly the same as we generate for
>    vec_insert(x, v, j);
> (the builtin does a modulo 4 automatically).

No, even with my patch "stxv 34,-16(1);stwx 5,9,6;lxv 34,-16(1)" generated 
currently.
Is it feasible and acceptable to expand some kind of pattern in expander 
directly without
builtin transition?

I borrowed some of implementation from vec_extract.  For vec_extract, the issue 
also exists:

           source:                             gimple:                          
   expand:                                        asm:
1) i = vec_extract (v, n);  =>  i = __builtin_vec_ext_v4si (v, n);   => 
{r120:SI=unspec[r118:V4SI,r119:DI] 134;...}     => slwi 9,6,2   vextuwrx 3,9,2  
 
2) i = vec_extract (v, 3);  =>  i = __builtin_vec_ext_v4si (v, 3);   => 
{r120:SI=vec_select(r118:V4SI,parallel)...}     =>  li 9,12      vextuwrx 3,9,2
3) i = v[n%4];   =>  _1 = n & 3;  i = VIEW_CONVERT_EXPR<int[4]>(v)[_1];  =>     
   ...        =>     stxv 34,-16(1);addi 9,1,-16; rldic 5,5,2,60; lwax 3,9,5
4) i = v[3];     =>          i = BIT_FIELD_REF <v, 32, 96>;              =>  
{r120:SI=vec_select(r118:V4SI,parallel)...} => li 9,12;   vextuwrx 3,9,2 

Case 3) also couldn't handle the similar usage, and case 4) doesn't generate 
builtin as expected,
it just expand to vec_select by coincidence.  So does this mean both vec_insert 
and vec_extract
and all other similar vector builtins should use IFN as suggested by Richard 
Biener, to match the
pattern in gimple and expand both constant and variable index in expander?  
Will this also be
beneficial for other targets except power?  Or we should do that gradually 
after this patch
approved as it seems another independent issue?  Thanks:)


Xionghu

Reply via email to