On 5/22/19 7:01 AM, David Hildenbrand wrote:
> 
>> I also think that, if we create a bunch more of these wrappers:
>>
>>> +DEF_VFAE_HELPER(8)
>>> +DEF_VFAE_HELPER(16)
>>> +DEF_VFAE_HELPER(32)
>>
>> then RT and ZS can be passed in as constant parameters to the above, and then
>> the compiler will fold away all of the stuff that's not needed for each
>> different case.  Which, I think, is significant.  These are practically
>> different instructions with the different modifiers.
>>
> 
> So, we have 4 flags, resulting in 16 variants. Times 3 element sizes ...
> 48 helpers in total. Do we really want to go down that path?

Maybe?

> I can also go ahead and try to identify the most frequent users (in
> Linux) and only specialize that one.

Also plausible.  I guess it would be good to know, anyway.

I think RT probably makes the largest difference to the layout of the function,
so maybe that's the one we pick.  We could also leave our options open and make
the 3 non-CC flags parameters to the inline function, and just extract them from
the M4 parameter one level higher.
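
Roughly what I have in mind, as an untested sketch.  The function names, the
FLAG_* bit values and the simplified matching semantics are all made up for
illustration; none of them are taken from the patch or checked against the
architecture.  The point is only the shape: one inline worker with the flags
as bool parameters, and thin callers that pass either constants or bits
extracted from M4.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define NELEM 16

/* Hypothetical flag bits; placeholders, not the architected encoding. */
#define FLAG_IN 0x8
#define FLAG_RT 0x4
#define FLAG_ZS 0x2

/*
 * Simplified worker: for each byte of a, test whether it equals any
 * byte of b.  Returns the index of the first match, or NELEM if none.
 * When a caller passes constants for in/rt/zs, the compiler can drop
 * the branches that do not apply.
 */
static inline int vfae8_inline(uint8_t *d, const uint8_t *a,
                               const uint8_t *b,
                               bool in, bool rt, bool zs)
{
    int first_match = NELEM, first_zero = NELEM;
    uint8_t tmp[NELEM] = { 0 };

    for (int i = 0; i < NELEM; i++) {
        bool any = false;

        for (int j = 0; j < NELEM; j++) {
            any |= a[i] == b[j];
        }
        /* IN inverts the result of the comparison. */
        bool match = any != in;

        /* ZS additionally records the first zero element of a. */
        if (zs && a[i] == 0 && first_zero == NELEM) {
            first_zero = i;
        }
        if (match) {
            if (rt) {
                tmp[i] = 0xff;      /* RT: per-element mask */
            }
            if (first_match == NELEM) {
                first_match = i;
            }
        }
    }
    if (!rt) {
        /* No RT: store a single index instead of a mask. */
        tmp[7] = first_match < first_zero ? first_match : first_zero;
    }
    memcpy(d, tmp, NELEM);
    return first_match;
}

/* Specialized wrappers: the flags are compile-time constants. */
#define DEF_VFAE8(NAME, IN, RT, ZS)                                   \
    static int NAME(uint8_t *d, const uint8_t *a, const uint8_t *b)   \
    {                                                                 \
        return vfae8_inline(d, a, b, IN, RT, ZS);                     \
    }

DEF_VFAE8(vfae8_plain, false, false, false)
DEF_VFAE8(vfae8_rt, false, true, false)

/* Generic fallback: extract the flags from M4 one level up. */
static int vfae8_generic(uint8_t *d, const uint8_t *a, const uint8_t *b,
                         uint8_t m4)
{
    return vfae8_inline(d, a, b, m4 & FLAG_IN, m4 & FLAG_RT, m4 & FLAG_ZS);
}
```

With that shape we can start with only the generic caller, then add
specialized wrappers for whichever flag combinations turn out to matter,
without touching the worker.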


r~
