On 5/22/19 7:01 AM, David Hildenbrand wrote:
>
>> I also think that, if we create a bunch more of these wrappers:
>>
>>> +DEF_VFAE_HELPER(8)
>>> +DEF_VFAE_HELPER(16)
>>> +DEF_VFAE_HELPER(32)
>>
>> then RT and ZS can be passed in as constant parameters to the above, and then
>> the compiler will fold away all of the stuff that's not needed for each
>> different case. Which, I think, is significant. These are practically
>> different instructions with the different modifiers.
>>
>
> So, we have 4 flags, resulting in 16 variants. Times 3 element sizes ...
> 48 helpers in total. Do we really want to go down that path?

Maybe?

> I can also go ahead and try to identify the most frequent users (in
> Linux) and only specialize that one.

Also plausible. I guess it would be good to know, anyway.

I think RT probably makes the largest difference to the layout of the
function, so maybe that's the one we pick.

We could also leave our options open and make the 3 non-CC flags be
parameters to the inline function, and just extract them from the M4
parameter at the one higher level.


r~
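
P.S. A rough sketch of the shape I have in mind, for the 8-bit case only.
Everything here is made up for illustration: the names (vfae8_body,
helper_vfae8, helper_vfae8_rt), the flag encoding, and the loop body, which
is a toy model rather than the architected VFAE semantics. The point is only
that the flags are plain parameters of an always-inline body, so a wrapper
can either extract them from M4 at run time or pin one of them to a constant
and let the compiler fold the dead branch away.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Assumed flag encoding within M4, for illustration only. */
#define F_IN 0x8   /* invert the comparison result */
#define F_RT 0x4   /* result type: per-element mask instead of an index */
#define F_ZS 0x2   /* zero search: a zero element also counts as a hit */

#define NELEM 16   /* 16 one-byte elements per vector */

/*
 * Toy model of the operation, NOT the architected semantics.
 * The three non-CC flags are ordinary parameters; when a caller passes
 * a constant, the always-inline body is specialized for that value.
 */
static inline __attribute__((always_inline))
void vfae8_body(uint8_t *dst, const uint8_t *a, const uint8_t *b,
                bool in, bool rt, bool zs)
{
    uint8_t mask[NELEM];
    int first = NELEM;

    for (int i = 0; i < NELEM; i++) {
        bool hit = false;

        /* Does a[i] equal any element of b? */
        for (int j = 0; j < NELEM; j++) {
            hit |= (a[i] == b[j]);
        }
        hit ^= in;
        if (zs && a[i] == 0) {
            hit = true;
        }
        mask[i] = hit ? 0xff : 0;
        if (hit && first == NELEM) {
            first = i;
        }
    }

    if (rt) {
        /* Mask result: all ones in every element that hit. */
        memcpy(dst, mask, NELEM);
    } else {
        /* Index result: element index of the first hit, NELEM if none. */
        memset(dst, 0, NELEM);
        dst[7] = (uint8_t)first;
    }
}

/* Generic helper: all three flags are extracted from M4 at run time. */
void helper_vfae8(uint8_t *dst, const uint8_t *a, const uint8_t *b,
                  uint32_t m4)
{
    vfae8_body(dst, a, b, m4 & F_IN, m4 & F_RT, m4 & F_ZS);
}

/* Specialized helper: RT is a compile-time constant, IN and ZS are not. */
void helper_vfae8_rt(uint8_t *dst, const uint8_t *a, const uint8_t *b,
                     uint32_t m4)
{
    vfae8_body(dst, a, b, m4 & F_IN, true, m4 & F_ZS);
}
```

The translator would then pick helper_vfae8_rt when RT is set in M4 and
helper_vfae8 otherwise. Specializing another flag later means adding one
more wrapper, not touching the body.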