The thinking here is that using the largest possible LMUL when we know the operation fits in fewer registers potentially leaves performance on the table: indirectly, through the unnecessarily increased register pressure, and also directly, depending on the implementation.
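
To make the idea concrete, here is a minimal standalone sketch of "pick the smallest LMUL that covers the whole operation". It is not the code from the patch: pick_smallest_lmul, min_vlen_bits and max_lmul are hypothetical names, int64_t stands in for HOST_WIDE_INT, and fractional LMUL is ignored.

```cpp
#include <cstdint>

// Hypothetical helper, not the patch's actual code: choose the smallest
// integer LMUL (1, 2, 4 or 8) whose register group can hold length_bytes.
// min_vlen_bits is an assumed stand-in for the minimum VLEN the target
// guarantees; max_lmul is the largest LMUL the caller allows.
static bool
pick_smallest_lmul (int64_t length_bytes, int64_t min_vlen_bits,
                    int64_t max_lmul, int64_t &lmul_out)
{
  if (length_bytes <= 0 || min_vlen_bits < 8)
    return false;

  const int64_t bytes_per_reg = min_vlen_bits / 8;

  // Try LMUL = 1, 2, 4, 8 in order and stop at the first that fits.
  for (int64_t lmul = 1; lmul <= max_lmul && lmul <= 8; lmul *= 2)
    if (length_bytes <= lmul * bytes_per_reg)
      {
        lmul_out = lmul;
        return true;
      }

  // Too wide for a single vector operation; lmul_out is left unchanged.
  return false;
}
```

Assuming a minimum VLEN of 128 bits, a 40-byte operation would get LMUL=4 under this sketch instead of LMUL=8, so the register group is half the size.
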
On Mon, Dec 11, 2023 at 10:05 AM juzhe.zh...@rivai.ai <juzhe.zh...@rivai.ai> wrote:

> Hi, Thanks for contributing this.
>
> +/* Select appropriate LMUL for a single vector operation based on
> +   byte size of data to be processed.
> +   On success, return true and populate lmul_out.
> +   If length_in is too wide for a single vector operation, return false
> +   and leave lmul_out unchanged.  */
> +
> +static bool
> +select_appropriate_lmul (HOST_WIDE_INT length_in,
> +                         HOST_WIDE_INT &lmul_out)
> +{
>
> I don't think we need this, you only need to use TARGET_MAX_LMUL
>
> ------------------------------
> juzhe.zh...@rivai.ai