钟居哲 <juzhe.zh...@rivai.ai> writes:
> Oh. I see. Thank you so much for pointing this.
> Could you tell me what I should do in the codes?
> It seems that I should adjust it in 
> vect_adjust_loop_lens_control
>
> muliply by some factor ? Is this correct multiply by max_nscalars_per_iter
> ?

max_nscalars_per_iter * factor rather than just max_nscalars_per_iter

Note that it's possible for later max_nscalars_per_iter * factor to
be smaller, so a division might be needed in rare cases.  E.g.:

uint64_t x[100];
uint16_t y[200];

void f() {
  for (int i = 0, j = 0; i < 100; i += 2, j += 4) {
    x[i + 0] += 1;
    x[i + 1] += 2;
    y[j + 0] += 1;
    y[j + 1] += 2;
    y[j + 2] += 3;
    y[j + 3] += 4;
  }
}

where y has a single-control rgroup with max_nscalars_per_iter == 4
and x has a 2-control rgroup with max_nscalars_per_iter == 2

What gives the best code in these cases?  Is emitting a multiplication
better?  Or is using a new IV better?

Thanks,
Richard

Reply via email to