钟居哲 <juzhe.zh...@rivai.ai> writes:
> Oh. I see. Thank you so much for pointing this out.
> Could you tell me what I should do in the code?
> It seems that I should adjust it in
> vect_adjust_loop_lens_control
>
> multiply by some factor? Is it correct to multiply by max_nscalars_per_iter?

  max_nscalars_per_iter * factor

rather than just max_nscalars_per_iter.

Note that it's possible for a later rgroup's max_nscalars_per_iter * factor
to be smaller than an earlier one's, so a division might be needed in rare
cases.  E.g.:

    #include <stdint.h>

    uint64_t x[100];
    uint16_t y[200];

    void f() {
      for (int i = 0, j = 0; i < 100; i += 2, j += 4) {
        x[i + 0] += 1;
        x[i + 1] += 2;
        y[j + 0] += 1;
        y[j + 1] += 2;
        y[j + 2] += 3;
        y[j + 3] += 4;
      }
    }

where y has a single-control rgroup with max_nscalars_per_iter == 4
and x has a 2-control rgroup with max_nscalars_per_iter == 2.

What gives the best code in these cases?  Is emitting a multiplication
better?  Or is using a new IV better?

Thanks,
Richard
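
For illustration only, the arithmetic behind the example above can be sketched in plain C. This is not GCC code: rescale_items is a hypothetical helper, and the sketch assumes factor == 1 for both rgroups (the message does not give the factors) and that one scale evenly divides the other. It only shows why a count kept in y's units needs a division to reach x's units, and a multiplication in the other direction.

```c
#include <assert.h>

/* Hypothetical helper, not a GCC function.  ITEMS is a count expressed in
   units of one rgroup's (max_nscalars_per_iter * factor), FROM_SCALE is that
   product for the source rgroup and TO_SCALE is the product for the target
   rgroup.  Assumes one scale is a multiple of the other.  */
unsigned
rescale_items (unsigned items, unsigned from_scale, unsigned to_scale)
{
  if (to_scale >= from_scale)
    {
      /* The target rgroup handles more items per scalar iteration,
         so a multiplication is enough.  */
      assert (to_scale % from_scale == 0);
      return items * (to_scale / from_scale);
    }

  /* The target rgroup handles fewer items per scalar iteration,
     so a division is needed.  */
  assert (from_scale % to_scale == 0);
  return items / (from_scale / to_scale);
}
```

With the loop above running 50 scalar iterations, y covers 50 * 4 = 200 items, and rescale_items (200, 4, 2) gives the 100 items covered by x. The sketch says nothing about which option (multiplication, division, or a separate IV) produces the best code.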