Hi, for the first piece of code, I tried:
  unsigned int nitems_per_iter
    = dest_rgm->max_nscalars_per_iter * dest_rgm->factor;
  step = gimple_build (seq, MULT_EXPR, iv_type, step,
                       build_int_cst (iv_type, nitems_per_iter));
Then the optimized IR is:
loop_len_34 = MIN_EXPR <ivtmp_72, 8>;
_74 = loop_len_34 * 4;
loop_len_51 = _74 + 18446744073709551604;
_16 = (void *) ivtmp.27_41;
_17 = &MEM <vector(8) short int> [(short int *)_16];
vect__1.7_33 = .LEN_LOAD (_17, 16B, loop_len_34, 0);
vect__2.8_23 = VIEW_CONVERT_EXPR<vector(8) unsigned short>(vect__1.7_33);
vect__3.9_22 = vect__2.8_23 + { 1, 2, 1, 2, 1, 2, 1, 2 };
vect__4.10_21 = VIEW_CONVERT_EXPR<vector(8) short int>(vect__3.9_22);
.LEN_STORE (_17, 16B, loop_len_34, vect__4.10_21, 0);
_20 = (void *) ivtmp.28_1;
_31 = &MEM <vector(4) int> [(int *)_20];
vect__10.15_52 = .LEN_LOAD (_31, 32B, 4, 0);
_30 = (void *) ivtmp.31_4;
_29 = &MEM <vector(4) int> [(int *)_30];
vect__10.16_54 = .LEN_LOAD (_29, 32B, 4, 0);
_26 = (void *) ivtmp.32_8;
_25 = &MEM <vector(4) int> [(int *)_26];
vect__10.17_56 = .LEN_LOAD (_25, 32B, 4, 0);
_79 = (void *) ivtmp.33_12;
_80 = &MEM <vector(4) int> [(int *)_79];
vect__10.18_58 = .LEN_LOAD (_80, 32B, loop_len_51, 0);
Is it correct? It looks weird.
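To make the arithmetic in the IR easier to read, here is a small sketch in C. The names as_signed_offset, last_len and IR_ADDEND are made up for illustration and are not GCC code. It only shows that the large constant is -12 in 64-bit wrapping arithmetic, and what the last length evaluates to for a given loop_len_34:

```c
#include <stdint.h>

/* The constant added to _74 in the IR above.  */
#define IR_ADDEND UINT64_C(18446744073709551604)

/* Read a wrapped 64-bit unsigned constant as the signed offset it
   encodes (hypothetical helper, for illustration only).  */
static int64_t as_signed_offset(uint64_t c)
{
  if (c <= (uint64_t) INT64_MAX)
    return (int64_t) c;
  /* c >= 2^63: the signed value is c - 2^64 == -(~c) - 1.  */
  return -(int64_t) (~c) - 1;
}

/* Mirror of the two IR statements:
     _74 = loop_len_34 * 4;
     loop_len_51 = _74 + 18446744073709551604;
   using unsigned (wrapping) arithmetic.  */
static uint64_t last_len(uint64_t loop_len_34)
{
  return loop_len_34 * 4 + IR_ADDEND;
}
```

So loop_len_51 is loop_len_34 * 4 - 12. For a full iteration (loop_len_34 == 8) that evaluates to 20, which may be what looks odd next to the literal length 4 used by the other three vector(4) int loads.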
[email protected]
From: Richard Sandiford
Date: 2023-05-25 00:00
To: 钟居哲
CC: gcc-patches; rguenther
Subject: Re: [PATCH V14] VECT: Add decrement IV iteration loop control by
variable amount support
钟居哲 <[email protected]> writes:
> Oh, I see. Thank you so much for pointing this out.
> Could you tell me what I should do in the code?
> It seems that I should adjust it in
> vect_adjust_loop_lens_control
>
> multiply by some factor? Is it correct to multiply by
> max_nscalars_per_iter?
max_nscalars_per_iter * factor rather than just max_nscalars_per_iter.
Note that it's possible for a later rgroup's max_nscalars_per_iter * factor
to be smaller, so a division might be needed in rare cases. E.g.:
  #include <stdint.h>

  uint64_t x[100];
  uint16_t y[200];

  void f() {
    for (int i = 0, j = 0; i < 100; i += 2, j += 4) {
      x[i + 0] += 1;
      x[i + 1] += 2;
      y[j + 0] += 1;
      y[j + 1] += 2;
      y[j + 2] += 3;
      y[j + 3] += 4;
    }
  }
where y has a single-control rgroup with max_nscalars_per_iter == 4
and x has a 2-control rgroup with max_nscalars_per_iter == 2.
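As a rough sketch of why a division can show up, assume factor == 1 for both rgroups and that the IV length is counted in items of the y rgroup. The helper scale_len below is hypothetical (it is not a GCC function), and it assumes the incoming length is a whole number of scalar iterations:

```c
#include <stdint.h>

/* Hypothetical helper, not GCC code: convert a length counted in items
   of one rgroup (from_nitems == max_nscalars_per_iter * factor) into
   the matching length for another rgroup (to_nitems).  Assumes LEN is
   a multiple of FROM_NITEMS and that one count divides the other.  */
static uint64_t scale_len(uint64_t len, unsigned from_nitems,
                          unsigned to_nitems)
{
  if (to_nitems >= from_nitems)
    /* Destination has more items per iteration: multiply.  */
    return len * (to_nitems / from_nitems);
  /* Destination has fewer items per iteration: divide.  */
  return len / (from_nitems / to_nitems);
}
```

With the example above, 8 items of y (two scalar iterations, 4 items each) correspond to scale_len (8, 4, 2) == 4 items of x, so going from y to x is the division case, while going from x to y is a multiplication.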
What gives the best code in these cases? Is emitting a multiplication
better? Or is using a new IV better?
Thanks,
Richard