Re: Re: [PATCH V14] VECT: Add decrement IV iteration loop control by variable amount support

2023-05-24 Thread juzhe.zh...@rivai.ai

Hi, Richard.
After several tries with your testcases (which I have already added to the V15 patch),
I think "using a new IV" is better than "multiplication".

Now:
  loop_len_34 = MIN_EXPR ;
  _74 = MIN_EXPR ;   --> the multiplication approach would change this
                         into  _74 = loop_len_34 * 2;
  loop_len_48 = MIN_EXPR <_74, 4>;
  _77 = _74 - loop_len_48;
  loop_len_49 = MIN_EXPR <_77, 4>;
  _78 = _77 - loop_len_49;
  loop_len_50 = MIN_EXPR <_78, 4>;
  loop_len_51 = _78 - loop_len_50;

I prefer the "new IV" approach since it looks more reasonable and gives better codegen.
Could you take a look at it:
V15 patch:
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/619534.html 

Thanks.

juzhe.zh...@rivai.ai

From: Richard Sandiford
Date: 2023-05-25 04:05
To: 钟居哲
CC: gcc-patches; rguenther
Subject: Re: [PATCH V14] VECT: Add decrement IV iteration loop control by 
variable amount support
I'll look at the samples tomorrow, but just to address one thing:

钟居哲  writes:
>>> What gives the best code in these cases?  Is emitting a multiplication
>>> better?  Or is using a new IV better?
> Could you give me more detailed information about the "new refresh IV" approach?
> I'd like to try that.

By “using a new IV” I meant calling vect_set_loop_controls_directly
for every rgroup, not just the first.  So in the earlier example,
there would be one decrementing IV for x and one decrementing IV for y.
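
To illustrate the idea, here is a minimal scalar C sketch, not GCC code. It models one decrementing IV per rgroup for the short x[100] / int y[200] testcase discussed in this thread. The names dec_iv, dec_iv_step and run_two_ivs are hypothetical, and the per-iteration caps (8 units for x, 16 units for y) are assumptions read off the dumps, where loop_len_34 is at most 8 and y uses four controls of at most 4 each.

```c
#include <assert.h>

/* Smaller of two unsigned values, mirroring GIMPLE's MIN_EXPR.  */
static unsigned min_u (unsigned a, unsigned b) { return a < b ? a : b; }

/* One decrementing IV (hypothetical model): REMAINING counts the units
   still to process, MAX_STEP is the most one vector iteration handles.  */
struct dec_iv { unsigned remaining; unsigned max_step; };

/* Return this iteration's length and decrement the IV by it.  */
static unsigned dec_iv_step (struct dec_iv *iv)
{
  unsigned len = min_u (iv->remaining, iv->max_step);
  iv->remaining -= len;
  return len;
}

/* Run the vector loop for N scalar iterations with a separate IV for
   each rgroup and return the number of vector iterations.  */
static unsigned run_two_ivs (unsigned n)
{
  struct dec_iv x_iv = { n * 2, 8 };   /* x: 2 units per scalar iteration */
  struct dec_iv y_iv = { n * 4, 16 };  /* y: 4 units per scalar iteration */
  unsigned iters = 0;

  while (x_iv.remaining != 0)
    {
      unsigned x_len = dec_iv_step (&x_iv);
      unsigned y_len = dec_iv_step (&y_iv);
      /* Both IVs must agree on how many scalar iterations were done.  */
      assert (x_len / 2 == y_len / 4);
      iters++;
    }
  assert (y_iv.remaining == 0);
  return iters;
}
```

In this model each rgroup gets its length straight from its own IV, so no multiplication or division between rgroups is needed. The y length would still have to be split across its four controls. For the testcase, run_two_ivs (50) models the 50 scalar iterations of the loop.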

Thanks,
Richard

Re: Re: [PATCH V14] VECT: Add decrement IV iteration loop control by variable amount support

2023-05-24 Thread 钟居哲

Hi, Richard. After fixing the code, I think the IR is now correct:

loop_len_34 = MIN_EXPR ;
  _74 = loop_len_34 * 2;
  loop_len_48 = MIN_EXPR <_74, 4>;
  _75 = _74 - loop_len_48;
  loop_len_49 = MIN_EXPR <_75, 4>;
  _76 = _75 - loop_len_49;
  loop_len_50 = MIN_EXPR <_76, 4>;
  loop_len_51 = _76 - loop_len_50;
  ...
  vect__1.8_33 = .LEN_LOAD (_17, 16B, loop_len_34, 0);
...
  .LEN_STORE (_17, 16B, loop_len_34, vect__4.11_21, 0);
...

  vect__10.16_52 = .LEN_LOAD (_31, 32B, loop_len_48, 0);
...
  vect__10.17_54 = .LEN_LOAD (_29, 32B, loop_len_49, 0);
...
  vect__10.18_56 = .LEN_LOAD (_25, 32B, loop_len_50, 0);
...
  vect__10.19_58 = .LEN_LOAD (_80, 32B, loop_len_51, 0);


For this case:

uint64_t x2[100];
uint16_t y2[200];

void f2(int n) {
  for (int i = 0, j = 0; i < n; i += 2, j += 4) {
    x2[i + 0] += 1;
    x2[i + 1] += 2;
    y2[j + 0] += 1;
    y2[j + 1] += 2;
    y2[j + 2] += 3;
    y2[j + 3] += 4;
  }
}

The IR is like this:

  loop_len_56 = MIN_EXPR ;
  _66 = loop_len_56 * 4;
  loop_len_43 = _66 + 18446744073709551614;
  ...
  vect__1.44_44 = .LEN_LOAD (_6, 64B, 2, 0);
  ...
  vect__1.45_46 = .LEN_LOAD (_14, 64B, loop_len_43, 0);
  vect__2.46_47 = vect__1.44_44 + { 1, 2 };
  vect__2.46_48 = vect__1.45_46 + { 1, 2 };
  .LEN_STORE (_6, 64B, 2, vect__2.46_47, 0);
  .LEN_STORE (_14, 64B, loop_len_43, vect__2.46_48, 0);
  ...
  vect__6.51_57 = .LEN_LOAD (_10, 16B, loop_len_56, 0);

  vect__7.52_58 = vect__6.51_57 + { 1, 2, 3, 4, 1, 2, 3, 4 };
  .LEN_STORE (_10, 16B, loop_len_56, vect__7.52_58, 0);

It seems correct too?

>> What gives the best code in these cases?  Is emitting a multiplication
>> better?  Or is using a new IV better?
Could you give me more detailed information about the "new refresh IV" approach?
I'd like to try that.

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Sandiford
Date: 2023-05-25 00:00
To: 钟居哲
CC: gcc-patches; rguenther
Subject: Re: [PATCH V14] VECT: Add decrement IV iteration loop control by 
variable amount support
钟居哲  writes:
> Oh, I see. Thank you so much for pointing this out.
> Could you tell me what I should do in the code?
> It seems that I should adjust it in
> vect_adjust_loop_lens_control.
>
> Multiply by some factor? Is it correct to multiply by max_nscalars_per_iter?
 
max_nscalars_per_iter * factor rather than just max_nscalars_per_iter
 
Note that it's possible for later max_nscalars_per_iter * factor to
be smaller, so a division might be needed in rare cases.  E.g.:
 
uint64_t x[100];
uint16_t y[200];

void f() {
  for (int i = 0, j = 0; i < 100; i += 2, j += 4) {
    x[i + 0] += 1;
    x[i + 1] += 2;
    y[j + 0] += 1;
    y[j + 1] += 2;
    y[j + 2] += 3;
    y[j + 3] += 4;
  }
}
 
where y has a single-control rgroup with max_nscalars_per_iter == 4
and x has a 2-control rgroup with max_nscalars_per_iter == 2
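
A small C sketch of the rescaling described above, under stated assumptions: rescale_len is a hypothetical helper, not a GCC function. It assumes that one item count is an exact multiple of the other and that len covers a whole number of scalar iterations. from_items and to_items each stand for max_nscalars_per_iter * factor of the respective rgroup.

```c
#include <assert.h>

/* Convert LEN, counted at FROM_ITEMS units per scalar iteration, to the
   scale of an rgroup that needs TO_ITEMS units per scalar iteration.
   Hypothetical helper: exact divisibility is an assumption of this sketch.  */
static unsigned rescale_len (unsigned len, unsigned from_items,
                             unsigned to_items)
{
  assert (from_items != 0 && to_items != 0);

  if (to_items >= from_items)
    {
      /* The later rgroup counts more units per iteration: multiply.  */
      assert (to_items % from_items == 0);
      return len * (to_items / from_items);
    }

  /* The later rgroup counts fewer units per iteration: divide.  */
  assert (from_items % to_items == 0);
  assert (len % (from_items / to_items) == 0);
  return len / (from_items / to_items);
}
```

For the example above, a length of 8 counted at y's scale (4 items per scalar iteration) becomes rescale_len (8, 4, 2) at x's scale, which is the division case. The short/int testcase goes the other way: rescale_len (8, 2, 4) multiplies.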
 
What gives the best code in these cases?  Is emitting a multiplication
better?  Or is using a new IV better?
 
Thanks,
Richard

Re: Re: [PATCH V14] VECT: Add decrement IV iteration loop control by variable amount support

2023-05-24 Thread 钟居哲

Hi, for the first piece of code, I tried:
  unsigned int nitems_per_iter
    = dest_rgm->max_nscalars_per_iter * dest_rgm->factor;
  step = gimple_build (seq, MULT_EXPR, iv_type, step,
                       build_int_cst (iv_type, nitems_per_iter));

Then the optimized IR is:
loop_len_34 = MIN_EXPR ;
  _74 = loop_len_34 * 4;
  loop_len_51 = _74 + 18446744073709551604;

  _16 = (void *) ivtmp.27_41;
  _17 = &MEM  [(short int *)_16];

  vect__1.7_33 = .LEN_LOAD (_17, 16B, loop_len_34, 0);

  vect__2.8_23 = VIEW_CONVERT_EXPR(vect__1.7_33);
  vect__3.9_22 = vect__2.8_23 + { 1, 2, 1, 2, 1, 2, 1, 2 };
  vect__4.10_21 = VIEW_CONVERT_EXPR(vect__3.9_22);
  .LEN_STORE (_17, 16B, loop_len_34, vect__4.10_21, 0);
  _20 = (void *) ivtmp.28_1;
  _31 = &MEM  [(int *)_20];

  vect__10.15_52 = .LEN_LOAD (_31, 32B, 4, 0);

  _30 = (void *) ivtmp.31_4;
  _29 = &MEM  [(int *)_30];

  vect__10.16_54 = .LEN_LOAD (_29, 32B, 4, 0);

  _26 = (void *) ivtmp.32_8;
  _25 = &MEM  [(int *)_26];

  vect__10.17_56 = .LEN_LOAD (_25, 32B, 4, 0);

  _79 = (void *) ivtmp.33_12;
  _80 = &MEM  [(int *)_79];

  vect__10.18_58 = .LEN_LOAD (_80, 32B, loop_len_51, 0);

Is it correct? It looks weird.


juzhe.zh...@rivai.ai
 
From: Richard Sandiford
Date: 2023-05-25 00:00
To: 钟居哲
CC: gcc-patches; rguenther
Subject: Re: [PATCH V14] VECT: Add decrement IV iteration loop control by 
variable amount support
钟居哲  writes:
> Oh, I see. Thank you so much for pointing this out.
> Could you tell me what I should do in the code?
> It seems that I should adjust it in
> vect_adjust_loop_lens_control.
>
> Multiply by some factor? Is it correct to multiply by max_nscalars_per_iter?
 
max_nscalars_per_iter * factor rather than just max_nscalars_per_iter
 
Note that it's possible for later max_nscalars_per_iter * factor to
be smaller, so a division might be needed in rare cases.  E.g.:
 
uint64_t x[100];
uint16_t y[200];

void f() {
  for (int i = 0, j = 0; i < 100; i += 2, j += 4) {
    x[i + 0] += 1;
    x[i + 1] += 2;
    y[j + 0] += 1;
    y[j + 1] += 2;
    y[j + 2] += 3;
    y[j + 3] += 4;
  }
}
 
where y has a single-control rgroup with max_nscalars_per_iter == 4
and x has a 2-control rgroup with max_nscalars_per_iter == 2
 
What gives the best code in these cases?  Is emitting a multiplication
better?  Or is using a new IV better?
 
Thanks,
Richard

Re: Re: [PATCH V14] VECT: Add decrement IV iteration loop control by variable amount support

2023-05-24 Thread 钟居哲

Oh, I see. Thank you so much for pointing this out.
Could you tell me what I should do in the code?
It seems that I should adjust it in
vect_adjust_loop_lens_control.

Multiply by some factor? Is it correct to multiply by max_nscalars_per_iter?
Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Sandiford
Date: 2023-05-24 23:47
To: 钟居哲
CC: gcc-patches; rguenther
Subject: Re: [PATCH V14] VECT: Add decrement IV iteration loop control by 
variable amount support
钟居哲  writes:
> Hi, Richard. I still don't understand it. Sorry about that.
>
> >> loop_len_48 = MIN_EXPR ;
> >> _74 = loop_len_34 * 2 - loop_len_48;
>
> I have already run these tests.
> We have a MIN_EXPR to calculate the total elements:
> loop_len_34 = MIN_EXPR ;
> I think "8" is already multiplied by 2?
>
> Why do we need loop_len_34 * 2?
> Could you give me more information? We already have execution checks for
> tests similar to the ones you presented, and they pass. I am not sure
> whether this patch has an issue that I didn't notice.
 
Think about the maximum values of each SSA name:
 
   loop_len_34 = MIN_EXPR ;            // MAX 8
   loop_len_48 = MIN_EXPR ;            // MAX 4
   _74 = loop_len_34 - loop_len_48;    // MAX 4
   loop_len_49 = MIN_EXPR <_74, 4>;    // MAX 4 (always == _74)
   _75 = _74 - loop_len_49;            // 0
   loop_len_50 = MIN_EXPR <_75, 4>;    // 0
   loop_len_51 = _75 - loop_len_50;    // 0
 
So the final two y vectors will always have 0 controls.
 
Thanks,
Richard

Re: Re: [PATCH V14] VECT: Add decrement IV iteration loop control by variable amount support

2023-05-24 Thread 钟居哲

Hi, Richard. I still don't understand it. Sorry about that.

>> loop_len_48 = MIN_EXPR ;
>> _74 = loop_len_34 * 2 - loop_len_48;

I have already run these tests.
We have a MIN_EXPR to calculate the total elements:
loop_len_34 = MIN_EXPR ;
I think "8" is already multiplied by 2?

Why do we need loop_len_34 * 2?
Could you give me more information? We already have execution checks for
tests similar to the ones you presented, and they pass. I am not sure
whether this patch has an issue that I didn't notice.

Thanks.

juzhe.zh...@rivai.ai

From: Richard Sandiford
Date: 2023-05-24 23:31
To: 钟居哲
CC: gcc-patches; rguenther
Subject: Re: [PATCH V14] VECT: Add decrement IV iteration loop control by 
variable amount support
钟居哲  writes:
> Hi, the .optimized dump is like this:
>
>[local count: 21045336]:
>   ivtmp.26_36 = (unsigned long) &x;
>   ivtmp.27_3 = (unsigned long) &y;
>   ivtmp.30_6 = (unsigned long) &MEM  [(void *)&y + 16B];
>   ivtmp.31_10 = (unsigned long) &MEM  [(void *)&y + 32B];
>   ivtmp.32_14 = (unsigned long) &MEM  [(void *)&y + 48B];
>
>[local count: 273589366]:
>   # ivtmp_72 = PHI 
>   # ivtmp.26_41 = PHI 
>   # ivtmp.27_1 = PHI 
>   # ivtmp.30_4 = PHI 
>   # ivtmp.31_8 = PHI 
>   # ivtmp.32_12 = PHI 
>   loop_len_34 = MIN_EXPR ;
>   loop_len_48 = MIN_EXPR ;
>   _74 = loop_len_34 - loop_len_48;

Yeah, I think this needs to be:

  loop_len_48 = MIN_EXPR ;
  _74 = loop_len_34 * 2 - loop_len_48;

(as valid gimple).  The point is that...

>   loop_len_49 = MIN_EXPR <_74, 4>;
>   _75 = _74 - loop_len_49;
>   loop_len_50 = MIN_EXPR <_75, 4>;
>   loop_len_51 = _75 - loop_len_50;

...there are 4 lengths capped to 4, for a total element count of 16.
But loop_len_34 is never greater than 8.

So for this case we either need to multiply, or we need to create
a fresh IV for the second rgroup.  Both approaches are fine.
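
The arithmetic can be checked with a small scalar C model. This is only a sketch: y_controls is a hypothetical helper that mimics the MIN_EXPR / subtraction chain from the dump, with scale == 1 reproducing the current IR and scale == 2 modelling the multiplication.

```c
#include <assert.h>

/* Smaller of two unsigned values, mirroring GIMPLE's MIN_EXPR.  */
static unsigned min_u (unsigned a, unsigned b) { return a < b ? a : b; }

/* Fill CTRL[0..3] for the y rgroup from the x length X_LEN (at most 8),
   multiplying by SCALE first.  Returns the number of y elements covered.
   Hypothetical model of the length chain, not GCC code.  */
static unsigned y_controls (unsigned x_len, unsigned scale, unsigned ctrl[4])
{
  assert (x_len <= 8);

  unsigned remaining = x_len * scale;
  unsigned total = 0;

  /* The first three controls are capped at 4, as in the dump.  */
  for (int i = 0; i < 3; i++)
    {
      ctrl[i] = min_u (remaining, 4);
      remaining -= ctrl[i];
      total += ctrl[i];
    }

  /* The last control takes whatever is left.  */
  ctrl[3] = remaining;
  return total + ctrl[3];
}
```

With x_len == 8, the model gives controls 4, 4, 0, 0 for scale == 1 (8 elements covered) and 4, 4, 4, 4 for scale == 2 (all 16 covered), which is the gap described above.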

Thanks,
Richard

Re: Re: [PATCH V14] VECT: Add decrement IV iteration loop control by variable amount support

2023-05-24 Thread 钟居哲

Hi, the .optimized dump is like this:

   [local count: 21045336]:
  ivtmp.26_36 = (unsigned long) &x;
  ivtmp.27_3 = (unsigned long) &y;
  ivtmp.30_6 = (unsigned long) &MEM  [(void *)&y + 16B];
  ivtmp.31_10 = (unsigned long) &MEM  [(void *)&y + 32B];
  ivtmp.32_14 = (unsigned long) &MEM  [(void *)&y + 48B];

   [local count: 273589366]:
  # ivtmp_72 = PHI 
  # ivtmp.26_41 = PHI 
  # ivtmp.27_1 = PHI 
  # ivtmp.30_4 = PHI 
  # ivtmp.31_8 = PHI 
  # ivtmp.32_12 = PHI 
  loop_len_34 = MIN_EXPR ;
  loop_len_48 = MIN_EXPR ;
  _74 = loop_len_34 - loop_len_48;
  loop_len_49 = MIN_EXPR <_74, 4>;
  _75 = _74 - loop_len_49;
  loop_len_50 = MIN_EXPR <_75, 4>;
  loop_len_51 = _75 - loop_len_50;
  _16 = (void *) ivtmp.26_41;
  _17 = &MEM  [(short int *)_16];
  vect__1.6_33 = .LEN_LOAD (_17, 16B, loop_len_34, 0);
  vect__2.7_23 = VIEW_CONVERT_EXPR(vect__1.6_33);
  vect__3.8_22 = vect__2.7_23 + { 1, 2, 1, 2, 1, 2, 1, 2 };
  vect__4.9_21 = VIEW_CONVERT_EXPR(vect__3.8_22);
  .LEN_STORE (_17, 16B, loop_len_34, vect__4.9_21, 0);
  _20 = (void *) ivtmp.27_1;
  _31 = &MEM  [(int *)_20];
  vect__10.14_52 = .LEN_LOAD (_31, 32B, loop_len_48, 0);
  _30 = (void *) ivtmp.30_4;
  _29 = &MEM  [(int *)_30];
  vect__10.15_54 = .LEN_LOAD (_29, 32B, loop_len_49, 0);
  _26 = (void *) ivtmp.31_8;
  _25 = &MEM  [(int *)_26];
  vect__10.16_56 = .LEN_LOAD (_25, 32B, loop_len_50, 0);
  _78 = (void *) ivtmp.32_12;
  _79 = &MEM  [(int *)_78];
  vect__10.17_58 = .LEN_LOAD (_79, 32B, loop_len_51, 0);
  vect__11.18_59 = vect__10.14_52 + { 1, 2, 3, 4 };
  vect__11.18_60 = vect__10.15_54 + { 1, 2, 3, 4 };
  vect__11.18_61 = vect__10.16_56 + { 1, 2, 3, 4 };
  vect__11.18_62 = vect__10.17_58 + { 1, 2, 3, 4 };
  .LEN_STORE (_31, 32B, loop_len_48, vect__11.18_59, 0);
  .LEN_STORE (_29, 32B, loop_len_49, vect__11.18_60, 0);
  .LEN_STORE (_25, 32B, loop_len_50, vect__11.18_61, 0);
  .LEN_STORE (_79, 32B, loop_len_51, vect__11.18_62, 0);
  ivtmp_73 = ivtmp_72 - loop_len_34;
  ivtmp.26_37 = ivtmp.26_41 + 16;
  ivtmp.27_2 = ivtmp.27_1 + 64;
  ivtmp.30_5 = ivtmp.30_4 + 64;
  ivtmp.31_9 = ivtmp.31_8 + 64;
  ivtmp.32_13 = ivtmp.32_12 + 64;
  if (ivtmp_73 != 0)
goto ; [92.31%]
  else
goto ; [7.69%]

I am still checking it, but I am sending it to you early.

Thanks.


juzhe.zh...@rivai.ai
 
From: Richard Sandiford
Date: 2023-05-24 23:07
To: juzhe.zhong
CC: gcc-patches; rguenther
Subject: Re: [PATCH V14] VECT: Add decrement IV iteration loop control by 
variable amount support
Thanks for trying it.  I'm still surprised that no multiplication
is needed though.  Does the patch work for:
 
short x[100];
int y[200];

void f() {
  for (int i = 0, j = 0; i < 100; i += 2, j += 4) {
    x[i + 0] += 1;
    x[i + 1] += 2;
    y[j + 0] += 1;
    y[j + 1] += 2;
    y[j + 2] += 3;
    y[j + 3] += 4;
  }
}
 
?  Here, there should be a single-control rgroup for x, counting
2 units per scalar iteration.  I'd expect the IV to use this scale.
 
There should also be a 4-control rgroup for y, counting 4 units per
scalar iteration.  So I think the IV would need to be multiplied by 2
before being used for the y rgroup.
 
Thanks,
Richard
 
juzhe.zh...@rivai.ai writes:
> From: Ju-Zhe Zhong 
>
> This patch supports decrement IV by following the flow designed by
> Richard:
>
> (1) In vect_set_loop_condition_partial_vectors, for the first iteration of:
> call vect_set_loop_controls_directly.
>
> (2) vect_set_loop_controls_directly calculates "step" as in your patch.
> If rgc has 1 control, this step is the SSA name created for that control.
> Otherwise the step is a fresh SSA name, as in your patch.
>
> (3) vect_set_loop_controls_directly stores this step somewhere for later
> use, probably in LOOP_VINFO.  Let's use "S" to refer to this stored step.
>
> (4) After the vect_set_loop_controls_directly call above, and outside
> the "if" statement that now contains vect_set_loop_controls_directly,
> check whether rgc->controls.length () > 1.  If so, use
> vect_adjust_loop_lens_control to set the controls based on S.
>
> Then the only caller of vect_adjust_loop_lens_control is
> vect_set_loop_condition_partial_vectors.  And the starting
> step for vect_adjust_loop_lens_control is always S.
>
> This patch has been well tested for single-rgroup and multiple-rgroup (SLP)
> and passes all testcases in the RISC-V port.
>
> It also passes tests for multiple-rgroup (non-SLP), tested on vec_pack_trunc.
>
> ---
>  gcc/tree-vect-loop-manip.cc | 178 +---
>  gcc/tree-vect-loop.cc   |  13 +++
>  gcc/tree-vectorizer.h   |  12 +++
>  3 files changed, 192 insertions(+), 11 deletions(-)
>
> diff --git a/gcc/tree-vect-loop-manip.cc b/gcc/tree-vect-loop-manip.cc
> index ff6159e08d5..578ac5b783e 100644
> --- a/gcc/tree-vect-loop-manip.cc
> +++ b/gcc/tree-vect-loop-manip.cc
> @@ -468,6 +468,38 @@ vect_set_loop_controls_directly (class loop *loop, 
> loop_vec_info loop_vinfo,
>gimple_stmt_iterator incr_gsi;
>bool insert_after;
>standard_iv_increment_position (loop,
