钟居哲 <juzhe.zh...@rivai.ai> writes:
> Hi, the .optimized dump is like this:
>
>   <bb 2> [local count: 21045336]:
>   ivtmp.26_36 = (unsigned long) &x;
>   ivtmp.27_3 = (unsigned long) &y;
>   ivtmp.30_6 = (unsigned long) &MEM <int[200]> [(void *)&y + 16B];
>   ivtmp.31_10 = (unsigned long) &MEM <int[200]> [(void *)&y + 32B];
>   ivtmp.32_14 = (unsigned long) &MEM <int[200]> [(void *)&y + 48B];
>
>   <bb 3> [local count: 273589366]:
>   # ivtmp_72 = PHI <ivtmp_73(3), 100(2)>
>   # ivtmp.26_41 = PHI <ivtmp.26_37(3), ivtmp.26_36(2)>
>   # ivtmp.27_1 = PHI <ivtmp.27_2(3), ivtmp.27_3(2)>
>   # ivtmp.30_4 = PHI <ivtmp.30_5(3), ivtmp.30_6(2)>
>   # ivtmp.31_8 = PHI <ivtmp.31_9(3), ivtmp.31_10(2)>
>   # ivtmp.32_12 = PHI <ivtmp.32_13(3), ivtmp.32_14(2)>
>   loop_len_34 = MIN_EXPR <ivtmp_72, 8>;
>   loop_len_48 = MIN_EXPR <loop_len_34, 4>;
>   _74 = loop_len_34 - loop_len_48;

Yeah, I think this needs to be:

  loop_len_48 = MIN_EXPR <loop_len_34 * 2, 4>;
  _74 = loop_len_34 * 2 - loop_len_48;
  
(as valid gimple).  The point is that...

>   loop_len_49 = MIN_EXPR <_74, 4>;
>   _75 = _74 - loop_len_49;
>   loop_len_50 = MIN_EXPR <_75, 4>;
>   loop_len_51 = _75 - loop_len_50;

...there are 4 lengths capped to 4, for a total element count of 16.
But loop_len_34 is never greater than 8, so as written the four
lengths can only ever add up to 8.

So for this case we either need to multiply, or we need to create
a fresh IV for the second rgroup.  Both approaches are fine.
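
To make the arithmetic concrete, here is a rough C model of the
MIN_EXPR/subtract chain above.  It is only a sketch: split_lengths and
sum_lengths are made-up names, not anything in the vectorizer, and the
factor of 2 is simply the one from the suggested gimple.

```c
#include <assert.h>

#define NLENS 4   /* four lengths in the second rgroup, as in the dump */

static unsigned min_u(unsigned a, unsigned b) { return a < b ? a : b; }

/* Hypothetical helper: split TOTAL elements across NLENS lengths.
   The first three are MIN_EXPR <remaining, CAP>; the last one is
   whatever remains, mirroring loop_len_48..loop_len_51.  */
void split_lengths(unsigned total, unsigned cap, unsigned out[NLENS])
{
    assert(cap > 0);
    for (int i = 0; i < NLENS - 1; i++) {
        out[i] = min_u(total, cap);
        total -= out[i];
    }
    out[NLENS - 1] = total;
}

/* Hypothetical helper: number of elements the four lengths cover.  */
unsigned sum_lengths(const unsigned len[NLENS])
{
    unsigned sum = 0;
    for (int i = 0; i < NLENS; i++)
        sum += len[i];
    return sum;
}
```

With loop_len_34 == 8 (ivtmp_72 starts at 100, so MIN_EXPR gives 8),
split_lengths (8, 4, ...) yields 4, 4, 0, 0, i.e. 8 elements, whereas
split_lengths (8 * 2, 4, ...) yields 4, 4, 4, 4, i.e. the expected 16.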

Thanks,
Richard
