https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115629

--- Comment #5 from rguenther at suse dot de <rguenther at suse dot de> ---
> On 01.07.2024 at 12:10, tnfchris at gcc dot gnu.org 
> <gcc-bugzi...@gcc.gnu.org> wrote:
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115629
> 
> --- Comment #4 from Tamar Christina <tnfchris at gcc dot gnu.org> ---
> (In reply to Richard Biener from comment #3)
>> So we now tail-merge the two b[i] loading blocks.  Can you check SVE
>> code-gen with this?  If that fixes the PR, consider adding an SVE testcase.
> 
> Thanks, the codegen is much better now, but it exposes some other missing
> mask tracking in the vectorizer.
> 
> Atm we generate:
> 
> .L3:
>        ld1w    z31.s, p6/z, [x0, x6, lsl 2] <-- load a
>        cmpeq   p7.s, p6/z, z31.s, #0        <-- a == 0, !a
>        ld1w    z0.s, p7/z, [x2, x6, lsl 2]  <-- load c conditionally on !a
>        cmpeq   p7.s, p7/z, z0.s, #0         <-- !a && !c
>        orr     z0.d, z31.d, z0.d            <-- a || c
>        ld1w    z29.s, p7/z, [x3, x6, lsl 2] <--- load d where !a && !c
>        cmpne   p5.s, p6/z, z0.s, #0         <--- (a || c) & loop_mask
>        and     p7.b, p6/z, p7.b, p7.b       <--- ((!a && !c) && (!a && !c)) & loop_mask
>        ld1w    z30.s, p5/z, [x1, x6, lsl 2] <-- load b conditionally on (a || c)
>        sel     z30.s, p7, z29.s, z30.s      <-- select (!a && !c, d, b)
>        st1w    z30.s, p6, [x4, x6, lsl 2]
>        add     x6, x6, x7
>        whilelo p6.s, w6, w5
>        b.any   .L3
> 
> which corresponds to:
> 
>  # loop_mask_63 = PHI <next_mask_95(10), max_mask_94(20)>
>  vect__4.10_64 = .MASK_LOAD (vectp_a.8_53, 32B, loop_mask_63);
>  mask__31.11_66 = vect__4.10_64 != { 0, ... };
>  mask__56.12_67 = ~mask__31.11_66;
>  vec_mask_and_70 = mask__56.12_67 & loop_mask_63;
>  vect__7.15_71 = .MASK_LOAD (vectp_c.13_68, 32B, vec_mask_and_70);
>  mask__22.16_73 = vect__7.15_71 == { 0, ... };
>  mask__34.17_75 = vec_mask_and_70 & mask__22.16_73;
>  vect_iftmp.20_78 = .MASK_LOAD (vectp_d.18_76, 32B, mask__34.17_75);
>  vect__61.21_79 = vect__4.10_64 | vect__7.15_71;
>  mask__35.22_81 = vect__61.21_79 != { 0, ... };
>  vec_mask_and_84 = mask__35.22_81 & loop_mask_63;
>  vect_iftmp.25_85 = .MASK_LOAD (vectp_b.23_82, 32B, vec_mask_and_84);
>  _86 = mask__34.17_75 & loop_mask_63;
>  vect_iftmp.26_87 = VEC_COND_EXPR <_86, vect_iftmp.20_78, vect_iftmp.25_85>;
>  .MASK_STORE (vectp_res.27_88, 32B, loop_mask_63, vect_iftmp.26_87);
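> 
> For reference, a loop of roughly this shape reproduces the pattern above
> (my reconstruction from the dump, not necessarily the exact testcase from
> the PR):
> 
>   void f (int *restrict a, int *restrict b, int *restrict c,
>           int *restrict d, int *restrict res, int n)
>   {
>     for (int i = 0; i < n; i++)
>       res[i] = a[i] ? b[i] : (c[i] ? b[i] : d[i]);
>   }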
> 
> It looks like what's missing is that the mask tracking doesn't know that
> masked operations already perform an implicit AND with their governing mask
> when combined.  We do some of this cleanup in the backend, but I feel it may
> be better to do it in the vectorizer.
> 
> In this case, the mask of the second load is derived from the mask of the
> first one, so the AND with the loop mask has already happened.
> And crucially, inverting that combined mask inverts both conditions at once.
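> 
> Concretely, mask__34.17_75 is built from vec_mask_and_70, which already
> includes loop_mask_63, so the later _86 = mask__34.17_75 & loop_mask_63
> is a no-op.  A scalar sketch of that mask arithmetic (my names, following
> the dump above; 'loop' stands for loop_mask_63):
> 
>   _Bool redundant_p (int a, int c, _Bool loop)
>   {
>     _Bool not_a       = !a && loop;     /* vec_mask_and_70 */
>     _Bool not_a_not_c = not_a && !c;    /* mask__34.17_75: loop already folded in */
>     return (not_a_not_c && loop) == not_a_not_c; /* _86 adds nothing: always true */
>   }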
> 
> So there are some superfluous masking operations happening.  But I guess
> that's a separate bug.  Shall I just add some tests here, close this one,
> and open a new PR?

I'm not sure that helps - do we fully understand that this is a separate issue
and not related to how we if-convert?

Adding a testcase is nevertheless OK of course.
