On Wed, 9 Sep 2020, Andrea Corallo wrote:

> Hi all,
> 
> this patch is meant not to generate predication in loop when the
> loop is operating only on full vectors.
> 
> Ex:
> 
> #+BEGIN_SRC C
> /* Vector length is 256.  */
> void
> f (int *restrict x, int *restrict y, unsigned int n) {
>   for (unsigned int i = 0; i < n * 8; ++i)
>     x[i] += y[i];
> }
> #+END_SRC
> 
> Compiling on aarch64 with -O3 -msve-vector-bits=256 current trunk
> gives:
> 
> #+BEGIN_SRC asm
> f:
> .LFB0:
>         .cfi_startproc
>         lsl     w2, w2, 3
>         cbz     w2, .L1
>         mov     x3, 0
>         whilelo p0.s, xzr, x2
>         .p2align 3,,7
> .L3:
>         ld1w    z0.s, p0/z, [x0, x3, lsl 2]
>         ld1w    z1.s, p0/z, [x1, x3, lsl 2]
>         add     z0.s, z0.s, z1.s
>         st1w    z0.s, p0, [x0, x3, lsl 2]
>         add     x3, x3, 8
>         whilelo p0.s, x3, x2
>         b.any   .L3
> .L1:
>         ret
>         .cfi_endproc
> #+END_SRC
> 
> With the patch applied:
> 
> #+BEGIN_SRC asm
> f:
> .LFB0:
>         .cfi_startproc
>         lsl     w3, w2, 3
>         cbz     w3, .L1
>         mov     x2, 0
>         ptrue   p0.b, vl32
>         .p2align 3,,7
> .L3:
>         ld1w    z0.s, p0/z, [x0, x2, lsl 2]
>         ld1w    z1.s, p0/z, [x1, x2, lsl 2]
>         add     z0.s, z0.s, z1.s
>         st1w    z0.s, p0, [x0, x2, lsl 2]
>         add     x2, x2, 8
>         cmp     x2, x3
>         bne     .L3
> .L1:
>         ret
>         .cfi_endproc
> #+END_SRC
> 
> To achieve this we check earlier if the loop needs peeling and if is
> not the case we do not set LOOP_VINFO_USING_PARTIAL_VECTORS_P to true.
> 
> I moved some logic from 'determine_peel_for_niter' to
> 'vect_need_peeling_or_part_vects_p' so it can be used for this purpose.
> 
> Bootstrapped and regtested on aarch64-linux-gnu.

Looks OK to me, the comment

@@ -2267,7 +2278,10 @@ start_over:
     {
       if (param_vect_partial_vector_usage == 0)
        LOOP_VINFO_USING_PARTIAL_VECTORS_P (loop_vinfo) = false;
-      else if (vect_verify_full_masking (loop_vinfo)
+      else if ((vect_verify_full_masking (loop_vinfo)
+               && vect_need_peeling_or_part_vects_p (loop_vinfo))
+              /* Don't use partial vectors if we don't need to peel the
+                 loop.  */
               || vect_verify_loop_lens (loop_vinfo))

seems to be oddly misplaced (I'd put it before the call).

Richard.

> Feedback is welcome, thanks.
> 
>   Andrea
> 
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

Reply via email to