On Tue, Oct 11, 2016 at 5:03 PM, Bin Cheng <bin.ch...@arm.com> wrote: > Hi, > Given below test case, > int foo (unsigned short a[], unsigned int x) > { > unsigned int i; > for (i = 0; i < 1000; i++) > { > x = a[i]; > a[i] = (unsigned short)(x >= 32768 ? x - 32768 : 0); > } > return x; > } > > it now can be vectorized on AArch64, but generated assembly is way from > optimal: > .L4: > ldr q4, [x3, x1] > add w2, w2, 1 > cmp w2, w0 > ushll v1.4s, v4.4h, 0 > ushll2 v0.4s, v4.8h, 0 > add v3.4s, v1.4s, v6.4s > add v2.4s, v0.4s, v6.4s > cmhi v1.4s, v1.4s, v5.4s > cmhi v0.4s, v0.4s, v5.4s > and v1.16b, v3.16b, v1.16b > and v0.16b, v2.16b, v0.16b > xtn v2.4h, v1.4s > xtn2 v2.8h, v0.4s > str q2, [x3, x1] > add x1, x1, 16 > bcc .L4 > > The vectorized loop has 15 instructions, which can be greatly simplified by > turning cond_expr into max_expr, as below: > .L4: > ldr q1, [x3, x1] > add w2, w2, 1 > cmp w2, w0 > umax v0.8h, v1.8h, v2.8h > add v0.8h, v0.8h, v2.8h > str q0, [x3, x1] > add x1, x1, 16 > bcc .L4 > > This patch addresses the issue by adding new vectorization pattern. > Bootstrap and test on x86_64 and AArch64. Is it OK?
So the COND_EXPRs are generated this way by if-conversion, right? I believe that the MAX/MIN_EXPR form is always preferrable and thus it looks like if-conversion might want to either directly generate it or make sure to fold the introduced stmts (and have a match.pd pattern catching this). Richard. > Thanks, > bin > > 2016-10-11 Bin Cheng <bin.ch...@arm.com> > > * tree-vect-patterns.c (vect_recog_min_max_modify_pattern): New. > (vect_vect_recog_func_ptrs): New element for above pattern. > * tree-vectorizer.h (NUM_PATTERNS): Increase by 1. > > gcc/testsuite/ChangeLog > 2016-10-11 Bin Cheng <bin.ch...@arm.com> > > * gcc.dg/vect/vect-umax-modify-pattern.c: New test. > * gcc.dg/vect/vect-umin-modify-pattern.c: New test.