On Tue, Nov 10, 2020 at 2:39 PM Nick Desaulniers <ndesaulni...@google.com> wrote:
>
> On Tue, Nov 10, 2020 at 2:36 PM Nick Desaulniers
> <ndesaulni...@google.com> wrote:
> >
> > On Tue, Nov 10, 2020 at 2:15 PM Arvind Sankar <nived...@alum.mit.edu> wrote:
> > >
> > > On Tue, Nov 10, 2020 at 01:41:17PM -0800, Nick Desaulniers wrote:
> > > > On Mon, Nov 9, 2020 at 11:51 AM Adrian Ratiu
> > > > <adrian.ra...@collabora.com> wrote:
> > > > >
> > > > > On Fri, 06 Nov 2020, Nick Desaulniers <ndesaulni...@google.com>
> > > > > wrote:
> > > > > > +#pragma clang loop vectorize(enable)
> > > > > >         do {
> > > > > >                 p1[0] ^= p2[0] ^ p3[0] ^ p4[0] ^ p5[0];
> > > > > >                 p1[1] ^= p2[1] ^ p3[1] ^ p4[1] ^ p5[1];
> > > > > > ```
> > > > > > seems to generate the vectorized code.
> > > > > >
> > > > > > Why don't we find a way to make those pragmas more toolchain
> > > > > > portable, rather than open coding them like I have above rather
> > > > > > than this series?
> > > > >
> > > > > Hi again Nick,
> > > > >
> > > > > How did you verify the above pragmas generate correct vectorized
> > > > > code? Have you tested this specific use case?
> > > >
> > > > I read the disassembly before and after my suggested use of pragmas;
> > > > look for vld/vstr. You can also add -Rpass-missed=loop-vectorize to
> > > > CFLAGS_xor-neon.o in arch/arm/lib/Makefile and rebuild
> > > > arch/arm/lib/xor-neon.o with CONFIG_BTRFS enabled.
> > > >
> > >
> > > https://godbolt.org/z/1oo9M6
> > >
> > > With the __restrict__ keywords added, clang seems to vectorize the loop,
> > > but still reports that vectorization wasn't beneficial -- any idea
> > > what's going on?
>
> Anyways, it's not safe to make that change in the kernel unless you
> can guarantee that callers of these routines do not alias or overlap.

s/callers/parameters passed by callers/

> > >
> > I suspect that loop-vectorize is a higher level pass that relies on
> > slp-vectorizer for the transform.
> >
> > $ clang -O2 --target=arm-linux-gnueabi -S -o - foo.c -mfpu=neon -mllvm
> > -print-after-all
> > ...
> > *** IR Dump After SLP Vectorizer ***
> > (bunch of <4 x i32> types)
> >
> > If you add -Rpass-missed=slp-vectorizer, observe that the existing
> > warnings from -Rpass-missed=loop-vectorize disappear; I suspect
> > loop-vectorize will print a "remark" if passes it calls did not, but
> > returned some form of error code.
> >
> > -Rpass=slp-vectorizer shows that it vectorizes two sequences of the
> > loop, and warns that some third portion (that's
> > non-immediately-obvious to me) was non beneficial.
> >
> > --
> > Thanks,
> > ~Nick Desaulniers
> >
>
> --
> Thanks,
> ~Nick Desaulniers

--
Thanks,
~Nick Desaulniers
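
To make the aliasing caveat above concrete, here is a minimal C sketch. It is not the kernel's xor-neon code: the function names `xor_scalar` and `xor_blocked`, the two-buffer signature, and the `LANES` width of 4 are all hypothetical, chosen only for illustration. `xor_blocked` models by hand what a vectorized loop does (load a whole block from both inputs, then store), which is the transformation a compiler may apply unconditionally once the pointers are declared `__restrict__`.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LANES 4 /* hypothetical vector width, in 32-bit elements */

/* Reference semantics: one element at a time, so every store is
 * visible to the loads of later iterations. */
void xor_scalar(uint32_t *p1, const uint32_t *p2, size_t n)
{
	for (size_t i = 0; i < n; i++)
		p1[i] ^= p2[i];
}

/* Hand-written model of a vectorized loop: all LANES loads from both
 * buffers happen before any store. n must be a multiple of LANES. */
void xor_blocked(uint32_t *p1, const uint32_t *p2, size_t n)
{
	for (size_t i = 0; i < n; i += LANES) {
		uint32_t a[LANES], b[LANES];

		memcpy(a, p1 + i, sizeof(a));
		memcpy(b, p2 + i, sizeof(b));
		for (size_t j = 0; j < LANES; j++)
			a[j] ^= b[j];
		memcpy(p1 + i, a, sizeof(a));
	}
}
```

When the two buffers are disjoint, both functions produce the same result. When they overlap, they need not: starting from `{1, 2, 4, 8, 16}` with `p2` pointing at element 0, `p1` at element 1 and `n = 4`, the scalar loop leaves `{1, 3, 7, 15, 31}` while the blocked loop leaves `{1, 3, 6, 12, 24}`. Neither function uses `restrict`, so both calls are well defined; the point is only that the blocked form is a valid replacement for the scalar one when the parameters passed by callers never overlap.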