https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95265
Bug ID: 95265
Summary: aarch64: suboptimal code generation for common neon intrinsic sequence involving shrn and mull
Product: gcc
Version: 10.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: generictoadhuman at gmail dot com
Target Milestone: ---

Compilable example:

#include <arm_neon.h>

int32x4_t func(int32x4_t a, int32x4_t b)
{
    return vshrn_high_n_s64(
        vshrn_n_s64(vmull_s32(vget_low_s32(a), vget_low_s32(b)), 12),
        vmull_high_s32(a, b), 12);
}

With gcc -O3, the generated code contains two superfluous mov instructions and one unnecessary dup.

Output of gcc -v:

Using built-in specs.
COLLECT_GCC=C:\msys64\opt\devkitpro\devkitA64\bin\aarch64-none-elf-gcc.exe
COLLECT_LTO_WRAPPER=c:/msys64/opt/devkitpro/devkita64/bin/../libexec/gcc/aarch64-none-elf/10.1.0/lto-wrapper.exe
Target: aarch64-none-elf
Configured with: ../../gcc-10.1.0/configure --enable-languages=c,c++,objc,lto --with-gnu-as --with-gnu-ld --with-gcc --with-march=armv8 --enable-cxx-flags=-ffunction-sections --disable-libstdcxx-verbose --enable-poison-system-directories --enable-interwork --enable-multilib --enable-threads --disable-win32-registry --disable-nls --disable-debug --disable-libmudflap --disable-libssp --disable-libgomp --disable-libstdcxx-pch --enable-libstdcxx-time --enable-libstdcxx-filesystem-ts --target=aarch64-none-elf --with-newlib=yes --with-headers=../../newlib-3.3.0/newlib/libc/include --prefix=/opt/devkitpro/x86_64-w64-mingw32/devkitA64 --enable-lto --with-system-zlib --with-bugurl=https://github.com/devkitPro/buildscripts/issues --with-pkgversion='devkitA64 release 15' --build=x86_64-unknown-linux-gnu --host=x86_64-w64-mingw32 --with-gmp=/opt/mingw64/mingw --with-mpfr=/opt/mingw64/mingw --with-mpc=/opt/mingw64/mingw
Thread model: posix
Supported LTO compression algorithms: zlib
gcc version 10.1.0 (devkitA64 release 15)
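For reference, the intrinsic sequence in func computes the following for each 32-bit lane: a widening signed multiply (vmull_s32 for lanes 0-1, vmull_high_s32 for lanes 2-3), then a right shift by 12 that narrows back to 32 bits by truncation (vshrn_n_s64 / vshrn_high_n_s64). Below is a minimal portable C sketch of those semantics, based on the Arm definitions of SMULL and SHRN. The name func_ref and its array-based interface are hypothetical and chosen only for illustration. A scalar model like this can be used to check that any improved code generation for func still produces the same lanes.

```c
#include <stdint.h>

/* Hypothetical scalar model of func(): for each lane i,
 *   out[i] = low 32 bits of (((int64_t)a[i] * b[i]) >> 12)
 * The narrowing truncates, as SHRN/SHRN2 do (it does not saturate). */
void func_ref(const int32_t a[4], const int32_t b[4], int32_t out[4])
{
    for (int i = 0; i < 4; i++) {
        /* vmull_s32 (lanes 0-1) / vmull_high_s32 (lanes 2-3):
           signed 32x32 -> 64-bit product, which cannot overflow. */
        int64_t wide = (int64_t)a[i] * (int64_t)b[i];

        /* vshrn_n_s64 / vshrn_high_n_s64 with n = 12: keep bits 12..43.
           Shifting the unsigned bit pattern avoids the implementation-defined
           right shift of a negative value; because the top bits are discarded
           by the narrowing, arithmetic vs. logical shift makes no difference. */
        uint32_t narrowed = (uint32_t)((uint64_t)wide >> 12);

        /* Reinterpret as signed; GCC defines this conversion as modulo 2^32. */
        out[i] = (int32_t)narrowed;
    }
}
```

For example, with a = {4096, -4096, 3, 100000} and b = {5, 7, 4096, 100000}, func_ref yields {5, -7, 3, 2441406}. With INT32_MAX in both inputs, the result lane is -1048576 rather than a saturated value. A saturating narrow would instead use vqshrn_n_s64.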