https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93005
Bug ID: 93005
Summary: Redundant NEON loads/stores from stack are not eliminated
Product: gcc
Version: 8.3.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
Assignee: unassigned at gcc dot gnu.org
Reporter: joel at airwebreathe dot org.uk
Target Milestone: ---

On x86_64 with SSE, gcc can eliminate redundant load/store operations to the
stack, but on ARM it seems unable to perform the same optimization with NEON
vector registers.

This x86_64 code is optimized as expected:

    #include <x86intrin.h>

    __m128i foo(__m128i a) {
        int32_t temp[4];
        _mm_store_si128(reinterpret_cast<__m128i*>(temp), a);
        return _mm_load_si128(reinterpret_cast<__m128i*>(temp));
    }

...when compiled with -O2:

    foo(long long __vector(2)):
            ret

However, when compiling analogous code for ARM NEON:

    #include <arm_neon.h>

    int32x4_t foo(int32x4_t a) {
        int32_t temp[4];
        vst1q_s32(temp, a);
        return vld1q_s32(temp);
    }

...when compiled with -O2 -march=armv7-a -mtune=cortex-a8 -mfpu=neon
-mfloat-abi=hard:

    foo(__simd128_int32_t):
            sub     sp, sp, #16
            vst1.32 {d0-d1}, [sp:64]
            vld1.32 {d0-d1}, [sp:64]
            add     sp, sp, #16
            bx      lr

The store to and load from the stack are redundant and should be eliminated,
because temp should have been promoted to NEON registers. See the Godbolt
link [1] to compare the two targets.

This issue was discovered while trying to use gcc with the Eigen library on
ARM NEON. Eigen does its intermediate processing with compiler intrinsics,
but intermediate values must be written back to POD arrays on the stack. In a
complex algorithm this results in the machine code being peppered with
redundant stores and loads.
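For comparison across targets, the same round trip through a POD stack array
can be written without target-specific intrinsics. The following is only a
minimal sketch, not code from Eigen or from the test cases above: the names
v4i32, round_trip and round_trip_preserves_lanes are hypothetical, and it
assumes the GCC/Clang vector_size extension plus memcpy in place of the
vst1q_s32/vld1q_s32 and _mm_store_si128/_mm_load_si128 pairs. No claim is
made about what a particular gcc version emits for it on ARM; the point is to
have one source file whose -O2 assembly can be inspected on both targets.

```cpp
#include <cstdint>
#include <cstring>

// GCC/Clang generic vector extension: four 32-bit lanes, 16 bytes in total.
// Hypothetical stand-in for int32x4_t / __m128i so the sketch builds anywhere.
typedef std::int32_t v4i32 __attribute__((vector_size(16)));

// Round trip through a stack array, as in the report, but expressed with
// memcpy instead of target-specific store/load intrinsics.
v4i32 round_trip(v4i32 a) {
    std::int32_t temp[4];
    std::memcpy(temp, &a, sizeof temp);   // "store" the vector to the POD array
    v4i32 out;
    std::memcpy(&out, temp, sizeof out);  // "load" it back into a vector
    return out;
}

// Self-check: every lane must come back unchanged after the round trip.
bool round_trip_preserves_lanes() {
    v4i32 in = {1, -2, 3, -4};
    v4i32 out = round_trip(in);
    return std::memcmp(&in, &out, sizeof in) == 0;
}
```

Compiling this with the same flags as the NEON test case and comparing the
output for round_trip against foo above would show whether the redundant
stack traffic is tied to the intrinsics or to the target in general.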
[1] https://godbolt.org/#z:OYLghAFBqd5TKALEBjA9gEwKYFFMCWALugE4A0BIEAZugHZEDKqAhgDbYgCMALOQCse5dq3qhUAUgBMAIRmzyAZ2ydURAg2rZ6mAMLp2AVwC29ENPI7MAGQL1sAOVMAjbKRAAOcgAd0S4k16A2MzC19/QIY7B2cTNw9vFTUNBiYiVlIiENNzS2TsdSD0zKIYp1d3L2UMrJyw/Nqy%2Bwr4qs8ASmV0I1JULgByGQBme1RjHABqSWG9AA9PADZ7IlJ7ADokGdxJAAYAQT39gH1jk25pTwJJunQIU/PL69YOo8kAdnkDyZ/JleHpMciJMiNgTD5JABWWS8KEAERmX32v0mZxMxyUJFI2AxBAungg2JW7h82KIxzYmJmege%2BIIACpthBQeCupMXoijiiyb16KiTOj2OhWJhcfjCdhiaRSdhyZSiNTaU9GcNcMywT4Oq9hkiPnCBl12CABpCBuRzANdmb0Ma9AoFJMlD0%2BthptJhtwzURjVateQANYgYbvdaeXi8ACc3Aji1jnl20cWImNvDNJhAkfWEd2kd40hjkPe7wjvHe5p91uNZqUIF25G9loN5DgsBQGHBBE4FCoEHbPk7VWAnmG5BondBpBrEBcFfILnsmQAnsbPeR2yYdEQAPL0djLgarnAmMTATizwjYooAN2wNcbVjmhSMoJXZuJRvv7AILlIS4MOFfes1nTA8mzoRgWA4Lg%2BEEYRRHENB7TkERvxrWB6FYDcQElYAdHIG8PFWIx6H9P10B8VJ6DvABaLdpGrVRCko7RdHqcxuCsXRyjiBJhD8AJKLYvjIko7jKg8DiCiKNImiEyTGOk%2BgSiyMS2gkmpSjkjSVJaHiqm4LonV6foeENY1TXLe8bQGBZFmoxZeEmYBUFQSZPHWYZJggfBiDIN0PXISYDA7Lt/IMoKkNkL0Kz9JBsBFKoIDMgZU3IdNIV2dZdgjDLC12YY82kXZIWkfgLStchrOrWt6xirpAxK9ZpEWZqIw9SFFneAFPAjZMBmGM1ysrAZosbLoWwQeAIDbLA8EILFKGocDmDYM8YKESx4IkSLlAU5iIGsITeusVTeKTfiomCQxchAXqLtE3TxJAJMpMo5TsmusJete4omlOqoXtkz7zG%2Bv7HrU57DOdEyhndMYJldalMnRBwGE2bY3gOf5pDmXggRudA7mx3H8Y5A4PiRFFsfxlkIWhWFIQRHUuV%2BK9MW4ABHDEAXVcFAo5Znvl%2BHlSD5K92EwTnuekXnNU5cn3n1ZKLKGyrbUix1ocR916NqsaujihKPCSgMMwjdZIV4BNdnebhuGy4Ybc2lM0xADLLIqqrlBqhtfWbRBprQdAQvcRbe2D/suxAZHhw4sd2AnKcZ3vecMNIfdV3XTcdz3c8wRPM97wvJiCBvO8KuwR9UGfQZV3fWcvx/P9ZsA1YCBAz0umWyC1v4DaRBPRC5AUFCXDQk3yMomi6Mmajjz6LZhjhZGr3eajWFnkwiGI7AZjhDAskr9fPE3mgfCMPfUb5OeaCFVgiHX%2Bc96QTJMAYlIghY/RgZ4Y6uPBs6EQBJBDkndESQR/oeBentX6mkf7Rl2h/GSpRIHPW0h9UI7FQYoIAQDKGxkuDSGVoNWc1l9gACUACyTkXJuQ8usbgXl6AMGwB0LyPksT%2BUsEFCOA5SBcLYXaYechRq%2BgNvFHAxtkqpXSnWVWXsax1l9gaeqGYQxdS6rsRYYYASdTtn1AaHthqiJUX1XW8iqx6zEfhdwAQtC8CAA%3D%3D%3D