https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93005

            Bug ID: 93005
           Summary: Redundant NEON loads/stores from stack are not
                    eliminated
           Product: gcc
           Version: 8.3.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
          Assignee: unassigned at gcc dot gnu.org
          Reporter: joel at airwebreathe dot org.uk
  Target Milestone: ---

On x86_64 with SSE, gcc is able to eliminate redundant loads and stores to the
stack, but on ARM, gcc seems unable to perform the same optimization with NEON
vector registers.

This x86_64 code is optimized as expected:


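#include <cstdint>
#include <x86intrin.h>

__m128i foo(__m128i a)
{
    // _mm_store_si128/_mm_load_si128 require a 16-byte-aligned address.
    alignas(16) int32_t temp[4];
    _mm_store_si128(reinterpret_cast<__m128i*>(temp), a);
    return _mm_load_si128(reinterpret_cast<__m128i*>(temp));
}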


...when compiled with -O2:


foo(long long __vector(2)):
        ret


However, when compiling analogous code for ARM NEON:


#include <arm_neon.h>

int32x4_t foo(int32x4_t a)
{
    int32_t temp[4];
    vst1q_s32(temp, a);
    return vld1q_s32(temp);
}


...when compiled with -O2 -march=armv7-a -mtune=cortex-a8 -mfpu=neon
-mfloat-abi=hard :


foo(__simd128_int32_t):
        sub     sp, sp, #16
        vst1.32 {d0-d1}, [sp:64]
        vld1.32 {d0-d1}, [sp:64]
        add     sp, sp, #16
        bx      lr


The store to and load from the stack are redundant and should be eliminated,
because temp should have been promoted to NEON registers, just as it is promoted
to SSE registers on x86_64.

(see the attached godbolt link [1] to compare)
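For a target-independent comparison, the same spill-and-reload can be sketched with GCC's generic vector extension (`__attribute__((vector_size(16)))`) instead of target intrinsics, so one source file builds for both x86_64 and ARM. The type name `v4si` and the functions `roundtrip` and `check_roundtrip` are hypothetical, introduced only for illustration. Whether GCC 8.3.1 eliminates the round trip for this form on ARM is not established here.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for int32x4_t / __m128i: four 32-bit lanes in one
// 16-byte vector, using GCC's generic vector extension.
typedef int32_t v4si __attribute__((vector_size(16)));

// Spill the vector to a POD array on the stack and reload it, mirroring
// the vst1q_s32/vld1q_s32 pair above.
v4si roundtrip(v4si a)
{
    int32_t temp[4];
    std::memcpy(temp, &a, sizeof temp);  // store to the stack array
    v4si r;
    std::memcpy(&r, temp, sizeof r);     // reload from the stack array
    return r;
}

// Sanity check: the round trip must return every lane unchanged.
void check_roundtrip(v4si in)
{
    v4si out = roundtrip(in);
    for (int i = 0; i < 4; ++i)
        assert(out[i] == in[i]);
}
```

Compiling `roundtrip` with each of the two command lines in this report and comparing the output could help show whether the missed optimization is specific to the NEON intrinsics or affects vector spills on ARM more generally.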

This issue was discovered while trying to use gcc with the Eigen library on ARM
NEON. Eigen does intermediate processing using compiler intrinsics, but
intermediate values must be written back to POD arrays on the stack. In a
complex algorithm this results in the machine code being peppered with
redundant stores and loads.
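The pattern described above can be sketched as a hypothetical helper (not Eigen's actual code) that writes each intermediate vector back to a POD array before the next step. Each `memcpy` pair is a store/reload of the kind this report says GCC leaves in the generated NEON code. The vector type `v4si` and all function names are assumptions for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical 4 x int32 vector type via GCC's generic vector extension.
typedef int32_t v4si __attribute__((vector_size(16)));

// Hypothetical two-step computation: each intermediate result is written
// back to a POD array on the stack and reloaded before the next step.
v4si scale_then_offset(v4si a, int32_t k, int32_t off)
{
    int32_t tmp[4];                      // POD array on the stack
    v4si v = a * k;                      // step 1: lane-wise multiply by scalar
    std::memcpy(tmp, &v, sizeof tmp);    // spill intermediate to tmp
    std::memcpy(&v, tmp, sizeof v);      // reload it for the next step
    v = v + off;                         // step 2: lane-wise add of scalar
    std::memcpy(tmp, &v, sizeof tmp);    // spill again
    std::memcpy(&v, tmp, sizeof v);      // reload again
    return v;
}

// Usage check: {1, 2, 3, 4} * 3 - 1 == {2, 5, 8, 11}.
void check_scale_then_offset()
{
    v4si in = {1, 2, 3, 4};
    v4si out = scale_then_offset(in, 3, -1);
    const int32_t expected[4] = {2, 5, 8, 11};
    for (int i = 0; i < 4; ++i)
        assert(out[i] == expected[i]);
}
```

In a longer algorithm, every such step adds another store/reload pair unless the compiler keeps the intermediate in a vector register.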


[1]
https://godbolt.org/#z:OYLghAFBqd5TKALEBjA9gEwKYFFMCWALugE4A0BIEAZugHZEDKqAhgDbYgCMALOQCse5dq3qhUAUgBMAIRmzyAZ2ydURAg2rZ6mAMLp2AVwC29ENPI7MAGQL1sAOVMAjbKRAAOcgAd0S4k16A2MzC19/QIY7B2cTNw9vFTUNBiYiVlIiENNzS2TsdSD0zKIYp1d3L2UMrJyw/Nqy%2Bwr4qs8ASmV0I1JULgByGQBme1RjHABqSWG9AA9PADZ7IlJ7ADokGdxJAAYAQT39gH1jk25pTwJJunQIU/PL69YOo8kAdnkDyZ/JleHpMciJMiNgTD5JABWWS8KEAERmX32v0mZxMxyUJFI2AxBAungg2JW7h82KIxzYmJmege%2BIIACpthBQeCupMXoijiiyb16KiTOj2OhWJhcfjCdhiaRSdhyZSiNTaU9GcNcMywT4Oq9hkiPnCBl12CABpCBuRzANdmb0Ma9AoFJMlD0%2BthptJhtwzURjVateQANYgYbvdaeXi8ACc3Aji1jnl20cWImNvDNJhAkfWEd2kd40hjkPe7wjvHe5p91uNZqUIF25G9loN5DgsBQGHBBE4FCoEHbPk7VWAnmG5BondBpBrEBcFfILnsmQAnsbPeR2yYdEQAPL0djLgarnAmMTATizwjYooAN2wNcbVjmhSMoJXZuJRvv7AILlIS4MOFfes1nTA8mzoRgWA4Lg%2BEEYRRHENB7TkERvxrWB6FYDcQElYAdHIG8PFWIx6H9P10B8VJ6DvABaLdpGrVRCko7RdHqcxuCsXRyjiBJhD8AJKLYvjIko7jKg8DiCiKNImiEyTGOk%2BgSiyMS2gkmpSjkjSVJaHiqm4LonV6foeENY1TXLe8bQGBZFmoxZeEmYBUFQSZPHWYZJggfBiDIN0PXISYDA7Lt/IMoKkNkL0Kz9JBsBFKoIDMgZU3IdNIV2dZdgjDLC12YY82kXZIWkfgLStchrOrWt6xirpAxK9ZpEWZqIw9SFFneAFPAjZMBmGM1ysrAZosbLoWwQeAIDbLA8EILFKGocDmDYM8YKESx4IkSLlAU5iIGsITeusVTeKTfiomCQxchAXqLtE3TxJAJMpMo5TsmusJete4omlOqoXtkz7zG%2Bv7HrU57DOdEyhndMYJldalMnRBwGE2bY3gOf5pDmXggRudA7mx3H8Y5A4PiRFFsfxlkIWhWFIQRHUuV%2BK9MW4ABHDEAXVcFAo5Znvl%2BHlSD5K92EwTnuekXnNU5cn3n1ZKLKGyrbUix1ocR916NqsaujihKPCSgMMwjdZIV4BNdnebhuGy4Ybc2lM0xADLLIqqrlBqhtfWbRBprQdAQvcRbe2D/suxAZHhw4sd2AnKcZ3vecMNIfdV3XTcdz3c8wRPM97wvJiCBvO8KuwR9UGfQZV3fWcvx/P9ZsA1YCBAz0umWyC1v4DaRBPRC5AUFCXDQk3yMomi6Mmajjz6LZhjhZGr3eajWFnkwiGI7AZjhDAskr9fPE3mgfCMPfUb5OeaCFVgiHX%2Bc96QTJMAYlIghY/RgZ4Y6uPBs6EQBJBDkndESQR/oeBentX6mkf7Rl2h/GSpRIHPW0h9UI7FQYoIAQDKGxkuDSGVoNWc1l9gACUACyTkXJuQ8usbgXl6AMGwB0LyPksT%2BUsEFCOA5SBcLYXaYechRq%2BgNvFHAxtkqpXSnWVWXsax1l9gaeqGYQxdS6rsRYYYASdTtn1AaHthqiJUX1XW8iqx6zEfhdwAQtC8CAA%3D%3D%3D
