https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115135

            Bug ID: 115135
           Summary: [C++] GCC produces wrong code at certain inlining
                    levels on Aarch64 with -fno-exceptions, related to
                    lambdas and variants
           Product: gcc
           Version: 14.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: clopez at igalia dot com
  Target Milestone: ---

Created attachment 58225
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58225&action=edit
Simplified test case to reproduce the issue on Aarch64. The program should
print OK at the end. Check the comments at the top on how to build it.

This issue was detected in the WebKit project. Builds from some of the
WPEWebKit CI bots for Aarch64 recently started to crash heavily.
We are currently using GCC-12, but the issue also happens with newer GCC
versions. I tested and can reproduce it with GCC-13 and GCC-14.

The original bug report is here: https://bugs.webkit.org/show_bug.cgi?id=273703
and further discussion is at: https://github.com/WebKit/WebKit/pull/28117

After quite a bit of effort I managed to create a simplified test case of only
a hundred lines. I'm attaching the test-case here.

If you build the test case and everything goes as expected you should see this:

intel-64:~# g++ -O3 -fno-exceptions test.cpp && ./a.out
[DEBUG] aPtr at 0x7ffd042c5080 points to 0x55e26078eec0 which has value 10 [At initTest()]
[DEBUG] bObj at 0x7ffd042c5078 has value 11 [At initTest()]
[DEBUG] aPtr at 0x7ffd042c5090 points to 0x55e26078eec0 which has value 10 [At doTest():aTest]
[DEBUG] bObj at 0x7ffd042c507c has value 11 [At doTest():aTest]
[DEBUG] mObj at 0x7ffd042c50cc [At doTest():aTest]
[DEBUG] mObj at 0x7ffd042c50cc has value 33 [At main()]

[OK] Everything went as expected. Program compiled correctly :)

So the program checks for itself that everything was calculated as expected,
and reports OK if it was. That is, for example, what you see if you build the
program on x86_64.

However, if you try this on Aarch64 you see something like this:

raspberrypi4-64:~# g++ -O3 -fno-exceptions test.cpp && ./a.out
[DEBUG] aPtr at 0x7fec013cb0 points to 0x55cf43cec0 which has value 10 [At initTest()]
[DEBUG] bObj at 0x7fec013ca0 has value 11 [At initTest()]
[DEBUG] aPtr at 0x7fec013cc0 points to 0x5590b2fd20 which has value -1867445184 [At doTest():aTest]
[DEBUG] bObj at 0x7fec013ca8 has value -2006561636 [At doTest():aTest]
[DEBUG] mObj at 0x7fec013cf8 [At doTest():aTest]
[DEBUG] mObj at 0x7fec013cf8 has value 420960488 [At main()]

[ERROR] Something went wrong compiling the program!:  mObj.m_data should be 33 but is 420960488

The program ends with an error because the last two parameters that the
doTest() function receives (aPtr and bObj) are corrupted by the time they are
read inside the lambda.
It looks to me as if the compiler somehow drops the initialization of those
function parameters (pass-by-value in this case) when doTest() is called, if
the parameters are not used in the main body of the function but only inside
the lambda.
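
To illustrate the pattern described above, here is a minimal sketch. It is not
the attached test case: the struct layouts, the use of std::variant with
std::visit, the Overloaded helper, the runSketch() function and the value 12
for the variant's payload are all assumptions, chosen so that the sum matches
the 33 seen in the debug output. It only shows by-value parameters that are
read exclusively inside by-reference lambdas, and it is not claimed to
reproduce the miscompilation.

```cpp
#include <memory>
#include <variant>

// Hypothetical stand-ins for the types named in the debug output.
struct aTest { int m_data; };
struct bTest { int m_data; };

using TestVariant = std::variant<std::unique_ptr<aTest>, std::unique_ptr<bTest>>;

// Helper that merges several lambdas into one visitor (an assumption; the
// attached test case may dispatch on the variant differently).
template <class... Fs> struct Overloaded : Fs... { using Fs::operator()...; };
template <class... Fs> Overloaded(Fs...) -> Overloaded<Fs...>;

// aPtr and bObj are passed by value and are only read inside the lambdas,
// which capture them by reference. The selected lambda runs to completion
// inside std::visit, before doTest() returns, so the captured references
// are still valid when they are used.
int doTest(TestVariant& v, std::unique_ptr<aTest> aPtr, bTest bObj)
{
    return std::visit(Overloaded{
        [&](std::unique_ptr<aTest>& ptr_a) -> int {
            return aPtr->m_data + ptr_a->m_data + bObj.m_data;
        },
        [&](std::unique_ptr<bTest>& ptr_b) -> int {
            return aPtr->m_data + ptr_b->m_data + bObj.m_data;
        }}, v);
}

// Builds the inputs and returns the sum 10 + 12 + 11.
int runSketch()
{
    TestVariant v = std::make_unique<aTest>(aTest{12});
    return doTest(v, std::make_unique<aTest>(aTest{10}), bTest{11});
}
```

A correctly compiled build should return 33 from runSketch(), which is the
same kind of self-check the attached program performs before printing OK.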

What is even more surprising is that if you comment out the third printf() in
each lambda (the one that prints the address of mObj), the program works
correctly.

In other words, apply this patch to the program:

--- a/test.cpp  2024-05-17 12:12:50.561903072 +0000
+++ b/test.cpp  2024-05-17 12:32:45.957454704 +0000
@@ -81,13 +81,11 @@
         [&](std::unique_ptr<aTest>& ptr_a) -> int {
             printf("[DEBUG] aPtr at %p points to %p which has value %d [At doTest():aTest]\n", &aPtr, aPtr.get(), aPtr->m_data);
             printf("[DEBUG] bObj at %p has value %d [At doTest():aTest]\n", &bObj, bObj.m_data);
-            printf("[DEBUG] mObj at %p [At doTest():aTest]\n", &mObj);
             return aPtr->m_data + ptr_a->m_data + bObj.m_data;
         },
         [&](std::unique_ptr<bTest>& ptr_b) -> int {
             printf("[DEBUG] aPtr at %p points to %p which has value %d [At doTest():bTest]\n", &aPtr, aPtr.get(), aPtr->m_data);
             printf("[DEBUG] bObj at %p has value %d [At doTest():bTest]\n", &bObj, bObj.m_data);
-            printf("[DEBUG] mObj at %p [At doTest():bTest]\n", &mObj);
             return aPtr->m_data + ptr_b->m_data + bObj.m_data;
         });
 }


And then it works. Why? It makes zero sense to me.
Note that mObj is not used in the calculation that is returned, so it
shouldn't even need to be captured by the lambda for the program to work.

The comments at the top of the test case describe the different compiler
switches that can be used to reproduce the error.

The issue is only reproducible on Aarch64.
I reproduced it with gcc-12, gcc-13 and gcc-14, both on a Debian system and on
a Yocto/OpenEmbedded based system, in both cases on Aarch64.

Note: To the best of my understanding there is no undefined behaviour and
there are no dangling references here, as the lambda should finish executing
before doTest() returns and the function parameters that the lambda captured
by reference go out of scope.
Another fun thing: if you pass -fsanitize=undefined then the program works
correctly.
