https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115135
Bug ID: 115135
Summary: [C++] GCC produces wrong code at certain inlining levels on Aarch64 with -fno-exceptions, related to lambdas and variants
Product: gcc
Version: 14.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: clopez at igalia dot com
Target Milestone: ---

Created attachment 58225
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=58225&action=edit
Simplified test case to reproduce the issue on AArch64. The program should print OK at the end. Check the comments at the top on how to build it.

This issue was detected in the WebKit project. Builds from some of the WPEWebKit CI bots for AArch64 recently started to crash heavily. We are currently using GCC-12, but the issue also happens with newer GCC versions: I tested and can reproduce it with GCC-13 and GCC-14.

The original bug report is here: https://bugs.webkit.org/show_bug.cgi?id=273703
and further discussion is at: https://github.com/WebKit/WebKit/pull/28117

After quite a bit of effort I managed to create a simplified test case of only a hundred lines. I'm attaching the test case here.

If you build the test case and everything goes as expected, you should see this:

intel-64:~# g++ -O3 -fno-exceptions test.cpp && ./a.out
[DEBUG] aPtr at 0x7ffd042c5080 points to 0x55e26078eec0 which has value 10 [At initTest()]
[DEBUG] bObj at 0x7ffd042c5078 has value 11 [At initTest()]
[DEBUG] aPtr at 0x7ffd042c5090 points to 0x55e26078eec0 which has value 10 [At doTest():aTest]
[DEBUG] bObj at 0x7ffd042c507c has value 11 [At doTest():aTest]
[DEBUG] mObj at 0x7ffd042c50cc [At doTest():aTest]
[DEBUG] mObj at 0x7ffd042c50cc has value 33 [At main()]
[OK] Everything went as expected. Program compiled correctly :)

So the program checks by itself that everything was calculated as expected. It reports OK if everything works as it should.
That is, for example, what you see if you build the program on x86_64.

However, if you try this on AArch64 you see something like this:

raspberrypi4-64:~# g++ -O3 -fno-exceptions test.cpp && ./a.out
[DEBUG] aPtr at 0x7fec013cb0 points to 0x55cf43cec0 which has value 10 [At initTest()]
[DEBUG] bObj at 0x7fec013ca0 has value 11 [At initTest()]
[DEBUG] aPtr at 0x7fec013cc0 points to 0x5590b2fd20 which has value -1867445184 [At doTest():aTest]
[DEBUG] bObj at 0x7fec013ca8 has value -2006561636 [At doTest():aTest]
[DEBUG] mObj at 0x7fec013cf8 [At doTest():aTest]
[DEBUG] mObj at 0x7fec013cf8 has value 420960488 [At main()]
[ERROR] Something went wrong compiling the program!: mObj.m_data should be 33 but is 420960488

The program ends with an error because the last two parameters that the doTest() function receives (aPtr and bObj) are corrupted when passed into the lambda. It looks to me as if the compiler somehow skips the initialization of those function parameters (passed by value in this case) when doTest() is called, if they are not used in the main body of the function but only inside the lambda.

What is even more amazing is that if you comment out the third printf() call in each lambda (the one that prints the address of mObj), then the program works correctly.
In other words, apply this patch to the program:

--- a/test.cpp 2024-05-17 12:12:50.561903072 +0000
+++ b/test.cpp 2024-05-17 12:32:45.957454704 +0000
@@ -81,13 +81,11 @@
         [&](std::unique_ptr<aTest>& ptr_a) -> int {
             printf("[DEBUG] aPtr at %p points to %p which has value %d [At doTest():aTest]\n", &aPtr, aPtr.get(), aPtr->m_data);
             printf("[DEBUG] bObj at %p has value %d [At doTest():aTest]\n", &bObj, bObj.m_data);
-            printf("[DEBUG] mObj at %p [At doTest():aTest]\n", &mObj);
             return aPtr->m_data + ptr_a->m_data + bObj.m_data;
         },
         [&](std::unique_ptr<bTest>& ptr_b) -> int {
             printf("[DEBUG] aPtr at %p points to %p which has value %d [At doTest():bTest]\n", &aPtr, aPtr.get(), aPtr->m_data);
             printf("[DEBUG] bObj at %p has value %d [At doTest():bTest]\n", &bObj, bObj.m_data);
-            printf("[DEBUG] mObj at %p [At doTest():bTest]\n", &mObj);
             return aPtr->m_data + ptr_b->m_data + bObj.m_data;
         });
 }

And then it works. Why? It makes zero sense to me. Note that mObj is not used in the calculation that is returned, so it shouldn't even need to be captured by the lambda for the program to work.

The test code has some comments at the top about the different switches to compile it with in order to reproduce the error. The issue is only reproducible on AArch64. I reproduced it with gcc-12, gcc-13 and gcc-14, on a Debian system as well as on a Yocto/OpenEmbedded based system, both on AArch64.

Note: To the best of my understanding there is no undefined behaviour or dangling reference here, because the lambda finishes executing before doTest() returns and the function parameters that were captured by reference go out of scope.

Another fun thing: if you pass -fsanitize=undefined then the program works correctly.
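
For readers without the attachment at hand, the pattern described above can be sketched roughly as follows. This is not the attached test case: the variant type, the Overloaded helper, the use of std::visit, the mTest type, the doTest() signature, the runSketch() driver and the value 12 are all assumptions made for illustration. Only the names aTest, bTest, m_data, aPtr, bObj, mObj, ptr_a, ptr_b and the lambda bodies come from the report. The sketch is not claimed to trigger the miscompilation; it only shows by-value parameters that are used exclusively inside by-reference lambdas.

```cpp
#include <cstdio>
#include <memory>
#include <variant>

// Hypothetical stand-ins for the types in the attached test case.
struct aTest { int m_data; };
struct bTest { int m_data; };
struct mTest { int m_data = 0; };

// Assumption: the lambdas in the report are dispatched over a variant of
// unique_ptrs; the real test case may use a different mechanism.
using TestVariant = std::variant<std::unique_ptr<aTest>, std::unique_ptr<bTest>>;

// Common C++17 helper that merges several lambdas into one visitor.
template <class... Fs> struct Overloaded : Fs... { using Fs::operator()...; };
template <class... Fs> Overloaded(Fs...) -> Overloaded<Fs...>;

// aPtr and bObj are passed by value and are only touched inside the
// lambdas, which capture them by reference. The lambdas run to completion
// inside std::visit, before doTest() returns, so nothing dangles.
void doTest(mTest& mObj, TestVariant& v, std::unique_ptr<aTest> aPtr, bTest bObj)
{
    mObj.m_data = std::visit(Overloaded{
        [&](std::unique_ptr<aTest>& ptr_a) -> int {
            std::printf("[DEBUG] mObj at %p [At doTest():aTest]\n", static_cast<void*>(&mObj));
            return aPtr->m_data + ptr_a->m_data + bObj.m_data;
        },
        [&](std::unique_ptr<bTest>& ptr_b) -> int {
            std::printf("[DEBUG] mObj at %p [At doTest():bTest]\n", static_cast<void*>(&mObj));
            return aPtr->m_data + ptr_b->m_data + bObj.m_data;
        }
    }, v);
}

// Hypothetical driver: 10 and 11 match the values in the logs above; 12 is
// chosen here so that the sum is the 33 the report expects.
int runSketch()
{
    mTest mObj;
    TestVariant v{std::make_unique<aTest>(aTest{12})};
    doTest(mObj, v, std::make_unique<aTest>(aTest{10}), bTest{11});
    return mObj.m_data; // 10 + 12 + 11
}
```

Calling runSketch() from main() and comparing the result against 33 mirrors the self-check that the report describes. With a correct compiler, the result does not depend on whether the printf() calls are present.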