[Bug libstdc++/88971] Branch optimization inconsistency (missed optimization)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88971 Richard Biener changed: What|Removed |Added Keywords||missed-optimization CC||rguenth at gcc dot gnu.org Component|c++ |libstdc++ --- Comment #1 from Richard Biener --- This is because it still needs to generate the std::string objects at the caller site (outside of the if (print)). This involves quite some code to get rid of, and even at -O3 we do not inline basic_string::basic_string it seems (ISTR that is out-of-line in the library): __asm__ __volatile__("mfence" : : : "memory"); _6 = MEM[(const int *)&data + 4B]; if (_6 > 0) goto ; [41.48%] else goto ; [58.52%] [local count: 445388109]: std::basic_string::basic_string (&D.39204, "<", &D.39205); _7 = MEM[(char * *)&D.39204]; _8 = _7 + 18446744073709551592; if (_8 != &_S_empty_rep_storage) goto ; [10.00%] else goto ; [90.00%] [local count: 434030711]: goto ; [100.00%] [local count: 44538811]: if (__gthrw___pthread_key_create != 0B) goto ; [53.47%] else goto ; [46.53%] [local count: 23814902]: _9 = &MEM[(struct _Rep *)_7 + -24B].D.23940._M_refcount; _10 = __atomic_fetch_add_4 (_9, 4294967295, 4); _11 = (int) _10; goto ; [100.00%] [local count: 20723909]: __result_12 = MEM[(_Atomic_word *)_7 + -8B]; _13 = __result_12 + -1; MEM[(_Atomic_word *)_7 + -8B] = _13; [local count: 44538811]: # _14 = PHI <_11(6), __result_12(7)> if (_14 <= 0) goto ; [25.50%] else goto ; [74.50%] [local count: 11357397]: std::basic_string::_Rep::_M_destroy (_8, &D.39206); [local count: 445388108]: D.39206 ={v} {CLOBBER}; D.39204 ={v} {CLOBBER}; D.39205 ={v} {CLOBBER}; [local count: 1073741825]: __asm__ __volatile__("mfence" : : : "memory"); data ={v} {CLOBBER};
[Bug libstdc++/88971] Branch optimization inconsistency (missed optimization)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88971 --- Comment #2 from Jonathan Wakely --- (In reply to Richard Biener from comment #1) > rid of, and even at -O3 we do not inline basic_string::basic_string it seems > (ISTR that is out-of-line in the library): There's an explicit instantiation in the library, but the definition is inline in the headers. If the compiler wanted to inline it, all the code is visible and nothing forces it to use the explicit instantiation in the library.
[Bug libstdc++/88971] Branch optimization inconsistency (missed optimization)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88971 --- Comment #3 from Jonathan Wakely --- (In reply to Richard Biener from comment #1) > This is because it still needs to generate the std::string objects at the > caller > site (outside of the if (print)). This involves quite some code to get > rid of, and even at -O3 we do not inline basic_string::basic_string it seems > (ISTR that is out-of-line in the library): > > __asm__ __volatile__("mfence" : : : "memory"); > _6 = MEM[(const int *)&data + 4B]; > if (_6 > 0) > goto ; [41.48%] > else > goto ; [58.52%] > >[local count: 445388109]: > std::basic_string::basic_string (&D.39204, "<", &D.39205); > _7 = MEM[(char * *)&D.39204]; > _8 = _7 + 18446744073709551592; > if (_8 != &_S_empty_rep_storage) > goto ; [10.00%] > else > goto ; [90.00%] Looks like you're using -D_GLIBCXX_USE_CXX11_ABI=0 but the OP is not.
[Bug libstdc++/88971] Branch optimization inconsistency (missed optimization)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88971 Richard Biener changed: What|Removed |Added CC||hubicka at gcc dot gnu.org --- Comment #4 from Richard Biener --- (In reply to Jonathan Wakely from comment #3) > (In reply to Richard Biener from comment #1) > > This is because it still needs to generate the std::string objects at the > > caller > > site (outside of the if (print)). This involves quite some code to get > > rid of, and even at -O3 we do not inline basic_string::basic_string it seems > > (ISTR that is out-of-line in the library): > > > > __asm__ __volatile__("mfence" : : : "memory"); > > _6 = MEM[(const int *)&data + 4B]; > > if (_6 > 0) > > goto ; [41.48%] > > else > > goto ; [58.52%] > > > >[local count: 445388109]: > > std::basic_string::basic_string (&D.39204, "<", &D.39205); > > _7 = MEM[(char * *)&D.39204]; > > _8 = _7 + 18446744073709551592; > > if (_8 != &_S_empty_rep_storage) > > goto ; [10.00%] > > else > > goto ; [90.00%] > > Looks like you're using -D_GLIBCXX_USE_CXX11_ABI=0 but the OP is not. Indeed. It's still missed inlining that makes elding of the argument construction difficult. Looks like std::__cxx11::basic_string::_M_construct is not marked inline (so needs -finline-functions to get IPA inlining processing). Indeed I see // For forward_iterators up to random_access_iterators, used for // string::iterator, _CharT*, etc. template void _M_construct(_FwdIterator __beg, _FwdIterator __end, std::forward_iterator_tag); and others.
[Bug libstdc++/88971] Branch optimization inconsistency (missed optimization)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88971 --- Comment #5 from Marat Stanichenko --- (In reply to Richard Biener from comment #1) > This is because it still needs to generate the std::string objects at the > caller Thank you very much for the comment! That is probably why I had to add another string parameter to the function `PrintBad` in the example provided to trigger this behaviour. The fact is that in the production environment where I managed to spot this, the compiler could not optimize the function that is very similar to `PrintGood` and has just a single string parameter. I didn't manage to reproduce it when simplifying things while preparing a test-case here. Do you believe that compiler can do better in such situations? Or the current behaviour is perfectly valid and no improvements are really needed?
[Bug libstdc++/88971] Branch optimization inconsistency (missed optimization)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88971 --- Comment #6 from rguenther at suse dot de --- On Tue, 22 Jan 2019, maratrus at mail dot ru wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88971 > > --- Comment #5 from Marat Stanichenko --- > > (In reply to Richard Biener from comment #1) > > This is because it still needs to generate the std::string objects at the > > caller > > Thank you very much for the comment! That is probably why I had to add another > string parameter to the function `PrintBad` in the example provided to trigger > this behaviour. The fact is that in the production environment where I managed > to spot this, the compiler could not optimize the function that is very > similar > to `PrintGood` and has just a single string parameter. I didn't manage to > reproduce it when simplifying things while preparing a test-case here. > > Do you believe that compiler can do better in such situations? Or the current > behaviour is perfectly valid and no improvements are really needed? The compiler can of course do better when estimating the benefit of inlining. It's just not entirely clear if it is reasonably easy to do so ... [let aside the -finline-function issue]
[Bug libstdc++/88971] Branch optimization inconsistency (missed optimization)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88971 --- Comment #7 from Marat Stanichenko --- (In reply to rguent...@suse.de from comment #6) > > Do you believe that compiler can do better in such situations? Or the > > current > > behaviour is perfectly valid and no improvements are really needed? > > The compiler can of course do better when estimating the benefit of > inlining. It's just not entirely clear if it is reasonably easy to > do so ... [let aside the -finline-function issue] Is it only about inlining? Unfortunately, I cannot evaluate the technical difficulties but in patterns like presented here i.e. ``` if (condition) Function(param); ``` irrespective of the fact whether `Function()` is asked to be inlined or not there are at least two observations that the compiler may notice before taking an optimization decision: a) Whether `Function(param)` is empty or not. In both scenarios in the example presented `PrintBad("<", ">", t)` and `PrintGood("<", t)` are actually empty. b) Whether `param` is used in a `Function`. I think I can see the examples when function calls like `PrintBad(">", t)` generates a code to construct parameter ">" despite the fact that not only it is not used but also the function body is empty. Of course, ideally I expect the parameter not to be constructed if it is not used and the whole branch to be eliminated of the `Function` is empty and `condition` does not have side effects. But as I said, I do not have enough technical expertise to evaluate the cost. I speak from a solely user's perspective. What do you think is the best way to solve this? Keep tracking examples until some critical mass is collected?
[Bug libstdc++/88971] Branch optimization inconsistency (missed optimization)
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88971 --- Comment #8 from rguenther at suse dot de --- On Tue, 22 Jan 2019, maratrus at mail dot ru wrote: > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88971 > > --- Comment #7 from Marat Stanichenko --- > (In reply to rguent...@suse.de from comment #6) > > > Do you believe that compiler can do better in such situations? Or the > > > current > > > behaviour is perfectly valid and no improvements are really needed? > > > > The compiler can of course do better when estimating the benefit of > > inlining. It's just not entirely clear if it is reasonably easy to > > do so ... [let aside the -finline-function issue] > > Is it only about inlining? > > Unfortunately, I cannot evaluate the technical difficulties but in patterns > like presented here i.e. > > ``` > if (condition) > Function(param); > ``` > > irrespective of the fact whether `Function()` is asked to be inlined or not > there are at least two observations that the compiler may notice before taking > an optimization decision: > > a) Whether `Function(param)` is empty or not. In both scenarios in the example > presented `PrintBad("<", ">", t)` and `PrintGood("<", t)` are > actually empty. > > b) Whether `param` is used in a `Function`. I think I can see the examples > when > function calls like `PrintBad(">", t)` generates a code to construct > parameter ">" despite the fact that not only it is not used but also the > function body is empty. > > Of course, ideally I expect the parameter not to be constructed if it is not > used and the whole branch to be eliminated of the `Function` is empty and > `condition` does not have side effects. But as I said, I do not have enough > technical expertise to evaluate the cost. I speak from a solely user's > perspective. > > What do you think is the best way to solve this? Keep tracking examples until > some critical mass is collected? Well, it's more until any bright idea pops up how to solve this abstraction issue...