[Bug c++/62056] Long compile times with large tuples
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62056 Manuel López-Ibáñez manu at gcc dot gnu.org changed: What|Removed |Added CC||jwakely.gcc at gmail dot com --- Comment #11 from Manuel López-Ibáñez manu at gcc dot gnu.org --- Jonathan, what should we do about this? Is this implementation better than the one in libstdc++? If so, what steps are needed to get this integrated?
[Bug c++/62056] Long compile times with large tuples
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62056 --- Comment #5 from Piotr Dziwinski piotrdz at gmail dot com --- (In reply to Jonathan Wakely from comment #4) tr1::tuple doesn't support perfect-forwarding or move semantics tr1::tuple doesn't support uses-allocator construction tr1::tuple doesn't support 'final' classes tr1::tuple doesn't have correct exception specifications tr1::tuple doesn't prevent implicit conversions that would use explicit constructors tr1::tuple doesn't support tuple concatenation If you can add all those features to the tr1/tuple implementation so that it meets the C++11 requirements and it still compiles faster then I'd be interested in your analysis of the remaining differences. Otherwise I'm going to assume the difference is because the tuple header contains more than twice as many lines of code and several additional features. Ok, I understand it now. I was speaking from only somewhat experienced user perspective and I did not realise the deeper implications of standard compliance. Just for the record, I did some testing and found two important factors here are: - dependency to array (to enable it in tuple concatenation) - (change in compile time 0.357s - 0.231s), - allocator constructors - (change 0.231s - 0.185s). So in the end please ignore my interruption with `std::tr1::tuple`. It seems the recursive version of `std::tuple` is not going to be optimized easily and the new flat implementation is the way to go here.
[Bug c++/62056] Long compile times with large tuples
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62056 --- Comment #6 from Agustín Bergé kaballo86 at hotmail dot com --- (In reply to Piotr Dziwinski from comment #5) It seems the recursive version of `std::tuple` is not going to be optimized easily and the new flat implementation is the way to go here. Well, not necessarily, It's not `std::tuple` which is at fault, but the compiler (gcc and every other). Note that this issue was moved from libstdc++ to c++, so that's promising!
[Bug c++/62056] Long compile times with large tuples
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62056 Manuel López-Ibáñez manu at gcc dot gnu.org changed: What|Removed |Added Status|UNCONFIRMED |WAITING Last reconfirmed||2014-09-30 CC||manu at gcc dot gnu.org Ever confirmed|0 |1 --- Comment #7 from Manuel López-Ibáñez manu at gcc dot gnu.org --- (In reply to Agustín Bergé from comment #6) Well, not necessarily, It's not `std::tuple` which is at fault, but the compiler (gcc and every other). Note that this issue was moved from libstdc++ to c++, so that's promising! I don't see any analysis of what the bug actually is. If there is a bug in g++, you should be able to trigger it with a testcase not including tuple. On the other hand, if you want to provide an alternative implementation of tuple, you should join libstc++: https://gcc.gnu.org/onlinedocs/libstdc++/manual/appendix_contributing.html
[Bug c++/62056] Long compile times with large tuples
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62056 --- Comment #8 from Agustín Bergé kaballo86 at hotmail dot com --- (In reply to Manuel López-Ibáñez from comment #7) (In reply to Agustín Bergé from comment #6) Well, not necessarily, It's not `std::tuple` which is at fault, but the compiler (gcc and every other). Note that this issue was moved from libstdc++ to c++, so that's promising! I don't see any analysis of what the bug actually is. If there is a bug in g++, you should be able to trigger it with a testcase not including tuple. On the other hand, if you want to provide an alternative implementation of tuple, you should join libstc++: https://gcc.gnu.org/onlinedocs/libstdc++/manual/appendix_contributing.html This issue started targeting libstdc++, and later someone reassigned it to c++. The recursive tuple implementation in libstdc++ causes our use of large tuples to more than double compilation time when compared to a flat implementation. Hence the inclusion of tuple. There is no bug, but a QoI issue that makes `std::tuple` use prohibitive for large tuples. Whether the QoI issue belongs with libc++ or g++, I am not entitled to say. The underlying issue is a g++ issue, but there are known compile-time efficient implementations of `std::tuple`. Of course, a solution in the compiler would benefit a wider audience so it would be preferable. And again, no solution would be fine as well given that this is QoI. As for a bug analysis (which is a requirement that I was unaware of), I haven't put this particular combination of implementation and compiler under a profiler. I have done however enough experimentation on similar code bases and the results are consistent, so while I might be off in the specific details it should still give you a strong hint of where to look. The issue boils down to name mangling and lookup: the longer the name the more memory consumption and cpu usage. For a recursive variadic implementation, instantiating a tuple of N elements with aggregated mangled length M results in the mangling of N bases with average mangled length M/2. For a flat variadic implementation, it results in the mangling of N bases of average mangled length M/N instead. As the tuple gets large and the names get long (and mangled names get very long very fast), the recursive implementation turns into a compilation time (and memory!) hog way faster than a flat implementation does. Please let me know if I can help you in any other way.
[Bug c++/62056] Long compile times with large tuples
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62056 --- Comment #9 from Manuel López-Ibáñez manu at gcc dot gnu.org --- (In reply to Agustín Bergé from comment #8) Please let me know if I can help you in any other way. Thanks for the detailed explanation. As for the g++ issue, then a minimal testcase that does not include the whole of tuple might be useful to see what could be speed up: See https://gcc.gnu.org/bugs/minimize.html The testcase itself might generate code programmatically using macros, for example, but less code without any headers is easier to analyze. I cannot say if the libstdc++ implementation could be better, but the way to prove it is simply to submit a new implementation and convince the maintainers that it is better with numbers. I'm sure they will be delighted to receive such contribution. But a draft is not enough. The new implementation needs to pass all testcases in the testsuite.
[Bug c++/62056] Long compile times with large tuples
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62056 --- Comment #10 from Agustín Bergé kaballo86 at hotmail dot com --- (In reply to Manuel López-Ibáñez from comment #9) I cannot say if the libstdc++ implementation could be better From a compile-time performance point of view it is, check the implementation attached to this issue. but the way to prove it is simply to submit a new implementation and convince the maintainers that it is better with numbers. I'm sure they will be delighted to receive such contribution. But a draft is not enough. The new implementation needs to pass all testcases in the testsuite. Such implementation and numbers were initially attached to this issue. The draft implementation is feature complete (modulo bugs) and is implemented with the minimal possible number of changes from the current implementation. The entire functionality is retained, the layout is switch to left-to-right (instead of reversed), I have not analyzed codegen. I was directly instructed by libstdc++ developers to submit this issue, I was told they were not aware of the poor compile-time performance of tuple. I submitted the issue as instructed. I decided to add the proof-of-concept implementation so that people could understand the implications by simply replacing a header. I added the analysis of the underlying issue upon your request. I am not interested in pursuing this further without as much as a hint that the compile-time performance issue would even be considered. Remember, this is QoI, so completely ignoring this issue is a reasonable resolution. After all, this is a known issue with known solutions (or I should say workarounds, as the underlying issue is not addressed), and there are other concerns to keep in mind besides compile-time performance.
[Bug c++/62056] Long compile times with large tuples
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62056 --- Comment #3 from Piotr Dziwinski piotrdz at gmail dot com --- (In reply to Jonathan Wakely from comment #2) (In reply to Piotr Dziwinski from comment #1) I would also second the proposal to fix this issue by implementing flat version of std::tuple. Perhaps the existing std::tr1::tuple implementation can be re-used here? GCC's std::tr1:tuple is not a flat implementation, and does not conform to the C++11 requirements, so no, that's not an option. Oh, you're right - somehow I got convinced that std::tr1::tuple was a flat implementation since it compiles so much faster. But this raises a question - why do the two recursive implementations have such different compile times? Perhaps by analyzing the differences between std::tr1::tuple and std::tuple, we can learn something to our advantage?
[Bug c++/62056] Long compile times with large tuples
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62056 --- Comment #4 from Jonathan Wakely redi at gcc dot gnu.org --- tr1::tuple doesn't support perfect-forwarding or move semantics tr1::tuple doesn't support uses-allocator construction tr1::tuple doesn't support 'final' classes tr1::tuple doesn't have correct exception specifications tr1::tuple doesn't prevent implicit conversions that would use explicit constructors tr1::tuple doesn't support tuple concatenation If you can add all those features to the tr1/tuple implementation so that it meets the C++11 requirements and it still compiles faster then I'd be interested in your analysis of the remaining differences. Otherwise I'm going to assume the difference is because the tuple header contains more than twice as many lines of code and several additional features.