[Bug c++/62056] Long compile times with large tuples

2014-10-01 Thread manu at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62056

Manuel López-Ibáñez manu at gcc dot gnu.org changed:

   What|Removed |Added

 CC||jwakely.gcc at gmail dot com

--- Comment #11 from Manuel López-Ibáñez manu at gcc dot gnu.org ---
Jonathan, what should we do about this? Is this implementation better than the
one in libstdc++?

If so, what steps are needed to get this integrated?

[Bug c++/62056] Long compile times with large tuples

2014-09-30 Thread piotrdz at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62056

--- Comment #5 from Piotr Dziwinski piotrdz at gmail dot com ---
(In reply to Jonathan Wakely from comment #4)
 tr1::tuple doesn't support perfect-forwarding or move semantics
 
 tr1::tuple doesn't support uses-allocator construction
 
 tr1::tuple doesn't support 'final' classes
 
 tr1::tuple doesn't have correct exception specifications
 
 tr1::tuple doesn't prevent implicit conversions that would use explicit
 constructors
 
 tr1::tuple doesn't support tuple concatenation
 
 If you can add all those features to the tr1/tuple implementation so that
 it meets the C++11 requirements and it still compiles faster then I'd be
 interested in your analysis of the remaining differences. Otherwise I'm
 going to assume the difference is because the tuple header contains more
 than twice as many lines of code and several additional features.

Ok, I understand it now. I was speaking from only somewhat experienced user
perspective and I did not realise the deeper implications of standard
compliance.

Just for the record, I did some testing and found two important factors here
are:
 - dependency to array (to enable it in tuple concatenation) - (change in
compile time 0.357s - 0.231s),
 - allocator constructors - (change 0.231s - 0.185s).

So in the end please ignore my interruption with `std::tr1::tuple`. It seems
the recursive version of `std::tuple` is not going to be optimized easily and
the new flat implementation is the way to go here.


[Bug c++/62056] Long compile times with large tuples

2014-09-30 Thread kaballo86 at hotmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62056

--- Comment #6 from Agustín Bergé kaballo86 at hotmail dot com ---
(In reply to Piotr Dziwinski from comment #5)
 It seems
 the recursive version of `std::tuple` is not going to be optimized easily
 and the new flat implementation is the way to go here.

Well, not necessarily, It's not `std::tuple` which is at fault, but the
compiler (gcc and every other). Note that this issue was moved from libstdc++
to c++, so that's promising!

[Bug c++/62056] Long compile times with large tuples

2014-09-30 Thread manu at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62056

Manuel López-Ibáñez manu at gcc dot gnu.org changed:

   What|Removed |Added

 Status|UNCONFIRMED |WAITING
   Last reconfirmed||2014-09-30
 CC||manu at gcc dot gnu.org
 Ever confirmed|0   |1

--- Comment #7 from Manuel López-Ibáñez manu at gcc dot gnu.org ---
(In reply to Agustín Bergé from comment #6)
 Well, not necessarily, It's not `std::tuple` which is at fault, but the
 compiler (gcc and every other). Note that this issue was moved from
 libstdc++ to c++, so that's promising!

I don't see any analysis of what the bug actually is. If there is a bug in
g++, you should be able to trigger it with a testcase not including tuple.

On the other hand, if you want to provide an alternative implementation of
tuple, you should join libstc++:
https://gcc.gnu.org/onlinedocs/libstdc++/manual/appendix_contributing.html

[Bug c++/62056] Long compile times with large tuples

2014-09-30 Thread kaballo86 at hotmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62056

--- Comment #8 from Agustín Bergé kaballo86 at hotmail dot com ---
(In reply to Manuel López-Ibáñez from comment #7)
 (In reply to Agustín Bergé from comment #6)
  Well, not necessarily, It's not `std::tuple` which is at fault, but the
  compiler (gcc and every other). Note that this issue was moved from
  libstdc++ to c++, so that's promising!
 
 I don't see any analysis of what the bug actually is. If there is a bug in
 g++, you should be able to trigger it with a testcase not including tuple.
 
 On the other hand, if you want to provide an alternative implementation of
 tuple, you should join libstc++:
 https://gcc.gnu.org/onlinedocs/libstdc++/manual/appendix_contributing.html

This issue started targeting libstdc++, and later someone reassigned it to c++.
The recursive tuple implementation in libstdc++ causes our use of large tuples
to more than double compilation time when compared to a flat implementation.
Hence the inclusion of tuple.

There is no bug, but a QoI issue that makes `std::tuple` use prohibitive for
large tuples. Whether the QoI issue belongs with libc++ or g++, I am not
entitled to say. The underlying issue is a g++ issue, but there are known
compile-time efficient implementations of `std::tuple`. Of course, a solution
in the compiler would benefit a wider audience so it would be preferable. And
again, no solution would be fine as well given that this is QoI.

As for a bug analysis (which is a requirement that I was unaware of), I
haven't put this particular combination of implementation and compiler under a
profiler. I have done however enough experimentation on similar code bases and
the results are consistent, so while I might be off in the specific details it
should still give you a strong hint of where to look. The issue boils down to
name mangling and lookup: the longer the name the more memory consumption and
cpu usage. For a recursive variadic implementation, instantiating a tuple of N
elements with aggregated mangled length M results in the mangling of N bases
with average mangled length M/2. For a flat variadic implementation, it results
in the mangling of N bases of average mangled length M/N instead. As the tuple
gets large and the names get long (and mangled names get very long very fast),
the recursive implementation turns into a compilation time (and memory!) hog
way faster than a flat implementation does.

Please let me know if I can help you in any other way.

[Bug c++/62056] Long compile times with large tuples

2014-09-30 Thread manu at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62056

--- Comment #9 from Manuel López-Ibáñez manu at gcc dot gnu.org ---
(In reply to Agustín Bergé from comment #8)
 Please let me know if I can help you in any other way.

Thanks for the detailed explanation.


As for the g++ issue, then a minimal testcase that does not include the whole
of tuple might be useful to see what could be speed up: See
https://gcc.gnu.org/bugs/minimize.html The testcase itself might generate code
programmatically using macros, for example, but less code without any headers
is easier to analyze.

I cannot say if the libstdc++ implementation could be better, but the way to
prove it is simply to submit a new implementation and convince the maintainers
that it is better with numbers. I'm sure they will be delighted to receive such
contribution. But a draft is not enough. The new implementation needs to pass
all testcases in the testsuite.

[Bug c++/62056] Long compile times with large tuples

2014-09-30 Thread kaballo86 at hotmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62056

--- Comment #10 from Agustín Bergé kaballo86 at hotmail dot com ---
(In reply to Manuel López-Ibáñez from comment #9)
 I cannot say if the libstdc++ implementation could be better

From a compile-time performance point of view it is, check the implementation
attached to this issue.

 but the way to
 prove it is simply to submit a new implementation and convince the
 maintainers that it is better with numbers. I'm sure they will be delighted
 to receive such contribution. But a draft is not enough. The new
 implementation needs to pass all testcases in the testsuite.

Such implementation and numbers were initially attached to this issue. The
draft implementation is feature complete (modulo bugs) and is implemented with
the minimal possible number of changes from the current implementation. The
entire functionality is retained, the layout is switch to left-to-right
(instead of reversed), I have not analyzed codegen.

I was directly instructed by libstdc++ developers to submit this issue, I was
told they were not aware of the poor compile-time performance of tuple. I
submitted the issue as instructed. I decided to add the proof-of-concept
implementation so that people could understand the implications by simply
replacing a header. I added the analysis of the underlying issue upon your
request.

I am not interested in pursuing this further without as much as a hint that the
compile-time performance issue would even be considered. Remember, this is QoI,
so completely ignoring this issue is a reasonable resolution. After all, this
is a known issue with known solutions (or I should say workarounds, as the
underlying issue is not addressed), and there are other concerns to keep in
mind besides compile-time performance.

[Bug c++/62056] Long compile times with large tuples

2014-09-29 Thread piotrdz at gmail dot com
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62056

--- Comment #3 from Piotr Dziwinski piotrdz at gmail dot com ---
(In reply to Jonathan Wakely from comment #2)
 (In reply to Piotr Dziwinski from comment #1)
  I would also second the proposal to fix this issue by implementing flat
  version of std::tuple. Perhaps the existing std::tr1::tuple implementation
  can be re-used here?
 
 GCC's std::tr1:tuple is not a flat implementation, and does not conform to
 the C++11 requirements, so no, that's not an option.

Oh, you're right - somehow I got convinced that std::tr1::tuple was a flat
implementation since it compiles so much faster. But this raises a question -
why do the two recursive implementations have such different compile times?
Perhaps by analyzing the differences between std::tr1::tuple and std::tuple, we
can learn something to our advantage?


[Bug c++/62056] Long compile times with large tuples

2014-09-29 Thread redi at gcc dot gnu.org
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62056

--- Comment #4 from Jonathan Wakely redi at gcc dot gnu.org ---
tr1::tuple doesn't support perfect-forwarding or move semantics

tr1::tuple doesn't support uses-allocator construction

tr1::tuple doesn't support 'final' classes

tr1::tuple doesn't have correct exception specifications

tr1::tuple doesn't prevent implicit conversions that would use explicit
constructors

tr1::tuple doesn't support tuple concatenation

If you can add all those features to the tr1/tuple implementation so that it
meets the C++11 requirements and it still compiles faster then I'd be
interested in your analysis of the remaining differences. Otherwise I'm going
to assume the difference is because the tuple header contains more than twice
as many lines of code and several additional features.