On Tue, Jul 2, 2013 at 7:56 AM, Howard Hinnant <[email protected]> wrote: > On Jul 1, 2013, at 10:13 PM, Richard Smith <[email protected]> wrote: > >> On Mon, Jul 1, 2013 at 4:57 PM, Howard Hinnant <[email protected]> wrote: >>> On Jul 1, 2013, at 6:17 PM, Howard Hinnant <[email protected]> wrote: >>> >>>> Inspired by your approach, I started from scratch and came up with the >>>> enclosed. The compiler will run out of memory before it runs into the >>>> recursive template instantiation limit, and this works with far higher >>>> numbers. It passes all the current intseq tests. >>> >>> I did some timing tests for the compiler using: >>> >>> #include <utility> >>> #include <iostream> >>> >>> int >>> main() >>> { >>> typedef std::make_index_sequence<10000> T; >>> std::cout << typeid(T).name() << "\n"; >>> } >>> >>> and: >>> >>> #include <utility> >>> #include <iostream> >>> >>> int >>> main() >>> { >>> typedef std::make_index_sequence<100000> T; >>> std::cout << typeid(T).name() << "\n"; >>> } >>> >>> These are extremely long sequences! I discovered that if I add to the >>> patch enclosed in my previous email: >>> >>> template <class _Tp, _Tp _Sp, _Tp _Ep> >>> struct __make_integer_sequence_unchecked<_Tp, _Sp, _Ep, 4> >>> { >>> typedef integer_sequence<_Tp, _Sp, _Sp+1, _Sp+2, _Sp+3> type; >>> }; >>> >>> template <class _Tp, _Tp _Sp, _Tp _Ep> >>> struct __make_integer_sequence_unchecked<_Tp, _Sp, _Ep, 3> >>> { >>> typedef integer_sequence<_Tp, _Sp, _Sp+1, _Sp+2> type; >>> }; >>> >>> template <class _Tp, _Tp _Sp, _Tp _Ep> >>> struct __make_integer_sequence_unchecked<_Tp, _Sp, _Ep, 2> >>> { >>> typedef integer_sequence<_Tp, _Sp, _Sp+1> type; >>> }; >>> >>> that there is between a 50% to 100% improvement in compile time, and lower >>> compiler memory usage (unmeasured), with no other difference in behavior. >>> There seems to be a sweet spot somewhere in this neighborhood (3 <= _Diff >>> <= 8) between complexity, added benefit, and even performance with lower >>> numbers. I rather arbitrarily selected 4 as the optimum. >>> >>> However I'm unsure how this compares to Richard's algorithm as I never was >>> able to get that working. Perhaps though if someone does, we could make an >>> informed comparison and choose the best. Regardless this is much better >>> than the algorithm Marshall first posted, which I was responsible for. Its >>> genesis can be found in <__tuple>'s __make_tuple_indices. When we get >>> make_integer_sequence settled, I plan on updating <__tuple> accordingly. >> >> My algorithm is fixed up and attached. I upped the divisor to 8; that >> was the sweet spot in my testing. I also switched to performing the >> pack building in size_t and converting to the sequence's type at the >> end (and thus using explicit specializations rather than partial >> specializations); that made the size_t case about 35% faster, but the >> non-size_t cases about 40% slower (converting from >> integer_sequence<size_t, ...> to integer_sequence<T, ...> is now, >> remarkably, slower than building the integer_sequence<size_t, ...>). >> It's probably possible to tweak this to get the same performance for >> all T, but I've not yet found a trick for that. >> <mis.cc> > > I copy/pasted this in and ran the unit tests and one of the tests caused > clang to consume over a Gb of memory before I aborted the test. I guessed > that the test responsible for this was inputing a negative N. So I renamed > your __make_integer_sequence to __make_integer_sequence_unchecked, and put > checking on top of it and tested again. This is much faster than what I > posted for very long sequences. Looks like 10X faster. > > I went through and macro-protected your suggestion and am posting it back > here for review.
LGTM _______________________________________________ cfe-commits mailing list [email protected] http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
