On Jul 1, 2013, at 10:13 PM, Richard Smith <[email protected]> wrote:
> On Mon, Jul 1, 2013 at 4:57 PM, Howard Hinnant <[email protected]> wrote: >> On Jul 1, 2013, at 6:17 PM, Howard Hinnant <[email protected]> wrote: >> >>> Inspired by your approach, I started from scratch and came up with the >>> enclosed. The compiler will run out of memory before it runs into the >>> recursive template instantiation limit, and this works with far higher >>> numbers. It passes all the current intseq tests. >> >> I did some timing tests for the compiler using: >> >> #include <utility> >> #include <iostream> >> >> int >> main() >> { >> typedef std::make_index_sequence<10000> T; >> std::cout << typeid(T).name() << "\n"; >> } >> >> and: >> >> #include <utility> >> #include <iostream> >> >> int >> main() >> { >> typedef std::make_index_sequence<100000> T; >> std::cout << typeid(T).name() << "\n"; >> } >> >> These are extremely long sequences! I discovered that if I add to the patch >> enclosed in my previous email: >> >> template <class _Tp, _Tp _Sp, _Tp _Ep> >> struct __make_integer_sequence_unchecked<_Tp, _Sp, _Ep, 4> >> { >> typedef integer_sequence<_Tp, _Sp, _Sp+1, _Sp+2, _Sp+3> type; >> }; >> >> template <class _Tp, _Tp _Sp, _Tp _Ep> >> struct __make_integer_sequence_unchecked<_Tp, _Sp, _Ep, 3> >> { >> typedef integer_sequence<_Tp, _Sp, _Sp+1, _Sp+2> type; >> }; >> >> template <class _Tp, _Tp _Sp, _Tp _Ep> >> struct __make_integer_sequence_unchecked<_Tp, _Sp, _Ep, 2> >> { >> typedef integer_sequence<_Tp, _Sp, _Sp+1> type; >> }; >> >> that there is between a 50% to 100% improvement in compile time, and lower >> compiler memory usage (unmeasured), with no other difference in behavior. >> There seems to be a sweet spot somewhere in this neighborhood (3 <= _Diff <= >> 8) between complexity, added benefit, and even performance with lower >> numbers. I rather arbitrarily selected 4 as the optimum. >> >> However I'm unsure how this compares to Richard's algorithm as I never was >> able to get that working. Perhaps though if someone does, we could make an >> informed comparison and choose the best. Regardless this is much better >> than the algorithm Marshall first posted, which I was responsible for. Its >> genesis can be found in <__tuple>'s __make_tuple_indices. When we get >> make_integer_sequence settled, I plan on updating <__tuple> accordingly. > > My algorithm is fixed up and attached. I upped the divisor to 8; that > was the sweet spot in my testing. I also switched to performing the > pack building in size_t and converting to the sequence's type at the > end (and thus using explicit specializations rather than partial > specializations); that made the size_t case about 35% faster, but the > non-size_t cases about 40% slower (converting from > integer_sequence<size_t, ...> to integer_sequence<T, ...> is now, > remarkably, slower than building the integer_sequence<size_t, ...>). > It's probably possible to tweak this to get the same performance for > all T, but I've not yet found a trick for that. > <mis.cc> I copy/pasted this in and ran the unit tests and one of the tests caused clang to consume over a Gb of memory before I aborted the test. I guessed that the test responsible for this was inputing a negative N. So I renamed your __make_integer_sequence to __make_integer_sequence_unchecked, and put checking on top of it and tested again. This is much faster than what I posted for very long sequences. Looks like 10X faster. I went through and macro-protected your suggestion and am posting it back here for review. Howard
integer_sequence.patch
Description: Binary data
_______________________________________________ cfe-commits mailing list [email protected] http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
