On Jul 1, 2013, at 10:13 PM, Richard Smith <[email protected]> wrote:

> On Mon, Jul 1, 2013 at 4:57 PM, Howard Hinnant <[email protected]> wrote:
>> On Jul 1, 2013, at 6:17 PM, Howard Hinnant <[email protected]> wrote:
>> 
>>> Inspired by your approach, I started from scratch and came up with the 
>>> enclosed.  The compiler will run out of memory before it runs into the 
>>> recursive template instantiation limit, and this works with far higher 
>>> numbers.  It passes all the current intseq tests.
>> 
>> I did some timing tests for the compiler using:
>> 
>> #include <utility>
>> #include <iostream>
>> 
>> int
>> main()
>> {
>>    typedef std::make_index_sequence<10000> T;
>>    std::cout << typeid(T).name() << "\n";
>> }
>> 
>> and:
>> 
>> #include <utility>
>> #include <iostream>
>> 
>> int
>> main()
>> {
>>    typedef std::make_index_sequence<100000> T;
>>    std::cout << typeid(T).name() << "\n";
>> }
>> 
>> These are extremely long sequences!  I discovered that if I add to the patch 
>> enclosed in my previous email:
>> 
>> template <class _Tp, _Tp _Sp, _Tp _Ep>
>> struct __make_integer_sequence_unchecked<_Tp, _Sp, _Ep, 4>
>> {
>>    typedef integer_sequence<_Tp, _Sp, _Sp+1, _Sp+2, _Sp+3> type;
>> };
>> 
>> template <class _Tp, _Tp _Sp, _Tp _Ep>
>> struct __make_integer_sequence_unchecked<_Tp, _Sp, _Ep, 3>
>> {
>>    typedef integer_sequence<_Tp, _Sp, _Sp+1, _Sp+2> type;
>> };
>> 
>> template <class _Tp, _Tp _Sp, _Tp _Ep>
>> struct __make_integer_sequence_unchecked<_Tp, _Sp, _Ep, 2>
>> {
>>    typedef integer_sequence<_Tp, _Sp, _Sp+1> type;
>> };
>> 
>> that there is between a 50% to 100% improvement in compile time, and lower 
>> compiler memory usage (unmeasured), with no other difference in behavior.  
>> There seems to be a sweet spot somewhere in this neighborhood (3 <= _Diff <= 
>> 8) between complexity, added benefit, and even performance with lower 
>> numbers.  I rather arbitrarily selected 4 as the optimum.
>> 
>> However I'm unsure how this compares to Richard's algorithm as I never was 
>> able to get that working.  Perhaps though if someone does, we could make an 
>> informed comparison and choose the best.  Regardless this is much better 
>> than the algorithm Marshall first posted, which I was responsible for.  Its 
>> genesis can be found in <__tuple>'s __make_tuple_indices.  When we get 
>> make_integer_sequence settled, I plan on updating <__tuple> accordingly.
> 
> My algorithm is fixed up and attached. I upped the divisor to 8; that
> was the sweet spot in my testing. I also switched to performing the
> pack building in size_t and converting to the sequence's type at the
> end (and thus using explicit specializations rather than partial
> specializations); that made the size_t case about 35% faster, but the
> non-size_t cases about 40% slower (converting from
> integer_sequence<size_t, ...> to integer_sequence<T, ...> is now,
> remarkably, slower than building the integer_sequence<size_t, ...>).
> It's probably possible to tweak this to get the same performance for
> all T, but I've not yet found a trick for that.
> <mis.cc>

I copy/pasted this in and ran the unit tests and one of the tests caused clang 
to consume over a Gb of memory before I aborted the test.  I guessed that the 
test responsible for this was inputing a negative N.  So I renamed your 
__make_integer_sequence to __make_integer_sequence_unchecked, and put checking 
on top of it and tested again.  This is much faster than what I posted for very 
long sequences.  Looks like 10X faster.

I went through and macro-protected your suggestion and am posting it back here 
for review.

Howard

Attachment: integer_sequence.patch
Description: Binary data

_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits

Reply via email to