On Tue, Jul 2, 2013 at 7:56 AM, Howard Hinnant <[email protected]> wrote:
> On Jul 1, 2013, at 10:13 PM, Richard Smith <[email protected]> wrote:
>
>> On Mon, Jul 1, 2013 at 4:57 PM, Howard Hinnant <[email protected]> wrote:
>>> On Jul 1, 2013, at 6:17 PM, Howard Hinnant <[email protected]> wrote:
>>>
>>>> Inspired by your approach, I started from scratch and came up with the 
>>>> enclosed.  The compiler will run out of memory before it runs into the 
>>>> recursive template instantiation limit, and this works with far higher 
>>>> numbers.  It passes all the current intseq tests.
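>>>>
>>>> For illustration, a minimal sketch of one way to keep the instantiation
>>>> depth low (hypothetical names, not the enclosed patch): split the range
>>>> in half and concatenate the halves, so the depth grows logarithmically
>>>> rather than linearly.
>>>>
>>>> #include <cstddef>
>>>>
>>>> template <std::size_t... _Ip> struct __index_seq {};
>>>>
>>>> template <class _S1, class _S2> struct __iseq_concat;
>>>>
>>>> template <std::size_t... _I1, std::size_t... _I2>
>>>> struct __iseq_concat<__index_seq<_I1...>, __index_seq<_I2...> >
>>>> {
>>>>     // offset the second half by the length of the first half
>>>>     typedef __index_seq<_I1..., (sizeof...(_I1) + _I2)...> type;
>>>> };
>>>>
>>>> template <std::size_t _Np>
>>>> struct __make_iseq
>>>> {
>>>>     typedef typename __iseq_concat<
>>>>         typename __make_iseq<_Np / 2>::type,
>>>>         typename __make_iseq<_Np - _Np / 2>::type>::type type;
>>>> };
>>>>
>>>> template <> struct __make_iseq<0> { typedef __index_seq<>  type; };
>>>> template <> struct __make_iseq<1> { typedef __index_seq<0> type; };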
>>>
>>> I did some timing tests for the compiler using:
>>>
>>> #include <utility>
>>> #include <iostream>
>>> #include <typeinfo>
>>>
>>> int
>>> main()
>>> {
>>>    typedef std::make_index_sequence<10000> T;
>>>    std::cout << typeid(T).name() << "\n";
>>> }
>>>
>>> and:
>>>
>>> #include <utility>
>>> #include <iostream>
>>> #include <typeinfo>
>>>
>>> int
>>> main()
>>> {
>>>    typedef std::make_index_sequence<100000> T;
>>>    std::cout << typeid(T).name() << "\n";
>>> }
>>>
>>> These are extremely long sequences!  I discovered that if I add the
>>> following to the patch enclosed in my previous email:
>>>
>>> template <class _Tp, _Tp _Sp, _Tp _Ep>
>>> struct __make_integer_sequence_unchecked<_Tp, _Sp, _Ep, 4>
>>> {
>>>    typedef integer_sequence<_Tp, _Sp, _Sp+1, _Sp+2, _Sp+3> type;
>>> };
>>>
>>> template <class _Tp, _Tp _Sp, _Tp _Ep>
>>> struct __make_integer_sequence_unchecked<_Tp, _Sp, _Ep, 3>
>>> {
>>>    typedef integer_sequence<_Tp, _Sp, _Sp+1, _Sp+2> type;
>>> };
>>>
>>> template <class _Tp, _Tp _Sp, _Tp _Ep>
>>> struct __make_integer_sequence_unchecked<_Tp, _Sp, _Ep, 2>
>>> {
>>>    typedef integer_sequence<_Tp, _Sp, _Sp+1> type;
>>> };
>>>
>>> there is a 50% to 100% improvement in compile time and lower compiler
>>> memory usage (unmeasured), with no other difference in behavior.  There
>>> seems to be a sweet spot somewhere in this neighborhood (3 <= _Diff <= 8)
>>> that balances complexity, added benefit, and performance for shorter
>>> sequences.  I rather arbitrarily selected 4 as the optimum.
>>>
>>> However, I'm unsure how this compares to Richard's algorithm, as I was
>>> never able to get it working.  If someone does get it working, we could
>>> make an informed comparison and choose the best.  Regardless, this is much
>>> better than the algorithm Marshall first posted, which I was responsible
>>> for.  Its genesis can be found in <__tuple>'s __make_tuple_indices.  When
>>> we get make_integer_sequence settled, I plan on updating <__tuple>
>>> accordingly.
>>
>> My algorithm is fixed up and attached. I upped the divisor to 8; that
>> was the sweet spot in my testing. I also switched to performing the
>> pack building in size_t and converting to the sequence's type at the
>> end (and thus using explicit specializations rather than partial
>> specializations); that made the size_t case about 35% faster, but the
>> non-size_t cases about 40% slower (converting from
>> integer_sequence<size_t, ...> to integer_sequence<T, ...> is now,
>> remarkably, slower than building the integer_sequence<size_t, ...>).
>> It's probably possible to tweak this to get the same performance for
>> all T, but I've not yet found a trick for that.
>> <mis.cc>
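>>
>> For concreteness, the final conversion step looks roughly like this
>> (illustrative names only, not the attached mis.cc): the pack is built
>> once in size_t and then re-emitted, element by element, in the
>> requested type.
>>
>> #include <cstddef>
>>
>> // stand-in for std::integer_sequence so the sketch is self-contained
>> template <class _Tp, _Tp... _Ip> struct integer_sequence {};
>>
>> template <class _Tp, class _Seq> struct __iseq_convert;
>>
>> template <class _Tp, std::size_t... _Ip>
>> struct __iseq_convert<_Tp, integer_sequence<std::size_t, _Ip...> >
>> {
>>     // one pack expansion converts every element to _Tp; this is the
>>     // conversion that turned out to be surprisingly expensive
>>     typedef integer_sequence<_Tp, static_cast<_Tp>(_Ip)...> type;
>> };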
>
> I copy/pasted this in and ran the unit tests, and one of the tests caused 
> clang to consume over a GB of memory before I aborted it.  I guessed that 
> the test responsible was the one inputting a negative N.  So I renamed 
> your __make_integer_sequence to __make_integer_sequence_unchecked, put 
> checking on top of it, and tested again.  This is much faster than what I 
> posted for very long sequences; it looks like about 10X faster.
>
> I went through and macro-protected your suggestion and am posting it back 
> here for review.
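>
> For concreteness, a minimal sketch of the kind of checking layer described
> above (the names and two-parameter signature are assumptions, not the
> attached patch): the checked entry point rejects a negative N with a
> static_assert before forwarding to the unchecked generator, so a bad input
> becomes a compile error rather than a runaway instantiation.
>
> // stand-in declaration for the unchecked generator; only the checking
> // shim is sketched here
> template <class _Tp, _Tp _Np> struct __make_integer_sequence_unchecked;
>
> template <class _Tp, _Tp _Np>
> struct __make_integer_sequence
> {
>     static_assert(0 <= _Np,
>                   "make_integer_sequence requires a non-negative length");
>     typedef typename __make_integer_sequence_unchecked<_Tp, _Np>::type type;
> };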

LGTM

_______________________________________________
cfe-commits mailing list
[email protected]
http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
