On 09/10/16 16:23 +0100, Elliot Goodrich wrote:
Hi,

If we unroll the loop so that we iterate both forwards and backwards,
we can take advantage of memory-level parallelism when chasing
pointers. This means that reverse takes 35% less time when nodes are
randomly scattered in memory and about the same time if nodes are
contiguous.

Further, as our node pointers will never alias, we can interleave the
swaps of the next and previous pointers to remove further data
dependencies. This takes another 5% off the time when nodes are
scattered in memory and takes 20% off when nodes are contiguous.

All in all we save 20%-40% depending on the memory layout.

Nice, thanks for the patch.

Do you have (or are you willing to sign) a copyright assignment for
GCC?

See https://gcc.gnu.org/contribute.html#legal for details.

For future improvement, by passing whether there is an odd or even
number of nodes in the list we can hoist one of the ifs out of the
loop and gain another 5-10% but most likely this is only possible when
_GLIBCXX_USE_CXX11_ABI is defined and size() is O(1). This would bring
the saving to 30%-45%. Is it worth writing a new overload of
_M_reverse which takes the size of the list?

That certainly seems worthwhile. Do we need an overload or can it just
be done with #if? It seems to me we'd either want to use the size, or
not use it, we wouldn't want both versions defined at once. That
suggests #if to me.

Reply via email to