Sorry for the mis-understanding but the core of this patch has already
been committed after your approval here:
https://gcc.gnu.org/ml/libstdc++/2019-05/msg00023.html
It was considered as PR 90277 fix but eventually I also change tests to
make those implementation-details agnostics.
However the performance test didn't make it, I'll re-submit it in
another patch more dedicated to small size optimization.
François
On 5/31/19 1:44 PM, Jonathan Wakely wrote:
Re https://gcc.gnu.org/ml/gcc-patches/2018-10/msg00903.html
On 15/10/18 22:46 +0200, François Dumont wrote:
I started considering PR libstdc++/68303.
First thing was to write a dedicated performance test case, it is the
unordered_small_size.cc I'd like to add with this patch.
Great, more performance tests are always good.
The first runs show a major difference between tr1 and std
implementations, tr1 being much better:
std::tr1::unordered_set<int> without hash code cached: 1st insert
9r 9u 1s 14725920mem 0pf
std::tr1::unordered_set<int> with hash code cached: 1st insert
8r 9u 0s 14719680mem 0pf
std::unordered_set<int> without hash code cached: 1st insert
17r 17u 0s 16640080mem 0pf
std::unordered_set<int> with hash code cached: 1st insert 14r
14u 0s 16638656mem 0pf
I had a look in gdb to find out why and the answer was quite obvious.
For 20 insertions tr1 implementation bucket count goes through [11,
23] whereas for std it is [2, 5, 11, 23], so 2 more expensive rehash.
As unordered containers are dedicated to rather important number of
elements I propose to review the rehash policy with this patch so
that std also starts at 11 on the 1st insertion. After the patch
figures are:
std::tr1::unordered_set<int> without hash code cached: 1st insert
9r 9u 0s 14725920mem 0pf
std::tr1::unordered_set<int> with hash code cached: 1st insert
8r 8u 0s 14719680mem 0pf
std::unordered_set<int> without hash code cached: 1st insert
15r 15u 0s 16640128mem 0pf
std::unordered_set<int> with hash code cached: 1st insert 12r
12u 0s 16638688mem 0pf
OK, that seems worthwhile then.
Moreover I noticed that performance tests are built with -O2, is that
intentional ? The std implementation uses more abstractions than tr1,
looks like building with -O3 optimizes away most of those
abstractions making tr1 and std implementation much closer:
Yes, I think it's intentional that -O2 is used, because I think
that's the most commonly-used optimisation level. We don't want to
std::tr1::unordered_set<int> without hash code cached: 1st insert
2r 1u 1s 14725952mem 0pf
std::tr1::unordered_set<int> with hash code cached: 1st insert
2r 1u 0s 14719536mem 0pf
std::unordered_set<int> without hash code cached: 1st insert
2r 2u 0s 16640064mem 0pf
std::unordered_set<int> with hash code cached: 1st insert 2r
2u 0s 16638608mem 0pf
Hmm, interesting. I wonder if we can determine what is not being
optimized at -O2, and either tweak the code or improve the compiler.
Note that this patch also rework the alternative rehash policy based
on powers of 2 so that it also starts with a larger number of bucket
(16) and respects LWG2156.
Last I had to wider the memory column so that alignment is preserved
even when memory diff is negative.
Tested under Linux x86_64.
Ok to commit ?
Does this patch still apply cleanly? I think it would be good to
commit it now, early in stage 1.