On 21/12/2023 22:55, Jonathan Wakely wrote:
I think this should wait for the next stage 1. It's a big patch
affecting the default -std mode (not just experimental C++20/23/26
material), and was first posted after the end of stage 1.
The idea behind this patch had been in the air long before that, but I agree that its form has changed a lot.

Do we really need the changes for versioned namespace? How much
difference does that extra member make to performance, compared with
the version for the default config?

It is huge, as demonstrated by the bench results below: the number of calls to the hash functor is divided by 2. With such a member you only need to compute 1 hash code when inserting into an empty bucket (the perfect-hasher case), whereas we currently need 2 to properly maintain the bucket pointing to the before-begin base node.
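
To illustrate where that second hash code comes from, here is a rough sketch. The names (Table, Node, insert_bucket_begin, bucket_index) are illustrative and only approximate the real _Hashtable internals; this is not the actual libstdc++ code.

#include <cstddef>
#include <functional>

struct Node { Node* next; int value; };

struct Table
{
  Node**         buckets;      // buckets[i] points to the node *before*
                               // the first node of bucket i
  std::size_t    bucket_count;
  Node           before_begin; // head sentinel of the singly-linked list
  std::hash<int> hasher;

  // Without a cached hash code, finding a node's bucket means calling
  // the hash functor again, plus a modulo.
  std::size_t bucket_index(const Node* n) const
  { return hasher(n->value) % bucket_count; }

  void insert_bucket_begin(std::size_t bkt, Node* node)
  {
    if (buckets[bkt])
      { // non-empty bucket: splice after its "before" node
        node->next = buckets[bkt]->next;
        buckets[bkt]->next = node;
      }
    else
      { // empty bucket: hook the node right after before_begin...
        node->next = before_begin.next;
        before_begin.next = node;
        if (node->next)
          // ...but the bucket that used to start at before_begin must
          // now start at the new node: this bucket_index call is the
          // second hash computation mentioned above.
          buckets[bucket_index(node->next)] = node;
        buckets[bkt] = &before_begin;
      }
  }
};

Inserting after the container's last node instead leaves before_begin's bucket untouched, so that extra bucket_index call, and the hash functor call hidden inside it, disappears.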

This is also why I proposed it now: I still hope that the patch moving the gnu versioned namespace to the cxx11 ABI will be integrated, with its implied version bump.



On Wed, 20 Dec 2023 at 06:10, François Dumont <frs.dum...@gmail.com> wrote:
Here is a new version of this patch.

The previous one had some flaws that went unnoticed by the testsuite tests; only the performance tests spotted them. So I'm adding checks on the consistency of the unordered containers in this patch.

I also forgot to mention that after this patch the gnu versioned namespace version is supposed to be bumped. But I hope that is already the plan because of the move to the cxx11 ABI in this mode.

Note for reviewers: after applying the patch, a 'git diff -b' is much more readable.

And some bench results:

before:

unordered_set_range_insert.cc-thread  hash code NOT cached 2 X 1000000 inserts individually  19999990 calls  44r  44u  0s   95999760mem  0pf
unordered_set_range_insert.cc-thread  hash code NOT cached 2 X 1000000 inserts in range      20000000 calls  43r  43u  0s   95999760mem  0pf
unordered_set_range_insert.cc-thread  hash code NOT cached 1000000 X inserts individually    19999990 calls  44r  44u  0s   95999760mem  0pf
unordered_set_range_insert.cc-thread  hash code NOT cached 1000000 X inserts in range        20000000 calls  43r  43u  0s   95999760mem  0pf
unordered_set_range_insert.cc-thread  hash code cached 2 X 1000000 inserts individually      10000000 calls  30r  30u  0s  111999328mem  0pf
unordered_set_range_insert.cc-thread  hash code cached 2 X 1000000 inserts in range          10000010 calls  33r  32u  0s  111999328mem  0pf
unordered_set_range_insert.cc-thread  hash code cached 1000000 X inserts individually        10000000 calls  30r  31u  0s  111999328mem  0pf
unordered_set_range_insert.cc-thread  hash code cached 1000000 X inserts in range            10000010 calls  32r  32u  0s  111999328mem  0pf

after:

unordered_set_range_insert.cc-thread  hash code NOT cached 2 X 1000000 inserts individually  19999990 calls  44r  44u  0s   95999760mem  0pf
unordered_set_range_insert.cc-thread  hash code NOT cached 2 X 1000000 inserts in range      10000020 calls  26r  25u  0s   95999760mem  0pf
unordered_set_range_insert.cc-thread  hash code NOT cached 1000000 X inserts individually    19999990 calls  43r  44u  0s   95999760mem  0pf
unordered_set_range_insert.cc-thread  hash code NOT cached 1000000 X inserts in range        10000020 calls  26r  26u  0s   95999760mem  0pf
unordered_set_range_insert.cc-thread  hash code cached 2 X 1000000 inserts individually      10000000 calls  35r  35u  0s  111999328mem  0pf
unordered_set_range_insert.cc-thread  hash code cached 2 X 1000000 inserts in range          10000010 calls  32r  33u  0s  111999328mem  0pf
unordered_set_range_insert.cc-thread  hash code cached 1000000 X inserts individually        10000000 calls  31r  32u  0s  111999328mem  0pf
unordered_set_range_insert.cc-thread  hash code cached 1000000 X inserts in range            10000010 calls  31r  31u  0s  111999328mem  0pf


      libstdc++: [_Hashtable] Prefer to insert after last node

      When inserting an element into an empty bucket we currently insert
      the new node after the before-begin node, i.e. in first position.
      The drawback of doing this is that we are forced to update the
      bucket that previously contained this before-begin node so that it
      points to the newly inserted node. To do so we need, at best, a
      modulo to find this bucket and, at worst, when the hash code is
      not cached, to compute the hash code as well.

      To avoid this side effect it is better to insert after the last
      node. To do so we introduce a helper type _HintPolicy that has 3
      responsibilities (a sketch of it follows the list below).

      1. When the gnu versioned namespace is used we add a _M_last
      member to _Hashtable, and _HintPolicy is then in charge of
      maintaining it. For this purpose _HintPolicy uses the RAII
      pattern: it resets _M_last at destruction. It also maintains its
      own _M_last, which all mutating operations update when needed.

      2. When the gnu versioned namespace is not used _HintPolicy still
      maintains its own _M_last member, initialized from the
      user-provided hint, if any, provided it is actually the
      container's last node, that is to say a dereferenceable node whose
      next node is null. All mutating operations can also update the
      contextual _HintPolicy instance whenever they detect the last node
      during their processing.

      3. As long as the container's last node has not been detected,
      _HintPolicy is used to cache the before-begin bucket index so that
      we do not need to recompute it afterward, for example in the
      context of a range insertion.
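
      To make this concrete, here is a minimal, hedged sketch of such a
      helper. The names (HintPolicy, maybe_record_last,
      first_node_bucket) are illustrative, and the actual _HintPolicy in
      the patch is richer and tied to _Hashtable's internals:

      #include <cstddef>

      struct Node { Node* next; };

      template<typename Table>
      struct HintPolicy
      {
        Table&      table;
        Node*       last;                  // last node, once known
        std::size_t bbegin_bkt = 0;        // cached before-begin bucket index
        bool        bbegin_bkt_known = false;

        // 2. Accept the user-provided hint only if it really is the
        // last node: dereferenceable with a null next pointer.
        HintPolicy(Table& t, Node* hint)
        : table(t), last(hint && !hint->next ? hint : nullptr)
        { }

        // Mutating operations call this whenever they happen upon the
        // tail during their processing.
        void maybe_record_last(Node* n)
        { if (n && !n->next) last = n; }

        // 3. While the last node is unknown, cache the before-begin
        // bucket index so that e.g. a range insertion computes it at
        // most once.
        std::size_t before_begin_bucket()
        {
          if (!bbegin_bkt_known)
            {
              bbegin_bkt = table.first_node_bucket(); // hypothetical helper
              bbegin_bkt_known = true;
            }
          return bbegin_bkt;
        }

        // 1. RAII: under the versioned namespace _Hashtable itself has
        // a _M_last member; there the destructor would write the final
        // value back, e.g. table._M_last = last.
        ~HintPolicy() = default;
      };

      Because a single such stack object is threaded through a whole
      operation, a range insertion pays the last-node and bucket-index
      bookkeeping once instead of once per element.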

      libstdc++-v3/ChangeLog:

        * include/bits/hashtable.h [_GLIBCXX_INLINE_VERSION]
        (_Hashtable<>::_M_last): New, the container last node.
        (_Hashtable<>::_LastNodeManager): New.
        (_Hashtable<>::_M_get_last(__node_ptr)): New.
        (_Hashtable<>::_M_get_last_node_mgr(__node_ptr)): New.
        (_Hashtable<>::_M_check_for_last(__node_base_ptr)): New.
        (_Hashtable<>::_M_set_last(__node_ptr)): New.
        (_Hashtable<>::_M_compute_hash_code(__node_ptr, const key_type&)):
        Remove.
        (_Hashtable<>::operator=(initializer_list<>)): Adapt to
        instantiate a _HintPolicy.
        (_Hashtable<>::_M_insert_bucket_begin): Add _HintPolicy&
        parameter and use it to optimize insertion in an empty bucket.
        (_Hashtable<>::_M_insert_unique_node): Add _HintPolicy& parameter.
        (_Hashtable<>::_M_insert_multi_node(__node_ptr, __hash_code,
        __node_ptr)): Replace by...
        (_Hashtable<>::_M_insert_multi_node(_HintPolicy&, const key_type&,
        __node_ptr, __node_ptr)): ...this.
        (_Hashtable<>::_M_emplace_unique(_HintPolicy&, _Args&&...)): New.
        (_Hashtable<>::_M_emplace_multi(_HintPolicy&, _Args&&...)): New.
        (_Hashtable<>::_M_emplace): Adapt to use the latter two.
        (_Hashtable<>::_M_insert_unique): Add _HintPolicy& parameter.
        (_Hashtable<>::_M_insert_unique_aux): Add _HintPolicy& parameter.
        (_Hashtable<>::_M_insert): Adapt to use the latter.
        (_Hashtable<>::emplace_hint(const_iterator, _Args&&...)): Adapt.
        (_Hashtable<>::rehash(size_type)): Adapt.
        (_Hashtable<>::_M_reinsert_node(const_iterator, node_type&&)):
        Add hint parameter, adapt to use it for _HintPolicy instantiation.
        (_Hashtable<>::_M_reinsert_node_multi): Likewise.
        (_Hashtable<>::_M_merge_unique): Adapt.
        (_Hashtable<>::_M_rehash): Add _HintPolicy& parameter.
        (_Hashtable<>::_Hashtable<_InputIte>()): Adapt.
        (_Hashtable<>::_M_assign): Call _M_set_last.
        (_Hashtable<>::_M_reset()): Reset _M_last.
        (_Hashtable<>::_M_move_assign(_Hashtable&&, true_type)): Move
        _M_last.
        (_Hashtable<>(_Hashtable&&, __node_alloc_type&&, true_type)):
        Copy _M_last.
        (_Hashtable<>(_Hashtable&&, __node_alloc_type&&, false_type)):
        Copy _M_last.
        (_Hashtable<>::_M_insert_range): Adapt.
        (_Hashtable<>::_M_erase): Call _M_check_for_last.
        (_Hashtable<>::erase): Likewise.
        * include/bits/hashtable_policy.h
        (_Map_base<>::operator[](const key_type&)): Use hint policy.
        (_Map_base<>::operator[](key_type&&)): Likewise.
        (_Insert_base<>::insert(const_iterator, const value_type&)):
        Likewise, using hint.
        (_Insert_base<>::try_emplace): Likewise.
        (_Insert_base<>::_M_insert_range<>(_InputIte, _InputIte,
        true_type)): Use hint policy.
        (_Insert_base<>::_M_insert_range<>(_InputIte, _InputIte,
        false_type)): Use hint policy.
        (_Insert<>::insert(const_iterator, value_type&&)): Likewise.
        * include/bits/unordered_map.h
        (unordered_map<>::insert(node_type&&)): Pass cend as hint.
        (unordered_map<>::insert(const_iterator, node_type&&)): Adapt to
        use hint.
        * include/bits/unordered_set.h
        (unordered_set<>::insert(node_type&&)): Pass cend as hint.
        (unordered_set<>::insert(const_iterator, node_type&&)): Adapt to
        use hint.
        * testsuite/23_containers/unordered_multimap/insert/hint.cc:
        Adapt implementation-specific tests.
        * testsuite/23_containers/unordered_map/consistency_check.cc:
        New test case.
        * testsuite/23_containers/unordered_multimap/consistency_check.cc:
        New test case.
        * testsuite/23_containers/unordered_multiset/consistency_check.cc:
        New test case.
        * testsuite/23_containers/unordered_set/consistency_check.cc:
        New test case.
        * testsuite/libstdc++-prettyprinters/cxx11.cc (main): Adapt
        expectations on element order in unordered_map instance.
