Re: Review Hashtable extract node API
On 6/19/19 12:47 AM, Jonathan Wakely wrote: On 18/06/19 22:42 +0200, François Dumont wrote: On 6/18/19 12:54 PM, Jonathan Wakely wrote: On 18/06/19 07:52 +0200, François Dumont wrote: A small regression noticed while merging. We shouldn't keep on using a moved-from key_type instance. Ok to commit ? Feel free to do it if you prefer, I'll do so at end of Europe day otherwise. diff --git a/libstdc++-v3/include/bits/hashtable_policy.h b/libstdc++-v3/include/bits/hashtable_policy.h index f5809c7443a..7e89e1b44c4 100644 --- a/libstdc++-v3/include/bits/hashtable_policy.h +++ b/libstdc++-v3/include/bits/hashtable_policy.h @@ -743,7 +743,8 @@ namespace __detail std::tuple<>() }; auto __pos - = __h->_M_insert_unique_node(__k, __bkt, __code, __node._M_node); + = __h->_M_insert_unique_node(__h->_M_extract()(__node._M_node->_M_v()), + __bkt, __code, __node._M_node); __node._M_node = nullptr; return __pos->second; } I can't create an example where this causes a problem, because the key passed to _M_insert_unique_node is never used. So it doesn't matter that it's been moved from. So I have to wonder why we just added the key parameter to that function, if it's never used. I think you've been influence by my patch. I was using a "_NodeAccessor" which wasn't giving access to the node without taking owership so I needed to pass the key properly to compute new bucket index in case of rehash. But with your approach this change to the _M_insert_unique_node was simply unecessary so here is a patch to cleanup this part. Ha! I see, thanks. So I should have removed that key_type parameter again after removing the NodeAccessor stuff. Ok to commit ? No, because that would restore the original signature of the _M_insert_unique_node function, but it has changed contract. Old callers who expect that function to delete the node would now leak memory if an exception is thrown. Oh, yes, abi, I tend to forget even if the recent PR 90920 remind me about that, sorry. If we change the contract of the function we need to change its mangled name, so that callers expecting the old contract will not use the new function. I'll think about the best way to do that ... Something like what's attached ? I still use _GLIBCXX_INLINE_VERSION to tag functions that are kept just for abi-compatibility. Ok to commit after having run tests ? François diff --git a/libstdc++-v3/include/bits/hashtable.h b/libstdc++-v3/include/bits/hashtable.h index ab579a7059e..2ea75a24f1c 100644 --- a/libstdc++-v3/include/bits/hashtable.h +++ b/libstdc++-v3/include/bits/hashtable.h @@ -693,19 +693,35 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION __node_base* _M_get_previous_node(size_type __bkt, __node_base* __n); - // Insert node __n with key __k and hash code __code, in bucket __bkt - // if no rehash (assumes no element with same key already present). + // Insert node __n with hash code __code, in bucket __bkt if no + // rehash (assumes no element with same key already present). // Takes ownership of __n if insertion succeeds, throws otherwise. iterator - _M_insert_unique_node(const key_type& __k, size_type __bkt, - __hash_code __code, __node_type* __n, - size_type __n_elt = 1); + _M_insert_node(true_type, size_type __bkt, __hash_code, + __node_type* __n, size_type __n_elt = 1); + +#if !_GLIBCXX_INLINE_VERSION + // Insert node with hash code __code, in bucket bkt if no rehash (assumes + // no element with its key already present). Take ownership of the node, + // deallocate it on exception. + iterator + _M_insert_unique_node(size_type __bkt, __hash_code __code, + __node_type* __n, size_type __n_elt = 1); +#endif // Insert node __n with key __k and hash code __code. // Takes ownership of __n if insertion succeeds, throws otherwise. iterator - _M_insert_multi_node(__node_type* __hint, const key_type& __k, + _M_insert_node(false_type, __node_type* __hint, + __hash_code __code, __node_type* __n); + +#if !_GLIBCXX_INLINE_VERSION + // Insert node with hash code __code. Take ownership of the node, + // deallocate it on exception. + iterator + _M_insert_multi_node(__node_type* __hint, __hash_code __code, __node_type* __n); +#endif template std::pair @@ -831,7 +847,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION else { __ret.position - = _M_insert_unique_node(__k, __bkt, __code, __nh._M_ptr); + = _M_insert_node(true_type{}, __bkt, __code, __nh._M_ptr); __nh._M_ptr = nullptr; __ret.inserted = true; } @@ -851,7 +867,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION const key_type& __k = __nh._M_key(); auto __code = this->_M_hash_code(__k); auto __ret - = _M_insert_multi_node(__hint._M_cur, __k, __code, __nh._M_ptr); + = _M_insert_node(false_type{}, __hint._M_cur, __code, __nh._M_ptr);
Re: Review Hashtable extract node API
On 18/06/19 22:42 +0200, François Dumont wrote: On 6/18/19 12:54 PM, Jonathan Wakely wrote: On 18/06/19 07:52 +0200, François Dumont wrote: A small regression noticed while merging. We shouldn't keep on using a moved-from key_type instance. Ok to commit ? Feel free to do it if you prefer, I'll do so at end of Europe day otherwise. diff --git a/libstdc++-v3/include/bits/hashtable_policy.h b/libstdc++-v3/include/bits/hashtable_policy.h index f5809c7443a..7e89e1b44c4 100644 --- a/libstdc++-v3/include/bits/hashtable_policy.h +++ b/libstdc++-v3/include/bits/hashtable_policy.h @@ -743,7 +743,8 @@ namespace __detail std::tuple<>() }; auto __pos - = __h->_M_insert_unique_node(__k, __bkt, __code, __node._M_node); + = __h->_M_insert_unique_node(__h->_M_extract()(__node._M_node->_M_v()), + __bkt, __code, __node._M_node); __node._M_node = nullptr; return __pos->second; } I can't create an example where this causes a problem, because the key passed to _M_insert_unique_node is never used. So it doesn't matter that it's been moved from. So I have to wonder why we just added the key parameter to that function, if it's never used. I think you've been influence by my patch. I was using a "_NodeAccessor" which wasn't giving access to the node without taking owership so I needed to pass the key properly to compute new bucket index in case of rehash. But with your approach this change to the _M_insert_unique_node was simply unecessary so here is a patch to cleanup this part. Ha! I see, thanks. So I should have removed that key_type parameter again after removing the NodeAccessor stuff. Ok to commit ? No, because that would restore the original signature of the _M_insert_unique_node function, but it has changed contract. Old callers who expect that function to delete the node would now leak memory if an exception is thrown. If we change the contract of the function we need to change its mangled name, so that callers expecting the old contract will not use the new function. I'll think about the best way to do that ...
Re: Review Hashtable extract node API
On 6/18/19 12:54 PM, Jonathan Wakely wrote: On 18/06/19 07:52 +0200, François Dumont wrote: A small regression noticed while merging. We shouldn't keep on using a moved-from key_type instance. Ok to commit ? Feel free to do it if you prefer, I'll do so at end of Europe day otherwise. diff --git a/libstdc++-v3/include/bits/hashtable_policy.h b/libstdc++-v3/include/bits/hashtable_policy.h index f5809c7443a..7e89e1b44c4 100644 --- a/libstdc++-v3/include/bits/hashtable_policy.h +++ b/libstdc++-v3/include/bits/hashtable_policy.h @@ -743,7 +743,8 @@ namespace __detail std::tuple<>() }; auto __pos - = __h->_M_insert_unique_node(__k, __bkt, __code, __node._M_node); + = __h->_M_insert_unique_node(__h->_M_extract()(__node._M_node->_M_v()), + __bkt, __code, __node._M_node); __node._M_node = nullptr; return __pos->second; } I can't create an example where this causes a problem, because the key passed to _M_insert_unique_node is never used. So it doesn't matter that it's been moved from. So I have to wonder why we just added the key parameter to that function, if it's never used. I think you've been influence by my patch. I was using a "_NodeAccessor" which wasn't giving access to the node without taking owership so I needed to pass the key properly to compute new bucket index in case of rehash. But with your approach this change to the _M_insert_unique_node was simply unecessary so here is a patch to cleanup this part. Ok to commit ? As far as I can tell, it would only be used for a non-default range hash function, and I don't care about that. Frankly I find the policy-based _Hashtable completely unmaintainable and I'd gladly get rid of all of it that isn't needed for the std::unordered_xxx containers. The non-standard extensions are not used by anybody, apparently not tested properly (or this regression should have been noticed) and make the code too complicated. I never consider removing this but it would indeed make maintainance easy. I'll add to my TODO. We're adding new parameters that have to be passed around even though they're never used by 99.999% of users. No wonder the code is only fast at -O3. diff --git a/libstdc++-v3/include/bits/hashtable.h b/libstdc++-v3/include/bits/hashtable.h index ab579a7059e..41edafaa2e3 100644 --- a/libstdc++-v3/include/bits/hashtable.h +++ b/libstdc++-v3/include/bits/hashtable.h @@ -693,11 +693,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION __node_base* _M_get_previous_node(size_type __bkt, __node_base* __n); - // Insert node __n with key __k and hash code __code, in bucket __bkt - // if no rehash (assumes no element with same key already present). + // Insert node __n with hash code __code, in bucket __bkt if no + // rehash (assumes no element with same key already present). // Takes ownership of __n if insertion succeeds, throws otherwise. iterator - _M_insert_unique_node(const key_type& __k, size_type __bkt, + _M_insert_unique_node(size_type __bkt, __hash_code __code, __node_type* __n, size_type __n_elt = 1); @@ -831,7 +831,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION else { __ret.position - = _M_insert_unique_node(__k, __bkt, __code, __nh._M_ptr); + = _M_insert_unique_node(__bkt, __code, __nh._M_ptr); __nh._M_ptr = nullptr; __ret.inserted = true; } @@ -918,8 +918,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION if (_M_find_node(__bkt, __k, __code) == nullptr) { auto __nh = __src.extract(__pos); - _M_insert_unique_node(__k, __bkt, __code, __nh._M_ptr, - __n_elt); + _M_insert_unique_node(__bkt, __code, __nh._M_ptr, __n_elt); __nh._M_ptr = nullptr; __n_elt = 1; } @@ -1671,7 +1670,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION return std::make_pair(iterator(__p), false); // Insert the node - auto __pos = _M_insert_unique_node(__k, __bkt, __code, __node._M_node); + auto __pos = _M_insert_unique_node(__bkt, __code, __node._M_node); __node._M_node = nullptr; return { __pos, true }; } @@ -1705,9 +1704,8 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION auto _Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal, _H1, _H2, _Hash, _RehashPolicy, _Traits>:: -_M_insert_unique_node(const key_type& __k, size_type __bkt, - __hash_code __code, __node_type* __node, - size_type __n_elt) +_M_insert_unique_node(size_type __bkt, __hash_code __code, + __node_type* __node, size_type __n_elt) -> iterator { const __rehash_state& __saved_state = _M_rehash_policy._M_state(); @@ -1718,7 +1716,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION if (__do_rehash.first) { _M_rehash(__do_rehash.second, __saved_state); - __bkt = _M_bucket_index(__k, __code); + __bkt = _M_bucket_index(this->_M_extract()(__node->_M_v()), __code); } this->_M_store_code(__node, __code); @@ -1804,7 +1802,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
Re: Review Hashtable extract node API
On 18/06/19 07:52 +0200, François Dumont wrote: A small regression noticed while merging. We shouldn't keep on using a moved-from key_type instance. Ok to commit ? Feel free to do it if you prefer, I'll do so at end of Europe day otherwise. diff --git a/libstdc++-v3/include/bits/hashtable_policy.h b/libstdc++-v3/include/bits/hashtable_policy.h index f5809c7443a..7e89e1b44c4 100644 --- a/libstdc++-v3/include/bits/hashtable_policy.h +++ b/libstdc++-v3/include/bits/hashtable_policy.h @@ -743,7 +743,8 @@ namespace __detail std::tuple<>() }; auto __pos - = __h->_M_insert_unique_node(__k, __bkt, __code, __node._M_node); + = __h->_M_insert_unique_node(__h->_M_extract()(__node._M_node->_M_v()), +__bkt, __code, __node._M_node); __node._M_node = nullptr; return __pos->second; } I can't create an example where this causes a problem, because the key passed to _M_insert_unique_node is never used. So it doesn't matter that it's been moved from. So I have to wonder why we just added the key parameter to that function, if it's never used. As far as I can tell, it would only be used for a non-default range hash function, and I don't care about that. Frankly I find the policy-based _Hashtable completely unmaintainable and I'd gladly get rid of all of it that isn't needed for the std::unordered_xxx containers. The non-standard extensions are not used by anybody, apparently not tested properly (or this regression should have been noticed) and make the code too complicated. We're adding new parameters that have to be passed around even though they're never used by 99.999% of users. No wonder the code is only fast at -O3.
Re: Review Hashtable extract node API
A small regression noticed while merging. We shouldn't keep on using a moved-from key_type instance. Ok to commit ? Feel free to do it if you prefer, I'll do so at end of Europe day otherwise. François On 6/17/19 12:28 PM, Jonathan Wakely wrote: On 07/06/19 18:39 +0200, François Dumont wrote: On 6/5/19 6:22 PM, Jonathan Wakely wrote: On 04/06/19 19:19 +0200, François Dumont wrote: Hi Here is a patch to enhance the _Hashtable extract node API and fix a FIXME request. The enhancement to the extract node Api is that extract(const key_type&) do not call extract(const_iterator) anymore. Doing so we had to loop again through bucket nodes to find the previous node to the one to extract. Even if a bucket shall not contain many nodes (in unique key mode) it's easy to avoid it. Nice. To fix the FIXME I introduced a node smart pointer type managing the node lifetime. The node is extracted from this smart pointer only when there can't be any exception raised. In the context of the node extract api the node handle is considered as a smart pointer. So the node handle will remain owner of the node in case of exception when reinserting it, I hope it is the expected behavior. Yes, that's right. I was going to suggest just using the node handle type instead of inventing a new smart pointer, but the handle type uses std::optional so isn't available for C++11/14. I considered it too, or even use shared_ptr but thought it was overkill. * include/bits/hashtable_policy.h (struct _NodeSmartPointer<_NodeAlloc>): New. (_Map_base<>::operator[](const key_type&)): Use latter, adapt. (_Map_base<>::operator[](key_type&&)): Likewise. * include/bits/hashtable.h (_Hashtable<>::__node_sp_t): New. (_Hashtable<>::_M_insert_unique_node(size_type, __hash_code, __node_type*, size_type)): Replace by... (_Hashtable<>::_M_insert_unique_node<_NodeAccessor>(const key_type&, size_type, __hash_code, const _NodeAccessor&, size_type)): ...that. (_Hashtable<>::_M_insert_multi_node(__node_type*, __hash_code, __node_type*)): Replace by... (_Hashtable<>::_M_insert_multi_node<_NodeAccessor>(__node_type*, __hash_code, const _NodeAccessor&)): ...that. (_Hashtable<>::_M_reinsert_node): Adapt. (_Hashtable<>::_M_reinsert_node_multi): Adapt. (_Hashtable<>::_M_extract_node(size_t, __node_base*)): New. (_Hashtable<>::extract(const_iterator)): Use latter. (_Hashtable<>::extract(const _Key&)): Likewise. (_Hashtable<>::_M_merge_unique): Adapt. (_Hashtable<>::_M_emplace<_Args>(true_type, _Args&&...)): Adapt. (_Hashtable<>::_M_emplace<_Args>(const_iterator, false_type, _Args&&...)): Adapt. Tested under Linux x86_64. Ok to commit ? François diff --git a/libstdc++-v3/include/bits/hashtable.h b/libstdc++-v3/include/bits/hashtable.h index e2e3f016a35..307865b96bf 100644 --- a/libstdc++-v3/include/bits/hashtable.h +++ b/libstdc++-v3/include/bits/hashtable.h @@ -197,6 +197,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION using __hash_cached = typename __traits_type::__hash_cached; using __node_type = __detail::_Hash_node<_Value, __hash_cached::value>; using __node_alloc_type = __alloc_rebind<_Alloc, __node_type>; + using __node_sp_t = __detail::_NodeSmartPointer<__node_alloc_type>; using __hashtable_alloc = __detail::_Hashtable_alloc<__node_alloc_type>; @@ -669,18 +670,19 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION __node_base* _M_get_previous_node(size_type __bkt, __node_base* __n); - // Insert node with hash code __code, in bucket bkt if no rehash (assumes - // no element with its key already present). Take ownership of the node, - // deallocate it on exception. + // Insert node with key __k and hash code __code, in bucket __bkt if no + // rehash (assumes no element with its key already present). + template iterator - _M_insert_unique_node(size_type __bkt, __hash_code __code, - __node_type* __n, size_type __n_elt = 1); + _M_insert_unique_node(const key_type& __k, size_type __bkt, + __hash_code __code, const _NodeAccessor&, + size_type __n_elt = 1); - // Insert node with hash code __code. Take ownership of the node, - // deallocate it on exception. + // Insert node with hash code __code. + template iterator - _M_insert_multi_node(__node_type* __hint, - __hash_code __code, __node_type* __n); + _M_insert_multi_node(__node_type* __hint, __hash_code __code, + const _NodeAccessor& __node_accessor); It looks like most times you call these functions you pass an identical lambda expression, but each of those lambda expressions will create a unique type. That means you create different instantiations of the function templates even though they do exactly the same thing. That's just generating multiple copies of identical code. Passing in a function object
Re: Review Hashtable extract node API
On 07/06/19 18:39 +0200, François Dumont wrote: On 6/5/19 6:22 PM, Jonathan Wakely wrote: On 04/06/19 19:19 +0200, François Dumont wrote: Hi Here is a patch to enhance the _Hashtable extract node API and fix a FIXME request. The enhancement to the extract node Api is that extract(const key_type&) do not call extract(const_iterator) anymore. Doing so we had to loop again through bucket nodes to find the previous node to the one to extract. Even if a bucket shall not contain many nodes (in unique key mode) it's easy to avoid it. Nice. To fix the FIXME I introduced a node smart pointer type managing the node lifetime. The node is extracted from this smart pointer only when there can't be any exception raised. In the context of the node extract api the node handle is considered as a smart pointer. So the node handle will remain owner of the node in case of exception when reinserting it, I hope it is the expected behavior. Yes, that's right. I was going to suggest just using the node handle type instead of inventing a new smart pointer, but the handle type uses std::optional so isn't available for C++11/14. I considered it too, or even use shared_ptr but thought it was overkill. * include/bits/hashtable_policy.h (struct _NodeSmartPointer<_NodeAlloc>): New. (_Map_base<>::operator[](const key_type&)): Use latter, adapt. (_Map_base<>::operator[](key_type&&)): Likewise. * include/bits/hashtable.h (_Hashtable<>::__node_sp_t): New. (_Hashtable<>::_M_insert_unique_node(size_type, __hash_code, __node_type*, size_type)): Replace by... (_Hashtable<>::_M_insert_unique_node<_NodeAccessor>(const key_type&, size_type, __hash_code, const _NodeAccessor&, size_type)): ...that. (_Hashtable<>::_M_insert_multi_node(__node_type*, __hash_code, __node_type*)): Replace by... (_Hashtable<>::_M_insert_multi_node<_NodeAccessor>(__node_type*, __hash_code, const _NodeAccessor&)): ...that. (_Hashtable<>::_M_reinsert_node): Adapt. (_Hashtable<>::_M_reinsert_node_multi): Adapt. (_Hashtable<>::_M_extract_node(size_t, __node_base*)): New. (_Hashtable<>::extract(const_iterator)): Use latter. (_Hashtable<>::extract(const _Key&)): Likewise. (_Hashtable<>::_M_merge_unique): Adapt. (_Hashtable<>::_M_emplace<_Args>(true_type, _Args&&...)): Adapt. (_Hashtable<>::_M_emplace<_Args>(const_iterator, false_type, _Args&&...)): Adapt. Tested under Linux x86_64. Ok to commit ? François diff --git a/libstdc++-v3/include/bits/hashtable.h b/libstdc++-v3/include/bits/hashtable.h index e2e3f016a35..307865b96bf 100644 --- a/libstdc++-v3/include/bits/hashtable.h +++ b/libstdc++-v3/include/bits/hashtable.h @@ -197,6 +197,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION using __hash_cached = typename __traits_type::__hash_cached; using __node_type = __detail::_Hash_node<_Value, __hash_cached::value>; using __node_alloc_type = __alloc_rebind<_Alloc, __node_type>; + using __node_sp_t = __detail::_NodeSmartPointer<__node_alloc_type>; using __hashtable_alloc = __detail::_Hashtable_alloc<__node_alloc_type>; @@ -669,18 +670,19 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION __node_base* _M_get_previous_node(size_type __bkt, __node_base* __n); - // Insert node with hash code __code, in bucket bkt if no rehash (assumes - // no element with its key already present). Take ownership of the node, - // deallocate it on exception. + // Insert node with key __k and hash code __code, in bucket __bkt if no + // rehash (assumes no element with its key already present). + template iterator - _M_insert_unique_node(size_type __bkt, __hash_code __code, - __node_type* __n, size_type __n_elt = 1); + _M_insert_unique_node(const key_type& __k, size_type __bkt, + __hash_code __code, const _NodeAccessor&, + size_type __n_elt = 1); - // Insert node with hash code __code. Take ownership of the node, - // deallocate it on exception. + // Insert node with hash code __code. + template iterator - _M_insert_multi_node(__node_type* __hint, - __hash_code __code, __node_type* __n); + _M_insert_multi_node(__node_type* __hint, __hash_code __code, + const _NodeAccessor& __node_accessor); It looks like most times you call these functions you pass an identical lambda expression, but each of those lambda expressions will create a unique type. That means you create different instantiations of the function templates even though they do exactly the same thing. That's just generating multiple copies of identical code. Passing in a function object to provide the node pointer doesn't really seem necessary anyway, so if it results in larger executables it's really not desirable. Maybe one the reasons we have a difference when running performance tests in -03. I don't know why I abused so much of
Re: Review Hashtable extract node API
On 6/5/19 6:22 PM, Jonathan Wakely wrote: On 04/06/19 19:19 +0200, François Dumont wrote: Hi Here is a patch to enhance the _Hashtable extract node API and fix a FIXME request. The enhancement to the extract node Api is that extract(const key_type&) do not call extract(const_iterator) anymore. Doing so we had to loop again through bucket nodes to find the previous node to the one to extract. Even if a bucket shall not contain many nodes (in unique key mode) it's easy to avoid it. Nice. To fix the FIXME I introduced a node smart pointer type managing the node lifetime. The node is extracted from this smart pointer only when there can't be any exception raised. In the context of the node extract api the node handle is considered as a smart pointer. So the node handle will remain owner of the node in case of exception when reinserting it, I hope it is the expected behavior. Yes, that's right. I was going to suggest just using the node handle type instead of inventing a new smart pointer, but the handle type uses std::optional so isn't available for C++11/14. I considered it too, or even use shared_ptr but thought it was overkill. * include/bits/hashtable_policy.h (struct _NodeSmartPointer<_NodeAlloc>): New. (_Map_base<>::operator[](const key_type&)): Use latter, adapt. (_Map_base<>::operator[](key_type&&)): Likewise. * include/bits/hashtable.h (_Hashtable<>::__node_sp_t): New. (_Hashtable<>::_M_insert_unique_node(size_type, __hash_code, __node_type*, size_type)): Replace by... (_Hashtable<>::_M_insert_unique_node<_NodeAccessor>(const key_type&, size_type, __hash_code, const _NodeAccessor&, size_type)): ...that. (_Hashtable<>::_M_insert_multi_node(__node_type*, __hash_code, __node_type*)): Replace by... (_Hashtable<>::_M_insert_multi_node<_NodeAccessor>(__node_type*, __hash_code, const _NodeAccessor&)): ...that. (_Hashtable<>::_M_reinsert_node): Adapt. (_Hashtable<>::_M_reinsert_node_multi): Adapt. (_Hashtable<>::_M_extract_node(size_t, __node_base*)): New. (_Hashtable<>::extract(const_iterator)): Use latter. (_Hashtable<>::extract(const _Key&)): Likewise. (_Hashtable<>::_M_merge_unique): Adapt. (_Hashtable<>::_M_emplace<_Args>(true_type, _Args&&...)): Adapt. (_Hashtable<>::_M_emplace<_Args>(const_iterator, false_type, _Args&&...)): Adapt. Tested under Linux x86_64. Ok to commit ? François diff --git a/libstdc++-v3/include/bits/hashtable.h b/libstdc++-v3/include/bits/hashtable.h index e2e3f016a35..307865b96bf 100644 --- a/libstdc++-v3/include/bits/hashtable.h +++ b/libstdc++-v3/include/bits/hashtable.h @@ -197,6 +197,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION using __hash_cached = typename __traits_type::__hash_cached; using __node_type = __detail::_Hash_node<_Value, __hash_cached::value>; using __node_alloc_type = __alloc_rebind<_Alloc, __node_type>; + using __node_sp_t = __detail::_NodeSmartPointer<__node_alloc_type>; using __hashtable_alloc = __detail::_Hashtable_alloc<__node_alloc_type>; @@ -669,18 +670,19 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION __node_base* _M_get_previous_node(size_type __bkt, __node_base* __n); - // Insert node with hash code __code, in bucket bkt if no rehash (assumes - // no element with its key already present). Take ownership of the node, - // deallocate it on exception. + // Insert node with key __k and hash code __code, in bucket __bkt if no + // rehash (assumes no element with its key already present). + template iterator - _M_insert_unique_node(size_type __bkt, __hash_code __code, - __node_type* __n, size_type __n_elt = 1); + _M_insert_unique_node(const key_type& __k, size_type __bkt, + __hash_code __code, const _NodeAccessor&, + size_type __n_elt = 1); - // Insert node with hash code __code. Take ownership of the node, - // deallocate it on exception. + // Insert node with hash code __code. + template iterator - _M_insert_multi_node(__node_type* __hint, - __hash_code __code, __node_type* __n); + _M_insert_multi_node(__node_type* __hint, __hash_code __code, + const _NodeAccessor& __node_accessor); It looks like most times you call these functions you pass an identical lambda expression, but each of those lambda expressions will create a unique type. That means you create different instantiations of the function templates even though they do exactly the same thing. That's just generating multiple copies of identical code. Passing in a function object to provide the node pointer doesn't really seem necessary anyway, so if it results in larger executables it's really not desirable. Maybe one the reasons we have a difference when running performance tests in -03. I don't know why I abused so much of lambdas when I see your proposal. The
Re: Review Hashtable extract node API
On 05/06/19 20:18 +0100, Jonathan Wakely wrote: On 05/06/19 17:43 +0100, Jonathan Wakely wrote: On 05/06/19 17:22 +0100, Jonathan Wakely wrote: On 04/06/19 19:19 +0200, François Dumont wrote: @@ -669,18 +670,19 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION __node_base* _M_get_previous_node(size_type __bkt, __node_base* __n); - // Insert node with hash code __code, in bucket bkt if no rehash (assumes - // no element with its key already present). Take ownership of the node, - // deallocate it on exception. + // Insert node with key __k and hash code __code, in bucket __bkt if no + // rehash (assumes no element with its key already present). + template iterator - _M_insert_unique_node(size_type __bkt, __hash_code __code, - __node_type* __n, size_type __n_elt = 1); + _M_insert_unique_node(const key_type& __k, size_type __bkt, + __hash_code __code, const _NodeAccessor&, + size_type __n_elt = 1); - // Insert node with hash code __code. Take ownership of the node, - // deallocate it on exception. + // Insert node with hash code __code. + template iterator - _M_insert_multi_node(__node_type* __hint, - __hash_code __code, __node_type* __n); + _M_insert_multi_node(__node_type* __hint, __hash_code __code, +const _NodeAccessor& __node_accessor); It looks like most times you call these functions you pass an identical lambda expression, but each of those lambda expressions will create a unique type. That means you create different instantiations of the function templates even though they do exactly the same thing. That's just generating multiple copies of identical code. Passing in a function object to provide the node pointer doesn't really seem necessary anyway, so if it results in larger executables it's really not desirable. Also I didn't really like the name NodeAccessor. It's not an accessor, because it performs ownership transfer. Invoking __node_accessor() returns a __node_type* by releasing it from the previous owner (by setting the owner's pointer member to null). Passing a const reference to something called NodeAccessor does not make it clear that it performs a mutating operation like that! If the _M_insert_unique_node and _M_insert_multi_node functions did the __node_accessor() call *before* rehashing, and rehashing threw an exception, then they would leak. So it's important that the __node_acessor() call happens at the right time, and so it's important to name it well. In my suggested patch the naming isn't misleading, because we just pass a raw __node_type* and have a new comment saying: // Takes ownership of __n if insertion succeeds, throws otherwise. The function doesn't have a callable with non-local effects that modifies an object outside the function. Because the caller sets the previous owner's pointer to null there's no danger of it happening at the wrong time; it can only happen after the function has returned and ownership transfer has completed. As a further evolution that simplifies some uses of _Scoped_node we could give it a constructor that allocates a node and constructs an element, as in the attached patch. Of course all this code is completely wrong, because it uses raw pointers not the allocator's pointer type. But that's a much bigger problem that needs to be solved separately.
Re: Review Hashtable extract node API
On 05/06/19 17:43 +0100, Jonathan Wakely wrote: On 05/06/19 17:22 +0100, Jonathan Wakely wrote: On 04/06/19 19:19 +0200, François Dumont wrote: @@ -669,18 +670,19 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION __node_base* _M_get_previous_node(size_type __bkt, __node_base* __n); - // Insert node with hash code __code, in bucket bkt if no rehash (assumes - // no element with its key already present). Take ownership of the node, - // deallocate it on exception. + // Insert node with key __k and hash code __code, in bucket __bkt if no + // rehash (assumes no element with its key already present). + template iterator - _M_insert_unique_node(size_type __bkt, __hash_code __code, - __node_type* __n, size_type __n_elt = 1); + _M_insert_unique_node(const key_type& __k, size_type __bkt, + __hash_code __code, const _NodeAccessor&, + size_type __n_elt = 1); - // Insert node with hash code __code. Take ownership of the node, - // deallocate it on exception. + // Insert node with hash code __code. + template iterator - _M_insert_multi_node(__node_type* __hint, - __hash_code __code, __node_type* __n); + _M_insert_multi_node(__node_type* __hint, __hash_code __code, +const _NodeAccessor& __node_accessor); It looks like most times you call these functions you pass an identical lambda expression, but each of those lambda expressions will create a unique type. That means you create different instantiations of the function templates even though they do exactly the same thing. That's just generating multiple copies of identical code. Passing in a function object to provide the node pointer doesn't really seem necessary anyway, so if it results in larger executables it's really not desirable. Also I didn't really like the name NodeAccessor. It's not an accessor, because it performs ownership transfer. Invoking __node_accessor() returns a __node_type* by releasing it from the previous owner (by setting the owner's pointer member to null). Passing a const reference to something called NodeAccessor does not make it clear that it performs a mutating operation like that! If the _M_insert_unique_node and _M_insert_multi_node functions did the __node_accessor() call *before* rehashing, and rehashing threw an exception, then they would leak. So it's important that the __node_acessor() call happens at the right time, and so it's important to name it well. In my suggested patch the naming isn't misleading, because we just pass a raw __node_type* and have a new comment saying: // Takes ownership of __n if insertion succeeds, throws otherwise. The function doesn't have a callable with non-local effects that modifies an object outside the function. Because the caller sets the previous owner's pointer to null there's no danger of it happening at the wrong time; it can only happen after the function has returned and ownership transfer has completed. As a further evolution that simplifies some uses of _Scoped_node we could give it a constructor that allocates a node and constructs an element, as in the attached patch. commit cba81aa3c4b41bef140de30e9651c5134ee2f5ef Author: Jonathan Wakely Date: Wed Jun 5 20:12:01 2019 +0100 with _Scoped_node constructor that creates the node diff --git a/libstdc++-v3/include/bits/hashtable.h b/libstdc++-v3/include/bits/hashtable.h index 87a15c8d037..ab579a7059e 100644 --- a/libstdc++-v3/include/bits/hashtable.h +++ b/libstdc++-v3/include/bits/hashtable.h @@ -259,9 +259,18 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION // Simple RAII type for managing a node containing an element struct _Scoped_node { - _Scoped_node(__hashtable_alloc* __h, __node_type* __n) + // Take ownership of a node with a constructed element. + _Scoped_node(__node_type* __n, __hashtable_alloc* __h) : _M_h(__h), _M_node(__n) { } + // Allocate a node and construct an element within it. + template + _Scoped_node(__hashtable_alloc* __h, _Args&&... __args) + : _M_h(__h), + _M_node(__h->_M_allocate_node(std::forward<_Args>(__args)...)) + { } + + // Destroy element and deallocate node. ~_Scoped_node() { if (_M_node) _M_h->_M_deallocate_node(_M_node); }; _Scoped_node(const _Scoped_node&) = delete; @@ -1653,9 +1662,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION -> pair { // First build the node to get access to the hash code - _Scoped_node __node { - this, this->_M_allocate_node(std::forward<_Args>(__args)...) - }; + _Scoped_node __node { this, std::forward<_Args>(__args)... }; const key_type& __k = this->_M_extract()(__node._M_node->_M_v()); __hash_code __code = this->_M_hash_code(__k); size_type __bkt = _M_bucket_index(__k, __code); @@ -1681,9 +1688,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION -> iterator { // First build the node to get its
Re: Review Hashtable extract node API
On 05/06/19 17:22 +0100, Jonathan Wakely wrote: On 04/06/19 19:19 +0200, François Dumont wrote: @@ -669,18 +670,19 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION __node_base* _M_get_previous_node(size_type __bkt, __node_base* __n); - // Insert node with hash code __code, in bucket bkt if no rehash (assumes - // no element with its key already present). Take ownership of the node, - // deallocate it on exception. + // Insert node with key __k and hash code __code, in bucket __bkt if no + // rehash (assumes no element with its key already present). + template iterator - _M_insert_unique_node(size_type __bkt, __hash_code __code, - __node_type* __n, size_type __n_elt = 1); + _M_insert_unique_node(const key_type& __k, size_type __bkt, + __hash_code __code, const _NodeAccessor&, + size_type __n_elt = 1); - // Insert node with hash code __code. Take ownership of the node, - // deallocate it on exception. + // Insert node with hash code __code. + template iterator - _M_insert_multi_node(__node_type* __hint, - __hash_code __code, __node_type* __n); + _M_insert_multi_node(__node_type* __hint, __hash_code __code, +const _NodeAccessor& __node_accessor); It looks like most times you call these functions you pass an identical lambda expression, but each of those lambda expressions will create a unique type. That means you create different instantiations of the function templates even though they do exactly the same thing. That's just generating multiple copies of identical code. Passing in a function object to provide the node pointer doesn't really seem necessary anyway, so if it results in larger executables it's really not desirable. Also I didn't really like the name NodeAccessor. It's not an accessor, because it performs ownership transfer. Invoking __node_accessor() returns a __node_type* by releasing it from the previous owner (by setting the owner's pointer member to null). Passing a const reference to something called NodeAccessor does not make it clear that it performs a mutating operation like that! If the _M_insert_unique_node and _M_insert_multi_node functions did the __node_accessor() call *before* rehashing, and rehashing threw an exception, then they would leak. So it's important that the __node_acessor() call happens at the right time, and so it's important to name it well. In my suggested patch the naming isn't misleading, because we just pass a raw __node_type* and have a new comment saying: // Takes ownership of __n if insertion succeeds, throws otherwise. The function doesn't have a callable with non-local effects that modifies an object outside the function. Because the caller sets the previous owner's pointer to null there's no danger of it happening at the wrong time; it can only happen after the function has returned and ownership transfer has completed.
Re: Review Hashtable extract node API
On 04/06/19 19:19 +0200, François Dumont wrote: Hi Here is a patch to enhance the _Hashtable extract node API and fix a FIXME request. The enhancement to the extract node Api is that extract(const key_type&) do not call extract(const_iterator) anymore. Doing so we had to loop again through bucket nodes to find the previous node to the one to extract. Even if a bucket shall not contain many nodes (in unique key mode) it's easy to avoid it. Nice. To fix the FIXME I introduced a node smart pointer type managing the node lifetime. The node is extracted from this smart pointer only when there can't be any exception raised. In the context of the node extract api the node handle is considered as a smart pointer. So the node handle will remain owner of the node in case of exception when reinserting it, I hope it is the expected behavior. Yes, that's right. I was going to suggest just using the node handle type instead of inventing a new smart pointer, but the handle type uses std::optional so isn't available for C++11/14. * include/bits/hashtable_policy.h (struct _NodeSmartPointer<_NodeAlloc>): New. (_Map_base<>::operator[](const key_type&)): Use latter, adapt. (_Map_base<>::operator[](key_type&&)): Likewise. * include/bits/hashtable.h (_Hashtable<>::__node_sp_t): New. (_Hashtable<>::_M_insert_unique_node(size_type, __hash_code, __node_type*, size_type)): Replace by... (_Hashtable<>::_M_insert_unique_node<_NodeAccessor>(const key_type&, size_type, __hash_code, const _NodeAccessor&, size_type)): ...that. (_Hashtable<>::_M_insert_multi_node(__node_type*, __hash_code, __node_type*)): Replace by... (_Hashtable<>::_M_insert_multi_node<_NodeAccessor>(__node_type*, __hash_code, const _NodeAccessor&)): ...that. (_Hashtable<>::_M_reinsert_node): Adapt. (_Hashtable<>::_M_reinsert_node_multi): Adapt. (_Hashtable<>::_M_extract_node(size_t, __node_base*)): New. (_Hashtable<>::extract(const_iterator)): Use latter. (_Hashtable<>::extract(const _Key&)): Likewise. (_Hashtable<>::_M_merge_unique): Adapt. (_Hashtable<>::_M_emplace<_Args>(true_type, _Args&&...)): Adapt. (_Hashtable<>::_M_emplace<_Args>(const_iterator, false_type, _Args&&...)): Adapt. Tested under Linux x86_64. Ok to commit ? François diff --git a/libstdc++-v3/include/bits/hashtable.h b/libstdc++-v3/include/bits/hashtable.h index e2e3f016a35..307865b96bf 100644 --- a/libstdc++-v3/include/bits/hashtable.h +++ b/libstdc++-v3/include/bits/hashtable.h @@ -197,6 +197,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION using __hash_cached = typename __traits_type::__hash_cached; using __node_type = __detail::_Hash_node<_Value, __hash_cached::value>; using __node_alloc_type = __alloc_rebind<_Alloc, __node_type>; + using __node_sp_t = __detail::_NodeSmartPointer<__node_alloc_type>; using __hashtable_alloc = __detail::_Hashtable_alloc<__node_alloc_type>; @@ -669,18 +670,19 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION __node_base* _M_get_previous_node(size_type __bkt, __node_base* __n); - // Insert node with hash code __code, in bucket bkt if no rehash (assumes - // no element with its key already present). Take ownership of the node, - // deallocate it on exception. + // Insert node with key __k and hash code __code, in bucket __bkt if no + // rehash (assumes no element with its key already present). + template iterator - _M_insert_unique_node(size_type __bkt, __hash_code __code, - __node_type* __n, size_type __n_elt = 1); + _M_insert_unique_node(const key_type& __k, size_type __bkt, + __hash_code __code, const _NodeAccessor&, + size_type __n_elt = 1); - // Insert node with hash code __code. Take ownership of the node, - // deallocate it on exception. + // Insert node with hash code __code. + template iterator - _M_insert_multi_node(__node_type* __hint, - __hash_code __code, __node_type* __n); + _M_insert_multi_node(__node_type* __hint, __hash_code __code, +const _NodeAccessor& __node_accessor); It looks like most times you call these functions you pass an identical lambda expression, but each of those lambda expressions will create a unique type. That means you create different instantiations of the function templates even though they do exactly the same thing. That's just generating multiple copies of identical code. Passing in a function object to provide the node pointer doesn't really seem necessary anyway, so if it results in larger executables it's really not desirable. The attached patch still just passes in a node pointer (which the function takes ownership of, unless it throws). Because the semantics of _M_insert_multi_node change, we need the symbol name to change as well (so old code linking to
Review Hashtable extract node API
Hi Here is a patch to enhance the _Hashtable extract node API and fix a FIXME request. The enhancement to the extract node Api is that extract(const key_type&) do not call extract(const_iterator) anymore. Doing so we had to loop again through bucket nodes to find the previous node to the one to extract. Even if a bucket shall not contain many nodes (in unique key mode) it's easy to avoid it. To fix the FIXME I introduced a node smart pointer type managing the node lifetime. The node is extracted from this smart pointer only when there can't be any exception raised. In the context of the node extract api the node handle is considered as a smart pointer. So the node handle will remain owner of the node in case of exception when reinserting it, I hope it is the expected behavior. * include/bits/hashtable_policy.h (struct _NodeSmartPointer<_NodeAlloc>): New. (_Map_base<>::operator[](const key_type&)): Use latter, adapt. (_Map_base<>::operator[](key_type&&)): Likewise. * include/bits/hashtable.h (_Hashtable<>::__node_sp_t): New. (_Hashtable<>::_M_insert_unique_node(size_type, __hash_code, __node_type*, size_type)): Replace by... (_Hashtable<>::_M_insert_unique_node<_NodeAccessor>(const key_type&, size_type, __hash_code, const _NodeAccessor&, size_type)): ...that. (_Hashtable<>::_M_insert_multi_node(__node_type*, __hash_code, __node_type*)): Replace by... (_Hashtable<>::_M_insert_multi_node<_NodeAccessor>(__node_type*, __hash_code, const _NodeAccessor&)): ...that. (_Hashtable<>::_M_reinsert_node): Adapt. (_Hashtable<>::_M_reinsert_node_multi): Adapt. (_Hashtable<>::_M_extract_node(size_t, __node_base*)): New. (_Hashtable<>::extract(const_iterator)): Use latter. (_Hashtable<>::extract(const _Key&)): Likewise. (_Hashtable<>::_M_merge_unique): Adapt. (_Hashtable<>::_M_emplace<_Args>(true_type, _Args&&...)): Adapt. (_Hashtable<>::_M_emplace<_Args>(const_iterator, false_type, _Args&&...)): Adapt. Tested under Linux x86_64. Ok to commit ? François diff --git a/libstdc++-v3/include/bits/hashtable.h b/libstdc++-v3/include/bits/hashtable.h index e2e3f016a35..307865b96bf 100644 --- a/libstdc++-v3/include/bits/hashtable.h +++ b/libstdc++-v3/include/bits/hashtable.h @@ -197,6 +197,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION using __hash_cached = typename __traits_type::__hash_cached; using __node_type = __detail::_Hash_node<_Value, __hash_cached::value>; using __node_alloc_type = __alloc_rebind<_Alloc, __node_type>; + using __node_sp_t = __detail::_NodeSmartPointer<__node_alloc_type>; using __hashtable_alloc = __detail::_Hashtable_alloc<__node_alloc_type>; @@ -669,18 +670,19 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION __node_base* _M_get_previous_node(size_type __bkt, __node_base* __n); - // Insert node with hash code __code, in bucket bkt if no rehash (assumes - // no element with its key already present). Take ownership of the node, - // deallocate it on exception. + // Insert node with key __k and hash code __code, in bucket __bkt if no + // rehash (assumes no element with its key already present). + template iterator - _M_insert_unique_node(size_type __bkt, __hash_code __code, - __node_type* __n, size_type __n_elt = 1); + _M_insert_unique_node(const key_type& __k, size_type __bkt, + __hash_code __code, const _NodeAccessor&, + size_type __n_elt = 1); - // Insert node with hash code __code. Take ownership of the node, - // deallocate it on exception. + // Insert node with hash code __code. + template iterator - _M_insert_multi_node(__node_type* __hint, - __hash_code __code, __node_type* __n); + _M_insert_multi_node(__node_type* __hint, __hash_code __code, + const _NodeAccessor& __node_accessor); template std::pair @@ -805,9 +807,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION } else { - __ret.position - = _M_insert_unique_node(__bkt, __code, __nh._M_ptr); - __nh._M_ptr = nullptr; + __ret.position = _M_insert_unique_node(__k, __bkt, __code, +[&__nh]() +{ return std::exchange(__nh._M_ptr, nullptr); }); __ret.inserted = true; } } @@ -818,33 +820,23 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION iterator _M_reinsert_node_multi(const_iterator __hint, node_type&& __nh) { - iterator __ret; if (__nh.empty()) - __ret = end(); - else - { + return end(); + __glibcxx_assert(get_allocator() == __nh.get_allocator()); auto __code = this->_M_hash_code(__nh._M_key()); - auto __node = std::exchange(__nh._M_ptr, nullptr); - // FIXME: this deallocates the node on exception. - __ret = _M_insert_multi_node(__hint._M_cur, __code, __node); - } - return __ret; + return _M_insert_multi_node(__hint._M_cur, __code, + [&__nh]() + { return std::exchange(__nh._M_ptr, nullptr); }); } +