Re: Hashtable comment cleanups & renamings
On 27/05/19 22:11 +0200, François Dumont wrote: I had miss some occurences of __bucket_hint to replace with __bucket_count_hint so here is a new version. Ok to commit with the simple ChangeLog entry below ? OK for trunk, thanks.
Re: Hashtable comment cleanups & renamings
I had miss some occurences of __bucket_hint to replace with __bucket_count_hint so here is a new version. Ok to commit with the simple ChangeLog entry below ? On 5/21/19 7:42 AM, François Dumont wrote: Here is a simplified form. Rename variables and cleanup comments. * include/bits/hashtable_policy.h * include/bits/hashtable.h Ok to commit ? François On 5/17/19 10:24 PM, Jonathan Wakely wrote: On 17/05/19 18:19 +0200, François Dumont wrote: Hi I got tired of '__n' being used in _Hashtable for many different purposes: node, bucket, bucket count, bucket hint. It makes the code difficult to read. This code makes sure that __n is a node except is some very limited use cases where the method name is clear enough to tell what __n means. So I'd like to commit this patch which only change that and some comments before moving forward to more serious stuff. The only code change is a use of auto return type on _M_allocate_node. My main concern is the ChangeLog entry. Is the following entry ok ? Rename variables and cleanup comments. * include/bits/hashtable_policy.h * include/bits/hashtable.h Tested under Linux x86_64 (even if it can't be otherwise) François @@ -350,24 +347,24 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION _M_base_alloc() { return *this; } __bucket_type* - _M_allocate_buckets(size_type __n) + _M_allocate_buckets(size_type __bkt_count) This is a much more helpful name, thanks. @@ -439,30 +436,31 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION { } explicit - _Hashtable(size_type __n, + _Hashtable(size_type __bkt_hint, This seems less helpful. Would __num_bkts_hint be clearer? Or for consistency, __bkt_count_hint? @@ -1415,9 +1414,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION -> iterator { __hash_code __code = this->_M_hash_code(__k); - std::size_t __n = _M_bucket_index(__k, __code); - __node_type* __p = _M_find_node(__n, __k, __code); - return __p ? iterator(__p) : end(); + std::size_t __bkt = _M_bucket_index(__k, __code); + __node_type* __n = _M_find_node(__bkt, __k, __code); + return __n ? iterator(__n) : end(); Is __n really an improvement over __p here? If you're changing it, __node or __ptr might be an improvement, but changing __p to __n seems like unnecessary churn. I'm not convinced that __n is a big enough improvement over __p to bother changing dozens of lines, for not much benefit. All those changes will make it slower to use git blame to track down when thigns changed, and will make it harder to review diffs between trunk and older branches. @@ -1479,17 +1478,17 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION -> pair { __hash_code __code = this->_M_hash_code(__k); - std::size_t __n = _M_bucket_index(__k, __code); - __node_type* __p = _M_find_node(__n, __k, __code); + std::size_t __bkt = _M_bucket_index(__k, __code); + __node_type* __n = _M_find_node(__bkt, __k, __code); - if (__p) + if (__n) { - __node_type* __p1 = __p->_M_next(); - while (__p1 && _M_bucket_index(__p1) == __n - && this->_M_equals(__k, __code, __p1)) - __p1 = __p1->_M_next(); + __node_type* __n1 = __n->_M_next(); __p1 is not a good name, but __n1 is no better. At least with __p the second pointer could be __q, which is a fairly idiomatic pairing of letters :-) How about __first and __last? Or __n and __next? Even __n1 and __n2 seems better than __n and __n1. Those pointers end up being used for the 'first' and 'second' members of a pair, so __n1 and __n2 makes more sense than setting 'first' from __n and 'second' from __n1. But I don't feel strongly about it, so if it's just me who dislikes __n and __n1 then it doesn't matter. diff --git a/libstdc++-v3/include/bits/hashtable_policy.h b/libstdc++-v3/include/bits/hashtable_policy.h index a4d2a97f4f3..bb2e7b762ff 100644 --- a/libstdc++-v3/include/bits/hashtable_policy.h +++ b/libstdc++-v3/include/bits/hashtable_policy.h @@ -181,7 +181,7 @@ namespace __detail * @tparam _Cache_hash_code Boolean value. True if the value of * the hash function is stored along with the value. This is a * time-space tradeoff. Storing it may improve lookup speed by - * reducing the number of times we need to call the _Equal + * reducing the number of times we need to call the _Hash Doesn't it reduce both? In _M_equals we don't bother calling the _Equal predicate if the cached hash code doesn't match the one for the key we're comparing. diff --git a/libstdc++-v3/include/bits/hashtable.h b/libstdc++-v3/include/bits/hashtable.h index ab24b5bb537..33711ea4573 100644 --- a/libstdc++-v3/include/bits/hashtable.h +++ b/libstdc++-v3/include/bits/hashtable.h @@ -253,7 +253,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION _Equal, _H1, _H2, _Hash, _RehashPolicy, _Traits>; - using __reuse_or_alloc_node_type = + using __reuse_or_alloc_node_gen_t =
Re: Hashtable comment cleanups & renamings
Here is a simplified form. Rename variables and cleanup comments. * include/bits/hashtable_policy.h * include/bits/hashtable.h Ok to commit ? François On 5/17/19 10:24 PM, Jonathan Wakely wrote: On 17/05/19 18:19 +0200, François Dumont wrote: Hi I got tired of '__n' being used in _Hashtable for many different purposes: node, bucket, bucket count, bucket hint. It makes the code difficult to read. This code makes sure that __n is a node except is some very limited use cases where the method name is clear enough to tell what __n means. So I'd like to commit this patch which only change that and some comments before moving forward to more serious stuff. The only code change is a use of auto return type on _M_allocate_node. My main concern is the ChangeLog entry. Is the following entry ok ? Rename variables and cleanup comments. * include/bits/hashtable_policy.h * include/bits/hashtable.h Tested under Linux x86_64 (even if it can't be otherwise) François @@ -350,24 +347,24 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION _M_base_alloc() { return *this; } __bucket_type* - _M_allocate_buckets(size_type __n) + _M_allocate_buckets(size_type __bkt_count) This is a much more helpful name, thanks. @@ -439,30 +436,31 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION { } explicit - _Hashtable(size_type __n, + _Hashtable(size_type __bkt_hint, This seems less helpful. Would __num_bkts_hint be clearer? Or for consistency, __bkt_count_hint? @@ -1415,9 +1414,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION -> iterator { __hash_code __code = this->_M_hash_code(__k); - std::size_t __n = _M_bucket_index(__k, __code); - __node_type* __p = _M_find_node(__n, __k, __code); - return __p ? iterator(__p) : end(); + std::size_t __bkt = _M_bucket_index(__k, __code); + __node_type* __n = _M_find_node(__bkt, __k, __code); + return __n ? iterator(__n) : end(); Is __n really an improvement over __p here? If you're changing it, __node or __ptr might be an improvement, but changing __p to __n seems like unnecessary churn. I'm not convinced that __n is a big enough improvement over __p to bother changing dozens of lines, for not much benefit. All those changes will make it slower to use git blame to track down when thigns changed, and will make it harder to review diffs between trunk and older branches. @@ -1479,17 +1478,17 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION -> pair { __hash_code __code = this->_M_hash_code(__k); - std::size_t __n = _M_bucket_index(__k, __code); - __node_type* __p = _M_find_node(__n, __k, __code); + std::size_t __bkt = _M_bucket_index(__k, __code); + __node_type* __n = _M_find_node(__bkt, __k, __code); - if (__p) + if (__n) { - __node_type* __p1 = __p->_M_next(); - while (__p1 && _M_bucket_index(__p1) == __n - && this->_M_equals(__k, __code, __p1)) - __p1 = __p1->_M_next(); + __node_type* __n1 = __n->_M_next(); __p1 is not a good name, but __n1 is no better. At least with __p the second pointer could be __q, which is a fairly idiomatic pairing of letters :-) How about __first and __last? Or __n and __next? Even __n1 and __n2 seems better than __n and __n1. Those pointers end up being used for the 'first' and 'second' members of a pair, so __n1 and __n2 makes more sense than setting 'first' from __n and 'second' from __n1. But I don't feel strongly about it, so if it's just me who dislikes __n and __n1 then it doesn't matter. diff --git a/libstdc++-v3/include/bits/hashtable_policy.h b/libstdc++-v3/include/bits/hashtable_policy.h index a4d2a97f4f3..bb2e7b762ff 100644 --- a/libstdc++-v3/include/bits/hashtable_policy.h +++ b/libstdc++-v3/include/bits/hashtable_policy.h @@ -181,7 +181,7 @@ namespace __detail * @tparam _Cache_hash_code Boolean value. True if the value of * the hash function is stored along with the value. This is a * time-space tradeoff. Storing it may improve lookup speed by - * reducing the number of times we need to call the _Equal + * reducing the number of times we need to call the _Hash Doesn't it reduce both? In _M_equals we don't bother calling the _Equal predicate if the cached hash code doesn't match the one for the key we're comparing. diff --git a/libstdc++-v3/include/bits/hashtable.h b/libstdc++-v3/include/bits/hashtable.h index ab24b5bb537..444c247a315 100644 --- a/libstdc++-v3/include/bits/hashtable.h +++ b/libstdc++-v3/include/bits/hashtable.h @@ -253,7 +253,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION _Equal, _H1, _H2, _Hash, _RehashPolicy, _Traits>; - using __reuse_or_alloc_node_type = + using __reuse_or_alloc_node_gen_t = __detail::_ReuseOrAllocNode<__node_alloc_type>; // Metaprogramming for picking apart hash caching. @@ -278,9 +278,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION "Cache the hash code or qualify your
Re: Hashtable comment cleanups & renamings
On 5/17/19 10:24 PM, Jonathan Wakely wrote: On 17/05/19 18:19 +0200, François Dumont wrote: Hi I got tired of '__n' being used in _Hashtable for many different purposes: node, bucket, bucket count, bucket hint. It makes the code difficult to read. This code makes sure that __n is a node except is some very limited use cases where the method name is clear enough to tell what __n means. So I'd like to commit this patch which only change that and some comments before moving forward to more serious stuff. The only code change is a use of auto return type on _M_allocate_node. My main concern is the ChangeLog entry. Is the following entry ok ? Rename variables and cleanup comments. * include/bits/hashtable_policy.h * include/bits/hashtable.h Tested under Linux x86_64 (even if it can't be otherwise) François @@ -350,24 +347,24 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION _M_base_alloc() { return *this; } __bucket_type* - _M_allocate_buckets(size_type __n) + _M_allocate_buckets(size_type __bkt_count) This is a much more helpful name, thanks. @@ -439,30 +436,31 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION { } explicit - _Hashtable(size_type __n, + _Hashtable(size_type __bkt_hint, This seems less helpful. Would __num_bkts_hint be clearer? Or for consistency, __bkt_count_hint? I thought also about a longer name like this one but then I considered that I didn't want to make it too long and that '__bkt_hint' was enough know that this parameter was related to the buckets. But I can use __bkt_count_hint if you prefer. @@ -1415,9 +1414,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION -> iterator { __hash_code __code = this->_M_hash_code(__k); - std::size_t __n = _M_bucket_index(__k, __code); - __node_type* __p = _M_find_node(__n, __k, __code); - return __p ? iterator(__p) : end(); + std::size_t __bkt = _M_bucket_index(__k, __code); + __node_type* __n = _M_find_node(__bkt, __k, __code); + return __n ? iterator(__n) : end(); Is __n really an improvement over __p here? Outside any context no but within _Hashtable implementation __n is already used most of the time to indicate a node. This is patch just try to fix this 'most of the time' part. If you're changing it, __node or __ptr might be an improvement, but changing __p to __n seems like unnecessary churn. I'm not convinced that __n is a big enough improvement over __p to bother changing dozens of lines, for not much benefit. All those changes will make it slower to use git blame to track down when thigns changed, and will make it harder to review diffs between trunk and older branches. Yes, this is why I wanted to commit it outside of any real change so that this commit can be ignore from any git log or blame more easily. So I can limit the patch to renaming __n occurences into __bkt, __bkt_count_hint, __bkt_count when possible and leave other __n but also __p & al untouch. @@ -1479,17 +1478,17 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION -> pair { __hash_code __code = this->_M_hash_code(__k); - std::size_t __n = _M_bucket_index(__k, __code); - __node_type* __p = _M_find_node(__n, __k, __code); + std::size_t __bkt = _M_bucket_index(__k, __code); + __node_type* __n = _M_find_node(__bkt, __k, __code); - if (__p) + if (__n) { - __node_type* __p1 = __p->_M_next(); - while (__p1 && _M_bucket_index(__p1) == __n - && this->_M_equals(__k, __code, __p1)) - __p1 = __p1->_M_next(); + __node_type* __n1 = __n->_M_next(); __p1 is not a good name, but __n1 is no better. At least with __p the second pointer could be __q, which is a fairly idiomatic pairing of letters :-) How about __first and __last? Or __n and __next? Even __n1 and __n2 seems better than __n and __n1. Those pointers end up being used for the 'first' and 'second' members of a pair, so __n1 and __n2 makes more sense than setting 'first' from __n and 'second' from __n1. But I don't feel strongly about it, so if it's just me who dislikes __n and __n1 then it doesn't matter. diff --git a/libstdc++-v3/include/bits/hashtable_policy.h b/libstdc++-v3/include/bits/hashtable_policy.h index a4d2a97f4f3..bb2e7b762ff 100644 --- a/libstdc++-v3/include/bits/hashtable_policy.h +++ b/libstdc++-v3/include/bits/hashtable_policy.h @@ -181,7 +181,7 @@ namespace __detail * @tparam _Cache_hash_code Boolean value. True if the value of * the hash function is stored along with the value. This is a * time-space tradeoff. Storing it may improve lookup speed by - * reducing the number of times we need to call the _Equal + * reducing the number of times we need to call the _Hash Doesn't it reduce both? In _M_equals we don't bother calling the _Equal predicate if the cached hash code doesn't match the one for the key we're comparing. Sure it reduces both, I'll just add _Hash (but first)
Re: Hashtable comment cleanups & renamings
On 17/05/19 18:19 +0200, François Dumont wrote: Hi I got tired of '__n' being used in _Hashtable for many different purposes: node, bucket, bucket count, bucket hint. It makes the code difficult to read. This code makes sure that __n is a node except is some very limited use cases where the method name is clear enough to tell what __n means. So I'd like to commit this patch which only change that and some comments before moving forward to more serious stuff. The only code change is a use of auto return type on _M_allocate_node. My main concern is the ChangeLog entry. Is the following entry ok ? Rename variables and cleanup comments. * include/bits/hashtable_policy.h * include/bits/hashtable.h Tested under Linux x86_64 (even if it can't be otherwise) François @@ -350,24 +347,24 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION _M_base_alloc() { return *this; } __bucket_type* - _M_allocate_buckets(size_type __n) + _M_allocate_buckets(size_type __bkt_count) This is a much more helpful name, thanks. @@ -439,30 +436,31 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION { } explicit - _Hashtable(size_type __n, + _Hashtable(size_type __bkt_hint, This seems less helpful. Would __num_bkts_hint be clearer? Or for consistency, __bkt_count_hint? @@ -1415,9 +1414,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION -> iterator { __hash_code __code = this->_M_hash_code(__k); - std::size_t __n = _M_bucket_index(__k, __code); - __node_type* __p = _M_find_node(__n, __k, __code); - return __p ? iterator(__p) : end(); + std::size_t __bkt = _M_bucket_index(__k, __code); + __node_type* __n = _M_find_node(__bkt, __k, __code); + return __n ? iterator(__n) : end(); Is __n really an improvement over __p here? If you're changing it, __node or __ptr might be an improvement, but changing __p to __n seems like unnecessary churn. I'm not convinced that __n is a big enough improvement over __p to bother changing dozens of lines, for not much benefit. All those changes will make it slower to use git blame to track down when thigns changed, and will make it harder to review diffs between trunk and older branches. @@ -1479,17 +1478,17 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION -> pair { __hash_code __code = this->_M_hash_code(__k); - std::size_t __n = _M_bucket_index(__k, __code); - __node_type* __p = _M_find_node(__n, __k, __code); + std::size_t __bkt = _M_bucket_index(__k, __code); + __node_type* __n = _M_find_node(__bkt, __k, __code); - if (__p) + if (__n) { - __node_type* __p1 = __p->_M_next(); - while (__p1 && _M_bucket_index(__p1) == __n -&& this->_M_equals(__k, __code, __p1)) - __p1 = __p1->_M_next(); + __node_type* __n1 = __n->_M_next(); __p1 is not a good name, but __n1 is no better. At least with __p the second pointer could be __q, which is a fairly idiomatic pairing of letters :-) How about __first and __last? Or __n and __next? Even __n1 and __n2 seems better than __n and __n1. Those pointers end up being used for the 'first' and 'second' members of a pair, so __n1 and __n2 makes more sense than setting 'first' from __n and 'second' from __n1. But I don't feel strongly about it, so if it's just me who dislikes __n and __n1 then it doesn't matter. diff --git a/libstdc++-v3/include/bits/hashtable_policy.h b/libstdc++-v3/include/bits/hashtable_policy.h index a4d2a97f4f3..bb2e7b762ff 100644 --- a/libstdc++-v3/include/bits/hashtable_policy.h +++ b/libstdc++-v3/include/bits/hashtable_policy.h @@ -181,7 +181,7 @@ namespace __detail * @tparam _Cache_hash_code Boolean value. True if the value of * the hash function is stored along with the value. This is a * time-space tradeoff. Storing it may improve lookup speed by - * reducing the number of times we need to call the _Equal + * reducing the number of times we need to call the _Hash Doesn't it reduce both? In _M_equals we don't bother calling the _Equal predicate if the cached hash code doesn't match the one for the key we're comparing.
Hashtable comment cleanups & renamings
Hi I got tired of '__n' being used in _Hashtable for many different purposes: node, bucket, bucket count, bucket hint. It makes the code difficult to read. This code makes sure that __n is a node except is some very limited use cases where the method name is clear enough to tell what __n means. So I'd like to commit this patch which only change that and some comments before moving forward to more serious stuff. The only code change is a use of auto return type on _M_allocate_node. My main concern is the ChangeLog entry. Is the following entry ok ? Rename variables and cleanup comments. * include/bits/hashtable_policy.h * include/bits/hashtable.h Tested under Linux x86_64 (even if it can't be otherwise) François diff --git a/libstdc++-v3/include/bits/hashtable.h b/libstdc++-v3/include/bits/hashtable.h index ab24b5bb537..78e6aeed5b1 100644 --- a/libstdc++-v3/include/bits/hashtable.h +++ b/libstdc++-v3/include/bits/hashtable.h @@ -253,7 +253,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION _Equal, _H1, _H2, _Hash, _RehashPolicy, _Traits>; - using __reuse_or_alloc_node_type = + using __reuse_or_alloc_node_gen_t = __detail::_ReuseOrAllocNode<__node_alloc_type>; // Metaprogramming for picking apart hash caching. @@ -278,9 +278,6 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION "Cache the hash code or qualify your functors involved" " in hash code and bucket index computation with noexcept"); - // Following two static assertions are necessary to guarantee - // that local_iterator will be default constructible. - // When hash codes are cached local iterator inherits from H2 functor // which must then be default constructible. static_assert(__if_hash_cached>::value, @@ -331,7 +328,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION _RehashPolicy _M_rehash_policy; // A single bucket used when only need for 1 bucket. Especially - // interesting in move semantic to leave hashtable with only 1 buckets + // interesting in move semantic to leave hashtable with only 1 bucket // which is not allocated so that we can have those operations noexcept // qualified. // Note that we can't leave hashtable with 0 bucket without adding @@ -350,24 +347,24 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION _M_base_alloc() { return *this; } __bucket_type* - _M_allocate_buckets(size_type __n) + _M_allocate_buckets(size_type __bkt_count) { - if (__builtin_expect(__n == 1, false)) + if (__builtin_expect(__bkt_count == 1, false)) { _M_single_bucket = nullptr; return &_M_single_bucket; } - return __hashtable_alloc::_M_allocate_buckets(__n); + return __hashtable_alloc::_M_allocate_buckets(__bkt_count); } void - _M_deallocate_buckets(__bucket_type* __bkts, size_type __n) + _M_deallocate_buckets(__bucket_type* __bkts, size_type __bkt_count) { if (_M_uses_single_bucket(__bkts)) return; - __hashtable_alloc::_M_deallocate_buckets(__bkts, __n); + __hashtable_alloc::_M_deallocate_buckets(__bkts, __bkt_count); } void @@ -394,10 +391,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION _M_assign(const _Hashtable&, const _NodeGenerator&); void - _M_move_assign(_Hashtable&&, std::true_type); + _M_move_assign(_Hashtable&&, true_type); void - _M_move_assign(_Hashtable&&, std::false_type); + _M_move_assign(_Hashtable&&, false_type); void _M_reset() noexcept; @@ -439,30 +436,31 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION { } explicit - _Hashtable(size_type __n, + _Hashtable(size_type __bkt_hint, const _H1& __hf = _H1(), const key_equal& __eql = key_equal(), const allocator_type& __a = allocator_type()) - : _Hashtable(__n, __hf, _H2(), _Hash(), __eql, + : _Hashtable(__bkt_hint, __hf, _H2(), _Hash(), __eql, __key_extract(), __a) { } template _Hashtable(_InputIterator __f, _InputIterator __l, - size_type __n = 0, + size_type __bkt_hint = 0, const _H1& __hf = _H1(), const key_equal& __eql = key_equal(), const allocator_type& __a = allocator_type()) - : _Hashtable(__f, __l, __n, __hf, _H2(), _Hash(), __eql, + : _Hashtable(__f, __l, __bkt_hint, __hf, _H2(), _Hash(), __eql, __key_extract(), __a) { } _Hashtable(initializer_list __l, - size_type __n = 0, + size_type __bkt_hint = 0, const _H1& __hf = _H1(), const key_equal& __eql = key_equal(), const allocator_type& __a = allocator_type()) - : _Hashtable(__l.begin(), __l.end(), __n, __hf, _H2(), _Hash(), __eql, + : _Hashtable(__l.begin(), __l.end(), __bkt_hint, + __hf, _H2(), _Hash(), __eql, __key_extract(), __a) { } @@ -485,7 +483,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION _Hashtable& operator=(initializer_list __l) { - __reuse_or_alloc_node_type __roan(_M_begin(),