[PATCH] D33588: Fix two sources of UB in __next_hash_pow2 (from __hash_table)
mclow.lists accepted this revision. mclow.lists added inline comments. This revision is now accepted and ready to land. Comment at: include/__hash_table:140 +return (__n > 1) ? (size_t(1) << (std::numeric_limits::digits - __clz(__n-1))) : __n; } I turned the condition around, and commtted this as r304617: return __n < 2 ? __n : (size_t(1) << (std::numeric_limits::digits - __clz(__n-1))); https://reviews.llvm.org/D33588 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D33588: Fix two sources of UB in __next_hash_pow2 (from __hash_table)
vsk added inline comments. Comment at: include/__hash_table:139 { -return size_t(1) << (std::numeric_limits::digits - __clz(__n-1)); +return (__n > 1) ? (size_t(1) << (std::numeric_limits::digits - __clz(__n-1))) : __n; } EricWF wrote: > Shouldn't this return `__n + 1` when `__n <= 1`, or even 2 in both cases? I thought "next_hash_pow2(n)" meant "a hash based on n or the first power-of-two GTE n", but I suppose it's actually "first power-of-two GTE than n, for hash tables". In this case, returning `1` when `__n <= 1` makes the most sense to me, since it's the first power of two GTE 0 and 1. What do you think? https://reviews.llvm.org/D33588 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D33588: Fix two sources of UB in __next_hash_pow2 (from __hash_table)
EricWF added inline comments. Comment at: include/__hash_table:139 { -return size_t(1) << (std::numeric_limits::digits - __clz(__n-1)); +return (__n > 1) ? (size_t(1) << (std::numeric_limits::digits - __clz(__n-1))) : __n; } Shouldn't this return `__n + 1` when `__n <= 1`, or even 2 in both cases? https://reviews.llvm.org/D33588 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D33588: Fix two sources of UB in __next_hash_pow2 (from __hash_table)
vsk updated this revision to Diff 100461. vsk edited the summary of this revision. vsk added a comment. Thanks @EricWF for pointing me to the right place to add a test. I've tried to follow the style used by the existing tests. PTAL. https://reviews.llvm.org/D33588 Files: include/__hash_table test/libcxx/containers/unord/next_pow2.pass.cpp Index: test/libcxx/containers/unord/next_pow2.pass.cpp === --- /dev/null +++ test/libcxx/containers/unord/next_pow2.pass.cpp @@ -0,0 +1,80 @@ +//===--===// +// +// The LLVM Compiler Infrastructure +// +// This file is dual licensed under the MIT and the University of Illinois Open +// Source Licenses. See LICENSE.TXT for details. +// +//===--===// +// +// REQUIRES: long_tests + +// Not a portable test + +// <__hash_table> + +// size_t __next_hash_pow2(size_t n); + +// If n <= 1, return n. If n is a power of 2, return n. +// Otherwise, return the next power of 2. + +#include <__hash_table> +#include +#include + +#include + +bool +is_power_of_two(unsigned long n) +{ +return __builtin_popcount(n) == 1; +} + +void +test_next_pow2() +{ +assert(!is_power_of_two(0)); +assert(is_power_of_two(1)); +assert(is_power_of_two(2)); +assert(!is_power_of_two(3)); + +assert(std::__next_hash_pow2(0) == 0); +assert(std::__next_hash_pow2(1) == 1); + +for (std::size_t n = 2; n < (sizeof(std::size_t) * 8 - 1); ++n) +{ +std::size_t pow2 = 1ULL << n; +assert(std::__next_hash_pow2(pow2) == pow2); +} + +for (std::size_t n : {3, 7, 9, 15, 127, 129}) +{ +std::size_t npow2 = std::__next_hash_pow2(n); +assert(is_power_of_two(npow2) && npow2 > n); +} +} + +// Note: this is only really useful when run with -fsanitize=undefined. +void +fuzz_unordered_map_reserve(unsigned num_inserts, + unsigned num_reserve1, + unsigned num_reserve2) +{ +std::unordered_mapm; +m.reserve(num_reserve1); +for (unsigned I = 0; I < num_inserts; ++I) m[I] = 0; +m.reserve(num_reserve2); +assert(m.bucket_count() >= num_reserve2); +} + +int main() +{ +test_next_pow2(); + +for (unsigned num_inserts = 0; num_inserts <= 64; ++num_inserts) +for (unsigned num_reserve1 = 1; num_reserve1 <= 64; ++num_reserve1) +for (unsigned num_reserve2 = 1; num_reserve2 <= 64; ++num_reserve2) +fuzz_unordered_map_reserve(num_inserts, num_reserve1, num_reserve2); + +return 0; +} Index: include/__hash_table === --- include/__hash_table +++ include/__hash_table @@ -136,7 +136,7 @@ size_t __next_hash_pow2(size_t __n) { -return size_t(1) << (std::numeric_limits::digits - __clz(__n-1)); +return (__n > 1) ? (size_t(1) << (std::numeric_limits::digits - __clz(__n-1))) : __n; } Index: test/libcxx/containers/unord/next_pow2.pass.cpp === --- /dev/null +++ test/libcxx/containers/unord/next_pow2.pass.cpp @@ -0,0 +1,80 @@ +//===--===// +// +// The LLVM Compiler Infrastructure +// +// This file is dual licensed under the MIT and the University of Illinois Open +// Source Licenses. See LICENSE.TXT for details. +// +//===--===// +// +// REQUIRES: long_tests + +// Not a portable test + +// <__hash_table> + +// size_t __next_hash_pow2(size_t n); + +// If n <= 1, return n. If n is a power of 2, return n. +// Otherwise, return the next power of 2. + +#include <__hash_table> +#include +#include + +#include + +bool +is_power_of_two(unsigned long n) +{ +return __builtin_popcount(n) == 1; +} + +void +test_next_pow2() +{ +assert(!is_power_of_two(0)); +assert(is_power_of_two(1)); +assert(is_power_of_two(2)); +assert(!is_power_of_two(3)); + +assert(std::__next_hash_pow2(0) == 0); +assert(std::__next_hash_pow2(1) == 1); + +for (std::size_t n = 2; n < (sizeof(std::size_t) * 8 - 1); ++n) +{ +std::size_t pow2 = 1ULL << n; +assert(std::__next_hash_pow2(pow2) == pow2); +} + +for (std::size_t n : {3, 7, 9, 15, 127, 129}) +{ +std::size_t npow2 = std::__next_hash_pow2(n); +assert(is_power_of_two(npow2) && npow2 > n); +} +} + +// Note: this is only really useful when run with -fsanitize=undefined. +void +fuzz_unordered_map_reserve(unsigned num_inserts, + unsigned num_reserve1, + unsigned num_reserve2) +{ +std::unordered_map m; +m.reserve(num_reserve1); +for
[PATCH] D33588: Fix two sources of UB in __next_hash_pow2 (from __hash_table)
EricWF added a comment. @vsk: I would include your fuzzing test in this patch. Simply put it somewhere under `test/libcxx/containers/unordered`. https://reviews.llvm.org/D33588 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D33588: Fix two sources of UB in __next_hash_pow2 (from __hash_table)
vsk added a comment. In https://reviews.llvm.org/D33588#765768, @mclow.lists wrote: > I can reproduce this, but I'd rather figure out why we're calling > `__next_hash_pow2(0)` or `(1)` before deciding how to fix it. It looks like we hit the UB while attempting to shrink a hash table during a rehash. If the current bucket count is a power of two, we try and find a smaller bucket count (also a power of two) large enough to accommodate all entries in the table. The argument passed in to next_hash_pow2 from hash_table::rehash is `__n = size_t(ceil(float(size()) / max_load_factor()))`. I think `__n = 0` if the table is empty. And `__n = 1` when the maximum load factor is (roughly) equal to the table's size. As an alternate fix, it might be worth considering changing the rehashing algorithm. But I'd like to start with a more conservative fix for the UB issue, at least initially. https://reviews.llvm.org/D33588 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D33588: Fix two sources of UB in __next_hash_pow2 (from __hash_table)
mclow.lists added a comment. I can reproduce this, but I'd rather figure out why we're calling `__next_hash_pow2(0)` or `(1)` before deciding how to fix it. https://reviews.llvm.org/D33588 ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits
[PATCH] D33588: Fix two sources of UB in __next_hash_pow2 (from __hash_table)
vsk created this revision. There are two sources of undefined behavior in __next_hash_pow2: an invalid shift (occurs when __n is 0 or 1), and an invalid call to CLZ (when __n=1). This patch corrects both issues. It's NFC for all values of __n which do not trigger UB, and leads to defined results when __n is 0 or 1. Minimal reproducer: unordered_mapm; m.reserve(4); m.reserve(1); I wrote a miniature 'fuzzer' to test this change and ran it with UBSan enabled. AFAIK, this is the best we can do for a test. I don't know how to write a non-flaky test which would fail without this patch applied. Here is the 'fuzzer' (simply run with -fsanitize=undefined): void fuzzUnorderedMap(unsigned NumInserts, unsigned NumReserve1, unsigned NumReserve2) { unordered_map m; m.reserve(NumReserve1); for (unsigned I = 0; I < NumInserts; ++I) { m[I] = 0; } size_t __n = size_t(ceil(float(m.size()) / m.max_load_factor())); m.reserve(NumReserve2); } ... for (unsigned NumInserts = 0; NumInserts <= 64; ++NumInserts) for (unsigned NumReserve1 = 1; NumReserve1 <= 64; ++NumReserve1) for (unsigned NumReserve2 = 1; NumReserve2 <= 64; ++NumReserve2) fuzzUnorderedMap(NumInserts, NumReserve1, NumReserve2); rdar://problem/32407328 https://reviews.llvm.org/D33588 Files: include/__hash_table Index: include/__hash_table === --- include/__hash_table +++ include/__hash_table @@ -136,7 +136,7 @@ size_t __next_hash_pow2(size_t __n) { -return size_t(1) << (std::numeric_limits::digits - __clz(__n-1)); +return (__n > 1) ? (size_t(1) << (std::numeric_limits::digits - __clz(__n-1))) : __n; } Index: include/__hash_table === --- include/__hash_table +++ include/__hash_table @@ -136,7 +136,7 @@ size_t __next_hash_pow2(size_t __n) { -return size_t(1) << (std::numeric_limits::digits - __clz(__n-1)); +return (__n > 1) ? (size_t(1) << (std::numeric_limits::digits - __clz(__n-1))) : __n; } ___ cfe-commits mailing list cfe-commits@lists.llvm.org http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits