[PATCH] D33588: Fix two sources of UB in __next_hash_pow2 (from __hash_table)

2017-06-02 Thread Marshall Clow via Phabricator via cfe-commits
mclow.lists accepted this revision.
mclow.lists added inline comments.
This revision is now accepted and ready to land.



Comment at: include/__hash_table:140
+return (__n > 1) ? (size_t(1) << (std::numeric_limits::digits - 
__clz(__n-1))) : __n;
 }
 

I turned the condition around, and commtted this as r304617:

 return __n < 2 ? __n : (size_t(1) << (std::numeric_limits::digits 
- __clz(__n-1)));



https://reviews.llvm.org/D33588



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D33588: Fix two sources of UB in __next_hash_pow2 (from __hash_table)

2017-06-01 Thread Vedant Kumar via Phabricator via cfe-commits
vsk added inline comments.



Comment at: include/__hash_table:139
 {
-return size_t(1) << (std::numeric_limits::digits - __clz(__n-1));
+return (__n > 1) ? (size_t(1) << (std::numeric_limits::digits - 
__clz(__n-1))) : __n;
 }

EricWF wrote:
> Shouldn't this return  `__n + 1` when `__n <= 1`, or even 2 in both cases?
I thought "next_hash_pow2(n)" meant "a hash based on n or the first 
power-of-two GTE n", but I suppose it's actually "first power-of-two GTE than 
n, for hash tables". In this case, returning `1` when `__n <= 1` makes the most 
sense to me, since it's the first power of two GTE 0 and 1. What do you think?


https://reviews.llvm.org/D33588



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D33588: Fix two sources of UB in __next_hash_pow2 (from __hash_table)

2017-05-31 Thread Eric Fiselier via Phabricator via cfe-commits
EricWF added inline comments.



Comment at: include/__hash_table:139
 {
-return size_t(1) << (std::numeric_limits::digits - __clz(__n-1));
+return (__n > 1) ? (size_t(1) << (std::numeric_limits::digits - 
__clz(__n-1))) : __n;
 }

Shouldn't this return  `__n + 1` when `__n <= 1`, or even 2 in both cases?


https://reviews.llvm.org/D33588



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D33588: Fix two sources of UB in __next_hash_pow2 (from __hash_table)

2017-05-26 Thread Vedant Kumar via Phabricator via cfe-commits
vsk updated this revision to Diff 100461.
vsk edited the summary of this revision.
vsk added a comment.

Thanks @EricWF for pointing me to the right place to add a test. I've tried to 
follow the style used by the existing tests. PTAL.


https://reviews.llvm.org/D33588

Files:
  include/__hash_table
  test/libcxx/containers/unord/next_pow2.pass.cpp


Index: test/libcxx/containers/unord/next_pow2.pass.cpp
===
--- /dev/null
+++ test/libcxx/containers/unord/next_pow2.pass.cpp
@@ -0,0 +1,80 @@
+//===--===//
+//
+// The LLVM Compiler Infrastructure
+//
+// This file is dual licensed under the MIT and the University of Illinois Open
+// Source Licenses. See LICENSE.TXT for details.
+//
+//===--===//
+//
+// REQUIRES: long_tests
+
+// Not a portable test
+
+// <__hash_table>
+
+// size_t __next_hash_pow2(size_t n);
+
+// If n <= 1, return n. If n is a power of 2, return n.
+// Otherwise, return the next power of 2.
+
+#include <__hash_table>
+#include 
+#include 
+
+#include 
+
+bool
+is_power_of_two(unsigned long n)
+{
+return __builtin_popcount(n) == 1;
+}
+
+void
+test_next_pow2()
+{
+assert(!is_power_of_two(0));
+assert(is_power_of_two(1));
+assert(is_power_of_two(2));
+assert(!is_power_of_two(3));
+
+assert(std::__next_hash_pow2(0) == 0);
+assert(std::__next_hash_pow2(1) == 1);
+
+for (std::size_t n = 2; n < (sizeof(std::size_t) * 8 - 1); ++n)
+{
+std::size_t pow2 = 1ULL << n;
+assert(std::__next_hash_pow2(pow2) == pow2);
+}
+
+for (std::size_t n : {3, 7, 9, 15, 127, 129})
+{
+std::size_t npow2 = std::__next_hash_pow2(n);
+assert(is_power_of_two(npow2) && npow2 > n);
+}
+}
+
+// Note: this is only really useful when run with -fsanitize=undefined.
+void
+fuzz_unordered_map_reserve(unsigned num_inserts,
+   unsigned num_reserve1,
+   unsigned num_reserve2)
+{
+std::unordered_map m;
+m.reserve(num_reserve1);
+for (unsigned I = 0; I < num_inserts; ++I) m[I] = 0;
+m.reserve(num_reserve2);
+assert(m.bucket_count() >= num_reserve2);
+}
+
+int main()
+{
+test_next_pow2();
+
+for (unsigned num_inserts = 0; num_inserts <= 64; ++num_inserts)
+for (unsigned num_reserve1 = 1; num_reserve1 <= 64; ++num_reserve1)
+for (unsigned num_reserve2 = 1; num_reserve2 <= 64; ++num_reserve2)
+fuzz_unordered_map_reserve(num_inserts, num_reserve1, 
num_reserve2);
+
+return 0;
+}
Index: include/__hash_table
===
--- include/__hash_table
+++ include/__hash_table
@@ -136,7 +136,7 @@
 size_t
 __next_hash_pow2(size_t __n)
 {
-return size_t(1) << (std::numeric_limits::digits - __clz(__n-1));
+return (__n > 1) ? (size_t(1) << (std::numeric_limits::digits - 
__clz(__n-1))) : __n;
 }
 
 


Index: test/libcxx/containers/unord/next_pow2.pass.cpp
===
--- /dev/null
+++ test/libcxx/containers/unord/next_pow2.pass.cpp
@@ -0,0 +1,80 @@
+//===--===//
+//
+// The LLVM Compiler Infrastructure
+//
+// This file is dual licensed under the MIT and the University of Illinois Open
+// Source Licenses. See LICENSE.TXT for details.
+//
+//===--===//
+//
+// REQUIRES: long_tests
+
+// Not a portable test
+
+// <__hash_table>
+
+// size_t __next_hash_pow2(size_t n);
+
+// If n <= 1, return n. If n is a power of 2, return n.
+// Otherwise, return the next power of 2.
+
+#include <__hash_table>
+#include 
+#include 
+
+#include 
+
+bool
+is_power_of_two(unsigned long n)
+{
+return __builtin_popcount(n) == 1;
+}
+
+void
+test_next_pow2()
+{
+assert(!is_power_of_two(0));
+assert(is_power_of_two(1));
+assert(is_power_of_two(2));
+assert(!is_power_of_two(3));
+
+assert(std::__next_hash_pow2(0) == 0);
+assert(std::__next_hash_pow2(1) == 1);
+
+for (std::size_t n = 2; n < (sizeof(std::size_t) * 8 - 1); ++n)
+{
+std::size_t pow2 = 1ULL << n;
+assert(std::__next_hash_pow2(pow2) == pow2);
+}
+
+for (std::size_t n : {3, 7, 9, 15, 127, 129})
+{
+std::size_t npow2 = std::__next_hash_pow2(n);
+assert(is_power_of_two(npow2) && npow2 > n);
+}
+}
+
+// Note: this is only really useful when run with -fsanitize=undefined.
+void
+fuzz_unordered_map_reserve(unsigned num_inserts,
+   unsigned num_reserve1,
+   unsigned num_reserve2)
+{
+std::unordered_map m;
+m.reserve(num_reserve1);
+for 

[PATCH] D33588: Fix two sources of UB in __next_hash_pow2 (from __hash_table)

2017-05-26 Thread Eric Fiselier via Phabricator via cfe-commits
EricWF added a comment.

@vsk: I would include your fuzzing test in this patch. Simply put it somewhere 
under `test/libcxx/containers/unordered`.


https://reviews.llvm.org/D33588



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D33588: Fix two sources of UB in __next_hash_pow2 (from __hash_table)

2017-05-26 Thread Vedant Kumar via Phabricator via cfe-commits
vsk added a comment.

In https://reviews.llvm.org/D33588#765768, @mclow.lists wrote:

> I can reproduce this, but I'd rather figure out why we're calling 
> `__next_hash_pow2(0)` or `(1)` before deciding how to fix it.


It looks like we hit the UB while attempting to shrink a hash table during a 
rehash. If the current bucket count is a power of two, we try and find a 
smaller bucket count (also a power of two) large enough to accommodate all 
entries in the table.

The argument passed in to next_hash_pow2 from hash_table::rehash is `__n = 
size_t(ceil(float(size()) / max_load_factor()))`. I think `__n = 0` if the 
table is empty. And `__n = 1` when the maximum load factor is (roughly) equal 
to the table's size.

As an alternate fix, it might be worth considering changing the rehashing 
algorithm. But I'd like to start with a more conservative fix for the UB issue, 
at least initially.


https://reviews.llvm.org/D33588



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D33588: Fix two sources of UB in __next_hash_pow2 (from __hash_table)

2017-05-26 Thread Marshall Clow via Phabricator via cfe-commits
mclow.lists added a comment.

I can reproduce this, but I'd rather figure out why we're calling 
`__next_hash_pow2(0)` or `(1)` before deciding how to fix it.


https://reviews.llvm.org/D33588



___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits


[PATCH] D33588: Fix two sources of UB in __next_hash_pow2 (from __hash_table)

2017-05-26 Thread Vedant Kumar via Phabricator via cfe-commits
vsk created this revision.

There are two sources of undefined behavior in __next_hash_pow2: an
invalid shift (occurs when __n is 0 or 1), and an invalid call to CLZ
(when __n=1).

This patch corrects both issues. It's NFC for all values of __n which do
not trigger UB, and leads to defined results when __n is 0 or 1.

Minimal reproducer:

  unordered_map m;
  m.reserve(4);
  m.reserve(1);

I wrote a miniature 'fuzzer' to test this change and ran it with UBSan
enabled. AFAIK, this is the best we can do for a test. I don't know how
to write a non-flaky test which would fail without this patch applied.
Here is the 'fuzzer' (simply run with -fsanitize=undefined):

  void fuzzUnorderedMap(unsigned NumInserts, unsigned NumReserve1,
unsigned NumReserve2) {
unordered_map m;
m.reserve(NumReserve1);
for (unsigned I = 0; I < NumInserts; ++I) { m[I] = 0; }
size_t __n = size_t(ceil(float(m.size()) / m.max_load_factor()));
m.reserve(NumReserve2);
  }
  
  ...
  
  for (unsigned NumInserts = 0; NumInserts <= 64; ++NumInserts)
for (unsigned NumReserve1 = 1; NumReserve1 <= 64; ++NumReserve1)
  for (unsigned NumReserve2 = 1; NumReserve2 <= 64; ++NumReserve2)
fuzzUnorderedMap(NumInserts, NumReserve1, NumReserve2);

rdar://problem/32407328


https://reviews.llvm.org/D33588

Files:
  include/__hash_table


Index: include/__hash_table
===
--- include/__hash_table
+++ include/__hash_table
@@ -136,7 +136,7 @@
 size_t
 __next_hash_pow2(size_t __n)
 {
-return size_t(1) << (std::numeric_limits::digits - __clz(__n-1));
+return (__n > 1) ? (size_t(1) << (std::numeric_limits::digits - 
__clz(__n-1))) : __n;
 }
 
 


Index: include/__hash_table
===
--- include/__hash_table
+++ include/__hash_table
@@ -136,7 +136,7 @@
 size_t
 __next_hash_pow2(size_t __n)
 {
-return size_t(1) << (std::numeric_limits::digits - __clz(__n-1));
+return (__n > 1) ? (size_t(1) << (std::numeric_limits::digits - __clz(__n-1))) : __n;
 }
 
 
___
cfe-commits mailing list
cfe-commits@lists.llvm.org
http://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits