subject:"\[Impala\-ASF\-CR\] IMPALA\-4123\: Fast bit unpacking"

[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking

2016-10-17 Thread Internal Jenkins (Code Review)

Internal Jenkins has submitted this change and it was merged.

Change subject: IMPALA-4123: Fast bit unpacking
..


IMPALA-4123: Fast bit unpacking

Adds utility functions for fast unpacking of batches of bit-packed
values. These support reading batches of any number of values provided
that the start of the batch is aligned to a byte boundary. Callers that
want to read smaller batches that don't align to byte boundaries will
need to implement their own buffering.

The unpacking code uses only portable C++ and no SIMD intrinsics, but is
fairly efficient because unpacking a full batch of 32 values compiles
down to 32-bit loads, shifts by constants, masks by constants, bitwise
ors when a value straddles 32-bit words and stores. Further speedups
should be possible using SIMD intrinsics.

Testing:
Added unit tests for unpacking, exhaustively covering different
bitwidths with additional test dimensions (memory alignment, various
input sizes, etc).

Tested under ASAN to ensure the bit unpacking doesn't read past the end
of buffers.

Perf:
Added microbenchmark that shows on average an 8-9x speedup over the
existing BitReader code.

Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622
Reviewed-on: http://gerrit.cloudera.org:8080/4494
Reviewed-by: Tim Armstrong 
Tested-by: Internal Jenkins
---
M be/src/benchmarks/CMakeLists.txt
A be/src/benchmarks/bit-packing-benchmark.cc
M be/src/benchmarks/bswap-benchmark.cc
M be/src/exprs/expr-test.cc
A be/src/testutil/mem-util.h
M be/src/util/CMakeLists.txt
A be/src/util/bit-packing-test.cc
A be/src/util/bit-packing.h
A be/src/util/bit-packing.inline.h
M be/src/util/bit-stream-utils.h
M be/src/util/bit-stream-utils.inline.h
M be/src/util/bit-util.h
M be/src/util/openssl-util-test.cc
13 files changed, 929 insertions(+), 84 deletions(-)

Approvals:
  Internal Jenkins: Verified
  Tim Armstrong: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/4494
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622
Gerrit-PatchSet: 16
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong

[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking

2016-10-17 Thread Internal Jenkins (Code Review)

Internal Jenkins has posted comments on this change.

Change subject: IMPALA-4123: Fast bit unpacking
..


Patch Set 15: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/4494
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622
Gerrit-PatchSet: 15
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Internal Jenkins
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking

2016-10-17 Thread Tim Armstrong (Code Review)

Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4123: Fast bit unpacking
..


Patch Set 15:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4494/11/be/src/util/bit-packing-test.cc
File be/src/util/bit-packing-test.cc:

Line 132: 
> No seed is the same as a single default seed for this type.
Oh ok, this should be ok too then.


-- 
To view, visit http://gerrit.cloudera.org:8080/4494
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622
Gerrit-PatchSet: 15
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking

2016-10-17 Thread Tim Armstrong (Code Review)

Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4123: Fast bit unpacking
..


Patch Set 15: Code-Review+2

Carry +2

-- 
To view, visit http://gerrit.cloudera.org:8080/4494
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622
Gerrit-PatchSet: 15
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking

2016-10-17 Thread Tim Armstrong (Code Review)

Hello Jim Apple,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/4494

to look at the new patch set (#14).

Change subject: IMPALA-4123: Fast bit unpacking
..

IMPALA-4123: Fast bit unpacking

Adds utility functions for fast unpacking of batches of bit-packed
values. These support reading batches of any number of values provided
that the start of the batch is aligned to a byte boundary. Callers that
want to read smaller batches that don't align to byte boundaries will
need to implement their own buffering.

The unpacking code uses only portable C++ and no SIMD intrinsics, but is
fairly efficient because unpacking a full batch of 32 values compiles
down to 32-bit loads, shifts by constants, masks by constants, bitwise
ors when a value straddles 32-bit words and stores. Further speedups
should be possible using SIMD intrinsics.

Testing:
Added unit tests for unpacking, exhaustively covering different
bitwidths with additional test dimensions (memory alignment, various
input sizes, etc).

Tested under ASAN to ensure the bit unpacking doesn't read past the end
of buffers.

Perf:
Added microbenchmark that shows on average an 8-9x speedup over the
existing BitReader code.

Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622
---
M be/src/benchmarks/CMakeLists.txt
A be/src/benchmarks/bit-packing-benchmark.cc
M be/src/benchmarks/bswap-benchmark.cc
M be/src/exprs/expr-test.cc
A be/src/testutil/mem-util.h
M be/src/util/CMakeLists.txt
A be/src/util/bit-packing-test.cc
A be/src/util/bit-packing.h
A be/src/util/bit-packing.inline.h
M be/src/util/bit-stream-utils.h
M be/src/util/bit-stream-utils.inline.h
M be/src/util/bit-util.h
M be/src/util/openssl-util-test.cc
13 files changed, 929 insertions(+), 84 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/4494/14
-- 
To view, visit http://gerrit.cloudera.org:8080/4494
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622
Gerrit-PatchSet: 14
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong

[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking

2016-10-17 Thread Tim Armstrong (Code Review)

Tim Armstrong has uploaded a new patch set (#13).

Change subject: IMPALA-4123: Fast bit unpacking
..

IMPALA-4123: Fast bit unpacking

Adds utility functions for fast unpacking of batches of bit-packed
values. These support reading batches of any number of values provided
that the start of the batch is aligned to a byte boundary. Callers that
want to read smaller batches that don't align to byte boundaries will
need to implement their own buffering.

The unpacking code uses only portable C++ and no SIMD intrinsics, but is
fairly efficient because unpacking a full batch of 32 values compiles
down to 32-bit loads, shifts by constants, masks by constants, bitwise
ors when a value straddles 32-bit words and stores. Further speedups
should be possible using SIMD intrinsics.

Testing:
Added unit tests for unpacking, exhaustively covering different
bitwidths with additional test dimensions (memory alignment, various
input sizes, etc).

Tested under ASAN to ensure the bit unpacking doesn't read past the end
of buffers.

Perf:
Added microbenchmark that shows on average an 8-9x speedup over the
existing BitReader code.

Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622
---
M be/src/benchmarks/CMakeLists.txt
A be/src/benchmarks/bit-packing-benchmark.cc
M be/src/benchmarks/bswap-benchmark.cc
M be/src/exprs/expr-test.cc
A be/src/testutil/mem-util.h
M be/src/util/CMakeLists.txt
A be/src/util/bit-packing-test.cc
A be/src/util/bit-packing.h
A be/src/util/bit-packing.inline.h
M be/src/util/bit-stream-utils.h
M be/src/util/bit-stream-utils.inline.h
M be/src/util/bit-util.h
M be/src/util/openssl-util-test.cc
13 files changed, 929 insertions(+), 84 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/4494/13
-- 
To view, visit http://gerrit.cloudera.org:8080/4494
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622
Gerrit-PatchSet: 13
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong

[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking

2016-10-17 Thread Tim Armstrong (Code Review)

Tim Armstrong has uploaded a new patch set (#12).

Change subject: IMPALA-4123: Fast bit unpacking
..

IMPALA-4123: Fast bit unpacking

Adds utility functions for fast unpacking of batches of bit-packed
values. These support reading batches of any number of values provided
that the start of the batch is aligned to a byte boundary. Callers that
want to read smaller batches that don't align to byte boundaries will
need to implement their own buffering.

The unpacking code uses only portable C++ and no SIMD intrinsics, but is
fairly efficient because unpacking a full batch of 32 values compiles
down to 32-bit loads, shifts by constants, masks by constants, bitwise
ors when a value straddles 32-bit words and stores. Further speedups
should be possible using SIMD intrinsics.

Testing:
Added unit tests for unpacking, exhaustively covering different
bitwidths with additional test dimensions (memory alignment, various
input sizes, etc).

Tested under ASAN to ensure the bit unpacking doesn't read past the end
of buffers.

Perf:
Added microbenchmark that shows on average an 8-9x speedup over the
existing BitReader code.

Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622
---
M be/src/benchmarks/CMakeLists.txt
A be/src/benchmarks/bit-packing-benchmark.cc
M be/src/benchmarks/bswap-benchmark.cc
M be/src/exprs/expr-test.cc
A be/src/testutil/mem-util.h
M be/src/util/CMakeLists.txt
A be/src/util/bit-packing-test.cc
A be/src/util/bit-packing.h
A be/src/util/bit-packing.inline.h
M be/src/util/bit-stream-utils.h
M be/src/util/bit-stream-utils.inline.h
M be/src/util/bit-util.h
M be/src/util/openssl-util-test.cc
13 files changed, 928 insertions(+), 84 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/4494/12
-- 
To view, visit http://gerrit.cloudera.org:8080/4494
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong

[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking

2016-10-15 Thread Jim Apple (Code Review)

Jim Apple has posted comments on this change.

Change subject: IMPALA-4123: Fast bit unpacking
..


Patch Set 11:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/4494/11/be/src/util/bit-packing.inline.h
File be/src/util/bit-packing.inline.h:

Line 57:   constexpr int BATCH_SIZE = 32;
After some reading, I think you can and should throw a static on these 
constexprs.

For instance, 
http://stackoverflow.com/questions/26152096/when-and-why-would-you-use-static-with-constexpr


-- 
To view, visit http://gerrit.cloudera.org:8080/4494
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking

2016-10-14 Thread Jim Apple (Code Review)

Jim Apple has posted comments on this change.

Change subject: IMPALA-4123: Fast bit unpacking
..


Patch Set 11:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/4494/11/be/src/util/bit-packing-test.cc
File be/src/util/bit-packing-test.cc:

Line 19: #include 
cstd...


Line 132:   std::generate(std::begin(in), std::end(in), rand);
rand() is only guaranteed to have 15 non-zero bits - the high 17 bits may all 
be 0 each time.

More modern rand, and also seeded to prevent flaky tests: 

std::mt19937 pseudorandom(0);
std::uniform_int_distribution uniform;
std::generate(std::begin(in), std::end(in),
  [, ]() { return uniform(pseudorandom); });


http://gerrit.cloudera.org:8080/#/c/4494/11/be/src/util/bit-packing.h
File be/src/util/bit-packing.h:

PS11, Line 74:   /// Templated versions of UnpackValues() that can be used if 
BIT_WIDTH is a
 :   /// compile-time constant.
Did you check to see that this benchmarks to be actually faster than if 
BIT_WIDTH is not a template parameter?


http://gerrit.cloudera.org:8080/#/c/4494/11/be/src/util/bit-stream-utils.h
File be/src/util/bit-stream-utils.h:

Line 100:   /// 'buffer' is the buffer to read from.  The buffer's length is 
'buffer_len'.
While you're here, can you add "Does not take ownership of the pointer"?


Line 142:   const uint8_t* buffer_;
While you're here, can you DISALLOW_COPY_AND_ASSIGN?


-- 
To view, visit http://gerrit.cloudera.org:8080/4494
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking

2016-10-13 Thread Tim Armstrong (Code Review)

Tim Armstrong has uploaded a new patch set (#11).

Change subject: IMPALA-4123: Fast bit unpacking
..

IMPALA-4123: Fast bit unpacking

Adds utility functions for fast unpacking of batches of bit-packed
values. These support reading batches of any number of values provided
that the start of the batch is aligned to a byte boundary. Callers that
want to read smaller batches that don't align to byte boundaries will
need to implement their own buffering.

The unpacking code uses only portable C++ and no SIMD intrinsics, but is
fairly efficient because unpacking a full batch of 32 values compiles
down to 32-bit loads, shifts by constants, masks by constants, bitwise
ors when a value straddles 32-bit words and stores. Further speedups
should be possible using SIMD intrinsics.

Testing:
Added unit tests for unpacking, exhaustively covering different
bitwidths with additional test dimensions (memory alignment, various
input sizes, etc).

Tested under ASAN to ensure the bit unpacking doesn't read past the end
of buffers.

Perf:
Added microbenchmark that shows on average an 8-9x speedup over the
existing BitReader code.

Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622
---
M be/src/benchmarks/CMakeLists.txt
A be/src/benchmarks/bit-packing-benchmark.cc
M be/src/benchmarks/bswap-benchmark.cc
A be/src/testutil/mem-util.h
M be/src/util/CMakeLists.txt
A be/src/util/bit-packing-test.cc
A be/src/util/bit-packing.h
A be/src/util/bit-packing.inline.h
M be/src/util/bit-stream-utils.h
M be/src/util/bit-stream-utils.inline.h
10 files changed, 886 insertions(+), 39 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/4494/11
-- 
To view, visit http://gerrit.cloudera.org:8080/4494
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong

[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking

2016-10-13 Thread Jim Apple (Code Review)

Jim Apple has posted comments on this change.

Change subject: IMPALA-4123: Fast bit unpacking
..


Patch Set 10:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/4494/10/be/src/util/bit-packing-test.cc
File be/src/util/bit-packing-test.cc:

PS10, Line 39: it
them


PS10, Line 39: Both
Both buffers


Line 60:   uint8_t* packed = storage.data() + aligned;
If aligned is true, then packed is not aligned


http://gerrit.cloudera.org:8080/#/c/4494/10/be/src/util/bit-packing.inline.h
File be/src/util/bit-packing.inline.h:

Line 41: #define UNPACK_VALUES_CASE(ignore1, i, ignore2) \
Here is a way to do this without any macros, but it's a bit heavyweight from a 
syntax POV. I found it was roughly the same speed as the macro approach:

--- a/be/src/util/bit-packing.inline.h
+++ b/be/src/util/bit-packing.inline.h
@@ -31,10 +31,33 @@
 
 namespace impala {
 
+template 
+constexpr std::array), sizeof...(I)> MakeArray(
+std::integer_sequence) {
+  return {::template f...};
+}
+
+template 
+struct BitPackStruct {
+  template 
+  static auto f(const uint8_t* __restrict__ in, int64_t in_bytes, int64_t 
num_values,
+  OutType* __restrict__ out) {
+return BitPacking::UnpackValues(in, in_bytes, num_values, 
out);
+  }
+};
+
 template 
 std::pair BitPacking::UnpackValues(int bit_width,
 const uint8_t* __restrict__ in, int64_t in_bytes, int64_t num_values,
 OutType* __restrict__ out) {
+  static constexpr auto UNPACKERS =
+  MakeArray(std::make_integer_sequence());
+  if (bit_width < 0 || bit_width > 32) {
+DCHECK(false);
+return {nullptr, 0};
+  }
+  return UNPACKERS[bit_width](in, in_bytes, num_values, out);

Similar things can be done below. Line 146 can be done without 
std::make_integer_sequence.


-- 
To view, visit http://gerrit.cloudera.org:8080/4494
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking

2016-10-13 Thread Tim Armstrong (Code Review)

Tim Armstrong has uploaded a new patch set (#10).

Change subject: IMPALA-4123: Fast bit unpacking
..

IMPALA-4123: Fast bit unpacking

Adds utility functions for fast unpacking of batches of bit-packed
values. These support reading batches of any number of values provided
that the start of the batch is aligned to a byte boundary. Callers that
want to read smaller batches that don't align to byte boundaries will
need to implement their own buffering.

The unpacking code uses only portable C++ and no SIMD intrinsics, but is
fairly efficient because unpacking a full batch of 32 values compiles
down to 32-bit loads, shifts by constants, masks by constants, bitwise
ors when a value straddles 32-bit words and stores. Further speedups
should be possible using SIMD intrinsics.

Testing:
Added unit tests for unpacking, exhaustively covering different
bitwidths with additional test dimensions (memory alignment, various
input sizes, etc).

Tested under ASAN to ensure the bit unpacking doesn't read past the end
of buffers.

Perf:
Added microbenchmark that shows on average an 8-9x speedup over the
existing BitReader code.

Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622
---
M be/src/benchmarks/CMakeLists.txt
A be/src/benchmarks/bit-packing-benchmark.cc
M be/src/benchmarks/bswap-benchmark.cc
A be/src/testutil/mem-util.h
M be/src/util/CMakeLists.txt
A be/src/util/bit-packing-test.cc
A be/src/util/bit-packing.h
A be/src/util/bit-packing.inline.h
M be/src/util/bit-stream-utils.h
M be/src/util/bit-stream-utils.inline.h
10 files changed, 884 insertions(+), 39 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/4494/10
-- 
To view, visit http://gerrit.cloudera.org:8080/4494
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong

[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking

2016-10-13 Thread Jim Apple (Code Review)

Jim Apple has posted comments on this change.

Change subject: IMPALA-4123: Fast bit unpacking
..


Patch Set 9:

> (11 comments)

It looks like you might have missed my comments on Base and PS7 in this group 
of comments.

-- 
To view, visit http://gerrit.cloudera.org:8080/4494
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: No

[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking

2016-10-12 Thread Tim Armstrong (Code Review)

Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4123: Fast bit unpacking
..


Patch Set 8:

(11 comments)

http://gerrit.cloudera.org:8080/#/c/4494/8//COMMIT_MSG
Commit Message:

PS8, Line 17: 64
> 32, now?
Done


PS8, Line 18: ors at 64-bit boundaries
> What does it mean to have an or at a k-bit boundary?
Reworded to be hopefully clearer.


http://gerrit.cloudera.org:8080/#/c/4494/8/be/src/benchmarks/bit-packing-benchmark.cc
File be/src/benchmarks/bit-packing-benchmark.cc:

Line 273: #include "common/names.h"
> This isn't needed if using namespace std;, yes?
I think I prefer keeping this and removing the "using namespace" declarations 
below.


http://gerrit.cloudera.org:8080/#/c/4494/8/be/src/benchmarks/bswap-benchmark.cc
File be/src/benchmarks/bswap-benchmark.cc:

Line 123
> Thanks for fixing this. When this is committed, can you file a bug against 
Sure, will do.


http://gerrit.cloudera.org:8080/#/c/4494/8/be/src/testutil/mem-util.h
File be/src/testutil/mem-util.h:

PS8, Line 31: posix_memalign
> #include stdlib.h. I'd say cstdlib, but posix_memalign doesn't seem to be p
Added cstdlib. I think in practice it's reasonable to assume that cstdlib 
includes the same things as stdlib.h in some form.

The glibc one from the toolchain literally has #include  in it.


Line 45:  private:
> DISALLOW_COPY_AND_ASSIGN
Done


http://gerrit.cloudera.org:8080/#/c/4494/8/be/src/util/bit-packing-test.cc
File be/src/util/bit-packing-test.cc:

Line 39: void UnpackSubset(const uint32_t* in, const uint8_t* packed, int 
num_in_values,
> I think it would aid clarity to add a short description of what the meaning
Added descriptions and elaborated a little on what the function is doing.


PS8, Line 45: uint32_t
> const uint32_t * in
Done


PS8, Line 72: INFO
> Was this just for debugging, or did you mean to leave it in?
Didn't mean to leave it in - removed.


PS8, Line 84: 8
> Why 8 and not 4?
Used CHAR_BIT and added an intermediate variable to make it a bit clearer.


http://gerrit.cloudera.org:8080/#/c/4494/8/be/src/util/bit-packing.inline.h
File be/src/util/bit-packing.inline.h:

PS8, Line 64: //LOG(INFO) << "bit_width " << BIT_WIDTH << "\n"
:   //  << "in_bytes " << in_bytes << " num_values " << num_values 
<< " max_input_values " << max_input_values << "\n"
:// << "values_to_read " << values_to_read << " batches_to_read 
" << batches_to_read << "\n"
:// << "remainder_values " << remainder_values;
> remove
Oops, sorry missed this.


-- 
To view, visit http://gerrit.cloudera.org:8080/4494
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking

2016-10-12 Thread Tim Armstrong (Code Review)

Tim Armstrong has uploaded a new patch set (#9).

Change subject: IMPALA-4123: Fast bit unpacking
..

IMPALA-4123: Fast bit unpacking

Adds utility functions for fast unpacking of batches of bit-packed
values. These support reading batches of any number of values provided
that the start of the batch is aligned to a byte boundary. Callers that
want to read smaller batches that don't align to byte boundaries will
need to implement their own buffering.

The unpacking code uses only portable C++ and no SIMD intrinsics, but is
fairly efficient because unpacking a full batch of 32 values compiles
down to 32-bit loads, shifts by constants, masks by constants, bitwise
ors when a value straddles 32-bit words and stores. Further speedups
should be possible using SIMD intrinsics.

Testing:
Added unit tests for unpacking, exhaustively covering different
bitwidths with additional test dimensions (memory alignment, various
input sizes, etc).

Tested under ASAN to ensure the bit unpacking doesn't read past the end
of buffers.

Perf:
Added microbenchmark that shows on average an 8-9x speedup over the
existing BitReader code.

Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622
---
M be/src/benchmarks/CMakeLists.txt
A be/src/benchmarks/bit-packing-benchmark.cc
M be/src/benchmarks/bswap-benchmark.cc
A be/src/testutil/mem-util.h
M be/src/util/CMakeLists.txt
A be/src/util/bit-packing-test.cc
A be/src/util/bit-packing.h
A be/src/util/bit-packing.inline.h
M be/src/util/bit-stream-utils.h
M be/src/util/bit-stream-utils.inline.h
10 files changed, 883 insertions(+), 39 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/4494/9
-- 
To view, visit http://gerrit.cloudera.org:8080/4494
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong

[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking

2016-10-11 Thread Jim Apple (Code Review)

Jim Apple has posted comments on this change.

Change subject: IMPALA-4123: Fast bit unpacking
..


Patch Set 8:

(14 comments)

http://gerrit.cloudera.org:8080/#/c/4494/8//COMMIT_MSG
Commit Message:

PS8, Line 17: 64
32, now?


PS8, Line 18: ors at 64-bit boundaries
What does it mean to have an or at a k-bit boundary?


http://gerrit.cloudera.org:8080/#/c/4494/8/be/src/benchmarks/bit-packing-benchmark.cc
File be/src/benchmarks/bit-packing-benchmark.cc:

Line 273: #include "common/names.h"
This isn't needed if using namespace std;, yes?


http://gerrit.cloudera.org:8080/#/c/4494/8/be/src/benchmarks/bswap-benchmark.cc
File be/src/benchmarks/bswap-benchmark.cc:

Line 123
Thanks for fixing this. When this is committed, can you file a bug against me 
to fix the other posix_memalign locations?


http://gerrit.cloudera.org:8080/#/c/4494/8/be/src/testutil/mem-util.h
File be/src/testutil/mem-util.h:

PS8, Line 31: posix_memalign
#include stdlib.h. I'd say cstdlib, but posix_memalign doesn't seem to be part 
of the C standard, so I don't know if it is technically in that header.


Line 45:  private:
DISALLOW_COPY_AND_ASSIGN


http://gerrit.cloudera.org:8080/#/c/4494/7/be/src/util/bit-packing-test.cc
File be/src/util/bit-packing-test.cc:

PS7, Line 42: values' va
> In practice it should be 64-bit aligned with TCMalloc beyond a certain allo
I think it would help the reader here to spell out that misaligned is with 
respect to 64.


PS7, Line 132: 
> Other ones should work but I don't think improve coverage - can't see a sce
Still, if we're only allowing those two you could switch it to a bool 
'misaligned'


http://gerrit.cloudera.org:8080/#/c/4494/8/be/src/util/bit-packing-test.cc
File be/src/util/bit-packing-test.cc:

Line 39: void UnpackSubset(const uint32_t* in, const uint8_t* packed, int 
num_in_values,
I think it would aid clarity to add a short description of what the meanings 
are of 'in' and 'packed'


PS8, Line 45: uint32_t
const uint32_t * in


PS8, Line 72: INFO
Was this just for debugging, or did you mean to leave it in?


PS8, Line 84: 8
Why 8 and not 4?

Can you add a more exact ASSERT here about num_to_unpack and it's relationship 
with writer.bytes_written()?


http://gerrit.cloudera.org:8080/#/c/4494/7/be/src/util/bit-packing.inline.h
File be/src/util/bit-packing.inline.h:

PS7, Line 69:   // First unpack as many full batches as possible.
> Yep, that was a bug.
Should there be an additional test that would demonstrate that bug?


http://gerrit.cloudera.org:8080/#/c/4494/8/be/src/util/bit-packing.inline.h
File be/src/util/bit-packing.inline.h:

PS8, Line 64: //LOG(INFO) << "bit_width " << BIT_WIDTH << "\n"
:   //  << "in_bytes " << in_bytes << " num_values " << num_values 
<< " max_input_values " << max_input_values << "\n"
:// << "values_to_read " << values_to_read << " batches_to_read 
" << batches_to_read << "\n"
:// << "remainder_values " << remainder_values;
remove


-- 
To view, visit http://gerrit.cloudera.org:8080/4494
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking

2016-10-10 Thread Tim Armstrong (Code Review)

Tim Armstrong has uploaded a new patch set (#8).

Change subject: IMPALA-4123: Fast bit unpacking
..

IMPALA-4123: Fast bit unpacking

Adds utility functions for fast unpacking of batches of bit-packed
values. These support reading batches of any number of values provided
that the start of the batch is aligned to a byte boundary. Callers that
want to read smaller batches that don't align to byte boundaries will
need to implement their own buffering.

The unpacking code uses only portable C++ and no SIMD intrinsics, but is
fairly efficient because unpacking a full batch of 32 values compiles
down to 64-bit loads, shifts by constants, masks by constants, bitwise
ors at 64-bit boundaries, and stores. Further speedups should be
possible using SIMD intrinsics.

Testing:
Added unit tests for unpacking, exhaustively covering different
bitwidths with additional test dimensions (memory alignment, various
input sizes, etc).

Tested under ASAN to ensure the bit unpacking doesn't read past the end
of buffers.

Perf:
Added microbenchmark that shows on average an 8-9x speedup over the
existing BitReader code.

Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622
---
M be/src/benchmarks/CMakeLists.txt
A be/src/benchmarks/bit-packing-benchmark.cc
M be/src/benchmarks/bswap-benchmark.cc
A be/src/testutil/mem-util.h
M be/src/util/CMakeLists.txt
A be/src/util/bit-packing-test.cc
A be/src/util/bit-packing.h
A be/src/util/bit-packing.inline.h
M be/src/util/bit-stream-utils.h
M be/src/util/bit-stream-utils.inline.h
10 files changed, 879 insertions(+), 39 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/4494/8
-- 
To view, visit http://gerrit.cloudera.org:8080/4494
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong

[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking

2016-10-09 Thread Jim Apple (Code Review)

Jim Apple has posted comments on this change.

Change subject: IMPALA-4123: Fast bit unpacking
..


Patch Set 7:

(10 comments)

found with help of clang-tidy

http://gerrit.cloudera.org:8080/#/c/4494/7/be/src/benchmarks/bit-packing-benchmark.cc
File be/src/benchmarks/bit-packing-benchmark.cc:

PS7, Line 330: int argc, char **argv
unused


http://gerrit.cloudera.org:8080/#/c/4494/7/be/src/util/bit-packing.h
File be/src/util/bit-packing.h:

Line 22: #ifndef IMPALA_UTIL_BIT_PACKING_H
These usually go above the #includes


PS7, Line 48: BitPacking
Why not put this into use in this patch? In other words, why only the 
microbenchmark?


http://gerrit.cloudera.org:8080/#/c/4494/7/be/src/util/bit-packing.inline.h
File be/src/util/bit-packing.inline.h:

PS7, Line 45: make_
make_ can be omitted.


PS7, Line 90: onstants
constants


PS7, Line 115: else
if branch returns, so you can drop the "else" here.


PS7, Line 118: else
else if branch returns, so you can drop the else here, if you want to. I'm 
ambivalent.


Line 120: DCHECK_LT(VALUE_IDX, 31) << "Trailing bits after last word.";
Can you explain more why VALUE_IDX < 31? Can you add a static_assert at the top 
with the maximum value it could possibly be?


PS7, Line 126: warning
What template values cause that warning to trigger?


PS7, Line 184: BIT_WIDTH
if BIT_WIDTH is 0, I don't think this is technically allowed.


-- 
To view, visit http://gerrit.cloudera.org:8080/4494
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking

2016-10-07 Thread Jim Apple (Code Review)

Jim Apple has posted comments on this change.

Change subject: IMPALA-4123: Fast bit unpacking
..


Patch Set 7:

(20 comments)

Thanks for your patience

http://gerrit.cloudera.org:8080/#/c/4494/7/be/src/benchmarks/bit-packing-benchmark.cc
File be/src/benchmarks/bit-packing-benchmark.cc:

PS7, Line 336: int64_t
const, or just make this the param to the data constructor and use data.size() 
as the last argument to the params initializer list.


PS7, Line 343: suite.AddBenchmark(Substitute("UnpackScalar", bit_width), 
UnpackBenchmark, );
This line at bit_width 32 is not in the output you display in the comment at 
the top.


http://gerrit.cloudera.org:8080/#/c/4494/7/be/src/util/bit-packing-test.cc
File be/src/util/bit-packing-test.cc:

PS7, Line 35:  a subset of
: /// 'num_in_values'
I still find this wording confusing - Which is the to buffer, and which the 
from? What does it mean to have a subset of an int?


PS7, Line 42: misaligned
When misalignment is 0, what alignment is the memory guaranteed to have? 0 mod 
64?


PS7, Line 49: uint32_t
const


Line 93: uint32_t* in, uint8_t* packed, int num_in_values, int bit_width, 
int misalignment) {
const T*


PS7, Line 94: int
const


PS7, Line 99: vector
std::aligned_storage?


PS7, Line 102:  sizeof(uint32_t)
Why is this needed?


PS7, Line 132: min(length, 1)
so the only misalignments allowed are 0 and 1? Why so restrictive?


http://gerrit.cloudera.org:8080/#/c/4494/2/be/src/util/bit-packing.h
File be/src/util/bit-packing.h:

PS2, Line 57: ast one return
> 8 is the smallest number such that (3 * 8) % 8 == 0. I.e. where the last bi
OK, but you said "you have to choose" 8. A batch size of 32 would still work, 
right?


http://gerrit.cloudera.org:8080/#/c/4494/7/be/src/util/bit-packing.h
File be/src/util/bit-packing.h:

PS7, Line 30: values
'value'


PS7, Line 67:  bits of data
You can elide this phrase


http://gerrit.cloudera.org:8080/#/c/4494/4/be/src/util/bit-packing.inline.h
File be/src/util/bit-packing.inline.h:

PS4, Line 125: const uint32_t trailing_bits = in_words[IN_WORD_IDX + 1] % 
> That's true, but there's a downside - if the loads are aligned relative to 
I'm not so sure. Let's consider reading 10 bit ints using 32-bit loads. To 
unpack 16 of them with your method, you need to read 160 bits, which means 5 
(aligned) loads. 12 of the unpacked values can be read without using this third 
(more expensive) branch. In this branch, there are two more ALU ops, so that's 
4*2 = 8 extra ALU ops.

OTOH, using un-aligned loads, you could do only 4 loads and (at bits 0, 24, 48, 
and 80) and no more than two ALU ops for each extraction.


http://gerrit.cloudera.org:8080/#/c/4494/7/be/src/util/bit-packing.inline.h
File be/src/util/bit-packing.inline.h:

PS7, Line 57: const
constexpr BATCH_SIZE


PS7, Line 69: in_bytes -= batch_size * (BIT_WIDTH / CHAR_BIT);
so in_bytes does not move if BIT_WIDTH is 7?


PS7, Line 91: should
Should this be "must", since  VALUE_IDX is now a template param? Not that the 
explanation is wrong, but now you have forced the caller into efficient 
behavior with your choice.


PS7, Line 178: const
constexpr MAX_BATCH_SIZE


PS7, Line 179: int
const


PS7, Line 183:   // Copy into padded temporary buffer to avoid reading past the 
end of 'in'.
Can this be avoided sometimes? For instance, if BIT_WIDTH is 6 and num_values 
is 16, can we just use 'in'?


-- 
To view, visit http://gerrit.cloudera.org:8080/4494
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking

2016-10-06 Thread Tim Armstrong (Code Review)

Tim Armstrong has uploaded a new patch set (#7).

Change subject: IMPALA-4123: Fast bit unpacking
..

IMPALA-4123: Fast bit unpacking

Adds utility functions for fast unpacking of batches of bit-packed
values. These support reading batches of any number of values provided
that the start of the batch is aligned to a byte boundary. Callers that
want to read smaller batches that don't align to byte boundaries will
need to implement their own buffering.

The unpacking code uses only portable C++ and no SIMD intrinsics, but is
fairly efficient because unpacking a full batch of 32 values compiles
down to 64-bit loads, shifts by constants, masks by constants, bitwise
ors at 64-bit boundaries, and stores. Further speedups should be
possible using SIMD intrinsics.

Testing:
Added unit tests for unpacking, exhaustively covering different
bitwidths with additional test dimensions (memory alignment, various
input sizes, etc).

Tested under ASAN to ensure the bit unpacking doesn't read past the end
of buffers.

Perf:
Added microbenchmark that shows on average an 8-9x speedup over the
existing BitReader code.

Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622
---
M be/src/benchmarks/CMakeLists.txt
A be/src/benchmarks/bit-packing-benchmark.cc
M be/src/util/CMakeLists.txt
A be/src/util/bit-packing-test.cc
A be/src/util/bit-packing.h
A be/src/util/bit-packing.inline.h
M be/src/util/bit-stream-utils.h
M be/src/util/bit-stream-utils.inline.h
8 files changed, 807 insertions(+), 21 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/4494/7
-- 
To view, visit http://gerrit.cloudera.org:8080/4494
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong

[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking

2016-10-05 Thread Jim Apple (Code Review)

Jim Apple has posted comments on this change.

Change subject: IMPALA-4123: Fast bit unpacking
..


Patch Set 4:

(15 comments)

http://gerrit.cloudera.org:8080/#/c/4494/4/be/src/benchmarks/bit-packing-benchmark.cc
File be/src/benchmarks/bit-packing-benchmark.cc:

PS4, Line 296: bool
const


PS4, Line 303: UnpackValues
Unpack32Values


PS4, Line 312: + offset
This could go off the end of out_buffer if NUM_OUT_VALUES is not a multiple of 
32, right?


PS4, Line 322: int64_t
const


http://gerrit.cloudera.org:8080/#/c/4494/4/be/src/util/bit-packing-test.cc
File be/src/util/bit-packing-test.cc:

PS4, Line 30: compute_mask
CamelCaseName


PS4, Line 35: smaller
smaller than what?


Line 116:   const int num_in_values = 64 * 1024;
constexpr, and all caps name


Line 119: in[i] = rand();
std::generate


http://gerrit.cloudera.org:8080/#/c/4494/2/be/src/util/bit-packing.h
File be/src/util/bit-packing.h:

PS2, Line 57: ast one return
> A batch size of 32 is the smallest that works for all supported bit widths.
I don't understand yet why you have to choose a batch size of 8 for a bit width 
of 3.


http://gerrit.cloudera.org:8080/#/c/4494/4/be/src/util/bit-packing.h
File be/src/util/bit-packing.h:

PS4, Line 66: bytes
bits?


PS4, Line 66: values
values of type OutType


PS4, Line 70: in_bytes
Can you add to the comment a description of how this parameter is used?


http://gerrit.cloudera.org:8080/#/c/4494/3/be/src/util/bit-packing.inline.h
File be/src/util/bit-packing.inline.h:

PS3, Line 143: undef 
> I'm not quite sure that I understand the intent or how I'd do that, aside f
https://gcc.gnu.org/onlinedocs/gcc/Push_002fPop-Macro-Pragmas.html#Push_002fPop-Macro-Pragmas


http://gerrit.cloudera.org:8080/#/c/4494/4/be/src/util/bit-packing.inline.h
File be/src/util/bit-packing.inline.h:

PS4, Line 125: return lower_bits | (trailing_bits << TRAILING_BITS_SHIFT);
This comes out to two shifts, a bitwise or, and a bitwise and. If you did 
unaligned 64-bit reads to fit the full value inside of a single word, it could 
be done in one shift and one bitwise and, yes?


Line 142:   BOOST_PP_REPEAT_FROM_TO(0, 33, UNPACK_VALUE_CALL, ignore);
So, even after turning this into a templated function, a loop with static 
bounds doesn't cause inlining? Maybe leave a note on why you do it this way, in 
that case.


-- 
To view, visit http://gerrit.cloudera.org:8080/4494
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking

2016-10-05 Thread Tim Armstrong (Code Review)

Tim Armstrong has uploaded a new patch set (#5).

Change subject: IMPALA-4123: Fast bit unpacking
..

IMPALA-4123: Fast bit unpacking

Adds utility functions for fast unpacking of batches of bit-packed
values. These support reading batches of any number of values provided
that the start of the batch is aligned to a byte boundary. Callers that
want to read smaller batches that don't align to byte boundaries will
need to implement their own buffering.

The unpacking code uses only portable C++ and no SIMD intrinsics, but is
fairly efficient because unpacking a full batch of 32 values compiles
down to 64-bit loads, shifts by constants, masks by constants, bitwise
ors at 64-bit boundaries, and stores. Further speedups should be
possible using SIMD intrinsics.

Testing:
Added unit tests for unpacking, exhaustively covering different
bitwidths with additional test dimensions (memory alignment, various
input sizes, etc).

Tested under ASAN to ensure the bit unpacking doesn't read past the end
of buffers.

Perf:
Added microbenchmark that shows on average an 8-9x speedup over the
existing BitReader code.

Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622
---
M be/src/benchmarks/CMakeLists.txt
A be/src/benchmarks/bit-packing-benchmark.cc
M be/src/util/CMakeLists.txt
A be/src/util/bit-packing-test.cc
A be/src/util/bit-packing.h
A be/src/util/bit-packing.inline.h
M be/src/util/bit-stream-utils.h
M be/src/util/bit-stream-utils.inline.h
8 files changed, 799 insertions(+), 21 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/4494/5
-- 
To view, visit http://gerrit.cloudera.org:8080/4494
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong

[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking

2016-10-05 Thread Tim Armstrong (Code Review)

Tim Armstrong has posted comments on this change.

Change subject: IMPALA-4123: Fast bit unpacking
..


Patch Set 2:

(33 comments)

http://gerrit.cloudera.org:8080/#/c/4494/2/be/src/benchmarks/bit-packing-benchmark.cc
File be/src/benchmarks/bit-packing-benchmark.cc:

Line 296: reader.Reset(p->data, p->data_len);
> Should offset advance in this loop, then?
The code was a bit obfuscated. The loop only really retries once, so there's no 
need to advance offset. Changed it to be more obvious. Offset is advanced once 
per value read by virtue of the offset calculation above.


http://gerrit.cloudera.org:8080/#/c/4494/3/be/src/benchmarks/bit-packing-benchmark.cc
File be/src/benchmarks/bit-packing-benchmark.cc:

Line 270: #include "common/names.h"
> Why the blank line here?
We separate out common/names.h like this most places because it needs to come 
after all the other includes.


PS3, Line 277: 
> Maybe NUM_OUT_VALUES
Done


PS3, Line 305: * p
> const uint8_t * const data_end ...
Done


Line 334: for (int i = 0; i < data_len; ++i) {
> std::iota
Done


PS3, Line 336: 
> I think you can even elide the parens
Done


http://gerrit.cloudera.org:8080/#/c/4494/3/be/src/util/bit-packing-test.cc
File be/src/util/bit-packing-test.cc:

PS3, Line 29: compute_mask
> ComputeMask, and in an anonymous namespace
Done


PS3, Line 33: UnpackSubset
> With a comment here, if I'm going to use it in PackUnpack.
Done


PS3, Line 37: with memory that is
> this phrase can be omitted
Done


PS3, Line 39: int
> unsigned? DCHECK <= something?
I don't think there's a particular limit on the number of values we could use 
here.


PS3, Line 39: num_values
> num_in_values?
Done


PS3, Line 39: int
> also unsigned?
I think int should be ok - I normally prefer using signed types by default even 
for variables that logically should always be positive because it's easier to 
reason about the number going negative compared with underflowing (also easier 
to catch the out-of-range value with dchecks).


PS3, Line 45: int
> const, here and elsewhere
Done


PS3, Line 51: & compute_mask(bit_width)
> AKA mask; already hoisted
Done


PS3, Line 58: becase
> because
Done


PS3, Line 58: the input buffer size
> "the input buffer size (num_values)"
Done


PS3, Line 60: // Size buffer exactly so that ASAN can detect buffer 
overruns.
> I think manual canaray checks might be called for here
We rely on ASAN to detect read overruns anyway (which I was more concerned 
about since they're tricky to avoid in the code), so it doesn't seem worth it 
to me on balance to add the extra test code.


PS3, Line 65: ASSERT_EQ
> Can you add more " << error message " to your ASSERTs?
Done


http://gerrit.cloudera.org:8080/#/c/4494/2/be/src/util/bit-packing.h
File be/src/util/bit-packing.h:

Line 36:   static const std::pair UnpackValues(int 
bit_width,
> Yeah, the verbosity is frustrating. However, cstdint doesn't necessarily pt
It seems like most implementations of cstdint put the typedefs in the global 
namespace anyway, so I think we're ok for now. If that turns out to be a 
problem I think we should create a wrapper header with the using declarations.


PS2, Line 57: Unpack32Values
> So, this boundary condition happens for any multiple of 8, and you chose 32
A batch size of 32 is the smallest that works for all supported bit widths. 
Otherwise you have to choose different batch sizes for different bit widths 
(e.g 32 for bit width 31, 8 for bit width 3, etc), which gets more complicated.

Lemire's bitpacking libraries use 32 everywhere (https://github.com/lemire/) so 
it seemed like a reasonable choice - he obviously put some effort into 
validating the performance.

I could benchmark different numbers but I think at this point I'd be optimising 
for the microbenchmark rather than a real workload.


http://gerrit.cloudera.org:8080/#/c/4494/3/be/src/util/bit-packing.h
File be/src/util/bit-packing.h:

PS3, Line 55: IT_WIDTH from 'in' to 'out'.
> But the last byte of 'in' that was read might only have been partially used
That's correct. I deliberately made the decision that these functions shouldn't 
deal with resuming reads in the middle of bytes, because it makes things way 
more complicated (and probably slower). I expect that callers will either read 
in batches or do their own buffering if they need to read a value at a time.

Expanded on the comment to make this explicit and explain what the caller can 
do.


http://gerrit.cloudera.org:8080/#/c/4494/3/be/src/util/bit-packing.inline.h
File be/src/util/bit-packing.inline.h:

PS3, Line 56: >(in,
> can be omitted if the branches are swapped
Done


Line 56: case 21: return UnpackValues(in, in_bytes, 
num_values, out);
> long line
Done


PS3, Line 90: 
:   if (r
> I think you accidentally a word
I ended up retrying the template function thing and it works ok this time 
around. I think something else was

[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking

2016-10-05 Thread Tim Armstrong (Code Review)

Tim Armstrong has uploaded a new patch set (#4).

Change subject: IMPALA-4123: Fast bit unpacking
..

IMPALA-4123: Fast bit unpacking

Adds utility functions for fast unpacking of batches of bit-packed
values. These support reading batches of any number of values provided
that the start of the batch is aligned to a byte boundary. Callers that
want to read smaller batches that don't align to byte boundaries will
need to implement their own buffering.

The unpacking code uses only portable C++ and no SIMD intrinsics, but is
fairly efficient because unpacking a full batch of 32 values compiles
down to 64-bit loads, shifts by constants, masks by constants, bitwise
ors at 64-bit boundaries, and stores. Further speedups should be
possible using SIMD intrinsics.

Testing:
Added unit tests for unpacking, exhaustively covering different
bitwidths with additional test dimensions (memory alignment, various
input sizes, etc).

Tested under ASAN to ensure the bit unpacking doesn't read past the end
of buffers.

Perf:
Added microbenchmark that shows on average an 8-9x speedup over the
existing BitReader code.

Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622
---
M be/src/benchmarks/CMakeLists.txt
A be/src/benchmarks/bit-packing-benchmark.cc
M be/src/util/CMakeLists.txt
A be/src/util/bit-packing-test.cc
A be/src/util/bit-packing.h
A be/src/util/bit-packing.inline.h
M be/src/util/bit-stream-utils.h
M be/src/util/bit-stream-utils.inline.h
8 files changed, 799 insertions(+), 21 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/4494/4
-- 
To view, visit http://gerrit.cloudera.org:8080/4494
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong

[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking

2016-10-04 Thread Jim Apple (Code Review)

Jim Apple has posted comments on this change.

Change subject: IMPALA-4123: Fast bit unpacking
..


Patch Set 3:

(34 comments)

http://gerrit.cloudera.org:8080/#/c/4494/2/be/src/benchmarks/bit-packing-benchmark.cc
File be/src/benchmarks/bit-packing-benchmark.cc:

Line 296:   }
> We need to do this to start the reader reading from the start of the buffer
Should offset advance in this loop, then?


http://gerrit.cloudera.org:8080/#/c/4494/3/be/src/benchmarks/bit-packing-benchmark.cc
File be/src/benchmarks/bit-packing-benchmark.cc:

Line 270: 
Why the blank line here?


PS3, Line 277: NUM_VALUES
Maybe NUM_OUT_VALUES


PS3, Line 305: * d
const uint8_t * const data_end ...


Line 334:   data[i] = static_cast(i);
std::iota


PS3, Line 336: (
I think you can even elide the parens


http://gerrit.cloudera.org:8080/#/c/4494/3/be/src/util/bit-packing-test.cc
File be/src/util/bit-packing-test.cc:

PS3, Line 29: compute_mask
ComputeMask, and in an anonymous namespace


PS3, Line 33: UnpackSubset
With a comment here, if I'm going to use it in PackUnpack.


PS3, Line 37: with memory that is
this phrase can be omitted


PS3, Line 39: int
also unsigned?


PS3, Line 39: int
unsigned? DCHECK <= something?


PS3, Line 39: num_values
num_in_values?


PS3, Line 45: int
const, here and elsewhere


PS3, Line 51: & compute_mask(bit_width)
AKA mask; already hoisted


PS3, Line 58: becase
because


PS3, Line 58: the input buffer size
"the input buffer size (num_values)"


PS3, Line 60: // Size buffer exactly so that ASAN can detect buffer 
overruns.
I think manual canaray checks might be called for here


PS3, Line 65: ASSERT_EQ
Can you add more " << error message " to your ASSERTs?


http://gerrit.cloudera.org:8080/#/c/4494/2/be/src/util/bit-packing.h
File be/src/util/bit-packing.h:

Line 36: /// little-endian order). E.g. the values 1, 2, 3, 4 packed with bit 
width 4 results
> We don't do this anywhere else, seems unnecessarily verbose.
Yeah, the verbosity is frustrating. However, cstdint doesn't necessarily pt 
these in the global namespace, and stdint.h is deprecated. What color do you 
think we should paint this bikeshed? Maybe a using declaration at the top?


PS2, Line 57: Type>
> Added some explanation to the class comment for the magic number. It works 
So, this boundary condition happens for any multiple of 8, and you chose 32 to 
amortize the loop some more?

Did you do any testing of 16 and 64 to see how they compare?


http://gerrit.cloudera.org:8080/#/c/4494/3/be/src/util/bit-packing.h
File be/src/util/bit-packing.h:

PS3, Line 55: after the last byte of 'in' that was read
But the last byte of 'in' that was read might only have been partially used, 
right?


http://gerrit.cloudera.org:8080/#/c/4494/2/be/src/util/bit-packing.inline.h
File be/src/util/bit-packing.inline.h:

PS2, Line 108:  \
 : if (end_bit_offset < load_bit_width) {  
> Reworded this. I did some experiments early on looking at the assembly and 
Did you try always_inline and template parameters? If yes, what else did you 
try?


http://gerrit.cloudera.org:8080/#/c/4494/3/be/src/util/bit-packing.inline.h
File be/src/util/bit-packing.inline.h:

PS3, Line 56:  == 0
can be omitted if the branches are swapped


Line 56:   const int64_t max_input_values = BIT_WIDTH == 0 ? num_values : 
(in_bytes * CHAR_BIT) / BIT_WIDTH;
long line


PS3, Line 90: early
: // with
I think you accidentally a word


PS3, Line 93: LOAD_TYPE
I think "LOAD_TYPE" could be "LoadType" to more clearly match the lexical 
standard for other type names


PS3, Line 93: i
If 'i' is a compile-time constant, maybe call it "I" or "NTH"?


PS3, Line 96: load_bit_width
I think constexprs should get the static const treatment of all caps


PS3, Line 106: shifted
Can you add a comment describing what this is for?


PS3, Line 108: 32
How is this 32 derived?


PS3, Line 109: end_bit_offset
so, this case al the bits we need are in one word?


PS3, Line 119: Make non-negative to avoid spurious compiler warning
Maybe give it an unsigned type?


PS3, Line 124: Special-case impossible BIT_WIDTH 32 to avoid spurious compiler 
warning
this could use a bit more explanation


PS3, Line 143: define
Check not already defined?


-- 
To view, visit http://gerrit.cloudera.org:8080/4494
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Jim Apple 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking

2016-10-04 Thread Tim Armstrong (Code Review)

Tim Armstrong has uploaded a new patch set (#3).

Change subject: IMPALA-4123: Fast bit unpacking
..

IMPALA-4123: Fast bit unpacking

Adds utility functions for fast unpacking of batches of bit-packed
values. These support reading batches of any number of values provided
that the start of the batch is aligned to a byte boundary. Callers that
want to read smaller batches that don't align to byte boundaries will
need to implement their own buffering.

The unpacking code uses only portable C++ and no SIMD intrinsics, but is
fairly efficient because unpacking a full batch of 32 values compiles
down to 64-bit loads, shifts by constants, masks by constants, bitwise
ors at 64-bit boundaries, and stores. Further speedups should be
possible using SIMD intrinsics.

Testing:
Added unit tests for unpacking, exhaustively covering different
bitwidths with additional test dimensions (memory alignment, various
input sizes, etc).

Tested under ASAN to ensure the bit unpacking doesn't read past the end
of buffers.

Perf:
Added microbenchmark that shows on average an 8-9x speedup over the
existing BitReader code.

Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622
---
M be/src/benchmarks/CMakeLists.txt
A be/src/benchmarks/bit-packing-benchmark.cc
M be/src/util/CMakeLists.txt
A be/src/util/bit-packing-test.cc
A be/src/util/bit-packing.h
A be/src/util/bit-packing.inline.h
M be/src/util/bit-stream-utils.h
M be/src/util/bit-stream-utils.inline.h
8 files changed, 788 insertions(+), 21 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/94/4494/3
-- 
To view, visit http://gerrit.cloudera.org:8080/4494
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Jim Apple

[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking

2016-10-03 Thread Jim Apple (Code Review)

Jim Apple has posted comments on this change.

Change subject: IMPALA-4123: Fast bit unpacking
..


Patch Set 2:

(34 comments)

First pass, didn't get to all of it yet but I thought you might want to see 
what was done

http://gerrit.cloudera.org:8080/#/c/4494/2//COMMIT_MSG
Commit Message:

PS2, Line 12: was
"want"


PS2, Line 18: possible
long line


http://gerrit.cloudera.org:8080/#/c/4494/2/be/src/benchmarks/bit-packing-benchmark.cc
File be/src/benchmarks/bit-packing-benchmark.cc:

PS2, Line 259: include
I think cmath, cstdlib, cstdio are the non-deprecated versions for C++


Line 260: #include 
Can you arrange these in three blocks:

1. C standard headers

2. C++ standard headers

3. Other

alphabetized in each block


Line 276: #define NUM_VALUES 1024 * 1024
constexpr int


PS2, Line 281: BenchmarkParams
You can leave the constructor out and use aggregate initialization:

BenchmarkParams bp {p,q,r};


Line 290:   BenchmarkParams* p = reinterpret_cast(data);
maybe const auto p = reinterpret_cast


PS2, Line 294: int64_t
const


PS2, Line 294: i * 32 % NUM_VALUES + j;
Shouldn't this while thing be modulo NUM_VALUES, not just the non-+j part?


Line 296: reader.Reset(p->data, p->data_len);
Why Reset?


PS2, Line 333: uint8_t* data = new uint8_t[data_len];
unique_ptr to clean up after


PS2, Line 334: int
int64_t


PS2, Line 337: BenchmarkParams
This can be a local var, right?


http://gerrit.cloudera.org:8080/#/c/4494/2/be/src/util/bit-packing.h
File be/src/util/bit-packing.h:

Line 18: #include 
cstdint


PS2, Line 29: Unpack bit-packed values
What does this phrase mean?


PS2, Line 30: outputted
unpacked


PS2, Line 30: is
are


PS2, Line 31: 0
What does a zero bit_width mean in this context?


PS2, Line 30: 'out' must have
:   /// enough space for 'num_values'.
And in must point to at least in_bytes of addressable memory?


PS2, Line 33: the byte 
a pointer to the byte


PS2, Line 33: plus
"And also"


PS2, Line 32: utType.
:   /// Retu
extra line break here, please


PS2, Line 33: the number of
:   /// values that were read.
I can infer that by just subtracting, right? I think the pair might be overkill


PS2, Line 36: const
This const has no effect, I believe.


Line 36:   static const std::pair UnpackValues(int 
bit_width,
std::uint8_t, etc.


PS2, Line 57: Unpack32Values
Why not up to 64?


http://gerrit.cloudera.org:8080/#/c/4494/2/be/src/util/bit-packing.inline.h
File be/src/util/bit-packing.inline.h:

Line 35: case 0: return UnpackValues(in, in_bytes, num_values, 
out);
BOOST_PP_REPEAT_FROM_TO would be worth it here, I think


PS2, Line 70: NULL
nullptr


PS2, Line 70: 
can be omitted


PS2, Line 78: DCHECK_GE
static_assert


PS2, Line 79: 8
CHAR_BIT, here and below


PS2, Line 80: in_bytes * 8 / BIT_WIDTH
I think this line would be clearer with more parens


PS2, Line 99:  unsigned integer type
static_check with http://en.cppreference.com/w/cpp/types/is_unsigned?


PS2, Line 108: inlined as
 : // soon as possible and subject to all optimizations
Can you elaborate on this? Are there benchmarks on that?


-- 
To view, visit http://gerrit.cloudera.org:8080/4494
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I12db69409483d208cd4c0f41c27a78aeb6cd3622
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Tim Armstrong 
Gerrit-Reviewer: Jim Apple 
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking

[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking

[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking

[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking

[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking

[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking

[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking

[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking

[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking

[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking

[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking

[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking

[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking

[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking

[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking

[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking

[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking

[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking

[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking

[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking

[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking

[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking

[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking

[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking

[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking

[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking

[Impala-ASF-CR] IMPALA-4123: Fast bit unpacking

27 matches

Site Navigation

Mail list logo

Footer information