Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/13807 )
Change subject: IMPALA-8741: Speed up bit unpacking by vectorisation
......................................................................


Patch Set 4:

(8 comments)

I started reviewing this morning but ran out of time to finish today. I got through the C++ code but haven't reviewed the guts of the code generator. I'm fairly confident it's correct though; the unit tests should provide good coverage.

http://gerrit.cloudera.org:8080/#/c/13807/4/be/src/benchmarks/bit-packing-benchmark.cc
File be/src/benchmarks/bit-packing-benchmark.cc:

http://gerrit.cloudera.org:8080/#/c/13807/4/be/src/benchmarks/bit-packing-benchmark.cc@29
PS4, Line 29: // The second one compares the original scalar implementation with the vectorised one
Include results for this one as a comment?


http://gerrit.cloudera.org:8080/#/c/13807/4/be/src/util/bit-packing-vectorized.h
File be/src/util/bit-packing-vectorized.h:

PS4:
We generally don't check in generated files. There are arguments for and against doing so, but generally that's the direction we've gone. The most compelling reason for me is that re-generating the code as part of the build means that vectorised_bit_unpacking_generator.py is tested. Otherwise it could easily bit rot.

I think the checked-in file is useful for the purposes of review, but I'd be inclined to remove it before merging and rely on generating it via a CMake rule. We can discuss the pros and cons though; maybe there are some considerations I'm missing.

http://gerrit.cloudera.org:8080/#/c/13807/4/be/src/util/bit-packing.h
File be/src/util/bit-packing.h:

http://gerrit.cloudera.org:8080/#/c/13807/4/be/src/util/bit-packing.h@64
PS4, Line 64: simultaniously
simultaneously


http://gerrit.cloudera.org:8080/#/c/13807/4/be/src/util/bit-packing.h@67
PS4, Line 67: template <typename OutType, bool VECTORIZE = true>
Is there a significant performance benefit to making VECTORIZE a compile-time constant? We already have to do a runtime check for the instruction anyway, so it can't result in more specialisation.


http://gerrit.cloudera.org:8080/#/c/13807/4/be/src/util/bit-packing.inline.h
File be/src/util/bit-packing.inline.h:

http://gerrit.cloudera.org:8080/#/c/13807/4/be/src/util/bit-packing.inline.h@84
PS4, Line 84: if (LIKELY((std::is_same<OutType, uint8_t>::value
Does it even make sense to unpack values into a type other than these 4? Could we make this a static_assert instead? That would avoid someone accidentally instantiating a non-vectorized version.

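To illustrate the static_assert suggestion, here is a rough sketch. It is not the patch's code: the trait IsSupportedUnpackType and the function UnpackValuesSketch are hypothetical names, the loop is a simplified scalar stand-in for the real unpacking routine, and the list of four types (uint8_t, uint16_t, uint32_t, uint64_t) is an assumption, since only uint8_t is visible in the quoted line.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// Hypothetical trait: true only for the output types assumed to be supported.
template <typename T>
struct IsSupportedUnpackType
  : std::integral_constant<bool,
        std::is_same<T, uint8_t>::value || std::is_same<T, uint16_t>::value
        || std::is_same<T, uint32_t>::value || std::is_same<T, uint64_t>::value> {};

// Simplified scalar stand-in for the unpacking routine (illustration only).
// Reads 'num_values' values of 'bit_width' bits each, packed LSB-first in 'in'.
template <typename OutType>
void UnpackValuesSketch(
    int bit_width, const uint8_t* in, int num_values, OutType* out) {
  // Any other OutType is rejected at compile time, so no unsupported
  // instantiation can be created by accident.
  static_assert(IsSupportedUnpackType<OutType>::value,
      "OutType must be uint8_t, uint16_t, uint32_t or uint64_t");
  assert(bit_width >= 0 && bit_width <= static_cast<int>(sizeof(OutType) * 8));
  for (int i = 0; i < num_values; ++i) {
    uint64_t value = 0;
    for (int b = 0; b < bit_width; ++b) {
      const int64_t bit_idx = static_cast<int64_t>(i) * bit_width + b;
      const uint64_t bit = (in[bit_idx / 8] >> (bit_idx % 8)) & 1u;
      value |= bit << b;
    }
    out[i] = static_cast<OutType>(value);
  }
}
```

With something like this, UnpackValuesSketch<int32_t>(...) fails to compile with the message above rather than silently falling back to a slower path at runtime.
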
http://gerrit.cloudera.org:8080/#/c/13807/4/be/src/util/bit-packing.inline.h@262
PS4, Line 262: return in;
indentation


http://gerrit.cloudera.org:8080/#/c/13807/4/be/src/util/vectorised_bit_unpacking_generator.py
File be/src/util/vectorised_bit_unpacking_generator.py:

http://gerrit.cloudera.org:8080/#/c/13807/4/be/src/util/vectorised_bit_unpacking_generator.py@142
PS4, Line 142: sinlge
single


http://gerrit.cloudera.org:8080/#/c/13807/4/be/src/util/vectorised_bit_unpacking_generator.py@1080
PS4, Line 1080: metod
method


--
To view, visit http://gerrit.cloudera.org:8080/13807
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9e452a547973778bbd8d768c608e1a32e948f947
Gerrit-Change-Number: 13807
Gerrit-PatchSet: 4
Gerrit-Owner: Daniel Becker <daniel.bec...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Comment-Date: Wed, 17 Jul 2019 23:02:01 +0000
Gerrit-HasComments: Yes