Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/13807 )

Change subject: IMPALA-8741: Speed up bit unpacking by vectorisation
......................................................................


Patch Set 4:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/13807/4/be/src/util/bit-packing-vectorized.h
File be/src/util/bit-packing-vectorized.h:

PS4:
> We thought it would make the build process more complicated if we generated
Ahh I see. I don't think it should be too bad, we have similar CMake rules, 
e.g. see the file2array invocations in e/src/codegen/CMakeLists. Having a 
reproducible build process ultimately simplifies maintenance even if it 
requires some upfront work.

That said, I don't think we expect this file to change that often so requiring 
a manual rerun when the python script changes isn't unreasonable.

Taking into account this and the Python 3 issue, I think we have two paths 
forward:
* Port the script to Python 2 (we don't have Python 3 on all systems we build 
on yet; would probably have to add it to the toolchain, which I would like to 
do eventually to get us off Python 2 for tests, etc) and add a CMake rule to 
generate in each build
* Keep this as is, but ensure that the code generation is easily reproducible 
by others. Maybe it would be enough to generate a comment in this files header 
with the vectorised_bit_unpacking_generator.py command-line used.

I somewhat prefer the first just for reproducibility, but I'm open to the 
second. The main downside of the first is the extra work to port the script, 
since we're going to have to port it back to Python 3 at some point in the 
future. I think if others prefer the second we should go with that.


http://gerrit.cloudera.org:8080/#/c/13807/8/be/src/util/vectorised_bit_unpacking_generator.py
File be/src/util/vectorised_bit_unpacking_generator.py:

http://gerrit.cloudera.org:8080/#/c/13807/8/be/src/util/vectorised_bit_unpacking_generator.py@1540
PS8, Line 1540:
I'm included to remove this option and pick one alternative as the canonical 
behaviour. If the output is readable without clang-format, we could simplify 
the script. Otherwise we could just always run through clang-format.


http://gerrit.cloudera.org:8080/#/c/13807/8/be/src/util/vectorised_bit_unpacking_generator.py@1544
PS8, Line 1544:         else:
If we're going with clang-format, I think we should write the non-formatted 
output to a temporary file, then run it through clang format. Overwriting files 
in place as part of a build step doesn't play nice with a lot of build systems 
like make, since if the command fails partway through, make will just look at 
the timestamps and think that the output was correctly generated.


http://gerrit.cloudera.org:8080/#/c/13807/8/be/src/util/vectorised_bit_unpacking_generator.py@1547
PS8, Line 1547: if __name__ == "__main__":
I think the script should just fail if clang_format was enabled and couldn't be 
found - the script didn't do what the user asked for, so probably shouldn't do 
anything.



--
To view, visit http://gerrit.cloudera.org:8080/13807
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I9e452a547973778bbd8d768c608e1a32e948f947
Gerrit-Change-Number: 13807
Gerrit-PatchSet: 4
Gerrit-Owner: Daniel Becker <daniel.bec...@cloudera.com>
Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com>
Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com>
Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com>
Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com>
Gerrit-Comment-Date: Fri, 19 Jul 2019 17:13:46 +0000
Gerrit-HasComments: Yes

Reply via email to