Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/13807 )
Change subject: IMPALA-8741: Speed up bit unpacking by vectorisation ...................................................................... Patch Set 4: (4 comments) http://gerrit.cloudera.org:8080/#/c/13807/4/be/src/util/bit-packing-vectorized.h File be/src/util/bit-packing-vectorized.h: PS4: > We thought it would make the build process more complicated if we generated Ahh I see. I don't think it should be too bad, we have similar CMake rules, e.g. see the file2array invocations in e/src/codegen/CMakeLists. Having a reproducible build process ultimately simplifies maintenance even if it requires some upfront work. That said, I don't think we expect this file to change that often so requiring a manual rerun when the python script changes isn't unreasonable. Taking into account this and the Python 3 issue, I think we have two paths forward: * Port the script to Python 2 (we don't have Python 3 on all systems we build on yet; would probably have to add it to the toolchain, which I would like to do eventually to get us off Python 2 for tests, etc) and add a CMake rule to generate in each build * Keep this as is, but ensure that the code generation is easily reproducible by others. Maybe it would be enough to generate a comment in this files header with the vectorised_bit_unpacking_generator.py command-line used. I somewhat prefer the first just for reproducibility, but I'm open to the second. The main downside of the first is the extra work to port the script, since we're going to have to port it back to Python 3 at some point in the future. I think if others prefer the second we should go with that. http://gerrit.cloudera.org:8080/#/c/13807/8/be/src/util/vectorised_bit_unpacking_generator.py File be/src/util/vectorised_bit_unpacking_generator.py: http://gerrit.cloudera.org:8080/#/c/13807/8/be/src/util/vectorised_bit_unpacking_generator.py@1540 PS8, Line 1540: I'm included to remove this option and pick one alternative as the canonical behaviour. If the output is readable without clang-format, we could simplify the script. Otherwise we could just always run through clang-format. http://gerrit.cloudera.org:8080/#/c/13807/8/be/src/util/vectorised_bit_unpacking_generator.py@1544 PS8, Line 1544: else: If we're going with clang-format, I think we should write the non-formatted output to a temporary file, then run it through clang format. Overwriting files in place as part of a build step doesn't play nice with a lot of build systems like make, since if the command fails partway through, make will just look at the timestamps and think that the output was correctly generated. http://gerrit.cloudera.org:8080/#/c/13807/8/be/src/util/vectorised_bit_unpacking_generator.py@1547 PS8, Line 1547: if __name__ == "__main__": I think the script should just fail if clang_format was enabled and couldn't be found - the script didn't do what the user asked for, so probably shouldn't do anything. -- To view, visit http://gerrit.cloudera.org:8080/13807 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I9e452a547973778bbd8d768c608e1a32e948f947 Gerrit-Change-Number: 13807 Gerrit-PatchSet: 4 Gerrit-Owner: Daniel Becker <daniel.bec...@cloudera.com> Gerrit-Reviewer: Csaba Ringhofer <csringho...@cloudera.com> Gerrit-Reviewer: Daniel Becker <daniel.bec...@cloudera.com> Gerrit-Reviewer: Impala Public Jenkins <impala-public-jenk...@cloudera.com> Gerrit-Reviewer: Tim Armstrong <tarmstr...@cloudera.com> Gerrit-Comment-Date: Fri, 19 Jul 2019 17:13:46 +0000 Gerrit-HasComments: Yes