[Impala-ASF-CR] IMPALA-5168: Codegen HASH PARTITIONED KrpcDataStreamSender::Send()
Impala Public Jenkins has submitted this change and it was merged. ( http://gerrit.cloudera.org:8080/10421 ) Change subject: IMPALA-5168: Codegen HASH_PARTITIONED KrpcDataStreamSender::Send() .. IMPALA-5168: Codegen HASH_PARTITIONED KrpcDataStreamSender::Send() This change codegens the hash partitioning logic of KrpcDataStreamSender::Send() when the partitioning strategy is HASH_PARTITIONED. It does so by unrolling the loop which evaluates each row against the partitioning expressions and hashes the result. It also replaces the number of channels of that sender with a constant at runtime. With this change, we get reasonable speedup with some benchmarks: ++---+-++++ | Workload | File Format | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) | ++---+-++++ | TPCH(_300) | parquet / none / none | 20.03 | -6.44% | 13.56 | -7.15% | ++---+-++++ +-+---+-++++ | Workload| File Format | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) | +-+---+-++++ | TARGETED-PERF(_300) | parquet / none / none | 58.59 | -5.56% | 12.28 | -5.30% | +-+---+-++++ +-+---+-++++ | Workload| File Format | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) | +-+---+-++++ | TPCDS-UNMODIFIED(_1000) | parquet / none / none | 15.60 | -3.10% | 7.16 | -4.33% | +-+---+-++++ +---+---+-++++ | Workload | File Format | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) | +---+---+-++++ | TPCH_NESTED(_300) | parquet / none / none | 30.93 | -3.02% | 17.46 | -4.71% | +---+---+-++++ Change-Id: I1c44cc9312c062cc7a5a4ac9156ceaa31fb887ff Reviewed-on: http://gerrit.cloudera.org:8080/10421 Reviewed-by: Michael Ho Tested-by: Impala Public Jenkins --- M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/impala-ir.cc M be/src/exec/data-sink.cc M be/src/exec/data-sink.h M be/src/exec/exchange-node.cc M be/src/exec/exec-node.cc M be/src/exec/exec-node.h M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/partial-sort-node.cc M be/src/exec/partitioned-aggregation-node.cc M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/partitioned-hash-join-builder.h M be/src/exec/partitioned-hash-join-node.cc M be/src/exec/sort-node.cc M be/src/exec/topn-node.cc M be/src/runtime/CMakeLists.txt M be/src/runtime/fragment-instance-state.cc A be/src/runtime/krpc-data-stream-sender-ir.cc M be/src/runtime/krpc-data-stream-sender.cc M be/src/runtime/krpc-data-stream-sender.h M be/src/runtime/raw-value-ir.cc M be/src/runtime/raw-value.cc M be/src/runtime/runtime-state.h M be/src/util/runtime-profile.h A testdata/workloads/functional-query/queries/QueryTest/datastream-sender-codegen.test M tests/query_test/test_codegen.py 26 files changed, 425 insertions(+), 101 deletions(-) Approvals: Michael Ho: Looks good to me, approved Impala Public Jenkins: Verified -- To view, visit http://gerrit.cloudera.org:8080/10421 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: merged Gerrit-Change-Id: I1c44cc9312c062cc7a5a4ac9156ceaa31fb887ff Gerrit-Change-Number: 10421 Gerrit-PatchSet: 10 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5168: Codegen HASH PARTITIONED KrpcDataStreamSender::Send()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10421 ) Change subject: IMPALA-5168: Codegen HASH_PARTITIONED KrpcDataStreamSender::Send() .. Patch Set 9: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/10421 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1c44cc9312c062cc7a5a4ac9156ceaa31fb887ff Gerrit-Change-Number: 10421 Gerrit-PatchSet: 9 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 14 Jun 2018 23:36:59 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5168: Codegen HASH PARTITIONED KrpcDataStreamSender::Send()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10421 ) Change subject: IMPALA-5168: Codegen HASH_PARTITIONED KrpcDataStreamSender::Send() .. Patch Set 9: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/2672/ -- To view, visit http://gerrit.cloudera.org:8080/10421 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1c44cc9312c062cc7a5a4ac9156ceaa31fb887ff Gerrit-Change-Number: 10421 Gerrit-PatchSet: 9 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 14 Jun 2018 20:08:52 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5168: Codegen HASH PARTITIONED KrpcDataStreamSender::Send()
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/10421 ) Change subject: IMPALA-5168: Codegen HASH_PARTITIONED KrpcDataStreamSender::Send() .. Patch Set 9: Code-Review+2 Fixed another clang-tidy error missed in previous patch. -- To view, visit http://gerrit.cloudera.org:8080/10421 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1c44cc9312c062cc7a5a4ac9156ceaa31fb887ff Gerrit-Change-Number: 10421 Gerrit-PatchSet: 9 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Thu, 14 Jun 2018 20:04:40 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5168: Codegen HASH PARTITIONED KrpcDataStreamSender::Send()
Hello Tim Armstrong, Bikramjeet Vig, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/10421 to look at the new patch set (#8). Change subject: IMPALA-5168: Codegen HASH_PARTITIONED KrpcDataStreamSender::Send() .. IMPALA-5168: Codegen HASH_PARTITIONED KrpcDataStreamSender::Send() This change codegens the hash partitioning logic of KrpcDataStreamSender::Send() when the partitioning strategy is HASH_PARTITIONED. It does so by unrolling the loop which evaluates each row against the partitioning expressions and hashes the result. It also replaces the number of channels of that sender with a constant at runtime. With this change, we get reasonable speedup with some benchmarks: ++---+-++++ | Workload | File Format | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) | ++---+-++++ | TPCH(_300) | parquet / none / none | 20.03 | -6.44% | 13.56 | -7.15% | ++---+-++++ +-+---+-++++ | Workload| File Format | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) | +-+---+-++++ | TARGETED-PERF(_300) | parquet / none / none | 58.59 | -5.56% | 12.28 | -5.30% | +-+---+-++++ +-+---+-++++ | Workload| File Format | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) | +-+---+-++++ | TPCDS-UNMODIFIED(_1000) | parquet / none / none | 15.60 | -3.10% | 7.16 | -4.33% | +-+---+-++++ +---+---+-++++ | Workload | File Format | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) | +---+---+-++++ | TPCH_NESTED(_300) | parquet / none / none | 30.93 | -3.02% | 17.46 | -4.71% | +---+---+-++++ Change-Id: I1c44cc9312c062cc7a5a4ac9156ceaa31fb887ff --- M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/impala-ir.cc M be/src/exec/data-sink.cc M be/src/exec/data-sink.h M be/src/exec/exchange-node.cc M be/src/exec/exec-node.cc M be/src/exec/exec-node.h M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/partial-sort-node.cc M be/src/exec/partitioned-aggregation-node.cc M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/partitioned-hash-join-builder.h M be/src/exec/partitioned-hash-join-node.cc M be/src/exec/sort-node.cc M be/src/exec/topn-node.cc M be/src/runtime/CMakeLists.txt M be/src/runtime/fragment-instance-state.cc A be/src/runtime/krpc-data-stream-sender-ir.cc M be/src/runtime/krpc-data-stream-sender.cc M be/src/runtime/krpc-data-stream-sender.h M be/src/runtime/raw-value-ir.cc M be/src/runtime/raw-value.cc M be/src/runtime/runtime-state.h M be/src/util/runtime-profile.h A testdata/workloads/functional-query/queries/QueryTest/datastream-sender-codegen.test M tests/query_test/test_codegen.py 26 files changed, 425 insertions(+), 101 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/10421/8 -- To view, visit http://gerrit.cloudera.org:8080/10421 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I1c44cc9312c062cc7a5a4ac9156ceaa31fb887ff Gerrit-Change-Number: 10421 Gerrit-PatchSet: 8 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5168: Codegen HASH PARTITIONED KrpcDataStreamSender::Send()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10421 ) Change subject: IMPALA-5168: Codegen HASH_PARTITIONED KrpcDataStreamSender::Send() .. Patch Set 7: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/2655/ -- To view, visit http://gerrit.cloudera.org:8080/10421 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1c44cc9312c062cc7a5a4ac9156ceaa31fb887ff Gerrit-Change-Number: 10421 Gerrit-PatchSet: 7 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 13 Jun 2018 22:16:11 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5168: Codegen HASH PARTITIONED KrpcDataStreamSender::Send()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10421 ) Change subject: IMPALA-5168: Codegen HASH_PARTITIONED KrpcDataStreamSender::Send() .. Patch Set 7: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/10421 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1c44cc9312c062cc7a5a4ac9156ceaa31fb887ff Gerrit-Change-Number: 10421 Gerrit-PatchSet: 7 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 13 Jun 2018 18:49:25 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5168: Codegen HASH PARTITIONED KrpcDataStreamSender::Send()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10421 ) Change subject: IMPALA-5168: Codegen HASH_PARTITIONED KrpcDataStreamSender::Send() .. Patch Set 7: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/2655/ -- To view, visit http://gerrit.cloudera.org:8080/10421 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1c44cc9312c062cc7a5a4ac9156ceaa31fb887ff Gerrit-Change-Number: 10421 Gerrit-PatchSet: 7 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 13 Jun 2018 18:49:26 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5168: Codegen HASH PARTITIONED KrpcDataStreamSender::Send()
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/10421 ) Change subject: IMPALA-5168: Codegen HASH_PARTITIONED KrpcDataStreamSender::Send() .. Patch Set 6: Code-Review+2 Fix clang-tidy issue. Changed the signature of DataSink::Codegen() to match that of phj-builder. Ran more sanity tests. Carry +2. -- To view, visit http://gerrit.cloudera.org:8080/10421 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1c44cc9312c062cc7a5a4ac9156ceaa31fb887ff Gerrit-Change-Number: 10421 Gerrit-PatchSet: 6 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 13 Jun 2018 18:41:42 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5168: Codegen HASH PARTITIONED KrpcDataStreamSender::Send()
Hello Tim Armstrong, Bikramjeet Vig, Impala Public Jenkins, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/10421 to look at the new patch set (#6). Change subject: IMPALA-5168: Codegen HASH_PARTITIONED KrpcDataStreamSender::Send() .. IMPALA-5168: Codegen HASH_PARTITIONED KrpcDataStreamSender::Send() This change codegens the hash partitioning logic of KrpcDataStreamSender::Send() when the partitioning strategy is HASH_PARTITIONED. It does so by unrolling the loop which evaluates each row against the partitioning expressions and hashes the result. It also replaces the number of channels of that sender with a constant at runtime. With this change, we get reasonable speedup with some benchmarks: ++---+-++++ | Workload | File Format | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) | ++---+-++++ | TPCH(_300) | parquet / none / none | 20.03 | -6.44% | 13.56 | -7.15% | ++---+-++++ +-+---+-++++ | Workload| File Format | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) | +-+---+-++++ | TARGETED-PERF(_300) | parquet / none / none | 58.59 | -5.56% | 12.28 | -5.30% | +-+---+-++++ +-+---+-++++ | Workload| File Format | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) | +-+---+-++++ | TPCDS-UNMODIFIED(_1000) | parquet / none / none | 15.60 | -3.10% | 7.16 | -4.33% | +-+---+-++++ +---+---+-++++ | Workload | File Format | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) | +---+---+-++++ | TPCH_NESTED(_300) | parquet / none / none | 30.93 | -3.02% | 17.46 | -4.71% | +---+---+-++++ Change-Id: I1c44cc9312c062cc7a5a4ac9156ceaa31fb887ff --- M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/impala-ir.cc M be/src/exec/data-sink.cc M be/src/exec/data-sink.h M be/src/exec/exchange-node.cc M be/src/exec/exec-node.cc M be/src/exec/exec-node.h M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/partial-sort-node.cc M be/src/exec/partitioned-aggregation-node.cc M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/partitioned-hash-join-builder.h M be/src/exec/partitioned-hash-join-node.cc M be/src/exec/sort-node.cc M be/src/exec/topn-node.cc M be/src/runtime/CMakeLists.txt M be/src/runtime/fragment-instance-state.cc A be/src/runtime/krpc-data-stream-sender-ir.cc M be/src/runtime/krpc-data-stream-sender.cc M be/src/runtime/krpc-data-stream-sender.h M be/src/runtime/raw-value-ir.cc M be/src/runtime/raw-value.cc M be/src/runtime/runtime-state.h M be/src/util/runtime-profile.h A testdata/workloads/functional-query/queries/QueryTest/datastream-sender-codegen.test M tests/query_test/test_codegen.py 26 files changed, 424 insertions(+), 101 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/10421/6 -- To view, visit http://gerrit.cloudera.org:8080/10421 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I1c44cc9312c062cc7a5a4ac9156ceaa31fb887ff Gerrit-Change-Number: 10421 Gerrit-PatchSet: 6 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong
[Impala-ASF-CR] IMPALA-5168: Codegen HASH PARTITIONED KrpcDataStreamSender::Send()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10421 ) Change subject: IMPALA-5168: Codegen HASH_PARTITIONED KrpcDataStreamSender::Send() .. Patch Set 5: Verified-1 Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/2640/ -- To view, visit http://gerrit.cloudera.org:8080/10421 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1c44cc9312c062cc7a5a4ac9156ceaa31fb887ff Gerrit-Change-Number: 10421 Gerrit-PatchSet: 5 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 12 Jun 2018 21:32:52 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5168: Codegen HASH PARTITIONED KrpcDataStreamSender::Send()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10421 ) Change subject: IMPALA-5168: Codegen HASH_PARTITIONED KrpcDataStreamSender::Send() .. Patch Set 5: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/10421 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1c44cc9312c062cc7a5a4ac9156ceaa31fb887ff Gerrit-Change-Number: 10421 Gerrit-PatchSet: 5 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 12 Jun 2018 18:15:37 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5168: Codegen HASH PARTITIONED KrpcDataStreamSender::Send()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10421 ) Change subject: IMPALA-5168: Codegen HASH_PARTITIONED KrpcDataStreamSender::Send() .. Patch Set 5: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/2640/ -- To view, visit http://gerrit.cloudera.org:8080/10421 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1c44cc9312c062cc7a5a4ac9156ceaa31fb887ff Gerrit-Change-Number: 10421 Gerrit-PatchSet: 5 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 12 Jun 2018 18:15:38 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5168: Codegen HASH PARTITIONED KrpcDataStreamSender::Send()
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/10421 ) Change subject: IMPALA-5168: Codegen HASH_PARTITIONED KrpcDataStreamSender::Send() .. Patch Set 4: Code-Review+2 Rebase. Carry +2. -- To view, visit http://gerrit.cloudera.org:8080/10421 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1c44cc9312c062cc7a5a4ac9156ceaa31fb887ff Gerrit-Change-Number: 10421 Gerrit-PatchSet: 4 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Tue, 12 Jun 2018 18:14:18 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5168: Codegen HASH PARTITIONED KrpcDataStreamSender::Send()
Impala Public Jenkins has posted comments on this change. ( http://gerrit.cloudera.org:8080/10421 ) Change subject: IMPALA-5168: Codegen HASH_PARTITIONED KrpcDataStreamSender::Send() .. Patch Set 3: Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/2556/ -- To view, visit http://gerrit.cloudera.org:8080/10421 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1c44cc9312c062cc7a5a4ac9156ceaa31fb887ff Gerrit-Change-Number: 10421 Gerrit-PatchSet: 3 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 25 May 2018 18:01:54 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5168: Codegen HASH PARTITIONED KrpcDataStreamSender::Send()
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/10421 ) Change subject: IMPALA-5168: Codegen HASH_PARTITIONED KrpcDataStreamSender::Send() .. Patch Set 3: Code-Review+2 Rebase. Carry +2. -- To view, visit http://gerrit.cloudera.org:8080/10421 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1c44cc9312c062cc7a5a4ac9156ceaa31fb887ff Gerrit-Change-Number: 10421 Gerrit-PatchSet: 3 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Fri, 25 May 2018 18:01:11 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5168: Codegen HASH PARTITIONED KrpcDataStreamSender::Send()
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/10421 ) Change subject: IMPALA-5168: Codegen HASH_PARTITIONED KrpcDataStreamSender::Send() .. Patch Set 2: Oh, that's not nearly as bad as I expected. I think we can probably live with that given the huge speedups we're getting elsewhere. -- To view, visit http://gerrit.cloudera.org:8080/10421 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1c44cc9312c062cc7a5a4ac9156ceaa31fb887ff Gerrit-Change-Number: 10421 Gerrit-PatchSet: 2 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 23 May 2018 22:18:39 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5168: Codegen HASH PARTITIONED KrpcDataStreamSender::Send()
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/10421 ) Change subject: IMPALA-5168: Codegen HASH_PARTITIONED KrpcDataStreamSender::Send() .. Patch Set 2: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/10421 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1c44cc9312c062cc7a5a4ac9156ceaa31fb887ff Gerrit-Change-Number: 10421 Gerrit-PatchSet: 2 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 23 May 2018 22:18:43 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5168: Codegen HASH PARTITIONED KrpcDataStreamSender::Send()
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/10421 ) Change subject: IMPALA-5168: Codegen HASH_PARTITIONED KrpcDataStreamSender::Send() .. Patch Set 2: Tim, thanks for pointing that out. I tried the idea of splitting up each partition expression evaluation and hashing into a separate function. The idea is that if there are few expressions, the compiler should be smart enough to inline them so we will get the same benefit. If there are too many expressions, the compiler will do appropriate level of inlining. However, for some reasons, this doesn't quite pan out as expected. In particular, splitting each expression into a separate function resulted in slightly longer codegen time. The patch is here (https://github.com/michaelhkw/incubator-impala/commit/76d071f1bf2d82d4b0905ed2e64150f43198538f). I compared the level of regression with different wide tables using the query you suggested: widetable_250_cols: 5% regression widetable_1000_cols: 3% regression widetable_250_cols_big: 12% regression widetable_250_cols_big is created by concatenating multiple widetable_250_cols to a table with 80 rows. It's large enough to trigger 3 instances of scan nodes (instead of 1 in the case of widetable_250_col). The larger table triggers slightly different optimized code as there are 3 channels. In the case of widetable_250_col (smaller table), there is only 1 channel so the optimizer was smart enough to optimize away all the evaluate and hash functions as all rows are being added to the same channel. Anyhow, I am open to other suggestions (e.g. optimizing the patch posted above). Given the level regression observed above, do you think there are extra steps which we should take ? -- To view, visit http://gerrit.cloudera.org:8080/10421 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1c44cc9312c062cc7a5a4ac9156ceaa31fb887ff Gerrit-Change-Number: 10421 Gerrit-PatchSet: 2 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Wed, 23 May 2018 01:21:16 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5168: Codegen HASH PARTITIONED KrpcDataStreamSender::Send()
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/10421 ) Change subject: IMPALA-5168: Codegen HASH_PARTITIONED KrpcDataStreamSender::Send() .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/10421/2/be/src/runtime/krpc-data-stream-sender.cc File be/src/runtime/krpc-data-stream-sender.cc: http://gerrit.cloudera.org:8080/#/c/10421/2/be/src/runtime/krpc-data-stream-sender.cc@716 PS2, Line 716: for (int i = 0; i < partition_exprs_.size(); ++i) { > Have you done any experiments for wide rows to determine the codegen time? A concrete scenario where N could be large here is something like "select distinct * from wide_table" -- To view, visit http://gerrit.cloudera.org:8080/10421 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1c44cc9312c062cc7a5a4ac9156ceaa31fb887ff Gerrit-Change-Number: 10421 Gerrit-PatchSet: 2 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 21 May 2018 16:15:37 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5168: Codegen HASH PARTITIONED KrpcDataStreamSender::Send()
Tim Armstrong has posted comments on this change. ( http://gerrit.cloudera.org:8080/10421 ) Change subject: IMPALA-5168: Codegen HASH_PARTITIONED KrpcDataStreamSender::Send() .. Patch Set 2: (1 comment) Code looks good, just want to be sure we get ahead of any codegen time issues :) http://gerrit.cloudera.org:8080/#/c/10421/2/be/src/runtime/krpc-data-stream-sender.cc File be/src/runtime/krpc-data-stream-sender.cc: http://gerrit.cloudera.org:8080/#/c/10421/2/be/src/runtime/krpc-data-stream-sender.cc@716 PS2, Line 716: for (int i = 0; i < partition_exprs_.size(); ++i) { Have you done any experiments for wide rows to determine the codegen time? Unrolling a loop N times is the classic pattern that we've seen cause codegen time to blow up before. It would be good to get ahead of it in this instance and either do something to mitigate it (like http://gerrit.cloudera.org:8080/8211), or disable it above some number of expressions to avoid the regression -- To view, visit http://gerrit.cloudera.org:8080/10421 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1c44cc9312c062cc7a5a4ac9156ceaa31fb887ff Gerrit-Change-Number: 10421 Gerrit-PatchSet: 2 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Michael Ho Gerrit-Reviewer: Tim Armstrong Gerrit-Comment-Date: Mon, 21 May 2018 16:14:02 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5168: Codegen HASH PARTITIONED KrpcDataStreamSender::Send()
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/10421 ) Change subject: IMPALA-5168: Codegen HASH_PARTITIONED KrpcDataStreamSender::Send() .. Patch Set 2: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/10421 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1c44cc9312c062cc7a5a4ac9156ceaa31fb887ff Gerrit-Change-Number: 10421 Gerrit-PatchSet: 2 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Michael Ho Gerrit-Comment-Date: Thu, 17 May 2018 23:56:23 + Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-5168: Codegen HASH PARTITIONED KrpcDataStreamSender::Send()
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/10421 ) Change subject: IMPALA-5168: Codegen HASH_PARTITIONED KrpcDataStreamSender::Send() .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/10421/1/testdata/workloads/functional-query/queries/QueryTest/disable-codegen.test File testdata/workloads/functional-query/queries/QueryTest/disable-codegen.test: http://gerrit.cloudera.org:8080/#/c/10421/1/testdata/workloads/functional-query/queries/QueryTest/disable-codegen.test@61 PS1, Line 61: > would you be adding the tests in the next patch? Oh sorry, forgot to reply. I added it in the new file datastream-sender-codegen.test. -- To view, visit http://gerrit.cloudera.org:8080/10421 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1c44cc9312c062cc7a5a4ac9156ceaa31fb887ff Gerrit-Change-Number: 10421 Gerrit-PatchSet: 2 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Michael Ho Gerrit-Comment-Date: Thu, 17 May 2018 23:38:20 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5168: Codegen HASH PARTITIONED KrpcDataStreamSender::Send()
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/10421 ) Change subject: IMPALA-5168: Codegen HASH_PARTITIONED KrpcDataStreamSender::Send() .. Patch Set 2: (2 comments) http://gerrit.cloudera.org:8080/#/c/10421/1/be/src/runtime/krpc-data-stream-sender.cc File be/src/runtime/krpc-data-stream-sender.cc: http://gerrit.cloudera.org:8080/#/c/10421/1/be/src/runtime/krpc-data-stream-sender.cc@718 PS1, Line 718: RETURN_IF_ERROR(partition_exprs_[i]->GetCodegendComputeFn(codegen, &compute_fn)); : : // Load the expression evaluator for the i-th partition expression : llvm::Function* get_expr_eval_fn = : codegen->GetFunction(IRFunction::KRPC_DSS_GET_PART_EXPR_EVAL, false); : DCHECK(get_expr_eval_fn != nullptr); > The above is actually the unoptimized version of the code. The reason I avo sounds good http://gerrit.cloudera.org:8080/#/c/10421/1/testdata/workloads/functional-query/queries/QueryTest/disable-codegen.test File testdata/workloads/functional-query/queries/QueryTest/disable-codegen.test: http://gerrit.cloudera.org:8080/#/c/10421/1/testdata/workloads/functional-query/queries/QueryTest/disable-codegen.test@61 PS1, Line 61: > add cases for "codegen not supported/needed" would you be adding the tests in the next patch? -- To view, visit http://gerrit.cloudera.org:8080/10421 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1c44cc9312c062cc7a5a4ac9156ceaa31fb887ff Gerrit-Change-Number: 10421 Gerrit-PatchSet: 2 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Michael Ho Gerrit-Comment-Date: Thu, 17 May 2018 22:40:42 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5168: Codegen HASH PARTITIONED KrpcDataStreamSender::Send()
Michael Ho has posted comments on this change. ( http://gerrit.cloudera.org:8080/10421 ) Change subject: IMPALA-5168: Codegen HASH_PARTITIONED KrpcDataStreamSender::Send() .. Patch Set 2: (4 comments) http://gerrit.cloudera.org:8080/#/c/10421/1/be/src/runtime/krpc-data-stream-sender.cc File be/src/runtime/krpc-data-stream-sender.cc: http://gerrit.cloudera.org:8080/#/c/10421/1/be/src/runtime/krpc-data-stream-sender.cc@718 PS1, Line 718: RETURN_IF_ERROR(partition_exprs_[i]->GetCodegendComputeFn(codegen, &compute_fn)); : : // Load the expression evaluator for the i-th partition expression : llvm::Function* get_expr_eval_fn = : codegen->GetFunction(IRFunction::KRPC_DSS_GET_PART_EXPR_EVAL, false); : DCHECK(get_expr_eval_fn != nullptr); > can we use Builder.CreateExtractValue() and get rid of this function call. The above is actually the unoptimized version of the code. The reason I avoided using CreateExtractValue() or GetElementPtr() etc is to avoid the need to make assumption about the order of the fields inside the struct plus it's easier to review in general. I confirmed in the optimized version of the code that this function call is actually inlined and optimized away as dead code. http://gerrit.cloudera.org:8080/#/c/10421/1/be/src/runtime/krpc-data-stream-sender.cc@803 PS1, Line 803: t string sender_name = PartitionTypeName() + " Send > nit: Invert condition to avoid indentation. Done http://gerrit.cloudera.org:8080/#/c/10421/1/be/src/runtime/krpc-data-stream-sender.cc@818 PS1, Line 818: ced; > nit: HashRow() Done http://gerrit.cloudera.org:8080/#/c/10421/1/be/src/runtime/krpc-data-stream-sender.cc@820 PS1, Line 820: en->ReplaceCallSitesWithValue(hash_and_add_ > nit: maybe use a named const like Done -- To view, visit http://gerrit.cloudera.org:8080/10421 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1c44cc9312c062cc7a5a4ac9156ceaa31fb887ff Gerrit-Change-Number: 10421 Gerrit-PatchSet: 2 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Bikramjeet Vig Gerrit-Reviewer: Michael Ho Gerrit-Comment-Date: Thu, 17 May 2018 22:11:49 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5168: Codegen HASH PARTITIONED KrpcDataStreamSender::Send()
Michael Ho has uploaded a new patch set (#2). ( http://gerrit.cloudera.org:8080/10421 ) Change subject: IMPALA-5168: Codegen HASH_PARTITIONED KrpcDataStreamSender::Send() .. IMPALA-5168: Codegen HASH_PARTITIONED KrpcDataStreamSender::Send() This change codegens the hash partitioning logic of KrpcDataStreamSender::Send() when the partitioning strategy is HASH_PARTITIONED. It does so by unrolling the loop which evaluates each row against the partitioning expressions and hashes the result. It also replaces the number of channels of that sender with a constant at runtime. With this change, we get reasonable speedup with some benchmarks: ++---+-++++ | Workload | File Format | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) | ++---+-++++ | TPCH(_300) | parquet / none / none | 20.03 | -6.44% | 13.56 | -7.15% | ++---+-++++ +-+---+-++++ | Workload| File Format | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) | +-+---+-++++ | TARGETED-PERF(_300) | parquet / none / none | 58.59 | -5.56% | 12.28 | -5.30% | +-+---+-++++ +-+---+-++++ | Workload| File Format | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) | +-+---+-++++ | TPCDS-UNMODIFIED(_1000) | parquet / none / none | 15.60 | -3.10% | 7.16 | -4.33% | +-+---+-++++ +---+---+-++++ | Workload | File Format | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) | +---+---+-++++ | TPCH_NESTED(_300) | parquet / none / none | 30.93 | -3.02% | 17.46 | -4.71% | +---+---+-++++ Change-Id: I1c44cc9312c062cc7a5a4ac9156ceaa31fb887ff --- M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/impala-ir.cc M be/src/exec/data-sink.cc M be/src/exec/data-sink.h M be/src/exec/exchange-node.cc M be/src/exec/exec-node.cc M be/src/exec/exec-node.h M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/partial-sort-node.cc M be/src/exec/partitioned-aggregation-node.cc M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/partitioned-hash-join-node.cc M be/src/exec/sort-node.cc M be/src/exec/topn-node.cc M be/src/runtime/CMakeLists.txt M be/src/runtime/fragment-instance-state.cc A be/src/runtime/krpc-data-stream-sender-ir.cc M be/src/runtime/krpc-data-stream-sender.cc M be/src/runtime/krpc-data-stream-sender.h M be/src/runtime/raw-value-ir.cc M be/src/runtime/raw-value.cc M be/src/runtime/runtime-state.h M be/src/util/runtime-profile.h A testdata/workloads/functional-query/queries/QueryTest/datastream-sender-codegen.test M tests/query_test/test_codegen.py 25 files changed, 427 insertions(+), 100 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/10421/2 -- To view, visit http://gerrit.cloudera.org:8080/10421 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newpatchset Gerrit-Change-Id: I1c44cc9312c062cc7a5a4ac9156ceaa31fb887ff Gerrit-Change-Number: 10421 Gerrit-PatchSet: 2 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Bikramjeet Vig
[Impala-ASF-CR] IMPALA-5168: Codegen HASH PARTITIONED KrpcDataStreamSender::Send()
Bikramjeet Vig has posted comments on this change. ( http://gerrit.cloudera.org:8080/10421 ) Change subject: IMPALA-5168: Codegen HASH_PARTITIONED KrpcDataStreamSender::Send() .. Patch Set 1: (5 comments) http://gerrit.cloudera.org:8080/#/c/10421/1/be/src/runtime/krpc-data-stream-sender.cc File be/src/runtime/krpc-data-stream-sender.cc: http://gerrit.cloudera.org:8080/#/c/10421/1/be/src/runtime/krpc-data-stream-sender.cc@718 PS1, Line 718: // Load the expression evaluator for the i-th partition expression : llvm::Function* get_expr_eval_fn = : codegen->GetFunction(IRFunction::KRPC_DSS_GET_PART_EXPR_EVAL, false); : DCHECK(get_expr_eval_fn != nullptr); : llvm::Value* expr_eval_arg = : builder.CreateCall(get_expr_eval_fn, {this_arg, codegen->GetI32Constant(i)}); can we use Builder.CreateExtractValue() and get rid of this function call. (Looks like something that the llvm optimizer could have inlined but didn't according to the sample generated code above) http://gerrit.cloudera.org:8080/#/c/10421/1/be/src/runtime/krpc-data-stream-sender.cc@803 PS1, Line 803: partition_type_ == TPartitionType::HASH_PARTITIONED nit: Invert condition to avoid indentation. if (partition_type_ != TPartitionType::HASH_PARTITIONED) { const string& msg = Substitute("not $0", partition_type_ == TPartitionType::KUDU ? "supported" : "needed"); profile()->AddCodegenMsg(false, msg, sender_name); return; } llvm::Function* hash_row_fn; codegen_status = CodegenHashRow(codegen, &hash_row_fn); . . . http://gerrit.cloudera.org:8080/#/c/10421/1/be/src/runtime/krpc-data-stream-sender.cc@818 PS1, Line 818: HashRows nit: HashRow() http://gerrit.cloudera.org:8080/#/c/10421/1/be/src/runtime/krpc-data-stream-sender.cc@820 PS1, Line 820: KrpcDataStreamSender7HashRowEPNS_8TupleRowE nit: maybe use a named const like const char* HASH_ROW_SYMBOL http://gerrit.cloudera.org:8080/#/c/10421/1/testdata/workloads/functional-query/queries/QueryTest/disable-codegen.test File testdata/workloads/functional-query/queries/QueryTest/disable-codegen.test: http://gerrit.cloudera.org:8080/#/c/10421/1/testdata/workloads/functional-query/queries/QueryTest/disable-codegen.test@61 PS1, Line 61: QUERY add cases for "codegen not supported/needed" -- To view, visit http://gerrit.cloudera.org:8080/10421 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: comment Gerrit-Change-Id: I1c44cc9312c062cc7a5a4ac9156ceaa31fb887ff Gerrit-Change-Number: 10421 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Ho Gerrit-Reviewer: Bikramjeet Vig Gerrit-Comment-Date: Wed, 16 May 2018 23:42:33 + Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-5168: Codegen HASH PARTITIONED KrpcDataStreamSender::Send()
Michael Ho has uploaded this change for review. ( http://gerrit.cloudera.org:8080/10421 Change subject: IMPALA-5168: Codegen HASH_PARTITIONED KrpcDataStreamSender::Send() .. IMPALA-5168: Codegen HASH_PARTITIONED KrpcDataStreamSender::Send() This change codegens the hash partitioning logic of KrpcDataStreamSender::Send() when the partitioning strategy is HASH_PARTITIONED. It does so by unrolling the loop which evaluates each row against the partitioning expressions and hashes the result. It also replaces the number of channels of that sender with a constant at runtime. With this change, we get reasonable speedup with some benchmarks: ++---+-++++ | Workload | File Format | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) | ++---+-++++ | TPCH(_300) | parquet / none / none | 20.03 | -6.44% | 13.56 | -7.15% | ++---+-++++ +-+---+-++++ | Workload| File Format | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) | +-+---+-++++ | TARGETED-PERF(_300) | parquet / none / none | 58.59 | -5.56% | 12.28 | -5.30% | +-+---+-++++ +-+---+-++++ | Workload| File Format | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) | +-+---+-++++ | TPCDS-UNMODIFIED(_1000) | parquet / none / none | 15.60 | -3.10% | 7.16 | -4.33% | +-+---+-++++ +---+---+-++++ | Workload | File Format | Avg (s) | Delta(Avg) | GeoMean(s) | Delta(GeoMean) | +---+---+-++++ | TPCH_NESTED(_300) | parquet / none / none | 30.93 | -3.02% | 17.46 | -4.71% | +---+---+-++++ Change-Id: I1c44cc9312c062cc7a5a4ac9156ceaa31fb887ff --- M be/src/codegen/gen_ir_descriptions.py M be/src/codegen/impala-ir.cc M be/src/exec/data-sink.cc M be/src/exec/data-sink.h M be/src/exec/exchange-node.cc M be/src/exec/exec-node.cc M be/src/exec/exec-node.h M be/src/exec/hdfs-scan-node-base.cc M be/src/exec/partial-sort-node.cc M be/src/exec/partitioned-aggregation-node.cc M be/src/exec/partitioned-hash-join-builder.cc M be/src/exec/partitioned-hash-join-node.cc M be/src/exec/sort-node.cc M be/src/exec/topn-node.cc M be/src/runtime/CMakeLists.txt M be/src/runtime/fragment-instance-state.cc A be/src/runtime/krpc-data-stream-sender-ir.cc M be/src/runtime/krpc-data-stream-sender.cc M be/src/runtime/krpc-data-stream-sender.h M be/src/runtime/raw-value-ir.cc M be/src/runtime/raw-value.cc M be/src/runtime/runtime-state.h M be/src/util/runtime-profile.h M testdata/workloads/functional-query/queries/QueryTest/disable-codegen.test 24 files changed, 397 insertions(+), 99 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/10421/1 -- To view, visit http://gerrit.cloudera.org:8080/10421 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-MessageType: newchange Gerrit-Change-Id: I1c44cc9312c062cc7a5a4ac9156ceaa31fb887ff Gerrit-Change-Number: 10421 Gerrit-PatchSet: 1 Gerrit-Owner: Michael Ho