[Impala-ASF-CR] IMPALA-5168: Codegen HASH PARTITIONED KrpcDataStreamSender::Send()

2018-06-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged. ( 
http://gerrit.cloudera.org:8080/10421 )

Change subject: IMPALA-5168: Codegen HASH_PARTITIONED 
KrpcDataStreamSender::Send()
..

IMPALA-5168: Codegen HASH_PARTITIONED KrpcDataStreamSender::Send()

This change codegens the hash partitioning logic of
KrpcDataStreamSender::Send() when the partitioning strategy
is HASH_PARTITIONED. It does so by unrolling the loop which
evaluates each row against the partitioning expressions and
hashes the result. It also replaces the number of channels
of that sender with a constant at runtime.

With this change, we get reasonable speedup with some benchmarks:

++---+-++++
| Workload   | File Format   | Avg (s) | Delta(Avg) | GeoMean(s) | 
Delta(GeoMean) |
++---+-++++
| TPCH(_300) | parquet / none / none | 20.03   | -6.44% | 13.56  | 
-7.15% |
++---+-++++

+-+---+-++++
| Workload| File Format   | Avg (s) | Delta(Avg) | 
GeoMean(s) | Delta(GeoMean) |
+-+---+-++++
| TARGETED-PERF(_300) | parquet / none / none | 58.59   | -5.56% | 12.28
  | -5.30% |
+-+---+-++++

+-+---+-++++
| Workload| File Format   | Avg (s) | Delta(Avg) | 
GeoMean(s) | Delta(GeoMean) |
+-+---+-++++
| TPCDS-UNMODIFIED(_1000) | parquet / none / none | 15.60   | -3.10% | 7.16 
  | -4.33% |
+-+---+-++++

+---+---+-++++
| Workload  | File Format   | Avg (s) | Delta(Avg) | GeoMean(s) 
| Delta(GeoMean) |
+---+---+-++++
| TPCH_NESTED(_300) | parquet / none / none | 30.93   | -3.02% | 17.46  
| -4.71% |
+---+---+-++++

Change-Id: I1c44cc9312c062cc7a5a4ac9156ceaa31fb887ff
Reviewed-on: http://gerrit.cloudera.org:8080/10421
Reviewed-by: Michael Ho 
Tested-by: Impala Public Jenkins 
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/exchange-node.cc
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/partial-sort-node.cc
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-builder.h
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/sort-node.cc
M be/src/exec/topn-node.cc
M be/src/runtime/CMakeLists.txt
M be/src/runtime/fragment-instance-state.cc
A be/src/runtime/krpc-data-stream-sender-ir.cc
M be/src/runtime/krpc-data-stream-sender.cc
M be/src/runtime/krpc-data-stream-sender.h
M be/src/runtime/raw-value-ir.cc
M be/src/runtime/raw-value.cc
M be/src/runtime/runtime-state.h
M be/src/util/runtime-profile.h
A 
testdata/workloads/functional-query/queries/QueryTest/datastream-sender-codegen.test
M tests/query_test/test_codegen.py
26 files changed, 425 insertions(+), 101 deletions(-)

Approvals:
  Michael Ho: Looks good to me, approved
  Impala Public Jenkins: Verified

--
To view, visit http://gerrit.cloudera.org:8080/10421
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: merged
Gerrit-Change-Id: I1c44cc9312c062cc7a5a4ac9156ceaa31fb887ff
Gerrit-Change-Number: 10421
Gerrit-PatchSet: 10
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5168: Codegen HASH PARTITIONED KrpcDataStreamSender::Send()

2018-06-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10421 )

Change subject: IMPALA-5168: Codegen HASH_PARTITIONED 
KrpcDataStreamSender::Send()
..


Patch Set 9: Verified+1


--
To view, visit http://gerrit.cloudera.org:8080/10421
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1c44cc9312c062cc7a5a4ac9156ceaa31fb887ff
Gerrit-Change-Number: 10421
Gerrit-PatchSet: 9
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 14 Jun 2018 23:36:59 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5168: Codegen HASH PARTITIONED KrpcDataStreamSender::Send()

2018-06-14 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10421 )

Change subject: IMPALA-5168: Codegen HASH_PARTITIONED 
KrpcDataStreamSender::Send()
..


Patch Set 9:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/2672/


--
To view, visit http://gerrit.cloudera.org:8080/10421
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1c44cc9312c062cc7a5a4ac9156ceaa31fb887ff
Gerrit-Change-Number: 10421
Gerrit-PatchSet: 9
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 14 Jun 2018 20:08:52 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5168: Codegen HASH PARTITIONED KrpcDataStreamSender::Send()

2018-06-14 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10421 )

Change subject: IMPALA-5168: Codegen HASH_PARTITIONED 
KrpcDataStreamSender::Send()
..


Patch Set 9: Code-Review+2

Fixed another clang-tidy error missed in previous patch.


--
To view, visit http://gerrit.cloudera.org:8080/10421
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1c44cc9312c062cc7a5a4ac9156ceaa31fb887ff
Gerrit-Change-Number: 10421
Gerrit-PatchSet: 9
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Thu, 14 Jun 2018 20:04:40 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5168: Codegen HASH PARTITIONED KrpcDataStreamSender::Send()

2018-06-14 Thread Michael Ho (Code Review)
Hello Tim Armstrong, Bikramjeet Vig, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10421

to look at the new patch set (#8).

Change subject: IMPALA-5168: Codegen HASH_PARTITIONED 
KrpcDataStreamSender::Send()
..

IMPALA-5168: Codegen HASH_PARTITIONED KrpcDataStreamSender::Send()

This change codegens the hash partitioning logic of
KrpcDataStreamSender::Send() when the partitioning strategy
is HASH_PARTITIONED. It does so by unrolling the loop which
evaluates each row against the partitioning expressions and
hashes the result. It also replaces the number of channels
of that sender with a constant at runtime.

With this change, we get reasonable speedup with some benchmarks:

++---+-++++
| Workload   | File Format   | Avg (s) | Delta(Avg) | GeoMean(s) | 
Delta(GeoMean) |
++---+-++++
| TPCH(_300) | parquet / none / none | 20.03   | -6.44% | 13.56  | 
-7.15% |
++---+-++++

+-+---+-++++
| Workload| File Format   | Avg (s) | Delta(Avg) | 
GeoMean(s) | Delta(GeoMean) |
+-+---+-++++
| TARGETED-PERF(_300) | parquet / none / none | 58.59   | -5.56% | 12.28
  | -5.30% |
+-+---+-++++

+-+---+-++++
| Workload| File Format   | Avg (s) | Delta(Avg) | 
GeoMean(s) | Delta(GeoMean) |
+-+---+-++++
| TPCDS-UNMODIFIED(_1000) | parquet / none / none | 15.60   | -3.10% | 7.16 
  | -4.33% |
+-+---+-++++

+---+---+-++++
| Workload  | File Format   | Avg (s) | Delta(Avg) | GeoMean(s) 
| Delta(GeoMean) |
+---+---+-++++
| TPCH_NESTED(_300) | parquet / none / none | 30.93   | -3.02% | 17.46  
| -4.71% |
+---+---+-++++

Change-Id: I1c44cc9312c062cc7a5a4ac9156ceaa31fb887ff
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/exchange-node.cc
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/partial-sort-node.cc
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-builder.h
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/sort-node.cc
M be/src/exec/topn-node.cc
M be/src/runtime/CMakeLists.txt
M be/src/runtime/fragment-instance-state.cc
A be/src/runtime/krpc-data-stream-sender-ir.cc
M be/src/runtime/krpc-data-stream-sender.cc
M be/src/runtime/krpc-data-stream-sender.h
M be/src/runtime/raw-value-ir.cc
M be/src/runtime/raw-value.cc
M be/src/runtime/runtime-state.h
M be/src/util/runtime-profile.h
A 
testdata/workloads/functional-query/queries/QueryTest/datastream-sender-codegen.test
M tests/query_test/test_codegen.py
26 files changed, 425 insertions(+), 101 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/10421/8
--
To view, visit http://gerrit.cloudera.org:8080/10421
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1c44cc9312c062cc7a5a4ac9156ceaa31fb887ff
Gerrit-Change-Number: 10421
Gerrit-PatchSet: 8
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5168: Codegen HASH PARTITIONED KrpcDataStreamSender::Send()

2018-06-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10421 )

Change subject: IMPALA-5168: Codegen HASH_PARTITIONED 
KrpcDataStreamSender::Send()
..


Patch Set 7: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/2655/


--
To view, visit http://gerrit.cloudera.org:8080/10421
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1c44cc9312c062cc7a5a4ac9156ceaa31fb887ff
Gerrit-Change-Number: 10421
Gerrit-PatchSet: 7
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 13 Jun 2018 22:16:11 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5168: Codegen HASH PARTITIONED KrpcDataStreamSender::Send()

2018-06-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10421 )

Change subject: IMPALA-5168: Codegen HASH_PARTITIONED 
KrpcDataStreamSender::Send()
..


Patch Set 7: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/10421
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1c44cc9312c062cc7a5a4ac9156ceaa31fb887ff
Gerrit-Change-Number: 10421
Gerrit-PatchSet: 7
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 13 Jun 2018 18:49:25 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5168: Codegen HASH PARTITIONED KrpcDataStreamSender::Send()

2018-06-13 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10421 )

Change subject: IMPALA-5168: Codegen HASH_PARTITIONED 
KrpcDataStreamSender::Send()
..


Patch Set 7:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/2655/


--
To view, visit http://gerrit.cloudera.org:8080/10421
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1c44cc9312c062cc7a5a4ac9156ceaa31fb887ff
Gerrit-Change-Number: 10421
Gerrit-PatchSet: 7
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 13 Jun 2018 18:49:26 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5168: Codegen HASH PARTITIONED KrpcDataStreamSender::Send()

2018-06-13 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10421 )

Change subject: IMPALA-5168: Codegen HASH_PARTITIONED 
KrpcDataStreamSender::Send()
..


Patch Set 6: Code-Review+2

Fix clang-tidy issue. Changed the signature of DataSink::Codegen() to match 
that of phj-builder. Ran more sanity tests. Carry +2.


--
To view, visit http://gerrit.cloudera.org:8080/10421
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1c44cc9312c062cc7a5a4ac9156ceaa31fb887ff
Gerrit-Change-Number: 10421
Gerrit-PatchSet: 6
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 13 Jun 2018 18:41:42 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5168: Codegen HASH PARTITIONED KrpcDataStreamSender::Send()

2018-06-13 Thread Michael Ho (Code Review)
Hello Tim Armstrong, Bikramjeet Vig, Impala Public Jenkins,

I'd like you to reexamine a change. Please visit

http://gerrit.cloudera.org:8080/10421

to look at the new patch set (#6).

Change subject: IMPALA-5168: Codegen HASH_PARTITIONED 
KrpcDataStreamSender::Send()
..

IMPALA-5168: Codegen HASH_PARTITIONED KrpcDataStreamSender::Send()

This change codegens the hash partitioning logic of
KrpcDataStreamSender::Send() when the partitioning strategy
is HASH_PARTITIONED. It does so by unrolling the loop which
evaluates each row against the partitioning expressions and
hashes the result. It also replaces the number of channels
of that sender with a constant at runtime.

With this change, we get reasonable speedup with some benchmarks:

++---+-++++
| Workload   | File Format   | Avg (s) | Delta(Avg) | GeoMean(s) | 
Delta(GeoMean) |
++---+-++++
| TPCH(_300) | parquet / none / none | 20.03   | -6.44% | 13.56  | 
-7.15% |
++---+-++++

+-+---+-++++
| Workload| File Format   | Avg (s) | Delta(Avg) | 
GeoMean(s) | Delta(GeoMean) |
+-+---+-++++
| TARGETED-PERF(_300) | parquet / none / none | 58.59   | -5.56% | 12.28
  | -5.30% |
+-+---+-++++

+-+---+-++++
| Workload| File Format   | Avg (s) | Delta(Avg) | 
GeoMean(s) | Delta(GeoMean) |
+-+---+-++++
| TPCDS-UNMODIFIED(_1000) | parquet / none / none | 15.60   | -3.10% | 7.16 
  | -4.33% |
+-+---+-++++

+---+---+-++++
| Workload  | File Format   | Avg (s) | Delta(Avg) | GeoMean(s) 
| Delta(GeoMean) |
+---+---+-++++
| TPCH_NESTED(_300) | parquet / none / none | 30.93   | -3.02% | 17.46  
| -4.71% |
+---+---+-++++

Change-Id: I1c44cc9312c062cc7a5a4ac9156ceaa31fb887ff
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/exchange-node.cc
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/partial-sort-node.cc
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-builder.h
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/sort-node.cc
M be/src/exec/topn-node.cc
M be/src/runtime/CMakeLists.txt
M be/src/runtime/fragment-instance-state.cc
A be/src/runtime/krpc-data-stream-sender-ir.cc
M be/src/runtime/krpc-data-stream-sender.cc
M be/src/runtime/krpc-data-stream-sender.h
M be/src/runtime/raw-value-ir.cc
M be/src/runtime/raw-value.cc
M be/src/runtime/runtime-state.h
M be/src/util/runtime-profile.h
A 
testdata/workloads/functional-query/queries/QueryTest/datastream-sender-codegen.test
M tests/query_test/test_codegen.py
26 files changed, 424 insertions(+), 101 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/10421/6
--
To view, visit http://gerrit.cloudera.org:8080/10421
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1c44cc9312c062cc7a5a4ac9156ceaa31fb887ff
Gerrit-Change-Number: 10421
Gerrit-PatchSet: 6
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 


[Impala-ASF-CR] IMPALA-5168: Codegen HASH PARTITIONED KrpcDataStreamSender::Send()

2018-06-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10421 )

Change subject: IMPALA-5168: Codegen HASH_PARTITIONED 
KrpcDataStreamSender::Send()
..


Patch Set 5: Verified-1

Build failed: https://jenkins.impala.io/job/gerrit-verify-dryrun/2640/


--
To view, visit http://gerrit.cloudera.org:8080/10421
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1c44cc9312c062cc7a5a4ac9156ceaa31fb887ff
Gerrit-Change-Number: 10421
Gerrit-PatchSet: 5
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 12 Jun 2018 21:32:52 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5168: Codegen HASH PARTITIONED KrpcDataStreamSender::Send()

2018-06-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10421 )

Change subject: IMPALA-5168: Codegen HASH_PARTITIONED 
KrpcDataStreamSender::Send()
..


Patch Set 5: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/10421
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1c44cc9312c062cc7a5a4ac9156ceaa31fb887ff
Gerrit-Change-Number: 10421
Gerrit-PatchSet: 5
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 12 Jun 2018 18:15:37 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5168: Codegen HASH PARTITIONED KrpcDataStreamSender::Send()

2018-06-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10421 )

Change subject: IMPALA-5168: Codegen HASH_PARTITIONED 
KrpcDataStreamSender::Send()
..


Patch Set 5:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/2640/


--
To view, visit http://gerrit.cloudera.org:8080/10421
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1c44cc9312c062cc7a5a4ac9156ceaa31fb887ff
Gerrit-Change-Number: 10421
Gerrit-PatchSet: 5
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 12 Jun 2018 18:15:38 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5168: Codegen HASH PARTITIONED KrpcDataStreamSender::Send()

2018-06-12 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10421 )

Change subject: IMPALA-5168: Codegen HASH_PARTITIONED 
KrpcDataStreamSender::Send()
..


Patch Set 4: Code-Review+2

Rebase. Carry +2.


--
To view, visit http://gerrit.cloudera.org:8080/10421
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1c44cc9312c062cc7a5a4ac9156ceaa31fb887ff
Gerrit-Change-Number: 10421
Gerrit-PatchSet: 4
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Tue, 12 Jun 2018 18:14:18 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5168: Codegen HASH PARTITIONED KrpcDataStreamSender::Send()

2018-05-25 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10421 )

Change subject: IMPALA-5168: Codegen HASH_PARTITIONED 
KrpcDataStreamSender::Send()
..


Patch Set 3:

Build started: https://jenkins.impala.io/job/gerrit-verify-dryrun/2556/


--
To view, visit http://gerrit.cloudera.org:8080/10421
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1c44cc9312c062cc7a5a4ac9156ceaa31fb887ff
Gerrit-Change-Number: 10421
Gerrit-PatchSet: 3
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Impala Public Jenkins 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 25 May 2018 18:01:54 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5168: Codegen HASH PARTITIONED KrpcDataStreamSender::Send()

2018-05-25 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10421 )

Change subject: IMPALA-5168: Codegen HASH_PARTITIONED 
KrpcDataStreamSender::Send()
..


Patch Set 3: Code-Review+2

Rebase. Carry +2.


--
To view, visit http://gerrit.cloudera.org:8080/10421
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1c44cc9312c062cc7a5a4ac9156ceaa31fb887ff
Gerrit-Change-Number: 10421
Gerrit-PatchSet: 3
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Fri, 25 May 2018 18:01:11 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5168: Codegen HASH PARTITIONED KrpcDataStreamSender::Send()

2018-05-23 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10421 )

Change subject: IMPALA-5168: Codegen HASH_PARTITIONED 
KrpcDataStreamSender::Send()
..


Patch Set 2:

Oh, that's not nearly as bad as I expected. I think we can probably live with 
that given the huge speedups we're getting elsewhere.


--
To view, visit http://gerrit.cloudera.org:8080/10421
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1c44cc9312c062cc7a5a4ac9156ceaa31fb887ff
Gerrit-Change-Number: 10421
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 23 May 2018 22:18:39 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5168: Codegen HASH PARTITIONED KrpcDataStreamSender::Send()

2018-05-23 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10421 )

Change subject: IMPALA-5168: Codegen HASH_PARTITIONED 
KrpcDataStreamSender::Send()
..


Patch Set 2: Code-Review+2


--
To view, visit http://gerrit.cloudera.org:8080/10421
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1c44cc9312c062cc7a5a4ac9156ceaa31fb887ff
Gerrit-Change-Number: 10421
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 23 May 2018 22:18:43 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5168: Codegen HASH PARTITIONED KrpcDataStreamSender::Send()

2018-05-22 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10421 )

Change subject: IMPALA-5168: Codegen HASH_PARTITIONED 
KrpcDataStreamSender::Send()
..


Patch Set 2:

Tim, thanks for pointing that out. I tried the idea of splitting up each 
partition expression evaluation and hashing into a separate function. The idea 
is that if there are few expressions, the compiler should be smart enough to 
inline them so we will get the same benefit. If there are too many expressions, 
the compiler will do appropriate level of inlining. However, for some reasons, 
this doesn't quite pan out as expected. In particular, splitting each 
expression into a separate function resulted in slightly longer codegen time. 
The patch is here 
(https://github.com/michaelhkw/incubator-impala/commit/76d071f1bf2d82d4b0905ed2e64150f43198538f).
 I compared the level of regression with different wide tables using the query 
you suggested:

widetable_250_cols: 5% regression
widetable_1000_cols: 3% regression
widetable_250_cols_big: 12% regression

widetable_250_cols_big is created by concatenating multiple widetable_250_cols 
to a table with 80 rows. It's large enough to trigger 3 instances of scan 
nodes (instead of 1 in the case of widetable_250_col). The larger table 
triggers slightly different optimized code as there are 3 channels. In the case 
of widetable_250_col (smaller table), there is only 1 channel so the optimizer 
was smart enough to optimize away all the evaluate and hash functions  as all 
rows are being added to the same channel.

Anyhow, I am open to other suggestions (e.g. optimizing the patch posted 
above). Given the level regression observed above, do you think there are extra 
steps which we should take ?


--
To view, visit http://gerrit.cloudera.org:8080/10421
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1c44cc9312c062cc7a5a4ac9156ceaa31fb887ff
Gerrit-Change-Number: 10421
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Wed, 23 May 2018 01:21:16 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5168: Codegen HASH PARTITIONED KrpcDataStreamSender::Send()

2018-05-21 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10421 )

Change subject: IMPALA-5168: Codegen HASH_PARTITIONED 
KrpcDataStreamSender::Send()
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10421/2/be/src/runtime/krpc-data-stream-sender.cc
File be/src/runtime/krpc-data-stream-sender.cc:

http://gerrit.cloudera.org:8080/#/c/10421/2/be/src/runtime/krpc-data-stream-sender.cc@716
PS2, Line 716:   for (int i = 0; i < partition_exprs_.size(); ++i) {
> Have you done any experiments for wide rows to determine the codegen time?
A concrete scenario where N could be large here is something like "select 
distinct * from wide_table"



--
To view, visit http://gerrit.cloudera.org:8080/10421
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1c44cc9312c062cc7a5a4ac9156ceaa31fb887ff
Gerrit-Change-Number: 10421
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 21 May 2018 16:15:37 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5168: Codegen HASH PARTITIONED KrpcDataStreamSender::Send()

2018-05-21 Thread Tim Armstrong (Code Review)
Tim Armstrong has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10421 )

Change subject: IMPALA-5168: Codegen HASH_PARTITIONED 
KrpcDataStreamSender::Send()
..


Patch Set 2:

(1 comment)

Code looks good, just want to be sure we get ahead of any codegen time issues :)

http://gerrit.cloudera.org:8080/#/c/10421/2/be/src/runtime/krpc-data-stream-sender.cc
File be/src/runtime/krpc-data-stream-sender.cc:

http://gerrit.cloudera.org:8080/#/c/10421/2/be/src/runtime/krpc-data-stream-sender.cc@716
PS2, Line 716:   for (int i = 0; i < partition_exprs_.size(); ++i) {
Have you done any experiments for wide rows to determine the codegen time? 
Unrolling a loop N times is the classic pattern that we've seen cause codegen 
time to blow up before.

It would be good to get ahead of it in this instance and either do something to 
mitigate it (like http://gerrit.cloudera.org:8080/8211), or disable it above 
some number of expressions to avoid the regression



--
To view, visit http://gerrit.cloudera.org:8080/10421
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1c44cc9312c062cc7a5a4ac9156ceaa31fb887ff
Gerrit-Change-Number: 10421
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Michael Ho 
Gerrit-Reviewer: Tim Armstrong 
Gerrit-Comment-Date: Mon, 21 May 2018 16:14:02 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5168: Codegen HASH PARTITIONED KrpcDataStreamSender::Send()

2018-05-17 Thread Bikramjeet Vig (Code Review)
Bikramjeet Vig has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10421 )

Change subject: IMPALA-5168: Codegen HASH_PARTITIONED 
KrpcDataStreamSender::Send()
..


Patch Set 2: Code-Review+1


--
To view, visit http://gerrit.cloudera.org:8080/10421
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1c44cc9312c062cc7a5a4ac9156ceaa31fb887ff
Gerrit-Change-Number: 10421
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Michael Ho 
Gerrit-Comment-Date: Thu, 17 May 2018 23:56:23 +
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-5168: Codegen HASH PARTITIONED KrpcDataStreamSender::Send()

2018-05-17 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10421 )

Change subject: IMPALA-5168: Codegen HASH_PARTITIONED 
KrpcDataStreamSender::Send()
..


Patch Set 2:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/10421/1/testdata/workloads/functional-query/queries/QueryTest/disable-codegen.test
File testdata/workloads/functional-query/queries/QueryTest/disable-codegen.test:

http://gerrit.cloudera.org:8080/#/c/10421/1/testdata/workloads/functional-query/queries/QueryTest/disable-codegen.test@61
PS1, Line 61:
> would you be adding the tests in the next patch?
Oh sorry, forgot to reply. I added it in the new file 
datastream-sender-codegen.test.



--
To view, visit http://gerrit.cloudera.org:8080/10421
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1c44cc9312c062cc7a5a4ac9156ceaa31fb887ff
Gerrit-Change-Number: 10421
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Michael Ho 
Gerrit-Comment-Date: Thu, 17 May 2018 23:38:20 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5168: Codegen HASH PARTITIONED KrpcDataStreamSender::Send()

2018-05-17 Thread Bikramjeet Vig (Code Review)
Bikramjeet Vig has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10421 )

Change subject: IMPALA-5168: Codegen HASH_PARTITIONED 
KrpcDataStreamSender::Send()
..


Patch Set 2:

(2 comments)

http://gerrit.cloudera.org:8080/#/c/10421/1/be/src/runtime/krpc-data-stream-sender.cc
File be/src/runtime/krpc-data-stream-sender.cc:

http://gerrit.cloudera.org:8080/#/c/10421/1/be/src/runtime/krpc-data-stream-sender.cc@718
PS1, Line 718:  
RETURN_IF_ERROR(partition_exprs_[i]->GetCodegendComputeFn(codegen, 
&compute_fn));
 :
 : // Load the expression evaluator for the i-th partition 
expression
 : llvm::Function* get_expr_eval_fn =
 : 
codegen->GetFunction(IRFunction::KRPC_DSS_GET_PART_EXPR_EVAL, false);
 : DCHECK(get_expr_eval_fn != nullptr);
> The above is actually the unoptimized version of the code. The reason I avo
sounds good


http://gerrit.cloudera.org:8080/#/c/10421/1/testdata/workloads/functional-query/queries/QueryTest/disable-codegen.test
File testdata/workloads/functional-query/queries/QueryTest/disable-codegen.test:

http://gerrit.cloudera.org:8080/#/c/10421/1/testdata/workloads/functional-query/queries/QueryTest/disable-codegen.test@61
PS1, Line 61:
> add cases for "codegen not supported/needed"
would you be adding the tests in the next patch?



--
To view, visit http://gerrit.cloudera.org:8080/10421
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1c44cc9312c062cc7a5a4ac9156ceaa31fb887ff
Gerrit-Change-Number: 10421
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Michael Ho 
Gerrit-Comment-Date: Thu, 17 May 2018 22:40:42 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5168: Codegen HASH PARTITIONED KrpcDataStreamSender::Send()

2018-05-17 Thread Michael Ho (Code Review)
Michael Ho has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10421 )

Change subject: IMPALA-5168: Codegen HASH_PARTITIONED 
KrpcDataStreamSender::Send()
..


Patch Set 2:

(4 comments)

http://gerrit.cloudera.org:8080/#/c/10421/1/be/src/runtime/krpc-data-stream-sender.cc
File be/src/runtime/krpc-data-stream-sender.cc:

http://gerrit.cloudera.org:8080/#/c/10421/1/be/src/runtime/krpc-data-stream-sender.cc@718
PS1, Line 718:  
RETURN_IF_ERROR(partition_exprs_[i]->GetCodegendComputeFn(codegen, 
&compute_fn));
 :
 : // Load the expression evaluator for the i-th partition 
expression
 : llvm::Function* get_expr_eval_fn =
 : 
codegen->GetFunction(IRFunction::KRPC_DSS_GET_PART_EXPR_EVAL, false);
 : DCHECK(get_expr_eval_fn != nullptr);
> can we use Builder.CreateExtractValue() and get rid of this function call.
The above is actually the unoptimized version of the code. The reason I avoided 
using CreateExtractValue() or GetElementPtr() etc is to avoid the need to make 
assumption about the order of the fields inside the struct plus it's easier to 
review in general.

I confirmed in the optimized version of the code that this function call is 
actually inlined and optimized away as dead code.


http://gerrit.cloudera.org:8080/#/c/10421/1/be/src/runtime/krpc-data-stream-sender.cc@803
PS1, Line 803: t string sender_name = PartitionTypeName() + " Send
> nit: Invert condition to avoid indentation.
Done


http://gerrit.cloudera.org:8080/#/c/10421/1/be/src/runtime/krpc-data-stream-sender.cc@818
PS1, Line 818: ced;
> nit: HashRow()
Done


http://gerrit.cloudera.org:8080/#/c/10421/1/be/src/runtime/krpc-data-stream-sender.cc@820
PS1, Line 820: en->ReplaceCallSitesWithValue(hash_and_add_
> nit: maybe use a named const like
Done



--
To view, visit http://gerrit.cloudera.org:8080/10421
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1c44cc9312c062cc7a5a4ac9156ceaa31fb887ff
Gerrit-Change-Number: 10421
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Reviewer: Michael Ho 
Gerrit-Comment-Date: Thu, 17 May 2018 22:11:49 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5168: Codegen HASH PARTITIONED KrpcDataStreamSender::Send()

2018-05-17 Thread Michael Ho (Code Review)
Michael Ho has uploaded a new patch set (#2). ( 
http://gerrit.cloudera.org:8080/10421 )

Change subject: IMPALA-5168: Codegen HASH_PARTITIONED 
KrpcDataStreamSender::Send()
..

IMPALA-5168: Codegen HASH_PARTITIONED KrpcDataStreamSender::Send()

This change codegens the hash partitioning logic of
KrpcDataStreamSender::Send() when the partitioning strategy
is HASH_PARTITIONED. It does so by unrolling the loop which
evaluates each row against the partitioning expressions and
hashes the result. It also replaces the number of channels
of that sender with a constant at runtime.

With this change, we get reasonable speedup with some benchmarks:

++---+-++++
| Workload   | File Format   | Avg (s) | Delta(Avg) | GeoMean(s) | 
Delta(GeoMean) |
++---+-++++
| TPCH(_300) | parquet / none / none | 20.03   | -6.44% | 13.56  | 
-7.15% |
++---+-++++

+-+---+-++++
| Workload| File Format   | Avg (s) | Delta(Avg) | 
GeoMean(s) | Delta(GeoMean) |
+-+---+-++++
| TARGETED-PERF(_300) | parquet / none / none | 58.59   | -5.56% | 12.28
  | -5.30% |
+-+---+-++++

+-+---+-++++
| Workload| File Format   | Avg (s) | Delta(Avg) | 
GeoMean(s) | Delta(GeoMean) |
+-+---+-++++
| TPCDS-UNMODIFIED(_1000) | parquet / none / none | 15.60   | -3.10% | 7.16 
  | -4.33% |
+-+---+-++++

+---+---+-++++
| Workload  | File Format   | Avg (s) | Delta(Avg) | GeoMean(s) 
| Delta(GeoMean) |
+---+---+-++++
| TPCH_NESTED(_300) | parquet / none / none | 30.93   | -3.02% | 17.46  
| -4.71% |
+---+---+-++++

Change-Id: I1c44cc9312c062cc7a5a4ac9156ceaa31fb887ff
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/exchange-node.cc
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/partial-sort-node.cc
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/sort-node.cc
M be/src/exec/topn-node.cc
M be/src/runtime/CMakeLists.txt
M be/src/runtime/fragment-instance-state.cc
A be/src/runtime/krpc-data-stream-sender-ir.cc
M be/src/runtime/krpc-data-stream-sender.cc
M be/src/runtime/krpc-data-stream-sender.h
M be/src/runtime/raw-value-ir.cc
M be/src/runtime/raw-value.cc
M be/src/runtime/runtime-state.h
M be/src/util/runtime-profile.h
A 
testdata/workloads/functional-query/queries/QueryTest/datastream-sender-codegen.test
M tests/query_test/test_codegen.py
25 files changed, 427 insertions(+), 100 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/10421/2
--
To view, visit http://gerrit.cloudera.org:8080/10421
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I1c44cc9312c062cc7a5a4ac9156ceaa31fb887ff
Gerrit-Change-Number: 10421
Gerrit-PatchSet: 2
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Bikramjeet Vig 


[Impala-ASF-CR] IMPALA-5168: Codegen HASH PARTITIONED KrpcDataStreamSender::Send()

2018-05-16 Thread Bikramjeet Vig (Code Review)
Bikramjeet Vig has posted comments on this change. ( 
http://gerrit.cloudera.org:8080/10421 )

Change subject: IMPALA-5168: Codegen HASH_PARTITIONED 
KrpcDataStreamSender::Send()
..


Patch Set 1:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/10421/1/be/src/runtime/krpc-data-stream-sender.cc
File be/src/runtime/krpc-data-stream-sender.cc:

http://gerrit.cloudera.org:8080/#/c/10421/1/be/src/runtime/krpc-data-stream-sender.cc@718
PS1, Line 718:  // Load the expression evaluator for the i-th partition 
expression
 : llvm::Function* get_expr_eval_fn =
 : 
codegen->GetFunction(IRFunction::KRPC_DSS_GET_PART_EXPR_EVAL, false);
 : DCHECK(get_expr_eval_fn != nullptr);
 : llvm::Value* expr_eval_arg =
 : builder.CreateCall(get_expr_eval_fn, {this_arg, 
codegen->GetI32Constant(i)});
can we use Builder.CreateExtractValue() and get rid of this function call.
(Looks like something that the llvm optimizer could have inlined but didn't 
according to the sample generated code above)


http://gerrit.cloudera.org:8080/#/c/10421/1/be/src/runtime/krpc-data-stream-sender.cc@803
PS1, Line 803: partition_type_ == TPartitionType::HASH_PARTITIONED
nit: Invert condition to avoid indentation.

if (partition_type_ != TPartitionType::HASH_PARTITIONED) {
const string& msg = Substitute("not $0",
partition_type_ == TPartitionType::KUDU ? "supported" : "needed");
profile()->AddCodegenMsg(false, msg, sender_name);
return;
}

llvm::Function* hash_row_fn;
codegen_status = CodegenHashRow(codegen, &hash_row_fn);
.
.
.


http://gerrit.cloudera.org:8080/#/c/10421/1/be/src/runtime/krpc-data-stream-sender.cc@818
PS1, Line 818: HashRows
nit: HashRow()


http://gerrit.cloudera.org:8080/#/c/10421/1/be/src/runtime/krpc-data-stream-sender.cc@820
PS1, Line 820: KrpcDataStreamSender7HashRowEPNS_8TupleRowE
nit: maybe use a named const like
const char* HASH_ROW_SYMBOL


http://gerrit.cloudera.org:8080/#/c/10421/1/testdata/workloads/functional-query/queries/QueryTest/disable-codegen.test
File testdata/workloads/functional-query/queries/QueryTest/disable-codegen.test:

http://gerrit.cloudera.org:8080/#/c/10421/1/testdata/workloads/functional-query/queries/QueryTest/disable-codegen.test@61
PS1, Line 61:  QUERY
add cases for "codegen not supported/needed"



--
To view, visit http://gerrit.cloudera.org:8080/10421
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: comment
Gerrit-Change-Id: I1c44cc9312c062cc7a5a4ac9156ceaa31fb887ff
Gerrit-Change-Number: 10421
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho 
Gerrit-Reviewer: Bikramjeet Vig 
Gerrit-Comment-Date: Wed, 16 May 2018 23:42:33 +
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-5168: Codegen HASH PARTITIONED KrpcDataStreamSender::Send()

2018-05-15 Thread Michael Ho (Code Review)
Michael Ho has uploaded this change for review. ( 
http://gerrit.cloudera.org:8080/10421


Change subject: IMPALA-5168: Codegen HASH_PARTITIONED 
KrpcDataStreamSender::Send()
..

IMPALA-5168: Codegen HASH_PARTITIONED KrpcDataStreamSender::Send()

This change codegens the hash partitioning logic of
KrpcDataStreamSender::Send() when the partitioning strategy
is HASH_PARTITIONED. It does so by unrolling the loop which
evaluates each row against the partitioning expressions and
hashes the result. It also replaces the number of channels
of that sender with a constant at runtime.

With this change, we get reasonable speedup with some benchmarks:

++---+-++++
| Workload   | File Format   | Avg (s) | Delta(Avg) | GeoMean(s) | 
Delta(GeoMean) |
++---+-++++
| TPCH(_300) | parquet / none / none | 20.03   | -6.44% | 13.56  | 
-7.15% |
++---+-++++

+-+---+-++++
| Workload| File Format   | Avg (s) | Delta(Avg) | 
GeoMean(s) | Delta(GeoMean) |
+-+---+-++++
| TARGETED-PERF(_300) | parquet / none / none | 58.59   | -5.56% | 12.28
  | -5.30% |
+-+---+-++++

+-+---+-++++
| Workload| File Format   | Avg (s) | Delta(Avg) | 
GeoMean(s) | Delta(GeoMean) |
+-+---+-++++
| TPCDS-UNMODIFIED(_1000) | parquet / none / none | 15.60   | -3.10% | 7.16 
  | -4.33% |
+-+---+-++++

+---+---+-++++
| Workload  | File Format   | Avg (s) | Delta(Avg) | GeoMean(s) 
| Delta(GeoMean) |
+---+---+-++++
| TPCH_NESTED(_300) | parquet / none / none | 30.93   | -3.02% | 17.46  
| -4.71% |
+---+---+-++++

Change-Id: I1c44cc9312c062cc7a5a4ac9156ceaa31fb887ff
---
M be/src/codegen/gen_ir_descriptions.py
M be/src/codegen/impala-ir.cc
M be/src/exec/data-sink.cc
M be/src/exec/data-sink.h
M be/src/exec/exchange-node.cc
M be/src/exec/exec-node.cc
M be/src/exec/exec-node.h
M be/src/exec/hdfs-scan-node-base.cc
M be/src/exec/partial-sort-node.cc
M be/src/exec/partitioned-aggregation-node.cc
M be/src/exec/partitioned-hash-join-builder.cc
M be/src/exec/partitioned-hash-join-node.cc
M be/src/exec/sort-node.cc
M be/src/exec/topn-node.cc
M be/src/runtime/CMakeLists.txt
M be/src/runtime/fragment-instance-state.cc
A be/src/runtime/krpc-data-stream-sender-ir.cc
M be/src/runtime/krpc-data-stream-sender.cc
M be/src/runtime/krpc-data-stream-sender.h
M be/src/runtime/raw-value-ir.cc
M be/src/runtime/raw-value.cc
M be/src/runtime/runtime-state.h
M be/src/util/runtime-profile.h
M testdata/workloads/functional-query/queries/QueryTest/disable-codegen.test
24 files changed, 397 insertions(+), 99 deletions(-)



  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/21/10421/1
--
To view, visit http://gerrit.cloudera.org:8080/10421
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-MessageType: newchange
Gerrit-Change-Id: I1c44cc9312c062cc7a5a4ac9156ceaa31fb887ff
Gerrit-Change-Number: 10421
Gerrit-PatchSet: 1
Gerrit-Owner: Michael Ho