[Impala-ASF-CR] IMPALA-3742: Partitions and sort INSERTs for Kudu tables
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-3742: Partitions and sort INSERTs for Kudu tables .. IMPALA-3742: Partitions and sort INSERTs for Kudu tables Bulk DMLs (INSERT, UPSERT, UPDATE, and DELETE) for Kudu are currently painful because we just send rows randomly, which creates a lot of work for Kudu since it partitions and sorts data before writing, causing writes to be slow and leading to timeouts. We can alleviate this by sending the rows to Kudu already partitioned and sorted. This patch partitions and sorts rows according to Kudu's partitioning scheme for INSERTs and UPSERTs. A followup patch will handle UPDATE and DELETE. It accomplishes this by inserting an exchange node and a sort node into the plan before the operation. Both the exchange and the sort are given a KuduPartitionExpr which takes a row and calls into the Kudu client to return its partition number. It also disallows INSERT hints for Kudu tables, since the hints that we support (SHUFFLE, CLUSTER, SORTBY), so longer make sense. Testing: - Updated planner tests. - Ran the Kudu functional tests. - Ran performance tests demonstrating that we can now handle much larger inserts without having timeouts. Change-Id: I84ce0032a1b10958fdf31faef225372c5c38fdc4 Reviewed-on: http://gerrit.cloudera.org:8080/6559 Reviewed-by: Thomas Tauber-MarshallTested-by: Impala Public Jenkins --- M be/src/exec/kudu-table-sink.cc M be/src/exec/kudu-util.cc M be/src/exec/kudu-util.h M be/src/exprs/CMakeLists.txt M be/src/exprs/expr-context.h M be/src/exprs/expr.cc A be/src/exprs/kudu-partition-expr.cc A be/src/exprs/kudu-partition-expr.h M be/src/runtime/coordinator.cc M be/src/runtime/data-stream-sender.cc M be/src/runtime/data-stream-sender.h M be/src/scheduling/scheduler.cc M common/thrift/Exprs.thrift M common/thrift/Partitions.thrift M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java A fe/src/main/java/org/apache/impala/analysis/KuduPartitionExpr.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/planner/DataPartition.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/TableSink.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeUpsertStmtTest.java M testdata/workloads/functional-planner/queries/PlannerTest/kudu-upsert.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test M testdata/workloads/functional-query/queries/QueryTest/kudu_insert.test 26 files changed, 616 insertions(+), 170 deletions(-) Approvals: Impala Public Jenkins: Verified Thomas Tauber-Marshall: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/6559 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I84ce0032a1b10958fdf31faef225372c5c38fdc4 Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-3742: Partitions and sort INSERTs for Kudu tables
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-3742: Partitions and sort INSERTs for Kudu tables .. Patch Set 9: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/6559 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I84ce0032a1b10958fdf31faef225372c5c38fdc4 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3742: Partitions and sort INSERTs for Kudu tables
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-3742: Partitions and sort INSERTs for Kudu tables .. Patch Set 9: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/523/ -- To view, visit http://gerrit.cloudera.org:8080/6559 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I84ce0032a1b10958fdf31faef225372c5c38fdc4 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3742: Partitions and sort INSERTs for Kudu tables
Thomas Tauber-Marshall has posted comments on this change. Change subject: IMPALA-3742: Partitions and sort INSERTs for Kudu tables .. Patch Set 9: Code-Review+2 -- To view, visit http://gerrit.cloudera.org:8080/6559 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I84ce0032a1b10958fdf31faef225372c5c38fdc4 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3742: Partitions and sort INSERTs for Kudu tables
Hello Marcel Kornacker, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6559 to look at the new patch set (#9). Change subject: IMPALA-3742: Partitions and sort INSERTs for Kudu tables .. IMPALA-3742: Partitions and sort INSERTs for Kudu tables Bulk DMLs (INSERT, UPSERT, UPDATE, and DELETE) for Kudu are currently painful because we just send rows randomly, which creates a lot of work for Kudu since it partitions and sorts data before writing, causing writes to be slow and leading to timeouts. We can alleviate this by sending the rows to Kudu already partitioned and sorted. This patch partitions and sorts rows according to Kudu's partitioning scheme for INSERTs and UPSERTs. A followup patch will handle UPDATE and DELETE. It accomplishes this by inserting an exchange node and a sort node into the plan before the operation. Both the exchange and the sort are given a KuduPartitionExpr which takes a row and calls into the Kudu client to return its partition number. It also disallows INSERT hints for Kudu tables, since the hints that we support (SHUFFLE, CLUSTER, SORTBY), so longer make sense. Testing: - Updated planner tests. - Ran the Kudu functional tests. - Ran performance tests demonstrating that we can now handle much larger inserts without having timeouts. Change-Id: I84ce0032a1b10958fdf31faef225372c5c38fdc4 --- M be/src/exec/kudu-table-sink.cc M be/src/exec/kudu-util.cc M be/src/exec/kudu-util.h M be/src/exprs/CMakeLists.txt M be/src/exprs/expr-context.h M be/src/exprs/expr.cc A be/src/exprs/kudu-partition-expr.cc A be/src/exprs/kudu-partition-expr.h M be/src/runtime/coordinator.cc M be/src/runtime/data-stream-sender.cc M be/src/runtime/data-stream-sender.h M be/src/scheduling/scheduler.cc M common/thrift/Exprs.thrift M common/thrift/Partitions.thrift M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java A fe/src/main/java/org/apache/impala/analysis/KuduPartitionExpr.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/planner/DataPartition.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/TableSink.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeUpsertStmtTest.java M testdata/workloads/functional-planner/queries/PlannerTest/kudu-upsert.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test M testdata/workloads/functional-query/queries/QueryTest/kudu_insert.test 26 files changed, 616 insertions(+), 170 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/6559/9 -- To view, visit http://gerrit.cloudera.org:8080/6559 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I84ce0032a1b10958fdf31faef225372c5c38fdc4 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-3742: Partitions and sort INSERTs for Kudu tables
Thomas Tauber-Marshall has posted comments on this change. Change subject: IMPALA-3742: Partitions and sort INSERTs for Kudu tables .. Patch Set 8: Code-Review+2 (2 comments) http://gerrit.cloudera.org:8080/#/c/6559/7/fe/src/main/java/org/apache/impala/analysis/KuduPartitionExpr.java File fe/src/main/java/org/apache/impala/analysis/KuduPartitionExpr.java: Line 37: * a given row. Returns -1 for rows that do not correspond to a partition. The children of > is it documented in some class header that values outside the legal range r Done Line 74: for (int i = 0; i < children_.size(); ++i) { > single line Done -- To view, visit http://gerrit.cloudera.org:8080/6559 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I84ce0032a1b10958fdf31faef225372c5c38fdc4 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3742: Partitions and sort INSERTs for Kudu tables
Hello Marcel Kornacker, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6559 to look at the new patch set (#8). Change subject: IMPALA-3742: Partitions and sort INSERTs for Kudu tables .. IMPALA-3742: Partitions and sort INSERTs for Kudu tables Bulk DMLs (INSERT, UPSERT, UPDATE, and DELETE) for Kudu are currently painful because we just send rows randomly, which creates a lot of work for Kudu since it partitions and sorts data before writing, causing writes to be slow and leading to timeouts. We can alleviate this by sending the rows to Kudu already partitioned and sorted. This patch partitions and sorts rows according to Kudu's partitioning scheme for INSERTs and UPSERTs. A followup patch will handle UPDATE and DELETE. It accomplishes this by inserting an exchange node and a sort node into the plan before the operation. Both the exchange and the sort are given a KuduPartitionExpr which takes a row and calls into the Kudu client to return its partition number. It also disallows INSERT hints for Kudu tables, since the hints that we support (SHUFFLE, CLUSTER, SORTBY), so longer make sense. Testing: - Updated planner tests. - Ran the Kudu functional tests. - Ran performance tests demonstrating that we can now handle much larger inserts without having timeouts. Change-Id: I84ce0032a1b10958fdf31faef225372c5c38fdc4 --- M be/src/exec/kudu-table-sink.cc M be/src/exec/kudu-util.cc M be/src/exec/kudu-util.h M be/src/exprs/CMakeLists.txt M be/src/exprs/expr-context.h M be/src/exprs/expr.cc A be/src/exprs/kudu-partition-expr.cc A be/src/exprs/kudu-partition-expr.h M be/src/runtime/coordinator.cc M be/src/runtime/data-stream-sender.cc M be/src/runtime/data-stream-sender.h M be/src/scheduling/scheduler.cc M bin/impala-config.sh M common/thrift/Exprs.thrift M common/thrift/Partitions.thrift M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java A fe/src/main/java/org/apache/impala/analysis/KuduPartitionExpr.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/planner/DataPartition.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/TableSink.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeUpsertStmtTest.java M testdata/workloads/functional-planner/queries/PlannerTest/kudu-upsert.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test M testdata/workloads/functional-query/queries/QueryTest/kudu_insert.test 27 files changed, 617 insertions(+), 171 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/6559/8 -- To view, visit http://gerrit.cloudera.org:8080/6559 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I84ce0032a1b10958fdf31faef225372c5c38fdc4 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-3742: Partitions and sort INSERTs for Kudu tables
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-3742: Partitions and sort INSERTs for Kudu tables .. Patch Set 7: Code-Review+2 (2 comments) http://gerrit.cloudera.org:8080/#/c/6559/7/fe/src/main/java/org/apache/impala/analysis/KuduPartitionExpr.java File fe/src/main/java/org/apache/impala/analysis/KuduPartitionExpr.java: Line 37: * a given row. The children of this Expr produce the values for the partition columns. is it documented in some class header that values outside the legal range return a -1? probably most appropriate in the .h file. Line 74: if (i != 0) { single line -- To view, visit http://gerrit.cloudera.org:8080/6559 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I84ce0032a1b10958fdf31faef225372c5c38fdc4 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3742: Partitions and sort INSERTs for Kudu tables
Thomas Tauber-Marshall has uploaded a new patch set (#7). Change subject: IMPALA-3742: Partitions and sort INSERTs for Kudu tables .. IMPALA-3742: Partitions and sort INSERTs for Kudu tables Bulk DMLs (INSERT, UPSERT, UPDATE, and DELETE) for Kudu are currently painful because we just send rows randomly, which creates a lot of work for Kudu since it partitions and sorts data before writing, causing writes to be slow and leading to timeouts. We can alleviate this by sending the rows to Kudu already partitioned and sorted. This patch partitions and sorts rows according to Kudu's partitioning scheme for INSERTs and UPSERTs. A followup patch will handle UPDATE and DELETE. It accomplishes this by inserting an exchange node and a sort node into the plan before the operation. Both the exchange and the sort are given a KuduPartitionExpr which takes a row and calls into the Kudu client to return its partition number. It also disallows INSERT hints for Kudu tables, since the hints that we support (SHUFFLE, CLUSTER, SORTBY), so longer make sense. Testing: - Updated planner tests. - Ran the Kudu functional tests. - Ran performance tests demonstrating that we can now handle much larger inserts without having timeouts. Change-Id: I84ce0032a1b10958fdf31faef225372c5c38fdc4 --- M be/src/exec/kudu-table-sink.cc M be/src/exec/kudu-util.cc M be/src/exec/kudu-util.h M be/src/exprs/CMakeLists.txt M be/src/exprs/expr-context.h M be/src/exprs/expr.cc A be/src/exprs/kudu-partition-expr.cc A be/src/exprs/kudu-partition-expr.h M be/src/runtime/coordinator.cc M be/src/runtime/data-stream-sender.cc M be/src/runtime/data-stream-sender.h M be/src/scheduling/scheduler.cc M bin/impala-config.sh M common/thrift/Exprs.thrift M common/thrift/Partitions.thrift M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java A fe/src/main/java/org/apache/impala/analysis/KuduPartitionExpr.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/planner/DataPartition.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/TableSink.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeUpsertStmtTest.java M testdata/workloads/functional-planner/queries/PlannerTest/kudu-upsert.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test M testdata/workloads/functional-query/queries/QueryTest/kudu_insert.test 27 files changed, 618 insertions(+), 171 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/6559/7 -- To view, visit http://gerrit.cloudera.org:8080/6559 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I84ce0032a1b10958fdf31faef225372c5c38fdc4 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-3742: Partitions and sort INSERTs for Kudu tables
Thomas Tauber-Marshall has uploaded a new patch set (#6). Change subject: IMPALA-3742: Partitions and sort INSERTs for Kudu tables .. IMPALA-3742: Partitions and sort INSERTs for Kudu tables Bulk DMLs (INSERT, UPSERT, UPDATE, and DELETE) for Kudu are currently painful because we just send rows randomly, which creates a lot of work for Kudu since it partitions and sorts data before writing, causing writes to be slow and leading to timeouts. We can alleviate this by sending the rows to Kudu already partitioned and sorted. This patch partitions and sorts rows according to Kudu's partitioning scheme for INSERTs and UPSERTs. A followup patch will handle UPDATE and DELETE. It accomplishes this by inserting an exchange node and a sort node into the plan before the operation. Both the exchange and the sort are given a KuduPartitionExpr which takes a row and calls into the Kudu client to return its partition number. It also disallows INSERT hints for Kudu tables, since the hints that we support (SHUFFLE, CLUSTER, SORTBY), so longer make sense. Testing: - Updated planner tests. - Ran the Kudu functional tests. - Ran performance tests demonstrating that we can now handle much larger inserts without having timeouts. Change-Id: I84ce0032a1b10958fdf31faef225372c5c38fdc4 --- M be/src/exec/kudu-table-sink.cc M be/src/exec/kudu-util.cc M be/src/exec/kudu-util.h M be/src/exprs/CMakeLists.txt M be/src/exprs/expr-context.h M be/src/exprs/expr.cc A be/src/exprs/kudu-partition-expr.cc A be/src/exprs/kudu-partition-expr.h M be/src/runtime/coordinator.cc M be/src/runtime/data-stream-sender.cc M be/src/runtime/data-stream-sender.h M be/src/scheduling/scheduler.cc M bin/impala-config.sh M common/thrift/Exprs.thrift M common/thrift/Partitions.thrift M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java A fe/src/main/java/org/apache/impala/analysis/KuduPartitionExpr.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/planner/DataPartition.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/TableSink.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeUpsertStmtTest.java M testdata/workloads/functional-planner/queries/PlannerTest/kudu-upsert.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test M testdata/workloads/functional-query/queries/QueryTest/kudu_insert.test 27 files changed, 621 insertions(+), 162 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/6559/6 -- To view, visit http://gerrit.cloudera.org:8080/6559 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I84ce0032a1b10958fdf31faef225372c5c38fdc4 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-3742: Partitions and sort INSERTs for Kudu tables
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-3742: Partitions and sort INSERTs for Kudu tables .. Patch Set 5: (22 comments) http://gerrit.cloudera.org:8080/#/c/6559/5/be/src/exec/kudu-util.h File be/src/exec/kudu-util.h: Line 65: void* value, bool copy_strings = true); const void* value, to indicate it's not an output param http://gerrit.cloudera.org:8080/#/c/6559/5/be/src/exprs/kudu-partition-expr.cc File be/src/exprs/kudu-partition-expr.cc: Line 64: row_.reset(table->schema().NewRow()); can this not return an error? Line 71: int col = tkudu_partition_expr_.referenced_columns[i]; move to where it's being used leave todo somewhere that the schema/our metadata should record the partition cols, this should really be part of the table descriptor. Line 75: // the KuduTableSink generate the error message. what's the point of round-robin instead of always returning, say, 0? Line 79: Status s = WriteKuduRowValue(row_.get(), col, type, val, false); where is this declared? it might be nice to require some form of namespace qualification for functions that aren't in the impala namespace. i wish kudu hadn't introduced these cumbersome nested namespaces. Line 82: DCHECK(s.ok()); add some context output (print called function, type and col name?). Line 89: DCHECK(s.ok()); same here http://gerrit.cloudera.org:8080/#/c/6559/5/be/src/exprs/kudu-partition-expr.h File be/src/exprs/kudu-partition-expr.h: Line 24: #include "runtime/descriptors.h" do you need this include, as opposed to forward decls? Line 46: TKuduPartitionExpr tkudu_partition_expr_; forward declare? Line 50: /// Used to call into Kudu to determine partitions. it's nice to indicate what gets set in prepare(), because that tells me it doesn't change during execution (despite the fact that it's not const). same for the other member vars. http://gerrit.cloudera.org:8080/#/c/6559/5/be/src/runtime/data-stream-sender.cc File be/src/runtime/data-stream-sender.cc: Line 350: kudu_ = sink.output_partition.type == TPartitionType::KUDU; have the partition type as a member var, rather than 3 bools? Line 425: // to simplify this. i don't think that's practical, for broadcast and random we have different/more efficient logic that doesn't require evaluating exprs. Line 456: // hash-partition batch's rows across channels i'd leave a todo here to coalesce this with the kudu case, which would require the hash computation to be done via an expr (which probably requires codegen support, which would be worth having anyway). http://gerrit.cloudera.org:8080/#/c/6559/5/be/src/runtime/data-stream-sender.h File be/src/runtime/data-stream-sender.h: Line 109: bool kudu_; comment missing http://gerrit.cloudera.org:8080/#/c/6559/4/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java: Line 115: private List partitionKeyIdxs_ = Lists.newArrayList(); > The primary keys are always the first n columns, but the partition keys can that would be good to explain in the code, it's a bit subtle http://gerrit.cloudera.org:8080/#/c/6559/5/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java: Line 643: Set kuduPartitionByColumnNames = null; get rid of 'partitionby' here as well Line 682: // declaration, and store their indexes. We need those exprs in the original order to 'store their indexes' is a bit mysterious, 'column indexes' or even column positions is more informative. http://gerrit.cloudera.org:8080/#/c/6559/4/fe/src/main/java/org/apache/impala/analysis/KuduPartitionExpr.java File fe/src/main/java/org/apache/impala/analysis/KuduPartitionExpr.java: Line 90: protected String toSqlImpl() { return "KuduPartition()"; } > Yes, it is included in the explain output. See the planner test files for e i think so, as it is it's pretty useless. http://gerrit.cloudera.org:8080/#/c/6559/5/fe/src/main/java/org/apache/impala/analysis/KuduPartitionExpr.java File fe/src/main/java/org/apache/impala/analysis/KuduPartitionExpr.java: Line 44: // Maps from this Epxrs children to column positions in the table, i.e. children_[i] since these are column positions, call the variable that? i have to keep reminding myself what 'key indexes' are. Line 48: private List partitionKeyTypes_; isn't this available in the table descriptor? http://gerrit.cloudera.org:8080/#/c/6559/5/fe/src/main/java/org/apache/impala/planner/DataPartition.java File fe/src/main/java/org/apache/impala/planner/DataPartition.java: Line 82: return new DataPartition(TPartitionType.KUDU, expr); why not just call the list c'tor and create a list here? (and drop the new non-list c'tor)
[Impala-ASF-CR] IMPALA-3742: Partitions and sort INSERTs for Kudu tables
Thomas Tauber-Marshall has uploaded a new patch set (#5). Change subject: IMPALA-3742: Partitions and sort INSERTs for Kudu tables .. IMPALA-3742: Partitions and sort INSERTs for Kudu tables Bulk DMLs (INSERT, UPSERT, UPDATE, and DELETE) for Kudu are currently painful because we just send rows randomly, which creates a lot of work for Kudu since it partitions and sorts data before writing, causing writes to be slow and leading to timeouts. We can alleviate this by sending the rows to Kudu already partitioned and sorted. This patch partitions and sorts rows according to Kudu's partitioning scheme for INSERTs and UPSERTs. A followup patch will handle UPDATE and DELETE. It accomplishes this by inserting an exchange node and a sort node into the plan before the operation. Both the exchange and the sort are given a KuduPartitionExpr which takes a row and calls into the Kudu client to return its partition number. It also disallows INSERT hints for Kudu tables, since the hints that we support (SHUFFLE, CLUSTER, SORTBY), so longer make sense. Testing: - Updated planner tests. - Ran the Kudu functional tests. - Ran performance tests demonstrating that we can now handle much larger inserts without having timeouts. Change-Id: I84ce0032a1b10958fdf31faef225372c5c38fdc4 --- M be/src/exec/kudu-table-sink.cc M be/src/exec/kudu-util.cc M be/src/exec/kudu-util.h M be/src/exprs/CMakeLists.txt M be/src/exprs/expr-context.h M be/src/exprs/expr.cc A be/src/exprs/kudu-partition-expr.cc A be/src/exprs/kudu-partition-expr.h M be/src/runtime/coordinator.cc M be/src/runtime/data-stream-sender.cc M be/src/runtime/data-stream-sender.h M be/src/scheduling/scheduler.cc M bin/impala-config.sh M common/thrift/Exprs.thrift M common/thrift/Partitions.thrift M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java A fe/src/main/java/org/apache/impala/analysis/KuduPartitionExpr.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/planner/DataPartition.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/TableSink.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeUpsertStmtTest.java M testdata/workloads/functional-planner/queries/PlannerTest/kudu-upsert.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test M testdata/workloads/functional-query/queries/QueryTest/kudu_insert.test 27 files changed, 621 insertions(+), 162 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/6559/5 -- To view, visit http://gerrit.cloudera.org:8080/6559 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I84ce0032a1b10958fdf31faef225372c5c38fdc4 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-3742: Partitions and sort INSERTs for Kudu tables
Thomas Tauber-Marshall has posted comments on this change. Change subject: IMPALA-3742: Partitions and sort INSERTs for Kudu tables .. Patch Set 5: (24 comments) http://gerrit.cloudera.org:8080/#/c/6559/4/common/thrift/DataSinks.thrift File common/thrift/DataSinks.thrift: Line 59: // Creates a new Hdfs files according to the evaluation of the partitionKeyExprs, > how is this related to TDataPartition.partition_exprs? As discussed, we're going to move this to TDataPartition.partition_exprs. http://gerrit.cloudera.org:8080/#/c/6559/4/common/thrift/Exprs.thrift File common/thrift/Exprs.thrift: Line 128: struct TKuduPartitionExpr { > this feels redundant As discussed, we're keeping it but eliminating some unnecessary members. Line 129: // The Kudu table to use the partitioning scheme from. > how does TTableSink.target_table_id not capture that? The TTableSink is not accessible in the expr. Line 134: 2: required list referenced_columns > shouldn't this be part of the descriptor of the target table? ie, shouldn't Done Line 138: struct TExprNode { > what about TKuduTableSink.referenced_columns? The TKuduTableSink is not accessible from the expr. We could potentially use the KuduTableDescriptor but its not currently possible to eliminate this without other changes, because the KuduTableDescriptor exposes the primary key columns, but not the partition columns. The partition columns are contained within the TKuduTable that's used to construct the KuduTableDescriptor, but not in a particularly convenient form to extract the info we need here, due to the indirection of TKuduPartitionParam. I'm still happy to make this change if you feel its an improvement (and in the long run it would be nice to be able to get info about the partition cols from the KuduTableDescriptor), but I wanted to check before writing the code. That's especially since there's some design decisions - e.g. we could provide KuduTableDescriptor->partition_columns() that returns a list of column names, the way KuduTableDescriptor->key_columns() does, which would be sufficient for our needs here, but then we wouldn't know if they were hash or range partitions, what the range is, etc, so maybe instead just provide a list of TKuduPartitionParams or something. http://gerrit.cloudera.org:8080/#/c/6559/4/common/thrift/Partitions.thrift File common/thrift/Partitions.thrift: Line 40: // and then this can be removed. (IMPALA-5255) > could you file a jira for this? the reason this may not just be a cosmetic I filed: IMPALA-5255. http://gerrit.cloudera.org:8080/#/c/6559/4/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java: Line 109: // Set in analyze(). Exprs correspond to the partitionKeyValues, if specified, or to > correspond Done Line 114: // table, eg. partitionKeyExprs_[i] corresponds to table_.columns(partitionKeyIdx_[i]). > the equality is wrong. partitionkeyexprs_ are the exprs from the select lis Done Line 115: private List partitionKeyIdxs_ = Lists.newArrayList(); > why wouldn't this just be 1, 2, 3, ...? partitionkeyexprs is already in tab The primary keys are always the first n columns, but the partition keys can be any subset of the primary keys, so the values in this list will be in ascending order, but it may skip some values. Line 699: // Make sure we have stats for partitionKeyExprs > odd indentation Done http://gerrit.cloudera.org:8080/#/c/6559/4/fe/src/main/java/org/apache/impala/analysis/KuduPartitionExpr.java File fe/src/main/java/org/apache/impala/analysis/KuduPartitionExpr.java: Line 36: * Internal expr that calls into the Kudu client to determine the partition index for > does this really need to exist in the fe, or is this an execution-/be-only As discussed, this is necessary, for example because we need an Expr that we can pass to the sorter. Line 90: protected String toSqlImpl() { return "KuduPartition()"; } > this feels wrong. will this ever get called? Yes, it is included in the explain output. See the planner test files for examples. That does bring up though that we may want to include more info here, such as the table name, a list of the partition columns, or the toSql of the partition exprs. http://gerrit.cloudera.org:8080/#/c/6559/4/fe/src/main/java/org/apache/impala/catalog/KuduTable.java File fe/src/main/java/org/apache/impala/catalog/KuduTable.java: Line 154: public Set getPartitionColumnNames() { > call these partition column names? we also use 'partition exprs' and not 'p Done http://gerrit.cloudera.org:8080/#/c/6559/4/fe/src/main/java/org/apache/impala/planner/DataPartition.java File fe/src/main/java/org/apache/impala/planner/DataPartition.java: Line 45: // for hash partition: exprs used to compute hash value > pointless changes, there's no style requirement for this. Done
[Impala-ASF-CR] IMPALA-3742: Partitions and sort INSERTs for Kudu tables
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-3742: Partitions and sort INSERTs for Kudu tables .. Patch Set 4: (24 comments) only looked at thrift and fe changes so far. let's discuss this in person/over hangout tomorrow. http://gerrit.cloudera.org:8080/#/c/6559/4/common/thrift/DataSinks.thrift File common/thrift/DataSinks.thrift: Line 59: 3: optional Exprs.TExpr partition_expr how is this related to TDataPartition.partition_exprs? also, the fact that it's kudu-specific should be reflected in the name. another naming issue: this returns an index, which isn't the case for the normal partition exprs. would be good to reflect that in the name, it's a bit confusing to read right now. http://gerrit.cloudera.org:8080/#/c/6559/4/common/thrift/Exprs.thrift File common/thrift/Exprs.thrift: Line 128: struct TKuduPartitionExpr { this feels redundant Line 129: // The Kudu table to use the partitioning scheme from. how does TTableSink.target_table_id not capture that? Line 134: 2: required list types shouldn't this be part of the descriptor of the target table? ie, shouldn't the kudu sink get that from elsewhere? same for the other fields in this struct. Line 138: 3: required list referenced_columns what about TKuduTableSink.referenced_columns? http://gerrit.cloudera.org:8080/#/c/6559/4/common/thrift/Partitions.thrift File common/thrift/Partitions.thrift: Line 40: // and then this can be removed. could you file a jira for this? the reason this may not just be a cosmetic issue is that by exposing the actual sequence of partitioning parameters, we can then optimize things like grouping aggregation and joins that often require repartitioning (by saving that repartitioning step). http://gerrit.cloudera.org:8080/#/c/6559/4/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java: Line 109: // Set in analyze(). Exprs corresponding to the partitionKeyValues, if specified, or to correspond Line 114: // table, eg. partitionKeyExprs_[i] = table_.columns(partitionKeyIdx_[i]) the equality is wrong. partitionkeyexprs_ are the exprs from the select list, not columns. Line 115: private List partitionKeyIdxs_ = Lists.newArrayList(); why wouldn't this just be 1, 2, 3, ...? partitionkeyexprs is already in table order, not in select list order. Line 699:&& partitionKeyExprs_.size() == kuduPartitionByColumnNames.size())); odd indentation http://gerrit.cloudera.org:8080/#/c/6559/4/fe/src/main/java/org/apache/impala/analysis/KuduPartitionExpr.java File fe/src/main/java/org/apache/impala/analysis/KuduPartitionExpr.java: Line 36: * Internal expr that calls into the Kudu client to determine the partition index for does this really need to exist in the fe, or is this an execution-/be-only construct? it seems like the metadata is already duplicated elsewhere, at least in the thrift structure, and you could construct the corresponding be structure with just that. Line 90: protected String toSqlImpl() { return "KuduPartition()"; } this feels wrong. will this ever get called? http://gerrit.cloudera.org:8080/#/c/6559/4/fe/src/main/java/org/apache/impala/catalog/KuduTable.java File fe/src/main/java/org/apache/impala/catalog/KuduTable.java: Line 154: public Set getPartitionByColumnNames() { call these partition column names? we also use 'partition exprs' and not 'partition-by exprs'. http://gerrit.cloudera.org:8080/#/c/6559/4/fe/src/main/java/org/apache/impala/planner/DataPartition.java File fe/src/main/java/org/apache/impala/planner/DataPartition.java: Line 45: // For hash partition: exprs used to compute hash value. pointless changes, there's no style requirement for this. http://gerrit.cloudera.org:8080/#/c/6559/4/fe/src/main/java/org/apache/impala/planner/DataStreamSink.java File fe/src/main/java/org/apache/impala/planner/DataStreamSink.java: Line 35: private Expr partitionExpr_; same naming comment as for TDataStreamSink http://gerrit.cloudera.org:8080/#/c/6559/4/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java: Line 247: Expr partitionExpr = null; partitionexpr and partitionexprs sounds too similar, and they are distinct concepts. http://gerrit.cloudera.org:8080/#/c/6559/4/fe/src/main/java/org/apache/impala/planner/PlanFragment.java File fe/src/main/java/org/apache/impala/planner/PlanFragment.java: Line 76: public class PlanFragment extends TreeNode { use blank lines where appropriate Line 99: private Expr partitionExpr_; this seems like an operational parameter of the partition (=dataPartition_) itself. why wouldn't it be part of that? DataPartition stores partitionExprs_, which are an operational parameter of hash partitions. how is this related to DSS.partitionExpr_?
[Impala-ASF-CR] IMPALA-3742: Partitions and sort INSERTs for Kudu tables
Alex Behm has posted comments on this change. Change subject: IMPALA-3742: Partitions and sort INSERTs for Kudu tables .. Patch Set 4: I don't think we need an off switch for 10%. Seems better to re-claim the 10% with other independent improvements if we are really after that. I'm in favor of not having any insert hints for Kudu (simpler is better). -- To view, visit http://gerrit.cloudera.org:8080/6559 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I84ce0032a1b10958fdf31faef225372c5c38fdc4 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3742: Partitions and sort INSERTs for Kudu tables
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-3742: Partitions and sort INSERTs for Kudu tables .. Patch Set 4: > As requested, I ran some tests with tpcds_1000_text.store_sales: > > with patch: 3755.68s > with patch and noshuffle: 3778.62s > without patch: 3571.33s > > The first two are averages over three runs, for the third it only > passed twice out of 10 runs. I also tried doing "with patch and > noclustered" but it timed out every time I ran it. > > So again we see here a ~10% perf cost in exchange for consistently > running to completion. > > These results also suggest that its the pre-sorting, not the > pre-partitioning, that's making the biggest difference in whether > or not Kudu can handle the load. We may be able to get more impact > from the partitioning if we start partitioning the rows to the > impalad collocated with the corresponding tserver, though that's > going to take some changes to the scheduler and we're leaving it as > future work. > > As far as adding the ability to turn this off: the current version > of the patch just disables insert hints for Kudu tables, as none of > the existing hints seem to make sense (would people need to specify > both 'noshuffle' and 'noclustered' to turn this off?). > > If anyone feels that its important to have an off switch, I'm happy > to include one (possibly with a new insert hint?), but I'm not sure > how much of a benefit that would be vs. the added complexity of > more knobs, given the above perf numbers and the fact that the > partitioning and sorting has to happen either way, its just a > question of whether its happing in Impala or in Kudu. > > Its also already the case that the exchange and sort won't happen > for single node plans as determined by exec_single_node_rows_threshold, > so for example small VALUES inserts will be unaffected, and we have > as future work to examine and improve this decision, such as by not > adding the exchange if the input is already partitioned correctly. Thanks, Thomas. I think these results are consistent with our thinking that this patch is beneficial (runs actually completed) and at an acceptable cost (10% which can be reclaimed later). I agree that an off switch should be considered, but let's not try to squeeze it into this patch which we should get in. We can address that as a follow-up task, and we can give it the attention it deserves to make sure it's done in a way that won't overcomplicate the knobs we already have. -- To view, visit http://gerrit.cloudera.org:8080/6559 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I84ce0032a1b10958fdf31faef225372c5c38fdc4 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3742: Partitions and sort INSERTs for Kudu tables
Thomas Tauber-Marshall has posted comments on this change. Change subject: IMPALA-3742: Partitions and sort INSERTs for Kudu tables .. Patch Set 4: As requested, I ran some tests with tpcds_1000_text.store_sales: with patch: 3755.68s with patch and noshuffle: 3778.62s without patch: 3571.33s The first two are averages over three runs, for the third it only passed twice out of 10 runs. I also tried doing "with patch and noclustered" but it timed out every time I ran it. So again we see here a ~10% perf cost in exchange for consistently running to completion. These results also suggest that its the pre-sorting, not the pre-partitioning, that's making the biggest difference in whether or not Kudu can handle the load. We may be able to get more impact from the partitioning if we start partitioning the rows to the impalad collocated with the corresponding tserver, though that's going to take some changes to the scheduler and we're leaving it as future work. As far as adding the ability to turn this off: the current version of the patch just disables insert hints for Kudu tables, as none of the existing hints seem to make sense (would people need to specify both 'noshuffle' and 'noclustered' to turn this off?). If anyone feels that its important to have an off switch, I'm happy to include one (possibly with a new insert hint?), but I'm not sure how much of a benefit that would be vs. the added complexity of more knobs, given the above perf numbers and the fact that the partitioning and sorting has to happen either way, its just a question of whether its happing in Impala or in Kudu. Its also already the case that the exchange and sort won't happen for single node plans as determined by exec_single_node_rows_threshold, so for example small VALUES inserts will be unaffected, and we have as future work to examine and improve this decision, such as by not adding the exchange if the input is already partitioned correctly. -- To view, visit http://gerrit.cloudera.org:8080/6559 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I84ce0032a1b10958fdf31faef225372c5c38fdc4 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3742: Partitions and sort INSERTs for Kudu tables
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-3742: Partitions and sort INSERTs for Kudu tables .. Patch Set 4: > > > > Perf results from running on the 10 node cluster: > > > > > > > > For smaller queries that we were able to handle previously, > > > there's > > > > a regression of about 10% in overall query running time (all > > > > averaged over 3 runs): > > > > 240.13s with the patch vs. 224.58s previously for 200m rows > > > > inserted > > > > 472.97s with the patch vs. 433.05s previously for 400m rows > > > > inserted > > > > That's unfortunate, but can be improved in the future, e.g. > by > > > > codegen-ing the partition function. > > > > > > > > But, we can now handle significantly larger inserts - I was > > > seeing > > > > timeouts regularly at > 400m rows previously, but with the > > patch > > > > I've tested up to 6b row inserts without any timeouts. > > > > > > Those are good results, and the perf regression is fairly > > > negligible, so let's not worry about that for now. > > > > Currently KuduPartitionExpr::GetIntVal is very expensive, > although > > it is not currently the bottleneck we should. > > During the conducted tests is soft memory limit reached? > > While I think we should look at what we can do to improve the perf, > I don't think this should prevent us from getting this change in. > The most important thing is that non-trivial DML statements > actually complete without timing out. Nobody will touch this > otherwise. We can work on perf after this change, but probably the > issue is lack of codegen in the kudu client itself which will be > really hard to address and there is probably much lower hanging > fruit to reclaim the 11%. Agreed. -- To view, visit http://gerrit.cloudera.org:8080/6559 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I84ce0032a1b10958fdf31faef225372c5c38fdc4 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3742: Partitions and sort INSERTs for Kudu tables
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-3742: Partitions and sort INSERTs for Kudu tables .. Patch Set 4: > > > Perf results from running on the 10 node cluster: > > > > > > For smaller queries that we were able to handle previously, > > there's > > > a regression of about 10% in overall query running time (all > > > averaged over 3 runs): > > > 240.13s with the patch vs. 224.58s previously for 200m rows > > > inserted > > > 472.97s with the patch vs. 433.05s previously for 400m rows > > > inserted > > > That's unfortunate, but can be improved in the future, e.g. by > > > codegen-ing the partition function. > > > > > > But, we can now handle significantly larger inserts - I was > > seeing > > > timeouts regularly at > 400m rows previously, but with the > patch > > > I've tested up to 6b row inserts without any timeouts. > > > > Those are good results, and the perf regression is fairly > > negligible, so let's not worry about that for now. > > Currently KuduPartitionExpr::GetIntVal is very expensive, although > it is not currently the bottleneck we should. > During the conducted tests is soft memory limit reached? While I think we should look at what we can do to improve the perf, I don't think this should prevent us from getting this change in. The most important thing is that non-trivial DML statements actually complete without timing out. Nobody will touch this otherwise. We can work on perf after this change, but probably the issue is lack of codegen in the kudu client itself which will be really hard to address and there is probably much lower hanging fruit to reclaim the 11%. -- To view, visit http://gerrit.cloudera.org:8080/6559 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I84ce0032a1b10958fdf31faef225372c5c38fdc4 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3742: Partitions and sort INSERTs for Kudu tables
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-3742: Partitions and sort INSERTs for Kudu tables .. Patch Set 4: > Perf results from running on the 10 node cluster: > > For smaller queries that we were able to handle previously, there's > a regression of about 10% in overall query running time (all > averaged over 3 runs): > 240.13s with the patch vs. 224.58s previously for 200m rows > inserted > 472.97s with the patch vs. 433.05s previously for 400m rows > inserted > That's unfortunate, but can be improved in the future, e.g. by > codegen-ing the partition function. > > But, we can now handle significantly larger inserts - I was seeing > timeouts regularly at > 400m rows previously, but with the patch > I've tested up to 6b row inserts without any timeouts. Those are good results, and the perf regression is fairly negligible, so let's not worry about that for now. -- To view, visit http://gerrit.cloudera.org:8080/6559 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I84ce0032a1b10958fdf31faef225372c5c38fdc4 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3742: Partitions and sort INSERTs for Kudu tables
Thomas Tauber-Marshall has posted comments on this change. Change subject: IMPALA-3742: Partitions and sort INSERTs for Kudu tables .. Patch Set 4: Perf results from running on the 10 node cluster: For smaller queries that we were able to handle previously, there's a regression of about 10% in overall query running time (all averaged over 3 runs): 240.13s with the patch vs. 224.58s previously for 200m rows inserted 472.97s with the patch vs. 433.05s previously for 400m rows inserted That's unfortunate, but can be improved in the future, e.g. by codegen-ing the partition function. But, we can now handle significantly larger inserts - I was seeing timeouts regularly at > 400m rows previously, but with the patch I've tested up to 6b row inserts without any timeouts. -- To view, visit http://gerrit.cloudera.org:8080/6559 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I84ce0032a1b10958fdf31faef225372c5c38fdc4 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-3742: Partitions and sort INSERTs for Kudu tables
Thomas Tauber-Marshall has uploaded a new patch set (#4). Change subject: IMPALA-3742: Partitions and sort INSERTs for Kudu tables .. IMPALA-3742: Partitions and sort INSERTs for Kudu tables Bulk DMLs (INSERT, UPSERT, UPDATE, and DELETE) for Kudu are currently painful because we just send rows randomly, which creates a lot of work for Kudu since it partitions and sorts data before writing, causing writes to be slow and leading to timeouts. We can alleviate this by sending the rows to Kudu already partitioned and sorted. This patch partitions and sorts rows according to Kudu's partitioning scheme for INSERTs and UPSERTs. A followup patch will handle UPDATE and DELETE. It accomplishes this by inserting an exchange node and a sort node into the plan before the operation. Both the exchange and the sort are given a KuduPartitionExpr which takes a row and calls into the Kudu client to return its partition number. It also disallows INSERT hints for Kudu tables, since the hints that we support (SHUFFLE, CLUSTER, SORTBY), so longer make sense. Testing: - Updated planner tests. - Ran the Kudu functional tests. - Ran performance tests demonstrating that we can now handle much larger inserts without having timeouts. Change-Id: I84ce0032a1b10958fdf31faef225372c5c38fdc4 --- M be/src/exec/kudu-table-sink.cc M be/src/exec/kudu-util.cc M be/src/exec/kudu-util.h M be/src/exprs/CMakeLists.txt M be/src/exprs/expr-context.h M be/src/exprs/expr.cc A be/src/exprs/kudu-partition-expr.cc A be/src/exprs/kudu-partition-expr.h M be/src/runtime/coordinator.cc M be/src/runtime/data-stream-sender.cc M be/src/runtime/data-stream-sender.h M be/src/scheduling/scheduler.cc M bin/impala-config.sh M common/thrift/DataSinks.thrift M common/thrift/Exprs.thrift M common/thrift/Partitions.thrift M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java A fe/src/main/java/org/apache/impala/analysis/KuduPartitionExpr.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/planner/DataPartition.java M fe/src/main/java/org/apache/impala/planner/DataStreamSink.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/PlanFragment.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/TableSink.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeUpsertStmtTest.java M testdata/workloads/functional-planner/queries/PlannerTest/kudu-upsert.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test M testdata/workloads/functional-query/queries/QueryTest/kudu_insert.test 30 files changed, 633 insertions(+), 167 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/6559/4 -- To view, visit http://gerrit.cloudera.org:8080/6559 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I84ce0032a1b10958fdf31faef225372c5c38fdc4 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-3742: Partitions and sort INSERTs for Kudu tables
Thomas Tauber-Marshall has uploaded a new patch set (#3). Change subject: IMPALA-3742: Partitions and sort INSERTs for Kudu tables .. IMPALA-3742: Partitions and sort INSERTs for Kudu tables Bulk DMLs (INSERT, UPSERT, UPDATE, and DELETE) for Kudu are currently painful because we just send rows randomly, which creates a lot of work for Kudu since it partitions and sorts data before writing, causing writes to be slow and leading to timeouts. We can alleviate this by sending the rows to Kudu already partitioned and sorted. This patch partitions and sorts rows according to Kudu's partitioning scheme for INSERTs and UPSERTs. A followup patch will handle UPDATE and DELETE. It accomplishes this by inserting an exchange node and a sort node into the plan before the operation. Both the exchange and the sort are given a KuduPartitionExpr which takes a row and calls into the Kudu client to return its partition number. Testing: - Updated planner tests. - Ran the Kudu functional tests. - Ran performance tests demonstrating that we can now handle much larger inserts without having timeouts. Change-Id: I84ce0032a1b10958fdf31faef225372c5c38fdc4 --- M be/src/exec/kudu-table-sink.cc M be/src/exec/kudu-util.cc M be/src/exec/kudu-util.h M be/src/exprs/CMakeLists.txt M be/src/exprs/expr-context.h M be/src/exprs/expr.cc A be/src/exprs/kudu-partition-expr.cc A be/src/exprs/kudu-partition-expr.h M be/src/runtime/coordinator.cc M be/src/runtime/data-stream-sender.cc M be/src/runtime/data-stream-sender.h M be/src/scheduling/scheduler.cc M bin/impala-config.sh M common/thrift/DataSinks.thrift M common/thrift/Exprs.thrift M common/thrift/Partitions.thrift M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java A fe/src/main/java/org/apache/impala/analysis/KuduPartitionExpr.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/planner/DataPartition.java M fe/src/main/java/org/apache/impala/planner/DataStreamSink.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/PlanFragment.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/TableSink.java M testdata/workloads/functional-planner/queries/PlannerTest/kudu-upsert.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test 27 files changed, 590 insertions(+), 88 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/6559/3 -- To view, visit http://gerrit.cloudera.org:8080/6559 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I84ce0032a1b10958fdf31faef225372c5c38fdc4 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Matthew Jacobs Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-3742: Partitions and sort INSERTs for Kudu tables
Thomas Tauber-Marshall has abandoned this change. Change subject: IMPALA-3742: Partitions and sort INSERTs for Kudu tables .. Abandoned Submitted by mistake. -- To view, visit http://gerrit.cloudera.org:8080/6593 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: abandon Gerrit-Change-Id: I8a1ba41eb3bdb3944349ecea87806d57f74feb82 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-3742: Partitions and sort INSERTs for Kudu tables
Thomas Tauber-Marshall has posted comments on this change. Change subject: IMPALA-3742: Partitions and sort INSERTs for Kudu tables .. Patch Set 2: (25 comments) http://gerrit.cloudera.org:8080/#/c/6559/1/be/src/exec/kudu-util.h File be/src/exec/kudu-util.h: PS1, Line 62: int col, PrimitiveType type > Comment on these params. Done http://gerrit.cloudera.org:8080/#/c/6559/2/be/src/exprs/kudu-partition-expr.cc File be/src/exprs/kudu-partition-expr.cc: PS2, Line 48: using namespace kudu::client > we typically don't use inline 'using' statements Done PS2, Line 51: CreateKuduClient > use QueryState::GetKuduClient I haven't rebased in awhile and don't have this locally, and rebasing is probably going to be painful as I'll have to rebase my patched version of Kudu for the toolchain also, so I'm going to punt on it for today and address it when I have more time. Line 61: RegisterFunctionContext(ctx, state); > comment the return value is ignored because the idx is stored on the base c Done PS2, Line 72: if (table_schema.Column(col).is_nullable()) { > this should never be true right now, PK cols are not nullable. per my comme Done PS2, Line 80: return PartitionError(ctx, : ErrorMsg::Init(TErrorCode::KUDU_NULL_CONSTRAINT_VIOLATION, : table_desc_->table_name()).msg()); > This will change the failure mode of our inserts, which is that NCV do not Done Line 87: if (!s.ok()) { > I checked with Alexey on the Kudu team, this would only fail if we: a) wrot Done PS2, Line 95: if (!s.ok()) { : return PartitionError( : ctx, Substitute("Failed to determine Kudu partition for row: $0", s.ToString())); : } > same thing about errors here, I think if this returns an error it means we Done PS2, Line 110: ctx->fn_context(fn_context_index_)->AddWarning(msg.c_str()); > I don't know if there are any errors we need to be reporting here. Done http://gerrit.cloudera.org:8080/#/c/6559/2/be/src/exprs/kudu-partition-expr.h File be/src/exprs/kudu-partition-expr.h: PS2, Line 30: number > index Done PS2, Line 47: // > /// Done http://gerrit.cloudera.org:8080/#/c/6559/1/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: PS1, Line 1845: f (fragment.__isset.output_sink && fragment.output_sink.__isset.stream_sink : && fragment.output_sink.type == TDataSinkType::DATA_STREAM_SINK : && fragment.output_sink.stream_sink.output_partition.type == TPartitionType::KUDU) > Add a brief comment. Done http://gerrit.cloudera.org:8080/#/c/6559/1/be/src/scheduling/scheduler.cc File be/src/scheduling/scheduler.cc: Line 364: DCHECK(sink.output_partition.type == TPartitionType::UNPARTITIONED > I don't see what the point of this is. This enumerates possible values. It doesn't include RANGE_PARTITIONED, although I don't believe we currently use RANGE_PARTITIONED, so yeah not sure what this DCHECK is accomplishing. Want me to remove it? http://gerrit.cloudera.org:8080/#/c/6559/1/common/thrift/Exprs.thrift File common/thrift/Exprs.thrift: PS1, Line 134: 4 > what happened to 2 and 3? Done PS1, Line 137: 5 > 3? :) Done http://gerrit.cloudera.org:8080/#/c/6559/2/common/thrift/Partitions.thrift File common/thrift/Partitions.thrift: PS2, Line 50: // Expr that takes a row and returns its partition number. If specified, : // 'partition_exprs' will be empty. Used for TPartitionType::KUDU. : 3: optional Exprs.TExpr partition_expr > how about for Kudu the expr just goes in partition_exprs[0] (and it's the o As discussed with Marcel, we're moving partition_expr out of TDataPartition anyways. http://gerrit.cloudera.org:8080/#/c/6559/1/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java: PS1, Line 113: // Set in analyze(). Maps exprs in partitionKeyExprs_ to their column's position in the : // table, eg. partitionKeyExprs_[i] = table_.columns(partitionKeyIdx_[i]) : private List partitionKeyIdxs_ = Lists.newArrayList(); > Just for my understanding, can you explain why this is needed for this chan The partition keys are a subset of the primary keys and so not guaranteed to be the first n columns. We only need to send the partition keys to Kudu to get the partition, so this allows us to determine what column positions to write the partition values for the row into in the be. It gets passed through TKuduPartitionExpr.referenced_columns http://gerrit.cloudera.org:8080/#/c/6559/2/fe/src/main/java/org/apache/impala/analysis/KuduPartitionExpr.java File fe/src/main/java/org/apache/impala/analysis/KuduPartitionExpr.java: PS2, Line 49: artitionKeyExprs_ = Expr.cloneList(partitionKeyExprs); > Is there a reason we keep a separate copy of
[Impala-ASF-CR] IMPALA-3742: Partitions and sort INSERTs for Kudu tables
Thomas Tauber-Marshall has uploaded a new change for review. http://gerrit.cloudera.org:8080/6593 Change subject: IMPALA-3742: Partitions and sort INSERTs for Kudu tables .. IMPALA-3742: Partitions and sort INSERTs for Kudu tables Bulk DMLs (INSERT, UPSERT, UPDATE, and DELETE) for Kudu are currently painful because we just send rows randomly, which creates a lot of work for Kudu since it partitions and sorts data before writing, causing writes to be slow and leading to timeouts. We can alleviate this by sending the rows to Kudu already partitioned and sorted. This patch partitions and sorts rows according to Kudu's partitioning scheme for INSERTs and UPSERTs. A followup patch will handle UPDATE and DELETE. It accomplishes this by inserting an exchange node and a sort node into the plan before the operation. Both the exchange and the sort are given a KuduPartitionExpr which takes a row and calls into the Kudu client to return its partition number. Testing: - Updated planner tests. - Ran the Kudu functional tests. - Ran performance tests demonstrating that we can now handle much larger inserts without having timeouts. Change-Id: I8a1ba41eb3bdb3944349ecea87806d57f74feb82 --- M be/src/exec/kudu-table-sink.cc M be/src/exec/kudu-util.cc M be/src/exec/kudu-util.h M be/src/exprs/CMakeLists.txt M be/src/exprs/expr-context.h M be/src/exprs/expr.cc A be/src/exprs/kudu-partition-expr.cc A be/src/exprs/kudu-partition-expr.h M be/src/runtime/coordinator.cc M be/src/runtime/data-stream-sender.cc M be/src/runtime/data-stream-sender.h M be/src/scheduling/scheduler.cc M bin/impala-config.sh M common/thrift/DataSinks.thrift M common/thrift/Exprs.thrift M common/thrift/Partitions.thrift M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java A fe/src/main/java/org/apache/impala/analysis/KuduPartitionExpr.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/planner/DataPartition.java M fe/src/main/java/org/apache/impala/planner/DataStreamSink.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/PlanFragment.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/TableSink.java M testdata/workloads/functional-planner/queries/PlannerTest/kudu-upsert.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test 27 files changed, 590 insertions(+), 88 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/93/6593/1 -- To view, visit http://gerrit.cloudera.org:8080/6593 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I8a1ba41eb3bdb3944349ecea87806d57f74feb82 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-3742: Partitions and sort INSERTs for Kudu tables
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-3742: Partitions and sort INSERTs for Kudu tables .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/6559/2/be/src/exprs/kudu-partition-expr.cc File be/src/exprs/kudu-partition-expr.cc: Line 87: if (!s.ok()) { > Actually I think we may want to have a parameter indicating if strings shou I checked with Alexey on the Kudu team, this would only fail if we: a) wrote null to a non nullable col, b) tried to set a value of the wrong type, e.g. calling SetString*() on a non string column I think we can avoid (a) as mentioned above, and (b) would be a bug in our code. Therefore I think we can DCHECK -- To view, visit http://gerrit.cloudera.org:8080/6559 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I84ce0032a1b10958fdf31faef225372c5c38fdc4 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3742: Partitions and sort INSERTs for Kudu tables
Matthew Jacobs has posted comments on this change. Change subject: IMPALA-3742: Partitions and sort INSERTs for Kudu tables .. Patch Set 2: (1 comment) http://gerrit.cloudera.org:8080/#/c/6559/2/be/src/exprs/kudu-partition-expr.cc File be/src/exprs/kudu-partition-expr.cc: Line 87: if (!s.ok()) { > I'm not sure if this will return an error we need to handle at runtime, i.e Actually I think we may want to have a parameter indicating if strings should be copied or not, because in this case I don't think we need to do a string copy. Then you can use SetStringNoCopy() instead of SetString(). I'm asking someone on the Kudu team if SetStringNoCopy() has any errors we need to handle. -- To view, visit http://gerrit.cloudera.org:8080/6559 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I84ce0032a1b10958fdf31faef225372c5c38fdc4 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3742: Partitions and sort INSERTs for Kudu tables
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-3742: Partitions and sort INSERTs for Kudu tables .. Patch Set 2: (6 comments) Quick pass with some minor comments. I am still trying to wrap my head around the concept of KuduPartitionExpr... http://gerrit.cloudera.org:8080/#/c/6559/1/be/src/exec/kudu-util.h File be/src/exec/kudu-util.h: PS1, Line 62: int col, PrimitiveType type Comment on these params. http://gerrit.cloudera.org:8080/#/c/6559/1/be/src/runtime/coordinator.cc File be/src/runtime/coordinator.cc: PS1, Line 1845: f (fragment.__isset.output_sink && fragment.output_sink.__isset.stream_sink : && fragment.output_sink.type == TDataSinkType::DATA_STREAM_SINK : && fragment.output_sink.stream_sink.output_partition.type == TPartitionType::KUDU) Add a brief comment. http://gerrit.cloudera.org:8080/#/c/6559/1/common/thrift/Exprs.thrift File common/thrift/Exprs.thrift: PS1, Line 134: 4 2? PS1, Line 137: 5 3? :) http://gerrit.cloudera.org:8080/#/c/6559/1/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java: PS1, Line 113: // Set in analyze(). Maps exprs in partitionKeyExprs_ to their column's position in the : // table, eg. partitionKeyExprs_[i] = table_.columns(partitionKeyIdx_[i]) : private List partitionKeyIdxs_ = Lists.newArrayList(); Just for my understanding, can you explain why this is needed for this change? http://gerrit.cloudera.org:8080/#/c/6559/1/fe/src/main/java/org/apache/impala/planner/DataPartition.java File fe/src/main/java/org/apache/impala/planner/DataPartition.java: PS1, Line 48: // For kudu partition: expr that calls into the kudu client to get the partition number. : private Expr partitionExpr_; : : // Should be called only by the static factory method for Kudu partitioned tables. : private DataPartition(TPartitionType type, Expr partitionExpr) { : Preconditions.checkNotNull(partitionExpr); : Preconditions.checkState(type == TPartitionType.KUDU); : type_ = type; : partitionExpr_ = partitionExpr; : partitionExprs_ = Lists.newArrayList(); : } I am not sure I get this. Why not passing a single element list of the case of Kudu partitioned? -- To view, visit http://gerrit.cloudera.org:8080/6559 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I84ce0032a1b10958fdf31faef225372c5c38fdc4 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Matthew Jacobs Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-3742: Partitions and sort INSERTs for Kudu tables
Thomas Tauber-Marshall has uploaded a new patch set (#2). Change subject: IMPALA-3742: Partitions and sort INSERTs for Kudu tables .. IMPALA-3742: Partitions and sort INSERTs for Kudu tables Bulk DMLs (INSERT, UPSERT, UPDATE, and DELETE) for Kudu are currently painful because we just send rows randomly, which creates a lot of work for Kudu since it partitions and sorts data before writing, causing writes to be slow and leading to timeouts. We can alleviate this by sending the rows to Kudu already partitioned and sorted. This patch partitions and sorts rows according to Kudu's partitioning scheme for INSERTs and UPSERTs. A followup patch will handle UPDATE and DELETE. It accomplishes this by inserting an exchange node and a sort node into the plan before the operation. Both the exchange and the sort are given a KuduPartitionExpr which takes a row and calls into the Kudu client to return its partition number. Testing: - Updated planner tests. - Ran the Kudu functional tests. - Ran performance tests demonstrating that we can now handle much larger inserts without having timeouts. Change-Id: I84ce0032a1b10958fdf31faef225372c5c38fdc4 --- M be/src/exec/kudu-table-sink.cc M be/src/exec/kudu-util.cc M be/src/exec/kudu-util.h M be/src/exprs/CMakeLists.txt M be/src/exprs/expr-context.h M be/src/exprs/expr.cc A be/src/exprs/kudu-partition-expr.cc A be/src/exprs/kudu-partition-expr.h M be/src/runtime/coordinator.cc M be/src/runtime/data-stream-sender.cc M be/src/runtime/data-stream-sender.h M be/src/scheduling/scheduler.cc M bin/impala-config.sh M common/thrift/Exprs.thrift M common/thrift/Partitions.thrift M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java A fe/src/main/java/org/apache/impala/analysis/KuduPartitionExpr.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/planner/DataPartition.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/TableSink.java M testdata/workloads/functional-planner/queries/PlannerTest/kudu-upsert.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test 24 files changed, 593 insertions(+), 86 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/6559/2 -- To view, visit http://gerrit.cloudera.org:8080/6559 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I84ce0032a1b10958fdf31faef225372c5c38fdc4 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-MarshallGerrit-Reviewer: Matthew Jacobs
[Impala-ASF-CR] IMPALA-3742: Partitions and sort INSERTs for Kudu tables
Thomas Tauber-Marshall has uploaded a new change for review. http://gerrit.cloudera.org:8080/6559 Change subject: IMPALA-3742: Partitions and sort INSERTs for Kudu tables .. IMPALA-3742: Partitions and sort INSERTs for Kudu tables Bulk DMLs (INSERT, UPSERT, UPDATE, and DELETE) for Kudu are currently painful because we just send rows randomly, which creates a lot of work for Kudu since it partitions and sorts data before writing, causing writes to be slow and leading to timeouts. We can alleviate this by sending the rows to Kudu already partitioned and sorted. This patch partitions and sorts rows according to Kudu's partitioning scheme for INSERTs and UPSERTs. A followup patch will handle UPDATE and DELETE. It accomplishes this by inserting an exchange node and a sort node into the plan before the operation. Both the exchange and the sort are given a KuduPartitionExpr which takes a row and calls into the Kudu client to return its partition number. Testing: - Updated planner tests. - Ran the Kudu functional tests. - Ran performance tests demonstrating that we can now handle much larger inserts without having timeouts. Change-Id: I84ce0032a1b10958fdf31faef225372c5c38fdc4 --- M be/src/exec/kudu-table-sink.cc M be/src/exec/kudu-util.cc M be/src/exec/kudu-util.h M be/src/exprs/CMakeLists.txt M be/src/exprs/expr-context.h M be/src/exprs/expr.cc M be/src/runtime/coordinator.cc M be/src/runtime/data-stream-sender.cc M be/src/runtime/data-stream-sender.h M be/src/scheduling/scheduler.cc M bin/impala-config.sh M common/thrift/Exprs.thrift M common/thrift/Partitions.thrift M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/catalog/KuduTable.java M fe/src/main/java/org/apache/impala/planner/DataPartition.java M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/TableSink.java M testdata/workloads/functional-planner/queries/PlannerTest/kudu-upsert.test M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test 21 files changed, 312 insertions(+), 86 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/6559/1 -- To view, visit http://gerrit.cloudera.org:8080/6559 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I84ce0032a1b10958fdf31faef225372c5c38fdc4 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall