Thomas Tauber-Marshall has uploaded a new patch set (#5).

Change subject: IMPALA-3742: Partitions and sort INSERTs for Kudu tables
......................................................................

IMPALA-3742: Partitions and sort INSERTs for Kudu tables

Bulk DMLs (INSERT, UPSERT, UPDATE, and DELETE) for Kudu are currently
painful because we send rows to Kudu in no particular order. Kudu then
has to partition and sort the data itself before writing, which makes
writes slow and leads to timeouts. We can alleviate this by sending the
rows to Kudu already partitioned and sorted.

This patch partitions and sorts rows according to Kudu's partitioning
scheme for INSERTs and UPSERTs. A followup patch will handle UPDATE and
DELETE.

It accomplishes this by inserting an exchange node and a sort node into
the plan before the operation. Both the exchange and the sort are given
a KuduPartitionExpr, which takes a row and calls into the Kudu client
to return its partition number.

It also disallows INSERT hints for Kudu tables, since the hints that we
support (SHUFFLE, CLUSTER, SORTBY) no longer make sense.

Testing:
- Updated planner tests.
- Ran the Kudu functional tests.
- Ran performance tests demonstrating that we can now handle much
  larger inserts without having timeouts.

Change-Id: I84ce0032a1b10958fdf31faef225372c5c38fdc4
---
M be/src/exec/kudu-table-sink.cc
M be/src/exec/kudu-util.cc
M be/src/exec/kudu-util.h
M be/src/exprs/CMakeLists.txt
M be/src/exprs/expr-context.h
M be/src/exprs/expr.cc
A be/src/exprs/kudu-partition-expr.cc
A be/src/exprs/kudu-partition-expr.h
M be/src/runtime/coordinator.cc
M be/src/runtime/data-stream-sender.cc
M be/src/runtime/data-stream-sender.h
M be/src/scheduling/scheduler.cc
M bin/impala-config.sh
M common/thrift/Exprs.thrift
M common/thrift/Partitions.thrift
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
A fe/src/main/java/org/apache/impala/analysis/KuduPartitionExpr.java
M fe/src/main/java/org/apache/impala/catalog/KuduTable.java
M fe/src/main/java/org/apache/impala/planner/DataPartition.java
M fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeUpsertStmtTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/kudu-upsert.test
M testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
M testdata/workloads/functional-query/queries/QueryTest/kudu_insert.test
27 files changed, 621 insertions(+), 162 deletions(-)

  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/59/6559/5
--
To view, visit http://gerrit.cloudera.org:8080/6559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I84ce0032a1b10958fdf31faef225372c5c38fdc4
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tmarsh...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogian...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <mar...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <m...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mmokh...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tmarsh...@cloudera.com>
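
For readers who want a concrete picture of the exchange-then-sort step described in the commit message, here is a minimal sketch in Python. It is not the patch's code and does not call the Kudu client. `toy_partition` is a hypothetical stand-in for KuduPartitionExpr that assigns a partition number by taking an integer key modulo the partition count, and the `id` column and the partition count of 3 are likewise assumptions made for illustration.

```python
from collections import defaultdict


def toy_partition(row, num_partitions):
    """Hypothetical stand-in for KuduPartitionExpr: row -> partition number.

    The real expression asks the Kudu client; this toy version just takes
    the integer key modulo the number of partitions.
    """
    return row["id"] % num_partitions


def exchange(rows, num_partitions):
    """Exchange step: route every row to the bucket for its partition."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[toy_partition(row, num_partitions)].append(row)
    return buckets


def sort_for_sink(rows, num_partitions):
    """Sort step: order rows by (partition number, key) before the sink."""
    return sorted(rows, key=lambda r: (toy_partition(r, num_partitions), r["id"]))


# Rows arrive in arbitrary order, as in the unpatched behaviour.
rows = [{"id": i} for i in (7, 2, 9, 4, 1, 6)]

# After the exchange and the sort, each batch holds rows for a single
# partition, in key order.
buckets = exchange(rows, 3)
batches = {p: sort_for_sink(b, 3) for p, b in buckets.items()}

for partition in sorted(batches):
    print(partition, [r["id"] for r in batches[partition]])
```

Both steps use the same partition function, mirroring the message's statement that the exchange and the sort are each given the partition expression.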