Matthew Jacobs has posted comments on this change. Change subject: IMPALA-3742: partitions DMLs for Kudu tables ......................................................................
Patch Set 6: (13 comments) http://gerrit.cloudera.org:8080/#/c/6037/6/be/src/runtime/data-stream-sender.cc File be/src/runtime/data-stream-sender.cc: PS6, Line 446: TupleRow* current_row = batch->GetRow(i); : uint32_t partition; : RETURN_IF_ERROR(partitioner_->Partition(current_row, &partition)); : RETURN_IF_ERROR(channels_[partition % num_channels]->AddRow(current_row)); As we discussed, I think Henry is right that we need to avoid the v-f-calls per row. I think we can consider two approaches: 1) we can continue branching separately for kudu and for hash partitioning here, and calling non virtual fns on the partitioner. It's still good to have pulled all the Kudu code into a separate class. 2) the partitioner class you added can take row batches at a time as well as well as the channels_, and handle the per-row calls internally. #2 sounds cleaner but I'm open to suggestions. http://gerrit.cloudera.org:8080/#/c/6037/6/bin/impala-config.sh File bin/impala-config.sh: PS6, Line 124: 4cd7d85 btw this won't be in your change, right? we still need to get the Kudu API in http://gerrit.cloudera.org:8080/#/c/6037/6/common/thrift/Partitions.thrift File common/thrift/Partitions.thrift: PS6, Line 42: TKuduDataPartition brief comment PS6, Line 43: target_table_id mention this is necessary for the kudu Partitioner API http://gerrit.cloudera.org:8080/#/c/6037/6/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java: Line 683: // Hdfs folder structure correctly. , or the indexes to construct rows passed to the Kudu partitioning API. (Or something similar) http://gerrit.cloudera.org:8080/#/c/6037/6/fe/src/main/java/org/apache/impala/planner/DataPartition.java File fe/src/main/java/org/apache/impala/planner/DataPartition.java: PS6, Line 46: partition partitioning PS6, Line 52: number index Line 55: private DataPartition( nit: comment should be called only by the static factory method for Kudu partitioned tables http://gerrit.cloudera.org:8080/#/c/6037/6/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java: PS6, Line 274: List<Expr> partitionExprs = Lists.newArrayList(modifyStmt.getPartitionExprs()); : List<Integer> targetColIdxs = Lists.newArrayList(modifyStmt.getTargetColIdxs()); maybe just create these inline as params to the static constructor method below, since they're not needed otherwise. http://gerrit.cloudera.org:8080/#/c/6037/6/fe/src/main/java/org/apache/impala/planner/Planner.java File fe/src/main/java/org/apache/impala/planner/Planner.java: Line 153: } else { a brief comment here may be helpful http://gerrit.cloudera.org:8080/#/c/6037/6/testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test File testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test: PS6, Line 64: [KUDU(a.id)] seems to be printing the alias rather than the base table name as we see in the line above http://gerrit.cloudera.org:8080/#/c/6037/6/testdata/workloads/functional-planner/queries/PlannerTest/kudu-upsert.test File testdata/workloads/functional-planner/queries/PlannerTest/kudu-upsert.test: PS6, Line 29: 01:EXCHANGE [KUDU(1)] hm this looks fishy with unions. i'm not sure if the approach doesn't make sense with unions, at least those with constant operands. can you add a test case which has unions with selecting from a table, e.g. upsert into foo select x,y,z from a union all select x,y,z from b PS6, Line 117: upsert into functional_kudu.testtbl /*+ clustered */ : select * from functional_kudu.testtbl : ---- PLAN : UPSERT INTO KUDU [functional_kudu.testtbl] : | : 01:SORT : | order by: id ASC NULLS LAST : | : 00:SCAN KUDU [functional_kudu.testtbl] : ---- DISTRIBUTEDPLAN : UPSERT INTO KUDU [functional_kudu.testtbl] : | : 02:SORT : | order by: id DESC NULLS LAST : | : 01:EXCHANGE [KUDU(functional_kudu.testtbl.id)] : | : 00:SCAN KUDU [functional_kudu.testtbl] we'll wanna do something similar without the clustered hint later, as we discussed -- To view, visit http://gerrit.cloudera.org:8080/6037 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic10b3295159354888efcde3df76b0edb24161515 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall <tmarsh...@cloudera.com> Gerrit-Reviewer: Henry Robinson <he...@cloudera.com> Gerrit-Reviewer: Matthew Jacobs <m...@cloudera.com> Gerrit-Reviewer: Thomas Tauber-Marshall <tmarsh...@cloudera.com> Gerrit-HasComments: Yes