Thomas Tauber-Marshall has posted comments on this change. Change subject: IMPALA-3742: partitions DMLs for Kudu tables ......................................................................
Patch Set 7: (14 comments) http://gerrit.cloudera.org:8080/#/c/6037/6/be/src/runtime/data-stream-sender.cc File be/src/runtime/data-stream-sender.cc: PS6, Line 446: batch, [this, num_channels](TupleRow* row, uint32_t partition) -> Status { : return this->channels_[partition % num_channels]->AddRow(row); : })); : } > nice. I assume this generates a function which is then passed directly by f Done http://gerrit.cloudera.org:8080/#/c/6037/6/bin/impala-config.sh File bin/impala-config.sh: PS6, Line 124: 4cd7d85 > btw this won't be in your change, right? we still need to get the Kudu API Yes, this isn't a real version of Kudu, just what resulted when I cherry-picked the Kudu API onto the latest version. Its just here to mind me that this change can't go in until it goes in on the Kudu side. http://gerrit.cloudera.org:8080/#/c/6037/6/common/thrift/Partitions.thrift File common/thrift/Partitions.thrift: PS6, Line 42: s partition inform > brief comment Done PS6, Line 43: > mention this is necessary for the kudu Partitioner API Done http://gerrit.cloudera.org:8080/#/c/6037/6/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java: Line 683: // create the corresponding Hdfs folder structure correctly, or the indexes to > , or the indexes to construct rows passed to the Kudu partitioning API. Done http://gerrit.cloudera.org:8080/#/c/6037/6/fe/src/main/java/org/apache/impala/planner/DataPartition.java File fe/src/main/java/org/apache/impala/planner/DataPartition.java: PS6, Line 46: partition > partitioning Done PS6, Line 52: index > index Done Line 55: // Should be called only by the static factory method for Kudu partitioned tables. > nit: comment should be called only by the static factory method for Kudu pa Done http://gerrit.cloudera.org:8080/#/c/6037/6/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java: PS6, Line 274: ExchangeNode exchNode = : new ExchangeNode(ctx_.getNextNodeId(), inputFragment.getPlanRoot()); > maybe just create these inline as params to the static constructor method b Done http://gerrit.cloudera.org:8080/#/c/6037/6/fe/src/main/java/org/apache/impala/planner/Planner.java File fe/src/main/java/org/apache/impala/planner/Planner.java: Line 153: } else { > a brief comment here may be helpful Done http://gerrit.cloudera.org:8080/#/c/6037/6/testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test File testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test: PS6, Line 64: [KUDU(a.id)] > seems to be printing the alias rather than the base table name as we see in What should the behavior be? There isn't an alias to use in the above queries, but it seems more convenient to use the alias here, like everywhere else this value is referred to in this plan. http://gerrit.cloudera.org:8080/#/c/6037/6/testdata/workloads/functional-planner/queries/PlannerTest/kudu-upsert.test File testdata/workloads/functional-planner/queries/PlannerTest/kudu-upsert.test: PS6, Line 29: 01:EXCHANGE [KUDU(1)] > hm this looks fishy with unions. i'm not sure if the approach doesn't make The '1' here is actually a SlotRef referring to the column the "1" is going into, id, so this basically works as intended, at least in theory Its printed as "1" because that's what gets set as the SlotRefs label. Obviously it would be preferable for it to be printed as the underlying column, but I'm not sure of a non-hacky way to do that (i.e. not just special casing this specific situation). In practice, this doesn't have any effect, as 'insert into... values...' will always operate on one node and DataStreamSender will use broadcast partitioning. Non-Kudu inserts avoid this with the cost-based decision in DistributedPlanner, which we could also do here, though as noted elsewhere in this review, this only appears here because PlannerTest sets exec_single_node_rows_threshold to 0, so in practice small values inserts won't be affected by this. http://gerrit.cloudera.org:8080/#/c/6037/6/testdata/workloads/functional-planner/queries/PlannerTest/kudu.test File testdata/workloads/functional-planner/queries/PlannerTest/kudu.test: Line 53: 01:EXCHANGE [KUDU(10)] > we were talking about trying to figure out the right cost-based logic separ This turns out to be an artifact of the way PlannerTest works - it always sets exec_single_node_rows_threshold to 0. If you left it as the default 100 this exchange would not be here, so I think this is the correct behavior, and small queries should be fine. If you happen to look in insert.test and notice that similar INSERTs with small VALUEs clauses in there don't have an exchange in their DISTRIBUTEDPLANs, thats due to a more complicated cost based decision in DistributedPlanner, and we were planning on punting on doing the same thing for Kudu for now (there's a TODO). Line 248: # Kudu planner tests > i'm assuming the hint will lose its meaning once the default sorting is in Done -- To view, visit http://gerrit.cloudera.org:8080/6037 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: Ic10b3295159354888efcde3df76b0edb24161515 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall <tmarsh...@cloudera.com> Gerrit-Reviewer: Henry Robinson <he...@cloudera.com> Gerrit-Reviewer: Marcel Kornacker <mar...@cloudera.com> Gerrit-Reviewer: Matthew Jacobs <m...@cloudera.com> Gerrit-Reviewer: Thomas Tauber-Marshall <tmarsh...@cloudera.com> Gerrit-HasComments: Yes