Thomas Tauber-Marshall has posted comments on this change. Change subject: IMPALA-3742: Partitions and sort INSERTs for Kudu tables ......................................................................
Patch Set 4: (9 comments) http://gerrit.cloudera.org:8080/#/c/6559/3/be/src/exec/kudu-util.cc File be/src/exec/kudu-util.cc: Line 110: > can you add a TODO that this needs to be codegen'd ? Done http://gerrit.cloudera.org:8080/#/c/6559/3/be/src/exec/kudu-util.h File be/src/exec/kudu-util.h: PS3, Line 62: ry > into memory owned by the row. If false, string data must remain valid while Done http://gerrit.cloudera.org:8080/#/c/6559/3/be/src/exprs/kudu-partition-expr.cc File be/src/exprs/kudu-partition-expr.cc: PS3, Line 55: KuduClient* client; > let's use the query state kudu client Done Line 84: > comment here and above why we're dchecking s.ok() Done http://gerrit.cloudera.org:8080/#/c/6559/3/be/src/runtime/data-stream-sender.h File be/src/runtime/data-stream-sender.h: PS3, Line 109: kudu_ > I think this may be better off avoiding mentioning kudu directly, e.g. call Done PS3, Line 127: sink_partition_expr > this needs explanation as well. Done http://gerrit.cloudera.org:8080/#/c/6559/3/common/thrift/Exprs.thrift File common/thrift/Exprs.thrift: Line 134: 2: required list<Types.TPrimitiveType> types > nit newline after this Done http://gerrit.cloudera.org:8080/#/c/6559/3/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java: PS3, Line 219: TODO: consider making a cost based decision for Kudu tables. : if (!insertStmt.hasShuffleHint() : && !(insertStmt.getTargetTable() instanceof KuduTable) > we should address this before the release. it'd be good to start thinking a That is already handled because we only call this function from Planner if '!ctx_.isSingleNodeExec()', which is true if the max # of rows processed is < single_node_exec_rows_threshold, which defaults to 100. This doesn't show up in the DISTRIBUTEDPLAN in the planner tests because PlannerTest sets single_node_exec_rows_threshold to 0. For non-Kudu tables, this 'if' is about handling things like if the input is already partitioned correctly, which we're leaving for future work for the Kudu case. http://gerrit.cloudera.org:8080/#/c/6559/3/fe/src/main/java/org/apache/impala/planner/Planner.java File fe/src/main/java/org/apache/impala/planner/Planner.java: PS3, Line 540: putFragment.getPartitionExpr() != null) { > I don't understand this logic about the clustered hint. Sure, I'll disallow all insert hints for Kudu tables (we support CLUSTERED, SHUFFLE, and SORT BY, none of which really apply now); The point of checking isSingleNodeExec here is to avoid sorting if we haven't partitioned, since then Kudu is just going to resort anyways, but if we're not also handling clustered here, then its cleaner to just look for the presence of the partition expr. -- To view, visit http://gerrit.cloudera.org:8080/6559 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I84ce0032a1b10958fdf31faef225372c5c38fdc4 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall <tmarsh...@cloudera.com> Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogian...@cloudera.com> Gerrit-Reviewer: Matthew Jacobs <m...@cloudera.com> Gerrit-Reviewer: Thomas Tauber-Marshall <tmarsh...@cloudera.com> Gerrit-HasComments: Yes