Thomas Tauber-Marshall has posted comments on this change.

Change subject: IMPALA-3742: Partitions and sort INSERTs for Kudu tables
......................................................................


Patch Set 4:

(9 comments)

http://gerrit.cloudera.org:8080/#/c/6559/3/be/src/exec/kudu-util.cc
File be/src/exec/kudu-util.cc:

Line 110: 
> can you add a TODO that this needs to be codegen'd ?
Done


http://gerrit.cloudera.org:8080/#/c/6559/3/be/src/exec/kudu-util.h
File be/src/exec/kudu-util.h:

PS3, Line 62: ry 
> into memory owned by the row. If false, string data must remain valid while
Done


http://gerrit.cloudera.org:8080/#/c/6559/3/be/src/exprs/kudu-partition-expr.cc
File be/src/exprs/kudu-partition-expr.cc:

PS3, Line 55:   KuduClient* client;
> let's use the query state kudu client
Done


Line 84: 
> comment here and above why we're dchecking s.ok()
Done


http://gerrit.cloudera.org:8080/#/c/6559/3/be/src/runtime/data-stream-sender.h
File be/src/runtime/data-stream-sender.h:

PS3, Line 109: kudu_
> I think this may be better off avoiding mentioning kudu directly, e.g. call
Done


PS3, Line 127: sink_partition_expr
> this needs explanation as well.
Done


http://gerrit.cloudera.org:8080/#/c/6559/3/common/thrift/Exprs.thrift
File common/thrift/Exprs.thrift:

Line 134:   2: required list<Types.TPrimitiveType> types
> nit newline after this
Done


http://gerrit.cloudera.org:8080/#/c/6559/3/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java:

PS3, Line 219:  TODO: consider making a cost based decision for Kudu tables.
             :     if (!insertStmt.hasShuffleHint()
             :         && !(insertStmt.getTargetTable() instanceof KuduTable)
> we should address this before the release. it'd be good to start thinking a
That is already handled because we only call this function from Planner if 
'!ctx_.isSingleNodeExec()', which is true if the max # of rows processed is < 
single_node_exec_rows_threshold, which defaults to 100.

This doesn't show up in the DISTRIBUTEDPLAN in the planner tests because 
PlannerTest sets single_node_exec_rows_threshold to 0.

For non-Kudu tables, this 'if' is about handling things like if the input is 
already partitioned correctly, which we're leaving for future work for the Kudu 
case.


http://gerrit.cloudera.org:8080/#/c/6559/3/fe/src/main/java/org/apache/impala/planner/Planner.java
File fe/src/main/java/org/apache/impala/planner/Planner.java:

PS3, Line 540: putFragment.getPartitionExpr() != null) {
> I don't understand this logic about the clustered hint.
Sure, I'll disallow all insert hints for Kudu tables (we support CLUSTERED, 
SHUFFLE, and SORT BY, none of which really apply now);

The point of checking isSingleNodeExec here is to avoid sorting if we haven't 
partitioned, since then Kudu is just going to resort anyways, but if we're not 
also handling clustered here, then its cleaner to just look for the presence of 
the partition expr.


-- 
To view, visit http://gerrit.cloudera.org:8080/6559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I84ce0032a1b10958fdf31faef225372c5c38fdc4
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tmarsh...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogian...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <m...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tmarsh...@cloudera.com>
Gerrit-HasComments: Yes

Reply via email to