Thomas Tauber-Marshall has posted comments on this change.

Change subject: IMPALA-3742: Partitions and sort INSERTs for Kudu tables
......................................................................


Patch Set 6:

(21 comments)

http://gerrit.cloudera.org:8080/#/c/6559/5/be/src/exec/kudu-util.h
File be/src/exec/kudu-util.h:

Line 65:     void* value, bool copy_strings = true);
> const void* value, to indicate it's not an output param
Done


http://gerrit.cloudera.org:8080/#/c/6559/5/be/src/exprs/kudu-partition-expr.cc
File be/src/exprs/kudu-partition-expr.cc:

Line 64:   row_.reset(table->schema().NewRow());
> can this not return an error?
No.


Line 71:     int col = tkudu_partition_expr_.referenced_columns[i];
> move to where it's being used
I put the todo in Exprs.thrift


Line 75:       // the KuduTableSink generate the error message.
> what's the point of round-robin instead of always returning, say, 0?
It ensures that the work is spread out for INSERTs with a large number of rows 
that fall outside the partitions. That's probably more important below, when 
the row will actually be written, than here where the kudu-table-sink isn't 
going to end up sending the row to kudu anyways, but it still seems worth it to 
me.

It does occur to me that this probably isn't the right place to do this - we 
should probably instead return '-1' here to indicate unknown and then allow the 
data-stream-sender to do the round-robin, since it makes the semantics of the 
Expr cleaner (e.g. it would no longer be non-deterministic).


Line 79:     Status s = WriteKuduRowValue(row_.get(), col, type, val, false);
> where is this declared?
This is an Impala function, from kudu-util.h

I think its probably the right call to remove the 'using kudu::client::...' 
decls from above as it makes it clearer where things are coming from, esp. 
since each of the 'using's is only even used once in the file, though in an 
earlier version I had it that way and it was suggested that I add the 'using' 
in light of the verbosity of kudu's namespaces.


Line 82:     DCHECK(s.ok());
> add some context output (print called function, type and col name?).
Done


Line 89:   DCHECK(s.ok());
> same here
Done


http://gerrit.cloudera.org:8080/#/c/6559/5/be/src/exprs/kudu-partition-expr.h
File be/src/exprs/kudu-partition-expr.h:

Line 24: #include "runtime/descriptors.h"
> do you need this include, as opposed to forward decls?
Done


Line 46:   TKuduPartitionExpr tkudu_partition_expr_;
> forward declare?
Done


Line 50:   /// Used to call into Kudu to determine partitions.
> it's nice to indicate what gets set in prepare(), because that tells me it 
Done


http://gerrit.cloudera.org:8080/#/c/6559/5/be/src/runtime/data-stream-sender.cc
File be/src/runtime/data-stream-sender.cc:

Line 350:   kudu_ = sink.output_partition.type == TPartitionType::KUDU;
> have the partition type as a member var, rather than 3 bools?
Done


Line 425:   // to simplify this.
> i don't think that's practical, for broadcast and random we have different/
Done


Line 456:     // hash-partition batch's rows across channels
> i'd leave a todo here to coalesce this with the kudu case, which would requ
Done


http://gerrit.cloudera.org:8080/#/c/6559/5/be/src/runtime/data-stream-sender.h
File be/src/runtime/data-stream-sender.h:

Line 109:   bool kudu_;
> comment missing
Done


http://gerrit.cloudera.org:8080/#/c/6559/4/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java:

Line 115:   private List<Integer> partitionKeyIdxs_ = Lists.newArrayList();
> that would be good to explain in the code, it's a bit subtle
Done


http://gerrit.cloudera.org:8080/#/c/6559/5/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java:

Line 643:     Set<String> kuduPartitionByColumnNames = null;
> get rid of 'partitionby' here as well
Done


Line 682:     // declaration, and store their indexes.  We need those exprs in 
the original order to
> 'store their indexes' is a bit mysterious, 'column indexes' or even column 
Done


http://gerrit.cloudera.org:8080/#/c/6559/5/fe/src/main/java/org/apache/impala/analysis/KuduPartitionExpr.java
File fe/src/main/java/org/apache/impala/analysis/KuduPartitionExpr.java:

Line 44:   // Maps from this Epxrs children to column positions in the table, 
i.e. children_[i]
> since these are column positions, call the variable that?
Done


Line 48:   private List<Type> partitionKeyTypes_;
> isn't this available in the table descriptor?
Done


http://gerrit.cloudera.org:8080/#/c/6559/5/fe/src/main/java/org/apache/impala/planner/DataPartition.java
File fe/src/main/java/org/apache/impala/planner/DataPartition.java:

Line 82:     return new DataPartition(TPartitionType.KUDU, expr);
> why not just call the list<expr> c'tor and create a list here? (and drop th
Done


http://gerrit.cloudera.org:8080/#/c/6559/5/testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
File testdata/workloads/functional-planner/queries/PlannerTest/kudu.test:

Line 56: 01:EXCHANGE [KUDU(KuduPartition())]
> that's not very informative
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6559
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I84ce0032a1b10958fdf31faef225372c5c38fdc4
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tmarsh...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogian...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <mar...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <m...@cloudera.com>
Gerrit-Reviewer: Mostafa Mokhtar <mmokh...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tmarsh...@cloudera.com>
Gerrit-HasComments: Yes

Reply via email to