Thomas Tauber-Marshall has posted comments on this change. Change subject: IMPALA-3742: Partitions and sort INSERTs for Kudu tables ......................................................................
Patch Set 6: (21 comments) http://gerrit.cloudera.org:8080/#/c/6559/5/be/src/exec/kudu-util.h File be/src/exec/kudu-util.h: Line 65: void* value, bool copy_strings = true); > const void* value, to indicate it's not an output param Done http://gerrit.cloudera.org:8080/#/c/6559/5/be/src/exprs/kudu-partition-expr.cc File be/src/exprs/kudu-partition-expr.cc: Line 64: row_.reset(table->schema().NewRow()); > can this not return an error? No. Line 71: int col = tkudu_partition_expr_.referenced_columns[i]; > move to where it's being used I put the todo in Exprs.thrift Line 75: // the KuduTableSink generate the error message. > what's the point of round-robin instead of always returning, say, 0? It ensures that the work is spread out for INSERTs with a large number of rows that fall outside the partitions. That's probably more important below, when the row will actually be written, than here where the kudu-table-sink isn't going to end up sending the row to kudu anyways, but it still seems worth it to me. It does occur to me that this probably isn't the right place to do this - we should probably instead return '-1' here to indicate unknown and then allow the data-stream-sender to do the round-robin, since it makes the semantics of the Expr cleaner (e.g. it would no longer be non-deterministic). Line 79: Status s = WriteKuduRowValue(row_.get(), col, type, val, false); > where is this declared? This is an Impala function, from kudu-util.h I think its probably the right call to remove the 'using kudu::client::...' decls from above as it makes it clearer where things are coming from, esp. since each of the 'using's is only even used once in the file, though in an earlier version I had it that way and it was suggested that I add the 'using' in light of the verbosity of kudu's namespaces. Line 82: DCHECK(s.ok()); > add some context output (print called function, type and col name?). Done Line 89: DCHECK(s.ok()); > same here Done http://gerrit.cloudera.org:8080/#/c/6559/5/be/src/exprs/kudu-partition-expr.h File be/src/exprs/kudu-partition-expr.h: Line 24: #include "runtime/descriptors.h" > do you need this include, as opposed to forward decls? Done Line 46: TKuduPartitionExpr tkudu_partition_expr_; > forward declare? Done Line 50: /// Used to call into Kudu to determine partitions. > it's nice to indicate what gets set in prepare(), because that tells me it Done http://gerrit.cloudera.org:8080/#/c/6559/5/be/src/runtime/data-stream-sender.cc File be/src/runtime/data-stream-sender.cc: Line 350: kudu_ = sink.output_partition.type == TPartitionType::KUDU; > have the partition type as a member var, rather than 3 bools? Done Line 425: // to simplify this. > i don't think that's practical, for broadcast and random we have different/ Done Line 456: // hash-partition batch's rows across channels > i'd leave a todo here to coalesce this with the kudu case, which would requ Done http://gerrit.cloudera.org:8080/#/c/6559/5/be/src/runtime/data-stream-sender.h File be/src/runtime/data-stream-sender.h: Line 109: bool kudu_; > comment missing Done http://gerrit.cloudera.org:8080/#/c/6559/4/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java: Line 115: private List<Integer> partitionKeyIdxs_ = Lists.newArrayList(); > that would be good to explain in the code, it's a bit subtle Done http://gerrit.cloudera.org:8080/#/c/6559/5/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java: Line 643: Set<String> kuduPartitionByColumnNames = null; > get rid of 'partitionby' here as well Done Line 682: // declaration, and store their indexes. We need those exprs in the original order to > 'store their indexes' is a bit mysterious, 'column indexes' or even column Done http://gerrit.cloudera.org:8080/#/c/6559/5/fe/src/main/java/org/apache/impala/analysis/KuduPartitionExpr.java File fe/src/main/java/org/apache/impala/analysis/KuduPartitionExpr.java: Line 44: // Maps from this Epxrs children to column positions in the table, i.e. children_[i] > since these are column positions, call the variable that? Done Line 48: private List<Type> partitionKeyTypes_; > isn't this available in the table descriptor? Done http://gerrit.cloudera.org:8080/#/c/6559/5/fe/src/main/java/org/apache/impala/planner/DataPartition.java File fe/src/main/java/org/apache/impala/planner/DataPartition.java: Line 82: return new DataPartition(TPartitionType.KUDU, expr); > why not just call the list<expr> c'tor and create a list here? (and drop th Done http://gerrit.cloudera.org:8080/#/c/6559/5/testdata/workloads/functional-planner/queries/PlannerTest/kudu.test File testdata/workloads/functional-planner/queries/PlannerTest/kudu.test: Line 56: 01:EXCHANGE [KUDU(KuduPartition())] > that's not very informative Done -- To view, visit http://gerrit.cloudera.org:8080/6559 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I84ce0032a1b10958fdf31faef225372c5c38fdc4 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Thomas Tauber-Marshall <tmarsh...@cloudera.com> Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com> Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogian...@cloudera.com> Gerrit-Reviewer: Marcel Kornacker <mar...@cloudera.com> Gerrit-Reviewer: Matthew Jacobs <m...@cloudera.com> Gerrit-Reviewer: Mostafa Mokhtar <mmokh...@cloudera.com> Gerrit-Reviewer: Thomas Tauber-Marshall <tmarsh...@cloudera.com> Gerrit-HasComments: Yes