Thomas Tauber-Marshall has posted comments on this change.

Change subject: IMPALA-3742: partitions DMLs for Kudu tables
......................................................................


Patch Set 7:

(14 comments)

http://gerrit.cloudera.org:8080/#/c/6037/6/be/src/runtime/data-stream-sender.cc
File be/src/runtime/data-stream-sender.cc:

PS6, Line 446:   batch, [this, num_channels](TupleRow* row, uint32_t partition) 
-> Status {
             :           return this->channels_[partition % 
num_channels]->AddRow(row);
             :         }));
             :   }
> nice. I assume this generates a function which is then passed directly by f
Done


http://gerrit.cloudera.org:8080/#/c/6037/6/bin/impala-config.sh
File bin/impala-config.sh:

PS6, Line 124: 4cd7d85
> btw this won't be in your change, right? we still need to get the Kudu API 
Yes, this isn't a real version of Kudu, just what resulted when I cherry-picked 
the Kudu API onto the latest version. Its just here to mind me that this change 
can't go in until it goes in on the Kudu side.


http://gerrit.cloudera.org:8080/#/c/6037/6/common/thrift/Partitions.thrift
File common/thrift/Partitions.thrift:

PS6, Line 42: s partition inform
> brief comment
Done


PS6, Line 43: 
> mention this is necessary for the kudu Partitioner API
Done


http://gerrit.cloudera.org:8080/#/c/6037/6/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java:

Line 683:     // create the corresponding Hdfs folder structure correctly, or 
the indexes to
> , or the indexes to construct rows passed to the Kudu partitioning API.
Done


http://gerrit.cloudera.org:8080/#/c/6037/6/fe/src/main/java/org/apache/impala/planner/DataPartition.java
File fe/src/main/java/org/apache/impala/planner/DataPartition.java:

PS6, Line 46: partition
> partitioning
Done


PS6, Line 52: index 
> index
Done


Line 55:   // Should be called only by the static factory method for Kudu 
partitioned tables.
> nit: comment should be called only by the static factory method for Kudu pa
Done


http://gerrit.cloudera.org:8080/#/c/6037/6/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java:

PS6, Line 274:     ExchangeNode exchNode =
             :         new ExchangeNode(ctx_.getNextNodeId(), 
inputFragment.getPlanRoot());
> maybe just create these inline as params to the static constructor method b
Done


http://gerrit.cloudera.org:8080/#/c/6037/6/fe/src/main/java/org/apache/impala/planner/Planner.java
File fe/src/main/java/org/apache/impala/planner/Planner.java:

Line 153:     } else {
> a brief comment here may be helpful
Done


http://gerrit.cloudera.org:8080/#/c/6037/6/testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test
File testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test:

PS6, Line 64: [KUDU(a.id)]
> seems to be printing the alias rather than the base table name as we see in
What should the behavior be? There isn't an alias to use in the above queries, 
but it seems more convenient to use the alias here, like everywhere else this 
value is referred to in this plan.


http://gerrit.cloudera.org:8080/#/c/6037/6/testdata/workloads/functional-planner/queries/PlannerTest/kudu-upsert.test
File testdata/workloads/functional-planner/queries/PlannerTest/kudu-upsert.test:

PS6, Line 29: 01:EXCHANGE [KUDU(1)]
> hm this looks fishy with unions. i'm not sure if the approach doesn't make 
The '1' here is actually a SlotRef referring to the column the "1" is going 
into, id, so this basically works as intended, at least in theory Its printed 
as "1" because that's what gets set as the SlotRefs label. Obviously it would 
be preferable for it to be printed as the underlying column, but I'm not sure 
of a non-hacky way to do that (i.e. not just special casing this specific 
situation).

In practice, this doesn't have any effect, as 'insert into... values...' will 
always operate on one node and DataStreamSender will use broadcast partitioning.

Non-Kudu inserts avoid this with the cost-based decision in DistributedPlanner, 
which we could also do here, though as noted elsewhere in this review, this 
only appears here because PlannerTest sets exec_single_node_rows_threshold to 
0, so in practice small values inserts won't be affected by this.


http://gerrit.cloudera.org:8080/#/c/6037/6/testdata/workloads/functional-planner/queries/PlannerTest/kudu.test
File testdata/workloads/functional-planner/queries/PlannerTest/kudu.test:

Line 53: 01:EXCHANGE [KUDU(10)]
> we were talking about trying to figure out the right cost-based logic separ
This turns out to be an artifact of the way PlannerTest works - it always sets 
exec_single_node_rows_threshold to 0. If you left it as the default 100 this 
exchange would not be here, so I think this is the correct behavior, and small 
queries should be fine.

If you happen to look in insert.test and notice that similar INSERTs with small 
VALUEs clauses in there don't have an exchange in their DISTRIBUTEDPLANs, thats 
due to a more complicated cost based decision in DistributedPlanner, and we 
were planning on punting on doing the same thing for Kudu for now (there's a 
TODO).


Line 248: # Kudu planner tests
> i'm assuming the hint will lose its meaning once the default sorting is in 
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6037
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic10b3295159354888efcde3df76b0edb24161515
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tmarsh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <mar...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <m...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tmarsh...@cloudera.com>
Gerrit-HasComments: Yes

Reply via email to