[Impala-ASF-CR] IMPALA-3742: partitions DMLs for Kudu tables

Matthew Jacobs (Code Review) Thu, 09 Mar 2017 19:11:16 -0800

Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-3742: partitions DMLs for Kudu tables
......................................................................



Patch Set 6:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/6037/6/be/src/runtime/data-stream-sender.cc
File be/src/runtime/data-stream-sender.cc:

PS6, Line 446: TupleRow* current_row = batch->GetRow(i);
             :       uint32_t partition;
             :       RETURN_IF_ERROR(partitioner_->Partition(current_row, 
&partition));
             :       RETURN_IF_ERROR(channels_[partition % 
num_channels]->AddRow(current_row));
As we discussed, I think Henry is right that we need to avoid the v-f-calls per 
row.

I think we can consider two approaches:
1) we can continue branching separately for kudu and for hash partitioning 
here, and calling non virtual fns on the partitioner. It's still good to have 
pulled all the Kudu code into a separate class.
2) the partitioner class you added can take row batches at a time as well as 
well as the channels_, and handle the per-row calls internally.

#2 sounds cleaner but I'm open to suggestions.


http://gerrit.cloudera.org:8080/#/c/6037/6/bin/impala-config.sh
File bin/impala-config.sh:

PS6, Line 124: 4cd7d85
btw this won't be in your change, right? we still need to get the Kudu API in


http://gerrit.cloudera.org:8080/#/c/6037/6/common/thrift/Partitions.thrift
File common/thrift/Partitions.thrift:

PS6, Line 42: TKuduDataPartition
brief comment


PS6, Line 43: target_table_id
mention this is necessary for the kudu Partitioner API


http://gerrit.cloudera.org:8080/#/c/6037/6/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java:

Line 683:     // Hdfs folder structure correctly.
, or the indexes to construct rows passed to the Kudu partitioning API.

(Or something similar)


http://gerrit.cloudera.org:8080/#/c/6037/6/fe/src/main/java/org/apache/impala/planner/DataPartition.java
File fe/src/main/java/org/apache/impala/planner/DataPartition.java:

PS6, Line 46: partition
partitioning


PS6, Line 52: number
index


Line 55:   private DataPartition(
nit: comment should be called only by the static factory method for Kudu 
partitioned tables


http://gerrit.cloudera.org:8080/#/c/6037/6/fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java
File fe/src/main/java/org/apache/impala/planner/DistributedPlanner.java:

PS6, Line 274:     List<Expr> partitionExprs = 
Lists.newArrayList(modifyStmt.getPartitionExprs());
             :     List<Integer> targetColIdxs = 
Lists.newArrayList(modifyStmt.getTargetColIdxs());
maybe just create these inline as params to the static constructor method 
below, since they're not needed otherwise.


http://gerrit.cloudera.org:8080/#/c/6037/6/fe/src/main/java/org/apache/impala/planner/Planner.java
File fe/src/main/java/org/apache/impala/planner/Planner.java:

Line 153:     } else {
a brief comment here may be helpful


http://gerrit.cloudera.org:8080/#/c/6037/6/testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test
File testdata/workloads/functional-planner/queries/PlannerTest/kudu-update.test:

PS6, Line 64: [KUDU(a.id)]
seems to be printing the alias rather than the base table name as we see in the 
line above


http://gerrit.cloudera.org:8080/#/c/6037/6/testdata/workloads/functional-planner/queries/PlannerTest/kudu-upsert.test
File testdata/workloads/functional-planner/queries/PlannerTest/kudu-upsert.test:

PS6, Line 29: 01:EXCHANGE [KUDU(1)]
hm this looks fishy with unions. i'm not sure if the approach doesn't make 
sense with unions, at least those with constant operands.

can you add a test case which has unions with selecting from a table, e.g.

upsert into foo
select x,y,z from a union all
select x,y,z from b


PS6, Line 117: upsert into functional_kudu.testtbl /*+ clustered */
             : select * from functional_kudu.testtbl
             : ---- PLAN
             : UPSERT INTO KUDU [functional_kudu.testtbl]
             : |
             : 01:SORT
             : |  order by: id ASC NULLS LAST
             : |
             : 00:SCAN KUDU [functional_kudu.testtbl]
             : ---- DISTRIBUTEDPLAN
             : UPSERT INTO KUDU [functional_kudu.testtbl]
             : |
             : 02:SORT
             : |  order by: id DESC NULLS LAST
             : |
             : 01:EXCHANGE [KUDU(functional_kudu.testtbl.id)]
             : |
             : 00:SCAN KUDU [functional_kudu.testtbl]
we'll wanna do something similar without the clustered hint later, as we 
discussed


-- 
To view, visit http://gerrit.cloudera.org:8080/6037
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: Ic10b3295159354888efcde3df76b0edb24161515
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Thomas Tauber-Marshall <tmarsh...@cloudera.com>
Gerrit-Reviewer: Henry Robinson <he...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <m...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tmarsh...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-3742: partitions DMLs for Kudu tables

Reply via email to