Matthew Jacobs has posted comments on this change.

Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables
......................................................................


Patch Set 3:

(11 comments)

Not even close to done but want to start giving you feedback in pieces so we 
can parallelize a bit.

http://gerrit.cloudera.org:8080/#/c/4414/3/common/thrift/CatalogObjects.thrift
File common/thrift/CatalogObjects.thrift:

PS3, Line 344: // Enum listing all possible DISTRIBUTE BY types
             : enum TDistributeType {
             :   HASH,
             :   RANGE,
I don't see this used


PS3, Line 350: // Parameters needed for hash distribution
             : struct TDistributeByHashParam {
             :   1: required list<string> columns
             :   2: required i32 num_buckets
             : }
             : 
             : struct TRangeLiteral {
             :   1: optional i64 int_literal
             :   2: optional string string_literal
             : }
             : 
             : struct TRangeLiteralList {
             :   1: required list<TRangeLiteral> values
             : }
             : 
             : // A range distribution is identified by a list of columns and a 
series of split rows.
             : struct TDistributeByRangeParam {
             :   1: required list<string> columns
             :   2: optional list<TRangeLiteralList> split_rows;
             : }
             : 
             : // Parameters for the DISTRIBUTE BY clause. The actual 
distribution is identified by
             : // the type parameter.
             : struct TDistributeParam {
             :   // Set if type is set to HASH
             :   1: optional TDistributeByHashParam by_hash_param;
             : 
             :   // Set if type is set to RANGE
             :   2: optional TDistributeByRangeParam by_range_param;
             : }
I see that these are used in the serialized catalog objects, but given that 
none of this is stored in the metastore nor do we require it to be stored in 
the catalog, I don't see why we bother keeping all this in the thrift table 
object.

Can we just remove it and not serialize this info? Hopefully we can even split 
the distribute/partition parameters out of CREATE TABLE eventually, anyway.


PS3, Line 390: 
             :   // Distribution schemes
             :   4: required list<TDistributeParam> distribute_by
same for this, can we get rid of it?


http://gerrit.cloudera.org:8080/#/c/4414/3/common/thrift/JniCatalog.thrift
File common/thrift/JniCatalog.thrift:

PS3, Line 398: CatalogObjects.TDistributeParam
wrt my comments in CatalogObjects, I guess we'd need them here, but I don't see 
why we bother storing them in the serialized catalog table object.


http://gerrit.cloudera.org:8080/#/c/4414/3/fe/src/main/java/org/apache/impala/analysis/ColumnDefOptions.java
File fe/src/main/java/org/apache/impala/analysis/ColumnDefOptions.java:

PS3, Line 21:  * Placeholder for the column definition options of a CREATE 
TABLE statement.
            :  * Contains the list of column definitions and, optionally, the 
list of column names
            :  * specified using the PRIMARY KEY keyword.
this feels like a weird abstraction... do we really need this class?


http://gerrit.cloudera.org:8080/#/c/4414/3/fe/src/main/java/org/apache/impala/analysis/TableDataLayout.java
File fe/src/main/java/org/apache/impala/analysis/TableDataLayout.java:

PS3, Line 24: class TableDataLayout {
            : 
            :   private final List<ColumnDef> partitionColDefs_;
            :   private final List<DistributeParam> distributeParams_;
Not that we're doing this in this review, but we need to think about how this 
might change when we do the next steps for Kudu partitioning.


PS3, Line 49:   List<ColumnDef> getPartitionColumnDefs() { return 
partitionColDefs_; }
            :   List<DistributeParam> getDistributeParams() { return 
distributeParams_; }
any reason these aren't public?


http://gerrit.cloudera.org:8080/#/c/4414/3/fe/src/main/java/org/apache/impala/analysis/TableDef.java
File fe/src/main/java/org/apache/impala/analysis/TableDef.java:

PS3, Line 1: // Copyright 2016 Cloudera Inc.
           : //
           : // Licensed under the Apache License, Version 2.0 (the "License");
           : // you may not use this file except in compliance with the License.
           : // You may obtain a copy of the License at
           : //
           : // http://www.apache.org/licenses/LICENSE-2.0
           : //
           : // Unless required by applicable law or agreed to in writing, 
software
           : // distributed under the License is distributed on an "AS IS" 
BASIS,
           : // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 
implied.
           : // See the License for the specific language governing permissions 
and
           : // limitations under the License.
this and maybe others need the new license header:

// Licensed to the Apache Software Foundation (ASF) under one
// or more contributor license agreements.  See the NOTICE file
// distributed with this work for additional information
// regarding copyright ownership.  The ASF licenses this file
// to you under the Apache License, Version 2.0 (the
// "License"); you may not use this file except in compliance
// with the License.  You may obtain a copy of the License at
//
//   http://www.apache.org/licenses/LICENSE-2.0
//
// Unless required by applicable law or agreed to in writing,
// software distributed under the License is distributed on an
// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
// KIND, either express or implied.  See the License for the
// specific language governing permissions and limitations
// under the License.


PS3, Line 126: return fullyQualifiedTableName_ != null ? 
fullyQualifiedTableName_ : tableName_;
Do we need to support calling this before analysis? It'd be nice if we just 
have 1 possible result here, e.g. throw if fullyQualifiedTableName_ is null 
(not yet analyzed) and just return fullyQualifiedTableName_


PS3, Line 137:   List<DistributeParam> getDistributeParams() {
             :     return dataLayout_.getDistributeParams();
             :   }
1line?


http://gerrit.cloudera.org:8080/#/c/4414/3/infra/python/deps/requirements.txt
File infra/python/deps/requirements.txt:

PS3, Line 83: The kudu-python
            : # version in download_requirements must be kept in sync with this 
version.
gotta update download_requirements as well


diff --git a/infra/python/deps/download_requirements 
b/infra/python/deps/download_requirements
index daa5025..d586104 100755
--- a/infra/python/deps/download_requirements
+++ b/infra/python/deps/download_requirements
@@ -29,5 +29,5 @@ PY26="$(./find_py26.py)"
 "$PY26" pip_download.py virtualenv 13.1.0
 # kudu-python is downloaded separately because pip install attempts to execute 
a
 # setup.py subcommand for kudu-python that can fail even if the download 
succeeds.
-"$PY26" pip_download.py kudu-python 0.1.1
+"$PY26" pip_download.py kudu-python 0.2.0


-- 
To view, visit http://gerrit.cloudera.org:8080/4414
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Dimitris Tsirogiannis <dtsirogian...@cloudera.com>
Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogian...@cloudera.com>
Gerrit-Reviewer: Matthew Jacobs <m...@cloudera.com>
Gerrit-HasComments: Yes

Reply via email to