Matthew Jacobs has posted comments on this change. Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables ......................................................................
Patch Set 3: (11 comments) Not even close to done but want to start giving you feedback in pieces so we can parallelize a bit. http://gerrit.cloudera.org:8080/#/c/4414/3/common/thrift/CatalogObjects.thrift File common/thrift/CatalogObjects.thrift: PS3, Line 344: // Enum listing all possible DISTRIBUTE BY types : enum TDistributeType { : HASH, : RANGE, I don't see this used PS3, Line 350: // Parameters needed for hash distribution : struct TDistributeByHashParam { : 1: required list<string> columns : 2: required i32 num_buckets : } : : struct TRangeLiteral { : 1: optional i64 int_literal : 2: optional string string_literal : } : : struct TRangeLiteralList { : 1: required list<TRangeLiteral> values : } : : // A range distribution is identified by a list of columns and a series of split rows. : struct TDistributeByRangeParam { : 1: required list<string> columns : 2: optional list<TRangeLiteralList> split_rows; : } : : // Parameters for the DISTRIBUTE BY clause. The actual distribution is identified by : // the type parameter. : struct TDistributeParam { : // Set if type is set to HASH : 1: optional TDistributeByHashParam by_hash_param; : : // Set if type is set to RANGE : 2: optional TDistributeByRangeParam by_range_param; : } I see that these are used in the serialized catalog objects, but given that none of this is stored in the metastore nor do we require it to be stored in the catalog, I don't see why we bother keeping all this in the thrift table object. Can we just remove it and not serialize this info? Hopefully we can even split the distribute/partition parameters out of CREATE TABLE eventually, anyway. PS3, Line 390: : // Distribution schemes : 4: required list<TDistributeParam> distribute_by same for this, can we get rid of it? http://gerrit.cloudera.org:8080/#/c/4414/3/common/thrift/JniCatalog.thrift File common/thrift/JniCatalog.thrift: PS3, Line 398: CatalogObjects.TDistributeParam wrt my comments in CatalogObjects, I guess we'd need them here, but I don't see why we bother storing them in the serialized catalog table object. http://gerrit.cloudera.org:8080/#/c/4414/3/fe/src/main/java/org/apache/impala/analysis/ColumnDefOptions.java File fe/src/main/java/org/apache/impala/analysis/ColumnDefOptions.java: PS3, Line 21: * Placeholder for the column definition options of a CREATE TABLE statement. : * Contains the list of column definitions and, optionally, the list of column names : * specified using the PRIMARY KEY keyword. this feels like a weird abstraction... do we really need this class? http://gerrit.cloudera.org:8080/#/c/4414/3/fe/src/main/java/org/apache/impala/analysis/TableDataLayout.java File fe/src/main/java/org/apache/impala/analysis/TableDataLayout.java: PS3, Line 24: class TableDataLayout { : : private final List<ColumnDef> partitionColDefs_; : private final List<DistributeParam> distributeParams_; Not that we're doing this in this review, but we need to think about how this might change when we do the next steps for Kudu partitioning. PS3, Line 49: List<ColumnDef> getPartitionColumnDefs() { return partitionColDefs_; } : List<DistributeParam> getDistributeParams() { return distributeParams_; } any reason these aren't public? http://gerrit.cloudera.org:8080/#/c/4414/3/fe/src/main/java/org/apache/impala/analysis/TableDef.java File fe/src/main/java/org/apache/impala/analysis/TableDef.java: PS3, Line 1: // Copyright 2016 Cloudera Inc. : // : // Licensed under the Apache License, Version 2.0 (the "License"); : // you may not use this file except in compliance with the License. : // You may obtain a copy of the License at : // : // http://www.apache.org/licenses/LICENSE-2.0 : // : // Unless required by applicable law or agreed to in writing, software : // distributed under the License is distributed on an "AS IS" BASIS, : // WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. : // See the License for the specific language governing permissions and : // limitations under the License. this and maybe others need the new license header: // Licensed to the Apache Software Foundation (ASF) under one // or more contributor license agreements. See the NOTICE file // distributed with this work for additional information // regarding copyright ownership. The ASF licenses this file // to you under the Apache License, Version 2.0 (the // "License"); you may not use this file except in compliance // with the License. You may obtain a copy of the License at // // http://www.apache.org/licenses/LICENSE-2.0 // // Unless required by applicable law or agreed to in writing, // software distributed under the License is distributed on an // "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY // KIND, either express or implied. See the License for the // specific language governing permissions and limitations // under the License. PS3, Line 126: return fullyQualifiedTableName_ != null ? fullyQualifiedTableName_ : tableName_; Do we need to support calling this before analysis? It'd be nice if we just have 1 possible result here, e.g. throw if fullyQualifiedTableName_ is null (not yet analyzed) and just return fullyQualifiedTableName_ PS3, Line 137: List<DistributeParam> getDistributeParams() { : return dataLayout_.getDistributeParams(); : } 1line? http://gerrit.cloudera.org:8080/#/c/4414/3/infra/python/deps/requirements.txt File infra/python/deps/requirements.txt: PS3, Line 83: The kudu-python : # version in download_requirements must be kept in sync with this version. gotta update download_requirements as well diff --git a/infra/python/deps/download_requirements b/infra/python/deps/download_requirements index daa5025..d586104 100755 --- a/infra/python/deps/download_requirements +++ b/infra/python/deps/download_requirements @@ -29,5 +29,5 @@ PY26="$(./find_py26.py)" "$PY26" pip_download.py virtualenv 13.1.0 # kudu-python is downloaded separately because pip install attempts to execute a # setup.py subcommand for kudu-python that can fail even if the download succeeds. -"$PY26" pip_download.py kudu-python 0.1.1 +"$PY26" pip_download.py kudu-python 0.2.0 -- To view, visit http://gerrit.cloudera.org:8080/4414 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis <dtsirogian...@cloudera.com> Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com> Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogian...@cloudera.com> Gerrit-Reviewer: Matthew Jacobs <m...@cloudera.com> Gerrit-HasComments: Yes