Marcel Kornacker has posted comments on this change. Change subject: IMPALA-3719: Simplify CREATE TABLE statements with Kudu tables ......................................................................
Patch Set 4: (37 comments) http://gerrit.cloudera.org:8080/#/c/4414/4/common/thrift/CatalogObjects.thrift File common/thrift/CatalogObjects.thrift: Line 53: enum THdfsFileFormat { rename Line 63: // rename this enum to not be Hdfs specific. resolve http://gerrit.cloudera.org:8080/#/c/4414/4/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: Line 976: tbl_def_without_col_defs:tbl_def a 'create table' without col defs? Line 980: RESULT = new CreateTableStmt(tbl_def); trailing Line 1033: // class doesn't inherit from CreateTableStmt. should it? Line 1065: primary_keys_val ::= opt_primary_keys? Line 1089: tbl_data_layout ::= opt_...? Line 1139: {: fix spaces and tabs Line 1370: KW_PRIMARY key_ident what's wrong with KW_KEY? http://gerrit.cloudera.org:8080/#/c/4414/4/fe/src/main/java/org/apache/impala/analysis/AnalysisUtils.java File fe/src/main/java/org/apache/impala/analysis/AnalysisUtils.java: Line 30: static void throwIfNotNullOrNotEmpty(Collection<?> c, String message) this is actually 'not null *and* not empty'. you can also phrase that as 'not empty'. http://gerrit.cloudera.org:8080/#/c/4414/4/fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java File fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java: Line 158: if (fileFormat_ == THdfsFileFormat.KUDU) { check at top http://gerrit.cloudera.org:8080/#/c/4414/4/fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java File fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java: Line 236: String.format("PRIMARY KEY must be used instead of the table property '%s'.", not good: if you do that on an external table, you get this error message instead of 'primary key not allowed'. move this check into the next function. alternatively, why not have a list of valid table properties per storage format, and then flag everything else as invalid? Line 310: distributeParam.setPKColumnDefMap(pkColumnDefsByName); setPkColumn... Line 315: private boolean hasPrimaryKeysSpecified() { hasPrimaryKeySpecified (there's only one, which can be a composite key) hasPrimaryKey would express the same when is it legal to call this (before/after analyze()?) http://gerrit.cloudera.org:8080/#/c/4414/4/fe/src/main/java/org/apache/impala/analysis/DistributeParam.java File fe/src/main/java/org/apache/impala/analysis/DistributeParam.java: Line 121: org.apache.impala.catalog.Type colType = colDef.getType(); does simply Type conflict with something? Line 129: if (colType.isStringType() && !exprType.isStringType() this is basically looking for 'assignment compatible', and i'm sure we already have code somewhere to express that more succinctly. Line 150: builder.append(numBuckets_).append(" BUCKETS"); sprinkle some checkstates in here (on numbuckets and splitrows; or maybe a validate() function that does those checks, toThrift() would also benefit from it) Line 200: literal.setString_literal(expr.getStringValue()); checkstate that you're getting something valid Line 211: void setPKColumnDefMap(Map<String, ColumnDef> pkColumnDefByName) { setPkC... http://gerrit.cloudera.org:8080/#/c/4414/4/fe/src/main/java/org/apache/impala/analysis/TableDef.java File fe/src/main/java/org/apache/impala/analysis/TableDef.java: Line 74: static class TableDefOptions { 'Options' is enough Line 84: // File format of the table since this is clearly not a file format anymore, TStorageFormat? Line 160: fullyQualifiedTableName_ = analyzer.getFqTableName(getTblName()); stick with fq abbreviation? Line 189: for (ColumnDef colDef: getPartitionColumnDefs()) { this is a bit hard to follow. partition cols aren't defined separately, they're declared. so then why do you need to call colDef.analyze()? http://gerrit.cloudera.org:8080/#/c/4414/4/fe/src/main/java/org/apache/impala/catalog/KuduTable.java File fe/src/main/java/org/apache/impala/catalog/KuduTable.java: Line 111: // Distribution schemes of this Kudu table. Both rang and hash-based distributions are range Line 140: return msTbl.getParameters().get(KuduTable.KEY_TABLE_NAME); i don't think this is worth a function call, it just makes the code harder to follow (extra level of indirection) PS4, Line 175: numClusteringCols_ = 0; > not really related to this change, but it's kind of confusing to have numCl those should be the primary key cols http://gerrit.cloudera.org:8080/#/c/4414/4/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java: Line 1353: "functional.alltypestiny", "Columns cannot be specified with an external " + odd error message. i would expect the 'as select' to be the offending part. Line 1720: AnalyzesOk("create table tab (x int, y int, primary key (X)) " + i thought kudu is case-sensitive http://gerrit.cloudera.org:8080/#/c/4414/4/testdata/workloads/functional-query/queries/QueryTest/kudu_create.test File testdata/workloads/functional-query/queries/QueryTest/kudu_create.test: Line 6: as select * from functional.alltypestiny shouldn't this be part of an analyzer test? Line 30: distribute by hash (x) into 3 buckets stored as kudu same here, and for the other analysis error test cases in this file Line 32: NonRecoverableException: Key column may not have type of BOOL, FLOAT, or DOUBLE why wouldn't this be an analysis exception? Line 46: NonRecoverableException: Got out-of-order key column: name: "y" type: INT32 is_key: true is_nullable: false cfile_block_size: 0 inscrutable error message Line 53: NonRecoverableException: must have at least two hash buckets error message should point out the offending clause Line 60: NonRecoverableException: hash bucket schema components must not contain columns in common same here http://gerrit.cloudera.org:8080/#/c/4414/4/testdata/workloads/functional-query/queries/QueryTest/kudu_crud.test File testdata/workloads/functional-query/queries/QueryTest/kudu_crud.test: Line 1: ==== might be a good idea to point out at the top that this test contains test cases for what basically amount to analysis errors, but they only show up at runtime. Line 5: DISTRIBUTE BY RANGE SPLIT ROWS ((10), (30)) STORED AS KUDU analyzer test? http://gerrit.cloudera.org:8080/#/c/4414/4/testdata/workloads/functional-query/queries/QueryTest/kudu_partition_ddl.test File testdata/workloads/functional-query/queries/QueryTest/kudu_partition_ddl.test: Line 6: DISTRIBUTE BY RANGE(name) SPLIT ROWS (('abc')) STORED AS KUDU analyzer test? -- To view, visit http://gerrit.cloudera.org:8080/4414 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I7b9d51b2720ab57649abdb7d5c710ea04ff50dc1 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Dimitris Tsirogiannis <dtsirogian...@cloudera.com> Gerrit-Reviewer: Alex Behm <alex.b...@cloudera.com> Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogian...@cloudera.com> Gerrit-Reviewer: Marcel Kornacker <mar...@cloudera.com> Gerrit-Reviewer: Matthew Jacobs <m...@cloudera.com> Gerrit-HasComments: Yes