Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause ......................................................................
Patch Set 5: (20 comments) http://gerrit.cloudera.org:8080/#/c/6495/5/common/thrift/JniCatalog.thrift File common/thrift/JniCatalog.thrift: Line 383: // any such columns of the source table. Otherwise? Will the destination table inherit the sort by columns of the source table, if any? http://gerrit.cloudera.org:8080/#/c/6495/5/fe/src/main/java/org/apache/impala/analysis/AlterTableSetSortByColumnsStmt.java File fe/src/main/java/org/apache/impala/analysis/AlterTableSetSortByColumnsStmt.java: PS5, Line 64: // Analyze 'sort.by.columns' property. Comment not applicable. Remove? http://gerrit.cloudera.org:8080/#/c/6495/5/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java: PS5, Line 127: hasNoClusteredHint_ Similar to the comment on hasShuffleHint, explain the allowed values for hasClusteredHint and hasNotClusteredHint. PS5, Line 763: "if any" http://gerrit.cloudera.org:8080/#/c/6495/5/fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java File fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java: PS5, Line 77: return ImmutableList.<Integer>of(); This is an immutable list whereas the next function returns a mutable one. Make it consistent? PS5, Line 83: If there are any errors, 'error' contains an : * error string. Otherwise, 'error' is left untouched. remove PS5, Line 91: colName nit: rename to sortByColName? since you also have tableCols it may be good to make it explicit which ones you refer to in the for loop. Line 92: nit: remove empty line PS5, Line 102: and store : // the corresponding result expr in sortByExprs_. Not sure I follow this part based on the code below. Line 120: nit: remove empty line http://gerrit.cloudera.org:8080/#/c/6495/5/fe/src/main/java/org/apache/impala/catalog/Table.java File fe/src/main/java/org/apache/impala/catalog/Table.java: PS5, Line 455: the caller must make sure that the value matches any columns he/she adds to the : * table. Can't we throw an exception instead? http://gerrit.cloudera.org:8080/#/c/6495/5/fe/src/main/java/org/apache/impala/planner/Planner.java File fe/src/main/java/org/apache/impala/planner/Planner.java: PS5, Line 546: orderingExprs.addAll(insertStmt.getPartitionKeyExprs() I assume this works for HBase tables... http://gerrit.cloudera.org:8080/#/c/6495/5/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java: PS5, Line 521: 'sort.by.columns'='id How about 'ID' (check for case)? PS5, Line 1642: AnalyzesOk("create table tbl sort by (int_col,id) like functional.alltypes"); Also, add the negative case where sort by cols don't exist in the source table. PS5, Line 1913: // Kudu table : AnalyzesOk("create table tab (i int, x int primary key) partition by hash(x) " + : "partitions 8 sort by(i) stored as kudu"); : // Column in sortby hint must not be a Kudu primary key column. : AnalysisError("create table tab (i int, x int primary key) partition by hash(x) " + : "partitions 8 sort by(x) stored as kudu", "SORT BY column list must not " + : "contain primary key column: 'x'"); : AnalysisError("create table tab (i int, x int, y int, primary key (x)) partition " + : "by hash(x) partitions 8 sort by(x) stored as kudu", "SORT BY column list must " + : "not contain primary key column: 'x'"); : AnalysisError("create table tab (i int, x int, y int, primary key(x, y)) partition " + : "by hash(y) partitions 8 sort by (i,x) stored as kudu", "SORT BY column list " + : "must not contain primary key column: 'x'"); : } For Kudu specific tests they need to be in a function that first calls TestUtils.assumeKuduIsSupported() (see L1930). Do you have tests for HBase? http://gerrit.cloudera.org:8080/#/c/6495/5/testdata/workloads/functional-planner/queries/PlannerTest/ddl.test File testdata/workloads/functional-planner/queries/PlannerTest/ddl.test: PS5, Line 162: create table t sort by Add one with a partitioned table? Also, a CTAS with a more complex select (joins, maybe aggs) http://gerrit.cloudera.org:8080/#/c/6495/5/testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test File testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test: PS5, Line 10: ASC NULLS LAST Out of curiosity? Can we change the default behavior? If no, are we ok with this restriction? PS5, Line 39: ---- DISTRIBUTEDPLAN : WRITE TO HDFS [test_sort_by.alltypes, OVERWRITE=false, PARTITION-KEYS=(year,month)] : | partitions=24 : | : 01:SORT : | order by: year ASC NULLS LAST, month ASC NULLS LAST, int_col ASC NULLS LAST, bool_col ASC NULLS LAST : | : 00:SCAN HDFS [functional.alltypes] : partitions=24/24 files=24 size=478.45KB Do we need all these DISTRIBUTED_PLANs? PS5, Line 193: ==== Can you add one or two more complex select statements feeding the insert? i.e. something with a join that generates more complex plans http://gerrit.cloudera.org:8080/#/c/6495/5/testdata/workloads/functional-query/queries/QueryTest/alter-table.test File testdata/workloads/functional-query/queries/QueryTest/alter-table.test: PS5, Line 1066: where are these empty spaces coming from? -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker <l...@cloudera.com> Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogian...@cloudera.com> Gerrit-Reviewer: Lars Volker <l...@cloudera.com> Gerrit-Reviewer: Marcel Kornacker <mar...@cloudera.com> Gerrit-Reviewer: Thomas Tauber-Marshall <tmarsh...@cloudera.com> Gerrit-HasComments: Yes