[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

Dimitris Tsirogiannis (Code Review) Mon, 03 Apr 2017 13:12:47 -0700

Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
......................................................................



Patch Set 5:

(20 comments)

http://gerrit.cloudera.org:8080/#/c/6495/5/common/thrift/JniCatalog.thrift
File common/thrift/JniCatalog.thrift:

Line 383:   // any such columns of the source table.
Otherwise? Will the destination table inherit the sort by columns of the source 
table, if any?


http://gerrit.cloudera.org:8080/#/c/6495/5/fe/src/main/java/org/apache/impala/analysis/AlterTableSetSortByColumnsStmt.java
File 
fe/src/main/java/org/apache/impala/analysis/AlterTableSetSortByColumnsStmt.java:

PS5, Line 64: // Analyze 'sort.by.columns' property.
Comment not applicable. Remove?


http://gerrit.cloudera.org:8080/#/c/6495/5/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java:

PS5, Line 127: hasNoClusteredHint_
Similar to the comment on hasShuffleHint, explain the allowed values for 
hasClusteredHint and hasNotClusteredHint.


PS5, Line 763:  
"if any"


http://gerrit.cloudera.org:8080/#/c/6495/5/fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java
File fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java:

PS5, Line 77: return ImmutableList.<Integer>of();
This is an immutable list whereas the next function returns a mutable one. Make 
it consistent?


PS5, Line 83: If there are any errors, 'error' contains an
            :    * error string. Otherwise, 'error' is left untouched.
remove


PS5, Line 91: colName
nit: rename to sortByColName? since you also have tableCols it may be good to 
make it explicit which ones you refer to in the for loop.


Line 92: 
nit: remove empty line


PS5, Line 102: and store
             :       // the corresponding result expr in sortByExprs_.
Not sure I follow this part based on the code below.


Line 120: 
nit: remove empty line


http://gerrit.cloudera.org:8080/#/c/6495/5/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

PS5, Line 455: the caller must make sure that the value matches any columns 
he/she adds to the
             :    * table.
Can't we throw an exception instead?


http://gerrit.cloudera.org:8080/#/c/6495/5/fe/src/main/java/org/apache/impala/planner/Planner.java
File fe/src/main/java/org/apache/impala/planner/Planner.java:

PS5, Line 546: orderingExprs.addAll(insertStmt.getPartitionKeyExprs()
I assume this works for HBase tables...


http://gerrit.cloudera.org:8080/#/c/6495/5/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java:

PS5, Line 521: 'sort.by.columns'='id
How about 'ID' (check for case)?


PS5, Line 1642: AnalyzesOk("create table tbl sort by (int_col,id) like 
functional.alltypes");
Also, add the negative case where sort by cols don't exist in the source table.


PS5, Line 1913: // Kudu table
              :     AnalyzesOk("create table tab (i int, x int primary key) 
partition by hash(x) " +
              :         "partitions 8 sort by(i) stored as kudu");
              :     // Column in sortby hint must not be a Kudu primary key 
column.
              :     AnalysisError("create table tab (i int, x int primary key) 
partition by hash(x) " +
              :         "partitions 8 sort by(x) stored as kudu", "SORT BY 
column list must not " +
              :         "contain primary key column: 'x'");
              :     AnalysisError("create table tab (i int, x int, y int, 
primary key (x)) partition " +
              :         "by hash(x) partitions 8 sort by(x) stored as kudu", 
"SORT BY column list must " +
              :         "not contain primary key column: 'x'");
              :     AnalysisError("create table tab (i int, x int, y int, 
primary key(x, y)) partition " +
              :         "by hash(y) partitions 8 sort by (i,x) stored as kudu", 
"SORT BY column list " +
              :         "must not contain primary key column: 'x'");
              :   }
For Kudu specific tests they need to be in a function that first calls 
TestUtils.assumeKuduIsSupported() (see L1930). Do you have tests for HBase?


http://gerrit.cloudera.org:8080/#/c/6495/5/testdata/workloads/functional-planner/queries/PlannerTest/ddl.test
File testdata/workloads/functional-planner/queries/PlannerTest/ddl.test:

PS5, Line 162: create table t sort by
Add one with a partitioned table? Also, a CTAS with a more complex select 
(joins, maybe aggs)


http://gerrit.cloudera.org:8080/#/c/6495/5/testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test:

PS5, Line 10: ASC NULLS LAST
Out of curiosity? Can we change the default behavior? If no, are we ok with 
this restriction?


PS5, Line 39: ---- DISTRIBUTEDPLAN
            : WRITE TO HDFS [test_sort_by.alltypes, OVERWRITE=false, 
PARTITION-KEYS=(year,month)]
            : |  partitions=24
            : |
            : 01:SORT
            : |  order by: year ASC NULLS LAST, month ASC NULLS LAST, int_col 
ASC NULLS LAST, bool_col ASC NULLS LAST
            : |
            : 00:SCAN HDFS [functional.alltypes]
            :    partitions=24/24 files=24 size=478.45KB
Do we need all these DISTRIBUTED_PLANs?


PS5, Line 193: ====
Can you add one or two more complex select statements feeding the insert? i.e. 
something with a join that generates more complex plans


http://gerrit.cloudera.org:8080/#/c/6495/5/testdata/workloads/functional-query/queries/QueryTest/alter-table.test
File testdata/workloads/functional-query/queries/QueryTest/alter-table.test:

PS5, Line 1066:                  
where are these empty spaces coming from?


-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Dimitris Tsirogiannis <dtsirogian...@cloudera.com>
Gerrit-Reviewer: Lars Volker <l...@cloudera.com>
Gerrit-Reviewer: Marcel Kornacker <mar...@cloudera.com>
Gerrit-Reviewer: Thomas Tauber-Marshall <tmarsh...@cloudera.com>
Gerrit-HasComments: Yes

[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

Reply via email to