[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 31: Verified+1 -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 31 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Impala Public Jenkins has submitted this change and it was merged. Change subject: IMPALA-4166: Add SORT BY sql clause .. IMPALA-4166: Add SORT BY sql clause This change adds support for adding SORT BY (...) clauses to CREATE TABLE and ALTER TABLE statements. Examples are: CREATE TABLE t (i INT, j INT, k INT) PARTITIONED BY (l INT) SORT BY (i, j); CREATE TABLE t SORT BY (int_col,id) LIKE u; CREATE TABLE t LIKE PARQUET '/foo' SORT BY (id,zip); ALTER TABLE t SORT BY (int_col,id); ALTER TABLE t SORT BY (); Sort columns can only be specified for Hdfs tables and effectiveness may vary based on storage type; for example TEXT tables will not see improved compression. The SORT BY clause must not contain clustering columns. The columns in the SORT BY clause are stored in the 'sort.columns' table property and will result in an additional SORT node being added to the plan before the final table sink. Specifying sort columns also enables clustering during inserts, so the SORT node will contain all partitioning columns first, followed by the sort columns. We do this because sort columns add a SORT node to the plan and adding the clustering columns to the SORT node is cheap. Sort columns supersede the sortby() hint, which we will remove in a subsequent change (IMPALA-5144). Until then, it is possible to specify sort columns using both ways at the same time and the column lists will be concatenated. Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Reviewed-on: http://gerrit.cloudera.org:8080/6495 Reviewed-by: Lars Volker Tested-by: Impala Public Jenkins --- M be/src/exec/hdfs-parquet-table-writer.cc M be/src/exec/hdfs-table-sink.cc M be/src/exec/hdfs-table-sink.h M common/thrift/DataSinks.thrift M common/thrift/JniCatalog.thrift M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java A fe/src/main/java/org/apache/impala/analysis/AlterTableSortByStmt.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/Column.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/TableSink.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java M fe/src/main/jflex/sql-scanner.flex M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test A testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test M testdata/workloads/functional-planner/queries/PlannerTest/insert.test M testdata/workloads/functional-query/queries/QueryTest/alter-table.test M testdata/workloads/functional-query/queries/QueryTest/create-table-as-select.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test M testdata/workloads/functional-query/queries/QueryTest/create-table.test M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test 39 files changed, 1,531 insertions(+), 113 deletions(-) Approvals: Impala Public Jenkins: Verified Lars Volker: Looks good to me, approved -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: merged Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 32 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Lars Volker has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 31: Code-Review+2 Submit job for PS30 seems to have failed for unrelated Jenkins issues. Rebased, carrying Marcel's +2. -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 31 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 31: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/575/ -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 31 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 30: Verified-1 Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/573/ -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 30 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Impala Public Jenkins has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 30: Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/573/ -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 30 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Impala Public Jenkins Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Lars Volker has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 30: Code-Review+2 I chatted with Mostafa, carrying Marcel's +2 -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 30 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Lars Volker has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 29: (1 comment) Thanks for the review. Please see PS30. http://gerrit.cloudera.org:8080/#/c/6495/29/testdata/workloads/functional-query/queries/QueryTest/alter-table.test File testdata/workloads/functional-query/queries/QueryTest/alter-table.test: Line 1130: alter table insert_sorted replace columns (i int, e double, f boolean); > For more coverage how about. Done -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 29 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Hello Marcel Kornacker, Dimitris Tsirogiannis, Mostafa Mokhtar, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6495 to look at the new patch set (#30). Change subject: IMPALA-4166: Add SORT BY sql clause .. IMPALA-4166: Add SORT BY sql clause This change adds support for adding SORT BY (...) clauses to CREATE TABLE and ALTER TABLE statements. Examples are: CREATE TABLE t (i INT, j INT, k INT) PARTITIONED BY (l INT) SORT BY (i, j); CREATE TABLE t SORT BY (int_col,id) LIKE u; CREATE TABLE t LIKE PARQUET '/foo' SORT BY (id,zip); ALTER TABLE t SORT BY (int_col,id); ALTER TABLE t SORT BY (); Sort columns can only be specified for Hdfs tables and effectiveness may vary based on storage type; for example TEXT tables will not see improved compression. The SORT BY clause must not contain clustering columns. The columns in the SORT BY clause are stored in the 'sort.columns' table property and will result in an additional SORT node being added to the plan before the final table sink. Specifying sort columns also enables clustering during inserts, so the SORT node will contain all partitioning columns first, followed by the sort columns. We do this because sort columns add a SORT node to the plan and adding the clustering columns to the SORT node is cheap. Sort columns supersede the sortby() hint, which we will remove in a subsequent change (IMPALA-5144). Until then, it is possible to specify sort columns using both ways at the same time and the column lists will be concatenated. Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 --- M be/src/exec/hdfs-parquet-table-writer.cc M be/src/exec/hdfs-table-sink.cc M be/src/exec/hdfs-table-sink.h M common/thrift/DataSinks.thrift M common/thrift/JniCatalog.thrift M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java A fe/src/main/java/org/apache/impala/analysis/AlterTableSortByStmt.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/Column.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/TableSink.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java M fe/src/main/jflex/sql-scanner.flex M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test A testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test M testdata/workloads/functional-planner/queries/PlannerTest/insert.test M testdata/workloads/functional-query/queries/QueryTest/alter-table.test M testdata/workloads/functional-query/queries/QueryTest/create-table-as-select.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test M testdata/workloads/functional-query/queries/QueryTest/create-table.test M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test 39 files changed, 1,531 insertions(+), 113 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/6495/30 -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 30 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Mostafa Mokhtar has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 29: Code-Review+1 (1 comment) Thanks for adding the test cases. http://gerrit.cloudera.org:8080/#/c/6495/29/testdata/workloads/functional-query/queries/QueryTest/alter-table.test File testdata/workloads/functional-query/queries/QueryTest/alter-table.test: Line 1130: alter table insert_sorted replace columns (i int, e double, f boolean); For more coverage how about. (int bigint, e decimal(12,2), f boolean) -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 29 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Lars Volker has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 29: PS29 adds a negative test for alter table drop column. -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 29 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Hello Marcel Kornacker, Dimitris Tsirogiannis, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6495 to look at the new patch set (#29). Change subject: IMPALA-4166: Add SORT BY sql clause .. IMPALA-4166: Add SORT BY sql clause This change adds support for adding SORT BY (...) clauses to CREATE TABLE and ALTER TABLE statements. Examples are: CREATE TABLE t (i INT, j INT, k INT) PARTITIONED BY (l INT) SORT BY (i, j); CREATE TABLE t SORT BY (int_col,id) LIKE u; CREATE TABLE t LIKE PARQUET '/foo' SORT BY (id,zip); ALTER TABLE t SORT BY (int_col,id); ALTER TABLE t SORT BY (); Sort columns can only be specified for Hdfs tables and effectiveness may vary based on storage type; for example TEXT tables will not see improved compression. The SORT BY clause must not contain clustering columns. The columns in the SORT BY clause are stored in the 'sort.columns' table property and will result in an additional SORT node being added to the plan before the final table sink. Specifying sort columns also enables clustering during inserts, so the SORT node will contain all partitioning columns first, followed by the sort columns. We do this because sort columns add a SORT node to the plan and adding the clustering columns to the SORT node is cheap. Sort columns supersede the sortby() hint, which we will remove in a subsequent change (IMPALA-5144). Until then, it is possible to specify sort columns using both ways at the same time and the column lists will be concatenated. Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 --- M be/src/exec/hdfs-parquet-table-writer.cc M be/src/exec/hdfs-table-sink.cc M be/src/exec/hdfs-table-sink.h M common/thrift/DataSinks.thrift M common/thrift/JniCatalog.thrift M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java A fe/src/main/java/org/apache/impala/analysis/AlterTableSortByStmt.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/Column.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/TableSink.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java M fe/src/main/jflex/sql-scanner.flex M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test A testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test M testdata/workloads/functional-planner/queries/PlannerTest/insert.test M testdata/workloads/functional-query/queries/QueryTest/alter-table.test M testdata/workloads/functional-query/queries/QueryTest/create-table-as-select.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test M testdata/workloads/functional-query/queries/QueryTest/create-table.test M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test 39 files changed, 1,531 insertions(+), 113 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/6495/29 -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 29 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Lars Volker has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 27: PS27 passed a jenkins run here: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/570 -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 27 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Lars Volker has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 27: (1 comment) I added more tests in PS28. http://gerrit.cloudera.org:8080/#/c/6495/27/testdata/workloads/functional-query/queries/QueryTest/alter-table.test File testdata/workloads/functional-query/queries/QueryTest/alter-table.test: Line 1058: QUERY > Can you please add tests cases that cover inserts and queries after alter t Done -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 27 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Hello Marcel Kornacker, Dimitris Tsirogiannis, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6495 to look at the new patch set (#28). Change subject: IMPALA-4166: Add SORT BY sql clause .. IMPALA-4166: Add SORT BY sql clause This change adds support for adding SORT BY (...) clauses to CREATE TABLE and ALTER TABLE statements. Examples are: CREATE TABLE t (i INT, j INT, k INT) PARTITIONED BY (l INT) SORT BY (i, j); CREATE TABLE t SORT BY (int_col,id) LIKE u; CREATE TABLE t LIKE PARQUET '/foo' SORT BY (id,zip); ALTER TABLE t SORT BY (int_col,id); ALTER TABLE t SORT BY (); Sort columns can only be specified for Hdfs tables and effectiveness may vary based on storage type; for example TEXT tables will not see improved compression. The SORT BY clause must not contain clustering columns. The columns in the SORT BY clause are stored in the 'sort.columns' table property and will result in an additional SORT node being added to the plan before the final table sink. Specifying sort columns also enables clustering during inserts, so the SORT node will contain all partitioning columns first, followed by the sort columns. We do this because sort columns add a SORT node to the plan and adding the clustering columns to the SORT node is cheap. Sort columns supersede the sortby() hint, which we will remove in a subsequent change (IMPALA-5144). Until then, it is possible to specify sort columns using both ways at the same time and the column lists will be concatenated. Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 --- M be/src/exec/hdfs-parquet-table-writer.cc M be/src/exec/hdfs-table-sink.cc M be/src/exec/hdfs-table-sink.h M common/thrift/DataSinks.thrift M common/thrift/JniCatalog.thrift M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java A fe/src/main/java/org/apache/impala/analysis/AlterTableSortByStmt.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/Column.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/TableSink.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java M fe/src/main/jflex/sql-scanner.flex M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test A testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test M testdata/workloads/functional-planner/queries/PlannerTest/insert.test M testdata/workloads/functional-query/queries/QueryTest/alter-table.test M testdata/workloads/functional-query/queries/QueryTest/create-table-as-select.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test M testdata/workloads/functional-query/queries/QueryTest/create-table.test M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test 39 files changed, 1,517 insertions(+), 113 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/6495/28 -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 28 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Mostafa Mokhtar has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 27: (1 comment) http://gerrit.cloudera.org:8080/#/c/6495/27/testdata/workloads/functional-query/queries/QueryTest/alter-table.test File testdata/workloads/functional-query/queries/QueryTest/alter-table.test: Line 1058: QUERY Can you please add tests cases that cover inserts and queries after alter table? And if possibly verify the Order by operator has the correct columns after the table is modified. -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 27 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Mostafa Mokhtar Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Lars Volker has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 27: Code-Review+2 Rebased, carrying Marcel's +2. -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 27 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Hello Marcel Kornacker, Dimitris Tsirogiannis, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6495 to look at the new patch set (#26). Change subject: IMPALA-4166: Add SORT BY sql clause .. IMPALA-4166: Add SORT BY sql clause This change adds support for adding SORT BY (...) clauses to CREATE TABLE and ALTER TABLE statements. Examples are: CREATE TABLE t (i INT, j INT, k INT) PARTITIONED BY (l INT) SORT BY (i, j); CREATE TABLE t SORT BY (int_col,id) LIKE u; CREATE TABLE t LIKE PARQUET '/foo' SORT BY (id,zip); ALTER TABLE t SORT BY (int_col,id); ALTER TABLE t SORT BY (); Sort columns can only be specified for Hdfs tables and effectiveness may vary based on storage type; for example TEXT tables will not see improved compression. The SORT BY clause must not contain clustering columns. The columns in the SORT BY clause are stored in the 'sort.columns' table property and will result in an additional SORT node being added to the plan before the final table sink. Specifying sort columns also enables clustering during inserts, so the SORT node will contain all partitioning columns first, followed by the sort columns. We do this because sort columns add a SORT node to the plan and adding the clustering columns to the SORT node is cheap. Sort columns supersede the sortby() hint, which we will remove in a subsequent change (IMPALA-5144). Until then, it is possible to specify sort columns using both ways at the same time and the column lists will be concatenated. Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 --- M be/src/exec/hdfs-parquet-table-writer.cc M be/src/exec/hdfs-table-sink.cc M be/src/exec/hdfs-table-sink.h M common/thrift/DataSinks.thrift M common/thrift/JniCatalog.thrift M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java A fe/src/main/java/org/apache/impala/analysis/AlterTableSortByStmt.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/Column.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/TableSink.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java M fe/src/main/jflex/sql-scanner.flex M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test A testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test M testdata/workloads/functional-planner/queries/PlannerTest/insert.test M testdata/workloads/functional-query/queries/QueryTest/alter-table.test M testdata/workloads/functional-query/queries/QueryTest/create-table-as-select.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test M testdata/workloads/functional-query/queries/QueryTest/create-table.test M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test 39 files changed, 1,406 insertions(+), 113 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/6495/26 -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 26 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Lars Volker has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 25: (4 comments) Addressed the final comments, will rebase next. http://gerrit.cloudera.org:8080/#/c/6495/22/fe/src/main/java/org/apache/impala/analysis/TableDef.java File fe/src/main/java/org/apache/impala/analysis/TableDef.java: Line 295: > when you're output error messages, it's probably better to refer to the exa Done Line 352: "Please retry without caching: CREATE TABLE ... UNCACHED", > why not "SORT BY is not supported..."? Done http://gerrit.cloudera.org:8080/#/c/6495/25/fe/src/main/java/org/apache/impala/analysis/TableDef.java File fe/src/main/java/org/apache/impala/analysis/TableDef.java: Line 291: TableDef.analyzeSortColumns(sortCols, > do you need the 'tabledef.' prefix here? No, removed it. http://gerrit.cloudera.org:8080/#/c/6495/22/fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java File fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java: Line 75: String sortByKey = AlterTableSortByStmt.TBL_PROP_SORT_COLUMNS; > aren't you implying that it should live in hdfstable as well? No, I meant to say that the skip.header.line.count key constant lives where it semantically somewhat belongs to, as compared to AlterTableSetTblProperties or TableDef. The class that most exclusively deals with SORT BY is this one here. Apologies for the confusion. :) -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 25 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 25: Code-Review+2 (4 comments) http://gerrit.cloudera.org:8080/#/c/6495/22/fe/src/main/java/org/apache/impala/analysis/TableDef.java File fe/src/main/java/org/apache/impala/analysis/TableDef.java: Line 295: when you're output error messages, it's probably better to refer to the exact syntactic element. so "SORT BY column list" is clearer than "sort column list". please apply elsewhere, in case i missed something. Line 352: "Please retry without caching: CREATE TABLE ... UNCACHED", why not "SORT BY is not supported..."? http://gerrit.cloudera.org:8080/#/c/6495/25/fe/src/main/java/org/apache/impala/analysis/TableDef.java File fe/src/main/java/org/apache/impala/analysis/TableDef.java: Line 291: TableDef.analyzeSortColumns(sortCols, do you need the 'tabledef.' prefix here? http://gerrit.cloudera.org:8080/#/c/6495/22/fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java File fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java: Line 75: String sortByKey = AlterTableSortByStmt.TBL_PROP_SORT_COLUMNS; > I've discussed that with Alex and Dimitris and we no better place has emerg aren't you implying that it should live in hdfstable as well? -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 25 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Lars Volker has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 25: I also replaced "sortByExprs" with "sortExprs". Please have a look at PS25. -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 25 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Hello Dimitris Tsirogiannis, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6495 to look at the new patch set (#25). Change subject: IMPALA-4166: Add SORT BY sql clause .. IMPALA-4166: Add SORT BY sql clause This change adds support for adding SORT BY (...) clauses to CREATE TABLE and ALTER TABLE statements. Examples are: CREATE TABLE t (i INT, j INT, k INT) PARTITIONED BY (l INT) SORT BY (i, j); CREATE TABLE t SORT BY (int_col,id) LIKE u; CREATE TABLE t LIKE PARQUET '/foo' SORT BY (id,zip); ALTER TABLE t SORT BY (int_col,id); ALTER TABLE t SORT BY (); Sort columns can only be specified for Hdfs tables and effectiveness may vary based on storage type; for example TEXT tables will not see improved compression. The SORT BY clause must not contain clustering columns. The columns in the SORT BY clause are stored in the 'sort.columns' table property and will result in an additional SORT node being added to the plan before the final table sink. Specifying sort columns also enables clustering during inserts, so the SORT node will contain all partitioning columns first, followed by the sort columns. We do this because sort columns add a SORT node to the plan and adding the clustering columns to the SORT node is cheap. Sort columns supersede the sortby() hint, which we will remove in a subsequent change (IMPALA-5144). Until then, it is possible to specify sort columns using both ways at the same time and the column lists will be concatenated. Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 --- M be/src/exec/hdfs-parquet-table-writer.cc M be/src/exec/hdfs-table-sink.cc M be/src/exec/hdfs-table-sink.h M common/thrift/DataSinks.thrift M common/thrift/JniCatalog.thrift M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java A fe/src/main/java/org/apache/impala/analysis/AlterTableSortByStmt.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/Column.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/TableSink.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java M fe/src/main/jflex/sql-scanner.flex M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test A testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test M testdata/workloads/functional-planner/queries/PlannerTest/insert.test M testdata/workloads/functional-query/queries/QueryTest/alter-table.test M testdata/workloads/functional-query/queries/QueryTest/create-table-as-select.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test M testdata/workloads/functional-query/queries/QueryTest/create-table.test M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test 39 files changed, 1,407 insertions(+), 113 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/6495/25 -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 25 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Lars Volker has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 23: (10 comments) Thanks for the review, please see PS24. http://gerrit.cloudera.org:8080/#/c/6495/22/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: Line 336: nonterminal ArrayList opt_ident_list, opt_sort_cols; > opt_sort_by_clause or opt_sort_cols? "sort by columns" sounds odd. Done http://gerrit.cloudera.org:8080/#/c/6495/22/fe/src/main/java/org/apache/impala/analysis/AlterTableSortByColumnsStmt.java File fe/src/main/java/org/apache/impala/analysis/AlterTableSortByColumnsStmt.java: Line 43 > drop "columns", the statement doesn't include a COLUMNS Done Line 73 > rephrase "sort by columns" as "sort columns" universally Done Line 91 > maybe move this into TableDef as well? it seems kind of arbitrary to put it Done. This has moved a couple of times now. :) http://gerrit.cloudera.org:8080/#/c/6495/22/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java: Line 863: // TODO: Remove this when removing the sortby() hint (IMPALA-5157). > is this going to happen soon? Yes, immediately after this change makes it in. We should remove the hint before releasing 2.9, since it hasn't been released yet. We could remove it in this change at the risk of the change becoming even larger than it already is. http://gerrit.cloudera.org:8080/#/c/6495/22/fe/src/main/java/org/apache/impala/analysis/TableDef.java File fe/src/main/java/org/apache/impala/analysis/TableDef.java: Line 46: > update Done Line 343: if (options_.location != null) { > why final? To express that it's just a shorter name for the constant. Removed it. http://gerrit.cloudera.org:8080/#/c/6495/22/fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java File fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java: Line 75: final String sortByKey = AlterTableSortByStmt.TBL_PROP_SORT_COLUMNS; > it feels like that constant should live somewhere else I've discussed that with Alex and Dimitris and we no better place has emerged, though I've tried some of them. It's related to the sort by clause, similar to how TBL_PROP_SKIP_HEADER_LINE_COUNT lives in HdfsTable.java. Do you have a preference where it should go? I also thought of TableDef (where the analysis happens) or AlterTableSetTblProperties (where we deal with properties). http://gerrit.cloudera.org:8080/#/c/6495/22/fe/src/main/java/org/apache/impala/catalog/Table.java File fe/src/main/java/org/apache/impala/catalog/Table.java: Line 456:* the caller must make sure that the value matches any columns that were added to the > or "that were added to the table" to avoid the gender reference :) Done http://gerrit.cloudera.org:8080/#/c/6495/22/fe/src/test/java/org/apache/impala/planner/PlannerTest.java File fe/src/test/java/org/apache/impala/planner/PlannerTest.java: Line 101: addTestTable("create table test_sort_by.t (id int, int_col int, " + > why call this alltypes? it clearly doesn't contain all types. Done -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 23 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Hello Dimitris Tsirogiannis, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6495 to look at the new patch set (#24). Change subject: IMPALA-4166: Add SORT BY sql clause .. IMPALA-4166: Add SORT BY sql clause This change adds support for adding SORT BY (...) clauses to CREATE TABLE and ALTER TABLE statements. Examples are: CREATE TABLE t (i INT, j INT, k INT) PARTITIONED BY (l INT) SORT BY (i, j); CREATE TABLE t SORT BY (int_col,id) LIKE u; CREATE TABLE t LIKE PARQUET '/foo' SORT BY (id,zip); ALTER TABLE t SORT BY (int_col,id); ALTER TABLE t SORT BY (); Sort columns can only be specified for Hdfs tables and effectiveness may vary based on storage type; for example TEXT tables will not see improved compression. The SORT BY clause must not contain clustering columns. The columns in the SORT BY clause are stored in the 'sort.columns' table property and will result in an additional SORT node being added to the plan before the final table sink. Specifying sort columns also enables clustering during inserts, so the SORT node will contain all partitioning columns first, followed by the sort columns. We do this because sort columns add a SORT node to the plan and adding the clustering columns to the SORT node is cheap. Sort columns supersede the sortby() hint, which we will remove in a subsequent change (IMPALA-5144). Until then, it is possible to specify sort columns using both ways at the same time and the column lists will be concatenated. Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 --- M be/src/exec/hdfs-parquet-table-writer.cc M be/src/exec/hdfs-table-sink.cc M be/src/exec/hdfs-table-sink.h M common/thrift/DataSinks.thrift M common/thrift/JniCatalog.thrift M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java A fe/src/main/java/org/apache/impala/analysis/AlterTableSortByStmt.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/Column.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/TableSink.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java M fe/src/main/jflex/sql-scanner.flex M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test A testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test M testdata/workloads/functional-planner/queries/PlannerTest/insert.test M testdata/workloads/functional-query/queries/QueryTest/alter-table.test M testdata/workloads/functional-query/queries/QueryTest/create-table-as-select.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test M testdata/workloads/functional-query/queries/QueryTest/create-table.test M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test 39 files changed, 1,401 insertions(+), 106 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/6495/24 -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 24 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Hello Dimitris Tsirogiannis, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6495 to look at the new patch set (#23). Change subject: IMPALA-4166: Add SORT BY sql clause .. IMPALA-4166: Add SORT BY sql clause This change adds support for adding SORT BY (...) clauses to CREATE TABLE and ALTER TABLE statements. Examples are: CREATE TABLE t (i INT, j INT, k INT) PARTITIONED BY (l INT) SORT BY (i, j); CREATE TABLE t SORT BY (int_col,id) LIKE u; CREATE TABLE t LIKE PARQUET '/foo' SORT BY (id,zip); ALTER TABLE t SORT BY (int_col,id); ALTER TABLE t SORT BY (); Sort columns can only be specified for Hdfs tables and effectiveness may vary based on storage type; for example TEXT tables will not see improved compression. The SORT BY clause must not contain clustering columns. The columns in the SORT BY clause are stored in the 'sort.columns' table property and will result in an additional SORT node being added to the plan before the final table sink. Specifying sort columns also enables clustering during inserts, so the SORT node will contain all partitioning columns first, followed by the sort columns. We do this because sort columns add a SORT node to the plan and adding the clustering columns to the SORT node is cheap. Sort columns supersede the sortby() hint, which we will remove in a subsequent change (IMPALA-5144). Until then, it is possible to specify sort columns using both ways at the same time and the column lists will be concatenated. Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 --- M be/src/exec/hdfs-parquet-table-writer.cc M be/src/exec/hdfs-table-sink.cc M be/src/exec/hdfs-table-sink.h M common/thrift/DataSinks.thrift M common/thrift/JniCatalog.thrift M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java A fe/src/main/java/org/apache/impala/analysis/AlterTableSortByStmt.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/Column.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/planner/TableSink.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java M fe/src/main/jflex/sql-scanner.flex M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test A testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test M testdata/workloads/functional-planner/queries/PlannerTest/insert.test M testdata/workloads/functional-query/queries/QueryTest/alter-table.test M testdata/workloads/functional-query/queries/QueryTest/create-table-as-select.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test M testdata/workloads/functional-query/queries/QueryTest/create-table.test M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test 39 files changed, 1,405 insertions(+), 107 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/6495/23 -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 23 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Lars Volker has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 22: (3 comments) Thanks for the review. I'm working on the comments, but in the meantime wanted to check whether we should uniformly rename all occurrences of "sort.?by.?columns" to sort_columns or sorting_columns. http://gerrit.cloudera.org:8080/#/c/6495/22/common/thrift/JniCatalog.thrift File common/thrift/JniCatalog.thrift: Line 382: // Optional list of sort by columns for the new table. If specified, these will override > "sort columns" or "sorting columns" (as in: partition columns) Done. There's a lot of other places in the code where that term is used, including the table property, which is called 'sort.by.columns'. Should I replace them all? Line 385: 9: optional list sort_by_columns > sort_columns Done Line 439: 16: optional list sort_by_columns > same Done -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 22 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 22: (13 comments) http://gerrit.cloudera.org:8080/#/c/6495/22/common/thrift/JniCatalog.thrift File common/thrift/JniCatalog.thrift: Line 382: // Optional list of sort by columns for the new table. If specified, these will override "sort columns" or "sorting columns" (as in: partition columns) Line 385: 9: optional list sort_by_columns sort_columns Line 439: 16: optional list sort_by_columns same http://gerrit.cloudera.org:8080/#/c/6495/22/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: Line 336: nonterminal ArrayList opt_ident_list, opt_sort_by_cols; opt_sort_by_clause or opt_sort_cols? "sort by columns" sounds odd. http://gerrit.cloudera.org:8080/#/c/6495/22/fe/src/main/java/org/apache/impala/analysis/AlterTableSortByColumnsStmt.java File fe/src/main/java/org/apache/impala/analysis/AlterTableSortByColumnsStmt.java: Line 43: public class AlterTableSortByColumnsStmt extends AlterTableStmt { drop "columns", the statement doesn't include a COLUMNS Line 73: // Disallow setting sort by columns on HBase and Kudu tables. rephrase "sort by columns" as "sort columns" universally Line 91: public static void analyzeSortByColumns(List sortByCols, Table table) maybe move this into TableDef as well? it seems kind of arbitrary to put it here. http://gerrit.cloudera.org:8080/#/c/6495/22/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java: Line 863: // TODO: Remove this when removing the sortby() hint (IMPALA-5157). is this going to happen soon? http://gerrit.cloudera.org:8080/#/c/6495/22/fe/src/main/java/org/apache/impala/analysis/TableDef.java File fe/src/main/java/org/apache/impala/analysis/TableDef.java: Line 46: * correspond to the following clauses in a CREATE TABLE statement: update Line 343: final String sortByKey = AlterTableSortByColumnsStmt.TBL_PROP_SORT_BY_COLUMNS; why final? http://gerrit.cloudera.org:8080/#/c/6495/22/fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java File fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java: Line 75: final String sortByKey = AlterTableSortByColumnsStmt.TBL_PROP_SORT_BY_COLUMNS; it feels like that constant should live somewhere else http://gerrit.cloudera.org:8080/#/c/6495/22/fe/src/main/java/org/apache/impala/catalog/Table.java File fe/src/main/java/org/apache/impala/catalog/Table.java: Line 456:* the caller must make sure that the value matches any columns he/she added to the or "that were added to the table" to avoid the gender reference :) http://gerrit.cloudera.org:8080/#/c/6495/22/fe/src/test/java/org/apache/impala/planner/PlannerTest.java File fe/src/test/java/org/apache/impala/planner/PlannerTest.java: Line 101: addTestTable("create table test_sort_by.alltypes (id int, int_col int, " + why call this alltypes? it clearly doesn't contain all types. -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 22 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Lars Volker has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 22: Done rebasing, please have a look at PS22. -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 22 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Hello Dimitris Tsirogiannis, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6495 to look at the new patch set (#22). Change subject: IMPALA-4166: Add SORT BY sql clause .. IMPALA-4166: Add SORT BY sql clause This change adds support for adding SORT BY (...) clauses to CREATE TABLE and ALTER TABLE statements. Examples are: CREATE TABLE t (i INT, j INT, k INT) PARTITIONED BY (l INT) SORT BY (i, j); CREATE TABLE t SORT BY (int_col,id) LIKE u; CREATE TABLE t LIKE PARQUET '/foo' SORT BY (id,zip); ALTER TABLE t SORT BY (int_col,id); ALTER TABLE t SORT BY (); SORT BY columns can only be specified for Hdfs tables and effectiveness may vary based on storage type; for example TEXT tables will not see improved compression. The SORT BY clause must not contain clustering columns. The columns in the SORT BY clause are stored in the 'sort.by.columns' table property and will result in an additional SORT node being added to the plan before the final table sink. Specifying SORT BY columns also enables clustering during inserts, so the SORT node will contain all partitioning columns first, followed by the SORT BY columns. We do this because SORT BY columns add a SORT node to the plan and adding the clustering columns to the SORT node is cheap. SORT BY columns supersede the sortby() hint, which we will remove in a subsequent change (IMPALA-5144). Until then, it is possible to specify sort by columns using both ways at the same time and the column lists will be concatenated. Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 --- M common/thrift/JniCatalog.thrift M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java A fe/src/main/java/org/apache/impala/analysis/AlterTableSortByColumnsStmt.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/Column.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java M fe/src/main/jflex/sql-scanner.flex M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test A testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test M testdata/workloads/functional-planner/queries/PlannerTest/insert.test M testdata/workloads/functional-query/queries/QueryTest/alter-table.test M testdata/workloads/functional-query/queries/QueryTest/create-table-as-select.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test M testdata/workloads/functional-query/queries/QueryTest/create-table.test M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test 33 files changed, 1,382 insertions(+), 89 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/6495/22 -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 22 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall
Re: [Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
On Tue, May 9, 2017 at 2:25 AM, Dimitris Tsirogiannis < dtsirogian...@cloudera.com> wrote: > If you don't populate it for CREATE TABLE SORT BY() then I think you > should remove it altogether. > We don't set it there. On second look, we don't support removing table properties and would need to add that capability to maintain consistency here. Should we do this? As an alternative, we can set the table property to an empty value for CREATE TABLE SORT BY()? I'm good with either of these. Even leaving it set to an empty value had it ever been set seems fine to me, since it's an internal implementation detail and won't affect functionality. > > Dimitris > > On Mon, May 8, 2017 at 4:33 PM, Lars Volker wrote: > >> >> >> On Tue, May 9, 2017 at 1:24 AM, Dimitris Tsirogiannis < >> dtsirogian...@cloudera.com> wrote: >> >>> The other alternative would be to populate it with the partitioning >>> columns, if any. Thoughts? >>> >>> >> Adding the partitioning columns to the sort by list is not supported. >> They can be added to the pre-insert sort not using the clustered hint. I'm >> not sure though if I understood your statement correctly. >> >> My question was what to do if a user executes "ALTER TABLE SORT BY();", >> meaning to remove all sort by columns. Should we remove all columns from >> the property, or remove the property altogether? >> >> >>> Dimitris >>> >>> On Mon, May 8, 2017 at 4:02 PM, Lars Volker wrote: >>> On Mon, May 8, 2017 at 11:41 PM, Marcel Kornacker wrote: > > > On Mon, May 8, 2017 at 9:50 AM, Dimitris Tsirogiannis < > dtsirogian...@cloudera.com> wrote: > >> I believe it sends the wrong message and is probably confusing to >> throw an error when someone writes a CREATE TABLE with an empty SORT BY() >> but allow the same clause in an ALTER. No doubt the users can read the >> documentation and figure it out but its an extra step. Also, as Marcel >> mentions, scripting may be easier as you don't have to figure out whether >> to add a clause or not. Anyways, the patch is great as is, I just wanted >> to >> mention this. >> > > Sounds like we should allow an empty list then. > I will change the parser accordingly. On a related note, ALTER TABLE SORT BY () will leave an empty property 'sort.by.columns'. Should it remove the property altogether instead? > > >> >> >> Dimitris >> >> On Mon, May 8, 2017 at 7:50 AM, Marcel Kornacker > > wrote: >> >>> >>> >>> On Sat, May 6, 2017 at 7:57 PM, Dimitris Tsirogiannis < >>> dtsirogian...@cloudera.com> wrote: >>> For consistency, I believe we should at least allow an empty SORT BY() clause in the CREATE TABLE statement, but I'll defer the decision to Alex or Marcel. >>> >>> Do we think there'll be scripts generating create table statements >>> with empty sort-by clauses? I'm not sure there's a benefit to supporting >>> that (but happy to stand corrected). >>> >>> Dimitris On Sat, May 6, 2017 at 8:16 AM, Lars Volker (Code Review) < ger...@cloudera.org> wrote: > Lars Volker has posted comments on this change. > > Change subject: IMPALA-4166: Add SORT BY sql clause > > .. > > > Patch Set 20: > > > There is still one thing that is not clear to me. Why is it > allowed > > to do an ALTER TABLE with an empty SORT BY clause but not a > CREATE > > TABLE? > > The easiest way to specify an empty list of sort by columns during > CREATE TABLE is to omit the SORT BY clause altogether. To keep things > simple I did not add additional support for an empty SORT BY () > clause. > > For ALTER TABLE there needs to be a way to specify an empty list > of columns, but since SORT BY is used to identify the command, the > most > simple form seemed to be an empty column list(). > > Do you think we should allow CREATE TABLE SORT BY() in addition to > specify an empty set of column? > > -- > To view, visit http://gerrit.cloudera.org:8080/6495 > To unsubscribe, visit http://gerrit.cloudera.org:8080/settings > > Gerrit-MessageType: comment > Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 > Gerrit-PatchSet: 20 > Gerrit-Project: Impala-ASF > Gerrit-Branch: master > Gerrit-Owner: Lars Volker > Gerrit-Reviewer: Alex Behm > Gerrit-Reviewer: Dimitris Tsirogiannis > > Gerrit-Reviewer: Lars Volker > Gerrit-Reviewer: Marcel Kornacker > Gerrit-Reviewer: Thomas Tauber-Marshall
Re: [Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
If you don't populate it for CREATE TABLE SORT BY() then I think you should remove it altogether. Dimitris On Mon, May 8, 2017 at 4:33 PM, Lars Volker wrote: > > > On Tue, May 9, 2017 at 1:24 AM, Dimitris Tsirogiannis < > dtsirogian...@cloudera.com> wrote: > >> The other alternative would be to populate it with the partitioning >> columns, if any. Thoughts? >> >> > Adding the partitioning columns to the sort by list is not supported. They > can be added to the pre-insert sort not using the clustered hint. I'm not > sure though if I understood your statement correctly. > > My question was what to do if a user executes "ALTER TABLE SORT BY();", > meaning to remove all sort by columns. Should we remove all columns from > the property, or remove the property altogether? > > >> Dimitris >> >> On Mon, May 8, 2017 at 4:02 PM, Lars Volker wrote: >> >>> >>> On Mon, May 8, 2017 at 11:41 PM, Marcel Kornacker >>> wrote: >>> On Mon, May 8, 2017 at 9:50 AM, Dimitris Tsirogiannis < dtsirogian...@cloudera.com> wrote: > I believe it sends the wrong message and is probably confusing to > throw an error when someone writes a CREATE TABLE with an empty SORT BY() > but allow the same clause in an ALTER. No doubt the users can read the > documentation and figure it out but its an extra step. Also, as Marcel > mentions, scripting may be easier as you don't have to figure out whether > to add a clause or not. Anyways, the patch is great as is, I just wanted > to > mention this. > Sounds like we should allow an empty list then. >>> >>> I will change the parser accordingly. On a related note, ALTER TABLE >>> SORT BY () will leave an empty property 'sort.by.columns'. Should it remove >>> the property altogether instead? >>> > > > Dimitris > > On Mon, May 8, 2017 at 7:50 AM, Marcel Kornacker > wrote: > >> >> >> On Sat, May 6, 2017 at 7:57 PM, Dimitris Tsirogiannis < >> dtsirogian...@cloudera.com> wrote: >> >>> For consistency, I believe we should at least allow an empty SORT >>> BY() clause in the CREATE TABLE statement, but I'll defer the decision >>> to >>> Alex or Marcel. >>> >> >> Do we think there'll be scripts generating create table statements >> with empty sort-by clauses? I'm not sure there's a benefit to supporting >> that (but happy to stand corrected). >> >> >>> >>> Dimitris >>> >>> On Sat, May 6, 2017 at 8:16 AM, Lars Volker (Code Review) < >>> ger...@cloudera.org> wrote: >>> Lars Volker has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 20: > There is still one thing that is not clear to me. Why is it allowed > to do an ALTER TABLE with an empty SORT BY clause but not a CREATE > TABLE? The easiest way to specify an empty list of sort by columns during CREATE TABLE is to omit the SORT BY clause altogether. To keep things simple I did not add additional support for an empty SORT BY () clause. For ALTER TABLE there needs to be a way to specify an empty list of columns, but since SORT BY is used to identify the command, the most simple form seemed to be an empty column list(). Do you think we should allow CREATE TABLE SORT BY() in addition to specify an empty set of column? -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 20 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No >>> >>> >> > >>> >> >
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Lars Volker has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 21: Rebased, added support for CREATE TABLE SORT BY() with empty sort by list. Will need to rebase again though, since some conflicting change got merge in the meantime. -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 21 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Hello Dimitris Tsirogiannis, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6495 to look at the new patch set (#21). Change subject: IMPALA-4166: Add SORT BY sql clause .. IMPALA-4166: Add SORT BY sql clause This change adds support for adding SORT BY (...) clauses to CREATE TABLE and ALTER TABLE statements. Examples are: CREATE TABLE t (i INT, j INT, k INT) PARTITIONED BY (l INT) SORT BY (i, j); CREATE TABLE t SORT BY (int_col,id) LIKE u; CREATE TABLE t LIKE PARQUET '/foo' SORT BY (id,zip); ALTER TABLE t SORT BY (int_col,id); ALTER TABLE t SORT BY (); SORT BY columns can only be specified for Hdfs tables and effectiveness may vary based on storage type; for example TEXT tables will not see improved compression. The SORT BY clause must not contain clustering columns. The columns in the SORT BY clause are stored in the 'sort.by.columns' table property and will result in an additional SORT node being added to the plan before the final table sink. Specifying SORT BY columns also enables clustering during inserts, so the SORT node will contain all partitioning columns first, followed by the SORT BY columns. We do this because SORT BY columns add a SORT node to the plan and adding the clustering columns to the SORT node is cheap. SORT BY columns supersede the sortby() hint, which we will remove in a subsequent change (IMPALA-5144). Until then, it is possible to specify sort by columns using both ways at the same time and the column lists will be concatenated. Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 --- M common/thrift/JniCatalog.thrift M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java A fe/src/main/java/org/apache/impala/analysis/AlterTableSortByColumnsStmt.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/Column.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java M fe/src/main/jflex/sql-scanner.flex M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test A testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test M testdata/workloads/functional-planner/queries/PlannerTest/insert.test M testdata/workloads/functional-query/queries/QueryTest/alter-table.test M testdata/workloads/functional-query/queries/QueryTest/create-table-as-select.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test M testdata/workloads/functional-query/queries/QueryTest/create-table.test M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test 33 files changed, 1,379 insertions(+), 89 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/6495/21 -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 21 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall
Re: [Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
On Tue, May 9, 2017 at 1:24 AM, Dimitris Tsirogiannis < dtsirogian...@cloudera.com> wrote: > The other alternative would be to populate it with the partitioning > columns, if any. Thoughts? > > Adding the partitioning columns to the sort by list is not supported. They can be added to the pre-insert sort not using the clustered hint. I'm not sure though if I understood your statement correctly. My question was what to do if a user executes "ALTER TABLE SORT BY();", meaning to remove all sort by columns. Should we remove all columns from the property, or remove the property altogether? > Dimitris > > On Mon, May 8, 2017 at 4:02 PM, Lars Volker wrote: > >> >> On Mon, May 8, 2017 at 11:41 PM, Marcel Kornacker >> wrote: >> >>> >>> >>> On Mon, May 8, 2017 at 9:50 AM, Dimitris Tsirogiannis < >>> dtsirogian...@cloudera.com> wrote: >>> I believe it sends the wrong message and is probably confusing to throw an error when someone writes a CREATE TABLE with an empty SORT BY() but allow the same clause in an ALTER. No doubt the users can read the documentation and figure it out but its an extra step. Also, as Marcel mentions, scripting may be easier as you don't have to figure out whether to add a clause or not. Anyways, the patch is great as is, I just wanted to mention this. >>> >>> Sounds like we should allow an empty list then. >>> >> >> I will change the parser accordingly. On a related note, ALTER TABLE SORT >> BY () will leave an empty property 'sort.by.columns'. Should it remove the >> property altogether instead? >> >>> >>> Dimitris On Mon, May 8, 2017 at 7:50 AM, Marcel Kornacker wrote: > > > On Sat, May 6, 2017 at 7:57 PM, Dimitris Tsirogiannis < > dtsirogian...@cloudera.com> wrote: > >> For consistency, I believe we should at least allow an empty SORT >> BY() clause in the CREATE TABLE statement, but I'll defer the decision to >> Alex or Marcel. >> > > Do we think there'll be scripts generating create table statements > with empty sort-by clauses? I'm not sure there's a benefit to supporting > that (but happy to stand corrected). > > >> >> Dimitris >> >> On Sat, May 6, 2017 at 8:16 AM, Lars Volker (Code Review) < >> ger...@cloudera.org> wrote: >> >>> Lars Volker has posted comments on this change. >>> >>> Change subject: IMPALA-4166: Add SORT BY sql clause >>> >>> .. >>> >>> >>> Patch Set 20: >>> >>> > There is still one thing that is not clear to me. Why is it allowed >>> > to do an ALTER TABLE with an empty SORT BY clause but not a CREATE >>> > TABLE? >>> >>> The easiest way to specify an empty list of sort by columns during >>> CREATE TABLE is to omit the SORT BY clause altogether. To keep things >>> simple I did not add additional support for an empty SORT BY () clause. >>> >>> For ALTER TABLE there needs to be a way to specify an empty list of >>> columns, but since SORT BY is used to identify the command, the most >>> simple >>> form seemed to be an empty column list(). >>> >>> Do you think we should allow CREATE TABLE SORT BY() in addition to >>> specify an empty set of column? >>> >>> -- >>> To view, visit http://gerrit.cloudera.org:8080/6495 >>> To unsubscribe, visit http://gerrit.cloudera.org:8080/settings >>> >>> Gerrit-MessageType: comment >>> Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 >>> Gerrit-PatchSet: 20 >>> Gerrit-Project: Impala-ASF >>> Gerrit-Branch: master >>> Gerrit-Owner: Lars Volker >>> Gerrit-Reviewer: Alex Behm >>> Gerrit-Reviewer: Dimitris Tsirogiannis >>> Gerrit-Reviewer: Lars Volker >>> Gerrit-Reviewer: Marcel Kornacker >>> Gerrit-Reviewer: Thomas Tauber-Marshall >>> Gerrit-HasComments: No >>> >> >> > >>> >> >
Re: [Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
The other alternative would be to populate it with the partitioning columns, if any. Thoughts? Dimitris On Mon, May 8, 2017 at 4:02 PM, Lars Volker wrote: > > On Mon, May 8, 2017 at 11:41 PM, Marcel Kornacker > wrote: > >> >> >> On Mon, May 8, 2017 at 9:50 AM, Dimitris Tsirogiannis < >> dtsirogian...@cloudera.com> wrote: >> >>> I believe it sends the wrong message and is probably confusing to throw >>> an error when someone writes a CREATE TABLE with an empty SORT BY() but >>> allow the same clause in an ALTER. No doubt the users can read the >>> documentation and figure it out but its an extra step. Also, as Marcel >>> mentions, scripting may be easier as you don't have to figure out whether >>> to add a clause or not. Anyways, the patch is great as is, I just wanted to >>> mention this. >>> >> >> Sounds like we should allow an empty list then. >> > > I will change the parser accordingly. On a related note, ALTER TABLE SORT > BY () will leave an empty property 'sort.by.columns'. Should it remove the > property altogether instead? > >> >> >>> >>> >>> Dimitris >>> >>> On Mon, May 8, 2017 at 7:50 AM, Marcel Kornacker >>> wrote: >>> On Sat, May 6, 2017 at 7:57 PM, Dimitris Tsirogiannis < dtsirogian...@cloudera.com> wrote: > For consistency, I believe we should at least allow an empty SORT BY() > clause in the CREATE TABLE statement, but I'll defer the decision to Alex > or Marcel. > Do we think there'll be scripts generating create table statements with empty sort-by clauses? I'm not sure there's a benefit to supporting that (but happy to stand corrected). > > Dimitris > > On Sat, May 6, 2017 at 8:16 AM, Lars Volker (Code Review) < > ger...@cloudera.org> wrote: > >> Lars Volker has posted comments on this change. >> >> Change subject: IMPALA-4166: Add SORT BY sql clause >> >> .. >> >> >> Patch Set 20: >> >> > There is still one thing that is not clear to me. Why is it allowed >> > to do an ALTER TABLE with an empty SORT BY clause but not a CREATE >> > TABLE? >> >> The easiest way to specify an empty list of sort by columns during >> CREATE TABLE is to omit the SORT BY clause altogether. To keep things >> simple I did not add additional support for an empty SORT BY () clause. >> >> For ALTER TABLE there needs to be a way to specify an empty list of >> columns, but since SORT BY is used to identify the command, the most >> simple >> form seemed to be an empty column list(). >> >> Do you think we should allow CREATE TABLE SORT BY() in addition to >> specify an empty set of column? >> >> -- >> To view, visit http://gerrit.cloudera.org:8080/6495 >> To unsubscribe, visit http://gerrit.cloudera.org:8080/settings >> >> Gerrit-MessageType: comment >> Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 >> Gerrit-PatchSet: 20 >> Gerrit-Project: Impala-ASF >> Gerrit-Branch: master >> Gerrit-Owner: Lars Volker >> Gerrit-Reviewer: Alex Behm >> Gerrit-Reviewer: Dimitris Tsirogiannis >> Gerrit-Reviewer: Lars Volker >> Gerrit-Reviewer: Marcel Kornacker >> Gerrit-Reviewer: Thomas Tauber-Marshall >> Gerrit-HasComments: No >> > > >>> >> >
Re: [Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
On Mon, May 8, 2017 at 11:41 PM, Marcel Kornacker wrote: > > > On Mon, May 8, 2017 at 9:50 AM, Dimitris Tsirogiannis < > dtsirogian...@cloudera.com> wrote: > >> I believe it sends the wrong message and is probably confusing to throw >> an error when someone writes a CREATE TABLE with an empty SORT BY() but >> allow the same clause in an ALTER. No doubt the users can read the >> documentation and figure it out but its an extra step. Also, as Marcel >> mentions, scripting may be easier as you don't have to figure out whether >> to add a clause or not. Anyways, the patch is great as is, I just wanted to >> mention this. >> > > Sounds like we should allow an empty list then. > I will change the parser accordingly. On a related note, ALTER TABLE SORT BY () will leave an empty property 'sort.by.columns'. Should it remove the property altogether instead? > > >> >> >> Dimitris >> >> On Mon, May 8, 2017 at 7:50 AM, Marcel Kornacker >> wrote: >> >>> >>> >>> On Sat, May 6, 2017 at 7:57 PM, Dimitris Tsirogiannis < >>> dtsirogian...@cloudera.com> wrote: >>> For consistency, I believe we should at least allow an empty SORT BY() clause in the CREATE TABLE statement, but I'll defer the decision to Alex or Marcel. >>> >>> Do we think there'll be scripts generating create table statements with >>> empty sort-by clauses? I'm not sure there's a benefit to supporting that >>> (but happy to stand corrected). >>> >>> Dimitris On Sat, May 6, 2017 at 8:16 AM, Lars Volker (Code Review) < ger...@cloudera.org> wrote: > Lars Volker has posted comments on this change. > > Change subject: IMPALA-4166: Add SORT BY sql clause > .. > > > Patch Set 20: > > > There is still one thing that is not clear to me. Why is it allowed > > to do an ALTER TABLE with an empty SORT BY clause but not a CREATE > > TABLE? > > The easiest way to specify an empty list of sort by columns during > CREATE TABLE is to omit the SORT BY clause altogether. To keep things > simple I did not add additional support for an empty SORT BY () clause. > > For ALTER TABLE there needs to be a way to specify an empty list of > columns, but since SORT BY is used to identify the command, the most > simple > form seemed to be an empty column list(). > > Do you think we should allow CREATE TABLE SORT BY() in addition to > specify an empty set of column? > > -- > To view, visit http://gerrit.cloudera.org:8080/6495 > To unsubscribe, visit http://gerrit.cloudera.org:8080/settings > > Gerrit-MessageType: comment > Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 > Gerrit-PatchSet: 20 > Gerrit-Project: Impala-ASF > Gerrit-Branch: master > Gerrit-Owner: Lars Volker > Gerrit-Reviewer: Alex Behm > Gerrit-Reviewer: Dimitris Tsirogiannis > Gerrit-Reviewer: Lars Volker > Gerrit-Reviewer: Marcel Kornacker > Gerrit-Reviewer: Thomas Tauber-Marshall > Gerrit-HasComments: No > >>> >> >
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 20: Code-Review+1 -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 20 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
Re: [Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
I believe it sends the wrong message and is probably confusing to throw an error when someone writes a CREATE TABLE with an empty SORT BY() but allow the same clause in an ALTER. No doubt the users can read the documentation and figure it out but its an extra step. Also, as Marcel mentions, scripting may be easier as you don't have to figure out whether to add a clause or not. Anyways, the patch is great as is, I just wanted to mention this. Dimitris On Mon, May 8, 2017 at 7:50 AM, Marcel Kornacker wrote: > > > On Sat, May 6, 2017 at 7:57 PM, Dimitris Tsirogiannis < > dtsirogian...@cloudera.com> wrote: > >> For consistency, I believe we should at least allow an empty SORT BY() >> clause in the CREATE TABLE statement, but I'll defer the decision to Alex >> or Marcel. >> > > Do we think there'll be scripts generating create table statements with > empty sort-by clauses? I'm not sure there's a benefit to supporting that > (but happy to stand corrected). > > >> >> Dimitris >> >> On Sat, May 6, 2017 at 8:16 AM, Lars Volker (Code Review) < >> ger...@cloudera.org> wrote: >> >>> Lars Volker has posted comments on this change. >>> >>> Change subject: IMPALA-4166: Add SORT BY sql clause >>> .. >>> >>> >>> Patch Set 20: >>> >>> > There is still one thing that is not clear to me. Why is it allowed >>> > to do an ALTER TABLE with an empty SORT BY clause but not a CREATE >>> > TABLE? >>> >>> The easiest way to specify an empty list of sort by columns during >>> CREATE TABLE is to omit the SORT BY clause altogether. To keep things >>> simple I did not add additional support for an empty SORT BY () clause. >>> >>> For ALTER TABLE there needs to be a way to specify an empty list of >>> columns, but since SORT BY is used to identify the command, the most simple >>> form seemed to be an empty column list(). >>> >>> Do you think we should allow CREATE TABLE SORT BY() in addition to >>> specify an empty set of column? >>> >>> -- >>> To view, visit http://gerrit.cloudera.org:8080/6495 >>> To unsubscribe, visit http://gerrit.cloudera.org:8080/settings >>> >>> Gerrit-MessageType: comment >>> Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 >>> Gerrit-PatchSet: 20 >>> Gerrit-Project: Impala-ASF >>> Gerrit-Branch: master >>> Gerrit-Owner: Lars Volker >>> Gerrit-Reviewer: Alex Behm >>> Gerrit-Reviewer: Dimitris Tsirogiannis >>> Gerrit-Reviewer: Lars Volker >>> Gerrit-Reviewer: Marcel Kornacker >>> Gerrit-Reviewer: Thomas Tauber-Marshall >>> Gerrit-HasComments: No >>> >> >> >
Re: [Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Anything else you would like me to change for a +1? Alex, Marcel, do you prefer to have a look before or after rebasing the change? Cheers, Lars On Sun, May 7, 2017 at 4:57 AM, Dimitris Tsirogiannis < dtsirogian...@cloudera.com> wrote: > For consistency, I believe we should at least allow an empty SORT BY() > clause in the CREATE TABLE statement, but I'll defer the decision to Alex > or Marcel. > > Dimitris > > On Sat, May 6, 2017 at 8:16 AM, Lars Volker (Code Review) < > ger...@cloudera.org> wrote: > >> Lars Volker has posted comments on this change. >> >> Change subject: IMPALA-4166: Add SORT BY sql clause >> .. >> >> >> Patch Set 20: >> >> > There is still one thing that is not clear to me. Why is it allowed >> > to do an ALTER TABLE with an empty SORT BY clause but not a CREATE >> > TABLE? >> >> The easiest way to specify an empty list of sort by columns during CREATE >> TABLE is to omit the SORT BY clause altogether. To keep things simple I did >> not add additional support for an empty SORT BY () clause. >> >> For ALTER TABLE there needs to be a way to specify an empty list of >> columns, but since SORT BY is used to identify the command, the most simple >> form seemed to be an empty column list(). >> >> Do you think we should allow CREATE TABLE SORT BY() in addition to >> specify an empty set of column? >> >> -- >> To view, visit http://gerrit.cloudera.org:8080/6495 >> To unsubscribe, visit http://gerrit.cloudera.org:8080/settings >> >> Gerrit-MessageType: comment >> Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 >> Gerrit-PatchSet: 20 >> Gerrit-Project: Impala-ASF >> Gerrit-Branch: master >> Gerrit-Owner: Lars Volker >> Gerrit-Reviewer: Alex Behm >> Gerrit-Reviewer: Dimitris Tsirogiannis >> Gerrit-Reviewer: Lars Volker >> Gerrit-Reviewer: Marcel Kornacker >> Gerrit-Reviewer: Thomas Tauber-Marshall >> Gerrit-HasComments: No >> > >
Re: [Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
For consistency, I believe we should at least allow an empty SORT BY() clause in the CREATE TABLE statement, but I'll defer the decision to Alex or Marcel. Dimitris On Sat, May 6, 2017 at 8:16 AM, Lars Volker (Code Review) < ger...@cloudera.org> wrote: > Lars Volker has posted comments on this change. > > Change subject: IMPALA-4166: Add SORT BY sql clause > .. > > > Patch Set 20: > > > There is still one thing that is not clear to me. Why is it allowed > > to do an ALTER TABLE with an empty SORT BY clause but not a CREATE > > TABLE? > > The easiest way to specify an empty list of sort by columns during CREATE > TABLE is to omit the SORT BY clause altogether. To keep things simple I did > not add additional support for an empty SORT BY () clause. > > For ALTER TABLE there needs to be a way to specify an empty list of > columns, but since SORT BY is used to identify the command, the most simple > form seemed to be an empty column list(). > > Do you think we should allow CREATE TABLE SORT BY() in addition to specify > an empty set of column? > > -- > To view, visit http://gerrit.cloudera.org:8080/6495 > To unsubscribe, visit http://gerrit.cloudera.org:8080/settings > > Gerrit-MessageType: comment > Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 > Gerrit-PatchSet: 20 > Gerrit-Project: Impala-ASF > Gerrit-Branch: master > Gerrit-Owner: Lars Volker > Gerrit-Reviewer: Alex Behm > Gerrit-Reviewer: Dimitris Tsirogiannis > Gerrit-Reviewer: Lars Volker > Gerrit-Reviewer: Marcel Kornacker > Gerrit-Reviewer: Thomas Tauber-Marshall > Gerrit-HasComments: No >
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Lars Volker has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 20: > There is still one thing that is not clear to me. Why is it allowed > to do an ALTER TABLE with an empty SORT BY clause but not a CREATE > TABLE? The easiest way to specify an empty list of sort by columns during CREATE TABLE is to omit the SORT BY clause altogether. To keep things simple I did not add additional support for an empty SORT BY () clause. For ALTER TABLE there needs to be a way to specify an empty list of columns, but since SORT BY is used to identify the command, the most simple form seemed to be an empty column list(). Do you think we should allow CREATE TABLE SORT BY() in addition to specify an empty set of column? -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 20 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 20: There is still one thing that is not clear to me. Why is it allowed to do an ALTER TABLE with an empty SORT BY clause but not a CREATE TABLE? -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 20 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Lars Volker has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 20: (1 comment) http://gerrit.cloudera.org:8080/#/c/6495/18/testdata/workloads/functional-planner/queries/PlannerTest/ddl.test File testdata/workloads/functional-planner/queries/PlannerTest/ddl.test: PS18, Line 183: # CTAS with partitions and sort by columns > It's a parser error in create but allowed in alter (see ParserTest.java:227 Oh, I understand :). Yes, dropping sort by columns should result in an empty list in the same way that alter table sort by() does - with the exception that the latter does not drop the columns from the table. I added 2 additional tests to alter-table.test. Can you have a look there? -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 20 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Hello Dimitris Tsirogiannis, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6495 to look at the new patch set (#20). Change subject: IMPALA-4166: Add SORT BY sql clause .. IMPALA-4166: Add SORT BY sql clause This change adds support for adding SORT BY (...) clauses to CREATE TABLE and ALTER TABLE statements. Examples are: CREATE TABLE t (i INT, j INT, k INT) PARTITIONED BY (l INT) SORT BY (i, j); CREATE TABLE t SORT BY (int_col,id) LIKE u; CREATE TABLE t LIKE PARQUET '/foo' SORT BY (id,zip); ALTER TABLE t SORT BY (int_col,id); ALTER TABLE t SORT BY (); SORT BY columns can only be specified for Hdfs tables and effectiveness may vary based on storage type; for example TEXT tables will not see improved compression. The SORT BY clause must not contain clustering columns. The columns in the SORT BY clause are stored in the 'sort.by.columns' table property and will result in an additional SORT node being added to the plan before the final table sink. Specifying SORT BY columns also enables clustering during inserts, so the SORT node will contain all partitioning columns first, followed by the SORT BY columns. We do this because SORT BY columns add a SORT node to the plan and adding the clustering columns to the SORT node is cheap. SORT BY columns supersede the sortby() hint, which we will remove in a subsequent change (IMPALA-5144). Until then, it is possible to specify sort by columns using both ways at the same time and the column lists will be concatenated. Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 --- M common/thrift/JniCatalog.thrift M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java A fe/src/main/java/org/apache/impala/analysis/AlterTableSortByColumnsStmt.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/Column.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java M fe/src/main/jflex/sql-scanner.flex M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test A testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test M testdata/workloads/functional-planner/queries/PlannerTest/insert.test M testdata/workloads/functional-query/queries/QueryTest/alter-table.test M testdata/workloads/functional-query/queries/QueryTest/create-table-as-select.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test M testdata/workloads/functional-query/queries/QueryTest/create-table.test M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test 33 files changed, 1,371 insertions(+), 89 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/6495/20 -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 20 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 18: (1 comment) http://gerrit.cloudera.org:8080/#/c/6495/18/testdata/workloads/functional-planner/queries/PlannerTest/ddl.test File testdata/workloads/functional-planner/queries/PlannerTest/ddl.test: PS18, Line 183: # CTAS with partitions and sort by columns > I'm not sure I'm following here. sort by() cannot have an empty column list It's a parser error in create but allowed in alter (see ParserTest.java:2271). So, there is some inconsistency here. -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 18 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Lars Volker has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 17: (3 comments) Thanks for the review. Please see PS19 and my inline comments. http://gerrit.cloudera.org:8080/#/c/6495/17/fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java File fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java: PS17, Line 183: // ALTER TABLE SET is not supported on HBase tables altogether. : Preconditions.checkState(!(table instanceof HBaseTable)); > Sorry, I was misled by the comment above it and by the fact that I didn't s For the SORT BY, there's a check in AlterTableSortByColumnsStmt::analyze(). For ALTER TABLE SET, there's a check in AlterTableSetStmt::analyze(). I added a reference to that in the comment. Please let me know if I should describe it in more detail in the comment. http://gerrit.cloudera.org:8080/#/c/6495/18/testdata/workloads/functional-planner/queries/PlannerTest/ddl.test File testdata/workloads/functional-planner/queries/PlannerTest/ddl.test: PS18, Line 183: # CTAS with partitions and sort by columns > If you don't have one already, can you add a test for a partitioned table w I'm not sure I'm following here. sort by() cannot have an empty column list. It's a parser error and tested in ParserTest.java:2365. Do you have a particular corner case in mind that you'd like to test here? That being said, I'm not sure your second suggestion would work. I added a test to the bottom of alter-table.test to check that changing the sort by columns works on a partitioned table. http://gerrit.cloudera.org:8080/#/c/6495/18/testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test File testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test: PS18, Line 327: > Thanks for adding this test. Can you also add another one in which theres i I added a test. The columns get concatenated as described in the commit message. -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 17 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Hello Dimitris Tsirogiannis, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6495 to look at the new patch set (#19). Change subject: IMPALA-4166: Add SORT BY sql clause .. IMPALA-4166: Add SORT BY sql clause This change adds support for adding SORT BY (...) clauses to CREATE TABLE and ALTER TABLE statements. Examples are: CREATE TABLE t (i INT, j INT, k INT) PARTITIONED BY (l INT) SORT BY (i, j); CREATE TABLE t SORT BY (int_col,id) LIKE u; CREATE TABLE t LIKE PARQUET '/foo' SORT BY (id,zip); ALTER TABLE t SORT BY (int_col,id); ALTER TABLE t SORT BY (); SORT BY columns can only be specified for Hdfs tables and effectiveness may vary based on storage type; for example TEXT tables will not see improved compression. The SORT BY clause must not contain clustering columns. The columns in the SORT BY clause are stored in the 'sort.by.columns' table property and will result in an additional SORT node being added to the plan before the final table sink. Specifying SORT BY columns also enables clustering during inserts, so the SORT node will contain all partitioning columns first, followed by the SORT BY columns. We do this because SORT BY columns add a SORT node to the plan and adding the clustering columns to the SORT node is cheap. SORT BY columns supersede the sortby() hint, which we will remove in a subsequent change (IMPALA-5144). Until then, it is possible to specify sort by columns using both ways at the same time and the column lists will be concatenated. Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 --- M common/thrift/JniCatalog.thrift M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java A fe/src/main/java/org/apache/impala/analysis/AlterTableSortByColumnsStmt.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/Column.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java M fe/src/main/jflex/sql-scanner.flex M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test A testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test M testdata/workloads/functional-planner/queries/PlannerTest/insert.test M testdata/workloads/functional-query/queries/QueryTest/alter-table.test M testdata/workloads/functional-query/queries/QueryTest/create-table-as-select.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test M testdata/workloads/functional-query/queries/QueryTest/create-table.test M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test 33 files changed, 1,353 insertions(+), 89 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/6495/19 -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 19 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 18: (2 comments) Thanks Lars. Some minor comments on testing. http://gerrit.cloudera.org:8080/#/c/6495/18/testdata/workloads/functional-planner/queries/PlannerTest/ddl.test File testdata/workloads/functional-planner/queries/PlannerTest/ddl.test: PS18, Line 183: # CTAS with partitions and sort by columns If you don't have one already, can you add a test for a partitioned table with an empty sort by clause (e.g. partitioned by (year) sort by ()). Also, you may want to check the following behavior: 1. create partitioned table with non-empty sort by clause 2. drop all the columns that appear in the sort by clause 3. check if this is equivalent to sort by (). http://gerrit.cloudera.org:8080/#/c/6495/18/testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test File testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test: PS18, Line 327: # Test that using both the sortby hint and sort by columns work. Thanks for adding this test. Can you also add another one in which theres is some overlap between the columns of the sort by hint and the columns of the sort by clause? -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 18 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 17: (1 comment) http://gerrit.cloudera.org:8080/#/c/6495/17/fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java File fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java: PS17, Line 183: // ALTER TABLE SET is not supported on HBase tables altogether. : Preconditions.checkState(!(table instanceof HBaseTable)); > I though your comment on PS15 asked me to add it, but I may have misunderst Sorry, I was misled by the comment above it and by the fact that I didn't see a check for HBaseTable in the analyze() function. If we want to be consistent with your comment, we should have a check in analyze() but I am wondering if there is a check somewhere else. -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 17 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Lars Volker has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 17: (7 comments) Thank you for your review. Please see PS 18. http://gerrit.cloudera.org:8080/#/c/6495/17//COMMIT_MSG Commit Message: PS17, Line 30: SORT BY columns supersede the sortby() hint, which we will remove in a : subsequent change (IMPALA-5144). > Mention what happens if both are used (e.g. union the columns or something Done http://gerrit.cloudera.org:8080/#/c/6495/17/fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java File fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java: PS17, Line 152: Returns a list of positions of :* the sort by columns within the table's list of columns > This function doesn't return anything. Done PS17, Line 183: // ALTER TABLE SET is not supported on HBase tables altogether. : Preconditions.checkState(!(table instanceof HBaseTable)); > This check doesn't belong here. Maybe move it to the analyze() function? I though your comment on PS15 asked me to add it, but I may have misunderstood. What did you mean? http://gerrit.cloudera.org:8080/#/c/6495/17/fe/src/main/java/org/apache/impala/analysis/AlterTableSortByColumnsStmt.java File fe/src/main/java/org/apache/impala/analysis/AlterTableSortByColumnsStmt.java: PS17, Line 61: HashMap > nit: Map Done PS17, Line 87: a > nit: an Done http://gerrit.cloudera.org:8080/#/c/6495/17/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java: PS17, Line 138: > Mention what happens if both are used (i.e. sort.by.column and sortby() hin Done http://gerrit.cloudera.org:8080/#/c/6495/17/testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test File testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test: PS17, Line 1: # > Add one or more tests where the sortby hint is used in conjunction with sor I added a test at the bottom of the file. -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 17 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Hello Dimitris Tsirogiannis, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6495 to look at the new patch set (#18). Change subject: IMPALA-4166: Add SORT BY sql clause .. IMPALA-4166: Add SORT BY sql clause This change adds support for adding SORT BY (...) clauses to CREATE TABLE and ALTER TABLE statements. Examples are: CREATE TABLE t (i INT, j INT, k INT) PARTITIONED BY (l INT) SORT BY (i, j); CREATE TABLE t SORT BY (int_col,id) LIKE u; CREATE TABLE t LIKE PARQUET '/foo' SORT BY (id,zip); ALTER TABLE t SORT BY (int_col,id); ALTER TABLE t SORT BY (); SORT BY columns can only be specified for Hdfs tables and effectiveness may vary based on storage type; for example TEXT tables will not see improved compression. The SORT BY clause must not contain clustering columns. The columns in the SORT BY clause are stored in the 'sort.by.columns' table property and will result in an additional SORT node being added to the plan before the final table sink. Specifying SORT BY columns also enables clustering during inserts, so the SORT node will contain all partitioning columns first, followed by the SORT BY columns. We do this because SORT BY columns add a SORT node to the plan and adding the clustering columns to the SORT node is cheap. SORT BY columns supersede the sortby() hint, which we will remove in a subsequent change (IMPALA-5144). Until then, it is possible to specify sort by columns using both ways at the same time and the column lists will be concatenated. Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 --- M common/thrift/JniCatalog.thrift M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java A fe/src/main/java/org/apache/impala/analysis/AlterTableSortByColumnsStmt.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/Column.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java M fe/src/main/jflex/sql-scanner.flex M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test A testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test M testdata/workloads/functional-planner/queries/PlannerTest/insert.test M testdata/workloads/functional-query/queries/QueryTest/alter-table.test M testdata/workloads/functional-query/queries/QueryTest/create-table-as-select.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test M testdata/workloads/functional-query/queries/QueryTest/create-table.test M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test 33 files changed, 1,314 insertions(+), 89 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/6495/18 -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 18 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 17: (7 comments) http://gerrit.cloudera.org:8080/#/c/6495/17//COMMIT_MSG Commit Message: PS17, Line 30: SORT BY columns supersede the sortby() hint, which we will remove in a : subsequent change (IMPALA-5144). Mention what happens if both are used (e.g. union the columns or something else) http://gerrit.cloudera.org:8080/#/c/6495/17/fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java File fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java: PS17, Line 152: Returns a list of positions of :* the sort by columns within the table's list of columns This function doesn't return anything. PS17, Line 183: // ALTER TABLE SET is not supported on HBase tables altogether. : Preconditions.checkState(!(table instanceof HBaseTable)); This check doesn't belong here. Maybe move it to the analyze() function? http://gerrit.cloudera.org:8080/#/c/6495/17/fe/src/main/java/org/apache/impala/analysis/AlterTableSortByColumnsStmt.java File fe/src/main/java/org/apache/impala/analysis/AlterTableSortByColumnsStmt.java: PS17, Line 61: HashMap nit: Map PS17, Line 87: a nit: an http://gerrit.cloudera.org:8080/#/c/6495/17/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java: PS17, Line 138: Mention what happens if both are used (i.e. sort.by.column and sortby() hint). http://gerrit.cloudera.org:8080/#/c/6495/17/testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test File testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test: PS17, Line 1: # Add one or more tests where the sortby hint is used in conjunction with sort by clause, if you don't already have one. -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 17 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 17: I think the interaction between the hint and clause confused me at some point. I'll take another look at the new patch and see if removing the hint will make things simpler. -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 17 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Lars Volker has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 17: > Quick question. Is there a reason why we don't remove the sortby > hint in this patch? I thought it would be easier to review in a subsequent change, since adding the table property and removing the hint seem mostly orthogonal and this change already has over 1k lines. Would you prefer to remove the hint in the same change? I don't feel strongly about it. -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 17 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 15: Quick question. Is there a reason why we don't remove the sortby hint in this patch? -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 15 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Lars Volker has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 17: (19 comments) Thank you for your review! Please see my inline comments and PS 17. http://gerrit.cloudera.org:8080/#/c/6495/15/fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java File fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java: PS15, Line 168: > typo: commas Done PS15, Line 169: ' > remove Done PS15, Line 170: f column names separated by commas, > nit: this function Done PS15, Line 172: during the an > You need to comment on the return value. Done PS15, Line 174: columns. > long line Done PS15, Line 178: if (!tblProperties.containsKey( > Whenever I see this, I keep wondering about HBase. Can you add a preconditi Done http://gerrit.cloudera.org:8080/#/c/6495/15/fe/src/main/java/org/apache/impala/analysis/AlterTableSortByColumnsStmt.java File fe/src/main/java/org/apache/impala/analysis/AlterTableSortByColumnsStmt.java: PS15, Line 44: public static final String TBL_PROP_SORT_BY_COLUMNS = "sort.by.columns"; > Maybe it makes more sense to put this in AlterTableSetTblProperiesStmt. Not After moving this back and forth, my reasoning was that this is the more natural way to change the underlying table property, and therefore I left it here. In a similar fashion, the property name for 'skip.header.line.count' is defined in HdfsTable.java. I also don't have a strong preference. Let's wait what the next reviewer says then. :) PS15, Line 93: TableDef.analyzeSortByColumns(sor > What else could it be? You can just do Preconditions.checkState(table insta Done http://gerrit.cloudera.org:8080/#/c/6495/15/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java: PS15, Line 768: if (!(table_ instanceof HdfsTable)) return; : > Why not if (!(table_ instanceof HdfsTable)) return; instead? Done PS15, Line 843: // We need to make a copy to make the sortByColumns_ list mutable. : // TODO: Remove this when removing the sortby() hint (IMPALA-5157). : sortByColumns_ = Lists.newArrayList(sortByColumns_); : } : for (int i = 0; i < columns.size(); ++i) { : if (columns.get(i).getName().equals(columnName)) { : sortByExprs_.add(resultExprs_.get(i)); : sortByColumns_.add(i); : foundColumn = true; : break; : } : } : i > Not sure I follow what is happening here. If you already have sortByColumns This code is required for the sortby() hint, and will be removed together with the hint. Should I try and explain this more in the comment above? http://gerrit.cloudera.org:8080/#/c/6495/15/fe/src/main/java/org/apache/impala/analysis/TableDef.java File fe/src/main/java/org/apache/impala/analysis/TableDef.java: PS15, Line 284: List I like using interfaces but is this ever called with anything other than a No. I think it was used with the result of a split() in an earlier patchset, but now is only used for Lists. PS15, Line 350: if (options_.sortByCols == null) r > if (options_.sortByCols == null) return; Done http://gerrit.cloudera.org:8080/#/c/6495/15/fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java File fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java: PS15, Line 77: S > nit: extra space Done PS15, Line 297: if (sortByColsSql != null) { : sb.append(String.format("SORT BY (\n %s\n)\n", : Joiner.on(", \n ").join(sortByColsSql))); : } > haha, I can't visualize the output here. I re-used the code from L288. Do you have a suggestion how it can be made more clear? http://gerrit.cloudera.org:8080/#/c/6495/15/fe/src/main/java/org/apache/impala/catalog/Table.java File fe/src/main/java/org/apache/impala/catalog/Table.java: PS15, Line 456: he/she adde > nit: added I think both ways are fine, adding the columns first, or calling this method first. Changed it to added though, since that seems to reflect the more common case better. http://gerrit.cloudera.org:8080/#/c/6495/15/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: PS15, Line 1881: String oldColumns = msTbl.getParameters().get(sortByKey); : String alteredColumns = MetaStoreUtil.intersectCsvLis > I find it hard to parse this. Can you split it into two statements? Also, e Done PS15, Line 1915: ring sortByKey = AlterTableSortByColumnsStmt.TBL_PROP_SORT_BY_COLUMNS; : if (msTbl.getParameters().containsKey(sortByKey)) { > same here Done PS15, Line 21
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Hello Dimitris Tsirogiannis, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6495 to look at the new patch set (#17). Change subject: IMPALA-4166: Add SORT BY sql clause .. IMPALA-4166: Add SORT BY sql clause This change adds support for adding SORT BY (...) clauses to CREATE TABLE and ALTER TABLE statements. Examples are: CREATE TABLE t (i INT, j INT, k INT) PARTITIONED BY (l INT) SORT BY (i, j); CREATE TABLE t SORT BY (int_col,id) LIKE u; CREATE TABLE t LIKE PARQUET '/foo' SORT BY (id,zip); ALTER TABLE t SORT BY (int_col,id); ALTER TABLE t SORT BY (); SORT BY columns can only be specified for Hdfs tables and effectiveness may vary based on storage type; for example TEXT tables will not see improved compression. The SORT BY clause must not contain clustering columns. The columns in the SORT BY clause are stored in the 'sort.by.columns' table property and will result in an additional SORT node being added to the plan before the final table sink. Specifying SORT BY columns also enables clustering during inserts, so the SORT node will contain all partitioning columns first, followed by the SORT BY columns. We do this because SORT BY columns add a SORT node to the plan and adding the clustering columns to the SORT node is cheap. SORT BY columns supersede the sortby() hint, which we will remove in a subsequent change (IMPALA-5144). Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 --- M common/thrift/JniCatalog.thrift M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java A fe/src/main/java/org/apache/impala/analysis/AlterTableSortByColumnsStmt.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/Column.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java M fe/src/main/jflex/sql-scanner.flex M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test A testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test M testdata/workloads/functional-planner/queries/PlannerTest/insert.test M testdata/workloads/functional-query/queries/QueryTest/alter-table.test M testdata/workloads/functional-query/queries/QueryTest/create-table-as-select.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test M testdata/workloads/functional-query/queries/QueryTest/create-table.test M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test 33 files changed, 1,287 insertions(+), 90 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/6495/17 -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 17 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Hello Dimitris Tsirogiannis, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6495 to look at the new patch set (#16). Change subject: IMPALA-4166: Add SORT BY sql clause .. IMPALA-4166: Add SORT BY sql clause This change adds support for adding SORT BY (...) clauses to CREATE TABLE and ALTER TABLE statements. Examples are: CREATE TABLE t (i INT, j INT, k INT) PARTITIONED BY (l INT) SORT BY (i, j); CREATE TABLE t SORT BY (int_col,id) LIKE u; CREATE TABLE t LIKE PARQUET '/foo' SORT BY (id,zip); ALTER TABLE t SORT BY (int_col,id); ALTER TABLE t SORT BY (); SORT BY columns can only be specified for Hdfs tables and effectiveness may vary based on storage type; for example TEXT tables will not see improved compression. The SORT BY clause must not contain clustering columns. The columns in the SORT BY clause are stored in the 'sort.by.columns' table property and will result in an additional SORT node being added to the plan before the final table sink. Specifying SORT BY columns also enables clustering during inserts, so the SORT node will contain all partitioning columns first, followed by the SORT BY columns. We do this because SORT BY columns add a SORT node to the plan and adding the clustering columns to the SORT node is cheap. SORT BY columns supersede the sortby() hint, which we will remove in a subsequent change (IMPALA-5144). Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 --- M common/thrift/JniCatalog.thrift M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java A fe/src/main/java/org/apache/impala/analysis/AlterTableSortByColumnsStmt.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/Column.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java M fe/src/main/jflex/sql-scanner.flex M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test A testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test M testdata/workloads/functional-planner/queries/PlannerTest/insert.test M testdata/workloads/functional-query/queries/QueryTest/alter-table.test M testdata/workloads/functional-query/queries/QueryTest/create-table-as-select.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test M testdata/workloads/functional-query/queries/QueryTest/create-table.test M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test 33 files changed, 1,287 insertions(+), 90 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/6495/16 -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 16 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 15: (19 comments) http://gerrit.cloudera.org:8080/#/c/6495/15/fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java File fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java: PS15, Line 168: commans typo: commas PS15, Line 169: of remove PS15, Line 170: this will throw an AnalysisException nit: this function PS15, Line 172: List You need to comment on the return value. PS15, Line 174: if (!tblProperties.containsKey(AlterTableSortByColumnsStmt.TBL_PROP_SORT_BY_COLUMNS)) { long line PS15, Line 178: if (table instanceof KuduTable) { Whenever I see this, I keep wondering about HBase. Can you add a precondition check or a similar check? whatever is applicable. http://gerrit.cloudera.org:8080/#/c/6495/15/fe/src/main/java/org/apache/impala/analysis/AlterTableSortByColumnsStmt.java File fe/src/main/java/org/apache/impala/analysis/AlterTableSortByColumnsStmt.java: PS15, Line 44: public static final String TBL_PROP_SORT_BY_COLUMNS = "sort.by.columns"; Maybe it makes more sense to put this in AlterTableSetTblProperiesStmt. Not a strong preference though. PS15, Line 93: if (table instanceof HdfsTable) { What else could it be? You can just do Preconditions.checkState(table instanceof HdfsTable); and then cast and remove the if statement. http://gerrit.cloudera.org:8080/#/c/6495/15/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java: PS15, Line 768: if (table_ instanceof HBaseTable) return; : if (table_ instanceof KuduTable) return; Why not if (!(table_ instanceof HdfsTable)) return; instead? PS15, Line 843: if (!columns.isEmpty()) { : // We need to make a copy to make the sortByColumns_ list mutable. : // TODO: Remove this when removing the sortby() hint (IMPALA-5157). : sortByColumns_ = Lists.newArrayList(sortByColumns_); : } : for (int i = 0; i < columns.size(); ++i) { : if (columns.get(i).getName().equals(columnName)) { : sortByExprs_.add(resultExprs_.get(i)); : sortByColumns_.add(i); : foundColumn = true; : break; : } : } Not sure I follow what is happening here. If you already have sortByColumns_ do you expand them based on the hint or something else? Do we have tests for that? http://gerrit.cloudera.org:8080/#/c/6495/15/fe/src/main/java/org/apache/impala/analysis/TableDef.java File fe/src/main/java/org/apache/impala/analysis/TableDef.java: PS15, Line 284: Iterable I like using interfaces but is this ever called with anything other than a List? PS15, Line 350: if (options_.sortByCols != null) { if (options_.sortByCols == null) return; http://gerrit.cloudera.org:8080/#/c/6495/15/fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java File fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java: PS15, Line 77: nit: extra space PS15, Line 297: if (sortByColsSql != null) { : sb.append(String.format("SORT BY (\n %s\n)\n", : Joiner.on(", \n ").join(sortByColsSql))); : } haha, I can't visualize the output here. http://gerrit.cloudera.org:8080/#/c/6495/15/fe/src/main/java/org/apache/impala/catalog/Table.java File fe/src/main/java/org/apache/impala/catalog/Table.java: PS15, Line 456: he/she adds nit: added http://gerrit.cloudera.org:8080/#/c/6495/15/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: PS15, Line 1881: msTbl.getParameters().put(sortByKey, MetaStoreUtil.intersectCsvListWithColumNames( : msTbl.getParameters().get( sortByKey), columns)); I find it hard to parse this. Can you split it into two statements? Also, extra space after 'get('. PS15, Line 1915: msTbl.getParameters().put(sortByKey, MetaStoreUtil.replaceValueInCsvList( : msTbl.getParameters().get( sortByKey), colName, newCol.getColumnName())); same here PS15, Line 2198: msTbl.getParameters().put(sortByKey, MetaStoreUtil.removeValueFromCsvList( : msTbl.getParameters().get( sortByKey), colName)); and here http://gerrit.cloudera.org:8080/#/c/6495/15/fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java File fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java: PS15, Line 218: HashSet rightColNames = Sets.newHashSet(); : for (TColumn c : rightCols) rightColNames.add(c.getColumnName().toLowerCase()); : List outputList = Lists.newAr
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 15: Doing another pass now... -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 15 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Lars Volker has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 14: Please have a look at PS15. It removes support for Kudu tables, and moves the analysis code from the TablePropertyAnalyzer to where it is used and/or where it semantically seems to belong better. -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 14 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Hello Dimitris Tsirogiannis, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6495 to look at the new patch set (#15). Change subject: IMPALA-4166: Add SORT BY sql clause .. IMPALA-4166: Add SORT BY sql clause This change adds support for adding SORT BY (...) clauses to CREATE TABLE and ALTER TABLE statements. Examples are: CREATE TABLE t (i INT, j INT, k INT) PARTITIONED BY (l INT) SORT BY (i, j); CREATE TABLE t SORT BY (int_col,id) LIKE u; CREATE TABLE t LIKE PARQUET '/foo' SORT BY (id,zip); ALTER TABLE t SORT BY (int_col,id); ALTER TABLE t SORT BY (); SORT BY columns can only be specified for Hdfs tables and effectiveness may vary based on storage type; for example TEXT tables will not see improved compression. The SORT BY clause must not contain clustering columns. The columns in the SORT BY clause are stored in the 'sort.by.columns' table property and will result in an additional SORT node being added to the plan before the final table sink. Specifying SORT BY columns also enables clustering during inserts, so the SORT node will contain all partitioning columns first, followed by the SORT BY columns. We do this because SORT BY columns add a SORT node to the plan and adding the clustering columns to the SORT node is cheap. SORT BY columns supersede the sortby() hint, which we will remove in a subsequent change (IMPALA-5144). Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 --- M common/thrift/JniCatalog.thrift M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java A fe/src/main/java/org/apache/impala/analysis/AlterTableSortByColumnsStmt.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/TableDef.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/Column.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java M fe/src/main/jflex/sql-scanner.flex M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test A testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test M testdata/workloads/functional-planner/queries/PlannerTest/insert.test M testdata/workloads/functional-query/queries/QueryTest/alter-table.test M testdata/workloads/functional-query/queries/QueryTest/create-table-as-select.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test M testdata/workloads/functional-query/queries/QueryTest/create-table.test M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test 33 files changed, 1,277 insertions(+), 89 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/6495/15 -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 15 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Lars Volker has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 9: (1 comment) Thank you for your review. Please see PS14 and the email thread I started on Kudu tables. http://gerrit.cloudera.org:8080/#/c/6495/9/fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java File fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java: Line 114: if (!foundColumn) { > why not apply the drop/rename to the sort-by list? As discussed in person, storing sorting columns as names seems to make it more robust against changes that happen outside of our control, e.g. a user dropping a column in Hive. I added code to handle alter table statements properly and in a single statement. We need to decide whether we want to support Kudu tables at all, I'll start an email thread for that. -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Hello Dimitris Tsirogiannis, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6495 to look at the new patch set (#14). Change subject: IMPALA-4166: Add SORT BY sql clause .. IMPALA-4166: Add SORT BY sql clause This change adds support for adding SORT BY (...) clauses to CREATE TABLE and ALTER TABLE statements. Examples are: CREATE TABLE t (i INT, j INT, k INT) PARTITIONED BY (l INT) SORT BY (i, j); CREATE TABLE t (i INT, x INT PRIMARY KEY) PARTITION BY HASH(x) PARTITIONS 8 SORT BY(i) STORED AS KUDU; CREATE TABLE t SORT BY (int_col,id) LIKE u; CREATE TABLE t LIKE PARQUET '/foo' SORT BY (id,zip); ALTER TABLE t SORT BY (int_col,id); ALTER TABLE t SORT BY (); SORT BY columns can be specified for all table types, but effectiveness may vary based on table type, for example TEXT tables will not see improved compression. The SORT BY clause must not contain clustering columns for HDFS tables, and must not contain primary key columns for Kudu tables. The columns in the SORT BY clause are stored in the 'sort.by.columns' table property and will result in an additional SORT node being added to the plan before the final table sink. Specifying SORT BY columns also enables clustering during inserts, so the SORT node will contain all partitioning columns first, followed by the SORT BY columns. We do this because SORT BY columns add a SORT node to the plan and adding the clustering columns to the SORT node is cheap. SORT BY columns supersede the sortby() hint, which we will remove in a subsequent change (IMPALA-5144). This change introduces a new TablePropertyAnalyzer class to collect various methods used to analyze table properties. Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 --- M common/thrift/JniCatalog.thrift M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java A fe/src/main/java/org/apache/impala/analysis/AlterTableSortByColumnsStmt.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/TableDef.java A fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/Column.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java M fe/src/main/jflex/sql-scanner.flex M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test A testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test M testdata/workloads/functional-planner/queries/PlannerTest/insert.test M testdata/workloads/functional-query/queries/QueryTest/alter-table.test M testdata/workloads/functional-query/queries/QueryTest/create-table-as-select.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test M testdata/workloads/functional-query/queries/QueryTest/create-table.test M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test 34 files changed, 1,332 insertions(+), 85 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/6495/14 -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 14 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 9: (1 comment) http://gerrit.cloudera.org:8080/#/c/6495/9/fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java File fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java: Line 114: if (!foundColumn) { > Yes, currently that's the case. I think we discussed this in person once, t why not apply the drop/rename to the sort-by list? (why is rename an issue? do you not store the sorting columns as integer indices into the table column list?) -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Lars Volker has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 12: (5 comments) Thank you Alex for the review. Please see my inline comments and PS13. http://gerrit.cloudera.org:8080/#/c/6495/9/fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java File fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java: Line 85:*/ Each of the public methods is called at least twice from elsewhere so inlining them seems wasteful. For cases where we don't have a table yet (e.g. create table statements), I can't think of an interface that's much better than this. Should we build lookup tables for the column lists first to address your performance considerations? We could also try build an interface and dummy table for cases where we don't have a table yet. Should we discuss this in a short meeting? Line 114: } > Does this mean we need 2 ALTER statements to drop/rename a column? Yes, currently that's the case. I think we discussed this in person once, too, but I don't remember what we decided to do. Do you have a preference? Should the drop/rename statement fail if the columns is still referenced in the sort by list? Or at least issue a warning? Line 115: if (!foundColumn) { Yes, in AnalyzeDDLStmts.java. http://gerrit.cloudera.org:8080/#/c/6495/9/fe/src/main/java/org/apache/impala/catalog/Table.java File fe/src/main/java/org/apache/impala/catalog/Table.java: Line 458:*/ > Preconditions.checkStats(RuntimeEnv.isTestEnv()); Cool, done. http://gerrit.cloudera.org:8080/#/c/6495/9/testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test File testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test: Line 170: # IMPALA-4166: sort by columns are correct when using a partial column permutation. > What if neither bool_col or int_col are mentioned in the column permutation No, constant exprs are removed from the sortby list in the Planner, and for an empty list we don't create a sort node at all. I added a test to make sure this works as intended. -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 12 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Hello Dimitris Tsirogiannis, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6495 to look at the new patch set (#13). Change subject: IMPALA-4166: Add SORT BY sql clause .. IMPALA-4166: Add SORT BY sql clause This change adds support for adding SORT BY (...) clauses to CREATE TABLE and ALTER TABLE statements. Examples are: CREATE TABLE t (i INT, j INT, k INT) PARTITIONED BY (l INT) SORT BY (i, j); CREATE TABLE t (i INT, x INT PRIMARY KEY) PARTITION BY HASH(x) PARTITIONS 8 SORT BY(i) STORED AS KUDU; CREATE TABLE t SORT BY (int_col,id) LIKE u; CREATE TABLE t LIKE PARQUET '/foo' SORT BY (id,zip); ALTER TABLE t SORT BY (int_col,id); ALTER TABLE t SORT BY (); SORT BY columns can be specified for all table types, but effectiveness may vary based on table type, for example TEXT tables will not see improved compression. The SORT BY clause must not contain clustering columns for HDFS tables, and must not contain primary key columns for Kudu tables. The columns in the SORT BY clause are stored in the 'sort.by.columns' table property and will result in an additional SORT node being added to the plan before the final table sink. Specifying SORT BY columns also enables clustering during inserts, so the SORT node will contain all partitioning columns first, followed by the SORT BY columns. We do this because SORT BY columns add a SORT node to the plan and adding the clustering columns to the SORT node is cheap. SORT BY columns supersede the sortby() hint, which we will remove in a subsequent change (IMPALA-5144). This change introduces a new TablePropertyAnalyzer class to collect various methods used to analyze table properties. Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 --- M common/thrift/JniCatalog.thrift M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java A fe/src/main/java/org/apache/impala/analysis/AlterTableSortByColumnsStmt.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/TableDef.java A fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/Column.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/jflex/sql-scanner.flex M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test A testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test M testdata/workloads/functional-planner/queries/PlannerTest/insert.test M testdata/workloads/functional-query/queries/QueryTest/alter-table.test M testdata/workloads/functional-query/queries/QueryTest/create-table-as-select.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test M testdata/workloads/functional-query/queries/QueryTest/create-table.test M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test 33 files changed, 1,234 insertions(+), 85 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/6495/13 -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 13 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Hello Dimitris Tsirogiannis, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6495 to look at the new patch set (#12). Change subject: IMPALA-4166: Add SORT BY sql clause .. IMPALA-4166: Add SORT BY sql clause This change adds support for adding SORT BY (...) clauses to CREATE TABLE and ALTER TABLE statements. Examples are: CREATE TABLE t (i INT, j INT, k INT) PARTITIONED BY (l INT) SORT BY (i, j); CREATE TABLE t (i INT, x INT PRIMARY KEY) PARTITION BY HASH(x) PARTITIONS 8 SORT BY(i) STORED AS KUDU; CREATE TABLE t SORT BY (int_col,id) LIKE u; CREATE TABLE t LIKE PARQUET '/foo' SORT BY (id,zip); ALTER TABLE t SORT BY (int_col,id); ALTER TABLE t SORT BY (); SORT BY columns can be specified for all table types, but effectiveness may vary based on table type, for example TEXT tables will not see improved compression. The SORT BY clause must not contain clustering columns for HDFS tables, and must not contain primary key columns for Kudu tables. The columns in the SORT BY clause are stored in the 'sort.by.columns' table property and will result in an additional SORT node being added to the plan before the final table sink. Specifying SORT BY columns also enables clustering during inserts, so the SORT node will contain all partitioning columns first, followed by the SORT BY columns. We do this because SORT BY columns add a SORT node to the plan and adding the clustering columns to the SORT node is cheap. SORT BY columns supersede the sortby() hint, which we will remove in a subsequent change (IMPALA-5144). This change introduces a new TablePropertyAnalyzer class to collect various methods used to analyze table properties. Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 --- M common/thrift/JniCatalog.thrift M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java A fe/src/main/java/org/apache/impala/analysis/AlterTableSortByColumnsStmt.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/TableDef.java A fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/Column.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/jflex/sql-scanner.flex M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test A testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test M testdata/workloads/functional-planner/queries/PlannerTest/insert.test M testdata/workloads/functional-query/queries/QueryTest/alter-table.test M testdata/workloads/functional-query/queries/QueryTest/create-table-as-select.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test M testdata/workloads/functional-query/queries/QueryTest/create-table.test M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test 33 files changed, 1,234 insertions(+), 85 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/6495/12 -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 12 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Alex Behm has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 9: (5 comments) http://gerrit.cloudera.org:8080/#/c/6495/9/fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java File fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java: Line 85: public static List analyzeSortByColumns(List sortByCols, Do we need all the public APIs? Probably not a hug deal, but I worry about the N^2 loop in here. Table and KuduTable provide efficient ways for checking whether a column is a partition column or primary key. Line 114: if (!foundColumn) { Does this mean we need 2 ALTER statements to drop/rename a column? Line 115: throw new AnalysisException(String.format("Could not find SORT BY column '%s' " + Do we have a test for this? http://gerrit.cloudera.org:8080/#/c/6495/9/fe/src/main/java/org/apache/impala/catalog/Table.java File fe/src/main/java/org/apache/impala/catalog/Table.java: Line 458: public void setNumClusteringCols(int n) { numClusteringCols_ = n; } Preconditions.checkStats(RuntimeEnv.isTestEnv()); http://gerrit.cloudera.org:8080/#/c/6495/9/testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test File testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test: Line 170: # IMPALA-4166: sort by columns are correct when using a partial column permutation. What if neither bool_col or int_col are mentioned in the column permutation? Will we still create a sort node? -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Alex Behm has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 9: (1 comment) http://gerrit.cloudera.org:8080/#/c/6495/9/fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java File fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java: Line 304: "TBLPROPERTIES ('sort.by.columns'='a')", true); > i think a 'show create table t' should print the sort-by clause, not a tabl This means that the SHOW CREATE TABLE will only be executable by Impala, and not by any other engine like Hive. If that's acceptable, then I don't mind this solution. -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Alex Behm Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Lars Volker has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 9: (1 comment) Thanks for the review. I reworked the toSql() logic so that the 'sort.by.columns' property now is hidden from the user. Please see PS11. http://gerrit.cloudera.org:8080/#/c/6495/9/fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java File fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java: Line 304: "TBLPROPERTIES ('sort.by.columns'='a')", true); > i think a 'show create table t' should print the sort-by clause, not a tabl Done -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Hello Dimitris Tsirogiannis, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6495 to look at the new patch set (#11). Change subject: IMPALA-4166: Add SORT BY sql clause .. IMPALA-4166: Add SORT BY sql clause This change adds support for adding SORT BY (...) clauses to CREATE TABLE and ALTER TABLE statements. Examples are: CREATE TABLE t (i INT, j INT, k INT) PARTITIONED BY (l INT) SORT BY (i, j); CREATE TABLE t (i INT, x INT PRIMARY KEY) PARTITION BY HASH(x) PARTITIONS 8 SORT BY(i) STORED AS KUDU; CREATE TABLE t SORT BY (int_col,id) LIKE u; CREATE TABLE t LIKE PARQUET '/foo' SORT BY (id,zip); ALTER TABLE t SORT BY (int_col,id); ALTER TABLE t SORT BY (); SORT BY columns can be specified for all table types, but effectiveness may vary based on table type, for example TEXT tables will not see improved compression. The SORT BY clause must not contain clustering columns for HDFS tables, and must not contain primary key columns for Kudu tables. The columns in the SORT BY clause are stored in the 'sort.by.columns' table property and will result in an additional SORT node being added to the plan before the final table sink. Specifying SORT BY columns also enables clustering during inserts, so the SORT node will contain all partitioning columns first, followed by the SORT BY columns. We do this because SORT BY columns add a SORT node to the plan and adding the clustering columns to the SORT node is cheap. SORT BY columns supersede the sortby() hint, which we will remove in a subsequent change (IMPALA-5144). This change introduces a new TablePropertyAnalyzer class to collect various methods used to analyze table properties. Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 --- M common/thrift/JniCatalog.thrift M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java A fe/src/main/java/org/apache/impala/analysis/AlterTableSortByColumnsStmt.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/TableDef.java A fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java M fe/src/main/java/org/apache/impala/catalog/Column.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/jflex/sql-scanner.flex M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test A testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test M testdata/workloads/functional-planner/queries/PlannerTest/insert.test M testdata/workloads/functional-query/queries/QueryTest/alter-table.test M testdata/workloads/functional-query/queries/QueryTest/create-table-as-select.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test M testdata/workloads/functional-query/queries/QueryTest/create-table.test M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test 33 files changed, 1,210 insertions(+), 85 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/6495/11 -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 11 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 9: (1 comment) http://gerrit.cloudera.org:8080/#/c/6495/9/fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java File fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java: Line 304: "TBLPROPERTIES ('sort.by.columns'='a')", true); > I commented on the decision to keep the sort.by.columns property visible in i think a 'show create table t' should print the sort-by clause, not a table property. -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Lars Volker has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 10: (8 comments) Thank you for your reviews. Please see my inline comments and PS10. http://gerrit.cloudera.org:8080/#/c/6495/9/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java: PS9, Line 376: analyzeSortByColumn > Instead of checking this here, it's a bit cleaner to alway call analyzeSort Done http://gerrit.cloudera.org:8080/#/c/6495/9/fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java File fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java: PS9, Line 49: roperties.containsKe > The caller is not supposed to modify the return values, correct? You may ju Done, I ended up using ImmutableList, since I already use it below. PS9, Line 77: > Same comment as above. Done PS9, Line 89: t by column in the li > You can still construct and return an ImmutableList. See Guava's ImmutableL I needed the contains() method in line 105, as well as size(), so I ended up using a LinkedHashSet. http://gerrit.cloudera.org:8080/#/c/6495/9/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java: Line 515: AnalyzesOk("alter table functional.alltypes set tblproperties(" + > why allow this in addition to the '... sort by ()' clause? Alex and I discussed this and Alex suggested that we allow this, since it is easier than explicitly disabling it, and we'll always have the possibility that someone changes the values in Hive. The alternative would be to try and hide the table property completely. If Alex doesn't object and you think it's worth the effort to hide this from the users, I can implement it. Line 1905: "tblproperties ('sort.by.columns'='i')", "Table definition must not contain " + > same here, why allow the sort.by.columns table property? Please see my reply above. http://gerrit.cloudera.org:8080/#/c/6495/9/fe/src/test/java/org/apache/impala/analysis/ParserTest.java File fe/src/test/java/org/apache/impala/analysis/ParserTest.java: PS9, Line 2374: // SORT BY must be the first table option : ParserError("CREATE > You may want to comment why this results in a parsing error. Done http://gerrit.cloudera.org:8080/#/c/6495/9/fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java File fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java: Line 304: "TBLPROPERTIES ('sort.by.columns'='a')", true); > why not print that as a 'sort by' clause? I commented on the decision to keep the sort.by.columns property visible in AnalyzeDDLTest.java. I discussed ToSqlTest() with Alex, too, and IIRC he explained that this is only really relevant for views, in which create table statements cannot occur anyways. Therefore, this should never be visible to a user and is tested here only for completeness. Should I add a comment to explain this? -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Hello Dimitris Tsirogiannis, I'd like you to reexamine a change. Please visit http://gerrit.cloudera.org:8080/6495 to look at the new patch set (#10). Change subject: IMPALA-4166: Add SORT BY sql clause .. IMPALA-4166: Add SORT BY sql clause This change adds support for adding SORT BY (...) clauses to CREATE TABLE and ALTER TABLE statements. Examples are: CREATE TABLE t (i INT, j INT, k INT) PARTITIONED BY (l INT) SORT BY (i, j); CREATE TABLE t (i INT, x INT PRIMARY KEY) PARTITION BY HASH(x) PARTITIONS 8 SORT BY(i) STORED AS KUDU; CREATE TABLE t SORT BY (int_col,id) LIKE u; CREATE TABLE t LIKE PARQUET '/foo' SORT BY (id,zip); ALTER TABLE t SORT BY (int_col,id); ALTER TABLE t SORT BY (); SORT BY columns can be specified for all table types, but effectiveness may vary based on table type, for example TEXT tables will not see improved compression. The SORT BY clause must not contain clustering columns for HDFS tables, and must not contain primary key columns for Kudu tables. The columns in the SORT BY clause are stored in the 'sort.by.columns' table property and will result in an additional SORT node being added to the plan before the final table sink. Specifying SORT BY columns also enables clustering during inserts, so the SORT node will contain all partitioning columns first, followed by the SORT BY columns. We do this because SORT BY columns add a SORT node to the plan and adding the clustering columns to the SORT node is cheap. SORT BY columns supersede the sortby() hint, which we will remove in a subsequent change (IMPALA-5144). This change introduces a new TablePropertyAnalyzer class to collect various methods used to analyze table properties. Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 --- M common/thrift/JniCatalog.thrift M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java A fe/src/main/java/org/apache/impala/analysis/AlterTableSortByColumnsStmt.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/TableDef.java A fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java M fe/src/main/java/org/apache/impala/catalog/Column.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/jflex/sql-scanner.flex M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test A testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test M testdata/workloads/functional-planner/queries/PlannerTest/insert.test M testdata/workloads/functional-query/queries/QueryTest/alter-table.test M testdata/workloads/functional-query/queries/QueryTest/create-table-as-select.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test M testdata/workloads/functional-query/queries/QueryTest/create-table.test M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test 30 files changed, 1,121 insertions(+), 59 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/6495/10 -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 10 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 9: (3 comments) http://gerrit.cloudera.org:8080/#/c/6495/9/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java: Line 515: AnalyzesOk("alter table functional.alltypes set tblproperties(" + why allow this in addition to the '... sort by ()' clause? Line 1905: "tblproperties ('sort.by.columns'='i')", "Table definition must not contain " + same here, why allow the sort.by.columns table property? http://gerrit.cloudera.org:8080/#/c/6495/9/fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java File fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java: Line 304: "TBLPROPERTIES ('sort.by.columns'='a')", true); why not print that as a 'sort by' clause? -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 9: Code-Review+1 (5 comments) http://gerrit.cloudera.org:8080/#/c/6495/9/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java: PS9, Line 376: if (!isHBaseTable) Instead of checking this here, it's a bit cleaner to alway call analyzeSortByColumns() and move the check there. http://gerrit.cloudera.org:8080/#/c/6495/9/fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java File fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java: PS9, Line 49: Lists.newArrayList() The caller is not supposed to modify the return values, correct? You may just return Collections.emptyList() which is also immutable. PS9, Line 77: Lists.newArrayList(); Same comment as above. PS9, Line 89: Lists.newArrayList(); You can still construct and return an ImmutableList. See Guava's ImmutableList.builder() http://gerrit.cloudera.org:8080/#/c/6495/9/fe/src/test/java/org/apache/impala/analysis/ParserTest.java File fe/src/test/java/org/apache/impala/analysis/ParserTest.java: PS9, Line 2374: ParserError("CREATE TABLE Foo LIKE Baz STORED AS TEXTFILE LOCATION '/a/b' " + : "SORT BY(bar)"); You may want to comment why this results in a parsing error. -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Lars Volker has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 8: (18 comments) Thank you for the review, please see PS9. http://gerrit.cloudera.org:8080/#/c/6495/7/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: PS7, Line 984: Sor > There is no KW_SET, right? Update the name? Done http://gerrit.cloudera.org:8080/#/c/6495/7/fe/src/main/java/org/apache/impala/analysis/AlterTableSetSortByColumnsStmt.java File fe/src/main/java/org/apache/impala/analysis/AlterTableSetSortByColumnsStmt.java: PS7, Line 73: > t? Done http://gerrit.cloudera.org:8080/#/c/6495/7/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: Line 2484:* Add a warning that will be displayed to the user. Ignores null messages. Once > Comment that no warning can be issued if warnings have been retrieved? Done http://gerrit.cloudera.org:8080/#/c/6495/7/fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java File fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java: PS7, Line 119: ortByColumns_ ! > I think this can throw a NPE. Looking at the parser I see that CreateTableL Thanks for catching this. I was able to trigger the NPE with an additional test in ToSqlTest.java. Fixed. http://gerrit.cloudera.org:8080/#/c/6495/7/fe/src/main/java/org/apache/impala/analysis/TableDef.java File fe/src/main/java/org/apache/impala/analysis/TableDef.java: PS7, Line 296: > inline Done http://gerrit.cloudera.org:8080/#/c/6495/7/fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java File fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java: PS7, Line 41: : /** :* Analyzes the 'sort.by.columns' property in 'tblProperties' against the columns of :* 'table'. :*/ : public static List analyzeSortByColumns(Map tblProperties, : Table table) throws AnalysisException { : if (!tblProperties.containsKey(TBL_PROP_SORT_BY_COLUMNS)) { : return Lists.newArrayList(); : } : : List sortByCols = Lists.newArrayList( : Splitter.on(",").trimResults().omitEmptyStrings().split( : tblProperties.get(TBL_PROP_SORT_BY_COLUMNS))); : return TablePropertyAnalyzer.analyzeSortByColumns(sortByCols, table); : } : > These are used in only one place, right? Maybe inline there? I removed the first one. The second one is used in two places: InsertStmt.analyzeSortByColumns() (that's where I inlined the first one), and AlterTableSetTblProperties.analyze(). I'd keep the second one unless you lean strongly towards inlining it. Line 76: } > Precondition check for HBase? Done PS7, Line 127: : > You may want to consider if it's better to have a getSortByColumn(Table tab Inlined it instead. http://gerrit.cloudera.org:8080/#/c/6495/7/fe/src/main/java/org/apache/impala/planner/Planner.java File fe/src/main/java/org/apache/impala/planner/Planner.java: PS7, Line 539: // If the table has a 'sort.by.columns' property and the query has a 'noclustered' : // hint, we issue a warning during analysis and ignore the 'noclustered' hint. > That comment doesn't seem relevant here. Remove? I put it there to explain the intent of the if statement. Without it one might think that the noclusteredhint gets ignored by mistake (see previous review comment by Thomas). Should I remove/rephrase it? PS7, Line 541: nsertStmt.hasClusteredHint() || > If insertStmt.hasClusteredHint() is true doesn't that imply that hasNoClust True, fixed. http://gerrit.cloudera.org:8080/#/c/6495/7/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: PS7, Line 1793: !params.sort_by_columns.isEmpty() > use isEmpty() Done http://gerrit.cloudera.org:8080/#/c/6495/7/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java: Line 526: String long_property_key = ""; > Test for HBase and Kudu? Added tests for Kudu in testAlterKuduTable(). HBase tables do not support "ALTER TABLE SET" altogether, which is tested in L683. Line 1054: > Same here (HBase + Kudu). You need to put all Kudu tests in a function and Added test to check that HBase is not supported. Added Kudu tests to TestAlterKuduTable(). PS7, Line 1904: te t > "must" typo Done PS7, Line 1920: est > No need for that comment :) Done http://gerrit.cloudera.org:8080/#/c/6495/7/fe/src/test/java/org/apache/impala/analysis/ParserTest.java File fe/src/test/java/org/apache/impala/analysis/ParserTest.java: PS7, Line 2347: // Sort by clau
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Lars Volker has uploaded a new patch set (#9). Change subject: IMPALA-4166: Add SORT BY sql clause .. IMPALA-4166: Add SORT BY sql clause This change adds support for adding SORT BY (...) clauses to CREATE TABLE and ALTER TABLE statements. Examples are: CREATE TABLE t (i INT, j INT, k INT) PARTITIONED BY (l INT) SORT BY (i, j); CREATE TABLE t (i INT, x INT PRIMARY KEY) PARTITION BY HASH(x) PARTITIONS 8 SORT BY(i) STORED AS KUDU; CREATE TABLE t SORT BY (int_col,id) LIKE u; CREATE TABLE t LIKE PARQUET '/foo' SORT BY (id,zip); ALTER TABLE t SORT BY (int_col,id); ALTER TABLE t SORT BY (); SORT BY columns can be specified for all table types, but effectiveness may vary based on table type, for example TEXT tables will not see improved compression. The SORT BY clause must not contain clustering columns for HDFS tables, and must not contain primary key columns for Kudu tables. The columns in the SORT BY clause are stored in the 'sort.by.columns' table property and will result in an additional SORT node being added to the plan before the final table sink. Specifying SORT BY columns also enables clustering during inserts, so the SORT node will contain all partitioning columns first, followed by the SORT BY columns. We do this because SORT BY columns add a SORT node to the plan and adding the clustering columns to the SORT node is cheap. SORT BY columns supersede the sortby() hint, which we will remove in a subsequent change (IMPALA-5144). This change introduces a new TablePropertyAnalyzer class to collect various methods used to analyze table properties. Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 --- M common/thrift/JniCatalog.thrift M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java A fe/src/main/java/org/apache/impala/analysis/AlterTableSortByColumnsStmt.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/TableDef.java A fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java M fe/src/main/java/org/apache/impala/catalog/Column.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/jflex/sql-scanner.flex M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test A testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test M testdata/workloads/functional-planner/queries/PlannerTest/insert.test M testdata/workloads/functional-query/queries/QueryTest/alter-table.test M testdata/workloads/functional-query/queries/QueryTest/create-table-as-select.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test M testdata/workloads/functional-query/queries/QueryTest/create-table.test M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test 30 files changed, 1,112 insertions(+), 59 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/6495/9 -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 9 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Lars Volker has uploaded a new patch set (#8). Change subject: IMPALA-4166: Add SORT BY sql clause .. IMPALA-4166: Add SORT BY sql clause This change adds support for adding SORT BY (...) clauses to CREATE TABLE and ALTER TABLE statements. Examples are: CREATE TABLE t (i INT, j INT, k INT) PARTITIONED BY (l INT) SORT BY (i, j); CREATE TABLE t (i INT, x INT PRIMARY KEY) PARTITION BY HASH(x) PARTITIONS 8 SORT BY(i) STORED AS KUDU; CREATE TABLE t SORT BY (int_col,id) LIKE u; CREATE TABLE t LIKE PARQUET '/foo' SORT BY (id,zip); ALTER TABLE t SORT BY (int_col,id); ALTER TABLE t SORT BY (); SORT BY columns can be specified for all table types, but effectiveness may vary based on table type, for example TEXT tables will not see improved compression. The SORT BY clause must not contain clustering columns for HDFS tables, and must not contain primary key columns for Kudu tables. The columns in the SORT BY clause are stored in the 'sort.by.columns' table property and will result in an additional SORT node being added to the plan before the final table sink. Specifying SORT BY columns also enables clustering during inserts, so the SORT node will contain all partitioning columns first, followed by the SORT BY columns. We do this because SORT BY columns add a SORT node to the plan and adding the clustering columns to the SORT node is cheap. SORT BY columns supersede the sortby() hint, which we will remove in a subsequent change (IMPALA-5144). This change introduces a new TablePropertyAnalyzer class to collect various methods used to analyze table properties. Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 --- M common/thrift/JniCatalog.thrift M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java A fe/src/main/java/org/apache/impala/analysis/AlterTableSortByColumnsStmt.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/TableDef.java A fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java M fe/src/main/java/org/apache/impala/catalog/Column.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/jflex/sql-scanner.flex M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test A testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test M testdata/workloads/functional-planner/queries/PlannerTest/insert.test M testdata/workloads/functional-query/queries/QueryTest/alter-table.test M testdata/workloads/functional-query/queries/QueryTest/create-table-as-select.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test M testdata/workloads/functional-query/queries/QueryTest/create-table.test M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test 30 files changed, 1,115 insertions(+), 59 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/6495/8 -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 8 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 7: (18 comments) http://gerrit.cloudera.org:8080/#/c/6495/7/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: PS7, Line 984: Set There is no KW_SET, right? Update the name? http://gerrit.cloudera.org:8080/#/c/6495/7/fe/src/main/java/org/apache/impala/analysis/AlterTableSetSortByColumnsStmt.java File fe/src/main/java/org/apache/impala/analysis/AlterTableSetSortByColumnsStmt.java: PS7, Line 73: getTargetTable() t? http://gerrit.cloudera.org:8080/#/c/6495/7/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: Line 2484:* Add a warning that will be displayed to the user. Ignores null messages. Comment that no warning can be issued if warnings have been retrieved? http://gerrit.cloudera.org:8080/#/c/6495/7/fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java File fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java: PS7, Line 119: sortByColumns_. I think this can throw a NPE. Looking at the parser I see that CreateTableListStmt can pass null for sortByColumns_ and couldn't find where this changes. http://gerrit.cloudera.org:8080/#/c/6495/7/fe/src/main/java/org/apache/impala/analysis/TableDef.java File fe/src/main/java/org/apache/impala/analysis/TableDef.java: PS7, Line 296: propertyString inline http://gerrit.cloudera.org:8080/#/c/6495/7/fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java File fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java: PS7, Line 41: /** :* Analyzes the 'sort.by.columns' property of 'table'. :*/ : public static List analyzeSortByColumns(Table table) throws AnalysisException { : return TablePropertyAnalyzer.analyzeSortByColumns( : table.getMetaStoreTable().getParameters(), table); : } : : /** :* Analyzes the 'sort.by.columns' property in 'tblProperties' against the columns of :* 'table'. :*/ : public static List analyzeSortByColumns(Map tblProperties, : Table table) throws AnalysisException { : return TablePropertyAnalyzer.analyzeSortByColumns( : getSortByColumnsFromProperties(tblProperties), table); : } These are used in only one place, right? Maybe inline there? Line 76: } Precondition check for HBase? PS7, Line 127: private static List getSortByColumnsFromProperties( : Map tblProperties) { You may want to consider if it's better to have a getSortByColumn(Table table) function instead of this one. Just a though... http://gerrit.cloudera.org:8080/#/c/6495/7/fe/src/main/java/org/apache/impala/planner/Planner.java File fe/src/main/java/org/apache/impala/planner/Planner.java: PS7, Line 539: // If the table has a 'sort.by.columns' property and the query has a 'noclustered' : // hint, we issue a warning during analysis and ignore the 'noclustered' hint. That comment doesn't seem relevant here. Remove? PS7, Line 541: !insertStmt.hasNoClusteredHint() If insertStmt.hasClusteredHint() is true doesn't that imply that hasNoClusteredHint() is false? I thought they are mutually exclusive. http://gerrit.cloudera.org:8080/#/c/6495/7/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java: PS7, Line 1793: params.sort_by_columns.size() > 0 use isEmpty() http://gerrit.cloudera.org:8080/#/c/6495/7/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java: Line 526: Test for HBase and Kudu? Line 1054: } Same here (HBase + Kudu). You need to put all Kudu tests in a function and check for supported. PS7, Line 1904: most "must" typo PS7, Line 1920: // Kudu table tests are in TestCreateManagedKuduTable(). No need for that comment :) http://gerrit.cloudera.org:8080/#/c/6495/7/fe/src/test/java/org/apache/impala/analysis/ParserTest.java File fe/src/test/java/org/apache/impala/analysis/ParserTest.java: PS7, Line 2347: // Sort by clause How about some tests with additional table options? Try changing the order in which the table options are specified, e.g. location bla sort by () http://gerrit.cloudera.org:8080/#/c/6495/7/testdata/workloads/functional-planner/queries/PlannerTest/ddl.test File testdata/workloads/functional-planner/queries/PlannerTest/ddl.test: PS7, Line 207: table select query http://gerrit.cloudera.org:8080/#/c/6495/7/testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test File testdata/workloads/functional-planner/queries/Pl
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 5: (5 comments) Responses to some comments. http://gerrit.cloudera.org:8080/#/c/6495/5/fe/src/main/java/org/apache/impala/catalog/Table.java File fe/src/main/java/org/apache/impala/catalog/Table.java: PS5, Line 455: the caller must make sure that the value matches any columns he/she adds to the :* table. > We could throw one if the number is wrong, but eventually the caller will h No that's fine. You already mention that this is used only for tests. http://gerrit.cloudera.org:8080/#/c/6495/5/fe/src/main/java/org/apache/impala/planner/Planner.java File fe/src/main/java/org/apache/impala/planner/Planner.java: PS5, Line 546: orderingExprs.addAll(insertStmt.getPartitionKeyExprs() > Insert hints are not allowed for HBase tables and the analysis will fail. F If we enforce it at the analysis phase, then adding a Preconditions check here is not a bad idea. http://gerrit.cloudera.org:8080/#/c/6495/5/testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test File testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test: PS5, Line 10: ASC NULLS LAST > No, we cannot change it. I thought about this, but couldn't see why we'd wa I was thinking more of changing the ASC into DESC but that's fine if this is an acceptable limitation. PS5, Line 39: DISTRIBUTEDPLAN : WRITE TO HDFS [test_sort_by.alltypes, OVERWRITE=false, PARTITION-KEYS=(year,month)] : | partitions=24 : | : 01:SORT : | order by: year ASC NULLS LAST, month ASC NULLS LAST, int_col ASC NULLS LAST, bool_col ASC NULLS LAST : | : 00:SCAN HDFS [functional.alltypes] :partitions=24/24 files=24 size=478.45KB > They make sure that the shuffle/noshuffle hint has been observed, indicated Good point. I think we can leave them. http://gerrit.cloudera.org:8080/#/c/6495/5/testdata/workloads/functional-query/queries/QueryTest/alter-table.test File testdata/workloads/functional-query/queries/QueryTest/alter-table.test: PS5, Line 1066: > They are part of the output of "describe formatted". Without them the compa It's probably because of the HMS schema used (maybe they use CHAR(N)). Anyways, no need to do something. I just wanted to make sure we don't do anything funky. -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Lars Volker has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 7: (20 comments) Thank you for your review! Please see PS7. http://gerrit.cloudera.org:8080/#/c/6495/5/common/thrift/JniCatalog.thrift File common/thrift/JniCatalog.thrift: Line 383: // any such columns of the source table. If unspecified, the destination table will > Otherwise? Will the destination table inherit the sort by columns of the so Yes, it'll inherit them. I expanded the comment. We have a test for this in QueryTest/create-table-like-table.test. http://gerrit.cloudera.org:8080/#/c/6495/5/fe/src/main/java/org/apache/impala/analysis/AlterTableSetSortByColumnsStmt.java File fe/src/main/java/org/apache/impala/analysis/AlterTableSetSortByColumnsStmt.java: PS5, Line 64: super.analyze(analyzer); > Comment not applicable. Remove? Done http://gerrit.cloudera.org:8080/#/c/6495/5/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java: PS5, Line 127: hasNoClusteredHint_ > Similar to the comment on hasShuffleHint, explain the allowed values for ha Done PS5, Line 763: > "if any" Done http://gerrit.cloudera.org:8080/#/c/6495/5/fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java File fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java: PS5, Line 77: return Lists.newArrayList(); > This is an immutable list whereas the next function returns a mutable one. Done PS5, Line 83: :*/ > remove Done PS5, Line 91: sortByC > nit: rename to sortByColName? since you also have tableCols it may be good Done Line 92: // Make sure it's not a primary key column or partition column. > nit: remove empty line Done PS5, Line 102: : for (int j = 0; j < tableCols.size(); ++j) { > Not sure I follow this part based on the code below. Done, updated the comment. Line 120: return colIdxs; > nit: remove empty line Done http://gerrit.cloudera.org:8080/#/c/6495/5/fe/src/main/java/org/apache/impala/catalog/Table.java File fe/src/main/java/org/apache/impala/catalog/Table.java: PS5, Line 455: the caller must make sure that the value matches any columns he/she adds to the :* table. > Can't we throw an exception instead? We could throw one if the number is wrong, but eventually the caller will have to provide k columns and then set this to some value 0 <= n < k. Should we check that? The comment was meant to indicate that the caller will be on themselves to use this. http://gerrit.cloudera.org:8080/#/c/6495/5/fe/src/main/java/org/apache/impala/planner/Planner.java File fe/src/main/java/org/apache/impala/planner/Planner.java: PS5, Line 546: orderingExprs.addAll(insertStmt.getPartitionKeyExprs() > I assume this works for HBase tables... Insert hints are not allowed for HBase tables and the analysis will fail. From looking at InsertStmt I think that this would return an empty list of exprs. However I think it could be more explicit. Should I add an "instanceof" check for HdfsTable here? http://gerrit.cloudera.org:8080/#/c/6495/5/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java: PS5, Line 521: 'sort.by.columns'='id > How about 'ID' (check for case)? Indeed it did not work, I fixed it. PS5, Line 1642: "CREATE TABLE LIKE is not supported for Kudu tables"); > Also, add the negative case where sort by cols don't exist in the source ta Done PS5, Line 1913: // Partitioned HDFS table : AnalyzesOk("create table functional.new_table (i int) PARTITIONED BY (d decimal)" + : "SORT BY (i)"); : // Column in sortby hint must not be a Hdfs partition column. : AnalysisError("create table functional.new_table (i int) PARTITIONED BY (d decimal)" + : "SORT BY (d)", "SORT BY column list must not contain partition column: 'd'"); : : // Kudu table tests are in TestCreateManagedKuduTable(). : } : : @Test : public void TestAlterKuduTable() { : TestUtils.assumeKuduIsSupported(); : > For Kudu specific tests they need to be in a function that first calls Test I move the tests to TestCreateManagedKuduTable(). We don't support setting sort by columns for HBase tables (I added code to AlterTableSortByStmt to enforce this). During analysis, we don't parse the table property, either. http://gerrit.cloudera.org:8080/#/c/6495/5/testdata/workloads/functional-planner/queries/PlannerTest/ddl.test File testdata/workloads/functional-planner/queries/PlannerTest/ddl.test: PS5, Line 162: create table t sort by > Add one with a partitioned table? Also, a CTAS with a more
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Lars Volker has uploaded a new patch set (#7). Change subject: IMPALA-4166: Add SORT BY sql clause .. IMPALA-4166: Add SORT BY sql clause This change adds support for adding SORT BY (...) clauses to CREATE TABLE and ALTER TABLE statements. Examples are: CREATE TABLE t (i INT, j INT, k INT) PARTITIONED BY (l INT) SORT BY (i, j); CREATE TABLE t (i INT, x INT PRIMARY KEY) PARTITION BY HASH(x) PARTITIONS 8 SORT BY(i) STORED AS KUDU; CREATE TABLE t SORT BY (int_col,id) LIKE u; CREATE TABLE t LIKE PARQUET '/foo' SORT BY (id,zip); ALTER TABLE t SORT BY (int_col,id); ALTER TABLE t SORT BY (); SORT BY columns can be specified for all table types, but effectiveness may vary based on table type, for example TEXT tables will not see improved compression. The SORT BY clause must not contain clustering columns for HDFS tables, and must not contain primary key columns for Kudu tables. The columns in the SORT BY clause are stored in the 'sort.by.columns' table property and will result in an additional SORT node being added to the plan before the final table sink. Specifying SORT BY columns also enables clustering during inserts, so the SORT node will contain all partitioning columns first, followed by the SORT BY columns. We do this because SORT BY columns add a SORT node to the plan and adding the clustering columns to the SORT node is cheap. SORT BY columns supersede the sortby() hint, which we will remove in a subsequent change (IMPALA-5144). This change introduces a new TablePropertyAnalyzer class to collect various methods used to analyze table properties. Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 --- M common/thrift/JniCatalog.thrift M fe/src/main/cup/sql-parser.cup A fe/src/main/java/org/apache/impala/analysis/AlterTableSetSortByColumnsStmt.java M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/TableDef.java A fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java M fe/src/main/java/org/apache/impala/catalog/Column.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/jflex/sql-scanner.flex M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test A testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test M testdata/workloads/functional-planner/queries/PlannerTest/insert.test M testdata/workloads/functional-query/queries/QueryTest/alter-table.test M testdata/workloads/functional-query/queries/QueryTest/create-table-as-select.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test M testdata/workloads/functional-query/queries/QueryTest/create-table.test M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test 30 files changed, 1,062 insertions(+), 58 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/6495/7 -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 7 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Lars Volker has uploaded a new patch set (#6). Change subject: IMPALA-4166: Add SORT BY sql clause .. IMPALA-4166: Add SORT BY sql clause This change adds support for adding SORT BY (...) clauses to CREATE TABLE and ALTER TABLE statements. Examples are: CREATE TABLE t (i INT, j INT, k INT) PARTITIONED BY (l INT) SORT BY (i, j); CREATE TABLE t (i INT, x INT PRIMARY KEY) PARTITION BY HASH(x) PARTITIONS 8 SORT BY(i) STORED AS KUDU; CREATE TABLE t SORT BY (int_col,id) LIKE u; CREATE TABLE t LIKE PARQUET '/foo' SORT BY (id,zip); ALTER TABLE t SORT BY (int_col,id); ALTER TABLE t SORT BY (); SORT BY columns can be specified for all table types, but effectiveness may vary based on table type, for example TEXT tables will not see improved compression. The SORT BY clause must not contain clustering columns for HDFS tables, and must not contain primary key columns for Kudu tables. The columns in the SORT BY clause are stored in the 'sort.by.columns' table property and will result in an additional SORT node being added to the plan before the final table sink. Specifying SORT BY columns also enables clustering during inserts, so the SORT node will contain all partitioning columns first, followed by the SORT BY columns. We do this because SORT BY columns add a SORT node to the plan and adding the clustering columns to the SORT node is cheap. SORT BY columns supersede the sortby() hint, which we will remove in a subsequent change (IMPALA-5144). This change introduces a new TablePropertyAnalyzer class to collect various methods used to analyze table properties. Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 --- M common/thrift/JniCatalog.thrift M fe/src/main/cup/sql-parser.cup A fe/src/main/java/org/apache/impala/analysis/AlterTableSetSortByColumnsStmt.java M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/TableDef.java A fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java M fe/src/main/java/org/apache/impala/catalog/Column.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/jflex/sql-scanner.flex M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test A testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test M testdata/workloads/functional-planner/queries/PlannerTest/insert.test M testdata/workloads/functional-query/queries/QueryTest/alter-table.test M testdata/workloads/functional-query/queries/QueryTest/create-table-as-select.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test M testdata/workloads/functional-query/queries/QueryTest/create-table.test M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test 30 files changed, 1,060 insertions(+), 58 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/6495/6 -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 6 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 5: (20 comments) http://gerrit.cloudera.org:8080/#/c/6495/5/common/thrift/JniCatalog.thrift File common/thrift/JniCatalog.thrift: Line 383: // any such columns of the source table. Otherwise? Will the destination table inherit the sort by columns of the source table, if any? http://gerrit.cloudera.org:8080/#/c/6495/5/fe/src/main/java/org/apache/impala/analysis/AlterTableSetSortByColumnsStmt.java File fe/src/main/java/org/apache/impala/analysis/AlterTableSetSortByColumnsStmt.java: PS5, Line 64: // Analyze 'sort.by.columns' property. Comment not applicable. Remove? http://gerrit.cloudera.org:8080/#/c/6495/5/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java: PS5, Line 127: hasNoClusteredHint_ Similar to the comment on hasShuffleHint, explain the allowed values for hasClusteredHint and hasNotClusteredHint. PS5, Line 763: "if any" http://gerrit.cloudera.org:8080/#/c/6495/5/fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java File fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java: PS5, Line 77: return ImmutableList.of(); This is an immutable list whereas the next function returns a mutable one. Make it consistent? PS5, Line 83: If there are any errors, 'error' contains an :* error string. Otherwise, 'error' is left untouched. remove PS5, Line 91: colName nit: rename to sortByColName? since you also have tableCols it may be good to make it explicit which ones you refer to in the for loop. Line 92: nit: remove empty line PS5, Line 102: and store : // the corresponding result expr in sortByExprs_. Not sure I follow this part based on the code below. Line 120: nit: remove empty line http://gerrit.cloudera.org:8080/#/c/6495/5/fe/src/main/java/org/apache/impala/catalog/Table.java File fe/src/main/java/org/apache/impala/catalog/Table.java: PS5, Line 455: the caller must make sure that the value matches any columns he/she adds to the :* table. Can't we throw an exception instead? http://gerrit.cloudera.org:8080/#/c/6495/5/fe/src/main/java/org/apache/impala/planner/Planner.java File fe/src/main/java/org/apache/impala/planner/Planner.java: PS5, Line 546: orderingExprs.addAll(insertStmt.getPartitionKeyExprs() I assume this works for HBase tables... http://gerrit.cloudera.org:8080/#/c/6495/5/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java: PS5, Line 521: 'sort.by.columns'='id How about 'ID' (check for case)? PS5, Line 1642: AnalyzesOk("create table tbl sort by (int_col,id) like functional.alltypes"); Also, add the negative case where sort by cols don't exist in the source table. PS5, Line 1913: // Kudu table : AnalyzesOk("create table tab (i int, x int primary key) partition by hash(x) " + : "partitions 8 sort by(i) stored as kudu"); : // Column in sortby hint must not be a Kudu primary key column. : AnalysisError("create table tab (i int, x int primary key) partition by hash(x) " + : "partitions 8 sort by(x) stored as kudu", "SORT BY column list must not " + : "contain primary key column: 'x'"); : AnalysisError("create table tab (i int, x int, y int, primary key (x)) partition " + : "by hash(x) partitions 8 sort by(x) stored as kudu", "SORT BY column list must " + : "not contain primary key column: 'x'"); : AnalysisError("create table tab (i int, x int, y int, primary key(x, y)) partition " + : "by hash(y) partitions 8 sort by (i,x) stored as kudu", "SORT BY column list " + : "must not contain primary key column: 'x'"); : } For Kudu specific tests they need to be in a function that first calls TestUtils.assumeKuduIsSupported() (see L1930). Do you have tests for HBase? http://gerrit.cloudera.org:8080/#/c/6495/5/testdata/workloads/functional-planner/queries/PlannerTest/ddl.test File testdata/workloads/functional-planner/queries/PlannerTest/ddl.test: PS5, Line 162: create table t sort by Add one with a partitioned table? Also, a CTAS with a more complex select (joins, maybe aggs) http://gerrit.cloudera.org:8080/#/c/6495/5/testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test File testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test: PS5, Line 10: ASC NULLS LAST Out of curiosity? Can we change the default behavior? If no, are we ok with this restriction? PS5, Line 39: DISTRIBUTEDPLAN : WRITE TO HDFS [t
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/6495/3//COMMIT_MSG Commit Message: PS3, Line 13: Specifying SORT BY columns also enables clustering during inserts, : so the SORT node will contain all partitioning columns first, followed : by the SORT BY columns. > in general, the commit message shouldn't act as documentation. for things l That's fine by me. Feel free to simplify the description here and move it to the JIRA. I just wanted to understand the implemented semantics instead of trying to infer them from the code. -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Marcel Kornacker has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 3: (1 comment) http://gerrit.cloudera.org:8080/#/c/6495/3//COMMIT_MSG Commit Message: PS3, Line 13: Specifying SORT BY columns also enables clustering during inserts, : so the SORT node will contain all partitioning columns first, followed : by the SORT BY columns. > Done in general, the commit message shouldn't act as documentation. for things like that, we should use the jira, which is more accessible as a public record. -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Marcel Kornacker Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Lars Volker has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 4: (7 comments) Thank you for the reviews. Please see my comments and PS5. http://gerrit.cloudera.org:8080/#/c/6495/3//COMMIT_MSG Commit Message: PS3, Line 9: SORT BY (...) clauses to CREATE : TABLE and ALTER TABLE statements > It would be helpful to have the formal specification of the new syntax here Done PS3, Line 13: TABLE t (i INT, x INT PRIMARY KEY) PARTITION BY HASH(x) : PARTITIONS 8 SORT BY(i) STORED AS KUDU; : CREATE TABLE t SORT BY > Can you expand the comment a little bit on the semantics of this? A few hig Done http://gerrit.cloudera.org:8080/#/c/6495/3/common/thrift/JniCatalog.thrift File common/thrift/JniCatalog.thrift: PS3, Line 382: Optiona > Optional? Done http://gerrit.cloudera.org:8080/#/c/6495/3/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: PS3, Line 1132: // This extra production is necessary since without it the parser will not be able to : // parse "CREATE TABLE A LIKE B". > I don't understand this comment, since the sort_col_cols here is optional. At first I tried adding sort_by_cols to the production above, but then the parser would not pick it up for queries like "CREATE TABLE A LIKE B" and instead try to parse it only as a "CREATE TABLE LIKE FILE" query. I could not figure out why it does that, but copying the rule fixes it. I agree that it should work. I'll ask Alex for help. http://gerrit.cloudera.org:8080/#/c/6495/3/fe/src/main/java/org/apache/impala/catalog/Table.java File fe/src/main/java/org/apache/impala/catalog/Table.java: Line 452: > I think this only makes sense in the context of testing? If so, note that h Done http://gerrit.cloudera.org:8080/#/c/6495/3/fe/src/main/java/org/apache/impala/planner/Planner.java File fe/src/main/java/org/apache/impala/planner/Planner.java: PS3, Line 539: s' property and the query has a 'noclu > I think that you need a '&& !hasNoClusteredHint()' here. Done, also added a comment to explain that the hint will be ignored if there are sort by columns. http://gerrit.cloudera.org:8080/#/c/6495/3/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java: Line 515: AnalyzesOk("alter table functional.alltypes set tblproperties(" + > Is 'sort.by.columns'='' handled? Yes, added a test. -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Lars Volker has uploaded a new patch set (#5). Change subject: IMPALA-4166: Add SORT BY sql clause .. IMPALA-4166: Add SORT BY sql clause This change adds support for adding SORT BY (...) clauses to CREATE TABLE and ALTER TABLE statements. Examples are: CREATE TABLE t (i INT, j INT, k INT) PARTITIONED BY (l INT) SORT BY (i, j); CREATE TABLE t (i INT, x INT PRIMARY KEY) PARTITION BY HASH(x) PARTITIONS 8 SORT BY(i) STORED AS KUDU; CREATE TABLE t SORT BY (int_col,id) LIKE u; CREATE TABLE t LIKE PARQUET '/foo' SORT BY (id,zip); ALTER TABLE t SORT BY (int_col,id); ALTER TABLE t SORT BY (); SORT BY columns can be specified for all table types, but effectiveness may vary based on table type, for example TEXT tables will not see improved compression. The SORT BY clause must not contain clustering columns for HDFS tables, and must not contain primary key columns for Kudu tables. The columns in the SORT BY clause are stored in the 'sort.by.columns' table property and will result in an additional SORT node being added to the plan before the final table sink. Specifying SORT BY columns also enables clustering during inserts, so the SORT node will contain all partitioning columns first, followed by the SORT BY columns. We do this because SORT BY columns add a SORT node to the plan and adding the clustering columns to the SORT node is cheap. SORT BY columns supersede the sortby() hint, which we will remove in a subsequent change (IMPALA-5144). This change introduces a new TablePropertyAnalyzer class to collect various methods used to analyze table properties. Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 --- M common/thrift/JniCatalog.thrift M fe/src/main/cup/sql-parser.cup A fe/src/main/java/org/apache/impala/analysis/AlterTableSetSortByColumnsStmt.java M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/TableDef.java A fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java M fe/src/main/java/org/apache/impala/catalog/Column.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/jflex/sql-scanner.flex M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test A testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test M testdata/workloads/functional-planner/queries/PlannerTest/insert.test M testdata/workloads/functional-query/queries/QueryTest/alter-table.test M testdata/workloads/functional-query/queries/QueryTest/create-table-as-select.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test M testdata/workloads/functional-query/queries/QueryTest/create-table.test M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test 30 files changed, 856 insertions(+), 54 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/6495/5 -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 5 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Lars Volker has uploaded a new patch set (#4). Change subject: IMPALA-4166: Add SORT BY sql clause .. IMPALA-4166: Add SORT BY sql clause This change adds support for adding SORT BY (...) clauses to CREATE TABLE and ALTER TABLE statements. Examples are: CREATE TABLE t (i INT, j INT, k INT) PARTITIONED BY (l INT) SORT BY (i, j); CREATE TABLE t (i INT, x INT PRIMARY KEY) PARTITION BY HASH(x) PARTITIONS 8 SORT BY(i) STORED AS KUDU; CREATE TABLE t SORT BY (int_col,id) LIKE u; CREATE TABLE t LIKE PARQUET '/foo' SORT BY (id,zip); ALTER TABLE t SORT BY (int_col,id); ALTER TABLE t SORT BY (); SORT BY columns can be specified for all table types, but effectiveness may vary based on table type, for example TEXT tables will not see improved compression. The SORT BY clause must not contain clustering columns for HDFS tables, and must not contain primary key columns for Kudu tables. The columns in the SORT BY clause are stored in the 'sort.by.columns' table property and will result in an additional SORT node being added to the plan before the final table sink. Specifying SORT BY columns also enables clustering during inserts, so the SORT node will contain all partitioning columns first, followed by the SORT BY columns. We do this because SORT BY columns add a SORT node to the plan and adding the clustering columns to the SORT node is cheap. SORT BY columns supersede the sortby() hint, which we will remove in a subsequent change (IMPALA-5144). This change introduces a new TablePropertyAnalyzer class to collect various methods used to analyze table properties. Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 --- M common/thrift/JniCatalog.thrift M fe/src/main/cup/sql-parser.cup A fe/src/main/java/org/apache/impala/analysis/AlterTableSetSortByColumnsStmt.java M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/TableDef.java A fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java M fe/src/main/java/org/apache/impala/catalog/Column.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/jflex/sql-scanner.flex M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test A testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test M testdata/workloads/functional-planner/queries/PlannerTest/insert.test M testdata/workloads/functional-query/queries/QueryTest/alter-table.test M testdata/workloads/functional-query/queries/QueryTest/create-table-as-select.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test M testdata/workloads/functional-query/queries/QueryTest/create-table.test M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test 30 files changed, 856 insertions(+), 54 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/6495/4 -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 4 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Thomas Tauber-Marshall
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Thomas Tauber-Marshall has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 3: (6 comments) Just a heads up, this is going to have some conflicts with IMPALA-3742 (partition and sort Kudu inserts) which I'll have a patch out for soon, but this will probably go in first so I guess that's my problem. http://gerrit.cloudera.org:8080/#/c/6495/3//COMMIT_MSG Commit Message: PS3, Line 9: SORT BY (...) clauses to CREATE : TABLE and ALTER TABLE statements It would be helpful to have the formal specification of the new syntax here, to show where the clause would go, e.g.: ALTER TABLE name SORT BY (col_spec[, col_spec...]) etc. http://gerrit.cloudera.org:8080/#/c/6495/3/common/thrift/JniCatalog.thrift File common/thrift/JniCatalog.thrift: PS3, Line 382: Options Optional? http://gerrit.cloudera.org:8080/#/c/6495/3/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: PS3, Line 1132: // This extra production is necessary since without it the parser will not be able to : // parse "CREATE TABLE A LIKE B". I don't understand this comment, since the sort_col_cols here is optional. http://gerrit.cloudera.org:8080/#/c/6495/3/fe/src/main/java/org/apache/impala/catalog/Table.java File fe/src/main/java/org/apache/impala/catalog/Table.java: Line 452: public void setNumClusteringCols(int n) { numClusteringCols_ = n; } I think this only makes sense in the context of testing? If so, note that here. http://gerrit.cloudera.org:8080/#/c/6495/3/fe/src/main/java/org/apache/impala/planner/Planner.java File fe/src/main/java/org/apache/impala/planner/Planner.java: PS3, Line 539: !insertStmt.getSortByExprs().isEmpty() I think that you need a '&& !hasNoClusteredHint()' here. http://gerrit.cloudera.org:8080/#/c/6495/3/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java: Line 515: AnalysisError("alter table functional.alltypes set tblproperties(" + Is 'sort.by.columns'='' handled? -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-Reviewer: Thomas Tauber-Marshall Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 3: (1 comment) Lars, a few high level questions before I start the next review iteration. Thanks http://gerrit.cloudera.org:8080/#/c/6495/3//COMMIT_MSG Commit Message: PS3, Line 13: Specifying SORT BY columns also enables clustering during inserts, : so the SORT node will contain all partitioning columns first, followed : by the SORT BY columns. Can you expand the comment a little bit on the semantics of this? A few high level questions are: 1. What is the connection between this and the sortby() hints? Can I use both? What if there is a conflict? Same for clustered hint. 2. Why does sort by enables clustering as well? 3. Can I specify any table columns (even the partitioning columns) in the SORT BY clause or are there some restrictions? 4. I suppose this works for all table types (including Kudu)? -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Lars Volker has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 2: (20 comments) Thank you for your Review. Please see my comments and PS3. http://gerrit.cloudera.org:8080/#/c/6495/2/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: PS2, Line 986: // SORT BY columns are stored in the 'sort.by.columns' table property. : HashMap properties = new HashMap(); : String columnString = Joiner.on(",").join(col_names); : properties.put(TablePropertyAnalyzer.TBL_PROP_SORT_BY_COLUMNS, columnString); : RESULT = new AlterTableSetTblProperties(table, null, TTablePropertyType.TBL_PROPERTY, : properties); : :} > I don't see why you need to treat it as another SetTblProperties statement. Done PS2, Line 1143: opt_sort_by_cols:sort_by_cols : KW_LIKE table_name:other_table : opt_comment_val:comment : file_format_create_table_val:file_format location_val:location > nit: indentation Done PS2, Line 1182: opt_sort_by_cols:sort_by_cols opt_comment_val:comment row_format_val:row_format serde_properties:serde_props > nit: long line Done http://gerrit.cloudera.org:8080/#/c/6495/2/fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java File fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java: PS2, Line 162: /** :* Analyzes the 'sort.by.columns' property to make sure the columns inside the property :* occur in the target table and are neither partitioning nor primary key columns. :*/ : public void analyzeSortByColumns() throws AnalysisException { : StringBuilder error = new StringBuilder(); : TablePropertyAnalyzer.analyzeSortByColumns(tblProperties_, getTargetTable(), error); : if (error.length() > 0) throw new AnalysisException(error.toString()); : } > Along the lines of my previous comment: a) I would move this completely out Done http://gerrit.cloudera.org:8080/#/c/6495/2/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: PS2, Line 275: Boolean > boolean? why class? Done PS2, Line 2486: Preconditions.checkState(!globalState_.warningsRetrieved); > This is kind of harsh :). Do we guarantee somewhere that no one will call a We currently don't guarantee that, so we'd be adding that here. Should I remove it? Or add a comment to this method and getWarnings() to explain the behavior. I ran into this when trying to add warnings to the Analyzer after getWarnings() had been called and my intention was to prevent that error in the future. I don't feel strongly about keeping this. http://gerrit.cloudera.org:8080/#/c/6495/2/fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java File fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java: PS2, Line 119: sortByColumns_.size() > 0 > !sortByColumns_.isEmpty() Done PS2, Line 179: StringBuilder error = new StringBuilder(); : TablePropertyAnalyzer.analyzeSortByColumns(sortByColumns_, srcTable, error); : if (error.length() > 0) throw new AnalysisException(error.toString()); > Why don't you just throw the AnalysisException from the analyzeSortByColumn Done. When I started with this I thought I may have to call the method from places after the analysis so throwing AnalysisExceptions wouldn't work. That turned out to be false, so I now throw in analyzeSortByColumns() directly. http://gerrit.cloudera.org:8080/#/c/6495/2/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java: PS2, Line 382: sortByExprs_.size() > 0 > !sortByExprs_.isEmpty() Done PS2, Line 510: sortByExprs_.add(resultExprs_.get(colIdx)); > Does this work ok if you have column permutation? I think it does. colIdx is the index in the target table, and resultExprs_[i] gets written into column[i]. Lines 703ff in prepareExpressions seem to guarantee this. I added tests to the PlannerTest/insert-sort-by.test file. PS2, Line 768: StringBuilder error = new StringBuilder(); : sortByColumns_ = TablePropertyAnalyzer.analyzeSortByColumns(table_, error); : if (error.length() > 0) throw new AnalysisException(error.toString()); > Same comment as before. Move the throw clause inside the analyzeSortByColum Done http://gerrit.cloudera.org:8080/#/c/6495/2/fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java File fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java: PS2, Line 33: /** : * This class collects various methods to analyze table properties. : *
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Lars Volker has uploaded a new patch set (#3). Change subject: IMPALA-4166: Add SORT BY sql clause .. IMPALA-4166: Add SORT BY sql clause This change adds support for adding SORT BY (...) clauses to CREATE TABLE and ALTER TABLE statements. The columns in the SORT BY clause are stored in the 'sort.by.columns' table property and will result in an additional SORT node being added to the plan before the final table sink. Specifying SORT BY columns also enables clustering during inserts, so the SORT node will contain all partitioning columns first, followed by the SORT BY columns. This change introduces a new TablePropertyAnalyzer class to collect various methods used to analyze table properties. Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 --- M common/thrift/JniCatalog.thrift M fe/src/main/cup/sql-parser.cup A fe/src/main/java/org/apache/impala/analysis/AlterTableSetSortByColumnsStmt.java M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/TableDef.java A fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java M fe/src/main/java/org/apache/impala/catalog/Column.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/jflex/sql-scanner.flex M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test A testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test M testdata/workloads/functional-planner/queries/PlannerTest/insert.test M testdata/workloads/functional-query/queries/QueryTest/alter-table.test M testdata/workloads/functional-query/queries/QueryTest/create-table-as-select.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test M testdata/workloads/functional-query/queries/QueryTest/create-table.test M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test 30 files changed, 845 insertions(+), 54 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/6495/3 -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 3 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Dimitris Tsirogiannis Gerrit-Reviewer: Lars Volker
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Dimitris Tsirogiannis has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 2: (20 comments) Initial pass, haven't looked at the tests yet. http://gerrit.cloudera.org:8080/#/c/6495/2/fe/src/main/cup/sql-parser.cup File fe/src/main/cup/sql-parser.cup: PS2, Line 986: // SORT BY columns are stored in the 'sort.by.columns' table property. : HashMap properties = new HashMap(); : String columnString = Joiner.on(",").join(col_names); : properties.put(TablePropertyAnalyzer.TBL_PROP_SORT_BY_COLUMNS, columnString); : RESULT = new AlterTableSetTblProperties(table, null, TTablePropertyType.TBL_PROPERTY, : properties); : :} I don't see why you need to treat it as another SetTblProperties statement. I understand that eventually the sort by columns will be stored as table properties but this adds unnecessary complexity to the parser. I would simply create a new AlterTableSetSortByColumns statement. PS2, Line 1143: opt_sort_by_cols:sort_by_cols : KW_LIKE table_name:other_table : opt_comment_val:comment : file_format_create_table_val:file_format location_val:location nit: indentation PS2, Line 1182: opt_sort_by_cols:sort_by_cols opt_comment_val:comment row_format_val:row_format serde_properties:serde_props nit: long line http://gerrit.cloudera.org:8080/#/c/6495/2/fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java File fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java: PS2, Line 162: /** :* Analyzes the 'sort.by.columns' property to make sure the columns inside the property :* occur in the target table and are neither partitioning nor primary key columns. :*/ : public void analyzeSortByColumns() throws AnalysisException { : StringBuilder error = new StringBuilder(); : TablePropertyAnalyzer.analyzeSortByColumns(tblProperties_, getTargetTable(), error); : if (error.length() > 0) throw new AnalysisException(error.toString()); : } Along the lines of my previous comment: a) I would move this completely out of this class, b) create a new AlterTableSetSortByColumns and move the analysis logic there. http://gerrit.cloudera.org:8080/#/c/6495/2/fe/src/main/java/org/apache/impala/analysis/Analyzer.java File fe/src/main/java/org/apache/impala/analysis/Analyzer.java: PS2, Line 275: Boolean boolean? why class? PS2, Line 2486: Preconditions.checkState(!globalState_.warningsRetrieved); This is kind of harsh :). Do we guarantee somewhere that no one will call addWarning after warnings have been retrieved? http://gerrit.cloudera.org:8080/#/c/6495/2/fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java File fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java: PS2, Line 119: sortByColumns_.size() > 0 !sortByColumns_.isEmpty() PS2, Line 179: StringBuilder error = new StringBuilder(); : TablePropertyAnalyzer.analyzeSortByColumns(sortByColumns_, srcTable, error); : if (error.length() > 0) throw new AnalysisException(error.toString()); Why don't you just throw the AnalysisException from the analyzeSortByColumns()? http://gerrit.cloudera.org:8080/#/c/6495/2/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java: PS2, Line 382: sortByExprs_.size() > 0 !sortByExprs_.isEmpty() PS2, Line 510: sortByExprs_.add(resultExprs_.get(colIdx)); Does this work ok if you have column permutation? PS2, Line 768: StringBuilder error = new StringBuilder(); : sortByColumns_ = TablePropertyAnalyzer.analyzeSortByColumns(table_, error); : if (error.length() > 0) throw new AnalysisException(error.toString()); Same comment as before. Move the throw clause inside the analyzeSortByColumns() function. http://gerrit.cloudera.org:8080/#/c/6495/2/fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java File fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java: PS2, Line 33: /** : * This class collects various methods to analyze table properties. : */ : public class TablePropertyAnalyzer { You're only analyzing the sort by columns, so why this generic class? Can't we move these methods to the TableDef class? PS2, Line 41: Analyzes the 'sort.by.columns' property of 'table'. What is the return value? PS2, Line 86: List partitionCols, List primaryKeyCols You can simply pass one list which includes the list of column names to exclude from the sort columns. If you want a specific error message you can construct it in function analyzeSortByC
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Lars Volker has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 2: PS2 adds a file I had forgotten to "git add" in PS1. -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Lars Volker Gerrit-HasComments: No
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Lars Volker has uploaded a new patch set (#2). Change subject: IMPALA-4166: Add SORT BY sql clause .. IMPALA-4166: Add SORT BY sql clause This change adds support for adding SORT BY (...) clauses to CREATE TABLE and ALTER TABLE statements. The columns in the SORT BY clause are stored in the 'sort.by.columns' table property and will result in an additional SORT node being added to the plan before the final table sink. Specifying SORT BY columns also enables clustering during inserts, so the SORT node will contain all partitioning columns first, followed by the SORT BY columns. This change introduces a new TablePropertyAnalyzer class to collect various methods used to analyze table properties. Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 --- M common/thrift/JniCatalog.thrift M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/TableDef.java A fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java M fe/src/main/java/org/apache/impala/catalog/Column.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/jflex/sql-scanner.flex M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test A testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test M testdata/workloads/functional-planner/queries/PlannerTest/insert.test M testdata/workloads/functional-query/queries/QueryTest/alter-table.test M testdata/workloads/functional-query/queries/QueryTest/create-table-as-select.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test M testdata/workloads/functional-query/queries/QueryTest/create-table.test M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test 29 files changed, 730 insertions(+), 51 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/6495/2 -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newpatchset Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 2 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Lars Volker
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Lars Volker has posted comments on this change. Change subject: IMPALA-4166: Add SORT BY sql clause .. Patch Set 1: (1 comment) http://gerrit.cloudera.org:8080/#/c/6495/1/fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java File fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java: Line 41:* Analyzes the 'sort.by.columns' property of 'table'. I'd be glad for a tip how to organize the different wrappers and comments. -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: comment Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker Gerrit-Reviewer: Lars Volker Gerrit-HasComments: Yes
[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause
Lars Volker has uploaded a new change for review. http://gerrit.cloudera.org:8080/6495 Change subject: IMPALA-4166: Add SORT BY sql clause .. IMPALA-4166: Add SORT BY sql clause This change adds support for adding SORT BY (...) clauses to CREATE TABLE and ALTER TABLE statements. The columns in the SORT BY clause are stored in the 'sort.by.columns' table property and will result in an additional SORT node being added to the plan before the final table sink. Specifying SORT BY columns also enables clustering during inserts, so the SORT node will contain all partitioning columns first, followed by the SORT BY columns. This change introduces a new TablePropertyAnalyzer class to collect various methods used to analyze table properties. Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 --- M common/thrift/JniCatalog.thrift M fe/src/main/cup/sql-parser.cup M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java M fe/src/main/java/org/apache/impala/analysis/Analyzer.java M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java M fe/src/main/java/org/apache/impala/analysis/TableDef.java A fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java M fe/src/main/java/org/apache/impala/catalog/Column.java M fe/src/main/java/org/apache/impala/catalog/Table.java M fe/src/main/java/org/apache/impala/planner/Planner.java M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java M fe/src/main/jflex/sql-scanner.flex M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java M fe/src/test/java/org/apache/impala/analysis/ParserTest.java M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java M fe/src/test/java/org/apache/impala/planner/PlannerTest.java M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test M testdata/workloads/functional-planner/queries/PlannerTest/insert.test M testdata/workloads/functional-query/queries/QueryTest/alter-table.test M testdata/workloads/functional-query/queries/QueryTest/create-table-as-select.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test M testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test M testdata/workloads/functional-query/queries/QueryTest/create-table.test M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test 28 files changed, 609 insertions(+), 51 deletions(-) git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/6495/1 -- To view, visit http://gerrit.cloudera.org:8080/6495 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings Gerrit-MessageType: newchange Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75 Gerrit-PatchSet: 1 Gerrit-Project: Impala-ASF Gerrit-Branch: master Gerrit-Owner: Lars Volker