[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-05-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 31: Verified+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 31
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-05-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has submitted this change and it was merged.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


IMPALA-4166: Add SORT BY sql clause

This change adds support for adding SORT BY (...) clauses to CREATE
TABLE and ALTER TABLE statements. Examples are:

CREATE TABLE t (i INT, j INT, k INT) PARTITIONED BY (l INT) SORT BY (i, j);
CREATE TABLE t SORT BY (int_col,id) LIKE u;
CREATE TABLE t LIKE PARQUET '/foo' SORT BY (id,zip);

ALTER TABLE t SORT BY (int_col,id);
ALTER TABLE t SORT BY ();

Sort columns can only be specified for Hdfs tables and effectiveness may
vary based on storage type; for example TEXT tables will not see
improved compression. The SORT BY clause must not contain clustering
columns. The columns in the SORT BY clause are stored in the
'sort.columns' table property and will result in an additional SORT node
being added to the plan before the final table sink. Specifying sort
columns also enables clustering during inserts, so the SORT node will
contain all partitioning columns first, followed by the sort columns. We
do this because sort columns add a SORT node to the plan and adding the
clustering columns to the SORT node is cheap.

Sort columns supersede the sortby() hint, which we will remove in a
subsequent change (IMPALA-5144). Until then, it is possible to specify
sort columns using both ways at the same time and the column lists
will be concatenated.

Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Reviewed-on: http://gerrit.cloudera.org:8080/6495
Reviewed-by: Lars Volker 
Tested-by: Impala Public Jenkins
---
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M common/thrift/DataSinks.thrift
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
A fe/src/main/java/org/apache/impala/analysis/AlterTableSortByStmt.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/Column.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test
A testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test
M testdata/workloads/functional-planner/queries/PlannerTest/insert.test
M testdata/workloads/functional-query/queries/QueryTest/alter-table.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-as-select.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test
M testdata/workloads/functional-query/queries/QueryTest/create-table.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
39 files changed, 1,531 insertions(+), 113 deletions(-)

Approvals:
  Impala Public Jenkins: Verified
  Lars Volker: Looks good to me, approved



-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: merged
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 32
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-05-12 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 31: Code-Review+2

Submit job for PS30 seems to have failed for unrelated Jenkins issues. Rebased, 
carrying Marcel's +2.

-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 31
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-05-12 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 31:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/575/

-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 31
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-05-11 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 30: Verified-1

Build failed: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/573/

-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 30
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-05-11 Thread Impala Public Jenkins (Code Review)
Impala Public Jenkins has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 30:

Build started: http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/573/

-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 30
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Impala Public Jenkins
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-05-11 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 30: Code-Review+2

I chatted with Mostafa, carrying Marcel's +2

-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 30
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-05-11 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 29:

(1 comment)

Thanks for the review. Please see PS30.

http://gerrit.cloudera.org:8080/#/c/6495/29/testdata/workloads/functional-query/queries/QueryTest/alter-table.test
File testdata/workloads/functional-query/queries/QueryTest/alter-table.test:

Line 1130: alter table insert_sorted replace columns (i int, e double, f 
boolean);
> For more coverage how about. 
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 29
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-05-11 Thread Lars Volker (Code Review)
Hello Marcel Kornacker, Dimitris Tsirogiannis, Mostafa Mokhtar,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6495

to look at the new patch set (#30).

Change subject: IMPALA-4166: Add SORT BY sql clause
..

IMPALA-4166: Add SORT BY sql clause

This change adds support for adding SORT BY (...) clauses to CREATE
TABLE and ALTER TABLE statements. Examples are:

CREATE TABLE t (i INT, j INT, k INT) PARTITIONED BY (l INT) SORT BY (i, j);
CREATE TABLE t SORT BY (int_col,id) LIKE u;
CREATE TABLE t LIKE PARQUET '/foo' SORT BY (id,zip);

ALTER TABLE t SORT BY (int_col,id);
ALTER TABLE t SORT BY ();

Sort columns can only be specified for Hdfs tables and effectiveness may
vary based on storage type; for example TEXT tables will not see
improved compression. The SORT BY clause must not contain clustering
columns. The columns in the SORT BY clause are stored in the
'sort.columns' table property and will result in an additional SORT node
being added to the plan before the final table sink. Specifying sort
columns also enables clustering during inserts, so the SORT node will
contain all partitioning columns first, followed by the sort columns. We
do this because sort columns add a SORT node to the plan and adding the
clustering columns to the SORT node is cheap.

Sort columns supersede the sortby() hint, which we will remove in a
subsequent change (IMPALA-5144). Until then, it is possible to specify
sort columns using both ways at the same time and the column lists
will be concatenated.

Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
---
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M common/thrift/DataSinks.thrift
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
A fe/src/main/java/org/apache/impala/analysis/AlterTableSortByStmt.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/Column.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test
A testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test
M testdata/workloads/functional-planner/queries/PlannerTest/insert.test
M testdata/workloads/functional-query/queries/QueryTest/alter-table.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-as-select.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test
M testdata/workloads/functional-query/queries/QueryTest/create-table.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
39 files changed, 1,531 insertions(+), 113 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/6495/30
-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 30
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-05-11 Thread Mostafa Mokhtar (Code Review)
Mostafa Mokhtar has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 29: Code-Review+1

(1 comment)

Thanks for adding the test cases.

http://gerrit.cloudera.org:8080/#/c/6495/29/testdata/workloads/functional-query/queries/QueryTest/alter-table.test
File testdata/workloads/functional-query/queries/QueryTest/alter-table.test:

Line 1130: alter table insert_sorted replace columns (i int, e double, f 
boolean);
For more coverage how about. 
(int bigint, e decimal(12,2), f boolean)


-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 29
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-05-11 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 29:

PS29 adds a negative test for alter table drop column.

-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 29
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-05-11 Thread Lars Volker (Code Review)
Hello Marcel Kornacker, Dimitris Tsirogiannis,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6495

to look at the new patch set (#29).

Change subject: IMPALA-4166: Add SORT BY sql clause
..

IMPALA-4166: Add SORT BY sql clause

This change adds support for adding SORT BY (...) clauses to CREATE
TABLE and ALTER TABLE statements. Examples are:

CREATE TABLE t (i INT, j INT, k INT) PARTITIONED BY (l INT) SORT BY (i, j);
CREATE TABLE t SORT BY (int_col,id) LIKE u;
CREATE TABLE t LIKE PARQUET '/foo' SORT BY (id,zip);

ALTER TABLE t SORT BY (int_col,id);
ALTER TABLE t SORT BY ();

Sort columns can only be specified for Hdfs tables and effectiveness may
vary based on storage type; for example TEXT tables will not see
improved compression. The SORT BY clause must not contain clustering
columns. The columns in the SORT BY clause are stored in the
'sort.columns' table property and will result in an additional SORT node
being added to the plan before the final table sink. Specifying sort
columns also enables clustering during inserts, so the SORT node will
contain all partitioning columns first, followed by the sort columns. We
do this because sort columns add a SORT node to the plan and adding the
clustering columns to the SORT node is cheap.

Sort columns supersede the sortby() hint, which we will remove in a
subsequent change (IMPALA-5144). Until then, it is possible to specify
sort columns using both ways at the same time and the column lists
will be concatenated.

Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
---
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M common/thrift/DataSinks.thrift
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
A fe/src/main/java/org/apache/impala/analysis/AlterTableSortByStmt.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/Column.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test
A testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test
M testdata/workloads/functional-planner/queries/PlannerTest/insert.test
M testdata/workloads/functional-query/queries/QueryTest/alter-table.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-as-select.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test
M testdata/workloads/functional-query/queries/QueryTest/create-table.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
39 files changed, 1,531 insertions(+), 113 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/6495/29
-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 29
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-05-11 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 27:

PS27 passed a jenkins run here: 
http://jenkins.impala.io:8080/job/gerrit-verify-dryrun/570

-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 27
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-05-11 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 27:

(1 comment)

I added more tests in PS28.

http://gerrit.cloudera.org:8080/#/c/6495/27/testdata/workloads/functional-query/queries/QueryTest/alter-table.test
File testdata/workloads/functional-query/queries/QueryTest/alter-table.test:

Line 1058:  QUERY
> Can you please add tests cases that cover inserts and queries after alter t
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 27
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-05-11 Thread Lars Volker (Code Review)
Hello Marcel Kornacker, Dimitris Tsirogiannis,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6495

to look at the new patch set (#28).

Change subject: IMPALA-4166: Add SORT BY sql clause
..

IMPALA-4166: Add SORT BY sql clause

This change adds support for adding SORT BY (...) clauses to CREATE
TABLE and ALTER TABLE statements. Examples are:

CREATE TABLE t (i INT, j INT, k INT) PARTITIONED BY (l INT) SORT BY (i, j);
CREATE TABLE t SORT BY (int_col,id) LIKE u;
CREATE TABLE t LIKE PARQUET '/foo' SORT BY (id,zip);

ALTER TABLE t SORT BY (int_col,id);
ALTER TABLE t SORT BY ();

Sort columns can only be specified for Hdfs tables and effectiveness may
vary based on storage type; for example TEXT tables will not see
improved compression. The SORT BY clause must not contain clustering
columns. The columns in the SORT BY clause are stored in the
'sort.columns' table property and will result in an additional SORT node
being added to the plan before the final table sink. Specifying sort
columns also enables clustering during inserts, so the SORT node will
contain all partitioning columns first, followed by the sort columns. We
do this because sort columns add a SORT node to the plan and adding the
clustering columns to the SORT node is cheap.

Sort columns supersede the sortby() hint, which we will remove in a
subsequent change (IMPALA-5144). Until then, it is possible to specify
sort columns using both ways at the same time and the column lists
will be concatenated.

Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
---
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M common/thrift/DataSinks.thrift
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
A fe/src/main/java/org/apache/impala/analysis/AlterTableSortByStmt.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/Column.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test
A testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test
M testdata/workloads/functional-planner/queries/PlannerTest/insert.test
M testdata/workloads/functional-query/queries/QueryTest/alter-table.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-as-select.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test
M testdata/workloads/functional-query/queries/QueryTest/create-table.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
39 files changed, 1,517 insertions(+), 113 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/6495/28
-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 28
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-05-11 Thread Mostafa Mokhtar (Code Review)
Mostafa Mokhtar has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 27:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6495/27/testdata/workloads/functional-query/queries/QueryTest/alter-table.test
File testdata/workloads/functional-query/queries/QueryTest/alter-table.test:

Line 1058:  QUERY
Can you please add tests cases that cover inserts and queries after alter table?
And if possibly verify the Order by operator has the correct columns after the 
table is modified.


-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 27
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Mostafa Mokhtar 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-05-11 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 27: Code-Review+2

Rebased, carrying Marcel's +2.

-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 27
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-05-11 Thread Lars Volker (Code Review)
Hello Marcel Kornacker, Dimitris Tsirogiannis,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6495

to look at the new patch set (#26).

Change subject: IMPALA-4166: Add SORT BY sql clause
..

IMPALA-4166: Add SORT BY sql clause

This change adds support for adding SORT BY (...) clauses to CREATE
TABLE and ALTER TABLE statements. Examples are:

CREATE TABLE t (i INT, j INT, k INT) PARTITIONED BY (l INT) SORT BY (i, j);
CREATE TABLE t SORT BY (int_col,id) LIKE u;
CREATE TABLE t LIKE PARQUET '/foo' SORT BY (id,zip);

ALTER TABLE t SORT BY (int_col,id);
ALTER TABLE t SORT BY ();

Sort columns can only be specified for Hdfs tables and effectiveness may
vary based on storage type; for example TEXT tables will not see
improved compression. The SORT BY clause must not contain clustering
columns. The columns in the SORT BY clause are stored in the
'sort.columns' table property and will result in an additional SORT node
being added to the plan before the final table sink. Specifying sort
columns also enables clustering during inserts, so the SORT node will
contain all partitioning columns first, followed by the sort columns. We
do this because sort columns add a SORT node to the plan and adding the
clustering columns to the SORT node is cheap.

Sort columns supersede the sortby() hint, which we will remove in a
subsequent change (IMPALA-5144). Until then, it is possible to specify
sort columns using both ways at the same time and the column lists
will be concatenated.

Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
---
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M common/thrift/DataSinks.thrift
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
A fe/src/main/java/org/apache/impala/analysis/AlterTableSortByStmt.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/Column.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test
A testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test
M testdata/workloads/functional-planner/queries/PlannerTest/insert.test
M testdata/workloads/functional-query/queries/QueryTest/alter-table.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-as-select.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test
M testdata/workloads/functional-query/queries/QueryTest/create-table.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
39 files changed, 1,406 insertions(+), 113 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/6495/26
-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 26
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-05-11 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 25:

(4 comments)

Addressed the final comments, will rebase next.

http://gerrit.cloudera.org:8080/#/c/6495/22/fe/src/main/java/org/apache/impala/analysis/TableDef.java
File fe/src/main/java/org/apache/impala/analysis/TableDef.java:

Line 295: 
> when you're output error messages, it's probably better to refer to the exa
Done


Line 352: "Please retry without caching: CREATE TABLE ... UNCACHED",
> why not "SORT BY is not supported..."?
Done


http://gerrit.cloudera.org:8080/#/c/6495/25/fe/src/main/java/org/apache/impala/analysis/TableDef.java
File fe/src/main/java/org/apache/impala/analysis/TableDef.java:

Line 291: TableDef.analyzeSortColumns(sortCols,
> do you need the 'tabledef.' prefix here?
No, removed it.


http://gerrit.cloudera.org:8080/#/c/6495/22/fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
File fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java:

Line 75: String sortByKey = AlterTableSortByStmt.TBL_PROP_SORT_COLUMNS;
> aren't you implying that it should live in hdfstable as well?
No, I meant to say that the skip.header.line.count key constant lives where it 
semantically somewhat belongs to, as compared to AlterTableSetTblProperties or 
TableDef. The class that most exclusively deals with SORT BY is this one here. 
Apologies for the confusion. :)


-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 25
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-05-11 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 25: Code-Review+2

(4 comments)

http://gerrit.cloudera.org:8080/#/c/6495/22/fe/src/main/java/org/apache/impala/analysis/TableDef.java
File fe/src/main/java/org/apache/impala/analysis/TableDef.java:

Line 295: 
when you're output error messages, it's probably better to refer to the exact 
syntactic element. so "SORT BY column list" is clearer than "sort column list".

please apply elsewhere, in case i missed something.


Line 352: "Please retry without caching: CREATE TABLE ... UNCACHED",
why not "SORT BY is not supported..."?


http://gerrit.cloudera.org:8080/#/c/6495/25/fe/src/main/java/org/apache/impala/analysis/TableDef.java
File fe/src/main/java/org/apache/impala/analysis/TableDef.java:

Line 291: TableDef.analyzeSortColumns(sortCols,
do you need the 'tabledef.' prefix here?


http://gerrit.cloudera.org:8080/#/c/6495/22/fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
File fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java:

Line 75: String sortByKey = AlterTableSortByStmt.TBL_PROP_SORT_COLUMNS;
> I've discussed that with Alex and Dimitris and we no better place has emerg
aren't you implying that it should live in hdfstable as well?


-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 25
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-05-11 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 25:

I also replaced "sortByExprs" with "sortExprs". Please have a look at PS25.

-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 25
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-05-11 Thread Lars Volker (Code Review)
Hello Dimitris Tsirogiannis,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6495

to look at the new patch set (#25).

Change subject: IMPALA-4166: Add SORT BY sql clause
..

IMPALA-4166: Add SORT BY sql clause

This change adds support for adding SORT BY (...) clauses to CREATE
TABLE and ALTER TABLE statements. Examples are:

CREATE TABLE t (i INT, j INT, k INT) PARTITIONED BY (l INT) SORT BY (i, j);
CREATE TABLE t SORT BY (int_col,id) LIKE u;
CREATE TABLE t LIKE PARQUET '/foo' SORT BY (id,zip);

ALTER TABLE t SORT BY (int_col,id);
ALTER TABLE t SORT BY ();

Sort columns can only be specified for Hdfs tables and effectiveness may
vary based on storage type; for example TEXT tables will not see
improved compression. The SORT BY clause must not contain clustering
columns. The columns in the SORT BY clause are stored in the
'sort.columns' table property and will result in an additional SORT node
being added to the plan before the final table sink. Specifying sort
columns also enables clustering during inserts, so the SORT node will
contain all partitioning columns first, followed by the sort columns. We
do this because sort columns add a SORT node to the plan and adding the
clustering columns to the SORT node is cheap.

Sort columns supersede the sortby() hint, which we will remove in a
subsequent change (IMPALA-5144). Until then, it is possible to specify
sort columns using both ways at the same time and the column lists
will be concatenated.

Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
---
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M common/thrift/DataSinks.thrift
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
A fe/src/main/java/org/apache/impala/analysis/AlterTableSortByStmt.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/Column.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test
A testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test
M testdata/workloads/functional-planner/queries/PlannerTest/insert.test
M testdata/workloads/functional-query/queries/QueryTest/alter-table.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-as-select.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test
M testdata/workloads/functional-query/queries/QueryTest/create-table.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
39 files changed, 1,407 insertions(+), 113 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/6495/25
-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 25
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-05-10 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 23:

(10 comments)

Thanks for the review, please see PS24.

http://gerrit.cloudera.org:8080/#/c/6495/22/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

Line 336: nonterminal ArrayList opt_ident_list, opt_sort_cols;
> opt_sort_by_clause or opt_sort_cols? "sort by columns" sounds odd.
Done


http://gerrit.cloudera.org:8080/#/c/6495/22/fe/src/main/java/org/apache/impala/analysis/AlterTableSortByColumnsStmt.java
File 
fe/src/main/java/org/apache/impala/analysis/AlterTableSortByColumnsStmt.java:

Line 43
> drop "columns", the statement doesn't include a COLUMNS
Done


Line 73
> rephrase "sort by columns" as "sort columns" universally
Done


Line 91
> maybe move this into TableDef as well? it seems kind of arbitrary to put it
Done. This has moved a couple of times now. :)


http://gerrit.cloudera.org:8080/#/c/6495/22/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java:

Line 863: // TODO: Remove this when removing the sortby() hint 
(IMPALA-5157).
> is this going to happen soon?
Yes, immediately after this change makes it in. We should remove the hint 
before releasing 2.9, since it hasn't been released yet. We could remove it in 
this change at the risk of the change becoming even larger than it already is.


http://gerrit.cloudera.org:8080/#/c/6495/22/fe/src/main/java/org/apache/impala/analysis/TableDef.java
File fe/src/main/java/org/apache/impala/analysis/TableDef.java:

Line 46: 
> update
Done


Line 343: if (options_.location != null) {
> why final?
To express that it's just a shorter name for the constant. Removed it.


http://gerrit.cloudera.org:8080/#/c/6495/22/fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
File fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java:

Line 75: final String sortByKey = 
AlterTableSortByStmt.TBL_PROP_SORT_COLUMNS;
> it feels like that constant should live somewhere else
I've discussed that with Alex and Dimitris and we no better place has emerged, 
though I've tried some of them. It's related to the sort by clause, similar to 
how TBL_PROP_SKIP_HEADER_LINE_COUNT lives in HdfsTable.java. Do you have a 
preference where it should go?

I also thought of TableDef (where the analysis happens) or 
AlterTableSetTblProperties (where we deal with properties).


http://gerrit.cloudera.org:8080/#/c/6495/22/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

Line 456:* the caller must make sure that the value matches any columns 
that were added to the
> or "that were added to the table" to avoid the gender reference :)
Done


http://gerrit.cloudera.org:8080/#/c/6495/22/fe/src/test/java/org/apache/impala/planner/PlannerTest.java
File fe/src/test/java/org/apache/impala/planner/PlannerTest.java:

Line 101: addTestTable("create table test_sort_by.t (id int, int_col int, " 
+
> why call this alltypes? it clearly doesn't contain all types.
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 23
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-05-10 Thread Lars Volker (Code Review)
Hello Dimitris Tsirogiannis,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6495

to look at the new patch set (#24).

Change subject: IMPALA-4166: Add SORT BY sql clause
..

IMPALA-4166: Add SORT BY sql clause

This change adds support for adding SORT BY (...) clauses to CREATE
TABLE and ALTER TABLE statements. Examples are:

CREATE TABLE t (i INT, j INT, k INT) PARTITIONED BY (l INT) SORT BY (i, j);
CREATE TABLE t SORT BY (int_col,id) LIKE u;
CREATE TABLE t LIKE PARQUET '/foo' SORT BY (id,zip);

ALTER TABLE t SORT BY (int_col,id);
ALTER TABLE t SORT BY ();

Sort columns can only be specified for Hdfs tables and effectiveness may
vary based on storage type; for example TEXT tables will not see
improved compression. The SORT BY clause must not contain clustering
columns. The columns in the SORT BY clause are stored in the
'sort.columns' table property and will result in an additional SORT node
being added to the plan before the final table sink. Specifying sort
columns also enables clustering during inserts, so the SORT node will
contain all partitioning columns first, followed by the sort columns. We
do this because sort columns add a SORT node to the plan and adding the
clustering columns to the SORT node is cheap.

Sort columns supersede the sortby() hint, which we will remove in a
subsequent change (IMPALA-5144). Until then, it is possible to specify
sort columns using both ways at the same time and the column lists
will be concatenated.

Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
---
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M common/thrift/DataSinks.thrift
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
A fe/src/main/java/org/apache/impala/analysis/AlterTableSortByStmt.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/Column.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test
A testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test
M testdata/workloads/functional-planner/queries/PlannerTest/insert.test
M testdata/workloads/functional-query/queries/QueryTest/alter-table.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-as-select.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test
M testdata/workloads/functional-query/queries/QueryTest/create-table.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
39 files changed, 1,401 insertions(+), 106 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/6495/24
-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 24
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-05-10 Thread Lars Volker (Code Review)
Hello Dimitris Tsirogiannis,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6495

to look at the new patch set (#23).

Change subject: IMPALA-4166: Add SORT BY sql clause
..

IMPALA-4166: Add SORT BY sql clause

This change adds support for adding SORT BY (...) clauses to CREATE
TABLE and ALTER TABLE statements. Examples are:

CREATE TABLE t (i INT, j INT, k INT) PARTITIONED BY (l INT) SORT BY (i, j);
CREATE TABLE t SORT BY (int_col,id) LIKE u;
CREATE TABLE t LIKE PARQUET '/foo' SORT BY (id,zip);

ALTER TABLE t SORT BY (int_col,id);
ALTER TABLE t SORT BY ();

Sort columns can only be specified for Hdfs tables and effectiveness may
vary based on storage type; for example TEXT tables will not see
improved compression. The SORT BY clause must not contain clustering
columns. The columns in the SORT BY clause are stored in the
'sort.columns' table property and will result in an additional SORT node
being added to the plan before the final table sink. Specifying sort
columns also enables clustering during inserts, so the SORT node will
contain all partitioning columns first, followed by the sort columns. We
do this because sort columns add a SORT node to the plan and adding the
clustering columns to the SORT node is cheap.

Sort columns supersede the sortby() hint, which we will remove in a
subsequent change (IMPALA-5144). Until then, it is possible to specify
sort columns using both ways at the same time and the column lists
will be concatenated.

Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
---
M be/src/exec/hdfs-parquet-table-writer.cc
M be/src/exec/hdfs-table-sink.cc
M be/src/exec/hdfs-table-sink.h
M common/thrift/DataSinks.thrift
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
A fe/src/main/java/org/apache/impala/analysis/AlterTableSortByStmt.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/Column.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/HdfsTableSink.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/planner/TableSink.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test
A testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test
M testdata/workloads/functional-planner/queries/PlannerTest/insert.test
M testdata/workloads/functional-query/queries/QueryTest/alter-table.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-as-select.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test
M testdata/workloads/functional-query/queries/QueryTest/create-table.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
39 files changed, 1,405 insertions(+), 107 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/6495/23
-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 23
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-05-10 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 22:

(3 comments)

Thanks for the review. I'm working on the comments, but in the meantime wanted 
to check whether we should uniformly rename all occurrences of 
"sort.?by.?columns" to sort_columns or sorting_columns.

http://gerrit.cloudera.org:8080/#/c/6495/22/common/thrift/JniCatalog.thrift
File common/thrift/JniCatalog.thrift:

Line 382:   // Optional list of sort by columns for the new table. If 
specified, these will override
> "sort columns" or "sorting columns" (as in: partition columns)
Done. There's a lot of other places in the code where that term is used, 
including the table property, which is called 'sort.by.columns'. Should I 
replace them all?


Line 385:   9: optional list sort_by_columns
> sort_columns
Done


Line 439:   16: optional list sort_by_columns
> same
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 22
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-05-10 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 22:

(13 comments)

http://gerrit.cloudera.org:8080/#/c/6495/22/common/thrift/JniCatalog.thrift
File common/thrift/JniCatalog.thrift:

Line 382:   // Optional list of sort by columns for the new table. If 
specified, these will override
"sort columns" or "sorting columns" (as in: partition columns)


Line 385:   9: optional list sort_by_columns
sort_columns


Line 439:   16: optional list sort_by_columns
same


http://gerrit.cloudera.org:8080/#/c/6495/22/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

Line 336: nonterminal ArrayList opt_ident_list, opt_sort_by_cols;
opt_sort_by_clause or opt_sort_cols? "sort by columns" sounds odd.


http://gerrit.cloudera.org:8080/#/c/6495/22/fe/src/main/java/org/apache/impala/analysis/AlterTableSortByColumnsStmt.java
File 
fe/src/main/java/org/apache/impala/analysis/AlterTableSortByColumnsStmt.java:

Line 43: public class AlterTableSortByColumnsStmt extends AlterTableStmt {
drop "columns", the statement doesn't include a COLUMNS


Line 73: // Disallow setting sort by columns on HBase and Kudu tables.
rephrase "sort by columns" as "sort columns" universally


Line 91:   public static void analyzeSortByColumns(List sortByCols, 
Table table)
maybe move this into TableDef as well? it seems kind of arbitrary to put it 
here.


http://gerrit.cloudera.org:8080/#/c/6495/22/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java:

Line 863: // TODO: Remove this when removing the sortby() hint 
(IMPALA-5157).
is this going to happen soon?


http://gerrit.cloudera.org:8080/#/c/6495/22/fe/src/main/java/org/apache/impala/analysis/TableDef.java
File fe/src/main/java/org/apache/impala/analysis/TableDef.java:

Line 46:  * correspond to the following clauses in a CREATE TABLE statement:
update


Line 343: final String sortByKey = 
AlterTableSortByColumnsStmt.TBL_PROP_SORT_BY_COLUMNS;
why final?


http://gerrit.cloudera.org:8080/#/c/6495/22/fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
File fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java:

Line 75: final String sortByKey = 
AlterTableSortByColumnsStmt.TBL_PROP_SORT_BY_COLUMNS;
it feels like that constant should live somewhere else


http://gerrit.cloudera.org:8080/#/c/6495/22/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

Line 456:* the caller must make sure that the value matches any columns 
he/she added to the
or "that were added to the table" to avoid the gender reference :)


http://gerrit.cloudera.org:8080/#/c/6495/22/fe/src/test/java/org/apache/impala/planner/PlannerTest.java
File fe/src/test/java/org/apache/impala/planner/PlannerTest.java:

Line 101: addTestTable("create table test_sort_by.alltypes (id int, int_col 
int, " +
why call this alltypes? it clearly doesn't contain all types.


-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 22
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-05-09 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 22:

Done rebasing, please have a look at PS22.

-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 22
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-05-09 Thread Lars Volker (Code Review)
Hello Dimitris Tsirogiannis,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6495

to look at the new patch set (#22).

Change subject: IMPALA-4166: Add SORT BY sql clause
..

IMPALA-4166: Add SORT BY sql clause

This change adds support for adding SORT BY (...) clauses to CREATE
TABLE and ALTER TABLE statements. Examples are:

CREATE TABLE t (i INT, j INT, k INT) PARTITIONED BY (l INT) SORT BY (i, j);
CREATE TABLE t SORT BY (int_col,id) LIKE u;
CREATE TABLE t LIKE PARQUET '/foo' SORT BY (id,zip);

ALTER TABLE t SORT BY (int_col,id);
ALTER TABLE t SORT BY ();

SORT BY columns can only be specified for Hdfs tables and effectiveness
may vary based on storage type; for example TEXT tables will not see
improved compression. The SORT BY clause must not contain clustering
columns. The columns in the SORT BY clause are stored in the
'sort.by.columns' table property and will result in an additional SORT
node being added to the plan before the final table sink. Specifying
SORT BY columns also enables clustering during inserts, so the SORT node
will contain all partitioning columns first, followed by the SORT BY
columns. We do this because SORT BY columns add a SORT node to the plan
and adding the clustering columns to the SORT node is cheap.

SORT BY columns supersede the sortby() hint, which we will remove in a
subsequent change (IMPALA-5144). Until then, it is possible to specify
sort by columns using both ways at the same time and the column lists
will be concatenated.

Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
---
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
A fe/src/main/java/org/apache/impala/analysis/AlterTableSortByColumnsStmt.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/Column.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test
A testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test
M testdata/workloads/functional-planner/queries/PlannerTest/insert.test
M testdata/workloads/functional-query/queries/QueryTest/alter-table.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-as-select.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test
M testdata/workloads/functional-query/queries/QueryTest/create-table.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
33 files changed, 1,382 insertions(+), 89 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/6495/22
-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 22
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Thomas Tauber-Marshall 


Re: [Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-05-09 Thread Lars Volker
On Tue, May 9, 2017 at 2:25 AM, Dimitris Tsirogiannis <
dtsirogian...@cloudera.com> wrote:

> If you don't populate it for CREATE TABLE SORT BY() then I think you
> should remove it altogether.
>

We don't set it there. On second look, we don't support removing table
properties and would need to add that capability to maintain consistency
here. Should we do this? As an alternative, we can set the table property
to an empty value for CREATE TABLE SORT BY()?

I'm good with either of these. Even leaving it set to an empty value had it
ever been set seems fine to me, since it's an internal implementation
detail and won't affect functionality.


>
> Dimitris
>
> On Mon, May 8, 2017 at 4:33 PM, Lars Volker  wrote:
>
>>
>>
>> On Tue, May 9, 2017 at 1:24 AM, Dimitris Tsirogiannis <
>> dtsirogian...@cloudera.com> wrote:
>>
>>> The other alternative would be to populate it with the partitioning
>>> columns, if any. Thoughts?
>>>
>>>
>> Adding the partitioning columns to the sort by list is not supported.
>> They can be added to the pre-insert sort not using the clustered hint. I'm
>> not sure though if I understood your statement correctly.
>>
>> My question was what to do if a user executes "ALTER TABLE SORT BY();",
>> meaning to remove all sort by columns. Should we remove all columns from
>> the property, or remove the property altogether?
>>
>>
>>> Dimitris
>>>
>>> On Mon, May 8, 2017 at 4:02 PM, Lars Volker  wrote:
>>>

 On Mon, May 8, 2017 at 11:41 PM, Marcel Kornacker 
 wrote:

>
>
> On Mon, May 8, 2017 at 9:50 AM, Dimitris Tsirogiannis <
> dtsirogian...@cloudera.com> wrote:
>
>> I believe it sends the wrong message and is probably confusing to
>> throw an error when someone writes a CREATE TABLE with an empty SORT BY()
>> but allow the same clause in an ALTER. No doubt the users can read the
>> documentation and figure it out but its an extra step. Also, as Marcel
>> mentions, scripting may be easier as you don't have to figure out whether
>> to add a clause or not. Anyways, the patch is great as is, I just wanted 
>> to
>> mention this.
>>
>
> Sounds like we should allow an empty list then.
>

 I will change the parser accordingly. On a related note, ALTER TABLE
 SORT BY () will leave an empty property 'sort.by.columns'. Should it remove
 the property altogether instead?

>
>
>>
>>
>> Dimitris
>>
>> On Mon, May 8, 2017 at 7:50 AM, Marcel Kornacker > > wrote:
>>
>>>
>>>
>>> On Sat, May 6, 2017 at 7:57 PM, Dimitris Tsirogiannis <
>>> dtsirogian...@cloudera.com> wrote:
>>>
 For consistency, I believe we should at least allow an empty SORT
 BY() clause in the CREATE TABLE statement, but I'll defer the decision 
 to
 Alex or Marcel.

>>>
>>> Do we think there'll be scripts generating create table statements
>>> with empty sort-by clauses? I'm not sure there's a benefit to supporting
>>> that (but happy to stand corrected).
>>>
>>>

 Dimitris

 On Sat, May 6, 2017 at 8:16 AM, Lars Volker (Code Review) <
 ger...@cloudera.org> wrote:

> Lars Volker has posted comments on this change.
>
> Change subject: IMPALA-4166: Add SORT BY sql clause
> 
> ..
>
>
> Patch Set 20:
>
> > There is still one thing that is not clear to me. Why is it
> allowed
>  > to do an ALTER TABLE with an empty SORT BY clause but not a
> CREATE
>  > TABLE?
>
> The easiest way to specify an empty list of sort by columns during
> CREATE TABLE is to omit the SORT BY clause altogether. To keep things
> simple I did not add additional support for an empty SORT BY () 
> clause.
>
> For ALTER TABLE there needs to be a way to specify an empty list
> of columns, but since SORT BY is used to identify the command, the 
> most
> simple form seemed to be an empty column list().
>
> Do you think we should allow CREATE TABLE SORT BY() in addition to
> specify an empty set of column?
>
> --
> To view, visit http://gerrit.cloudera.org:8080/6495
> To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
>
> Gerrit-MessageType: comment
> Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
> Gerrit-PatchSet: 20
> Gerrit-Project: Impala-ASF
> Gerrit-Branch: master
> Gerrit-Owner: Lars Volker 
> Gerrit-Reviewer: Alex Behm 
> Gerrit-Reviewer: Dimitris Tsirogiannis  >
> Gerrit-Reviewer: Lars Volker 
> Gerrit-Reviewer: Marcel Kornacker 
> Gerrit-Reviewer: Thomas Tauber-Marshall 

Re: [Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-05-08 Thread Dimitris Tsirogiannis
If you don't populate it for CREATE TABLE SORT BY() then I think you should
remove it altogether.

Dimitris

On Mon, May 8, 2017 at 4:33 PM, Lars Volker  wrote:

>
>
> On Tue, May 9, 2017 at 1:24 AM, Dimitris Tsirogiannis <
> dtsirogian...@cloudera.com> wrote:
>
>> The other alternative would be to populate it with the partitioning
>> columns, if any. Thoughts?
>>
>>
> Adding the partitioning columns to the sort by list is not supported. They
> can be added to the pre-insert sort not using the clustered hint. I'm not
> sure though if I understood your statement correctly.
>
> My question was what to do if a user executes "ALTER TABLE SORT BY();",
> meaning to remove all sort by columns. Should we remove all columns from
> the property, or remove the property altogether?
>
>
>> Dimitris
>>
>> On Mon, May 8, 2017 at 4:02 PM, Lars Volker  wrote:
>>
>>>
>>> On Mon, May 8, 2017 at 11:41 PM, Marcel Kornacker 
>>> wrote:
>>>


 On Mon, May 8, 2017 at 9:50 AM, Dimitris Tsirogiannis <
 dtsirogian...@cloudera.com> wrote:

> I believe it sends the wrong message and is probably confusing to
> throw an error when someone writes a CREATE TABLE with an empty SORT BY()
> but allow the same clause in an ALTER. No doubt the users can read the
> documentation and figure it out but its an extra step. Also, as Marcel
> mentions, scripting may be easier as you don't have to figure out whether
> to add a clause or not. Anyways, the patch is great as is, I just wanted 
> to
> mention this.
>

 Sounds like we should allow an empty list then.

>>>
>>> I will change the parser accordingly. On a related note, ALTER TABLE
>>> SORT BY () will leave an empty property 'sort.by.columns'. Should it remove
>>> the property altogether instead?
>>>


>
>
> Dimitris
>
> On Mon, May 8, 2017 at 7:50 AM, Marcel Kornacker 
> wrote:
>
>>
>>
>> On Sat, May 6, 2017 at 7:57 PM, Dimitris Tsirogiannis <
>> dtsirogian...@cloudera.com> wrote:
>>
>>> For consistency, I believe we should at least allow an empty SORT
>>> BY() clause in the CREATE TABLE statement, but I'll defer the decision 
>>> to
>>> Alex or Marcel.
>>>
>>
>> Do we think there'll be scripts generating create table statements
>> with empty sort-by clauses? I'm not sure there's a benefit to supporting
>> that (but happy to stand corrected).
>>
>>
>>>
>>> Dimitris
>>>
>>> On Sat, May 6, 2017 at 8:16 AM, Lars Volker (Code Review) <
>>> ger...@cloudera.org> wrote:
>>>
 Lars Volker has posted comments on this change.

 Change subject: IMPALA-4166: Add SORT BY sql clause
 
 ..


 Patch Set 20:

 > There is still one thing that is not clear to me. Why is it
 allowed
  > to do an ALTER TABLE with an empty SORT BY clause but not a
 CREATE
  > TABLE?

 The easiest way to specify an empty list of sort by columns during
 CREATE TABLE is to omit the SORT BY clause altogether. To keep things
 simple I did not add additional support for an empty SORT BY () clause.

 For ALTER TABLE there needs to be a way to specify an empty list of
 columns, but since SORT BY is used to identify the command, the most 
 simple
 form seemed to be an empty column list().

 Do you think we should allow CREATE TABLE SORT BY() in addition to
 specify an empty set of column?

 --
 To view, visit http://gerrit.cloudera.org:8080/6495
 To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

 Gerrit-MessageType: comment
 Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
 Gerrit-PatchSet: 20
 Gerrit-Project: Impala-ASF
 Gerrit-Branch: master
 Gerrit-Owner: Lars Volker 
 Gerrit-Reviewer: Alex Behm 
 Gerrit-Reviewer: Dimitris Tsirogiannis 
 Gerrit-Reviewer: Lars Volker 
 Gerrit-Reviewer: Marcel Kornacker 
 Gerrit-Reviewer: Thomas Tauber-Marshall 
 Gerrit-HasComments: No

>>>
>>>
>>
>

>>>
>>
>


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-05-08 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 21:

Rebased, added support for CREATE TABLE SORT BY() with empty sort by list. Will 
need to rebase again though, since some conflicting change got merge in the 
meantime.

-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 21
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-05-08 Thread Lars Volker (Code Review)
Hello Dimitris Tsirogiannis,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6495

to look at the new patch set (#21).

Change subject: IMPALA-4166: Add SORT BY sql clause
..

IMPALA-4166: Add SORT BY sql clause

This change adds support for adding SORT BY (...) clauses to CREATE
TABLE and ALTER TABLE statements. Examples are:

CREATE TABLE t (i INT, j INT, k INT) PARTITIONED BY (l INT) SORT BY (i, j);
CREATE TABLE t SORT BY (int_col,id) LIKE u;
CREATE TABLE t LIKE PARQUET '/foo' SORT BY (id,zip);

ALTER TABLE t SORT BY (int_col,id);
ALTER TABLE t SORT BY ();

SORT BY columns can only be specified for Hdfs tables and effectiveness
may vary based on storage type; for example TEXT tables will not see
improved compression. The SORT BY clause must not contain clustering
columns. The columns in the SORT BY clause are stored in the
'sort.by.columns' table property and will result in an additional SORT
node being added to the plan before the final table sink. Specifying
SORT BY columns also enables clustering during inserts, so the SORT node
will contain all partitioning columns first, followed by the SORT BY
columns. We do this because SORT BY columns add a SORT node to the plan
and adding the clustering columns to the SORT node is cheap.

SORT BY columns supersede the sortby() hint, which we will remove in a
subsequent change (IMPALA-5144). Until then, it is possible to specify
sort by columns using both ways at the same time and the column lists
will be concatenated.

Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
---
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
A fe/src/main/java/org/apache/impala/analysis/AlterTableSortByColumnsStmt.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/Column.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test
A testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test
M testdata/workloads/functional-planner/queries/PlannerTest/insert.test
M testdata/workloads/functional-query/queries/QueryTest/alter-table.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-as-select.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test
M testdata/workloads/functional-query/queries/QueryTest/create-table.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
33 files changed, 1,379 insertions(+), 89 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/6495/21
-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 21
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Thomas Tauber-Marshall 


Re: [Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-05-08 Thread Lars Volker
On Tue, May 9, 2017 at 1:24 AM, Dimitris Tsirogiannis <
dtsirogian...@cloudera.com> wrote:

> The other alternative would be to populate it with the partitioning
> columns, if any. Thoughts?
>
>
Adding the partitioning columns to the sort by list is not supported. They
can be added to the pre-insert sort not using the clustered hint. I'm not
sure though if I understood your statement correctly.

My question was what to do if a user executes "ALTER TABLE SORT BY();",
meaning to remove all sort by columns. Should we remove all columns from
the property, or remove the property altogether?


> Dimitris
>
> On Mon, May 8, 2017 at 4:02 PM, Lars Volker  wrote:
>
>>
>> On Mon, May 8, 2017 at 11:41 PM, Marcel Kornacker 
>> wrote:
>>
>>>
>>>
>>> On Mon, May 8, 2017 at 9:50 AM, Dimitris Tsirogiannis <
>>> dtsirogian...@cloudera.com> wrote:
>>>
 I believe it sends the wrong message and is probably confusing to throw
 an error when someone writes a CREATE TABLE with an empty SORT BY() but
 allow the same clause in an ALTER. No doubt the users can read the
 documentation and figure it out but its an extra step. Also, as Marcel
 mentions, scripting may be easier as you don't have to figure out whether
 to add a clause or not. Anyways, the patch is great as is, I just wanted to
 mention this.

>>>
>>> Sounds like we should allow an empty list then.
>>>
>>
>> I will change the parser accordingly. On a related note, ALTER TABLE SORT
>> BY () will leave an empty property 'sort.by.columns'. Should it remove the
>> property altogether instead?
>>
>>>
>>>


 Dimitris

 On Mon, May 8, 2017 at 7:50 AM, Marcel Kornacker 
 wrote:

>
>
> On Sat, May 6, 2017 at 7:57 PM, Dimitris Tsirogiannis <
> dtsirogian...@cloudera.com> wrote:
>
>> For consistency, I believe we should at least allow an empty SORT
>> BY() clause in the CREATE TABLE statement, but I'll defer the decision to
>> Alex or Marcel.
>>
>
> Do we think there'll be scripts generating create table statements
> with empty sort-by clauses? I'm not sure there's a benefit to supporting
> that (but happy to stand corrected).
>
>
>>
>> Dimitris
>>
>> On Sat, May 6, 2017 at 8:16 AM, Lars Volker (Code Review) <
>> ger...@cloudera.org> wrote:
>>
>>> Lars Volker has posted comments on this change.
>>>
>>> Change subject: IMPALA-4166: Add SORT BY sql clause
>>> 
>>> ..
>>>
>>>
>>> Patch Set 20:
>>>
>>> > There is still one thing that is not clear to me. Why is it allowed
>>>  > to do an ALTER TABLE with an empty SORT BY clause but not a CREATE
>>>  > TABLE?
>>>
>>> The easiest way to specify an empty list of sort by columns during
>>> CREATE TABLE is to omit the SORT BY clause altogether. To keep things
>>> simple I did not add additional support for an empty SORT BY () clause.
>>>
>>> For ALTER TABLE there needs to be a way to specify an empty list of
>>> columns, but since SORT BY is used to identify the command, the most 
>>> simple
>>> form seemed to be an empty column list().
>>>
>>> Do you think we should allow CREATE TABLE SORT BY() in addition to
>>> specify an empty set of column?
>>>
>>> --
>>> To view, visit http://gerrit.cloudera.org:8080/6495
>>> To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
>>>
>>> Gerrit-MessageType: comment
>>> Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
>>> Gerrit-PatchSet: 20
>>> Gerrit-Project: Impala-ASF
>>> Gerrit-Branch: master
>>> Gerrit-Owner: Lars Volker 
>>> Gerrit-Reviewer: Alex Behm 
>>> Gerrit-Reviewer: Dimitris Tsirogiannis 
>>> Gerrit-Reviewer: Lars Volker 
>>> Gerrit-Reviewer: Marcel Kornacker 
>>> Gerrit-Reviewer: Thomas Tauber-Marshall 
>>> Gerrit-HasComments: No
>>>
>>
>>
>

>>>
>>
>


Re: [Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-05-08 Thread Dimitris Tsirogiannis
The other alternative would be to populate it with the partitioning
columns, if any. Thoughts?

Dimitris

On Mon, May 8, 2017 at 4:02 PM, Lars Volker  wrote:

>
> On Mon, May 8, 2017 at 11:41 PM, Marcel Kornacker 
> wrote:
>
>>
>>
>> On Mon, May 8, 2017 at 9:50 AM, Dimitris Tsirogiannis <
>> dtsirogian...@cloudera.com> wrote:
>>
>>> I believe it sends the wrong message and is probably confusing to throw
>>> an error when someone writes a CREATE TABLE with an empty SORT BY() but
>>> allow the same clause in an ALTER. No doubt the users can read the
>>> documentation and figure it out but its an extra step. Also, as Marcel
>>> mentions, scripting may be easier as you don't have to figure out whether
>>> to add a clause or not. Anyways, the patch is great as is, I just wanted to
>>> mention this.
>>>
>>
>> Sounds like we should allow an empty list then.
>>
>
> I will change the parser accordingly. On a related note, ALTER TABLE SORT
> BY () will leave an empty property 'sort.by.columns'. Should it remove the
> property altogether instead?
>
>>
>>
>>>
>>>
>>> Dimitris
>>>
>>> On Mon, May 8, 2017 at 7:50 AM, Marcel Kornacker 
>>> wrote:
>>>


 On Sat, May 6, 2017 at 7:57 PM, Dimitris Tsirogiannis <
 dtsirogian...@cloudera.com> wrote:

> For consistency, I believe we should at least allow an empty SORT BY()
> clause in the CREATE TABLE statement, but I'll defer the decision to Alex
> or Marcel.
>

 Do we think there'll be scripts generating create table statements with
 empty sort-by clauses? I'm not sure there's a benefit to supporting that
 (but happy to stand corrected).


>
> Dimitris
>
> On Sat, May 6, 2017 at 8:16 AM, Lars Volker (Code Review) <
> ger...@cloudera.org> wrote:
>
>> Lars Volker has posted comments on this change.
>>
>> Change subject: IMPALA-4166: Add SORT BY sql clause
>> 
>> ..
>>
>>
>> Patch Set 20:
>>
>> > There is still one thing that is not clear to me. Why is it allowed
>>  > to do an ALTER TABLE with an empty SORT BY clause but not a CREATE
>>  > TABLE?
>>
>> The easiest way to specify an empty list of sort by columns during
>> CREATE TABLE is to omit the SORT BY clause altogether. To keep things
>> simple I did not add additional support for an empty SORT BY () clause.
>>
>> For ALTER TABLE there needs to be a way to specify an empty list of
>> columns, but since SORT BY is used to identify the command, the most 
>> simple
>> form seemed to be an empty column list().
>>
>> Do you think we should allow CREATE TABLE SORT BY() in addition to
>> specify an empty set of column?
>>
>> --
>> To view, visit http://gerrit.cloudera.org:8080/6495
>> To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
>>
>> Gerrit-MessageType: comment
>> Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
>> Gerrit-PatchSet: 20
>> Gerrit-Project: Impala-ASF
>> Gerrit-Branch: master
>> Gerrit-Owner: Lars Volker 
>> Gerrit-Reviewer: Alex Behm 
>> Gerrit-Reviewer: Dimitris Tsirogiannis 
>> Gerrit-Reviewer: Lars Volker 
>> Gerrit-Reviewer: Marcel Kornacker 
>> Gerrit-Reviewer: Thomas Tauber-Marshall 
>> Gerrit-HasComments: No
>>
>
>

>>>
>>
>


Re: [Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-05-08 Thread Lars Volker
On Mon, May 8, 2017 at 11:41 PM, Marcel Kornacker 
wrote:

>
>
> On Mon, May 8, 2017 at 9:50 AM, Dimitris Tsirogiannis <
> dtsirogian...@cloudera.com> wrote:
>
>> I believe it sends the wrong message and is probably confusing to throw
>> an error when someone writes a CREATE TABLE with an empty SORT BY() but
>> allow the same clause in an ALTER. No doubt the users can read the
>> documentation and figure it out but its an extra step. Also, as Marcel
>> mentions, scripting may be easier as you don't have to figure out whether
>> to add a clause or not. Anyways, the patch is great as is, I just wanted to
>> mention this.
>>
>
> Sounds like we should allow an empty list then.
>

I will change the parser accordingly. On a related note, ALTER TABLE SORT
BY () will leave an empty property 'sort.by.columns'. Should it remove the
property altogether instead?

>
>
>>
>>
>> Dimitris
>>
>> On Mon, May 8, 2017 at 7:50 AM, Marcel Kornacker 
>> wrote:
>>
>>>
>>>
>>> On Sat, May 6, 2017 at 7:57 PM, Dimitris Tsirogiannis <
>>> dtsirogian...@cloudera.com> wrote:
>>>
 For consistency, I believe we should at least allow an empty SORT BY()
 clause in the CREATE TABLE statement, but I'll defer the decision to Alex
 or Marcel.

>>>
>>> Do we think there'll be scripts generating create table statements with
>>> empty sort-by clauses? I'm not sure there's a benefit to supporting that
>>> (but happy to stand corrected).
>>>
>>>

 Dimitris

 On Sat, May 6, 2017 at 8:16 AM, Lars Volker (Code Review) <
 ger...@cloudera.org> wrote:

> Lars Volker has posted comments on this change.
>
> Change subject: IMPALA-4166: Add SORT BY sql clause
> ..
>
>
> Patch Set 20:
>
> > There is still one thing that is not clear to me. Why is it allowed
>  > to do an ALTER TABLE with an empty SORT BY clause but not a CREATE
>  > TABLE?
>
> The easiest way to specify an empty list of sort by columns during
> CREATE TABLE is to omit the SORT BY clause altogether. To keep things
> simple I did not add additional support for an empty SORT BY () clause.
>
> For ALTER TABLE there needs to be a way to specify an empty list of
> columns, but since SORT BY is used to identify the command, the most 
> simple
> form seemed to be an empty column list().
>
> Do you think we should allow CREATE TABLE SORT BY() in addition to
> specify an empty set of column?
>
> --
> To view, visit http://gerrit.cloudera.org:8080/6495
> To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
>
> Gerrit-MessageType: comment
> Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
> Gerrit-PatchSet: 20
> Gerrit-Project: Impala-ASF
> Gerrit-Branch: master
> Gerrit-Owner: Lars Volker 
> Gerrit-Reviewer: Alex Behm 
> Gerrit-Reviewer: Dimitris Tsirogiannis 
> Gerrit-Reviewer: Lars Volker 
> Gerrit-Reviewer: Marcel Kornacker 
> Gerrit-Reviewer: Thomas Tauber-Marshall 
> Gerrit-HasComments: No
>


>>>
>>
>


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-05-08 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 20: Code-Review+1

-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 20
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


Re: [Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-05-08 Thread Dimitris Tsirogiannis
I believe it sends the wrong message and is probably confusing to throw an
error when someone writes a CREATE TABLE with an empty SORT BY() but allow
the same clause in an ALTER. No doubt the users can read the documentation
and figure it out but its an extra step. Also, as Marcel mentions,
scripting may be easier as you don't have to figure out whether to add a
clause or not. Anyways, the patch is great as is, I just wanted to mention
this.

Dimitris

On Mon, May 8, 2017 at 7:50 AM, Marcel Kornacker 
wrote:

>
>
> On Sat, May 6, 2017 at 7:57 PM, Dimitris Tsirogiannis <
> dtsirogian...@cloudera.com> wrote:
>
>> For consistency, I believe we should at least allow an empty SORT BY()
>> clause in the CREATE TABLE statement, but I'll defer the decision to Alex
>> or Marcel.
>>
>
> Do we think there'll be scripts generating create table statements with
> empty sort-by clauses? I'm not sure there's a benefit to supporting that
> (but happy to stand corrected).
>
>
>>
>> Dimitris
>>
>> On Sat, May 6, 2017 at 8:16 AM, Lars Volker (Code Review) <
>> ger...@cloudera.org> wrote:
>>
>>> Lars Volker has posted comments on this change.
>>>
>>> Change subject: IMPALA-4166: Add SORT BY sql clause
>>> ..
>>>
>>>
>>> Patch Set 20:
>>>
>>> > There is still one thing that is not clear to me. Why is it allowed
>>>  > to do an ALTER TABLE with an empty SORT BY clause but not a CREATE
>>>  > TABLE?
>>>
>>> The easiest way to specify an empty list of sort by columns during
>>> CREATE TABLE is to omit the SORT BY clause altogether. To keep things
>>> simple I did not add additional support for an empty SORT BY () clause.
>>>
>>> For ALTER TABLE there needs to be a way to specify an empty list of
>>> columns, but since SORT BY is used to identify the command, the most simple
>>> form seemed to be an empty column list().
>>>
>>> Do you think we should allow CREATE TABLE SORT BY() in addition to
>>> specify an empty set of column?
>>>
>>> --
>>> To view, visit http://gerrit.cloudera.org:8080/6495
>>> To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
>>>
>>> Gerrit-MessageType: comment
>>> Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
>>> Gerrit-PatchSet: 20
>>> Gerrit-Project: Impala-ASF
>>> Gerrit-Branch: master
>>> Gerrit-Owner: Lars Volker 
>>> Gerrit-Reviewer: Alex Behm 
>>> Gerrit-Reviewer: Dimitris Tsirogiannis 
>>> Gerrit-Reviewer: Lars Volker 
>>> Gerrit-Reviewer: Marcel Kornacker 
>>> Gerrit-Reviewer: Thomas Tauber-Marshall 
>>> Gerrit-HasComments: No
>>>
>>
>>
>


Re: [Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-05-07 Thread Lars Volker
Anything else you would like me to change for a +1?

Alex, Marcel, do you prefer to have a look before or after rebasing the
change?

Cheers, Lars

On Sun, May 7, 2017 at 4:57 AM, Dimitris Tsirogiannis <
dtsirogian...@cloudera.com> wrote:

> For consistency, I believe we should at least allow an empty SORT BY()
> clause in the CREATE TABLE statement, but I'll defer the decision to Alex
> or Marcel.
>
> Dimitris
>
> On Sat, May 6, 2017 at 8:16 AM, Lars Volker (Code Review) <
> ger...@cloudera.org> wrote:
>
>> Lars Volker has posted comments on this change.
>>
>> Change subject: IMPALA-4166: Add SORT BY sql clause
>> ..
>>
>>
>> Patch Set 20:
>>
>> > There is still one thing that is not clear to me. Why is it allowed
>>  > to do an ALTER TABLE with an empty SORT BY clause but not a CREATE
>>  > TABLE?
>>
>> The easiest way to specify an empty list of sort by columns during CREATE
>> TABLE is to omit the SORT BY clause altogether. To keep things simple I did
>> not add additional support for an empty SORT BY () clause.
>>
>> For ALTER TABLE there needs to be a way to specify an empty list of
>> columns, but since SORT BY is used to identify the command, the most simple
>> form seemed to be an empty column list().
>>
>> Do you think we should allow CREATE TABLE SORT BY() in addition to
>> specify an empty set of column?
>>
>> --
>> To view, visit http://gerrit.cloudera.org:8080/6495
>> To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
>>
>> Gerrit-MessageType: comment
>> Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
>> Gerrit-PatchSet: 20
>> Gerrit-Project: Impala-ASF
>> Gerrit-Branch: master
>> Gerrit-Owner: Lars Volker 
>> Gerrit-Reviewer: Alex Behm 
>> Gerrit-Reviewer: Dimitris Tsirogiannis 
>> Gerrit-Reviewer: Lars Volker 
>> Gerrit-Reviewer: Marcel Kornacker 
>> Gerrit-Reviewer: Thomas Tauber-Marshall 
>> Gerrit-HasComments: No
>>
>
>


Re: [Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-05-06 Thread Dimitris Tsirogiannis
For consistency, I believe we should at least allow an empty SORT BY()
clause in the CREATE TABLE statement, but I'll defer the decision to Alex
or Marcel.

Dimitris

On Sat, May 6, 2017 at 8:16 AM, Lars Volker (Code Review) <
ger...@cloudera.org> wrote:

> Lars Volker has posted comments on this change.
>
> Change subject: IMPALA-4166: Add SORT BY sql clause
> ..
>
>
> Patch Set 20:
>
> > There is still one thing that is not clear to me. Why is it allowed
>  > to do an ALTER TABLE with an empty SORT BY clause but not a CREATE
>  > TABLE?
>
> The easiest way to specify an empty list of sort by columns during CREATE
> TABLE is to omit the SORT BY clause altogether. To keep things simple I did
> not add additional support for an empty SORT BY () clause.
>
> For ALTER TABLE there needs to be a way to specify an empty list of
> columns, but since SORT BY is used to identify the command, the most simple
> form seemed to be an empty column list().
>
> Do you think we should allow CREATE TABLE SORT BY() in addition to specify
> an empty set of column?
>
> --
> To view, visit http://gerrit.cloudera.org:8080/6495
> To unsubscribe, visit http://gerrit.cloudera.org:8080/settings
>
> Gerrit-MessageType: comment
> Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
> Gerrit-PatchSet: 20
> Gerrit-Project: Impala-ASF
> Gerrit-Branch: master
> Gerrit-Owner: Lars Volker 
> Gerrit-Reviewer: Alex Behm 
> Gerrit-Reviewer: Dimitris Tsirogiannis 
> Gerrit-Reviewer: Lars Volker 
> Gerrit-Reviewer: Marcel Kornacker 
> Gerrit-Reviewer: Thomas Tauber-Marshall 
> Gerrit-HasComments: No
>


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-05-06 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 20:

> There is still one thing that is not clear to me. Why is it allowed
 > to do an ALTER TABLE with an empty SORT BY clause but not a CREATE
 > TABLE?

The easiest way to specify an empty list of sort by columns during CREATE TABLE 
is to omit the SORT BY clause altogether. To keep things simple I did not add 
additional support for an empty SORT BY () clause.

For ALTER TABLE there needs to be a way to specify an empty list of columns, 
but since SORT BY is used to identify the command, the most simple form seemed 
to be an empty column list().

Do you think we should allow CREATE TABLE SORT BY() in addition to specify an 
empty set of column?

-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 20
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-05-05 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 20:

There is still one thing that is not clear to me. Why is it allowed to do an 
ALTER TABLE with an empty SORT BY clause but not a CREATE TABLE?

-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 20
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-05-05 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 20:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6495/18/testdata/workloads/functional-planner/queries/PlannerTest/ddl.test
File testdata/workloads/functional-planner/queries/PlannerTest/ddl.test:

PS18, Line 183: # CTAS with partitions and sort by columns
> It's a parser error in create but allowed in alter (see ParserTest.java:227
Oh, I understand :). Yes, dropping sort by columns should result in an empty 
list in the same way that alter table sort by() does - with the exception that 
the latter does not drop the columns from the table.

I added 2 additional tests to alter-table.test. Can you have a look there?


-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 20
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-05-05 Thread Lars Volker (Code Review)
Hello Dimitris Tsirogiannis,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6495

to look at the new patch set (#20).

Change subject: IMPALA-4166: Add SORT BY sql clause
..

IMPALA-4166: Add SORT BY sql clause

This change adds support for adding SORT BY (...) clauses to CREATE
TABLE and ALTER TABLE statements. Examples are:

CREATE TABLE t (i INT, j INT, k INT) PARTITIONED BY (l INT) SORT BY (i, j);
CREATE TABLE t SORT BY (int_col,id) LIKE u;
CREATE TABLE t LIKE PARQUET '/foo' SORT BY (id,zip);

ALTER TABLE t SORT BY (int_col,id);
ALTER TABLE t SORT BY ();

SORT BY columns can only be specified for Hdfs tables and effectiveness
may vary based on storage type; for example TEXT tables will not see
improved compression. The SORT BY clause must not contain clustering
columns. The columns in the SORT BY clause are stored in the
'sort.by.columns' table property and will result in an additional SORT
node being added to the plan before the final table sink. Specifying
SORT BY columns also enables clustering during inserts, so the SORT node
will contain all partitioning columns first, followed by the SORT BY
columns. We do this because SORT BY columns add a SORT node to the plan
and adding the clustering columns to the SORT node is cheap.

SORT BY columns supersede the sortby() hint, which we will remove in a
subsequent change (IMPALA-5144). Until then, it is possible to specify
sort by columns using both ways at the same time and the column lists
will be concatenated.

Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
---
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
A fe/src/main/java/org/apache/impala/analysis/AlterTableSortByColumnsStmt.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/Column.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test
A testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test
M testdata/workloads/functional-planner/queries/PlannerTest/insert.test
M testdata/workloads/functional-query/queries/QueryTest/alter-table.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-as-select.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test
M testdata/workloads/functional-query/queries/QueryTest/create-table.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
33 files changed, 1,371 insertions(+), 89 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/6495/20
-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 20
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-05-05 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 18:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6495/18/testdata/workloads/functional-planner/queries/PlannerTest/ddl.test
File testdata/workloads/functional-planner/queries/PlannerTest/ddl.test:

PS18, Line 183: # CTAS with partitions and sort by columns
> I'm not sure I'm following here. sort by() cannot have an empty column list
It's a parser error in create but allowed in alter (see ParserTest.java:2271). 
So, there is some inconsistency here.


-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 18
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-05-05 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 17:

(3 comments)

Thanks for the review. Please see PS19 and my inline comments.

http://gerrit.cloudera.org:8080/#/c/6495/17/fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
File 
fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java:

PS17, Line 183: // ALTER TABLE SET is not supported on HBase tables altogether.
  : Preconditions.checkState(!(table instanceof HBaseTable));
> Sorry, I was misled by the comment above it and by the fact that I didn't s
For the SORT BY, there's a check in AlterTableSortByColumnsStmt::analyze().

For ALTER TABLE SET, there's a check in AlterTableSetStmt::analyze().

I added a reference to that in the comment. Please let me know if I should 
describe it in more detail in the comment.


http://gerrit.cloudera.org:8080/#/c/6495/18/testdata/workloads/functional-planner/queries/PlannerTest/ddl.test
File testdata/workloads/functional-planner/queries/PlannerTest/ddl.test:

PS18, Line 183: # CTAS with partitions and sort by columns
> If you don't have one already, can you add a test for a partitioned table w
I'm not sure I'm following here. sort by() cannot have an empty column list. 
It's a parser error and tested in ParserTest.java:2365. Do you have a 
particular corner case in mind that you'd like to test here?

That being said, I'm not sure your second suggestion would work. I added a test 
to the bottom of alter-table.test to check that changing the sort by columns 
works on a partitioned table.


http://gerrit.cloudera.org:8080/#/c/6495/18/testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test:

PS18, Line 327: 
> Thanks for adding this test. Can you also add another one in which theres i
I added a test. The columns get concatenated as described in the commit message.


-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 17
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-05-05 Thread Lars Volker (Code Review)
Hello Dimitris Tsirogiannis,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6495

to look at the new patch set (#19).

Change subject: IMPALA-4166: Add SORT BY sql clause
..

IMPALA-4166: Add SORT BY sql clause

This change adds support for adding SORT BY (...) clauses to CREATE
TABLE and ALTER TABLE statements. Examples are:

CREATE TABLE t (i INT, j INT, k INT) PARTITIONED BY (l INT) SORT BY (i, j);
CREATE TABLE t SORT BY (int_col,id) LIKE u;
CREATE TABLE t LIKE PARQUET '/foo' SORT BY (id,zip);

ALTER TABLE t SORT BY (int_col,id);
ALTER TABLE t SORT BY ();

SORT BY columns can only be specified for Hdfs tables and effectiveness
may vary based on storage type; for example TEXT tables will not see
improved compression. The SORT BY clause must not contain clustering
columns. The columns in the SORT BY clause are stored in the
'sort.by.columns' table property and will result in an additional SORT
node being added to the plan before the final table sink. Specifying
SORT BY columns also enables clustering during inserts, so the SORT node
will contain all partitioning columns first, followed by the SORT BY
columns. We do this because SORT BY columns add a SORT node to the plan
and adding the clustering columns to the SORT node is cheap.

SORT BY columns supersede the sortby() hint, which we will remove in a
subsequent change (IMPALA-5144). Until then, it is possible to specify
sort by columns using both ways at the same time and the column lists
will be concatenated.

Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
---
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
A fe/src/main/java/org/apache/impala/analysis/AlterTableSortByColumnsStmt.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/Column.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test
A testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test
M testdata/workloads/functional-planner/queries/PlannerTest/insert.test
M testdata/workloads/functional-query/queries/QueryTest/alter-table.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-as-select.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test
M testdata/workloads/functional-query/queries/QueryTest/create-table.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
33 files changed, 1,353 insertions(+), 89 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/6495/19
-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 19
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-05-04 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 18:

(2 comments)

Thanks Lars. Some minor comments on testing.

http://gerrit.cloudera.org:8080/#/c/6495/18/testdata/workloads/functional-planner/queries/PlannerTest/ddl.test
File testdata/workloads/functional-planner/queries/PlannerTest/ddl.test:

PS18, Line 183: # CTAS with partitions and sort by columns
If you don't have one already, can you add a test for a partitioned table with 
an empty sort by clause (e.g. partitioned by (year) sort by ()). Also, you may 
want to check the following behavior:
1. create partitioned table with non-empty sort by clause
2. drop all the columns that appear in the sort by clause
3. check if this is equivalent to sort by ().


http://gerrit.cloudera.org:8080/#/c/6495/18/testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test:

PS18, Line 327: # Test that using both the sortby hint and sort by columns work.
Thanks for adding this test. Can you also add another one in which theres is 
some overlap between the columns of the sort by hint and the columns of the 
sort by clause?


-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 18
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-05-04 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 17:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6495/17/fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
File 
fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java:

PS17, Line 183: // ALTER TABLE SET is not supported on HBase tables altogether.
  : Preconditions.checkState(!(table instanceof HBaseTable));
> I though your comment on PS15 asked me to add it, but I may have misunderst
Sorry, I was misled by the comment above it and by the fact that I didn't see a 
check for HBaseTable in the analyze() function. If we want to be consistent 
with your comment, we should have a check in analyze() but I am wondering if 
there is a check somewhere else.


-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 17
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-05-04 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 17:

(7 comments)

Thank you for your review. Please see PS 18.

http://gerrit.cloudera.org:8080/#/c/6495/17//COMMIT_MSG
Commit Message:

PS17, Line 30: SORT BY columns supersede the sortby() hint, which we will 
remove in a
 : subsequent change (IMPALA-5144).
> Mention what happens if both are used (e.g. union the columns or something 
Done


http://gerrit.cloudera.org:8080/#/c/6495/17/fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
File 
fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java:

PS17, Line 152: Returns a list of positions of
  :* the sort by columns within the table's list of columns
> This function doesn't return anything.
Done


PS17, Line 183: // ALTER TABLE SET is not supported on HBase tables altogether.
  : Preconditions.checkState(!(table instanceof HBaseTable));
> This check doesn't belong here. Maybe move it to the analyze() function?
I though your comment on PS15 asked me to add it, but I may have misunderstood. 
What did you mean?


http://gerrit.cloudera.org:8080/#/c/6495/17/fe/src/main/java/org/apache/impala/analysis/AlterTableSortByColumnsStmt.java
File 
fe/src/main/java/org/apache/impala/analysis/AlterTableSortByColumnsStmt.java:

PS17, Line 61: HashMap
> nit: Map
Done


PS17, Line 87: a
> nit: an
Done


http://gerrit.cloudera.org:8080/#/c/6495/17/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java:

PS17, Line 138:  
> Mention what happens if both are used (i.e. sort.by.column and sortby() hin
Done


http://gerrit.cloudera.org:8080/#/c/6495/17/testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test:

PS17, Line 1: #
> Add one or more tests where the sortby hint is used in conjunction with sor
I added a test at the bottom of the file.


-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 17
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-05-04 Thread Lars Volker (Code Review)
Hello Dimitris Tsirogiannis,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6495

to look at the new patch set (#18).

Change subject: IMPALA-4166: Add SORT BY sql clause
..

IMPALA-4166: Add SORT BY sql clause

This change adds support for adding SORT BY (...) clauses to CREATE
TABLE and ALTER TABLE statements. Examples are:

CREATE TABLE t (i INT, j INT, k INT) PARTITIONED BY (l INT) SORT BY (i, j);
CREATE TABLE t SORT BY (int_col,id) LIKE u;
CREATE TABLE t LIKE PARQUET '/foo' SORT BY (id,zip);

ALTER TABLE t SORT BY (int_col,id);
ALTER TABLE t SORT BY ();

SORT BY columns can only be specified for Hdfs tables and effectiveness
may vary based on storage type; for example TEXT tables will not see
improved compression. The SORT BY clause must not contain clustering
columns. The columns in the SORT BY clause are stored in the
'sort.by.columns' table property and will result in an additional SORT
node being added to the plan before the final table sink. Specifying
SORT BY columns also enables clustering during inserts, so the SORT node
will contain all partitioning columns first, followed by the SORT BY
columns. We do this because SORT BY columns add a SORT node to the plan
and adding the clustering columns to the SORT node is cheap.

SORT BY columns supersede the sortby() hint, which we will remove in a
subsequent change (IMPALA-5144). Until then, it is possible to specify
sort by columns using both ways at the same time and the column lists
will be concatenated.

Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
---
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
A fe/src/main/java/org/apache/impala/analysis/AlterTableSortByColumnsStmt.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/Column.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test
A testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test
M testdata/workloads/functional-planner/queries/PlannerTest/insert.test
M testdata/workloads/functional-query/queries/QueryTest/alter-table.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-as-select.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test
M testdata/workloads/functional-query/queries/QueryTest/create-table.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
33 files changed, 1,314 insertions(+), 89 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/6495/18
-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 18
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-05-04 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 17:

(7 comments)

http://gerrit.cloudera.org:8080/#/c/6495/17//COMMIT_MSG
Commit Message:

PS17, Line 30: SORT BY columns supersede the sortby() hint, which we will 
remove in a
 : subsequent change (IMPALA-5144).
Mention what happens if both are used (e.g. union the columns or something else)


http://gerrit.cloudera.org:8080/#/c/6495/17/fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
File 
fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java:

PS17, Line 152: Returns a list of positions of
  :* the sort by columns within the table's list of columns
This function doesn't return anything.


PS17, Line 183: // ALTER TABLE SET is not supported on HBase tables altogether.
  : Preconditions.checkState(!(table instanceof HBaseTable));
This check doesn't belong here. Maybe move it to the analyze() function?


http://gerrit.cloudera.org:8080/#/c/6495/17/fe/src/main/java/org/apache/impala/analysis/AlterTableSortByColumnsStmt.java
File 
fe/src/main/java/org/apache/impala/analysis/AlterTableSortByColumnsStmt.java:

PS17, Line 61: HashMap
nit: Map


PS17, Line 87: a
nit: an


http://gerrit.cloudera.org:8080/#/c/6495/17/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java:

PS17, Line 138:  
Mention what happens if both are used (i.e. sort.by.column and sortby() hint).


http://gerrit.cloudera.org:8080/#/c/6495/17/testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test:

PS17, Line 1: #
Add one or more tests where the sortby hint is used in conjunction with sort by 
clause, if you don't already have one.


-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 17
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-05-03 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 17:

I think the interaction between the hint and clause confused me at some point. 
I'll take another look at the new patch and see if removing the hint will make 
things simpler.

-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 17
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-05-03 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 17:

> Quick question. Is there a reason why we don't remove the sortby
 > hint in this patch?

I thought it would be easier to review in a subsequent change, since adding the 
table property and removing the hint seem mostly orthogonal and this change 
already has over 1k lines. Would you prefer to remove the hint in the same 
change? I don't feel strongly about it.

-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 17
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-05-03 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 15:

Quick question. Is there a reason why we don't remove the sortby hint in this 
patch?

-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 15
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-05-03 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 17:

(19 comments)

Thank you for your review! Please see my inline comments and PS 17.

http://gerrit.cloudera.org:8080/#/c/6495/15/fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
File 
fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java:

PS15, Line 168: 
> typo: commas
Done


PS15, Line 169:  '
> remove
Done


PS15, Line 170: f column names separated by commas, 
> nit: this function
Done


PS15, Line 172: during the an
> You need to comment on the return value.
Done


PS15, Line 174:  columns.
> long line
Done


PS15, Line 178: if (!tblProperties.containsKey(
> Whenever I see this, I keep wondering about HBase. Can you add a preconditi
Done


http://gerrit.cloudera.org:8080/#/c/6495/15/fe/src/main/java/org/apache/impala/analysis/AlterTableSortByColumnsStmt.java
File 
fe/src/main/java/org/apache/impala/analysis/AlterTableSortByColumnsStmt.java:

PS15, Line 44: public static final String TBL_PROP_SORT_BY_COLUMNS = 
"sort.by.columns";
> Maybe it makes more sense to put this in AlterTableSetTblProperiesStmt. Not
After moving this back and forth, my reasoning was that this is the more 
natural way to change the underlying table property, and therefore I left it 
here. In a similar fashion, the property name for 'skip.header.line.count' is 
defined in HdfsTable.java. I also don't have a strong preference. Let's wait 
what the next reviewer says then. :)


PS15, Line 93: TableDef.analyzeSortByColumns(sor
> What else could it be? You can just do Preconditions.checkState(table insta
Done


http://gerrit.cloudera.org:8080/#/c/6495/15/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java:

PS15, Line 768: if (!(table_ instanceof HdfsTable)) return;
  : 
> Why not if (!(table_ instanceof HdfsTable)) return; instead?
Done


PS15, Line 843:   // We need to make a copy to make the sortByColumns_ list 
mutable.
  : // TODO: Remove this when removing the sortby() hint 
(IMPALA-5157).
  : sortByColumns_ = Lists.newArrayList(sortByColumns_);
  :   }
  :   for (int i = 0; i < columns.size(); ++i) {
  : if (columns.get(i).getName().equals(columnName)) {
  :   sortByExprs_.add(resultExprs_.get(i));
  :   sortByColumns_.add(i);
  :   foundColumn = true;
  :   break;
  : }
  :   }
  :   i
> Not sure I follow what is happening here. If you already have sortByColumns
This code is required for the sortby() hint, and will be removed together with 
the hint. Should I try and explain this more in the comment above?


http://gerrit.cloudera.org:8080/#/c/6495/15/fe/src/main/java/org/apache/impala/analysis/TableDef.java
File fe/src/main/java/org/apache/impala/analysis/TableDef.java:

PS15, Line 284: List I like using interfaces but is this ever called with anything other than a 
No. I think it was used with the result of a split() in an earlier patchset, 
but now is only used for Lists.


PS15, Line 350: if (options_.sortByCols == null) r
> if (options_.sortByCols == null) return;
Done


http://gerrit.cloudera.org:8080/#/c/6495/15/fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
File fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java:

PS15, Line 77: S
> nit: extra space
Done


PS15, Line 297: if (sortByColsSql != null) {
  :   sb.append(String.format("SORT BY (\n  %s\n)\n",
  :   Joiner.on(", \n  ").join(sortByColsSql)));
  : }
> haha, I can't visualize the output here.
I re-used the code from L288. Do you have a suggestion how it can be made more 
clear?


http://gerrit.cloudera.org:8080/#/c/6495/15/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

PS15, Line 456: he/she adde
> nit: added
I think both ways are fine, adding the columns first, or calling this method 
first. Changed it to added though, since that seems to reflect the more common 
case better.


http://gerrit.cloudera.org:8080/#/c/6495/15/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

PS15, Line 1881: String oldColumns = msTbl.getParameters().get(sortByKey);
   : String alteredColumns = MetaStoreUtil.intersectCsvLis
> I find it hard to parse this. Can you split it into two statements? Also, e
Done


PS15, Line 1915: ring sortByKey = 
AlterTableSortByColumnsStmt.TBL_PROP_SORT_BY_COLUMNS;
   : if (msTbl.getParameters().containsKey(sortByKey)) {
> same here
Done


PS15, Line 21

[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-05-03 Thread Lars Volker (Code Review)
Hello Dimitris Tsirogiannis,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6495

to look at the new patch set (#17).

Change subject: IMPALA-4166: Add SORT BY sql clause
..

IMPALA-4166: Add SORT BY sql clause

This change adds support for adding SORT BY (...) clauses to CREATE
TABLE and ALTER TABLE statements. Examples are:

CREATE TABLE t (i INT, j INT, k INT) PARTITIONED BY (l INT) SORT BY (i, j);
CREATE TABLE t SORT BY (int_col,id) LIKE u;
CREATE TABLE t LIKE PARQUET '/foo' SORT BY (id,zip);

ALTER TABLE t SORT BY (int_col,id);
ALTER TABLE t SORT BY ();

SORT BY columns can only be specified for Hdfs tables and effectiveness
may vary based on storage type; for example TEXT tables will not see
improved compression. The SORT BY clause must not contain clustering
columns. The columns in the SORT BY clause are stored in the
'sort.by.columns' table property and will result in an additional SORT
node being added to the plan before the final table sink. Specifying
SORT BY columns also enables clustering during inserts, so the SORT node
will contain all partitioning columns first, followed by the SORT BY
columns. We do this because SORT BY columns add a SORT node to the plan
and adding the clustering columns to the SORT node is cheap.

SORT BY columns supersede the sortby() hint, which we will remove in a
subsequent change (IMPALA-5144).

Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
---
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
A fe/src/main/java/org/apache/impala/analysis/AlterTableSortByColumnsStmt.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/Column.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test
A testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test
M testdata/workloads/functional-planner/queries/PlannerTest/insert.test
M testdata/workloads/functional-query/queries/QueryTest/alter-table.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-as-select.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test
M testdata/workloads/functional-query/queries/QueryTest/create-table.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
33 files changed, 1,287 insertions(+), 90 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/6495/17
-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 17
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-05-03 Thread Lars Volker (Code Review)
Hello Dimitris Tsirogiannis,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6495

to look at the new patch set (#16).

Change subject: IMPALA-4166: Add SORT BY sql clause
..

IMPALA-4166: Add SORT BY sql clause

This change adds support for adding SORT BY (...) clauses to CREATE
TABLE and ALTER TABLE statements. Examples are:

CREATE TABLE t (i INT, j INT, k INT) PARTITIONED BY (l INT) SORT BY (i, j);
CREATE TABLE t SORT BY (int_col,id) LIKE u;
CREATE TABLE t LIKE PARQUET '/foo' SORT BY (id,zip);

ALTER TABLE t SORT BY (int_col,id);
ALTER TABLE t SORT BY ();

SORT BY columns can only be specified for Hdfs tables and effectiveness
may vary based on storage type; for example TEXT tables will not see
improved compression. The SORT BY clause must not contain clustering
columns. The columns in the SORT BY clause are stored in the
'sort.by.columns' table property and will result in an additional SORT
node being added to the plan before the final table sink. Specifying
SORT BY columns also enables clustering during inserts, so the SORT node
will contain all partitioning columns first, followed by the SORT BY
columns. We do this because SORT BY columns add a SORT node to the plan
and adding the clustering columns to the SORT node is cheap.

SORT BY columns supersede the sortby() hint, which we will remove in a
subsequent change (IMPALA-5144).

Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
---
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
A fe/src/main/java/org/apache/impala/analysis/AlterTableSortByColumnsStmt.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/Column.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test
A testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test
M testdata/workloads/functional-planner/queries/PlannerTest/insert.test
M testdata/workloads/functional-query/queries/QueryTest/alter-table.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-as-select.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test
M testdata/workloads/functional-query/queries/QueryTest/create-table.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
33 files changed, 1,287 insertions(+), 90 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/6495/16
-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 16
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-04-17 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 15:

(19 comments)

http://gerrit.cloudera.org:8080/#/c/6495/15/fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
File 
fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java:

PS15, Line 168: commans
typo: commas


PS15, Line 169: of
remove


PS15, Line 170: this will throw an AnalysisException
nit: this function


PS15, Line 172: List
You need to comment on the return value.


PS15, Line 174: if 
(!tblProperties.containsKey(AlterTableSortByColumnsStmt.TBL_PROP_SORT_BY_COLUMNS))
 {
long line


PS15, Line 178: if (table instanceof KuduTable) {
Whenever I see this, I keep wondering about HBase. Can you add a precondition 
check or a similar check? whatever is applicable.


http://gerrit.cloudera.org:8080/#/c/6495/15/fe/src/main/java/org/apache/impala/analysis/AlterTableSortByColumnsStmt.java
File 
fe/src/main/java/org/apache/impala/analysis/AlterTableSortByColumnsStmt.java:

PS15, Line 44: public static final String TBL_PROP_SORT_BY_COLUMNS = 
"sort.by.columns";
Maybe it makes more sense to put this in AlterTableSetTblProperiesStmt. Not a 
strong preference though.


PS15, Line 93: if (table instanceof HdfsTable) {
What else could it be? You can just do Preconditions.checkState(table 
instanceof HdfsTable); and then cast and remove the if statement.


http://gerrit.cloudera.org:8080/#/c/6495/15/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java:

PS15, Line 768: if (table_ instanceof HBaseTable) return;
  : if (table_ instanceof KuduTable) return;
Why not if (!(table_ instanceof HdfsTable)) return; instead?


PS15, Line 843: if (!columns.isEmpty()) {
  : // We need to make a copy to make the sortByColumns_ 
list mutable.
  : // TODO: Remove this when removing the sortby() hint 
(IMPALA-5157).
  : sortByColumns_ = Lists.newArrayList(sortByColumns_);
  :   }
  :   for (int i = 0; i < columns.size(); ++i) {
  : if (columns.get(i).getName().equals(columnName)) {
  :   sortByExprs_.add(resultExprs_.get(i));
  :   sortByColumns_.add(i);
  :   foundColumn = true;
  :   break;
  : }
  :   }
Not sure I follow what is happening here. If you already have sortByColumns_ do 
you expand them based on the hint or something else? Do we have tests for that?


http://gerrit.cloudera.org:8080/#/c/6495/15/fe/src/main/java/org/apache/impala/analysis/TableDef.java
File fe/src/main/java/org/apache/impala/analysis/TableDef.java:

PS15, Line 284: Iterable
I like using interfaces but is this ever called with anything other than a List?


PS15, Line 350: if (options_.sortByCols != null) {
if (options_.sortByCols == null) return;


http://gerrit.cloudera.org:8080/#/c/6495/15/fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
File fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java:

PS15, Line 77:  
nit: extra space


PS15, Line 297: if (sortByColsSql != null) {
  :   sb.append(String.format("SORT BY (\n  %s\n)\n",
  :   Joiner.on(", \n  ").join(sortByColsSql)));
  : }
haha, I can't visualize the output here.


http://gerrit.cloudera.org:8080/#/c/6495/15/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

PS15, Line 456: he/she adds
nit: added


http://gerrit.cloudera.org:8080/#/c/6495/15/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

PS15, Line 1881: msTbl.getParameters().put(sortByKey, 
MetaStoreUtil.intersectCsvListWithColumNames(
   : msTbl.getParameters().get( sortByKey), columns));
I find it hard to parse this. Can you split it into two statements? Also, extra 
space after 'get('.


PS15, Line 1915: msTbl.getParameters().put(sortByKey, 
MetaStoreUtil.replaceValueInCsvList(
   :   msTbl.getParameters().get( sortByKey), colName, 
newCol.getColumnName()));
same here


PS15, Line 2198: msTbl.getParameters().put(sortByKey, 
MetaStoreUtil.removeValueFromCsvList(
   :   msTbl.getParameters().get( sortByKey), colName));
and here


http://gerrit.cloudera.org:8080/#/c/6495/15/fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java
File fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java:

PS15, Line 218: HashSet rightColNames = Sets.newHashSet();
  : for (TColumn c : rightCols) 
rightColNames.add(c.getColumnName().toLowerCase());
  : List outputList = Lists.newAr

[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-04-17 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 15:

Doing another pass now...

-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 15
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-04-11 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 14:

Please have a look at PS15. It removes support for Kudu tables, and moves the 
analysis code from the TablePropertyAnalyzer to where it is used and/or where 
it semantically seems to belong better.

-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 14
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-04-11 Thread Lars Volker (Code Review)
Hello Dimitris Tsirogiannis,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6495

to look at the new patch set (#15).

Change subject: IMPALA-4166: Add SORT BY sql clause
..

IMPALA-4166: Add SORT BY sql clause

This change adds support for adding SORT BY (...) clauses to CREATE
TABLE and ALTER TABLE statements. Examples are:

CREATE TABLE t (i INT, j INT, k INT) PARTITIONED BY (l INT) SORT BY (i, j);
CREATE TABLE t SORT BY (int_col,id) LIKE u;
CREATE TABLE t LIKE PARQUET '/foo' SORT BY (id,zip);

ALTER TABLE t SORT BY (int_col,id);
ALTER TABLE t SORT BY ();

SORT BY columns can only be specified for Hdfs tables and effectiveness
may vary based on storage type; for example TEXT tables will not see
improved compression. The SORT BY clause must not contain clustering
columns. The columns in the SORT BY clause are stored in the
'sort.by.columns' table property and will result in an additional SORT
node being added to the plan before the final table sink. Specifying
SORT BY columns also enables clustering during inserts, so the SORT node
will contain all partitioning columns first, followed by the SORT BY
columns. We do this because SORT BY columns add a SORT node to the plan
and adding the clustering columns to the SORT node is cheap.

SORT BY columns supersede the sortby() hint, which we will remove in a
subsequent change (IMPALA-5144).

Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
---
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
A fe/src/main/java/org/apache/impala/analysis/AlterTableSortByColumnsStmt.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/Column.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test
A testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test
M testdata/workloads/functional-planner/queries/PlannerTest/insert.test
M testdata/workloads/functional-query/queries/QueryTest/alter-table.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-as-select.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test
M testdata/workloads/functional-query/queries/QueryTest/create-table.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
33 files changed, 1,277 insertions(+), 89 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/6495/15
-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 15
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-04-11 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 9:

(1 comment)

Thank you for your review. Please see PS14 and the email thread I started on 
Kudu tables.

http://gerrit.cloudera.org:8080/#/c/6495/9/fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java
File fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java:

Line 114:   if (!foundColumn) {
> why not apply the drop/rename to the sort-by list?
As discussed in person, storing sorting columns as names seems to make it more 
robust against changes that happen outside of our control, e.g. a user dropping 
a column in Hive.

I added code to handle alter table statements properly and in a single 
statement. We need to decide whether we want to support Kudu tables at all, 
I'll start an email thread for that.


-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-04-11 Thread Lars Volker (Code Review)
Hello Dimitris Tsirogiannis,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6495

to look at the new patch set (#14).

Change subject: IMPALA-4166: Add SORT BY sql clause
..

IMPALA-4166: Add SORT BY sql clause

This change adds support for adding SORT BY (...) clauses to CREATE
TABLE and ALTER TABLE statements. Examples are:

CREATE TABLE t (i INT, j INT, k INT) PARTITIONED BY (l INT) SORT BY (i, j);
CREATE TABLE t (i INT, x INT PRIMARY KEY) PARTITION BY HASH(x)
PARTITIONS 8 SORT BY(i) STORED AS KUDU;
CREATE TABLE t SORT BY (int_col,id) LIKE u;
CREATE TABLE t LIKE PARQUET '/foo' SORT BY (id,zip);

ALTER TABLE t SORT BY (int_col,id);
ALTER TABLE t SORT BY ();

SORT BY columns can be specified for all table types, but effectiveness
may vary based on table type, for example TEXT tables will not see
improved compression. The SORT BY clause must not contain clustering
columns for HDFS tables, and must not contain primary key columns for
Kudu tables. The columns in the SORT BY clause are stored in the
'sort.by.columns' table property and will result in an additional SORT
node being added to the plan before the final table sink. Specifying
SORT BY columns also enables clustering during inserts, so the SORT node
will contain all partitioning columns first, followed by the SORT BY
columns. We do this because SORT BY columns add a SORT node to the plan
and adding the clustering columns to the SORT node is cheap.

SORT BY columns supersede the sortby() hint, which we will remove in a
subsequent change (IMPALA-5144).

This change introduces a new TablePropertyAnalyzer class to collect
various methods used to analyze table properties.

Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
---
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
A fe/src/main/java/org/apache/impala/analysis/AlterTableSortByColumnsStmt.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
A fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/Column.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/java/org/apache/impala/util/MetaStoreUtil.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test
A testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test
M testdata/workloads/functional-planner/queries/PlannerTest/insert.test
M testdata/workloads/functional-query/queries/QueryTest/alter-table.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-as-select.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test
M testdata/workloads/functional-query/queries/QueryTest/create-table.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
34 files changed, 1,332 insertions(+), 85 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/6495/14
-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 14
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-04-10 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6495/9/fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java
File fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java:

Line 114:   if (!foundColumn) {
> Yes, currently that's the case. I think we discussed this in person once, t
why not apply the drop/rename to the sort-by list?

(why is rename an issue? do you not store the sorting columns as integer 
indices into the table column list?)


-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-04-09 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 12:

(5 comments)

Thank you Alex for the review. Please see my inline comments and PS13.

http://gerrit.cloudera.org:8080/#/c/6495/9/fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java
File fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java:

Line 85:*/
Each of the public methods is called at least twice from elsewhere so inlining 
them seems wasteful. For cases where we don't have a table yet (e.g. create 
table statements), I can't think of an interface that's much better than this. 
Should we build lookup tables for the column lists first to address your 
performance considerations? We could also try build an interface and dummy 
table for cases where we don't have a table yet.

Should we discuss this in a short meeting?


Line 114:   }
> Does this mean we need 2 ALTER statements to drop/rename a column?
Yes, currently that's the case. I think we discussed this in person once, too, 
but I don't remember what we decided to do. Do you have a preference? Should 
the drop/rename statement fail if the columns is still referenced in the sort 
by list? Or at least issue a warning?


Line 115:   if (!foundColumn) {
Yes, in AnalyzeDDLStmts.java.


http://gerrit.cloudera.org:8080/#/c/6495/9/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

Line 458:*/
> Preconditions.checkStats(RuntimeEnv.isTestEnv());
Cool, done.


http://gerrit.cloudera.org:8080/#/c/6495/9/testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test:

Line 170: # IMPALA-4166: sort by columns are correct when using a partial 
column permutation.
> What if neither bool_col or int_col are mentioned in the column permutation
No, constant exprs are removed from the sortby list in the Planner, and for an 
empty list we don't create a sort node at all. I added a test to make sure this 
works as intended.


-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-04-09 Thread Lars Volker (Code Review)
Hello Dimitris Tsirogiannis,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6495

to look at the new patch set (#13).

Change subject: IMPALA-4166: Add SORT BY sql clause
..

IMPALA-4166: Add SORT BY sql clause

This change adds support for adding SORT BY (...) clauses to CREATE
TABLE and ALTER TABLE statements. Examples are:

CREATE TABLE t (i INT, j INT, k INT) PARTITIONED BY (l INT) SORT BY (i, j);
CREATE TABLE t (i INT, x INT PRIMARY KEY) PARTITION BY HASH(x)
PARTITIONS 8 SORT BY(i) STORED AS KUDU;
CREATE TABLE t SORT BY (int_col,id) LIKE u;
CREATE TABLE t LIKE PARQUET '/foo' SORT BY (id,zip);

ALTER TABLE t SORT BY (int_col,id);
ALTER TABLE t SORT BY ();

SORT BY columns can be specified for all table types, but effectiveness
may vary based on table type, for example TEXT tables will not see
improved compression. The SORT BY clause must not contain clustering
columns for HDFS tables, and must not contain primary key columns for
Kudu tables. The columns in the SORT BY clause are stored in the
'sort.by.columns' table property and will result in an additional SORT
node being added to the plan before the final table sink. Specifying
SORT BY columns also enables clustering during inserts, so the SORT node
will contain all partitioning columns first, followed by the SORT BY
columns. We do this because SORT BY columns add a SORT node to the plan
and adding the clustering columns to the SORT node is cheap.

SORT BY columns supersede the sortby() hint, which we will remove in a
subsequent change (IMPALA-5144).

This change introduces a new TablePropertyAnalyzer class to collect
various methods used to analyze table properties.

Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
---
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
A fe/src/main/java/org/apache/impala/analysis/AlterTableSortByColumnsStmt.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
A fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/Column.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test
A testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test
M testdata/workloads/functional-planner/queries/PlannerTest/insert.test
M testdata/workloads/functional-query/queries/QueryTest/alter-table.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-as-select.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test
M testdata/workloads/functional-query/queries/QueryTest/create-table.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
33 files changed, 1,234 insertions(+), 85 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/6495/13
-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 13
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-04-09 Thread Lars Volker (Code Review)
Hello Dimitris Tsirogiannis,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6495

to look at the new patch set (#12).

Change subject: IMPALA-4166: Add SORT BY sql clause
..

IMPALA-4166: Add SORT BY sql clause

This change adds support for adding SORT BY (...) clauses to CREATE
TABLE and ALTER TABLE statements. Examples are:

CREATE TABLE t (i INT, j INT, k INT) PARTITIONED BY (l INT) SORT BY (i, j);
CREATE TABLE t (i INT, x INT PRIMARY KEY) PARTITION BY HASH(x)
PARTITIONS 8 SORT BY(i) STORED AS KUDU;
CREATE TABLE t SORT BY (int_col,id) LIKE u;
CREATE TABLE t LIKE PARQUET '/foo' SORT BY (id,zip);

ALTER TABLE t SORT BY (int_col,id);
ALTER TABLE t SORT BY ();

SORT BY columns can be specified for all table types, but effectiveness
may vary based on table type, for example TEXT tables will not see
improved compression. The SORT BY clause must not contain clustering
columns for HDFS tables, and must not contain primary key columns for
Kudu tables. The columns in the SORT BY clause are stored in the
'sort.by.columns' table property and will result in an additional SORT
node being added to the plan before the final table sink. Specifying
SORT BY columns also enables clustering during inserts, so the SORT node
will contain all partitioning columns first, followed by the SORT BY
columns. We do this because SORT BY columns add a SORT node to the plan
and adding the clustering columns to the SORT node is cheap.

SORT BY columns supersede the sortby() hint, which we will remove in a
subsequent change (IMPALA-5144).

This change introduces a new TablePropertyAnalyzer class to collect
various methods used to analyze table properties.

Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
---
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
A fe/src/main/java/org/apache/impala/analysis/AlterTableSortByColumnsStmt.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
A fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/Column.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test
A testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test
M testdata/workloads/functional-planner/queries/PlannerTest/insert.test
M testdata/workloads/functional-query/queries/QueryTest/alter-table.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-as-select.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test
M testdata/workloads/functional-query/queries/QueryTest/create-table.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
33 files changed, 1,234 insertions(+), 85 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/6495/12
-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 12
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-04-07 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 9:

(5 comments)

http://gerrit.cloudera.org:8080/#/c/6495/9/fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java
File fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java:

Line 85:   public static List analyzeSortByColumns(List 
sortByCols,
Do we need all the public APIs? Probably not a hug deal, but I worry about the 
N^2 loop in here. Table and KuduTable provide efficient ways for checking 
whether a column is a partition column or primary key.


Line 114:   if (!foundColumn) {
Does this mean we need 2 ALTER statements to drop/rename a column?


Line 115: throw new AnalysisException(String.format("Could not find 
SORT BY column '%s' " +
Do we have a test for this?


http://gerrit.cloudera.org:8080/#/c/6495/9/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

Line 458:   public void setNumClusteringCols(int n) { numClusteringCols_ = n; }
Preconditions.checkStats(RuntimeEnv.isTestEnv());


http://gerrit.cloudera.org:8080/#/c/6495/9/testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test:

Line 170: # IMPALA-4166: sort by columns are correct when using a partial 
column permutation.
What if neither bool_col or int_col are mentioned in the column permutation? 
Will we still create a sort node?


-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-04-07 Thread Alex Behm (Code Review)
Alex Behm has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6495/9/fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
File fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java:

Line 304: "TBLPROPERTIES ('sort.by.columns'='a')", true);
> i think a 'show create table t' should print the sort-by clause, not a tabl
This means that the SHOW CREATE TABLE will only be executable by Impala, and 
not by any other engine like Hive. If that's acceptable, then I don't mind this 
solution.


-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Alex Behm 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-04-07 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 9:

(1 comment)

Thanks for the review. I reworked the toSql() logic so that the 
'sort.by.columns' property now is hidden from the user. Please see PS11.

http://gerrit.cloudera.org:8080/#/c/6495/9/fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
File fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java:

Line 304: "TBLPROPERTIES ('sort.by.columns'='a')", true);
> i think a 'show create table t' should print the sort-by clause, not a tabl
Done


-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-04-07 Thread Lars Volker (Code Review)
Hello Dimitris Tsirogiannis,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6495

to look at the new patch set (#11).

Change subject: IMPALA-4166: Add SORT BY sql clause
..

IMPALA-4166: Add SORT BY sql clause

This change adds support for adding SORT BY (...) clauses to CREATE
TABLE and ALTER TABLE statements. Examples are:

CREATE TABLE t (i INT, j INT, k INT) PARTITIONED BY (l INT) SORT BY (i, j);
CREATE TABLE t (i INT, x INT PRIMARY KEY) PARTITION BY HASH(x)
PARTITIONS 8 SORT BY(i) STORED AS KUDU;
CREATE TABLE t SORT BY (int_col,id) LIKE u;
CREATE TABLE t LIKE PARQUET '/foo' SORT BY (id,zip);

ALTER TABLE t SORT BY (int_col,id);
ALTER TABLE t SORT BY ();

SORT BY columns can be specified for all table types, but effectiveness
may vary based on table type, for example TEXT tables will not see
improved compression. The SORT BY clause must not contain clustering
columns for HDFS tables, and must not contain primary key columns for
Kudu tables. The columns in the SORT BY clause are stored in the
'sort.by.columns' table property and will result in an additional SORT
node being added to the plan before the final table sink. Specifying
SORT BY columns also enables clustering during inserts, so the SORT node
will contain all partitioning columns first, followed by the SORT BY
columns. We do this because SORT BY columns add a SORT node to the plan
and adding the clustering columns to the SORT node is cheap.

SORT BY columns supersede the sortby() hint, which we will remove in a
subsequent change (IMPALA-5144).

This change introduces a new TablePropertyAnalyzer class to collect
various methods used to analyze table properties.

Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
---
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
A fe/src/main/java/org/apache/impala/analysis/AlterTableSortByColumnsStmt.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeFileStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableStmt.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
A fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java
M fe/src/main/java/org/apache/impala/analysis/ToSqlUtils.java
M fe/src/main/java/org/apache/impala/catalog/Column.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test
A testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test
M testdata/workloads/functional-planner/queries/PlannerTest/insert.test
M testdata/workloads/functional-query/queries/QueryTest/alter-table.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-as-select.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test
M testdata/workloads/functional-query/queries/QueryTest/create-table.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
33 files changed, 1,210 insertions(+), 85 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/6495/11
-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 11
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-04-05 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 9:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6495/9/fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
File fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java:

Line 304: "TBLPROPERTIES ('sort.by.columns'='a')", true);
> I commented on the decision to keep the sort.by.columns property visible in
i think a 'show create table t' should print the sort-by clause, not a table 
property.


-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-04-05 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 10:

(8 comments)

Thank you for your reviews. Please see my inline comments and PS10.

http://gerrit.cloudera.org:8080/#/c/6495/9/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java:

PS9, Line 376: analyzeSortByColumn
> Instead of checking this here, it's a bit cleaner to alway call analyzeSort
Done


http://gerrit.cloudera.org:8080/#/c/6495/9/fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java
File fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java:

PS9, Line 49: roperties.containsKe
> The caller is not supposed to modify the return values, correct? You may ju
Done, I ended up using ImmutableList, since I already use it below.


PS9, Line 77: 
> Same comment as above.
Done


PS9, Line 89: t by column in the li
> You can still construct and return an ImmutableList. See Guava's ImmutableL
I needed the contains() method in line 105, as well as size(), so I ended up 
using a LinkedHashSet.


http://gerrit.cloudera.org:8080/#/c/6495/9/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java:

Line 515: AnalyzesOk("alter table functional.alltypes set tblproperties(" +
> why allow this in addition to the '... sort by ()' clause?
Alex and I discussed this and Alex suggested that we allow this, since it is 
easier than explicitly disabling it, and we'll always have the possibility that 
someone changes the values in Hive. The alternative would be to try and hide 
the table property completely. If Alex doesn't object and you think it's worth 
the effort to hide this from the users, I can implement it.


Line 1905: "tblproperties ('sort.by.columns'='i')", "Table definition 
must not contain " +
> same here, why allow the sort.by.columns table property?
Please see my reply above.


http://gerrit.cloudera.org:8080/#/c/6495/9/fe/src/test/java/org/apache/impala/analysis/ParserTest.java
File fe/src/test/java/org/apache/impala/analysis/ParserTest.java:

PS9, Line 2374: // SORT BY must be the first table option
  : ParserError("CREATE 
> You may want to comment why this results in a parsing error.
Done


http://gerrit.cloudera.org:8080/#/c/6495/9/fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
File fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java:

Line 304: "TBLPROPERTIES ('sort.by.columns'='a')", true);
> why not print that as a 'sort by' clause?
I commented on the decision to keep the sort.by.columns property visible in 
AnalyzeDDLTest.java. I discussed ToSqlTest() with Alex, too, and IIRC he 
explained that this is only really relevant for views, in which create table 
statements cannot occur anyways. Therefore, this should never be visible to a 
user and is tested here only for completeness. Should I add a comment to 
explain this?


-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-04-05 Thread Lars Volker (Code Review)
Hello Dimitris Tsirogiannis,

I'd like you to reexamine a change.  Please visit

http://gerrit.cloudera.org:8080/6495

to look at the new patch set (#10).

Change subject: IMPALA-4166: Add SORT BY sql clause
..

IMPALA-4166: Add SORT BY sql clause

This change adds support for adding SORT BY (...) clauses to CREATE
TABLE and ALTER TABLE statements. Examples are:

CREATE TABLE t (i INT, j INT, k INT) PARTITIONED BY (l INT) SORT BY (i, j);
CREATE TABLE t (i INT, x INT PRIMARY KEY) PARTITION BY HASH(x)
PARTITIONS 8 SORT BY(i) STORED AS KUDU;
CREATE TABLE t SORT BY (int_col,id) LIKE u;
CREATE TABLE t LIKE PARQUET '/foo' SORT BY (id,zip);

ALTER TABLE t SORT BY (int_col,id);
ALTER TABLE t SORT BY ();

SORT BY columns can be specified for all table types, but effectiveness
may vary based on table type, for example TEXT tables will not see
improved compression. The SORT BY clause must not contain clustering
columns for HDFS tables, and must not contain primary key columns for
Kudu tables. The columns in the SORT BY clause are stored in the
'sort.by.columns' table property and will result in an additional SORT
node being added to the plan before the final table sink. Specifying
SORT BY columns also enables clustering during inserts, so the SORT node
will contain all partitioning columns first, followed by the SORT BY
columns. We do this because SORT BY columns add a SORT node to the plan
and adding the clustering columns to the SORT node is cheap.

SORT BY columns supersede the sortby() hint, which we will remove in a
subsequent change (IMPALA-5144).

This change introduces a new TablePropertyAnalyzer class to collect
various methods used to analyze table properties.

Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
---
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
A fe/src/main/java/org/apache/impala/analysis/AlterTableSortByColumnsStmt.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
A fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java
M fe/src/main/java/org/apache/impala/catalog/Column.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test
A testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test
M testdata/workloads/functional-planner/queries/PlannerTest/insert.test
M testdata/workloads/functional-query/queries/QueryTest/alter-table.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-as-select.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test
M testdata/workloads/functional-query/queries/QueryTest/create-table.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
30 files changed, 1,121 insertions(+), 59 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/6495/10
-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 10
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-04-05 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 9:

(3 comments)

http://gerrit.cloudera.org:8080/#/c/6495/9/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java:

Line 515: AnalyzesOk("alter table functional.alltypes set tblproperties(" +
why allow this in addition to the '... sort by ()' clause?


Line 1905: "tblproperties ('sort.by.columns'='i')", "Table definition 
must not contain " +
same here, why allow the sort.by.columns table property?


http://gerrit.cloudera.org:8080/#/c/6495/9/fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
File fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java:

Line 304: "TBLPROPERTIES ('sort.by.columns'='a')", true);
why not print that as a 'sort by' clause?


-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-04-05 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 9: Code-Review+1

(5 comments)

http://gerrit.cloudera.org:8080/#/c/6495/9/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java:

PS9, Line 376: if (!isHBaseTable) 
Instead of checking this here, it's a bit cleaner to alway call 
analyzeSortByColumns() and move the check there.


http://gerrit.cloudera.org:8080/#/c/6495/9/fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java
File fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java:

PS9, Line 49: Lists.newArrayList()
The caller is not supposed to modify the return values, correct? You may just 
return Collections.emptyList() which is also immutable.


PS9, Line 77: Lists.newArrayList();
Same comment as above.


PS9, Line 89: Lists.newArrayList();
You can still construct and return an ImmutableList. See Guava's 
ImmutableList.builder()


http://gerrit.cloudera.org:8080/#/c/6495/9/fe/src/test/java/org/apache/impala/analysis/ParserTest.java
File fe/src/test/java/org/apache/impala/analysis/ParserTest.java:

PS9, Line 2374: ParserError("CREATE TABLE Foo LIKE Baz STORED AS TEXTFILE 
LOCATION '/a/b' " +
  : "SORT BY(bar)");
You may want to comment why this results in a parsing error.


-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-04-05 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 8:

(18 comments)

Thank you for the review, please see PS9.

http://gerrit.cloudera.org:8080/#/c/6495/7/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

PS7, Line 984: Sor
> There is no KW_SET, right? Update the name?
Done


http://gerrit.cloudera.org:8080/#/c/6495/7/fe/src/main/java/org/apache/impala/analysis/AlterTableSetSortByColumnsStmt.java
File 
fe/src/main/java/org/apache/impala/analysis/AlterTableSetSortByColumnsStmt.java:

PS7, Line 73: 
> t?
Done


http://gerrit.cloudera.org:8080/#/c/6495/7/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

Line 2484:* Add a warning that will be displayed to the user. Ignores null 
messages. Once
> Comment that no warning can be issued if warnings have been retrieved?
Done


http://gerrit.cloudera.org:8080/#/c/6495/7/fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java
File fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java:

PS7, Line 119: ortByColumns_ !
> I think this can throw a NPE. Looking at the parser I see that CreateTableL
Thanks for catching this. I was able to trigger the NPE with an additional test 
in ToSqlTest.java. Fixed.


http://gerrit.cloudera.org:8080/#/c/6495/7/fe/src/main/java/org/apache/impala/analysis/TableDef.java
File fe/src/main/java/org/apache/impala/analysis/TableDef.java:

PS7, Line 296: 
> inline
Done


http://gerrit.cloudera.org:8080/#/c/6495/7/fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java
File fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java:

PS7, Line 41: 
:   /**
:* Analyzes the 'sort.by.columns' property in 'tblProperties' 
against the columns of
:* 'table'.
:*/
:   public static List analyzeSortByColumns(Map tblProperties,
:   Table table) throws AnalysisException {
: if (!tblProperties.containsKey(TBL_PROP_SORT_BY_COLUMNS)) {
:   return Lists.newArrayList();
: }
: 
: List sortByCols = Lists.newArrayList(
: Splitter.on(",").trimResults().omitEmptyStrings().split(
: tblProperties.get(TBL_PROP_SORT_BY_COLUMNS)));
: return TablePropertyAnalyzer.analyzeSortByColumns(sortByCols, 
table);
:   }
: 
> These are used in only one place, right? Maybe inline there?
I removed the first one. The second one is used in two places: 
InsertStmt.analyzeSortByColumns() (that's where I inlined the first one), and 
AlterTableSetTblProperties.analyze(). I'd keep the second one unless you lean 
strongly towards inlining it.


Line 76: }
> Precondition check for HBase?
Done


PS7, Line 127: 
 : 
> You may want to consider if it's better to have a getSortByColumn(Table tab
Inlined it instead.


http://gerrit.cloudera.org:8080/#/c/6495/7/fe/src/main/java/org/apache/impala/planner/Planner.java
File fe/src/main/java/org/apache/impala/planner/Planner.java:

PS7, Line 539: // If the table has a 'sort.by.columns' property and the query 
has a 'noclustered'
 : // hint, we issue a warning during analysis and ignore the 
'noclustered' hint.
> That comment doesn't seem relevant here. Remove?
I put it there to explain the intent of the if statement. Without it one might 
think that the noclusteredhint gets ignored by mistake (see previous review 
comment by Thomas). Should I remove/rephrase it?


PS7, Line 541: nsertStmt.hasClusteredHint() || 
> If insertStmt.hasClusteredHint() is true doesn't that imply that hasNoClust
True, fixed.


http://gerrit.cloudera.org:8080/#/c/6495/7/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

PS7, Line 1793: !params.sort_by_columns.isEmpty()
> use isEmpty()
Done


http://gerrit.cloudera.org:8080/#/c/6495/7/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java:

Line 526:   String long_property_key = "";
> Test for HBase and Kudu?
Added tests for Kudu in testAlterKuduTable(). HBase tables do not support 
"ALTER TABLE SET" altogether, which is tested in L683.


Line 1054: 
> Same here (HBase + Kudu). You need to put all Kudu tests in a function and 
Added test to check that HBase is not supported. Added Kudu tests to 
TestAlterKuduTable().


PS7, Line 1904: te t
> "must" typo
Done


PS7, Line 1920: est
> No need for that comment :)
Done


http://gerrit.cloudera.org:8080/#/c/6495/7/fe/src/test/java/org/apache/impala/analysis/ParserTest.java
File fe/src/test/java/org/apache/impala/analysis/ParserTest.java:

PS7, Line 2347: // Sort by clau

[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-04-05 Thread Lars Volker (Code Review)
Lars Volker has uploaded a new patch set (#9).

Change subject: IMPALA-4166: Add SORT BY sql clause
..

IMPALA-4166: Add SORT BY sql clause

This change adds support for adding SORT BY (...) clauses to CREATE
TABLE and ALTER TABLE statements. Examples are:

CREATE TABLE t (i INT, j INT, k INT) PARTITIONED BY (l INT) SORT BY (i, j);
CREATE TABLE t (i INT, x INT PRIMARY KEY) PARTITION BY HASH(x)
PARTITIONS 8 SORT BY(i) STORED AS KUDU;
CREATE TABLE t SORT BY (int_col,id) LIKE u;
CREATE TABLE t LIKE PARQUET '/foo' SORT BY (id,zip);

ALTER TABLE t SORT BY (int_col,id);
ALTER TABLE t SORT BY ();

SORT BY columns can be specified for all table types, but effectiveness
may vary based on table type, for example TEXT tables will not see
improved compression. The SORT BY clause must not contain clustering
columns for HDFS tables, and must not contain primary key columns for
Kudu tables. The columns in the SORT BY clause are stored in the
'sort.by.columns' table property and will result in an additional SORT
node being added to the plan before the final table sink. Specifying
SORT BY columns also enables clustering during inserts, so the SORT node
will contain all partitioning columns first, followed by the SORT BY
columns. We do this because SORT BY columns add a SORT node to the plan
and adding the clustering columns to the SORT node is cheap.

SORT BY columns supersede the sortby() hint, which we will remove in a
subsequent change (IMPALA-5144).

This change introduces a new TablePropertyAnalyzer class to collect
various methods used to analyze table properties.

Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
---
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
A fe/src/main/java/org/apache/impala/analysis/AlterTableSortByColumnsStmt.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
A fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java
M fe/src/main/java/org/apache/impala/catalog/Column.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test
A testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test
M testdata/workloads/functional-planner/queries/PlannerTest/insert.test
M testdata/workloads/functional-query/queries/QueryTest/alter-table.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-as-select.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test
M testdata/workloads/functional-query/queries/QueryTest/create-table.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
30 files changed, 1,112 insertions(+), 59 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/6495/9
-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 9
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-04-05 Thread Lars Volker (Code Review)
Lars Volker has uploaded a new patch set (#8).

Change subject: IMPALA-4166: Add SORT BY sql clause
..

IMPALA-4166: Add SORT BY sql clause

This change adds support for adding SORT BY (...) clauses to CREATE
TABLE and ALTER TABLE statements. Examples are:

CREATE TABLE t (i INT, j INT, k INT) PARTITIONED BY (l INT) SORT BY (i, j);
CREATE TABLE t (i INT, x INT PRIMARY KEY) PARTITION BY HASH(x)
PARTITIONS 8 SORT BY(i) STORED AS KUDU;
CREATE TABLE t SORT BY (int_col,id) LIKE u;
CREATE TABLE t LIKE PARQUET '/foo' SORT BY (id,zip);

ALTER TABLE t SORT BY (int_col,id);
ALTER TABLE t SORT BY ();

SORT BY columns can be specified for all table types, but effectiveness
may vary based on table type, for example TEXT tables will not see
improved compression. The SORT BY clause must not contain clustering
columns for HDFS tables, and must not contain primary key columns for
Kudu tables. The columns in the SORT BY clause are stored in the
'sort.by.columns' table property and will result in an additional SORT
node being added to the plan before the final table sink. Specifying
SORT BY columns also enables clustering during inserts, so the SORT node
will contain all partitioning columns first, followed by the SORT BY
columns. We do this because SORT BY columns add a SORT node to the plan
and adding the clustering columns to the SORT node is cheap.

SORT BY columns supersede the sortby() hint, which we will remove in a
subsequent change (IMPALA-5144).

This change introduces a new TablePropertyAnalyzer class to collect
various methods used to analyze table properties.

Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
---
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
A fe/src/main/java/org/apache/impala/analysis/AlterTableSortByColumnsStmt.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
A fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java
M fe/src/main/java/org/apache/impala/catalog/Column.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test
A testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test
M testdata/workloads/functional-planner/queries/PlannerTest/insert.test
M testdata/workloads/functional-query/queries/QueryTest/alter-table.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-as-select.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test
M testdata/workloads/functional-query/queries/QueryTest/create-table.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
30 files changed, 1,115 insertions(+), 59 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/6495/8
-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 8
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-04-04 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 7:

(18 comments)

http://gerrit.cloudera.org:8080/#/c/6495/7/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

PS7, Line 984: Set
There is no KW_SET, right? Update the name?


http://gerrit.cloudera.org:8080/#/c/6495/7/fe/src/main/java/org/apache/impala/analysis/AlterTableSetSortByColumnsStmt.java
File 
fe/src/main/java/org/apache/impala/analysis/AlterTableSetSortByColumnsStmt.java:

PS7, Line 73: getTargetTable()
t?


http://gerrit.cloudera.org:8080/#/c/6495/7/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

Line 2484:* Add a warning that will be displayed to the user. Ignores null 
messages.
Comment that no warning can be issued if warnings have been retrieved?


http://gerrit.cloudera.org:8080/#/c/6495/7/fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java
File fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java:

PS7, Line 119: sortByColumns_.
I think this can throw a NPE. Looking at the parser I see that 
CreateTableListStmt can pass null for sortByColumns_ and couldn't find where 
this changes.


http://gerrit.cloudera.org:8080/#/c/6495/7/fe/src/main/java/org/apache/impala/analysis/TableDef.java
File fe/src/main/java/org/apache/impala/analysis/TableDef.java:

PS7, Line 296: propertyString
inline


http://gerrit.cloudera.org:8080/#/c/6495/7/fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java
File fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java:

PS7, Line 41: /**
:* Analyzes the 'sort.by.columns' property of 'table'.
:*/
:   public static List analyzeSortByColumns(Table table) 
throws AnalysisException {
: return TablePropertyAnalyzer.analyzeSortByColumns(
: table.getMetaStoreTable().getParameters(), table);
:   }
: 
:   /**
:* Analyzes the 'sort.by.columns' property in 'tblProperties' 
against the columns of
:* 'table'.
:*/
:   public static List analyzeSortByColumns(Map tblProperties,
:   Table table) throws AnalysisException {
: return TablePropertyAnalyzer.analyzeSortByColumns(
: getSortByColumnsFromProperties(tblProperties), table);
:   }
These are used in only one place, right? Maybe inline there?


Line 76: }
Precondition check for HBase?


PS7, Line 127: private static List getSortByColumnsFromProperties(
 :   Map tblProperties) {
You may want to consider if it's better to have a getSortByColumn(Table table) 
function instead of this one. Just a though...


http://gerrit.cloudera.org:8080/#/c/6495/7/fe/src/main/java/org/apache/impala/planner/Planner.java
File fe/src/main/java/org/apache/impala/planner/Planner.java:

PS7, Line 539: // If the table has a 'sort.by.columns' property and the query 
has a 'noclustered'
 : // hint, we issue a warning during analysis and ignore the 
'noclustered' hint.
That comment doesn't seem relevant here. Remove?


PS7, Line 541: !insertStmt.hasNoClusteredHint()
If insertStmt.hasClusteredHint() is true doesn't that imply that 
hasNoClusteredHint() is false? I thought they are mutually exclusive.


http://gerrit.cloudera.org:8080/#/c/6495/7/fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
File fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java:

PS7, Line 1793: params.sort_by_columns.size() > 0
use isEmpty()


http://gerrit.cloudera.org:8080/#/c/6495/7/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java:

Line 526: 
Test for HBase and Kudu?


Line 1054:   }
Same here (HBase + Kudu). You need to put all Kudu tests in a function and 
check for supported.


PS7, Line 1904: most
"must" typo


PS7, Line 1920: // Kudu table tests are in TestCreateManagedKuduTable().
No need for that comment :)


http://gerrit.cloudera.org:8080/#/c/6495/7/fe/src/test/java/org/apache/impala/analysis/ParserTest.java
File fe/src/test/java/org/apache/impala/analysis/ParserTest.java:

PS7, Line 2347: // Sort by clause
How about some tests with additional table options? Try changing the order in 
which the table options are specified, e.g. location bla sort by ()


http://gerrit.cloudera.org:8080/#/c/6495/7/testdata/workloads/functional-planner/queries/PlannerTest/ddl.test
File testdata/workloads/functional-planner/queries/PlannerTest/ddl.test:

PS7, Line 207: table
select query


http://gerrit.cloudera.org:8080/#/c/6495/7/testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test
File 
testdata/workloads/functional-planner/queries/Pl

[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-04-04 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 5:

(5 comments)

Responses to some comments.

http://gerrit.cloudera.org:8080/#/c/6495/5/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

PS5, Line 455: the caller must make sure that the value matches any columns 
he/she adds to the
 :* table.
> We could throw one if the number is wrong, but eventually the caller will h
No that's fine. You already mention that this is used only for tests.


http://gerrit.cloudera.org:8080/#/c/6495/5/fe/src/main/java/org/apache/impala/planner/Planner.java
File fe/src/main/java/org/apache/impala/planner/Planner.java:

PS5, Line 546: orderingExprs.addAll(insertStmt.getPartitionKeyExprs()
> Insert hints are not allowed for HBase tables and the analysis will fail. F
If we enforce it at the analysis phase, then adding a Preconditions check here 
is not a bad idea.


http://gerrit.cloudera.org:8080/#/c/6495/5/testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test:

PS5, Line 10: ASC NULLS LAST
> No, we cannot change it. I thought about this, but couldn't see why we'd wa
I was thinking more of changing the ASC into DESC but that's fine if this is an 
acceptable limitation.


PS5, Line 39:  DISTRIBUTEDPLAN
: WRITE TO HDFS [test_sort_by.alltypes, OVERWRITE=false, 
PARTITION-KEYS=(year,month)]
: |  partitions=24
: |
: 01:SORT
: |  order by: year ASC NULLS LAST, month ASC NULLS LAST, int_col 
ASC NULLS LAST, bool_col ASC NULLS LAST
: |
: 00:SCAN HDFS [functional.alltypes]
:partitions=24/24 files=24 size=478.45KB
> They make sure that the shuffle/noshuffle hint has been observed, indicated
Good point. I think we can leave them.


http://gerrit.cloudera.org:8080/#/c/6495/5/testdata/workloads/functional-query/queries/QueryTest/alter-table.test
File testdata/workloads/functional-query/queries/QueryTest/alter-table.test:

PS5, Line 1066:  
> They are part of the output of "describe formatted". Without them the compa
It's probably because of the HMS schema used (maybe they use CHAR(N)). Anyways, 
no need to do something. I just wanted to make sure we don't do anything funky.


-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-04-04 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 7:

(20 comments)

Thank you for your review! Please see PS7.

http://gerrit.cloudera.org:8080/#/c/6495/5/common/thrift/JniCatalog.thrift
File common/thrift/JniCatalog.thrift:

Line 383:   // any such columns of the source table. If unspecified, the 
destination table will
> Otherwise? Will the destination table inherit the sort by columns of the so
Yes, it'll inherit them. I expanded the comment. We have a test for this in 
QueryTest/create-table-like-table.test.


http://gerrit.cloudera.org:8080/#/c/6495/5/fe/src/main/java/org/apache/impala/analysis/AlterTableSetSortByColumnsStmt.java
File 
fe/src/main/java/org/apache/impala/analysis/AlterTableSetSortByColumnsStmt.java:

PS5, Line 64: super.analyze(analyzer);
> Comment not applicable. Remove?
Done


http://gerrit.cloudera.org:8080/#/c/6495/5/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java:

PS5, Line 127: hasNoClusteredHint_
> Similar to the comment on hasShuffleHint, explain the allowed values for ha
Done


PS5, Line 763:  
> "if any"
Done


http://gerrit.cloudera.org:8080/#/c/6495/5/fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java
File fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java:

PS5, Line 77: return Lists.newArrayList();
> This is an immutable list whereas the next function returns a mutable one. 
Done


PS5, Line 83: 
:*/
> remove
Done


PS5, Line 91: sortByC
> nit: rename to sortByColName? since you also have tableCols it may be good 
Done


Line 92:   // Make sure it's not a primary key column or partition column.
> nit: remove empty line
Done


PS5, Line 102: 
 :   for (int j = 0; j < tableCols.size(); ++j) {
> Not sure I follow this part based on the code below.
Done, updated the comment.


Line 120: return colIdxs;
> nit: remove empty line
Done


http://gerrit.cloudera.org:8080/#/c/6495/5/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

PS5, Line 455: the caller must make sure that the value matches any columns 
he/she adds to the
 :* table.
> Can't we throw an exception instead?
We could throw one if the number is wrong, but eventually the caller will have 
to provide k columns and then set this to some value 0 <= n < k. Should we 
check that? The comment was meant to indicate that the caller will be on 
themselves to use this.


http://gerrit.cloudera.org:8080/#/c/6495/5/fe/src/main/java/org/apache/impala/planner/Planner.java
File fe/src/main/java/org/apache/impala/planner/Planner.java:

PS5, Line 546: orderingExprs.addAll(insertStmt.getPartitionKeyExprs()
> I assume this works for HBase tables...
Insert hints are not allowed for HBase tables and the analysis will fail. From 
looking at InsertStmt I think that this would return an empty list of exprs. 
However I think it could be more explicit. Should I add an "instanceof" check 
for HdfsTable here?


http://gerrit.cloudera.org:8080/#/c/6495/5/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java:

PS5, Line 521: 'sort.by.columns'='id
> How about 'ID' (check for case)?
Indeed it did not work, I fixed it.


PS5, Line 1642: "CREATE TABLE LIKE is not supported for Kudu tables");
> Also, add the negative case where sort by cols don't exist in the source ta
Done


PS5, Line 1913: // Partitioned HDFS table
  : AnalyzesOk("create table functional.new_table (i int) 
PARTITIONED BY (d decimal)" +
  : "SORT BY (i)");
  : // Column in sortby hint must not be a Hdfs partition 
column.
  : AnalysisError("create table functional.new_table (i int) 
PARTITIONED BY (d decimal)" +
  : "SORT BY (d)", "SORT BY column list must not contain 
partition column: 'd'");
  : 
  : // Kudu table tests are in TestCreateManagedKuduTable().
  :   }
  : 
  :   @Test
  :   public void TestAlterKuduTable() {
  : TestUtils.assumeKuduIsSupported();
  :
> For Kudu specific tests they need to be in a function that first calls Test
I move the tests to TestCreateManagedKuduTable().

We don't support setting sort by columns for HBase tables (I added code to 
AlterTableSortByStmt to enforce this). During analysis, we don't parse the 
table property, either.


http://gerrit.cloudera.org:8080/#/c/6495/5/testdata/workloads/functional-planner/queries/PlannerTest/ddl.test
File testdata/workloads/functional-planner/queries/PlannerTest/ddl.test:

PS5, Line 162: create table t sort by
> Add one with a partitioned table? Also, a CTAS with a more 

[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-04-04 Thread Lars Volker (Code Review)
Lars Volker has uploaded a new patch set (#7).

Change subject: IMPALA-4166: Add SORT BY sql clause
..

IMPALA-4166: Add SORT BY sql clause

This change adds support for adding SORT BY (...) clauses to CREATE
TABLE and ALTER TABLE statements. Examples are:

CREATE TABLE t (i INT, j INT, k INT) PARTITIONED BY (l INT) SORT BY (i, j);
CREATE TABLE t (i INT, x INT PRIMARY KEY) PARTITION BY HASH(x)
PARTITIONS 8 SORT BY(i) STORED AS KUDU;
CREATE TABLE t SORT BY (int_col,id) LIKE u;
CREATE TABLE t LIKE PARQUET '/foo' SORT BY (id,zip);

ALTER TABLE t SORT BY (int_col,id);
ALTER TABLE t SORT BY ();

SORT BY columns can be specified for all table types, but effectiveness
may vary based on table type, for example TEXT tables will not see
improved compression. The SORT BY clause must not contain clustering
columns for HDFS tables, and must not contain primary key columns for
Kudu tables. The columns in the SORT BY clause are stored in the
'sort.by.columns' table property and will result in an additional SORT
node being added to the plan before the final table sink. Specifying
SORT BY columns also enables clustering during inserts, so the SORT node
will contain all partitioning columns first, followed by the SORT BY
columns. We do this because SORT BY columns add a SORT node to the plan
and adding the clustering columns to the SORT node is cheap.

SORT BY columns supersede the sortby() hint, which we will remove in a
subsequent change (IMPALA-5144).

This change introduces a new TablePropertyAnalyzer class to collect
various methods used to analyze table properties.

Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
---
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
A 
fe/src/main/java/org/apache/impala/analysis/AlterTableSetSortByColumnsStmt.java
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
A fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java
M fe/src/main/java/org/apache/impala/catalog/Column.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test
A testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test
M testdata/workloads/functional-planner/queries/PlannerTest/insert.test
M testdata/workloads/functional-query/queries/QueryTest/alter-table.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-as-select.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test
M testdata/workloads/functional-query/queries/QueryTest/create-table.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
30 files changed, 1,062 insertions(+), 58 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/6495/7
-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 7
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-04-04 Thread Lars Volker (Code Review)
Lars Volker has uploaded a new patch set (#6).

Change subject: IMPALA-4166: Add SORT BY sql clause
..

IMPALA-4166: Add SORT BY sql clause

This change adds support for adding SORT BY (...) clauses to CREATE
TABLE and ALTER TABLE statements. Examples are:

CREATE TABLE t (i INT, j INT, k INT) PARTITIONED BY (l INT) SORT BY (i, j);
CREATE TABLE t (i INT, x INT PRIMARY KEY) PARTITION BY HASH(x)
PARTITIONS 8 SORT BY(i) STORED AS KUDU;
CREATE TABLE t SORT BY (int_col,id) LIKE u;
CREATE TABLE t LIKE PARQUET '/foo' SORT BY (id,zip);

ALTER TABLE t SORT BY (int_col,id);
ALTER TABLE t SORT BY ();

SORT BY columns can be specified for all table types, but effectiveness
may vary based on table type, for example TEXT tables will not see
improved compression. The SORT BY clause must not contain clustering
columns for HDFS tables, and must not contain primary key columns for
Kudu tables. The columns in the SORT BY clause are stored in the
'sort.by.columns' table property and will result in an additional SORT
node being added to the plan before the final table sink. Specifying
SORT BY columns also enables clustering during inserts, so the SORT node
will contain all partitioning columns first, followed by the SORT BY
columns. We do this because SORT BY columns add a SORT node to the plan
and adding the clustering columns to the SORT node is cheap.

SORT BY columns supersede the sortby() hint, which we will remove in a
subsequent change (IMPALA-5144).

This change introduces a new TablePropertyAnalyzer class to collect
various methods used to analyze table properties.

Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
---
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
A 
fe/src/main/java/org/apache/impala/analysis/AlterTableSetSortByColumnsStmt.java
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
A fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java
M fe/src/main/java/org/apache/impala/catalog/Column.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test
A testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test
M testdata/workloads/functional-planner/queries/PlannerTest/insert.test
M testdata/workloads/functional-query/queries/QueryTest/alter-table.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-as-select.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test
M testdata/workloads/functional-query/queries/QueryTest/create-table.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
30 files changed, 1,060 insertions(+), 58 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/6495/6
-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 6
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-04-03 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 5:

(20 comments)

http://gerrit.cloudera.org:8080/#/c/6495/5/common/thrift/JniCatalog.thrift
File common/thrift/JniCatalog.thrift:

Line 383:   // any such columns of the source table.
Otherwise? Will the destination table inherit the sort by columns of the source 
table, if any?


http://gerrit.cloudera.org:8080/#/c/6495/5/fe/src/main/java/org/apache/impala/analysis/AlterTableSetSortByColumnsStmt.java
File 
fe/src/main/java/org/apache/impala/analysis/AlterTableSetSortByColumnsStmt.java:

PS5, Line 64: // Analyze 'sort.by.columns' property.
Comment not applicable. Remove?


http://gerrit.cloudera.org:8080/#/c/6495/5/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java:

PS5, Line 127: hasNoClusteredHint_
Similar to the comment on hasShuffleHint, explain the allowed values for 
hasClusteredHint and hasNotClusteredHint.


PS5, Line 763:  
"if any"


http://gerrit.cloudera.org:8080/#/c/6495/5/fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java
File fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java:

PS5, Line 77: return ImmutableList.of();
This is an immutable list whereas the next function returns a mutable one. Make 
it consistent?


PS5, Line 83: If there are any errors, 'error' contains an
:* error string. Otherwise, 'error' is left untouched.
remove


PS5, Line 91: colName
nit: rename to sortByColName? since you also have tableCols it may be good to 
make it explicit which ones you refer to in the for loop.


Line 92: 
nit: remove empty line


PS5, Line 102: and store
 :   // the corresponding result expr in sortByExprs_.
Not sure I follow this part based on the code below.


Line 120: 
nit: remove empty line


http://gerrit.cloudera.org:8080/#/c/6495/5/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

PS5, Line 455: the caller must make sure that the value matches any columns 
he/she adds to the
 :* table.
Can't we throw an exception instead?


http://gerrit.cloudera.org:8080/#/c/6495/5/fe/src/main/java/org/apache/impala/planner/Planner.java
File fe/src/main/java/org/apache/impala/planner/Planner.java:

PS5, Line 546: orderingExprs.addAll(insertStmt.getPartitionKeyExprs()
I assume this works for HBase tables...


http://gerrit.cloudera.org:8080/#/c/6495/5/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java:

PS5, Line 521: 'sort.by.columns'='id
How about 'ID' (check for case)?


PS5, Line 1642: AnalyzesOk("create table tbl sort by (int_col,id) like 
functional.alltypes");
Also, add the negative case where sort by cols don't exist in the source table.


PS5, Line 1913: // Kudu table
  : AnalyzesOk("create table tab (i int, x int primary key) 
partition by hash(x) " +
  : "partitions 8 sort by(i) stored as kudu");
  : // Column in sortby hint must not be a Kudu primary key 
column.
  : AnalysisError("create table tab (i int, x int primary key) 
partition by hash(x) " +
  : "partitions 8 sort by(x) stored as kudu", "SORT BY 
column list must not " +
  : "contain primary key column: 'x'");
  : AnalysisError("create table tab (i int, x int, y int, 
primary key (x)) partition " +
  : "by hash(x) partitions 8 sort by(x) stored as kudu", 
"SORT BY column list must " +
  : "not contain primary key column: 'x'");
  : AnalysisError("create table tab (i int, x int, y int, 
primary key(x, y)) partition " +
  : "by hash(y) partitions 8 sort by (i,x) stored as kudu", 
"SORT BY column list " +
  : "must not contain primary key column: 'x'");
  :   }
For Kudu specific tests they need to be in a function that first calls 
TestUtils.assumeKuduIsSupported() (see L1930). Do you have tests for HBase?


http://gerrit.cloudera.org:8080/#/c/6495/5/testdata/workloads/functional-planner/queries/PlannerTest/ddl.test
File testdata/workloads/functional-planner/queries/PlannerTest/ddl.test:

PS5, Line 162: create table t sort by
Add one with a partitioned table? Also, a CTAS with a more complex select 
(joins, maybe aggs)


http://gerrit.cloudera.org:8080/#/c/6495/5/testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test
File 
testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test:

PS5, Line 10: ASC NULLS LAST
Out of curiosity? Can we change the default behavior? If no, are we ok with 
this restriction?


PS5, Line 39:  DISTRIBUTEDPLAN
: WRITE TO HDFS [t

[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-03-30 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6495/3//COMMIT_MSG
Commit Message:

PS3, Line 13: Specifying SORT BY columns also enables clustering during inserts,
: so the SORT node will contain all partitioning columns first, 
followed
: by the SORT BY columns.
> in general, the commit message shouldn't act as documentation. for things l
That's fine by me. Feel free to simplify the description here and move it to 
the JIRA. I just wanted to understand the implemented semantics instead of 
trying to infer them from the code.


-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-03-30 Thread Marcel Kornacker (Code Review)
Marcel Kornacker has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 3:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6495/3//COMMIT_MSG
Commit Message:

PS3, Line 13: Specifying SORT BY columns also enables clustering during inserts,
: so the SORT node will contain all partitioning columns first, 
followed
: by the SORT BY columns.
> Done
in general, the commit message shouldn't act as documentation. for things like 
that, we should use the jira, which is more accessible as a public record.


-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Marcel Kornacker 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-03-30 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 4:

(7 comments)

Thank you for the reviews. Please see my comments and PS5.

http://gerrit.cloudera.org:8080/#/c/6495/3//COMMIT_MSG
Commit Message:

PS3, Line 9: SORT BY (...) clauses to CREATE
   : TABLE and ALTER TABLE statements
> It would be helpful to have the formal specification of the new syntax here
Done


PS3, Line 13:  TABLE t (i INT, x INT PRIMARY KEY) PARTITION BY HASH(x)
: PARTITIONS 8 SORT BY(i) STORED AS KUDU;
: CREATE TABLE t SORT BY 
> Can you expand the comment a little bit on the semantics of this? A few hig
Done


http://gerrit.cloudera.org:8080/#/c/6495/3/common/thrift/JniCatalog.thrift
File common/thrift/JniCatalog.thrift:

PS3, Line 382: Optiona
> Optional?
Done


http://gerrit.cloudera.org:8080/#/c/6495/3/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

PS3, Line 1132: // This extra production is necessary since without it the 
parser will not be able to
  :   // parse "CREATE TABLE A LIKE B".
> I don't understand this comment, since the sort_col_cols here is optional.
At first I tried adding sort_by_cols to the production above, but then the 
parser would not pick it up for queries like "CREATE TABLE A LIKE B" and 
instead try to parse it only as a "CREATE TABLE LIKE FILE" query. I could not 
figure out why it does that, but copying the rule fixes it. I agree that it 
should work. I'll ask Alex for help.


http://gerrit.cloudera.org:8080/#/c/6495/3/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

Line 452: 
> I think this only makes sense in the context of testing? If so, note that h
Done


http://gerrit.cloudera.org:8080/#/c/6495/3/fe/src/main/java/org/apache/impala/planner/Planner.java
File fe/src/main/java/org/apache/impala/planner/Planner.java:

PS3, Line 539: s' property and the query has a 'noclu
> I think that you need a '&& !hasNoClusteredHint()' here.
Done, also added a comment to explain that the hint will be ignored if there 
are sort by columns.


http://gerrit.cloudera.org:8080/#/c/6495/3/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java:

Line 515: AnalyzesOk("alter table functional.alltypes set tblproperties(" +
> Is 'sort.by.columns'='' handled?
Yes, added a test.


-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-03-30 Thread Lars Volker (Code Review)
Lars Volker has uploaded a new patch set (#5).

Change subject: IMPALA-4166: Add SORT BY sql clause
..

IMPALA-4166: Add SORT BY sql clause

This change adds support for adding SORT BY (...) clauses to CREATE
TABLE and ALTER TABLE statements. Examples are:

CREATE TABLE t (i INT, j INT, k INT) PARTITIONED BY (l INT) SORT BY (i, j);
CREATE TABLE t (i INT, x INT PRIMARY KEY) PARTITION BY HASH(x)
PARTITIONS 8 SORT BY(i) STORED AS KUDU;
CREATE TABLE t SORT BY (int_col,id) LIKE u;
CREATE TABLE t LIKE PARQUET '/foo' SORT BY (id,zip);

ALTER TABLE t SORT BY (int_col,id);
ALTER TABLE t SORT BY ();

SORT BY columns can be specified for all table types, but effectiveness
may vary based on table type, for example TEXT tables will not see
improved compression. The SORT BY clause must not contain clustering
columns for HDFS tables, and must not contain primary key columns for
Kudu tables. The columns in the SORT BY clause are stored in the
'sort.by.columns' table property and will result in an additional SORT
node being added to the plan before the final table sink. Specifying
SORT BY columns also enables clustering during inserts, so the SORT node
will contain all partitioning columns first, followed by the SORT BY
columns. We do this because SORT BY columns add a SORT node to the plan
and adding the clustering columns to the SORT node is cheap.

SORT BY columns supersede the sortby() hint, which we will remove in a
subsequent change (IMPALA-5144).

This change introduces a new TablePropertyAnalyzer class to collect
various methods used to analyze table properties.

Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
---
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
A 
fe/src/main/java/org/apache/impala/analysis/AlterTableSetSortByColumnsStmt.java
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
A fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java
M fe/src/main/java/org/apache/impala/catalog/Column.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test
A testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test
M testdata/workloads/functional-planner/queries/PlannerTest/insert.test
M testdata/workloads/functional-query/queries/QueryTest/alter-table.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-as-select.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test
M testdata/workloads/functional-query/queries/QueryTest/create-table.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
30 files changed, 856 insertions(+), 54 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/6495/5
-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 5
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-03-30 Thread Lars Volker (Code Review)
Lars Volker has uploaded a new patch set (#4).

Change subject: IMPALA-4166: Add SORT BY sql clause
..

IMPALA-4166: Add SORT BY sql clause

This change adds support for adding SORT BY (...) clauses to CREATE
TABLE and ALTER TABLE statements. Examples are:

CREATE TABLE t (i INT, j INT, k INT) PARTITIONED BY (l INT) SORT BY (i, j);
CREATE TABLE t (i INT, x INT PRIMARY KEY) PARTITION BY HASH(x)
PARTITIONS 8 SORT BY(i) STORED AS KUDU;
CREATE TABLE t SORT BY (int_col,id) LIKE u;
CREATE TABLE t LIKE PARQUET '/foo' SORT BY (id,zip);

ALTER TABLE t SORT BY (int_col,id);
ALTER TABLE t SORT BY ();

SORT BY columns can be specified for all table types, but effectiveness
may vary based on table type, for example TEXT tables will not see
improved compression. The SORT BY clause must not contain clustering
columns for HDFS tables, and must not contain primary key columns for
Kudu tables. The columns in the SORT BY clause are stored in the
'sort.by.columns' table property and will result in an additional SORT
node being added to the plan before the final table sink. Specifying
SORT BY columns also enables clustering during inserts, so the SORT node
will contain all partitioning columns first, followed by the SORT BY
columns. We do this because SORT BY columns add a SORT node to the plan
and adding the clustering columns to the SORT node is cheap.

SORT BY columns supersede the sortby() hint, which we will remove in a
subsequent change (IMPALA-5144).

This change introduces a new TablePropertyAnalyzer class to collect
various methods used to analyze table properties.

Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
---
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
A 
fe/src/main/java/org/apache/impala/analysis/AlterTableSetSortByColumnsStmt.java
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
A fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java
M fe/src/main/java/org/apache/impala/catalog/Column.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test
A testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test
M testdata/workloads/functional-planner/queries/PlannerTest/insert.test
M testdata/workloads/functional-query/queries/QueryTest/alter-table.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-as-select.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test
M testdata/workloads/functional-query/queries/QueryTest/create-table.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
30 files changed, 856 insertions(+), 54 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/6495/4
-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 4
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Thomas Tauber-Marshall 


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-03-30 Thread Thomas Tauber-Marshall (Code Review)
Thomas Tauber-Marshall has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 3:

(6 comments)

Just a heads up, this is going to have some conflicts with IMPALA-3742 
(partition and sort Kudu inserts) which I'll have a patch out for soon, but 
this will probably go in first so I guess that's my problem.

http://gerrit.cloudera.org:8080/#/c/6495/3//COMMIT_MSG
Commit Message:

PS3, Line 9: SORT BY (...) clauses to CREATE
   : TABLE and ALTER TABLE statements
It would be helpful to have the formal specification of the new syntax here, to 
show where the clause would go, e.g.:
ALTER TABLE name SORT BY (col_spec[, col_spec...])
etc.


http://gerrit.cloudera.org:8080/#/c/6495/3/common/thrift/JniCatalog.thrift
File common/thrift/JniCatalog.thrift:

PS3, Line 382: Options
Optional?


http://gerrit.cloudera.org:8080/#/c/6495/3/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

PS3, Line 1132: // This extra production is necessary since without it the 
parser will not be able to
  :   // parse "CREATE TABLE A LIKE B".
I don't understand this comment, since the sort_col_cols here is optional.


http://gerrit.cloudera.org:8080/#/c/6495/3/fe/src/main/java/org/apache/impala/catalog/Table.java
File fe/src/main/java/org/apache/impala/catalog/Table.java:

Line 452:   public void setNumClusteringCols(int n) { numClusteringCols_ = n; }
I think this only makes sense in the context of testing? If so, note that here.


http://gerrit.cloudera.org:8080/#/c/6495/3/fe/src/main/java/org/apache/impala/planner/Planner.java
File fe/src/main/java/org/apache/impala/planner/Planner.java:

PS3, Line 539: !insertStmt.getSortByExprs().isEmpty()
I think that you need a '&& !hasNoClusteredHint()' here.


http://gerrit.cloudera.org:8080/#/c/6495/3/fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
File fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java:

Line 515: AnalysisError("alter table functional.alltypes set 
tblproperties(" +
Is 'sort.by.columns'='' handled?


-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-Reviewer: Thomas Tauber-Marshall 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-03-30 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 3:

(1 comment)

Lars, a few high level questions before I start the next review iteration. 
Thanks

http://gerrit.cloudera.org:8080/#/c/6495/3//COMMIT_MSG
Commit Message:

PS3, Line 13: Specifying SORT BY columns also enables clustering during inserts,
: so the SORT node will contain all partitioning columns first, 
followed
: by the SORT BY columns.
Can you expand the comment a little bit on the semantics of this? A few high 
level questions are:
1. What is the connection between this and the sortby() hints? Can I use both? 
What if there is a conflict? Same for clustered hint.
2. Why does sort by enables clustering as well? 
3. Can I specify any table columns (even the partitioning columns) in the SORT 
BY clause or are there some restrictions? 
4. I suppose this works for all table types (including Kudu)?


-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-03-29 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 2:

(20 comments)

Thank you for your Review. Please see my comments and PS3.

http://gerrit.cloudera.org:8080/#/c/6495/2/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

PS2, Line 986: // SORT BY columns are stored in the 'sort.by.columns' table 
property.
 : HashMap properties = new HashMap();
 : String columnString = Joiner.on(",").join(col_names);
 : 
properties.put(TablePropertyAnalyzer.TBL_PROP_SORT_BY_COLUMNS, columnString);
 : RESULT = new AlterTableSetTblProperties(table, null, 
TTablePropertyType.TBL_PROPERTY,
 : properties);
 :   :}
> I don't see why you need to treat it as another SetTblProperties statement.
Done


PS2, Line 1143: opt_sort_by_cols:sort_by_cols
  :   KW_LIKE table_name:other_table
  :   opt_comment_val:comment
  :   file_format_create_table_val:file_format location_val:location
> nit: indentation
Done


PS2, Line 1182:   opt_sort_by_cols:sort_by_cols opt_comment_val:comment 
row_format_val:row_format serde_properties:serde_props
> nit: long line
Done


http://gerrit.cloudera.org:8080/#/c/6495/2/fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
File 
fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java:

PS2, Line 162: /**
 :* Analyzes the 'sort.by.columns' property to make sure the 
columns inside the property
 :* occur in the target table and are neither partitioning nor 
primary key columns.
 :*/
 :   public void analyzeSortByColumns() throws AnalysisException {
 : StringBuilder error = new StringBuilder();
 : TablePropertyAnalyzer.analyzeSortByColumns(tblProperties_, 
getTargetTable(), error);
 : if (error.length() > 0) throw new 
AnalysisException(error.toString());
 :   }
> Along the lines of my previous comment: a) I would move this completely out
Done


http://gerrit.cloudera.org:8080/#/c/6495/2/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

PS2, Line 275: Boolean
> boolean? why class?
Done


PS2, Line 2486: Preconditions.checkState(!globalState_.warningsRetrieved);
> This is kind of harsh :). Do we guarantee somewhere that no one will call a
We currently don't guarantee that, so we'd be adding that here. Should I remove 
it? Or add a comment to this method and getWarnings() to explain the behavior. 
I ran into this when trying to add warnings to the Analyzer after getWarnings() 
had been called and my intention was to prevent that error in the future. I 
don't feel strongly about keeping this.


http://gerrit.cloudera.org:8080/#/c/6495/2/fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java
File fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java:

PS2, Line 119: sortByColumns_.size() > 0
> !sortByColumns_.isEmpty()
Done


PS2, Line 179: StringBuilder error = new StringBuilder();
 :   TablePropertyAnalyzer.analyzeSortByColumns(sortByColumns_, 
srcTable, error);
 :   if (error.length() > 0) throw new 
AnalysisException(error.toString());
> Why don't you just throw the AnalysisException from the analyzeSortByColumn
Done. When I started with this I thought I may have to call the method from 
places after the analysis so throwing AnalysisExceptions wouldn't work. That 
turned out to be false, so I now throw in analyzeSortByColumns() directly.


http://gerrit.cloudera.org:8080/#/c/6495/2/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java:

PS2, Line 382:  sortByExprs_.size() > 0
> !sortByExprs_.isEmpty()
Done


PS2, Line 510: sortByExprs_.add(resultExprs_.get(colIdx));
> Does this work ok if you have column permutation?
I think it does. colIdx is the index in the target table, and resultExprs_[i] 
gets written into column[i]. Lines 703ff in prepareExpressions seem to 
guarantee this. I added tests to the PlannerTest/insert-sort-by.test file.


PS2, Line 768: StringBuilder error = new StringBuilder();
 : sortByColumns_ = 
TablePropertyAnalyzer.analyzeSortByColumns(table_, error);
 : if (error.length() > 0) throw new 
AnalysisException(error.toString());
> Same comment as before. Move the throw clause inside the analyzeSortByColum
Done


http://gerrit.cloudera.org:8080/#/c/6495/2/fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java
File fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java:

PS2, Line 33: /**
:  * This class collects various methods to analyze table 
properties.
:  *

[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-03-29 Thread Lars Volker (Code Review)
Lars Volker has uploaded a new patch set (#3).

Change subject: IMPALA-4166: Add SORT BY sql clause
..

IMPALA-4166: Add SORT BY sql clause

This change adds support for adding SORT BY (...) clauses to CREATE
TABLE and ALTER TABLE statements. The columns in the SORT BY clause are
stored in the 'sort.by.columns' table property and will result in an
additional SORT node being added to the plan before the final table
sink. Specifying SORT BY columns also enables clustering during inserts,
so the SORT node will contain all partitioning columns first, followed
by the SORT BY columns.

This change introduces a new TablePropertyAnalyzer class to collect
various methods used to analyze table properties.

Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
---
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
A 
fe/src/main/java/org/apache/impala/analysis/AlterTableSetSortByColumnsStmt.java
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
A fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java
M fe/src/main/java/org/apache/impala/catalog/Column.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test
A testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test
M testdata/workloads/functional-planner/queries/PlannerTest/insert.test
M testdata/workloads/functional-query/queries/QueryTest/alter-table.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-as-select.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test
M testdata/workloads/functional-query/queries/QueryTest/create-table.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
30 files changed, 845 insertions(+), 54 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/6495/3
-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 3
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Dimitris Tsirogiannis 
Gerrit-Reviewer: Lars Volker 


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-03-28 Thread Dimitris Tsirogiannis (Code Review)
Dimitris Tsirogiannis has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 2:

(20 comments)

Initial pass, haven't looked at the tests yet.

http://gerrit.cloudera.org:8080/#/c/6495/2/fe/src/main/cup/sql-parser.cup
File fe/src/main/cup/sql-parser.cup:

PS2, Line 986: // SORT BY columns are stored in the 'sort.by.columns' table 
property.
 : HashMap properties = new HashMap();
 : String columnString = Joiner.on(",").join(col_names);
 : 
properties.put(TablePropertyAnalyzer.TBL_PROP_SORT_BY_COLUMNS, columnString);
 : RESULT = new AlterTableSetTblProperties(table, null, 
TTablePropertyType.TBL_PROPERTY,
 : properties);
 :   :}
I don't see why you need to treat it as another SetTblProperties statement. I 
understand that eventually the sort by columns will be stored as table 
properties but this adds unnecessary complexity to the parser. I would simply 
create a new AlterTableSetSortByColumns statement.


PS2, Line 1143: opt_sort_by_cols:sort_by_cols
  :   KW_LIKE table_name:other_table
  :   opt_comment_val:comment
  :   file_format_create_table_val:file_format location_val:location
nit: indentation


PS2, Line 1182:   opt_sort_by_cols:sort_by_cols opt_comment_val:comment 
row_format_val:row_format serde_properties:serde_props
nit: long line


http://gerrit.cloudera.org:8080/#/c/6495/2/fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
File 
fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java:

PS2, Line 162: /**
 :* Analyzes the 'sort.by.columns' property to make sure the 
columns inside the property
 :* occur in the target table and are neither partitioning nor 
primary key columns.
 :*/
 :   public void analyzeSortByColumns() throws AnalysisException {
 : StringBuilder error = new StringBuilder();
 : TablePropertyAnalyzer.analyzeSortByColumns(tblProperties_, 
getTargetTable(), error);
 : if (error.length() > 0) throw new 
AnalysisException(error.toString());
 :   }
Along the lines of my previous comment: a) I would move this completely out of 
this class, b) create a new AlterTableSetSortByColumns and move the analysis 
logic there.


http://gerrit.cloudera.org:8080/#/c/6495/2/fe/src/main/java/org/apache/impala/analysis/Analyzer.java
File fe/src/main/java/org/apache/impala/analysis/Analyzer.java:

PS2, Line 275: Boolean
boolean? why class?


PS2, Line 2486: Preconditions.checkState(!globalState_.warningsRetrieved);
This is kind of harsh :). Do we guarantee somewhere that no one will call 
addWarning after warnings have been retrieved?


http://gerrit.cloudera.org:8080/#/c/6495/2/fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java
File fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java:

PS2, Line 119: sortByColumns_.size() > 0
!sortByColumns_.isEmpty()


PS2, Line 179: StringBuilder error = new StringBuilder();
 :   TablePropertyAnalyzer.analyzeSortByColumns(sortByColumns_, 
srcTable, error);
 :   if (error.length() > 0) throw new 
AnalysisException(error.toString());
Why don't you just throw the AnalysisException from the analyzeSortByColumns()?


http://gerrit.cloudera.org:8080/#/c/6495/2/fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
File fe/src/main/java/org/apache/impala/analysis/InsertStmt.java:

PS2, Line 382:  sortByExprs_.size() > 0
!sortByExprs_.isEmpty()


PS2, Line 510: sortByExprs_.add(resultExprs_.get(colIdx));
Does this work ok if you have column permutation?


PS2, Line 768: StringBuilder error = new StringBuilder();
 : sortByColumns_ = 
TablePropertyAnalyzer.analyzeSortByColumns(table_, error);
 : if (error.length() > 0) throw new 
AnalysisException(error.toString());
Same comment as before. Move the throw clause inside the analyzeSortByColumns() 
function.


http://gerrit.cloudera.org:8080/#/c/6495/2/fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java
File fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java:

PS2, Line 33: /**
:  * This class collects various methods to analyze table 
properties.
:  */
: public class TablePropertyAnalyzer {
You're only analyzing the sort by columns, so why this generic class? Can't we 
move these methods to the TableDef class?


PS2, Line 41: Analyzes the 'sort.by.columns' property of 'table'.
What is the return value?


PS2, Line 86: List partitionCols, List primaryKeyCols
You can simply pass one list which includes the list of column names to exclude 
from the sort columns. If you want a specific error message you can construct 
it in function analyzeSortByC

[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-03-28 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 2:

PS2 adds a file I had forgotten to "git add" in PS1.

-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Lars Volker 
Gerrit-HasComments: No


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-03-28 Thread Lars Volker (Code Review)
Lars Volker has uploaded a new patch set (#2).

Change subject: IMPALA-4166: Add SORT BY sql clause
..

IMPALA-4166: Add SORT BY sql clause

This change adds support for adding SORT BY (...) clauses to CREATE
TABLE and ALTER TABLE statements. The columns in the SORT BY clause are
stored in the 'sort.by.columns' table property and will result in an
additional SORT node being added to the plan before the final table
sink. Specifying SORT BY columns also enables clustering during inserts,
so the SORT node will contain all partitioning columns first, followed
by the SORT BY columns.

This change introduces a new TablePropertyAnalyzer class to collect
various methods used to analyze table properties.

Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
---
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
A fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java
M fe/src/main/java/org/apache/impala/catalog/Column.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test
A testdata/workloads/functional-planner/queries/PlannerTest/insert-sort-by.test
M testdata/workloads/functional-planner/queries/PlannerTest/insert.test
M testdata/workloads/functional-query/queries/QueryTest/alter-table.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-as-select.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test
M testdata/workloads/functional-query/queries/QueryTest/create-table.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
29 files changed, 730 insertions(+), 51 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/6495/2
-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newpatchset
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 2
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Lars Volker 


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-03-27 Thread Lars Volker (Code Review)
Lars Volker has posted comments on this change.

Change subject: IMPALA-4166: Add SORT BY sql clause
..


Patch Set 1:

(1 comment)

http://gerrit.cloudera.org:8080/#/c/6495/1/fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java
File fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java:

Line 41:* Analyzes the 'sort.by.columns' property of 'table'.
I'd be glad for a tip how to organize the different wrappers and comments.


-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: comment
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 
Gerrit-Reviewer: Lars Volker 
Gerrit-HasComments: Yes


[Impala-ASF-CR] IMPALA-4166: Add SORT BY sql clause

2017-03-27 Thread Lars Volker (Code Review)
Lars Volker has uploaded a new change for review.

  http://gerrit.cloudera.org:8080/6495

Change subject: IMPALA-4166: Add SORT BY sql clause
..

IMPALA-4166: Add SORT BY sql clause

This change adds support for adding SORT BY (...) clauses to CREATE
TABLE and ALTER TABLE statements. The columns in the SORT BY clause are
stored in the 'sort.by.columns' table property and will result in an
additional SORT node being added to the plan before the final table
sink. Specifying SORT BY columns also enables clustering during inserts,
so the SORT node will contain all partitioning columns first, followed
by the SORT BY columns.

This change introduces a new TablePropertyAnalyzer class to collect
various methods used to analyze table properties.

Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
---
M common/thrift/JniCatalog.thrift
M fe/src/main/cup/sql-parser.cup
M fe/src/main/java/org/apache/impala/analysis/AlterTableSetTblProperties.java
M fe/src/main/java/org/apache/impala/analysis/Analyzer.java
M fe/src/main/java/org/apache/impala/analysis/BaseTableRef.java
M fe/src/main/java/org/apache/impala/analysis/CreateTableLikeStmt.java
M fe/src/main/java/org/apache/impala/analysis/InsertStmt.java
M fe/src/main/java/org/apache/impala/analysis/TableDef.java
A fe/src/main/java/org/apache/impala/analysis/TablePropertyAnalyzer.java
M fe/src/main/java/org/apache/impala/catalog/Column.java
M fe/src/main/java/org/apache/impala/catalog/Table.java
M fe/src/main/java/org/apache/impala/planner/Planner.java
M fe/src/main/java/org/apache/impala/service/CatalogOpExecutor.java
M fe/src/main/jflex/sql-scanner.flex
M fe/src/test/java/org/apache/impala/analysis/AnalyzeDDLTest.java
M fe/src/test/java/org/apache/impala/analysis/AnalyzeStmtsTest.java
M fe/src/test/java/org/apache/impala/analysis/ParserTest.java
M fe/src/test/java/org/apache/impala/analysis/ToSqlTest.java
M fe/src/test/java/org/apache/impala/common/FrontendTestBase.java
M fe/src/test/java/org/apache/impala/planner/PlannerTest.java
M testdata/workloads/functional-planner/queries/PlannerTest/ddl.test
M testdata/workloads/functional-planner/queries/PlannerTest/insert.test
M testdata/workloads/functional-query/queries/QueryTest/alter-table.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-as-select.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-file.test
M 
testdata/workloads/functional-query/queries/QueryTest/create-table-like-table.test
M testdata/workloads/functional-query/queries/QueryTest/create-table.test
M testdata/workloads/functional-query/queries/QueryTest/show-create-table.test
28 files changed, 609 insertions(+), 51 deletions(-)


  git pull ssh://gerrit.cloudera.org:29418/Impala-ASF refs/changes/95/6495/1
-- 
To view, visit http://gerrit.cloudera.org:8080/6495
To unsubscribe, visit http://gerrit.cloudera.org:8080/settings

Gerrit-MessageType: newchange
Gerrit-Change-Id: I08834f38a941786ab45a4381c2732d929a934f75
Gerrit-PatchSet: 1
Gerrit-Project: Impala-ASF
Gerrit-Branch: master
Gerrit-Owner: Lars Volker 


  1   2   >