fsk119 commented on code in PR #20696: URL: https://github.com/apache/flink/pull/20696#discussion_r961242871
########## docs/content.zh/docs/dev/table/hiveCompatibility/hiveDialect/Queries/overview.md: ########## @@ -0,0 +1,163 @@ +--- +title: "Overview" +weight: 1 +type: docs +aliases: +- /dev/table/hiveCompatibility/hiveDialect/Queries/overview +--- +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + http://www.apache.org/licenses/LICENSE-2.0 +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +# Queries + +## Description + +Hive dialect supports a commonly-used subset of Hive’s [DQL](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select). +The following lists some parts of HiveQL supported by the Hive dialect. + +- [Sort/Cluster/Distributed BY]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/sort-cluster-distribute-by" >}}) +- [Group By]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/group-by" >}}) +- [Join]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/join" >}}) +- [Set Operation]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/set-op" >}}) +- [Lateral View]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/lateral-view" >}}) +- [Window Functions]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/window-functions" >}}) +- [SubQueries]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/sub-queries" >}}) +- [CTE]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/cte" >}}) +- [Transform]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/transform" >}}) +- [Table Sample]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/table-sample" >}}) + +## Syntax + +The following section describes the overall query syntax. +The SELECT clause can be part of a query which also includes common table expressions (CTE), set operations, and various other clauses. + +```sql +[WITH CommonTableExpression (, CommonTableExpression)*] +SELECT [ALL | DISTINCT] select_expr, select_expr, ... + FROM table_reference + [WHERE where_condition] + [GROUP BY col_list] + [ORDER BY col_list] + [CLUSTER BY col_list + | [DISTRIBUTE BY col_list] [SORT BY col_list] + ] + [LIMIT [offset,] rows] +``` +- A `SELECT` statement can be part of a [set]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/set-op" >}}) query or a [subquery]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/sub-queries" >}}) of another query +- `table_reference` indicates the input to the query. It can be a regular table, a view, a [join]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/join" >}}) or a [subquery]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/sub-queries" >}}). +- Table names and column names are case-insensitive + +### WHERE Clause + +The `WHERE` condition is a boolean expression. Hive dialect supports a number of [operators and UDFS](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF) +in the `WHERE` clause. Some types of [sub queries]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/sub-queries" >}}) are supported in `WHERE` clause. Review Comment: sub queryies -> subqueries ? I find in some cases you use sub-query, sub query or subquery. I think it's better we can unify this. ########## docs/content.zh/docs/dev/table/hiveCompatibility/hiveDialect/Queries/overview.md: ########## @@ -0,0 +1,163 @@ +--- +title: "Overview" +weight: 1 +type: docs +aliases: +- /dev/table/hiveCompatibility/hiveDialect/Queries/overview +--- +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + http://www.apache.org/licenses/LICENSE-2.0 +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +# Queries + +## Description + +Hive dialect supports a commonly-used subset of Hive’s [DQL](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select). +The following lists some parts of HiveQL supported by the Hive dialect. + +- [Sort/Cluster/Distributed BY]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/sort-cluster-distribute-by" >}}) +- [Group By]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/group-by" >}}) +- [Join]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/join" >}}) +- [Set Operation]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/set-op" >}}) +- [Lateral View]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/lateral-view" >}}) +- [Window Functions]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/window-functions" >}}) +- [SubQueries]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/sub-queries" >}}) +- [CTE]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/cte" >}}) +- [Transform]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/transform" >}}) +- [Table Sample]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/table-sample" >}}) + +## Syntax + +The following section describes the overall query syntax. +The SELECT clause can be part of a query which also includes common table expressions (CTE), set operations, and various other clauses. + +```sql +[WITH CommonTableExpression (, CommonTableExpression)*] Review Comment: Could you describe what is CommonTableExpression? I don't find any description or example in the doc. ########## docs/content.zh/docs/dev/table/hiveCompatibility/hiveDialect/Queries/sort-cluster-distribute-by.md: ########## @@ -0,0 +1,94 @@ +--- +title: "Sort/Cluster/Distributed By" +weight: 2 +type: docs +--- +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + http://www.apache.org/licenses/LICENSE-2.0 +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +# Sort/Cluster/Distributed by Clause + +## Sort By + +### Description + +Unlike [ORDER BY]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/overview" >}}#order-by-clause) which guarantees a total order of output, +`SORT BY` only guarantees the result rows with each partition is in the user specified order. +So when there's more than one partition, `SORT BY` may return result that's partially ordered. + +### Syntax + +```sql +colOrder: ( ASC | DESC ) +sortBy: SORT BY BY expression [ , ... ] +query: SELECT expression [ , ... ] FROM src sortBy +``` + +### Parameters +- colOrder + + it's used specified the order of returned rows. The default order is `ASC`. + +### Examples + +```sql +SELECT x, y FROM t SORT BY x; +SELECT x, y FROM t SORT BY abs(y) DESC; +``` + +## Distribute By + +### Description + +The `DISTRIBUTE BY` clause is used to repartition the data. +The data with same value evaluated by the specified expression will be in same partition. + +### Syntax + +```sql +distributeBy: DISTRIBUTE BY expression [ , ... ] +query: SELECT expression [ , ... ] FROM src distributeBy +``` + +### Examples + +```sql +SELECT x, y FROM t DISTRIBUTE BY x; +SELECT x, y FROM t DISTRIBUTE BY abs(y); +``` + +## Cluster By + +### Description + +`CLUSTER BY` is a short-cut for both `DISTRIBUTE BY` and `SORT BY`. Review Comment: Can cluster by clause specify sort by order? ########## docs/content.zh/docs/dev/table/hiveCompatibility/hiveDialect/Queries/sort-cluster-distribute-by.md: ########## @@ -0,0 +1,94 @@ +--- +title: "Sort/Cluster/Distributed By" +weight: 2 +type: docs +--- +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + http://www.apache.org/licenses/LICENSE-2.0 +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +# Sort/Cluster/Distributed by Clause + +## Sort By + +### Description + +Unlike [ORDER BY]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/overview" >}}#order-by-clause) which guarantees a total order of output, +`SORT BY` only guarantees the result rows with each partition is in the user specified order. +So when there's more than one partition, `SORT BY` may return result that's partially ordered. + +### Syntax + +```sql +colOrder: ( ASC | DESC ) +sortBy: SORT BY BY expression [ , ... ] +query: SELECT expression [ , ... ] FROM src sortBy +``` + +### Parameters +- colOrder + + it's used specified the order of returned rows. The default order is `ASC`. + +### Examples + +```sql +SELECT x, y FROM t SORT BY x; +SELECT x, y FROM t SORT BY abs(y) DESC; +``` + +## Distribute By + +### Description + +The `DISTRIBUTE BY` clause is used to repartition the data. +The data with same value evaluated by the specified expression will be in same partition. + +### Syntax + +```sql +distributeBy: DISTRIBUTE BY expression [ , ... ] +query: SELECT expression [ , ... ] FROM src distributeBy +``` + +### Examples + +```sql +SELECT x, y FROM t DISTRIBUTE BY x; +SELECT x, y FROM t DISTRIBUTE BY abs(y); Review Comment: Users can use DISTRIBUTE BY with SORT BY. Could you give an example about it. ########## docs/content.zh/docs/dev/table/hiveCompatibility/hiveDialect/Queries/join.md: ########## @@ -0,0 +1,105 @@ +--- +title: "Join" +weight: 4 +type: docs +--- +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + http://www.apache.org/licenses/LICENSE-2.0 +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +# Join + +## Description + +`JOIN` is used to combine rows from two relations based on join condition. + +## Syntax + +Hive Dialect supports the following syntax for joining tables: +```sql +join_table: + table_reference [ INNER ] JOIN table_factor [ join_condition ] + | table_reference { LEFT | RIGHT | FULL } [ OUTER ] JOIN table_reference join_condition + | table_reference LEFT SEMI JOIN table_reference [ ON expression ] + | table_reference CROSS JOIN table_reference [ join_condition ] + +table_reference: + table_factor + | join_table + +table_factor: + tbl_name [ alias ] + | table_subquery alias + | ( table_references ) Review Comment: what is table_references here? ########## docs/content.zh/docs/dev/table/hiveCompatibility/hiveDialect/create.md: ########## @@ -0,0 +1,247 @@ +--- +title: "CREATE Statements" +weight: 2 +type: docs +aliases: +- /dev/table/hiveCompatibility/hiveDialect/create.html +--- +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + http://www.apache.org/licenses/LICENSE-2.0 +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +# CREATE Statements + +With Hive dialect, the following CREATE statements are supported for now: + +- CREATE DATABASE +- CREATE TABLE +- CREATE VIEW +- CREATE MARCO +- CREATE FUNCTION + +## CREATE DATABASE + +### Description + +`CREATE DATABASE` statement is used to create a database with the specified name. + +### Syntax + +```sql +CREATE (DATABASE|SCHEMA) [IF NOT EXISTS] database_name + [COMMENT database_comment] + [LOCATION hdfs_path] + [WITH DBPROPERTIES (property_name=property_value, ...)]; +``` + +### Examples + +```sql +CREATE DATABASE db1; +CREATE DATABASE IF NOT EXISTS db1 COMMENT 'db1' LOCATION '/user/hive/warehouse/db1' + WITH DBPROPERTIES ('name'='example-db'); +``` + + +## CREATE TABLE + +### Description + +`CREATE TABLE` statement is used to define a table in an existing database. + +### Syntax + +```sql +CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name + [(col_name data_type [column_constraint] [COMMENT col_comment], ... [table_constraint])] + [COMMENT table_comment] + [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)] + [ + [ROW FORMAT row_format] + [STORED AS file_format] + ] + [LOCATION fs_path] + [TBLPROPERTIES (property_name=property_value, ...)] + [AS select_statment]; + +data_type + : primitive_type + | array_type + | map_type + | struct_type +primitive_type + : TINYINT + | SMALLINT + | INT + | BIGINT + | BOOLEAN + | FLOAT + | DOUBLE + | DOUBLE PRECISION + | STRING + | BINARY + | TIMESTAMP + | DECIMAL + | DECIMAL(precision, scale) + | DATE + | VARCHAR + | CHAR +array_type + : ARRAY < data_type > + +array_type + : ARRAY < data_type > +struct_type + : STRUCT < col_name : data_type [COMMENT col_comment], ...> +row_format: + : DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]] [COLLECTION ITEMS TERMINATED BY char] + [MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char] + [NULL DEFINED AS char] + | SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, ...)] +file_format: + : SEQUENCEFILE + | TEXTFILE + | RCFILE + | ORC + | PARQUET + | AVRO + | INPUTFORMAT input_format_classname OUTPUTFORMAT output_format_classname +column_constraint: + : NOT NULL +table_constraint: + : [CONSTRAINT constraint_name] PRIMARY KEY (col_name, ...) +``` + +{{< hint warning >}} +**NOTE:** + +- Create table with `STORED BY 'class_name'` / `CLUSTERED BY` / `SKEWED BY` is not supported yet. Review Comment: I don't find the `CLUSTERED BY` and `SKEWED BY` in the syntax. I think we can remove this. ########## docs/content.zh/docs/dev/table/hiveCompatibility/hiveDialect/create.md: ########## @@ -0,0 +1,247 @@ +--- +title: "CREATE Statements" +weight: 2 +type: docs +aliases: +- /dev/table/hiveCompatibility/hiveDialect/create.html +--- +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + http://www.apache.org/licenses/LICENSE-2.0 +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +# CREATE Statements + +With Hive dialect, the following CREATE statements are supported for now: + +- CREATE DATABASE +- CREATE TABLE +- CREATE VIEW +- CREATE MARCO +- CREATE FUNCTION + +## CREATE DATABASE + +### Description + +`CREATE DATABASE` statement is used to create a database with the specified name. + +### Syntax + +```sql +CREATE (DATABASE|SCHEMA) [IF NOT EXISTS] database_name + [COMMENT database_comment] + [LOCATION hdfs_path] + [WITH DBPROPERTIES (property_name=property_value, ...)]; +``` + +### Examples + +```sql +CREATE DATABASE db1; +CREATE DATABASE IF NOT EXISTS db1 COMMENT 'db1' LOCATION '/user/hive/warehouse/db1' + WITH DBPROPERTIES ('name'='example-db'); +``` + + +## CREATE TABLE + +### Description + +`CREATE TABLE` statement is used to define a table in an existing database. + +### Syntax + +```sql +CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name + [(col_name data_type [column_constraint] [COMMENT col_comment], ... [table_constraint])] + [COMMENT table_comment] + [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)] + [ + [ROW FORMAT row_format] + [STORED AS file_format] + ] + [LOCATION fs_path] + [TBLPROPERTIES (property_name=property_value, ...)] + [AS select_statment]; Review Comment: create table doesn't have AS caluse ########## docs/content.zh/docs/dev/table/hiveCompatibility/hiveDialect/Queries/group-by.md: ########## @@ -0,0 +1,129 @@ +--- +title: "Group By" +weight: 3 +type: docs +--- +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + http://www.apache.org/licenses/LICENSE-2.0 +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +# Group By Clause + +## Description + +The `Group by` clause is used to compute a single result from multiple input rows with given aggregation function. +Hive dialect also supports enhanced aggregation features to do multiple aggregations based on the same record by using +`ROLLUP`/`CUBE`/`GROUPING SETS`. + +## Syntax + +```sql +groupByClause: groupByClause-1 | groupByClause-2 +groupByClause-1: GROUP BY group_expression [, ...] [ WITH ROLLUP | WITH CUBE ] + +groupByClause-2: GROUP BY { group_expression | { ROLLUP | CUBE | GROUPING SETS } ( grouping_set [, ...] ) } [, ...] +grouping_set: { expression | ( [ expression [, ...] ] ) } + +groupByQuery: SELECT expression [, ...] FROM src groupByClause? +``` +In `group_expression`, columns can be also specified by position number. But please remember: +- For Hive 0.11.0 through 2.1.x, set `hive.groupby.orderby.position.alias` to true (the default is false) +- For Hive 2.2.0 and later, set `hive.groupby.position.alias` to true (the default is false) + +## Parameters + +### GROUPING SETS + +`GROUPING SETS` allow for more complex grouping operations than those describable by a standard `GROUP BY`. +Rows are grouped separately by each specified grouping set and aggregates are computed for each group just as for simple `GROUP BY` clauses. + +All `GROUPING SET` clauses can be logically expressed in terms of several `GROUP BY` queries connected by `UNION`. + +For example: +```sql +SELECT a, b, SUM( c ) FROM tab1 GROUP BY a, b GROUPING SETS ( (a, b), a, b, ( ) ) +``` +is equivalent to +```sql +SELECT a, b, SUM( c ) FROM tab1 GROUP BY a, b +UNION +SELECT a, null, SUM( c ) FROM tab1 GROUP BY a, null +UNION +SELECT null, b, SUM( c ) FROM tab1 GROUP BY null, b +UNION +SELECT null, null, SUM( c ) FROM tab1 +``` +When aggregates are displayed for a column its value is null. This may conflict in case the column itself has some null values. +There needs to be some way to identify NULL in column, which means aggregate and NULL in column, which means `GROUPING__ID` function is the solution to that. + +This function returns a bitvector corresponding to whether each column is present or not. +For each column, a value of "1" is produced for a row in the result set if that column has been aggregated in that row, otherwise the value is "0". +This can be used to differentiate when there are nulls in the data. +For more details, please refer to Hive's docs [Grouping__ID function](https://cwiki.apache.org/confluence/display/Hive/Enhanced+Aggregation%2C+Cube%2C+Grouping+and+Rollup#EnhancedAggregation,Cube,GroupingandRollup-Grouping__IDfunction). + +Also, there's [Grouping function](https://cwiki.apache.org/confluence/display/Hive/Enhanced+Aggregation%2C+Cube%2C+Grouping+and+Rollup#EnhancedAggregation,Cube,GroupingandRollup-Groupingfunction) indicates whether an expression in a `GROUP BY` clause is aggregated or not for a given row. +The value 0 represents a column that is part of the grouping set, while the value 1 represents a column that is not part of the grouping set. + +### ROLLUP + +`ROLLUP` is a shorthand notation for specifying a common type of grouping set. +It represents the given list of expressions and all prefixes of the list, including the empty list. +For example: +```sql +GROUP BY a, b, c WITH ROLLUP +``` +is equivalent to +```sql +GROUP BY a, b, c GROUPING SETS ( (a, b, c), (a, b), (a), ( )). +``` + +### CUBE + +`CUBE` is a shorthand notation for specifying a common type of grouping set. +It represents the given list and all of its possible subsets - the power set. + +For example: +```sql +GROUP BY a, b, c, WITH CUBE Review Comment: GROUP BY a, b, c WITH CUBE ? ########## docs/content.zh/docs/dev/table/hiveCompatibility/hiveDialect/Queries/transform.md: ########## @@ -0,0 +1,129 @@ +--- +title: "Transform Clause" +weight: 10 +type: docs +--- +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + http://www.apache.org/licenses/LICENSE-2.0 +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +# Transform Clause + +## Description + +The `TRANSFORM` clause allows user to transform inputs using user-specified command or script. + +## Syntax + +```sql +rowFormat + : ROW FORMAT + (DELIMITED [FIELDS TERMINATED BY char] + [COLLECTION ITEMS TERMINATED BY char] + [MAP KEYS TERMINATED BY char] + [ESCAPED BY char] + [LINES SEPARATED BY char] + | + SERDE serde_name [WITH SERDEPROPERTIES + property_name=property_value, + property_name=property_value, ...]) + +outRowFormat : rowFormat +inRowFormat : rowFormat +outRecordReader : RECORDREADER className +inRecordWriter: RECORDWRITER record_write_class + +query: + SELECT TRANSFORM '(' expression [ , ... ] ')' Review Comment: ``` SELECT TRANSFORM (<expresssion> [, ...]) [ inRowFormat ] [ inRecordWriter ] USING command_or_script [ AS colName [colType][, ...]] [ outRowFormat [outRecordReader]] ``` It's better we can move the query at the beginning and explain each part later. ########## docs/content.zh/docs/dev/table/hiveCompatibility/hiveDialect/create.md: ########## @@ -0,0 +1,247 @@ +--- +title: "CREATE Statements" +weight: 2 +type: docs +aliases: +- /dev/table/hiveCompatibility/hiveDialect/create.html +--- +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + http://www.apache.org/licenses/LICENSE-2.0 +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +# CREATE Statements + +With Hive dialect, the following CREATE statements are supported for now: + +- CREATE DATABASE +- CREATE TABLE +- CREATE VIEW +- CREATE MARCO +- CREATE FUNCTION + +## CREATE DATABASE + +### Description + +`CREATE DATABASE` statement is used to create a database with the specified name. + +### Syntax + +```sql +CREATE (DATABASE|SCHEMA) [IF NOT EXISTS] database_name + [COMMENT database_comment] + [LOCATION hdfs_path] + [WITH DBPROPERTIES (property_name=property_value, ...)]; +``` + +### Examples + +```sql +CREATE DATABASE db1; +CREATE DATABASE IF NOT EXISTS db1 COMMENT 'db1' LOCATION '/user/hive/warehouse/db1' + WITH DBPROPERTIES ('name'='example-db'); +``` + + +## CREATE TABLE + +### Description + +`CREATE TABLE` statement is used to define a table in an existing database. + +### Syntax + +```sql +CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name + [(col_name data_type [column_constraint] [COMMENT col_comment], ... [table_constraint])] + [COMMENT table_comment] + [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)] + [ + [ROW FORMAT row_format] + [STORED AS file_format] + ] + [LOCATION fs_path] + [TBLPROPERTIES (property_name=property_value, ...)] + [AS select_statment]; + +data_type + : primitive_type + | array_type + | map_type + | struct_type +primitive_type + : TINYINT + | SMALLINT + | INT + | BIGINT + | BOOLEAN + | FLOAT + | DOUBLE + | DOUBLE PRECISION + | STRING + | BINARY + | TIMESTAMP + | DECIMAL + | DECIMAL(precision, scale) + | DATE + | VARCHAR + | CHAR +array_type + : ARRAY < data_type > + +array_type + : ARRAY < data_type > +struct_type + : STRUCT < col_name : data_type [COMMENT col_comment], ...> +row_format: + : DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]] [COLLECTION ITEMS TERMINATED BY char] + [MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char] + [NULL DEFINED AS char] + | SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, ...)] +file_format: + : SEQUENCEFILE + | TEXTFILE + | RCFILE + | ORC + | PARQUET + | AVRO + | INPUTFORMAT input_format_classname OUTPUTFORMAT output_format_classname +column_constraint: + : NOT NULL +table_constraint: + : [CONSTRAINT constraint_name] PRIMARY KEY (col_name, ...) +``` + +{{< hint warning >}} +**NOTE:** + +- Create table with `STORED BY 'class_name'` / `CLUSTERED BY` / `SKEWED BY` is not supported yet. +- Create temporary table is not supported yet. + {{< /hint >}} + +### Examples + +```sql +-- creaet non-partition table +CREATE TABLE t1(key string, value string); + +-- creaet partitioned table +CREATE TABLE pt1(key string, value string) PARTITIONED by (year int, month int); + +-- creaet table with specifc format +CREATE TABLE t1(key string, value string) stored as ORC; + +-- create table with specifc rowfromat +CREATE TABLE t1(m MAP<BIGINT, STRING>) + ROW FROMAT DELIMITED COLLECTION ITEMS TERMINATED BY ';' + MAP KEYS TERMINATED BY ':'; + +-- create table as select +CREATE TABLE t2 AS SELECT key, COUNT(1) FROM t1 GROUP BY key; +``` + +## CREATE VIEW + +### Description + +`View` +`CREATE VIEW` creates a view with the given name. Review Comment: Remove `View` at the beginning. `CREATE VIEW` statement creates ... ########## docs/content.zh/docs/dev/table/hiveCompatibility/hiveDialect/Queries/set-op.md: ########## @@ -0,0 +1,95 @@ +--- +title: "Set Operations" +weight: 5 +type: docs +--- +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + http://www.apache.org/licenses/LICENSE-2.0 +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +# Set Operations + +Set Operation is used to combing two select into single one. Review Comment: is used to combine the result from multiple SELECT statements into a single result set. ########## docs/content.zh/docs/dev/table/hiveCompatibility/hiveDialect/set.md: ########## @@ -0,0 +1,72 @@ +--- +title: "SET Statements" +weight: 8 +type: docs +aliases: +- /dev/table/hiveCompatibility/hiveDialect/create.html +--- +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + http://www.apache.org/licenses/LICENSE-2.0 +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +# SET Statements + +## Description + +The `SET` statement sets a property which provide a ways to set variables for a session and +configuration property including system variable and Hive configuration. +But environment variable can't be set via `SET` statement. The behavior of `SET` with Hive dialect is compatible to Hive's. + +## Syntax + +```sql +SET key=value; Review Comment: The examples are much different from the syntax... ########## docs/content/docs/dev/table/hiveCompatibility/hiveDialect/load-data.md: ########## @@ -0,0 +1,86 @@ +--- +title: "Load Data Statements" +weight: 4 +type: docs +aliases: +- /dev/table/hiveCompatibility/hiveDialect/load.html +--- +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + http://www.apache.org/licenses/LICENSE-2.0 +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +# Load Data Statements + +## Description + +The `LOAD DATA` statement is used to load the data into a Hive table from the user specified directory or file. +The load operation are currently pure copy/move operations that move data files into locations corresponding to Hive tables. Review Comment: copy/move? It seems it always move the origin file no matter whether it parse the row successfully.... ########## docs/content.zh/docs/dev/table/hiveCompatibility/hiveDialect/Queries/set-op.md: ########## @@ -0,0 +1,95 @@ +--- +title: "Set Operations" +weight: 5 +type: docs +--- +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + http://www.apache.org/licenses/LICENSE-2.0 +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +# Set Operations + +Set Operation is used to combing two select into single one. +Hive dialect supports the following operations: +- UNION +- INTERSECT +- EXCEPT/MINUS + +## UNION + +### Description + +`UNION`/`UNION DISTINCT`/`UNION ALL` returns the rows that are found in either side. + +`UNION` and `UNION DISTINCT` only returns the distinct rows, while `UNION ALL` does not duplicate. + +### Syntax + +```sql +select_statement { UNION [ ALL | DISTINCT ] } select_statement [ .. ] Review Comment: use <query> to replace `select_statement` ########## docs/content.zh/docs/dev/table/hiveCompatibility/hiveDialect/Queries/sub-queries.md: ########## @@ -0,0 +1,69 @@ +--- +title: "Sub-Queries" +weight: 8 +type: docs +--- +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + http://www.apache.org/licenses/LICENSE-2.0 +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +# Sub-Queries + +## Sub-Queries in the FROM Clause + +### Description + +Hive dialect supports sub-queries in the `FROM` clause. The sub-query has to be given a name because every table in a `FROM` clause must have a name. +Columns in the sub-query select list must have unique names. +The columns in the sub-query select list are available in the outer query just like columns of a table. +The sub-query can also be a query expression with `UNION`. Hive dialect supports arbitrary levels of sub-queries. + +### Syntax + +```sql +select_statement from ( subquery_select_statement ) [ AS ] name Review Comment: The syntax is a little confusing to me. Can we explain more details to describe the select_statment or subquery_select_statement. Maybe we move the describtion to the overview.md and tell users the query is recursive. ########## docs/content.zh/docs/dev/table/hiveCompatibility/hiveDialect/Queries/overview.md: ########## @@ -0,0 +1,163 @@ +--- +title: "Overview" +weight: 1 +type: docs +aliases: +- /dev/table/hiveCompatibility/hiveDialect/Queries/overview +--- +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + http://www.apache.org/licenses/LICENSE-2.0 +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +# Queries + +## Description + +Hive dialect supports a commonly-used subset of Hive’s [DQL](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select). +The following lists some parts of HiveQL supported by the Hive dialect. + +- [Sort/Cluster/Distributed BY]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/sort-cluster-distribute-by" >}}) +- [Group By]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/group-by" >}}) +- [Join]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/join" >}}) +- [Set Operation]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/set-op" >}}) +- [Lateral View]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/lateral-view" >}}) +- [Window Functions]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/window-functions" >}}) +- [SubQueries]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/sub-queries" >}}) +- [CTE]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/cte" >}}) +- [Transform]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/transform" >}}) +- [Table Sample]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/table-sample" >}}) + +## Syntax + +The following section describes the overall query syntax. +The SELECT clause can be part of a query which also includes common table expressions (CTE), set operations, and various other clauses. + +```sql +[WITH CommonTableExpression (, CommonTableExpression)*] Review Comment: I just wonder whether it's better: ``` <query>: [WITH CommonTableExpression (, CommonTableExpression)*] SELECT [ALL | DISTINCT] col_list FROM table_reference | query [WHERE where_condition] [GROUP BY col_list] [ORDER BY col_list] [CLUSTER BY col_list | [DISTRIBUTE BY col_list] [SORT BY col_list] ] [LIMIT [offset,] rows] ``` to indicate the query is recursible. ########## docs/content/docs/dev/table/hiveCompatibility/hiveDialect/overview.md: ########## @@ -0,0 +1,97 @@ +--- +title: "Overview" +weight: 1 +type: docs +aliases: +- /dev/table/hiveCompatibility/hiveDialect/overview +--- +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + http://www.apache.org/licenses/LICENSE-2.0 +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +# Hive Dialect + +Flink allows users to write SQL statements in Hive syntax when Hive dialect is used. +By providing compatibility with Hive syntax, we aim to improve the interoperability with Hive and reduce the scenarios when users need to switch between Flink and Hive in order to execute different statements. + +## Use Hive Dialect + +Flink currently supports two SQL dialects: `default` and `hive`. You need to switch to Hive dialect +before you can write in Hive syntax. The following describes how to set dialect with +SQL Client and Table API. Also notice that you can dynamically switch dialect for each +statement you execute. There's no need to restart a session to use a different dialect. + +{{< hint warning >}} +**Note:** + +- To use Hive dialect, you have to add dependencies related to Hive. Please refer to [Hive dependencies]({{< ref "docs/connectors/table/hive/overview" >}}#dependencies) for how to add the dependencies. +- Please make sure the current catalog is [HiveCatalog]({{< ref "docs/connectors/table/hive/hive_catalog" >}}). Otherwise, it will fall back to Flink's `default` dialect. +- In order to have better syntax and semantic compatibility, it’s highly recommended to load [HiveModule]({{< ref "docs/connectors/table/hive/hive_functions" >}}#use-hive-built-in-functions-via-hivemodule) and + place it first in the module list, so that Hive built-in functions can be picked up during function resolution. + Please refer [here]({{< ref "docs/dev/table/modules" >}}#how-to-load-unload-use-and-list-modules) for how to change resolution order. +- Hive dialect only supports 2-part identifiers, so you can't specify catalog for an identifier. +- While all Hive versions support the same syntax, whether a specific feature is available still depends on the + [Hive version]({{< ref "docs/connectors/table/hive/overview" >}}#supported-hive-versions) you use. For example, updating database + location is only supported in Hive-2.4.0 or later. +{{< /hint >}} + +### SQL Client + +SQL dialect can be specified via the `table.sql-dialect` property. +Therefore,you can set the dialect after the SQL Client has launched. + +```bash +Flink SQL> SET 'table.sql-dialect' = 'hive'; -- to use hive dialect +[INFO] Session property has been set. + +Flink SQL> SET 'table.sql-dialect' = 'default'; -- to use default dialect +[INFO] Session property has been set. +``` + +{{< hint warning >}} +**Note:** +Since Flink 1.15, when you want to use Hive dialect in Flink SQL client, you have to swap the jar `flink-table-planner-loader` located in `FLINK_HOME/lib` Review Comment: in Flink SQL client -> in Flink SQL client or SQL Gateway ########## docs/content.zh/docs/dev/table/hiveCompatibility/hiveDialect/show.md: ########## @@ -0,0 +1,119 @@ +--- +title: "Show Statements" +weight: 5 +type: docs +aliases: +- /dev/table/hiveCompatibility/hiveDialect/create.html +--- +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + http://www.apache.org/licenses/LICENSE-2.0 +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +# SHOW Statements + +With Hive dialect, the following SHOW statements are supported for now: + +- SHOW DATABASES +- SHOW TABLES +- SHOW VIEWS +- SHOW PARTITIONS +- SHOW FUNCTIONS + +## SHOW DATABASES + +### Description + +`SHOW DATABASES` statement is used to list all the databases defined in the metastore. + +### Syntax + +```sql +SHOW (DATABASES|SCHEMAS); +``` +The use of `SCHEMA` and `DATABASE` are interchangeable - they mean the same thing. + + +## SHOW TABLES + +### Description + +`SHOW TABLES` statement lists all the base tables and views in the current database. + +### Syntax + +```sql +SHOW TABLES; +``` + +## SHOW VIEWS + +### Description + +`SHOW VIEWS` statement lists all the views in the current database. + +### Syntax + +```sql +SHOW VIEWS; +``` + +## SHOW PARTITIONS + +### Description + +`SHOW PARTITIONS` lists all the existing partitions or the partitions matching the specified partition spec for a given base table. + +### Syntax + +```sql +SHOW PARTITIONS table_name [ partition_spec ]; +partition_spec: + : (partition_column = partition_col_value, partition_column = partition_col_value, ...) +``` + +### Parameter + +- partition_spec + + The optional `partition_spec` is used to what kind of partition should be returned. + When specified, the partitions that match the `partition_spec` specification are returned. + The `partition_spec` can be partial. Review Comment: Can we explain more details what is partial? ########## docs/content/docs/dev/table/hiveCompatibility/hiveDialect/overview.md: ########## @@ -0,0 +1,97 @@ +--- +title: "Overview" +weight: 1 +type: docs +aliases: +- /dev/table/hiveCompatibility/hiveDialect/overview +--- +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + http://www.apache.org/licenses/LICENSE-2.0 +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +# Hive Dialect + +Flink allows users to write SQL statements in Hive syntax when Hive dialect is used. +By providing compatibility with Hive syntax, we aim to improve the interoperability with Hive and reduce the scenarios when users need to switch between Flink and Hive in order to execute different statements. + +## Use Hive Dialect + +Flink currently supports two SQL dialects: `default` and `hive`. You need to switch to Hive dialect +before you can write in Hive syntax. The following describes how to set dialect with +SQL Client and Table API. Also notice that you can dynamically switch dialect for each +statement you execute. There's no need to restart a session to use a different dialect. + +{{< hint warning >}} +**Note:** + +- To use Hive dialect, you have to add dependencies related to Hive. Please refer to [Hive dependencies]({{< ref "docs/connectors/table/hive/overview" >}}#dependencies) for how to add the dependencies. +- Please make sure the current catalog is [HiveCatalog]({{< ref "docs/connectors/table/hive/hive_catalog" >}}). Otherwise, it will fall back to Flink's `default` dialect. +- In order to have better syntax and semantic compatibility, it’s highly recommended to load [HiveModule]({{< ref "docs/connectors/table/hive/hive_functions" >}}#use-hive-built-in-functions-via-hivemodule) and + place it first in the module list, so that Hive built-in functions can be picked up during function resolution. + Please refer [here]({{< ref "docs/dev/table/modules" >}}#how-to-load-unload-use-and-list-modules) for how to change resolution order. +- Hive dialect only supports 2-part identifiers, so you can't specify catalog for an identifier. +- While all Hive versions support the same syntax, whether a specific feature is available still depends on the + [Hive version]({{< ref "docs/connectors/table/hive/overview" >}}#supported-hive-versions) you use. For example, updating database + location is only supported in Hive-2.4.0 or later. +{{< /hint >}} + +### SQL Client + +SQL dialect can be specified via the `table.sql-dialect` property. +Therefore,you can set the dialect after the SQL Client has launched. + +```bash +Flink SQL> SET 'table.sql-dialect' = 'hive'; -- to use hive dialect +[INFO] Session property has been set. + +Flink SQL> SET 'table.sql-dialect' = 'default'; -- to use default dialect +[INFO] Session property has been set. +``` + +{{< hint warning >}} +**Note:** +Since Flink 1.15, when you want to use Hive dialect in Flink SQL client, you have to swap the jar `flink-table-planner-loader` located in `FLINK_HOME/lib` Review Comment: BTW, we don't need to swap. Just move out is enough ########## docs/content/docs/dev/table/hiveCompatibility/hiveDialect/overview.md: ########## @@ -0,0 +1,97 @@ +--- +title: "Overview" +weight: 1 +type: docs +aliases: +- /dev/table/hiveCompatibility/hiveDialect/overview +--- +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + http://www.apache.org/licenses/LICENSE-2.0 +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +# Hive Dialect + +Flink allows users to write SQL statements in Hive syntax when Hive dialect is used. +By providing compatibility with Hive syntax, we aim to improve the interoperability with Hive and reduce the scenarios when users need to switch between Flink and Hive in order to execute different statements. + +## Use Hive Dialect + +Flink currently supports two SQL dialects: `default` and `hive`. You need to switch to Hive dialect +before you can write in Hive syntax. The following describes how to set dialect with +SQL Client and Table API. Also notice that you can dynamically switch dialect for each +statement you execute. There's no need to restart a session to use a different dialect. + +{{< hint warning >}} +**Note:** + +- To use Hive dialect, you have to add dependencies related to Hive. Please refer to [Hive dependencies]({{< ref "docs/connectors/table/hive/overview" >}}#dependencies) for how to add the dependencies. +- Please make sure the current catalog is [HiveCatalog]({{< ref "docs/connectors/table/hive/hive_catalog" >}}). Otherwise, it will fall back to Flink's `default` dialect. +- In order to have better syntax and semantic compatibility, it’s highly recommended to load [HiveModule]({{< ref "docs/connectors/table/hive/hive_functions" >}}#use-hive-built-in-functions-via-hivemodule) and + place it first in the module list, so that Hive built-in functions can be picked up during function resolution. + Please refer [here]({{< ref "docs/dev/table/modules" >}}#how-to-load-unload-use-and-list-modules) for how to change resolution order. +- Hive dialect only supports 2-part identifiers, so you can't specify catalog for an identifier. +- While all Hive versions support the same syntax, whether a specific feature is available still depends on the + [Hive version]({{< ref "docs/connectors/table/hive/overview" >}}#supported-hive-versions) you use. For example, updating database + location is only supported in Hive-2.4.0 or later. +{{< /hint >}} + +### SQL Client + +SQL dialect can be specified via the `table.sql-dialect` property. +Therefore,you can set the dialect after the SQL Client has launched. + +```bash +Flink SQL> SET 'table.sql-dialect' = 'hive'; -- to use hive dialect +[INFO] Session property has been set. + +Flink SQL> SET 'table.sql-dialect' = 'default'; -- to use default dialect Review Comment: It will throw exception actually. It should be ``` SET table.sql-dialect = default; ``` BTW, I think it's better we can support SET 'xxx' = 'xxx' in hive dialect. ########## docs/content.zh/docs/dev/table/hiveCompatibility/hiveDialect/Queries/overview.md: ########## @@ -0,0 +1,163 @@ +--- +title: "Overview" +weight: 1 +type: docs +aliases: +- /dev/table/hiveCompatibility/hiveDialect/Queries/overview +--- +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + http://www.apache.org/licenses/LICENSE-2.0 +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +# Queries + +## Description + +Hive dialect supports a commonly-used subset of Hive’s [DQL](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select). +The following lists some parts of HiveQL supported by the Hive dialect. + +- [Sort/Cluster/Distributed BY]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/sort-cluster-distribute-by" >}}) +- [Group By]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/group-by" >}}) +- [Join]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/join" >}}) +- [Set Operation]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/set-op" >}}) +- [Lateral View]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/lateral-view" >}}) +- [Window Functions]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/window-functions" >}}) +- [SubQueries]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/sub-queries" >}}) +- [CTE]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/cte" >}}) +- [Transform]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/transform" >}}) +- [Table Sample]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/table-sample" >}}) + +## Syntax + +The following section describes the overall query syntax. +The SELECT clause can be part of a query which also includes common table expressions (CTE), set operations, and various other clauses. + +```sql +[WITH CommonTableExpression (, CommonTableExpression)*] +SELECT [ALL | DISTINCT] select_expr, select_expr, ... + FROM table_reference + [WHERE where_condition] + [GROUP BY col_list] + [ORDER BY col_list] + [CLUSTER BY col_list + | [DISTRIBUTE BY col_list] [SORT BY col_list] + ] + [LIMIT [offset,] rows] +``` +- A `SELECT` statement can be part of a [set]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/set-op" >}}) query or a [subquery]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/sub-queries" >}}) of another query +- `table_reference` indicates the input to the query. It can be a regular table, a view, a [join]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/join" >}}) or a [subquery]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/sub-queries" >}}). +- Table names and column names are case-insensitive + +### WHERE Clause + +The `WHERE` condition is a boolean expression. Hive dialect supports a number of [operators and UDFS](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+UDF) +in the `WHERE` clause. Some types of [sub queries]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/sub-queries" >}}) are supported in `WHERE` clause. + +### GROUP BY Clause + +Please refer to [GROUP BY]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/group-by" >}}) for more details. + +### ORDER BY Clause + +The `ORDER BY` clause is used to return the result rows in a sorted manner in the user specified order. +Different from [SORT BY]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/sort-cluster-distribute-by" >}}#sort-by), `ORDER BY` clause guarantees +a total order in the output. Review Comment: a total order -> a global order ########## docs/content.zh/docs/dev/table/hiveCompatibility/hiveDialect/Queries/overview.md: ########## @@ -0,0 +1,163 @@ +--- +title: "Overview" +weight: 1 +type: docs +aliases: +- /dev/table/hiveCompatibility/hiveDialect/Queries/overview +--- +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + http://www.apache.org/licenses/LICENSE-2.0 +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +# Queries + +## Description + +Hive dialect supports a commonly-used subset of Hive’s [DQL](https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Select). +The following lists some parts of HiveQL supported by the Hive dialect. + +- [Sort/Cluster/Distributed BY]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/sort-cluster-distribute-by" >}}) +- [Group By]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/group-by" >}}) +- [Join]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/join" >}}) +- [Set Operation]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/set-op" >}}) +- [Lateral View]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/lateral-view" >}}) +- [Window Functions]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/window-functions" >}}) +- [SubQueries]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/sub-queries" >}}) +- [CTE]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/cte" >}}) +- [Transform]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/transform" >}}) +- [Table Sample]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/table-sample" >}}) + +## Syntax + +The following section describes the overall query syntax. +The SELECT clause can be part of a query which also includes common table expressions (CTE), set operations, and various other clauses. + +```sql +[WITH CommonTableExpression (, CommonTableExpression)*] +SELECT [ALL | DISTINCT] select_expr, select_expr, ... + FROM table_reference + [WHERE where_condition] + [GROUP BY col_list] + [ORDER BY col_list] + [CLUSTER BY col_list + | [DISTRIBUTE BY col_list] [SORT BY col_list] + ] + [LIMIT [offset,] rows] +``` +- A `SELECT` statement can be part of a [set]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/set-op" >}}) query or a [subquery]({{< ref "docs/dev/table/hiveCompatibility/hiveDialect/Queries/sub-queries" >}}) of another query Review Comment: A `SELECT` statement -> The `select_expr` ########## docs/content/docs/dev/table/hiveCompatibility/hiveDialect/load-data.md: ########## @@ -0,0 +1,86 @@ +--- +title: "Load Data Statements" +weight: 4 +type: docs +aliases: +- /dev/table/hiveCompatibility/hiveDialect/load.html +--- +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + http://www.apache.org/licenses/LICENSE-2.0 +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +# Load Data Statements + +## Description + +The `LOAD DATA` statement is used to load the data into a Hive table from the user specified directory or file. +The load operation are currently pure copy/move operations that move data files into locations corresponding to Hive tables. + +## Syntax + +```sql +LOAD DATA [LOCAL] INPATH 'filepath' [OVERWRITE] INTO TABLE tablename [PARTITION (partcol1=val1, partcol2=val2 ...)]; +``` + +## Parameters + +- filepath + + The `filepath` can be: + - a relative path, such as `warehouse/data1` + - an absolute path, such as `/user/hive/warehouse/data1` + - a full URL with schema and (optionally) an authority, such as `hdfs://namenode:9000/user/hive/warehouse/data1` + + The `filepath` can refer to a file (in which case, only the single file is loaded) or it can be a directory (in which case, all the files from + the directory are loaded). + +- `LOCAL` + + If specify `LOCAL` keyword, then: + - it will look for `filepath` in the local file system. If a relative path is specified, it will be interpreted relative to the users' current working directory. + The user can specify a full URI for local files as well - for example: file:///user/hive/warehouse/data1 + - it will try to **copy** all the files addressed by `filepath` to the target file system. + The target file system is inferred by looking at the location attribution. The coped data files will then be moved to the table. Review Comment: coped -> copied be moved to the table -> be moved to location of the table ########## docs/content.zh/docs/dev/table/hiveCompatibility/hiveDialect/add.md: ########## @@ -0,0 +1,54 @@ +--- +title: "ADD Statements" +weight: 7 +type: docs +aliases: +- /dev/table/hiveCompatibility/hiveDialect/create.html +--- +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + http://www.apache.org/licenses/LICENSE-2.0 +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +# ADD Statements + +With Hive dialect, the following `ADD` statements are supported for now: +- ADD JAR + +## ADD JAR + +### Description + +`ADD JAR` statement is used to add user jars into the classpath. +Add multiple jars file in single `ADD JAR` statement is not supported. + + +### Syntax + +```sql +ADD JAR filename; Review Comment: I think it should be `<jar_path>`. ########## docs/content.zh/docs/dev/table/hiveCompatibility/hiveDialect/create.md: ########## @@ -0,0 +1,247 @@ +--- +title: "CREATE Statements" +weight: 2 +type: docs +aliases: +- /dev/table/hiveCompatibility/hiveDialect/create.html +--- +<!-- +Licensed to the Apache Software Foundation (ASF) under one +or more contributor license agreements. See the NOTICE file +distributed with this work for additional information +regarding copyright ownership. The ASF licenses this file +to you under the Apache License, Version 2.0 (the +"License"); you may not use this file except in compliance +with the License. You may obtain a copy of the License at + http://www.apache.org/licenses/LICENSE-2.0 +Unless required by applicable law or agreed to in writing, +software distributed under the License is distributed on an +"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +KIND, either express or implied. See the License for the +specific language governing permissions and limitations +under the License. +--> + +# CREATE Statements + +With Hive dialect, the following CREATE statements are supported for now: + +- CREATE DATABASE +- CREATE TABLE +- CREATE VIEW +- CREATE MARCO +- CREATE FUNCTION + +## CREATE DATABASE + +### Description + +`CREATE DATABASE` statement is used to create a database with the specified name. + +### Syntax + +```sql +CREATE (DATABASE|SCHEMA) [IF NOT EXISTS] database_name + [COMMENT database_comment] + [LOCATION hdfs_path] + [WITH DBPROPERTIES (property_name=property_value, ...)]; +``` + +### Examples + +```sql +CREATE DATABASE db1; +CREATE DATABASE IF NOT EXISTS db1 COMMENT 'db1' LOCATION '/user/hive/warehouse/db1' + WITH DBPROPERTIES ('name'='example-db'); +``` + + +## CREATE TABLE + +### Description + +`CREATE TABLE` statement is used to define a table in an existing database. + +### Syntax + +```sql +CREATE [EXTERNAL] TABLE [IF NOT EXISTS] [db_name.]table_name + [(col_name data_type [column_constraint] [COMMENT col_comment], ... [table_constraint])] + [COMMENT table_comment] + [PARTITIONED BY (col_name data_type [COMMENT col_comment], ...)] + [ + [ROW FORMAT row_format] + [STORED AS file_format] + ] + [LOCATION fs_path] + [TBLPROPERTIES (property_name=property_value, ...)] + [AS select_statment]; + +data_type + : primitive_type + | array_type + | map_type + | struct_type +primitive_type + : TINYINT + | SMALLINT + | INT + | BIGINT + | BOOLEAN + | FLOAT + | DOUBLE + | DOUBLE PRECISION + | STRING + | BINARY + | TIMESTAMP + | DECIMAL + | DECIMAL(precision, scale) + | DATE + | VARCHAR + | CHAR +array_type + : ARRAY < data_type > + +array_type + : ARRAY < data_type > +struct_type + : STRUCT < col_name : data_type [COMMENT col_comment], ...> +row_format: + : DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]] [COLLECTION ITEMS TERMINATED BY char] + [MAP KEYS TERMINATED BY char] [LINES TERMINATED BY char] + [NULL DEFINED AS char] + | SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, ...)] +file_format: + : SEQUENCEFILE + | TEXTFILE + | RCFILE + | ORC + | PARQUET + | AVRO + | INPUTFORMAT input_format_classname OUTPUTFORMAT output_format_classname +column_constraint: + : NOT NULL +table_constraint: + : [CONSTRAINT constraint_name] PRIMARY KEY (col_name, ...) +``` + +{{< hint warning >}} +**NOTE:** + +- Create table with `STORED BY 'class_name'` / `CLUSTERED BY` / `SKEWED BY` is not supported yet. +- Create temporary table is not supported yet. + {{< /hint >}} + +### Examples + +```sql +-- creaet non-partition table +CREATE TABLE t1(key string, value string); + +-- creaet partitioned table +CREATE TABLE pt1(key string, value string) PARTITIONED by (year int, month int); + +-- creaet table with specifc format +CREATE TABLE t1(key string, value string) stored as ORC; + +-- create table with specifc rowfromat +CREATE TABLE t1(m MAP<BIGINT, STRING>) + ROW FROMAT DELIMITED COLLECTION ITEMS TERMINATED BY ';' + MAP KEYS TERMINATED BY ':'; + +-- create table as select +CREATE TABLE t2 AS SELECT key, COUNT(1) FROM t1 GROUP BY key; +``` + +## CREATE VIEW + +### Description + +`View` +`CREATE VIEW` creates a view with the given name. +If no column names are supplied, the names of the view's columns will be derived automatically from the defining SELECT expression. +(If the SELECT contains un-aliased scalar expressions such as x+y, the resulting view column names will be generated in the form _C0, _C1, etc.) +When renaming columns, column comments can also optionally be supplied. (Comments are not automatically inherited from underlying columns.) + +Note that a view is a purely logical object with no associated storage. When a query references a view, the view's definition is evaluated in order to produce a set of rows for further processing by the query. + +### Syntax + +```sql +CREATE VIEW [IF NOT EXISTS] [db_name.]view_name [(column_name, ...) ] + [COMMENT view_comment] + [TBLPROPERTIES (property_name = property_value, ...)] + AS SELECT ...; +``` + +### Examples + +```sql +CREATE VIEW IF NOT EXISTS v1 + (key COMMENT 'key') + COMMENT 'View for key=1' + AS SELECT key FROM src + WHERE key = '1'; +``` + +## CREATE MARCO + +### Description + +`CREATE TEMPORARY MACRO` statement creates a macro using the given optional list of columns as inputs to the expression. +Macros exists for the duration of the current session. + +### Syntax + +```sql +CREATE TEMPORARY MACRO macro_name([col_name col_type, ...]) expression; +``` + +### Examples + +```sql +CREATE TEMPORARY MACRO fixed_number() 42; +CREATE TEMPORARY MACRO string_len_plus_two(x string) length(x) + 2; +CREATE TEMPORARY MACRO simple_add (x int, y int) x + y; +``` + +## CREATE FUNCTION + +### Description + +` CREATE FUNCTION` statement creates a function that is implemented by the class_name. + +### Syntax + +#### Create Temporary Function + +```sql +CREATE TEMPORARY FUNCTION function_name AS class_name [USING JAR 'file_uri']; +``` + +The function exists for the duration of the current session. + +#### Create Permanent Function + +```sql +CREATE FUNCTION [db_name.]function_name AS class_name + [USING JAR 'file_uri']; +``` +The function is registered to metastore and will exist in all session unless the function is dropped. + +### Parameter +- `[USING JAR 'file_uri']` + + User can use the clause to add Jar that contains the implementation of the function along with its dependencies while creating the function. + The `file_uri` can be a local file or distributed file system. Review Comment: We'd better add a case to tell users Flink supports to download the resources. BTW, the downloaded resources will be removed when table program finish or session exits. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
