sjwiesman commented on a change in pull request #14003: URL: https://github.com/apache/flink/pull/14003#discussion_r521573057
########## File path: docs/dev/table/sql/gettingStarted.md ##########
@@ -0,0 +1,200 @@
+---
+title: "Getting Started - Flink SQL"
+nav-parent_id: sql
+nav-pos: 0
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements. See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership. The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License. You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied. See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+
+* This will be replaced by the TOC
+{:toc}
+
+Flink SQL enables SQL developers to design and develop the batch or streaming application without writing the Java, Scala, or any other programming language code. It provides a unified API for both batch and streaming APIs. As a user, you can perform powerful transformations. Flink’s SQL support is based on [Apache Calcite](https://calcite.apache.org/) which implements the SQL standard.

Review comment:

Generally I like to focus on streaming over batch.

```suggestion
Flink SQL enables SQL developers to design and develop the batch or streaming application without writing the Java, Scala, or any other programming language code. It provides a unified API for both stream and batch processing. As a user, you can perform powerful transformations. Flink’s SQL support is based on [Apache Calcite](https://calcite.apache.org/) which implements the SQL standard.
```
########## File path: docs/dev/table/sql/gettingStarted.md ##########

+### Installation
+There are various ways to install the Flink. You can download the source code, compile it, and run it. Another option is to have it running inside the container. Probably the easiest one is to download the binaries and run it locally for experimentation. We assume local installation for the rest of the tutorial. You can start a local cluster using the following command from the installation folder

Review comment:

> There are various ways to install the Flink

It's not *the Flink*, just Flink: "There are various ways to install Flink".

> You can download the source code, compile it, and run it.

Compiling Flink is difficult; I'd rather not mention that in a getting started guide. If we are guiding users towards a local installation, how about a link to the downloads page?
########## File path: docs/dev/table/sql/gettingStarted.md ##########

+Following is an example to define a source table using file system with [csv format]({{ site.baseurl }}/dev/table/connectors/formats/csv.html) in the environment config file but Flink community has added the support for quite a few [formats]({{ site.baseurl }}/dev/table/connectors/formats/).
+
+{% highlight yaml %}
+
+tables:
+  - name: EmployeeTableSource
+    type: source-table
+    update-mode: append
+    connector:
+      type: filesystem
+      path: "/path/to/something.csv"
+    format:
+      type: csv
+      fields:
+        - name: EmpId
+          data-type: INT
+        - name: EmpName
+          data-type: VARCHAR
+        - name: DeptId
+          data-type: INT
+      line-delimiter: "\n"
+      comment-prefix: "#"
+    schema:
+      - name: EmpId
+        data-type: INT
+      - name: EmpName
+        data-type: VARCHAR
+      - name: DeptId
+        data-type: INT
+{% endhighlight %}
+
+Addittionally, we can use SQL DDL to [create]({{ site.baseurl }}/dev/table/sql/create.html), [alter]({{ site.baseurl }}/dev/table/sql/alter.html), [drop]({{ site.baseurl }}/dev/table/sql/drop.html) tables from the SQL client. Same table can be defined using DDL as follows
+
+{% highlight sql %}
+CREATE TABLE EmployeeTableSource (

Review comment:

Can we reorder this? Let's show off `CREATE TABLE` first and then mention YAML as an alternative.
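Purely as an illustration of the DDL-first ordering (not the final page text), the table could read roughly like the sketch below; the connector and format options are only assumptions lifted from the YAML example above.

```sql
-- Hypothetical DDL form of the EmployeeTableSource shown in the YAML example above.
-- Connector/format options are assumptions based on that example, not the final docs text.
CREATE TABLE EmployeeTableSource (
    EmpId   INT,
    EmpName VARCHAR,
    DeptId  INT
) WITH (
    'connector' = 'filesystem',
    'path'      = '/path/to/something.csv',
    'format'    = 'csv'
);
```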
########## File path: docs/dev/table/sql/gettingStarted.md ##########

+### Input Source Tables
+SQL API environment is configured using the YAML[yaml.org] configuration files. When we start SQL client, it reads the default configuration from the `/conf/sql-client-defaults.yaml` but it can be overriden by user defined configuration file. These files are used to define different environment variables including table source, sinks, [catalogs](({{ site.baseurl }}/dev/table/catalogs.html)), [user-defined functions](({{ site.baseurl }}/dev/table/functions/udfs.html)).

Review comment:

> When we start SQL client

"When we start the SQL client." Unlike Flink, the SQL client is a generic term.
########## File path: docs/dev/table/sql/gettingStarted.md ##########

+This will start the SQL client in embedded mode. In the future, a user will have two possibilities of starting the SQL Client CLI either by starting an embedded standalone process or by connecting to a remote SQL Client Gateway. At the moment only the embedded mode is supported.

Review comment:

```suggestion
```

For a getting started guide, let's not focus on what Flink can't do :)
########## File path: docs/dev/table/sql/gettingStarted.md ##########

+The source is our input or from where we read the data e.g. a text file, Kafka topic. Then we defined some computations that needed to be performed on those data elements. Finally, the sink defines what to do with the output or where to store the results. A sink can be a console log, another output file, or a Kafka topic. It's similar to a database query where we read data from a table, perform a query on it and then display or store the results.

Review comment:

> Then we defined some computations that needed to be performed on those data elements.

Focus on the present or future tense. We haven't done these things yet.

> We then define queries to be executed on the rows of that input table.
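To make the source, compute, sink shape concrete, a toy sketch in SQL; the sink table name and the filter are invented for illustration and are not part of the guide.

```sql
-- Hypothetical pipeline: read rows from a source table, compute a projection/filter,
-- and write the result to a sink table. The sink name and DeptId value are placeholders.
INSERT INTO EngineeringEmployeesSink
SELECT EmpId, EmpName
FROM EmployeeTableSource
WHERE DeptId = 3;
```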
########## File path: docs/dev/table/sql/gettingStarted.md ##########

+{% highlight sql %}
+SELECT 'Hello World';
+{% endhighlight %}
+
+The following example will print the current stamp

Review comment:

```suggestion
We can also execute a built-in function, the following will print the current timestamp
```
########## File path: docs/dev/table/sql/gettingStarted.md ##########

+### Input Source Tables
+SQL API environment is configured using the YAML[yaml.org] configuration files. When we start SQL client, it reads the default configuration from the `/conf/sql-client-defaults.yaml` but it can be overriden by user defined configuration file. These files are used to define different environment variables including table source, sinks, [catalogs](({{ site.baseurl }}/dev/table/catalogs.html)), [user-defined functions](({{ site.baseurl }}/dev/table/functions/udfs.html)).

Review comment:

> SQL API environment is configured using the YAML[yaml.org]

Did you mean for this to be a link?
########## File path: docs/dev/table/sql/gettingStarted.md ##########

+`Help;` command is used to see different supported DDL (Data definition language) commands. Furthermore, Flink SQL does support different built-in functions as well. The following query will show all the built-in and user defined functions.

Review comment:

Shouldn't this go before printing the current timestamp?

########## File path: docs/dev/table/sql/gettingStarted.md ##########

+Flink SQL and Table API are just two ways to write queries that use the same API underneath. It wouldn’t be wrong if we say Table API is a wrapper on top of the streaming API. SQL API is a more descriptive way of writing queries using well-known SQL standards where we usually have ‘select * from Table’ pattern. Table API query starts with from clause, followed by joins and where clause, and then finally projection or select at the last. SQL API is easy to learn and almost everyone knows it already. All the queries are optimized for efficient execution. We will focus on the Flink SQL API, while you can read more about Table API [here]({{ site.baseurl }}/dev/table/).

Review comment:

> Flink SQL and Table API are just two ways to write queries that use the same API underneath.

It's not really the same API, right? It's two different APIs that give you access to the same runtime.

> It wouldn’t be wrong if we say Table API is a wrapper on top of the streaming API.

I'm not really sure what you're trying to say here. The Table API supports both batch and streaming.
########## File path: docs/dev/table/sql/gettingStarted.md ##########

+## Setting up tables
+Real-world database queries are run against the SQL tables. Although Flink is a stream processing engine, users can define a table on top of the streaming data. As we know, all the Flink data processing pipelines generally have three components - source, compute, sink.

Review comment:

> As we know,

Don't say "as we know"; new users might not know.

########## File path: docs/dev/table/sql/gettingStarted.md ##########

+In addition to the SQL API, Flink also has Table API as well with similar semantics as SQL. Table API is a Language integrated API, where we use a specific programming language to write the queries or call the API. For example, we create the table environment, get a table object, and apply different methods that return table API objects. It supports different languages e.g. Java, Scala, Python.

Review comment:

Try to avoid "we". It isn't clear who "we" refers to.
```suggestion
In addition to the SQL API, Flink also has a Table API with similar semantics as SQL. The Table API is a language-integrated API, where users develop in a specific programming language to write the queries or call the API. For example, jobs create a table environment, read a table, and apply different transformations and aggregations, and write the results back to another table. It supports different languages e.g. Java, Scala, Python.
```

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org