subject:"\[GitHub\] \[flink\] sjwiesman commented on a change in pull request #8903\: \[FLINK\-12747\]\[docs\] Getting Started \- Table API Example Walkthrough"

[GitHub] [flink] sjwiesman commented on a change in pull request #8903: [FLINK-12747][docs] Getting Started - Table API Example Walkthrough

2019-07-26 Thread GitBox

sjwiesman commented on a change in pull request #8903: [FLINK-12747][docs] 
Getting Started - Table API Example Walkthrough
URL: https://github.com/apache/flink/pull/8903#discussion_r307730100
 
 

 ##
 File path: docs/getting-started/walkthroughs/table_api.md
 ##
 @@ -0,0 +1,444 @@
+---
+title: "Table API"
+nav-id: tableapiwalkthrough
+nav-title: 'Table API'
+nav-parent_id: walkthroughs
+nav-pos: 1
+---
+
+
+Apache Flink offers a Table API as a unified, relational API for batch and 
stream processing, i.e., queries are executed with the same semantics on 
unbounded, real-time streams or bounded, recorded streams and produce the same 
results.
+The Table API in Flink is commonly used to ease the definition of data 
analytics, data pipelining, and ETL applications.
+
+* This will be replaced by the TOC
+{:toc}
+
+## What Will You Be Building? 
+
+In this tutorial, you will learn how to build a continuous ETL pipeline for 
tracking financial transactions by account over time.
+You will start by building your report as a nightly batch job, and then 
migrate to a streaming pipeline.
+
+## Prerequisites
+
+This walkthrough assumes that you have some familiarity with Java or Scala, 
but you should be able to follow along even if you are coming from a different 
programming language.
+It also assumes that you are familiar with basic relational concepts such as 
`SELECT` and `GROUP BY` clauses. 
+
+## Help, I’m Stuck! 
+
+If you get stuck, check out the [community support 
resources](https://flink.apache.org/community.html).
+In particular, Apache Flink's [user mailing 
list](https://flink.apache.org/community.html#mailing-lists) is consistently 
ranked as one of the most active of any Apache project and a great way to get 
help quickly. 
+
+## How To Follow Along
+
+If you want to follow along, you will require a computer with: 
+
+* Java 8 
+* Maven 
+
+A provided Flink Maven Archetype will create a skeleton project with all the 
necessary dependencies quickly:
+
+
+
+{% highlight bash %}
+$ mvn archetype:generate \
+-DarchetypeGroupId=org.apache.flink \
+-DarchetypeArtifactId=flink-walkthrough-table-java \{% unless 
site.is_stable %}
+
-DarchetypeCatalog=https://repository.apache.org/content/repositories/snapshots/
 \{% endunless %}
+-DarchetypeVersion={{ site.version }} \
+-DgroupId=spend-report \
+-DartifactId=spend-report \
+-Dversion=0.1 \
+-Dpackage=spendreport \
+-DinteractiveMode=false
+{% endhighlight %}
+
+
+{% highlight bash %}
+$ mvn archetype:generate \
+-DarchetypeGroupId=org.apache.flink \
+-DarchetypeArtifactId=flink-walkthrough-table-scala \{% unless 
site.is_stable %}
+
-DarchetypeCatalog=https://repository.apache.org/content/repositories/snapshots/
 \{% endunless %}
+-DarchetypeVersion={{ site.version }} \
+-DgroupId=spend-report \
+-DartifactId=spend-report \
+-Dversion=0.1 \
+-Dpackage=spendreport \
+-DinteractiveMode=false
+{% endhighlight %}
+
+
+
+{% unless site.is_stable %}
+
+Note: For Maven 3.0 or higher, it is no longer possible to specify 
the repository (-DarchetypeCatalog) via the commandline. If you wish to use the 
snapshot repository, you need to add a repository entry to your settings.xml. 
For details about this change, please refer to http://maven.apache.org/archetype/maven-archetype-plugin/archetype-repository.html;>Maven's
 official document
+
+{% endunless %}
+
+You can edit the `groupId`, `artifactId` and `package` if you like. With the 
above parameters,
+Maven will create a project with all the dependencies to complete this 
tutorial.
+After importing the project into your editor, you will see a file with the 
following code which you can run directly inside your IDE.
+
+
+
+{% highlight java %}
+ExecutionEnvironment env   = ExecutionEnvironment.getExecutionEnvironment();
+BatchTableEnvironment tEnv = BatchTableEnvironment.create(env);
+
+tEnv.registerTableSource("transactions", new BoundedTransactionTableSource());
+tEnv.registerTableSink("spend_report", new SpendReportTableSink());
+tEnv.registerFunction("truncateDateToHour", new TruncateDateToHour());
+
+tEnv
+   .scan("transactions")
+   .insertInto("spend_report");
+
+env.execute("Spend Report");
+{% endhighlight %}
+
+
+
+{% highlight scala %}
+val env  = ExecutionEnvironment.getExecutionEnvironment
+val tEnv = BatchTableEnvironment.create(env)
+
+tEnv.registerTableSource("transactions", new BoundedTransactionTableSource)
+tEnv.registerTableSink("spend_report", new SpendReportTableSink)
+
+val truncateDateToHour = new TruncateDateToHour
+
+tEnv
+   .scan("transactions")
+   .insertInto("spend_report")
+
+env.execute("Spend Report")
+{% endhighlight %}
+
+
+
+## Breaking Down The Code
+
+ The Execution Environment
+
+The first two lines set up your `ExecutionEnvironment`.
+The execution environment is how you can set properties for your Job, specify 
whether you are writing a batch or a streaming

[GitHub] [flink] sjwiesman commented on a change in pull request #8903: [FLINK-12747][docs] Getting Started - Table API Example Walkthrough

2019-07-25 Thread GitBox

sjwiesman commented on a change in pull request #8903: [FLINK-12747][docs] 
Getting Started - Table API Example Walkthrough
URL: https://github.com/apache/flink/pull/8903#discussion_r307369349
 
 

 ##
 File path: 
flink-walkthroughs/flink-walkthrough-table-java/src/main/resources/archetype-resources/pom.xml
 ##
 @@ -0,0 +1,263 @@
+
+http://maven.apache.org/POM/4.0.0; 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance;
+xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 
http://maven.apache.org/xsd/maven-4.0.0.xsd;>
+   4.0.0
+
+   ${groupId}
+   ${artifactId}
+   ${version}
+   jar
+
+   Flink Walkthrough Table Java
+   http://www.myorganization.org
+
+   
+   
UTF-8
+   @project.version@
+   1.8
+   2.11
+   ${java.version}
+   ${java.version}
+   
+
+   
+   
+   apache.snapshots
+   Apache Development Snapshot Repository
+   
https://repository.apache.org/content/repositories/snapshots/
+   
+   false
+   
+   
+   true
+   
+   
+   
+
+   
+   
+   org.apache.flink
+   
flink-walkthrough-common_${scala.binary.version}
+   ${flink.version}
+   
+
+   
+   
+   org.apache.flink
+   flink-java
+   ${flink.version}
+   provided
+   
+   
+   org.apache.flink
+   
flink-streaming-java_${scala.binary.version}
+   ${flink.version}
+   provided
+   
+
+   
+   
+   org.apache.flink
+   
flink-table-api-java-bridge_${scala.binary.version}
+   ${flink.version}
+   provided
+   
+   
+   org.apache.flink
+   
flink-table-planner_${scala.binary.version}
+   ${flink.version}
+   provided
 
 Review comment:
   I've added myself as a watcher on that ticket. I will follow and update this 
pr accordingly. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

[GitHub] [flink] sjwiesman commented on a change in pull request #8903: [FLINK-12747][docs] Getting Started - Table API Example Walkthrough

2019-07-25 Thread GitBox

sjwiesman commented on a change in pull request #8903: [FLINK-12747][docs] 
Getting Started - Table API Example Walkthrough
URL: https://github.com/apache/flink/pull/8903#discussion_r307369349
 
 

 ##
 File path: 
flink-walkthroughs/flink-walkthrough-table-java/src/main/resources/archetype-resources/pom.xml
 ##
 @@ -0,0 +1,263 @@
+
+http://maven.apache.org/POM/4.0.0; 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance;
+xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 
http://maven.apache.org/xsd/maven-4.0.0.xsd;>
+   4.0.0
+
+   ${groupId}
+   ${artifactId}
+   ${version}
+   jar
+
+   Flink Walkthrough Table Java
+   http://www.myorganization.org
+
+   
+   
UTF-8
+   @project.version@
+   1.8
+   2.11
+   ${java.version}
+   ${java.version}
+   
+
+   
+   
+   apache.snapshots
+   Apache Development Snapshot Repository
+   
https://repository.apache.org/content/repositories/snapshots/
+   
+   false
+   
+   
+   true
+   
+   
+   
+
+   
+   
+   org.apache.flink
+   
flink-walkthrough-common_${scala.binary.version}
+   ${flink.version}
+   
+
+   
+   
+   org.apache.flink
+   flink-java
+   ${flink.version}
+   provided
+   
+   
+   org.apache.flink
+   
flink-streaming-java_${scala.binary.version}
+   ${flink.version}
+   provided
+   
+
+   
+   
+   org.apache.flink
+   
flink-table-api-java-bridge_${scala.binary.version}
+   ${flink.version}
+   provided
+   
+   
+   org.apache.flink
+   
flink-table-planner_${scala.binary.version}
+   ${flink.version}
+   provided
 
 Review comment:
   I've added myself as a watcher on that ticket. I will follow and update this 
pr accordingly. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

[GitHub] [flink] sjwiesman commented on a change in pull request #8903: [FLINK-12747][docs] Getting Started - Table API Example Walkthrough

2019-07-24 Thread GitBox

sjwiesman commented on a change in pull request #8903: [FLINK-12747][docs] 
Getting Started - Table API Example Walkthrough
URL: https://github.com/apache/flink/pull/8903#discussion_r306980051
 
 

 ##
 File path: docs/getting-started/walkthroughs/table_api.md
 ##
 @@ -0,0 +1,445 @@
+---
+title: "Table API"
+nav-id: tableapiwalkthrough
+nav-title: 'Table API'
+nav-parent_id: walkthroughs
+nav-pos: 1
+---
+
+
+Apache Flink offers a Table API as a unified, relational API for batch and 
stream processing, i.e., queries are executed with the same semantics on 
unbounded, real-time streams or bounded, recorded streams and produce the same 
results.
+The Table API in Flink is commonly used to ease the definition of data 
analytics, data pipelining, and ETL applications.
+
+* This will be replaced by the TOC
+{:toc}
+
+## What Will You Be Building? 
+
+In this tutorial, you'll learn how to build a continuous ETL pipeline for 
tracking financial transactions by account over time.
+You will start by building your report as a nightly batch job, and then 
migrate to a streaming pipeline.
+
+## Prerequisites
+
+This walkthrough assumes that you have some familiarity with Java or Scala, 
but you should be able to follow along even if you're coming from a different 
programming language.
+It also assumes that you're familiar with basic relational concepts such as 
`SELECT` and `GROUP BY` clauses. 
+
+## Help, I’m Stuck! 
+
+If you get stuck, check out the [community support 
resources](https://flink.apache.org/community.html).
+In particular, Apache Flink's [user mailing 
list](https://flink.apache.org/community.html#mailing-lists) is consistently 
ranked as one of the most active of any Apache project and a great way to get 
help quickly. 
+
+## How To Follow Along
+
+If you want to follow along, you will require a computer with: 
+
+* Java 8 
+* Maven 
+
+There is also a provided Flink Maven Archetype to create a skeleton project 
with all the necessary dependencies quickly.
+
+
+
+{% highlight bash %}
+$ mvn archetype:generate \
+-DarchetypeGroupId=org.apache.flink \
+-DarchetypeArtifactId=flink-walkthrough-table-java \{% unless 
site.is_stable %}
+
-DarchetypeCatalog=https://repository.apache.org/content/repositories/snapshots/
 \{% endunless %}
+-DarchetypeVersion={{ site.version }} \
+-DgroupId=spend-report \
+-DartifactId=spend-report \
+-Dversion=0.1 \
+-Dpackage=spendreport \
+-DinteractiveMode=false
+{% endhighlight %}
+
+
+{% highlight bash %}
+$ mvn archetype:generate \
+-DarchetypeGroupId=org.apache.flink \
+-DarchetypeArtifactId=flink-walkthrough-table-scala \{% unless 
site.is_stable %}
+
-DarchetypeCatalog=https://repository.apache.org/content/repositories/snapshots/
 \{% endunless %}
+-DarchetypeVersion={{ site.version }} \
+-DgroupId=spend-report \
+-DartifactId=spend-report \
+-Dversion=0.1 \
+-Dpackage=spendreport \
+-DinteractiveMode=false
+{% endhighlight %}
+
+
+
+{% unless site.is_stable %}
+
+Note: For Maven 3.0 or higher, it is no longer possible to specify 
the repository (-DarchetypeCatalog) via the commandline. If you wish to use the 
snapshot repository, you need to add a repository entry to your settings.xml. 
For details about this change, please refer to http://maven.apache.org/archetype/maven-archetype-plugin/archetype-repository.html;>Maven
 official document
+
+{% endunless %}
+
+You can edit the `groupId`, `artifactId` and `package` if you like. With the 
above parameters,
+Maven will create a project with all the dependencies to complete this 
tutorial.
+After importing the project into your editor, you will see a file following 
code. 
+
+
+
+{% highlight java %}
+ExecutionEnvironment env   = ExecutionEnvironment.getExecutionEnvironment();
+BatchTableEnvironment tEnv = BatchTableEnvironment.create(env);
+
+tEnv.registerTableSource("transactions", new BoundedTransactionTableSource());
+tEnv.registerTableSink("spend_report", new SpendReportTableSink());
+tEnv.registerFunction("truncateDateToHour", new TruncateDateToHour());
+
+tEnv
+   .scan("transactions")
+   .insertInto("spend_report");
+
+env.execute("Spend Report");
+{% endhighlight %}
+
+
+
+{% highlight scala %}
+val env  = ExecutionEnvironment.getExecutionEnvironment
+val tEnv = BatchTableEnvironment.create(env)
+
+tEnv.registerTableSource("transactions", new BoundedTransactionTableSource)
+tEnv.registerTableSink("spend_report", new SpendReportTableSink)
+
+val truncateDateToHour = new TruncateDateToHour
+
+tEnv
+   .scan("transactions")
+   .insertInto("spend_report")
+
+env.execute("Spend Report")
+{% endhighlight %}
+
+
+
+Let's break down this code by component. 
+
+## Breaking Down The Code
+
+ The Execution Environment
+
+The first two lines set up your `ExecutionEnvironment`.
+The execution environment is how you can set properties for your Job, specify 
whether you are writing a batch or streaming

[GitHub] [flink] sjwiesman commented on a change in pull request #8903: [FLINK-12747][docs] Getting Started - Table API Example Walkthrough

2019-07-24 Thread GitBox

sjwiesman commented on a change in pull request #8903: [FLINK-12747][docs] 
Getting Started - Table API Example Walkthrough
URL: https://github.com/apache/flink/pull/8903#discussion_r306917597
 
 

 ##
 File path: 
flink-walkthroughs/flink-walkthrough-table-scala/src/main/resources/archetype-resources/pom.xml
 ##
 @@ -0,0 +1,333 @@
+
+http://maven.apache.org/POM/4.0.0;
+xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance;
+xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 
http://maven.apache.org/xsd/maven-4.0.0.xsd;>
+   4.0.0
+
+   ${groupId}
+   ${artifactId}
+   ${version}
+   jar
+
+   Flink Walkthrough Table
+   http://www.myorganization.org
+
+   
+   
+   apache.snapshots
+   Apache Development Snapshot Repository
+   
https://repository.apache.org/content/repositories/snapshots/
+   
+   false
+   
+   
+   true
+   
+   
+   
+
+   
+   
UTF-8
+   @project.version@
+   2.11
+   2.11.12
+   
+
+   
+   
+   org.apache.flink
+   
flink-walkthrough-common_${scala.binary.version}
+   ${flink.version}
+   
+
+   
+   
+   
+   org.apache.flink
+   
flink-scala_${scala.binary.version}
+   ${flink.version}
+   provided
+   
+   
+   org.apache.flink
+   
flink-streaming-scala_${scala.binary.version}
+   ${flink.version}
+   provided
+   
+
+   
+   
+   org.scala-lang
+   scala-library
+   ${scala.version}
+   provided
+   
+
+   
+   
+   org.apache.flink
+   
flink-table-api-java-bridge_${scala.binary.version}
+   ${flink.version}
+   provided
+   
 
 Review comment:
   Yes, actually `flink-walkthrough-common` should only depend on 
`flink-table-common` but there are some classes that haven't been ported yet. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

[GitHub] [flink] sjwiesman commented on a change in pull request #8903: [FLINK-12747][docs] Getting Started - Table API Example Walkthrough

2019-07-24 Thread GitBox

sjwiesman commented on a change in pull request #8903: [FLINK-12747][docs] 
Getting Started - Table API Example Walkthrough
URL: https://github.com/apache/flink/pull/8903#discussion_r306917652
 
 

 ##
 File path: flink-walkthroughs/flink-walkthrough-common/pom.xml
 ##
 @@ -0,0 +1,71 @@
+
+
+http://maven.apache.org/POM/4.0.0;
+xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance;
+xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 
http://maven.apache.org/maven-v4_0_0.xsd;>
+
+   4.0.0
+
+   
+   org.apache.flink
+   flink-walkthroughs
+   1.10-SNAPSHOT
+   ..
+   
+
+   
flink-walkthrough-common_${scala.binary.version}
+   flink-walkthrough-common
+
+   jar
+
+   
+   
+   org.apache.flink
+   flink-java
+   ${project.version}
+   provided
+   
+   
+   org.apache.flink
+   
flink-streaming-java_${scala.binary.version}
+   ${project.version}
+   provided
+   
+   
+   org.apache.flink
+   flink-table-common
+   ${project.version}
+   provided
+   
+   
+   org.apache.flink
+   
flink-table-api-java-bridge_${scala.binary.version}
+   ${project.version}
+   provided
+   
+   
+   org.apache.flink
+   
flink-table-planner_${scala.binary.version}
+   ${project.version}
+   provided
+   
 
 Review comment:
   Same as above


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

[GitHub] [flink] sjwiesman commented on a change in pull request #8903: [FLINK-12747][docs] Getting Started - Table API Example Walkthrough

2019-07-24 Thread GitBox

sjwiesman commented on a change in pull request #8903: [FLINK-12747][docs] 
Getting Started - Table API Example Walkthrough
URL: https://github.com/apache/flink/pull/8903#discussion_r306916406
 
 

 ##
 File path: 
flink-walkthroughs/flink-walkthrough-table-scala/src/main/resources/archetype-resources/pom.xml
 ##
 @@ -0,0 +1,333 @@
+
+http://maven.apache.org/POM/4.0.0;
+xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance;
+xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 
http://maven.apache.org/xsd/maven-4.0.0.xsd;>
+   4.0.0
+
+   ${groupId}
+   ${artifactId}
+   ${version}
+   jar
+
+   Flink Walkthrough Table
+   http://www.myorganization.org
+
+   
+   
+   apache.snapshots
+   Apache Development Snapshot Repository
+   
https://repository.apache.org/content/repositories/snapshots/
+   
+   false
+   
+   
+   true
+   
+   
+   
+
+   
+   
UTF-8
+   @project.version@
+   2.11
+   2.11.12
+   
+
+   
+   
+   org.apache.flink
+   
flink-walkthrough-common_${scala.binary.version}
+   ${flink.version}
+   
+
+   
+   
+   
+   org.apache.flink
+   
flink-scala_${scala.binary.version}
+   ${flink.version}
+   provided
+   
+   
+   org.apache.flink
+   
flink-streaming-scala_${scala.binary.version}
+   ${flink.version}
+   provided
+   
+
+   
+   
+   org.scala-lang
+   scala-library
+   ${scala.version}
+   provided
+   
+
+   
+   
+   org.apache.flink
+   
flink-table-api-java-bridge_${scala.binary.version}
+   ${flink.version}
+   provided
+   
+   
+   org.apache.flink
+   
flink-table-api-scala-bridge_${scala.binary.version}
+   ${flink.version}
+   provided
+   
+   
+   org.apache.flink
+   
flink-table-planner_${scala.binary.version}
+   ${flink.version}
+   provided
+   
 
 Review comment:
   The other option is to copy `flink-table-uber` from `/opt` to '/lib' which I 
think is a cleaner solution. I will add a note to the walkthrough. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

[GitHub] [flink] sjwiesman commented on a change in pull request #8903: [FLINK-12747][docs] Getting Started - Table API Example Walkthrough

2019-07-24 Thread GitBox

sjwiesman commented on a change in pull request #8903: [FLINK-12747][docs] 
Getting Started - Table API Example Walkthrough
URL: https://github.com/apache/flink/pull/8903#discussion_r306909738
 
 

 ##
 File path: 
flink-walkthroughs/flink-walkthrough-table-java/src/main/resources/archetype-resources/pom.xml
 ##
 @@ -0,0 +1,263 @@
+
+http://maven.apache.org/POM/4.0.0; 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance;
+xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 
http://maven.apache.org/xsd/maven-4.0.0.xsd;>
+   4.0.0
+
+   ${groupId}
+   ${artifactId}
+   ${version}
+   jar
+
+   Flink Walkthrough Table Java
+   http://www.myorganization.org
+
+   
+   
UTF-8
+   @project.version@
+   1.8
+   2.11
+   ${java.version}
+   ${java.version}
+   
+
+   
+   
+   apache.snapshots
+   Apache Development Snapshot Repository
+   
https://repository.apache.org/content/repositories/snapshots/
+   
+   false
+   
+   
+   true
+   
+   
+   
+
+   
+   
+   org.apache.flink
+   
flink-walkthrough-common_${scala.binary.version}
+   ${flink.version}
+   
+
+   
+   
+   org.apache.flink
+   flink-java
+   ${flink.version}
+   provided
+   
+   
+   org.apache.flink
+   
flink-streaming-java_${scala.binary.version}
+   ${flink.version}
+   provided
+   
+
+   
+   
+   org.apache.flink
+   
flink-table-api-java-bridge_${scala.binary.version}
+   ${flink.version}
+   provided
+   
+   
+   org.apache.flink
+   
flink-table-planner_${scala.binary.version}
+   ${flink.version}
+   provided
 
 Review comment:
   You can either do that or place `flink-table-uber` in `/lib`. I think 
putting flink deps in `/lib` is cleaner but this is targetted at new users and 
this should be optimized towards ease of deployment. +1


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

[GitHub] [flink] sjwiesman commented on a change in pull request #8903: [FLINK-12747][docs] Getting Started - Table API Example Walkthrough

2019-07-24 Thread GitBox

sjwiesman commented on a change in pull request #8903: [FLINK-12747][docs] 
Getting Started - Table API Example Walkthrough
URL: https://github.com/apache/flink/pull/8903#discussion_r306908708
 
 

 ##
 File path: 
flink-walkthroughs/flink-walkthrough-common/src/main/java/org/apache/flink/walkthrough/common/table/SpendReportTableSink.java
 ##
 @@ -0,0 +1,94 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.flink.walkthrough.common.table;
+
+import org.apache.flink.annotation.PublicEvolving;
+import org.apache.flink.api.common.typeinfo.TypeInformation;
+import org.apache.flink.api.common.typeinfo.Types;
+import org.apache.flink.api.java.DataSet;
+import org.apache.flink.streaming.api.datastream.DataStream;
+import org.apache.flink.table.api.TableSchema;
+import org.apache.flink.table.sinks.AppendStreamTableSink;
+import org.apache.flink.table.sinks.BatchTableSink;
+import org.apache.flink.table.sinks.TableSink;
+import org.apache.flink.table.types.DataType;
+import org.apache.flink.types.Row;
+import org.apache.flink.walkthrough.common.sink.LoggerOutputFormat;
+
+/**
+ * A simple table sink for writing to stdout.
+ */
+@PublicEvolving
+@SuppressWarnings("deprecation")
+public class SpendReportTableSink implements AppendStreamTableSink, 
BatchTableSink {
+
+   private final TableSchema schema;
+
+   public SpendReportTableSink() {
+   this.schema = TableSchema
+   .builder()
+   .field("accountId", Types.LONG)
+   .field("timestamp", Types.SQL_TIMESTAMP)
+   .field("amount", Types.DOUBLE)
+   .build();
+   }
+
+   @Override
+   public void emitDataSet(DataSet dataSet) {
+   dataSet
+   .map(SpendReportTableSink::format)
+   .output(new LoggerOutputFormat());
+   }
+
+   @Override
+   public void emitDataStream(DataStream dataStream) {
+   dataStream
+   .map(SpendReportTableSink::format)
+   .writeUsingOutputFormat(new LoggerOutputFormat());
+   }
+
+   @Override
+   public TableSchema getTableSchema() {
+   return schema;
+   }
+
+   @Override
+   public DataType getConsumedDataType() {
+   return getTableSchema().toRowDataType();
+   }
+
+   @Override
+   public String[] getFieldNames() {
+   return getTableSchema().getFieldNames();
+   }
+
+   @Override
+   public TypeInformation[] getFieldTypes() {
+   return getTableSchema().getFieldTypes();
+   }
+
+   @Override
+   public TableSink configure(String[] fieldNames, 
TypeInformation[] fieldTypes) {
+   return this;
+   }
+
+   private static String format(Row row) {
 
 Review comment:
   Thank you, did not know how to do that!


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

[GitHub] [flink] sjwiesman commented on a change in pull request #8903: [FLINK-12747][docs] Getting Started - Table API Example Walkthrough

2019-07-24 Thread GitBox

sjwiesman commented on a change in pull request #8903: [FLINK-12747][docs] 
Getting Started - Table API Example Walkthrough
URL: https://github.com/apache/flink/pull/8903#discussion_r306895655
 
 

 ##
 File path: docs/getting-started/walkthroughs/table_api.md
 ##
 @@ -0,0 +1,445 @@
+---
+title: "Table API"
+nav-id: tableapiwalkthrough
+nav-title: 'Table API'
+nav-parent_id: walkthroughs
+nav-pos: 1
+---
+
+
+Apache Flink offers a Table API as a unified, relational API for batch and 
stream processing, i.e., queries are executed with the same semantics on 
unbounded, real-time streams or bounded, recorded streams and produce the same 
results.
+The Table API in Flink is commonly used to ease the definition of data 
analytics, data pipelining, and ETL applications.
+
+* This will be replaced by the TOC
+{:toc}
+
+## What Will You Be Building? 
+
+In this tutorial, you'll learn how to build a continuous ETL pipeline for 
tracking financial transactions by account over time.
+You will start by building your report as a nightly batch job, and then 
migrate to a streaming pipeline.
+
+## Prerequisites
+
+This walkthrough assumes that you have some familiarity with Java or Scala, 
but you should be able to follow along even if you're coming from a different 
programming language.
+It also assumes that you're familiar with basic relational concepts such as 
`SELECT` and `GROUP BY` clauses. 
+
+## Help, I’m Stuck! 
+
+If you get stuck, check out the [community support 
resources](https://flink.apache.org/community.html).
+In particular, Apache Flink's [user mailing 
list](https://flink.apache.org/community.html#mailing-lists) is consistently 
ranked as one of the most active of any Apache project and a great way to get 
help quickly. 
+
+## How To Follow Along
+
+If you want to follow along, you will require a computer with: 
+
+* Java 8 
+* Maven 
+
+There is also a provided Flink Maven Archetype to create a skeleton project 
with all the necessary dependencies quickly.
+
+
+
+{% highlight bash %}
+$ mvn archetype:generate \
+-DarchetypeGroupId=org.apache.flink \
+-DarchetypeArtifactId=flink-walkthrough-table-java \{% unless 
site.is_stable %}
+
-DarchetypeCatalog=https://repository.apache.org/content/repositories/snapshots/
 \{% endunless %}
+-DarchetypeVersion={{ site.version }} \
+-DgroupId=spend-report \
+-DartifactId=spend-report \
+-Dversion=0.1 \
+-Dpackage=spendreport \
+-DinteractiveMode=false
+{% endhighlight %}
+
+
+{% highlight bash %}
+$ mvn archetype:generate \
+-DarchetypeGroupId=org.apache.flink \
+-DarchetypeArtifactId=flink-walkthrough-table-scala \{% unless 
site.is_stable %}
+
-DarchetypeCatalog=https://repository.apache.org/content/repositories/snapshots/
 \{% endunless %}
+-DarchetypeVersion={{ site.version }} \
+-DgroupId=spend-report \
+-DartifactId=spend-report \
+-Dversion=0.1 \
+-Dpackage=spendreport \
+-DinteractiveMode=false
+{% endhighlight %}
+
+
+
+{% unless site.is_stable %}
+
+Note: For Maven 3.0 or higher, it is no longer possible to specify 
the repository (-DarchetypeCatalog) via the commandline. If you wish to use the 
snapshot repository, you need to add a repository entry to your settings.xml. 
For details about this change, please refer to http://maven.apache.org/archetype/maven-archetype-plugin/archetype-repository.html;>Maven
 official document
+
+{% endunless %}
+
+You can edit the `groupId`, `artifactId` and `package` if you like. With the 
above parameters,
+Maven will create a project with all the dependencies to complete this 
tutorial.
+After importing the project into your editor, you will see a file following 
code. 
+
+
+
+{% highlight java %}
+ExecutionEnvironment env   = ExecutionEnvironment.getExecutionEnvironment();
+BatchTableEnvironment tEnv = BatchTableEnvironment.create(env);
+
+tEnv.registerTableSource("transactions", new BoundedTransactionTableSource());
+tEnv.registerTableSink("spend_report", new SpendReportTableSink());
+tEnv.registerFunction("truncateDateToHour", new TruncateDateToHour());
+
+tEnv
+   .scan("transactions")
+   .insertInto("spend_report");
+
+env.execute("Spend Report");
+{% endhighlight %}
+
+
+
+{% highlight scala %}
+val env  = ExecutionEnvironment.getExecutionEnvironment
+val tEnv = BatchTableEnvironment.create(env)
+
+tEnv.registerTableSource("transactions", new BoundedTransactionTableSource)
+tEnv.registerTableSink("spend_report", new SpendReportTableSink)
+
+val truncateDateToHour = new TruncateDateToHour
+
+tEnv
+   .scan("transactions")
+   .insertInto("spend_report")
+
+env.execute("Spend Report")
+{% endhighlight %}
+
+
+
+Let's break down this code by component. 
+
+## Breaking Down The Code
+
+ The Execution Environment
+
+The first two lines set up your `ExecutionEnvironment`.
+The execution environment is how you can set properties for your Job, specify 
whether you are writing a batch or streaming

[GitHub] [flink] sjwiesman commented on a change in pull request #8903: [FLINK-12747][docs] Getting Started - Table API Example Walkthrough

2019-07-24 Thread GitBox

sjwiesman commented on a change in pull request #8903: [FLINK-12747][docs] 
Getting Started - Table API Example Walkthrough
URL: https://github.com/apache/flink/pull/8903#discussion_r306894748
 
 

 ##
 File path: docs/getting-started/walkthroughs/table_api.md
 ##
 @@ -0,0 +1,445 @@
+---
+title: "Table API"
+nav-id: tableapiwalkthrough
+nav-title: 'Table API'
+nav-parent_id: walkthroughs
+nav-pos: 1
+---
+
+
+Apache Flink offers a Table API as a unified, relational API for batch and 
stream processing, i.e., queries are executed with the same semantics on 
unbounded, real-time streams or bounded, recorded streams and produce the same 
results.
+The Table API in Flink is commonly used to ease the definition of data 
analytics, data pipelining, and ETL applications.
+
+* This will be replaced by the TOC
+{:toc}
+
+## What Will You Be Building? 
+
+In this tutorial, you'll learn how to build a continuous ETL pipeline for 
tracking financial transactions by account over time.
+You will start by building your report as a nightly batch job, and then 
migrate to a streaming pipeline.
+
+## Prerequisites
+
+This walkthrough assumes that you have some familiarity with Java or Scala, 
but you should be able to follow along even if you're coming from a different 
programming language.
+It also assumes that you're familiar with basic relational concepts such as 
`SELECT` and `GROUP BY` clauses. 
+
+## Help, I’m Stuck! 
+
+If you get stuck, check out the [community support 
resources](https://flink.apache.org/community.html).
+In particular, Apache Flink's [user mailing 
list](https://flink.apache.org/community.html#mailing-lists) is consistently 
ranked as one of the most active of any Apache project and a great way to get 
help quickly. 
+
+## How To Follow Along
+
+If you want to follow along, you will require a computer with: 
+
+* Java 8 
+* Maven 
+
+There is also a provided Flink Maven Archetype to create a skeleton project 
with all the necessary dependencies quickly.
+
+
+
+{% highlight bash %}
+$ mvn archetype:generate \
+-DarchetypeGroupId=org.apache.flink \
+-DarchetypeArtifactId=flink-walkthrough-table-java \{% unless 
site.is_stable %}
+
-DarchetypeCatalog=https://repository.apache.org/content/repositories/snapshots/
 \{% endunless %}
+-DarchetypeVersion={{ site.version }} \
+-DgroupId=spend-report \
+-DartifactId=spend-report \
+-Dversion=0.1 \
+-Dpackage=spendreport \
+-DinteractiveMode=false
+{% endhighlight %}
+
+
+{% highlight bash %}
+$ mvn archetype:generate \
+-DarchetypeGroupId=org.apache.flink \
+-DarchetypeArtifactId=flink-walkthrough-table-scala \{% unless 
site.is_stable %}
+
-DarchetypeCatalog=https://repository.apache.org/content/repositories/snapshots/
 \{% endunless %}
+-DarchetypeVersion={{ site.version }} \
+-DgroupId=spend-report \
+-DartifactId=spend-report \
+-Dversion=0.1 \
+-Dpackage=spendreport \
+-DinteractiveMode=false
+{% endhighlight %}
+
+
+
+{% unless site.is_stable %}
+
+Note: For Maven 3.0 or higher, it is no longer possible to specify 
the repository (-DarchetypeCatalog) via the commandline. If you wish to use the 
snapshot repository, you need to add a repository entry to your settings.xml. 
For details about this change, please refer to http://maven.apache.org/archetype/maven-archetype-plugin/archetype-repository.html;>Maven
 official document
+
+{% endunless %}
+
+You can edit the `groupId`, `artifactId` and `package` if you like. With the 
above parameters,
+Maven will create a project with all the dependencies to complete this 
tutorial.
+After importing the project into your editor, you will see a file following 
code. 
 
 Review comment:
   yikes, good catch 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

[GitHub] [flink] sjwiesman commented on a change in pull request #8903: [FLINK-12747][docs] Getting Started - Table API Example Walkthrough

2019-07-12 Thread GitBox

sjwiesman commented on a change in pull request #8903: [FLINK-12747][docs] 
Getting Started - Table API Example Walkthrough
URL: https://github.com/apache/flink/pull/8903#discussion_r303138794
 
 

 ##
 File path: docs/getting-started/tutorials/table_api.md
 ##
 @@ -0,0 +1,423 @@
+---
+title: "Table API"
+nav-id: tableapitutorials
+nav-title: 'Table API'
+nav-parent_id: apitutorials
+nav-pos: 1
+---
+
+
+Apache Flink offers a Table API as a unified, relational API for batch and 
stream processing, i.e., queries are executed with the same semantics on 
unbounded, real-time streams or bounded, recorded streams and produce the same 
results.
+The Table API in Flink is commonly used to ease the definition of data 
analytics, data pipelining, and ETL applications.
+
+* This will be replaced by the TOC
+{:toc}
+
+## What Are We Building? 
+
+In this tutorial, we'll show how to build a continuous ETL pipeline for 
tracking financial transactions by account over time.
+We will start by building our report as a nightly batch job, and then migrate 
to a streaming pipeline to see how batch is just a special case of streaming. 
+
+## Prerequisites
+
+We'll assume that you have some familiarity with Java or Scala, but you should 
be able to follow along even if you're coming from a different programming 
language.
+We'll also assume that you're familiar with basic relational concepts such as 
`SELECT` and `GROUP BY` clauses. 
+
+If you want to follow along, you will require a computer with: 
+
+* Java 8 
+* Maven 
+
+## Help, I’m Stuck! 
+
+If you get stuck, check out the [community support 
resources](https://flink.apache.org/community.html).
+In particular, Apache Flink's [user mailing 
list](https://flink.apache.org/community.html#mailing-lists) is consistently 
ranked as one of the most active of any Apache project and a great way to get 
help quickly. 
+
+## How To Follow Along
+
+If you would like to follow along this walkthrough provides a Flink Maven 
Archetype to create a skeleton project with all the necessary dependencies 
quickly.
+
+
+
+{% highlight bash %}
+$ mvn archetype:generate \
+-DarchetypeGroupId=org.apache.flink \
+-DarchetypeArtifactId=flink-walkthrough-table-java \{% unless 
site.is_stable %}
+
-DarchetypeCatalog=https://repository.apache.org/content/repositories/snapshots/
 \{% endunless %}
+-DarchetypeVersion={{ site.version }} \
+-DgroupId=spend-report \
+-DartifactId=spend-report \
+-Dversion=0.1 \
+-Dpackage=spendreport \
+-DinteractiveMode=false
+{% endhighlight %}
+
+
+{% highlight bash %}
+$ mvn archetype:generate \
+-DarchetypeGroupId=org.apache.flink \
+-DarchetypeArtifactId=flink-walkthrough-table-scala \{% unless 
site.is_stable %}
+
-DarchetypeCatalog=https://repository.apache.org/content/repositories/snapshots/
 \{% endunless %}
+-DarchetypeVersion={{ site.version }} \
+-DgroupId=spend-report \
+-DartifactId=spend-report \
+-Dversion=0.1 \
+-Dpackage=spendreport \
+-DinteractiveMode=false
+{% endhighlight %}
+
+
+
+{% unless site.is_stable %}
+
+Note: For Maven 3.0 or higher, it is no longer possible to specify 
the repository (-DarchetypeCatalog) via the commandline. If you wish to use the 
snapshot repository, you need to add a repository entry to your settings.xml. 
For details about this change, please refer to http://maven.apache.org/archetype/maven-archetype-plugin/archetype-repository.html;>Maven
 official document
+
+{% endunless %}
+
+You can edit the `groupId`, `artifactId` and `package` if you like. With the 
above parameters,
+Maven will create a project with all the dependencies to complete this 
tutorial.
+After importing the project into your editor, you will see a file following 
code. 
+
+
+
+{% highlight java %}
+ExecutionEnvironment env   = ExecutionEnvironment.getExecutionEnvironment();
+BatchTableEnvironment tEnv = BatchTableEnvironment.create(env);
+
+tEnv.registerTableSource("transactions", new TransactionTableSource());
+tEnv.registerTableSink("spend_report", new SpendReportTableSink());
+tEnv.registerFunction("truncateDateToHour", new TruncateDateToHour());
+
+tEnv
+   .scan("transactions")
+   .insertInto("spend_report");
+
+env.execute("Spend Report");
+{% endhighlight %}
+
+
+
+{% highlight scala %}
+val env  = ExecutionEnvironment.getExecutionEnvironment()
+val tEnv = BatchTableEnvironment.create(env)
+
+tEnv.registerTableSource("transactions", new TransactionTableSource())
+tEnv.registerTableSink("spend_report", new SpendReportTableSink())
+
+val truncateDateToHour = new TruncateDateToHour
+
+tEnv
+   .scan("transactions")
+   .insertInto("spend_report")
+
+env.execute("Spend Report")
+{% endhighlight %}
+
+
+
+Let's break down this code by component. 
+
+## Breaking Down The Code
+
+ The Execution Environment
+
+The first two lines set up our `ExecutionEnvironment`.
+The execution environment is how we set properties for our deployments,

[GitHub] [flink] sjwiesman commented on a change in pull request #8903: [FLINK-12747][docs] Getting Started - Table API Example Walkthrough

2019-07-05 Thread GitBox

sjwiesman commented on a change in pull request #8903: [FLINK-12747][docs] 
Getting Started - Table API Example Walkthrough
URL: https://github.com/apache/flink/pull/8903#discussion_r300739522
 
 

 ##
 File path: docs/getting-started/tutorials/table_api.md
 ##
 @@ -0,0 +1,243 @@
+---
+title: "Table API"
+nav-id: tableapitutorials
+nav-title: 'Table API'
+nav-parent_id: apitutorials
+nav-pos: 1
+---
+
+
+Apache Flink offers a Table API as a unified, relational API for batch and 
stream processing, i.e., queries are executed with the same semantics on 
unbounded, real-time streams or bounded, recorded streams and produce the same 
results.
+The Table API in Flink is commonly used to ease the definition of data 
analytics, data pipelining, and ETL applications.
+
+* This will be replaced by the TOC
+{:toc}
+
+## What Are We Building? 
 
 Review comment:
   Ok, I think we're in agreement. It will take me a little bit to remove 
"we"'s in a way that makes sense. In the meantime, let me know if you have any 
other comments content-wise. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

[GitHub] [flink] sjwiesman commented on a change in pull request #8903: [FLINK-12747][docs] Getting Started - Table API Example Walkthrough

2019-07-02 Thread GitBox

sjwiesman commented on a change in pull request #8903: [FLINK-12747][docs] 
Getting Started - Table API Example Walkthrough
URL: https://github.com/apache/flink/pull/8903#discussion_r299567479
 
 

 ##
 File path: docs/getting-started/tutorials/table_api.md
 ##
 @@ -0,0 +1,250 @@
+---
+title: "Table API"
+nav-id: tableapitutorials
+nav-title: 'Table API'
+nav-parent_id: apitutorials
+nav-pos: 1
+---
+
+
+Apache Flink offers a Table API as a unified, relational API for batch and 
stream processing, i.e., queries are executed with the same semantics on 
unbounded, real-time streams or bounded, recorded streams and produce the same 
results.
+The Table API in Flink is commonly used to ease the definition of data 
analytics, data pipelining, and ETL applications.
+
+* This will be replaced by the TOC
+{:toc}
+
+## What Are We Building? 
+
+In this tutorial, we'll show how to build a continuous ETL pipeline for 
tracking financial transactions by account over time.
+We will start by building our report as a nightly batch job, and then migrate 
to a streaming pipeline to see how batch is just a special case of streaming. 
+
+## Prerequisites
+
+We'll assume that you have some familiarity with Java or Scala, but you should 
be able to follow along even if you're coming from a different programming 
language.
+We'll also assume that you're familiar with basic relational concepts such as 
`SELECT` and `GROUP BY` clauses. 
+
+If you want to follow along you will require a computer with: 
+
+* Java 8 
+* Maven 
+
+## Help, I’m Stuck! 
+
+If you get stuck, check out the [community support 
resources](https://flink.apache.org/community.html).
+In particular, Apache Flink's [user mailing 
list](https://flink.apache.org/community.html#mailing-lists) is consistently 
ranked as one of the most active of any Apache project and a great way to get 
help quickly. 
+
+## Setting up a Maven Project
+
+We are going to use a Flink Maven Archetype for creating our project structure.
+
+{% highlight bash %}
+$ mvn archetype:generate \
+-DarchetypeGroupId=org.apache.flink \
+-DarchetypeArtifactId=flink-walkthrough-table \{% unless site.is_stable %}
+
-DarchetypeCatalog=https://repository.apache.org/content/repositories/snapshots/
 \{% endunless %}
+-DarchetypeVersion={{ site.version }} \
+-DgroupId=spend-report \
+-DartifactId=spend-report \
+-Dversion=0.1 \
+-Dpackage=spendreport \
+-DinteractiveMode=false
+{% endhighlight %}
+
+{% unless site.is_stable %}
+
+Note: For Maven 3.0 or higher, it is no longer possible to specify 
the repository (-DarchetypeCatalog) via the commandline. If you wish to use the 
snapshot repository, you need to add a repository entry to your settings.xml. 
For details about this change, please refer to http://maven.apache.org/archetype/maven-archetype-plugin/archetype-repository.html;>Maven
 official document
+
+{% endunless %}
+
+You can edit the `groupId`, `artifactId` and `package` if you like. With the 
above parameters,
+Maven will create a project with all the dependencies to complete this 
tutorial.
+After importing the project in your editor you will see a file following code. 
+
+{% highlight java %}
+ExecutionEnvironment env   = ExecutionEnvironment.getExecutionEnvironment();
+BatchTableEnvironment tEnv = BatchTableEnvironment.create(env);
+
+tEnv.registerTableSource("transactions", new TransactionTableSource());
+tEnv.registerTableSink("stdout", new StdOutTableSink());
+tEnv.registerFunction("truncateDateToHour", new TruncateDateToHour());
+
+tEnv
+   .scan("transactions")
+   .insertInto("stdout");
+
+env.execute("Spend Report");
+{% endhighlight %}
+
+Let's break down this code by component. 
+
+## Breaking Down The Code
+
+ The Execution Environment
+
+The first two lines set up our `ExecutionEnvironment`.
+The execution environment is how we set properties for our deployments, 
specify whether we are writing a batch or streaming application, and create our 
sources.
+Here we have chosen to use the batch environment since we are building a 
periodic batch report.
+We then wrap it in a `BatchTableEnvironment` so to have full access to the 
Table Api.
+
+{% highlight java %}
+ExecutionEnvironment env   = ExecutionEnvironment.getExecutionEnvironment();
+BatchTableEnvironment tEnv = BatchTableEnvironment.create(env);
+{% endhighlight %}
+
+ Registering Tables
+
+Next, we register tables that we can use for the input and output of our 
application. 
+The TableEnvironment maintains a catalog of tables that are registered by 
name. There are two types of tables, input tables and output tables.
+Input tables can be referenced in Table API and SQL queries and provide input 
data.
+Output tables can be used to emit the result of a Table API or SQL query to an 
external system.
+Tables can support batch queries, streaming queries, or both. 
+
+{% highlight java %}
 
 Review comment:
   ~~I agree but I would like to get some more

[GitHub] [flink] sjwiesman commented on a change in pull request #8903: [FLINK-12747][docs] Getting Started - Table API Example Walkthrough

2019-07-02 Thread GitBox

sjwiesman commented on a change in pull request #8903: [FLINK-12747][docs] 
Getting Started - Table API Example Walkthrough
URL: https://github.com/apache/flink/pull/8903#discussion_r299567479
 
 

 ##
 File path: docs/getting-started/tutorials/table_api.md
 ##
 @@ -0,0 +1,250 @@
+---
+title: "Table API"
+nav-id: tableapitutorials
+nav-title: 'Table API'
+nav-parent_id: apitutorials
+nav-pos: 1
+---
+
+
+Apache Flink offers a Table API as a unified, relational API for batch and 
stream processing, i.e., queries are executed with the same semantics on 
unbounded, real-time streams or bounded, recorded streams and produce the same 
results.
+The Table API in Flink is commonly used to ease the definition of data 
analytics, data pipelining, and ETL applications.
+
+* This will be replaced by the TOC
+{:toc}
+
+## What Are We Building? 
+
+In this tutorial, we'll show how to build a continuous ETL pipeline for 
tracking financial transactions by account over time.
+We will start by building our report as a nightly batch job, and then migrate 
to a streaming pipeline to see how batch is just a special case of streaming. 
+
+## Prerequisites
+
+We'll assume that you have some familiarity with Java or Scala, but you should 
be able to follow along even if you're coming from a different programming 
language.
+We'll also assume that you're familiar with basic relational concepts such as 
`SELECT` and `GROUP BY` clauses. 
+
+If you want to follow along you will require a computer with: 
+
+* Java 8 
+* Maven 
+
+## Help, I’m Stuck! 
+
+If you get stuck, check out the [community support 
resources](https://flink.apache.org/community.html).
+In particular, Apache Flink's [user mailing 
list](https://flink.apache.org/community.html#mailing-lists) is consistently 
ranked as one of the most active of any Apache project and a great way to get 
help quickly. 
+
+## Setting up a Maven Project
+
+We are going to use a Flink Maven Archetype for creating our project structure.
+
+{% highlight bash %}
+$ mvn archetype:generate \
+-DarchetypeGroupId=org.apache.flink \
+-DarchetypeArtifactId=flink-walkthrough-table \{% unless site.is_stable %}
+
-DarchetypeCatalog=https://repository.apache.org/content/repositories/snapshots/
 \{% endunless %}
+-DarchetypeVersion={{ site.version }} \
+-DgroupId=spend-report \
+-DartifactId=spend-report \
+-Dversion=0.1 \
+-Dpackage=spendreport \
+-DinteractiveMode=false
+{% endhighlight %}
+
+{% unless site.is_stable %}
+
+Note: For Maven 3.0 or higher, it is no longer possible to specify 
the repository (-DarchetypeCatalog) via the commandline. If you wish to use the 
snapshot repository, you need to add a repository entry to your settings.xml. 
For details about this change, please refer to http://maven.apache.org/archetype/maven-archetype-plugin/archetype-repository.html;>Maven
 official document
+
+{% endunless %}
+
+You can edit the `groupId`, `artifactId` and `package` if you like. With the 
above parameters,
+Maven will create a project with all the dependencies to complete this 
tutorial.
+After importing the project in your editor you will see a file following code. 
+
+{% highlight java %}
+ExecutionEnvironment env   = ExecutionEnvironment.getExecutionEnvironment();
+BatchTableEnvironment tEnv = BatchTableEnvironment.create(env);
+
+tEnv.registerTableSource("transactions", new TransactionTableSource());
+tEnv.registerTableSink("stdout", new StdOutTableSink());
+tEnv.registerFunction("truncateDateToHour", new TruncateDateToHour());
+
+tEnv
+   .scan("transactions")
+   .insertInto("stdout");
+
+env.execute("Spend Report");
+{% endhighlight %}
+
+Let's break down this code by component. 
+
+## Breaking Down The Code
+
+ The Execution Environment
+
+The first two lines set up our `ExecutionEnvironment`.
+The execution environment is how we set properties for our deployments, 
specify whether we are writing a batch or streaming application, and create our 
sources.
+Here we have chosen to use the batch environment since we are building a 
periodic batch report.
+We then wrap it in a `BatchTableEnvironment` so to have full access to the 
Table Api.
+
+{% highlight java %}
+ExecutionEnvironment env   = ExecutionEnvironment.getExecutionEnvironment();
+BatchTableEnvironment tEnv = BatchTableEnvironment.create(env);
+{% endhighlight %}
+
+ Registering Tables
+
+Next, we register tables that we can use for the input and output of our 
application. 
+The TableEnvironment maintains a catalog of tables that are registered by 
name. There are two types of tables, input tables and output tables.
+Input tables can be referenced in Table API and SQL queries and provide input 
data.
+Output tables can be used to emit the result of a Table API or SQL query to an 
external system.
+Tables can support batch queries, streaming queries, or both. 
+
+{% highlight java %}
 
 Review comment:
   ~~I agree but I would like to get some more

[GitHub] [flink] sjwiesman commented on a change in pull request #8903: [FLINK-12747][docs] Getting Started - Table API Example Walkthrough

2019-07-02 Thread GitBox

sjwiesman commented on a change in pull request #8903: [FLINK-12747][docs] 
Getting Started - Table API Example Walkthrough
URL: https://github.com/apache/flink/pull/8903#discussion_r299610872
 
 

 ##
 File path: docs/getting-started/tutorials/table_api.md
 ##
 @@ -0,0 +1,243 @@
+---
+title: "Table API"
+nav-id: tableapitutorials
+nav-title: 'Table API'
+nav-parent_id: apitutorials
+nav-pos: 1
+---
+
+
+Apache Flink offers a Table API as a unified, relational API for batch and 
stream processing, i.e., queries are executed with the same semantics on 
unbounded, real-time streams or bounded, recorded streams and produce the same 
results.
+The Table API in Flink is commonly used to ease the definition of data 
analytics, data pipelining, and ETL applications.
+
+* This will be replaced by the TOC
+{:toc}
+
+## What Are We Building? 
 
 Review comment:
   I agree in general, but I was making the change and it did sound right. I 
think the reason why is I don't view this a technical document so much as a 
conversation. We are taking the user through a tour of a sample application, 
skipping most of the details in favor of high-level concepts. Does that make 
sense or do I sound crazy? 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

[GitHub] [flink] sjwiesman commented on a change in pull request #8903: [FLINK-12747][docs] Getting Started - Table API Example Walkthrough

2019-07-02 Thread GitBox

sjwiesman commented on a change in pull request #8903: [FLINK-12747][docs] 
Getting Started - Table API Example Walkthrough
URL: https://github.com/apache/flink/pull/8903#discussion_r299567479
 
 

 ##
 File path: docs/getting-started/tutorials/table_api.md
 ##
 @@ -0,0 +1,250 @@
+---
+title: "Table API"
+nav-id: tableapitutorials
+nav-title: 'Table API'
+nav-parent_id: apitutorials
+nav-pos: 1
+---
+
+
+Apache Flink offers a Table API as a unified, relational API for batch and 
stream processing, i.e., queries are executed with the same semantics on 
unbounded, real-time streams or bounded, recorded streams and produce the same 
results.
+The Table API in Flink is commonly used to ease the definition of data 
analytics, data pipelining, and ETL applications.
+
+* This will be replaced by the TOC
+{:toc}
+
+## What Are We Building? 
+
+In this tutorial, we'll show how to build a continuous ETL pipeline for 
tracking financial transactions by account over time.
+We will start by building our report as a nightly batch job, and then migrate 
to a streaming pipeline to see how batch is just a special case of streaming. 
+
+## Prerequisites
+
+We'll assume that you have some familiarity with Java or Scala, but you should 
be able to follow along even if you're coming from a different programming 
language.
+We'll also assume that you're familiar with basic relational concepts such as 
`SELECT` and `GROUP BY` clauses. 
+
+If you want to follow along you will require a computer with: 
+
+* Java 8 
+* Maven 
+
+## Help, I’m Stuck! 
+
+If you get stuck, check out the [community support 
resources](https://flink.apache.org/community.html).
+In particular, Apache Flink's [user mailing 
list](https://flink.apache.org/community.html#mailing-lists) is consistently 
ranked as one of the most active of any Apache project and a great way to get 
help quickly. 
+
+## Setting up a Maven Project
+
+We are going to use a Flink Maven Archetype for creating our project structure.
+
+{% highlight bash %}
+$ mvn archetype:generate \
+-DarchetypeGroupId=org.apache.flink \
+-DarchetypeArtifactId=flink-walkthrough-table \{% unless site.is_stable %}
+
-DarchetypeCatalog=https://repository.apache.org/content/repositories/snapshots/
 \{% endunless %}
+-DarchetypeVersion={{ site.version }} \
+-DgroupId=spend-report \
+-DartifactId=spend-report \
+-Dversion=0.1 \
+-Dpackage=spendreport \
+-DinteractiveMode=false
+{% endhighlight %}
+
+{% unless site.is_stable %}
+
+Note: For Maven 3.0 or higher, it is no longer possible to specify 
the repository (-DarchetypeCatalog) via the commandline. If you wish to use the 
snapshot repository, you need to add a repository entry to your settings.xml. 
For details about this change, please refer to http://maven.apache.org/archetype/maven-archetype-plugin/archetype-repository.html;>Maven
 official document
+
+{% endunless %}
+
+You can edit the `groupId`, `artifactId` and `package` if you like. With the 
above parameters,
+Maven will create a project with all the dependencies to complete this 
tutorial.
+After importing the project in your editor you will see a file following code. 
+
+{% highlight java %}
+ExecutionEnvironment env   = ExecutionEnvironment.getExecutionEnvironment();
+BatchTableEnvironment tEnv = BatchTableEnvironment.create(env);
+
+tEnv.registerTableSource("transactions", new TransactionTableSource());
+tEnv.registerTableSink("stdout", new StdOutTableSink());
+tEnv.registerFunction("truncateDateToHour", new TruncateDateToHour());
+
+tEnv
+   .scan("transactions")
+   .insertInto("stdout");
+
+env.execute("Spend Report");
+{% endhighlight %}
+
+Let's break down this code by component. 
+
+## Breaking Down The Code
+
+ The Execution Environment
+
+The first two lines set up our `ExecutionEnvironment`.
+The execution environment is how we set properties for our deployments, 
specify whether we are writing a batch or streaming application, and create our 
sources.
+Here we have chosen to use the batch environment since we are building a 
periodic batch report.
+We then wrap it in a `BatchTableEnvironment` so to have full access to the 
Table Api.
+
+{% highlight java %}
+ExecutionEnvironment env   = ExecutionEnvironment.getExecutionEnvironment();
+BatchTableEnvironment tEnv = BatchTableEnvironment.create(env);
+{% endhighlight %}
+
+ Registering Tables
+
+Next, we register tables that we can use for the input and output of our 
application. 
+The TableEnvironment maintains a catalog of tables that are registered by 
name. There are two types of tables, input tables and output tables.
+Input tables can be referenced in Table API and SQL queries and provide input 
data.
+Output tables can be used to emit the result of a Table API or SQL query to an 
external system.
+Tables can support batch queries, streaming queries, or both. 
+
+{% highlight java %}
 
 Review comment:
   I agree but I would like to get some more

[GitHub] [flink] sjwiesman commented on a change in pull request #8903: [FLINK-12747][docs] Getting Started - Table API Example Walkthrough

2019-06-27 Thread GitBox

sjwiesman commented on a change in pull request #8903: [FLINK-12747][docs] 
Getting Started - Table API Example Walkthrough
URL: https://github.com/apache/flink/pull/8903#discussion_r298206821
 
 

 ##
 File path: docs/getting-started/tutorials/table_api.md
 ##
 @@ -0,0 +1,250 @@
+---
+title: "Table API"
+nav-id: tableapitutorials
+nav-title: 'Table API'
+nav-parent_id: apitutorials
+nav-pos: 1
+---
+
+
+Apache Flink offers a Table API as a unified, relational API for batch and 
stream processing, i.e., queries are executed with the same semantics on 
unbounded, real-time streams or bounded, recorded streams and produce the same 
results.
+The Table API in Flink is commonly used to ease the definition of data 
analytics, data pipelining, and ETL applications.
+
+* This will be replaced by the TOC
+{:toc}
+
+## What Are We Building? 
+
+In this tutorial, we'll show how to build a continuous ETL pipeline for 
tracking financial transactions by account over time.
+We will start by building our report as a nightly batch job, and then migrate 
to a streaming pipeline to see how batch is just a special case of streaming. 
+
+## Prerequisites
+
+We'll assume that you have some familiarity with Java or Scala, but you should 
be able to follow along even if you're coming from a different programming 
language.
+We'll also assume that you're familiar with basic relational concepts such as 
`SELECT` and `GROUP BY` clauses. 
+
+If you want to follow along you will require a computer with: 
+
+* Java 8 
+* Maven 
+
+## Help, I’m Stuck! 
+
+If you get stuck, check out the [community support 
resources](https://flink.apache.org/community.html).
+In particular, Apache Flink's [user mailing 
list](https://flink.apache.org/community.html#mailing-lists) is consistently 
ranked as one of the most active of any Apache project and a great way to get 
help quickly. 
+
+## Setting up a Maven Project
+
+We are going to use a Flink Maven Archetype for creating our project structure.
+
+{% highlight bash %}
+$ mvn archetype:generate \
+-DarchetypeGroupId=org.apache.flink \
+-DarchetypeArtifactId=flink-walkthrough-table \{% unless site.is_stable %}
+
-DarchetypeCatalog=https://repository.apache.org/content/repositories/snapshots/
 \{% endunless %}
+-DarchetypeVersion={{ site.version }} \
+-DgroupId=spend-report \
+-DartifactId=spend-report \
+-Dversion=0.1 \
+-Dpackage=spendreport \
+-DinteractiveMode=false
+{% endhighlight %}
+
+{% unless site.is_stable %}
+
+Note: For Maven 3.0 or higher, it is no longer possible to specify 
the repository (-DarchetypeCatalog) via the commandline. If you wish to use the 
snapshot repository, you need to add a repository entry to your settings.xml. 
For details about this change, please refer to http://maven.apache.org/archetype/maven-archetype-plugin/archetype-repository.html;>Maven
 official document
+
+{% endunless %}
+
+You can edit the `groupId`, `artifactId` and `package` if you like. With the 
above parameters,
+Maven will create a project with all the dependencies to complete this 
tutorial.
+After importing the project in your editor you will see a file following code. 
+
+{% highlight java %}
+ExecutionEnvironment env   = ExecutionEnvironment.getExecutionEnvironment();
+BatchTableEnvironment tEnv = BatchTableEnvironment.create(env);
+
+tEnv.registerTableSource("transactions", new TransactionTableSource());
+tEnv.registerTableSink("stdout", new StdOutTableSink());
+tEnv.registerFunction("truncateDateToHour", new TruncateDateToHour());
+
+tEnv
+   .scan("transactions")
+   .insertInto("stdout");
+
+env.execute("Spend Report");
+{% endhighlight %}
+
+Let's break down this code by component. 
+
+## Breaking Down The Code
+
+ The Execution Environment
+
+The first two lines set up our `ExecutionEnvironment`.
+The execution environment is how we set properties for our deployments, 
specify whether we are writing a batch or streaming application, and create our 
sources.
+Here we have chosen to use the batch environment since we are building a 
periodic batch report.
+We then wrap it in a `BatchTableEnvironment` so to have full access to the 
Table Api.
+
+{% highlight java %}
+ExecutionEnvironment env   = ExecutionEnvironment.getExecutionEnvironment();
+BatchTableEnvironment tEnv = BatchTableEnvironment.create(env);
+{% endhighlight %}
+
+ Registering Tables
+
+Next, we register tables that we can use for the input and output of our 
application. 
+The TableEnvironment maintains a catalog of tables that are registered by 
name. There are two types of tables, input tables and output tables.
+Input tables can be referenced in Table API and SQL queries and provide input 
data.
+Output tables can be used to emit the result of a Table API or SQL query to an 
external system.
+Tables can support batch queries, streaming queries, or both. 
+
+{% highlight java %}
+tEnv.registerTableSource("transactions", new

[GitHub] [flink] sjwiesman commented on a change in pull request #8903: [FLINK-12747][docs] Getting Started - Table API Example Walkthrough

2019-06-27 Thread GitBox

sjwiesman commented on a change in pull request #8903: [FLINK-12747][docs] 
Getting Started - Table API Example Walkthrough
URL: https://github.com/apache/flink/pull/8903#discussion_r298200573
 
 

 ##
 File path: docs/getting-started/tutorials/table_api.md
 ##
 @@ -0,0 +1,250 @@
+---
+title: "Table API"
+nav-id: tableapitutorials
+nav-title: 'Table API'
+nav-parent_id: apitutorials
+nav-pos: 1
+---
+
+
+Apache Flink offers a Table API as a unified, relational API for batch and 
stream processing, i.e., queries are executed with the same semantics on 
unbounded, real-time streams or bounded, recorded streams and produce the same 
results.
+The Table API in Flink is commonly used to ease the definition of data 
analytics, data pipelining, and ETL applications.
+
+* This will be replaced by the TOC
+{:toc}
+
+## What Are We Building? 
+
+In this tutorial, we'll show how to build a continuous ETL pipeline for 
tracking financial transactions by account over time.
+We will start by building our report as a nightly batch job, and then migrate 
to a streaming pipeline to see how batch is just a special case of streaming. 
+
+## Prerequisites
+
+We'll assume that you have some familiarity with Java or Scala, but you should 
be able to follow along even if you're coming from a different programming 
language.
+We'll also assume that you're familiar with basic relational concepts such as 
`SELECT` and `GROUP BY` clauses. 
+
+If you want to follow along you will require a computer with: 
+
+* Java 8 
+* Maven 
+
+## Help, I’m Stuck! 
+
+If you get stuck, check out the [community support 
resources](https://flink.apache.org/community.html).
+In particular, Apache Flink's [user mailing 
list](https://flink.apache.org/community.html#mailing-lists) is consistently 
ranked as one of the most active of any Apache project and a great way to get 
help quickly. 
+
+## Setting up a Maven Project
+
+We are going to use a Flink Maven Archetype for creating our project structure.
+
+{% highlight bash %}
+$ mvn archetype:generate \
+-DarchetypeGroupId=org.apache.flink \
+-DarchetypeArtifactId=flink-walkthrough-table \{% unless site.is_stable %}
+
-DarchetypeCatalog=https://repository.apache.org/content/repositories/snapshots/
 \{% endunless %}
+-DarchetypeVersion={{ site.version }} \
+-DgroupId=spend-report \
+-DartifactId=spend-report \
+-Dversion=0.1 \
+-Dpackage=spendreport \
+-DinteractiveMode=false
+{% endhighlight %}
+
+{% unless site.is_stable %}
+
+Note: For Maven 3.0 or higher, it is no longer possible to specify 
the repository (-DarchetypeCatalog) via the commandline. If you wish to use the 
snapshot repository, you need to add a repository entry to your settings.xml. 
For details about this change, please refer to http://maven.apache.org/archetype/maven-archetype-plugin/archetype-repository.html;>Maven
 official document
+
+{% endunless %}
+
+You can edit the `groupId`, `artifactId` and `package` if you like. With the 
above parameters,
+Maven will create a project with all the dependencies to complete this 
tutorial.
+After importing the project in your editor you will see a file following code. 
+
+{% highlight java %}
+ExecutionEnvironment env   = ExecutionEnvironment.getExecutionEnvironment();
+BatchTableEnvironment tEnv = BatchTableEnvironment.create(env);
+
+tEnv.registerTableSource("transactions", new TransactionTableSource());
+tEnv.registerTableSink("stdout", new StdOutTableSink());
+tEnv.registerFunction("truncateDateToHour", new TruncateDateToHour());
+
+tEnv
+   .scan("transactions")
+   .insertInto("stdout");
+
+env.execute("Spend Report");
+{% endhighlight %}
+
+Let's break down this code by component. 
+
+## Breaking Down The Code
+
+ The Execution Environment
+
+The first two lines set up our `ExecutionEnvironment`.
+The execution environment is how we set properties for our deployments, 
specify whether we are writing a batch or streaming application, and create our 
sources.
+Here we have chosen to use the batch environment since we are building a 
periodic batch report.
+We then wrap it in a `BatchTableEnvironment` so to have full access to the 
Table Api.
+
+{% highlight java %}
+ExecutionEnvironment env   = ExecutionEnvironment.getExecutionEnvironment();
+BatchTableEnvironment tEnv = BatchTableEnvironment.create(env);
+{% endhighlight %}
+
+ Registering Tables
+
+Next, we register tables that we can use for the input and output of our 
application. 
+The TableEnvironment maintains a catalog of tables that are registered by 
name. There are two types of tables, input tables and output tables.
+Input tables can be referenced in Table API and SQL queries and provide input 
data.
+Output tables can be used to emit the result of a Table API or SQL query to an 
external system.
+Tables can support batch queries, streaming queries, or both. 
+
+{% highlight java %}
+tEnv.registerTableSource("transactions", new

[GitHub] [flink] sjwiesman commented on a change in pull request #8903: [FLINK-12747][docs] Getting Started - Table API Example Walkthrough

2019-06-27 Thread GitBox

sjwiesman commented on a change in pull request #8903: [FLINK-12747][docs] 
Getting Started - Table API Example Walkthrough
URL: https://github.com/apache/flink/pull/8903#discussion_r298169905
 
 

 ##
 File path: pom.xml
 ##
 @@ -82,11 +81,13 @@ under the License.
flink-mesos
flink-metrics
flink-yarn
+   flink-shaded-yarn-tests
flink-yarn-tests
flink-fs-tests
flink-docs
flink-python
flink-ml-parent
+   flink-walkthroughs
 
 Review comment:
   Rebased on fixed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

[GitHub] [flink] sjwiesman commented on a change in pull request #8903: [FLINK-12747][docs] Getting Started - Table API Example Walkthrough

2019-06-26 Thread GitBox

sjwiesman commented on a change in pull request #8903: [FLINK-12747][docs] 
Getting Started - Table API Example Walkthrough
URL: https://github.com/apache/flink/pull/8903#discussion_r297843842
 
 

 ##
 File path: pom.xml
 ##
 @@ -82,11 +81,13 @@ under the License.
flink-mesos
flink-metrics
flink-yarn
+   flink-shaded-yarn-tests
flink-yarn-tests
flink-fs-tests
flink-docs
flink-python
flink-ml-parent
+   flink-walkthroughs
 
 Review comment:
   I don't know why the pom diff is so crazy, I think it's because this is 
based on #8873 which does not have the most recent pom. I've verified that this 
line is the only difference between this branch and master. After #8873 is 
merged I will rebase and ensure the parent pom is correct. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

[GitHub] [flink] sjwiesman commented on a change in pull request #8903: [FLINK-12747][docs] Getting Started - Table API Example Walkthrough

2019-06-26 Thread GitBox

sjwiesman commented on a change in pull request #8903: [FLINK-12747][docs] 
Getting Started - Table API Example Walkthrough
URL: https://github.com/apache/flink/pull/8903#discussion_r297844944
 
 

 ##
 File path: docs/getting-started/tutorials/table_api.md
 ##
 @@ -0,0 +1,250 @@
+---
+title: "Table API"
+nav-id: tableapitutorials
+nav-title: 'Table API'
+nav-parent_id: apitutorials
+nav-pos: 1
+---
+
+
+Apache Flink offers a Table API as a unified, relational API for batch and 
stream processing, i.e., queries are executed with the same semantics on 
unbounded, real-time streams or bounded, recorded streams and produce the same 
results.
+The Table API in Flink is commonly used to ease the definition of data 
analytics, data pipelining, and ETL applications.
+
+* This will be replaced by the TOC
+{:toc}
+
+## What Are We Building? 
+
+In this tutorial, we'll show how to build a continuous ETL pipeline for 
tracking financial transactions by account over time.
+We will start by building our report as a nightly batch job, and then migrate 
to a streaming pipeline to see how batch is just a special case of streaming. 
+
+## Prerequisites
+
+We'll assume that you have some familiarity with Java or Scala, but you should 
be able to follow along even if you're coming from a different programming 
language.
+We'll also assume that you're familiar with basic relational concepts such as 
`SELECT` and `GROUP BY` clauses. 
+
+If you want to follow along you will require a computer with: 
+
+* Java 8 
+* Maven 
+
+## Help, I’m Stuck! 
+
+If you get stuck, check out the [community support 
resources](https://flink.apache.org/community.html).
+In particular, Apache Flink's [user 
mailing](https://flink.apache.org/community.html#mailing-lists) list is 
consistently rated as one of the most active of any Apache project and a great 
way to get help quickly. 
+
+## Setting up a Maven Project
+
+We are going to use a Flink Maven Archetype for creating our project structure.
+
+{% highlight bash %}
+$ mvn archetype:generate \
+-DarchetypeGroupId=org.apache.flink \
+-DarchetypeArtifactId=flink-walkthrough-table \{% unless site.is_stable %}
+
-DarchetypeCatalog=https://repository.apache.org/content/repositories/snapshots/
 \{% endunless %}
+-DarchetypeVersion={{ site.version }} \
+-DgroupId=spend-report \
+-DartifactId=spend-report \
+-Dversion=0.1 \
+-Dpackage=spendreport \
+-DinteractiveMode=false
+{% endhighlight %}
+
+{% unless site.is_stable %}
+
+Note: For Maven 3.0 or higher, it is no longer possible to specify 
the repository (-DarchetypeCatalog) via the commandline. If you wish to use the 
snapshot repository, you need to add a repository entry to your settings.xml. 
For details about this change, please refer to http://maven.apache.org/archetype/maven-archetype-plugin/archetype-repository.html;>Maven
 official document
+
+{% endunless %}
+
+You can edit the `groupId`, `artifactId` and `package` if you like. With the 
above parameters,
+Maven will create a project with all the dependencies to complete this 
tutorial.
+After importing the project in your editor you will see a file following code. 
+
+{% highlight java %}
+ExecutionEnvironment env   = ExecutionEnvironment.getExecutionEnvironment();
+BatchTableEnvironment tEnv = BatchTableEnvironment.create(env);
+
+tEnv.registerTableSource("transactions", new TransactionTableSource());
+tEnv.registerTableSink("stdout", new StdOutTableSink());
+tEnv.registerFunction("truncateDateToHour", new TruncateDateToHour());
+
+tEnv
+   .scan("transactions")
+   .insertInto("stdout");
+
+env.execute("Spend Report");
+{% endhighlight %}
+
+Let's break down this code by component. 
+
+## Breaking Down The Code
+
+ The Execution Environment
+
+The first two lines set up our `ExecutionEnvironment`.
+The execution environment is how we set properties for our deployments, 
specify whether we are writing a batch or streaming application, and create our 
sources.
+Here we have chosen to use the batch environment since we are building a 
periodic batch report.
+We then wrap it in a `BatchTableEnvironment` so to have full access to the 
Table Api.
+
+{% highlight java %}
+ExecutionEnvironment env   = ExecutionEnvironment.getExecutionEnvironment();
+BatchTableEnvironment tEnv = BatchTableEnvironment.create(env);
+{% endhighlight %}
+
+ Registering Tables
+
+Next, we register tables that we can use for the input and output of our 
application. 
+The TableEnvironment maintains a catalog of tables that are registered by 
name. There are two types of tables, input tables and output tables.
+Input tables can be referenced in Table API and SQL queries and provide input 
data.
+Output tables can be used to emit the result of a Table API or SQL query to an 
external system.
+Tables can support batch queries, streaming queries, or both. 
+
+{% highlight java %}
+tEnv.registerTableSource("transactions", new

[GitHub] [flink] sjwiesman commented on a change in pull request #8903: [FLINK-12747][docs] Getting Started - Table API Example Walkthrough

2019-06-26 Thread GitBox

sjwiesman commented on a change in pull request #8903: [FLINK-12747][docs] 
Getting Started - Table API Example Walkthrough
URL: https://github.com/apache/flink/pull/8903#discussion_r297844944
 
 

 ##
 File path: docs/getting-started/tutorials/table_api.md
 ##
 @@ -0,0 +1,250 @@
+---
+title: "Table API"
+nav-id: tableapitutorials
+nav-title: 'Table API'
+nav-parent_id: apitutorials
+nav-pos: 1
+---
+
+
+Apache Flink offers a Table API as a unified, relational API for batch and 
stream processing, i.e., queries are executed with the same semantics on 
unbounded, real-time streams or bounded, recorded streams and produce the same 
results.
+The Table API in Flink is commonly used to ease the definition of data 
analytics, data pipelining, and ETL applications.
+
+* This will be replaced by the TOC
+{:toc}
+
+## What Are We Building? 
+
+In this tutorial, we'll show how to build a continuous ETL pipeline for 
tracking financial transactions by account over time.
+We will start by building our report as a nightly batch job, and then migrate 
to a streaming pipeline to see how batch is just a special case of streaming. 
+
+## Prerequisites
+
+We'll assume that you have some familiarity with Java or Scala, but you should 
be able to follow along even if you're coming from a different programming 
language.
+We'll also assume that you're familiar with basic relational concepts such as 
`SELECT` and `GROUP BY` clauses. 
+
+If you want to follow along you will require a computer with: 
+
+* Java 8 
+* Maven 
+
+## Help, I’m Stuck! 
+
+If you get stuck, check out the [community support 
resources](https://flink.apache.org/community.html).
+In particular, Apache Flink's [user 
mailing](https://flink.apache.org/community.html#mailing-lists) list is 
consistently rated as one of the most active of any Apache project and a great 
way to get help quickly. 
+
+## Setting up a Maven Project
+
+We are going to use a Flink Maven Archetype for creating our project structure.
+
+{% highlight bash %}
+$ mvn archetype:generate \
+-DarchetypeGroupId=org.apache.flink \
+-DarchetypeArtifactId=flink-walkthrough-table \{% unless site.is_stable %}
+
-DarchetypeCatalog=https://repository.apache.org/content/repositories/snapshots/
 \{% endunless %}
+-DarchetypeVersion={{ site.version }} \
+-DgroupId=spend-report \
+-DartifactId=spend-report \
+-Dversion=0.1 \
+-Dpackage=spendreport \
+-DinteractiveMode=false
+{% endhighlight %}
+
+{% unless site.is_stable %}
+
+Note: For Maven 3.0 or higher, it is no longer possible to specify 
the repository (-DarchetypeCatalog) via the commandline. If you wish to use the 
snapshot repository, you need to add a repository entry to your settings.xml. 
For details about this change, please refer to http://maven.apache.org/archetype/maven-archetype-plugin/archetype-repository.html;>Maven
 official document
+
+{% endunless %}
+
+You can edit the `groupId`, `artifactId` and `package` if you like. With the 
above parameters,
+Maven will create a project with all the dependencies to complete this 
tutorial.
+After importing the project in your editor you will see a file following code. 
+
+{% highlight java %}
+ExecutionEnvironment env   = ExecutionEnvironment.getExecutionEnvironment();
+BatchTableEnvironment tEnv = BatchTableEnvironment.create(env);
+
+tEnv.registerTableSource("transactions", new TransactionTableSource());
+tEnv.registerTableSink("stdout", new StdOutTableSink());
+tEnv.registerFunction("truncateDateToHour", new TruncateDateToHour());
+
+tEnv
+   .scan("transactions")
+   .insertInto("stdout");
+
+env.execute("Spend Report");
+{% endhighlight %}
+
+Let's break down this code by component. 
+
+## Breaking Down The Code
+
+ The Execution Environment
+
+The first two lines set up our `ExecutionEnvironment`.
+The execution environment is how we set properties for our deployments, 
specify whether we are writing a batch or streaming application, and create our 
sources.
+Here we have chosen to use the batch environment since we are building a 
periodic batch report.
+We then wrap it in a `BatchTableEnvironment` so to have full access to the 
Table Api.
+
+{% highlight java %}
+ExecutionEnvironment env   = ExecutionEnvironment.getExecutionEnvironment();
+BatchTableEnvironment tEnv = BatchTableEnvironment.create(env);
+{% endhighlight %}
+
+ Registering Tables
+
+Next, we register tables that we can use for the input and output of our 
application. 
+The TableEnvironment maintains a catalog of tables that are registered by 
name. There are two types of tables, input tables and output tables.
+Input tables can be referenced in Table API and SQL queries and provide input 
data.
+Output tables can be used to emit the result of a Table API or SQL query to an 
external system.
+Tables can support batch queries, streaming queries, or both. 
+
+{% highlight java %}
+tEnv.registerTableSource("transactions", new

[GitHub] [flink] sjwiesman commented on a change in pull request #8903: [FLINK-12747][docs] Getting Started - Table API Example Walkthrough

2019-06-26 Thread GitBox

sjwiesman commented on a change in pull request #8903: [FLINK-12747][docs] 
Getting Started - Table API Example Walkthrough
URL: https://github.com/apache/flink/pull/8903#discussion_r297844944
 
 

 ##
 File path: docs/getting-started/tutorials/table_api.md
 ##
 @@ -0,0 +1,250 @@
+---
+title: "Table API"
+nav-id: tableapitutorials
+nav-title: 'Table API'
+nav-parent_id: apitutorials
+nav-pos: 1
+---
+
+
+Apache Flink offers a Table API as a unified, relational API for batch and 
stream processing, i.e., queries are executed with the same semantics on 
unbounded, real-time streams or bounded, recorded streams and produce the same 
results.
+The Table API in Flink is commonly used to ease the definition of data 
analytics, data pipelining, and ETL applications.
+
+* This will be replaced by the TOC
+{:toc}
+
+## What Are We Building? 
+
+In this tutorial, we'll show how to build a continuous ETL pipeline for 
tracking financial transactions by account over time.
+We will start by building our report as a nightly batch job, and then migrate 
to a streaming pipeline to see how batch is just a special case of streaming. 
+
+## Prerequisites
+
+We'll assume that you have some familiarity with Java or Scala, but you should 
be able to follow along even if you're coming from a different programming 
language.
+We'll also assume that you're familiar with basic relational concepts such as 
`SELECT` and `GROUP BY` clauses. 
+
+If you want to follow along you will require a computer with: 
+
+* Java 8 
+* Maven 
+
+## Help, I’m Stuck! 
+
+If you get stuck, check out the [community support 
resources](https://flink.apache.org/community.html).
+In particular, Apache Flink's [user 
mailing](https://flink.apache.org/community.html#mailing-lists) list is 
consistently rated as one of the most active of any Apache project and a great 
way to get help quickly. 
+
+## Setting up a Maven Project
+
+We are going to use a Flink Maven Archetype for creating our project structure.
+
+{% highlight bash %}
+$ mvn archetype:generate \
+-DarchetypeGroupId=org.apache.flink \
+-DarchetypeArtifactId=flink-walkthrough-table \{% unless site.is_stable %}
+
-DarchetypeCatalog=https://repository.apache.org/content/repositories/snapshots/
 \{% endunless %}
+-DarchetypeVersion={{ site.version }} \
+-DgroupId=spend-report \
+-DartifactId=spend-report \
+-Dversion=0.1 \
+-Dpackage=spendreport \
+-DinteractiveMode=false
+{% endhighlight %}
+
+{% unless site.is_stable %}
+
+Note: For Maven 3.0 or higher, it is no longer possible to specify 
the repository (-DarchetypeCatalog) via the commandline. If you wish to use the 
snapshot repository, you need to add a repository entry to your settings.xml. 
For details about this change, please refer to http://maven.apache.org/archetype/maven-archetype-plugin/archetype-repository.html;>Maven
 official document
+
+{% endunless %}
+
+You can edit the `groupId`, `artifactId` and `package` if you like. With the 
above parameters,
+Maven will create a project with all the dependencies to complete this 
tutorial.
+After importing the project in your editor you will see a file following code. 
+
+{% highlight java %}
+ExecutionEnvironment env   = ExecutionEnvironment.getExecutionEnvironment();
+BatchTableEnvironment tEnv = BatchTableEnvironment.create(env);
+
+tEnv.registerTableSource("transactions", new TransactionTableSource());
+tEnv.registerTableSink("stdout", new StdOutTableSink());
+tEnv.registerFunction("truncateDateToHour", new TruncateDateToHour());
+
+tEnv
+   .scan("transactions")
+   .insertInto("stdout");
+
+env.execute("Spend Report");
+{% endhighlight %}
+
+Let's break down this code by component. 
+
+## Breaking Down The Code
+
+ The Execution Environment
+
+The first two lines set up our `ExecutionEnvironment`.
+The execution environment is how we set properties for our deployments, 
specify whether we are writing a batch or streaming application, and create our 
sources.
+Here we have chosen to use the batch environment since we are building a 
periodic batch report.
+We then wrap it in a `BatchTableEnvironment` so to have full access to the 
Table Api.
+
+{% highlight java %}
+ExecutionEnvironment env   = ExecutionEnvironment.getExecutionEnvironment();
+BatchTableEnvironment tEnv = BatchTableEnvironment.create(env);
+{% endhighlight %}
+
+ Registering Tables
+
+Next, we register tables that we can use for the input and output of our 
application. 
+The TableEnvironment maintains a catalog of tables that are registered by 
name. There are two types of tables, input tables and output tables.
+Input tables can be referenced in Table API and SQL queries and provide input 
data.
+Output tables can be used to emit the result of a Table API or SQL query to an 
external system.
+Tables can support batch queries, streaming queries, or both. 
+
+{% highlight java %}
+tEnv.registerTableSource("transactions", new

[GitHub] [flink] sjwiesman commented on a change in pull request #8903: [FLINK-12747][docs] Getting Started - Table API Example Walkthrough

2019-06-26 Thread GitBox

sjwiesman commented on a change in pull request #8903: [FLINK-12747][docs] 
Getting Started - Table API Example Walkthrough
URL: https://github.com/apache/flink/pull/8903#discussion_r297843842
 
 

 ##
 File path: pom.xml
 ##
 @@ -82,11 +81,13 @@ under the License.
flink-mesos
flink-metrics
flink-yarn
+   flink-shaded-yarn-tests
flink-yarn-tests
flink-fs-tests
flink-docs
flink-python
flink-ml-parent
+   flink-walkthroughs
 
 Review comment:
   I don't know why the pom diff is so crazy, I think it's because this is 
based on #8873 which does not have the most recent pom. I've verified that this 
line is the only difference between this branch and master.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

[GitHub] [flink] sjwiesman commented on a change in pull request #8903: [FLINK-12747][docs] Getting Started - Table API Example Walkthrough

[GitHub] [flink] sjwiesman commented on a change in pull request #8903: [FLINK-12747][docs] Getting Started - Table API Example Walkthrough

[GitHub] [flink] sjwiesman commented on a change in pull request #8903: [FLINK-12747][docs] Getting Started - Table API Example Walkthrough

[GitHub] [flink] sjwiesman commented on a change in pull request #8903: [FLINK-12747][docs] Getting Started - Table API Example Walkthrough

[GitHub] [flink] sjwiesman commented on a change in pull request #8903: [FLINK-12747][docs] Getting Started - Table API Example Walkthrough

[GitHub] [flink] sjwiesman commented on a change in pull request #8903: [FLINK-12747][docs] Getting Started - Table API Example Walkthrough

[GitHub] [flink] sjwiesman commented on a change in pull request #8903: [FLINK-12747][docs] Getting Started - Table API Example Walkthrough

[GitHub] [flink] sjwiesman commented on a change in pull request #8903: [FLINK-12747][docs] Getting Started - Table API Example Walkthrough

[GitHub] [flink] sjwiesman commented on a change in pull request #8903: [FLINK-12747][docs] Getting Started - Table API Example Walkthrough

[GitHub] [flink] sjwiesman commented on a change in pull request #8903: [FLINK-12747][docs] Getting Started - Table API Example Walkthrough

[GitHub] [flink] sjwiesman commented on a change in pull request #8903: [FLINK-12747][docs] Getting Started - Table API Example Walkthrough

[GitHub] [flink] sjwiesman commented on a change in pull request #8903: [FLINK-12747][docs] Getting Started - Table API Example Walkthrough

[GitHub] [flink] sjwiesman commented on a change in pull request #8903: [FLINK-12747][docs] Getting Started - Table API Example Walkthrough

[GitHub] [flink] sjwiesman commented on a change in pull request #8903: [FLINK-12747][docs] Getting Started - Table API Example Walkthrough

[GitHub] [flink] sjwiesman commented on a change in pull request #8903: [FLINK-12747][docs] Getting Started - Table API Example Walkthrough

[GitHub] [flink] sjwiesman commented on a change in pull request #8903: [FLINK-12747][docs] Getting Started - Table API Example Walkthrough

[GitHub] [flink] sjwiesman commented on a change in pull request #8903: [FLINK-12747][docs] Getting Started - Table API Example Walkthrough

[GitHub] [flink] sjwiesman commented on a change in pull request #8903: [FLINK-12747][docs] Getting Started - Table API Example Walkthrough

[GitHub] [flink] sjwiesman commented on a change in pull request #8903: [FLINK-12747][docs] Getting Started - Table API Example Walkthrough

[GitHub] [flink] sjwiesman commented on a change in pull request #8903: [FLINK-12747][docs] Getting Started - Table API Example Walkthrough

[GitHub] [flink] sjwiesman commented on a change in pull request #8903: [FLINK-12747][docs] Getting Started - Table API Example Walkthrough

[GitHub] [flink] sjwiesman commented on a change in pull request #8903: [FLINK-12747][docs] Getting Started - Table API Example Walkthrough

[GitHub] [flink] sjwiesman commented on a change in pull request #8903: [FLINK-12747][docs] Getting Started - Table API Example Walkthrough

[GitHub] [flink] sjwiesman commented on a change in pull request #8903: [FLINK-12747][docs] Getting Started - Table API Example Walkthrough

[GitHub] [flink] sjwiesman commented on a change in pull request #8903: [FLINK-12747][docs] Getting Started - Table API Example Walkthrough

25 matches

Site Navigation

Mail list logo

Footer information