[GitHub] [flink] alpinegizmo commented on a change in pull request #11834: [FLINK-17237][docs] Add Intro to DataStream API tutorial

2020-04-22 Thread GitBox


alpinegizmo commented on a change in pull request #11834:
URL: https://github.com/apache/flink/pull/11834#discussion_r412156252



##
File path: docs/tutorials/datastream_api.md
##
@@ -0,0 +1,252 @@
+---
+title: Intro to the DataStream API
+nav-id: datastream-api
+nav-pos: 2
+nav-title: Intro to the DataStream API
+nav-parent_id: tutorials
+permalink: /tutorials/datastream_api.html
+---
+
+
+The focus of this tutorial is to broadly cover the DataStream API well enough
+that you will be able to get started writing streaming applications.
+
+* This will be replaced by the TOC
+{:toc}
+
+## What can be Streamed?
+
+Flink's DataStream APIs for Java and Scala will let you stream anything they
+can serialize. Flink's own serializer is used for
+
+- basic types, i.e., String, Long, Integer, Boolean, Array
+- composite types: Tuples, POJOs, and Scala case classes
+
+and Flink falls back to Kryo for other types. It's also possible to use other
+serializers with Flink. Avro, in particular, is well supported.
+
+### Java tuples and POJOs
+
+Flink's native serializer can operate efficiently on tuples and POJOs.
+
+#### Tuples
+
+For Java, Flink defines its own Tuple1 through Tuple25 types.
+
+{% highlight java %}
+Tuple2<String, Integer> person = new Tuple2<>("Fred", 35);
+
+// zero-based index!
+String name = person.f0;
+Integer age = person.f1;
+{% endhighlight %}
+
+#### POJOs
+
+A POJO (plain old Java object) is any Java class that
+
+- has a public, parameterless default constructor, and
+- has fields that are all either
+  - public, or
+  - accessible through default-style getters and setters
+
+Example:
+
+{% highlight java %}
+public class Person {
+    public String name;
+    public Integer age;
+
+    public Person() {}
+
+    public Person(String name, Integer age) {
+        this.name = name;
+        this.age = age;
+    }
+}
+
+Person person = new Person("Fred Flintstone", 35);
+{% endhighlight %}
+
+Flink's serializer [supports schema evolution for POJO types]({{ site.baseurl }}{% link dev/stream/state/schema_evolution.md %}#pojo-types).
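The second bullet above allows fields to be non-public as long as they have default-style getters and setters. As a sketch (the class name `PersonBean` is invented for illustration), such a POJO might look like:

```java
// Hypothetical variant of Person: private fields exposed through
// getter/setter pairs that follow the default naming convention.
public class PersonBean {
    private String name;
    private Integer age;

    // The parameterless constructor the POJO rules require.
    public PersonBean() {}

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }

    public Integer getAge() { return age; }
    public void setAge(Integer age) { this.age = age; }
}
```

Flink should treat this class the same way it treats the all-public-fields version above.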
+
+### Scala tuples and case classes
+
+These work just as you'd expect.
+
+{% top %}
+
+## A Complete Example
+
+This example takes a stream of records about people as input, and filters it
+to include only the adults.
+
+{% highlight java %}
+import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
+import org.apache.flink.streaming.api.datastream.DataStream;
+import org.apache.flink.api.common.functions.FilterFunction;
+
+public class Example {
+
+    public static void main(String[] args) throws Exception {
+        final StreamExecutionEnvironment env =
+                StreamExecutionEnvironment.getExecutionEnvironment();
+
+        DataStream<Person> flintstones = env.fromElements(
+                new Person("Fred", 35),
+                new Person("Wilma", 35),
+                new Person("Pebbles", 2));
+
+        DataStream<Person> adults = flintstones
+                .filter(new FilterFunction<Person>() {
+                    @Override
+                    public boolean filter(Person person) throws Exception {
+                        return person.age >= 18;
+                    }
+                });
+
+        adults.print();
+
+        env.execute();
+    }
+
+    public static class Person {
+        public String name;
+        public Integer age;
+
+        public Person() {}
+
+        public Person(String name, Integer age) {
+            this.name = name;
+            this.age = age;
+        }
+
+        public String toString() {
+            return this.name.toString() + ": age " + this.age.toString();
+        }
+    }
+}
+{% endhighlight %}
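Since `FilterFunction` has a single abstract method, the anonymous class above can also be supplied as a lambda, `flintstones.filter(person -> person.age >= 18)`. As a plain-Java sketch of the predicate logic the filter applies to each record (no Flink runtime involved; the class name `AdultPredicateSketch` is invented here):

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class AdultPredicateSketch {
    static class Person {
        String name;
        Integer age;
        Person(String name, Integer age) { this.name = name; this.age = age; }
    }

    // The same predicate the FilterFunction applies to each stream record.
    static final Predicate<Person> IS_ADULT = person -> person.age >= 18;

    // Apply the predicate to a batch of records and keep the surviving names.
    static List<String> adultNames(List<Person> people) {
        return people.stream()
                .filter(IS_ADULT)
                .map(p -> p.name)
                .collect(Collectors.toList());
    }
}
```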
+
+### Stream execution environment
+
+Every Flink application needs an execution environment, `env` in this example.
+Streaming applications should use a `StreamExecutionEnvironment`.
+
+The DataStream API calls made in your application build a job graph that is
+attached to the `StreamExecutionEnvironment`. When `env.execute()` is called,
+this graph is packaged up and sent to the Flink Master, which parallelizes the
+job and distributes slices of it to the Task Managers for execution. Each
+parallel slice of your job will be executed in a *task slot*.
+
+Note that if you don't call `execute()`, your application won't be run.
+
+
+
+This distributed runtime depends on your application being serializable. It
+also requires that all dependencies are available to each node in the cluster.
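"Serializable" here means ordinary Java serialization: user function objects are serialized, shipped to the workers, and deserialized there (in Flink, the function interfaces extend `java.io.Serializable`). A minimal sketch of that round trip, with invented class names:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SerializationSketch {
    // A hypothetical user function; it must implement Serializable so the
    // runtime can ship it (with its captured state) to the workers.
    static class AgeThreshold implements Serializable {
        int threshold;
        AgeThreshold(int threshold) { this.threshold = threshold; }
        boolean accept(int age) { return age >= threshold; }
    }

    // Round-trip an object through Java serialization, as the runtime
    // conceptually does when distributing the job.
    static Object roundTrip(Object o) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(o);
        }
        try (ObjectInputStream ois = new ObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray()))) {
            return ois.readObject();
        }
    }
}
```

If a function captures a non-serializable field, this round trip (and hence job submission) fails.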
+
+### Basic stream sources
+
+The example above constructs a `DataStream<Person>` using `env.fromElements(...)`.
+This is a convenient way to throw together a simple stream for use in a
+prototype or test. There is also a `fromCollection(Collection)` method on
+`StreamExecutionEnvironment`. So instead, you could do this:
+
+{% highlight java %}
+List<Person> people = new ArrayList<Person>();
+
+people.add(new Person("Fred", 35));
+people.add(new Person("Wilma", 35));
+people.add(new Person("Pebbles", 2));
+
+DataStream<Person> flintstones = env.fromCollection(people);
+{% endhighlight %}
+
+Another convenient way to get some data into a stream while prototyping is to use a

[GitHub] [flink] alpinegizmo commented on a change in pull request #11834: [FLINK-17237][docs] Add Intro to DataStream API tutorial

2020-04-21 Thread GitBox


alpinegizmo commented on a change in pull request #11834:
URL: https://github.com/apache/flink/pull/11834#discussion_r412151143



##
File path: docs/tutorials/datastream_api.md
##
@@ -0,0 +1,252 @@
+DataStream<Person> adults = flintstones.filter(new FilterFunction<Person>() {
+    @Override
+    public boolean filter(Person person) throws Exception {
+        return person.age >= 18;
+    }
+});

Review comment:
   This is the only explanation of a FilterFunction that they're going to get
   before the exercise (which uses one). For that reason, I wrote this out in full.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




[GitHub] [flink] alpinegizmo commented on a change in pull request #11834: [FLINK-17237][docs] Add Intro to DataStream API tutorial

2020-04-21 Thread GitBox


alpinegizmo commented on a change in pull request #11834:
URL: https://github.com/apache/flink/pull/11834#discussion_r412150195



##
File path: docs/tutorials/datastream_api.md
##
@@ -0,0 +1,252 @@
+ POJOs
+
+A POJO (plain old Java object) is any Java class that
+
+- has an empty default constructor
+- all fields are either
+  - public, or
+  - have a default getter and setter

Review comment:
   sure, done









[GitHub] [flink] alpinegizmo commented on a change in pull request #11834: [FLINK-17237][docs] Add Intro to DataStream API tutorial

2020-04-21 Thread GitBox


alpinegizmo commented on a change in pull request #11834:
URL: https://github.com/apache/flink/pull/11834#discussion_r412125480



##
File path: docs/tutorials/datastream_api.md
##
@@ -0,0 +1,252 @@
+ Tuples
+
+For Java, Flink defines its own Tuple1 thru Tuple25 types.
+
+{% highlight java %}
+Tuple2 person = new Tuple2<>("Fred", 35);

Review comment:
   fixed









[GitHub] [flink] alpinegizmo commented on a change in pull request #11834: [FLINK-17237][docs] Add Intro to DataStream API tutorial

2020-04-21 Thread GitBox


alpinegizmo commented on a change in pull request #11834:
URL: https://github.com/apache/flink/pull/11834#discussion_r412124573



##
File path: docs/tutorials/datastream_api.md
##
@@ -0,0 +1,252 @@
+ Tuples
+
+For Java, Flink defines its own Tuple1 thru Tuple25 types.
Review comment:
   True, but Tuple0 is weird. What's it for?




