alpinegizmo commented on a change in pull request #11826:
URL: https://github.com/apache/flink/pull/11826#discussion_r411667459



##########
File path: docs/tutorials/index.md
##########
@@ -0,0 +1,186 @@
+---
+title: Hands-on Tutorials
+nav-id: tutorials
+nav-pos: 2
+nav-title: '<i class="fa fa-hand-paper-o title appetizer" 
aria-hidden="true"></i> Hands-on Tutorials'
+nav-parent_id: root
+nav-show_overview: true
+always-expand: true
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+* This will be replaced by the TOC
+{:toc}
+
+## Goals and Scope of these Tutorials
+
+These tutorials present an introduction to Apache Flink that includes just 
enough to get you started
+writing scalable streaming ETL, analytics, and event-driven applications, 
while leaving out a lot of
+(ultimately important) details. The focus is on providing straightforward 
introductions to Flink's
+APIs for managing state and time, with the expectation that having mastered 
these fundamentals,
+you'll be much better equipped to pick up the rest of what you need to know 
from the more detailed
+reference documentation. The links at the end of each page will lead you to 
where you can learn
+more.
+
+Specifically, you will learn:
+
+- how to implement streaming data processing pipelines
+- how and why Flink manages state
+- how to use event time to consistently compute accurate analytics
+- how to build event-driven applications on continuous streams
+- how Flink provides fault-tolerant, stateful stream processing with 
exactly-once semantics
+
+These tutorials focus on four critical concepts: continuous processing of 
streaming data, event
+time, stateful stream processing, and state snapshots. This page introduces 
these concepts.
+
+{% info Note %} Accompanying these tutorials is a set of hands-on exercises 
that will guide you
+through working with the concepts presented.
+
+{% top %}
+
+## Stream Processing
+
+Streams are data's natural habitat. Whether it's events from web servers, 
trades from a stock
+exchange, or sensor readings from a machine on a factory floor, data is 
created as part of a stream.
+But when you analyze data, you can organize your processing around either 
_bounded_ or _unbounded_
+streams, and the paradigm you choose has profound consequences.
+
+<img src="{{ site.baseurl }}/fig/bounded-unbounded.png" alt="Bounded and 
unbounded streams" class="offset" width="90%" />
+
+**Batch processing** is the paradigm at work when you process a bounded data 
stream. In this mode of
+operation you can choose to ingest the entire dataset before producing any 
results, which means that
+it's possible, for example, to sort the data, compute global statistics, or 
produce a final report
+that summarizes all of the input.
+
+**Stream processing**, on the other hand, involves unbounded data streams. 
Conceptually, at least,
+the input may never end, and so you are forced to continuously process the 
data as it arrives. 
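
The contrast above can be sketched in plain Python (this is not Flink's API; all names here are illustrative). A batch job can wait for the entire bounded dataset before producing one final answer, while a streaming job must emit results incrementally, because the unbounded input may never end:

```python
def batch_average(bounded_data):
    # Batch: the whole dataset is available up front, so a single,
    # global result can be produced after ingesting all of the input.
    return sum(bounded_data) / len(bounded_data)

def streaming_average(unbounded_data):
    # Streaming: the input may never end, so a running result is
    # emitted after every record rather than one final answer.
    total, count = 0, 0
    for record in unbounded_data:
        total += record
        count += 1
        yield total / count

print(batch_average([1, 2, 3, 4]))            # 2.5 -- one final answer
print(list(streaming_average([1, 2, 3, 4])))  # [1.0, 1.5, 2.0, 2.5]
```

Note that the streaming version never calls `len()` or looks at the whole input at once; that restriction is exactly what distinguishes the two paradigms.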
+
+In Flink, applications are composed of **streaming dataflows** that may be 
transformed by
+user-defined **operators**. These dataflows form directed graphs that start 
with one or more
+**sources**, and end in one or more **sinks**.
+
+<img src="{{ site.baseurl }}/fig/program_dataflow.svg" alt="A DataStream 
program, and its dataflow." class="offset" width="80%" />
+
+Often there is a one-to-one correspondence between the transformations in the 
program and the
+operators in the dataflow. Sometimes, however, a single transformation may 
consist of multiple operators.
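
As a rough illustration of the source → operator → sink shape described above, here is a conceptual sketch using plain Python generators (this is not Flink's DataStream API; every name is made up for illustration):

```python
def source():
    # Source: where the dataflow begins (in practice, e.g. a message queue).
    yield from ["flink", "streams", "state"]

def map_operator(stream, fn):
    # Operator: a user-defined transformation applied record by record.
    for record in stream:
        yield fn(record)

def sink(stream, out):
    # Sink: where the dataflow ends; here we simply collect the records.
    for record in stream:
        out.append(record)

results = []
sink(map_operator(source(), str.upper), results)
print(results)  # ['FLINK', 'STREAMS', 'STATE']
```

Chaining the generators forms a small directed graph with one source, one operator, and one sink; a real Flink dataflow has the same shape, but is distributed and runs continuously.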
+
+An application may consume real-time data from streaming sources such as 
message queues or
+distributed logs, for example Apache Kafka or Kinesis. But Flink can also 
consume bounded, historic data
+from a variety of data sources. Similarly, the streams of results being 
produced by a Flink
+application can be sent to a wide variety of systems, and the state held 
within Flink can be
+accessed via a REST API.

Review comment:
       Given the questionable nature of queryable state, I think I'll just drop 
this phrase.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

