fhueske commented on a change in pull request #8607: [FLINK-12652] [documentation] add first version of a glossary
URL: https://github.com/apache/flink/pull/8607#discussion_r291671831
 
 

 ##########
 File path: docs/concepts/glossary.md
 ##########
 @@ -0,0 +1,166 @@
+---
+title: Glossary
+nav-pos: 3
+nav-title: Glossary
+nav-parent_id: concepts
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+#### Flink Application Cluster
+
+A Flink Application Cluster is a dedicated [Flink Cluster](./glossary#flink-cluster) that only
+executes a single [Flink Job](./glossary#flink-job). The lifetime of the
+[Flink Cluster](./glossary#flink-cluster) is bound to the lifetime of the Flink Job. Formerly,
+Flink Application Clusters were also known as Flink Clusters in *job mode*. Compare to
+[Flink Session Cluster](./glossary#flink-session-cluster).
+
+#### Flink Cluster
+
+The distributed system consisting of (typically) one Flink Master process and one or more Flink
+TaskManager processes.
+
+#### Event
+
+An event is a statement about a change of the state of the domain modelled by the
+application. Events can be input and/or output of a stream or batch processing application.
+Events are special types of [records](./glossary#record).
+
+#### ExecutionGraph
+
+see [Physical Graph](./glossary#physical-graph)
+
+#### Function
+
+Functions, or user-defined functions (UDFs), are implemented by the user and encapsulate the
+application logic of a Flink program. Most Functions are wrapped by a corresponding
+[Operator](./glossary#operator).
+
+#### Instance
+
+The term *instance* is used to describe a specific instance of a specific type (usually
+[Operator](./glossary#operator) or [Function](./glossary#function)) during runtime. As Apache Flink
+is mostly written in Java, this corresponds to the definition of *Instance* or *Object* in Java.
+In the context of Apache Flink, the term *parallel instance* is also frequently used to emphasize
+that multiple instances of the same [Operator](./glossary#operator) or
+[Function](./glossary#function) type are running in parallel.
+
+#### Flink Job
+
+A Flink Job is the runtime representation of a Flink program. A Flink Job can either be submitted
+to a long-running [Flink Session Cluster](./glossary#flink-session-cluster) or it can be started as a
+self-contained [Flink Application Cluster](./glossary#flink-application-cluster).
+
+#### JobGraph
+
+see [Logical Graph](./glossary#logical-graph)
+
+#### Flink JobManager
+
+A JobManager is one of the components running in the
+[Flink JobManager Process](./glossary#flink-jobmanager-process). A JobManager is responsible for
+supervising the execution of the [Tasks](./glossary#task) of a single job.
+
+#### Logical Graph
+
+A logical graph is a directed graph describing the high-level logic of a stream processing program.
+The nodes are [Operators](./glossary#operator) and the edges indicate input/output-relationships or
+data streams or data sets.
+
+#### Managed State
+
+Managed State describes application state which has been registered with the framework. For
+Managed State, Apache Flink takes care of persistence and rescaling, among other things.
+
+#### Flink JobManager Process
+
+The JobManager Process is the master of a [Flink Cluster](./glossary#flink-cluster). It is called
+*JobManager* for historical reasons, but it actually contains three distinct components:
+Flink Resource Manager, Flink Dispatcher and one [Flink JobManager](./glossary#flink-jobmanager)
+per running [Flink Job](./glossary#flink-job).
+
+#### Operator
+
+Node of a [Logical Graph](./glossary#logical-graph). An Operator performs a certain operation,
+which is usually executed by a [Function](./glossary#function). Sources and Sinks are special
+Operators for data ingestion and data egress.
+
+#### Operator Chain
+
+An Operator Chain consists of one or more consecutive [Operators](./glossary#operator) without any
+repartitioning in between. Operators within the same Operator Chain forward records to each other
+directly without going through serialization or Flink's network stack.
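
As a rough illustration of why chaining avoids serialization, the Java sketch below models two chained operators as plain functions fused with `java.util.function.Function.andThen`, so a record passes from the first to the second by a direct method call. This is a hypothetical model, not Flink's runtime code; the class `OperatorChainSketch` and its `chain` method are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical model: operators in one chain hand records over by direct method calls. */
public class OperatorChainSketch {

    /** Fuses two consecutive operators into one function; nothing is serialized in between. */
    static <A, B, C> Function<A, C> chain(Function<A, B> first, Function<B, C> second) {
        return first.andThen(second);
    }

    public static void main(String[] args) {
        Function<String, String> trim = String::trim;      // first "operator"
        Function<String, Integer> length = String::length; // second "operator"
        Function<String, Integer> chained = chain(trim, length);

        List<Integer> out = new ArrayList<>();
        for (String record : List.of("  flink ", "glossary")) {
            // each record flows through both operators within a single call
            out.add(chained.apply(record));
        }
        System.out.println(out); // prints [5, 8]
    }
}
```

Without chaining, the output of `trim` would instead have to be serialized and shipped to a separate Task before `length` could consume it.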
+
+#### Partition
+
+A partition is an independent subset of the overall data stream or data set. A data stream or
+data set is divided into partitions by assigning each [record](./glossary#record) to one or more
+partitions. Partitions of data streams or data sets are consumed by [Tasks](./glossary#task) during
+runtime. A transformation which changes the way a data stream or data set is partitioned is often
+called repartitioning.
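
A minimal sketch of hash partitioning, covering the common case where each record goes to exactly one partition: assuming string records that serve as their own keys and a fixed number of partitions, each record is assigned by hashing its key, so equal keys always land in the same partition. `PartitionSketch` and its methods are hypothetical; Flink's own key-based partitioning is more involved (it first hashes keys into key groups).

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: hash-partitions records (used as their own keys) into n partitions. */
public class PartitionSketch {

    /** Returns the partition index for a key; the same key always maps to the same partition. */
    static int partitionOf(String key, int numPartitions) {
        // floorMod keeps the index non-negative even when hashCode() is negative
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    /** Assigns every record to exactly one of numPartitions partitions. */
    static List<List<String>> partition(List<String> records, int numPartitions) {
        List<List<String>> partitions = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
        for (String record : records) {
            partitions.get(partitionOf(record, numPartitions)).add(record);
        }
        return partitions;
    }

    public static void main(String[] args) {
        List<String> records = List.of("alice", "bob", "alice", "carol");
        List<List<String>> partitions = partition(records, 2);
        for (int i = 0; i < partitions.size(); i++) {
            // both "alice" records always end up in the same partition
            System.out.println("partition " + i + ": " + partitions.get(i));
        }
    }
}
```

Changing `numPartitions` changes where records land, which is why switching to a different partitioning scheme is treated as repartitioning.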
+
+#### Physical Graph
+
+A physical graph is the result of translating a [Logical Graph](./glossary#logical-graph) for
+execution in a distributed runtime. The nodes are [Tasks](./glossary#task) and the edges indicate
+input/output-relationships or [partitions](./glossary#partition) of data streams or data sets.
+
+#### Record
+
+Records are the constituent elements of a data set or data stream.
+[Operators](./glossary#operator) and [Functions](./glossary#function) receive records as input
+and emit records as output.
+
+#### Flink Session Cluster
+
+A long-running [Flink Cluster](./glossary#flink-cluster) which accepts multiple
+[Flink Jobs](./glossary#flink-job) for execution. The lifetime of this Flink Cluster is not bound
+to the lifetime of any Flink Job. Formerly, a Flink Session Cluster was also known as a Flink Cluster in
+*session mode*. Compare to [Flink Application Cluster](./glossary#flink-application-cluster).
+
+#### State Backend
+
+For stream processing programs, the State Backend determines how state is stored on each TaskManager
+(Java Heap of TaskManager or (embedded) RocksDB) as well as where it is written upon a checkpoint
+(Java Heap of Flink Master or Filesystem).
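
For illustration, a State Backend can be selected cluster-wide in `flink-conf.yaml`. The fragment assumes the Flink 1.x configuration keys `state.backend` (which accepts `jobmanager`, `filesystem` or `rocksdb`) and `state.checkpoints.dir`; the checkpoint path is a placeholder.

```yaml
# keep working state in embedded RocksDB on each TaskManager
state.backend: rocksdb
# write checkpoints to a durable filesystem (placeholder path)
state.checkpoints.dir: hdfs:///flink/checkpoints
```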
+
+#### Sub-Task
+
+A Sub-Task is an instance of [Task](./glossary#task) responsible for processing a
+[partition](./glossary#partition) of the data stream. The term "Sub-Task" emphasizes that there are
+multiple parallel Tasks for the same [Operator(s)](./glossary#operator).
+
+#### Task
+
+Node of a [Physical Graph](./glossary#physical-graph). A task is the basic unit of work, which is
+executed by Flink's runtime. Tasks encapsulate exactly one parallel
+[Operator Chain](./glossary#operator-chain).
 
 Review comment:
   "one parallel Operator Chain" -> "one parallel instance of an Operator or Operator Chain"?
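
To make the suggested wording concrete, here is a hypothetical Java model of how nodes of a Logical Graph, after chaining, expand into one Task per parallel instance. The `ChainedNode` class, the `expand` method and the `name (i/p)` labels are illustrative assumptions, not Flink API.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model: expanding chained logical nodes into parallel Tasks (Sub-Tasks). */
public class PhysicalGraphSketch {

    /** A single Operator or an Operator Chain, together with its parallelism (hypothetical type). */
    static final class ChainedNode {
        final String name;
        final int parallelism;

        ChainedNode(String name, int parallelism) {
            this.name = name;
            this.parallelism = parallelism;
        }
    }

    /** Creates one Task label per parallel instance of each node, e.g. "Sink (1/1)". */
    static List<String> expand(List<ChainedNode> nodes) {
        List<String> tasks = new ArrayList<>();
        for (ChainedNode node : nodes) {
            for (int i = 1; i <= node.parallelism; i++) {
                tasks.add(node.name + " (" + i + "/" + node.parallelism + ")");
            }
        }
        return tasks;
    }

    public static void main(String[] args) {
        List<ChainedNode> nodes = List.of(
                new ChainedNode("Source -> Map", 2), // an Operator Chain with parallelism 2
                new ChainedNode("Sink", 1));         // a single Operator with parallelism 1
        // prints "Source -> Map (1/2)", "Source -> Map (2/2)" and "Sink (1/1)", one per line
        expand(nodes).forEach(System.out::println);
    }
}
```

The chained `Source -> Map` node yields two Tasks and the unchained `Sink` yields one, which matches the reading that each Task wraps one parallel instance of an Operator or an Operator Chain.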

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services
