This is an automated email from the ASF dual-hosted git repository. gyfora pushed a commit to branch main in repository https://gitbox.apache.org/repos/asf/flink-kubernetes-operator.git
The following commit(s) were added to refs/heads/main by this push: new 3233206 [FLINK-26510] Add FlinkDeployment CR overview + management docs 3233206 is described below commit 32332066e80d04912989f71d6b1313a001f6b3f3 Author: Gyula Fora <g_f...@apple.com> AuthorDate: Tue Mar 15 16:19:42 2022 +0100 [FLINK-26510] Add FlinkDeployment CR overview + management docs --- .../content/docs/custom-resource/job-management.md | 130 ++++++++++++++++++++- docs/content/docs/custom-resource/overview.md | 128 +++++++++++++++++++- docs/content/docs/custom-resource/reference.md | 13 +++ .../partials/docs/inject/content-before.html | 5 +- docs/template/crd-ref.template | 14 ++- .../kubernetes/operator/crd/FlinkDeployment.java | 2 +- 6 files changed, 285 insertions(+), 7 deletions(-) diff --git a/docs/content/docs/custom-resource/job-management.md b/docs/content/docs/custom-resource/job-management.md index 0d99555..a0120e0 100644 --- a/docs/content/docs/custom-resource/job-management.md +++ b/docs/content/docs/custom-resource/job-management.md @@ -24,4 +24,132 @@ specific language governing permissions and limitations under the License. --> -# Job Management +# Job Lifecycle Management + +The core responsibility of the Flink operator is to manage the full production lifecycle of Flink jobs. + +What is covered: + - Running, suspending and deleting applications + - Stateful and stateless application upgrades + - Triggering savepoints + +The behaviour is always controlled by the respective configuration fields of the `JobSpec` object as introduced in the [FlinkDeployment overview]({{< ref "docs/custom-resource/overview" >}}). + +Let's take a look at these operations in detail. + +## Running, suspending and deleting applications + +By controlling the `state` field of the `JobSpec` users can define the desired state of the application. + +Supported application states: + - `running` : The job is expected to be running and processing data. 
+ - `suspended` : Data processing should be temporarily suspended, with the intention of continuing later. + +**Job State transitions** + +There are 4 possible state change scenarios when updating the current FlinkDeployment. + + - `running` -> `running` : Job upgrade operation. In practice, a suspend followed by a restore operation. + - `running` -> `suspended` : Suspend operation. Stops the job while maintaining state information for stateful applications. + - `suspended` -> `running` : Restore operation. Starts the application from the current state using the latest spec. + - `suspended` -> `suspended` : Update spec, job is not started. + +The way state is handled for suspend and restore operations is described in detail in the next section. + +**Cancelling/Deleting applications** + +As you can see, there is no cancelled or deleted state among the possible desired states. When users no longer wish to process data with a given FlinkDeployment, they can simply delete the deployment object using the Kubernetes API: + +``` +kubectl delete flinkdeployment my-deployment +``` + +{{< hint danger >}} +Deleting a deployment will remove all checkpoint and status information. Future deployments will start from an empty state unless manually overridden by the user. +{{< /hint >}} + +## Stateful and stateless application upgrades + +When the spec changes for a FlinkDeployment, the running Application or Session cluster must be upgraded. +In order to do this, the operator will stop the currently running job (unless already suspended) and redeploy it using the latest spec and state carried over from the previous run for stateful applications. + +Users have full control over how state should be managed when stopping and restoring stateful applications using the `upgradeMode` setting of the JobSpec.
+ +Supported values: `stateless`, `savepoint`, `last-state` + +The `upgradeMode` setting controls both the stop and restore mechanisms as detailed in the following table: + +| | Stateless | Last State | Savepoint | +| ---- | ---------- | ---- | ---- | +| Config Requirement | None | Checkpointing & Kubernetes HA Enabled | Savepoint directory defined | +| Job Status Requirement | None* | None* | Job Running | +| Suspend Mechanism | Cancel / Delete | Delete Flink deployment (keep HA metadata) | Cancel with savepoint | +| Restore Mechanism | Deploy from empty state | Recover last state using HA metadata | Restore from savepoint | + +*\*In general, no update can be executed while a savepoint operation is in progress* + +The three different upgrade modes are intended to support different use cases: + - stateless: Stateless applications, prototyping + - last-state: Suitable for most stateful production applications. Allows quick upgrades in any application state (even for failing jobs) and does not require a healthy job. Requires Flink Kubernetes HA configuration (see example below). + - savepoint: Suitable for forking and migrating applications. Requires a healthy running job, as a savepoint operation is performed before shutdown.
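Once an `upgradeMode` is set, a suspend or resume can be requested by patching the desired job state in the spec (a sketch; the deployment name `my-deployment` is illustrative and a cluster with the operator installed is assumed):

```shell
# Suspend the job; with savepoint/last-state upgrade mode the state is retained
kubectl patch flinkdeployment my-deployment --type merge \
  -p '{"spec": {"job": {"state": "suspended"}}}'

# Resume the job later from the retained state
kubectl patch flinkdeployment my-deployment --type merge \
  -p '{"spec": {"job": {"state": "running"}}}'
```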
+ +Full example using the `last-state` strategy: + +```yaml +apiVersion: flink.apache.org/v1alpha1 +kind: FlinkDeployment +metadata: + namespace: default + name: basic-checkpoint-ha-example +spec: + image: flink:1.14.3 + flinkConfiguration: + taskmanager.numberOfTaskSlots: "2" + state.savepoints.dir: file:///flink-data/savepoints + high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory + high-availability.storageDir: file:///flink-data/ha + jobManager: + replicas: 1 + resource: + memory: "2048m" + cpu: 1 + taskManager: + resource: + memory: "2048m" + cpu: 1 + podTemplate: + spec: + serviceAccount: flink + containers: + - name: flink-main-container + volumeMounts: + - mountPath: /flink-data + name: flink-volume + volumes: + - name: flink-volume + hostPath: + # directory location on host + path: /tmp/flink + # this field is optional + type: Directory + job: + jarURI: local:///opt/flink/examples/streaming/StateMachineExample.jar + parallelism: 2 + upgradeMode: last-state + state: running +``` + +## Savepoint management + +Savepoints can be triggered manually by assigning a new, random (nonce) value to the `savepointTriggerNonce` field of the job specification: + +```yaml + job: + jarURI: local:///opt/flink/examples/streaming/StateMachineExample.jar + parallelism: 2 + upgradeMode: savepoint + state: running + savepointTriggerNonce: 123 +``` + +Changing the nonce value will trigger a new savepoint. Information about the pending and most recent savepoint is stored in the FlinkDeployment status. diff --git a/docs/content/docs/custom-resource/overview.md index 567a729..2cf25b3 100644 --- a/docs/content/docs/custom-resource/overview.md +++ b/docs/content/docs/custom-resource/overview.md @@ -24,4 +24,130 @@ specific language governing permissions and limitations under the License.
--> -# Architecture +# FlinkDeployment Overview + +The core user-facing API of the Flink Kubernetes Operator is the FlinkDeployment Custom Resource (CR). + +Custom Resources are extensions of the Kubernetes API and define new object types. In our case, the FlinkDeployment CR defines Flink Application and Session cluster deployments. + +Once the Flink Kubernetes Operator is installed and running in your Kubernetes environment, it will continuously watch FlinkDeployment objects submitted by the user to detect new deployments and changes to existing ones. In case you haven't deployed the operator yet, please check out the [quickstart]({{< ref "docs/try-flink-kubernetes-operator/quick-start" >}}) for detailed instructions on how to get started. + +FlinkDeployment objects are defined in YAML format by the user and must contain the following required fields: + +``` +apiVersion: flink.apache.org/v1alpha1 +kind: FlinkDeployment +metadata: + namespace: namespace-of-my-deployment + name: my-deployment +spec: + # Deployment specs of your Flink Session/Application +``` + +The `apiVersion` and `kind` fields have fixed values, while `metadata` and `spec` control the actual Flink deployment. + +The Flink operator will subsequently add status information to your FlinkDeployment object based on the observed deployment state: + +``` +kubectl get flinkdeployment my-deployment -o yaml +``` + +```yaml +apiVersion: flink.apache.org/v1alpha1 +kind: FlinkDeployment +metadata: + ... +spec: + ... +status: + jobManagerDeploymentStatus: READY + jobStatus: + jobId: 93dfe8199a35d5503f4048a1a999c704 + jobName: State machine job + savepointInfo: {} + state: RUNNING + updateTime: "1647351134601" + reconciliationStatus: + lastReconciledSpec: + ... + success: true +``` + +Users can use the status of the FlinkDeployment to gauge the health of their deployments and any executed operation.
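For scripting and monitoring, individual status fields can be extracted with a JSONPath query (a sketch; the field paths follow the status example above, and a deployment named `my-deployment` on a live cluster is assumed):

```shell
# Print the observed job manager deployment status (e.g. READY)
kubectl get flinkdeployment my-deployment \
  -o jsonpath='{.status.jobManagerDeploymentStatus}'

# Print the observed job state (e.g. RUNNING)
kubectl get flinkdeployment my-deployment \
  -o jsonpath='{.status.jobStatus.state}'
```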
+ +## FlinkDeployment spec overview + +The `spec` is the most important part of the `FlinkDeployment` as it describes the desired Flink Application or Session cluster. +The spec contains all the information the operator needs to deploy and manage your Flink deployments, including Docker images, configurations, desired state, etc. + +Most deployments will define at least the following fields: + - `image` : Docker image used to run Flink job and task manager processes + - `serviceAccount` : Kubernetes service account used by the Flink pods + - `taskManager, jobManager` : Job and Task manager pod resource specs (cpu, memory, etc.) + - `flinkConfiguration` : Map of Flink configuration overrides such as HA and checkpointing configs + - `job` : Job Spec for Application deployments + +The Flink Kubernetes Operator supports two main types of deployments: **Application** and **Session**. + +Application deployments manage a single job deployment in Application mode, while Session deployments manage Flink Session clusters without providing any job management for them. The type of cluster created depends on the `spec` provided by the user, as we will see in the next sections. + +### Application Deployments + +To create an Application deployment, users must define the `job` (JobSpec) field in their deployment spec.
+ +Required fields: + - `jarURI` : URI of the job jar + - `parallelism` : Parallelism of the job + - `upgradeMode` : Upgrade mode of the job (stateless/savepoint/last-state) + - `state` : Desired state of the job (running/suspended) + +Minimal example: + +```yaml +apiVersion: flink.apache.org/v1alpha1 +kind: FlinkDeployment +metadata: + namespace: default + name: basic-example +spec: + image: flink:1.14.3 + flinkConfiguration: + taskmanager.numberOfTaskSlots: "2" + serviceAccount: flink + jobManager: + replicas: 1 + resource: + memory: "2048m" + cpu: 1 + taskManager: + resource: + memory: "2048m" + cpu: 1 + job: + jarURI: local:///opt/flink/examples/streaming/StateMachineExample.jar + parallelism: 2 + upgradeMode: stateless + state: running +``` + +Once created, FlinkDeployment YAMLs can be submitted through kubectl: + +```bash +kubectl apply -f your-deployment.yaml +``` + +### Session Cluster Deployments + +Session clusters use a similar spec to application clusters, with the only difference being that `job` is not defined. + +For Session clusters, the operator only provides very basic management and monitoring, covering: + - Start Session cluster + - Monitor overall cluster health + - Stop / Delete Session cluster + +## Further information + + - [Job Management and Stateful upgrades]({{< ref "docs/custom-resource/job-management" >}}) + - [Deployment customization and pod templates]({{< ref "docs/custom-resource/pod-template" >}}) + - [Full Reference]({{< ref "docs/custom-resource/reference" >}}) + - [Examples](https://github.com/apache/flink-kubernetes-operator/tree/main/examples) diff --git a/docs/content/docs/custom-resource/reference.md index d04c79e..614b959 100644 --- a/docs/content/docs/custom-resource/reference.md +++ b/docs/content/docs/custom-resource/reference.md @@ -23,6 +23,19 @@ under the License.
--> # FlinkDeployment Reference + +This page serves as a full reference for the FlinkDeployment custom resource definition, including all possible configuration parameters. + +## FlinkDeployment +**Class**: org.apache.flink.kubernetes.operator.crd.FlinkDeployment + +**Description**: Custom resource that represents both Application and Session deployments. + +| Parameter | Type | Docs | +| ----------| ---- | ---- | +| spec | org.apache.flink.kubernetes.operator.crd.spec.FlinkDeploymentSpec | Spec that describes a Flink application or session cluster deployment. | +| status | org.apache.flink.kubernetes.operator.crd.status.FlinkDeploymentStatus | Last observed status of the Flink deployment. | + ## Spec ### FlinkDeploymentSpec diff --git a/docs/layouts/partials/docs/inject/content-before.html index 0be58be..7304b9c 100644 --- a/docs/layouts/partials/docs/inject/content-before.html +++ b/docs/layouts/partials/docs/inject/content-before.html @@ -23,15 +23,14 @@ under the License. {{ if $.Site.Params.ShowOutDatedWarning }} <article class="markdown"> <blockquote style="border-color:#f66"> - {{ markdownify "This documentation is for an out-of-date version of Apache Flink Kubernetes Operator. We recommend you use the latest [stable version](https://ci.apache.org/projects/flink/flink-ml-docs-stable/)."}} + {{ markdownify "This documentation is for an out-of-date version of Apache Flink Kubernetes Operator. We recommend you use the latest [stable version](https://ci.apache.org/projects/flink/flink-kubernetes-operator-docs-stable/)."}} </blockquote> </article> {{ end }} {{ if (not $.Site.Params.IsStable) }} <article class="markdown"> <blockquote style="border-color:#f66"> - {{ markdownify "This documentation is for an unreleased version of the Apache Flink Kubernetes Operator.
We recommend you use the latest [stable version](https://ci.apache.org/projects/flink/flink-ml-docs-stable/)."}} + {{ markdownify "This documentation is for an unreleased version of the Apache Flink Kubernetes Operator. We recommend you use the latest [stable version](https://ci.apache.org/projects/flink/flink-kubernetes-operator-docs-stable/)."}} </blockquote> </article> {{ end }} - diff --git a/docs/template/crd-ref.template index c7aa433..835fd90 100644 --- a/docs/template/crd-ref.template +++ b/docs/template/crd-ref.template @@ -22,4 +22,16 @@ specific language governing permissions and limitations under the License. --> -# FlinkDeployment Reference \ No newline at end of file +# FlinkDeployment Reference + +This page serves as a full reference for the FlinkDeployment custom resource definition, including all possible configuration parameters. + +## FlinkDeployment +**Class**: org.apache.flink.kubernetes.operator.crd.FlinkDeployment + +**Description**: Custom resource that represents both Application and Session deployments. + +| Parameter | Type | Docs | +| ----------| ---- | ---- | +| spec | org.apache.flink.kubernetes.operator.crd.spec.FlinkDeploymentSpec | Spec that describes a Flink application or session cluster deployment. | +| status | org.apache.flink.kubernetes.operator.crd.status.FlinkDeploymentStatus | Last observed status of the Flink deployment.
| diff --git a/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/crd/FlinkDeployment.java b/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/crd/FlinkDeployment.java index cc5f3fb..b0a43e6 100644 --- a/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/crd/FlinkDeployment.java +++ b/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/crd/FlinkDeployment.java @@ -29,7 +29,7 @@ import io.fabric8.kubernetes.model.annotation.Group; import io.fabric8.kubernetes.model.annotation.ShortNames; import io.fabric8.kubernetes.model.annotation.Version; -/** Flink deployment object (spec + status). */ +/** Custom resource definition that represents both Application and Session deployments. */ @Experimental @JsonInclude(JsonInclude.Include.NON_NULL) @JsonDeserialize()