[incubator-uniffle-website] 01/01: Add deploy doc (v2)

zhifgli Sat, 22 Oct 2022 08:14:04 -0700

This is an automated email from the ASF dual-hosted git repository.

zhifgli pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/incubator-uniffle-website.git


commit 56a53524ddaa883e8a47e0286df6cb960786d71b
Author: frankzfli <[email protected]>
AuthorDate: Sat Oct 22 21:50:15 2022 +0800

    Add deploy doc (v2)
---
 docs/01-intro.md                                   |   4 +
 docs/02-client-guide.md                            | 182 +++++++++++++++++++++
 docs/02-client_guide.md                            |   0
 docs/03-coordinator_guide.md                       |   0
 .../00-coordinator-guide.md                        | 111 +++++++++++++
 .../01-server-guide.md}                            |   0
 .../00-uniffle-operator-design.md                  | 137 ++++++++++++++++
 .../04-deploy-uniffle-cluster-on-k8s/01-install.md |  76 +++++++++
 .../02-examples.md                                 |  31 ++++
 docs/img/rss-crd-state-transition.png              | Bin 0 -> 71668 bytes
 10 files changed, 541 insertions(+)

diff --git a/docs/01-intro.md b/docs/01-intro.md
index f4a6da9..4773fc9 100644
--- a/docs/01-intro.md
+++ b/docs/01-intro.md
@@ -15,6 +15,10 @@
   ~ limitations under the License.
   -->
 
+---
+sidebar_position: 1
+---
+
 # What is Uniffle
 
 Uniffle is a Remote Shuffle Service, and provides the capability for Apache Spark applications
diff --git a/docs/02-client-guide.md b/docs/02-client-guide.md
new file mode 100644
index 0000000..dce7120
--- /dev/null
+++ b/docs/02-client-guide.md
@@ -0,0 +1,182 @@
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one or more
+  ~ contributor license agreements.  See the NOTICE file distributed with
+  ~ this work for additional information regarding copyright ownership.
+  ~ The ASF licenses this file to You under the Apache License, Version 2.0
+  ~ (the "License"); you may not use this file except in compliance with
+  ~ the License.  You may obtain a copy of the License at
+  ~
+  ~    http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing, software
+  ~ distributed under the License is distributed on an "AS IS" BASIS,
+  ~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  ~ See the License for the specific language governing permissions and
+  ~ limitations under the License.
+  -->
+
+# Uniffle Shuffle Client Guide
+
+Uniffle is designed as a unified shuffle engine for multiple computing frameworks, including Apache Spark and Apache Hadoop.
+Uniffle provides pluggable client plugins to enable remote shuffle in Spark and MapReduce.
+
+## Deploy
+This document describes how to deploy Uniffle client plugins with Spark and MapReduce.
+
+### Deploy Spark Client Plugin
+
+1. Add the client jar to the Spark classpath, e.g., SPARK_HOME/jars/
+
+   The jar for Spark2 is located in <RSS_HOME>/jars/client/spark2/rss-client-XXXXX-shaded.jar
+
+   The jar for Spark3 is located in <RSS_HOME>/jars/client/spark3/rss-client-XXXXX-shaded.jar
+
+2. Update the Spark conf to enable Uniffle, e.g.,
+
+   ```
+   spark.shuffle.manager org.apache.spark.shuffle.RssShuffleManager
+   spark.rss.coordinator.quorum <coordinatorIp1>:19999,<coordinatorIp2>:19999
+   # Note: For Spark2, spark.sql.adaptive.enabled should be false because Spark2 doesn't support AQE.
+   ```
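+
+   The same settings can also be passed per job on the command line. A minimal sketch, assuming a standard `spark-submit` launch; the application class `com.example.MyApp` and the jar `my-app.jar` are hypothetical placeholders:
+
+   ```
+   spark-submit \
+     --conf spark.shuffle.manager=org.apache.spark.shuffle.RssShuffleManager \
+     --conf spark.rss.coordinator.quorum=<coordinatorIp1>:19999,<coordinatorIp2>:19999 \
+     --class com.example.MyApp \
+     my-app.jar
+   ```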
+
+### Support Spark Dynamic Allocation
+
+To support Spark dynamic allocation with Uniffle, the Spark code must be patched.
+There are 2 patches for spark-2.4.6 and spark-3.1.2 in the spark-patches folder for reference.
+
+After applying the patch and rebuilding Spark, add the following configuration to the Spark conf to enable dynamic allocation:
+  ```
+  spark.shuffle.service.enabled false
+  spark.dynamicAllocation.enabled true
+  ```
+
+### Deploy MapReduce Client Plugin
+
+1. Add the client jar to the classpath of each NodeManager, e.g., <HADOOP_HOME>/share/hadoop/mapreduce/
+
+   The jar for MapReduce is located in <RSS_HOME>/jars/client/mr/rss-client-mr-XXXXX-shaded.jar
+
+2. Update the MapReduce conf to enable Uniffle, e.g.,
+
+   ```
+   -Dmapreduce.rss.coordinator.quorum=<coordinatorIp1>:19999,<coordinatorIp2>:19999
+   -Dyarn.app.mapreduce.am.command-opts=org.apache.hadoop.mapreduce.v2.app.RssMRAppMaster
+   -Dmapreduce.job.map.output.collector.class=org.apache.hadoop.mapred.RssMapOutputCollector
+   -Dmapreduce.job.reduce.shuffle.consumer.plugin.class=org.apache.hadoop.mapreduce.task.reduce.RssShuffle
+   ```
+Note that the RssMRAppMaster will automatically disable slow start (i.e., `mapreduce.job.reduce.slowstart.completedmaps=1`)
+and job recovery (i.e., `yarn.app.mapreduce.am.job.recovery.enable=false`).
+
+
+## Configuration
+
+The important client configurations are listed below.
+
+### Common Setting
+These configurations are shared by all types of clients.
+
+|Property Name|Default|Description|
+|---|---|---|
+|<client_type>.rss.coordinator.quorum|-|Coordinator quorum|
+|<client_type>.rss.writer.buffer.size|3m|Buffer size for single partition data|
+|<client_type>.rss.storage.type|-|Supports MEMORY_LOCALFILE, MEMORY_HDFS, MEMORY_LOCALFILE_HDFS|
+|<client_type>.rss.client.read.buffer.size|14m|The max data size read from storage|
+|<client_type>.rss.client.send.threadPool.size|5|The thread pool size for sending shuffle data to shuffle servers|
+|<client_type>.rss.client.assignment.tags|-|The comma-separated list of tags for deciding the assignment of shuffle servers. Note that SHUFFLE_SERVER_VERSION is always used as an assignment tag whether this conf is set or not|
+|<client_type>.rss.client.data.commit.pool.size|The number of assigned shuffle servers|The thread pool size for sending commits to shuffle servers|
+|<client_type>.rss.client.assignment.shuffle.nodes.max|-1|The number of required assignment shuffle servers. If it is less than or equal to 0, or greater than the coordinator's config "rss.coordinator.shuffle.nodes.max", the value of "rss.coordinator.shuffle.nodes.max" is used|
+
+Notice:
+
+1. `<client_type>` should be `spark` or `mapreduce`
+
+2. `<client_type>.rss.coordinator.quorum` is required; other configurations are optional when coordinator dynamic configuration is enabled.
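+
+As an illustration of how `<client_type>` expands, the same common settings could be written for each client as in the sketch below. The values are simply the defaults from the table above, and the `-D` form for MapReduce is assumed from the MapReduce deploy example earlier in this guide.
+
+```
+# Spark conf
+spark.rss.coordinator.quorum <coordinatorIp1>:19999,<coordinatorIp2>:19999
+spark.rss.writer.buffer.size 3m
+
+# MapReduce job options
+-Dmapreduce.rss.coordinator.quorum=<coordinatorIp1>:19999,<coordinatorIp2>:19999
+-Dmapreduce.rss.writer.buffer.size=3m
+```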
+
+### Adaptive Remote Shuffle Enabling
+
+To choose between built-in shuffle and remote shuffle automatically, Uniffle supports adaptive enabling.
+The client should use `DelegationRssShuffleManager` and provide its unique <access_id> so that the coordinator can decide whether remote shuffle should be enabled for it.
+
+```
+spark.shuffle.manager org.apache.spark.shuffle.DelegationRssShuffleManager
+spark.rss.access.id=<access_id> 
+```
+
+Notice:
+Currently, this feature only supports Spark.
+
+Other configuration:
+
+|Property Name|Default|Description|
+|---|---|---|
+|spark.rss.access.timeout.ms|10000|The timeout to access Uniffle coordinator|
+|spark.rss.client.access.retry.interval.ms|20000|The interval between access retries before falling back to SortShuffleManager|
+|spark.rss.client.access.retry.times|0|The number of access retries before falling back to SortShuffleManager|
+
+
+### Client Quorum Setting
+
+Uniffle supports a client-side quorum protocol to tolerate shuffle server crashes.
+This feature is client-side behaviour, in which the shuffle writer sends each block to multiple servers, and shuffle readers can fetch block data from any one of those servers.
+Since sending multiple replicas of blocks can reduce shuffle performance and increase resource consumption, we designed it as an optional feature.
+
+|Property Name|Default|Description|
+|---|---|---|
+|<client_type>.rss.data.replica|1|The max number of servers that each block can be sent to by the client in the quorum protocol|
+|<client_type>.rss.data.replica.write|1|The min number of servers that each block must be sent to successfully by the client|
+|<client_type>.rss.data.replica.read|1|The min number of servers from which metadata must be fetched successfully by the client|
+
+Notice:
+
+1. `spark.rss.data.replica.write` + `spark.rss.data.replica.read` > `spark.rss.data.replica`
+
+Recommended examples:
+
+1. Performance First (default)
+```
+spark.rss.data.replica 1
+spark.rss.data.replica.write 1
+spark.rss.data.replica.read 1
+```
+
+2. Fault-tolerant First
+```
+spark.rss.data.replica 3
+spark.rss.data.replica.write 2
+spark.rss.data.replica.read 2
+```
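+
+Both recommended settings satisfy the constraint in the notice above. As a sketch of why it matters, assume the usual quorum-overlap argument, with N, W and R standing for `spark.rss.data.replica`, `spark.rss.data.replica.write` and `spark.rss.data.replica.read` (these symbols are introduced here only for illustration). Any W servers that accepted a block and any R servers that a reader reaches are drawn from the same N servers, so they share at least W + R - N servers.
+
+```latex
+\begin{gathered}
+W + R > N \;\Longrightarrow\; W + R - N \ge 1 \\
+\text{Performance First: } N = 1,\ W = 1,\ R = 1:\quad 1 + 1 = 2 > 1,\quad 1 + 1 - 1 = 1 \\
+\text{Fault-tolerant First: } N = 3,\ W = 2,\ R = 2:\quad 2 + 2 = 4 > 3,\quad 2 + 2 - 3 = 1
+\end{gathered}
+```
+
+Under this reading, the fault-tolerant setting still lets a write succeed with one of the three servers unavailable, since only 2 successful sends are required.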
+
+### Spark Specialized Setting
+
+The important configurations are listed below.
+
+|Property Name|Default|Description|
+|---|---|---|
+|spark.rss.writer.buffer.spill.size|128m|Buffer size for total partition data|
+|spark.rss.client.send.size.limit|16m|The max data size sent to shuffle server|
+|spark.rss.client.unregister.thread.pool.size|10|The max size of the thread pool used for unregistering|
+|spark.rss.client.unregister.request.timeout.sec|10|The max timeout (in seconds) when unregistering from remote shuffle servers|
+
+
+### MapReduce Specialized Setting
+
+|Property Name|Default|Description|
+|---|---|---|
+|mapreduce.rss.client.max.buffer.size|3k|The max buffer size in map side|
+|mapreduce.rss.client.batch.trigger.num|50|The max batch of buffers to send data in map side|
+
+
+
+
+### Remote Spill (Experimental)
+
+In cloud environments, VMs may have very limited disk space and performance.
+This experimental feature allows reduce tasks to spill data to remote storage (e.g., HDFS).
+
+|Property Name|Default|Description|
+|---|---|---|
+|mapreduce.rss.reduce.remote.spill.enable|false|Whether to use remote spill|
+|mapreduce.rss.reduce.remote.spill.attempt.inc|1|Increase reduce attempts, as HDFS is more likely to fail than local disk|
+|mapreduce.rss.reduce.remote.spill.replication|1|The replication number to spill data to HDFS|
+|mapreduce.rss.reduce.remote.spill.retries|5|The retry number to spill data to HDFS|
+
+Notice: this feature requires the MEMORY_LOCALFILE_HDFS mode.
\ No newline at end of file
diff --git a/docs/02-client_guide.md b/docs/02-client_guide.md
deleted file mode 100644
index e69de29..0000000
diff --git a/docs/03-coordinator_guide.md b/docs/03-coordinator_guide.md
deleted file mode 100644
index e69de29..0000000
diff --git a/docs/03-deploy-uniffle-cluster/00-coordinator-guide.md b/docs/03-deploy-uniffle-cluster/00-coordinator-guide.md
new file mode 100644
index 0000000..0afb99c
--- /dev/null
+++ b/docs/03-deploy-uniffle-cluster/00-coordinator-guide.md
@@ -0,0 +1,111 @@
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one or more
+  ~ contributor license agreements.  See the NOTICE file distributed with
+  ~ this work for additional information regarding copyright ownership.
+  ~ The ASF licenses this file to You under the Apache License, Version 2.0
+  ~ (the "License"); you may not use this file except in compliance with
+  ~ the License.  You may obtain a copy of the License at
+  ~
+  ~    http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing, software
+  ~ distributed under the License is distributed on an "AS IS" BASIS,
+  ~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  ~ See the License for the specific language governing permissions and
+  ~ limitations under the License.
+  -->
+
+# Uniffle Coordinator Guide
+
+Uniffle is a unified remote shuffle service for compute engines. The coordinator is responsible for
+collecting the status of shuffle servers and assigning shuffle servers to jobs.
+
+## Deploy
+This document describes how to deploy Uniffle coordinators.
+
+### Steps
+1. Unzip the package to RSS_HOME
+2. Update RSS_HOME/bin/rss-env.sh, e.g.,
+   ```
+     JAVA_HOME=<java_home>
+     HADOOP_HOME=<hadoop home>
+     XMX_SIZE="16g"
+   ```
+3. Update RSS_HOME/conf/coordinator.conf, e.g.,
+   ```
+     rss.rpc.server.port 19999
+     rss.jetty.http.port 19998
+     rss.coordinator.server.heartbeat.timeout 30000
+     rss.coordinator.app.expired 60000
+     rss.coordinator.shuffle.nodes.max 5
+     # enable dynamicClientConf, and coordinator will be responsible for most of client conf
+     rss.coordinator.dynamicClientConf.enabled true
+     # config the path of client conf
+     rss.coordinator.dynamicClientConf.path <RSS_HOME>/conf/dynamic_client.conf
+     # config the path of excluded shuffle server
+     rss.coordinator.exclude.nodes.file.path <RSS_HOME>/conf/exclude_nodes
+   ```
+4. Update <RSS_HOME>/conf/dynamic_client.conf; the rss client will get its default conf from the coordinator, e.g.,
+   ```
+    # MEMORY_LOCALFILE_HDFS is recommended for production environments
+    rss.storage.type MEMORY_LOCALFILE_HDFS
+    # multiple remote storages are supported, and client will get assignment from coordinator
+    rss.coordinator.remote.storage.path hdfs://cluster1/path,hdfs://cluster2/path
+    rss.writer.require.memory.retryMax 1200
+    rss.client.retry.max 100
+    rss.writer.send.check.timeout 600000
+    rss.client.read.buffer.size 14m
+   ```
+
+5. Update <RSS_HOME>/conf/exclude_nodes; the coordinator will update the excluded nodes from this file, e.g.,
+   ```
+    # shuffleServer's ip and port, connected with "-"
+    110.23.15.36-19999
+    110.23.15.35-19996
+   ```
+
+6. start Coordinator
+   ```
+    bash RSS_HOME/bin/start-coordinator.sh
+   ```
+
+## Configuration
+
+### Common settings
+|Property Name|Default|Description|
+|---|---|---|
+|rss.coordinator.server.heartbeat.timeout|30000|Timeout if no heartbeat is received from a shuffle server|
+|rss.coordinator.server.periodic.output.interval.times|30|The periodic interval times of output alive nodes. The interval can be calculated as (rss.coordinator.server.heartbeat.timeout/3 * rss.coordinator.server.periodic.output.interval.times). The default output interval is 5min.|
+|rss.coordinator.assignment.strategy|PARTITION_BALANCE|Strategy for assigning shuffle servers; PARTITION_BALANCE should be used for workload balance|
+|rss.coordinator.app.expired|60000|Application expired time (ms); the heartbeat interval should be less than it|
+|rss.coordinator.shuffle.nodes.max|9|The max number of shuffle servers used when doing the assignment|
+|rss.coordinator.dynamicClientConf.path|-|The path of the configuration file which holds the default conf for the rss client|
+|rss.coordinator.exclude.nodes.file.path|-|The path of the configuration file which lists the excluded nodes|
+|rss.coordinator.exclude.nodes.check.interval.ms|60000|Update interval (ms) for exclude nodes|
+|rss.coordinator.access.checkers|org.apache.uniffle.coordinator.AccessClusterLoadChecker|The access checkers used when the Spark client uses the DelegationRssShuffleManager, which decides whether to use rss according to the result of the specified access checkers|
+|rss.coordinator.access.loadChecker.memory.percentage|15.0|The minimal percentage of available memory of a server|
+|rss.coordinator.dynamicClientConf.enabled|false|Whether to enable dynamic client conf, which will be fetched by the Spark client|
+|rss.coordinator.dynamicClientConf.path|-|The dynamic client conf of this cluster; it can be stored in HDFS or locally|
+|rss.coordinator.dynamicClientConf.updateIntervalSec|120|The dynamic client conf update interval in seconds|
+|rss.coordinator.remote.storage.cluster.conf|-|Remote Storage Cluster related conf with format $clusterId,$key=$value, separated by ';'|
+|rss.rpc.server.port|-|RPC port for coordinator|
+|rss.jetty.http.port|-|Http port for coordinator|
+|rss.coordinator.remote.storage.select.strategy|APP_BALANCE|Strategy for selecting the remote path|
+|rss.coordinator.remote.storage.io.sample.schedule.time|60000|The time of scheduling the read and write time of the paths to obtain different HDFS|
+|rss.coordinator.remote.storage.io.sample.file.size|204800000|The size of the file that the scheduled thread reads and writes|
+|rss.coordinator.remote.storage.io.sample.access.times|3|The number of times to read and write HDFS files|
+|rss.coordinator.startup-silent-period.enabled|false|Enable the startup-silent-period to reject assignment requests in order to avoid partial assignments. To avoid service interruption, this mechanism is disabled by default. It is especially recommended in coordinator HA mode when restarting a single coordinator.|
+|rss.coordinator.startup-silent-period.duration|20000|The waiting duration (ms) when rss.coordinator.startup-silent-period.enabled is enabled.|
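+
+As a worked check of the interval formula in the table above, using the default values (and assuming, as the stated 5min default implies, that the heartbeat timeout is given in milliseconds):
+
+```latex
+\frac{30000\ \text{ms}}{3} \times 30 = 300000\ \text{ms} = 300\ \text{s} = 5\ \text{min}
+```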
+
+### AccessClusterLoadChecker settings
+|Property Name|Default|        Description|
+|---|---|---|
+|rss.coordinator.access.loadChecker.serverNum.threshold|-|The minimal required number of healthy shuffle servers when being accessed by a client. When not specified, the shuffle-server number required by the client is used as the checking condition. If the client does not specify a shuffle-server number, the coordinator conf rss.coordinator.shuffle.nodes.max is adopted|
+
+### AccessCandidatesChecker settings
+AccessCandidatesChecker is one of the built-in access checkers, which allows users to define the list of candidates that may use rss.
+
+|Property Name|Default|        Description|
+|---|---|---|
+|rss.coordinator.access.candidates.updateIntervalSec|120|Accessed candidates update interval in seconds, which is only valid when AccessCandidatesChecker is enabled.|
+|rss.coordinator.access.candidates.path|-|Accessed candidates file path, the file can be stored on HDFS|
diff --git a/docs/04-server_guide.md b/docs/03-deploy-uniffle-cluster/01-server-guide.md
similarity index 100%
rename from docs/04-server_guide.md
rename to docs/03-deploy-uniffle-cluster/01-server-guide.md
diff --git a/docs/04-deploy-uniffle-cluster-on-k8s/00-uniffle-operator-design.md b/docs/04-deploy-uniffle-cluster-on-k8s/00-uniffle-operator-design.md
new file mode 100644
index 0000000..13c81da
--- /dev/null
+++ b/docs/04-deploy-uniffle-cluster-on-k8s/00-uniffle-operator-design.md
@@ -0,0 +1,137 @@
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one or more
+  ~ contributor license agreements.  See the NOTICE file distributed with
+  ~ this work for additional information regarding copyright ownership.
+  ~ The ASF licenses this file to You under the Apache License, Version 2.0
+  ~ (the "License"); you may not use this file except in compliance with
+  ~ the License.  You may obtain a copy of the License at
+  ~
+  ~    http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing, software
+  ~ distributed under the License is distributed on an "AS IS" BASIS,
+  ~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  ~ See the License for the specific language governing permissions and
+  ~ limitations under the License.
+  -->
+
+# Uniffle Operator Design
+
+## Summary
+
+The purpose is to develop an operator to facilitate the rapid deployment of Uniffle in Kubernetes environments.
+
+## Motivation
+
+By using the advantages of Kubernetes in container orchestration, elastic scaling, and rolling upgrades, Uniffle can more
+easily manage coordinator and shuffle server clusters.
+
+In addition, based on the operating characteristics of shuffle servers, we hope to take them offline safely:
+
+1. Before a shuffle server is scaled down or upgraded, it should be added to the coordinator's blacklist in advance.
+2. After ensuring that the number of remaining applications is 0, allow its corresponding pod to be deleted and removed
+   from the blacklist.
+
+We don't just want to bring up the coordinators and shuffle servers, but also to ensure that running jobs are not
+affected. Therefore, we decided to develop a dedicated operator.
+
+## Goals
+
+The operator will implement the following functions:
+
+1. Bring up two coordinator deployments (to ensure active-active) and a shuffle server statefulSet.
+2. Support replica expansion and upgrade of coordinators and shuffle servers, among which the shuffle server also supports
+   grayscale upgrade.
+3. Using the webhook mechanism, before a shuffle server is deleted, add its name to the coordinator's blacklist,
+   check the number of applications still running, and release the pod deletion request only after ensuring safety.
+
+## Design Details
+
+This operator consists of two components: a crd controller and a webhook that admits crd and pod requests.
+
+The crd controller observes the status changes of the crd and controls the workload changes.
+
+The webhook verifies the changes of the crd, and admits the pod deletion request according to whether the number of
+remaining applications is 0.
+
+The webhook will add the pod to be deleted to the coordinator's blacklist. When the pod is actually deleted, the
+controller will remove it from the blacklist.
+
+## CRD Definition
+
+An example of a crd object is as follows:
+
+```yaml
+apiVersion: uniffle.apache.org/v1alpha1
+kind: RemoteShuffleService
+metadata:
+  name: rss-demo
+  namespace: kube-system
+spec:
+  # ConfigMapName indicates the name of the configMap that stores the configurations of coordinators and shuffle servers.
+  configMapName: rss-demo
+  # Coordinator represents the relevant configuration of the coordinators.
+  coordinator:
+    # Image represents the container image used by coordinators.
+    image: ${coordinator-image}
+    # InitContainerImage is optional, mainly for non-root users to initialize host path permissions.
+    initContainerImage: "busybox:latest"
+    # Count is the number of coordinator workloads to be generated.
+    # By default, we will deploy two coordinators to ensure active-active.
+    count: 2
+    # RpcNodePort represents the port required by the rpc protocol of the coordinators,
+    # and the range is the same as the port range of the NodePort type service in kubernetes.
+    # By default, we will deploy two coordinators to ensure active-active.
+    rpcNodePort:
+      - 30001
+      - 30011
+    # httpNodePort represents the port required by the http protocol of the coordinators,
+    # and the range is the same as the port range of the NodePort type service in kubernetes.
+    # By default, we will deploy two coordinators to ensure active-active.
+    httpNodePort:
+      - 30002
+      - 30012
+    # XmxSize indicates the xmx size configured for coordinators.
+    xmxSize: "10G"
+    # ConfigDir records the directory where the configuration of coordinators resides.
+    configDir: "/data/rssadmin/rss/conf"
+    # Replicas field is the replicas of each coordinator's deployment.
+    replicas: 1
+    # ExcludeNodesFilePath indicates the exclude nodes file path in coordinators' containers.
+    excludeNodesFilePath: "/data/rssadmin/rss/coo/exclude_nodes"
+    # SecurityContext holds pod-level security attributes and common container settings.
+    securityContext:
+      # RunAsUser specifies the user ID of all processes in coordinator pods.
+      runAsUser: 1000
+      # FsGroup specifies the group ID of the owner of the volume within coordinator pods.
+      fsGroup: 1000
+    # LogHostPath represents the host path used to save logs of coordinators.
+    logHostPath: "/data/logs/rss"
+    # HostPathMounts field indicates host path volumes and their mounting path within coordinators' containers.
+    hostPathMounts:
+      /data/logs/rss: /data/rssadmin/rss/logs
+  # shuffleServer represents the relevant configuration of the shuffleServers
+  shuffleServer:
+    # Sync marks whether the shuffle server needs to be updated or restarted.
+    # When the user needs to update the shuffle servers, it needs to be set to true.
+    # After the update is successful, the controller will modify it to false.
+    sync: true
+    # Replicas is the number of replicas of the shuffle server statefulSet.
+    replicas: 3
+    # Image represents the container image used by shuffle servers.
+    image: ${shuffle-server-image}
+```
+
+After a user creates an rss object, the rss-controller component will create the corresponding workloads.
+
+For coordinators, the user directly modifies the rss object, and the 
controller synchronizes the corresponding state to
+the workloads.
+
+For shuffle servers, the controller applies the corresponding updates to the workloads only when the
+spec.shuffleServer.sync field is set to true.
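+
+As a sketch of that workflow with `kubectl`: the file name `rss-demo.yaml` is hypothetical and is assumed to contain the example object above, and the resource name `remoteshuffleservices.uniffle.apache.org` is an assumption derived from the object's kind and API group.
+
+```
+# create the rss object
+kubectl apply -f rss-demo.yaml
+# inspect the object that the controller manages
+kubectl get remoteshuffleservices.uniffle.apache.org rss-demo -n kube-system -o yaml
+# after changing the shuffle server spec, ask the controller to apply it
+kubectl patch remoteshuffleservices.uniffle.apache.org rss-demo -n kube-system --type merge -p '{"spec":{"shuffleServer":{"sync":true}}}'
+```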
+
+If you want more examples, please read more in [examples](02-examples.md).
+
+## State Transition
+
+![state transition](../img/rss-crd-state-transition.png)
diff --git a/docs/04-deploy-uniffle-cluster-on-k8s/01-install.md b/docs/04-deploy-uniffle-cluster-on-k8s/01-install.md
new file mode 100644
index 0000000..82b8930
--- /dev/null
+++ b/docs/04-deploy-uniffle-cluster-on-k8s/01-install.md
@@ -0,0 +1,76 @@
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one or more
+  ~ contributor license agreements.  See the NOTICE file distributed with
+  ~ this work for additional information regarding copyright ownership.
+  ~ The ASF licenses this file to You under the Apache License, Version 2.0
+  ~ (the "License"); you may not use this file except in compliance with
+  ~ the License.  You may obtain a copy of the License at
+  ~
+  ~    http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing, software
+  ~ distributed under the License is distributed on an "AS IS" BASIS,
+  ~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  ~ See the License for the specific language governing permissions and
+  ~ limitations under the License.
+  -->
+
+
+# Installation
+
+This section shows how to install the operator in a cluster.
+
+## Requirements
+
+1. Kubernetes 1.14+
+2. Kubectl 1.14+
+
+Please make sure that kubectl is properly configured to interact with the Kubernetes environment.
+
+## Preparing Images of Coordinators and Shuffle Servers
+
+Run the following command:
+
+```
+cd /deploy/kubernetes/docker && sh build.sh --registry ${our-registry}
+```
+
+## Creating or Updating CRD
+
+We can refer
+to [crd yaml file](https://github.com/apache/incubator-uniffle/tree/master/deploy/kubernetes/operator/config/crd/bases/uniffle.apache.org_remoteshuffleservices.yaml).
+
+Run the following command:
+
+```
+kubectl apply -f ${crd-yaml-file}
+```
+
+## Setup or Update Uniffle Webhook
+
+We can refer to [webhook yaml file](https://github.com/apache/incubator-uniffle/tree/master/deploy/kubernetes/operator/config/manager/rss-webhook.yaml).
+
+Run the following command:
+
+```
+kubectl apply -f ${webhook-yaml-file}
+```
+
+## Setup or Update Uniffle Controller
+
+We can refer to [controller yaml file](https://github.com/apache/incubator-uniffle/tree/master/deploy/kubernetes/operator/config/manager/rss-controller.yaml).
+
+Run the following command:
+
+```
+kubectl apply -f ${controller-yaml-file}
+```
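+
+Putting the three steps together, a minimal sketch, assuming the commands are run from the root of a checkout of the incubator-uniffle repository (the file paths are taken from the links above) and that the manifests need no site-specific changes before being applied:
+
+```
+kubectl apply -f deploy/kubernetes/operator/config/crd/bases/uniffle.apache.org_remoteshuffleservices.yaml
+kubectl apply -f deploy/kubernetes/operator/config/manager/rss-webhook.yaml
+kubectl apply -f deploy/kubernetes/operator/config/manager/rss-controller.yaml
+# confirm that the CRD is registered (the CRD name is assumed from the CRD file name)
+kubectl get crd remoteshuffleservices.uniffle.apache.org
+```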
+
+## How To Use
+
+We can learn more details about usage of CRD
+from [uniffle operator design](00-uniffle-operator-design.md).
+
+## Examples
+
+Example uses of CRD have been [provided](02-examples.md).
diff --git a/docs/04-deploy-uniffle-cluster-on-k8s/02-examples.md b/docs/04-deploy-uniffle-cluster-on-k8s/02-examples.md
new file mode 100644
index 0000000..f8514c0
--- /dev/null
+++ b/docs/04-deploy-uniffle-cluster-on-k8s/02-examples.md
@@ -0,0 +1,31 @@
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one or more
+  ~ contributor license agreements.  See the NOTICE file distributed with
+  ~ this work for additional information regarding copyright ownership.
+  ~ The ASF licenses this file to You under the Apache License, Version 2.0
+  ~ (the "License"); you may not use this file except in compliance with
+  ~ the License.  You may obtain a copy of the License at
+  ~
+  ~    http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing, software
+  ~ distributed under the License is distributed on an "AS IS" BASIS,
+  ~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  ~ See the License for the specific language governing permissions and
+  ~ limitations under the License.
+  -->
+
+
+# Examples
+
+We first need to create a configMap that stores the configuration of coordinators, shuffleServers and log4j (we can refer
+to [configuration](https://github.com/apache/incubator-uniffle/tree/master/deploy/kubernetes/operator/examples/configuration.yaml)).
+
+The coordinator is a stateless service; when upgrading, we can directly update the configuration and then update the image.
+
+The shuffle server is a stateful service, and the upgrade operation is more complicated, so we show examples of different
+upgrade modes.
+- [Full Upgrade](https://github.com/apache/incubator-uniffle/tree/master/deploy/kubernetes/operator/examples/full-upgrade)
+- [Full Restart](https://github.com/apache/incubator-uniffle/tree/master/deploy/kubernetes/operator/examples/full-restart)
+- [Partition Upgrade](https://github.com/apache/incubator-uniffle/tree/master/deploy/kubernetes/operator/examples/partition-upgrade)
+- [Specific Upgrade](https://github.com/apache/incubator-uniffle/tree/master/deploy/kubernetes/operator/examples/specific-upgrade)
diff --git a/docs/img/rss-crd-state-transition.png b/docs/img/rss-crd-state-transition.png
new file mode 100644
index 0000000..f5329b8
Binary files /dev/null and b/docs/img/rss-crd-state-transition.png differ
