surahman edited a comment on pull request #3741:
URL: https://github.com/apache/incubator-heron/pull/3741#issuecomment-984255367


   ### This PR combines all the functionality to customize the Heron execution environment in Kubernetes.
   
   The documentation at this point in the PR can be found 
[here](https://github.com/apache/incubator-heron/blob/ae2a6606e51e2d122d0df819fe29d3ca4077d22c/website2/docs/schedulers-k8s-execution-environment.md).
   
   - [x] Separation of the Heron Manager and Executors:
     - Two `StatefulSet`s per topology.
       - `Manager` named `[topology-name]-manager` with a single replica.
       - `Executor` named `[topology-name]-executors` with multiple replicas.
     - Both `StatefulSet`s, the headless `Service`, and all `PersistentVolumeClaim`s generated for the topology are removed on termination.
     - `restart` will restart the `Manager` and all `Executor`s for a topology.
     - `addContainers` will only add to the replica count for `Executor`s.
     - `removeContainers` will only decrement the replica count for `Executor`s.
     - `patchStatefulSetReplicas` will only patch the `StatefulSet` for `Executor`s.
   - [x] Loading of Pod Templates via `heron.kubernetes.[executor | manager].pod.template=[ConfigMap].[Pod Template Name]`.
   - [x] Configure Persistent Volumes via `heron.kubernetes.[executor | manager].volumes.persistentVolumeClaim.[Volume Name].[Option]=[Value]`.
   - [x] Configure resources for the manager and/or executor via `heron.kubernetes.[executor | manager].[limits | requests].[Option]=[Value]`.
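   
   The property families above share the prefix `heron.kubernetes.[executor | manager]`. As a rough illustration of that key layout (not the parsing code in this PR), the sketch below splits one property into its role, setting path, and value using shell parameter expansion. The function name `parse_heron_k8s_property` is hypothetical.
   
   ```bash
   # Hypothetical helper: split one "key=value" property into its parts.
   # The key layout comes from the list above; the function name is illustrative.
   parse_heron_k8s_property() {
     local property="$1"
     local key="${property%%=*}"            # text before the first '='
     local value="${property#*=}"           # text after the first '='
     local rest="${key#heron.kubernetes.}"  # drop the common prefix
     local role="${rest%%.*}"               # first segment: executor or manager
     rest="${rest#*.}"                      # remaining setting path
     echo "role=${role} setting=${rest} value=${value}"
   }
   
   parse_heron_k8s_property "heron.kubernetes.manager.limits.cpu=2"
   # prints: role=manager setting=limits.cpu value=2
   ```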
   
   I have completed some deployment testing and this PR is now available for 
review and broader testing.
   
   <details><summary>Submit command</summary>
   
   ```bash
   ~/bin/heron submit kubernetes ~/.heron/examples/heron-api-examples.jar \
   org.apache.heron.examples.api.AckingTopology acking \
   --verbose \
   --config-property heron.kubernetes.executor.pod.template=pod-templ-executor.pod-template-executor.yaml \
   --config-property heron.kubernetes.manager.pod.template=pod-templ-manager.pod-template-manager.yaml \
   --config-property heron.kubernetes.manager.limits.cpu=2 \
   --config-property heron.kubernetes.manager.limits.memory=3 \
   --config-property heron.kubernetes.manager.requests.cpu=1 \
   --config-property heron.kubernetes.manager.requests.memory=2 \
   --config-property heron.kubernetes.executor.limits.cpu=5 \
   --config-property heron.kubernetes.executor.limits.memory=6 \
   --config-property heron.kubernetes.executor.requests.cpu=2 \
   --config-property heron.kubernetes.executor.requests.memory=1 \
   --config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.executor-dynamic-volume.claimName=OnDemand \
   --config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.executor-dynamic-volume.accessModes=ReadWriteOnce,ReadOnlyMany \
   --config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.executor-dynamic-volume.sizeLimit=256Gi \
   --config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.executor-dynamic-volume.volumeMode=Block \
   --config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.executor-dynamic-volume.path=path/to/mount/dynamic/volume \
   --config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.executor-dynamic-volume.subPath=sub/path/to/mount/dynamic/volume \
   --config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.executor-static-volume.claimName=OnDemand \
   --config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.executor-static-volume.storageClassName=storage-class-name \
   --config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.executor-static-volume.accessModes=ReadWriteOnce,ReadOnlyMany \
   --config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.executor-static-volume.sizeLimit=512Gi \
   --config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.executor-static-volume.volumeMode=Block \
   --config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.executor-static-volume.path=path/to/mount/static/volume \
   --config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.executor-static-volume.subPath=sub/path/to/mount/static/volume \
   --config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.executor-shared-volume.claimName=requested-claim-by-user \
   --config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.executor-shared-volume.path=path/to/mount/shared/volume \
   --config-property heron.kubernetes.executor.volumes.persistentVolumeClaim.executor-shared-volume.subPath=sub/path/to/mount/shared/volume \
   --config-property heron.kubernetes.manager.volumes.persistentVolumeClaim.manager-dynamic-volume.claimName=OnDemand \
   --config-property heron.kubernetes.manager.volumes.persistentVolumeClaim.manager-dynamic-volume.accessModes=ReadWriteOnce,ReadOnlyMany \
   --config-property heron.kubernetes.manager.volumes.persistentVolumeClaim.manager-dynamic-volume.sizeLimit=256Gi \
   --config-property heron.kubernetes.manager.volumes.persistentVolumeClaim.manager-dynamic-volume.volumeMode=Block \
   --config-property heron.kubernetes.manager.volumes.persistentVolumeClaim.manager-dynamic-volume.path=path/to/mount/dynamic/volume \
   --config-property heron.kubernetes.manager.volumes.persistentVolumeClaim.manager-dynamic-volume.subPath=sub/path/to/mount/dynamic/volume \
   --config-property heron.kubernetes.manager.volumes.persistentVolumeClaim.manager-static-volume.claimName=OnDemand \
   --config-property heron.kubernetes.manager.volumes.persistentVolumeClaim.manager-static-volume.storageClassName=storage-class-name \
   --config-property heron.kubernetes.manager.volumes.persistentVolumeClaim.manager-static-volume.accessModes=ReadWriteOnce,ReadOnlyMany \
   --config-property heron.kubernetes.manager.volumes.persistentVolumeClaim.manager-static-volume.sizeLimit=512Gi \
   --config-property heron.kubernetes.manager.volumes.persistentVolumeClaim.manager-static-volume.volumeMode=Block \
   --config-property heron.kubernetes.manager.volumes.persistentVolumeClaim.manager-static-volume.path=path/to/mount/static/volume \
   --config-property heron.kubernetes.manager.volumes.persistentVolumeClaim.manager-static-volume.subPath=sub/path/to/mount/static/volume \
   --config-property heron.kubernetes.manager.volumes.persistentVolumeClaim.manager-shared-volume.claimName=requested-claim-by-user \
   --config-property heron.kubernetes.manager.volumes.persistentVolumeClaim.manager-shared-volume.path=path/to/mount/shared/volume \
   --config-property heron.kubernetes.manager.volumes.persistentVolumeClaim.manager-shared-volume.subPath=sub/path/to/mount/shared/volume
   ```
   
   </details>
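   
   The submit command repeats the same key prefix for every volume option. A minimal sketch of a helper that generates those flags is shown here; `pvc_flags` is a hypothetical name and is not part of Heron or this PR. It only prints text in the documented `heron.kubernetes.[executor | manager].volumes.persistentVolumeClaim.[Volume Name].[Option]=[Value]` form.
   
   ```bash
   # Hypothetical helper: print one --config-property flag per "Option=Value"
   # pair for a single Persistent Volume Claim.
   #   $1 = role (executor or manager), $2 = volume name, rest = Option=Value pairs
   pvc_flags() {
     local role="$1" volume="$2"
     shift 2
     local pair
     for pair in "$@"; do
       printf -- '--config-property heron.kubernetes.%s.volumes.persistentVolumeClaim.%s.%s\n' \
         "$role" "$volume" "$pair"
     done
   }
   
   pvc_flags executor executor-shared-volume \
     claimName=requested-claim-by-user \
     path=path/to/mount/shared/volume \
     subPath=sub/path/to/mount/shared/volume
   ```
   
   The output could be spliced into the submit command with command substitution, on the assumption that no value contains whitespace, since the shell would split such a value into separate words.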
   
   <details><summary>Manager StatefulSet</summary>
   
   ```yaml
   apiVersion: apps/v1
   kind: StatefulSet
   metadata:
     creationTimestamp: "2021-12-02T00:12:20Z"
     generation: 1
     labels:
       app: heron
       topology: acking
     name: acking-manager
     namespace: default
     resourceVersion: "1216"
     uid: c823bb62-c798-46e2-8f7c-ec7f66a663ac
   spec:
     podManagementPolicy: Parallel
     replicas: 1
     revisionHistoryLimit: 10
     selector:
       matchLabels:
         app: heron
         topology: acking
     serviceName: acking
     template:
       metadata:
         annotations:
           prometheus.io/port: "8080"
           prometheus.io/scrape: "true"
         creationTimestamp: null
         labels:
           app: heron
           topology: acking
       spec:
         containers:
         - command:
           - sh
           - -c
           - './heron-core/bin/heron-downloader-config kubernetes && ./heron-core/bin/heron-downloader
             distributedlog://zookeeper:2181/heronbkdl/acking-saad-tag-0-1634139749345622293.tar.gz
             . && SHARD_ID=${POD_NAME##*-} && echo shardId=${SHARD_ID} && ./heron-core/bin/heron-executor
             --topology-name=acking --topology-id=acking92ff5e65-2f7c-42c1-b8f3-aa3d9e3847d6
             --topology-defn-file=acking.defn --state-manager-connection=zookeeper:2181
             --state-manager-root=/heron --state-manager-config-file=./heron-conf/statemgr.yaml
             --tmanager-binary=./heron-core/bin/heron-tmanager --stmgr-binary=./heron-core/bin/heron-stmgr
             --metrics-manager-classpath=./heron-core/lib/metricsmgr/* --instance-jvm-opts="LVhYOitIZWFwRHVtcE9uT3V0T2ZNZW1vcnlFcnJvcg(61)(61)"
             --classpath=heron-api-examples.jar --heron-internals-config-file=./heron-conf/heron_internals.yaml
             --override-config-file=./heron-conf/override.yaml --component-ram-map=exclaim1:1073741824,word:1073741824
             --component-jvm-opts="" --pkg-type=jar --topology-binary-file=heron-api-examples.jar
             --heron-java-home=$JAVA_HOME --heron-shell-binary=./heron-core/bin/heron-shell
             --cluster=kubernetes --role=saad --environment=default --instance-classpath=./heron-core/lib/instance/*
             --metrics-sinks-config-file=./heron-conf/metrics_sinks.yaml --scheduler-classpath=./heron-core/lib/scheduler/*:./heron-core/lib/packing/*:./heron-core/lib/statemgr/*
             --python-instance-binary=./heron-core/bin/heron-python-instance --cpp-instance-binary=./heron-core/bin/heron-cpp-instance
             --metricscache-manager-classpath=./heron-core/lib/metricscachemgr/* --metricscache-manager-mode=disabled
             --is-stateful=false --checkpoint-manager-classpath=./heron-core/lib/ckptmgr/*:./heron-core/lib/statefulstorage/*:
             --stateful-config-file=./heron-conf/stateful.yaml --checkpoint-manager-ram=1073741824
             --health-manager-mode=disabled --health-manager-classpath=./heron-core/lib/healthmgr/*
             --shard=$SHARD_ID --server-port=6001 --tmanager-controller-port=6002 --tmanager-stats-port=6003
             --shell-port=6004 --metrics-manager-port=6005 --scheduler-port=6006 --metricscache-manager-server-port=6007
             --metricscache-manager-stats-port=6008 --checkpoint-manager-port=6009'
           env:
           - name: HOST
             valueFrom:
               fieldRef:
                 apiVersion: v1
                 fieldPath: status.podIP
           - name: POD_NAME
             valueFrom:
               fieldRef:
                 apiVersion: v1
                 fieldPath: metadata.name
           - name: var_one_manager
             value: variable one on manager
           - name: var_three_manager
             value: variable three on manager
           - name: var_two_manager
             value: variable two on manager
           image: apache/heron:testbuild
           imagePullPolicy: IfNotPresent
           name: manager
           ports:
           - containerPort: 6001
             name: server
             protocol: TCP
           - containerPort: 6002
             name: tmanager-ctl
             protocol: TCP
           - containerPort: 6003
             name: tmanager-stats
             protocol: TCP
           - containerPort: 6004
             name: shell-port
             protocol: TCP
           - containerPort: 6005
             name: metrics-mgr
             protocol: TCP
           - containerPort: 6006
             name: scheduler
             protocol: TCP
           - containerPort: 6007
             name: metrics-cache-m
             protocol: TCP
           - containerPort: 6008
             name: metrics-cache-s
             protocol: TCP
           - containerPort: 6009
             name: ckptmgr
             protocol: TCP
           - containerPort: 7775
             name: tcp-port-kept
             protocol: TCP
           - containerPort: 7776
             name: udp-port-kept
             protocol: UDP
           resources:
             limits:
               cpu: "2"
               memory: 3Mi
             requests:
               cpu: "1"
               memory: 2Mi
           securityContext:
             allowPrivilegeEscalation: false
           terminationMessagePath: /dev/termination-log
           terminationMessagePolicy: File
           volumeMounts:
           - mountPath: path/to/mount/dynamic/volume
             name: manager-dynamic-volume
             subPath: sub/path/to/mount/dynamic/volume
           - mountPath: path/to/mount/shared/volume
             name: manager-shared-volume
             subPath: sub/path/to/mount/shared/volume
           - mountPath: path/to/mount/static/volume
             name: manager-static-volume
             subPath: sub/path/to/mount/static/volume
           - mountPath: /shared_volume/manager
             name: shared-volume-manager
         - image: alpine
           imagePullPolicy: Always
           name: manager-sidecar-container
           resources: {}
           terminationMessagePath: /dev/termination-log
           terminationMessagePolicy: File
           volumeMounts:
           - mountPath: /shared_volume/manager
             name: shared-volume-manager
         dnsPolicy: ClusterFirst
         restartPolicy: Always
         schedulerName: default-scheduler
         securityContext: {}
         terminationGracePeriodSeconds: 0
         tolerations:
         - effect: NoExecute
           key: node.kubernetes.io/not-ready
           operator: Exists
           tolerationSeconds: 10
         - effect: NoExecute
           key: node.kubernetes.io/unreachable
           operator: Exists
           tolerationSeconds: 10
         volumes:
         - name: manager-shared-volume
           persistentVolumeClaim:
             claimName: requested-claim-by-user
         - emptyDir: {}
           name: shared-volume-manager
     updateStrategy:
       rollingUpdate:
         partition: 0
       type: RollingUpdate
     volumeClaimTemplates:
     - apiVersion: v1
       kind: PersistentVolumeClaim
       metadata:
         creationTimestamp: null
         labels:
           onDemand: "true"
           topology: acking
         name: manager-static-volume
       spec:
         accessModes:
         - ReadWriteOnce
         - ReadOnlyMany
         resources:
           requests:
             storage: 512Gi
         storageClassName: storage-class-name
         volumeMode: Block
       status:
         phase: Pending
     - apiVersion: v1
       kind: PersistentVolumeClaim
       metadata:
         creationTimestamp: null
         labels:
           onDemand: "true"
           topology: acking
         name: manager-dynamic-volume
       spec:
         accessModes:
         - ReadWriteOnce
         - ReadOnlyMany
         resources:
           requests:
             storage: 256Gi
         volumeMode: Block
       status:
         phase: Pending
   status:
     collisionCount: 0
     currentReplicas: 1
     currentRevision: acking-manager-7596cff587
     observedGeneration: 1
     replicas: 1
     updateRevision: acking-manager-7596cff587
     updatedReplicas: 1
   ```
   
   </details>
   
   <details><summary>Executor StatefulSet</summary>
   
   ```yaml
   apiVersion: apps/v1
   kind: StatefulSet
   metadata:
     creationTimestamp: "2021-12-02T00:12:20Z"
     generation: 1
     labels:
       app: heron
       topology: acking
     name: acking-executors
     namespace: default
     resourceVersion: "1211"
     uid: 3ec133e2-591e-4864-b054-478021b8062d
   spec:
     podManagementPolicy: Parallel
     replicas: 2
     revisionHistoryLimit: 10
     selector:
       matchLabels:
         app: heron
         topology: acking
     serviceName: acking
     template:
       metadata:
         annotations:
           prometheus.io/port: "8080"
           prometheus.io/scrape: "true"
         creationTimestamp: null
         labels:
           app: heron
           topology: acking
       spec:
         containers:
         - command:
           - sh
           - -c
           - './heron-core/bin/heron-downloader-config kubernetes && ./heron-core/bin/heron-downloader
             distributedlog://zookeeper:2181/heronbkdl/acking-saad-tag-0-1634139749345622293.tar.gz
             . && SHARD_ID=$((${POD_NAME##*-} + 1)) && echo shardId=${SHARD_ID} && ./heron-core/bin/heron-executor
             --topology-name=acking --topology-id=acking92ff5e65-2f7c-42c1-b8f3-aa3d9e3847d6
             --topology-defn-file=acking.defn --state-manager-connection=zookeeper:2181
             --state-manager-root=/heron --state-manager-config-file=./heron-conf/statemgr.yaml
             --tmanager-binary=./heron-core/bin/heron-tmanager --stmgr-binary=./heron-core/bin/heron-stmgr
             --metrics-manager-classpath=./heron-core/lib/metricsmgr/* --instance-jvm-opts="LVhYOitIZWFwRHVtcE9uT3V0T2ZNZW1vcnlFcnJvcg(61)(61)"
             --classpath=heron-api-examples.jar --heron-internals-config-file=./heron-conf/heron_internals.yaml
             --override-config-file=./heron-conf/override.yaml --component-ram-map=exclaim1:1073741824,word:1073741824
             --component-jvm-opts="" --pkg-type=jar --topology-binary-file=heron-api-examples.jar
             --heron-java-home=$JAVA_HOME --heron-shell-binary=./heron-core/bin/heron-shell
             --cluster=kubernetes --role=saad --environment=default --instance-classpath=./heron-core/lib/instance/*
             --metrics-sinks-config-file=./heron-conf/metrics_sinks.yaml --scheduler-classpath=./heron-core/lib/scheduler/*:./heron-core/lib/packing/*:./heron-core/lib/statemgr/*
             --python-instance-binary=./heron-core/bin/heron-python-instance --cpp-instance-binary=./heron-core/bin/heron-cpp-instance
             --metricscache-manager-classpath=./heron-core/lib/metricscachemgr/* --metricscache-manager-mode=disabled
             --is-stateful=false --checkpoint-manager-classpath=./heron-core/lib/ckptmgr/*:./heron-core/lib/statefulstorage/*:
             --stateful-config-file=./heron-conf/stateful.yaml --checkpoint-manager-ram=1073741824
             --health-manager-mode=disabled --health-manager-classpath=./heron-core/lib/healthmgr/*
             --shard=$SHARD_ID --server-port=6001 --tmanager-controller-port=6002 --tmanager-stats-port=6003
             --shell-port=6004 --metrics-manager-port=6005 --scheduler-port=6006 --metricscache-manager-server-port=6007
             --metricscache-manager-stats-port=6008 --checkpoint-manager-port=6009'
           env:
           - name: HOST
             valueFrom:
               fieldRef:
                 apiVersion: v1
                 fieldPath: status.podIP
           - name: POD_NAME
             valueFrom:
               fieldRef:
                 apiVersion: v1
                 fieldPath: metadata.name
           - name: var_one
             value: variable one
           - name: var_three
             value: variable three
           - name: var_two
             value: variable two
           image: apache/heron:testbuild
           imagePullPolicy: IfNotPresent
           name: executor
           ports:
           - containerPort: 5555
             name: tcp-port-kept
             protocol: TCP
           - containerPort: 5556
             name: udp-port-kept
             protocol: UDP
           - containerPort: 6001
             name: server
             protocol: TCP
           - containerPort: 6002
             name: tmanager-ctl
             protocol: TCP
           - containerPort: 6003
             name: tmanager-stats
             protocol: TCP
           - containerPort: 6004
             name: shell-port
             protocol: TCP
           - containerPort: 6005
             name: metrics-mgr
             protocol: TCP
           - containerPort: 6006
             name: scheduler
             protocol: TCP
           - containerPort: 6007
             name: metrics-cache-m
             protocol: TCP
           - containerPort: 6008
             name: metrics-cache-s
             protocol: TCP
           - containerPort: 6009
             name: ckptmgr
             protocol: TCP
           resources:
             limits:
               cpu: "5"
               memory: 6Mi
             requests:
               cpu: "2"
               memory: 1Mi
           securityContext:
             allowPrivilegeEscalation: false
           terminationMessagePath: /dev/termination-log
           terminationMessagePolicy: File
           volumeMounts:
           - mountPath: path/to/mount/dynamic/volume
             name: executor-dynamic-volume
             subPath: sub/path/to/mount/dynamic/volume
           - mountPath: path/to/mount/shared/volume
             name: executor-shared-volume
             subPath: sub/path/to/mount/shared/volume
           - mountPath: path/to/mount/static/volume
             name: executor-static-volume
             subPath: sub/path/to/mount/static/volume
           - mountPath: /shared_volume
             name: shared-volume
         - image: alpine
           imagePullPolicy: Always
           name: sidecar-container
           resources: {}
           terminationMessagePath: /dev/termination-log
           terminationMessagePolicy: File
           volumeMounts:
           - mountPath: /shared_volume
             name: shared-volume
         dnsPolicy: ClusterFirst
         restartPolicy: Always
         schedulerName: default-scheduler
         securityContext: {}
         terminationGracePeriodSeconds: 0
         tolerations:
         - effect: NoExecute
           key: node.kubernetes.io/not-ready
           operator: Exists
           tolerationSeconds: 10
         - effect: NoExecute
           key: node.kubernetes.io/unreachable
           operator: Exists
           tolerationSeconds: 10
         volumes:
         - name: executor-shared-volume
           persistentVolumeClaim:
             claimName: requested-claim-by-user
         - emptyDir: {}
           name: shared-volume
     updateStrategy:
       rollingUpdate:
         partition: 0
       type: RollingUpdate
     volumeClaimTemplates:
     - apiVersion: v1
       kind: PersistentVolumeClaim
       metadata:
         creationTimestamp: null
         labels:
           onDemand: "true"
           topology: acking
         name: executor-dynamic-volume
       spec:
         accessModes:
         - ReadWriteOnce
         - ReadOnlyMany
         resources:
           requests:
             storage: 256Gi
         volumeMode: Block
       status:
         phase: Pending
     - apiVersion: v1
       kind: PersistentVolumeClaim
       metadata:
         creationTimestamp: null
         labels:
           onDemand: "true"
           topology: acking
         name: executor-static-volume
       spec:
         accessModes:
         - ReadWriteOnce
         - ReadOnlyMany
         resources:
           requests:
             storage: 512Gi
         storageClassName: storage-class-name
         volumeMode: Block
       status:
         phase: Pending
   status:
     collisionCount: 0
     currentReplicas: 2
     currentRevision: acking-executors-68f9654bd9
     observedGeneration: 1
     replicas: 2
     updateRevision: acking-executors-68f9654bd9
     updatedReplicas: 2
   ```
   
   </details>
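   
   The two container commands differ in how `SHARD_ID` is derived from `POD_NAME`: the manager uses the pod ordinal as-is, while the executors add one to it. The sketch below reproduces that arithmetic outside the cluster. It relies on Kubernetes naming StatefulSet pods `<statefulset-name>-<ordinal>`; the function names are hypothetical, and the reading that shard 0 is thereby left to the manager is an inference from the two commands rather than something stated in the PR.
   
   ```bash
   # Ordinal suffix of a StatefulSet pod name, e.g. "acking-executors-1" -> 1.
   pod_ordinal() {
     local pod_name="$1"
     echo "${pod_name##*-}"   # strip everything up to and including the last '-'
   }
   
   # Manager command: SHARD_ID=${POD_NAME##*-}
   manager_shard_id() {
     pod_ordinal "$1"
   }
   
   # Executor command: SHARD_ID=$((${POD_NAME##*-} + 1))
   executor_shard_id() {
     echo $(( $(pod_ordinal "$1") + 1 ))
   }
   
   manager_shard_id acking-manager-0      # prints 0
   executor_shard_id acking-executors-0   # prints 1
   executor_shard_id acking-executors-1   # prints 2
   ```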
   

