surahman opened a new pull request #3741:
URL: https://github.com/apache/incubator-heron/pull/3741


   ***Feature #3724: Management/Driver pod should not use the same resources as 
the other topology pods. Should create separate deployment/service.***
   
   This PR builds upon #3725 to test Volume Claim configurations in the `Manager`. Once #3725 is merged into `master`, I will hard-reset this PR's feature branch onto `master` (this may temporarily close the PR). I will then merge the `dev` branch into the feature branch again and resolve any merge conflicts that arise.
   
   The current features are listed below. I am soliciting input on all of them:
   
   * The `StatefulSet`s are named `[topology-name]-manager` and 
`[topology-name]-executors`.
   * A single headless `Service` is shared by both `StatefulSet`s.
   * The `Manager` is a duplicate of the `Executor` `StatefulSet`.
   * Both `StatefulSet`s, the `Service`, and all `Volume Claims` that are generated for the topology are removed on termination.
   * `restart` will restart the `Manager` and `Executor`s for a topology.
   * `addContainers` will only add `Executor` containers.
   * `removeContainers` will only remove `Executor` containers.
   * `patchStatefulSetReplicas` will only patch `Executor` containers.
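   The naming scheme above determines which objects should be gone after termination. The hypothetical Bash helper `expected_topology_resources` below (not part of Heron or this PR) prints those names for one topology. It assumes the headless `Service` is named after the topology (the manifests in this PR use `serviceName: acking`) and relies on the Kubernetes convention that a claim created from a `volumeClaimTemplates` entry is named `<template>-<statefulset>-<ordinal>`. User-supplied claims such as `sharedvolume` are not generated by the topology, so they are not listed.

   ```bash
   # Hypothetical helper (not part of Heron): list the Kubernetes objects that the
   # naming scheme implies for one topology.
   # Usage: expected_topology_resources <topology> <executor-replicas> <claim-template>...
   expected_topology_resources() {
     local topology=$1 executors=$2 t i
     shift 2
     # Assumption: the headless Service is named after the topology.
     echo "service/${topology}"
     echo "statefulset/${topology}-manager"
     echo "statefulset/${topology}-executors"
     # Claims generated from volumeClaimTemplates: <template>-<statefulset>-<ordinal>.
     for t in "$@"; do
       echo "pvc/${t}-${topology}-manager-0"   # the Manager runs a single replica
       for (( i = 0; i < executors; i++ )); do
         echo "pvc/${t}-${topology}-executors-${i}"
       done
     done
   }

   expected_topology_resources acking 2 dynamicvolume staticvolume
   ```

   After the topology is terminated, passing these names to `kubectl get` should report each of them as not found.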
   
   ### Usage
   
   The Manager's resources are set with `--config-property` using the following pattern:
   `heron.kubernetes.manager.[limits | requests].[OPTION]=[VALUE]`
   
   The currently supported CLI `options` are:
   
   * `cpu`
   * `memory`
   
   All associated `value`s must be natural numbers.
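   The property pattern above can be checked before submitting. The sketch below is a hypothetical Bash helper, `manager_resource_property`, which is not part of Heron: it accepts only the `limits`/`requests` kinds and the `cpu`/`memory` options listed above and, assuming "natural number" here means a positive integer, rejects values such as `1.5` or `0`. Judging from the Manager manifest in this PR, where `memory=3` became `3Gi`, memory values appear to be read as gibibytes.

   ```bash
   # Hypothetical helper (not part of Heron): validate one Manager resource
   # setting and print the matching configuration property.
   # Assumption: "natural number" means a positive integer, so 0 is rejected.
   manager_resource_property() {
     local kind=$1 option=$2 value=$3
     case "$kind" in
       limits|requests) ;;
       *) echo "unknown kind: $kind" >&2; return 1 ;;
     esac
     case "$option" in
       cpu|memory) ;;
       *) echo "unsupported option: $option" >&2; return 1 ;;
     esac
     if ! [[ $value =~ ^[1-9][0-9]*$ ]]; then
       echo "not a natural number: $value" >&2
       return 1
     fi
     printf '%s\n' "heron.kubernetes.manager.${kind}.${option}=${value}"
   }

   manager_resource_property limits cpu 2       # heron.kubernetes.manager.limits.cpu=2
   manager_resource_property requests memory 2  # heron.kubernetes.manager.requests.memory=2
   manager_resource_property limits cpu 1.5 || echo "rejected"
   ```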
   
   #### Example:
   
   ```bash
   ~/bin/heron submit kubernetes ~/.heron/examples/heron-api-examples.jar \
   org.apache.heron.examples.api.AckingTopology acking \
   --verbose \
   --config-property heron.kubernetes.pod.template.configmap.name=pod-templ-cf-map.pod-template.yaml \
   --config-property heron.kubernetes.manager.limits.cpu=2 \
   --config-property heron.kubernetes.manager.limits.memory=3 \
   --config-property heron.kubernetes.manager.requests.cpu=1 \
   --config-property heron.kubernetes.manager.requests.memory=2 \
   --config-property heron.kubernetes.volumes.persistentVolumeClaim.dynamicvolume.claimName=OnDemand \
   --config-property heron.kubernetes.volumes.persistentVolumeClaim.dynamicvolume.accessModes=ReadWriteOnce,ReadOnlyMany \
   --config-property heron.kubernetes.volumes.persistentVolumeClaim.dynamicvolume.sizeLimit=256Gi \
   --config-property heron.kubernetes.volumes.persistentVolumeClaim.dynamicvolume.volumeMode=Block \
   --config-property heron.kubernetes.volumes.persistentVolumeClaim.dynamicvolume.path=path/to/mount/dynamic/volume \
   --config-property heron.kubernetes.volumes.persistentVolumeClaim.dynamicvolume.subPath=sub/path/to/mount/dynamic/volume \
   --config-property heron.kubernetes.volumes.persistentVolumeClaim.staticvolume.claimName=OnDemand \
   --config-property heron.kubernetes.volumes.persistentVolumeClaim.staticvolume.storageClassName=storage-class-name \
   --config-property heron.kubernetes.volumes.persistentVolumeClaim.staticvolume.accessModes=ReadWriteOnce,ReadOnlyMany \
   --config-property heron.kubernetes.volumes.persistentVolumeClaim.staticvolume.sizeLimit=512Gi \
   --config-property heron.kubernetes.volumes.persistentVolumeClaim.staticvolume.volumeMode=Block \
   --config-property heron.kubernetes.volumes.persistentVolumeClaim.staticvolume.path=path/to/mount/static/volume \
   --config-property heron.kubernetes.volumes.persistentVolumeClaim.staticvolume.subPath=sub/path/to/mount/static/volume \
   --config-property heron.kubernetes.volumes.persistentVolumeClaim.sharedvolume.claimName=requested-claim-by-user \
   --config-property heron.kubernetes.volumes.persistentVolumeClaim.sharedvolume.path=path/to/mount/shared/volume \
   --config-property heron.kubernetes.volumes.persistentVolumeClaim.sharedvolume.subPath=sub/path/to/mount/shared/volume
   ```
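   Every volume flag in the example follows `heron.kubernetes.volumes.persistentVolumeClaim.<volume-name>.<option>=<value>`. The hypothetical `pvc_flags` helper below (not part of Heron) is one way to assemble those flags into a Bash array instead of typing each property out. It performs no validation of option names or values; the option names used are simply the ones from the example.

   ```bash
   # Hypothetical helper (not part of Heron): emit "--config-property" and
   # "heron.kubernetes.volumes.persistentVolumeClaim.<volume>.<option>=<value>"
   # as separate lines so they can be read into a Bash array with mapfile.
   # Options and values are passed through unchecked.
   pvc_flags() {
     local volume=$1 kv
     shift
     for kv in "$@"; do
       printf '%s\n' "--config-property" \
         "heron.kubernetes.volumes.persistentVolumeClaim.${volume}.${kv}"
     done
   }

   # Requires Bash 4+ for mapfile.
   mapfile -t volume_args < <(
     pvc_flags dynamicvolume claimName=OnDemand accessModes=ReadWriteOnce,ReadOnlyMany \
       sizeLimit=256Gi volumeMode=Block path=path/to/mount/dynamic/volume
     pvc_flags sharedvolume claimName=requested-claim-by-user path=path/to/mount/shared/volume
   )
   printf '%s\n' "${volume_args[@]}"
   # The array can then be appended to the submit command, e.g.
   # ~/bin/heron submit kubernetes <jar> <class> <topology> "${volume_args[@]}"
   ```

   Emitting the flag and its value as separate array elements keeps each property a single argument, so the values are never re-split by the shell.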
   
   
   <details><summary>Manager StatefulSet</summary>
   
   ```yaml
   apiVersion: apps/v1
   kind: StatefulSet
   metadata:
     creationTimestamp: "2021-11-23T23:56:06Z"
     generation: 1
     labels:
       app: heron
       topology: acking
     name: acking-manager
     namespace: default
     resourceVersion: "1706"
     uid: 2117b2e9-248e-4d2c-a4cc-7ff1be45375f
   spec:
     podManagementPolicy: Parallel
     replicas: 1
     revisionHistoryLimit: 10
     selector:
       matchLabels:
         app: heron
         topology: acking
     serviceName: acking
     template:
       metadata:
         annotations:
           prometheus.io/port: "8080"
           prometheus.io/scrape: "true"
         creationTimestamp: null
         labels:
           app: heron
           topology: acking
       spec:
         containers:
         - command:
           - sh
           - -c
           - './heron-core/bin/heron-downloader-config kubernetes && ./heron-core/bin/heron-downloader
             distributedlog://zookeeper:2181/heronbkdl/acking-saad-tag-0-6322466307195919246.tar.gz
             . && SHARD_ID=${POD_NAME##*-} && echo shardId=${SHARD_ID} && ./heron-core/bin/heron-executor
             --topology-name=acking --topology-id=acking242db601-6bb5-4703-b2f3-f38f0a3f8a0c
             --topology-defn-file=acking.defn --state-manager-connection=zookeeper:2181
             --state-manager-root=/heron --state-manager-config-file=./heron-conf/statemgr.yaml
             --tmanager-binary=./heron-core/bin/heron-tmanager --stmgr-binary=./heron-core/bin/heron-stmgr
             --metrics-manager-classpath=./heron-core/lib/metricsmgr/* --instance-jvm-opts="LVhYOitIZWFwRHVtcE9uT3V0T2ZNZW1vcnlFcnJvcg(61)(61)"
             --classpath=heron-api-examples.jar --heron-internals-config-file=./heron-conf/heron_internals.yaml
             --override-config-file=./heron-conf/override.yaml --component-ram-map=exclaim1:1073741824,word:1073741824
             --component-jvm-opts="" --pkg-type=jar --topology-binary-file=heron-api-examples.jar
             --heron-java-home=$JAVA_HOME --heron-shell-binary=./heron-core/bin/heron-shell
             --cluster=kubernetes --role=saad --environment=default --instance-classpath=./heron-core/lib/instance/*
             --metrics-sinks-config-file=./heron-conf/metrics_sinks.yaml --scheduler-classpath=./heron-core/lib/scheduler/*:./heron-core/lib/packing/*:./heron-core/lib/statemgr/*
             --python-instance-binary=./heron-core/bin/heron-python-instance --cpp-instance-binary=./heron-core/bin/heron-cpp-instance
             --metricscache-manager-classpath=./heron-core/lib/metricscachemgr/* --metricscache-manager-mode=disabled
             --is-stateful=false --checkpoint-manager-classpath=./heron-core/lib/ckptmgr/*:./heron-core/lib/statefulstorage/*:
             --stateful-config-file=./heron-conf/stateful.yaml --checkpoint-manager-ram=1073741824
             --health-manager-mode=disabled --health-manager-classpath=./heron-core/lib/healthmgr/*
             --shard=$SHARD_ID --server-port=6001 --tmanager-controller-port=6002 --tmanager-stats-port=6003
             --shell-port=6004 --metrics-manager-port=6005 --scheduler-port=6006 --metricscache-manager-server-port=6007
             --metricscache-manager-stats-port=6008 --checkpoint-manager-port=6009'
           env:
           - name: HOST
             valueFrom:
               fieldRef:
                 apiVersion: v1
                 fieldPath: status.podIP
           - name: POD_NAME
             valueFrom:
               fieldRef:
                 apiVersion: v1
                 fieldPath: metadata.name
           - name: var_one
             value: variable one
           - name: var_three
             value: variable three
           - name: var_two
             value: variable two
           image: apache/heron:testbuild
           imagePullPolicy: IfNotPresent
           name: manager
           ports:
           - containerPort: 5555
             name: tcp-port-kept
             protocol: TCP
           - containerPort: 5556
             name: udp-port-kept
             protocol: UDP
           - containerPort: 6001
             name: server
             protocol: TCP
           - containerPort: 6002
             name: tmanager-ctl
             protocol: TCP
           - containerPort: 6003
             name: tmanager-stats
             protocol: TCP
           - containerPort: 6004
             name: shell-port
             protocol: TCP
           - containerPort: 6005
             name: metrics-mgr
             protocol: TCP
           - containerPort: 6006
             name: scheduler
             protocol: TCP
           - containerPort: 6007
             name: metrics-cache-m
             protocol: TCP
           - containerPort: 6008
             name: metrics-cache-s
             protocol: TCP
           - containerPort: 6009
             name: ckptmgr
             protocol: TCP
           resources:
             limits:
               cpu: "2"
               memory: 3Gi
             requests:
               cpu: "1"
               memory: 2Gi
           securityContext:
             allowPrivilegeEscalation: false
           terminationMessagePath: /dev/termination-log
           terminationMessagePolicy: File
           volumeMounts:
           - mountPath: path/to/mount/dynamic/volume
             name: dynamicvolume
             subPath: sub/path/to/mount/dynamic/volume
           - mountPath: /shared_volume
             name: shared-volume
           - mountPath: path/to/mount/shared/volume
             name: sharedvolume
             subPath: sub/path/to/mount/shared/volume
           - mountPath: path/to/mount/static/volume
             name: staticvolume
             subPath: sub/path/to/mount/static/volume
         - image: alpine
           imagePullPolicy: Always
           name: sidecar-container
           resources: {}
           terminationMessagePath: /dev/termination-log
           terminationMessagePolicy: File
           volumeMounts:
           - mountPath: /shared_volume
             name: shared-volume
         dnsPolicy: ClusterFirst
         restartPolicy: Always
         schedulerName: default-scheduler
         securityContext: {}
         terminationGracePeriodSeconds: 0
         tolerations:
         - effect: NoExecute
           key: node.kubernetes.io/not-ready
           operator: Exists
           tolerationSeconds: 10
         - effect: NoExecute
           key: node.kubernetes.io/unreachable
           operator: Exists
           tolerationSeconds: 10
         volumes:
         - emptyDir: {}
           name: shared-volume
         - name: sharedvolume
           persistentVolumeClaim:
             claimName: requested-claim-by-user
     updateStrategy:
       rollingUpdate:
         partition: 0
       type: RollingUpdate
     volumeClaimTemplates:
     - apiVersion: v1
       kind: PersistentVolumeClaim
       metadata:
         creationTimestamp: null
         labels:
           onDemand: "true"
           topology: acking
         name: dynamicvolume
       spec:
         accessModes:
         - ReadWriteOnce
         - ReadOnlyMany
         resources:
           requests:
             storage: 256Gi
         volumeMode: Block
       status:
         phase: Pending
     - apiVersion: v1
       kind: PersistentVolumeClaim
       metadata:
         creationTimestamp: null
         labels:
           onDemand: "true"
           topology: acking
         name: staticvolume
       spec:
         accessModes:
         - ReadWriteOnce
         - ReadOnlyMany
         resources:
           requests:
             storage: 512Gi
         storageClassName: storage-class-name
         volumeMode: Block
       status:
         phase: Pending
   status:
     collisionCount: 0
     currentReplicas: 1
     currentRevision: acking-manager-7464c5697
     observedGeneration: 1
     replicas: 1
     updateRevision: acking-manager-7464c5697
     updatedReplicas: 1
   ```
   
   </details>
   
   <details><summary>Executor StatefulSet</summary>
   
   ```yaml
   apiVersion: apps/v1
   kind: StatefulSet
   metadata:
     creationTimestamp: "2021-11-23T23:56:06Z"
     generation: 1
     labels:
       app: heron
       topology: acking
     name: acking-executors
     namespace: default
     resourceVersion: "1704"
     uid: 73c3dfcf-2810-4060-8963-138a41d0d4c0
   spec:
     podManagementPolicy: Parallel
     replicas: 2
     revisionHistoryLimit: 10
     selector:
       matchLabels:
         app: heron
         topology: acking
     serviceName: acking
     template:
       metadata:
         annotations:
           prometheus.io/port: "8080"
           prometheus.io/scrape: "true"
         creationTimestamp: null
         labels:
           app: heron
           topology: acking
       spec:
         containers:
         - command:
           - sh
           - -c
           - './heron-core/bin/heron-downloader-config kubernetes && ./heron-core/bin/heron-downloader
             distributedlog://zookeeper:2181/heronbkdl/acking-saad-tag-0-6322466307195919246.tar.gz
             . && SHARD_ID=$((${POD_NAME##*-} + 1)) && echo shardId=${SHARD_ID} && ./heron-core/bin/heron-executor
             --topology-name=acking --topology-id=acking242db601-6bb5-4703-b2f3-f38f0a3f8a0c
             --topology-defn-file=acking.defn --state-manager-connection=zookeeper:2181
             --state-manager-root=/heron --state-manager-config-file=./heron-conf/statemgr.yaml
             --tmanager-binary=./heron-core/bin/heron-tmanager --stmgr-binary=./heron-core/bin/heron-stmgr
             --metrics-manager-classpath=./heron-core/lib/metricsmgr/* --instance-jvm-opts="LVhYOitIZWFwRHVtcE9uT3V0T2ZNZW1vcnlFcnJvcg(61)(61)"
             --classpath=heron-api-examples.jar --heron-internals-config-file=./heron-conf/heron_internals.yaml
             --override-config-file=./heron-conf/override.yaml --component-ram-map=exclaim1:1073741824,word:1073741824
             --component-jvm-opts="" --pkg-type=jar --topology-binary-file=heron-api-examples.jar
             --heron-java-home=$JAVA_HOME --heron-shell-binary=./heron-core/bin/heron-shell
             --cluster=kubernetes --role=saad --environment=default --instance-classpath=./heron-core/lib/instance/*
             --metrics-sinks-config-file=./heron-conf/metrics_sinks.yaml --scheduler-classpath=./heron-core/lib/scheduler/*:./heron-core/lib/packing/*:./heron-core/lib/statemgr/*
             --python-instance-binary=./heron-core/bin/heron-python-instance --cpp-instance-binary=./heron-core/bin/heron-cpp-instance
             --metricscache-manager-classpath=./heron-core/lib/metricscachemgr/* --metricscache-manager-mode=disabled
             --is-stateful=false --checkpoint-manager-classpath=./heron-core/lib/ckptmgr/*:./heron-core/lib/statefulstorage/*:
             --stateful-config-file=./heron-conf/stateful.yaml --checkpoint-manager-ram=1073741824
             --health-manager-mode=disabled --health-manager-classpath=./heron-core/lib/healthmgr/*
             --shard=$SHARD_ID --server-port=6001 --tmanager-controller-port=6002 --tmanager-stats-port=6003
             --shell-port=6004 --metrics-manager-port=6005 --scheduler-port=6006 --metricscache-manager-server-port=6007
             --metricscache-manager-stats-port=6008 --checkpoint-manager-port=6009'
           env:
           - name: HOST
             valueFrom:
               fieldRef:
                 apiVersion: v1
                 fieldPath: status.podIP
           - name: POD_NAME
             valueFrom:
               fieldRef:
                 apiVersion: v1
                 fieldPath: metadata.name
           - name: var_one
             value: variable one
           - name: var_three
             value: variable three
           - name: var_two
             value: variable two
           image: apache/heron:testbuild
           imagePullPolicy: IfNotPresent
           name: executor
           ports:
           - containerPort: 5555
             name: tcp-port-kept
             protocol: TCP
           - containerPort: 5556
             name: udp-port-kept
             protocol: UDP
           - containerPort: 6001
             name: server
             protocol: TCP
           - containerPort: 6002
             name: tmanager-ctl
             protocol: TCP
           - containerPort: 6003
             name: tmanager-stats
             protocol: TCP
           - containerPort: 6004
             name: shell-port
             protocol: TCP
           - containerPort: 6005
             name: metrics-mgr
             protocol: TCP
           - containerPort: 6006
             name: scheduler
             protocol: TCP
           - containerPort: 6007
             name: metrics-cache-m
             protocol: TCP
           - containerPort: 6008
             name: metrics-cache-s
             protocol: TCP
           - containerPort: 6009
             name: ckptmgr
             protocol: TCP
           resources:
             limits:
               cpu: "3"
               memory: 4Gi
             requests:
               cpu: "3"
               memory: 4Gi
           securityContext:
             allowPrivilegeEscalation: false
           terminationMessagePath: /dev/termination-log
           terminationMessagePolicy: File
           volumeMounts:
           - mountPath: path/to/mount/dynamic/volume
             name: dynamicvolume
             subPath: sub/path/to/mount/dynamic/volume
           - mountPath: /shared_volume
             name: shared-volume
           - mountPath: path/to/mount/shared/volume
             name: sharedvolume
             subPath: sub/path/to/mount/shared/volume
           - mountPath: path/to/mount/static/volume
             name: staticvolume
             subPath: sub/path/to/mount/static/volume
         - image: alpine
           imagePullPolicy: Always
           name: sidecar-container
           resources: {}
           terminationMessagePath: /dev/termination-log
           terminationMessagePolicy: File
           volumeMounts:
           - mountPath: /shared_volume
             name: shared-volume
         dnsPolicy: ClusterFirst
         restartPolicy: Always
         schedulerName: default-scheduler
         securityContext: {}
         terminationGracePeriodSeconds: 0
         tolerations:
         - effect: NoExecute
           key: node.kubernetes.io/not-ready
           operator: Exists
           tolerationSeconds: 10
         - effect: NoExecute
           key: node.kubernetes.io/unreachable
           operator: Exists
           tolerationSeconds: 10
         volumes:
         - emptyDir: {}
           name: shared-volume
         - name: sharedvolume
           persistentVolumeClaim:
             claimName: requested-claim-by-user
     updateStrategy:
       rollingUpdate:
         partition: 0
       type: RollingUpdate
     volumeClaimTemplates:
     - apiVersion: v1
       kind: PersistentVolumeClaim
       metadata:
         creationTimestamp: null
         labels:
           onDemand: "true"
           topology: acking
         name: dynamicvolume
       spec:
         accessModes:
         - ReadWriteOnce
         - ReadOnlyMany
         resources:
           requests:
             storage: 256Gi
         volumeMode: Block
       status:
         phase: Pending
     - apiVersion: v1
       kind: PersistentVolumeClaim
       metadata:
         creationTimestamp: null
         labels:
           onDemand: "true"
           topology: acking
         name: staticvolume
       spec:
         accessModes:
         - ReadWriteOnce
         - ReadOnlyMany
         resources:
           requests:
             storage: 512Gi
         storageClassName: storage-class-name
         volumeMode: Block
       status:
         phase: Pending
   status:
     collisionCount: 0
     currentReplicas: 2
     currentRevision: acking-executors-6467c98557
     observedGeneration: 1
     replicas: 2
     updateRevision: acking-executors-6467c98557
     updatedReplicas: 2
   ```
   
   </details>
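   The two `StatefulSet`s differ in how they derive `SHARD_ID` from the pod name. Kubernetes names `StatefulSet` pods `<statefulset-name>-<ordinal>`, so `${POD_NAME##*-}` strips everything up to the last `-` and leaves the ordinal. The Manager uses that ordinal directly, while the Executors add one, so the single Manager pod gets shard `0` and the Executor pods appear to start at shard `1`. The sketch below reproduces both expressions in isolation; the function names are hypothetical.

   ```bash
   # Hypothetical wrappers around the SHARD_ID expressions from the two
   # StatefulSet commands above.
   manager_shard_id() {
     local POD_NAME=$1
     echo "${POD_NAME##*-}"              # ordinal as-is
   }

   executor_shard_id() {
     local POD_NAME=$1
     echo "$(( ${POD_NAME##*-} + 1 ))"   # ordinal offset by one
   }

   manager_shard_id acking-manager-0     # 0
   executor_shard_id acking-executors-0  # 1
   executor_shard_id acking-executors-1  # 2
   ```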


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
