surahman opened a new pull request #3741:
URL: https://github.com/apache/incubator-heron/pull/3741
***Feature #3724: Management/Driver pod should not use the same resources as
the other topology pods. Should create separate deployment/service.***
This PR builds on #3725 so that Volume Claim configurations can be tested in
the `Manager`. Once #3725 is merged into `master`, I will hard-reset the feature
branch for this PR onto `master` (this may temporarily close the PR). I will
then merge the `dev` branch with the feature branch again and resolve any merge
conflicts that arise.
The current features are listed below, and I am soliciting input on all of
them:
* The `StatefulSet`s are named `[topology-name]-manager` and
`[topology-name]-executors`.
* A single headless `Service` is used.
* The `Manager` `StatefulSet` is a duplicate of the `Executor` `StatefulSet`.
* Both `StatefulSet`s, the `Service`, and all `Volume Claims` which are
generated for the topology are removed on termination.
* `restart` will restart the `Manager` and `Executor`s for a topology.
* `addContainers` will only add `Executor` containers.
* `removeContainers` will only remove `Executor` containers.
* `patchStatefulSetReplicas` will only patch `Executor` containers.
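The naming and scoping rules in the list above can be summarised in a small sketch. This is illustrative only and is not the scheduler's code: the helper names `statefulset_names` and `targets_for` and the `OPERATION_TARGETS` table are hypothetical, and only the naming pattern and the operation scopes come from the list.

```python
from typing import Dict, List


def statefulset_names(topology_name: str) -> Dict[str, str]:
    """Return the two StatefulSet names for a topology (hypothetical helper)."""
    return {
        "manager": f"{topology_name}-manager",
        "executors": f"{topology_name}-executors",
    }


# Which StatefulSets each operation touches, as described in the list above.
OPERATION_TARGETS: Dict[str, List[str]] = {
    "restart": ["manager", "executors"],
    "addContainers": ["executors"],
    "removeContainers": ["executors"],
    "patchStatefulSetReplicas": ["executors"],
}


def targets_for(operation: str, topology_name: str) -> List[str]:
    """Resolve an operation to the StatefulSet names it would affect."""
    names = statefulset_names(topology_name)
    return [names[role] for role in OPERATION_TARGETS[operation]]
```

For the `acking` topology used in the example, `targets_for("addContainers", "acking")` resolves to the `acking-executors` `StatefulSet` only.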
### Usage
The command pattern is as follows:
`heron.kubernetes.manager.[limits | requests].[OPTION]=[VALUE]`
The currently supported CLI `options` are:
* `cpu`
* `memory`
All associated `value`s must be natural numbers. In the example below, a
`memory` value of `3` appears as `3Gi` in the generated `StatefulSet`.
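As a rough illustration of the command pattern, the sketch below validates `heron.kubernetes.manager.*` properties and turns them into a Kubernetes-style `resources` mapping. It is a sketch under assumptions rather than the PR's implementation: the function name `manager_resources` is hypothetical, the `Gi` suffix for memory is inferred from the generated manifest in this PR, and whether zero is accepted as a value is not stated in the description.

```python
import re
from typing import Dict

# Matches heron.kubernetes.manager.[limits | requests].[cpu | memory]
MANAGER_KEY = re.compile(
    r"^heron\.kubernetes\.manager\.(limits|requests)\.(cpu|memory)$"
)
MANAGER_PREFIX = "heron.kubernetes.manager."


def manager_resources(props: Dict[str, str]) -> Dict[str, Dict[str, str]]:
    """Build a resources mapping from Manager config properties (hypothetical)."""
    resources: Dict[str, Dict[str, str]] = {}
    for key, value in props.items():
        if not key.startswith(MANAGER_PREFIX):
            continue  # unrelated property, ignore it
        match = MANAGER_KEY.match(key)
        if match is None:
            raise ValueError(f"unsupported Manager option: {key}")
        if re.fullmatch(r"[0-9]+", value) is None:
            raise ValueError(f"{key} must be a natural number, got {value!r}")
        section, option = match.groups()
        # Assumption: memory values are interpreted as Gi, as in the manifest.
        rendered = f"{value}Gi" if option == "memory" else value
        resources.setdefault(section, {})[option] = rendered
    return resources
```

With the four `--config-property` values from the example below, this sketch produces limits of `cpu: "2"`, `memory: 3Gi` and requests of `cpu: "1"`, `memory: 2Gi`, which matches the Manager manifest.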
#### Example:
```bash
~/bin/heron submit kubernetes ~/.heron/examples/heron-api-examples.jar \
org.apache.heron.examples.api.AckingTopology acking \
--verbose \
--config-property heron.kubernetes.pod.template.configmap.name=pod-templ-cf-map.pod-template.yaml \
--config-property heron.kubernetes.manager.limits.cpu=2 \
--config-property heron.kubernetes.manager.limits.memory=3 \
--config-property heron.kubernetes.manager.requests.cpu=1 \
--config-property heron.kubernetes.manager.requests.memory=2 \
--config-property heron.kubernetes.volumes.persistentVolumeClaim.dynamicvolume.claimName=OnDemand \
--config-property heron.kubernetes.volumes.persistentVolumeClaim.dynamicvolume.accessModes=ReadWriteOnce,ReadOnlyMany \
--config-property heron.kubernetes.volumes.persistentVolumeClaim.dynamicvolume.sizeLimit=256Gi \
--config-property heron.kubernetes.volumes.persistentVolumeClaim.dynamicvolume.volumeMode=Block \
--config-property heron.kubernetes.volumes.persistentVolumeClaim.dynamicvolume.path=path/to/mount/dynamic/volume \
--config-property heron.kubernetes.volumes.persistentVolumeClaim.dynamicvolume.subPath=sub/path/to/mount/dynamic/volume \
--config-property heron.kubernetes.volumes.persistentVolumeClaim.staticvolume.claimName=OnDemand \
--config-property heron.kubernetes.volumes.persistentVolumeClaim.staticvolume.storageClassName=storage-class-name \
--config-property heron.kubernetes.volumes.persistentVolumeClaim.staticvolume.accessModes=ReadWriteOnce,ReadOnlyMany \
--config-property heron.kubernetes.volumes.persistentVolumeClaim.staticvolume.sizeLimit=512Gi \
--config-property heron.kubernetes.volumes.persistentVolumeClaim.staticvolume.volumeMode=Block \
--config-property heron.kubernetes.volumes.persistentVolumeClaim.staticvolume.path=path/to/mount/static/volume \
--config-property heron.kubernetes.volumes.persistentVolumeClaim.staticvolume.subPath=sub/path/to/mount/static/volume \
--config-property heron.kubernetes.volumes.persistentVolumeClaim.sharedvolume.claimName=requested-claim-by-user \
--config-property heron.kubernetes.volumes.persistentVolumeClaim.sharedvolume.path=path/to/mount/shared/volume \
--config-property heron.kubernetes.volumes.persistentVolumeClaim.sharedvolume.subPath=sub/path/to/mount/shared/volume
```
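The Volume Claim properties in the command above follow a `heron.kubernetes.volumes.persistentVolumeClaim.[VOLUME NAME].[OPTION]=[VALUE]` shape. The sketch below groups them by volume name and separates claims marked `OnDemand` from those naming an existing claim. It is an illustration under assumptions, not the PR's code: the helper names are hypothetical, and the reading that `claimName=OnDemand` leads to a generated claim template is inferred from the manifests in this PR.

```python
from typing import Dict, List, Tuple

PVC_PREFIX = "heron.kubernetes.volumes.persistentVolumeClaim."


def group_volume_options(props: Dict[str, str]) -> Dict[str, Dict[str, str]]:
    """Group Volume Claim properties by volume name (hypothetical helper)."""
    volumes: Dict[str, Dict[str, str]] = {}
    for key, value in props.items():
        if not key.startswith(PVC_PREFIX):
            continue  # not a Volume Claim property
        name, _, option = key[len(PVC_PREFIX):].partition(".")
        if not name or not option:
            raise ValueError(f"malformed Volume Claim key: {key}")
        volumes.setdefault(name, {})[option] = value
    return volumes


def split_by_claim(
    volumes: Dict[str, Dict[str, str]]
) -> Tuple[List[str], List[str]]:
    """Return (on-demand volume names, volume names bound to existing claims)."""
    on_demand = sorted(
        name for name, opts in volumes.items()
        if opts.get("claimName") == "OnDemand"
    )
    existing = sorted(name for name in volumes if name not in on_demand)
    return on_demand, existing
```

Applied to the command above, `dynamicvolume` and `staticvolume` fall into the on-demand group and `sharedvolume` into the existing-claim group, which lines up with the `volumeClaimTemplates` and `volumes` entries in the generated `StatefulSet`s.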
<details><summary>Manager StatefulSet</summary>
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  creationTimestamp: "2021-11-23T23:56:06Z"
  generation: 1
  labels:
    app: heron
    topology: acking
  name: acking-manager
  namespace: default
  resourceVersion: "1706"
  uid: 2117b2e9-248e-4d2c-a4cc-7ff1be45375f
spec:
  podManagementPolicy: Parallel
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: heron
      topology: acking
  serviceName: acking
  template:
    metadata:
      annotations:
        prometheus.io/port: "8080"
        prometheus.io/scrape: "true"
      creationTimestamp: null
      labels:
        app: heron
        topology: acking
    spec:
      containers:
      - command:
        - sh
        - -c
        - './heron-core/bin/heron-downloader-config kubernetes &&
          ./heron-core/bin/heron-downloader
          distributedlog://zookeeper:2181/heronbkdl/acking-saad-tag-0-6322466307195919246.tar.gz
          . && SHARD_ID=${POD_NAME##*-} && echo shardId=${SHARD_ID} &&
          ./heron-core/bin/heron-executor
          --topology-name=acking
          --topology-id=acking242db601-6bb5-4703-b2f3-f38f0a3f8a0c
          --topology-defn-file=acking.defn
          --state-manager-connection=zookeeper:2181
          --state-manager-root=/heron
          --state-manager-config-file=./heron-conf/statemgr.yaml
          --tmanager-binary=./heron-core/bin/heron-tmanager
          --stmgr-binary=./heron-core/bin/heron-stmgr
          --metrics-manager-classpath=./heron-core/lib/metricsmgr/*
          --instance-jvm-opts="LVhYOitIZWFwRHVtcE9uT3V0T2ZNZW1vcnlFcnJvcg(61)(61)"
          --classpath=heron-api-examples.jar
          --heron-internals-config-file=./heron-conf/heron_internals.yaml
          --override-config-file=./heron-conf/override.yaml
          --component-ram-map=exclaim1:1073741824,word:1073741824
          --component-jvm-opts="" --pkg-type=jar
          --topology-binary-file=heron-api-examples.jar
          --heron-java-home=$JAVA_HOME
          --heron-shell-binary=./heron-core/bin/heron-shell
          --cluster=kubernetes --role=saad --environment=default
          --instance-classpath=./heron-core/lib/instance/*
          --metrics-sinks-config-file=./heron-conf/metrics_sinks.yaml
          --scheduler-classpath=./heron-core/lib/scheduler/*:./heron-core/lib/packing/*:./heron-core/lib/statemgr/*
          --python-instance-binary=./heron-core/bin/heron-python-instance
          --cpp-instance-binary=./heron-core/bin/heron-cpp-instance
          --metricscache-manager-classpath=./heron-core/lib/metricscachemgr/*
          --metricscache-manager-mode=disabled
          --is-stateful=false
          --checkpoint-manager-classpath=./heron-core/lib/ckptmgr/*:./heron-core/lib/statefulstorage/*:
          --stateful-config-file=./heron-conf/stateful.yaml
          --checkpoint-manager-ram=1073741824
          --health-manager-mode=disabled
          --health-manager-classpath=./heron-core/lib/healthmgr/*
          --shard=$SHARD_ID --server-port=6001
          --tmanager-controller-port=6002 --tmanager-stats-port=6003
          --shell-port=6004 --metrics-manager-port=6005
          --scheduler-port=6006 --metricscache-manager-server-port=6007
          --metricscache-manager-stats-port=6008
          --checkpoint-manager-port=6009'
        env:
        - name: HOST
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.podIP
        - name: POD_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
        - name: var_one
          value: variable one
        - name: var_three
          value: variable three
        - name: var_two
          value: variable two
        image: apache/heron:testbuild
        imagePullPolicy: IfNotPresent
        name: manager
        ports:
        - containerPort: 5555
          name: tcp-port-kept
          protocol: TCP
        - containerPort: 5556
          name: udp-port-kept
          protocol: UDP
        - containerPort: 6001
          name: server
          protocol: TCP
        - containerPort: 6002
          name: tmanager-ctl
          protocol: TCP
        - containerPort: 6003
          name: tmanager-stats
          protocol: TCP
        - containerPort: 6004
          name: shell-port
          protocol: TCP
        - containerPort: 6005
          name: metrics-mgr
          protocol: TCP
        - containerPort: 6006
          name: scheduler
          protocol: TCP
        - containerPort: 6007
          name: metrics-cache-m
          protocol: TCP
        - containerPort: 6008
          name: metrics-cache-s
          protocol: TCP
        - containerPort: 6009
          name: ckptmgr
          protocol: TCP
        resources:
          limits:
            cpu: "2"
            memory: 3Gi
          requests:
            cpu: "1"
            memory: 2Gi
        securityContext:
          allowPrivilegeEscalation: false
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: path/to/mount/dynamic/volume
          name: dynamicvolume
          subPath: sub/path/to/mount/dynamic/volume
        - mountPath: /shared_volume
          name: shared-volume
        - mountPath: path/to/mount/shared/volume
          name: sharedvolume
          subPath: sub/path/to/mount/shared/volume
        - mountPath: path/to/mount/static/volume
          name: staticvolume
          subPath: sub/path/to/mount/static/volume
      - image: alpine
        imagePullPolicy: Always
        name: sidecar-container
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /shared_volume
          name: shared-volume
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 0
      tolerations:
      - effect: NoExecute
        key: node.kubernetes.io/not-ready
        operator: Exists
        tolerationSeconds: 10
      - effect: NoExecute
        key: node.kubernetes.io/unreachable
        operator: Exists
        tolerationSeconds: 10
      volumes:
      - emptyDir: {}
        name: shared-volume
      - name: sharedvolume
        persistentVolumeClaim:
          claimName: requested-claim-by-user
  updateStrategy:
    rollingUpdate:
      partition: 0
    type: RollingUpdate
  volumeClaimTemplates:
  - apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      creationTimestamp: null
      labels:
        onDemand: "true"
        topology: acking
      name: dynamicvolume
    spec:
      accessModes:
      - ReadWriteOnce
      - ReadOnlyMany
      resources:
        requests:
          storage: 256Gi
      volumeMode: Block
    status:
      phase: Pending
  - apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      creationTimestamp: null
      labels:
        onDemand: "true"
        topology: acking
      name: staticvolume
    spec:
      accessModes:
      - ReadWriteOnce
      - ReadOnlyMany
      resources:
        requests:
          storage: 512Gi
      storageClassName: storage-class-name
      volumeMode: Block
    status:
      phase: Pending
status:
  collisionCount: 0
  currentReplicas: 1
  currentRevision: acking-manager-7464c5697
  observedGeneration: 1
  replicas: 1
  updateRevision: acking-manager-7464c5697
  updatedReplicas: 1
```
</details>
<details><summary>Executor StatefulSet</summary>
```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  creationTimestamp: "2021-11-23T23:56:06Z"
  generation: 1
  labels:
    app: heron
    topology: acking
  name: acking-executors
  namespace: default
  resourceVersion: "1704"
  uid: 73c3dfcf-2810-4060-8963-138a41d0d4c0
spec:
  podManagementPolicy: Parallel
  replicas: 2
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: heron
      topology: acking
  serviceName: acking
  template:
    metadata:
      annotations:
        prometheus.io/port: "8080"
        prometheus.io/scrape: "true"
      creationTimestamp: null
      labels:
        app: heron
        topology: acking
    spec:
      containers:
      - command:
        - sh
        - -c
        - './heron-core/bin/heron-downloader-config kubernetes &&
          ./heron-core/bin/heron-downloader
          distributedlog://zookeeper:2181/heronbkdl/acking-saad-tag-0-6322466307195919246.tar.gz
          . && SHARD_ID=$((${POD_NAME##*-} + 1)) && echo shardId=${SHARD_ID}
          && ./heron-core/bin/heron-executor
          --topology-name=acking
          --topology-id=acking242db601-6bb5-4703-b2f3-f38f0a3f8a0c
          --topology-defn-file=acking.defn
          --state-manager-connection=zookeeper:2181
          --state-manager-root=/heron
          --state-manager-config-file=./heron-conf/statemgr.yaml
          --tmanager-binary=./heron-core/bin/heron-tmanager
          --stmgr-binary=./heron-core/bin/heron-stmgr
          --metrics-manager-classpath=./heron-core/lib/metricsmgr/*
          --instance-jvm-opts="LVhYOitIZWFwRHVtcE9uT3V0T2ZNZW1vcnlFcnJvcg(61)(61)"
          --classpath=heron-api-examples.jar
          --heron-internals-config-file=./heron-conf/heron_internals.yaml
          --override-config-file=./heron-conf/override.yaml
          --component-ram-map=exclaim1:1073741824,word:1073741824
          --component-jvm-opts="" --pkg-type=jar
          --topology-binary-file=heron-api-examples.jar
          --heron-java-home=$JAVA_HOME
          --heron-shell-binary=./heron-core/bin/heron-shell
          --cluster=kubernetes --role=saad --environment=default
          --instance-classpath=./heron-core/lib/instance/*
          --metrics-sinks-config-file=./heron-conf/metrics_sinks.yaml
          --scheduler-classpath=./heron-core/lib/scheduler/*:./heron-core/lib/packing/*:./heron-core/lib/statemgr/*
          --python-instance-binary=./heron-core/bin/heron-python-instance
          --cpp-instance-binary=./heron-core/bin/heron-cpp-instance
          --metricscache-manager-classpath=./heron-core/lib/metricscachemgr/*
          --metricscache-manager-mode=disabled
          --is-stateful=false
          --checkpoint-manager-classpath=./heron-core/lib/ckptmgr/*:./heron-core/lib/statefulstorage/*:
          --stateful-config-file=./heron-conf/stateful.yaml
          --checkpoint-manager-ram=1073741824
          --health-manager-mode=disabled
          --health-manager-classpath=./heron-core/lib/healthmgr/*
          --shard=$SHARD_ID --server-port=6001
          --tmanager-controller-port=6002 --tmanager-stats-port=6003
          --shell-port=6004 --metrics-manager-port=6005
          --scheduler-port=6006 --metricscache-manager-server-port=6007
          --metricscache-manager-stats-port=6008
          --checkpoint-manager-port=6009'
        env:
        - name: HOST
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.podIP
        - name: POD_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
        - name: var_one
          value: variable one
        - name: var_three
          value: variable three
        - name: var_two
          value: variable two
        image: apache/heron:testbuild
        imagePullPolicy: IfNotPresent
        name: executor
        ports:
        - containerPort: 5555
          name: tcp-port-kept
          protocol: TCP
        - containerPort: 5556
          name: udp-port-kept
          protocol: UDP
        - containerPort: 6001
          name: server
          protocol: TCP
        - containerPort: 6002
          name: tmanager-ctl
          protocol: TCP
        - containerPort: 6003
          name: tmanager-stats
          protocol: TCP
        - containerPort: 6004
          name: shell-port
          protocol: TCP
        - containerPort: 6005
          name: metrics-mgr
          protocol: TCP
        - containerPort: 6006
          name: scheduler
          protocol: TCP
        - containerPort: 6007
          name: metrics-cache-m
          protocol: TCP
        - containerPort: 6008
          name: metrics-cache-s
          protocol: TCP
        - containerPort: 6009
          name: ckptmgr
          protocol: TCP
        resources:
          limits:
            cpu: "3"
            memory: 4Gi
          requests:
            cpu: "3"
            memory: 4Gi
        securityContext:
          allowPrivilegeEscalation: false
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: path/to/mount/dynamic/volume
          name: dynamicvolume
          subPath: sub/path/to/mount/dynamic/volume
        - mountPath: /shared_volume
          name: shared-volume
        - mountPath: path/to/mount/shared/volume
          name: sharedvolume
          subPath: sub/path/to/mount/shared/volume
        - mountPath: path/to/mount/static/volume
          name: staticvolume
          subPath: sub/path/to/mount/static/volume
      - image: alpine
        imagePullPolicy: Always
        name: sidecar-container
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /shared_volume
          name: shared-volume
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 0
      tolerations:
      - effect: NoExecute
        key: node.kubernetes.io/not-ready
        operator: Exists
        tolerationSeconds: 10
      - effect: NoExecute
        key: node.kubernetes.io/unreachable
        operator: Exists
        tolerationSeconds: 10
      volumes:
      - emptyDir: {}
        name: shared-volume
      - name: sharedvolume
        persistentVolumeClaim:
          claimName: requested-claim-by-user
  updateStrategy:
    rollingUpdate:
      partition: 0
    type: RollingUpdate
  volumeClaimTemplates:
  - apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      creationTimestamp: null
      labels:
        onDemand: "true"
        topology: acking
      name: dynamicvolume
    spec:
      accessModes:
      - ReadWriteOnce
      - ReadOnlyMany
      resources:
        requests:
          storage: 256Gi
      volumeMode: Block
    status:
      phase: Pending
  - apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      creationTimestamp: null
      labels:
        onDemand: "true"
        topology: acking
      name: staticvolume
    spec:
      accessModes:
      - ReadWriteOnce
      - ReadOnlyMany
      resources:
        requests:
          storage: 512Gi
      storageClassName: storage-class-name
      volumeMode: Block
    status:
      phase: Pending
status:
  collisionCount: 0
  currentReplicas: 2
  currentRevision: acking-executors-6467c98557
  observedGeneration: 1
  replicas: 2
  updateRevision: acking-executors-6467c98557
  updatedReplicas: 2
```
</details>
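One difference between the two manifests is how the shard ID is derived from the pod name: the `Manager` command uses `SHARD_ID=${POD_NAME##*-}`, while the `Executor` command uses `SHARD_ID=$((${POD_NAME##*-} + 1))`. The following sketch mirrors that arithmetic in Python for illustration. The function name `shard_id` is hypothetical, and the pod names in the usage note assume the usual `StatefulSet` naming of `<statefulset-name>-<ordinal>`.

```python
def shard_id(pod_name: str, is_manager: bool) -> int:
    """Mirror the shell expressions used in the two container commands.

    ${POD_NAME##*-} strips everything up to and including the last hyphen,
    leaving the StatefulSet ordinal. Executors add one to it, so shard 0 is
    left for the Manager pod.
    """
    ordinal = int(pod_name.rsplit("-", 1)[1])
    return ordinal if is_manager else ordinal + 1
```

Under that naming assumption, `acking-manager-0` maps to shard 0, and `acking-executors-0` and `acking-executors-1` map to shards 1 and 2.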