rakeshadr commented on code in PR #20:
URL: https://github.com/apache/ozone-helm-charts/pull/20#discussion_r2631983830
##########
charts/ozone/templates/datanode/datanode-statefulset.yaml:
##########
@@ -65,6 +65,10 @@ spec:
ports:
- name: ui
containerPort: {{ .Values.datanode.service.port }}
+ - name: ratis-ipc
+ containerPort: {{ .Values.datanode.service.ratisIpcPort }}
+ - name: ipc
+ containerPort: {{ .Values.datanode.service.ipcPort }}
livenessProbe:
Review Comment:
Presently, only a livenessProbe is configured, which is fine for non-HA, but
HA deployments can take noticeably longer to initialize. How about adding a
startupProbe and a readinessProbe for HA deployments to handle the longer
initialization? This could also be raised as a follow-up task. Below is a
sample for reference.
For OM:
```
startupProbe:
  exec:
    command:
      - sh
      - -c
      - |
        # Check if OM has joined the Ratis ring
        ozone admin om roles -id {{ .Values.clusterId }} 2>/dev/null | grep -q $(hostname) && exit 0 || exit 1
  initialDelaySeconds: 30
  periodSeconds: 10
  failureThreshold: 30  # 5 minutes for bootstrap
  timeoutSeconds: 5
readinessProbe:
  exec:
    command:
      - sh
      - -c
      - |
        # Check if OM is in LEADER or FOLLOWER state
        ozone admin om roles -id {{ .Values.clusterId }} 2>/dev/null | grep $(hostname) | grep -qE 'LEADER|FOLLOWER' && exit 0 || exit 1
  initialDelaySeconds: 10
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3
```
For SCM:
```
startupProbe:
  exec:
    command:
      - sh
      - -c
      - |
        # Check if SCM has joined the Ratis ring
        ozone admin scm roles 2>/dev/null | grep -q $(hostname -f) && exit 0 || exit 1
  initialDelaySeconds: 20
  periodSeconds: 10
  failureThreshold: 30
  timeoutSeconds: 5
readinessProbe:
  exec:
    command:
      - sh
      - -c
      - |
        # Check if SCM is in LEADER or FOLLOWER state and not in SAFE_MODE
        ozone admin scm roles 2>/dev/null | grep $(hostname -f) | grep -qE 'LEADER|FOLLOWER' && exit 0 || exit 1
  initialDelaySeconds: 10
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3
```
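For the datanode StatefulSet itself (the file this comment is on), even a
simple readiness check could help during rolling restarts. A rough sketch,
assuming the datanode web UI answers on the existing ui port; the probe
type, path, and thresholds are illustrative only:
```
readinessProbe:
  httpGet:
    path: /
    port: {{ .Values.datanode.service.port }}
  initialDelaySeconds: 10
  periodSeconds: 10
  timeoutSeconds: 5
  failureThreshold: 3
```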
##########
charts/ozone/templates/om/om-service-headless.yaml:
##########
@@ -28,6 +28,8 @@ spec:
ports:
- name: ui
port: {{ .Values.om.service.port }}
+ - name: ratis
+ port: {{ .Values.om.service.ratisPort }}
Review Comment:
The RPC port (9862) is missing from the headless service, even though it is
exposed in the container (line 92 of om-statefulset.yaml). That port is used
for client connections, isn't it? Have you run Ozone client file write/read
operations, and did they work fine without it? Something like the following
could be added to the headless service's ports list:
```
  - name: rpc
    port: {{ .Values.om.service.rpcPort }}
```
##########
charts/ozone/templates/scm/scm-statefulset.yaml:
##########
@@ -31,6 +31,7 @@ metadata:
app.kubernetes.io/component: scm
spec:
replicas: {{ .Values.scm.replicas }}
+ podManagementPolicy: Parallel
Review Comment:
https://github.com/apache/ozone-helm-charts/blob/c0fb376573ec6822b19c7e0814896f979f2969d7/charts/ozone/templates/scm/scm-statefulset.yaml#L43
The OM includes the dynamic environment config in its checksum annotation,
but the SCM doesn't. That means SCM pods won't be restarted when that config
changes, which seems inconsistent to me. Since ozone.configuration.env
generates dynamic config based on replica changes, both OM & SCM should
restart, right? A sketch of the equivalent SCM annotation is below.
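For reference, mirroring the annotation this PR adds in om-statefulset.yaml
onto the SCM pod template would look roughly like this (same expression,
just applied to the SCM StatefulSet):
```
  template:
    metadata:
      annotations:
        checksum/config: {{ include (print $.Template.BasePath "/ozone-configmap.yaml") . | cat (include "ozone.configuration.env" .) | sha256sum }}
```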
##########
charts/ozone/templates/om/om-bootstrap-configmap.yaml:
##########
@@ -0,0 +1,78 @@
+{{- if and .Values.om.persistence.enabled (gt (len (ternary (splitList "," (include "ozone.om.bootstrap.nodes" .)) (list) (ne "" (include "ozone.om.bootstrap.nodes" .)))) 0) }}
+apiVersion: v1
+kind: ConfigMap
+metadata:
+ name: {{ .Release.Name }}-om-bootstrap-script
+ labels:
+ {{- include "ozone.labels" . | nindent 4 }}
+ app.kubernetes.io/component: om
+data:
+ om-bootstrap.sh: |-
+ #!/bin/sh
+ set -e
+
+    HELM_MANAGER_PATH="{{ .Values.om.persistence.path }}{{ .Values.helm.persistence.path }}"
+ HELM_MANAGER_BOOTSTRAPPED_FILE="$HELM_MANAGER_PATH/bootstrapped"
+
+ # These are templated from Helm
+ OZONE_OM_ARGS_LIST="{{- range .Values.om.args }} {{ . }} {{- end }}"
+ OZONE_OM_BOOTSTRAP_NODES="{{ include "ozone.om.bootstrap.nodes" . }}"
+ OZONE_OM_CLUSTER_IDS="{{ include "ozone.om.cluster.ids" . }}"
+ OZONE_CLUSTER_ID="{{ .Values.clusterId }}"
+
+ if [ -z "$OZONE_OM_BOOTSTRAP_NODES" ]; then
+ echo "No bootstrap handling needed!"
+ exit 0
+ fi
+
+ joinArr() {
+ local IFS=","
+ echo "$*"
+ }
+
Review Comment:
Network issues or other transient glitches might cause failures here, so
adding retry logic would improve reliability.
Below is an example with a simple fixed-interval retry; if you can extend
this snippet to exponential backoff with minimal code changes, even better
(a sketch of that variant follows the example).
```
run_bootstrap() {
  local overwriteCmd="$1"
  local max_attempts=3
  local attempt=1
  while [ $attempt -le $max_attempts ]; do
    echo "Bootstrap attempt $attempt of $max_attempts"
    if ozone admin om --set "ozone.om.nodes.$OZONE_CLUSTER_ID=$overwriteCmd" --bootstrap; then
      echo "$HOSTNAME was successfully bootstrapped!"
      mkdir -p "$HELM_MANAGER_PATH"
      touch "$HELM_MANAGER_BOOTSTRAPPED_FILE"
      exit 0
    else
      echo "Bootstrap failed with exit code $?, attempt $attempt"
      attempt=$((attempt + 1))
      [ $attempt -le $max_attempts ] && sleep 10
    fi
  done
  echo "Bootstrap failed after $max_attempts attempts"
  exit 1
}
```
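For the exponential variant, a minimal sketch along the same lines (same
command and variables as above; only the sleep handling changes, doubling
the delay after each failed attempt):
```
run_bootstrap() {
  local overwriteCmd="$1"
  local max_attempts=5
  local attempt=1
  local delay=5  # initial backoff in seconds
  while [ $attempt -le $max_attempts ]; do
    echo "Bootstrap attempt $attempt of $max_attempts"
    if ozone admin om --set "ozone.om.nodes.$OZONE_CLUSTER_ID=$overwriteCmd" --bootstrap; then
      echo "$HOSTNAME was successfully bootstrapped!"
      mkdir -p "$HELM_MANAGER_PATH"
      touch "$HELM_MANAGER_BOOTSTRAPPED_FILE"
      exit 0
    fi
    echo "Bootstrap attempt $attempt failed"
    attempt=$((attempt + 1))
    # Exponential backoff: 5s, 10s, 20s, 40s between retries
    [ $attempt -le $max_attempts ] && sleep $delay && delay=$((delay * 2))
  done
  echo "Bootstrap failed after $max_attempts attempts"
  exit 1
}
```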
##########
charts/ozone/templates/om/om-statefulset.yaml:
##########
@@ -39,25 +40,45 @@ spec:
template:
metadata:
annotations:
-        checksum/config: {{ include (print $.Template.BasePath "/ozone-configmap.yaml") . | sha256sum }}
+        checksum/config: {{ include (print $.Template.BasePath "/ozone-configmap.yaml") . | cat (include "ozone.configuration.env" .) | sha256sum }}
labels:
{{- include "ozone.selectorLabels" . | nindent 8 }}
app.kubernetes.io/component: om
spec:
+ {{- if and .Values.om.persistence.enabled (gt (len $bnodes) 0) }}
Review Comment:
During pod termination, the OM/SCM stops immediately. That may interrupt
Ratis log replication and could lose uncommitted transactions, couldn't it?
Can we consider a _preStop_ hook or a configured _graceful termination
period_? If you agree, this could also be implemented in a follow-up task,
since it needs some additional testing effort.
Below is a sample snippet, only for quick reference.
```
spec:
  template:
    spec:
      terminationGracePeriodSeconds: 120  # Allow time for graceful shutdown
      containers:
        - name: om
          lifecycle:
            preStop:
              exec:
                command:
                  - sh
                  - -c
                  - |
                    # Transfer leadership if this is the leader
                    LEADER=$(ozone admin om roles -id {{ .Values.clusterId }} | grep LEADER | awk '{print $1}')
                    if [ "$LEADER" = "$(hostname)" ]; then
                      echo "This OM is leader, transferring leadership..."
                      ozone admin om transfer -id {{ .Values.clusterId }} -r || true
                      sleep 10
                    fi
                    # Graceful shutdown
                    kill -TERM 1
                    sleep 30
```
##########
charts/ozone/templates/_helpers.tpl:
##########
@@ -51,26 +51,166 @@ app.kubernetes.io/instance: {{ .Release.Name }}
{{- $pods | join "," }}
{{- end }}
-{{/* Common configuration environment variables */}}
-{{- define "ozone.configuration.env" -}}
+{{/* List of comma separated om ids */}}
+{{- define "ozone.om.cluster.ids" -}}
+ {{- $pods := list }}
+ {{- $replicas := .Values.om.replicas | int }}
+ {{- range $i := until $replicas }}
+ {{- $pods = append $pods (printf "%s-om-%d" $.Release.Name $i) }}
+ {{- end }}
+ {{- $pods | join "," }}
+{{- end }}
+
+{{/* List of comma separated scm ids */}}
+{{- define "ozone.scm.cluster.ids" -}}
+ {{- $pods := list }}
+ {{- $replicas := .Values.scm.replicas | int }}
+ {{- range $i := until $replicas }}
+ {{- $pods = append $pods (printf "%s-scm-%d" $.Release.Name $i) }}
+ {{- end }}
+ {{- $pods | join "," }}
+{{- end }}
+
+{{/* List of decommission om nodes */}}
+{{- define "ozone.om.decommissioned.nodes" -}}
+ {{- $nodes := list }}
+  {{- $statefulset := lookup "apps/v1" "StatefulSet" $.Release.Namespace (printf "%s-om" $.Release.Name) -}}
+ {{- if $statefulset }}
+ {{- $oldCount := $statefulset.spec.replicas | int -}}
+ {{- $newCount := .Values.om.replicas | int }}
+ {{- range $i := until $oldCount }}
+ {{- $minCount := max $newCount 1 -}}
+ {{- if ge $i $minCount }}
+ {{- $nodes = append $nodes (printf "%s-om-%d" $.Release.Name $i) }}
+ {{- end }}
+ {{- end }}
+ {{- end }}
+ {{- $nodes | join "," }}
+{{- end }}
+
+{{/* List of bootstrap om nodes */}}
+{{- define "ozone.om.bootstrap.nodes" -}}
+ {{- $nodes := list }}
+  {{- $statefulset := lookup "apps/v1" "StatefulSet" $.Release.Namespace (printf "%s-om" $.Release.Name) -}}
+ {{- if $statefulset }}
+ {{- $oldCount := $statefulset.spec.replicas | int -}}
+ {{- $newCount := .Values.om.replicas | int }}
+ {{- range $i := until $newCount }}
+ {{- if ge $i $oldCount }}
+ {{- $nodes = append $nodes (printf "%s-om-%d" $.Release.Name $i) }}
+ {{- end }}
+ {{- end }}
+ {{- end }}
+ {{- $nodes | join ","}}
+{{- end }}
+
+{{/* List of decommission scm nodes */}}
+{{- define "ozone.scm.decommissioned.nodes" -}}
+ {{- $nodes := list }}
+  {{- $statefulset := lookup "apps/v1" "StatefulSet" $.Release.Namespace (printf "%s-scm" $.Release.Name) -}}
+ {{- if $statefulset }}
+ {{- $oldCount := $statefulset.spec.replicas | int -}}
+ {{- $newCount := .Values.scm.replicas | int }}
+ {{- range $i := until $oldCount }}
+ {{- if ge $i $newCount }}
+        {{- $nodes = append $nodes (printf "%s-scm-%d" $.Release.Name $i) }}
+ {{- end }}
+ {{- end }}
+ {{- end }}
+ {{- $nodes | join "," -}}
+{{- end }}
+
+{{/* List of decommission data nodes */}}
+{{- define "ozone.data.decommissioned.hosts" -}}
+ {{- $hosts := list }}
+  {{- $statefulset := lookup "apps/v1" "StatefulSet" $.Release.Namespace (printf "%s-datanode" $.Release.Name) -}}
+ {{- if $statefulset }}
+ {{- $oldCount := $statefulset.spec.replicas | int -}}
+ {{- $newCount := .Values.datanode.replicas | int }}
+ {{- range $i := until $oldCount }}
+ {{- if ge $i $newCount }}
+        {{- $hosts = append $hosts (printf "%s-datanode-%d.%s-datanode-headless.%s.svc.cluster.local" $.Release.Name $i $.Release.Name $.Release.Namespace) }}
+ {{- end }}
+ {{- end }}
+ {{- end }}
+ {{- $hosts | join "," -}}
+{{- end }}
+
+{{- define "ozone.configuration.env.common" -}}
- name: OZONE-SITE.XML_hdds.datanode.dir
value: /data/storage
- name: OZONE-SITE.XML_ozone.scm.datanode.id.dir
value: /data/metadata
- name: OZONE-SITE.XML_ozone.metadata.dirs
value: /data/metadata
-- name: OZONE-SITE.XML_ozone.scm.block.client.address
- value: {{ include "ozone.scm.pods" . }}
-- name: OZONE-SITE.XML_ozone.scm.client.address
- value: {{ include "ozone.scm.pods" . }}
-- name: OZONE-SITE.XML_ozone.scm.names
- value: {{ include "ozone.scm.pods" . }}
-- name: OZONE-SITE.XML_ozone.om.address
- value: {{ include "ozone.om.pods" . }}
+- name: OZONE-SITE.XML_ozone.scm.ratis.enable
+ value: "true"
+- name: OZONE-SITE.XML_ozone.scm.service.ids
+ value: {{ .Values.clusterId }}
+- name: OZONE-SITE.XML_ozone.scm.nodes.{{ .Values.clusterId }}
+ value: {{ include "ozone.scm.cluster.ids" . }}
+ {{/*- name: OZONE-SITE.XML_ozone.scm.skip.bootstrap.validation*/}}
+ {{/* value: {{ quote .Values.scm.skipBootstrapValidation }}*/}}
+{{- range $i, $val := until ( .Values.scm.replicas | int ) }}
+- name: {{ printf "OZONE-SITE.XML_ozone.scm.address.%s.%s-scm-%d" $.Values.clusterId $.Release.Name $i }}
+  value: {{ printf "%s-scm-%d.%s-scm-headless.%s.svc.cluster.local" $.Release.Name $i $.Release.Name $.Release.Namespace }}
+{{- end }}
+- name: OZONE-SITE.XML_ozone.scm.primordial.node.id
+ value: {{ printf "%s-scm-0" $.Release.Name }}
+- name: OZONE-SITE.XML_ozone.om.ratis.enable
+ value: "true"
+- name: OZONE-SITE.XML_ozone.om.service.ids
+ value: {{ .Values.clusterId }}
- name: OZONE-SITE.XML_hdds.scm.safemode.min.datanode
value: "3"
- name: OZONE-SITE.XML_ozone.datanode.pipeline.limit
value: "1"
- name: OZONE-SITE.XML_dfs.datanode.use.datanode.hostname
value: "true"
{{- end }}
+
+{{/* Common configuration environment variables */}}
+{{- define "ozone.configuration.env" -}}
+{{- $bOmNodes := ternary (splitList "," (include "ozone.om.bootstrap.nodes" .)) (list) (ne "" (include "ozone.om.bootstrap.nodes" .)) }}
+{{- $dOmNodes := ternary (splitList "," (include "ozone.om.decommissioned.nodes" .)) (list) (ne "" (include "ozone.om.decommissioned.nodes" .)) }}
+{{- $activeOmNodes := ternary (splitList "," (include "ozone.om.cluster.ids" .)) (list) (ne "" (include "ozone.om.cluster.ids" .)) }}
+{{ include "ozone.configuration.env.common" . }}
+{{- if gt (len $dOmNodes) 0 }}
+{{- $decomIds := $dOmNodes | join "," }}
+- name: OZONE-SITE.XML_ozone.om.decommissioned.nodes.{{ .Values.clusterId }}
+ value: {{ $decomIds }}
+{{- else}}
+- name: OZONE-SITE.XML_ozone.om.decommissioned.nodes.{{ .Values.clusterId }}
+ value: ""
+{{- end }}
+- name: OZONE-SITE.XML_ozone.om.nodes.{{ .Values.clusterId }}
+ value: {{ $activeOmNodes | join "," }}
+{{- range $tempId := $activeOmNodes }}
+- name: {{ printf "OZONE-SITE.XML_ozone.om.address.%s.%s" $.Values.clusterId $tempId }}
+  value: {{ printf "%s.%s-om-headless.%s.svc.cluster.local" $tempId $.Release.Name $.Release.Namespace }}
+{{- end }}
+{{- range $tempId := $dOmNodes }}
+- name: {{ printf "OZONE-SITE.XML_ozone.om.address.%s.%s" $.Values.clusterId $tempId }}
+  value: {{ printf "%s-helm-manager-decommission-%s-svc.%s.svc.cluster.local" $.Release.Name $tempId $.Release.Namespace }}
+{{- end }}
+{{- end }}
+
+{{/* Common configuration environment variables for pre hook */}}
+{{- define "ozone.configuration.env.prehook" -}}
+{{- $bOmNodes := ternary (splitList "," (include "ozone.om.bootstrap.nodes" .)) (list) (ne "" (include "ozone.om.bootstrap.nodes" .)) }}
+{{- $dOmNodes := ternary (splitList "," (include "ozone.om.decommissioned.nodes" .)) (list) (ne "" (include "ozone.om.decommissioned.nodes" .)) }}
+{{- $activeOmNodes := ternary (splitList "," (include "ozone.om.cluster.ids" .)) (list) (ne "" (include "ozone.om.cluster.ids" .)) }}
+{{- $allOmNodes := concat $activeOmNodes $dOmNodes }}
+{{ include "ozone.configuration.env.common" . }}
+- name: OZONE-SITE.XML_ozone.om.decommissioned.nodes.{{ .Values.clusterId }}
+ value: ""
+{{- range $tempId := $allOmNodes }}
+- name: {{ printf "OZONE-SITE.XML_ozone.om.address.%s.%s" $.Values.clusterId $tempId }}
+  value: {{ printf "%s.%s-om-headless.%s.svc.cluster.local" $tempId $.Release.Name $.Release.Namespace }}
+{{- end }}
+{{ $allOmNodes = append $allOmNodes "om-leader-transfer"}}
+- name: OZONE-SITE.XML_ozone.om.nodes.{{ .Values.clusterId }}
+ value: {{ $allOmNodes | join "," }}
+- name: "OZONE-SITE.XML_ozone.om.address.{{ .Values.clusterId }}.om-leader-transfer"
+ value: localhost
+{{- end }}
Review Comment:
It's good to consider Ratis configurations that help handle network
partitions or slow networks. Probably you can raise a follow-up task to
identify the necessary Ratis settings and add them, for example:
```
1. Ratis timeouts for network resilience
2. Heartbeat settings
3. Leader election
```
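As a rough illustration only: the chart already passes ozone-site properties
through OZONE-SITE.XML_* env vars, so such tuning could be exposed the same
way. The property names and values below are my assumptions and should be
verified against ozone-default.xml before use:
```
- name: OZONE-SITE.XML_ozone.om.ratis.minimum.timeout
  value: 5s
- name: OZONE-SITE.XML_ozone.om.ratis.server.request.timeout
  value: 3s
- name: OZONE-SITE.XML_ozone.om.ratis.leader.election.minimum.timeout.duration
  value: 5s
```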
##########
charts/ozone/templates/helm/om-leader-transfer-job.yaml:
##########
@@ -0,0 +1,82 @@
+{{- if .Values.om.persistence.enabled }}
+{{- $dnodes := ternary (splitList "," (include "ozone.om.decommissioned.nodes" .)) (list) (ne "" (include "ozone.om.decommissioned.nodes" .)) }}
+{{- $env := concat .Values.env .Values.helm.env }}
+{{- $envFrom := concat .Values.envFrom .Values.helm.envFrom }}
+{{- $nodeSelector := or .Values.helm.nodeSelector .Values.nodeSelector }}
+{{- $affinity := or .Values.helm.affinity .Values.affinity }}
+{{- $tolerations := or .Values.helm.tolerations .Values.tolerations }}
+{{- $securityContext := or .Values.helm.securityContext .Values.securityContext }}
+{{- if (gt (len $dnodes) 0) }}
+---
+apiVersion: batch/v1
+kind: Job
+metadata:
+ name: {{ printf "%s-helm-manager-leader-transfer" $.Release.Name }}
+ labels:
+ {{- include "ozone.labels" $ | nindent 4 }}
+ app.kubernetes.io/component: helm-manager
+ annotations:
+ "helm.sh/hook": pre-upgrade
+ "helm.sh/hook-weight": "0"
+ "helm.sh/hook-delete-policy": hook-succeeded,hook-failed
+spec:
+ backoffLimit: {{ $.Values.helm.backoffLimit }}
+ template:
+ metadata:
+ labels:
+ {{- include "ozone.selectorLabels" $ | nindent 8 }}
+ app.kubernetes.io/component: helm-manager
+ spec:
+ containers:
+ - name: om-leader-transfer
+        image: "{{ $.Values.image.repository }}:{{ $.Values.image.tag | default $.Chart.AppVersion }}"
+ imagePullPolicy: {{ $.Values.image.pullPolicy }}
+ {{- with $.Values.om.command }}
+ command: {{- tpl (toYaml .) $ | nindent 12 }}
+ {{- end }}
+ args:
+ - sh
+ - -c
+ - |
+ set -e
+            exec ozone admin om transfer -id={{ $.Values.clusterId }} -n={{ $.Release.Name }}-om-0
+ env:
+ {{- include "ozone.configuration.env.prehook" $ | nindent 12 }}
+ {{- with $env }}
+ {{- tpl (toYaml .) $ | nindent 12 }}
+ {{- end }}
+ {{- with $envFrom }}
+ envFrom:
+ {{- tpl (toYaml .) $ | nindent 12 }}
+ {{- end }}
+ ports:
+ - name: om-rpc
+ containerPort: {{ $.Values.om.service.rpcPort }}
+ - name: om-ratis
+ containerPort: {{ $.Values.om.service.ratisPort }}
Review Comment:
The Ratis port is exposed even when om.replicas = 1 (non-HA mode), where
Ratis is not needed.
Suggestion: guard it with `{{- if gt (int .Values.om.replicas) 1 }}`.
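For example, a sketch of how that guard could wrap the port entry in the
snippet above (same condition, applied around om-ratis):
```
ports:
  - name: om-rpc
    containerPort: {{ $.Values.om.service.rpcPort }}
  {{- if gt (int $.Values.om.replicas) 1 }}
  - name: om-ratis
    containerPort: {{ $.Values.om.service.ratisPort }}
  {{- end }}
```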