piby180 commented on issue #12888:
URL: https://github.com/apache/pinot/issues/12888#issuecomment-2161795276

   We have `terminationGracePeriodSeconds: 300`, which means Kubernetes gives the 
server up to 5 minutes to shut down gracefully; only after those 5 minutes does it 
force-kill the server pod. Our servers usually terminate gracefully within 1-2 
minutes, so I believe the server pods are shutting down gracefully. 
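   As a quick sanity check, the timing works out with plenty of headroom (a minimal sketch; the 2-minute shutdown figure is just our observed worst case, not a guaranteed bound):

```python
# Values from the manifest / our observations (not guaranteed figures).
termination_grace_period_s = 300  # terminationGracePeriodSeconds in the StatefulSet
typical_shutdown_s = 2 * 60       # servers usually finish shutting down in 1-2 min

# Kubernetes sends SIGTERM, waits up to the grace period, then sends SIGKILL.
headroom_s = termination_grace_period_s - typical_shutdown_s
print(f"Worst-case headroom before SIGKILL: {headroom_s}s")  # 180s
```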
   
   I see this error **after** the server pod has finished restarting: the logs 
look healthy, but the pod's status is still not Ready. Once the pod becomes 
Ready, the error disappears. 
   
   Here is the YAML for our server StatefulSet:
   ```yaml
   apiVersion: apps/v1
   kind: StatefulSet
   metadata:
     annotations:
       meta.helm.sh/release-name: pinot
       meta.helm.sh/release-namespace: pinot-dev
     creationTimestamp: "2024-04-09T15:24:16Z"
     generation: 23
     labels:
       app: pinot
       app.kubernetes.io/managed-by: Helm
       app.kubernetes.io/version: 1.0.0
       component: server
       helm.sh/chart: pinot-0.2.9
       heritage: Helm
       release: pinot
     name: pinot-server
     namespace: pinot-dev
     resourceVersion: "1272018086"
     uid: 04b4409e-c877-4210-9083-97d4652f7174
   spec:
     persistentVolumeClaimRetentionPolicy:
       whenDeleted: Retain
       whenScaled: Retain
     podManagementPolicy: Parallel
     replicas: 3
     revisionHistoryLimit: 10
     selector:
       matchLabels:
         app: pinot
         component: server
         release: pinot
     serviceName: pinot-server-headless
     template:
       metadata:
         annotations:
            checksum/config: 5cb2dcd44b42446bd08039c9485c290d7f2d4b3e2bd080090c5a19f6eda456a8
           kubectl.kubernetes.io/restartedAt: "2024-06-10T14:43:42Z"
           prometheus.io/port: "8008"
           prometheus.io/scrape: "true"
         creationTimestamp: null
         labels:
           app: pinot
           app.kubernetes.io/managed-by: Helm
           app.kubernetes.io/version: 1.0.0
           component: server
           helm.sh/chart: pinot-0.2.9
           heritage: Helm
           release: pinot
       spec:
         affinity: {}
         containers:
         - args:
           - StartServer
           - -clusterName
           - pinot
           - -zkAddress
           - pinot-zookeeper:2181
           - -configFileName
           - /var/pinot/server/config/pinot-server.conf
           env:
           - name: JAVA_OPTS
              value: -XX:+ExitOnOutOfMemoryError -Xms1G -Xmx6G -Djute.maxbuffer=100000000
                -XX:+UseG1GC -XX:MaxGCPauseMillis=200 -Xlog:gc*:file=/opt/pinot/gc-pinot-controller.log
                -javaagent:/opt/pinot/etc/jmx_prometheus_javaagent/jmx_prometheus_javaagent.jar=8008:/opt/pinot/etc/jmx_prometheus_javaagent/configs/pinot.yml
                -Dlog4j2.configurationFile=/opt/pinot/etc/conf/pinot-server-log4j2.xml
                -Dplugins.dir=/opt/pinot/plugins
           - name: LOG4J_CONSOLE_LEVEL
             value: error
           image: <registry>/pinot:release-1.1.0
           imagePullPolicy: Always
           livenessProbe:
             failureThreshold: 10
             httpGet:
               path: /health/liveness
               port: 8097
               scheme: HTTP
             initialDelaySeconds: 60
             periodSeconds: 10
             successThreshold: 1
             timeoutSeconds: 5
           name: server
           ports:
           - containerPort: 8098
             name: netty
             protocol: TCP
           - containerPort: 8097
             name: admin
             protocol: TCP
           readinessProbe:
             failureThreshold: 10
             httpGet:
               path: /health/readiness
               port: 8097
               scheme: HTTP
             initialDelaySeconds: 60
             periodSeconds: 10
             successThreshold: 1
             timeoutSeconds: 5
           resources:
             limits:
               cpu: "4"
               memory: 14Gi
             requests:
               cpu: "4"
               memory: 14Gi
           securityContext:
             runAsGroup: 3000
             runAsNonRoot: true
             runAsUser: 1000
           startupProbe:
             failureThreshold: 120
             httpGet:
               path: /health/liveness
               port: 8097
               scheme: HTTP
             initialDelaySeconds: 60
             periodSeconds: 10
             successThreshold: 1
             timeoutSeconds: 5
           terminationMessagePath: /dev/termination-log
           terminationMessagePolicy: File
           volumeMounts:
           - mountPath: /var/pinot/server/config
             name: config
           - mountPath: /var/pinot/server/data
             name: data
         dnsPolicy: ClusterFirst
         imagePullSecrets:
         - name: pinot
         nodeSelector:
           workload-type: pinot
         restartPolicy: Always
         schedulerName: default-scheduler
         securityContext:
           fsGroup: 3000
           fsGroupChangePolicy: Always
           runAsGroup: 3000
           runAsUser: 1000
         serviceAccount: pinot
         serviceAccountName: pinot
         terminationGracePeriodSeconds: 300
         volumes:
         - configMap:
             defaultMode: 420
             name: pinot-server-config
           name: config
     updateStrategy:
       type: RollingUpdate
     volumeClaimTemplates:
     - apiVersion: v1
       kind: PersistentVolumeClaim
       metadata:
         creationTimestamp: null
         name: data
       spec:
         accessModes:
         - ReadWriteOnce
         resources:
           requests:
             storage: 1T
         storageClassName: ebs-csi-gp3-pinot
         volumeMode: Filesystem
   ```
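   Plugging the probe settings from this manifest into the usual back-of-the-envelope probe-timing formula (initialDelaySeconds + failureThreshold x periodSeconds; a rough upper bound, not an exact kubelet simulation) shows how long a freshly restarted pod can legitimately sit Running-but-not-Ready:

```python
# Probe settings copied from the StatefulSet above.
startup = {"initial_delay_s": 60, "period_s": 10, "failure_threshold": 120}
readiness = {"initial_delay_s": 60, "period_s": 10, "failure_threshold": 10}

def max_window_s(probe):
    # Upper bound before the probe is considered definitively failed:
    # initial delay plus failureThreshold consecutive failed periods.
    return probe["initial_delay_s"] + probe["failure_threshold"] * probe["period_s"]

# The startup probe alone tolerates up to 21 minutes before the kubelet
# restarts the container, and readiness checking starts only afterwards,
# so a pod with healthy logs can still be not-Ready for a noticeable window.
print("startup budget:  ", max_window_s(startup), "s")    # 1260
print("readiness budget:", max_window_s(readiness), "s")  # 160
```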
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

