[ https://issues.apache.org/jira/browse/FLINK-38047?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Robert Metzger updated FLINK-38047:
-----------------------------------
Fix Version/s: kubernetes-operator-1.13.0
> Bump cert-manager in the Kubernetes Operator
> --------------------------------------------
>
> Key: FLINK-38047
> URL: https://issues.apache.org/jira/browse/FLINK-38047
> Project: Flink
> Issue Type: Technical Debt
> Components: Kubernetes Operator
> Reporter: Kumar Mallikarjuna
> Priority: Major
> Labels: dependency, pull-request-available
> Fix For: kubernetes-operator-1.13.0
>
>
> The Flink Kubernetes Operator currently uses cert-manager:{_}v1.8.2{_} in the
> [CI|https://github.com/apache/flink-kubernetes-operator/blob/main/e2e-tests/cert-manager.yaml]
> and recommends the same version in the
> [docs|https://github.com/apache/flink-kubernetes-operator/blob/8812c78cd6a2c0ad1b672ca08a8b880bd890ae8b/docs/content/docs/try-flink-kubernetes-operator/quick-start.md?plain=1#L69-L72].
> The latest stable release, _v1.18.2_, is ten minor versions ahead. We should
> bump both the recommendation and the tests to the latest release.
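The sketch below covers the version arithmetic and the manifest URL behind this bump. It assumes tags of the form vMAJOR.MINOR.PATCH and the release-asset URL pattern used in step 2. Both function names are hypothetical and are not part of the operator repository.

```shell
# Hypothetical helpers (not part of the flink-kubernetes-operator repository).

# Print how many minor versions separate two vMAJOR.MINOR.PATCH tags.
# Assumes both tags share the same major version.
minor_gap() {
  local from_minor to_minor
  from_minor=$(echo "${1#v}" | cut -d. -f2)
  to_minor=$(echo "${2#v}" | cut -d. -f2)
  echo $((to_minor - from_minor))
}

# Print the URL of the static cert-manager manifest attached to a release tag.
cert_manager_manifest_url() {
  echo "https://github.com/cert-manager/cert-manager/releases/download/$1/cert-manager.yaml"
}
```

For example, {{minor_gap v1.8.2 v1.18.2}} prints 10. Refreshing the CI manifest could then look like {{curl -fsSL "$(cert_manager_manifest_url v1.18.2)" -o e2e-tests/cert-manager.yaml}}. This assumes that file is an unmodified copy of the upstream release manifest.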
>
> *Validation for _cert-manager:v1.18.2_ with
> _flink-kubernetes-operator:v1.12.0_*
> 1. Start a kind cluster
> {code:java}
> ➜ flink-kubernetes-operator git:(main) ✗ kind create cluster
> Creating cluster "kind" ...
> ✓ Ensuring node image (kindest/node:v1.32.2) 🖼
> ✓ Preparing nodes 📦
> ✓ Writing configuration 📜
> ✓ Starting control-plane 🕹️
> ✓ Installing CNI 🔌
> ✓ Installing StorageClass 💾
> Set kubectl context to "kind-kind"
> You can now use your cluster with:
>
> kubectl cluster-info --context kind-kind
>
> Have a nice day! 👋
> {code}
>
> 2. Install cert-manager v1.18.2
> {code:java}
> ➜ flink-kubernetes-operator git:(main) ✗ kubectl create -f https://github.com/cert-manager/cert-manager/releases/download/v1.18.2/cert-manager.yaml
> namespace/cert-manager created
> customresourcedefinition.apiextensions.k8s.io/certificaterequests.cert-manager.io created
> customresourcedefinition.apiextensions.k8s.io/certificates.cert-manager.io created
> customresourcedefinition.apiextensions.k8s.io/challenges.acme.cert-manager.io created
> customresourcedefinition.apiextensions.k8s.io/clusterissuers.cert-manager.io created
> customresourcedefinition.apiextensions.k8s.io/issuers.cert-manager.io created
> customresourcedefinition.apiextensions.k8s.io/orders.acme.cert-manager.io created
> serviceaccount/cert-manager-cainjector created
> serviceaccount/cert-manager created
> serviceaccount/cert-manager-webhook created
> clusterrole.rbac.authorization.k8s.io/cert-manager-cainjector created
> clusterrole.rbac.authorization.k8s.io/cert-manager-controller-issuers created
> clusterrole.rbac.authorization.k8s.io/cert-manager-controller-clusterissuers created
> clusterrole.rbac.authorization.k8s.io/cert-manager-controller-certificates created
> clusterrole.rbac.authorization.k8s.io/cert-manager-controller-orders created
> clusterrole.rbac.authorization.k8s.io/cert-manager-controller-challenges created
> clusterrole.rbac.authorization.k8s.io/cert-manager-controller-ingress-shim created
> clusterrole.rbac.authorization.k8s.io/cert-manager-cluster-view created
> clusterrole.rbac.authorization.k8s.io/cert-manager-view created
> clusterrole.rbac.authorization.k8s.io/cert-manager-edit created
> clusterrole.rbac.authorization.k8s.io/cert-manager-controller-approve:cert-manager-io created
> clusterrole.rbac.authorization.k8s.io/cert-manager-controller-certificatesigningrequests created
> clusterrole.rbac.authorization.k8s.io/cert-manager-webhook:subjectaccessreviews created
> clusterrolebinding.rbac.authorization.k8s.io/cert-manager-cainjector created
> clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-issuers created
> clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-clusterissuers created
> clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-certificates created
> clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-orders created
> clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-challenges created
> clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-ingress-shim created
> clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-approve:cert-manager-io created
> clusterrolebinding.rbac.authorization.k8s.io/cert-manager-controller-certificatesigningrequests created
> clusterrolebinding.rbac.authorization.k8s.io/cert-manager-webhook:subjectaccessreviews created
> role.rbac.authorization.k8s.io/cert-manager-cainjector:leaderelection created
> role.rbac.authorization.k8s.io/cert-manager:leaderelection created
> role.rbac.authorization.k8s.io/cert-manager-tokenrequest created
> role.rbac.authorization.k8s.io/cert-manager-webhook:dynamic-serving created
> rolebinding.rbac.authorization.k8s.io/cert-manager-cainjector:leaderelection created
> rolebinding.rbac.authorization.k8s.io/cert-manager:leaderelection created
> rolebinding.rbac.authorization.k8s.io/cert-manager-cert-manager-tokenrequest created
> rolebinding.rbac.authorization.k8s.io/cert-manager-webhook:dynamic-serving created
> service/cert-manager-cainjector created
> service/cert-manager created
> service/cert-manager-webhook created
> deployment.apps/cert-manager-cainjector created
> deployment.apps/cert-manager created
> deployment.apps/cert-manager-webhook created
> mutatingwebhookconfiguration.admissionregistration.k8s.io/cert-manager-webhook created
> validatingwebhookconfiguration.admissionregistration.k8s.io/cert-manager-webhook created
> {code}
>
> 3. Wait for cert-manager to be ready
> {code:java}
> ➜ flink-kubernetes-operator git:(main) ✗ k -n cert-manager get po
> NAME                                       READY   STATUS    RESTARTS   AGE
> cert-manager-69f748766f-28s8d              1/1     Running   0          44s
> cert-manager-cainjector-7cf6557c49-gdfd7   1/1     Running   0          44s
> cert-manager-webhook-58f4cff74d-kz4pc      1/1     Running   0          44s
> {code}
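The readiness check in step 3 can be scripted with {{kubectl wait}} rather than by polling {{get po}} by hand. This is a minimal sketch. The KUBECTL override and the 120s default timeout are assumptions, not values from the original run.

```shell
# Block until every Deployment in the cert-manager namespace reports the
# Available condition, or fail once the timeout expires.
# KUBECTL may be overridden, e.g. KUBECTL="kubectl --context kind-kind".
wait_for_cert_manager() {
  ${KUBECTL:-kubectl} -n cert-manager wait deployment --all \
    --for=condition=Available --timeout="${1:-120s}"
}
```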
>
> 4. Install flink-kubernetes-operator
> {code:java}
> ➜ flink-kubernetes-operator git:(main) ✗ helm install flink-kubernetes-operator flink-operator-repo/flink-kubernetes-operator
> W0704 14:33:26.593488 51760 warnings.go:70] spec.privateKey.rotationPolicy: In cert-manager >= v1.18.0, the default value changed from `Never` to `Always`.
> NAME: flink-kubernetes-operator
> LAST DEPLOYED: Fri Jul 4 14:33:25 2025
> NAMESPACE: default
> STATUS: deployed
> REVISION: 1
> TEST SUITE: None
> {code}
>
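> *Note:* The warning about _spec.privateKey.rotationPolicy_ is expected. The
> chart does not set this field, so cert-manager >= v1.18.0 applies the new
> default, _Always_, and generates a new private key on every renewal. This does
> not affect the operator or the webhook, because the webhook reloads the rotated
> certificate at runtime (see steps 8 to 10).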
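Step 4 assumes the {{flink-operator-repo}} Helm repository is already registered. The sketch below adds it using the downloads.apache.org URL pattern from the operator quick-start guide. That pattern is an assumption here, and older releases may only be available from the Apache archive. The sketch also reads back the field the warning refers to. All function and variable names are hypothetical.

```shell
# Hypothetical helpers around the Helm install in step 4.
HELM_REPO=flink-operator-repo

# Register the operator chart repository for one release, then install the chart.
# The download URL pattern is assumed from the operator quick-start guide.
install_flink_operator() {
  local version="${1:?operator version, e.g. 1.12.0}"
  helm repo add "$HELM_REPO" \
    "https://downloads.apache.org/flink/flink-kubernetes-operator-${version}/" &&
    helm install flink-kubernetes-operator "$HELM_REPO/flink-kubernetes-operator"
}

# Print spec.privateKey.rotationPolicy of the operator's serving Certificate.
# Empty output means the field is unset, so cert-manager's default applies.
serving_cert_rotation_policy() {
  ${KUBECTL:-kubectl} get certificate flink-operator-serving-cert \
    -o jsonpath='{.spec.privateKey.rotationPolicy}'
}
```

On the cluster above, {{serving_cert_rotation_policy}} would be expected to print nothing. That matches the Certificate spec shown in step 8, which has no {{privateKey}} section.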
>
> 5. Verify that the operator and webhook are running
> {code:java}
> ➜ flink-kubernetes-operator git:(main) ✗ k get po
> NAME                                         READY   STATUS    RESTARTS   AGE
> flink-kubernetes-operator-7dc7858566-42g5z   2/2     Running   0          112s
> {code}
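Step 5 can also wait on the operator Deployment directly instead of listing pods. This sketch assumes the Deployment is named {{flink-kubernetes-operator}}. That name is inferred from the pod name prefix above and has not been confirmed against the chart.

```shell
# Block until the operator Deployment's rollout completes (replicas updated
# and available), or fail once the timeout expires.
# The Deployment name is inferred from the pod name shown in step 5.
wait_for_operator() {
  ${KUBECTL:-kubectl} rollout status deployment/flink-kubernetes-operator \
    --timeout="${1:-180s}"
}
```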
>
> 6. Test with a sample FlinkDeployment
> {code:java}
> ➜ flink-kubernetes-operator git:(main) ✗ k create -f examples/basic.yaml
> flinkdeployment.flink.apache.org/basic-example created
>
> ➜ flink-kubernetes-operator git:(main) ✗ k get flinkdeployments.flink.apache.org
> NAME            JOB STATUS   LIFECYCLE STATE
> basic-example   RUNNING      STABLE
> ➜ flink-kubernetes-operator git:(main) ✗ k get po
> NAME                                         READY   STATUS    RESTARTS   AGE
> basic-example-6c7bff5c68-w669x               1/1     Running   0          70s
> basic-example-taskmanager-1-1                1/1     Running   0          23s
> flink-kubernetes-operator-7dc7858566-42g5z   2/2     Running   0          3m27s
> {code}
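The following polling sketch for step 6 removes the need to re-run {{k get}} until the job settles. It assumes the LIFECYCLE STATE column is backed by {{.status.lifecycleState}}. The attempt count and delay are arbitrary defaults.

```shell
# Poll a FlinkDeployment until its lifecycle state is STABLE.
# Assumption: the LIFECYCLE STATE column reads .status.lifecycleState.
wait_for_stable() {
  local name="$1" attempts="${2:-30}" delay="${3:-5}" state="" i=0
  while [ "$i" -lt "$attempts" ]; do
    state=$(${KUBECTL:-kubectl} get flinkdeployment "$name" \
      -o jsonpath='{.status.lifecycleState}')
    if [ "$state" = "STABLE" ]; then
      echo "$name is STABLE"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "$name did not become STABLE (last state: ${state:-unknown})" >&2
  return 1
}
```

{{wait_for_stable basic-example}} prints {{basic-example is STABLE}} once that state is reached. It exits non-zero if the state is not reached within the attempt budget.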
>
> 7. Clean up the FlinkDeployment
> {code:java}
> ➜ flink-kubernetes-operator git:(main) ✗ k delete flinkdeployments.flink.apache.org basic-example
> flinkdeployment.flink.apache.org "basic-example" deleted
> {code}
>
> 8. Force rotate the certificate
> {code:java}
> ➜ flink-kubernetes-operator git:(main) ✗ k get certificate
> NAME                          READY   SECRET                AGE
> flink-operator-serving-cert   True    webhook-server-cert   4m48s
> ➜ flink-kubernetes-operator git:(main) ✗ k get certificate flink-operator-serving-cert -oyaml
> apiVersion: cert-manager.io/v1
> kind: Certificate
> metadata:
>   annotations:
>     meta.helm.sh/release-name: flink-kubernetes-operator
>     meta.helm.sh/release-namespace: default
>   creationTimestamp: "2025-07-04T09:03:26Z"
>   generation: 1
>   labels:
>     app.kubernetes.io/managed-by: Helm
>   name: flink-operator-serving-cert
>   namespace: default
>   resourceVersion: "997"
>   uid: b0e1935c-eab8-4b61-ad9f-7bb0bf166c07
> spec:
>   commonName: FlinkDeployment Validator
>   dnsNames:
>   - flink-operator-webhook-service.default.svc
>   - flink-operator-webhook-service.default.svc.cluster.local
>   issuerRef:
>     kind: Issuer
>     name: flink-operator-selfsigned-issuer
>   keystores:
>     pkcs12:
>       create: true
>       passwordSecretRef:
>         key: password
>         name: flink-operator-webhook-secret
>   secretName: webhook-server-cert
> status:
>   conditions:
>   - lastTransitionTime: "2025-07-04T09:03:26Z"
>     message: Certificate is up to date and has not expired
>     observedGeneration: 1
>     reason: Ready
>     status: "True"
>     type: Ready
>   notAfter: "2025-10-02T09:03:26Z"
>   notBefore: "2025-07-04T09:03:26Z"
>   renewalTime: "2025-09-02T09:03:26Z"
>   revision: 1
> ➜ flink-kubernetes-operator git:(main) ✗ cmctl renew flink-operator-serving-cert
> Manually triggered issuance of Certificate default/flink-operator-serving-cert
> ➜ flink-kubernetes-operator git:(main) ✗ k get certificate flink-operator-serving-cert -oyaml
> apiVersion: cert-manager.io/v1
> kind: Certificate
> metadata:
>   annotations:
>     meta.helm.sh/release-name: flink-kubernetes-operator
>     meta.helm.sh/release-namespace: default
>   creationTimestamp: "2025-07-04T09:03:26Z"
>   generation: 1
>   labels:
>     app.kubernetes.io/managed-by: Helm
>   name: flink-operator-serving-cert
>   namespace: default
>   resourceVersion: "1591"
>   uid: b0e1935c-eab8-4b61-ad9f-7bb0bf166c07
> spec:
>   commonName: FlinkDeployment Validator
>   dnsNames:
>   - flink-operator-webhook-service.default.svc
>   - flink-operator-webhook-service.default.svc.cluster.local
>   issuerRef:
>     kind: Issuer
>     name: flink-operator-selfsigned-issuer
>   keystores:
>     pkcs12:
>       create: true
>       passwordSecretRef:
>         key: password
>         name: flink-operator-webhook-secret
>   secretName: webhook-server-cert
> status:
>   conditions:
>   - lastTransitionTime: "2025-07-04T09:03:26Z"
>     message: Certificate is up to date and has not expired
>     observedGeneration: 1
>     reason: Ready
>     status: "True"
>     type: Ready
>   notAfter: "2025-10-02T09:08:37Z"
>   notBefore: "2025-07-04T09:08:37Z"
>   renewalTime: "2025-09-02T09:08:37Z"
>   revision: 2
> {code}
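Step 8 verifies the rotation by comparing {{status.revision}} before and after {{cmctl renew}}. The sketch below automates that comparison. It assumes {{cmctl}} is on the PATH and uses the Certificate name shown above. The polling parameters are arbitrary.

```shell
# Print status.revision of the operator's serving Certificate.
cert_revision() {
  ${KUBECTL:-kubectl} get certificate flink-operator-serving-cert \
    -o jsonpath='{.status.revision}'
}

# Trigger a manual renewal with cmctl, then poll until status.revision grows.
renew_and_verify() {
  local attempts="${1:-12}" delay="${2:-5}" before="" after="" i=0
  before=$(cert_revision)
  cmctl renew flink-operator-serving-cert || return 1
  while [ "$i" -lt "$attempts" ]; do
    after=$(cert_revision)
    if [ "${after:-0}" -gt "${before:-0}" ]; then
      echo "revision $before -> $after"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "revision still ${after:-unknown} after renewal" >&2
  return 1
}
```

Given the revisions shown above, this would be expected to report {{revision 1 -> 2}}.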
>
> 9. Verify that the operator and webhook are still running, with no restarts
> {code:java}
> ➜ flink-kubernetes-operator git:(main) ✗ k get po
> NAME                                         READY   STATUS    RESTARTS   AGE
> flink-kubernetes-operator-7dc7858566-42g5z   2/2     Running   0          5m50s
> {code}
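Step 9 relies on the RESTARTS column staying at 0. The sketch below sums {{restartCount}} across all of the pod's containers (operator and webhook), so a crash in either container after the rotation shows up as a non-zero total. The function name is hypothetical.

```shell
# Print the total restartCount across all containers of one pod.
total_restarts() {
  ${KUBECTL:-kubectl} get pod "$1" \
    -o jsonpath='{.status.containerStatuses[*].restartCount}' |
    awk '{ for (i = 1; i <= NF; i++) sum += $i } END { print sum + 0 }'
}
```

For the pod above, {{total_restarts flink-kubernetes-operator-7dc7858566-42g5z}} should print 0.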
>
> 10. Check the webhook logs and verify that the certificate was reloaded
> {code:java}
> ➜ flink-kubernetes-operator git:(main) ✗ k logs flink-kubernetes-operator-7dc7858566-42g5z -c flink-webhook | tail -20
> 2025-07-04 09:03:57,113 o.a.f.k.o.f.FileSystemWatchService [INFO ] Starting watching path: /certs
> 2025-07-04 09:03:57,117 o.a.f.k.o.f.FileSystemWatchService [INFO ] Path is resolved to real path: /certs
> 2025-07-04 09:03:57,186 o.a.f.k.o.a.FlinkOperatorWebhook [INFO ] Webhook listening at 0:0:0:0:0:0:0:0:9443
> 2025-07-04 09:08:47,807 o.a.f.k.o.a.FlinkOperatorWebhook [INFO ] Reloading SSL context because of certificate change
> 2025-07-04 09:08:47,809 o.a.f.k.o.s.ReloadableSslContext [INFO ] Creating keystore with type: pkcs12
> 2025-07-04 09:08:47,810 o.a.f.k.o.s.ReloadableSslContext [INFO ] Loading keystore from file: /certs/keystore.p12
> 2025-07-04 09:08:47,816 o.a.f.k.o.s.ReloadableSslContext [INFO ] Initializing key manager with keystore and password
> 2025-07-04 09:08:47,821 o.a.f.k.o.a.FlinkOperatorWebhook [INFO ] SSL context reloaded successfully
> 2025-07-04 09:08:56,977 o.a.f.c.GlobalConfiguration [INFO ] Using legacy YAML parser to load flink configuration file from /opt/flink/conf/flink-conf.yaml.
> 2025-07-04 09:08:56,982 o.a.f.c.GlobalConfiguration [INFO ] Loading configuration property: parallelism.default, 1
> 2025-07-04 09:08:56,982 o.a.f.c.GlobalConfiguration [INFO ] Loading configuration property: taskmanager.numberOfTaskSlots, 1
> 2025-07-04 09:08:56,982 o.a.f.c.GlobalConfiguration [INFO ] Loading configuration property: kubernetes.operator.default-configuration.flink-version.v1_18.env.java.opts.all,
> --add-exports=java.base/sun.net.util=ALL-UNNAMED
> --add-exports=java.rmi/sun.rmi.registry=ALL-UNNAMED
> --add-exports=java.security.jgss/sun.security.krb5=ALL-UNNAMED
> --add-opens=java.base/java.lang=ALL-UNNAMED
> --add-opens=java.base/java.net=ALL-UNNAMED
> --add-opens=java.base/java.io=ALL-UNNAMED
> --add-opens=java.base/java.nio=ALL-UNNAMED
> --add-opens=java.base/sun.nio.ch=ALL-UNNAMED
> --add-opens=java.base/java.lang.reflect=ALL-UNNAMED
> --add-opens=java.base/java.text=ALL-UNNAMED
> --add-opens=java.base/java.time=ALL-UNNAMED
> --add-opens=java.base/java.util=ALL-UNNAMED
> --add-opens=java.base/java.util.concurrent=ALL-UNNAMED
> --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED
> --add-opens=java.base/java.util.concurrent.locks=ALL-UNNAMED
> 2025-07-04 09:08:56,982 o.a.f.c.GlobalConfiguration [INFO ] Loading configuration property: kubernetes.operator.reconcile.interval, 15 s
> 2025-07-04 09:08:56,982 o.a.f.c.GlobalConfiguration [INFO ] Loading configuration property: kubernetes.operator.default-configuration.flink-version.v1_19+.env.java.default-opts.all,
> --add-exports=java.base/sun.net.util=ALL-UNNAMED
> --add-exports=java.rmi/sun.rmi.registry=ALL-UNNAMED
> --add-exports=java.security.jgss/sun.security.krb5=ALL-UNNAMED
> --add-opens=java.base/java.lang=ALL-UNNAMED
> --add-opens=java.base/java.net=ALL-UNNAMED
> --add-opens=java.base/java.io=ALL-UNNAMED
> --add-opens=java.base/java.nio=ALL-UNNAMED
> --add-opens=java.base/sun.nio.ch=ALL-UNNAMED
> --add-opens=java.base/java.lang.reflect=ALL-UNNAMED
> --add-opens=java.base/java.text=ALL-UNNAMED
> --add-opens=java.base/java.time=ALL-UNNAMED
> --add-opens=java.base/java.util=ALL-UNNAMED
> --add-opens=java.base/java.util.concurrent=ALL-UNNAMED
> --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED
> --add-opens=java.base/java.util.concurrent.locks=ALL-UNNAMED
> 2025-07-04 09:08:56,982 o.a.f.c.GlobalConfiguration [INFO ] Loading configuration property: kubernetes.operator.metrics.reporter.slf4j.interval, 5 MINUTE
> 2025-07-04 09:08:56,983 o.a.f.c.GlobalConfiguration [INFO ] Loading configuration property: kubernetes.operator.observer.progress-check.interval, 5 s
> 2025-07-04 09:08:56,983 o.a.f.c.GlobalConfiguration [INFO ] Loading configuration property: kubernetes.operator.health.probe.enabled, true
> 2025-07-04 09:08:56,983 o.a.f.c.GlobalConfiguration [INFO ] Loading configuration property: kubernetes.operator.health.probe.port, 8085
> 2025-07-04 09:08:56,983 o.a.f.c.GlobalConfiguration [INFO ] Loading configuration property: kubernetes.operator.metrics.reporter.slf4j.factory.class, org.apache.flink.metrics.slf4j.Slf4jReporterFactory
> 2025-07-04 09:08:56,984 o.a.f.k.o.c.FlinkConfigManager [INFO ] Default configuration did not change, nothing to do...
> {code}
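The reload in step 10 can be confirmed without reading the whole log by matching the final reload message the webhook printed above. The sketch relies on the exact log text {{SSL context reloaded successfully}}, which may change between operator versions.

```shell
# Succeed (exit 0) if the webhook container logged a completed SSL context reload.
webhook_reloaded_cert() {
  ${KUBECTL:-kubectl} logs "$1" -c flink-webhook |
    grep -q 'SSL context reloaded successfully'
}
```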
>
> 11. Create a resource to test the webhook
> {code:java}
> ➜ flink-kubernetes-operator git:(main) ✗ k create -f examples/basic.yaml
> flinkdeployment.flink.apache.org/basic-example created {code}
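Step 11 exercises the webhook by creating a real resource. A server-side dry run sends the same request through admission, including the operator's webhooks, without persisting anything. Webhooks registered through admissionregistration.k8s.io/v1 must declare sideEffects None or NoneOnDryRun, so they are called for dry-run requests. A minimal sketch:

```shell
# Submit a manifest with a server-side dry run: the API server runs admission
# (including registered admission webhooks) but stores nothing.
dry_run_through_webhook() {
  ${KUBECTL:-kubectl} create --dry-run=server -f "$1"
}
```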
>
> 12. Check the resource status
> {code:java}
> ➜ flink-kubernetes-operator git:(main) ✗ k get flinkdeployments.flink.apache.org
> NAME            JOB STATUS   LIFECYCLE STATE
> basic-example   RUNNING      STABLE
> ➜ flink-kubernetes-operator git:(main) ✗ k get po
> NAME                                         READY   STATUS    RESTARTS   AGE
> basic-example-6c7bff5c68-gmlh2               1/1     Running   0          25s
> basic-example-taskmanager-1-1                1/1     Running   0          14s
> flink-kubernetes-operator-7dc7858566-42g5z   2/2     Running   0          7m28s
> {code}
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)