[
https://issues.apache.org/jira/browse/FLINK-36513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Rui Fan resolved FLINK-36513.
-----------------------------
Fix Version/s: kubernetes-operator-1.10.0
Resolution: Fixed
Merged to main (1.10.0) via: 12d809a25f888d4d41179a4b8eb0c055a6fed169
> A lot of CI failures are caused by Install cert-manager
> -------------------------------------------------------
>
> Key: FLINK-36513
> URL: https://issues.apache.org/jira/browse/FLINK-36513
> Project: Flink
> Issue Type: Technical Debt
> Components: Kubernetes Operator
> Affects Versions: kubernetes-operator-1.9.0
> Reporter: Rui Fan
> Assignee: Rui Fan
> Priority: Major
> Labels: pull-request-available
> Fix For: kubernetes-operator-1.10.0
>
> Attachments: image-2024-10-12-10-30-38-781.png
>
>
> Many CI failures are caused by the "Install cert-manager" step, such as:
> [https://github.com/apache/flink-kubernetes-operator/actions/runs/11292388626/job/31408603383]
> [https://github.com/apache/flink-kubernetes-operator/actions/runs/11294831791/job/31436330397]
> h1. Root cause:
> I checked the raw log [1]; the failure reason is: _Unable to connect to the
> server: dial tcp 140.82.113.3:443: i/o timeout._
> !image-2024-10-12-10-30-38-781.png|width=1354,height=492!
>
> CI code:
> [https://github.com/apache/flink-kubernetes-operator/blob/d2c01737c745979c6aadb670334565ee11aa2f4a/.github/workflows/ci.yml#L227]
>
> The step downloads cert-manager.yaml from GitHub, and 140.82.113.3:443 is a
> GitHub IP address and port, so the failed download of cert-manager.yaml is the
> root cause.
>
> h1. Solution:
> * Solution 1: Introduce a retry mechanism
> * Solution 2: Put the cert-manager.yaml in the flink-kubernetes-operator repo
> directly
> ** I saw it's a fixed version, so the cert-manager.yaml is immutable IIUC.
>
> [1]
> [https://github.com/apache/flink-kubernetes-operator/commit/d2c01737c745979c6aadb670334565ee11aa2f4a/checks/31436330397/logs]
>
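
For illustration, Solution 1 from the description above (a retry mechanism around the download) might look roughly like the sketch below. It is not the change that was merged; the names `retry` and `download`, the attempt count, and the delays are all hypothetical, and the real CI step runs in a GitHub Actions workflow rather than in Python.

```python
import time
import urllib.request


def retry(action, attempts=3, base_delay=2.0, sleep=time.sleep):
    """Call action() until it succeeds or the attempts are exhausted.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... between attempts.
    Only OSError is retried; urllib's URLError and socket timeouts are
    subclasses of it, which covers "i/o timeout"-style failures.
    """
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except OSError:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))


def download(url, timeout=30):
    """Fetch url and return the response body as bytes (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()
```

A caller would wrap the flaky step, for example `retry(lambda: download(manifest_url))`, where `manifest_url` stands for whatever URL the workflow fetches. Note that this sketch also retries HTTP error responses, since `HTTPError` is a subclass of `OSError`. Solution 2 avoids the network dependency entirely, so no retry would be needed there.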