[jira] [Created] (FLINK-32886) Issue with volumeMounts when creating OLM for Flink Operator 1.6.0
James Busche created FLINK-32886:

Summary: Issue with volumeMounts when creating OLM for Flink Operator 1.6.0
Key: FLINK-32886
URL: https://issues.apache.org/jira/browse/FLINK-32886
Project: Flink
Issue Type: Bug
Components: Kubernetes Operator
Affects Versions: kubernetes-operator-1.6.0
Reporter: James Busche

I noticed a volumeMount problem when trying to deploy the OLM CSV for the 1.6.0 Flink Kubernetes Operator (following the directions from [OLM Verification of a Flink Kubernetes Operator Release|https://cwiki.apache.org/confluence/display/FLINK/OLM+Verification+of+a+Flink+Kubernetes+Operator+Release]):

```
oc describe csv
...
Warning  InstallComponentFailed  46s (x7 over 49s)  operator-lifecycle-manager  install strategy failed: Deployment.apps "flink-kubernetes-operator" is invalid: [spec.template.spec.volumes[2].name: Duplicate value: "keystore", spec.template.spec.containers[0].volumeMounts[1].name: Not found: "flink-artifacts-volume"]
```

My current workaround is to change [line 88|https://github.com/apache/flink-kubernetes-operator/blob/main/tools/olm/docker-entry.sh#L88] to look like this:

```
yq ea -i '.spec.install.spec.deployments[0].spec.template.spec.volumes[1] = {"name": "flink-artifacts-volume","emptyDir": {}}' "${CSV_FILE}"
yq ea -i '.spec.install.spec.deployments[0].spec.template.spec.volumes[2] = {"name": "keystore","emptyDir": {}}' "${CSV_FILE}"
```

The operator then deploys without error:

```
oc get csv
NAME                               DISPLAY                     VERSION   REPLACES                           PHASE
flink-kubernetes-operator.v1.6.0   Flink Kubernetes Operator   1.6.0     flink-kubernetes-operator.v1.5.0   Succeeded
```

-- This message was sent by Atlassian Jira (v8.20.10#820010)
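The two install errors above come from pod-spec validation: volume names must be unique, and every volumeMount must reference a declared volume. A rough sketch of those checks (this is not the actual Kubernetes validator, and the "flink-operator-config-volume" name is an illustrative assumption):

```python
# Sketch of the two pod-spec checks that produced the CSV install errors:
# (1) duplicate volume names, (2) volumeMounts referencing undeclared volumes.
def validate_pod_spec(volumes, volume_mounts):
    errors = []
    seen = set()
    for v in volumes:
        if v["name"] in seen:
            errors.append(f'Duplicate value: "{v["name"]}"')
        seen.add(v["name"])
    for m in volume_mounts:
        if m["name"] not in seen:
            errors.append(f'Not found: "{m["name"]}"')
    return errors

# Mirrors the broken CSV: "keystore" is declared twice, while a mount points
# at "flink-artifacts-volume", which is never declared.
errs = validate_pod_spec(
    volumes=[{"name": "flink-operator-config-volume"},  # illustrative name
             {"name": "keystore"},
             {"name": "keystore"}],
    volume_mounts=[{"name": "flink-operator-config-volume"},
                   {"name": "flink-artifacts-volume"}],
)
```

The yq workaround above resolves both checks at once: writing index [1] declares the missing `flink-artifacts-volume`, and overwriting index [2] removes the duplicate `keystore` entry.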
[jira] [Created] (FLINK-32103) RBAC flinkdeployments/finalizers missing for OpenShift Deployment
James Busche created FLINK-32103:

Summary: RBAC flinkdeployments/finalizers missing for OpenShift Deployment
Key: FLINK-32103
URL: https://issues.apache.org/jira/browse/FLINK-32103
Project: Flink
Issue Type: Bug
Components: Kubernetes Operator
Affects Versions: kubernetes-operator-1.5.0
Reporter: James Busche

On OpenShift 4.10 and above, I'm noticing an issue with flinkdeployments in the Flink 1.5.0 RC release. Flinkdeployments get stuck in UPGRADING:

```
oc get flinkdep
NAME            JOB STATUS   LIFECYCLE STATE
basic-example                UPGRADING
```

The error message looks like:

```
oc describe flinkdep basic-example
Error: {"type":"org.apache.flink.kubernetes.operator.exception.ReconciliationException","message":"org.apache.flink.client.deployment.ClusterDeploymentException: Could not create Kubernetes cluster \"basic-example\".","throwableList":[{"type":"org.apache.flink.client.deployment.ClusterDeploymentException","message":"Could not create Kubernetes cluster \"basic-example\"."},{"type":"org.apache.flink.kubernetes.shaded.io.fabric8.kubernetes.client.KubernetesClientException","message":"Failure executing: POST at: https://172.30.0.1/apis/apps/v1/namespaces/default/deployments. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. deployments.apps \"basic-example\" is forbidden: cannot set blockOwnerDeletion if an ownerReference refers to a resource you can't set finalizers on: , ."}]}
Job Manager Deployment Status: MISSING
```

The solution is to fix it in the rbac.yaml of the helm template, adding a "- flinkdeployments/finalizers" line to the flink.apache.org apiGroup. If the operator is already running and flinkdeployments are having trouble on OpenShift, you can manually edit the flink-kubernetes-operator.v1.5.0 clusterrole and add "- flinkdeployments/finalizers" to the flink.apache.org apiGroup. I'll create a PR that addresses this.
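The rbac.yaml change described above would look roughly like this; the surrounding rule layout and the other resource entries are illustrative, not copied from the chart:

```yaml
# Sketch of the helm rbac.yaml rule for the flink.apache.org apiGroup
# (resource list besides the added line is illustrative)
- apiGroups:
    - flink.apache.org
  resources:
    - flinkdeployments
    - flinkdeployments/status
    - flinkdeployments/finalizers   # <- the missing entry
  verbs:
    - "*"
```

Without the finalizers subresource permission, the API server rejects any create that sets blockOwnerDeletion on an ownerReference, which is exactly the Forbidden error shown in the logs.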
[jira] [Created] (FLINK-31982) Build image from source Dockerfile error in main
James Busche created FLINK-31982:

Summary: Build image from source Dockerfile error in main
Key: FLINK-31982
URL: https://issues.apache.org/jira/browse/FLINK-31982
Project: Flink
Issue Type: Bug
Components: Kubernetes Operator
Reporter: James Busche

I'm noticing a problem trying to build the Debian Flink Operator image from the Dockerfile in the main branch:

```
podman build -f Dockerfile -t debian-release:1.5.0-rc1
[INFO] Compiling 9 source files to /app/flink-kubernetes-operator-autoscaler/target/test-classes
[ERROR] COMPILATION ERROR :
[INFO] -
[ERROR] /app/flink-kubernetes-operator-autoscaler/src/test/java/org/apache/flink/kubernetes/operator/autoscaler/ScalingMetricEvaluatorTest.java:[59,8] error while writing org.apache.flink.kubernetes.operator.autoscaler.ScalingMetricEvaluatorTest: /app/flink-kubernetes-operator-autoscaler/target/test-classes/org/apache/flink/kubernetes/operator/autoscaler/ScalingMetricEvaluatorTest.class: Too many open files
[ERROR] /app/flink-kubernetes-operator-autoscaler/src/test/java/org/apache/flink/kubernetes/operator/autoscaler/JobVertexScalerTest.java:[78,29] cannot access org.apache.flink.kubernetes.operator.autoscaler.ScalingSummary
  bad class file: /app/flink-kubernetes-operator-autoscaler/target/classes/org/apache/flink/kubernetes/operator/autoscaler/ScalingSummary.class
  unable to access file: java.nio.file.FileSystemException: /app/flink-kubernetes-operator-autoscaler/target/classes/org/apache/flink/kubernetes/operator/autoscaler/ScalingSummary.class: Too many open files
  Please remove or make sure it appears in the correct subdirectory of the classpath.
[ERROR] /app/flink-kubernetes-operator-autoscaler/src/test/java/org/apache/flink/kubernetes/operator/autoscaler/JobVertexScalerTest.java:[84,29] incompatible types: inferred type does not conform to equality constraint(s)
```

I've tried increasing my nofiles limit to unlimited, but I still see the error. Release 1.4.0 builds fine, so I'm not certain what has recently changed in 1.5.0. Maybe it builds fine in Docker instead of podman?
[jira] [Created] (FLINK-30577) OpenShift FlinkSessionJob artifact write error on non-default namespaces
James Busche created FLINK-30577:

Summary: OpenShift FlinkSessionJob artifact write error on non-default namespaces
Key: FLINK-30577
URL: https://issues.apache.org/jira/browse/FLINK-30577
Project: Flink
Issue Type: Bug
Components: Kubernetes Operator
Affects Versions: kubernetes-operator-1.3.0
Reporter: James Busche

[~tagarr] has pointed out an issue with using the /opt/flink/artifacts filesystem on OpenShift in non-default namespaces. The OpenShift permissions don't allow writes to /opt:

```
org.apache.flink.util.FlinkRuntimeException: Failed to create the dir: /opt/flink/artifacts/jim/basic-session-deployment-only-example/basic-session-job-only-example
```

A few ways to solve the problem:

1. Uncomment line 34 in [flink-conf.yaml|https://github.com/apache/flink-kubernetes-operator/blob/main/helm/flink-kubernetes-operator/conf/flink-conf.yaml#L34] and change the value to /tmp/flink/artifacts

2. Append kubernetes.operator.user.artifacts.base.dir: /tmp/flink/artifacts after line 143 in [values.yaml|https://github.com/apache/flink-kubernetes-operator/blob/main/helm/flink-kubernetes-operator/values.yaml#L142]

3. Change line 142 of [KubernetesOperatorConfigOptions.java|https://github.com/apache/flink-kubernetes-operator/blob/main/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/config/KubernetesOperatorConfigOptions.java#L142] to .defaultValue("/tmp/flink/artifacts") and rebuild the operator image.
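Option 2 above would look roughly like this in the chart's values.yaml; the surrounding defaultConfiguration structure is an assumption about the chart layout, only the kubernetes.operator.user.artifacts.base.dir key comes from the issue:

```yaml
# helm/flink-kubernetes-operator/values.yaml (sketch; surrounding
# structure is assumed, only the artifacts key is from the issue)
defaultConfiguration:
  create: true
  append: true
  flink-conf.yaml: |+
    kubernetes.operator.user.artifacts.base.dir: /tmp/flink/artifacts
```

/tmp is chosen because it is writable under OpenShift's restricted security context, unlike /opt.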
[jira] [Created] (FLINK-30456) OLM Bundle Description Version Problems
James Busche created FLINK-30456:

Summary: OLM Bundle Description Version Problems
Key: FLINK-30456
URL: https://issues.apache.org/jira/browse/FLINK-30456
Project: Flink
Issue Type: Bug
Components: Kubernetes Operator
Affects Versions: kubernetes-operator-1.3.0
Reporter: James Busche

OLM is working great with OperatorHub, but I noticed a few items that need fixing:

1. The basic.yaml example version is release-1.1 instead of the latest release (release-1.3). This needs fixing in two places: tools/olm/generate-olm-bundle.sh and tools/olm/docker-entry.sh.

2. The label versions in the description are hardcoded to 1.2.0 instead of the latest release (1.3.0).

3. The Provider is a blank space (" ") but soon needs some text in it to avoid CI errors with the latest operator-sdk versions. The person who noticed it recommended "Community" for now, but maybe we can begin making it "The Apache Software Foundation" now? Not sure if we're ready for that yet or not.

I'm working on a PR to address these items. Can you assign the issue to me? Thanks! fyi [~tedchang] [~gyfora]
[jira] [Created] (FLINK-29853) Older jackson-databind found in flink-kubernetes-operator-1.2.0-shaded.jar
James Busche created FLINK-29853:

Summary: Older jackson-databind found in flink-kubernetes-operator-1.2.0-shaded.jar
Key: FLINK-29853
URL: https://issues.apache.org/jira/browse/FLINK-29853
Project: Flink
Issue Type: Bug
Components: Kubernetes Operator
Affects Versions: kubernetes-operator-1.2.1
Reporter: James Busche

A Twistlock security scan of the existing 1.2.0 operator as well as the current main branch shows a high vulnerability in the current jackson-databind version:

```
severity: High
cvss: 7.5
riskFactors: Attack complexity: low, Attack vector: network, Has fix, High severity, Recent vulnerability
CVE link: https://nvd.nist.gov/vuln/detail/CVE-2022-42003
packageName: com.fasterxml.jackson.core_jackson-databind
packagePath: /flink-kubernetes-operator/flink-kubernetes-operator-1.2.0-shaded.jar and/or /flink-kubernetes-operator/flink-kubernetes-operator-1.3-SNAPSHOT-shaded.jar
description: In FasterXML jackson-databind before 2.14.0-rc1, resource exhaustion can occur because of a lack of a check in primitive value deserializers to avoid deep wrapper array nesting, when the UNWRAP_SINGLE_VALUE_ARRAYS feature is enabled. Additional fix version in 2.13.4.1 and 2.12.17.1
```

This is exactly like the older issue https://issues.apache.org/jira/browse/FLINK-27654. I'm going to see if I can fix it myself and create a PR if I'm successful.
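One common way to force a patched jackson-databind into a shaded jar is a Maven dependencyManagement pin; a sketch only, since the operator project may manage this version through a parent BOM or property instead (module placement here is an assumption, the 2.13.4.1 fix version is from the CVE description above):

```xml
<!-- pom.xml sketch: pin jackson-databind so the shade plugin
     packages the patched release instead of the transitive one -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>com.fasterxml.jackson.core</groupId>
      <artifactId>jackson-databind</artifactId>
      <version>2.13.4.1</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```

A pin like this overrides whatever version transitive dependencies pull in, which is typically how an old jackson-databind ends up inside a shaded artifact.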
[jira] [Created] (FLINK-29384) snakeyaml version 1.30 in flink-kubernetes-operator-1.2-SNAPSHOT-shaded.jar has vulnerabilities
James Busche created FLINK-29384:

Summary: snakeyaml version 1.30 in flink-kubernetes-operator-1.2-SNAPSHOT-shaded.jar has vulnerabilities
Key: FLINK-29384
URL: https://issues.apache.org/jira/browse/FLINK-29384
Project: Flink
Issue Type: Bug
Components: Kubernetes Operator
Affects Versions: kubernetes-operator-1.2.0
Reporter: James Busche

I did a Twistlock scan of the current operator image from main, and it looks good except for flink-kubernetes-operator-1.2-SNAPSHOT-shaded.jar, where I'm seeing 5 CVEs on snakeyaml. It looks like updating from 1.30 to 1.32 should fix it, but I'm not sure how to bump that up, other than the [NOTICES|https://github.com/apache/flink-kubernetes-operator/blob/main/flink-kubernetes-operator/src/main/resources/META-INF/NOTICE#L65] entry. The 5 CVEs are:

https://nvd.nist.gov/vuln/detail/CVE-2022-25857
https://nvd.nist.gov/vuln/detail/CVE-2022-38749
https://nvd.nist.gov/vuln/detail/CVE-2022-38751
https://nvd.nist.gov/vuln/detail/CVE-2022-38750
https://nvd.nist.gov/vuln/detail/CVE-2022-38752

Resulting in 1 High (CVSS 7.5) and 4 Mediums (CVSS 6.5, 6.5, 5.5, 4).
[jira] [Created] (FLINK-28637) High vulnerability in flink-kubernetes-operator-1.1.0-shaded.jar
James Busche created FLINK-28637:

Summary: High vulnerability in flink-kubernetes-operator-1.1.0-shaded.jar
Key: FLINK-28637
URL: https://issues.apache.org/jira/browse/FLINK-28637
Project: Flink
Issue Type: Bug
Components: Kubernetes Operator
Affects Versions: kubernetes-operator-1.1.0
Reporter: James Busche

I noticed a high vulnerability in the flink-kubernetes-operator-1.1.0-shaded.jar file:

```
cvss: 7.5
riskFactors: Has fix, High severity
cve: PRISMA-2022-0239
link: https://github.com/square/okhttp/issues/6738
status: fixed in 4.9.2
packagePath: /flink-kubernetes-operator/flink-kubernetes-operator-1.1.0-shaded.jar
description: com.squareup.okhttp3_okhttp packages prior to version 4.9.2 are vulnerable to sensitive information disclosure. An illegal character in a header value will cause an IllegalArgumentException which will include the full header value. This applies to the Authorization, Cookie, Proxy-Authorization and Set-Cookie headers.
```

It looks like we're using version 3.12.12, and there are no plans to provide this fix for the 3.x line.
[jira] [Created] (FLINK-27923) Typo fix for release-1.0.0 quick-start.md
James Busche created FLINK-27923:

Summary: Typo fix for release-1.0.0 quick-start.md
Key: FLINK-27923
URL: https://issues.apache.org/jira/browse/FLINK-27923
Project: Flink
Issue Type: Bug
Components: Kubernetes Operator
Affects Versions: kubernetes-operator-1.0.0
Reporter: James Busche
Fix For: kubernetes-operator-1.0.0

Noticed a typo while deploying the example. Currently:

```
kubectl create -f https://raw.githubusercontent.com/apache/flink-kubernetes-operator/release-0.1/examples/basic.yaml
```

It should be:

```
kubectl create -f https://raw.githubusercontent.com/apache/flink-kubernetes-operator/release-1.0.0/examples/basic.yaml
```
[jira] [Created] (FLINK-27728) dockerFile build results in five vulnerabilities
James Busche created FLINK-27728:

Summary: dockerFile build results in five vulnerabilities
Key: FLINK-27728
URL: https://issues.apache.org/jira/browse/FLINK-27728
Project: Flink
Issue Type: Bug
Components: Kubernetes Operator
Affects Versions: kubernetes-operator-0.1.0
Reporter: James Busche
Fix For: kubernetes-operator-1.0.0

A Twistlock security scan of the default flink-kubernetes-operator currently shows five fixable vulnerabilities. One of them [~wangyang0918] and I are trying to fix in [FLINK-27654|https://issues.apache.org/jira/browse/FLINK-27654]. The other four are easily addressable if we update the underlying OS. I'll propose a PR for this later this evening. The four vulnerabilities are:

1. packageName: gzip
   severity: Low
   cvss: 0
   riskFactors: Has fix, Recent vulnerability
   CVE Link: https://security-tracker.debian.org/tracker/CVE-2022-1271
   Description: DOCUMENTATION: No description is available for this CVE. STATEMENT: This bug was introduced in gzip-1.3.10 and is relatively hard to exploit. Red Hat Enterprise Linux 6 was affected but Out of Support Cycle because gzip was not listed in the Red Hat Enterprise Linux 6 ELS Inclusion List. https://access.redhat.com/articles/4997301 MITIGATION: Red Hat has investigated whether possible mitigation exists for this issue, and has not been able to identify a practical example. Please update the affected package as soon as possible.

2. packageName: openssl
   severity: Critical
   cvss: 9.8
   riskFactors: Attack complexity: low, Attack vector: network, Critical severity, Has fix, Recent vulnerability
   CVE Link: https://security-tracker.debian.org/tracker/CVE-2022-1292
   Description: The c_rehash script does not properly sanitise shell metacharacters to prevent command injection. This script is distributed by some operating systems in a manner where it is automatically executed. On such operating systems, an attacker could execute arbitrary commands with the privileges of the script. Use of the c_rehash script is considered obsolete and should be replaced by the OpenSSL rehash command line tool. Fixed in OpenSSL 3.0.3 (Affected 3.0.0-3.0.2). Fixed in OpenSSL 1.1.1o (Affected 1.1.1-1.1.1n). Fixed in OpenSSL 1.0.2ze (Affected 1.0.2-1.0.2zd).

3. packageName: zlib
   severity: High
   cvss: 7.5
   riskFactors: Attack complexity: low, Attack vector: network, Has fix, High severity
   CVE Link: https://security-tracker.debian.org/tracker/CVE-2018-25032
   Description: zlib before 1.2.12 allows memory corruption when deflating (i.e., when compressing) if the input has many distant matches.

4. packageName: openldap
   severity: Critical
   cvss: 9.8
   riskFactors: Attack complexity: low, Attack vector: network, Critical severity, Has fix, Recent vulnerability
   CVE Link: https://security-tracker.debian.org/tracker/CVE-2022-29155
   Description: In OpenLDAP 2.x before 2.5.12 and 2.6.x before 2.6.2, a SQL injection vulnerability exists in the experimental back-sql backend to slapd, via a SQL statement within an LDAP query. This can occur during an LDAP search operation when the search filter is processed, due to a lack of proper escaping.
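The usual remediation for the four OS-level CVEs is to refresh the base image's packages during the build; a hedged sketch, where the base image name is illustrative and not necessarily the one the operator Dockerfile uses:

```dockerfile
# Sketch: pull in the patched gzip/openssl/zlib/openldap packages at
# build time (base image below is illustrative)
FROM openjdk:11-jre
RUN apt-get update \
    && apt-get upgrade -y \
    && rm -rf /var/lib/apt/lists/*
```

Rebuilding against a newer base image tag achieves the same result once the distribution publishes the fixed packages.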
[jira] [Created] (FLINK-27654) Older jackson-databind found in /flink-kubernetes-shaded-1.0-SNAPSHOT.jar
James Busche created FLINK-27654:

Summary: Older jackson-databind found in /flink-kubernetes-shaded-1.0-SNAPSHOT.jar
Key: FLINK-27654
URL: https://issues.apache.org/jira/browse/FLINK-27654
Project: Flink
Issue Type: Bug
Components: Kubernetes Operator
Affects Versions: kubernetes-operator-0.1.0
Reporter: James Busche

A Twistlock security scan of the latest Flink Kubernetes Operator is showing an older version of jackson-databind in the /flink-kubernetes-shaded-1.0-SNAPSHOT.jar file. I don't know how to control/update the contents of this snapshot file. I see this in the report (otherwise, everything else looks good!):

```
severity: High
cvss: 7.5
riskFactors: Attack complexity: low, Attack vector: network, DoS, Has fix, High severity
cve: CVE-2020-36518
Link: https://nvd.nist.gov/vuln/detail/CVE-2020-36518
packageName: com.fasterxml.jackson.core_jackson-databind
packagePath: /flink-kubernetes-operator-1.0-SNAPSHOT-shaded.jar
description: jackson-databind before 2.13.0 allows a Java StackOverflow exception and denial of service via a large depth of nested objects.
```

I'd be glad to try to fix it; I'm just not sure how the jackson-databind versions are controlled in this /flink-kubernetes-operator-1.0-SNAPSHOT-shaded.jar.
[jira] [Created] (FLINK-27211) RBAC deployments/finalizers missing for OpenShift Deployment
James Busche created FLINK-27211:

Summary: RBAC deployments/finalizers missing for OpenShift Deployment
Key: FLINK-27211
URL: https://issues.apache.org/jira/browse/FLINK-27211
Project: Flink
Issue Type: Bug
Components: Kubernetes Operator
Affects Versions: kubernetes-operator-0.1.0
Reporter: James Busche

On OpenShift 4.8, when applying the basic.yaml, we see in the operator logs:

```
2022-04-12 23:11:56,290 i.j.o.p.e.ReconciliationDispatcher [ERROR][default/basic-example] Error during event processing ExecutionScope{ resource id: CustomResourceID{name='basic-example', namespace='default'}, version: 680939} failed.
org.apache.flink.kubernetes.operator.exception.ReconciliationException: org.apache.flink.client.deployment.ClusterDeploymentException: Could not create Kubernetes cluster "basic-example".
...
Caused by: org.apache.flink.kubernetes.shaded.io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST at: https://172.30.0.1/api/v1/namespaces/default/services. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. services "basic-example" is forbidden: cannot set blockOwnerDeletion if an ownerReference refers to a resource you can't set finalizers on: , .
```

This can be fixed manually by adding "- deployments/finalizers" to the flink role under the apps apiGroups, and adding "- deployments/finalizers" to the flink-operator clusterrole under the apps apiGroups.
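The manual fix described above would look roughly like this in both the flink role and the flink-operator clusterrole; the surrounding rule layout is illustrative, only the added resource line is from the issue:

```yaml
# Sketch of the apps apiGroup rule after the manual edit
# (resource list besides the added line is illustrative)
- apiGroups:
    - apps
  resources:
    - deployments
    - deployments/finalizers   # <- the missing entry
  verbs:
    - "*"
```

As with the later FLINK-32103, the finalizers subresource permission is what lets the service account set blockOwnerDeletion on ownerReferences, so adding it clears the Forbidden error.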