[jira] [Created] (FLINK-32886) Issue with volumeMounts when creating OLM for Flink Operator 1.6.0

2023-08-16 Thread James Busche (Jira)
James Busche created FLINK-32886:


 Summary: Issue with volumeMounts when creating OLM for Flink Operator 1.6.0
 Key: FLINK-32886
 URL: https://issues.apache.org/jira/browse/FLINK-32886
 Project: Flink
  Issue Type: Bug
  Components: Kubernetes Operator
Affects Versions: kubernetes-operator-1.6.0
Reporter: James Busche


I noticed a volumeMounts problem when trying to deploy the OLM CSV for the 1.6.0 
Flink Kubernetes Operator, following the directions from [OLM Verification of 
a Flink Kubernetes Operator 
Release|https://cwiki.apache.org/confluence/display/FLINK/OLM+Verification+of+a+Flink+Kubernetes+Operator+Release].

 

oc describe csv

...

Warning  InstallComponentFailed  46s (x7 over 49s)  operator-lifecycle-manager  install strategy failed: Deployment.apps "flink-kubernetes-operator" is invalid: [spec.template.spec.volumes[2].name: Duplicate value: "keystore", spec.template.spec.containers[0].volumeMounts[1].name: Not found: "flink-artifacts-volume"]

 

My current workaround is to change [line 
88|https://github.com/apache/flink-kubernetes-operator/blob/main/tools/olm/docker-entry.sh#L88]
 to look like this:

 

yq ea -i '.spec.install.spec.deployments[0].spec.template.spec.volumes[1] = {"name": "flink-artifacts-volume","emptyDir": {}}' "${CSV_FILE}"
yq ea -i '.spec.install.spec.deployments[0].spec.template.spec.volumes[2] = {"name": "keystore","emptyDir": {}}' "${CSV_FILE}"

 

And then the operator deploys without error:

oc get csv
NAME                               DISPLAY                     VERSION   REPLACES                           PHASE
flink-kubernetes-operator.v1.6.0   Flink Kubernetes Operator   1.6.0     flink-kubernetes-operator.v1.5.0   Succeeded
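
For reference, a minimal sketch of the volumes list that the two yq assignments in the workaround aim to produce in the CSV's deployment spec. Only the flink-artifacts-volume and keystore entries come from this report. The entry at index 0 is not shown in the report, so <existing-volume-0> is a hypothetical placeholder for whatever the CSV already defines there.

```yaml
spec:
  install:
    spec:
      deployments:
        - spec:
            template:
              spec:
                volumes:
                  # volumes[0]: hypothetical placeholder, untouched by the workaround
                  - name: <existing-volume-0>
                  # volumes[1]: set by the first yq command
                  - name: flink-artifacts-volume
                    emptyDir: {}
                  # volumes[2]: set by the second yq command
                  - name: keystore
                    emptyDir: {}
```

With distinct names at indexes 1 and 2, the duplicate "keystore" error and the missing "flink-artifacts-volume" error should no longer be triggered.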



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (FLINK-32103) RBAC flinkdeployments/finalizers missing for OpenShift Deployment

2023-05-15 Thread James Busche (Jira)
James Busche created FLINK-32103:


 Summary: RBAC flinkdeployments/finalizers missing for OpenShift Deployment
 Key: FLINK-32103
 URL: https://issues.apache.org/jira/browse/FLINK-32103
 Project: Flink
  Issue Type: Bug
  Components: Kubernetes Operator
Affects Versions: kubernetes-operator-1.5.0
Reporter: James Busche


On OpenShift 4.10 and above, I'm noticing an issue with flinkdeployments in the 
Flink Kubernetes Operator 1.5.0 RC release.  Flinkdeployments are stuck in the 
UPGRADING lifecycle state:
{quote}oc get flinkdep

NAME                                    JOB STATUS   LIFECYCLE STATE

basic-example                                        UPGRADING
{quote}
 

The error message looks like:
{quote}oc describe flinkdep basic-example



Error:
{"type":"org.apache.flink.kubernetes.operator.exception.ReconciliationException","message":"org.apache.flink.client.deployment.ClusterDeploymentException: Could not create Kubernetes cluster \"basic-example\".","throwableList":[{"type":"org.apache.flink.client.deployment.ClusterDeploymentException","message":"Could not create Kubernetes cluster \"basic-example\"."},{"type":"org.apache.flink.kubernetes.shaded.io.fabric8.kubernetes.client.KubernetesClientException","message":"Failure executing: POST at: https://172.30.0.1/apis/apps/v1/namespaces/default/deployments. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. deployments.apps \"basic-example\" is forbidden: cannot set blockOwnerDeletion if an ownerReference refers to a resource you can't set finalizers on: , ."}]}

 

 Job Manager Deployment Status:  MISSING
{quote}
 

The fix is to add a "- flinkdeployments/finalizers" entry to the resources of 
the flink.apache.org apiGroup in the rbac.yaml of the helm template.

 

If the Operator is already running and flinkdeployments are having trouble on 
OpenShift, the same "- flinkdeployments/finalizers" entry can be added manually 
to the flink-kubernetes-operator.v1.5.0 clusterrole, under the flink.apache.org 
apiGroup.

 

I'll create a PR that addresses this.
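
As a sketch of the change described above, the flink.apache.org rule in rbac.yaml (or in the clusterrole) might look like the following once the finalizers entry is added. Only the flinkdeployments/finalizers line comes from this report. The neighbouring resources and the verbs are assumptions for illustration and may not match the actual chart.

```yaml
rules:
  - apiGroups:
      - flink.apache.org
    resources:
      - flinkdeployments             # assumed existing entry
      - flinkdeployments/status      # assumed existing entry
      - flinkdeployments/finalizers  # the entry this report adds
    verbs:
      - "*"                          # assumed; keep whatever the role already grants
```

The error text suggests that the operator needs permission on the finalizers subresource of the owning flinkdeployment before it can set blockOwnerDeletion on the Deployment it creates.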





[jira] [Created] (FLINK-31982) Build image from source Dockerfile error in main

2023-05-02 Thread James Busche (Jira)
James Busche created FLINK-31982:


 Summary: Build image from source Dockerfile error in main
 Key: FLINK-31982
 URL: https://issues.apache.org/jira/browse/FLINK-31982
 Project: Flink
  Issue Type: Bug
  Components: Kubernetes Operator
Reporter: James Busche


I'm noticing a problem trying to build the Debian Flink Operator image from the 
Dockerfile in the main branch.

 

podman build -f Dockerfile -t debian-release:1.5.0-rc1



[INFO] Compiling 9 source files to /app/flink-kubernetes-operator-autoscaler/target/test-classes

[ERROR] COMPILATION ERROR : 

[INFO] -

[ERROR] /app/flink-kubernetes-operator-autoscaler/src/test/java/org/apache/flink/kubernetes/operator/autoscaler/ScalingMetricEvaluatorTest.java:[59,8] error while writing org.apache.flink.kubernetes.operator.autoscaler.ScalingMetricEvaluatorTest: /app/flink-kubernetes-operator-autoscaler/target/test-classes/org/apache/flink/kubernetes/operator/autoscaler/ScalingMetricEvaluatorTest.class: Too many open files

[ERROR] /app/flink-kubernetes-operator-autoscaler/src/test/java/org/apache/flink/kubernetes/operator/autoscaler/JobVertexScalerTest.java:[78,29] cannot access org.apache.flink.kubernetes.operator.autoscaler.ScalingSummary

  bad class file: /app/flink-kubernetes-operator-autoscaler/target/classes/org/apache/flink/kubernetes/operator/autoscaler/ScalingSummary.class

    unable to access file: java.nio.file.FileSystemException: /app/flink-kubernetes-operator-autoscaler/target/classes/org/apache/flink/kubernetes/operator/autoscaler/ScalingSummary.class: Too many open files

    Please remove or make sure it appears in the correct subdirectory of the classpath.

[ERROR] /app/flink-kubernetes-operator-autoscaler/src/test/java/org/apache/flink/kubernetes/operator/autoscaler/JobVertexScalerTest.java:[84,29] incompatible types: inferred type does not conform to equality constraint(s)

 

I've tried raising my open-file limit (nofile) to unlimited, but I still see the 
error.

Release 1.4.0 builds fine, so I'm not certain what has recently changed in 
1.5.0. Maybe it builds fine in Docker instead of podman?





[jira] [Created] (FLINK-30577) OpenShift FlinkSessionJob artifact write error on non-default namespaces

2023-01-05 Thread James Busche (Jira)
James Busche created FLINK-30577:


 Summary: OpenShift FlinkSessionJob artifact write error on non-default namespaces
 Key: FLINK-30577
 URL: https://issues.apache.org/jira/browse/FLINK-30577
 Project: Flink
  Issue Type: Bug
  Components: Kubernetes Operator
Affects Versions: kubernetes-operator-1.3.0
Reporter: James Busche


[~tagarr] has pointed out an issue with using the /opt/flink/artifacts 
directory on OpenShift in non-default namespaces.  The OpenShift permissions 
don't allow writes to /opt.  
```
org.apache.flink.util.FlinkRuntimeException: Failed to create the dir: 
/opt/flink/artifacts/jim/basic-session-deployment-only-example/basic-session-job-only-example
```
A few ways to solve the problem are:
1. Uncomment line 34 of 
[flink-conf.yaml|https://github.com/apache/flink-kubernetes-operator/blob/main/helm/flink-kubernetes-operator/conf/flink-conf.yaml#L34]
 and change the value to /tmp/flink/artifacts.

2. Append this after line 143 here in 
[values.yaml|https://github.com/apache/flink-kubernetes-operator/blob/main/helm/flink-kubernetes-operator/values.yaml#L142]:
kubernetes.operator.user.artifacts.base.dir: /tmp/flink/artifacts

3. Change the default value on line 142 of 
[KubernetesOperatorConfigOptions.java|https://github.com/apache/flink-kubernetes-operator/blob/main/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/config/KubernetesOperatorConfigOptions.java#L142]
 to:
.defaultValue("/tmp/flink/artifacts") 
and then rebuild the operator image.
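
Option 2 could be sketched as the following values.yaml fragment. The kubernetes.operator.user.artifacts.base.dir setting is from this report. The surrounding defaultConfiguration and flink-conf.yaml keys are an assumption about the chart layout and should be checked against the actual file.

```yaml
defaultConfiguration:
  flink-conf.yaml: |+
    # Assumed location: appended to the operator's default Flink configuration
    kubernetes.operator.user.artifacts.base.dir: /tmp/flink/artifacts
```

The premise of all three options is the same: /tmp, unlike /opt, is expected to be writable under the OpenShift permissions described above.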








[jira] [Created] (FLINK-30456) OLM Bundle Description Version Problems

2022-12-19 Thread James Busche (Jira)
James Busche created FLINK-30456:


 Summary: OLM Bundle Description Version Problems
 Key: FLINK-30456
 URL: https://issues.apache.org/jira/browse/FLINK-30456
 Project: Flink
  Issue Type: Bug
  Components: Kubernetes Operator
Affects Versions: kubernetes-operator-1.3.0
Reporter: James Busche


OLM is working great with OperatorHub, but I noticed a few items that need fixing.

1.  The basic.yaml example version is release-1.1 instead of the latest release 
(release-1.3).  This needs fixing in two places:

tools/olm/generate-olm-bundle.sh

tools/olm/docker-entry.sh

2.  The label versions in the description are hardcoded to 1.2.0 instead of the 
latest release (1.3.0)

3. The Provider is currently a blank space " " but will soon need some text to 
avoid CI errors with the latest operator-sdk versions.  The person who noticed 
it recommended "Community" for now, but maybe we can begin making it "The 
Apache Software Foundation" now?  Not sure if we're ready for that yet or not...

 

I'm working on a PR to address these items.  Can you assign the issue to me?  
Thanks!

fyi [~tedchang] [~gyfora] 





[jira] [Created] (FLINK-29853) Older jackson-databind found in flink-kubernetes-operator-1.2.0-shaded.jar

2022-11-02 Thread James Busche (Jira)
James Busche created FLINK-29853:


 Summary: Older jackson-databind found in flink-kubernetes-operator-1.2.0-shaded.jar
 Key: FLINK-29853
 URL: https://issues.apache.org/jira/browse/FLINK-29853
 Project: Flink
  Issue Type: Bug
  Components: Kubernetes Operator
Affects Versions: kubernetes-operator-1.2.1
Reporter: James Busche


A Twistlock security scan of the existing 1.2.0 operator, as well as of the 
current main branch, shows a high-severity vulnerability in the bundled 
jackson-databind version.

==
severity: High

cvss: 7.5

riskFactors:  Attack complexity: low,Attack vector: network,Has fix,High 
severity,Recent vulnerability

CVE link: [https://nvd.nist.gov/vuln/detail/CVE-2022-42003]

packageName: com.fasterxml.jackson.core_jackson-databind

packagePath: 
/flink-kubernetes-operator/flink-kubernetes-operator-1.2.0-shaded.jar and/or 
/flink-kubernetes-operator/flink-kubernetes-operator-1.3-SNAPSHOT-shaded.jar

description: In FasterXML jackson-databind before 2.14.0-rc1, resource 
exhaustion can occur because of a lack of a check in primitive value 
deserializers to avoid deep wrapper array nesting, when the 
UNWRAP_SINGLE_VALUE_ARRAYS feature is enabled. Additional fix version in 
2.13.4.1 and 2.12.17.1



This is exactly like the older issue 
https://issues.apache.org/jira/browse/FLINK-27654 

I'm going to see if I can fix it myself and create a PR if I'm successful.





[jira] [Created] (FLINK-29384) snakeyaml version 1.30 in flink-kubernetes-operator-1.2-SNAPSHOT-shaded.jar has vulnerabilities

2022-09-21 Thread James Busche (Jira)
James Busche created FLINK-29384:


 Summary: snakeyaml version 1.30 in flink-kubernetes-operator-1.2-SNAPSHOT-shaded.jar has vulnerabilities
 Key: FLINK-29384
 URL: https://issues.apache.org/jira/browse/FLINK-29384
 Project: Flink
  Issue Type: Bug
  Components: Kubernetes Operator
Affects Versions: kubernetes-operator-1.2.0
Reporter: James Busche


I ran a Twistlock scan of the current operator image built from main. It looks 
good except for flink-kubernetes-operator-1.2-SNAPSHOT-shaded.jar, where I'm 
seeing 5 CVEs on snakeyaml.  Updating from 1.30 to 1.32 looks like it should fix 
them, but I'm not sure how to bump the version, other than in the 
[NOTICE|https://github.com/apache/flink-kubernetes-operator/blob/main/flink-kubernetes-operator/src/main/resources/META-INF/NOTICE#L65]
 entry.

The 5 CVEs are:
[https://nvd.nist.gov/vuln/detail/CVE-2022-25857]

[https://nvd.nist.gov/vuln/detail/CVE-2022-25857]

[https://nvd.nist.gov/vuln/detail/CVE-2022-38751]

[https://nvd.nist.gov/vuln/detail/CVE-2022-38750]

[https://nvd.nist.gov/vuln/detail/CVE-2022-38752]

Resulting in 1 High (CVSS 7.5) and 4 Mediums (CVSS 6.5, 6.5, 5.5, 4)





[jira] [Created] (FLINK-28637) High vulnerability in flink-kubernetes-operator-1.1.0-shaded.jar

2022-07-21 Thread James Busche (Jira)
James Busche created FLINK-28637:


 Summary: High vulnerability in flink-kubernetes-operator-1.1.0-shaded.jar
 Key: FLINK-28637
 URL: https://issues.apache.org/jira/browse/FLINK-28637
 Project: Flink
  Issue Type: Bug
  Components: Kubernetes Operator
Affects Versions: kubernetes-operator-1.1.0
Reporter: James Busche


I noticed a high-severity vulnerability in the 
flink-kubernetes-operator-1.1.0-shaded.jar file.

===

cvss: 7.5

riskFactors: Has fix,High severity

cve: PRISMA-2022-0239    

link: https://github.com/square/okhttp/issues/6738

status: fixed in 4.9.2

packagePath: 
/flink-kubernetes-operator/flink-kubernetes-operator-1.1.0-shaded.jar

description: com.squareup.okhttp3_okhttp packages prior to version 4.9.2 are 
vulnerable for sensitive information disclosure. An illegal character in a 
header value will cause IllegalArgumentException which will include full header 
value. This applies to Authorization, Cookie, Proxy-Authorization and 
Set-Cookie headers. 

===

It looks like we're using version 3.12.12, and there are no plans to provide 
this fix for the 3.x line.





[jira] [Created] (FLINK-27923) Typo fix for release-1.0.0 quick-start.md

2022-06-06 Thread James Busche (Jira)
James Busche created FLINK-27923:


 Summary: Typo fix for release-1.0.0 quick-start.md
 Key: FLINK-27923
 URL: https://issues.apache.org/jira/browse/FLINK-27923
 Project: Flink
  Issue Type: Bug
  Components: Kubernetes Operator
Affects Versions: kubernetes-operator-1.0.0
Reporter: James Busche
 Fix For: kubernetes-operator-1.0.0


Noticed a typo while deploying the example.

Currently:
kubectl create -f https://raw.githubusercontent.com/apache/flink-kubernetes-operator/release-0.1/examples/basic.yaml


Where it should be:

kubectl create -f https://raw.githubusercontent.com/apache/flink-kubernetes-operator/release-1.0.0/examples/basic.yaml





[jira] [Created] (FLINK-27728) dockerFile build results in five vulnerabilities

2022-05-20 Thread James Busche (Jira)
James Busche created FLINK-27728:


 Summary: dockerFile build results in five vulnerabilities
 Key: FLINK-27728
 URL: https://issues.apache.org/jira/browse/FLINK-27728
 Project: Flink
  Issue Type: Bug
  Components: Kubernetes Operator
Affects Versions: kubernetes-operator-0.1.0
Reporter: James Busche
 Fix For: kubernetes-operator-1.0.0


A Twistlock security scan of the default flink-kubernetes-operator currently 
shows five fixable vulnerabilities.  [~wangyang0918] and I are trying to fix one 
of them in [FLINK-27654|https://issues.apache.org/jira/browse/FLINK-27654].

The other four are easily addressable if we update the underlying OS.  I'll 
propose a PR for this later this evening.

The four vulnerabilities are: 
1.  packageName: gzip

severity: Low

cvss: 0

riskFactors: Has fix,Recent vulnerability

CVE Link:  [https://security-tracker.debian.org/tracker/CVE-2022-1271] 

Description: DOCUMENTATION: No description is available for this CVE. 
STATEMENT: This bug was introduced in gzip-1.3.10 and is relatively hard to 
exploit.  Red Hat Enterprise Linux 6 was affected but Out of Support Cycle 
because gzip was not listed in Red Hat Enterprise Linux 6 ELS Inclusion List. 
[https://access.redhat.com/articles/4997301] 
MITIGATION: Red Hat has investigated whether possible mitigation exists for 
this issue, and has not been able to identify a practical example. Please 
update the affected package as soon as possible.

2.  packageName: openssl

severity: Critical

cvss: 9.8

riskFactors: Attack complexity: low,Attack vector: network,Critical 
severity,Has fix,Recent vulnerability

CVE Link: [https://security-tracker.debian.org/tracker/CVE-2022-1292] 

Description: 

The c_rehash script does not properly sanitise shell metacharacters to prevent 
command injection. This script is distributed by some operating systems in a 
manner where it is automatically executed. On such operating systems, an 
attacker could execute arbitrary commands with the privileges of the script. 
Use of the c_rehash script is considered obsolete and should be replaced by the 
OpenSSL rehash command line tool. Fixed in OpenSSL 3.0.3 (Affected 
3.0.0,3.0.1,3.0.2). Fixed in OpenSSL 1.1.1o (Affected 1.1.1-1.1.1n). Fixed in 
OpenSSL 1.0.2ze (Affected 1.0.2-1.0.2zd).

3.  packageName: zlib

severity: High

cvss: 7.5

riskFactors: Attack complexity: low,Attack vector: network,Has fix,High severity

CVE Link: [https://security-tracker.debian.org/tracker/CVE-2018-25032] 

Description: zlib before 1.2.12 allows memory corruption when deflating (i.e., 
when compressing) if the input has many distant matches.

4.  packageName: openldap

severity: Critical

cvss: 9.8

riskFactors: Attack complexity: low,Attack vector: network,Critical 
severity,Has fix,Recent vulnerability

CVE Link: [https://security-tracker.debian.org/tracker/CVE-2022-29155] 

Description: In OpenLDAP 2.x before 2.5.12 and 2.6.x before 2.6.2, a SQL 
injection vulnerability exists in the experimental back-sql backend to slapd, 
via a SQL statement within an LDAP query. This can occur during an LDAP search 
operation when the search filter is processed, due to a lack of proper escaping.





[jira] [Created] (FLINK-27654) Older jackson-databind found in /flink-kubernetes-shaded-1.0-SNAPSHOT.jar

2022-05-16 Thread James Busche (Jira)
James Busche created FLINK-27654:


 Summary: Older jackson-databind found in /flink-kubernetes-shaded-1.0-SNAPSHOT.jar
 Key: FLINK-27654
 URL: https://issues.apache.org/jira/browse/FLINK-27654
 Project: Flink
  Issue Type: Bug
  Components: Kubernetes Operator
Affects Versions: kubernetes-operator-0.1.0
Reporter: James Busche


A Twistlock security scan of the latest Flink Kubernetes Operator shows an 
older version of jackson-databind in the 
/flink-kubernetes-shaded-1.0-SNAPSHOT.jar file.  I don't know how to 
control/update the contents of this snapshot file.  

I see this in the report (Otherwise, everything else looks good!):

==
severity: High

cvss: 7.5 

riskFactors: Attack complexity: low,Attack vector: network,DoS,Has fix,High 
severity

cve: CVE-2020-36518

Link: [https://nvd.nist.gov/vuln/detail/CVE-2020-36518]

packageName: com.fasterxml.jackson.core_jackson-databind

packagePath: /flink-kubernetes-operator-1.0-SNAPSHOT-shaded.jar

description: jackson-databind before 2.13.0 allows a Java StackOverflow 
exception and denial of service via a large depth of nested objects.

=

I'd be glad to try to fix it, but I'm not sure how the jackson-databind 
versions are controlled in this 
/flink-kubernetes-operator-1.0-SNAPSHOT-shaded.jar 





[jira] [Created] (FLINK-27211) RBAC deployments/finalizers missing for OpenShift Deployment

2022-04-12 Thread James Busche (Jira)
James Busche created FLINK-27211:


 Summary: RBAC deployments/finalizers missing for OpenShift Deployment
 Key: FLINK-27211
 URL: https://issues.apache.org/jira/browse/FLINK-27211
 Project: Flink
  Issue Type: Bug
  Components: Kubernetes Operator
Affects Versions: kubernetes-operator-0.1.0
Reporter: James Busche


On OpenShift 4.8, when applying the basic.yaml, we see the following in the operator logs:

 

{quote}2022-04-12 23:11:56,290 i.j.o.p.e.ReconciliationDispatcher [ERROR][default/basic-example] Error during event processing ExecutionScope{ resource id: CustomResourceID{name='basic-example', namespace='default'}, version: 680939} failed.

org.apache.flink.kubernetes.operator.exception.ReconciliationException: org.apache.flink.client.deployment.ClusterDeploymentException: Could not create Kubernetes cluster "basic-example".

Caused by: org.apache.flink.kubernetes.shaded.io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST at: https://172.30.0.1/api/v1/namespaces/default/services. Message: Forbidden!Configured service account doesn't have access. Service account may have been revoked. services "basic-example" is forbidden: cannot set blockOwnerDeletion if an ownerReference refers to a resource you can't set finalizers on: , .
{quote}

This can be fixed manually by adding the following entry to the flink role, 
under the apps apiGroup:

  - deployments/finalizers

 

and by adding the same entry to the flink-operator clusterrole, under the apps 
apiGroup:

  - deployments/finalizers
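
A minimal sketch of how the apps rule might look in either the flink role or the flink-operator clusterrole after the manual fix. Only the deployments/finalizers line comes from this report. The existing deployments entry and the verbs are assumptions for illustration.

```yaml
rules:
  - apiGroups:
      - apps
    resources:
      - deployments             # assumed existing entry
      - deployments/finalizers  # the entry this report adds
    verbs:
      - "*"                     # assumed; keep whatever the role already grants
```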

 


