[jira] [Commented] (FLINK-26493) Use state machine based mechanism to simply the reconciler

2022-03-04 Thread Aitozi (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17501381#comment-17501381
 ] 

Aitozi commented on FLINK-26493:


I will try to define an initial state machine transition flow that we are 
targeting, as described in the dev mail:

[https://github.com/lyft/flinkk8soperator/blob/master/docs/state_machine.md]

> Use state machine based mechanism to simply the reconciler
> --
>
> Key: FLINK-26493
> URL: https://issues.apache.org/jira/browse/FLINK-26493
> Project: Flink
>  Issue Type: Sub-task
>Reporter: Aitozi
>Priority: Major
>
> As discussed in 
> [link|https://lists.apache.org/list?d...@flink.apache.org:lte=1M:controller%20flow]
>  , we reached a consensus to use a state machine mechanism to simplify the 
> annoying if-else in the reconciler. Since the modular {{Observer}}, 
> {{Reconciler}} and {{validator}} are complete, I think we can start this 
> work now.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Comment Edited] (FLINK-26493) Use state machine based mechanism to simply the reconciler

2022-03-04 Thread Aitozi (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26493?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17501379#comment-17501379
 ] 

Aitozi edited comment on FLINK-26493 at 3/4/22, 3:22 PM:
-

Do you have any input on this, [~gyfora] [~wangyang0918] [~t...@apache.org]?


was (Author: aitozi):
Do you have any input on this, [~gyfora] [~wangyang0918]?

> Use state machine based mechanism to simply the reconciler
> --
>
> Key: FLINK-26493
> URL: https://issues.apache.org/jira/browse/FLINK-26493
> Project: Flink
>  Issue Type: Sub-task
>Reporter: Aitozi
>Priority: Major
>
> As discussed in 
> [link|https://lists.apache.org/list?d...@flink.apache.org:lte=1M:controller%20flow]
>  , we reached a consensus to use a state machine mechanism to simplify the 
> annoying if-else in the reconciler. Since the modular {{Observer}}, 
> {{Reconciler}} and {{validator}} are complete, I think we can start this 
> work now.





[jira] [Updated] (FLINK-26528) Trigger the updateControl when the FlinkDeployment have changed

2022-03-07 Thread Aitozi (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aitozi updated FLINK-26528:
---
Description: If the CR has not changed since the last reconcile, we could 
create an UpdateControl with {{UpdateControl#noUpdate}}. This is meant to 
avoid unnecessary updates to the resource.  (was: If the CR has not changed 
since the last reconcile, we could create an UpdateControl with 
{{UpdateControl#noUpdate}}. This is meant to avoid unnecessary reconciles.)

> Trigger the updateControl when the FlinkDeployment have changed
> ---
>
> Key: FLINK-26528
> URL: https://issues.apache.org/jira/browse/FLINK-26528
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kubernetes Operator
>    Reporter: Aitozi
>Priority: Major
>
> If the CR has not changed since the last reconcile, we could create an 
> UpdateControl with {{UpdateControl#noUpdate}}. This is meant to avoid 
> unnecessary updates to the resource.





[jira] [Created] (FLINK-26528) Trigger the updateControl when the FlinkDeployment have changed

2022-03-07 Thread Aitozi (Jira)
Aitozi created FLINK-26528:
--

 Summary: Trigger the updateControl when the FlinkDeployment have 
changed
 Key: FLINK-26528
 URL: https://issues.apache.org/jira/browse/FLINK-26528
 Project: Flink
  Issue Type: Sub-task
  Components: Kubernetes Operator
Reporter: Aitozi


If the CR has not changed since the last reconcile, we could create an 
UpdateControl with {{UpdateControl#noUpdate}}. This is meant to avoid 
unnecessary reconciles.





[jira] [Commented] (FLINK-26528) Trigger the updateControl when the FlinkDeployment have changed

2022-03-08 Thread Aitozi (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17503027#comment-17503027
 ] 

Aitozi commented on FLINK-26528:


I propose providing a {{ReconcileResult}} that generates the 
{{UpdateControl}} by comparing the FlinkDeployment before and after the 
reconcile handling. This requires cloning and keeping the original resource 
object at the reconcile entry point. This way, we update the resource only when 
it has changed, which avoids unnecessary updates, e.g. during state sync.
cc [~gyfora] [~wangyang0918]
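
A minimal sketch of the before/after comparison, assuming a simplified stand-in 
for the resource. {{DeploymentSnapshot}} and {{ReconcileResultSketch}} are 
hypothetical names used only for illustration; they are not classes of the 
operator or of the java-operator-sdk, and a real FlinkDeployment would need a 
proper deep copy.

```java
import java.util.Objects;

/** Hypothetical stand-in for a FlinkDeployment; only the fields needed for the comparison. */
final class DeploymentSnapshot {
    String spec;
    String status;

    DeploymentSnapshot(String spec, String status) {
        this.spec = spec;
        this.status = status;
    }

    /** Copy taken at the reconcile entry point, before any handler mutates the resource. */
    DeploymentSnapshot copy() {
        return new DeploymentSnapshot(spec, status);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof DeploymentSnapshot)) {
            return false;
        }
        DeploymentSnapshot other = (DeploymentSnapshot) o;
        return Objects.equals(spec, other.spec) && Objects.equals(status, other.status);
    }

    @Override
    public int hashCode() {
        return Objects.hash(spec, status);
    }
}

final class ReconcileResultSketch {
    /** An update is needed only if the resource differs from the copy taken at entry. */
    static boolean needsUpdate(DeploymentSnapshot before, DeploymentSnapshot after) {
        return !before.equals(after);
    }
}
```

The caller would take {{copy()}} first, run the reconcile handling on the live 
object, and only send an update when {{needsUpdate}} returns true.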

> Trigger the updateControl when the FlinkDeployment have changed
> ---
>
> Key: FLINK-26528
> URL: https://issues.apache.org/jira/browse/FLINK-26528
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kubernetes Operator
>    Reporter: Aitozi
>Priority: Major
>
> If the CR has not changed since the last reconcile, we could create an 
> UpdateControl with {{UpdateControl#noUpdate}}. This is meant to avoid 
> unnecessary updates to the resource.





[jira] [Commented] (FLINK-26538) Ability to restart deployment w/o spec change

2022-03-13 Thread Aitozi (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17505983#comment-17505983
 ] 

Aitozi commented on FLINK-26538:


[~thw] Are you planning to work on this? If not, maybe I can help out.

> Ability to restart deployment w/o spec change
> -
>
> Key: FLINK-26538
> URL: https://issues.apache.org/jira/browse/FLINK-26538
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kubernetes Operator
>Reporter: Thomas Weise
>Priority: Major
>
> Operator should allow restart of the Flink deployment w/o any other spec 
> change. This provides the escape hatch for an operator to recover a 
> deployment that has gone into a bad state (for whatever reason including 
> memory leaks, hung JVM etc.) without direct access to the k8s cluster. This 
> can be addressed by adding a restartNonce to the CRD.
>  





[jira] [Commented] (FLINK-26647) Can not add extra config files on native Kubernetes

2022-03-15 Thread Aitozi (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17507022#comment-17507022
 ] 

Aitozi commented on FLINK-26647:


It looks like approach 2 can meet your requirements. I guess the failure may be 
caused by using the wrong volume name :). It would be easier to look into if 
you could provide some more logs.

> Can not add extra config files on native Kubernetes 
> 
>
> Key: FLINK-26647
> URL: https://issues.apache.org/jira/browse/FLINK-26647
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / Kubernetes
>Affects Versions: 1.13.5
>Reporter: Zhe Wang
>Priority: Critical
>
> When using native Kubernetes mode (both session and application) with a 
> predefined FLINK_CONF_DIR environment variable that contains config files, 
> only two files ( *flink-conf.yaml and log4j-console.properties* ) are 
> populated to the configmap, which means other config files (like 
> sql-client-defaults.yaml, zoo.cfg etc.) are missing.
> I tried the following; neither worked:
> 1) After native Kubernetes startup, change both the configmap and the deployment:
>     1. Add all my config files to the configmap.
>     2. Add the config files to deployment.spec.template.spec.volumes[]
>     3. The Flink job pod fails to start (log: lost leadership)
>  
> 2) Using a *pod-template-file.taskmanager* file:
>     1. Add the config files to the created configmap.
>     2. Add my config files to the template (others can be merged by Flink, as 
> the guide says)
>     3. The Flink task pod fails to start, log: Duplicated volume name



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-26647) Can not add extra config files on native Kubernetes

2022-03-15 Thread Aitozi (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17507026#comment-17507026
 ] 

Aitozi commented on FLINK-26647:


I think we could provide a tool to ship the client config files to the config 
map directly. cc [~wangyang0918] 

> Can not add extra config files on native Kubernetes 
> 
>
> Key: FLINK-26647
> URL: https://issues.apache.org/jira/browse/FLINK-26647
> Project: Flink
>  Issue Type: Bug
>  Components: Deployment / Kubernetes
>Affects Versions: 1.13.5
>Reporter: Zhe Wang
>Priority: Critical
>
> When using native Kubernetes mode (both session and application) with a 
> predefined FLINK_CONF_DIR environment variable that contains config files, 
> only two files ( *flink-conf.yaml and log4j-console.properties* ) are 
> populated to the configmap, which means other config files (like 
> sql-client-defaults.yaml, zoo.cfg etc.) are missing.
> I tried the following; neither worked:
> 1) After native Kubernetes startup, change both the configmap and the deployment:
>     1. Add all my config files to the configmap.
>     2. Add the config files to deployment.spec.template.spec.volumes[]
>     3. The Flink job pod fails to start (log: lost leadership)
>  
> 2) Using a *pod-template-file.taskmanager* file:
>     1. Add the config files to the created configmap.
>     2. Add my config files to the template (others can be merged by Flink, as 
> the guide says)
>     3. The Flink task pod fails to start, log: Duplicated volume name



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-26538) Ability to restart deployment w/o spec change

2022-03-08 Thread Aitozi (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17503228#comment-17503228
 ] 

Aitozi commented on FLINK-26538:


Can this be achieved by setting JobState to {{SUSPENDED}} and then to 
{{RUNNING}}?

> Ability to restart deployment w/o spec change
> -
>
> Key: FLINK-26538
> URL: https://issues.apache.org/jira/browse/FLINK-26538
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kubernetes Operator
>Reporter: Thomas Weise
>Priority: Major
>
> Operator should allow restart of the Flink deployment w/o any other spec 
> change. This provides the escape hatch for an operator to recover a 
> deployment that has gone into a bad state (for whatever reason including 
> memory leaks, hung JVM etc.) without direct access to the k8s cluster. This 
> can be addressed by adding a restartNonce to the CRD.
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-26546) Extract Observer Interface

2022-03-09 Thread Aitozi (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17503535#comment-17503535
 ] 

Aitozi commented on FLINK-26546:


Got it.

> Extract Observer Interface
> --
>
> Key: FLINK-26546
> URL: https://issues.apache.org/jira/browse/FLINK-26546
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kubernetes Operator
>Reporter: Matyas Orhidi
>Assignee: Aitozi
>Priority: Major
>
> Similarly to the Reconciler Interface we should extract the Observer 
> interface.





[jira] [Commented] (FLINK-26528) Trigger the updateControl when the FlinkDeployment have changed

2022-03-09 Thread Aitozi (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17503534#comment-17503534
 ] 

Aitozi commented on FLINK-26528:


Yes, I can take this work. I have written some PoC code locally and will 
continue from your current code.

> Trigger the updateControl when the FlinkDeployment have changed
> ---
>
> Key: FLINK-26528
> URL: https://issues.apache.org/jira/browse/FLINK-26528
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kubernetes Operator
>    Reporter: Aitozi
>Priority: Major
>
> If the CR has not changed since the last reconcile, we could create an 
> UpdateControl with {{UpdateControl#noUpdate}}. This is meant to avoid 
> unnecessary updates to the resource.





[jira] [Commented] (FLINK-26719) Rethink the default reschedule reconcile loop

2022-03-18 Thread Aitozi (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17508597#comment-17508597
 ] 

Aitozi commented on FLINK-26719:


cc [~wangyang0918] [~gyfora] 

> Rethink the default reschedule reconcile loop
> -
>
> Key: FLINK-26719
> URL: https://issues.apache.org/jira/browse/FLINK-26719
> Project: Flink
>  Issue Type: Sub-task
>Reporter: Aitozi
>Priority: Major
>
> When testing locally, I found that the operator reschedules and reconciles at 
> the interval set by {{operator.reconciler.reschedule.interval.sec}}. Why do we 
> need this? I think we only need to reconcile when
>  # waiting for a status change
>  # receiving a new event
>  # waiting for a savepoint result
> So when JobManagerDeploymentStatus is Ready, we do not have to trigger a 
> reconcile unless we are waiting for a savepoint result.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-26719) Rethink the default reschedule reconcile loop

2022-03-18 Thread Aitozi (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17508658#comment-17508658
 ] 

Aitozi commented on FLINK-26719:


[~matyas] Thanks for your input. It seems we have done the same thing manually 
in 
{{org.apache.flink.kubernetes.operator.observer.JobManagerDeploymentStatus#rescheduleAfter}}
 .

[~wangyang0918], IMO we have to define a final/target status, for example {{the 
JobManager is ready to serve}}, and stop reconciling once it is reached. It is 
not common to run every loop to sync status without an end. 

> Rethink the default reschedule reconcile loop
> -
>
> Key: FLINK-26719
> URL: https://issues.apache.org/jira/browse/FLINK-26719
> Project: Flink
>  Issue Type: Sub-task
>Reporter: Aitozi
>Priority: Major
>
> When testing locally, I found that the operator reschedules and reconciles at 
> the interval set by {{operator.reconciler.reschedule.interval.sec}}. Why do we 
> need this? I think we only need to reconcile when
>  # waiting for a status change
>  # receiving a new event
>  # waiting for a savepoint result
> So when JobManagerDeploymentStatus is Ready, we do not have to trigger a 
> reconcile unless we are waiting for a savepoint result.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Comment Edited] (FLINK-26719) Rethink the default reschedule reconcile loop

2022-03-18 Thread Aitozi (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17508658#comment-17508658
 ] 

Aitozi edited comment on FLINK-26719 at 3/18/22, 8:55 AM:
--

[~matyas] Thanks for your input. It seems we have done the same thing manually 
in 
{{org.apache.flink.kubernetes.operator.observer.JobManagerDeploymentStatus#rescheduleAfter}}
 .

[~wangyang0918], IMO we have to define a final/target status, for example {{the 
JobManager is ready to serve}}, and stop reconciling once it is reached. It is 
not common to run a periodic loop to sync status without an end. 


was (Author: aitozi):
[~matyas] Thanks for your input. It seems we have done the same thing manually 
in 
{{org.apache.flink.kubernetes.operator.observer.JobManagerDeploymentStatus#rescheduleAfter}}
 .

[~wangyang0918], IMO we have to define a final/target status, for example {{the 
JobManager is ready to serve}}, and stop reconciling once it is reached. It is 
not common to run every loop to sync status without an end. 

> Rethink the default reschedule reconcile loop
> -
>
> Key: FLINK-26719
> URL: https://issues.apache.org/jira/browse/FLINK-26719
> Project: Flink
>  Issue Type: Sub-task
>Reporter: Aitozi
>Priority: Major
>
> When testing locally, I found that the operator reschedules and reconciles at 
> the interval set by {{operator.reconciler.reschedule.interval.sec}}. Why do we 
> need this? I think we only need to reconcile when
>  # waiting for a status change
>  # receiving a new event
>  # waiting for a savepoint result
> So when JobManagerDeploymentStatus is Ready, we do not have to trigger a 
> reconcile unless we are waiting for a savepoint result.





[jira] [Created] (FLINK-26719) Rethink the default reschedule reconcile loop

2022-03-18 Thread Aitozi (Jira)
Aitozi created FLINK-26719:
--

 Summary: Rethink the default reschedule reconcile loop
 Key: FLINK-26719
 URL: https://issues.apache.org/jira/browse/FLINK-26719
 Project: Flink
  Issue Type: Sub-task
Reporter: Aitozi


When testing locally, I found that the operator reschedules and reconciles at 
the interval set by {{operator.reconciler.reschedule.interval.sec}}. Why do we 
need this? I think we only need to reconcile when
 # waiting for a status change
 # receiving a new event
 # waiting for a savepoint result

So when JobManagerDeploymentStatus is Ready, we do not have to trigger a 
reconcile unless we are waiting for a savepoint result.
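
The rule above could be sketched roughly as follows. This is only an 
illustration under stated assumptions: {{RescheduleSketch}}, its {{JmStatus}} 
enum and the {{savepointInProgress}} flag are hypothetical names, not the 
operator's actual types. A new event on the resource is assumed to trigger a 
reconcile on its own, so it does not need a timer here.

```java
import java.time.Duration;
import java.util.Optional;

final class RescheduleSketch {
    /** Hypothetical, reduced set of JobManager deployment states. */
    enum JmStatus { MISSING, DEPLOYING, READY }

    /**
     * Returns the delay before the next reconcile, or empty when the loop can stop
     * and simply wait for the next event on the resource.
     */
    static Optional<Duration> rescheduleAfter(
            JmStatus status, boolean savepointInProgress, Duration interval) {
        boolean waitingForStatusChange = status != JmStatus.READY;
        if (waitingForStatusChange || savepointInProgress) {
            return Optional.of(interval);
        }
        return Optional.empty();
    }
}
```

With this shape, a Ready deployment without a pending savepoint yields an empty 
result, so no periodic reconcile is scheduled.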





[jira] [Commented] (FLINK-26538) Ability to restart deployment w/o spec change

2022-03-09 Thread Aitozi (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26538?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17503428#comment-17503428
 ] 

Aitozi commented on FLINK-26538:


For convenience, I'm +1 for this.  It looks like an {{XXNonce}} field can 
act as a {{one-shot}} command for interacting with the operator :)
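
A minimal sketch of how such a nonce could behave as a one-shot command, 
assuming the operator remembers the nonce of the last reconciled spec. The 
class name, the {{Long}} type and the idea of a stored "last reconciled" value 
are assumptions for illustration; the ticket only names a {{restartNonce}} 
field.

```java
import java.util.Objects;

final class RestartNonceSketch {
    /**
     * A restart is requested when the nonce in the spec is set and differs from the
     * nonce that was last acted upon. Re-applying the same value does nothing.
     */
    static boolean restartRequested(Long specNonce, Long lastReconciledNonce) {
        return specNonce != null && !Objects.equals(specNonce, lastReconciledNonce);
    }
}
```

After handling the restart, the operator would record the spec nonce as the 
last reconciled one, so the command fires only once per change.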

> Ability to restart deployment w/o spec change
> -
>
> Key: FLINK-26538
> URL: https://issues.apache.org/jira/browse/FLINK-26538
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kubernetes Operator
>Reporter: Thomas Weise
>Priority: Major
>
> Operator should allow restart of the Flink deployment w/o any other spec 
> change. This provides the escape hatch for an operator to recover a 
> deployment that has gone into a bad state (for whatever reason including 
> memory leaks, hung JVM etc.) without direct access to the k8s cluster. This 
> can be addressed by adding a restartNonce to the CRD.
>  





[jira] [Commented] (FLINK-26546) Extract Observer Interface

2022-03-09 Thread Aitozi (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26546?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17503438#comment-17503438
 ] 

Aitozi commented on FLINK-26546:


+1. I'm willing to take this work.

> Extract Observer Interface
> --
>
> Key: FLINK-26546
> URL: https://issues.apache.org/jira/browse/FLINK-26546
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kubernetes Operator
>Reporter: Matyas Orhidi
>Priority: Major
>
> Similarly to the Reconciler Interface we should extract the Observer 
> interface.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-26528) Trigger the updateControl when the FlinkDeployment have changed

2022-03-09 Thread Aitozi (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17503409#comment-17503409
 ] 

Aitozi commented on FLINK-26528:


It's not the same question. This ticket aims to avoid triggering an update for 
the object when it has not changed. 

> Trigger the updateControl when the FlinkDeployment have changed
> ---
>
> Key: FLINK-26528
> URL: https://issues.apache.org/jira/browse/FLINK-26528
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kubernetes Operator
>    Reporter: Aitozi
>Priority: Major
>
> If the CR has not changed since the last reconcile, we could create an 
> UpdateControl with {{UpdateControl#noUpdate}}. This is meant to avoid 
> unnecessary updates to the resource.





[jira] [Comment Edited] (FLINK-26528) Trigger the updateControl when the FlinkDeployment have changed

2022-03-09 Thread Aitozi (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17503631#comment-17503631
 ] 

Aitozi edited comment on FLINK-26528 at 3/9/22, 3:07 PM:
-

Yes, what I mean is to eliminate no-op updates sent to Kubernetes. IMO, the 
{{UpdateControl}} has four abilities:
 * update status
 * update resource
 * no update, no reschedule
 * no update, but reschedule

So if we have changed the status or resource (not recommended and not used now), 
we should apply the update. If the status has not reached the desired state 
and has not changed since the last reconcile loop, we should just reschedule and 
wait for the next turn.
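
The four outcomes listed above can be sketched as a small decision function. 
The {{ControlAction}} enum and {{UpdateDecision}} class are hypothetical 
stand-ins, not the java-operator-sdk API, and giving a resource change 
precedence over a status change is an assumption of this sketch.

```java
final class UpdateDecision {
    /** Hypothetical names for the four outcomes; not the java-operator-sdk types. */
    enum ControlAction { UPDATE_STATUS, UPDATE_RESOURCE, NO_UPDATE, NO_UPDATE_RESCHEDULE }

    static ControlAction decide(
            boolean resourceChanged, boolean statusChanged, boolean desiredStateReached) {
        if (resourceChanged) {
            return ControlAction.UPDATE_RESOURCE;
        }
        if (statusChanged) {
            return ControlAction.UPDATE_STATUS;
        }
        // Nothing changed: stop if the desired state is reached, otherwise wait for the next turn.
        return desiredStateReached
                ? ControlAction.NO_UPDATE
                : ControlAction.NO_UPDATE_RESCHEDULE;
    }
}
```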


was (Author: aitozi):
Yes, what I mean is to eliminate no-op updates sent to Kubernetes. IMO, the 
{{UpdateControl}} has four abilities:
 * update status
 * update resource
 * no update, no reschedule
 * no update, but reschedule

So if we have changed the status or resource (not recommended and not used now), 
we should apply the update. If the status has not reached the desired state 
and has not changed since the last reconcile loop, we should just reschedule and 
wait for the next turn.

> Trigger the updateControl when the FlinkDeployment have changed
> ---
>
> Key: FLINK-26528
> URL: https://issues.apache.org/jira/browse/FLINK-26528
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kubernetes Operator
>    Reporter: Aitozi
>Priority: Major
>
> If the CR has not changed since the last reconcile, we could create an 
> UpdateControl with {{UpdateControl#noUpdate}}. This is meant to avoid 
> unnecessary updates to the resource.





[jira] [Commented] (FLINK-26528) Trigger the updateControl when the FlinkDeployment have changed

2022-03-09 Thread Aitozi (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17503631#comment-17503631
 ] 

Aitozi commented on FLINK-26528:


Yes, what I mean is to eliminate no-op updates sent to Kubernetes. IMO, the 
{{UpdateControl}} has four abilities:
 * update status
 * update resource
 * no update, no reschedule
 * no update, but reschedule

So if we have changed the status or resource (not recommended and not used now), 
we should apply the update. If the status has not reached the desired state 
and has not changed since the last reconcile loop, we should just reschedule and 
wait for the next turn.

> Trigger the updateControl when the FlinkDeployment have changed
> ---
>
> Key: FLINK-26528
> URL: https://issues.apache.org/jira/browse/FLINK-26528
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kubernetes Operator
>    Reporter: Aitozi
>Priority: Major
>
> If the CR has not changed since the last reconcile, we could create an 
> UpdateControl with {{UpdateControl#noUpdate}}. This is meant to avoid 
> unnecessary updates to the resource.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-26136) Implement shared validation logic for FlinkDeployment objects

2022-02-23 Thread Aitozi (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17497202#comment-17497202
 ] 

Aitozi commented on FLINK-26136:


Got it, thanks.

> Implement shared validation logic for FlinkDeployment objects
> -
>
> Key: FLINK-26136
> URL: https://issues.apache.org/jira/browse/FLINK-26136
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kubernetes Operator
>Reporter: Gyula Fora
>Assignee: Gyula Fora
>Priority: Major
>
> At the moment there is only a very basic “placeholder” validation logic 
> implemented in the webhook module: 
> org.apache.flink.kubernetes.operator.admission.FlinkDeploymentValidator
> We should aim to validate parts of the FlinkDeployment that can be done 
> upfront, things like most common Flink config options, parallelism, resources 
> etc.
> As described in https://issues.apache.org/jira/browse/FLINK-26135 this 
> validation should be part of the flink-kubernetes-operator module.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Closed] (FLINK-26339) Introduce the webhook config to free the environment options

2022-02-23 Thread Aitozi (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aitozi closed FLINK-26339.
--
Resolution: Not A Problem

> Introduce the webhook config to free the environment options
> 
>
> Key: FLINK-26339
> URL: https://issues.apache.org/jira/browse/FLINK-26339
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kubernetes Operator
>    Reporter: Aitozi
>Priority: Major
>
> Introduce a webhook config to take over the responsibilities of some 
> environment variables. The less we depend on environment variables, the better.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-26339) Introduce the webhook config to free the environment options

2022-02-23 Thread Aitozi (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17497158#comment-17497158
 ] 

Aitozi commented on FLINK-26339:


I mixed things up; there is no need to do this now. Sorry for the bother :(

> Introduce the webhook config to free the environment options
> 
>
> Key: FLINK-26339
> URL: https://issues.apache.org/jira/browse/FLINK-26339
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kubernetes Operator
>    Reporter: Aitozi
>Priority: Major
>
> Introduce a webhook config to take over the responsibilities of some 
> environment variables. The less we depend on environment variables, the better.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-26136) Implement shared validation logic for FlinkDeployment objects

2022-02-23 Thread Aitozi (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26136?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17497184#comment-17497184
 ] 

Aitozi commented on FLINK-26136:


Hi [~gyfora], I have a question about this issue. I think the webhook 
intercepts the CREATE and UPDATE requests for FlinkDeployments. If we apply an 
update to a FlinkDeployment, it is first validated by the webhook and then 
reconciled by the flink-kubernetes-operator. So do we need to share the 
validation with the flink-kubernetes-operator?

> Implement shared validation logic for FlinkDeployment objects
> -
>
> Key: FLINK-26136
> URL: https://issues.apache.org/jira/browse/FLINK-26136
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kubernetes Operator
>Reporter: Gyula Fora
>Assignee: Gyula Fora
>Priority: Major
>
> At the moment there is only a very basic “placeholder” validation logic 
> implemented in the webhook module: 
> org.apache.flink.kubernetes.operator.admission.FlinkDeploymentValidator
> We should aim to validate parts of the FlinkDeployment that can be done 
> upfront, things like most common Flink config options, parallelism, resources 
> etc.
> As described in https://issues.apache.org/jira/browse/FLINK-26135 this 
> validation should be part of the flink-kubernetes-operator module.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-26356) Revisit the create of RestClusterClient

2022-02-24 Thread Aitozi (Jira)
Aitozi created FLINK-26356:
--

 Summary: Revisit the create of RestClusterClient
 Key: FLINK-26356
 URL: https://issues.apache.org/jira/browse/FLINK-26356
 Project: Flink
  Issue Type: Sub-task
  Components: Kubernetes Operator
Reporter: Aitozi


The clusterClient is built as shown below. The config is a mix of the 
FlinkDeploymentSpec and the local default config. 
{code:java}
final int port = config.getInteger(RestOptions.PORT);
// Fall back to "<clusterId>-rest.<namespace>" when no REST address is configured.
final String host =
        config.getString(
                RestOptions.ADDRESS, String.format("%s-rest.%s", clusterId, namespace));
final String restServerAddress = String.format("http://%s:%s", host, port);
{code}
But {{RestOptions.ADDRESS}} is generated at the entrypoint when HA is 
enabled, so the option cannot be obtained from the FlinkDeploymentSpec.

Furthermore, the default REST URL is not suitable for all service types. I 
think we should extract the REST endpoint from the Flink external service.

One more concern is that if the operator manages multiple namespaces, a 
REST URL of the form {{serviceName.namespace}} may not be enough, because it 
cannot be accessed across namespaces. 
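
For readers without the Flink classes at hand, the fallback logic of the 
snippet above can be reproduced in a self-contained form. This is only a 
sketch: a plain {{Map}} stands in for Flink's configuration object, and the 
keys {{rest.address}} and {{rest.port}} with a default port of 8081 are 
assumed to correspond to {{RestOptions.ADDRESS}} and {{RestOptions.PORT}}.

```java
import java.util.Map;

final class RestAddressSketch {
    /**
     * Builds the REST URL from an explicit address if one is configured, otherwise
     * from the "<clusterId>-rest.<namespace>" service name used in the snippet above.
     */
    static String restServerAddress(
            Map<String, String> config, String clusterId, String namespace) {
        String port = config.getOrDefault("rest.port", "8081"); // assumed default
        String host =
                config.getOrDefault(
                        "rest.address", String.format("%s-rest.%s", clusterId, namespace));
        return String.format("http://%s:%s", host, port);
    }
}
```

The sketch makes the concern visible: whenever no explicit address is present, 
the URL depends entirely on the service naming convention.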



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Commented] (FLINK-26356) Revisit the create of RestClusterClient

2022-02-24 Thread Aitozi (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17497848#comment-17497848
 ] 

Aitozi commented on FLINK-26356:


I have not fully figured this out yet; I will investigate further first. I can 
take this ticket. 

> Revisit the create of RestClusterClient
> ---
>
> Key: FLINK-26356
> URL: https://issues.apache.org/jira/browse/FLINK-26356
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kubernetes Operator
>    Reporter: Aitozi
>Priority: Major
>
> The clusterClient is built as shown below. The config is a mix of the 
> FlinkDeploymentSpec and the local default config. 
> {code:java}
> final int port = config.getInteger(RestOptions.PORT);
> // Fall back to "<clusterId>-rest.<namespace>" when no REST address is configured.
> final String host =
>         config.getString(
>                 RestOptions.ADDRESS, String.format("%s-rest.%s", clusterId, namespace));
> final String restServerAddress = String.format("http://%s:%s", host, port);
> {code}
> But {{RestOptions.ADDRESS}} is generated at the entrypoint when HA is 
> enabled, so the option cannot be obtained from the FlinkDeploymentSpec.
> Furthermore, the default REST URL is not suitable for all service types. I 
> think we should extract the REST endpoint from the Flink external service.
> One more concern is that if the operator manages multiple namespaces, a 
> REST URL of the form {{serviceName.namespace}} may not be enough, because it 
> cannot be accessed across namespaces. 





[jira] [Updated] (FLINK-26356) Revisit the create of RestClusterClient

2022-02-24 Thread Aitozi (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26356?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aitozi updated FLINK-26356:
---
Description: 
The clusterClient is built as shown below. The config is a mix of the 
FlinkDeploymentSpec and the local default config. 
{code:java}
final int port = config.getInteger(RestOptions.PORT);
// Fall back to "<clusterId>-rest.<namespace>" when no REST address is configured.
final String host =
        config.getString(
                RestOptions.ADDRESS, String.format("%s-rest.%s", clusterId, namespace));
final String restServerAddress = String.format("http://%s:%s", host, port);
{code}
But {{RestOptions.ADDRESS}} is generated at the entrypoint when HA is 
enabled, so the option cannot be obtained from the FlinkDeploymentSpec.

Furthermore, the default REST URL is not suitable for all service types. I 
think we should extract the REST endpoint from the Flink external service.

  was:
The clusterClient is built as shown below. The config is a mix of the 
FlinkDeploymentSpec and the local default config. 
{code:java}
final int port = config.getInteger(RestOptions.PORT);
// Fall back to "<clusterId>-rest.<namespace>" when no REST address is configured.
final String host =
        config.getString(
                RestOptions.ADDRESS, String.format("%s-rest.%s", clusterId, namespace));
final String restServerAddress = String.format("http://%s:%s", host, port);
{code}
But {{RestOptions.ADDRESS}} is generated at the entrypoint when HA is 
enabled, so the option cannot be obtained from the FlinkDeploymentSpec.

Furthermore, the default REST URL is not suitable for all service types. I 
think we should extract the REST endpoint from the Flink external service.

One more concern is that if the operator manages multiple namespaces, a 
REST URL of the form {{serviceName.namespace}} may not be enough, because it 
cannot be accessed across namespaces. 


> Revisit the create of RestClusterClient
> ---
>
> Key: FLINK-26356
> URL: https://issues.apache.org/jira/browse/FLINK-26356
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kubernetes Operator
>Reporter: Aitozi
>Priority: Major
>
> The clusterClient is built as shown below. The config mixes the 
> FlinkDeploymentSpec with the local default config. 
> {code:java}
> final int port = config.getInteger(RestOptions.PORT);
> final String host =
>         config.getString(
>                 RestOptions.ADDRESS, String.format("%s-rest.%s", clusterId, namespace));
> final String restServerAddress = String.format("http://%s:%s", host, port);
> {code}
> But the {{RestOptions.ADDRESS}} is generated at the entrypoint when HA is 
> enabled, so the option cannot be obtained from the FlinkDeploymentSpec.
> Furthermore, the default REST URL is not suitable for all service types. I 
> think we should extract the REST endpoint from the Flink external service.





[jira] [Created] (FLINK-26399) Make some option of operator configurable

2022-02-28 Thread Aitozi (Jira)
Aitozi created FLINK-26399:
--

 Summary: Make some option of operator configurable
 Key: FLINK-26399
 URL: https://issues.apache.org/jira/browse/FLINK-26399
 Project: Flink
  Issue Type: Sub-task
  Components: Kubernetes Operator
Reporter: Aitozi


As described in the 
[PR|https://github.com/apache/flink-kubernetes-operator/pull/28], we had better 
use options to control the operator-related configs. I will first make the 
static config variables scattered across the current version configurable. 
cc [~gyfora] 





[jira] [Commented] (FLINK-26377) Extract Reconciler interface

2022-02-25 Thread Aitozi (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17498393#comment-17498393
 ] 

Aitozi commented on FLINK-26377:


I agree that we should extract a common reconciler interface. We can choose or 
create the target reconciler based on the FlinkDeployment. I volunteer to do 
this refactoring.
One further question I want to discuss: do we need to introduce an extra 
field such as {{mode}} to reflect the mode of the FlinkDeployment, for example 
{{JOB}} and {{SESSION}}, so that we do not depend on the {{JobSpec}}?
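
To make the proposal concrete, here is a minimal sketch of a common reconciler interface whose implementation is chosen from an explicit mode value. All type and method names ({{DeploymentMode}}, {{Reconciler}}, {{ReconcilerFactory}}, {{forMode}}) are hypothetical and exist only for this illustration; they are not the operator's actual classes.

```java
import java.util.EnumMap;
import java.util.Map;

// Hypothetical names throughout; this only illustrates the idea under discussion.
enum DeploymentMode { JOB, SESSION }

/** Common interface shared by the mode-specific reconcilers. */
interface Reconciler {
    String reconcile(String deploymentName);
}

final class JobReconciler implements Reconciler {
    @Override
    public String reconcile(String deploymentName) {
        return "job:" + deploymentName;
    }
}

final class SessionReconciler implements Reconciler {
    @Override
    public String reconcile(String deploymentName) {
        return "session:" + deploymentName;
    }
}

/** Picks the reconciler from an explicit mode instead of checking a spec field for null. */
final class ReconcilerFactory {
    private final Map<DeploymentMode, Reconciler> reconcilers =
            new EnumMap<>(DeploymentMode.class);

    ReconcilerFactory() {
        reconcilers.put(DeploymentMode.JOB, new JobReconciler());
        reconcilers.put(DeploymentMode.SESSION, new SessionReconciler());
    }

    Reconciler forMode(DeploymentMode mode) {
        Reconciler reconciler = reconcilers.get(mode);
        if (reconciler == null) {
            throw new IllegalArgumentException("No reconciler registered for mode " + mode);
        }
        return reconciler;
    }
}
```

With an explicit mode, adding a further mode later would only need a new enum constant and one more registration in the factory.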

> Extract Reconciler interface
> 
>
> Key: FLINK-26377
> URL: https://issues.apache.org/jira/browse/FLINK-26377
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kubernetes Operator
>Reporter: Gyula Fora
>Priority: Major
>
> We should extract a common interface for the different reconciler classes 
> (Job and Session for now) and create the reconciler instance on the fly based 
> on the FlinkDeployment.





[jira] [Commented] (FLINK-26356) Revisit the create of RestClusterClient

2022-02-25 Thread Aitozi (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17498402#comment-17498402
 ] 

Aitozi commented on FLINK-26356:


I misunderstood this before: since the operator is always deployed in the same 
cluster as the Flink job, we do not rely on the full functionality of the 
external service.

I did a small test in a minikube cluster to verify the behavior. In 
{{NodePort}}, {{LoadBalancer}} and {{ClusterIP}} mode, the REST URL 
{{<clusterId>-rest.<namespace>}} is routed to the clusterIP. In 
{{Headless_Cluster_IP}} mode, the REST URL is routed to the JobManager directly. 
So all the service types work with {{<clusterId>-rest.<namespace>}} :)
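
For reference, a minimal sketch of assembling that in-cluster REST URL, following the {{String.format("%s-rest.%s", clusterId, namespace)}} pattern from the issue description. The class and method names are hypothetical and this is not the operator's actual code.

```java
// Hypothetical helper, for illustration only; not part of the operator code base.
final class RestUrlSketch {

    private RestUrlSketch() {}

    /**
     * Builds the in-cluster REST URL from the rest Service name
     * ("<clusterId>-rest") and the namespace, as in the issue description.
     */
    static String inClusterRestUrl(String clusterId, String namespace, int port) {
        // "<clusterId>-rest.<namespace>" is the host name built in the issue description.
        String host = String.format("%s-rest.%s", clusterId, namespace);
        return String.format("http://%s:%d", host, port);
    }
}
```

For example, {{RestUrlSketch.inClusterRestUrl("my-job", "default", 8081)}} yields {{http://my-job-rest.default:8081}}; the cluster id, namespace and port here are made-up values.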

> Revisit the create of RestClusterClient
> ---
>
> Key: FLINK-26356
> URL: https://issues.apache.org/jira/browse/FLINK-26356
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kubernetes Operator
>    Reporter: Aitozi
>Priority: Major
>
> The clusterClient is built as shown below. The config mixes the 
> FlinkDeploymentSpec with the local default config. 
> {code:java}
> final int port = config.getInteger(RestOptions.PORT);
> final String host =
>         config.getString(
>                 RestOptions.ADDRESS, String.format("%s-rest.%s", clusterId, namespace));
> final String restServerAddress = String.format("http://%s:%s", host, port);
> {code}
> But the {{RestOptions.ADDRESS}} is generated at the entrypoint when HA is 
> enabled, so the option cannot be obtained from the FlinkDeploymentSpec.
> Furthermore, the default REST URL is not suitable for all service types. I 
> think we should extract the REST endpoint from the Flink external service.





[jira] [Comment Edited] (FLINK-26377) Extract Reconciler interface

2022-02-25 Thread Aitozi (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17498393#comment-17498393
 ] 

Aitozi edited comment on FLINK-26377 at 2/26/22, 7:01 AM:
--

I agree that we should extract a common reconciler interface. We can choose or 
create the target reconciler based on the FlinkDeployment. I volunteer to do 
this refactoring.

One further question I want to discuss: do we need to introduce an extra 
field such as {{mode}} to reflect the mode of the FlinkDeployment, for example 
{{JOB}} and {{SESSION}}? Maybe a {{Standalone}} mode is on the way. With this 
field, we can directly know the mode of a FlinkApp 
instead of deciding by whether {{JobSpec}} is null.


was (Author: aitozi):
I agree that we should extract a common reconciler interface. We can choose or 
create the target reconciler based on the FlinkDeployment. I volunteer to do 
this refactoring.
One further question I want to discuss: do we need to introduce an extra 
field such as {{mode}} to reflect the mode of the FlinkDeployment, for example 
{{JOB}} and {{SESSION}}? Maybe a {{Standalone}} mode is on the way. With this 
field, we can directly know the mode of a FlinkApp 
instead of deciding by whether {{JobSpec}} is null.

> Extract Reconciler interface
> 
>
> Key: FLINK-26377
> URL: https://issues.apache.org/jira/browse/FLINK-26377
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kubernetes Operator
>Reporter: Gyula Fora
>Priority: Major
>
> We should extract a common interface for the different reconciler classes 
> (Job and Session for now) and create the reconciler instance on the fly based 
> on the FlinkDeployment.





[jira] [Comment Edited] (FLINK-26377) Extract Reconciler interface

2022-02-25 Thread Aitozi (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17498393#comment-17498393
 ] 

Aitozi edited comment on FLINK-26377 at 2/26/22, 7:01 AM:
--

I agree that we should extract a common reconciler interface. We can choose or 
create the target reconciler based on the FlinkDeployment. I volunteer to do 
this refactoring.
One further question I want to discuss: do we need to introduce an extra 
field such as {{mode}} to reflect the mode of the FlinkDeployment, for example 
{{JOB}} and {{SESSION}}? Maybe a {{Standalone}} mode is on the way. With this 
field, we can directly know the mode of a FlinkApp 
instead of deciding by whether {{JobSpec}} is null.


was (Author: aitozi):
I agree that we should extract a common reconciler interface. We can choose or 
create the target reconciler based on the FlinkDeployment. I volunteer to do 
this refactoring.
One further question I want to discuss: do we need to introduce an extra 
field such as {{mode}} to reflect the mode of the FlinkDeployment, for example 
{{JOB}} and {{SESSION}}, so that we do not depend on the {{JobSpec}}? Maybe a 
{{Standalone}} mode in the future.

> Extract Reconciler interface
> 
>
> Key: FLINK-26377
> URL: https://issues.apache.org/jira/browse/FLINK-26377
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kubernetes Operator
>Reporter: Gyula Fora
>Priority: Major
>
> We should extract a common interface for the different reconciler classes 
> (Job and Session for now) and create the reconciler instance on the fly based 
> on the FlinkDeployment.





[jira] [Commented] (FLINK-26377) Extract Reconciler interface

2022-02-25 Thread Aitozi (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17498408#comment-17498408
 ] 

Aitozi commented on FLINK-26377:


Got it!

> Extract Reconciler interface
> 
>
> Key: FLINK-26377
> URL: https://issues.apache.org/jira/browse/FLINK-26377
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kubernetes Operator
>Reporter: Gyula Fora
>Assignee: Aitozi
>Priority: Major
>
> We should extract a common interface for the different reconciler classes 
> (Job and Session for now) and create the reconciler instance on the fly based 
> on the FlinkDeployment.





[jira] [Commented] (FLINK-17232) Rethink the implicit behavior to use the Service externalIP as the address of the Endpoint

2022-02-21 Thread Aitozi (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17495844#comment-17495844
 ] 

Aitozi commented on FLINK-17232:


[~wangyang0918] Could you help review the 
[PR|https://github.com/apache/flink/pull/18762]? I want to move forward and 
finish the other part of this work, which relies on the current PR. 

> Rethink the implicit behavior to use the Service externalIP as the address of 
> the Endpoint
> --
>
> Key: FLINK-17232
> URL: https://issues.apache.org/jira/browse/FLINK-17232
> Project: Flink
>  Issue Type: Sub-task
>  Components: Deployment / Kubernetes
>Affects Versions: 1.10.0, 1.10.1
>Reporter: Canbin Zheng
>Assignee: Aitozi
>Priority: Major
>  Labels: auto-unassigned
>
> Currently, for the LB/NodePort type Service, if we found that the 
> {{LoadBalancer}} in the {{Service}} is null, we would use the externalIPs 
> configured in the external Service as the address of the Endpoint. Again, 
> this is another implicit toleration and may confuse the users.
> This ticket proposes to rethink the implicit toleration behaviour.





[jira] [Created] (FLINK-26332) Move the Operator Env to the common Utils

2022-02-23 Thread Aitozi (Jira)
Aitozi created FLINK-26332:
--

 Summary: Move the Operator Env to the common Utils
 Key: FLINK-26332
 URL: https://issues.apache.org/jira/browse/FLINK-26332
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Reporter: Aitozi


Add a common util to extract the system env variables
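
A minimal sketch of what such a common util could look like, assuming the environment is passed in as a map so it can be tested without touching the real process environment. The class name {{EnvSketch}} and its methods are hypothetical, not the operator's actual utility.

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical utility, for illustration only.
final class EnvSketch {

    private EnvSketch() {}

    /** Returns the variable if it is set and not blank. */
    static Optional<String> get(Map<String, String> env, String key) {
        String value = env.get(key);
        if (value == null || value.trim().isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(value);
    }

    /** Returns the variable, or the given default if it is unset or blank. */
    static String getOrDefault(Map<String, String> env, String key, String defaultValue) {
        return get(env, key).orElse(defaultValue);
    }

    /** Returns the variable, or fails with a clear message if it is unset or blank. */
    static String getRequired(Map<String, String> env, String key) {
        return get(env, key)
                .orElseThrow(() -> new IllegalStateException(
                        "Missing environment variable: " + key));
    }
}
```

In production code the map would be {{System.getenv()}}, for example {{EnvSketch.getOrDefault(System.getenv(), "FLINK_CONF_DIR", "/opt/flink/conf")}}; the variable name and default path are only examples.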





[jira] [Commented] (FLINK-26332) Move the Operator Env to the common Utils

2022-02-23 Thread Aitozi (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26332?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17496680#comment-17496680
 ] 

Aitozi commented on FLINK-26332:


I want to work on this :D

> Move the Operator Env to the common Utils
> -
>
> Key: FLINK-26332
> URL: https://issues.apache.org/jira/browse/FLINK-26332
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kubernetes Operator
>    Reporter: Aitozi
>Priority: Major
>
> Add a common util to extract the system env variables





[jira] [Commented] (FLINK-17232) Rethink the implicit behavior to use the Service externalIP as the address of the Endpoint

2022-02-23 Thread Aitozi (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17496678#comment-17496678
 ] 

Aitozi commented on FLINK-17232:


Got it, thanks.

> Rethink the implicit behavior to use the Service externalIP as the address of 
> the Endpoint
> --
>
> Key: FLINK-17232
> URL: https://issues.apache.org/jira/browse/FLINK-17232
> Project: Flink
>  Issue Type: Sub-task
>  Components: Deployment / Kubernetes
>Affects Versions: 1.10.0, 1.10.1
>Reporter: Canbin Zheng
>Assignee: Aitozi
>Priority: Major
>  Labels: auto-unassigned
>
> Currently, for the LB/NodePort type Service, if we found that the 
> {{LoadBalancer}} in the {{Service}} is null, we would use the externalIPs 
> configured in the external Service as the address of the Endpoint. Again, 
> this is another implicit toleration and may confuse the users.
> This ticket proposes to rethink the implicit toleration behaviour.





[jira] [Created] (FLINK-26337) Avoid load flink conf each reconcile loop

2022-02-23 Thread Aitozi (Jira)
Aitozi created FLINK-26337:
--

 Summary: Avoid load flink conf each reconcile loop
 Key: FLINK-26337
 URL: https://issues.apache.org/jira/browse/FLINK-26337
 Project: Flink
  Issue Type: Sub-task
Reporter: Aitozi


A FlinkConfigBuilder is created in every reconcile loop, which is not 
necessary and may add overhead. The default Flink conf and operator conf 
should be loaded at the entry point. If the ConfigMap is updated, an operator 
upgrade should be triggered to load the new config. 
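
A minimal sketch of the intended pattern: the default config is loaded once at start-up and the same immutable copy is reused by every reconcile call. The class {{DefaultConfigHolder}} and its methods are hypothetical; a plain map stands in for the real Flink configuration object.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch; a plain Map stands in for the real configuration type.
final class DefaultConfigHolder {

    private final Map<String, String> defaults;

    DefaultConfigHolder(Supplier<Map<String, String>> loader) {
        // The (possibly expensive) load runs exactly once, at construction time.
        this.defaults = Collections.unmodifiableMap(new HashMap<>(loader.get()));
    }

    /** Called on every reconcile loop; only reads the already loaded defaults. */
    String describe(String deploymentName) {
        String parallelism = defaults.getOrDefault("parallelism.default", "1");
        return deploymentName + " uses parallelism.default=" + parallelism;
    }
}
```

Because the holder never reloads, picking up a changed ConfigMap requires restarting the process, which matches the upgrade step described above.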





[jira] [Commented] (FLINK-26337) Avoid load flink conf each reconcile loop

2022-02-23 Thread Aitozi (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17496791#comment-17496791
 ] 

Aitozi commented on FLINK-26337:


cc [~gyfora]  [~wangyang0918] 

> Avoid load flink conf each reconcile loop
> -
>
> Key: FLINK-26337
> URL: https://issues.apache.org/jira/browse/FLINK-26337
> Project: Flink
>  Issue Type: Sub-task
>Reporter: Aitozi
>Priority: Major
>
> A FlinkConfigBuilder is created in every reconcile loop, which is not 
> necessary and may add overhead. The default Flink conf and operator conf 
> should be loaded at the entry point. If the ConfigMap is updated, an operator 
> upgrade should be triggered to load the new config. 





[jira] [Updated] (FLINK-26332) Move the Operator Env to the common Utils

2022-02-23 Thread Aitozi (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aitozi updated FLINK-26332:
---
Parent: FLINK-25963
Issue Type: Sub-task  (was: Improvement)

> Move the Operator Env to the common Utils
> -
>
> Key: FLINK-26332
> URL: https://issues.apache.org/jira/browse/FLINK-26332
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kubernetes Operator
>    Reporter: Aitozi
>Priority: Major
>
> Add a common util to extract the system env variables





[jira] [Created] (FLINK-26339) Introduce the webhook config to free the environment options

2022-02-23 Thread Aitozi (Jira)
Aitozi created FLINK-26339:
--

 Summary: Introduce the webhook config to free the environment 
options
 Key: FLINK-26339
 URL: https://issues.apache.org/jira/browse/FLINK-26339
 Project: Flink
  Issue Type: Sub-task
  Components: Kubernetes Operator
Reporter: Aitozi


Introduce a webhook config to take over the responsibilities of some 
environment variables. We should depend on environment variables as little as 
possible. 





[jira] [Commented] (FLINK-26339) Introduce the webhook config to free the environment options

2022-02-23 Thread Aitozi (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17496827#comment-17496827
 ] 

Aitozi commented on FLINK-26339:


cc [~gyfora]  [~wangyang0918], I can work on this ticket :) 

> Introduce the webhook config to free the environment options
> 
>
> Key: FLINK-26339
> URL: https://issues.apache.org/jira/browse/FLINK-26339
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kubernetes Operator
>    Reporter: Aitozi
>Priority: Major
>
> Introduce a webhook config to take over the responsibilities of some 
> environment variables. We should depend on environment variables as little as 
> possible. 





[jira] [Commented] (FLINK-26337) Avoid load flink conf each reconcile loop

2022-02-23 Thread Aitozi (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17496813#comment-17496813
 ] 

Aitozi commented on FLINK-26337:


Yes, I can take this ticket. 

> Avoid load flink conf each reconcile loop
> -
>
> Key: FLINK-26337
> URL: https://issues.apache.org/jira/browse/FLINK-26337
> Project: Flink
>  Issue Type: Sub-task
>Reporter: Aitozi
>Priority: Major
>
> A FlinkConfigBuilder is created in every reconcile loop, which is not 
> necessary and may add overhead. The default Flink conf and operator conf 
> should be loaded at the entry point. If the ConfigMap is updated, an operator 
> upgrade should be triggered to load the new config. 





[jira] [Updated] (FLINK-26337) Avoid load flink conf each reconcile loop

2022-02-23 Thread Aitozi (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aitozi updated FLINK-26337:
---
Component/s: Kubernetes Operator

> Avoid load flink conf each reconcile loop
> -
>
> Key: FLINK-26337
> URL: https://issues.apache.org/jira/browse/FLINK-26337
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kubernetes Operator
>    Reporter: Aitozi
>Assignee: Aitozi
>Priority: Major
>  Labels: pull-request-available
>
> A FlinkConfigBuilder is created in every reconcile loop, which is not 
> necessary and may add overhead. The default Flink conf and operator conf 
> should be loaded at the entry point. If the ConfigMap is updated, an operator 
> upgrade should be triggered to load the new config. 





[jira] [Created] (FLINK-26435) Provide the container name for CI debug log

2022-03-01 Thread Aitozi (Jira)
Aitozi created FLINK-26435:
--

 Summary: Provide the container name for CI debug log 
 Key: FLINK-26435
 URL: https://issues.apache.org/jira/browse/FLINK-26435
 Project: Flink
  Issue Type: Sub-task
  Components: Kubernetes Operator
Reporter: Aitozi


The kubectl logs call in the CI does not specify a container name, as shown below:

 
{code:java}
Flink logs:
Current logs for flink-operator-6c66c5-9ptqw: 
error: a container name must be specified for pod 
flink-operator-6c66c5-9ptqw, choose one of: [flink-operator flink-webhook] 
{code}





[jira] [Updated] (FLINK-26435) Provide the container name for CI debug log

2022-03-01 Thread Aitozi (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aitozi updated FLINK-26435:
---
Description: 
The kubectl logs call in the CI does not specify a container name, as shown below:

 
{code:java}
Flink logs:
Current logs for flink-operator-6c66c5-9ptqw: 
error: a container name must be specified for pod 
flink-operator-6c66c5-9ptqw, choose one of: [flink-operator flink-webhook] 
{code}

  was:
The kubectl logs call in the CI does not specify a container name, as shown below:

 
{code:java}
Flink logs:
335Current logs for flink-operator-6c66c5-9ptqw: 
336error: a container name must be specified for pod 
flink-operator-6c66c5-9ptqw, choose one of: [flink-operator flink-webhook] 
{code}


> Provide the container name for CI debug log 
> 
>
> Key: FLINK-26435
> URL: https://issues.apache.org/jira/browse/FLINK-26435
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kubernetes Operator
>    Reporter: Aitozi
>Priority: Major
>
> The kubectl logs call in the CI does not specify a container name, as shown below:
>  
> {code:java}
> Flink logs:
> Current logs for flink-operator-6c66c5-9ptqw: 
> error: a container name must be specified for pod 
> flink-operator-6c66c5-9ptqw, choose one of: [flink-operator 
> flink-webhook] {code}





[jira] [Commented] (FLINK-26435) Provide the container name for CI debug log

2022-03-02 Thread Aitozi (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17500052#comment-17500052
 ] 

Aitozi commented on FLINK-26435:


Hmm, it comes from the {{flink-kubernetes-operator}} project's CI, see 
[here|https://github.com/apache/flink-kubernetes-operator/runs/5378023577?check_suite_focus=true]

> Provide the container name for CI debug log 
> 
>
> Key: FLINK-26435
> URL: https://issues.apache.org/jira/browse/FLINK-26435
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kubernetes Operator
>    Reporter: Aitozi
>Assignee: Aitozi
>Priority: Major
>
> The kubectl logs call in the CI does not specify a container name, as shown below:
>  
> {code:java}
> Flink logs:
> Current logs for flink-operator-6c66c5-9ptqw: 
> error: a container name must be specified for pod 
> flink-operator-6c66c5-9ptqw, choose one of: [flink-operator 
> flink-webhook] {code}





[jira] [Commented] (FLINK-26435) Provide the container name for CI debug log

2022-03-02 Thread Aitozi (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26435?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17500086#comment-17500086
 ] 

Aitozi commented on FLINK-26435:


Got it, done. 

> Provide the container name for CI debug log 
> 
>
> Key: FLINK-26435
> URL: https://issues.apache.org/jira/browse/FLINK-26435
> Project: Flink
>  Issue Type: Bug
>  Components: Kubernetes Operator
>    Reporter: Aitozi
>Assignee: Aitozi
>Priority: Major
>
> The kubectl logs call in the CI does not specify a container name, as shown below:
>  
> {code:java}
> Flink logs:
> Current logs for flink-operator-6c66c5-9ptqw: 
> error: a container name must be specified for pod 
> flink-operator-6c66c5-9ptqw, choose one of: [flink-operator 
> flink-webhook] {code}





[jira] [Updated] (FLINK-26435) Provide the container name for CI debug log

2022-03-02 Thread Aitozi (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26435?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aitozi updated FLINK-26435:
---
Parent: (was: FLINK-25963)
Issue Type: Bug  (was: Sub-task)

> Provide the container name for CI debug log 
> 
>
> Key: FLINK-26435
> URL: https://issues.apache.org/jira/browse/FLINK-26435
> Project: Flink
>  Issue Type: Bug
>  Components: Kubernetes Operator
>    Reporter: Aitozi
>Assignee: Aitozi
>Priority: Major
>
> The kubectl logs call in the CI does not specify a container name, as shown below:
>  
> {code:java}
> Flink logs:
> Current logs for flink-operator-6c66c5-9ptqw: 
> error: a container name must be specified for pod 
> flink-operator-6c66c5-9ptqw, choose one of: [flink-operator 
> flink-webhook] {code}





[jira] [Commented] (FLINK-26377) Extract Reconciler interface

2022-03-04 Thread Aitozi (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17501360#comment-17501360
 ] 

Aitozi commented on FLINK-26377:


[~gyfora] I have finished it and just submitted the pull request. Please help 
take a look.

> Extract Reconciler interface
> 
>
> Key: FLINK-26377
> URL: https://issues.apache.org/jira/browse/FLINK-26377
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kubernetes Operator
>Reporter: Gyula Fora
>Assignee: Aitozi
>Priority: Major
>  Labels: pull-request-available
>
> We should extract a common interface for the different reconciler classes 
> (Job and Session for now) and create the reconciler instance on the fly based 
> on the FlinkDeployment.





[jira] [Updated] (FLINK-26112) Port getEndpoint method to the specific service type subclass

2022-02-14 Thread Aitozi (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aitozi updated FLINK-26112:
---
Description: In 
[FLINK-20830|https://issues.apache.org/jira/projects/FLINK/issues/FLINK-20830]we
 introduced several subclasses to deal with building and querying the service. 
This ticket is meant to move the related code to the proper class.  (was: In 
[FLINK-20830 
|https://issues.apache.org/jira/projects/FLINK/issues/FLINK-20830]we introduced 
several subclasses to deal with building and querying the service. This ticket 
is meant to move the related code to the proper class. )

> Port getEndpoint method to the specific service type subclass
> -
>
> Key: FLINK-26112
> URL: https://issues.apache.org/jira/browse/FLINK-26112
> Project: Flink
>  Issue Type: Sub-task
>Reporter: Aitozi
>Priority: Major
>
> In 
> [FLINK-20830|https://issues.apache.org/jira/projects/FLINK/issues/FLINK-20830]we
>  introduced several subclasses to deal with building and querying the 
> service. This ticket is meant to move the related code to the proper class. 





[jira] [Updated] (FLINK-26112) Port getEndpoint method to the specific service type subclass

2022-02-14 Thread Aitozi (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aitozi updated FLINK-26112:
---
Parent: FLINK-17196
Issue Type: Sub-task  (was: Improvement)

> Port getEndpoint method to the specific service type subclass
> -
>
> Key: FLINK-26112
> URL: https://issues.apache.org/jira/browse/FLINK-26112
> Project: Flink
>  Issue Type: Sub-task
>Reporter: Aitozi
>Priority: Major
>
> In [FLINK-20830 
> |https://issues.apache.org/jira/projects/FLINK/issues/FLINK-20830]we 
> introduced several subclasses to deal with building and querying the 
> service. This ticket is meant to move the related code to the proper class. 





[jira] [Commented] (FLINK-17232) Rethink the implicit behavior to use the Service externalIP as the address of the Endpoint

2022-02-14 Thread Aitozi (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17491971#comment-17491971
 ] 

Aitozi commented on FLINK-17232:


Hi, while refactoring the code in 
{{Fabric8FlinkKubeClient#getRestEndpoint}} for 
[FLINK-26112|https://issues.apache.org/jira/projects/FLINK/issues/FLINK-26112], 
I was also confused by this behavior, which falls back to the externalIP when 
the loadBalancer is null. Do you have any suggestions for this? cc 
[~felixzheng] [~wangyang0918]  
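
To make the behavior under discussion concrete, here is a simplified model of the implicit fallback: use the load balancer address if there is one, otherwise silently take the first configured externalIP. This is a hypothetical sketch with plain Java types, not the actual {{Fabric8FlinkKubeClient}} code, and the real method works on Kubernetes Service objects rather than strings.

```java
import java.util.List;
import java.util.Optional;

// Simplified, hypothetical model of the fallback; not the real client code.
final class EndpointFallbackSketch {

    private EndpointFallbackSketch() {}

    static Optional<String> resolveAddress(String loadBalancerAddress, List<String> externalIps) {
        // Preferred: the address reported by the load balancer.
        if (loadBalancerAddress != null && !loadBalancerAddress.isEmpty()) {
            return Optional.of(loadBalancerAddress);
        }
        // Implicit fallback questioned by this ticket: first configured externalIP.
        if (externalIps != null && !externalIps.isEmpty()) {
            return Optional.of(externalIps.get(0));
        }
        // Nothing usable: the caller has to decide how to report this.
        return Optional.empty();
    }
}
```

The second branch is the implicit part: a caller cannot tell from the result whether the address came from the load balancer or from the externalIPs.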

> Rethink the implicit behavior to use the Service externalIP as the address of 
> the Endpoint
> --
>
> Key: FLINK-17232
> URL: https://issues.apache.org/jira/browse/FLINK-17232
> Project: Flink
>  Issue Type: Sub-task
>  Components: Deployment / Kubernetes
>Affects Versions: 1.10.0, 1.10.1
>Reporter: Canbin Zheng
>Priority: Major
>  Labels: auto-unassigned
>
> Currently, for the LB/NodePort type Service, if we found that the 
> {{LoadBalancer}} in the {{Service}} is null, we would use the externalIPs 
> configured in the external Service as the address of the Endpoint. Again, 
> this is another implicit toleration and may confuse the users.
> This ticket proposes to rethink the implicit toleration behaviour.





[jira] [Commented] (FLINK-26112) Port getEndpoint method to the specific service type subclass

2022-02-14 Thread Aitozi (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26112?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17491973#comment-17491973
 ] 

Aitozi commented on FLINK-26112:


I found that this is also an improvement to the implementation of 
Fabric8FlinkKubeClient#getRestEndpoint, so I converted it to a subtask of 
FLINK-17196. Could you help assign this ticket to me, [~wangyang0918]?

> Port getEndpoint method to the specific service type subclass
> -
>
> Key: FLINK-26112
> URL: https://issues.apache.org/jira/browse/FLINK-26112
> Project: Flink
>  Issue Type: Sub-task
>Reporter: Aitozi
>Priority: Major
>
> In [FLINK-20830 
> |https://issues.apache.org/jira/projects/FLINK/issues/FLINK-20830]we 
> introduced several subclasses to deal with building and querying the 
> service. This ticket is meant to move the related code to the proper class. 





[jira] [Created] (FLINK-26112) Port getEndpoint method to the specific service type subclass

2022-02-13 Thread Aitozi (Jira)
Aitozi created FLINK-26112:
--

 Summary: Port getEndpoint method to the specific service type 
subclass
 Key: FLINK-26112
 URL: https://issues.apache.org/jira/browse/FLINK-26112
 Project: Flink
  Issue Type: Improvement
Reporter: Aitozi


In 
[FLINK-20830|https://issues.apache.org/jira/projects/FLINK/issues/FLINK-20830], 
we introduced several subclasses to deal with building and querying the 
service. This ticket is meant to move the related code to the proper class. 





[jira] [Updated] (FLINK-26112) Port getEndpoint method to the specific service type subclass

2022-02-13 Thread Aitozi (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aitozi updated FLINK-26112:
---
Description: In [FLINK-20830 
|https://issues.apache.org/jira/projects/FLINK/issues/FLINK-20830]we introduced 
several subclasses to deal with building and querying the service. This ticket 
is meant to move the related code to the proper class.  (was: In 
[FLINK-20830|https://issues.apache.org/jira/projects/FLINK/issues/FLINK-20830], 
we introduced several subclasses to deal with building and querying the 
service. This ticket is meant to move the related code to the proper class. )

> Port getEndpoint method to the specific service type subclass
> -
>
> Key: FLINK-26112
> URL: https://issues.apache.org/jira/browse/FLINK-26112
> Project: Flink
>  Issue Type: Improvement
>Reporter: Aitozi
>Priority: Major
>
> In [FLINK-20830 
> |https://issues.apache.org/jira/projects/FLINK/issues/FLINK-20830]we 
> introduced several subclasses to deal with building and querying the 
> service. This ticket is meant to move the related code to the proper class. 





[jira] [Updated] (FLINK-26112) Port getEndpoint method to the specific service type subclass

2022-02-14 Thread Aitozi (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aitozi updated FLINK-26112:
---
Description: In 
[FLINK-20830|https://issues.apache.org/jira/browse/FLINK-20830]we introduced 
several subclasses to deal with building and querying the service. This ticket 
is meant to move the related code to the proper class.  (was: In 
[FLINK-20830|https://issues.apache.org/jira/projects/FLINK/issues/FLINK-20830]we
 introduced several subclasses to deal with building and querying the service. 
This ticket is meant to move the related code to the proper class. )

> Port getEndpoint method to the specific service type subclass
> -
>
> Key: FLINK-26112
> URL: https://issues.apache.org/jira/browse/FLINK-26112
> Project: Flink
>  Issue Type: Sub-task
>Reporter: Aitozi
>Priority: Major
>
> In [FLINK-20830|https://issues.apache.org/jira/browse/FLINK-20830]we 
> introduced several subclasses to deal with building and querying the service. This 
> ticket is meant to move the related code to the proper class.





[jira] [Updated] (FLINK-26112) Port getEndpoint method to the specific service type subclass

2022-02-14 Thread Aitozi (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aitozi updated FLINK-26112:
---
Description: In 
[FLINK-20830|https://issues.apache.org/jira/browse/FLINK-20830], we introduced 
several subclasses to deal with building and querying the service. This ticket is meant 
to move the related code to the proper class.  (was: In 
[FLINK-20830|https://issues.apache.org/jira/browse/FLINK-20830]we introduced 
several subclasses to deal with building and querying the service. This ticket is meant 
to move the related code to the proper class.)

> Port getEndpoint method to the specific service type subclass
> -
>
> Key: FLINK-26112
> URL: https://issues.apache.org/jira/browse/FLINK-26112
> Project: Flink
>  Issue Type: Sub-task
>Reporter: Aitozi
>Priority: Major
>
> In [FLINK-20830|https://issues.apache.org/jira/browse/FLINK-20830], we 
> introduced several subclasses to deal with building and querying the service. This 
> ticket is meant to move the related code to the proper class.





[jira] [Commented] (FLINK-17232) Rethink the implicit behavior to use the Service externalIP as the address of the Endpoint

2022-02-14 Thread Aitozi (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-17232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17492049#comment-17492049
 ] 

Aitozi commented on FLINK-17232:


After some digging into [external 
IPs|https://kubernetes.io/docs/concepts/services-networking/service/#external-ips],
it seems they only come from a Service created with an explicit declaration of 
external IPs. Currently we never create a Service with an external 
IP. Can we safely remove the implicit behavior that uses the Service externalIP 
as the address of the Endpoint?

> Rethink the implicit behavior to use the Service externalIP as the address of 
> the Endpoint
> --
>
> Key: FLINK-17232
> URL: https://issues.apache.org/jira/browse/FLINK-17232
> Project: Flink
>  Issue Type: Sub-task
>  Components: Deployment / Kubernetes
>Affects Versions: 1.10.0, 1.10.1
>Reporter: Canbin Zheng
>Priority: Major
>  Labels: auto-unassigned
>
> Currently, for the LB/NodePort type Service, if we found that the 
> {{LoadBalancer}} in the {{Service}} is null, we would use the externalIPs 
> configured in the external Service as the address of the Endpoint. Again, 
> this is another implicit toleration and may confuse the users.
> This ticket proposes to rethink the implicit toleration behaviour.





[jira] [Commented] (FLINK-16601) Correct the way to get Endpoint address for NodePort rest Service

2022-02-14 Thread Aitozi (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-16601?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17492038#comment-17492038
 ] 

Aitozi commented on FLINK-16601:


Hi [~felixzheng], I think this issue has already been solved by 
[FLINK-23507|https://issues.apache.org/jira/browse/FLINK-23507]. This one 
should be closed. cc [~wangyang0918]

> Correct the way to get Endpoint address for NodePort rest Service
> -
>
> Key: FLINK-16601
> URL: https://issues.apache.org/jira/browse/FLINK-16601
> Project: Flink
>  Issue Type: Sub-task
>  Components: Deployment / Kubernetes
>Affects Versions: 1.10.0
>Reporter: Canbin Zheng
>Priority: Major
>  Labels: auto-unassigned
>
> Currently, if one sets the type of the rest-service to {{NodePort}}, then the 
> way to get the Endpoint address is by calling the method of 
> {{KubernetesClient.getMasterUrl().getHost()}}. This solution works fine for 
> the case of the non-managed Kubernetes cluster but not for the managed ones.
> For the managed Kubernetes cluster setups, the Kubernetes masters are 
> deployed in a pool different from the Kubernetes nodes and the master node 
> does not expose a NodePort for the NodePort Service.





[jira] [Updated] (FLINK-26112) Port getRestEndpoint method to the specific service type subclass

2022-02-14 Thread Aitozi (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aitozi updated FLINK-26112:
---
Summary: Port getRestEndpoint method to the specific service type subclass  
(was: Port getEndpoint method to the specific service type subclass)

> Port getRestEndpoint method to the specific service type subclass
> -
>
> Key: FLINK-26112
> URL: https://issues.apache.org/jira/browse/FLINK-26112
> Project: Flink
>  Issue Type: Sub-task
>Reporter: Aitozi
>Priority: Major
>
> In [FLINK-20830|https://issues.apache.org/jira/browse/FLINK-20830], we 
> introduced several subclasses to deal with building and querying the service. This 
> ticket is meant to move the related code to the proper class.
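
The idea in this ticket can be sketched roughly as follows: each service type subclass owns its own REST endpoint lookup, instead of one shared method branching on the type. This is only a minimal illustration. All class names, fields, and the endpoint rules below are hypothetical and are not taken from the Flink code base.

```java
import java.util.Optional;

/** Hypothetical value type for a REST endpoint; not Flink's actual class. */
final class RestEndpoint {
    final String host;
    final int port;

    RestEndpoint(String host, int port) {
        this.host = host;
        this.port = port;
    }

    @Override
    public String toString() {
        return host + ":" + port;
    }
}

/** Hypothetical, simplified view of the Service fields the lookup needs. */
final class ServiceView {
    final String name;
    final String namespace;
    final int restPort;
    final int nodePort;
    final String nodeAddress;      // address of some cluster node, may be null
    final String loadBalancerHost; // null until a load balancer is provisioned

    ServiceView(String name, String namespace, int restPort, int nodePort,
                String nodeAddress, String loadBalancerHost) {
        this.name = name;
        this.namespace = namespace;
        this.restPort = restPort;
        this.nodePort = nodePort;
        this.nodeAddress = nodeAddress;
        this.loadBalancerHost = loadBalancerHost;
    }
}

/** Each service type implements its own lookup (assumed design). */
abstract class ServiceTypeSketch {
    abstract Optional<RestEndpoint> getRestEndpoint(ServiceView service);
}

final class ClusterIpSketch extends ServiceTypeSketch {
    @Override
    Optional<RestEndpoint> getRestEndpoint(ServiceView s) {
        // In-cluster name of the Service plus the REST port.
        return Optional.of(new RestEndpoint(s.name + "." + s.namespace, s.restPort));
    }
}

final class NodePortSketch extends ServiceTypeSketch {
    @Override
    Optional<RestEndpoint> getRestEndpoint(ServiceView s) {
        // Needs a reachable node address; empty if none is known.
        return s.nodeAddress == null
                ? Optional.empty()
                : Optional.of(new RestEndpoint(s.nodeAddress, s.nodePort));
    }
}

final class LoadBalancerSketch extends ServiceTypeSketch {
    @Override
    Optional<RestEndpoint> getRestEndpoint(ServiceView s) {
        // Empty while the load balancer has no host assigned yet.
        return s.loadBalancerHost == null
                ? Optional.empty()
                : Optional.of(new RestEndpoint(s.loadBalancerHost, s.restPort));
    }
}
```

With this shape, the caller only picks the subclass for the configured service type and no longer needs to know how each type resolves its address.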





[jira] [Comment Edited] (FLINK-26377) Extract Reconciler interface

2022-02-25 Thread Aitozi (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17498393#comment-17498393
 ] 

Aitozi edited comment on FLINK-26377 at 2/26/22, 6:35 AM:
--

I agree that we should extract a common reconciler interface. We can choose or 
create the target reconciler based on the FlinkDeployment. I volunteer to do 
this refactoring.
One further question I want to discuss: do we need to introduce an extra 
field like {{mode}} to reflect the mode of the FlinkDeployment, for example 
{{JOB}} and {{SESSION}}, instead of depending on the {{JobSpec}}? There may also be a 
{{Standalone}} mode in the future.


was (Author: aitozi):
I agree that we should extract a common reconciler interface. We can choose or 
create the target reconciler based on the FlinkDeployment. I volunteer to do 
this refactoring.
One further question I want to discuss: do we need to introduce an extra 
field like {{mode}} to reflect the mode of the FlinkDeployment, for example {{JOB}} 
and {{SESSION}}, instead of depending on the {{JobSpec}}?

> Extract Reconciler interface
> 
>
> Key: FLINK-26377
> URL: https://issues.apache.org/jira/browse/FLINK-26377
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kubernetes Operator
>Reporter: Gyula Fora
>Priority: Major
>
> We should extract a common interface for the different reconciler classes 
> (Job and Session for now) and create the reconciler instance on the fly based 
> on the FlinkDeployment.
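
A minimal sketch of what such an interface and on-the-fly selection could look like. Every name here is hypothetical, and the presence of a job spec is used as an assumed stand-in for how the mode would be derived. It is not the operator's actual code.

```java
/** Hypothetical stand-in for the FlinkDeployment custom resource. */
final class DeploymentSketch {
    final String name;
    final String jobJarUri; // stands in for the JobSpec; null means a session cluster

    DeploymentSketch(String name, String jobJarUri) {
        this.name = name;
        this.jobJarUri = jobJarUri;
    }
}

/** The explicit mode discussed in the comment above (assumed values). */
enum DeploymentMode { JOB, SESSION }

/** Common interface shared by all reconcilers. */
interface ReconcilerSketch {
    String reconcile(DeploymentSketch deployment);
}

final class JobReconcilerSketch implements ReconcilerSketch {
    @Override
    public String reconcile(DeploymentSketch d) {
        return "job:" + d.name;
    }
}

final class SessionReconcilerSketch implements ReconcilerSketch {
    @Override
    public String reconcile(DeploymentSketch d) {
        return "session:" + d.name;
    }
}

final class ReconcilerFactorySketch {
    /** Derives the mode from the spec; an explicit field would replace this. */
    static DeploymentMode modeOf(DeploymentSketch d) {
        return d.jobJarUri != null ? DeploymentMode.JOB : DeploymentMode.SESSION;
    }

    /** Picks the reconciler for the given deployment. */
    static ReconcilerSketch forDeployment(DeploymentSketch d) {
        switch (modeOf(d)) {
            case JOB:
                return new JobReconcilerSketch();
            case SESSION:
                return new SessionReconcilerSketch();
            default:
                throw new IllegalStateException("Unknown mode for " + d.name);
        }
    }
}
```

Adding a further mode would then mean one more enum value and one more implementation, without touching the callers.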





[jira] [Commented] (FLINK-26377) Extract Reconciler interface

2022-02-27 Thread Aitozi (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17498687#comment-17498687
 ] 

Aitozi commented on FLINK-26377:


OK, I will take a look first before working on it.

> Extract Reconciler interface
> 
>
> Key: FLINK-26377
> URL: https://issues.apache.org/jira/browse/FLINK-26377
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kubernetes Operator
>Reporter: Gyula Fora
>Assignee: Aitozi
>Priority: Major
>
> We should extract a common interface for the different reconciler classes 
> (Job and Session for now) and create the reconciler instance on the fly based 
> on the FlinkDeployment.





[jira] [Comment Edited] (FLINK-26719) Rethink the default reschedule reconcile loop

2022-03-19 Thread Aitozi (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17509231#comment-17509231
 ] 

Aitozi edited comment on FLINK-26719 at 3/19/22, 9:35 AM:
--

{quote}If we do not want to provide stronger resiliency/guarantees than the 
Flink native integration in itself then I guess we do not need to check, or 
it's enough to check at larger intervals.
{quote}
I understand in general. In other words, we are using the reconcile loop to 
do the periodic check and plan to produce ERROR events, right? 

I think it's an interesting feature to explore. It could serve as 
monitoring or self-healing for the operator. The monitoring can use polling 
or an informer-based technique.

Thanks for the explanation, everyone. Let's see how this ability 
evolves :).


was (Author: aitozi):
> If we do not want to provide stronger resiliency/guarantees than the Flink 
> native integration in itself then I guess we do not need to check, or it's 
> enough to check at larger intervals.

I understand in general. In other words, we are using the reconcile loop to 
do the periodic check and plan to produce ERROR events, right? 

I think it's an interesting feature to explore. It could serve as 
monitoring or self-healing for the operator. The monitoring can use polling 
or an informer-based technique.

Thanks for the explanation, everyone. Let's see how this ability 
evolves :).

> Rethink the default reschedule reconcile loop
> -
>
> Key: FLINK-26719
> URL: https://issues.apache.org/jira/browse/FLINK-26719
> Project: Flink
>  Issue Type: Sub-task
>Reporter: Aitozi
>Priority: Major
>
> When testing locally, I found that the operator reschedules and reconciles at the 
> {{operator.reconciler.reschedule.interval.sec}} interval. I wonder why we need this. I 
> think we only need to reconcile in these cases:
>  # waiting for the status change
>  # receive the new event
>  # waiting for the savepoint result
> So when JobManagerDeploymentStatus is Ready, we do not have to trigger the 
> reconcile except waiting for the savepoint result.
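
The rule proposed in this description can be written down as a small decision function. This is only a sketch of the proposal, not of the operator's behavior. The enum, the method, and its parameters are hypothetical names introduced for illustration.

```java
import java.time.Duration;
import java.util.Optional;

/** Hypothetical subset of JobManager deployment states. */
enum JmStatusSketch { DEPLOYING, DEPLOYED_NOT_READY, READY }

final class ReschedulePolicySketch {
    /**
     * Returns the delay until the next reconcile, or empty if the operator
     * can simply wait for the next event on the resource.
     */
    static Optional<Duration> nextReconcile(
            JmStatusSketch status, boolean savepointInProgress, Duration interval) {
        // Ready and nothing pending: no periodic reconcile is needed.
        if (status == JmStatusSketch.READY && !savepointInProgress) {
            return Optional.empty();
        }
        // Still waiting for a status change or a savepoint result.
        return Optional.of(interval);
    }
}
```

Note that under this rule a Ready deployment is never re-checked unless an event arrives, so problems that do not produce an event would go unnoticed.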





[jira] [Comment Edited] (FLINK-26719) Rethink the default reschedule reconcile loop

2022-03-19 Thread Aitozi (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17509231#comment-17509231
 ] 

Aitozi edited comment on FLINK-26719 at 3/19/22, 9:35 AM:
--

{quote}
If we do not want to provide stronger resiliency/guarantees than the Flink 
native integration in itself then I guess we do not need to check, or it's 
enough to check at larger intervals.
{quote}
I understand in general. In other words, we are using the reconcile loop to 
do the periodic check and plan to produce ERROR events, right? 

I think it's an interesting feature to explore. It could serve as 
monitoring or self-healing for the operator. The monitoring can use polling 
or an informer-based technique.

Thanks for the explanation, everyone. Let's see how this ability 
evolves :).


was (Author: aitozi):
{quote}If we do not want to provide stronger resiliency/guarantees than the 
Flink native integration in itself then I guess we do not need to check, or 
it's enough to check at larger intervals.
{quote}
I understand in general. In other words, we are using the reconcile loop to 
do the periodic check and plan to produce ERROR events, right? 

I think it's an interesting feature to explore. It could serve as 
monitoring or self-healing for the operator. The monitoring can use polling 
or an informer-based technique.

Thanks for the explanation, everyone. Let's see how this ability 
evolves :).

> Rethink the default reschedule reconcile loop
> -
>
> Key: FLINK-26719
> URL: https://issues.apache.org/jira/browse/FLINK-26719
> Project: Flink
>  Issue Type: Sub-task
>Reporter: Aitozi
>Priority: Major
>
> When testing locally, I found that the operator reschedules and reconciles at the 
> {{operator.reconciler.reschedule.interval.sec}} interval. I wonder why we need this. I 
> think we only need to reconcile in these cases:
>  # waiting for the status change
>  # receive the new event
>  # waiting for the savepoint result
> So when JobManagerDeploymentStatus is Ready, we do not have to trigger the 
> reconcile except waiting for the savepoint result.





[jira] [Commented] (FLINK-26719) Rethink the default reschedule reconcile loop

2022-03-19 Thread Aitozi (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17509231#comment-17509231
 ] 

Aitozi commented on FLINK-26719:


> If we do not want to provide stronger resiliency/guarantees than the Flink 
> native integration in itself then I guess we do not need to check, or it's 
> enough to check at larger intervals.

I understand in general. In other words, we are using the reconcile loop to 
do the periodic check and plan to produce ERROR events, right? 

I think it's an interesting feature to explore. It could serve as 
monitoring or self-healing for the operator. The monitoring can use polling 
or an informer-based technique.

Thanks for the explanation, everyone. Let's see how this ability 
evolves :).

> Rethink the default reschedule reconcile loop
> -
>
> Key: FLINK-26719
> URL: https://issues.apache.org/jira/browse/FLINK-26719
> Project: Flink
>  Issue Type: Sub-task
>Reporter: Aitozi
>Priority: Major
>
> When testing locally, I found that the operator reschedules and reconciles at the 
> {{operator.reconciler.reschedule.interval.sec}} interval. I wonder why we need this. I 
> think we only need to reconcile in these cases:
>  # waiting for the status change
>  # receive the new event
>  # waiting for the savepoint result
> So when JobManagerDeploymentStatus is Ready, we do not have to trigger the 
> reconcile except waiting for the savepoint result.





[jira] [Commented] (FLINK-26737) Add CRD management in development doc

2022-03-19 Thread Aitozi (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17509237#comment-17509237
 ] 

Aitozi commented on FLINK-26737:


I will work on it. Please assign this to me :).

> Add CRD management in development doc 
> --
>
> Key: FLINK-26737
> URL: https://issues.apache.org/jira/browse/FLINK-26737
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kubernetes Operator
>    Reporter: Aitozi
>Priority: Major
>
> Add CRD operations to the development doc, such as generating, upgrading, and 
> so on.





[jira] [Comment Edited] (FLINK-26719) Rethink the default reschedule reconcile loop

2022-03-19 Thread Aitozi (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17509231#comment-17509231
 ] 

Aitozi edited comment on FLINK-26719 at 3/19/22, 9:36 AM:
--

{quote}If we do not want to provide stronger resiliency/guarantees than the 
Flink native integration in itself then I guess we do not need to check, or 
it's enough to check at larger intervals.
{quote}
I understand in general. In other words, we are using the reconcile loop to 
do the periodic check and plan to produce ERROR events, right? 

I think it's an interesting feature to explore. It could serve as 
monitoring or self-healing for the operator. The monitoring can use polling 
or an informer-based technique.

Thanks for the explanation, everyone. Let's see how this ability 
evolves :).


was (Author: aitozi):
{quote}
If we do not want to provide stronger resiliency/guarantees than the Flink 
native integration in itself then I guess we do not need to check, or it's 
enough to check at larger intervals.
{quote}
I understand in general. In other words, we are using the reconcile loop to 
do the periodic check and plan to produce ERROR events, right? 

I think it's an interesting feature to explore. It could serve as 
monitoring or self-healing for the operator. The monitoring can use polling 
or an informer-based technique.

Thanks for the explanation, everyone. Let's see how this ability 
evolves :).

> Rethink the default reschedule reconcile loop
> -
>
> Key: FLINK-26719
> URL: https://issues.apache.org/jira/browse/FLINK-26719
> Project: Flink
>  Issue Type: Sub-task
>Reporter: Aitozi
>Priority: Major
>
> When testing locally, I found that the operator reschedules and reconciles at the 
> {{operator.reconciler.reschedule.interval.sec}} interval. I wonder why we need this. I 
> think we only need to reconcile in these cases:
>  # waiting for the status change
>  # receive the new event
>  # waiting for the savepoint result
> So when JobManagerDeploymentStatus is Ready, we do not have to trigger the 
> reconcile except waiting for the savepoint result.





[jira] [Comment Edited] (FLINK-26719) Rethink the default reschedule reconcile loop

2022-03-19 Thread Aitozi (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17509231#comment-17509231
 ] 

Aitozi edited comment on FLINK-26719 at 3/19/22, 9:37 AM:
--

{quote}
If we do not want to provide stronger resiliency/guarantees than the Flink 
native integration in itself then I guess we do not need to check, or it's 
enough to check at larger intervals.
{quote}
I understand in general. In other words, we are using the reconcile loop to 
do the periodic check and plan to produce ERROR events, right? 

I think it's an interesting feature to explore. It could serve as 
monitoring or self-healing for the operator. The monitoring can use polling 
or an informer-based technique.

Thanks for the explanation, everyone. Let's see how this ability 
evolves :).


was (Author: aitozi):
{quote}If we do not want to provide stronger resiliency/guarantees than the 
Flink native integration in itself then I guess we do not need to check, or 
it's enough to check at larger intervals.
{quote}
I understand in general. In other words, we are using the reconcile loop to 
do the periodic check and plan to produce ERROR events, right? 

I think it's an interesting feature to explore. It could serve as 
monitoring or self-healing for the operator. The monitoring can use polling 
or an informer-based technique.

Thanks for the explanation, everyone. Let's see how this ability 
evolves :).

> Rethink the default reschedule reconcile loop
> -
>
> Key: FLINK-26719
> URL: https://issues.apache.org/jira/browse/FLINK-26719
> Project: Flink
>  Issue Type: Sub-task
>Reporter: Aitozi
>Priority: Major
>
> When testing locally, I found that the operator reschedules and reconciles at the 
> {{operator.reconciler.reschedule.interval.sec}} interval. I wonder why we need this. I 
> think we only need to reconcile in these cases:
>  # waiting for the status change
>  # receive the new event
>  # waiting for the savepoint result
> So when JobManagerDeploymentStatus is Ready, we do not have to trigger the 
> reconcile except waiting for the savepoint result.





[jira] [Closed] (FLINK-26719) Rethink the default reschedule reconcile loop

2022-03-19 Thread Aitozi (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26719?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aitozi closed FLINK-26719.
--
Resolution: Not A Problem

> Rethink the default reschedule reconcile loop
> -
>
> Key: FLINK-26719
> URL: https://issues.apache.org/jira/browse/FLINK-26719
> Project: Flink
>  Issue Type: Sub-task
>Reporter: Aitozi
>Priority: Major
>
> When testing locally, I found that the operator reschedules and reconciles at the 
> {{operator.reconciler.reschedule.interval.sec}} interval. I wonder why we need this. I 
> think we only need to reconcile in these cases:
>  # waiting for the status change
>  # receive the new event
>  # waiting for the savepoint result
> So when JobManagerDeploymentStatus is Ready, we do not have to trigger the 
> reconcile except waiting for the savepoint result.





[jira] [Created] (FLINK-26737) Add CRD management in development doc

2022-03-19 Thread Aitozi (Jira)
Aitozi created FLINK-26737:
--

 Summary: Add CRD management in development doc 
 Key: FLINK-26737
 URL: https://issues.apache.org/jira/browse/FLINK-26737
 Project: Flink
  Issue Type: Sub-task
  Components: Kubernetes Operator
Reporter: Aitozi


Add CRD operations to the development doc, such as generating, upgrading 





[jira] [Updated] (FLINK-26737) Add CRD management in development doc

2022-03-19 Thread Aitozi (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26737?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aitozi updated FLINK-26737:
---
Description: Add CRD operations to the development doc, such as 
generating, upgrading, and so on.  (was: Add CRD operations to the development 
doc, such as generating, upgrading )

> Add CRD management in development doc 
> --
>
> Key: FLINK-26737
> URL: https://issues.apache.org/jira/browse/FLINK-26737
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kubernetes Operator
>    Reporter: Aitozi
>Priority: Major
>
> Add CRD operations to the development doc, such as generating, upgrading, and 
> so on.





[jira] [Commented] (FLINK-26776) The LocalBufferPoolDestroyTest#isInBlockingBufferRequest is fragile in different JDK environment

2022-03-21 Thread Aitozi (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17509806#comment-17509806
 ] 

Aitozi commented on FLINK-26776:


I then found that it has already been solved by 
https://issues.apache.org/jira/browse/FLINK-24985 on the master branch. Closing as a 
duplicate :)

> The LocalBufferPoolDestroyTest#isInBlockingBufferRequest is fragile in 
> different JDK environment
> 
>
> Key: FLINK-26776
> URL: https://issues.apache.org/jira/browse/FLINK-26776
> Project: Flink
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 1.15.0
>Reporter: Aitozi
>Priority: Major
> Attachments: image-2022-03-21-19-53-44-689.png
>
>
> {{isInBlockingBufferRequest}} depends on stack trace elements 5 and 
> 7 being the corresponding methods. But in our internal JDK 11 build, the stack is a bit 
> different. I think we should make the check more flexible.
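
One way to make such a check less dependent on the JDK is to scan the whole stack trace for the expected frame instead of reading fixed positions. The helper below is a hypothetical sketch of that idea. It is not the test's actual code and not necessarily the fix that was applied in Flink.

```java
final class StackTraceMatcherSketch {
    /**
     * Returns true if any frame in the trace belongs to the given class and
     * method, regardless of the frame's position in the trace.
     */
    static boolean hasFrame(
            StackTraceElement[] stackTrace, String className, String methodName) {
        for (StackTraceElement frame : stackTrace) {
            if (frame.getClassName().equals(className)
                    && frame.getMethodName().equals(methodName)) {
                return true;
            }
        }
        return false;
    }
}
```

A test could pass in `Thread.getStackTrace()` of the thread under inspection together with the class and method it expects to be blocked in.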





[jira] [Closed] (FLINK-26776) The LocalBufferPoolDestroyTest#isInBlockingBufferRequest is fragile in different JDK environment

2022-03-21 Thread Aitozi (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aitozi closed FLINK-26776.
--
Resolution: Duplicate

> The LocalBufferPoolDestroyTest#isInBlockingBufferRequest is fragile in 
> different JDK environment
> 
>
> Key: FLINK-26776
> URL: https://issues.apache.org/jira/browse/FLINK-26776
> Project: Flink
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 1.15.0
>Reporter: Aitozi
>Priority: Major
> Attachments: image-2022-03-21-19-53-44-689.png
>
>
> {{isInBlockingBufferRequest}} depends on stack trace elements 5 and 
> 7 being the corresponding methods. But in our internal JDK 11 build, the stack is a bit 
> different. I think we should make the check more flexible.





[jira] [Created] (FLINK-26784) FLIP-215: Introduce FlinkSessionJob CRD in the kubernetes operator

2022-03-21 Thread Aitozi (Jira)
Aitozi created FLINK-26784:
--

 Summary: FLIP-215: Introduce FlinkSessionJob CRD in the kubernetes 
operator
 Key: FLINK-26784
 URL: https://issues.apache.org/jira/browse/FLINK-26784
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Reporter: Aitozi


https://cwiki.apache.org/confluence/display/FLINK/FLIP-215%3A+Introduce+FlinkSessionJob+CRD+in+the+kubernetes+operator





[jira] [Updated] (FLINK-26776) The LocalBufferPoolDestroyTest#isInBlockingBufferRequest is fragile in different JDK environment

2022-03-21 Thread Aitozi (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aitozi updated FLINK-26776:
---
Affects Version/s: 1.14.4
   (was: 1.15.0)

> The LocalBufferPoolDestroyTest#isInBlockingBufferRequest is fragile in 
> different JDK environment
> 
>
> Key: FLINK-26776
> URL: https://issues.apache.org/jira/browse/FLINK-26776
> Project: Flink
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 1.14.4
>Reporter: Aitozi
>Priority: Major
> Attachments: image-2022-03-21-19-53-44-689.png
>
>
> {{isInBlockingBufferRequest}} depends on stack trace elements 5 and 
> 7 being the corresponding methods. But in our internal JDK 11 build, the stack is a bit 
> different. I think we should make the check more flexible.





[jira] [Created] (FLINK-26785) Add FlinkSessionJob CRD

2022-03-21 Thread Aitozi (Jira)
Aitozi created FLINK-26785:
--

 Summary: Add FlinkSessionJob CRD
 Key: FLINK-26785
 URL: https://issues.apache.org/jira/browse/FLINK-26785
 Project: Flink
  Issue Type: Sub-task
  Components: Kubernetes Operator
Reporter: Aitozi


This ticket is to add the {{FlinkSessionJob}} CRD





[jira] [Created] (FLINK-26787) Implement FlinkSessionJobController and Reconciler

2022-03-21 Thread Aitozi (Jira)
Aitozi created FLINK-26787:
--

 Summary: Implement FlinkSessionJobController and Reconciler
 Key: FLINK-26787
 URL: https://issues.apache.org/jira/browse/FLINK-26787
 Project: Flink
  Issue Type: Sub-task
  Components: Kubernetes Operator
Reporter: Aitozi








[jira] [Commented] (FLINK-26807) The batch job not work well with Operator

2022-03-22 Thread Aitozi (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17510991#comment-17510991
 ] 

Aitozi commented on FLINK-26807:


Got it. I will look over that discussion. Closing this one as a duplicate.

> The batch job not work well with Operator
> -
>
> Key: FLINK-26807
> URL: https://issues.apache.org/jira/browse/FLINK-26807
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kubernetes Operator
>    Reporter: Aitozi
>Priority: Major
>
> When I test a batch job or a finite streaming job, the flinkdep becomes an 
> orphaned resource and job listing continues after the job has finished, because the 
> JobManagerDeploymentStatus is never synced again.
> I think we should sync the globally terminated status from the application job 
> and clean up the flinkdep resource.
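
The proposed check can be sketched as follows: once the job reaches a globally terminal state, the operator stops observing it and cleans up the resource. The enum and helper are hypothetical names used for illustration. The set of terminal states mirrors Flink's notion of globally terminal job states (finished, failed, canceled), which is stated here as background rather than taken from this thread.

```java
import java.util.EnumSet;
import java.util.Set;

/** Hypothetical subset of job states, defined locally for the sketch. */
enum JobStateSketch { RUNNING, SUSPENDED, FINISHED, FAILED, CANCELED }

final class TerminalStateSketch {
    // States from which the job can never run again.
    private static final Set<JobStateSketch> GLOBALLY_TERMINAL =
            EnumSet.of(JobStateSketch.FINISHED, JobStateSketch.FAILED, JobStateSketch.CANCELED);

    /** True if the operator may stop listing jobs and clean up the resource. */
    static boolean shouldCleanUp(JobStateSketch state) {
        return GLOBALLY_TERMINAL.contains(state);
    }
}
```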





[jira] [Closed] (FLINK-26807) The batch job not work well with Operator

2022-03-22 Thread Aitozi (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aitozi closed FLINK-26807.
--
Resolution: Duplicate

> The batch job not work well with Operator
> -
>
> Key: FLINK-26807
> URL: https://issues.apache.org/jira/browse/FLINK-26807
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kubernetes Operator
>    Reporter: Aitozi
>Priority: Major
>
> When I test a batch job or a finite streaming job, the flinkdep becomes an 
> orphaned resource and job listing continues after the job has finished, because the 
> JobManagerDeploymentStatus is never synced again.
> I think we should sync the globally terminated status from the application job 
> and clean up the flinkdep resource.





[jira] [Created] (FLINK-26871) Handle Session job spec change

2022-03-26 Thread Aitozi (Jira)
Aitozi created FLINK-26871:
--

 Summary: Handle Session job spec change 
 Key: FLINK-26871
 URL: https://issues.apache.org/jira/browse/FLINK-26871
 Project: Flink
  Issue Type: Sub-task
Reporter: Aitozi






--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (FLINK-26870) Implement session job observer

2022-03-26 Thread Aitozi (Jira)
Aitozi created FLINK-26870:
--

 Summary: Implement session job observer
 Key: FLINK-26870
 URL: https://issues.apache.org/jira/browse/FLINK-26870
 Project: Flink
  Issue Type: Sub-task
Reporter: Aitozi








[jira] [Commented] (FLINK-26871) Handle Session job spec change

2022-03-26 Thread Aitozi (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17512747#comment-17512747
 ] 

Aitozi commented on FLINK-26871:


I will work on this 

> Handle Session job spec change 
> ---
>
> Key: FLINK-26871
> URL: https://issues.apache.org/jira/browse/FLINK-26871
> Project: Flink
>  Issue Type: Sub-task
>Reporter: Aitozi
>Priority: Major
>






[jira] [Updated] (FLINK-26873) Align the helm chart version with the flink operator

2022-03-26 Thread Aitozi (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-26873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aitozi updated FLINK-26873:
---
Component/s: Kubernetes Operator

> Align the helm chart version with the flink operator
> 
>
> Key: FLINK-26873
> URL: https://issues.apache.org/jira/browse/FLINK-26873
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kubernetes Operator
>    Reporter: Aitozi
>Priority: Major
>
> Now the flink-operator helm chart version is 1.0.13. I think it should be 
> aligned to the flink-operator version during release 





[jira] [Created] (FLINK-26873) Align the helm chart version with the flink operator

2022-03-26 Thread Aitozi (Jira)
Aitozi created FLINK-26873:
--

 Summary: Align the helm chart version with the flink operator
 Key: FLINK-26873
 URL: https://issues.apache.org/jira/browse/FLINK-26873
 Project: Flink
  Issue Type: Sub-task
Reporter: Aitozi


Now the flink-operator helm chart version is 1.0.13. I think it should be 
aligned to the flink-operator version during release 





[jira] [Commented] (FLINK-26873) Align the helm chart version with the flink operator

2022-03-26 Thread Aitozi (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-26873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17512748#comment-17512748
 ] 

Aitozi commented on FLINK-26873:


cc [~gyfora] 

> Align the helm chart version with the flink operator
> 
>
> Key: FLINK-26873
> URL: https://issues.apache.org/jira/browse/FLINK-26873
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kubernetes Operator
>    Reporter: Aitozi
>Priority: Major
>
> Now the flink-operator helm chart version is 1.0.13. I think it should be 
> aligned to the flink-operator version during release 





[jira] [Created] (FLINK-26889) Eliminate the duplicated construct for FlinkOperatorConfiguration in test

2022-03-28 Thread Aitozi (Jira)
Aitozi created FLINK-26889:
--

 Summary: Eliminate the duplicated construct for 
FlinkOperatorConfiguration in test
 Key: FLINK-26889
 URL: https://issues.apache.org/jira/browse/FLINK-26889
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Reporter: Aitozi


A minor improvement to reduce the boilerplate code





[jira] [Commented] (FLINK-18356) flink-table-planner Exit code 137 returned from process

2022-03-28 Thread Aitozi (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-18356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17513286#comment-17513286
 ] 

Aitozi commented on FLINK-18356:


Hi, I'm running the CI tests against release-1.14 locally and still hit the 
exit code 137 problem. Is release-1.14 missing a fix for it?
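
For context, a shell-style exit code above 128 conventionally means the process was terminated by signal (code - 128), so 137 corresponds to signal 9 (SIGKILL). In CI containers this is often, though not always, the kernel OOM killer. A minimal sketch of that decoding follows; the helper name `describe_exit_code` is hypothetical and is not part of Flink or Azure Pipelines.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Map a shell-style exit code to a short human-readable description."""
    if code > 128:
        # Codes above 128 conventionally encode "killed by signal (code - 128)".
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # The signal number is not defined on this platform.
            name = f"signal {signum}"
        return f"killed by {name}"
    return "success" if code == 0 else f"exited with status {code}"


print(describe_exit_code(137))
```

If the process was indeed killed for memory reasons, the usual next step is to compare the container memory limit with the JVM and Python memory settings used by the build.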

> flink-table-planner Exit code 137 returned from process
> ---
>
> Key: FLINK-18356
> URL: https://issues.apache.org/jira/browse/FLINK-18356
> Project: Flink
>  Issue Type: Bug
>  Components: Build System / Azure Pipelines, Tests
>Affects Versions: 1.12.0, 1.13.0, 1.14.0, 1.15.0
>Reporter: Piotr Nowojski
>Assignee: Martijn Visser
>Priority: Critical
>  Labels: pull-request-available, test-stability
> Fix For: 1.15.0
>
> Attachments: 1234.jpg, app-profiling_4.gif
>
>
> {noformat}
> ===== test session starts =====
> platform linux -- Python 3.7.3, pytest-5.4.3, py-1.8.2, pluggy-0.13.1
> cachedir: .tox/py37-cython/.pytest_cache
> rootdir: /__w/3/s/flink-python
> collected 568 items
> pyflink/common/tests/test_configuration.py .. [  1%]
> pyflink/common/tests/test_execution_config.py ... [  5%]
> pyflink/dataset/tests/test_execution_environment.py .
> ##[error]Exit code 137 returned from process: file name '/bin/docker', 
> arguments 'exec -i -u 1002 
> 97fc4e22522d2ced1f4d23096b8929045d083dd0a99a4233a8b20d0489e9bddb 
> /__a/externals/node/bin/node /__w/_temp/containerHandlerInvoker.js'.
> Finishing: Test - python
> {noformat}
> https://dev.azure.com/apache-flink/apache-flink/_build/results?buildId=3729=logs=9cada3cb-c1d3-5621-16da-0f718fb86602=8d78fe4f-d658-5c70-12f8-4921589024c3





[jira] [Created] (FLINK-26996) Break the reconcile after first create session cluster

2022-04-01 Thread Aitozi (Jira)
Aitozi created FLINK-26996:
--

 Summary: Break the reconcile after first create session cluster
 Key: FLINK-26996
 URL: https://issues.apache.org/jira/browse/FLINK-26996
 Project: Flink
  Issue Type: Bug
  Components: Kubernetes Operator
Reporter: Aitozi


When testing a session cluster, I found that the session cluster is always 
started twice.





[jira] [Commented] (FLINK-20808) Remove redundant checkstyle rules

2022-04-01 Thread Aitozi (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17516211#comment-17516211
 ] 

Aitozi commented on FLINK-20808:


Hi [~chesnay], sorry to bother you here. I ran into the following case: I 
followed the development docs and use the save action with google-java-format 
to format the code automatically, but the formatted code does not pass the 
checkstyle rule [NewlineAtEndOfFile]. When I check the failing file, it does 
end with a newline. Is this a bug in checkstyle, or am I missing some 
development configuration?

> Remove redundant checkstyle rules
> -
>
> Key: FLINK-20808
> URL: https://issues.apache.org/jira/browse/FLINK-20808
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Build System
>Reporter: Chesnay Schepler
>Priority: Not a Priority
>  Labels: auto-deprioritized-major, auto-deprioritized-minor
>
> There are probably a few checkstyle rules that are now enforced by spotless, 
> and we could remove these to clarify the responsibilities of each tool.





[jira] [Updated] (FLINK-20808) Remove redundant checkstyle rules

2022-04-01 Thread Aitozi (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aitozi updated FLINK-20808:
---
Attachment: image-2022-04-02-12-46-28-065.png

> Remove redundant checkstyle rules
> -
>
> Key: FLINK-20808
> URL: https://issues.apache.org/jira/browse/FLINK-20808
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Build System
>Reporter: Chesnay Schepler
>Priority: Not a Priority
>  Labels: auto-deprioritized-major, auto-deprioritized-minor
> Attachments: image-2022-04-02-12-46-11-005.png, 
> image-2022-04-02-12-46-28-065.png
>
>
> There are probably a few checkstyle rules that are now enforced by spotless, 
> and we could remove these to clarify the responsibilities of each tool.





[jira] [Updated] (FLINK-20808) Remove redundant checkstyle rules

2022-04-01 Thread Aitozi (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aitozi updated FLINK-20808:
---
Attachment: image-2022-04-02-12-46-11-005.png

> Remove redundant checkstyle rules
> -
>
> Key: FLINK-20808
> URL: https://issues.apache.org/jira/browse/FLINK-20808
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Build System
>Reporter: Chesnay Schepler
>Priority: Not a Priority
>  Labels: auto-deprioritized-major, auto-deprioritized-minor
> Attachments: image-2022-04-02-12-46-11-005.png, 
> image-2022-04-02-12-46-28-065.png
>
>
> There are probably a few checkstyle rules that are now enforced by spotless, 
> and we could remove these to clarify the responsibilities of each tool.





[jira] [Updated] (FLINK-20808) Remove redundant checkstyle rules

2022-04-01 Thread Aitozi (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-20808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aitozi updated FLINK-20808:
---
Attachment: (was: image-2022-04-02-12-46-11-005.png)

> Remove redundant checkstyle rules
> -
>
> Key: FLINK-20808
> URL: https://issues.apache.org/jira/browse/FLINK-20808
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Build System
>Reporter: Chesnay Schepler
>Priority: Not a Priority
>  Labels: auto-deprioritized-major, auto-deprioritized-minor
> Attachments: image-2022-04-02-12-46-28-065.png
>
>
> There are probably a few checkstyle rules that are now enforced by spotless, 
> and we could remove these to clarify the responsibilities of each tool.





[jira] [Created] (FLINK-27001) Support to specify the resource of the operator

2022-04-01 Thread Aitozi (Jira)
Aitozi created FLINK-27001:
--

 Summary: Support to specify the resource of the operator 
 Key: FLINK-27001
 URL: https://issues.apache.org/jira/browse/FLINK-27001
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Reporter: Aitozi


Support specifying the operator's resource requirements and limits.





[jira] [Created] (FLINK-27000) Support to set JVM args for operator

2022-04-01 Thread Aitozi (Jira)
Aitozi created FLINK-27000:
--

 Summary: Support to set JVM args for operator
 Key: FLINK-27000
 URL: https://issues.apache.org/jira/browse/FLINK-27000
 Project: Flink
  Issue Type: Improvement
  Components: Kubernetes Operator
Reporter: Aitozi


In production we often need to set JVM options for the operator.





[jira] [Commented] (FLINK-20808) Remove redundant checkstyle rules

2022-04-01 Thread Aitozi (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-20808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17516216#comment-17516216
 ] 

Aitozi commented on FLINK-20808:


No need to answer. After some searching, I found it can be solved by setting 
the IDE line separator to {{LF}} :)

!image-2022-04-02-12-46-28-065.png|width=354,height=85!
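
To make the symptom above concrete: a file saved with CRLF line endings ends in `\r\n` rather than a bare `\n`, which an end-of-file newline check configured for LF may reject even though the file visibly ends with a line break. A minimal sketch for spotting such files, assuming the check expects LF; the helper names `line_ending_report` and `check_file` are hypothetical and not part of checkstyle or Flink.

```python
from pathlib import Path


def line_ending_report(data: bytes) -> dict:
    """Count CRLF and bare-LF line endings and test for a trailing bare LF."""
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf  # every CRLF also contains one LF
    ends_with_lf = data.endswith(b"\n") and not data.endswith(b"\r\n")
    return {"crlf": crlf, "lf": lf, "ends_with_lf": ends_with_lf}


def check_file(path: str) -> dict:
    """Read a file as raw bytes so the line endings are not translated."""
    return line_ending_report(Path(path).read_bytes())


print(line_ending_report(b"class A {}\r\n"))
```

A report with a non-zero `crlf` count and `ends_with_lf` set to false points at the IDE line separator setting rather than at a missing final newline.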

> Remove redundant checkstyle rules
> -
>
> Key: FLINK-20808
> URL: https://issues.apache.org/jira/browse/FLINK-20808
> Project: Flink
>  Issue Type: Technical Debt
>  Components: Build System
>Reporter: Chesnay Schepler
>Priority: Not a Priority
>  Labels: auto-deprioritized-major, auto-deprioritized-minor
> Attachments: image-2022-04-02-12-46-11-005.png, 
> image-2022-04-02-12-46-28-065.png
>
>
> There are probably a few checkstyle rules that are now enforced by spotless, 
> and we could remove these to clarify the responsibilities of each tool.





[jira] [Updated] (FLINK-27028) Support to upload jar and run jar in RestClusterClient

2022-04-02 Thread Aitozi (Jira)


 [ 
https://issues.apache.org/jira/browse/FLINK-27028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aitozi updated FLINK-27028:
---
Description: 
The {{flink-kubernetes-operator}} uses JarUpload + JarRun to support session 
job submission. However, the RestClusterClient currently does not expose a way 
to upload the user jar to the session cluster and trigger the jar run API. So 
a bare RestClient is used to achieve this, but it lacks the common retry 
logic.

Can we expose these two APIs in the RestClusterClient to make it more 
convenient to use in the operator?

  was:
The flink-kubernetes-operator uses JarUpload + JarRun to support session job 
management. However, the RestClusterClient currently does not expose a way to 
upload the user jar to the session cluster and trigger the jar run API. So I 
used a bare RestClient to achieve this. 

Can we expose these two APIs in the RestClusterClient to make it more 
convenient to use in the operator?


> Support to upload jar and run jar in RestClusterClient
> --
>
> Key: FLINK-27028
> URL: https://issues.apache.org/jira/browse/FLINK-27028
> Project: Flink
>  Issue Type: Improvement
>  Components: Client / Job Submission
>    Reporter: Aitozi
>Priority: Major
>
> The {{flink-kubernetes-operator}} uses JarUpload + JarRun to support session 
> job submission. However, the RestClusterClient currently does not expose a 
> way to upload the user jar to the session cluster and trigger the jar run 
> API. So a bare RestClient is used to achieve this, but it lacks the common 
> retry logic.
> Can we expose these two APIs in the RestClusterClient to make it more 
> convenient to use in the operator?





[jira] [Created] (FLINK-27028) Support to upload jar and run jar in RestClusterClient

2022-04-02 Thread Aitozi (Jira)
Aitozi created FLINK-27028:
--

 Summary: Support to upload jar and run jar in RestClusterClient
 Key: FLINK-27028
 URL: https://issues.apache.org/jira/browse/FLINK-27028
 Project: Flink
  Issue Type: Improvement
  Components: Client / Job Submission
Reporter: Aitozi


The flink-kubernetes-operator uses JarUpload + JarRun to support session job 
management. However, the RestClusterClient currently does not expose a way to 
upload the user jar to the session cluster and trigger the jar run API. So I 
used a bare RestClient to achieve this. 

Can we expose these two APIs in the RestClusterClient to make it more 
convenient to use in the operator?





[jira] [Commented] (FLINK-27028) Support to upload jar and run jar in RestClusterClient

2022-04-02 Thread Aitozi (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-27028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17516448#comment-17516448
 ] 

Aitozi commented on FLINK-27028:


cc [~wangyang0918] [~chesnay] If there is no objection, I'm willing to open a 
pull request for this.
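
As a language-neutral illustration of the retry behaviour the issue description says a bare client lacks, here is a minimal sketch. Everything in it is an assumption for illustration: `with_retries`, `submit_session_job`, and the `upload_jar` / `run_jar` callables are hypothetical and do not reflect the RestClusterClient API or the operator's code.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(call: Callable[[], T], attempts: int = 3,
                 delay_seconds: float = 0.0) -> T:
    """Invoke `call`, retrying on ConnectionError up to `attempts` times."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except ConnectionError as err:
            last_error = err
            if attempt < attempts:
                time.sleep(delay_seconds)  # fixed pause before the next try
    raise last_error


def submit_session_job(upload_jar: Callable[[], str],
                       run_jar: Callable[[str], str],
                       attempts: int = 3) -> str:
    """Upload a jar, then trigger a run with the returned jar id (hypothetical)."""
    jar_id = with_retries(upload_jar, attempts)
    return with_retries(lambda: run_jar(jar_id), attempts)
```

Wrapping both calls in the same helper means a transient failure in either the upload or the run step is retried independently, which is the convenience the request is after.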

> Support to upload jar and run jar in RestClusterClient
> --
>
> Key: FLINK-27028
> URL: https://issues.apache.org/jira/browse/FLINK-27028
> Project: Flink
>  Issue Type: Improvement
>  Components: Client / Job Submission
>    Reporter: Aitozi
>Priority: Major
>
> The {{flink-kubernetes-operator}} uses JarUpload + JarRun to support session 
> job submission. However, the RestClusterClient currently does not expose a 
> way to upload the user jar to the session cluster and trigger the jar run 
> API. So a bare RestClient is used to achieve this, but it lacks the common 
> retry logic.
> Can we expose these two APIs in the RestClusterClient to make it more 
> convenient to use in the operator?





[jira] [Commented] (FLINK-27001) Support to specify the resource of the operator

2022-04-02 Thread Aitozi (Jira)


[ 
https://issues.apache.org/jira/browse/FLINK-27001?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17516445#comment-17516445
 ] 

Aitozi commented on FLINK-27001:


It seems this can also be implemented via 
https://issues.apache.org/jira/browse/FLINK-26663, so I will not work on this 
right now. I will keep an eye on it.

> Support to specify the resource of the operator 
> 
>
> Key: FLINK-27001
> URL: https://issues.apache.org/jira/browse/FLINK-27001
> Project: Flink
>  Issue Type: Sub-task
>  Components: Kubernetes Operator
>    Reporter: Aitozi
>Priority: Major
>
> Support specifying the operator's resource requirements and limits.




