Thanks for posting the discussion here.

Having separate `validator`, `observer` and `reconciler` components makes a
lot of sense, and the "Validate -> Observe -> Reconcile" flow seems natural
to me.
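
For illustration, here is a minimal Java sketch of that flow. It assumes hypothetical `Validator`, `Observer` and `Reconciler` interfaces, a toy `Deployment` class and an `Outcome` type. None of these are the operator's actual classes. An invalid resource short-circuits the pass. Otherwise the observer always runs before the reconciler, and the reconciler decides when the next pass should happen.

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of a Validate -> Observe -> Reconcile pass.
// None of these types are the operator's real classes.
public class ControllerFlowSketch {

    // Stand-in for a FlinkDeployment resource: desired spec plus observed status.
    static final class Deployment {
        final int parallelism;               // part of the desired spec
        String observedJobState = "UNKNOWN"; // written by the observer
        Deployment(int parallelism) { this.parallelism = parallelism; }
    }

    interface Validator { Optional<String> validate(Deployment d); }
    interface Observer { void observe(Deployment d); }
    interface Reconciler { Duration reconcile(Deployment d); }

    // One controller pass either rejects the resource or schedules the next pass.
    sealed interface Outcome permits Invalid, Reschedule {}
    record Invalid(String reason) implements Outcome {}
    record Reschedule(Duration delay) implements Outcome {}

    static Outcome runOnce(Deployment d, Validator v, Observer o, Reconciler r) {
        // 1. Validate: an invalid spec is neither observed nor reconciled.
        Optional<String> error = v.validate(d);
        if (error.isPresent()) {
            return new Invalid(error.get());
        }
        // 2. Observe: record the current cluster/job state on the resource.
        o.observe(d);
        // 3. Reconcile: act on the observed state; the returned delay decides
        //    when the next pass runs (Duration.ZERO = immediately, in this sketch).
        return new Reschedule(r.reconcile(d));
    }

    public static void main(String[] args) {
        List<String> calls = new ArrayList<>();
        Validator validator = d -> {
            calls.add("validate");
            if (d.parallelism <= 0) {
                return Optional.of("parallelism must be > 0");
            }
            return Optional.empty();
        };
        Observer observer = d -> {
            calls.add("observe");
            d.observedJobState = "RUNNING"; // faked observation
        };
        Reconciler reconciler = d -> {
            calls.add("reconcile");
            return Duration.ofSeconds(10);
        };

        System.out.println(runOnce(new Deployment(2), validator, observer, reconciler));
        System.out.println(runOnce(new Deployment(0), validator, observer, reconciler));
        System.out.println(calls);
    }
}
```

Letting the reconciler return the reschedule delay keeps the decision about when to run again with the component that knows whether it is still waiting on something.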

Regarding the implementation in the PR, instead of using the observer
directly in the reconciler, I lean towards letting the observer export its
results to the status (e.g. JobManager deployment status, REST port
readiness, Flink job status, etc.) and letting the reconciler read them from
the status. That way each component is more self-contained and the
boundaries are clearer.
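
Here is a minimal sketch of that boundary, again in Java and with hypothetical names. `ObservedStatus`, `JobManagerDeploymentStatus`, `Action` and the `Resource` holder are illustrative, not the operator's status model. The observer is the only writer of the status, and the reconciler decides its next step from the status alone. A real reconciler would also compare the spec against what was last deployed; that part is omitted here.

```java
// Hypothetical sketch; none of these names come from the operator code base.
public class StatusBoundarySketch {

    // Coarse JobManager deployment state, as the observer might record it.
    enum JobManagerDeploymentStatus { MISSING, DEPLOYING, READY }

    // The only view of the live cluster that the reconciler gets to see.
    // jobState == null means "no Flink job found".
    record ObservedStatus(JobManagerDeploymentStatus jobManager,
                          boolean restPortReady,
                          String jobState) {}

    enum Action { DEPLOY_CLUSTER, WAIT, SUBMIT_JOB, NOTHING }

    // Stand-in for the custom resource; only the observer writes `status`.
    static final class Resource {
        ObservedStatus status =
                new ObservedStatus(JobManagerDeploymentStatus.MISSING, false, null);
    }

    // Observer: in a real operator this would query Kubernetes and the
    // Flink REST API; in this sketch it is faked with lambdas.
    interface Observer { ObservedStatus observe(); }

    // Reconciler: decides purely from the observed status.
    static Action reconcile(ObservedStatus s) {
        return switch (s.jobManager()) {
            case MISSING -> Action.DEPLOY_CLUSTER;             // nothing deployed yet
            case DEPLOYING -> Action.WAIT;                     // pods still starting
            case READY -> !s.restPortReady() ? Action.WAIT     // REST port not reachable yet
                    : s.jobState() == null ? Action.SUBMIT_JOB // cluster up, no job found
                    : Action.NOTHING;                          // job already present
        };
    }

    // One controller pass: observe into the status, then reconcile from it.
    static Action runOnce(Resource r, Observer observer) {
        r.status = observer.observe();
        return reconcile(r.status);
    }

    public static void main(String[] args) {
        Resource r = new Resource();
        System.out.println(runOnce(r, () ->
                new ObservedStatus(JobManagerDeploymentStatus.READY, false, null)));
        System.out.println(runOnce(r, () ->
                new ObservedStatus(JobManagerDeploymentStatus.READY, true, null)));
        System.out.println(r.status);
    }
}
```

Because `reconcile` never calls the observer, each branch can be unit-tested by constructing an `ObservedStatus` by hand, without a Kubernetes cluster.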


Best,
Yang

Gyula Fóra <gyf...@apache.org> wrote on Mon, Feb 28, 2022 at 16:01:

> Hi All!
>
> I would like to start a discussion thread regarding the structure of the
> Kubernetes Operator controller flow
> (<https://github.com/apache/flink-kubernetes-operator/blob/main/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/controller/FlinkDeploymentController.java>).
> Based on some recent PR discussions, we have no clear consensus on the
> structure and the expectations, which can potentially lead to back-and-forth
> changes and unnecessary complexity.
>
> *Background*
> In the initial prototype we had a very basic flow:
>  1. Observe flink job status
>  2. (if observation successful) reconcile changes
>  3. Reschedule reconcile with success/error
>
> This basic prototype flow could not cover all requirements and did not
> allow for things like waiting until the JobManager deployment is ready.
>
> To solve these shortcomings, some changes were introduced recently here
> <https://github.com/apache/flink-kubernetes-operator/pull/21>. While this
> change introduced many improvements and safeguards, it also completely
> changed the original controller flow. Now the reconciler is responsible for
> ensuring that it can actually reconcile by checking the deployment and
> ports. The job status observation logic has also been moved into the actual
> reconcile logic.
>
>
> *Discussion Question*
> What controller flow would we like to have? Do we want to separate the
> observer from the reconciler or keep them together?
>
> In my personal view, we should try to adopt a very simple flow to make the
> operator clean and modular. If possible, I would like to restore the
> original flow with some modifications:
>
>  1. Validate the deployment object
>  2. Observe the deployment and Flink job status -> return comprehensive status info
>  3. Reconcile the deployment based on the observed status and resource changes
>  (Both 2 and 3 should be able to reschedule immediately if necessary.)
>
> I think the Observer component should be able to describe the current
> status of the deployment objects and the Flink job to the extent that the
> reconciler can work with that information alone. If we do it this way, we
> can also use the status information that the observer provides to produce
> other events and to support operations such as shutdown, which depend on
> the current deployment status.
>
> I think this would satisfy our needs, but I might be missing something that
> cannot be done if we structure the code this way.
>
> I have a PR open
> <https://github.com/apache/flink-kubernetes-operator/pull/26/commits> which
> includes some of these proposed changes (as the optional second commit), so
> that you can easily compare it with the current state of the operator.
>
> Please let us know what you think!
>
> Cheers,
> Gyula
>
