Re: [DISCUSS] Flink Kubernetes Operator controller flow

Gyula Fóra Tue, 01 Mar 2022 07:16:08 -0800

Let's keep this discussion for another day so everyone can still chip in.

In the meantime I have opened a JIRA ticket
<https://issues.apache.org/jira/browse/FLINK-26432> for this.
Also I invite you to review this PR
<https://github.com/apache/flink-kubernetes-operator/pull/26> , it is ready
with the base changes according to the discussion.


Once we have this merged it will be easier to parallelize work on the state
machine/observation/reconcile logic.

Gyula

On Tue, Mar 1, 2022 at 11:24 AM Gyula Fóra <gyula.f...@gmail.com> wrote:

> Hi All!
>
> Based on your ideas and suggestions I have made some improvements to the
> refactor proposal PR:
> https://github.com/apache/flink-kubernetes-operator/pull/26/commits
>
> I have now completely decoupled the observer from the reconciler and the
> JobManagerDeploymentStatus is recorded in the FlinkDeployment resource
> status. This greatly simplifies the flow and eliminates the need for any
> caching.
>
> I haven't touched the reconciliation logic here, and did not introduce a
> state machine to avoid making this change even bigger. The state machine
> changes could be built on top of this as a second stage of the refactoring
> work after this.
>
> I will work on improving the tests to capture and validate the flow better.
>
> Let me know what you think and if you have any ideas to further improve
> the high level flow!
>
> Thanks
> Gyula
>
> On Tue, Mar 1, 2022 at 9:07 AM Yang Wang <danrtsey...@gmail.com> wrote:
>
>> The state machine is going to replace the annoying if-else in the
>> reconciler.
>> It seems to have no conflicts with modular mechanism(aka observer,
>> reconciler, etc.).
>> And we could make them happen step by step.
>>
>>
>> Best,
>> Yang
>>
>>
>> Gyula Fóra <gyula.f...@gmail.com> 于2022年3月1日周二 13:53写道：
>>
>> > And just one more thing I forgot to add:
>> >
>> > The observer does not have to be a dummy monolithic logic. Different job
>> > types will have different state machines with different observe
>> > requirements and different observer implementations. I think this is
>> > complexly normal. The observer can be aware of the expected state and
>> what
>> > it should observe on the happy path.
>> >
>> > Gyula
>> >
>> > On Tue, 1 Mar 2022 at 06:42, Gyula Fóra <gyula.f...@gmail.com> wrote:
>> >
>> > > @Thomas:
>> > >
>> > > Thanks for the input! I completely agree with a well principled state
>> > > machine based approach in the operator would be the best.
>> > >
>> > > You are right that the code contains mostly if then else based logic
>> at
>> > > the moment as it evolved after initial prototype to cover more
>> scenarios.
>> > >
>> > > However I think the self contained observer - reconciler approach is
>> very
>> > > capable of implementing the suggested state machine based view in a
>> very
>> > > elegant way.
>> > >
>> > > Simply put, the observer is responsible for determining and recording
>> the
>> > > current state and any extra information that comes with that state.
>> > >
>> > > The reconciler then look at the current vs desired state and execute a
>> > > state transition.
>> > >
>> > > Gyula
>> > >
>> > > On Tue, 1 Mar 2022 at 02:16, Thomas Weise <t...@apache.org> wrote:
>> > >
>> > >> Thanks for bringing the discussion. It's a good time to revisit this
>> > >> area as the operator implementation becomes more complex.
>> > >>
>> > >> I think the most important part of the design are well defined states
>> > >> and transitions. Unless that can be reasoned about, there will be
>> > >> confusion. For example:
>> > >>
>> > >> 1) For job upgrade, it wasn't clear in which state reconciliation
>> > >> should happen and as a result the implementation ended up incomplete.
>> > >> 2) "JobStatusObserver" would attempt to list jobs before the REST API
>> > >> is ready. And it would be used in session mode, even though the
>> > >> session mode needs a different state machine.
>> > >>
>> > >> The implementation currently contains a lot of if-then-else logic,
>> > >> which is hard to follow and difficult to maintain. It will make it
>> > >> harder to introduce new states that would be necessary to implement
>> > >> more advanced upgrade strategies, amongst others.
>> > >>
>> > >> Did you consider a state machine centric approach? See [1] for
>> example.
>> > >>
>> > >> As Biao mentioned, observe -> reconcile may repeat and different
>> > >> states will require different checks. Once a job (in job mode) is
>> > >> running, there is no need to check the JM deployment etc. A
>> monolithic
>> > >> observer may not work so well for that. Rather, I think different
>> > >> states have different monitoring needs that inform transitions,
>> > >> including the actual state and changes made to the CR.
>> > >>
>> > >> It would also be good if the current state is directly reflected in
>> > >> the CR status so that it is easier to check where the deployment is
>> > >> at.
>> > >>
>> > >> Cheers,
>> > >> Thomas
>> > >>
>> > >>
>> > >> [1]
>> > >>
>> >
>> https://github.com/lyft/flinkk8soperator/blob/master/docs/state_machine.md
>> > >>
>> > >> On Mon, Feb 28, 2022 at 8:16 AM Gyula Fóra <gyula.f...@gmail.com>
>> > wrote:
>> > >> >
>> > >> > Hi All!
>> > >> >
>> > >> > Thank you for the feedback, I agree with what has been proposed to
>> > >> include
>> > >> > as much as possible in the actual resource status and make the
>> > >> reconciler
>> > >> > completely independent from the observer.
>> > >> >
>> > >> > @Biao Geng:
>> > >> > Validation could depend on the current status (depending on how we
>> > >> > implement the validation logic) so it might always be necessary
>> (and
>> > it
>> > >> is
>> > >> > also cheap).
>> > >> > What you are saying with multiple observe -> reconcile cycles ties
>> > back
>> > >> to
>> > >> > what Matyas said, that we should probably have an Observe loop
>> until
>> > we
>> > >> > have a stable state ready for reconciliation, then reconcile once.
>> > >> >
>> > >> > So probably validate -> observe until stable -> reconcile ->
>> observe
>> > >> until
>> > >> > stable
>> > >> >
>> > >> > Cheers,
>> > >> > Gyula
>> > >> >
>> > >> > On Mon, Feb 28, 2022 at 4:49 PM Biao Geng <biaoge...@gmail.com>
>> > wrote:
>> > >> >
>> > >> > > Hi Gyula,
>> > >> > > Thanks for the discussion. It also makes senses to me on the
>> > >> separation of
>> > >> > > 3 components and Yang's proposal.
>> > >> > > Just 1 follow-up thought after checking your PR: in the
>> reconcile()
>> > >> > > method of controller, IIUC, the real flow could be
>> > >> > > `validate->observe->reconcile->validate->observe->reconcile...".
>> The
>> > >> > > validation phase seems to be required only when the creation of
>> the
>> > >> job
>> > >> > > cluster and the upgrade of config. For phases like waiting the JM
>> > from
>> > >> > > deploying to ready, it is not mandatory and thus the flow can
>> look
>> > >> like
>> > >> > > `validate->observe->reconcile->optional validate due to current
>> > >> > > state->observe->reconcile...`
>> > >> > >
>> > >> > > Őrhidi Mátyás <matyas.orh...@gmail.com> 于2022年2月28日周一 21:26写道：
>> > >> > >
>> > >> > > > It is worth looking at the controller code in the spotify
>> operator
>> > >> too:
>> > >> > > >
>> > >> > > >
>> > >> > >
>> > >>
>> >
>> https://github.com/spotify/flink-on-k8s-operator/blob/master/controllers/flinkcluster/flinkcluster_controller.go
>> > >> > > >
>> > >> > > > It is looping in the 'observer phase' until it reaches a stable
>> > >> state,
>> > >> > > then
>> > >> > > > it performs the necessary changes.
>> > >> > > >
>> > >> > > > Based on this I also suggest keeping the logic in separate
>> > >> > > > modules(Validate->Observe->Reconcile). The three components
>> might
>> > >> not
>> > >> > > > even be enough as we add more and more complexity to the code.
>> > >> > > >
>> > >> > > > Cheers,
>> > >> > > > Matyas
>> > >> > > >
>> > >> > > >
>> > >> > > > On Mon, Feb 28, 2022 at 2:03 PM Aitozi <gjying1...@gmail.com>
>> > >> wrote:
>> > >> > > >
>> > >> > > > > Hi, Gyula
>> > >> > > > >       Thanks for driving this discussion. I second Yang
>> Wang's
>> > >> idea
>> > >> > > that
>> > >> > > > > it's better to make the `validator`, `observer` and
>> `reconciler`
>> > >> > > > > self-contained. I also prefer to define the `Observer` as an
>> > >> interface
>> > >> > > > and
>> > >> > > > > we could define the statuses that `Observer` will expose. It
>> > acts
>> > >> like
>> > >> > > > the
>> > >> > > > > observer protocol between the `Observer` and `Reconciler`.
>> > >> > > > >
>> > >> > > > > Best,
>> > >> > > > > Aitozi.
>> > >> > > > >
>> > >> > > > > Yang Wang <danrtsey...@gmail.com> 于2022年2月28日周一 16:28写道：
>> > >> > > > >
>> > >> > > > > > Thanks for posting the discussion here.
>> > >> > > > > >
>> > >> > > > > >
>> > >> > > > > > Having the components `validator` `observer` `reconciler`
>> > makes
>> > >> lots
>> > >> > > of
>> > >> > > > > > sense. And the "Validate -> Observe -> Reconcile"
>> > >> > > > > > flow seems natural to me.
>> > >> > > > > >
>> > >> > > > > > Regarding the implementation in the PR, instead of directly
>> > >> using the
>> > >> > > > > > observer in the reconciler, I lean to let the observer
>> > >> > > > > > exports the results to the status(e.g. jobmanager
>> deployment
>> > >> status,
>> > >> > > > rest
>> > >> > > > > > port readiness, flink jobs status, etc.) and
>> > >> > > > > > the reconciler reads it from the status. Then each
>> component
>> > is
>> > >> more
>> > >> > > > > > self-contained and the boundary will be clearer.
>> > >> > > > > >
>> > >> > > > > >
>> > >> > > > > > Best,
>> > >> > > > > > Yang
>> > >> > > > > >
>> > >> > > > > > Gyula Fóra <gyf...@apache.org> 于2022年2月28日周一 16:01写道：
>> > >> > > > > >
>> > >> > > > > > > Hi All!
>> > >> > > > > > >
>> > >> > > > > > > I would like to start a discussion thread regarding the
>> > >> structure
>> > >> > > of
>> > >> > > > > > > the Kubernetes
>> > >> > > > > > > Operator
>> > >> > > > > > > <
>> > >> > > > > > >
>> > >> > > > > >
>> > >> > > > >
>> > >> > > >
>> > >> > >
>> > >>
>> >
>> https://github.com/apache/flink-kubernetes-operator/blob/main/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/controller/FlinkDeploymentController.java
>> > >> > > > > > > >
>> > >> > > > > > > controller
>> > >> > > > > > > flow. Based on some recent PR discussions we have no
>> clear
>> > >> > > consensus
>> > >> > > > on
>> > >> > > > > > the
>> > >> > > > > > > structure and the expectations which can potentially
>> lead to
>> > >> back
>> > >> > > and
>> > >> > > > > > forth
>> > >> > > > > > > changes and unnecessary complexity.
>> > >> > > > > > >
>> > >> > > > > > > *Background*
>> > >> > > > > > > In the initial prototype we had a very basic flow:
>> > >> > > > > > >  1. Observe flink job status
>> > >> > > > > > >  2. (if observation successful) reconcile changes
>> > >> > > > > > >  3. Reschedule reconcile with success/error
>> > >> > > > > > >
>> > >> > > > > > > This basic prototype flow could not cover all
>> requirements
>> > >> and did
>> > >> > > > not
>> > >> > > > > > > allow for things like waiting until Jobmanager
>> deployment is
>> > >> ready
>> > >> > > > etc.
>> > >> > > > > > >
>> > >> > > > > > > To solve these shortcomings, some changes were introduced
>> > >> recently
>> > >> > > > here
>> > >> > > > > > > <
>> > https://github.com/apache/flink-kubernetes-operator/pull/21
>> > >> >.
>> > >> > > While
>> > >> > > > > > this
>> > >> > > > > > > change introduced many improvements and safeguards it
>> also
>> > >> > > completely
>> > >> > > > > > > changed the original controller flow. Now the reconciler
>> is
>> > >> > > > responsible
>> > >> > > > > > for
>> > >> > > > > > > ensuring that it can actually reconcile by checking the
>> > >> deployment
>> > >> > > > and
>> > >> > > > > > > ports. The job status observation logic has also been
>> moved
>> > >> into
>> > >> > > the
>> > >> > > > > > actual
>> > >> > > > > > > reconcile logic.
>> > >> > > > > > >
>> > >> > > > > > >
>> > >> > > > > > > *Discussion Question*What controller flow would we like
>> to
>> > >> have? Do
>> > >> > > > we
>> > >> > > > > > want
>> > >> > > > > > > to separate the observer from the reconciler or keep them
>> > >> together?
>> > >> > > > > > >
>> > >> > > > > > > In my personal view, we should try to adopt a very simple
>> > >> flow to
>> > >> > > > make
>> > >> > > > > > the
>> > >> > > > > > > operator clean and modular. If possible I would like to
>> > >> restore the
>> > >> > > > > > > original flow with some modifications:
>> > >> > > > > > >
>> > >> > > > > > >  1. Validate deployment object
>> > >> > > > > > >  2. Observe deployment and flink job status -> Return
>> > >> comprehensive
>> > >> > > > > > status
>> > >> > > > > > > info
>> > >> > > > > > >  3. Reconcile deployment based on observed status and
>> > resource
>> > >> > > > changes
>> > >> > > > > > >  (Both 2/3 should be able to reschedule immediately if
>> > >> necessary)
>> > >> > > > > > >
>> > >> > > > > > > I think the Observer component should be able to describe
>> > the
>> > >> > > current
>> > >> > > > > > > status of the deployment objects and the flink job to the
>> > >> extent
>> > >> > > that
>> > >> > > > > the
>> > >> > > > > > > reconciler can work with that information alone. If we
>> do it
>> > >> this
>> > >> > > > way,
>> > >> > > > > we
>> > >> > > > > > > can also use the status information that the observer
>> > >> provides to
>> > >> > > > > produce
>> > >> > > > > > > other events and aid operations like shutdown which
>> depend
>> > on
>> > >> the
>> > >> > > > > current
>> > >> > > > > > > deployment status.
>> > >> > > > > > >
>> > >> > > > > > > I think this would satisfy our needs, but I might be
>> missing
>> > >> > > > something
>> > >> > > > > > that
>> > >> > > > > > > cannot be done if we structure the code this way.
>> > >> > > > > > >
>> > >> > > > > > > I have a PR open
>> > >> > > > > > > <
>> > >> > >
>> https://github.com/apache/flink-kubernetes-operator/pull/26/commits
>> > >> > > > >
>> > >> > > > > > > which
>> > >> > > > > > > includes some of these proposed changes (as the optional
>> > >> second
>> > >> > > > commit)
>> > >> > > > > > so
>> > >> > > > > > > that you can easily compare with the current state of the
>> > >> operator.
>> > >> > > > > > >
>> > >> > > > > > > Please let us know what we think!
>> > >> > > > > > >
>> > >> > > > > > > Cheers,
>> > >> > > > > > > Gyula
>> > >> > > > > > >
>> > >> > > > > >
>> > >> > > > >
>> > >> > > >
>> > >> > >
>> > >>
>> > >
>> >
>>
>

Re: [DISCUSS] Flink Kubernetes Operator controller flow

Reply via email to