Re: [DISCUSS] Flink Kubernetes Operator controller flow

Gyula Fóra Mon, 28 Feb 2022 21:53:41 -0800

And just one more thing I forgot to add:

The observer does not have to be a dummy monolithic logic. Different job
types will have different state machines with different observe
requirements and different observer implementations. I think this is
complexly normal. The observer can be aware of the expected state and what
it should observe on the happy path.


Gyula

On Tue, 1 Mar 2022 at 06:42, Gyula Fóra <gyula.f...@gmail.com> wrote:

> @Thomas:
>
> Thanks for the input! I completely agree with a well principled state
> machine based approach in the operator would be the best.
>
> You are right that the code contains mostly if then else based logic at
> the moment as it evolved after initial prototype to cover more scenarios.
>
> However I think the self contained observer - reconciler approach is very
> capable of implementing the suggested state machine based view in a very
> elegant way.
>
> Simply put, the observer is responsible for determining and recording the
> current state and any extra information that comes with that state.
>
> The reconciler then look at the current vs desired state and execute a
> state transition.
>
> Gyula
>
> On Tue, 1 Mar 2022 at 02:16, Thomas Weise <t...@apache.org> wrote:
>
>> Thanks for bringing the discussion. It's a good time to revisit this
>> area as the operator implementation becomes more complex.
>>
>> I think the most important part of the design are well defined states
>> and transitions. Unless that can be reasoned about, there will be
>> confusion. For example:
>>
>> 1) For job upgrade, it wasn't clear in which state reconciliation
>> should happen and as a result the implementation ended up incomplete.
>> 2) "JobStatusObserver" would attempt to list jobs before the REST API
>> is ready. And it would be used in session mode, even though the
>> session mode needs a different state machine.
>>
>> The implementation currently contains a lot of if-then-else logic,
>> which is hard to follow and difficult to maintain. It will make it
>> harder to introduce new states that would be necessary to implement
>> more advanced upgrade strategies, amongst others.
>>
>> Did you consider a state machine centric approach? See [1] for example.
>>
>> As Biao mentioned, observe -> reconcile may repeat and different
>> states will require different checks. Once a job (in job mode) is
>> running, there is no need to check the JM deployment etc. A monolithic
>> observer may not work so well for that. Rather, I think different
>> states have different monitoring needs that inform transitions,
>> including the actual state and changes made to the CR.
>>
>> It would also be good if the current state is directly reflected in
>> the CR status so that it is easier to check where the deployment is
>> at.
>>
>> Cheers,
>> Thomas
>>
>>
>> [1]
>> https://github.com/lyft/flinkk8soperator/blob/master/docs/state_machine.md
>>
>> On Mon, Feb 28, 2022 at 8:16 AM Gyula Fóra <gyula.f...@gmail.com> wrote:
>> >
>> > Hi All!
>> >
>> > Thank you for the feedback, I agree with what has been proposed to
>> include
>> > as much as possible in the actual resource status and make the
>> reconciler
>> > completely independent from the observer.
>> >
>> > @Biao Geng:
>> > Validation could depend on the current status (depending on how we
>> > implement the validation logic) so it might always be necessary (and it
>> is
>> > also cheap).
>> > What you are saying with multiple observe -> reconcile cycles ties back
>> to
>> > what Matyas said, that we should probably have an Observe loop until we
>> > have a stable state ready for reconciliation, then reconcile once.
>> >
>> > So probably validate -> observe until stable -> reconcile -> observe
>> until
>> > stable
>> >
>> > Cheers,
>> > Gyula
>> >
>> > On Mon, Feb 28, 2022 at 4:49 PM Biao Geng <biaoge...@gmail.com> wrote:
>> >
>> > > Hi Gyula,
>> > > Thanks for the discussion. It also makes senses to me on the
>> separation of
>> > > 3 components and Yang's proposal.
>> > > Just 1 follow-up thought after checking your PR: in the reconcile()
>> > > method of controller, IIUC, the real flow could be
>> > > `validate->observe->reconcile->validate->observe->reconcile...". The
>> > > validation phase seems to be required only when the creation of the
>> job
>> > > cluster and the upgrade of config. For phases like waiting the JM from
>> > > deploying to ready, it is not mandatory and thus the flow can look
>> like
>> > > `validate->observe->reconcile->optional validate due to current
>> > > state->observe->reconcile...`
>> > >
>> > > Őrhidi Mátyás <matyas.orh...@gmail.com> 于2022年2月28日周一 21:26写道：
>> > >
>> > > > It is worth looking at the controller code in the spotify operator
>> too:
>> > > >
>> > > >
>> > >
>> https://github.com/spotify/flink-on-k8s-operator/blob/master/controllers/flinkcluster/flinkcluster_controller.go
>> > > >
>> > > > It is looping in the 'observer phase' until it reaches a stable
>> state,
>> > > then
>> > > > it performs the necessary changes.
>> > > >
>> > > > Based on this I also suggest keeping the logic in separate
>> > > > modules(Validate->Observe->Reconcile). The three components might
>> not
>> > > > even be enough as we add more and more complexity to the code.
>> > > >
>> > > > Cheers,
>> > > > Matyas
>> > > >
>> > > >
>> > > > On Mon, Feb 28, 2022 at 2:03 PM Aitozi <gjying1...@gmail.com>
>> wrote:
>> > > >
>> > > > > Hi, Gyula
>> > > > >       Thanks for driving this discussion. I second Yang Wang's
>> idea
>> > > that
>> > > > > it's better to make the `validator`, `observer` and `reconciler`
>> > > > > self-contained. I also prefer to define the `Observer` as an
>> interface
>> > > > and
>> > > > > we could define the statuses that `Observer` will expose. It acts
>> like
>> > > > the
>> > > > > observer protocol between the `Observer` and `Reconciler`.
>> > > > >
>> > > > > Best,
>> > > > > Aitozi.
>> > > > >
>> > > > > Yang Wang <danrtsey...@gmail.com> 于2022年2月28日周一 16:28写道：
>> > > > >
>> > > > > > Thanks for posting the discussion here.
>> > > > > >
>> > > > > >
>> > > > > > Having the components `validator` `observer` `reconciler` makes
>> lots
>> > > of
>> > > > > > sense. And the "Validate -> Observe -> Reconcile"
>> > > > > > flow seems natural to me.
>> > > > > >
>> > > > > > Regarding the implementation in the PR, instead of directly
>> using the
>> > > > > > observer in the reconciler, I lean to let the observer
>> > > > > > exports the results to the status(e.g. jobmanager deployment
>> status,
>> > > > rest
>> > > > > > port readiness, flink jobs status, etc.) and
>> > > > > > the reconciler reads it from the status. Then each component is
>> more
>> > > > > > self-contained and the boundary will be clearer.
>> > > > > >
>> > > > > >
>> > > > > > Best,
>> > > > > > Yang
>> > > > > >
>> > > > > > Gyula Fóra <gyf...@apache.org> 于2022年2月28日周一 16:01写道：
>> > > > > >
>> > > > > > > Hi All!
>> > > > > > >
>> > > > > > > I would like to start a discussion thread regarding the
>> structure
>> > > of
>> > > > > > > the Kubernetes
>> > > > > > > Operator
>> > > > > > > <
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>> https://github.com/apache/flink-kubernetes-operator/blob/main/flink-kubernetes-operator/src/main/java/org/apache/flink/kubernetes/operator/controller/FlinkDeploymentController.java
>> > > > > > > >
>> > > > > > > controller
>> > > > > > > flow. Based on some recent PR discussions we have no clear
>> > > consensus
>> > > > on
>> > > > > > the
>> > > > > > > structure and the expectations which can potentially lead to
>> back
>> > > and
>> > > > > > forth
>> > > > > > > changes and unnecessary complexity.
>> > > > > > >
>> > > > > > > *Background*
>> > > > > > > In the initial prototype we had a very basic flow:
>> > > > > > >  1. Observe flink job status
>> > > > > > >  2. (if observation successful) reconcile changes
>> > > > > > >  3. Reschedule reconcile with success/error
>> > > > > > >
>> > > > > > > This basic prototype flow could not cover all requirements
>> and did
>> > > > not
>> > > > > > > allow for things like waiting until Jobmanager deployment is
>> ready
>> > > > etc.
>> > > > > > >
>> > > > > > > To solve these shortcomings, some changes were introduced
>> recently
>> > > > here
>> > > > > > > <https://github.com/apache/flink-kubernetes-operator/pull/21
>> >.
>> > > While
>> > > > > > this
>> > > > > > > change introduced many improvements and safeguards it also
>> > > completely
>> > > > > > > changed the original controller flow. Now the reconciler is
>> > > > responsible
>> > > > > > for
>> > > > > > > ensuring that it can actually reconcile by checking the
>> deployment
>> > > > and
>> > > > > > > ports. The job status observation logic has also been moved
>> into
>> > > the
>> > > > > > actual
>> > > > > > > reconcile logic.
>> > > > > > >
>> > > > > > >
>> > > > > > > *Discussion Question*What controller flow would we like to
>> have? Do
>> > > > we
>> > > > > > want
>> > > > > > > to separate the observer from the reconciler or keep them
>> together?
>> > > > > > >
>> > > > > > > In my personal view, we should try to adopt a very simple
>> flow to
>> > > > make
>> > > > > > the
>> > > > > > > operator clean and modular. If possible I would like to
>> restore the
>> > > > > > > original flow with some modifications:
>> > > > > > >
>> > > > > > >  1. Validate deployment object
>> > > > > > >  2. Observe deployment and flink job status -> Return
>> comprehensive
>> > > > > > status
>> > > > > > > info
>> > > > > > >  3. Reconcile deployment based on observed status and resource
>> > > > changes
>> > > > > > >  (Both 2/3 should be able to reschedule immediately if
>> necessary)
>> > > > > > >
>> > > > > > > I think the Observer component should be able to describe the
>> > > current
>> > > > > > > status of the deployment objects and the flink job to the
>> extent
>> > > that
>> > > > > the
>> > > > > > > reconciler can work with that information alone. If we do it
>> this
>> > > > way,
>> > > > > we
>> > > > > > > can also use the status information that the observer
>> provides to
>> > > > > produce
>> > > > > > > other events and aid operations like shutdown which depend on
>> the
>> > > > > current
>> > > > > > > deployment status.
>> > > > > > >
>> > > > > > > I think this would satisfy our needs, but I might be missing
>> > > > something
>> > > > > > that
>> > > > > > > cannot be done if we structure the code this way.
>> > > > > > >
>> > > > > > > I have a PR open
>> > > > > > > <
>> > > https://github.com/apache/flink-kubernetes-operator/pull/26/commits
>> > > > >
>> > > > > > > which
>> > > > > > > includes some of these proposed changes (as the optional
>> second
>> > > > commit)
>> > > > > > so
>> > > > > > > that you can easily compare with the current state of the
>> operator.
>> > > > > > >
>> > > > > > > Please let us know what we think!
>> > > > > > >
>> > > > > > > Cheers,
>> > > > > > > Gyula
>> > > > > > >
>> > > > > >
>> > > > >
>> > > >
>> > >
>>
>

Re: [DISCUSS] Flink Kubernetes Operator controller flow

Reply via email to