Hi David!

This change definitely warrants a FLIP even if the code change is not huge,
there are quite some implications going forward.

Looping in @morh...@apache.org <morh...@apache.org> for this discussion.

I have some questions / suggestions regarding the condition's meaning and
naming.

In your proposal you have:
 - Ready (True/False) -> This condition is intended for resources which are
fully ready and operational
 - Error (True) -> This condition can be used in scenarios where any
exception/error during resource reconcile process

The problem with the above is that the implementation does not well reflect
this. ResourceLifecycleState STABLE/ROLLED_BACK does not actually mean the
job is running, it just means that the resource is fully reconciled and it
will not be rolled back (so the current pending upgrade is completed). This
is mainly a fault of the ResourceLifecycleState as it doesn't capture the
job status but one could argue that it was "designed" this way.

I think we should probably have more condition types to capture the
difference:
 - JobReady (True/False) -> Flink job is running (Basically job status but
with transition time)
 - ClusterReady (True/False) -> Session / Application cluster is deployed
(Basically JM deployment status but with transition time)
-  UpgradeCompleted (True/False) -> Similar to what you call Ready now
which should correspond to the STABLE/ROLLED_BACK states and mostly tracks
in-progress CR updates

This is my best idea at the moment, not great as it feels a little
redundant with the current status fields. But maybe thats not a problem or
a way to eliminate the old fields later?

I am not so sure of the Error status and what this means in practice. Why
do we want to track the last error in 2 places? It's already in the status.

What do you think?
Gyula

On Wed, May 29, 2024 at 3:55 PM David Radley <david_rad...@uk.ibm.com>
wrote:

> Hi,
> Thanks Lajith for raising this discussion thread under the Flip title.
>
> To summarise the concerns from the other discussion thread.
>
> “
> - I echo Gyula that including some examples and further explanations might
> ease reader's work. With the current version, the FLIP is a bit hard to
> follow. - Will the usage of Conditions be enabled by default? Or will there
> be any disadvantages for Flink users? If Conditions with the same type
> already exist in the Status Conditions
>
> - Do you think we should have clear rules about handling rules for how
> these Conditions should be managed, especially when multiple Conditions of
> the same type are present? For example, resource has multiple causes for
> the same condition (e.g., Error due to network and Error due to I/O). Then,
> overriding the old condition with the new one is not the best approach no?
> Please correct me if I misunderstood.
> “
>
> I see the Google doc link has been reformatted to match the Flip template.
>
> To explicitly answer the questions from Jeyhun and Gyula:
> - “Will the usage of Conditions be enabled by default?” Yes, but this is
> just making the status content useful, whereas before it was not useful.
> - in terms of examples, I am not sure what you would like to see, the
> table Lajith provided shows the status for various ResourceLifecycleStates.
> How the operator gets into these states is the current behaviour. The
> change just shows the appropriate corresponding high level status – that
> could be shown on the User Interfaces.
> - “will there be any disadvantages for Flink users?” None , there is just
> more information in the status, without this it is more difficult to work
> out the status of the job.
> - Multiple conditions question. The status is showing whether the job is
> ready or not, so as long as the last condition is the one that is shown -
> all is as expected. I don’t think this needs rules for precedence and the
> like.
> - The condition’s Reason is going to be more specific.
>
> Gyula and Jeyhun, is the google doc clear enough for you now? Do you feel
> you feedback has been addressed? Lajith and I are happy to provide more
> details.
>
> I wonder whether this change is big enough to warrant a Flip, as it is so
> small. We could do this in an issue. WDYT?
>
> Kind regards, David.
>
>
> From: Lajith Koova <lajith...@gmail.com>
> Date: Wednesday, 29 May 2024 at 13:41
> To: dev@flink.apache.org <dev@flink.apache.org>
> Subject: [EXTERNAL] [DISCUSS] FLIP-XXX Add K8S conditions to Flink CRD
> Hello ,
>
>
> Discussion thread here:
> https://lists.apache.org/thread/dvy8w17pyjv68c3t962w49frl9odoz4z  to
> discuss a proposal to add Conditions field in the CR status of Flink
> Deployment and FlinkSessionJob.
>
>
> Note : Starting this new thread as discussion thread title has been
> modified to follow the FLIP process.
>
>
> Thank you.
>
> Unless otherwise stated above:
>
> IBM United Kingdom Limited
> Registered in England and Wales with number 741598
> Registered office: PO Box 41, North Harbour, Portsmouth, Hants. PO6 3AU
>

Reply via email to