Thank you all for the great discussion. > 2. I agree with you that "completed" is not very clear, but I would suggest the name "alreadyExists". WDYT?
I'm fine with "alreadyExists" and for 3. also with "backoffLimit". > really like the idea of having something like Pod Conditions, but I think it wouldn't add too much value here Okay, let's leave it out. I'm happy to start a VOTE and +1 the FLIP as is. Looking forward to having this feature in the operator! On Sat, Apr 20, 2024 at 4:31 PM Mate Czagany <czmat...@gmail.com> wrote: > Hi, > > Thanks for your comments, Gyula, I really appreciate it! > > I have updated the following things in the FLIP, please comment on these > changes if you have any suggestions or concerns: > - Added path field to FlinkStateSnapshotReference > - Added two examples at the bottom. > - Added error handling section and the new fields associated > ("backoffLimit" and "failures") to the interfaces. > - Renamed field "completed" to "alreadyExists". > > Regarding the separate resources, I don't think that any of the two > solutions would bring too much (dis)advantage to the table, so I am still > neutral, and waiting for others to chime in as well with their thoughts and > feedback! > > Regards, > Mate > > > Gyula Fóra <gyula.f...@gmail.com> ezt írta (időpont: 2024. ápr. 19., P, > 21:43): > > > Hey! > > > > Regarding the question of initialSavepointPath and > > flinkStateSnapshotReference new object, I think we could simply keep an > > extra field as part of the flinkStateSnapshotReference object called > path. > > > > Then the fields could be: > > namespace, name, path > > > > If path is defined we would use that (to support the simple way also) > > otherwise use the resource. I would still deprecate the > > initialSavepointPath field in the jobSpec. > > > > Regarding the Savepoint/Checkpoint vs FlinkStateSnapshot. > > What we need: > > 1. Easy way to list all state snapshots (to select latest) > > 2. Easy way to reference a savepoint/checkpoint from a jobspec > > 3. Differentiate state snapshot types (in some cases users may prefer to > > use checkpoint/savepoint for certain upgrades) -> we should add a label > or > > something for easy selection > > 4. Be able to delete savepoints (and checkpoints maybe) > > > > I am personally slightly more in favor of having a single resource as > that > > ticks all the boxes, while having 2 separate resources will make both > > listing and referencing harder. We would have to introduce state type in > > the reference (name + namespace would not be enough to uniquely identify > a > > state snapshot) > > > > I wonder if I am missing any good argument against the single > > FlinkStateSnapshot here. > > > > Cheers, > > Gyula > > > > > > On Fri, Apr 19, 2024 at 9:09 PM Mate Czagany <czmat...@gmail.com> wrote: > > > >> Hi Robert and Thomas, > >> > >> Thank you for sharing your thoughts, I will try to address your > questions > >> and suggestions: > >> > >> 1. I would really love to hear others' inputs as well about separating > >> the snapshot CRD into two different CRDs instead for savepoints and > >> checkpoints. I think the main upside is that we would not need the > >> mandatory savepoint or checkpoint field inside the spec. The two CRs > could > >> share the same status fields, and their specs would be different. > >> I personally like both solutions, and would love to hear others' > thoughts > >> as well. > >> > >> 2. I agree with you that "completed" is not very clear, but I would > >> suggest the name "alreadyExists". WDYT? > >> > >> 3. I think having a retry loop inside the operator does not add too much > >> complexity to the FLIP. On failure, we check if we have reached the max > >> retries. If we did, the state will be set to "FAILED", else it will be > set > >> to "TRIGGER_PENDING", causing the operator to retry the task. The > "error" > >> field will always be populated with the latest error. Kubernetes Jobs > >> already has a similar field called "backoffLimit", maybe we could use > the > >> same name, with the same logic applied, WDYT? > >> About error events, I think we should keep the "error" field, and upon > >> successful snapshot, we clear it. I will add to the FLIP that there > will be > >> an event generated for each unsuccessful snapshots. > >> > >> 4. I really like the idea of having something like Pod Conditions, but I > >> think it wouldn't add too much value here, because the only 2 stages > >> important to the user are "Triggered" and "Completed", and those > timestamps > >> will already be included in the status field. I think it would make more > >> sense to implement this if there were more lifecycle stages. > >> > >> 5. There will be a new field in JobSpec called > >> "flinkStateSnapshotReference" to reference a FlinkStateSnapshot to > restore > >> from. > >> > >> > How do you see potential effects on API server performance wrt. number > >> of > >> objects vs mutations? Is the proposal more or less neutral in that > regard? > >> > >> While I am not an expert in Kubernetes internals, my understanding is > >> that for the api-server, editing an existing resource or creating a new > one > >> is not different performance-wise, because the whole resource will > always > >> be written to etcd anyways. > >> Retrieving the savepoints from etcd will be different though for some > >> use-cases, e.g. retrieving all snapshots for a specific FlinkDeployment > >> would require the api-server to retrieve every snapshots first in a > >> namespace from etcd, then filter them for that specific > FlinkDeployment. I > >> think this is a worst-case scenario, and it will be up to the user to > >> optimize their queries via e.g. watch queries [1] or resourceVersions > [2]. > >> > >> > Does that mean one would have to create a FlinkStateSnapshot CR when > >> starting a new deployment from savepoint? If so, that's rather > >> complicated. > >> I would prefer something more simple/concise and would rather > >> keep initialSavepointPath > >> > >> Starting a job from a savepoint path will indeed be deprecated with this > >> FLIP. I agree that it will be more complicated to restore from a > savepoint > >> in those cases, but if the user decides to move away from the deprecated > >> savepoint mechanisms, every savepoint will result in a new > >> FlinkStateSnapshot CR. So the only situation I expect this to be an > >> inconvenience is when the user onboards a new Flink job to the operator. > >> But I may not be thinking this through, so please let me know if you > >> disagree. > >> > >> Thank you very much for your questions and suggestions! > >> > >> [1] > >> > https://kubernetes.io/docs/reference/using-api/api-concepts/#efficient-detection-of-changes > >> [2] > >> > https://kubernetes.io/docs/reference/using-api/api-concepts/#resource-versions > >> > >> Regards, > >> Mate > >> > >> Thomas Weise <t...@apache.org> ezt írta (időpont: 2024. ápr. 19., P, > >> 11:31): > >> > >>> Thanks for the proposal. > >>> > >>> How do you see potential effects on API server performance wrt. number > of > >>> objects vs mutations? Is the proposal more or less neutral in that > >>> regard? > >>> > >>> Thanks for the thorough feedback Robert. > >>> > >>> Couple more questions below. > >>> > >>> --> > >>> > >>> On Fri, Apr 19, 2024 at 5:07 AM Robert Metzger <rmetz...@apache.org> > >>> wrote: > >>> > >>> > Hi Mate, > >>> > thanks for proposing this, I'm really excited about your FLIP. I hope > >>> my > >>> > questions make sense to you: > >>> > > >>> > 1. I would like to discuss the "FlinkStateSnapshot" name and the fact > >>> that > >>> > users have to use either the savepoint or checkpoint spec inside the > >>> > FlinkStateSnapshot. > >>> > Wouldn't it be more intuitive to introduce two CRs: > >>> > FlinkSavepoint and FlinkCheckpoint > >>> > Ideally they can internally share a lot of code paths, but from a > users > >>> > perspective, the abstraction is much clearer. > >>> > > >>> > >>> There are probably pros and cons either way. For example it is > desirable > >>> to > >>> have a single list of state snapshots when looking for the initial > >>> savepoint for a new deployment etc. > >>> > >>> > >>> > > >>> > 2. I also would like to discuss SavepointSpec.completed, as this name > >>> is > >>> > not intuitive to me. How about "ignoreExisting"? > >>> > > >>> > 3. The FLIP proposal seems to leave error handling to the user, e.g. > >>> when > >>> > you create a FlinkStateSnapshot, it will just move to status FAILED. > >>> > Typically in K8s with the control loop stuff, resources are tried to > >>> get > >>> > created until success. I think it would be really nice if the > >>> > FlinkStateSnapshot or FlinkSavepoint resource would retry based on a > >>> > property in the resource. A "FlinkStateSnapshot.retries" number would > >>> > indicate how often the user wants the operator to retry creating a > >>> > savepoint, "retries = -1" means retry forever. In addition, we could > >>> > consider a timeout as well, however, I haven't seen such a concept in > >>> K8s > >>> > CRs yet. > >>> > The benefit of this is that other tools relying on the K8s operator > >>> > wouldn't have to implement this retry loop (which is quite natural > for > >>> > K8s), they would just have to wait for the CR they've created to > >>> transition > >>> > into COMPLETED: > >>> > > >>> > 3. FlinkStateSnapshotStatus.error will only show the last error. What > >>> > about using Events, so that we can show multiple errors and use the > >>> > FlinkStateSnapshotState to report errors? > >>> > > >>> > 4. I wonder if it makes sense to use something like Pod Conditions > >>> (e.g. > >>> > Savepoint Conditions): > >>> > > >>> > https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-conditions > >>> > to track the completion status. We could have the following > conditions: > >>> > - Triggered > >>> > - Completed > >>> > - Failed > >>> > The only benefit of this proposal that I see is that it would tell > the > >>> > user how long it took to create the savepoint. > >>> > > >>> > 5. You mention that "JobSpec.initialSavepointPath" will be > deprecated. > >>> I > >>> > assume we will introduce a new field for referencing a > >>> FlinkStateSnapshot > >>> > CR? I think it would be good to cover this in the FLIP. > >>> > > >>> > Does that mean one would have to create a FlinkStateSnapshot CR when > >>> starting a new deployment from savepoint? If so, that's rather > >>> complicated. > >>> I would prefer something more simple/concise and would rather > >>> keep initialSavepointPath > >>> > >>> > >>> > > >>> > One minor comment: > >>> > > >>> > "/** Dispose the savepoints upon CRD deletion. */" > >>> > > >>> > I think this should be "upon CR deletion", not "CRD deletion". > >>> > > >>> > Thanks again for this great FLIP! > >>> > > >>> > Best, > >>> > Robert > >>> > > >>> > > >>> > On Fri, Apr 19, 2024 at 9:01 AM Gyula Fóra <gyula.f...@gmail.com> > >>> wrote: > >>> > > >>> >> Cc'ing some folks who gave positive feedback on this idea in the > past. > >>> >> > >>> >> I would love to hear your thoughts on the proposed design > >>> >> > >>> >> Gyula > >>> >> > >>> >> On Tue, Apr 16, 2024 at 6:31 PM Őrhidi Mátyás < > >>> matyas.orh...@gmail.com> > >>> >> wrote: > >>> >> > >>> >>> +1 Looking forward to it > >>> >>> > >>> >>> On Tue, Apr 16, 2024 at 8:56 AM Mate Czagany <czmat...@gmail.com> > >>> wrote: > >>> >>> > >>> >>> > Thank you Gyula! > >>> >>> > > >>> >>> > I think that is a great idea. I have updated the Google doc to > only > >>> >>> have 1 > >>> >>> > new configuration option of boolean type, which can be used to > >>> signal > >>> >>> the > >>> >>> > Operator to use the old mode. > >>> >>> > > >>> >>> > Also described in the configuration description, the Operator > will > >>> >>> fallback > >>> >>> > to the old mode if the FlinkStateSnapshot CRD cannot be found on > >>> the > >>> >>> > Kubernetes cluster. > >>> >>> > > >>> >>> > Regards, > >>> >>> > Mate > >>> >>> > > >>> >>> > Gyula Fóra <gyula.f...@gmail.com> ezt írta (időpont: 2024. ápr. > >>> 16., > >>> >>> K, > >>> >>> > 17:01): > >>> >>> > > >>> >>> > > Thanks Mate, this is great stuff. > >>> >>> > > > >>> >>> > > Mate, I think the new configs should probably default to the > new > >>> >>> mode and > >>> >>> > > they should only be useful for users to fall back to the old > >>> >>> behaviour. > >>> >>> > > We could by default use the new Snapshot CRD if the CRD is > >>> installed, > >>> >>> > > otherwise use the old mode by default and log a warning on > >>> startup. > >>> >>> > > > >>> >>> > > So I am suggesting a "dynamic" default behaviour based on > whether > >>> >>> the new > >>> >>> > > CRD was installed or not because we don't want to break > operator > >>> >>> startup. > >>> >>> > > > >>> >>> > > Gyula > >>> >>> > > > >>> >>> > > On Tue, Apr 16, 2024 at 4:48 PM Mate Czagany < > czmat...@gmail.com > >>> > > >>> >>> wrote: > >>> >>> > > > >>> >>> > > > Hi Ferenc, > >>> >>> > > > > >>> >>> > > > Thank you for your comments, I have updated the Google docs > >>> with a > >>> >>> new > >>> >>> > > > section for the new configs. > >>> >>> > > > All of the newly added config keys will have defaults set, > and > >>> by > >>> >>> > default > >>> >>> > > > all the savepoint/checkpoint operations will use the old > >>> system: > >>> >>> write > >>> >>> > > > their results to the FlinkDeployment/FlinkSessionJob status > >>> field. > >>> >>> > > > > >>> >>> > > > I have also added a default for the checkpoint type to be > FULL > >>> >>> (which > >>> >>> > is > >>> >>> > > > also the default currently). That was an oversight on my part > >>> to > >>> >>> miss > >>> >>> > > that. > >>> >>> > > > > >>> >>> > > > Regards, > >>> >>> > > > Mate > >>> >>> > > > > >>> >>> > > > Ferenc Csaky <ferenc.cs...@pm.me.invalid> ezt írta (időpont: > >>> 2024. > >>> >>> > ápr. > >>> >>> > > > 16., K, 16:10): > >>> >>> > > > > >>> >>> > > > > Thank you Mate for initiating this discussion. +1 for this > >>> idea. > >>> >>> > > > > Some Qs: > >>> >>> > > > > > >>> >>> > > > > Can you specify the newly introduced configurations in more > >>> >>> > > > > details? Currently, it is not fully clear to me what are > the > >>> >>> > > > > possible values of > >>> `kubernetes.operator.periodic.savepoint.mode`, > >>> >>> > > > > is it optional, has a default value? > >>> >>> > > > > > >>> >>> > > > > I see that in `SavepointSpec.formatType` has a default, > >>> although > >>> >>> > > > > `CheckppointSpec.checkpointType` not. Are we inferring that > >>> from > >>> >>> > > > > the config? My point is, in general I think it would be > good > >>> to > >>> >>> > > > > handle the two snapshot types in a similar way when it > makes > >>> >>> sense > >>> >>> > > > > to minimize any kind of confusion. > >>> >>> > > > > > >>> >>> > > > > Best, > >>> >>> > > > > Ferenc > >>> >>> > > > > > >>> >>> > > > > > >>> >>> > > > > > >>> >>> > > > > On Tuesday, April 16th, 2024 at 11:34, Mate Czagany < > >>> >>> > > czmat...@gmail.com> > >>> >>> > > > > wrote: > >>> >>> > > > > > >>> >>> > > > > > > >>> >>> > > > > > > >>> >>> > > > > > Hi Everyone, > >>> >>> > > > > > > >>> >>> > > > > > I would like to start a discussion on FLIP-446: > Kubernetes > >>> >>> Operator > >>> >>> > > > State > >>> >>> > > > > > Snapshot CRD. > >>> >>> > > > > > > >>> >>> > > > > > This FLIP adds a new custom resource for Operator users > to > >>> >>> create > >>> >>> > and > >>> >>> > > > > > manage their savepoints and checkpoints. I have also > >>> developed > >>> >>> an > >>> >>> > > > initial > >>> >>> > > > > > POC to prove that this approach is feasible, you can find > >>> the > >>> >>> link > >>> >>> > > for > >>> >>> > > > > that > >>> >>> > > > > > in the FLIP. > >>> >>> > > > > > > >>> >>> > > > > > There is a Confluence page [1] and a Google Docs page [2] > >>> as I > >>> >>> do > >>> >>> > not > >>> >>> > > > > have > >>> >>> > > > > > a Confluence account yet. > >>> >>> > > > > > > >>> >>> > > > > > [1] > >>> >>> > > > > > > >>> >>> > > > > > >>> >>> > > > > >>> >>> > > > >>> >>> > > >>> >>> > >>> > https://cwiki.apache.org/confluence/display/FLINK/FLIP-446%3A+Kubernetes+Operator+State+Snapshot+CRD > >>> >>> > > > > > [2] > >>> >>> > > > > > > >>> >>> > > > > > >>> >>> > > > > >>> >>> > > > >>> >>> > > >>> >>> > >>> > https://docs.google.com/document/d/1VdfLFaE4i6ESbCQ38CH7TKOiPQVvXeOxNV2FeSMnOTg > >>> >>> > > > > > > >>> >>> > > > > > > >>> >>> > > > > > Regards, > >>> >>> > > > > > Mate > >>> >>> > > > > > >>> >>> > > > > >>> >>> > > > >>> >>> > > >>> >>> > >>> >> > >>> > >> >