Thanks, everyone, for the feedback. I will dig into Pluggable Enrichers to see how we can use them to incorporate the termination-log.
Regards,
Swathi C

On Thu, Apr 25, 2024 at 12:33 PM Martijn Visser <martijnvis...@apache.org> wrote:
> Hi Swathi C,
>
> Also including the Dev mailing list.
>
> If you have a good reason for not being able to use the pluggable enricher
> FLIP, you'll have to include that rationale in your own FLIP and explain
> it. You might get challenged for it in the Dev mailing list thread
> discussion, but that's the point.
>
> Regards,
>
> Martijn
>
> On Thu, Apr 25, 2024 at 8:51 AM Swathi C <swathi.c.apa...@gmail.com> wrote:
>
>> Hi Martijn and Ahmed,
>>
>> This proposed FLIP was mainly focusing for the CRUD failures use case (
>> and not job failures ) and might not be able to use pluggable enricher FLIP
>> ( as that mainly focuses on job failures ). Hence, for going forward as a
>> new FLIP, we might not be able to leverage pluggable enricher FLIP for this
>> use case. So, we might not be able to reformulate it for CRUD failures.
>>
>> So, is it ok with this new proposal or let us know if I'm missing
>> anything and if it is related to pluggable enricher FLIP or anyway we can
>> use pluggable enricher FLIP here for CRUD failures.
>>
>> Regards,
>> Swathi C
>>
>> ---------- Forwarded message ---------
>> From: Martijn Visser <martijnvis...@apache.org>
>> Date: Thu, Apr 25, 2024 at 2:46 AM
>> Subject: Re: [ DISCUSS ] FLIP-XXX : [Plugin] Enhancing Flink Failure
>> Management in Kubernetes with Dynamic Termination Log Integration
>> To: <dev@flink.apache.org>
>> Cc: <hamdy10...@gmail.com>, <gyula.f...@gmail.com>
>>
>>
>> I would prefer a separate FLIP
>>
>> On Wed, Apr 24, 2024 at 3:25 PM Swathi C <swathi.c.apa...@gmail.com> wrote:
>>
>> > Sure Ahmed and Martijn.
>> > Fetching the flink particular job related failure and adding this logic to
>> > termination-log is definitely a sub-task of pluggable enricher as we can
>> > leverage pluggable enricher to achieve this.
>> > But for CRUD level failures, which is mainly used to notify if the job
>> > manager failed might not be using the pluggable enricher. So, let us know
>> > if that needs to be there as a separate FLIP or we can combine that as well
>> > under the pluggable enricher ( by adding another sub task ) ?
>> >
>> > Regards,
>> > Swathi C
>> >
>> > On Wed, Apr 24, 2024 at 3:46 PM Ahmed Hamdy <hamdy10...@gmail.com> wrote:
>> >
>> > > Hi,
>> > > I agree with Martijn. We can reformulate the FLIP to introduce
>> > > termination log as supported pluggable enricher. If you believe the scope
>> > > of work is a subset (Further implementation) we can just add a Jira ticket
>> > > for it. IMO this will also help with implementation taking the existing
>> > > enrichers into reference.
>> > > Best Regards
>> > > Ahmed Hamdy
>> > >
>> > >
>> > > On Tue, 23 Apr 2024 at 15:23, Martijn Visser <martijnvis...@apache.org>
>> > > wrote:
>> > >
>> > > > From a procedural point of view, we shouldn't make FLIPs sub-tasks for
>> > > > existing FLIPs that have been voted/are released. That will only cause
>> > > > confusion down the line. A new FLIP should take existing functionality
>> > > > (like FLIP-304) into account, and propose how to improve on what that
>> > > > original FLIP has introduced or how you're going to leverage what's already
>> > > > there.
>> > > >
>> > > > On Tue, Apr 23, 2024 at 11:42 AM ramkrishna vasudevan <
>> > > > ramvasu.fl...@gmail.com> wrote:
>> > > >
>> > > > > Hi Gyula and Ahmed,
>> > > > >
>> > > > > I totally agree that there is an overlap in the final goal that both the
>> > > > > FLIPs are achieving here and in fact FLIP-304 is more comprehensive for job
>> > > > > failures.
>> > > > >
>> > > > > But as a proposal to move forward can we make Swathi's FLIP/JIRA as a sub
>> > > > > task for FLIP-304 and continue with the PR since the main aim is to get the
>> > > > > cluster failure pushed to the termination log for K8s based deployments.
>> > > > > And once it is completed we can work to make FLIP-304 to support job
>> > > > > failure propagation to termination log?
>> > > > >
>> > > > > Regards
>> > > > > Ram
>> > > > >
>> > > > > On Thu, Apr 18, 2024 at 10:07 PM Swathi C <swathi.c.apa...@gmail.com>
>> > > > > wrote:
>> > > > >
>> > > > > > Hi Gyula and Ahmed,
>> > > > > >
>> > > > > > Thanks for reviewing this.
>> > > > > >
>> > > > > > @gyula.f...@gmail.com <gyula.f...@gmail.com> , currently since our aim as
>> > > > > > part of this FLIP was only to fail the cluster when job manager/flink has
>> > > > > > issues such that the cluster would no longer be usable, hence, we proposed
>> > > > > > only related to that.
>> > > > > > You're right, that it covers only job main class errors, job manager run time
>> > > > > > failures, if the Job manager wants to write any metadata to any other
>> > > > > > system ( ABFS, S3 , ... ) and the job failures will not be covered.
>> > > > > >
>> > > > > > FLIP-304 is mainly used to provide Failure enrichers for job failures.
>> > > > > > Since, this FLIP is mainly for flink Job manager failures, let us know if
>> > > > > > we can leverage the goodness of both and try to extend FLIP-304 and add our
>> > > > > > plugin implementation to cover the job level issues ( propagate this info
>> > > > > > to the /dev/termination-log such that, the container status reports it for
>> > > > > > flink on K8S by implementing Failure Enricher interface and
>> > > > > > processFailure() to do this ) and use this FLIP proposal for generic flink
>> > > > > > cluster (Job manager/cluster ) failures.
>> > > > > >
>> > > > > > Regards,
>> > > > > > Swathi C
>> > > > > >
>> > > > > > On Thu, Apr 18, 2024 at 7:36 PM Ahmed Hamdy <hamdy10...@gmail.com>
>> > > > > > wrote:
>> > > > > >
>> > > > > > > Hi Swathi!
>> > > > > > > Thanks for the proposal.
>> > > > > > > Could you please elaborate what this FLIP offers more than Flip-304[1]?
>> > > > > > > Flip 304 proposes a Pluggable mechanism for enriching Job failures, If I am
>> > > > > > > not mistaken this proposal looks like a subset of it.
>> > > > > > >
>> > > > > > > 1-
>> > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-304%3A+Pluggable+Failure+Enrichers
>> > > > > > >
>> > > > > > > Best Regards
>> > > > > > > Ahmed Hamdy
>> > > > > > >
>> > > > > > >
>> > > > > > > On Thu, 18 Apr 2024 at 08:23, Gyula Fóra <gyula.f...@gmail.com> wrote:
>> > > > > > >
>> > > > > > > > Hi Swathi!
>> > > > > > > >
>> > > > > > > > Thank you for creating this proposal. I really like the general idea of
>> > > > > > > > increasing the K8s native observability of Flink job errors.
>> > > > > > > >
>> > > > > > > > I took a quick look at your reference PR, the termination log related logic
>> > > > > > > > is contained completely in the ClusterEntrypoint. What type of errors will
>> > > > > > > > this actually cover?
>> > > > > > > >
>> > > > > > > > To me this seems to cover only:
>> > > > > > > > - Job main class errors (ie startup errors)
>> > > > > > > > - JobManager failures
>> > > > > > > >
>> > > > > > > > Would regular job errors (that cause only job failover but not JM errors)
>> > > > > > > > be reported somehow with this plugin?
>> > > > > > > >
>> > > > > > > > Thanks
>> > > > > > > > Gyula
>> > > > > > > >
>> > > > > > > > On Tue, Apr 16, 2024 at 8:21 AM Swathi C <swathi.c.apa...@gmail.com>
>> > > > > > > > wrote:
>> > > > > > > >
>> > > > > > > > > Hi All,
>> > > > > > > > >
>> > > > > > > > > I would like to start a discussion on FLIP-XXX : [Plugin] Enhancing Flink
>> > > > > > > > > Failure Management in Kubernetes with Dynamic Termination Log Integration.
>> > > > > > > > >
>> > > > > > > > > https://docs.google.com/document/d/1tWR0Fi3w7VQeD_9VUORh8EEOva3q-V0XhymTkNaXHOc/edit?usp=sharing
>> > > > > > > > >
>> > > > > > > > > This FLIP proposes an improvement plugin and focuses mainly on Flink on
>> > > > > > > > > K8S but can be used as a generic plugin and add further enhancements.
>> > > > > > > > >
>> > > > > > > > > Looking forward to everyone's feedback and suggestions. Thank you !!
>> > > > > > > > >
>> > > > > > > > > Best Regards,
>> > > > > > > > > Swathi Chandrashekar