Hi,
I agree with Martijn. We can reformulate the FLIP to introduce the
termination log as a supported pluggable enricher. If you believe the scope
of work is a subset of FLIP-304 (i.e. a further implementation of it), we
can just add a Jira ticket for it. IMO this will also help the
implementation, since the existing enrichers can serve as a reference.
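For illustration, a termination-log enricher could be a small class along these lines. This is a minimal sketch, not the FLIP's design. SimpleFailureEnricher is a hypothetical, simplified stand-in for FLIP-304's FailureEnricher; the real processFailure() also takes a Context argument, which is omitted here so the sketch compiles without Flink. The output key, the 4096-byte cap and the root-cause summary are my own assumptions.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CompletableFuture;

// Hypothetical, simplified stand-in for FLIP-304's FailureEnricher. The real
// interface's processFailure() also takes a Context argument (omitted here).
interface SimpleFailureEnricher {
    Set<String> getOutputKeys();

    CompletableFuture<Map<String, String>> processFailure(Throwable cause);
}

// Writes a one-line failure summary to the container's termination log.
class TerminationLogEnricher implements SimpleFailureEnricher {
    // Kubernetes' default terminationMessagePath.
    static final String DEFAULT_PATH = "/dev/termination-log";
    // Assumed cap: Kubernetes truncates longer termination messages.
    static final int MAX_BYTES = 4096;
    // Hypothetical label key returned to the failure-handling path.
    static final String KEY = "terminationLog";

    private final Path logPath;

    TerminationLogEnricher(Path logPath) {
        this.logPath = logPath;
    }

    @Override
    public Set<String> getOutputKeys() {
        return Set.of(KEY);
    }

    @Override
    public CompletableFuture<Map<String, String>> processFailure(Throwable cause) {
        try {
            // Files.write defaults to CREATE, TRUNCATE_EXISTING, WRITE.
            Files.write(logPath, truncate(summarize(cause), MAX_BYTES));
            return CompletableFuture.completedFuture(Map.of(KEY, "written"));
        } catch (IOException e) {
            // Failing to write the log must not break failure handling itself.
            return CompletableFuture.completedFuture(Map.of(KEY, "write-failed"));
        }
    }

    // Reports the root cause; the depth bound guards against cause cycles.
    static String summarize(Throwable cause) {
        Throwable root = cause;
        for (int i = 0; i < 100 && root.getCause() != null; i++) {
            root = root.getCause();
        }
        String msg = root.getMessage();
        return msg == null ? root.getClass().getName() : root.getClass().getName() + ": " + msg;
    }

    // Cuts to at most maxBytes of UTF-8 without splitting a multi-byte character.
    static byte[] truncate(String message, int maxBytes) {
        byte[] bytes = message.getBytes(StandardCharsets.UTF_8);
        if (bytes.length <= maxBytes) {
            return bytes;
        }
        int cut = maxBytes;
        // bytes[cut] is the first dropped byte; if it is a continuation byte
        // (10xxxxxx), back up so the partial character is dropped entirely.
        while (cut > 0 && (bytes[cut] & 0xC0) == 0x80) {
            cut--;
        }
        return Arrays.copyOf(bytes, cut);
    }
}
```

Inside a pod it would be constructed with Path.of(TerminationLogEnricher.DEFAULT_PATH); taking the path as a constructor argument just keeps it testable. A real plugin would also need the factory/plugin wiring described in FLIP-304.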
Best Regards
Ahmed Hamdy


On Tue, 23 Apr 2024 at 15:23, Martijn Visser <martijnvis...@apache.org>
wrote:

> From a procedural point of view, we shouldn't make FLIPs sub-tasks of
> existing FLIPs that have been voted on or released. That will only cause
> confusion down the line. A new FLIP should take existing functionality
> (like FLIP-304) into account, and propose how to improve on what that
> original FLIP has introduced or how you're going to leverage what's already
> there.
>
> On Tue, Apr 23, 2024 at 11:42 AM ramkrishna vasudevan <
> ramvasu.fl...@gmail.com> wrote:
>
> > Hi Gyula and Ahmed,
> >
> > I totally agree that there is an overlap in the final goal that both
> > FLIPs are trying to achieve here, and in fact FLIP-304 is more
> > comprehensive for job failures.
> >
> > But as a proposal to move forward, can we make Swathi's FLIP/JIRA a
> > sub-task of FLIP-304 and continue with the PR, since the main aim is to
> > push cluster failures to the termination log for K8s-based deployments?
> > Once that is completed, we can work on making FLIP-304 support
> > propagating job failures to the termination log.
> >
> > Regards
> > Ram
> >
> > On Thu, Apr 18, 2024 at 10:07 PM Swathi C <swathi.c.apa...@gmail.com>
> > wrote:
> >
> > > Hi Gyula and Ahmed,
> > >
> > > Thanks for reviewing this.
> > >
> > > @gyula.f...@gmail.com <gyula.f...@gmail.com>, since the aim of this
> > > FLIP was only to fail the cluster when the JobManager/Flink has issues
> > > that leave the cluster unusable, we scoped the proposal to that case.
> > > You're right that it covers only job main class errors, JobManager
> > > runtime failures, and failures when the JobManager writes metadata to
> > > another system (ABFS, S3, ...); job failures will not be covered.
> > >
> > > FLIP-304 mainly provides failure enrichers for job failures, while
> > > this FLIP mainly targets Flink JobManager failures. Let us know if we
> > > can leverage the strengths of both: extend FLIP-304 with our plugin
> > > implementation to cover job-level issues (implementing the
> > > FailureEnricher interface and its processFailure() method to propagate
> > > the failure info to /dev/termination-log, so that the container status
> > > reports it for Flink on K8s), and use this FLIP's proposal for generic
> > > Flink cluster (JobManager/cluster) failures.
> > >
> > > Regards,
> > > Swathi C
> > >
> > > On Thu, Apr 18, 2024 at 7:36 PM Ahmed Hamdy <hamdy10...@gmail.com>
> > wrote:
> > >
> > > > Hi Swathi!
> > > > Thanks for the proposal.
> > > > Could you please elaborate on what this FLIP offers beyond FLIP-304 [1]?
> > > > FLIP-304 proposes a pluggable mechanism for enriching job failures. If I
> > > > am not mistaken, this proposal looks like a subset of it.
> > > >
> > > > [1]
> > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-304%3A+Pluggable+Failure+Enrichers
> > > >
> > > > Best Regards
> > > > Ahmed Hamdy
> > > >
> > > >
> > > > On Thu, 18 Apr 2024 at 08:23, Gyula Fóra <gyula.f...@gmail.com>
> wrote:
> > > >
> > > > > Hi Swathi!
> > > > >
> > > > > Thank you for creating this proposal. I really like the general idea of
> > > > > increasing the K8s-native observability of Flink job errors.
> > > > >
> > > > > I took a quick look at your reference PR: the termination log related
> > > > > logic is contained entirely in the ClusterEntrypoint. What types of
> > > > > errors will this actually cover?
> > > > >
> > > > > To me this seems to cover only:
> > > > >  - Job main class errors (i.e. startup errors)
> > > > >  - JobManager failures
> > > > >
> > > > > Would regular job errors (those that cause only a job failover but not
> > > > > JM errors) be reported somehow with this plugin?
> > > > >
> > > > > Thanks
> > > > > Gyula
> > > > >
> > > > > On Tue, Apr 16, 2024 at 8:21 AM Swathi C <
> swathi.c.apa...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi All,
> > > > > >
> > > > > > I would like to start a discussion on FLIP-XXX: [Plugin] Enhancing
> > > > > > Flink Failure Management in Kubernetes with Dynamic Termination Log
> > > > > > Integration.
> > > > > >
> > > > > >
> > > > > > https://docs.google.com/document/d/1tWR0Fi3w7VQeD_9VUORh8EEOva3q-V0XhymTkNaXHOc/edit?usp=sharing
> > > > > >
> > > > > >
> > > > > > This FLIP proposes an improvement plugin. It focuses mainly on Flink
> > > > > > on K8s, but it can also be used as a generic plugin and extended with
> > > > > > further enhancements.
> > > > > >
> > > > > > Looking forward to everyone's feedback and suggestions. Thank you!
> > > > > >
> > > > > > Best Regards,
> > > > > > Swathi Chandrashekar
> > > > > >
> > > > >
> > > >
> > >
> >
>
