Re: [DISCUSS] FLIP-241: Completed Jobs Information Enhancement

junhan yang Wed, 22 Jun 2022 21:55:50 -0700

Hi all,

Thank you all for your feedback. As far as I can see, the discussion on
this FLIP has converged.


I will start a new vote thread now.

Best regards,
Junhan

Yangze Guo <[email protected]> wrote on Fri, Jun 17, 2022 at 14:05:

> Thanks for the input, Jiangang.
>
> I think it's a valid demand to distinguish completed jobs with the same
> name.
> - If they are different jobs, I think users need to give them
> different, meaningful names.
> - If they are exactly the same job, IIUC, what you need is to figure
> out the order. The ApplicationId in YARN might help, but in this case you
> can just sort them by start time.
>
> Best,
> Yangze Guo
>
> On Fri, Jun 17, 2022 at 12:13 PM Jiangang Liu <[email protected]>
> wrote:
> >
> > Thanks for the FLIP. It is helpful to track detailed information for
> > completed jobs.
> >
> > I want to ask another question. In our environment, it is sometimes hard
> > to distinguish jobs, since the same job name may appear multiple times
> > among the completed jobs, either because a job ran multiple times or
> > because different jobs share the same name. I wonder whether we can
> > enhance the completed jobs display with more information, such as the
> > applicationId and application name in YARN. Identifying a job may be
> > different in k8s.
> >
> > Best
> > Jiangang Liu
> >
> > Yangze Guo <[email protected]> wrote on Fri, Jun 17, 2022 at 11:40:
> >
> > > Thanks for the feedback, Aitozi and Jing.
> > >
> > > > Will every attempt of the TaskManager or JobManager pods (if a failure
> > > > occurs) be shown in the UI?
> > >
> > > The info of the prior execution attempts will be archived; you can
> > > refer to `ArchivedExecutionVertex$priorExecutions`.
> > >
> > > > It seems that most of these metrics are more interesting to batch
> > > > jobs. Does it make sense to calculate them for pure streaming jobs too?
> > >
> > > All the proposed metrics will be calculated no matter what the job
> > > type is.
> > >
> > > > Why does the FLIP say that "duration is less interesting"?
> > >
> > > As a first step, we mainly focus on the most interesting statuses during
> > > the job lifecycle. The duration of final states like FINISHED and
> > > CANCELED is meaningless, while abnormal conditions like CANCELING will
> > > not be included at the moment.
> > >
> > > > Could you share your thoughts on "accumulated-busy-time"? It should
> > > > describe the time while the task is working as expected, i.e. the happy
> > > > path. When do we need it for analytics or diagnosis?
> > >
> > > A task could be busy or idle while it is working. Users may adjust the
> > > parallelism or the partition key according to the ratio between them.
> > >
> > > Best,
> > > Yangze Guo
> > >
> > > On Fri, Jun 17, 2022 at 5:08 AM Jing Ge <[email protected]> wrote:
> > > >
> > > > Hi Junhan
> > > >
> > > > This is must-have information for batch processing. Thanks for
> > > > bringing it up.
> > > >
> > > > I have some comments:
> > > >
> > > > 1. It seems that most of these metrics are more interesting to batch
> > > > jobs. Does it make sense to calculate them for pure streaming jobs too?
> > > > 2. Why does the FLIP say that "duration is less interesting"?
> > > > 3. Could you share your thoughts on "accumulated-busy-time"? It should
> > > > describe the time while the task is working as expected, i.e. the happy
> > > > path. When do we need it for analytics or diagnosis?
> > > >
> > > > BTW, you might want to optimize the format of the FLIP. Some text is
> > > > running out of the right border of the wiki page.
> > > >
> > > > Best regards,
> > > > Jing
> > > >
> > > > On Thu, Jun 16, 2022 at 4:40 PM Aitozi <[email protected]> wrote:
> > > >
> > > > > Thanks Junhan for driving this. It's a great improvement for batch
> > > > > jobs. I'm looking forward to this feature in our internal use case.
> > > > > +1 for it.
> > > > >
> > > > > One more question:
> > > > >
> > > > > Will every attempt of the TaskManager or JobManager pods (if a failure
> > > > > occurs) be shown in the UI?
> > > > >
> > > > > Best,
> > > > > Aitozi.
> > > > >
> > > > > Yang Wang <[email protected]> wrote on Thu, Jun 16, 2022 at 19:10:
> > > > >
> > > > > > Thanks Xintong for the explanation.
> > > > > >
> > > > > > It makes sense to leave the discussion about the job result store
> > > > > > to a dedicated thread.
> > > > > >
> > > > > >
> > > > > > Best,
> > > > > > Yang
> > > > > >
> > > > > > Xintong Song <[email protected]> wrote on Thu, Jun 16, 2022 at 13:40:
> > > > > >
> > > > > > > My impression of JobResultStore is that it is more about fault
> > > > > > > tolerance and high availability. Using it for providing
> > > > > > > information to users sounds worth exploring. We probably need
> > > > > > > more time to think it through.
> > > > > > >
> > > > > > > Given that it doesn't conflict with what we have proposed in this
> > > > > > > FLIP, I'd suggest discussing it in a separate thread and
> > > > > > > excluding it from the scope of this one.
> > > > > > >
> > > > > > > Best,
> > > > > > >
> > > > > > > Xintong
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Thu, Jun 16, 2022 at 11:43 AM Yang Wang <[email protected]> wrote:
> > > > > > >
> > > > > > > > This is a very useful feature for both finished streaming and
> > > > > > > > batch jobs.
> > > > > > > >
> > > > > > > > Besides the WebUI & REST API improvements, I am curious whether
> > > > > > > > we could also integrate some critical information (e.g. the
> > > > > > > > latest checkpoint) into the job result store [1].
> > > > > > > > I feel this is also somewhat related to "Completed Jobs
> > > > > > > > Information Enhancement".
> > > > > > > > And I think the history server is not necessary in all
> > > > > > > > scenarios, especially when users only want to check the job
> > > > > > > > execution result.
> > > > > > > >
> > > > > > > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-194%3A+Introduce+the+JobResultStore
> > > > > > > >
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Yang
> > > > > > > >
> > > > > > > > Xintong Song <[email protected]> wrote on Wed, Jun 15, 2022 at 15:37:
> > > > > > > >
> > > > > > > > > Thanks Junhan,
> > > > > > > > >
> > > > > > > > > +1 for the proposed improvements.
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > >
> > > > > > > > > Xintong
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Wed, Jun 15, 2022 at 3:16 PM Yangze Guo <[email protected]> wrote:
> > > > > > > > >
> > > > > > > > > > Thanks for driving this, Junhan.
> > > > > > > > > >
> > > > > > > > > > I think it's a valuable usability improvement for both
> > > > > > > > > > streaming and batch users. Looking forward to the
> > > > > > > > > > community feedback.
> > > > > > > > > >
> > > > > > > > > > Best,
> > > > > > > > > > Yangze Guo
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > On Wed, Jun 15, 2022 at 3:10 PM junhan yang <[email protected]> wrote:
> > > > > > > > > > >
> > > > > > > > > > > Hi all,
> > > > > > > > > > >
> > > > > > > > > > > I would like to open a discussion on FLIP-241:
> > > > > > > > > > > Completed Jobs Information Enhancement.
> > > > > > > > > > >
> > > > > > > > > > > As far as we can tell, streaming and batch users have
> > > > > > > > > > > different interests when probing a job. As Flink grows
> > > > > > > > > > > into a unified streaming & batch processor and is
> > > > > > > > > > > adopted by more and more batch users, the user
> > > > > > > > > > > experience of inspecting completed jobs has become more
> > > > > > > > > > > and more important. After doing some market research,
> > > > > > > > > > > we spotted several potential improvements.
> > > > > > > > > > >
> > > > > > > > > > > The main reason for this discussion is that the
> > > > > > > > > > > proposal involves WebUI & REST API changes, which
> > > > > > > > > > > should be openly discussed and voted on as FLIPs.
> > > > > > > > > > >
> > > > > > > > > > > You can find more details in the FLIP-241 document [1].
> > > > > > > > > > > Looking forward to your feedback.
> > > > > > > > > > >
> > > > > > > > > > > [1] https://cwiki.apache.org/confluence/x/dRD1D
> > > > > > > > > > >
> > > > > > > > > > > Best regards,
> > > > > > > > > > > Junhan
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > >
>
