Thanks for the FLIP. It is helpful to track detailed information for
completed jobs.

I have another question. In our environment, it is sometimes hard to
distinguish jobs because the same job name may appear multiple times in the
completed jobs list, either because a job has run multiple times or because
different jobs share the same name. I wonder whether we can enhance the
completed jobs display with more information, such as the applicationId and
application name on YARN. Jobs may need to be identified differently on K8s.

Best,
Jiangang Liu

On Fri, Jun 17, 2022 at 11:40 AM Yangze Guo <karma...@gmail.com> wrote:

> Thanks for the feedback, Aitozi and Jing.
>
> > Will every attempt of the TaskManager or JobManager pods (if a failure
> > occurs) be shown in the UI?
>
> The information on prior execution attempts will be archived; you can
> refer to `ArchivedExecutionVertex$priorExecutions`.
>
> > It seems that most of these metrics are more interesting to batch jobs.
> > Does it make sense to calculate them for pure streaming jobs too?
>
> All the proposed metrics will be calculated no matter what the job type is.
>
> > Why does the FLIP say that "duration is less interesting"?
>
> As a first step, we mainly focus on the most interesting statuses during
> the job lifecycle. The duration of final states like FINISHED and
> CANCELED is meaningless, while abnormal conditions like CANCELING will
> not be included at the moment.
>
> > Could you share your thoughts on "accumulated-busy-time"? It should
> > describe the time while the task is working as expected, i.e. the happy
> > path. When do we need it for analytics or diagnosis?
>
> A task could be busy or idle while it is working. Users may adjust the
> parallelism or the partition key according to the ratio between them.
>
> Best,
> Yangze Guo
>
> On Fri, Jun 17, 2022 at 5:08 AM Jing Ge <j...@ververica.com> wrote:
> >
> > Hi Junhan
> >
> > This is must-have information for batch processing. Thanks for
> > bringing it up.
> >
> > I have some comments:
> >
> > 1. It seems that most of these metrics are more interesting to batch
> >    jobs. Does it make sense to calculate them for pure streaming jobs
> >    too?
> > 2. Why does the FLIP say that "duration is less interesting"?
> > 3. Could you share your thoughts on "accumulated-busy-time"? It should
> >    describe the time while the task is working as expected, i.e. the
> >    happy path. When do we need it for analytics or diagnosis?
> >
> > BTW, you might want to optimize the format of the FLIP. Some text
> > runs past the right border of the wiki page.
> >
> > Best regards,
> > Jing
> >
> > On Thu, Jun 16, 2022 at 4:40 PM Aitozi <gjying1...@gmail.com> wrote:
> > >
> > > Thanks Junhan for driving this. It's a great improvement for batch
> > > jobs. I'm looking forward to this feature in our internal use case.
> > > +1 for it.
> > >
> > > One more question:
> > >
> > > Will every attempt of the TaskManager or JobManager pods (if a
> > > failure occurs) be shown in the UI?
> > >
> > > Best,
> > > Aitozi.
> > >
> > > On Thu, Jun 16, 2022 at 7:10 PM Yang Wang <danrtsey...@gmail.com> wrote:
> > > >
> > > > Thanks Xintong for the explanation.
> > > >
> > > > It makes sense to leave the discussion about the job result store
> > > > to a dedicated thread.
> > > >
> > > > Best,
> > > > Yang
> > > >
> > > > On Thu, Jun 16, 2022 at 1:40 PM Xintong Song <tonysong...@gmail.com> wrote:
> > > > >
> > > > > My impression of JobResultStore is that it is more about fault
> > > > > tolerance and high availability. Using it to provide
> > > > > information to users sounds worth exploring. We probably need
> > > > > more time to think it through.
> > > > >
> > > > > Given that it doesn't conflict with what we have proposed in
> > > > > this FLIP, I'd suggest discussing it in a separate thread and
> > > > > excluding it from the scope of this one.
> > > > >
> > > > > Best,
> > > > >
> > > > > Xintong
> > > > >
> > > > > On Thu, Jun 16, 2022 at 11:43 AM Yang Wang <danrtsey...@gmail.com> wrote:
> > > > > >
> > > > > > This is a very useful feature for both finished streaming
> > > > > > and batch jobs.
> > > > > >
> > > > > > Besides the WebUI & REST API improvements, I am curious
> > > > > > whether we could also integrate some critical information
> > > > > > (e.g. latest checkpoint) into the job result store[1].
> > > > > > I feel this is also somewhat related to "Completed Jobs
> > > > > > Information Enhancement".
> > > > > > And I think the history server is not necessary for all
> > > > > > scenarios, especially when users only want to check the job
> > > > > > execution result.
> > > > > >
> > > > > > [1]. https://cwiki.apache.org/confluence/display/FLINK/FLIP-194%3A+Introduce+the+JobResultStore
> > > > > >
> > > > > > Best,
> > > > > > Yang
> > > > > >
> > > > > > On Wed, Jun 15, 2022 at 3:37 PM Xintong Song <tonysong...@gmail.com> wrote:
> > > > > > >
> > > > > > > Thanks Junhan,
> > > > > > >
> > > > > > > +1 for the proposed improvements.
> > > > > > >
> > > > > > > Best,
> > > > > > >
> > > > > > > Xintong
> > > > > > >
> > > > > > > On Wed, Jun 15, 2022 at 3:16 PM Yangze Guo <karma...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > Thanks for driving this, Junhan.
> > > > > > > >
> > > > > > > > I think it's a valuable usability improvement for both
> > > > > > > > streaming and batch users. Looking forward to the
> > > > > > > > community feedback.
> > > > > > > >
> > > > > > > > Best,
> > > > > > > > Yangze Guo
> > > > > > > >
> > > > > > > > On Wed, Jun 15, 2022 at 3:10 PM junhan yang <yangjunhan1...@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > Hi all,
> > > > > > > > >
> > > > > > > > > I would like to open a discussion on FLIP-241:
> > > > > > > > > Completed Jobs Information Enhancement.
> > > > > > > > >
> > > > > > > > > As far as we can tell, streaming and batch users have
> > > > > > > > > different interests in probing a job. As Flink grows
> > > > > > > > > into a unified streaming & batch processor and is
> > > > > > > > > adopted by more and more batch users, the user
> > > > > > > > > experience of inspecting completed jobs has become
> > > > > > > > > more and more important. After doing some market
> > > > > > > > > research, we spotted several potential improvements.
> > > > > > > > >
> > > > > > > > > We are opening this discussion because the proposal
> > > > > > > > > involves WebUI & REST API changes, which should be
> > > > > > > > > openly discussed and voted on as FLIPs.
> > > > > > > > >
> > > > > > > > > You can find more details in the FLIP-241 document[1].
> > > > > > > > > Looking forward to your feedback.
> > > > > > > > >
> > > > > > > > > [1] https://cwiki.apache.org/confluence/x/dRD1D
> > > > > > > > >
> > > > > > > > > Best regards,
> > > > > > > > > Junhan