Hi,

A quick question/comment about FLIP-102: have you thought about adding GC stats? I’m not sure what’s easily doable, but something that would allow users to see GC issues (long or frequent pauses, lots of CPU time spent in the GC) would be quite useful for analysing performance/stability issues without needing to connect profilers in a distributed environment.

Piotrek

> On 10 Feb 2020, at 10:58, Yadong Xie <vthink...@gmail.com> wrote:
>
> Hi all
>
> I have drafted the docs of top-level FLIPs for the individual changes
> proposed in FLIP-75.
> I will update the cwiki page and start the voting stage soon if there
> is no objection.
>
> - FLIP-98: Better Back Pressure Detection
>   <https://docs.google.com/document/d/1b4GadCze-36x5TPHz6ie4WI9fOUuxWoT_rWWgeg68oo/edit?usp=sharing>
> - FLIP-99: Make Max Exception Configurable
>   <https://docs.google.com/document/d/1tsPpTEx5WqliOAUC924xzRxYOalUuB-GoznGPcxSzJo/edit?usp=sharing>
> - FLIP-100: Add Attempt Information
>   <https://docs.google.com/document/d/1Ww7biOr6WMVfoYhtBTJftRqEm9FGo33AXgYibdXy47Y/edit?usp=sharing>
> - FLIP-101: Add Pending Slots Tab in Job Detail
>   <https://docs.google.com/document/d/1ttn7zIn_Z237JOHdmhiei6aCwKdjTU53I07XxA61Fis/edit?usp=sharing>
> - FLIP-102: Add More Metrics to TaskManager
>   <https://docs.google.com/document/d/18yHdsqUJ1FmNRm0hyeCm3nWvPFpvpJgTJ8BYNAa6Ul8/edit?usp=sharing>
> - FLIP-103: Better Taskmanager Log Display
>   <https://docs.google.com/document/d/16eEdW2KeLxvABdoXahx4MMMisW4_P9mKiqUE0F4GO1c/edit?usp=sharing>
> - FLIP-104: Add More Metrics to Jobmanager
>   <https://docs.google.com/document/d/1Fak632iOroOLZFADqwZWu2SS-LUQqCLHnm8Vs3XM5to/edit?usp=sharing>
> - FLIP-105: Better Jobmanager Log Display
>   <https://docs.google.com/document/d/1ayXaZflelaymQuF3l6UOuGEg6zzbSGGewoDq-9SBOPY/edit?usp=sharing>
>
> On Sun, Feb 9, 2020 at 7:24 PM Yadong Xie <vthink...@gmail.com> wrote:
>
>> Hi Till
>> I got your point; I will create sub-FLIPs and votes according to
>> FLIP-75 and the previous discussion soon.
>>
>> On Sun, Feb 9, 2020 at 5:27 PM Till Rohrmann <trohrm...@apache.org> wrote:
>>
>>> Hi Yadong,
>>>
>>> I think it would be fine to simply link to this discussion thread to keep
>>> the discussion history. Maybe an easier way would be to create top-level
>>> FLIPs for the individual changes proposed in FLIP-75.
>>> The reason I'm proposing this is that it would be easier to vote on it
>>> and to implement it because the scope is smaller. But maybe I'm wrong
>>> here and others could chime in to voice their opinion.
>>>
>>> Cheers,
>>> Till
>>>
>>> On Fri, Feb 7, 2020 at 9:58 AM Yadong Xie <vthink...@gmail.com> wrote:
>>>
>>>> Hi Till
>>>>
>>>> FLIP-75 has been open since September, and the design doc has been
>>>> iterated over 3 versions and more than 20 patches.
>>>> I had a try, but it is hard to split the design docs into sub-FLIPs
>>>> and keep all the discussion history at the same time.
>>>>
>>>> Maybe it is better to start another discussion to talk about the
>>>> individual sub-FLIP voting, and make the next FLIP follow the new
>>>> practice if possible?
>>>>
>>>> On Mon, Feb 3, 2020 at 6:28 PM Till Rohrmann <trohrm...@apache.org> wrote:
>>>>
>>>>> I think there is no such description because we never did it before. I
>>>>> just figured that FLIP-75 could actually be a good candidate to start
>>>>> this practice. We would need a community discussion first, though.
>>>>>
>>>>> Cheers,
>>>>> Till
>>>>>
>>>>> On Mon, Feb 3, 2020 at 10:28 AM Yadong Xie <vthink...@gmail.com> wrote:
>>>>>
>>>>>> Hi Till
>>>>>> I couldn’t find how to create a sub-FLIP at cwiki.apache.org.
>>>>>> Do you mean creating 9 more FLIPs instead of FLIP-75?
>>>>>>
>>>>>> On Thu, Jan 30, 2020 at 11:12 PM Till Rohrmann <trohrm...@apache.org> wrote:
>>>>>>
>>>>>>> Would it be easier if FLIP-75 would be the umbrella FLIP and we would
>>>>>>> vote on the individual improvements as sub FLIPs? Decreasing the scope
>>>>>>> should make things easier.
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Till
>>>>>>>
>>>>>>> On Thu, Jan 30, 2020 at 2:35 PM Robert Metzger <rmetz...@apache.org> wrote:
>>>>>>>
>>>>>>>> Thanks a lot for this work! I believe the web UI is very important,
>>>>>>>> in particular to new users.
>>>>>>>> I'm very happy to see that you are putting effort into improving the
>>>>>>>> visibility into Flink through the proposed changes.
>>>>>>>>
>>>>>>>> I can not judge if all the changes make total sense, but the discussion
>>>>>>>> has been open since September, and a good number of people have
>>>>>>>> commented in the document.
>>>>>>>> I wonder if we can move this FLIP to the VOTing stage?
>>>>>>>>
>>>>>>>> On Wed, Jan 22, 2020 at 6:27 PM Till Rohrmann <trohrm...@apache.org> wrote:
>>>>>>>>
>>>>>>>>> Thanks for the update Yadong. Big +1 for the proposed improvements for
>>>>>>>>> Flink's web UI. I think they will be super helpful for our users.
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Till
>>>>>>>>>
>>>>>>>>> On Tue, Jan 7, 2020 at 10:00 AM Yadong Xie <vthink...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi everyone
>>>>>>>>>>
>>>>>>>>>> We have spent some time updating the documentation since the last
>>>>>>>>>> discussion.
>>>>>>>>>>
>>>>>>>>>> In short, the latest FLIP-75 contains the following proposal (including
>>>>>>>>>> both frontend and RestAPI)
>>>>>>>>>>
>>>>>>>>>> 1. Job Level
>>>>>>>>>>    - better job backpressure detection
>>>>>>>>>>    - load more feature in job exception
>>>>>>>>>>    - show attempt history in the subtask
>>>>>>>>>>    - show attempt timeline
>>>>>>>>>>    - add pending slots
>>>>>>>>>> 2. Task Manager Level
>>>>>>>>>>    - add more metrics
>>>>>>>>>>    - better log display
>>>>>>>>>> 3. Job Manager Level
>>>>>>>>>>    - add metrics tab
>>>>>>>>>>    - better log display
>>>>>>>>>>
>>>>>>>>>> To help everyone better understand the proposal, we spent efforts on
>>>>>>>>>> making an online POC <http://101.132.122.69:8081/web/#/overview>.
>>>>>>>>>>
>>>>>>>>>> Now you can compare the difference between the new and old Web/RestAPI
>>>>>>>>>> (the link is inside the doc)!
>>>>>>>>>>
>>>>>>>>>> Here is the latest FLIP-75 doc:
>>>>>>>>>> https://docs.google.com/document/d/1tIa8yN2prWWKJI_fa1u0t6h1r6RJpp56m48pXEyh6iI/edit#
>>>>>>>>>>
>>>>>>>>>> Looking forward to your feedback
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Yadong
>>>>>>>>>>
>>>>>>>>>> On Thu, Oct 24, 2019 at 2:11 PM lining jing <jinglini...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi all, I have updated the backend design in FLIP-75
>>>>>>>>>>> <https://docs.google.com/document/d/1tIa8yN2prWWKJI_fa1u0t6h1r6RJpp56m48pXEyh6iI/edit?usp=sharing>.
>>>>>>>>>>>
>>>>>>>>>>> Here is a brief introduction:
>>>>>>>>>>>
>>>>>>>>>>> - Add metric for managed memory FLINK-14406
>>>>>>>>>>>   <https://issues.apache.org/jira/browse/FLINK-14406>.
>>>>>>>>>>> - Expose TaskExecutor resource configurations to REST API FLINK-14422
>>>>>>>>>>>   <https://issues.apache.org/jira/browse/FLINK-14422>.
>>>>>>>>>>> - Add TaskManagerResourceInfo in TaskManagerDetailsInfo to show
>>>>>>>>>>>   TaskManager Resource FLINK-14435
>>>>>>>>>>>   <https://issues.apache.org/jira/browse/FLINK-14435>.
>>>>>>>>>>>
>>>>>>>>>>> I will continue to update the rest of the backend design in the
>>>>>>>>>>> doc. Let's keep discussing here; any feedback is appreciated.
>>>>>>>>>>>
>>>>>>>>>>> On Fri, Sep 27, 2019 at 10:13 AM Yadong Xie <vthink...@gmail.com> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Hi all
>>>>>>>>>>>>
>>>>>>>>>>>> Flink Web UI is the main platform for most users to monitor their jobs
>>>>>>>>>>>> and clusters. We have reconstructed Flink web in 1.9.0 version, but
>>>>>>>>>>>> there are still some shortcomings.
>>>>>>>>>>>>
>>>>>>>>>>>> This discussion thread aims to provide a better experience for Flink UI
>>>>>>>>>>>> users.
>>>>>>>>>>>>
>>>>>>>>>>>> Here is the design doc I drafted:
>>>>>>>>>>>> https://docs.google.com/document/d/1tIa8yN2prWWKJI_fa1u0t6h1r6RJpp56m48pXEyh6iI/edit?usp=sharing
>>>>>>>>>>>>
>>>>>>>>>>>> The FLIP can be found at [2].
>>>>>>>>>>>>
>>>>>>>>>>>> Please keep the discussion here, in the mailing list.
>>>>>>>>>>>>
>>>>>>>>>>>> Looking forward to your opinions, any feedbacks are welcome.
>>>>>>>>>>>>
>>>>>>>>>>>> [1]: https://docs.google.com/document/d/1tIa8yN2prWWKJI_fa1u0t6h1r6RJpp56m48pXEyh6iI/edit?usp=sharing
>>>>>>>>>>>> <https://docs.google.com/document/d/1tIa8yN2prWWKJI_fa1u0t6h1r6RJpp56m48pXEyh6iI/edit#>
>>>>>>>>>>>> [2]: https://cwiki.apache.org/confluence/display/FLINK/FLIP-75%3A+Flink+Web+UI+Improvement+Proposal