Re: [DISCUSS] FLIP-75: Flink Web UI Improvement Proposal

Xintong Song Mon, 07 Oct 2019 20:14:33 -0700

2. Sounds good to me.

4. If that is the case, I would suggest making the default page size large,
so that in case of huge log data we have fewer large pages rather than lots
of small pages.
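
To illustrate the trade-off, here is a minimal sketch of byte-offset
pagination over a log file. It is only an illustration, not Flink's REST API:
the names DEFAULT_PAGE_SIZE, page_count and read_page, as well as the 1 MiB
default, are assumptions made up for this example.

```python
import os
import tempfile

# Hypothetical default; the thread does not fix a concrete value.
DEFAULT_PAGE_SIZE = 1024 * 1024  # 1 MiB


def page_count(path, page_size=DEFAULT_PAGE_SIZE):
    """Number of pages needed to cover the file (at least 1)."""
    size = os.path.getsize(path)
    return max(1, -(-size // page_size))  # ceiling division


def read_page(path, page, page_size=DEFAULT_PAGE_SIZE):
    """Return the raw bytes of the zero-based page `page`."""
    with open(path, "rb") as f:
        f.seek(page * page_size)
        return f.read(page_size)


def demo():
    """Compare page counts for a small and a large page size."""
    # Write a 2,500-byte sample "log" to a temporary file.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"x" * 2500)
    try:
        small = page_count(tmp.name, page_size=100)
        large = page_count(tmp.name, page_size=1000)
        last = read_page(tmp.name, 2, page_size=1000)
        return small, large, len(last)
    finally:
        os.remove(tmp.name)
```

With the same file, the larger page size yields far fewer pages to click
through, at the cost of a bigger response per request.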


Thank you~

Xintong Song



On Tue, Oct 8, 2019 at 11:03 AM Yadong Xie <vthink...@gmail.com> wrote:

> Hi Xintong Song
>
> 2. We could switch between a detailed mode (including cpu, task heap,
> task off-heap, shuffle, on-heap managed, off-heap managed) and a summary
> mode (including only cpu and mem), which is very easy to do in the UI design.
>
> 4. I think the key point is not pagination in the Web UI, but that the REST
> API will totally *break* without pagination in the current design.
> In my opinion, pagination is better than nothing: it is a solution that
> keeps the log API working, and it would be great if there were another
> way to keep it working with huge log data.
>
> On Mon, Sep 30, 2019 at 7:19 PM Xintong Song <tonysong...@gmail.com> wrote:
>
> > @Yadong
> >
> > 2. I agree that we can update the task executor UI after FLIP-56 is done.
> > But I would suggest keeping it under discussion so that we come up with a
> > proper UI design for task executor resources. I don't think the mentioned
> > image from FLIP-56 is a good choice. That image is a simplified figure with
> > cpu and total memory only, for the purpose of demonstrating dynamic slot
> > allocation. In fact, there are 6 fields to be displayed (cpu, task heap,
> > task off-heap, shuffle, on-heap managed, off-heap managed). If we display
> > cpu and total memory only, users will be confused when they see a task
> > executor with enough remaining resources onto which tasks nevertheless
> > cannot be deployed (because the desired type of memory might be used up).
> >
> > 4. I've been using the Blink web UI, which already has log pagination.
> > It's quite common that we need to search for some keywords (e.g.,
> > exception, error, warning) in a large amount of logs when diagnosing
> > problems. I find it very inconvenient that I have to click into each page
> > to search for the keywords, and I'd rather take the effort to find the
> > original log files on the filesystem to view the log. Personally speaking,
> > if keyword searching cannot be supported, I would prefer to take some time
> > loading the non-paginated logs over the paginated ones. Or we may at least
> > have a button on the web UI for switching between the two alternatives.
> >
> > @Till
> >
> > Thanks for the inputs.
> >
> > Thank you~
> >
> > Xintong Song
> >
> >
> >
> > On Mon, Sep 30, 2019 at 5:55 PM Till Rohrmann <trohrm...@apache.org>
> > wrote:
> >
> > > For 3. At the moment, serving the log and stdout files requires the
> > > TaskExecutor to be running. But in some scenarios, e.g. when an NFS is
> > > available, it should be enough to know where the file is located.
> > > However, this assumption does not hold in the general case.
> > >
> > > Cheers,
> > > Till
> > >
> > > On Mon, Sep 30, 2019 at 11:43 AM Yadong Xie <vthink...@gmail.com> wrote:
> > >
> > > > Hi Xintong Song
> > > >
> > > > Thanks for your comments!
> > > >
> > > > 1. I think it is a good idea to align the CPU and memory usage with
> > > > FLIP-49 if it is released in version 1.10.
> > > > 2. We can update the task executor UI design after FLIP-56 is merged
> > > > into master. Actually, the image
> > > > <https://cwiki.apache.org/confluence/download/attachments/125309297/BlinkResourceTM.png?version=1&modificationDate=1566223821000&api=v2>
> > > > in FLIP-56 is a good UI design; we can follow it in the Flink web UI.
> > > > 3. I have no idea about this; maybe someone familiar with the runtime
> > > > part could answer it? In my opinion, it would be great to add it to the
> > > > web UI.
> > > > 4. I'm not sure whether keyword searching across all the pages would
> > > > cost too many resources in the job manager, but I think it would be
> > > > very useful if the REST API could support it.
> > > >
> > > > Best,
> > > > Yadong
> > > >
> > > > On Sun, Sep 29, 2019 at 8:11 PM Xintong Song <tonysong...@gmail.com> wrote:
> > > >
> > > > > Thanks for drafting the FLIP and starting this discussion, Yadong.
> > > > >
> > > > >
> > > > > I have some comments:
> > > > >
> > > > >
> > > > >    - I can see that the proposed memory and cpu usage to be displayed
> > > > >    (in section 1.1) are aligned with the current ResourceProfile
> > > > >    fields. However, we are working on changing the memory fields in
> > > > >    1.10 with FLIP-49 [1]. I suggest we align the UI design with the
> > > > >    new FLIP-49 memory fields.
> > > > >    - The task executor overview design (in section 1.2) is based on
> > > > >    the current slot model. The coming FLIP-56 [2], which is also
> > > > >    planned for 1.10, is changing the model so that task executors no
> > > > >    longer have a fixed number of slots, but rather allocated slots
> > > > >    (which may have different resources) and available resources.
> > > > >       - I can see that there are discussions in the Google doc about
> > > > >       using a different color for available resources. However, the
> > > > >       resource availability can differ between fields, and may not be
> > > > >       simply displayed by a different color. E.g., a task executor may
> > > > >       have two slots, where slot 1 takes (20% cpu, 10% heap mem, 50%
> > > > >       managed mem, etc.), slot 2 takes (10% cpu, 35% heap mem, 0%
> > > > >       managed mem, etc.), and the remaining resources in the task
> > > > >       executor are (70% cpu, 55% heap mem, 50% managed mem, etc.).
> > > > >       How do you plan to display that?
> > > > >       - I would suggest having multiple bars for each task executor,
> > > > >       where each bar represents one of the resource fields. In
> > > > >       addition, we may have a number (or some other figure) showing
> > > > >       how many slots are allocated from the task executor.
> > > > >    - Is there any way we can provide access to the logs of terminated
> > > > >    task executors? It happens to us a lot that a job fails because a
> > > > >    task executor failed or was lost, and we have to find the logs of
> > > > >    the failed task executors by manually accessing the file system. I
> > > > >    think it would be helpful if we could find the logs of failed task
> > > > >    executors directly in the Flink web UI.
> > > > >    - Regarding log pagination, is there any way to provide keyword
> > > > >    searching across all the pages?
> > > > >
> > > > >
> > > > > Thank you~
> > > > >
> > > > > Xintong Song
> > > > >
> > > > >
> > > > > [1] https://cwiki.apache.org/confluence/display/FLINK/FLIP-49%3A+Unified+Memory+Configuration+for+TaskExecutors
> > > > > [2] https://cwiki.apache.org/confluence/display/FLINK/FLIP-56%3A+Dynamic+Slot+Allocation
> > > > >
> > > > > On Fri, Sep 27, 2019 at 3:57 PM Paul Lam <paullin3...@gmail.com> wrote:
> > > > >
> > > > > > Filed a JIRA to track this [1]. Thanks a lot.
> > > > > >
> > > > > > [1] https://issues.apache.org/jira/browse/FLINK-14242
> > > > > >
> > > > > > Best,
> > > > > > Paul Lam
> > > > > >
> > > > > > > On Sep 27, 2019, at 14:34, Yadong Xie <vthink...@gmail.com> wrote:
> > > > > > >
> > > > > > > Hi Paul
> > > > > > > Thanks for your suggestion.
> > > > > > > I think it is easy to implement. Could you create a JIRA for me?
> > > > > > >
> > > > > > > On Fri, Sep 27, 2019 at 11:11 AM Paul Lam <paullin3...@gmail.com> wrote:
> > > > > > >
> > > > > > >> Hi Yadong,
> > > > > > >>
> > > > > > >> Thanks a lot for summing up the Web UI efforts.
> > > > > > >>
> > > > > > >> I have a minor suggestion: can we provide a collapse button for the
> > > > > > >> task names in the job graph visualization? For some complex jobs,
> > > > > > >> especially SQL jobs, the task names are quite long, which makes the
> > > > > > >> job graph hard to read.
> > > > > > >>
> > > > > > >> Best,
> > > > > > >> Paul Lam
> > > > > > >>
> > > > > > >>> On Sep 27, 2019, at 10:13, Yadong Xie <vthink...@gmail.com> wrote:
> > > > > > >>>
> > > > > > >>> Hi all
> > > > > > >>>
> > > > > > >>> The Flink Web UI is the main platform for most users to monitor
> > > > > > >>> their jobs and clusters. We rebuilt the Flink web UI in version
> > > > > > >>> 1.9.0, but there are still some shortcomings.
> > > > > > >>>
> > > > > > >>> This discussion thread aims to provide a better experience for
> > > > > > >>> Flink UI users.
> > > > > > >>>
> > > > > > >>> Here is the design doc I drafted:
> > > > > > >>>
> > > > > > >>> https://docs.google.com/document/d/1tIa8yN2prWWKJI_fa1u0t6h1r6RJpp56m48pXEyh6iI/edit?usp=sharing
> > > > > > >>>
> > > > > > >>>
> > > > > > >>> The FLIP can be found at [2].
> > > > > > >>>
> > > > > > >>> Please keep the discussion here, in the mailing list.
> > > > > > >>>
> > > > > > >>> Looking forward to your opinions; any feedback is welcome.
> > > > > > >>>
> > > > > > >>> [1]: https://docs.google.com/document/d/1tIa8yN2prWWKJI_fa1u0t6h1r6RJpp56m48pXEyh6iI/edit?usp=sharing
> > > > > > >>>
> > > > > > >>> [2]: https://cwiki.apache.org/confluence/display/FLINK/FLIP-75%3A+Flink+Web+UI+Improvement+Proposal
