Thanks for updating the FLIP, Yadong.

What is the difference between managedMemory and managedMemoryTotal, and
between networkMemory and networkMemoryTotal, in the REST response? If they
are duplicates, then we might be able to remove one of each pair.

Apart from that, the proposal looks good to me.

Also pulling in Andrey to hear his opinion about the representation of the
memory components.
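
For reference while looking at the representation: the "JVM limit" value
discussed further down in this thread is described as "non-heap max metric
minus metaspace configuration". A minimal sketch of that arithmetic could look
like the following. The class name, the helper method and the 256 MiB
metaspace size are assumptions made up for illustration and are not part of
the FLIP; the only real API used is java.lang.management.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryUsage;

public class JvmLimitSketch {

    // Hypothetical helper (not from the FLIP): "non-heap max metric minus
    // metaspace configuration". MemoryUsage.getMax() returns -1 when the
    // maximum is undefined, so that case is passed through as -1.
    public static long jvmLimit(long nonHeapMaxBytes, long metaspaceConfigBytes) {
        if (nonHeapMaxBytes < 0) {
            return -1L;
        }
        return nonHeapMaxBytes - metaspaceConfigBytes;
    }

    public static void main(String[] args) {
        // Non-heap usage as reported by the JVM itself.
        MemoryUsage nonHeap =
                ManagementFactory.getMemoryMXBean().getNonHeapMemoryUsage();

        // Assumed metaspace configuration, for illustration only.
        long assumedMetaspaceBytes = 256L * 1024 * 1024;

        System.out.println("JVM limit (bytes): "
                + jvmLimit(nonHeap.getMax(), assumedMetaspaceBytes));
    }
}
```

On JVMs where no non-heap maximum is set, the sketch reports -1, which is one
reason the displayed value would need an explanation for users.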

Cheers,
Till

On Thu, Mar 19, 2020 at 11:37 AM Yadong Xie <vthink...@gmail.com> wrote:

> Hi all
>
> I have updated the design of the metric page and FLIP doc, please let me
> know what you think about it
>
> FLIP-102:
>
> https://cwiki.apache.org/confluence/display/FLINK/FLIP-102%3A+Add+More+Metrics+to+TaskManager
> POC web:
>
> http://101.132.122.69:8081/web/#/task-manager/8e1f1beada3859ee8e46d0960bb1da18/metrics
>
> Till Rohrmann <trohrm...@apache.org> wrote on Thu, Feb 27, 2020 at 10:27 PM:
>
> > Thinking a bit more about whether to report the aggregated memory
> > statistics or the individual slot statistics, I think reporting on a
> > per-slot basis won't work nicely together with FLIP-56 (dynamic slot
> > allocation). The problem is that with FLIP-56, we will no longer have
> > dedicated slots. The number of slots might change over the lifetime of a
> > TaskExecutor. Hence, it won't be easy to generate a metric path for every
> > slot, since the slots are also ephemeral. So maybe the more general and
> > easier solution would be to report the overall memory usage of a
> > TaskExecutor, even though it means doing some aggregation on the
> > TaskExecutor.
> >
> > Concerning the JVM limit: Isn't it mainly the code cache? If we display
> > this value, then we should explain what exactly it means. I fear that
> > most users won't understand what JVM limit actually means.
> >
> > Cheers,
> > Till
> >
> > On Wed, Feb 26, 2020 at 11:15 AM Yadong Xie <vthink...@gmail.com> wrote:
> >
> > > Hi Till
> > >
> > > Thanks a lot for your response
> > >
> > > > 2. I'm not entirely sure whether I would split the memory ...
> > >
> > > Splitting the memory display comes from the 'ancient' design of the web
> > > UI; it is OK for me to change it to follow the
> > > total/heap/managed/network/direct/jvm overhead/mapped sequence
> > >
> > > > 3. Displaying the memory configurations...
> > >
> > > I agree with you that it is not a very nice way, but the hierarchical
> > > relationship of the configurations is too complex and hard to display
> > > in other ways (I have tried)
> > >
> > > if anyone has a better idea, please feel free to help me
> > >
> > >
> > > > 4. What does JVM limit mean in Non-heap.JVM-Overhead?
> > >
> > > JVM limit is "non-heap max metric minus metaspace configuration", as
> > > @Xintong Song <tonysong...@gmail.com> replied in this mail thread
> > >
> > >
> > > Till Rohrmann <trohrm...@apache.org> wrote on Tue, Feb 25, 2020 at 6:58 PM:
> > >
> > > > Thanks for creating this FLIP, Yadong. I think your proposal makes it
> > > > much easier for the user to understand what's happening on Flink
> > > > TaskManagers.
> > > >
> > > > I have some comments:
> > > >
> > > > 1. Some of the newly introduced metrics involve computations on the
> > > > TaskManager. I would like to avoid additional computations introduced
> > > > by metrics as much as possible because metrics should not affect the
> > > > system. In particular, total memory sizes which are configured should
> > > > not be derived computationally (getManagedMemoryTotal,
> > > > getTotalMemorySize). For the currently available memory sizes (e.g.
> > > > getManagedMemoryUsed), one could think about reporting them on a
> > > > per-slot basis and doing the aggregation on the client side. Of
> > > > course, this would increase the size of the response payload.
> > > >
> > > > 2. I'm not entirely sure whether I would split the memory display
> > > > into JVM memory and non-JVM memory as you've done in the POC. From a
> > > > user's perspective, one could start by displaying the total process
> > > > memory. The next three most important metrics are the heap, managed
> > > > memory and network buffer usage, I guess. If one is interested in more
> > > > details, one could then display the remaining direct memory usage, the
> > > > JVM overhead (I'm not sure whether I would call this non-heap though)
> > > > and the mapped memory.
> > > >
> > > > 3. Displaying the memory configurations in three nested boxes does
> > > > not look so nice to me. I'm not sure how else one could display it,
> > > > though.
> > > >
> > > > 4. What does JVM limit mean in Non-heap.JVM-Overhead?
> > > >
> > > > Cheers,
> > > > Till
> > > >
> > > > On Tue, Feb 25, 2020 at 8:19 AM Yadong Xie <vthink...@gmail.com> wrote:
> > > >
> > > > > Hi Xintong
> > > > > thanks for your advice, the POC web and the FLIP doc have been
> > > > > updated now
> > > > > here is the new link:
> > > > >
> > > > > http://101.132.122.69:8081/web/#/task-manager/7e7cf0293645c8537caab915c829aa73/metrics
> > > > >
> > > > >
> > > > > Xintong Song <tonysong...@gmail.com> wrote on Fri, Feb 21, 2020 at 12:00 PM:
> > > > >
> > > > > > >
> > > > > > > 1. Should the managed memory be part of direct memory?
> > > > > > >
> > > > > > The answer is no. Managed memory is currently allocated by accessing
> > > > > > a private field of Unsafe. It is not accounted for in the JVM's
> > > > > > direct memory limit and corresponding metrics. To that end, it is
> > > > > > equivalent to native memory.
> > > > > >
> > > > > >
> > > > > > > 2. Should the shuffle memory also be part of the managed memory?
> > > > > >
> > > > > > I don't think so. Shuffle (Network) memory is allocated with direct
> > > > > > buffers, and accounted for in the JVM's direct memory limit and
> > > > > > corresponding metrics. Moreover, the FLIP-49 memory model exposes
> > > > > > network memory and managed memory as two independent components of
> > > > > > the overall memory footprint.
> > > > > >
> > > > > >
> > > > > > Thank you~
> > > > > >
> > > > > > Xintong Song
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Fri, Feb 21, 2020 at 11:45 AM Kurt Young <ykt...@gmail.com> wrote:
> > > > > >
> > > > > > > Some questions related to "managed memory":
> > > > > > >
> > > > > > > 1. Should the managed memory be part of direct memory?
> > > > > > > 2. Should the shuffle memory also be part of the managed memory?
> > > > > > >
> > > > > > > Best,
> > > > > > > Kurt
> > > > > > >
> > > > > > >
> > > > > > > On Fri, Feb 21, 2020 at 10:41 AM Xintong Song <tonysong...@gmail.com> wrote:
> > > > > > >
> > > > > > > > Thanks for driving this FLIP, Yadong.
> > > > > > > >
> > > > > > > > +1 (non-binding) for the FLIP in general. I think this really
> > > > > > > > helps our users to understand and use the new FLIP-49 memory
> > > > > > > > configuration.
> > > > > > > >
> > > > > > > > I have a few minor comments.
> > > > > > > > - There's a frame "Other" in the frame "Non-Heap", besides "JVM
> > > > > > > > Overhead" and "JVM Metaspace". IIUC, the purpose of this is to
> > > > > > > > explain the mismatch between the metric "non-heap maximum" and
> > > > > > > > the sum of the configurations "JVM metaspace" & "JVM Overhead".
> > > > > > > > However, from the perspective of FLIP-49, JVM Overhead accounts
> > > > > > > > for all the JVM non-heap memory usages except for metaspace. The
> > > > > > > > metrics do not match the configuration because we did not set a
> > > > > > > > JVM parameter for "max non-heap memory" (actually I'm not sure
> > > > > > > > whether it can be specified in Java 8). The current UI might
> > > > > > > > confuse people, making them think there are other non-heap
> > > > > > > > memory usages not accounted for by the configurations.
> > > > > > > > Therefore, I would suggest removing the "Other" frame, but
> > > > > > > > adding another frame inside "JVM Overhead", besides
> > > > > > > > "Configuration", with "JVM limit" as the title and "non-heap max
> > > > > > > > metric minus metaspace configuration" as the value.
> > > > > > > >
> > > > > > > > - In the final release, we have changed "shuffle memory" to
> > > > > > > > "network memory" because the latter is easier to understand for
> > > > > > > > users. I think we should update it in this FLIP as well.
> > > > > > > >
> > > > > > > > - There's a typo "Directed" (should be "Direct") at the direct
> > > > > > > > memory metric.
> > > > > > > >
> > > > > > > > Thank you~
> > > > > > > >
> > > > > > > > Xintong Song
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On Thu, Feb 20, 2020 at 5:52 PM Yadong Xie <vthink...@gmail.com> wrote:
> > > > > > > >
> > > > > > > > > Hi all
> > > > > > > > >
> > > > > > > > > I want to start the vote for FLIP-102, which proposes to add
> > > > > > > > > more metrics to the task manager in the web UI.
> > > > > > > > >
> > > > > > > > > To help everyone better understand the proposal, we spent some
> > > > > > > > > effort on making an online POC
> > > > > > > > >
> > > > > > > > > previous web:
> > > > > > > > >
> > > > > > > > > http://101.132.122.69:8081/#/task-manager/6df6c5f37b2bff125dbc3a7388128559/metrics
> > > > > > > > > POC web:
> > > > > > > > >
> > > > > > > > > http://101.132.122.69:8081/web/#/task-manager/6df6c5f37b2bff125dbc3a7388128559/metrics
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > The vote will last for at least 72 hours, following the
> > > > > > > > > consensus voting process.
> > > > > > > > >
> > > > > > > > > FLIP wiki:
> > > > > > > > >
> > > > > > > > > https://cwiki.apache.org/confluence/display/FLINK/FLIP-102%3A+Add+More+Metrics+to+TaskManager
> > > > > > > > >
> > > > > > > > > Discussion thread:
> > > > > > > > >
> > > > > > > > > http://apache-flink-mailing-list-archive.1008284.n3.nabble.com/DISCUSS-FLIP-75-Flink-Web-UI-Improvement-Proposal-td33540.html
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > >
> > > > > > > > > Yadong