[
https://issues.apache.org/jira/browse/HDFS-6982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14126018#comment-14126018
]
Maysam Yabandeh commented on HDFS-6982:
---------------------------------------
bq. I anticipate that one use case is to push the metrics to Ganglia / Nagios
directly. It should not require any aggregation. Am I correct?
I see another proposal here, different both from what the design doc suggests
and from the one that has proved to work in practice. I think we first need to
analyze the pros and cons of this alternative and arrive at a full-fledged
design before we plan around it. So let me try to understand your vision
better by asking some questions about its details:
Let's say that you have a cluster with 1M ops/min. If the aggregator is not
part of the NN, how is such a large volume of events transferred to the
aggregator? Do you envision that JMX creates a metric per command run on the
name node, and that Ganglia reads these over the network and aggregates them?
Do you have some numbers for the volume of data that would need to be
transferred to Ganglia in real time with this approach, and for how much
traffic Ganglia can handle with reasonable overhead?
In the second architecture in the design doc, where the aggregator is placed in
a separate process, it benefits from the existing local log files: it parses
them from memory before the file system flushes them to disk, which allows
very efficient processing of such a large volume of data. The aggregator then
generates a very small set of top users, which is cheap to transfer to the
monitoring tool over the network.
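
To illustrate why the aggregated output is so much smaller than the raw event
stream, here is a minimal sketch of per-operation top-user aggregation. It is
not nntop's actual code: the function name top_users, the (user, op) event
shape, and the sample events are all assumptions made for this example.

```python
from collections import Counter, defaultdict


def top_users(events, n=3):
    """Reduce an iterable of (user, op) events to the top-n users per op.

    Hypothetical helper for illustration only; not taken from the nntop patch.
    """
    per_op = defaultdict(Counter)  # op -> Counter of user -> event count
    for user, op in events:
        per_op[op][user] += 1
    # Only n (user, count) pairs per op leave the aggregator.
    return {op: counts.most_common(n) for op, counts in per_op.items()}


# Made-up sample events standing in for parsed log lines.
events = [
    ("alice", "open"), ("bob", "open"), ("alice", "open"),
    ("carol", "delete"), ("alice", "delete"), ("carol", "delete"),
]
summary = top_users(events, n=2)
print(summary)
```

However many events arrive, the summary holds at most n entries per operation
type, and only that summary has to cross the network to the monitoring tool.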
> nntop: top-like tool for name node users
> -----------------------------------------
>
> Key: HDFS-6982
> URL: https://issues.apache.org/jira/browse/HDFS-6982
> Project: Hadoop HDFS
> Issue Type: New Feature
> Reporter: Maysam Yabandeh
> Assignee: Maysam Yabandeh
> Attachments: HDFS-6982.patch, HDFS-6982.v2.patch, nntop-design-v1.pdf
>
>
> In this jira we motivate the need for nntop, a tool that, similarly to what
> top does in Linux, gives the list of top users of the HDFS name node and
> gives insight into which users are sending the majority of each traffic type
> to the name node. This information turns out to be most critical when the
> name node is under pressure and the HDFS admin needs to know which user is
> hammering the name node and with what kind of requests. Here we present the
> design of nntop, which has been in production at Twitter for the past 10
> months. nntop has proved to have low CPU overhead (< 2% in a cluster of 4K
> nodes) and a low memory footprint (less than a few MB), and to be quite
> efficient on the write path (only two hash lookups to update a metric).