[ 
https://issues.apache.org/jira/browse/HDFS-6982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14126018#comment-14126018
 ] 

Maysam Yabandeh commented on HDFS-6982:
---------------------------------------

bq.  I anticipate that one use case is to push the metrics to ganglia / nagios 
directly. It should not require any aggregation. Am I correct?

I see another proposal here, different both from what the design doc suggests 
and from the one that proved to work in practice. I think we first need to 
analyze the pros and cons of this alternative and reach a full-fledged 
design before we plan around it. So, let me try to understand your vision 
better by asking some questions about its details:

Let's say that you have a cluster with 1M ops/min. If the aggregator is not 
part of the NN, how would such a large volume of events be transferred to the 
aggregator? Do you envision that JMX creates a metric per command run on the 
name node, and Ganglia reads it over the network and aggregates it? Do you have 
some numbers on the volume of data that would need to be transferred to Ganglia 
in real time with this approach, and on how much traffic Ganglia can handle 
with reasonable overhead?
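
To make the scale concrete, here is a rough back-of-the-envelope sketch of what 
shipping every event would mean at the one-million-operations-per-minute figure 
above. The per-event size (BYTES_PER_EVENT) is purely an assumed, illustrative 
value, not a measurement, and the function names are hypothetical:

```python
# Back-of-the-envelope estimate of the raw event traffic an external
# aggregator would have to receive if every operation were shipped to it.
OPS_PER_MINUTE = 1_000_000   # the rate used in the question above
BYTES_PER_EVENT = 200        # ASSUMPTION: user, command, timestamp, etc.


def events_per_second(ops_per_minute: int) -> float:
    """Convert an operations-per-minute rate to events per second."""
    return ops_per_minute / 60.0


def megabytes_per_second(ops_per_minute: int, bytes_per_event: int) -> float:
    """Raw payload rate in MB/s, ignoring protocol and framing overhead."""
    return events_per_second(ops_per_minute) * bytes_per_event / 1_000_000


print(f"{events_per_second(OPS_PER_MINUTE):,.0f} events/s")
print(f"{megabytes_per_second(OPS_PER_MINUTE, BYTES_PER_EVENT):.2f} MB/s")
```

Under that assumed event size this works out to roughly 16,667 events/s and 
about 3.33 MB/s of raw payload; the real numbers depend on the actual event 
encoding and on the transport used.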

In the second architecture in the design doc, where the aggregator is placed in 
a separate process, it benefits from the existing local log files and parses 
them from memory before the file system flushes them to disk, which allows 
very efficient processing of such a large volume of data. The aggregator then 
generates a very small set of top users, which is cheap to transfer to 
the monitoring tool over the network.
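
As a minimal sketch of that idea (not the nntop implementation), the following 
reduces a stream of log lines to a small per-command list of top users. The 
line format, the regular expression, and the name top_users are assumptions 
made for illustration only:

```python
import re
from collections import Counter, defaultdict

# ASSUMPTION: a simplified audit-log-like line with "ugi=<user>" and
# "cmd=<command>" fields; a real log format may differ.
LINE_RE = re.compile(r"ugi=(?P<user>\S+).*?cmd=(?P<cmd>\S+)")


def top_users(lines, n=3):
    """Return {cmd: [(user, count), ...]} with the n busiest users per command."""
    counts = defaultdict(Counter)
    for line in lines:
        match = LINE_RE.search(line)
        if match:  # silently skip lines that do not match the assumed format
            counts[match.group("cmd")][match.group("user")] += 1
    return {cmd: users.most_common(n) for cmd, users in counts.items()}


sample = [
    "ugi=alice ip=/10.0.0.1 cmd=open src=/a",
    "ugi=bob ip=/10.0.0.2 cmd=open src=/b",
    "ugi=alice ip=/10.0.0.1 cmd=open src=/c",
    "ugi=carol ip=/10.0.0.3 cmd=delete src=/d",
]
print(top_users(sample, n=1))
# prints {'open': [('alice', 2)], 'delete': [('carol', 1)]}
```

Only the small result dictionary would need to cross the network, rather than 
one record per operation.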

> nntop: top-like tool for name node users
> -----------------------------------------
>
>                 Key: HDFS-6982
>                 URL: https://issues.apache.org/jira/browse/HDFS-6982
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Maysam Yabandeh
>            Assignee: Maysam Yabandeh
>         Attachments: HDFS-6982.patch, HDFS-6982.v2.patch, nntop-design-v1.pdf
>
>
> In this jira we motivate the need for nntop, a tool that, similarly to what 
> top does in Linux, lists the top users of the HDFS name node and gives 
> insight into which users are sending the majority of each traffic type to 
> the name node. This information turns out to be most critical when the 
> name node is under pressure and the HDFS admin needs to know which user is 
> hammering the name node and with what kind of requests. Here we present the 
> design of nntop, which has been in production at Twitter for the past 10 
> months. nntop has proved to have low CPU overhead (< 2% in a cluster of 4K 
> nodes), a low memory footprint (less than a few MB), and an efficient 
> write path (only two hash lookups to update a metric).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
