[ 
https://issues.apache.org/jira/browse/HDFS-6982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14130474#comment-14130474
 ] 

Andrew Wang commented on HDFS-6982:
-----------------------------------

Sorry for coming late to this discussion. Big +1 from me too, thanks for 
working on this [~maysamyabandeh]!

[~wheat9], do you have any specific concerns about the in-process architecture? 
Some of these reasons may already have been discussed above, but I see a lot 
of advantages to having it directly in the NN:

* Useful without additional services. Ganglia and similar tools can still 
collect the exposed top metrics.
* More efficient than tailing the audit log since the aggregation is happening 
in the NN.
* Better cross-Hadoop-version story. The audit log is not a well-defined, 
compatible, machine-friendly format.

Maybe Maysam can comment, but they might have run the out-of-process 
architecture in production as an expediency, rather than because it is a 
superior design.

I'll also note that we already have a rolling-window-ish metric, 
MutableQuantiles, so there's some precedent for it. Based on my quick look at 
the patch, I don't see any overheads in RollingWindow that are significantly 
greater, and we can always gate it behind a configuration option.
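
To make the rolling-window idea concrete, here is a minimal sketch of a 
bucketed per-(operation, user) counter. It is an illustration only, written in 
Python for brevity. It is not the RollingWindow class from the patch, and every 
name in it (RollingWindowCounter, TopMetrics, report, top_users) is 
hypothetical. It assumes the window is approximated by a fixed number of 
equal-width buckets and that the caller passes in the current time.

```python
from collections import defaultdict


class RollingWindowCounter:
    """Approximate event count over the last `window_secs` seconds.

    Hypothetical illustration; not the RollingWindow class from the patch.
    """

    def __init__(self, window_secs=60, num_buckets=6):
        self.bucket_secs = window_secs / num_buckets
        self.num_buckets = num_buckets
        self.counts = [0] * num_buckets
        # Absolute bucket index last written to each slot (-1 = never written).
        self.stamps = [-1] * num_buckets

    def _bucket(self, now):
        return int(now // self.bucket_secs)

    def incr(self, now, delta=1):
        b = self._bucket(now)
        slot = b % self.num_buckets
        if self.stamps[slot] != b:
            # The slot still holds data from an older lap around the ring.
            self.counts[slot] = 0
            self.stamps[slot] = b
        self.counts[slot] += delta

    def total(self, now):
        b = self._bucket(now)
        oldest = b - self.num_buckets + 1
        # Sum only the slots whose bucket index falls inside the window.
        return sum(c for c, s in zip(self.counts, self.stamps)
                   if oldest <= s <= b)


class TopMetrics:
    """Tracks per-operation, per-user request counts in-process."""

    def __init__(self, window_secs=60, num_buckets=6):
        self.window_secs = window_secs
        self.num_buckets = num_buckets
        # op -> user -> RollingWindowCounter
        self.by_op = defaultdict(dict)

    def report(self, op, user, now):
        users = self.by_op[op]       # hash lookup 1: operation
        counter = users.get(user)    # hash lookup 2: user
        if counter is None:
            # First request from this user for this op: one extra insert.
            counter = RollingWindowCounter(self.window_secs, self.num_buckets)
            users[user] = counter
        counter.incr(now)

    def top_users(self, op, now, n=3):
        totals = [(u, c.total(now)) for u, c in self.by_op.get(op, {}).items()]
        totals = [t for t in totals if t[1] > 0]
        # Highest count first; ties broken by user name for a stable order.
        return sorted(totals, key=lambda t: (-t[1], t[0]))[:n]
```

In this sketch the steady-state update costs two hash lookups plus an array 
increment, and memory is bounded by (operations x users x buckets). The window 
is approximate: it covers the current bucket plus the preceding 
num_buckets - 1 buckets, so counts age out one bucket at a time.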

> nntop: top-like tool for name node users
> -----------------------------------------
>
>                 Key: HDFS-6982
>                 URL: https://issues.apache.org/jira/browse/HDFS-6982
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Maysam Yabandeh
>            Assignee: Maysam Yabandeh
>         Attachments: HDFS-6982.patch, HDFS-6982.v2.patch, nntop-design-v1.pdf
>
>
> In this jira we motivate the need for nntop, a tool that, similarly to what 
> top does in Linux, lists the top users of the HDFS name node and shows which 
> users are sending the majority of each traffic type to the name node. This 
> information is most critical when the name node is under pressure and the 
> HDFS admin needs to know which user is hammering the name node and with what 
> kind of requests. Here we present the design of nntop, which has been in 
> production at Twitter for the past 10 months. nntop has proved to have low 
> CPU overhead (< 2% in a cluster of 4K nodes) and a low memory footprint (less 
> than a few MB), and to be quite efficient on the write path (only two hash 
> lookups to update a metric).


