[ 
https://issues.apache.org/jira/browse/HDFS-6982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14169779#comment-14169779
 ] 

Maysam Yabandeh commented on HDFS-6982:
---------------------------------------

Thanks [~andrew.wang] for the detailed review. I will submit a new patch 
soon. In the meantime, let me double-check a couple of points with you.

bq. Since I don't see any modifications to any existing files, I'm also 
wondering how this is exposed to JMX or on the webUI.

You are right. I was not sure where the best place is to integrate nntop with 
the name node. I will pick a place and we can update it later.

bq. There's only a {{getDefaultRollingWindow}} class, no other ways of 
constructing a RollingWindow.

The design doc envisions two interfaces for accessing the top users. One is 
JMX, which requires a rolling window over only one reporting period, say 1 
minute. JMX data, however, is most useful when it is integrated with an 
external graphing tool. To let users with small clusters also benefit from the 
data computed by nntop, we also provide an HTML interface, which has no 
graphing capability. This basic interface unfortunately does not give the 
viewer a sense of *trend*. To compensate for that, the HTML page will show the 
top users over multiple time periods, say 1, 5, and 25 minutes, which is why we 
have multiple rolling window periods in nntop. One of them, however, is used 
for the JMX interface, and it is the one specified by 
{{getDefaultRollingWindow}}.
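
To make the idea of several rolling windows concrete, here is a minimal sketch 
of one counter feeding windows of different lengths. It is only an 
illustration: {{SketchRollingWindow}} and {{SketchMultiPeriodCounter}} are 
hypothetical names, the bucket count of 10 is an arbitrary assumption, and 
none of this is the code in the attached patch.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical bucketed rolling window; NOT the RollingWindow class from the patch. */
class SketchRollingWindow {
    private final long bucketMs;
    private final long[] counts;
    private final long[] bucketStart; // start time of the data each slot holds, -1 if empty

    SketchRollingWindow(long windowMs, int numBuckets) {
        this.bucketMs = windowMs / numBuckets;
        this.counts = new long[numBuckets];
        this.bucketStart = new long[numBuckets];
        Arrays.fill(bucketStart, -1L);
    }

    /** Counts one event at time nowMs, recycling the slot if it holds stale data. */
    synchronized void increment(long nowMs) {
        long start = nowMs - (nowMs % bucketMs);
        int slot = (int) ((nowMs / bucketMs) % counts.length);
        if (bucketStart[slot] != start) {
            bucketStart[slot] = start;
            counts[slot] = 0;
        }
        counts[slot]++;
    }

    /** Sums the buckets that started less than one window ago (accurate to one bucket). */
    synchronized long sum(long nowMs) {
        long windowMs = bucketMs * counts.length;
        long total = 0;
        for (int i = 0; i < counts.length; i++) {
            if (bucketStart[i] >= 0 && nowMs - bucketStart[i] < windowMs) {
                total += counts[i];
            }
        }
        return total;
    }
}

/** Hypothetical holder of one window per reporting period (in milliseconds). */
class SketchMultiPeriodCounter {
    private final Map<Long, SketchRollingWindow> windows = new LinkedHashMap<>();

    SketchMultiPeriodCounter(long... periodsMs) {
        for (long p : periodsMs) {
            windows.put(p, new SketchRollingWindow(p, 10)); // 10 buckets: arbitrary choice
        }
    }

    /** A single event updates every window. */
    void record(long nowMs) {
        for (SketchRollingWindow w : windows.values()) {
            w.increment(nowMs);
        }
    }

    /** Reads one window; periodMs must be one of the periods passed to the constructor. */
    long count(long periodMs, long nowMs) {
        return windows.get(periodMs).sum(nowMs);
    }
}
```

With periods of 1, 5, and 25 minutes, each {{record}} call updates all three 
windows. A JMX-style reader would query only the default period, while an HTML 
page would query all of them.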

About the HTML interface, I excluded it from this patch for two reasons. First, 
I figured it is better to keep this patch as small as possible and work on the 
HTML interface patch in a separate jira. Second, I had previously used the YARN 
HTML utils, and I will have to rewrite that part using the HTML utils that are 
standard in the HDFS project.

bq. How do we configure multiple reporting periods?

Via conf params. I will make sure that the docs reflect that properly.

bq. WEB_PORT and DEFAULT_WEB_PORT seem to be unused

You are right. They are supposed to be used by the HTML interface, but I should 
remove them from this patch.

bq. getCmdTotal and getTopMetricsRecordPrefix static getters are only used in 
TopMetrics, that might be a better home.

They will later be used by the HTML interface as well. The HTML interface will 
show the total operations at the top, followed by the details of each command.

bq. Rather than MIN_2_MS, could we have a long array with the default periods, 
i.e. DEFAULT_REPORTING_PERIODS?

In addition to the previous explanation about multiple reporting periods for 
the HTML view, I should add that the reporting periods are expected to be 
specified in the conf file. I dropped the method that reads them from the conf 
file from this patch since it was invoked only by the HTML interface. But I 
guess I should put it back to avoid confusion.
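
As a rough sketch of what reading the periods from configuration could look 
like, the snippet below parses a comma-separated list of minutes and falls 
back to a default array when the key is absent. Everything here is an 
assumption for illustration: the key name {{nntop.reporting.periods.minutes}} 
is made up, {{java.util.Properties}} stands in for the Hadoop configuration 
object, and the defaults simply reuse the 1, 5, and 25 minute example.

```java
import java.util.Properties;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper; neither the key nor the class exists in the patch. */
class SketchReportingPeriods {
    // Made-up key name, used only for this illustration.
    static final String KEY = "nntop.reporting.periods.minutes";

    // Assumed defaults: 1, 5, and 25 minutes, expressed in milliseconds.
    static final long[] DEFAULT_REPORTING_PERIODS = {60_000L, 300_000L, 1_500_000L};

    /** Returns the configured periods in milliseconds, or the defaults if the key is unset. */
    static long[] fromConf(Properties conf) {
        String raw = conf.getProperty(KEY);
        if (raw == null || raw.trim().isEmpty()) {
            return DEFAULT_REPORTING_PERIODS.clone();
        }
        String[] parts = raw.split(",");
        long[] periods = new long[parts.length];
        for (int i = 0; i < parts.length; i++) {
            long minutes = Long.parseLong(parts[i].trim());
            if (minutes <= 0) {
                throw new IllegalArgumentException("Reporting period must be positive: " + minutes);
            }
            periods[i] = TimeUnit.MINUTES.toMillis(minutes);
        }
        return periods;
    }
}
```

Keeping the defaults in a single array, as suggested in the review, means the 
JMX period can be picked from the same list instead of living in a separate 
constant.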

bq. report, we construct the permStr, but don't actually use it.

You are right. I can actually drop src, dst, and also status. At the beginning, 
the vision for nntop was to also report hot directories, etc., and that is why 
we kept the full details in the report method. But I guess we can always put 
such details back if at some point those visions were to be pursued.

bq. report, I don't think we need the catch for Throwable t, no checked 
exceptions are being thrown?

The idea was that any unexpected problem caused by a programming bug in nntop 
should not crash the name node.
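
The intent can be sketched as a thin guard around the metrics call. This is an 
illustration only: {{SketchGuardedReporter}} and its nested {{Metrics}} 
interface are hypothetical names, not classes from the patch. Note that 
catching {{Throwable}} also swallows {{Error}}s such as {{OutOfMemoryError}}, 
which is the trade-off under discussion here.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical wrapper; shows the guard pattern, not the patch's report method. */
class SketchGuardedReporter {
    /** Stand-in for whatever actually updates the top-user metrics. */
    interface Metrics {
        void report(String user, String cmd);
    }

    private final Metrics delegate;
    private final AtomicLong failures = new AtomicLong();

    SketchGuardedReporter(Metrics delegate) {
        this.delegate = delegate;
    }

    /** Never propagates a failure from the metrics code to the caller. */
    void report(String user, String cmd) {
        try {
            delegate.report(user, cmd);
        } catch (Throwable t) {
            // A bug in the metrics path is counted and dropped so the caller keeps running.
            // This also hides Errors; catching Exception would be the narrower choice.
            failures.incrementAndGet();
        }
    }

    long failureCount() {
        return failures.get();
    }
}
```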

bq.  TopUtil: This stuff isn't shared much, seems like we could just move 
things to where they're used

TopUtil was much fatter when it also included the HTML view util functions. The 
HTML view will also be a user of TopUtil.

bq. TopMetricsCollector: Is this used?
 
Yes, by the HTML view. I should drop it from this patch.

> nntop: top-like tool for name node users
> -----------------------------------------
>
>                 Key: HDFS-6982
>                 URL: https://issues.apache.org/jira/browse/HDFS-6982
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Maysam Yabandeh
>            Assignee: Maysam Yabandeh
>         Attachments: HDFS-6982.patch, HDFS-6982.v2.patch, nntop-design-v1.pdf
>
>
> In this jira we motivate the need for nntop, a tool that, similarly to what 
> top does in Linux, gives the list of top users of the HDFS name node and 
> gives insight into which users are sending the majority of each traffic type 
> to the name node. This information turns out to be most critical when the 
> name node is under pressure and the HDFS admin needs to know which user is 
> hammering the name node and with what kind of requests. Here we present the 
> design of nntop, which has been in production at Twitter for the past 10 
> months. nntop proved to have low cpu overhead (< 2% in a cluster of 4K 
> nodes), a low memory footprint (less than a few MB), and an efficient write 
> path (only two hash lookups for updating a metric).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
