[ 
https://issues.apache.org/jira/browse/HDFS-6982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14208942#comment-14208942
 ] 

Haohui Mai commented on HDFS-6982:
----------------------------------

Please correct me if I'm wrong. Let's say we have a rolling window of 3, and 
the current observation {{o}} is 

{noformat}
o = [o1, o2, o3];
{noformat}

Consider the following interleaving.

1. The user measures the observation. He gets {{(o1 + o2 + o3) / 3}}.
2. The observation {{o1}} is stale, thus it is reset to zero by {{safeReset}}.
3. Right before {{bucket.inc()}} is called, the user makes another measurement, 
now he gets {{(0 + o2 + o3) / 3}}.
4. {{o1}} is updated.

That way the user gets an incorrect measurement in step 3.
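
The interleaving above can be replayed deterministically in a single thread. 
The sketch below is only an illustration: {{RollingWindowSketch}}, 
{{resetStale}} and {{inc}} are hypothetical stand-ins, not the classes in the 
patch, and the bucket values 30, 60, 90 are made up. The writer's update is 
split into its two steps so that a read can fall in between them.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for a 3-bucket rolling window; not the code from the patch.
class RollingWindowSketch {
  final AtomicLong[] buckets = new AtomicLong[3];

  RollingWindowSketch(long o1, long o2, long o3) {
    buckets[0] = new AtomicLong(o1);
    buckets[1] = new AtomicLong(o2);
    buckets[2] = new AtomicLong(o3);
  }

  // What a reader computes: the plain average over all buckets.
  double average() {
    long sum = 0;
    for (AtomicLong b : buckets) {
      sum += b.get();
    }
    return (double) sum / buckets.length;
  }

  // The writer's update, split in two so that a read can happen in between.
  void resetStale(int i) { buckets[i].set(0); }
  void inc(int i, long delta) { buckets[i].addAndGet(delta); }

  public static void main(String[] args) {
    RollingWindowSketch w = new RollingWindowSketch(30, 60, 90);
    System.out.println("step 1: " + w.average());        // (o1 + o2 + o3) / 3
    w.resetStale(0);                                      // step 2: o1 reset to zero
    System.out.println("step 3: " + w.average());        // (0 + o2 + o3) / 3
    w.inc(0, 30);                                         // step 4: o1 updated
    System.out.println("after step 4: " + w.average());
  }
}
```

The read in step 3 is lower than the reads before and after it even though 
no traffic disappeared, which is the inconsistency described above.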

My feeling is that it is more robust to compute a moving average than to 
reset the observation on every tick. The core functionality can be 
implemented roughly as follows:

{code}
// Assumes ALPHA is a double decay factor with 0 < ALPHA < 1.
Map<String, Long> observation = new ConcurrentHashMap<String, Long>();

synchronized void bulkUpdate(Map<String, Long> updates) {
  // Decay the previous value of every updated key and add the new sample.
  for (Map.Entry<String, Long> e : updates.entrySet()) {
    Long old = observation.get(e.getKey());
    long v = old != null ? old : 0;
    observation.put(e.getKey(), (long) (ALPHA * v) + e.getValue());
  }
  // Decay keys that got no update; drop them once they reach zero.
  for (Map.Entry<String, Long> e : observation.entrySet()) {
    if (!updates.containsKey(e.getKey())) {
      long v = (long) (ALPHA * e.getValue());
      if (v == 0) { observation.remove(e.getKey()); }
      else { observation.put(e.getKey(), v); }
    }
  }
}
synchronized Map<String, Long> observe() { return observation; }
{code}

Assuming that the size of {{updates}} is bounded (which should be the case in 
nntop), it should be fairly efficient. Thoughts?
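
To make the decay behaviour concrete, here is a self-contained sketch that 
wraps the idea in a class and traces a few ticks. The class name 
{{DecayedCounters}}, the value {{ALPHA = 0.5}}, the user names and the choice 
to return a copy from {{observe()}} are all assumptions for illustration, 
not part of the patch.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of an exponentially decayed per-key counter; names and ALPHA are assumptions.
class DecayedCounters {
  static final double ALPHA = 0.5;  // assumed decay factor, 0 < ALPHA < 1
  private final Map<String, Long> observation =
      new ConcurrentHashMap<String, Long>();

  // Called once per tick with the counts gathered during that tick.
  synchronized void bulkUpdate(Map<String, Long> updates) {
    for (Map.Entry<String, Long> e : updates.entrySet()) {
      Long old = observation.get(e.getKey());
      long v = (old != null) ? old : 0L;
      observation.put(e.getKey(), (long) (ALPHA * v) + e.getValue());
    }
    // ConcurrentHashMap iterators tolerate modification during iteration.
    for (Map.Entry<String, Long> e : observation.entrySet()) {
      if (!updates.containsKey(e.getKey())) {
        long v = (long) (ALPHA * e.getValue());
        if (v == 0) {
          observation.remove(e.getKey());
        } else {
          observation.put(e.getKey(), v);
        }
      }
    }
  }

  // Returns a copy, so a caller never sees a tick that is only half applied.
  synchronized Map<String, Long> observe() {
    return new HashMap<String, Long>(observation);
  }

  public static void main(String[] args) {
    DecayedCounters c = new DecayedCounters();
    Map<String, Long> tick = new HashMap<String, Long>();
    tick.put("alice", 8L);
    c.bulkUpdate(tick);                 // alice is active in this tick
    tick.clear();
    tick.put("bob", 4L);
    c.bulkUpdate(tick);                 // alice decays, bob appears
    System.out.println(c.observe());
    tick.clear();
    for (int i = 0; i < 3; i++) {
      c.bulkUpdate(tick);               // idle ticks: every key keeps decaying
    }
    System.out.println(c.observe());    // idle keys are eventually dropped
  }
}
```

Because a reader only ever sees the state between two calls to 
{{bulkUpdate}}, there is no window in which a bucket has been reset but not 
yet refilled.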

> nntop: top-like tool for name node users
> -----------------------------------------
>
>                 Key: HDFS-6982
>                 URL: https://issues.apache.org/jira/browse/HDFS-6982
>             Project: Hadoop HDFS
>          Issue Type: New Feature
>            Reporter: Maysam Yabandeh
>            Assignee: Maysam Yabandeh
>         Attachments: HDFS-6982.patch, HDFS-6982.v2.patch, HDFS-6982.v3.patch, 
> HDFS-6982.v4.patch, HDFS-6982.v5.patch, HDFS-6982.v6.patch, 
> nntop-design-v1.pdf
>
>
> In this jira we motivate the need for nntop, a tool that, similarly to what 
> top does in Linux, gives the list of top users of the HDFS name node and 
> gives insight into which users are sending the majority of each traffic type 
> to the name node. This information turns out to be most critical when the 
> name node is under pressure and the HDFS admin needs to know which user is 
> hammering the name node and with what kind of requests. Here we present the 
> design of nntop, which has been in production at Twitter for the past 10 
> months. nntop has proved to have low CPU overhead (< 2% in a cluster of 4K 
> nodes), a low memory footprint (less than a few MB), and an efficient 
> write path (only two hash lookups to update a metric).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
