[ 
https://issues.apache.org/jira/browse/AMBARI-16946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim updated AMBARI-16946:
----------------------------------
    Description: 
TimelineMetricsCache and Storm disagree on what identifies a metric data point: 
TimelineMetricsCache treats "metric name + timestamp" as unique, but Storm does 
not guarantee that.

For example, assume that bolt B has tasks T1 and T2, and that B has registered 
metric M1. The metrics sink can receive (T1, M1) and (T2, M1) with the same 
timestamp TS1 (the timestamp in TaskInfo, not the current time), and the one 
received later is discarded by TimelineMetricsCache.

To make each Storm metric point unique, we should use "topology name + 
component name + task id + metric name" as the metric name, so that "metric 
name + timestamp" is unique.
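
A minimal sketch of the naming idea above, not the actual Ambari or Storm 
code. The class QualifiedMetricName, the "." separator, and the toy 
first-wins set that stands in for TimelineMetricsCache are all assumptions 
made for illustration.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper for illustration; not part of Ambari or Storm.
final class QualifiedMetricName {
    private QualifiedMetricName() {}

    // Joins the four parts named in the description.
    // The "." separator is an assumption.
    static String of(String topologyName, String componentName,
                     int taskId, String metricName) {
        return topologyName + "." + componentName + "." + taskId + "." + metricName;
    }

    // Toy stand-in for a cache keyed by "metric name + timestamp":
    // a point whose key was already seen is dropped.
    // Returns how many of the given points survive.
    static int countKept(String[] metricNames, long timestamp) {
        Set<String> seenKeys = new HashSet<>();
        for (String name : metricNames) {
            seenKeys.add(name + "@" + timestamp);
        }
        return seenKeys.size();
    }
}
```

With the bare name M1, two points from T1 and T2 at the same timestamp 
collapse into one key. With the qualified names, both keys differ and both 
points are kept.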

There are other issues I would like to address, too.

- Currently, the hostname is set to that of the machine running the metrics 
sink. Since TaskInfo has the hostname of the machine running the task, we 
should use that instead.
- The TaskInfo timestamp is in seconds, but Storm Metrics Sink treats it as 
milliseconds, which produces wrong timestamps and breaks cache eviction. It 
should be multiplied by 1000.
- 'component name' is not unique across the cluster, so it is not suitable as 
the app id. 'topology name' is unique, so the app id should be the topology 
name.
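
The three points above could be sketched roughly as follows. This is an 
illustration under assumptions, not the actual sink code: the class 
StormDataPointMapping and its method names are hypothetical, and the 
TaskInfo values are passed in as plain parameters.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical mapping helper for illustration; not part of Ambari or Storm.
final class StormDataPointMapping {
    private StormDataPointMapping() {}

    // The TaskInfo timestamp is in seconds; the sink works in milliseconds.
    static long toMillis(long taskInfoTimestampSeconds) {
        return TimeUnit.SECONDS.toMillis(taskInfoTimestampSeconds);
    }

    // Report the host that ran the task, not the host that runs the sink.
    static String hostname(String taskHostname) {
        return taskHostname;
    }

    // The topology name is unique across the cluster, so use it as the app id.
    static String appId(String topologyName) {
        return topologyName;
    }
}
```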

  was:
TimelineMetricsCache and Storm disagree on what identifies a metric data point: 
TimelineMetricsCache treats "metric name + timestamp" as unique, but Storm does 
not guarantee that.

For example, assume that bolt B has tasks T1 and T2, and that B has registered 
metric M1. The metrics sink can receive (T1, M1) and (T2, M1) with the same 
timestamp TS1 (the timestamp in TaskInfo, not the current time), and the one 
received later is discarded by TimelineMetricsCache.

To make each Storm metric point unique, we should use "topology name + 
component name + task id + metric name" as the metric name, so that "metric 
name + timestamp" is unique.

There are other issues I would like to address, too.

- Currently, the hostname is set to that of the machine running the metrics 
sink. Since TaskInfo has the hostname of the machine running the task, we 
should use that instead.
- The TaskInfo timestamp is in seconds, but Storm Metrics Sink treats it as 
milliseconds, which produces wrong timestamps and breaks cache eviction. It 
should be multiplied by 1000.


> Storm Metrics Sink has high chance to discard some datapoints
> -------------------------------------------------------------
>
>                 Key: AMBARI-16946
>                 URL: https://issues.apache.org/jira/browse/AMBARI-16946
>             Project: Ambari
>          Issue Type: Bug
>          Components: ambari-metrics
>            Reporter: Jungtaek Lim
>
> TimelineMetricsCache and Storm disagree on what identifies a metric data 
> point: TimelineMetricsCache treats "metric name + timestamp" as unique, but 
> Storm does not guarantee that.
> For example, assume that bolt B has tasks T1 and T2, and that B has 
> registered metric M1. The metrics sink can receive (T1, M1) and (T2, M1) with 
> the same timestamp TS1 (the timestamp in TaskInfo, not the current time), and 
> the one received later is discarded by TimelineMetricsCache.
> To make each Storm metric point unique, we should use "topology name + 
> component name + task id + metric name" as the metric name, so that "metric 
> name + timestamp" is unique.
> There are other issues I would like to address, too.
> - Currently, the hostname is set to that of the machine running the metrics 
> sink. Since TaskInfo has the hostname of the machine running the task, we 
> should use that instead.
> - The TaskInfo timestamp is in seconds, but Storm Metrics Sink treats it as 
> milliseconds, which produces wrong timestamps and breaks cache eviction. It 
> should be multiplied by 1000.
> - 'component name' is not unique across the cluster, so it is not suitable as 
> the app id. 'topology name' is unique, so the app id should be the topology 
> name.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
