[
https://issues.apache.org/jira/browse/SAMZA-349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14125040#comment-14125040
]
Martin Kleppmann commented on SAMZA-349:
----------------------------------------
bq. Is there a way to bypass this problem? I feel the reservoir has to keep all
the events in the time interval. What does the "downsampling" mean here?
If you can accept an approximate metric (which, in practice, is usually
fine), then you don't need to keep every single timing within your
aggregation period. It's sufficient to keep a randomly selected sample of
bounded size, for example up to 1,000 values. That will give you a good
estimate of the metric while using much less memory. A standard algorithm for
this is [reservoir sampling|http://en.wikipedia.org/wiki/Reservoir_sampling].
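
To make that concrete, here is a minimal sketch of the basic reservoir sampling algorithm (often called Algorithm R). The class and method names ({{Reservoir}}, {{update}}, {{snapshot}}) are hypothetical and not part of Samza's metrics API; it is an illustration under those assumptions rather than a proposed patch.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical class for illustration only; not part of Samza.
class Reservoir {
    private final long[] samples;   // fixed-size sample, e.g. 1,000 slots
    private final Random random;
    private long seen = 0;          // total number of values offered so far

    Reservoir(int capacity, Random random) {
        this.samples = new long[capacity];
        this.random = random;
    }

    // Algorithm R: after n updates, each of the n values has the same
    // probability (capacity / n) of being in the sample.
    void update(long value) {
        seen++;
        if (seen <= samples.length) {
            // Still filling up: keep everything.
            samples[(int) (seen - 1)] = value;
        } else {
            // Pick a slot uniformly in [0, seen); replace only if it
            // falls inside the reservoir.
            long slot = (long) (random.nextDouble() * seen);
            if (slot < samples.length) {
                samples[(int) slot] = value;
            }
        }
    }

    // Number of values currently held (never more than the capacity).
    int size() {
        return (int) Math.min(seen, samples.length);
    }

    // Copy of the current sample, e.g. for computing percentiles.
    long[] snapshot() {
        return Arrays.copyOf(samples, size());
    }
}
```

Memory use is bounded by the capacity no matter how many timings arrive, which is the point of the downsampling.
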
In order to expire old values, you don't need to keep a timestamp for every
single value. If you want to aggregate over the last 5 minutes, a simple
approach is to keep a separate reservoir for every minute. To calculate the
metric, you combine the samples from the last 5 per-minute reservoirs. Once a
minute, you throw away the oldest reservoir, which is no longer needed.
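
A rough sketch of the per-minute scheme, again with hypothetical names ({{MinutelyReservoirs}}, {{update}}, {{snapshot}}) that are not from Samza. It assumes the caller passes in the current time in milliseconds, and it expires a bucket lazily by resetting it when its slot is reused, rather than running a once-a-minute task.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Hypothetical class for illustration only; not part of Samza.
class MinutelyReservoirs {
    // One fixed-size random sample (Algorithm R) covering a single minute.
    private static final class Bucket {
        long minute = -1;       // minute number this bucket holds, -1 if unused
        long seen = 0;          // values offered during that minute
        final long[] samples;
        Bucket(int capacity) { samples = new long[capacity]; }
    }

    private final Bucket[] buckets;   // ring buffer, one slot per minute
    private final Random random = new Random();

    MinutelyReservoirs(int windowMinutes, int samplesPerMinute) {
        buckets = new Bucket[windowMinutes];
        for (int i = 0; i < windowMinutes; i++) {
            buckets[i] = new Bucket(samplesPerMinute);
        }
    }

    void update(long value, long nowMillis) {
        long minute = nowMillis / 60_000L;
        Bucket b = buckets[(int) (minute % buckets.length)];
        if (b.minute != minute) {
            // The slot still holds an expired minute: throw it away.
            b.minute = minute;
            b.seen = 0;
        }
        b.seen++;
        if (b.seen <= b.samples.length) {
            b.samples[(int) (b.seen - 1)] = value;
        } else {
            long slot = (long) (random.nextDouble() * b.seen);
            if (slot < b.samples.length) {
                b.samples[(int) slot] = value;
            }
        }
    }

    // Combined sample over the current minute and the preceding
    // (windowMinutes - 1) minutes; older buckets are ignored.
    List<Long> snapshot(long nowMillis) {
        long minute = nowMillis / 60_000L;
        List<Long> combined = new ArrayList<>();
        for (Bucket b : buckets) {
            if (b.minute > minute - buckets.length && b.minute <= minute) {
                int n = (int) Math.min(b.seen, b.samples.length);
                for (int i = 0; i < n; i++) {
                    combined.add(b.samples[i]);
                }
            }
        }
        return combined;
    }
}
```

Note that simply concatenating the samples gives a quiet minute more weight per event than a busy minute that was downsampled. For an approximate metric that may be acceptable; otherwise the per-bucket {{seen}} counts could be used to weight the samples.
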
> add timer in metrics
> --------------------
>
> Key: SAMZA-349
> URL: https://issues.apache.org/jira/browse/SAMZA-349
> Project: Samza
> Issue Type: Bug
> Reporter: Yan Fang
> Assignee: Yan Fang
> Attachments: SAMZA-349.1.patch, SAMZA-349.2.patch, SAMZA-349.3.patch,
> SAMZA-349.patch, SAMZA-349.patch
>
>
> If my understanding is correct, the metrics we provide are for every 60
> seconds and all counters will be reset every 60 seconds. Currently the
> MetricsSnapshotReporter seems to be missing this implementation. It sends
> out the metrics every 60 seconds but does not reset the counter value.
> {code}
> registry.getGroup(group).foreach {
>   case (name, metric) =>
>     metric.visit(new MetricsVisitor {
>       def counter(counter: Counter) =
>         groupMsg.put(name, counter.getCount: java.lang.Long)
>       def gauge[T](gauge: Gauge[T]) =
>         groupMsg.put(name, gauge.getValue.asInstanceOf[Object])
>     })
> }
> {code}