[ 
https://issues.apache.org/jira/browse/SAMZA-349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068819#comment-14068819
 ] 

Chris Riccomini commented on SAMZA-349:
---------------------------------------

What should be done here is dependent on the downstream consumer. If you have a 
running count, the downstream system can simply keep track of the last report 
it got, and subtract the last report from the current report to get the 
difference. If you have just an incremental count, then the downstream system 
can keep the running total.

Given that both implementations allow you to derive the same overall 
information, I agree that it makes more sense to just send incremental updates.

I see a problem with the patch you've provided, though. We fetch the value in 
one line, and then reset back to 0 in the next. This leads to a race condition 
where we might fetch the value, then another thread updates the count 
(reporting happens on a separate thread from SamzaContainer's main thread), 
then we call clear. In this scenario, there is data loss since we reset the 
counter after it's been updated again, but before it's been reported.

To solve this problem, I think we need to use getAndSet in Counter.set, and 
return the old value. Then we can call clear() and atomically get the old value 
while updating the new value back to 0.

> MetricsSnapshotReporter does not refresh metrics every 60 seconds
> -----------------------------------------------------------------
>
>                 Key: SAMZA-349
>                 URL: https://issues.apache.org/jira/browse/SAMZA-349
>             Project: Samza
>          Issue Type: Bug
>            Reporter: Yan Fang
>            Assignee: Yan Fang
>         Attachments: SAMZA-349.patch
>
>
> If my understanding is correct, the metrics we provide are for every 60 
> seconds and all counters will be reset every 60 seconds. Current the 
> MetricsSnapshotReporter seems missing this implementation. It sends out the 
> metrics every 60 seconds but does not reset the counter value.
> {code}
>  registry.getGroup(group).foreach {
>           case (name, metric) =>
>             metric.visit(new MetricsVisitor {
>               def counter(counter: Counter) = groupMsg.put(name, 
> counter.getCount: java.lang.Long)
>               def gauge[T](gauge: Gauge[T]) = groupMsg.put(name, 
> gauge.getValue.asInstanceOf[Object])
>             })
>         }
> {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to