> On May 13, 2015, 5:14 p.m., Jay Kreps wrote:
> > clients/src/main/java/org/apache/kafka/common/metrics/stats/Rate.java, line
> > 62
> > <https://reviews.apache.org/r/34170/diff/1/?file=958215#file958215line62>
> >
> > Is this actually right? I agree you'll get discontinuities as the
> > measured time shrinks to zero but why is giving back 0 the right answer? In
> > the case you guys were testing 0 was safe, but imagine a case where the
> > monitoring was checking that the value didn't fall below some threshold.
>
> Dong Lin wrote:
> The question is, when Rate.measure() is called right after
> Rate.record(n), what should be the return value? I think there are two
> possibilities: 0 and n/config.timeWindowMs(). I didn't find any use case
> where these two values make a difference.
>
> Which value do you think is the best?
>
> Thank you.
>
> Aditya Auradkar wrote:
> I think returning 0 is reasonable if no time has elapsed (technically).
> As an alternate solution, what if we assumed that "elapsed" is always 1
> (at least). For example:
>
> double elapsed = convert(now - stat.oldest(now).lastWindowMs) + 1
>
> In case of seconds, this basically means that you assume the current
> second is always complete. This is only a problem (for a couple of seconds)
> when all previous samples have zero activity or when the server is just
> starting up.
>
> Jay Kreps wrote:
> Actually there is a fundamental issue with the computation that patching
> around the 0 case doesn't fix. That is the instability of the estimate. This
> is the "taking a poll with sample size one" problem.
>
> Even if you patch the 0 case you still get a bad answer 1 ms later. That
> is, let's say you get a single 50k request and your quota is 1MB/sec.
> Currently at 0ms we estimate infinity which is in fact the measured rate but
> obviously not a good estimate. But even 1 ms later the estimate is bad.
> 50k*1000ms = ~50MB/sec.
>
> This is somewhat rare because it only happens when there is just one
> sample.
>
> They key observation is that if a sample is missing, nothing happened in
> that time period. But the calculation should still use that time period.
>
> So the right way to compute it, I think, is actually
> ellapsed = (num_samples-1)*window_size + (now - current_sample.begin)
>
> For safety I think we should also require the number of samples to be >=
> 2 and default it to 3.
>
> Dong Lin wrote:
> Hi Jay: thanks for comments!
>
> I think the problem here is that, when event number is very small (e.g.
> 0, 1), what value should rate.measure() return, right? If I understand your
> solution formula right, when there is only one sample, your measured rate is
> n/ellapsed = n/(now - current_sample.begin), which is exactly same as the
> current code. I agree that requiring number of samples to be >= 2 solve the
> problem. But what happens when user call rate.measure() when there is only 1
> sample?
>
> I agree with you that we still get a bad answer if we patch the 0 case.
> How about we patch the 0 to timeWindowMs case: if (ellapsed < timeWindowMs)
> then ellapsed = timeWindowMs. Does this solve the problem here?
>
> Dong Lin wrote:
> I think we all agree that there is problem with rate measure when
> ellapsed time is too small. What we need to properly define the rate measure
> to handle such case, which is currently missing.
>
> If we want to require number of samples to be >= 2 as Jay suggested, we
> can also let ellapsed = max(ellapsed, 2*timeWindowMs) and keep the rest of
> the code, right?
If we require num samples greater than 2, we shoud simply change this condition
in MetricConfig:
if (samples < 1)
throw new IllegalArgumentException("The number of samples must be
at least 1.");
Jay's equation is now similar to Dongs.
If num samples = 3,
ellapsed = (num_samples-1)*window_size + (now - current_sample.begin)
ellapsed = 2*window_size + (now - current_sample.begin)
- Aditya
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/34170/#review83629
-----------------------------------------------------------
On May 14, 2015, 7:34 a.m., Dong Lin wrote:
>
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/34170/
> -----------------------------------------------------------
>
> (Updated May 14, 2015, 7:34 a.m.)
>
>
> Review request for kafka.
>
>
> Bugs: KAFKA-2191
> https://issues.apache.org/jira/browse/KAFKA-2191
>
>
> Repository: kafka
>
>
> Description
> -------
>
> KAFKA-2191; Measured rate should not be infinite
>
>
> Diffs
> -----
>
> clients/src/main/java/org/apache/kafka/common/metrics/stats/Rate.java
> 98429da34418f7f1deba1b5e44e2e6025212edb3
>
> clients/src/main/java/org/apache/kafka/common/metrics/stats/SampledStat.java
> b341b7daaa10204906d78b812fb05fd27bc69373
>
> Diff: https://reviews.apache.org/r/34170/diff/
>
>
> Testing
> -------
>
>
> Thanks,
>
> Dong Lin
>
>