I'm not sure if I'm misunderstanding the suggestion, but this metric
was never intended for alerts. Some metrics are meant for informational
purposes rather than for alerting. In fact, it is possible for some
consumers to own zero partitions if there are fewer partitions
than consumers in the group.
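
To see why a consumer can legitimately own nothing, consider a
range-style assignment in which partitions are split into contiguous
ranges and the first (partitions % consumers) consumers get one extra
partition. The class and method names below are hypothetical, and the
sketch only approximates the old consumer's default strategy; it is
not Kafka's code.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a range-style partition assignment (not Kafka's implementation).
public class RangeAssignmentSketch {

    // Returns, for each consumer index, the list of partition ids it owns.
    static List<List<Integer>> assign(int numPartitions, int numConsumers) {
        int perConsumer = numPartitions / numConsumers;
        int extra = numPartitions % numConsumers; // first `extra` consumers get one more
        List<List<Integer>> result = new ArrayList<>();
        for (int i = 0; i < numConsumers; i++) {
            int start = i * perConsumer + Math.min(i, extra);
            int count = perConsumer + (i < extra ? 1 : 0);
            List<Integer> owned = new ArrayList<>();
            for (int p = start; p < start + count; p++) {
                owned.add(p);
            }
            result.add(owned);
        }
        return result;
    }

    public static void main(String[] args) {
        // 9 partitions shared by 12 consumers: consumers 9, 10 and 11 own nothing.
        List<List<Integer>> assignment = assign(9, 12);
        for (int i = 0; i < assignment.size(); i++) {
            System.out.println("consumer " + i + " owns " + assignment.get(i));
        }
    }
}
```

With 9 partitions and 12 consumers, three consumers correctly report
zero owned partitions, so a per-consumer "owned partitions" value of
zero is not by itself an error.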

I think you are looking for a mechanism to determine whether a
particular partition is not owned by any instance in the group.
That is a bit difficult to do directly in the current high-level
consumer. Instead, you can monitor consumer lag using the
consumer offset checker, which is not ideal since it is not
integrated into the consumer. The consumer does expose lag MBeans, but
those cover only the partitions it owns. The new consumer can address
this concern.
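
Lag as reported by an external checker is, roughly, the log-end offset
minus the group's committed offset for each partition. Because it is
computed over every partition of the topic rather than only the owned
ones, an unowned partition shows up as lag that keeps growing. A
minimal sketch, assuming hypothetical in-memory maps stand in for the
offsets you would actually fetch from the brokers and ZooKeeper; the
class, method names and threshold are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: per-partition lag from log-end and committed offsets.
public class LagSketch {

    // Lag per partition = log-end offset - committed offset.
    // Assumption for this sketch: a partition with no committed offset is
    // treated as committed at offset 0.
    static Map<Integer, Long> lagPerPartition(Map<Integer, Long> logEndOffsets,
                                              Map<Integer, Long> committedOffsets) {
        Map<Integer, Long> lag = new TreeMap<>();
        for (Map.Entry<Integer, Long> e : logEndOffsets.entrySet()) {
            long committed = committedOffsets.getOrDefault(e.getKey(), 0L);
            // Clamp at zero so a committed offset ahead of the log end never yields negative lag.
            lag.put(e.getKey(), Math.max(0L, e.getValue() - committed));
        }
        return lag;
    }

    // Partitions whose lag exceeds the threshold; an unowned partition ends up
    // here eventually because nobody advances its committed offset.
    static List<Integer> laggingPartitions(Map<Integer, Long> lag, long threshold) {
        List<Integer> result = new ArrayList<>();
        for (Map.Entry<Integer, Long> e : lag.entrySet()) {
            if (e.getValue() > threshold) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<Integer, Long> logEnd = new TreeMap<>();
        Map<Integer, Long> committed = new TreeMap<>();
        logEnd.put(0, 1000L); committed.put(0, 995L);
        logEnd.put(1, 1000L); committed.put(1, 120L); // stalled, possibly unowned
        logEnd.put(2, 1000L); committed.put(2, 1000L);
        Map<Integer, Long> lag = lagPerPartition(logEnd, committed);
        System.out.println("lag: " + lag);
        System.out.println("lagging: " + laggingPartitions(lag, 100L));
    }
}
```

The caveat is that high lag only suggests a missing owner; a slow but
owned consumer looks the same, which is part of why this is not ideal.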

On Tue, Jan 27, 2015 at 03:20:55PM -0800, Steven Wu wrote:
> To illustrate my point, I will use the "allTopicsOwnedPartitionsCount" gauge
> from ZookeeperConsumerConnector as an example. It captures the number of
> partitions of a topic that have been assigned an owner in the consumer group.
> Let's say I have a topic with 9 partitions. This metric should
> normally report 9, so I can set up an alert
> if allTopicsOwnedPartitionsCount < 9.
> 
> Here are the drawbacks of this kind of metric:
> 1) If our metrics reporting/aggregation system loses data and the
> value is reported as zero, we can't really distinguish a real error
> from data loss, so data loss can cause false positives/alarms.
> 2) If we change the number of partitions (e.g. from 9 to 18), we need to
> remember to change the alert rule to "allTopicsOwnedPartitionsCount < 18".
> This kind of coupling is a maintenance nightmare.
> 
> A more explicit metric would be "NoOwnerPartitionsCount". It should normally
> be zero; if it is not zero, we should be alerted. This way, we won't get
> false alarms from data loss.
> 
> We don't have to change/fix this particular example since a new consumer is
> being worked on. But in the new consumer, please consider more explicit error
> signals.
> 
> Thanks,
> Steven
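
To make the proposal concrete: "NoOwnerPartitionsCount" is the name
suggested in the quoted message, not an existing Kafka metric. The
sketch below assumes a hypothetical map from partition id to owner id
(a missing entry meaning "no owner") and contrasts the two alert rules
when a lost data point is reported as zero. All names are illustrative.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of the proposed explicit metric and the two alert rules.
public class NoOwnerPartitionsSketch {

    // Counts partitions in [0, numPartitions) with no owner recorded.
    static int noOwnerPartitionsCount(int numPartitions, Map<Integer, String> ownerByPartition) {
        int count = 0;
        for (int p = 0; p < numPartitions; p++) {
            String owner = ownerByPartition.get(p);
            if (owner == null || owner.isEmpty()) {
                count++;
            }
        }
        return count;
    }

    // Coupled rule: the threshold must be edited whenever the partition count changes,
    // and a lost data point reported as 0 looks exactly like a real outage.
    static boolean ownedCountAlert(int reportedOwnedCount, int expectedPartitions) {
        return reportedOwnedCount < expectedPartitions;
    }

    // Explicit rule: no threshold to maintain, and a value lost as 0 stays quiet.
    static boolean noOwnerAlert(int reportedNoOwnerCount) {
        return reportedNoOwnerCount > 0;
    }

    public static void main(String[] args) {
        Map<Integer, String> owners = new TreeMap<>();
        for (int p = 0; p < 9; p++) {
            owners.put(p, "consumer-" + (p % 3));
        }
        owners.remove(4); // simulate partition 4 losing its owner
        int noOwner = noOwnerPartitionsCount(9, owners);
        System.out.println("NoOwnerPartitionsCount = " + noOwner);
        System.out.println("alert? " + noOwnerAlert(noOwner));
        // A data point lost by the metrics pipeline and reported as 0:
        System.out.println("owned-count rule fires: " + ownedCountAlert(0, 9));
        System.out.println("no-owner rule fires: " + noOwnerAlert(0));
    }
}
```

The trade-off is that losing a real non-zero value can hide a genuine
problem, but the explicit metric avoids false alarms from data loss and
needs no edit when the partition count changes.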
