Hey Travis,

Thanks for sharing the KIP.

One suggestion (not essential): would it be possible to include the
relevant code snippet and the new class directly in the `Proposed
Changes` section of the KIP? That way, everything is self-contained and
there’s no need to switch between the KIP and the codebase.

I understand that you’re incorporating the existing metrics from the old
protocol into the new one, with the goal of maintaining consistency in the
metrics provided. However, I still have a few questions that might be best
addressed here, as this seems like the ideal time to raise them and
reconsider our approach.

1. Why are the new metrics being recorded at the thread level exclusively?
Would there be value in exposing these metrics at additional levels (such
as application), especially for operators managing large topologies?

2. Are the chosen latency metrics—average and max—sufficient for diagnosing
issues in production, or should more granular statistics (e.g., percentile
latencies) be considered to improve observability?
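
To illustrate the concern behind question 2, here is a minimal sketch in
plain Java. The `LatencySummary` class and the sample values are
hypothetical, made up for this email, and are not part of Kafka or the
KIP. It only shows how average and max can leave the tail of a latency
distribution unclear, while a percentile makes it visible.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not a Kafka class.
final class LatencySummary {

    static double avg(double[] samples) {
        return Arrays.stream(samples).average().orElse(Double.NaN);
    }

    static double max(double[] samples) {
        return Arrays.stream(samples).max().orElse(Double.NaN);
    }

    // Nearest-rank percentile: the smallest sample such that at least
    // p percent of all samples are less than or equal to it.
    static double percentile(double[] samples, double p) {
        if (samples.length == 0) {
            return Double.NaN;
        }
        double[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // Made-up callback latencies in ms: 95 fast calls, 4 slow calls,
        // and a single extreme outlier.
        double[] samples = new double[100];
        Arrays.fill(samples, 0, 95, 10.0);
        Arrays.fill(samples, 95, 99, 200.0);
        samples[99] = 900.0;

        System.out.printf("avg=%.1f max=%.1f p50=%.1f p99=%.1f%n",
                avg(samples), max(samples),
                percentile(samples, 50), percentile(samples, 99));
    }
}
```

With this made-up data, the max reflects only the single outlier and the
average sits close to the typical case, so neither shows that a few
percent of callbacks are consistently slow. A p99 value would show that.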

Let me know your thoughts!


Best,
Alieh

On Wed, Sep 10, 2025 at 7:38 PM Travis Zhang <[email protected]>
wrote:

> Hi,
>
> I'd like to start a discussion on
> KIP-1216:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1216%3A+Add+rebalance+listener+metrics+for+Kafka+Streams
>
> This KIP proposes adding latency metrics for each rebalance callback
> to provide operators with the observability needed to effectively
> monitor and optimize Kafka Streams applications in production
> environments.
>
> Thanks,
> Travis
>
