Hey Travis,

Thanks for sharing the KIP.
One suggestion (not essential): would it be possible to include the relevant code snippet and the new class directly in the KIP, in the `Proposed Changes` section? That way, everything is self-contained and there’s no need to switch between the KIP and the codebase.

I understand that you’re incorporating the existing metrics from the old protocol into the new one, with the goal of keeping the metrics consistent. However, I still have a few questions that might be best addressed here, as this seems like the ideal time to raise them and reconsider our approach.

1. Why are the new metrics being recorded at the thread level exclusively? Would there be value in exposing these metrics at additional levels (such as application), especially for operators managing large topologies?

2. Are the chosen latency metrics (average and max) sufficient for diagnosing issues in production, or should more granular statistics (e.g., percentile latencies) be considered to improve observability?

Let me know your thoughts!

Best,
Alieh

On Wed, Sep 10, 2025 at 7:38 PM Travis Zhang <[email protected]> wrote:

> Hi,
>
> I'd like to start a discussion on
> KIP-1216:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-1216%3A+Add+rebalance+listener+metrics+for+Kafka+Streams
>
> This KIP proposes adding latency metrics for each rebalance callback
> to provide operators with the observability needed to effectively
> monitor and optimize Kafka Streams applications in production
> environments.
>
> Thanks,
> Travis
