Thanks for the valuable KIP, Lucas! I've read through the document and have some initial comments.
ASH01: I understand that deduplication and backoff are intentionally delegated to the plugin. However, since the plugin runs in the broker process and requiresTopologyPush is called on the heartbeat path, a misbehaving or incorrectly implemented plugin could still affect coordinator stability. There seem to be two separate risks: (1) a slow or blocking requiresTopologyPush implementation could increase heartbeat latency and impact Streams group coordination; (2) a plugin that repeatedly returns true could cause clients to repeatedly send UpdateStreamsGroupTopologyDescription requests, adding unnecessary network and request load between clients and brokers. Should we consider adding broker-side guard rails that are independent of the plugin implementation? For example, the broker could enforce a per-group or per-(groupId, topologyDescriptionId) minimum interval / rate limit for including TopologyDescriptionId in heartbeat responses, even if the plugin keeps returning true. This would not replace plugin-side deduplication, but it would bound the blast radius of a bad plugin implementation.

ASH02: One scenario that may be worth considering is topology visibility during topology updates or rolling upgrades. Since STALE_TOPOLOGY members are skipped, StreamsGroupDescribe seems to expose only the topology for the current TopologyDescriptionId. During an update, however, some members may still be running the previous topology epoch. In that state, returning only the current topology may be less helpful, or even misleading, for operators. Would it make sense to expose both the current and the previous/stale topology descriptions, tagged by topology epoch or description id?

ASH03: Related to this, it would be helpful to clarify the behavior for existing streams groups that were created, or had their latest topology epoch bump, while coordinated by a broker without this feature enabled.
If such a group later moves to a plugin-enabled coordinator, there may no longer be a group-creation or topology-epoch-bump event to trigger minting a TopologyDescriptionId. In that case, should the plugin-enabled coordinator backfill a TopologyDescriptionId when it loads the group or handles the next successful heartbeat? Otherwise, existing groups created under plugin-less coordinators may not push a topology until the next topology epoch bump.

ASH04: One edge case that may be worth clarifying is what happens if multiple members push different topology descriptions for the same (groupId, TopologyDescriptionId). The plugin contract says that concurrent calls for the same pair carry identical data and should be treated as idempotent. In practice, this assumption could be violated by configuration drift, a failed rolling deployment, or a client-side bug, even if that is expected to be rare. Should the KIP define the expected behavior in this case? For example, should the broker avoid validating this and leave the policy to the plugin, should the plugin treat the first successfully stored topology as authoritative, or should a mismatched later push be rejected as INVALID_REQUEST?

Best Regards,
Sanghyeok An

-----Original Message-----
From: "Lucas Brutschy via dev" <[email protected]>
To: <[email protected]>;
Cc: "Lucas Brutschy" <[email protected]>;
Sent: 2026-05-04 (Mon) 18:39:03 (GMT+09:00)
Subject: [DISCUSS] KIP-1331: Streams Group Topology Description Plugin

Hi all,

I would like to start the discussion on KIP-1331. The idea is to optionally make a topology description available to the broker, in the spirit of KIP-714. Looking forward to your feedback!

https://cwiki.apache.org/confluence/display/KAFKA/KIP-1331%3A+Streams+Group+Topology+Description+Plugin

Best,
Lucas
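P.S. To make the guard-rail suggestion in ASH01 concrete, here is a rough sketch of what a broker-side minimum-interval check per (groupId, TopologyDescriptionId) could look like. The class and method names are mine and not from the KIP; this is only meant to illustrate the shape of the check, not a proposed implementation.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical broker-side guard rail (ASH01): enforce a minimum interval per
// (groupId, topologyDescriptionId) pair before the coordinator includes the
// TopologyDescriptionId in a heartbeat response again, regardless of how often
// the plugin's requiresTopologyPush returns true. All names are illustrative.
public class TopologyPushRateLimiter {
    private final long minIntervalMs;
    private final Map<String, Long> lastRequestedMs = new ConcurrentHashMap<>();

    public TopologyPushRateLimiter(long minIntervalMs) {
        this.minIntervalMs = minIntervalMs;
    }

    // Returns true only if the plugin wants a push AND the minimum interval has
    // elapsed since we last requested one for this (groupId, descriptionId) pair.
    public boolean shouldRequestPush(String groupId, String descriptionId,
                                     boolean pluginWantsPush, long nowMs) {
        if (!pluginWantsPush) {
            return false;
        }
        String key = groupId + "/" + descriptionId;
        Long last = lastRequestedMs.get(key);
        if (last != null && nowMs - last < minIntervalMs) {
            return false; // bounds the blast radius of a plugin that always returns true
        }
        lastRequestedMs.put(key, nowMs);
        return true;
    }
}
```

Even a generous interval (e.g. tens of seconds) would cap the request rate without interfering with a well-behaved plugin.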
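P.P.S. For ASH04, one of the options I mentioned is that the plugin treats the first successfully stored topology as authoritative. A minimal sketch of that policy, with hypothetical names (the KIP does not define such a store or result type), could look like:

```java
import java.util.Map;
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical plugin-side handling for ASH04: the first successfully stored
// topology for a (groupId, topologyDescriptionId) pair wins; a later push with
// identical bytes is an idempotent duplicate, and a later push with different
// bytes is flagged as a mismatch (a candidate for INVALID_REQUEST).
public class FirstWriterWinsStore {
    public enum PushResult { STORED, DUPLICATE, MISMATCH }

    private final Map<String, String> store = new ConcurrentHashMap<>();

    public PushResult push(String groupId, String descriptionId, String topology) {
        String key = groupId + "/" + descriptionId;
        // putIfAbsent is atomic, so concurrent first pushes resolve to one winner.
        String existing = store.putIfAbsent(key, topology);
        if (existing == null) {
            return PushResult.STORED;      // first push wins
        }
        return Objects.equals(existing, topology)
                ? PushResult.DUPLICATE     // idempotent retry, as the contract assumes
                : PushResult.MISMATCH;     // drift from config skew or a client bug
    }
}
```

Whichever policy the KIP picks, spelling it out would remove ambiguity for plugin implementers.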
