[ https://issues.apache.org/jira/browse/KAFKA-12370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17357869#comment-17357869 ]
Josep Prat commented on KAFKA-12370: ------------------------------------ As I'm already half familiar with these changes, I'd might give it a go and write the KIP. However, due to the timeline, it won't be possible to squeeze this one for 3.0.0. Unless [~guozhang] wants to work on this of course. > Refactor KafkaStreams exposed metadata hierarchy > ------------------------------------------------ > > Key: KAFKA-12370 > URL: https://issues.apache.org/jira/browse/KAFKA-12370 > Project: Kafka > Issue Type: Improvement > Components: streams > Reporter: Guozhang Wang > Priority: Major > Labels: needs-kip > > Currently in KafkaStreams we have two groups of metadata getter: > 1. > {code} > allMetadata > allMetadataForStore > {code} > Return collection of {{StreamsMetadata}}, which only contains the partitions > as active/standby, plus the hostInfo, but not exposing any task info. > 2. > {code} > queryMetadataForKey > {code} > Returns {{KeyQueryMetadata}} that includes the hostInfos of active and > standbys, plus the partition id. > 3. > {code} > localThreadsMetadata > {code} > Returns {{ThreadMetadata}}, that includes a collection of {{TaskMetadata}} > for active and standby tasks. > All the above functions are used for interactive queries, but their exposed > metadata are very different, and some use cases would need to have all > client, thread, and task metadata to fulfill the feature development. At the > same time, we may have a more dynamic "task -> thread" mapping in the future > and also the embedded clients like consumers would not be per thread, but per > client. > --------------- > Rethinking about the metadata, I feel we can have a more consistent hierarchy > as the following: > * {{StreamsMetadata}} represent the metadata for the client, which includes > the set of {{ThreadMetadata}} for its existing thread and the set of > {{TaskMetadata}} for active and standby tasks assigned to this client, plus > client metadata including hostInfo, embedded client ids. > * {{ThreadMetadata}} includes name, state, the set of {{TaskMetadata}} for > currently assigned tasks. > * {{TaskMetadata}} includes the name (including the sub-topology id and the > partition id), the state, the corresponding sub-topology description > (including the state store names, source topic names). > * {{allMetadata}}, {{allMetadataForStore}}, {{allMetadataForKey}} (renamed > from queryMetadataForKey) returns the set of {{StreamsMetadata}}, and > {{localMetadata}} (renamed from localThreadMetadata) returns a single > {{StreamsMetadata}}. > * {{KeyQueryMetadata}} Class would be deprecated and replaced by > {{TaskMetadata}}. > To illustrate as an example, to find out who are the current active host / > standby hosts of a specific store, we would call {{allMetadataForStore}}, and > for each returned {{StreamsMetadata}} we loop over their contained > {{TaskMetadata}} for active / standby, and filter by its corresponding > sub-topology's description's contained store name. -- This message was sent by Atlassian Jira (v8.3.4#803005)