[ https://issues.apache.org/jira/browse/KAFKA-12370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17289642#comment-17289642 ]
Guozhang Wang commented on KAFKA-12370:
---------------------------------------

Note that this is related to https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=148648762 for the augmented topology description.

> Refactor KafkaStreams exposed metadata hierarchy
> ------------------------------------------------
>
>                 Key: KAFKA-12370
>                 URL: https://issues.apache.org/jira/browse/KAFKA-12370
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: Guozhang Wang
>            Priority: Major
>
> Currently in KafkaStreams we have three groups of metadata getters:
> 1.
> {code}
> allMetadata
> allMetadataForStore
> {code}
> Return a collection of {{StreamsMetadata}}, which only contains the
> partitions as active/standby, plus the hostInfo, but does not expose any
> task info.
> 2.
> {code}
> queryMetadataForKey
> {code}
> Returns {{KeyQueryMetadata}}, which includes the hostInfos of the active and
> standbys, plus the partition id.
> 3.
> {code}
> localThreadsMetadata
> {code}
> Returns {{ThreadMetadata}}, which includes a collection of {{TaskMetadata}}
> for active and standby tasks.
> All the above functions are used for interactive queries, but the metadata
> they expose is very different, and some use cases need all of the client,
> thread, and task metadata to fulfill the feature development. At the same
> time, we may have a more dynamic "task -> thread" mapping in the future, and
> the embedded clients like consumers would then not be per thread, but per
> client.
> ---------------
> Rethinking the metadata, I feel we can have a more consistent hierarchy as
> the following:
> * {{StreamsMetadata}} represents the metadata for the client, which includes
> the set of {{ThreadMetadata}} for its existing threads and the set of
> {{TaskMetadata}} for active and standby tasks assigned to this client, plus
> client metadata including hostInfo and embedded client ids.
> * {{ThreadMetadata}} includes the name, the state, and the set of
> {{TaskMetadata}} for currently assigned tasks.
> * {{TaskMetadata}} includes the name (including the sub-topology id and the
> partition id), the state, and the corresponding sub-topology description
> (including the state store names and source topic names).
> * {{allMetadata}}, {{allMetadataForStore}}, {{allMetadataForKey}} (renamed
> from queryMetadataForKey) return the set of {{StreamsMetadata}}, and
> {{localMetadata}} (renamed from localThreadsMetadata) returns a single
> {{StreamsMetadata}}.
> To illustrate with an example: to find out which hosts are currently the
> active / standby hosts of a specific store, we would call
> {{allMetadataForStore}}, and for each returned {{StreamsMetadata}} we loop
> over its contained {{TaskMetadata}} for active / standby, and filter by the
> store names contained in the corresponding sub-topology's description.

--
This message was sent by Atlassian Jira (v8.3.4#803005)
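
The store lookup described in the issue could be sketched roughly as follows. This is only an illustration of the proposed hierarchy, not the existing Kafka Streams API: every type below ({{HostInfo}}, {{SubtopologyDescription}}, {{TaskMetadata}}, {{StreamsMetadata}}, {{StoreHosts}}) is a hypothetical stand-in defined locally, and the field names and the helper {{hostsForStore}} are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.Set;

public class ProposedMetadataSketch {
    // Hypothetical stand-ins modeling the proposed hierarchy.
    // These are NOT the real org.apache.kafka.streams classes.
    public record HostInfo(String host, int port) {}

    public record SubtopologyDescription(Set<String> stateStoreNames,
                                         Set<String> sourceTopicNames) {}

    // The task name carries the sub-topology id and the partition id, e.g. "0_1".
    public record TaskMetadata(String name, SubtopologyDescription subtopology) {}

    // Client-level metadata: host info plus the active and standby tasks.
    public record StreamsMetadata(HostInfo hostInfo,
                                  Set<TaskMetadata> activeTasks,
                                  Set<TaskMetadata> standbyTasks) {}

    public record StoreHosts(List<HostInfo> active, List<HostInfo> standby) {}

    // True if any task's sub-topology description contains the store name.
    static boolean containsStore(Set<TaskMetadata> tasks, String storeName) {
        return tasks.stream()
                .anyMatch(t -> t.subtopology().stateStoreNames().contains(storeName));
    }

    // Loops over the client metadata (as would be returned by the proposed
    // allMetadataForStore) and splits hosts into active and standby.
    public static StoreHosts hostsForStore(Collection<StreamsMetadata> allMetadata,
                                           String storeName) {
        List<HostInfo> active = new ArrayList<>();
        List<HostInfo> standby = new ArrayList<>();
        for (StreamsMetadata client : allMetadata) {
            if (containsStore(client.activeTasks(), storeName)) {
                active.add(client.hostInfo());
            }
            if (containsStore(client.standbyTasks(), storeName)) {
                standby.add(client.hostInfo());
            }
        }
        return new StoreHosts(active, standby);
    }

    public static void main(String[] args) {
        SubtopologyDescription sub =
                new SubtopologyDescription(Set.of("counts-store"), Set.of("input-topic"));
        TaskMetadata task = new TaskMetadata("0_0", sub);

        StreamsMetadata clientA =
                new StreamsMetadata(new HostInfo("host-a", 8080), Set.of(task), Set.of());
        StreamsMetadata clientB =
                new StreamsMetadata(new HostInfo("host-b", 8080), Set.of(), Set.of(task));

        StoreHosts hosts = hostsForStore(List.of(clientA, clientB), "counts-store");
        System.out.println("active:  " + hosts.active());
        System.out.println("standby: " + hosts.standby());
    }
}
```

In this sketch the filtering happens on the caller's side, which matches the description above: the getter returns client-level metadata and the caller drills down through the task metadata to the sub-topology's store names.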