[ 
https://issues.apache.org/jira/browse/KAFKA-12370?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17321230#comment-17321230
 ] 

Guozhang Wang commented on KAFKA-12370:
---------------------------------------

Another idea: we could also add an {{allMetadataForTasks}} method whose
parameter is either a list of task ids or a task id prefix, so that users can
get the metadata for just those tasks.
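
A rough sketch of how the two parameter styles might behave, for illustration only. Everything here is hypothetical: {{TaskInfo}} is a stand-in rather than an existing Kafka Streams class, the method names are placeholders, and task ids are assumed to be strings of the form "<subTopologyId>_<partition>". It uses Java 16+ records.

```java
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch: none of these types or methods exist in Kafka Streams.
final class TaskMetadataLookup {

    // Stand-in for per-task metadata. The task id is assumed to look like
    // "<subTopologyId>_<partition>", e.g. "0_1".
    record TaskInfo(String taskId, String host) {}

    // Variant 1: select metadata for an explicit set of task ids.
    static List<TaskInfo> allMetadataForTasks(Collection<TaskInfo> all,
                                              Set<String> taskIds) {
        return all.stream()
                  .filter(t -> taskIds.contains(t.taskId()))
                  .collect(Collectors.toList());
    }

    // Variant 2: select metadata for every task whose id starts with the
    // prefix. Including the "_" separator ("1_") avoids also matching
    // sub-topology 11, which a bare "1" prefix would do.
    static List<TaskInfo> allMetadataForTaskPrefix(Collection<TaskInfo> all,
                                                   String taskIdPrefix) {
        return all.stream()
                  .filter(t -> t.taskId().startsWith(taskIdPrefix))
                  .collect(Collectors.toList());
    }
}
```

With the prefix variant, passing "0_" would select all tasks of sub-topology 0, which seems to be the main use case for a prefix; whether the real API should take a string prefix or a sub-topology id is an open question for the KIP.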

> Refactor KafkaStreams exposed metadata hierarchy
> ------------------------------------------------
>
>                 Key: KAFKA-12370
>                 URL: https://issues.apache.org/jira/browse/KAFKA-12370
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: Guozhang Wang
>            Priority: Major
>              Labels: needs-kip
>
> Currently in KafkaStreams we have three groups of metadata getters:
> 1.
> {code}
> allMetadata
> allMetadataForStore
> {code}
> Return a collection of {{StreamsMetadata}}, which contains only the
> active/standby partitions plus the hostInfo, and does not expose any task info.
> 2.
> {code}
> queryMetadataForKey
> {code}
> Returns {{KeyQueryMetadata}}, which includes the hostInfos of the active and
> standby hosts, plus the partition id.
> 3.
> {code}
> localThreadsMetadata
> {code}
> Returns a set of {{ThreadMetadata}}, each of which includes a collection of
> {{TaskMetadata}} for active and standby tasks.
> All of the above functions are used for interactive queries, but the metadata
> they expose differ widely, and some use cases need client, thread, and task
> metadata together. In addition, the "task -> thread" mapping may become more
> dynamic in the future, and embedded clients such as consumers may be per
> client rather than per thread.
> ---------------
> Rethinking the metadata, I feel we can have a more consistent hierarchy, as
> follows:
> * {{StreamsMetadata}} represents the metadata for the client, which includes
> the set of {{ThreadMetadata}} for its existing threads and the set of
> {{TaskMetadata}} for active and standby tasks assigned to this client, plus
> client metadata including hostInfo and embedded client ids.
> * {{ThreadMetadata}} includes name, state, the set of {{TaskMetadata}} for 
> currently assigned tasks.
> * {{TaskMetadata}} includes the name (including the sub-topology id and the 
> partition id), the state, the corresponding sub-topology description 
> (including the state store names, source topic names).
> * {{allMetadata}}, {{allMetadataForStore}}, {{allMetadataForKey}} (renamed
> from {{queryMetadataForKey}}) return a set of {{StreamsMetadata}}, and
> {{localMetadata}} (renamed from {{localThreadsMetadata}}) returns a single
> {{StreamsMetadata}}.
> As an example, to find the current active and standby hosts of a specific
> store, we would call {{allMetadataForStore}}, loop over the active / standby
> {{TaskMetadata}} of each returned {{StreamsMetadata}}, and filter by the store
> names in the corresponding sub-topology description.
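
The store lookup described in the last paragraph could be sketched roughly as follows. This is only an illustration under assumptions: the records below model the proposed hierarchy and are not the existing Kafka Streams classes, all field and method names are hypothetical, and the input list stands in for whatever {{allMetadataForStore}} would return. It uses Java 16+ records.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of the proposed hierarchy; NOT the existing Kafka
// Streams classes. All field names are assumptions made for this sketch.
final class StoreHostLookup {

    record HostInfo(String host, int port) {}

    // Sub-topology description, reduced to the state store names it contains.
    record SubTopology(int id, Set<String> stateStoreNames) {}

    // A task is identified by its sub-topology id and partition id.
    record TaskMetadata(int partition, SubTopology subTopology) {
        String name() { return subTopology.id() + "_" + partition; }
    }

    // Client-level metadata: host info plus active and standby tasks.
    record StreamsMetadata(HostInfo hostInfo,
                           Set<TaskMetadata> activeTasks,
                           Set<TaskMetadata> standbyTasks) {}

    // True if any task's sub-topology contains the named store.
    private static boolean hostsStore(Set<TaskMetadata> tasks, String storeName) {
        return tasks.stream()
                    .anyMatch(t -> t.subTopology().stateStoreNames().contains(storeName));
    }

    // Hosts that have at least one active task backing the store.
    static Set<HostInfo> activeHosts(List<StreamsMetadata> clients, String storeName) {
        Set<HostInfo> hosts = new LinkedHashSet<>();
        for (StreamsMetadata client : clients) {
            if (hostsStore(client.activeTasks(), storeName)) {
                hosts.add(client.hostInfo());
            }
        }
        return hosts;
    }

    // Hosts that have at least one standby task backing the store.
    static Set<HostInfo> standbyHosts(List<StreamsMetadata> clients, String storeName) {
        Set<HostInfo> hosts = new LinkedHashSet<>();
        for (StreamsMetadata client : clients) {
            if (hostsStore(client.standbyTasks(), storeName)) {
                hosts.add(client.hostInfo());
            }
        }
        return hosts;
    }
}
```

Keeping the store names on the sub-topology description, rather than on each task, mirrors the proposal: tasks of the same sub-topology share one description, so the filter is a lookup through the task rather than a separate per-task field.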



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
