[jira] [Commented] (KAFKA-16212) Cache partitions by TopicIdPartition instead of TopicPartition
[ https://issues.apache.org/jira/browse/KAFKA-16212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840715#comment-17840715 ] Omnia Ibrahim commented on KAFKA-16212: --- I believe that https://issues.apache.org/jira/browse/KAFKA-10551 (and possibly KAFKA-10549) needs to be done first as ProoduceRequest and Respond interact with replicaManager cache to append log and it would be easier if this produce request path is already topic ID aware before updating the cache. I am prioritising KAFKA-10551 > Cache partitions by TopicIdPartition instead of TopicPartition > -- > > Key: KAFKA-16212 > URL: https://issues.apache.org/jira/browse/KAFKA-16212 > Project: Kafka > Issue Type: Improvement >Affects Versions: 3.7.0 >Reporter: Gaurav Narula >Assignee: Omnia Ibrahim >Priority: Major > > From the discussion in [PR > 15263|https://github.com/apache/kafka/pull/15263#discussion_r1471075201], it > would be better to cache {{allPartitions}} by {{TopicIdPartition}} instead of > {{TopicPartition}} to avoid ambiguity. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-16212) Cache partitions by TopicIdPartition instead of TopicPartition
[ https://issues.apache.org/jira/browse/KAFKA-16212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819023#comment-17819023 ] Justine Olshan commented on KAFKA-16212: We use something similar in the fetch session cache when the topic ID is unknown. This transition state will be as long as we support ZK I would suspect. > how would this look like that I need to be aware off during extending > ReplicaManager cache to be topicId aware Not sure I understand this question. > Cache partitions by TopicIdPartition instead of TopicPartition > -- > > Key: KAFKA-16212 > URL: https://issues.apache.org/jira/browse/KAFKA-16212 > Project: Kafka > Issue Type: Improvement >Affects Versions: 3.7.0 >Reporter: Gaurav Narula >Assignee: Omnia Ibrahim >Priority: Major > > From the discussion in [PR > 15263|https://github.com/apache/kafka/pull/15263#discussion_r1471075201], it > would be better to cache {{allPartitions}} by {{TopicIdPartition}} instead of > {{TopicPartition}} to avoid ambiguity. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-16212) Cache partitions by TopicIdPartition instead of TopicPartition
[ https://issues.apache.org/jira/browse/KAFKA-16212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818991#comment-17818991 ] Omnia Ibrahim commented on KAFKA-16212: --- I don't believe ReplicaManager have significant meaning for topic Id zero or null. The code related to KRAFT it assume there will always be a topic Id, while other codes that doesn't care about topic Id and interact with ReplicaManager either not updated yet or doesn't have topic Id awareness design. So theoretically this will simplify proposal #1. However we will have to 1. have validation in varies places to handle topic Id as dummy values. 2. we might need to revert these dummy value and some of the validations later in the future. I think if we have been using similar approach in other places then proposal#1 should be fine. With all of that said I have one worry regarding the code readability and maintenance as having topic Id as Option/Optional.empty/Null/Zero UUID as dummy values in the APIs in different places might be a bit confusing during extending the code. Is there an agreement as part of KIP-516 for how long this transition state will last before having topic id in most places and how would this look like that I need to be aware off during extending ReplicaManager cache to be topicId aware? > Cache partitions by TopicIdPartition instead of TopicPartition > -- > > Key: KAFKA-16212 > URL: https://issues.apache.org/jira/browse/KAFKA-16212 > Project: Kafka > Issue Type: Improvement >Affects Versions: 3.7.0 >Reporter: Gaurav Narula >Assignee: Omnia Ibrahim >Priority: Major > > From the discussion in [PR > 15263|https://github.com/apache/kafka/pull/15263#discussion_r1471075201], it > would be better to cache {{allPartitions}} by {{TopicIdPartition}} instead of > {{TopicPartition}} to avoid ambiguity. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-16212) Cache partitions by TopicIdPartition instead of TopicPartition
[ https://issues.apache.org/jira/browse/KAFKA-16212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817795#comment-17817795 ] Justine Olshan commented on KAFKA-16212: Thanks [~omnia_h_ibrahim] I remember this particularly being a tricky area for topic IDs, so I appreciate the time being spent here. For fetch requests, I remember we had to do something similar to proposal 1 for the fetch path. When we receive fetch request < 13, we will have a zero topic ID stored. (Note – this is a placeholder and can not be used as a valid topic ID) Does using the zero uuid make some of option 1 easier? Or are we already using the zero uuid to signify something? > Cache partitions by TopicIdPartition instead of TopicPartition > -- > > Key: KAFKA-16212 > URL: https://issues.apache.org/jira/browse/KAFKA-16212 > Project: Kafka > Issue Type: Improvement >Affects Versions: 3.7.0 >Reporter: Gaurav Narula >Assignee: Omnia Ibrahim >Priority: Major > > From the discussion in [PR > 15263|https://github.com/apache/kafka/pull/15263#discussion_r1471075201], it > would be better to cache {{allPartitions}} by {{TopicIdPartition}} instead of > {{TopicPartition}} to avoid ambiguity. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-16212) Cache partitions by TopicIdPartition instead of TopicPartition
[ https://issues.apache.org/jira/browse/KAFKA-16212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817425#comment-17817425 ] Omnia Ibrahim commented on KAFKA-16212: --- Actually there is another proposal 4# which is to wait till we drop ZK before handling this Jira this will get us to simply replace TopicPartition by TopicIdPrtition as no were in the code we will have situation where topic id might be none. > Cache partitions by TopicIdPartition instead of TopicPartition > -- > > Key: KAFKA-16212 > URL: https://issues.apache.org/jira/browse/KAFKA-16212 > Project: Kafka > Issue Type: Improvement >Affects Versions: 3.7.0 >Reporter: Gaurav Narula >Assignee: Omnia Ibrahim >Priority: Major > > From the discussion in [PR > 15263|https://github.com/apache/kafka/pull/15263#discussion_r1471075201], it > would be better to cache {{allPartitions}} by {{TopicIdPartition}} instead of > {{TopicPartition}} to avoid ambiguity. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-16212) Cache partitions by TopicIdPartition instead of TopicPartition
[ https://issues.apache.org/jira/browse/KAFKA-16212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817412#comment-17817412 ] Omnia Ibrahim commented on KAFKA-16212: --- The moment the cache in `ReplicaManager.allPartitions` is represented as `Pool[TopicPartition, HostedPartition]` which is a wrapper around `ConcurrentHashMap[opicPartition, HostedPartition]` update this to use `TopicIdPartition` as a key turn out to be tricky as # not all APIs that interact with `ReplicaManager` in order to fetch/update partition cache are aware of topicId like consumer coordinator, handling some requests in KafkaApi where the request schema doesn't have topicId, etc . # TopicId is represented as Optional in many places which means we might endup populate it with null or dummy uuid multiple times to construct TopicIdPartition. I have 3 proposals at the moment: * *Proposal #1 :* Update TopicIdPartitions to have constructor with topicId as optional. And change `ReplicaManager.allPartitions` to be `LinkedBlockingQueue[TopicIdPartition, HostedPartition]`. _*This might be the simplest one as far as I can see.*_ ** any API that is not topic id aware will just get the last entry that match topicIdPartition.topicPartition. ** The code will need to make sure that we don't have duplicates by `TopicIdPartition` in the `LinkedBlockingQueue`. ** We will need to revert having topic Id as optional in TopicIdPartitions once everywhere in Kafka is topic-id aware. * *Proposal #2 :* change `ReplicaManager.allPartitions` to `new Pool[TopicPartition, LinkedBlockingQueue[(Option[Uuid], HostedPartition)]]` where `Option[Uuid]` represent topic id. This make the cache scheme bit complex. The proposal will ** consider the last entry in `LinkedBlockingQueue` is the current value. ** The code will make sure that `LinkedBlockingQueue` has only entry for the same topic id ** Topic Id aware APIs that need to fetch/update the partition will be updated to use `TopicPartition` and topic Id ** Topic Id non-aware APIs will remain using topic partitions and the replicaManager will assume that these APIs referring to the last entry in `LinkedBlockingQueue` * *Proposal#3:* The other option is to keep two separate caches one `Pool[TopicIdPartition, HostedPartition]` for partitions and another one `Pool[TopicPartition, Uuid]` for the last assigned topic id for each partition in order to form `TopicIdPartition`. This is the least favourite as having 2 caches will risk that one of them can go out of data at any time. [~jolshan] Do you have any strong preferences? I am leaning toward 1st as it is less messy than the others. WDYT? > Cache partitions by TopicIdPartition instead of TopicPartition > -- > > Key: KAFKA-16212 > URL: https://issues.apache.org/jira/browse/KAFKA-16212 > Project: Kafka > Issue Type: Improvement >Affects Versions: 3.7.0 >Reporter: Gaurav Narula >Assignee: Omnia Ibrahim >Priority: Major > > From the discussion in [PR > 15263|https://github.com/apache/kafka/pull/15263#discussion_r1471075201], it > would be better to cache {{allPartitions}} by {{TopicIdPartition}} instead of > {{TopicPartition}} to avoid ambiguity. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-16212) Cache partitions by TopicIdPartition instead of TopicPartition
[ https://issues.apache.org/jira/browse/KAFKA-16212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17812824#comment-17812824 ] Lianet Magrans commented on KAFKA-16212: Similar situation and approach on the client side btw [~jolshan]. With the new consumer and KIP-848 we're moving in the direction of internally spreading the use of topic IDs more...step by step. > Cache partitions by TopicIdPartition instead of TopicPartition > -- > > Key: KAFKA-16212 > URL: https://issues.apache.org/jira/browse/KAFKA-16212 > Project: Kafka > Issue Type: Improvement >Affects Versions: 3.7.0 >Reporter: Gaurav Narula >Assignee: Omnia Ibrahim >Priority: Major > > From the discussion in [PR > 15263|https://github.com/apache/kafka/pull/15263#discussion_r1471075201], it > would be better to cache {{allPartitions}} by {{TopicIdPartition}} instead of > {{TopicPartition}} to avoid ambiguity. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-16212) Cache partitions by TopicIdPartition instead of TopicPartition
[ https://issues.apache.org/jira/browse/KAFKA-16212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17812822#comment-17812822 ] Justine Olshan commented on KAFKA-16212: When working on KIP-516 (topic IDs) there were many areas that topic IDs could help but changing them all in one go would have been difficult. I'm happy to see more steps in this direction :) > Cache partitions by TopicIdPartition instead of TopicPartition > -- > > Key: KAFKA-16212 > URL: https://issues.apache.org/jira/browse/KAFKA-16212 > Project: Kafka > Issue Type: Improvement >Affects Versions: 3.7.0 >Reporter: Gaurav Narula >Assignee: Omnia Ibrahim >Priority: Major > > From the discussion in [PR > 15263|https://github.com/apache/kafka/pull/15263#discussion_r1471075201], it > would be better to cache {{allPartitions}} by {{TopicIdPartition}} instead of > {{TopicPartition}} to avoid ambiguity. -- This message was sent by Atlassian Jira (v8.20.10#820010)