[jira] [Commented] (KAFKA-16212) Cache partitions by TopicIdPartition instead of TopicPartition

2024-04-25 Thread Omnia Ibrahim (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840715#comment-17840715
 ] 

Omnia Ibrahim commented on KAFKA-16212:
---

I believe that https://issues.apache.org/jira/browse/KAFKA-10551 (and possibly 
KAFKA-10549) needs to be done first as  ProoduceRequest and Respond interact 
with replicaManager cache to append log and it would be easier if this produce 
request path is already topic ID aware before updating the cache. I am 
prioritising KAFKA-10551 

> Cache partitions by TopicIdPartition instead of TopicPartition
> --
>
> Key: KAFKA-16212
> URL: https://issues.apache.org/jira/browse/KAFKA-16212
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 3.7.0
>Reporter: Gaurav Narula
>Assignee: Omnia Ibrahim
>Priority: Major
>
> From the discussion in [PR 
> 15263|https://github.com/apache/kafka/pull/15263#discussion_r1471075201], it 
> would be better to cache {{allPartitions}} by {{TopicIdPartition}} instead of 
> {{TopicPartition}} to avoid ambiguity.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-16212) Cache partitions by TopicIdPartition instead of TopicPartition

2024-02-20 Thread Justine Olshan (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819023#comment-17819023
 ] 

Justine Olshan commented on KAFKA-16212:


We use something similar in the fetch session cache when the topic ID is 
unknown. 
This transition state will be as long as we support ZK I would suspect. 

 

> how would this look like that I need to be aware off during extending 
> ReplicaManager cache to be topicId aware
Not sure I understand this question.

> Cache partitions by TopicIdPartition instead of TopicPartition
> --
>
> Key: KAFKA-16212
> URL: https://issues.apache.org/jira/browse/KAFKA-16212
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 3.7.0
>Reporter: Gaurav Narula
>Assignee: Omnia Ibrahim
>Priority: Major
>
> From the discussion in [PR 
> 15263|https://github.com/apache/kafka/pull/15263#discussion_r1471075201], it 
> would be better to cache {{allPartitions}} by {{TopicIdPartition}} instead of 
> {{TopicPartition}} to avoid ambiguity.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-16212) Cache partitions by TopicIdPartition instead of TopicPartition

2024-02-20 Thread Omnia Ibrahim (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818991#comment-17818991
 ] 

Omnia Ibrahim commented on KAFKA-16212:
---

I don't believe ReplicaManager have significant meaning for topic Id zero or 
null. The code related to KRAFT it assume there will always be a topic Id, 
while other codes that doesn't care about topic Id and interact with 
ReplicaManager either not updated yet or doesn't have topic Id awareness 
design. So theoretically this will simplify proposal #1. 

However we will have to 
1. have validation in varies places to handle topic Id as dummy values. 
2. we might need to revert these dummy value and some of the validations later 
in the future. 

I think if we have been using similar approach in other places then proposal#1 
should be fine. 

With all of that said I have one worry regarding the code readability and 
maintenance as having topic Id as Option/Optional.empty/Null/Zero UUID as dummy 
values in the APIs in different places might be a bit confusing during 
extending the code. Is there an agreement as part of KIP-516 for how long this 
transition state will last before having topic id in most places and how would 
this look like that I need to be aware off during extending ReplicaManager 
cache to be topicId aware?

> Cache partitions by TopicIdPartition instead of TopicPartition
> --
>
> Key: KAFKA-16212
> URL: https://issues.apache.org/jira/browse/KAFKA-16212
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 3.7.0
>Reporter: Gaurav Narula
>Assignee: Omnia Ibrahim
>Priority: Major
>
> From the discussion in [PR 
> 15263|https://github.com/apache/kafka/pull/15263#discussion_r1471075201], it 
> would be better to cache {{allPartitions}} by {{TopicIdPartition}} instead of 
> {{TopicPartition}} to avoid ambiguity.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-16212) Cache partitions by TopicIdPartition instead of TopicPartition

2024-02-15 Thread Justine Olshan (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817795#comment-17817795
 ] 

Justine Olshan commented on KAFKA-16212:


Thanks [~omnia_h_ibrahim] I remember this particularly being a tricky area for 
topic IDs, so I appreciate the time being spent here.

 

For fetch requests, I remember we had to do something similar to proposal 1 for 
the fetch path. When we receive fetch request < 13, we will have a zero topic 
ID stored. (Note – this is a placeholder and can not be used as a valid topic 
ID)


Does using the zero uuid make some of option 1 easier? Or are we already using 
the zero uuid to signify something?

> Cache partitions by TopicIdPartition instead of TopicPartition
> --
>
> Key: KAFKA-16212
> URL: https://issues.apache.org/jira/browse/KAFKA-16212
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 3.7.0
>Reporter: Gaurav Narula
>Assignee: Omnia Ibrahim
>Priority: Major
>
> From the discussion in [PR 
> 15263|https://github.com/apache/kafka/pull/15263#discussion_r1471075201], it 
> would be better to cache {{allPartitions}} by {{TopicIdPartition}} instead of 
> {{TopicPartition}} to avoid ambiguity.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-16212) Cache partitions by TopicIdPartition instead of TopicPartition

2024-02-14 Thread Omnia Ibrahim (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817425#comment-17817425
 ] 

Omnia Ibrahim commented on KAFKA-16212:
---

Actually there is another proposal 4# which is to wait till we drop ZK before 
handling this Jira this will get us to simply replace TopicPartition by 
TopicIdPrtition as no were in the code we will have situation where topic id 
might be none. 

> Cache partitions by TopicIdPartition instead of TopicPartition
> --
>
> Key: KAFKA-16212
> URL: https://issues.apache.org/jira/browse/KAFKA-16212
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 3.7.0
>Reporter: Gaurav Narula
>Assignee: Omnia Ibrahim
>Priority: Major
>
> From the discussion in [PR 
> 15263|https://github.com/apache/kafka/pull/15263#discussion_r1471075201], it 
> would be better to cache {{allPartitions}} by {{TopicIdPartition}} instead of 
> {{TopicPartition}} to avoid ambiguity.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-16212) Cache partitions by TopicIdPartition instead of TopicPartition

2024-02-14 Thread Omnia Ibrahim (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17817412#comment-17817412
 ] 

Omnia Ibrahim commented on KAFKA-16212:
---

The moment the cache in `ReplicaManager.allPartitions` is represented as 
`Pool[TopicPartition, HostedPartition]` which is a wrapper around 
`ConcurrentHashMap[opicPartition, HostedPartition]` update this to use 
`TopicIdPartition` as a key turn out to be tricky as 
 # not all APIs that interact with `ReplicaManager` in order to fetch/update 
partition cache are aware of topicId like consumer coordinator, handling some 
requests in KafkaApi where the request schema doesn't have topicId, etc .
 # TopicId is represented as Optional in many places which means we might endup 
populate it with null or dummy uuid multiple times to construct 
TopicIdPartition. 



I have 3 proposals at the moment:
 * *Proposal #1 :* Update TopicIdPartitions to have constructor with topicId as 
optional. And change `ReplicaManager.allPartitions` to be 
`LinkedBlockingQueue[TopicIdPartition, HostedPartition]`. _*This might be the 
simplest one as far as I can see.*_ 
 ** any API that is not topic id aware will just get the last entry that match 
topicIdPartition.topicPartition.
 ** The code will need to make sure that we don't have duplicates by 
`TopicIdPartition` in the `LinkedBlockingQueue`.
 ** We will need to revert having topic Id as optional in TopicIdPartitions 
once everywhere in Kafka is topic-id aware.


 * *Proposal #2 :* change `ReplicaManager.allPartitions` to `new 
Pool[TopicPartition, LinkedBlockingQueue[(Option[Uuid], HostedPartition)]]` 
where `Option[Uuid]` represent topic id. This make the cache scheme bit 
complex. The proposal will 

 ** consider the last entry in `LinkedBlockingQueue` is the current value.
 ** The code will make sure that `LinkedBlockingQueue` has only entry for the 
same topic id 
 ** Topic Id aware APIs that need to fetch/update the partition will be updated 
to use `TopicPartition` and topic Id
 ** Topic Id non-aware APIs will remain using topic partitions and the 
replicaManager will assume that these APIs referring to the last entry in 
`LinkedBlockingQueue`


 * *Proposal#3:* The other option is to keep two separate caches one 
`Pool[TopicIdPartition, HostedPartition]` for partitions and another one 
`Pool[TopicPartition, Uuid]` for the last assigned topic id for each partition 
in order to form `TopicIdPartition`. This is the least favourite as having 2 
caches will risk that one of them can go out of data at any time.


[~jolshan] Do you have any strong preferences? I am leaning toward 1st as it is 
less messy than the others. WDYT?

> Cache partitions by TopicIdPartition instead of TopicPartition
> --
>
> Key: KAFKA-16212
> URL: https://issues.apache.org/jira/browse/KAFKA-16212
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 3.7.0
>Reporter: Gaurav Narula
>Assignee: Omnia Ibrahim
>Priority: Major
>
> From the discussion in [PR 
> 15263|https://github.com/apache/kafka/pull/15263#discussion_r1471075201], it 
> would be better to cache {{allPartitions}} by {{TopicIdPartition}} instead of 
> {{TopicPartition}} to avoid ambiguity.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-16212) Cache partitions by TopicIdPartition instead of TopicPartition

2024-01-31 Thread Lianet Magrans (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17812824#comment-17812824
 ] 

Lianet Magrans commented on KAFKA-16212:


Similar situation and approach on the client side btw [~jolshan]. With the new 
consumer and KIP-848 we're moving in the direction of internally spreading the 
use of topic IDs more...step by step.

> Cache partitions by TopicIdPartition instead of TopicPartition
> --
>
> Key: KAFKA-16212
> URL: https://issues.apache.org/jira/browse/KAFKA-16212
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 3.7.0
>Reporter: Gaurav Narula
>Assignee: Omnia Ibrahim
>Priority: Major
>
> From the discussion in [PR 
> 15263|https://github.com/apache/kafka/pull/15263#discussion_r1471075201], it 
> would be better to cache {{allPartitions}} by {{TopicIdPartition}} instead of 
> {{TopicPartition}} to avoid ambiguity.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-16212) Cache partitions by TopicIdPartition instead of TopicPartition

2024-01-31 Thread Justine Olshan (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17812822#comment-17812822
 ] 

Justine Olshan commented on KAFKA-16212:


When working on KIP-516 (topic IDs) there were many areas that topic IDs could 
help but changing them all in one go would have been difficult. I'm happy to 
see more steps in this direction :) 

> Cache partitions by TopicIdPartition instead of TopicPartition
> --
>
> Key: KAFKA-16212
> URL: https://issues.apache.org/jira/browse/KAFKA-16212
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 3.7.0
>Reporter: Gaurav Narula
>Assignee: Omnia Ibrahim
>Priority: Major
>
> From the discussion in [PR 
> 15263|https://github.com/apache/kafka/pull/15263#discussion_r1471075201], it 
> would be better to cache {{allPartitions}} by {{TopicIdPartition}} instead of 
> {{TopicPartition}} to avoid ambiguity.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)