[jira] [Created] (KAFKA-16706) Refactor ReplicationQuotaManager/RLMQuotaManager to eliminate code duplication

2024-05-12 Thread Abhijeet Kumar (Jira)
Abhijeet Kumar created KAFKA-16706:
--

 Summary: Refactor ReplicationQuotaManager/RLMQuotaManager to 
eliminate code duplication
 Key: KAFKA-16706
 URL: https://issues.apache.org/jira/browse/KAFKA-16706
 Project: Kafka
  Issue Type: Task
Reporter: Abhijeet Kumar


ReplicationQuotaManager and RLMQuotaManager implementations are similar. We 
should explore ways to refactor them to remove code duplication.
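
One possible direction, sketched under assumptions: move the shared rate-tracking logic into a common base class and keep only the differing parts in each manager. The names below (WindowedQuota, ReplicationQuota, RemoteLogQuota) are hypothetical and do not mirror the actual Kafka classes, and the single fixed window is a simplification chosen for the example.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Hypothetical shared base: counts bytes recorded in the current fixed window
// and compares the total with a configurable quota.
abstract class WindowedQuota {
    private final LongSupplier clockMs; // injectable clock (assumption, eases testing)
    private final long windowMs;
    private long quotaBytesPerWindow;
    private long windowStartMs;
    private long bytesInWindow;

    WindowedQuota(long quotaBytesPerWindow, long windowMs, LongSupplier clockMs) {
        this.quotaBytesPerWindow = quotaBytesPerWindow;
        this.windowMs = windowMs;
        this.clockMs = clockMs;
        this.windowStartMs = clockMs.getAsLong();
    }

    synchronized void record(long bytes) {
        rollWindowIfNeeded();
        bytesInWindow += bytes;
    }

    synchronized boolean isQuotaExceeded() {
        rollWindowIfNeeded();
        return bytesInWindow > quotaBytesPerWindow;
    }

    synchronized void updateQuota(long newQuotaBytesPerWindow) {
        this.quotaBytesPerWindow = newQuotaBytesPerWindow;
    }

    // Start a fresh window once the current one has elapsed.
    private void rollWindowIfNeeded() {
        long now = clockMs.getAsLong();
        if (now - windowStartMs >= windowMs) {
            windowStartMs = now;
            bytesInWindow = 0;
        }
    }
}

// Replication-specific part in this sketch: the set of throttled partitions.
final class ReplicationQuota extends WindowedQuota {
    private final Set<String> throttledPartitions = ConcurrentHashMap.newKeySet();

    ReplicationQuota(long quotaBytesPerWindow, long windowMs, LongSupplier clockMs) {
        super(quotaBytesPerWindow, windowMs, clockMs);
    }

    void markThrottled(String topicPartition) {
        throttledPartitions.add(topicPartition);
    }

    boolean isThrottled(String topicPartition) {
        return throttledPartitions.contains(topicPartition);
    }
}

// Remote-log part in this sketch: nothing beyond the shared behaviour.
final class RemoteLogQuota extends WindowedQuota {
    RemoteLogQuota(long quotaBytesPerWindow, long windowMs, LongSupplier clockMs) {
        super(quotaBytesPerWindow, windowMs, clockMs);
    }
}
```

Whether inheritance or a shared helper object is the better fit depends on how much the two managers actually diverge, which the issue leaves open.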



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-15181) Race condition on partition assigned to TopicBasedRemoteLogMetadataManager

2023-09-07 Thread Abhijeet Kumar (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhijeet Kumar resolved KAFKA-15181.

Resolution: Fixed

> Race condition on partition assigned to TopicBasedRemoteLogMetadataManager 
> ---
>
> Key: KAFKA-15181
> URL: https://issues.apache.org/jira/browse/KAFKA-15181
> Project: Kafka
>  Issue Type: Sub-task
>  Components: core
>Reporter: Jorge Esteban Quilcate Otoya
>Assignee: Abhijeet Kumar
>Priority: Major
>  Labels: tiered-storage
>
> TopicBasedRemoteLogMetadataManager (TBRLMM) uses a cache to be prepared 
> whenever partitions are assigned.
> When partitions are assigned to the TBRLMM instance, a consumer is started to 
> keep the cache up to date.
> If the cache has not finished building, TBRLMM fails to return remote 
> metadata about partitions that are stored on the backing topic. TBRLMM may not 
> recover from this failing state.
> One proposed fix is to wait, after a partition is assigned, for 
> the consumer to catch up. Similar logic is already used when TBRLMM 
> writes to the topic and uses the send callback to wait for the consumer to catch up. 
> This logic can be reused whenever a partition is assigned, so that when TBRLMM is 
> marked as initialized, the cache is ready to serve requests.
> Reference: https://github.com/aiven/kafka/issues/33
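
The "wait for the consumer to catch up" idea in the description can be sketched as a small helper that blocks until a consumed offset reaches a target, with a timeout. This is a minimal illustration, not the TBRLMM code: CatchUpWaiter, markConsumed and awaitCaughtUp are hypothetical names, and the real implementation tracks offsets per metadata partition.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper: lets a caller block until the metadata consumer
// has applied records up to a target offset.
final class CatchUpWaiter {
    private long consumedOffset = -1L;

    // Called by the (hypothetical) consumer thread after a record is applied to the cache.
    synchronized void markConsumed(long offset) {
        consumedOffset = Math.max(consumedOffset, offset);
        notifyAll();
    }

    // Returns true once consumedOffset >= targetOffset, or false on timeout or interrupt.
    synchronized boolean awaitCaughtUp(long targetOffset, long timeoutMs) {
        long deadlineNs = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (consumedOffset < targetOffset) {
            long remainingMs = TimeUnit.NANOSECONDS.toMillis(deadlineNs - System.nanoTime());
            if (remainingMs <= 0) {
                return false; // timed out before the consumer caught up
            }
            try {
                wait(remainingMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return false;
            }
        }
        return true;
    }
}
```

In this sketch, the assignment path would call awaitCaughtUp with the end offset of the metadata partition before marking the manager as initialized.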





[jira] [Resolved] (KAFKA-15261) ReplicaFetcher thread should not block if RLMM is not initialized

2023-09-05 Thread Abhijeet Kumar (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Abhijeet Kumar resolved KAFKA-15261.

Resolution: Fixed

> ReplicaFetcher thread should not block if RLMM is not initialized
> -
>
> Key: KAFKA-15261
> URL: https://issues.apache.org/jira/browse/KAFKA-15261
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Abhijeet Kumar
>Assignee: Abhijeet Kumar
>Priority: Blocker
> Fix For: 3.6.0
>
>
> While building remote log aux state, the replica fetcher fetches the remote 
> log segment metadata. If the TBRLMM is not initialized yet, the call blocks. 
> Since replica fetchers share a common lock, this prevents other replica 
> fetchers from running as well. The same lock is also used in the 
> LeaderAndISR request handling path, so those calls are blocked too.
> Instead, the replica fetcher should check whether RLMM is initialized before 
> attempting to fetch the remote log segment metadata. If RLMM is not 
> initialized, it should throw a retryable error so that the fetch can be retried 
> later without blocking other operations.
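
A rough sketch of the fail-fast behaviour described above: the metadata lookup checks an initialization flag and throws a retryable error instead of blocking, so the shared lock is released promptly. All names here (NotYetInitializedException, MetadataStoreSketch, FetcherSketch) are hypothetical stand-ins, not the classes or exceptions used in the Kafka code base.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical retryable error signalling "try again later".
class NotYetInitializedException extends RuntimeException {
    NotYetInitializedException(String message) {
        super(message);
    }
}

// Hypothetical stand-in for the remote log metadata manager.
final class MetadataStoreSketch {
    private final AtomicBoolean initialized = new AtomicBoolean(false);

    void markInitialized() {
        initialized.set(true);
    }

    // Fails fast instead of blocking when initialization is still in progress.
    String segmentMetadata(String topicPartition) {
        if (!initialized.get()) {
            throw new NotYetInitializedException("metadata not ready for " + topicPartition);
        }
        return "metadata-for-" + topicPartition;
    }
}

// Hypothetical fetcher that holds a lock shared with other fetchers.
final class FetcherSketch {
    private final MetadataStoreSketch store;
    private final ReentrantLock sharedLock;

    FetcherSketch(MetadataStoreSketch store, ReentrantLock sharedLock) {
        this.store = store;
        this.sharedLock = sharedLock;
    }

    // Returns false when the store is not ready; the caller retries on a later pass.
    boolean tryBuildAuxState(String topicPartition) {
        sharedLock.lock();
        try {
            store.segmentMetadata(topicPartition);
            return true;
        } catch (NotYetInitializedException e) {
            return false;
        } finally {
            sharedLock.unlock(); // lock is released quickly in both cases
        }
    }
}
```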





[jira] [Created] (KAFKA-15405) Create a new error code to indicate a resource is not ready yet

2023-08-25 Thread Abhijeet Kumar (Jira)
Abhijeet Kumar created KAFKA-15405:
--

 Summary: Create a new error code to indicate a resource is not 
ready yet
 Key: KAFKA-15405
 URL: https://issues.apache.org/jira/browse/KAFKA-15405
 Project: Kafka
  Issue Type: Task
Reporter: Abhijeet Kumar


We need a new error code to indicate to the client that the resource is not 
ready on the server yet and is still initializing. When the client receives this 
error, it should retry the request.
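
On the client side, handling such an error could look roughly like the retry loop below. This is only a sketch: ResourceNotReadyException, RetryingCaller and the fixed backoff are assumptions made for illustration, not an existing Kafka error code or client API.

```java
import java.util.function.Supplier;

// Hypothetical exception representing the proposed "resource not ready" error.
class ResourceNotReadyException extends RuntimeException {
    ResourceNotReadyException(String message) {
        super(message);
    }
}

final class RetryingCaller {
    // Retries the request while the server reports that the resource is not ready.
    // Gives up and rethrows the last error after maxAttempts attempts.
    static <T> T callWithRetry(Supplier<T> request, int maxAttempts, long backoffMs) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        ResourceNotReadyException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return request.get();
            } catch (ResourceNotReadyException e) {
                last = e;
                if (attempt < maxAttempts) {
                    sleepQuietly(backoffMs); // fixed backoff keeps the sketch simple
                }
            }
        }
        throw last;
    }

    private static void sleepQuietly(long ms) {
        try {
            Thread.sleep(ms);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```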





[jira] [Created] (KAFKA-15293) Update metrics doc to add tiered storage metrics

2023-08-02 Thread Abhijeet Kumar (Jira)
Abhijeet Kumar created KAFKA-15293:
--

 Summary: Update metrics doc to add tiered storage metrics
 Key: KAFKA-15293
 URL: https://issues.apache.org/jira/browse/KAFKA-15293
 Project: Kafka
  Issue Type: Sub-task
Reporter: Abhijeet Kumar








[jira] [Created] (KAFKA-15261) ReplicaFetcher thread should not block if RLMM is not initialized

2023-07-27 Thread Abhijeet Kumar (Jira)
Abhijeet Kumar created KAFKA-15261:
--

 Summary: ReplicaFetcher thread should not block if RLMM is not 
initialized
 Key: KAFKA-15261
 URL: https://issues.apache.org/jira/browse/KAFKA-15261
 Project: Kafka
  Issue Type: Sub-task
Reporter: Abhijeet Kumar
Assignee: Abhijeet Kumar


While building remote log aux state, the replica fetcher fetches the remote log 
segment metadata. If the TBRLMM is not initialized yet, the call blocks. Since 
replica fetchers share a common lock, this prevents other replica fetchers from 
running as well. The same lock is also used in the LeaderAndISR request 
handling path, so those calls are blocked too.

Instead, the replica fetcher should check whether RLMM is initialized before 
attempting to fetch the remote log segment metadata. If RLMM is not initialized, 
it should throw a retryable error so that the fetch can be retried later without 
blocking other operations.





[jira] [Created] (KAFKA-15260) RLM Task should wait until RLMM is initialized before copying segments to remote

2023-07-27 Thread Abhijeet Kumar (Jira)
Abhijeet Kumar created KAFKA-15260:
--

 Summary: RLM Task should wait until RLMM is initialized before 
copying segments to remote
 Key: KAFKA-15260
 URL: https://issues.apache.org/jira/browse/KAFKA-15260
 Project: Kafka
  Issue Type: Sub-task
Reporter: Abhijeet Kumar


The RLM Task uploads segments to the remote storage for its leader partitions, 
and after each upload it sends a 'COPY_SEGMENT_STARTED' message to the 
topic-based RLMM topic and then waits for the TBRLMM to consume the message 
before continuing.

If the RLMM is not initialized, TBRLMM may not be able to consume the message 
within the stipulated time, so the wait times out and is retried later. It may 
take a few minutes for the TBRLMM to initialize, during which the RLM Task will 
keep timing out.

Instead, the RLM task should wait until RLMM is initialized before attempting to 
copy segments to remote storage.
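
One way to picture the proposed behaviour is a gate that the copy task checks before doing any work, for example with a CountDownLatch. The classes below (InitGate, CopyTaskSketch) are hypothetical and only illustrate the ordering. They are not the RemoteLogManager implementation.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical gate that opens once the metadata manager has finished initializing.
final class InitGate {
    private final CountDownLatch ready = new CountDownLatch(1);

    void markInitialized() {
        ready.countDown();
    }

    // Returns true if initialization completed within the timeout.
    boolean awaitInitialized(long timeoutMs) {
        try {
            return ready.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}

// Hypothetical copy task: one call to runOnce models one scheduled run.
final class CopyTaskSketch {
    private final InitGate gate;
    private int copiedSegments = 0;

    CopyTaskSketch(InitGate gate) {
        this.gate = gate;
    }

    // Skips the run entirely while the gate is closed, so nothing is uploaded
    // and no metadata message is sent before the manager is ready.
    boolean runOnce(long waitMs) {
        if (!gate.awaitInitialized(waitMs)) {
            return false;
        }
        copiedSegments++; // placeholder for the upload and metadata publish
        return true;
    }

    int copiedSegments() {
        return copiedSegments;
    }
}
```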

 





[jira] [Created] (KAFKA-15245) Improve Tiered Storage Metrics

2023-07-24 Thread Abhijeet Kumar (Jira)
Abhijeet Kumar created KAFKA-15245:
--

 Summary: Improve Tiered Storage Metrics
 Key: KAFKA-15245
 URL: https://issues.apache.org/jira/browse/KAFKA-15245
 Project: Kafka
  Issue Type: Sub-task
Reporter: Abhijeet Kumar
Assignee: Abhijeet Kumar


Rename existing tiered storage metrics to remove ambiguity and add metrics for 
the RemoteIndexCache.





[jira] [Created] (KAFKA-15236) Rename Remote Storage metrics to remove ambiguity

2023-07-22 Thread Abhijeet Kumar (Jira)
Abhijeet Kumar created KAFKA-15236:
--

 Summary: Rename Remote Storage metrics to remove ambiguity
 Key: KAFKA-15236
 URL: https://issues.apache.org/jira/browse/KAFKA-15236
 Project: Kafka
  Issue Type: Sub-task
Reporter: Abhijeet Kumar
Assignee: Abhijeet Kumar


As part of the Tiered Storage feature introduced in 
[KIP-405|https://cwiki.apache.org/confluence/display/KAFKA/KIP-405%3A+Kafka+Tiered+Storage],
 we added several metrics for reads from and writes to remote 
storage. The naming convention that was followed is confusing to users.

For example, in regular Kafka, BytesIn means bytes *_written_* to the log, and 
BytesOut means bytes *_read_* from the log. But with tiered storage, the 
concepts are reversed.
 * RemoteBytesIn means "Number of bytes *_read_* from remote storage per second"
 * RemoteBytesOut means "Number of bytes *_written_* to remote storage per 
second"

We should rename the tiered storage related metrics to remove any ambiguity.


