[jira] [Created] (KAFKA-16706) Refactor ReplicationQuotaManager/RLMQuotaManager to eliminate code duplication
Abhijeet Kumar created KAFKA-16706:
Summary: Refactor ReplicationQuotaManager/RLMQuotaManager to eliminate code duplication
Key: KAFKA-16706
URL: https://issues.apache.org/jira/browse/KAFKA-16706
Project: Kafka
Issue Type: Task
Reporter: Abhijeet Kumar

The ReplicationQuotaManager and RLMQuotaManager implementations are similar. We should explore ways to refactor them to remove code duplication.

--
This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-15181) Race condition on partition assigned to TopicBasedRemoteLogMetadataManager
[ https://issues.apache.org/jira/browse/KAFKA-15181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Abhijeet Kumar resolved KAFKA-15181.
Resolution: Fixed

> Race condition on partition assigned to TopicBasedRemoteLogMetadataManager
>
> Key: KAFKA-15181
> URL: https://issues.apache.org/jira/browse/KAFKA-15181
> Project: Kafka
> Issue Type: Sub-task
> Components: core
> Reporter: Jorge Esteban Quilcate Otoya
> Assignee: Abhijeet Kumar
> Priority: Major
> Labels: tiered-storage
>
> TopicBasedRemoteLogMetadataManager (TBRLMM) uses a cache to be prepared whenever partitions are assigned.
> When partitions are assigned to the TBRLMM instance, a consumer is started to keep the cache up to date.
> If the cache has not finished building, TBRLMM fails to return remote metadata about partitions that are stored on the backing topic. TBRLMM may not recover from this failing state.
> One proposed fix is to wait, after a partition is assigned, for the consumer to catch up. Similar logic is already used when TBRLMM writes to the topic: it uses the send callback to wait for the consumer to catch up.
> This logic can be reused whenever a partition is assigned, so that when TBRLMM is marked as initialized, the cache is ready to serve requests.
> Reference: https://github.com/aiven/kafka/issues/33
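The "wait for the consumer to catch up" idea proposed in KAFKA-15181 can be illustrated with a minimal Java sketch. This is not the TBRLMM implementation: `ConsumedOffsetTracker`, `markConsumed`, and `awaitCaughtUp` are hypothetical names, and the sketch assumes the consumer thread reports each offset once the corresponding record has been applied to the cache.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical helper (not a Kafka class): records how far the metadata
// consumer has read in each partition of the backing topic.
class ConsumedOffsetTracker {
    private final Map<Integer, Long> consumedOffsets = new HashMap<>();

    // Called by the consumer thread after a record has been applied to the cache.
    synchronized void markConsumed(int metadataPartition, long offset) {
        consumedOffsets.merge(metadataPartition, offset, Math::max);
        notifyAll(); // wake up any caller waiting for this offset
    }

    // Blocks until the consumer has reached targetOffset on the given
    // partition, or throws TimeoutException once timeoutMs has elapsed.
    synchronized void awaitCaughtUp(int metadataPartition, long targetOffset, long timeoutMs)
            throws InterruptedException, TimeoutException {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs);
        while (consumedOffsets.getOrDefault(metadataPartition, -1L) < targetOffset) {
            long remainingNanos = deadline - System.nanoTime();
            if (remainingNanos <= 0) {
                throw new TimeoutException(
                    "Consumer did not reach offset " + targetOffset
                        + " on metadata partition " + metadataPartition);
            }
            // Releases the monitor while waiting; re-checks the condition on wake-up.
            TimeUnit.NANOSECONDS.timedWait(this, remainingNanos);
        }
    }
}
```

Under these assumptions, the partition-assignment path would call `awaitCaughtUp` with the end offset of the metadata partition before marking the manager as initialized, so the cache is populated before it serves requests.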
[jira] [Resolved] (KAFKA-15261) ReplicaFetcher thread should not block if RLMM is not initialized
[ https://issues.apache.org/jira/browse/KAFKA-15261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Abhijeet Kumar resolved KAFKA-15261.
Resolution: Fixed

> ReplicaFetcher thread should not block if RLMM is not initialized
>
> Key: KAFKA-15261
> URL: https://issues.apache.org/jira/browse/KAFKA-15261
> Project: Kafka
> Issue Type: Sub-task
> Reporter: Abhijeet Kumar
> Assignee: Abhijeet Kumar
> Priority: Blocker
> Fix For: 3.6.0
>
> While building the remote log aux state, the replica fetcher fetches the remote log segment metadata. If the TBRLMM is not initialized yet, the call blocks.
> Since replica fetchers share a common lock, this prevents other replica fetchers from running as well. The same lock is also used in the LeaderAndISR request handling path, so those calls are blocked too.
> Instead, the replica fetcher should check whether the RLMM is initialized before attempting to fetch the remote log segment metadata. If the RLMM is not initialized, it should throw a retryable error so that the fetch can be retried later without blocking other operations.
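The fail-fast behaviour described in KAFKA-15261 can be sketched as follows. This is an illustration only, not the code that resolved the issue: `RemoteMetadataGate`, `ResourceNotReadyException`, and `fetchSegmentMetadata` are hypothetical names. In Kafka itself a retryable error would normally extend `org.apache.kafka.common.errors.RetriableException`; the sketch uses a plain `RuntimeException` so that it stays self-contained.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical retryable error, standing in for a Kafka RetriableException.
class ResourceNotReadyException extends RuntimeException {
    ResourceNotReadyException(String message) {
        super(message);
    }
}

// Hypothetical wrapper around a remote log metadata lookup.
class RemoteMetadataGate {
    private final AtomicBoolean initialized = new AtomicBoolean(false);

    // Called once the metadata manager has finished initializing.
    void markInitialized() {
        initialized.set(true);
    }

    // Returns immediately in both cases: either with metadata, or with a
    // retryable error. It never waits, so a caller holding a shared lock
    // cannot stall other threads on this call.
    String fetchSegmentMetadata(String topicPartition) {
        if (!initialized.get()) {
            throw new ResourceNotReadyException(
                "Remote log metadata is not initialized yet for " + topicPartition);
        }
        // Placeholder for the real lookup, which is out of scope here.
        return "metadata-for-" + topicPartition;
    }
}
```

A caller would catch `ResourceNotReadyException`, release any lock it holds, and retry on its next iteration.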
[jira] [Created] (KAFKA-15405) Create a new error code to indicate a resource is not ready yet
Abhijeet Kumar created KAFKA-15405:
Summary: Create a new error code to indicate a resource is not ready yet
Key: KAFKA-15405
URL: https://issues.apache.org/jira/browse/KAFKA-15405
Project: Kafka
Issue Type: Task
Reporter: Abhijeet Kumar

We need a new error code to indicate to the client that the resource is not yet ready on the server and is still initializing. When the client receives this error, it should retry.
[jira] [Created] (KAFKA-15293) Update metrics doc to add tiered storage metrics
Abhijeet Kumar created KAFKA-15293:
Summary: Update metrics doc to add tiered storage metrics
Key: KAFKA-15293
URL: https://issues.apache.org/jira/browse/KAFKA-15293
Project: Kafka
Issue Type: Sub-task
Reporter: Abhijeet Kumar
[jira] [Created] (KAFKA-15261) ReplicaFetcher thread should not block if RLMM is not initialized
Abhijeet Kumar created KAFKA-15261:
Summary: ReplicaFetcher thread should not block if RLMM is not initialized
Key: KAFKA-15261
URL: https://issues.apache.org/jira/browse/KAFKA-15261
Project: Kafka
Issue Type: Sub-task
Reporter: Abhijeet Kumar
Assignee: Abhijeet Kumar

While building the remote log aux state, the replica fetcher fetches the remote log segment metadata. If the TBRLMM is not initialized yet, the call blocks. Since replica fetchers share a common lock, this prevents other replica fetchers from running as well. The same lock is also used in the LeaderAndISR request handling path, so those calls are blocked too.

Instead, the replica fetcher should check whether the RLMM is initialized before attempting to fetch the remote log segment metadata. If the RLMM is not initialized, it should throw a retryable error so that the fetch can be retried later without blocking other operations.
[jira] [Created] (KAFKA-15260) RLM Task should wait until RLMM is initialized before copying segments to remote
Abhijeet Kumar created KAFKA-15260:
Summary: RLM Task should wait until RLMM is initialized before copying segments to remote
Key: KAFKA-15260
URL: https://issues.apache.org/jira/browse/KAFKA-15260
Project: Kafka
Issue Type: Sub-task
Reporter: Abhijeet Kumar

The RLM Task uploads segments to remote storage for its leader partitions. After each upload it sends a 'COPY_SEGMENT_STARTED' message to the topic-based RLMM topic and then waits for the TBRLMM to consume the message before continuing.

If the RLMM is not initialized, the TBRLMM may not be able to consume the message within the stipulated time, so the wait times out and the operation is retried later. It may take a few minutes for the TBRLMM to initialize, during which the RLM Task keeps timing out.

Instead, the RLM Task should wait until the RLMM is initialized before attempting to copy segments to remote storage.
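One way to picture the proposal in KAFKA-15260 is a latch that the copy task checks before it does any work. The sketch below is an assumption-laden illustration, not Kafka's code: `InitializationLatch`, `CopyTask`, `runOnce`, and `copiedSegments` are hypothetical names, and the actual segment upload is replaced by a counter.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical readiness signal shared by the metadata manager and the copy task.
class InitializationLatch {
    private final CountDownLatch ready = new CountDownLatch(1);

    // Called by the metadata manager once initialization is complete.
    void markInitialized() {
        ready.countDown();
    }

    // Returns true if initialization completed within the given time.
    boolean awaitInitialized(long timeout, TimeUnit unit) throws InterruptedException {
        return ready.await(timeout, unit);
    }
}

// Hypothetical stand-in for the task that copies segments to remote storage.
class CopyTask {
    private final InitializationLatch latch;
    private int copiedSegments = 0;

    CopyTask(InitializationLatch latch) {
        this.latch = latch;
    }

    // Runs one iteration. Returns false, without attempting an upload,
    // if the metadata manager is still initializing after waitMs.
    boolean runOnce(long waitMs) throws InterruptedException {
        if (!latch.awaitInitialized(waitMs, TimeUnit.MILLISECONDS)) {
            return false; // skip this round instead of uploading and timing out
        }
        copiedSegments++; // placeholder for the real segment upload
        return true;
    }

    int copiedSegments() {
        return copiedSegments;
    }
}
```

With this shape, no segment is uploaded and no 'COPY_SEGMENT_STARTED' message is sent until the readiness signal has fired, which avoids the repeated timeouts described in the issue.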
[jira] [Created] (KAFKA-15245) Improve Tiered Storage Metrics
Abhijeet Kumar created KAFKA-15245:
Summary: Improve Tiered Storage Metrics
Key: KAFKA-15245
URL: https://issues.apache.org/jira/browse/KAFKA-15245
Project: Kafka
Issue Type: Sub-task
Reporter: Abhijeet Kumar
Assignee: Abhijeet Kumar

Rename existing tiered storage metrics to remove ambiguity and add metrics for the RemoteIndexCache.
[jira] [Created] (KAFKA-15236) Rename Remote Storage metrics to remove ambiguity
Abhijeet Kumar created KAFKA-15236:
Summary: Rename Remote Storage metrics to remove ambiguity
Key: KAFKA-15236
URL: https://issues.apache.org/jira/browse/KAFKA-15236
Project: Kafka
Issue Type: Sub-task
Reporter: Abhijeet Kumar
Assignee: Abhijeet Kumar

As part of the Tiered Storage feature introduced in [KIP-405|https://cwiki.apache.org/confluence/display/KAFKA/KIP-405%3A+Kafka+Tiered+Storage], we added several metrics for reads from and writes to remote storage. The naming convention that was followed is confusing to users.

For example, in regular Kafka, BytesIn means bytes *_written_* to the log, and BytesOut means bytes *_read_* from the log. With tiered storage, the concepts are reversed:
* RemoteBytesIn means "Number of bytes *_read_* from remote storage per second"
* RemoteBytesOut means "Number of bytes *_written_* to remote storage per second"

We should rename the tiered storage metrics to remove this ambiguity.