[jira] [Created] (KAFKA-17637) Invert the search for LIST_OFFSETS request for remote storage topic
Kamal Chandraprakash created KAFKA-17637: Summary: Invert the search for LIST_OFFSETS request for remote storage topic Key: KAFKA-17637 URL: https://issues.apache.org/jira/browse/KAFKA-17637 Project: Kafka Issue Type: Task Reporter: Kamal Chandraprakash The timestamp in the records are non-monotonic so we begin the search from earliest to latest offset for LIST_OFFSETS request. When tiered storage is enabled for a topic, then we begin the search from remote to local storage. There can be possible concurrency issue that can happen, when the search moves from remote to local storage, some of the local-log segments might get uploaded to remote and deleted from local in the meantime. This can lead to loss of precision in returning the offset for the given timestamp. If this issue happens, then we might silently search for the timestamp in the next/available local-log segment. One way to fix this issue, is-to trigger the search in local-log first, then move to remote-log, and compare the result. The approach is explained in: https://github.com/apache/kafka/pull/16602#discussion_r1759757001 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-15266) Static configs set for non primary synonyms are ignored for Log configs
[ https://issues.apache.org/jira/browse/KAFKA-15266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kamal Chandraprakash resolved KAFKA-15266. -- Resolution: Duplicate > Static configs set for non primary synonyms are ignored for Log configs > --- > > Key: KAFKA-15266 > URL: https://issues.apache.org/jira/browse/KAFKA-15266 > Project: Kafka > Issue Type: Bug > Components: core >Affects Versions: 2.6.0 >Reporter: Aman Harish Gandhi >Assignee: Aman Harish Gandhi >Priority: Major > > In our server.properties we had the following config > {code:java} > log.retention.hours=48 > {code} > We noticed that after running alter configs to update broker level config(for > a config unrelated to retention) we were only deleting data after 7 days > instead of the configured 2. > The alterconfig we had ran was similar to this > {code:java} > sh kafka-config.sh --bootstrap-server localhost:9092 --alter --add-config > "log.segment.bytes=50" > {code} > Digging deeper the issue could be pin pointed to the reconfigure block of > DynamicLogConfig inside DynamicBrokerConfig. Here we only look at the > "primary" KafkaConfig synonym of the LogConfig and if it is not set then we > remove the value set in default log config as well. This eventually leads to > the retention.ms not being set in the default log config and that leads to > the default value of 7 days being used. The value set in > "log.retention.hours" is completely ignored in this case. > Pasting the relevant code block here > {code:java} > newConfig.valuesFromThisConfig.forEach { (k, v) => > if (DynamicLogConfig.ReconfigurableConfigs.contains(k)) { > DynamicLogConfig.KafkaConfigToLogConfigName.get(k).foreach { configName => > if (v == null) > newBrokerDefaults.remove(configName) > else > newBrokerDefaults.put(configName, v.asInstanceOf[AnyRef]) > } > } > } {code} > In the above block `DynamicLogConfig.ReconfigurableConfigs` contains only > log.retention.ms. It does not contain the other synonyms like > `log.retention.minutes` or `log.retention.hours`. > This issue seems prevalent in all cases where there are more than 1 > KafkaConfig synonyms for the LogConfig. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-17552) Handle LIST_OFFSETS request for max_timestamp when remote storage is enabled
Kamal Chandraprakash created KAFKA-17552: Summary: Handle LIST_OFFSETS request for max_timestamp when remote storage is enabled Key: KAFKA-17552 URL: https://issues.apache.org/jira/browse/KAFKA-17552 Project: Kafka Issue Type: Task Reporter: Kamal Chandraprakash context: https://github.com/apache/kafka/pull/16602#discussion_r1759135392 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Reopened] (KAFKA-15420) Kafka Tiered Storage V1
[ https://issues.apache.org/jira/browse/KAFKA-15420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kamal Chandraprakash reopened KAFKA-15420: -- > Kafka Tiered Storage V1 > --- > > Key: KAFKA-15420 > URL: https://issues.apache.org/jira/browse/KAFKA-15420 > Project: Kafka > Issue Type: Improvement >Affects Versions: 3.6.0 >Reporter: Satish Duggana >Assignee: Satish Duggana >Priority: Major > Labels: KIP-405 > Fix For: 3.8.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-15420) Kafka Tiered Storage V1
[ https://issues.apache.org/jira/browse/KAFKA-15420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kamal Chandraprakash resolved KAFKA-15420. -- Resolution: Fixed > Kafka Tiered Storage V1 > --- > > Key: KAFKA-15420 > URL: https://issues.apache.org/jira/browse/KAFKA-15420 > Project: Kafka > Issue Type: Improvement >Affects Versions: 3.6.0 >Reporter: Satish Duggana >Assignee: Satish Duggana >Priority: Major > Labels: KIP-405 > Fix For: 3.8.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16948) Reset tier lag metrics on becoming follower
Kamal Chandraprakash created KAFKA-16948: Summary: Reset tier lag metrics on becoming follower Key: KAFKA-16948 URL: https://issues.apache.org/jira/browse/KAFKA-16948 Project: Kafka Issue Type: Task Reporter: Kamal Chandraprakash Assignee: Kamal Chandraprakash Tier lag metrics such as remoteCopyLagBytes and remoteCopyLagSegments are not cleared sometimes when the node transitions from leader to follower post a rolling restart. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-15777) Configurable remote fetch bytes per partition from Consumer
[ https://issues.apache.org/jira/browse/KAFKA-15777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kamal Chandraprakash resolved KAFKA-15777. -- Resolution: Won't Fix > Configurable remote fetch bytes per partition from Consumer > --- > > Key: KAFKA-15777 > URL: https://issues.apache.org/jira/browse/KAFKA-15777 > Project: Kafka > Issue Type: Task >Reporter: Kamal Chandraprakash >Assignee: Kamal Chandraprakash >Priority: Major > Labels: kip > > A consumer can configure the amount of local bytes to read from each > partition in the FETCH request. > {{max.fetch.bytes}} = 50 MB > {{max.partition.fetch.bytes}} = 1 MB > Similar to this, the consumer should be able to configure > {{max.remote.partition.fetch.bytes}} = 4 MB. > While handling the {{FETCH}} request, if we encounter a partition to read > data from remote storage, then rest of the partitions in the request are > ignored. Essentially, we are serving only 1 MB of remote data per FETCH > request when all the partitions in the request are to be served from the > remote storage. > Providing one more configuration to the client help the user to tune the > values depending on their storage plugin. The user might want to optimise the > number of calls to remote storage vs amount of bytes returned back to the > client in the FETCH response. > [https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/server/ReplicaManager.scala#L1454] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16904) Metric to measure the latency of remote read requests
Kamal Chandraprakash created KAFKA-16904: Summary: Metric to measure the latency of remote read requests Key: KAFKA-16904 URL: https://issues.apache.org/jira/browse/KAFKA-16904 Project: Kafka Issue Type: Task Reporter: Kamal Chandraprakash Assignee: Kamal Chandraprakash [KIP-1018|https://cwiki.apache.org/confluence/display/KAFKA/KIP-1018%3A+Introduce+max+remote+fetch+timeout+config+for+DelayedRemoteFetch+requests] Emit a new metric to measure the amount of time taken to read from the remote storage: {{kafka.log.remote:type=RemoteLogManager,name=RemoteLogReaderFetchRateAndTimeMs}} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16882) Migrate RemoteLogSegmentLifecycleTest to new test infra
Kamal Chandraprakash created KAFKA-16882: Summary: Migrate RemoteLogSegmentLifecycleTest to new test infra Key: KAFKA-16882 URL: https://issues.apache.org/jira/browse/KAFKA-16882 Project: Kafka Issue Type: Test Reporter: Kamal Chandraprakash Assignee: Kuan Po Tseng Fix For: 3.9.0 as title -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16780) Txn consumer exerts pressure on remote storage when reading non-txn topic
Kamal Chandraprakash created KAFKA-16780: Summary: Txn consumer exerts pressure on remote storage when reading non-txn topic Key: KAFKA-16780 URL: https://issues.apache.org/jira/browse/KAFKA-16780 Project: Kafka Issue Type: Task Reporter: Kamal Chandraprakash h3. Logic to read aborted txns: # When the consumer enables isolation_level as {{READ_COMMITTED}} and reads a non-txn topic, then the broker has to [traverse|https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/log/LocalLog.scala#L394] all the local log segments to collect the aborted transactions since there won't be any entry in the transaction index. # The same [logic|https://github.com/apache/kafka/blob/trunk/core/src/main/java/kafka/log/remote/RemoteLogManager.java#L1436] is applied while reading from remote storage. In this case, when the FETCH request is reading data from the first remote log segment, then it has to fetch the transaction indexes of all the remaining remote-log segments, and then the call lands to the local-log segments before responding to the FETCH request which increases the time taken to serve the requests. The [EoS Abort Index|https://docs.google.com/document/d/1Rlqizmk7QCDe8qAnVW5e5X8rGvn6m2DCR3JR2yqwVjc] design doc explains how the transaction index file filters out the aborted transaction records. The issue is when consumers are enabled with the {{READ_COMMITTED}} isolation level but read the normal topics. If the topic is enabled with the transaction, then we expect the transaction to either commit/rollback within 15 minutes (default transaction.max.timeout.ms = 15 mins), possibly we may have to search only for one (or) two remote log segments. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16696) Remove the in-memory implementation of RSM and RLMM
Kamal Chandraprakash created KAFKA-16696: Summary: Remove the in-memory implementation of RSM and RLMM Key: KAFKA-16696 URL: https://issues.apache.org/jira/browse/KAFKA-16696 Project: Kafka Issue Type: Task Reporter: Kamal Chandraprakash Assignee: Kamal Chandraprakash The in-memory implementation of RSM and RLMM were written to write the unit/integration tests: [https://github.com/apache/kafka/pull/10218] This is not used by any of the tests and superseded by the LocalTieredStorage framework which uses local-disk as secondary storage and topic as RLMM. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16605) Fix the flaky LogCleanerParameterizedIntegrationTest
Kamal Chandraprakash created KAFKA-16605: Summary: Fix the flaky LogCleanerParameterizedIntegrationTest Key: KAFKA-16605 URL: https://issues.apache.org/jira/browse/KAFKA-16605 Project: Kafka Issue Type: Task Reporter: Kamal Chandraprakash Assignee: Kamal Chandraprakash -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-16456) Can't stop kafka debug logs
[ https://issues.apache.org/jira/browse/KAFKA-16456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kamal Chandraprakash resolved KAFKA-16456. -- Resolution: Not A Problem You can also dynamically change the broker loggers using the {{kafka-configs.sh}} script: (eg) {code} sh kafka-configs.sh --bootstrap-server localhost:9092 --entity-type broker-loggers --entity-name --add-config org.apache.kafka.clients.NetworkClient=INFO --alter {code} > Can't stop kafka debug logs > --- > > Key: KAFKA-16456 > URL: https://issues.apache.org/jira/browse/KAFKA-16456 > Project: Kafka > Issue Type: Bug > Components: logging >Affects Versions: 3.6.0 >Reporter: Rajan Choudhary >Priority: Major > > I am getting kafka debug logs, which are flooding our logs. Sample below > > {code:java} > 09:50:38.073 [kafka-producer-network-thread | maximo-mp] DEBUG > org.apache.kafka.clients.NetworkClient - [Producer clientId=maximo-mp, > transactionalId=sqout-3664816744674374805414] Received API_VERSIONS response > from node 5 for request with header RequestHeader(apiKey=API_VERSIONS, > apiVersion=3, clientId=maximo-mp, correlationId=8): > ApiVersionsResponseData(errorCode=0, apiKeys=[ApiVersion(apiKey=0, > minVersion=0, maxVersion=9), ApiVersion(apiKey=1, minVersion=0, > maxVersion=13), ApiVersion(apiKey=2, minVersion=0, maxVersion=7), > ApiVersion(apiKey=3, minVersion=0, maxVersion=12), ApiVersion(apiKey=4, > minVersion=0, maxVersion=5), ApiVersion(apiKey=5, minVersion=0, > maxVersion=3), ApiVersion(apiKey=6, minVersion=0, maxVersion=7), > ApiVersion(apiKey=7, minVersion=0, maxVersion=3), ApiVersion(apiKey=8, > minVersion=0, maxVersion=8), ApiVersion(apiKey=9, minVersion=0, > maxVersion=8), ApiVersion(apiKey=10, minVersion=0, maxVersion=4), > ApiVersion(apiKey=11, minVersion=0, maxVersion=7), ApiVersion(apiKey=12, > minVersion=0, m... > 09:50:38.073 [kafka-producer-network-thread | maximo-mp] DEBUG > org.apache.kafka.clients.NetworkClient - [Producer clientId=maximo-mp, > transactionalId=sqout-3664816744674374805414] Node 5 has finalized features > epoch: 1, finalized features: [], supported features: [], API versions: > (Produce(0): 0 to 9 [usable: 9], Fetch(1): 0 to 13 [usable: 12], > ListOffsets(2): 0 to 7 [usable: 6], Metadata(3): 0 to 12 [usable: 11], > LeaderAndIsr(4): 0 to 5 [usable: 5], StopReplica(5): 0 to 3 [usable: 3], > UpdateMetadata(6): 0 to 7 [usable: 7], ControlledShutdown(7): 0 to 3 [usable: > 3], OffsetCommit(8): 0 to 8 [usable: 8], OffsetFetch(9): 0 to 8 [usable: 7], > FindCoordinator(10): 0 to 4 [usable: 3], JoinGroup(11): 0 to 7 [usable: 7], > Heartbeat(12): 0 to 4 [usable: 4], LeaveGroup(13): 0 to 4 [usable: 4], > SyncGroup(14): 0 to 5 [usable: 5], DescribeGroups(15): 0 to 5 [usable: 5], > ListGroups(16): 0 to 4 [usable: 4], SaslHandshake(17): 0 to 1 [usable: 1], > ApiVersions(18): 0 to 3 [usable: 3], CreateTopics(19): 0 to 7 [usable: 7], > Del... > 09:50:38.074 [kafka-producer-network-thread | maximo-mp] DEBUG > org.apache.kafka.clients.producer.internals.TransactionManager - [Producer > clientId=maximo-mp, transactionalId=sqout-3664816744674374805414] ProducerId > of partition sqout-0 set to 43458621 with epoch 0. Reinitialize sequence at > beginning. > 09:50:38.074 [kafka-producer-network-thread | maximo-mp] DEBUG > org.apache.kafka.clients.producer.internals.RecordAccumulator - [Producer > clientId=maximo-mp, transactionalId=sqout-3664816744674374805414] Assigned > producerId 43458621 and producerEpoch 0 to batch with base sequence 0 being > sent to partition sqout-0 > 09:50:38.075 [kafka-producer-network-thread | maximo-mp] DEBUG > org.apache.kafka.clients.NetworkClient - [Producer clientId=maximo-mp, > transactionalId=sqout-3664816744674374805414] Sending PRODUCE request with > header RequestHeader(apiKey=PRODUCE, apiVersion=9, clientId=maximo-mp, > correlationId=9) and timeout 3 to node 5: > {acks=-1,timeout=3,partitionSizes=[sqout-0=4181]} > 09:50:38.095 [kafka-producer-network-thread | maximo-mp] DEBUG > org.apache.kafka.clients.NetworkClient - [Producer clientId=maximo-mp, > transactionalId=sqout-3664816744674374805414] Received PRODUCE response from > node 5 for request with header RequestHeader(apiKey=PRODUCE, apiVersion=9, > clientId=maximo-mp, correlationId=9): > ProduceResponseData(responses=[TopicProduceResponse(name='sqout', > partitionResponses=[PartitionProduceResponse(index=0, errorCode=0, > baseOffset=796494, logAppendTimeMs=-1, logStartOffset=768203, > recordErrors=[], errorMessage=null)])], throttleTimeMs=0) > 09:50:38.095 [kafka-producer-network-thread | maximo-mp] DEBUG > org.apache.kafka.clients.producer.internals.TransactionManager - [Producer > clientId=maximo-mp, transactionalId=sqout-3664816744674374805414] ProducerId: > 43458621; Set l
[jira] [Created] (KAFKA-16454) Snapshot the state of remote log metadata for all the partitions
Kamal Chandraprakash created KAFKA-16454: Summary: Snapshot the state of remote log metadata for all the partitions Key: KAFKA-16454 URL: https://issues.apache.org/jira/browse/KAFKA-16454 Project: Kafka Issue Type: Task Reporter: Kamal Chandraprakash When restarting the broker, we are reading from the beginning of the {{__remote_log_metadata}} topic to reconstruct the state of remote log segments, instead we can snapshot the state of remote log segments under each partition directory. Previous work to snapshot the state of the remote log metadata are removed as the solution is not complete. https://github.com/apache/kafka/pull/15636 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16452) Bound highwatermark offset to range b/w local-log-start-offset and log-end-offset
Kamal Chandraprakash created KAFKA-16452: Summary: Bound highwatermark offset to range b/w local-log-start-offset and log-end-offset Key: KAFKA-16452 URL: https://issues.apache.org/jira/browse/KAFKA-16452 Project: Kafka Issue Type: Task Reporter: Kamal Chandraprakash Assignee: Kamal Chandraprakash The high watermark should not go below the local-log-start offset. If the high watermark is less than the local-log-start-offset, then the [UnifiedLog#fetchHighWatermarkMetadata|https://sourcegraph.com/github.com/apache/kafka@d4caa1c10ec81b9c87eaaf52b73c83d5579b68d3/-/blob/core/src/main/scala/kafka/log/UnifiedLog.scala?L358] method will throw the OFFSET_OUT_OF_RANGE error when it converts the offset to metadata. Once this error happens, the followers will receive out-of-range exceptions and the producers won't be able to produce messages since the leader cannot move the high watermark. This issue can happen when the partition undergoes recovery due to corruption in the checkpoint file and it gets elected as leader before it gets a chance to update the HW from the previous leader. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16146) Checkpoint log-start-offset after remote log deletion
Kamal Chandraprakash created KAFKA-16146: Summary: Checkpoint log-start-offset after remote log deletion Key: KAFKA-16146 URL: https://issues.apache.org/jira/browse/KAFKA-16146 Project: Kafka Issue Type: Task Reporter: Kamal Chandraprakash Assignee: Kamal Chandraprakash The log-start-offset checkpoint is not getting updated after deleting the remote log segments due to the below check: https://sourcegraph.com/github.com/apache/kafka@b16df3b103d915d33670b8156217fc6c2b473f61/-/blob/core/src/main/scala/kafka/log/LogManager.scala?L851 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-15876) Introduce Remote Storage Not Ready Exception
Kamal Chandraprakash created KAFKA-15876: Summary: Introduce Remote Storage Not Ready Exception Key: KAFKA-15876 URL: https://issues.apache.org/jira/browse/KAFKA-15876 Project: Kafka Issue Type: Task Reporter: Kamal Chandraprakash Assignee: Kamal Chandraprakash When tiered storage is enabled on the cluster, Kafka broker has to build the remote log metadata for all the partitions that it is either leader/follower on node restart. The remote log metadata is built in asynchronous fashion and does not interfere with the broker startup path. Once the broker becomes online, it cannot handle the client requests (FETCH and LIST_OFFSETS) to access remote storage until the metadata gets built for those partitions. Currently, we are returning a ReplicaNotAvailable exception back to the client so that it will retry after sometime. [ReplicaNotAvailableException|https://sourcegraph.com/github.com/apache/kafka@254335d24ab6b6d13142dcdb53fec3856c16de9e/-/blob/clients/src/main/java/org/apache/kafka/common/errors/ReplicaNotAvailableException.java] is applicable when there is a reassignment is in-progress and kind of deprecated with the NotLeaderOrFollowerException ([PR#8979|https://github.com/apache/kafka/pull/8979]). It's good to introduce an appropriate retriable exception for remote storage errors to denote that it is not ready to accept the client requests yet. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-15859) Introduce delayed remote list offsets to make LIST_OFFSETS async
Kamal Chandraprakash created KAFKA-15859: Summary: Introduce delayed remote list offsets to make LIST_OFFSETS async Key: KAFKA-15859 URL: https://issues.apache.org/jira/browse/KAFKA-15859 Project: Kafka Issue Type: Task Reporter: Kamal Chandraprakash Assignee: Kamal Chandraprakash LIST_OFFSETS API request is handled by the request handler threads. If there are concurrent LIST_OFFSETS requests to remote storage more than the number of request handler threads, then other requests such as FETCH and PRODUCE might starve and be queued. This can lead to higher latency in producing/consuming messages. The `offsetForTimes` call to remote storage can take time as it has to fetch the offset and time indexes to serve the request so moving the requests to purgatory and handle it via the remote-log-reader threads frees up the request handler threads to serve other requests. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-15376) Explore options of removing data earlier to the current leader's leader epoch lineage for topics enabled with tiered storage.
[ https://issues.apache.org/jira/browse/KAFKA-15376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kamal Chandraprakash resolved KAFKA-15376. -- Resolution: Fixed This task was already addressed in the code, so closing the ticket: https://sourcegraph.com/github.com/apache/kafka@3.6/-/blob/core/src/main/java/kafka/log/remote/RemoteLogManager.java?L1043-1061 > Explore options of removing data earlier to the current leader's leader epoch > lineage for topics enabled with tiered storage. > - > > Key: KAFKA-15376 > URL: https://issues.apache.org/jira/browse/KAFKA-15376 > Project: Kafka > Issue Type: Task > Components: core >Reporter: Satish Duggana >Assignee: Kamal Chandraprakash >Priority: Major > Fix For: 3.7.0 > > > Followup on the discussion thread: > [https://github.com/apache/kafka/pull/13561#discussion_r1288778006] > -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-15777) Configurable remote fetch bytes per partition from Consumer
Kamal Chandraprakash created KAFKA-15777: Summary: Configurable remote fetch bytes per partition from Consumer Key: KAFKA-15777 URL: https://issues.apache.org/jira/browse/KAFKA-15777 Project: Kafka Issue Type: Task Reporter: Kamal Chandraprakash Assignee: Kamal Chandraprakash A consumer can configure the amount of local bytes to read from each partition in the FETCH request. {{max.partition.fetch.bytes}} = 1 MB Similar to this, the consumer should be able to configure {{max.remote.partition.fetch.bytes}} = 4 MB. While handling the {{FETCH}} request, if we encounter a partition to read data from remote storage, then rest of the partitions in the request are ignored. Essentially, we are serving only 1 MB of remote data per FETCH request when all the partitions in the request are to be served from the remote storage. Providing one more configuration to the client help the user to tune the values depends on their storage plugin. The user might want to optimise the number of calls to remote storage vs amount of bytes returned back to the client in the FETCH response. [https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/server/ReplicaManager.scala#L1454] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-15776) Configurable delay timeout for DelayedRemoteFetch request
Kamal Chandraprakash created KAFKA-15776: Summary: Configurable delay timeout for DelayedRemoteFetch request Key: KAFKA-15776 URL: https://issues.apache.org/jira/browse/KAFKA-15776 Project: Kafka Issue Type: Task Reporter: Kamal Chandraprakash Assignee: Kamal Chandraprakash We are reusing the {{fetch.max.wait.ms}} config as a delay timeout for DelayedRemoteFetchPurgatory. {{fetch.max.wait.ms}} purpose is to wait for the given amount of time when there is no data available to serve the FETCH request. {code:java} The maximum amount of time the server will block before answering the fetch request if there isn't sufficient data to immediately satisfy the requirement given by fetch.min.bytes. {code} https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/server/DelayedRemoteFetch.scala#L41 Using the same timeout in the DelayedRemoteFetchPurgatory can confuse the user on how to configure optimal value for each purpose. Moreover, the config is of *LOW* importance and most of the users won't configure it and use the default value of 500 ms. Having the delay timeout of 500 ms in DelayedRemoteFetchPurgatory can lead to larger number of expired delayed remote fetch request when the remote storage have any degradation to serve within the timeout. We should introduce one config (preferably server config) to define the delay timeout for DelayedRemoteFetch requests. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-15703) Update Highwatermark while building the remote log auxillary state
Kamal Chandraprakash created KAFKA-15703: Summary: Update Highwatermark while building the remote log auxillary state Key: KAFKA-15703 URL: https://issues.apache.org/jira/browse/KAFKA-15703 Project: Kafka Issue Type: Task Reporter: Kamal Chandraprakash Assignee: Kamal Chandraprakash -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-15682) Ensure internal remote log metadata topic does not expire its segments before deleting user-topic segments
Kamal Chandraprakash created KAFKA-15682: Summary: Ensure internal remote log metadata topic does not expire its segments before deleting user-topic segments Key: KAFKA-15682 URL: https://issues.apache.org/jira/browse/KAFKA-15682 Project: Kafka Issue Type: Task Reporter: Kamal Chandraprakash One of the implementation of RemoteLogMetadataManager is TopicBasedRemoteLogMetadataManager which uses an internal Kafka topic. Unlike other internal topics which are compaction enabled, this topic is not enabled with compaction and retention is set to unlimited. Keeping this internal topic retention to unlimited is not practical in real world use-case where the topic disk usage footprint grows large over a period of time. It is assumed that the user will set the retention to a reasonable time such that it is the max of all the user-created topics (max + X). We can't just rely on it and need an assertion before deleting the internal {{__remote_log_metadata}} segments, otherwise there will be dangling remote log segments which won't be cleared once all the brokers are restarted post the topic truncation. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-15632) Drop the invalid remote log metadata events
Kamal Chandraprakash created KAFKA-15632: Summary: Drop the invalid remote log metadata events Key: KAFKA-15632 URL: https://issues.apache.org/jira/browse/KAFKA-15632 Project: Kafka Issue Type: Task Reporter: Kamal Chandraprakash Assignee: Kamal Chandraprakash {{__remote_log_metadata}} topic cleanup policy is set to {{DELETE}} and default retention is set to unlimited. The expectation is that the user will configure the maximum retention time for this internal topic compared to all the other user created topics in the cluster. We cannot keep it to unlimited as the contents of this internal topic need to be in the local storage. RemoteLogMetadata cache expect the events to be in the order of [RemoteLogSegmentState#isValidTransition|https://github.com/apache/kafka/blob/trunk/storage/api/src/main/java/org/apache/kafka/server/log/remote/storage/RemoteLogSegmentState.java#L93] Once the retention time got expired for this topic say after 30 days due to breach by retention size/time, then there can be partial metadata events and the [cache|https://github.com/apache/kafka/blob/trunk/storage/src/main/java/org/apache/kafka/server/log/remote/metadata/storage/RemoteLogMetadataCache.java#L160] start to throw RemoteResourceNotFoundError. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-15499) Fix the flaky DeleteSegmentsDueToLogStartOffsetBreachTest
Kamal Chandraprakash created KAFKA-15499: Summary: Fix the flaky DeleteSegmentsDueToLogStartOffsetBreachTest Key: KAFKA-15499 URL: https://issues.apache.org/jira/browse/KAFKA-15499 Project: Kafka Issue Type: Test Reporter: Kamal Chandraprakash Assignee: Kamal Chandraprakash Flaky test failure is reported in [https://github.com/apache/kafka/pull/14330#issuecomment-1717195473] {code:java} java.lang.AssertionError: [BrokerId=0] The base offset of the first log segment of topicA-0 in the log directory is 7 which is smaller than the expected offset 8. The directory of topicA-0 is made of the following files: leader-epoch-checkpoint 0009.timeindex remote_log_snapshot 0009.index 0007.timeindex 0007.index 0007.snapshot 0005.snapshot 0009.log partition.metadata 0009.snapshot 0007.log at org.apache.kafka.tiered.storage.utils.BrokerLocalStorage.waitForOffset(BrokerLocalStorage.java:118) at org.apache.kafka.tiered.storage.utils.BrokerLocalStorage.waitForEarliestLocalOffset(BrokerLocalStorage.java:77) at org.apache.kafka.tiered.storage.actions.ProduceAction.doExecute(ProduceAction.java:121) at org.apache.kafka.tiered.storage.TieredStorageTestAction.execute(TieredStorageTestAction.java:25) at org.apache.kafka.tiered.storage.TieredStorageTestHarness.executeTieredStorageTest(TieredStorageTestHarness.java:177){code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-15479) Remote log segments should be considered once during retention size breach
Kamal Chandraprakash created KAFKA-15479: Summary: Remote log segments should be considered once during retention size breach Key: KAFKA-15479 URL: https://issues.apache.org/jira/browse/KAFKA-15479 Project: Kafka Issue Type: Improvement Reporter: Kamal Chandraprakash Assignee: Kamal Chandraprakash When a remote log segment contains multiple epoch, then it considered for multiple times during deletion. This will affect the deletion by remote log retention size. This is a follow-up of KAFKA-15352 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-15432) RLM Stop partitions should not be invoked for non-tiered storage topics
Kamal Chandraprakash created KAFKA-15432: Summary: RLM Stop partitions should not be invoked for non-tiered storage topics Key: KAFKA-15432 URL: https://issues.apache.org/jira/browse/KAFKA-15432 Project: Kafka Issue Type: Improvement Reporter: Kamal Chandraprakash When a stop partition request is sent by the controller. It invokes the RemoteLogManager#stopPartition even for internal and non-tiered-storage enabled topics. The replica manager should not call this method for non-tiered-storage topics. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-15431) Add support to assert offloaded segment for already produced event in Tiered Storage Framework
Kamal Chandraprakash created KAFKA-15431: Summary: Add support to assert offloaded segment for already produced event in Tiered Storage Framework Key: KAFKA-15431 URL: https://issues.apache.org/jira/browse/KAFKA-15431 Project: Kafka Issue Type: Task Reporter: Kamal Chandraprakash See [comment|https://github.com/apache/kafka/pull/14307#discussion_r1314943942] -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-15404) Failing Test DynamicBrokerReconfigurationTest#testThreadPoolResize
[ https://issues.apache.org/jira/browse/KAFKA-15404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kamal Chandraprakash resolved KAFKA-15404. -- Resolution: Fixed > Failing Test DynamicBrokerReconfigurationTest#testThreadPoolResize > -- > > Key: KAFKA-15404 > URL: https://issues.apache.org/jira/browse/KAFKA-15404 > Project: Kafka > Issue Type: Bug >Affects Versions: 3.6.0 >Reporter: Justine Olshan >Assignee: Kamal Chandraprakash >Priority: Blocker > Labels: flaky-test > Fix For: 3.6.0 > > > This issue is blocking > `org.apache.kafka.tiered.storage.integration.OffloadAndConsumeFromLeaderTest` > I've seen this failing on all builds pretty consistently. > {{org.opentest4j.AssertionFailedError: Invalid threads: expected 6, got 8: > List(data-plane-kafka-socket-acceptor-ListenerName(EXTERNAL)-SASL_SSL-0, > data-plane-kafka-socket-acceptor-ListenerName(EXTERNAL)-SASL_SSL-0, > data-plane-kafka-socket-acceptor-ListenerName(PLAINTEXT)-PLAINTEXT-0, > data-plane-kafka-socket-acceptor-ListenerName(EXTERNAL)-SASL_SSL-0, > data-plane-kafka-socket-acceptor-ListenerName(INTERNAL)-SSL-0, > data-plane-kafka-socket-acceptor-ListenerName(INTERNAL)-SSL-0, > data-plane-kafka-socket-acceptor-ListenerName(PLAINTEXT)-PLAINTEXT-0, > data-plane-kafka-socket-acceptor-ListenerName(INTERNAL)-SSL-0) ==> expected: > but was: }} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Reopened] (KAFKA-15399) Enable OffloadAndConsumeFromLeader test
[ https://issues.apache.org/jira/browse/KAFKA-15399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kamal Chandraprakash reopened KAFKA-15399: -- > Enable OffloadAndConsumeFromLeader test > --- > > Key: KAFKA-15399 > URL: https://issues.apache.org/jira/browse/KAFKA-15399 > Project: Kafka > Issue Type: Sub-task >Reporter: Kamal Chandraprakash >Assignee: Kamal Chandraprakash >Priority: Major > Fix For: 3.6.0 > > > Build / JDK 17 and Scala 2.13 / initializationError – > org.apache.kafka.tiered.storage.integration.OffloadAndConsumeFromLeaderTest -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-15421) Enable DynamicBrokerReconfigurationTest#testThreadPoolResize test
Kamal Chandraprakash created KAFKA-15421: Summary: Enable DynamicBrokerReconfigurationTest#testThreadPoolResize test Key: KAFKA-15421 URL: https://issues.apache.org/jira/browse/KAFKA-15421 Project: Kafka Issue Type: Task Reporter: Kamal Chandraprakash -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-15410) Add basic functionality integration test with tiered storage
Kamal Chandraprakash created KAFKA-15410: Summary: Add basic functionality integration test with tiered storage Key: KAFKA-15410 URL: https://issues.apache.org/jira/browse/KAFKA-15410 Project: Kafka Issue Type: Task Reporter: Kamal Chandraprakash Assignee: Kamal Chandraprakash Add the below basic functionality integration tests with tiered storage: # PartitionsExpandTest # DeleteTopicWithSecondaryStorageTest # DeleteSegmentsByRetentionSizeTest # DeleteSegmentsByRetentionTimeTest # DeleteSegmentsDueToLogStartOffsetBreachTest # EnableRemoteLogOnTopicTest # ListOffsetsTest # ReassignReplicaExpandTest # ReassignReplicaMoveTest # ReassignReplicaShrinkTest and # TransactionsTestWithTieredStore -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-15400) Fix flaky RemoteIndexCache test
Kamal Chandraprakash created KAFKA-15400: Summary: Fix flaky RemoteIndexCache test Key: KAFKA-15400 URL: https://issues.apache.org/jira/browse/KAFKA-15400 Project: Kafka Issue Type: Task Reporter: Kamal Chandraprakash Assignee: Kamal Chandraprakash Build / JDK 8 and Scala 2.12 / testRemoveMultipleItems() – kafka.log.remote.RemoteIndexCacheTest {noformat} Errorjava.nio.file.NoSuchFileException: /tmp/kafka-RemoteIndexCacheTest2682821178858144340/tF88YIi7QG-EDPMx8jJfQA:foo-0/2147584984_axAb7u74Q02X0ySdo7Hjbw.txnindexStacktracejava.nio.file.NoSuchFileException: /tmp/kafka-RemoteIndexCacheTest2682821178858144340/tF88YIi7QG-EDPMx8jJfQA:foo-0/2147584984_axAb7u74Q02X0ySdo7Hjbw.txnindex at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86) at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)at sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:55) at sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:144) at sun.nio.fs.LinuxFileSystemProvider.readAttributes(LinuxFileSystemProvider.java:99) at java.nio.file.Files.readAttributes(Files.java:1737) at java.nio.file.FileTreeWalker.getAttributes(FileTreeWalker.java:219) at java.nio.file.FileTreeWalker.visit(FileTreeWalker.java:276) at java.nio.file.FileTreeWalker.next(FileTreeWalker.java:372) at java.nio.file.Files.walkFileTree(Files.java:2706)at java.nio.file.Files.walkFileTree(Files.java:2742)at org.apache.kafka.common.utils.Utils.delete(Utils.java:899) at kafka.log.remote.RemoteIndexCacheTest.cleanup(RemoteIndexCacheTest.scala:94) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:727) at org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60) at org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131) at org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:156) at org.junit.jupiter.engine.extension.TimeoutExtension.interceptLifecycleMethod(TimeoutExtension.java:128) at org.junit.jupiter.engine.extension.TimeoutExtension.interceptAfterEachMethod(TimeoutExtension.java:110) at org.junit.jupiter.engine.execution.InterceptingExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(InterceptingExecutableInvoker.java:103) at org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.lambda$invoke$0(InterceptingExecutableInvoker.java:93) at org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106) at org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64) at org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45) at org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37) at org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:92) at org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:86) at org.junit.jupiter.engine.descriptor.ClassBasedTestDescriptor.invokeMethodInExtensionContext(ClassBasedTestDescriptor.java:520) at org.junit.jupiter.engine.descriptor.ClassBasedTestDescriptor.lambda$synthesizeAfterEachMethodAdapter$24(ClassBasedTestDescriptor.java:510) at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeAfterEachMethods$10(TestMethodTestDescriptor.java:243) at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeAllAfterMethodsOrCallbacks$13(TestMethodTestDescriptor.java:276) at org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73) at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeAllAfterMethodsOrCallbacks$14(TestMethodTestDescriptor.java:276) at java.util.ArrayList.forEach(ArrayList.java:1259) at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeAllAfterMethodsOrCallbacks(TestMethodTestDescriptor.java:275) at org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeAfterEachMethods(TestMethodTest
[jira] [Created] (KAFKA-15399) Enable OffloadAndConsumeFromLeader test
Kamal Chandraprakash created KAFKA-15399: Summary: Enable OffloadAndConsumeFromLeader test Key: KAFKA-15399 URL: https://issues.apache.org/jira/browse/KAFKA-15399 Project: Kafka Issue Type: Task Reporter: Kamal Chandraprakash Assignee: Kamal Chandraprakash Build / JDK 17 and Scala 2.13 / initializationError – org.apache.kafka.tiered.storage.integration.OffloadAndConsumeFromLeaderTest -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-15352) Ensure consistency while deleting the remote log segments
Kamal Chandraprakash created KAFKA-15352: Summary: Ensure consistency while deleting the remote log segments Key: KAFKA-15352 URL: https://issues.apache.org/jira/browse/KAFKA-15352 Project: Kafka Issue Type: Task Reporter: Kamal Chandraprakash In Kafka-14888, the remote log segments are deleted which breaches the retention time/size before updating the log-start-offset. In middle of deletion, if the consumer starts to read from the beginning of the topic, then it will fail to read the messages and UNKNOWN_SERVER_ERROR will be thrown back to the consumer. To ensure consistency, similar to local log segments where the actual segments are deleted after {{segment.delete.delay.ms}}, we should update the log-start-offset first before deleting the remote log segment. See the [PR#13561|https://github.com/apache/kafka/pull/13561] and [comment|https://github.com/apache/kafka/pull/13561#discussion_r1293086543] for more details. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-15351) Update log-start-offset after leader election for topics enabled with remote storage
Kamal Chandraprakash created KAFKA-15351: Summary: Update log-start-offset after leader election for topics enabled with remote storage Key: KAFKA-15351 URL: https://issues.apache.org/jira/browse/KAFKA-15351 Project: Kafka Issue Type: Task Reporter: Kamal Chandraprakash In the FETCH response, the leader-log-start-offset will be piggy-backed. But, there can be a scenario: # Leader deleted the remote log segment and updates it's log-start-offset # Before the replica-2 update it's log-start-offset via FETCH-request, the leadership changed to replica-2. # There are no more eligible segments to delete from remote. # The log-start-offset will be stale (referring to old log-start-offset but the data was already removed from remote) # If the consumer starts to read from the beginning of the topic, it will fail to read. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-15331) Handle remote log enabled topic deletion when leader is not available
Kamal Chandraprakash created KAFKA-15331: Summary: Handle remote log enabled topic deletion when leader is not available Key: KAFKA-15331 URL: https://issues.apache.org/jira/browse/KAFKA-15331 Project: Kafka Issue Type: Task Reporter: Kamal Chandraprakash When a topic gets deleted, then there can be a case where all the replicas can be out of ISR. This case is not handled, See: [https://github.com/apache/kafka/pull/13947#discussion_r1289331347] for more details. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-15295) Add config validation when remote storage is enabled on a topic
Kamal Chandraprakash created KAFKA-15295: Summary: Add config validation when remote storage is enabled on a topic Key: KAFKA-15295 URL: https://issues.apache.org/jira/browse/KAFKA-15295 Project: Kafka Issue Type: Task Reporter: Kamal Chandraprakash Assignee: Kamal Chandraprakash If system level remote storage is not enabled, then enabling remote storage on a topic should throw exception while validating the configs. See https://github.com/apache/kafka/pull/14114#discussion_r1280372441 for more details -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-15290) Add support to onboard existing topics to tiered storage
Kamal Chandraprakash created KAFKA-15290: Summary: Add support to onboard existing topics to tiered storage Key: KAFKA-15290 URL: https://issues.apache.org/jira/browse/KAFKA-15290 Project: Kafka Issue Type: Task Reporter: Kamal Chandraprakash Assignee: Kamal Chandraprakash This task is about adding support to enable tiered storage for existing topics in the cluster. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-15272) Fix the logic which finds candidate log segments to upload it to tiered storage
Kamal Chandraprakash created KAFKA-15272: Summary: Fix the logic which finds candidate log segments to upload it to tiered storage Key: KAFKA-15272 URL: https://issues.apache.org/jira/browse/KAFKA-15272 Project: Kafka Issue Type: Task Reporter: Kamal Chandraprakash Assignee: Kamal Chandraprakash In tiered storage, a segment is eligible for deletion from local disk when it gets uploaded to the remote storage. If the topic active segment contains some messages and there are no new incoming messages, then the active segment gets rotated to passive segment after the configured {{log.roll.ms}} timeout. The [logic|https://github.com/apache/kafka/blob/trunk/core/src/main/java/kafka/log/remote/RemoteLogManager.java#L553] to find the candidate segment in RemoteLogManager does not include the recently rotated passive segment as eligible to upload it to remote storage so the passive segment won't be removed even after if it breaches by retention time/size. (ie) Topic won't be empty after it becomes stale. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-15194) Rename local tiered storage segment with offset as prefix for easy navigation
Kamal Chandraprakash created KAFKA-15194: Summary: Rename local tiered storage segment with offset as prefix for easy navigation Key: KAFKA-15194 URL: https://issues.apache.org/jira/browse/KAFKA-15194 Project: Kafka Issue Type: Task Reporter: Kamal Chandraprakash In LocalTieredStorage which is an implementation of RemoteStorageManager, segments are saved with random UUID. This makes navigating to a particular segment harder. To navigate a given segment by offset, prepend the offset information to the segment filename. https://github.com/apache/kafka/pull/13837#discussion_r1258896009 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-15168) Handle overlapping remote log segments in RemoteLogMetadata cache
Kamal Chandraprakash created KAFKA-15168: Summary: Handle overlapping remote log segments in RemoteLogMetadata cache Key: KAFKA-15168 URL: https://issues.apache.org/jira/browse/KAFKA-15168 Project: Kafka Issue Type: Sub-task Reporter: Kamal Chandraprakash Assignee: Kamal Chandraprakash For a partition p0 and a given leader epoch, the remote log manager can upload duplicate segments due to leader change. RemoteLogMetadata cache should handle the duplicate segments which may affect the follower and consumer. (eg) L0 uploaded the segment PZyYVdsJQWeBAdBDPqkcVA at t0 for offset range 10 - 90 L1 uploads the segment L5Ufv71IToiZYKgsluzcyA at t1 for offset range 5 - 100 In the RemoteLogLeaderEpochState class, the {{offsetToId}} is a navigable map. It sorts the entries by start-offset which keeps the state as: {code:java} (5 - 100) -> L5Ufv71IToiZYKgsluzcyA (T1) (10 - 90) -> PZyYVdsJQWeBAdBDPqkcVA (T0){code} For a fetch request with fetch-offset as 92, the RemoteLogLeaderEpochState will return the segment PZyYVdsJQWeBAdBDPqkcVA instead of L5Ufv71IToiZYKgsluzcyA, which doesn't have the respective offset and throws error back to the caller. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-15167) Tiered Storage Test Harness Framework
Kamal Chandraprakash created KAFKA-15167: Summary: Tiered Storage Test Harness Framework Key: KAFKA-15167 URL: https://issues.apache.org/jira/browse/KAFKA-15167 Project: Kafka Issue Type: Sub-task Reporter: Kamal Chandraprakash Assignee: Kamal Chandraprakash Base class for integration tests exercising the tiered storage functionality in Kafka. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-15166) Add deletePartition API to the RemoteStorageManager
Kamal Chandraprakash created KAFKA-15166: Summary: Add deletePartition API to the RemoteStorageManager Key: KAFKA-15166 URL: https://issues.apache.org/jira/browse/KAFKA-15166 Project: Kafka Issue Type: Task Reporter: Kamal Chandraprakash Assignee: Kamal Chandraprakash Remote Storage Manager exposes {{deleteLogSegmentData}} API to delete the individual log segments Storage providers such as HDFS have support to delete a directory. Having an {{deletePartition}} API to delete the data at the partition level will enhance the topic deletion. This task may require a KIP as it touches the user-facing APIs. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-14559) Handle object name with wildcards in the Jmx tool
Kamal Chandraprakash created KAFKA-14559: Summary: Handle object name with wildcards in the Jmx tool Key: KAFKA-14559 URL: https://issues.apache.org/jira/browse/KAFKA-14559 Project: Kafka Issue Type: Task Reporter: Kamal Chandraprakash Assignee: Kamal Chandraprakash {code} ❯ sh kafka-run-class.sh kafka.tools.JmxTool --jmx-url service:jmx:rmi:///jndi/rmi://127.0.0.1:/jmxrmi --object-name kafka.server:type=BrokerTopicMetrics,* Trying to connect to JMX url: service:jmx:rmi:///jndi/rmi://127.0.0.1:/jmxrmi. Exception in thread "main" java.lang.NullPointerException at kafka.tools.JmxTool$.main(JmxTool.scala:194) at kafka.tools.JmxTool.main(JmxTool.scala) ❯ sh kafka-run-class.sh kafka.tools.JmxTool --jmx-url service:jmx:rmi:///jndi/rmi://localhost:/jmxrmi --object-name kafka.server:type=BrokerTopicMetrics,* --attributes Count,FifteenMinuteRate Trying to connect to JMX url: service:jmx:rmi:///jndi/rmi://127.0.0.1:/jmxrmi. Exception in thread "main" java.lang.NullPointerException at kafka.tools.JmxTool$.queryAttributes(JmxTool.scala:254) at kafka.tools.JmxTool$.main(JmxTool.scala:214) at kafka.tools.JmxTool.main(JmxTool.scala) {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-12724) Add 2.8.0 to system tests and streams upgrade tests
Kamal Chandraprakash created KAFKA-12724: Summary: Add 2.8.0 to system tests and streams upgrade tests Key: KAFKA-12724 URL: https://issues.apache.org/jira/browse/KAFKA-12724 Project: Kafka Issue Type: Task Reporter: Kamal Chandraprakash Assignee: Kamal Chandraprakash Kafka v2.8.0 is released. We should add this version to the following system tests. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-12392) Deprecate batch-size config from console producer
Kamal Chandraprakash created KAFKA-12392: Summary: Deprecate batch-size config from console producer Key: KAFKA-12392 URL: https://issues.apache.org/jira/browse/KAFKA-12392 Project: Kafka Issue Type: Improvement Components: producer Affects Versions: 2.7.0 Reporter: Kamal Chandraprakash Assignee: Kamal Chandraprakash In console producer, {{batch-size}} option is unused. The console producer doesn't batch the messages by count. This config should be removed/deprecated as it's not applicable for the new producer. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Created] (KAFKA-8892) Display the sorted configs in Kafka Configs Help Command.
Kamal Chandraprakash created KAFKA-8892: --- Summary: Display the sorted configs in Kafka Configs Help Command. Key: KAFKA-8892 URL: https://issues.apache.org/jira/browse/KAFKA-8892 Project: Kafka Issue Type: Bug Reporter: Kamal Chandraprakash Assignee: Kamal Chandraprakash Configs that can be updated dynamically for topics/brokers/users/clients are shown in the help command. Only the topic configs are sorted alphabetically. It will be useful to sort the brokers, users and clients configs also for quick lookup. -- This message was sent by Atlassian Jira (v8.3.2#803003)
[jira] [Resolved] (KAFKA-8488) FetchSessionHandler logging create 73 mb allocation in TLAB which could be no op
[ https://issues.apache.org/jira/browse/KAFKA-8488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kamal Chandraprakash resolved KAFKA-8488. - Resolution: Fixed > FetchSessionHandler logging create 73 mb allocation in TLAB which could be no > op > - > > Key: KAFKA-8488 > URL: https://issues.apache.org/jira/browse/KAFKA-8488 > Project: Kafka > Issue Type: Improvement >Reporter: Wenshuai Hou >Priority: Minor > Attachments: image-2019-06-05-14-04-35-668.png > > > !image-2019-06-05-14-04-35-668.png! -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (KAFKA-8547) 2 __consumer_offsets partitions grow very big
[ https://issues.apache.org/jira/browse/KAFKA-8547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kamal Chandraprakash resolved KAFKA-8547. - Resolution: Duplicate Duplicate of https://issues.apache.org/jira/browse/KAFKA-8335 PR [https://github.com/apache/kafka/pull/6715] > 2 __consumer_offsets partitions grow very big > - > > Key: KAFKA-8547 > URL: https://issues.apache.org/jira/browse/KAFKA-8547 > Project: Kafka > Issue Type: Bug > Components: log cleaner >Affects Versions: 2.1.1 > Environment: Ubuntu 18.04, Kafka 2.1.12-2.1.1, running as systemd > service >Reporter: Lerh Chuan Low >Priority: Major > > It seems like log cleaner doesn't clean old data of {{__consumer_offsets}} > on the default policy of compact on that topic. It may eventually cause disk > to run out or for the servers to run out of memory. > We observed a few out of memory errors with our Kafka servers and our theory > was due to 2 overly large partitions in {{__consumer_offsets}}. On further > digging, it looks like these 2 large partitions have segments dating up to 3 > months ago. Also, these old files collectively consumed most of the data from > those partitions (About 10G from the partition's 12G). > When we tried dumping those old segments, we see: > > {code:java} > 1:40 $ ./kafka-run-class.sh kafka.tools.DumpLogSegments --files > 161728257775.log --offsets-decoder --print-data-log --deep-iteration > Dumping 161728257775.log > Starting offset: 161728257775 > offset: 161728257904 position: 61 CreateTime: 1553457816168 isvalid: true > keysize: 4 valuesize: 6 magic: 2 compresscodec: NONE producerId: 367038 > producerEpoch: 3 sequence: -1 isTransactional: true headerKeys: [] > endTxnMarker: COMMIT coordinatorEpoch: 746 > offset: 161728258098 position: 200 CreateTime: 1553457816230 isvalid: true > keysize: 4 valuesize: 6 magic: 2 compresscodec: NONE producerId: 366036 > producerEpoch: 3 sequence: -1 isTransactional: true headerKeys: [] > endTxnMarker: COMMIT coordinatorEpoch: 761 > ...{code} > It looks like all those old segments all contain transactional information > (As a side note, we did take a while to figure out that for a segment with > the control bit set, the key really is {{endTxnMarker}} and the value is > {{coordinatorEpoch}}...otherwise in a non-control batch dump it would have > value and payload. We were wondering if seeing what those 2 partitions > contained in their keys may give us any clues). Our current workaround is > based on this post: > https://issues.apache.org/jira/browse/KAFKA-3917?focusedCommentId=16816874&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16816874. > We set the cleanup policy to both compact,delete and very quickly the > partition was down to below 2G. Not sure if this is something log cleaner > should be able to handle normally? Interestingly, other partitions also > contain transactional information so it's quite curious how 2 specific > partitions were not able to be cleaned. > There's a related issue here: > https://issues.apache.org/jira/browse/KAFKA-3917, just thought it was a > little bit outdated/dead so I opened a new one, please feel free to merge! -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-8392) Kafka broker leaks metric when partition leader moves to another node.
Kamal Chandraprakash created KAFKA-8392: --- Summary: Kafka broker leaks metric when partition leader moves to another node. Key: KAFKA-8392 URL: https://issues.apache.org/jira/browse/KAFKA-8392 Project: Kafka Issue Type: Bug Components: metrics Affects Versions: 2.2.0 Reporter: Kamal Chandraprakash When a partition leader moves from one node to another due to an imbalance in leader.imbalance.per.broker.percentage, the old leader broker still emits the static metric value. Steps to reproduce: 1. Create a cluster with 3 nodes. 2. Create a topic with 2 partitions and RF=3 3. Generate some data using the console producer. 4. Move any one of the partition from one node to another using reassign-partitions and preferred-replica-election script. 5. Generate some data using the console producer. 6. Now all the 3 nodes emit bytesIn, bytesOut and MessagesIn for that topic. Is it the expected behavior? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-7781) Add validation check for Topic retention.ms property
Kamal Chandraprakash created KAFKA-7781: --- Summary: Add validation check for Topic retention.ms property Key: KAFKA-7781 URL: https://issues.apache.org/jira/browse/KAFKA-7781 Project: Kafka Issue Type: Bug Reporter: Kamal Chandraprakash Using AdminClient#alterConfigs, topic _retention.ms_ property can be assigned to a value lesser than -1. This leads to inconsistency while describing the topic configuration. We should not allow values lesser than -1. In server.properties, if _log.retention.ms_ configured to a value lesser than zero, it's [set|https://github.com/apache/kafka/blob/9295444d48eb057900ef09f1176e34b37331f60b/core/src/main/scala/kafka/server/KafkaConfig.scala#L1320] as -1. This doesn't create any issue in log segment deletion, as the [condition|https://github.com/apache/kafka/blob/9295444d48eb057900ef09f1176e34b37331f60b/core/src/main/scala/kafka/log/Log.scala#L1466] for infinite log retention checks for value lesser than zero. To maintain consistency, we should add the validation check. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-7483) Streams should allow headers to be passed to Serializer
Kamal Chandraprakash created KAFKA-7483: --- Summary: Streams should allow headers to be passed to Serializer Key: KAFKA-7483 URL: https://issues.apache.org/jira/browse/KAFKA-7483 Project: Kafka Issue Type: Bug Components: streams Reporter: Kamal Chandraprakash Assignee: Kamal Chandraprakash We are storing schema metadata for record key and value in the header. Serializer, includes this metadata in the record header. While doing simple record transformation (x transformed to y) in streams, the same header that was passed from source, pushed to the sink topic. This leads to error while reading the sink topic. We should call the overloaded `serialize(topic, headers, object)` method in org.apache.kafka.streams.processor.internals.RecordCollectorImpl#L156, #L157 which in-turn adds the correct metadata in the record header. With this the sink topic reader have the option to read all the values for a header key using `Headers#headers` [or] only the overwritten value using `Headers#lastHeader` -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (KAFKA-5909) Remove source jars from classpath while executing CLI tools
[ https://issues.apache.org/jira/browse/KAFKA-5909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kamal Chandraprakash resolved KAFKA-5909. - Resolution: Not A Problem > Remove source jars from classpath while executing CLI tools > --- > > Key: KAFKA-5909 > URL: https://issues.apache.org/jira/browse/KAFKA-5909 > Project: Kafka > Issue Type: Bug > Components: tools >Reporter: Kamal Chandraprakash >Assignee: Kamal Chandraprakash >Priority: Minor > Labels: newbie > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (KAFKA-5909) Remove source jars from classpath while executing CLI tools
Kamal Chandraprakash created KAFKA-5909: --- Summary: Remove source jars from classpath while executing CLI tools Key: KAFKA-5909 URL: https://issues.apache.org/jira/browse/KAFKA-5909 Project: Kafka Issue Type: Bug Components: tools Affects Versions: 0.11.0.0 Reporter: Kamal Chandraprakash Assignee: Kamal Chandraprakash Priority: Minor -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Resolved] (KAFKA-5763) Refactor NetworkClient to use LogContext
[ https://issues.apache.org/jira/browse/KAFKA-5763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kamal Chandraprakash resolved KAFKA-5763. - Resolution: Fixed Fix Version/s: 1.0.0 [~ijuma] `ConsumerNetworkClient` has already been refactored to use the LogContext. Please reopen the task if it's not completed. > Refactor NetworkClient to use LogContext > > > Key: KAFKA-5763 > URL: https://issues.apache.org/jira/browse/KAFKA-5763 > Project: Kafka > Issue Type: Improvement >Reporter: Ismael Juma > Fix For: 1.0.0 > > > We added a LogContext object which automatically adds a log prefix to every > message written by loggers constructed from it (much like the Logging mixin > available in the server code). We use this in the consumer to ensure that > messages always contain the consumer group and client ids, which is very > helpful when multiple consumers are run on the same instance. We should do > something similar for the NetworkClient. We should always include the client > id. -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (KAFKA-4750) KeyValueIterator returns null values
[ https://issues.apache.org/jira/browse/KAFKA-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kamal Chandraprakash reassigned KAFKA-4750: --- Assignee: Kamal Chandraprakash > KeyValueIterator returns null values > > > Key: KAFKA-4750 > URL: https://issues.apache.org/jira/browse/KAFKA-4750 > Project: Kafka > Issue Type: Bug > Components: streams >Affects Versions: 0.10.1.1 >Reporter: Michal Borowiecki >Assignee: Kamal Chandraprakash > Labels: newbie > > The API for ReadOnlyKeyValueStore.range method promises the returned iterator > will not return null values. However, after upgrading from 0.10.0.0 to > 0.10.1.1 we found null values are returned causing NPEs on our side. > I found this happens after removing entries from the store and I found > resemblance to SAMZA-94 defect. The problem seems to be as it was there, when > deleting entries and having a serializer that does not return null when null > is passed in, the state store doesn't actually delete that key/value pair but > the iterator will return null value for that key. > When I modified our serilizer to return null when null is passed in, the > problem went away. However, I believe this should be fixed in kafka streams, > perhaps with a similar approach as SAMZA-94. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (KAFKA-4340) Change the default value of log.message.timestamp.difference.max.ms to the same as log.retention.ms
[ https://issues.apache.org/jira/browse/KAFKA-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15607070#comment-15607070 ] Kamal Chandraprakash commented on KAFKA-4340: - Jiangjie Qin, I would like to work on this issue as it's look like a newbie-one. Could you assign it to me? > Change the default value of log.message.timestamp.difference.max.ms to the > same as log.retention.ms > --- > > Key: KAFKA-4340 > URL: https://issues.apache.org/jira/browse/KAFKA-4340 > Project: Kafka > Issue Type: Improvement > Components: core >Affects Versions: 0.10.1.0 >Reporter: Jiangjie Qin >Assignee: Jiangjie Qin > Fix For: 0.10.1.1 > > > [~junrao] brought up the following scenario: > If users are pumping data with timestamp already passed log.retention.ms into > Kafka, the messages will be appended to the log but will be immediately > rolled out by log retention thread when it kicks in and the messages will be > deleted. > To avoid this produce-and-deleted scenario, we can set the default value of > log.message.timestamp.difference.max.ms to be the same as log.retention.ms. -- This message was sent by Atlassian JIRA (v6.3.4#6332)