[jira] [Reopened] (KAFKA-15420) Kafka Tiered Storage V1

2024-06-17 Thread Kamal Chandraprakash (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamal Chandraprakash reopened KAFKA-15420:
--

> Kafka Tiered Storage V1
> ---
>
> Key: KAFKA-15420
> URL: https://issues.apache.org/jira/browse/KAFKA-15420
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 3.6.0
>Reporter: Satish Duggana
>Assignee: Satish Duggana
>Priority: Major
>  Labels: KIP-405
> Fix For: 3.8.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-15420) Kafka Tiered Storage V1

2024-06-17 Thread Kamal Chandraprakash (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamal Chandraprakash resolved KAFKA-15420.
--
Resolution: Fixed

> Kafka Tiered Storage V1
> ---
>
> Key: KAFKA-15420
> URL: https://issues.apache.org/jira/browse/KAFKA-15420
> Project: Kafka
>  Issue Type: Improvement
>Affects Versions: 3.6.0
>Reporter: Satish Duggana
>Assignee: Satish Duggana
>Priority: Major
>  Labels: KIP-405
> Fix For: 3.8.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16948) Reset tier lag metrics on becoming follower

2024-06-12 Thread Kamal Chandraprakash (Jira)
Kamal Chandraprakash created KAFKA-16948:


 Summary: Reset tier lag metrics on becoming follower
 Key: KAFKA-16948
 URL: https://issues.apache.org/jira/browse/KAFKA-16948
 Project: Kafka
  Issue Type: Task
Reporter: Kamal Chandraprakash
Assignee: Kamal Chandraprakash


Tier lag metrics such as remoteCopyLagBytes and remoteCopyLagSegments are 
sometimes not cleared when the node transitions from leader to follower after a 
rolling restart.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-15777) Configurable remote fetch bytes per partition from Consumer

2024-06-09 Thread Kamal Chandraprakash (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamal Chandraprakash resolved KAFKA-15777.
--
Resolution: Won't Fix

> Configurable remote fetch bytes per partition from Consumer
> ---
>
> Key: KAFKA-15777
> URL: https://issues.apache.org/jira/browse/KAFKA-15777
> Project: Kafka
>  Issue Type: Task
>Reporter: Kamal Chandraprakash
>Assignee: Kamal Chandraprakash
>Priority: Major
>  Labels: kip
>
> A consumer can configure the number of local bytes to read from each 
> partition in the FETCH request.
> {{fetch.max.bytes}} = 50 MB
> {{max.partition.fetch.bytes}} = 1 MB
> Similar to this, the consumer should be able to configure 
> {{max.remote.partition.fetch.bytes}} = 4 MB.
> While handling the {{FETCH}} request, if we encounter a partition that reads 
> data from remote storage, then the rest of the partitions in the request are 
> ignored. Essentially, we are serving only 1 MB of remote data per FETCH 
> request when all the partitions in the request are to be served from 
> remote storage.
> Providing one more configuration to the client helps the user tune the 
> values depending on their storage plugin. The user might want to optimise the 
> number of calls to remote storage versus the amount of bytes returned to the 
> client in the FETCH response.
> [https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/server/ReplicaManager.scala#L1454]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16904) Metric to measure the latency of remote read requests

2024-06-06 Thread Kamal Chandraprakash (Jira)
Kamal Chandraprakash created KAFKA-16904:


 Summary: Metric to measure the latency of remote read requests
 Key: KAFKA-16904
 URL: https://issues.apache.org/jira/browse/KAFKA-16904
 Project: Kafka
  Issue Type: Task
Reporter: Kamal Chandraprakash
Assignee: Kamal Chandraprakash


[KIP-1018|https://cwiki.apache.org/confluence/display/KAFKA/KIP-1018%3A+Introduce+max+remote+fetch+timeout+config+for+DelayedRemoteFetch+requests]

Emit a new metric to measure the amount of time taken to read from the remote 
storage:

{{kafka.log.remote:type=RemoteLogManager,name=RemoteLogReaderFetchRateAndTimeMs}}
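For anyone verifying the new metric, it can be read over JMX like other broker 
metrics. A minimal sketch, assuming the broker exposes JMX on localhost:9999 
and the metric is a Yammer Timer with the usual attributes (Count, Mean, 
99thPercentile):
{code:java}
import javax.management.MBeanServerConnection;
import javax.management.ObjectName;
import javax.management.remote.JMXConnector;
import javax.management.remote.JMXConnectorFactory;
import javax.management.remote.JMXServiceURL;

public class RemoteReadLatencyProbe {
    public static void main(String[] args) throws Exception {
        // localhost:9999 is an assumption; point this at the broker's JMX port.
        JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://localhost:9999/jmxrmi");
        try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
            MBeanServerConnection mbsc = connector.getMBeanServerConnection();
            ObjectName timer = new ObjectName(
                    "kafka.log.remote:type=RemoteLogManager,name=RemoteLogReaderFetchRateAndTimeMs");
            // Attribute names follow the usual Yammer Timer JMX convention.
            System.out.println("Count: " + mbsc.getAttribute(timer, "Count"));
            System.out.println("Mean (ms): " + mbsc.getAttribute(timer, "Mean"));
            System.out.println("p99 (ms): " + mbsc.getAttribute(timer, "99thPercentile"));
        }
    }
}
{code}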



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16882) Migrate RemoteLogSegmentLifecycleTest to new test infra

2024-06-03 Thread Kamal Chandraprakash (Jira)
Kamal Chandraprakash created KAFKA-16882:


 Summary: Migrate RemoteLogSegmentLifecycleTest to new test infra
 Key: KAFKA-16882
 URL: https://issues.apache.org/jira/browse/KAFKA-16882
 Project: Kafka
  Issue Type: Test
Reporter: Kamal Chandraprakash
Assignee: Kuan Po Tseng
 Fix For: 3.9.0


As per the title: migrate RemoteLogSegmentLifecycleTest to the new test infra.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16780) Txn consumer exerts pressure on remote storage when reading non-txn topic

2024-05-16 Thread Kamal Chandraprakash (Jira)
Kamal Chandraprakash created KAFKA-16780:


 Summary: Txn consumer exerts pressure on remote storage when 
reading non-txn topic
 Key: KAFKA-16780
 URL: https://issues.apache.org/jira/browse/KAFKA-16780
 Project: Kafka
  Issue Type: Task
Reporter: Kamal Chandraprakash


h3. Logic to read aborted txns:
 # When the consumer sets the isolation_level to {{READ_COMMITTED}} and reads a 
non-txn topic, the broker has to 
[traverse|https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/log/LocalLog.scala#L394]
 all the local log segments to collect the aborted transactions, since there 
won't be any entry in the transaction index.
 # The same 
[logic|https://github.com/apache/kafka/blob/trunk/core/src/main/java/kafka/log/remote/RemoteLogManager.java#L1436]
 is applied while reading from remote storage. In this case, when the FETCH 
request reads data from the first remote log segment, it has to fetch the 
transaction indexes of all the remaining remote-log segments, and the call then 
lands on the local-log segments before responding to the FETCH request, which 
increases the time taken to serve the requests.

The [EoS Abort 
Index|https://docs.google.com/document/d/1Rlqizmk7QCDe8qAnVW5e5X8rGvn6m2DCR3JR2yqwVjc]
 design doc explains how the transaction index file filters out the aborted 
transaction records.

The issue arises when consumers with the {{READ_COMMITTED}} isolation level 
read normal (non-transactional) topics. If the topic has transactions enabled, 
then we expect each transaction to either commit or roll back within 15 minutes 
(default transaction.max.timeout.ms = 15 mins), so we would likely have to 
search only one or two remote log segments.
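For reference, the behaviour above is triggered purely by the client-side 
isolation level. A minimal consumer that exercises it (bootstrap server, group 
id, and topic name are placeholders):
{code:java}
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class ReadCommittedConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "demo-group");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        // This setting makes the broker resolve aborted transactions, even
        // when the fetched topic never had transactional producers.
        props.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("non-txn-topic")); // placeholder topic
            consumer.poll(Duration.ofSeconds(1));
        }
    }
}
{code}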



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16696) Remove the in-memory implementation of RSM and RLMM

2024-05-09 Thread Kamal Chandraprakash (Jira)
Kamal Chandraprakash created KAFKA-16696:


 Summary: Remove the in-memory implementation of RSM and RLMM
 Key: KAFKA-16696
 URL: https://issues.apache.org/jira/browse/KAFKA-16696
 Project: Kafka
  Issue Type: Task
Reporter: Kamal Chandraprakash
Assignee: Kamal Chandraprakash


The in-memory implementations of RSM and RLMM were written to support the 
unit/integration tests: [https://github.com/apache/kafka/pull/10218]

They are no longer used by any of the tests and have been superseded by the 
LocalTieredStorage framework, which uses the local disk as secondary storage 
and a topic as the RLMM.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16605) Fix the flaky LogCleanerParameterizedIntegrationTest

2024-04-23 Thread Kamal Chandraprakash (Jira)
Kamal Chandraprakash created KAFKA-16605:


 Summary: Fix the flaky LogCleanerParameterizedIntegrationTest
 Key: KAFKA-16605
 URL: https://issues.apache.org/jira/browse/KAFKA-16605
 Project: Kafka
  Issue Type: Task
Reporter: Kamal Chandraprakash
Assignee: Kamal Chandraprakash






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16456) Can't stop kafka debug logs

2024-04-09 Thread Kamal Chandraprakash (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamal Chandraprakash resolved KAFKA-16456.
--
Resolution: Not A Problem

You can also dynamically change the broker loggers using the 
{{kafka-configs.sh}} script:
(eg)
{code}
sh kafka-configs.sh --bootstrap-server localhost:9092 --entity-type 
broker-loggers --entity-name  --add-config 
org.apache.kafka.clients.NetworkClient=INFO --alter
{code}

> Can't stop kafka debug logs
> ---
>
> Key: KAFKA-16456
> URL: https://issues.apache.org/jira/browse/KAFKA-16456
> Project: Kafka
>  Issue Type: Bug
>  Components: logging
>Affects Versions: 3.6.0
>Reporter: Rajan Choudhary
>Priority: Major
>
> I am getting kafka debug logs, which are flooding our logs. Sample below
>  
> {code:java}
> 09:50:38.073 [kafka-producer-network-thread | maximo-mp] DEBUG 
> org.apache.kafka.clients.NetworkClient - [Producer clientId=maximo-mp, 
> transactionalId=sqout-3664816744674374805414] Received API_VERSIONS response 
> from node 5 for request with header RequestHeader(apiKey=API_VERSIONS, 
> apiVersion=3, clientId=maximo-mp, correlationId=8): 
> ApiVersionsResponseData(errorCode=0, apiKeys=[ApiVersion(apiKey=0, 
> minVersion=0, maxVersion=9), ApiVersion(apiKey=1, minVersion=0, 
> maxVersion=13), ApiVersion(apiKey=2, minVersion=0, maxVersion=7), 
> ApiVersion(apiKey=3, minVersion=0, maxVersion=12), ApiVersion(apiKey=4, 
> minVersion=0, maxVersion=5), ApiVersion(apiKey=5, minVersion=0, 
> maxVersion=3), ApiVersion(apiKey=6, minVersion=0, maxVersion=7), 
> ApiVersion(apiKey=7, minVersion=0, maxVersion=3), ApiVersion(apiKey=8, 
> minVersion=0, maxVersion=8), ApiVersion(apiKey=9, minVersion=0, 
> maxVersion=8), ApiVersion(apiKey=10, minVersion=0, maxVersion=4), 
> ApiVersion(apiKey=11, minVersion=0, maxVersion=7), ApiVersion(apiKey=12, 
> minVersion=0, m...
> 09:50:38.073 [kafka-producer-network-thread | maximo-mp] DEBUG 
> org.apache.kafka.clients.NetworkClient - [Producer clientId=maximo-mp, 
> transactionalId=sqout-3664816744674374805414] Node 5 has finalized features 
> epoch: 1, finalized features: [], supported features: [], API versions: 
> (Produce(0): 0 to 9 [usable: 9], Fetch(1): 0 to 13 [usable: 12], 
> ListOffsets(2): 0 to 7 [usable: 6], Metadata(3): 0 to 12 [usable: 11], 
> LeaderAndIsr(4): 0 to 5 [usable: 5], StopReplica(5): 0 to 3 [usable: 3], 
> UpdateMetadata(6): 0 to 7 [usable: 7], ControlledShutdown(7): 0 to 3 [usable: 
> 3], OffsetCommit(8): 0 to 8 [usable: 8], OffsetFetch(9): 0 to 8 [usable: 7], 
> FindCoordinator(10): 0 to 4 [usable: 3], JoinGroup(11): 0 to 7 [usable: 7], 
> Heartbeat(12): 0 to 4 [usable: 4], LeaveGroup(13): 0 to 4 [usable: 4], 
> SyncGroup(14): 0 to 5 [usable: 5], DescribeGroups(15): 0 to 5 [usable: 5], 
> ListGroups(16): 0 to 4 [usable: 4], SaslHandshake(17): 0 to 1 [usable: 1], 
> ApiVersions(18): 0 to 3 [usable: 3], CreateTopics(19): 0 to 7 [usable: 7], 
> Del...
> 09:50:38.074 [kafka-producer-network-thread | maximo-mp] DEBUG 
> org.apache.kafka.clients.producer.internals.TransactionManager - [Producer 
> clientId=maximo-mp, transactionalId=sqout-3664816744674374805414] ProducerId 
> of partition sqout-0 set to 43458621 with epoch 0. Reinitialize sequence at 
> beginning.
> 09:50:38.074 [kafka-producer-network-thread | maximo-mp] DEBUG 
> org.apache.kafka.clients.producer.internals.RecordAccumulator - [Producer 
> clientId=maximo-mp, transactionalId=sqout-3664816744674374805414] Assigned 
> producerId 43458621 and producerEpoch 0 to batch with base sequence 0 being 
> sent to partition sqout-0
> 09:50:38.075 [kafka-producer-network-thread | maximo-mp] DEBUG 
> org.apache.kafka.clients.NetworkClient - [Producer clientId=maximo-mp, 
> transactionalId=sqout-3664816744674374805414] Sending PRODUCE request with 
> header RequestHeader(apiKey=PRODUCE, apiVersion=9, clientId=maximo-mp, 
> correlationId=9) and timeout 3 to node 5: 
> {acks=-1,timeout=3,partitionSizes=[sqout-0=4181]}
> 09:50:38.095 [kafka-producer-network-thread | maximo-mp] DEBUG 
> org.apache.kafka.clients.NetworkClient - [Producer clientId=maximo-mp, 
> transactionalId=sqout-3664816744674374805414] Received PRODUCE response from 
> node 5 for request with header RequestHeader(apiKey=PRODUCE, apiVersion=9, 
> clientId=maximo-mp, correlationId=9): 
> ProduceResponseData(responses=[TopicProduceResponse(name='sqout', 
> partitionResponses=[PartitionProduceResponse(index=0, errorCode=0, 
> baseOffset=796494, logAppendTimeMs=-1, logStartOffset=768203, 
> recordErrors=[], errorMessage=null)])], throttleTimeMs=0)
> 09:50:38.095 [kafka-producer-network-thread | maximo-mp] DEBUG 
> org.apache.kafka.clients.producer.internals.TransactionManager - [Producer 
> clientId=maximo-mp, transactionalId=sqout-3664816744674374805414] ProducerId: 
> 43458621; Set 

[jira] [Created] (KAFKA-16454) Snapshot the state of remote log metadata for all the partitions

2024-04-01 Thread Kamal Chandraprakash (Jira)
Kamal Chandraprakash created KAFKA-16454:


 Summary: Snapshot the state of remote log metadata for all the 
partitions
 Key: KAFKA-16454
 URL: https://issues.apache.org/jira/browse/KAFKA-16454
 Project: Kafka
  Issue Type: Task
Reporter: Kamal Chandraprakash


When restarting the broker, we read from the beginning of the 
{{__remote_log_metadata}} topic to reconstruct the state of the remote log 
segments. Instead, we can snapshot the state of the remote log segments under 
each partition directory.

Previous work to snapshot the state of the remote log metadata was removed as 
the solution was not complete:

https://github.com/apache/kafka/pull/15636



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16452) Bound highwatermark offset to range b/w local-log-start-offset and log-end-offset

2024-03-31 Thread Kamal Chandraprakash (Jira)
Kamal Chandraprakash created KAFKA-16452:


 Summary: Bound highwatermark offset to range b/w 
local-log-start-offset and log-end-offset
 Key: KAFKA-16452
 URL: https://issues.apache.org/jira/browse/KAFKA-16452
 Project: Kafka
  Issue Type: Task
Reporter: Kamal Chandraprakash
Assignee: Kamal Chandraprakash


The high watermark should not go below the local-log-start-offset. If the high 
watermark is less than the local-log-start-offset, then the 
[UnifiedLog#fetchHighWatermarkMetadata|https://sourcegraph.com/github.com/apache/kafka@d4caa1c10ec81b9c87eaaf52b73c83d5579b68d3/-/blob/core/src/main/scala/kafka/log/UnifiedLog.scala?L358]
 method will throw the OFFSET_OUT_OF_RANGE error when it converts the offset to 
metadata. Once this error happens, the followers will receive out-of-range 
exceptions and the producers won't be able to produce messages, since the 
leader cannot move the high watermark.

This issue can happen when the partition undergoes recovery due to corruption 
in the checkpoint file and gets elected as leader before it has a chance to 
update the HW from the previous leader.
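A minimal sketch of the proposed bound; the method and variable names are 
illustrative, not the actual UnifiedLog API:
{code:java}
/**
 * Keep the high watermark within [local-log-start-offset, log-end-offset].
 */
static long boundHighWatermark(long highWatermark,
                               long localLogStartOffset,
                               long logEndOffset) {
    // Clamp from below first (recovery may have moved the
    // local-log-start-offset past a stale high watermark), then from above.
    return Math.min(Math.max(highWatermark, localLogStartOffset), logEndOffset);
}
{code}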



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16146) Checkpoint log-start-offset after remote log deletion

2024-01-16 Thread Kamal Chandraprakash (Jira)
Kamal Chandraprakash created KAFKA-16146:


 Summary: Checkpoint log-start-offset after remote log deletion
 Key: KAFKA-16146
 URL: https://issues.apache.org/jira/browse/KAFKA-16146
 Project: Kafka
  Issue Type: Task
Reporter: Kamal Chandraprakash
Assignee: Kamal Chandraprakash


The log-start-offset checkpoint is not getting updated after deleting the 
remote log segments due to the below check:

https://sourcegraph.com/github.com/apache/kafka@b16df3b103d915d33670b8156217fc6c2b473f61/-/blob/core/src/main/scala/kafka/log/LogManager.scala?L851



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15876) Introduce Remote Storage Not Ready Exception

2023-11-22 Thread Kamal Chandraprakash (Jira)
Kamal Chandraprakash created KAFKA-15876:


 Summary: Introduce Remote Storage Not Ready Exception
 Key: KAFKA-15876
 URL: https://issues.apache.org/jira/browse/KAFKA-15876
 Project: Kafka
  Issue Type: Task
Reporter: Kamal Chandraprakash
Assignee: Kamal Chandraprakash


When tiered storage is enabled on the cluster, the Kafka broker has to build 
the remote log metadata for all the partitions for which it is either leader or 
follower on node restart. The remote log metadata is built in an asynchronous 
fashion and does not interfere with the broker startup path. Once the broker 
comes online, it cannot handle the client requests (FETCH and LIST_OFFSETS) 
that access remote storage until the metadata gets built for those partitions. 
Currently, we return a ReplicaNotAvailable exception back to the client so that 
it will retry after some time.

[ReplicaNotAvailableException|https://sourcegraph.com/github.com/apache/kafka@254335d24ab6b6d13142dcdb53fec3856c16de9e/-/blob/clients/src/main/java/org/apache/kafka/common/errors/ReplicaNotAvailableException.java]
 is applicable when a reassignment is in progress and was effectively 
deprecated by NotLeaderOrFollowerException 
([PR#8979|https://github.com/apache/kafka/pull/8979]). It would be good to 
introduce an appropriate retriable exception for remote storage errors to 
denote that the broker is not ready to accept the client requests yet.
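A sketch of what such an exception could look like; the class name and its wire 
error code are hypothetical pending a KIP:
{code:java}
import org.apache.kafka.common.errors.RetriableException;

/**
 * Hypothetical retriable exception signalling that the broker has not yet
 * finished building the remote log metadata for the requested partition.
 */
public class RemoteStorageNotReadyException extends RetriableException {
    public RemoteStorageNotReadyException(String message) {
        super(message);
    }
}
{code}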



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15859) Introduce delayed remote list offsets to make LIST_OFFSETS async

2023-11-20 Thread Kamal Chandraprakash (Jira)
Kamal Chandraprakash created KAFKA-15859:


 Summary: Introduce delayed remote list offsets to make 
LIST_OFFSETS async
 Key: KAFKA-15859
 URL: https://issues.apache.org/jira/browse/KAFKA-15859
 Project: Kafka
  Issue Type: Task
Reporter: Kamal Chandraprakash
Assignee: Kamal Chandraprakash


The LIST_OFFSETS API request is handled by the request handler threads. If the 
number of concurrent LIST_OFFSETS requests to remote storage exceeds the number 
of request handler threads, then other requests such as FETCH and PRODUCE might 
starve and be queued. This can lead to higher latency in producing/consuming 
messages.

The `offsetForTimes` call to remote storage can take time, as it has to fetch 
the offset and time indexes to serve the request, so moving such requests to a 
purgatory and handling them via the remote-log-reader threads frees up the 
request handler threads to serve other requests.
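For context, this is the client call that lands on the broker as a LIST_OFFSETS 
request (topic, partition, and timestamp are placeholders):
{code:java}
import java.time.Duration;
import java.util.Map;
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.OffsetAndTimestamp;
import org.apache.kafka.common.TopicPartition;

public class ListOffsetsExample {
    static OffsetAndTimestamp offsetAt(Consumer<byte[], byte[]> consumer,
                                       long timestampMs) {
        TopicPartition tp = new TopicPartition("tiered-topic", 0);
        // This call becomes a LIST_OFFSETS request; for offloaded offsets the
        // broker has to consult the remote offset/time indexes to answer it.
        Map<TopicPartition, OffsetAndTimestamp> result =
                consumer.offsetsForTimes(Map.of(tp, timestampMs), Duration.ofSeconds(30));
        return result.get(tp);
    }
}
{code}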



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-15376) Explore options of removing data earlier to the current leader's leader epoch lineage for topics enabled with tiered storage.

2023-11-14 Thread Kamal Chandraprakash (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamal Chandraprakash resolved KAFKA-15376.
--
Resolution: Fixed

This task was already addressed in the code, so closing the ticket:

https://sourcegraph.com/github.com/apache/kafka@3.6/-/blob/core/src/main/java/kafka/log/remote/RemoteLogManager.java?L1043-1061

> Explore options of removing data earlier to the current leader's leader epoch 
> lineage for topics enabled with tiered storage.
> -
>
> Key: KAFKA-15376
> URL: https://issues.apache.org/jira/browse/KAFKA-15376
> Project: Kafka
>  Issue Type: Task
>  Components: core
>Reporter: Satish Duggana
>Assignee: Kamal Chandraprakash
>Priority: Major
> Fix For: 3.7.0
>
>
> Followup on the discussion thread:
> [https://github.com/apache/kafka/pull/13561#discussion_r1288778006]
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15777) Configurable remote fetch bytes per partition from Consumer

2023-11-02 Thread Kamal Chandraprakash (Jira)
Kamal Chandraprakash created KAFKA-15777:


 Summary: Configurable remote fetch bytes per partition from 
Consumer
 Key: KAFKA-15777
 URL: https://issues.apache.org/jira/browse/KAFKA-15777
 Project: Kafka
  Issue Type: Task
Reporter: Kamal Chandraprakash
Assignee: Kamal Chandraprakash


A consumer can configure the number of local bytes to read from each partition 
in the FETCH request.

{{max.partition.fetch.bytes}} = 1 MB

Similar to this, the consumer should be able to configure 
{{max.remote.partition.fetch.bytes}} = 4 MB.

While handling the {{FETCH}} request, if we encounter a partition that reads 
data from remote storage, then the rest of the partitions in the request are 
ignored. Essentially, we are serving only 1 MB of remote data per FETCH request 
when all the partitions in the request are to be served from remote storage.

Providing one more configuration to the client helps the user tune the values 
depending on their storage plugin. The user might want to optimise the number 
of calls to remote storage versus the amount of bytes returned to the client in 
the FETCH response.

[https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/server/ReplicaManager.scala#L1454]
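A sketch of how the configs would sit side by side on the consumer; note that 
{{max.remote.partition.fetch.bytes}} is the proposal in this ticket, not an 
existing Kafka config:
{code:java}
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;

public class RemoteFetchConfigSketch {
    static Properties fetchConfigs() {
        Properties props = new Properties();
        // Existing client configs, shown with their defaults.
        props.put(ConsumerConfig.FETCH_MAX_BYTES_CONFIG, 50 * 1024 * 1024);      // fetch.max.bytes
        props.put(ConsumerConfig.MAX_PARTITION_FETCH_BYTES_CONFIG, 1024 * 1024); // max.partition.fetch.bytes
        // Proposed in this ticket -- not an existing Kafka config; the name
        // and the 4 MB value are the suggestion above.
        props.put("max.remote.partition.fetch.bytes", 4 * 1024 * 1024);
        return props;
    }
}
{code}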



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15776) Configurable delay timeout for DelayedRemoteFetch request

2023-11-02 Thread Kamal Chandraprakash (Jira)
Kamal Chandraprakash created KAFKA-15776:


 Summary: Configurable delay timeout for DelayedRemoteFetch request
 Key: KAFKA-15776
 URL: https://issues.apache.org/jira/browse/KAFKA-15776
 Project: Kafka
  Issue Type: Task
Reporter: Kamal Chandraprakash
Assignee: Kamal Chandraprakash


We are reusing the {{fetch.max.wait.ms}} config as the delay timeout for 
DelayedRemoteFetchPurgatory. The purpose of {{fetch.max.wait.ms}} is to wait for 
the given amount of time when there is no data available to serve the FETCH 
request:
{code:java}
The maximum amount of time the server will block before answering the fetch 
request if there isn't sufficient data to immediately satisfy the requirement 
given by fetch.min.bytes.
{code}

https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/server/DelayedRemoteFetch.scala#L41

Using the same timeout in the DelayedRemoteFetchPurgatory can confuse the user 
about how to configure an optimal value for each purpose. Moreover, the config 
is of *LOW* importance, so most users won't configure it and will use the 
default value of 500 ms.

Having a delay timeout of 500 ms in DelayedRemoteFetchPurgatory can lead to a 
large number of expired delayed remote fetch requests when the remote storage 
has any degradation that prevents it from serving within the timeout.

We should introduce one config (preferably a server config) to define the delay 
timeout for DelayedRemoteFetch requests.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15703) Update Highwatermark while building the remote log auxillary state

2023-10-27 Thread Kamal Chandraprakash (Jira)
Kamal Chandraprakash created KAFKA-15703:


 Summary: Update Highwatermark while building the remote log 
auxillary state
 Key: KAFKA-15703
 URL: https://issues.apache.org/jira/browse/KAFKA-15703
 Project: Kafka
  Issue Type: Task
Reporter: Kamal Chandraprakash
Assignee: Kamal Chandraprakash






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15682) Ensure internal remote log metadata topic does not expire its segments before deleting user-topic segments

2023-10-25 Thread Kamal Chandraprakash (Jira)
Kamal Chandraprakash created KAFKA-15682:


 Summary: Ensure internal remote log metadata topic does not expire 
its segments before deleting user-topic segments
 Key: KAFKA-15682
 URL: https://issues.apache.org/jira/browse/KAFKA-15682
 Project: Kafka
  Issue Type: Task
Reporter: Kamal Chandraprakash


One of the implementations of RemoteLogMetadataManager is 
TopicBasedRemoteLogMetadataManager, which uses an internal Kafka topic. Unlike 
other internal topics, which are compaction-enabled, this topic is not enabled 
with compaction and its retention is set to unlimited.

Keeping this internal topic's retention unlimited is not practical in 
real-world use cases, where the topic's disk usage footprint grows large over a 
period of time.

It is assumed that the user will set the retention to a reasonable time such 
that it is the max of all the user-created topics (max + X). We can't just rely 
on that; we need an assertion before deleting the internal 
{{__remote_log_metadata}} segments, otherwise there will be dangling remote log 
segments which won't be cleared once all the brokers are restarted after the 
topic truncation.
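For illustration, bounding the metadata topic's retention can be done with the 
Admin client; the 90-day value below is a placeholder for "max of all 
user-created topics + X":
{code:java}
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class MetadataTopicRetention {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder
        try (Admin admin = Admin.create(props)) {
            ConfigResource topic = new ConfigResource(
                    ConfigResource.Type.TOPIC, "__remote_log_metadata");
            AlterConfigOp setRetention = new AlterConfigOp(
                    new ConfigEntry("retention.ms",
                            String.valueOf(90L * 24 * 60 * 60 * 1000)),
                    AlterConfigOp.OpType.SET);
            admin.incrementalAlterConfigs(Map.of(topic, List.of(setRetention)))
                 .all().get();
        }
    }
}
{code}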



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15632) Drop the invalid remote log metadata events

2023-10-18 Thread Kamal Chandraprakash (Jira)
Kamal Chandraprakash created KAFKA-15632:


 Summary: Drop the invalid remote log metadata events 
 Key: KAFKA-15632
 URL: https://issues.apache.org/jira/browse/KAFKA-15632
 Project: Kafka
  Issue Type: Task
Reporter: Kamal Chandraprakash
Assignee: Kamal Chandraprakash


The {{__remote_log_metadata}} topic cleanup policy is set to {{DELETE}} and the 
default retention is set to unlimited.

The expectation is that the user will configure the maximum retention time for 
this internal topic compared to all the other user-created topics in the 
cluster. We cannot keep it unlimited, as the contents of this internal topic 
need to be in the local storage.

The RemoteLogMetadata cache expects the events to be in the order defined by 
[RemoteLogSegmentState#isValidTransition|https://github.com/apache/kafka/blob/trunk/storage/api/src/main/java/org/apache/kafka/server/log/remote/storage/RemoteLogSegmentState.java#L93].

Once the retention expires for this topic, say after 30 days, due to a breach 
of the retention size/time, there can be partial metadata events and the 
[cache|https://github.com/apache/kafka/blob/trunk/storage/src/main/java/org/apache/kafka/server/log/remote/metadata/storage/RemoteLogMetadataCache.java#L160]
 starts to throw RemoteResourceNotFoundException.
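A sketch of the proposed drop-invalid-events guard. 
{{RemoteLogSegmentState#isValidTransition}} is the helper linked above; the 
surrounding method is illustrative, not the actual cache code:
{code:java}
import org.apache.kafka.server.log.remote.storage.RemoteLogSegmentState;

public class InvalidEventGuard {
    /**
     * Apply an update only when it is a valid transition from the segment's
     * current state; otherwise drop it instead of surfacing
     * RemoteResourceNotFoundException to the caller.
     */
    static boolean shouldApply(RemoteLogSegmentState current,
                               RemoteLogSegmentState target) {
        boolean valid = RemoteLogSegmentState.isValidTransition(current, target);
        if (!valid) {
            System.err.println("Dropping invalid transition " + current + " -> " + target);
        }
        return valid;
    }
}
{code}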



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15499) Fix the flaky DeleteSegmentsDueToLogStartOffsetBreachTest

2023-09-25 Thread Kamal Chandraprakash (Jira)
Kamal Chandraprakash created KAFKA-15499:


 Summary: Fix the flaky DeleteSegmentsDueToLogStartOffsetBreachTest
 Key: KAFKA-15499
 URL: https://issues.apache.org/jira/browse/KAFKA-15499
 Project: Kafka
  Issue Type: Test
Reporter: Kamal Chandraprakash
Assignee: Kamal Chandraprakash


Flaky test failure is reported in 
[https://github.com/apache/kafka/pull/14330#issuecomment-1717195473]



{code:java}
java.lang.AssertionError: [BrokerId=0] The base offset of the first log segment 
of topicA-0 in the log directory is 7 which is smaller than the expected offset 
8. The directory of topicA-0 is made of the following files: 
leader-epoch-checkpoint
0009.timeindex
remote_log_snapshot
0009.index
0007.timeindex
0007.index
0007.snapshot
0005.snapshot
0009.log
partition.metadata
0009.snapshot
0007.log
at 
org.apache.kafka.tiered.storage.utils.BrokerLocalStorage.waitForOffset(BrokerLocalStorage.java:118)
at 
org.apache.kafka.tiered.storage.utils.BrokerLocalStorage.waitForEarliestLocalOffset(BrokerLocalStorage.java:77)
at 
org.apache.kafka.tiered.storage.actions.ProduceAction.doExecute(ProduceAction.java:121)
at 
org.apache.kafka.tiered.storage.TieredStorageTestAction.execute(TieredStorageTestAction.java:25)
at 
org.apache.kafka.tiered.storage.TieredStorageTestHarness.executeTieredStorageTest(TieredStorageTestHarness.java:177){code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15479) Remote log segments should be considered once during retention size breach

2023-09-19 Thread Kamal Chandraprakash (Jira)
Kamal Chandraprakash created KAFKA-15479:


 Summary: Remote log segments should be considered once during 
retention size breach
 Key: KAFKA-15479
 URL: https://issues.apache.org/jira/browse/KAFKA-15479
 Project: Kafka
  Issue Type: Improvement
Reporter: Kamal Chandraprakash
Assignee: Kamal Chandraprakash


When a remote log segment contains multiple epochs, it is considered multiple 
times during deletion. This affects deletion driven by the remote log retention 
size. This is a follow-up of KAFKA-15352.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15432) RLM Stop partitions should not be invoked for non-tiered storage topics

2023-09-05 Thread Kamal Chandraprakash (Jira)
Kamal Chandraprakash created KAFKA-15432:


 Summary: RLM Stop partitions should not be invoked for non-tiered 
storage topics
 Key: KAFKA-15432
 URL: https://issues.apache.org/jira/browse/KAFKA-15432
 Project: Kafka
  Issue Type: Improvement
Reporter: Kamal Chandraprakash


When a stop-partition request is sent by the controller, it invokes 
RemoteLogManager#stopPartition even for internal and 
non-tiered-storage-enabled topics. The replica manager should not call this 
method for non-tiered-storage topics.
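A sketch of the guard the replica manager could apply; the method is 
illustrative, not the actual ReplicaManager code:
{code:java}
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.internals.Topic;

public class StopPartitionGuard {
    /**
     * Hand the partition to RemoteLogManager#stopPartition only when the
     * topic is tiered-storage enabled and is not an internal topic.
     */
    static boolean shouldStopRemotePartition(TopicPartition tp,
                                             boolean remoteStorageEnabled) {
        return remoteStorageEnabled && !Topic.isInternal(tp.topic());
    }
}
{code}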



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15431) Add support to assert offloaded segment for already produced event in Tiered Storage Framework

2023-09-04 Thread Kamal Chandraprakash (Jira)
Kamal Chandraprakash created KAFKA-15431:


 Summary: Add support to assert offloaded segment for already 
produced event in Tiered Storage Framework
 Key: KAFKA-15431
 URL: https://issues.apache.org/jira/browse/KAFKA-15431
 Project: Kafka
  Issue Type: Task
Reporter: Kamal Chandraprakash


See [comment|https://github.com/apache/kafka/pull/14307#discussion_r1314943942]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-15404) Failing Test DynamicBrokerReconfigurationTest#testThreadPoolResize

2023-09-03 Thread Kamal Chandraprakash (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamal Chandraprakash resolved KAFKA-15404.
--
Resolution: Fixed

> Failing Test DynamicBrokerReconfigurationTest#testThreadPoolResize
> --
>
> Key: KAFKA-15404
> URL: https://issues.apache.org/jira/browse/KAFKA-15404
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.6.0
>Reporter: Justine Olshan
>Assignee: Kamal Chandraprakash
>Priority: Blocker
>  Labels: flaky-test
> Fix For: 3.6.0
>
>
> This issue is blocking 
> `org.apache.kafka.tiered.storage.integration.OffloadAndConsumeFromLeaderTest`.
> I've seen this failing on all builds pretty consistently.
> {{org.opentest4j.AssertionFailedError: Invalid threads: expected 6, got 8: 
> List(data-plane-kafka-socket-acceptor-ListenerName(EXTERNAL)-SASL_SSL-0, 
> data-plane-kafka-socket-acceptor-ListenerName(EXTERNAL)-SASL_SSL-0, 
> data-plane-kafka-socket-acceptor-ListenerName(PLAINTEXT)-PLAINTEXT-0, 
> data-plane-kafka-socket-acceptor-ListenerName(EXTERNAL)-SASL_SSL-0, 
> data-plane-kafka-socket-acceptor-ListenerName(INTERNAL)-SSL-0, 
> data-plane-kafka-socket-acceptor-ListenerName(INTERNAL)-SSL-0, 
> data-plane-kafka-socket-acceptor-ListenerName(PLAINTEXT)-PLAINTEXT-0, 
> data-plane-kafka-socket-acceptor-ListenerName(INTERNAL)-SSL-0) ==> expected: 
> <true> but was: <false>}}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Reopened] (KAFKA-15399) Enable OffloadAndConsumeFromLeader test

2023-08-31 Thread Kamal Chandraprakash (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamal Chandraprakash reopened KAFKA-15399:
--

> Enable OffloadAndConsumeFromLeader test
> ---
>
> Key: KAFKA-15399
> URL: https://issues.apache.org/jira/browse/KAFKA-15399
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Kamal Chandraprakash
>Assignee: Kamal Chandraprakash
>Priority: Major
> Fix For: 3.6.0
>
>
> Build / JDK 17 and Scala 2.13 / initializationError – 
> org.apache.kafka.tiered.storage.integration.OffloadAndConsumeFromLeaderTest



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15421) Enable DynamicBrokerReconfigurationTest#testThreadPoolResize test

2023-08-31 Thread Kamal Chandraprakash (Jira)
Kamal Chandraprakash created KAFKA-15421:


 Summary: Enable 
DynamicBrokerReconfigurationTest#testThreadPoolResize test
 Key: KAFKA-15421
 URL: https://issues.apache.org/jira/browse/KAFKA-15421
 Project: Kafka
  Issue Type: Task
Reporter: Kamal Chandraprakash






--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15410) Add basic functionality integration test with tiered storage

2023-08-28 Thread Kamal Chandraprakash (Jira)
Kamal Chandraprakash created KAFKA-15410:


 Summary: Add basic functionality integration test with tiered 
storage
 Key: KAFKA-15410
 URL: https://issues.apache.org/jira/browse/KAFKA-15410
 Project: Kafka
  Issue Type: Task
Reporter: Kamal Chandraprakash
Assignee: Kamal Chandraprakash


Add the below basic functionality integration tests with tiered storage:
 # PartitionsExpandTest
 # DeleteTopicWithSecondaryStorageTest
 # DeleteSegmentsByRetentionSizeTest
 # DeleteSegmentsByRetentionTimeTest
 # DeleteSegmentsDueToLogStartOffsetBreachTest
 # EnableRemoteLogOnTopicTest
 # ListOffsetsTest
 # ReassignReplicaExpandTest
 # ReassignReplicaMoveTest
 # ReassignReplicaShrinkTest and
 # TransactionsTestWithTieredStore



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15400) Fix flaky RemoteIndexCache test

2023-08-23 Thread Kamal Chandraprakash (Jira)
Kamal Chandraprakash created KAFKA-15400:


 Summary: Fix flaky RemoteIndexCache test
 Key: KAFKA-15400
 URL: https://issues.apache.org/jira/browse/KAFKA-15400
 Project: Kafka
  Issue Type: Task
Reporter: Kamal Chandraprakash
Assignee: Kamal Chandraprakash


Build / JDK 8 and Scala 2.12 / testRemoveMultipleItems() – 
kafka.log.remote.RemoteIndexCacheTest


{noformat}
Error: java.nio.file.NoSuchFileException: 
/tmp/kafka-RemoteIndexCacheTest2682821178858144340/tF88YIi7QG-EDPMx8jJfQA:foo-0/2147584984_axAb7u74Q02X0ySdo7Hjbw.txnindex
Stacktrace: java.nio.file.NoSuchFileException: 
/tmp/kafka-RemoteIndexCacheTest2682821178858144340/tF88YIi7QG-EDPMx8jJfQA:foo-0/2147584984_axAb7u74Q02X0ySdo7Hjbw.txnindex
   at 
sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)   at 
sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)at 
sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)at 
sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:55)
   at 
sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:144)
at 
sun.nio.fs.LinuxFileSystemProvider.readAttributes(LinuxFileSystemProvider.java:99)
   at java.nio.file.Files.readAttributes(Files.java:1737)  at 
java.nio.file.FileTreeWalker.getAttributes(FileTreeWalker.java:219)  at 
java.nio.file.FileTreeWalker.visit(FileTreeWalker.java:276)  at 
java.nio.file.FileTreeWalker.next(FileTreeWalker.java:372)   at 
java.nio.file.Files.walkFileTree(Files.java:2706)at 
java.nio.file.Files.walkFileTree(Files.java:2742)at 
org.apache.kafka.common.utils.Utils.delete(Utils.java:899)   at 
kafka.log.remote.RemoteIndexCacheTest.cleanup(RemoteIndexCacheTest.scala:94) at 
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)  at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)   
 at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498) at 
org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:727)
   at 
org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
at 
org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
  at 
org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:156)
 at 
org.junit.jupiter.engine.extension.TimeoutExtension.interceptLifecycleMethod(TimeoutExtension.java:128)
  at 
org.junit.jupiter.engine.extension.TimeoutExtension.interceptAfterEachMethod(TimeoutExtension.java:110)
  at 
org.junit.jupiter.engine.execution.InterceptingExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(InterceptingExecutableInvoker.java:103)
 at 
org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.lambda$invoke$0(InterceptingExecutableInvoker.java:93)
  at 
org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
 at 
org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
at 
org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
 at 
org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37)
 at 
org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:92)
   at 
org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:86)
   at 
org.junit.jupiter.engine.descriptor.ClassBasedTestDescriptor.invokeMethodInExtensionContext(ClassBasedTestDescriptor.java:520)
   at 
org.junit.jupiter.engine.descriptor.ClassBasedTestDescriptor.lambda$synthesizeAfterEachMethodAdapter$24(ClassBasedTestDescriptor.java:510)
   at 
org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeAfterEachMethods$10(TestMethodTestDescriptor.java:243)
 at 
org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeAllAfterMethodsOrCallbacks$13(TestMethodTestDescriptor.java:276)
   at 
org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
at 
org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeAllAfterMethodsOrCallbacks$14(TestMethodTestDescriptor.java:276)
   at java.util.ArrayList.forEach(ArrayList.java:1259) at 
org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeAllAfterMethodsOrCallbacks(TestMethodTestDescriptor.java:275)
 at 

[jira] [Created] (KAFKA-15399) Enable OffloadAndConsumeFromLeader test

2023-08-23 Thread Kamal Chandraprakash (Jira)
Kamal Chandraprakash created KAFKA-15399:


 Summary: Enable OffloadAndConsumeFromLeader test
 Key: KAFKA-15399
 URL: https://issues.apache.org/jira/browse/KAFKA-15399
 Project: Kafka
  Issue Type: Task
Reporter: Kamal Chandraprakash
Assignee: Kamal Chandraprakash


Build / JDK 17 and Scala 2.13 / initializationError – 
org.apache.kafka.tiered.storage.integration.OffloadAndConsumeFromLeaderTest



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15352) Ensure consistency while deleting the remote log segments

2023-08-15 Thread Kamal Chandraprakash (Jira)
Kamal Chandraprakash created KAFKA-15352:


 Summary: Ensure consistency while deleting the remote log segments
 Key: KAFKA-15352
 URL: https://issues.apache.org/jira/browse/KAFKA-15352
 Project: Kafka
  Issue Type: Task
Reporter: Kamal Chandraprakash


In KAFKA-14888, the remote log segments that breach the retention time/size are 
deleted before the log-start-offset is updated. In the middle of deletion, if a 
consumer starts to read from the beginning of the topic, it will fail to read 
the messages and UNKNOWN_SERVER_ERROR will be thrown back to the consumer.

To ensure consistency, similar to local log segments where the actual segments 
are deleted only after {{segment.delete.delay.ms}}, we should update the 
log-start-offset first before deleting the remote log segments.

See [PR#13561|https://github.com/apache/kafka/pull/13561] and this 
[comment|https://github.com/apache/kafka/pull/13561#discussion_r1293086543] for 
more details.
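A sketch of the proposed ordering; the two helpers are hypothetical stand-ins, 
not the actual RemoteLogManager API:
{code:java}
import java.util.List;
import org.apache.kafka.server.log.remote.storage.RemoteLogSegmentMetadata;

abstract class RetentionCleanupSketch {
    abstract void updateLogStartOffset(long newLogStartOffset);             // hypothetical
    abstract void deleteRemoteLogSegment(RemoteLogSegmentMetadata segment); // hypothetical

    void deleteBreachedSegments(List<RemoteLogSegmentMetadata> breached) {
        for (RemoteLogSegmentMetadata segment : breached) {
            // 1. Advance the log-start-offset past the segment first, so a
            //    consumer reading from the beginning is never pointed at data
            //    that is about to disappear.
            updateLogStartOffset(segment.endOffset() + 1);
            // 2. Only then delete the remote segment (mirrors local segments,
            //    which are removed only after segment.delete.delay.ms).
            deleteRemoteLogSegment(segment);
        }
    }
}
{code}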



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15351) Update log-start-offset after leader election for topics enabled with remote storage

2023-08-15 Thread Kamal Chandraprakash (Jira)
Kamal Chandraprakash created KAFKA-15351:


 Summary: Update log-start-offset after leader election for topics 
enabled with remote storage
 Key: KAFKA-15351
 URL: https://issues.apache.org/jira/browse/KAFKA-15351
 Project: Kafka
  Issue Type: Task
Reporter: Kamal Chandraprakash


In the FETCH response, the leader-log-start-offset is piggy-backed. But there 
can be a scenario:
 # The leader deletes the remote log segments and updates its log-start-offset.
 # Before replica-2 updates its log-start-offset via the FETCH request, the 
leadership changes to replica-2.
 # There are no more eligible segments to delete from remote storage.
 # The log-start-offset stays stale (referring to the old log-start-offset, but 
the data was already removed from remote storage).
 # If a consumer starts to read from the beginning of the topic, it will fail 
to read.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15331) Handle remote log enabled topic deletion when leader is not available

2023-08-10 Thread Kamal Chandraprakash (Jira)
Kamal Chandraprakash created KAFKA-15331:


 Summary: Handle remote log enabled topic deletion when leader is 
not available
 Key: KAFKA-15331
 URL: https://issues.apache.org/jira/browse/KAFKA-15331
 Project: Kafka
  Issue Type: Task
Reporter: Kamal Chandraprakash


When a topic gets deleted, there can be a case where all the replicas are out 
of the ISR. This case is not handled; see 
[https://github.com/apache/kafka/pull/13947#discussion_r1289331347] for more 
details.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15295) Add config validation when remote storage is enabled on a topic

2023-08-02 Thread Kamal Chandraprakash (Jira)
Kamal Chandraprakash created KAFKA-15295:


 Summary: Add config validation when remote storage is enabled on a 
topic
 Key: KAFKA-15295
 URL: https://issues.apache.org/jira/browse/KAFKA-15295
 Project: Kafka
  Issue Type: Task
Reporter: Kamal Chandraprakash
Assignee: Kamal Chandraprakash


If remote storage is not enabled at the system level, then enabling remote 
storage on a topic should throw an exception while validating the configs.

See https://github.com/apache/kafka/pull/14114#discussion_r1280372441 for more 
details.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15290) Add support to onboard existing topics to tiered storage

2023-08-01 Thread Kamal Chandraprakash (Jira)
Kamal Chandraprakash created KAFKA-15290:


 Summary: Add support to onboard existing topics to tiered storage
 Key: KAFKA-15290
 URL: https://issues.apache.org/jira/browse/KAFKA-15290
 Project: Kafka
  Issue Type: Task
Reporter: Kamal Chandraprakash
Assignee: Kamal Chandraprakash


This task is about adding support to enable tiered storage for existing topics 
in the cluster.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15272) Fix the logic which finds candidate log segments to upload it to tiered storage

2023-07-29 Thread Kamal Chandraprakash (Jira)
Kamal Chandraprakash created KAFKA-15272:


 Summary: Fix the logic which finds candidate log segments to 
upload it to tiered storage
 Key: KAFKA-15272
 URL: https://issues.apache.org/jira/browse/KAFKA-15272
 Project: Kafka
  Issue Type: Task
Reporter: Kamal Chandraprakash
Assignee: Kamal Chandraprakash


In tiered storage, a segment is eligible for deletion from the local disk once 
it has been uploaded to the remote storage.

If the topic's active segment contains some messages and there are no new 
incoming messages, then the active segment gets rolled to a passive segment 
after the configured {{log.roll.ms}} timeout.

The 
[logic|https://github.com/apache/kafka/blob/trunk/core/src/main/java/kafka/log/remote/RemoteLogManager.java#L553]
 to find the candidate segment in RemoteLogManager does not include the 
recently rolled passive segment as eligible for upload to remote storage, so 
the passive segment won't be removed even after it breaches the retention 
time/size. (i.e., the topic won't be empty after it becomes stale.)



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15194) Rename local tiered storage segment with offset as prefix for easy navigation

2023-07-17 Thread Kamal Chandraprakash (Jira)
Kamal Chandraprakash created KAFKA-15194:


 Summary: Rename local tiered storage segment with offset as prefix 
for easy navigation
 Key: KAFKA-15194
 URL: https://issues.apache.org/jira/browse/KAFKA-15194
 Project: Kafka
  Issue Type: Task
Reporter: Kamal Chandraprakash


In LocalTieredStorage, which is an implementation of RemoteStorageManager, 
segments are saved with a random UUID. This makes navigating to a particular 
segment harder. To make a given segment navigable by offset, prepend the offset 
information to the segment filename.

https://github.com/apache/kafka/pull/13837#discussion_r1258896009



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15168) Handle overlapping remote log segments in RemoteLogMetadata cache

2023-07-08 Thread Kamal Chandraprakash (Jira)
Kamal Chandraprakash created KAFKA-15168:


 Summary: Handle overlapping remote log segments in 
RemoteLogMetadata cache
 Key: KAFKA-15168
 URL: https://issues.apache.org/jira/browse/KAFKA-15168
 Project: Kafka
  Issue Type: Sub-task
Reporter: Kamal Chandraprakash
Assignee: Kamal Chandraprakash


For a partition p0 and a given leader epoch, the remote log manager can upload 
duplicate segments due to a leader change. The RemoteLogMetadata cache should 
handle the duplicate segments, which may otherwise affect the follower and the 
consumer.

(eg)

L0 uploads the segment PZyYVdsJQWeBAdBDPqkcVA at t0 for offset range 10 - 90
L1 uploads the segment L5Ufv71IToiZYKgsluzcyA at t1 for offset range 5 - 100

In the RemoteLogLeaderEpochState class, the {{offsetToId}} is a navigable map. 
It sorts the entries by start-offset, which keeps the state as:
{code:java}
(5 - 100) -> L5Ufv71IToiZYKgsluzcyA (T1)
(10 - 90) -> PZyYVdsJQWeBAdBDPqkcVA (T0){code}

For a fetch request with fetch-offset 92, the RemoteLogLeaderEpochState will 
return the segment PZyYVdsJQWeBAdBDPqkcVA instead of L5Ufv71IToiZYKgsluzcyA; 
the returned segment doesn't contain the requested offset, and an error is 
thrown back to the caller. The snippet below reproduces this.
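The lookup problem can be reproduced with a plain navigable map keyed by 
start offset:
{code:java}
import java.util.NavigableMap;
import java.util.TreeMap;

public class OverlappingSegmentsDemo {
    public static void main(String[] args) {
        // offsetToId keyed by segment start offset, as in RemoteLogLeaderEpochState.
        NavigableMap<Long, String> offsetToId = new TreeMap<>();
        offsetToId.put(5L, "L5Ufv71IToiZYKgsluzcyA");  // offsets 5 - 100, uploaded at t1
        offsetToId.put(10L, "PZyYVdsJQWeBAdBDPqkcVA"); // offsets 10 - 90, uploaded at t0

        // A floor lookup for fetch-offset 92 picks the 10-90 segment (the
        // greatest start offset <= 92), although only 5-100 contains 92.
        System.out.println(offsetToId.floorEntry(92L).getValue());
        // prints PZyYVdsJQWeBAdBDPqkcVA
    }
}
{code}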



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15167) Tiered Storage Test Harness Framework

2023-07-08 Thread Kamal Chandraprakash (Jira)
Kamal Chandraprakash created KAFKA-15167:


 Summary: Tiered Storage Test Harness Framework
 Key: KAFKA-15167
 URL: https://issues.apache.org/jira/browse/KAFKA-15167
 Project: Kafka
  Issue Type: Sub-task
Reporter: Kamal Chandraprakash
Assignee: Kamal Chandraprakash


Base class for integration tests exercising the tiered storage functionality in 
Kafka.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15166) Add deletePartition API to the RemoteStorageManager

2023-07-08 Thread Kamal Chandraprakash (Jira)
Kamal Chandraprakash created KAFKA-15166:


 Summary: Add deletePartition API to the RemoteStorageManager
 Key: KAFKA-15166
 URL: https://issues.apache.org/jira/browse/KAFKA-15166
 Project: Kafka
  Issue Type: Task
Reporter: Kamal Chandraprakash
Assignee: Kamal Chandraprakash


The Remote Storage Manager exposes the {{deleteLogSegmentData}} API to delete 
individual log segments. Storage providers such as HDFS support deleting a 
whole directory. Having a {{deletePartition}} API to delete the data at the 
partition level will enhance topic deletion.

This task may require a KIP as it touches the user-facing APIs.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-14559) Handle object name with wildcards in the Jmx tool

2022-12-30 Thread Kamal Chandraprakash (Jira)
Kamal Chandraprakash created KAFKA-14559:


 Summary: Handle object name with wildcards in the Jmx tool
 Key: KAFKA-14559
 URL: https://issues.apache.org/jira/browse/KAFKA-14559
 Project: Kafka
  Issue Type: Task
Reporter: Kamal Chandraprakash
Assignee: Kamal Chandraprakash


{code}
❯ sh kafka-run-class.sh kafka.tools.JmxTool --jmx-url 
service:jmx:rmi:///jndi/rmi://127.0.0.1:/jmxrmi --object-name 
kafka.server:type=BrokerTopicMetrics,*

Trying to connect to JMX url: 
service:jmx:rmi:///jndi/rmi://127.0.0.1:/jmxrmi.
Exception in thread "main" java.lang.NullPointerException
at kafka.tools.JmxTool$.main(JmxTool.scala:194)
at kafka.tools.JmxTool.main(JmxTool.scala)

❯ sh kafka-run-class.sh kafka.tools.JmxTool --jmx-url 
service:jmx:rmi:///jndi/rmi://localhost:/jmxrmi --object-name 
kafka.server:type=BrokerTopicMetrics,* --attributes Count,FifteenMinuteRate

Trying to connect to JMX url: 
service:jmx:rmi:///jndi/rmi://127.0.0.1:/jmxrmi. 
Exception in thread "main" java.lang.NullPointerException
at kafka.tools.JmxTool$.queryAttributes(JmxTool.scala:254)
at kafka.tools.JmxTool$.main(JmxTool.scala:214)
at kafka.tools.JmxTool.main(JmxTool.scala)
{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-12724) Add 2.8.0 to system tests and streams upgrade tests

2021-04-28 Thread Kamal Chandraprakash (Jira)
Kamal Chandraprakash created KAFKA-12724:


 Summary: Add 2.8.0 to system tests and streams upgrade tests
 Key: KAFKA-12724
 URL: https://issues.apache.org/jira/browse/KAFKA-12724
 Project: Kafka
  Issue Type: Task
Reporter: Kamal Chandraprakash
Assignee: Kamal Chandraprakash


Kafka v2.8.0 is released. We should add this version to the system tests and 
streams upgrade tests.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-12392) Deprecate batch-size config from console producer

2021-03-01 Thread Kamal Chandraprakash (Jira)
Kamal Chandraprakash created KAFKA-12392:


 Summary: Deprecate batch-size config from console producer
 Key: KAFKA-12392
 URL: https://issues.apache.org/jira/browse/KAFKA-12392
 Project: Kafka
  Issue Type: Improvement
  Components: producer 
Affects Versions: 2.7.0
Reporter: Kamal Chandraprakash
Assignee: Kamal Chandraprakash


In the console producer, the {{batch-size}} option is unused. The console 
producer doesn't batch messages by count. This config should be 
removed/deprecated as it's not applicable to the new producer.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-8892) Display the sorted configs in Kafka Configs Help Command.

2019-09-10 Thread Kamal Chandraprakash (Jira)
Kamal Chandraprakash created KAFKA-8892:
---

 Summary: Display the sorted configs in Kafka Configs Help Command.
 Key: KAFKA-8892
 URL: https://issues.apache.org/jira/browse/KAFKA-8892
 Project: Kafka
  Issue Type: Bug
Reporter: Kamal Chandraprakash
Assignee: Kamal Chandraprakash


Configs that can be updated dynamically for topics/brokers/users/clients are 
shown in the help command. Only the topic configs are sorted alphabetically. It 
would be useful to also sort the broker, user, and client configs for quick 
lookup.



--
This message was sent by Atlassian Jira
(v8.3.2#803003)


[jira] [Resolved] (KAFKA-8488) FetchSessionHandler logging create 73 mb allocation in TLAB which could be no op

2019-06-18 Thread Kamal Chandraprakash (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamal Chandraprakash resolved KAFKA-8488.
-
Resolution: Fixed

> FetchSessionHandler logging create 73 mb allocation in TLAB which could be no 
> op 
> -
>
> Key: KAFKA-8488
> URL: https://issues.apache.org/jira/browse/KAFKA-8488
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Wenshuai Hou
>Priority: Minor
> Attachments: image-2019-06-05-14-04-35-668.png
>
>
> !image-2019-06-05-14-04-35-668.png!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (KAFKA-8547) 2 __consumer_offsets partitions grow very big

2019-06-16 Thread Kamal Chandraprakash (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamal Chandraprakash resolved KAFKA-8547.
-
Resolution: Duplicate

Duplicate of https://issues.apache.org/jira/browse/KAFKA-8335

PR [https://github.com/apache/kafka/pull/6715]

> 2 __consumer_offsets partitions grow very big
> -
>
> Key: KAFKA-8547
> URL: https://issues.apache.org/jira/browse/KAFKA-8547
> Project: Kafka
>  Issue Type: Bug
>  Components: log cleaner
>Affects Versions: 2.1.1
> Environment: Ubuntu 18.04, Kafka 2.1.12-2.1.1, running as systemd 
> service
>Reporter: Lerh Chuan Low
>Priority: Major
>
> It seems like log cleaner doesn't clean old data of  {{__consumer_offsets}} 
> on the default policy of compact on that topic. It may eventually cause disk 
> to run out or for the servers to run out of memory.
> We observed a few out of memory errors with our Kafka servers and our theory 
> was due to 2 overly large partitions in {{__consumer_offsets}}. On further 
> digging, it looks like these 2 large partitions have segments dating up to 3 
> months ago. Also, these old files collectively consumed most of the data from 
> those partitions (About 10G from the partition's 12G). 
> When we tried dumping those old segments, we see:
>  
> {code:java}
> 1:40 $ ./kafka-run-class.sh kafka.tools.DumpLogSegments --files 
> 161728257775.log --offsets-decoder --print-data-log --deep-iteration
>  Dumping 161728257775.log
>  Starting offset: 161728257775
>  offset: 161728257904 position: 61 CreateTime: 1553457816168 isvalid: true 
> keysize: 4 valuesize: 6 magic: 2 compresscodec: NONE producerId: 367038 
> producerEpoch: 3 sequence: -1 isTransactional: true headerKeys: [] 
> endTxnMarker: COMMIT coordinatorEpoch: 746
>  offset: 161728258098 position: 200 CreateTime: 1553457816230 isvalid: true 
> keysize: 4 valuesize: 6 magic: 2 compresscodec: NONE producerId: 366036 
> producerEpoch: 3 sequence: -1 isTransactional: true headerKeys: [] 
> endTxnMarker: COMMIT coordinatorEpoch: 761
>  ...{code}
> It looks like all those old segments all contain transactional information 
> (As a side note, we did take a while to figure out that for a segment with 
> the control bit set, the key really is {{endTxnMarker}} and the value is 
> {{coordinatorEpoch}}...otherwise in a non-control batch dump it would have 
> value and payload. We were wondering if seeing what those 2 partitions 
> contained in their keys may give us any clues). Our current workaround is 
> based on this post: 
> https://issues.apache.org/jira/browse/KAFKA-3917?focusedCommentId=16816874&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16816874.
>  We set the cleanup policy to both compact,delete and very quickly the 
> partition was down to below 2G. Not sure if this is something log cleaner 
> should be able to handle normally? Interestingly, other partitions also 
> contain transactional information so it's quite curious how 2 specific 
> partitions were not able to be cleaned. 
> There's a related issue here: 
> https://issues.apache.org/jira/browse/KAFKA-3917, just thought it was a 
> little bit outdated/dead so I opened a new one, please feel free to merge!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-8392) Kafka broker leaks metric when partition leader moves to another node.

2019-05-19 Thread Kamal Chandraprakash (JIRA)
Kamal Chandraprakash created KAFKA-8392:
---

 Summary: Kafka broker leaks metric when partition leader moves to 
another node.
 Key: KAFKA-8392
 URL: https://issues.apache.org/jira/browse/KAFKA-8392
 Project: Kafka
  Issue Type: Bug
  Components: metrics
Affects Versions: 2.2.0
Reporter: Kamal Chandraprakash


When a partition leader moves from one node to another due to an imbalance in 
leader.imbalance.per.broker.percentage, the old leader broker still emits the 
metric with a static value.

Steps to reproduce:
1. Create a cluster with 3 nodes.
2. Create a topic with 2 partitions and RF=3.
3. Generate some data using the console producer.
4. Move any one of the partitions from one node to another using the 
reassign-partitions and preferred-replica-election scripts.
5. Generate some data using the console producer.
6. Now all 3 nodes emit bytesIn, bytesOut and messagesIn for that topic.

Is this the expected behavior?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-7781) Add validation check for Topic retention.ms property

2019-01-02 Thread Kamal Chandraprakash (JIRA)
Kamal Chandraprakash created KAFKA-7781:
---

 Summary: Add validation check for Topic retention.ms property
 Key: KAFKA-7781
 URL: https://issues.apache.org/jira/browse/KAFKA-7781
 Project: Kafka
  Issue Type: Bug
Reporter: Kamal Chandraprakash


Using AdminClient#alterConfigs, the topic _retention.ms_ property can be set 
to a value less than -1. This leads to inconsistency when describing the 
topic configuration. We should not allow values less than -1. In 
server.properties, if _log.retention.ms_ is configured to a value less than 
zero, it's 
[set|https://github.com/apache/kafka/blob/9295444d48eb057900ef09f1176e34b37331f60b/core/src/main/scala/kafka/server/KafkaConfig.scala#L1320]
 to -1.

This doesn't cause any issue in log segment deletion, as the 
[condition|https://github.com/apache/kafka/blob/9295444d48eb057900ef09f1176e34b37331f60b/core/src/main/scala/kafka/log/Log.scala#L1466]
 for infinite log retention checks for a value less than zero. To maintain 
consistency, we should add the validation check.
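
To illustrate, a minimal sketch of how such a value slips through today via the Admin client; the bootstrap address, topic name, and the -2 value are illustrative:

{code:java}
import java.util.Collections;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.Config;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class RetentionMsValidation {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder address

        try (Admin admin = Admin.create(props)) {
            ConfigResource topic =
                new ConfigResource(ConfigResource.Type.TOPIC, "my-topic");

            // retention.ms = -2 is currently accepted, even though only
            // values >= -1 are meaningful (-1 means infinite retention).
            Config config = new Config(Collections.singleton(
                new ConfigEntry("retention.ms", "-2")));

            // The proposed validation would make this call fail instead.
            admin.alterConfigs(Map.of(topic, config)).all().get();
        }
    }
}
{code}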



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-7483) Streams should allow headers to be passed to Serializer

2018-10-04 Thread Kamal Chandraprakash (JIRA)
Kamal Chandraprakash created KAFKA-7483:
---

 Summary: Streams should allow headers to be passed to Serializer
 Key: KAFKA-7483
 URL: https://issues.apache.org/jira/browse/KAFKA-7483
 Project: Kafka
  Issue Type: Bug
  Components: streams
Reporter: Kamal Chandraprakash
Assignee: Kamal Chandraprakash


We are storing schema metadata for the record key and value in the headers; 
the serializer includes this metadata in the record header. While doing a 
simple record transformation (x transformed to y) in streams, the same header 
that was passed from the source is pushed to the sink topic. This leads to 
errors while reading the sink topic.

We should call the overloaded `serialize(topic, headers, object)` method in 
org.apache.kafka.streams.processor.internals.RecordCollectorImpl#L156, #L157, 
which in turn adds the correct metadata to the record header.

With this, the sink topic reader has the option to read all the values for a 
header key using `Headers#headers`, or only the overwritten value using 
`Headers#lastHeader`.
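
For illustration, a minimal sketch of what a headers-aware serializer could look like once Streams passes headers through; the {{schema.id}} header key and String payload are made-up placeholders, assuming a client version where Serializer exposes the headers overload:

{code:java}
import java.nio.charset.StandardCharsets;

import org.apache.kafka.common.header.Headers;
import org.apache.kafka.common.serialization.Serializer;

public class SchemaTaggingSerializer implements Serializer<String> {

    @Override
    public byte[] serialize(String topic, String data) {
        return data == null ? null : data.getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public byte[] serialize(String topic, Headers headers, String data) {
        // Replace any stale schema metadata carried over from the source
        // record, so the sink topic reflects what was actually serialized
        // ("schema.id" is a hypothetical header key).
        headers.remove("schema.id");
        headers.add("schema.id", "v2".getBytes(StandardCharsets.UTF_8));
        return serialize(topic, data);
    }
}
{code}

A downstream reader can then call `Headers#lastHeader("schema.id")` for the value written last, or `Headers#headers("schema.id")` to iterate over every value recorded for that key.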



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (KAFKA-5909) Remove source jars from classpath while executing CLI tools

2018-05-14 Thread Kamal Chandraprakash (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamal Chandraprakash resolved KAFKA-5909.
-
Resolution: Not A Problem

> Remove source jars from classpath while executing CLI tools
> ---
>
> Key: KAFKA-5909
> URL: https://issues.apache.org/jira/browse/KAFKA-5909
> Project: Kafka
>  Issue Type: Bug
>  Components: tools
>Reporter: Kamal Chandraprakash
>Assignee: Kamal Chandraprakash
>Priority: Minor
>  Labels: newbie
>




--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-5909) Remove source jars from classpath while executing CLI tools

2017-09-15 Thread Kamal Chandraprakash (JIRA)
Kamal Chandraprakash created KAFKA-5909:
---

 Summary: Remove source jars from classpath while executing CLI 
tools
 Key: KAFKA-5909
 URL: https://issues.apache.org/jira/browse/KAFKA-5909
 Project: Kafka
  Issue Type: Bug
  Components: tools
Affects Versions: 0.11.0.0
Reporter: Kamal Chandraprakash
Assignee: Kamal Chandraprakash
Priority: Minor






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Resolved] (KAFKA-5763) Refactor NetworkClient to use LogContext

2017-08-30 Thread Kamal Chandraprakash (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-5763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamal Chandraprakash resolved KAFKA-5763.
-
   Resolution: Fixed
Fix Version/s: 1.0.0

[~ijuma] `ConsumerNetworkClient` has already been refactored to use the 
LogContext. Please reopen the task if it's not completed.

> Refactor NetworkClient to use LogContext
> 
>
> Key: KAFKA-5763
> URL: https://issues.apache.org/jira/browse/KAFKA-5763
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Ismael Juma
> Fix For: 1.0.0
>
>
> We added a LogContext object which automatically adds a log prefix to every 
> message written by loggers constructed from it (much like the Logging mixin 
> available in the server code). We use this in the consumer to ensure that 
> messages always contain the consumer group and client ids, which is very 
> helpful when multiple consumers are run on the same instance. We should do 
> something similar for the NetworkClient. We should always include the client 
> id.
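
For illustration, a minimal sketch of how LogContext is used; the prefix contents and log message are placeholders:

{code:java}
import org.apache.kafka.common.utils.LogContext;
import org.slf4j.Logger;

public class LogContextExample {
    public static void main(String[] args) {
        // Every logger built from this context prepends the prefix, so the
        // client id shows up on each NetworkClient log line automatically.
        LogContext logContext =
            new LogContext("[NetworkClient clientId=client-1] ");
        Logger log = logContext.logger(LogContextExample.class);

        log.info("Initiating connection to node 0");
        // Logged as:
        // [NetworkClient clientId=client-1] Initiating connection to node 0
    }
}
{code}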



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Assigned] (KAFKA-4750) KeyValueIterator returns null values

2017-03-20 Thread Kamal Chandraprakash (JIRA)

 [ 
https://issues.apache.org/jira/browse/KAFKA-4750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kamal Chandraprakash reassigned KAFKA-4750:
---

Assignee: Kamal Chandraprakash

> KeyValueIterator returns null values
> 
>
> Key: KAFKA-4750
> URL: https://issues.apache.org/jira/browse/KAFKA-4750
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.1.1
>Reporter: Michal Borowiecki
>Assignee: Kamal Chandraprakash
>  Labels: newbie
>
> The API for the ReadOnlyKeyValueStore.range method promises that the returned 
> iterator will not return null values. However, after upgrading from 0.10.0.0 
> to 0.10.1.1, we found that null values are returned, causing NPEs on our side.
> I found this happens after removing entries from the store, and it resembles 
> the SAMZA-94 defect. The problem seems to be the same as it was there: when 
> deleting entries with a serializer that does not return null when null is 
> passed in, the state store doesn't actually delete that key/value pair, but 
> the iterator will return a null value for that key.
> When I modified our serializer to return null when null is passed in, the 
> problem went away. However, I believe this should be fixed in Kafka Streams, 
> perhaps with a similar approach to SAMZA-94.
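
As a side note, a minimal sketch of the null-safe serializer workaround described above; the String payload and encoding are illustrative, not the reporter's actual serializer:

{code:java}
import java.nio.charset.StandardCharsets;
import java.util.Map;

import org.apache.kafka.common.serialization.Serializer;

public class NullSafeSerializer implements Serializer<String> {

    @Override
    public void configure(Map<String, ?> configs, boolean isKey) { }

    @Override
    public byte[] serialize(String topic, String data) {
        // Returning null (rather than a non-null encoding of "null") lets
        // the state store treat the write as a deletion, so the iterator
        // no longer surfaces null values for removed keys.
        if (data == null) {
            return null;
        }
        return data.getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public void close() { }
}
{code}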



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)


[jira] [Commented] (KAFKA-4340) Change the default value of log.message.timestamp.difference.max.ms to the same as log.retention.ms

2016-10-25 Thread Kamal Chandraprakash (JIRA)

[ 
https://issues.apache.org/jira/browse/KAFKA-4340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15607070#comment-15607070
 ] 

Kamal Chandraprakash commented on KAFKA-4340:
-

Jiangjie Qin,

I would like to work on this issue, as it looks like a newbie one. Could you 
assign it to me?

> Change the default value of log.message.timestamp.difference.max.ms to the 
> same as log.retention.ms
> ---
>
> Key: KAFKA-4340
> URL: https://issues.apache.org/jira/browse/KAFKA-4340
> Project: Kafka
>  Issue Type: Improvement
>  Components: core
>Affects Versions: 0.10.1.0
>Reporter: Jiangjie Qin
>Assignee: Jiangjie Qin
> Fix For: 0.10.1.1
>
>
> [~junrao] brought up the following scenario: 
> If users are pumping data whose timestamps are already older than 
> log.retention.ms into Kafka, the messages will be appended to the log but 
> will be immediately rolled out and deleted by the log retention thread when 
> it kicks in. 
> To avoid this produce-then-delete scenario, we can set the default value of 
> log.message.timestamp.difference.max.ms to be the same as log.retention.ms.
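
To make the produce-then-delete scenario concrete, a sketch of a producer sending a record whose timestamp is already past retention; the bootstrap address, topic, and the 30-day offset are illustrative:

{code:java}
import java.util.Properties;
import java.util.concurrent.TimeUnit;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class StaleTimestampProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder address
        props.put("key.serializer",
            "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer",
            "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // A timestamp 30 days in the past; with log.retention.ms at the
            // default of 7 days, this record becomes eligible for deletion
            // as soon as its segment rolls.
            long staleTimestamp =
                System.currentTimeMillis() - TimeUnit.DAYS.toMillis(30);
            producer.send(new ProducerRecord<>(
                "test-topic", null, staleTimestamp, "key", "value"));
        }
    }
}
{code}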



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)