[jira] [Commented] (KAFKA-17244) java.base/java.lang.VirtualThread$VThreadContinuation.onPinned

2024-08-02 Thread Jianbin Chen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-17244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17870479#comment-17870479
 ] 

Jianbin Chen commented on KAFKA-17244:
--

# JDK 21, Rocky Linux 8.4, Intel
 # The issue occurs reliably once you invoke {{KafkaProducer#send}} from a virtual thread. You can add the {{-Djdk.tracePinnedThreads=full}} JVM option to your test program to observe the pinning; a minimal sketch is shown below.
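
Here is a minimal, self-contained sketch of such a test program (my own illustration; the broker address {{localhost:9092}} and the topic name {{test-topic}} are placeholder assumptions). Running it with {{-Djdk.tracePinnedThreads=full}} prints a pinned-thread stack trace like the one in the issue description:
{code:java}
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class VirtualThreadSendDemo {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // send() enters synchronized blocks in RecordAccumulator.append,
            // so the virtual thread is pinned to its carrier while appending.
            Thread vt = Thread.ofVirtual().start(() ->
                    producer.send(new ProducerRecord<>("test-topic", "key", "value")));
            vt.join();
        }
    }
}
{code}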

Thanks for your attention to this issue

I think this is closely related to the following code. While a virtual thread is inside a {{synchronized}} block it cannot unmount from its carrier thread, so if it blocks there, the virtual thread is pinned.
{code:java}
synchronized (dq) {
    // After taking the lock, validate that the partition hasn't changed and retry.
    if (partitionChanged(topic, topicInfo, partitionInfo, dq, nowMs, cluster))
        continue;

    RecordAppendResult appendResult = appendNewBatch(topic, effectivePartition, dq,
            timestamp, key, value, headers, callbacks, buffer, nowMs);
    // Set buffer to null, so that deallocate doesn't return it back to free pool,
    // since it's used in the batch.
    if (appendResult.newBatchCreated)
        buffer = null;
    // If queue has incomplete batches we disable switch (see comments in updatePartitionInfo).
    boolean enableSwitch = allBatchesFull(dq);
    topicInfo.builtInPartitioner.updatePartitionInfo(partitionInfo,
            appendResult.appendedBytes, cluster, enableSwitch);
    return appendResult;
} {code}
Should we replace the {{synchronized}} block on {{dq}} with a {{ReentrantLock}}?
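
To illustrate why this could help, here is a small self-contained sketch (my own illustration, not Kafka code). Running it with {{-Djdk.tracePinnedThreads=full}}, the {{synchronized}} variant reports a pinned carrier thread, while the {{ReentrantLock}} variant lets the virtual thread unmount while it blocks:
{code:java}
import java.util.concurrent.locks.ReentrantLock;

public class PinningDemo {
    private static final Object monitor = new Object();
    private static final ReentrantLock lock = new ReentrantLock();

    public static void main(String[] args) throws Exception {
        // Blocking while holding a monitor pins the virtual thread to its carrier.
        Thread.ofVirtual().start(() -> {
            synchronized (monitor) {
                sleepQuietly();
            }
        }).join();

        // Blocking under a java.util.concurrent lock lets the virtual thread unmount.
        Thread.ofVirtual().start(() -> {
            lock.lock();
            try {
                sleepQuietly();
            } finally {
                lock.unlock();
            }
        }).join();
    }

    private static void sleepQuietly() {
        try {
            Thread.sleep(100);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
{code}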

 

> java.base/java.lang.VirtualThread$VThreadContinuation.onPinned
> --
>
> Key: KAFKA-17244
> URL: https://issues.apache.org/jira/browse/KAFKA-17244
> Project: Kafka
>  Issue Type: Wish
>  Components: clients, producer 
>Affects Versions: 3.7.1
>Reporter: Jianbin Chen
>Priority: Major
>
> {code:java}
> Thread[#121,ForkJoinPool-1-worker-2,5,CarrierThreads]
> java.base/java.lang.VirtualThread$VThreadContinuation.onPinned(VirtualThread.java:183)
> java.base/jdk.internal.vm.Continuation.onPinned0(Continuation.java:393)
> java.base/java.lang.VirtualThread.tryYield(VirtualThread.java:756)
> java.base/java.lang.Thread.yield(Thread.java:443)
> java.base/java.util.concurrent.ConcurrentHashMap.initTable(ConcurrentHashMap.java:2295)
> java.base/java.util.concurrent.ConcurrentHashMap.putVal(ConcurrentHashMap.java:1017)
> java.base/java.util.concurrent.ConcurrentHashMap.putIfAbsent(ConcurrentHashMap.java:1541)
> org.apache.kafka.common.record.CompressionRatioEstimator.getAndCreateEstimationIfAbsent(CompressionRatioEstimator.java:96)
> org.apache.kafka.common.record.CompressionRatioEstimator.estimation(CompressionRatioEstimator.java:59)
> org.apache.kafka.clients.producer.internals.ProducerBatch.<init>(ProducerBatch.java:95)
> org.apache.kafka.clients.producer.internals.ProducerBatch.<init>(ProducerBatch.java:83)
> org.apache.kafka.clients.producer.internals.RecordAccumulator.appendNewBatch(RecordAccumulator.java:399)
> org.apache.kafka.clients.producer.internals.RecordAccumulator.append(RecordAccumulator.java:350)
>  <== monitors:1
> org.apache.kafka.clients.producer.KafkaProducer.doSend(KafkaProducer.java:1025)
> org.apache.kafka.clients.producer.KafkaProducer.send(KafkaProducer.java:947) 
> {code}
> Because the {{RecordAccumulator.append}} method uses {{synchronized}}, the 
> virtual thread gets pinned ({{onPinned}}). If this is considered an 
> optimization item, please assign it to me and I will try to optimize it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (KAFKA-17244) java.base/java.lang.VirtualThread$VThreadContinuation.onPinned

2024-08-02 Thread Jianbin Chen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-17244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17870479#comment-17870479
 ] 

Jianbin Chen edited comment on KAFKA-17244 at 8/2/24 12:00 PM:
---

Hi [~kirktrue] ,
 # JDK 21, Rocky Linux 8.4, Intel
 # The issue occurs reliably once you invoke {{KafkaProducer#send}} from a virtual thread. You can add the {{-Djdk.tracePinnedThreads=full}} JVM option to your test program to observe the pinning.

Thanks for your attention to this issue

I think this is closely related to the following code. While a virtual thread is inside a {{synchronized}} block it cannot unmount from its carrier thread, so if it blocks there, the virtual thread is pinned.
{code:java}
synchronized (dq) {
    // After taking the lock, validate that the partition hasn't changed and retry.
    if (partitionChanged(topic, topicInfo, partitionInfo, dq, nowMs, cluster))
        continue;

    RecordAppendResult appendResult = appendNewBatch(topic, effectivePartition, dq,
            timestamp, key, value, headers, callbacks, buffer, nowMs);
    // Set buffer to null, so that deallocate doesn't return it back to free pool,
    // since it's used in the batch.
    if (appendResult.newBatchCreated)
        buffer = null;
    // If queue has incomplete batches we disable switch (see comments in updatePartitionInfo).
    boolean enableSwitch = allBatchesFull(dq);
    topicInfo.builtInPartitioner.updatePartitionInfo(partitionInfo,
            appendResult.appendedBytes, cluster, enableSwitch);
    return appendResult;
} {code}
Should we replace the {{synchronized}} block on {{dq}} with a {{ReentrantLock}}?

 


was (Author: jianbin):
# JDK 21, Rocky Linux 8.4, Intel
 # The issue occurs reliably once you invoke {{KafkaProducer#send}} from a virtual thread. You can add the {{-Djdk.tracePinnedThreads=full}} JVM option to your test program to observe the pinning.

Thanks for your attention to this issue

I think this is closely related to the following code. While a virtual thread is inside a {{synchronized}} block it cannot unmount from its carrier thread, so if it blocks there, the virtual thread is pinned.
{code:java}
synchronized (dq) {
    // After taking the lock, validate that the partition hasn't changed and retry.
    if (partitionChanged(topic, topicInfo, partitionInfo, dq, nowMs, cluster))
        continue;

    RecordAppendResult appendResult = appendNewBatch(topic, effectivePartition, dq,
            timestamp, key, value, headers, callbacks, buffer, nowMs);
    // Set buffer to null, so that deallocate doesn't return it back to free pool,
    // since it's used in the batch.
    if (appendResult.newBatchCreated)
        buffer = null;
    // If queue has incomplete batches we disable switch (see comments in updatePartitionInfo).
    boolean enableSwitch = allBatchesFull(dq);
    topicInfo.builtInPartitioner.updatePartitionInfo(partitionInfo,
            appendResult.appendedBytes, cluster, enableSwitch);
    return appendResult;
} {code}
Should we replace the {{synchronized}} block on {{dq}} with a {{ReentrantLock}}?

 

> java.base/java.lang.VirtualThread$VThreadContinuation.onPinned
> --
>
> Key: KAFKA-17244
> URL: https://issues.apache.org/jira/browse/KAFKA-17244
> Project: Kafka
>  Issue Type: Wish
>  Components: clients, producer 
>Affects Versions: 3.7.1
>Reporter: Jianbin Chen
>Priority: Major
>
> {code:java}
> Thread[#121,ForkJoinPool-1-worker-2,5,CarrierThreads]
> java.base/java.lang.VirtualThread$VThreadContinuation.onPinned(VirtualThread.java:183)
> java.base/jdk.internal.vm.Continuation.onPinned0(Continuation.java:393)
> java.base/java.lang.VirtualThread.tryYield(VirtualThread.java:756)
> java.base/java.lang.Thread.yield(Thread.java:443)
> java.base/java.util.concurrent.ConcurrentHashMap.initTable(ConcurrentHashMap.java:2295)
> java.base/java.util.concurrent.ConcurrentHashMap.putVal(ConcurrentHashMap.java:1017)
> java.base/java.util.concurrent.ConcurrentHashMap.putIfAbsent(ConcurrentHashMap.java:1541)
> org.apache.kafka.common.record.CompressionRatioEstimator.getAndCreateEstimationIfAbsent(CompressionRatioEstimator.java:96)
> org.apache.kafka.common.record.CompressionRatioEstimator.estimation(CompressionRatioEstimator.java:59)
> org.apache.kafka.clients.producer.internals.ProducerBatch.<init>(ProducerBatch.java:95)
> org.apache.kafka.clients.producer.internals.ProducerBatch.<init>(ProducerBatch.java:83)
> org.apache.kafka.clients.producer.internals.RecordAccumulator.appendNewBatch(RecordAccumulator.java:399)
> org.a

[jira] [Created] (KAFKA-17244) java.base/java.lang.VirtualThread$VThreadContinuation.onPinned

2024-08-01 Thread Jianbin Chen (Jira)
Jianbin Chen created KAFKA-17244:


 Summary: 
java.base/java.lang.VirtualThread$VThreadContinuation.onPinned
 Key: KAFKA-17244
 URL: https://issues.apache.org/jira/browse/KAFKA-17244
 Project: Kafka
  Issue Type: Wish
Affects Versions: 3.7.1
Reporter: Jianbin Chen


{code:java}
Thread[#121,ForkJoinPool-1-worker-2,5,CarrierThreads]
java.base/java.lang.VirtualThread$VThreadContinuation.onPinned(VirtualThread.java:183)
java.base/jdk.internal.vm.Continuation.onPinned0(Continuation.java:393)
java.base/java.lang.VirtualThread.tryYield(VirtualThread.java:756)
java.base/java.lang.Thread.yield(Thread.java:443)
java.base/java.util.concurrent.ConcurrentHashMap.initTable(ConcurrentHashMap.java:2295)
java.base/java.util.concurrent.ConcurrentHashMap.putVal(ConcurrentHashMap.java:1017)
java.base/java.util.concurrent.ConcurrentHashMap.putIfAbsent(ConcurrentHashMap.java:1541)
org.apache.kafka.common.record.CompressionRatioEstimator.getAndCreateEstimationIfAbsent(CompressionRatioEstimator.java:96)
org.apache.kafka.common.record.CompressionRatioEstimator.estimation(CompressionRatioEstimator.java:59)
org.apache.kafka.clients.producer.internals.ProducerBatch.<init>(ProducerBatch.java:95)
org.apache.kafka.clients.producer.internals.ProducerBatch.<init>(ProducerBatch.java:83)
org.apache.kafka.clients.producer.internals.RecordAccumulator.appendNewBatch(RecordAccumulator.java:399)
org.apache.kafka.clients.producer.internals.RecordAccumulator.append(RecordAccumulator.java:350)
 <== monitors:1
org.apache.kafka.clients.producer.KafkaProducer.doSend(KafkaProducer.java:1025)
org.apache.kafka.clients.producer.KafkaProducer.send(KafkaProducer.java:947) 
{code}
Because the {{RecordAccumulator.append}} method uses {{synchronized}}, the 
virtual thread gets pinned ({{onPinned}}). If this is considered an optimization 
item, please assign it to me and I will try to optimize it.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-17068) Failed to modify controller IP under Raft mode

2024-07-02 Thread Jianbin Chen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-17068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17862667#comment-17862667
 ] 

Jianbin Chen commented on KAFKA-17068:
--

[~showuon] I'll go check out this KIP. Thanks for your response

> Failed to modify controller IP under Raft mode
> --
>
> Key: KAFKA-17068
> URL: https://issues.apache.org/jira/browse/KAFKA-17068
> Project: Kafka
>  Issue Type: Wish
>Affects Versions: 3.7.1
>Reporter: Jianbin Chen
>Priority: Major
>
> {code:java}
> controller.quorum.voters=1@192.168.1.123:9093,2@192.168.1.124:9093,3@192.168.1.125:9093{code}
> change to
> {code:java}
> controller.quorum.voters=1@192.168.1.126:9093,2@192.168.1.124:9093,3@192.168.1.125:9093{code}
> 192.168.1.126 log :
> {code:java}
> [2024-07-03 14:05:22,236] INFO [ControllerRegistrationManager id=1 
> incarnation=ULBsL0bbRvG7iXKCKtCQgg] RegistrationResponseHandler: controller 
> acknowledged ControllerRegistrationRequest. 
> (kafka.server.ControllerRegistrationManager)
> [2024-07-03 14:05:22,708] INFO [ControllerRegistrationManager id=1 
> incarnation=ULBsL0bbRvG7iXKCKtCQgg] Our registration has been persisted to 
> the metadata log. (kafka.server.ControllerRegistrationManager)
> [2024-07-03 14:05:22,816] INFO [AdminClient clientId=adminclient-1] Node -1 
> disconnected. (org.apache.kafka.clients.NetworkClient)
> [2024-07-03 14:05:22,816] WARN [AdminClient clientId=adminclient-1] 
> Connection to node -1 (/192.168.1.126:9092) could not be established. Node 
> may not be available. (org.apache.kafka.clients.NetworkClient){code}
> Then the node at 192.168.1.126 crashed
> {code:java}
> [2024-07-03 14:06:55,134] INFO [MetadataLoader id=1] beginShutdown: shutting 
> down event queue. (org.apache.kafka.queue.KafkaEventQueue)
> [2024-07-03 14:06:55,134] INFO [SnapshotGenerator id=1] beginShutdown: 
> shutting down event queue. (org.apache.kafka.queue.KafkaEventQueue)
> [2024-07-03 14:06:55,134] INFO [SnapshotGenerator id=1] closed event queue. 
> (org.apache.kafka.queue.KafkaEventQueue)
> [2024-07-03 14:06:55,135] INFO [MetadataLoader id=1] closed event queue. 
> (org.apache.kafka.queue.KafkaEventQueue)
> [2024-07-03 14:06:55,135] INFO [SnapshotGenerator id=1] closed event queue. 
> (org.apache.kafka.queue.KafkaEventQueue)
> [2024-07-03 14:06:55,136] INFO Metrics scheduler closed 
> (org.apache.kafka.common.metrics.Metrics)
> [2024-07-03 14:06:55,136] INFO Closing reporter 
> org.apache.kafka.common.metrics.JmxReporter 
> (org.apache.kafka.common.metrics.Metrics)
> [2024-07-03 14:06:55,137] INFO Metrics reporters closed 
> (org.apache.kafka.common.metrics.Metrics)
> [2024-07-03 14:06:55,137] INFO App info kafka.server for 1 unregistered 
> (org.apache.kafka.common.utils.AppInfoParser)
> [2024-07-03 14:06:55,137] INFO App info kafka.server for 1 unregistered 
> (org.apache.kafka.common.utils.AppInfoParser)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-17068) Failed to modify controller IP under Raft mode

2024-07-02 Thread Jianbin Chen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-17068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17862666#comment-17862666
 ] 

Jianbin Chen commented on KAFKA-17068:
--

I found the cause: I forgot to stop the node at 192.168.1.123 even though I had 
already pointed the other two brokers' voter configuration at 192.168.1.126, so 
starting 192.168.1.126 failed. Is there any way to improve this situation? 
Otherwise the error is a bit misleading and can cause confusion during 
troubleshooting.

> Failed to modify controller IP under Raft mode
> --
>
> Key: KAFKA-17068
> URL: https://issues.apache.org/jira/browse/KAFKA-17068
> Project: Kafka
>  Issue Type: Wish
>Affects Versions: 3.7.1
>Reporter: Jianbin Chen
>Priority: Major
>
> {code:java}
> controller.quorum.voters=1@192.168.1.123:9093,2@192.168.1.124:9093,3@192.168.1.125:9093{code}
> change to
> {code:java}
> controller.quorum.voters=1@192.168.1.126:9093,2@192.168.1.124:9093,3@192.168.1.125:9093{code}
> 192.168.1.126 log :
> {code:java}
> [2024-07-03 14:05:22,236] INFO [ControllerRegistrationManager id=1 
> incarnation=ULBsL0bbRvG7iXKCKtCQgg] RegistrationResponseHandler: controller 
> acknowledged ControllerRegistrationRequest. 
> (kafka.server.ControllerRegistrationManager)
> [2024-07-03 14:05:22,708] INFO [ControllerRegistrationManager id=1 
> incarnation=ULBsL0bbRvG7iXKCKtCQgg] Our registration has been persisted to 
> the metadata log. (kafka.server.ControllerRegistrationManager)
> [2024-07-03 14:05:22,816] INFO [AdminClient clientId=adminclient-1] Node -1 
> disconnected. (org.apache.kafka.clients.NetworkClient)
> [2024-07-03 14:05:22,816] WARN [AdminClient clientId=adminclient-1] 
> Connection to node -1 (/192.168.1.126:9092) could not be established. Node 
> may not be available. (org.apache.kafka.clients.NetworkClient){code}
> Then the node at 192.168.1.126 crashed
> {code:java}
> [2024-07-03 14:06:55,134] INFO [MetadataLoader id=1] beginShutdown: shutting 
> down event queue. (org.apache.kafka.queue.KafkaEventQueue)
> [2024-07-03 14:06:55,134] INFO [SnapshotGenerator id=1] beginShutdown: 
> shutting down event queue. (org.apache.kafka.queue.KafkaEventQueue)
> [2024-07-03 14:06:55,134] INFO [SnapshotGenerator id=1] closed event queue. 
> (org.apache.kafka.queue.KafkaEventQueue)
> [2024-07-03 14:06:55,135] INFO [MetadataLoader id=1] closed event queue. 
> (org.apache.kafka.queue.KafkaEventQueue)
> [2024-07-03 14:06:55,135] INFO [SnapshotGenerator id=1] closed event queue. 
> (org.apache.kafka.queue.KafkaEventQueue)
> [2024-07-03 14:06:55,136] INFO Metrics scheduler closed 
> (org.apache.kafka.common.metrics.Metrics)
> [2024-07-03 14:06:55,136] INFO Closing reporter 
> org.apache.kafka.common.metrics.JmxReporter 
> (org.apache.kafka.common.metrics.Metrics)
> [2024-07-03 14:06:55,137] INFO Metrics reporters closed 
> (org.apache.kafka.common.metrics.Metrics)
> [2024-07-03 14:06:55,137] INFO App info kafka.server for 1 unregistered 
> (org.apache.kafka.common.utils.AppInfoParser)
> [2024-07-03 14:06:55,137] INFO App info kafka.server for 1 unregistered 
> (org.apache.kafka.common.utils.AppInfoParser)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-17068) Failed to modify controller IP under Raft mode

2024-07-02 Thread Jianbin Chen (Jira)
Jianbin Chen created KAFKA-17068:


 Summary: Failed to modify controller IP under Raft mode
 Key: KAFKA-17068
 URL: https://issues.apache.org/jira/browse/KAFKA-17068
 Project: Kafka
  Issue Type: Wish
Affects Versions: 3.7.1
Reporter: Jianbin Chen


{code:java}
controller.quorum.voters=1@192.168.1.123:9093,2@192.168.1.124:9093,3@192.168.1.125:9093{code}
change to
{code:java}
controller.quorum.voters=1@192.168.1.126:9093,2@192.168.1.124:9093,3@192.168.1.125:9093{code}
192.168.1.126 log :
{code:java}
[2024-07-03 14:05:22,236] INFO [ControllerRegistrationManager id=1 
incarnation=ULBsL0bbRvG7iXKCKtCQgg] RegistrationResponseHandler: controller 
acknowledged ControllerRegistrationRequest. 
(kafka.server.ControllerRegistrationManager)
[2024-07-03 14:05:22,708] INFO [ControllerRegistrationManager id=1 
incarnation=ULBsL0bbRvG7iXKCKtCQgg] Our registration has been persisted to the 
metadata log. (kafka.server.ControllerRegistrationManager)
[2024-07-03 14:05:22,816] INFO [AdminClient clientId=adminclient-1] Node -1 
disconnected. (org.apache.kafka.clients.NetworkClient)
[2024-07-03 14:05:22,816] WARN [AdminClient clientId=adminclient-1] Connection 
to node -1 (/192.168.1.126:9092) could not be established. Node may not be 
available. (org.apache.kafka.clients.NetworkClient){code}

Then the node at 192.168.1.126 crashed

{code:java}
[2024-07-03 14:06:55,134] INFO [MetadataLoader id=1] beginShutdown: shutting 
down event queue. (org.apache.kafka.queue.KafkaEventQueue)
[2024-07-03 14:06:55,134] INFO [SnapshotGenerator id=1] beginShutdown: shutting 
down event queue. (org.apache.kafka.queue.KafkaEventQueue)
[2024-07-03 14:06:55,134] INFO [SnapshotGenerator id=1] closed event queue. 
(org.apache.kafka.queue.KafkaEventQueue)
[2024-07-03 14:06:55,135] INFO [MetadataLoader id=1] closed event queue. 
(org.apache.kafka.queue.KafkaEventQueue)
[2024-07-03 14:06:55,135] INFO [SnapshotGenerator id=1] closed event queue. 
(org.apache.kafka.queue.KafkaEventQueue)
[2024-07-03 14:06:55,136] INFO Metrics scheduler closed 
(org.apache.kafka.common.metrics.Metrics)
[2024-07-03 14:06:55,136] INFO Closing reporter 
org.apache.kafka.common.metrics.JmxReporter 
(org.apache.kafka.common.metrics.Metrics)
[2024-07-03 14:06:55,137] INFO Metrics reporters closed 
(org.apache.kafka.common.metrics.Metrics)
[2024-07-03 14:06:55,137] INFO App info kafka.server for 1 unregistered 
(org.apache.kafka.common.utils.AppInfoParser)
[2024-07-03 14:06:55,137] INFO App info kafka.server for 1 unregistered 
(org.apache.kafka.common.utils.AppInfoParser)
{code}





--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-17020) After enabling tiered storage, occasional residual logs are left in the replica

2024-06-25 Thread Jianbin Chen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-17020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17859890#comment-17859890
 ] 

Jianbin Chen commented on KAFKA-17020:
--

[~showuon] My cluster has been running for over half a month, and this is the 
first time this issue has occurred, so it's an intermittent event. Therefore, I 
cannot directly reproduce the problem. I have currently reassigned the 
partitions for some topics with residual log files, and now the issue has not 
reoccurred. I am thinking it might take a long period of operation before it 
appears again.

> After enabling tiered storage, occasional residual logs are left in the 
> replica
> ---
>
> Key: KAFKA-17020
> URL: https://issues.apache.org/jira/browse/KAFKA-17020
> Project: Kafka
>  Issue Type: Wish
>Affects Versions: 3.7.0
>Reporter: Jianbin Chen
>Priority: Major
> Attachments: image-2024-06-22-21-45-43-815.png, 
> image-2024-06-22-21-46-12-371.png, image-2024-06-22-21-46-26-530.png, 
> image-2024-06-22-21-46-42-917.png, image-2024-06-22-21-47-00-230.png
>
>
> After enabling tiered storage, occasional residual logs are left in the 
> replica.
> Based on the observed phenomenon, the index values of the rolled-out logs 
> generated by the replica and the leader are not the same. As a result, the 
> logs uploaded to S3 at the same time do not include the corresponding log 
> files on the replica side, making it impossible to delete the local logs.
> !image-2024-06-22-21-45-43-815.png!
> leader config:
> {code:java}
> num.partitions=3
> default.replication.factor=2
> delete.topic.enable=true
> auto.create.topics.enable=false
> num.recovery.threads.per.data.dir=1
> offsets.topic.replication.factor=3
> transaction.state.log.replication.factor=2
> transaction.state.log.min.isr=1
> offsets.retention.minutes=4320
> log.roll.ms=8640
> log.local.retention.ms=60
> log.segment.bytes=536870912
> num.replica.fetchers=1
> log.retention.ms=1581120
> remote.log.manager.thread.pool.size=4
> remote.log.reader.threads=4
> remote.log.metadata.topic.replication.factor=3
> remote.log.storage.system.enable=true
> remote.log.metadata.topic.retention.ms=18000
> rsm.config.fetch.chunk.cache.class=io.aiven.kafka.tieredstorage.fetch.cache.DiskChunkCache
> rsm.config.fetch.chunk.cache.path=/data01/kafka-tiered-storage-cache
> Pick some cache size, 16 GiB here:
> rsm.config.fetch.chunk.cache.size=34359738368
> rsm.config.fetch.chunk.cache.retention.ms=120
> # # Prefetching size, 16 MiB here:
> rsm.config.fetch.chunk.cache.prefetch.max.size=33554432
> rsm.config.storage.backend.class=io.aiven.kafka.tieredstorage.storage.s3.S3Storage
> rsm.config.storage.s3.bucket.name=
> rsm.config.storage.s3.region=us-west-1
> rsm.config.storage.aws.secret.access.key=
> rsm.config.storage.aws.access.key.id=
> rsm.config.chunk.size=8388608
> remote.log.storage.manager.class.path=/home/admin/core-0.0.1-SNAPSHOT/:/home/admin/s3-0.0.1-SNAPSHOT/
> remote.log.storage.manager.class.name=io.aiven.kafka.tieredstorage.RemoteStorageManager
> remote.log.metadata.manager.class.name=org.apache.kafka.server.log.remote.metadata.storage.TopicBasedRemoteLogMetadataManager
> remote.log.metadata.manager.listener.name=PLAINTEXT
> rsm.config.upload.rate.limit.bytes.per.second=31457280
> {code}
>  replica config:
> {code:java}
> num.partitions=3
> default.replication.factor=2
> delete.topic.enable=true
> auto.create.topics.enable=false
> num.recovery.threads.per.data.dir=1
> offsets.topic.replication.factor=3
> transaction.state.log.replication.factor=2
> transaction.state.log.min.isr=1
> offsets.retention.minutes=4320
> log.roll.ms=8640
> log.local.retention.ms=60
> log.segment.bytes=536870912
> num.replica.fetchers=1
> log.retention.ms=1581120
> remote.log.manager.thread.pool.size=4
> remote.log.reader.threads=4
> remote.log.metadata.topic.replication.factor=3
> remote.log.storage.system.enable=true
> #remote.log.metadata.topic.retention.ms=18000
> rsm.config.fetch.chunk.cache.class=io.aiven.kafka.tieredstorage.fetch.cache.DiskChunkCache
> rsm.config.fetch.chunk.cache.path=/data01/kafka-tiered-storage-cache
> # Pick some cache size, 16 GiB here:
> rsm.config.fetch.chunk.cache.size=34359738368
> rsm.config.fetch.chunk.cache.retention.ms=120
> # # # Prefetching size, 16 MiB here:
> rsm.config.fetch.chunk.cache.prefetch.max.size=33554432
> rsm.config.storage.backend.class=io.aiven.kafka.tieredstorage.storage.s3.S3Storage
> rsm.config.storage.s3.bucket.name=
> rsm.config.storage.s3.region=us-west-1
> rsm.config.storage.aws.secret.access.key=
> rsm.config.storage.aws.access.key.id=
> rsm.config.chunk.size=8388608
> remote.log.storage.manager.class.path=/home/admin/core-0.0.1-SNAPSHOT/*:/home/admin/s3-0.0.1-SNAPSHOT/*
> remo

[jira] [Commented] (KAFKA-17020) After enabling tiered storage, occasional residual logs are left in the replica

2024-06-25 Thread Jianbin Chen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-17020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17859884#comment-17859884
 ] 

Jianbin Chen commented on KAFKA-17020:
--

[~showuon] 

I still appreciate your response. The screenshots of the leader's and replicas' 
topic-partition log folders actually illustrate the issue. My configuration 
stores logs locally for only 10 minutes, and the segment size is limited to 
512MB. This segment was already full at 512MB over 2 hours ago and had 
satisfied the 10-minute local retention requirement, and the leader has already 
uploaded the corresponding segment to remote storage. However, the replicas 
still retain this log file several hours later, and a simple restart does not 
resolve the issue. Currently, there are a few temporary workarounds:
 # Stop the replicas' processes, delete the topic partition folders with 
residual logs, and then restart the corresponding broker nodes.
 # Perform a topic partition reassignment; after the partition leader is 
re-elected, the issue is also resolved (a minimal sketch of doing this 
programmatically is shown below).

However, these methods are only temporary fixes. I still do not understand why 
this issue suddenly appeared after running smoothly for half a month.
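
For reference, here is a minimal sketch of triggering workaround 2 through the Kafka Admin client (my own illustration; the topic name {{xx}}, partition 0, the target broker ids 2 and 3, and {{localhost:9092}} are all placeholder assumptions):
{code:java}
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Properties;

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.NewPartitionReassignment;
import org.apache.kafka.common.TopicPartition;

public class ReassignPartitionDemo {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address

        try (Admin admin = Admin.create(props)) {
            // Move partition 0 of topic "xx" onto brokers 2 and 3 (placeholder ids);
            // the reassignment triggers a new leader election for the partition.
            Map<TopicPartition, Optional<NewPartitionReassignment>> reassignment = Map.of(
                    new TopicPartition("xx", 0),
                    Optional.of(new NewPartitionReassignment(List.of(2, 3))));
            admin.alterPartitionReassignments(reassignment).all().get();
        }
    }
}
{code}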

> After enabling tiered storage, occasional residual logs are left in the 
> replica
> ---
>
> Key: KAFKA-17020
> URL: https://issues.apache.org/jira/browse/KAFKA-17020
> Project: Kafka
>  Issue Type: Wish
>Affects Versions: 3.7.0
>Reporter: Jianbin Chen
>Priority: Major
> Attachments: image-2024-06-22-21-45-43-815.png, 
> image-2024-06-22-21-46-12-371.png, image-2024-06-22-21-46-26-530.png, 
> image-2024-06-22-21-46-42-917.png, image-2024-06-22-21-47-00-230.png
>
>
> After enabling tiered storage, occasional residual logs are left in the 
> replica.
> Based on the observed phenomenon, the index values of the rolled-out logs 
> generated by the replica and the leader are not the same. As a result, the 
> logs uploaded to S3 at the same time do not include the corresponding log 
> files on the replica side, making it impossible to delete the local logs.
> !image-2024-06-22-21-45-43-815.png!
> leader config:
> {code:java}
> num.partitions=3
> default.replication.factor=2
> delete.topic.enable=true
> auto.create.topics.enable=false
> num.recovery.threads.per.data.dir=1
> offsets.topic.replication.factor=3
> transaction.state.log.replication.factor=2
> transaction.state.log.min.isr=1
> offsets.retention.minutes=4320
> log.roll.ms=8640
> log.local.retention.ms=60
> log.segment.bytes=536870912
> num.replica.fetchers=1
> log.retention.ms=1581120
> remote.log.manager.thread.pool.size=4
> remote.log.reader.threads=4
> remote.log.metadata.topic.replication.factor=3
> remote.log.storage.system.enable=true
> remote.log.metadata.topic.retention.ms=18000
> rsm.config.fetch.chunk.cache.class=io.aiven.kafka.tieredstorage.fetch.cache.DiskChunkCache
> rsm.config.fetch.chunk.cache.path=/data01/kafka-tiered-storage-cache
> Pick some cache size, 16 GiB here:
> rsm.config.fetch.chunk.cache.size=34359738368
> rsm.config.fetch.chunk.cache.retention.ms=120
> # # Prefetching size, 16 MiB here:
> rsm.config.fetch.chunk.cache.prefetch.max.size=33554432
> rsm.config.storage.backend.class=io.aiven.kafka.tieredstorage.storage.s3.S3Storage
> rsm.config.storage.s3.bucket.name=
> rsm.config.storage.s3.region=us-west-1
> rsm.config.storage.aws.secret.access.key=
> rsm.config.storage.aws.access.key.id=
> rsm.config.chunk.size=8388608
> remote.log.storage.manager.class.path=/home/admin/core-0.0.1-SNAPSHOT/:/home/admin/s3-0.0.1-SNAPSHOT/
> remote.log.storage.manager.class.name=io.aiven.kafka.tieredstorage.RemoteStorageManager
> remote.log.metadata.manager.class.name=org.apache.kafka.server.log.remote.metadata.storage.TopicBasedRemoteLogMetadataManager
> remote.log.metadata.manager.listener.name=PLAINTEXT
> rsm.config.upload.rate.limit.bytes.per.second=31457280
> {code}
>  replica config:
> {code:java}
> num.partitions=3
> default.replication.factor=2
> delete.topic.enable=true
> auto.create.topics.enable=false
> num.recovery.threads.per.data.dir=1
> offsets.topic.replication.factor=3
> transaction.state.log.replication.factor=2
> transaction.state.log.min.isr=1
> offsets.retention.minutes=4320
> log.roll.ms=8640
> log.local.retention.ms=60
> log.segment.bytes=536870912
> num.replica.fetchers=1
> log.retention.ms=1581120
> remote.log.manager.thread.pool.size=4
> remote.log.reader.threads=4
> remote.log.metadata.topic.replication.factor=3
> remote.log.storage.system.enable=true
> #remote.log.metadata.topic.retention.ms=18000
> rsm.config.fetch.chunk.cache.class=io.aiven.kafka.tieredstorage.fetch.cache.DiskChunkCache
> rsm.config.fetch.chunk.cache.path=

[jira] [Commented] (KAFKA-17020) After enabling tiered storage, occasional residual logs are left in the replica

2024-06-24 Thread Jianbin Chen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-17020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17859618#comment-17859618
 ] 

Jianbin Chen commented on KAFKA-17020:
--

[~showuon] Can you tell me what kind of logs you need as evidence? This issue 
has been ongoing for several days. I can look for the relevant logs, but first, 
you need to provide me with keywords to use when searching for the logs.

> After enabling tiered storage, occasional residual logs are left in the 
> replica
> ---
>
> Key: KAFKA-17020
> URL: https://issues.apache.org/jira/browse/KAFKA-17020
> Project: Kafka
>  Issue Type: Wish
>Affects Versions: 3.7.0
>Reporter: Jianbin Chen
>Priority: Major
> Attachments: image-2024-06-22-21-45-43-815.png, 
> image-2024-06-22-21-46-12-371.png, image-2024-06-22-21-46-26-530.png, 
> image-2024-06-22-21-46-42-917.png, image-2024-06-22-21-47-00-230.png
>
>
> After enabling tiered storage, occasional residual logs are left in the 
> replica.
> Based on the observed phenomenon, the index values of the rolled-out logs 
> generated by the replica and the leader are not the same. As a result, the 
> logs uploaded to S3 at the same time do not include the corresponding log 
> files on the replica side, making it impossible to delete the local logs.
> !image-2024-06-22-21-45-43-815.png!
> leader config:
> {code:java}
> num.partitions=3
> default.replication.factor=2
> delete.topic.enable=true
> auto.create.topics.enable=false
> num.recovery.threads.per.data.dir=1
> offsets.topic.replication.factor=3
> transaction.state.log.replication.factor=2
> transaction.state.log.min.isr=1
> offsets.retention.minutes=4320
> log.roll.ms=8640
> log.local.retention.ms=60
> log.segment.bytes=536870912
> num.replica.fetchers=1
> log.retention.ms=1581120
> remote.log.manager.thread.pool.size=4
> remote.log.reader.threads=4
> remote.log.metadata.topic.replication.factor=3
> remote.log.storage.system.enable=true
> remote.log.metadata.topic.retention.ms=18000
> rsm.config.fetch.chunk.cache.class=io.aiven.kafka.tieredstorage.fetch.cache.DiskChunkCache
> rsm.config.fetch.chunk.cache.path=/data01/kafka-tiered-storage-cache
> Pick some cache size, 16 GiB here:
> rsm.config.fetch.chunk.cache.size=34359738368
> rsm.config.fetch.chunk.cache.retention.ms=120
> # # Prefetching size, 16 MiB here:
> rsm.config.fetch.chunk.cache.prefetch.max.size=33554432
> rsm.config.storage.backend.class=io.aiven.kafka.tieredstorage.storage.s3.S3Storage
> rsm.config.storage.s3.bucket.name=
> rsm.config.storage.s3.region=us-west-1
> rsm.config.storage.aws.secret.access.key=
> rsm.config.storage.aws.access.key.id=
> rsm.config.chunk.size=8388608
> remote.log.storage.manager.class.path=/home/admin/core-0.0.1-SNAPSHOT/:/home/admin/s3-0.0.1-SNAPSHOT/
> remote.log.storage.manager.class.name=io.aiven.kafka.tieredstorage.RemoteStorageManager
> remote.log.metadata.manager.class.name=org.apache.kafka.server.log.remote.metadata.storage.TopicBasedRemoteLogMetadataManager
> remote.log.metadata.manager.listener.name=PLAINTEXT
> rsm.config.upload.rate.limit.bytes.per.second=31457280
> {code}
>  replica config:
> {code:java}
> num.partitions=3
> default.replication.factor=2
> delete.topic.enable=true
> auto.create.topics.enable=false
> num.recovery.threads.per.data.dir=1
> offsets.topic.replication.factor=3
> transaction.state.log.replication.factor=2
> transaction.state.log.min.isr=1
> offsets.retention.minutes=4320
> log.roll.ms=8640
> log.local.retention.ms=60
> log.segment.bytes=536870912
> num.replica.fetchers=1
> log.retention.ms=1581120
> remote.log.manager.thread.pool.size=4
> remote.log.reader.threads=4
> remote.log.metadata.topic.replication.factor=3
> remote.log.storage.system.enable=true
> #remote.log.metadata.topic.retention.ms=18000
> rsm.config.fetch.chunk.cache.class=io.aiven.kafka.tieredstorage.fetch.cache.DiskChunkCache
> rsm.config.fetch.chunk.cache.path=/data01/kafka-tiered-storage-cache
> # Pick some cache size, 16 GiB here:
> rsm.config.fetch.chunk.cache.size=34359738368
> rsm.config.fetch.chunk.cache.retention.ms=120
> # # # Prefetching size, 16 MiB here:
> rsm.config.fetch.chunk.cache.prefetch.max.size=33554432
> rsm.config.storage.backend.class=io.aiven.kafka.tieredstorage.storage.s3.S3Storage
> rsm.config.storage.s3.bucket.name=
> rsm.config.storage.s3.region=us-west-1
> rsm.config.storage.aws.secret.access.key=
> rsm.config.storage.aws.access.key.id=
> rsm.config.chunk.size=8388608
> remote.log.storage.manager.class.path=/home/admin/core-0.0.1-SNAPSHOT/*:/home/admin/s3-0.0.1-SNAPSHOT/*
> remote.log.storage.manager.class.name=io.aiven.kafka.tieredstorage.RemoteStorageManager
> remote.log.metadata.manager.class.name=org.apache.kafka.server.log.remote.metadata.st

[jira] [Commented] (KAFKA-17020) After enabling tiered storage, occasional residual logs are left in the replica

2024-06-24 Thread Jianbin Chen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-17020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17859617#comment-17859617
 ] 

Jianbin Chen commented on KAFKA-17020:
--

[~showuon] The issue is that after the logs in local storage are uploaded to 
remote storage and deleted on the leader side, the replicas fail to delete the 
local logs.

> After enabling tiered storage, occasional residual logs are left in the 
> replica
> ---
>
> Key: KAFKA-17020
> URL: https://issues.apache.org/jira/browse/KAFKA-17020
> Project: Kafka
>  Issue Type: Wish
>Affects Versions: 3.7.0
>Reporter: Jianbin Chen
>Priority: Major
> Attachments: image-2024-06-22-21-45-43-815.png, 
> image-2024-06-22-21-46-12-371.png, image-2024-06-22-21-46-26-530.png, 
> image-2024-06-22-21-46-42-917.png, image-2024-06-22-21-47-00-230.png
>
>
> After enabling tiered storage, occasional residual logs are left in the 
> replica.
> Based on the observed phenomenon, the index values of the rolled-out logs 
> generated by the replica and the leader are not the same. As a result, the 
> logs uploaded to S3 at the same time do not include the corresponding log 
> files on the replica side, making it impossible to delete the local logs.
> !image-2024-06-22-21-45-43-815.png!
> leader config:
> {code:java}
> num.partitions=3
> default.replication.factor=2
> delete.topic.enable=true
> auto.create.topics.enable=false
> num.recovery.threads.per.data.dir=1
> offsets.topic.replication.factor=3
> transaction.state.log.replication.factor=2
> transaction.state.log.min.isr=1
> offsets.retention.minutes=4320
> log.roll.ms=8640
> log.local.retention.ms=60
> log.segment.bytes=536870912
> num.replica.fetchers=1
> log.retention.ms=1581120
> remote.log.manager.thread.pool.size=4
> remote.log.reader.threads=4
> remote.log.metadata.topic.replication.factor=3
> remote.log.storage.system.enable=true
> remote.log.metadata.topic.retention.ms=18000
> rsm.config.fetch.chunk.cache.class=io.aiven.kafka.tieredstorage.fetch.cache.DiskChunkCache
> rsm.config.fetch.chunk.cache.path=/data01/kafka-tiered-storage-cache
> Pick some cache size, 16 GiB here:
> rsm.config.fetch.chunk.cache.size=34359738368
> rsm.config.fetch.chunk.cache.retention.ms=120
> # # Prefetching size, 16 MiB here:
> rsm.config.fetch.chunk.cache.prefetch.max.size=33554432
> rsm.config.storage.backend.class=io.aiven.kafka.tieredstorage.storage.s3.S3Storage
> rsm.config.storage.s3.bucket.name=
> rsm.config.storage.s3.region=us-west-1
> rsm.config.storage.aws.secret.access.key=
> rsm.config.storage.aws.access.key.id=
> rsm.config.chunk.size=8388608
> remote.log.storage.manager.class.path=/home/admin/core-0.0.1-SNAPSHOT/:/home/admin/s3-0.0.1-SNAPSHOT/
> remote.log.storage.manager.class.name=io.aiven.kafka.tieredstorage.RemoteStorageManager
> remote.log.metadata.manager.class.name=org.apache.kafka.server.log.remote.metadata.storage.TopicBasedRemoteLogMetadataManager
> remote.log.metadata.manager.listener.name=PLAINTEXT
> rsm.config.upload.rate.limit.bytes.per.second=31457280
> {code}
>  replica config:
> {code:java}
> num.partitions=3
> default.replication.factor=2
> delete.topic.enable=true
> auto.create.topics.enable=false
> num.recovery.threads.per.data.dir=1
> offsets.topic.replication.factor=3
> transaction.state.log.replication.factor=2
> transaction.state.log.min.isr=1
> offsets.retention.minutes=4320
> log.roll.ms=8640
> log.local.retention.ms=60
> log.segment.bytes=536870912
> num.replica.fetchers=1
> log.retention.ms=1581120
> remote.log.manager.thread.pool.size=4
> remote.log.reader.threads=4
> remote.log.metadata.topic.replication.factor=3
> remote.log.storage.system.enable=true
> #remote.log.metadata.topic.retention.ms=18000
> rsm.config.fetch.chunk.cache.class=io.aiven.kafka.tieredstorage.fetch.cache.DiskChunkCache
> rsm.config.fetch.chunk.cache.path=/data01/kafka-tiered-storage-cache
> # Pick some cache size, 16 GiB here:
> rsm.config.fetch.chunk.cache.size=34359738368
> rsm.config.fetch.chunk.cache.retention.ms=120
> # # # Prefetching size, 16 MiB here:
> rsm.config.fetch.chunk.cache.prefetch.max.size=33554432
> rsm.config.storage.backend.class=io.aiven.kafka.tieredstorage.storage.s3.S3Storage
> rsm.config.storage.s3.bucket.name=
> rsm.config.storage.s3.region=us-west-1
> rsm.config.storage.aws.secret.access.key=
> rsm.config.storage.aws.access.key.id=
> rsm.config.chunk.size=8388608
> remote.log.storage.manager.class.path=/home/admin/core-0.0.1-SNAPSHOT/*:/home/admin/s3-0.0.1-SNAPSHOT/*
> remote.log.storage.manager.class.name=io.aiven.kafka.tieredstorage.RemoteStorageManager
> remote.log.metadata.manager.class.name=org.apache.kafka.server.log.remote.metadata.storage.TopicBasedRemoteLogMetadataManager
> remote.log.metadata.

[jira] [Updated] (KAFKA-17020) After enabling tiered storage, occasional residual logs are left in the replica

2024-06-22 Thread Jianbin Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-17020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jianbin Chen updated KAFKA-17020:
-
Attachment: image-2024-06-22-21-47-00-230.png
image-2024-06-22-21-46-42-917.png
image-2024-06-22-21-46-26-530.png
image-2024-06-22-21-46-12-371.png
image-2024-06-22-21-45-43-815.png
External issue URL: 
https://github.com/Aiven-Open/tiered-storage-for-apache-kafka/issues/562
   Description: 
After enabling tiered storage, occasional residual logs are left in the replica.
Based on the observed phenomenon, the index values of the rolled-out logs 
generated by the replica and the leader are not the same. As a result, the logs 
uploaded to S3 at the same time do not include the corresponding log files on 
the replica side, making it impossible to delete the local logs.
!image-2024-06-22-21-45-43-815.png!
leader config:
{code:java}
num.partitions=3
default.replication.factor=2
delete.topic.enable=true
auto.create.topics.enable=false
num.recovery.threads.per.data.dir=1
offsets.topic.replication.factor=3
transaction.state.log.replication.factor=2
transaction.state.log.min.isr=1
offsets.retention.minutes=4320
log.roll.ms=8640
log.local.retention.ms=60
log.segment.bytes=536870912
num.replica.fetchers=1
log.retention.ms=1581120
remote.log.manager.thread.pool.size=4
remote.log.reader.threads=4
remote.log.metadata.topic.replication.factor=3
remote.log.storage.system.enable=true
remote.log.metadata.topic.retention.ms=18000
rsm.config.fetch.chunk.cache.class=io.aiven.kafka.tieredstorage.fetch.cache.DiskChunkCache
rsm.config.fetch.chunk.cache.path=/data01/kafka-tiered-storage-cache

Pick some cache size, 16 GiB here:
rsm.config.fetch.chunk.cache.size=34359738368
rsm.config.fetch.chunk.cache.retention.ms=120
# # Prefetching size, 16 MiB here:
rsm.config.fetch.chunk.cache.prefetch.max.size=33554432
rsm.config.storage.backend.class=io.aiven.kafka.tieredstorage.storage.s3.S3Storage
rsm.config.storage.s3.bucket.name=
rsm.config.storage.s3.region=us-west-1
rsm.config.storage.aws.secret.access.key=
rsm.config.storage.aws.access.key.id=
rsm.config.chunk.size=8388608
remote.log.storage.manager.class.path=/home/admin/core-0.0.1-SNAPSHOT/:/home/admin/s3-0.0.1-SNAPSHOT/
remote.log.storage.manager.class.name=io.aiven.kafka.tieredstorage.RemoteStorageManager
remote.log.metadata.manager.class.name=org.apache.kafka.server.log.remote.metadata.storage.TopicBasedRemoteLogMetadataManager
remote.log.metadata.manager.listener.name=PLAINTEXT
rsm.config.upload.rate.limit.bytes.per.second=31457280
{code}
 replica config:
{code:java}
num.partitions=3
default.replication.factor=2
delete.topic.enable=true
auto.create.topics.enable=false
num.recovery.threads.per.data.dir=1
offsets.topic.replication.factor=3
transaction.state.log.replication.factor=2
transaction.state.log.min.isr=1
offsets.retention.minutes=4320
log.roll.ms=8640
log.local.retention.ms=60
log.segment.bytes=536870912
num.replica.fetchers=1
log.retention.ms=1581120
remote.log.manager.thread.pool.size=4
remote.log.reader.threads=4
remote.log.metadata.topic.replication.factor=3
remote.log.storage.system.enable=true
#remote.log.metadata.topic.retention.ms=18000
rsm.config.fetch.chunk.cache.class=io.aiven.kafka.tieredstorage.fetch.cache.DiskChunkCache
rsm.config.fetch.chunk.cache.path=/data01/kafka-tiered-storage-cache
# Pick some cache size, 16 GiB here:
rsm.config.fetch.chunk.cache.size=34359738368
rsm.config.fetch.chunk.cache.retention.ms=120
# # # Prefetching size, 16 MiB here:
rsm.config.fetch.chunk.cache.prefetch.max.size=33554432
rsm.config.storage.backend.class=io.aiven.kafka.tieredstorage.storage.s3.S3Storage
rsm.config.storage.s3.bucket.name=
rsm.config.storage.s3.region=us-west-1
rsm.config.storage.aws.secret.access.key=
rsm.config.storage.aws.access.key.id=
rsm.config.chunk.size=8388608
remote.log.storage.manager.class.path=/home/admin/core-0.0.1-SNAPSHOT/*:/home/admin/s3-0.0.1-SNAPSHOT/*
remote.log.storage.manager.class.name=io.aiven.kafka.tieredstorage.RemoteStorageManager
remote.log.metadata.manager.class.name=org.apache.kafka.server.log.remote.metadata.storage.TopicBasedRemoteLogMetadataManager
remote.log.metadata.manager.listener.name=PLAINTEXT
rsm.config.upload.rate.limit.bytes.per.second=31457280 {code}
topic config:
{code:java}
Dynamic configs for topic xx are:
local.retention.ms=60 sensitive=false 
synonyms={DYNAMIC_TOPIC_CONFIG:local.retention.ms=60, 
STATIC_BROKER_CONFIG:log.local.retention.ms=60, 
DEFAULT_CONFIG:log.local.retention.ms=-2}
remote.storage.enable=true sensitive=false 
synonyms={DYNAMIC_TOPIC_CONFIG:remote.storage.enable=true}
retention.ms=1581120 sensitive=false 
synonyms={DYNAMIC_TOPIC_CONFIG:retention.ms=1581120, 
STATIC_BROKER_CONFIG:log.retention.ms=1581120, 
DEFAULT_CON

[jira] [Updated] (KAFKA-17020) After enabling tiered storage, occasional residual logs are left in the replica

2024-06-21 Thread Jianbin Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-17020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jianbin Chen updated KAFKA-17020:
-
Description: 
After enabling tiered storage, occasional residual logs are left in the replica.
Based on the observed phenomenon, the index values of the rolled-out logs 
generated by the replica and the leader are not the same. As a result, the logs 
uploaded to S3 at the same time do not include the corresponding log files on 
the replica side, making it impossible to delete the local logs.
[!https://private-user-images.githubusercontent.com/19943636/341939158-d0b87a7d-aca1-4700-b3e1-fceff0530c79.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTkwMzY1OTIsIm5iZiI6MTcxOTAzNjI5MiwicGF0aCI6Ii8xOTk0MzYzNi8zNDE5MzkxNTgtZDBiODdhN2QtYWNhMS00NzAwLWIzZTEtZmNlZmYwNTMwYzc5LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA2MjIlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNjIyVDA2MDQ1MlomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWY3ZDQ2OGIxMmE3OGI2Njc2YzdkNzkwMzlhNmM5MzAxNjY0MWZiMzA2ZjgwNzgzM2JlYTMxMzM4Njk1NGI5MDYmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.Sdsvwn0dUi_p1dG0W_AvQY6Iqeimy_UZ8VldKUS1Q0E!|https://private-user-images.githubusercontent.com/19943636/341939158-d0b87a7d-aca1-4700-b3e1-fceff0530c79.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTkwMzY1OTIsIm5iZiI6MTcxOTAzNjI5MiwicGF0aCI6Ii8xOTk0MzYzNi8zNDE5MzkxNTgtZDBiODdhN2QtYWNhMS00NzAwLWIzZTEtZmNlZmYwNTMwYzc5LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA2MjIlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNjIyVDA2MDQ1MlomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWY3ZDQ2OGIxMmE3OGI2Njc2YzdkNzkwMzlhNmM5MzAxNjY0MWZiMzA2ZjgwNzgzM2JlYTMxMzM4Njk1NGI5MDYmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.Sdsvwn0dUi_p1dG0W_AvQY6Iqeimy_UZ8VldKUS1Q0E]
leader config:
{code:java}
num.partitions=3
default.replication.factor=2
delete.topic.enable=true
auto.create.topics.enable=false
num.recovery.threads.per.data.dir=1
offsets.topic.replication.factor=3
transaction.state.log.replication.factor=2
transaction.state.log.min.isr=1
offsets.retention.minutes=4320
log.roll.ms=8640
log.local.retention.ms=60
log.segment.bytes=536870912
num.replica.fetchers=1
log.retention.ms=1581120
remote.log.manager.thread.pool.size=4
remote.log.reader.threads=4
remote.log.metadata.topic.replication.factor=3
remote.log.storage.system.enable=true
remote.log.metadata.topic.retention.ms=18000
rsm.config.fetch.chunk.cache.class=io.aiven.kafka.tieredstorage.fetch.cache.DiskChunkCache
rsm.config.fetch.chunk.cache.path=/data01/kafka-tiered-storage-cache

Pick some cache size, 16 GiB here:
rsm.config.fetch.chunk.cache.size=34359738368
rsm.config.fetch.chunk.cache.retention.ms=120
# # Prefetching size, 16 MiB here:
rsm.config.fetch.chunk.cache.prefetch.max.size=33554432
rsm.config.storage.backend.class=io.aiven.kafka.tieredstorage.storage.s3.S3Storage
rsm.config.storage.s3.bucket.name=
rsm.config.storage.s3.region=us-west-1
rsm.config.storage.aws.secret.access.key=
rsm.config.storage.aws.access.key.id=
rsm.config.chunk.size=8388608
remote.log.storage.manager.class.path=/home/admin/core-0.0.1-SNAPSHOT/:/home/admin/s3-0.0.1-SNAPSHOT/
remote.log.storage.manager.class.name=io.aiven.kafka.tieredstorage.RemoteStorageManager
remote.log.metadata.manager.class.name=org.apache.kafka.server.log.remote.metadata.storage.TopicBasedRemoteLogMetadataManager
remote.log.metadata.manager.listener.name=PLAINTEXT
rsm.config.upload.rate.limit.bytes.per.second=31457280
{code}
 replica config:
{code:java}
num.partitions=3
default.replication.factor=2
delete.topic.enable=true
auto.create.topics.enable=false
num.recovery.threads.per.data.dir=1
offsets.topic.replication.factor=3
transaction.state.log.replication.factor=2
transaction.state.log.min.isr=1
offsets.retention.minutes=4320
log.roll.ms=8640
log.local.retention.ms=60
log.segment.bytes=536870912
num.replica.fetchers=1
log.retention.ms=1581120
remote.log.manager.thread.pool.size=4
remote.log.reader.threads=4
remote.log.metadata.topic.replication.factor=3
remote.log.storage.system.enable=true
#remote.log.metadata.topic.retention.ms=18000
rsm.config.fetch.chunk.cache.class=io.aiven.kafka.tieredstorage.fetch.cache.DiskChunkCache
rsm.config.fetch.chunk.cache.path=/data01/kafka-tiered-storage-cache
# Pick some cache size, 16 GiB here:
rsm.config.fetch.chunk.cache.size=34359738368
rsm.config.fetch.chunk.cache.retention.ms=120
# # # Prefetching size, 16 MiB here:
rsm.config.fetch.chunk.cache.prefetch.max.size=33554432
rsm.config.stor

[jira] [Updated] (KAFKA-17020) After enabling tiered storage, occasional residual logs are left in the replica

2024-06-21 Thread Jianbin Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-17020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jianbin Chen updated KAFKA-17020:
-
Description: 
After enabling tiered storage, occasional residual logs are left in the replica.
Based on the observed phenomenon, the index values of the rolled-out logs 
generated by the replica and the leader are not the same. As a result, the logs 
uploaded to S3 at the same time do not include the corresponding log files on 
the replica side, making it impossible to delete the local logs.
[!https://private-user-images.githubusercontent.com/19943636/341939158-d0b87a7d-aca1-4700-b3e1-fceff0530c79.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTkwMzY1OTIsIm5iZiI6MTcxOTAzNjI5MiwicGF0aCI6Ii8xOTk0MzYzNi8zNDE5MzkxNTgtZDBiODdhN2QtYWNhMS00NzAwLWIzZTEtZmNlZmYwNTMwYzc5LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA2MjIlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNjIyVDA2MDQ1MlomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWY3ZDQ2OGIxMmE3OGI2Njc2YzdkNzkwMzlhNmM5MzAxNjY0MWZiMzA2ZjgwNzgzM2JlYTMxMzM4Njk1NGI5MDYmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.Sdsvwn0dUi_p1dG0W_AvQY6Iqeimy_UZ8VldKUS1Q0E!|https://private-user-images.githubusercontent.com/19943636/341939158-d0b87a7d-aca1-4700-b3e1-fceff0530c79.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTkwMzY1OTIsIm5iZiI6MTcxOTAzNjI5MiwicGF0aCI6Ii8xOTk0MzYzNi8zNDE5MzkxNTgtZDBiODdhN2QtYWNhMS00NzAwLWIzZTEtZmNlZmYwNTMwYzc5LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA2MjIlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNjIyVDA2MDQ1MlomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWY3ZDQ2OGIxMmE3OGI2Njc2YzdkNzkwMzlhNmM5MzAxNjY0MWZiMzA2ZjgwNzgzM2JlYTMxMzM4Njk1NGI5MDYmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.Sdsvwn0dUi_p1dG0W_AvQY6Iqeimy_UZ8VldKUS1Q0E]
leader config:
num.partitions=3
default.replication.factor=2
delete.topic.enable=true
auto.create.topics.enable=false
num.recovery.threads.per.data.dir=1
offsets.topic.replication.factor=3
transaction.state.log.replication.factor=2
transaction.state.log.min.isr=1
offsets.retention.minutes=4320
log.roll.ms=8640
log.local.retention.ms=60
log.segment.bytes=536870912
num.replica.fetchers=1
log.retention.ms=1581120
remote.log.manager.thread.pool.size=4
remote.log.reader.threads=4
remote.log.metadata.topic.replication.factor=3
remote.log.storage.system.enable=true
remote.log.metadata.topic.retention.ms=18000
rsm.config.fetch.chunk.cache.class=io.aiven.kafka.tieredstorage.fetch.cache.DiskChunkCache
rsm.config.fetch.chunk.cache.path=/data01/kafka-tiered-storage-cache
 # Pick some cache size, 16 GiB here:
rsm.config.fetch.chunk.cache.size=34359738368
rsm.config.fetch.chunk.cache.retention.ms=120
 # # # Prefetching size, 16 MiB here:
rsm.config.fetch.chunk.cache.prefetch.max.size=33554432
rsm.config.storage.backend.class=io.aiven.kafka.tieredstorage.storage.s3.S3Storage
rsm.config.storage.s3.bucket.name=
rsm.config.storage.s3.region=us-west-1
rsm.config.storage.aws.secret.access.key=
rsm.config.storage.aws.access.key.id=
rsm.config.chunk.size=8388608
remote.log.storage.manager.class.path=/home/admin/core-0.0.1-SNAPSHOT/{*}:/home/admin/s3-0.0.1-SNAPSHOT/{*}
remote.log.storage.manager.class.name=io.aiven.kafka.tieredstorage.RemoteStorageManager
remote.log.metadata.manager.class.name=org.apache.kafka.server.log.remote.metadata.storage.TopicBasedRemoteLogMetadataManager
remote.log.metadata.manager.listener.name=PLAINTEXT
rsm.config.upload.rate.limit.bytes.per.second=31457280
replica config:
num.partitions=3
default.replication.factor=2
delete.topic.enable=true
auto.create.topics.enable=false
num.recovery.threads.per.data.dir=1
offsets.topic.replication.factor=3
transaction.state.log.replication.factor=2
transaction.state.log.min.isr=1
offsets.retention.minutes=4320
log.roll.ms=8640
log.local.retention.ms=60
log.segment.bytes=536870912
num.replica.fetchers=1
log.retention.ms=1581120
remote.log.manager.thread.pool.size=4
remote.log.reader.threads=4
remote.log.metadata.topic.replication.factor=3
remote.log.storage.system.enable=true
#remote.log.metadata.topic.retention.ms=18000
rsm.config.fetch.chunk.cache.class=io.aiven.kafka.tieredstorage.fetch.cache.DiskChunkCache
rsm.config.fetch.chunk.cache.path=/data01/kafka-tiered-storage-cache
 # Pick some cache size, 16 GiB here:
rsm.config.fetch.chunk.cache.size=34359738368
rsm.config.fetch.chunk.cache.retention.ms=120
 # # # Prefetching size, 16 MiB here:
rsm.config.fetch.chunk.cache.prefetch.max.size=33554432
rsm.config.storage.backend.class=i

[jira] (KAFKA-17020) After enabling tiered storage, occasional residual logs are left in the replica

2024-06-21 Thread Jianbin Chen (Jira)


[ https://issues.apache.org/jira/browse/KAFKA-17020 ]


Jianbin Chen deleted comment on KAFKA-17020:
--

was (Author: jianbin):
Restarting does not resolve this issue. The only solution is to delete the log 
folder corresponding to the replica where the log segment anomaly occurred and 
then resynchronize from the leader.
![image](https://github.com/Aiven-Open/tiered-storage-for-apache-kafka/assets/19943636/7256c156-6e90-4799-b0cf-a48c247c5b51)

> After enabling tiered storage, occasional residual logs are left in the 
> replica
> ---
>
> Key: KAFKA-17020
> URL: https://issues.apache.org/jira/browse/KAFKA-17020
> Project: Kafka
>  Issue Type: Wish
>Affects Versions: 3.7.0
>Reporter: Jianbin Chen
>Priority: Major
>
> After enabling tiered storage, occasional residual logs are left in the 
> replica.
> Based on the observed phenomenon, the index values of the rolled-out logs 
> generated by the replica and the leader are not the same. As a result, the 
> logs uploaded to S3 at the same time do not include the corresponding log 
> files on the replica side, making it impossible to delete the local logs.
> [!https://private-user-images.githubusercontent.com/19943636/341939158-d0b87a7d-aca1-4700-b3e1-fceff0530c79.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTkwMzY1OTIsIm5iZiI6MTcxOTAzNjI5MiwicGF0aCI6Ii8xOTk0MzYzNi8zNDE5MzkxNTgtZDBiODdhN2QtYWNhMS00NzAwLWIzZTEtZmNlZmYwNTMwYzc5LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA2MjIlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNjIyVDA2MDQ1MlomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWY3ZDQ2OGIxMmE3OGI2Njc2YzdkNzkwMzlhNmM5MzAxNjY0MWZiMzA2ZjgwNzgzM2JlYTMxMzM4Njk1NGI5MDYmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.Sdsvwn0dUi_p1dG0W_AvQY6Iqeimy_UZ8VldKUS1Q0E!|https://private-user-images.githubusercontent.com/19943636/341939158-d0b87a7d-aca1-4700-b3e1-fceff0530c79.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTkwMzY1OTIsIm5iZiI6MTcxOTAzNjI5MiwicGF0aCI6Ii8xOTk0MzYzNi8zNDE5MzkxNTgtZDBiODdhN2QtYWNhMS00NzAwLWIzZTEtZmNlZmYwNTMwYzc5LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA2MjIlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNjIyVDA2MDQ1MlomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWY3ZDQ2OGIxMmE3OGI2Njc2YzdkNzkwMzlhNmM5MzAxNjY0MWZiMzA2ZjgwNzgzM2JlYTMxMzM4Njk1NGI5MDYmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.Sdsvwn0dUi_p1dG0W_AvQY6Iqeimy_UZ8VldKUS1Q0E]
> leader config:
> num.partitions=3
> default.replication.factor=2
> delete.topic.enable=true
> auto.create.topics.enable=false
> num.recovery.threads.per.data.dir=1
> offsets.topic.replication.factor=3
> transaction.state.log.replication.factor=2
> transaction.state.log.min.isr=1
> offsets.retention.minutes=4320
> log.roll.ms=8640
> log.local.retention.ms=60
> log.segment.bytes=536870912
> num.replica.fetchers=1
> log.retention.ms=1581120
> remote.log.manager.thread.pool.size=4
> remote.log.reader.threads=4
> remote.log.metadata.topic.replication.factor=3
> remote.log.storage.system.enable=true
> remote.log.metadata.topic.retention.ms=18000
> rsm.config.fetch.chunk.cache.class=io.aiven.kafka.tieredstorage.fetch.cache.DiskChunkCache
> rsm.config.fetch.chunk.cache.path=/data01/kafka-tiered-storage-cache
>  # Pick some cache size, 32 GiB here:
> rsm.config.fetch.chunk.cache.size=34359738368
> rsm.config.fetch.chunk.cache.retention.ms=120
>  # Prefetching size, 32 MiB here:
> rsm.config.fetch.chunk.cache.prefetch.max.size=33554432
> rsm.config.storage.backend.class=io.aiven.kafka.tieredstorage.storage.s3.S3Storage
> rsm.config.storage.s3.bucket.name=
> rsm.config.storage.s3.region=us-west-1
> rsm.config.storage.aws.secret.access.key=
> rsm.config.storage.aws.access.key.id=
> rsm.config.chunk.size=8388608
> remote.log.storage.manager.class.path=/home/admin/core-0.0.1-SNAPSHOT/{*}:/home/admin/s3-0.0.1-SNAPSHOT/{*}
> remote.log.storage.manager.class.name=io.aiven.kafka.tieredstorage.RemoteStorageManager
> remote.log.metadata.manager.class.name=org.apache.kafka.server.log.remote.metadata.storage.TopicBasedRemoteLogMetadataManager
> remote.log.metadata.manager.listener.name=PLAINTEXT
> rsm.config.upload.rate.limit.bytes.per.second=31457280
> replica config:
> num.partitions=3
> default.replication.factor=2
> delete.topic.enable=true
> auto.create.topics.enable=false
> num.recovery.threads.per.data.dir=1
> offsets.topic.replication.factor=3
> transaction.state.log.replication.facto

[jira] [Updated] (KAFKA-17020) After enabling tiered storage, occasional residual logs are left in the replica

2024-06-21 Thread Jianbin Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-17020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jianbin Chen updated KAFKA-17020:
-
Description: 
After enabling tiered storage, residual log segments are occasionally left on the replica.
Based on what we observed, the base offsets of the rolled log segments generated by the 
replica and the leader are not the same. As a result, the segments uploaded to S3 do not 
match the corresponding log files on the replica side, making it impossible to delete the 
replica's local logs.
[!https://private-user-images.githubusercontent.com/19943636/341939158-d0b87a7d-aca1-4700-b3e1-fceff0530c79.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTkwMzY1OTIsIm5iZiI6MTcxOTAzNjI5MiwicGF0aCI6Ii8xOTk0MzYzNi8zNDE5MzkxNTgtZDBiODdhN2QtYWNhMS00NzAwLWIzZTEtZmNlZmYwNTMwYzc5LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA2MjIlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNjIyVDA2MDQ1MlomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWY3ZDQ2OGIxMmE3OGI2Njc2YzdkNzkwMzlhNmM5MzAxNjY0MWZiMzA2ZjgwNzgzM2JlYTMxMzM4Njk1NGI5MDYmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.Sdsvwn0dUi_p1dG0W_AvQY6Iqeimy_UZ8VldKUS1Q0E!|https://private-user-images.githubusercontent.com/19943636/341939158-d0b87a7d-aca1-4700-b3e1-fceff0530c79.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTkwMzY1OTIsIm5iZiI6MTcxOTAzNjI5MiwicGF0aCI6Ii8xOTk0MzYzNi8zNDE5MzkxNTgtZDBiODdhN2QtYWNhMS00NzAwLWIzZTEtZmNlZmYwNTMwYzc5LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA2MjIlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNjIyVDA2MDQ1MlomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWY3ZDQ2OGIxMmE3OGI2Njc2YzdkNzkwMzlhNmM5MzAxNjY0MWZiMzA2ZjgwNzgzM2JlYTMxMzM4Njk1NGI5MDYmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.Sdsvwn0dUi_p1dG0W_AvQY6Iqeimy_UZ8VldKUS1Q0E]
leader config:
num.partitions=3
default.replication.factor=2
delete.topic.enable=true
auto.create.topics.enable=false
num.recovery.threads.per.data.dir=1
offsets.topic.replication.factor=3
transaction.state.log.replication.factor=2
transaction.state.log.min.isr=1
offsets.retention.minutes=4320
log.roll.ms=8640
log.local.retention.ms=60
log.segment.bytes=536870912
num.replica.fetchers=1
log.retention.ms=1581120
remote.log.manager.thread.pool.size=4
remote.log.reader.threads=4
remote.log.metadata.topic.replication.factor=3
remote.log.storage.system.enable=true
remote.log.metadata.topic.retention.ms=18000
rsm.config.fetch.chunk.cache.class=io.aiven.kafka.tieredstorage.fetch.cache.DiskChunkCache
rsm.config.fetch.chunk.cache.path=/data01/kafka-tiered-storage-cache
 # Pick some cache size, 32 GiB here:
rsm.config.fetch.chunk.cache.size=34359738368
rsm.config.fetch.chunk.cache.retention.ms=120
 # Prefetching size, 32 MiB here:
rsm.config.fetch.chunk.cache.prefetch.max.size=33554432
rsm.config.storage.backend.class=io.aiven.kafka.tieredstorage.storage.s3.S3Storage
rsm.config.storage.s3.bucket.name=
rsm.config.storage.s3.region=us-west-1
rsm.config.storage.aws.secret.access.key=
rsm.config.storage.aws.access.key.id=
rsm.config.chunk.size=8388608
remote.log.storage.manager.class.path=/home/admin/core-0.0.1-SNAPSHOT/{*}:/home/admin/s3-0.0.1-SNAPSHOT/{*}
remote.log.storage.manager.class.name=io.aiven.kafka.tieredstorage.RemoteStorageManager
remote.log.metadata.manager.class.name=org.apache.kafka.server.log.remote.metadata.storage.TopicBasedRemoteLogMetadataManager
remote.log.metadata.manager.listener.name=PLAINTEXT
rsm.config.upload.rate.limit.bytes.per.second=31457280
replica config:
num.partitions=3
default.replication.factor=2
delete.topic.enable=true
auto.create.topics.enable=false
num.recovery.threads.per.data.dir=1
offsets.topic.replication.factor=3
transaction.state.log.replication.factor=2
transaction.state.log.min.isr=1
offsets.retention.minutes=4320
log.roll.ms=8640
log.local.retention.ms=60
log.segment.bytes=536870912
num.replica.fetchers=1
log.retention.ms=1581120
remote.log.manager.thread.pool.size=4
remote.log.reader.threads=4
remote.log.metadata.topic.replication.factor=3
remote.log.storage.system.enable=true
#remote.log.metadata.topic.retention.ms=18000
rsm.config.fetch.chunk.cache.class=io.aiven.kafka.tieredstorage.fetch.cache.DiskChunkCache
rsm.config.fetch.chunk.cache.path=/data01/kafka-tiered-storage-cache
 # Pick some cache size, 32 GiB here:
rsm.config.fetch.chunk.cache.size=34359738368
rsm.config.fetch.chunk.cache.retention.ms=120
 # Prefetching size, 32 MiB here:
rsm.config.fetch.chunk.cache.prefetch.max.size=33554432
rsm.config.storage.backend.class=i

[jira] [Commented] (KAFKA-17020) After enabling tiered storage, occasional residual logs are left in the replica

2024-06-21 Thread Jianbin Chen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-17020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17856903#comment-17856903
 ] 

Jianbin Chen commented on KAFKA-17020:
--

Restarting does not resolve this issue. The only solution is to delete the log 
folder corresponding to the replica where the log segment anomaly occurred and 
then resynchronize from the leader.
![image](https://github.com/Aiven-Open/tiered-storage-for-apache-kafka/assets/19943636/7256c156-6e90-4799-b0cf-a48c247c5b51)
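For anyone hitting this, it can help to first confirm the segment-boundary mismatch 
between the leader and the replica before rebuilding the replica. Below is a small 
diagnostic sketch (not part of Kafka or of this issue; the partition path is only an 
example under log.dirs) that prints the base offsets of the local segments, to run on 
both brokers and compare:
{code:java}
import java.io.File;
import java.util.Arrays;

public class ListSegmentBaseOffsets {
    public static void main(String[] args) {
        // Example partition directory under log.dirs; adjust to the affected partition.
        File partitionDir = new File(args.length > 0 ? args[0] : "/data01/kafka-logs/remote-test-0");
        String[] segments = partitionDir.list((dir, name) -> name.endsWith(".log"));
        if (segments == null) {
            System.err.println("Not a partition directory: " + partitionDir);
            return;
        }
        Arrays.sort(segments);
        for (String segment : segments) {
            // A segment file name is its zero-padded base offset, e.g. 00000000000026267689.log
            long baseOffset = Long.parseLong(segment.substring(0, segment.length() - ".log".length()));
            System.out.println(segment + " -> base offset " + baseOffset);
        }
    }
}
{code}
If the printed base offsets differ between the leader and the replica, that matches the 
description below: the segments uploaded from the leader never line up with the replica's 
local files.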

> After enabling tiered storage, occasional residual logs are left in the 
> replica
> ---
>
> Key: KAFKA-17020
> URL: https://issues.apache.org/jira/browse/KAFKA-17020
> Project: Kafka
>  Issue Type: Wish
>Affects Versions: 3.7.0
>Reporter: Jianbin Chen
>Priority: Major
>
> After enabling tiered storage, occasional residual logs are left in the 
> replica.
> Based on the observed phenomenon, the index values of the rolled-out logs 
> generated by the replica and the leader are not the same. As a result, the 
> logs uploaded to S3 at the same time do not include the corresponding log 
> files on the replica side, making it impossible to delete the local logs.
> [!https://private-user-images.githubusercontent.com/19943636/341939158-d0b87a7d-aca1-4700-b3e1-fceff0530c79.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTkwMzY1OTIsIm5iZiI6MTcxOTAzNjI5MiwicGF0aCI6Ii8xOTk0MzYzNi8zNDE5MzkxNTgtZDBiODdhN2QtYWNhMS00NzAwLWIzZTEtZmNlZmYwNTMwYzc5LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA2MjIlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNjIyVDA2MDQ1MlomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWY3ZDQ2OGIxMmE3OGI2Njc2YzdkNzkwMzlhNmM5MzAxNjY0MWZiMzA2ZjgwNzgzM2JlYTMxMzM4Njk1NGI5MDYmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.Sdsvwn0dUi_p1dG0W_AvQY6Iqeimy_UZ8VldKUS1Q0E!|https://private-user-images.githubusercontent.com/19943636/341939158-d0b87a7d-aca1-4700-b3e1-fceff0530c79.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTkwMzY1OTIsIm5iZiI6MTcxOTAzNjI5MiwicGF0aCI6Ii8xOTk0MzYzNi8zNDE5MzkxNTgtZDBiODdhN2QtYWNhMS00NzAwLWIzZTEtZmNlZmYwNTMwYzc5LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA2MjIlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNjIyVDA2MDQ1MlomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWY3ZDQ2OGIxMmE3OGI2Njc2YzdkNzkwMzlhNmM5MzAxNjY0MWZiMzA2ZjgwNzgzM2JlYTMxMzM4Njk1NGI5MDYmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.Sdsvwn0dUi_p1dG0W_AvQY6Iqeimy_UZ8VldKUS1Q0E]
> leader config:
> num.partitions=3
> default.replication.factor=2
> delete.topic.enable=true
> auto.create.topics.enable=false
> num.recovery.threads.per.data.dir=1
> offsets.topic.replication.factor=3
> transaction.state.log.replication.factor=2
> transaction.state.log.min.isr=1
> offsets.retention.minutes=4320
> log.roll.ms=8640
> log.local.retention.ms=60
> log.segment.bytes=536870912
> num.replica.fetchers=1
> log.retention.ms=1581120
> remote.log.manager.thread.pool.size=4
> remote.log.reader.threads=4
> remote.log.metadata.topic.replication.factor=3
> remote.log.storage.system.enable=true
> remote.log.metadata.topic.retention.ms=18000
> rsm.config.fetch.chunk.cache.class=io.aiven.kafka.tieredstorage.fetch.cache.DiskChunkCache
> rsm.config.fetch.chunk.cache.path=/data01/kafka-tiered-storage-cache
> # Pick some cache size, 32 GiB here:
> rsm.config.fetch.chunk.cache.size=34359738368
> rsm.config.fetch.chunk.cache.retention.ms=120
> # Prefetching size, 32 MiB here:
> rsm.config.fetch.chunk.cache.prefetch.max.size=33554432
> rsm.config.storage.backend.class=io.aiven.kafka.tieredstorage.storage.s3.S3Storage
> rsm.config.storage.s3.bucket.name=
> rsm.config.storage.s3.region=us-west-1
> rsm.config.storage.aws.secret.access.key=
> rsm.config.storage.aws.access.key.id=
> rsm.config.chunk.size=8388608
> remote.log.storage.manager.class.path=/home/admin/core-0.0.1-SNAPSHOT/*:/home/admin/s3-0.0.1-SNAPSHOT/*
> remote.log.storage.manager.class.name=io.aiven.kafka.tieredstorage.RemoteStorageManager
> remote.log.metadata.manager.class.name=org.apache.kafka.server.log.remote.metadata.storage.TopicBasedRemoteLogMetadataManager
> remote.log.metadata.manager.listener.name=PLAINTEXT
> rsm.config.upload.rate.limit.bytes.per.second=31457280
> replica config:
> num.partitions=3
> default.replication.factor=2
> delete.topic.enable=true
> auto.create.topics.enable=false
> num.recovery.threads.per.data.dir=1
> offs

[jira] [Created] (KAFKA-17020) After enabling tiered storage, occasional residual logs are left in the replica

2024-06-21 Thread Jianbin Chen (Jira)
Jianbin Chen created KAFKA-17020:


 Summary: After enabling tiered storage, occasional residual logs 
are left in the replica
 Key: KAFKA-17020
 URL: https://issues.apache.org/jira/browse/KAFKA-17020
 Project: Kafka
  Issue Type: Wish
Affects Versions: 3.7.0
Reporter: Jianbin Chen


After enabling tiered storage, residual log segments are occasionally left on the replica.
Based on what we observed, the base offsets of the rolled log segments generated by the 
replica and the leader are not the same. As a result, the segments uploaded to S3 do not 
match the corresponding log files on the replica side, making it impossible to delete the 
replica's local logs.
[!https://private-user-images.githubusercontent.com/19943636/341939158-d0b87a7d-aca1-4700-b3e1-fceff0530c79.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTkwMzY1OTIsIm5iZiI6MTcxOTAzNjI5MiwicGF0aCI6Ii8xOTk0MzYzNi8zNDE5MzkxNTgtZDBiODdhN2QtYWNhMS00NzAwLWIzZTEtZmNlZmYwNTMwYzc5LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA2MjIlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNjIyVDA2MDQ1MlomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWY3ZDQ2OGIxMmE3OGI2Njc2YzdkNzkwMzlhNmM5MzAxNjY0MWZiMzA2ZjgwNzgzM2JlYTMxMzM4Njk1NGI5MDYmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.Sdsvwn0dUi_p1dG0W_AvQY6Iqeimy_UZ8VldKUS1Q0E!|https://private-user-images.githubusercontent.com/19943636/341939158-d0b87a7d-aca1-4700-b3e1-fceff0530c79.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MTkwMzY1OTIsIm5iZiI6MTcxOTAzNjI5MiwicGF0aCI6Ii8xOTk0MzYzNi8zNDE5MzkxNTgtZDBiODdhN2QtYWNhMS00NzAwLWIzZTEtZmNlZmYwNTMwYzc5LnBuZz9YLUFtei1BbGdvcml0aG09QVdTNC1ITUFDLVNIQTI1NiZYLUFtei1DcmVkZW50aWFsPUFLSUFWQ09EWUxTQTUzUFFLNFpBJTJGMjAyNDA2MjIlMkZ1cy1lYXN0LTElMkZzMyUyRmF3czRfcmVxdWVzdCZYLUFtei1EYXRlPTIwMjQwNjIyVDA2MDQ1MlomWC1BbXotRXhwaXJlcz0zMDAmWC1BbXotU2lnbmF0dXJlPWY3ZDQ2OGIxMmE3OGI2Njc2YzdkNzkwMzlhNmM5MzAxNjY0MWZiMzA2ZjgwNzgzM2JlYTMxMzM4Njk1NGI5MDYmWC1BbXotU2lnbmVkSGVhZGVycz1ob3N0JmFjdG9yX2lkPTAma2V5X2lkPTAmcmVwb19pZD0wIn0.Sdsvwn0dUi_p1dG0W_AvQY6Iqeimy_UZ8VldKUS1Q0E]
leader config:
num.partitions=3
default.replication.factor=2
delete.topic.enable=true
auto.create.topics.enable=false
num.recovery.threads.per.data.dir=1
offsets.topic.replication.factor=3
transaction.state.log.replication.factor=2
transaction.state.log.min.isr=1
offsets.retention.minutes=4320
log.roll.ms=8640
log.local.retention.ms=60
log.segment.bytes=536870912
num.replica.fetchers=1
log.retention.ms=1581120
remote.log.manager.thread.pool.size=4
remote.log.reader.threads=4
remote.log.metadata.topic.replication.factor=3
remote.log.storage.system.enable=true
remote.log.metadata.topic.retention.ms=18000
rsm.config.fetch.chunk.cache.class=io.aiven.kafka.tieredstorage.fetch.cache.DiskChunkCache
rsm.config.fetch.chunk.cache.path=/data01/kafka-tiered-storage-cache
# Pick some cache size, 32 GiB here:
rsm.config.fetch.chunk.cache.size=34359738368
rsm.config.fetch.chunk.cache.retention.ms=120
# Prefetching size, 32 MiB here:
rsm.config.fetch.chunk.cache.prefetch.max.size=33554432
rsm.config.storage.backend.class=io.aiven.kafka.tieredstorage.storage.s3.S3Storage
rsm.config.storage.s3.bucket.name=
rsm.config.storage.s3.region=us-west-1
rsm.config.storage.aws.secret.access.key=
rsm.config.storage.aws.access.key.id=
rsm.config.chunk.size=8388608
remote.log.storage.manager.class.path=/home/admin/core-0.0.1-SNAPSHOT/*:/home/admin/s3-0.0.1-SNAPSHOT/*
remote.log.storage.manager.class.name=io.aiven.kafka.tieredstorage.RemoteStorageManager
remote.log.metadata.manager.class.name=org.apache.kafka.server.log.remote.metadata.storage.TopicBasedRemoteLogMetadataManager
remote.log.metadata.manager.listener.name=PLAINTEXT
rsm.config.upload.rate.limit.bytes.per.second=31457280
replica config:
num.partitions=3
default.replication.factor=2
delete.topic.enable=true
auto.create.topics.enable=false
num.recovery.threads.per.data.dir=1
offsets.topic.replication.factor=3
transaction.state.log.replication.factor=2
transaction.state.log.min.isr=1
offsets.retention.minutes=4320
log.roll.ms=8640
log.local.retention.ms=60
log.segment.bytes=536870912
num.replica.fetchers=1
log.retention.ms=1581120
remote.log.manager.thread.pool.size=4
remote.log.reader.threads=4
remote.log.metadata.topic.replication.factor=3
remote.log.storage.system.enable=true
#remote.log.metadata.topic.retention.ms=18000
rsm.config.fetch.chunk.cache.class=io.aiven.kafka.tieredstorage.fetch.cache.DiskChunkCache
rsm.config.fetch.chunk.cache.path=/data01/kafka-tiered-storage-cache
# Pick some cache size, 32 GiB here:
rsm.config.fetch.chunk.cache.size=34359738368

[jira] [Updated] (KAFKA-16834) add the reason for the failure of PartitionRegistration#toRecord

2024-05-23 Thread Jianbin Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jianbin Chen updated KAFKA-16834:
-
Description: 
Change the message to the following output, which makes it easier for users to understand 
and identify the cause of the problem.
{code:java}
options.handleLoss("the directory " + (directory == DirectoryId.UNASSIGNED ? 
"unassigned" : "lost")
+ " state of one or more replicas");{code}

  was:
Transform it into the following output, which is easier for users to understand 
and identify the cause of the problem.
{code:java}
options.handleLoss("the directory " + directory + " state of one or more 
replicas");{code}


> add the reason for the failure of PartitionRegistration#toRecord
> 
>
> Key: KAFKA-16834
> URL: https://issues.apache.org/jira/browse/KAFKA-16834
> Project: Kafka
>  Issue Type: Wish
>Affects Versions: 3.7.0
>Reporter: Jianbin Chen
>Priority: Minor
>
> Transform it into the following output, which is easier for users to 
> understand and identify the cause of the problem.
> {code:java}
> options.handleLoss("the directory " + (directory == DirectoryId.UNASSIGNED ? 
> "unassigned" : "lost")
> + " state of one or more replicas");{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-16834) add the reason for the failure of PartitionRegistration#toRecord

2024-05-23 Thread Jianbin Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jianbin Chen updated KAFKA-16834:
-
Summary: add the reason for the failure of PartitionRegistration#toRecord  
(was: add PartitionRegistration#toRecord loss info)

> add the reason for the failure of PartitionRegistration#toRecord
> 
>
> Key: KAFKA-16834
> URL: https://issues.apache.org/jira/browse/KAFKA-16834
> Project: Kafka
>  Issue Type: Wish
>Affects Versions: 3.7.0
>Reporter: Jianbin Chen
>Priority: Minor
>
> Transform it into the following output, which is easier for users to 
> understand and identify the cause of the problem.
> {code:java}
> options.handleLoss("the directory " + directory + " state of one or more 
> replicas");{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16834) add PartitionRegistration#toRecord loss info

2024-05-23 Thread Jianbin Chen (Jira)
Jianbin Chen created KAFKA-16834:


 Summary: add PartitionRegistration#toRecord loss info
 Key: KAFKA-16834
 URL: https://issues.apache.org/jira/browse/KAFKA-16834
 Project: Kafka
  Issue Type: Wish
Affects Versions: 3.7.0
Reporter: Jianbin Chen


Change the message to the following output, which makes it easier for users to understand 
and identify the cause of the problem.
{code:java}
options.handleLoss("the directory " + directory + " state of one or more 
replicas");{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-16583) Update from 3.4.0 to 3.7.0 image write failed in Kraft mode

2024-05-23 Thread Jianbin Chen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17849145#comment-17849145
 ] 

Jianbin Chen commented on KAFKA-16583:
--

I would like to know when this PR can be merged; I am severely affected by this bug!

> Update from 3.4.0 to 3.7.0 image write failed in Kraft mode
> ---
>
> Key: KAFKA-16583
> URL: https://issues.apache.org/jira/browse/KAFKA-16583
> Project: Kafka
>  Issue Type: Bug
>  Components: kraft
>Affects Versions: 3.7.0
>Reporter: HanXu
>Assignee: HanXu
>Priority: Major
>   Original Estimate: 6h
>  Remaining Estimate: 6h
>
> How to reproduce:
> 1. Launch a 3.4.0 controller and a 3.4.0 broker(BrokerA) in Kraft mode;
> 2. Create a topic with 1 partition;
> 3. Launch a 3.4.0 broker(Broker B) in Kraft mode and reassign the step 2 
> partition to Broker B;
> 4. Upgrade Broker B to 3.7.0;
> The Broker B will keep log the following error:
> {code:java}
> [2024-04-18 14:46:54,144] ERROR Encountered metadata loading fault: Unhandled 
> error initializing new publishers 
> (org.apache.kafka.server.fault.LoggingFaultHandler)
> org.apache.kafka.image.writer.UnwritableMetadataException: Metadata has been 
> lost because the following could not be represented in metadata version 
> 3.4-IV0: the directory assignment state of one or more replicas
>   at 
> org.apache.kafka.image.writer.ImageWriterOptions.handleLoss(ImageWriterOptions.java:94)
>   at 
> org.apache.kafka.metadata.PartitionRegistration.toRecord(PartitionRegistration.java:391)
>   at org.apache.kafka.image.TopicImage.write(TopicImage.java:71)
>   at org.apache.kafka.image.TopicsImage.write(TopicsImage.java:84)
>   at org.apache.kafka.image.MetadataImage.write(MetadataImage.java:155)
>   at 
> org.apache.kafka.image.loader.MetadataLoader.initializeNewPublishers(MetadataLoader.java:295)
>   at 
> org.apache.kafka.image.loader.MetadataLoader.lambda$scheduleInitializeNewPublishers$0(MetadataLoader.java:266)
>   at 
> org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:127)
>   at 
> org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:210)
>   at 
> org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:181)
>   at java.base/java.lang.Thread.run(Thread.java:840)
> {code}
> Bug:
>  - When reassigning partition, PartitionRegistration#merge will set the new 
> replicas with UNASSIGNED directory;
>  - But in metadata version 3.4.0 PartitionRegistration#toRecord only allows 
> MIGRATING directory;
> {code:java}
> if (options.metadataVersion().isDirectoryAssignmentSupported()) {
> record.setDirectories(Uuid.toList(directories));
> } else {
> for (Uuid directory : directories) {
> if (!DirectoryId.MIGRATING.equals(directory)) {
> options.handleLoss("the directory assignment state of one 
> or more replicas");
> break;
> }
> }
> }
> {code}
> Solution:
> - PartitionRegistration#toRecord allows both MIGRATING and UNASSIGNED
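As a sketch of the "Solution" bullet above (an assumption about the direction of the fix, 
not the merged change), the version check could simply treat UNASSIGNED the same way as 
MIGRATING:
{code:java}
if (options.metadataVersion().isDirectoryAssignmentSupported()) {
    record.setDirectories(Uuid.toList(directories));
} else {
    for (Uuid directory : directories) {
        // Both MIGRATING and UNASSIGNED carry no real directory assignment,
        // so neither needs to be representable in older metadata versions.
        if (!DirectoryId.MIGRATING.equals(directory)
                && !DirectoryId.UNASSIGNED.equals(directory)) {
            options.handleLoss("the directory assignment state of one or more replicas");
            break;
        }
    }
}
{code}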



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-16662) UnwritableMetadataException: Metadata has been lost

2024-05-21 Thread Jianbin Chen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17848436#comment-17848436
 ] 

Jianbin Chen commented on KAFKA-16662:
--

After I deleted all of __cluster_metadata-0, the problem no longer occurred when 
I started the cluster, but all of my topic information was lost. Fortunately, this 
is just an offline test-environment cluster. Judging from the behavior, it is 
clear that the incompatibility between the 3.5 metadata version and the 3.7 
version caused this problem. This makes me reluctant to attempt a rolling upgrade 
of the cluster. In the past, when using ZooKeeper, upgrading the broker never 
caused similar problems!

> UnwritableMetadataException: Metadata has been lost
> ---
>
> Key: KAFKA-16662
> URL: https://issues.apache.org/jira/browse/KAFKA-16662
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.7.0
> Environment: Docker Image (bitnami/kafka:3.7.0)
> via Docker Compose
>Reporter: Tobias Bohn
>Priority: Major
> Attachments: log.txt
>
>
> Hello,
> First of all: I am new to this Jira and apologize if anything is set or 
> specified incorrectly. Feel free to advise me.
> We currently have an error in our test system, which unfortunately I can't 
> solve, because I couldn't find anything related to it. No solution could be 
> found via the mailing list either.
> The error occurs when we want to start up a node. The node runs using Kraft 
> and is both a controller and a broker. The following error message appears at 
> startup:
> {code:java}
> kafka  | [2024-04-16 06:18:13,707] ERROR Encountered fatal fault: Unhandled 
> error initializing new publishers 
> (org.apache.kafka.server.fault.ProcessTerminatingFaultHandler)
> kafka  | org.apache.kafka.image.writer.UnwritableMetadataException: Metadata 
> has been lost because the following could not be represented in metadata 
> version 3.5-IV2: the directory assignment state of one or more replicas
> kafka  |        at 
> org.apache.kafka.image.writer.ImageWriterOptions.handleLoss(ImageWriterOptions.java:94)
> kafka  |        at 
> org.apache.kafka.metadata.PartitionRegistration.toRecord(PartitionRegistration.java:391)
> kafka  |        at org.apache.kafka.image.TopicImage.write(TopicImage.java:71)
> kafka  |        at 
> org.apache.kafka.image.TopicsImage.write(TopicsImage.java:84)
> kafka  |        at 
> org.apache.kafka.image.MetadataImage.write(MetadataImage.java:155)
> kafka  |        at 
> org.apache.kafka.image.loader.MetadataLoader.initializeNewPublishers(MetadataLoader.java:295)
> kafka  |        at 
> org.apache.kafka.image.loader.MetadataLoader.lambda$scheduleInitializeNewPublishers$0(MetadataLoader.java:266)
> kafka  |        at 
> org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:127)
> kafka  |        at 
> org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:210)
> kafka  |        at 
> org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:181)
> kafka  |        at java.base/java.lang.Thread.run(Thread.java:840)
> kafka exited with code 0 {code}
> We use Docker to operate the cluster. The error occurred while we were trying 
> to restart a node. All other nodes in the cluster are still running correctly.
> If you need further information, please let us know. The complete log is 
> attached to this issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-16662) UnwritableMetadataException: Metadata has been lost

2024-05-21 Thread Jianbin Chen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17848434#comment-17848434
 ] 

Jianbin Chen commented on KAFKA-16662:
--

When I executed ./bin/kafka-features.sh --bootstrap-server 10.58.16.231:9092 
upgrade --metadata 3.7, it continuously output the following exceptions:
{panel:title=My title}
[2024-05-22 11:30:36,491] INFO [UnifiedLog partition=remote-test-5, 
dir=/data01/kafka-logs-351] Incremented log start offset to 26267689 due to 
leader offset increment (kafka.log.UnifiedLog)
[2024-05-22 11:30:36,497] INFO [UnifiedLog partition=remote-test2-0, 
dir=/data01/kafka-logs-351] Incremented log start offset to 3099360 due to 
leader offset increment (kafka.log.UnifiedLog)
[2024-05-22 11:30:37,149] ERROR Failed to propagate directory assignments 
because the Controller returned error STALE_BROKER_EPOCH 
(org.apache.kafka.server.AssignmentsManager)
[2024-05-22 11:30:38,064] ERROR Failed to propagate directory assignments 
because the Controller returned error STALE_BROKER_EPOCH 
(org.apache.kafka.server.AssignmentsManager)
[2024-05-22 11:30:39,376] ERROR Failed to propagate directory assignments 
because the Controller returned error STALE_BROKER_EPOCH 
(org.apache.kafka.server.AssignmentsManager)
[2024-05-22 11:30:41,486] ERROR Failed to propagate directory assignments 
because the Controller returned error STALE_BROKER_EPOCH 
(org.apache.kafka.server.AssignmentsManager)
[2024-05-22 11:30:43,794] INFO [BrokerLifecycleManager id=3] Unable to register 
broker 3 because the controller returned error INVALID_REGISTRATION 
(kafka.server.BrokerLifecycleManager)
[2024-05-22 11:30:45,224] ERROR Failed to propagate directory assignments 
because the Controller returned error STALE_BROKER_EPOCH 
(org.apache.kafka.server.AssignmentsManager)
{panel}
controller logs:
{code:java}
java.util.concurrent.CompletionException: 
org.apache.kafka.common.errors.StaleBrokerEpochException: Expected broker epoch 
41885255, but got broker epoch -1
    at 
java.base/java.util.concurrent.CompletableFuture.encodeThrowable(CompletableFuture.java:332)
    at 
java.base/java.util.concurrent.CompletableFuture.completeThrowable(CompletableFuture.java:347)
    at 
java.base/java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:636)
    at 
java.base/java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:510)
    at 
java.base/java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:2194)
    at 
org.apache.kafka.controller.QuorumController$ControllerWriteEvent.complete(QuorumController.java:880)
    at 
org.apache.kafka.controller.QuorumController$ControllerWriteEvent.handleException(QuorumController.java:871)
    at 
org.apache.kafka.queue.KafkaEventQueue$EventContext.completeWithException(KafkaEventQueue.java:148)
    at 
org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:137)
    at 
org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:210)
    at 
org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:181)
    at java.base/java.lang.Thread.run(Thread.java:1583)
Caused by: org.apache.kafka.common.errors.StaleBrokerEpochException: Expected 
broker epoch 41885255, but got broker epoch -1{code}

> UnwritableMetadataException: Metadata has been lost
> ---
>
> Key: KAFKA-16662
> URL: https://issues.apache.org/jira/browse/KAFKA-16662
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.7.0
> Environment: Docker Image (bitnami/kafka:3.7.0)
> via Docker Compose
>Reporter: Tobias Bohn
>Priority: Major
> Attachments: log.txt
>
>
> Hello,
> First of all: I am new to this Jira and apologize if anything is set or 
> specified incorrectly. Feel free to advise me.
> We currently have an error in our test system, which unfortunately I can't 
> solve, because I couldn't find anything related to it. No solution could be 
> found via the mailing list either.
> The error occurs when we want to start up a node. The node runs using Kraft 
> and is both a controller and a broker. The following error message appears at 
> startup:
> {code:java}
> kafka  | [2024-04-16 06:18:13,707] ERROR Encountered fatal fault: Unhandled 
> error initializing new publishers 
> (org.apache.kafka.server.fault.ProcessTerminatingFaultHandler)
> kafka  | org.apache.kafka.image.writer.UnwritableMetadataException: Metadata 
> has been lost because the following could not be represented in metadata 
> version 3.5-IV2: the directory assignment state of one or more replicas
> kafka  |        at 
> org.apache.kafka.image.writer.ImageWriterOptions.handleLoss(ImageWriterOptions.java:94)
> kafka  |        at 
> org.apache.kafka.metadata.PartitionRegistration.toRecord(Pa

[jira] [Commented] (KAFKA-16662) UnwritableMetadataException: Metadata has been lost

2024-05-21 Thread Jianbin Chen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17848431#comment-17848431
 ] 

Jianbin Chen commented on KAFKA-16662:
--

Could someone please pay attention to this issue and help me out?
{code:java}
[admin@kafka-dev-d-010058016231 kafka]$ ./bin/kafka-features.sh 
--bootstrap-server 10.58.16.231:9092 describe
Feature: metadata.version    SupportedMinVersion: 3.0-IV1    
SupportedMaxVersion: 3.7-IV4    FinalizedVersionLevel: 3.5-IV2    Epoch: 
41885646{code}
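If it helps anyone reproducing this, the same finalized metadata.version can also be read 
programmatically with the Admin client. A minimal sketch (the class name is made up and 
the bootstrap address just reuses the test broker above):
{code:java}
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.FeatureMetadata;

public class DescribeMetadataVersion {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "10.58.16.231:9092");
        try (Admin admin = Admin.create(props)) {
            FeatureMetadata metadata = admin.describeFeatures().featureMetadata().get();
            // Prints the numeric finalized feature level for each feature,
            // including metadata.version (the level kafka-features.sh reports symbolically).
            metadata.finalizedFeatures().forEach((name, range) ->
                System.out.println(name + " finalized max level: " + range.maxVersionLevel()));
        }
    }
}
{code}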

> UnwritableMetadataException: Metadata has been lost
> ---
>
> Key: KAFKA-16662
> URL: https://issues.apache.org/jira/browse/KAFKA-16662
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.7.0
> Environment: Docker Image (bitnami/kafka:3.7.0)
> via Docker Compose
>Reporter: Tobias Bohn
>Priority: Major
> Attachments: log.txt
>
>
> Hello,
> First of all: I am new to this Jira and apologize if anything is set or 
> specified incorrectly. Feel free to advise me.
> We currently have an error in our test system, which unfortunately I can't 
> solve, because I couldn't find anything related to it. No solution could be 
> found via the mailing list either.
> The error occurs when we want to start up a node. The node runs using Kraft 
> and is both a controller and a broker. The following error message appears at 
> startup:
> {code:java}
> kafka  | [2024-04-16 06:18:13,707] ERROR Encountered fatal fault: Unhandled 
> error initializing new publishers 
> (org.apache.kafka.server.fault.ProcessTerminatingFaultHandler)
> kafka  | org.apache.kafka.image.writer.UnwritableMetadataException: Metadata 
> has been lost because the following could not be represented in metadata 
> version 3.5-IV2: the directory assignment state of one or more replicas
> kafka  |        at 
> org.apache.kafka.image.writer.ImageWriterOptions.handleLoss(ImageWriterOptions.java:94)
> kafka  |        at 
> org.apache.kafka.metadata.PartitionRegistration.toRecord(PartitionRegistration.java:391)
> kafka  |        at org.apache.kafka.image.TopicImage.write(TopicImage.java:71)
> kafka  |        at 
> org.apache.kafka.image.TopicsImage.write(TopicsImage.java:84)
> kafka  |        at 
> org.apache.kafka.image.MetadataImage.write(MetadataImage.java:155)
> kafka  |        at 
> org.apache.kafka.image.loader.MetadataLoader.initializeNewPublishers(MetadataLoader.java:295)
> kafka  |        at 
> org.apache.kafka.image.loader.MetadataLoader.lambda$scheduleInitializeNewPublishers$0(MetadataLoader.java:266)
> kafka  |        at 
> org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:127)
> kafka  |        at 
> org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:210)
> kafka  |        at 
> org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:181)
> kafka  |        at java.base/java.lang.Thread.run(Thread.java:840)
> kafka exited with code 0 {code}
> We use Docker to operate the cluster. The error occurred while we were trying 
> to restart a node. All other nodes in the cluster are still running correctly.
> If you need further information, please let us know. The complete log is 
> attached to this issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (KAFKA-16662) UnwritableMetadataException: Metadata has been lost

2024-05-21 Thread Jianbin Chen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17848424#comment-17848424
 ] 

Jianbin Chen edited comment on KAFKA-16662 at 5/22/24 3:10 AM:
---

I have encountered the same issue. Can anyone help me with this? I upgraded 
from 3.5.1 to 3.7.0, and I have already changed inter.broker.protocol.version 
to 3.7 and ran it for some time.

But I have never executed

 
{code:java}
./bin/kafka-features.sh upgrade --metadata 3.7 {code}
The last time I restarted the cluster, I found that it could not be started 
anymore. The last lines of the log are as follows:

 
{code:java}
[2024-05-22 11:01:41,087] INFO [MetadataLoader id=3] 
maybePublishMetadata(LOG_DELTA): The loader is still catching up because we 
have loaded up to offset 41872530, but the high water mark is 41872532 
(org.apache.kafka.image.loader.MetadataLoader)
[2024-05-22 11:01:41,088] INFO [MetadataLoader id=3] 
maybePublishMetadata(LOG_DELTA): The loader finished catching up to the current 
high water mark of 41872532 (org.apache.kafka.image.loader.MetadataLoader)
[2024-05-22 11:01:41,092] INFO [BrokerLifecycleManager id=3] The broker has 
caught up. Transitioning from STARTING to RECOVERY. 
(kafka.server.BrokerLifecycleManager)
[2024-05-22 11:01:41,092] INFO [BrokerServer id=3] Finished waiting for the 
controller to acknowledge that we are caught up (kafka.server.BrokerServer)
[2024-05-22 11:01:41,092] INFO [BrokerServer id=3] Waiting for the initial 
broker metadata update to be published (kafka.server.BrokerServer)
[2024-05-22 11:01:41,095] ERROR Encountered fatal fault: Unhandled error 
initializing new publishers 
(org.apache.kafka.server.fault.ProcessTerminatingFaultHandler)
org.apache.kafka.image.writer.UnwritableMetadataException: Metadata has been 
lost because the following could not be represented in metadata version 
3.5-IV2: the directory assignment state of one or more replicas
    at 
org.apache.kafka.image.writer.ImageWriterOptions.handleLoss(ImageWriterOptions.java:94)
    at 
org.apache.kafka.metadata.PartitionRegistration.toRecord(PartitionRegistration.java:391)
    at org.apache.kafka.image.TopicImage.write(TopicImage.java:71)
    at org.apache.kafka.image.TopicsImage.write(TopicsImage.java:84)
    at org.apache.kafka.image.MetadataImage.write(MetadataImage.java:155)
    at 
org.apache.kafka.image.loader.MetadataLoader.initializeNewPublishers(MetadataLoader.java:295)
    at 
org.apache.kafka.image.loader.MetadataLoader.lambda$scheduleInitializeNewPublishers$0(MetadataLoader.java:266)
    at 
org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:127)
    at 
org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:210)
    at 
org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:181)
    at java.base/java.lang.Thread.run(Thread.java:1583) {code}
 

 

 


was (Author: jianbin):
I have encountered the same issue. Can anyone help me with this? I upgraded 
from 3.5.1 to 3.7.0, and I have already changed inter.broker.protocol.version 
to 3.7 and ran it for some time.

But I have never executed

`./bin/kafka-features.sh upgrade --metadata 3.7`

The last time I restarted the cluster, I found that it could not be started 
anymore. The last line of the log is as follows:

```

[2024-05-22 11:01:41,087] INFO [MetadataLoader id=3] 
maybePublishMetadata(LOG_DELTA): The loader is still catching up because we 
have loaded up to offset 41872530, but the high water mark is 41872532 
(org.apache.kafka.image.loader.MetadataLoader)
[2024-05-22 11:01:41,088] INFO [MetadataLoader id=3] 
maybePublishMetadata(LOG_DELTA): The loader finished catching up to the current 
high water mark of 41872532 (org.apache.kafka.image.loader.MetadataLoader)
[2024-05-22 11:01:41,092] INFO [BrokerLifecycleManager id=3] The broker has 
caught up. Transitioning from STARTING to RECOVERY. 
(kafka.server.BrokerLifecycleManager)
[2024-05-22 11:01:41,092] INFO [BrokerServer id=3] Finished waiting for the 
controller to acknowledge that we are caught up (kafka.server.BrokerServer)
[2024-05-22 11:01:41,092] INFO [BrokerServer id=3] Waiting for the initial 
broker metadata update to be published (kafka.server.BrokerServer)
[2024-05-22 11:01:41,095] ERROR Encountered fatal fault: Unhandled error 
initializing new publishers 
(org.apache.kafka.server.fault.ProcessTerminatingFaultHandler)
org.apache.kafka.image.writer.UnwritableMetadataException: Metadata has been 
lost because the following could not be represented in metadata version 
3.5-IV2: the directory assignment state of one or more replicas
    at 
org.apache.kafka.image.writer.ImageWriterOptions.handleLoss(ImageWriterOptions.java:94)
    at 
org.apache.kafka.metadata.PartitionRegistration.toRecord(PartitionRegistration.java:391)
    at org.apache.kafka.image.TopicImage.write(TopicImage.java:71)

[jira] [Commented] (KAFKA-16662) UnwritableMetadataException: Metadata has been lost

2024-05-21 Thread Jianbin Chen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17848424#comment-17848424
 ] 

Jianbin Chen commented on KAFKA-16662:
--

I have encountered the same issue. Can anyone help me with this? I upgraded 
from 3.5.1 to 3.7.0, and I have already changed inter.broker.protocol.version 
to 3.7 and ran it for some time.

But I have never executed

`./bin/kafka-features.sh upgrade --metadata 3.7`

The last time I restarted the cluster, I found that it could not be started 
anymore. The last line of the log is as follows:

```

[2024-05-22 11:01:41,087] INFO [MetadataLoader id=3] 
maybePublishMetadata(LOG_DELTA): The loader is still catching up because we 
have loaded up to offset 41872530, but the high water mark is 41872532 
(org.apache.kafka.image.loader.MetadataLoader)
[2024-05-22 11:01:41,088] INFO [MetadataLoader id=3] 
maybePublishMetadata(LOG_DELTA): The loader finished catching up to the current 
high water mark of 41872532 (org.apache.kafka.image.loader.MetadataLoader)
[2024-05-22 11:01:41,092] INFO [BrokerLifecycleManager id=3] The broker has 
caught up. Transitioning from STARTING to RECOVERY. 
(kafka.server.BrokerLifecycleManager)
[2024-05-22 11:01:41,092] INFO [BrokerServer id=3] Finished waiting for the 
controller to acknowledge that we are caught up (kafka.server.BrokerServer)
[2024-05-22 11:01:41,092] INFO [BrokerServer id=3] Waiting for the initial 
broker metadata update to be published (kafka.server.BrokerServer)
[2024-05-22 11:01:41,095] ERROR Encountered fatal fault: Unhandled error 
initializing new publishers 
(org.apache.kafka.server.fault.ProcessTerminatingFaultHandler)
org.apache.kafka.image.writer.UnwritableMetadataException: Metadata has been 
lost because the following could not be represented in metadata version 
3.5-IV2: the directory assignment state of one or more replicas
    at 
org.apache.kafka.image.writer.ImageWriterOptions.handleLoss(ImageWriterOptions.java:94)
    at 
org.apache.kafka.metadata.PartitionRegistration.toRecord(PartitionRegistration.java:391)
    at org.apache.kafka.image.TopicImage.write(TopicImage.java:71)
    at org.apache.kafka.image.TopicsImage.write(TopicsImage.java:84)
    at org.apache.kafka.image.MetadataImage.write(MetadataImage.java:155)
    at 
org.apache.kafka.image.loader.MetadataLoader.initializeNewPublishers(MetadataLoader.java:295)
    at 
org.apache.kafka.image.loader.MetadataLoader.lambda$scheduleInitializeNewPublishers$0(MetadataLoader.java:266)
    at 
org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:127)
    at 
org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:210)
    at 
org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:181)
    at java.base/java.lang.Thread.run(Thread.java:1583)

```

> UnwritableMetadataException: Metadata has been lost
> ---
>
> Key: KAFKA-16662
> URL: https://issues.apache.org/jira/browse/KAFKA-16662
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.7.0
> Environment: Docker Image (bitnami/kafka:3.7.0)
> via Docker Compose
>Reporter: Tobias Bohn
>Priority: Major
> Attachments: log.txt
>
>
> Hello,
> First of all: I am new to this Jira and apologize if anything is set or 
> specified incorrectly. Feel free to advise me.
> We currently have an error in our test system, which unfortunately I can't 
> solve, because I couldn't find anything related to it. No solution could be 
> found via the mailing list either.
> The error occurs when we want to start up a node. The node runs using Kraft 
> and is both a controller and a broker. The following error message appears at 
> startup:
> {code:java}
> kafka  | [2024-04-16 06:18:13,707] ERROR Encountered fatal fault: Unhandled 
> error initializing new publishers 
> (org.apache.kafka.server.fault.ProcessTerminatingFaultHandler)
> kafka  | org.apache.kafka.image.writer.UnwritableMetadataException: Metadata 
> has been lost because the following could not be represented in metadata 
> version 3.5-IV2: the directory assignment state of one or more replicas
> kafka  |        at 
> org.apache.kafka.image.writer.ImageWriterOptions.handleLoss(ImageWriterOptions.java:94)
> kafka  |        at 
> org.apache.kafka.metadata.PartitionRegistration.toRecord(PartitionRegistration.java:391)
> kafka  |        at org.apache.kafka.image.TopicImage.write(TopicImage.java:71)
> kafka  |        at 
> org.apache.kafka.image.TopicsImage.write(TopicsImage.java:84)
> kafka  |        at 
> org.apache.kafka.image.MetadataImage.write(MetadataImage.java:155)
> kafka  |        at 
> org.apache.kafka.image.loader.MetadataLoader.initializeNewPublishers(MetadataLoader.java:295)
> kafka  |        at 
> org.apache.kafka.image.loader.MetadataLoader.

[jira] [Resolved] (KAFKA-16378) Under tiered storage, deleting local logs does not free disk space

2024-03-19 Thread Jianbin Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jianbin Chen resolved KAFKA-16378.
--
Resolution: Fixed

> Under tiered storage, deleting local logs does not free disk space
> --
>
> Key: KAFKA-16378
> URL: https://issues.apache.org/jira/browse/KAFKA-16378
> Project: Kafka
>  Issue Type: Bug
>  Components: Tiered-Storage
>Affects Versions: 3.7.0
>Reporter: Jianbin Chen
>Priority: Major
> Attachments: image-2024-03-15-09-33-13-903.png
>
>
> Of course, this is an occasional phenomenon, as long as the tiered storage 
> topic triggered the deletion of the local log action, there is always the 
> possibility of residual file references, but these files on the local disk is 
> already impossible to find!
> I use the implementation as: [Aiven-Open/tiered-storage-for-apache-kafka: 
> RemoteStorageManager for Apache Kafka® Tiered Storage 
> (github.com)|https://github.com/Aiven-Open/tiered-storage-for-apache-kafka]
> I also filed an issue in their community, which also contains a full 
> description of the problem
> [Disk space not released · Issue #513 · 
> Aiven-Open/tiered-storage-for-apache-kafka 
> (github.com)|https://github.com/Aiven-Open/tiered-storage-for-apache-kafka/issues/513]
> !image-2024-03-15-09-33-13-903.png!
> You can clearly see in this figure that the kafka log has already output the 
> log of the operation that deleted the log, but the log is still referenced 
> and the disk space has not been released



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-16378) Under tiered storage, deleting local logs does not free disk space

2024-03-14 Thread Jianbin Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jianbin Chen updated KAFKA-16378:
-
Component/s: Tiered-Storage

> Under tiered storage, deleting local logs does not free disk space
> --
>
> Key: KAFKA-16378
> URL: https://issues.apache.org/jira/browse/KAFKA-16378
> Project: Kafka
>  Issue Type: Bug
>  Components: Tiered-Storage
>Affects Versions: 3.7.0
>Reporter: Jianbin Chen
>Priority: Major
> Attachments: image-2024-03-15-09-33-13-903.png
>
>
> Of course, this is an occasional phenomenon, as long as the tiered storage 
> topic triggered the deletion of the local log action, there is always the 
> possibility of residual file references, but these files on the local disk is 
> already impossible to find!
> I use the implementation as: [Aiven-Open/tiered-storage-for-apache-kafka: 
> RemoteStorageManager for Apache Kafka® Tiered Storage 
> (github.com)|https://github.com/Aiven-Open/tiered-storage-for-apache-kafka]
> I also filed an issue in their community, which also contains a full 
> description of the problem
> [Disk space not released · Issue #513 · 
> Aiven-Open/tiered-storage-for-apache-kafka 
> (github.com)|https://github.com/Aiven-Open/tiered-storage-for-apache-kafka/issues/513]
> !image-2024-03-15-09-33-13-903.png!
> You can clearly see in this figure that the kafka log has already output the 
> log of the operation that deleted the log, but the log is still referenced 
> and the disk space has not been released



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-16378) Under tiered storage, deleting local logs does not free disk space

2024-03-14 Thread Jianbin Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16378?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jianbin Chen updated KAFKA-16378:
-
Issue Type: Bug  (was: Wish)

> Under tiered storage, deleting local logs does not free disk space
> --
>
> Key: KAFKA-16378
> URL: https://issues.apache.org/jira/browse/KAFKA-16378
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.7.0
>Reporter: Jianbin Chen
>Priority: Major
> Attachments: image-2024-03-15-09-33-13-903.png
>
>
> Of course, this is an occasional phenomenon, as long as the tiered storage 
> topic triggered the deletion of the local log action, there is always the 
> possibility of residual file references, but these files on the local disk is 
> already impossible to find!
> I use the implementation as: [Aiven-Open/tiered-storage-for-apache-kafka: 
> RemoteStorageManager for Apache Kafka® Tiered Storage 
> (github.com)|https://github.com/Aiven-Open/tiered-storage-for-apache-kafka]
> I also filed an issue in their community, which also contains a full 
> description of the problem
> [Disk space not released · Issue #513 · 
> Aiven-Open/tiered-storage-for-apache-kafka 
> (github.com)|https://github.com/Aiven-Open/tiered-storage-for-apache-kafka/issues/513]
> !image-2024-03-15-09-33-13-903.png!
> You can clearly see in this figure that the kafka log has already output the 
> log of the operation that deleted the log, but the log is still referenced 
> and the disk space has not been released



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16378) Under tiered storage, deleting local logs does not free disk space

2024-03-14 Thread Jianbin Chen (Jira)
Jianbin Chen created KAFKA-16378:


 Summary: Under tiered storage, deleting local logs does not free 
disk space
 Key: KAFKA-16378
 URL: https://issues.apache.org/jira/browse/KAFKA-16378
 Project: Kafka
  Issue Type: Wish
Affects Versions: 3.7.0
Reporter: Jianbin Chen
 Attachments: image-2024-03-15-09-33-13-903.png

This is an occasional phenomenon: whenever a tiered-storage topic triggers the 
deletion of its local logs, there is a chance that the deleted files are still 
referenced, even though they can no longer be found on the local disk!

The implementation I use is [Aiven-Open/tiered-storage-for-apache-kafka: 
RemoteStorageManager for Apache Kafka® Tiered Storage 
(github.com)|https://github.com/Aiven-Open/tiered-storage-for-apache-kafka]

I also filed an issue in their community, which also contains a full 
description of the problem

[Disk space not released · Issue #513 · 
Aiven-Open/tiered-storage-for-apache-kafka 
(github.com)|https://github.com/Aiven-Open/tiered-storage-for-apache-kafka/issues/513]

!image-2024-03-15-09-33-13-903.png!

You can clearly see in this figure that the Kafka log already shows the delete 
operation for the segment, yet the file is still referenced and the disk space 
has not been released.
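One way to confirm that deleted segments are still being held open by the broker process 
(a Linux-specific diagnostic sketch, not something taken from this issue) is to scan its 
/proc file-descriptor table for targets the kernel has marked as deleted:
{code:java}
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class FindDeletedOpenSegments {
    public static void main(String[] args) throws Exception {
        // Pass the broker PID as the first argument, e.g. "java FindDeletedOpenSegments 12345".
        String brokerPid = args.length > 0 ? args[0] : "self";
        try (DirectoryStream<Path> fds = Files.newDirectoryStream(Paths.get("/proc", brokerPid, "fd"))) {
            for (Path fd : fds) {
                try {
                    String target = Files.readSymbolicLink(fd).toString();
                    // Unlinked-but-open files show up as "/path/to/000...123.log (deleted)".
                    if (target.endsWith("(deleted)") && target.contains(".log")) {
                        System.out.println(fd + " -> " + target);
                    }
                } catch (Exception ignored) {
                    // Some fd entries disappear or are unreadable; skip them.
                }
            }
        }
    }
}
{code}
If this still lists .log (or .index/.timeindex) files after the retention deletion has 
run, the space will only be returned once whatever holds those descriptors releases them.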



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-16340) Replication factor: 3 larger than available brokers: 1.

2024-03-04 Thread Jianbin Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jianbin Chen updated KAFKA-16340:
-
Description: 
Setting remote.log.metadata.topic.replication.factor has no effect
{code:java}
broker.id=1
log.cleanup.policy=delete
log.cleaner.enable=true
log.cleaner.delete.retention.ms=30
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
message.max.bytes=5242880
replica.fetch.max.bytes=5242880
log.dirs=/data01/kafka110-logs
num.partitions=2
default.replication.factor=1
delete.topic.enable=true
auto.create.topics.enable=true
num.recovery.threads.per.data.dir=1
offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1
offsets.retention.minutes=1440
log.retention.minutes=10
log.local.retention.ms=30
log.segment.bytes=104857600
log.retention.check.interval.ms=30
remote.log.metadata.topic.replication.factor=1
remote.log.storage.system.enable=true
remote.log.metadata.topic.retention.ms=-1{code}
!image-2024-03-05-09-31-35-058.png!

 

 
{code:java}
[2024-03-05 09:27:49,672] ERROR Encountered error while creating 
__remote_log_metadata topic. 
(org.apache.kafka.server.log.remote.metadata.storage.TopicBasedRemoteLogMetadataManager)
java.util.concurrent.ExecutionException: 
org.apache.kafka.common.errors.InvalidReplicationFactorException: Replication 
factor: 3 larger than available brokers: 1.
    at 
java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:396)
    at 
java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2073)
    at 
org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:165)
    at 
org.apache.kafka.server.log.remote.metadata.storage.TopicBasedRemoteLogMetadataManager.createTopic(TopicBasedRemoteLogMetadataManager.java:509)
    at 
org.apache.kafka.server.log.remote.metadata.storage.TopicBasedRemoteLogMetadataManager.initializeResources(TopicBasedRemoteLogMetadataManager.java:396)
    at java.base/java.lang.Thread.run(Thread.java:1589)
Caused by: org.apache.kafka.common.errors.InvalidReplicationFactorException: 
Replication factor: 3 larger than available brokers: 1.{code}
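One thing that may be worth double-checking (an assumption on my side, not something 
confirmed in this thread): TopicBasedRemoteLogMetadataManager receives its own settings 
through the RLMM property prefix (remote.log.metadata.manager.impl.prefix, which defaults 
to "rlmm.config."), so the replication factor may need to be supplied in prefixed form, 
roughly like this:
{code:java}
# Assumption to verify, not a confirmed fix: pass the topic setting through the
# RLMM prefix so that TopicBasedRemoteLogMetadataManager actually sees it.
rlmm.config.remote.log.metadata.topic.replication.factor=1
{code}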
 

  was:
Setting remote.log.metadata .topic.replication.factor is invalid
{code:java}
broker.id=1
log.cleanup.policy=delete
log.cleaner.enable=true
log.cleaner.delete.retention.ms=30
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
message.max.bytes=5242880
replica.fetch.max.bytes=5242880
log.dirs=/data01/kafka110-logs
num.partitions=2
default.replication.factor=1
delete.topic.enable=true
auto.create.topics.enable=true
num.recovery.threads.per.data.dir=1
offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1
offsets.retention.minutes=1440
log.retention.minutes=10
log.local.retention.ms=30
log.segment.bytes=104857600
log.retention.check.interval.ms=30
remote.log.metadata.topic.replication.factor=1
remote.log.storage.system.enable=true
remote.log.metadata.topic.retention.ms=-1{code}
!image-2024-03-05-09-31-35-058.png!

 

 
{code:java}
[2024-03-05 09:27:49,672] ERROR Encountered error while creating 
__remote_log_metadata topic. 
(org.apache.kafka.server.log.remote.metadata.storage.TopicBasedRemoteLogMetadataManager)
java.util.concurrent.ExecutionException: 
org.apache.kafka.common.errors.InvalidReplicationFactorException: Replication 
factor: 3 larger than available brokers: 1.
    at 
java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:396)
    at 
java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2073)
    at 
org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:165)
    at 
org.apache.kafka.server.log.remote.metadata.storage.TopicBasedRemoteLogMetadataManager.createTopic(TopicBasedRemoteLogMetadataManager.java:509)
    at 
org.apache.kafka.server.log.remote.metadata.storage.TopicBasedRemoteLogMetadataManager.initializeResources(TopicBasedRemoteLogMetadataManager.java:396)
    at java.base/java.lang.Thread.run(Thread.java:1589)
Caused by: org.apache.kafka.common.errors.InvalidReplicationFactorException: 
Replication factor: 3 larger than available brokers: 1.{code}
 


>  Replication factor: 3 larger than available brokers: 1.
> 
>
> Key: KAFKA-16340
> URL: https://issues.apache.org/jira/browse/KAFKA-16340
> Project: Kafka
>  Issue Type: Wish
>Affects Versions: 3.7.0
>Reporter: Jianbin Chen
>Priority: Major
> Attachments: image-2024-03-05-09-31-35-058.png
>
>
> Setting remote.log.metadata.topic.replication.factor is invalid
> {code:java}
> broker.id=1
> log.cleanup.policy=delete
> log.cleaner.enable=true
> log.

[jira] [Updated] (KAFKA-16340) Replication factor: 3 larger than available brokers: 1.

2024-03-04 Thread Jianbin Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jianbin Chen updated KAFKA-16340:
-
Description: 
Setting remote.log.metadata.topic.replication.factor does not take effect
{code:java}
broker.id=1
log.cleanup.policy=delete
log.cleaner.enable=true
log.cleaner.delete.retention.ms=30
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
message.max.bytes=5242880
replica.fetch.max.bytes=5242880
log.dirs=/data01/kafka110-logs
num.partitions=2
default.replication.factor=1
delete.topic.enable=true
auto.create.topics.enable=true
num.recovery.threads.per.data.dir=1
offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1
offsets.retention.minutes=1440
log.retention.minutes=10
log.local.retention.ms=30
log.segment.bytes=104857600
log.retention.check.interval.ms=30
remote.log.metadata.topic.replication.factor=1
remote.log.storage.system.enable=true
remote.log.metadata.topic.retention.ms=-1{code}
!image-2024-03-05-09-31-35-058.png!

 

 
{code:java}
[2024-03-05 09:27:49,672] ERROR Encountered error while creating 
__remote_log_metadata topic. 
(org.apache.kafka.server.log.remote.metadata.storage.TopicBasedRemoteLogMetadataManager)
java.util.concurrent.ExecutionException: 
org.apache.kafka.common.errors.InvalidReplicationFactorException: Replication 
factor: 3 larger than available brokers: 1.
    at 
java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:396)
    at 
java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2073)
    at 
org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:165)
    at 
org.apache.kafka.server.log.remote.metadata.storage.TopicBasedRemoteLogMetadataManager.createTopic(TopicBasedRemoteLogMetadataManager.java:509)
    at 
org.apache.kafka.server.log.remote.metadata.storage.TopicBasedRemoteLogMetadataManager.initializeResources(TopicBasedRemoteLogMetadataManager.java:396)
    at java.base/java.lang.Thread.run(Thread.java:1589)
Caused by: org.apache.kafka.common.errors.InvalidReplicationFactorException: 
Replication factor: 3 larger than available brokers: 1.{code}
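A likely explanation, for reference: TopicBasedRemoteLogMetadataManager only receives the properties that carry the remote.log.metadata.manager.impl.prefix, so the unprefixed setting above may never reach it and its own default replication factor of 3 gets used. A minimal sketch of the prefixed broker configuration, assuming the default prefix value of "rlmm.config." (the property names themselves are the ones already used in the config above):
{code:java}
# Sketch only, assuming the default remote.log.metadata.manager.impl.prefix
# value of "rlmm.config."; only prefixed properties are handed to the
# TopicBasedRemoteLogMetadataManager when it is configured.
remote.log.storage.system.enable=true
rlmm.config.remote.log.metadata.topic.replication.factor=1
rlmm.config.remote.log.metadata.topic.retention.ms=-1
{code}
With the prefixed form, the __remote_log_metadata topic should be creatable on a single-broker cluster.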
 

  was:
Setting remote.log.metadata.topic.replication.factor does not take effect
{code:java}
broker.id=1
log.cleanup.policy=delete
log.cleaner.enable=true
log.cleaner.delete.retention.ms=30
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
message.max.bytes=5242880
replica.fetch.max.bytes=5242880
log.dirs=/data01/kafka110-logs
num.partitions=2
default.replication.factor=1
delete.topic.enable=true
auto.create.topics.enable=true
num.recovery.threads.per.data.dir=1
offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1
offsets.retention.minutes=1440
log.retention.minutes=10
log.local.retention.ms=30
log.segment.bytes=104857600
log.retention.check.interval.ms=30
remote.log.metadata.topic.replication.factor=1
remote.log.storage.system.enable=true
remote.log.metadata.topic.retention.ms=-1{code}
!image-2024-03-05-09-31-35-058.png!

 

 
{code:java}
[2024-03-05 09:27:49,672] ERROR Encountered error while creating 
__remote_log_metadata topic. 
(org.apache.kafka.server.log.remote.metadata.storage.TopicBasedRemoteLogMetadataManager)
java.util.concurrent.ExecutionException: 
org.apache.kafka.common.errors.InvalidReplicationFactorException: Replication 
factor: 3 larger than available brokers: 1.
    at 
java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:396)
    at 
java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2073)
    at 
org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:165)
    at 
org.apache.kafka.server.log.remote.metadata.storage.TopicBasedRemoteLogMetadataManager.createTopic(TopicBasedRemoteLogMetadataManager.java:509)
    at 
org.apache.kafka.server.log.remote.metadata.storage.TopicBasedRemoteLogMetadataManager.initializeResources(TopicBasedRemoteLogMetadataManager.java:396)
    at java.base/java.lang.Thread.run(Thread.java:1589)
Caused by: org.apache.kafka.common.errors.InvalidReplicationFactorException: 
Replication factor: 3 larger than available brokers: 1.{code}
 


>  Replication factor: 3 larger than available brokers: 1.
> 
>
> Key: KAFKA-16340
> URL: https://issues.apache.org/jira/browse/KAFKA-16340
> Project: Kafka
>  Issue Type: Wish
>Affects Versions: 3.7.0
>Reporter: Jianbin Chen
>Priority: Major
> Attachments: image-2024-03-05-09-31-35-058.png
>
>
> Setting remote.log.metadata.topic.replication.factor does not take effect
> {code:java}
> broker.id=1
> log.cleanup.policy=delete
> log.cleaner.enable=true
> log.cleaner.de

[jira] [Updated] (KAFKA-16340) Replication factor: 3 larger than available brokers: 1.

2024-03-04 Thread Jianbin Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jianbin Chen updated KAFKA-16340:
-
Description: 
Setting remote.log.metadata.topic.replication.factor does not take effect
{code:java}
broker.id=1
log.cleanup.policy=delete
log.cleaner.enable=true
log.cleaner.delete.retention.ms=30
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
message.max.bytes=5242880
replica.fetch.max.bytes=5242880
log.dirs=/data01/kafka110-logs
num.partitions=2
default.replication.factor=1
delete.topic.enable=true
auto.create.topics.enable=true
num.recovery.threads.per.data.dir=1
offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1
offsets.retention.minutes=1440
log.retention.minutes=10
log.local.retention.ms=30
log.segment.bytes=104857600
log.retention.check.interval.ms=30
remote.log.metadata.topic.replication.factor=1
remote.log.storage.system.enable=true
remote.log.metadata.topic.retention.ms=-1{code}
!image-2024-03-05-09-31-35-058.png!

 

 
{code:java}
[2024-03-05 09:27:49,672] ERROR Encountered error while creating 
__remote_log_metadata topic. 
(org.apache.kafka.server.log.remote.metadata.storage.TopicBasedRemoteLogMetadataManager)
java.util.concurrent.ExecutionException: 
org.apache.kafka.common.errors.InvalidReplicationFactorException: Replication 
factor: 3 larger than available brokers: 1.
    at 
java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:396)
    at 
java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2073)
    at 
org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:165)
    at 
org.apache.kafka.server.log.remote.metadata.storage.TopicBasedRemoteLogMetadataManager.createTopic(TopicBasedRemoteLogMetadataManager.java:509)
    at 
org.apache.kafka.server.log.remote.metadata.storage.TopicBasedRemoteLogMetadataManager.initializeResources(TopicBasedRemoteLogMetadataManager.java:396)
    at java.base/java.lang.Thread.run(Thread.java:1589)
Caused by: org.apache.kafka.common.errors.InvalidReplicationFactorException: 
Replication factor: 3 larger than available brokers: 1.{code}
 

  was:
While testing tiered storage, I ran into a problem where setting 
remote.log.metadata.topic.replication.factor does not take effect
{code:java}
broker.id=1
log.cleanup.policy=delete
log.cleaner.enable=true
log.cleaner.delete.retention.ms=30
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
message.max.bytes=5242880
replica.fetch.max.bytes=5242880
log.dirs=/data01/kafka110-logs
num.partitions=2
default.replication.factor=1
delete.topic.enable=true
auto.create.topics.enable=true
num.recovery.threads.per.data.dir=1
offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1
offsets.retention.minutes=1440
log.retention.minutes=10
log.local.retention.ms=30
log.segment.bytes=104857600
log.retention.check.interval.ms=30
remote.log.metadata.topic.replication.factor=1
remote.log.storage.system.enable=true
remote.log.metadata.topic.retention.ms=-1{code}
!image-2024-03-05-09-31-35-058.png!

 

 
{code:java}
[2024-03-05 09:27:49,672] ERROR Encountered error while creating 
__remote_log_metadata topic. 
(org.apache.kafka.server.log.remote.metadata.storage.TopicBasedRemoteLogMetadataManager)
java.util.concurrent.ExecutionException: 
org.apache.kafka.common.errors.InvalidReplicationFactorException: Replication 
factor: 3 larger than available brokers: 1.
    at 
java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:396)
    at 
java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2073)
    at 
org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:165)
    at 
org.apache.kafka.server.log.remote.metadata.storage.TopicBasedRemoteLogMetadataManager.createTopic(TopicBasedRemoteLogMetadataManager.java:509)
    at 
org.apache.kafka.server.log.remote.metadata.storage.TopicBasedRemoteLogMetadataManager.initializeResources(TopicBasedRemoteLogMetadataManager.java:396)
    at java.base/java.lang.Thread.run(Thread.java:1589)
Caused by: org.apache.kafka.common.errors.InvalidReplicationFactorException: 
Replication factor: 3 larger than available brokers: 1.{code}
 


>  Replication factor: 3 larger than available brokers: 1.
> 
>
> Key: KAFKA-16340
> URL: https://issues.apache.org/jira/browse/KAFKA-16340
> Project: Kafka
>  Issue Type: Wish
>Affects Versions: 3.7.0
>Reporter: Jianbin Chen
>Priority: Major
> Attachments: image-2024-03-05-09-31-35-058.png
>
>
> Setting remote.log.metadata.topic.replication.factor does not take effect
> {code:java}
> broker.id=1
> log.cleanup.policy=delete
> lo

[jira] [Updated] (KAFKA-16340) Replication factor: 3 larger than available brokers: 1.

2024-03-04 Thread Jianbin Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jianbin Chen updated KAFKA-16340:
-
Description: 
While testing tiered storage, I ran into a problem where setting 
remote.log.metadata.topic.replication.factor does not take effect
{code:java}
broker.id=1
log.cleanup.policy=delete
log.cleaner.enable=true
log.cleaner.delete.retention.ms=30
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
message.max.bytes=5242880
replica.fetch.max.bytes=5242880
log.dirs=/data01/kafka110-logs
num.partitions=2
default.replication.factor=1
delete.topic.enable=true
auto.create.topics.enable=true
num.recovery.threads.per.data.dir=1
offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1
offsets.retention.minutes=1440
log.retention.minutes=10
log.local.retention.ms=30
log.segment.bytes=104857600
log.retention.check.interval.ms=30
remote.log.metadata.topic.replication.factor=1
remote.log.storage.system.enable=true
remote.log.metadata.topic.retention.ms=-1{code}
!image-2024-03-05-09-31-35-058.png!

 

 
{code:java}
[2024-03-05 09:27:49,672] ERROR Encountered error while creating 
__remote_log_metadata topic. 
(org.apache.kafka.server.log.remote.metadata.storage.TopicBasedRemoteLogMetadataManager)
java.util.concurrent.ExecutionException: 
org.apache.kafka.common.errors.InvalidReplicationFactorException: Replication 
factor: 3 larger than available brokers: 1.
    at 
java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:396)
    at 
java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2073)
    at 
org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:165)
    at 
org.apache.kafka.server.log.remote.metadata.storage.TopicBasedRemoteLogMetadataManager.createTopic(TopicBasedRemoteLogMetadataManager.java:509)
    at 
org.apache.kafka.server.log.remote.metadata.storage.TopicBasedRemoteLogMetadataManager.initializeResources(TopicBasedRemoteLogMetadataManager.java:396)
    at java.base/java.lang.Thread.run(Thread.java:1589)
Caused by: org.apache.kafka.common.errors.InvalidReplicationFactorException: 
Replication factor: 3 larger than available brokers: 1.{code}
 

  was:
While testing tiered storage, I ran into a problem where setting 
remote.log.metadata.topic.replication.factor does not take effect
{code:java}
broker.id=1
log.cleanup.policy=delete
log.cleaner.enable=true
log.cleaner.delete.retention.ms=30
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
message.max.bytes=5242880
replica.fetch.max.bytes=5242880
log.dirs=/data01/kafka110-logs
num.partitions=2
default.replication.factor=1
delete.topic.enable=true
auto.create.topics.enable=true
num.recovery.threads.per.data.dir=1
offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1
offsets.retention.minutes=1440
log.retention.minutes=10
log.local.retention.ms=30
log.segment.bytes=104857600
log.retention.check.interval.ms=30
remote.log.metadata.topic.replication.factor=1
remote.log.storage.system.enable=true
remote.log.metadata.topic.retention.ms=-1{code}
!image-2024-03-05-09-31-35-058.png!

 

 
{code:java}
[2024-03-05 09:27:49,672] ERROR Encountered error while creating 
__remote_log_metadata topic. 
(org.apache.kafka.server.log.remote.metadata.storage.TopicBasedRemoteLogMetadataManager)
java.util.concurrent.ExecutionException: 
org.apache.kafka.common.errors.InvalidReplicationFactorException: Replication 
factor: 3 larger than available brokers: 1.
    at 
java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:396)
    at 
java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2073)
    at 
org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:165)
    at 
org.apache.kafka.server.log.remote.metadata.storage.TopicBasedRemoteLogMetadataManager.createTopic(TopicBasedRemoteLogMetadataManager.java:509)
    at 
org.apache.kafka.server.log.remote.metadata.storage.TopicBasedRemoteLogMetadataManager.initializeResources(TopicBasedRemoteLogMetadataManager.java:396)
    at java.base/java.lang.Thread.run(Thread.java:1589)
Caused by: org.apache.kafka.common.errors.InvalidReplicationFactorException: 
Replication factor: 3 larger than available brokers: 1.{code}
 


>  Replication factor: 3 larger than available brokers: 1.
> 
>
> Key: KAFKA-16340
> URL: https://issues.apache.org/jira/browse/KAFKA-16340
> Project: Kafka
>  Issue Type: Wish
>Affects Versions: 3.7.0
>Reporter: Jianbin Chen
>Priority: Major
> Attachments: image-2024-03-05-09-31-35-058.png
>
>
> I'm running into a problem where setting remote.log.metadata.topic.replication.factor 
> does not take effect when testing

[jira] [Created] (KAFKA-16340) Replication factor: 3 larger than available brokers: 1.

2024-03-04 Thread Jianbin Chen (Jira)
Jianbin Chen created KAFKA-16340:


 Summary:  Replication factor: 3 larger than available brokers: 1.
 Key: KAFKA-16340
 URL: https://issues.apache.org/jira/browse/KAFKA-16340
 Project: Kafka
  Issue Type: Wish
Affects Versions: 3.7.0
Reporter: Jianbin Chen
 Attachments: image-2024-03-05-09-31-35-058.png

While testing tiered storage, I ran into a problem where setting 
remote.log.metadata.topic.replication.factor does not take effect
{code:java}
broker.id=1
log.cleanup.policy=delete
log.cleaner.enable=true
log.cleaner.delete.retention.ms=30
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
message.max.bytes=5242880
replica.fetch.max.bytes=5242880
log.dirs=/data01/kafka110-logs
num.partitions=2
default.replication.factor=1
delete.topic.enable=true
auto.create.topics.enable=true
num.recovery.threads.per.data.dir=1
offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1
offsets.retention.minutes=1440
log.retention.minutes=10
log.local.retention.ms=30
log.segment.bytes=104857600
log.retention.check.interval.ms=30
remote.log.metadata.topic.replication.factor=1
remote.log.storage.system.enable=true
remote.log.metadata.topic.retention.ms=-1{code}
!image-2024-03-05-09-31-35-058.png!

 

 
{code:java}
[2024-03-05 09:27:49,672] ERROR Encountered error while creating 
__remote_log_metadata topic. 
(org.apache.kafka.server.log.remote.metadata.storage.TopicBasedRemoteLogMetadataManager)
java.util.concurrent.ExecutionException: 
org.apache.kafka.common.errors.InvalidReplicationFactorException: Replication 
factor: 3 larger than available brokers: 1.
    at 
java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:396)
    at 
java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2073)
    at 
org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:165)
    at 
org.apache.kafka.server.log.remote.metadata.storage.TopicBasedRemoteLogMetadataManager.createTopic(TopicBasedRemoteLogMetadataManager.java:509)
    at 
org.apache.kafka.server.log.remote.metadata.storage.TopicBasedRemoteLogMetadataManager.initializeResources(TopicBasedRemoteLogMetadataManager.java:396)
    at java.base/java.lang.Thread.run(Thread.java:1589)
Caused by: org.apache.kafka.common.errors.InvalidReplicationFactorException: 
Replication factor: 3 larger than available brokers: 1.{code}
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-16060) Some questions about tiered storage capabilities

2024-01-10 Thread Jianbin Chen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17805362#comment-17805362
 ] 

Jianbin Chen commented on KAFKA-16060:
--

Thank you for your replies.

> Some questions about tiered storage capabilities
> 
>
> Key: KAFKA-16060
> URL: https://issues.apache.org/jira/browse/KAFKA-16060
> Project: Kafka
>  Issue Type: Wish
>  Components: core
>Affects Versions: 3.6.1
>Reporter: Jianbin Chen
>Priority: Major
>
> # If a topic has 3 replicas, when the local retention limit is reached, will 
> all 3 replicas upload their log segments to remote storage, or will only the 
> leader in the ISR upload them to remote storage (HDFS, S3)?
>  # When you say topics that do not support compaction, do you mean topics 
> with log.cleanup.policy=compact?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-16060) Some questions about tiered storage capabilities

2024-01-05 Thread Jianbin Chen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17803449#comment-17803449
 ] 

Jianbin Chen commented on KAFKA-16060:
--

Thank you both for your replies. Allow me to ask an additional question about 
JBOD disk mounts, i.e. log.dirs=/data01,/data02.

Are there any plans to support tiered storage with this layout in the future?

> Some questions about tiered storage capabilities
> 
>
> Key: KAFKA-16060
> URL: https://issues.apache.org/jira/browse/KAFKA-16060
> Project: Kafka
>  Issue Type: Wish
>  Components: core
>Affects Versions: 3.6.1
>Reporter: Jianbin Chen
>Priority: Major
>
> # If a topic has 3 replicas, when the local retention limit is reached, will 
> all 3 replicas upload their log segments to remote storage, or will only the 
> leader in the ISR upload them to remote storage (HDFS, S3)?
>  # When you say topics that do not support compaction, do you mean topics 
> with log.cleanup.policy=compact?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16060) Some questions about tiered storage capabilities

2023-12-28 Thread Jianbin Chen (Jira)
Jianbin Chen created KAFKA-16060:


 Summary: Some questions about tiered storage capabilities
 Key: KAFKA-16060
 URL: https://issues.apache.org/jira/browse/KAFKA-16060
 Project: Kafka
  Issue Type: Wish
  Components: core
Affects Versions: 3.6.1
Reporter: Jianbin Chen


# If a topic has 3 replicas, when the local retention limit is reached, will 
all 3 replicas upload their log segments to remote storage, or will only the 
leader in the ISR upload them to remote storage (HDFS, S3)?
 # When you say topics that do not support compaction, do you mean topics with 
log.cleanup.policy=compact? (See the sketch after this list.)
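For illustration, a minimal sketch of the per-topic side of this, using the Admin client against a broker that already has remote.log.storage.system.enable=true; the topic names, partition counts, and retention values are made up:
{code:java}
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class TieredTopicSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (Admin admin = Admin.create(props)) {
            // A delete-policy topic that opts in to tiered storage (hypothetical values).
            NewTopic tiered = new NewTopic("tiered-demo", 2, (short) 1)
                    .configs(Map.of(
                            "remote.storage.enable", "true",
                            "cleanup.policy", "delete",
                            "retention.ms", "604800000",
                            "local.retention.ms", "3600000"));

            // A compacted topic; tiered storage is documented as unsupported for these.
            NewTopic compacted = new NewTopic("compacted-demo", 2, (short) 1)
                    .configs(Map.of("cleanup.policy", "compact"));

            admin.createTopics(List.of(tiered, compacted)).all().get();
        }
    }
}
{code}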



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-16039) RecordHeaders supports the addAll method

2023-12-20 Thread Jianbin Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16039?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jianbin Chen updated KAFKA-16039:
-
External issue URL: https://github.com/apache/kafka/pull/15034

> RecordHeaders supports the addAll method
> 
>
> Key: KAFKA-16039
> URL: https://issues.apache.org/jira/browse/KAFKA-16039
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients
>Reporter: Jianbin Chen
>Priority: Minor
>
> Why not provide an addAll method in RecordHeaders? It would reduce the amount 
> of code required to copy headers from one Headers instance to another



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16039) RecordHeaders supports the addAll method

2023-12-20 Thread Jianbin Chen (Jira)
Jianbin Chen created KAFKA-16039:


 Summary: RecordHeaders supports the addAll method
 Key: KAFKA-16039
 URL: https://issues.apache.org/jira/browse/KAFKA-16039
 Project: Kafka
  Issue Type: Improvement
  Components: clients
Reporter: Jianbin Chen


Why not provide an addAll method in RecordHeaders? It would reduce the amount 
of code required to copy headers from one Headers instance to another
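To make the ask concrete, a minimal sketch of the copy loop needed today versus the proposed call; addAll is hypothetical and not part of the current Headers API, and the header keys and values below are made up:
{code:java}
import java.nio.charset.StandardCharsets;
import org.apache.kafka.common.header.Header;
import org.apache.kafka.common.header.Headers;
import org.apache.kafka.common.header.internals.RecordHeader;
import org.apache.kafka.common.header.internals.RecordHeaders;

public class HeaderCopyExample {
    public static void main(String[] args) {
        Headers source = new RecordHeaders()
                .add(new RecordHeader("trace-id", "abc123".getBytes(StandardCharsets.UTF_8)))
                .add(new RecordHeader("origin", "service-a".getBytes(StandardCharsets.UTF_8)));

        // Today: copying headers means iterating and calling add() once per header.
        Headers target = new RecordHeaders();
        for (Header header : source) {
            target.add(header);
        }

        // Proposed: a single call could replace the loop above.
        // target.addAll(source);   // hypothetical method, not in the current API
    }
}
{code}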



--
This message was sent by Atlassian Jira
(v8.20.10#820010)