[jira] [Created] (KAFKA-16886) KRaft partition reassignment failed after upgrade to 3.7.0

2024-06-04 Thread Luke Chen (Jira)
Luke Chen created KAFKA-16886:
-

 Summary: KRaft partition reassignment failed after upgrade to 
3.7.0 
 Key: KAFKA-16886
 URL: https://issues.apache.org/jira/browse/KAFKA-16886
 Project: Kafka
  Issue Type: Bug
Affects Versions: 3.7.0
Reporter: Luke Chen


Before the upgrade, the topic image doesn't have a dirID for the assignment. After 
the upgrade, the assignment has the dirID. So in 
{{{}ReplicaManager#applyDelta{}}}, we'll have directoryId changes in 
{{{}localChanges{}}}, which will invoke {{AssignmentEvent}} 
[here|https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/server/ReplicaManager.scala#L2748].
 With that, we'll get the unexpected {{NOT_LEADER_OR_FOLLOWER}} error.

Reproduce steps:
 # Launch a 3.6.0 controller and a 3.6.0 broker (Broker A) in KRaft mode;
 # Create a topic with 1 partition;
 # Upgrade Broker A, Broker B, and the controllers to 3.7.0;
 # Upgrade the MV to 3.7: ./bin/kafka-features.sh --bootstrap-server localhost:9092 
upgrade --metadata 3.7
 # Reassign the partition from step 2 to Broker B (see the sketch below).
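
One way to run step 5 with the stock tooling (a hedged sketch; the target replica list [6] is taken from the controller logs below and should be adjusted to the actual broker IDs):

{code:bash}
cat > reassign.json <<'EOF'
{"version":1,"partitions":[{"topic":"t1","partition":0,"replicas":[6]}]}
EOF
./bin/kafka-reassign-partitions.sh --bootstrap-server localhost:9092 \
  --reassignment-json-file reassign.json --execute
{code}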

 

The logs in broker B:
[2024-05-31 15:33:25,763] INFO [ReplicaFetcherManager on broker 2] Removed 
fetcher for partitions Set(t1-0) (kafka.server.ReplicaFetcherManager)
[2024-05-31 15:33:25,837] INFO [ReplicaFetcherManager on broker 2] Removed 
fetcher for partitions Set(t1-0) (kafka.server.ReplicaFetcherManager)
[2024-05-31 15:33:25,837] INFO [ReplicaAlterLogDirsManager on broker 2] Removed 
fetcher for partitions Set(t1-0) (kafka.server.ReplicaAlterLogDirsManager)
[2024-05-31 15:33:25,853] INFO Log for partition t1-0 is renamed to 
/tmp/kraft-broker-logs/t1-0.3e6d8bebc1c04f3186ad6cf63145b6fd-delete and is 
scheduled for deletion (kafka.log.LogManager)
[2024-05-31 15:33:26,279] ERROR Controller returned error 
NOT_LEADER_OR_FOLLOWER for assignment of partition 
PartitionData(partitionIndex=0, errorCode=6) into directory 
oULBCf49aiRXaWJpO3I-GA (org.apache.kafka.server.AssignmentsManager)
[2024-05-31 15:33:26,280] WARN Re-queueing assignments: 
[Assignment\{timestampNs=26022187148625, partition=t1:0, 
dir=/tmp/kraft-broker-logs, reason='Applying metadata delta'}] 
(org.apache.kafka.server.AssignmentsManager)
[2024-05-31 15:33:26,786] ERROR Controller returned error 
NOT_LEADER_OR_FOLLOWER for assignment of partition 
PartitionData(partitionIndex=0, errorCode=6) into directory 
oULBCf49aiRXaWJpO3I-GA (org.apache.kafka.server.AssignmentsManager)
[2024-05-31 15:33:27,296] WARN Re-queueing assignments: 
[Assignment\{timestampNs=26022187148625, partition=t1:0, 
dir=/tmp/kraft-broker-logs, reason='Applying metadata delta'}] 
(org.apache.kafka.server.AssignmentsManager)
...
 
Logs in controller:
[2024-05-31 15:33:25,727] INFO [QuorumController id=1] Successfully altered 1 
out of 1 partition reassignment(s). 
(org.apache.kafka.controller.ReplicationControlManager)
[2024-05-31 15:33:25,727] INFO [QuorumController id=1] Replayed partition 
assignment change PartitionChangeRecord(partitionId=0, 
topicId=tMiJOQznTLKtOZ8rLqdgqw, isr=null, leader=-2, replicas=[6, 2], 
removingReplicas=[2], addingReplicas=[6], leaderRecoveryState=-1, 
directories=[RuDIAGGJrTG2NU6tEOkbHw, AA], 
eligibleLeaderReplicas=null, lastKnownElr=null) for topic t1 
(org.apache.kafka.controller.ReplicationControlManager)
[2024-05-31 15:33:25,802] INFO [QuorumController id=1] AlterPartition request 
from node 2 for t1-0 completed the ongoing partition reassignment and triggered 
a leadership change. Returning NEW_LEADER_ELECTED. 
(org.apache.kafka.controller.ReplicationControlManager)
[2024-05-31 15:33:25,802] INFO [QuorumController id=1] UNCLEAN partition change 
for t1-0 with topic ID tMiJOQznTLKtOZ8rLqdgqw: replicas: [6, 2] -> [6], 
directories: [RuDIAGGJrTG2NU6tEOkbHw, AA] -> 
[RuDIAGGJrTG2NU6tEOkbHw], isr: [2] -> [6], removingReplicas: [2] -> [], 
addingReplicas: [6] -> [], leader: 2 -> 6, leaderEpoch: 3 -> 4, partitionEpoch: 
5 -> 6 (org.apache.kafka.controller.ReplicationControlManager)
[2024-05-31 15:33:25,802] INFO [QuorumController id=1] Replayed partition 
assignment change PartitionChangeRecord(partitionId=0, 
topicId=tMiJOQznTLKtOZ8rLqdgqw, isr=[6], leader=6, replicas=[6], 
removingReplicas=[], addingReplicas=[], leaderRecoveryState=-1, 
directories=[RuDIAGGJrTG2NU6tEOkbHw], eligibleLeaderReplicas=null, 
lastKnownElr=null) for topic t1 
(org.apache.kafka.controller.ReplicationControlManager)
[2024-05-31 15:33:26,277] WARN [QuorumController id=1] 
AssignReplicasToDirsRequest from broker 2 references non assigned partition 
t1-0 (org.apache.kafka.controller.ReplicationControlManager)
[2024-05-31 15:33:26,785] WARN [QuorumController id=1] 
AssignReplicasToDirsRequest from broker 2 references non assigned partition 
t1-0 (org.apache.kafka.controller.ReplicationControlManager)
[2024-05-31 15:33:27,293] WARN [QuorumController id=1] 
AssignReplicasToDirsRequest from broker 2 referen

[jira] [Created] (KAFKA-16887) document busy metrics value when remoteLogManager throttling

2024-06-04 Thread Luke Chen (Jira)
Luke Chen created KAFKA-16887:
-

 Summary: document busy metrics value when remoteLogManager 
throttling
 Key: KAFKA-16887
 URL: https://issues.apache.org/jira/browse/KAFKA-16887
 Project: Kafka
  Issue Type: Sub-task
Reporter: Luke Chen


Context: https://github.com/apache/kafka/pull/15820#discussion_r1625304008



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16888) Fix failed StorageToolTest.testFormatSucceedsIfAllDirectoriesAreAvailable

2024-06-04 Thread Chia-Ping Tsai (Jira)
Chia-Ping Tsai created KAFKA-16888:
--

 Summary: Fix failed 
StorageToolTest.testFormatSucceedsIfAllDirectoriesAreAvailable
 Key: KAFKA-16888
 URL: https://issues.apache.org/jira/browse/KAFKA-16888
 Project: Kafka
  Issue Type: Bug
Reporter: Chia-Ping Tsai
Assignee: Chia-Ping Tsai


The commit 
(https://github.com/apache/kafka/commit/459da4795a511f6933e940fcf105a824bd9e589c#diff-4bacfdbf0e63a4d5f3deb1a0d39037a18510ac24ee5ec276fe70bc818ba4d209L505)
 added a new string to `stream`, so the test case fails due to the extra output.
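
A hedged illustration of the failure mode (the real test is in Scala; this is just the general pattern of asserting on captured output):

{code:java}
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;

public class CapturedOutputExample {
    public static void main(String[] args) {
        ByteArrayOutputStream stream = new ByteArrayOutputStream();
        PrintStream out = new PrintStream(stream);

        out.println("Formatting /tmp/dir");                 // output the test expects
        out.println("extra line added by the new commit");  // newly added output
        out.flush();

        // An exact-match assertion on the captured stream now fails because of the extra line.
        String expected = "Formatting /tmp/dir" + System.lineSeparator();
        if (!stream.toString().equals(expected)) {
            throw new AssertionError("unexpected extra output:\n" + stream);
        }
    }
}
{code}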



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Jenkins build is still unstable: Kafka » Kafka Branch Builder » 3.8 #8

2024-06-04 Thread Apache Jenkins Server
See 




Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #2968

2024-06-04 Thread Apache Jenkins Server
See 




[jira] [Resolved] (KAFKA-16664) Re-add EventAccumulator.take(timeout)

2024-06-04 Thread David Jacot (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16664?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

David Jacot resolved KAFKA-16664.
-
Fix Version/s: 3.8.0
   Resolution: Fixed

> Re-add EventAccumulator.take(timeout)
> -
>
> Key: KAFKA-16664
> URL: https://issues.apache.org/jira/browse/KAFKA-16664
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Jeff Kim
>Assignee: Jeff Kim
>Priority: Major
> Fix For: 3.8.0
>
>
> [https://github.com/apache/kafka/pull/15835] should be used with a timeout in 
> EventAccumulator#take. We added a commit to remove the timeout; we should 
> revert it.
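> A generic hedged sketch of the take-with-timeout pattern being described (names are illustrative, not the actual EventAccumulator API):
> {code:java}
> import java.util.concurrent.BlockingQueue;
> import java.util.concurrent.LinkedBlockingQueue;
> import java.util.concurrent.TimeUnit;
>
> public class TakeWithTimeoutExample {
>     public static void main(String[] args) throws InterruptedException {
>         BlockingQueue<Runnable> events = new LinkedBlockingQueue<>();
>         // Polling with a timeout lets the caller periodically re-check shutdown
>         // conditions instead of blocking forever when no event arrives.
>         Runnable event = events.poll(100, TimeUnit.MILLISECONDS);
>         if (event != null) {
>             event.run();
>         }
>     }
> }
> {code}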



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16488) fix flaky MirrorConnectorsIntegrationExactlyOnceTest#testReplication

2024-06-04 Thread Edoardo Comar (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edoardo Comar resolved KAFKA-16488.
---
Resolution: Fixed

> fix flaky MirrorConnectorsIntegrationExactlyOnceTest#testReplication
> 
>
> Key: KAFKA-16488
> URL: https://issues.apache.org/jira/browse/KAFKA-16488
> Project: Kafka
>  Issue Type: Test
>Reporter: Chia-Ping Tsai
>Priority: Major
>
> org.apache.kafka.connect.runtime.rest.errors.ConnectRestException: Could not 
> reset connector offsets. Error response: {"error_code":500,"message":"Failed 
> to perform zombie fencing for source connector prior to modifying offsets"}
>   at 
> app//org.apache.kafka.connect.util.clusters.EmbeddedConnect.resetConnectorOffsets(EmbeddedConnect.java:646)
>   at 
> app//org.apache.kafka.connect.util.clusters.EmbeddedConnectCluster.resetConnectorOffsets(EmbeddedConnectCluster.java:48)
>   at 
> app//org.apache.kafka.connect.mirror.integration.MirrorConnectorsIntegrationBaseTest.resetAllMirrorMakerConnectorOffsets(MirrorConnectorsIntegrationBaseTest.java:1063)
>   at 
> app//org.apache.kafka.connect.mirror.integration.MirrorConnectorsIntegrationExactlyOnceTest.testReplication(MirrorConnectorsIntegrationExactlyOnceTest.java:90)
>   at 
> java.base@21.0.2/jdk.internal.reflect.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
>   at java.base@21.0.2/java.lang.reflect.Method.invoke(Method.java:580)
>   at 
> app//org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:728)
>   at 
> app//org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60)
>   at 
> app//org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131)
>   at 
> app//org.junit.jupiter.engine.extension.TimeoutExtension.intercept(TimeoutExtension.java:156)
>   at 
> app//org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestableMethod(TimeoutExtension.java:147)
>   at 
> app//org.junit.jupiter.engine.extension.TimeoutExtension.interceptTestMethod(TimeoutExtension.java:86)
>   at 
> app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker$ReflectiveInterceptorCall.lambda$ofVoidMethod$0(InterceptingExecutableInvoker.java:103)
>   at 
> app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.lambda$invoke$0(InterceptingExecutableInvoker.java:93)
>   at 
> app//org.junit.jupiter.engine.execution.InvocationInterceptorChain$InterceptedInvocation.proceed(InvocationInterceptorChain.java:106)
>   at 
> app//org.junit.jupiter.engine.execution.InvocationInterceptorChain.proceed(InvocationInterceptorChain.java:64)
>   at 
> app//org.junit.jupiter.engine.execution.InvocationInterceptorChain.chainAndInvoke(InvocationInterceptorChain.java:45)
>   at 
> app//org.junit.jupiter.engine.execution.InvocationInterceptorChain.invoke(InvocationInterceptorChain.java:37)
>   at 
> app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:92)
>   at 
> app//org.junit.jupiter.engine.execution.InterceptingExecutableInvoker.invoke(InterceptingExecutableInvoker.java:86)
>   at 
> app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.lambda$invokeTestMethod$7(TestMethodTestDescriptor.java:218)
>   at 
> app//org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
>   at 
> app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.invokeTestMethod(TestMethodTestDescriptor.java:214)
>   at 
> app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:139)
>   at 
> app//org.junit.jupiter.engine.descriptor.TestMethodTestDescriptor.execute(TestMethodTestDescriptor.java:69)
>   at 
> app//org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$6(NodeTestTask.java:151)
>   at 
> app//org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
>   at 
> app//org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$8(NodeTestTask.java:141)
>   at 
> app//org.junit.platform.engine.support.hierarchical.Node.around(Node.java:137)
>   at 
> app//org.junit.platform.engine.support.hierarchical.NodeTestTask.lambda$executeRecursively$9(NodeTestTask.java:139)
>   at 
> app//org.junit.platform.engine.support.hierarchical.ThrowableCollector.execute(ThrowableCollector.java:73)
>   at 
> app//org.junit.platform.engine.support.hierarchical.NodeTestTask.executeRecursively(NodeTestTask.java:138)
>   at 
> app//org.junit.platform.engine.support.hi

[jira] [Resolved] (KAFKA-14657) Admin.fenceProducers fails when Producer has ongoing transaction - but Producer gets fenced

2024-06-04 Thread Edoardo Comar (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Edoardo Comar resolved KAFKA-14657.
---
Resolution: Duplicate

duplicate of https://issues.apache.org/jira/browse/KAFKA-16570

> Admin.fenceProducers fails when Producer has ongoing transaction - but 
> Producer gets fenced
> ---
>
> Key: KAFKA-14657
> URL: https://issues.apache.org/jira/browse/KAFKA-14657
> Project: Kafka
>  Issue Type: Bug
>  Components: admin
>Reporter: Edoardo Comar
>Assignee: Edoardo Comar
>Priority: Major
> Attachments: FenceProducerDuringTx.java, FenceProducerOutsideTx.java
>
>
> Admin.fenceProducers() 
> fails with a ConcurrentTransactionsException if invoked when a Producer has a 
> transaction ongoing.
> However, further attempts by that producer to produce fail with 
> InvalidProducerEpochException, and the producer is not reusable: it 
> cannot abort/commit as it is fenced.
> An InvalidProducerEpochException is also logged as an error on the broker:
> [2023-01-27 17:16:32,220] ERROR [ReplicaManager broker=1] Error processing 
> append operation on partition topic-0 (kafka.server.ReplicaManager)
> org.apache.kafka.common.errors.InvalidProducerEpochException: Epoch of 
> producer 1062 at offset 84 in topic-0 is 0, which is smaller than the last 
> seen epoch
>  
> Conversely, if Admin.fenceProducers() 
> is invoked while there is no open transaction, the call succeeds and further 
> attempts by that producer to produce fail with ProducerFenced.
> See the attached snippets.
> As the caller of Admin.fenceProducers() is likely unaware of the producer's 
> state, the call should succeed regardless.
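> A minimal hedged sketch of the call in question (configuration values are illustrative; this is not the attached snippet):
> {code:java}
> import java.util.List;
> import java.util.Properties;
> import org.apache.kafka.clients.admin.Admin;
> import org.apache.kafka.clients.admin.AdminClientConfig;
>
> public class FenceProducersExample {
>     public static void main(String[] args) throws Exception {
>         Properties props = new Properties();
>         props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
>         try (Admin admin = Admin.create(props)) {
>             // If the producer using this transactional.id has an open transaction,
>             // this call currently fails with ConcurrentTransactionsException,
>             // even though the producer still ends up fenced.
>             admin.fenceProducers(List.of("my-transactional-id")).all().get();
>         }
>     }
> }
> {code}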



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16888) Fix failed StorageToolTest.testFormatSucceedsIfAllDirectoriesAreAvailable and StorageToolTest.testFormatEmptyDirectory

2024-06-04 Thread Chia-Ping Tsai (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chia-Ping Tsai resolved KAFKA-16888.

Fix Version/s: 3.9.0
   Resolution: Fixed

> Fix failed StorageToolTest.testFormatSucceedsIfAllDirectoriesAreAvailable and 
> StorageToolTest.testFormatEmptyDirectory
> --
>
> Key: KAFKA-16888
> URL: https://issues.apache.org/jira/browse/KAFKA-16888
> Project: Kafka
>  Issue Type: Bug
>Reporter: Chia-Ping Tsai
>Assignee: Kuan Po Tseng
>Priority: Minor
> Fix For: 3.9.0
>
>
> The commit 
> (https://github.com/apache/kafka/commit/459da4795a511f6933e940fcf105a824bd9e589c#diff-4bacfdbf0e63a4d5f3deb1a0d39037a18510ac24ee5ec276fe70bc818ba4d209L505)
>  added a new string to `stream`, so the test case fails due to the extra output.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16837) Kafka Connect fails on update connector for incorrect previous Config Provider tasks

2024-06-04 Thread Chris Egerton (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Egerton resolved KAFKA-16837.
---
Fix Version/s: 3.9.0
   Resolution: Fixed

> Kafka Connect fails on update connector for incorrect previous Config 
> Provider tasks
> 
>
> Key: KAFKA-16837
> URL: https://issues.apache.org/jira/browse/KAFKA-16837
> Project: Kafka
>  Issue Type: Bug
>  Components: connect
>Affects Versions: 3.5.1, 3.6.1, 3.8.0
>Reporter: Sergey Ivanov
>Assignee: Chris Egerton
>Priority: Major
> Fix For: 3.9.0
>
> Attachments: kafka_connect_config.png
>
>
> Hello,
> We faced an issue where it is not possible to update a connector config if the 
> *previous* task config contains a ConfigProvider value that is now invalid and 
> leads to a ConfigException.
> A simple test case to reproduce it uses FileConfigProvider, but any 
> ConfigProvider that can raise an exception when something is wrong with the 
> config (e.g. the resource doesn't exist) will do.
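> For reference, a hedged sketch of how such a provider surfaces the failure (paths are illustrative; this is not the exact worker code path):
> {code:java}
> import java.util.Map;
> import java.util.Set;
> import org.apache.kafka.common.config.ConfigData;
> import org.apache.kafka.common.config.ConfigException;
> import org.apache.kafka.common.config.provider.FileConfigProvider;
>
> public class FileConfigProviderExample {
>     public static void main(String[] args) {
>         FileConfigProvider provider = new FileConfigProvider();
>         provider.configure(Map.of());
>         try {
>             // Resolves ${file:/opt/kafka/provider.properties:topics};
>             // throws ConfigException if the file can no longer be read.
>             ConfigData data = provider.get("/opt/kafka/provider.properties", Set.of("topics"));
>             System.out.println(data.data().get("topics"));
>         } catch (ConfigException e) {
>             System.err.println("Provider failed: " + e.getMessage());
>         }
>     }
> }
> {code}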
> *Prerequisites:*
> Kafka Connect instance with config providers:
>  
> {code:java}
> config.providers=file
> config.providers.file.class=org.apache.kafka.common.config.provider.FileConfigProvider{code}
>  
> 1. Create Kafka topic "test"
> 2. On the Kafka Connect instance create the file 
> "/opt/kafka/provider.properties" with content
> {code:java}
> topics=test
> {code}
> 3. Create simple FileSink connector:
> {code:java}
> PUT /connectors/local-file-sink/config
> {
>   "connector.class": "FileStreamSink",
>   "tasks.max": "1",
>   "file": "/opt/kafka/test.sink.txt",
>   "topics": "${file:/opt/kafka/provider.properties:topics}"
> }
> {code}
> 4. Check that everything works fine:
> {code:java}
> GET /connectors?expand=info&expand=status
> ...
> "status": {
>   "name": "local-file-sink",
>   "connector": {
> "state": "RUNNING",
> "worker_id": "10.10.10.10:8083"
>   },
>   "tasks": [
> {
>   "id": 0,
>   "state": "RUNNING",
>   "worker_id": "10.10.10.10:8083"
> }
>   ],
>   "type": "sink"
> }
>   }
> }
> {code}
> Looks fine.
> 5. Rename the file to "/opt/kafka/provider2.properties".
> 6. Update the connector with the new, correct file name:
> {code:java}
> PUT /connectors/local-file-sink/config
> {
>   "connector.class": "FileStreamSink",
>   "tasks.max": "1",
>   "file": "/opt/kafka/test.sink.txt",
>   "topics": "${file:/opt/kafka/provider2.properties:topics}"
> }
> {code}
> The update {*}succeeds{*} with a 200 response. 
> 7. Check the status again:
> {code:java}
> {
>   "local-file-sink": {
> "info": {
>   "name": "local-file-sink",
>   "config": {
> "connector.class": "FileStreamSink",
> "file": "/opt/kafka/test.sink.txt",
> "tasks.max": "1",
> "topics": "${file:/opt/kafka/provider2.properties:topics}",
> "name": "local-file-sink"
>   },
>   "tasks": [
> {
>   "connector": "local-file-sink",
>   "task": 0
> }
>   ],
>   "type": "sink"
> },
> "status": {
>   "name": "local-file-sink",
>   "connector": {
> "state": "RUNNING",
> "worker_id": "10.10.10.10:8083"
>   },
>   "tasks": [
> {
>   "id": 0,
>   "state": "FAILED",
>   "worker_id": "10.10.10.10:8083",
>   "trace": "org.apache.kafka.common.errors.InvalidTopicException: 
> Invalid topics: [${file:/opt/kafka/provider.properties:topics}]"
> }
>   ],
>   "type": "sink"
> }
>   }
> }
> {code}
> The config has been updated, but the new task has not been created, and as a 
> result the connector doesn't work.
> It failed on:
> {code:java}
> [2024-05-24T12:08:24.362][ERROR][request_id= ][tenant_id= 
> ][thread=DistributedHerder-connect-1-1][class=org.apache.kafka.connect.runtime.distributed.DistributedHerder][method=lambda$reconfigureConnectorTasksWithExponentialBackoffRetries$44]
>  [Worker clientId=connect-1, groupId=streaming-service_streaming_service] 
> Failed to reconfigure connector's tasks (local-file-sink), retrying after 
> backoff.
> org.apache.kafka.common.config.ConfigException: Could not read properties 
> from file /opt/kafka/provider.properties
>  at 
> org.apache.kafka.common.config.provider.FileConfigProvider.get(FileConfigProvider.java:98)
>  at 
> org.apache.kafka.common.config.ConfigTransformer.transform(ConfigTransformer.java:103)
>  at 
> org.apache.kafka.connect.runtime.WorkerConfigTransformer.transform(WorkerConfigTransformer.java:58)
>  at 
> org.apache.kafka.connect.storage.ClusterConfigState.taskConfig(ClusterConfigState.java:181)
>  at 
> org.apache.kafka.connect.runtime.AbstractHerder.taskConfigsChanged(AbstractHerder.java:804)
>  at 

[jira] [Resolved] (KAFKA-16838) Kafka Connect loads old tasks from removed connectors

2024-06-04 Thread Chris Egerton (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16838?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chris Egerton resolved KAFKA-16838.
---
Fix Version/s: 3.9.0
   Resolution: Fixed

> Kafka Connect loads old tasks from removed connectors
> -
>
> Key: KAFKA-16838
> URL: https://issues.apache.org/jira/browse/KAFKA-16838
> Project: Kafka
>  Issue Type: Bug
>  Components: connect
>Affects Versions: 3.5.1, 3.6.1, 3.8.0
>Reporter: Sergey Ivanov
>Assignee: Chris Egerton
>Priority: Major
> Fix For: 3.9.0
>
>
> Hello,
> When creating a connector we faced an error from one of our ConfigProviders 
> about a non-existing resource, even though we didn't set that resource as a 
> config value:
> {code:java}
> [2024-05-24T12:08:24.362][ERROR][request_id= ][tenant_id= 
> ][thread=DistributedHerder-connect-1-1][class=org.apache.kafka.connect.runtime.distributed.DistributedHerder][method=lambda$reconfigureConnectorTasksWithExponentialBackoffRetries$44]
>  [Worker clientId=connect-1, groupId=streaming-service_streaming_service] 
> Failed to reconfigure connector's tasks (local-file-sink), retrying after 
> backoff.
> org.apache.kafka.common.config.ConfigException: Could not read properties 
> from file /opt/kafka/provider.properties
>  at 
> org.apache.kafka.common.config.provider.FileConfigProvider.get(FileConfigProvider.java:98)
>  at 
> org.apache.kafka.common.config.ConfigTransformer.transform(ConfigTransformer.java:103)
>  at 
> org.apache.kafka.connect.runtime.WorkerConfigTransformer.transform(WorkerConfigTransformer.java:58)
>  at 
> org.apache.kafka.connect.storage.ClusterConfigState.taskConfig(ClusterConfigState.java:181)
>  at 
> org.apache.kafka.connect.runtime.AbstractHerder.taskConfigsChanged(AbstractHerder.java:804)
>  at 
> org.apache.kafka.connect.runtime.distributed.DistributedHerder.publishConnectorTaskConfigs(DistributedHerder.java:2089)
>  at 
> org.apache.kafka.connect.runtime.distributed.DistributedHerder.reconfigureConnector(DistributedHerder.java:2082)
>  at 
> org.apache.kafka.connect.runtime.distributed.DistributedHerder.reconfigureConnectorTasksWithExponentialBackoffRetries(DistributedHerder.java:2025)
>  at 
> org.apache.kafka.connect.runtime.distributed.DistributedHerder.lambda$null$42(DistributedHerder.java:2038)
>  at 
> org.apache.kafka.connect.runtime.distributed.DistributedHerder.runRequest(DistributedHerder.java:2232)
>  at 
> org.apache.kafka.connect.runtime.distributed.DistributedHerder.tick(DistributedHerder.java:470)
>  at 
> org.apache.kafka.connect.runtime.distributed.DistributedHerder.run(DistributedHerder.java:371)
>  at 
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
>  at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>  at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
>  at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
>  at java.base/java.lang.Thread.run(Thread.java:840)
>  {code}
> It looked like there was already a connector with the same name and the same 
> config, +but there wasn't.+
> After investigation we found out that a few months ago there was a connector 
> with the same name on that cloud, with a different config provider value. It 
> was then removed, but for some reason, when we tried to create a connector 
> with the same name months later, AbstractHerder tried to update tasks from our 
> previous connector.
> As an example I used FileConfigProvider, but any ConfigProvider that can raise 
> an exception when something is wrong with the config (e.g. the resource 
> doesn't exist) will do.
> We continued our investigation and found 
> https://issues.apache.org/jira/browse/KAFKA-7745, which says Connect doesn't 
> send tombstone messages for *commit* and *task* records in the Kafka Connect 
> config topic. Since the config topic is compacted, *that means commit and task 
> records are kept forever* (months or years after the connector is removed), 
> while tombstones for connector records are cleaned up according to 
> {{delete.retention.ms}}. That impacts later connector creations with the same 
> name.
> We didn't investigate the reasons in ConfigClusterStore or how to avoid the 
> issue, because we would {+}like to ask{+}: wouldn't it be better to fix 
> KAFKA-7745 and send tombstones for commit and task messages, as Connect does 
> for connector and target-state messages?
> The general test case looks like:
>  # Create a connector with a config provider pointing to resource1
>  # Remove the connector
>  # Remove resource1
>  # Wait 2-4 weeks :) (until the config topic is compacted and the tombstone 
> messages for the connector config and target state are removed)
>  # Try to create connector with the same name and config provider to reso

Re: [DISCUSS] Apache Kafka 3.7.1 release

2024-06-04 Thread Igor Soarez
Hi Justine,

I'm sorry this release is delayed. A few new blockers have come up and we're 
working through them.

Here's the release plan: 
https://cwiki.apache.org/confluence/display/KAFKA/Release+plan+3.7.1

Best,

--
Igor


[jira] [Resolved] (KAFKA-16886) KRaft partition reassignment failed after upgrade to 3.7.0

2024-06-04 Thread Igor Soarez (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Soarez resolved KAFKA-16886.
-
Resolution: Fixed

> KRaft partition reassignment failed after upgrade to 3.7.0 
> ---
>
> Key: KAFKA-16886
> URL: https://issues.apache.org/jira/browse/KAFKA-16886
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.7.0
>Reporter: Luke Chen
>Assignee: Igor Soarez
>Priority: Blocker
> Fix For: 3.8.0, 3.7.1
>
>
> Before upgrade, the topic image doesn't have dirID for the assignment. After 
> upgrade, the assignment has the dirID. So in the 
> {{{}ReplicaManager#applyDelta{}}}, we'll have directoryId changes in 
> {{{}localChanges{}}}, which will invoke {{AssignmentEvent}} 
> [here|https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/server/ReplicaManager.scala#L2748].
>  With that, we'll get the unexpected {{NOT_LEADER_OR_FOLLOWER}} error.
> Reproduce steps:
>  # Launch a 3.6.0 controller and a 3.6.0 broker(BrokerA) in Kraft mode;
>  # Create a topic with 1 partition;
>  # Upgrade Broker A, B, Controllers to 3.7.0
>  # Upgrade MV to 3.7: ./bin/kafka-features.sh --bootstrap-server 
> localhost:9092 upgrade --metadata 3.7
>  # reassign the step 2 partition to Broker B
>  
> The logs in broker B:
> {code:java}
> [2024-05-31 15:33:25,763] INFO [ReplicaFetcherManager on broker 2] Removed 
> fetcher for partitions Set(t1-0) (kafka.server.ReplicaFetcherManager)
> [2024-05-31 15:33:25,837] INFO [ReplicaFetcherManager on broker 2] Removed 
> fetcher for partitions Set(t1-0) (kafka.server.ReplicaFetcherManager)
> [2024-05-31 15:33:25,837] INFO [ReplicaAlterLogDirsManager on broker 2] 
> Removed fetcher for partitions Set(t1-0) 
> (kafka.server.ReplicaAlterLogDirsManager)
> [2024-05-31 15:33:25,853] INFO Log for partition t1-0 is renamed to 
> /tmp/kraft-broker-logs/t1-0.3e6d8bebc1c04f3186ad6cf63145b6fd-delete and is 
> scheduled for deletion (kafka.log.LogManager)
> [2024-05-31 15:33:26,279] ERROR Controller returned error 
> NOT_LEADER_OR_FOLLOWER for assignment of partition 
> PartitionData(partitionIndex=0, errorCode=6) into directory 
> oULBCf49aiRXaWJpO3I-GA (org.apache.kafka.server.AssignmentsManager)
> [2024-05-31 15:33:26,280] WARN Re-queueing assignments: 
> [Assignment\{timestampNs=26022187148625, partition=t1:0, 
> dir=/tmp/kraft-broker-logs, reason='Applying metadata delta'}] 
> (org.apache.kafka.server.AssignmentsManager)
> [2024-05-31 15:33:26,786] ERROR Controller returned error 
> NOT_LEADER_OR_FOLLOWER for assignment of partition 
> PartitionData(partitionIndex=0, errorCode=6) into directory 
> oULBCf49aiRXaWJpO3I-GA (org.apache.kafka.server.AssignmentsManager)
> [2024-05-31 15:33:27,296] WARN Re-queueing assignments: 
> [Assignment\{timestampNs=26022187148625, partition=t1:0, 
> dir=/tmp/kraft-broker-logs, reason='Applying metadata delta'}] 
> (org.apache.kafka.server.AssignmentsManager)
> ...
> {code}
>  
> Logs in controller:
> {code:java}
> [2024-05-31 15:33:25,727] INFO [QuorumController id=1] Successfully altered 1 
> out of 1 partition reassignment(s). 
> (org.apache.kafka.controller.ReplicationControlManager)
> [2024-05-31 15:33:25,727] INFO [QuorumController id=1] Replayed partition 
> assignment change PartitionChangeRecord(partitionId=0, 
> topicId=tMiJOQznTLKtOZ8rLqdgqw, isr=null, leader=-2, replicas=[6, 2], 
> removingReplicas=[2], addingReplicas=[6], leaderRecoveryState=-1, 
> directories=[RuDIAGGJrTG2NU6tEOkbHw, AA], 
> eligibleLeaderReplicas=null, lastKnownElr=null) for topic t1 
> (org.apache.kafka.controller.ReplicationControlManager)
> [2024-05-31 15:33:25,802] INFO [QuorumController id=1] AlterPartition request 
> from node 2 for t1-0 completed the ongoing partition reassignment and 
> triggered a leadership change. Returning NEW_LEADER_ELECTED. 
> (org.apache.kafka.controller.ReplicationControlManager)
> [2024-05-31 15:33:25,802] INFO [QuorumController id=1] UNCLEAN partition 
> change for t1-0 with topic ID tMiJOQznTLKtOZ8rLqdgqw: replicas: [6, 2] -> 
> [6], directories: [RuDIAGGJrTG2NU6tEOkbHw, AA] -> 
> [RuDIAGGJrTG2NU6tEOkbHw], isr: [2] -> [6], removingReplicas: [2] -> [], 
> addingReplicas: [6] -> [], leader: 2 -> 6, leaderEpoch: 3 -> 4, 
> partitionEpoch: 5 -> 6 (org.apache.kafka.controller.ReplicationControlManager)
> [2024-05-31 15:33:25,802] INFO [QuorumController id=1] Replayed partition 
> assignment change PartitionChangeRecord(partitionId=0, 
> topicId=tMiJOQznTLKtOZ8rLqdgqw, isr=[6], leader=6, replicas=[6], 
> removingReplicas=[], addingReplicas=[], leaderRecoveryState=-1, 
> directories=[RuDIAGGJrTG2NU6tEOkbHw], eligibleLeaderReplicas=null, 
> lastKnownElr=null) for topic t1 
> (org.apache.kafka.controller.ReplicationControlManager)
> [2024-

[jira] [Reopened] (KAFKA-16886) KRaft partition reassignment failed after upgrade to 3.7.0

2024-06-04 Thread Igor Soarez (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Soarez reopened KAFKA-16886:
-

> KRaft partition reassignment failed after upgrade to 3.7.0 
> ---
>
> Key: KAFKA-16886
> URL: https://issues.apache.org/jira/browse/KAFKA-16886
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.7.0
>Reporter: Luke Chen
>Assignee: Igor Soarez
>Priority: Blocker
> Fix For: 3.8.0, 3.7.1
>
>
> Before upgrade, the topic image doesn't have dirID for the assignment. After 
> upgrade, the assignment has the dirID. So in the 
> {{{}ReplicaManager#applyDelta{}}}, we'll have directoryId changes in 
> {{{}localChanges{}}}, which will invoke {{AssignmentEvent}} 
> [here|https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/server/ReplicaManager.scala#L2748].
>  With that, we'll get the unexpected {{NOT_LEADER_OR_FOLLOWER}} error.
> Reproduce steps:
>  # Launch a 3.6.0 controller and a 3.6.0 broker(BrokerA) in Kraft mode;
>  # Create a topic with 1 partition;
>  # Upgrade Broker A, B, Controllers to 3.7.0
>  # Upgrade MV to 3.7: ./bin/kafka-features.sh --bootstrap-server 
> localhost:9092 upgrade --metadata 3.7
>  # reassign the step 2 partition to Broker B
>  
> The logs in broker B:
> {code:java}
> [2024-05-31 15:33:25,763] INFO [ReplicaFetcherManager on broker 2] Removed 
> fetcher for partitions Set(t1-0) (kafka.server.ReplicaFetcherManager)
> [2024-05-31 15:33:25,837] INFO [ReplicaFetcherManager on broker 2] Removed 
> fetcher for partitions Set(t1-0) (kafka.server.ReplicaFetcherManager)
> [2024-05-31 15:33:25,837] INFO [ReplicaAlterLogDirsManager on broker 2] 
> Removed fetcher for partitions Set(t1-0) 
> (kafka.server.ReplicaAlterLogDirsManager)
> [2024-05-31 15:33:25,853] INFO Log for partition t1-0 is renamed to 
> /tmp/kraft-broker-logs/t1-0.3e6d8bebc1c04f3186ad6cf63145b6fd-delete and is 
> scheduled for deletion (kafka.log.LogManager)
> [2024-05-31 15:33:26,279] ERROR Controller returned error 
> NOT_LEADER_OR_FOLLOWER for assignment of partition 
> PartitionData(partitionIndex=0, errorCode=6) into directory 
> oULBCf49aiRXaWJpO3I-GA (org.apache.kafka.server.AssignmentsManager)
> [2024-05-31 15:33:26,280] WARN Re-queueing assignments: 
> [Assignment\{timestampNs=26022187148625, partition=t1:0, 
> dir=/tmp/kraft-broker-logs, reason='Applying metadata delta'}] 
> (org.apache.kafka.server.AssignmentsManager)
> [2024-05-31 15:33:26,786] ERROR Controller returned error 
> NOT_LEADER_OR_FOLLOWER for assignment of partition 
> PartitionData(partitionIndex=0, errorCode=6) into directory 
> oULBCf49aiRXaWJpO3I-GA (org.apache.kafka.server.AssignmentsManager)
> [2024-05-31 15:33:27,296] WARN Re-queueing assignments: 
> [Assignment\{timestampNs=26022187148625, partition=t1:0, 
> dir=/tmp/kraft-broker-logs, reason='Applying metadata delta'}] 
> (org.apache.kafka.server.AssignmentsManager)
> ...
> {code}
>  
> Logs in controller:
> {code:java}
> [2024-05-31 15:33:25,727] INFO [QuorumController id=1] Successfully altered 1 
> out of 1 partition reassignment(s). 
> (org.apache.kafka.controller.ReplicationControlManager)
> [2024-05-31 15:33:25,727] INFO [QuorumController id=1] Replayed partition 
> assignment change PartitionChangeRecord(partitionId=0, 
> topicId=tMiJOQznTLKtOZ8rLqdgqw, isr=null, leader=-2, replicas=[6, 2], 
> removingReplicas=[2], addingReplicas=[6], leaderRecoveryState=-1, 
> directories=[RuDIAGGJrTG2NU6tEOkbHw, AA], 
> eligibleLeaderReplicas=null, lastKnownElr=null) for topic t1 
> (org.apache.kafka.controller.ReplicationControlManager)
> [2024-05-31 15:33:25,802] INFO [QuorumController id=1] AlterPartition request 
> from node 2 for t1-0 completed the ongoing partition reassignment and 
> triggered a leadership change. Returning NEW_LEADER_ELECTED. 
> (org.apache.kafka.controller.ReplicationControlManager)
> [2024-05-31 15:33:25,802] INFO [QuorumController id=1] UNCLEAN partition 
> change for t1-0 with topic ID tMiJOQznTLKtOZ8rLqdgqw: replicas: [6, 2] -> 
> [6], directories: [RuDIAGGJrTG2NU6tEOkbHw, AA] -> 
> [RuDIAGGJrTG2NU6tEOkbHw], isr: [2] -> [6], removingReplicas: [2] -> [], 
> addingReplicas: [6] -> [], leader: 2 -> 6, leaderEpoch: 3 -> 4, 
> partitionEpoch: 5 -> 6 (org.apache.kafka.controller.ReplicationControlManager)
> [2024-05-31 15:33:25,802] INFO [QuorumController id=1] Replayed partition 
> assignment change PartitionChangeRecord(partitionId=0, 
> topicId=tMiJOQznTLKtOZ8rLqdgqw, isr=[6], leader=6, replicas=[6], 
> removingReplicas=[], addingReplicas=[], leaderRecoveryState=-1, 
> directories=[RuDIAGGJrTG2NU6tEOkbHw], eligibleLeaderReplicas=null, 
> lastKnownElr=null) for topic t1 
> (org.apache.kafka.controller.ReplicationControlManager)
> [2024-05-31 15:33:26,277] WA

[jira] [Resolved] (KAFKA-16583) Update from 3.4.0 to 3.7.0 image write failed in Kraft mode

2024-06-04 Thread Igor Soarez (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Igor Soarez resolved KAFKA-16583.
-
Resolution: Fixed

> Update from 3.4.0 to 3.7.0 image write failed in Kraft mode
> ---
>
> Key: KAFKA-16583
> URL: https://issues.apache.org/jira/browse/KAFKA-16583
> Project: Kafka
>  Issue Type: Bug
>  Components: kraft
>Affects Versions: 3.7.0
>Reporter: HanXu
>Assignee: HanXu
>Priority: Blocker
> Fix For: 3.8.0, 3.7.1
>
>   Original Estimate: 6h
>  Remaining Estimate: 6h
>
> How to reproduce:
> 1. Launch a 3.4.0 controller and a 3.4.0 broker(BrokerA) in Kraft mode;
> 2. Create a topic with 1 partition;
> 3. Launch a 3.4.0 broker(Broker B) in Kraft mode and reassign the step 2 
> partition to Broker B;
> 4. Upgrade Broker B to 3.7.0;
> Broker B will keep logging the following error:
> {code:java}
> [2024-04-18 14:46:54,144] ERROR Encountered metadata loading fault: Unhandled 
> error initializing new publishers 
> (org.apache.kafka.server.fault.LoggingFaultHandler)
> org.apache.kafka.image.writer.UnwritableMetadataException: Metadata has been 
> lost because the following could not be represented in metadata version 
> 3.4-IV0: the directory assignment state of one or more replicas
>   at 
> org.apache.kafka.image.writer.ImageWriterOptions.handleLoss(ImageWriterOptions.java:94)
>   at 
> org.apache.kafka.metadata.PartitionRegistration.toRecord(PartitionRegistration.java:391)
>   at org.apache.kafka.image.TopicImage.write(TopicImage.java:71)
>   at org.apache.kafka.image.TopicsImage.write(TopicsImage.java:84)
>   at org.apache.kafka.image.MetadataImage.write(MetadataImage.java:155)
>   at 
> org.apache.kafka.image.loader.MetadataLoader.initializeNewPublishers(MetadataLoader.java:295)
>   at 
> org.apache.kafka.image.loader.MetadataLoader.lambda$scheduleInitializeNewPublishers$0(MetadataLoader.java:266)
>   at 
> org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:127)
>   at 
> org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:210)
>   at 
> org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:181)
>   at java.base/java.lang.Thread.run(Thread.java:840)
> {code}
> Bug:
>  - When reassigning a partition, PartitionRegistration#merge sets the new 
> replicas with the UNASSIGNED directory;
>  - But in metadata version 3.4-IV0, PartitionRegistration#toRecord only allows 
> the MIGRATING directory:
> {code:java}
> if (options.metadataVersion().isDirectoryAssignmentSupported()) {
> record.setDirectories(Uuid.toList(directories));
> } else {
> for (Uuid directory : directories) {
> if (!DirectoryId.MIGRATING.equals(directory)) {
> options.handleLoss("the directory assignment state of one 
> or more replicas");
> break;
> }
> }
> }
> {code}
> Solution:
> - PartitionRegistration#toRecord should allow both MIGRATING and UNASSIGNED 
> directories.
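> A hedged sketch of that check, extending only the snippet quoted above (not necessarily the actual patch):
> {code:java}
> if (options.metadataVersion().isDirectoryAssignmentSupported()) {
>     record.setDirectories(Uuid.toList(directories));
> } else {
>     for (Uuid directory : directories) {
>         // Tolerate UNASSIGNED as well, since reassignment under older metadata
>         // versions leaves UNASSIGNED directories via PartitionRegistration#merge.
>         if (!DirectoryId.MIGRATING.equals(directory) && !DirectoryId.UNASSIGNED.equals(directory)) {
>             options.handleLoss("the directory assignment state of one or more replicas");
>             break;
>         }
>     }
> }
> {code}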



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16889) Kafka pods are not gracefully terminated

2024-06-04 Thread Sreelekshmi Geetha (Jira)
Sreelekshmi Geetha created KAFKA-16889:
--

 Summary: Kafka pods are not gracefully terminated
 Key: KAFKA-16889
 URL: https://issues.apache.org/jira/browse/KAFKA-16889
 Project: Kafka
  Issue Type: Bug
Affects Versions: 3.5.1
Reporter: Sreelekshmi Geetha


The SIGTERM signal is not properly handled by the Kafka container, and the pod is 
forcibly killed by Kubernetes with exit code 137 (128 + 9, i.e. SIGKILL).

When we run kill -15 1 in the Kafka container, it exits with code 137, but a 
clean SIGTERM shutdown should exit with 143 (128 + 15).



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] Apache Kafka 3.7.1 release

2024-06-04 Thread Justine Olshan
Hi Igor,

No worries! I'm glad things are getting fixed! That's most important!
And thanks for the release plan!

Justine

On Tue, Jun 4, 2024 at 7:01 AM Igor Soarez  wrote:

> Hi Justine,
>
> I'm sorry this release is delayed. A few new blockers have come up and
> we're working through them.
>
> Here's the release plan:
> https://cwiki.apache.org/confluence/display/KAFKA/Release+plan+3.7.1
>
> Best,
>
> --
> Igor
>


Build failed in Jenkins: Kafka » Kafka Branch Builder » trunk #2969

2024-06-04 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 389021 lines...]
[2024-06-04T14:45:36.059Z] Gradle Test Run :core:test > Gradle Test Executor 97 
> ZooKeeperClientTest > testZNodeChangeHandlerForDataChange() PASSED
[2024-06-04T14:45:36.059Z] 
[2024-06-04T14:45:36.059Z] Gradle Test Run :core:test > Gradle Test Executor 97 
> ZooKeeperClientTest > testZooKeeperSessionStateMetric() STARTED
[2024-06-04T14:45:36.059Z] 
[2024-06-04T14:45:36.059Z] Gradle Test Run :core:test > Gradle Test Executor 97 
> ZooKeeperClientTest > testZooKeeperSessionStateMetric() PASSED
[2024-06-04T14:45:36.059Z] 
[2024-06-04T14:45:36.059Z] Gradle Test Run :core:test > Gradle Test Executor 97 
> ZooKeeperClientTest > testExceptionInBeforeInitializingSession() STARTED
[2024-06-04T14:45:36.059Z] 
[2024-06-04T14:45:36.059Z] Gradle Test Run :core:test > Gradle Test Executor 97 
> ZooKeeperClientTest > testExceptionInBeforeInitializingSession() PASSED
[2024-06-04T14:45:36.059Z] 
[2024-06-04T14:45:36.059Z] Gradle Test Run :core:test > Gradle Test Executor 97 
> ZooKeeperClientTest > testGetChildrenExistingZNode() STARTED
[2024-06-04T14:45:37.981Z] 
[2024-06-04T14:45:37.981Z] Gradle Test Run :core:test > Gradle Test Executor 97 
> ZooKeeperClientTest > testGetChildrenExistingZNode() PASSED
[2024-06-04T14:45:37.981Z] 
[2024-06-04T14:45:37.981Z] Gradle Test Run :core:test > Gradle Test Executor 97 
> ZooKeeperClientTest > testConnection() STARTED
[2024-06-04T14:45:37.981Z] 
[2024-06-04T14:45:37.981Z] Gradle Test Run :core:test > Gradle Test Executor 97 
> ZooKeeperClientTest > testConnection() PASSED
[2024-06-04T14:45:37.981Z] 
[2024-06-04T14:45:37.981Z] Gradle Test Run :core:test > Gradle Test Executor 97 
> ZooKeeperClientTest > testZNodeChangeHandlerForCreation() STARTED
[2024-06-04T14:45:37.981Z] 
[2024-06-04T14:45:37.981Z] Gradle Test Run :core:test > Gradle Test Executor 97 
> ZooKeeperClientTest > testZNodeChangeHandlerForCreation() PASSED
[2024-06-04T14:45:37.981Z] 
[2024-06-04T14:45:37.981Z] Gradle Test Run :core:test > Gradle Test Executor 97 
> ZooKeeperClientTest > testGetAclExistingZNode() STARTED
[2024-06-04T14:45:37.981Z] 
[2024-06-04T14:45:37.981Z] Gradle Test Run :core:test > Gradle Test Executor 97 
> ZooKeeperClientTest > testGetAclExistingZNode() PASSED
[2024-06-04T14:45:37.981Z] 
[2024-06-04T14:45:37.981Z] Gradle Test Run :core:test > Gradle Test Executor 97 
> ZooKeeperClientTest > testSessionExpiryDuringClose() STARTED
[2024-06-04T14:45:37.981Z] 
[2024-06-04T14:45:37.981Z] Gradle Test Run :core:test > Gradle Test Executor 97 
> ZooKeeperClientTest > testSessionExpiryDuringClose() PASSED
[2024-06-04T14:45:37.981Z] 
[2024-06-04T14:45:37.981Z] Gradle Test Run :core:test > Gradle Test Executor 97 
> ZooKeeperClientTest > testReinitializeAfterAuthFailure() STARTED
[2024-06-04T14:45:41.560Z] 
[2024-06-04T14:45:41.560Z] Gradle Test Run :core:test > Gradle Test Executor 97 
> ZooKeeperClientTest > testReinitializeAfterAuthFailure() PASSED
[2024-06-04T14:45:41.560Z] 
[2024-06-04T14:45:41.560Z] Gradle Test Run :core:test > Gradle Test Executor 97 
> ZooKeeperClientTest > testSetAclNonExistentZNode() STARTED
[2024-06-04T14:45:41.560Z] 
[2024-06-04T14:45:41.560Z] Gradle Test Run :core:test > Gradle Test Executor 97 
> ZooKeeperClientTest > testSetAclNonExistentZNode() PASSED
[2024-06-04T14:45:41.560Z] 
[2024-06-04T14:45:41.560Z] Gradle Test Run :core:test > Gradle Test Executor 97 
> ZooKeeperClientTest > testConnectionLossRequestTermination() STARTED
[2024-06-04T14:45:50.947Z] 
[2024-06-04T14:45:50.947Z] Gradle Test Run :core:test > Gradle Test Executor 97 
> ZooKeeperClientTest > testConnectionLossRequestTermination() PASSED
[2024-06-04T14:45:50.947Z] 
[2024-06-04T14:45:50.947Z] Gradle Test Run :core:test > Gradle Test Executor 97 
> ZooKeeperClientTest > testExistsNonExistentZNode() STARTED
[2024-06-04T14:45:50.947Z] 
[2024-06-04T14:45:50.947Z] Gradle Test Run :core:test > Gradle Test Executor 97 
> ZooKeeperClientTest > testExistsNonExistentZNode() PASSED
[2024-06-04T14:45:50.947Z] 
[2024-06-04T14:45:50.947Z] Gradle Test Run :core:test > Gradle Test Executor 97 
> ZooKeeperClientTest > testGetDataNonExistentZNode() STARTED
[2024-06-04T14:45:50.947Z] 
[2024-06-04T14:45:50.947Z] Gradle Test Run :core:test > Gradle Test Executor 97 
> ZooKeeperClientTest > testGetDataNonExistentZNode() PASSED
[2024-06-04T14:45:50.947Z] 
[2024-06-04T14:45:50.947Z] Gradle Test Run :core:test > Gradle Test Executor 97 
> ZooKeeperClientTest > testConnectionTimeout() STARTED
[2024-06-04T14:45:52.800Z] 
[2024-06-04T14:45:52.800Z] Gradle Test Run :core:test > Gradle Test Executor 97 
> ZooKeeperClientTest > testConnectionTimeout() PASSED
[2024-06-04T14:45:52.800Z] 
[2024-06-04T14:45:52.800Z] Gradle Test Run :core:test > Gradle Test Executor 97 
> ZooKeeperClientTest > testBlockOnRequestCompletionFromStateChangeHandler() 
STARTED
[2024-06-04T14:45:5

[DISCUSS] KIP-1043: Administration of groups

2024-06-04 Thread Andrew Schofield
Hi,
I would like to start a discussion thread on KIP-1043: Administration of 
groups. This KIP enhances the command-line tools to make it easier to 
administer groups on clusters with a variety of types of groups.

https://cwiki.apache.org/confluence/display/KAFKA/KIP-1043%3A+Administration+of+groups

Thanks.
Andrew

[jira] [Created] (KAFKA-16890) Failing to build aux state on broker failover

2024-06-04 Thread Francois Visconte (Jira)
Francois Visconte created KAFKA-16890:
-

 Summary: Failing to build aux state on broker failover
 Key: KAFKA-16890
 URL: https://issues.apache.org/jira/browse/KAFKA-16890
 Project: Kafka
  Issue Type: Bug
  Components: Tiered-Storage
Affects Versions: 3.7.0, 3.7.1
Reporter: Francois Visconte


We have clusters where we replace machines often, and they fall into a state where 
we keep getting "Error building remote log auxiliary state for loadtest_topic-22" 
and the partition stays under-replicated until the leader is manually restarted.

Looking into a specific case, here is what we observed in the 
__remote_log_metadata topic:


{code:java}
 
partition: 29, offset: 183593, value: 
RemoteLogSegmentMetadata{remoteLogSegmentId=RemoteLogSegmentId{topicIdPartition=ClnIeN0MQsi_d4FAOFKaDA:loadtest_topic-22,
 id=GZeRTXLMSNe2BQjRXkg6hQ}, startOffset=10823, endOffset=11536, 
brokerId=10013, maxTimestampMs=1715774588597, eventTimestampMs=1715781657604, 
segmentLeaderEpochs={125=10823, 126=10968, 128=11047, 130=11048, 131=11324, 
133=11442, 134=11443, 135=11445, 136=11521, 137=11533, 139=11535}, 
segmentSizeInBytes=704895, customMetadata=Optional.empty, 
state=COPY_SEGMENT_STARTED}
partition: 29, offset: 183594, value: 
RemoteLogSegmentMetadataUpdate{remoteLogSegmentId=RemoteLogSegmentId{topicIdPartition=ClnIeN0MQsi_d4FAOFKaDA:loadtest_topic-22,
 id=GZeRTXLMSNe2BQjRXkg6hQ}, customMetadata=Optional.empty, 
state=COPY_SEGMENT_FINISHED, eventTimestampMs=1715781658183, brokerId=10013}
partition: 29, offset: 183669, value: 
RemoteLogSegmentMetadata{remoteLogSegmentId=RemoteLogSegmentId{topicIdPartition=ClnIeN0MQsi_d4FAOFKaDA:loadtest_topic-22,
 id=L1TYzx0lQkagRIF86Kp0QQ}, startOffset=10823, endOffset=11544, 
brokerId=10008, maxTimestampMs=1715781445270, eventTimestampMs=1715782717593, 
segmentLeaderEpochs={125=10823, 126=10968, 128=11047, 130=11048, 131=11324, 
133=11442, 134=11443, 135=11445, 136=11521, 137=11533, 139=11535, 140=11537, 
142=11543}, segmentSizeInBytes=713088, customMetadata=Optional.empty, 
state=COPY_SEGMENT_STARTED}
partition: 29, offset: 183670, value: 
RemoteLogSegmentMetadataUpdate{remoteLogSegmentId=RemoteLogSegmentId{topicIdPartition=ClnIeN0MQsi_d4FAOFKaDA:loadtest_topic-22,
 id=L1TYzx0lQkagRIF86Kp0QQ}, customMetadata=Optional.empty, 
state=COPY_SEGMENT_FINISHED, eventTimestampMs=1715782718370, brokerId=10008}
partition: 29, offset: 186215, value: 
RemoteLogSegmentMetadataUpdate{remoteLogSegmentId=RemoteLogSegmentId{topicIdPartition=ClnIeN0MQsi_d4FAOFKaDA:loadtest_topic-22,
 id=L1TYzx0lQkagRIF86Kp0QQ}, customMetadata=Optional.empty, 
state=DELETE_SEGMENT_STARTED, eventTimestampMs=1715867874617, brokerId=10008}
partition: 29, offset: 186216, value: 
RemoteLogSegmentMetadataUpdate{remoteLogSegmentId=RemoteLogSegmentId{topicIdPartition=ClnIeN0MQsi_d4FAOFKaDA:loadtest_topic-22,
 id=L1TYzx0lQkagRIF86Kp0QQ}, customMetadata=Optional.empty, 
state=DELETE_SEGMENT_FINISHED, eventTimestampMs=1715867874725, brokerId=10008}
partition: 29, offset: 186217, value: 
RemoteLogSegmentMetadataUpdate{remoteLogSegmentId=RemoteLogSegmentId{topicIdPartition=ClnIeN0MQsi_d4FAOFKaDA:loadtest_topic-22,
 id=GZeRTXLMSNe2BQjRXkg6hQ}, customMetadata=Optional.empty, 
state=DELETE_SEGMENT_STARTED, eventTimestampMs=1715867874729, brokerId=10008}
partition: 29, offset: 186218, value: 
RemoteLogSegmentMetadataUpdate{remoteLogSegmentId=RemoteLogSegmentId{topicIdPartition=ClnIeN0MQsi_d4FAOFKaDA:loadtest_topic-22,
 id=GZeRTXLMSNe2BQjRXkg6hQ}, customMetadata=Optional.empty, 
state=DELETE_SEGMENT_FINISHED, eventTimestampMs=1715867874817, brokerId=10008}
{code}
 

It seems that at the time the leader (10013) is restarted, a second copy of the 
same segment is tiered by the new leader (10008). Interestingly, the segment 
doesn't have the same end offset, which is concerning.

Then the follower sees the following error repeatedly until the leader is 
restarted:



 
{code:java}
[2024-05-17 20:46:42,133] DEBUG [ReplicaFetcher replicaId=10013, 
leaderId=10008, fetcherId=0] Handling errors in processFetchRequest for 
partitions HashSet(loadtest_topic-22) (kafka.server.ReplicaFetcherThread)
[2024-05-17 20:46:43,174] DEBUG [ReplicaFetcher replicaId=10013, 
leaderId=10008, fetcherId=0] Received error OFFSET_MOVED_TO_TIERED_STORAGE, at 
fetch offset: 11537, topic-partition: loadtest_topic-22 
(kafka.server.ReplicaFetcherThread)
[2024-05-17 20:46:43,175] ERROR [ReplicaFetcher replicaId=10013, 
leaderId=10008, fetcherId=0] Error building remote log auxiliary state for 
loadtest_topic-22 (kafka.server.ReplicaFetcherThread)
org.apache.kafka.server.log.remote.storage.RemoteStorageException: Couldn't 
build the state from remote store for partition: loadtest_topic-22, 
currentLeaderEpoch: 153, leaderLocalLogStartOffset: 11545, 
leaderLogStartOffset: 11537, epoch: 142as the previous remote log segment 
metadata was not found
{code}
The fo

[jira] [Created] (KAFKA-16891) KIP-1043: Administration of groups

2024-06-04 Thread Andrew Schofield (Jira)
Andrew Schofield created KAFKA-16891:


 Summary: KIP-1043: Administration of groups
 Key: KAFKA-16891
 URL: https://issues.apache.org/jira/browse/KAFKA-16891
 Project: Kafka
  Issue Type: New Feature
Reporter: Andrew Schofield


This issue tracks the development of KIP-1043: 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-1043%3A+Administration+of+groups.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] KIP-924: customizable task assignment for Streams

2024-06-04 Thread Matthias J. Sax

Thanks for the update Sophie!

On 5/31/24 3:41 PM, Sophie Blee-Goldman wrote:

Hi all! Coming in with hopefully the last set of minor updates. During
implementation of the new out-of-the-box assignors and the
TaskAssignmentUtils which make heavy use of the new API, we determined a
few quality-of-life improvements were needed to make some of the new
classes/methods a bit easier to use and assign tasks with. These are the
changes:

1. The return type of the ApplicationState#allTasks method was changed from
a Set to a Map

2. Similarly, we changed the return type of KafkaStreamsAssignment#tasks
from a Set to a Map

3. Added two new methods to KafkaStreamsAssignment to allow in-place
modification of the assigned tasks. While not strictly necessary, this
makes it possible to perform iterative assignment strategies and
post-assignment optimizations like the rack-aware algorithm without a huge
amount of code complexity and overhead from converting between mutable
Collections and the immutable KafkaStreamsAssignment. The two new methods
are #assignTask and #removeTask. You can find the full method signatures in
the KIP.

4. We realized that two of the rack-aware assignment configs -- trafficCost
and nonOverlapCost -- were exposed as an int but have a default value of
null, which would result in an NPE. We have changed these to be OptionalInt

5. Since the KafkaStreamsAssignment class holds tasks in a map keyed by
TaskId, it's actually not possible to create an assignment that hits the
error code  ACTIVE_AND_STANDBY_TASK_ASSIGNED_TO_SAME_KAFKASTREAMS. We
removed this error code accordingly.
Note that an IllegalStateException will still be thrown if a user does
attempt this, so the error will still be surfaced to the user. It's just
that this will happen immediately rather than after the fact (which if
anything is an improvement since the stacktrace will show them exactly
where the bug in their assignment code occurred).

And that's it! Thanks all
Sophie

On Tue, May 28, 2024 at 1:36 PM Sophie Blee-Goldman 
wrote:


Ah, one more very small thing:

3. We changed the name of a KafkaStreamsAssignment method from #assignment
to just #tasks. The new signature is

  public Set tasks();

The reason for this is that the term "assignment" is used a lot already,
and if we call the object itself an "assignment" then we should refer to
the specific tasks that make up this assignment as just the "tasks"

Also, with the original name, this is a valid but very silly sounding
method call chain: TaskAssignment.assignment().get(0).assignment() (like
I said, too much "assignment" in the mix)

On Tue, May 28, 2024 at 1:13 PM Sophie Blee-Goldman 
wrote:


Hey all,

Two more quick updates to the KIP, please let me know if you have any
questions or feedback or naming suggestions:

1. We'd like to introduce an additional error code with the following
signature:
  * MISSING_PROCESS_ID: A ProcessId present in the input ApplicationState
was not present in the output TaskAssignment

2. While implementing the new TaskInfo class, specifically the
#sourceTopicPartitions and #changelogTopicPartitions APIs, we realized that
the source topic changelog optimization would create some overlap between
these two sets, which might be confusing for users as the API seems to
suggest these are disjoint sets. To make this distinction more clear, we
would like to introduce another small container class called the
TaskTopicPartition, which just contains metadata about how a TopicPartition
relates to a given task, such as whether it is a source topic and whether
it is a changelog topic. The TaskInfo API will then be simplified by
removing the separate #inputTopicPartitions, #changelogTopicPartitions, and
#partitionToRackIds methods, and replacing these with a single method:

Set<TaskTopicPartition> topicPartitions();

Please refer to the updated KIP for the complete definition of the new
TaskTopicPartition class


Thanks!
Sophie


On Wed, May 15, 2024 at 3:41 PM Sophie Blee-Goldman <
sop...@responsive.dev> wrote:


Thanks Bruno!

First, need to make one quick fix to what I said in the previous email
-- the new rackId() getter will be added to KafkaStreamsState, not
KafkaStreamsApplication (The KIP is correct, but what I put in the email
was not)

U1. I would actually prefer to keep the constructors as is, for reasons
I realize I forgot to mention. Let me know if this makes sense to you or
you would still prefer to break up the constructors anyways:

The KafkaStreamsApplication class has two required parameters and one
optional one. The required params are of course the processId and
assignment so imo it would not make sense to break these up across two
different constructors, since both have to be supplied. The
followupRebalanceDeadline on the other hand is purely optional, which is
why that one is in a separate, non-static constructor

Re: for vs of, unfortunately for is a reserved keyword in Java. I'm
open to other naming suggestions though. I actually personally prefer the
mor

Re: [DISCUSS] KIP-1035: StateStore managed changelog offsets

2024-06-04 Thread Matthias J. Sax

Nick,

Thanks a lot for updating the KIP. I made a pass over it. Overall LGTM. 
A few nits and some more minor questions:




200: nit (Javadocs for `StateStore.managesOffsets()`):


This is highly
recommended, if possible, to ensure that custom StateStores provide the 
consistency guarantees that Kafka Streams
expects when operating under the {@code exactly-once} {@code processing.mode}.


Given that we make it mandatory, we should rephrase this: "highly 
recommended" does not seem to be strong enough wording.




201: Javadocs for `StateStore.commit(final Map 
changelogOffsets)`:



Implementations SHOULD ensure that {@code changelogOffsets} are 
committed to disk atomically with the
records they represent, if possible.


Not sure if I can follow? Why "should ensure", but not "must ensure"?



202: New metrics:

`commit-rate` -> Description says "The number of calls to..." -- Should 
be "The number of calls per second to..."?


`commit-latency-[]` -> Description says "The [] time taken to" -- Should 
be "The [] time in nanoseconds taken to..."? (or milliseconds in case we 
report in millis?)




203: Section "Consumer Rebalance Metadata"

We will then cache these offsets in-memory and close() these stores. 


I think we should not pro-actively close the store, but keep them open, 
until we get tasks assigned. For assigned tasks, we don't need to 
re-open the store, what provides a nice optimization. For other stores, 
we could close them at this point as there is no need to keep them open. 
-- However, this might all be internal implementation details and maybe 
we don't need to specify this on the KIP at all (might be best to just 
not say anything about this part)?




203: "managesOffsets deprecation"


 to allow for its removal in the next major release of Kafka Streams


We don't release Kafka Streams, but Kafka :) -- Also, it's not 
necessarily the next major release, as we have a one year / 3 releases 
guarantee to keep deprecated APIs and we don't know when the next major 
release will happen. Let's just rephrase this in a more generic 
way: "for its removal in a future [major] release" or something like this.




204: "Downgrade":


by default the on-disk state for any Task containing a RocksDBStore will be 
wiped and restored from their changelogs.


This seems not to be correct? For this case, won't Kafka Streams just 
crash? And a manual store deletion would be required?




205: How do we intend to implement offset management for segmented 
stores? Are we going to add the new offsets metadata to _all_ segments? 
From a structural POV it seems best to add it to all segments, but it 
seems sufficient to keep the information up-to-date only in the latest 
segment (which would imply that we need to copy the information from the 
current latest segment to a newly created segment explicitly) and only 
_read_ the information from the latest segment, as older segments might 
contain stale metadata?




206: `KeyValueStoreTestDriver`:

This is a test class, right? So we don't need to cover it in the KIP, 
and can rename w/o a deprecation phase, as it's all internal code.





-Matthias




On 5/30/24 8:57 AM, Nick Telford wrote:

Hi everyone,

I didn't spot this before, but it looks like the API of
KeyValueStoreTestDriver will need to be updated to change the nomenclature
from "flushed" to "committed":

numFlushedEntryRemoved() -> numCommittedEntryRemoved()
numFlushedEntryStored() -> numCommittedEntryStored()
flushedEntryRemoved(K) -> committedEntryRemoved(K)
flushedEntryStored(K) -> committedEntryStored(K)

The old methods will obviously be marked as @Deprecated.
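
For example, the deprecated variants could simply delegate to the new names
(an obvious sketch; return types are assumed):

@Deprecated
public int numFlushedEntryStored() {
    return numCommittedEntryStored();   // delegate to the new name
}

@Deprecated
public int numFlushedEntryRemoved() {
    return numCommittedEntryRemoved();  // delegate to the new name
}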

Any objections before I add this to the KIP?

Regards,
Nick


On Wed, 29 May 2024 at 11:20, Nick Telford  wrote:


I've updated the KIP with the following:

- Deprecation of StateStore#managesOffsets
- Change StateStore#commit to throw UnsupportedOperationException when
  called from a Processor (via AbstractReadWriteDecorator)
- Updated consumer rebalance lag computation strategy based on our Meet
  discussion
   - I've added a bit more detail here than we discussed, in particular
     around how we handle the offsets for tasks assigned to our local
     instance, and how we handle offsets when Tasks are closed/revoked.
- Improved downgrade behaviour
   - Note: users that don't downgrade with upgrade.from will still get
     the wipe-and-restore behaviour by default.

I believe this covers all the outstanding changes that were requested.
Please let me know if I've missed anything or you think further changes are
needed.

Regards,
Nick

On Wed, 29 May 2024 at 09:28, Nick Telford  wrote:


Hi everyone,

Sorry I haven't got around to updating the KIP yet. Now that I've wrapped
up KIP-989, I'm going to be working on 1035 starting today.

I'll update 

[jira] [Resolved] (KAFKA-15305) The background thread should try to process the remaining task until the shutdown timer is expired

2024-06-04 Thread Chia-Ping Tsai (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chia-Ping Tsai resolved KAFKA-15305.

Resolution: Fixed

Pushed to trunk and 3.8.

> The background thread should try to process the remaining task until the 
> shutdown timer is expired
> --
>
> Key: KAFKA-15305
> URL: https://issues.apache.org/jira/browse/KAFKA-15305
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, consumer
>Reporter: Philip Nee
>Assignee: Chia-Ping Tsai
>Priority: Major
>  Labels: consumer-threading-refactor, timeout
> Fix For: 3.8.0
>
>
> While working on https://issues.apache.org/jira/browse/KAFKA-15304:
> the close() API supplies a timeout parameter so that the consumer can have a 
> grace period to process things before shutting down.  The background thread 
> currently doesn't do that: when close() is initiated, it will immediately 
> close all of its dependencies.
>  
> This might not be desirable because there could be remaining tasks to be 
> processed before closing.  Maybe the correct thing to do is to first stop 
> accepting API requests, second, let runOnce() continue to run until the 
> shutdown timer expires, and then force-close all of its dependencies.
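
A minimal sketch of the close sequence described above (field and helper
names are made up for illustration only; this is not the actual patch):

// hypothetical sketch of the proposed graceful-close sequence
class GracefulCloseSketch {
    private volatile boolean acceptingRequests = true;

    void closeWithTimeout(final java.time.Duration timeout) {
        final long deadlineMs = System.currentTimeMillis() + timeout.toMillis();
        acceptingRequests = false;                 // 1. stop accepting new API requests
        while (System.currentTimeMillis() < deadlineMs && hasPendingTasks()) {
            runOnce();                             // 2. keep draining the remaining tasks
        }
        closeAllDependencies();                    // 3. force-close once the timer expires
    }

    private boolean hasPendingTasks() { return false; } // placeholder
    private void runOnce() { }                          // placeholder
    private void closeAllDependencies() { }             // placeholder
}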



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Jenkins build is still unstable: Kafka » Kafka Branch Builder » 3.8 #10

2024-06-04 Thread Apache Jenkins Server
See 




Build failed in Jenkins: Kafka » Kafka Branch Builder » 3.7 #168

2024-06-04 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 457064 lines...]
[2024-06-04T21:33:58.013Z] Gradle Test Run :core:test > Gradle Test Executor 94 
> ZkMigrationIntegrationTest > testStartZkBrokerWithAuthorizer(ClusterInstance) 
> testStartZkBrokerWithAuthorizer [1] Type=ZK, MetadataVersion=3.4-IV0, 
Security=PLAINTEXT STARTED
[2024-06-04T21:34:15.678Z] 
[2024-06-04T21:34:15.678Z] Gradle Test Run :core:test > Gradle Test Executor 94 
> ZkMigrationIntegrationTest > testStartZkBrokerWithAuthorizer(ClusterInstance) 
> testStartZkBrokerWithAuthorizer [1] Type=ZK, MetadataVersion=3.4-IV0, 
Security=PLAINTEXT PASSED
[2024-06-04T21:34:15.678Z] 
[2024-06-04T21:34:15.678Z] Gradle Test Run :core:test > Gradle Test Executor 94 
> ZkMigrationIntegrationTest > testDualWrite(ClusterInstance) > testDualWrite 
[1] Type=ZK, MetadataVersion=3.4-IV0, Security=PLAINTEXT STARTED
[2024-06-04T21:34:36.666Z] 
[2024-06-04T21:34:36.666Z] Gradle Test Run :core:test > Gradle Test Executor 94 
> ZkMigrationIntegrationTest > testDualWrite(ClusterInstance) > testDualWrite 
[1] Type=ZK, MetadataVersion=3.4-IV0, Security=PLAINTEXT PASSED
[2024-06-04T21:34:36.666Z] 
[2024-06-04T21:34:36.666Z] Gradle Test Run :core:test > Gradle Test Executor 94 
> ZkMigrationIntegrationTest > testDualWrite(ClusterInstance) > testDualWrite 
[2] Type=ZK, MetadataVersion=3.5-IV2, Security=PLAINTEXT STARTED
[2024-06-04T21:34:58.256Z] 
[2024-06-04T21:34:58.256Z] Gradle Test Run :core:test > Gradle Test Executor 94 
> ZkMigrationIntegrationTest > testDualWrite(ClusterInstance) > testDualWrite 
[2] Type=ZK, MetadataVersion=3.5-IV2, Security=PLAINTEXT PASSED
[2024-06-04T21:34:58.256Z] 
[2024-06-04T21:34:58.256Z] Gradle Test Run :core:test > Gradle Test Executor 94 
> ZkMigrationIntegrationTest > testDualWrite(ClusterInstance) > testDualWrite 
[3] Type=ZK, MetadataVersion=3.6-IV2, Security=PLAINTEXT STARTED
[2024-06-04T21:35:17.166Z] 
[2024-06-04T21:35:17.166Z] Gradle Test Run :core:test > Gradle Test Executor 94 
> ZkMigrationIntegrationTest > testDualWrite(ClusterInstance) > testDualWrite 
[3] Type=ZK, MetadataVersion=3.6-IV2, Security=PLAINTEXT PASSED
[2024-06-04T21:35:17.166Z] 
[2024-06-04T21:35:17.166Z] Gradle Test Run :core:test > Gradle Test Executor 94 
> ZkMigrationIntegrationTest > testDualWrite(ClusterInstance) > testDualWrite 
[4] Type=ZK, MetadataVersion=3.7-IV0, Security=PLAINTEXT STARTED
[2024-06-04T21:35:34.999Z] 
[2024-06-04T21:35:34.999Z] Gradle Test Run :core:test > Gradle Test Executor 94 
> ZkMigrationIntegrationTest > testDualWrite(ClusterInstance) > testDualWrite 
[4] Type=ZK, MetadataVersion=3.7-IV0, Security=PLAINTEXT PASSED
[2024-06-04T21:35:34.999Z] 
[2024-06-04T21:35:34.999Z] Gradle Test Run :core:test > Gradle Test Executor 94 
> ZkMigrationIntegrationTest > testDualWrite(ClusterInstance) > testDualWrite 
[5] Type=ZK, MetadataVersion=3.7-IV1, Security=PLAINTEXT STARTED
[2024-06-04T21:35:53.955Z] 
[2024-06-04T21:35:53.955Z] Gradle Test Run :core:test > Gradle Test Executor 94 
> ZkMigrationIntegrationTest > testDualWrite(ClusterInstance) > testDualWrite 
[5] Type=ZK, MetadataVersion=3.7-IV1, Security=PLAINTEXT PASSED
[2024-06-04T21:35:53.955Z] 
[2024-06-04T21:35:53.955Z] Gradle Test Run :core:test > Gradle Test Executor 94 
> ZkMigrationIntegrationTest > testDualWrite(ClusterInstance) > testDualWrite 
[6] Type=ZK, MetadataVersion=3.7-IV2, Security=PLAINTEXT STARTED
[2024-06-04T21:36:14.967Z] 
[2024-06-04T21:36:14.967Z] Gradle Test Run :core:test > Gradle Test Executor 94 
> ZkMigrationIntegrationTest > testDualWrite(ClusterInstance) > testDualWrite 
[6] Type=ZK, MetadataVersion=3.7-IV2, Security=PLAINTEXT PASSED
[2024-06-04T21:36:14.967Z] 
[2024-06-04T21:36:14.967Z] Gradle Test Run :core:test > Gradle Test Executor 94 
> ZkMigrationIntegrationTest > testDualWrite(ClusterInstance) > testDualWrite 
[7] Type=ZK, MetadataVersion=3.7-IV4, Security=PLAINTEXT STARTED
[2024-06-04T21:36:37.323Z] 
[2024-06-04T21:36:37.323Z] Gradle Test Run :core:test > Gradle Test Executor 94 
> ZkMigrationIntegrationTest > testDualWrite(ClusterInstance) > testDualWrite 
[7] Type=ZK, MetadataVersion=3.7-IV4, Security=PLAINTEXT PASSED
[2024-06-04T21:36:37.323Z] 
[2024-06-04T21:36:37.323Z] Gradle Test Run :core:test > Gradle Test Executor 94 
> ZkMigrationIntegrationTest > testDualWrite(ClusterInstance) > testDualWrite 
[8] Type=ZK, MetadataVersion=3.8-IV0, Security=PLAINTEXT STARTED
[2024-06-04T21:36:59.045Z] 
[2024-06-04T21:36:59.045Z] Gradle Test Run :core:test > Gradle Test Executor 94 
> ZkMigrationIntegrationTest > testDualWrite(ClusterInstance) > testDualWrite 
[8] Type=ZK, MetadataVersion=3.8-IV0, Security=PLAINTEXT PASSED
[2024-06-04T21:36:59.045Z] 
[2024-06-04T21:36:59.045Z] Gradle Test Run :core:test > Gradle Test Executor 94 
> ZkConfigMigrationClientTest > testScram() STARTED
[2024-06-04T21:37:00.770Z] 
[2024-06-04T21:37:00.771Z] Gradle Test Run :core

Re: [DISCUSS] KIP-655: Add deduplication processor in kafka-streams

2024-06-04 Thread Matthias J. Sax

Ayoub,

thanks for resurrecting this KIP. I think a built-in de-duplication 
operator will be very useful.



Couple of questions:



100: `deduplicationKeySelector`

Is this the best name? It might indicate that we select a "key" what is 
an overloaded term... Maybe we could use `Field` or `Id` or `Attribute` 
instead of `Key` in the name? Just brainstorming. If we think `Key` is 
the best word, I am also ok with it.




101: `Deduplicated` class

You propose to add static methods `keySerde()` and `valueSerde()` -- in 
other config classes, we use only `with(keySerde, valueSerde)` as we try 
to use the "builder" pattern, and avoid too many overloads. I would 
prefer to omit both methods you suggest and just use a single `with` for 
both serdes.


Similarly, I think we don't want to add a `with(...)` that takes all 
parameters at once (which should only be 3 parameters, not 4 as it's 
currently in the KIP)?
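
For comparison, the single-entry-point pattern used elsewhere looks like
this, and the proposed Deduplicated class would presumably mirror it
(Order/orderSerde are made-up placeholders, and Deduplicated itself does not
exist yet):

// existing pattern (this compiles today):
Grouped<String, Order> grouped = Grouped.with(Serdes.String(), orderSerde);

// proposed equivalent for the new config class (sketch only):
Deduplicated<String, Order> deduplicated = Deduplicated.with(Serdes.String(), orderSerde);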




102: Usage of `WindowedStore`:

Would this be efficient? The physical byte layout it "" for 
the store key, so it would be difficult to do an efficient lookup for a 
given "de-duplication key" to discard duplicates, as we don't know the 
timestamp of the first record with the same "de-duplication key".


This boils down to the actual de-duplication logic (some more comments 
below), but what you propose seems to require expensive range scans, which 
could be cost prohibitive in practice. I think we need to find a way to 
use efficient key-point-lookups to make this work.
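
To illustrate what I mean by point lookups, here is a rough sketch (my own,
not what the KIP specifies): key a plain KeyValueStore by the
de-duplication key and store the timestamp of the first occurrence, so the
duplicate check is a single get(). The store name and the value-as-key
shortcut are assumptions, and the sketch omits purging of expired entries:

import java.time.Duration;
import org.apache.kafka.streams.processor.api.Processor;
import org.apache.kafka.streams.processor.api.ProcessorContext;
import org.apache.kafka.streams.processor.api.Record;
import org.apache.kafka.streams.state.KeyValueStore;

// assumes a KeyValueStore named "dedup-store" keyed by the de-duplication key
class DeduplicationSketch implements Processor<String, String, String, String> {
    private final Duration deduplicationPeriod = Duration.ofMinutes(10);
    private ProcessorContext<String, String> context;
    private KeyValueStore<String, Long> store;

    @Override
    public void init(final ProcessorContext<String, String> context) {
        this.context = context;
        this.store = context.getStateStore("dedup-store");
    }

    @Override
    public void process(final Record<String, String> record) {
        final String dedupKey = record.value();        // stand-in for the key selector
        final Long firstSeenTs = store.get(dedupKey);  // single point lookup
        if (firstSeenTs == null
                || record.timestamp() - firstSeenTs > deduplicationPeriod.toMillis()) {
            context.forward(record);                   // first occurrence: forward it
            store.put(dedupKey, record.timestamp());   // remember it for later lookups
        }
        // otherwise: duplicate within the de-duplication period -> drop it
    }
}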




103: "Processing logic":

Might need some updates (Cf 102 comment). I am not sure if I fully 
understand the logic: cf 105 below.




104:

If no entries found → forward the record + save the record in the store 


This part is critical, and we should discuss it in detail. In the end, 
de-duplication only makes sense when EOS is used, and we might want 
to call this out (e.g., in the JavaDocs)? But if used with ALOS, it's very 
difficult to ensure that we never lose data... Your proposal to 
forward first goes in the right direction, but does not really solve 
the problem entirely:


Even if we forward the message first, all downstream processing happens, 
`context.forward()` returns, and we update the state store, we could now 
crash w/o committing offsets. In this case, we have no guarantee that 
the result records were published (as we did not flush the producer 
yet), but when re-reading from the input topic, we would find the record 
in the store and incorrectly drop it as a duplicate...


I think the only solution to make this work would be to use TX-state 
stores in combination with ALOS as proposed via KIP-892?


Using an in-memory store won't help much either? The producer could have 
sent the write to the changelog topic, but not to the result topic, 
and thus we could still not guarantee ALOS...?


How do we want to go about this? We could also say this new operator 
only works with EOS. Would this be too restrictive? -- At least for now, 
until KIP-892 lands, and then we could relax it?




105: "How to detect late records"

In the end, it seems to boil down to determining which of the records to 
forward and which to drop, for (1) the regular case and (2) the 
out-of-order data case.


Regular case (no out-of-order data): For this case, offset and ts order 
is the same, and we can forward the first record we get. All later 
record within "de-duplication period" with the same "de-duplication key" 
would be dropped. If a record with the same "de-duplication key" arrives 
after "de-duplication period" passed, we cannot drop it any longer, but 
would still forward it, as by the contract of the operator / 
de-duplication period.


For the out-of-order case: The first question we need to answer is, do 
we want to forward the record with the smallest offset or the record 
with the smallest ts? Logically, forwarding with the smallest ts might 
be "more correct", however, it implies we could only forward it after 
"de-duplication period" passed, what might be undesired latency? Would 
this be desired/acceptable?


In contrast, if we forward the record with the smallest offset (this is what 
you seem to propose), we don't have a latency issue, but of course the 
question of which records to drop is trickier to answer: it seems you 
propose to compare the time difference of the stored record to the 
current record, but I am wondering why? Would it not be desired to drop 
all duplicates independent of their ts, as long as we find a record in 
the store? It would be good to get some more motivation and tradeoffs 
discussed for the different strategies we could use.


You also propose to drop _any_ late record: I am also not sure if that's 
desired? Could this not lead to data loss? Assume we get a late record, 
but in fact there was never a duplicate? Why would we want to drop it? 
If there is a late record which is indeed a duplicate, but we purged the 
original record from the store already, it seems to be the same case as 
for the "

Re: [DISCUSS] KIP-1049: Add config log.summary.interval.ms to Kafka Streams

2024-06-04 Thread Matthias J. Sax

Jiang,

Thanks for the KIP. I think it makes sense. I agree with Sophie that the 
KIP writeup should be improved a little bit, with regard to the public API 
(i.e., the config) that is changed.


The only other idea I had to avoid this issue would be some internal 
change: we would introduce a new logger class allowing you to disable 
logging for this specific logger. However, not sure how we could make 
this a public contract you could rely on? -- Also, given Sophie's 
comment about potential other logs we might want to configure, it seems 
like a good idea to add this config.
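
(For context, assuming the config does land in StreamsConfig, usage would
presumably look like the sketch below -- the config name comes from the KIP
title, everything else is illustrative and the disable semantics are up to
the KIP:)

import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

public class SummaryIntervalExample {
    public static void main(final String[] args) {
        final Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // proposed config from KIP-1049: log the thread summary every 10 minutes
        props.put("log.summary.interval.ms", "600000");
    }
}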


@Sophie: what other logs did you have in mind?


One more nit: rejected alternatives lists `log.summary.interval.ms` as 
rejected -- seems this needs to be removed?



-Matthias

On 5/29/24 12:56 AM, Sophie Blee-Goldman wrote:

Sure, as I said I'm supportive of this KIP. Just wanted to mention how the
issue could be mitigated in the meantime since the description made it
sound like you were suffering from excessive logs right now. Apologies if I
misinterpreted that.

I do think it would be nice to have a general setting for log intervals in
Streams. There are some other places where a regular summary log might be
nice. The config name you proposed is generic enough that we could reuse it
for other areas where we'd like to log summaries, so this seems like a good
config to introduce

My only question/request is that the KIP doesn't mention where this config
is being added. I assume from the context and Motivation section that
you're proposing to add this to StreamsConfig, which makes sense to me. But
please update the KIP to say this somewhere.

Otherwise the KIP LGTM. Anyone else have thoughts on this?

On Thu, May 23, 2024 at 12:19 AM jiang dou  wrote:


Thank you for your reply.
I would not recommend setting the log level to WARN, because the INFO-level
logs should still be useful.


On Thu, May 23, 2024 at 04:30, Sophie Blee-Goldman wrote:


Thanks for the KIP!

I'm not against adding a config for this per se, but if this is
causing you trouble right now you should be able to disable it via log4j
configuration so you don't need to wait for a fix in Kafka Streams

itself.

Putting something like this in your log4j will shut off the offending

log:




log4j.logger.org.apache.kafka.streams.processor.internals.StreamThread=WARN


On Wed, May 22, 2024 at 6:46 AM jiang dou  wrote:


Hi


I would like to propose a change to the Kafka Streams summary log.

Currently the stream thread summary is recorded every two minutes, and there
is no way to disable it or change the interval.

When Kafka is running, this is absolutely unnecessary and even harmful,
since it fills the logs, and thus storage space, with unwanted and useless
data.

I propose adding a configuration to control the output interval or disable
it.

KIP:





https://cwiki.apache.org/confluence/display/KAFKA/KIP-1049%3A+Add+config+log.summary.interval.ms+to+Kafka+Streams










[jira] [Resolved] (KAFKA-16814) KRaft broker cannot startup when `partition.metadata` is missing

2024-06-04 Thread Luke Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Chen resolved KAFKA-16814.
---
Resolution: Fixed

> KRaft broker cannot startup when `partition.metadata` is missing
> 
>
> Key: KAFKA-16814
> URL: https://issues.apache.org/jira/browse/KAFKA-16814
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.7.0
>Reporter: Luke Chen
>Assignee: Kuan Po Tseng
>Priority: Blocker
> Fix For: 3.8.0, 3.7.1
>
>
> When starting up the Kafka LogManager, we check for stray replicas to avoid some 
> corner cases. But this check might leave the broker unable to start up if 
> `partition.metadata` is missing, because on startup we load the log from 
> file, and the topicId of the log comes from the `partition.metadata` file. 
> So, if `partition.metadata` is missing, the topicId will be None, and 
> `LogManager#isStrayKraftReplica` will fail with a "no topicID" error.
> The missing `partition.metadata` could be due to a storage failure, or another 
> possible path is an unclean shutdown after the topic is created in the replica, but 
> before data is flushed into the `partition.metadata` file. This is possible 
> because we do the flush in an async way 
> [here|https://github.com/apache/kafka/blob/5552f5c26df4eb07b2d6ee218e4a29e4ca790d5c/core/src/main/scala/kafka/log/UnifiedLog.scala#L229].
>  
>  
> {code:java}
> ERROR Encountered fatal fault: Error starting LogManager 
> (org.apache.kafka.server.fault.ProcessTerminatingFaultHandler)
> java.lang.RuntimeException: The log dir 
> Log(dir=/tmp/kraft-broker-logs/quickstart-events-0, topic=quickstart-events, 
> partition=0, highWatermark=0, lastStableOffset=0, logStartOffset=0, 
> logEndOffset=0) does not have a topic ID, which is not allowed when running 
> in KRaft mode.
>     at 
> kafka.log.LogManager$.$anonfun$isStrayKraftReplica$1(LogManager.scala:1609)
>     at scala.Option.getOrElse(Option.scala:201)
>     at kafka.log.LogManager$.isStrayKraftReplica(LogManager.scala:1608)
>     at 
> kafka.server.metadata.BrokerMetadataPublisher.$anonfun$initializeManagers$1(BrokerMetadataPublisher.scala:294)
>     at 
> kafka.server.metadata.BrokerMetadataPublisher.$anonfun$initializeManagers$1$adapted(BrokerMetadataPublisher.scala:294)
>     at kafka.log.LogManager.loadLog(LogManager.scala:359)
>     at kafka.log.LogManager.$anonfun$loadLogs$15(LogManager.scala:493)
>     at 
> java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:577)
>     at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
>     at 
> java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
>     at 
> java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
>     at java.base/java.lang.Thread.run(Thread.java:1623) {code}
>  
> Because if we don't do the isStrayKraftReplica check, the topicID and the 
> `partition.metadata` will get recovered after receiving a topic partition update 
> and becoming leader or follower later. I'm proposing we drop the 
> `isStrayKraftReplica` check if the topicID is None (see below), instead of 
> throwing an exception to terminate Kafka. 
>  
>  
> === update ===
> Checked KAFKA-14616 and KAFKA-15605: our purpose in finding stray replicas and 
> deleting them is that the replica should have been deleted, but was left in the log 
> dir. So, if we have a replica that doesn't have a topicID (because 
> `partition.metadata` is missing), then we cannot identify whether this is a stray 
> replica or not. In this case, we can do:
>  # Delete it
>  # Ignore it
> For (1), the impact is: if this is not a stray replica, and the 
> replication factor is only 1, then the data might be moved to another 
> "xxx-stray" dir, and the partition becomes empty.
> For (2), the impact is: if this is a stray replica and we didn't delete it, 
> it might cause the partition dir not to be created, as in KAFKA-15605 or KAFKA-14616.
> Per the investigation above, this missing `partition.metadata` issue is mostly 
> because of the async `partition.metadata` flush when creating a topic. Later, before 
> any data is appended to the log, we must make sure the partition metadata file is 
> written to the log dir 
> [here|https://github.com/apache/kafka/blob/5552f5c26df4eb07b2d6ee218e4a29e4ca790d5c/core/src/main/scala/kafka/log/UnifiedLog.scala#L772-L774].
>  So, it should be fine if we delete it since the topic should be empty.
> In short, when finding a log without topicID, we should treat it as a stray 
> log and then delete it.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-16843) Remove preAppendErrors from createPutCacheCallback

2024-06-04 Thread PoAn Yang (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

PoAn Yang resolved KAFKA-16843.
---
Resolution: Fixed

> Remove preAppendErrors from createPutCacheCallback
> --
>
> Key: KAFKA-16843
> URL: https://issues.apache.org/jira/browse/KAFKA-16843
> Project: Kafka
>  Issue Type: Improvement
>Reporter: Chia-Ping Tsai
>Assignee: PoAn Yang
>Priority: Minor
>
> origin discussion: 
> [https://github.com/apache/kafka/pull/16072#pullrequestreview-2077368462]
> The method `createPutCacheCallback` has an input argument `preAppendErrors` 
> [0]. It is used to keep the "errors" that happen before appending. However, the 
> pre-append error is already handled earlier by calling `responseCallback` [1]. Hence, 
> we can remove `preAppendErrors`.
>  
> [0] 
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/coordinator/group/GroupMetadataManager.scala#L387
> [1] 
> https://github.com/apache/kafka/blob/4f55786a8a86fe228a0b10a2f28529f5128e5d6f/core/src/main/scala/kafka/coordinator/group/GroupCoordinator.scala#L927C15-L927C84



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Jenkins build is still unstable: Kafka » Kafka Branch Builder » 3.8 #11

2024-06-04 Thread Apache Jenkins Server
See 




Build failed in Jenkins: Kafka » Kafka Branch Builder » 3.7 #169

2024-06-04 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 460623 lines...]
[2024-06-05T01:53:15.987Z] Gradle Test Run :core:test > Gradle Test Executor 96 
> ZkMigrationIntegrationTest > testMigrateTopicDeletions(ClusterInstance) > 
testMigrateTopicDeletions [8] Type=ZK, MetadataVersion=3.8-IV0, 
Security=PLAINTEXT SKIPPED
[2024-06-05T01:53:15.987Z] 
[2024-06-05T01:53:15.987Z] Gradle Test Run :core:test > Gradle Test Executor 96 
> ZkMigrationIntegrationTest > testDeleteLogOnStartup(ClusterInstance) > 
testDeleteLogOnStartup [1] Type=ZK, MetadataVersion=3.8-IV0, Security=PLAINTEXT 
STARTED
[2024-06-05T01:53:36.189Z] 
[2024-06-05T01:53:36.189Z] Gradle Test Run :core:test > Gradle Test Executor 96 
> ZkMigrationIntegrationTest > testDeleteLogOnStartup(ClusterInstance) > 
testDeleteLogOnStartup [1] Type=ZK, MetadataVersion=3.8-IV0, Security=PLAINTEXT 
PASSED
[2024-06-05T01:53:36.189Z] 
[2024-06-05T01:53:36.189Z] Gradle Test Run :core:test > Gradle Test Executor 96 
> ZkMigrationIntegrationTest > 
testPartitionReassignmentInHybridMode(ClusterInstance) > 
testPartitionReassignmentInHybridMode [1] Type=ZK, MetadataVersion=3.7-IV0, 
Security=PLAINTEXT STARTED
[2024-06-05T01:53:54.100Z] 
[2024-06-05T01:53:54.100Z] Gradle Test Run :core:test > Gradle Test Executor 96 
> ZkMigrationIntegrationTest > 
testPartitionReassignmentInHybridMode(ClusterInstance) > 
testPartitionReassignmentInHybridMode [1] Type=ZK, MetadataVersion=3.7-IV0, 
Security=PLAINTEXT PASSED
[2024-06-05T01:53:54.100Z] 
[2024-06-05T01:53:54.100Z] Gradle Test Run :core:test > Gradle Test Executor 96 
> ZkMigrationIntegrationTest > 
testIncrementalAlterConfigsPreMigration(ClusterInstance) > 
testIncrementalAlterConfigsPreMigration [1] Type=ZK, MetadataVersion=3.4-IV0, 
Security=PLAINTEXT STARTED
[2024-06-05T01:53:57.725Z] 
[2024-06-05T01:53:57.725Z] Gradle Test Run :core:test > Gradle Test Executor 96 
> ZkMigrationIntegrationTest > 
testIncrementalAlterConfigsPreMigration(ClusterInstance) > 
testIncrementalAlterConfigsPreMigration [1] Type=ZK, MetadataVersion=3.4-IV0, 
Security=PLAINTEXT PASSED
[2024-06-05T01:53:57.725Z] 
[2024-06-05T01:53:57.725Z] Gradle Test Run :core:test > Gradle Test Executor 96 
> ZkMigrationIntegrationTest > testDualWriteScram(ClusterInstance) > 
testDualWriteScram [1] Type=ZK, MetadataVersion=3.5-IV2, Security=PLAINTEXT 
STARTED
[2024-06-05T01:54:07.611Z] 
[2024-06-05T01:54:07.611Z] Gradle Test Run :core:test > Gradle Test Executor 96 
> ZkMigrationIntegrationTest > testDualWriteScram(ClusterInstance) > 
testDualWriteScram [1] Type=ZK, MetadataVersion=3.5-IV2, Security=PLAINTEXT 
PASSED
[2024-06-05T01:54:07.611Z] 
[2024-06-05T01:54:07.611Z] Gradle Test Run :core:test > Gradle Test Executor 96 
> ZkMigrationIntegrationTest > 
testNewAndChangedTopicsInDualWrite(ClusterInstance) > 
testNewAndChangedTopicsInDualWrite [1] Type=ZK, MetadataVersion=3.4-IV0, 
Security=PLAINTEXT STARTED
[2024-06-05T01:54:19.434Z] 
[2024-06-05T01:54:19.434Z] Gradle Test Run :core:test > Gradle Test Executor 96 
> ZkMigrationIntegrationTest > 
testNewAndChangedTopicsInDualWrite(ClusterInstance) > 
testNewAndChangedTopicsInDualWrite [1] Type=ZK, MetadataVersion=3.4-IV0, 
Security=PLAINTEXT PASSED
[2024-06-05T01:54:19.434Z] 
[2024-06-05T01:54:19.434Z] Gradle Test Run :core:test > Gradle Test Executor 96 
> ZkMigrationIntegrationTest > testDualWriteQuotaAndScram(ClusterInstance) > 
testDualWriteQuotaAndScram [1] Type=ZK, MetadataVersion=3.5-IV2, 
Security=PLAINTEXT STARTED
[2024-06-05T01:54:31.086Z] 
[2024-06-05T01:54:31.086Z] Gradle Test Run :core:test > Gradle Test Executor 96 
> ZkMigrationIntegrationTest > testDualWriteQuotaAndScram(ClusterInstance) > 
testDualWriteQuotaAndScram [1] Type=ZK, MetadataVersion=3.5-IV2, 
Security=PLAINTEXT PASSED
[2024-06-05T01:54:31.086Z] 
[2024-06-05T01:54:31.086Z] Gradle Test Run :core:test > Gradle Test Executor 96 
> ZkMigrationIntegrationTest > testMigrate(ClusterInstance) > testMigrate [1] 
Type=ZK, MetadataVersion=3.4-IV0, Security=PLAINTEXT STARTED
[2024-06-05T01:54:33.206Z] 
[2024-06-05T01:54:33.206Z] Gradle Test Run :core:test > Gradle Test Executor 96 
> ZkMigrationIntegrationTest > testMigrate(ClusterInstance) > testMigrate [1] 
Type=ZK, MetadataVersion=3.4-IV0, Security=PLAINTEXT PASSED
[2024-06-05T01:54:33.206Z] 
[2024-06-05T01:54:33.206Z] Gradle Test Run :core:test > Gradle Test Executor 96 
> ZkMigrationIntegrationTest > testMigrateAcls(ClusterInstance) > 
testMigrateAcls [1] Type=ZK, MetadataVersion=3.4-IV0, Security=PLAINTEXT STARTED
[2024-06-05T01:54:35.496Z] 
[2024-06-05T01:54:35.496Z] Gradle Test Run :core:test > Gradle Test Executor 96 
> ZkMigrationIntegrationTest > testMigrateAcls(ClusterInstance) > 
testMigrateAcls [1] Type=ZK, MetadataVersion=3.4-IV0, Security=PLAINTEXT PASSED
[2024-06-05T01:54:35.496Z] 
[2024-06-05T01:54:35.496Z] Gradle Test Run :core:test > Gradle Test Executor 96 
> ZkMigrationIntegr

Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #2971

2024-06-04 Thread Apache Jenkins Server
See 




Build failed in Jenkins: Kafka » Kafka Branch Builder » 3.7 #170

2024-06-04 Thread Apache Jenkins Server
See 


Changes:


--
[...truncated 2825 lines...]
[2024-06-05T02:03:09.419Z] [Warn] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_3.7/core/src/main/scala/kafka/utils/Implicits.scala:58:4:
 @nowarn annotation does not suppress any warnings
[2024-06-05T02:03:09.419Z] [Warn] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_3.7/core/src/main/scala/kafka/raft/KafkaMetadataLog.scala:473:4:
 @nowarn annotation does not suppress any warnings
[2024-06-05T02:03:09.419Z] [Warn] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_3.7/core/src/main/scala/kafka/security/authorizer/AclAuthorizer.scala:535:4:
 @nowarn annotation does not suppress any warnings
[2024-06-05T02:03:09.419Z] [Warn] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_3.7/core/src/main/scala/kafka/tools/ConsoleProducer.scala:38:2:
 @nowarn annotation does not suppress any warnings
[2024-06-05T02:03:09.419Z] [Warn] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_3.7/core/src/main/scala/kafka/utils/CoreUtils.scala:290:4:
 @nowarn annotation does not suppress any warnings
[2024-06-05T02:03:09.419Z] 17 warnings found
[2024-06-05T02:03:09.419Z] 
[2024-06-05T02:03:09.419Z] > Task :core:classes
[2024-06-05T02:03:09.419Z] > Task :core:compileTestJava NO-SOURCE
[2024-06-05T02:03:10.707Z] > Task :core:checkstyleMain
[2024-06-05T02:03:10.707Z] > Task :shell:compileJava
[2024-06-05T02:03:10.707Z] > Task :shell:classes
[2024-06-05T02:03:11.821Z] > Task :shell:checkstyleMain
[2024-06-05T02:03:20.068Z] > Task :shell:spotbugsMain
[2024-06-05T02:03:21.315Z] > Task :group-coordinator:checkstyleTest
[2024-06-05T02:03:21.315Z] > Task :group-coordinator:check
[2024-06-05T02:03:24.116Z] > Task :streams:streams-scala:spotbugsMain
[2024-06-05T02:03:26.569Z] > Task :metadata:checkstyleTest
[2024-06-05T02:03:26.569Z] > Task :metadata:check
[2024-06-05T02:03:28.485Z] > Task :clients:check
[2024-06-05T02:03:40.760Z] > Task :core:compileTestScala
[2024-06-05T02:03:40.760Z] [Warn] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_3.7/core/src/test/scala/unit/kafka/server/AdvertiseBrokerTest.scala:22:21:
 imported `QuorumTestHarness` is permanently hidden by definition of object 
QuorumTestHarness in package server
[2024-06-05T02:03:40.760Z] [Warn] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_3.7/core/src/test/scala/unit/kafka/server/BrokerEpochIntegrationTest.scala:27:21:
 imported `QuorumTestHarness` is permanently hidden by definition of object 
QuorumTestHarness in package server
[2024-06-05T02:03:43.563Z] [Warn] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_3.7/core/src/test/scala/unit/kafka/server/DynamicConfigTest.scala:20:21:
 imported `QuorumTestHarness` is permanently hidden by definition of object 
QuorumTestHarness in package server
[2024-06-05T02:03:43.563Z] [Warn] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_3.7/core/src/test/scala/unit/kafka/server/FetcherThreadTestUtils.scala:21:21:
 imported `InitialFetchState` is permanently hidden by definition of object 
InitialFetchState in package server
[2024-06-05T02:03:45.670Z] [Warn] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_3.7/core/src/test/scala/unit/kafka/server/KafkaMetricReporterClusterIdTest.scala:23:21:
 imported `QuorumTestHarness` is permanently hidden by definition of object 
QuorumTestHarness in package server
[2024-06-05T02:03:45.670Z] [Warn] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_3.7/core/src/test/scala/unit/kafka/server/LeaderElectionTest.scala:32:21:
 imported `QuorumTestHarness` is permanently hidden by definition of object 
QuorumTestHarness in package server
[2024-06-05T02:03:46.784Z] [Warn] 
/home/jenkins/jenkins-agent/workspace/Kafka_kafka_3.7/core/src/test/scala/unit/kafka/server/LogRecoveryTest.scala:25:21:
 imported `QuorumTestHarness` is permanently hidden by definition of object 
QuorumTestHarness in package server
[2024-06-05T02:04:17.765Z] 
[2024-06-05T02:04:17.765Z] > Task :core:compileScala
[2024-06-05T02:04:17.765Z] Unexpected javac output: warning: [options] 
bootstrap class path not set in conjunction with -source 8
[2024-06-05T02:04:17.765Z] warning: [options] source value 8 is obsolete and 
will be removed in a future release
[2024-06-05T02:04:17.765Z] warning: [options] target value 8 is obsolete and 
will be removed in a future release
[2024-06-05T02:04:17.765Z] warning: [options] To suppress warnings about 
obsolete options, use -Xlint:-options.
[2024-06-05T02:04:17.765Z] 
/home/jenkins/workspace/Kafka_kafka_3.7/core/src/main/java/kafka/log/remote/RemoteLogManager.java:236:
 warning: [removal] AccessController in java.security has been deprecated and 
marked for removal
[2024-06-05T02:04:17.765Z] return 
java.security.AccessController.doPrivileged(new 
PrivilegedAction() {
[2024-06-05T02:04:17.765Z] ^
[2024-06-05T02:04:17.765Z] 
/home/jenkins/workspace/Kafka_kafka_3.7/core/src/main/

[jira] [Resolved] (KAFKA-16662) UnwritableMetadataException: Metadata has been lost

2024-06-04 Thread Luke Chen (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luke Chen resolved KAFKA-16662.
---
Fix Version/s: 3.8.0
   3.7.1
   Resolution: Duplicate

> UnwritableMetadataException: Metadata has been lost
> ---
>
> Key: KAFKA-16662
> URL: https://issues.apache.org/jira/browse/KAFKA-16662
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.7.0
> Environment: Docker Image (bitnami/kafka:3.7.0)
> via Docker Compose
>Reporter: Tobias Bohn
>Priority: Major
> Fix For: 3.8.0, 3.7.1
>
> Attachments: log.txt
>
>
> Hello,
> First of all: I am new to this Jira and apologize if anything is set or 
> specified incorrectly. Feel free to advise me.
> We currently have an error in our test system, which unfortunately I can't 
> solve, because I couldn't find anything related to it. No solution could be 
> found via the mailing list either.
> The error occurs when we want to start up a node. The node runs using Kraft 
> and is both a controller and a broker. The following error message appears at 
> startup:
> {code:java}
> kafka  | [2024-04-16 06:18:13,707] ERROR Encountered fatal fault: Unhandled 
> error initializing new publishers 
> (org.apache.kafka.server.fault.ProcessTerminatingFaultHandler)
> kafka  | org.apache.kafka.image.writer.UnwritableMetadataException: Metadata 
> has been lost because the following could not be represented in metadata 
> version 3.5-IV2: the directory assignment state of one or more replicas
> kafka  |        at 
> org.apache.kafka.image.writer.ImageWriterOptions.handleLoss(ImageWriterOptions.java:94)
> kafka  |        at 
> org.apache.kafka.metadata.PartitionRegistration.toRecord(PartitionRegistration.java:391)
> kafka  |        at org.apache.kafka.image.TopicImage.write(TopicImage.java:71)
> kafka  |        at 
> org.apache.kafka.image.TopicsImage.write(TopicsImage.java:84)
> kafka  |        at 
> org.apache.kafka.image.MetadataImage.write(MetadataImage.java:155)
> kafka  |        at 
> org.apache.kafka.image.loader.MetadataLoader.initializeNewPublishers(MetadataLoader.java:295)
> kafka  |        at 
> org.apache.kafka.image.loader.MetadataLoader.lambda$scheduleInitializeNewPublishers$0(MetadataLoader.java:266)
> kafka  |        at 
> org.apache.kafka.queue.KafkaEventQueue$EventContext.run(KafkaEventQueue.java:127)
> kafka  |        at 
> org.apache.kafka.queue.KafkaEventQueue$EventHandler.handleEvents(KafkaEventQueue.java:210)
> kafka  |        at 
> org.apache.kafka.queue.KafkaEventQueue$EventHandler.run(KafkaEventQueue.java:181)
> kafka  |        at java.base/java.lang.Thread.run(Thread.java:840)
> kafka exited with code 0 {code}
> We use Docker to operate the cluster. The error occurred while we were trying 
> to restart a node. All other nodes in the cluster are still running correctly.
> If you need further information, please let us know. The complete log is 
> attached to this issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Jenkins build is still unstable: Kafka » Kafka Branch Builder » trunk #2972

2024-06-04 Thread Apache Jenkins Server
See