[jira] [Updated] (KAFKA-16041) Replace Afterburn module with Blackbird

2023-12-26 Thread Greg Harris (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16041?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Harris updated KAFKA-16041:

Component/s: KafkaConnect
 (was: connect)

> Replace Afterburn module with Blackbird
> ---
>
> Key: KAFKA-16041
> URL: https://issues.apache.org/jira/browse/KAFKA-16041
> Project: Kafka
>  Issue Type: Task
>  Components: KafkaConnect
>Reporter: Mario Fiore Vitale
>Priority: Major
> Fix For: 4.0.0
>
>
> [Blackbird|https://github.com/FasterXML/jackson-modules-base/blob/master/blackbird/README.md]
>  is the Afterburner replacement for Java 11+.
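
For context, a minimal sketch of what the module swap looks like when building an ObjectMapper; the factory class below is illustrative, not Kafka's actual serialization setup:

{code:java}
import com.fasterxml.jackson.databind.ObjectMapper;
// import com.fasterxml.jackson.module.afterburner.AfterburnerModule; // legacy module, Java 8 era
import com.fasterxml.jackson.module.blackbird.BlackbirdModule;        // replacement for Java 11+

public class MapperFactorySketch {
    public static ObjectMapper newMapper() {
        // Before: new ObjectMapper().registerModule(new AfterburnerModule());
        return new ObjectMapper().registerModule(new BlackbirdModule());
    }
}
{code}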



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-15976) KIP-995: Allow users to specify initial offsets while creating connectors

2023-12-26 Thread Greg Harris (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Harris updated KAFKA-15976:

Component/s: KafkaConnect
 (was: connect)

> KIP-995: Allow users to specify initial offsets while creating connectors
> -
>
> Key: KAFKA-15976
> URL: https://issues.apache.org/jira/browse/KAFKA-15976
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Reporter: Ashwin Pankaj
>Assignee: Ashwin Pankaj
>Priority: Major
>
> Allow setting the initial offset for a connector in the connector creation 
> API.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-16051) Deadlock on connector initialization

2023-12-26 Thread Greg Harris (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-16051?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Harris updated KAFKA-16051:

Component/s: KafkaConnect
 (was: connect)

> Deadlock on connector initialization
> 
>
> Key: KAFKA-16051
> URL: https://issues.apache.org/jira/browse/KAFKA-16051
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 2.6.3, 3.6.1
>Reporter: Octavian Ciubotaru
>Priority: Major
>
>  
> Tested with Kafka 3.6.1 and 2.6.3.
> The only plugin installed is confluentinc-kafka-connect-jdbc-10.7.4.
> Stack trace for Kafka 3.6.1:
> {noformat}
> Found one Java-level deadlock:
> =
> "pool-3-thread-1":
>   waiting to lock monitor 0x7fbc88006300 (object 0x91002aa0, a 
> org.apache.kafka.connect.runtime.standalone.StandaloneHerder),
>   which is held by "Thread-9"
> "Thread-9":
>   waiting to lock monitor 0x7fbc88008800 (object 0x9101ccd8, a 
> org.apache.kafka.connect.storage.MemoryConfigBackingStore),
>   which is held by "pool-3-thread-1"Java stack information for the threads 
> listed above:
> ===
> "pool-3-thread-1":
>     at 
> org.apache.kafka.connect.runtime.standalone.StandaloneHerder$ConfigUpdateListener.onTaskConfigUpdate(StandaloneHerder.java:516)
>     - waiting to lock <0x91002aa0> (a 
> org.apache.kafka.connect.runtime.standalone.StandaloneHerder)
>     at 
> org.apache.kafka.connect.storage.MemoryConfigBackingStore.putTaskConfigs(MemoryConfigBackingStore.java:137)
>     - locked <0x9101ccd8> (a 
> org.apache.kafka.connect.storage.MemoryConfigBackingStore)
>     at 
> org.apache.kafka.connect.runtime.standalone.StandaloneHerder.updateConnectorTasks(StandaloneHerder.java:483)
>     at 
> org.apache.kafka.connect.runtime.standalone.StandaloneHerder.lambda$null$2(StandaloneHerder.java:229)
>     at 
> org.apache.kafka.connect.runtime.standalone.StandaloneHerder$$Lambda$692/0x000840557440.run(Unknown
>  Source)
>     at 
> java.util.concurrent.Executors$RunnableAdapter.call(java.base@11.0.21/Executors.java:515)
>     at 
> java.util.concurrent.FutureTask.run(java.base@11.0.21/FutureTask.java:264)
>     at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(java.base@11.0.21/ScheduledThreadPoolExecutor.java:304)
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.21/ThreadPoolExecutor.java:1128)
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.21/ThreadPoolExecutor.java:628)
>     at java.lang.Thread.run(java.base@11.0.21/Thread.java:829)
> "Thread-9":
>     at 
> org.apache.kafka.connect.storage.MemoryConfigBackingStore.putTaskConfigs(MemoryConfigBackingStore.java:129)
>     - waiting to lock <0x9101ccd8> (a 
> org.apache.kafka.connect.storage.MemoryConfigBackingStore)
>     at 
> org.apache.kafka.connect.runtime.standalone.StandaloneHerder.updateConnectorTasks(StandaloneHerder.java:483)
>     at 
> org.apache.kafka.connect.runtime.standalone.StandaloneHerder.requestTaskReconfiguration(StandaloneHerder.java:255)
>     - locked <0x91002aa0> (a 
> org.apache.kafka.connect.runtime.standalone.StandaloneHerder)
>     at 
> org.apache.kafka.connect.runtime.HerderConnectorContext.requestTaskReconfiguration(HerderConnectorContext.java:50)
>     at 
> org.apache.kafka.connect.runtime.WorkerConnector$WorkerConnectorContext.requestTaskReconfiguration(WorkerConnector.java:548)
>     at 
> io.confluent.connect.jdbc.source.TableMonitorThread.run(TableMonitorThread.java:86)
> Found 1 deadlock.
> {noformat}
> The JDBC source connector loads the tables from the database and updates the 
> configuration once the list is available. The deadlock is very consistent in 
> my environment, probably because the database is on the same machine.
> Maybe it is possible to avoid this situation by always locking the herder 
> first and the config backing store second. From what I see, 
> updateConnectorTasks is sometimes called with the herder already locked and 
> other times it is not.
>  
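
A minimal sketch of the lock-ordering idea suggested above (purely illustrative; the class and lock names are stand-ins, not the actual StandaloneHerder/MemoryConfigBackingStore code):

{code:java}
public class LockOrderingSketch {
    private final Object herderLock = new Object();        // stands in for the StandaloneHerder monitor
    private final Object configStoreLock = new Object();   // stands in for the MemoryConfigBackingStore monitor

    // Path A (e.g. scheduled task reconfiguration): herder first, then config store.
    void updateTasksFromScheduler() {
        synchronized (herderLock) {
            synchronized (configStoreLock) {
                // write task configs and notify listeners
            }
        }
    }

    // Path B (e.g. connector-requested reconfiguration): same order, so two threads
    // can never each hold one lock while waiting for the other.
    void updateTasksFromConnector() {
        synchronized (herderLock) {
            synchronized (configStoreLock) {
                // write task configs and notify listeners
            }
        }
    }
}
{code}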



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-15888) DistributedHerder log context should not use the same client ID for each Connect worker by default

2023-12-26 Thread Greg Harris (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Harris updated KAFKA-15888:

Component/s: (was: connect)

> DistributedHerder log context should not use the same client ID for each 
> Connect worker by default
> --
>
> Key: KAFKA-15888
> URL: https://issues.apache.org/jira/browse/KAFKA-15888
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Reporter: Yash Mayya
>Assignee: Yash Mayya
>Priority: Minor
> Fix For: 3.7.0
>
>
> By default, if there is no {{client.id}} configured on a Connect worker 
> running in distributed mode, the same client ID ("connect-1") will be used in 
> the log context for the DistributedHerder class in every single worker in the 
> Connect cluster. This default is quite confusing and obviously not very 
> useful. Further, based on how this default is configured 
> ([ref|https://github.com/apache/kafka/blob/150b0e8290cda57df668ba89f6b422719866de5a/connect/runtime/src/main/java/org/apache/kafka/connect/runtime/distributed/DistributedHerder.java#L299]),
>  it seems like this might have been an unintentional bug. We could simply use 
> the workerId (the advertised host name and port of the worker) by default 
> instead, which should be unique for each worker in a cluster.
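
A sketch of the proposed default (illustrative only; the exact prefix format and surrounding code are assumptions, not necessarily what was merged for 3.7.0):

{code:java}
import org.apache.kafka.common.utils.LogContext;

public class HerderLogContextSketch {
    static LogContext logContextFor(String clientIdConfig, String workerId) {
        // Fall back to the worker's advertised host:port, which is unique per worker,
        // instead of a shared "connect-1" default.
        String clientId = (clientIdConfig == null || clientIdConfig.isEmpty()) ? workerId : clientIdConfig;
        return new LogContext("[Worker clientId=" + clientId + "] ");
    }
}
{code}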



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-15841) Add Support for Topic-Level Partitioning in Kafka Connect

2023-12-26 Thread Greg Harris (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Harris updated KAFKA-15841:

Component/s: KafkaConnect
 (was: connect)

> Add Support for Topic-Level Partitioning in Kafka Connect
> -
>
> Key: KAFKA-15841
> URL: https://issues.apache.org/jira/browse/KAFKA-15841
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Reporter: Henrique Mota
>Priority: Trivial
>
> In our organization, we utilize JDBC sink connectors to consume data from 
> various topics, where each topic is dedicated to a specific tenant with a 
> single partition. Recently, we developed a custom sink based on the standard 
> JDBC sink, enabling us to pause consumption of a topic when encountering 
> problematic records.
> However, we face limitations within Kafka Connect, as it doesn't allow for 
> appropriate partitioning of topics among workers. We attempted a workaround 
> by breaking down the topics list within the 'topics' parameter. 
> Unfortunately, Kafka Connect overrides this parameter after invoking the 
> {{taskConfigs(int maxTasks)}} method from the 
> {{org.apache.kafka.connect.connector.Connector}} class.
> We request the addition of support in Kafka Connect to enable the 
> partitioning of topics among workers without requiring a fork. This 
> enhancement would facilitate better load distribution and allow for more 
> flexible configurations, particularly in scenarios where topics are dedicated 
> to different tenants.
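
For illustration, a minimal Connector.taskConfigs-style sketch of the workaround described above, splitting the configured topics across tasks (hypothetical helper class; as the ticket notes, the framework currently overrides the "topics" value afterwards):

{code:java}
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TopicSplittingSketch {
    // Round-robin the topic list into one config map per task.
    static List<Map<String, String>> taskConfigs(Map<String, String> connectorConfig, int maxTasks) {
        String[] topics = connectorConfig.get("topics").split(",");
        int numTasks = Math.min(maxTasks, topics.length);
        List<List<String>> groups = new ArrayList<>();
        for (int i = 0; i < numTasks; i++) groups.add(new ArrayList<>());
        for (int i = 0; i < topics.length; i++) groups.get(i % numTasks).add(topics[i].trim());

        List<Map<String, String>> taskConfigs = new ArrayList<>();
        for (List<String> group : groups) {
            Map<String, String> config = new HashMap<>(connectorConfig);
            config.put("topics", String.join(",", group)); // overridden again by the framework today
            taskConfigs.add(config);
        }
        return taskConfigs;
    }
}
{code}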



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-15524) Flaky test org.apache.kafka.connect.integration.OffsetsApiIntegrationTest.testResetSinkConnectorOffsetsZombieSinkTasks

2023-12-26 Thread Greg Harris (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Harris updated KAFKA-15524:

Component/s: KafkaConnect
 (was: connect)

> Flaky test 
> org.apache.kafka.connect.integration.OffsetsApiIntegrationTest.testResetSinkConnectorOffsetsZombieSinkTasks
> --
>
> Key: KAFKA-15524
> URL: https://issues.apache.org/jira/browse/KAFKA-15524
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 3.6.0, 3.5.1
>Reporter: Josep Prat
>Priority: Major
>  Labels: flaky, flaky-test
>
> Last seen: 
> [https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-14458/3/testReport/junit/org.apache.kafka.connect.integration/OffsetsApiIntegrationTest/Build___JDK_17_and_Scala_2_13___testResetSinkConnectorOffsetsZombieSinkTasks/]
>  
> h3. Error Message
> {code:java}
> java.lang.RuntimeException: java.util.concurrent.ExecutionException: 
> org.apache.kafka.common.errors.TimeoutException: The request timed out.{code}
> h3. Stacktrace
> {code:java}
> java.lang.RuntimeException: java.util.concurrent.ExecutionException: 
> org.apache.kafka.common.errors.TimeoutException: The request timed out. at 
> org.apache.kafka.connect.util.clusters.EmbeddedKafkaCluster.createTopic(EmbeddedKafkaCluster.java:427)
>  at 
> org.apache.kafka.connect.util.clusters.EmbeddedKafkaCluster.createTopic(EmbeddedKafkaCluster.java:401)
>  at 
> org.apache.kafka.connect.util.clusters.EmbeddedKafkaCluster.createTopic(EmbeddedKafkaCluster.java:392)
>  at 
> org.apache.kafka.connect.integration.OffsetsApiIntegrationTest.testResetSinkConnectorOffsetsZombieSinkTasks(OffsetsApiIntegrationTest.java:763)
>  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method) at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
>  at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.base/java.lang.reflect.Method.invoke(Method.java:568) at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>  at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>  at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) 
> at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) 
> at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) at 
> org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
>  at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366) at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
>  at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
>  at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331) at 
> org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79) at 
> org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329) at 
> org.junit.runners.ParentRunner.access$100(ParentRunner.java:66) at 
> org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293) at 
> org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306) at 
> org.junit.runners.ParentRunner.run(ParentRunner.java:413) at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:112)
>  at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
>  at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:40)
>  at 
> org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:60)
>  at 
> org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:52)
>  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method) at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
>  at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.base/java.lang.reflect.Method.invoke(Method.java:568) at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36)
>  at 
> org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
>  at 
> 

[jira] [Updated] (KAFKA-15570) Add unit tests for MemoryConfigBackingStore

2023-12-26 Thread Greg Harris (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Harris updated KAFKA-15570:

Component/s: (was: connect)

> Add unit tests for MemoryConfigBackingStore
> ---
>
> Key: KAFKA-15570
> URL: https://issues.apache.org/jira/browse/KAFKA-15570
> Project: Kafka
>  Issue Type: Test
>  Components: KafkaConnect
>Reporter: Yash Mayya
>Assignee: Yash Mayya
>Priority: Minor
>
> Currently, the 
> [MemoryConfigBackingStore|https://github.com/apache/kafka/blob/6e164bb9ace3ea7a1a9542904d1a01c9fd3a1b48/connect/runtime/src/main/java/org/apache/kafka/connect/storage/MemoryConfigBackingStore.java#L37]
>  class doesn't have any unit tests for its functionality. While most of its 
> functionality is fairly lightweight today, changes will be introduced with 
> [KIP-980|https://cwiki.apache.org/confluence/display/KAFKA/KIP-980%3A+Allow+creating+connectors+in+a+stopped+state]
>  (potentially 
> [KIP-976|https://cwiki.apache.org/confluence/display/KAFKA/KIP-976%3A+Cluster-wide+dynamic+log+adjustment+for+Kafka+Connect]
>  as well) and it would be good to have a test setup in place before those 
> changes are made.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-15407) Not able to connect to kafka from the Private NLB from outside the VPC account

2023-12-26 Thread Greg Harris (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Harris updated KAFKA-15407:

Component/s: KafkaConnect
 (was: connect)

> Not able to connect to kafka from the Private NLB from outside the VPC 
> account 
> ---
>
> Key: KAFKA-15407
> URL: https://issues.apache.org/jira/browse/KAFKA-15407
> Project: Kafka
>  Issue Type: Bug
>  Components: clients, consumer, KafkaConnect, producer , protocol
> Environment: Staging, PROD
>Reporter: Shivakumar
>Priority: Blocker
> Attachments: image-2023-08-28-12-37-33-100.png
>
>
> !image-2023-08-28-12-37-33-100.png|width=768,height=223!
> Problem statement: 
> We are trying to connect to Kafka from another account/VPC.
> Our Kafka runs in an EKS cluster, and we have a service pointing to these 
> pods for connections.
> We tried to create a private link endpoint from Account B to connect to our 
> NLB in front of our Kafka in Account A.
> We see the connection reset from both the client and the target (Kafka) in 
> the NLB monitoring tab of AWS.
> We tried various combinations of listeners and advertised listeners, which 
> did not help.
> We assume we are missing some combination of listener and network-level 
> configuration with which this connection can be made.
> Can you please guide us with this, as we are blocked on a major migration. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-15470) Allow creating connectors in a stopped state

2023-12-26 Thread Greg Harris (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Harris updated KAFKA-15470:

Component/s: (was: connect)

> Allow creating connectors in a stopped state
> 
>
> Key: KAFKA-15470
> URL: https://issues.apache.org/jira/browse/KAFKA-15470
> Project: Kafka
>  Issue Type: New Feature
>  Components: KafkaConnect
>Reporter: Yash Mayya
>Assignee: Yash Mayya
>Priority: Major
>  Labels: connect, kafka-connect, kip-required
> Fix For: 3.7.0
>
>
> [KIP-875: First-class offsets support in Kafka 
> Connect|https://cwiki.apache.org/confluence/display/KAFKA/KIP-875%3A+First-class+offsets+support+in+Kafka+Connect]
>  introduced a new {{STOPPED}} state for connectors along with some REST API 
> endpoints to retrieve and modify offsets for connectors. Currently, only 
> connectors that already exist can be stopped and any newly created connector 
> will always be in the {{RUNNING}} state initially. Allowing the creation of 
> connectors in a {{STOPPED}} state will facilitate multiple new use cases. One 
> interesting use case would be to migrate connectors from one Kafka Connect 
> cluster to another. Individual connector migration would be useful in a 
> number of scenarios, such as breaking a large cluster into multiple smaller 
> clusters (or vice versa), moving a connector from a cluster running in one 
> data center to another, etc. A connector migration could be achieved by using 
> the following sequence of steps:
>  # Stop the running connector on the original Kafka Connect cluster
>  # Retrieve the offsets for the connector via the {{GET 
> /connectors/\{connector}/offsets}}  endpoint
>  # Create the connector in a stopped state using the same configuration on 
> the new Kafka Connect cluster
>  # Alter the offsets for the connector on the new cluster via the {{PATCH 
> /connectors/\{connector}/offsets}}  endpoint (using the offsets obtained from 
> the original cluster)
>  # Resume the connector on the new cluster and delete it on the original 
> cluster
> Another use case for creating connectors in a stopped state could be 
> deploying connectors as a part of a larger data pipeline before the source / 
> sink data system has been created or is ready for data transfer.
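
For illustration, a minimal sketch of the migration sequence enumerated above against the Connect REST API. The worker URLs and connector name are placeholders, and the "initial_state" field is the one proposed by KIP-980, so treat the exact payloads as assumptions rather than a definitive client:

{code:java}
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class ConnectorMigrationSketch {
    public static void main(String[] args) throws Exception {
        HttpClient http = HttpClient.newHttpClient();
        String oldWorker = "http://old-connect:8083";   // assumed worker URLs
        String newWorker = "http://new-connect:8083";
        String name = "my-connector";                   // assumed connector name
        String config = "{}";                           // replace with the connector's actual config JSON

        // 1. Stop the running connector on the original cluster.
        http.send(HttpRequest.newBuilder(URI.create(oldWorker + "/connectors/" + name + "/stop"))
                .PUT(HttpRequest.BodyPublishers.noBody()).build(), HttpResponse.BodyHandlers.discarding());

        // 2. Retrieve its committed offsets.
        String offsets = http.send(HttpRequest.newBuilder(URI.create(oldWorker + "/connectors/" + name + "/offsets"))
                .GET().build(), HttpResponse.BodyHandlers.ofString()).body();

        // 3. Create the connector on the new cluster in a STOPPED state
        //    (the "initial_state" field proposed by KIP-980).
        String createBody = "{\"name\":\"" + name + "\",\"config\":" + config + ",\"initial_state\":\"STOPPED\"}";
        http.send(HttpRequest.newBuilder(URI.create(newWorker + "/connectors"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(createBody)).build(), HttpResponse.BodyHandlers.discarding());

        // 4. Seed the new connector with the old offsets; the GET response body
        //    can be reused as the PATCH request body.
        http.send(HttpRequest.newBuilder(URI.create(newWorker + "/connectors/" + name + "/offsets"))
                .header("Content-Type", "application/json")
                .method("PATCH", HttpRequest.BodyPublishers.ofString(offsets)).build(), HttpResponse.BodyHandlers.discarding());

        // 5. Resume on the new cluster, then delete from the original one.
        http.send(HttpRequest.newBuilder(URI.create(newWorker + "/connectors/" + name + "/resume"))
                .PUT(HttpRequest.BodyPublishers.noBody()).build(), HttpResponse.BodyHandlers.discarding());
        http.send(HttpRequest.newBuilder(URI.create(oldWorker + "/connectors/" + name))
                .DELETE().build(), HttpResponse.BodyHandlers.discarding());
    }
}
{code}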



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-15339) Transient I/O error happening in appending records could lead to the halt of whole cluster

2023-12-26 Thread Greg Harris (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Harris updated KAFKA-15339:

Component/s: KafkaConnect
 (was: connect)

> Transient I/O error happening in appending records could lead to the halt of 
> whole cluster
> --
>
> Key: KAFKA-15339
> URL: https://issues.apache.org/jira/browse/KAFKA-15339
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect, producer 
>Affects Versions: 3.5.0
>Reporter: Haoze Wu
>Priority: Major
>
> We are running an integration test in which we start an embedded Connect 
> cluster on the active 3.5 branch. However, because of a transient disk error, 
> we may encounter an IOException while appending records to one topic, as 
> shown in the stack trace: 
> {code:java}
> [2023-08-13 16:53:51,016] ERROR Error while appending records to 
> connect-config-topic-connect-cluster-0 in dir 
> /tmp/EmbeddedKafkaCluster8003464883598783225 
> (org.apache.kafka.storage.internals.log.LogDirFailureChannel:61)
> java.io.IOException: 
>         at 
> org.apache.kafka.common.record.MemoryRecords.writeFullyTo(MemoryRecords.java:92)
>         at 
> org.apache.kafka.common.record.FileRecords.append(FileRecords.java:188)
>         at kafka.log.LogSegment.append(LogSegment.scala:161)
>         at kafka.log.LocalLog.append(LocalLog.scala:436)
>         at kafka.log.UnifiedLog.append(UnifiedLog.scala:853)
>         at kafka.log.UnifiedLog.appendAsLeader(UnifiedLog.scala:664)
>         at 
> kafka.cluster.Partition.$anonfun$appendRecordsToLeader$1(Partition.scala:1281)
>         at kafka.cluster.Partition.appendRecordsToLeader(Partition.scala:1269)
>         at 
> kafka.server.ReplicaManager.$anonfun$appendToLocalLog$6(ReplicaManager.scala:977)
>         at 
> scala.collection.StrictOptimizedMapOps.map(StrictOptimizedMapOps.scala:28)
>         at 
> scala.collection.StrictOptimizedMapOps.map$(StrictOptimizedMapOps.scala:27)
>         at scala.collection.mutable.HashMap.map(HashMap.scala:35)
>         at 
> kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:965)
>         at kafka.server.ReplicaManager.appendRecords(ReplicaManager.scala:623)
>         at kafka.server.KafkaApis.handleProduceRequest(KafkaApis.scala:680)
>         at kafka.server.KafkaApis.handle(KafkaApis.scala:180)
>         at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:76)
>         at java.lang.Thread.run(Thread.java:748) {code}
> However, just because appending records to one partition failed, the 
> fetchers for all the other partitions are removed, the broker shuts down, and 
> finally the embedded Connect cluster is killed as a whole. 
> {code:java}
> [2023-08-13 17:35:37,966] WARN Stopping serving logs in dir 
> /tmp/EmbeddedKafkaCluster6777164631574762227 (kafka.log.LogManager:70)
> [2023-08-13 17:35:37,968] ERROR Shutdown broker because all log dirs in 
> /tmp/EmbeddedKafkaCluster6777164631574762227 have failed 
> (kafka.log.LogManager:143)
> [2023-08-13 17:35:37,968] WARN Abrupt service halt with code 1 and message 
> null (org.apache.kafka.connect.util.clusters.EmbeddedConnectCluster:130)
> [2023-08-13 17:35:37,968] ERROR [LogDirFailureHandler]: Error due to 
> (kafka.server.ReplicaManager$LogDirFailureHandler:135)
> org.apache.kafka.connect.util.clusters.UngracefulShutdownException: Abrupt 
> service halt with code 1 and message null {code}
> I am wondering if we could add a configurable retry around the root cause to 
> tolerate such transient I/O faults so that, if the retry succeeds, the 
> embedded Connect cluster can keep operating. 
> Any comments and suggestions would be appreciated.
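
A minimal sketch of the kind of bounded retry being proposed (purely illustrative; the names and placement are assumptions, not the actual log-append code path):

{code:java}
import java.io.IOException;

public final class TransientIoRetrySketch {
    interface IoAction { void run() throws IOException; }

    // Retry a possibly-transient I/O operation a bounded number of times before
    // giving up and letting the existing LogDirFailureChannel handling take over.
    static void runWithRetries(IoAction action, int maxAttempts, long backoffMs) throws IOException {
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                action.run();
                return;
            } catch (IOException e) {
                last = e;
                try {
                    Thread.sleep(backoffMs);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw e;
                }
            }
        }
        throw last;
    }
}
{code}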



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-15680) Partition-Count is not getting updated Correctly in the Incremental Co-operative Rebalancing(ICR) Mode of Rebalancing

2023-12-26 Thread Greg Harris (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Harris updated KAFKA-15680:

Component/s: KafkaConnect
 (was: connect)

> Partition-Count is not getting updated Correctly in the Incremental 
> Co-operative Rebalancing(ICR) Mode of Rebalancing
> -
>
> Key: KAFKA-15680
> URL: https://issues.apache.org/jira/browse/KAFKA-15680
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 3.0.1
>Reporter: Pritam Kumar
>Assignee: Pritam Kumar
>Priority: Minor
> Fix For: 3.7.0, 3.6.1
>
>
> * In ICR (Incremental Cooperative Rebalancing) mode, whenever a new worker, 
> say Worker 3, joins, a new global assignment is computed by the leader, say 
> Worker1, that results in the revocation of some tasks from each existing 
> worker, i.e. Worker1 and Worker2.
>  * Once the new member join is completed, the 
> *ConsumerCoordinator.onJoinComplete()* method is called, which primarily 
> computes the newly assigned partitions and the revoked partitions and 
> updates the subscription object.
>  * If partitions were revoked, which we detect by checking the 
> "partitionsRevoked" list, we call the *invokePartitionsRevoked()* method, 
> which internally calls *updatePartitionCount()*, which fetches partitions 
> from the *assignment* object that has not yet been updated with the new 
> assignment.
>  * It is only just before calling the *invokePartitionsAssigned()* 
> method that we update the *assignment* by invoking the following → 
> *subscriptions.assignFromSubscribed(assignedPartitions);*



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-15855) RFC 9266: Channel Bindings for TLS 1.3 support | SCRAM-SHA-*-PLUS variants

2023-12-26 Thread Greg Harris (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Harris updated KAFKA-15855:

Component/s: KafkaConnect
 (was: connect)

> RFC 9266: Channel Bindings for TLS 1.3 support | SCRAM-SHA-*-PLUS variants
> --
>
> Key: KAFKA-15855
> URL: https://issues.apache.org/jira/browse/KAFKA-15855
> Project: Kafka
>  Issue Type: Bug
>  Components: core, KafkaConnect, security
>Reporter: Neustradamus
>Priority: Critical
>  Labels: security
>
> Dear Apache and Kafka teams,
> Can you add support for RFC 9266: Channel Bindings for TLS 1.3?
> - [https://datatracker.ietf.org/doc/html/rfc9266]
> A few details, for quick reference:
> - tls-unique for TLS <= 1.2
> - tls-server-end-point
> - tls-exporter for TLS 1.3
> It is needed for SCRAM-SHA-*-PLUS variants.
> Note: Some SCRAM-SHA are already supported.
> I think that you have seen the jabber.ru MITM and Channel Binding is the 
> solution:
> - [https://notes.valdikss.org.ru/jabber.ru-mitm/]
> - [https://snikket.org/blog/on-the-jabber-ru-mitm/]
> - [https://www.devever.net/~hl/xmpp-incident]
> - [https://blog.jmp.chat/b/certwatch]
> IETF links:
> SCRAM-SHA-1(-PLUS):
> - RFC5802: Salted Challenge Response Authentication Mechanism (SCRAM) SASL 
> and GSS-API Mechanisms: [https://tools.ietf.org/html/rfc5802] // July 2010
> - RFC6120: Extensible Messaging and Presence Protocol (XMPP): Core: 
> [https://tools.ietf.org/html/rfc6120] // March 2011
> SCRAM-SHA-256(-PLUS):
> - RFC7677: SCRAM-SHA-256 and SCRAM-SHA-256-PLUS Simple Authentication and 
> Security Layer (SASL) Mechanisms: [https://tools.ietf.org/html/rfc7677] // 
> 2015-11-02
> - RFC8600: Using Extensible Messaging and Presence Protocol (XMPP) for 
> Security Information Exchange: [https://tools.ietf.org/html/rfc8600] // 
> 2019-06-21: 
> [https://mailarchive.ietf.org/arch/msg/ietf-announce/suJMmeMhuAOmGn_PJYgX5Vm8lNA]
> SCRAM-SHA-512(-PLUS):
> - [https://tools.ietf.org/html/draft-melnikov-scram-sha-512]
> SCRAM-SHA3-512(-PLUS):
> - [https://tools.ietf.org/html/draft-melnikov-scram-sha3-512]
> SCRAM BIS: Salted Challenge Response Authentication Mechanism (SCRAM) SASL 
> and GSS-API Mechanisms:
> - [https://tools.ietf.org/html/draft-melnikov-scram-bis]
> -PLUS variants:
> - RFC5056: On the Use of Channel Bindings to Secure Channels: 
> [https://tools.ietf.org/html/rfc5056] // November 2007
> - RFC5929: Channel Bindings for TLS: [https://tools.ietf.org/html/rfc5929] // 
> July 2010
> - Channel-Binding Types: 
> [https://www.iana.org/assignments/channel-binding-types/channel-binding-types.xhtml]
> - RFC9266: Channel Bindings for TLS 1.3: 
> [https://tools.ietf.org/html/rfc9266] // July 2022
> IMAP:
> - RFC9051: Internet Message Access Protocol (IMAP) - Version 4rev2: 
> [https://tools.ietf.org/html/rfc9051] // August 2021
> LDAP:
> - RFC5803: Lightweight Directory Access Protocol (LDAP) Schema for Storing 
> Salted: Challenge Response Authentication Mechanism (SCRAM) Secrets: 
> [https://tools.ietf.org/html/rfc5803] // July 2010
> HTTP:
> - RFC7804: Salted Challenge Response HTTP Authentication Mechanism: 
> [https://tools.ietf.org/html/rfc7804] // March 2016
> JMAP:
> - RFC8621: The JSON Meta Application Protocol (JMAP) for Mail: 
> [https://tools.ietf.org/html/rfc8621] // August 2019
> 2FA:
> - Extensions to Salted Challenge Response (SCRAM) for 2 factor 
> authentication: [https://tools.ietf.org/html/draft-ietf-kitten-scram-2fa]
> Thanks in advance.
> Linked to:
> - [https://github.com/scram-sasl/info/issues/1]
> Note: This ticket can be for other Apache projects too.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-15335) Support custom SSL configuration for Kafka Connect RestServer

2023-12-26 Thread Greg Harris (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Harris updated KAFKA-15335:

Component/s: KafkaConnect
 (was: connect)

> Support custom SSL configuration for Kafka Connect RestServer
> -
>
> Key: KAFKA-15335
> URL: https://issues.apache.org/jira/browse/KAFKA-15335
> Project: Kafka
>  Issue Type: New Feature
>  Components: KafkaConnect
>Reporter: Taras Ledkov
>Priority: Major
>
> Root issue to track 
> [KIP-967|https://cwiki.apache.org/confluence/display/KAFKA/KIP-967%3A+Support+custom+SSL+configuration+for+Kafka+Connect+RestServer]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-15310) Add timezone configuration option in TimestampConverter from connectors

2023-12-26 Thread Greg Harris (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Harris updated KAFKA-15310:

Component/s: KafkaConnect
 (was: connect)

> Add timezone configuration option in TimestampConverter from connectors
> ---
>
> Key: KAFKA-15310
> URL: https://issues.apache.org/jira/browse/KAFKA-15310
> Project: Kafka
>  Issue Type: New Feature
>  Components: config, KafkaConnect
>Reporter: Romulo Souza
>Priority: Minor
>  Labels: needs-kip
> Attachments: Captura de tela de 2023-08-05 09-43-54-1.png, Captura de 
> tela de 2023-08-05 09-44-25-1.png
>
>
> In some scenarios where TimestampConverter is used, it is useful to have an 
> option to choose a specific timezone other than the hardcoded UTC. E.g., there 
> are use cases where a sink connector sends data to a database and that same 
> data is consumed by an analysis tool without formatting and transformation 
> options.
> A new optional Kafka Connect configuration should be added to set the desired 
> timezone, falling back to UTC when not specified.
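
A sketch of what such an option could look like (the property name "timezone" and its placement are assumptions for illustration, not the shipped configuration):

{code:java}
import org.apache.kafka.common.config.ConfigDef;

public class TimestampConverterTimezoneSketch {
    // Hypothetical additional option on the TimestampConverter SMT, defaulting to UTC.
    static ConfigDef configDef() {
        return new ConfigDef()
                .define("timezone", ConfigDef.Type.STRING, "UTC", ConfigDef.Importance.LOW,
                        "Time zone used when formatting or parsing timestamps; defaults to UTC.");
    }
}
{code}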



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-15208) Upgrade Jackson dependencies to version 2.16.0

2023-12-26 Thread Greg Harris (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15208?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Harris updated KAFKA-15208:

Component/s: KafkaConnect
 (was: connect)

> Upgrade Jackson dependencies to version 2.16.0
> --
>
> Key: KAFKA-15208
> URL: https://issues.apache.org/jira/browse/KAFKA-15208
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect, kraft
>Reporter: Said BOUDJELDA
>Assignee: Said BOUDJELDA
>Priority: Major
>  Labels: dependencies
> Fix For: 3.7.0
>
>
> Upgrading the Jackson dependencies to the latest stable version, 2.16.0, 
> brings many bug fixes, security fixes, and performance improvements. 
>  
> Check release notes back to the current version 
> [https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.16.0]
> [https://github.com/FasterXML/jackson/wiki/Jackson-Release-2.14]
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-15203) Remove dependency on Reflections

2023-12-26 Thread Greg Harris (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Harris updated KAFKA-15203:

Component/s: KafkaConnect
 (was: connect)

> Remove dependency on Reflections 
> -
>
> Key: KAFKA-15203
> URL: https://issues.apache.org/jira/browse/KAFKA-15203
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Reporter: Divij Vaidya
>Priority: Major
>  Labels: newbie
>
> We currently depend on the Reflections library, which is EOL. Quoting from the 
> GitHub site:
> _> Please note: Reflections library is currently NOT under active development 
> or maintenance_
>  
> This poses a supply-chain risk for our project, where security fixes and 
> other major bugs in the underlying dependency may not be addressed in a 
> timely manner.
> Hence, we should plan to remove this dependency.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-15333) Flaky build failure throwing Connect Exception: Could not connect to server....

2023-12-26 Thread Greg Harris (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Harris updated KAFKA-15333:

Component/s: KafkaConnect
 (was: connect)

> Flaky build failure throwing Connect Exception: Could not connect to 
> server
> ---
>
> Key: KAFKA-15333
> URL: https://issues.apache.org/jira/browse/KAFKA-15333
> Project: Kafka
>  Issue Type: Test
>  Components: KafkaConnect, unit tests
>Reporter: Philip Nee
>Priority: Major
>
> We frequently observe flaky build failures with the following message. This is 
> from the most recent PR after 3.5.0:
>  
> {code:java}
> > Task :generator:testClasses UP-TO-DATE
> Unexpected exception thrown.
> org.gradle.internal.remote.internal.MessageIOException: Could not read 
> message from '/127.0.0.1:38354'.
>   at 
> org.gradle.internal.remote.internal.inet.SocketConnection.receive(SocketConnection.java:94)
>   at 
> org.gradle.internal.remote.internal.hub.MessageHub$ConnectionReceive.run(MessageHub.java:270)
>   at 
> org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:64)
>   at 
> org.gradle.internal.concurrent.AbstractManagedExecutor$1.run(AbstractManagedExecutor.java:47)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
>   at java.lang.Thread.run(Thread.java:750)
> Caused by: java.lang.IllegalArgumentException
>   at 
> org.gradle.internal.remote.internal.hub.InterHubMessageSerializer$MessageReader.read(InterHubMessageSerializer.java:72)
>   at 
> org.gradle.internal.remote.internal.hub.InterHubMessageSerializer$MessageReader.read(InterHubMessageSerializer.java:52)
>   at 
> org.gradle.internal.remote.internal.inet.SocketConnection.receive(SocketConnection.java:81)
>   ... 6 more
> > Task :streams:upgrade-system-tests-26:unitTest
> org.gradle.internal.remote.internal.ConnectException: Could not connect to 
> server [3156f144-9a89-4c47-91ad-88a8378ec726 port:37889, 
> addresses:[/127.0.0.1]]. Tried addresses: [/127.0.0.1].
>   at 
> org.gradle.internal.remote.internal.inet.TcpOutgoingConnector.connect(TcpOutgoingConnector.java:67)
>   at 
> org.gradle.internal.remote.internal.hub.MessageHubBackedClient.getConnection(MessageHubBackedClient.java:36)
>   at 
> org.gradle.process.internal.worker.child.SystemApplicationClassLoaderWorker.call(SystemApplicationClassLoaderWorker.java:103)
>   at 
> org.gradle.process.internal.worker.child.SystemApplicationClassLoaderWorker.call(SystemApplicationClassLoaderWorker.java:65)
>   at 
> worker.org.gradle.process.internal.worker.GradleWorkerMain.run(GradleWorkerMain.java:69)
>   at 
> worker.org.gradle.process.internal.worker.GradleWorkerMain.main(GradleWorkerMain.java:74)
> Caused by: java.net.ConnectException: Connection refused
>   at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>   at 
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:716)
>   at sun.nio.ch.SocketAdaptor.connect(SocketAdaptor.java:122)
>   at 
> org.gradle.internal.remote.internal.inet.TcpOutgoingConnector.tryConnect(TcpOutgoingConnector.java:81)
>   at 
> org.gradle.internal.remote.internal.inet.TcpOutgoingConnector.connect(TcpOutgoingConnector.java:54)
>   ... 5 more {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-16051) Deadlock on connector initialization

2023-12-26 Thread Greg Harris (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17800522#comment-17800522
 ] 

Greg Harris commented on KAFKA-16051:
-

Hi [~developster], thanks for the report! That certainly looks like a deadlock 
that the standalone implementation should avoid. If you are interested, please 
assign this ticket to yourself and tag me to review your PR!

If you're just looking for a workaround, I would suggest using the distributed 
mode. It shouldn't have the same deadlock you're seeing, and is more similar to 
typical production environments.

> Deadlock on connector initialization
> 
>
> Key: KAFKA-16051
> URL: https://issues.apache.org/jira/browse/KAFKA-16051
> Project: Kafka
>  Issue Type: Bug
>  Components: connect
>Affects Versions: 2.6.3, 3.6.1
>Reporter: Octavian Ciubotaru
>Priority: Major
>
>  
> Tested with Kafka 3.6.1 and 2.6.3.
> The only plugin installed is confluentinc-kafka-connect-jdbc-10.7.4.
> Stack trace for Kafka 3.6.1:
> {noformat}
> Found one Java-level deadlock:
> =
> "pool-3-thread-1":
>   waiting to lock monitor 0x7fbc88006300 (object 0x91002aa0, a 
> org.apache.kafka.connect.runtime.standalone.StandaloneHerder),
>   which is held by "Thread-9"
> "Thread-9":
>   waiting to lock monitor 0x7fbc88008800 (object 0x9101ccd8, a 
> org.apache.kafka.connect.storage.MemoryConfigBackingStore),
>   which is held by "pool-3-thread-1"Java stack information for the threads 
> listed above:
> ===
> "pool-3-thread-1":
>     at 
> org.apache.kafka.connect.runtime.standalone.StandaloneHerder$ConfigUpdateListener.onTaskConfigUpdate(StandaloneHerder.java:516)
>     - waiting to lock <0x91002aa0> (a 
> org.apache.kafka.connect.runtime.standalone.StandaloneHerder)
>     at 
> org.apache.kafka.connect.storage.MemoryConfigBackingStore.putTaskConfigs(MemoryConfigBackingStore.java:137)
>     - locked <0x9101ccd8> (a 
> org.apache.kafka.connect.storage.MemoryConfigBackingStore)
>     at 
> org.apache.kafka.connect.runtime.standalone.StandaloneHerder.updateConnectorTasks(StandaloneHerder.java:483)
>     at 
> org.apache.kafka.connect.runtime.standalone.StandaloneHerder.lambda$null$2(StandaloneHerder.java:229)
>     at 
> org.apache.kafka.connect.runtime.standalone.StandaloneHerder$$Lambda$692/0x000840557440.run(Unknown
>  Source)
>     at 
> java.util.concurrent.Executors$RunnableAdapter.call(java.base@11.0.21/Executors.java:515)
>     at 
> java.util.concurrent.FutureTask.run(java.base@11.0.21/FutureTask.java:264)
>     at 
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(java.base@11.0.21/ScheduledThreadPoolExecutor.java:304)
>     at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(java.base@11.0.21/ThreadPoolExecutor.java:1128)
>     at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(java.base@11.0.21/ThreadPoolExecutor.java:628)
>     at java.lang.Thread.run(java.base@11.0.21/Thread.java:829)
> "Thread-9":
>     at 
> org.apache.kafka.connect.storage.MemoryConfigBackingStore.putTaskConfigs(MemoryConfigBackingStore.java:129)
>     - waiting to lock <0x9101ccd8> (a 
> org.apache.kafka.connect.storage.MemoryConfigBackingStore)
>     at 
> org.apache.kafka.connect.runtime.standalone.StandaloneHerder.updateConnectorTasks(StandaloneHerder.java:483)
>     at 
> org.apache.kafka.connect.runtime.standalone.StandaloneHerder.requestTaskReconfiguration(StandaloneHerder.java:255)
>     - locked <0x91002aa0> (a 
> org.apache.kafka.connect.runtime.standalone.StandaloneHerder)
>     at 
> org.apache.kafka.connect.runtime.HerderConnectorContext.requestTaskReconfiguration(HerderConnectorContext.java:50)
>     at 
> org.apache.kafka.connect.runtime.WorkerConnector$WorkerConnectorContext.requestTaskReconfiguration(WorkerConnector.java:548)
>     at 
> io.confluent.connect.jdbc.source.TableMonitorThread.run(TableMonitorThread.java:86)
> Found 1 deadlock.
> {noformat}
> The JDBC source connector loads the tables from the database and updates the 
> configuration once the list is available. The deadlock is very consistent in 
> my environment, probably because the database is on the same machine.
> Maybe it is possible to avoid this situation by always locking the herder 
> first and the config backing store second. From what I see, 
> updateConnectorTasks is sometimes called with the herder already locked and 
> other times it is not.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-16047) Source connector with EOS enabled have some InitProducerId requests timing out, effectively failing all the tasks & the whole connector

2023-12-26 Thread Greg Harris (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-16047?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17800520#comment-17800520
 ] 

Greg Harris commented on KAFKA-16047:
-

It appears that the transaction timeout is used as the timeout to produce 
records to the __transaction_state topic in 
TransactionStateManager#appendTransactionToLog 
[https://github.com/apache/kafka/blob/d582d5aff517879b150bc2739bad99df07e15e2b/core/src/main/scala/kafka/coordinator/transaction/TransactionStateManager.scala#L769-L770]
which is in turn used during init, end, and add-partitions in 
TransactionCoordinator: 
[https://github.com/apache/kafka/blob/d582d5aff517879b150bc2739bad99df07e15e2b/core/src/main/scala/kafka/coordinator/transaction/TransactionCoordinator.scala#L199]
 

I see that after expiration, 
TransactionStateManager#writeTombstonesForExpiredTransactionalIds uses the 
request.timeout.ms: 
[https://github.com/apache/kafka/blob/d582d5aff517879b150bc2739bad99df07e15e2b/core/src/main/scala/kafka/coordinator/transaction/TransactionStateManager.scala#L283-L284]
 

Should the TransactionStateManager be using the transaction timeout ms for a 
single operation, or should the request timeout be used throughout? Using the 
transaction timeout is certainly shorter than we expect, but I wonder if that's 
also longer than people expect in other situations. When the producer makes 
these requests, it only waits for max.block.ms (default 60s) for them to 
complete. After that point, the producer times out the request, while the 
broker may be left waiting for the transaction timeout (default 60s) to expire.

* We can fix this on the broker side by changing the produce timeout to the 
value of "max(transaction timeout, request timeout)". Someone may have 
increased their max.block.ms & transaction timeout while keeping the request 
timeout short, and would still desire that the init call block for up to 
max.block.ms.
* We can fix this on the client side by changing the hard-coded 1 ms value to use 
the same timeout as the overall fenceProducers request, which is either the 
default request timeout (configurable) or specified via the Options argument. Both 
options are sketched below.
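
A minimal sketch of the two options above (illustrative only; method and parameter names are assumptions, not the actual TransactionStateManager or FenceProducersHandler code):

{code:java}
public final class FenceTimeoutSketch {
    // Broker-side option: bound the produce to __transaction_state by the larger
    // of the transaction timeout and the request timeout.
    static int brokerProduceTimeoutMs(int transactionTimeoutMs, int requestTimeoutMs) {
        return Math.max(transactionTimeoutMs, requestTimeoutMs);
    }

    // Client-side option: reuse the fenceProducers request timeout for the
    // InitProducerId transaction timeout instead of a hard-coded 1 ms.
    static int clientInitProducerIdTimeoutMs(Integer optionsTimeoutMs, int defaultRequestTimeoutMs) {
        return optionsTimeoutMs != null ? optionsTimeoutMs : defaultRequestTimeoutMs;
    }
}
{code}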

> Source connector with EOS enabled have some InitProducerId requests timing 
> out, effectively failing all the tasks & the whole connector
> ---
>
> Key: KAFKA-16047
> URL: https://issues.apache.org/jira/browse/KAFKA-16047
> Project: Kafka
>  Issue Type: Bug
>  Components: connect, mirrormaker
>Affects Versions: 3.3.0, 3.4.0, 3.3.1, 3.3.2, 3.4.1, 3.6.0, 3.5.1, 3.5.2, 
> 3.6.1
>Reporter: Angelos Kaltsikis
>Priority: Major
>
> Source connectors with 'exactly.once.support = required' may have some of 
> their tasks time out on the InitProducerId requests issued from the admin 
> client. In the case of MirrorSourceConnector, which is the source connector 
> where I found the bug, this effectively made all the tasks become "FAILED". 
> As soon as one of the tasks gets FAILED due to the 
> 'COORDINATOR_NOT_AVAILABLE' messages (caused by timeouts), no matter how many 
> times I restarted the connector/tasks, I couldn't get the 
> MirrorSourceConnector into a healthy RUNNING state again.
> Due to the low timeout that has been [hard-coded in the 
> code|https://github.com/apache/kafka/blob/3.6.1/clients/src/main/java/org/apache/kafka/clients/admin/internals/FenceProducersHandler.java#L87]
>  (1ms), there is a chance that the `InitProducerId` requests time out in the 
> case of "slower-than-expected" Kafka brokers (that do not process & respond 
> to the above request in <= 1ms). (Feel free to read more about the issue in 
> the "More Context" section below.)
> [~ChrisEgerton] I would appreciate it if you could respond to the following 
> questions:
> - How and why was the 1ms magic number for the transaction timeout chosen?
> - Is there any specific reason it can be guaranteed that the 
> `InitProducerId` request can be processed in such a small time window? 
> - I have tried the above in multiple different Kafka clusters that are hosted 
> on different underlying datacenter hosts and I don't believe that those 
> brokers are "slow" for some reason. If you feel that the brokers are slower 
> than expected, I would appreciate any pointers on how I could find out what 
> the bottleneck is.
> h3. Temporary Mitigation
> I have increased the timeout to 1000ms (randomly picked this number, just 
> wanted to give the brokers enough time to always complete those types of 
> requests). The fix can be found in my fork 
> https://github.com/akaltsikis/kafka/commit/8a47992e7dc63954f9d9ac54e8ed1f5a7737c97f
>  
> h3. Final solution

[jira] [Resolved] (KAFKA-15111) Correction kafka examples

2023-12-12 Thread Greg Harris (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Harris resolved KAFKA-15111.
-
Resolution: Duplicate

> Correction kafka examples
> -
>
> Key: KAFKA-15111
> URL: https://issues.apache.org/jira/browse/KAFKA-15111
> Project: Kafka
>  Issue Type: Task
>Reporter: Dmitry
>Priority: Minor
> Fix For: 3.6.0
>
>
> Need to set TOPIC_NAME = topic1 in the KafkaConsumerProducerDemo class and 
> remove the unused TOPIC field from KafkaProperties.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-15111) Correction kafka examples

2023-12-12 Thread Greg Harris (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Harris updated KAFKA-15111:

Fix Version/s: 3.6.0
   (was: 3.7.0)

> Correction kafka examples
> -
>
> Key: KAFKA-15111
> URL: https://issues.apache.org/jira/browse/KAFKA-15111
> Project: Kafka
>  Issue Type: Task
>Reporter: Dmitry
>Priority: Minor
> Fix For: 3.6.0
>
>
> Need to set TOPIC_NAME = topic1 in the KafkaConsumerProducerDemo class and 
> remove the unused TOPIC field from KafkaProperties.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15372) MM2 rolling restart can drop configuration changes silently

2023-12-12 Thread Greg Harris (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17795958#comment-17795958
 ] 

Greg Harris commented on KAFKA-15372:
-

This is now set to release for 3.7 and 3.6, but I had some issues with the 3.5 
backport that I had to revert. In particular, the DedicatedMirrorTest has this 
persistent failure:
{noformat}
    org.apache.kafka.test.NoRetryException
        at 
app//org.apache.kafka.connect.mirror.integration.DedicatedMirrorIntegrationTest.lambda$awaitTaskConfigurations$8(DedicatedMirrorIntegrationTest.java:363)
        at 
app//org.apache.kafka.test.TestUtils.lambda$waitForCondition$4(TestUtils.java:337)
        at 
app//org.apache.kafka.test.TestUtils.retryOnExceptionWithTimeout(TestUtils.java:385)
        at 
app//org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:334)
        at 
app//org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:318)
        at 
app//org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:308)
        at 
app//org.apache.kafka.connect.mirror.integration.DedicatedMirrorIntegrationTest.awaitTaskConfigurations(DedicatedMirrorIntegrationTest.java:353)
        at 
app//org.apache.kafka.connect.mirror.integration.DedicatedMirrorIntegrationTest.testMultiNodeCluster(DedicatedMirrorIntegrationTest.java:301)
        
Caused by:
        java.util.concurrent.ExecutionException: 
org.apache.kafka.connect.runtime.distributed.RebalanceNeededException: Request 
cannot be completed because a rebalance is expected
            at 
org.apache.kafka.connect.util.ConvertingFutureCallback.result(ConvertingFutureCallback.java:123)
            at 
org.apache.kafka.connect.util.ConvertingFutureCallback.get(ConvertingFutureCallback.java:115)
            at 
org.apache.kafka.connect.mirror.integration.DedicatedMirrorIntegrationTest.lambda$awaitTaskConfigurations$8(DedicatedMirrorIntegrationTest.java:357)
            ... 7 more            
Caused by:
            
org.apache.kafka.connect.runtime.distributed.RebalanceNeededException: Request 
cannot be completed because a rebalance is expected{noformat}

> MM2 rolling restart can drop configuration changes silently
> ---
>
> Key: KAFKA-15372
> URL: https://issues.apache.org/jira/browse/KAFKA-15372
> Project: Kafka
>  Issue Type: Bug
>  Components: mirrormaker
>Reporter: Daniel Urban
>Assignee: Greg Harris
>Priority: Major
> Fix For: 3.7.0, 3.6.2
>
>
> When MM2 is restarted, it tries to update the Connector configuration in all 
> flows. This is a one-time trial, and fails if the Connect worker is not the 
> leader of the group.
> In a distributed setup and with a rolling restart, it is possible that for a 
> specific flow, the Connect worker of the just restarted MM2 instance is not 
> the leader, meaning that Connector configurations can get dropped.
> For example, assuming 2 MM2 instances, and one flow A->B:
>  # MM2 instance 1 is restarted, the worker inside MM2 instance 2 becomes the 
> leader of A->B Connect group.
>  # MM2 instance 1 tries to update the Connector configurations, but fails 
> (instance 2 has the leader, not instance 1)
>  # MM2 instance 2 is restarted, leadership moves to worker in MM2 instance 1
>  # MM2 instance 2 tries to update the Connector configurations, but fails
> At this point, the configuration changes before the restart are never 
> applied. Many times, this can also happen silently, without any indication.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-15372) MM2 rolling restart can drop configuration changes silently

2023-12-12 Thread Greg Harris (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Harris updated KAFKA-15372:

Fix Version/s: 3.6.2

> MM2 rolling restart can drop configuration changes silently
> ---
>
> Key: KAFKA-15372
> URL: https://issues.apache.org/jira/browse/KAFKA-15372
> Project: Kafka
>  Issue Type: Bug
>  Components: mirrormaker
>Reporter: Daniel Urban
>Assignee: Greg Harris
>Priority: Major
> Fix For: 3.7.0, 3.6.2
>
>
> When MM2 is restarted, it tries to update the Connector configuration in all 
> flows. This is a one-time trial, and fails if the Connect worker is not the 
> leader of the group.
> In a distributed setup and with a rolling restart, it is possible that for a 
> specific flow, the Connect worker of the just restarted MM2 instance is not 
> the leader, meaning that Connector configurations can get dropped.
> For example, assuming 2 MM2 instances, and one flow A->B:
>  # MM2 instance 1 is restarted, the worker inside MM2 instance 2 becomes the 
> leader of A->B Connect group.
>  # MM2 instance 1 tries to update the Connector configurations, but fails 
> (instance 2 has the leader, not instance 1)
>  # MM2 instance 2 is restarted, leadership moves to worker in MM2 instance 1
>  # MM2 instance 2 tries to update the Connector configurations, but fails
> At this point, the configuration changes before the restart are never 
> applied. Many times, this can also happen silently, without any indication.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (KAFKA-15906) Emit offset syncs more often than offset.lag.max for low-throughput/finite partitions

2023-12-07 Thread Greg Harris (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Harris reassigned KAFKA-15906:
---

Assignee: Greg Harris

> Emit offset syncs more often than offset.lag.max for low-throughput/finite 
> partitions
> -
>
> Key: KAFKA-15906
> URL: https://issues.apache.org/jira/browse/KAFKA-15906
> Project: Kafka
>  Issue Type: Improvement
>  Components: mirrormaker
>Reporter: Greg Harris
>Assignee: Greg Harris
>Priority: Minor
>
> Right now, the offset.lag.max configuration limits the number of offset syncs 
> emitted by the MirrorSourceTask, along with a fair rate-limiting 
> semaphore. After 100 records have been emitted for a partition, _and_ the 
> semaphore is available, an offset sync can be emitted.
> For low-volume topics, the `offset.lag.max` default of 100 is much more 
> restrictive than the rate-limiting semaphore. For example, a topic which 
> mirrors at the rate of 1 record/sec may take 100 seconds to receive an offset 
> sync. If the topic is actually finite, the last offset sync will never 
> arrive, and the translation will have a persistent lag.
> Instead, we can periodically flush the offset syncs for partitions that are 
> under the offset.lag.max limit, but have not received an offset sync 
> recently. This could be a new configuration, be a hard-coded time, or be 
> based on the existing emit.checkpoints.interval.seconds and 
> sync.group.offsets.interval.seconds configurations.
>  
> Alternatively, we could decrease the default `offset.lag.max` value to 0, and 
> rely on the fair semaphore to limit the number of syncs emitted for 
> high-throughput partitions. The semaphore is not currently configurable, so 
> users wanting lower throughput on the offset-syncs topic will still need an 
> offset.lag.max > 0.
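
A minimal sketch of the periodic-flush idea described above (the names and exact trigger are assumptions for illustration, not the MirrorSourceTask implementation):

{code:java}
public final class OffsetSyncFlushSketch {
    // Emit a sync when the usual lag threshold is crossed, OR when a partition is
    // below the threshold but has not been synced for a while (the proposed flush).
    static boolean shouldEmitOffsetSync(long recordsSinceLastSync, long msSinceLastSync,
                                        long offsetLagMax, long maxSyncIntervalMs) {
        return recordsSinceLastSync >= offsetLagMax || msSinceLastSync >= maxSyncIntervalMs;
    }
}
{code}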



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15985) Mirrormaker 2 offset sync is incomplete

2023-12-07 Thread Greg Harris (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17794433#comment-17794433
 ] 

Greg Harris commented on KAFKA-15985:
-

Hi [~Reamer], the default value of `offset.lag.max` is 100, which could cause 
the 31 < 100 lag you're seeing. You can work around this by setting 
`offset.lag.max` to `0`.

We're exploring a fix to this in KAFKA-15906.

> Mirrormaker 2 offset sync is incomplete
> ---
>
> Key: KAFKA-15985
> URL: https://issues.apache.org/jira/browse/KAFKA-15985
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 3.5.1
>Reporter: Philipp Dallig
>Priority: Major
>
> We are currently trying to migrate between two Kafka clusters using 
> Mirrormaker2
> new kafka cluster version: 7.5.2-ccs
> old kafka cluster version: kafka_2.13-2.8.0
> The Mirrormaker 2 process runs on the new cluster (target cluster).
> My main problem: The lag in the target cluster is not the same as in the 
> source cluster.
> I have set up a producer and consumer against the old Kafka cluster. If I 
> stop both, the lag in the old Kafka cluster is 0, while it is > 0 in the new 
> Kafka cluster.
> target cluster
> {code}
> GROUP        TOPIC                     PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG  CONSUMER-ID  HOST  CLIENT-ID
> test-sync-5  kafka-replication-test-5  0          3637            3668            31   -            -     -
> {code}
> source cluster
> {code}
> GROUP        TOPIC                     PARTITION  CURRENT-OFFSET  LOG-END-OFFSET  LAG  CONSUMER-ID  HOST  CLIENT-ID
> test-sync-5  kafka-replication-test-5  0          3668            3668            0    -            -     -
> {code}
> MM2 configuration without connection properties.
> {code}
> t-kafka->t-extkafka.enabled = true
> t-kafka->t-extkafka.topics = ops_filebeat, kafka-replication-.*
> t-kafka->t-extkafka.sync.topic.acls.enabled = false
> t-kafka->t-extkafka.sync.group.offsets.enabled = true
> t-kafka->t-extkafka.sync.group.offsets.interval.seconds = 30
> t-kafka->t-extkafka.refresh.groups.interval.seconds = 30
> t-kafka->t-extkafka.offset-syncs.topic.location = target
> t-kafka->t-extkafka.emit.checkpoints.interval.seconds = 30
> t-kafka->t-extkafka.replication.policy.class = 
> org.apache.kafka.connect.mirror.IdentityReplicationPolicy
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-15816) Typos in tests leak network sockets

2023-12-06 Thread Greg Harris (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Harris updated KAFKA-15816:

Description: 
There are a few tests which leak network sockets due to small typos in the 
tests themselves.

Clients: [https://github.com/apache/kafka/pull/14750] (DONE)
 * NioEchoServer
 * KafkaConsumerTest
 * KafkaProducerTest
 * SelectorTest
 * SslTransportLayerTest
 * SslTransportTls12Tls13Test
 * SslVersionsTransportLayerTest
 * SaslAuthenticatorTest

Core: [https://github.com/apache/kafka/pull/14754] 
 * MiniKdc
 * GssapiAuthenticationTest
 * MirrorMakerIntegrationTest
 * SocketServerTest
 * EpochDrivenReplicationProtocolAcceptanceTest
 * LeaderEpochIntegrationTest

Trogdor: [https://github.com/apache/kafka/pull/14771] 
 * AgentTest

Mirror: [https://github.com/apache/kafka/pull/14761] (DONE)
 * DedicatedMirrorIntegrationTest
 * MirrorConnectorsIntegrationTest
 * MirrorConnectorsWithCustomForwardingAdminIntegrationTest

Runtime: [https://github.com/apache/kafka/pull/14764] 
 * ConnectorTopicsIntegrationTest
 * ExactlyOnceSourceIntegrationTest
 * WorkerTest
 * WorkerGroupMemberTest

Streams: [https://github.com/apache/kafka/pull/14769] (DONE)
 * IQv2IntegrationTest
 * MetricsReporterIntegrationTest
 * NamedTopologyIntegrationTest
 * PurgeRepartitionTopicIntegrationTest

These can be addressed by just fixing the tests.

  was:
There are a few tests which leak network sockets due to small typos in the 
tests themselves.

Clients: [https://github.com/apache/kafka/pull/14750] (DONE)
 * NioEchoServer
 * KafkaConsumerTest
 * KafkaProducerTest
 * SelectorTest
 * SslTransportLayerTest
 * SslTransportTls12Tls13Test
 * SslVersionsTransportLayerTest
 * SaslAuthenticatorTest

Core: [https://github.com/apache/kafka/pull/14754] 
 * MiniKdc
 * GssapiAuthenticationTest
 * MirrorMakerIntegrationTest
 * SocketServerTest
 * EpochDrivenReplicationProtocolAcceptanceTest
 * LeaderEpochIntegrationTest

Trogdor: [https://github.com/apache/kafka/pull/14771] 
 * AgentTest

Mirror: [https://github.com/apache/kafka/pull/14761] 
 * DedicatedMirrorIntegrationTest
 * MirrorConnectorsIntegrationTest
 * MirrorConnectorsWithCustomForwardingAdminIntegrationTest

Runtime: [https://github.com/apache/kafka/pull/14764] 
 * ConnectorTopicsIntegrationTest
 * ExactlyOnceSourceIntegrationTest
 * WorkerTest
 * WorkerGroupMemberTest

Streams: [https://github.com/apache/kafka/pull/14769] (DONE)
 * IQv2IntegrationTest
 * MetricsReporterIntegrationTest
 * NamedTopologyIntegrationTest
 * PurgeRepartitionTopicIntegrationTest

These can be addressed by just fixing the tests.


> Typos in tests leak network sockets
> ---
>
> Key: KAFKA-15816
> URL: https://issues.apache.org/jira/browse/KAFKA-15816
> Project: Kafka
>  Issue Type: Bug
>  Components: unit tests
>Affects Versions: 3.6.0
>Reporter: Greg Harris
>Assignee: Greg Harris
>Priority: Minor
>
> There are a few tests which leak network sockets due to small typos in the 
> tests themselves.
> Clients: [https://github.com/apache/kafka/pull/14750] (DONE)
>  * NioEchoServer
>  * KafkaConsumerTest
>  * KafkaProducerTest
>  * SelectorTest
>  * SslTransportLayerTest
>  * SslTransportTls12Tls13Test
>  * SslVersionsTransportLayerTest
>  * SaslAuthenticatorTest
> Core: [https://github.com/apache/kafka/pull/14754] 
>  * MiniKdc
>  * GssapiAuthenticationTest
>  * MirrorMakerIntegrationTest
>  * SocketServerTest
>  * EpochDrivenReplicationProtocolAcceptanceTest
>  * LeaderEpochIntegrationTest
> Trogdor: [https://github.com/apache/kafka/pull/14771] 
>  * AgentTest
> Mirror: [https://github.com/apache/kafka/pull/14761] (DONE)
>  * DedicatedMirrorIntegrationTest
>  * MirrorConnectorsIntegrationTest
>  * MirrorConnectorsWithCustomForwardingAdminIntegrationTest
> Runtime: [https://github.com/apache/kafka/pull/14764] 
>  * ConnectorTopicsIntegrationTest
>  * ExactlyOnceSourceIntegrationTest
>  * WorkerTest
>  * WorkerGroupMemberTest
> Streams: [https://github.com/apache/kafka/pull/14769] (DONE)
>  * IQv2IntegrationTest
>  * MetricsReporterIntegrationTest
>  * NamedTopologyIntegrationTest
>  * PurgeRepartitionTopicIntegrationTest
> These can be addressed by just fixing the tests.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-15816) Typos in tests leak network sockets

2023-12-05 Thread Greg Harris (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Harris updated KAFKA-15816:

Description: 
There are a few tests which leak network sockets due to small typos in the 
tests themselves.

Clients: [https://github.com/apache/kafka/pull/14750] (DONE)
 * NioEchoServer
 * KafkaConsumerTest
 * KafkaProducerTest
 * SelectorTest
 * SslTransportLayerTest
 * SslTransportTls12Tls13Test
 * SslVersionsTransportLayerTest
 * SaslAuthenticatorTest

Core: [https://github.com/apache/kafka/pull/14754] 
 * MiniKdc
 * GssapiAuthenticationTest
 * MirrorMakerIntegrationTest
 * SocketServerTest
 * EpochDrivenReplicationProtocolAcceptanceTest
 * LeaderEpochIntegrationTest

Trogdor: [https://github.com/apache/kafka/pull/14771] 
 * AgentTest

Mirror: [https://github.com/apache/kafka/pull/14761] 
 * DedicatedMirrorIntegrationTest
 * MirrorConnectorsIntegrationTest
 * MirrorConnectorsWithCustomForwardingAdminIntegrationTest

Runtime: [https://github.com/apache/kafka/pull/14764] 
 * ConnectorTopicsIntegrationTest
 * ExactlyOnceSourceIntegrationTest
 * WorkerTest
 * WorkerGroupMemberTest

Streams: [https://github.com/apache/kafka/pull/14769] (DONE)
 * IQv2IntegrationTest
 * MetricsReporterIntegrationTest
 * NamedTopologyIntegrationTest
 * PurgeRepartitionTopicIntegrationTest

These can be addressed by just fixing the tests.

  was:
There are a few tests which leak network sockets due to small typos in the 
tests themselves.

Clients: [https://github.com/apache/kafka/pull/14750]
 * NioEchoServer
 * KafkaConsumerTest
 * KafkaProducerTest
 * SelectorTest
 * SslTransportLayerTest
 * SslTransportTls12Tls13Test
 * SslVersionsTransportLayerTest
 * SaslAuthenticatorTest

Core: [https://github.com/apache/kafka/pull/14754] 
 * MiniKdc
 * GssapiAuthenticationTest
 * MirrorMakerIntegrationTest
 * SocketServerTest
 * EpochDrivenReplicationProtocolAcceptanceTest
 * LeaderEpochIntegrationTest

Trogdor: [https://github.com/apache/kafka/pull/14771] 
 * AgentTest

Mirror: [https://github.com/apache/kafka/pull/14761] 
 * DedicatedMirrorIntegrationTest
 * MirrorConnectorsIntegrationTest
 * MirrorConnectorsWithCustomForwardingAdminIntegrationTest

Runtime: [https://github.com/apache/kafka/pull/14764] 
 * ConnectorTopicsIntegrationTest
 * ExactlyOnceSourceIntegrationTest
 * WorkerTest
 * WorkerGroupMemberTest

Streams: [https://github.com/apache/kafka/pull/14769] (DONE)
 * IQv2IntegrationTest
 * MetricsReporterIntegrationTest
 * NamedTopologyIntegrationTest
 * PurgeRepartitionTopicIntegrationTest

These can be addressed by just fixing the tests.


> Typos in tests leak network sockets
> ---
>
> Key: KAFKA-15816
> URL: https://issues.apache.org/jira/browse/KAFKA-15816
> Project: Kafka
>  Issue Type: Bug
>  Components: unit tests
>Affects Versions: 3.6.0
>Reporter: Greg Harris
>Assignee: Greg Harris
>Priority: Minor
>
> There are a few tests which leak network sockets due to small typos in the 
> tests themselves.
> Clients: [https://github.com/apache/kafka/pull/14750] (DONE)
>  * NioEchoServer
>  * KafkaConsumerTest
>  * KafkaProducerTest
>  * SelectorTest
>  * SslTransportLayerTest
>  * SslTransportTls12Tls13Test
>  * SslVersionsTransportLayerTest
>  * SaslAuthenticatorTest
> Core: [https://github.com/apache/kafka/pull/14754] 
>  * MiniKdc
>  * GssapiAuthenticationTest
>  * MirrorMakerIntegrationTest
>  * SocketServerTest
>  * EpochDrivenReplicationProtocolAcceptanceTest
>  * LeaderEpochIntegrationTest
> Trogdor: [https://github.com/apache/kafka/pull/14771] 
>  * AgentTest
> Mirror: [https://github.com/apache/kafka/pull/14761] 
>  * DedicatedMirrorIntegrationTest
>  * MirrorConnectorsIntegrationTest
>  * MirrorConnectorsWithCustomForwardingAdminIntegrationTest
> Runtime: [https://github.com/apache/kafka/pull/14764] 
>  * ConnectorTopicsIntegrationTest
>  * ExactlyOnceSourceIntegrationTest
>  * WorkerTest
>  * WorkerGroupMemberTest
> Streams: [https://github.com/apache/kafka/pull/14769] (DONE)
>  * IQv2IntegrationTest
>  * MetricsReporterIntegrationTest
>  * NamedTopologyIntegrationTest
>  * PurgeRepartitionTopicIntegrationTest
> These can be addressed by just fixing the tests.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-15816) Typos in tests leak network sockets

2023-11-29 Thread Greg Harris (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Harris updated KAFKA-15816:

Description: 
There are a few tests which leak network sockets due to small typos in the 
tests themselves.

Clients: [https://github.com/apache/kafka/pull/14750]
 * NioEchoServer
 * KafkaConsumerTest
 * KafkaProducerTest
 * SelectorTest
 * SslTransportLayerTest
 * SslTransportTls12Tls13Test
 * SslVersionsTransportLayerTest
 * SaslAuthenticatorTest

Core: [https://github.com/apache/kafka/pull/14754] 
 * MiniKdc
 * GssapiAuthenticationTest
 * MirrorMakerIntegrationTest
 * SocketServerTest
 * EpochDrivenReplicationProtocolAcceptanceTest
 * LeaderEpochIntegrationTest

Trogdor: [https://github.com/apache/kafka/pull/14771] 
 * AgentTest

Mirror: [https://github.com/apache/kafka/pull/14761] 
 * DedicatedMirrorIntegrationTest
 * MirrorConnectorsIntegrationTest
 * MirrorConnectorsWithCustomForwardingAdminIntegrationTest

Runtime: [https://github.com/apache/kafka/pull/14764] 
 * ConnectorTopicsIntegrationTest
 * ExactlyOnceSourceIntegrationTest
 * WorkerTest
 * WorkerGroupMemberTest

Streams: [https://github.com/apache/kafka/pull/14769] (DONE)
 * IQv2IntegrationTest
 * MetricsReporterIntegrationTest
 * NamedTopologyIntegrationTest
 * PurgeRepartitionTopicIntegrationTest

These can be addressed by just fixing the tests.

  was:
There are a few tests which leak network sockets due to small typos in the 
tests themselves.

Clients: [https://github.com/apache/kafka/pull/14750]
 * NioEchoServer
 * KafkaConsumerTest
 * KafkaProducerTest
 * SelectorTest
 * SslTransportLayerTest
 * SslTransportTls12Tls13Test
 * SslVersionsTransportLayerTest
 * SaslAuthenticatorTest

Core: [https://github.com/apache/kafka/pull/14754] 
 * MiniKdc
 * GssapiAuthenticationTest
 * MirrorMakerIntegrationTest
 * SocketServerTest
 * EpochDrivenReplicationProtocolAcceptanceTest
 * LeaderEpochIntegrationTest

Trogdor: [https://github.com/apache/kafka/pull/14771] 
 * AgentTest

Mirror: [https://github.com/apache/kafka/pull/14761] 
 * DedicatedMirrorIntegrationTest
 * MirrorConnectorsIntegrationTest
 * MirrorConnectorsWithCustomForwardingAdminIntegrationTest

Runtime: [https://github.com/apache/kafka/pull/14764] 
 * ConnectorTopicsIntegrationTest
 * ExactlyOnceSourceIntegrationTest
 * WorkerTest
 * WorkerGroupMemberTest

Streams: [https://github.com/apache/kafka/pull/14769]
 * IQv2IntegrationTest
 * MetricsReporterIntegrationTest
 * NamedTopologyIntegrationTest
 * PurgeRepartitionTopicIntegrationTest

These can be addressed by just fixing the tests.


> Typos in tests leak network sockets
> ---
>
> Key: KAFKA-15816
> URL: https://issues.apache.org/jira/browse/KAFKA-15816
> Project: Kafka
>  Issue Type: Bug
>  Components: unit tests
>Affects Versions: 3.6.0
>Reporter: Greg Harris
>Assignee: Greg Harris
>Priority: Minor
>
> There are a few tests which leak network sockets due to small typos in the 
> tests themselves.
> Clients: [https://github.com/apache/kafka/pull/14750]
>  * NioEchoServer
>  * KafkaConsumerTest
>  * KafkaProducerTest
>  * SelectorTest
>  * SslTransportLayerTest
>  * SslTransportTls12Tls13Test
>  * SslVersionsTransportLayerTest
>  * SaslAuthenticatorTest
> Core: [https://github.com/apache/kafka/pull/14754] 
>  * MiniKdc
>  * GssapiAuthenticationTest
>  * MirrorMakerIntegrationTest
>  * SocketServerTest
>  * EpochDrivenReplicationProtocolAcceptanceTest
>  * LeaderEpochIntegrationTest
> Trogdor: [https://github.com/apache/kafka/pull/14771] 
>  * AgentTest
> Mirror: [https://github.com/apache/kafka/pull/14761] 
>  * DedicatedMirrorIntegrationTest
>  * MirrorConnectorsIntegrationTest
>  * MirrorConnectorsWithCustomForwardingAdminIntegrationTest
> Runtime: [https://github.com/apache/kafka/pull/14764] 
>  * ConnectorTopicsIntegrationTest
>  * ExactlyOnceSourceIntegrationTest
>  * WorkerTest
>  * WorkerGroupMemberTest
> Streams: [https://github.com/apache/kafka/pull/14769] (DONE)
>  * IQv2IntegrationTest
>  * MetricsReporterIntegrationTest
>  * NamedTopologyIntegrationTest
>  * PurgeRepartitionTopicIntegrationTest
> These can be addressed by just fixing the tests.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15912) Parallelize conversion and transformation steps in Connect

2023-11-28 Thread Greg Harris (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17790672#comment-17790672
 ] 

Greg Harris commented on KAFKA-15912:
-

Hey [~mimaison] thanks for the ticket. I've only thought briefly about this and 
haven't found any obvious blockers, but there are some design restrictions:
 # Since the javadocs for Transformation and Predicate don't mention 
thread-safety, I think we have to assume that they are not thread-safe
 # There is room in the API for a Transformation to be stateful and 
order-sensitive, (such as packing records together) so I think we would be 
unable to instantiate multiple copies of a single transform stage, and all 
records would have to pass serially through a stage.

If Transformations and Predicates could declare themselves thread-safe, then we 
would be able to do some finer-grained parallelism, or fall back to actor-style 
parallelism (a single thread with a message queue as input).

I think it would be ineffective/undesirable for Transformations to take this 
performance optimization burden upon themselves completely like the Task 
implementations do, so we should certainly improve the framework in this area.
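
To make the ordering constraint concrete, here is a rough sketch of fanning a 
batch out to a thread pool while still emitting results in the original order. 
This is not Connect's implementation; the per-record stage is a generic 
function standing in for the converter/transformation chain, and it only works 
if that stage is safe to call from multiple threads (point 1 above):
{code}
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class OrderedParallelBatch {

    // Apply a thread-safe per-record stage to a batch with a pool of worker
    // threads, collecting results in submission order to preserve ordering.
    public static <R, T> List<T> process(List<R> batch, Function<R, T> stage, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<T>> futures = new ArrayList<>(batch.size());
            for (R record : batch) {
                futures.add(pool.submit(() -> stage.apply(record)));
            }
            List<T> results = new ArrayList<>(batch.size());
            for (Future<T> future : futures) {
                results.add(future.get()); // waits in original submission order
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> out = process(List.of("a", "b", "c"), String::toUpperCase, 2);
        System.out.println(out); // [A, B, C]
    }
}
{code}
A stateful, order-sensitive transform (point 2 above) would still have to be 
applied serially, so this kind of parallelism would have to sit around, not 
inside, such a stage.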

> Parallelize conversion and transformation steps in Connect
> --
>
> Key: KAFKA-15912
> URL: https://issues.apache.org/jira/browse/KAFKA-15912
> Project: Kafka
>  Issue Type: Improvement
>  Components: connect
>Reporter: Mickael Maison
>Priority: Major
>
> In busy Connect pipelines, the conversion and transformation steps can 
> sometimes have a very significant impact on performance. This is especially 
> true with large records with complex schemas, for example with CDC connectors 
> like Debezium.
> Today in order to always preserve ordering, converters and transformations 
> are called on one record at a time in a single thread in the Connect worker. 
> As Connect usually handles records in batches (up to max.poll.records in sink 
> pipelines; for source pipelines it really depends on the connector, but most 
> connectors I've seen still tend to return multiple records each loop), it 
> could be highly beneficial to attempt running the converters and 
> transformation chain in parallel with a pool of processing threads.
> It should be possible to do some of these steps in parallel and still keep 
> exact ordering. I'm even considering whether an option to lose ordering but 
> allow even faster processing would make sense.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-15372) MM2 rolling restart can drop configuration changes silently

2023-11-27 Thread Greg Harris (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Harris updated KAFKA-15372:

Fix Version/s: 3.7.0

> MM2 rolling restart can drop configuration changes silently
> ---
>
> Key: KAFKA-15372
> URL: https://issues.apache.org/jira/browse/KAFKA-15372
> Project: Kafka
>  Issue Type: Bug
>  Components: mirrormaker
>Reporter: Daniel Urban
>Assignee: Greg Harris
>Priority: Major
> Fix For: 3.7.0
>
>
> When MM2 is restarted, it tries to update the Connector configuration in all 
> flows. This is a one-time trial, and fails if the Connect worker is not the 
> leader of the group.
> In a distributed setup and with a rolling restart, it is possible that for a 
> specific flow, the Connect worker of the just restarted MM2 instance is not 
> the leader, meaning that Connector configurations can get dropped.
> For example, assuming 2 MM2 instances, and one flow A->B:
>  # MM2 instance 1 is restarted, the worker inside MM2 instance 2 becomes the 
> leader of A->B Connect group.
>  # MM2 instance 1 tries to update the Connector configurations, but fails 
> (instance 2 has the leader, not instance 1)
>  # MM2 instance 2 is restarted, leadership moves to worker in MM2 instance 1
>  # MM2 instance 2 tries to update the Connector configurations, but fails
> At this point, the configuration changes before the restart are never 
> applied. Many times, this can also happen silently, without any indication.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-15906) Emit offset syncs more often than offset.lag.max for low-throughput/finite partitions

2023-11-27 Thread Greg Harris (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Harris updated KAFKA-15906:

Description: 
Right now, the offset.lag.max configuration, along with a fair rate-limiting 
semaphore, limits the number of offset syncs emitted by the MirrorSourceTask. 
After 100 records have been emitted for a partition, _and_ the semaphore is 
available, an offset sync can be emitted.

For low-volume topics, the `offset.lag.max` default of 100 is much more 
restrictive than the rate-limiting semaphore. For example, a topic which 
mirrors at the rate of 1 record/sec may take 100 seconds to receive an offset 
sync. If the topic is actually finite, the last offset sync will never arrive, 
and the translation will have a persistent lag.

Instead, we can periodically flush the offset syncs for partitions that are 
under the offset.lag.max limit, but have not received an offset sync recently. 
This could be a new configuration, be a hard-coded time, or be based on the 
existing emit.checkpoints.interval.seconds and 
sync.group.offsets.interval.seconds configurations.

 

Alternatively, we could decrease the default `offset.lag.max` value to 0, and 
rely on the fair semaphore to limit the number of syncs emitted for 
high-throughput partitions. The semaphore is not currently configurable, so 
users wanting lower throughput on the offset-syncs topic will still need an 
offset.lag.max > 0.

  was:
Right now, the offset.lag.max configuration, along with a fair rate-limiting 
semaphore, limits the number of offset syncs emitted by the MirrorSourceTask. 
After 100 records have been emitted for a partition, _and_ the semaphore is 
available, an offset sync can be emitted.

For low-volume topics, the `offset.lag.max` default of 100 is much more 
restrictive than the rate-limiting semaphore. For example, a topic which 
mirrors at the rate of 1 record/sec may take 100 seconds to receive an offset 
sync. If the topic is actually finite, the last offset sync will never arrive, 
and the translation will have a persistent lag.

Instead, we can periodically flush the offset syncs for partitions that are 
under the offset.lag.max limit, but have not received an offset sync recently. 
This could be a new configuration, be a hard-coded time, or be based on the 
existing emit.checkpoints.interval.seconds and 
sync.group.offsets.interval.seconds configurations.


> Emit offset syncs more often than offset.lag.max for low-throughput/finite 
> partitions
> -
>
> Key: KAFKA-15906
> URL: https://issues.apache.org/jira/browse/KAFKA-15906
> Project: Kafka
>  Issue Type: Improvement
>  Components: mirrormaker
>Reporter: Greg Harris
>Priority: Minor
>
> Right now, the offset.lag.max configuration, along with a fair rate-limiting 
> semaphore, limits the number of offset syncs emitted by the MirrorSourceTask. 
> After 100 records have been emitted for a partition, _and_ the semaphore is 
> available, an offset sync can be emitted.
> For low-volume topics, the `offset.lag.max` default of 100 is much more 
> restrictive than the rate-limiting semaphore. For example, a topic which 
> mirrors at the rate of 1 record/sec may take 100 seconds to receive an offset 
> sync. If the topic is actually finite, the last offset sync will never 
> arrive, and the translation will have a persistent lag.
> Instead, we can periodically flush the offset syncs for partitions that are 
> under the offset.lag.max limit, but have not received an offset sync 
> recently. This could be a new configuration, be a hard-coded time, or be 
> based on the existing emit.checkpoints.interval.seconds and 
> sync.group.offsets.interval.seconds configurations.
>  
> Alternatively, we could decrease the default `offset.lag.max` value to 0, and 
> rely on the fair semaphore to limit the number of syncs emitted for 
> high-throughput partitions. The semaphore is not currently configurable, so 
> users wanting lower throughput on the offset-syncs topic will still need an 
> offset.lag.max > 0.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15906) Emit offset syncs more often than offset.lag.max for low-throughput/finite partitions

2023-11-27 Thread Greg Harris (Jira)
Greg Harris created KAFKA-15906:
---

 Summary: Emit offset syncs more often than offset.lag.max for 
low-throughput/finite partitions
 Key: KAFKA-15906
 URL: https://issues.apache.org/jira/browse/KAFKA-15906
 Project: Kafka
  Issue Type: Improvement
  Components: mirrormaker
Reporter: Greg Harris


Right now, the offset.lag.max configuration, along with a fair rate-limiting 
semaphore, limits the number of offset syncs emitted by the MirrorSourceTask. 
After 100 records have been emitted for a partition, _and_ the semaphore is 
available, an offset sync can be emitted.

For low-volume topics, the `offset.lag.max` default of 100 is much more 
restrictive than the rate-limiting semaphore. For example, a topic which 
mirrors at the rate of 1 record/sec may take 100 seconds to receive an offset 
sync. If the topic is actually finite, the last offset sync will never arrive, 
and the translation will have a persistent lag.

Instead, we can periodically flush the offset syncs for partitions that are 
under the offset.lag.max limit, but have not received an offset sync recently. 
This could be a new configuration, be a hard-coded time, or be based on the 
existing emit.checkpoints.interval.seconds and 
sync.group.offsets.interval.seconds configurations.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15905) Restarts of MirrorCheckpointTask should not permanently interrupt offset translation

2023-11-27 Thread Greg Harris (Jira)
Greg Harris created KAFKA-15905:
---

 Summary: Restarts of MirrorCheckpointTask should not permanently 
interrupt offset translation
 Key: KAFKA-15905
 URL: https://issues.apache.org/jira/browse/KAFKA-15905
 Project: Kafka
  Issue Type: Improvement
  Components: mirrormaker
Affects Versions: 3.6.0
Reporter: Greg Harris


Executive summary: When the MirrorCheckpointTask restarts, it loses the state 
of checkpointsPerConsumerGroup, which limits offset translation to records 
mirrored after the latest restart.

For example, if 1000 records are mirrored and the OffsetSyncs are read by 
MirrorCheckpointTask, the emitted checkpoints are cached, and translation can 
happen at the ~500th record. If MirrorCheckpointTask restarts, and 1000 more 
records are mirrored, translation can happen at the ~1500th record, but no 
longer at the ~500th record.

Context:

Before KAFKA-13659, MM2 made translation decisions based on the 
incompletely-initialized OffsetSyncStore, and the checkpoint could appear to go 
backwards temporarily during restarts. To fix this, we forced the 
OffsetSyncStore to initialize completely before translation could take place, 
ensuring that the latest OffsetSync had been read, and thus providing the most 
accurate translation.

Before KAFKA-14666, MM2 translated offsets only off of the latest OffsetSync. 
Afterwards, an in-memory sparse cache of historical OffsetSyncs was kept, to 
allow for translation of earlier offsets. This came with the caveat that the 
cache's sparseness allowed translations to go backwards permanently. To prevent 
this behavior, a cache of the latest Checkpoints was kept in the 
MirrorCheckpointTask#checkpointsPerConsumerGroup variable, and offset 
translation remained restricted to the fully-initialized OffsetSyncStore.

Effectively, the MirrorCheckpointTask ensures that it translates based on an 
OffsetSync emitted during its own lifetime, so that no previous 
MirrorCheckpointTask could have emitted a later sync. If we can read the checkpoints 
emitted by previous generations of MirrorCheckpointTask, we can still ensure 
that checkpoints are monotonic, while allowing translation further back in 
history.
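
As a rough sketch of what reading back previously-emitted checkpoints could 
look like on task startup, using only the plain consumer API (the topic name, 
the key parsing, and the shape of the cache are placeholders, not the actual 
MirrorMaker 2 internals):
{code}
import java.time.Duration;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.PartitionInfo;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;

public class CheckpointReloadSketch {

    // Rebuild a "latest checkpoint per key" cache by reading the existing
    // checkpoints topic from the beginning up to its current end offsets.
    public static Map<String, byte[]> reload(String bootstrapServers, String checkpointsTopic) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        props.put("key.deserializer", ByteArrayDeserializer.class.getName());
        props.put("value.deserializer", ByteArrayDeserializer.class.getName());

        Map<String, byte[]> latest = new HashMap<>();
        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            List<TopicPartition> partitions = new ArrayList<>();
            for (PartitionInfo info : consumer.partitionsFor(checkpointsTopic)) {
                partitions.add(new TopicPartition(info.topic(), info.partition()));
            }
            consumer.assign(partitions);
            consumer.seekToBeginning(partitions);
            Map<TopicPartition, Long> endOffsets = consumer.endOffsets(partitions);

            while (!caughtUp(consumer, endOffsets)) {
                for (ConsumerRecord<byte[], byte[]> record : consumer.poll(Duration.ofSeconds(1))) {
                    // Keep only the newest serialized checkpoint per key;
                    // decoding the key and value is elided here.
                    latest.put(parseKey(record.key()), record.value());
                }
            }
        }
        return latest;
    }

    private static boolean caughtUp(KafkaConsumer<?, ?> consumer, Map<TopicPartition, Long> endOffsets) {
        for (Map.Entry<TopicPartition, Long> entry : endOffsets.entrySet()) {
            if (consumer.position(entry.getKey()) < entry.getValue()) {
                return false;
            }
        }
        return true;
    }

    private static String parseKey(byte[] key) {
        // Placeholder: real checkpoint keys encode (group, topic, partition).
        return new String(key);
    }
}
{code}
With something like this, a freshly-restarted task could seed its 
checkpointsPerConsumerGroup state from the previous generation's output and 
keep checkpoints monotonic without limiting translation to post-restart 
records.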



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15862) Remove SecurityManager Support

2023-11-20 Thread Greg Harris (Jira)
Greg Harris created KAFKA-15862:
---

 Summary: Remove SecurityManager Support
 Key: KAFKA-15862
 URL: https://issues.apache.org/jira/browse/KAFKA-15862
 Project: Kafka
  Issue Type: New Feature
  Components: clients, KafkaConnect, Tiered-Storage
Reporter: Greg Harris
Assignee: Greg Harris


https://cwiki.apache.org/confluence/display/KAFKA/KIP-1006%3A+Remove+SecurityManager+Support



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-15816) Typos in tests leak network sockets

2023-11-16 Thread Greg Harris (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Harris updated KAFKA-15816:

Description: 
There are a few tests which leak network sockets due to small typos in the 
tests themselves.

Clients: [https://github.com/apache/kafka/pull/14750]
 * NioEchoServer
 * KafkaConsumerTest
 * KafkaProducerTest
 * SelectorTest
 * SslTransportLayerTest
 * SslTransportTls12Tls13Test
 * SslVersionsTransportLayerTest
 * SaslAuthenticatorTest

Core: [https://github.com/apache/kafka/pull/14754] 
 * MiniKdc
 * GssapiAuthenticationTest
 * MirrorMakerIntegrationTest
 * SocketServerTest
 * EpochDrivenReplicationProtocolAcceptanceTest
 * LeaderEpochIntegrationTest

Trogdor: [https://github.com/apache/kafka/pull/14771] 
 * AgentTest

Mirror: [https://github.com/apache/kafka/pull/14761] 
 * DedicatedMirrorIntegrationTest
 * MirrorConnectorsIntegrationTest
 * MirrorConnectorsWithCustomForwardingAdminIntegrationTest

Runtime: [https://github.com/apache/kafka/pull/14764] 
 * ConnectorTopicsIntegrationTest
 * ExactlyOnceSourceIntegrationTest
 * WorkerTest
 * WorkerGroupMemberTest

Streams: [https://github.com/apache/kafka/pull/14769]
 * IQv2IntegrationTest
 * MetricsReporterIntegrationTest
 * NamedTopologyIntegrationTest
 * PurgeRepartitionTopicIntegrationTest

These can be addressed by just fixing the tests.

  was:
There are a few tests which leak network sockets due to small typos in the 
tests themselves.

Clients: https://github.com/apache/kafka/pull/14750
 * KafkaConsumerTest
 * KafkaProducerTest
 * ConfigResourceTest
 * SelectorTest
 * SslTransportLayerTest
 * SslTransportTls12Tls13Test
 * SslVersionsTransportLayerTest

Core:
 * DescribeAuthorizedOperationsTest
 * SslGssapiSslEndToEndAuthorizationTest
 * SaslMultiMechanismConsumerTest
 * SaslPlaintextConsumerTest
 * SaslSslAdminIntegrationTest
 * SaslSslConsumerTest
 * MultipleListenersWithDefaultJaasContextTest
 * DescribeClusterRequestTest

Trogdor:
 * AgentTest

These can be addressed by just fixing the tests.


> Typos in tests leak network sockets
> ---
>
> Key: KAFKA-15816
> URL: https://issues.apache.org/jira/browse/KAFKA-15816
> Project: Kafka
>  Issue Type: Bug
>  Components: unit tests
>Affects Versions: 3.6.0
>Reporter: Greg Harris
>Assignee: Greg Harris
>Priority: Minor
>
> There are a few tests which leak network sockets due to small typos in the 
> tests themselves.
> Clients: [https://github.com/apache/kafka/pull/14750]
>  * NioEchoServer
>  * KafkaConsumerTest
>  * KafkaProducerTest
>  * SelectorTest
>  * SslTransportLayerTest
>  * SslTransportTls12Tls13Test
>  * SslVersionsTransportLayerTest
>  * SaslAuthenticatorTest
> Core: [https://github.com/apache/kafka/pull/14754] 
>  * MiniKdc
>  * GssapiAuthenticationTest
>  * MirrorMakerIntegrationTest
>  * SocketServerTest
>  * EpochDrivenReplicationProtocolAcceptanceTest
>  * LeaderEpochIntegrationTest
> Trogdor: [https://github.com/apache/kafka/pull/14771] 
>  * AgentTest
> Mirror: [https://github.com/apache/kafka/pull/14761] 
>  * DedicatedMirrorIntegrationTest
>  * MirrorConnectorsIntegrationTest
>  * MirrorConnectorsWithCustomForwardingAdminIntegrationTest
> Runtime: [https://github.com/apache/kafka/pull/14764] 
>  * ConnectorTopicsIntegrationTest
>  * ExactlyOnceSourceIntegrationTest
>  * WorkerTest
>  * WorkerGroupMemberTest
> Streams: [https://github.com/apache/kafka/pull/14769]
>  * IQv2IntegrationTest
>  * MetricsReporterIntegrationTest
>  * NamedTopologyIntegrationTest
>  * PurgeRepartitionTopicIntegrationTest
> These can be addressed by just fixing the tests.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15845) Add Junit5 test extension which detects leaked Kafka clients and servers

2023-11-16 Thread Greg Harris (Jira)
Greg Harris created KAFKA-15845:
---

 Summary: Add Junit5 test extension which detects leaked Kafka 
clients and servers
 Key: KAFKA-15845
 URL: https://issues.apache.org/jira/browse/KAFKA-15845
 Project: Kafka
  Issue Type: Improvement
  Components: clients
Reporter: Greg Harris
Assignee: Greg Harris


We have many tests which accidentally leak Kafka clients and servers. This 
contributes to test flakiness and build instability.

We should use a test extension to make it easier to find these leaked clients 
and servers, and force test-implementors to resolve their resource leaks prior 
to merge.
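
One possible shape for such an extension, sketched with the standard JUnit 5 
callback API. The thread-name heuristic below is an assumption for 
illustration, not necessarily how the real extension would detect leaks:
{code}
import java.util.Set;
import java.util.stream.Collectors;
import org.junit.jupiter.api.extension.AfterEachCallback;
import org.junit.jupiter.api.extension.ExtensionContext;

// Fails a test if Kafka client network threads are still alive after it runs,
// which usually means a producer, consumer, or admin client was never closed.
public class LeakedClientDetectorExtension implements AfterEachCallback {

    @Override
    public void afterEach(ExtensionContext context) {
        Set<String> leaked = Thread.getAllStackTraces().keySet().stream()
                .map(Thread::getName)
                .filter(name -> name.contains("kafka-producer-network-thread")
                        || name.contains("kafka-coordinator-heartbeat-thread")
                        || name.contains("kafka-admin-client-thread"))
                .collect(Collectors.toSet());
        if (!leaked.isEmpty()) {
            throw new AssertionError("Potentially leaked Kafka clients; live threads: " + leaked);
        }
    }
}
{code}
A test class would opt in with @ExtendWith(LeakedClientDetectorExtension.class), 
and the failure message points the test author at the client that was left 
open.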



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15834) Subscribing to non-existent topic blocks StreamThread from stopping

2023-11-15 Thread Greg Harris (Jira)
Greg Harris created KAFKA-15834:
---

 Summary: Subscribing to non-existent topic blocks StreamThread 
from stopping
 Key: KAFKA-15834
 URL: https://issues.apache.org/jira/browse/KAFKA-15834
 Project: Kafka
  Issue Type: Bug
  Components: streams
Affects Versions: 3.6.0
Reporter: Greg Harris


In 
NamedTopologyIntegrationTest#shouldContinueProcessingOtherTopologiesWhenNewTopologyHasMissingInputTopics
 a topology is created which references an input topic which does not exist. 
The test as-written passes, but the KafkaStreams#close(Duration) at the end 
times out, and leaves StreamsThreads running.

From some cursory investigation, it appears that this is happening:
1. The consumer calls the StreamsPartitionAssignor, which calls 
TaskManager#handleRebalanceStart as a side-effect
2. handleRebalanceStart sets the rebalanceInProgress flag
3. This flag is checked by StreamThread.runLoop, and causes the loop to remain 
running.
4. The consumer never calls StreamsRebalanceListener#onPartitionsAssigned, 
because the topic does not exist
5. Because no partitions are ever assigned, the 
TaskManager#handleRebalanceComplete never clears the rebalanceInProgress flag
 
This log message is printed in a tight loop while the close is ongoing and the 
consumer is being polled with zero duration:
{noformat}
[2023-11-15 11:42:43,661] WARN [Consumer 
clientId=NamedTopologyIntegrationTestshouldContinueProcessingOtherTopologiesWhenNewTopologyHasMissingInputTopics-942756f8-5213-4c44-bb6b-5f805884e026-StreamThread-1-consumer,
 
groupId=NamedTopologyIntegrationTestshouldContinueProcessingOtherTopologiesWhenNewTopologyHasMissingInputTopics]
 Received unknown topic or partition error in fetch for partition 
unique_topic_prefix-topology-1-store-repartition-0 
(org.apache.kafka.clients.consumer.internals.FetchCollector:321)
{noformat}
Practically, this means that this test leaks two StreamsThreads and the 
associated clients and sockets, and delays the completion of the test until the 
KafkaStreams#close(Duration) call times out.

Either we should change the rebalanceInProgress flag to avoid getting stuck in 
this rebalance state, or figure out a way to shut down a StreamsThread that is 
in an extended rebalance state during shutdown.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15827) KafkaBasedLog.withExistingClients leaks clients if start is not called

2023-11-14 Thread Greg Harris (Jira)
Greg Harris created KAFKA-15827:
---

 Summary: KafkaBasedLog.withExistingClients leaks clients if start 
is not called
 Key: KAFKA-15827
 URL: https://issues.apache.org/jira/browse/KAFKA-15827
 Project: Kafka
  Issue Type: Bug
  Components: KafkaConnect
Affects Versions: 3.6.0
Reporter: Greg Harris
Assignee: Greg Harris


The KafkaBasedLog base implementation creates its own consumers and producers, 
and closes them itself when it is stopped. There are subclasses of the 
KafkaBasedLog which accept pre-created consumers and producers, and have the 
responsibility for closing the clients when the KafkaBasedLog is stopped.

It appears that the KafkaBasedLog subclasses do not close the clients when 
start() is skipped and stop() is called directly. This happens in a few tests, 
and causes the passed-in clients to be leaked.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15826) WorkerSinkTask leaks Consumer if plugin start or stop blocks indefinitely

2023-11-14 Thread Greg Harris (Jira)
Greg Harris created KAFKA-15826:
---

 Summary: WorkerSinkTask leaks Consumer if plugin start or stop 
blocks indefinitely
 Key: KAFKA-15826
 URL: https://issues.apache.org/jira/browse/KAFKA-15826
 Project: Kafka
  Issue Type: Bug
  Components: KafkaConnect
Affects Versions: 3.6.0
Reporter: Greg Harris
Assignee: Greg Harris


The WorkerSourceTask cancel() method closes the Producer, releasing its 
resources. The WorkerSinkTask does not do the same for the Consumer, as it does 
not override the cancel() method.

WorkerSinkTask should close the consumer if the task is cancelled, as progress 
for a cancelled task will be discarded anyway. 
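
A minimal sketch of the pattern being described, using a stand-in class rather 
than the real WorkerSinkTask (the field and method names here are assumptions 
for illustration):
{code}
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.common.utils.Utils;

// Stand-in for a sink task wrapper that owns a Consumer. The idea is that
// cancel() should release the consumer immediately, since any progress made
// by a cancelled task is going to be discarded anyway.
public class CancellableSinkTask {

    private final Consumer<byte[], byte[]> consumer;
    private volatile boolean cancelled = false;

    public CancellableSinkTask(Consumer<byte[], byte[]> consumer) {
        this.consumer = consumer;
    }

    public void cancel() {
        cancelled = true;
        // Closes the consumer's network connections and buffers right away,
        // mirroring how WorkerSourceTask.cancel() closes its Producer.
        Utils.closeQuietly(consumer, "sink task consumer");
    }

    public boolean isCancelled() {
        return cancelled;
    }
}
{code}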



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-15819) KafkaServer leaks KafkaRaftManager when ZK migration enabled

2023-11-13 Thread Greg Harris (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Harris updated KAFKA-15819:

Priority: Minor  (was: Major)

> KafkaServer leaks KafkaRaftManager when ZK migration enabled
> 
>
> Key: KAFKA-15819
> URL: https://issues.apache.org/jira/browse/KAFKA-15819
> Project: Kafka
>  Issue Type: Bug
>  Components: kraft
>Affects Versions: 3.6.0
>Reporter: Greg Harris
>Assignee: Greg Harris
>Priority: Minor
>
> In SharedServer, TestRaftServer, and MetadataShell, the KafkaRaftManager is 
> maintained as an instance variable, and shutdown when the outer instance is 
> shutdown. However, in the KafkaServer, the KafkaRaftManager is instantiated 
> and started, but then the reference is lost.
> [https://github.com/apache/kafka/blob/49d3122d425171b6a59a2b6f02d3fe63d3ac2397/core/src/main/scala/kafka/server/KafkaServer.scala#L416-L442]
> Instead, the KafkaServer should behave like the other call-sites of 
> KafkaRaftManager, and shutdown the KafkaRaftManager during shutdown.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15819) KafkaServer leaks KafkaRaftManager when ZK migration enabled

2023-11-13 Thread Greg Harris (Jira)
Greg Harris created KAFKA-15819:
---

 Summary: KafkaServer leaks KafkaRaftManager when ZK migration 
enabled
 Key: KAFKA-15819
 URL: https://issues.apache.org/jira/browse/KAFKA-15819
 Project: Kafka
  Issue Type: Bug
  Components: kraft
Affects Versions: 3.6.0
Reporter: Greg Harris
Assignee: Greg Harris


In SharedServer, TestRaftServer, and MetadataShell, the KafkaRaftManager is 
maintained as an instance variable, and shutdown when the outer instance is 
shutdown. However, in the KafkaServer, the KafkaRaftManager is instantiated and 
started, but then the reference is lost.

[https://github.com/apache/kafka/blob/49d3122d425171b6a59a2b6f02d3fe63d3ac2397/core/src/main/scala/kafka/server/KafkaServer.scala#L416-L442]

Instead, the KafkaServer should behave like the other call-sites of 
KafkaRaftManager, and shutdown the KafkaRaftManager during shutdown.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-15816) Typos in tests leak network sockets

2023-11-13 Thread Greg Harris (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Harris updated KAFKA-15816:

Description: 
There are a few tests which leak network sockets due to small typos in the 
tests themselves.

Clients: https://github.com/apache/kafka/pull/14750
 * KafkaConsumerTest
 * KafkaProducerTest
 * ConfigResourceTest
 * SelectorTest
 * SslTransportLayerTest
 * SslTransportTls12Tls13Test
 * SslVersionsTransportLayerTest

Core:
 * DescribeAuthorizedOperationsTest
 * SslGssapiSslEndToEndAuthorizationTest
 * SaslMultiMechanismConsumerTest
 * SaslPlaintextConsumerTest
 * SaslSslAdminIntegrationTest
 * SaslSslConsumerTest
 * MultipleListenersWithDefaultJaasContextTest
 * DescribeClusterRequestTest

Trogdor:
 * AgentTest

These can be addressed by just fixing the tests.

  was:
There are a few tests which leak network sockets due to small typos in the 
tests themselves.

Clients:
 * KafkaConsumerTest
 * KafkaProducerTest
 * ConfigResourceTest
 * SelectorTest
 * SslTransportLayerTest
 * SslTransportTls12Tls13Test
 * SslVersionsTransportLayerTest

Core:
 * DescribeAuthorizedOperationsTest
 * SslGssapiSslEndToEndAuthorizationTest
 * SaslMultiMechanismConsumerTest
 * SaslPlaintextConsumerTest
 * SaslSslAdminIntegrationTest
 * SaslSslConsumerTest
 * MultipleListenersWithDefaultJaasContextTest
 * DescribeClusterRequestTest

Trogdor:
 * AgentTest

These can be addressed by just fixing the tests.


> Typos in tests leak network sockets
> ---
>
> Key: KAFKA-15816
> URL: https://issues.apache.org/jira/browse/KAFKA-15816
> Project: Kafka
>  Issue Type: Bug
>  Components: unit tests
>Affects Versions: 3.6.0
>Reporter: Greg Harris
>Assignee: Greg Harris
>Priority: Minor
>
> There are a few tests which leak network sockets due to small typos in the 
> tests themselves.
> Clients: https://github.com/apache/kafka/pull/14750
>  * KafkaConsumerTest
>  * KafkaProducerTest
>  * ConfigResourceTest
>  * SelectorTest
>  * SslTransportLayerTest
>  * SslTransportTls12Tls13Test
>  * SslVersionsTransportLayerTest
> Core:
>  * DescribeAuthorizedOperationsTest
>  * SslGssapiSslEndToEndAuthorizationTest
>  * SaslMultiMechanismConsumerTest
>  * SaslPlaintextConsumerTest
>  * SaslSslAdminIntegrationTest
>  * SaslSslConsumerTest
>  * MultipleListenersWithDefaultJaasContextTest
>  * DescribeClusterRequestTest
> Trogdor:
>  * AgentTest
> These can be addressed by just fixing the tests.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15816) Typos in tests leak network sockets

2023-11-13 Thread Greg Harris (Jira)
Greg Harris created KAFKA-15816:
---

 Summary: Typos in tests leak network sockets
 Key: KAFKA-15816
 URL: https://issues.apache.org/jira/browse/KAFKA-15816
 Project: Kafka
  Issue Type: Bug
  Components: unit tests
Affects Versions: 3.6.0
Reporter: Greg Harris
Assignee: Greg Harris


There are a few tests which leak network sockets due to small typos in the 
tests themselves.

Clients:
 * KafkaConsumerTest
 * KafkaProducerTest
 * ConfigResourceTest
 * SelectorTest
 * SslTransportLayerTest
 * SslTransportTls12Tls13Test
 * SslVersionsTransportLayerTest

Core:
 * DescribeAuthorizedOperationsTest
 * SslGssapiSslEndToEndAuthorizationTest
 * SaslMultiMechanismConsumerTest
 * SaslPlaintextConsumerTest
 * SaslSslAdminIntegrationTest
 * SaslSslConsumerTest
 * MultipleListenersWithDefaultJaasContextTest
 * DescribeClusterRequestTest

Trogdor:
 * AgentTest

These can be addressed by just fixing the tests.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15815) JsonRestServer leaks sockets via HttpURLConnection when keep-alive enabled

2023-11-13 Thread Greg Harris (Jira)
Greg Harris created KAFKA-15815:
---

 Summary: JsonRestServer leaks sockets via HttpURLConnection when 
keep-alive enabled
 Key: KAFKA-15815
 URL: https://issues.apache.org/jira/browse/KAFKA-15815
 Project: Kafka
  Issue Type: Bug
Affects Versions: 3.6.0
Reporter: Greg Harris


By default HttpURLConnection has keep-alive enabled, which allows a single 
HttpURLConnection to be left open in order to be re-used for later requests. 
This means that despite JsonRestServer calling `close()` on the relevant 
InputStream, and calling `disconnect()` on the connection itself, the 
HttpURLConnection does not call `close()` on the underlying socket.

This affects the Trogdor AgentTest and CoordinatorTest suites, where most of 
the methods make HTTP requests using the JsonRestServer. The effect is that ~32 
sockets are leaked per test run, all remaining in the CLOSE_WAIT state (half 
closed) after the test. This is because the JettyServer has correctly closed 
the connections, but the HttpURLConnection has not.

There does not appear to be a way to locally override the HttpURLConnection's 
behavior in this case, and only disabling keep-alive overall (via the system 
property `http.keepAlive=false`) seems to resolve the socket leaks.

To prevent the leaks, we can move JsonRestServer to an alternative HTTP 
implementation, perhaps the jetty-client that Connect uses, or disable 
keepAlive during tests.
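
For completeness, both of those escape hatches are plain JDK behavior rather 
than anything Kafka-specific; a sketch (the endpoint URL is a placeholder):
{code}
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URL;

public class NoKeepAliveExample {
    public static void main(String[] args) throws Exception {
        // Option 1: disable HTTP keep-alive for the whole JVM. This must be
        // set before the first HttpURLConnection is opened, e.g. in test setup.
        System.setProperty("http.keepAlive", "false");

        URL url = new URL("http://localhost:8080/status"); // placeholder endpoint
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        // Option 2: ask for this particular connection to be closed after the
        // response instead of being pooled for reuse.
        connection.setRequestProperty("Connection", "close");

        try (InputStream in = connection.getInputStream()) {
            in.readAllBytes(); // drain the body so the connection can be released
        } finally {
            connection.disconnect();
        }
    }
}
{code}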



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15804) Broker leaks ServerSocketChannel when exception is thrown from ZkConfigManager during startup

2023-11-09 Thread Greg Harris (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15804?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17784630#comment-17784630
 ] 

Greg Harris commented on KAFKA-15804:
-

I believe what is happening is:

1. The SocketServer is created
2. The exception is thrown from the dynamicConfigManager
3. SocketServer.enableRequestProcessing is never called, so the SocketServer is 
never started
4. Because the SocketServer is never started, the SocketServer main runnable 
never exits, and so the SocketServerChannel is never closed.

I think that when the SocketServer beginShutdown()/shutdown() is called without 
calling enableRequestProcessing() first, the SocketServer should still close 
these resources.
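
Sketched generically (this is a pattern illustration, not the SocketServer 
code): shutdown() closes the listening channel whether or not request 
processing was ever enabled, so an exception between construction and startup 
cannot leak the socket.
{code}
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.channels.ServerSocketChannel;

// Minimal pattern: the server socket is opened at construction time, and
// shutdown() closes it unconditionally, even if start() was never called.
public class StartableServerSketch {

    private final ServerSocketChannel serverChannel;
    private volatile boolean started = false;

    public StartableServerSketch(int port) throws IOException {
        this.serverChannel = ServerSocketChannel.open();
        this.serverChannel.bind(new InetSocketAddress(port));
    }

    public void start() {
        started = true;
        // ... hand the channel to an acceptor thread and begin processing ...
    }

    public void shutdown() throws IOException {
        // Close regardless of the started flag.
        serverChannel.close();
    }
}
{code}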

> Broker leaks ServerSocketChannel when exception is thrown from 
> ZkConfigManager during startup
> -
>
> Key: KAFKA-15804
> URL: https://issues.apache.org/jira/browse/KAFKA-15804
> Project: Kafka
>  Issue Type: Bug
>  Components: core, unit tests
>Affects Versions: 3.6.0
>Reporter: Greg Harris
>Priority: Minor
>
> This exception is thrown during the 
> RemoteTopicCrudTest.testClusterWideDisablementOfTieredStorageWithEnabledTieredTopic
>  test in zk mode:
> {noformat}
> org.apache.kafka.common.config.ConfigException: You have to delete all topics 
> with the property remote.storage.enable=true before disabling tiered storage 
> cluster-wide
> at 
> org.apache.kafka.storage.internals.log.LogConfig.validateRemoteStorageOnlyIfSystemEnabled(LogConfig.java:566)
>         at kafka.log.LogManager.updateTopicConfig(LogManager.scala:956)
>         at 
> kafka.server.TopicConfigHandler.updateLogConfig(ConfigHandler.scala:73)
> at 
> kafka.server.TopicConfigHandler.processConfigChanges(ConfigHandler.scala:94)
> at 
> kafka.server.ZkConfigManager.$anonfun$startup$4(ZkConfigManager.scala:176)
> at 
> kafka.server.ZkConfigManager.$anonfun$startup$4$adapted(ZkConfigManager.scala:175)
> at scala.collection.immutable.Map$Map2.foreach(Map.scala:360)
> at 
> kafka.server.ZkConfigManager.$anonfun$startup$1(ZkConfigManager.scala:175)
> at 
> kafka.server.ZkConfigManager.$anonfun$startup$1$adapted(ZkConfigManager.scala:166)
> at scala.collection.immutable.HashMap.foreach(HashMap.scala:1115)
> at kafka.server.ZkConfigManager.startup(ZkConfigManager.scala:166)
> at kafka.server.KafkaServer.startup(KafkaServer.scala:575)
> at 
> kafka.integration.KafkaServerTestHarness.$anonfun$createBrokers$1(KafkaServerTestHarness.scala:356)
> at 
> kafka.integration.KafkaServerTestHarness.$anonfun$createBrokers$1$adapted(KafkaServerTestHarness.scala:352)
> at scala.collection.immutable.List.foreach(List.scala:333)
> at 
> kafka.integration.KafkaServerTestHarness.createBrokers(KafkaServerTestHarness.scala:352)
> at 
> kafka.integration.KafkaServerTestHarness.recreateBrokers(KafkaServerTestHarness.scala:146)
> at 
> kafka.admin.RemoteTopicCrudTest.$anonfun$testClusterWideDisablementOfTieredStorageWithEnabledTieredTopic$1(RemoteTopicCrudTest.scala:319)
> at 
> org.junit.jupiter.api.AssertThrows.assertThrows(AssertThrows.java:53)
> at 
> org.junit.jupiter.api.AssertThrows.assertThrows(AssertThrows.java:35)
> at org.junit.jupiter.api.Assertions.assertThrows(Assertions.java:3111)
> at 
> kafka.admin.RemoteTopicCrudTest.testClusterWideDisablementOfTieredStorageWithEnabledTieredTopic(RemoteTopicCrudTest.scala:319){noformat}
> This leak only occurs for this one test in the RemoteTopicCrudTest; all other 
> tests including the kraft-mode version do not exhibit a leaked socket.
> Here is where the ServerSocket is instantiated:
> {noformat}
> at 
> java.base/java.nio.channels.ServerSocketChannel.open(ServerSocketChannel.java:113)
>         at kafka.network.Acceptor.openServerSocket(SocketServer.scala:724)
>         at kafka.network.Acceptor.<init>(SocketServer.scala:608)
>         at kafka.network.DataPlaneAcceptor.<init>(SocketServer.scala:454)
>         at 
> kafka.network.SocketServer.createDataPlaneAcceptor(SocketServer.scala:270)
>         at 
> kafka.network.SocketServer.createDataPlaneAcceptorAndProcessors(SocketServer.scala:249)
>         at kafka.network.SocketServer.$anonfun$new$31(SocketServer.scala:175)
>         at 
> kafka.network.SocketServer.$anonfun$new$31$adapted(SocketServer.scala:175)
>         at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:576)
>         at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:574)
>         at scala.collection.AbstractIterable.foreach(Iterable.scala:933)
>         at kafka.network.SocketServer.<init>(SocketServer.scala:175)
>         at kafka.server.KafkaServer.startup(KafkaServer.scala:344)
>      

[jira] [Updated] (KAFKA-15804) Broker leaks ServerSocketChannel when exception is thrown from ZkConfigManager during startup

2023-11-09 Thread Greg Harris (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Harris updated KAFKA-15804:

Component/s: (was: Tiered-Storage)

> Broker leaks ServerSocketChannel when exception is thrown from 
> ZkConfigManager during startup
> -
>
> Key: KAFKA-15804
> URL: https://issues.apache.org/jira/browse/KAFKA-15804
> Project: Kafka
>  Issue Type: Bug
>  Components: core, unit tests
>Affects Versions: 3.6.0
>Reporter: Greg Harris
>Priority: Minor
>
> This exception is thrown during the 
> RemoteTopicCrudTest.testClusterWideDisablementOfTieredStorageWithEnabledTieredTopic
>  test in zk mode:
> {noformat}
> org.apache.kafka.common.config.ConfigException: You have to delete all topics 
> with the property remote.storage.enable=true before disabling tiered storage 
> cluster-wide
> at 
> org.apache.kafka.storage.internals.log.LogConfig.validateRemoteStorageOnlyIfSystemEnabled(LogConfig.java:566)
>         at kafka.log.LogManager.updateTopicConfig(LogManager.scala:956)
>         at 
> kafka.server.TopicConfigHandler.updateLogConfig(ConfigHandler.scala:73)
> at 
> kafka.server.TopicConfigHandler.processConfigChanges(ConfigHandler.scala:94)
> at 
> kafka.server.ZkConfigManager.$anonfun$startup$4(ZkConfigManager.scala:176)
> at 
> kafka.server.ZkConfigManager.$anonfun$startup$4$adapted(ZkConfigManager.scala:175)
> at scala.collection.immutable.Map$Map2.foreach(Map.scala:360)
> at 
> kafka.server.ZkConfigManager.$anonfun$startup$1(ZkConfigManager.scala:175)
> at 
> kafka.server.ZkConfigManager.$anonfun$startup$1$adapted(ZkConfigManager.scala:166)
> at scala.collection.immutable.HashMap.foreach(HashMap.scala:1115)
> at kafka.server.ZkConfigManager.startup(ZkConfigManager.scala:166)
> at kafka.server.KafkaServer.startup(KafkaServer.scala:575)
> at 
> kafka.integration.KafkaServerTestHarness.$anonfun$createBrokers$1(KafkaServerTestHarness.scala:356)
> at 
> kafka.integration.KafkaServerTestHarness.$anonfun$createBrokers$1$adapted(KafkaServerTestHarness.scala:352)
> at scala.collection.immutable.List.foreach(List.scala:333)
> at 
> kafka.integration.KafkaServerTestHarness.createBrokers(KafkaServerTestHarness.scala:352)
> at 
> kafka.integration.KafkaServerTestHarness.recreateBrokers(KafkaServerTestHarness.scala:146)
> at 
> kafka.admin.RemoteTopicCrudTest.$anonfun$testClusterWideDisablementOfTieredStorageWithEnabledTieredTopic$1(RemoteTopicCrudTest.scala:319)
> at 
> org.junit.jupiter.api.AssertThrows.assertThrows(AssertThrows.java:53)
> at 
> org.junit.jupiter.api.AssertThrows.assertThrows(AssertThrows.java:35)
> at org.junit.jupiter.api.Assertions.assertThrows(Assertions.java:3111)
> at 
> kafka.admin.RemoteTopicCrudTest.testClusterWideDisablementOfTieredStorageWithEnabledTieredTopic(RemoteTopicCrudTest.scala:319){noformat}
> This leak only occurs for this one test in the RemoteTopicCrudTest; all other 
> tests including the kraft-mode version do not exhibit a leaked socket.
> Here is where the ServerSocket is instantiated:
> {noformat}
> at 
> java.base/java.nio.channels.ServerSocketChannel.open(ServerSocketChannel.java:113)
>         at kafka.network.Acceptor.openServerSocket(SocketServer.scala:724)
>         at kafka.network.Acceptor.<init>(SocketServer.scala:608)
>         at kafka.network.DataPlaneAcceptor.<init>(SocketServer.scala:454)
>         at 
> kafka.network.SocketServer.createDataPlaneAcceptor(SocketServer.scala:270)
>         at 
> kafka.network.SocketServer.createDataPlaneAcceptorAndProcessors(SocketServer.scala:249)
>         at kafka.network.SocketServer.$anonfun$new$31(SocketServer.scala:175)
>         at 
> kafka.network.SocketServer.$anonfun$new$31$adapted(SocketServer.scala:175)
>         at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:576)
>         at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:574)
>         at scala.collection.AbstractIterable.foreach(Iterable.scala:933)
>         at kafka.network.SocketServer.<init>(SocketServer.scala:175)
>         at kafka.server.KafkaServer.startup(KafkaServer.scala:344)
>         at 
> kafka.integration.KafkaServerTestHarness.$anonfun$createBrokers$1(KafkaServerTestHarness.scala:356)
>         at 
> kafka.integration.KafkaServerTestHarness.$anonfun$createBrokers$1$adapted(KafkaServerTestHarness.scala:352)
>         at scala.collection.immutable.List.foreach(List.scala:333)
>         at 
> kafka.integration.KafkaServerTestHarness.createBrokers(KafkaServerTestHarness.scala:352)
>         at 
> kafka.integration.KafkaServerTestHarness.recreateBrokers(KafkaServerTestHarness.scala:146)
>         at 
> 

[jira] [Updated] (KAFKA-15804) Broker leaks ServerSocketChannel when exception is thrown from ZkConfigManager during startup

2023-11-09 Thread Greg Harris (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Harris updated KAFKA-15804:

Description: 
This exception is thrown during the 
RemoteTopicCrudTest.testClusterWideDisablementOfTieredStorageWithEnabledTieredTopic
 test in zk mode:
{noformat}
org.apache.kafka.common.config.ConfigException: You have to delete all topics 
with the property remote.storage.enable=true before disabling tiered storage 
cluster-wide
at 
org.apache.kafka.storage.internals.log.LogConfig.validateRemoteStorageOnlyIfSystemEnabled(LogConfig.java:566)
        at kafka.log.LogManager.updateTopicConfig(LogManager.scala:956)
        at 
kafka.server.TopicConfigHandler.updateLogConfig(ConfigHandler.scala:73)
at 
kafka.server.TopicConfigHandler.processConfigChanges(ConfigHandler.scala:94)
at 
kafka.server.ZkConfigManager.$anonfun$startup$4(ZkConfigManager.scala:176)
at 
kafka.server.ZkConfigManager.$anonfun$startup$4$adapted(ZkConfigManager.scala:175)
at scala.collection.immutable.Map$Map2.foreach(Map.scala:360)
at 
kafka.server.ZkConfigManager.$anonfun$startup$1(ZkConfigManager.scala:175)
at 
kafka.server.ZkConfigManager.$anonfun$startup$1$adapted(ZkConfigManager.scala:166)
at scala.collection.immutable.HashMap.foreach(HashMap.scala:1115)
at kafka.server.ZkConfigManager.startup(ZkConfigManager.scala:166)
at kafka.server.KafkaServer.startup(KafkaServer.scala:575)
at 
kafka.integration.KafkaServerTestHarness.$anonfun$createBrokers$1(KafkaServerTestHarness.scala:356)
at 
kafka.integration.KafkaServerTestHarness.$anonfun$createBrokers$1$adapted(KafkaServerTestHarness.scala:352)
at scala.collection.immutable.List.foreach(List.scala:333)
at 
kafka.integration.KafkaServerTestHarness.createBrokers(KafkaServerTestHarness.scala:352)
at 
kafka.integration.KafkaServerTestHarness.recreateBrokers(KafkaServerTestHarness.scala:146)
at 
kafka.admin.RemoteTopicCrudTest.$anonfun$testClusterWideDisablementOfTieredStorageWithEnabledTieredTopic$1(RemoteTopicCrudTest.scala:319)
at org.junit.jupiter.api.AssertThrows.assertThrows(AssertThrows.java:53)
at org.junit.jupiter.api.AssertThrows.assertThrows(AssertThrows.java:35)
at org.junit.jupiter.api.Assertions.assertThrows(Assertions.java:3111)
at 
kafka.admin.RemoteTopicCrudTest.testClusterWideDisablementOfTieredStorageWithEnabledTieredTopic(RemoteTopicCrudTest.scala:319){noformat}
This leak only occurs for this one test in the RemoteTopicCrudTest; all other 
tests including the kraft-mode version do not exhibit a leaked socket.

Here is where the ServerSocket is instantiated:
{noformat}
at 
java.base/java.nio.channels.ServerSocketChannel.open(ServerSocketChannel.java:113)
        at kafka.network.Acceptor.openServerSocket(SocketServer.scala:724)
        at kafka.network.Acceptor.<init>(SocketServer.scala:608)
        at kafka.network.DataPlaneAcceptor.<init>(SocketServer.scala:454)
        at 
kafka.network.SocketServer.createDataPlaneAcceptor(SocketServer.scala:270)
        at 
kafka.network.SocketServer.createDataPlaneAcceptorAndProcessors(SocketServer.scala:249)
        at kafka.network.SocketServer.$anonfun$new$31(SocketServer.scala:175)
        at 
kafka.network.SocketServer.$anonfun$new$31$adapted(SocketServer.scala:175)
        at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:576)
        at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:574)
        at scala.collection.AbstractIterable.foreach(Iterable.scala:933)
        at kafka.network.SocketServer.<init>(SocketServer.scala:175)
        at kafka.server.KafkaServer.startup(KafkaServer.scala:344)
        at 
kafka.integration.KafkaServerTestHarness.$anonfun$createBrokers$1(KafkaServerTestHarness.scala:356)
        at 
kafka.integration.KafkaServerTestHarness.$anonfun$createBrokers$1$adapted(KafkaServerTestHarness.scala:352)
        at scala.collection.immutable.List.foreach(List.scala:333)
        at 
kafka.integration.KafkaServerTestHarness.createBrokers(KafkaServerTestHarness.scala:352)
        at 
kafka.integration.KafkaServerTestHarness.recreateBrokers(KafkaServerTestHarness.scala:146)
        at 
kafka.admin.RemoteTopicCrudTest.$anonfun$testClusterWideDisablementOfTieredStorageWithEnabledTieredTopic$1(RemoteTopicCrudTest.scala:319)
        at org.junit.jupiter.api.AssertThrows.assertThrows(AssertThrows.java:53)
        at org.junit.jupiter.api.AssertThrows.assertThrows(AssertThrows.java:35)
        at org.junit.jupiter.api.Assertions.assertThrows(Assertions.java:3111)
        at 
kafka.admin.RemoteTopicCrudTest.testClusterWideDisablementOfTieredStorageWithEnabledTieredTopic(RemoteTopicCrudTest.scala:319){noformat}
And the associated DataPlaneAcceptor:
{noformat}
         at java.base/java.nio.channels.Selector.open(Selector.java:295)
         at 

[jira] [Created] (KAFKA-15804) Broker leaks ServerSocketChannel when exception is thrown from ZkConfigManager during startup

2023-11-09 Thread Greg Harris (Jira)
Greg Harris created KAFKA-15804:
---

 Summary: Broker leaks ServerSocketChannel when exception is thrown 
from ZkConfigManager during startup
 Key: KAFKA-15804
 URL: https://issues.apache.org/jira/browse/KAFKA-15804
 Project: Kafka
  Issue Type: Bug
  Components: core, Tiered-Storage, unit tests
Affects Versions: 3.6.0
Reporter: Greg Harris


This exception is thrown during the 
RemoteTopicCrudTest.testClusterWideDisablementOfTieredStorageWithEnabledTieredTopic
 test in zk mode:
{noformat}
org.apache.kafka.common.config.ConfigException: You have to delete all topics 
with the property remote.storage.enable=true before disabling tiered storage 
cluster-wide

org.apache.kafka.storage.internals.log.LogConfig.validateRemoteStorageOnlyIfSystemEnabled(LogConfig.java:566)
kafka.log.LogManager.updateTopicConfig(LogManager.scala:956)
kafka.server.TopicConfigHandler.updateLogConfig(ConfigHandler.scala:73)
kafka.server.TopicConfigHandler.processConfigChanges(ConfigHandler.scala:94)
kafka.server.ZkConfigManager.$anonfun$startup$4(ZkConfigManager.scala:176)
kafka.server.ZkConfigManager.$anonfun$startup$4$adapted(ZkConfigManager.scala:175)
scala.collection.immutable.Map$Map2.foreach(Map.scala:360)
kafka.server.ZkConfigManager.$anonfun$startup$1(ZkConfigManager.scala:175)
kafka.server.ZkConfigManager.$anonfun$startup$1$adapted(ZkConfigManager.scala:166)
scala.collection.immutable.HashMap.foreach(HashMap.scala:1115)
kafka.server.ZkConfigManager.startup(ZkConfigManager.scala:166)
kafka.server.KafkaServer.startup(KafkaServer.scala:575)
kafka.integration.KafkaServerTestHarness.$anonfun$createBrokers$1(KafkaServerTestHarness.scala:356)
kafka.integration.KafkaServerTestHarness.$anonfun$createBrokers$1$adapted(KafkaServerTestHarness.scala:352)
scala.collection.immutable.List.foreach(List.scala:333)
kafka.integration.KafkaServerTestHarness.createBrokers(KafkaServerTestHarness.scala:352)
kafka.integration.KafkaServerTestHarness.recreateBrokers(KafkaServerTestHarness.scala:146)
kafka.admin.RemoteTopicCrudTest.$anonfun$testClusterWideDisablementOfTieredStorageWithEnabledTieredTopic$1(RemoteTopicCrudTest.scala:319)
org.junit.jupiter.api.AssertThrows.assertThrows(AssertThrows.java:53)
org.junit.jupiter.api.AssertThrows.assertThrows(AssertThrows.java:35)
org.junit.jupiter.api.Assertions.assertThrows(Assertions.java:3111)
kafka.admin.RemoteTopicCrudTest.testClusterWideDisablementOfTieredStorageWithEnabledTieredTopic(RemoteTopicCrudTest.scala:319){noformat}
This leak only occurs for this one test in the RemoteTopicCrudTest; all other 
tests including the kraft-mode version do not exhibit a leaked socket.

Here is where the ServerSocket is instantiated:
{noformat}
        at java.base/java.nio.channels.ServerSocketChannel.open(ServerSocketChannel.java:113)
        at kafka.network.Acceptor.openServerSocket(SocketServer.scala:724)
        at kafka.network.Acceptor.<init>(SocketServer.scala:608)
        at kafka.network.DataPlaneAcceptor.<init>(SocketServer.scala:454)
        at kafka.network.SocketServer.createDataPlaneAcceptor(SocketServer.scala:270)
        at kafka.network.SocketServer.createDataPlaneAcceptorAndProcessors(SocketServer.scala:249)
        at kafka.network.SocketServer.$anonfun$new$31(SocketServer.scala:175)
        at kafka.network.SocketServer.$anonfun$new$31$adapted(SocketServer.scala:175)
        at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:576)
        at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:574)
        at scala.collection.AbstractIterable.foreach(Iterable.scala:933)
        at kafka.network.SocketServer.<init>(SocketServer.scala:175)
        at kafka.server.KafkaServer.startup(KafkaServer.scala:344)
        at kafka.integration.KafkaServerTestHarness.$anonfun$createBrokers$1(KafkaServerTestHarness.scala:356)
        at kafka.integration.KafkaServerTestHarness.$anonfun$createBrokers$1$adapted(KafkaServerTestHarness.scala:352)
        at scala.collection.immutable.List.foreach(List.scala:333)
        at kafka.integration.KafkaServerTestHarness.createBrokers(KafkaServerTestHarness.scala:352)
        at kafka.integration.KafkaServerTestHarness.recreateBrokers(KafkaServerTestHarness.scala:146)
        at kafka.admin.RemoteTopicCrudTest.$anonfun$testClusterWideDisablementOfTieredStorageWithEnabledTieredTopic$1(RemoteTopicCrudTest.scala:319)
        at org.junit.jupiter.api.AssertThrows.assertThrows(AssertThrows.java:53)
        at org.junit.jupiter.api.AssertThrows.assertThrows(AssertThrows.java:35)
        at org.junit.jupiter.api.Assertions.assertThrows(Assertions.java:3111)
        at kafka.admin.RemoteTopicCrudTest.testClusterWideDisablementOfTieredStorageWithEnabledTieredTopic(RemoteTopicCrudTest.scala:319){noformat}
And the associated DataPlaneAcceptor:
{noformat}
        at java.base/java.nio.channels.Selector.open(Selector.java:295)         
at 

[jira] [Commented] (KAFKA-15800) Malformed connect source offsets corrupt other partitions with DataException

2023-11-09 Thread Greg Harris (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17784555#comment-17784555
 ] 

Greg Harris commented on KAFKA-15800:
-

[~showuon] Sorry for not bringing this up in the release thread, that slipped 
my mind.

We should be able to merge this before Tuesday 14th. I'll work with the 
reviewers to get this moving quickly.

> Malformed connect source offsets corrupt other partitions with DataException
> 
>
> Key: KAFKA-15800
> URL: https://issues.apache.org/jira/browse/KAFKA-15800
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 3.5.0, 3.6.0, 3.5.1
>Reporter: Greg Harris
>Assignee: Greg Harris
>Priority: Blocker
> Fix For: 3.5.2, 3.7.0, 3.6.1
>
>
> The KafkaOffsetBackingStore consumer callback was recently augmented with a 
> call to OffsetUtils.processPartitionKey: 
> [https://github.com/apache/kafka/blob/f1e58a35d7aebbe72844faf3e5019d9aa7a85e4a/connect/runtime/src/main/java/org/apache/kafka/connect/storage/KafkaOffsetBackingStore.java#L323]
> This function deserializes the offset key, which may be malformed in the 
> topic: 
> [https://github.com/apache/kafka/blob/f1e58a35d7aebbe72844faf3e5019d9aa7a85e4a/connect/runtime/src/main/java/org/apache/kafka/connect/storage/OffsetUtils.java#L92]
> When this happens, a DataException is thrown, and propagates to the 
> KafkaBasedLog try-catch surrounding the batch processing of the records: 
> [https://github.com/apache/kafka/blob/f1e58a35d7aebbe72844faf3e5019d9aa7a85e4a/connect/runtime/src/main/java/org/apache/kafka/connect/util/KafkaBasedLog.java#L445-L454]
> For example:
> {noformat}
> ERROR Error polling: org.apache.kafka.connect.errors.DataException: 
> Converting byte[] to Kafka Connect data failed due to serialization error:  
> (org.apache.kafka.connect.util.KafkaBasedLog:453){noformat}
> This means that one DataException for a malformed record may cause the 
> remainder of the batch to be dropped, corrupting the in-memory state of the 
> KafkaOffsetBackingStore. This prevents tasks using the 
> KafkaOffsetBackingStore from seeing all of the offsets in the topics, and can 
> cause duplicate records to be emitted.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15800) Malformed connect source offsets corrupt other partitions with DataException

2023-11-08 Thread Greg Harris (Jira)
Greg Harris created KAFKA-15800:
---

 Summary: Malformed connect source offsets corrupt other partitions 
with DataException
 Key: KAFKA-15800
 URL: https://issues.apache.org/jira/browse/KAFKA-15800
 Project: Kafka
  Issue Type: Bug
  Components: KafkaConnect
Affects Versions: 3.5.1, 3.6.0, 3.5.0
Reporter: Greg Harris
Assignee: Greg Harris
 Fix For: 3.5.2, 3.7.0, 3.6.1


The KafkaOffsetBackingStore consumer callback was recently augmented with a 
call to OffsetUtils.processPartitionKey: 
[https://github.com/apache/kafka/blob/f1e58a35d7aebbe72844faf3e5019d9aa7a85e4a/connect/runtime/src/main/java/org/apache/kafka/connect/storage/KafkaOffsetBackingStore.java#L323]

This function deserializes the offset key, which may be malformed in the topic: 
[https://github.com/apache/kafka/blob/f1e58a35d7aebbe72844faf3e5019d9aa7a85e4a/connect/runtime/src/main/java/org/apache/kafka/connect/storage/OffsetUtils.java#L92]

When this happens, a DataException is thrown, and propagates to the 
KafkaBasedLog try-catch surrounding the batch processing of the records: 
[https://github.com/apache/kafka/blob/f1e58a35d7aebbe72844faf3e5019d9aa7a85e4a/connect/runtime/src/main/java/org/apache/kafka/connect/util/KafkaBasedLog.java#L445-L454]

For example:
{noformat}
ERROR Error polling: org.apache.kafka.connect.errors.DataException: Converting 
byte[] to Kafka Connect data failed due to serialization error:  
(org.apache.kafka.connect.util.KafkaBasedLog:453){noformat}
This means that one DataException for a malformed record may cause the 
remainder of the batch to be dropped, corrupting the in-memory state of the 
KafkaOffsetBackingStore. This prevents tasks using the KafkaOffsetBackingStore 
from seeing all of the offsets in the topics, and can cause duplicate records 
to be emitted.
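
As an illustration only (not the actual KafkaOffsetBackingStore/KafkaBasedLog code), a 
per-record guard along these lines would keep one malformed key from dropping the rest 
of the batch; the processBatch/callback names are made up for the sketch:
{code:java}
import java.util.List;
import java.util.function.Consumer;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.connect.errors.DataException;

public final class OffsetBatchGuard {
    // Illustrative only: apply the per-record callback and swallow DataException
    // so one malformed offset key cannot drop the rest of the batch.
    public static void processBatch(List<ConsumerRecord<byte[], byte[]>> batch,
                                    Consumer<ConsumerRecord<byte[], byte[]>> callback) {
        for (ConsumerRecord<byte[], byte[]> record : batch) {
            try {
                callback.accept(record); // may throw DataException for a malformed key
            } catch (DataException e) {
                System.err.println("Skipping malformed offset record at offset "
                        + record.offset() + ": " + e.getMessage());
            }
        }
    }
}
{code}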



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15564) Kafka 3.5.1- Mirror Maker 2 replicating the wrong consumer group offsets in destination

2023-10-25 Thread Greg Harris (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17779568#comment-17779568
 ] 

Greg Harris commented on KAFKA-15564:
-

[~hemanthsavasere] Were you able to work around this problem with 
`offset.lag.max`?

> Kafka 3.5.1- Mirror Maker 2 replicating the wrong consumer group offsets in 
> destination 
> 
>
> Key: KAFKA-15564
> URL: https://issues.apache.org/jira/browse/KAFKA-15564
> Project: Kafka
>  Issue Type: Bug
>  Components: mirrormaker
>Affects Versions: 3.5.1
>Reporter: Hemanth Savasere
>Priority: Major
> Attachments: Destination Consumer Group Offsets Describe.txt, MM2 
> 3.5.1 Logs.txt, Source Consumer Group Offsets Describe.txt, 
> docker-compose.yml, mm2.properties
>
>
> Issue : Mirror Maker 2 not replicating the proper consumer group offsets when 
> replication happens from source to destination kafka brokers
> Steps to Reproduce :
> # Created 2 Kafka clusters locally using the attached docker-compose.yml 
> file. Was using the confluent platform 7.2.1 docker images
> # Had added the below section in docker file to create 20 topics and produce 
> randomly between 1000 to 2000 messages in each topic, and then consume the 
> same messages using consumer-groups.   
> {code:java}
> /etc/confluent/docker/run &
> sleep 20
> for i in {1..20}; do
>   kafka-topics --create --bootstrap-server kafka-1:9092 
> --replication-factor 1 --partitions 1 --topic topic$$i
> done
> for i in {1..20}; do
>   num_msgs=$$(shuf -i 1000-2000 -n 1)
>   seq 1 $$num_msgs | kafka-console-producer --broker-list 
> kafka-1:9092 --topic topic$$i
> done
> for i in {1..20}; do
>   timeout 10 kafka-console-consumer --bootstrap-server kafka-1:9092 
> --topic topic$$i --group consumer-group$$i
> done
> wait 
> {code}
> # Ran the Mirror Maker 2 using the connect-mirror-maker.sh script after 
> downloading the Kafka 3.5.1 release binaries. Verified the 3.5.1 version was 
> running and commitID 2c6fb6c54472e90a was shown in attached MM2 logs file.   
> NOTE : Have attached the Kafka Docker file, the mirror maker 2 logs and 
> mirror maker 2 properties file. Also, have attached the describe command 
> output of all the consumer groups present in source and destination.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-14767) Gradle build fails with missing commitId after git gc

2023-10-23 Thread Greg Harris (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14767?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Harris updated KAFKA-14767:

Fix Version/s: 3.4.2
   3.5.2
   3.6.1

> Gradle build fails with missing commitId after git gc
> -
>
> Key: KAFKA-14767
> URL: https://issues.apache.org/jira/browse/KAFKA-14767
> Project: Kafka
>  Issue Type: Bug
>  Components: build
>Reporter: Greg Harris
>Assignee: Greg Harris
>Priority: Minor
> Fix For: 3.4.2, 3.5.2, 3.7.0, 3.6.1
>
>
> Reproduction steps:
> 1. `git gc`
> 2. `./gradlew jar`
> Expected behavior: build completes successfully (or shows other build errors)
> Actual behavior:
> {noformat}
> Task failed with an exception.
> ---
> * What went wrong:
> A problem was found with the configuration of task 
> ':storage:createVersionFile' (type 'DefaultTask').
>   - Property 'commitId' doesn't have a configured value.
>     
>     Reason: This property isn't marked as optional and no value has been 
> configured.
>     
>     Possible solutions:
>       1. Assign a value to 'commitId'.
>       2. Mark property 'commitId' as optional.
>     
>     Please refer to 
> https://docs.gradle.org/7.6/userguide/validation_problems.html#value_not_set 
> for more details about this problem.{noformat}
> This appears to be due to the fact that the build.gradle determineCommitId() 
> function is unable to read the git commit hash for the current HEAD. This 
> appears to happen after a `git gc` takes place, which causes the 
> `.git/refs/heads/*` files to be moved to `.git/packed-refs`.
> The determineCommitId() should be patched to also try reading from the 
> packed-refs to determine the commit hash.
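
For illustration, a rough Java sketch of the packed-refs fallback described above; the 
real change would live in build.gradle's determineCommitId() (Groovy), so the names and 
details here are assumptions:
{code:java}
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public final class CommitIdResolver {
    // Sketch of the fallback: read the loose ref if present, otherwise scan packed-refs.
    public static String commitIdFor(Path gitDir, String branch) throws IOException {
        Path looseRef = gitDir.resolve("refs/heads/" + branch);
        if (Files.exists(looseRef)) {
            return Files.readString(looseRef).trim();     // normal case: loose ref file
        }
        Path packedRefs = gitDir.resolve("packed-refs");  // after `git gc`, refs end up here
        if (Files.exists(packedRefs)) {
            for (String line : Files.readAllLines(packedRefs)) {
                if (line.endsWith(" refs/heads/" + branch)) {
                    return line.split(" ")[0];            // "<sha> refs/heads/<branch>"
                }
            }
        }
        return "unknown";
    }
}
{code}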



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-10339) MirrorMaker2 Exactly-once Semantics

2023-10-18 Thread Greg Harris (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17776850#comment-17776850
 ] 

Greg Harris commented on KAFKA-10339:
-

[~funkerman] Hello!

In your example, no, because auto-synced offsets are at-least-once. If consumer 
A reads and commits A, B, C, those records may be re-read by consumer B. The 
guarantee that "MM2 EOS" provides is that consumer B will read each mirrored 
message at most once, regardless of what consumer A does.

> MirrorMaker2 Exactly-once Semantics
> ---
>
> Key: KAFKA-10339
> URL: https://issues.apache.org/jira/browse/KAFKA-10339
> Project: Kafka
>  Issue Type: New Feature
>  Components: KafkaConnect, mirrormaker
>Reporter: Ning Zhang
>Assignee: Ning Zhang
>Priority: Major
>  Labels: needs-kip
> Fix For: 3.5.0
>
> Attachments: image-2023-10-17-15-33-57-983.png
>
>
> MirrorMaker2 is currently implemented on Kafka Connect Framework, more 
> specifically the Source Connector / Task, which do not provide exactly-once 
> semantics (EOS) out-of-the-box, as discussed in 
> https://github.com/confluentinc/kafka-connect-jdbc/issues/461,  
> https://github.com/apache/kafka/pull/5553, 
> https://issues.apache.org/jira/browse/KAFKA-6080  and 
> https://issues.apache.org/jira/browse/KAFKA-3821. Therefore MirrorMaker2 
> currently does not provide EOS.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15611) Use virtual threads in the Connect framework

2023-10-16 Thread Greg Harris (Jira)
Greg Harris created KAFKA-15611:
---

 Summary: Use virtual threads in the Connect framework
 Key: KAFKA-15611
 URL: https://issues.apache.org/jira/browse/KAFKA-15611
 Project: Kafka
  Issue Type: Improvement
  Components: KafkaConnect
Reporter: Greg Harris


Virtual threads have been finalized in JDK21, so we may include optional 
support for them in Connect.
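
For illustration, a minimal sketch of how an opt-in could look on JDK 21+; the 
useVirtualThreads flag is a hypothetical worker setting, not an existing config:
{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;

public final class ConnectThreadFactories {
    // Sketch: pick a virtual-thread factory when the hypothetical flag is set,
    // otherwise keep platform threads as today.
    public static ThreadFactory threadFactory(boolean useVirtualThreads, String namePrefix) {
        if (useVirtualThreads) {
            return Thread.ofVirtual().name(namePrefix, 0).factory();
        }
        return Thread.ofPlatform().name(namePrefix, 0).factory();
    }

    public static ExecutorService executor(boolean useVirtualThreads, String namePrefix) {
        return Executors.newThreadPerTaskExecutor(threadFactory(useVirtualThreads, namePrefix));
    }
}
{code}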



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15575) Prevent Connectors from exceeding tasks.max configuration

2023-10-10 Thread Greg Harris (Jira)
Greg Harris created KAFKA-15575:
---

 Summary: Prevent Connectors from exceeding tasks.max configuration
 Key: KAFKA-15575
 URL: https://issues.apache.org/jira/browse/KAFKA-15575
 Project: Kafka
  Issue Type: Task
  Components: KafkaConnect
Reporter: Greg Harris


The Connector::taskConfigs(int maxTasks) method is used by Connectors to 
enumerate task configurations. It takes an argument derived from the 
tasks.max connector config. This is the Javadoc for that method:
{noformat}
/**
 * Returns a set of configurations for Tasks based on the current configuration,
 * producing at most {@code maxTasks} configurations.
 *
 * @param maxTasks maximum number of configurations to generate
 * @return configurations for Tasks
 */
public abstract List<Map<String, String>> taskConfigs(int maxTasks);
{noformat}
This includes the constraint that the number of tasks is at most maxTasks, but 
this constraint is not enforced by the framework.

 

We should begin enforcing this constraint by dropping configs that exceed the 
limit, and logging a warning. For sink connectors this should harmlessly 
rebalance the consumer subscriptions onto the remaining tasks. For source 
connectors that distribute their work via task configs, this may result in an 
interruption in data transfer.
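
A minimal sketch of the enforcement described above (illustrative only, not the 
framework's actual implementation):
{code:java}
import java.util.List;
import java.util.Map;

public final class TaskConfigEnforcer {
    // Sketch: truncate the connector's task configs to tasks.max and warn about
    // anything that was dropped.
    public static List<Map<String, String>> enforceMaxTasks(
            String connectorName, List<Map<String, String>> taskConfigs, int maxTasks) {
        if (taskConfigs.size() <= maxTasks) {
            return taskConfigs;
        }
        System.err.printf("Connector %s generated %d task configs, but tasks.max is %d; dropping the excess%n",
                connectorName, taskConfigs.size(), maxTasks);
        return taskConfigs.subList(0, maxTasks);
    }
}
{code}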



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15564) Kafka 3.5.1- Mirror Maker 2 replicating the wrong consumer group offsets in destination

2023-10-07 Thread Greg Harris (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17772847#comment-17772847
 ] 

Greg Harris commented on KAFKA-15564:
-

Hi [~hemanthsavasere] the offsets translation behavior was changed in 
KAFKA-12468 to avoid a data loss bug. You will need to lower the 
`offset.lag.max` if you want more precise consumer offset translation at the 
end of the topic.
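
For illustration, a possible mm2.properties excerpt; the cluster aliases are placeholders 
and the right value depends on how precise the translation needs to be:
{noformat}
# "source" and "target" are placeholder cluster aliases.
clusters = source, target
source->target.enabled = true
# Lower offset.lag.max (default 100) for more precise offset translation
# near the end of the topic, at the cost of more frequent offset syncs.
offset.lag.max = 0
{noformat}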

> Kafka 3.5.1- Mirror Maker 2 replicating the wrong consumer group offsets in 
> destination 
> 
>
> Key: KAFKA-15564
> URL: https://issues.apache.org/jira/browse/KAFKA-15564
> Project: Kafka
>  Issue Type: Bug
>  Components: mirrormaker
>Affects Versions: 3.5.1
>Reporter: Hemanth Savasere
>Priority: Major
> Attachments: Destination Consumer Group Offsets Describe.txt, MM2 
> 3.5.1 Logs.txt, Source Consumer Group Offsets Describe.txt, 
> docker-compose.yml, mm2.properties
>
>
> Issue : Mirror Maker 2 not replicating the proper consumer group offsets when 
> replication happens from source to destination kafka brokers
> Steps to Reproduce :
> # Created 2 Kafka clusters locally using the attached docker-compose.yml 
> file. Was using the confluent platform 7.2.1 docker images
> # Had added the below section in docker file to create 20 topics and produce 
> randomly between 1000 to 2000 messages in each topic, and then consume the 
> same messages using consumer-groups.   
> {code:java}
> /etc/confluent/docker/run &
> sleep 20
> for i in {1..20}; do
>   kafka-topics --create --bootstrap-server kafka-1:9092 
> --replication-factor 1 --partitions 1 --topic topic$$i
> done
> for i in {1..20}; do
>   num_msgs=$$(shuf -i 1000-2000 -n 1)
>   seq 1 $$num_msgs | kafka-console-producer --broker-list 
> kafka-1:9092 --topic topic$$i
> done
> for i in {1..20}; do
>   timeout 10 kafka-console-consumer --bootstrap-server kafka-1:9092 
> --topic topic$$i --group consumer-group$$i
> done
> wait 
> {code}
> # Ran the Mirror Maker 2 using the connect-mirror-maker.sh script after 
> downloading the Kafka 3.5.1 release binaries. Verified the 3.5.1 version was 
> running and commitID 2c6fb6c54472e90a was shown in attached MM2 logs file.   
> NOTE : Have attached the Kafka Docker file, the mirror maker 2 logs and 
> mirror maker 2 properties file. Also, have attached the describe command 
> output of all the consumer groups present in source and destination.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15559) KIP-987: Connect Static Assignments

2023-10-05 Thread Greg Harris (Jira)
Greg Harris created KAFKA-15559:
---

 Summary: KIP-987: Connect Static Assignments
 Key: KAFKA-15559
 URL: https://issues.apache.org/jira/browse/KAFKA-15559
 Project: Kafka
  Issue Type: New Feature
  Components: KafkaConnect
Reporter: Greg Harris
Assignee: Greg Harris


https://cwiki.apache.org/confluence/display/KAFKA/KIP-987%3A+Connect+Static+Assignments



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15528) KIP-986: Cross-Cluster Replication

2023-10-02 Thread Greg Harris (Jira)
Greg Harris created KAFKA-15528:
---

 Summary: KIP-986: Cross-Cluster Replication
 Key: KAFKA-15528
 URL: https://issues.apache.org/jira/browse/KAFKA-15528
 Project: Kafka
  Issue Type: New Feature
Reporter: Greg Harris


https://cwiki.apache.org/confluence/display/KAFKA/KIP-986%3A+Cross-Cluster+Replication



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-10339) MirrorMaker2 Exactly-once Semantics

2023-09-20 Thread Greg Harris (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10339?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17767334#comment-17767334
 ] 

Greg Harris commented on KAFKA-10339:
-

Please note that this KAFKA ticket only applies to data replication being 
exactly once. This means that if you replicate data between two topics, and 
consume from the target topic from the beginning with read_committed, you will 
read each record in the source topic exactly once.

It does _not_ apply to the MirrorCheckpointTask, offset translation, or 
consumer group auto sync. These processes are _not_ exactly-once; they are 
at-least-once. This means that if you commit offsets on your upstream topic and 
then start consuming from the target topic with the translated offsets, you may 
read the same record more than once, but you should read every record at least once.
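
For illustration, a consumer configured as described above; the broker address, group id 
and topic name are placeholders:
{code:java}
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;

public final class ReadCommittedTargetConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "target-broker:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "eos-verification");
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        // Only read committed transactional data, i.e. records the replication
        // flow actually committed; aborted transactions are skipped.
        props.put(ConsumerConfig.ISOLATION_LEVEL_CONFIG, "read_committed");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, ByteArrayDeserializer.class.getName());

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("source.test-topic-1")); // replicated topic name is an assumption
            for (ConsumerRecord<byte[], byte[]> record : consumer.poll(Duration.ofSeconds(5))) {
                System.out.printf("offset=%d%n", record.offset());
            }
        }
    }
}
{code}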

> MirrorMaker2 Exactly-once Semantics
> ---
>
> Key: KAFKA-10339
> URL: https://issues.apache.org/jira/browse/KAFKA-10339
> Project: Kafka
>  Issue Type: New Feature
>  Components: KafkaConnect, mirrormaker
>Reporter: Ning Zhang
>Assignee: Ning Zhang
>Priority: Major
>  Labels: needs-kip
> Fix For: 3.5.0
>
>
> MirrorMaker2 is currently implemented on Kafka Connect Framework, more 
> specifically the Source Connector / Task, which do not provide exactly-once 
> semantics (EOS) out-of-the-box, as discussed in 
> https://github.com/confluentinc/kafka-connect-jdbc/issues/461,  
> https://github.com/apache/kafka/pull/5553, 
> https://issues.apache.org/jira/browse/KAFKA-6080  and 
> https://issues.apache.org/jira/browse/KAFKA-3821. Therefore MirrorMaker2 
> currently does not provide EOS.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-15473) Connect connector-plugins endpoint shows duplicate plugins

2023-09-19 Thread Greg Harris (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Harris updated KAFKA-15473:

Description: 
In <3.6.0-rc0, duplicates of a plugin would be shown if it subclassed multiple 
interfaces. For example:
{noformat}
  {
"class": "org.apache.kafka.connect.storage.StringConverter",
"type": "converter"
  },
  { 
"class": "org.apache.kafka.connect.storage.StringConverter",
"type": "converter"
  },{noformat}
In 3.6.0-rc0, there are many more listings for the same plugin. For example:
{noformat}
  {
    "class": "org.apache.kafka.connect.storage.StringConverter",
    "type": "converter"
  },
  {
    "class": "org.apache.kafka.connect.storage.StringConverter",
    "type": "converter"
  },
  {
    "class": "org.apache.kafka.connect.storage.StringConverter",
    "type": "converter"
  },
  {
    "class": "org.apache.kafka.connect.storage.StringConverter",
    "type": "converter"
  },
  {
    "class": "org.apache.kafka.connect.storage.StringConverter",
    "type": "converter"
  },
  {
    "class": "org.apache.kafka.connect.storage.StringConverter",
    "type": "converter",
    "version": "3.6.0"
  },{noformat}
These duplicates appear to happen when a plugin with the same class name 
appears in multiple locations/classloaders.
When interpreting a connector configuration, only one of these plugins will be 
chosen, so only one is relevant to show to users. The REST API should only 
display the plugins which are eligible to be loaded, and hide the duplicates.

  was:
In <3.6.0-rc0, only one copy of each plugin would be shown. For example:
{noformat}
  {
"class": "org.apache.kafka.connect.storage.StringConverter",
"type": "converter"
  },{noformat}
In 3.6.0-rc0, there are multiple listings for the same plugin. For example:

 
{noformat}
  {
    "class": "org.apache.kafka.connect.storage.StringConverter",
    "type": "converter"
  },
  {
    "class": "org.apache.kafka.connect.storage.StringConverter",
    "type": "converter"
  },
  {
    "class": "org.apache.kafka.connect.storage.StringConverter",
    "type": "converter"
  },
  {
    "class": "org.apache.kafka.connect.storage.StringConverter",
    "type": "converter"
  },
  {
    "class": "org.apache.kafka.connect.storage.StringConverter",
    "type": "converter"
  },
  {
    "class": "org.apache.kafka.connect.storage.StringConverter",
    "type": "converter",
    "version": "3.6.0"
  },{noformat}
These duplicates appear to happen when a plugin with the same class name 
appears in multiple locations/classloaders.
When interpreting a connector configuration, only one of these plugins will be 
chosen, so only one is relevant to show to users. The REST API should only 
display the plugins which are eligible to be loaded, and hide the duplicates.


> Connect connector-plugins endpoint shows duplicate plugins
> --
>
> Key: KAFKA-15473
> URL: https://issues.apache.org/jira/browse/KAFKA-15473
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 3.6.0
>Reporter: Greg Harris
>Assignee: Greg Harris
>Priority: Major
> Fix For: 3.6.0
>
>
> In <3.6.0-rc0, duplicates of a plugin would be shown if it subclassed 
> multiple interfaces. For example:
> {noformat}
>   {
> "class": "org.apache.kafka.connect.storage.StringConverter",
> "type": "converter"
>   },
>   { 
> "class": "org.apache.kafka.connect.storage.StringConverter",
> "type": "converter"
>   },{noformat}
> In 3.6.0-rc0, there are many more listings for the same plugin. For example:
> {noformat}
>   {
>     "class": "org.apache.kafka.connect.storage.StringConverter",
>     "type": "converter"
>   },
>   {
>     "class": "org.apache.kafka.connect.storage.StringConverter",
>     "type": "converter"
>   },
>   {
>     "class": "org.apache.kafka.connect.storage.StringConverter",
>     "type": "converter"
>   },
>   {
>     "class": "org.apache.kafka.connect.storage.StringConverter",
>     "type": "converter"
>   },
>   {
>     "class": "org.apache.kafka.connect.storage.StringConverter",
>     "type": "converter"
>   },
>   {
>     "class": "org.apache.kafka.connect.storage.StringConverter",
>     "type": "converter",
>     "version": "3.6.0"
>   },{noformat}
> These duplicates appear to happen when a plugin with the same class name 
> appears in multiple locations/classloaders.
> When interpreting a connector configuration, only one of these plugins will 
> be chosen, so only one is relevant to show to users. The REST API should only 
> display the plugins which are eligible to be loaded, and hide the duplicates.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15473) Connect connector-plugins endpoint shows duplicate plugins

2023-09-18 Thread Greg Harris (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17766595#comment-17766595
 ] 

Greg Harris commented on KAFKA-15473:
-

I've opened [https://github.com/apache/kafka/pull/14398] with strategy (3) from 
above. We can always implement (1) in the future and change the 
PluginInfo::equals implementation to show these duplicates, so we can hide them 
for now. I think (2) removes functionality from the API and would count as a 
regression itself.

> Connect connector-plugins endpoint shows duplicate plugins
> --
>
> Key: KAFKA-15473
> URL: https://issues.apache.org/jira/browse/KAFKA-15473
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 3.6.0
>Reporter: Greg Harris
>Assignee: Greg Harris
>Priority: Major
> Fix For: 3.6.0
>
>
> In <3.6.0-rc0, only one copy of each plugin would be shown. For example:
> {noformat}
>   {
> "class": "org.apache.kafka.connect.storage.StringConverter",
> "type": "converter"
>   },{noformat}
> In 3.6.0-rc0, there are multiple listings for the same plugin. For example:
>  
> {noformat}
>   {
>     "class": "org.apache.kafka.connect.storage.StringConverter",
>     "type": "converter"
>   },
>   {
>     "class": "org.apache.kafka.connect.storage.StringConverter",
>     "type": "converter"
>   },
>   {
>     "class": "org.apache.kafka.connect.storage.StringConverter",
>     "type": "converter"
>   },
>   {
>     "class": "org.apache.kafka.connect.storage.StringConverter",
>     "type": "converter"
>   },
>   {
>     "class": "org.apache.kafka.connect.storage.StringConverter",
>     "type": "converter"
>   },
>   {
>     "class": "org.apache.kafka.connect.storage.StringConverter",
>     "type": "converter",
>     "version": "3.6.0"
>   },{noformat}
> These duplicates appear to happen when a plugin with the same class name 
> appears in multiple locations/classloaders.
> When interpreting a connector configuration, only one of these plugins will 
> be chosen, so only one is relevant to show to users. The REST API should only 
> display the plugins which are eligible to be loaded, and hide the duplicates.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15473) Connect connector-plugins endpoint shows duplicate plugins

2023-09-18 Thread Greg Harris (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17766592#comment-17766592
 ] 

Greg Harris commented on KAFKA-15473:
-

Plugins could also appear multiple times <3.6.0-rc0 if multiple versions were 
on the plugin path concurrently. The DelegatingClassLoader would prefer the one 
with the latest version, but all of the different versions would be visible in 
the REST API.
It also treated the undefined version as distinct from defined versions. Since 
3.6.0 is also the first version with KAFKA-15291 and there are some public 
plugins which package the AK transforms, there are now going to be both 
un-versioned and versioned copies of the transforms classes.

I think there are multiple ways to address this:
1. Show the source location of the plugins in the REST API to distinguish 
between apparently equivalent entries
2. Use the DelegatingClassLoader logic for choosing among multiple 
similarly-named plugins to only show the entry which the DelegatingClassLoader 
would use
3. Deduplicate the PluginInfos to avoid obvious repetition, but allow multiple 
versions to still be shown.

(1) would require a KIP, and might be scope creep for the REST API. 
Theoretically the API client shouldn't care about the on-disk layout of plugins.
(2) hides the duplicates introduced by KAFKA-15244 and KAFKA-15291, but also 
hides all but the latest version of each plugin. If someone has multiple 
versions of a plugin installed, they can currently diagnose that via REST API. 
After implementing solution (2), they would need to look at the worker logs.
(3) hides the duplicates introduced by KAFKA-15244, but leaves the duplicates 
introduced by KAFKA-15291, and allows users to diagnose when multiple versions 
are installed.

I'm not sure which of these to implement. In <3.6.0-rc0 it was not possible to 
diagnose installations with multiple copies of the same plugin when they had 
the same (or undefined) version, and in 3.6.0-rc0 that is now possible. If we 
implement solution (2) or (3), we would take away that capability.
We might just be able to leave this as-is, and try to implement some form of 
solution (1) to make these duplicates more descriptive.
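
For illustration, a rough sketch of option (3); the PluginEntry record is a stand-in and 
not Connect's actual PluginInfo class:
{code:java}
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public final class PluginInfoDedup {
    // Sketch: collapse entries that agree on class, type and version, while
    // keeping genuinely different versions visible.
    record PluginEntry(String className, String type, String version) {}

    public static List<PluginEntry> dedupe(List<PluginEntry> entries) {
        Set<PluginEntry> unique = new LinkedHashSet<>(entries); // relies on record equals/hashCode
        return List.copyOf(unique);
    }
}
{code}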

> Connect connector-plugins endpoint shows duplicate plugins
> --
>
> Key: KAFKA-15473
> URL: https://issues.apache.org/jira/browse/KAFKA-15473
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 3.6.0
>Reporter: Greg Harris
>Assignee: Greg Harris
>Priority: Major
> Fix For: 3.6.0
>
>
> In <3.6.0-rc0, only one copy of each plugin would be shown. For example:
> {noformat}
>   {
> "class": "org.apache.kafka.connect.storage.StringConverter",
> "type": "converter"
>   },{noformat}
> In 3.6.0-rc0, there are multiple listings for the same plugin. For example:
>  
> {noformat}
>   {
>     "class": "org.apache.kafka.connect.storage.StringConverter",
>     "type": "converter"
>   },
>   {
>     "class": "org.apache.kafka.connect.storage.StringConverter",
>     "type": "converter"
>   },
>   {
>     "class": "org.apache.kafka.connect.storage.StringConverter",
>     "type": "converter"
>   },
>   {
>     "class": "org.apache.kafka.connect.storage.StringConverter",
>     "type": "converter"
>   },
>   {
>     "class": "org.apache.kafka.connect.storage.StringConverter",
>     "type": "converter"
>   },
>   {
>     "class": "org.apache.kafka.connect.storage.StringConverter",
>     "type": "converter",
>     "version": "3.6.0"
>   },{noformat}
> These duplicates appear to happen when a plugin with the same class name 
> appears in multiple locations/classloaders.
> When interpreting a connector configuration, only one of these plugins will 
> be chosen, so only one is relevant to show to users. The REST API should only 
> display the plugins which are eligible to be loaded, and hide the duplicates.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15473) Connect connector-plugins endpoint shows duplicate plugins

2023-09-18 Thread Greg Harris (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15473?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17766578#comment-17766578
 ] 

Greg Harris commented on KAFKA-15473:
-

It appears that the bug which prompted the fix in KAFKA-15244 (the wrong PluginType 
being inferred) could also cause duplicates. For example:
{noformat}
  {
    "class": "org.apache.kafka.connect.storage.StringConverter",
    "type": "converter"
  },
  {
    "class": "org.apache.kafka.connect.storage.StringConverter",
    "type": "converter"
  },{noformat}
Here, the second entry should have been "header_converter". So while there are 
more duplicates in 3.6.0-rc0 than there were in <3.6.0-rc0, the presence of 
duplicates is not new. We wouldn't be breaking any third-party clients, because 
they would already have to handle these duplicates.

> Connect connector-plugins endpoint shows duplicate plugins
> --
>
> Key: KAFKA-15473
> URL: https://issues.apache.org/jira/browse/KAFKA-15473
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 3.6.0
>Reporter: Greg Harris
>Assignee: Greg Harris
>Priority: Blocker
> Fix For: 3.6.0
>
>
> In <3.6.0-rc0, only one copy of each plugin would be shown. For example:
> {noformat}
>   {
> "class": "org.apache.kafka.connect.storage.StringConverter",
> "type": "converter"
>   },{noformat}
> In 3.6.0-rc0, there are multiple listings for the same plugin. For example:
>  
> {noformat}
>   {
>     "class": "org.apache.kafka.connect.storage.StringConverter",
>     "type": "converter"
>   },
>   {
>     "class": "org.apache.kafka.connect.storage.StringConverter",
>     "type": "converter"
>   },
>   {
>     "class": "org.apache.kafka.connect.storage.StringConverter",
>     "type": "converter"
>   },
>   {
>     "class": "org.apache.kafka.connect.storage.StringConverter",
>     "type": "converter"
>   },
>   {
>     "class": "org.apache.kafka.connect.storage.StringConverter",
>     "type": "converter"
>   },
>   {
>     "class": "org.apache.kafka.connect.storage.StringConverter",
>     "type": "converter",
>     "version": "3.6.0"
>   },{noformat}
> These duplicates appear to happen when a plugin with the same class name 
> appears in multiple locations/classloaders.
> When interpreting a connector configuration, only one of these plugins will 
> be chosen, so only one is relevant to show to users. The REST API should only 
> display the plugins which are eligible to be loaded, and hide the duplicates.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-15473) Connect connector-plugins endpoint shows duplicate plugins

2023-09-18 Thread Greg Harris (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Harris updated KAFKA-15473:

Priority: Major  (was: Blocker)

> Connect connector-plugins endpoint shows duplicate plugins
> --
>
> Key: KAFKA-15473
> URL: https://issues.apache.org/jira/browse/KAFKA-15473
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 3.6.0
>Reporter: Greg Harris
>Assignee: Greg Harris
>Priority: Major
> Fix For: 3.6.0
>
>
> In <3.6.0-rc0, only one copy of each plugin would be shown. For example:
> {noformat}
>   {
> "class": "org.apache.kafka.connect.storage.StringConverter",
> "type": "converter"
>   },{noformat}
> In 3.6.0-rc0, there are multiple listings for the same plugin. For example:
>  
> {noformat}
>   {
>     "class": "org.apache.kafka.connect.storage.StringConverter",
>     "type": "converter"
>   },
>   {
>     "class": "org.apache.kafka.connect.storage.StringConverter",
>     "type": "converter"
>   },
>   {
>     "class": "org.apache.kafka.connect.storage.StringConverter",
>     "type": "converter"
>   },
>   {
>     "class": "org.apache.kafka.connect.storage.StringConverter",
>     "type": "converter"
>   },
>   {
>     "class": "org.apache.kafka.connect.storage.StringConverter",
>     "type": "converter"
>   },
>   {
>     "class": "org.apache.kafka.connect.storage.StringConverter",
>     "type": "converter",
>     "version": "3.6.0"
>   },{noformat}
> These duplicates appear to happen when a plugin with the same class name 
> appears in multiple locations/classloaders.
> When interpreting a connector configuration, only one of these plugins will 
> be chosen, so only one is relevant to show to users. The REST API should only 
> display the plugins which are eligible to be loaded, and hide the duplicates.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15473) Connect connector-plugins endpoint shows duplicate plugins

2023-09-18 Thread Greg Harris (Jira)
Greg Harris created KAFKA-15473:
---

 Summary: Connect connector-plugins endpoint shows duplicate plugins
 Key: KAFKA-15473
 URL: https://issues.apache.org/jira/browse/KAFKA-15473
 Project: Kafka
  Issue Type: Bug
  Components: KafkaConnect
Affects Versions: 3.6.0
Reporter: Greg Harris
Assignee: Greg Harris
 Fix For: 3.6.0


In <3.6.0-rc0, only one copy of each plugin would be shown. For example:
{noformat}
  {
"class": "org.apache.kafka.connect.storage.StringConverter",
"type": "converter"
  },{noformat}
In 3.6.0-rc0, there are multiple listings for the same plugin. For example:

 
{noformat}
  {
    "class": "org.apache.kafka.connect.storage.StringConverter",
    "type": "converter"
  },
  {
    "class": "org.apache.kafka.connect.storage.StringConverter",
    "type": "converter"
  },
  {
    "class": "org.apache.kafka.connect.storage.StringConverter",
    "type": "converter"
  },
  {
    "class": "org.apache.kafka.connect.storage.StringConverter",
    "type": "converter"
  },
  {
    "class": "org.apache.kafka.connect.storage.StringConverter",
    "type": "converter"
  },
  {
    "class": "org.apache.kafka.connect.storage.StringConverter",
    "type": "converter",
    "version": "3.6.0"
  },{noformat}
These duplicates appear to happen when a plugin with the same class name 
appears in multiple locations/classloaders.
When interpreting a connector configuration, only one of these plugins will be 
chosen, so only one is relevant to show to users. The REST API should only 
display the plugins which are eligible to be loaded, and hide the duplicates.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-14595) Move ReassignPartitionsCommand to tools

2023-09-07 Thread Greg Harris (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14595?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17762822#comment-17762822
 ] 

Greg Harris commented on KAFKA-14595:
-

I merged [https://github.com/apache/kafka/pull/14217] but I am leaving this 
open as the other PRs have not landed yet.

> Move ReassignPartitionsCommand to tools
> ---
>
> Key: KAFKA-14595
> URL: https://issues.apache.org/jira/browse/KAFKA-14595
> Project: Kafka
>  Issue Type: Sub-task
>Reporter: Mickael Maison
>Assignee: Nikolay Izhikov
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-15372) MM2 rolling restart can drop configuration changes silently

2023-09-01 Thread Greg Harris (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Harris updated KAFKA-15372:

Fix Version/s: (was: 3.6.0)

> MM2 rolling restart can drop configuration changes silently
> ---
>
> Key: KAFKA-15372
> URL: https://issues.apache.org/jira/browse/KAFKA-15372
> Project: Kafka
>  Issue Type: Bug
>  Components: mirrormaker
>Reporter: Daniel Urban
>Assignee: Greg Harris
>Priority: Major
>
> When MM2 is restarted, it tries to update the Connector configuration in all 
> flows. This is a one-time trial, and fails if the Connect worker is not the 
> leader of the group.
> In a distributed setup and with a rolling restart, it is possible that for a 
> specific flow, the Connect worker of the just restarted MM2 instance is not 
> the leader, meaning that Connector configurations can get dropped.
> For example, assuming 2 MM2 instances, and one flow A->B:
>  # MM2 instance 1 is restarted, the worker inside MM2 instance 2 becomes the 
> leader of A->B Connect group.
>  # MM2 instance 1 tries to update the Connector configurations, but fails 
> (instance 2 has the leader, not instance 1)
>  # MM2 instance 2 is restarted, leadership moves to worker in MM2 instance 1
>  # MM2 instance 2 tries to update the Connector configurations, but fails
> At this point, the configuration changes before the restart are never 
> applied. Many times, this can also happen silently, without any indication.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15372) MM2 rolling restart can drop configuration changes silently

2023-09-01 Thread Greg Harris (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17761381#comment-17761381
 ] 

Greg Harris commented on KAFKA-15372:
-

I implemented the REST-less fix, but it's complex enough that I don't feel 
comfortable including it in the upcoming release.

> MM2 rolling restart can drop configuration changes silently
> ---
>
> Key: KAFKA-15372
> URL: https://issues.apache.org/jira/browse/KAFKA-15372
> Project: Kafka
>  Issue Type: Bug
>  Components: mirrormaker
>Reporter: Daniel Urban
>Assignee: Greg Harris
>Priority: Major
> Fix For: 3.6.0
>
>
> When MM2 is restarted, it tries to update the Connector configuration in all 
> flows. This is a one-time trial, and fails if the Connect worker is not the 
> leader of the group.
> In a distributed setup and with a rolling restart, it is possible that for a 
> specific flow, the Connect worker of the just restarted MM2 instance is not 
> the leader, meaning that Connector configurations can get dropped.
> For example, assuming 2 MM2 instances, and one flow A->B:
>  # MM2 instance 1 is restarted, the worker inside MM2 instance 2 becomes the 
> leader of A->B Connect group.
>  # MM2 instance 1 tries to update the Connector configurations, but fails 
> (instance 2 has the leader, not instance 1)
>  # MM2 instance 2 is restarted, leadership moves to worker in MM2 instance 1
>  # MM2 instance 2 tries to update the Connector configurations, but fails
> At this point, the configuration changes before the restart are never 
> applied. Many times, this can also happen silently, without any indication.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (KAFKA-15372) MM2 rolling restart can drop configuration changes silently

2023-08-28 Thread Greg Harris (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Harris reassigned KAFKA-15372:
---

Assignee: Greg Harris

> MM2 rolling restart can drop configuration changes silently
> ---
>
> Key: KAFKA-15372
> URL: https://issues.apache.org/jira/browse/KAFKA-15372
> Project: Kafka
>  Issue Type: Bug
>  Components: mirrormaker
>Reporter: Daniel Urban
>Assignee: Greg Harris
>Priority: Major
> Fix For: 3.6.0
>
>
> When MM2 is restarted, it tries to update the Connector configuration in all 
> flows. This is a one-time trial, and fails if the Connect worker is not the 
> leader of the group.
> In a distributed setup and with a rolling restart, it is possible that for a 
> specific flow, the Connect worker of the just restarted MM2 instance is not 
> the leader, meaning that Connector configurations can get dropped.
> For example, assuming 2 MM2 instances, and one flow A->B:
>  # MM2 instance 1 is restarted, the worker inside MM2 instance 2 becomes the 
> leader of A->B Connect group.
>  # MM2 instance 1 tries to update the Connector configurations, but fails 
> (instance 2 has the leader, not instance 1)
>  # MM2 instance 2 is restarted, leadership moves to worker in MM2 instance 1
>  # MM2 instance 2 tries to update the Connector configurations, but fails
> At this point, the configuration changes before the restart are never 
> applied. Many times, this can also happen silently, without any indication.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-15211) DistributedConfigTest#shouldFailWithInvalidKeySize fails when run after TestSslUtils#generate

2023-08-24 Thread Greg Harris (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15211?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Harris updated KAFKA-15211:

Fix Version/s: 3.4.2
   3.5.2

> DistributedConfigTest#shouldFailWithInvalidKeySize fails when run after 
> TestSslUtils#generate
> -
>
> Key: KAFKA-15211
> URL: https://issues.apache.org/jira/browse/KAFKA-15211
> Project: Kafka
>  Issue Type: Test
>  Components: clients, KafkaConnect
>Reporter: Greg Harris
>Assignee: Greg Harris
>Priority: Minor
> Fix For: 3.6.0, 3.4.2, 3.5.2
>
>
> The DistributedConfigTest#shouldFailWithInvalidKeySize attempts to configure 
> a hashing algorithm with a key size of 0. When run alone, this test passes, 
> as the default Java hashing algorithm used rejects the key size.
> However, when TestSslUtils#generate runs first, such as via the 
> RestForwardingIntegrationTest, the BouncyCastleProvider is loaded, which 
> provides an alternative hashing algorithm. This implementation does _not_ 
> reject the key size, causing the test to fail.
> We should either prevent TestSslUtils#generate from leaving the 
> BouncyCastleProvider loaded after use, or adjust the test to pass when the 
> BouncyCastleProvider is loaded.
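
For illustration, a sketch of the first option (scoping the provider to the generate 
call); generateCertificates is a placeholder for the work done by TestSslUtils#generate:
{code:java}
import java.security.Security;
import org.bouncycastle.jce.provider.BouncyCastleProvider;

public final class BouncyCastleScope {
    // Sketch: register the provider only for the duration of certificate
    // generation, then remove it so later tests see the default JCE providers.
    public static void withTemporaryProvider(Runnable generateCertificates) {
        boolean added = Security.addProvider(new BouncyCastleProvider()) != -1;
        try {
            generateCertificates.run();
        } finally {
            if (added) {
                Security.removeProvider(BouncyCastleProvider.PROVIDER_NAME);
            }
        }
    }
}
{code}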



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-15377) GET /connectors/{connector}/tasks-config endpoint exposes externalized secret values

2023-08-24 Thread Greg Harris (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Harris updated KAFKA-15377:

Fix Version/s: 3.6.0
   3.4.2
   3.5.2

> GET /connectors/{connector}/tasks-config endpoint exposes externalized secret 
> values
> 
>
> Key: KAFKA-15377
> URL: https://issues.apache.org/jira/browse/KAFKA-15377
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Reporter: Yash Mayya
>Assignee: Yash Mayya
>Priority: Major
> Fix For: 3.6.0, 3.4.2, 3.5.2
>
>
> The {{GET /connectors/\{connector}/tasks-config}} endpoint added in 
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-661%3A+Expose+task+configurations+in+Connect+REST+API]
>  exposes externalized secret values in task configurations (see 
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-297%3A+Externalizing+Secrets+for+Connect+Configurations)].
>  A similar bug was fixed in https://issues.apache.org/jira/browse/KAFKA-5117 
> / [https://github.com/apache/kafka/pull/6129] for the {{GET 
> /connectors/\{connector}/tasks}} endpoint. The config provider placeholder 
> should be used instead of the resolved config value.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-14901) Flaky test ExactlyOnceSourceIntegrationTest.testConnectorReconfiguration

2023-08-24 Thread Greg Harris (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-14901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17758769#comment-17758769
 ] 

Greg Harris commented on KAFKA-14901:
-

This flakiness is no longer present on trunk. I believe that another bug fix 
resolved the flakiness but I don't know which.

> Flaky test ExactlyOnceSourceIntegrationTest.testConnectorReconfiguration
> 
>
> Key: KAFKA-14901
> URL: https://issues.apache.org/jira/browse/KAFKA-14901
> Project: Kafka
>  Issue Type: Test
>  Components: KafkaConnect
>Reporter: Greg Harris
>Priority: Major
>  Labels: flaky-test
> Fix For: 3.6.0
>
> Attachments: transaction-flake-trace-0.out, transaction-flake.out
>
>
> The EOS Source test appears to be very rarely failing (<5% chance) with the 
> following error:
> {noformat}
> org.apache.kafka.common.KafkaException: Unexpected error in 
> InitProducerIdResponse; The server experienced an unexpected error when 
> processing the request.
>   at 
> app//org.apache.kafka.clients.producer.internals.TransactionManager$InitProducerIdHandler.handleResponse(TransactionManager.java:1303)
>   at 
> app//org.apache.kafka.clients.producer.internals.TransactionManager$TxnRequestHandler.onComplete(TransactionManager.java:1207)
>   at 
> app//org.apache.kafka.clients.ClientResponse.onComplete(ClientResponse.java:154)
>   at 
> app//org.apache.kafka.clients.NetworkClient.completeResponses(NetworkClient.java:594)
>   at 
> app//org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:586)
>   at 
> app//org.apache.kafka.clients.producer.internals.Sender.maybeSendAndPollTransactionalRequest(Sender.java:426)
>   at 
> app//org.apache.kafka.clients.producer.internals.Sender.runOnce(Sender.java:316)
>   at 
> app//org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:243)
>   at java.base@11.0.16.1/java.lang.Thread.run(Thread.java:829){noformat}
> which appears to be triggered by the following failure inside the broker:
> {noformat}
> [2023-04-12 14:01:38,931] ERROR [KafkaApi-0] Unexpected error handling 
> request RequestHeader(apiKey=INIT_PRODUCER_ID, apiVersion=4, 
> clientId=simulated-task-producer-exactlyOnceQuestionMark-1, correlationId=5, 
> headerVersion=2) -- 
> InitProducerIdRequestData(transactionalId='exactly-once-source-integration-test-exactlyOnceQuestionMark-1',
>  transactionTimeoutMs=6, producerId=-1, producerEpoch=-1) with context 
> RequestContext(header=RequestHeader(apiKey=INIT_PRODUCER_ID, apiVersion=4, 
> clientId=simulated-task-producer-exactlyOnceQuestionMark-1, correlationId=5, 
> headerVersion=2), connectionId='127.0.0.1:54213-127.0.0.1:54367-46', 
> clientAddress=/127.0.0.1, principal=User:ANONYMOUS, 
> listenerName=ListenerName(PLAINTEXT), securityProtocol=PLAINTEXT, 
> clientInformation=ClientInformation(softwareName=apache-kafka-java, 
> softwareVersion=3.5.0-SNAPSHOT), fromPrivilegedListener=true, 
> principalSerde=Optional[org.apache.kafka.common.security.authenticator.DefaultKafkaPrincipalBuilder@615924cd])
>  (kafka.server.KafkaApis:76)
> java.lang.IllegalStateException: Preparing transaction state transition to 
> Empty while it already a pending state Ongoing
>     at 
> kafka.coordinator.transaction.TransactionMetadata.prepareTransitionTo(TransactionMetadata.scala:380)
>     at 
> kafka.coordinator.transaction.TransactionMetadata.prepareIncrementProducerEpoch(TransactionMetadata.scala:311)
>     at 
> kafka.coordinator.transaction.TransactionCoordinator.prepareInitProducerIdTransit(TransactionCoordinator.scala:240)
>     at 
> kafka.coordinator.transaction.TransactionCoordinator.$anonfun$handleInitProducerId$3(TransactionCoordinator.scala:151)
>     at 
> kafka.coordinator.transaction.TransactionMetadata.inLock(TransactionMetadata.scala:242)
>     at 
> kafka.coordinator.transaction.TransactionCoordinator.$anonfun$handleInitProducerId$2(TransactionCoordinator.scala:150)
>     at scala.util.Either.flatMap(Either.scala:352)
>     at 
> kafka.coordinator.transaction.TransactionCoordinator.handleInitProducerId(TransactionCoordinator.scala:145)
>     at 
> kafka.server.KafkaApis.handleInitProducerIdRequest(KafkaApis.scala:2236)
>     at kafka.server.KafkaApis.handle(KafkaApis.scala:202)
>     at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:76)
>     at java.base/java.lang.Thread.run(Thread.java:829{noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-14901) Flaky test ExactlyOnceSourceIntegrationTest.testConnectorReconfiguration

2023-08-24 Thread Greg Harris (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Harris resolved KAFKA-14901.
-
Resolution: Cannot Reproduce

> Flaky test ExactlyOnceSourceIntegrationTest.testConnectorReconfiguration
> 
>
> Key: KAFKA-14901
> URL: https://issues.apache.org/jira/browse/KAFKA-14901
> Project: Kafka
>  Issue Type: Test
>  Components: KafkaConnect
>Reporter: Greg Harris
>Priority: Major
>  Labels: flaky-test
> Fix For: 3.6.0
>
> Attachments: transaction-flake-trace-0.out, transaction-flake.out
>
>
> The EOS Source test appears to be very rarely failing (<5% chance) with the 
> following error:
> {noformat}
> org.apache.kafka.common.KafkaException: Unexpected error in 
> InitProducerIdResponse; The server experienced an unexpected error when 
> processing the request.
>   at 
> app//org.apache.kafka.clients.producer.internals.TransactionManager$InitProducerIdHandler.handleResponse(TransactionManager.java:1303)
>   at 
> app//org.apache.kafka.clients.producer.internals.TransactionManager$TxnRequestHandler.onComplete(TransactionManager.java:1207)
>   at 
> app//org.apache.kafka.clients.ClientResponse.onComplete(ClientResponse.java:154)
>   at 
> app//org.apache.kafka.clients.NetworkClient.completeResponses(NetworkClient.java:594)
>   at 
> app//org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:586)
>   at 
> app//org.apache.kafka.clients.producer.internals.Sender.maybeSendAndPollTransactionalRequest(Sender.java:426)
>   at 
> app//org.apache.kafka.clients.producer.internals.Sender.runOnce(Sender.java:316)
>   at 
> app//org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:243)
>   at java.base@11.0.16.1/java.lang.Thread.run(Thread.java:829){noformat}
> which appears to be triggered by the following failure inside the broker:
> {noformat}
> [2023-04-12 14:01:38,931] ERROR [KafkaApi-0] Unexpected error handling 
> request RequestHeader(apiKey=INIT_PRODUCER_ID, apiVersion=4, 
> clientId=simulated-task-producer-exactlyOnceQuestionMark-1, correlationId=5, 
> headerVersion=2) -- 
> InitProducerIdRequestData(transactionalId='exactly-once-source-integration-test-exactlyOnceQuestionMark-1',
>  transactionTimeoutMs=6, producerId=-1, producerEpoch=-1) with context 
> RequestContext(header=RequestHeader(apiKey=INIT_PRODUCER_ID, apiVersion=4, 
> clientId=simulated-task-producer-exactlyOnceQuestionMark-1, correlationId=5, 
> headerVersion=2), connectionId='127.0.0.1:54213-127.0.0.1:54367-46', 
> clientAddress=/127.0.0.1, principal=User:ANONYMOUS, 
> listenerName=ListenerName(PLAINTEXT), securityProtocol=PLAINTEXT, 
> clientInformation=ClientInformation(softwareName=apache-kafka-java, 
> softwareVersion=3.5.0-SNAPSHOT), fromPrivilegedListener=true, 
> principalSerde=Optional[org.apache.kafka.common.security.authenticator.DefaultKafkaPrincipalBuilder@615924cd])
>  (kafka.server.KafkaApis:76)
> java.lang.IllegalStateException: Preparing transaction state transition to 
> Empty while it already a pending state Ongoing
>     at 
> kafka.coordinator.transaction.TransactionMetadata.prepareTransitionTo(TransactionMetadata.scala:380)
>     at 
> kafka.coordinator.transaction.TransactionMetadata.prepareIncrementProducerEpoch(TransactionMetadata.scala:311)
>     at 
> kafka.coordinator.transaction.TransactionCoordinator.prepareInitProducerIdTransit(TransactionCoordinator.scala:240)
>     at 
> kafka.coordinator.transaction.TransactionCoordinator.$anonfun$handleInitProducerId$3(TransactionCoordinator.scala:151)
>     at 
> kafka.coordinator.transaction.TransactionMetadata.inLock(TransactionMetadata.scala:242)
>     at 
> kafka.coordinator.transaction.TransactionCoordinator.$anonfun$handleInitProducerId$2(TransactionCoordinator.scala:150)
>     at scala.util.Either.flatMap(Either.scala:352)
>     at 
> kafka.coordinator.transaction.TransactionCoordinator.handleInitProducerId(TransactionCoordinator.scala:145)
>     at 
> kafka.server.KafkaApis.handleInitProducerIdRequest(KafkaApis.scala:2236)
>     at kafka.server.KafkaApis.handle(KafkaApis.scala:202)
>     at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:76)
>     at java.base/java.lang.Thread.run(Thread.java:829){noformat}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15197) Flaky test MirrorConnectorsIntegrationExactlyOnceTest.testOffsetTranslationBehindReplicationFlow()

2023-08-24 Thread Greg Harris (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15197?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17758768#comment-17758768
 ] 

Greg Harris commented on KAFKA-15197:
-

We merged KAFKA-15202 and the flakiness decreased substantially, with only 2 
failures in the past week. I'll continue to monitor this but it shouldn't be a 
blocker for the upcoming release.

> Flaky test 
> MirrorConnectorsIntegrationExactlyOnceTest.testOffsetTranslationBehindReplicationFlow()
> --
>
> Key: KAFKA-15197
> URL: https://issues.apache.org/jira/browse/KAFKA-15197
> Project: Kafka
>  Issue Type: Test
>  Components: mirrormaker
>Reporter: Divij Vaidya
>Priority: Major
>  Labels: flaky-test
> Fix For: 3.6.0
>
>
> As of Jul 17th, this is the second most flaky test in our CI on trunk and 
> fails 46% of the time. 
> See: 
> [https://ge.apache.org/scans/tests?search.relativeStartTime=P28D=kafka=Europe/Berlin]
>  
> Note that MirrorConnectorsIntegrationExactlyOnceTest has multiple tests but 
> testOffsetTranslationBehindReplicationFlow is the one that is the reason for 
> most failures. see: 
> [https://ge.apache.org/scans/tests?search.relativeStartTime=P28D=kafka=Europe/Berlin=org.apache.kafka.connect.mirror.integration.MirrorConnectorsIntegrationExactlyOnceTest]
>  
>  
> Reason for failure is: 
> |org.opentest4j.AssertionFailedError: Condition not met within timeout 2. 
> Offsets for consumer group consumer-group-lagging-behind not translated from 
> primary for topic primary.test-topic-1 ==> expected:  but was: |



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15398) Document Connect threat model

2023-08-23 Thread Greg Harris (Jira)
Greg Harris created KAFKA-15398:
---

 Summary: Document Connect threat model
 Key: KAFKA-15398
 URL: https://issues.apache.org/jira/browse/KAFKA-15398
 Project: Kafka
  Issue Type: Task
  Components: KafkaConnect
Reporter: Greg Harris


Kafka Connect is a plugin framework, regularly requiring the installation of 
third-party code. This poses a security hazard for operators, who may be 
compromised by actively malicious plugins or well-intentioned plugins which are 
exploitable.

We should document the threat model that the Connect architecture uses, and 
make it clear to operators what trust and verification is required in order to 
operate Connect safely.

At a high level, this documentation may include:
 # Plugins are arbitrary code with unrestricted access to the filesystem, 
secrets, and network resources of the hosting Connect worker
 # The filesystem of the worker is trusted
 # Connector configurations passed via REST API are trusted
 # Plugins may have exploits triggered by certain configurations, or by 
external connections.
 # Exploits may also be present in plugins/drivers/dependencies used by Connect 
plugins, such as JDBC drivers
 # The default installation without REST API security is exploitable when run 
on an untrusted network.

Documenting this security model will also make it easier to discuss changing 
the model and improving the security architecture of Connect in the future.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15393) MirrorMaker2 integration tests are shutting down uncleanly

2023-08-22 Thread Greg Harris (Jira)
Greg Harris created KAFKA-15393:
---

 Summary: MirrorMaker2 integration tests are shutting down uncleanly
 Key: KAFKA-15393
 URL: https://issues.apache.org/jira/browse/KAFKA-15393
 Project: Kafka
  Issue Type: Test
  Components: mirrormaker
Reporter: Greg Harris
Assignee: Greg Harris


The MirrorConnectorsIntegrationBaseTest and its derived test classes often 
shut down uncleanly. An unclean shutdown takes longer than a clean shutdown, 
because the shutdown must wait for timeouts to elapse.

It appears that during the teardown, the MirrorConnectorsIntegrationBaseTest 
deletes all of the topics on the EmbeddedKafkaCluster. This causes extremely 
poor behavior for the connect worker (which is unable to access internal 
topics) and for the connectors (which get stuck on their final offset commit 
refreshing metadata and then get cancelled).

The EmbeddedKafkaCluster is not reused for the next test, so cleaning the 
topics is unnecessary.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15292) Flaky test IdentityReplicationIntegrationTest#testReplicateSourceDefault()

2023-08-22 Thread Greg Harris (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17757734#comment-17757734
 ] 

Greg Harris commented on KAFKA-15292:
-

The listed exception is shadowing the real exception, which is probably a 
shutdown timeout. I've opened KAFKA-15392 to resolve the exception shadowing 
which should hopefully make it easier to diagnose this flakiness.

> Flaky test IdentityReplicationIntegrationTest#testReplicateSourceDefault()
> --
>
> Key: KAFKA-15292
> URL: https://issues.apache.org/jira/browse/KAFKA-15292
> Project: Kafka
>  Issue Type: Bug
>  Components: mirrormaker
>Reporter: Kirk True
>Priority: Major
>  Labels: flaky-test, mirror-maker, mirrormaker
>
> The test testReplicateSourceDefault in 
> `org.apache.kafka.connect.mirror.integration.IdentityReplicationIntegrationTest`
>  is flaky about 2% of the time as shown in [Gradle 
> Enterprise|[https://ge.apache.org/scans/tests?search.relativeStartTime=P90D=kafka=America/Los_Angeles=org.apache.kafka.connect.mirror.integration.IdentityReplicationIntegrationTest=FLAKY]].
>  
> {code:java}
> java.lang.RuntimeException: Could not stop worker
> at 
> o.a.k.connect.util.clusters.EmbeddedConnectCluster.stopWorker(EmbeddedConnectCluster.java:230)
> at java.lang.Iterable.forEach(Iterable.java:75)
> at 
> o.a.k.connect.util.clusters.EmbeddedConnectCluster.stop(EmbeddedConnectCluster.java:163)
> at 
> o.a.k.connect.mirror.integration.MirrorConnectorsIntegrationBaseTest.shutdownClusters(MirrorConnectorsIntegrationBaseTest.java:267)
> at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
> at 
> jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:568)
> at 
> org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:727)
> ...
> at 
> worker.org.gradle.process.internal.worker.GradleWorkerMain.main(GradleWorkerMain.java:74)
> Caused by: java.lang.IllegalStateException: !STOPPED
> at 
> org.eclipse.jetty.server.handler.HandlerWrapper.destroy(HandlerWrapper.java:140)
> at o.a.k.connect.runtime.rest.RestServer.stop(RestServer.java:361)
> at o.a.k.connect.runtime.Connect.stop(Connect.java:69) at 
> o.a.k.connect.util.clusters.WorkerHandle.stop(WorkerHandle.java:57)
> at 
> o.a.k.connect.util.clusters.EmbeddedConnectCluster.stopWorker(EmbeddedConnectCluster.java:225)
> ... 93 more{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-15392) RestServer starts but does not stop ServletContextHandler

2023-08-22 Thread Greg Harris (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15392?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Harris updated KAFKA-15392:

Description: 
Due to the initialization order of the connect RestServer and Herder, the jetty 
Server is started before the ServletContextHandler instances are installed. 
This causes jetty to consider them "unmanaged" and thus will not call the 
start() and stop() lifecycle on our behalf.

RestServer#initializeResources already explicitly calls start() for these 
unmanaged resources, but there is no accompanying stop() call, so the resources 
never enter the STOPPED state.

The jetty server has one more operation after stopping: destroy(), which 
asserts that resources are already stopped. If the jetty server is ever 
destroyed, this exception will be thrown:
{noformat}
java.lang.IllegalStateException: !STOPPED
at 
org.eclipse.jetty.server.handler.HandlerWrapper.destroy(HandlerWrapper.java:140)
at o.a.k.connect.runtime.rest.RestServer.stop(RestServer.java:361){noformat}
Fortunately, destroy() is currently only called when an error has already 
occurred, so this IllegalStateException is never thrown on happy-path 
execution. Instead, if RestServer shutdown encounters an error (such as 
exceeding the GRACEFUL_SHUTDOWN_TIMEOUT and timing out) the other error will be 
shadowed by the IllegalStateException.

Rather than only calling destroy() on failure and shadowing the error, 
destroy() should always be called and its errors reported separately.

  was:
Due to the initialization order of the connect RestServer and Herder, the jetty 
Server is started before the ServletContextHandler instances are installed. 
This causes jetty to consider them "unmanaged" and thus will not call the 
start() and stop() lifecycle on our behalf.

RestServer#initializeResources already explicitly calls start() for these 
unmanaged resources, but there is no accompanying stop() call, so the resources 
never enter the STOPPED state.

The jetty server has one more operation after stopping: destroy(), which 
asserts that resources are already stopped. If the jetty server is ever 
destroyed, this exception will be thrown:
java.lang.IllegalStateException: !STOPPED
at 
org.eclipse.jetty.server.handler.HandlerWrapper.destroy(HandlerWrapper.java:140)
at o.a.k.connect.runtime.rest.RestServer.stop(RestServer.java:361)
Fortunately, destroy() is currently only called when an error has already 
occurred, so this IllegalStateException is never thrown on happy-path 
execution. Instead, if RestServer shutdown encounters an error (such as 
exceeding the GRACEFUL_SHUTDOWN_TIMEOUT and timing out) the other error will be 
shadowed by the IllegalStateException.

Rather than only calling destroy() on failure and shadowing the error, 
destroy() should always be called and its errors reported separately.


> RestServer starts but does not stop ServletContextHandler
> -
>
> Key: KAFKA-15392
> URL: https://issues.apache.org/jira/browse/KAFKA-15392
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Reporter: Greg Harris
>Assignee: Greg Harris
>Priority: Minor
>
> Due to the initialization order of the connect RestServer and Herder, the 
> jetty Server is started before the ServletContextHandler instances are 
> installed. This causes jetty to consider them "unmanaged" and thus will not 
> call the start() and stop() lifecycle on our behalf.
> RestServer#initializeResources already explicitly calls start() for these 
> unmanaged resources, but there is no accompanying stop() call, so the 
> resources never enter the STOPPED state.
> The jetty server has one more operation after stopping: destroy(), which 
> asserts that resources are already stopped. If the jetty server is ever 
> destroyed, this exception will be thrown:
> {noformat}
> java.lang.IllegalStateException: !STOPPED
> at 
> org.eclipse.jetty.server.handler.HandlerWrapper.destroy(HandlerWrapper.java:140)
> at o.a.k.connect.runtime.rest.RestServer.stop(RestServer.java:361){noformat}
> Fortunately, destroy() is currently only called when an error has already 
> occurred, so this IllegalStateException is never thrown on happy-path 
> execution. Instead, if RestServer shutdown encounters an error (such as 
> exceeding the GRACEFUL_SHUTDOWN_TIMEOUT and timing out) the other error will 
> be shadowed by the IllegalStateException.
> Rather than only calling destroy() on failure and shadowing the error, 
> destroy() should always be called and its errors reported separately.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15392) RestServer starts but does not stop ServletContextHandler

2023-08-22 Thread Greg Harris (Jira)
Greg Harris created KAFKA-15392:
---

 Summary: RestServer starts but does not stop ServletContextHandler
 Key: KAFKA-15392
 URL: https://issues.apache.org/jira/browse/KAFKA-15392
 Project: Kafka
  Issue Type: Bug
  Components: KafkaConnect
Reporter: Greg Harris
Assignee: Greg Harris


Due to the initialization order of the connect RestServer and Herder, the jetty 
Server is started before the ServletContextHandler instances are installed. 
This causes jetty to consider them "unmanaged" and thus will not call the 
start() and stop() lifecycle on our behalf.

RestServer#initializeResources already explicitly calls start() for these 
unmanaged resources, but there is no accompanying stop() call, so the resources 
never enter the STOPPED state.

The jetty server has one more operation after stopping: destroy(), which 
asserts that resources are already stopped. If the jetty server is ever 
destroyed, this exception will be thrown:
java.lang.IllegalStateException: !STOPPED
at 
org.eclipse.jetty.server.handler.HandlerWrapper.destroy(HandlerWrapper.java:140)
at o.a.k.connect.runtime.rest.RestServer.stop(RestServer.java:361)
Fortunately, destroy() is currently only called when an error has already 
occurred, so this IllegalStateException is never thrown on happy-path 
execution. Instead, if RestServer shutdown encounters an error (such as 
exceeding the GRACEFUL_SHUTDOWN_TIMEOUT and timing out) the other error will be 
shadowed by the IllegalStateException.

Rather than only calling destroy() on failure and shadowing the error, 
destroy() should always be called and its errors reported separately.
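
For illustration, a minimal sketch of the proposed ordering, using hypothetical helper names rather than the actual RestServer code: destroy() is always attempted in a finally block, and its failure is attached to the stop() failure instead of shadowing it.
{code:java}
import org.eclipse.jetty.server.Server;

public class JettyShutdownSketch {
    // Hypothetical helper: stop and then always destroy a jetty Server,
    // reporting a destroy() failure separately instead of shadowing the stop() failure.
    public static void stopAndDestroy(Server jettyServer) {
        Exception failure = null;
        try {
            jettyServer.stop();    // may fail, e.g. after exceeding the graceful shutdown timeout
            jettyServer.join();
        } catch (Exception e) {
            failure = e;           // remember the real shutdown error
        } finally {
            try {
                jettyServer.destroy();    // always destroy, even if stop() failed
            } catch (Exception destroyError) {
                if (failure != null) {
                    failure.addSuppressed(destroyError);    // keep both errors visible
                } else {
                    failure = destroyError;
                }
            }
        }
        if (failure != null) {
            throw new RuntimeException("Failed to shut down REST server cleanly", failure);
        }
    }
}
{code}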



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15372) MM2 rolling restart can drop configuration changes silently

2023-08-22 Thread Greg Harris (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17757542#comment-17757542
 ] 

Greg Harris commented on KAFKA-15372:
-

Reading through the KIP-710 thread, it appears that the public API was 
intentionally left out of the implementation to avoid security problems. 
In particular, the connector config endpoint security is typically managed by a 
rest extension and by default is unsecured, while the internal endpoints are 
already secured by default. If we were to keep the current security posture of 
KIP-710 (all endpoints secured) and add a new endpoint, we would need to secure 
it by default. This would be a divergence between MM2 dedicated mode and normal 
Connect Distributed mode that the KIP discussion wanted to avoid.

Because of that, I don't think we can implement connector config forwarding in 
a bugfix; it's going to need at least a slightly longer discussion about 
securing the original endpoint vs duplicating the endpoint.
I think that means that we'll need to implement the alternative solution, which 
delays applying the configuration until the worker becomes the leader.

> MM2 rolling restart can drop configuration changes silently
> ---
>
> Key: KAFKA-15372
> URL: https://issues.apache.org/jira/browse/KAFKA-15372
> Project: Kafka
>  Issue Type: Bug
>  Components: mirrormaker
>Reporter: Daniel Urban
>Priority: Major
> Fix For: 3.6.0
>
>
> When MM2 is restarted, it tries to update the Connector configuration in all 
> flows. This is a one-time trial, and fails if the Connect worker is not the 
> leader of the group.
> In a distributed setup and with a rolling restart, it is possible that for a 
> specific flow, the Connect worker of the just restarted MM2 instance is not 
> the leader, meaning that Connector configurations can get dropped.
> For example, assuming 2 MM2 instances, and one flow A->B:
>  # MM2 instance 1 is restarted, the worker inside MM2 instance 2 becomes the 
> leader of A->B Connect group.
>  # MM2 instance 1 tries to update the Connector configurations, but fails 
> (instance 2 has the leader, not instance 1)
>  # MM2 instance 2 is restarted, leadership moves to worker in MM2 instance 1
>  # MM2 instance 2 tries to update the Connector configurations, but fails
> At this point, the configuration changes before the restart are never 
> applied. Many times, this can also happen silently, without any indication.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-14627) Modernize Connect plugin discovery

2023-08-18 Thread Greg Harris (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-14627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Harris resolved KAFKA-14627.
-
Resolution: Fixed

> Modernize Connect plugin discovery
> --
>
> Key: KAFKA-14627
> URL: https://issues.apache.org/jira/browse/KAFKA-14627
> Project: Kafka
>  Issue Type: New Feature
>  Components: KafkaConnect
>Reporter: Greg Harris
>Assignee: Greg Harris
>Priority: Major
> Fix For: 3.6.0
>
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-898%3A+Modernize+Connect+plugin+discovery



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-15226) System tests for plugin.discovery worker configuration

2023-08-18 Thread Greg Harris (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Harris resolved KAFKA-15226.
-
Fix Version/s: 3.6.0
   Resolution: Fixed

> System tests for plugin.discovery worker configuration
> --
>
> Key: KAFKA-15226
> URL: https://issues.apache.org/jira/browse/KAFKA-15226
> Project: Kafka
>  Issue Type: Test
>  Components: KafkaConnect
>Reporter: Greg Harris
>Assignee: Greg Harris
>Priority: Major
> Fix For: 3.6.0
>
>
> Add system tests as described in KIP-898, targeting the startup behavior of 
> the connect worker, various states of plugin migration, and the migration 
> script.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15372) MM2 rolling restart can drop configuration changes silently

2023-08-18 Thread Greg Harris (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17756062#comment-17756062
 ] 

Greg Harris commented on KAFKA-15372:
-

Here's a proof-of-concept for the fix: 
[https://github.com/apache/kafka/pull/14245]

> MM2 rolling restart can drop configuration changes silently
> ---
>
> Key: KAFKA-15372
> URL: https://issues.apache.org/jira/browse/KAFKA-15372
> Project: Kafka
>  Issue Type: Bug
>  Components: mirrormaker
>Reporter: Daniel Urban
>Priority: Major
> Fix For: 3.6.0
>
>
> When MM2 is restarted, it tries to update the Connector configuration in all 
> flows. This is a one-time trial, and fails if the Connect worker is not the 
> leader of the group.
> In a distributed setup and with a rolling restart, it is possible that for a 
> specific flow, the Connect worker of the just restarted MM2 instance is not 
> the leader, meaning that Connector configurations can get dropped.
> For example, assuming 2 MM2 instances, and one flow A->B:
>  # MM2 instance 1 is restarted, the worker inside MM2 instance 2 becomes the 
> leader of A->B Connect group.
>  # MM2 instance 1 tries to update the Connector configurations, but fails 
> (instance 2 has the leader, not instance 1)
>  # MM2 instance 2 is restarted, leadership moves to worker in MM2 instance 1
>  # MM2 instance 2 tries to update the Connector configurations, but fails
> At this point, the configuration changes before the restart are never 
> applied. Many times, this can also happen silently, without any indication.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Comment Edited] (KAFKA-15372) MM2 rolling restart can drop configuration changes silently

2023-08-18 Thread Greg Harris (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17756048#comment-17756048
 ] 

Greg Harris edited comment on KAFKA-15372 at 8/18/23 4:38 PM:
--

[~durban] thank you for the sanity check, I see the flaw now.

The NotLeaderException is created/thrown in the Herder, and the caller 
(typically the ConnectorsResource) is responsible for handling the error and 
forwarding to the leader. See HerderRequestHandler#completeOrForwardRequest.
I thought the MirrorMaker class used this handler for applying the connector 
configuration on startup, but it doesn't. It never triggers the forwarding 
logic that normal REST requests have.

More importantly, KIP-710 only added {_}internal rest resources{_}, which are 
for writing task configurations and fencing zombies. It didn't include the 
connector configuration endpoint, which is typically public-api.
This was an oversight in the KIP-710 design/implementation, and should be 
fixed. Currently, KIP-710 only benefits the internal herder requests for 
writing task configurations and fencing zombies; it is not capable of 
forwarding connector configurations.
Since KIP-710 already added the necessary configuration options to 
enable/disable the internal REST, and this arguably should have been covered by 
the original implementation, I think we can address this in a bug-fix.

Alternatively, we could have MirrorMaker periodically apply its local 
configurations, retrying until it receives leadership, and then stopping. This 
wouldn't require changes to the REST API, and would cover some additional error 
scenarios, but would possibly cause configurations to flap between versions as 
leadership changes.
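
A rough sketch of that alternative, with hypothetical hooks: isLeader and applyConfigs stand in for the real herder calls and are not actual MirrorMaker APIs.
{code:java}
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

public class RetryUntilLeaderSketch {
    private final AtomicBoolean applied = new AtomicBoolean(false);
    private final ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();

    // Periodically retry applying the locally computed connector configs
    // until this worker becomes the leader, then apply once and stop retrying.
    public void start(BooleanSupplier isLeader, Runnable applyConfigs) {
        executor.scheduleAtFixedRate(() -> {
            if (!applied.get() && isLeader.getAsBoolean()) {
                applyConfigs.run();
                applied.set(true);
                executor.shutdown();
            }
        }, 0, 30, TimeUnit.SECONDS);
    }
}
{code}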

To work around this issue, users will need to fully stop and fully start MM2, to 
allow the first node to apply configurations successfully.


was (Author: gharris1727):
[~durban] thank you for the sanity check, I see the flaw now.

The NotLeaderException is created/thrown in the Herder, and the caller 
(typically the ConnectorsResource) is responsible for handling the error and 
forwarding to the leader. See HerderRequestHandler#completeOrForwardRequest.
I thought the MirrorMaker class used this handler for applying the connector 
configuration on startup, but it doesn't. It never triggers the forwarding 
logic that normal REST requests have.

More importantly, KIP-710 only added {_}internal rest resources{_}, which are 
for writing task configurations and fencing zombies. It didn't include the 
connector configuration endpoint, which is typically public-api.
This was an oversight in the KIP-710 design/implementation, and should be 
fixed. Currently, KIP-710 only benefits the internal herder requests for 
writing task configurations and fencing zombies; it is not capable of 
forwarding connector configurations.
Since KIP-710 already added the necessary configuration options to 
enable/disable the internal REST, and this arguably should have been covered by 
the original implementation, I think we can address this in a bug-fix.

To work around this issue, users will need to fully stop and fully start MM2, to 
allow the first node to apply configurations successfully.

> MM2 rolling restart can drop configuration changes silently
> ---
>
> Key: KAFKA-15372
> URL: https://issues.apache.org/jira/browse/KAFKA-15372
> Project: Kafka
>  Issue Type: Bug
>  Components: mirrormaker
>Reporter: Daniel Urban
>Priority: Major
> Fix For: 3.6.0
>
>
> When MM2 is restarted, it tries to update the Connector configuration in all 
> flows. This is a one-time trial, and fails if the Connect worker is not the 
> leader of the group.
> In a distributed setup and with a rolling restart, it is possible that for a 
> specific flow, the Connect worker of the just restarted MM2 instance is not 
> the leader, meaning that Connector configurations can get dropped.
> For example, assuming 2 MM2 instances, and one flow A->B:
>  # MM2 instance 1 is restarted, the worker inside MM2 instance 2 becomes the 
> leader of A->B Connect group.
>  # MM2 instance 1 tries to update the Connector configurations, but fails 
> (instance 2 has the leader, not instance 1)
>  # MM2 instance 2 is restarted, leadership moves to worker in MM2 instance 1
>  # MM2 instance 2 tries to update the Connector configurations, but fails
> At this point, the configuration changes before the restart are never 
> applied. Many times, this can also happen silently, without any indication.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-15372) MM2 rolling restart can drop configuration changes silently

2023-08-18 Thread Greg Harris (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Harris updated KAFKA-15372:

Issue Type: Bug  (was: Improvement)

> MM2 rolling restart can drop configuration changes silently
> ---
>
> Key: KAFKA-15372
> URL: https://issues.apache.org/jira/browse/KAFKA-15372
> Project: Kafka
>  Issue Type: Bug
>  Components: mirrormaker
>Reporter: Daniel Urban
>Priority: Major
>
> When MM2 is restarted, it tries to update the Connector configuration in all 
> flows. This is a one-time trial, and fails if the Connect worker is not the 
> leader of the group.
> In a distributed setup and with a rolling restart, it is possible that for a 
> specific flow, the Connect worker of the just restarted MM2 instance is not 
> the leader, meaning that Connector configurations can get dropped.
> For example, assuming 2 MM2 instances, and one flow A->B:
>  # MM2 instance 1 is restarted, the worker inside MM2 instance 2 becomes the 
> leader of A->B Connect group.
>  # MM2 instance 1 tries to update the Connector configurations, but fails 
> (instance 2 has the leader, not instance 1)
>  # MM2 instance 2 is restarted, leadership moves to worker in MM2 instance 1
>  # MM2 instance 2 tries to update the Connector configurations, but fails
> At this point, the configuration changes before the restart are never 
> applied. Many times, this can also happen silently, without any indication.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-15372) MM2 rolling restart can drop configuration changes silently

2023-08-18 Thread Greg Harris (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Harris updated KAFKA-15372:

Fix Version/s: 3.6.0

> MM2 rolling restart can drop configuration changes silently
> ---
>
> Key: KAFKA-15372
> URL: https://issues.apache.org/jira/browse/KAFKA-15372
> Project: Kafka
>  Issue Type: Bug
>  Components: mirrormaker
>Reporter: Daniel Urban
>Priority: Major
> Fix For: 3.6.0
>
>
> When MM2 is restarted, it tries to update the Connector configuration in all 
> flows. This is a one-time trial, and fails if the Connect worker is not the 
> leader of the group.
> In a distributed setup and with a rolling restart, it is possible that for a 
> specific flow, the Connect worker of the just restarted MM2 instance is not 
> the leader, meaning that Connector configurations can get dropped.
> For example, assuming 2 MM2 instances, and one flow A->B:
>  # MM2 instance 1 is restarted, the worker inside MM2 instance 2 becomes the 
> leader of A->B Connect group.
>  # MM2 instance 1 tries to update the Connector configurations, but fails 
> (instance 2 has the leader, not instance 1)
>  # MM2 instance 2 is restarted, leadership moves to worker in MM2 instance 1
>  # MM2 instance 2 tries to update the Connector configurations, but fails
> At this point, the configuration changes before the restart are never 
> applied. Many times, this can also happen silently, without any indication.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15372) MM2 rolling restart can drop configuration changes silently

2023-08-18 Thread Greg Harris (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17756048#comment-17756048
 ] 

Greg Harris commented on KAFKA-15372:
-

[~durban] thank you for the sanity check, I see the flaw now.

The NotLeaderException is created/thrown in the Herder, and the caller 
(typically the ConnectorsResource) is responsible for handling the error and 
forwarding to the leader. See HerderRequestHandler#completeOrForwardRequest.
I thought the MirrorMaker class used this handler for applying the connector 
configuration on startup, but it doesn't. It never triggers the forwarding 
logic that normal REST requests have.

More importantly, KIP-710 only added {_}internal rest resources{_}, which are 
for writing task configurations and fencing zombies. It didn't include the 
connector configuration endpoint, which is typically public-api.
This was an oversight in the KIP-710 design/implementation, and should be 
fixed. Currently, KIP-710 only benefits the internal herder requests for 
writing task configurations and fencing zombies; it is not capable of 
forwarding connector configurations.
Since KIP-710 already added the necessary configuration options to 
enable/disable the internal REST, and this arguably should have been covered by 
the original implementation, I think we can address this in a bug-fix.

To work around this issue, users will need to fully stop and fully start MM2, to 
allow the first node to apply configurations successfully.

> MM2 rolling restart can drop configuration changes silently
> ---
>
> Key: KAFKA-15372
> URL: https://issues.apache.org/jira/browse/KAFKA-15372
> Project: Kafka
>  Issue Type: Improvement
>  Components: mirrormaker
>Reporter: Daniel Urban
>Priority: Major
>
> When MM2 is restarted, it tries to update the Connector configuration in all 
> flows. This is a one-time trial, and fails if the Connect worker is not the 
> leader of the group.
> In a distributed setup and with a rolling restart, it is possible that for a 
> specific flow, the Connect worker of the just restarted MM2 instance is not 
> the leader, meaning that Connector configurations can get dropped.
> For example, assuming 2 MM2 instances, and one flow A->B:
>  # MM2 instance 1 is restarted, the worker inside MM2 instance 2 becomes the 
> leader of A->B Connect group.
>  # MM2 instance 1 tries to update the Connector configurations, but fails 
> (instance 2 has the leader, not instance 1)
>  # MM2 instance 2 is restarted, leadership moves to worker in MM2 instance 1
>  # MM2 instance 2 tries to update the Connector configurations, but fails
> At this point, the configuration changes before the restart are never 
> applied. Many times, this can also happen silently, without any indication.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15227) Use plugin.discovery=SERVICE_LOAD in all plugin test suites

2023-08-17 Thread Greg Harris (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17755658#comment-17755658
 ] 

Greg Harris commented on KAFKA-15227:
-

We can delay this for some time, as the performance improvements are marginal 
in AK. It will also catch any non-migrated plugins added in PRs.
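
For reference, the worker-level switch the ticket refers to is the KIP-898 plugin.discovery property; a test-suite worker override would carry a line like the one below (treat the exact spelling as an assumption to verify against the 3.6 worker configuration docs):
{noformat}
plugin.discovery=service_load
{noformat}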

> Use plugin.discovery=SERVICE_LOAD in all plugin test suites
> ---
>
> Key: KAFKA-15227
> URL: https://issues.apache.org/jira/browse/KAFKA-15227
> Project: Kafka
>  Issue Type: Test
>  Components: KafkaConnect
>Reporter: Greg Harris
>Priority: Major
>
> To speed up these tests where we know all plugins are migrated, use 
> SERVICE_LOAD mode in all known test suites.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Assigned] (KAFKA-15227) Use plugin.discovery=SERVICE_LOAD in all plugin test suites

2023-08-17 Thread Greg Harris (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Harris reassigned KAFKA-15227:
---

Assignee: (was: Greg Harris)

> Use plugin.discovery=SERVICE_LOAD in all plugin test suites
> ---
>
> Key: KAFKA-15227
> URL: https://issues.apache.org/jira/browse/KAFKA-15227
> Project: Kafka
>  Issue Type: Test
>  Components: KafkaConnect
>Reporter: Greg Harris
>Priority: Major
>
> To speed up these tests where we know all plugins are migrated, use 
> SERVICE_LOAD mode in all known test suites.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15372) MM2 rolling restart can drop configuration changes silently

2023-08-17 Thread Greg Harris (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17755647#comment-17755647
 ] 

Greg Harris commented on KAFKA-15372:
-

[~durban] I should clarify: I believe the behavior you described is the 
expected (but undesirable) behavior for versions before 3.5.0, and 
for 3.5.0+ with the configuration set to the default `false`.

When the internal REST API is enabled, the worker which is starting (that is 
not the leader) should forward configurations to the leader via the internal 
REST API. If you're not seeing the REST forwarding being attempted, or the 
forwarding failing, that would explain why configuration updates are being 
dropped.

If the above does not apply, then I'm interested to hear more details of your 
reproduction case. Thanks!

> MM2 rolling restart can drop configuration changes silently
> ---
>
> Key: KAFKA-15372
> URL: https://issues.apache.org/jira/browse/KAFKA-15372
> Project: Kafka
>  Issue Type: Improvement
>  Components: mirrormaker
>Reporter: Daniel Urban
>Priority: Major
>
> When MM2 is restarted, it tries to update the Connector configuration in all 
> flows. This is a one-time trial, and fails if the Connect worker is not the 
> leader of the group.
> In a distributed setup and with a rolling restart, it is possible that for a 
> specific flow, the Connect worker of the just restarted MM2 instance is not 
> the leader, meaning that Connector configurations can get dropped.
> For example, assuming 2 MM2 instances, and one flow A->B:
>  # MM2 instance 1 is restarted, the worker inside MM2 instance 2 becomes the 
> leader of A->B Connect group.
>  # MM2 instance 1 tries to update the Connector configurations, but fails 
> (instance 2 has the leader, not instance 1)
>  # MM2 instance 2 is restarted, leadership moves to worker in MM2 instance 1
>  # MM2 instance 2 tries to update the Connector configurations, but fails
> At this point, the configuration changes before the restart are never 
> applied. Many times, this can also happen silently, without any indication.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15372) MM2 rolling restart can drop configuration changes silently

2023-08-17 Thread Greg Harris (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17755629#comment-17755629
 ] 

Greg Harris commented on KAFKA-15372:
-

Hi [~durban] Thanks for the bug report.

Is this reproducible with 3.5.0 and `dedicated.mode.enable.internal.rest` set 
to `true`? This configuration was added in 
[https://cwiki.apache.org/confluence/display/KAFKA/KIP-710%3A+Full+support+for+distributed+mode+in+dedicated+MirrorMaker+2.0+clusters]
 .

> MM2 rolling restart can drop configuration changes silently
> ---
>
> Key: KAFKA-15372
> URL: https://issues.apache.org/jira/browse/KAFKA-15372
> Project: Kafka
>  Issue Type: Improvement
>  Components: mirrormaker
>Reporter: Daniel Urban
>Priority: Major
>
> When MM2 is restarted, it tries to update the Connector configuration in all 
> flows. This is a one-time trial, and fails if the Connect worker is not the 
> leader of the group.
> In a distributed setup and with a rolling restart, it is possible that for a 
> specific flow, the Connect worker of the just restarted MM2 instance is not 
> the leader, meaning that Connector configurations can get dropped.
> For example, assuming 2 MM2 instances, and one flow A->B:
>  # MM2 instance 1 is restarted, the worker inside MM2 instance 2 becomes the 
> leader of A->B Connect group.
>  # MM2 instance 1 tries to update the Connector configurations, but fails 
> (instance 2 has the leader, not instance 1)
>  # MM2 instance 2 is restarted, leadership moves to worker in MM2 instance 1
>  # MM2 instance 2 tries to update the Connector configurations, but fails
> At this point, the configuration changes before the restart are never 
> applied. Many times, this can also happen silently, without any indication.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-15228) Add sync-manifests subcommand to connect-plugin-path tool

2023-08-16 Thread Greg Harris (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Harris resolved KAFKA-15228.
-
Resolution: Fixed

> Add sync-manifests subcommand to connect-plugin-path tool
> -
>
> Key: KAFKA-15228
> URL: https://issues.apache.org/jira/browse/KAFKA-15228
> Project: Kafka
>  Issue Type: New Feature
>  Components: KafkaConnect, tools
>Reporter: Greg Harris
>Assignee: Greg Harris
>Priority: Major
> Fix For: 3.6.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (KAFKA-15350) MetadataLoaderMetrics has ClassNotFoundException in system tests

2023-08-15 Thread Greg Harris (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Harris resolved KAFKA-15350.
-
Resolution: Invalid

I believe this was an environmental problem, due to some stale artifacts. A 
clean build fixed the issues I was seeing.

> MetadataLoaderMetrics has ClassNotFoundException in system tests
> 
>
> Key: KAFKA-15350
> URL: https://issues.apache.org/jira/browse/KAFKA-15350
> Project: Kafka
>  Issue Type: Bug
>  Components: kraft, system tests
>Affects Versions: 3.6.0
>Reporter: Greg Harris
>Priority: Blocker
>
> The system tests appear to be failing on trunk with:
> {noformat}
> [2023-08-15 22:11:34,235] ERROR Exiting Kafka due to fatal exception 
> (kafka.Kafka$)
> java.lang.NoClassDefFoundError: 
> org/apache/kafka/image/loader/MetadataLoaderMetrics
>at kafka.server.KafkaRaftServer.<init>(KafkaRaftServer.scala:68)
>at kafka.Kafka$.buildServer(Kafka.scala:83)
>at kafka.Kafka$.main(Kafka.scala:91)
>at kafka.Kafka.main(Kafka.scala)
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.kafka.image.loader.MetadataLoaderMetrics
>at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
>at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
>at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
>at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
>... 4 more{noformat}
> This happens with the `tests/kafkatest/tests/connect/`, 
> `tests/kafkatest/tests/core/kraft_upgrade_test.py` and possibly others.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-15350) MetadataLoaderMetrics has ClassNotFoundException in system tests

2023-08-15 Thread Greg Harris (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Harris updated KAFKA-15350:

Issue Type: Bug  (was: Improvement)

> MetadataLoaderMetrics has ClassNotFoundException in system tests
> 
>
> Key: KAFKA-15350
> URL: https://issues.apache.org/jira/browse/KAFKA-15350
> Project: Kafka
>  Issue Type: Bug
>  Components: kraft, system tests
>Affects Versions: 3.6.0
>Reporter: Greg Harris
>Priority: Blocker
>
> The system tests appear to be failing on trunk with:
> {noformat}
> [2023-08-15 22:11:34,235] ERROR Exiting Kafka due to fatal exception 
> (kafka.Kafka$)
> java.lang.NoClassDefFoundError: 
> org/apache/kafka/image/loader/MetadataLoaderMetrics
>at kafka.server.KafkaRaftServer.<init>(KafkaRaftServer.scala:68)
>at kafka.Kafka$.buildServer(Kafka.scala:83)
>at kafka.Kafka$.main(Kafka.scala:91)
>at kafka.Kafka.main(Kafka.scala)
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.kafka.image.loader.MetadataLoaderMetrics
>at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
>at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
>at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
>at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
>... 4 more{noformat}
> This happens with the `tests/kafkatest/tests/connect/`, 
> `tests/kafkatest/tests/core/kraft_upgrade_test.py` and possibly others.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-15336) Connect plugin Javadocs should mention serviceloader manifests

2023-08-15 Thread Greg Harris (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Harris updated KAFKA-15336:

Issue Type: Improvement  (was: Bug)

> Connect plugin Javadocs should mention serviceloader manifests
> --
>
> Key: KAFKA-15336
> URL: https://issues.apache.org/jira/browse/KAFKA-15336
> Project: Kafka
>  Issue Type: Improvement
>  Components: KafkaConnect
>Affects Versions: 3.6.0
>Reporter: Greg Harris
>Assignee: Greg Harris
>Priority: Minor
> Fix For: 3.6.0
>
>
> Similar to the ConfigProvider, the Javadocs for the Connect plugin classes 
> should mention that plugin implementations should have ServiceLoader 
> manifests.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Updated] (KAFKA-15336) Connect plugin Javadocs should mention serviceloader manifests

2023-08-15 Thread Greg Harris (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-15336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Harris updated KAFKA-15336:

Issue Type: Bug  (was: Improvement)

> Connect plugin Javadocs should mention serviceloader manifests
> --
>
> Key: KAFKA-15336
> URL: https://issues.apache.org/jira/browse/KAFKA-15336
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Affects Versions: 3.6.0
>Reporter: Greg Harris
>Assignee: Greg Harris
>Priority: Minor
> Fix For: 3.6.0
>
>
> Similar to the ConfigProvider, the Javadocs for the Connect plugin classes 
> should mention that plugin implementations should have ServiceLoader 
> manifests.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15350) MetadataLoaderMetrics has ClassNotFoundException in system tests

2023-08-15 Thread Greg Harris (Jira)
Greg Harris created KAFKA-15350:
---

 Summary: MetadataLoaderMetrics has ClassNotFoundException in 
system tests
 Key: KAFKA-15350
 URL: https://issues.apache.org/jira/browse/KAFKA-15350
 Project: Kafka
  Issue Type: Improvement
  Components: kraft, system tests
Affects Versions: 3.6.0
Reporter: Greg Harris


The system tests appear to be failing on trunk with:
{noformat}
[2023-08-15 22:11:34,235] ERROR Exiting Kafka due to fatal exception 
(kafka.Kafka$)
java.lang.NoClassDefFoundError: 
org/apache/kafka/image/loader/MetadataLoaderMetrics
   at kafka.server.KafkaRaftServer.<init>(KafkaRaftServer.scala:68)
   at kafka.Kafka$.buildServer(Kafka.scala:83)
   at kafka.Kafka$.main(Kafka.scala:91)
   at kafka.Kafka.main(Kafka.scala)
Caused by: java.lang.ClassNotFoundException: 
org.apache.kafka.image.loader.MetadataLoaderMetrics
   at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:352)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
   ... 4 more{noformat}
This happens with the `tests/kafkatest/tests/connect/`, 
`tests/kafkatest/tests/core/kraft_upgrade_test.py` and possibly others.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15349) ducker-ak should fail fast when gradlew systemTestLibs fails

2023-08-15 Thread Greg Harris (Jira)
Greg Harris created KAFKA-15349:
---

 Summary: ducker-ak should fail fast when gradlew systemTestLibs 
fails
 Key: KAFKA-15349
 URL: https://issues.apache.org/jira/browse/KAFKA-15349
 Project: Kafka
  Issue Type: Improvement
  Components: system tests
Reporter: Greg Harris


If you introduce a flaw into the gradle build which causes the systemTestLibs 
to fail, such as a circular dependency, then the ducker_test function continues 
to run tests which are invalid.

Rather than proceeding to run the tests, the script should fail fast and make 
the user address the error before continuing.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Commented] (KAFKA-15343) Fix MirrorConnectIntegrationTests causing ci build failures.

2023-08-15 Thread Greg Harris (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-15343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17754771#comment-17754771
 ] 

Greg Harris commented on KAFKA-15343:
-

Hi [~prasanth] and thank you for reporting this issue! It is certainly not good 
that one test can cause the whole build to fail, preventing other tests from 
running.

Can you speak to the frequency that you've seen this failure? Naively I would 
expect that with >1 ephemeral ports available, such a failure would be 
quite rare.

If this is true, I don't think it is appropriate to disable these tests. They 
are extremely important test coverage for the MirrorMaker2 feature, and 
disabling them may lead to undetected regressions.

As far as resolving this issue, I think we should:

1. Find where we are leaking Kafka clients in the MM2 integration test suites, 
either within the framework or within the Mirror connectors.

2. Close Kafka clients in a timely fashion (some relevant work in 
https://issues.apache.org/jira/browse/KAFKA-14725 and 
https://issues.apache.org/jira/browse/KAFKA-15090 )

3. Try to reproduce the Gradle daemon crash in a more controlled environment

4. Report the daemon crash to the Gradle upstream



Since random port selection and port-reuse are standard procedures (not 
specific to Kafka) there could be downstream projects using Gradle that are 
affected. If there is something specific about the Kafka clients' connections 
that affect gradle, then we should investigate further to help the Gradle 
project resolve the issue.

> Fix MirrorConnectIntegrationTests causing ci build failures.
> 
>
> Key: KAFKA-15343
> URL: https://issues.apache.org/jira/browse/KAFKA-15343
> Project: Kafka
>  Issue Type: Bug
>  Components: build
>Affects Versions: 3.6.0
>Reporter: Prasanth Kumar
>Priority: Major
>
> There are several instances of tests interacting badly with gradle daemon(s) 
> running on ports that the kafka broker previously used. After going through 
> the debug logs we observed a few retrying kafka clients trying to connect to 
> broker which got shutdown and the gradle worker chose the same port on which 
> broker was running. Later in the build, the gradle daemon attempted to 
> connect to the worker and could not, triggering a failure. Ideally gradle 
> would not exit when connected to from an invalid client - in testing with 
> netcat, it would often handle these without dying. However there appear to be 
> some cases where the daemon dies completely. Both the broker code and the 
> gradle workers bind to port 0, resulting in the OS assigning it an unused 
> port. This does avoid conflicts, but does not ensure that long lived clients 
> do not attempt to connect to these ports afterwards. It's possible that 
> closing the client in between may be enough to work around this issue. Until 
> then, we will disable the test so that it does not block CI while code changes 
> are being tested.
> *MirrorConnectorsIntegrationBaseTest and extending Tests*
> {code:java}
> [2023-07-04T11:48:16.128Z] 2023-07-04T11:47:46.804+ [DEBUG] 
> [TestEventLogger] 
> MirrorConnectorsWithCustomForwardingAdminIntegrationTest > 
> testReplicateSourceDefault() STANDARD_OUT
> [2023-07-04T11:48:16.128Z] 2023-07-04T11:47:46.804+ [DEBUG] 
> [TestEventLogger] [2023-07-04 11:47:46,799]
>  INFO primary REST service: http://localhost:43809/connectors 
> (org.apache.kafka.connect.mirror.integration.MirrorConnectorsIntegrationBaseTest:224)
> [2023-07-04T11:48:16.128Z] 2023-07-04T11:47:46.804+ [DEBUG] 
> [TestEventLogger] [2023-07-04 11:47:46,799] 
> INFO backup REST service: http://localhost:43323/connectors 
> (org.apache.kafka.connect.mirror.integration.MirrorConnectorsIntegrationBaseTest:225)
> [2023-07-04T11:48:16.128Z] 2023-07-04T11:47:46.804+ [DEBUG] 
> [TestEventLogger] [2023-07-04 11:47:46,799] 
> INFO primary brokers: localhost:37557 
> (org.apache.kafka.connect.mirror.integration.MirrorConnectorsIntegrationBaseTest:226)
> [2023-07-04T11:59:12.968Z] 2023-07-04T11:59:12.900+ [DEBUG] 
> [org.gradle.internal.remote.internal.inet.TcpIncomingConnector] 
> Accepted connection from /127.0.0.1:47660 to /127.0.0.1:37557.
> [2023-07-04T11:59:13.233Z] 
> org.gradle.internal.remote.internal.MessageIOException: Could not read 
> message from '/127.0.0.1:47660'.
> [2023-07-04T11:59:12.970Z] 2023-07-04T11:59:12.579+ [DEBUG] 
> [org.gradle.internal.remote.internal.inet.TcpIncomingConnector] Listening on 
> [d6bf30cb-bca2-46d9-8aeb-b9fd0497f54d port:37557, 
> addresses:[localhost/127.0.0.1]].
> [2023-07-04T11:59:46.519Z] 2023-07-04T11:59:13.014+ [ERROR] 
> [system.err] org.gradle.internal.remote.internal.ConnectException: Could not 
> connect to server [d6bf30cb-bca2-46d9-8aeb-b9fd0497f54d port:37557, 
> 

[jira] [Created] (KAFKA-15336) Connect plugin Javadocs should mention serviceloader manifests

2023-08-11 Thread Greg Harris (Jira)
Greg Harris created KAFKA-15336:
---

 Summary: Connect plugin Javadocs should mention serviceloader 
manifests
 Key: KAFKA-15336
 URL: https://issues.apache.org/jira/browse/KAFKA-15336
 Project: Kafka
  Issue Type: Improvement
  Components: KafkaConnect
Affects Versions: 3.6.0
Reporter: Greg Harris
Assignee: Greg Harris


Similar to the ConfigProvider, the Javadocs for the Connect plugin classes 
should mention that plugin implementations should have ServiceLoader manifests.
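
For illustration, a ServiceLoader manifest is just a provider-configuration file under META-INF/services, named after the plugin's service interface and listing the implementing classes. A minimal sketch, assuming a hypothetical connector class (the exact set of service interfaces should be taken from the KIP-898 documentation):
{noformat}
# src/main/resources/META-INF/services/org.apache.kafka.connect.source.SourceConnector
com.example.MySourceConnector
{noformat}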



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

