[PR] Fixing flaky Unit tests in GroupMetadataManagerTest [kafka]
vamossagar12 opened a new pull request, #15100: URL: https://github.com/apache/kafka/pull/15100 Noticed 2 cases of Flaky unit tests. One in this build: https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-15080/1/tests/ `testStaticMemberGetsBackAssignmentUponRejoin` and this link: https://github.com/apache/kafka/pull/14988#issuecomment-1853155239 which mentions `testNoGroupEpochBumpWhenStaticMemberTemporarilyLeaves` as flaky. The main reason for the flakiness is that the order of topic-partitions in the assigned partitions is inverted. This PR aims to correct the 2. I ran all the 182 tests in `GroupMetadataManagerTest` 30 times via a script and all the tests passed on all occasions. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] (KAFKA-16063) Fix leaked ApplicationShutdownHooks in EndToEndAuthorizationTests
[ https://issues.apache.org/jira/browse/KAFKA-16063 ] Arpit Goyal deleted comment on KAFKA-16063: - was (Author: JIRAUSER301926): [~divijvaidya] It seems intellij profiler is available in the ultimate edition and not the community edition. > Fix leaked ApplicationShutdownHooks in EndToEndAuthorizationTests > - > > Key: KAFKA-16063 > URL: https://issues.apache.org/jira/browse/KAFKA-16063 > Project: Kafka > Issue Type: Sub-task >Reporter: Divij Vaidya >Assignee: Arpit Goyal >Priority: Major > Attachments: Screenshot 2023-12-29 at 12.38.29.png > > > All test extending `EndToEndAuthorizationTest` are leaking > DefaultDirectoryService objects. > This can be observed using the heap dump at > [https://www.dropbox.com/scl/fi/4jaq8rowkmijaoj7ec1nm/GradleWorkerMain_10311_27_12_2023_13_37_08_Leak_Suspects.zip?rlkey=minkbvopb0c65m5wryqw234xb=0] > (unzip this and you will find a hprof which can be opened with your > favourite heap analyzer) > The stack trace looks like this: > !Screenshot 2023-12-29 at 12.38.29.png! > > I suspect that the reason is because DefaultDirectoryService#startup() > registers a shutdownhook which is somehow messed up by > QuorumTestHarness#teardown(). > We need to investigate why this is leaking and fix it. -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] KAFKA-16052: Create a real dummy replicaManager instance to save heap memory [kafka]
showuon commented on PR #15094: URL: https://github.com/apache/kafka/pull/15094#issuecomment-1872473260 > I noticed this build had more failures than usual. Not sure if its because we actually got to run all the tests without OOMing or if something new has come up. Had a quick check, they look unrelated. Just re-triggering it just in case. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] KAFKA-16052: Create a real dummy replicaManager instance to save heap memory [kafka]
showuon commented on code in PR #15094: URL: https://github.com/apache/kafka/pull/15094#discussion_r1438489270 ## core/src/test/scala/unit/kafka/coordinator/AbstractCoordinatorConcurrencyTest.scala: ## @@ -148,25 +156,34 @@ abstract class AbstractCoordinatorConcurrencyTest[M <: CoordinatorMember] { } object AbstractCoordinatorConcurrencyTest { - trait Action extends Runnable { def await(): Unit } trait CoordinatorMember { } - class TestReplicaManager extends ReplicaManager( -null, null, null, null, null, null, null, null, null, null, null, null, null, null, null, None, null) { + class TestReplicaManager(config: KafkaConfig, + mtime: Time, + scheduler: Scheduler, + logManager: LogManager, + quotaManagers: QuotaManagers, + val watchKeys: mutable.Set[TopicPartitionOperationKey], + val producePurgatory: DelayedOperationPurgatory[DelayedProduce]) +extends ReplicaManager( + config, + metrics = null, + mtime, + scheduler, + logManager, + None, + quotaManagers, + null, + null, + null, + delayedProducePurgatoryParam = Some(producePurgatory)) { Review Comment: I found while running the profiler that after we creating the real replicaManager, we'll also create `ExpirationReaper` threads for Fetch, RemoteFetch, ... etc. Although we will close them after the tests, we can actually avoid creating them beforehand like what we did in replicaManagerTest. (ref: https://github.com/apache/kafka/pull/15077/files#diff-5ace24a01ad7792f3c168abe80e4c23674d6ce8295f5a54a94933c4e4d75e7a5R2893) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (KAFKA-16071) NPE in testDescribeUnderReplicatedPartitionsWhenReassignmentIsInProgress
Luke Chen created KAFKA-16071: - Summary: NPE in testDescribeUnderReplicatedPartitionsWhenReassignmentIsInProgress Key: KAFKA-16071 URL: https://issues.apache.org/jira/browse/KAFKA-16071 Project: Kafka Issue Type: Test Reporter: Luke Chen Found in the CI build result. h3. Error Message java.lang.NullPointerException h3. Stacktrace java.lang.NullPointerException at org.apache.kafka.tools.TopicCommandIntegrationTest.testDescribeUnderReplicatedPartitionsWhenReassignmentIsInProgress(TopicCommandIntegrationTest.java:800) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.platform.commons.util.ReflectionUtils.invokeMethod(ReflectionUtils.java:728) at org.junit.jupiter.engine.execution.MethodInvocation.proceed(MethodInvocation.java:60) at org.junit.jupiter.engine.execution.InvocationInterceptorChain$ValidatingInvocation.proceed(InvocationInterceptorChain.java:131) https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-15095/1/testReport/junit/org.apache.kafka.tools/TopicCommandIntegrationTest/Build___JDK_8_and_Scala_2_12___testDescribeUnderReplicatedPartitionsWhenReassignmentIsInProgress_String__zk/ -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] MINOR: Close MockitoStatic in try-with-resource [kafka]
showuon merged PR #15095: URL: https://github.com/apache/kafka/pull/15095 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] MINOR: Close MockitoStatic in try-with-resource [kafka]
showuon commented on PR #15095: URL: https://github.com/apache/kafka/pull/15095#issuecomment-1872469104 Failed tests are unrelated. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (KAFKA-16063) Fix leaked ApplicationShutdownHooks in EndToEndAuthorizationTests
[ https://issues.apache.org/jira/browse/KAFKA-16063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801295#comment-17801295 ] Arpit Goyal commented on KAFKA-16063: - [~divijvaidya] It seems intellij profiler is available in the ultimate edition and not the community edition. > Fix leaked ApplicationShutdownHooks in EndToEndAuthorizationTests > - > > Key: KAFKA-16063 > URL: https://issues.apache.org/jira/browse/KAFKA-16063 > Project: Kafka > Issue Type: Sub-task >Reporter: Divij Vaidya >Assignee: Arpit Goyal >Priority: Major > Attachments: Screenshot 2023-12-29 at 12.38.29.png > > > All test extending `EndToEndAuthorizationTest` are leaking > DefaultDirectoryService objects. > This can be observed using the heap dump at > [https://www.dropbox.com/scl/fi/4jaq8rowkmijaoj7ec1nm/GradleWorkerMain_10311_27_12_2023_13_37_08_Leak_Suspects.zip?rlkey=minkbvopb0c65m5wryqw234xb=0] > (unzip this and you will find a hprof which can be opened with your > favourite heap analyzer) > The stack trace looks like this: > !Screenshot 2023-12-29 at 12.38.29.png! > > I suspect that the reason is because DefaultDirectoryService#startup() > registers a shutdownhook which is somehow messed up by > QuorumTestHarness#teardown(). > We need to investigate why this is leaking and fix it. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16070) Extract the setReadOnly method into Headers
Lighting Sui created KAFKA-16070: Summary: Extract the setReadOnly method into Headers Key: KAFKA-16070 URL: https://issues.apache.org/jira/browse/KAFKA-16070 Project: Kafka Issue Type: Improvement Components: clients Reporter: Lighting Sui Abstract the setReadOnly function into the Headers Interface and provide a default implementation. The setReadOnly function in Headers can be rewritten by RecordHeaders, so that the setReadOnly function in KafkaProducer can be removed and the code can be cleaned up. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Created] (KAFKA-16069) Source Tasks re-transform records after Retriable exceptions
Greg Harris created KAFKA-16069: --- Summary: Source Tasks re-transform records after Retriable exceptions Key: KAFKA-16069 URL: https://issues.apache.org/jira/browse/KAFKA-16069 Project: Kafka Issue Type: Bug Components: KafkaConnect Reporter: Greg Harris In the SinkTask, records which fail to be delivered to the task#put with a Retriable exception are re-delivered on the next iteration. The SourceTask follows a similar pattern, where records which fail to be delivered to Producer#send with a Retriable exception are retried. However, the SinkTask accumulates the post-transform records, and does not recompute the transformations over again after a retriable exception. The SourceTask does not accumulate ProducerRecords, and instead recomputes the transformations and converters starting from the pre-transformation AbstractWorkerSourceTask#toSend list. This means that stateful transformations and converters may see the records rewind, without any indication that the records are the same. For stateless transformations and converters, this means that redundant computation is performed that may be better allocated to other tasks. -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] KAFKA-16052: Create a real dummy replicaManager instance to save heap memory [kafka]
jolshan commented on PR #15094: URL: https://github.com/apache/kafka/pull/15094#issuecomment-1872386326 Thanks Divij. Were we able to get the heap usage for this branch too? (Is it the one in the ticket?) Just wanted to confirm we are still seeing the drop in heap usage. I noticed this build had more failures than usual. Not sure if its because we actually got to run all the tests without OOMing or if something new has come up. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (KAFKA-16066) Upgrade apacheds to 2.0.0.AM27
[ https://issues.apache.org/jira/browse/KAFKA-16066?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801246#comment-17801246 ] Anish Lukkireddy commented on KAFKA-16066: -- can this be assigned to me > Upgrade apacheds to 2.0.0.AM27 > -- > > Key: KAFKA-16066 > URL: https://issues.apache.org/jira/browse/KAFKA-16066 > Project: Kafka > Issue Type: Improvement >Reporter: Divij Vaidya >Priority: Major > Labels: newbie, newbie++ > > We are currently using a very old dependency. Notably, apacheds is only used > for testing when we use MiniKdc, hence, there is nothing stopping us from > upgrading it. > Notably, apacheds has removed the component > org.apache.directory.server:apacheds-protocol-kerberos in favour of using > Apache Kerby, hence, we need to make changes in MiniKdc.scala for this > upgrade to work correctly. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[PR] WIP: add parallelism to Jenkins tests [kafka]
divijvaidya opened a new pull request, #15099: URL: https://github.com/apache/kafka/pull/15099 *More detailed description of your change, if necessary. The PR title and PR message become the squashed commit message, so use a separate comment to ping reviewers.* *Summary of testing strategy (including rationale) for the feature or bug fix. Unit and/or integration tests are expected for any behaviour change and system tests should be considered for larger changes.* ### Committer Checklist (excluded from commit message) - [ ] Verify design and implementation - [ ] Verify test coverage and CI build status - [ ] Verify documentation (including upgrade notes) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (KAFKA-16068) Use TestPlugins in ConnectorValidationIntegrationTest to silence plugin scanning errors
Greg Harris created KAFKA-16068: --- Summary: Use TestPlugins in ConnectorValidationIntegrationTest to silence plugin scanning errors Key: KAFKA-16068 URL: https://issues.apache.org/jira/browse/KAFKA-16068 Project: Kafka Issue Type: Task Components: KafkaConnect Reporter: Greg Harris The ConnectorValidationIntegrationTest creates test plugins, some with erroneous behavior. In particular: {noformat} [2023-12-29 10:28:06,548] ERROR Failed to discover Converter in classpath: Unable to instantiate TestConverterWithPrivateConstructor: Plugin class default constructor must be public (org.apache.kafka.connect.runtime.isolation.ReflectionScanner:138) [2023-12-29 10:28:06,550] ERROR Failed to discover Converter in classpath: Unable to instantiate TestConverterWithConstructorThatThrowsException: Failed to invoke plugin constructor (org.apache.kafka.connect.runtime.isolation.ReflectionScanner:138) java.lang.reflect.InvocationTargetException{noformat} These plugins should be eliminated from the classpath, so that the errors do not appear in unrelated tests. Instead, plugins with erroneous behavior should only be present in the TestPlugins, so that tests can opt-in to loading them. There are already plugins with private constructors and throwing-exceptions-constructors, so they should be able to be re-used. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-16068) Use TestPlugins in ConnectorValidationIntegrationTest to silence plugin scanning errors
[ https://issues.apache.org/jira/browse/KAFKA-16068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Greg Harris updated KAFKA-16068: Description: The ConnectorValidationIntegrationTest creates test plugins, some with erroneous behavior. In particular: {noformat} [2023-12-29 10:28:06,548] ERROR Failed to discover Converter in classpath: Unable to instantiate TestConverterWithPrivateConstructor: Plugin class default constructor must be public (org.apache.kafka.connect.runtime.isolation.ReflectionScanner:138) [2023-12-29 10:28:06,550] ERROR Failed to discover Converter in classpath: Unable to instantiate TestConverterWithConstructorThatThrowsException: Failed to invoke plugin constructor (org.apache.kafka.connect.runtime.isolation.ReflectionScanner:138) java.lang.reflect.InvocationTargetException{noformat} These plugins should be eliminated from the classpath, so that the errors do not appear in unrelated tests. Instead, plugins with erroneous behavior should only be present in the TestPlugins, so that tests can opt-in to loading them. There are already plugins with private constructors and throwing-exceptions-constructors, so they should be able to be re-used. was: The ConnectorValidationIntegrationTest creates test plugins, some with erroneous behavior. In particular: {noformat} [2023-12-29 10:28:06,548] ERROR Failed to discover Converter in classpath: Unable to instantiate TestConverterWithPrivateConstructor: Plugin class default constructor must be public (org.apache.kafka.connect.runtime.isolation.ReflectionScanner:138) [2023-12-29 10:28:06,550] ERROR Failed to discover Converter in classpath: Unable to instantiate TestConverterWithConstructorThatThrowsException: Failed to invoke plugin constructor (org.apache.kafka.connect.runtime.isolation.ReflectionScanner:138) java.lang.reflect.InvocationTargetException{noformat} These plugins should be eliminated from the classpath, so that the errors do not appear in unrelated tests. Instead, plugins with erroneous behavior should only be present in the TestPlugins, so that tests can opt-in to loading them. There are already plugins with private constructors and throwing-exceptions-constructors, so they should be able to be re-used. > Use TestPlugins in ConnectorValidationIntegrationTest to silence plugin > scanning errors > --- > > Key: KAFKA-16068 > URL: https://issues.apache.org/jira/browse/KAFKA-16068 > Project: Kafka > Issue Type: Task > Components: KafkaConnect >Reporter: Greg Harris >Priority: Minor > > The ConnectorValidationIntegrationTest creates test plugins, some with > erroneous behavior. In particular: > > {noformat} > [2023-12-29 10:28:06,548] ERROR Failed to discover Converter in classpath: > Unable to instantiate TestConverterWithPrivateConstructor: Plugin class > default constructor must be public > (org.apache.kafka.connect.runtime.isolation.ReflectionScanner:138) > [2023-12-29 10:28:06,550] > ERROR Failed to discover Converter in classpath: Unable to instantiate > TestConverterWithConstructorThatThrowsException: Failed to invoke plugin > constructor (org.apache.kafka.connect.runtime.isolation.ReflectionScanner:138) > java.lang.reflect.InvocationTargetException{noformat} > These plugins should be eliminated from the classpath, so that the errors do > not appear in unrelated tests. Instead, plugins with erroneous behavior > should only be present in the TestPlugins, so that tests can opt-in to > loading them. > There are already plugins with private constructors and > throwing-exceptions-constructors, so they should be able to be re-used. > -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] KAFKA-16047: Leverage the fenceProducers timeout in the InitProducerId [kafka]
gharris1727 commented on code in PR #15078: URL: https://github.com/apache/kafka/pull/15078#discussion_r1438351308 ## clients/src/test/java/org/apache/kafka/clients/admin/KafkaAdminClientTest.java: ## @@ -7031,6 +7031,7 @@ public void testFenceProducers() throws Exception { try (AdminClientUnitTestEnv env = mockClientEnv()) { String transactionalId = "copyCat"; Node transactionCoordinator = env.cluster().nodes().iterator().next(); +final FenceProducersOptions options = new FenceProducersOptions().timeoutMs(1); Review Comment: This doesn't test the null-default case which is going to be more common, and doesn't seem necessary for the correctness of the test either. I think instead, we should change the `request -> request instanceof InitProducerIdRequest` to assert that the increased timeout is used, instead of the 1ms timeout. And maybe refactor/parameterize the test to try a default-options and options with an explicit timeout, if we want to test both branches. ## clients/src/main/java/org/apache/kafka/clients/admin/KafkaAdminClient.java: ## @@ -4394,7 +4394,10 @@ public ListTransactionsResult listTransactions(ListTransactionsOptions options) public FenceProducersResult fenceProducers(Collection transactionalIds, FenceProducersOptions options) { AdminApiFuture.SimpleAdminApiFuture future = FenceProducersHandler.newFuture(transactionalIds); -FenceProducersHandler handler = new FenceProducersHandler(logContext); +if (options.timeoutMs() == null) { +options.timeoutMs(defaultApiTimeoutMs); Review Comment: I think it's best if we don't mutate the options if the user passed it in. If someone were to log the options somewhere else, they would see our injected default instead of the fact that no value was provided. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] KAFKA-16064: Improve ControllerApiTest [kafka]
wernerdv commented on PR #15091: URL: https://github.com/apache/kafka/pull/15091#issuecomment-1872240851 The tests passed https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-15091/1/testReport/kafka.server/ControllerApisTest/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (KAFKA-16062) Upgrade mockito to 5.8.0
[ https://issues.apache.org/jira/browse/KAFKA-16062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Divij Vaidya updated KAFKA-16062: - Fix Version/s: 3.7.0 > Upgrade mockito to 5.8.0 > > > Key: KAFKA-16062 > URL: https://issues.apache.org/jira/browse/KAFKA-16062 > Project: Kafka > Issue Type: Sub-task >Reporter: Divij Vaidya >Assignee: Divij Vaidya >Priority: Major > Fix For: 3.7.0, 3.8.0 > > > Upgrading to use the latest version of mockito. Updated from 5.5.0 to 5.8.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-16065) Fix leak in DelayedOperationTest
[ https://issues.apache.org/jira/browse/KAFKA-16065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Divij Vaidya updated KAFKA-16065: - Fix Version/s: 3.6.2 > Fix leak in DelayedOperationTest > > > Key: KAFKA-16065 > URL: https://issues.apache.org/jira/browse/KAFKA-16065 > Project: Kafka > Issue Type: Sub-task >Reporter: Luke Chen >Assignee: Luke Chen >Priority: Major > Fix For: 3.7.0, 3.6.2, 3.8.0 > > > Fix leak in DelayedOperationTest. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-16065) Fix leak in DelayedOperationTest
[ https://issues.apache.org/jira/browse/KAFKA-16065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Divij Vaidya updated KAFKA-16065: - Fix Version/s: 3.7.0 3.8.0 > Fix leak in DelayedOperationTest > > > Key: KAFKA-16065 > URL: https://issues.apache.org/jira/browse/KAFKA-16065 > Project: Kafka > Issue Type: Sub-task >Reporter: Luke Chen >Assignee: Luke Chen >Priority: Major > Fix For: 3.7.0, 3.8.0 > > > Fix leak in DelayedOperationTest. -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] KAFKA-16065: close DelayedFuturePurgatory [kafka]
divijvaidya commented on PR #15090: URL: https://github.com/apache/kafka/pull/15090#issuecomment-1872236638 backported to 3.6 and 3.7 branches -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] KAFKA-16065: close DelayedFuturePurgatory [kafka]
divijvaidya merged PR #15090: URL: https://github.com/apache/kafka/pull/15090 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] KAFKA-16065: close DelayedFuturePurgatory [kafka]
divijvaidya commented on PR #15090: URL: https://github.com/apache/kafka/pull/15090#issuecomment-1872233650 The test modified in this PR is successful https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-15090/1/testReport/kafka.server/DelayedOperationTest/ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (KAFKA-16025) Streams StateDirectory has orphaned locks after rebalancing, blocking future rebalancing
[ https://issues.apache.org/jira/browse/KAFKA-16025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sabit updated KAFKA-16025: -- Attachment: (was: Screenshot 1702750558363.png) > Streams StateDirectory has orphaned locks after rebalancing, blocking future > rebalancing > > > Key: KAFKA-16025 > URL: https://issues.apache.org/jira/browse/KAFKA-16025 > Project: Kafka > Issue Type: Bug > Components: streams >Affects Versions: 3.4.0 > Environment: Linux >Reporter: Sabit >Priority: Major > > Hello, > > We are encountering an issue where during rebalancing, we see streams threads > on one client get stuck in rebalancing. Upon enabling debug logs, we saw that > some tasks were having issues initializing due to failure to grab a lock in > the StateDirectory: > > {{2023-12-14 22:51:57.352000Z stream-thread > [i-0f1a5e7a42158e04b-StreamThread-14] Could not initialize task 0_51 since: > stream-thread [i-0f1a5e7a42158e04b-StreamThread-14] standby-task [0_51] > Failed to lock the state directory for task 0_51; will retry}} > > We were able to reproduce this behavior reliably on 3.4.0. This is the > sequence that triggers the bug. > Assume in a streams consumer group, there are 5 instances (A, B, C, D, E), > each with 5 threads (1-5), and the consumer is using stateful tasks which > have state stores on disk. There are 10 active tasks and 10 standby tasks. > # Instance A is deactivated > # As an example, lets say task 0_1, previously on instance B, moves to > instance C > # Task 0_1 leaves behind it's state directory on Instance B's disk, > currently unused, and no lock for it exists in Instance B's StateDirectory > in-memory lock tracker > # Instance A is re-activated > # Streams thread 1 on Instance B is asked to re-join the consumer group due > to a new member being added > # As part of re-joining, thread 1 lists non-empty state directories in order > to report the offset's it has in it's state stores as part of it's metadata. > Thread 1 sees that the directory for 0_1 is not empty. > # The cleanup thread on instance B runs. The cleanup thread locks state > store 0_1, sees the directory for 0_1 was last modified more than > `state.cleanup.delay.ms` ago, deletes it, and unlocks it successfully > # Thread 1 takes a lock on directory 0_1 due to it being found not-empty > before, unaware that the cleanup has run between the time of the check and > the lock. It tracks this lock in it's own in-memory store, in addition to > StateDirectory's in-memory lock store > # Thread 1 successfully joins the consumer group > # After every consumer in the group joins the group, assignments are > calculated, and then every consumer calls sync group to receive the new > assignments > # Thread 1 on Instance B calls sync group but gets an error - the group > coordinator has triggered a new rebalance and all members must rejoin the > group > # Thread 1 again lists non-empty state directories in order to report the > offset's it has in it's state stores as part of it's metadata. Prior to doing > so, it clears it's in-memory store tracking the locks it has taken for the > purpose of gathering rebalance metadata > # Thread 1 no longer takes a lock on 0_1 as it is empty > # However, that lock on 0_1 owned by Thread 1 remains in StateDirectory > # All consumers re-join and sync successfully, receiving their new > assignments > # Thread 2 on Instance B is assigned task 0_1 > # Thread 2 cannot take a lock on 0_1 in the StateDirectory because it is > still being held by Thread 1 > # Thread 2 remains in rebalancing state, and cannot make progress on task > 0_1, or any other tasks it has assigned. -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] MINOR: fix typo. [kafka]
divijvaidya merged PR #15098: URL: https://github.com/apache/kafka/pull/15098 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Updated] (KAFKA-16062) Upgrade mockito to 5.8.0
[ https://issues.apache.org/jira/browse/KAFKA-16062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Divij Vaidya updated KAFKA-16062: - Fix Version/s: 3.8.0 > Upgrade mockito to 5.8.0 > > > Key: KAFKA-16062 > URL: https://issues.apache.org/jira/browse/KAFKA-16062 > Project: Kafka > Issue Type: Sub-task >Reporter: Divij Vaidya >Assignee: Divij Vaidya >Priority: Major > Fix For: 3.8.0 > > > Upgrading to use the latest version of mockito. Updated from 5.5.0 to 5.8.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] KAFKA-16062: Upgrade mockito to 5.8.0 [kafka]
divijvaidya merged PR #15089: URL: https://github.com/apache/kafka/pull/15089 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] KAFKA-16062: Upgrade mockito to 5.8.0 [kafka]
divijvaidya commented on PR #15089: URL: https://github.com/apache/kafka/pull/15089#issuecomment-1872181451 Unrelated test failures. ``` [Build / JDK 11 and Scala 2.13 / org.apache.kafka.connect.mirror.integration.DedicatedMirrorIntegrationTest.testMultiNodeCluster()](https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-15089/1/testReport/junit/org.apache.kafka.connect.mirror.integration/DedicatedMirrorIntegrationTest/Build___JDK_11_and_Scala_2_13___testMultiNodeCluster__/) [Build / JDK 11 and Scala 2.13 / org.apache.kafka.connect.mirror.integration.MirrorConnectorsIntegrationSSLTest.testReplicateFromLatest()](https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-15089/1/testReport/junit/org.apache.kafka.connect.mirror.integration/MirrorConnectorsIntegrationSSLTest/Build___JDK_11_and_Scala_2_13___testReplicateFromLatest__/) [Build / JDK 11 and Scala 2.13 / org.apache.kafka.streams.integration.ConsistencyVectorIntegrationTest.shouldHaveSamePositionBoundActiveAndStandBy](https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-15089/1/testReport/junit/org.apache.kafka.streams.integration/ConsistencyVectorIntegrationTest/Build___JDK_11_and_Scala_2_13___shouldHaveSamePositionBoundActiveAndStandBy/) [Build / JDK 11 and Scala 2.13 / org.apache.kafka.trogdor.coordinator.CoordinatorTest.testTaskRequestWithOldStartMsGetsUpdated()](https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-15089/1/testReport/junit/org.apache.kafka.trogdor.coordinator/CoordinatorTest/Build___JDK_11_and_Scala_2_13___testTaskRequestWithOldStartMsGetsUpdated__/) [Build / JDK 17 and Scala 2.13 / kafka.api.TransactionsBounceTest.testWithGroupMetadata()](https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-15089/1/testReport/junit/kafka.api/TransactionsBounceTest/Build___JDK_17_and_Scala_2_13___testWithGroupMetadata__/) [Build / JDK 17 and Scala 2.13 / org.apache.kafka.controller.QuorumControllerTest.testFenceMultipleBrokers()](https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-15089/1/testReport/junit/org.apache.kafka.controller/QuorumControllerTest/Build___JDK_17_and_Scala_2_13___testFenceMultipleBrokers__/) [Build / JDK 21 and Scala 2.13 / org.apache.kafka.connect.mirror.integration.IdentityReplicationIntegrationTest.testReplicateSourceDefault()](https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-15089/1/testReport/junit/org.apache.kafka.connect.mirror.integration/IdentityReplicationIntegrationTest/Build___JDK_21_and_Scala_2_13___testReplicateSourceDefault__/) [Build / JDK 21 and Scala 2.13 / kafka.api.DelegationTokenEndToEndAuthorizationWithOwnerTest.testNoConsumeWithoutDescribeAclViaSubscribe(String).quorum=kraft](https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-15089/1/testReport/junit/kafka.api/DelegationTokenEndToEndAuthorizationWithOwnerTest/Build___JDK_21_and_Scala_2_13___testNoConsumeWithoutDescribeAclViaSubscribe_String__quorum_kraft/) [Build / JDK 21 and Scala 2.13 / kafka.api.TransactionsBounceTest.testWithGroupMetadata()](https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-15089/1/testReport/junit/kafka.api/TransactionsBounceTest/Build___JDK_21_and_Scala_2_13___testWithGroupMetadata__/) [Build / JDK 21 and Scala 2.13 / kafka.api.TransactionsBounceTest.testWithGroupId()](https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-15089/1/testReport/junit/kafka.api/TransactionsBounceTest/Build___JDK_21_and_Scala_2_13___testWithGroupId__/) [Build / JDK 21 and Scala 2.13 / kafka.api.TransactionsBounceTest.testWithGroupMetadata()](https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-15089/1/testReport/junit/kafka.api/TransactionsBounceTest/Build___JDK_21_and_Scala_2_13___testWithGroupMetadata___2/) [Build / JDK 21 and Scala 2.13 / org.apache.kafka.tiered.storage.integration.TransactionsWithTieredStoreTest.testBumpTransactionalEpoch(String).quorum=kraft+kip848](https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-15089/1/testReport/junit/org.apache.kafka.tiered.storage.integration/TransactionsWithTieredStoreTest/Build___JDK_21_and_Scala_2_13___testBumpTransactionalEpoch_String__quorum_kraft_kip848/) [Build / JDK 8 and Scala 2.12 / org.apache.kafka.connect.mirror.integration.IdentityReplicationIntegrationTest.testReplicateSourceDefault()](https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-15089/1/testReport/junit/org.apache.kafka.connect.mirror.integration/IdentityReplicationIntegrationTest/Build___JDK_8_and_Scala_2_12___testReplicateSourceDefault__/) [Build / JDK 8 and Scala 2.12 / org.apache.kafka.connect.mirror.integration.MirrorConnectorsIntegrationExactlyOnceTest.testSyncTopicConfigs()](https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-15089/1/testReport/junit/org.apache.kafka.connect.mirror.integration/MirrorConnectorsIntegrationExactlyOnceTest/Build___JDK_8_and_Scala_2_12___testSyncTopicConfigs__/) [Build / JDK 8 and Scala 2.12 /
Re: [PR] fix typo. [kafka]
LiangliangSui commented on PR #15098: URL: https://github.com/apache/kafka/pull/15098#issuecomment-1872156471 Hi team, PTAL, thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[PR] fix typo. [kafka]
LiangliangSui opened a new pull request, #15098: URL: https://github.com/apache/kafka/pull/15098 fix typo. ### Committer Checklist (excluded from commit message) - [ ] Verify design and implementation - [ ] Verify test coverage and CI build status - [ ] Verify documentation (including upgrade notes) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] MINOR: move setReadOnly to Headers. [kafka]
LiangliangSui commented on PR #15097: URL: https://github.com/apache/kafka/pull/15097#issuecomment-1872153551 Hi team, PTAL, thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[PR] MINOR: move setReadOnly to Headers. [kafka]
LiangliangSui opened a new pull request, #15097: URL: https://github.com/apache/kafka/pull/15097 Promote the setReadOnly function to Headers instead of only in RecordHeaders. ### Committer Checklist (excluded from commit message) - [ ] Verify design and implementation - [ ] Verify test coverage and CI build status - [ ] Verify documentation (including upgrade notes) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Assigned] (KAFKA-9545) Flaky Test `RegexSourceIntegrationTest.testRegexMatchesTopicsAWhenDeleted`
[ https://issues.apache.org/jira/browse/KAFKA-9545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lucas Brutschy reassigned KAFKA-9545: - Assignee: Lucas Brutschy (was: Ashwin Pankaj) > Flaky Test `RegexSourceIntegrationTest.testRegexMatchesTopicsAWhenDeleted` > -- > > Key: KAFKA-9545 > URL: https://issues.apache.org/jira/browse/KAFKA-9545 > Project: Kafka > Issue Type: Bug > Components: streams >Reporter: Jason Gustafson >Assignee: Lucas Brutschy >Priority: Major > > https://builds.apache.org/job/kafka-pr-jdk11-scala2.13/4678/testReport/org.apache.kafka.streams.integration/RegexSourceIntegrationTest/testRegexMatchesTopicsAWhenDeleted/ > {code} > java.lang.AssertionError: Condition not met within timeout 15000. Stream > tasks not updated > at org.hamcrest.MatcherAssert.assertThat(MatcherAssert.java:26) > at > org.apache.kafka.test.TestUtils.lambda$waitForCondition$5(TestUtils.java:367) > at > org.apache.kafka.test.TestUtils.retryOnExceptionWithTimeout(TestUtils.java:415) > at > org.apache.kafka.test.TestUtils.retryOnExceptionWithTimeout(TestUtils.java:383) > at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:366) > at org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:337) > at > org.apache.kafka.streams.integration.RegexSourceIntegrationTest.testRegexMatchesTopicsAWhenDeleted(RegexSourceIntegrationTest.java:224) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:566) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[PR] KAFKA-9545: Fix IllegalStateException in updateLags [kafka]
lucasbru opened a new pull request, #15096: URL: https://github.com/apache/kafka/pull/15096 We attempt to update lags when in state PENDING_SHUTDOWN or REBALANCING. In these states, however, our representation of the assignment may not be up-to-date with the subscription object inside the consumer. This can cause a bug, in particular, when we subscribe to a set of topics via a regular expression and the underlying topic is deleted. The consumer subscription may reflect that topic deletion already, while our internal state still contains references to the deleted topic, because `onAssignment` has not yet been executed. Therefore, we will attempt to call `currentLag` on partitions that are not assigned to us any more inside the consumer, leading to an `IllegalStateException`. This bug causes flakiness of the test `RegexSourceIntegrationTest.testRegexMatchesTopicsAWhenDeleted`. ### Committer Checklist (excluded from commit message) - [ ] Verify design and implementation - [ ] Verify test coverage and CI build status - [ ] Verify documentation (including upgrade notes) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[PR] MINOR: Close MockitoStatic in try-with-resource [kafka]
divijvaidya opened a new pull request, #15095: URL: https://github.com/apache/kafka/pull/15095 Minor code cleanup to use try-with-resource so that there is no leak in case an exception is thrown before closing mocks. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] KAFKA-16052: Create a real dummy replicaManager instance to save heap memory [kafka]
divijvaidya commented on PR #15094: URL: https://github.com/apache/kafka/pull/15094#issuecomment-1872122325 @showuon @jolshan please review when you get a chance -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[PR] KAFKA-16052: Create a real dummy replicaManager instance to save heap memory [kafka]
divijvaidya opened a new pull request, #15094: URL: https://github.com/apache/kafka/pull/15094 For more detail, please check [KAFKA-16052](https://issues.apache.org/jira/browse/KAFKA-16052). The mockito will keep the invocation history in the test suite and cause the huge heap usage. Since the mock replicaManager is only used to bypass the replicaManager constructor without verifying/mocking anything, we can create a real dummy replicaManager to avoid the mockito invocation history in memory. Co-authored-by: Luke Chen -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (KAFKA-16059) Fix leak of ExpirationReaper-1-AlterAcls threads in :core:test
[ https://issues.apache.org/jira/browse/KAFKA-16059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801185#comment-17801185 ] Divij Vaidya commented on KAFKA-16059: -- I found that there are actually leaked threads in KafkaApisTest as well. Have started a PR associated with this Jira to fix that. > Fix leak of ExpirationReaper-1-AlterAcls threads in :core:test > -- > > Key: KAFKA-16059 > URL: https://issues.apache.org/jira/browse/KAFKA-16059 > Project: Kafka > Issue Type: Sub-task >Reporter: Divij Vaidya >Assignee: Divij Vaidya >Priority: Major > Attachments: Screenshot 2023-12-29 at 11.13.01.png > > > We are leaking hundreds of ExpirationReaper-1-AlterAcls threads in one of the > tests in :core:test > {code:java} > "ExpirationReaper-1-AlterAcls" prio=0 tid=0x0 nid=0x0 waiting on condition > java.lang.Thread.State: TIMED_WAITING > on > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@3688fc67 > at java.base@17.0.9/jdk.internal.misc.Unsafe.park(Native Method) > at > java.base@17.0.9/java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:252) > at > java.base@17.0.9/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1672) > at > java.base@17.0.9/java.util.concurrent.DelayQueue.poll(DelayQueue.java:265) > at > app//org.apache.kafka.server.util.timer.SystemTimer.advanceClock(SystemTimer.java:87) > at > app//kafka.server.DelayedOperationPurgatory.advanceClock(DelayedOperation.scala:418) > at > app//kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper.doWork(DelayedOperation.scala:444) > at > app//org.apache.kafka.server.util.ShutdownableThread.run(ShutdownableThread.java:131) > {code} > The objective of this Jira is to identify the test and fix this leak -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (KAFKA-16059) Fix leak of ExpirationReaper-1-AlterAcls threads in :core:test
[ https://issues.apache.org/jira/browse/KAFKA-16059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Divij Vaidya reassigned KAFKA-16059: Assignee: Divij Vaidya (was: Arpit Goyal) > Fix leak of ExpirationReaper-1-AlterAcls threads in :core:test > -- > > Key: KAFKA-16059 > URL: https://issues.apache.org/jira/browse/KAFKA-16059 > Project: Kafka > Issue Type: Sub-task >Reporter: Divij Vaidya >Assignee: Divij Vaidya >Priority: Major > Attachments: Screenshot 2023-12-29 at 11.13.01.png > > > We are leaking hundreds of ExpirationReaper-1-AlterAcls threads in one of the > tests in :core:test > {code:java} > "ExpirationReaper-1-AlterAcls" prio=0 tid=0x0 nid=0x0 waiting on condition > java.lang.Thread.State: TIMED_WAITING > on > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@3688fc67 > at java.base@17.0.9/jdk.internal.misc.Unsafe.park(Native Method) > at > java.base@17.0.9/java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:252) > at > java.base@17.0.9/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1672) > at > java.base@17.0.9/java.util.concurrent.DelayQueue.poll(DelayQueue.java:265) > at > app//org.apache.kafka.server.util.timer.SystemTimer.advanceClock(SystemTimer.java:87) > at > app//kafka.server.DelayedOperationPurgatory.advanceClock(DelayedOperation.scala:418) > at > app//kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper.doWork(DelayedOperation.scala:444) > at > app//org.apache.kafka.server.util.ShutdownableThread.run(ShutdownableThread.java:131) > {code} > The objective of this Jira is to identify the test and fix this leak -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] KAFKA-16059: Fix thread leak KafkaAPIsTest [kafka]
divijvaidya commented on PR #15093: URL: https://github.com/apache/kafka/pull/15093#issuecomment-1872116408 @showuon @jolshan please review when you get a chance -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[PR] KAFKA-16059: Fix thread leak KafkaAPIsTest [kafka]
divijvaidya opened a new pull request, #15093: URL: https://github.com/apache/kafka/pull/15093 KafkaApisTest creates instance of `KafkaApis.scala` which created reaper threads for purgatory but we never call close and hence, these threads are leaked. This PR closes the instance of KafkaApis in `@AfterEach`. This leak is similar to the one detected in https://github.com/apache/kafka/pull/15084 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[PR] KAFKA-16067 Refactoring ConsumerGroupListing + add test [kafka]
rykovsi opened a new pull request, #15092: URL: https://github.com/apache/kafka/pull/15092 *More detailed description of your change, if necessary. The PR title and PR message become the squashed commit message, so use a separate comment to ping reviewers.* *Summary of testing strategy (including rationale) for the feature or bug fix. Unit and/or integration tests are expected for any behaviour change and system tests should be considered for larger changes.* ### Committer Checklist (excluded from commit message) - [ ] Verify design and implementation - [ ] Verify test coverage and CI build status - [ ] Verify documentation (including upgrade notes) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (KAFKA-16060) Some questions about tiered storage capabilities
[ https://issues.apache.org/jira/browse/KAFKA-16060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801167#comment-17801167 ] Satish Duggana commented on KAFKA-16060: [~jianbin] For Q2: Do you mean compaction topics are supported with tiered storage? No, tiered storage is not supported for compaction enabled topics. > Some questions about tiered storage capabilities > > > Key: KAFKA-16060 > URL: https://issues.apache.org/jira/browse/KAFKA-16060 > Project: Kafka > Issue Type: Wish > Components: core >Affects Versions: 3.6.1 >Reporter: Jianbin Chen >Priority: Major > > # If a topic has 3 replicas, when the local expiration time is reached, will > all 3 replicas trigger the log transfer to the remote storage, or will only > the leader in the isr transfer the log to the remote storage (hdfs, s3) > # Topics that do not support compression, do you mean topics that > log.cleanup.policy=compact? -- This message was sent by Atlassian Jira (v8.20.10#820010)
[PR] KAFKA-16064: Improve ControllerApiTest [kafka]
wernerdv opened a new pull request, #15091: URL: https://github.com/apache/kafka/pull/15091 Refactoring ControllerApiTest to close an instance of ControllerApis in a tearDown method. ### Committer Checklist (excluded from commit message) - [ ] Verify design and implementation - [ ] Verify test coverage and CI build status - [ ] Verify documentation (including upgrade notes) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (KAFKA-16067) Refactoring ConsumerGroupListing + add test
Svyatoslav created KAFKA-16067: -- Summary: Refactoring ConsumerGroupListing + add test Key: KAFKA-16067 URL: https://issues.apache.org/jira/browse/KAFKA-16067 Project: Kafka Issue Type: Test Components: admin, clients Affects Versions: 3.7.0 Reporter: Svyatoslav Refactoring ConsumerGroupListing + add test Look in to PR -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (KAFKA-15742) KRaft support in GroupCoordinatorIntegrationTest
[ https://issues.apache.org/jira/browse/KAFKA-15742?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dmitry reassigned KAFKA-15742: -- Assignee: Dmitry > KRaft support in GroupCoordinatorIntegrationTest > > > Key: KAFKA-15742 > URL: https://issues.apache.org/jira/browse/KAFKA-15742 > Project: Kafka > Issue Type: Task > Components: core >Reporter: Sameer Tejani >Assignee: Dmitry >Priority: Minor > Labels: kraft, kraft-test, newbie > > The following tests in GroupCoordinatorIntegrationTest in > core/src/test/scala/integration/kafka/api/GroupCoordinatorIntegrationTest.scala > need to be updated to support KRaft > 41 : def testGroupCoordinatorPropagatesOffsetsTopicCompressionCodec(): Unit = > { > Scanned 63 lines. Found 0 KRaft tests out of 1 tests -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (KAFKA-16064) improve ControllerApiTest
[ https://issues.apache.org/jira/browse/KAFKA-16064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Divij Vaidya reassigned KAFKA-16064: Assignee: Dmitry (was: Dmitry) > improve ControllerApiTest > - > > Key: KAFKA-16064 > URL: https://issues.apache.org/jira/browse/KAFKA-16064 > Project: Kafka > Issue Type: Test >Reporter: Luke Chen >Assignee: Dmitry >Priority: Major > Labels: newbie, newbie++ > > It's usually more robust to automatically handle clean-up during tearDown by > instrumenting the create method so that it keeps track of all creations. > > context: > https://github.com/apache/kafka/pull/15084#issuecomment-1871302733 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-16066) Upgrade apacheds to 2.0.0.AM27
[ https://issues.apache.org/jira/browse/KAFKA-16066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Divij Vaidya updated KAFKA-16066: - Labels: newbie newbie++ (was: newbie) > Upgrade apacheds to 2.0.0.AM27 > -- > > Key: KAFKA-16066 > URL: https://issues.apache.org/jira/browse/KAFKA-16066 > Project: Kafka > Issue Type: Improvement >Reporter: Divij Vaidya >Priority: Major > Labels: newbie, newbie++ > > We are currently using a very old dependency. Notably, apacheds is only used > for testing when we use MiniKdc, hence, there is nothing stopping us from > upgrading it. > Notably, apacheds has removed the component > org.apache.directory.server:apacheds-protocol-kerberos in favour of using > Apache Kerby, hence, we need to make changes in MiniKdc.scala for this > upgrade to work correctly. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-16064) improve ControllerApiTest
[ https://issues.apache.org/jira/browse/KAFKA-16064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801161#comment-17801161 ] Divij Vaidya commented on KAFKA-16064: -- Hey [~javakillah] You should be able to assign JIRAs to yourself now. I have meanwhile assigned this to you. > improve ControllerApiTest > - > > Key: KAFKA-16064 > URL: https://issues.apache.org/jira/browse/KAFKA-16064 > Project: Kafka > Issue Type: Test >Reporter: Luke Chen >Assignee: Dmitry >Priority: Major > Labels: newbie, newbie++ > > It's usually more robust to automatically handle clean-up during tearDown by > instrumenting the create method so that it keeps track of all creations. > > context: > https://github.com/apache/kafka/pull/15084#issuecomment-1871302733 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (KAFKA-16064) improve ControllerApiTest
[ https://issues.apache.org/jira/browse/KAFKA-16064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Divij Vaidya reassigned KAFKA-16064: Assignee: Dmitry > improve ControllerApiTest > - > > Key: KAFKA-16064 > URL: https://issues.apache.org/jira/browse/KAFKA-16064 > Project: Kafka > Issue Type: Test >Reporter: Luke Chen >Assignee: Dmitry >Priority: Major > Labels: newbie, newbie++ > > It's usually more robust to automatically handle clean-up during tearDown by > instrumenting the create method so that it keeps track of all creations. > > context: > https://github.com/apache/kafka/pull/15084#issuecomment-1871302733 -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] KAFKA-16058: close controllerApi instance to avoid thread leaking [kafka]
ijuma commented on PR #15084: URL: https://github.com/apache/kafka/pull/15084#issuecomment-1872030491 Sounds good, thanks. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (KAFKA-16066) Upgrade apacheds to 2.0.0.AM27
Divij Vaidya created KAFKA-16066: Summary: Upgrade apacheds to 2.0.0.AM27 Key: KAFKA-16066 URL: https://issues.apache.org/jira/browse/KAFKA-16066 Project: Kafka Issue Type: Improvement Reporter: Divij Vaidya We are currently using a very old dependency. Notably, apacheds is only used for testing when we use MiniKdc, hence, there is nothing stopping us from upgrading it. Notably, apacheds has removed the component org.apache.directory.server:apacheds-protocol-kerberos in favour of using Apache Kerby, hence, we need to make changes in MiniKdc.scala for this upgrade to work correctly. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-16064) improve ControllerApiTest
[ https://issues.apache.org/jira/browse/KAFKA-16064?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801158#comment-17801158 ] Dmitry commented on KAFKA-16064: [~showuon] Hello, can you assign it to me? > improve ControllerApiTest > - > > Key: KAFKA-16064 > URL: https://issues.apache.org/jira/browse/KAFKA-16064 > Project: Kafka > Issue Type: Test >Reporter: Luke Chen >Priority: Major > Labels: newbie, newbie++ > > It's usually more robust to automatically handle clean-up during tearDown by > instrumenting the create method so that it keeps track of all creations. > > context: > https://github.com/apache/kafka/pull/15084#issuecomment-1871302733 -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] KAFKA-16065: close DelayedFuturePurgatory [kafka]
showuon commented on PR #15090: URL: https://github.com/apache/kafka/pull/15090#issuecomment-1872004068 @divijvaidya , please take a look when available. Thanks. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Commented] (KAFKA-16063) Fix leaked ApplicationShutdownHooks in EndToEndAuthorizationTests
[ https://issues.apache.org/jira/browse/KAFKA-16063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801156#comment-17801156 ] Arpit Goyal commented on KAFKA-16063: - Thanks [~divijvaidya]. I am picking it up. > Fix leaked ApplicationShutdownHooks in EndToEndAuthorizationTests > - > > Key: KAFKA-16063 > URL: https://issues.apache.org/jira/browse/KAFKA-16063 > Project: Kafka > Issue Type: Sub-task >Reporter: Divij Vaidya >Assignee: Arpit Goyal >Priority: Major > Attachments: Screenshot 2023-12-29 at 12.38.29.png > > > All test extending `EndToEndAuthorizationTest` are leaking > DefaultDirectoryService objects. > This can be observed using the heap dump at > [https://www.dropbox.com/scl/fi/4jaq8rowkmijaoj7ec1nm/GradleWorkerMain_10311_27_12_2023_13_37_08_Leak_Suspects.zip?rlkey=minkbvopb0c65m5wryqw234xb=0] > (unzip this and you will find a hprof which can be opened with your > favourite heap analyzer) > The stack trace looks like this: > !Screenshot 2023-12-29 at 12.38.29.png! > > I suspect that the reason is because DefaultDirectoryService#startup() > registers a shutdownhook which is somehow messed up by > QuorumTestHarness#teardown(). > We need to investigate why this is leaking and fix it. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Assigned] (KAFKA-16063) Fix leaked ApplicationShutdownHooks in EndToEndAuthorizationTests
[ https://issues.apache.org/jira/browse/KAFKA-16063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arpit Goyal reassigned KAFKA-16063: --- Assignee: Arpit Goyal > Fix leaked ApplicationShutdownHooks in EndToEndAuthorizationTests > - > > Key: KAFKA-16063 > URL: https://issues.apache.org/jira/browse/KAFKA-16063 > Project: Kafka > Issue Type: Sub-task >Reporter: Divij Vaidya >Assignee: Arpit Goyal >Priority: Major > Attachments: Screenshot 2023-12-29 at 12.38.29.png > > > All test extending `EndToEndAuthorizationTest` are leaking > DefaultDirectoryService objects. > This can be observed using the heap dump at > [https://www.dropbox.com/scl/fi/4jaq8rowkmijaoj7ec1nm/GradleWorkerMain_10311_27_12_2023_13_37_08_Leak_Suspects.zip?rlkey=minkbvopb0c65m5wryqw234xb=0] > (unzip this and you will find a hprof which can be opened with your > favourite heap analyzer) > The stack trace looks like this: > !Screenshot 2023-12-29 at 12.38.29.png! > > I suspect that the reason is because DefaultDirectoryService#startup() > registers a shutdownhook which is somehow messed up by > QuorumTestHarness#teardown(). > We need to investigate why this is leaking and fix it. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[PR] KAFKA-16065: close DelayedFuturePurgatory [kafka]
showuon opened a new pull request, #15090: URL: https://github.com/apache/kafka/pull/15090 Found 1 thread leaking in DelayedOperationTest. Closing it. ### Committer Checklist (excluded from commit message) - [ ] Verify design and implementation - [ ] Verify test coverage and CI build status - [ ] Verify documentation (including upgrade notes) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (KAFKA-16065) Fix leak in DelayedOperationTest
Luke Chen created KAFKA-16065: - Summary: Fix leak in DelayedOperationTest Key: KAFKA-16065 URL: https://issues.apache.org/jira/browse/KAFKA-16065 Project: Kafka Issue Type: Sub-task Reporter: Luke Chen Assignee: Luke Chen Fix leak in DelayedOperationTest. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-16052) OOM in Kafka test suite
[ https://issues.apache.org/jira/browse/KAFKA-16052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801153#comment-17801153 ] Divij Vaidya commented on KAFKA-16052: -- [~jolshan] [~showuon] - I found another culprit - https://issues.apache.org/jira/browse/KAFKA-16063 > OOM in Kafka test suite > --- > > Key: KAFKA-16052 > URL: https://issues.apache.org/jira/browse/KAFKA-16052 > Project: Kafka > Issue Type: Bug >Affects Versions: 3.7.0 >Reporter: Divij Vaidya >Priority: Major > Attachments: Screenshot 2023-12-27 at 14.04.52.png, Screenshot > 2023-12-27 at 14.22.21.png, Screenshot 2023-12-27 at 14.45.20.png, Screenshot > 2023-12-27 at 15.31.09.png, Screenshot 2023-12-27 at 17.44.09.png, Screenshot > 2023-12-28 at 00.13.06.png, Screenshot 2023-12-28 at 00.18.56.png, Screenshot > 2023-12-28 at 11.26.03.png, Screenshot 2023-12-28 at 11.26.09.png, Screenshot > 2023-12-28 at 18.44.19.png, newRM.patch > > > *Problem* > Our test suite is failing with frequent OOM. Discussion in the mailing list > is here: [https://lists.apache.org/thread/d5js0xpsrsvhgjb10mbzo9cwsy8087x4] > *Setup* > To find the source of leaks, I ran the :core:test build target with a single > thread (see below on how to do it) and attached a profiler to it. This Jira > tracks the list of action items identified from the analysis. > How to run tests using a single thread: > {code:java} > diff --git a/build.gradle b/build.gradle > index f7abbf4f0b..81df03f1ee 100644 > --- a/build.gradle > +++ b/build.gradle > @@ -74,9 +74,8 @@ ext { > "--add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED" > )- maxTestForks = project.hasProperty('maxParallelForks') ? > maxParallelForks.toInteger() : Runtime.runtime.availableProcessors() > - maxScalacThreads = project.hasProperty('maxScalacThreads') ? > maxScalacThreads.toInteger() : > - Math.min(Runtime.runtime.availableProcessors(), 8) > + maxTestForks = 1 > + maxScalacThreads = 1 > userIgnoreFailures = project.hasProperty('ignoreFailures') ? > ignoreFailures : false userMaxTestRetries = > project.hasProperty('maxTestRetries') ? maxTestRetries.toInteger() : 0 > diff --git a/gradle.properties b/gradle.properties > index 4880248cac..ee4b6e3bc1 100644 > --- a/gradle.properties > +++ b/gradle.properties > @@ -30,4 +30,4 @@ scalaVersion=2.13.12 > swaggerVersion=2.2.8 > task=build > org.gradle.jvmargs=-Xmx2g -Xss4m -XX:+UseParallelGC > -org.gradle.parallel=true > +org.gradle.parallel=false {code} > *Result of experiment* > This is how the heap memory utilized looks like, starting from tens of MB to > ending with 1.5GB (with spikes of 2GB) of heap being used as the test > executes. Note that the total number of threads also increases but it does > not correlate with sharp increase in heap memory usage. The heap dump is > available at > [https://www.dropbox.com/scl/fi/nwtgc6ir6830xlfy9z9cu/GradleWorkerMain_10311_27_12_2023_13_37_08.hprof.zip?rlkey=ozbdgh5vih4rcynnxbatzk7ln=0] > > !Screenshot 2023-12-27 at 14.22.21.png! -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-16063) Fix leaked ApplicationShutdownHooks in EndToEndAuthorizationTests
[ https://issues.apache.org/jira/browse/KAFKA-16063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801152#comment-17801152 ] Divij Vaidya commented on KAFKA-16063: -- All tests that are using "SaslSetup" are leaking these objects "ApacheDS Shutdown Hook" > Fix leaked ApplicationShutdownHooks in EndToEndAuthorizationTests > - > > Key: KAFKA-16063 > URL: https://issues.apache.org/jira/browse/KAFKA-16063 > Project: Kafka > Issue Type: Sub-task >Reporter: Divij Vaidya >Priority: Major > Attachments: Screenshot 2023-12-29 at 12.38.29.png > > > All test extending `EndToEndAuthorizationTest` are leaking > DefaultDirectoryService objects. > This can be observed using the heap dump at > [https://www.dropbox.com/scl/fi/4jaq8rowkmijaoj7ec1nm/GradleWorkerMain_10311_27_12_2023_13_37_08_Leak_Suspects.zip?rlkey=minkbvopb0c65m5wryqw234xb=0] > (unzip this and you will find a hprof which can be opened with your > favourite heap analyzer) > The stack trace looks like this: > !Screenshot 2023-12-29 at 12.38.29.png! > > I suspect that the reason is because DefaultDirectoryService#startup() > registers a shutdownhook which is somehow messed up by > QuorumTestHarness#teardown(). > We need to investigate why this is leaking and fix it. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-16064) improve ControllerApiTest
[ https://issues.apache.org/jira/browse/KAFKA-16064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luke Chen updated KAFKA-16064: -- Labels: newbie newbie++ (was: ) > improve ControllerApiTest > - > > Key: KAFKA-16064 > URL: https://issues.apache.org/jira/browse/KAFKA-16064 > Project: Kafka > Issue Type: Test >Reporter: Luke Chen >Priority: Major > Labels: newbie, newbie++ > > It's usually more robust to automatically handle clean-up during tearDown by > instrumenting the create method so that it keeps track of all creations. > > context: > https://github.com/apache/kafka/pull/15084#issuecomment-1871302733 -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] KAFKA-16058: close controllerApi instance to avoid thread leaking [kafka]
showuon commented on PR #15084: URL: https://github.com/apache/kafka/pull/15084#issuecomment-1871990196 > It's usually more robust to automatically handle clean-up during tearDown by instrumenting the create method so that it keeps track of all creations. Thanks for the comment @ijuma . Since the thread leaking is fixed now, we can improve it later. I've opened [KAFKA-16064](https://issues.apache.org/jira/browse/KAFKA-16064) for this improvement. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (KAFKA-16064) improve ControllerApiTest
Luke Chen created KAFKA-16064: - Summary: improve ControllerApiTest Key: KAFKA-16064 URL: https://issues.apache.org/jira/browse/KAFKA-16064 Project: Kafka Issue Type: Test Reporter: Luke Chen It's usually more robust to automatically handle clean-up during tearDown by instrumenting the create method so that it keeps track of all creations. context: https://github.com/apache/kafka/pull/15084#issuecomment-1871302733 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-15742) KRaft support in GroupCoordinatorIntegrationTest
[ https://issues.apache.org/jira/browse/KAFKA-15742?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801148#comment-17801148 ] Dmitry commented on KAFKA-15742: [~dengziming] Hi, could you watch the PR [https://github.com/apache/kafka/pull/15086] ? It is similar to the one you approved - https://github.com/apache/kafka/pull/14175 > KRaft support in GroupCoordinatorIntegrationTest > > > Key: KAFKA-15742 > URL: https://issues.apache.org/jira/browse/KAFKA-15742 > Project: Kafka > Issue Type: Task > Components: core >Reporter: Sameer Tejani >Priority: Minor > Labels: kraft, kraft-test, newbie > > The following tests in GroupCoordinatorIntegrationTest in > core/src/test/scala/integration/kafka/api/GroupCoordinatorIntegrationTest.scala > need to be updated to support KRaft > 41 : def testGroupCoordinatorPropagatesOffsetsTopicCompressionCodec(): Unit = > { > Scanned 63 lines. Found 0 KRaft tests out of 1 tests -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] KAFKA-16052: create a real dummy replicaManager instance to save memory [kafka]
showuon closed pull request #15083: KAFKA-16052: create a real dummy replicaManager instance to save memory URL: https://github.com/apache/kafka/pull/15083 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] KAFKA-16052: create a real dummy replicaManager instance to save memory [kafka]
showuon commented on PR #15083: URL: https://github.com/apache/kafka/pull/15083#issuecomment-1871984181 > in this PR @divijvaidya , Good catch! We have to close replicaManager now since we're creating a real one. Let's use your commit for the fix. Please open a PR with this [commit](https://github.com/divijvaidya/kafka/commit/d4228a339035d83225408e5651e5cacbf604f0e8). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (KAFKA-16063) Fix leaked ApplicationShutdownHooks in EndToEndAuthorizationTests
Divij Vaidya created KAFKA-16063: Summary: Fix leaked ApplicationShutdownHooks in EndToEndAuthorizationTests Key: KAFKA-16063 URL: https://issues.apache.org/jira/browse/KAFKA-16063 Project: Kafka Issue Type: Sub-task Reporter: Divij Vaidya Attachments: Screenshot 2023-12-29 at 12.38.29.png All test extending `EndToEndAuthorizationTest` are leaking DefaultDirectoryService objects. This can be observed using the heap dump at [https://www.dropbox.com/scl/fi/4jaq8rowkmijaoj7ec1nm/GradleWorkerMain_10311_27_12_2023_13_37_08_Leak_Suspects.zip?rlkey=minkbvopb0c65m5wryqw234xb=0] (unzip this and you will find a hprof which can be opened with your favourite heap analyzer) The stack trace looks like this: !Screenshot 2023-12-29 at 12.38.29.png! I suspect that the reason is because DefaultDirectoryService#startup() registers a shutdownhook which is somehow messed up by QuorumTestHarness#teardown(). We need to investigate why this is leaking and fix it. -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-16063) Fix leaked ApplicationShutdownHooks in EndToEndAuthorizationTests
[ https://issues.apache.org/jira/browse/KAFKA-16063?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801147#comment-17801147 ] Divij Vaidya commented on KAFKA-16063: -- [~goyarpit] if you are interested, you can pick this one. > Fix leaked ApplicationShutdownHooks in EndToEndAuthorizationTests > - > > Key: KAFKA-16063 > URL: https://issues.apache.org/jira/browse/KAFKA-16063 > Project: Kafka > Issue Type: Sub-task >Reporter: Divij Vaidya >Priority: Major > Attachments: Screenshot 2023-12-29 at 12.38.29.png > > > All test extending `EndToEndAuthorizationTest` are leaking > DefaultDirectoryService objects. > This can be observed using the heap dump at > [https://www.dropbox.com/scl/fi/4jaq8rowkmijaoj7ec1nm/GradleWorkerMain_10311_27_12_2023_13_37_08_Leak_Suspects.zip?rlkey=minkbvopb0c65m5wryqw234xb=0] > (unzip this and you will find a hprof which can be opened with your > favourite heap analyzer) > The stack trace looks like this: > !Screenshot 2023-12-29 at 12.38.29.png! > > I suspect that the reason is because DefaultDirectoryService#startup() > registers a shutdownhook which is somehow messed up by > QuorumTestHarness#teardown(). > We need to investigate why this is leaking and fix it. -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] KAFKA-16051: Fixed deadlock in StandaloneHerder [kafka]
developster commented on PR #15080: URL: https://github.com/apache/kafka/pull/15080#issuecomment-1871963622 Sure @vamossagar12 , just tested both unpatched 3.6.1 and patched 3.8.0. Unpatched version reached deadlock in 40% of the cases (6 out of 15) at the same time I was unable to reproduce deadlock in the patched version (0 out of 20). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[PR] KAFKA-16062: Upgrade mockito to 5.8.0 [kafka]
divijvaidya opened a new pull request, #15089: URL: https://github.com/apache/kafka/pull/15089 Upgrade mockito from 5.5.0 to 5.8.0. Upgrade diff: https://github.com/mockito/mockito/compare/v5.5.0...v5.8.0 No specific highlight in the release notes, it's majorly an upgrade of dependencies of Mockito. Release notes for 5.8-0 - https://github.com/mockito/mockito/releases/tag/v5.8.0 Release notes for 5.7.0 - https://github.com/mockito/mockito/releases/tag/v5.7.0 Release notes for 5.6.0 - https://github.com/mockito/mockito/releases/tag/v5.6.0 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Created] (KAFKA-16062) Upgrade mockito to 5.8.0
Divij Vaidya created KAFKA-16062: Summary: Upgrade mockito to 5.8.0 Key: KAFKA-16062 URL: https://issues.apache.org/jira/browse/KAFKA-16062 Project: Kafka Issue Type: Sub-task Reporter: Divij Vaidya Assignee: Divij Vaidya Upgrading to use the latest version of mockito. Updated from 5.5.0 to 5.8.0 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-16059) Fix leak of ExpirationReaper-1-AlterAcls threads in :core:test
[ https://issues.apache.org/jira/browse/KAFKA-16059?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Divij Vaidya updated KAFKA-16059: - Attachment: Screenshot 2023-12-29 at 11.13.01.png > Fix leak of ExpirationReaper-1-AlterAcls threads in :core:test > -- > > Key: KAFKA-16059 > URL: https://issues.apache.org/jira/browse/KAFKA-16059 > Project: Kafka > Issue Type: Sub-task >Reporter: Divij Vaidya >Assignee: Arpit Goyal >Priority: Major > Attachments: Screenshot 2023-12-29 at 11.13.01.png > > > We are leaking hundreds of ExpirationReaper-1-AlterAcls threads in one of the > tests in :core:test > {code:java} > "ExpirationReaper-1-AlterAcls" prio=0 tid=0x0 nid=0x0 waiting on condition > java.lang.Thread.State: TIMED_WAITING > on > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@3688fc67 > at java.base@17.0.9/jdk.internal.misc.Unsafe.park(Native Method) > at > java.base@17.0.9/java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:252) > at > java.base@17.0.9/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1672) > at > java.base@17.0.9/java.util.concurrent.DelayQueue.poll(DelayQueue.java:265) > at > app//org.apache.kafka.server.util.timer.SystemTimer.advanceClock(SystemTimer.java:87) > at > app//kafka.server.DelayedOperationPurgatory.advanceClock(DelayedOperation.scala:418) > at > app//kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper.doWork(DelayedOperation.scala:444) > at > app//org.apache.kafka.server.util.ShutdownableThread.run(ShutdownableThread.java:131) > {code} > The objective of this Jira is to identify the test and fix this leak -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Resolved] (KAFKA-16060) Some questions about tiered storage capabilities
[ https://issues.apache.org/jira/browse/KAFKA-16060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Divij Vaidya resolved KAFKA-16060. -- Resolution: Not A Problem > Some questions about tiered storage capabilities > > > Key: KAFKA-16060 > URL: https://issues.apache.org/jira/browse/KAFKA-16060 > Project: Kafka > Issue Type: Wish > Components: core >Affects Versions: 3.6.1 >Reporter: Jianbin Chen >Priority: Major > > # If a topic has 3 replicas, when the local expiration time is reached, will > all 3 replicas trigger the log transfer to the remote storage, or will only > the leader in the isr transfer the log to the remote storage (hdfs, s3) > # Topics that do not support compression, do you mean topics that > log.cleanup.policy=compact? -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15964) Flaky test: testHighAvailabilityTaskAssignorLargeNumConsumers – org.apache.kafka.streams.processor.internals.StreamsAssignmentScaleTest
[ https://issues.apache.org/jira/browse/KAFKA-15964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lucas Brutschy updated KAFKA-15964: --- Component/s: streams unit tests > Flaky test: testHighAvailabilityTaskAssignorLargeNumConsumers – > org.apache.kafka.streams.processor.internals.StreamsAssignmentScaleTest > --- > > Key: KAFKA-15964 > URL: https://issues.apache.org/jira/browse/KAFKA-15964 > Project: Kafka > Issue Type: Bug > Components: streams, unit tests >Reporter: Apoorv Mittal >Priority: Major > Labels: flaky-test > > PR build: > [https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-14767/15/tests/] > > {code:java} > java.lang.AssertionError: The first assignment took too long to complete at > 94250ms.Stacktracejava.lang.AssertionError: The first assignment took too > long to complete at 94250ms.at > org.apache.kafka.streams.processor.internals.StreamsAssignmentScaleTest.completeLargeAssignment(StreamsAssignmentScaleTest.java:220) > at > org.apache.kafka.streams.processor.internals.StreamsAssignmentScaleTest.testHighAvailabilityTaskAssignorLargeNumConsumers(StreamsAssignmentScaleTest.java:85) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native > Method) at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:568) at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) >at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299) >at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Updated] (KAFKA-15964) Flaky test: testHighAvailabilityTaskAssignorLargeNumConsumers – org.apache.kafka.streams.processor.internals.StreamsAssignmentScaleTest
[ https://issues.apache.org/jira/browse/KAFKA-15964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Lucas Brutschy updated KAFKA-15964: --- Labels: flaky-test (was: ) > Flaky test: testHighAvailabilityTaskAssignorLargeNumConsumers – > org.apache.kafka.streams.processor.internals.StreamsAssignmentScaleTest > --- > > Key: KAFKA-15964 > URL: https://issues.apache.org/jira/browse/KAFKA-15964 > Project: Kafka > Issue Type: Bug >Reporter: Apoorv Mittal >Priority: Major > Labels: flaky-test > > PR build: > [https://ci-builds.apache.org/blue/organizations/jenkins/Kafka%2Fkafka-pr/detail/PR-14767/15/tests/] > > {code:java} > java.lang.AssertionError: The first assignment took too long to complete at > 94250ms.Stacktracejava.lang.AssertionError: The first assignment took too > long to complete at 94250ms.at > org.apache.kafka.streams.processor.internals.StreamsAssignmentScaleTest.completeLargeAssignment(StreamsAssignmentScaleTest.java:220) > at > org.apache.kafka.streams.processor.internals.StreamsAssignmentScaleTest.testHighAvailabilityTaskAssignorLargeNumConsumers(StreamsAssignmentScaleTest.java:85) > at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native > Method) at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77) > at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.base/java.lang.reflect.Method.invoke(Method.java:568) at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > at > org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12) > at > org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56) >at > org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17) > at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299) >at > org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-16059) Fix leak of ExpirationReaper-1-AlterAcls threads in :core:test
[ https://issues.apache.org/jira/browse/KAFKA-16059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801128#comment-17801128 ] Divij Vaidya commented on KAFKA-16059: -- Having said that, please wait before picking up this JIRA. I think this is resolved by the commit I mentioned above. I am verifying it now by running the full suite. > Fix leak of ExpirationReaper-1-AlterAcls threads in :core:test > -- > > Key: KAFKA-16059 > URL: https://issues.apache.org/jira/browse/KAFKA-16059 > Project: Kafka > Issue Type: Sub-task >Reporter: Divij Vaidya >Assignee: Arpit Goyal >Priority: Major > Attachments: Screenshot 2023-12-29 at 11.13.01.png > > > We are leaking hundreds of ExpirationReaper-1-AlterAcls threads in one of the > tests in :core:test > {code:java} > "ExpirationReaper-1-AlterAcls" prio=0 tid=0x0 nid=0x0 waiting on condition > java.lang.Thread.State: TIMED_WAITING > on > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@3688fc67 > at java.base@17.0.9/jdk.internal.misc.Unsafe.park(Native Method) > at > java.base@17.0.9/java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:252) > at > java.base@17.0.9/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1672) > at > java.base@17.0.9/java.util.concurrent.DelayQueue.poll(DelayQueue.java:265) > at > app//org.apache.kafka.server.util.timer.SystemTimer.advanceClock(SystemTimer.java:87) > at > app//kafka.server.DelayedOperationPurgatory.advanceClock(DelayedOperation.scala:418) > at > app//kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper.doWork(DelayedOperation.scala:444) > at > app//org.apache.kafka.server.util.ShutdownableThread.run(ShutdownableThread.java:131) > {code} > The objective of this Jira is to identify the test and fix this leak -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Comment Edited] (KAFKA-16059) Fix leak of ExpirationReaper-1-AlterAcls threads in :core:test
[ https://issues.apache.org/jira/browse/KAFKA-16059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801128#comment-17801128 ] Divij Vaidya edited comment on KAFKA-16059 at 12/29/23 10:33 AM: - Having said that, please wait before starting to work on this JIRA. I think this is resolved by the commit I mentioned above. I am verifying it now by running the full suite. was (Author: divijvaidya): Having said that, please wait before picking up this JIRA. I think this is resolved by the commit I mentioned above. I am verifying it now by running the full suite. > Fix leak of ExpirationReaper-1-AlterAcls threads in :core:test > -- > > Key: KAFKA-16059 > URL: https://issues.apache.org/jira/browse/KAFKA-16059 > Project: Kafka > Issue Type: Sub-task >Reporter: Divij Vaidya >Assignee: Arpit Goyal >Priority: Major > Attachments: Screenshot 2023-12-29 at 11.13.01.png > > > We are leaking hundreds of ExpirationReaper-1-AlterAcls threads in one of the > tests in :core:test > {code:java} > "ExpirationReaper-1-AlterAcls" prio=0 tid=0x0 nid=0x0 waiting on condition > java.lang.Thread.State: TIMED_WAITING > on > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@3688fc67 > at java.base@17.0.9/jdk.internal.misc.Unsafe.park(Native Method) > at > java.base@17.0.9/java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:252) > at > java.base@17.0.9/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1672) > at > java.base@17.0.9/java.util.concurrent.DelayQueue.poll(DelayQueue.java:265) > at > app//org.apache.kafka.server.util.timer.SystemTimer.advanceClock(SystemTimer.java:87) > at > app//kafka.server.DelayedOperationPurgatory.advanceClock(DelayedOperation.scala:418) > at > app//kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper.doWork(DelayedOperation.scala:444) > at > app//org.apache.kafka.server.util.ShutdownableThread.run(ShutdownableThread.java:131) > {code} > The objective of this Jira is to identify the test and fix this leak -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-16059) Fix leak of ExpirationReaper-1-AlterAcls threads in :core:test
[ https://issues.apache.org/jira/browse/KAFKA-16059?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801127#comment-17801127 ] Divij Vaidya commented on KAFKA-16059: -- Hey [~goyarpit] I am using intellij profiler and it's quite simple to use, no additional setup required. First, we will ensure that tests are executed using a single thread. You can specify it using the build parameter maxParallelForks, i.e. you execute the command `./gradlew -PmaxParallelForks=1 -PmaxScalacThreads=1 :core:test` Now, since the tests are executing, you can attach your favourite profiler to it. I am using Intellij profiler, where you select the process you want to attach the profiler to, right click and then click on "CPU and Memory live charts". You can also take a heap dump and a thread dump using this interface. !Screenshot 2023-12-29 at 11.13.01.png! > Fix leak of ExpirationReaper-1-AlterAcls threads in :core:test > -- > > Key: KAFKA-16059 > URL: https://issues.apache.org/jira/browse/KAFKA-16059 > Project: Kafka > Issue Type: Sub-task >Reporter: Divij Vaidya >Assignee: Arpit Goyal >Priority: Major > Attachments: Screenshot 2023-12-29 at 11.13.01.png > > > We are leaking hundreds of ExpirationReaper-1-AlterAcls threads in one of the > tests in :core:test > {code:java} > "ExpirationReaper-1-AlterAcls" prio=0 tid=0x0 nid=0x0 waiting on condition > java.lang.Thread.State: TIMED_WAITING > on > java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject@3688fc67 > at java.base@17.0.9/jdk.internal.misc.Unsafe.park(Native Method) > at > java.base@17.0.9/java.util.concurrent.locks.LockSupport.parkNanos(LockSupport.java:252) > at > java.base@17.0.9/java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.awaitNanos(AbstractQueuedSynchronizer.java:1672) > at > java.base@17.0.9/java.util.concurrent.DelayQueue.poll(DelayQueue.java:265) > at > app//org.apache.kafka.server.util.timer.SystemTimer.advanceClock(SystemTimer.java:87) > at > app//kafka.server.DelayedOperationPurgatory.advanceClock(DelayedOperation.scala:418) > at > app//kafka.server.DelayedOperationPurgatory$ExpiredOperationReaper.doWork(DelayedOperation.scala:444) > at > app//org.apache.kafka.server.util.ShutdownableThread.run(ShutdownableThread.java:131) > {code} > The objective of this Jira is to identify the test and fix this leak -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (KAFKA-16060) Some questions about tiered storage capabilities
[ https://issues.apache.org/jira/browse/KAFKA-16060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17801126#comment-17801126 ] Divij Vaidya commented on KAFKA-16060: -- Hey [~jianbin] Questions are best asked by sending an email to developer mailing list or user mailing list specified at [https://kafka.apache.org/contact] I will answer your questions as a one time exception here but in future, please send an email. 1. It's only the leader that copies data to remote storage. Any replica will not delete logs locally even if local expiration time is reaches until it knows that leader has copied the specifed log to remote, i.e. data from local storage is never removed until there is a copy available in remote. You can find more information about this at [https://cwiki.apache.org/confluence/display/KAFKA/KIP-405%3A+Kafka+Tiered+Storage] 2. Yes, compression sounds like a typo. Where did you find it? As a reference, the early access notes for Tiered Storage at [https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Tiered+Storage+Early+Access+Release+Notes] mention it more clearly. > Some questions about tiered storage capabilities > > > Key: KAFKA-16060 > URL: https://issues.apache.org/jira/browse/KAFKA-16060 > Project: Kafka > Issue Type: Wish > Components: core >Affects Versions: 3.6.1 >Reporter: Jianbin Chen >Priority: Major > > # If a topic has 3 replicas, when the local expiration time is reached, will > all 3 replicas trigger the log transfer to the remote storage, or will only > the leader in the isr transfer the log to the remote storage (hdfs, s3) > # Topics that do not support compression, do you mean topics that > log.cleanup.policy=compact? -- This message was sent by Atlassian Jira (v8.20.10#820010)
Re: [PR] KAFKA-16015: Fix custom timeouts overwritten by defaults [kafka]
divijvaidya commented on PR #15030: URL: https://github.com/apache/kafka/pull/15030#issuecomment-1871892425 backported to 3.7 branch -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[jira] [Resolved] (KAFKA-16015) kafka-leader-election timeout values always overwritten by default values
[ https://issues.apache.org/jira/browse/KAFKA-16015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Divij Vaidya resolved KAFKA-16015. -- Resolution: Fixed > kafka-leader-election timeout values always overwritten by default values > -- > > Key: KAFKA-16015 > URL: https://issues.apache.org/jira/browse/KAFKA-16015 > Project: Kafka > Issue Type: Bug > Components: admin, tools >Affects Versions: 3.5.1, 3.6.1 >Reporter: Sergio Troiano >Assignee: Sergio Troiano >Priority: Minor > Fix For: 3.7.0, 3.8.0 > > > Using the *kafka-leader-election.sh* I was getting random timeouts like these: > {code:java} > Error completing leader election (PREFERRED) for partition: > sebatestemptytopic-4: org.apache.kafka.common.errors.TimeoutException: The > request timed out. > Error completing leader election (PREFERRED) for partition: > __CruiseControlMetrics-3: org.apache.kafka.common.errors.TimeoutException: > The request timed out. > Error completing leader election (PREFERRED) for partition: > __KafkaCruiseControlModelTrainingSamples-18: > org.apache.kafka.common.errors.TimeoutException: The request timed out. > Error completing leader election (PREFERRED) for partition: > __KafkaCruiseControlPartitionMetricSamples-8: > org.apache.kafka.common.errors.TimeoutException: The request timed out. {code} > These timeouts were raised from the client side as the controller always > finished with all the Kafka leader elections. > One pattern I detected was always the timeouts were raised after about 15 > seconds. > > So i checked this command has an option to pass configurations > {code:java} > Option Description > -- --- > --admin.config Configuration properties files to pass > to the admin client {code} > I created the file in order to increment the values of *request.timeout.ms* > and *default.api.timeout.ms.* So even after increasing these values I got > the same result, timeouts were happening, like the new values were not having > any effect. > So I checked the source code and I came across with a bug, no matter the > value we pass to the timeouts the default values were ALWAYS overwriting them. > > This is the[3.6 > branch|https://github.com/apache/kafka/blob/3.6/core/src/main/scala/kafka/admin/LeaderElectionCommand.scala#L42] > {code:java} > object LeaderElectionCommand extends Logging { > def main(args: Array[String]): Unit = { > run(args, 30.second) > } def run(args: Array[String], timeout: Duration): Unit = { > val commandOptions = new LeaderElectionCommandOptions(args) > CommandLineUtils.maybePrintHelpOrVersion( > commandOptions, > "This tool attempts to elect a new leader for a set of topic > partitions. The type of elections supported are preferred replicas and > unclean replicas." > ) validate(commandOptions) val electionType = > commandOptions.options.valueOf(commandOptions.electionType) val > jsonFileTopicPartitions = > Option(commandOptions.options.valueOf(commandOptions.pathToJsonFile)).map { > path => > parseReplicaElectionData(Utils.readFileAsString(path)) > } val singleTopicPartition = ( > Option(commandOptions.options.valueOf(commandOptions.topic)), > Option(commandOptions.options.valueOf(commandOptions.partition)) > ) match { > case (Some(topic), Some(partition)) => Some(Set(new > TopicPartition(topic, partition))) > case _ => None > } /* Note: No need to look at --all-topic-partitions as we want this > to be None if it is use. > * The validate function should be checking that this option is required > if the --topic and --path-to-json-file > * are not specified. > */ > val topicPartitions = > jsonFileTopicPartitions.orElse(singleTopicPartition) val adminClient = { > val props = > Option(commandOptions.options.valueOf(commandOptions.adminClientConfig)).map > { config => > Utils.loadProps(config) > }.getOrElse(new Properties()) props.setProperty( > AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, > commandOptions.options.valueOf(commandOptions.bootstrapServer) > ) > props.setProperty(AdminClientConfig.DEFAULT_API_TIMEOUT_MS_CONFIG, > timeout.toMillis.toString) > props.setProperty(AdminClientConfig.REQUEST_TIMEOUT_MS_CONFIG, > (timeout.toMillis / 2).toString) Admin.create(props) > } {code} > As we can see the default timeout is 30 seconds, and the request timeout is > 30/2 which validates the 15 seconds timeout. > Also we can see in the code how the custom values passed by the config file > are overwritten by the defaults. > > >
[jira] [Updated] (KAFKA-16015) kafka-leader-election timeout values always overwritten by default values
[ https://issues.apache.org/jira/browse/KAFKA-16015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Divij Vaidya updated KAFKA-16015: - Fix Version/s: 3.7.0 3.8.0 > kafka-leader-election timeout values always overwritten by default values > -- > > Key: KAFKA-16015 > URL: https://issues.apache.org/jira/browse/KAFKA-16015 > Project: Kafka > Issue Type: Bug > Components: admin, tools >Affects Versions: 3.5.1, 3.6.1 >Reporter: Sergio Troiano >Assignee: Sergio Troiano >Priority: Minor > Fix For: 3.7.0, 3.8.0 > > > Using the *kafka-leader-election.sh* I was getting random timeouts like these: > {code:java} > Error completing leader election (PREFERRED) for partition: > sebatestemptytopic-4: org.apache.kafka.common.errors.TimeoutException: The > request timed out. > Error completing leader election (PREFERRED) for partition: > __CruiseControlMetrics-3: org.apache.kafka.common.errors.TimeoutException: > The request timed out. > Error completing leader election (PREFERRED) for partition: > __KafkaCruiseControlModelTrainingSamples-18: > org.apache.kafka.common.errors.TimeoutException: The request timed out. > Error completing leader election (PREFERRED) for partition: > __KafkaCruiseControlPartitionMetricSamples-8: > org.apache.kafka.common.errors.TimeoutException: The request timed out. {code} > These timeouts were raised from the client side as the controller always > finished with all the Kafka leader elections. > One pattern I detected was always the timeouts were raised after about 15 > seconds. > > So i checked this command has an option to pass configurations > {code:java} > Option Description > -- --- > --admin.config Configuration properties files to pass > to the admin client {code} > I created the file in order to increment the values of *request.timeout.ms* > and *default.api.timeout.ms.* So even after increasing these values I got > the same result, timeouts were happening, like the new values were not having > any effect. > So I checked the source code and I came across with a bug, no matter the > value we pass to the timeouts the default values were ALWAYS overwriting them. > > This is the[3.6 > branch|https://github.com/apache/kafka/blob/3.6/core/src/main/scala/kafka/admin/LeaderElectionCommand.scala#L42] > {code:java} > object LeaderElectionCommand extends Logging { > def main(args: Array[String]): Unit = { > run(args, 30.second) > } def run(args: Array[String], timeout: Duration): Unit = { > val commandOptions = new LeaderElectionCommandOptions(args) > CommandLineUtils.maybePrintHelpOrVersion( > commandOptions, > "This tool attempts to elect a new leader for a set of topic > partitions. The type of elections supported are preferred replicas and > unclean replicas." > ) validate(commandOptions) val electionType = > commandOptions.options.valueOf(commandOptions.electionType) val > jsonFileTopicPartitions = > Option(commandOptions.options.valueOf(commandOptions.pathToJsonFile)).map { > path => > parseReplicaElectionData(Utils.readFileAsString(path)) > } val singleTopicPartition = ( > Option(commandOptions.options.valueOf(commandOptions.topic)), > Option(commandOptions.options.valueOf(commandOptions.partition)) > ) match { > case (Some(topic), Some(partition)) => Some(Set(new > TopicPartition(topic, partition))) > case _ => None > } /* Note: No need to look at --all-topic-partitions as we want this > to be None if it is use. > * The validate function should be checking that this option is required > if the --topic and --path-to-json-file > * are not specified. > */ > val topicPartitions = > jsonFileTopicPartitions.orElse(singleTopicPartition) val adminClient = { > val props = > Option(commandOptions.options.valueOf(commandOptions.adminClientConfig)).map > { config => > Utils.loadProps(config) > }.getOrElse(new Properties()) props.setProperty( > AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, > commandOptions.options.valueOf(commandOptions.bootstrapServer) > ) > props.setProperty(AdminClientConfig.DEFAULT_API_TIMEOUT_MS_CONFIG, > timeout.toMillis.toString) > props.setProperty(AdminClientConfig.REQUEST_TIMEOUT_MS_CONFIG, > (timeout.toMillis / 2).toString) Admin.create(props) > } {code} > As we can see the default timeout is 30 seconds, and the request timeout is > 30/2 which validates the 15 seconds timeout. > Also we can see in the code how the custom values passed by the config file > are overwritten
Re: [PR] KAFKA-16015: Fix custom timeouts overwritten by defaults [kafka]
divijvaidya merged PR #15030: URL: https://github.com/apache/kafka/pull/15030 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: jira-unsubscr...@kafka.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
Re: [PR] KAFKA-16015: Fix custom timeouts overwritten by defaults [kafka]
divijvaidya commented on PR #15030: URL: https://github.com/apache/kafka/pull/15030#issuecomment-1871886450 Unrelated test failures ``` [Build / JDK 17 and Scala 2.13 / org.apache.kafka.common.network.SslTransportLayerTest."testUngracefulRemoteCloseDuringHandshakeRead(Args).tlsProtocol=TLSv1.2, useInlinePem=true"](https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-15030/7/testReport/junit/org.apache.kafka.common.network/SslTransportLayerTest/Build___JDK_17_and_Scala_2_13testUngracefulRemoteCloseDuringHandshakeRead_Args__tlsProtocol_TLSv1_2__useInlinePem_true_/) [Build / JDK 17 and Scala 2.13 / org.apache.kafka.connect.mirror.integration.MirrorConnectorsIntegrationBaseTest.testReplicateFromLatest()](https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-15030/7/testReport/junit/org.apache.kafka.connect.mirror.integration/MirrorConnectorsIntegrationBaseTest/Build___JDK_17_and_Scala_2_13___testReplicateFromLatest__/) [Build / JDK 17 and Scala 2.13 / org.apache.kafka.connect.mirror.integration.MirrorConnectorsIntegrationBaseTest.testReplicateFromLatest()](https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-15030/7/testReport/junit/org.apache.kafka.connect.mirror.integration/MirrorConnectorsIntegrationBaseTest/Build___JDK_17_and_Scala_2_13___testReplicateFromLatest___2/) [Build / JDK 17 and Scala 2.13 / kafka.api.TransactionsBounceTest.testWithGroupMetadata()](https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-15030/7/testReport/junit/kafka.api/TransactionsBounceTest/Build___JDK_17_and_Scala_2_13___testWithGroupMetadata__/) [Build / JDK 17 and Scala 2.13 / kafka.server.ControllerRegistrationManagerTest.testWrongIncarnationId()](https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-15030/7/testReport/junit/kafka.server/ControllerRegistrationManagerTest/Build___JDK_17_and_Scala_2_13___testWrongIncarnationId__/) [Build / JDK 17 and Scala 2.13 / org.apache.kafka.controller.QuorumControllerTest."testBrokerHeartbeatDuringMigration(MetadataVersion).3.6-IV0"](https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-15030/7/testReport/junit/org.apache.kafka.controller/QuorumControllerTest/Build___JDK_17_and_Scala_2_13testBrokerHeartbeatDuringMigration_MetadataVersion__3_6_IV0_/) [Build / JDK 11 and Scala 2.13 / org.apache.kafka.connect.mirror.integration.IdentityReplicationIntegrationTest.testReplicateFromLatest()](https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-15030/7/testReport/junit/org.apache.kafka.connect.mirror.integration/IdentityReplicationIntegrationTest/Build___JDK_11_and_Scala_2_13___testReplicateFromLatest__/) [Build / JDK 11 and Scala 2.13 / org.apache.kafka.connect.mirror.integration.IdentityReplicationIntegrationTest.testReplicateFromLatest()](https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-15030/7/testReport/junit/org.apache.kafka.connect.mirror.integration/IdentityReplicationIntegrationTest/Build___JDK_11_and_Scala_2_13___testReplicateFromLatest___2/) [Build / JDK 11 and Scala 2.13 / org.apache.kafka.connect.mirror.integration.MirrorConnectorsIntegrationBaseTest.testReplicateFromLatest()](https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-15030/7/testReport/junit/org.apache.kafka.connect.mirror.integration/MirrorConnectorsIntegrationBaseTest/Build___JDK_11_and_Scala_2_13___testReplicateFromLatest__/) [Build / JDK 11 and Scala 2.13 / org.apache.kafka.connect.mirror.integration.MirrorConnectorsIntegrationSSLTest.testReplicateFromLatest()](https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-15030/7/testReport/junit/org.apache.kafka.connect.mirror.integration/MirrorConnectorsIntegrationSSLTest/Build___JDK_11_and_Scala_2_13___testReplicateFromLatest__/) [Build / JDK 11 and Scala 2.13 / kafka.api.TransactionsBounceTest.testWithGroupId()](https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-15030/7/testReport/junit/kafka.api/TransactionsBounceTest/Build___JDK_11_and_Scala_2_13___testWithGroupId__/) [Build / JDK 11 and Scala 2.13 / kafka.api.TransactionsTest.testFailureToFenceEpoch(String).quorum=kraft](https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-15030/7/testReport/junit/kafka.api/TransactionsTest/Build___JDK_11_and_Scala_2_13___testFailureToFenceEpoch_String__quorum_kraft/) [Build / JDK 11 and Scala 2.13 / kafka.api.TransactionsTest.testFailureToFenceEpoch(String).quorum=kraft](https://ci-builds.apache.org/job/Kafka/job/kafka-pr/job/PR-15030/7/testReport/junit/kafka.api/TransactionsTest/Build___JDK_11_and_Scala_2_13___testFailureToFenceEpoch_String__quorum_kraft_2/) [Build / JDK 11 and Scala 2.13 /