[jira] [Commented] (KAFKA-10158) Fix flaky kafka.admin.TopicCommandWithAdminClientTest#testDescribeUnderReplicatedPartitionsWhenReassignmentIsInProgress
[ https://issues.apache.org/jira/browse/KAFKA-10158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17177995#comment-17177995 ]

Rajini Sivaram commented on KAFKA-10158:

[~bbyrne] We can close this now, right?

> Fix flaky kafka.admin.TopicCommandWithAdminClientTest#testDescribeUnderReplicatedPartitionsWhenReassignmentIsInProgress
> ---
>
> Key: KAFKA-10158
> URL: https://issues.apache.org/jira/browse/KAFKA-10158
> Project: Kafka
> Issue Type: Bug
> Components: unit tests
> Reporter: Chia-Ping Tsai
> Assignee: Brian Byrne
> Priority: Minor
> Fix For: 2.7.0
>
>
> Altering the assignments is an async request, so it is possible that the
> reassignment is still in progress when we start to verify the
> "under-replicated-partitions". In order to make it stable, it needs a wait
> for the reassignment completion before verifying the topic command with
> "under-replicated-partitions".

--
This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KAFKA-10158) Fix flaky kafka.admin.TopicCommandWithAdminClientTest#testDescribeUnderReplicatedPartitionsWhenReassignmentIsInProgress
[ https://issues.apache.org/jira/browse/KAFKA-10158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17148200#comment-17148200 ]

Boyang Chen commented on KAFKA-10158:

Failed again: [https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/3175/]

h3. Stacktrace

org.junit.ComparisonFailure: --under-replicated-partitions shouldn't return anything: ' Topic: testDescribeUnderReplicatedPartitionsWhenReassignmentIsInProgress-ryTXst4I8P Partition: 0 Leader: 2 Replicas: 0,2 Isr: 2 ' expected:<[]> but was:<[ Topic: testDescribeUnderReplicatedPartitionsWhenReassignmentIsInProgress-ryTXst4I8P Partition: 0 Leader: 2 Replicas: 0,2 Isr: 2 ]>
  at org.junit.Assert.assertEquals(Assert.java:117)
  at kafka.admin.TopicCommandWithAdminClientTest.testDescribeUnderReplicatedPartitionsWhenReassignmentIsInProgress(TopicCommandWithAdminClientTest.scala:702)
  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
  at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
  at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
  at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
  at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
  at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
  at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61)
  at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
  at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
  at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
  at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
  at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
  at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
  at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
  at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
  at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
  at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
  at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
  at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
  at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
  at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
  at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
  at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
  at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
  at org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
  at org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
  at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36)
  at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
  at org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:33)
  at org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:94)
  at com.sun.proxy.$Proxy2.processTestClass(Unknown Source)
  at org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:119)
  at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
  at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.lang.reflect.Method.invoke(Method.java:498)
  at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36)
  at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
  at org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:182)
  at org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:164)
  at org.gradle.internal.remote.internal.hub.MessageHub$Handler.run(MessageHub.java:414)
  at
[jira] [Commented] (KAFKA-10158) Fix flaky kafka.admin.TopicCommandWithAdminClientTest#testDescribeUnderReplicatedPartitionsWhenReassignmentIsInProgress
[ https://issues.apache.org/jira/browse/KAFKA-10158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17147967#comment-17147967 ]

Boyang Chen commented on KAFKA-10158:

Failed again: [https://builds.apache.org/job/kafka-pr-jdk11-scala2.13/7155/]

org.junit.ComparisonFailure: --under-replicated-partitions shouldn't return anything: ' Topic: testDescribeUnderReplicatedPartitionsWhenReassignmentIsInProgress-onVcRPnNWU Partition: 0 Leader: 1 Replicas: 0,1 Isr: 1 ' expected:<[]> but was:<[ Topic: testDescribeUnderReplicatedPartitionsWhenReassignmentIsInProgress-onVcRPnNWU Partition: 0 Leader: 1 Replicas: 0,1 Isr: 1 ]>
  at org.junit.Assert.assertEquals(Assert.java:117)
  at kafka.admin.TopicCommandWithAdminClientTest.testDescribeUnderReplicatedPartitionsWhenReassignmentIsInProgress(TopicCommandWithAdminClientTest.scala:702)
  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.base/java.lang.reflect.Method.invoke(Method.java:566)
  at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
  at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
  at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
  at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
  at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
  at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
  at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61)
  at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
  at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
  at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
  at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
  at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
  at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
  at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
  at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
  at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
  at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
  at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
  at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
  at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
  at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
  at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
  at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
  at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
  at org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
  at org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
  at jdk.internal.reflect.GeneratedMethodAccessor25.invoke(Unknown Source)
  at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.base/java.lang.reflect.Method.invoke(Method.java:566)
  at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36)
  at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
  at org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:33)
  at org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:94)
  at com.sun.proxy.$Proxy2.processTestClass(Unknown Source)
  at org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:119)
  at jdk.internal.reflect.GeneratedMethodAccessor24.invoke(Unknown Source)
  at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  at java.base/java.lang.reflect.Method.invoke(Method.java:566)
  at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36)
  at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
  at org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:182)
  at org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:164)
[jira] [Commented] (KAFKA-10158) Fix flaky kafka.admin.TopicCommandWithAdminClientTest#testDescribeUnderReplicatedPartitionsWhenReassignmentIsInProgress
[ https://issues.apache.org/jira/browse/KAFKA-10158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17147923#comment-17147923 ]

Lucas Bradstreet commented on KAFKA-10158:

[~chia7712] I'm not quite sure what the right way to fix it is. I think that if we produced messages in multiple batches and set "replica.fetch.max.bytes" low enough, it would ensure that the follower throttled itself prior to joining the ISR.

I think waiting for the reassignment to complete before checking for under-replicated partitions defeats the purpose of the test. I think the test was designed to show that in-progress reassignments do not show up as URPs. I think the test could be improved by checking that a reassignment is still in progress at the end of the test, after the --under-replicated-partitions check is made.

> Fix flaky kafka.admin.TopicCommandWithAdminClientTest#testDescribeUnderReplicatedPartitionsWhenReassignmentIsInProgress
> ---
>
> Key: KAFKA-10158
> URL: https://issues.apache.org/jira/browse/KAFKA-10158
> Project: Kafka
> Issue Type: Bug
> Components: unit tests
> Reporter: Chia-Ping Tsai
> Assignee: Chia-Ping Tsai
> Priority: Minor
> Fix For: 2.7.0
>
>
> Altering the assignments is an async request, so it is possible that the
> reassignment is still in progress when we start to verify the
> "under-replicated-partitions". In order to make it stable, it needs a wait
> for the reassignment completion before verifying the topic command with
> "under-replicated-partitions".
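The multiple-batch suggestion above can be illustrated with a rough model of how many fetch requests a follower needs to catch up. This is only a sketch under stated assumptions: `FetchRounds` and `fetchesNeeded` are hypothetical names, and the model assumes (as I understand the broker documentation for "replica.fetch.max.bytes") that the limit is not absolute, so the first batch of a fetch is returned even when it exceeds the limit. It is not Kafka's fetcher code.

```scala
object FetchRounds {
  /** Counts the fetch requests needed to copy every batch to a follower.
    * Assumed model: each fetch returns whole batches up to maxBytes, but
    * always at least one batch so that progress is possible. */
  def fetchesNeeded(batchSizes: List[Int], maxBytes: Int): Int = {
    @scala.annotation.tailrec
    def loop(remaining: List[Int], fetches: Int): Int = remaining match {
      case Nil => fetches
      case first :: rest =>
        // The first batch is always returned, even if it exceeds maxBytes.
        var used = first
        var tail = rest
        // Further batches are added only while they still fit in the limit.
        while (tail.nonEmpty && used + tail.head <= maxBytes) {
          used += tail.head
          tail = tail.tail
        }
        loop(tail, fetches + 1)
    }
    loop(batchSizes, 0)
  }
}
```

Under this model a single large batch is copied in one fetch regardless of the limit, whereas ten batches of 100 bytes with a 100-byte limit need ten fetches, which would give a throttle a chance to apply before the follower is in sync.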
[jira] [Commented] (KAFKA-10158) Fix flaky kafka.admin.TopicCommandWithAdminClientTest#testDescribeUnderReplicatedPartitionsWhenReassignmentIsInProgress
[ https://issues.apache.org/jira/browse/KAFKA-10158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17147909#comment-17147909 ]

Chia-Ping Tsai commented on KAFKA-10158:

Any suggestions for fixing the throttle? Alternatively, does it make sense to wait for the reassignment to complete before checking the under-replicated partitions?
[jira] [Commented] (KAFKA-10158) Fix flaky kafka.admin.TopicCommandWithAdminClientTest#testDescribeUnderReplicatedPartitionsWhenReassignmentIsInProgress
[ https://issues.apache.org/jira/browse/KAFKA-10158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17147907#comment-17147907 ]

Lucas Bradstreet commented on KAFKA-10158:

For me that check passes and the following check fails:

{code:java}
val underReplicatedOutput = TestUtils.grabConsoleOutput(
  topicService.describeTopic(new TopicCommandOptions(Array("--under-replicated-partitions"))))
assertEquals(s"--under-replicated-partitions shouldn't return anything: '$underReplicatedOutput'", "", underReplicatedOutput)
{code}

When this check fails, the reassignment is usually finishing because the replication throttle didn't quite work the way the test expects it to. The throttle doesn't work because only one fetch request is required to get the follower into sync. That said, depending on the timing, I could see the lines you point to failing too.
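To make the throttle behaviour concrete, here is a minimal sketch of a condition of the shape quoted elsewhere in this thread (`!fetchState.isReplicaInSync && quota.isThrottled(topicPartition) && quota.isQuotaExceeded`). The types `FetchState` and `ReplicaQuota` and the object `ThrottleSketch` are hypothetical stand-ins for illustration, not Kafka classes.

```scala
// Hypothetical stand-ins for the fetcher state and the quota; illustration only.
final case class FetchState(isReplicaInSync: Boolean)
final case class ReplicaQuota(isThrottled: Boolean, isQuotaExceeded: Boolean)

object ThrottleSketch {
  /** A follower throttles itself only while it is out of sync AND the
    * partition is throttled AND the quota has been exceeded. */
  def shouldThrottle(state: FetchState, quota: ReplicaQuota): Boolean =
    !state.isReplicaInSync && quota.isThrottled && quota.isQuotaExceeded
}
```

With this shape, once a single fetch has marked the replica as in sync, the condition is false no matter how low the quota is, which matches the observation that one fetch request is enough to defeat the throttle.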
[jira] [Commented] (KAFKA-10158) Fix flaky kafka.admin.TopicCommandWithAdminClientTest#testDescribeUnderReplicatedPartitionsWhenReassignmentIsInProgress
[ https://issues.apache.org/jira/browse/KAFKA-10158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17147895#comment-17147895 ]

Chia-Ping Tsai commented on KAFKA-10158:

[~lucasbradstreet] I'm not sure whether I have caught your point. It seems to me the issue you described is that the throttle does not work as expected, so the following check fails, right?

{code}
// let's wait until the LAIR is propagated
TestUtils.waitUntilTrue(() => {
  val reassignments = adminClient.listPartitionReassignments(Collections.singleton(tp)).reassignments().get()
  !reassignments.get(tp).addingReplicas().isEmpty
}, "Reassignment didn't add the second node")

// describe the topic and test if it's under-replicated
val simpleDescribeOutput = TestUtils.grabConsoleOutput(
  topicService.describeTopic(new TopicCommandOptions(Array("--topic", testTopicName))))
val simpleDescribeOutputRows = simpleDescribeOutput.split("\n")
assertTrue(simpleDescribeOutputRows(0).startsWith(s"Topic: $testTopicName"))
assertEquals(2, simpleDescribeOutputRows.size)
{code}
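The wait-then-check pattern in the snippet above relies on a polling helper. A minimal sketch of such a helper follows; `Polling.waitUntil` and its default values are hypothetical and are not the implementation of `TestUtils.waitUntilTrue`.

```scala
object Polling {
  /** Re-evaluates `condition` until it holds or `timeoutMs` has elapsed.
    * Returns whether the condition was eventually satisfied. */
  def waitUntil(condition: () => Boolean,
                timeoutMs: Long = 15000L,
                pauseMs: Long = 100L): Boolean = {
    val start = System.nanoTime()
    val timeoutNanos = timeoutMs * 1000000L
    var satisfied = condition()
    // Keep polling while the condition is false and the time budget remains.
    while (!satisfied && System.nanoTime() - start < timeoutNanos) {
      Thread.sleep(pauseMs)
      satisfied = condition()
    }
    satisfied
  }
}
```

Note that a helper like this only establishes that the condition held at some instant. Whatever is asserted afterwards can still race with the cluster changing state, which is the situation discussed in this thread.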
[jira] [Commented] (KAFKA-10158) Fix flaky kafka.admin.TopicCommandWithAdminClientTest#testDescribeUnderReplicatedPartitionsWhenReassignmentIsInProgress
[ https://issues.apache.org/jira/browse/KAFKA-10158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17147861#comment-17147861 ]

Lucas Bradstreet commented on KAFKA-10158:

I think the problem here is subtly different from the one diagnosed. The test is attempting to show that an in-progress reassignment does not show up as under-replicated partitions. The problem is that the reassignment may be completing as the `--under-replicated-partitions` check is taking place, and unfortunately there is some inconsistency in how the command line tool performs this check. It first performs a topic describe and then looks up whether a reassignment is in progress. As these checks are performed at different times, it can end up seeing the topic describe while the reassignment is in progress, and then not see that the reassignment is in progress immediately after.

This inconsistency would not be a problem for this test if there wasn't a second problem. We set a replication throttle

{noformat}
TestUtils.setReplicationThrottleForPartitions(adminClient, brokerIds, Set(tp), throttleBytes = 1)
{noformat}

however, the throttle does not become effective due to the way the check works in the replica fetcher:

{noformat}
!fetchState.isReplicaInSync && quota.isThrottled(topicPartition) && quota.isQuotaExceeded
{noformat}

The first fetch causes the follower to think it's in sync, and the follow-up fetch causes fetchState.isReplicaInSync to return true, which means it does not throttle itself.
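The inconsistency between the describe and the reassignment lookup can be sketched as a two-step read over a state that changes in between. Everything here is a simplified assumption made for illustration: `PartitionView`, `DescribeRace` and the rule "fewer ISR members than replicas, and no reassignment seen" are hypothetical and much coarser than the tool's actual logic.

```scala
// Hypothetical, simplified view of one partition; not Kafka's data model.
final case class PartitionView(replicas: Set[Int], isr: Set[Int], reassigning: Boolean)

object DescribeRace {
  /** Two-step check: replicas and ISR come from the describe result, while
    * the reassignment flag comes from a lookup performed afterwards. */
  def reportedAsUnderReplicated(described: PartitionView,
                                lookedUpLater: PartitionView): Boolean = {
    val missingFromIsr = described.isr.size < described.replicas.size
    // The partition is only excused if the later lookup still sees a reassignment.
    missingFromIsr && !lookedUpLater.reassigning
  }
}
```

If both reads observe the same in-progress state, the partition is not reported. If the reassignment completes between the two reads, the stale describe result is combined with "no reassignment in progress" and the partition is reported as under-replicated, which is consistent with the failures seen in the builds.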