[jira] [Commented] (KAFKA-10158) Fix flaky kafka.admin.TopicCommandWithAdminClientTest#testDescribeUnderReplicatedPartitionsWhenReassignmentIsInProgress

2020-08-14 Thread Rajini Sivaram (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17177995#comment-17177995
 ] 

Rajini Sivaram commented on KAFKA-10158:


[~bbyrne] We can close this now, right?

> Fix flaky kafka.admin.TopicCommandWithAdminClientTest#testDescribeUnderReplicatedPartitionsWhenReassignmentIsInProgress
> ---
>
> Key: KAFKA-10158
> URL: https://issues.apache.org/jira/browse/KAFKA-10158
> Project: Kafka
> Issue Type: Bug
> Components: unit tests
> Reporter: Chia-Ping Tsai
> Assignee: Brian Byrne
> Priority: Minor
> Fix For: 2.7.0
>
>
> Altering the assignments is an async request, so the reassignment may still 
> be in progress when we start to verify "under-replicated-partitions". To make 
> the test stable, it needs to wait for the reassignment to complete before 
> verifying the topic command with "under-replicated-partitions".



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (KAFKA-10158) Fix flaky kafka.admin.TopicCommandWithAdminClientTest#testDescribeUnderReplicatedPartitionsWhenReassignmentIsInProgress

2020-06-29 Thread Boyang Chen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17148200#comment-17148200
 ] 

Boyang Chen commented on KAFKA-10158:
-

Failed again: 

[https://builds.apache.org/job/kafka-pr-jdk8-scala2.12/3175/]
h3. Stacktrace

org.junit.ComparisonFailure: --under-replicated-partitions shouldn't return anything: ' Topic: testDescribeUnderReplicatedPartitionsWhenReassignmentIsInProgress-ryTXst4I8P Partition: 0 Leader: 2 Replicas: 0,2 Isr: 2 ' expected:<[]> but was:<[ Topic: testDescribeUnderReplicatedPartitionsWhenReassignmentIsInProgress-ryTXst4I8P Partition: 0 Leader: 2 Replicas: 0,2 Isr: 2 ]>
 at org.junit.Assert.assertEquals(Assert.java:117)
 at kafka.admin.TopicCommandWithAdminClientTest.testDescribeUnderReplicatedPartitionsWhenReassignmentIsInProgress(TopicCommandWithAdminClientTest.scala:702)
 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498)
 at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
 at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
 at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
 at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
 at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
 at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
 at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61)
 at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
 at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
 at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
 at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
 at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
 at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
 at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
 at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
 at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
 at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
 at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
 at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
 at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
 at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
 at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
 at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
 at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
 at org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
 at org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
 at sun.reflect.GeneratedMethodAccessor8.invoke(Unknown Source)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498)
 at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36)
 at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
 at org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:33)
 at org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:94)
 at com.sun.proxy.$Proxy2.processTestClass(Unknown Source)
 at org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:119)
 at sun.reflect.GeneratedMethodAccessor7.invoke(Unknown Source)
 at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.lang.reflect.Method.invoke(Method.java:498)
 at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36)
 at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
 at org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:182)
 at org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:164)
 at org.gradle.internal.remote.internal.hub.MessageHub$Handler.run(MessageHub.java:414)
 at 

[jira] [Commented] (KAFKA-10158) Fix flaky kafka.admin.TopicCommandWithAdminClientTest#testDescribeUnderReplicatedPartitionsWhenReassignmentIsInProgress

2020-06-29 Thread Boyang Chen (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17147967#comment-17147967
 ] 

Boyang Chen commented on KAFKA-10158:
-

failed again: [https://builds.apache.org/job/kafka-pr-jdk11-scala2.13/7155/]

 

org.junit.ComparisonFailure: --under-replicated-partitions shouldn't return anything: ' Topic: testDescribeUnderReplicatedPartitionsWhenReassignmentIsInProgress-onVcRPnNWU Partition: 0 Leader: 1 Replicas: 0,1 Isr: 1 ' expected:<[]> but was:<[ Topic: testDescribeUnderReplicatedPartitionsWhenReassignmentIsInProgress-onVcRPnNWU Partition: 0 Leader: 1 Replicas: 0,1 Isr: 1 ]>
 at org.junit.Assert.assertEquals(Assert.java:117)
 at kafka.admin.TopicCommandWithAdminClientTest.testDescribeUnderReplicatedPartitionsWhenReassignmentIsInProgress(TopicCommandWithAdminClientTest.scala:702)
 at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
 at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
 at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.base/java.lang.reflect.Method.invoke(Method.java:566)
 at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
 at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
 at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
 at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
 at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
 at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
 at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61)
 at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
 at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
 at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:366)
 at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
 at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
 at org.junit.runners.ParentRunner$4.run(ParentRunner.java:331)
 at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:79)
 at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:329)
 at org.junit.runners.ParentRunner.access$100(ParentRunner.java:66)
 at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:293)
 at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
 at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
 at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:306)
 at org.junit.runners.ParentRunner.run(ParentRunner.java:413)
 at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
 at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
 at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
 at org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
 at org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
 at jdk.internal.reflect.GeneratedMethodAccessor25.invoke(Unknown Source)
 at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.base/java.lang.reflect.Method.invoke(Method.java:566)
 at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36)
 at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
 at org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:33)
 at org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:94)
 at com.sun.proxy.$Proxy2.processTestClass(Unknown Source)
 at org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:119)
 at jdk.internal.reflect.GeneratedMethodAccessor24.invoke(Unknown Source)
 at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
 at java.base/java.lang.reflect.Method.invoke(Method.java:566)
 at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:36)
 at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
 at org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:182)
 at org.gradle.internal.remote.internal.hub.MessageHubBackedObjectConnection$DispatchWrapper.dispatch(MessageHubBackedObjectConnection.java:164)
 

[jira] [Commented] (KAFKA-10158) Fix flaky kafka.admin.TopicCommandWithAdminClientTest#testDescribeUnderReplicatedPartitionsWhenReassignmentIsInProgress

2020-06-29 Thread Lucas Bradstreet (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17147923#comment-17147923
 ] 

Lucas Bradstreet commented on KAFKA-10158:
--

[~chia7712] I'm not quite sure what the right way to fix it is. I think if we 
produced messages in multiple batches and set "replica.fetch.max.bytes" low 
enough, it would ensure that the follower throttled itself prior to joining 
the ISR.
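
To illustrate why a single batch defeats a low fetch limit, here is a rough, self-contained Scala model (not Kafka code). It assumes, as the broker documentation for "replica.fetch.max.bytes" states, that the limit is not absolute: the first batch of a fetch is returned even when it is larger than the limit. The names FetchRoundsSketch and fetchesNeeded are made up for this sketch, and the model ignores per-partition details and the quota itself.

```scala
object FetchRoundsSketch {
  /** Counts how many fetches a follower needs to copy the given batches.
    * Assumption: each fetch always returns its first pending batch, then
    * keeps adding batches only while they still fit in the byte budget.
    */
  def fetchesNeeded(batchSizes: List[Int], maxBytes: Int): Int = {
    // Drop the extra batches that still fit into the remaining budget.
    @annotation.tailrec
    def drain(pending: List[Int], budget: Int): List[Int] = pending match {
      case next :: rest if next <= budget => drain(rest, budget - next)
      case _                              => pending
    }

    @annotation.tailrec
    def loop(pending: List[Int], fetches: Int): Int = pending match {
      case Nil           => fetches
      case first :: rest => loop(drain(rest, maxBytes - first), fetches + 1)
    }

    loop(batchSizes, 0)
  }
}
```

Under these assumptions, one large batch is copied in a single fetch no matter how small the limit is, while ten batches with a limit smaller than one batch need ten fetches, which would give the throttle a chance to apply between them.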

I think waiting for the reassignment to complete before checking for 
under-replicated partitions defeats the purpose of the test. I think the test 
was designed to show that in-progress reassignments would not show up as URPs. 
I think the test could be improved by checking that a reassignment is still in 
progress at the end of the test, after the --under-replicated-partitions check 
is made.
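
A minimal sketch of that last idea, written against plain functions so it runs without a cluster. The helper name checkWhileReassigning and the Either-based result are hypothetical; in the real test the inProgress function would presumably wrap the adminClient.listPartitionReassignments call quoted elsewhere in this thread.

```scala
object ReassignmentGuardSketch {
  /** Runs `check`, then confirms the reassignment is still in progress.
    * If it finished in the meantime, the result is reported as inconclusive
    * instead of being trusted.
    */
  def checkWhileReassigning[A](inProgress: () => Boolean)(check: => A): Either[String, A] = {
    val result = check // e.g. the --under-replicated-partitions output
    if (inProgress()) Right(result)
    else Left("reassignment completed during the check; result is inconclusive")
  }
}
```

The point of checking afterwards rather than before is that a "still in progress" answer obtained after the describe call covers the whole window in which the describe ran.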

> Fix flaky 
> kafka.admin.TopicCommandWithAdminClientTest#testDescribeUnderReplicatedPartitionsWhenReassignmentIsInProgress
> ---
>
> Key: KAFKA-10158
> URL: https://issues.apache.org/jira/browse/KAFKA-10158
> Project: Kafka
>  Issue Type: Bug
>  Components: unit tests
>Reporter: Chia-Ping Tsai
>Assignee: Chia-Ping Tsai
>Priority: Minor
> Fix For: 2.7.0
>
>
> Altering the assignments is a async request so it is possible that the 
> reassignment is still in progress when we start to verify the 
> "under-replicated-partitions". In order to make it stable, it needs a wait 
> for the reassignment completion before verifying the topic command with 
> "under-replicated-partitions".





[jira] [Commented] (KAFKA-10158) Fix flaky kafka.admin.TopicCommandWithAdminClientTest#testDescribeUnderReplicatedPartitionsWhenReassignmentIsInProgress

2020-06-29 Thread Chia-Ping Tsai (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17147909#comment-17147909
 ] 

Chia-Ping Tsai commented on KAFKA-10158:


Any suggestions for fixing the throttle?

Also, does it make sense to wait for the reassignment to complete before 
checking the under-replicated partitions? 



[jira] [Commented] (KAFKA-10158) Fix flaky kafka.admin.TopicCommandWithAdminClientTest#testDescribeUnderReplicatedPartitionsWhenReassignmentIsInProgress

2020-06-29 Thread Lucas Bradstreet (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17147907#comment-17147907
 ] 

Lucas Bradstreet commented on KAFKA-10158:
--

For me that check passes and the following check fails:
{code:java}
val underReplicatedOutput = TestUtils.grabConsoleOutput(
  topicService.describeTopic(new TopicCommandOptions(Array("--under-replicated-partitions"))))
assertEquals(s"--under-replicated-partitions shouldn't return anything: '$underReplicatedOutput'",
  "", underReplicatedOutput)
{code}
 

When this check fails the reassignment is usually finishing because the 
replication throttle didn't quite work the way the test expects it to. The 
throttle doesn't work because only one fetch request is required to get the 
follower into sync.

That said, I think depending on the timing I could see the lines you point to 
failing too.



[jira] [Commented] (KAFKA-10158) Fix flaky kafka.admin.TopicCommandWithAdminClientTest#testDescribeUnderReplicatedPartitionsWhenReassignmentIsInProgress

2020-06-29 Thread Chia-Ping Tsai (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17147895#comment-17147895
 ] 

Chia-Ping Tsai commented on KAFKA-10158:


[~lucasbradstreet] I'm not sure I have caught your point. It seems to me the 
issue you described is that the throttle does not work as expected, so the 
following check fails, right?

{code}
// let's wait until the LAIR is propagated
TestUtils.waitUntilTrue(() => {
  val reassignments =
    adminClient.listPartitionReassignments(Collections.singleton(tp)).reassignments().get()
  !reassignments.get(tp).addingReplicas().isEmpty
}, "Reassignment didn't add the second node")

// describe the topic and test if it's under-replicated
val simpleDescribeOutput = TestUtils.grabConsoleOutput(
  topicService.describeTopic(new TopicCommandOptions(Array("--topic", testTopicName))))
val simpleDescribeOutputRows = simpleDescribeOutput.split("\n")
assertTrue(simpleDescribeOutputRows(0).startsWith(s"Topic: $testTopicName"))
assertEquals(2, simpleDescribeOutputRows.size)
{code}
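
For readers unfamiliar with the polling pattern used in the snippet above, here is a small stand-alone sketch of a wait-until helper. It is not Kafka's TestUtils.waitUntilTrue; the name pollUntil, the default timings, and the choice of TimeoutException are assumptions made for illustration.

```scala
import java.util.concurrent.TimeoutException

object WaitSketch {
  /** Re-evaluates `condition` until it returns true or the timeout expires.
    * Throws TimeoutException with `msg` if the condition never holds.
    */
  def pollUntil(condition: () => Boolean,
                msg: => String,
                waitTimeMs: Long = 15000L,
                pauseMs: Long = 100L): Unit = {
    val deadline = System.currentTimeMillis() + waitTimeMs
    while (!condition()) {
      if (System.currentTimeMillis() > deadline) throw new TimeoutException(msg)
      Thread.sleep(pauseMs) // back off before polling again
    }
  }
}
```

A helper like this only guarantees that the condition held at some instant. As discussed in this thread, the state may already have changed again by the time the next assertion runs.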



[jira] [Commented] (KAFKA-10158) Fix flaky kafka.admin.TopicCommandWithAdminClientTest#testDescribeUnderReplicatedPartitionsWhenReassignmentIsInProgress

2020-06-29 Thread Lucas Bradstreet (Jira)


[ 
https://issues.apache.org/jira/browse/KAFKA-10158?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17147861#comment-17147861
 ] 

Lucas Bradstreet commented on KAFKA-10158:
--

I think the problem here is subtly different from the one diagnosed. The test 
is attempting to show that an in-progress reassignment does not show up as 
under-replicated partitions. The problem is that the reassignment may be 
completing as the `--under-replicated-partitions` check is taking place, and 
unfortunately there is some inconsistency in how the command line tool performs 
this check. It first performs a topic describe and then looks up whether a 
reassignment is in progress. As these checks are performed at different times, 
it can end up seeing the topic describe while the reassignment is in progress, 
and then not see that the reassignment is in progress immediately after.
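
The effect of those two non-atomic reads can be sketched with a toy model. This is not the tool's actual code: the types Described and Reassignment are hypothetical, and the rule that replicas being added are discounted from the expected replica count while a reassignment is visible is an assumption about how the tool treats in-progress reassignments.

```scala
object UrpRaceSketch {
  // Hypothetical, simplified snapshots; not Kafka's real classes.
  final case class Described(replicas: Seq[Int], isr: Seq[Int])
  final case class Reassignment(addingReplicas: Seq[Int])

  /** Decides whether a partition would be listed as under-replicated, given
    * a describe snapshot and a reassignment lookup taken at a later time.
    */
  def isReportedUnderReplicated(described: Described,
                                reassignment: Option[Reassignment]): Boolean = {
    val expectedReplicas = reassignment match {
      // Assumed rule: replicas still being added do not count yet.
      case Some(r) => described.replicas.diff(r.addingReplicas).size
      case None    => described.replicas.size
    }
    described.isr.size < expectedReplicas
  }
}
```

With a describe snapshot taken mid-reassignment (two replicas, one in the ISR), the partition is not reported when the lookup still sees the reassignment, but it is reported when the reassignment has completed between the two reads. That matches the shape of the failure output posted in this thread, where the replica list has two entries and the ISR has one.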

This inconsistency would not be a problem for this test if there wasn't a 
second problem. We set a replication throttle:
{noformat}
TestUtils.setReplicationThrottleForPartitions(adminClient, brokerIds, Set(tp), throttleBytes = 1)
{noformat}
however the throttle does not become effective due to the way the check works 
in the replica fetcher:
{noformat}
!fetchState.isReplicaInSync && quota.isThrottled(topicPartition) && 
quota.isQuotaExceeded
{noformat}
The first fetch causes the follower to think it's in sync, and the follow-up 
fetch causes fetchState.isReplicaInSync to return true, which means it does not 
throttle itself.
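
As a stand-alone illustration of the predicate quoted above, the following sketch uses made-up FetchState and Quota case classes in place of Kafka's real types. Only the boolean expression is taken from this comment; everything else is assumed.

```scala
object ThrottleSketch {
  // Simplified stand-ins for illustration; not Kafka's real classes.
  final case class FetchState(isReplicaInSync: Boolean)
  final case class Quota(isThrottled: Boolean, isQuotaExceeded: Boolean)

  /** Same shape as the quoted check: only an out-of-sync follower whose
    * partition is throttled and whose quota is exceeded holds back.
    */
  def shouldFollowerThrottle(state: FetchState, quota: Quota): Boolean =
    !state.isReplicaInSync && quota.isThrottled && quota.isQuotaExceeded
}
```

Once the follower counts as in sync, the first term is false and the result is false regardless of the quota, which is why a throttle of 1 byte does not keep the reassignment in progress here.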
