[jira] [Commented] (KAFKA-9263) Reocurrence: Transient failure in kafka.api.PlaintextAdminIntegrationTest.testLogStartOffsetCheckpoint and kafka.api.PlaintextAdminIntegrationTest.testAlterReplicaLogDi
[ https://issues.apache.org/jira/browse/KAFKA-9263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17242016#comment-17242016 ]

Chia-Ping Tsai commented on KAFKA-9263:
---

PlaintextAdminIntegrationTest.testLogStartOffsetCheckpoint has not failed recently; I looped it 200 times and every run passed. https://github.com/apache/kafka/pull/9423, which fixes kafka.api.PlaintextAdminIntegrationTest.testAlterReplicaLogDirs, is going to be merged, so I will revise the title of this issue (i.e. remove PlaintextAdminIntegrationTest.testLogStartOffsetCheckpoint).

> Reocurrence: Transient failure in kafka.api.PlaintextAdminIntegrationTest.testLogStartOffsetCheckpoint and kafka.api.PlaintextAdminIntegrationTest.testAlterReplicaLogDirs
> --
>
>                 Key: KAFKA-9263
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9263
>             Project: Kafka
>          Issue Type: Bug
>          Components: clients
>    Affects Versions: 2.4.0
>            Reporter: John Roesler
>            Priority: Major
>              Labels: flaky-test
>
> This test has failed for me on https://builds.apache.org/job/kafka-pr-jdk11-scala2.12/9691/testReport/junit/kafka.api/AdminClientIntegrationTest/testAlterReplicaLogDirs/
> {noformat}
> Error Message
> org.scalatest.exceptions.TestFailedException: only 0 messages are produced within timeout after replica movement. Producer future Some(Failure(java.util.concurrent.TimeoutException: Timeout after waiting for 1 ms.))
> Stacktrace
> org.scalatest.exceptions.TestFailedException: only 0 messages are produced within timeout after replica movement. Producer future Some(Failure(java.util.concurrent.TimeoutException: Timeout after waiting for 1 ms.))
>     at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:530)
>     at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:529)
>     at org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1389)
>     at org.scalatest.Assertions.fail(Assertions.scala:1091)
>     at org.scalatest.Assertions.fail$(Assertions.scala:1087)
>     at org.scalatest.Assertions$.fail(Assertions.scala:1389)
>     at kafka.utils.TestUtils$.waitUntilTrue(TestUtils.scala:842)
>     at kafka.api.AdminClientIntegrationTest.testAlterReplicaLogDirs(AdminClientIntegrationTest.scala:459)
>     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>     at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>     at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>     at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>     at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>     at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>     at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>     at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:288)
>     at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:282)
>     at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>     at java.base/java.lang.Thread.run(Thread.java:834)
> Standard Output
> [2019-12-03 04:54:16,111] ERROR [ReplicaFetcher replicaId=2, leaderId=1, fetcherId=0] Error for partition unclean-test-topic-1-0 at offset 0 (kafka.server.ReplicaFetcherThread:76)
> org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition.
> [2019-12-03 04:54:21,711] ERROR [ReplicaFetcher replicaId=0, leaderId=1, fetcherId=0] Error for partition topic-0 at offset 0 (kafka.server.ReplicaFetcherThread:76)
> org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition.
> [2019-12-03 04:54:21,712] ERROR [ReplicaFetcher replicaId=2, leaderId=1, fetcherId=0] Error for partition topic-0 at offset 0 (kafka.server.ReplicaFetcherThread:76)
> org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition.
> [2019-12-03 04:54:27,092] ERROR [ReplicaFetcher replicaId=2, leaderId=1, fetcherId=0] Error for partition unclean-test-topic-1-0 at
[jira] [Commented] (KAFKA-9263) Reocurrence: Transient failure in kafka.api.PlaintextAdminIntegrationTest.testLogStartOffsetCheckpoint and kafka.api.PlaintextAdminIntegrationTest.testAlterReplicaLogDi
[ https://issues.apache.org/jira/browse/KAFKA-9263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17178021#comment-17178021 ]

Bill Bejeck commented on KAFKA-9263:

Test failure [https://builds.apache.org/job/kafka-pr-jdk11-scala2.13/7843/testReport/junit/kafka.api/PlaintextAdminIntegrationTest/testAlterReplicaLogDirs/]

{noformat}
[2020-08-14 17:34:06,420] ERROR [ReplicaManager broker=0] Error while changing replica dir for partition topic-0 (kafka.server.ReplicaManager:76)
org.apache.kafka.common.errors.UnknownTopicOrPartitionException: Error while fetching partition state for topic-0
[2020-08-14 17:34:06,420] ERROR [ReplicaManager broker=1] Error while changing replica dir for partition topic-0 (kafka.server.ReplicaManager:76)
org.apache.kafka.common.errors.UnknownTopicOrPartitionException: Error while fetching partition state for topic-0
[2020-08-14 17:34:06,420] ERROR [ReplicaManager broker=2] Error while changing replica dir for partition topic-0 (kafka.server.ReplicaManager:76)
org.apache.kafka.common.errors.UnknownTopicOrPartitionException: Error while fetching partition state for topic-0
[2020-08-14 17:36:24,822] ERROR [Consumer instanceId=test_instance_id_1, clientId=test_client_id, groupId=test_group_id] Offset commit failed on partition test_topic-1 at offset 0: The coordinator is not aware of this member. (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator:1191)
[2020-08-14 17:36:24,823] ERROR Thread Thread[Thread-4,5,FailOnTimeoutGroup] died (org.apache.zookeeper.server.NIOServerCnxnFactory:92)
org.apache.kafka.clients.consumer.CommitFailedException: Commit cannot be completed since the group has already rebalanced and assigned the partitions to another member. This means that the time between subsequent calls to poll() was longer than the configured max.poll.interval.ms, which typically implies that the poll loop is spending too much time message processing. You can address this either by increasing max.poll.interval.ms or by reducing the maximum size of batches returned in poll() with max.poll.records.
    at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator$OffsetCommitResponseHandler.handle(ConsumerCoordinator.java:1257)
    at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator$OffsetCommitResponseHandler.handle(ConsumerCoordinator.java:1164)
    at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:1132)
    at org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:1107)
    at org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:206)
    at org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:169)
    at org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:129)
    at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.fireCompletion(ConsumerNetworkClient.java:602)
    at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.firePendingCompletedRequests(ConsumerNetworkClient.java:412)
    at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:297)
    at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:236)
    at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:215)
    at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:1006)
    at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1394)
    at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1348)
    at kafka.api.PlaintextAdminIntegrationTest$$anon$1.run(PlaintextAdminIntegrationTest.scala:1071)
[2020-08-14 17:36:24,848] ERROR [Consumer instanceId=test_instance_id_2, clientId=test_client_id, groupId=test_group_id] Offset commit failed on partition test_topic1-0 at offset 0: Specified group generation id is not valid. (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator:1191)
[2020-08-14 17:36:24,852] ERROR [Consumer clientId=test_client_id, groupId=test_group_id] Offset commit failed on partition test_topic2-0 at offset 0: Specified group generation id is not valid. (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator:1191)
[2020-08-14 18:29:44,855] ERROR [ReplicaManager broker=1] Error while changing replica dir for partition topic-0 (kafka.server.ReplicaManager:76)
org.apache.kafka.common.errors.UnknownTopicOrPartitionException: Error while fetching partition state for topic-0
[2020-08-14 18:29:44,855] ERROR [ReplicaManager broker=0]
[jira] [Commented] (KAFKA-9263) Reocurrence: Transient failure in kafka.api.PlaintextAdminIntegrationTest.testLogStartOffsetCheckpoint and kafka.api.PlaintextAdminIntegrationTest.testAlterReplicaLogDi
[ https://issues.apache.org/jira/browse/KAFKA-9263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17028762#comment-17028762 ]

Chia-Ping Tsai commented on KAFKA-9263:
---

Updated the title, since KAFKA-9183 renamed AdminClientIntegrationTest to PlaintextAdminIntegrationTest.

> Reocurrence: Transient failure in kafka.api.PlaintextAdminIntegrationTest.testLogStartOffsetCheckpoint and kafka.api.PlaintextAdminIntegrationTest.testAlterReplicaLogDirs
> --
>
>                 Key: KAFKA-9263
>                 URL: https://issues.apache.org/jira/browse/KAFKA-9263
>             Project: Kafka
>          Issue Type: Bug
>          Components: clients
>    Affects Versions: 2.4.0
>            Reporter: John Roesler
>            Priority: Major
>              Labels: flaky-test
>
> This test has failed for me on https://builds.apache.org/job/kafka-pr-jdk11-scala2.12/9691/testReport/junit/kafka.api/AdminClientIntegrationTest/testAlterReplicaLogDirs/
> {noformat}
> Error Message
> org.scalatest.exceptions.TestFailedException: only 0 messages are produced within timeout after replica movement. Producer future Some(Failure(java.util.concurrent.TimeoutException: Timeout after waiting for 1 ms.))
> Stacktrace
> org.scalatest.exceptions.TestFailedException: only 0 messages are produced within timeout after replica movement. Producer future Some(Failure(java.util.concurrent.TimeoutException: Timeout after waiting for 1 ms.))
>     at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:530)
>     at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:529)
>     at org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1389)
>     at org.scalatest.Assertions.fail(Assertions.scala:1091)
>     at org.scalatest.Assertions.fail$(Assertions.scala:1087)
>     at org.scalatest.Assertions$.fail(Assertions.scala:1389)
>     at kafka.utils.TestUtils$.waitUntilTrue(TestUtils.scala:842)
>     at kafka.api.AdminClientIntegrationTest.testAlterReplicaLogDirs(AdminClientIntegrationTest.scala:459)
>     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>     at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>     at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>     at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>     at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>     at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>     at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>     at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:288)
>     at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:282)
>     at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>     at java.base/java.lang.Thread.run(Thread.java:834)
> Standard Output
> [2019-12-03 04:54:16,111] ERROR [ReplicaFetcher replicaId=2, leaderId=1, fetcherId=0] Error for partition unclean-test-topic-1-0 at offset 0 (kafka.server.ReplicaFetcherThread:76)
> org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition.
> [2019-12-03 04:54:21,711] ERROR [ReplicaFetcher replicaId=0, leaderId=1, fetcherId=0] Error for partition topic-0 at offset 0 (kafka.server.ReplicaFetcherThread:76)
> org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition.
> [2019-12-03 04:54:21,712] ERROR [ReplicaFetcher replicaId=2, leaderId=1, fetcherId=0] Error for partition topic-0 at offset 0 (kafka.server.ReplicaFetcherThread:76)
> org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition.
> [2019-12-03 04:54:27,092] ERROR [ReplicaFetcher replicaId=2, leaderId=1, fetcherId=0] Error for partition unclean-test-topic-1-0 at offset 0 (kafka.server.ReplicaFetcherThread:76)
> org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server does not host this topic-partition.
> [2019-12-03 04:54:27,091] ERROR [ReplicaFetcher replicaId=0, leaderId=1, fetcherId=0] Error for