[jira] [Commented] (KAFKA-8465) Make sure that the copy of the same topic is evenly distributed across a broker's disk.
[ https://issues.apache.org/jira/browse/KAFKA-8465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16897788#comment-16897788 ] ASF GitHub Bot commented on KAFKA-8465: --- lordcheng10 commented on pull request #6866: [KAFKA-8465] Replication strategy for the topic dimension URL: https://github.com/apache/kafka/pull/6866 When some partition's replicas are assigned to a broker, which disks on that broker should the copies be placed on? The original strategy allocates by total partition count per disk, which can concentrate many partitions of a single topic on one disk and cause disk hotspot problems. To solve this, we propose an improved strategy: first balance the number of partitions of each topic across disks. If two disks hold the same number of partitions for the topic, sort by the disks' total partition counts and place the current replica on the disk with the fewest partitions. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
> Make sure that the copy of the same topic is evenly distributed across a broker's disk.
> ---
>
> Key: KAFKA-8465
> URL: https://issues.apache.org/jira/browse/KAFKA-8465
> Project: Kafka
> Issue Type: Improvement
> Components: core
> Affects Versions: 2.0.1, 2.2.0, 2.2.1
> Reporter: ChenLin
> Priority: Major
> Attachments: image-2019-07-30-13-40-12-711.png, image-2019-07-30-13-40-49-878.png, replication_strategy_for_the_topic_dimension.patch
>
> When some partition's replicas are assigned to a broker, which disks on that broker should the copies be placed on? The original strategy allocates by total partition count per disk, which results in uneven disk allocation in the topic dimension.
> To solve this, we propose an improved strategy: first balance the number of partitions of each topic across disks. If two disks hold the same number of partitions for the topic, sort by the disks' total partition counts and place the current replica on the disk with the fewest partitions.
> !image-2019-07-30-13-40-12-711.png!
> !image-2019-07-30-13-40-49-878.png!
-- This message was sent by Atlassian JIRA (v7.6.14#76016)
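The two-level selection rule proposed in the ticket (compare per-topic partition counts first, then break ties by total partition count) can be sketched as a comparator. This is an illustrative stand-in, not the actual patch in PR #6866; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TopicAwareDiskSelector {
    /**
     * Picks the log directory for a new replica (hypothetical helper).
     * topicCounts.get(disk) = partitions of the SAME topic already on that disk;
     * totalCounts.get(disk) = all partitions on that disk (tie-breaker).
     */
    public static String selectDisk(List<String> disks,
                                    Map<String, Integer> topicCounts,
                                    Map<String, Integer> totalCounts) {
        return disks.stream()
            .min(Comparator
                // primary key: fewest partitions of this topic on the disk
                .comparingInt((String d) -> topicCounts.getOrDefault(d, 0))
                // tie-breaker: fewest partitions overall on the disk
                .thenComparingInt(d -> totalCounts.getOrDefault(d, 0)))
            .orElseThrow(() -> new IllegalArgumentException("no disks configured"));
    }

    public static void main(String[] args) {
        List<String> disks = Arrays.asList("/data1", "/data2", "/data3");
        // /data1 and /data2 each hold 1 partition of the topic; /data3 holds 2.
        Map<String, Integer> topicCounts = new HashMap<>();
        topicCounts.put("/data1", 1);
        topicCounts.put("/data2", 1);
        topicCounts.put("/data3", 2);
        // The /data1-vs-/data2 tie is broken by total partition count.
        Map<String, Integer> totalCounts = new HashMap<>();
        totalCounts.put("/data1", 40);
        totalCounts.put("/data2", 25);
        totalCounts.put("/data3", 30);
        System.out.println(selectDisk(disks, topicCounts, totalCounts)); // /data2
    }
}
```

Note how this differs from the original strategy, which would look only at the total counts and could keep stacking one topic's partitions on whichever disk is globally emptiest.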
[jira] [Commented] (KAFKA-8729) Augment ProduceResponse error messaging for specific culprit records
[ https://issues.apache.org/jira/browse/KAFKA-8729?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16897561#comment-16897561 ] ASF GitHub Bot commented on KAFKA-8729: --- tuvtran commented on pull request #7142: KAFKA-8729, pt 1: Add 4 new metrics to keep track of various types of invalid record rejections URL: https://github.com/apache/kafka/pull/7142 Right now we only have the very generic `FailedProduceRequestsPerSec` and `FailedFetchRequestsPerSec` metrics, which mark whenever a record fails on the broker side. To improve the debugging UX, I added 4 new metrics in `BrokerTopicStats` to log the scenarios in which an `InvalidRecordException` is thrown when `LogValidator` fails to validate a record:
-- `NoKeyCompactedTopicRecordsPerSec`: counter of failures caused by compacted records with no key
-- `InvalidMagicNumberRecordsPerSec`: counter of failures caused by records with an invalid magic number
-- `InvalidMessageCrcRecordsPerSec`: counter of failures caused by records with CRC corruption
-- `NonIncreasingOffsetRecordsPerSec`: counter of failures caused by records with an invalid offset
### Committer Checklist (excluded from commit message) - [ ] Verify design and implementation - [ ] Verify test coverage and CI build status - [ ] Verify documentation (including upgrade notes) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
> Augment ProduceResponse error messaging for specific culprit records
> ---
>
> Key: KAFKA-8729
> URL: https://issues.apache.org/jira/browse/KAFKA-8729
> Project: Kafka
> Issue Type: Improvement
> Components: clients, core
> Reporter: Guozhang Wang
> Assignee: Tu Tran
> Priority: Major
>
> 1. We should replace the misleading CORRUPT_RECORD error code with a new INVALID_RECORD.
> 2. We should augment the ProduceResponse with a customizable error message and indicators of the culprit records.
> 3. We should change the client-side handling logic of non-retriable INVALID_RECORD to re-batch the records.
> Details: https://cwiki.apache.org/confluence/display/KAFKA/KIP-467%3A+Augment+ProduceResponse+error+messaging+for+specific+culprit+records
-- This message was sent by Atlassian JIRA (v7.6.14#76016)
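The per-scenario counters described in the PR above can be sketched with plain JDK primitives. This is a simplified illustration of the bookkeeping idea only, not Kafka's actual `BrokerTopicStats` (which uses rate meters, not raw counters); all names here are hypothetical.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

public class InvalidRecordMetricsSketch {
    // One counter per rejection scenario, mirroring the four meters listed above.
    public enum Reason {
        NO_KEY_COMPACTED_TOPIC,   // compacted record with no key
        INVALID_MAGIC_NUMBER,     // record with an invalid magic number
        INVALID_MESSAGE_CRC,      // record with CRC corruption
        NON_INCREASING_OFFSET     // record with an invalid (non-increasing) offset
    }

    private final ConcurrentHashMap<Reason, LongAdder> counters = new ConcurrentHashMap<>();

    /** Marked wherever record validation rejects a record; LongAdder keeps
     *  increments cheap under the broker's concurrent produce path. */
    public void mark(Reason reason) {
        counters.computeIfAbsent(reason, r -> new LongAdder()).increment();
    }

    public long count(Reason reason) {
        LongAdder adder = counters.get(reason);
        return adder == null ? 0L : adder.sum();
    }
}
```

The point of splitting the single generic failure metric into four is visible in the enum: each validation failure mode gets its own counter, so a spike can be attributed to a cause without reading broker logs.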
[jira] [Comment Edited] (KAFKA-8731) InMemorySessionStore throws NullPointerException on startup
[ https://issues.apache.org/jira/browse/KAFKA-8731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16897546#comment-16897546 ] Bill Bejeck edited comment on KAFKA-8731 at 7/31/19 9:33 PM: - merged to trunk and cherry-picked to 2.3 was (Author: bbejeck): cherry-picked to 2.3
> InMemorySessionStore throws NullPointerException on startup
> ---
>
> Key: KAFKA-8731
> URL: https://issues.apache.org/jira/browse/KAFKA-8731
> Project: Kafka
> Issue Type: Bug
> Components: streams
> Affects Versions: 2.3.0
> Reporter: Jonathan Gordon
> Assignee: Sophie Blee-Goldman
> Priority: Blocker
> Fix For: 2.4.0, 2.3.1
>
> I receive a NullPointerException on startup when trying to use the new InMemorySessionStore via Stores.inMemorySessionStore(...) using the DSL.
> Here's the stack trace:
> {noformat}
> ERROR [2019-07-29 21:56:52,246] org.apache.kafka.streams.processor.internals.StreamThread: stream-thread [trace_indexer-c8439020-12af-4db2-ad56-3e58cd56540f-StreamThread-1] Encountered the following error during processing:
> java.lang.NullPointerException: null
>     at org.apache.kafka.streams.state.internals.InMemorySessionStore.remove(InMemorySessionStore.java:123)
>     at org.apache.kafka.streams.state.internals.InMemorySessionStore.put(InMemorySessionStore.java:115)
>     at org.apache.kafka.streams.state.internals.InMemorySessionStore.lambda$init$0(InMemorySessionStore.java:93)
>     at org.apache.kafka.streams.processor.internals.StateRestoreCallbackAdapter.lambda$adapt$1(StateRestoreCallbackAdapter.java:47)
>     at org.apache.kafka.streams.processor.internals.CompositeRestoreListener.restoreBatch(CompositeRestoreListener.java:89)
>     at org.apache.kafka.streams.processor.internals.StateRestorer.restore(StateRestorer.java:92)
>     at org.apache.kafka.streams.processor.internals.StoreChangelogReader.processNext(StoreChangelogReader.java:317)
>     at org.apache.kafka.streams.processor.internals.StoreChangelogReader.restore(StoreChangelogReader.java:92)
>     at org.apache.kafka.streams.processor.internals.TaskManager.updateNewAndRestoringTasks(TaskManager.java:328)
>     at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:867)
>     at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:805)
>     at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:774)
> {noformat}
>
> Here's the Slack thread: [https://confluentcommunity.slack.com/archives/C48AHTCUQ/p1564438647169600]
>
> Here's a current PR aimed at fixing the issue: [https://github.com/apache/kafka/pull/7132]
-- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Updated] (KAFKA-8729) Augment ProduceResponse error messaging for specific culprit records
[ https://issues.apache.org/jira/browse/KAFKA-8729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guozhang Wang updated KAFKA-8729: - Issue Type: Improvement (was: Bug)
-- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (KAFKA-8731) InMemorySessionStore throws NullPointerException on startup
[ https://issues.apache.org/jira/browse/KAFKA-8731?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16897533#comment-16897533 ] ASF GitHub Bot commented on KAFKA-8731: --- bbejeck commented on pull request #7132: KAFKA-8731: InMemorySessionStore throws NullPointerException on startup URL: https://github.com/apache/kafka/pull/7132 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
-- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Resolved] (KAFKA-8704) Add PartitionAssignor adapter for backwards compatibility
[ https://issues.apache.org/jira/browse/KAFKA-8704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guozhang Wang resolved KAFKA-8704. -- Resolution: Fixed Fix Version/s: 2.4.0
> Add PartitionAssignor adapter for backwards compatibility
> ---
>
> Key: KAFKA-8704
> URL: https://issues.apache.org/jira/browse/KAFKA-8704
> Project: Kafka
> Issue Type: Sub-task
> Components: clients
> Reporter: Sophie Blee-Goldman
> Assignee: Sophie Blee-Goldman
> Priority: Major
> Fix For: 2.4.0
>
> As part of KIP-429, we are deprecating the old consumer.internal.PartitionAssignor in favor of a new [consumer.PartitionAssignor|https://issues.apache.org/jira/browse/KAFKA-8703] interface that is part of the public API.
>
> Although the old PartitionAssignor was technically part of the internal package, some users may have implemented it, and this change would break source compatibility for them since they would need to modify their class to implement the new interface. The number of users affected may be small, but we would nonetheless like to add an adapter to maintain compatibility for these users.
-- This message was sent by Atlassian JIRA (v7.6.14#76016)
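The compatibility shim this ticket describes is a classic adapter: implement the new public interface by delegating to a user's legacy implementation. A minimal sketch of the pattern, using simplified stand-in interfaces rather than the real `consumer.internal.PartitionAssignor` and `consumer.PartitionAssignor` types:

```java
import java.util.List;
import java.util.Map;

public class AssignorAdapterSketch {
    // Simplified stand-in for the deprecated internal interface.
    interface LegacyAssignor {
        Map<String, List<Integer>> assign(Map<String, Integer> subscriptions);
        String name();
    }

    // Simplified stand-in for the new public interface.
    interface PublicAssignor {
        Map<String, List<Integer>> assign(Map<String, Integer> subscriptions);
        String name();
    }

    /** Adapter: lets any LegacyAssignor be used wherever a PublicAssignor
     *  is expected, so existing user classes keep working unmodified. */
    static class Adapter implements PublicAssignor {
        private final LegacyAssignor delegate;

        Adapter(LegacyAssignor delegate) {
            this.delegate = delegate;
        }

        @Override
        public Map<String, List<Integer>> assign(Map<String, Integer> subscriptions) {
            // In a real adapter this is where argument/result types would be
            // translated if the two interfaces' shapes differed.
            return delegate.assign(subscriptions);
        }

        @Override
        public String name() {
            return delegate.name();
        }
    }
}
```

The design choice here is that the adapter, not the user, absorbs the interface change: the consumer can instantiate an `Adapter` around any legacy implementation it discovers, preserving source compatibility at the cost of one extra delegation layer.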
[jira] [Commented] (KAFKA-8179) Incremental Rebalance Protocol for Kafka Consumer
[ https://issues.apache.org/jira/browse/KAFKA-8179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16897521#comment-16897521 ] ASF GitHub Bot commented on KAFKA-8179: --- guozhangwang commented on pull request #7110: KAFKA-8179: PartitionAssignorAdapter URL: https://github.com/apache/kafka/pull/7110 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org
> Incremental Rebalance Protocol for Kafka Consumer
> ---
>
> Key: KAFKA-8179
> URL: https://issues.apache.org/jira/browse/KAFKA-8179
> Project: Kafka
> Issue Type: Improvement
> Components: consumer
> Reporter: Guozhang Wang
> Assignee: Guozhang Wang
> Priority: Major
>
> Recently the Kafka community has been promoting cooperative rebalancing to mitigate the pain points of the stop-the-world rebalancing protocol. This ticket initiates that idea in the Kafka consumer client, which will benefit heavy-stateful consumers such as Kafka Streams applications.
> In short, the scope of this ticket is to reduce unnecessary rebalance latency due to heavy partition migration, i.e. partitions being revoked and re-assigned. This would make the built-in consumer assignors (range, round-robin, etc.) aware of previously assigned partitions and sticky on a best-effort basis.
-- This message was sent by Atlassian JIRA (v7.6.14#76016)
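The "sticky in best-effort" behavior the ticket aims for can be illustrated with a toy assignor: a partition stays with its previous owner whenever that owner is still in the group and under capacity, and only unowned or overflow partitions are moved. This is a deliberately naive sketch of the stickiness idea, not Kafka's actual assignor or the cooperative protocol itself; all names and the simple capacity rule are illustrative.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class StickyAssignorSketch {
    /**
     * Assigns partitions [0, numPartitions) to consumers, preferring each
     * partition's previous owner. Fewer partitions change hands, so fewer
     * revocations (and less stop-the-world pain) are needed on rebalance.
     */
    public static Map<String, List<Integer>> assign(List<String> consumers,
                                                    int numPartitions,
                                                    Map<Integer, String> previousOwner) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String c : consumers) assignment.put(c, new ArrayList<>());

        // Simple balance bound: no consumer takes more than ceil(p / n).
        int capacity = (numPartitions + consumers.size() - 1) / consumers.size();

        List<Integer> toPlace = new ArrayList<>();
        for (int p = 0; p < numPartitions; p++) {
            String prev = previousOwner.get(p);
            if (prev != null && assignment.containsKey(prev)
                    && assignment.get(prev).size() < capacity) {
                assignment.get(prev).add(p);   // sticky: stays put, no revocation
            } else {
                toPlace.add(p);                // previous owner gone, or overloaded
            }
        }

        // Round-robin the remaining partitions, skipping consumers at capacity.
        int i = 0;
        for (int p : toPlace) {
            String target = consumers.get(i++ % consumers.size());
            while (assignment.get(target).size() >= capacity) {
                target = consumers.get(i++ % consumers.size());
            }
            assignment.get(target).add(p);
        }
        return assignment;
    }
}
```

Contrast this with a plain round-robin assignor, which recomputes ownership from scratch every rebalance and can shuffle nearly every partition even when the group membership barely changed.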
[jira] [Commented] (KAFKA-8677) Flakey test GroupEndToEndAuthorizationTest#testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl
[ https://issues.apache.org/jira/browse/KAFKA-8677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16897497#comment-16897497 ] Bruno Cadonna commented on KAFKA-8677: -- https://builds.apache.org/job/kafka-pr-jdk11-scala2.12/6624/testReport/junit/kafka.api/SaslGssapiSslEndToEndAuthorizationTest/testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl/
> Flakey test GroupEndToEndAuthorizationTest#testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl
> ---
>
> Key: KAFKA-8677
> URL: https://issues.apache.org/jira/browse/KAFKA-8677
> Project: Kafka
> Issue Type: Bug
> Components: core, security, unit tests
> Affects Versions: 2.4.0
> Reporter: Boyang Chen
> Priority: Major
> Labels: flaky-test
>
> [https://builds.apache.org/job/kafka-pr-jdk11-scala2.12/6325/console]
>
> 18:43:39 kafka.api.GroupEndToEndAuthorizationTest > testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl STARTED
> 18:44:00 kafka.api.GroupEndToEndAuthorizationTest.testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl failed, log available in /home/jenkins/jenkins-slave/workspace/kafka-pr-jdk11-scala2.12/core/build/reports/testOutput/kafka.api.GroupEndToEndAuthorizationTest.testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl.test.stdout
> 18:44:00 kafka.api.GroupEndToEndAuthorizationTest > testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl FAILED
> 18:44:00 org.scalatest.exceptions.TestFailedException: Consumed 0 records before timeout instead of the expected 1 records
-- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (KAFKA-8555) Flaky test ExampleConnectIntegrationTest#testSourceConnector
[ https://issues.apache.org/jira/browse/KAFKA-8555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16897500#comment-16897500 ] Bruno Cadonna commented on KAFKA-8555: -- https://builds.apache.org/job/kafka-pr-jdk11-scala2.13/672/testReport/junit/org.apache.kafka.connect.integration/ExampleConnectIntegrationTest/testSourceConnector/
> Flaky test ExampleConnectIntegrationTest#testSourceConnector
> ---
>
> Key: KAFKA-8555
> URL: https://issues.apache.org/jira/browse/KAFKA-8555
> Project: Kafka
> Issue Type: Bug
> Reporter: Boyang Chen
> Priority: Major
> Attachments: log-job139.txt, log-job141.txt, log-job23145.txt, log-job23215.txt, log-job6046.txt
>
> [https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/22798/console]
> 02:03:21 org.apache.kafka.connect.integration.ExampleConnectIntegrationTest.testSourceConnector failed, log available in /home/jenkins/jenkins-slave/workspace/kafka-pr-jdk8-scala2.11/connect/runtime/build/reports/testOutput/org.apache.kafka.connect.integration.ExampleConnectIntegrationTest.testSourceConnector.test.stdout
> 02:03:21 org.apache.kafka.connect.integration.ExampleConnectIntegrationTest > testSourceConnector FAILED
> 02:03:21 org.apache.kafka.connect.errors.DataException: Insufficient records committed by connector simple-conn in 15000 millis. Records expected=2000, actual=1013
> 02:03:21     at org.apache.kafka.connect.integration.ConnectorHandle.awaitCommits(ConnectorHandle.java:188)
> 02:03:21     at org.apache.kafka.connect.integration.ExampleConnectIntegrationTest.testSourceConnector(ExampleConnectIntegrationTest.java:181)
-- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (KAFKA-8555) Flaky test ExampleConnectIntegrationTest#testSourceConnector
[ https://issues.apache.org/jira/browse/KAFKA-8555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16897496#comment-16897496 ] Bruno Cadonna commented on KAFKA-8555: -- https://builds.apache.org/job/kafka-pr-jdk11-scala2.12/6624/testReport/junit/org.apache.kafka.connect.integration/ExampleConnectIntegrationTest/testSourceConnector/
-- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (KAFKA-8260) Flaky test ConsumerBounceTest#testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup
[ https://issues.apache.org/jira/browse/KAFKA-8260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16897416#comment-16897416 ] Bill Bejeck commented on KAFKA-8260: Failed again on [https://builds.apache.org/job/kafka-pr-jdk11-scala2.13/671/]
{noformat}
Error Message
org.scalatest.exceptions.TestFailedException: The remaining consumers in the group could not fetch the expected records
Stacktrace
org.scalatest.exceptions.TestFailedException: The remaining consumers in the group could not fetch the expected records
    at org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:530)
    at org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:529)
    at org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1389)
    at org.scalatest.Assertions.fail(Assertions.scala:1091)
    at org.scalatest.Assertions.fail$(Assertions.scala:1087)
    at org.scalatest.Assertions$.fail(Assertions.scala:1389)
    at kafka.utils.TestUtils$.waitUntilTrue(TestUtils.scala:822)
    at kafka.api.ConsumerBounceTest.testRollingBrokerRestartsWithSmallerMaxGroupSizeConfigDisruptsBigGroup(ConsumerBounceTest.scala:330)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
    at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
    at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:305)
    at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:365)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
    at org.junit.runners.ParentRunner$4.run(ParentRunner.java:330)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:78)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:328)
    at org.junit.runners.ParentRunner.access$100(ParentRunner.java:65)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:292)
    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
    at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
    at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:305)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:412)
    at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
    at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
    at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
    at org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
    at org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
    at jdk.internal.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
    at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
    at org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
    at org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
    at com.sun.proxy.$Proxy2.processTestClass(Unknown Source)
    at org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:118)
    at jdk.internal.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
    at java.base/jdk.internal.reflect.Delegating
[jira] [Commented] (KAFKA-8541) Flakey test LeaderElectionCommandTest#testTopicPartition
[ https://issues.apache.org/jira/browse/KAFKA-8541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16897410#comment-16897410 ] Bill Bejeck commented on KAFKA-8541: Failed again on [https://builds.apache.org/job/kafka-pr-jdk11-scala2.13/671/]
{noformat}
Error Message
kafka.common.AdminCommandFailedException: Timeout waiting for election results
Stacktrace
kafka.common.AdminCommandFailedException: Timeout waiting for election results
    at kafka.admin.LeaderElectionCommand$.electLeaders(LeaderElectionCommand.scala:133)
    at kafka.admin.LeaderElectionCommand$.run(LeaderElectionCommand.scala:88)
    at kafka.admin.LeaderElectionCommand$.main(LeaderElectionCommand.scala:41)
    at kafka.admin.LeaderElectionCommandTest.$anonfun$testTopicPartition$1(LeaderElectionCommandTest.scala:127)
    at kafka.admin.LeaderElectionCommandTest.$anonfun$testTopicPartition$1$adapted(LeaderElectionCommandTest.scala:105)
    at kafka.utils.TestUtils$.resource(TestUtils.scala:1588)
    at kafka.admin.LeaderElectionCommandTest.testTopicPartition(LeaderElectionCommandTest.scala:105)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
    at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
    at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
    at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
    at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
    at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:305)
    at org.junit.runners.BlockJUnit4ClassRunner$1.evaluate(BlockJUnit4ClassRunner.java:100)
    at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:365)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:103)
    at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:63)
    at org.junit.runners.ParentRunner$4.run(ParentRunner.java:330)
    at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:78)
    at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:328)
    at org.junit.runners.ParentRunner.access$100(ParentRunner.java:65)
    at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:292)
    at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
    at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
    at org.junit.runners.ParentRunner$3.evaluate(ParentRunner.java:305)
    at org.junit.runners.ParentRunner.run(ParentRunner.java:412)
    at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110)
    at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58)
    at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38)
    at org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62)
    at org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51)
    at jdk.internal.reflect.GeneratedMethodAccessor10.invoke(Unknown Source)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.base/java.lang.reflect.Method.invoke(Method.java:566)
    at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
    at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
    at org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
    at org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
    at com.sun.proxy.$Proxy2.processTestClass(Unknown Source)
    at org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:118)
    at jdk.internal.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.
[jira] [Commented] (KAFKA-8672) RebalanceSourceConnectorsIntegrationTest#testReconfigConnector
[ https://issues.apache.org/jira/browse/KAFKA-8672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16897374#comment-16897374 ] Matthias J. Sax commented on KAFKA-8672: [https://builds.apache.org/job/kafka-pr-jdk11-scala2.12/6606/testReport/junit/org.apache.kafka.connect.integration/RebalanceSourceConnectorsIntegrationTest/testReconfigConnector/] > RebalanceSourceConnectorsIntegrationTest#testReconfigConnector > -- > > Key: KAFKA-8672 > URL: https://issues.apache.org/jira/browse/KAFKA-8672 > Project: Kafka > Issue Type: Bug > Components: KafkaConnect >Affects Versions: 2.4.0 >Reporter: Matthias J. Sax >Priority: Critical > Labels: flaky-test > > [https://builds.apache.org/job/kafka-pr-jdk11-scala2.12/6281/testReport/junit/org.apache.kafka.connect.integration/RebalanceSourceConnectorsIntegrationTest/testReconfigConnector/] > {quote}java.lang.RuntimeException: Could not find enough records. found 33, > expected 100 at > org.apache.kafka.connect.util.clusters.EmbeddedKafkaCluster.consume(EmbeddedKafkaCluster.java:306) > at > org.apache.kafka.connect.integration.RebalanceSourceConnectorsIntegrationTest.testReconfigConnector(RebalanceSourceConnectorsIntegrationTest.java:180){quote} -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (KAFKA-8661) Flaky Test RebalanceSourceConnectorsIntegrationTest#testStartTwoConnectors
[ https://issues.apache.org/jira/browse/KAFKA-8661?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16897373#comment-16897373 ] Matthias J. Sax commented on KAFKA-8661: [https://builds.apache.org/job/kafka-pr-jdk11-scala2.12/6606/testReport/junit/org.apache.kafka.connect.integration/RebalanceSourceConnectorsIntegrationTest/testStartTwoConnectors/] and [https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/23793/testReport/junit/org.apache.kafka.connect.integration/RebalanceSourceConnectorsIntegrationTest/testStartTwoConnectors/] > Flaky Test RebalanceSourceConnectorsIntegrationTest#testStartTwoConnectors > -- > > Key: KAFKA-8661 > URL: https://issues.apache.org/jira/browse/KAFKA-8661 > Project: Kafka > Issue Type: Bug > Components: KafkaConnect, unit tests >Affects Versions: 2.3.0 >Reporter: Matthias J. Sax >Priority: Critical > Labels: flaky-test > Fix For: 2.4.0, 2.3.1 > > > [https://builds.apache.org/job/kafka-pr-jdk11-scala2.13/224/testReport/junit/org.apache.kafka.connect.integration/RebalanceSourceConnectorsIntegrationTest/testStartTwoConnectors/] > {quote}java.lang.AssertionError: Condition not met within timeout 3. > Connector tasks did not start in time. at > org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:376) at > org.apache.kafka.test.TestUtils.waitForCondition(TestUtils.java:353) at > org.apache.kafka.connect.integration.RebalanceSourceConnectorsIntegrationTest.testStartTwoConnectors(RebalanceSourceConnectorsIntegrationTest.java:120){quote} -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Updated] (KAFKA-8465) Make sure that the copy of the same topic is evenly distributed across a broker's disk.
[ https://issues.apache.org/jira/browse/KAFKA-8465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ChenLin updated KAFKA-8465: --- Summary: Make sure that the copy of the same topic is evenly distributed across a broker's disk. (was: Copying strategy for the topic dimension) > Make sure that the copy of the same topic is evenly distributed across a > broker's disk. > --- > > Key: KAFKA-8465 > URL: https://issues.apache.org/jira/browse/KAFKA-8465 > Project: Kafka > Issue Type: Improvement > Components: core >Affects Versions: 2.0.1, 2.2.0, 2.2.1 >Reporter: ChenLin >Priority: Major > Attachments: image-2019-07-30-13-40-12-711.png, > image-2019-07-30-13-40-49-878.png, > replication_strategy_for_the_topic_dimension.patch > > > When a partition's replicas are assigned to a broker, which of the broker's disks > should those replicas be placed on? The original strategy allocates according to > the total number of partitions per disk. This strategy results in uneven disk > allocation in the topic dimension. > In order to solve this problem, we propose an improved strategy: first ensure > that each topic's partitions are spread evenly across the broker's disks. If a > topic has an equal number of partitions on two disks, sort those disks by their > total partition count and select the disk with the fewest partitions to store the > current replica. > !image-2019-07-30-13-40-12-711.png! > !image-2019-07-30-13-40-49-878.png! > -- This message was sent by Atlassian JIRA (v7.6.14#76016)
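The proposed placement rule can be sketched in a few lines (a minimal illustration with a hypothetical disk-to-partition mapping, not the actual patch):

```python
# Sketch of the proposed placement rule (hypothetical disk/partition model,
# not the actual patch): prefer the disk holding the fewest partitions of the
# topic being placed, breaking ties by the disk's total partition count.

def select_disk(disks, topic):
    """disks maps a log dir to the (topic, partition) replicas already stored on it."""
    return min(
        disks,
        key=lambda d: (
            sum(1 for t, _ in disks[d] if t == topic),  # partitions of this topic on d
            len(disks[d]),                              # tie-break: total partitions on d
        ),
    )

disks = {
    "/data1": [("t1", 0), ("t1", 1), ("t2", 0)],
    "/data2": [("t1", 2), ("t2", 1)],
    "/data3": [("t2", 2), ("t2", 3), ("t2", 4)],
}

next_t1_disk = select_disk(disks, "t1")  # "/data3": holds no t1 partitions yet
next_t2_disk = select_disk(disks, "t2")  # "/data2": tied with /data1 on t2 count, fewer total
```

Note how the tie-break only matters for `t2`: `/data1` and `/data2` each hold one `t2` partition, so the disk with fewer partitions overall wins.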
[jira] [Resolved] (KAFKA-8688) Upgrade system tests fail due to data loss with older message format
[ https://issues.apache.org/jira/browse/KAFKA-8688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rajini Sivaram resolved KAFKA-8688. --- Resolution: Fixed Reviewer: Ismael Juma Fix Version/s: 2.4.0 > Upgrade system tests fail due to data loss with older message format > > > Key: KAFKA-8688 > URL: https://issues.apache.org/jira/browse/KAFKA-8688 > Project: Kafka > Issue Type: Bug > Components: system tests >Reporter: Rajini Sivaram >Assignee: Rajini Sivaram >Priority: Major > Fix For: 2.4.0 > > > System test failure for TestUpgrade/test_upgrade: from_kafka_version=0.9.0.1, > to_message_format_version=0.9.0.1, compression_types=.lz4 > {code:java} > 3 acked message did not make it to the Consumer. They are: [33906, 33900, > 33903]. The first 3 missing messages were validated to ensure they are in > Kafka's data files. 3 were missing. This suggests data loss. Here are some of > the messages not found in the data files: [33906, 33900, 33903] > Traceback (most recent call last): > File > "/home/jenkins/workspace/system-test-kafka_5.3.x/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.7.5-py2.7.egg/ducktape/tests/runner_client.py", > line 132, in run > data = self.run_test() > File > "/home/jenkins/workspace/system-test-kafka_5.3.x/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.7.5-py2.7.egg/ducktape/tests/runner_client.py", > line 189, in run_test > return self.test_context.function(self.test) > File > "/home/jenkins/workspace/system-test-kafka_5.3.x/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.7.5-py2.7.egg/ducktape/mark/_mark.py", > line 428, in wrapper > return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs) > File > "/home/jenkins/workspace/system-test-kafka_5.3.x/kafka/tests/kafkatest/tests/core/upgrade_test.py", > line 136, in test_upgrade > self.run_produce_consume_validate(core_test_action=lambda: > self.perform_upgrade(from_kafka_version, > File > 
"/home/jenkins/workspace/system-test-kafka_5.3.x/kafka/tests/kafkatest/tests/produce_consume_validate.py", > line 112, in run_produce_consume_validate > self.validate() > File > "/home/jenkins/workspace/system-test-kafka_5.3.x/kafka/tests/kafkatest/tests/produce_consume_validate.py", > line 135, in validate > assert succeeded, error_msg > AssertionError: 3 acked message did not make it to the Consumer. They are: > [33906, 33900, 33903]. The first 3 missing messages were validated to ensure > they are in Kafka's data files. 3 were missing. This suggests data loss. Here > are some of the messages not found in the data files: [33906, 33900, 33903] > {code} > Logs show: > # Broker 1 is leader of partition > # Broker 2 successfully fetches from offset 10947 and processes request > # Broker 2 sends fetch request to broker 1 for offset 10950 > # Broker 1 sets its HW to 10950, acknowledges produce requests up to HW > # Broker 2 is elected leader > # Broker 2 truncates to its local HW of 10947 - 3 messages are lost > This data loss is a known issue that was fixed under KIP-101. But since this > can still happen with older message formats, we should update upgrade tests > to cope with some data loss. > -- This message was sent by Atlassian JIRA (v7.6.14#76016)
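The six log steps above can be replayed as a toy model (plain dicts standing in for brokers, offsets taken from the logs; this is an illustration of the pre-KIP-101 behavior, not Kafka code):

```python
# Toy replay of the pre-KIP-101 truncation scenario from the log steps above.
broker1 = {"log_end_offset": 10950, "high_watermark": 10950}  # leader
broker2 = {"log_end_offset": 10947, "high_watermark": 10947}  # follower, fetched to 10947

# Broker 1 acknowledges produce requests up to its HW, so producers treat
# all offsets below 10950 as committed.
acked_up_to = broker1["high_watermark"]

# Broker 2 is elected leader and, pre-KIP-101, truncates its log to its own
# (stale) high watermark before taking over.
broker2["log_end_offset"] = min(broker2["log_end_offset"], broker2["high_watermark"])

messages_lost = acked_up_to - broker2["log_end_offset"]  # offsets 10947..10949
```

KIP-101 replaces HW-based truncation with leader-epoch-based truncation, which is why the loss can only still occur on the older message formats mentioned above.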
[jira] [Commented] (KAFKA-8688) Upgrade system tests fail due to data loss with older message format
[ https://issues.apache.org/jira/browse/KAFKA-8688?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16897274#comment-16897274 ] Rajini Sivaram commented on KAFKA-8688: --- PR: https://github.com/apache/kafka/pull/7102 > Upgrade system tests fail due to data loss with older message format > > > Key: KAFKA-8688 > URL: https://issues.apache.org/jira/browse/KAFKA-8688 > Project: Kafka > Issue Type: Bug > Components: system tests >Reporter: Rajini Sivaram >Assignee: Rajini Sivaram >Priority: Major > > System test failure for TestUpgrade/test_upgrade: from_kafka_version=0.9.0.1, > to_message_format_version=0.9.0.1, compression_types=.lz4 > {code:java} > 3 acked message did not make it to the Consumer. They are: [33906, 33900, > 33903]. The first 3 missing messages were validated to ensure they are in > Kafka's data files. 3 were missing. This suggests data loss. Here are some of > the messages not found in the data files: [33906, 33900, 33903] > Traceback (most recent call last): > File > "/home/jenkins/workspace/system-test-kafka_5.3.x/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.7.5-py2.7.egg/ducktape/tests/runner_client.py", > line 132, in run > data = self.run_test() > File > "/home/jenkins/workspace/system-test-kafka_5.3.x/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.7.5-py2.7.egg/ducktape/tests/runner_client.py", > line 189, in run_test > return self.test_context.function(self.test) > File > "/home/jenkins/workspace/system-test-kafka_5.3.x/kafka/venv/local/lib/python2.7/site-packages/ducktape-0.7.5-py2.7.egg/ducktape/mark/_mark.py", > line 428, in wrapper > return functools.partial(f, *args, **kwargs)(*w_args, **w_kwargs) > File > "/home/jenkins/workspace/system-test-kafka_5.3.x/kafka/tests/kafkatest/tests/core/upgrade_test.py", > line 136, in test_upgrade > self.run_produce_consume_validate(core_test_action=lambda: > self.perform_upgrade(from_kafka_version, > File > 
"/home/jenkins/workspace/system-test-kafka_5.3.x/kafka/tests/kafkatest/tests/produce_consume_validate.py", > line 112, in run_produce_consume_validate > self.validate() > File > "/home/jenkins/workspace/system-test-kafka_5.3.x/kafka/tests/kafkatest/tests/produce_consume_validate.py", > line 135, in validate > assert succeeded, error_msg > AssertionError: 3 acked message did not make it to the Consumer. They are: > [33906, 33900, 33903]. The first 3 missing messages were validated to ensure > they are in Kafka's data files. 3 were missing. This suggests data loss. Here > are some of the messages not found in the data files: [33906, 33900, 33903] > {code} > Logs show: > # Broker 1 is leader of partition > # Broker 2 successfully fetches from offset 10947 and processes request > # Broker 2 sends fetch request to broker 1 for offset 10950 > # Broker 1 sets its HW to 10950, acknowledges produce requests up to HW > # Broker 2 is elected leader > # Broker 2 truncates to its local HW of 10947 - 3 messages are lost > This data loss is a known issue that was fixed under KIP-101. But since this > can still happen with older message formats, we should update upgrade tests > to cope with some data loss. > -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Comment Edited] (KAFKA-8735) BrokerMetadataCheckPoint should check metadata.properties existence itself
[ https://issues.apache.org/jira/browse/KAFKA-8735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16897153#comment-16897153 ] Qinghui Xu edited comment on KAFKA-8735 at 7/31/19 1:03 PM: Errors during the tests: {code:java} 2019-07-30 15:36:31 ERROR BrokerMetadataCheckpoint:74 - Failed to read meta.properties file under dir /var/folders/h9/msx_bvyj4x1cmcc6wmndg0y416l7n9/T/junit4372466223918887888/junit4263066398108675507/meta.properties due to /var/folders/h9/msx_bvyj4x1cmcc6wmndg0y416l7n9/T/junit4372466223918887888/junit4263066398108675507/meta.properties 2019-07-30 15:36:31 ERROR KafkaServer:76 - Fail to read meta.properties under log directory /var/folders/h9/msx_bvyj4x1cmcc6wmndg0y416l7n9/T/junit4372466223918887888/junit4263066398108675507 java.nio.file.NoSuchFileException: /var/folders/h9/msx_bvyj4x1cmcc6wmndg0y416l7n9/T/junit4372466223918887888/junit4263066398108675507/meta.properties at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86) at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107) at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214) at java.nio.file.Files.newByteChannel(Files.java:361) at java.nio.file.Files.newByteChannel(Files.java:407) at java.nio.file.spi.FileSystemProvider.newInputStream(FileSystemProvider.java:384) at java.nio.file.Files.newInputStream(Files.java:152) at org.apache.kafka.common.utils.Utils.loadProps(Utils.java:574) at kafka.server.BrokerMetadataCheckpoint.liftedTree2$1(BrokerMetadataCheckpoint.scala:63) at kafka.server.BrokerMetadataCheckpoint.read(BrokerMetadataCheckpoint.scala:62) at kafka.server.KafkaServer$$anonfun$getBrokerIdAndOfflineDirs$1.apply(KafkaServer.scala:668) at kafka.server.KafkaServer$$anonfun$getBrokerIdAndOfflineDirs$1.apply(KafkaServer.scala:666) at 
scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35) at kafka.server.KafkaServer.getBrokerIdAndOfflineDirs(KafkaServer.scala:666) at kafka.server.KafkaServer.startup(KafkaServer.scala:209) at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:38) ... at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:46) at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48) at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48) at org.junit.rules.RunRules.evaluate(RunRules.java:20) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) at org.junit.runners.ParentRunner.run(ParentRunner.java:309) at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110) at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58) at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38) at org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62) at org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35) at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24) at org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32) at org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93) at com.sun.proxy.$Proxy2.processTestClass(Unknown Source) at org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:118) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java
[jira] [Commented] (KAFKA-8735) BrokerMetadataCheckPoint should check metadata.properties existence itself
[ https://issues.apache.org/jira/browse/KAFKA-8735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16897153#comment-16897153 ] Qinghui Xu commented on KAFKA-8735: --- Errors during the tests: {code:java} 2019-07-30 15:36:31 ERROR BrokerMetadataCheckpoint:74 - Failed to read meta.properties file under dir /var/folders/h9/msx_bvyj4x1cmcc6wmndg0y416l7n9/T/junit4372466223918887888/junit4263066398108675507/meta.properties due to /var/folders/h9/msx_bvyj4x1cmcc6wmndg0y416l7n9/T/junit4372466223918887888/junit4263066398108675507/meta.properties 2019-07-30 15:36:31 ERROR KafkaServer:76 - Fail to read meta.properties under log directory /var/folders/h9/msx_bvyj4x1cmcc6wmndg0y416l7n9/T/junit4372466223918887888/junit4263066398108675507 java.nio.file.NoSuchFileException: /var/folders/h9/msx_bvyj4x1cmcc6wmndg0y416l7n9/T/junit4372466223918887888/junit4263066398108675507/meta.properties at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86) at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102) at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107) at sun.nio.fs.UnixFileSystemProvider.newByteChannel(UnixFileSystemProvider.java:214) at java.nio.file.Files.newByteChannel(Files.java:361) at java.nio.file.Files.newByteChannel(Files.java:407) at java.nio.file.spi.FileSystemProvider.newInputStream(FileSystemProvider.java:384) at java.nio.file.Files.newInputStream(Files.java:152) at org.apache.kafka.common.utils.Utils.loadProps(Utils.java:574) at kafka.server.BrokerMetadataCheckpoint.liftedTree2$1(BrokerMetadataCheckpoint.scala:63) at kafka.server.BrokerMetadataCheckpoint.read(BrokerMetadataCheckpoint.scala:62) at kafka.server.KafkaServer$$anonfun$getBrokerIdAndOfflineDirs$1.apply(KafkaServer.scala:668) at kafka.server.KafkaServer$$anonfun$getBrokerIdAndOfflineDirs$1.apply(KafkaServer.scala:666) at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33) at 
scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:35) at kafka.server.KafkaServer.getBrokerIdAndOfflineDirs(KafkaServer.scala:666) at kafka.server.KafkaServer.startup(KafkaServer.scala:209) at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:38) ... at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:46) at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48) at org.junit.rules.ExternalResource$1.evaluate(ExternalResource.java:48) at org.junit.rules.RunRules.evaluate(RunRules.java:20) at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70) at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50) at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238) at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63) at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236) at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53) at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229) at org.junit.runners.ParentRunner.run(ParentRunner.java:309) at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.runTestClass(JUnitTestClassExecutor.java:110) at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:58) at org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecutor.execute(JUnitTestClassExecutor.java:38) at org.gradle.api.internal.tasks.testing.junit.AbstractJUnitTestClassProcessor.processTestClass(AbstractJUnitTestClassProcessor.java:62) at org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:51) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35) at org.gradle.internal.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24) at org.gradle.internal.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32) at org.gradle.internal.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93) at com.sun.proxy.$Proxy2.processTestClass(Unknown Source) at org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:118) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.gradle.internal.dispatch.Reflection
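The behavior the ticket asks for — checking that meta.properties exists before trying to read it, instead of letting the NoSuchFileException above surface as an ERROR on first startup — can be sketched as follows (a Python stand-in for Scala's BrokerMetadataCheckpoint.read; the function name, return convention, and simplified properties parsing are all illustrative):

```python
import os
import tempfile

def read_broker_metadata(log_dir):
    """Return parsed meta.properties as a dict, or None if no checkpoint exists yet."""
    path = os.path.join(log_dir, "meta.properties")
    if not os.path.exists(path):  # fresh log dir: expected on first startup, not an error
        return None
    props = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith("#"):
                key, _, value = line.partition("=")
                props[key.strip()] = value.strip()
    return props

log_dir = tempfile.mkdtemp()
first_startup = read_broker_metadata(log_dir)  # None: no stack trace logged

with open(os.path.join(log_dir, "meta.properties"), "w") as f:
    f.write("version=0\nbroker.id=1\n")
restart = read_broker_metadata(log_dir)  # checkpoint found and parsed
```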
[jira] [Commented] (KAFKA-8589) Flakey test ResetConsumerGroupOffsetTest#testResetOffsetsExistingTopic
[ https://issues.apache.org/jira/browse/KAFKA-8589?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16897081#comment-16897081 ] Bruno Cadonna commented on KAFKA-8589: -- https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/23807/testReport/junit/kafka.admin/ResetConsumerGroupOffsetTest/testResetOffsetsExistingTopic/ > Flakey test ResetConsumerGroupOffsetTest#testResetOffsetsExistingTopic > -- > > Key: KAFKA-8589 > URL: https://issues.apache.org/jira/browse/KAFKA-8589 > Project: Kafka > Issue Type: Bug > Components: admin, clients, unit tests >Affects Versions: 2.4.0 >Reporter: Boyang Chen >Priority: Major > Labels: flaky-test > > [https://builds.apache.org/job/kafka-pr-jdk11-scala2.12/5724/consoleFull] > *20:25:15* > kafka.admin.ResetConsumerGroupOffsetTest.testResetOffsetsExistingTopic > failed, log available in > /home/jenkins/jenkins-slave/workspace/kafka-pr-jdk11-scala2.12/core/build/reports/testOutput/kafka.admin.ResetConsumerGroupOffsetTest.testResetOffsetsExistingTopic.test.stdout*20:25:15* > *20:25:15* kafka.admin.ResetConsumerGroupOffsetTest > > testResetOffsetsExistingTopic FAILED*20:25:15* > java.util.concurrent.ExecutionException: > org.apache.kafka.common.errors.CoordinatorNotAvailableException: The > coordinator is not available.*20:25:15* at > org.apache.kafka.common.internals.KafkaFutureImpl.wrapAndThrow(KafkaFutureImpl.java:45)*20:25:15* > at > org.apache.kafka.common.internals.KafkaFutureImpl.access$000(KafkaFutureImpl.java:32)*20:25:15* > at > org.apache.kafka.common.internals.KafkaFutureImpl$SingleWaiter.await(KafkaFutureImpl.java:89)*20:25:15* > at > org.apache.kafka.common.internals.KafkaFutureImpl.get(KafkaFutureImpl.java:260)*20:25:15* > at > kafka.admin.ConsumerGroupCommand$ConsumerGroupService.$anonfun$resetOffsets$1(ConsumerGroupCommand.scala:379)*20:25:15* > at > scala.collection.TraversableOnce.$anonfun$foldLeft$1(TraversableOnce.scala:160)*20:25:15* > at > 
scala.collection.TraversableOnce.$anonfun$foldLeft$1$adapted(TraversableOnce.scala:160)*20:25:15* > at scala.collection.Iterator.foreach(Iterator.scala:941)*20:25:15* > at scala.collection.Iterator.foreach$(Iterator.scala:941)*20:25:15* > at > scala.collection.AbstractIterator.foreach(Iterator.scala:1429)*20:25:15* >at scala.collection.IterableLike.foreach(IterableLike.scala:74)*20:25:15* >at > scala.collection.IterableLike.foreach$(IterableLike.scala:73)*20:25:15* > at scala.collection.AbstractIterable.foreach(Iterable.scala:56)*20:25:15* > at > scala.collection.TraversableOnce.foldLeft(TraversableOnce.scala:160)*20:25:15* > at > scala.collection.TraversableOnce.foldLeft$(TraversableOnce.scala:158)*20:25:15* > at > scala.collection.AbstractTraversable.foldLeft(Traversable.scala:108)*20:25:15* > at > kafka.admin.ConsumerGroupCommand$ConsumerGroupService.resetOffsets(ConsumerGroupCommand.scala:377)*20:25:15* > at > kafka.admin.ResetConsumerGroupOffsetTest.resetOffsets(ResetConsumerGroupOffsetTest.scala:507)*20:25:15* > at > kafka.admin.ResetConsumerGroupOffsetTest.resetAndAssertOffsets(ResetConsumerGroupOffsetTest.scala:477)*20:25:15* > at > kafka.admin.ResetConsumerGroupOffsetTest.testResetOffsetsExistingTopic(ResetConsumerGroupOffsetTest.scala:123)*20:25:15* > *20:25:15* Caused by:*20:25:15* > org.apache.kafka.common.errors.CoordinatorNotAvailableException: The > coordinator is not available.*20* -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (KAFKA-8555) Flaky test ExampleConnectIntegrationTest#testSourceConnector
[ https://issues.apache.org/jira/browse/KAFKA-8555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16897079#comment-16897079 ] Bruno Cadonna commented on KAFKA-8555: -- https://builds.apache.org/job/kafka-pr-jdk11-scala2.13/667/testReport/junit/org.apache.kafka.connect.integration/ExampleConnectIntegrationTest/testSourceConnector/ > Flaky test ExampleConnectIntegrationTest#testSourceConnector > > > Key: KAFKA-8555 > URL: https://issues.apache.org/jira/browse/KAFKA-8555 > Project: Kafka > Issue Type: Bug >Reporter: Boyang Chen >Priority: Major > Attachments: log-job139.txt, log-job141.txt, log-job23145.txt, > log-job23215.txt, log-job6046.txt > > > [https://builds.apache.org/job/kafka-pr-jdk8-scala2.11/22798/console] > *02:03:21* > org.apache.kafka.connect.integration.ExampleConnectIntegrationTest.testSourceConnector > failed, log available in > /home/jenkins/jenkins-slave/workspace/kafka-pr-jdk8-scala2.11/connect/runtime/build/reports/testOutput/org.apache.kafka.connect.integration.ExampleConnectIntegrationTest.testSourceConnector.test.stdout*02:03:21* > *02:03:21* > org.apache.kafka.connect.integration.ExampleConnectIntegrationTest > > testSourceConnector FAILED*02:03:21* > org.apache.kafka.connect.errors.DataException: Insufficient records committed > by connector simple-conn in 15000 millis. Records expected=2000, > actual=1013*02:03:21* at > org.apache.kafka.connect.integration.ConnectorHandle.awaitCommits(ConnectorHandle.java:188)*02:03:21* > at > org.apache.kafka.connect.integration.ExampleConnectIntegrationTest.testSourceConnector(ExampleConnectIntegrationTest.java:181)*02:03:21* -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Comment Edited] (KAFKA-8600) Replace DescribeDelegationToken request/response with automated protocol
[ https://issues.apache.org/jira/browse/KAFKA-8600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16897047#comment-16897047 ] Mickael Maison edited comment on KAFKA-8600 at 7/31/19 11:19 AM: - I already have PRs opened for Renew and Expire, so feel free to review them if you have time: [https://github.com/apache/kafka/pull/7038] [https://github.com/apache/kafka/pull/7098] I've not started looking at this one, so grab it if you want. was (Author: mimaison): I already have PRs opened for Renew and Expire, so feel free to review them if you have time: [https://github.com/apache/kafka/pull/7038] [https://github.com/apache/kafka/pull/7098 ] I've not started looking at this one, so grab it if you want. > Replace DescribeDelegationToken request/response with automated protocol > > > Key: KAFKA-8600 > URL: https://issues.apache.org/jira/browse/KAFKA-8600 > Project: Kafka > Issue Type: Sub-task >Reporter: Mickael Maison >Assignee: Viktor Somogyi-Vass >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (KAFKA-8600) Replace DescribeDelegationToken request/response with automated protocol
[ https://issues.apache.org/jira/browse/KAFKA-8600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16897058#comment-16897058 ] Viktor Somogyi-Vass commented on KAFKA-8600: Thanks, I'll make some time for the reviews too! > Replace DescribeDelegationToken request/response with automated protocol > > > Key: KAFKA-8600 > URL: https://issues.apache.org/jira/browse/KAFKA-8600 > Project: Kafka > Issue Type: Sub-task >Reporter: Mickael Maison >Assignee: Mickael Maison >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Assigned] (KAFKA-8600) Replace DescribeDelegationToken request/response with automated protocol
[ https://issues.apache.org/jira/browse/KAFKA-8600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Viktor Somogyi-Vass reassigned KAFKA-8600: -- Assignee: Viktor Somogyi-Vass (was: Mickael Maison) > Replace DescribeDelegationToken request/response with automated protocol > > > Key: KAFKA-8600 > URL: https://issues.apache.org/jira/browse/KAFKA-8600 > Project: Kafka > Issue Type: Sub-task >Reporter: Mickael Maison >Assignee: Viktor Somogyi-Vass >Priority: Major > -- This message was sent by Atlassian JIRA (v7.6.14#76016)
[jira] [Commented] (KAFKA-8600) Replace DescribeDelegationToken request/response with automated protocol
[ https://issues.apache.org/jira/browse/KAFKA-8600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16897047#comment-16897047 ] Mickael Maison commented on KAFKA-8600: --- I already have PRs opened for Renew and Expire, so feel free to review them if you have time: [https://github.com/apache/kafka/pull/7038] [https://github.com/apache/kafka/pull/7098 ] I've not started looking at this one, so grab it if you want. > Replace DescribeDelegationToken request/response with automated protocol > > > Key: KAFKA-8600 > URL: https://issues.apache.org/jira/browse/KAFKA-8600 > Project: Kafka > Issue Type: Sub-task >Reporter: Mickael Maison >Assignee: Mickael Maison >Priority: Major >
[jira] [Resolved] (KAFKA-8722) Data crc check repair
[ https://issues.apache.org/jira/browse/KAFKA-8722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jason Gustafson resolved KAFKA-8722. Resolution: Fixed Fix Version/s: (was: 0.10.2.2) 0.10.2.3 > Data crc check repair > - > > Key: KAFKA-8722 > URL: https://issues.apache.org/jira/browse/KAFKA-8722 > Project: Kafka > Issue Type: Improvement > Components: log >Affects Versions: 0.10.2.2 >Reporter: ChenLin >Priority: Major > Fix For: 0.10.2.3 > > Attachments: image-2019-07-27-14-50-08-128.png, > image-2019-07-27-14-50-58-300.png, image-2019-07-27-14-56-25-610.png, > image-2019-07-27-14-57-06-687.png, image-2019-07-27-15-05-12-565.png, > image-2019-07-27-15-06-07-123.png, image-2019-07-27-15-10-21-709.png, > image-2019-07-27-15-18-22-716.png, image-2019-07-30-11-39-01-605.png > > > In our production environment, when we consume kafka's topic data in an > operating program, we found an error: > org.apache.kafka.common.KafkaException: Record for partition > rl_dqn_debug_example-49 at offset 2911287689 is invalid, cause: Record is > corrupt (stored crc = 3580880396, computed crc = 1701403171) > at > org.apache.kafka.clients.consumer.internals.Fetcher.parseRecord(Fetcher.java:869) > at > org.apache.kafka.clients.consumer.internals.Fetcher.parseCompletedFetch(Fetcher.java:788) > at > org.apache.kafka.clients.consumer.internals.Fetcher.fetchedRecords(Fetcher.java:480) > at > org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1188) > at > org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1046) > at kafka.consumer.NewShinyConsumer.receive(BaseConsumer.scala:88) > at kafka.tools.ConsoleConsumer$.process(ConsoleConsumer.scala:120) > at kafka.tools.ConsoleConsumer$.run(ConsoleConsumer.scala:75) > at kafka.tools.ConsoleConsumer$.main(ConsoleConsumer.scala:50) > at kafka.tools.ConsoleConsumer.main(ConsoleConsumer.scala) > At this point we used the kafka.tools.DumpLogSegments tool to parse the disk > log file and 
found that there was indeed dirty data: > !image-2019-07-27-14-57-06-687.png! > By looking at the code, I found that in some cases Kafka would write data to disk without verifying it, so we fixed that. > When record.offset is not equal to the offset we are expecting, Kafka sets the variable inPlaceAssignment to false, and when inPlaceAssignment is false the data is not verified: > !image-2019-07-27-14-50-58-300.png! > !image-2019-07-27-14-50-08-128.png! > Our repair is as follows: > !image-2019-07-30-11-39-01-605.png! > We ran a comparative test by modifying the client-side producer code to generate some dirty data. With the original Kafka version the dirty data was written to disk normally and the corruption was only reported when it was consumed, whereas with our repaired version the data is verified at write time, so the producer write fails: > !image-2019-07-27-15-05-12-565.png! > With the original version, an error is reported when the client consumes: > !image-2019-07-27-15-06-07-123.png! > When the Kafka server is replaced with the repaired version, it verifies the dirty data on write, and the producer's write fails this time: > !image-2019-07-27-15-10-21-709.png!
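The fix described above amounts to validating record checksums on every write path, including the one where offsets are reassigned (inPlaceAssignment = false). As an illustration only, a minimal CRC32 check of a record payload might look like the following sketch. This is a simplified stand-in, not Kafka's actual record format, which covers attributes, timestamps, key and value:

```java
import java.util.zip.CRC32;

// Hypothetical simplified check: a stored CRC compared against the CRC
// recomputed over the bytes it covers. Kafka's real record format computes
// the CRC over attributes, timestamps, key and value, not a bare payload.
public class CrcCheckDemo {
    static long crcOf(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload);
        return crc.getValue();
    }

    // The check the patch enforces on every write path, not only when
    // offsets can be assigned in place.
    static boolean isValid(long storedCrc, byte[] payload) {
        return storedCrc == crcOf(payload);
    }

    public static void main(String[] args) {
        byte[] good = "hello".getBytes();
        long stored = crcOf(good);
        System.out.println(isValid(stored, good));               // true
        System.out.println(isValid(stored, "hellX".getBytes())); // false
    }
}
```

With a check like this on the write path, a producer batch whose stored CRC disagrees with the recomputed one is rejected at append time rather than surfacing later as a consumer-side KafkaException.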
[jira] [Commented] (KAFKA-8722) Data crc check repair
[ https://issues.apache.org/jira/browse/KAFKA-8722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16896978#comment-16896978 ] ASF GitHub Bot commented on KAFKA-8722: --- hachikuji commented on pull request #7124: KAFKA-8722: Data crc check repair URL: https://github.com/apache/kafka/pull/7124 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Data crc check repair > - > > Key: KAFKA-8722 > URL: https://issues.apache.org/jira/browse/KAFKA-8722 > Project: Kafka > Issue Type: Improvement > Components: log >Affects Versions: 0.10.2.2 >Reporter: ChenLin >Priority: Major > Fix For: 0.10.2.2 > > Attachments: image-2019-07-27-14-50-08-128.png, > image-2019-07-27-14-50-58-300.png, image-2019-07-27-14-56-25-610.png, > image-2019-07-27-14-57-06-687.png, image-2019-07-27-15-05-12-565.png, > image-2019-07-27-15-06-07-123.png, image-2019-07-27-15-10-21-709.png, > image-2019-07-27-15-18-22-716.png, image-2019-07-30-11-39-01-605.png > > > In our production environment, when we consume kafka's topic data in an > operating program, we found an error: > org.apache.kafka.common.KafkaException: Record for partition > rl_dqn_debug_example-49 at offset 2911287689 is invalid, cause: Record is > corrupt (stored crc = 3580880396, computed crc = 1701403171) > at > org.apache.kafka.clients.consumer.internals.Fetcher.parseRecord(Fetcher.java:869) > at > org.apache.kafka.clients.consumer.internals.Fetcher.parseCompletedFetch(Fetcher.java:788) > at > org.apache.kafka.clients.consumer.internals.Fetcher.fetchedRecords(Fetcher.java:480) > at > org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1188) > at > org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1046) > at 
kafka.consumer.NewShinyConsumer.receive(BaseConsumer.scala:88) > at kafka.tools.ConsoleConsumer$.process(ConsoleConsumer.scala:120) > at kafka.tools.ConsoleConsumer$.run(ConsoleConsumer.scala:75) > at kafka.tools.ConsoleConsumer$.main(ConsoleConsumer.scala:50) > at kafka.tools.ConsoleConsumer.main(ConsoleConsumer.scala) > At this point we used the kafka.tools.DumpLogSegments tool to parse the disk > log file and found that there was indeed dirty data: > !image-2019-07-27-14-57-06-687.png! > By looking at the code, I found that in some cases Kafka would write data to disk without verifying it, so we fixed that. > When record.offset is not equal to the offset we are expecting, Kafka sets the variable inPlaceAssignment to false, and when inPlaceAssignment is false the data is not verified: > !image-2019-07-27-14-50-58-300.png! > !image-2019-07-27-14-50-08-128.png! > Our repair is as follows: > !image-2019-07-30-11-39-01-605.png! > We ran a comparative test by modifying the client-side producer code to generate some dirty data. With the original Kafka version the dirty data was written to disk normally and the corruption was only reported when it was consumed, whereas with our repaired version the data is verified at write time, so the producer write fails: > !image-2019-07-27-15-05-12-565.png! > With the original version, an error is reported when the client consumes: > !image-2019-07-27-15-06-07-123.png! > When the Kafka server is replaced with the repaired version, it verifies the dirty data on write, and the producer's write fails this time: > !image-2019-07-27-15-10-21-709.png!
[jira] [Comment Edited] (KAFKA-8600) Replace DescribeDelegationToken request/response with automated protocol
[ https://issues.apache.org/jira/browse/KAFKA-8600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16896966#comment-16896966 ] Viktor Somogyi-Vass edited comment on KAFKA-8600 at 7/31/19 9:20 AM: - Hey [~mimaison] are you working on this? I probably would like to get this done as I'm working on this part anyway as part of KIP-373. Would you be fine if I implemented this? (In fact I could get done the other two DT protocols as well :) ) was (Author: viktorsomogyi): Hey [~mimaison] are you working on this? I probably would like to get this done as I'm working on this part anyway as part of KIP-373. Would you be fine if I implemented this? > Replace DescribeDelegationToken request/response with automated protocol > > > Key: KAFKA-8600 > URL: https://issues.apache.org/jira/browse/KAFKA-8600 > Project: Kafka > Issue Type: Sub-task >Reporter: Mickael Maison >Assignee: Mickael Maison >Priority: Major >
[jira] [Commented] (KAFKA-8738) Cleaning thread blocked when more than one ALTER_REPLICA_LOG_DIRS requests sent
[ https://issues.apache.org/jira/browse/KAFKA-8738?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16896969#comment-16896969 ] Alexandre Dupriez commented on KAFKA-8738: -- Interesting. Do you have a self-contained use case to reproduce? > Cleaning thread blocked when more than one ALTER_REPLICA_LOG_DIRS requests > sent > > > Key: KAFKA-8738 > URL: https://issues.apache.org/jira/browse/KAFKA-8738 > Project: Kafka > Issue Type: Bug >Affects Versions: 2.1.1 >Reporter: dingsainan >Priority: Major > > Hi, > > I am seeing a situation where the log cleaner does not work for the affected > topic-partition after using the kafka-reassign-partitions.sh tool more than > once in quick succession on v2.1.1. > > My operation: > first submit a task to migrate a replica between log dirs on the same broker; > while that task is still in progress, submit a new task for the same > topic-partition. > > {code:java} > // the first task: > {"partitions": > [{"topic": "lancer_ops_billions_all_log_json_billions", > "partition": 1, > "replicas": [6,15], > "log_dirs": ["any","/data/mnt/storage02/datum/kafka_data"]}] > } > //the second task > {"partitions": > [{"topic": "lancer_ops_billions_all_log_json_billions", > "partition": 1, > "replicas": [6,15], > "log_dirs": ["any","/data/mnt/storage03/datum/kafka_data"]}] > } > > {code} > > My analysis: > Kafka executes abortAndPauseCleaning() once a task is submitted; shortly after, > another task is submitted for the same topic-partition, so the cleaning state > is currently {color:#ff}LogCleaningPaused(2){color}. When the > second task completes, cleaning for this topic-partition is resumed once. > In my case the first task is killed directly and no > resumeClean() is executed for it, so when the second task > completes, the cleaning state for the topic-partition is still > {color:#ff}LogCleaningPaused(1){color}, which blocks the cleaner thread for > the topic-partition.
> > _That's all of my analysis, please confirm._ > > _Thanks_ > _Nora_
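The imbalance described in this report behaves like a leaked reference count. The sketch below models it with a hypothetical CleanerState class; this is only an illustration of the pause/resume bookkeeping, not Kafka's actual LogCleanerManager. A first task that is killed without ever resuming leaves the pause depth at 1, so cleaning never restarts:

```java
// Hypothetical model of per-partition cleaning state as a pause depth
// counter (an illustration, not Kafka's LogCleanerManager). Cleaning may
// only run once the depth is back at zero.
public class CleanerState {
    private int pauseDepth = 0;

    public void abortAndPauseCleaning() { pauseDepth++; }

    public void resumeCleaning() {
        if (pauseDepth > 0) pauseDepth--;
    }

    public boolean cleaningAllowed() { return pauseDepth == 0; }

    public int depth() { return pauseDepth; }

    public static void main(String[] args) {
        CleanerState state = new CleanerState();
        state.abortAndPauseCleaning(); // first ALTER_REPLICA_LOG_DIRS task
        state.abortAndPauseCleaning(); // second task, same partition: depth 2
        // The first task is killed and never resumes; only the second resumes.
        state.resumeCleaning();
        System.out.println(state.cleaningAllowed()); // false: stuck at depth 1
    }
}
```

In this model the fix is simply that whichever path cancels the first task must also decrement the depth it added, so the count returns to zero when the surviving task finishes.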
[jira] [Updated] (KAFKA-8733) Offline partitions occur when leader's disk is slow in reads while responding to follower fetch requests.
[ https://issues.apache.org/jira/browse/KAFKA-8733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Satish Duggana updated KAFKA-8733: -- Description: We found offline partitions issue multiple times on some of the hosts in our clusters. After going through the broker logs and hosts’s disk stats, it looks like this issue occurs whenever the read/write operations take more time on that disk. In a particular case where read time is more than the replica.lag.time.max.ms, follower replicas will be out of sync as their earlier fetch requests are stuck while reading the local log and their fetch status is not yet updated as mentioned in the below code of `ReplicaManager`. If there is an issue in reading the data from the log for a duration more than replica.lag.time.max.ms then all the replicas will be out of sync and partition becomes offline if min.isr.replicas > 1 and unclean.leader.election is false. {code:java} def readFromLog(): Seq[(TopicPartition, LogReadResult)] = { val result = readFromLocalLog( // this call took more than `replica.lag.time.max.ms` replicaId = replicaId, fetchOnlyFromLeader = fetchOnlyFromLeader, readOnlyCommitted = fetchOnlyCommitted, fetchMaxBytes = fetchMaxBytes, hardMaxBytesLimit = hardMaxBytesLimit, readPartitionInfo = fetchInfos, quota = quota, isolationLevel = isolationLevel) if (isFromFollower) updateFollowerLogReadResults(replicaId, result). // fetch time gets updated here, but mayBeShrinkIsr should have been already called and the replica is removed from isr else result } val logReadResults = readFromLog() {code} Attached the graphs of disk weighted io time stats when this issue occurred. I will raise a [KIP|https://cwiki.apache.org/confluence/display/KAFKA/KIP-501+Avoid+offline+partitions+in+the+edgcase+scenario+of+follower+fetch+requests+not+processed+in+time] describing options on how to handle this scenario. was: We found offline partitions issue multiple times on some of the hosts in our clusters. 
After going through the broker logs and hosts’s disk stats, it looks like this issue occurs whenever the read/write operations take more time on that disk. In a particular case where read time is more than the replica.lag.time.max.ms, follower replicas will be out of sync as their earlier fetch requests are stuck while reading the local log and their fetch status is not yet updated as mentioned in the below code of `ReplicaManager`. If there is an issue in reading the data from the log for a duration more than replica.lag.time.max.ms then all the replicas will be out of sync and partition becomes offline if min.isr.replicas > 1 and unclean.leader.election is false. {code:java} def readFromLog(): Seq[(TopicPartition, LogReadResult)] = { val result = readFromLocalLog( // this call took more than `replica.lag.time.max.ms` replicaId = replicaId, fetchOnlyFromLeader = fetchOnlyFromLeader, readOnlyCommitted = fetchOnlyCommitted, fetchMaxBytes = fetchMaxBytes, hardMaxBytesLimit = hardMaxBytesLimit, readPartitionInfo = fetchInfos, quota = quota, isolationLevel = isolationLevel) if (isFromFollower) updateFollowerLogReadResults(replicaId, result). // fetch time gets updated here, but mayBeShrinkIsr should have been already called and the replica is removed from sir else result } val logReadResults = readFromLog() {code} Attached the graphs of disk weighted io time stats when this issue occurred. I will raise a [KIP|https://cwiki.apache.org/confluence/display/KAFKA/KIP-501+Avoid+offline+partitions+in+the+edgcase+scenario+of+follower+fetch+requests+not+processed+in+time] describing options on how to handle this scenario. > Offline partitions occur when leader's disk is slow in reads while responding > to follower fetch requests. 
> - > > Key: KAFKA-8733 > URL: https://issues.apache.org/jira/browse/KAFKA-8733 > Project: Kafka > Issue Type: Bug > Components: core >Affects Versions: 1.1.2, 2.4.0 >Reporter: Satish Duggana >Assignee: Satish Duggana >Priority: Critical > Attachments: weighted-io-time-2.png, wio-time.png > > > We found offline partitions issue multiple times on some of the hosts in our > clusters. After going through the broker logs and hosts’s disk stats, it > looks like this issue occurs whenever the read/write operations take more > time on that disk. In a particular case where read time is more than the > replica.lag.time.max.ms, follower replicas will be out of sync as their > earlier fetch requests are stuck while reading the local log and their fetch > status is not yet updated as mentioned in the below code of `ReplicaManager`. > If there is an issue in reading the data from the log for a duration m
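The failure mode in KAFKA-8733 is a timing problem: a follower's last-caught-up timestamp is only advanced after the leader's slow local read completes, so an ISR shrink check that runs in the meantime sees every follower as lagging. The following is a hypothetical, much-simplified sketch of that check, not the real ReplicaManager/Partition logic:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, much-simplified sketch of the ISR shrink check: a follower
// is dropped when it has not been caught up within replica.lag.time.max.ms.
// In the reported scenario, lastCaughtUpMs is only advanced after the
// leader's slow local read completes, so a stuck read makes everyone "lag".
public class IsrShrinkDemo {
    static List<Integer> outOfSync(long[] lastCaughtUpMs, long nowMs, long lagTimeMaxMs) {
        List<Integer> lagging = new ArrayList<>();
        for (int id = 0; id < lastCaughtUpMs.length; id++) {
            if (nowMs - lastCaughtUpMs[id] > lagTimeMaxMs) {
                lagging.add(id);
            }
        }
        return lagging;
    }

    public static void main(String[] args) {
        long lagTimeMaxMs = 10_000L;
        // Both followers fetched at t=0; the leader's local read of those
        // fetches is stuck on a slow disk, so the timestamps never advance.
        long[] lastCaughtUpMs = {0L, 0L};
        // The shrink check runs at t=15s, before the stuck reads finish:
        System.out.println(outOfSync(lastCaughtUpMs, 15_000L, lagTimeMaxMs)); // [0, 1]
    }
}
```

When every follower is judged out of sync this way, the ISR collapses to the leader alone, and a subsequent leader failure makes the partition offline under min.insync.replicas > 1 with unclean leader election disabled, which is exactly the edge case the linked KIP-501 discusses.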
[jira] [Commented] (KAFKA-8600) Replace DescribeDelegationToken request/response with automated protocol
[ https://issues.apache.org/jira/browse/KAFKA-8600?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16896966#comment-16896966 ] Viktor Somogyi-Vass commented on KAFKA-8600: Hey [~mimaison] are you working on this? I probably would like to get this done as I'm working on this part anyway as part of KIP-373. Would you be fine if I implemented this? > Replace DescribeDelegationToken request/response with automated protocol > > > Key: KAFKA-8600 > URL: https://issues.apache.org/jira/browse/KAFKA-8600 > Project: Kafka > Issue Type: Sub-task >Reporter: Mickael Maison >Assignee: Mickael Maison >Priority: Major >
[jira] [Created] (KAFKA-8739) rejoining broker fails to sanity check existing log segments
sanjiv marathe created KAFKA-8739: - Summary: rejoining broker fails to sanity check existing log segments Key: KAFKA-8739 URL: https://issues.apache.org/jira/browse/KAFKA-8739 Project: Kafka Issue Type: Bug Components: replication Affects Versions: 2.3.0 Reporter: sanjiv marathe Kafka claims it can be used as storage, but the following scenario proves otherwise. # Consider a topic with a single partition and replication factor 2 across two brokers, say A and B, with A as the leader. # Broker B fails due to sector errors. The sysadmin fixes the issue and brings it up again after a few minutes; a few log segments are lost/corrupted. # Broker B catches up on the missed messages by fetching them from the leader A, but does not realize that its earlier log segments are damaged. # Broker A fails, and B becomes the leader. # A new consumer requests messages from the beginning; broker B fails to deliver the messages belonging to the corrupted log segments. Suggested solution: a broker, immediately after coming up, should go through a sanity check, e.g. a CRC check of its log segments. Any corrupted/lost segments should be refetched from the leader.
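The suggested sanity check could, in outline, walk each local log segment and verify every record's checksum before the broker rejoins replication. The sketch below uses hypothetical in-memory record/segment stand-ins purely for illustration; real Kafka segments are on-disk files whose record batches carry batch-level CRCs:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.zip.CRC32;

// Hypothetical in-memory stand-in for on-disk log segment contents; real
// Kafka segments are files whose record batches carry batch-level CRCs.
public class SegmentSanityScan {
    static class StoredRecord {
        final long offset;
        final long storedCrc;
        final byte[] payload;
        StoredRecord(long offset, long storedCrc, byte[] payload) {
            this.offset = offset;
            this.storedCrc = storedCrc;
            this.payload = payload;
        }
    }

    static long crcOf(byte[] bytes) {
        CRC32 crc = new CRC32();
        crc.update(bytes);
        return crc.getValue();
    }

    // Offsets whose stored CRC no longer matches the payload; these ranges
    // would need refetching from the leader before this broker serves reads.
    static List<Long> corruptedOffsets(List<StoredRecord> segment) {
        List<Long> bad = new ArrayList<>();
        for (StoredRecord rec : segment) {
            if (crcOf(rec.payload) != rec.storedCrc) {
                bad.add(rec.offset);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        List<StoredRecord> segment = new ArrayList<>();
        segment.add(new StoredRecord(0L, crcOf("a".getBytes()), "a".getBytes()));
        segment.add(new StoredRecord(1L, crcOf("b".getBytes()), "X".getBytes()));
        System.out.println(corruptedOffsets(segment)); // [1]
    }
}
```

The trade-off the ticket glosses over is cost: scanning every segment on startup delays broker recovery, which is presumably why Kafka only sanity-checks segments after an unclean shutdown rather than on every restart.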
[jira] [Commented] (KAFKA-8041) Flaky Test LogDirFailureTest#testIOExceptionDuringLogRoll
[ https://issues.apache.org/jira/browse/KAFKA-8041?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16896845#comment-16896845 ] Bruno Cadonna commented on KAFKA-8041: -- https://builds.apache.org/job/kafka-pr-jdk11-scala2.13/654/testReport/junit/kafka.server/LogDirFailureTest/testIOExceptionDuringLogRoll/ > Flaky Test LogDirFailureTest#testIOExceptionDuringLogRoll > - > > Key: KAFKA-8041 > URL: https://issues.apache.org/jira/browse/KAFKA-8041 > Project: Kafka > Issue Type: Bug > Components: core, unit tests >Affects Versions: 2.0.1, 2.3.0 >Reporter: Matthias J. Sax >Priority: Critical > Labels: flaky-test > Fix For: 2.4.0 > > > [https://builds.apache.org/blue/organizations/jenkins/kafka-2.0-jdk8/detail/kafka-2.0-jdk8/236/tests] > {quote}java.lang.AssertionError: Expected some messages > at kafka.utils.TestUtils$.fail(TestUtils.scala:357) > at kafka.utils.TestUtils$.waitUntilTrue(TestUtils.scala:787) > at > kafka.server.LogDirFailureTest.testProduceAfterLogDirFailureOnLeader(LogDirFailureTest.scala:189) > at > kafka.server.LogDirFailureTest.testIOExceptionDuringLogRoll(LogDirFailureTest.scala:63){quote} > STDOUT > {quote}[2019-03-05 03:44:58,614] ERROR [ReplicaFetcher replicaId=1, > leaderId=0, fetcherId=0] Error for partition topic-6 at offset 0 > (kafka.server.ReplicaFetcherThread:76) > org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server > does not host this topic-partition. > [2019-03-05 03:44:58,614] ERROR [ReplicaFetcher replicaId=1, leaderId=0, > fetcherId=0] Error for partition topic-0 at offset 0 > (kafka.server.ReplicaFetcherThread:76) > org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server > does not host this topic-partition. 
> [2019-03-05 03:44:58,615] ERROR [ReplicaFetcher replicaId=1, leaderId=0, > fetcherId=0] Error for partition topic-10 at offset 0 > (kafka.server.ReplicaFetcherThread:76) > org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server > does not host this topic-partition. > [2019-03-05 03:44:58,615] ERROR [ReplicaFetcher replicaId=1, leaderId=0, > fetcherId=0] Error for partition topic-4 at offset 0 > (kafka.server.ReplicaFetcherThread:76) > org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server > does not host this topic-partition. > [2019-03-05 03:44:58,615] ERROR [ReplicaFetcher replicaId=1, leaderId=0, > fetcherId=0] Error for partition topic-8 at offset 0 > (kafka.server.ReplicaFetcherThread:76) > org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server > does not host this topic-partition. > [2019-03-05 03:44:58,615] ERROR [ReplicaFetcher replicaId=1, leaderId=0, > fetcherId=0] Error for partition topic-2 at offset 0 > (kafka.server.ReplicaFetcherThread:76) > org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server > does not host this topic-partition. 
> [2019-03-05 03:45:00,248] ERROR Error while rolling log segment for topic-0 > in dir > /home/jenkins/jenkins-slave/workspace/kafka-2.0-jdk8/core/data/kafka-3869208920357262216 > (kafka.server.LogDirFailureChannel:76) > java.io.FileNotFoundException: > /home/jenkins/jenkins-slave/workspace/kafka-2.0-jdk8/core/data/kafka-3869208920357262216/topic-0/.index > (Not a directory) > at java.io.RandomAccessFile.open0(Native Method) > at java.io.RandomAccessFile.open(RandomAccessFile.java:316) > at java.io.RandomAccessFile.<init>(RandomAccessFile.java:243) > at kafka.log.AbstractIndex.$anonfun$resize$1(AbstractIndex.scala:121) > at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:12) > at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:251) > at kafka.log.AbstractIndex.resize(AbstractIndex.scala:115) > at kafka.log.AbstractIndex.$anonfun$trimToValidSize$1(AbstractIndex.scala:184) > at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:12) > at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:251) > at kafka.log.AbstractIndex.trimToValidSize(AbstractIndex.scala:184) > at kafka.log.LogSegment.onBecomeInactiveSegment(LogSegment.scala:501) > at kafka.log.Log.$anonfun$roll$8(Log.scala:1520) > at kafka.log.Log.$anonfun$roll$8$adapted(Log.scala:1520) > at scala.Option.foreach(Option.scala:257) > at kafka.log.Log.$anonfun$roll$2(Log.scala:1520) > at kafka.log.Log.maybeHandleIOException(Log.scala:1881) > at kafka.log.Log.roll(Log.scala:1484) > at > kafka.server.LogDirFailureTest.testProduceAfterLogDirFailureOnLeader(LogDirFailureTest.scala:154) > at > kafka.server.LogDirFailureTest.testIOExceptionDuringLogRoll(LogDirFailureTest.scala:63) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at >
[jira] [Commented] (KAFKA-8460) Flaky Test PlaintextConsumerTest#testLowMaxFetchSizeForRequestAndPartition
[ https://issues.apache.org/jira/browse/KAFKA-8460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16896842#comment-16896842 ] Bruno Cadonna commented on KAFKA-8460: -- https://builds.apache.org/job/kafka-pr-jdk11-scala2.13/654/testReport/kafka.api/PlaintextConsumerTest/testLowMaxFetchSizeForRequestAndPartition/ > Flaky Test PlaintextConsumerTest#testLowMaxFetchSizeForRequestAndPartition > --- > > Key: KAFKA-8460 > URL: https://issues.apache.org/jira/browse/KAFKA-8460 > Project: Kafka > Issue Type: Bug >Reporter: Boyang Chen >Priority: Major > > [https://builds.apache.org/job/kafka-pr-jdk11-scala2.12/5168/consoleFull] > *16:17:04* kafka.api.PlaintextConsumerTest > > testLowMaxFetchSizeForRequestAndPartition FAILED*16:17:04* > org.scalatest.exceptions.TestFailedException: Timed out before consuming > expected 2700 records. The number consumed was 1980.*16:17:04* at > org.scalatest.Assertions.newAssertionFailedException(Assertions.scala:530)*16:17:04* > at > org.scalatest.Assertions.newAssertionFailedException$(Assertions.scala:529)*16:17:04* > at > org.scalatest.Assertions$.newAssertionFailedException(Assertions.scala:1389)*16:17:04* > at org.scalatest.Assertions.fail(Assertions.scala:1091)*16:17:04* at > org.scalatest.Assertions.fail$(Assertions.scala:1087)*16:17:04* at > org.scalatest.Assertions$.fail(Assertions.scala:1389)*16:17:04* at > kafka.utils.TestUtils$.waitUntilTrue(TestUtils.scala:789)*16:17:04* at > kafka.utils.TestUtils$.pollRecordsUntilTrue(TestUtils.scala:765)*16:17:04* at > kafka.api.AbstractConsumerTest.consumeRecords(AbstractConsumerTest.scala:156)*16:17:04* > at > kafka.api.PlaintextConsumerTest.testLowMaxFetchSizeForRequestAndPartition(PlaintextConsumerTest.scala:801)*16:17:04*